prompt_multimodal.py 15 KB

"""Multimodal analysis prompts for LightRAG.

These templates are consumed by ``LightRAG.analyze_multimodal`` to produce
modality-specific analysis JSON written into each sidecar item's
``llm_analyze_result``.

Each template accepts the same variable set so the caller can format them
uniformly:

- ``language`` : target language for ``name`` / ``description`` outputs.
- ``content`` : modality body (table JSON/HTML, equation LaTeX, etc.).
  Images pass an empty string and rely on ``image_inputs``.
- ``captions`` : caption text or ``"n/a"``.
- ``footnotes`` : joined footnotes string or ``"n/a"``.
- ``leading`` : surrounding leading context or ``"n/a"``.
- ``trailing`` : surrounding trailing context or ``"n/a"``.
- ``item_id`` : sidecar item identifier (for diagnostics, not required by
  every template).
- ``file_path`` : source document path (diagnostics only).

The output schema differs by modality:

- Image : ``{"name": str, "type": str, "description": str}``
- Table : ``{"name": str, "description": str}``
- Equation : ``{"name": str, "equation": str, "description": str}``

Image ``type`` is restricted to :data:`IMAGE_TYPE_ENUM`; values outside the
enum are folded into :data:`IMAGE_TYPE_FALLBACK` by the caller.
"""

from __future__ import annotations

IMAGE_TYPE_ENUM: tuple[str, ...] = (
    "Photo",
    "Illustration",
    "Screenshot",
    "Icon",
    "Chart",
    "Table",
    "Infographic",
    "Flowchart",
    "Chat Log",
    "Wireframe",
    "Texture",
    "Other",
)

IMAGE_TYPE_FALLBACK = "Other"

MULTIMODAL_PROMPTS: dict[str, str] = {}

MULTIMODAL_PROMPTS[
    "image_analysis"
] = """You are an expert image analyzer. Analyze the provided image and return a single JSON object describing its content.
================ INSTRUCTIONS ================
1. CONTENT RECOGNITION
Examine the image carefully and identify:
- The primary subject(s), scene, or composition.
- Salient visual elements (objects, people, text overlays, diagrams, charts, screenshots, etc.).
- Spatial layout when meaningful (e.g. left/right, foreground/background, panels of a figure).
- Any visible text: quote it verbatim when short; summarize when long.
- Color, style, or visual cues only when they materially aid interpretation.
2. USE OF ADDITIONAL CONTEXT
The Additional Context section provides surrounding information that may help disambiguate the image's role in its source document:
- Captions : caption attached to the image ("n/a" = none)
- Footnotes : footnote attached to the image ("n/a" = none)
- Leading Text : text appearing immediately BEFORE the image ("n/a" = none)
- Trailing Text : text appearing immediately AFTER the image ("n/a" = none)
Rules:
- Use context to disambiguate abbreviations, units, named entities, and the image's purpose.
- The IMAGE ITSELF takes priority when it conflicts with context; describe what is visible.
- Only mention a relationship between the image and Leading/Trailing Text if it is clearly supported. If uncertain, omit it.
- Captions, footnotes, leading text and trailing text must NOT be used to invent visual content not present in the image.
3. NAMING (`name`)
- Produce a concise, distinctive name (3–8 words, snake_case preferred).
- It should convey what the image depicts, not just "image".
- Good examples: `crispr_cas9_workflow_diagram`, `q4_revenue_bar_chart`, `paris_eiffel_tower_photo`.
- Bad examples: `image`, `figure`, `picture_1`.
4. TYPE (`type`)
- Pick exactly one value from this fixed list (verbatim, case-sensitive):
Photo, Illustration, Screenshot, Icon, Chart, Table, Infographic, Flowchart, Chat Log, Wireframe, Texture, Other
- Choose the single best fit. Use `Other` when no listed type clearly applies.
5. DESCRIPTION (`description`, ≤ 500 words, natural prose, not bullets)
Cover the following where applicable:
- What the image depicts overall and what question/claim it visually supports.
- The primary subject(s), their attributes, and any meaningful relationships between them.
- Quantitative findings if the image is a chart/diagram (cite specific values when visible).
- Visible text content that carries meaning (labels, annotations, axis titles).
- Use specific proper nouns rather than pronouns whenever possible.
- If the image clearly supports the surrounding context (leading or trailing text), briefly note that relationship at the end. Otherwise omit.
6. OUTPUT RULES
- Return ONE valid JSON object only.
- No surrounding markdown, no code fences, no preamble, no explanation.
- All string values must be properly escaped JSON strings (escape `"` as `\\"`, newlines as `\\n`).
- The output values for the JSON fields `name` and `description` must be written in `{language}`.
================ ADDITIONAL CONTEXT ================
- Captions: {captions}
- Footnotes: {footnotes}
- Leading Text:
```
{leading}
```
- Trailing Text:
```
{trailing}
```
================ OUTPUT FORMAT ================
{{
"name": "<concise distinctive name>",
"type": "<one value from the fixed type list>",
"description": "<interpretive description, ≤500 words>"
}}
Output:
"""

MULTIMODAL_PROMPTS[
    "table_analysis"
] = """You are an expert table analyzer. The provided content is a table in JSON or HTML format. Analyze it and return a single JSON object describing its structure and content.
================ INSTRUCTIONS ================
1. CONTENT RECOGNITION
Read the table carefully and identify:
- Overall structure: number of rows and columns, presence of merged cells, multi-level headers, row groupings, or totals/subtotals rows.
- Column headers and (if present) row headers; capture their exact wording.
- Units of measurement (%, $, ms, kg, etc.) and any scale indicators ("in millions", "×1000").
- Key data points: maxima, minima, outliers, notable values, totals.
- Patterns and trends across rows or columns (growth, decline, correlation, ranking).
- Empty cells, "—", "N/A", or other null markers: preserve them as-is, do NOT fabricate values.
- Footnote markers inside cells (e.g. "*", "†", "[1]") and what they refer to.
2. USE OF ADDITIONAL CONTEXT
The Additional Context section provides surrounding information to help you understand the table's role in its source document:
- Captions : the table's caption ("n/a" = none)
- Footnotes : footnote attached to the table ("n/a" = none)
- Leading Text : text appearing immediately BEFORE the table ("n/a" = none)
- Trailing Text : text appearing immediately AFTER the table ("n/a" = none)
Rules:
- Use context to disambiguate column meanings, units, abbreviations, and entity names.
- TABLE CONTENT TAKES PRIORITY over context when they conflict. Describe what you actually see; note the discrepancy only if it is material.
- Only mention a relationship between the table and Leading/Trailing Text if it is clearly supported. If uncertain, omit it.
- Captions, footnotes, leading text and trailing text may only be used for disambiguation purposes and must not be used to infer or fabricate content not present in TABLE CONTENT.
- NEVER invent rows, columns, values, units, or entities that are not visible.
3. NAMING (`name`)
- Produce a concise, distinctive name (3–8 words, snake_case preferred).
- It should convey what the table is about, not just "table".
- Good examples: `q4_2024_revenue_by_region`, `model_benchmark_accuracy_latency`, `patient_demographics_baseline`.
- Bad examples: `table`, `data_table`, `results`.
4. DESCRIPTION (`description`, ≤ 500 words, natural prose, not bullets)
Cover the following where applicable:
- What the table is about and what question it answers.
- What the rows represent and what the columns represent (the "shape" of the data).
- Units, time range, and scope of the data.
- The most important patterns, trends, comparisons, or outliers. Cite specific values from the table to support each observation (e.g. "revenue grew from $1.2M in Q1 to $3.8M in Q4").
- Any totals, subtotals, averages, or computed columns and what they reveal.
- Use specific proper nouns (entity names, column names) instead of pronouns.
- If the table clearly illustrates or supports the surrounding context (leading or trailing text), briefly note that relationship at the end. Otherwise omit.
- Do not restate the table cell by cell or row by row; focus on interpretation.
5. OUTPUT RULES
- Return ONE valid JSON object only.
- No surrounding markdown, no code fences, no preamble, no explanation.
- All string values must be properly escaped JSON strings (escape `"` as `\\"`, newlines as `\\n`).
- The output values for the JSON fields `name` and `description` must be written in `{language}`.
================ TABLE CONTENT ================
```
{content}
```
================ ADDITIONAL CONTEXT ================
- Captions: {captions}
- Footnotes: {footnotes}
- Leading Text:
```
{leading}
```
- Trailing Text:
```
{trailing}
```
================ OUTPUT FORMAT ================
{{
"name": "<concise distinctive name>",
"description": "<interpretive description of the table, ≤500 words>"
}}
Output:
"""

MULTIMODAL_PROMPTS[
    "equation_analysis"
] = """You are an expert analyzer of mathematical and chemical equations. The input is a TEXT-form equation written in LaTeX or Markdown. Analyze it and return a single JSON object describing its meaning and role.
================ INSTRUCTIONS ================
1. CONTENT RECOGNITION
Read the equation carefully and identify:
- The type of expression: definition, identity, equation to solve, inequality, differential / integral equation, recurrence, chemical reaction, balance equation, etc.
- The mathematical or chemical meaning of the expression as a whole.
- The variables, constants, operators, and functions that appear, and what each likely denotes given the surrounding context.
- The application domain (e.g. classical mechanics, probability, thermodynamics, organic chemistry, machine learning loss function) inferred from context.
- Any physical, statistical, or theoretical significance.
- Whether the expression matches a well-known named formula (e.g. Bayes' theorem, Schrödinger equation, softmax, Michaelis–Menten). Name it explicitly when you are confident; do NOT guess.
2. USE OF ADDITIONAL CONTEXT
The Additional Context section provides surrounding information to help you understand the equation's role in its source document:
- Captions : the equation's caption or label ("n/a" = none)
- Footnotes : footnote attached to the equation ("n/a" = none)
- Leading Text : text appearing immediately BEFORE the equation ("n/a" = none)
- Trailing Text : text appearing immediately AFTER the equation ("n/a" = none)
Rules:
- Use context to determine variable meanings, units, and the domain of discussion.
- THE EQUATION ITSELF TAKES PRIORITY over context if they conflict; note the discrepancy if material.
- Only mention a relationship between the equation and Leading/Trailing Text if it is clearly supported. If uncertain, omit it.
- Captions, footnotes, leading text and trailing text may only be used for disambiguation purposes and must not be used to infer or fabricate content not present in EQUATION BODY.
- NEVER invent variables, terms, or interpretations that are not justified by either the equation or the context.
3. NAMING (`name`)
- Produce a concise, distinctive name (3–8 words, snake_case preferred).
- It should convey what the equation IS or DOES, not just "equation".
- Good examples:
`bayes_theorem_posterior`
`softmax_cross_entropy_loss`
`ideal_gas_law`
`michaelis_menten_rate`
`combustion_of_methane`
`quadratic_formula_roots`
- Bad examples:
`equation`, `formula`, `math`, `the_equation`, `eq_1`
4. NORMALIZED EQUATION (`equation`)
- Output the math-mode BODY ONLY. Do NOT wrap in any delimiter or environment: no `$...$`, no `$$...$$`, no `\\(...\\)`, no `\\[...\\]`, no `\\begin{{equation}}...\\end{{equation}}`.
- Strip those outer wrappers if present in the input.
- KEEP semantic inner environments such as `aligned`, `cases`, `pmatrix`, `bmatrix`, `array`, `split`; they are part of the equation's structure, not delimiters.
- If the input uses `\\begin{{align}}` or `\\begin{{align*}}`, convert to `\\begin{{aligned}}`.
- Strip equation numbering (`\\tag{{...}}`, automatic numbers from `align`/`equation`).
- Preserve all symbols, subscripts, superscripts, and operators faithfully. Do NOT simplify or rename variables.
- Convert Markdown / plain-text / Unicode math to standard LaTeX (`x^2` → `x^{{2}}`, `sqrt(a)` → `\\sqrt{{a}}`, `≤` → `\\leq`, `α` → `\\alpha`).
- For chemical equations, use `mhchem`: `\\ce{{2H2 + O2 -> 2H2O}}`.
- If multiple independent equations appear together, join them with `\\\\` inside a single `\\begin{{aligned}}...\\end{{aligned}}` and note the grouping in `description`.
5. DESCRIPTION (`description`, ≤ 300 words, natural prose, not bullets)
Cover the following where applicable:
- What the equation expresses and what problem it addresses.
- Its role in the surrounding text (e.g. defines a quantity, states a constraint, derives a result, models a phenomenon).
- The named formula it corresponds to, if any, and where it is commonly used.
- Briefly clarify only those symbols whose meaning is non-obvious or domain-specific, OR whose meaning is fixed by the Leading/Trailing Text. Do NOT enumerate every symbol mechanically.
- Use specific proper nouns (variable names, entity names) instead of pronouns.
- If the equation clearly illustrates or supports the surrounding context (leading or trailing text), briefly note that relationship at the end. Otherwise omit.
6. OUTPUT RULES
- Return ONE valid JSON object only.
- No surrounding markdown, no code fences, no preamble, no explanation.
- All string values must be properly escaped JSON strings (escape `"` as `\\"`, escape backslashes as `\\\\`, newlines as `\\n`).
- LaTeX backslashes inside the `equation` string must be double-escaped (e.g. `\\frac{{a}}{{b}}` is written as `"\\\\frac{{a}}{{b}}"` in the JSON).
- If the input uses `\\begin{{align}}` or `\\begin{{align*}}`, convert to `\\begin{{aligned}}` in the output (since the outer display wrapper is stripped).
- The output values for the JSON fields `name` and `description` must be written in `{language}`.
================ EQUATION BODY ================
```
{content}
```
================ ADDITIONAL CONTEXT ================
- Captions: {captions}
- Footnotes: {footnotes}
- Leading Text:
```
{leading}
```
- Trailing Text:
```
{trailing}
```
================ OUTPUT FORMAT ================
{{
"name": "<concise distinctive name>",
"equation": "<normalized LaTeX, math-mode body only>",
"description": "<interpretive description, ≤300 words>"
}}
Output:
"""

__all__ = [
    "IMAGE_TYPE_ENUM",
    "IMAGE_TYPE_FALLBACK",
    "MULTIMODAL_PROMPTS",
]
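
The module docstring says the caller formats every template with the same variable set and folds unknown image `type` values into the fallback. The listing does not show how `LightRAG.analyze_multimodal` actually does this, so the following is only a minimal sketch under stated assumptions: `build_prompt`, `fold_image_type`, and `DEMO_TEMPLATE` are hypothetical names, and the template is a heavily abbreviated stand-in for a real `MULTIMODAL_PROMPTS` entry.

```python
import json

# Stand-ins mirroring the constants defined in the module, repeated here
# only so the sketch runs on its own.
IMAGE_TYPE_ENUM = (
    "Photo", "Illustration", "Screenshot", "Icon", "Chart", "Table",
    "Infographic", "Flowchart", "Chat Log", "Wireframe", "Texture", "Other",
)
IMAGE_TYPE_FALLBACK = "Other"

# Hypothetical, abbreviated stand-in for one MULTIMODAL_PROMPTS entry.
# Doubled braces survive str.format as literal braces, as in the real templates.
DEMO_TEMPLATE = (
    "Write `name` and `description` in `{language}`.\n"
    "Content: {content}\n"
    "Captions: {captions}\n"
    "Footnotes: {footnotes}\n"
    "Leading: {leading}\n"
    "Trailing: {trailing}\n"
    'Output format: {{"name": "...", "description": "..."}}\n'
)


def build_prompt(template, *, language, content="", captions="", footnotes="",
                 leading="", trailing="", item_id="", file_path=""):
    """Fill a template with the shared variable set (assumed calling convention).

    Empty context fields become "n/a". str.format ignores keyword arguments
    a template does not reference, so one call shape fits every template.
    """
    return template.format(
        language=language,
        content=content,
        captions=captions or "n/a",
        footnotes=footnotes or "n/a",
        leading=leading or "n/a",
        trailing=trailing or "n/a",
        item_id=item_id,
        file_path=file_path,
    )


def fold_image_type(raw_reply):
    """Parse a JSON reply and fold an out-of-enum `type` into the fallback."""
    result = json.loads(raw_reply)
    if result.get("type") not in IMAGE_TYPE_ENUM:
        result["type"] = IMAGE_TYPE_FALLBACK
    return result


prompt = build_prompt(DEMO_TEMPLATE, language="English", content="E = mc^2")
folded = fold_image_type('{"name": "n", "type": "Diagram", "description": "d"}')
# A double-escaped backslash in the JSON text decodes to a single LaTeX backslash.
equation = json.loads(r'{"equation": "\\frac{a}{b}"}')["equation"]
```

Passing every variable on every call keeps the call site identical across modalities; the image template simply never references `content`. The last line illustrates why the equation prompt asks for double-escaped backslashes: a single `\f` inside a JSON string would decode as a form feed rather than the start of `\frac`.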