Problem
For long documents that route through PageIndex (doc_type: pageindex), images are correctly extracted to wiki/sources/images/<doc_name>/ and referenced with correct wiki-relative paths inside wiki/sources/<doc_name>.json (each page object has "images": [{"path": "sources/images/<doc>/pX_imgY.png"}], and the paths are also inlined as ![]() image references in each page's content).
However, tree_renderer.py's render_summary_md() — which builds wiki/summaries/<doc_name>.md, the actual page a user opens in Obsidian — never reads or embeds any of this. Its per-node renderer (_render_nodes_summary) explicitly strips ![]() syntax found in node["text"] (necessary, since PageIndex's own embedded refs point into a private .openkb/files/{doc_id}/images/... cache that doesn't resolve from the wiki), but never re-inserts the correctly-pathed images that live in the page JSON.
Net effect: images are on disk, and technically "referenced" in a JSON data file, but invisible everywhere a human actually browses the vault — not in the summary, not in any concept/entity page, not in index.md. wiki/sources/<doc_name>.json isn't rendered as a wiki page by Obsidian (or anything else), so those references are effectively inert.
Reproduction
- Run openkb add on a PDF long enough to trigger PageIndex (pageindex_threshold, default 20 pages), with no PAGEINDEX_API_KEY set, so that it falls back to local pymupdf extraction, which does extract images (see images.py:convert_pdf_to_pages).
- Open the resulting wiki/summaries/<doc>.md in Obsidian, or run grep '!\[' wiki/summaries/*.md wiki/concepts/*.md wiki/entities/*.md wiki/index.md.
- Result: zero image references anywhere, despite wiki/sources/images/<doc>/ containing real extracted files and wiki/sources/<doc>.json referencing them correctly.
Confirmed on openkb 0.4.2 with a 31-page manual (35 images extracted across 21 pages, 0 surfaced in the summary).
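The gap can also be checked programmatically. The following is a minimal sketch, not part of openkb: image_coverage is a hypothetical helper that takes the already-loaded list of page objects (each with the "images": [{"path": ...}] shape described above) plus the summary Markdown, and assumes the summary would embed images using exactly the same path strings as the JSON.

```python
import re

# Matches Markdown image embeds such as ![alt](path) and captures the path.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def image_coverage(pages, summary_md):
    """Return (extracted, surfaced).

    extracted: number of distinct image paths listed in the page objects.
    surfaced:  how many of those paths the summary Markdown embeds.
    Hypothetical helper; assumes identical path strings on both sides.
    """
    extracted = {
        img["path"]
        for page in pages
        for img in page.get("images", [])
    }
    embedded = set(IMAGE_REF.findall(summary_md))
    return len(extracted), len(extracted & embedded)
```

How the page list is obtained from wiki/sources/<doc>.json depends on that file's top-level layout, which is not specified here, so loading is left to the caller.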
Relation to existing issues
Suggested fix
_write_long_doc_artifacts in indexer.py already has the per-page pages list (with images) in scope when it calls render_summary_md — it's just not passed through. render_summary_md/_render_nodes_summary could accept that list, build a page_num -> [image paths] map, and embed each node's page-range images inline (tracking already-emitted paths the same way duplicate summaries are already collapsed, so a page split across many sibling nodes doesn't repeat the same figure at every one of them).
Happy to share a working patch/diff if useful — implemented and verified this locally against a real ingest (35/35 images now appear in the rendered summary, none duplicated across sibling nodes on the same page).
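For illustration only, the approach described above could look roughly like the sketch below. It is not the patch mentioned here, and it is not openkb's actual code: the function names, the "page" key on page objects, and the "title", "start_index", "end_index" and "nodes" keys on tree nodes are all assumptions made for the example.

```python
import re

# Any Markdown image embed; used to strip PageIndex's private-cache refs.
IMAGE_SYNTAX = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def build_page_image_map(pages):
    """Map page number -> list of wiki-relative image paths.

    Assumes each page object carries its number under "page" (hypothetical key).
    """
    page_images = {}
    for page in pages:
        paths = [img["path"] for img in page.get("images", [])]
        if paths:
            page_images[page["page"]] = paths
    return page_images


def _emit_images(page_nums, page_images, emitted):
    """Return embeds for not-yet-emitted images on the given pages."""
    lines = []
    for page_num in page_nums:
        for path in page_images.get(page_num, []):
            if path not in emitted:
                emitted.add(path)
                lines.append(f"![]({path})")
    return lines


def render_nodes_with_images(nodes, page_images, emitted=None, depth=2):
    """Render nodes as Markdown lines, embedding each image exactly once."""
    if emitted is None:
        emitted = set()
    lines = []
    for node in nodes:
        start, end = node["start_index"], node["end_index"]
        children = node.get("nodes", [])
        lines.append(f"{'#' * depth} {node.get('title', '')}")
        # Drop embedded refs that point into the private cache.
        text = IMAGE_SYNTAX.sub("", node.get("text", "")).strip()
        if text:
            lines.append(text)
        # A parent only claims pages before its first child; a leaf claims all.
        first_child = min((c["start_index"] for c in children), default=end + 1)
        lines.extend(_emit_images(range(start, first_child), page_images, emitted))
        lines.extend(render_nodes_with_images(children, page_images, emitted, depth + 1))
        # Pick up anything in the node's range that no child covered.
        lines.extend(_emit_images(range(start, end + 1), page_images, emitted))
    return lines
```

The shared emitted set is what keeps a page split across several sibling nodes from repeating the same figure: the first node that covers the page embeds it, and later nodes skip it.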