Feature: Optional image understanding / vision for inline and referenced images #74

## Feature Request

OpenKB currently treats images in documents as **text-only syntax**: the LLM sees `![alt](path/to/image.png)` but never the actual image content. This reduces knowledge base quality for technical and scientific documents, where figures, diagrams, and charts carry essential information.

## Current Behavior

### Image path rewriting (`images.py`)
- `copy_relative_images()` scans for `![alt](relative/path)` references
- Copies referenced image files into `wiki/sources/images/<doc_name>/`
- Rewrites links to `sources/images/<doc_name>/<filename>`
- Skips images that are not found on disk, as well as `http:`, `https:`, and `data:` URIs
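
For reference, the scanning and skipping behavior described above can be approximated as follows. This is a minimal sketch and not the actual `copy_relative_images()` code: the regex, the `REMOTE_PREFIXES` tuple, and the `find_local_image_refs` helper are hypothetical names used for illustration, and the on-disk existence check and the copy step are left out.

```python
import re

# Matches ![alt](target); an optional "title" after the target is tolerated.
IMAGE_REF = re.compile(r'!\[(?P<alt>[^\]]*)\]\((?P<target>[^)\s]+)(?:\s+"[^"]*")?\)')

# Targets with these prefixes are never copied or rewritten.
REMOTE_PREFIXES = ("http://", "https://", "data:")


def find_local_image_refs(markdown: str) -> list[tuple[str, str]]:
    """Return (alt, target) pairs for image references that look like local paths."""
    refs = []
    for match in IMAGE_REF.finditer(markdown):
        target = match.group("target")
        if target.lower().startswith(REMOTE_PREFIXES):
            continue  # remote and inline (data:) images are skipped
        refs.append((match.group("alt"), target))
    return refs
```

The same list of local references would be the natural input for any later vision step, since those are the only images guaranteed to exist under `wiki/sources/images/`.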

### During LLM compilation
- The markdown with image references is sent to the LLM as **plain text**
- The LLM sees `![Figure 3: SBA rendering pipeline](sources/images/doc_name/fig3.png)` but cannot see the actual image
- No image bytes are ever sent to the LLM
- No vision/multimodal capability is used

## Why This Matters

For technical and scientific documents — the kind that benefit most from a knowledge base — figures are often **irreplaceable**:

- **Architecture diagrams**: Show signal flow, system topology, protocol stacks
- **Tables rendered as images**: Contain normative reference data that isn't in the text
- **Charts and plots**: Performance benchmarks, measurement results
- **Schematics**: Circuit diagrams, filter responses, encoder block diagrams

A knowledge base that ignores all of this produces summaries and concept articles that are missing critical information. For example, a 3GPP spec document on "Immersive Audio Rendering" might have 15+ figures showing rendering pipelines, binaural processing chains, and speaker layouts — none of which would be captured.

## Proposed Solution (Optional / Configurable)

Since not all users need image understanding (and it requires a vision-capable model), this should be **opt-in**:

1. **Config flag**: `image_understanding: true` (default: `false`)
2. **Detection**: During compilation, identify `![]()` references in the markdown
3. **Vision pass**: For each referenced image file found on disk, send the image to a vision-capable LLM with a prompt like: *"Describe this figure from document {doc_name}. Include: caption, what it depicts, key information conveyed, visible text/labels, related concepts."*
4. **Injection**: Prepend the vision-generated description as a text block before the image reference in the prompt sent to the summarization LLM
5. **Wiki output**: Include the description in the generated summary/concept pages alongside the image reference

This approach is framework-agnostic — it works with any vision-capable model (GPT-4o, Claude 3.5+, Gemini, LLaVA via local Ollama, etc.) and doesn't require changes to the wiki output format.
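
To make the proposal concrete, steps 1 to 4 could look roughly like the sketch below. Everything here is an assumption for discussion: `inject_image_descriptions`, the `enabled` parameter (standing in for the `image_understanding` flag), the `[Image description: ...]` block format, and the `describe` callback are hypothetical and not part of OpenKB. The callback is where a vision-capable model would be called, which keeps the sketch independent of any provider SDK.

```python
import re
from pathlib import Path
from typing import Callable

IMAGE_REF = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')

# Prompt text taken from step 3 of the proposal.
VISION_PROMPT = (
    "Describe this figure from document {doc_name}. Include: caption, what it "
    "depicts, key information conveyed, visible text/labels, related concepts."
)


def inject_image_descriptions(
    markdown: str,
    doc_name: str,
    wiki_root: Path,
    describe: Callable[[Path, str], str],
    enabled: bool = False,
) -> str:
    """Insert a description block before each image reference found on disk."""
    if not enabled:
        return markdown  # opt-in: with the flag off, the text is unchanged

    prompt = VISION_PROMPT.format(doc_name=doc_name)

    def replace(match: re.Match) -> str:
        target = match.group(2)
        if target.startswith(("http://", "https://", "data:")):
            return match.group(0)  # remote or inline image: nothing to read
        image_path = wiki_root / target
        if not image_path.is_file():
            return match.group(0)  # missing file: leave the reference as-is
        description = describe(image_path, prompt).strip()
        return f"[Image description: {description}]\n\n{match.group(0)}"

    return IMAGE_REF.sub(replace, markdown)
```

In this shape, `describe(image_path, prompt)` is the only provider-specific piece, so swapping between a hosted model and a local one would not touch the compilation code. Caching descriptions per image would likely be worthwhile, but is not shown.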

## Alternative (Minimal)

If full vision integration is too complex, a simpler approach is to add an `image_caption_step` config option that lets the user provide pre-generated captions in a sidecar file (e.g., `doc_name.images.yaml`), which are then injected into the LLM prompt. This avoids the vision dependency entirely while still giving the LLM access to image content descriptions.
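
A rough sketch of the injection half of this alternative follows. The sidecar schema is not defined yet, so this assumes the file parses to a flat mapping from image filename to caption; `inject_sidecar_captions` and the `[Image caption: ...]` format are hypothetical. Loading the YAML file itself is omitted and would need a YAML parser.

```python
import re
from pathlib import PurePosixPath

IMAGE_REF = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def inject_sidecar_captions(markdown: str, captions: dict[str, str]) -> str:
    """Insert user-supplied captions before matching image references.

    `captions` maps an image filename (e.g. "fig3.png") to its caption.
    This layout is an assumption; the sidecar schema is still open.
    """

    def replace(match: re.Match) -> str:
        filename = PurePosixPath(match.group(2)).name
        caption = captions.get(filename)
        if not caption:
            return match.group(0)  # no caption supplied: leave the reference as-is
        return f"[Image caption: {caption.strip()}]\n\n{match.group(0)}"

    return IMAGE_REF.sub(replace, markdown)
```

Keying on the bare filename means the captions survive the path rewrite to `sources/images/<doc_name>/<filename>`, at the cost of ambiguity if two images in one document share a name.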

## Environment

- Document corpus: 3GPP ATIAS technical specifications (converted PDF → markdown with inline image references)
- Many documents contain critical figures (protocol diagrams, test setups, signal flow charts) that are essential for understanding the content
