No chunking for markdown files: entire document sent as single LLM message, fails on context overflow #73

## Problem

OpenKB sends the **entire source document as one LLM message** when compiling markdown files. There is no splitting, truncation, or chunking strategy for `.md` sources.

In `agent/compiler.py`, `compile_short_doc()` injects the full document text into a single prompt:

```python
content = source_path.read_text(encoding="utf-8")
doc_msg = {"role": "user", "content": _cached_text(_SUMMARY_USER.format(
    doc_name=doc_name, content=content,
))}
```

The long-document path (`index_long_document()` via `PageIndex`) only triggers for **PDFs** with page count ≥ `pageindex_threshold` (default 20). Markdown files have no such fallback — they always take the short-doc code path regardless of size.

## Impact: Unusable with local/private LLMs

The README and defaults suggest models like `gpt-5.4-mini`, implying cloud-scale models with 1M+ token context windows. But for **private knowledge bases** — the stated use case — users often need to run **local models** with context windows of 4K–32K tokens.

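With a 32K-context model, any markdown file exceeding ~24K tokens (roughly 96K characters at ~4 characters per token, or ~30 pages) leaves too little room for the prompt template and the model's output, and will: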
- Fail outright with a context-length-exceeded API error, or
- Produce truncated/garbled output when the LLM silently drops the tail of the prompt
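The arithmetic behind these numbers can be sketched as a pre-flight check. This is a rough illustration, not OpenKB code: `estimate_tokens`, `fits_context`, the 4-characters-per-token heuristic, and the 25% reserve are all assumptions, and real tokenizers vary by model and language.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return int(len(text) / chars_per_token)


def fits_context(text: str, context_window: int, reserve_ratio: float = 0.25) -> bool:
    """Return True if the text fits in the share of the window left for input.

    reserve_ratio is the assumed fraction of the window held back for the
    prompt template and the model's output.
    """
    budget = int(context_window * (1 - reserve_ratio))
    return estimate_tokens(text) <= budget


if __name__ == "__main__":
    doc = "a" * 100_000  # stand-in for a ~100K-character markdown file
    print(estimate_tokens(doc), fits_context(doc, context_window=32_768))
```

A check like this would let the compiler refuse or reroute an oversized document before the API call instead of failing afterwards.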

For reference, our corpus of 3GPP technical specification documents includes converted markdown files ranging from a few KB to **14 MB**. Three of these documents could never be ingested because they exceed any reasonable context window. Even mid-sized documents (~50 pages, roughly 40K tokens by the estimate above) will typically overflow 8K–16K context models.

## What a fix could look like

A chunking strategy for markdown (and other text-based formats) similar to what other wiki frameworks implement:

1. **Heading-aware splitting**: Split on `#`/`##`/`###` boundaries so chunks respect document structure
2. **Token-aware sizing**: Estimate token count per chunk and split when exceeding a configurable threshold (e.g., 75% of model context)
3. **Hierarchical synthesis**: Summarize each chunk individually, then synthesize chunk summaries into a final document summary
4. **Graceful degradation**: At minimum, truncate with a `[...truncated at N tokens...]` marker instead of sending an oversized prompt that will fail
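Steps 1 to 3 could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a proposed patch: all function names are hypothetical, token counts are approximated as characters divided by 4, and `summarize` stands in for whatever LLM call OpenKB would make. The heading regex is naive and will also match `# ` lines inside fenced code blocks.

```python
import re
from typing import Callable

# Matches the start of a line beginning with #, ## or ### followed by a space.
HEADING_RE = re.compile(r"^#{1,3} ", re.MULTILINE)


def split_on_headings(text: str) -> list[str]:
    """Split markdown into sections, each starting at a level 1-3 heading."""
    starts = [m.start() for m in HEADING_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if text[a:b].strip()]


def chunk_markdown(text: str, max_tokens: int, chars_per_token: float = 4.0) -> list[str]:
    """Pack heading-delimited sections into chunks under an estimated token budget."""
    max_chars = int(max_tokens * chars_per_token)
    if max_chars <= 0:
        raise ValueError("max_tokens must be positive")
    chunks: list[str] = []
    current = ""
    for section in split_on_headings(text):
        # A section larger than the budget is hard-split by character count.
        for i in range(0, len(section), max_chars):
            piece = section[i:i + max_chars]
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


def summarize_hierarchically(chunks: list[str], summarize: Callable[[str], str]) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    if not chunks:
        return ""
    partials = [summarize(chunk) for chunk in chunks]
    if len(partials) == 1:
        return partials[0]
    return summarize("\n\n".join(partials))
```

For very large inputs the joined partial summaries may themselves exceed the budget, so a real implementation would likely need to repeat the final step recursively.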

The existing `pageindex_threshold` config key could be extended to apply to markdown files (e.g., character or token count threshold), or a new config key could be introduced.
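One way the routing could work with a separate key is sketched here. Only `pageindex_threshold` and its default of 20 come from the description above; `CompileConfig`, `markdown_token_threshold`, `choose_path`, and the path labels are hypothetical and do not reflect OpenKB's actual config structure.

```python
from dataclasses import dataclass


@dataclass
class CompileConfig:
    pageindex_threshold: int = 20            # existing key: PDF page count
    markdown_token_threshold: int = 24_000   # hypothetical new key: estimated tokens


def choose_path(suffix: str, page_count: int, token_estimate: int, cfg: CompileConfig) -> str:
    """Pick a compile path label for a source file (labels are illustrative)."""
    if suffix == ".pdf" and page_count >= cfg.pageindex_threshold:
        return "long"      # existing PageIndex path for long PDFs
    if suffix == ".md" and token_estimate > cfg.markdown_token_threshold:
        return "chunked"   # proposed chunked path for oversized markdown
    return "short"         # existing single-message path
```

Keeping the markdown threshold separate avoids overloading a page-count key with a token-count meaning, at the cost of one more setting to document.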

## Environment

- OpenKB version: latest (pip)
- Model: `deepseek-v4-flash` via Ollama cloud (128K context, but reasoning tokens consume significant budget)
- Document corpus: 475 markdown files converted from 3GPP ATIAS specifications
- 3 documents too large for any practical context window (3–14 MB)
