Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions docs/features/chat-conversations/rag/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,21 @@ Web pages often contain extraneous information such as navigation and footer. Fo

Customize the RAG template from the `Admin Panel` > `Settings` > `Documents` menu.

## File Context Scope

Admins can choose how attached non-image files are added to the prompt from `Admin Panel` > `Settings` > `Documents` > `File Context Scope`.

| Scope | Behavior | Use when |
|---|---|---|
| **Conversation** (default) | Collects attached file context at the conversation level and injects it into the latest user message. | You want each new prompt to include the currently attached file context. |
| **Message** | Scopes attached file context to the user message that originally included the file. | Users compare several files over multiple turns, or refer back to "this document", "the previous document", or "the first document". |

Message scope can also improve prompt-cache reuse because older attached document context stays in a more stable position instead of moving into the latest prompt on every turn.

:::note
`RAG_SYSTEM_CONTEXT=True` takes priority over File Context Scope. When system-context mode is enabled, RAG context is injected into the system message and the File Context Scope selector is hidden.
:::

## Markdown Header Splitting

When enabled, documents are first split by markdown headers (H1-H6). This preserves document structure and ensures that sections under the same header are kept together when possible. The resulting chunks are then further processed by the standard character or token splitter.
Expand Down
2 changes: 1 addition & 1 deletion docs/features/extensibility/plugin/functions/filter.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ The chip being present = the filter is enabled for the next request. The chip be

### Owning Retrieval With file_handler

By default, when a user attaches a knowledge collection or uploads a file to a chat, Open WebUI runs the built-in RAG pipeline **after** every inlet filter has returned. The chat-completion handler queries the vector DB for chunks relevant to the user's last message, wraps them in `<source>` tags, appends them to the last user message (or to a system message, depending on `RAG_SYSTEM_CONTEXT`), and only then calls the LLM.
By default, when a user attaches a knowledge collection or uploads a file to a chat, Open WebUI runs the built-in RAG pipeline **after** every inlet filter has returned. The chat-completion handler queries the vector DB for chunks relevant to the user's message, wraps them in `<source>` tags, appends them to the latest user message, to the file-owning user message when `RAG_FILE_CONTEXT_SCOPE=message`, or to a system message when `RAG_SYSTEM_CONTEXT` is enabled, and only then calls the LLM.
Comment thread
Eole7 marked this conversation as resolved.

This is important to understand for filter authors: at `inlet()` time, `body["metadata"]["files"]` and `body["files"]` contain only the file/collection *references* (IDs, names, types). **The chunk text doesn't exist yet** — retrieval hasn't happened. So if you want to inspect or transform the chunks themselves (PII / PHI redaction, reranking, custom hybrid scoring, translation, chunk-level access control, anonymization), the standard inlet contract is not enough — the data you want isn't there yet.

Expand Down
12 changes: 11 additions & 1 deletion docs/reference/env-configuration.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -3768,11 +3768,21 @@ Strictly return in JSON format:
- Description: Specifies whether to use the full context for RAG.
- Persistence: This environment variable is a `ConfigVar` variable.

#### `RAG_FILE_CONTEXT_SCOPE`

- Type: `str`
- Default: `conversation`
- Accepted values:
- `conversation` - Attached file context is collected at the conversation level and injected into the latest user message.
- `message` - Attached file context is scoped to the user message that owns the file attachment.
- Description: Controls where non-image attached file context is injected when RAG runs. Use `message` for chats that compare or reference multiple attached files over several turns, so each file's context stays with the message where it was uploaded. If `RAG_SYSTEM_CONTEXT=True`, this setting is ignored and RAG context is injected into the system message instead.
- Persistence: This environment variable is a `ConfigVar` variable.

#### `RAG_SYSTEM_CONTEXT`

- Type: `bool`
- Default: `False`
- Description: When enabled, injects RAG context into the **system message** instead of the user message. For models that support **KV prefix caching** or **Prompt Caching** (local engines like Ollama, llama.cpp or vLLM, and cloud providers like OpenAI and Vertex AI), this keeps the context at a stable position at the start of the conversation, so the cache can persist across turns. When disabled (default), context is injected into the user message, which shifts position each turn and invalidates the cache.
- Description: When enabled, injects RAG context into the **system message** instead of the user message. For models that support **KV prefix caching** or **Prompt Caching** (local engines like Ollama, llama.cpp or vLLM, and cloud providers like OpenAI and Vertex AI), this keeps the context at a stable position at the start of the conversation, so the cache can persist across turns. When disabled (default), context is injected into user messages according to `RAG_FILE_CONTEXT_SCOPE`.
Comment thread
Eole7 marked this conversation as resolved.

#### `ENABLE_RAG_LOCAL_WEB_FETCH`

Expand Down
2 changes: 2 additions & 0 deletions docs/troubleshooting/rag.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,8 @@ If your initial response is fast but follow-up questions become increasingly slo
- This injects the RAG context into the **system message**, which stays at a fixed position at the start of the conversation.
- This allows providers to effectively use **KV prefix caching** or **Prompt Caching**, resulting in nearly instant follow-up responses even with large documents.

If you want attached file context to stay with the message that introduced each file, set **File Context Scope** to **Message** in **Admin Settings > Documents**, or set `RAG_FILE_CONTEXT_SCOPE=message`. This can help multi-file chats where users refer back to "the first document" or "the previous document". `RAG_SYSTEM_CONTEXT=True` takes priority over this setting.
Comment thread
Eole7 marked this conversation as resolved.

---

| Problem | Fix |
Expand Down