From cba7592b8c7d17c70343a39f4fb82c37766ef03d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=89ole=20PASKALI?= <eole.p7@gmail.com>
Date: Sat, 20 Jun 2026 11:21:52 +0200
Subject: [PATCH] docs: document RAG file context scope

---
 docs/features/chat-conversations/rag/index.md     | 15 +++++++++++++++
 .../extensibility/plugin/functions/filter.mdx     |  2 +-
 docs/reference/env-configuration.mdx              | 12 +++++++++++-
 docs/troubleshooting/rag.mdx                      |  2 ++
 4 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/docs/features/chat-conversations/rag/index.md b/docs/features/chat-conversations/rag/index.md
index cbc9ebd92..e56435649 100644
--- a/docs/features/chat-conversations/rag/index.md
+++ b/docs/features/chat-conversations/rag/index.md
@@ -43,6 +43,21 @@ Web pages often contain extraneous information such as navigation and footer. Fo
 
 Customize the RAG template from the `Admin Panel` > `Settings` > `Documents` menu.
 
+## File Context Scope
+
+From `Admin Panel` > `Settings` > `Documents` > `File Context Scope`, admins can choose how context from attached non-image files is added to the prompt.
+
+| Scope | Behavior | Use when |
+|---|---|---|
+| **Conversation** (default) | Collects attached file context at the conversation level and injects it into the latest user message. | You want each new prompt to include the currently attached file context. |
+| **Message** | Scopes attached file context to the user message that originally included the file. | Users compare several files over multiple turns, or refer back to "this document", "the previous document", or "the first document". |
+
+Message scope can also improve prompt-cache reuse, because each file's context stays with the message that introduced it instead of moving into the latest user message on every turn.
+
+:::note
+`RAG_SYSTEM_CONTEXT=True` takes priority over File Context Scope. When system-context mode is enabled, RAG context is injected into the system message and the File Context Scope selector is hidden.
+:::
+
 ## Markdown Header Splitting
 
 When enabled, documents are first split by markdown headers (H1-H6). This preserves document structure and ensures that sections under the same header are kept together when possible. The resulting chunks are then further processed by the standard character or token splitter.
diff --git a/docs/features/extensibility/plugin/functions/filter.mdx b/docs/features/extensibility/plugin/functions/filter.mdx
index 270da77e1..892305d5e 100644
--- a/docs/features/extensibility/plugin/functions/filter.mdx
+++ b/docs/features/extensibility/plugin/functions/filter.mdx
@@ -146,7 +146,7 @@ The chip being present = the filter is enabled for the next request. The chip be
 
 ### Owning Retrieval With file_handler
 
-By default, when a user attaches a knowledge collection or uploads a file to a chat, Open WebUI runs the built-in RAG pipeline **after** every inlet filter has returned. The chat-completion handler queries the vector DB for chunks relevant to the user's last message, wraps them in `<source>` tags, appends them to the last user message (or to a system message, depending on `RAG_SYSTEM_CONTEXT`), and only then calls the LLM.
+By default, when a user attaches a knowledge collection or uploads a file to a chat, Open WebUI runs the built-in RAG pipeline **after** every inlet filter has returned. The chat-completion handler queries the vector DB for chunks relevant to the user's message, wraps them in `<source>` tags, and appends them to one of three places: the latest user message (the default), the file-owning user message when `RAG_FILE_CONTEXT_SCOPE=message`, or a system message when `RAG_SYSTEM_CONTEXT` is enabled. Only then does it call the LLM.
 
 This is important to understand for filter authors: at `inlet()` time, `body["metadata"]["files"]` and `body["files"]` contain only the file/collection *references* (IDs, names, types). **The chunk text doesn't exist yet** — retrieval hasn't happened. So if you want to inspect or transform the chunks themselves (PII / PHI redaction, reranking, custom hybrid scoring, translation, chunk-level access control, anonymization), the standard inlet contract is not enough — the data you want isn't there yet.
 
diff --git a/docs/reference/env-configuration.mdx b/docs/reference/env-configuration.mdx
index fbaa98f01..25fe1845a 100644
--- a/docs/reference/env-configuration.mdx
+++ b/docs/reference/env-configuration.mdx
@@ -3768,11 +3768,21 @@ Strictly return in JSON format:
 - Description: Specifies whether to use the full context for RAG.
 - Persistence: This environment variable is a `ConfigVar` variable.
 
+#### `RAG_FILE_CONTEXT_SCOPE`
+
+- Type: `str`
+- Default: `conversation`
+- Accepted values:
+  - `conversation` - Attached file context is collected at the conversation level and injected into the latest user message.
+  - `message` - Attached file context is scoped to the user message that owns the file attachment.
+- Description: Controls where context from attached non-image files is injected when RAG runs. Use `message` for chats that compare or reference multiple attached files over several turns, so each file's context stays with the message where it was uploaded. If `RAG_SYSTEM_CONTEXT=True`, this setting is ignored and RAG context is injected into the system message instead.
+- Persistence: This environment variable is a `ConfigVar` variable.
+
 #### `RAG_SYSTEM_CONTEXT`
 
 - Type: `bool`
 - Default: `False`
-- Description: When enabled, injects RAG context into the **system message** instead of the user message. For models that support **KV prefix caching** or **Prompt Caching** (local engines like Ollama, llama.cpp or vLLM, and cloud providers like OpenAI and Vertex AI), this keeps the context at a stable position at the start of the conversation, so the cache can persist across turns. When disabled (default), context is injected into the user message, which shifts position each turn and invalidates the cache.
+- Description: When enabled, injects RAG context into the **system message** instead of the user message. For models that support **KV prefix caching** or **Prompt Caching** (local engines like Ollama, llama.cpp or vLLM, and cloud providers like OpenAI and Vertex AI), this keeps the context at a stable position at the start of the conversation, so the cache can persist across turns. When disabled (default), context is injected into user messages according to `RAG_FILE_CONTEXT_SCOPE`.
 
 #### `ENABLE_RAG_LOCAL_WEB_FETCH`
 
diff --git a/docs/troubleshooting/rag.mdx b/docs/troubleshooting/rag.mdx
index 83668dcbd..91453b4c4 100644
--- a/docs/troubleshooting/rag.mdx
+++ b/docs/troubleshooting/rag.mdx
@@ -236,6 +236,8 @@ If your initial response is fast but follow-up questions become increasingly slo
 - This injects the RAG context into the **system message**, which stays at a fixed position at the start of the conversation.
 - This allows providers to effectively use **KV prefix caching** or **Prompt Caching**, resulting in nearly instant follow-up responses even with large documents.
 
+If you want attached file context to stay with the message that introduced each file, set **File Context Scope** to **Message** in **Admin Panel > Settings > Documents**, or set `RAG_FILE_CONTEXT_SCOPE=message`. This can help in multi-file chats where users refer back to "the first document" or "the previous document". `RAG_SYSTEM_CONTEXT=True` takes priority over this setting.
+
 ---
 
 | Problem | Fix |