LCORE-1037: Update the BYOK guide to use Lightspeed stack config instead of run.yaml #1838
Replace all run.yaml references with lightspeed-stack.yaml byok_rag configuration. Users should no longer edit run.yaml directly; the Lightspeed Stack service auto-generates Llama Stack config at startup from the byok_rag and rag sections in lightspeed-stack.yaml.
- Rewrite BYOK guide Step 4 to use byok_rag as primary config path
- Add field reference table for all byok_rag options
- Rewrite RAG guide vector store sections with byok_rag examples
- Replace full run.yaml config examples with lightspeed-stack.yaml format
- Add embedding_model field to BYOK example config
- Update Step 3 to reference byok_rag embedding_model field
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Walkthrough
This PR updates BYOK and RAG configuration documentation to align with the new
Changes
BYOK and RAG Configuration Documentation Update
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/byok_guide.md`:
- Around line 397-400: The blockquote note starting with the marker "[!NOTE]"
has an extra blank line inside the block which breaks markdownlint MD028; remove
the empty line so all lines in the note are contiguous '>'-prefixed lines (i.e.,
keep the "[!NOTE]" marker and the following lines like "For pgvector, ensure
your PostgreSQL credentials..." and "(e.g., `POSTGRES_PASSWORD`)." without any
blank line between them) to satisfy the linter.
In `@docs/rag_guide.md`:
- Line 227: Replace the phrase "OpenAI compatible" with the hyphenated compound
adjective "OpenAI-compatible" in the sentence that reads "While Ollama also
exposes an OpenAI compatible endpoint..." so the docs use "OpenAI-compatible"
for correct grammar and consistency; update that exact occurrence in
docs/rag_guide.md.
- Line 385: The heading "RAG annotations" lacks a preceding blank line (MD022);
update the markdown so there is one empty line immediately before the line
starting with "# RAG annotations" to satisfy the MD022 rule and ensure the
heading is properly separated from the prior content.
- Around line 91-94: The NOTE block beginning with "[!NOTE]" that mentions
pgvector and LCORE-2437 contains an extra blank quoted line; edit that block
(the "[!NOTE]" block containing the sentence "pgvector is not yet supported via
`byok_rag`..." and the following "It must be configured directly...") and remove
the empty/blank line between the quoted lines so the blockquote has no blank
quoted line (MD028).
In `@examples/lightspeed-stack-byok-okp-rag.yaml`:
- Around line 40-42: The embedding_dimension for entries using embedding_model
"sentence-transformers/all-mpnet-base-v2" in the byok_rag configuration is
incorrect (set to 1024 and 384); update both BYOK FAISS stores (ocp-docs and
knowledge-base) to use embedding_dimension 768 to match the model's hidden size
and avoid incompatible vector sizes at runtime—search for occurrences of
embedding_model: sentence-transformers/all-mpnet-base-v2 and set the
corresponding embedding_dimension fields to 768.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d90172e1-09b6-46bc-8307-b3a42a1da0b4
📒 Files selected for processing (3)
- docs/byok_guide.md
- docs/rag_guide.md
- examples/lightspeed-stack-byok-okp-rag.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: unit_tests (3.13)
- GitHub Check: Pylinter
- GitHub Check: build-pr
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
examples/lightspeed-stack-byok-okp-rag.yaml
🪛 LanguageTool
docs/rag_guide.md
[grammar] ~227-~227: Use a hyphen to join words.
Context: ...G. While Ollama also exposes an OpenAI compatible endpoint that supports tool c...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.22.1)
docs/byok_guide.md
[warning] 400-400: Blank line inside blockquote
(MD028, no-blanks-blockquote)
docs/rag_guide.md
[warning] 94-94: Blank line inside blockquote
(MD028, no-blanks-blockquote)
[warning] 385-385: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above
(MD022, blanks-around-headings)
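The two markdownlint findings above can be approximated locally with a short check. The following is a minimal sketch, not markdownlint's actual implementation: the function names are hypothetical, and the rules are simplified to "a blank line directly between two `>` lines" (MD028) and "an ATX heading whose previous line is not blank" (MD022, above-heading case only).

```python
import re

def find_blockquote_gaps(lines):
    """Return 1-based numbers of blank lines directly between two '>' lines (simplified MD028)."""
    hits = []
    for i in range(1, len(lines) - 1):
        is_blank = lines[i].strip() == ""
        quoted_before = lines[i - 1].lstrip().startswith(">")
        quoted_after = lines[i + 1].lstrip().startswith(">")
        if is_blank and quoted_before and quoted_after:
            hits.append(i + 1)
    return hits

def find_headings_missing_blank_above(lines):
    """Return 1-based numbers of ATX headings not preceded by a blank line (simplified MD022)."""
    hits = []
    in_fence = False
    for i, line in enumerate(lines):
        # Skip fenced code blocks so '#' comments in code are not treated as headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        if re.match(r"#{1,6}\s", line) and i > 0 and lines[i - 1].strip() != "":
            hits.append(i + 1)
    return hits

# Illustrative input modelled on the findings; not the real file contents.
sample = [
    "> [!NOTE]",
    "> For pgvector, ensure your PostgreSQL credentials are set",
    "",
    "> (e.g., `POSTGRES_PASSWORD`).",
    "Some closing paragraph.",
    "# RAG annotations",
]
print(find_blockquote_gaps(sample))
print(find_headings_missing_blank_above(sample))
```

Running the real linter remains the authoritative check; this sketch only illustrates what the two rules look for.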
The `remote::ollama` provider can be used for inference. However, it does not support tool calling, including RAG.
-While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot be used with `llama-stack` due to current limitations in the `remote::openai` provider.
+While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
Use hyphenated compound adjective (“OpenAI-compatible”).
This improves grammar and consistency in user-facing docs.
Suggested fix
-While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
+While Ollama also exposes an OpenAI-compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/rag_guide.md` at line 227, Replace the phrase "OpenAI compatible" with
the hyphenated compound adjective "OpenAI-compatible" in the sentence that reads
"While Ollama also exposes an OpenAI compatible endpoint..." so the docs use
"OpenAI-compatible" for correct grammar and consistency; update that exact
occurrence in docs/rag_guide.md.
 embedding_model: sentence-transformers/all-mpnet-base-v2
 embedding_dimension: 1024
-vector_db_id: vs_123 # Llama-stack vector_store_id
+vector_db_id: vs_123 # Vector store ID (from index generation)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Confirm declared dimensions in repo examples/docs
rg -n "embedding_model:\s*sentence-transformers/all-mpnet-base-v2|embedding_dimension:" -C1
# 2) Check model config dimension from Hugging Face
curl -fsSL https://huggingface.co/sentence-transformers/all-mpnet-base-v2/raw/main/config.json | jq '.hidden_size'
Repository: lightspeed-core/lightspeed-stack
Length of output: 11601
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show exact YAML block for the affected entries
sed -n '30,60p' examples/lightspeed-stack-byok-okp-rag.yaml | nl -ba
Repository: lightspeed-core/lightspeed-stack
Length of output: 119
🏁 Script executed:
#!/bin/bash
set -euo pipefail
awk 'NR>=30 && NR<=60 {printf "%5d\t%s\n", NR, $0}' examples/lightspeed-stack-byok-okp-rag.yaml
Repository: lightspeed-core/lightspeed-stack
Length of output: 1413
Fix embedding_dimension mismatch for sentence-transformers/all-mpnet-base-v2 in byok_rag.
In examples/lightspeed-stack-byok-okp-rag.yaml, both BYOK FAISS stores use sentence-transformers/all-mpnet-base-v2 but set incompatible dimensions (1024 for ocp-docs and 384 for knowledge-base). The model’s hidden size is 768, and the repo/docs use embedding_dimension: 768 for this model—otherwise stored/query vectors can be incompatible at runtime.
embedding_model: sentence-transformers/all-mpnet-base-v2
embedding_dimension: 1024
Suggested fix
 byok_rag:
   - rag_id: ocp-docs
     rag_type: inline::faiss
     embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 1024
+    embedding_dimension: 768
     vector_db_id: vs_123
     db_path: /tmp/ocp.faiss
@@
   - rag_id: knowledge-base
     rag_type: inline::faiss
     embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 384
+    embedding_dimension: 768
     vector_db_id: vs_456
     db_path: /tmp/kb.faiss
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/lightspeed-stack-byok-okp-rag.yaml` around lines 40 - 42, The
embedding_dimension for entries using embedding_model
"sentence-transformers/all-mpnet-base-v2" in the byok_rag configuration is
incorrect (set to 1024 and 384); update both BYOK FAISS stores (ocp-docs and
knowledge-base) to use embedding_dimension 768 to match the model's hidden size
and avoid incompatible vector sizes at runtime—search for occurrences of
embedding_model: sentence-transformers/all-mpnet-base-v2 and set the
corresponding embedding_dimension fields to 768.
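A check like the one requested here could be automated so mismatches are caught before runtime. The following is a minimal sketch under stated assumptions: `KNOWN_DIMENSIONS` and `find_dimension_mismatches` are hypothetical names, not part of lightspeed-stack, and the entries are plain dicts mirroring the `byok_rag` fields quoted in this review rather than the project's actual config model.

```python
# Hypothetical lookup table; 768 is the hidden size reported for this model in the review above.
KNOWN_DIMENSIONS = {
    "sentence-transformers/all-mpnet-base-v2": 768,
}

def find_dimension_mismatches(byok_rag, known=KNOWN_DIMENSIONS):
    """Return (rag_id, declared, expected) for entries whose dimension disagrees with the table."""
    problems = []
    for entry in byok_rag:
        expected = known.get(entry.get("embedding_model"))
        declared = entry.get("embedding_dimension")
        # Models missing from the table are skipped rather than guessed at.
        if expected is not None and declared != expected:
            problems.append((entry.get("rag_id"), declared, expected))
    return problems

# The two entries as they appear before the suggested fix.
entries = [
    {
        "rag_id": "ocp-docs",
        "rag_type": "inline::faiss",
        "embedding_model": "sentence-transformers/all-mpnet-base-v2",
        "embedding_dimension": 1024,
        "vector_db_id": "vs_123",
        "db_path": "/tmp/ocp.faiss",
    },
    {
        "rag_id": "knowledge-base",
        "rag_type": "inline::faiss",
        "embedding_model": "sentence-transformers/all-mpnet-base-v2",
        "embedding_dimension": 384,
        "vector_db_id": "vs_456",
        "db_path": "/tmp/kb.faiss",
    },
]

for rag_id, declared, expected in find_dimension_mismatches(entries):
    print(f"{rag_id}: embedding_dimension {declared} should be {expected}")
```

A static table keeps the check offline and deterministic; looking up the model's `config.json`, as the analysis script above does, would avoid maintaining the table at the cost of a network call.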
@are-ces, sure! Planning the fix to correct both embedding_dimension values to 768 (the actual output dimension of sentence-transformers/all-mpnet-base-v2).
- Update `embedding_dimension` from `1024` → `768` for the `ocp-docs` entry
- Update `embedding_dimension` from `384` → `768` for the `knowledge-base` entry
🧠 Learnings used
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Replace remaining Llama Stack mentions with generic terms — users should not need to know about the underlying Llama Stack layer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
105e002 to 82ddcf0
```bash
mkdir ./embeddings_model
pdm run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
```
> [!NOTE]
> Your LLM inference provider (e.g., OpenAI, vLLM) must also be configured in your `run.yaml`.
> For OpenAI, set the `OPENAI_API_KEY` environment variable.
This doesn't seem relevant to BYOK...
> pgvector is not yet supported via `byok_rag` in `lightspeed-stack.yaml` (see [LCORE-2437](https://redhat.atlassian.net/browse/LCORE-2437)).
> It must be configured directly in the Llama Stack configuration file.
> You will need to install PostgreSQL with a matching version to pgvector, then log in with `psql` and enable the extension with:
matching version of pgvector?
While Ollama also exposes an OpenAI compatible endpoint that supports tool calling, it cannot currently be used due to limitations in the `remote::openai` provider.
There is an [ongoing discussion](https://github.com/meta-llama/llama-stack/discussions/3034) about enabling tool calling with Ollama.
Tool calling with Ollama is not yet supported.
By now it's pretty clear that Ollama doesn't support tool calling :)
---
# References
Description
Update the BYOK and RAG guides so users configure BYOK knowledge sources via the `byok_rag` section in `lightspeed-stack.yaml` instead of editing `run.yaml` directly. The required configuration is now auto-generated at startup by `make run`, `make run-stack`, `docker-compose`, and library mode.
Key changes:
- … `run.yaml` BYOK/RAG snippets with `byok_rag` entries in `lightspeed-stack.yaml`
- … `byok_rag` options
- … `byok_rag` (see LCORE-2437) and must be configured directly in the Llama Stack config
- … `run.yaml` … `embedding_model` field
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
- … `docs/byok_guide.md` and `docs/rag_guide.md`
- … `byok_rag` field names match `src/models/config.py:ByokRag`
- … `run.yaml` references for BYOK configuration
Summary by CodeRabbit