engine: rename pageindex→treewalk + auto strategy default (HAL-106)#35
Conversation
Strip the borrowed PageIndex name — treewalk is now the native identifier for Vectorless's tree-navigation strategy across the strategy id, Go types, config keys, the /v1/answer/treewalk endpoint, and all tests. Pure rename; go build, go vet, and package tests all green. Remaining HAL-106 sub-tasks: promote treewalk to default strategy, dedup the agentic overlap, and fold the dedicated endpoint into /v1/answer.
Smart default selection: route small documents (total leaf content within the call budget) to single-pass for one fast cheap call, and larger or more complex documents to treewalk for agentic navigation. New AutoStrategy in pkg/retrieval with unit tests; wired into config (default value + validation) and both binaries' strategy builders + the selectable-strategy set. go build, go vet, and package tests all green.
There was a problem hiding this comment.
Sorry @hallelx2, your pull request is larger than the review limit of 150000 diff characters
📝 WalkthroughWalkthroughReplaces PageIndex with TreeWalk across config, OpenAPI, server/router, handlers, and retrieval strategy; adds AutoStrategy; rewrites endpoint tests to /v1/answer/treewalk; applies many best‑effort Close() changes and serializes pdftable calls. ChangesTreeWalk migration and Auto strategy
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Pull request overview
This PR completes the engine-wide rename from the borrowed “PageIndex” terminology to “TreeWalk”, updates the dedicated answer endpoint to /v1/answer/treewalk, and introduces an auto retrieval strategy as the new default to route small documents to single-pass and larger documents to tree-walking.
Changes:
- Renames PageIndex → TreeWalk across retrieval strategy types, identifiers, config blocks, handlers, and tests.
- Adds
AutoStrategyand switches the defaultretrieval.strategytoauto. - Rewires both binaries and routers to mount
/v1/answer/treewalkand to exposetreewalk/autoas selectable strategies.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/tree/tree.go | Updates TOCNode docs to reference TreeWalk terminology. |
| pkg/retrieval/treewalk_strategy.go | Renames PageIndexStrategy/Event/Action → TreeWalkStrategy/Event/Action and updates identifiers/prompts/log strings. |
| pkg/retrieval/treewalk_strategy_test.go | Renames and updates tests to exercise TreeWalkStrategy. |
| pkg/retrieval/strategy.go | Updates Result.PagesRead comment to cite treewalk as the page-based example. |
| pkg/retrieval/auto.go | Introduces AutoStrategy routing between “Small” and “Large” strategies. |
| pkg/retrieval/auto_test.go | Adds unit tests for AutoStrategy routing and naming. |
| pkg/ingest/toc_builder.go | Updates TOC builder docs/comments to reference TreeWalk pipeline naming. |
| pkg/ingest/minimal_mode_test.go | Updates minimal-mode test commentary to reference TreeWalk coverage. |
| pkg/ingest/ingest.go | Updates ingest docs/comments referencing the page-based strategy endpoint and TOC stage naming. |
| pkg/config/config.go | Renames retrieval.pageindex → retrieval.treewalk, switches default retrieval.strategy to auto, updates env overrides and validation. |
| pkg/config/config_test.go | Updates defaults/validation/env override tests for TreeWalk + auto default. |
| internal/handler/router.go | Renames wiring from /answer/pageindex to /answer/treewalk and updates Deps fields accordingly. |
| internal/handler/router_parity_test.go | Updates deployed router parity assertions for /v1/answer/treewalk. |
| internal/handler/query.go | Updates strategy override docs/comments and keeps override resolution by map lookup. |
| internal/handler/query_strategy_test.go | Updates override acceptance test from pageindex → treewalk. |
| internal/handler/answer_treewalk.go | Renames the multi-tenant deployed handler to AnswerTreeWalkHandler and rewires event tracing/citations. |
| internal/config/config.go | Forwards TreeWalk env overrides into embedded engine config. |
| internal/config/config_treewalk_test.go | Adds tests for TreeWalk max-citations env pass-through behavior. |
| internal/config/config_pageindex_test.go | Removes PageIndex env forwarding tests (superseded by TreeWalk equivalents). |
| internal/api/treewalk.go | Renames standalone engine handler and supporting helpers to TreeWalk equivalents. |
| internal/api/treewalk_test.go | Updates standalone API tests for /v1/answer/treewalk behavior. |
| internal/api/server.go | Mounts /v1/answer/treewalk and renames Deps fields for TreeWalk. |
| cmd/server/main.go | Switches default strategy to auto; wires selectable strategies set including treewalk and auto; rewires dedicated answer endpoint strategy instance. |
| cmd/engine/main.go | Switches default strategy to auto; rewires dedicated answer endpoint strategy instance and strategy builder cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (a *AutoStrategy) pick(t *tree.Tree, budget ContextBudget) Strategy { | ||
| threshold := a.SinglePassMaxTokens | ||
| if threshold <= 0 { | ||
| threshold = budget.Available() | ||
| } |
| // Select implements Strategy. | ||
| func (a *AutoStrategy) Select(ctx context.Context, t *tree.Tree, query string, budget ContextBudget) ([]tree.SectionID, error) { | ||
| return a.pick(t, budget).Select(ctx, t, query, budget) | ||
| } |
| func (a *AutoStrategy) SelectWithCost(ctx context.Context, t *tree.Tree, query string, budget ContextBudget) (*Result, error) { | ||
| s := a.pick(t, budget) | ||
| if cs, ok := s.(CostStrategy); ok { | ||
| return cs.SelectWithCost(ctx, t, query, budget) | ||
| } |
| // Strategy optionally overrides the configured retrieval strategy | ||
| // for THIS request only. One of: chunked-tree, pageindex, agentic, | ||
| // for THIS request only. One of: chunked-tree, treewalk, agentic, | ||
| // single-pass. Empty uses the server default. This lets a caller | ||
| // (e.g. the benchmark harness) A/B strategies against the same | ||
| // running engine without a redeploy. Unknown values return 400. |
| for _, s := range []string{"single-pass", "chunked-tree", "agentic", "treewalk"} { | ||
| cfg := Default() |
| switch c.Retrieval.Strategy { | ||
| case "single-pass", "chunked-tree", "agentic", "pageindex": | ||
| case "auto", "single-pass", "chunked-tree", "agentic", "treewalk": | ||
| default: | ||
| return fmt.Errorf("unknown retrieval.strategy: %q", c.Retrieval.Strategy) | ||
| } |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/api/treewalk.go (1)
472-508:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove unused helper or add test coverage.
The
treeWalkTraceTokenFromCitationsfunction is unused according to static analysis (U1000) and the comment admits it's "unused at the moment." Unused code increases maintenance burden and can drift from the actual implementation.Either:
- Add test coverage that uses this function to verify trace token computation, or
- Remove it (can always restore from git history when needed)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/api/treewalk.go` around lines 472 - 508, The function treeWalkTraceTokenFromCitations is flagged as unused; either remove it or add test coverage that calls it: if removing, delete the entire function (including its references to tree.DocumentID, model, ranges, retrieval.SystemPromptVersion) and run static checks; if keeping, add a unit test that imports the same package, calls treeWalkTraceTokenFromCitations with representative DocumentID, model and ranges inputs, and asserts the returned hex string matches a precomputed expected value (or verifies deterministic properties), ensuring the test exercises the sha256/hex path so the function is no longer unused.Source: Linters/SAST tools
internal/handler/answer_treewalk.go (1)
277-299:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPersist replay data on the SSE path too.
The non-stream flow stores the final payload through
writeJSONWithReplay, but the streaming flow only emits the final"answer"event. That leaves streamedtrace_tokenvalues orphaned in practice, because nothing ever writes the replay entry behind them.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/handler/answer_treewalk.go` around lines 277 - 299, The streaming path currently only calls emitSSE("answer", final) and never persists the replay entry, so trace_token values from the stream are never written; update the streaming end in the handler (where buildCitations produces citations and final is assembled) to call the same persistence helper used by the non-stream flow (writeJSONWithReplay) with the final payload (including trace_token) before or immediately after emitSSE so the replay entry is created; ensure you pass the same context/body identifiers and any required replay metadata to writeJSONWithReplay so replay lookups for trace_token succeed.
🧹 Nitpick comments (1)
internal/config/config.go (1)
329-333: ⚡ Quick winClarify fallback wording to match actual precedence behavior.
On Line 333, the comment says garbled/negative input “leaves the engine default (3)”. In this function, parse failure/no-op preserves the current configured value (which can come from YAML), not always default 3.
Suggested comment-only fix
- // VLE_; a garbled or negative value leaves the engine default (3). + // VLE_; a garbled or negative value leaves the current configured + // value unchanged (default is 3 when unset elsewhere).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/config/config.go` around lines 329 - 333, Update the comment on the TreeWalk final-citation cap to accurately reflect precedence and fallback: state that VLS_ overrides VLE_, and that a garbled or negative environment value is a no-op that preserves the currently configured value (which may come from YAML or a prior config), rather than always reverting to the engine default of 3; locate the comment near the TreeWalk cap handling in internal/config/config.go (references: VLS_ and VLE_) and revise the wording accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/handler/answer_treewalk.go`:
- Around line 260-262: serveStream currently forwards the whole
retrieval.TreeWalkEvent via strat.OnEvent/emitSSE regardless of
body.IncludeReasoning, leaking Reasoning/Answer; change the strat.OnEvent
handler (and the emit path) to check the request flag (body.IncludeReasoning)
and, when false, send a redacted payload that omits or zeroes sensitive fields
(e.g., TreeWalkEvent.Reasoning, TreeWalkEvent.Answer) or build a separate
minimal event type to emit; ensure the decision is done inside the strat.OnEvent
closure (or before calling emitSSE) so raw TreeWalkEvent is never streamed
unless IncludeReasoning is true.
In `@pkg/retrieval/auto.go`:
- Around line 52-68: The pick method on AutoStrategy can return nil if both
a.Small and a.Large are nil, causing Select/SelectWithCost to panic; fix by
validating strategies at construction or adding defensive checks: update NewAuto
to require non-nil Small or Large and return an error (preferred), or change
pick to return a non-nil fallback (e.g., prefer a.Small then a.Large) and update
Select and SelectWithCost to detect a nil return and return an error instead of
dereferencing; reference symbols: AutoStrategy, pick, NewAuto, Select,
SelectWithCost.
---
Outside diff comments:
In `@internal/api/treewalk.go`:
- Around line 472-508: The function treeWalkTraceTokenFromCitations is flagged
as unused; either remove it or add test coverage that calls it: if removing,
delete the entire function (including its references to tree.DocumentID, model,
ranges, retrieval.SystemPromptVersion) and run static checks; if keeping, add a
unit test that imports the same package, calls treeWalkTraceTokenFromCitations
with representative DocumentID, model and ranges inputs, and asserts the
returned hex string matches a precomputed expected value (or verifies
deterministic properties), ensuring the test exercises the sha256/hex path so
the function is no longer unused.
In `@internal/handler/answer_treewalk.go`:
- Around line 277-299: The streaming path currently only calls emitSSE("answer",
final) and never persists the replay entry, so trace_token values from the
stream are never written; update the streaming end in the handler (where
buildCitations produces citations and final is assembled) to call the same
persistence helper used by the non-stream flow (writeJSONWithReplay) with the
final payload (including trace_token) before or immediately after emitSSE so the
replay entry is created; ensure you pass the same context/body identifiers and
any required replay metadata to writeJSONWithReplay so replay lookups for
trace_token succeed.
---
Nitpick comments:
In `@internal/config/config.go`:
- Around line 329-333: Update the comment on the TreeWalk final-citation cap to
accurately reflect precedence and fallback: state that VLS_ overrides VLE_, and
that a garbled or negative environment value is a no-op that preserves the
currently configured value (which may come from YAML or a prior config), rather
than always reverting to the engine default of 3; locate the comment near the
TreeWalk cap handling in internal/config/config.go (references: VLS_ and VLE_)
and revise the wording accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eaee24cb-1189-4825-8bba-e42087fe803f
📒 Files selected for processing (24)
cmd/engine/main.gocmd/server/main.gointernal/api/server.gointernal/api/treewalk.gointernal/api/treewalk_test.gointernal/config/config.gointernal/config/config_pageindex_test.gointernal/config/config_treewalk_test.gointernal/handler/answer_treewalk.gointernal/handler/query.gointernal/handler/query_strategy_test.gointernal/handler/router.gointernal/handler/router_parity_test.gopkg/config/config.gopkg/config/config_test.gopkg/ingest/ingest.gopkg/ingest/minimal_mode_test.gopkg/ingest/toc_builder.gopkg/retrieval/auto.gopkg/retrieval/auto_test.gopkg/retrieval/strategy.gopkg/retrieval/treewalk_strategy.gopkg/retrieval/treewalk_strategy_test.gopkg/tree/tree.go
💤 Files with no reviewable changes (1)
- internal/config/config_pageindex_test.go
| strat.OnEvent = func(ev retrieval.TreeWalkEvent) { | ||
| emitSSE(ev.Type, ev) | ||
| } |
There was a problem hiding this comment.
Don't stream raw TreeWalk events unless reasoning was requested.
serveStream ignores body.IncludeReasoning and forwards the full retrieval.TreeWalkEvent as the SSE payload. That exposes intermediate Reasoning/Answer fields to any streaming caller, which breaks the endpoint's opt-in reasoning contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/answer_treewalk.go` around lines 260 - 262, serveStream
currently forwards the whole retrieval.TreeWalkEvent via strat.OnEvent/emitSSE
regardless of body.IncludeReasoning, leaking Reasoning/Answer; change the
strat.OnEvent handler (and the emit path) to check the request flag
(body.IncludeReasoning) and, when false, send a redacted payload that omits or
zeroes sensitive fields (e.g., TreeWalkEvent.Reasoning, TreeWalkEvent.Answer) or
build a separate minimal event type to emit; ensure the decision is done inside
the strat.OnEvent closure (or before calling emitSSE) so raw TreeWalkEvent is
never streamed unless IncludeReasoning is true.
… rename) Addresses the Copilot + CodeRabbit review on #35: - auto: non-paged docs (no PageStart/PageEnd — Markdown/HTML/DOCX/TXT) now route to single-pass, never treewalk (which navigates by page range) + nil-safe pick/Select/SelectWithCost so a misconfigured Auto can't panic. - complete the rename in non-Go files: openapi.yaml + config.example*.yaml (incl. the uppercase env-var refs) — zero pageindex remains anywhere. - query.go: list 'auto' in the strategy-override doc comment. - config_test: exercise 'auto' as a valid Validate() strategy. - tests: non-paged fallback + nil-safety. Out of scope (filed separately): treewalk SSE streams reasoning regardless of IncludeReasoning (pre-existing behaviour, not introduced here).
|
Thanks @copilot-pull-request-reviewer and @coderabbitai — addressed in dd6697b: Fixed (in scope):
Deferred (out of scope for this rename PR, tracked as HAL-115):
Build + |
|
Tip For best results, initiate chat on the files or code changes.
Would you like me to:
Just let me know! 🐇✨ |
…ump, lint - toc_builder: fix the data race in verifyTitlesConcurrent — each goroutine now accumulates a local Usage and folds it under the mutex, instead of sharing &localUse across errgroup goroutines (concurrent usage.add writes). - deps: golang.org/x/net v0.53.0 -> v0.55.0 (GO-2026-5030); add toolchain go1.25.11 + pin CI go-version (stdlib GO-2026-5039 / GO-2026-5037). - lint: //lint:ignore U1000 the four staged helpers with reasons; drop the genuinely-unused idempotencyCache.mu field. These CI failures were pre-existing (surfaced on PR #35, not caused by the rename). go build + staticcheck green locally; -race verified in CI. Closes HAL-116.
|
Pushed
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
openapi.yaml (1)
306-310:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFinish the rename in the endpoint description.
Line 310 still says
page-index responses uniformly, so the spec does not actually complete the PageIndex → TreeWalk rename.Suggested diff
- /v1/replay accepts page-index responses uniformly. + /v1/replay accepts TreeWalk responses uniformly.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@openapi.yaml` around lines 306 - 310, The endpoint description still uses the old term "page-index responses uniformly" — update the wording in the Differentiators block for the /v1/replay endpoint to complete the PageIndex → TreeWalk rename (e.g., replace "page-index responses uniformly" with "TreeWalk responses uniformly" or another consistent "TreeWalk" phrasing), and scan the same description for any remaining PageIndex occurrences to ensure all references (PageIndex, page-index) are renamed consistently to TreeWalk across that endpoint's documentation.config.example.yaml (1)
98-113:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe example config still opts users out of the new default.
This block still advertises
chunked-treeas the default and leavesstrategy: "chunked-tree"in place. Anyone bootstrapping from this file will miss the newautorouting behavior the PR is introducing.Suggested diff
- # strategy: single-pass | chunked-tree | agentic | treewalk + # strategy: auto | single-pass | chunked-tree | agentic | treewalk # + # auto: choose single-pass for small/non-paged docs and + # treewalk for larger paged docs. # single-pass: whole tree in one LLM call; fastest, smallest docs. # chunked-tree: split the tree, reason over slices in parallel, merge. - # The default. Scales to any tree size by trading + # Scales to any tree size by trading # context for parallelism. # agentic: iterative outline → expand → read → done loop. # Picks per-section IDs via a tool-using model. # treewalk: TreeWalk-style page-based agentic loop. Three # tools (get_document_structure / get_pages / done); # the model navigates by INCLUSIVE PAGE RANGE # rather than by section ID. Best for paginated # documents (SEC filings, academic PDFs) where the # per-section interface is too noisy. - strategy: "chunked-tree" + strategy: "auto"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@config.example.yaml` around lines 98 - 113, Update the example config to use the new default routing by replacing the strategy value "chunked-tree" with "auto" and adjust the surrounding comment to describe the "auto" behavior (and that it is now the default) so users bootstrapping from config.example.yaml opt into the new automatic routing; locate the strategy key (strategy: "chunked-tree") and the explanatory comment block listing strategy options and update both the value and the description text to reference "auto" as the default and briefly summarize what "auto" does.pkg/config/config_test.go (1)
783-807:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winIsolate the bad-env test from the
VLS_alias too.
applyEnvOverrides()readsVLE_RETRIEVAL_TREEWALK_MAX_CITATIONSbeforeVLS_RETRIEVAL_TREEWALK_MAX_CITATIONS. If theVLE_value here is garbage but the ambientVLS_alias is set, this test stops observing the default3and becomes environment-dependent.Suggested diff
func TestTreeWalkEnvOverrideRejectsBad(t *testing.T) { prevHops := os.Getenv("VLE_RETRIEVAL_TREEWALK_MAX_HOPS") prevLimit := os.Getenv("VLE_RETRIEVAL_TREEWALK_PAGE_CONTENT_LIMIT") prevCits := os.Getenv("VLE_RETRIEVAL_TREEWALK_MAX_CITATIONS") + prevVLSCits := os.Getenv("VLS_RETRIEVAL_TREEWALK_MAX_CITATIONS") defer func() { os.Setenv("VLE_RETRIEVAL_TREEWALK_MAX_HOPS", prevHops) os.Setenv("VLE_RETRIEVAL_TREEWALK_PAGE_CONTENT_LIMIT", prevLimit) os.Setenv("VLE_RETRIEVAL_TREEWALK_MAX_CITATIONS", prevCits) + os.Setenv("VLS_RETRIEVAL_TREEWALK_MAX_CITATIONS", prevVLSCits) }() os.Setenv("VLE_RETRIEVAL_TREEWALK_MAX_HOPS", "abc") os.Setenv("VLE_RETRIEVAL_TREEWALK_PAGE_CONTENT_LIMIT", "not-a-number") os.Setenv("VLE_RETRIEVAL_TREEWALK_MAX_CITATIONS", "lots") + os.Unsetenv("VLS_RETRIEVAL_TREEWALK_MAX_CITATIONS")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/config/config_test.go` around lines 783 - 807, TestTreeWalkEnvOverrideRejectsBad depends on the ambient VLS_ alias and applyEnvOverrides reads VLE_ before VLS_, so when VLE_ is garbage but VLS_ is set the test is flaky; update the test to also capture and restore the VLS_ environment variables (e.g. VLS_RETRIEVAL_TREEWALK_MAX_HOPS, VLS_RETRIEVAL_TREEWALK_PAGE_CONTENT_LIMIT, VLS_RETRIEVAL_TREEWALK_MAX_CITATIONS) like the existing VLE_ saves, and explicitly unset them (or set to empty) before calling Default() and applyEnvOverrides() so the test always exercises applyEnvOverrides with only the VLE_ garbage values.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@openapi.yaml`:
- Around line 666-668: The OpenAPI enum for QueryResponse.strategy omits the
newly supported "auto" strategy; update the schema so the enum for the
"strategy" property on QueryResponse includes "auto" (i.e., enum: [single-pass,
chunked-tree, agentic, treewalk, auto]) to match server behavior and avoid
client/schema validation errors when the handler emits "auto".
---
Outside diff comments:
In `@config.example.yaml`:
- Around line 98-113: Update the example config to use the new default routing
by replacing the strategy value "chunked-tree" with "auto" and adjust the
surrounding comment to describe the "auto" behavior (and that it is now the
default) so users bootstrapping from config.example.yaml opt into the new
automatic routing; locate the strategy key (strategy: "chunked-tree") and the
explanatory comment block listing strategy options and update both the value and
the description text to reference "auto" as the default and briefly summarize
what "auto" does.
In `@openapi.yaml`:
- Around line 306-310: The endpoint description still uses the old term
"page-index responses uniformly" — update the wording in the Differentiators
block for the /v1/replay endpoint to complete the PageIndex → TreeWalk rename
(e.g., replace "page-index responses uniformly" with "TreeWalk responses
uniformly" or another consistent "TreeWalk" phrasing), and scan the same
description for any remaining PageIndex occurrences to ensure all references
(PageIndex, page-index) are renamed consistently to TreeWalk across that
endpoint's documentation.
In `@pkg/config/config_test.go`:
- Around line 783-807: TestTreeWalkEnvOverrideRejectsBad depends on the ambient
VLS_ alias and applyEnvOverrides reads VLE_ before VLS_, so when VLE_ is garbage
but VLS_ is set the test is flaky; update the test to also capture and restore
the VLS_ environment variables (e.g. VLS_RETRIEVAL_TREEWALK_MAX_HOPS,
VLS_RETRIEVAL_TREEWALK_PAGE_CONTENT_LIMIT, VLS_RETRIEVAL_TREEWALK_MAX_CITATIONS)
like the existing VLE_ saves, and explicitly unset them (or set to empty) before
calling Default() and applyEnvOverrides() so the test always exercises
applyEnvOverrides with only the VLE_ garbage values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca0e5052-7ad9-42c3-ab14-a52aac0e4eca
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (15)
.github/workflows/ci.ymlconfig.example.yamlconfig.server.example.yamlgo.modinternal/api/treewalk.gointernal/handler/query.gointernal/middleware/cors.gointernal/middleware/idempotency.goopenapi.yamlpkg/config/config_test.gopkg/ingest/ingest.gopkg/ingest/toc_builder.gopkg/parser/pdf.gopkg/retrieval/auto.gopkg/retrieval/auto_test.go
✅ Files skipped from review due to trivial changes (5)
- config.server.example.yaml
- internal/middleware/cors.go
- internal/handler/query.go
- pkg/parser/pdf.go
- pkg/ingest/ingest.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/retrieval/auto.go
- internal/api/treewalk.go
| strategy: | ||
| type: string | ||
| enum: [single-pass, chunked-tree, agentic, pageindex] | ||
| enum: [single-pass, chunked-tree, agentic, treewalk] |
There was a problem hiding this comment.
Add auto to QueryResponse.strategy.
auto is now a supported first-class retrieval strategy, but the response enum still excludes it. That leaves the published contract narrower than the server/config behavior and can break generated clients or schema validation if the handler serializes the selected strategy name directly.
Suggested diff
- enum: [single-pass, chunked-tree, agentic, treewalk]
+ enum: [auto, single-pass, chunked-tree, agentic, treewalk]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| strategy: | |
| type: string | |
| enum: [single-pass, chunked-tree, agentic, pageindex] | |
| enum: [single-pass, chunked-tree, agentic, treewalk] | |
| strategy: | |
| type: string | |
| enum: [auto, single-pass, chunked-tree, agentic, treewalk] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@openapi.yaml` around lines 666 - 668, The OpenAPI enum for
QueryResponse.strategy omits the newly supported "auto" strategy; update the
schema so the enum for the "strategy" property on QueryResponse includes "auto"
(i.e., enum: [single-pass, chunked-tree, agentic, treewalk, auto]) to match
server behavior and avoid client/schema validation errors when the handler emits
"auto".
- pkg/parser: serialize pdftable.OpenBytes behind a package mutex (openPDFBytes). pdftable mutates package-level state on open and is not concurrency-safe; the -race detector flagged concurrent opens (real in prod — ingest workers parse in parallel). Stopgap; the proper fix belongs in pdftable (Foundational Libs). - convert the unused-helper suppressions from staticcheck //lint:ignore to golangci-lint //nolint:unused,staticcheck (CI lint uses golangci-lint). go build + go vet + parser tests green.
…heck (HAL-117) Completes the errcheck/ineffassign cleanup across the repo, including the renamed files (treewalk.go / answer_treewalk.go) the off-main pass couldn't reach: - errcheck: _, _ = fmt.Fprintf on the SSE writers; unchecked returns handled across cmd/benchmark, handlers, connecthandler, parser, storage, queue, db, ingest. - ineffassign: drop the ineffectual 'written += remaining' before break in treewalk_strategy. - staticcheck: real TLS-disabled assertion (was an empty branch) in config_test; De Morgan's rewrite in retrieval_test; drop embedded-field selector in decompose_test. - misspell: ignore-rule for 'strat' (our 'strategy' abbreviation) in .golangci.yml. - formatting normalized via the golangci formatters. golangci-lint run: 0 issues. go build + unit tests green. Closes HAL-117.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/benchmark/main.go`:
- Around line 104-110: Benchmark currently records durations for failed requests
(e.g., the REST call via restGET and gRPC via healthClient.Check) which skews
latency stats; change each measured block (places using restGET,
healthClient.Check and similar REST/gRPC calls that append to
r.rest/r.grpc/r.rerpc etc.) to capture and check the error/result and only
append time.Since(start) when the call succeeded, otherwise increment a failure
counter (e.g., r.restFailures/r.grpcFailures or a shared successes/failures map)
so latency arrays contain only successful-sample durations and failures are
tracked separately; update all similar blocks (the other ranges mentioned) to
follow the same pattern using the existing variables like ctx,
connect.NewRequest, r.rest and r.grpc to locate the spots to change.
- Line 80: The json.Unmarshal call in cmd/benchmark/main.go is ignoring errors
which can leave sectionID empty and cause the "Get Section" benchmark to be
silently skipped; change the unmarshalling of treeResp into treeData (the
json.Unmarshal(treeResp, &treeData) call) to check the returned error, and on
failure log an explicit setup error (including the error detail) and mark the
benchmark run as failed or abort further section selection (e.g., return a
non-zero exit or set a setupFailure flag) so the missing sectionID is not
treated as a successful run; ensure downstream code that uses sectionID (section
selection / "Get Section" benchmark) respects this failure signal.
In `@internal/api/server.go`:
- Around line 395-399: The ReadAll result is used even when io.ReadAll returns
an error; update the rc handling (the block after d.Storage.Get returning err ==
nil where variables rc, raw, content are used) to check the error returned by
io.ReadAll, close rc best-effort, and discard the section by setting content =
"" (or skip adding the section) when ReadAll fails so truncated/corrupted bytes
aren’t used; apply the same fix to the other similar blocks referenced (the
occurrences around lines 538-542, 835-839, 1077-1081) to consistently handle
ReadAll errors.
In `@internal/handler/answer.go`:
- Around line 169-173: The code ignores the error returned by io.ReadAll after
h.storage.Get, which can yield truncated content; change the block around
h.storage.Get / io.ReadAll to check the read error (the second return from
io.ReadAll), ensure rc is closed (best-effort) even on read failure, and if
io.ReadAll returns an error do not use the partial raw content — either skip
adding this section to content or return/propagate an error; update the handling
where variables rc, raw, getErr are used so only successfully-read data is
assigned to content.
In `@internal/handler/documents.go`:
- Around line 428-431: The code currently ignores errors from io.ReadAll and may
return truncated section content; change the read to capture the error (raw,
readErr := io.ReadAll(rc)) and if readErr != nil then ensure rc is closed and
propagate a read/storage error instead of serializing partial bytes — e.g.,
log/report readErr and return an appropriate HTTP error (or skip including the
section body) so that h.storage.Get / sec.ContentRef reads never produce a 200
with truncated data; update uses of rc.Close() accordingly to always close in
both success and error paths.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 89368ae7-3f90-426d-899e-b14165d27cc8
📒 Files selected for processing (42)
.golangci.ymlcmd/benchmark/main.gocmd/engine/main.gocmd/server/main.gointernal/api/replay_test.gointernal/api/server.gointernal/api/treewalk.gointernal/api/treewalk_test.gointernal/connecthandler/documents.gointernal/connecthandler/query.gointernal/handler/answer.gointernal/handler/answer_treewalk.gointernal/handler/documents.gointernal/handler/query.gointernal/handler/query_multi.gointernal/handler/query_strategy_test.gointernal/handler/query_stream.gointernal/handler/query_stream_multi.gointernal/middleware/cors.gointernal/middleware/idempotency.gopkg/cache/cache_test.gopkg/config/config_test.gopkg/db/db.gopkg/ingest/hyde.gopkg/ingest/ingest.gopkg/ingest/minimal_mode_test.gopkg/ingest/toc_builder.gopkg/parser/chunk_test.gopkg/parser/docx.gopkg/parser/pdf.gopkg/queue/asynq.gopkg/queue/asynq_test.gopkg/queue/qstash.gopkg/queue/river_test.gopkg/retrieval/agentic.gopkg/retrieval/decompose_test.gopkg/retrieval/retrieval_test.gopkg/retrieval/single_pass.gopkg/retrieval/treewalk_strategy.gopkg/storage/local.gopkg/storage/s3_test.gopkg/tree/compact_test.go
💤 Files with no reviewable changes (1)
- pkg/retrieval/treewalk_strategy.go
✅ Files skipped from review due to trivial changes (13)
- .golangci.yml
- pkg/retrieval/agentic.go
- internal/handler/query_multi.go
- pkg/retrieval/single_pass.go
- pkg/queue/qstash.go
- pkg/tree/compact_test.go
- internal/middleware/cors.go
- pkg/queue/asynq.go
- internal/handler/query_stream_multi.go
- pkg/ingest/hyde.go
- pkg/parser/docx.go
- internal/connecthandler/query.go
- pkg/parser/chunk_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
- internal/handler/query.go
- pkg/ingest/minimal_mode_test.go
- internal/middleware/idempotency.go
- pkg/ingest/toc_builder.go
- internal/handler/query_strategy_test.go
- internal/api/treewalk_test.go
- cmd/engine/main.go
- internal/api/treewalk.go
- pkg/config/config_test.go
| } `json:"sections"` | ||
| } | ||
| json.Unmarshal(treeResp, &treeData) | ||
| _ = json.Unmarshal(treeResp, &treeData) // benchmark best-effort parse |
There was a problem hiding this comment.
Handle tree parse failure explicitly before section benchmark selection.
If json.Unmarshal fails, sectionID remains empty and “Get Section” benchmarking is silently skipped. Log/track this as a failed setup step so results aren’t misread as complete.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/benchmark/main.go` at line 80, The json.Unmarshal call in
cmd/benchmark/main.go is ignoring errors which can leave sectionID empty and
cause the "Get Section" benchmark to be silently skipped; change the
unmarshalling of treeResp into treeData (the json.Unmarshal(treeResp, &treeData)
call) to check the returned error, and on failure log an explicit setup error
(including the error detail) and mark the benchmark run as failed or abort
further section selection (e.g., return a non-zero exit or set a setupFailure
flag) so the missing sectionID is not treated as a successful run; ensure
downstream code that uses sectionID (section selection / "Get Section"
benchmark) respects this failure signal.
| _, _ = restGET(base + "/v1/health") | ||
| r.rest = append(r.rest, time.Since(start)) | ||
|
|
||
| // gRPC (Connect) | ||
| start = time.Now() | ||
| healthClient.Check(ctx, connect.NewRequest(&v1.HealthCheckRequest{})) | ||
| _, _ = healthClient.Check(ctx, connect.NewRequest(&v1.HealthCheckRequest{})) | ||
| r.grpc = append(r.grpc, time.Since(start)) |
There was a problem hiding this comment.
Don’t include failed calls in latency samples.
All benchmark paths discard request errors, so network/protocol failures are still timed and recorded as “successful” latency. This skews median/p95 and can invert REST vs gRPC conclusions. Track failures separately and only append durations for successful calls (or report success-rate alongside latency).
Also applies to: 121-126, 137-144, 156-163, 185-194
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cmd/benchmark/main.go` around lines 104 - 110, Benchmark currently records
durations for failed requests (e.g., the REST call via restGET and gRPC via
healthClient.Check) which skews latency stats; change each measured block
(places using restGET, healthClient.Check and similar REST/gRPC calls that
append to r.rest/r.grpc/r.rerpc etc.) to capture and check the error/result and
only append time.Since(start) when the call succeeded, otherwise increment a
failure counter (e.g., r.restFailures/r.grpcFailures or a shared
successes/failures map) so latency arrays contain only successful-sample
durations and failures are tracked separately; update all similar blocks (the
other ranges mentioned) to follow the same pattern using the existing variables
like ctx, connect.NewRequest, r.rest and r.grpc to locate the spots to change.
| rc, _, err := d.Storage.Get(r.Context(), sec.ContentRef) | ||
| if err == nil { | ||
| raw, _ := io.ReadAll(rc) | ||
| rc.Close() | ||
| _ = rc.Close() // best-effort close | ||
| content = string(raw) |
There was a problem hiding this comment.
Discard section content when storage reads fail.
These paths ignore io.ReadAll errors and still use whatever bytes were returned. A short read will quietly surface truncated text in /v1/sections, /v1/query, /v1/answer, and /v1/query/multi, which can ground answers on corrupted evidence instead of failing or dropping the section.
Representative fix
- raw, _ := io.ReadAll(rc)
- _ = rc.Close() // best-effort close
- content = string(raw)
+ raw, readErr := io.ReadAll(rc)
+ _ = rc.Close() // best-effort close
+ if readErr == nil {
+ content = string(raw)
+ }Also applies to: 538-542, 835-839, 1077-1081
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/api/server.go` around lines 395 - 399, The ReadAll result is used
even when io.ReadAll returns an error; update the rc handling (the block after
d.Storage.Get returning err == nil where variables rc, raw, content are used) to
check the error returned by io.ReadAll, close rc best-effort, and discard the
section by setting content = "" (or skip adding the section) when ReadAll fails
so truncated/corrupted bytes aren’t used; apply the same fix to the other
similar blocks referenced (the occurrences around lines 538-542, 835-839,
1077-1081) to consistently handle ReadAll errors.
| rc, _, getErr := h.storage.Get(r.Context(), sec.ContentRef) | ||
| if getErr == nil { | ||
| raw, _ := io.ReadAll(rc) | ||
| rc.Close() | ||
| _ = rc.Close() // best-effort close | ||
| content = string(raw) |
There was a problem hiding this comment.
Don't synthesize from partially read section bodies.
io.ReadAll errors are ignored here, so a short storage read still becomes citation / answer input. That can return a grounded answer from truncated evidence instead of skipping the section or failing the request.
Representative fix
- raw, _ := io.ReadAll(rc)
- _ = rc.Close() // best-effort close
- content = string(raw)
+ raw, readErr := io.ReadAll(rc)
+ _ = rc.Close() // best-effort close
+ if readErr != nil {
+ h.logger.Warn("answer: section content read failed", "section_id", sec.ID, "err", readErr)
+ continue
+ }
+ content = string(raw)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/answer.go` around lines 169 - 173, The code ignores the
error returned by io.ReadAll after h.storage.Get, which can yield truncated
content; change the block around h.storage.Get / io.ReadAll to check the read
error (the second return from io.ReadAll), ensure rc is closed (best-effort)
even on read failure, and if io.ReadAll returns an error do not use the partial
raw content — either skip adding this section to content or return/propagate an
error; update the handling where variables rc, raw, getErr are used so only
successfully-read data is assigned to content.
| rc, _, getErr := h.storage.Get(r.Context(), sec.ContentRef) | ||
| if getErr == nil { | ||
| raw, _ := io.ReadAll(rc) | ||
| rc.Close() | ||
| _ = rc.Close() // best-effort close |
There was a problem hiding this comment.
Avoid returning partial section content on read errors.
This handler ignores io.ReadAll errors and still serializes whatever bytes were fetched. A short storage read becomes a 200 with truncated section text, which is indistinguishable from a real section body to clients.
Representative fix
- raw, _ := io.ReadAll(rc)
- _ = rc.Close() // best-effort close
- content = string(raw)
+ raw, readErr := io.ReadAll(rc)
+ _ = rc.Close() // best-effort close
+ if readErr == nil {
+ content = string(raw)
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/handler/documents.go` around lines 428 - 431, The code currently
ignores errors from io.ReadAll and may return truncated section content; change
the read to capture the error (raw, readErr := io.ReadAll(rc)) and if readErr !=
nil then ensure rc is closed and propagate a read/storage error instead of
serializing partial bytes — e.g., log/report readErr and return an appropriate
HTTP error (or skip including the section body) so that h.storage.Get /
sec.ContentRef reads never produce a 200 with truncated data; update uses of
rc.Close() accordingly to always close in both success and error paths.
…6/117) - lint: the CI 'lint' job runs BOTH golangci-lint and standalone staticcheck (honnef), and the latter reads //lint:ignore, not golangci's //nolint. Rather than juggle two directive formats, delete the 4 genuinely-unused funcs (treeWalkTraceTokenFromCitations, originMatches, isTimeout, isEncryptedPDFError) — both linters agree they're dead. (Re-add with callers when HAL-73/HAL-112 land.) - race: pdftable.ExtractTables also mutates package-level state — a second pdftable race distinct from OpenBytes; serialize it on the same mutex. Stopgap; the root fix is HAL-118 (make pdftable concurrency-safe). golangci-lint + standalone staticcheck: 0 issues. go build + unit tests green.
The ubuntu-only 'Verify go.mod/go.sum are tidy' CI step requires the canonical format go mod tidy emits: a blank line between the 'go' and 'toolchain' directives. My earlier manual insertion of the toolchain line omitted it.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/parser/pdf.go (1)
1605-1637:⚠️ Potential issue | 🔴 CriticalFix:
safeExtractTablescan deadlock the globalpdftableOpenMuon timeout/hang or panic
safeExtractTableslockspdftableOpenMuimmediately beforepage.ExtractTables(settings), but on timeout the caller returns while the goroutine keeps running; ifExtractTableswedges, the global lock stays held and blocks lateropenPDFBytes/table extraction.- If
page.ExtractTablespanics, the goroutine’srecoverruns but the mutex unlock is not deferred, sopdftableOpenMu.Unlock()is skipped and the mutex remains permanently locked.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/parser/pdf.go` around lines 1605 - 1637, safeExtractTables currently locks pdftableOpenMu and then calls page.ExtractTables without deferring the unlock, which can permanently lock the mutex on panic or timeout; fix by moving pdftableOpenMu.Lock() into the goroutine and immediately calling defer pdftableOpenMu.Unlock() (before invoking page.ExtractTables) so the mutex is always released on return/recover, and ensure the goroutine still sends result on the done channel; keep references to pdftableOpenMu, safeExtractTables, page.ExtractTables, and the done/result channel when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/parser/pdf.go`:
- Around line 1605-1637: safeExtractTables currently locks pdftableOpenMu and
then calls page.ExtractTables without deferring the unlock, which can
permanently lock the mutex on panic or timeout; fix by moving
pdftableOpenMu.Lock() into the goroutine and immediately calling defer
pdftableOpenMu.Unlock() (before invoking page.ExtractTables) so the mutex is
always released on return/recover, and ensure the goroutine still sends result
on the done channel; keep references to pdftableOpenMu, safeExtractTables,
page.ExtractTables, and the done/result channel when making this change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 26e5083c-3a99-44a3-a0fd-78159420f3dd
📒 Files selected for processing (5)
go.modinternal/api/treewalk.gointernal/middleware/cors.gopkg/ingest/ingest.gopkg/parser/pdf.go
💤 Files with no reviewable changes (3)
- internal/middleware/cors.go
- pkg/ingest/ingest.go
- internal/api/treewalk.go
🚧 Files skipped from review as they are similar to previous changes (1)
- go.mod
Strips the borrowed PageIndex name from the engine and makes treewalk the canonical strategy.
What's in here
openapi.yaml), and config examples (config.example*.yaml), including env-var references. Zero pageindex references remain.AutoStrategyselection paths to be nil-safe (pick/Select/SelectWithCost) and added tests for non-paged fallback + nil-safety.auto(config validation coverage and request-override docs).Verification
Not in here (tracked separately)
Summary by CodeRabbit
New Features
/v1/answer/treewalkendpointChanges
Summary by CodeRabbit
New Features
Behavior / Request Overrides
Configuration Changes
Chores / Bug Fixes