feat(search): path_excludes filter + teach the MCP keyword-OR / noise-dir exclusion #471
Merged
…-dir exclusion

Agents kept doing "find files about X" badly (N separate searches, one per keyword, plus hand-stacked excludes) and even claimed the MCP "doesn't have path-exclude". Two real causes: (1) a teaching gap (regex-OR and the existing filters weren't in the agent instructions), and (2) a genuine capability gap: `exclude` matches the FILENAME, `path_contains` is a positive include, and there was no way to drop noise DIRECTORIES. This fixes both.

Capability: new `path_excludes`, a comma-separated list of directory-path globs; a record is dropped when its directory matches ANY entry (inverse of `path_contains`, multi-value so several noise dirs go in one query). Wired end-to-end: protocol `SearchParams` → core `SearchFilters` (parsing extracted to `path_normalize::parse_path_excludes`) → daemon → MCP `uffs_search` tool, plus a `--not-in-path` CLI flag mirroring `--in-path`.

Teaching: AGENT_INSTRUCTIONS now spell out:
• KEYWORD-OR with ONE regex `>(a|b|c)` (case-insensitive), never N searches;
• `path_excludes` for noise dirs, `exclude` (filename glob), `match_path`;
• the `documents` collection covers xls/xlsx;
• a worked "topic files in a folder, minus dev noise" one-call example.

`filters/mod.rs` is at the 800-LOC boundary; the new field tips it over, so it is registered in `file_size_exceptions.txt` (cohesive struct + `from_params`).

Tests: `from_params_path_excludes_*`. macOS + windows-msvc builds + clippy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
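To make the `path_excludes` semantics concrete, here is a minimal, hypothetical Rust sketch; it is not the crate's code. `parse_path_excludes` only mirrors the behavior the tests name (comma-split, lowercase, separator-normalize, `None` on blank input), and its signature is assumed. `glob_match` and `dir_is_excluded` are illustrative helpers that assume a full-path match in which `*` may cross `/`, with the record's directory lowercased and `/`-normalized the same way as the patterns.

```rust
/// Hypothetical parser for the comma-separated `path_excludes` value.
/// Splits on ',', trims, lowercases, normalizes '\' to '/', drops blank
/// entries, and returns None when nothing usable is left.
fn parse_path_excludes(raw: Option<&str>) -> Option<Vec<String>> {
    let entries: Vec<String> = raw?
        .split(',')
        .map(|s| s.trim().to_lowercase().replace('\\', "/"))
        .filter(|s| !s.is_empty())
        .collect();
    if entries.is_empty() {
        None
    } else {
        Some(entries)
    }
}

/// Illustrative glob: `*` matches any run of characters (including '/'),
/// `?` matches exactly one character, everything else is literal.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Last '*' seen in the pattern and the text index it currently absorbs up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last '*' swallow one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Whatever is left of the pattern must be all '*'.
    p[pi..].iter().all(|&c| c == '*')
}

/// A record is dropped when its directory matches ANY exclude glob
/// (the inverse of a positive `path_contains` include).
fn dir_is_excluded(dir: &str, excludes: &[String]) -> bool {
    // Assumption: compare against the same lowercase, '/'-separated form.
    let dir = dir.to_lowercase().replace('\\', "/");
    excludes.iter().any(|pat| glob_match(pat, &dir))
}
```

With `*\.cargo\*, *AppData*` as input, a record under `C:\Users\me\.cargo\registry` is dropped while one under `C:\Users\me\Documents` is kept. Because every entry is checked with `any`, several noise dirs fit in a single query instead of being stacked by hand.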
Why
An LLM was asked "find my solar/energy spreadsheets" and (a) hallucinated CLI flags, (b) ran 8 separate searches (one per keyword), and (c) claimed "UFFS MCP doesn't have path-exclude." Two real causes:
1. Teaching gap: regex-OR and the existing filters (…, `exclude`, `match_path`) weren't in the agent instructions.
2. Capability gap: `exclude` matches the filename and `path_contains` is a positive include, so there was genuinely no way to drop noise directories (`.cargo`, `AppData`, …). The agent was right.

What this does
Capability: new `path_excludes`

A comma-separated list of directory-path globs; a record is dropped when its directory matches any entry (the inverse of `path_contains`, and multi-value so several noise dirs go in one query). Wired end-to-end: `SearchParams` (protocol) → `SearchFilters` (core; parsing in `path_normalize::parse_path_excludes`, which comma-splits, lowercases, and normalizes separators) → daemon → MCP `uffs_search` tool, plus a `--not-in-path` CLI flag mirroring `--in-path`.

Teaching: `AGENT_INSTRUCTIONS` now spell out:

- keyword-OR with one regex, `>(a|b|c)` (case-insensitive), never N searches;
- `path_excludes` (noise dirs), `exclude` (filename glob), `match_path`;
- the `documents` collection covers xls/xlsx;

So the failure becomes:
Tests / verification
- `from_params_path_excludes_*` (comma-split / lowercase / normalize / None-on-blank).
- macOS + `x86_64-pc-windows-msvc` builds + clippy clean; full pre-push gate green.
- `filters/mod.rs` (cohesive struct + `from_params`) registered in `file_size_exceptions.txt` as it crosses the 800-LOC line.

🤖 Generated with Claude Code