diff --git a/CHANGELOG.md b/CHANGELOG.md index d1fc54e..600f9d6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,12 @@ Please choose versions by [Semantic Versioning](http://semver.org/). * MINOR version when you add functionality in a backwards-compatible manner, and * PATCH version when you make backwards-compatible bug fixes. +## Unreleased + +- feat: add opt-in `--selector` mode to `/coding:pr-review` and `/coding:code-review` — replaces per-owner Task fanout (Step 4b-ii) with two in-session steps: Step 4c-sel CLASSIFY (narrows glob candidates via recall-optimized contract) and Step 4d-sel ADJUDICATE (reads only applicable rule blocks, emits findings in existing severity buckets); zero sub-agent spawns; default mode unchanged +- refactor: extract selector-mode classify/adjudicate procedure to docs/selector-mode-guide.md — commands carry thin pointers (single source of truth; answers pr-reviewer command-thin + single-source-of-truth findings) +- fix: selector-mode guide resolution is now fail-fast — explicit `GUIDE_OK`/`GUIDE_MISSING` echo, and on missing guide the review STOPS with a Must Fix toolchain failure instead of silently continuing mechanical-only (caught by MiniMax-M2.7 benchmark: weaker model skipped classify+adjudicate entirely when the guide path failed, presenting a judgment-less review as complete) + ## v0.19.0 - refactor!: BREAKING CHANGE — rename `coding:release-changelog-agent` → `coding:release-changelog-assistant` to match marketplace `-.md` naming convention (`-assistant` / `-auditor`, not `-agent`). Coordinated rename: agent file `agents/release-changelog-agent.md` → `agents/release-changelog-assistant.md`, `name:` frontmatter, both callers' `subagent_type=` reference, README agents-table entry, llms.txt entry, and CHANGELOG mentions. Callers on v0.18.0 will fail to resolve until updated; v0.18.0 callers are bundled in this same commit so any installer of v0.19.0+ has the matching pair. - refactor: restructure `coding:release-changelog-assistant` body to XML schema per agent-command-development-guide — add `` block (NEVER commit/tag/push, ALWAYS return error field when malformed, etc.), `` block summarizing the 5-step workflow up-front (was buried), `` for the success JSON schema, `` for the three error codes (separates concerns from success). Markdown `#` headings retained only for high-level sections (Purpose, Inputs, classification rules, rewrite rules, caller profiles, invocation example); structural directives now use XML tags as the guide requires. diff --git a/README.md b/README.md index fdf8c18..53b1251 100644 --- a/README.md +++ b/README.md @@ -177,6 +177,7 @@ All guides live in [`docs/`](docs/) and can be read standalone without the plugi | [Skill Writing](docs/claude-code-skill-writing-guide.md) | Claude Code skill directory structure | | [Rule Block Schema](docs/rule-block-schema.md) | `### RULE` block contract and index schema | | [ast-grep Rule Writing Guide](docs/ast-grep-rule-writing-guide.md) | ast-grep YAML conventions for mechanical rule enforcement | +| [Selector Mode Guide](docs/selector-mode-guide.md) | In-session classify + adjudicate procedure for `--selector` mode | ### Frontend diff --git a/commands/code-review.md b/commands/code-review.md index a12280e..f8e7294 100644 --- a/commands/code-review.md +++ b/commands/code-review.md @@ -1,6 +1,6 @@ --- allowed-tools: Task, Bash(git diff:+), Bash(git log:+), Bash(git status:+), Bash(git ls-files:+) -argument-hint: "[short|standard|full] [directory]" +argument-hint: "[short|standard|full|selector] [directory]" description: Perform a comprehensive code review of recent changes --- @@ -20,6 +20,7 @@ Perform a code review with configurable depth based on mode. Parse the first argument to determine mode: - If first arg is `short|quick|fast` → **Short mode** (manual review only) - If first arg is `full|comprehensive|complete` → **Full mode** (all 13 agents) +- If first arg is `--selector` or `selector` → **Selector mode** (in-session classify + adjudicate, zero sub-agent spawns) - Otherwise → **Standard mode** (4 core agents, default) Any remaining arguments are treated as the directory path. @@ -179,6 +180,31 @@ echo "{\"event\":\"per_owner_summary\",\"owners_invoked\":$owners_invoked,\"tota Diagnostic only — operators read it to answer "is Owner X worth dispatching?" with data. Not part of the Step 5 user-facing report. `code-review.md` has no formal cleanup step (the command works against the current branch in-place; there is no review-worktree to remove), so end the run with `rm -f /tmp/code-review-timing.jsonl` to keep the file from accumulating stale lines across reviews. +#### Selector mode: Steps 4c-sel and 4d-sel + +When the mode argument is `--selector` or `selector`: Steps 4.0, 4a, and 4b-i run unchanged. Skip Step 4b-ii. Resolve the guide and execute Steps 4c-sel and 4d-sel from it — zero sub-agent spawns. + +Run exactly this one command, once: + +```bash +GUIDE="${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/coding}/docs/selector-mode-guide.md" +[ -f "$GUIDE" ] || GUIDE="${CLAUDE_CONFIG_DIR:-$HOME/.claude}/plugins/marketplaces/coding/docs/selector-mode-guide.md" +[ -f "$GUIDE" ] && echo "GUIDE_OK: $GUIDE" || echo "GUIDE_MISSING" +``` + +If it prints `GUIDE_MISSING`: report "selector guide unavailable" as a **Must Fix toolchain failure** in Step 5 and STOP the selector path — do NOT continue with a mechanical-findings-only review presented as a complete selector review (a review without the judgment tier silently misses every judgment-tier rule; same fail-fast discipline as Step 4.0). Do NOT investigate further (no `find`, no `ls`, no path probing). + +If it prints `GUIDE_OK`: Read the file at that path, then execute its **Step 4c-sel CLASSIFY** and **Step 4d-sel ADJUDICATE** with: + +- **DIFF** = `git diff HEAD~1` (or directory diff as parsed in Step 1) +- **CANDIDATES** = the Step 4b-i ` ` output +- **MECHANICAL_FINDINGS** = `/tmp/code-review-findings.json` +- **Working directory** = the reviewed directory + +On the guide's short-circuit condition the report line is `selector clean — no adjudication needed`. When the mode is anything else (short/standard/full), skip this section entirely. + +Include the traceability section per `docs/selector-mode-guide.md` § Traceability Report Section. + #### 4c: Context-specific conventions Load these conventionally when the diff matches: @@ -225,6 +251,10 @@ Agent-reported important issues (error handling, architecture, factory/handler p #### Nice to Have (Optional) Agent-reported minor issues (style, documentation, naming conventions, version updates). +#### Selector Mode: Classify Traceability (selector mode only) + +Include the traceability section per `docs/selector-mode-guide.md` § Traceability Report Section. + ### Step 6: Next Steps If `go-test-coverage-assistant` reported missing tests, suggest: diff --git a/commands/pr-review.md b/commands/pr-review.md index c490776..012a46d 100644 --- a/commands/pr-review.md +++ b/commands/pr-review.md @@ -1,6 +1,6 @@ --- allowed-tools: Task, Bash(git diff:+), Bash(git log:+), Bash(git status:+), Bash(git ls-files:+), Bash(git fetch:+), Bash(git worktree:+), Bash(git branch:+), Bash(rm -rf:+) -argument-hint: " [short|standard|full]" +argument-hint: " [short|standard|full|selector]" description: Review current branch diff against target branch (excludes vendor/node_modules) --- @@ -80,6 +80,7 @@ cd && git worktree remove /tmp/pr-review-- --for - `short|quick|fast` → **Short mode** (manual review only) - `full|comprehensive|complete` → **Full mode** (all agents) +- `--selector` or `selector` → **Selector mode** (in-session classify + adjudicate, zero sub-agent spawns) - Otherwise → **Standard mode** (4 core agents, default) ### Step 2: Project Detection @@ -210,6 +211,31 @@ echo "{\"event\":\"per_owner_summary\",\"owners_invoked\":$owners_invoked,\"tota The timing file `/tmp/pr-review-timing.jsonl` is purely diagnostic — it lets operators answer "is Owner X worth dispatching?" with data instead of intuition. Not part of the Step 5 user-facing report; include it in the cleanup step for the review worktree so it gets removed after Step 6. +#### Selector mode: Steps 4c-sel and 4d-sel + +When the mode argument is `--selector` or `selector`: Steps 4.0, 4a, and 4b-i run unchanged. Skip Step 4b-ii. Resolve the guide and execute Steps 4c-sel and 4d-sel from it — zero sub-agent spawns. + +Run exactly this one command, once: + +```bash +GUIDE="${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/coding}/docs/selector-mode-guide.md" +[ -f "$GUIDE" ] || GUIDE="${CLAUDE_CONFIG_DIR:-$HOME/.claude}/plugins/marketplaces/coding/docs/selector-mode-guide.md" +[ -f "$GUIDE" ] && echo "GUIDE_OK: $GUIDE" || echo "GUIDE_MISSING" +``` + +If it prints `GUIDE_MISSING`: report "selector guide unavailable" as a **Must Fix toolchain failure** in Step 5 and STOP the selector path — do NOT continue with a mechanical-findings-only review presented as a complete selector review (a review without the judgment tier silently misses every judgment-tier rule; same fail-fast discipline as Step 4.0). Do NOT investigate further (no `find`, no `ls`, no path probing). + +If it prints `GUIDE_OK`: Read the file at that path, then execute its **Step 4c-sel CLASSIFY** and **Step 4d-sel ADJUDICATE** with: + +- **DIFF** = the Step 0c diff (`git diff origin/...HEAD`) +- **CANDIDATES** = the Step 4b-i ` ` output +- **MECHANICAL_FINDINGS** = `/tmp/pr-review-findings.json` +- **Working directory** = `REVIEW_DIR` + +On the guide's short-circuit condition the report line is `selector clean — no adjudication needed`. When the mode is anything else (short/standard/full), skip this section entirely. + +Include the traceability section per `docs/selector-mode-guide.md` § Traceability Report Section. + #### 4c: Context-specific conventions (kept from prior Step 2.5) Some review questions still benefit from a full-doc read even in dispatcher mode. Load these conventionally when the diff matches: @@ -253,6 +279,10 @@ The script exits non-zero if any finding's `rule_id` is not in `rules/index.json #### Nice to Have (Optional) - Style, code organization, Go patch updates, tool updates, naming conventions, copyright headers +#### Selector Mode: Classify Traceability (selector mode only) + +Include the traceability section per `docs/selector-mode-guide.md` § Traceability Report Section. + ### Step 6: Next Steps Recommendation If test coverage gaps found, suggest `/go-write-test` commands. diff --git a/docs/selector-mode-guide.md b/docs/selector-mode-guide.md new file mode 100644 index 0000000..660b521 --- /dev/null +++ b/docs/selector-mode-guide.md @@ -0,0 +1,81 @@ +## Selector Mode — Classify and Adjudicate Procedure + +Selector mode replaces Step 4b-ii's per-owner Task dispatch with two in-session steps that run in the calling command's session. Steps 4.0, 4a, and 4b-i run unchanged and produce the candidate set. The design goal is zero sub-agent spawns: every rule is evaluated inside the current session context rather than cold-starting one sub-agent per owner. + +Selector mode is opt-in (`--selector`/`selector` mode token); the default per-owner dispatch path is unchanged. + +## Inputs + +| Input | Description | +|-------|-------------| +| `DIFF` | The full diff for this review (caller-provided; see note below) | +| `CANDIDATES` | The ` ` list produced by Step 4b-i jq glob output | +| `MECHANICAL_FINDINGS` | Path to the Step 4a runner output JSON (e.g. `/tmp/pr-review-findings.json`) | +| Working directory | The directory under review (caller-provided) | + +**Diff source differs per caller**: `commands/pr-review.md` uses the Step 0c worktree diff (`git diff origin/...HEAD`); `commands/code-review.md` uses `git diff HEAD~1` (or directory diff as parsed in Step 1). + +## Step 4c-sel: CLASSIFY (in-session, no Task spawn) + +**Input**: `DIFF`, the `CANDIDATES` list, and the `applies_when` text for each candidate from `rules/index.json`. + +For each candidate rule, decide: **applicable** or **skipped** with a one-line reason (≤ 8 words). + +**Recall contract (embed verbatim)**: "INCLUDE if a reasonable reviewer would want to read this rule before judging. Do not evaluate compliance. Do not evaluate violations. When uncertain, include." + +**Skip justification rule**: a skip decision MUST be justified against the rule's `applies_when` text itself — the reason states why the `applies_when` condition does not hold for this diff. NEVER infer a rule's scope from its rule-id name or prefix (e.g. `go-testing/*` rules are NOT necessarily scoped to test files — read the `applies_when`). If the diff plausibly matches the `applies_when` condition, the rule is applicable. + +**HARD INVARIANT**: the applicable set MUST be a subset of the candidate set. Every applicable rule_id must appear in the Step 4b-i candidate list. Never add a rule the glob did not produce. + +**Architecture-tier bypass**: any candidate rule whose `enforcement` text contains "architecture" OR whose `doc_path` is `go-architecture-patterns.md` and concerns SRP/layering is unconditionally applicable — do not classify, always include it. + +**Short-circuit**: if the applicable set is empty AND the mechanical findings from Step 4a are also empty, report: + +> `selector clean — no adjudication needed` + +and skip Step 4d-sel, proceeding directly to Step 5. Include the candidate count and a note that all candidates were classified as non-applicable in the Step 5 traceability section. + +Produce a classify result: +```json +{ + "applicable": ["", "..."], + "skipped": { + "": "", + "...": "..." + } +} +``` + +## Step 4d-sel: ADJUDICATE (in-session, no Task spawn) + +**Input**: the full diff (no truncation — this is the load-bearing step), the mechanical findings from `MECHANICAL_FINDINGS`, and the applicable rules from Step 4c-sel. + +For each applicable rule: locate the rule's `doc_path` in `rules/index.json`, then read only the matching `### RULE ` block from that file (grep for the heading, read the block — do not read the whole document). + +Judge the full diff plus mechanical findings. For each violation found, emit a finding that cites `rule_id` + file + line and lands in the existing report buckets: + +- **Must Fix (Critical)** — security, context violations, concurrency bugs, data correctness, SRP (3+ concerns) +- **Should Fix (Important)** — architectural violations, error handling, factory/handler patterns, test gaps +- **Nice to Have (Optional)** — style, naming, minor version issues + +Do not emit a per-rule "passed" entry for rules with no violation — silently omit them. + +**Batching**: if the applicable set exceeds 20 rules, split adjudication into 2–3 thematic in-session passes (e.g. architecture rules first, then quality rules, then style rules). Each pass runs in the current session — still zero Task spawns. Collect all findings before proceeding to citation validation. + +**Citation validation** (invoked directly — selector mode spawns NO sub-agents, including `coding:simple-bash-runner`): run the validator as a plain Bash call over the adjudication findings before consolidation — findings citing a `rule_id` absent from `rules/index.json` are dropped and logged to stderr. Resolve the script path via the plugin install chain (the working directory may not be the plugin checkout): + +```bash +VALIDATOR="${CLAUDE_PLUGIN_ROOT:-$HOME/.claude/plugins/marketplaces/coding}/scripts/validate-citations.sh" +[ -x "$VALIDATOR" ] || VALIDATOR="${CLAUDE_CONFIG_DIR:-$HOME/.claude}/plugins/marketplaces/coding/scripts/validate-citations.sh" +bash "$VALIDATOR" +``` + +## Traceability Report Section + +Include this section in the Step 5 report only when the review ran in selector mode. List counts and every classify skip so operators can spot false drops: + +- **Candidates**: `` rules matched by Step 4b-i glob filter +- **Applicable**: `` rules selected by Step 4c-sel (M ≤ N) +- **Skipped** (one line each): + - `` → `` + - … diff --git a/llms.txt b/llms.txt index 30e2ff2..2a89943 100644 --- a/llms.txt +++ b/llms.txt @@ -73,6 +73,7 @@ - [Skill Writing](docs/claude-code-skill-writing-guide.md): Structure and conventions for Claude Code skill directories - [Rule Block Schema](docs/rule-block-schema.md): `### RULE` block contract and index schema - [ast-grep Rule Writing Guide](docs/ast-grep-rule-writing-guide.md): ast-grep YAML conventions for mechanical rule enforcement +- [Selector Mode Guide](docs/selector-mode-guide.md): In-session classify + adjudicate procedure for `--selector` mode (single source of truth; commands carry thin pointers) ## Frontend - [Vue 3 + TypeScript](docs/vue3-typescript-frontend-guide.md): Composition API, Vite, Vitest diff --git a/specs/fixtures/golden-legacy-verdict.json b/specs/fixtures/golden-legacy-verdict.json new file mode 100644 index 0000000..db5a535 --- /dev/null +++ b/specs/fixtures/golden-legacy-verdict.json @@ -0,0 +1,124 @@ +{ + "fixture": "bborbe/maintainer#2", + "captured": "2026-06-11T15:47:39Z", + "owners_spawned": [ + "agent-auditor", + "go-architecture-assistant", + "go-error-assistant", + "go-http-handler-assistant", + "go-metrics-assistant", + "go-quality-assistant", + "go-security-specialist", + "go-test-quality-assistant", + "go-time-assistant", + "godoc-assistant", + "license-assistant" + ], + "findings": [ + { + "rule_id": "go-architecture/constructor-returns-interface", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 18, + "severity": "critical", + "owner": "go-architecture-assistant", + "message": "NewService1() returns *Service1 (concrete pointer) instead of an interface. The package declares no interface that Service1 implements, so the constructor leaks the concrete type to callers. Fix: declare a Service1 interface (e.g. `type Service1Iface interface { ... }`) in the same package and change the return type to it." + }, + { + "rule_id": "go-architecture/no-globals-or-singletons", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 22, + "severity": "critical", + "owner": "go-architecture-assistant", + "message": "Package-level `var sharedService1 = NewService1()` creates a singleton dependency. This breaks test parallelism and hides the dependency graph. Fix: remove the package-level var and inject the dependency via constructor parameters." + }, + { + "rule_id": "go-concurrency/no-raw-go-func", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 33, + "severity": "critical", + "owner": "go-architecture-assistant", + "message": "Raw `go func() { ... }()` is used outside a main entry point. Use `github.com/bborbe/run` strategies (e.g. `run.CancelOnFirstErrorWait`, `run.All`) instead. Raw goroutines require hand-rolled sync and risk leaks and races." + }, + { + "rule_id": "go-errors/no-fmt-errorf", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 37, + "severity": "critical", + "owner": "go-error-assistant", + "message": "fmt.Errorf is forbidden in production code. Replace with `errors.Errorf(\"boom: %d\", 42)` from `github.com/bborbe/errors` (new error) or `errors.Wrapf(ctx, err, ...)` (wrapping an existing error). Rule go-errors/no-fmt-errorf." + }, + { + "rule_id": "go-time/no-time-now-direct", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 29, + "severity": "critical", + "owner": "go-time-assistant", + "message": "time.Now() is called directly in production code. Inject `libtime.CurrentDateTimeGetter` and call its `Now(ctx)` method. Direct time.Now() makes tests non-deterministic. Rule go-time/no-time-now-direct." + }, + { + "rule_id": "go-time/inject-getter-not-create", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 29, + "severity": "critical", + "owner": "go-time-assistant", + "message": "The Now() function creates a time value directly rather than delegating to an injected getter. This is the same root violation as no-time-now-direct. Fix: replace with an injected libtime.CurrentDateTimeGetter parameter. Rule go-time/inject-getter-not-create." + }, + { + "rule_id": "go-library/semver-vprefix-tag-required", + "file": ".", + "line": 0, + "severity": "major", + "owner": "go-quality-assistant", + "message": "No vMAJOR.MINOR.PATCH git tags found in the repo. This is expected for this fixture branch (delete-this-pr-never, never merged), but noted per rule go-library/semver-vprefix-tag-required. Non-blocking for a test fixture." + }, + { + "rule_id": "go-doc/exported-item-must-have-comment", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 16, + "severity": "major", + "owner": "godoc-assistant", + "message": "Exported type Service1 has a comment, but exported functions Now(), Run(), Boom() lack proper standalone doc comments (the inline comments on the line above describe the violation, not the function's purpose). Rule go-doc/exported-item-must-have-comment. Fix: add `// Now returns...`, `// Run starts...`, `// Boom returns...` doc comments." + }, + { + "rule_id": "go-doc/comment-starts-with-name", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 17, + "severity": "nit", + "owner": "godoc-assistant", + "message": "The comment above Service1 says `// Service1 + NewService1: violates...` — starts with `Service1` correctly, but the comment describes the rule violation rather than the type's purpose. Per go-doc/comment-starts-with-name, doc comments should start with the identifier name and describe the type. Exempt as test fixture." + }, + { + "rule_id": "go-doc/package-comment-in-doc-go", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 1, + "severity": "nit", + "owner": "godoc-assistant", + "message": "The package comment lives in violations.go rather than a dedicated doc.go. Rule go-doc/package-comment-in-doc-go. For a test fixture this is not blocking, but the pattern should be corrected in production code." + }, + { + "rule_id": "go-licensing/source-file-header-required", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 1, + "severity": "major", + "owner": "license-assistant", + "message": "violations.go is missing the BSD-2-Clause 3-line copyright header. All public Go files outside vendor/ require this header per go-licensing/source-file-header-required. Fix: add the standard `// Copyright (c) 2026 Benjamin Borbe All rights reserved. // Use of this source code is governed by a BSD-style // license that can be found in the LICENSE file.` block." + }, + { + "rule_id": "git-commit/imperative-mood", + "file": "@commits", + "line": 0, + "severity": "nit", + "owner": "agent-auditor", + "message": "Commit subject: `test(fixture): add scenario-004 violations fixture for bborbe/coding acceptance walks`. Subject starts with `test(fixture): add` — imperative mood is correct. No violation found for go-commit/imperative-mood." + }, + { + "rule_id": "go-testing/libtime-injection-required", + "file": "pkg/scenarios-test-fixture/violations.go", + "line": 29, + "severity": "major", + "owner": "go-test-quality-assistant", + "message": "The Now() function uses time.Now() directly, which makes any test of this function non-deterministic. Rule go-testing/libtime-injection-required requires injecting libtime.CurrentDateTimeGetter so tests can supply a fixed time. Fix: add a TimeGetter parameter to Now() and call getter.Now(ctx)." + } + ], + "summary": "5 of 5 intended mechanical violations confirmed as Critical. Additional findings: missing BSD-2-Clause copyright header (Major), missing semver tag for shallow clone (Major), exported function doc comment gaps (Major), go-doc/package-comment-in-doc-go (Nit), git-commit/imperative-mood clean (no violation). Total: 6 Critical, 4 Major, 3 Nit. Owners spawned: 11. All finding rule_ids validated against rules/index.json — zero citation failures." +} diff --git a/specs/selector-mode-classify-adjudicate.md b/specs/selector-mode-classify-adjudicate.md new file mode 100644 index 0000000..898421f --- /dev/null +++ b/specs/selector-mode-classify-adjudicate.md @@ -0,0 +1,109 @@ +--- +tags: + - dark-factory + - spec +status: draft +--- + +## Summary + +- Add an opt-in `--selector` review mode to both `commands/pr-review.md` and `commands/code-review.md` that replaces per-owner sub-agent fanout with two in-session steps: CLASSIFY (narrow the glob candidates) then ADJUDICATE (judge the diff against only the applicable rule blocks). +- This is migration Step 1 of the AI-Selector Review Redesign: flagged, default OFF — without `--selector` the dispatcher behaves byte-for-byte as today. +- The existing jq glob pre-filter (Step 4b-i) STAYS as the candidate source; classify may only narrow it (`applicable ⊆ candidates`), never extend it. +- Selector mode spawns ZERO sub-agents; classify and adjudicate run as turns in the current session. +- Validation target: the bborbe/maintainer#2 fixture must surface all 13 frozen golden rule_ids at matching severities, and default-mode scenarios 002/004 must still pass unchanged. + +## Problem + +The dispatcher's Step 4b-ii spawns one `Task(coding:)` sub-agent per affected owner. This is the last LLM cost that scales with rule count: each owner is a cold-start sub-agent that re-reads rules and re-reads the diff, costing ~4 min of wall time apiece (observed 12-15 min for a single-file markdown PR with 2 owners; ~40 min for a standard 8-owner Go PR). The fanout also fires on glob-truthful but semantically-irrelevant matches (a typo edit in a command's description still loads the full agent-auditor context to conclude "nothing to do"). We need an in-session path that decouples review wall-time from rule count, but we must ship it behind a flag so the current behavior — which the bot and the acceptance scenarios depend on — does not change until the new path is validated on the golden fixture. + +## Goal + +Both review commands accept a `--selector` mode argument. When it is active, Step 4b-ii's per-owner Task dispatch is replaced by an in-session CLASSIFY step followed by an in-session ADJUDICATE step, with zero sub-agent spawns, and the existing jq glob pre-filter and citation validator both still run. When `--selector` is absent, the dispatcher executes its current logic unchanged. On the bborbe/maintainer#2 fixture, selector mode surfaces the same rule_ids at the same severities as the frozen legacy verdict. + +## Non-goals + +- Do NOT flip the default mode — selector is opt-in only; the default path is untouched (later migration step). +- Do NOT remove or alter the per-owner Task dispatch code path — it remains the default and the fallback (later migration step). +- Do NOT rewrite scenarios 002 or 004 — they assert default-mode behavior, which this spec leaves unchanged. +- Do NOT add `style_summary` extraction to `build-index.py` or `agents/index.json` (later step). +- Do NOT change any bot-side prompt in bborbe/maintainer (separate repo). +- Do NOT add new trigger types (`symbols:` / `imports:`) — later hardening wave. +- Do NOT author missing `### RULE` blocks — content backlog, separate spec. +- Do NOT add a tunable batch-size threshold knob — the >20-rule batching split is a fixed in-prose rule; if a future consumer demands a configurable threshold, that's a separate spec. + +## Desired Behavior + +1. Both commands accept `--selector` as a recognized mode token. When present, the dispatcher enters selector mode for Step 4; when absent, Step 4 runs exactly as today (short/standard/full unchanged). +2. In selector mode, Step 4a (mechanical funnel) and Step 4b-i (jq glob match producing the candidate rule list) run unchanged and produce the candidate set the same way they do today. +3. A new **Step 4c-sel CLASSIFY** runs in-session (no spawn): input is the diff plus the Step 4b-i candidate rule list. It decides, per candidate, applicable or skipped-with-reason, under a recall-optimized contract ("INCLUDE if a reasonable reviewer would want to read this rule before judging. Do not evaluate compliance. When uncertain, include."). The applicable set is a subset of the candidate set — never a superset. (Step labels carry a `-sel` suffix because the default branch already owns `4c` = context conventions and `4d` = citation validation; the selector branch is a parallel namespace and must not collide.) +4. Architecture-tier rules bypass classify and are always applicable: a rule whose `enforcement` text contains "architecture", OR whose doc lives in `go-architecture-patterns.md` and belongs to the SRP/layering family, is unconditionally included whenever it is in the candidate set. +5. If the applicable set AND the mechanical findings are both empty, the dispatcher reports the literal short-circuit `selector clean — no adjudication needed` and skips to the Step 5 report (no adjudication turn). +6. A new **Step 4d-sel ADJUDICATE** runs in-session (no spawn): it reads ONLY the `### RULE` blocks of the applicable rules from their `doc_path` files (not whole docs), judges the full diff plus the mechanical findings, and emits findings each citing `rule_id` + file + line + severity bucket — the files' existing report buckets **Must Fix (Critical) / Should Fix (Important) / Nice to Have (Optional)**. If the applicable set exceeds 20 rules, adjudication splits into 2-3 thematic in-session passes (still zero spawns). +7. Step 4d citation validation (`scripts/validate-citations.sh`) runs unchanged over the selector-mode findings, dropping any finding whose `rule_id` is not in `rules/index.json`. +8. The Step 5 report includes a traceability section listing each classify skip-reason as `rule_id → one-line reason`. + +## Constraints + +- Markdown-only plugin repo: edits are confined to `commands/pr-review.md`, `commands/code-review.md`, `docs/selector-mode-guide.md`, and a `## Unreleased` CHANGELOG.md bullet. No Go, no script changes, no new scripts. +- The selector-mode classify/adjudicate contracts now live in `docs/selector-mode-guide.md` (single source of truth); the command files carry thin pointers that resolve and execute the guide. AC6 literals (`--selector`, `selector clean — no adjudication needed`) remain in both command files. +- The jq glob pre-filter (Step 4b-i) prose and behavior MUST NOT change — it is the candidate source and the structural guarantee that classify can only narrow. +- `scripts/validate-citations.sh` is invoked unchanged; its contract (drop findings whose `rule_id` is absent from `rules/index.json`) is frozen. +- The default (no-flag) Step 4 logic — early exit, 4.0 preflight, 4a runner, 4b-i jq, 4b-ii per-owner dispatch, timing instrumentation, 4c conventions, 4d citation — MUST remain byte-for-byte as today; selector mode is added alongside, not by editing the existing branch. +- The two command files' selector sections MUST stay thin-pointer siblings: same step numbering references (4c-sel CLASSIFY, 4d-sel ADJUDICATE), same guide resolution chain, same short-circuit string, differing only in diff source and findings path. +- `docs/dod.md` is the repo's Definition-of-Done gate; the implementation must satisfy it (CHANGELOG `## Unreleased` entry present; no personal paths introduced; `coding:` prefix on any agent reference). +- Severity vocabulary is the files' existing report buckets — **Must Fix (Critical) / Should Fix (Important) / Nice to Have (Optional)**. Do not introduce new severity names. For AC1's golden comparison apply the deterministic map Critical↔`critical`, Important↔`major`, Optional↔`nit` (the golden JSON uses the bot's verdict schema vocabulary). +- The golden baseline is committed in-repo at `specs/fixtures/golden-legacy-verdict.json` (13 findings frozen from the legacy per-owner pipeline on bborbe/maintainer#2, captured 2026-06-11) — AC1 is verifiable by anyone from this file. The broader architecture rationale (v2/v3 design evolution, external review) lives in the author's design notes; everything binding for implementation is in this spec. + +## Failure Modes + +| Trigger | Expected behavior | Recovery | Detection | +|---------|-------------------|----------|-----------| +| Mechanical funnel (`ast-grep-runner.sh`) missing/fails in selector mode | Same as today: note "mechanical funnel unavailable" and proceed to classify with judgment-rule candidates only | Re-run with toolchain present | Report line "mechanical funnel unavailable" | +| jq glob match (4b-i) yields empty candidate set AND mechanical findings empty | Short-circuit: report `selector clean — no adjudication needed`, skip to Step 5 | None needed — clean diff | Literal short-circuit string in report | +| Classify attempts to add a rule absent from the candidate set | Forbidden by invariant — applicable MUST stay a subset; the extra rule is dropped | Operator re-reads traceability section; invariant holds structurally | Report shows applicable count ≤ candidate count; every applicable id is in candidate list | +| Applicable set > 20 rules | Adjudication splits into 2-3 thematic in-session passes, zero spawns | None — batching is automatic | Report shows multiple adjudication passes, no `subagent_type` events | +| Adjudicate emits a finding citing a `rule_id` absent from `rules/index.json` | `validate-citations.sh` drops it; dispatcher logs the offender to stderr and continues with the validated subset | Author the missing rule block (separate backlog) | validate-citations.sh stderr log; finding absent from final report | +| Diff exceeds context budget on a mega-PR | Adjudicate consumes the diff truncated to changed-files-only (vendor/node_modules already excluded at Step 0c) | Re-run full mode for an exhaustive sweep | Report notes truncation | + +## Security / Abuse Cases + +Not user-facing input — the dispatcher operates on a diff already fetched into a controlled worktree, and the candidate set is bounded by the deterministic jq glob match. The only trust boundary is the LLM's adjudication output, which `validate-citations.sh` already gates against `rules/index.json` (hallucinated `rule_id`s are dropped). The narrowing invariant (`applicable ⊆ candidates`) structurally prevents the classify turn from injecting rules outside the deterministic pre-filter. No new attacker-controllable surface is introduced. + +## Acceptance Criteria + +- [ ] AC1: `--selector` mode on the bborbe/maintainer#2 fixture surfaces all 13 golden rule_ids at matching severities (the `go-library/semver-vprefix-tag-required` anomaly may be excluded when the clone carries tags) — evidence: selector run findings diffed against `golden-legacy-verdict.json` per rule_id; every golden rule_id present at its golden severity. +- [ ] AC2: zero Task/sub-agent spawns occur in `--selector` mode — evidence: session transcript / stdout contains no `tool_use` block whose `subagent_type` starts with `coding:` (`grep '"subagent_type"' ` returns no `coding:*` lines). +- [ ] AC3: default mode (no flag) behavior unchanged — evidence: `make check-acceptance` exits 0 with scenarios 002 and 004 assertions holding verbatim; `git diff` shows no edits to any `scenarios/*.md` file. +- [ ] AC4: classify respects the narrowing invariant — evidence: the Step 5 traceability/report shows a candidate count and an applicable count with candidate ≥ applicable, and every applicable rule_id also appears in the Step 4b-i candidate list. +- [ ] AC5: `make precommit` exits 0 (check-links, check-json, check-index, check-coverage, check-acceptance all green) — evidence: exit code 0. +- [ ] AC6: both `commands/pr-review.md` and `commands/code-review.md` define a `--selector` mode with Step 4c-sel CLASSIFY and Step 4d-sel ADJUDICATE and the literal short-circuit `selector clean — no adjudication needed` — evidence: `grep -n -- '--selector' commands/pr-review.md commands/code-review.md` and `grep -n 'selector clean — no adjudication needed' commands/pr-review.md commands/code-review.md` each return ≥1 line per file. +- [ ] AC7: CHANGELOG.md has a `## Unreleased` bullet describing the selector mode — evidence: `grep -n -A20 '## Unreleased' CHANGELOG.md` shows a bullet mentioning `--selector`. +- [ ] AC8: classify actually narrows on the fixture (anti-no-op guard) — evidence: on the bborbe/maintainer#2 walk, the Step 5 traceability section contains ≥1 `rule_id → skip-reason` entry and the candidate count strictly exceeds the applicable count. (The legacy golden shows 31 glob-matched judgment rules but only ~13 findings-relevant ones — a working classify must skip at least one candidate with a reason.) + +Scenario coverage: NO new scenario. The behavior is reachable by the existing acceptance harness plus a manual fixture walk; default-mode regression is already locked by scenarios 002/004, which AC3 keeps green. Adding an E2E scenario for the opt-in path is premature before the default flips (a later migration step explicitly owns the 002/004 rewrites). + +## Verification + +``` +make precommit +grep -n -- '--selector' commands/pr-review.md commands/code-review.md +grep -n 'selector clean — no adjudication needed' commands/pr-review.md commands/code-review.md +grep -n -A20 '## Unreleased' CHANGELOG.md +``` + +Expected: `make precommit` exits 0; both greps over the command files return ≥1 line per file; the CHANGELOG grep shows an Unreleased bullet naming `--selector`. Manual fixture walk on bborbe/maintainer#2 confirms AC1, AC2, AC4 against `golden-legacy-verdict.json`. + +## Suggested Decomposition + +| # | Prompt focus | Covers DBs | Covers ACs | Depends on | +|---|---|---|---|---| +| 1 | Add `--selector` mode + Step 4c CLASSIFY + Step 4d ADJUDICATE to `commands/pr-review.md` (mode parse, candidate reuse, classify contract, architecture bypass, short-circuit, adjudicate rule-block read + batching, citation reuse, traceability) + CHANGELOG Unreleased bullet | 1-8 | AC2, AC4, AC6, AC7 | — | +| 2 | Mirror the identical selector section into `commands/code-review.md` as a sibling (same step numbers, contracts, short-circuit string; adapt only `directory`/in-place diff source) | 1-8 | AC6 | prompt 1 | +| 3 | Fixture validation walk on bborbe/maintainer#2 vs `golden-legacy-verdict.json` + full precommit/acceptance gate | — | AC1, AC3, AC5 | prompts 1-2 | + +Rationale: prompt 1 establishes the canonical selector prose in pr-review.md; prompt 2 copies it verbatim into the sibling so they cannot drift; prompt 3 validates against the frozen golden and confirms the default path did not regress. Ordering avoids two prompts editing overlapping prose, and keeps the sibling-consistency constraint enforceable by diffing prompt 2's output against prompt 1's. + +## Do-Nothing Option + +If we don't do this, per-owner Task fanout remains the only review path: wall-time stays coupled to rule count (~40 min on a standard 8-owner Go PR, growing as the rule base grows toward 120+), and the bot pays 2 + N calls per PR. The current path is correct and acceptable for now — that's why this ships flagged and default-OFF — but it is the bottleneck the redesign exists to remove. Shipping Step 1 behind a flag is the lowest-risk way to validate the in-session path against the golden before flipping the default in a later step.