Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,12 @@ Please choose versions by [Semantic Versioning](http://semver.org/).
* MINOR version when you add functionality in a backwards-compatible manner, and
* PATCH version when you make backwards-compatible bug fixes.

## Unreleased

- fix: ast-grep-runner.sh excludes `.git/` paths from both input file list and findings output — incident 2026-06-11 surfaced 274 stale agent-auditor findings from `.git/COMMIT_EDITMSG` during selector validation
- feat: scenarios/005 (selector clean short-circuit) and scenarios/006 (selector findings path) — two new E2E scenarios (walked 2026-06-11, promoted to active) contracting the selector-mode journeys for a README-only diff and the perpetual fixture PR
- feat: acceptance.sh section 5 "Selector mode contracts" — 16 scripted checks covering --selector parse, short-circuit string, GUIDE_OK/GUIDE_MISSING fail-fast, selector-mode-guide.md content, sibling consistency, citation-validator rejection of unknown rule_ids, and runner .git/ exclusion

## v0.20.0

- feat: add opt-in `--selector` mode to `/coding:pr-review` and `/coding:code-review` — replaces per-owner Task fanout (Step 4b-ii) with two in-session steps: Step 4c-sel CLASSIFY (narrows glob candidates via recall-optimized contract) and Step 4d-sel ADJUDICATE (reads only applicable rule blocks, emits findings in existing severity buckets); zero sub-agent spawns; default mode unchanged
Expand Down
2 changes: 2 additions & 0 deletions scenarios/002-clean-pr-zero-findings.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,3 +39,5 @@ Updated for funnel v2 (PR #48): runner output now lands in `/tmp/code-review-fin
---

**Status**: updated for funnel v2 (PR #48), not yet walked. Requires a live Claude Code session; mark as walked once a session produces the stream-json transcript. Key new assertions vs v1: (1) findings path is `/tmp/code-review-findings.json` not an awk-extracted stdout block; (2) "funnel clean — no adjudication needed" message asserted; (3) zero `coding:*-assistant` subagent spawns verified from transcript.

Selector-mode sibling journey: scenarios/005. These two files keep guarding the legacy default path until the default flip retires it.
2 changes: 2 additions & 0 deletions scenarios/004-findings-exist-path.md
Original file line number Diff line number Diff line change
Expand Up @@ -136,3 +136,5 @@ go-time-assistant: AGENT_PRESENT
All 7 Expected items: **7/7 PASS**

Baseline tuple: `(findings_count=6, distinct_rule_ids=6, distinct_owners=4)`.

Selector-mode sibling journey: scenarios/006. These two files keep guarding the legacy default path until the default flip retires it.
38 changes: 38 additions & 0 deletions scenarios/005-selector-clean-short-circuit.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
---
status: active
---

# Scenario 005: Selector mode short-circuits cleanly on a README-only whitespace diff

Validates that `/coding:code-review master selector` (in-place review — works on a local-only branch, mirroring scenario 002) against a branch whose diff is a single README.md whitespace edit emits `selector clean — no adjudication needed`, produces a report with empty Must Fix / Should Fix / Nice to Have sections, and spawns zero sub-agents — locking down the regression risk that the classify step incorrectly marks any rule as applicable when no rule-relevant files changed.

## Setup

- [ ] Clone master at the current HEAD: `WORK=$(mktemp -d) && cd "$WORK" && git clone --depth=1 --branch master git@github.com:bborbe/coding.git . && git checkout -b selector-clean-fixture`
- [ ] Apply a README-only whitespace edit (no `.go` / `.py` / `rules/**` / `docs/**` touched): `sed -i.bak 's/^# bborbe\/coding$/# bborbe\/coding /' README.md && rm README.md.bak && git add README.md && git commit -qm 'docs: whitespace typo'`
- [ ] `git diff --name-only master..HEAD` prints exactly one line: `README.md`
- [ ] `ast-grep --version` resolves on host (Step 4.0 preflight will pass)

## Action

- [ ] Run `/coding:code-review master selector` in `$WORK` (in-place; the fixture branch is local-only, which `/coding:pr-review`'s worktree flow does not support — that journey is scenario 006's) in a fresh Claude Code session; tee stdout to `/tmp/scen005-stdout.log`, stderr to `/tmp/scen005-stderr.log`, capture exit code to `/tmp/scen005-exit`

## Expected

- [ ] `cat /tmp/scen005-exit` prints `0`
- [ ] Guide resolution prints `GUIDE_OK` (not `GUIDE_MISSING`) — verify via: `grep -c 'GUIDE_OK' /tmp/scen005-stdout.log` ≥ 1
- [ ] Stdout contains the literal string `selector clean — no adjudication needed` — the Step 4c-sel short-circuit fires because no candidate rule is applicable to a README-only diff: `grep -c 'selector clean — no adjudication needed' /tmp/scen005-stdout.log` ≥ 1
- [ ] Traceability section is present in the report with a candidate count ≥ 0: `grep -c 'Candidates:' /tmp/scen005-stdout.log` ≥ 1; every line in the Skipped subsection carries a one-line reason (≤ 8 words) for each candidate that was evaluated
- [ ] Zero `coding:*` sub-agent spawns: `grep '"subagent_type"' /tmp/scen005-stdout.log` returns no lines, OR the session's stream-json transcript has no `tool_use` blocks whose `subagent_type` field starts with `coding:`
- [ ] Zero findings reported: either all three severity sections (`Must Fix` / `Should Fix` / `Nice to Have`) read `None.`, or the report carries an explicit funnel-clean/selector-clean statement with no finding entries (observable outcome over exact formatting; M2.7 walk 2026-06-11 used the latter form)
- [ ] Report includes `Should Fix (Important)` section with body `None.`
- [ ] Report includes `Nice to Have (Optional)` section with body `None.`
- [ ] Run completes in under 10 minutes (wall clock `real` < 10m)

## Cleanup

- `rm -rf "$WORK" /tmp/scen005-*`

---

**Status**: walked 2026-06-11 (MiniMax-M2.7-highspeed via an isolated `CLAUDE_CONFIG_DIR` (plugin clone pinned to the branch under test), skill @ f93717e): 1m29s, 17 turns, zero `coding:*` spawns, GUIDE_OK ×2 in stream, classify table with per-candidate `applies_when` + reasons, 5 candidates → 0 applicable → 5 skipped, literal `selector clean — no adjudication needed` fired. Transcript: `/tmp/scen005-walk2.jsonl` (volatile; key numbers recorded here). Promoted draft → active. Note: original draft used `/coding:pr-review` — switched to `/coding:code-review` (in-place) because the fixture branch is local-only; pr-review's worktree flow needs an origin branch (that journey is scenario 006's).
54 changes: 54 additions & 0 deletions scenarios/006-selector-findings-path.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
---
status: active
---

# Scenario 006: Selector mode surfaces all seeded violations on the perpetual fixture PR

Validates that `/coding:pr-review master selector` against `bborbe/maintainer#2` surfaces all 5 deliberate mechanical violations by name in the adjudication report, emits at least 1 judgment-tier finding beyond the mechanical 5, produces a traceability section with ≥ 1 skipped candidate carrying a reason, and spawns zero sub-agents — locking down the regression risk that selector mode silently drops owners or skips the adjudication step on a real multi-violation diff.

## Test PR

The stable fixture is [bborbe/maintainer#2](https://github.com/bborbe/maintainer/pull/2) (branch `delete-this-pr-never`, title `test: delete-this-pr-never`). The diff is one file (`pkg/scenarios-test-fixture/violations.go`, 38 additions) committed to violate 5 ast-grep YAML rules:

- `go-architecture/constructor-returns-interface`
- `go-architecture/no-globals-or-singletons`
- `go-time/no-time-now-direct`
- `go-concurrency/no-raw-go-func`
- `go-errors/no-fmt-errorf`

The PR stays open in perpetuity. The baseline for expected finding shape is `specs/fixtures/golden-legacy-verdict.json`.

## Setup

- [ ] `ast-grep --version` resolves on host
- [ ] Confirm the PR is still open and shows exactly 1 changed file: `gh pr view 2 --repo bborbe/maintainer --json state,changedFiles -q '"state=\(.state) changedFiles=\(.changedFiles)"'` prints `state=OPEN changedFiles=1`
- [ ] Clone the PR branch to a temporary worktree (mirrors `commands/pr-review.md` Step 0b): `WORK=$(mktemp -d) && cd "$WORK" && git clone --depth=1 --branch delete-this-pr-never git@github.com:bborbe/maintainer.git . && git remote update --prune`
- [ ] Confirm fixture file landed: `test -f pkg/scenarios-test-fixture/violations.go`

## Action

- [ ] Run `/coding:pr-review master selector` against `$WORK` in a fresh Claude Code session; tee stdout to `/tmp/scen006-stdout.log`, stderr to `/tmp/scen006-stderr.log`, capture exit code to `/tmp/scen006-exit`

## Expected

- [ ] `cat /tmp/scen006-exit` prints `0`
- [ ] Guide resolution prints `GUIDE_OK` (not `GUIDE_MISSING`): `grep -c 'GUIDE_OK' /tmp/scen006-stdout.log` ≥ 1
- [ ] All 5 seeded mechanical violations named in the report by rule_id — verify each is present in stdout:
- `grep -c 'go-architecture/constructor-returns-interface' /tmp/scen006-stdout.log` ≥ 1
- `grep -c 'go-architecture/no-globals-or-singletons' /tmp/scen006-stdout.log` ≥ 1
- `grep -c 'go-time/no-time-now-direct' /tmp/scen006-stdout.log` ≥ 1
- `grep -c 'go-concurrency/no-raw-go-func' /tmp/scen006-stdout.log` ≥ 1
- `grep -c 'go-errors/no-fmt-errorf' /tmp/scen006-stdout.log` ≥ 1
- [ ] At least 1 judgment-tier finding beyond the mechanical 5 is reported (e.g. `go-licensing/source-file-header-required` per `specs/fixtures/golden-legacy-verdict.json`): total finding count in stdout > 5, or a non-mechanical rule_id appears in the report
- [ ] Traceability section present with candidates ≥ applicable: `grep -c 'Candidates:' /tmp/scen006-stdout.log` ≥ 1; at least 1 skipped candidate carries a one-line reason — `grep -c 'Skipped' /tmp/scen006-stdout.log` ≥ 1 (anti-no-op guard: a review that marks every candidate applicable would not satisfy this)
- [ ] Zero `coding:*` sub-agent spawns: `grep '"subagent_type"' /tmp/scen006-stdout.log` returns no lines, OR the session's stream-json transcript has no `tool_use` blocks whose `subagent_type` field starts with `coding:`
- [ ] Every finding's `rule_id` is present in `rules/index.json` (citation discipline holds): extract rule_ids from the report, cross-check against index — zero hallucinated rule_ids
- [ ] Run completes in under 10 minutes (wall clock `real` < 10m)

## Cleanup

- `rm -rf "$WORK" /tmp/scen006-*`

---

**Status**: walked 2026-06-11 (MiniMax-M2.7-highspeed via an isolated `CLAUDE_CONFIG_DIR` (plugin clone pinned to the branch under test), branch `feature/selector-scenarios` @ f93717e): 3m21s, 30 turns, zero `coding:*` spawns, GUIDE_OK, all 5 seeded rule_ids + `go-licensing/source-file-header-required` (judgment tier) surfaced, traceability 31 candidates → 5 applicable → 26 skipped-with-reasons. Transcript: `/tmp/scen006-walk.jsonl` (volatile; key numbers recorded here). Promoted draft → active.
126 changes: 122 additions & 4 deletions scripts/acceptance.sh
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,7 @@ ACCEPTANCE_FAIL=0
ok() { ACCEPTANCE_PASS=$((ACCEPTANCE_PASS+1)); printf " ✅ %s\n" "$1"; }
fail() { ACCEPTANCE_FAIL=$((ACCEPTANCE_FAIL+1)); printf " ❌ %s\n" "$1"; }

echo "=== 1/4 Mode coverage ==="
echo "=== 1/5 Mode coverage ==="

# Short mode in both dispatcher commands must skip Step 4 entirely (no runner agent).
if grep -qE "Short Mode.*No agents|Short Mode.*skip" commands/pr-review.md && \
Expand Down Expand Up @@ -76,7 +76,7 @@ else
fail "full mode references only $conditional_agents legacy-path agents (< 4) in code-review.md"
fi

echo "=== 2/4 Per-language routing ==="
echo "=== 2/5 Per-language routing ==="

# Per-Owner adjudication block present (Step 4b) — the routing surface.
if grep -qE "Per-Owner adjudication|coding:<owner>|findings_by_owner" commands/code-review.md && \
Expand Down Expand Up @@ -112,7 +112,7 @@ else
fail "cross-language owner leak: go→python=$go_python_leak, python→go=$python_go_leak"
fi

echo "=== 3/4 Context loading (Step 2.5 globs) ==="
echo "=== 3/5 Context loading (Step 2.5 globs) ==="

# Each canonical context-doc mapping must be referenced in both dispatcher commands.
for mapping in "teamvault-conventions.md" "go-k8s-binary-conventions.md" "k8s-manifest-guide.md" "changelog-guide.md"; do
Expand All @@ -123,7 +123,7 @@ for mapping in "teamvault-conventions.md" "go-k8s-binary-conventions.md" "k8s-ma
fi
done

echo "=== 4/4 Broken-YAML isolation ==="
echo "=== 4/5 Broken-YAML isolation ==="

# Construct a sandbox: one valid YAML + one syntactically broken YAML.
# Verify ast-grep errors on the broken one but the valid one still surfaces findings.
Expand Down Expand Up @@ -160,6 +160,124 @@ else
fail "good YAML emitted zero findings — broken YAML may be masking the others when run together"
fi

echo "=== 5/5 Selector mode contracts ==="

# (a) Both command files reference --selector token in Step 1 parse, the
# short-circuit string, and the GUIDE_OK/GUIDE_MISSING fail-fast block.
for cmd in commands/pr-review.md commands/code-review.md; do
if grep -qE '\-\-selector|selector.*mode' "$cmd"; then
ok "$cmd: --selector token present in Step 1 parse"
else
fail "$cmd: --selector token missing from Step 1 parse"
fi
if grep -qF 'selector clean — no adjudication needed' "$cmd"; then
ok "$cmd: short-circuit string 'selector clean — no adjudication needed' present"
else
fail "$cmd: short-circuit string 'selector clean — no adjudication needed' missing"
fi
if grep -qF 'GUIDE_OK' "$cmd" && grep -qF 'GUIDE_MISSING' "$cmd"; then
ok "$cmd: GUIDE_OK/GUIDE_MISSING fail-fast block present"
else
fail "$cmd: GUIDE_OK/GUIDE_MISSING fail-fast block missing"
fi
done

# (b) docs/selector-mode-guide.md exists and contains the required sections
# and the verbatim recall contract sentence.
SELECTOR_GUIDE="docs/selector-mode-guide.md"
if [ -f "$SELECTOR_GUIDE" ]; then
ok "docs/selector-mode-guide.md exists"
else
fail "docs/selector-mode-guide.md missing"
fi
if grep -qF 'Step 4c-sel' "$SELECTOR_GUIDE" && grep -qF 'CLASSIFY' "$SELECTOR_GUIDE"; then
ok "selector guide contains Step 4c-sel CLASSIFY"
else
fail "selector guide missing Step 4c-sel CLASSIFY"
fi
if grep -qF 'Step 4d-sel' "$SELECTOR_GUIDE" && grep -qF 'ADJUDICATE' "$SELECTOR_GUIDE"; then
ok "selector guide contains Step 4d-sel ADJUDICATE"
else
fail "selector guide missing Step 4d-sel ADJUDICATE"
fi
if grep -qF 'When uncertain, include.' "$SELECTOR_GUIDE"; then
ok "selector guide contains verbatim recall contract sentence 'When uncertain, include.'"
else
fail "selector guide missing verbatim recall contract sentence 'When uncertain, include.'"
fi

# (c) Sibling consistency: both pr-review.md and code-review.md reference the
# same guide filename and the same step labels (4c-sel, 4d-sel).
pr_guide=$(grep -oE 'selector-mode-guide\.md' commands/pr-review.md | head -1)
cr_guide=$(grep -oE 'selector-mode-guide\.md' commands/code-review.md | head -1)
if [ "$pr_guide" = "selector-mode-guide.md" ] && [ "$cr_guide" = "selector-mode-guide.md" ]; then
ok "both commands reference the same guide filename: selector-mode-guide.md"
else
fail "commands reference different guide filenames: pr-review='$pr_guide' code-review='$cr_guide'"
fi
for label in '4c-sel' '4d-sel'; do
pr_has=$(grep -c "$label" commands/pr-review.md || true)
cr_has=$(grep -c "$label" commands/code-review.md || true)
if [ "$pr_has" -ge 1 ] && [ "$cr_has" -ge 1 ]; then
ok "step label '$label' present in both commands"
else
fail "step label '$label' missing — pr-review=$pr_has code-review=$cr_has"
fi
done

# (d) Citation validator rejects unknown rule_ids: build a temp findings JSON
# citing a fake rule_id, run validate-citations.sh, assert non-zero exit.
FAKE_FINDINGS="$WORK/fake-findings.json"
cat > "$FAKE_FINDINGS" <<'FAKE_EOF'
[{"rule_id": "fake/this-rule-does-not-exist", "file": "x.go", "line": 1}]
FAKE_EOF
fake_exit=0
bash scripts/validate-citations.sh "$FAKE_FINDINGS" > /dev/null 2>&1 || fake_exit=$?
if [ "$fake_exit" -ne 0 ]; then
ok "citation validator exits non-zero for unknown rule_id 'fake/this-rule-does-not-exist'"
else
fail "citation validator returned 0 for unknown rule_id — validator not enforcing index membership"
fi

# (e) Runner .git exclusion: create a temp dir with a .git/COMMIT_EDITMSG
# containing a go-errors violation and a normal .go file with the same
# violation; run ast-grep-runner.sh over both paths; assert findings contain
# the .go file hit and ZERO findings whose file path contains .git/
GIT_EXCL_DIR="$WORK/git-excl-test"
mkdir -p "$GIT_EXCL_DIR/.git" "$GIT_EXCL_DIR/pkg"
# Plant a violation in .git/COMMIT_EDITMSG (should be excluded)
cat > "$GIT_EXCL_DIR/.git/COMMIT_EDITMSG" <<'GIT_EOF'
package x
import "fmt"
func G() error { return fmt.Errorf("stale") }
GIT_EOF
# Plant the same violation in a normal .go file (should be found)
cat > "$GIT_EXCL_DIR/pkg/real.go" <<'REAL_EOF'
package x
import "fmt"
func F() error { return fmt.Errorf("bad") }
REAL_EOF

RUNNER_OUT="$WORK/git-excl-runner.json"
bash scripts/ast-grep-runner.sh "$GIT_EXCL_DIR" \
"$GIT_EXCL_DIR/pkg/real.go" \
"$GIT_EXCL_DIR/.git/COMMIT_EDITMSG" \
> "$RUNNER_OUT" 2>/dev/null || true

git_findings=$(jq -r '[.findings_by_owner | to_entries[] | .value[] | .file] | map(select(contains("/.git/"))) | length' "$RUNNER_OUT" 2>/dev/null || echo "0")
if [ "$git_findings" -eq 0 ]; then
ok "runner excludes .git/ paths — zero findings with /.git/ in file path"
else
fail "runner produced $git_findings finding(s) from .git/ paths — exclusion not working"
fi

go_findings=$(jq -r '[.findings_by_owner | to_entries[] | .value[] | .file] | map(select(contains("pkg/real.go"))) | length' "$RUNNER_OUT" 2>/dev/null || echo "0")
if [ "$go_findings" -ge 1 ]; then
ok "runner still finds violations in normal .go file after .git/ exclusion"
else
fail "runner found zero violations in pkg/real.go — normal scan broken or no-fmt-errorf YAML missing"
fi

echo ""
echo "=== Summary ==="
echo " PASS: $ACCEPTANCE_PASS"
Expand Down
21 changes: 18 additions & 3 deletions scripts/ast-grep-runner.sh
Original file line number Diff line number Diff line change
Expand Up @@ -61,13 +61,22 @@ fi

TARGET_DIR="$(cd "$TARGET_DIR" && pwd)"

# Normalise changed files: make absolute, resolve relative-to-target
# Normalise changed files: make absolute, resolve relative-to-target.
# Incident 2026-06-11: 274 stale agent-auditor findings surfaced from
# .git/COMMIT_EDITMSG when selector validation passed git internals as
# changed-file args. Filter out any path containing /.git/ or starting
# with .git/ — defense in depth (the runner and caller both exclude).
CHANGED_FILES=()
for f in "$@"; do
case "$f" in
/*) CHANGED_FILES+=("$f") ;;
*) CHANGED_FILES+=("$TARGET_DIR/$f") ;;
/*) abs="$f" ;;
*) abs="$TARGET_DIR/$f" ;;
esac
# Skip .git/ internals — they are never source files to scan.
case "$abs" in
*/.git/*) continue ;;
esac
CHANGED_FILES+=("$abs")
done

START_MS=$(python3 -c 'import time; print(int(time.time()*1000))')
Expand Down Expand Up @@ -187,6 +196,12 @@ while IFS=$'\t' read -r rule_id yml_rel owner level; do
matched_text=$(printf '%s' "$match_line" | jq -r '.text // ""')
message=$(printf '%s' "$match_line" | jq -r '.message // ""')

# Defense-in-depth: drop findings from .git/ paths even if one slipped
# past the input filter above (e.g. absolute paths the caller supplied).
case "$file" in
*/.git/*) continue ;;
esac

finding=$(jq -n \
--arg rule_id "$rule_id" \
--arg rule_level "$level" \
Expand Down
Loading