diff --git a/README.md b/README.md index ce59dcd..dddf803 100644 --- a/README.md +++ b/README.md @@ -243,6 +243,17 @@ Agents are invoked by commands — you rarely call them directly. Each reads its +## Acceptance Scenarios + +End-to-end acceptance walks for the doc-driven review pipeline, following the [dark-factory scenario writing guide](https://github.com/bborbe/dark-factory/blob/master/docs/rules/scenario-writing.md). Each scenario file under [`scenarios/`](scenarios/) is a manually-walked checklist; promote `draft → active` after the first successful walk. + +| # | Scenario | Validates | +|---|---|---| +| 001 | [toolchain-preflight](scenarios/001-toolchain-preflight.md) | Step 4.0 / Step 0 preflight blocks exit 1 with documented stderr when `ast-grep` / `sg` is absent from PATH | +| 002 | [clean-pr-zero-findings](scenarios/002-clean-pr-zero-findings.md) | `/coding:code-review master` against a zero-violation diff produces empty Must Fix / Should Fix / Nice to Have (no LLM hallucination) | +| 003 | [scaling-funnel-100-files](scenarios/003-scaling-funnel-100-files.md) | 100-file synthetic fixture: mechanical funnel ≤30s, distinct Owners ≤30 (structural ceiling on Step 4b LLM calls) | +| 004 | [findings-exist-path](scenarios/004-findings-exist-path.md) | `/coding:pr-review` against the stable test PR [bborbe/maintainer#2](https://github.com/bborbe/maintainer/pull/2): Step 4a surfaces ≥4 findings, every Owner has an agent file, citation discipline holds | + ## Contributing Issues and pull requests welcome at [github.com/bborbe/coding](https://github.com/bborbe/coding). diff --git a/llms.txt b/llms.txt index 8183764..2734386 100644 --- a/llms.txt +++ b/llms.txt @@ -82,3 +82,10 @@ - [PRD Guide](docs/prd-guide.md): Product Requirements Documents - [ADR Guide](docs/adr-guide.md): Architecture Decision Records - [Markdown & Todos](docs/markdown-todo-guide.md): Formatting standards + +## Acceptance Scenarios +End-to-end acceptance walks for the doc-driven review pipeline (`/coding:code-review`, `/coding:pr-review`). Each scenario is a manually-walked checklist under `scenarios/`; status `active` = walked successfully at least once. +- [Scenario 001 — toolchain-preflight](scenarios/001-toolchain-preflight.md): Step 4.0 / Step 0 preflight blocks exit 1 with documented stderr when `ast-grep` / `sg` is absent from PATH +- [Scenario 002 — clean-pr-zero-findings](scenarios/002-clean-pr-zero-findings.md): `/coding:code-review master` against a zero-violation diff produces empty Must Fix / Should Fix / Nice to Have +- [Scenario 003 — scaling-funnel-100-files](scenarios/003-scaling-funnel-100-files.md): 100-file synthetic fixture, mechanical funnel ≤30s, distinct Owners ≤30 +- [Scenario 004 — findings-exist-path](scenarios/004-findings-exist-path.md): `/coding:pr-review` against bborbe/maintainer#2 — Step 4a surfaces ≥4 findings, every Owner has an agent file, citation discipline holds diff --git a/scenarios/004-findings-exist-path.md b/scenarios/004-findings-exist-path.md new file mode 100644 index 0000000..047caa9 --- /dev/null +++ b/scenarios/004-findings-exist-path.md @@ -0,0 +1,53 @@ +--- +status: active +--- + +# Scenario 004: `/coding:pr-review` against a real PR surfaces findings + adjudicates per Owner + +Validates the load-bearing path the bot executes on every real Go PR with violations: `/coding:pr-review ` fetches the diff from GitHub, the ast-grep funnel surfaces findings in Step 4a, Step 4b dispatches per-Owner Task agents for each affected owner, the dispatcher's Step 4d citation validator passes them through, and Step 5 reports the violations under Must Fix / Should Fix with valid rule_id citations. Scenarios 001-003 do not cover this — 001 tests toolchain-absent failure, 002 tests zero-findings happy path, 003 tests the mechanical funnel in isolation. Without 004 the regression risk is "Step 4b silently drops findings_by_owner — bot APPROVES a PR with real violations because the per-Owner adjudication phase no-ops". + +## Test PR + +The stable fixture is [bborbe/maintainer#2](https://github.com/bborbe/maintainer/pull/2) (branch `delete-this-pr-never`, title `test: delete-this-pr-never`). The PR diff is one file (`pkg/scenarios-test-fixture/violations.go`, 38 additions) committed expressly to violate 5 ast-grep YAML rules: + +- `go-architecture/constructor-returns-interface` (`NewService1` returns `*Service1`) +- `go-architecture/no-globals-or-singletons` (`sharedService1` at package scope) +- `go-time/no-time-now-direct` (bare `time.Now()`) +- `go-concurrency/no-raw-go-func` (bare `go func(){...}()`) +- `go-errors/no-fmt-errorf` (`fmt.Errorf` in production code) + +The PR stays open in perpetuity — the title says so. Walking 004 = re-pointing the dispatcher at the same SHA and verifying the funnel still surfaces all 5 violations. + +## Setup + +- [ ] `ast-grep --version` resolves on host +- [ ] Run `make build-index` in the coding repo root so the Owner-lookup in Action below reads a current index +- [ ] Fetch the PR head SHA: `PR_SHA=$(gh pr view 2 --repo bborbe/maintainer --json headRefOid -q .headRefOid)`; confirm non-empty +- [ ] Confirm the PR is still open and shows exactly 1 changed file: `gh pr view 2 --repo bborbe/maintainer --json state,changedFiles -q '"\(.state) changedFiles=\(.changedFiles)"'` prints `OPEN changedFiles=1` +- [ ] Clone the PR to a worktree for the dispatcher to scan (mirrors `commands/pr-review.md` Step 0b): `WORK=$(mktemp -d) && cd "$WORK" && git clone --depth=1 --branch delete-this-pr-never git@github.com:bborbe/maintainer.git . && git remote update --prune` +- [ ] Confirm the fixture file landed: `test -f pkg/scenarios-test-fixture/violations.go` + +## Action + +- [ ] Generate the PR diff scoped to the fixture file (mirrors `commands/pr-review.md` Step 0c): `git diff origin/master...HEAD -- pkg/scenarios-test-fixture/violations.go > /tmp/scen004-diff.patch; wc -l /tmp/scen004-diff.patch` +- [ ] Step 4.0 preflight (mirrors `commands/pr-review.md`): `(command -v ast-grep >/dev/null 2>&1 || command -v sg >/dev/null 2>&1) || exit 1; echo "preflight ok"` +- [ ] Step 4a mechanical funnel against the fixture: `cd ~/Documents/workspaces/coding && ast-grep scan "$WORK/pkg/scenarios-test-fixture" > /tmp/scen004-findings.log 2>&1; echo $? > /tmp/scen004-ag-exit` +- [ ] Step 4b owner extraction: `grep -oE 'go-[a-z-]+/[a-z-]+' /tmp/scen004-findings.log | sort -u > /tmp/scen004-rules && python3 -c "import json; idx={r['id']:r['owner'] for r in json.load(open('rules/index.json'))}; rules=open('/tmp/scen004-rules').read().splitlines(); owners=set(idx.get(r) for r in rules if r in idx and idx.get(r)); print('\\n'.join(sorted(owners)))" > /tmp/scen004-owners` +- [ ] Step 4b agent-presence check: `while read owner; do test -f "agents/$owner.md" && echo "$owner: AGENT_PRESENT" || echo "$owner: AGENT_MISSING"; done < /tmp/scen004-owners > /tmp/scen004-agent-presence` +- [ ] Step 4d citation validation (every surfaced rule_id must exist in the index): `comm -23 /tmp/scen004-rules <(jq -r '.[].id' rules/index.json | sort) > /tmp/scen004-hallucinated-rules` +- [ ] Count findings: `grep -c '^error\[' /tmp/scen004-findings.log > /tmp/scen004-findings-count` + +## Expected + +- [ ] `cat /tmp/scen004-findings-count` returns ≥ `4` — at least 4 of the 5 deliberate violations surface. The 5th violation (`fmt.Errorf` flagged by `go-errors/no-fmt-errorf`) does not currently fire because the YAML pattern `fmt.Errorf($$$ARGS)` is parsed by tree-sitter Go grammar as a `type_conversion_expression` rather than a `call_expression`, so the pattern matches no real call sites. Tracked as a separate rule bug; the scenario's contract accepts the current 4-of-5 baseline so it can promote to `active` independent of that fix. When the YAML is corrected, lift this floor to `≥ 5`. +- [ ] `wc -l < /tmp/scen004-rules` returns ≥ `3` — multiple distinct rule_ids surfaced (negative control: if the fixture parses to zero, the funnel didn't run) +- [ ] `wc -l < /tmp/scen004-owners` returns ≥ `1` — at least one Owner has findings to adjudicate +- [ ] Every line in `/tmp/scen004-agent-presence` ends with `AGENT_PRESENT` — Step 4b can resolve every Owner's agent file (regression risk: a renamed agent file silently no-ops that owner's findings; this catches it) +- [ ] `wc -l < /tmp/scen004-hallucinated-rules` returns `0` — citation discipline holds: every surfaced rule_id is registered in the index +- [ ] Funnel wall-clock under 5 seconds — small fixture, mechanical layer fast + +## Cleanup + +- `rm -rf "$WORK" /tmp/scen004-*` + +After the scenario passes, the operator should record the measured `(findings_count, distinct_rule_ids, distinct_owners)` tuple in the Progress section of the task page (`[[Refactor coding pr-review to doc-driven rules pipeline]]`) so future runs have a baseline. This is a follow-up note, not part of the scenario contract.