Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -243,6 +243,17 @@ Agents are invoked by commands — you rarely call them directly. Each reads its

</details>

## Acceptance Scenarios

End-to-end acceptance walks for the doc-driven review pipeline, following the [dark-factory scenario writing guide](https://github.com/bborbe/dark-factory/blob/master/docs/rules/scenario-writing.md). Each scenario file under [`scenarios/`](scenarios/) is a manually-walked checklist; promote `draft → active` after the first successful walk.

| # | Scenario | Validates |
|---|---|---|
| 001 | [toolchain-preflight](scenarios/001-toolchain-preflight.md) | Step 4.0 / Step 0 preflight blocks exit 1 with documented stderr when `ast-grep` / `sg` is absent from PATH |
| 002 | [clean-pr-zero-findings](scenarios/002-clean-pr-zero-findings.md) | `/coding:code-review master` against a zero-violation diff produces empty Must Fix / Should Fix / Nice to Have (no LLM hallucination) |
| 003 | [scaling-funnel-100-files](scenarios/003-scaling-funnel-100-files.md) | 100-file synthetic fixture: mechanical funnel ≤30s, distinct Owners ≤30 (structural ceiling on Step 4b LLM calls) |
| 004 | [findings-exist-path](scenarios/004-findings-exist-path.md) | `/coding:pr-review` against the stable test PR [bborbe/maintainer#2](https://github.com/bborbe/maintainer/pull/2): Step 4a surfaces ≥4 findings, every Owner has an agent file, citation discipline holds |

## Contributing

Issues and pull requests welcome at [github.com/bborbe/coding](https://github.com/bborbe/coding).
Expand Down
7 changes: 7 additions & 0 deletions llms.txt
Original file line number Diff line number Diff line change
Expand Up @@ -82,3 +82,10 @@
- [PRD Guide](docs/prd-guide.md): Product Requirements Documents
- [ADR Guide](docs/adr-guide.md): Architecture Decision Records
- [Markdown & Todos](docs/markdown-todo-guide.md): Formatting standards

## Acceptance Scenarios
End-to-end acceptance walks for the doc-driven review pipeline (`/coding:code-review`, `/coding:pr-review`). Each scenario is a manually-walked checklist under `scenarios/`; status `active` = walked successfully at least once.
- [Scenario 001 — toolchain-preflight](scenarios/001-toolchain-preflight.md): Step 4.0 / Step 0 preflight blocks exit 1 with documented stderr when `ast-grep` / `sg` is absent from PATH
- [Scenario 002 — clean-pr-zero-findings](scenarios/002-clean-pr-zero-findings.md): `/coding:code-review master` against a zero-violation diff produces empty Must Fix / Should Fix / Nice to Have
- [Scenario 003 — scaling-funnel-100-files](scenarios/003-scaling-funnel-100-files.md): 100-file synthetic fixture, mechanical funnel ≤30s, distinct Owners ≤30
- [Scenario 004 — findings-exist-path](scenarios/004-findings-exist-path.md): `/coding:pr-review` against bborbe/maintainer#2 — Step 4a surfaces ≥4 findings, every Owner has an agent file, citation discipline holds
53 changes: 53 additions & 0 deletions scenarios/004-findings-exist-path.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
---
status: active
---

# Scenario 004: `/coding:pr-review` against a real PR surfaces findings + adjudicates per Owner

Validates the load-bearing path the bot executes on every real Go PR with violations: `/coding:pr-review <URL>` fetches the diff from GitHub, the ast-grep funnel surfaces findings in Step 4a, Step 4b dispatches per-Owner Task agents for each affected owner, the dispatcher's Step 4d citation validator passes them through, and Step 5 reports the violations under Must Fix / Should Fix with valid rule_id citations. Scenarios 001-003 do not cover this — 001 tests toolchain-absent failure, 002 tests zero-findings happy path, 003 tests the mechanical funnel in isolation. Without 004 the regression risk is "Step 4b silently drops findings_by_owner — bot APPROVES a PR with real violations because the per-Owner adjudication phase no-ops".

## Test PR

The stable fixture is [bborbe/maintainer#2](https://github.com/bborbe/maintainer/pull/2) (branch `delete-this-pr-never`, title `test: delete-this-pr-never`). The PR diff is one file (`pkg/scenarios-test-fixture/violations.go`, 38 additions) committed expressly to violate 5 ast-grep YAML rules:

- `go-architecture/constructor-returns-interface` (`NewService1` returns `*Service1`)
- `go-architecture/no-globals-or-singletons` (`sharedService1` at package scope)
- `go-time/no-time-now-direct` (bare `time.Now()`)
- `go-concurrency/no-raw-go-func` (bare `go func(){...}()`)
- `go-errors/no-fmt-errorf` (`fmt.Errorf` in production code)

The PR stays open in perpetuity — the title says so. Walking 004 = re-pointing the dispatcher at the same SHA and verifying the funnel still surfaces all 5 violations.

## Setup

- [ ] `ast-grep --version` resolves on host
- [ ] Run `make build-index` in the coding repo root so the Owner-lookup in Action below reads a current index
- [ ] Fetch the PR head SHA: `PR_SHA=$(gh pr view 2 --repo bborbe/maintainer --json headRefOid -q .headRefOid)`; confirm non-empty
- [ ] Confirm the PR is still open and shows exactly 1 changed file: `gh pr view 2 --repo bborbe/maintainer --json state,changedFiles -q '"\(.state) changedFiles=\(.changedFiles)"'` prints `OPEN changedFiles=1`
- [ ] Clone the PR to a worktree for the dispatcher to scan (mirrors `commands/pr-review.md` Step 0b): `WORK=$(mktemp -d) && cd "$WORK" && git clone --depth=1 --branch delete-this-pr-never git@github.com:bborbe/maintainer.git . && git remote update --prune`
- [ ] Confirm the fixture file landed: `test -f pkg/scenarios-test-fixture/violations.go`

## Action

- [ ] Generate the PR diff scoped to the fixture file (mirrors `commands/pr-review.md` Step 0c): `git diff origin/master...HEAD -- pkg/scenarios-test-fixture/violations.go > /tmp/scen004-diff.patch; wc -l /tmp/scen004-diff.patch`
- [ ] Step 4.0 preflight (mirrors `commands/pr-review.md`): `(command -v ast-grep >/dev/null 2>&1 || command -v sg >/dev/null 2>&1) || exit 1; echo "preflight ok"`
- [ ] Step 4a mechanical funnel against the fixture: `cd ~/Documents/workspaces/coding && ast-grep scan "$WORK/pkg/scenarios-test-fixture" > /tmp/scen004-findings.log 2>&1; echo $? > /tmp/scen004-ag-exit`
- [ ] Step 4b owner extraction: `grep -oE 'go-[a-z-]+/[a-z-]+' /tmp/scen004-findings.log | sort -u > /tmp/scen004-rules && python3 -c "import json; idx={r['id']:r['owner'] for r in json.load(open('rules/index.json'))}; rules=open('/tmp/scen004-rules').read().splitlines(); owners=set(idx.get(r) for r in rules if r in idx and idx.get(r)); print('\\n'.join(sorted(owners)))" > /tmp/scen004-owners`
- [ ] Step 4b agent-presence check: `while read owner; do test -f "agents/$owner.md" && echo "$owner: AGENT_PRESENT" || echo "$owner: AGENT_MISSING"; done < /tmp/scen004-owners > /tmp/scen004-agent-presence`
- [ ] Step 4d citation validation (every surfaced rule_id must exist in the index): `comm -23 /tmp/scen004-rules <(jq -r '.[].id' rules/index.json | sort) > /tmp/scen004-hallucinated-rules`
- [ ] Count findings: `grep -c '^error\[' /tmp/scen004-findings.log > /tmp/scen004-findings-count`

## Expected

- [ ] `cat /tmp/scen004-findings-count` returns ≥ `4` — at least 4 of the 5 deliberate violations surface. The 5th violation (`fmt.Errorf` flagged by `go-errors/no-fmt-errorf`) does not currently fire because the YAML pattern `fmt.Errorf($$$ARGS)` is parsed by tree-sitter Go grammar as a `type_conversion_expression` rather than a `call_expression`, so the pattern matches no real call sites. Tracked as a separate rule bug; the scenario's contract accepts the current 4-of-5 baseline so it can promote to `active` independent of that fix. When the YAML is corrected, lift this floor to `≥ 5`.
- [ ] `wc -l < /tmp/scen004-rules` returns ≥ `3` — multiple distinct rule_ids surfaced (negative control: if the fixture parses to zero, the funnel didn't run)
- [ ] `wc -l < /tmp/scen004-owners` returns ≥ `1` — at least one Owner has findings to adjudicate
- [ ] Every line in `/tmp/scen004-agent-presence` ends with `AGENT_PRESENT` — Step 4b can resolve every Owner's agent file (regression risk: a renamed agent file silently no-ops that owner's findings; this catches it)
- [ ] `wc -l < /tmp/scen004-hallucinated-rules` returns `0` — citation discipline holds: every surfaced rule_id is registered in the index
- [ ] Funnel wall-clock under 5 seconds — small fixture, mechanical layer fast

## Cleanup

- `rm -rf "$WORK" /tmp/scen004-*`

After the scenario passes, the operator should record the measured `(findings_count, distinct_rule_ids, distinct_owners)` tuple in the Progress section of the task page (`[[Refactor coding pr-review to doc-driven rules pipeline]]`) so future runs have a baseline. This is a follow-up note, not part of the scenario contract.
Loading