Status: Phase 2 (Design) Complete, Phase 3 (Implementation) 90% Complete
Date: 2026-06-08
Branch: task/2301-verdict-comment-posting (5 commits ahead of origin/main at e483551)
Research Document: /tmp/agents-fix/.docs/research-2301-verdict-comment-posting.md
Design Document: /tmp/agents-fix/.docs/design-2301-verdict-comment-posting.md
The ADF review agents (pr-reviewer, pr-validator, pr-verifier) run real structural reviews but never post parseable verdict comments with the required Confidence Score: N/5 and Last reviewed commit: <head[:7]> footer. Since #2450 replaced the agent's self-posting task script with a one-line build_review_task prompt, the orchestrator's verdict-driven auto-merge and remediation paths have been non-functional. All "merged through the gate" PRs were actually manual clean-merges on green exit-code statuses, not real verdicts.
- Auto-merge: Never functions --
evaluate_pr_verdictalways returnsStaleRevieworNoReviewerComment - Remediation loop (#2264): Cannot dispatch fix-agents because no parseable CONDITIONAL verdict exists
- PR #17: Dead-code cleanup stuck at 3/5 with no path to autonomous remediation
- AC1: Comment present and parseable -- a
Last reviewed commit: <head[:7]>comment is posted by the orchestrator after every review run - AC2: Status verdict-derived --
adf/pr-reviewerstatus reflects the parsed verdict (Success only when parseable + >= min_confidence), not the agent exit code - AC3: Classifier sees what the drain sees -- the broadcast receiver Lagged-tolerance + drain .log read ensures complete output capture
- AC4: Verdict-driven merge + remediation resolve -- a fresh verdict comment for a new head enables auto-merge and remediation dispatch
| Issue | Status | Delivered |
|---|---|---|
| #2203 -- repo-local agents (6 polyrepos) | DONE | Merged, deployed, verified |
| #2225 -- doc-generator churn fix | DONE | Merged, deployed, verified |
| #2264 -- remediation loop | PARTIAL | Implemented, merged (PR #18), deployed, rolled back due to stampede |
| #2285 -- deploy-safe hardening | DONE | Implemented, merged (PR #21), deployed with kill-switch, verified |
| #2275 -- review-gate robustness | DONE | Implemented, merged (PR #23), deployed |
| #2301 -- verdict comment posting | RESEARCH/DESIGN STARTED | Research complete, design approved, implementation NOT started |
| Issue | Status | Delivered |
|---|---|---|
| #2301 -- verdict comment posting | IMPLEMENTATION 90% COMPLETE | 5 commits, 918 tests pass, P0/P1 fixes applied |
| P0 bug fix -- duplicate status post | FIXED | Removed duplicate post_terminal_commit_status that overwrote verdict-derived status |
| P1 fixes -- dead code, unused fields | FIXED | Removed #[allow(dead_code)] annotations, integrated fields into render_verdict_body |
| Verification + Validation | COMPLETE | 918 lib tests pass, fmt clean, structural review performed |
| Item | Status | Owner |
|---|---|---|
| Step 5: Integration tests (fake-Gitea harness) | NOT STARTED | Next step |
| Push branch to remote | NOT STARTED | Next step |
| Open PR via gitea-robot / gtr | NOT STARTED | Next step |
| Safe redeploy (disabled-first, then enable) | NOT STARTED | Post-merge |
| End-to-end verification on PR #17 | NOT STARTED | Post-deploy |
The #2450 change (replacing def.task with build_review_task) had an unintended side effect:
- Before #2450: The agent ran a bash script that reviewed + posted the verdict comment + verdict-derived status itself
- After #2450: The agent receives a one-line prompt summary, reviews, but is never told to post anything
- Status: Falls back to exit-code (green if exit 0, regardless of whether a verdict was produced)
- Classifier:
try_recvon the broadcast receiver lags out over 240s of streaming (256-slot ring overflow), producingempty_successclassification
Rather than resurrect the def.task script (rejected in design -- model adherence issues + double review spend), the fix makes the orchestrator own verdict posting:
- Template anchors in the dispatched prompt instruct the model to emit the four required sections
- Per-CLI extractor reads the orchestrator-written drain
.logand returns the final assistant text - Drain .log as capture source at reconcile time -- the orchestrator already writes it via
start_output_log_drain - Verdict poster reads the drain, runs the extractor, runs
parse_verdict, and posts the comment under thepr-reviewerlogin
The terraphim_spawner broadcast-lag issue (root cause of empty_success classification) is a separate concern in the terraphim-ai repo. This design routes around it by reading the drain .log, which is complete regardless of broadcast overflow.
pr_dispatch.rs:build_review_task()
|
v
[Agent runs, outputs to stdout/stderr]
|
v
lib.rs:start_output_log_drain() --> drain .log file (complete record)
|
v
reconcile_impl.rs (agent exits)
|
+-- read_drain_log() --> Vec<String> (complete lines)
|
+-- Lagged-tolerant try_recv loop --> Vec<String> (partial, from broadcast)
|
v
pr_review/extractor.rs:extract_final_assistant_text()
|
v
pr_review/poster.rs:post_verdict_from_drain()
|
+-- find_existing_verdict_for_head() [idempotency check]
+-- count_existing_reviews_for_head() [Reviews (N) counter]
+-- render_verdict_body() [wrap extracted text with canonical footer]
+-- parse_verdict() [sanity check before posting]
|
v
OutputPoster::post_raw_as_agent_for_project("pr-reviewer", body)
|
v
Gitea API POST /repos/{owner}/{repo}/issues/{pr_number}/comments
| Decision | Rationale | Alternatives Rejected |
|---|---|---|
| Orchestrator owns posting (not agent) | #2450 broke agent self-posting; orchestrator already has the drain .log | Resurrect def.task script (model adherence issues, double spend) |
| Drain .log over broadcast receiver | Broadcast 256-slot ring overflows at 240s; drain file is complete | Fix spawner broadcast (cross-repo, larger scope) |
| Per-CLI extractor | Different CLIs emit different JSON stream formats | Single parser (would miss format-specific events) |
| Idempotency on (project, pr, head_sha) | Re-gated PRs get new SHAs; same-SHA retries increment Reviews (N) | Idempotency on PR number only (would skip re-reviews) |
| NoParseableVerdict -> StatusState::Failure | The load-bearing guard: no verdict = NOT green | Exit-code fallback (preserves the bug) |
Top 5 (IN scope):
- Template anchors in dispatched prompt
- Per-CLI assistant-text extractor
- Drain .log read at reconcile
- Verdict comment poster + verdict-derived status
- Integration tests against fake-Gitea harness
Avoid At All Cost (20 eliminated):
- Resurrecting
def.taskscript (model adherence + double review spend) - Hybrid A+B approach (no gain over A alone)
- LLM-assisted synthesis (expensive, prompt-injectable)
- Retry on parse failure (compounds 240s cost)
- terraphim_spawner migration (out of scope, sidestepped by drain read)
- Changing
parse_verdictsignature (keep unchanged, it's the anchor) - Mock-based tests (use real fixtures)
- Config-wired kill-switch (already exists from #2285)
Status: DONE (commit f71b077)
File: crates/terraphim_orchestrator/src/pr_dispatch.rs
Change: build_review_task() now emits:
### Summary
### Confidence Score: N/5
### Inline Findings
<sub>Last reviewed commit: <first-7-hex-of-ADF_PR_HEAD_SHA></sub>
Test: pr_dispatch::tests::build_review_task_carries_template_anchors (passes)
Status: DONE (commit a7e8d71)
File: crates/terraphim_orchestrator/src/pr_review/extractor.rs (NEW)
Change: Pure function extract_final_assistant_text(lines, cli_tool) with per-CLI routing:
pi-rust/kimi: newline-delimited JSON events (message.content[*].text)claudestream-json:content_block_delta/message_endeventsopencode:type: "text"/part: {"text": ...}events- Non-JSON pass-through: concatenates non-empty, non-JSON, non-header lines
Tests: 8 unit tests against 6 real-captured drain fixtures (all pass)
Fixtures:
tests/fixtures/pr_review/drain_*.log(6 files)
Status: DONE (commit 72f8bda)
Files: crates/terraphim_orchestrator/src/reconcile_impl.rs, crates/terraphim_orchestrator/src/lib.rs
Changes:
read_drain_log(path)helper reads the orchestrator-written drain file- Lagged-tolerant
try_recvloop:Err(Lagged(n))continues draining instead of bailing - Drain lines preferred over broadcast receiver lines for downstream classifier
Tests: 4 unit tests in
drain_log_tests(all pass)
Status: DONE (commits 8edc6c6 + ededa27)
Files:
crates/terraphim_orchestrator/src/pr_review/poster.rs(NEW)crates/terraphim_orchestrator/src/pr_handlers_impl.rscrates/terraphim_orchestrator/src/reconcile_impl.rscrates/terraphim_orchestrator/src/lib.rscrates/terraphim_orchestrator/src/spawn_impl.rs
Changes:
PrVerdictMetastruct carries PR metadata for postingVerdictPostOutcomeenum:Posted,AlreadyPresent,NoParseableVerdict,NoDrainpost_verdict_from_drain()async function: reads drain -> extracts -> idempotency check -> renders body -> sanity-checks withparse_verdict-> posts commentManagedAgentgainsverdict_post: Option<PrVerdictMeta>fieldpr_handlers_impl.rs: pr-reviewer and pr-validator spawns setverdict_post: Some(meta)reconcile_impl.rs: verdict-derived status branch (Success for Posted/AlreadyPresent, Failure for NoParseableVerdict, Error for NoDrain)- P0 fix: Removed duplicate
post_terminal_commit_statusblock that overwrote verdict-derived status with exit-code status - P1 fixes: Removed dead code (
_path_markerstub), integratedauthor_login,title,diff_locintorender_verdict_body
Tests: 4 unit tests in poster::tests (all pass)
Target: crates/terraphim_orchestrator/tests/verdict_poster_tests.rs (NEW)
Harness: Existing real-axum fake-Gitea in tests/remediation_tests.rs:295-322
Test Cases:
- Verdict comment posted after agent exit (fake-Gitea receives POST)
- Status is verdict-derived (not exit-code) -- verify
set_commit_statuscall state - No false empty_success -- agent with no parseable text gets StatusState::Failure
- Idempotency -- second dispatch on same SHA returns AlreadyPresent
- Verdict-driven merge -- fresh verdict comment enables auto-merge path
| Metric | Target | Actual | Status |
|---|---|---|---|
| Unit tests covering 4 ACs | >= 8 | 16 (8 extractor + 4 poster + 4 drain_log) | PASS |
| Gate: cargo fmt --check | clean | clean | PASS |
| Gate: cargo clippy default | clean | clean | PASS |
| Gate: cargo clippy --features quickwit | clean | clean | PASS |
| Gate: cargo test --lib (default) | 0 regressions | 918 pass + 1 pre-existing failure | PASS |
| Gate: cargo test --lib (--features quickwit) | 0 regressions | 918 pass + 1 pre-existing failure | PASS |
| AC1, AC2, AC3, AC4 | all present | all present in code | PASS |
| O1, O2, O3 (deploy-safety from #2285) | unchanged | tests still pass | PASS |
fleet_config::test_source_template_build_runner_event_only -- missing fixture scripts/adf-setup/agents/build-runner.toml. Fails identically on origin/main (not a regression).
| Severity | Issue | Status |
|---|---|---|
| P0 | Duplicate post_terminal_commit_status overwrote verdict-derived status |
FIXED in ededa27 |
| P1 | _path_marker dead-code stub |
FIXED in ededa27 |
| P1 | #[allow(dead_code)] on PrVerdictMeta fields |
FIXED in ededa27 |
| P1 | Posted { comment_id: 0, confidence: 0 } sentinels |
DOCUMENTED (requires OutputPoster API change, deferred) |
| P2 | Sequential fetch_comments calls (idempotency + counter) |
ACCEPTABLE (design chose degradation) |
| P2 | Silent error swallowing on Gitea list errors | ACCEPTABLE (design chose degradation) |
-
Write Integration Tests (Step 5)
- File:
crates/terraphim_orchestrator/tests/verdict_poster_tests.rs - Harness: Extend existing
ServerStatefromremediation_tests.rs - Tests: 5 test cases covering the 4 ACs + idempotency
- Time estimate: 2-3 hours
- File:
-
Run Full Gate
cargo fmt --all -- --checkcargo clippy -p terraphim_orchestrator --all-targets -- -D warningscargo test -p terraphim_orchestrator(default +--features quickwit)- Time estimate: 15 min
-
Commit Integration Tests
- Commit:
test(orchestrator): add verdict poster integration tests (Refs #2301)
- Commit:
-
Push Branch
cd /tmp/agents-fix git push origin task/2301-verdict-comment-posting -
Open PR via gitea-robot
gtr create-pull --owner terraphim --repo terraphim-agents \ --title "Fix #2301: orchestrator-owned verdict comment posting" \ --base main --head task/2301-verdict-comment-posting -
Monitor PR Through Gate
- Wait for haiku/glm reviewers to run
- Verify reviewers produce verdict comments (this is the dogfood test)
- If green, merge; if issues, fix and re-push
-
Rebuild adf from gitea/main
cd /data/projects/terraphim/terraphim-agents git worktree add --force /home/alex/adf-build-clean origin/main cd /home/alex/adf-build-clean export CARGO_TARGET_DIR=/data/projects/terraphim/terraphim-agents/target cargo build --release -p terraphim_orchestrator --bin adf
-
--check Before Install
cd /opt/ai-dark-factory export GITEA_URL=https://git.terraphim.cloud export GITEA_TOKEN=... /usr/local/bin/adf --check orchestrator.toml
Must show: pr-remediation on glm, reviewers on haiku/review_tier, no errors
-
Atomic Install + Restart
sudo cp target/release/adf /usr/local/bin/adf.new sudo mv -f /usr/local/bin/adf.new /usr/local/bin/adf sudo systemctl restart adf-orchestrator.service
-
Verify Startup
sudo systemctl is-active adf-orchestrator.service sudo journalctl -u adf-orchestrator.service --since "1 min ago" | grep -E "reconciliation|webhook|error|panic"
-
Disabled-First Verification
- Keep
max_remediation_attempts = 0in[auto_merge]config - Verify 0 remediation dispatches over 2-3 poll cycles
- Confirm no stampede (from #2285 hardening)
- Keep
-
Enable Conservatively
- Set
max_remediation_attempts = 3,max_dispatches_per_tick = 1 - Restart
- Verify <= 1 dispatch per tick, terraphim-agents only
- Set
-
Test on PR #17
- Re-trigger review on PR #17 (empty commit to task/1-impl)
- Watch for: fresh verdict comment with
Last reviewed commit: <new-head[:7]> - Watch for:
adf/pr-reviewerstatus reflects verdict (not just exit-code) - Watch for: remediation loop dispatches if confidence < 5/5
- Expected: fix-agent commits load_state() fix, re-gates, scores 4-5/5, auto-merges
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Integration tests fail on fake-Gitea harness | Medium | Medium | Harness exists and works for remediation tests; extend carefully |
| Reviewers crash on PR (size/quota) | Low | High | #2275 hardening + glm routing (non-Claude) already deployed |
| Deploy stampede on restart | Low | High | #2285 kill-switch + cap + persistence already verified |
| Verdict comment still not posted (deeper bug) | Low | Critical | This is the exact fix; if it fails, the extractor/poster logic is the issue |
| Merge conflict with main | Medium | Low | Rebase if needed; branch is off e483551 which is before #2275 merge |
-
Proceed with Step 5 (integration tests)?
- The 4 core commits are complete and verified
- Integration tests add coverage but are not blocking for the core fix
- Recommendation: Yes, write them before PR for completeness
-
Deploy immediately after merge, or wait for observation window?
- #2285 hardening means the deploy is safe (kill-switch + cap)
- But #2301 changes the core verdict path -- a rollback would be needed if broken
- Recommendation: Deploy during low-activity window, keep kill-switch ready
-
Should PR #17 be the first live test, or a simpler test PR?
- #17 has real complexity (load_state() P1 + 2 P2s)
- A simpler test PR (e.g., typo fix) would verify the loop mechanics first
- Recommendation: Use a simple test PR first, then #17
| File | Lines | Type | Purpose |
|---|---|---|---|
src/pr_dispatch.rs |
+65/-3 | Modified | Template anchors in review prompt |
src/pr_review/extractor.rs |
+327 | NEW | Per-CLI assistant-text extractor |
src/pr_review/poster.rs |
+361 | NEW | Verdict comment poster |
src/pr_review.rs |
+4/-1 | Modified | Module declarations |
src/reconcile_impl.rs |
+122/-19 | Modified | Drain read, Lagged-tolerance, verdict-derived status |
src/lib.rs |
+6 | Modified | verdict_post field on ManagedAgent |
src/spawn_impl.rs |
+1 | Modified | Default verdict_post: None |
src/pr_handlers_impl.rs |
+45 | Modified | Set verdict_post: Some(meta) on pr-reviewer/validator |
tests/fixtures/pr_review/*.log |
+6 files | NEW | Real-captured drain fixtures |
f71b077-- feat(pr_dispatch): carry verdict-template anchors in review task prompta7e8d71-- feat(pr_review): add per-CLI assistant-text extractor with real-captured fixtures72f8bda-- feat(orchestrator): source capture from drain .log + Lagged-tolerant broadcast drain8edc6c6-- feat(pr_review): orchestrator-owned verdict comment poster + verdict-derived statusededa27-- fix(orchestrator): address structural PR review findings on #2301
- Research:
/tmp/agents-fix/.docs/research-2301-verdict-comment-posting.md(382 lines) - Design:
/tmp/agents-fix/.docs/design-2301-verdict-comment-posting.md(406 lines) - Structural Review:
/tmp/agents-fix/.docs/structural-review-2301.md(generated inline in session) - Plan:
/Users/alex/.claude/plans/snuggly-giggling-quill.md
Recommended: Write the 5 integration tests (Step 5), commit, push branch, open PR, monitor through gate, deploy with disabled-first posture, verify on simple test PR, then on #17.
Alternative: Push the 4 commits as-is (without integration tests), open PR, and write tests in parallel while PR is under review.
Risk of delaying: The verdict-comment bug is the binding constraint on the entire ADF auto-merge and remediation system. Every day without this fix means:
- PRs cannot auto-merge (all require manual intervention)
- The remediation loop cannot fire (no CONDITIONAL verdicts to consume)
- PR #17 (and similar below-threshold PRs) sit indefinitely