Complete ce-ops triage queue apply mode #767
ce-dev-2 left a comment
Independent review (read-only correctness reviewer) on head b38da6c: REQUEST_CHANGES, with one blocking finding (controller-verified).
BLOCKING: overlapping cron runs can double-create the sentinel. The workflow has no concurrency stanza (verified: zero matches in ce-ops-triage-queue.yml), and _create_queue_comment (ce_ops_triage_queue.py:678-717) has an unprotected read→POST window: if two overlapping scheduled runs both read before either posts, both see no sentinel and both POST, producing two sentinel comments on ce-ops#67 and violating the exactly-once mandate. The post-create re-read tolerates the race but does not prevent it, and the test exercises only sequential calls.
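The read→POST window can be reproduced deterministically with an in-memory stand-in. This is a minimal sketch, not the PR's code: FakeIssue, create_if_absent, and the SENTINEL marker are hypothetical, and a threading.Barrier forces both "runs" to finish their read before either posts. Wrapping the whole read→POST in one lock plays the role that the workflow concurrency group plays across Actions runs.

```python
import threading

SENTINEL = "<!-- ce-ops-triage-queue -->"  # hypothetical marker string


class FakeIssue:
    """In-memory stand-in for the comments on an issue such as ce-ops#67."""

    def __init__(self):
        self.comments = []
        self._mutex = threading.Lock()  # makes each single read or append atomic

    def list_comments(self):
        with self._mutex:
            return list(self.comments)

    def post_comment(self, body):
        with self._mutex:
            self.comments.append(body)


def create_if_absent(issue, after_read=None):
    # Read step: look for an existing sentinel comment.
    found = any(SENTINEL in c for c in issue.list_comments())
    if after_read is not None:
        after_read()  # hold the read->POST window open for the demo
    # Act step: POST only if nothing was found during the read.
    if not found:
        issue.post_comment(SENTINEL + "\nqueue body")


def run_two(issue, worker):
    """Start two overlapping 'runs' and count sentinel comments afterwards."""
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(SENTINEL in c for c in issue.comments)


# 1) Overlapping runs: the barrier releases only after both reads are done.
racy = FakeIssue()
barrier = threading.Barrier(2)
racy_count = run_two(racy, lambda: create_if_absent(racy, after_read=barrier.wait))

# 2) Serialized runs: one lock around the entire read->POST sequence.
serial = FakeIssue()
run_lock = threading.Lock()


def serialized_worker():
    with run_lock:
        create_if_absent(serial)


serial_count = run_two(serial, serialized_worker)
print(racy_count, serial_count)  # prints: 2 1
```

The lock is only an analogy: a GitHub Actions concurrency group allows at most one running and one pending run per group, and a newly queued run replaces any older pending one, so it behaves more like a lock with a latest-wins queue. In both cases no two read→POST windows overlap.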
Required fix (small): add a concurrency group to the workflow (e.g., group: ce-ops-triage-queue with cancel-in-progress: false, so runs queue instead of overlapping or being cancelled mid-apply), and extend the workflow-content test to assert that the stanza exists.
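A content-level assertion of the kind requested could look like the sketch below. WORKFLOW_TEXT is a hypothetical excerpt (the real ce-ops-triage-queue.yml is not shown here, and its schedule trigger is omitted), concurrency_stanza_ok is a hypothetical helper, and the three checks are illustrative rather than a copy of the assertions that landed in the PR.

```python
import re

# Hypothetical excerpt of ce-ops-triage-queue.yml; the real file's layout may differ.
WORKFLOW_TEXT = """\
name: ce-ops-triage-queue
on:
  workflow_dispatch:
concurrency:
  group: ce-ops-triage-queue
  cancel-in-progress: false
jobs:
  triage:
    runs-on: ubuntu-latest
"""


def concurrency_stanza_ok(text: str) -> bool:
    """Plain-text check: a top-level concurrency key, the expected group,
    and cancel-in-progress disabled so an in-flight apply is never killed."""
    has_key = re.search(r"^concurrency:\s*$", text, re.MULTILINE) is not None
    has_group = re.search(r"^\s+group:\s*ce-ops-triage-queue\s*$", text, re.MULTILINE) is not None
    no_cancel = re.search(r"^\s+cancel-in-progress:\s*false\s*$", text, re.MULTILINE) is not None
    return has_key and has_group and no_cancel


def test_workflow_has_concurrency_stanza():
    assert concurrency_stanza_ok(WORKFLOW_TEXT)
    # Deleting the stanza must make the check fail, so the test pins it in place.
    stanza = "concurrency:\n  group: ce-ops-triage-queue\n  cancel-in-progress: false\n"
    assert not concurrency_stanza_ok(WORKFLOW_TEXT.replace(stanza, ""))


test_workflow_has_concurrency_stanza()
```

Matching text rather than parsing YAML keeps the test dependency-free, at the cost of being sensitive to formatting changes in the workflow file.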
Non-blocking (banked): kill-switch behavior is asserted only at the workflow-file-content level. This is acceptable because integration-testing Actions variables is impractical, and the shell gating logic itself reads correctly.
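The gating itself lives in shell, but the decision it makes can be sketched as a pure Python function, which is easy to unit-test. Assumptions: resolve_scheduled_mode is a hypothetical name, and only the literal string false is taken to disarm the kill switch. The review confirms only that unset or invalid values fall back to dry-run and that true is the rollback setting.

```python
from typing import Optional


def resolve_scheduled_mode(kill_switch: Optional[str]) -> str:
    """Return "apply" or "dry-run" for a scheduled run.

    Fails closed: CE_TRIAGE_APPLY_KILL_SWITCH unset (None), empty, "true",
    or any unrecognized value yields dry-run. Assumption: only the exact
    string "false" (whitespace trimmed) enables apply mode.
    """
    if kill_switch is None:
        return "dry-run"
    if kill_switch.strip() == "false":
        return "apply"
    return "dry-run"
```

Failing closed means a typo or a missing variable can only stop mutations, never start them, which is the safe direction for a rollback switch.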
Everything else is verified sound: the sentinel search/patch path; the kill-switch fail direction (unset/invalid → dry-run); the mutation-bound test, which captures create/patch/label calls; preserved dry-run parity; and the intact workflow_dispatch apply input. I will re-review the delta on the updated head.
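One common way to write a mutation-bound test is a recording fake client that captures every mutating call and fails as soon as the cap is exceeded. The sketch below assumes hypothetical method names (create_comment, patch_comment, add_labels) and a hypothetical exception type; it does not reproduce the PR's actual test fixture.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class MutationBudgetExceeded(RuntimeError):
    """Raised when code under test issues more mutations than allowed."""


@dataclass
class RecordingClient:
    """Fake GitHub client: records create/patch/label calls and enforces a cap."""

    max_mutations: int
    calls: List[Tuple[str, tuple]] = field(default_factory=list)

    def _record(self, name: str, *args) -> None:
        if len(self.calls) >= self.max_mutations:
            raise MutationBudgetExceeded(
                f"{name} would exceed {self.max_mutations} mutations"
            )
        self.calls.append((name, args))

    def create_comment(self, issue: int, body: str) -> None:
        self._record("create_comment", issue, body)

    def patch_comment(self, comment_id: int, body: str) -> None:
        self._record("patch_comment", comment_id, body)

    def add_labels(self, issue: int, labels: List[str]) -> None:
        self._record("add_labels", issue, tuple(labels))
```

Raising at the offending call, rather than only checking a count at the end, points the failing test straight at the mutation that broke the bound.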
ce-dev-2 left a comment
Re-review on head 4218364 (supersedes my CHANGES_REQUESTED on b38da6c): APPROVE. The delta is exactly the prescribed remedy and nothing else: a workflow-level concurrency stanza (group: ce-ops-triage-queue, cancel-in-progress: false), so an in-flight apply run is never killed mid-mutation and overlapping cron runs are serialized, closing the sentinel double-create race, plus three content assertions pinning the stanza in test_triage_queue_workflow_scheduled_apply_has_kill_switch. The delta was verified by direct controller inspection of the full 2-hunk diff (the independent venue identified the defect last round; this is its verbatim mechanical fix). The non-blocking item from last round stands as banked. Merging this PR flips scheduled apply mode ON per the Operator's 2026-07-04 night-mandate grant; rollback = CE_TRIAGE_APPLY_KILL_SWITCH=true. The controller will spot-check the first scheduled apply run. CI validate is running on the same head; the daemon defers until it is green.
Summary
Create the triage queue sentinel comment in apply mode when absent, then patch it on later runs.
Flip scheduled triage queue runs to apply mode with CE_TRIAGE_APPLY_KILL_SWITCH as the rollback switch.
Add unit coverage for exactly-once sentinel creation, scheduled kill-switch wiring, and bounded apply mutations.
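The create-when-absent, patch-later flow in the summary can be sketched as a single sync step with dry-run parity: the same lookup runs in both modes, and only apply mode mutates. FakeComments, find_sentinel, sync_queue_comment, and the SENTINEL marker are hypothetical, and skipping the PATCH when the body is unchanged is an assumption, not something the PR states.

```python
from typing import Dict, List, Optional, Tuple

SENTINEL = "<!-- ce-ops-triage-queue -->"  # hypothetical marker


class FakeComments:
    """In-memory stand-in for an issue's comment list."""

    def __init__(self) -> None:
        self.by_id: Dict[int, str] = {}
        self._next_id = 1

    def list(self) -> List[Tuple[int, str]]:
        return sorted(self.by_id.items())

    def create(self, body: str) -> int:
        cid = self._next_id
        self._next_id += 1
        self.by_id[cid] = body
        return cid

    def patch(self, cid: int, body: str) -> None:
        self.by_id[cid] = body


def find_sentinel(comments: FakeComments) -> Optional[int]:
    # Search step: the first comment that starts with the marker wins.
    for cid, body in comments.list():
        if body.startswith(SENTINEL):
            return cid
    return None


def sync_queue_comment(comments: FakeComments, queue_text: str, apply: bool) -> str:
    body = f"{SENTINEL}\n{queue_text}"
    cid = find_sentinel(comments)
    if cid is None:
        if not apply:
            return "would-create"  # dry-run reports the plan without mutating
        comments.create(body)
        return "created"
    if comments.by_id[cid] == body:
        return "unchanged"  # assumption: identical body means no PATCH
    if not apply:
        return "would-patch"
    comments.patch(cid, body)
    return "patched"
```

Run sequentially, this never creates a second sentinel; it is the overlapping-runs case that needs the workflow concurrency group.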
Declared work class: story
Merging flips scheduled apply mode ON (Operator-granted 2026-07-04); rollback: set CE_TRIAGE_APPLY_KILL_SWITCH=true.
Validation
PYTHONPATH=validators .venv-test/bin/python -m pytest validators/tests/unit/test_ce_ops_triage_queue.py -q -> 34 passed.
env -u GH_TOKEN -u BAO_TOKEN -u OPENBAO_TOKEN -u CE_OVERWATCH_PAT TMPDIR=/var/tmp CE_VALIDATOR_PYTHON=.venv-test/bin/python .venv-test/bin/ce validate-pr --repo-root . --base origin/main --head-ref ce-l3-triage-apply-completion --declared-work-class story -> PASS: PR preflight.
ce-approval-capability: v1.eyJhcHByb3ZlZF9ieSI6ImNlLWRldi0yIiwiZXhwaXJlc19hdCI6MTc4MzE0Mjc4MCwiaGVhZF9zaGEiOiI0MjE4MzY0MTJhY2Y5MDcxOTk1OGIyZjg4YzFmNTY5NGIyMmM4YmEzIiwiaXNzdWVkX2F0IjoxNzgzMTM5MTgwLCJwb2xpY3lfc2hhIjoiNzliOWRjOGI0MjllNGFmMTA5ZWNjNjhhZjE5YjI2ZTliM2Y2NDdkOWZhZDhkZDM3MzA5ZDZmNWVhZDgxYjNiMCIsInByX251bWJlciI6NzY3LCJyZXBvIjoiY3JlYXRvci1lbmdpbmUvY3JlYXRvci1lbmdpbmUifQ.ZZnGVa-vVUwVUk6Rl69IqIiswdOSE9ib2PzlRfwXPaQ
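The ce-approval-capability value appears to have the shape v1.<payload>.<signature>, with the middle segment base64url-encoded JSON. That format is inferred from the token itself and is not documented in this PR. The sketch below decodes the claims so they can be cross-checked against the PR. decode_capability_claims is a hypothetical helper, and it does not verify the signature, because the signing algorithm and key are not given.

```python
import base64
import json

TOKEN = (
    "v1."
    "eyJhcHByb3ZlZF9ieSI6ImNlLWRldi0yIiwiZXhwaXJlc19hdCI6MTc4MzE0Mjc4MCwi"
    "aGVhZF9zaGEiOiI0MjE4MzY0MTJhY2Y5MDcxOTk1OGIyZjg4YzFmNTY5NGIyMmM4YmEzIiwi"
    "aXNzdWVkX2F0IjoxNzgzMTM5MTgwLCJwb2xpY3lfc2hhIjoiNzliOWRjOGI0MjllNGFmMTA5"
    "ZWNjNjhhZjE5YjI2ZTliM2Y2NDdkOWZhZDhkZDM3MzA5ZDZmNWVhZDgxYjNiMCIsInByX251"
    "bWJlciI6NzY3LCJyZXBvIjoiY3JlYXRvci1lbmdpbmUvY3JlYXRvci1lbmdpbmUifQ"
    ".ZZnGVa-vVUwVUk6Rl69IqIiswdOSE9ib2PzlRfwXPaQ"
)


def decode_capability_claims(token: str) -> dict:
    """Decode the (assumed) base64url JSON payload; the signature is NOT checked."""
    version, payload, _signature = token.split(".")
    if version != "v1":
        raise ValueError(f"unexpected token version: {version}")
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped '=' padding
    return json.loads(base64.urlsafe_b64decode(padded))


claims = decode_capability_claims(TOKEN)
print(claims["pr_number"], claims["head_sha"][:7], claims["approved_by"])
```

Decoded this way, the claims name PR 767, a head_sha beginning 4218364 (the head approved in the re-review), approver ce-dev-2, and an expiry 3600 seconds after issue. This suggests the capability is bound to a single head commit for one hour, though that reading is an inference from the claims, not a documented guarantee.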