First mate v1b — live integration (supervisor + heartbeat)#3

Open
tomplex wants to merge 7 commits into claude-driver from tc/first-mate-v1b

@tomplex tomplex commented Jun 17, 2026

Owner

Makes the first mate live, on top of the v1a substrate. Stacked on #2 (base is claude-driver, not main) — its diff is v1b-only; retarget to main once #2 lands.

⚠️ Do not merge without a manual demo. Once merged and prod restarts, periscope auto-spawns a budget-spending first-mate Claude on boot and runs Haiku-free heartbeats (the heartbeat itself is pure Python; the spawned first mate is what spends). The live behavior — a real first mate booting, obeying its role, reasoning over pushed digests — is not auto-testable and must be eyeballed first. Tests cover the supervisor/adapter/push/hook logic, not the live Claude.

What it adds (all inside the already-is_prod()-gated activity worker)

  • Supervisor (first_mate.supervisor_pass/_spawn_first_mate): ensures one live bridge:first-mate pane. It spawns via claude_exec() --append-system-prompt ROLE_PROMPT, marks it (first_mate table), and respawns it if it dies. Idempotent (a live marker short-circuits). Self-gates on is_prod(), so dev never spawns.
  • Heartbeat: _worker_tick (worker thread) assembles a side-effect-free per-pane digest (assemble_pane_views + pure _curate_pane, no build_window_view, so it can't race the poll), diffs it against the last push, and stashes a Push; run_worker (main loop) awaits emit_channel_event, advancing _LAST_SENT only on a successful send (a failed send is retried on the next tick).
  • need_human interrupt hook in _do_notify_tool — immediate push to the first mate, out of band from the 30s heartbeat.
  • fleet_digest pull tool (the one v1a deferred) + the ROLE_PROMPT (chief-of-staff role, standing-tier only, absolute prohibitions incl. never-merge-fdy).
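
The supervisor's marker handling can be sketched roughly as follows. This is a minimal illustration, not the PR's code: MarkerStore, the live_panes set, the spawn callable, and the is_prod flag passed as an argument are all hypothetical stand-ins for the first_mate table, the tmux pane listing, _spawn_first_mate, and is_prod(). It also assumes a failed window create surfaces as OSError.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MarkerStore:
    """Hypothetical stand-in for the first_mate table: at most one pane handle."""
    pane_id: Optional[str] = None


def supervisor_pass(
    store: MarkerStore,
    live_panes: set[str],
    spawn: Callable[[], str],
    is_prod: bool,
) -> Optional[str]:
    """Ensure at most one live first-mate pane; return its handle, or None."""
    if not is_prod:
        return None  # self-gate: dev never spawns
    if store.pane_id is not None and store.pane_id in live_panes:
        return store.pane_id  # a live marker short-circuits (idempotent)
    store.pane_id = None  # marker is missing or points at a dead pane
    try:
        pane_id = spawn()
    except OSError:  # assumption: a failed window create raises OSError
        return None  # leave the marker unset
    if not pane_id:
        return None  # empty handle read: never stamp pane_id=""
    store.pane_id = pane_id
    return pane_id
```

Calling supervisor_pass repeatedly while the marked pane is in live_panes never invokes spawn again; once the pane disappears from live_panes, the next pass respawns and re-marks.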

Correctness points worth a reviewer's eye

  • Event-loop hoist (load-bearing). The emit is awaited on the main loop, never asyncio.run in the worker thread — _MCP_SESSIONS streams are loop-affine; an in-thread loop would be a cross-loop bug. The need_human hook uses _task(create_task(...)) because _do_notify_tool already runs on the main loop.
  • handle = tmux pane_id (%N), not @periscope_id — worker rows carry no resolved pid (resolution writes state.json and isn't thread-safe).
  • Phantom-marker guard (code-review catch): if the window create fails or the %N read comes back empty, it leaves the marker unset rather than stamping pane_id="" — which would've been never-in-the-live-set and respawned a window every tick (unbounded budget leak). Regression-tested.
  • run_worker stays mocked in test_app.py — so the in-tick supervisor/heartbeat never fires a live pass during pytest (budget-safety landmine).
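
The thread-to-loop hand-off behind the heartbeat can be sketched as below. It is a simplified illustration under stated assumptions: worker_tick, heartbeat_once, and the emit callable are hypothetical stand-ins for _worker_tick, the run_worker loop body, and emit_channel_event; the digest format is invented; and a failed send is assumed to raise ConnectionError. The point it shows is that the pure digest work runs off-loop while the emit is awaited on the main loop, and the last-sent value advances only after a successful send.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class Push:
    """Hypothetical payload stashed by the worker thread for the main loop."""
    digest: str


def worker_tick(panes: dict[str, str], last_sent: Optional[str]) -> Optional[Push]:
    """Runs in a worker thread: pure computation, never touches the event loop."""
    digest = "\n".join(f"{pane}: {state}" for pane, state in sorted(panes.items()))
    if digest == last_sent:
        return None  # nothing changed since the last successful push
    return Push(digest)


async def heartbeat_once(
    panes: dict[str, str],
    last_sent: Optional[str],
    emit: Callable[[str], Awaitable[None]],
) -> Optional[str]:
    """Runs on the main loop; returns the new last-sent digest."""
    push = await asyncio.to_thread(worker_tick, panes, last_sent)
    if push is None:
        return last_sent
    try:
        await emit(push.digest)  # awaited here, not via asyncio.run in the thread
    except ConnectionError:  # assumption: a failed send raises ConnectionError
        return last_sent  # not advanced, so the next tick retries
    return push.digest
```

Because the returned value is unchanged after a failed emit, the same digest is recomputed as "different from last sent" on the following tick and sent again.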

Provenance & tests

structure → (event-loop + spec corrections) → plan → plan-review (caught 2 Must-fix: pid-empty handle, async tests no-op) → subagent execution → 2-stage review (code-review caught the phantom-marker bug). uv run pytest -q: 706 passed, incl. a real @needs_tmux spawn/respawn integration test (isolated -L socket, cat stub exec).

🤖 Generated with Claude Code
