
fix(loops): sandbox live-path — sandbox 0.4.3 (SSE reconnect) + poll session-id + fail-loud infra errors#189

Merged: drewstone merged 1 commit into main from fix/sandbox-live-path-sse on Jun 7, 2026


@drewstone

What

The in-box agentic batch path (runLoop → sandbox → benchmark) was losing its SSE prompt stream on long quiet turns (clone/build/test), surfacing as "Stream dropped without terminal event" / "Replay endpoint 404". Root cause: the bench resolved @tangle-network/sandbox 0.4.2, which drops the stream on a long tail. 0.4.3's reconnect-backoff survives it.
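For readers unfamiliar with the pattern, here is a minimal sketch of what a reconnect-with-backoff loop generally looks like. It is not the sandbox SDK's code: `backoffDelayMs`, `streamWithReconnect`, the `connect` callback, and every default value are hypothetical names and numbers chosen for illustration.

```typescript
// Illustrative only: NOT @tangle-network/sandbox's implementation.
// All names and defaults below are assumptions made for this sketch.

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// `connect` opens (or re-opens) the stream and resolves to true once a
// terminal event has been seen, or false if the stream dropped early.
async function streamWithReconnect(
  connect: () => Promise<boolean>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const sawTerminalEvent = await connect();
    if (sawTerminalEvent) return true;
    // Stream dropped without a terminal event: wait, then reconnect.
    await sleep(backoffDelayMs(attempt));
  }
  return false; // gave up; the caller should report the drop loudly
}
```

The point of the pattern is that a dropped connection on a long quiet turn is treated as recoverable until the retry budget runs out, rather than as a failed run.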

Verified

A commit0 in-box rollout (clone real repo → opencode implements → repo pytest judges) streamed >520s with zero drops on 0.4.3, completed end-to-end, wrote real RunRecords. On 0.4.2 the same path dropped at ~256s.

Changes

  • bench: @tangle-network/sandbox ^0.4.0 → ^0.4.3.
  • experiment.ts: surface the underlying iter0.error (an [infra-cause] line) instead of hiding every failure behind a generic INFRA-ERROR label — a swallowed-error fail-loud gap that cost real debugging time.
  • sandbox-lineage.ts poll-mode: follow the session id dispatchPrompt actually assigned (DispatchedSession.sessionId), not the client-minted one. Polling the wrong id 404s the session-events endpoint.
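The experiment.ts change can be pictured roughly as follows. This is a sketch under assumptions: the `IterationResult` shape and the `describeInfraFailure` helper are hypothetical, and the real code in experiment.ts may be structured differently. Only the `INFRA-ERROR` label, the `[infra-cause]` prefix, and the `iter0.error` field come from the description above.

```typescript
// Hypothetical shape; the real iteration type in experiment.ts may differ.
interface IterationResult {
  ok: boolean;
  error?: string; // underlying failure text, when one was captured
}

// Build the failure text for a run whose first iteration (iter0) failed.
// The generic label is kept so existing greps still match, and the real
// cause is appended on its own [infra-cause] line instead of being dropped.
function describeInfraFailure(iter0: IterationResult): string {
  const cause = iter0.error?.trim();
  const causeLine = cause
    ? `[infra-cause] ${cause}`
    : "[infra-cause] (no underlying error recorded)";
  return `INFRA-ERROR\n${causeLine}`;
}

console.log(describeInfraFailure({ ok: false, error: "Replay endpoint 404" }));
```

Emitting a placeholder cause line when no error was captured keeps the output shape constant, so a missing cause is itself visible rather than silent.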

Residual (not fixed here)

Poll-mode (dispatchPrompt + session().result()) is still gated on a platform session-events endpoint for dispatched sessions (result() → _sessionEvents → 404 on all SDK versions). SSE is the working batch path; poll-mode remains opt-in and currently unusable until the platform serves events for dispatched sessions.

…, poll follows assigned session id, surface infra errors

The in-box agentic batch path (runLoop -> sandbox -> benchmark) lost its SSE
stream on long quiet turns (clone/build/test): the bench resolved
@tangle-network/sandbox 0.4.2, which drops the prompt stream without a terminal
event on a long tail. 0.4.3's reconnect-backoff survives it — a commit0 in-box
rollout streamed >520s with zero drops on 0.4.3.

- bench: @tangle-network/sandbox ^0.4.0 -> ^0.4.3
- experiment.ts: surface the underlying iter0 error instead of hiding it behind a
  generic INFRA-ERROR (a swallowed-error fail-loud gap)
- sandbox-lineage poll-mode: follow the session id dispatchPrompt assigned
  (DispatchedSession.sessionId), not the client-minted one — polling the wrong id
  404s the session-events endpoint. Poll-mode still needs a platform session-events
  endpoint for dispatched sessions; SSE is the working batch path.
@tangletools

⚠️ Review Incomplete — 2b23c33a

At least one required reviewer lane failed closed. No approval or request-changes review was published. Trigger a fresh review on the current PR head.

tangletools · 2026-06-07T00:46:41Z

@tangletools

✅ No Blockers — 2b23c33a

Readiness 92/100 · Confidence 80/100 · 1 finding (1 low)

deepseek: Correctness 92 · Security 92 · Testing 92 · Architecture 92

Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision.

🟡 LOW No test coverage for the platform-session-remap scenario — src/runtime/sandbox-lineage.ts

The poll-mode test at tests/loops/sandbox-lineage.test.ts:167-221 uses a fake dispatchPrompt that returns o?.sessionId ?? 'minted' — always echoing the supplied ID. The core fix (using the platform-assigned sessionId) has no test where dispatchPrompt returns a different ID than the one supplied. Adding a test with a divergent sessionId would cover the exact bug this PR fixes. Impact: low — the fix is straightforward and the existing test proves dispatchPrompt fires and the result path works end-to-end.
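A test along the lines suggested above might look like the sketch below. It is an assumption-heavy illustration, not the repository's test: the `DispatchPrompt` signature is guessed from the fake described in this finding, and `dispatchAndResolveSessionId`, `remappingDispatch`, and `checkFollowsAssignedId` are hypothetical names. Only `DispatchedSession.sessionId` and the `'minted'` id appear in the PR.

```typescript
// Assumed minimal shapes; only DispatchedSession.sessionId is named in the PR.
interface DispatchedSession {
  sessionId: string;
}
type DispatchPrompt = (
  prompt: string,
  opts?: { sessionId?: string },
) => Promise<DispatchedSession>;

// Hypothetical stand-in for the poll-mode fix: follow the id the platform
// assigned, not the client-minted id that was passed in.
async function dispatchAndResolveSessionId(
  dispatchPrompt: DispatchPrompt,
  prompt: string,
  mintedId: string,
): Promise<string> {
  const dispatched = await dispatchPrompt(prompt, { sessionId: mintedId });
  return dispatched.sessionId;
}

// Fake that remaps the id, unlike a fake that echoes the supplied one.
const remappingDispatch: DispatchPrompt = async () => ({
  sessionId: "platform-assigned",
});

// The divergent-id check: poll-mode must end up on the platform's id.
async function checkFollowsAssignedId(): Promise<void> {
  const polledId = await dispatchAndResolveSessionId(
    remappingDispatch,
    "run the test suite",
    "minted",
  );
  if (polledId !== "platform-assigned") {
    throw new Error(`polled the wrong session id: ${polledId}`);
  }
}
```

With an echoing fake this check passes whether or not the fix is present; with the remapping fake it fails on any code path that keeps polling `'minted'`.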


tangletools · 2026-06-07T01:09:23Z · trace


@tangletools tangletools left a comment


✅ Approved — 1 non-blocking finding — 2b23c33a

Full multi-shot audit completed 4/4 planned shots over 4 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary


tangletools · 2026-06-07T01:09:23Z · immutable trace

@drewstone drewstone merged commit cb1029a into main Jun 7, 2026
1 check passed