bug: paused-at-gate sessions stuck at status='new' instead of transitioning to 'awaiting_input' #42

@aksOps

Summary

When a session pauses at a HITL gate (gate_fired event), the session row's status column stays at 'new' instead of transitioning to 'awaiting_input'. The graph IS paused correctly, but the persisted status doesn't reflect that — UIs that filter by status='awaiting_input' (like the approvals queue) miss it.

This is the sibling case of CRITICAL #1 (finalizer asymmetry) closed in v2.0.0-rc3 / PR #41. The rc3 fix correctly handles graph-completed-without-pause via _finalize_session_status_async(). The paused branch was left untouched and the status-write that should accompany gate_fired is missing.

Reproduction (live, against rc3 backend on main @ 8be7ea2)

# Backend on uvicorn :37777 serving SPA + API, Ollama gpt-oss provider.
SID=$(curl -sf -X POST https://clm.randomcodespace.dev/api/v1/sessions \
  -H "Content-Type: application/json" \
  -d '{"query":"rc3 agent_running smoke","environment":"dev","submitter":{"id":"t"}}' \
  | jq -r .session_id)
# Wait ~15s for the LLM to drive the graph through to the gate.
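# Parentheses matter: in jq "," binds tighter than "|", so without them `length` is also applied to the status string.
curl -sf "https://clm.randomcodespace.dev/api/v1/sessions/$SID/full" | jq '.session.status, ([.events[] | select(.kind == "gate_fired")] | length)'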

Observed for INC-20260516-061:

  • session.status → 'new'
  • events[].kind == 'gate_fired' → 1 occurrence (local_remediation:apply_fix, reason high_risk)
  • agents_run → 2 entries (triage, deep_investigator), resolution started but didn't finish
  • Event log clearly shows the graph paused at the gate

INC-20260516-058 (from earlier today, pre-PR #41 backend) has the same symptom — this is pre-existing, not introduced by rc3.
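For triage, the symptom above can be checked mechanically against the `/full` payload. This is a minimal sketch, assuming only the payload shape implied by the jq query in the reproduction (`session.status` and `events[].kind`); the helper name `is_stuck_at_gate` is hypothetical and not part of the codebase.

```python
def is_stuck_at_gate(full: dict) -> bool:
    """Return True when a gate fired but the persisted status is still 'new'."""
    # Payload shape is assumed from the jq query: .session.status and .events[].kind
    gate_fired = any(e.get("kind") == "gate_fired" for e in full.get("events", []))
    status = full.get("session", {}).get("status")
    return gate_fired and status == "new"


# Illustrative payloads, not real session data.
stuck = {"session": {"status": "new"}, "events": [{"kind": "gate_fired"}]}
healthy = {"session": {"status": "awaiting_input"}, "events": [{"kind": "gate_fired"}]}

print(is_stuck_at_gate(stuck))    # True
print(is_stuck_at_gate(healthy))  # False
```

Running this over recent sessions would show whether other incidents besides the two listed here are affected.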

Expected behavior

When the gateway emits gate_fired, the session row's status should be updated to 'awaiting_input', and a session.status_changed event with to='awaiting_input' should fire.

Suggested fix

Likely lives next to where gate_fired is emitted — either in src/runtime/tools/gateway.py (where the gate decision lands) or where the graph reads the gate signal in src/runtime/graph.py. Add a paired event_log.record(sid, "session.status_changed", payload={"from": ..., "to": "awaiting_input"}) AND a SessionStore.update_status(sid, "awaiting_input") write, both in the same transaction as the audit row.
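To illustrate the "same transaction" requirement, here is a rough sketch using an in-memory SQLite database. The table layout, column names, and the `mark_awaiting_input` helper are all assumptions made for the example; the real SessionStore and event_log APIs may look quite different.

```python
import sqlite3


def mark_awaiting_input(conn: sqlite3.Connection, sid: str) -> None:
    """Write the status change and its audit event in one transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the status row and the event row can never disagree.
    with conn:
        row = conn.execute(
            "SELECT status FROM sessions WHERE id = ?", (sid,)
        ).fetchone()
        if row is None:
            raise KeyError(sid)
        previous = row[0]
        conn.execute(
            "UPDATE sessions SET status = 'awaiting_input' WHERE id = ?", (sid,)
        )
        conn.execute(
            "INSERT INTO events (session_id, kind, from_status, to_status) "
            "VALUES (?, 'session.status_changed', ?, 'awaiting_input')",
            (sid, previous),
        )


# Hypothetical schema, just enough to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE events (session_id TEXT, kind TEXT, from_status TEXT, to_status TEXT)"
)
conn.execute("INSERT INTO sessions VALUES ('s1', 'new')")
conn.commit()

mark_awaiting_input(conn, "s1")
```

Reading the previous status inside the same block is what lets the event payload carry an accurate `from` value.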

Reuse the same async-finalizer machinery from rc3 (_finalize_session_status_async) but with the paused branch taken explicitly: when is_graph_paused returns True after ainvoke, set status to awaiting_input.
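The explicit paused branch might look roughly like the sketch below. `FakeGraph`, `FakeStore`, and `run_and_finalize` are stand-ins invented for illustration, the `is_graph_paused` signature is a guess, and `"completed"` is only a placeholder for whatever terminal status the rc3 finalizer actually writes.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeGraph:
    """Stand-in for the compiled graph; pause_at_gate simulates a HITL interrupt."""
    pause_at_gate: bool
    paused: bool = False

    async def ainvoke(self, state: dict) -> dict:
        # A real graph would run nodes here; the fake only records the pause.
        self.paused = self.pause_at_gate
        return state


def is_graph_paused(graph: FakeGraph) -> bool:
    # Assumed signature; the real helper may take a config or thread id instead.
    return graph.paused


@dataclass
class FakeStore:
    statuses: dict = field(default_factory=dict)

    def update_status(self, sid: str, status: str) -> None:
        self.statuses[sid] = status


async def run_and_finalize(graph: FakeGraph, store: FakeStore, sid: str) -> str:
    await graph.ainvoke({"session_id": sid})
    if is_graph_paused(graph):
        # Paused branch: this is the write the issue reports as missing.
        status = "awaiting_input"
    else:
        # Placeholder for the completed-without-pause path handled in rc3.
        status = "completed"
    store.update_status(sid, status)
    return status


store = FakeStore()
print(asyncio.run(run_and_finalize(FakeGraph(pause_at_gate=True), store, "sid-1")))
```

Keeping both branches in one function makes the asymmetry hard to reintroduce, since every exit from `ainvoke` ends in a status write.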

Regression test

In tests/test_finalizer_paths.py add a case where the fake graph pauses at a gate (interrupt). Assert:

  1. store.load(sid).status == 'awaiting_input'
  2. event_log.iter_for(sid) contains a session.status_changed with to='awaiting_input'
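A possible shape for that test case, written against in-memory fakes. Only `store.load`, `SessionStore.update_status`, `event_log.record`, and `event_log.iter_for` come from this issue; the fake classes, the `create` helper, and `run_until_gate` (a stand-in for the real code under test) are hypothetical.

```python
import asyncio
from types import SimpleNamespace


class FakeStore:
    def __init__(self):
        self._rows = {}

    def create(self, sid):
        self._rows[sid] = SimpleNamespace(status="new")

    def update_status(self, sid, status):
        self._rows[sid].status = status

    def load(self, sid):
        return self._rows[sid]


class FakeEventLog:
    def __init__(self):
        self._events = []

    def record(self, sid, kind, payload=None):
        self._events.append((sid, SimpleNamespace(kind=kind, payload=payload or {})))

    def iter_for(self, sid):
        return (event for owner, event in self._events if owner == sid)


async def run_until_gate(store, event_log, sid):
    """Stand-in for the runtime path: the fake graph pauses at a gate."""
    event_log.record(sid, "gate_fired", {"reason": "high_risk"})
    previous = store.load(sid).status
    store.update_status(sid, "awaiting_input")
    event_log.record(
        sid, "session.status_changed", {"from": previous, "to": "awaiting_input"}
    )


def test_paused_at_gate_sets_awaiting_input():
    store, event_log, sid = FakeStore(), FakeEventLog(), "sid-1"
    store.create(sid)

    asyncio.run(run_until_gate(store, event_log, sid))

    # 1. The persisted status reflects the pause.
    assert store.load(sid).status == "awaiting_input"
    # 2. A matching status_changed event was recorded.
    changes = [e for e in event_log.iter_for(sid) if e.kind == "session.status_changed"]
    assert any(e.payload.get("to") == "awaiting_input" for e in changes)
```

In the real test, `run_until_gate` would be replaced by the actual session runner driven by the fake graph, so the test fails on current main and passes once the fix lands.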

Touches v2.1 scope

This is the obvious next item after the v2.0.0-rc3 audit fixes. Tagging for the v2.0.0-rc4 / v2.0.0 GA cleanup pass.
