[AI-897] Deliver Copilot post-sessionEnd session.shutdown tail (live finalize drain) #168
Copilot appends `session.shutdown` (per-model input/cache token aggregates), and sometimes the final assistant turn, to events.jsonl only AFTER the sessionEnd hook returns. The hook's inline-drain runs before that line exists and then kills the live watcher, so the live path never delivers it and input/cache totals stay 0 (batch `kcap import --copilot` is unaffected).

Spawn a detached `kcap copilot-finalize` process from the Copilot sessionEnd hook, AFTER the session-end POST so its poll budget isn't consumed by the pre-drain + POST. It outlives the hook, polls until `session.shutdown` is the terminal transcript line (or a 10s budget), then performs one idempotent inline-drain. Fully decoupled from the hook's return and the server's StopAndDrain, so it can't deadlock or stall them; the server watermark + deterministic event ids make the late delivery safe. The timeout fallback also rescues a dropped final assistant turn when shutdown never lands (crash).

Also redact `WatcherManager.InlineDrainAsync` output via SecretRedactor (it previously skipped redaction, unlike the watcher's drain) since the finalize drain can carry real assistant/tool content.

Server-side ingest of `session.shutdown` -> CopilotUsageBackfilled ships separately in kurrent-io/kcap-server#763; until that merges + deploys this delivers the line but token totals are unchanged.

Tests: 8 new (terminal-line detection incl. resume-safety/malformed; timing through WireMock incl. slow-hook and timeout fallback). Full Unit suite (1544) and Integration (30) green; AOT publish clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
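The "terminal transcript line" check mentioned in this commit message could look roughly like the following. This is an illustrative Python sketch, not the project's .NET implementation: the function name `is_shutdown_terminal`, the assumption that every line of events.jsonl is a JSON object with a `type` field, and the `assistant.message` / `user.message` event names used in the examples are all hypothetical.

```python
import json


def is_shutdown_terminal(transcript_text: str) -> bool:
    """Return True only when the last non-blank line is a session.shutdown event.

    Assumption (not confirmed by the source): each transcript line is a JSON
    object carrying its event name in a "type" field.
    """
    lines = [ln for ln in transcript_text.splitlines() if ln.strip()]
    if not lines:
        return False
    try:
        event = json.loads(lines[-1])
    except json.JSONDecodeError:
        # A malformed or partially flushed tail counts as "not there yet".
        return False
    return isinstance(event, dict) and event.get("type") == "session.shutdown"
```

Checking only the last line is what would make this resume-safe under these assumptions: if a session is resumed and new events follow an earlier `session.shutdown`, that older line is no longer terminal and the check returns False.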
Addresses review on CopilotHookCommand.HandleSessionEnd: the finalizer was spawned AFTER the session-end POST, but PostHookAsync uses the default 30s PostWithRetryAsync budget and Copilot's hook timeout is commonly ~30s too. A slow/unreachable server could let Copilot SIGKILL the hook before the spawn line ran, so no finalizer, and the post-sessionEnd tail is lost: exactly the failure mode this change exists to fix.

Spawn the detached finalizer FIRST, before the capped pre-drain and the retrying POST, so it is guaranteed to be created. It is already detached (setsid + closed std streams), so it survives the hook being killed mid-POST and still delivers the tail.

Because it now starts before the pre-drain + POST, bump its poll budget 10s -> 45s so it outlasts the worst-case hook lifetime (PreHookDrainCap + Copilot's ~30s hook timeout) and is still polling when session.shutdown is flushed after the hook returns/dies.

Kept the lifecycle POST uncapped: capping it risks the session sticking "Active", which PreHookDrainCap exists to prevent.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
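The spawn-first ordering can be illustrated with a short Python sketch (the project itself is .NET, so this is not the actual code). `spawn_detached`, `handle_session_end`, and `PLACEHOLDER_ARGV` are hypothetical names; the only real API used is `subprocess.Popen`, whose `start_new_session=True` makes the child call `setsid()` on POSIX.

```python
import subprocess
import sys

# Placeholder command line standing in for the real finalizer invocation.
PLACEHOLDER_ARGV = [sys.executable, "-c", "pass"]


def spawn_detached(argv):
    """Start argv in its own session with no inherited std streams (POSIX)."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # child calls setsid(), detaching from the hook
        close_fds=True,
    )


def handle_session_end(spawn, pre_drain, post_session_end):
    """Run the three session-end steps with the spawn as the first action."""
    spawn()             # created before anything that can block or be killed
    pre_drain()         # capped inline drain
    post_session_end()  # retrying POST; the hook may die in here
```

With this ordering, a failure or kill during the POST leaves the finalizer already running, which is the property the commit message argues for.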
829fd52 moved the finalizer spawn to the hook's first action (before the pre-drain + session-end POST), but the XML docs on CopilotFinalizeDrainCommand and WatcherManager.SpawnCopilotFinalizeDrain still described the old "after the session-end POST" ordering. Since the spawn timing is the core correctness property, update both so future readers don't reason from the stale order. Comment-only change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
One more stale-doc nit from the rerun: the file-level …
…ordering (review)

The file-level CopilotHookCommand <remarks> wire contract still summarised sessionEnd as "kill watcher + capped inline drain, then POST", omitting the correctness-critical first step added in 829fd52: spawn the detached copilot-finalize drainer before the pre-drain + POST. Update the sessionEnd entry to reflect the final ordering. Comment-only change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Done in the latest commit: updated the …
Approved to merge.
Re-merge after #168 (AI-897 Copilot session.shutdown tail) landed on main. Only conflict was help-usage.txt's plugin flag list — kept the superset (--gemini AND --pi). Program.cs auto-merged (Pi + Gemini hook routing intact).
Problem
In the live hook path, Copilot session token stats show output only: input/cache stay 0. Copilot writes `session.shutdown` (the per-model input/cache token aggregate) to `events.jsonl` after the `sessionEnd` hook returns.

The hook's inline-drain runs before that line exists and then kills the live watcher, so the live path never delivers it. (Batch `kcap import --copilot` reads the complete file and is unaffected.) Secondary risk: a session that ends promptly can also drop its final assistant turn, since the watcher is killed before draining the tail.

Fix
Spawn a detached `kcap copilot-finalize <sid> <path>` process from the Copilot `sessionEnd` hook, after the session-end POST (so its poll budget isn't consumed by the pre-drain + POST). It outlives the hook, polls until `session.shutdown` is the terminal transcript line (or a 10s budget), then performs one idempotent inline-drain.

- Fully decoupled from the hook's return and the server's `StopAndDrainAsync` (which sends `StopWatcher` and waits up to 10s on session-end). The watermark + deterministic event ids make the late delivery safe.
- The timeout fallback also rescues a dropped final assistant turn when `session.shutdown` never lands (e.g. Copilot crash).
- Redacts `WatcherManager.InlineDrainAsync` output via `SecretRedactor`: it previously skipped redaction unlike the watcher's drain, a latent leak now exercised by the finalize tail.

Rejected the alternative "keep the watcher alive to observe shutdown": the server actively stops the watcher on session-end and waits on drain-complete, so lingering would deadlock or force a ~10s stall on every Copilot session-end.
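The poll-then-drain behaviour described in this section can be sketched as follows. This is a hedged Python illustration rather than the project's .NET code: the `finalize` function, its callback parameters, and the 0.25s poll interval are assumptions made for the example, and the 10s default is simply the budget quoted in the text above. The clock and sleep functions are injectable so the timing can be exercised without real waiting.

```python
import time


def finalize(tail_is_shutdown, drain, budget_s=10.0, interval_s=0.25,
             clock=time.monotonic, sleep=time.sleep):
    """Poll until the shutdown line is terminal or the budget expires, then drain once.

    tail_is_shutdown: callable returning True once session.shutdown is the last line.
    drain: callable performing the single inline drain.
    Returns True if shutdown was observed, False if the timeout fallback fired.
    """
    deadline = clock() + budget_s
    saw_shutdown = tail_is_shutdown()
    while not saw_shutdown and clock() < deadline:
        sleep(interval_s)
        saw_shutdown = tail_is_shutdown()
    # Exactly one drain on both paths: the timeout path still delivers
    # whatever tail exists, such as a final assistant turn.
    drain()
    return saw_shutdown
```

Draining unconditionally after the loop is what gives the timeout fallback its value in this sketch: a crash that never writes `session.shutdown` still results in one drain of whatever reached the transcript.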
Dependency
Server-side ingest of `session.shutdown` → `CopilotUsageBackfilled` ships separately in kurrent-io/kcap-server#763. Until that merges + deploys, this PR reliably delivers the line but token totals are unchanged. Real-`copilot` live E2E should run against a #763-inclusive server.

Testing
- … `InlineDrainAsync` redaction change is safe); Integration 30/30 green.
- `copilot-finalize` is an internal command (spawned by the hook, like `watch`/`generate-whats-done`), not user-facing, so no README change.

Linear: https://linear.app/kurrent/issue/AI-897
🤖 Generated with Claude Code