feat(tools): queue hosted-key tool calls instead of failing with 429 #4416
Replace the per-call distributed lock with a Redis-backed FIFO queue so callers within a workspace get strict ordering instead of racing the bucket. Adds heartbeat-based crash recovery and dead-head reaping in a single Lua script. Bumps Exa search hosted RPM from 5 to 60.
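The queue semantics described above can be sketched with an in-memory model. This is only an illustration under stated assumptions: the class name, method signatures, and the heartbeat TTL parameter are hypothetical, and the PR itself implements the reap-and-check step as a single Lua script in Redis rather than in process memory.

```typescript
// Hypothetical in-memory model of a FIFO ticket queue with heartbeats.
// Not the PR's implementation: names and the TTL parameter are assumptions.
type HeadStatus = 'head' | 'waiting' | 'missing'

class TicketQueueSketch {
  private order: string[] = [] // ticket ids in FIFO order
  private heartbeats = new Map<string, number>() // ticket id -> last heartbeat (ms)
  private heartbeatTtlMs: number

  constructor(heartbeatTtlMs: number) {
    this.heartbeatTtlMs = heartbeatTtlMs
  }

  // Appends a ticket and returns its zero-based position.
  enqueue(ticketId: string, nowMs: number): number {
    this.order.push(ticketId)
    this.heartbeats.set(ticketId, nowMs)
    return this.order.length - 1
  }

  // A live waiter calls this periodically so it is not reaped.
  refreshHeartbeat(ticketId: string, nowMs: number): void {
    if (this.heartbeats.has(ticketId)) this.heartbeats.set(ticketId, nowMs)
  }

  // Reaps heads whose heartbeat has lapsed, then reports the caller's status.
  // Doing both in one step mirrors what a single atomic script would give.
  checkHead(ticketId: string, nowMs: number): HeadStatus {
    while (this.order.length > 0) {
      const head = this.order[0]
      const last = this.heartbeats.get(head)
      const alive = last !== undefined && nowMs - last <= this.heartbeatTtlMs
      if (alive) break
      this.order.shift()
      this.heartbeats.delete(head)
    }
    if (!this.heartbeats.has(ticketId)) return 'missing'
    return this.order[0] === ticketId ? 'head' : 'waiting'
  }

  // Removes a ticket regardless of position (e.g. from a finally block).
  dequeue(ticketId: string): void {
    this.order = this.order.filter((id) => id !== ticketId)
    this.heartbeats.delete(ticketId)
  }
}
```

In this model a crashed caller simply stops refreshing its heartbeat, and the next waiter's `checkHead` call removes it once the TTL has elapsed, so the queue cannot stall behind a dead head.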
@BugBot review
PR Summary
Medium Risk
Overview
Adds a new Redis-backed … Tool execution now optionally re-acquires a hosted key and retries once after upstream 429 backoff is exhausted, and Exa search hosted RPM is increased from 5 to 60; tests are expanded to cover queue ordering, heartbeat, cap timeouts, and wait-then-succeed flows.
Reviewed by Cursor Bugbot for commit 0b80ed3.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
@greptile review
Greptile Summary
This PR replaces the immediate-429 behavior for hosted-key rate-limit hits with a per-workspace+provider FIFO queue backed by Redis. Each …
Confidence Score: 5/5
Safe to merge; the queue logic is well-tested, all Redis failure paths fail-open to plain bucket racing, and the FIFO ordering and heartbeat mechanisms are correctly implemented.
The core queue mechanics (enqueue atomicity, Lua-based head reap, heartbeat sharing across wait phases, dequeue in finally, AbortSignal propagation) are all correct. Test coverage is thorough: FIFO ordering, dead-head reap, cap exceeded, abort mid-sleep, low-RPM heartbeat refresh, and no-Redis fallback are all exercised. The only finding is that …
No files require special attention; the telemetry …
Sequence Diagram
sequenceDiagram
participant Caller as Tool Caller
participant RL as HostedKeyRateLimiter
participant Q as HostedKeyQueue (Redis)
participant Bucket as Token Bucket
Caller->>RL: acquireKey(provider, workspaceId, signal)
RL->>Q: enqueue(provider, workspaceId, ticketId)
Q-->>RL: "{ position, enabled }"
loop waitForQueueHead
RL->>Q: checkHead (Lua: reap dead + check position)
Q-->>RL: "waiting | head | missing"
alt not at head and budget remains
RL->>Q: maybeRefreshHeartbeat
RL->>RL: interruptibleSleep(200ms, signal)
end
end
alt queue timed out
RL-->>Caller: 429 (queue wait exceeded)
end
loop waitForActorCapacity
RL->>Bucket: checkActorRateLimit
Bucket-->>RL: allowed? retryAfterMs?
alt not allowed and budget remains
RL->>Q: maybeRefreshHeartbeat
RL->>RL: heartbeatAwareSleep(min(retryAfterMs,10s), signal)
end
end
RL->>Caller: success true, key
RL->>Q: dequeue(ticketId) [finally block]
Note over Caller,Bucket: On upstream 429 after maxRetries
Caller->>RL: "reacquireHostedKey -> acquireKey (fresh ticket)"
RL-->>Caller: fresh key injected into params
Caller->>Caller: executeToolRequest (one final retry)
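The last three steps of the diagram (re-acquire a key after an exhausted upstream 429, then retry once) might look roughly like the sketch below. The function name, the `ToolResult` shape, and the injected callbacks are hypothetical; `execute` stands in for the existing request path including its own backoff, and `reacquireKey` for a fresh pass through the queue.

```typescript
// Hypothetical sketch of "one final retry with a fresh hosted key".
// The callbacks are injected so the flow can be read and tested in isolation.
type ToolResult = { status: number; body: string }

async function executeWithHostedKeyRetry(
  execute: (apiKey: string) => Promise<ToolResult>,
  reacquireKey: () => Promise<string | null>,
  initialKey: string
): Promise<ToolResult> {
  const first = await execute(initialKey)
  // Anything other than an upstream 429 is returned unchanged.
  if (first.status !== 429) return first

  // Queue again for a key; null models "could not acquire within budget".
  const freshKey = await reacquireKey()
  if (freshKey === null) return first

  // Exactly one more attempt, so a persistently throttled upstream
  // cannot turn this into an unbounded retry loop.
  return execute(freshKey)
}
```

Limiting the flow to a single extra attempt keeps the worst-case latency bounded by one additional queue wait plus one request.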
Reviews (3): Last reviewed commit: "feat(rate-limiter): make hosted-key queu..."
…ix heartbeat + telemetry

Tie the per-workspace hosted-key queue wait to the surrounding execution budget instead of a flat 5-minute cap. acquireKey now accepts the execution AbortSignal (threaded from ExecutionContext): when present, the wait is bounded by the run's actual plan timeout / cancellation, with the enterprise async ceiling as a backstop; when absent it falls back to MAX_QUEUE_WAIT_MS. This lets long-running async (Trigger.dev) runs use their full budget while no longer letting a single queued call burn a short sync run's entire budget.

Also addresses Greptile review:
- P1: share one lastHeartbeatAt across all wait phases and cap every sleep to HEARTBEAT_REFRESH_INTERVAL_MS so a long low-RPM retryAfterMs can no longer let the head's heartbeat lapse mid-wait and break FIFO ordering.
- P2: derive hostedKeyQueueWaited telemetry reason from the actual bottleneck (queue_position / dimension / actor_requests) instead of hardcoding it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
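The P1 fix can be illustrated with a small clock simulation. This is a sketch, not the PR's code: the function name and state shape are hypothetical, and the 10-second value for HEARTBEAT_REFRESH_INTERVAL_MS is an assumption taken from the "~10s" cap mentioned elsewhere in this thread.

```typescript
// Assumed value; the PR names the constant but this page does not state it.
const HEARTBEAT_REFRESH_INTERVAL_MS = 10_000

// One timestamp shared by every wait phase of a single acquire call.
interface HeartbeatState {
  lastHeartbeatAt: number
}

// Simulates waiting retryAfterMs in slices capped at the refresh interval.
// Returns the simulated times at which the heartbeat was refreshed.
function simulateCappedWait(
  startMs: number,
  retryAfterMs: number,
  state: HeartbeatState
): number[] {
  const refreshedAt: number[] = []
  const wakeAt = startMs + retryAfterMs
  let now = startMs
  while (now < wakeAt) {
    // Refresh only when the shared timestamp says a refresh is due.
    if (now - state.lastHeartbeatAt >= HEARTBEAT_REFRESH_INTERVAL_MS) {
      state.lastHeartbeatAt = now
      refreshedAt.push(now)
    }
    // Never sleep longer than the refresh interval in one slice.
    now += Math.min(wakeAt - now, HEARTBEAT_REFRESH_INTERVAL_MS)
  }
  return refreshedAt
}
```

Without the cap, a 25-second `retryAfterMs` would be one uninterrupted sleep with no refresh; with it, the loop wakes twice along the way, and because the timestamp is shared, a later phase does not refresh again until the interval has actually elapsed.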
Replace the plain capped sleeps in the queue-head and bucket-capacity wait loops with an interruptibleSleep that resolves early when the execution AbortSignal fires (timeout or cancellation), cleaning up its own timer and listener. Previously a cancelled/timed-out run could overshoot by up to the heartbeat cap (~10s) before the loop re-checked its budget; now it wakes within a tick. The cap remains for heartbeat renewal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
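A minimal sketch of such a sleep, assuming only the behaviour described in the commit message (resolve early on abort, clean up the timer and the listener). The signature is a guess, not the PR's actual code.

```typescript
// Hypothetical sketch: resolves after ms, or as soon as the signal aborts.
// It resolves rather than rejects, leaving the caller's loop to re-check its budget.
function interruptibleSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    // Already cancelled or timed out: do not schedule anything.
    if (signal?.aborted) {
      resolve()
      return
    }

    const onAbort = () => {
      clearTimeout(timer) // drop the pending timer so it cannot fire later
      resolve()
    }

    const timer = setTimeout(() => {
      signal?.removeEventListener('abort', onAbort) // avoid leaking the listener
      resolve()
    }, ms)

    // once: true removes the listener automatically after it fires.
    signal?.addEventListener('abort', onAbort, { once: true })
  })
}
```

A wait loop would call this with the capped slice length and then check `signal.aborted` (or its remaining budget) before the next iteration.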
@greptile review
Summary
platform.hosted_key.queue_waited (with queuePosition field) and platform.hosted_key.queue_wait_exceeded.
Type of Change
Testing
- bun run lint clean
- bun run check:api-validation:strict passes
Checklist