feat(tools): queue hosted-key tool calls instead of failing with 429 by TheodoreSpeaks · Pull Request #4416 · simstudioai/sim

TheodoreSpeaks · 2026-05-03T01:42:48Z

Summary

Hosted-key tool calls (Sim-provided keys, not BYOK) now enqueue onto a per-workspace+provider FIFO queue. Only the head of the queue consumes from the token bucket, so ordering is strict and callers never race for tokens.
Different workspaces have independent queues. BYOK paths short-circuit before any of this and are unaffected.
Total wait (queue position + bucket refill) is capped at 5 minutes; a call that exceeds the cap returns the existing 429 result.
Crash-tolerant: each ticket has a heartbeat key (TTL 30s, refreshed every 10s while waiting). Dead heads are reaped lazily by the next caller. The queue list itself has a 10-minute TTL, so fully abandoned queues expire.
Each poll runs a single Lua script that atomically reaps dead heads, checks the head, and confirms the caller's own ticket is still present, which keeps Redis traffic low under contention.
Bump Exa search hosted RPM from 5 → 60.
New telemetry: platform.hosted_key.queue_waited (with queuePosition field) and platform.hosted_key.queue_wait_exceeded.
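The head-only rule above can be modeled in a few lines. This is a minimal in-memory sketch, assuming a single process and a caller-supplied clock; `TokenBucket`, `InMemoryHostedKeyQueue`, and `tryAcquire` are hypothetical names, and the bucket's burst capacity is an assumption. The PR itself keeps this state in Redis (a list per workspace+provider plus a Lua script), which the sketch does not reproduce.

```typescript
// Hypothetical in-memory model of per-workspace+provider FIFO queueing.
// It illustrates the ordering rule only; the PR uses a Redis list and a Lua script.

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly ratePerMinute: number;

  constructor(ratePerMinute: number, nowMs: number) {
    this.ratePerMinute = ratePerMinute;
    this.tokens = ratePerMinute; // assumption: burst capacity equals the per-minute rate
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number): boolean {
    const elapsedMs = nowMs - this.lastRefillMs;
    this.tokens = Math.min(
      this.ratePerMinute,
      this.tokens + (elapsedMs * this.ratePerMinute) / 60_000,
    );
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

class InMemoryHostedKeyQueue {
  private readonly queues = new Map<string, string[]>();

  private key(workspaceId: string, provider: string): string {
    return `${workspaceId}:${provider}`;
  }

  // Like RPUSH: appends the ticket and returns its zero-based position.
  enqueue(workspaceId: string, provider: string, ticketId: string): number {
    const k = this.key(workspaceId, provider);
    const list = this.queues.get(k) ?? [];
    list.push(ticketId);
    this.queues.set(k, list);
    return list.length - 1;
  }

  isHead(workspaceId: string, provider: string, ticketId: string): boolean {
    return this.queues.get(this.key(workspaceId, provider))?.[0] === ticketId;
  }

  dequeue(workspaceId: string, provider: string, ticketId: string): void {
    const list = this.queues.get(this.key(workspaceId, provider));
    if (!list) return;
    const idx = list.indexOf(ticketId);
    if (idx !== -1) list.splice(idx, 1);
  }
}

// Only the head may touch the bucket; every other ticket keeps waiting.
function tryAcquire(
  queue: InMemoryHostedKeyQueue,
  bucket: TokenBucket,
  workspaceId: string,
  provider: string,
  ticketId: string,
  nowMs: number,
): boolean {
  if (!queue.isHead(workspaceId, provider, ticketId)) return false;
  return bucket.tryConsume(nowMs);
}
```

Because a non-head ticket returns before touching the bucket, a burst of callers cannot drain tokens out of order; the cost is that every waiter keeps polling until it reaches the head.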

Type of Change

New feature

Testing

39 hosted-key tests pass (15 queue + 24 rate-limiter, including FIFO ordering, head-only consume, dead-head reap, cap-exceeded, missing-ticket fall-through)
141/141 tests pass across the rate-limiter and tools regression suites
Manually verified in dev: queue depth, head rotation, heartbeat refresh, and a drain rate matching the bucket config
bun run lint clean
bun run check:api-validation:strict passes

Checklist

Code follows project style guidelines
Self-reviewed my changes
Tests added/updated and passing
No new warnings introduced
I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

vercel · 2026-05-03T01:42:53Z

The latest updates on your projects.

1 Skipped Deployment

Project	Deployment	Updated (UTC)
docs	Skipped	May 26, 2026 7:38pm

Replace the per-call distributed lock with a Redis-backed FIFO queue so callers within a workspace get strict ordering instead of racing the bucket. Adds heartbeat-based crash recovery and dead-head reaping in a single Lua script. Bumps Exa search hosted RPM from 5 to 60.

TheodoreSpeaks · 2026-05-05T01:34:07Z

@BugBot review

cursor · 2026-05-05T01:34:15Z

PR Summary

Medium Risk
Introduces blocking FIFO queueing and wait loops around hosted-key acquisition (Redis + polling/heartbeats) and changes tool retry behavior to re-enter the queue after upstream 429s, which could impact throughput/latency and failure modes under contention or Redis issues.

Overview
Hosted-key acquisition is changed from immediate token-bucket racing/429s to a per-workspace+provider FIFO queue: callers enqueue, wait until they reach the head (with heartbeat refresh), then wait for actor and (custom) dimension capacity up to a 5-minute cap before returning the existing 429-style error.
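As a rough guide to how the 5-minute cap interacts with RPM, assume the bucket drains at a steady `rpm` tokens per minute with no burst left: a caller with `position` tickets ahead of it waits about `position × 60 s / rpm`. The helpers below are hypothetical and only do that arithmetic. Under these assumptions, 5 RPM fits about 25 queued callers inside the cap and 60 RPM about 300, which illustrates how much headroom the Exa bump adds.

```typescript
// Rough arithmetic, assuming a steady drain of `rpm` tokens per minute
// and no burst headroom. Hypothetical helpers, not part of the PR.

const MAX_QUEUE_WAIT_MS = 5 * 60_000;

// Approximate wait for a caller with `position` tickets ahead of it.
function estimatedWaitMs(position: number, rpm: number): number {
  return (position * 60_000) / rpm;
}

// Largest position whose estimated wait still fits inside the cap.
function maxPositionWithinCap(rpm: number, capMs: number = MAX_QUEUE_WAIT_MS): number {
  return Math.floor((capMs * rpm) / 60_000);
}
```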

Adds a new Redis-backed HostedKeyQueue (Lua-based checkHead with dead-head reaping, TTLs, and fail-open behavior when Redis is unavailable) plus new telemetry events platform.hosted_key.queue_waited and platform.hosted_key.queue_wait_exceeded.
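The per-poll check (reap dead heads, then report head / waiting / missing) can be sketched as a plain function over in-memory state. This is an illustration under stated assumptions, not the PR's Lua script: `QueueState`, `checkHead`, and `checkHeadFailOpen` are hypothetical, heartbeat keys are modeled as expiry timestamps, and failing open is modeled as treating the caller as the head so it falls back to plain bucket racing.

```typescript
// Hypothetical model of the per-poll check. In the PR this runs as one Lua EVAL
// so reap + head check + presence check are atomic; here plain in-memory
// state stands in for Redis.

type HeadStatus = "head" | "waiting" | "missing";

interface QueueState {
  tickets: string[]; // FIFO order; index 0 is the head
  heartbeatExpiresAt: Map<string, number>; // ticketId -> heartbeat expiry (ms)
}

function checkHead(state: QueueState, ticketId: string, nowMs: number): HeadStatus {
  // Reap dead heads: a head whose heartbeat has expired (or was never set)
  // is assumed to belong to a crashed caller. Only heads are reaped.
  for (;;) {
    const head = state.tickets[0];
    if (head === undefined) break;
    const expiresAt = state.heartbeatExpiresAt.get(head);
    if (expiresAt !== undefined && expiresAt > nowMs) break;
    state.tickets.shift();
    state.heartbeatExpiresAt.delete(head);
  }
  if (state.tickets[0] === ticketId) return "head";
  return state.tickets.includes(ticketId) ? "waiting" : "missing";
}

// Assumption: on a storage error, behave as if at the head (plain bucket racing).
function checkHeadFailOpen(check: () => HeadStatus): HeadStatus {
  try {
    return check();
  } catch {
    return "head";
  }
}
```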

Tool execution now optionally re-acquires a hosted key and retries once after upstream 429 backoff is exhausted, and Exa search hosted RPM is increased from 5 to 60; tests are expanded to cover queue ordering, heartbeat, cap timeouts, and wait-then-succeed flows.
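The "retry once after upstream 429 backoff is exhausted" flow can be sketched as a small wrapper. `executeWithReacquire`, `RequestResult`, and the injected `sleep`/`backoffMs` functions are hypothetical; the PR's actual hook is named `reacquireAfterRetriesExhausted`, and its exact signature is not shown here.

```typescript
// Hypothetical sketch: local exponential backoff on upstream 429s, then one
// re-queue for a fresh hosted key and a single final attempt.

interface RequestResult {
  status: number;
}

async function executeWithReacquire(
  execute: (apiKey: string) => Promise<RequestResult>,
  initialKey: string,
  reacquireKey: () => Promise<string | null>,
  maxRetries: number,
  backoffMs: (attempt: number) => number,
  sleep: (ms: number) => Promise<void>,
): Promise<RequestResult> {
  let result = await execute(initialKey);
  for (let attempt = 0; attempt < maxRetries && result.status === 429; attempt++) {
    await sleep(backoffMs(attempt));
    result = await execute(initialKey);
  }
  if (result.status !== 429) return result;

  // Local retries are exhausted: go back through the queue exactly once.
  const freshKey = await reacquireKey();
  if (freshKey === null) return result; // could not re-acquire within budget; surface the 429
  return execute(freshKey);
}
```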

Reviewed by Cursor Bugbot for commit 0b80ed3. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


TheodoreSpeaks · 2026-05-26T17:36:28Z

@greptile review

greptile-apps · 2026-05-26T17:45:56Z

Greptile Summary

This PR replaces the immediate-429 behavior for hosted-key rate-limit hits with a per-workspace+provider FIFO queue backed by Redis. Each acquireKey call enqueues a ticket, polls until it reaches the head of the queue, and only then attempts to consume from the token bucket; the ticket is always dequeued in a finally block regardless of outcome.
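The acquire lifecycle in this paragraph (enqueue, poll until head, consume, dequeue in `finally`) can be sketched as follows. `QueueLike`, `BucketLike`, and this `acquireKey` signature are assumptions for illustration; the 200 ms poll interval is taken from the sequence diagram, and the deadline and clock are injected so the example runs without Redis.

```typescript
// Hypothetical acquire lifecycle: enqueue, wait for the head, wait for bucket
// capacity, and always dequeue in `finally`. Interfaces are illustrative.

interface QueueLike {
  enqueue(ticketId: string): Promise<number>;
  isHead(ticketId: string): Promise<boolean>;
  dequeue(ticketId: string): Promise<void>;
}

interface BucketLike {
  tryConsume(): Promise<boolean>;
}

type AcquireResult = { success: true } | { success: false; status: 429 };

async function acquireKey(
  queue: QueueLike,
  bucket: BucketLike,
  ticketId: string,
  deadlineMs: number,
  now: () => number,
  sleep: (ms: number) => Promise<void>,
  pollMs = 200,
): Promise<AcquireResult> {
  await queue.enqueue(ticketId);
  try {
    // Phase 1: wait until this ticket is the head of the queue.
    while (!(await queue.isHead(ticketId))) {
      if (now() >= deadlineMs) return { success: false, status: 429 };
      await sleep(pollMs);
    }
    // Phase 2: only the head consumes from the bucket.
    while (!(await bucket.tryConsume())) {
      if (now() >= deadlineMs) return { success: false, status: 429 };
      await sleep(pollMs);
    }
    return { success: true };
  } finally {
    // Runs on success, on timeout, and if anything above throws.
    await queue.dequeue(ticketId);
  }
}
```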

Queue lifecycle (queue.ts): RPUSH/EXPIRE/SET on enqueue, a single Lua EVAL that atomically reaps dead heads and returns head/waiting/missing status on every poll, and LREM/DEL on dequeue. Heartbeat keys (30s TTL, refreshed every 10s via shared WaitState) prevent live callers from being reaped as dead. All Redis operations fail-open so the system degrades to plain bucket-racing when Redis is unavailable.
Rate-limiter refactor (hosted-key-rate-limiter.ts): acquireKey is restructured into waitForQueueHead → waitForActorCapacity → waitForDimensionCapacity; each phase shares a single WaitState.lastHeartbeatAt (fixing the heartbeat-expiry regression from earlier review), and heartbeatAwareSleep caps every bucket-wait sleep at the heartbeat interval. Wait budget is bounded by the execution AbortSignal when available, falling back to 5 min; both phases respect the shared deadline.
tools/index.ts: Passes executionContext?.abortSignal through to acquireKey, and introduces a reacquireAfterRetriesExhausted hook so a single re-queue attempt is made when upstream 429s exhaust local exponential-backoff retries. Exa hosted RPM is bumped 5 → 60 to take advantage of the queuing.
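The shared-heartbeat fix (one `lastHeartbeatAt` across all phases, every sleep capped at the refresh interval) reduces to two small rules. This sketch uses the 30 s TTL and 10 s refresh interval from the PR description; `WaitState`, `maybeRefreshHeartbeat`, and `heartbeatAwareSleepMs` are hypothetical names, and Redis latency is ignored. With the cap in place, the worst-case gap between refreshes is just under two intervals (about 20 s), which stays below the 30 s TTL.

```typescript
// Hypothetical sketch: one heartbeat timestamp shared by every wait phase,
// and every sleep capped so the heartbeat key cannot expire mid-wait.
// TTL and interval values come from the PR description.

const HEARTBEAT_TTL_MS = 30_000;
const HEARTBEAT_REFRESH_INTERVAL_MS = 10_000;

interface WaitState {
  lastHeartbeatAt: number;
}

// Refresh only when a full interval has passed since the last refresh,
// regardless of which phase (queue head, actor, dimension) performed it.
function maybeRefreshHeartbeat(state: WaitState, nowMs: number, refresh: () => void): boolean {
  if (nowMs - state.lastHeartbeatAt < HEARTBEAT_REFRESH_INTERVAL_MS) return false;
  refresh();
  state.lastHeartbeatAt = nowMs;
  return true;
}

// Sleep no longer than the bucket's retryAfterMs, the refresh interval,
// or the time left before the deadline, whichever is smallest.
function heartbeatAwareSleepMs(retryAfterMs: number, nowMs: number, deadlineMs: number): number {
  return Math.max(0, Math.min(retryAfterMs, HEARTBEAT_REFRESH_INTERVAL_MS, deadlineMs - nowMs));
}
```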

Confidence Score: 5/5

Safe to merge; the queue logic is well-tested, all Redis failure paths fail-open to plain bucket racing, and the FIFO ordering and heartbeat mechanisms are correctly implemented.

The core queue mechanics (enqueue atomicity, Lua-based head reap, heartbeat sharing across wait phases, dequeue in finally, AbortSignal propagation) are all correct. Test coverage is thorough: FIFO ordering, dead-head reap, cap exceeded, abort mid-sleep, low-RPM heartbeat refresh, and no-Redis fallback are all exercised. The only finding is that the `attempts` field in hostedKeyQueueWaited telemetry is always emitted as 1, which affects only observability, not runtime behavior.

No files require special attention; the telemetry attempts field in hosted-key-rate-limiter.ts is the sole minor gap.

Important Files Changed

Filename	Overview
apps/sim/lib/core/rate-limiter/hosted-key/queue.ts	New FIFO queue implementation: Redis list for ordering + per-ticket heartbeat keys + a single Lua EVAL for atomic dead-head reap + head/waiting/missing status. All Redis failure paths fail-open correctly.
apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts	Refactored acquireKey to enqueue/waitForQueueHead/waitForActorCapacity/waitForDimensionCapacity phases with shared WaitState heartbeat tracking; dequeue in finally block ensures cleanup on all exit paths. The `attempts` field in hostedKeyQueueWaited telemetry is hardcoded to 1.
apps/sim/lib/core/rate-limiter/hosted-key/queue.test.ts	New test file with 15 tests covering enqueue position math, checkHead Lua result passthrough, heartbeat refresh, dequeue cleanup, and fail-open/no-op Redis scenarios.
apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.test.ts	Updated tests inject a MockQueue, add FIFO ordering suite (enqueue, dequeue-on-exit, wait-at-head, heartbeat, cap exceeded, missing fall-through) and execution-budget tests (abort, mid-sleep abort, live-signal past cap, low-RPM heartbeat refresh).
apps/sim/lib/core/telemetry.ts	Adds two new platform events: hostedKeyQueueWaited and hostedKeyQueueWaitExceeded. The `attempts` field is always emitted as 1 from the call site.
apps/sim/tools/index.ts	Passes abortSignal to acquireKey; adds reacquireHostedKey helper and reacquireAfterRetriesExhausted callback so a single re-queue attempt is made when upstream 429s exhaust local retries.
apps/sim/tools/exa/search.ts	Bumps hosted RPM from 5 → 60 for Exa search, leveraging the new queue-based throttling instead of instant 429s.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller as Tool Caller
    participant RL as HostedKeyRateLimiter
    participant Q as HostedKeyQueue (Redis)
    participant Bucket as Token Bucket

    Caller->>RL: acquireKey(provider, workspaceId, signal)
    RL->>Q: enqueue(provider, workspaceId, ticketId)
    Q-->>RL: "{ position, enabled }"

    loop waitForQueueHead
        RL->>Q: checkHead (Lua: reap dead + check position)
        Q-->>RL: "waiting | head | missing"
        alt not at head and budget remains
            RL->>Q: maybeRefreshHeartbeat
            RL->>RL: interruptibleSleep(200ms, signal)
        end
    end

    alt queue timed out
        RL-->>Caller: 429 (queue wait exceeded)
    end

    loop waitForActorCapacity
        RL->>Bucket: checkActorRateLimit
        Bucket-->>RL: allowed? retryAfterMs?
        alt not allowed and budget remains
            RL->>Q: maybeRefreshHeartbeat
            RL->>RL: heartbeatAwareSleep(min(retryAfterMs,10s), signal)
        end
    end

    RL->>Caller: success true, key
    RL->>Q: dequeue(ticketId) [finally block]

    Note over Caller,Bucket: On upstream 429 after maxRetries
    Caller->>RL: "reacquireHostedKey -> acquireKey (fresh ticket)"
    RL-->>Caller: fresh key injected into params
    Caller->>Caller: executeToolRequest (one final retry)
```

Reviews (3): Last reviewed commit: "feat(rate-limiter): make hosted-key queu..."

…ix heartbeat + telemetry

Tie the per-workspace hosted-key queue wait to the surrounding execution budget instead of a flat 5-minute cap. acquireKey now accepts the execution AbortSignal (threaded from ExecutionContext): when present, the wait is bounded by the run's actual plan timeout / cancellation, with the enterprise async ceiling as a backstop; when absent it falls back to MAX_QUEUE_WAIT_MS. This lets long-running async (Trigger.dev) runs use their full budget while no longer letting a single queued call burn a short sync run's entire budget.

Also addresses Greptile review:
- P1: share one lastHeartbeatAt across all wait phases and cap every sleep to HEARTBEAT_REFRESH_INTERVAL_MS so a long low-RPM retryAfterMs can no longer let the head's heartbeat lapse mid-wait and break FIFO ordering.
- P2: derive hostedKeyQueueWaited telemetry reason from the actual bottleneck (queue_position / dimension / actor_requests) instead of hardcoding it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
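The budget rule from this commit can be sketched as two helpers. `MAX_QUEUE_WAIT_MS` is the 5-minute fallback named in the commit; `ASYNC_CEILING_MS` is a placeholder because the value of the enterprise async ceiling is not given here, and `resolveDeadline` / `hasBudget` are hypothetical names. When a signal is present, the run's own timeout or cancellation (delivered as an abort) is what normally ends the wait; the ceiling is only a backstop.

```typescript
// Hypothetical sketch of picking the wait budget for acquireKey.
const MAX_QUEUE_WAIT_MS = 5 * 60_000; // fallback when no AbortSignal is passed
const ASYNC_CEILING_MS = 90 * 60_000; // placeholder backstop; the real value is not given in the PR

function resolveDeadline(nowMs: number, signal?: AbortSignal): number {
  // With a signal, the run's own timeout or cancellation ends the wait by aborting;
  // the ceiling only matters if the signal never fires.
  return nowMs + (signal ? ASYNC_CEILING_MS : MAX_QUEUE_WAIT_MS);
}

// Each wait loop would check this before sleeping again.
function hasBudget(nowMs: number, deadlineMs: number, signal?: AbortSignal): boolean {
  if (signal?.aborted) return false;
  return nowMs < deadlineMs;
}
```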

Replace the plain capped sleeps in the queue-head and bucket-capacity wait loops with an interruptibleSleep that resolves early when the execution AbortSignal fires (timeout or cancellation), cleaning up its own timer and listener. Previously a cancelled/timed-out run could overshoot by up to the heartbeat cap (~10s) before the loop re-checked its budget; now it wakes within a tick. The cap remains for heartbeat renewal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
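An abort-aware sleep of the kind this commit describes can be written with a timer and a one-shot `abort` listener. This is a minimal sketch, assuming the sleep resolves (rather than rejects) on abort so the caller's loop re-checks its own budget; the name matches the commit's `interruptibleSleep`, but the signature is an assumption.

```typescript
// Resolves when the timer fires or when the signal aborts, whichever comes
// first, and removes the other mechanism so nothing is left pending.
function interruptibleSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve) => {
    if (signal?.aborted) {
      resolve(); // already cancelled: do not start a timer at all
      return;
    }
    const onAbort = (): void => {
      clearTimeout(timer); // stop the pending timer
      resolve();
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort); // drop the unused listener
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

A wait loop would then sleep for the capped duration, passing the execution signal, and re-check both `signal.aborted` and its deadline on every wake.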

TheodoreSpeaks · 2026-05-26T23:34:55Z

@greptile review

Add queueing for hosted keys (d7a6339)

vercel Bot temporarily deployed to Preview May 5, 2026 00:30 Inactive

cursor Bot reviewed May 5, 2026


Comment thread apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts Outdated

greptile-apps Bot reviewed May 26, 2026


Comment thread apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts

Comment thread apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts

vercel Bot temporarily deployed to Preview May 26, 2026 18:13 Inactive

vercel Bot temporarily deployed to Preview May 26, 2026 19:38 Inactive

TheodoreSpeaks marked this pull request as ready for review May 26, 2026 23:34

TheodoreSpeaks merged commit 4fa7e74 into staging May 26, 2026
13 checks passed

waleedlatif1 deleted the feat/queued-hosted-key branch May 27, 2026 00:13

TheodoreSpeaks merged 4 commits into staging from feat/queued-hosted-key
