Skip to content

fix(rate-limiter): hosted-key queue follow-up fixes#4762

Open
TheodoreSpeaks wants to merge 2 commits into
stagingfrom
fix/hosted-key-queue-followups
Open

fix(rate-limiter): hosted-key queue follow-up fixes#4762
TheodoreSpeaks wants to merge 2 commits into
stagingfrom
fix/hosted-key-queue-followups

Conversation

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator

Summary

  • Re-extend the hosted-key queue-list TTL on each head heartbeat, so a head waiting longer than the 600s list TTL (possible for long enterprise async budgets) can't let the list expire and drop every waiter to "missing", collapsing FIFO into concurrent bucket racing
  • Close a TOCTOU window in interruptibleSleep: re-check signal.aborted after registering the listener so an abort firing in the gap still resolves immediately instead of sleeping the full duration
  • Forward enrichmentId: next.enrichmentId explicitly when re-driving a workflow group, instead of inheriting a stale value from the prior group via the payload spread

Follow-up to the Greptile review findings on #4756. The Bugbot cell-render.tsx comment was intentionally skipped (dismissed as "feature isn't live yet").

Type of Change

  • Bug fix

Testing

Tested manually. bun run lint clean, bun run check:api-validation:strict passed, hosted-key tests (43) and workflow-columns tests (5) green.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 27, 2026 6:03pm

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 27, 2026

PR Summary

Medium Risk
Queue and abort-timing changes affect shared rate limiting and long waits; the enrichmentId fix touches table workflow cascade correctness but is narrowly scoped.

Overview
Follow-up fixes for the hosted-key FIFO queue and table workflow cascade.

Hosted-key queue: Each head heartbeat refresh now also re-extends the Redis queue list TTL (via a pipeline expire), so a head waiting longer than the 600s list TTL—e.g. under a long enterprise async budget—cannot let the list vanish and mark waiters as "missing", which would break FIFO and allow concurrent bucket racing.

Rate limiter: interruptibleSleep removes the abort listener on both timeout and abort paths and re-checks signal.aborted right after registering the listener, closing a TOCTOU gap where an abort in that window could still sleep the full duration.

Workflow columns: When re-driving the next queued group after a cascade lock window, the payload now sets enrichmentId from the picked group instead of keeping a stale value from the prior enrichment group via spread.

Reviewed by Cursor Bugbot for commit 33c0138. Bugbot is set up for automated code reviews on this repo. Configure here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 27, 2026

Greptile Summary

This PR applies three targeted follow-up fixes from a prior review of the hosted-key queue implementation. All three address real failure modes in production code paths.

  • Queue-list TTL: refreshHeartbeat is upgraded from a single redis.set to a redis.multi() pipeline that also calls pipeline.expire on the queue list key. Without this, a head ticket waiting longer than the 600 s list TTL (reachable for enterprise async budgets) could let the list expire, dropping every waiter to \"missing\" and collapsing FIFO into concurrent bucket racing.
  • interruptibleSleep TOCTOU: A post-addEventListener re-check of signal.aborted is added with a call to the onAbort handler if true, alongside an explicit removeEventListener inside onAbort to clean up the listener when it is called directly rather than through the event.
  • enrichmentId inheritance: executeWorkflowGroupCellJob's outer re-drive loop now sets enrichmentId: next.enrichmentId explicitly instead of inheriting the prior group's value through the payload spread.

Confidence Score: 5/5

All three changes are narrow, well-scoped fixes with corresponding test coverage; no new external dependencies or schema changes are introduced.

Each fix targets a specific, well-understood failure mode. The implementations are consistent with the surrounding patterns, tests cover the changed behaviour, and no new failure modes are introduced.

No files require special attention; queue.ts and hosted-key-rate-limiter.ts carry the core logic changes and both look correct.

Important Files Changed

Filename Overview
apps/sim/lib/core/rate-limiter/hosted-key/queue.ts refreshHeartbeat now uses a multi() pipeline to atomically refresh both the ticket heartbeat and the queue-list TTL, preventing long-waiting heads from letting the list expire mid-wait.
apps/sim/lib/core/rate-limiter/hosted-key/hosted-key-rate-limiter.ts interruptibleSleep gains a post-addEventListener re-check of signal.aborted and an explicit removeEventListener in onAbort to close the TOCTOU window and prevent listener leaks.
apps/sim/background/workflow-column-execution.ts Outer re-drive loop now explicitly sets enrichmentId: next.enrichmentId so the payload reflects the target group rather than inheriting a stale value from the prior group via spread.
apps/sim/lib/core/rate-limiter/hosted-key/queue.test.ts Tests updated to match the new pipeline-based refreshHeartbeat, asserting on pipeline.set, pipeline.expire, and pipeline.exec; no-op check correctly switched from mockRedis.set to mockRedis.multi.

Reviews (1): Last reviewed commit: "fix(rate-limiter): hosted-key queue foll..." | Re-trigger Greptile

@TheodoreSpeaks TheodoreSpeaks changed the title fix(rate-limiter): hosted-key queue follow-ups from #4756 review fix(rate-limiter): hosted-key queue follow-up fixes May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant