improvement(schedules): retries, concurrency limits #4755
PR Summary
High Risk

Overview
The cron
Smaller changes: workspace env routes invalidate a shorter-lived decrypted-env cache; copilot integration tool schemas use a brief LRU cache; …

Reviewed by Cursor Bugbot for commit d3103b7.
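The two caching changes above are only named, not shown. As a rough illustration, a size-bounded LRU cache with a TTL and an explicit invalidation hook could look like this TypeScript sketch. `TtlLruCache`, its constructor parameters, and the injected clock are hypothetical; the PR's actual cache implementation, sizes, and TTLs are not stated here.

```typescript
// Hypothetical TtlLruCache: an illustrative sketch, not the PR's implementation.
// A JavaScript Map iterates in insertion order, so re-inserting a key on every
// read keeps the least recently used key at the front.
class TtlLruCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();

  constructor(
    private readonly maxSize: number,
    private readonly ttlMs: number,
    // Injected clock (assumption) so expiry can be tested deterministically.
    private readonly now: () => number = Date.now,
  ) {}

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired entries are dropped lazily when read.
      this.entries.delete(key);
      return undefined;
    }
    // Move the key to the most-recently-used position.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Evict least-recently-used keys while over capacity.
    while (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  // Explicit invalidation, e.g. after a route writes new environment values.
  invalidate(key: K): void {
    this.entries.delete(key);
  }
}
```

Injecting the clock keeps expiry deterministic in tests; a route that writes new environment values would call `invalidate` for the affected key instead of waiting for the TTL to lapse.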
Greptile Summary
This PR introduces schedule execution backpressure and reliability improvements: a configurable concurrency limit (…
Confidence Score: 5/5
Safe to merge. The concurrency-limit and infra-retry logic is well-guarded by PostgreSQL advisory locks, and the DB migration adds a non-breaking column with a default value.
All core state transitions (claim, defer, fail, recover) are guarded by expectedLastQueuedAt conditions that prevent double-processing. The advisory lock in tryStartDatabaseScheduleJob provides cross-process enforcement of the concurrency cap. The one edge case (null claimedAt bypassing the stale-claim check) is not reachable from any normal code path, because payload.now is always a valid ISO timestamp.
No files require special attention. The migration is additive, and the schedule-execution flow changes are isolated to the scheduler path.
Important Files Changed
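The expectedLastQueuedAt guard amounts to optimistic concurrency control: a transition applies only if the row still carries the lastQueuedAt value the caller read. A minimal in-memory sketch, assuming a simplified row shape; `ScheduleRow`, `ScheduleStore`, and `transition` are hypothetical names, not the PR's code. In PostgreSQL the equivalent would typically be a conditional `UPDATE ... WHERE id = $1 AND last_queued_at = $2` whose affected-row count is checked.

```typescript
// Hypothetical in-memory model of guarded schedule state transitions.
// Not the PR's implementation; row shape and statuses are assumptions.
type ScheduleStatus = "idle" | "queued" | "running" | "failed";

interface ScheduleRow {
  id: string;
  status: ScheduleStatus;
  lastQueuedAt: string | null; // ISO timestamp of the last claim, if any
}

class ScheduleStore {
  private readonly rows = new Map<string, ScheduleRow>();

  insert(row: ScheduleRow): void {
    this.rows.set(row.id, { ...row });
  }

  // Returns a copy so callers cannot mutate stored state directly.
  get(id: string): ScheduleRow | undefined {
    const row = this.rows.get(id);
    return row && { ...row };
  }

  // Applies `patch` only if lastQueuedAt still equals the value the caller
  // observed (compare-and-set). Returns false when another worker already
  // moved the row, so the caller can skip instead of double-processing.
  transition(
    id: string,
    expectedLastQueuedAt: string | null,
    patch: Partial<Omit<ScheduleRow, "id">>,
  ): boolean {
    const row = this.rows.get(id);
    if (!row || row.lastQueuedAt !== expectedLastQueuedAt) return false;
    Object.assign(row, patch);
    return true;
  }
}
```

If two workers read the same `lastQueuedAt` and both try to claim, only the first `transition` call returns `true`; the second sees a changed value and backs off. This single-threaded model illustrates the guard's semantics, not real cross-process contention, which the database condition handles.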
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([Cron tick]) --> B[recoverStaleDatabaseScheduleJobs]
B --> C[getDatabaseScheduleExecutionSlots]
C --> D[resumePendingDatabaseScheduleJobs]
D --> E[getDatabaseScheduleExecutionSlots updated]
E --> F{slots > 0 AND budget > 0?}
F -- No --> G[schedulesExhausted = true]
F -- Yes --> H[claimWorkflowSchedules + claimJobSchedules]
H --> I[processScheduleItem per claimed schedule]
I --> J{existing job?}
J -- stale --> K[cancelJob / releaseScheduleLock]
J -- pending DB --> L[executeDatabaseScheduleJob]
J -- none --> M[jobQueue.enqueue]
M --> N{useDatabaseFallback?}
N -- Yes --> L
N -- No --> O[Trigger.dev handles execution]
L --> P[tryStartDatabaseScheduleJob via advisory lock]
P -- capacity_full --> Q[job stays pending / resumed next tick]
P -- started --> R[executeScheduleJob]
R --> S[preprocessExecution]
S -- retryable 500 --> T[retryScheduleAfterInfraFailure]
T --> U{infraRetryCount > MAX_ATTEMPTS?}
U -- Yes --> V[markClaimedScheduleFailed]
U -- No --> W[set nextRunAt with backoff]
S -- success --> X[runWorkflowExecution]
X --> Y{retryable_setup_failure?}
Y -- Yes --> T
Y -- No --> Z[update schedule: success / failure / skip]
```
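Two branches of the flowchart reduce to small decisions: whether a pending job may start (`started` vs `capacity_full`), and whether an infrastructure failure is retried with backoff or marked failed. This TypeScript sketch models both in a single process. `ExecutionSlots`, `decideInfraRetry`, the constant values, the increment-before-compare order, and the exponential backoff formula are all assumptions for illustration; the PR enforces the cap across processes with a PostgreSQL advisory lock, which this counter does not reproduce.

```typescript
// Hypothetical sketch of two flowchart decisions; not the PR's code.
type StartResult = "started" | "capacity_full";

// Single-process stand-in for the concurrency cap. The real cap is enforced
// cross-process via an advisory lock, which this counter does not model.
class ExecutionSlots {
  private running = 0;

  constructor(private readonly limit: number) {}

  // Rough analogue of the "slots" value checked before claiming schedules.
  available(): number {
    return Math.max(0, this.limit - this.running);
  }

  tryStart(): StartResult {
    // At capacity the job is left pending and picked up on a later tick.
    if (this.running >= this.limit) return "capacity_full";
    this.running += 1;
    return "started";
  }

  finish(): void {
    this.running = Math.max(0, this.running - 1);
  }
}

const MAX_ATTEMPTS = 3; // assumed value
const BASE_BACKOFF_MS = 30_000; // assumed base delay
const MAX_BACKOFF_MS = 15 * 60_000; // assumed upper bound on delay

type RetryDecision =
  | { kind: "fail" }
  | { kind: "retry"; infraRetryCount: number; nextRunAt: Date };

// Increments the retry count (assumed order), fails once it exceeds
// MAX_ATTEMPTS, otherwise pushes nextRunAt out with capped exponential backoff.
function decideInfraRetry(previousCount: number, now: Date): RetryDecision {
  const infraRetryCount = previousCount + 1;
  if (infraRetryCount > MAX_ATTEMPTS) return { kind: "fail" };
  const delayMs = Math.min(BASE_BACKOFF_MS * 2 ** (infraRetryCount - 1), MAX_BACKOFF_MS);
  return { kind: "retry", infraRetryCount, nextRunAt: new Date(now.getTime() + delayMs) };
}
```

With these assumed constants, retries land 30 s, 60 s, and 120 s out, and the fourth consecutive infrastructure error marks the schedule failed. A job that gets `capacity_full` is left pending rather than failed, matching the flowchart's "resumed next tick" path.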
Reviews (3): Last reviewed commit: "retryable errs cleanup"
Drop the pre-merge generated 0213 migration so it can be regenerated after syncing with staging.
Co-authored-by: Cursor <cursoragent@cursor.com>
bugbot run
@greptile
✅ Bugbot reviewed your changes and found no new issues!
@greptile
Summary
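Schedule backpressure and discovery improvements: a concurrency limit on schedule execution and retries with backoff for retryable infrastructure failures.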
Type of Change
Testing
Tested in the staging environment.
Checklist