fix(codeapi-service): poll job results when queue events lag #10
Merged
What this fixes
This PR fixes the CodeAPI service path where a worker completes a BullMQ job but the API never receives the `QueueEvents` finish notification and waits until `JOB_TIMEOUT`.

The API now waits for each job with an abortable queue-event listener plus a bounded Redis polling fallback. If the event path is missed or delayed, the service can still observe the completed or failed job state and return the worker result.
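A minimal TypeScript sketch of the wait described above, assuming only the BullMQ calls the PR names: `queue.getJob(jobId)`, `getState()`, `returnvalue`, and the `completed`/`failed` queue events. The `JobLike`, `QueueLike`, and `QueueEventsLike` interfaces and the `pollIntervalMs` parameter are hypothetical, introduced so the sketch runs without Redis. This is an illustration of the event-plus-polling race, not the PR's actual implementation.

```typescript
// Hypothetical structural stand-ins for the subset of BullMQ used here
// (Queue.getJob, Job.getState, Job.returnvalue/failedReason, QueueEvents.on/off).
export interface JobLike {
  returnvalue: unknown;
  failedReason: string;
  getState(): Promise<string>;
}

export interface QueueLike {
  getJob(jobId: string): Promise<JobLike | undefined>;
}

type FinishListener = (args: { jobId: string }) => void;

export interface QueueEventsLike {
  on(event: "completed" | "failed", listener: FinishListener): unknown;
  off(event: "completed" | "failed", listener: FinishListener): unknown;
}

// Resolves with the job's returnvalue, rejects with its failedReason, or
// rejects after timeoutMs. Queue events only trigger an early re-read; the
// interval poll is what guarantees progress when an event is missed.
export function waitForJobFinished(
  job: { id?: string },
  queue: QueueLike,
  queueEvents: QueueEventsLike,
  timeoutMs: number,
  pollIntervalMs = 250, // hypothetical default, not taken from the PR
): Promise<unknown> {
  const jobId = job.id;
  if (!jobId) return Promise.reject(new Error("job has no id"));

  return new Promise((resolve, reject) => {
    let settled = false;
    let checking = false;

    // Settle exactly once and detach every listener and timer.
    const finish = (settle: () => void): void => {
      if (settled) return;
      settled = true;
      clearInterval(poller);
      clearTimeout(timer);
      queueEvents.off("completed", onEvent);
      queueEvents.off("failed", onEvent);
      settle();
    };

    // One read of the stored job state; skipped if a read is already in flight.
    const check = async (): Promise<void> => {
      if (settled || checking) return;
      checking = true;
      try {
        const current = await queue.getJob(jobId);
        if (!current) return; // not visible (or already removed); keep waiting
        const state = await current.getState();
        if (state === "completed") {
          finish(() => resolve(current.returnvalue));
        } else if (state === "failed") {
          finish(() => reject(new Error(current.failedReason)));
        }
      } catch {
        // Treat read errors as transient; the timeout still bounds the wait.
      } finally {
        checking = false;
      }
    };

    const onEvent: FinishListener = (args) => {
      if (args.jobId === jobId) void check();
    };

    queueEvents.on("completed", onEvent);
    queueEvents.on("failed", onEvent);
    const poller = setInterval(() => void check(), pollIntervalMs);
    const timer = setTimeout(
      () => finish(() => reject(new Error(`job ${jobId} timed out after ${timeoutMs}ms`))),
      timeoutMs,
    );
    void check(); // the job may have finished before we subscribed
  });
}
```

In this sketch the queue event is only a hint that triggers an immediate re-read, so a lost `QueueEvents` notification costs roughly one poll interval rather than the full timeout. The fallback can only read a job that Redis still retains, which is why completed-job retention matters; in BullMQ that is the `removeOnComplete` job option.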
How it works
- Adds `waitForJobFinished(job, queue, queueEvents, timeoutMs)` for API-side job waits.
- Races `completed`/`failed` queue events against polling `queue.getJob(jobId)` and `getState()`.
- Returns `returnvalue` for completed jobs and throws the failed reason for failed jobs.
- Raises completed-job retention from `count: 1` to `count: 100`, kept for 60 seconds, so the fallback can read results under concurrent completions. Failed-job retention is unchanged.

Branch coverage
- `main`: `2e451bc` fix(codeapi-service): poll job results when queue events lag

Review focus
- Whether `removeOnComplete: { age: 60, count: 100 }` is the right short retention bound for this service.

Verification
Current head:
- `bun run test` from `service` -> 276 pass, 0 fail
- `bun run build` from `service` -> passed. Existing Rollup warnings and existing TS2352 warnings in `src/egress-grant.ts:441` and `src/service/replay-state.ts:302` remain.
- `git diff --cached --check` before commit -> passed

Failed or warning:
- One `bun run test` attempt failed because Redis DNS was blocked in the sandbox. Rerunning the same suite with network access passed.

Security / privacy
Blocking before merge
None.
Follow-up after merge
stg./tmp/codeapi-jwt.env.