ci: make integration-suite telemetry queries resilient to Datadog API 429s#1288

Merged
jchrostek-dd merged 6 commits into main from john/lambda-ext-integ-datadog-429-retry
Jun 25, 2026
@jchrostek-dd jchrostek-dd commented Jun 24, 2026

Contributor

Overview

The integration tests intermittently fail because requests for telemetry data from Datadog are rejected with HTTP 429 (rate limit exceeded).

This PR adds retries when fetching telemetry data. Because retrying can lengthen the total run time of a test, this PR also increases the test timeout from 15 minutes to 30 minutes.

datadog-prod-us1-3 Bot commented Jun 24, 2026


Pipelines

⚠️ Warnings

🚦 5 Pipeline jobs failed

DataDog/datadog-lambda-extension | e2e-test-status (amd64)   View in Datadog   GitLab

DataDog/datadog-lambda-extension | integration-suite: [lmi]   View in Datadog   GitLab

DataDog/datadog-lambda-extension | integration-suite: [on-demand]   View in Datadog   GitLab

View all 5 failed jobs.

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 92f2336 | Docs | Datadog PR Page | Give us feedback!

@jchrostek-dd jchrostek-dd marked this pull request as ready for review June 24, 2026 20:12
@jchrostek-dd jchrostek-dd requested a review from a team as a code owner June 24, 2026 20:12
@jchrostek-dd jchrostek-dd requested review from Copilot and lym953 June 24, 2026 20:12

Copilot AI left a comment

Contributor

Pull request overview

This PR reduces flakiness in the integration test suite by adding retry handling for Datadog API rate limiting (HTTP 429) when querying telemetry, and by extending Jest timeouts to accommodate the additional waiting.

Changes:

  • Added a shared requestWithRetry wrapper for Datadog API calls to retry on HTTP 429 with bounded waits and jitter.
  • Wrapped trace/log/metric query calls in the integration test Datadog client utilities with the retry wrapper.
  • Increased integration test timeouts (per-suite hooks and Jest global config) to 30 minutes.
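
For readers who have not opened the diff, the retry wrapper described above might look roughly like the sketch below. This is an illustration, not the merged code: the option names, the defaults, and the `statusOf` helper are assumptions, and exponential backoff stands in for the server-provided wait that the real helper appears to read from the response. It only shows the shape of "retry on 429, bounded wait, random jitter".

```typescript
// Hypothetical sketch; not the implementation merged in this PR.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // first wait; doubled on each later retry
  maxDelayMs: number; // upper bound on the backoff part of any wait
  maxJitterMs: number; // random extra wait so parallel suites do not retry in lockstep
  sleep: (ms: number) => Promise<void>; // injectable so tests can skip real waiting
}

// Assumed defaults, chosen only for illustration.
const defaultOptions: RetryOptions = {
  maxAttempts: 5,
  baseDelayMs: 1000,
  maxDelayMs: 60000,
  maxJitterMs: 1000,
  sleep: (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
};

// Reads the HTTP status from an Axios-style error object without importing axios.
function statusOf(error: unknown): number | undefined {
  const shaped = error as { response?: { status?: number } } | null | undefined;
  return shaped?.response?.status;
}

async function requestWithRetry<T>(
  request: () => Promise<T>,
  options: Partial<RetryOptions> = {},
): Promise<T> {
  const opts: RetryOptions = { ...defaultOptions, ...options };
  for (let attempt = 1; ; attempt++) {
    try {
      return await request();
    } catch (error) {
      // Only rate-limit responses are retried; any other failure is rethrown at once.
      if (statusOf(error) !== 429 || attempt >= opts.maxAttempts) {
        throw error;
      }
      const backoff = Math.min(opts.baseDelayMs * 2 ** (attempt - 1), opts.maxDelayMs);
      const jitter = Math.floor(Math.random() * opts.maxJitterMs);
      await opts.sleep(backoff + jitter);
    }
  }
}
```

A query helper would then pass its HTTP call as a closure, for example `requestWithRetry(() => client.get(url))`, where `client` and `url` are placeholders for whatever the test utilities already use.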

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:

| File | Description |
| --- | --- |
| integration-tests/tests/utils/datadog.ts | Adds 429 retry logic and applies it to Datadog telemetry queries (traces/logs/metrics). |
| integration-tests/tests/snapstart.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/tests/payload-size.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/tests/otlp.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/tests/on-demand.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/tests/lmi.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/tests/custom-metrics.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/tests/auth.test.ts | Extends the beforeAll timeout to 30 minutes. |
| integration-tests/jest.config.js | Increases Jest testTimeout to 30 minutes. |
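
The timeout changes listed above come down to one number expressed in milliseconds. The fragment below is a hedged TypeScript rendering of that arithmetic; the PR itself edits `jest.config.js`, and the names `THIRTY_MINUTES_MS` and `jestConfig` are assumptions made for illustration. `testTimeout` is the standard Jest configuration key for the default per-test timeout.

```typescript
// Hypothetical rendering of the timeout change; the real file is jest.config.js.
const THIRTY_MINUTES_MS = 30 * 60 * 1000; // 30 min * 60 s * 1000 ms

// Global default timeout for tests, in milliseconds.
const jestConfig = {
  testTimeout: THIRTY_MINUTES_MS,
};
```

Per-suite hooks can take the same value as the second argument to `beforeAll(fn, timeout)`, which matches how the test files are described as extending their `beforeAll` timeouts.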

Comment on lines +85 to +86
const jitter = Math.floor(Math.random() * 1000);
const wait = parseRetryAfterMs(error as AxiosError) + jitter;
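
The body of `parseRetryAfterMs` is not shown in this thread. As a rough sketch of what a helper in that role could do, the example below turns rate-limit response headers into a bounded wait in milliseconds. Everything here is an assumption: the function name `retryAfterMsFromHeaders`, the fallback and cap values, the lowercased header keys, and the choice to read `retry-after` first and `x-ratelimit-reset` second. It also takes a plain header map rather than an `AxiosError`, so it runs without axios.

```typescript
// Hypothetical helper; the merged parseRetryAfterMs may differ in every detail.
type HeaderMap = Record<string, string | undefined>;

const DEFAULT_WAIT_MS = 5000; // assumed fallback when no usable header is present
const MAX_WAIT_MS = 60000; // assumed cap so a single wait stays bounded

// Keeps a wait within [0, MAX_WAIT_MS].
function clampWait(ms: number): number {
  return Math.min(Math.max(ms, 0), MAX_WAIT_MS);
}

// Assumes header names are already lowercased.
function retryAfterMsFromHeaders(headers: HeaderMap, nowMs: number = Date.now()): number {
  const raw = headers["retry-after"] ?? headers["x-ratelimit-reset"];
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_WAIT_MS;
  }
  // Form 1: a number of seconds to wait.
  const seconds = Number(raw);
  if (Number.isFinite(seconds)) {
    return clampWait(seconds * 1000);
  }
  // Form 2: an HTTP-date naming the moment at which to retry.
  const dateMs = Date.parse(raw);
  if (!Number.isNaN(dateMs)) {
    return clampWait(dateMs - nowMs);
  }
  return DEFAULT_WAIT_MS;
}

// Same composition as the two reviewed lines: server hint plus up to 1 s of jitter.
const jitter = Math.floor(Math.random() * 1000);
const wait = retryAfterMsFromHeaders({ "retry-after": "2" }) + jitter;
```

Adding jitter on top of the server hint spreads out retries from suites that were rate limited at the same moment, while the clamp stops a large or malformed header value from consuming the whole test timeout.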
@jchrostek-dd jchrostek-dd merged commit a73cce1 into main Jun 25, 2026
56 of 61 checks passed
@jchrostek-dd jchrostek-dd deleted the john/lambda-ext-integ-datadog-429-retry branch June 25, 2026 18:31
3 participants