perf(init): parallelize init HTTP client construction#1282

Draft
duncanista wants to merge 1 commit into jordan.gonzalez/cold-start-instrumentation/feature from jordan.gonzalez/http-client-reuse/feature

@duncanista

Contributor

Overview

Cold-start optimization (Confluence H4). During init, main.rs builds two reqwest clients whose TLS construction loads native root certificates:

  • the register / /next client (built via create_reqwest_client_builder()…no_proxy().build()), and
  • the shared flushing client (bottlecap::http::get_client, used for metrics / logs / trace-proxy / Datadog API calls).

Previously these two builds ran serially. This PR overlaps them: the register / /next client is now built on a blocking thread (spawn_blocking) inside a spawned task, and the extension register network round-trip runs there too, so both the second cert load and the register call overlap with config parsing and the shared client build on the main task.
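
The overlap described above can be sketched in isolation. This is a minimal illustration, not the code in main.rs: it uses `std::thread::spawn` in place of tokio's `spawn_blocking` so that it has no dependencies, and `Client`, `build_client`, `build_serial`, and `build_overlapped` are hypothetical names. The sleep stands in for the native root certificate load.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a client that is expensive to construct.
/// In the PR the cost is the TLS build that loads native root certificates.
#[derive(Debug)]
struct Client {
    name: &'static str,
}

/// Simulates a slow, blocking client build (assumption: sleep models cert loading).
fn build_client(name: &'static str, cost: Duration) -> Client {
    thread::sleep(cost);
    Client { name }
}

/// Serial construction: total time is at least the sum of both builds.
fn build_serial(cost: Duration) -> (Client, Client) {
    let register = build_client("register_next", cost);
    let shared = build_client("shared_flushing", cost);
    (register, shared)
}

/// Overlapped construction: the register client is built on another thread
/// while the calling thread builds the shared client. The finished register
/// client is handed back through the join handle so the caller can reuse it.
fn build_overlapped(cost: Duration) -> (Client, Client) {
    let handle = thread::spawn(move || build_client("register_next", cost));
    let shared = build_client("shared_flushing", cost);
    let register = handle.join().expect("register client build panicked");
    (register, shared)
}

fn main() {
    let cost = Duration::from_millis(200);

    let start = Instant::now();
    let (register, shared) = build_serial(cost);
    println!("serial: {:?} ({}, {})", start.elapsed(), register.name, shared.name);

    let start = Instant::now();
    let (register, shared) = build_overlapped(cost);
    println!("overlapped: {:?} ({}, {})", start.elapsed(), register.name, shared.name);
}
```

In this sketch the overlapped path is bounded by the slower of the two builds rather than their sum, which is the effect the PR is after during cold start.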

Approach: parallelize, not merge

I deliberately did not collapse the two clients into one. Their requirements genuinely conflict:

| Setting | register / /next client | shared flushing client |
| --- | --- | --- |
| Proxy (DD_PROXY_HTTPS) | must not use it (.no_proxy()); Runtime API is local | proxy-aware |
| flush_timeout | none; /next is a long-poll that blocks until the next invocation, so a timeout would abort it | timeout(flush_timeout) |
| pool_max_idle_per_host(0) (issue #1092 stale-conn fix) | n/a | required |
| http2/http1 + custom cert | n/a | required |

Because a flush timeout and the proxy would both break the /next long-poll, merging is not behavior-safe. So I took the fallback: build the two clients concurrently instead.
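
To make the timeout conflict concrete, here is a small sketch of why a client-side timeout is incompatible with a long-poll. It is an analogy only: an `std::sync::mpsc` channel stands in for the /next endpoint, and `wait_for_next` and `next_invocation_after` are hypothetical helpers, not functions from bottlecap or reqwest.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Waits for the next "invocation" event.
/// `None` models the register / /next client (no timeout at all);
/// `Some(t)` models applying a flush-style timeout to the same wait.
fn wait_for_next(
    rx: &mpsc::Receiver<&'static str>,
    timeout: Option<Duration>,
) -> Result<&'static str, &'static str> {
    match timeout {
        None => rx.recv().map_err(|_| "sender dropped"),
        Some(t) => rx.recv_timeout(t).map_err(|e| match e {
            RecvTimeoutError::Timeout => "timed out before next invocation",
            RecvTimeoutError::Disconnected => "sender dropped",
        }),
    }
}

/// Simulates an invocation that arrives only after `delay`
/// (assumption: the function sits idle between invocations).
fn next_invocation_after(delay: Duration) -> mpsc::Receiver<&'static str> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        // The receiver may already be gone if the waiter timed out; ignore that.
        let _ = tx.send("INVOKE");
    });
    rx
}

fn main() {
    let idle = Duration::from_millis(300);

    // A short timeout gives up before the invocation arrives.
    let rx = next_invocation_after(idle);
    println!("{:?}", wait_for_next(&rx, Some(Duration::from_millis(50))));

    // Without a timeout the wait simply blocks until the invocation arrives.
    let rx = next_invocation_after(idle);
    println!("{:?}", wait_for_next(&rx, None));
}
```

Any timeout shorter than the idle gap between invocations ends the wait early, which is why the /next client carries no timeout while the flushing client keeps one.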

Preserved settings / behavior

  • Shared client still built via bottlecap::http::get_client(&config) → keeps flush_timeout, pool_max_idle_per_host(0), proxy, http2/http1 selection, and custom-cert unchanged.
  • Register / /next client still built via create_reqwest_client_builder()…no_proxy().build() (no proxy, no timeout); it is handed back from the task and reused for the /next long-poll for the extension's lifetime, same as before.
  • clippy.toml disallowed-methods respected — no direct reqwest::Client::builder; still uses the FIPS adapter / get_client.
  • H0 cold-start init instrumentation is preserved. The tls_client_build checkpoint is folded into the parallel build phase (it is no longer a distinct serial step); crypto_provider_ready, config_parse, shared_client_ready, and register_ready remain sequential and meaningful.

Files changed: bottlecap/src/bin/bottlecap/main.rs (init section only).

Draft, stacked on #1271 (jordan.gonzalez/cold-start-instrumentation/feature), which provides the H0 init instrumentation this builds on. Review/merge #1271 first.

Overlaps with H1 (#1276, shared TLS ClientConfig): H1 reduces the per-client cert-loading cost; this PR overlaps the two builds. They are complementary and will need a trivial reconcile when both land.

Testing

  • cargo fmt clean.
  • cargo clippy --bin bottlecap --no-deps clean (clippy::all + pedantic + unwrap_used denied); only the pre-existing buf_redux/multipart future-incompat warning remains.
  • Behavior-preserving refactor: client settings and downstream usage (extension_loop_active / extension_loop_idle) are unchanged; the register / /next client is the same one, just built concurrently and returned from the task.

Jira: none yet — add before marking ready.

Build the register/`/next` reqwest client on a blocking thread inside a
spawned task so its native-cert-loading TLS build (and the register network
round-trip) overlaps with config parsing and the shared flushing client
build, instead of running serially during cold start.

The register/`/next` client and the shared flushing client are kept
separate on purpose and not collapsed: the Extension API register + `/next`
long-poll must use `.no_proxy()` and carry no `flush_timeout` (which would
abort the long-poll), while the shared client requires proxy support, a
flush_timeout, and pool_max_idle_per_host(0). Those needs conflict, so their
construction is overlapped rather than merged. All existing client settings
and the cold-start init checkpoints are preserved.
@datadog-prod-us1-3

datadog-prod-us1-3 Bot commented Jun 24, 2026


Pipelines

⚠️ Warnings

🚦 6 Pipeline jobs failed

DataDog/datadog-lambda-extension | integration-suite: [lmi]

DataDog/datadog-lambda-extension | e2e-test-status (amd64)

DataDog/datadog-lambda-extension | integration-suite: [auth]

View all 6 failed jobs.

🔗 Commit SHA: 48694df
