Skip to content

feat(http): share one TLS root config across init HTTP clients#1276

Merged
duncanista merged 3 commits into
mainfrom
jordan.gonzalez/tls-shared-roots/feature
Jun 25, 2026
Merged

feat(http): share one TLS root config across init HTTP clients#1276
duncanista merged 3 commits into
mainfrom
jordan.gonzalez/tls-shared-roots/feature

Conversation

@duncanista

@duncanista duncanista commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Jira: SVLS-9380 (H1) · epic SVLS-9378

What

Switch the non-FIPS default feature from reqwest/rustls-tls-native-roots to reqwest/rustls-tls-webpki-roots. This removes the rustls_native_certs::load_native_certs() filesystem cert scan that ran on every reqwest::Client::build() — i.e. on the cold-start path for all three init reqwest clients (register in main.rs, shared flush in http.rs, secrets-decrypt in secrets/decrypt.rs). Cold-start hypothesis H1. FIPS builds are unchanged (keep native roots).

Trade-off

webpki trusts only the public Mozilla root bundle, so private/internal OS-installed CAs are no longer trusted implicitly. Affected setups (e.g. a TLS-intercepting DD_PROXY_HTTPS) supply the CA via DD_TLS_CERT_FILE (still layers on top of webpki) or use DD_SKIP_SSL_VALIDATION. Datadog public intake chains to public roots, so the default path is unaffected.

Tested & verified

  • Hermetic TLS tests (src/http.rs): tls_cert_file trusts a private CA on top of webpki; webpki alone rejects it without the cert file; skip_ssl_validation escape hatch works; load_custom_cert parsing (single / multiple / empty / non-PEM).
  • CI feature-graph guard (scripts/verify_tls_root_features.sh + job): default build stays on webpki (no native-cert scan), FIPS stays on native roots — catches silent feature-graph drift.
  • cargo --unit-graph: native-cert scan is gone from all three reqwest clients' runtime path (native-roots survives only in a build-script dep); FIPS feature graph unchanged.
  • All GitHub Actions green (clippy default + fips, test, build, license, audit, the new TLS Root Features job).

Not doing (by design)

  • Secrets/AWS client (secrets/decrypt.rs — KMS/SM/SSM/STS): intentionally keeps default roots, no proxy / tls_cert_file / skip_ssl_validation. These are direct AWS control-plane calls that chain to public AWS CAs; DD_PROXY_HTTPS/tls_cert_file are for Datadog egress; and we deliberately don't disable cert validation on credential/secret fetches. Only a fully air-gapped setup intercepting AWS traffic with a private CA would be affected — those should use VPC endpoints (public certs). Documented in code; revisit only on a real report.
  • Trace client (traces/http_client.rs, hyper/libdd-common): keeps native roots via its own trust path — out of scope here.

Before all-region rollout

Verify GovCloud / dedicated-region intake certs chain to the webpki bundle (deferred real-endpoint canary).

@datadog-official

datadog-official Bot commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 4 Pipeline jobs failed

DataDog/datadog-lambda-extension | integration-suite: [lmi]   View in Datadog   GitLab

DataDog/datadog-lambda-extension | integration-suite: [on-demand]   View in Datadog   GitLab

DataDog/datadog-lambda-extension | integration-suite: [snapstart]   View in Datadog   GitLab

View all 4 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: a14f45e | Docs | Datadog PR Page | Give us feedback!

@duncanista duncanista marked this pull request as ready for review June 25, 2026 17:48
@duncanista duncanista changed the base branch from jordan.gonzalez/cold-start-instrumentation/feature to main June 25, 2026 17:49
@duncanista duncanista requested a review from a team as a code owner June 25, 2026 17:49
@duncanista duncanista requested a review from shreyamalpani June 25, 2026 17:49
Switch the non-FIPS default feature from reqwest/rustls-tls-native-roots
to reqwest/rustls-tls-webpki-roots so the two init-time reqwest clients
(the register client in bin/bottlecap/main.rs and the shared flush client
in src/http.rs) no longer call rustls_native_certs::load_native_certs() on
every reqwest::Client::build(). webpki-roots uses a compiled-in Mozilla CA
bundle, eliminating the per-build filesystem cert scan during cold start.

Custom-cert (tls_cert_file -> add_root_certificate), proxy, and
skip-ssl-validation paths are unchanged. The FIPS feature still uses
native roots and is untouched.
@duncanista duncanista force-pushed the jordan.gonzalez/tls-shared-roots/feature branch from b5de2b8 to 755cfd3 Compare June 25, 2026 17:54
Cover the trust-model change from the webpki-roots switch with hermetic
tests and a CI guard, so the cold-start win can't regress trust behavior
silently:

- tls_cert_file_trusts_private_ca: a server presenting a privately-signed
  cert is trusted when its CA is supplied via tls_cert_file — verifies
  add_root_certificate layers on top of the webpki roots (the documented
  mitigation for private-CA / DD_PROXY_HTTPS setups).
- private_ca_rejected_without_tls_cert_file: the default webpki root set
  rejects a privately-signed cert without tls_cert_file (pins the new
  behavior).
- skip_ssl_validation_accepts_untrusted_cert: the skip-validation escape
  hatch still works.
- load_custom_cert unit tests (single / multiple / empty / non-PEM).
- scripts/verify_tls_root_features.sh + CI job: assert the default build
  keeps reqwest on webpki roots (no native-cert filesystem scan) and the
  FIPS build keeps native roots, guarding against silent feature-graph drift.

Uses a self-contained, offline-generated test PKI and a local tokio-rustls
server (already in the tree via reqwest; promoted to a dev-dep, no change
to the shipped binary).
Copilot AI review requested due to automatic review settings June 25, 2026 18:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a cold-start optimization for Bottlecap’s non-FIPS builds by switching reqwest’s Rustls root source from OS-native certificate loading (filesystem scan at client build time) to compiled-in webpki-roots, and adds guards to prevent accidental feature-graph regressions.

Changes:

  • Switch bottlecap default feature from reqwest/rustls-tls-native-roots to reqwest/rustls-tls-webpki-roots (FIPS build remains on native roots).
  • Add Rust tests in src/http.rs that pin the intended trust behavior around tls_cert_file and skip_ssl_validation under the new root model.
  • Add a CI check (scripts/verify_tls_root_features.sh + workflow job) to verify resolved reqwest TLS-root features for both default and FIPS builds without compiling.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
scripts/verify_tls_root_features.sh New script to assert reqwest uses webpki roots in default builds and native roots in FIPS builds by inspecting the resolved feature graph.
bottlecap/src/http.rs Adds TLS root-trust tests using a local TLS server and a private CA to validate tls_cert_file and skip_ssl_validation behavior.
bottlecap/Cargo.toml Flips default reqwest TLS roots to webpki-roots and adds tokio-rustls as a dev-dependency for the new tests.
bottlecap/Cargo.lock Lockfile updates reflecting the added dev-dependency.
.github/workflows/rs_ci.yml Adds a new “TLS Root Features” job to run the feature-graph guard script in CI.

Comment thread bottlecap/Cargo.toml Outdated
Comment on lines +139 to +142
# scan/parse — on *every* reqwest::Client::build(), and bottlecap builds two
# reqwest clients during cold-start init (the register client in
# bin/bottlecap/main.rs and the shared flush client in src/http.rs).
# webpki-roots removes that per-build scan. Trade-off: webpki trusts only the

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in a14f45e. Updated the comment to say three reqwest clients on the cold-start path (register client in bin/bottlecap/main.rs, shared flush client in src/http.rs, and the secrets-decrypt client in src/secrets/decrypt.rs for KMS/SM/SSM/STS).

I also added a comment in decrypt.rs documenting why that third client intentionally uses the default root set with no proxy/tls_cert_file: it makes direct AWS control-plane calls (which chain to public AWS CAs), whereas DD_PROXY_HTTPS/tls_cert_file are for Datadog egress. The delegated-auth path (dd_org_uuid) talks to a Datadog endpoint and so uses the shared client instead.

…rust model

Addresses review feedback on #1276:

- Cargo.toml: the webpki-roots rationale said bottlecap builds "two" reqwest
  clients on the cold-start path; it's three — corrected to include the
  secrets-decrypt client (src/secrets/decrypt.rs, KMS/SM/SSM/STS).
- decrypt.rs: document why that client intentionally uses the default root set
  with no proxy / tls_cert_file / skip_ssl_validation — it makes direct AWS
  control-plane calls (public AWS CAs), whereas DD_PROXY_HTTPS / tls_cert_file
  are for Datadog egress; the delegated-auth path uses the shared client instead.
@duncanista duncanista merged commit b849b2e into main Jun 25, 2026
57 of 61 checks passed
@duncanista duncanista deleted the jordan.gonzalez/tls-shared-roots/feature branch June 25, 2026 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants