feat(http): share one TLS root config across init HTTP clients#1276
Conversation
|
Switch the non-FIPS default feature from reqwest/rustls-tls-native-roots to reqwest/rustls-tls-webpki-roots so the two init-time reqwest clients (the register client in bin/bottlecap/main.rs and the shared flush client in src/http.rs) no longer call rustls_native_certs::load_native_certs() on every reqwest::Client::build(). webpki-roots uses a compiled-in Mozilla CA bundle, eliminating the per-build filesystem cert scan during cold start. Custom-cert (tls_cert_file -> add_root_certificate), proxy, and skip-ssl-validation paths are unchanged. The FIPS feature still uses native roots and is untouched.
b5de2b8 to
755cfd3
Compare
Cover the trust-model change from the webpki-roots switch with hermetic tests and a CI guard, so the cold-start win can't regress trust behavior silently: - tls_cert_file_trusts_private_ca: a server presenting a privately-signed cert is trusted when its CA is supplied via tls_cert_file — verifies add_root_certificate layers on top of the webpki roots (the documented mitigation for private-CA / DD_PROXY_HTTPS setups). - private_ca_rejected_without_tls_cert_file: the default webpki root set rejects a privately-signed cert without tls_cert_file (pins the new behavior). - skip_ssl_validation_accepts_untrusted_cert: the skip-validation escape hatch still works. - load_custom_cert unit tests (single / multiple / empty / non-PEM). - scripts/verify_tls_root_features.sh + CI job: assert the default build keeps reqwest on webpki roots (no native-cert filesystem scan) and the FIPS build keeps native roots, guarding against silent feature-graph drift. Uses a self-contained, offline-generated test PKI and a local tokio-rustls server (already in the tree via reqwest; promoted to a dev-dep, no change to the shipped binary).
There was a problem hiding this comment.
Pull request overview
This PR implements a cold-start optimization for Bottlecap’s non-FIPS builds by switching reqwest’s Rustls root source from OS-native certificate loading (filesystem scan at client build time) to compiled-in webpki-roots, and adds guards to prevent accidental feature-graph regressions.
Changes:
- Switch
bottlecapdefault feature fromreqwest/rustls-tls-native-rootstoreqwest/rustls-tls-webpki-roots(FIPS build remains on native roots). - Add Rust tests in
src/http.rsthat pin the intended trust behavior aroundtls_cert_fileandskip_ssl_validationunder the new root model. - Add a CI check (
scripts/verify_tls_root_features.sh+ workflow job) to verify resolved reqwest TLS-root features for both default and FIPS builds without compiling.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
scripts/verify_tls_root_features.sh |
New script to assert reqwest uses webpki roots in default builds and native roots in FIPS builds by inspecting the resolved feature graph. |
bottlecap/src/http.rs |
Adds TLS root-trust tests using a local TLS server and a private CA to validate tls_cert_file and skip_ssl_validation behavior. |
bottlecap/Cargo.toml |
Flips default reqwest TLS roots to webpki-roots and adds tokio-rustls as a dev-dependency for the new tests. |
bottlecap/Cargo.lock |
Lockfile updates reflecting the added dev-dependency. |
.github/workflows/rs_ci.yml |
Adds a new “TLS Root Features” job to run the feature-graph guard script in CI. |
| # scan/parse — on *every* reqwest::Client::build(), and bottlecap builds two | ||
| # reqwest clients during cold-start init (the register client in | ||
| # bin/bottlecap/main.rs and the shared flush client in src/http.rs). | ||
| # webpki-roots removes that per-build scan. Trade-off: webpki trusts only the |
There was a problem hiding this comment.
Good catch — fixed in a14f45e. Updated the comment to say three reqwest clients on the cold-start path (register client in bin/bottlecap/main.rs, shared flush client in src/http.rs, and the secrets-decrypt client in src/secrets/decrypt.rs for KMS/SM/SSM/STS).
I also added a comment in decrypt.rs documenting why that third client intentionally uses the default root set with no proxy/tls_cert_file: it makes direct AWS control-plane calls (which chain to public AWS CAs), whereas DD_PROXY_HTTPS/tls_cert_file are for Datadog egress. The delegated-auth path (dd_org_uuid) talks to a Datadog endpoint and so uses the shared client instead.
…rust model Addresses review feedback on #1276: - Cargo.toml: the webpki-roots rationale said bottlecap builds "two" reqwest clients on the cold-start path; it's three — corrected to include the secrets-decrypt client (src/secrets/decrypt.rs, KMS/SM/SSM/STS). - decrypt.rs: document why that client intentionally uses the default root set with no proxy / tls_cert_file / skip_ssl_validation — it makes direct AWS control-plane calls (public AWS CAs), whereas DD_PROXY_HTTPS / tls_cert_file are for Datadog egress; the delegated-auth path uses the shared client instead.
Jira: SVLS-9380 (H1) · epic SVLS-9378
What
Switch the non-FIPS
defaultfeature fromreqwest/rustls-tls-native-rootstoreqwest/rustls-tls-webpki-roots. This removes therustls_native_certs::load_native_certs()filesystem cert scan that ran on everyreqwest::Client::build()— i.e. on the cold-start path for all three init reqwest clients (register inmain.rs, shared flush inhttp.rs, secrets-decrypt insecrets/decrypt.rs). Cold-start hypothesis H1. FIPS builds are unchanged (keep native roots).Trade-off
webpki trusts only the public Mozilla root bundle, so private/internal OS-installed CAs are no longer trusted implicitly. Affected setups (e.g. a TLS-intercepting
DD_PROXY_HTTPS) supply the CA viaDD_TLS_CERT_FILE(still layers on top of webpki) or useDD_SKIP_SSL_VALIDATION. Datadog public intake chains to public roots, so the default path is unaffected.Tested & verified
src/http.rs):tls_cert_filetrusts a private CA on top of webpki; webpki alone rejects it without the cert file;skip_ssl_validationescape hatch works;load_custom_certparsing (single / multiple / empty / non-PEM).scripts/verify_tls_root_features.sh+ job): default build stays on webpki (no native-cert scan), FIPS stays on native roots — catches silent feature-graph drift.cargo --unit-graph: native-cert scan is gone from all three reqwest clients' runtime path (native-roots survives only in a build-script dep); FIPS feature graph unchanged.default+fips, test, build, license, audit, the new TLS Root Features job).Not doing (by design)
secrets/decrypt.rs— KMS/SM/SSM/STS): intentionally keeps default roots, no proxy /tls_cert_file/skip_ssl_validation. These are direct AWS control-plane calls that chain to public AWS CAs;DD_PROXY_HTTPS/tls_cert_fileare for Datadog egress; and we deliberately don't disable cert validation on credential/secret fetches. Only a fully air-gapped setup intercepting AWS traffic with a private CA would be affected — those should use VPC endpoints (public certs). Documented in code; revisit only on a real report.traces/http_client.rs, hyper/libdd-common): keeps native roots via its own trust path — out of scope here.Before all-region rollout
Verify GovCloud / dedicated-region intake certs chain to the webpki bundle (deferred real-endpoint canary).