perf(appsec): build WAF off the init critical path #1283

Draft
duncanista wants to merge 1 commit into jordan.gonzalez/cold-start-instrumentation/feature from jordan.gonzalez/appsec-defer/feature

@duncanista

Contributor

Stacked on #1271. Draft. Base branch is jordan.gonzalez/cold-start-instrumentation/feature (the H0 cold-start instrumentation PR), not main. Review/merge #1271 first; this PR's diff is only the AppSec change on top of it.

Jira: none yet — add before marking ready.

Overview

When App & API Protection (AAP) is enabled, building the WAF is expensive and happens on the init critical path today. AppSecProcessor::new(config) runs synchronously during extension init and:

  • zstd-decompresses the recommended ruleset (~29 KB compressed -> ~322 KB),
  • serde_json-parses it into a WafMap, and
  • compiles the libddwaf WAF instance.

That is tens of milliseconds of CPU-bound work blocking startup. The WAF, however, is first needed only when a request payload is evaluated, in the proxy interceptor's process_invocation_next and on the trace and response paths, and that happens strictly after the first /next. None of the build therefore needs to run on the synchronous init path.

Approach (deferred, awaitable handle + spawn_blocking)

Replace the eager Option<Arc<TokioMutex<AppSecProcessor>>> that was threaded through every consumer with a deferred handle:

type SharedProcessor   = Arc<tokio::sync::Mutex<Processor>>;
type DeferredProcessor = Arc<tokio::sync::OnceCell<Option<SharedProcessor>>>;

  • Disabled stays cheap. appsec::defer_processor(cfg) checks the feature flag synchronously and returns None immediately: no OnceCell, no build, no spawned task. This preserves the disabled-by-default behavior (previously the Err(FeatureDisabled) arm).
  • Enabled builds off the critical path. When enabled, defer_processor creates the OnceCell, spawns a background task that builds the processor inside tokio::task::spawn_blocking (CPU-bound work belongs on the blocking pool), and returns the handle right away. Init no longer blocks on the build.
  • Consumers resolve at point of use. The trace processor (SendingTraceProcessor::send_processed_traces) and the proxy interceptor (invocation_next_proxy / invocation_response_proxy / invocation_error_proxy) call appsec::resolve(handle, cfg).await exactly where they need the WAF, then .lock().await as before.
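
The shape of the deferred handle can be sketched without dependencies. This is a minimal analogue, not the code in this PR: it swaps tokio::sync::OnceCell for std::sync::OnceLock, tokio::sync::Mutex for std::sync::Mutex, and spawn_blocking for a plain thread, so resolve is synchronous here rather than awaited. Config, its appsec_enabled field, the Processor field, and the body of build_processor are hypothetical placeholders.

```rust
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

// Hypothetical stand-ins; the real Config and Processor live in bottlecap.
pub struct Config {
    pub appsec_enabled: bool,
}
pub struct Processor {
    pub rules_loaded: usize,
}

pub type SharedProcessor = Arc<Mutex<Processor>>;
// std::sync::OnceLock stands in for tokio::sync::OnceCell in this sketch.
pub type DeferredProcessor = Arc<OnceLock<Option<SharedProcessor>>>;

// Placeholder for the expensive decompress + parse + compile step.
// Returning None would model a failed build.
fn build_processor(_cfg: &Config) -> Option<SharedProcessor> {
    Some(Arc::new(Mutex::new(Processor { rules_loaded: 1 })))
}

/// Disabled: returns None without allocating a cell or spawning anything.
/// Enabled: returns the handle immediately and builds in the background.
pub fn defer_processor(cfg: Arc<Config>) -> Option<DeferredProcessor> {
    if !cfg.appsec_enabled {
        return None;
    }
    let cell: DeferredProcessor = Arc::new(OnceLock::new());
    let background = Arc::clone(&cell);
    // A plain thread stands in for the background task + spawn_blocking.
    thread::spawn(move || {
        background.get_or_init(|| build_processor(&cfg));
    });
    Some(cell)
}

/// Called at the point of use: returns the built processor, waiting for an
/// in-flight build or running the build itself if none has started yet.
pub fn resolve(handle: &DeferredProcessor, cfg: &Config) -> Option<SharedProcessor> {
    handle.get_or_init(|| build_processor(cfg)).clone()
}
```

A consumer in this sketch would call resolve(&handle, &cfg) and then lock the returned processor, mirroring the resolve-then-lock order described above.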

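The two nested Options (the Option<DeferredProcessor> returned by defer_processor and the Option<SharedProcessor> stored inside the cell) encode three states cleanly: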

  • outer None -> feature disabled (no handle);
  • inner Some(proc) -> WAF built successfully;
  • inner None -> build failed (logged) -> feature is a silent no-op, exactly as the old Err(_) => None arm behaved.
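
The states listed above can be made explicit with a small classifier. This is an illustrative sketch only: AppSecState and classify are hypothetical names, std::sync::OnceLock again stands in for tokio::sync::OnceCell, and the extra Pending variant covers the transient moment when the cell exists but the build has not finished, which consumers never observe because resolve waits it out.

```rust
use std::sync::{Arc, Mutex, OnceLock};

// Hypothetical stand-in for the real processor type.
pub struct Processor;

pub type SharedProcessor = Arc<Mutex<Processor>>;
pub type DeferredProcessor = Arc<OnceLock<Option<SharedProcessor>>>;

#[derive(Debug, PartialEq)]
pub enum AppSecState {
    /// Outer None: the feature is off and no handle exists.
    Disabled,
    /// Handle exists but the cell is still empty (build in flight).
    Pending,
    /// Inner Some: the WAF was built successfully.
    Ready,
    /// Inner None: the build failed and the feature is a no-op.
    BuildFailed,
}

pub fn classify(handle: Option<&DeferredProcessor>) -> AppSecState {
    match handle {
        None => AppSecState::Disabled,
        Some(cell) => match cell.get() {
            None => AppSecState::Pending,
            Some(Some(_)) => AppSecState::Ready,
            Some(None) => AppSecState::BuildFailed,
        },
    }
}
```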

Correctness: what if a request arrives before the WAF is built?

The handle is a tokio::sync::OnceCell, and consumers resolve it via get_or_init:

  • Normal case: the background build (kicked off immediately after init, off the synchronous path) is finished well before the first request payload is evaluated — that evaluation only happens after the first /next round-trip — so resolve returns instantly.
  • Race case: if a consumer somehow reaches resolve before the background build has completed, get_or_init simply awaits the in-flight build (or, if the background task has not been scheduled yet, runs an equivalent spawn_blocking build itself). OnceCell guarantees a single initializer, so there is never a double-build and never a missed evaluation — the first request transparently waits just until the WAF is ready instead of the whole init blocking on it.

No request can be processed against a half-built WAF, and no evaluation is skipped.
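
The single-build property in the race case can be checked with a dependency-free analogue. The sketch below assumes std::sync::OnceLock as a stand-in for tokio::sync::OnceCell and threads in place of tasks; race_resolvers, the 20 ms sleep, and the value 42 are illustrative only. One difference worth keeping in mind: with the tokio cell, a cancelled or panicking initializer lets another waiter start a fresh attempt, whereas this sketch never cancels.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};
use std::thread;
use std::time::Duration;

/// Spawns `resolvers` threads that all race to resolve the same cell.
/// Returns how many times the build closure ran and what each resolver saw.
pub fn race_resolvers(resolvers: usize) -> (usize, Vec<u32>) {
    let cell: Arc<OnceLock<u32>> = Arc::new(OnceLock::new());
    let builds = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..resolvers)
        .map(|_| {
            let cell = Arc::clone(&cell);
            let builds = Arc::clone(&builds);
            thread::spawn(move || {
                // Late arrivals block here until the first build completes.
                *cell.get_or_init(|| {
                    builds.fetch_add(1, Ordering::SeqCst);
                    // Simulate a slow, CPU-bound build.
                    thread::sleep(Duration::from_millis(20));
                    42
                })
            })
        })
        .collect();

    let seen: Vec<u32> = handles
        .into_iter()
        .map(|h| h.join().expect("resolver thread panicked"))
        .collect();
    (builds.load(Ordering::SeqCst), seen)
}
```

Every resolver observes the same fully built value, and the build counter stays at one regardless of how many resolvers race.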

Scope

Only affects AppSec-enabled configurations. With AAP disabled (the default), behavior is byte-for-byte unchanged and the path stays synchronous and allocation-free.

Files changed:

  • bottlecap/src/appsec/mod.rs — new SharedProcessor / DeferredProcessor types, defer_processor, resolve, and build_processor (the spawn_blocking builder).
  • bottlecap/src/bin/bottlecap/main.rs — construct via appsec::defer_processor; thread the handle type through; pass config to the proxy. H0 init instrumentation preserved.
  • bottlecap/src/proxy/interceptor.rs — hold the deferred handle in InterceptorState, carry Arc<Config>, resolve at each WAF use site.
  • bottlecap/src/traces/trace_agent.rs — field/param type updated to the deferred handle.
  • bottlecap/src/traces/trace_processor.rs — SendingTraceProcessor.appsec is now the deferred handle; resolve before locking.

Testing

  • cargo fmt clean.
  • cargo clippy --bin bottlecap --no-deps clean (clippy::all + pedantic + unwrap_used denied); also clippy-clean across --all-targets. Only the pre-existing buf_redux / multipart future-incompat warning remains.
  • cargo test for appsec::* (24 tests), proxy::interceptor::tests::test_noop_proxy, and traces::trace_processor::* (21 tests) all pass.
  • Disabled-by-default path verified: Config::default() (AAP off) yields None from defer_processor with no spawned build, exercised by test_noop_proxy.

AppSecProcessor::new zstd-decompresses a ~29KB->322KB ruleset, JSON-parses
it, and compiles the libddwaf WAF (tens of ms) synchronously during init.
The WAF is only needed once the first request payload is evaluated, which is
strictly after the first /next, so this work does not belong on the init
critical path.

Replace the eager Option<Arc<Mutex<Processor>>> with a deferred, awaitable
handle (Arc<OnceCell<Option<Arc<Mutex<Processor>>>>>). When AppSec is enabled,
the build runs on the blocking pool (spawn_blocking) from a background task;
consumers (trace processor and the runtime API proxy) resolve the handle where
they actually use the WAF, awaiting the in-flight build if a request somehow
arrives before it finishes. The disabled-by-default path stays cheap: the
feature flag is checked synchronously and yields no handle and no build.

datadog-datadog-prod-us1 Bot commented Jun 24, 2026

Pipelines

⚠️ Warnings

🚦 5 Pipeline jobs failed

DataDog/datadog-lambda-extension | integration-suite: [lmi]

DataDog/datadog-lambda-extension | e2e-test-status (amd64)

DataDog/datadog-lambda-extension | e2e-test-status (amd64, fips)

Commit SHA: 47f251b
