
perf(runtime): size Tokio worker pool from Lambda memory tier #1277

Draft
duncanista wants to merge 1 commit into jordan.gonzalez/cold-start-instrumentation/feature from jordan.gonzalez/tokio-runtime/feature

Conversation

@duncanista

Contributor

DRAFT — stacked on #1271 (jordan.gonzalez/cold-start-instrumentation/feature). Review/merge #1271 first; this PR's diff is against that branch, not main.

Jira: none yet — add before marking ready.

Overview

Cold-start hypothesis H15: right-size the Tokio runtime.

Today the extension uses #[tokio::main], which sizes the worker pool from std::thread::available_parallelism(). In a Lambda sandbox that reflects the host's core count, not the fraction of vCPU the function is actually granted, so low-memory functions can spin up more worker threads than they have CPU for (extra thread stacks + scheduler overhead during init).

This PR replaces #[tokio::main] with an explicit multi-thread runtime whose worker count is derived from the Lambda memory tier:

  • AWS allocates roughly 1 vCPU per 1769 MB of configured memory.
  • workers = round(mem_mb / 1769), computed with integer math ((mem_mb + 884) / 1769, since 884 = 1769 / 2) to avoid clippy::pedantic float-cast lints, then clamped to 1..=4.
  • The memory value is read from AWS_LAMBDA_FUNCTION_MEMORY_SIZE and parsed as u32; if the variable is missing or unparseable, we fall back to 2 workers (no .unwrap()).
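The three rules above can be sketched as a small pure function plus an env-var wrapper. This is only an illustration under stated assumptions: the names `workers_for_memory`, `workers_from_env_value`, `MB_PER_VCPU`, `MAX_WORKERS`, and `FALLBACK_WORKERS` are hypothetical and are not taken from the PR's diff, and the trimming of whitespace is inferred from the Testing notes.

```rust
// Hypothetical names; the PR's actual identifiers are not shown in the description.
const MB_PER_VCPU: u32 = 1769; // approximate MB of configured memory per vCPU
const MAX_WORKERS: u32 = 4; // upper bound of the 1..=4 clamp
const FALLBACK_WORKERS: usize = 2; // used when the env var is missing or unparseable

/// round(mem_mb / 1769) using integer math only, clamped to 1..=4.
fn workers_for_memory(mem_mb: u32) -> usize {
    // Adding half the divisor (1769 / 2 == 884) before dividing rounds to nearest.
    let rounded = mem_mb.saturating_add(MB_PER_VCPU / 2) / MB_PER_VCPU;
    // The clamped value is at most 4, so the conversion cannot fail in practice;
    // unwrap_or keeps the sketch free of `.unwrap()`.
    usize::try_from(rounded.clamp(1, MAX_WORKERS)).unwrap_or(FALLBACK_WORKERS)
}

/// Maps the raw env value to a worker count, falling back when it is absent or invalid.
fn workers_from_env_value(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .map_or(FALLBACK_WORKERS, workers_for_memory)
}

fn main() {
    let raw = std::env::var("AWS_LAMBDA_FUNCTION_MEMORY_SIZE").ok();
    let workers = workers_from_env_value(raw.as_deref());
    println!("worker threads: {workers}");
}
```

Separating the parsing from the arithmetic keeps the heuristic testable without touching the process environment.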

Resulting mapping (a few points):

| Memory | vCPU (approx) | Workers |
|---|---|---|
| ≤ 1769 MB | ≤ 1.0 | 1 |
| ~2654–3538 MB | ~1.5–2.0 | 2 |
| ~5307 MB | ~3.0 | 3 |
| ≥ 7076 MB (incl. 10240 max) | ≥ 4.0 | 4 (clamped) |
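The table lists sample points, not the exact switch-over thresholds. Assuming the formula is exactly `(mem_mb + 884) / 1769` clamped to `1..=4` as described above, the smallest memory size that yields each worker count can be derived as `workers * 1769 - 884`. The helper names below are hypothetical and exist only for this sketch.

```rust
const MB_PER_VCPU: u32 = 1769;
const HALF_STEP: u32 = MB_PER_VCPU / 2; // 884

/// Same rounding rule as described in the overview (hypothetical name).
fn workers_for_memory(mem_mb: u32) -> u32 {
    ((mem_mb + HALF_STEP) / MB_PER_VCPU).clamp(1, 4)
}

/// Smallest memory size in MB that yields `workers` threads, valid for 2..=4.
fn min_memory_for(workers: u32) -> u32 {
    workers * MB_PER_VCPU - HALF_STEP
}

fn main() {
    for workers in 2..=4 {
        let threshold = min_memory_for(workers);
        // The threshold itself rounds up; one MB less rounds down.
        assert_eq!(workers_for_memory(threshold), workers);
        assert_eq!(workers_for_memory(threshold - 1), workers - 1);
        println!("{workers} workers from {threshold} MB");
    }
}
```

Under that assumption the boundaries fall at 2654 MB, 4423 MB, and 6192 MB, so a function configured with, say, 6500 MB would already get 4 workers even though it sits below the 7076 MB row.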

.enable_all() is preserved and the rt-multi-thread Tokio feature is unchanged. The init body is moved verbatim from async fn main into a new async fn run() that the runtime drives via block_on(run()); no other behavior changes, and all H0 cold-start instrumentation (the available_parallelism debug log, log_init_checkpoint helper and its calls) is preserved.

This pairs with the H0 available_parallelism log added in #1271: that log makes the value the runtime would have used observable per tier, and this change makes the value it does use deliberate. The net cold-start and steady-state effect of the chosen worker count still needs benchmarking across memory tiers before this can be considered a confirmed win.

Testing

  • cargo fmt and cargo clippy --bin bottlecap --no-deps are clean (only the pre-existing buf_redux/multipart future-incompatibility note remains; pedantic + unwrap_used are denied crate-wide and pass).
  • The heuristic was unit-checked against representative inputs: 128 / 512 / 884 / 1769 → 1, 2654 / 3008 / 3538 → 2, 5307 → 3, 7076 / 10240 → 4, and 0 → 1. Missing, empty, and non-numeric inputs fall back to 2 workers, and whitespace-padded inputs are trimmed before parsing.
  • Worker-count performance impact across memory tiers to be measured (see note above).

Replace #[tokio::main] with an explicit multi-thread runtime whose worker
count is derived from AWS_LAMBDA_FUNCTION_MEMORY_SIZE. AWS grants ~1 vCPU
per 1769 MB, so workers = round(mem_mb / 1769) clamped to 1..=4 (integer
math, no float casts; defaults to 2 when the env var is missing or
unparseable). The init body moves verbatim into run(); all H0 cold-start
instrumentation is preserved.
@datadog-official

datadog-official Bot commented Jun 24, 2026

Contributor

Pipelines


⚠️ Warnings

🚦 7 Pipeline jobs failed

DataDog/datadog-lambda-extension | integration-suite: [on-demand]

DataDog/datadog-lambda-extension | e2e-test-status (amd64)

DataDog/datadog-lambda-extension | e2e-test-status (amd64, fips)

View all 7 failed jobs.

🔗 Commit SHA: a75e3cb
