perf(runtime): size Tokio worker pool from Lambda memory tier #1277
Draft
duncanista wants to merge 1 commit into
Replace #[tokio::main] with an explicit multi-thread runtime whose worker count is derived from AWS_LAMBDA_FUNCTION_MEMORY_SIZE. AWS grants ~1 vCPU per 1769 MB, so workers = round(mem_mb / 1769) clamped to 1..=4 (integer math, no float casts; defaults to 2 when the env var is missing or unparseable). The init body moves verbatim into run(); all H0 cold-start instrumentation is preserved.
Jira: none yet — add before marking ready.
Overview
Cold-start hypothesis H15: right-size the Tokio runtime.
Today the extension uses `#[tokio::main]`, which sizes the worker pool from `std::thread::available_parallelism()`. In a Lambda sandbox that reflects the host's core count, not the fraction of vCPU the function is actually granted, so low-memory functions can spin up more worker threads than they have CPU for (extra thread stacks + scheduler overhead during init).
This PR replaces `#[tokio::main]` with an explicit multi-thread runtime whose worker count is derived from the Lambda memory tier:
- `workers = round(mem_mb / 1769)`, computed with integer math (`(mem_mb + 884) / 1769`, since `884 = 1769 / 2`) to avoid `clippy::pedantic` float-cast lints, then clamped to `1..=4`.
- The memory size is read from `AWS_LAMBDA_FUNCTION_MEMORY_SIZE` and parsed as `u32`; if the variable is missing or unparseable, we fall back to 2 workers (no `.unwrap()`).
Resulting mapping (a few points):
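For reviewers who want to check the arithmetic, the rounding and clamping rule can be sketched as a small pure function. This is an illustration, not the code in the diff: the name `workers_for_memory` and the constant `MB_PER_VCPU` are hypothetical, and the sample inputs are the ones listed in the testing notes of this PR.

```rust
/// Hypothetical helper (name not taken from the PR): map a Lambda memory
/// size in MB to a worker count, rounding to the nearest vCPU and clamping
/// the result to 1..=4.
fn workers_for_memory(mem_mb: u32) -> usize {
    const MB_PER_VCPU: u32 = 1769;
    // Adding half the divisor (1769 / 2 = 884 in integer math) before the
    // division rounds to nearest without any float casts.
    let rounded = mem_mb.saturating_add(MB_PER_VCPU / 2) / MB_PER_VCPU;
    // The clamped value is in 1..=4, so the cast cannot lose information.
    rounded.clamp(1, 4) as usize
}

fn main() {
    for mem_mb in [128, 1769, 2654, 3538, 5307, 7076, 10240] {
        println!("{mem_mb} MB -> {} workers", workers_for_memory(mem_mb));
    }
}
```

Keeping the rule in a function that takes a plain `u32` makes it testable without touching the process environment.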
`.enable_all()` is preserved and the `rt-multi-thread` Tokio feature is unchanged. The init body is moved verbatim from `async fn main` into a new `async fn run()` that the runtime drives via `block_on(run())`; no other behavior changes, and all H0 cold-start instrumentation (the `available_parallelism` debug log, the `log_init_checkpoint` helper and its calls) is preserved.
This pairs with the H0 `available_parallelism` log added in #1271: that log makes the value the runtime would have used observable per tier, and this change makes the value it does use deliberate. The net cold-start and steady-state effect of the chosen worker count still needs benchmarking across memory tiers before this is considered a confirmed win.
Testing
- `cargo fmt` and `cargo clippy --bin bottlecap --no-deps` are clean (only the pre-existing `buf_redux`/`multipart` future-incompatibility note remains; pedantic + `unwrap_used` are denied crate-wide and pass).
- Mapping checks (memory in MB → workers): 128 / 512 / 884 / 1769 → 1, 2654 / 3008 / 3538 → 2, 5307 → 3, 7076 / 10240 → 4, 0 → 1, and missing/empty/non-numeric/whitespace-padded inputs → fallback/trimmed as expected.
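The fallback and trimming cases above can be exercised with a sketch along these lines. It assumes the raw environment value is trimmed before parsing and that any parse failure yields the default of 2 workers. The helper name `workers_from_raw` and the constant `DEFAULT_WORKERS` are hypothetical and may not match the diff.

```rust
use std::env;

/// Fallback used when the variable is missing or unparseable (2 per the PR).
const DEFAULT_WORKERS: usize = 2;

/// Hypothetical helper: turn the raw value of the memory-size variable into
/// a worker count. Taking `Option<&str>` keeps it testable without mutating
/// the process environment.
fn workers_from_raw(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .map_or(DEFAULT_WORKERS, |mem_mb| {
            // Round to the nearest vCPU (1769 MB each), then clamp to 1..=4.
            let rounded = mem_mb.saturating_add(884) / 1769;
            rounded.clamp(1, 4) as usize
        })
}

fn main() {
    // `.ok()` turns a missing or non-UTF-8 variable into `None`, so no
    // `.unwrap()` is needed anywhere on this path.
    let raw = env::var("AWS_LAMBDA_FUNCTION_MEMORY_SIZE").ok();
    let workers = workers_from_raw(raw.as_deref());
    println!("worker threads: {workers}");
}
```

Note that a parsed value of 0 goes through the clamp and yields 1, while an empty or non-numeric string takes the fallback path and yields 2.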