Skip to content

feat(init): add cold-start init-phase timing instrumentation#1271

Draft
duncanista wants to merge 1 commit into
mainfrom
jordan.gonzalez/cold-start-instrumentation/feature
Draft

feat(init): add cold-start init-phase timing instrumentation#1271
duncanista wants to merge 1 commit into
mainfrom
jordan.gonzalez/cold-start-instrumentation/feature

Conversation

@duncanista

@duncanista duncanista commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Please include Jira ticket in title. (no ticket yet — add your SLS ticket before marking ready for review.)

Overview

Adds debug-gated init-phase timing instrumentation to the extension entry point so cold-start (initialization) time can be attributed per phase and measured rather than guessed. This is the measurement prerequisite (H0) for the cold-start improvement effort.

A small helper log_init_checkpoint(start_time, phase) is called at each key boundary in main() / extension_loop_active() and logs both:

  • delta — time since the previous checkpoint (that phase's own cost), and
  • cumulative — time since process start,

in milliseconds to 6 decimal places (nanosecond resolution), so sub-millisecond phases are visible and per-phase cost is read directly off each line — no subtracting consecutive values by hand. Phases:

crypto_provider_readytls_client_buildconfig_parseshared_client_readyregister_readydogstatsd_startedtrace_agent_startedtelemetry_subscribedready

These isolate the costs the cold-start work targets — e.g. tls_client_build and shared_client_ready are the two redundant reqwest/TLS client builds (H1), register_ready/telemetry_subscribed are the two Extensions-API round-trips (H2). It also logs available_parallelism() once — the value Tokio sizes its worker pool from, which varies per Lambda memory tier (H15).

Example output (only at DD_LOG_LEVEL=debug):

INIT | available_parallelism=Ok(2)
INIT | phase=crypto_provider_ready delta=0.412000ms cumulative=0.412000ms
INIT | phase=tls_client_build delta=5.931200ms cumulative=6.343200ms
INIT | phase=config_parse delta=1.204050ms cumulative=7.547250ms
INIT | phase=shared_client_ready delta=4.883100ms cumulative=12.430350ms
INIT | phase=register_ready delta=11.077000ms cumulative=23.507350ms
...
INIT | phase=ready delta=2.918000ms cumulative=41.902000ms

Notes

  • Effectively free at the default info level: the per-phase bookkeeping (clock read + atomic) is guarded behind a DEBUG-level check and returns early otherwise; nothing is logged or computed.
  • Nanosecond-resolution timing underneath; per-phase delta is tracked across the two functions via a module-level AtomicU64 of nanos (checkpoints run sequentially on the init path; saturating_sub guards against any reordering).
  • No behavior change — debug-only observability. grep 'INIT |' yields the full timeline as one series.

Testing

  • cargo clippy --bin bottlecap — clean (crate denies clippy::all + clippy::pedantic + clippy::unwrap_used).
  • cargo fmt — clean.
  • Manual (to do): run with DD_LOG_LEVEL=debug via the local_tests/ RIE harness and confirm the INIT | series prints, deltas are non-negative, and they sum to ~the existing "ready in {}ms" total. Use scripts/bench.sh + scripts/report_stats.py to capture deployed cold-start Init Duration deltas.

@datadog-official

datadog-official Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 5 Pipeline jobs failed

DataDog/datadog-lambda-extension | integration-suite: [lmi]   View in Datadog   GitLab

DataDog/datadog-lambda-extension | integration-suite: [on-demand]   View in Datadog   GitLab

DataDog/datadog-lambda-extension | e2e-test-status (amd64)   View in Datadog   GitLab

View all 5 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: d46a7ec | Docs | Datadog PR Page | Give us feedback!

@duncanista duncanista force-pushed the jordan.gonzalez/cold-start-instrumentation/feature branch 2 times, most recently from 815f22a to 2dd3d75 Compare June 24, 2026 01:20
Add debug-gated checkpoints at the key cold-start init boundaries (crypto
provider, TLS client build, config parse, shared client, register,
dogstatsd, trace agent, telemetry subscribe, ready), plus a one-time
available_parallelism() log.

Each checkpoint logs delta (time since the previous checkpoint = that
phase's own cost) and cumulative (time since process start), in
milliseconds to 6 decimal places (nanosecond resolution) so sub-millisecond
phases are visible. Init time is attributed per phase directly, with no
manual subtraction. The per-phase bookkeeping is guarded behind a
DEBUG-level check, so it stays effectively free at the default info level.

This is the measurement prerequisite (H0) for the cold-start improvements.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant