Skip to content

Improve performance of limit and throttle packages#7231

Open
michel-laterman wants to merge 2 commits into
elastic:mainfrom
michel-laterman:perf/limit-throttle
Open

Improve performance of limit and throttle packages#7231
michel-laterman wants to merge 2 commits into
elastic:mainfrom
michel-laterman:perf/limit-throttle

Conversation

@michel-laterman

@michel-laterman michel-laterman commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

What is the problem this PR solves?

Reduce allocations that limit and throttle packages use per request.

How does this PR solve the problem?

  • Pre allocate limit error response bodies - preventing two allocations from json encoding when under load
  • add Limiter.releaseFunc attribute to avoid a heap allocation whenever a limiter with a defined max value is acquired
  • Change signature of Throttle.Acquire to return Token, bool instead of *Token in order to avoid the token being allocated to heap

How to test this PR locally

mage test:benchmark

Benchmark findings

  internal/pkg/limit

  goos: darwin
  goarch: arm64
  pkg: github.com/elastic/fleet-server/v7/internal/pkg/limit
  cpu: Apple M3 Pro
                           │    baseline     │               after                 │
                           │    sec/op       │   sec/op     vs base                │
  LimiterAcquireNoLimit-12   1.913n ± 1%      1.913n ± 0%        ~ (p=0.925 n=10)
  LimiterAcquireWithMax-12  19.875n ± 1%      7.659n ± 1%  -61.47% (p=0.000 n=10)
  WriteError-12             226.90n ± 2%     82.03n ± 1%   -63.85% (p=0.000 n=10)
  geomean                    20.51n           10.63n        -48.16%

                           │    baseline     │               after                 │
                           │    B/op         │   B/op       vs base                │
  LimiterAcquireNoLimit-12   0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10)
  LimiterAcquireWithMax-12  16.00 ± 0%        0.00 ± 0%   -100.00% (p=0.000 n=10)
  WriteError-12             160.00 ± 0%      32.00 ± 0%    -80.00% (p=0.000 n=10)

                           │    baseline     │               after                 │
                           │    allocs/op    │ allocs/op    vs base                │
  LimiterAcquireNoLimit-12   0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=10)
  LimiterAcquireWithMax-12   1.000 ± 0%       0.000 ± 0%  -100.00% (p=0.000 n=10)
  WriteError-12              4.000 ± 0%       2.000 ± 0%   -50.00% (p=0.000 n=10)
  internal/pkg/throttle

  (Baseline benchmark was named ThrottleAcquirePointer; renamed to ThrottleAcquire for comparison.)

  goos: darwin
  goarch: arm64
  pkg: github.com/elastic/fleet-server/v7/internal/pkg/throttle
  cpu: Apple M3 Pro
                     │    baseline     │               after                 │
                     │    sec/op       │   sec/op     vs base                │
  ThrottleAcquire-12   95.38n ± 1%      85.44n ± 1%   -10.42% (p=0.000 n=10)

                     │    baseline     │               after                 │
                     │    B/op         │   B/op        vs base               │
  ThrottleAcquire-12   32.00 ± 0%       0.00 ± 0%    -100.00% (p=0.000 n=10)

                     │    baseline     │               after                 │
                     │    allocs/op    │ allocs/op     vs base               │
  ThrottleAcquire-12   1.000 ± 0%       0.000 ± 0%   -100.00% (p=0.000 n=10)

Memory profile findings

  limit (pprof -diff_base):

  ┌──────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────┐
  │           Function           │                                       Freed                                       │
  ├──────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
  │ (*Limiter).Acquire           │ -9.09 GB - method value closure gone entirely                                     │
  ├──────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
  │ encoding/json.Marshal        │ -3.98 GB - no longer called on 429 paths                                          │
  ├──────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
  │ limit.writeError             │ -2.38 GB - anonymous struct allocation gone                                       │
  ├──────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
  │ net/textproto.MIMEHeader.Set │ -1.46 GB - indirectly reduced (fewer header writes from fewer json.Marshal paths) │
  └──────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────┘

  Total: -16.91 GB across the benchmark run, representing 79.5% of total allocations eliminated.
  throttle (pprof -diff_base):

  ┌─────────────────────┬──────────────────────────────────────────────────┐
  │      Function       │                      Freed                       │
  ├─────────────────────┼──────────────────────────────────────────────────┤
  │ (*Throttle).Acquire │ -3.70 GB - Token struct no longer heap-allocated │
  └─────────────────────┴──────────────────────────────────────────────────┘

  Total: -3.70 GB, 99.9% of all allocations in the package eliminated. Acquire produces zero heap allocations, so essentially nothing shows up in the after profile for the acquire path.

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool

Adds benchmarks establishing allocation baselines for two hot paths:
- Limiter.Acquire: current implementation allocates a new method value
  (heap escapes l.release) on every call when maxLimit is set; local
  limiterPreBound shows 0-alloc alternative via pre-bound releaseFunc
- writeError: json.Marshal allocates on every 429 response; local
  writeErrorStatic shows 0-alloc alternative via pre-computed bodies
- Throttle.Acquire: current *Token return allocates 32 B per acquire;
  local acquireValue shows (Token, bool) keeps Token on the stack

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_ac289332-cb6f-4471-827a-686196f1398c
@michel-laterman michel-laterman added the enhancement New feature or request label Jun 19, 2026
@mergify

mergify Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

This pull request does not have a backport label. Could you fix it @michel-laterman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

limit.Acquire: add releaseFunc field to Limiter, bound once at
NewLimiter time. Acquire now returns the pre-bound field directly
instead of capturing l.release as a new method value on each call.
Eliminates 1 heap allocation (16 B) per Acquire when maxLimit is set.

limit.writeError: replace json.Marshal of an anonymous struct with
pre-computed []byte bodies. Eliminates 2 allocations (~128 B) on
every 429 response.

throttle.Acquire: change return from *Token to (Token, bool). Token
no longer escapes to the heap on each successful acquire, eliminating
1 allocation (32 B) per call. Update caller in handleArtifacts.go and
all tests accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_ac289332-cb6f-4471-827a-686196f1398c
@github-actions

Copy link
Copy Markdown
Contributor

TL;DR

The Mac unit-test job did not run any tests. Buildkite build 15171 failed during checkout because it fetched PR #7231 at current head 5c3b3ca1ea7b3ffa2aa5cab197f45b6c05f0f7c2, then tried to check out older commit bc8b1627ba88feef4fbf3dee25b2e8cb26df0002.

Remediation

  • Treat build 15171 as stale/obsolete after the PR head changed; do not chase a unit-test failure from this run.
  • Let the newer Buildkite run for current head 5c3b3ca1ea7b3ffa2aa5cab197f45b6c05f0f7c2 complete, or rerun CI on the current PR head if it does not progress.
Investigation details

Root Cause

This is an infrastructure/stale-status checkout failure. The job fetched refs/pull/7231/head, which resolved to 5c3b3ca1ea7b3ffa2aa5cab197f45b6c05f0f7c2, but the Buildkite event was for bc8b1627ba88feef4fbf3dee25b2e8cb26df0002. Since the job only fetched the PR ref and that ref no longer pointed at bc8b162..., git checkout failed before .buildkite/scripts/unit_test.sh could execute.

Evidence

# FETCH_HEAD is now `5c3b3ca1ea7b3ffa2aa5cab197f45b6c05f0f7c2`
$ git checkout -f bc8b1627ba88feef4fbf3dee25b2e8cb26df0002
fatal: reference is not a tree: bc8b1627ba88feef4fbf3dee25b2e8cb26df0002

The same checkout failure repeated for all 3 retry attempts. GitHub currently reports PR #7231 head as 5c3b3ca1ea7b3ffa2aa5cab197f45b6c05f0f7c2, and the buildkite/fleet-server status for that head points to newer build 15172.

Verification

  • Not run locally: no source/test command failed in this job; checkout failed before tests started.

Follow-up

  • If build 15172 fails after checkout succeeds, investigate that newer log as the actionable failure.

What is this? | From workflow: PR Buildkite Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

@michel-laterman michel-laterman linked an issue Jun 22, 2026 that may be closed by this pull request
@michel-laterman

Copy link
Copy Markdown
Contributor Author

buildkite run perf-tests

@michel-laterman michel-laterman marked this pull request as ready for review June 23, 2026 16:11
@michel-laterman michel-laterman requested a review from a team as a code owner June 23, 2026 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fleet-server throttle/limit improvements

1 participant