Skip to content

perf: configurable pool buffer retention limit via SetPoolBufferLimit#74

Open
xe-nvdk wants to merge 2 commits into
v6from
perf/pool-buffer-limit
Open

perf: configurable pool buffer retention limit via SetPoolBufferLimit#74
xe-nvdk wants to merge 2 commits into
v6from
perf/pool-buffer-limit

Conversation

@xe-nvdk

@xe-nvdk xe-nvdk commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Closes #62.

PutEncoder/PutDecoder drop internal buffers above 32KiB, so workloads with consistently larger messages re-grow the buffer on every pooled use. This adds SetPoolBufferLimit(n) to raise the retention limit. Of the three options in the issue, the configurable limit was chosen over (1) raising the default — which would silently change memory retention for all users — and (3) adaptive sizing — which adds racy bookkeeping for marginal benefit.

Design notes:

  • Default (32 KiB) unchanged. Values below the default clamp to it; n <= 0 restores it.
  • Limit stored in an atomic.Int64; the Put path checks the constant default first (poolBufOversized), so small buffers — the common case — never pay for the atomic load. Verified inlinable.
  • Applied to both encoder wbuf and decoder buf, per the issue.

Benchmarks (Apple M3 Max, benchstat, count=8)

100KB payloads, limit raised to 256 KiB vs default:

benchmark default limit 256k limit delta
MarshalAppendLarge100KB 9.31µs, 104KiB, 3 allocs 4.61µs, 24B, 1 alloc -50% ns/op
DecodeLargeStream100KB 16.14µs, 208KiB, 3 allocs 10.23µs, 104KiB, 1 alloc -37% ns/op

No regression at default settings vs v6 (the atomic is not touched for small buffers):

benchmark v6 branch delta
StructMarshal 368.5n 363.6n -1.33%
StructUnmarshal 338.8n 338.9n ~

Testing

  • pool_test.go (first internal package msgpack test file — needed to inspect unexported wbuf/buf): retention under raised limit, dropping under default, clamping behavior.
  • go test ./... and go test -race ./... pass.
  • Internal review pass (correctness+security, quality+perf) completed; only finding was the missing CHANGELOG entry, fixed in the follow-up commit.

Ignacio Van Droogenbroeck and others added 2 commits June 12, 2026 14:53
PutEncoder/PutDecoder drop internal buffers whose capacity exceeds
32KiB, so workloads with consistently larger messages re-grow the
buffer on every pooled use. Add SetPoolBufferLimit(n) to raise the
retention limit for such workloads. The default (32KiB) is unchanged;
values below the default are clamped to it, n <= 0 restores it.

The limit is stored atomically, and the Put path checks the constant
default first so small buffers (the common case) never pay for the
atomic load — small-message benchmarks are unchanged vs v6:

  StructMarshal     368.5n ± 0%   363.6n ± 1%  -1.33%
  StructUnmarshal   338.8n ± 0%   338.9n ± 0%  ~

100KB payloads with the limit raised to 256KiB (count=8):

  MarshalAppendLarge100KB   9.31µs → 4.61µs  (-50%)  104KiB → 24B/op   3 → 1 allocs
  DecodeLargeStream100KB   16.14µs → 10.23µs (-37%)  208KiB → 104KiB   3 → 1 allocs

Closes #62
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable pooled buffer retention limit via SetPoolBufferLimit(n) to allow workloads with large messages to retain grown buffers in the encoder/decoder pools instead of re-allocating. It includes benchmarks, implementation using atomic operations, and unit tests. Feedback points out that the unit tests in pool_test.go rely on the non-deterministic behavior of sync.Pool, which can lead to flaky test failures during garbage collection or goroutine rescheduling. It is recommended to test the underlying poolBufOversized function directly to ensure deterministic test results.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread pool_test.go
Comment on lines +10 to +60
func TestPoolBufferLimitEncoder(t *testing.T) {
const big = 100 * 1024

// Default limit: an oversized wbuf is dropped on Put.
enc := GetEncoder()
enc.wbuf = make([]byte, big)
PutEncoder(enc)
enc = GetEncoder()
if cap(enc.wbuf) > defaultPoolBufferLimit {
t.Fatalf("wbuf cap=%d retained above default limit", cap(enc.wbuf))
}
PutEncoder(enc)

// Raised limit: the same buffer is retained.
SetPoolBufferLimit(256 * 1024)
defer SetPoolBufferLimit(0)

enc = GetEncoder()
enc.wbuf = make([]byte, big)
PutEncoder(enc)
enc = GetEncoder()
if cap(enc.wbuf) < big {
t.Fatalf("wbuf cap=%d, want >= %d retained under raised limit", cap(enc.wbuf), big)
}
PutEncoder(enc)
}

func TestPoolBufferLimitDecoder(t *testing.T) {
const big = 100 * 1024

dec := GetDecoder()
dec.buf = make([]byte, big)
PutDecoder(dec)
dec = GetDecoder()
if cap(dec.buf) > defaultPoolBufferLimit {
t.Fatalf("buf cap=%d retained above default limit", cap(dec.buf))
}
PutDecoder(dec)

SetPoolBufferLimit(256 * 1024)
defer SetPoolBufferLimit(0)

dec = GetDecoder()
dec.buf = make([]byte, big)
PutDecoder(dec)
dec = GetDecoder()
if cap(dec.buf) < big {
t.Fatalf("buf cap=%d, want >= %d retained under raised limit", cap(dec.buf), big)
}
PutDecoder(dec)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These tests rely on sync.Pool returning the exact same encoder/decoder instance that was just put back. However, sync.Pool behavior is non-deterministic: the pool can be cleared at any time by the garbage collector (GC), or it may return a new instance if the goroutine is rescheduled onto a different P or thread. When this happens, GetEncoder() or GetDecoder() will return a newly allocated instance with a nil buffer, causing the test to fail with a false positive (e.g., wbuf cap=0, want >= 102400 retained under raised limit).

To make the tests 100% deterministic and robust, we should test the underlying poolBufOversized function directly with various capacities and configured limits, rather than relying on sync.Pool side effects.

func TestPoolBufferLimit(t *testing.T) {
	// Test default limit (32 KiB)
	if poolBufOversized(16 * 1024) {
		t.Error("16 KiB buffer should not be oversized under default limit")
	}
	if poolBufOversized(32 * 1024) {
		t.Error("32 KiB buffer should not be oversized under default limit")
	}
	if !poolBufOversized(33 * 1024) {
		t.Error("33 KiB buffer should be oversized under default limit")
	}

	// Test raised limit (256 KiB)
	SetPoolBufferLimit(256 * 1024)
	defer SetPoolBufferLimit(0)

	if poolBufOversized(100 * 1024) {
		t.Error("100 KiB buffer should not be oversized under 256 KiB limit")
	}
	if poolBufOversized(256 * 1024) {
		t.Error("256 KiB buffer should not be oversized under 256 KiB limit")
	}
	if !poolBufOversized(257 * 1024) {
		t.Error("257 KiB buffer should be oversized under 256 KiB limit")
	}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: adaptive encoder pool buffer threshold

1 participant