perf: configurable pool buffer retention limit via SetPoolBufferLimit#74
perf: configurable pool buffer retention limit via SetPoolBufferLimit#74xe-nvdk wants to merge 2 commits into
Conversation
PutEncoder/PutDecoder drop internal buffers whose capacity exceeds 32KiB, so workloads with consistently larger messages re-grow the buffer on every pooled use. Add SetPoolBufferLimit(n) to raise the retention limit for such workloads. The default (32KiB) is unchanged; values below the default are clamped to it, n <= 0 restores it. The limit is stored atomically, and the Put path checks the constant default first so small buffers (the common case) never pay for the atomic load — small-message benchmarks are unchanged vs v6: StructMarshal 368.5n ± 0% 363.6n ± 1% -1.33% StructUnmarshal 338.8n ± 0% 338.9n ± 0% ~ 100KB payloads with the limit raised to 256KiB (count=8): MarshalAppendLarge100KB 9.31µs → 4.61µs (-50%) 104KiB → 24B/op 3 → 1 allocs DecodeLargeStream100KB 16.14µs → 10.23µs (-37%) 208KiB → 104KiB 3 → 1 allocs Closes #62
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable pooled buffer retention limit via SetPoolBufferLimit(n) to allow workloads with large messages to retain grown buffers in the encoder/decoder pools instead of re-allocating. It includes benchmarks, implementation using atomic operations, and unit tests. Feedback points out that the unit tests in pool_test.go rely on the non-deterministic behavior of sync.Pool, which can lead to flaky test failures during garbage collection or goroutine rescheduling. It is recommended to test the underlying poolBufOversized function directly to ensure deterministic test results.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func TestPoolBufferLimitEncoder(t *testing.T) { | ||
| const big = 100 * 1024 | ||
|
|
||
| // Default limit: an oversized wbuf is dropped on Put. | ||
| enc := GetEncoder() | ||
| enc.wbuf = make([]byte, big) | ||
| PutEncoder(enc) | ||
| enc = GetEncoder() | ||
| if cap(enc.wbuf) > defaultPoolBufferLimit { | ||
| t.Fatalf("wbuf cap=%d retained above default limit", cap(enc.wbuf)) | ||
| } | ||
| PutEncoder(enc) | ||
|
|
||
| // Raised limit: the same buffer is retained. | ||
| SetPoolBufferLimit(256 * 1024) | ||
| defer SetPoolBufferLimit(0) | ||
|
|
||
| enc = GetEncoder() | ||
| enc.wbuf = make([]byte, big) | ||
| PutEncoder(enc) | ||
| enc = GetEncoder() | ||
| if cap(enc.wbuf) < big { | ||
| t.Fatalf("wbuf cap=%d, want >= %d retained under raised limit", cap(enc.wbuf), big) | ||
| } | ||
| PutEncoder(enc) | ||
| } | ||
|
|
||
| func TestPoolBufferLimitDecoder(t *testing.T) { | ||
| const big = 100 * 1024 | ||
|
|
||
| dec := GetDecoder() | ||
| dec.buf = make([]byte, big) | ||
| PutDecoder(dec) | ||
| dec = GetDecoder() | ||
| if cap(dec.buf) > defaultPoolBufferLimit { | ||
| t.Fatalf("buf cap=%d retained above default limit", cap(dec.buf)) | ||
| } | ||
| PutDecoder(dec) | ||
|
|
||
| SetPoolBufferLimit(256 * 1024) | ||
| defer SetPoolBufferLimit(0) | ||
|
|
||
| dec = GetDecoder() | ||
| dec.buf = make([]byte, big) | ||
| PutDecoder(dec) | ||
| dec = GetDecoder() | ||
| if cap(dec.buf) < big { | ||
| t.Fatalf("buf cap=%d, want >= %d retained under raised limit", cap(dec.buf), big) | ||
| } | ||
| PutDecoder(dec) | ||
| } |
There was a problem hiding this comment.
These tests rely on sync.Pool returning the exact same encoder/decoder instance that was just put back. However, sync.Pool behavior is non-deterministic: the pool can be cleared at any time by the garbage collector (GC), or it may return a new instance if the goroutine is rescheduled onto a different P or thread. When this happens, GetEncoder() or GetDecoder() will return a newly allocated instance with a nil buffer, causing the test to fail with a false positive (e.g., wbuf cap=0, want >= 102400 retained under raised limit).
To make the tests 100% deterministic and robust, we should test the underlying poolBufOversized function directly with various capacities and configured limits, rather than relying on sync.Pool side effects.
func TestPoolBufferLimit(t *testing.T) {
// Test default limit (32 KiB)
if poolBufOversized(16 * 1024) {
t.Error("16 KiB buffer should not be oversized under default limit")
}
if poolBufOversized(32 * 1024) {
t.Error("32 KiB buffer should not be oversized under default limit")
}
if !poolBufOversized(33 * 1024) {
t.Error("33 KiB buffer should be oversized under default limit")
}
// Test raised limit (256 KiB)
SetPoolBufferLimit(256 * 1024)
defer SetPoolBufferLimit(0)
if poolBufOversized(100 * 1024) {
t.Error("100 KiB buffer should not be oversized under 256 KiB limit")
}
if poolBufOversized(256 * 1024) {
t.Error("256 KiB buffer should not be oversized under 256 KiB limit")
}
if !poolBufOversized(257 * 1024) {
t.Error("257 KiB buffer should be oversized under 256 KiB limit")
}
}
Summary
Closes #62.
PutEncoder/PutDecoderdrop internal buffers above 32KiB, so workloads with consistently larger messages re-grow the buffer on every pooled use. This addsSetPoolBufferLimit(n)to raise the retention limit. Of the three options in the issue, the configurable limit was chosen over (1) raising the default — which would silently change memory retention for all users — and (3) adaptive sizing — which adds racy bookkeeping for marginal benefit.Design notes:
n <= 0restores it.atomic.Int64; the Put path checks the constant default first (poolBufOversized), so small buffers — the common case — never pay for the atomic load. Verified inlinable.wbufand decoderbuf, per the issue.Benchmarks (Apple M3 Max, benchstat, count=8)
100KB payloads, limit raised to 256 KiB vs default:
No regression at default settings vs v6 (the atomic is not touched for small buffers):
Testing
pool_test.go(first internalpackage msgpacktest file — needed to inspect unexportedwbuf/buf): retention under raised limit, dropping under default, clamping behavior.go test ./...andgo test -race ./...pass.