Skip to content

fix: enforce alloc limit in bytes decode; perf: exact-capacity readNGrow growth (#63)#75

Open
xe-nvdk wants to merge 3 commits into
v6from
fix/bytes-alloc-limit
Open

fix: enforce alloc limit in bytes decode; perf: exact-capacity readNGrow growth (#63)#75
xe-nvdk wants to merge 3 commits into
v6from
fix/bytes-alloc-limit

Conversation

@xe-nvdk

@xe-nvdk xe-nvdk commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Closes #63. Two related changes (separate commits):

1. Security fix: bytes decode bypassed the alloc limit

The fork's readN/readNGrow split (upstream's readN was the chunked variant) left bytes()/bytesPtr() calling the unbounded package-level readN directly. A malicious bin32/str32 header declaring ~4GB forced a ~4GB upfront allocation — on stream decode and on Unmarshal — even with alloc limits enabled (the default). The string path was unaffected (d.readN already branches on the flag).

Bytes decoding now routes through a new readNInto method:

  • Byte-slice (Unmarshal) path: declared length is validated against the remaining input before allocating → still a single exact-size allocation (1 alloc for 4MB, same as before), bomb closed for free.
  • Stream path: chunked readNGrow by default; DisableAllocLimit(true) restores unbounded single-allocation reads for trusted input.

2. Perf (#63): exact-capacity chunked growth in readNGrow

Replaces append-based chunk growth with explicit min(n, max(2*pos, pos+bytesAllocLimit)) sizing — provably the same allocation-ahead-of-data bound as before (≤ max(received, 1MiB)), but no capacity overshoot beyond n and fewer alloc+copy rounds. readN similarly allocates exactly n, dropping a useless prefix copy.

Benchmarks (Apple M3 Max, benchstat, count=6)

benchmark v6 this PR delta
DecodeStringStream4MB (chunked on both) 549.7µs, 15.4MiB, 7 allocs 381.9µs, 11.0MiB, 6 allocs -30.5% ns, -28.5% B
UnmarshalBytes4MB (byte-slice path) 1 alloc, 4MB 1 alloc, 4MB unchanged
DecodeBytesStream4MB 139.6µs (unbounded, vulnerable) 317.2µs (chunked, safe) cost of the fix; DisableAllocLimit restores old perf
DecodeBytesStream512KB 31.2µs 31.1µs ~

Testing

  • Huge-declared-length tests on all three bytes paths (Decode(&[]byte), DecodeBytes, struct field) for stream, plus the Unmarshal path — fail fast with ≤1MiB allocated.
  • 2.5MB round-trips (stream + Unmarshal + DisableAllocLimit); caller-ownership (no input aliasing) asserted.
  • go test ./..., go test -race ./... pass; existing decoderTests EOF semantics preserved.
  • 4-agent internal review (correctness/security/quality/perf) completed; all findings addressed (bsr single-alloc fast path, builtin min/max cleanup, CHANGELOG).

Note for follow-up

The security review also found a pre-existing allocation bomb in DecodeUntypedMap (decode_map.go:224 — make(map[interface{}]interface{}, n) without the maxMapSize clamp its siblings have). Separate PR incoming.

🤖 Generated with Claude Code

Ignacio Van Droogenbroeck added 3 commits June 12, 2026 15:05
The fork's readN/readNGrow split (upstream's readN was the chunked,
limit-respecting variant) left bytes(), bytesPtr() calling the
package-level readN directly — the unbounded variant that allocates the
full declared length upfront. On a stream decode, a malicious bin32/
str32 header declaring ~4GB forced a ~4GB allocation before any payload
was read, even with alloc limits enabled (the default). The string path
(d.readN method) was unaffected — it already branches on the flag.

Route bytes decoding through a new readNInto method that mirrors
d.readN's branch: chunked readNGrow by default, unbounded readN only
when DisableAllocLimit(true) is set.

Trusted workloads with large binary payloads can keep the old
single-allocation behavior via DisableAllocLimit; the follow-up commit
reduces the cost of the chunked path.
Replace the append-based chunk growth in readNGrow with explicit
exact-capacity sizing: grow to min(n, max(2*pos, pos+bytesAllocLimit))
per round. This preserves the prior security bound — allocation never
runs more than max(received, bytesAllocLimit) ahead of data actually
supplied — while eliminating append's capacity overshoot beyond n and
reducing alloc+copy rounds. readN (DisableAllocLimit path) similarly
allocates exactly n instead of append-growing, skipping a useless copy
of old contents that ReadFull overwrites anyway.

4MB stream decode vs v6 (count=6, Apple M3 Max):

  DecodeStringStream4MB   549.7µs → 381.9µs  (-30.5%)  15.4MiB → 11.0MiB (-28.5%)  7 → 6 allocs

The bytes path is slower than v6 (+127%) only because v6 skipped the
alloc limit entirely there (see previous commit); against the corrected
chunked baseline this change is -20% time, -38% B/op. DisableAllocLimit
restores single-allocation reads for trusted input.

Closes #63
- readNInto: validate the declared length against remaining input on the
  byte-slice (Unmarshal) path before allocating — restores single
  exact-size allocation there (1 alloc for a 4MB payload, matching the
  pre-fix behavior) while also closing the same declared-length bomb on
  the byte-slice path. Error semantics match io.ReadFull (EOF vs
  ErrUnexpectedEOF), keeping decoderTests expectations intact.
- Remove the pre-Go-1.21 local min helper; use builtin min/max.
- Add Unmarshal-path tests: huge declared length fails fast; 2.5MB
  round-trip; result is caller-owned (no aliasing of the input).
- Add CHANGELOG entries (Bug Fixes + Performance) for #63.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces changes to enforce allocation limits during msgpack bytes and string decoding, preventing malicious headers with huge declared lengths from triggering upfront out-of-memory allocations. It routes decoding through a new readNInto helper, optimizes slice growth in readNGrow to avoid capacity overshoot, and adds corresponding benchmarks and tests. The review feedback highlights critical integer overflow vulnerabilities on 32-bit architectures where large uint32 lengths can cast to negative int values, potentially bypassing length checks, causing panics during allocation, or being incorrectly interpreted as Nil values.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread decode.go
// decoder's alloc limit: unless DisableAllocLimit is set, allocation for
// a declared length is chunked by bytesAllocLimit so a malicious header
// can't force a huge upfront allocation.
func (d *Decoder) readNInto(b []byte, n int) ([]byte, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On 32-bit architectures, a large uint32 length (e.g., >= 2^31) decoded from bin32 or str32 headers can overflow when cast to int, resulting in a negative value for n. If n is negative, it will bypass the remaining check and cause a panic in make([]byte, n) or slice bounds out of range. We should add a check for n < 0 at the beginning of readNInto to prevent this.

Suggested change
func (d *Decoder) readNInto(b []byte, n int) ([]byte, error) {
func (d *Decoder) readNInto(b []byte, n int) ([]byte, error) {
if n < 0 {
return nil, fmt.Errorf("msgpack: invalid length %d", n)
}

Comment thread decode.go
return readNGrow(d.r, b, n)
}

func readN(r io.Reader, b []byte, n int) ([]byte, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To prevent panics or silent data corruption on 32-bit architectures when a large uint32 length overflows to a negative int, we should validate that n >= 0 at the beginning of readN.

func readN(r io.Reader, b []byte, n int) ([]byte, error) {
	if n < 0 {
		return nil, fmt.Errorf("msgpack: invalid length %d", n)
	}

Comment thread decode.go
}
}
for pos < n {
newCap := min(max(2*pos, pos+bytesAllocLimit), n)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On 32-bit architectures, if pos is large (e.g., >= 2^30), 2*pos or pos+bytesAllocLimit can overflow to a negative value. This would cause max to return a negative value or a value smaller than expected, leading to a negative newCap and a panic in make([]byte, newCap). We should calculate newCap in an overflow-safe manner.

		newCap := pos + bytesAllocLimit
		if newCap < pos || newCap < 0 {
			newCap = n
		}
		if double := 2 * pos; double > newCap && double > 0 {
			newCap = double
		}
		if newCap > n || newCap < 0 {
			newCap = n
		}

Comment thread decode_string.go
Comment on lines 92 to +95
if n == -1 {
return nil, nil
}
return readN(d.r, b, n)
return d.readNInto(b, n)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On 32-bit architectures, a large uint32 length (e.g., 0xffffffff) decoded from bin32 or str32 headers will overflow to -1 when cast to int. Since n == -1 is checked here, it will be incorrectly treated as msgpcode.Nil, returning nil, nil and silently succeeding instead of failing with an error. We should ensure that n == -1 only returns nil, nil if the code is actually msgpcode.Nil, and otherwise fail on negative lengths.

Suggested change
if n == -1 {
return nil, nil
}
return readN(d.r, b, n)
return d.readNInto(b, n)
if n == -1 && c == msgpcode.Nil {
return nil, nil
}
if n < 0 {
return nil, fmt.Errorf("msgpack: invalid length %d", n)
}
return d.readNInto(b, n)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: readN/readNGrow buffer reuse for large reads

1 participant