Size the buffer to the input for one-shot Base-N encode #439

Closed: nishantmehta wants to merge 1 commit into apache:master from nishantmehta:pr/base-n-buffer-size
@nishantmehta

Summary

BaseNCodec.encode(byte[], int, int) — which backs encode(byte[]), encodeToString and the static Base64/Base32 helpers — created a Context whose buffer was lazily allocated by ensureBufferSize at max(size, 8192). For a one-shot encode of a small input this allocated the full 8192-byte default streaming buffer regardless of the actual output size.

The exact output size is already computable via getEncodedLength, so this pre-sizes the context buffer to it before encoding. The streaming path (Base64OutputStream etc.) is unchanged and still grows from the default size. When the encoded length does not fit an int the code falls back to the streaming buffer (such an output cannot be returned as a single array anyway). getEncodedLength(byte[]) is refactored to delegate to a private length-based helper so the one-shot encode can size from the requested range length.
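The sizing rule described above can be sketched roughly as follows. This is an illustration only, not the patch itself: the class and method names (`OneShotBufferSizing`, `lazyBufferSize`, `oneShotBufferSize`) are hypothetical, the length arithmetic assumes plain block rounding with no line separators, and the block sizes (3 to 4 for Base64, 5 to 8 for Base32) are passed in explicitly.

```java
public final class OneShotBufferSizing {

    // Default streaming buffer size mentioned in the summary.
    static final int DEFAULT_BUFFER_SIZE = 8192;

    // Encoded length for a given input length, assuming whole-block rounding
    // and no chunk separators (an assumption made for this sketch).
    static long encodedLength(int inputLength, int unencodedBlockSize, int encodedBlockSize) {
        return ((long) inputLength + unencodedBlockSize - 1) / unencodedBlockSize * encodedBlockSize;
    }

    // Behaviour before the change: the lazily allocated buffer is never
    // smaller than the default streaming size.
    static int lazyBufferSize(int requested) {
        return Math.max(requested, DEFAULT_BUFFER_SIZE);
    }

    // Behaviour after the change: size the one-shot buffer to the exact
    // output, falling back to the streaming default when it does not fit an int.
    static int oneShotBufferSize(int inputLength, int unencodedBlockSize, int encodedBlockSize) {
        long len = encodedLength(inputLength, unencodedBlockSize, encodedBlockSize);
        return len <= Integer.MAX_VALUE ? (int) len : DEFAULT_BUFFER_SIZE;
    }

    public static void main(String[] args) {
        int input = 25; // same input size as the measurement in this PR
        System.out.println("Base64 exact size: " + oneShotBufferSize(input, 3, 4));
        System.out.println("Base32 exact size: " + oneShotBufferSize(input, 5, 8));
        System.out.println("Lazy size for the same request: " + lazyBufferSize(36));
    }
}
```

Under these assumptions a 25-byte input needs a 36-byte buffer for Base64 and a 40-byte buffer for Base32, while the lazy path would allocate 8192 bytes in both cases.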

Measurement

ThreadMXBean allocation driver, 200k warmed ops, 25-byte input:

| Call | Before | After |
| --- | --- | --- |
| Base64.encodeToString | 8479 B/op | 345 B/op (−96%) |
| Base32.encodeToString | 8592 B/op | 458 B/op (−95%) |
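For readers who want to reproduce this kind of number, a per-thread allocation driver might look roughly like the sketch below. It is not the driver used for the figures above: the class name `AllocationDriver` and its `bytesPerOp` helper are hypothetical, and `java.util.Base64` stands in for the Commons Codec call so the example stays self-contained. It relies on `com.sun.management.ThreadMXBean.getThreadAllocatedBytes`, which is available on HotSpot-based JVMs.

```java
import java.lang.management.ManagementFactory;
import java.util.Base64;

public final class AllocationDriver {

    // Results are published here so the JIT cannot discard the encoded output.
    static volatile Object sink;

    // Returns the average number of bytes allocated by the current thread per call to op.
    static long bytesPerOp(Runnable op, int warmupOps, int measuredOps) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();

        // Warm up so that compilation and one-time initialisation are excluded.
        for (int i = 0; i < warmupOps; i++) {
            op.run();
        }

        long before = bean.getThreadAllocatedBytes(threadId);
        for (int i = 0; i < measuredOps; i++) {
            op.run();
        }
        long after = bean.getThreadAllocatedBytes(threadId);
        return (after - before) / measuredOps;
    }

    public static void main(String[] args) {
        byte[] input = new byte[25]; // same input size as in the table above
        // Stand-in operation; swap in the codec call under test when it is on the classpath.
        long perOp = bytesPerOp(
                () -> sink = Base64.getEncoder().encodeToString(input), 200_000, 200_000);
        System.out.println("allocated B/op: " + perOp);
    }
}
```

The absolute numbers depend on the JVM, its object layout, and the operation measured, so they are only meaningful as a before/after comparison on the same setup.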

Testing

Base64Test, Base32Test, Base16Test, BaseNCodecTest and the Base64 input/output stream tests pass unchanged (360 tests). The streaming buffer growth path is exercised by the existing stream tests.

BaseNCodec.encode(byte[], int, int) (which backs encode(byte[]),
encodeToString and the static Base64/Base32 helpers) created a Context whose
buffer was lazily allocated by ensureBufferSize at max(size, 8192). For a
one-shot encode of a small input this allocated the full 8192-byte default
streaming buffer regardless of the actual output size.

The exact output size is already computable via getEncodedLength, so pre-size
the context buffer to it before encoding. The streaming path (Base64OutputStream
etc.) is unchanged and still grows from the default size. When the encoded
length does not fit an int the code falls back to the streaming buffer; such an
output cannot be returned as a single array anyway. getEncodedLength(byte[]) is
refactored to delegate to a private length-based helper so the one-shot encode
can size from the requested range length.

Measured with a ThreadMXBean allocation driver (200k warmed ops, 25-byte input):

  Base64.encodeToString  8479 B/op -> 345 B/op  (-96%)
  Base32.encodeToString  8592 B/op -> 458 B/op  (-95%)

Base64Test, Base32Test, Base16Test, BaseNCodecTest and the Base64
input/output stream tests pass unchanged (360 tests).

Signed-off-by: Nishant Mehta <nishantmehta.n@gmail.com>
@garydgregory

Member

Closing: change for the sake of change and not worth the extra complications and bloat.
