perf: bulk numeric array literal parser bypassing fastparse #904

Closed
He-Pin wants to merge 3 commits into databricks:master from He-Pin:perf/bulk-numeric-array-parser

@He-Pin He-Pin commented Jun 6, 2026

Contributor

Motivation

Parsing large numeric array literals like [76,111,114,...] goes through fastparse combinators per element — each number traverses the full expression grammar (expr -> exprSuffix -> atomExpr -> number) before resolving to a simple literal. On Scala Native without JIT, this combinator chain overhead is significant (~1.8ms parse time for a ~2000-element array).

Modification

Commit 1: Bulk numeric array parser

  • Add tryBulkNumericArray that detects arrays of pure numeric literals and parses them with a hand-written scanner, bypassing the fastparse expression chain entirely
  • Supports integers, floats, negative numbers, scientific notation, and trailing commas
  • Falls back to regular parsing for arrays containing non-numeric elements, comprehensions, or comments
  • Guards against identifier-like suffixes (123abc) to avoid misparse
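The detect-or-fall-back behaviour listed above can be illustrated with a minimal sketch. It is not the PR's implementation: `BulkNumericSketch`, `scanNumericArray`, and its signature are hypothetical names chosen for illustration, and the sketch treats a leading `-` as part of the literal and rejects leading zeros, both of which are assumptions. Returning `None` stands for "hand the array back to the regular parser".

```scala
object BulkNumericSketch {
  private def isDigit(c: Char): Boolean = c >= '0' && c <= '9'
  private def isIdentChar(c: Char): Boolean = c == '_' || Character.isLetterOrDigit(c)

  private def skipWs(s: String, from: Int): Int = {
    var i = from
    while (i < s.length && " \t\r\n".indexOf(s.charAt(i).toInt) >= 0) i += 1
    i
  }

  // Index just past the run of digits starting at `from` (may equal `from`).
  private def digits(s: String, from: Int): Int = {
    var i = from
    while (i < s.length && isDigit(s.charAt(i))) i += 1
    i
  }

  /** Some((values, indexAfterClosingBracket)) for an array of pure numeric
    * literals starting at `start`; None means "use the regular parser". */
  def scanNumericArray(s: String, start: Int): Option[(Array[Double], Int)] = {
    val n = s.length
    if (start >= n || s.charAt(start) != '[') return None
    val out = Array.newBuilder[Double]
    var i = skipWs(s, start + 1)
    while (i < n) {
      val begin = i
      if (s.charAt(i) == '-') i += 1
      val intEnd = digits(s, i)
      if (intEnd == i) return None                           // not a number (also covers "[]" and comments)
      if (intEnd - i > 1 && s.charAt(i) == '0') return None  // leading zero: let the real parser decide
      i = intEnd
      if (i < n && s.charAt(i) == '.') {                     // optional fraction
        val fracEnd = digits(s, i + 1)
        if (fracEnd == i + 1) return None
        i = fracEnd
      }
      if (i < n && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) { // optional exponent
        i += 1
        if (i < n && (s.charAt(i) == '+' || s.charAt(i) == '-')) i += 1
        val expEnd = digits(s, i)
        if (expEnd == i) return None
        i = expEnd
      }
      if (i < n && isIdentChar(s.charAt(i))) return None     // guard against suffixes such as 123abc
      out += java.lang.Double.parseDouble(s.substring(begin, i))
      i = skipWs(s, i)
      if (i >= n) return None
      val c = s.charAt(i)
      if (c == ']') return Some((out.result(), i + 1))
      if (c != ',') return None                              // comprehension, comment, operator, ...
      i = skipWs(s, i + 1)
      if (i < n && s.charAt(i) == ']') return Some((out.result(), i + 1)) // trailing comma
    }
    None
  }
}
```

In this sketch any character the scanner does not recognise aborts the fast path, so correctness rests on the regular parser and the scanner only needs to be right for the inputs it accepts.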

Commit 2: SWAR integer parsing + Val.cachedNum

  • For simple integers (no decimal/exponent): parse directly using a 4-digits-at-a-time technique inspired by jsoniter-scala and PR #897 (perf: optimize parseInt with 4-digits-at-a-time parsing), avoiding String.substring allocation + Double.parseDouble overhead
  • Use Val.cachedNum for values 0-255, reusing pre-allocated instances instead of creating new Val.Num objects
  • Float/exponent numbers still fall back to Double.parseDouble(substring)
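A rough sketch of the two ideas in this commit, under stated assumptions: `Num` and `cachedNum` below are hypothetical stand-ins for `Val.Num` and `Val.cachedNum`, and the loop is a plain unrolled four-characters-per-iteration accumulator rather than the bit-level SWAR trick used in jsoniter-scala. The caller is assumed to have already verified that the range contains only digits and is short enough (at most 18 digits) not to overflow a `Long`.

```scala
object FourDigitSketch {
  // Hypothetical stand-in for Val.Num.
  final class Num(val value: Double)

  // Pre-allocated instances for 0..255, e.g. byte values in a numeric array.
  private val cache: Array[Num] = Array.tabulate(256)(i => new Num(i.toDouble))

  def cachedNum(v: Long): Num =
    if (v >= 0 && v <= 255) cache(v.toInt) else new Num(v.toDouble)

  /** Parses the digits in s[from, until) without allocating a substring.
    * Precondition (assumed, not checked): digits only, at most 18 of them. */
  def parseDigits(s: String, from: Int, until: Int): Long = {
    var i = from
    var acc = 0L
    // Main loop: fold four digits into the accumulator per iteration.
    while (i + 4 <= until) {
      val chunk =
        (s.charAt(i) - '0') * 1000 +
        (s.charAt(i + 1) - '0') * 100 +
        (s.charAt(i + 2) - '0') * 10 +
        (s.charAt(i + 3) - '0')
      acc = acc * 10000 + chunk
      i += 4
    }
    // Tail: the remaining 0..3 digits, one at a time.
    while (i < until) {
      acc = acc * 10 + (s.charAt(i) - '0')
      i += 1
    }
    acc
  }
}
```

For a literal such as `76` inside `[76,111,114]`, `parseDigits(src, 1, 3)` yields `76` and `cachedNum(76)` returns the shared instance, so neither a substring nor a new number object is allocated in this sketch.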

Result

JVM JMH (JDK 21, G1GC, -Xmx4G, @Fork(1) @Warmup(1) @Measurement(1), 3 runs averaged):

| Benchmark | Master | PR | Delta |
|---|---|---|---|
| member (~500 int elements) | 0.561 ms | 0.074 ms | -86.8% (7.6x) |
| base64_byte_array (~500 int elements) | 0.643 ms | 0.136 ms | -78.8% (4.7x) |
| setDiff (~1700 int elements) | 0.436 ms | 0.229 ms | -47.6% (1.9x) |
| setUnion (~1700 int elements) | 0.492 ms | 0.267 ms | -45.7% (1.8x) |
| comparison | 0.045 ms | 0.031 ms | -31.1% |
| escapeStringJson | 0.037 ms | 0.030 ms | -18.9% |
| setInter (~1700 int elements) | 0.312 ms | 0.310 ms | -0.6% |
| parseInt | 0.031 ms | 0.032 ms | +3.2% |

Noise verification (3 runs, Run1 was outlier for all):

| Benchmark | M:R1 | PR:R1 | M:R2 | PR:R2 | M:R3 | PR:R3 | 3-run avg | Verdict |
|---|---|---|---|---|---|---|---|---|
| stripChars | 0.060 | 0.125 | 0.059 | 0.061 | 0.060 | 0.069 | +6.6% | Noise (Run1 outlier) |
| array_copy_views | 7.97 | 10.40 | 7.63 | 7.60 | 8.05 | 7.91 | +0.5% | Noise |
| lazy_array_comprehension | 19.6 | 29.4 | 19.0 | 20.9 | 20.1 | 20.2 | +5.1% | Marginal |
| lazy_array_reverse_sparse | 3.22 | 4.42 | 3.14 | 3.40 | 3.26 | 3.37 | +5.1% | Marginal |
| realistic2 | 43.3 | 51.6 | 41.9 | 46.1 | 43.8 | 52.5 | +12.7% | Marginal |
| bench.02 | 28.5 | 32.4 | 27.6 | 30.7 | 26.8 | 26.9 | +5.6% | Marginal |

Run1 was a cold-start outlier for all noise benchmarks (PR values 30-108% higher). Run2 and Run3 converge to +0-7% range. No confirmed regression beyond JMH single-iteration variance.

Scala Native A/B (Scala Native 0.5.12, macOS arm64, hyperfine --warmup 3):

| Benchmark | Baseline | PR | Delta |
|---|---|---|---|
| setInter | 14.0 ms | 12.6 ms | 1.11x faster |
| setUnion | 12.8 ms | 11.9 ms | 1.07x faster |
| setDiff | 12.9 ms | 11.7 ms | 1.11x faster |

References

@He-Pin He-Pin marked this pull request as draft June 6, 2026 21:53
@He-Pin He-Pin marked this pull request as ready for review June 6, 2026 23:24
@He-Pin He-Pin marked this pull request as draft June 9, 2026 02:56
He-Pin added 2 commits June 9, 2026 10:58
Motivation:
Parsing large numeric array literals like `[76,111,114,...]` goes through
fastparse combinators per element — each number traverses the full
expression grammar (expr → exprSuffix → atomExpr → number) before
resolving. On Scala Native without JIT, this combinator overhead is
significant.

Modification:
Add `tryBulkNumericArray` that detects arrays of pure numeric literals
and parses them with a hand-written scanner. Skips the fastparse
expression chain entirely for each element. Falls back to regular
parsing for arrays containing non-numeric elements, comprehensions,
or comments.

Result:
Native A/B (member benchmark with ~2000-element numeric array):
  baseline 8.6ms → optimized 6.2ms (-28%).
  Ratio vs jrsonnet: 2.12x → 1.53x.

Motivation:
The bulk numeric array parser (previous commit) used
Double.parseDouble(data.substring()) for each element, creating a
substring allocation and invoking the full double parser even for
simple integers.

Modification:
- Parse simple integers directly using 4-digits-at-a-time technique
  (inspired by jsoniter-scala/PR databricks#897), avoiding substring allocation
  and Double.parseDouble overhead entirely
- Use Val.cachedNum for values 0-255, reusing pre-allocated instances
  instead of creating new Val.Num objects
- Float/exponent numbers still fall back to Double.parseDouble

Result:
Native A/B (member): 7.1ms → 5.9ms (-17.4%).
Combined with bulk parser: member gap vs jrsonnet 1.97x → 1.42x.
Also improves base64_byte_array: 11.3ms → 10.4ms (-7.9%).
@He-Pin He-Pin force-pushed the perf/bulk-numeric-array-parser branch from b9bb525 to b2df550 Compare June 9, 2026 02:59
@He-Pin He-Pin marked this pull request as ready for review June 9, 2026 04:13
@He-Pin He-Pin marked this pull request as draft June 9, 2026 04:18
…cision loss

Motivation:
The bulk numeric array parser used Double for integer accumulation,
which loses precision for integers beyond 2^53 (e.g., 12345678901234567890
would differ from Double.parseDouble by up to 2048).

Modification:
- Changed accumulator from Double to Long with overflow detection.
- When acc > Long.MaxValue/10000 (or /10), set isSimpleInt=false and
  fall back to Double.parseDouble for that number.
- Convert Long to Double before negation to preserve -0.0 sign bit.
- Added regression test for large integer precision.

Result:
Bulk parser now produces identical results to Double.parseDouble for
all integer magnitudes, including beyond 2^53.
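The Long accumulator with overflow fallback described in this commit message might look roughly like the sketch below. `LongAccumulatorSketch` and `parseIntLiteral` are hypothetical names, the input range is assumed to hold only an optional `-` followed by digits, and the bound `acc >= Long.MaxValue / 10` is a deliberately conservative choice for this sketch rather than the exact check quoted above.

```scala
object LongAccumulatorSketch {
  /** Parses an integer literal in s[from, until) to a Double.
    * Precondition (assumed, not checked): optional '-' followed by digits only. */
  def parseIntLiteral(s: String, from: Int, until: Int): Double = {
    var i = from
    val negative = i < until && s.charAt(i) == '-'
    if (negative) i += 1

    var acc = 0L
    var isSimpleInt = true
    val limit = Long.MaxValue / 10
    while (isSimpleInt && i < until) {
      if (acc >= limit) {
        // acc * 10 + digit could exceed Long.MaxValue: give up on the fast path.
        isSimpleInt = false
      } else {
        acc = acc * 10 + (s.charAt(i) - '0')
        i += 1
      }
    }

    if (!isSimpleInt) {
      // Slow path: let the JDK parse the whole literal, sign included.
      java.lang.Double.parseDouble(s.substring(from, until))
    } else {
      // Convert to Double first, then negate, so "-0" becomes -0.0
      // (negating the Long 0 first would lose the sign).
      val d = acc.toDouble
      if (negative) -d else d
    }
  }
}
```

Because the digits are accumulated exactly in a `Long` and rounded to `Double` only once, the fast path in this sketch rounds the same way as `Double.parseDouble` for values beyond 2^53, and literals too long for a `Long` take the slow path.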
@He-Pin He-Pin closed this Jun 9, 2026
He-Pin commented Jun 9, 2026

Contributor Author

Even though the numbers look good, this kind of optimization is not generic enough.
