docs: v13.0 canonical benchmark refresh + documentation staleness sweep #127

Merged
mmurshed merged 1 commit into main from docs/benchmark-v13-refresh
Jun 12, 2026

@mmurshed

Owner

Summary

Two parts:

1. Canonical benchmark refresh (v13.0)

Full bench_vs_gmp run (2026-06-12 evening, M1 Max, GMP 6.3) plus warm steady-state re-measurement for the chain-/plan-heavy rows (50M+ balanced mul, 100k+ ToString — first call discarded, best-of-3, per the established methodology). Figures regenerated.

  • BENCHMARK.md restructured 661 → ~300 lines: fresh tables for all five subsystems; cold/warm ToString split made explicit columns instead of inline annotations; the degenerate balanced-division table dropped (quotient 0–2 limbs, both libraries short-circuit); the eight layered per-PR session narratives collapsed into one condensed optimization-history table. The two historical figures keep their original data with one-paragraph context.
  • README: the two Performance sections had drifted (the bottom table still predated #119, the parallel parse leaf fan-out that made parse beat GMP from 500k digits, and showed parse 20M at "1.01× parity" vs the real 0.38×); now one summary + one fresh table. Removed references to the plan files deleted in #120 (Codebase audit: correctness bugs, dead code, doc/threshold consistency).

Headline state at v13.0: balanced mul beats GMP from 500k digits (0.31–0.89×; warm 0.89–0.97× at 50–200M), every skewed mul shape ≥500k×50k (0.40–0.73×), division from 20M-digit dividends (0.45–0.82×, parity at 5M), parse 0.38–0.69× and warm ToString 0.49–0.62× from 500k digits.
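
The warm steady-state methodology above (first call discarded, best-of-3) can be sketched as a small timing helper. This is a minimal illustration, not the actual bench_vs_gmp code: `WarmBestOf` and `RatioVsGmp` are hypothetical names, and the ratio direction (library time divided by GMP time, so values below 1.0 mean faster than GMP) is an assumption inferred from how the figures are quoted.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper (not part of bench_vs_gmp): run `fn` once as a
// discarded warm-up, then time `runs` further calls and return the
// fastest one in seconds.
template <typename Fn>
double WarmBestOf(Fn&& fn, int runs = 3) {
    using Clock = std::chrono::steady_clock;
    fn();  // warm-up call: fills caches and plans; its time is not recorded
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < runs; ++i) {
        const auto start = Clock::now();
        fn();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}

// Assumed ratio convention: library time / GMP time.
// A value below 1.0 means the library was faster than GMP.
inline double RatioVsGmp(double librarySeconds, double gmpSeconds) {
    return librarySeconds / gmpSeconds;
}
```

Taking the minimum rather than the mean keeps one-off scheduler noise out of the reported figure, and discarding the first call separates one-time setup cost (the "cold" column) from steady-state cost (the "warm" column).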

2. Documentation staleness sweep

Four parallel reviewed-and-verified passes over docs/ (every fix checked against current code):

  • BASE.md — premise flipped: the doc described the -DBIGMATH_LIMB_64=0 fallback (32-in-64, base 2³²) as production and still listed the landed 64-bit refactor under Future opportunities. Now: 64-bit limbs / Base2_64 sentinel default, historical rationale preserved and labeled.
  • MULTIPLICATION.md — Toom-3 retired from dispatch flowchart/narrative (NTT entry 1280, CRT gate 256), KARATSUBA_THRESHOLD 32, CRT squaring at 640 limbs with the self-operand skip, NEON marked landed (three places still called it "the next lever"), MFA gate 2^20, two overturned rejections annotated.
  • DIVISION.md — dispatch flowchart/table rebuilt: full 7-band Newton frontier and QuotientSizedDivision (both previously missing), BZ odd-size padding fix correctly described (the old text claimed entry normalization fixed odd sizes; it never did below the top level, which was exactly the bug fixed in #116, "perf: BZ odd-size padding fix + Newton floor re-sweep (smallskew_div_plan S1)"), BZ basecase 128, current CRT prime triple (old text had the pre-2026-05-27 triple), cyclic gate macro.
  • STRING_CONVERSION.md — 10¹⁹ formatter chunks (text said 10¹⁸ and contradicted itself), code locations updated to src/common/Parser.cpp, parallel-D&C rejection marked OVERTURNED (reentrant pool + the #118 parallel ToString subtree fan-out and #119 parallel parse leaf fan-out), broken link removed.
  • THREAD_SAFETY.md — thread_local cache inventory completed (the doc listed "four caches" and missed the Newton scratch, NttCrt buffers, divider-chain cache, and BigDecimal cache), pool usage updated for fused-MFA row-chunked stages + decimal fan-outs.
  • All stale 2026-05-27 vs-GMP tables labeled historical snapshots pointing at BENCHMARK.md; repo paths updated from biginteger/... to include/biginteger/... + src/ throughout.
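
To illustrate why 10¹⁹ is the natural formatter chunk for 64-bit limbs: it is the largest power of ten that fits in an unsigned 64-bit word, so each chunk carries 19 decimal digits. The sketch below is not the library's formatter; `JoinDecimalChunks`, `kChunkBase`, and `kChunkDigits` are hypothetical names, and it assumes the chunks have already been computed, most significant first, each below 10¹⁹.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// 10^19: the largest power of ten representable in 64 unsigned bits.
constexpr std::uint64_t kChunkBase = 10000000000000000000ULL;
constexpr std::size_t kChunkDigits = 19;

// max(uint64) / 10^19 == 1, so 10^19 fits but 10^20 would overflow.
static_assert(std::numeric_limits<std::uint64_t>::max() / kChunkBase == 1,
              "10^19 must be the largest power of ten that fits");

// Hypothetical helper: join base-10^19 chunks (most significant first,
// each strictly below kChunkBase) into one decimal string.
std::string JoinDecimalChunks(const std::vector<std::uint64_t>& chunks) {
    if (chunks.empty()) return "0";
    // The leading chunk is printed without padding.
    std::string out = std::to_string(chunks.front());
    for (std::size_t i = 1; i < chunks.size(); ++i) {
        const std::string digits = std::to_string(chunks[i]);
        // Every later chunk must occupy exactly 19 digits: left-pad with zeros.
        out.append(kChunkDigits - digits.size(), '0');
        out += digits;
    }
    return out;
}
```

The zero-padding step is the part that is easy to get wrong: an interior chunk such as 345 stands for nineteen digits, sixteen of which are leading zeros.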

Test plan

  • Docs-only change; library untouched (ctest still 4/4 from the same tree)
  • Verified: zero references to deleted plan files, no remaining "base-2³² production" claims, all figure links resolve, plot script regenerates both PNGs from the new data

🤖 Generated with Claude Code

Benchmark refresh (canonical bench_vs_gmp run, 2026-06-12 evening at v13.0,
M1 Max, GMP 6.3; warm steady-state re-measured for 50M+ mul and 100k+
ToString rows — first call discarded, best-of-3):

- BENCHMARK.md restructured 661 -> ~300 lines: fresh tables for all five
  subsystems, explicit cold/warm ToString columns, degenerate balanced-div
  table dropped, and the eight layered per-PR session narratives collapsed
  into one condensed optimization-history table (deep dives stay in the
  subsystem docs; the historical figures keep their original data).
- README: now single-sources the performance story (the two Performance
  sections had drifted; the bottom table still showed pre-#119 parse at
  1.01x), fresh ratios throughout, removed references to the deleted
  plan files.
- Figures regenerated from the new run (ratio-by-size + before/after).

Headline state: balanced mul beats GMP from 500k digits (0.31-0.89x, warm
0.89-0.97x at 50-200M), all skewed mul shapes >=500k x 50k (0.40-0.73x),
division from 20M-digit dividends (0.45-0.82x, parity at 5M), parse
0.38-0.69x and warm ToString 0.49-0.62x from 500k digits.

Documentation staleness sweep (every fix verified against code):

- BASE.md premise flipped: default is true 64-bit limbs / Base2_64 sentinel
  (the doc described the -DBIGMATH_LIMB_64=0 fallback as production and
  still listed the landed 64-bit refactor under Future opportunities).
- MULTIPLICATION.md: Toom-3 window retired from dispatch narrative and
  flowchart (NTT entry 1280), CRT gate 256, KARATSUBA_THRESHOLD 32, CRT
  squaring at 640 limbs, NEON marked landed (was still listed as the next
  lever), MFA gate 2^20, overturned rejections annotated.
- DIVISION.md: full 7-band Newton frontier + QuotientSizedDivision added to
  the flowchart and threshold table (both missing), BZ odd-size padding fix
  correctly described (entry normalization never fixed sub-level odd
  sizes), BZ basecase 128, current CRT prime triple, cyclic gate macro.
- STRING_CONVERSION.md: 10^19/Base10_19 formatter chunks, code locations
  moved to src/common/Parser.cpp, parallel-D&C rejection marked OVERTURNED
  (tl_chunkDepth reentrancy + #118/#119 fan-outs), broken memory link
  removed.
- THREAD_SAFETY.md: thread_local cache inventory completed, pool usage
  updated for fused-MFA row-chunked stages and the decimal I/O fan-outs.
- All stale 2026-05-27 vs-GMP tables labeled as historical snapshots with
  pointers to BENCHMARK.md; repo paths updated to include/ + src/ split.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mmurshed mmurshed merged commit 8662d98 into main Jun 12, 2026
4 of 5 checks passed
@mmurshed mmurshed deleted the docs/benchmark-v13-refresh branch June 12, 2026 22:23