docs: v13.0 canonical benchmark refresh + documentation staleness sweep #127
Merged
…s sweep

Benchmark refresh (canonical bench_vs_gmp run, 2026-06-12 evening at v13.0, M1 Max, GMP 6.3; warm steady-state re-measured for 50M+ mul and 100k+ ToString rows, first call discarded, best-of-3):

- BENCHMARK.md restructured 661 -> ~300 lines: fresh tables for all five subsystems, explicit cold/warm ToString columns, degenerate balanced-div table dropped, and the eight layered per-PR session narratives collapsed into one condensed optimization-history table (deep dives stay in the subsystem docs; the historical figures keep their original data).
- README: single-sources the performance story (the two Performance sections had drifted apart; the bottom table still showed pre-#119 parse at 1.01x), fresh ratios throughout, references to the deleted plan files removed.
- Figures regenerated from the new run (ratio-by-size + before/after).

Headline state: balanced mul beats GMP from 500k digits (0.31-0.89x, warm 0.89-0.97x at 50-200M); so do all skewed mul shapes >=500k x 50k (0.40-0.73x), division from 20M-digit dividends (0.45-0.82x, parity at 5M), and parse (0.38-0.69x) and warm ToString (0.49-0.62x) from 500k digits.

Documentation staleness sweep (every fix verified against code):

- BASE.md premise flipped: the default is true 64-bit limbs / Base2_64 sentinel (the doc described the -DBIGMATH_LIMB_64=0 fallback as production and still listed the landed 64-bit refactor under Future opportunities).
- MULTIPLICATION.md: Toom-3 window retired from the dispatch narrative and flowchart (NTT entry 1280), CRT gate 256, KARATSUBA_THRESHOLD 32, CRT squaring at 640 limbs, NEON marked landed (was still listed as the next lever), MFA gate 2^20, overturned rejections annotated.
- DIVISION.md: full 7-band Newton frontier + QuotientSizedDivision added to the flowchart and threshold table (both missing), BZ odd-size padding fix correctly described (entry normalization never fixed sub-level odd sizes), BZ basecase 128, current CRT prime triple, cyclic gate macro.
- STRING_CONVERSION.md: 10^19/Base10_19 formatter chunks, code locations moved to src/common/Parser.cpp, parallel-D&C rejection marked OVERTURNED (tl_chunkDepth reentrancy + #118/#119 fan-outs), broken memory link removed.
- THREAD_SAFETY.md: thread_local cache inventory completed, pool usage updated for fused-MFA row-chunked stages and the decimal I/O fan-outs.
- All stale 2026-05-27 vs-GMP tables labeled as historical snapshots with pointers to BENCHMARK.md; repo paths updated to the include/ + src/ split.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
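The 10^19 chunk size in the STRING_CONVERSION.md item follows from the 64-bit limb default in BASE.md: 10^19 is the largest power of ten below 2^64 (about 1.8 × 10^19), so each division step peels off 19 decimal digits that fit in one `uint64_t`. The sketch below is only a quadratic basecase under stated assumptions: little-endian `uint64_t` limbs, the GCC/Clang `unsigned __int128` extension, and a hypothetical `ToDecimalBasecase` name. It is not the library's formatter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical basecase: converts little-endian 64-bit limbs
// (value = sum of limbs[i] * 2^(64*i)) to decimal by repeated division
// by 10^19, the largest power of ten below 2^64.
// Requires the GCC/Clang unsigned __int128 extension.
std::string ToDecimalBasecase(std::vector<uint64_t> limbs) {
    constexpr uint64_t kBase10_19 = 10000000000000000000ULL;  // 10^19
    while (!limbs.empty() && limbs.back() == 0) limbs.pop_back();  // strip high zeros
    if (limbs.empty()) return "0";

    std::vector<uint64_t> chunks;  // 19-digit chunks, least significant first
    while (!limbs.empty()) {
        unsigned __int128 rem = 0;
        // Schoolbook division of the whole number by 10^19, top limb first.
        // rem < 10^19, so (rem << 64) | limb < 10^19 * 2^64 fits in 128 bits
        // and each quotient limb fits in 64 bits.
        for (std::size_t i = limbs.size(); i-- > 0;) {
            unsigned __int128 cur = (rem << 64) | limbs[i];
            limbs[i] = static_cast<uint64_t>(cur / kBase10_19);
            rem = cur % kBase10_19;
        }
        chunks.push_back(static_cast<uint64_t>(rem));
        while (!limbs.empty() && limbs.back() == 0) limbs.pop_back();
    }

    // Most significant chunk unpadded; every lower chunk zero-padded to 19 digits.
    std::string out = std::to_string(chunks.back());
    for (std::size_t i = chunks.size() - 1; i-- > 0;) {
        std::string part = std::to_string(chunks[i]);
        assert(part.size() <= 19);  // each chunk is < 10^19
        out.append(19 - part.size(), '0');
        out += part;
    }
    return out;
}
```

For example, `ToDecimalBasecase({0, 1})` (the value 2^64) yields `"18446744073709551616"`: one division step leaves quotient 1 and remainder 8446744073709551616, which is emitted as a full 19-digit chunk.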
Summary
Two parts:
1. Canonical benchmark refresh (v13.0)
Full `bench_vs_gmp` run (2026-06-12 evening, M1 Max, GMP 6.3) plus warm steady-state re-measurement for the chain-/plan-heavy rows (50M+ balanced mul, 100k+ ToString; first call discarded, best-of-3, per the established methodology). Figures regenerated.

Headline state at v13.0: balanced mul beats GMP from 500k digits (0.31–0.89×; warm 0.89–0.97× at 50–200M), as do every skewed mul shape ≥500k×50k (0.40–0.73×), division from 20M-digit dividends (0.45–0.82×, parity at 5M), and parse (0.38–0.69×) and warm ToString (0.49–0.62×) from 500k digits.
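The warm steady-state protocol described above (one discarded cold call, then best-of-3) can be sketched as a small timing helper. `WarmBestOf3` and `RatioVsGmp` are hypothetical names, not part of `bench_vs_gmp`, and the ratio convention (this library's time divided by GMP's, so values below 1.0× mean faster than GMP) is an assumption inferred from how the headline numbers read.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: runs `op` once untimed (the cold call, which lets
// caches and precomputed plans populate), then returns the minimum wall
// time of three further runs, in seconds.
double WarmBestOf3(const std::function<void()>& op) {
    using Clock = std::chrono::steady_clock;
    op();  // cold call: result and timing discarded
    double best = std::numeric_limits<double>::infinity();
    for (int run = 0; run < 3; ++run) {
        auto start = Clock::now();
        op();
        std::chrono::duration<double> elapsed = Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}

// Assumed ratio convention: ours / GMP, so a result below 1.0 beats GMP.
double RatioVsGmp(double oursSeconds, double gmpSeconds) {
    assert(gmpSeconds > 0.0);
    return oursSeconds / gmpSeconds;
}
```

Taking the minimum rather than the mean is a common microbenchmark choice, since interference from other processes only ever adds time; discarding the first call keeps one-time setup out of the warm column.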
2. Documentation staleness sweep
Four parallel reviewed-and-verified passes over `docs/` (every fix checked against current code):

- BASE.md: premise flipped. The doc described the `-DBIGMATH_LIMB_64=0` fallback (32-in-64, base 2³²) as production and still listed the landed 64-bit refactor under Future opportunities. Now: 64-bit limbs / `Base2_64` sentinel default, historical rationale preserved and labeled.
- MULTIPLICATION.md: Toom-3 window retired from the dispatch narrative and flowchart (NTT entry 1280), CRT gate 256, `KARATSUBA_THRESHOLD` 32, CRT squaring at 640 limbs with the self-operand skip, NEON marked landed (three places still called it "the next lever"), MFA gate 2^20, two overturned rejections annotated.
- DIVISION.md: full 7-band Newton frontier + `QuotientSizedDivision` added to the flowchart and threshold table (both previously missing), BZ odd-size padding fix correctly described (the old text claimed entry normalization fixed odd sizes; it never did below the top level, which was exactly the #116 bug), BZ basecase 128, current CRT prime triple (old text had the pre-2026-05-27 triple), cyclic gate macro.
- STRING_CONVERSION.md: 10^19/`Base10_19` formatter chunks, code locations moved to `src/common/Parser.cpp`, parallel-D&C rejection marked OVERTURNED (reentrant pool + #118/#119 fan-outs), broken link removed.
- All stale 2026-05-27 vs-GMP tables labeled as historical snapshots with pointers to BENCHMARK.md; repo paths `biginteger/...` → `include/biginteger/...` + `src/` throughout.

Test plan
🤖 Generated with Claude Code