fix(hugepages): size 1G reservation by NUMA nodes, not sockets#111
Merged
Conversation
RandomX fast mode keeps a NUMA-local copy of the ~2080 MB dataset per NUMA node (XMRig allocates one dataset per node). util/proposed-grub.sh multiplied the per-dataset 1G page count (3) by the SOCKET count, but a single-socket EPYC 7642 exposes 4 NUMA nodes, so setup reserved 3x 1G instead of 12x. The boxes ran fine on an older boot's reservation, but a fresh setup + reboot would leave 3 of 4 nodes without 1G backing and tank hashrate.

Detect NUMA nodes (lscpu "NUMA node(s)", then count /sys/devices/system/node, then fall back to sockets, then 1) and scale both the 1G reservation and the pure-2M fallback by it. 2M scratchpad sizing is per-thread total and unchanged.

- proposed-grub.sh: NUMA_NODES detection + use it for TOTAL_GB_PAGES and TOTAL_2MB_FALLBACK; verbose output shows the NUMA node count.
- tests: stub lscpu now emits "NUMA node(s)" (defaults to socket count, so existing assertions are unchanged); added cases for the multi-NUMA 1G scaling, the 2M fallback scaling, and the sysfs/socket detection fallbacks.

Verified on a real 4-NUMA EPYC 7642: lscpu reports 4 nodes, calculator now emits hugepages=1G hugepages=12 (was 3).

Found while upgrading a fleet to v1.0.0; the bug is in released v1.0.0 (and 0.1.0).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
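The detection order and scaling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual code in proposed-grub.sh: the helper names (`lscpu_field`, `is_count`, `detect_numa_nodes`, `total_gb_pages`) and the `GB_PAGES_PER_DATASET` variable are hypothetical, and only `NODE_SYS` and the 3-pages-per-dataset figure come from the PR itself.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback chain: lscpu -> sysfs -> sockets -> 1.
set -eu

NODE_SYS="${NODE_SYS:-/sys/devices/system/node}"  # overridable for tests
GB_PAGES_PER_DATASET=3                            # 1G pages per ~2080 MB dataset copy

# Print the numeric value of an lscpu "Key:   N" line, or nothing at all.
lscpu_field() {
  command -v lscpu >/dev/null 2>&1 || return 0
  lscpu 2>/dev/null | awk -F: -v key="$1" '
    { k = $1; sub(/^[ \t]+/, "", k) }            # newer lscpu indents some keys
    k == key { gsub(/[^0-9]/, "", $2); print $2; exit }
  ' || true
}

# Succeed only for a positive integer without leading zeros.
is_count() {
  case "$1" in
    ''|0*|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Runs in a subshell so the scratch variable n does not leak.
detect_numa_nodes() (
  n=$(lscpu_field "NUMA node(s)")
  if ! is_count "$n" && [ -d "$NODE_SYS" ]; then
    n=$(find "$NODE_SYS" -maxdepth 1 -type d -name 'node[0-9]*' | wc -l | tr -d ' ')
  fi
  is_count "$n" || n=$(lscpu_field "Socket(s)")
  is_count "$n" || n=1
  echo "$n"
)

# One dataset copy per NUMA node, so the reservation scales with the node count.
total_gb_pages() { echo $(( GB_PAGES_PER_DATASET * $1 )); }

nodes=$(detect_numa_nodes)
echo "NUMA nodes: ${nodes}, 1G pages: $(total_gb_pages "$nodes")"
```

Validating each candidate with `is_count` before accepting it means an empty or zero result from one source falls through to the next rather than producing a reservation of zero pages.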
diff-cover flagged the new verbose "NUMA Nodes:" output line (proposed-grub.sh:119) as uncovered — every other proposed-grub test runs with -q or --runtime. Add a verbose-mode assertion exercising it (and the sockets line alongside it). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…NUMA sizing

Two follow-ups found while validating the NUMA fix with a clean install on a real EPYC (autotune disabled in that worker's config):

- tests/e2e-real.sh: the #92 re-own check did `op=$(systemctl cat rigforge-autotune.service ...)`. When autotune is disabled the unit doesn't exist, systemctl exits non-zero, and under the gate's `set -Eeuo pipefail` the bare assignment aborted the whole verify phase right before the SKIP branch that was meant to handle exactly this. Add `|| true` so it reaches the skip. (Earlier gate runs all had autotune enabled, so this never surfaced.)
- docs/hardware.md: `randomx.numa` was described as "a no-op on single-socket", but a single-socket EPYC exposes several NUMA nodes, which is the misconception behind the 1G HugePage sizing bug. Clarify that the reservation scales per NUMA node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
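The errexit behaviour behind the tests/e2e-real.sh fix can be reproduced in isolation. In this small sketch `false` stands in for the `systemctl cat` call failing on a host where the unit is absent, and the echoed messages are illustrative rather than taken from the real gate script.

```shell
#!/bin/sh
set -eu

# Each case runs in its own `sh -c` process so that `set -e` is fully active
# there (inside a `( ... ) || ...` list the shell would ignore errexit).

# Unguarded: the assignment takes the exit status of the failed substitution,
# so the child shell exits before it can reach the skip branch.
unguarded=$(sh -c 'set -e; op=$(false); echo "reached skip branch"' || echo "aborted")

# Guarded: `|| true` absorbs the failure, op stays empty, and the skip runs.
guarded=$(sh -c 'set -e; op=$(false) || true; [ -z "$op" ] && echo "SKIP: unit not present"')

echo "unguarded: $unguarded"   # prints "unguarded: aborted"
echo "guarded:   $guarded"     # prints "guarded:   SKIP: unit not present"
```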
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VijitSingh97 added a commit that referenced this pull request Jun 13, 2026
Patch release. Roll CHANGELOG [Unreleased] -> [1.0.1] and bump VERSION to 1.0.1. Fixes NUMA-unaware 1 GB HugePage sizing (#111): a multi-NUMA CPU (e.g. EPYC) keeps one RandomX dataset copy per NUMA node, so the reservation must scale with NUMA nodes, not sockets — a fresh setup on a 4-NUMA EPYC now reserves 12x 1G (was 3), so it stays 100%-backed across a reboot. Validated end to end by the real-hardware gate on a 4-NUMA EPYC 7642: clean install -> reboot -> 12x 1G reserved, full hashrate, 12/12 NUMA-backed -> verify (38 checks) -> teardown (13), all PASS. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
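One way to spot-check per-node 1G backing after a reboot is to read the kernel's per-node hugepage counters in sysfs. The sketch below is an assumption about how such a check might look and is not part of the real-hardware gate: the function name `check_1g_backing` and the `PAGES_PER_NODE` threshold are hypothetical, while the `nr_hugepages` path is the standard per-node sysfs location for 1 GB pages.

```shell
#!/bin/sh
set -eu

NODE_SYS="${NODE_SYS:-/sys/devices/system/node}"
PAGES_PER_NODE=3   # hypothetical threshold: 1G pages needed per dataset copy

# Print each node's 1G page count and a summary; fail if any node is short.
check_1g_backing() (
  backed=0
  total=0
  for dir in "$NODE_SYS"/node[0-9]*; do
    [ -d "$dir" ] || continue
    total=$((total + 1))
    file="$dir/hugepages/hugepages-1048576kB/nr_hugepages"
    pages=0
    if [ -r "$file" ]; then pages=$(cat "$file"); fi
    if [ "$pages" -ge "$PAGES_PER_NODE" ]; then backed=$((backed + 1)); fi
    echo "${dir##*/}: ${pages} x 1G"
  done
  echo "${backed}/${total} nodes backed"
  [ "$backed" -eq "$total" ]
)

check_1g_backing || echo "some NUMA nodes lack full 1G backing"
```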
The bug
RandomX fast mode keeps a NUMA-local copy of the ~2080 MB dataset per NUMA node (XMRig allocates one dataset per node). But `util/proposed-grub.sh` sized 1 GB HugePages as `3 × Socket(s)`, and a single-socket EPYC 7642 exposes 4 NUMA nodes (NPS / L3-as-NUMA). So `setup` reserved 3× 1G instead of 12×, and after a reboot three of four nodes would lose 1 GB backing, a large RandomX hashrate hit.

It's latent because affected boxes keep running on whatever reservation their last boot applied; it only bites on the next reboot after a fresh `setup`. Present in released v1.0.0 (and 0.1.0).

Evidence from a live EPYC (miner-2):
The fix
Detect NUMA nodes (not sockets) and scale the per-node dataset reservation by them:

- `lscpu` `NUMA node(s):` → count `/sys/devices/system/node/node*` (overridable via `NODE_SYS` for tests) → fall back to socket count → 1.
- `TOTAL_GB_PAGES = 3 × NUMA_NODES`; the pure-2 MB fallback (`BASE_2MB_PAGES × NUMA_NODES`) scales too.

Tests
The stub `lscpu` now emits `NUMA node(s)` (defaulting to the socket count, so existing assertions are unchanged). Added cases: multi-NUMA 1G scaling (1 socket / 4 nodes → 12), 2 MB fallback scaling, and both detection fallbacks (sysfs node count, then sockets). `make lint` + 610 tests green.

Real-hardware validation
On a real 4-NUMA EPYC 7642, the calculator now reports:
Follow-up for a 1.0.1. (The live fleet's GRUB was already hand-corrected to 12, so those boxes are reboot-safe today; this fixes the code so a fresh `setup` is correct.)

🤖 Generated with Claude Code