Skip to content

fix: valgrind on ARM#21

Draft
not-matthias wants to merge 10 commits into
masterfrom
cod-2985-arm-flamegraph-failures
Draft

fix: valgrind on ARM#21
not-matthias wants to merge 10 commits into
masterfrom
cod-2985-arm-flamegraph-failures

Conversation

@not-matthias

Copy link
Copy Markdown
Member

No description provided.

not-matthias and others added 9 commits June 30, 2026 16:42
…l JSON

New Rust crate (edition 2024) that reads a Callgrind .out profile and extracts call-graph topology (costs/addresses ignored), serializing to canonical index-ref JSON for stable cross-platform callgraph diffing.

Node identity is the {object,file,function} tuple so same-named statics stay distinct. Edges are emitted only on calls= lines (cl-format.xml CallSpec); name compression across three ID spaces, the cfl/cfi alias, inline fi/fe callee-context inheritance, and multi-part merge are handled. 18 integration tests; clippy and rustfmt clean.
Add testdata/*.c fixtures (recursion, chain, diamond, mutual) profiled by the in-repo Callgrind through an rstest harness that compiles each fixture and runs vg-in-place, then snapshots the canonical JSON. --instr-atstart=no plus the fixtures' client requests keep loader/libc frames out, so the JSON is stable across platforms.
The AArch64 B{L} decoder tagged the whole opcode group as Ijk_Call,
but only BL (bit 31 = 1, writes the link register) is a call; a plain
B (bit 31 = 0) is an ordinary unconditional branch.

Mislabelling B as a call made Callgrind treat every branch to a
function epilogue or tail target as a call. At -O0 a conditional like
`return n < 2 ? n : fib(...)` compiles the base case to `b <epilogue>`,
so each base case was counted as a recursive call -- inflating
recursive/cyclic call graphs and inventing phantom self-edges on arm64
(e.g. fib recursion 64 -> 98; mutual is_even/is_odd gaining self-loops).

Align plain B with B.cond and the register-indirect JMP, which already
use Ijk_Boring. Fixes the callgrind-utils recursion/mutual snapshot
failures.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a fixture_full_trace rstest matrix over the same four fixtures,
traced with --instr-atstart=yes so the whole program (loader, libc
startup, main's own entry) is captured, not just the client-request
scoped region.

The startup frames carry non-portable names (__libc_start_main@@GLIBC_2.34,
raw loader addresses), so this asserts version-stable invariants rather
than a golden snapshot: JSON round-trips, main appears as a callee
(full-program capture), the fixture's own functions are present, and the
per-fixture call shape matches the scoped snapshots. The recursion count
(fib'2->fib'2 == 64) and mutual no-self-edge checks double as regression
guards for the arm64 B-vs-BL jump-kind fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Profile a Python workload (recursion.py) live under the in-repo Callgrind,
mirroring pytest-codspeed: a ctypes-loaded shim (clgctl.c) fires
CALLGRIND_START/STOP and adds libpython + the python executable to the
obj-skip list at runtime via CALLGRIND_ADD_OBJ_SKIP. Callgrind never names
Python-level frames, so the test asserts structure rather than a golden
snapshot: the START shim is captured and the Python runtime is folded out.
@codspeed-hq

codspeed-hq Bot commented Jun 30, 2026

Copy link
Copy Markdown

Merging this PR will degrade performance by 47.44%

❌ 2 regressed benchmarks
✅ 40 untouched benchmarks
⏩ 88 skipped benchmarks1

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
test_valgrind[valgrind-3.26.0, python3 testdata/test.py, full-no-inline] 7 s 22.3 s -68.69%
test_valgrind[valgrind-3.25.1, python3 testdata/test.py, full-no-inline] 7 s 7.9 s -11.77%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing cod-2985-arm-flamegraph-failures (bdc4911) with master (ce9d871)

Open in CodSpeed

Footnotes

  1. 88 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant