This repository collects performance benchmarks for SPy, comparing it against CPython, PyPy, Julia, Codon, and NumPy depending on the benchmark.
Each subdirectory is a self-contained benchmark with a Makefile that exposes
a set of standard targets:
| Target | Purpose |
|---|---|
| `make all` | Run every implementation available locally |
| `make check-ci` | Run the subset used in CI (all but CPython, which is too slow for CI) |
| `make test` | Run only the SPy implementation; output is compared against `expected_output/` |
| `make clean` | Remove build artefacts |
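The layout described above (one subdirectory per benchmark, each with its own Makefile) lends itself to simple discovery. The following is a minimal sketch, not part of the repository: `find_benchmarks` and `make_command` are hypothetical helper names, and the sketch assumes that any direct subdirectory containing a file named `Makefile` is a benchmark.

```python
from pathlib import Path


def find_benchmarks(root: Path) -> list[str]:
    """Return the names of subdirectories of `root` that contain a Makefile.

    Assumption: a directory counts as a benchmark only if it has a Makefile.
    """
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "Makefile").is_file()
    )


def make_command(benchmark: str, target: str = "test") -> list[str]:
    """Build the argument list for running one standard target of a benchmark."""
    return ["make", "-C", benchmark, target]
```

For example, `make_command("fibo", "all")` yields the arguments for `make -C fibo all`, which could then be passed to `subprocess.run`.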
The test target is what the test suite (test_spy_bench.py) invokes. Its output
must be deterministic modulo timing lines, which are marked with a # prefix
(or an inline # sentinel) so the test framework can strip them before
comparison.
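To illustrate the idea of comparing output "modulo timing lines", here is a minimal sketch of how such stripping could work. It is not the code used by `test_spy_bench.py`: the function name `strip_timing` is hypothetical, and the sketch assumes that an inline `#` marks everything up to the end of the line as timing data.

```python
def strip_timing(output: str) -> str:
    """Remove timing information so that two runs can be compared.

    Assumptions (not taken from the real test framework):
    - a line whose first non-blank character is '#' is a timing line;
    - on other lines, everything from the first '#' onwards is timing data.
    """
    kept = []
    for line in output.splitlines():
        if line.lstrip().startswith("#"):
            continue  # the whole line is timing data
        head, _, _ = line.partition("#")  # drop an inline timing suffix, if any
        kept.append(head.rstrip())
    return "\n".join(kept)


# Two runs that differ only in their timings compare equal after stripping.
run_a = "fib(30) = 832040\n# elapsed: 0.123 s\nresult ok # 0.5 ms\n"
run_b = "fib(30) = 832040\n# elapsed: 0.456 s\nresult ok # 0.7 ms\n"
assert strip_timing(run_a) == strip_timing(run_b)
```

A benchmark whose deterministic output itself contains a `#` character would need a more specific sentinel than the one assumed here.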
A virtual environment containing SPy and NumPy can be set up with `uv sync`. The SPy repository
(https://github.com/spylang/spy) must be installed next to the benchmarks repository.
Beyond SPy itself, some benchmarks compare against:

- NumPy: installed in the main venv.
- PyPy: installed and managed via uv:

      uv python install pypy@3.11  # then available as: uv python find pypy

- Julia: install from https://julialang.org/downloads/ or via juliaup:

      curl -fsSL https://install.julialang.org | sh

- Codon: install from https://github.com/exaloop/codon:

      /bin/bash -c "$(curl -fsSL https://exaloop.io/install.sh)"

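Before running targets that need the optional toolchains, it can help to check which of them are reachable. The sketch below is only an illustration: `available_tools` is a hypothetical helper, and the executable names `uv`, `julia` and `codon` are assumed to be on `PATH` under those names (installers may place them elsewhere, in which case `PATH` needs adjusting first).

```python
import shutil

# Assumed executable names; adjust if your installation differs.
TOOLS = {"PyPy (via uv)": "uv", "Julia": "julia", "Codon": "codon"}


def available_tools(tools: dict[str, str] = TOOLS) -> dict[str, bool]:
    """Map each label to True if its executable is found on PATH."""
    return {label: shutil.which(exe) is not None for label, exe in tools.items()}


for label, found in available_tools().items():
    print(f"{label:15} {'found' if found else 'missing'}")
```

Finding `uv` only shows that PyPy can be managed through it; whether a PyPy interpreter is actually installed is what `uv python find pypy` reports.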
With the virtual environment activated (`. .venv/bin/activate`), one can run:

    # Run a single benchmark (all implementations)
    make -C fibo all
    # Run all benchmarks via pytest
    pytest
    # Run only SPy comparisons (fast, no extra deps)
    pytest test_spy_bench.py
    # Run the full CI suite (requires Julia, Codon, PyPy)
    pytest test_check_ci.py
    # Regenerate expected output after an intentional SPy change
    pytest test_spy_bench.py --update-expected-output

Several external suites are good candidates for future SPy benchmarks:
- Julia microbenchmarks (results): small, targeted kernels (`fib`, `pisum`, `parse_int`, ...). Useful for early-stage languages since they require very little stdlib support.
- Energy Languages: based on the Computer Language Benchmarks Game, but more modern and actively maintained. Covers both performance and energy consumption, which could be a differentiating angle for SPy. See also the associated paper.
- nbabel: N-body simulation used to compare scientific computing languages. Particularly interesting because there are advanced, modern implementations in both Julia and Mojo, providing strong reference points for comparison.
- pyperformance: the standard CPython benchmark suite, also used (with variations) by PyPy and GraalPy. Likely the most meaningful for positioning SPy relative to the Python ecosystem, though it requires broader stdlib coverage first.