Linux/macOS builds bake AVX/AVX2 based on the build machine's CPU, so 0.1.10-alpha wheels SIGILL on older x86_64 #302


The Makefile's `OMIT_SIMD` block greps the **build machine's** `/proc/cpuinfo` and, when the builder has AVX, compiles the whole translation unit with `-mavx -mavx2`. GitHub runners all have AVX2, so release artifacts inherit it. This is visible on the published wheels via `vec_debug()`:

- `0.1.9` linux x86_64: `Build flags:` empty — runs on any x86_64.
- `0.1.10a4` linux x86_64: `Build flags: avx rescore diskann` — built with `-mavx -mavx2` file-wide.

Reproduced under qemu-user: a basic insert + KNN workload on the `0.1.10a4` wheel dies with SIGILL under `-cpu Westmere` (SSE4.2, no AVX), while the `0.1.9` wheel passes the identical workload. Under `-cpu SandyBridge` (AVX, no AVX2) the basic workload happens to pass, but with `-mavx2` applied file-wide the compiler is free to emit AVX2 in any function, so there is no guarantee for other code paths.
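To make the build-time vs. run-time distinction concrete, here is a minimal sketch (not sqlite-vec code; every function name in it is hypothetical). It assumes gcc or clang, where `__AVX__`/`__AVX2__` are predefined whenever `-mavx`/`-mavx2` is in effect for the file, and where `__builtin_cpu_supports()` queries the host at run time:

```c
#include <stdio.h>

/* Build-time view: the compiler predefines __AVX__ / __AVX2__ when
   -mavx / -mavx2 (or a matching -march) applies to this file. When set,
   the compiler may emit those instructions in ANY function in the file. */
int built_with_avx(void) {
#if defined(__AVX__)
  return 1;
#else
  return 0;
#endif
}

int built_with_avx2(void) {
#if defined(__AVX2__)
  return 1;
#else
  return 0;
#endif
}

/* Run-time view: what the host this binary is running on reports.
   Falls back to 0 on compilers/architectures without the builtin. */
int cpu_has_avx(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx") ? 1 : 0;
#else
  return 0;
#endif
}

int cpu_has_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
  return 0;
#endif
}

/* Hypothetical helper: print both views side by side.
   Returns the number of characters written, as fprintf does. */
int print_simd_report(FILE *out) {
  return fprintf(out,
                 "built with: avx=%d avx2=%d\n"
                 "host has:   avx=%d avx2=%d\n",
                 built_with_avx(), built_with_avx2(),
                 cpu_has_avx(), cpu_has_avx2());
}
```

The problematic combination is "built with" = 1 and "host has" = 0: nothing checks the second line before the first one takes effect, so the process can hit SIGILL as soon as it executes an instruction the host lacks.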

I'm a paperless-ngx maintainer, currently planning a move of its RAG vector store from LanceDB to sqlite-vec. A big part of the motivation is that 0.1.9's wheels run everywhere, while LanceDB's AVX2-baked wheels SIGILL on exactly this CPU class — Sandy/Ivy Bridge and Goldmont Atom/Celeron NAS boxes (paperless-ngx/paperless-ngx#12970). So I'd love for 0.1.10 final to keep 0.1.9's run-anywhere property.

The SIMD surface here looks small enough to have both speed and portability, and I'm happy to write the patch:

1. `__attribute__((target("avx")))` on `l2_sqr_float_avx` and `__attribute__((target("avx2")))` on `distance_hamming_avx2`, and nowhere else.
2. `__builtin_cpu_supports()` checks at their two existing dispatch sites (cached in a static; libgcc accounts for OS XSAVE state).
3. Makefile: drop `-mavx -mavx2`, define `SQLITE_VEC_ENABLE_AVX` unconditionally on x86_64 gcc/clang.
4. Optionally: `vec_debug()` reports runtime-detected features (additive line), plus an env override to force the scalar paths for testing.
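Roughly what steps 1, 2 and the env override from step 4 could look like for the L2 kernel. This is a sketch under assumptions, not the actual patch: the kernel body is a simplified stand-in for the real `l2_sqr_float_avx`, and `l2_sqr_float_scalar`, `l2_sqr_float`, `use_avx`, `SKETCH_X86_DISPATCH` and the `SKETCH_FORCE_SCALAR` variable are hypothetical names. It assumes gcc or clang on x86_64 and falls back to the scalar path elsewhere:

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(__GNUC__) && defined(__x86_64__)
#define SKETCH_X86_DISPATCH 1
#include <immintrin.h>
#endif

/* Portable fallback, compiled with the file's baseline flags (no -mavx). */
static float l2_sqr_float_scalar(const float *a, const float *b, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; i++) {
    float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

#ifdef SKETCH_X86_DISPATCH
/* Only this function is compiled with AVX enabled (illustrative body). */
__attribute__((target("avx")))
static float l2_sqr_float_avx(const float *a, const float *b, size_t n) {
  __m256 acc = _mm256_setzero_ps();
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __m256 d = _mm256_sub_ps(_mm256_loadu_ps(a + i), _mm256_loadu_ps(b + i));
    acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));
  }
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  float sum = 0.0f;
  for (int k = 0; k < 8; k++) sum += lanes[k];
  for (; i < n; i++) { /* tail: fewer than 8 elements left */
    float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

/* Decide once whether the AVX kernel may run on this host.
   Not synchronized: concurrent first calls would compute the same value;
   a real patch might prefer an atomic or a one-time initializer. */
static int use_avx(void) {
  static int cached = -1;
  if (cached < 0) {
    /* Hypothetical env override to force the scalar path in tests. */
    const char *force = getenv("SKETCH_FORCE_SCALAR");
    int forced = (force != NULL && force[0] == '1');
    cached = (!forced && __builtin_cpu_supports("avx")) ? 1 : 0;
  }
  return cached;
}
#endif

/* Dispatch site: one cached int read, then a direct call. */
float l2_sqr_float(const float *a, const float *b, size_t n) {
#ifdef SKETCH_X86_DISPATCH
  if (use_avx()) return l2_sqr_float_avx(a, b, n);
#endif
  return l2_sqr_float_scalar(a, b, n);
}
```

The same shape would apply to `distance_hamming_avx2` with `target("avx2")` and `__builtin_cpu_supports("avx2")`. Because the file itself is built without `-mavx -mavx2`, the compiler cannot emit AVX outside the attributed functions, which is what restores the run-anywhere property.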

The dispatch cost is a cached static int read per distance call, so AVX2-host throughput should be unchanged; I'd verify that with benchmarks as part of the PR.

The zero-code alternative is dropping `-mavx -mavx2` from release builds (ship 0.1.10 the way 0.1.9 was built), at the cost of dead SIMD kernels in the wheels. Either works for me — which direction do you prefer?

Related: #211 (no sdist on PyPI, so affected users can't easily build from source as a fallback).

