
Add BitPacker16x (AVX-512) #71

Draft
AlJohri wants to merge 1 commit into quickwit-oss:master from AlJohri:add-bitpacker16x-avx512

AlJohri commented Jun 18, 2026

Adds BitPacker16x, an AVX-512 (512-int block, 16 lanes) flavor mirroring BitPacker8x, with a scalar fallback behind the default-on bitpacker16x feature.

Adds a fourth bitpacking flavor that leverages AVX-512, mirroring the
existing BitPacker8x (AVX2) flavor:

- New `bitpacker16x` feature (enabled by default) exposing a `BitPacker16x`
  public type.
- AVX-512 implementation plus a scalar fallback; AVX-512 support is detected
  at runtime, and the scalar path is used when avx512f is unavailable, so the
  produced format is identical regardless of CPU.
- The cross-lane helpers (compute_delta / integrate_delta / or_collapse) use
  whole-register `valignd` (`_mm512_alignr_epi32`).
- The avx512 helpers carry `#[target_feature(enable = "avx512f")]` so they
  inline into the feature-gated pack/unpack/num_bits. Without the attribute,
  `num_bits` spills the 512-bit accumulator to the stack and calls the helpers
  out of line, which measured ~6x slower and (because compress recomputes
  num_bits per block) showed up as a 2.5x compress regression.
- Block size is 512 integers (32 registers x 16 lanes).
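
To make the cross-lane helpers above concrete, here is a minimal scalar model of what a whole-register `valignd` shift does across 16 lanes, and how delta, integrate, and OR-collapse follow from it. This is a sketch for illustration only, not the code in this PR: the function names are borrowed from the description, but the array-based signatures, the `alignr_15` helper, and `num_bits_of` are assumptions; the real helpers operate on `__m512i` registers.

```rust
/// Number of 32-bit lanes in one 512-bit register.
const LANES: usize = 16;

/// Scalar model of `_mm512_alignr_epi32(hi, lo, 15)`: concatenate `hi:lo`
/// (with `lo` in the low lanes), shift right by 15 lanes, keep the low 16.
/// The result is [lo[15], hi[0], hi[1], ..., hi[14]].
fn alignr_15(hi: [u32; LANES], lo: [u32; LANES]) -> [u32; LANES] {
    let mut out = [0u32; LANES];
    out[0] = lo[LANES - 1];
    out[1..].copy_from_slice(&hi[..LANES - 1]);
    out
}

/// Each lane minus its predecessor; lane 0 uses the last lane of `prev`.
/// (Hypothetical array signature; the PR's helper works on registers.)
fn compute_delta(curr: [u32; LANES], prev: [u32; LANES]) -> [u32; LANES] {
    let shifted = alignr_15(curr, prev);
    let mut out = [0u32; LANES];
    for i in 0..LANES {
        out[i] = curr[i].wrapping_sub(shifted[i]);
    }
    out
}

/// Inverse of `compute_delta`: a running sum seeded with the last lane of `prev`.
fn integrate_delta(delta: [u32; LANES], prev: [u32; LANES]) -> [u32; LANES] {
    let mut out = [0u32; LANES];
    let mut acc = prev[LANES - 1];
    for i in 0..LANES {
        acc = acc.wrapping_add(delta[i]);
        out[i] = acc;
    }
    out
}

/// OR of all lanes; its highest set bit gives the width needed to pack them.
fn or_collapse(v: [u32; LANES]) -> u32 {
    v.iter().fold(0, |acc, &x| acc | x)
}

/// Bits needed to represent every lane of `v` (hypothetical helper).
fn num_bits_of(v: [u32; LANES]) -> u8 {
    (32 - or_collapse(v).leading_zeros()) as u8
}
```

Because the shift spans the full register, lane 0 picks up its predecessor from the previous register in a single step, which is presumably why a whole-register `valignd` is convenient here.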

Also wired into lib.rs docs, the criterion benchmark, the README, and the
CHANGELOG; crate version bumped to 0.10.0.

Correctness is validated against the scalar reference via the crate's existing
`test_compatible` cross-check (byte-for-byte), run under Intel SDE since AVX-512
hardware was not available locally. Throughput was verified on AVX-512 hardware
(AWS r8a / AMD Zen5 and r7i / Intel Sapphire Rapids): per-integer throughput is
at parity with or better than BitPacker8x on num_bits/compress/decompress, with
2x the lanes per instruction.