Add BitPacker16x (AVX-512) #71
Draft
AlJohri wants to merge 1 commit into
Adds a fourth bitpacking flavor that leverages AVX-512, mirroring the existing `BitPacker8x` (AVX2) flavor:

- New `bitpacker16x` feature (enabled by default) exposing a `BitPacker16x` public type.
- AVX-512 implementation plus a scalar fallback; the instruction set is detected at runtime and falls back to scalar when `avx512f` is unavailable, so the produced format is identical regardless of CPU.
- The cross-lane helpers (`compute_delta` / `integrate_delta` / `or_collapse`) use whole-register `valignd` (`_mm512_alignr_epi32`).
- The avx512 helpers carry `#[target_feature(enable = "avx512f")]` so they inline into the feature-gated `pack`/`unpack`/`num_bits`. Without it, `num_bits` spills the 512-bit accumulator to the stack and calls out-of-line, which measured ~6x slower and (because compress recomputes `num_bits` per block) looked like a 2.5x compress regression.
- Block size is 512 integers (32 registers x 16 lanes).

Also wired into the `lib.rs` docs, the criterion benchmark, the README, and the CHANGELOG; crate version bumped to 0.10.0.

Correctness is validated against the scalar reference via the crate's existing `test_compatible` cross-check (byte-for-byte), run under Intel SDE since AVX-512 hardware was not available locally.

Throughput verified on AVX-512 hardware (AWS r8a / AMD Zen5 and r7i / Intel Sapphire Rapids): per-int parity-or-better than `BitPacker8x` on num_bits/compress/decompress, at 2x lanes per instruction.
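For reviewers less familiar with `valignd`, here is a minimal scalar model of how a whole-register align can produce the "previous element" vector that a delta step needs. It is a sketch only, not the code in this PR: the `Reg` type, the function signatures, and the choice of shift count 15 are assumptions made for illustration. The model follows the documented behaviour of `_mm512_alignr_epi32(a, b, imm)`: concatenate `a` (high) and `b` (low), shift right by `imm` 32-bit elements, and keep the low 16 elements.

```rust
// Scalar model of the cross-lane delta helpers. Illustrative only:
// `Reg`, the signatures, and the shift count are assumptions, not the PR's code.
const LANES: usize = 16;
type Reg = [u32; LANES];

/// Models `_mm512_alignr_epi32(a, b, imm)`: concatenate a:b (b in the low half),
/// shift right by `imm` 32-bit elements, keep the low 16 elements.
fn alignr_epi32(a: Reg, b: Reg, imm: usize) -> Reg {
    assert!(imm <= LANES);
    let mut concat = [0u32; 2 * LANES];
    concat[..LANES].copy_from_slice(&b);
    concat[LANES..].copy_from_slice(&a);
    let mut out = [0u32; LANES];
    out.copy_from_slice(&concat[imm..imm + LANES]);
    out
}

/// delta[i] = curr[i] - curr[i - 1], where lane 0 uses the last lane of `prev`.
fn compute_delta(curr: Reg, prev: Reg) -> Reg {
    // With imm = 15 the result is [prev[15], curr[0], ..., curr[14]].
    let shifted = alignr_epi32(curr, prev, LANES - 1);
    let mut out = [0u32; LANES];
    for i in 0..LANES {
        out[i] = curr[i].wrapping_sub(shifted[i]);
    }
    out
}

/// Inverse of `compute_delta`: a running sum seeded with the last lane of `prev`.
fn integrate_delta(prev: Reg, delta: Reg) -> Reg {
    let mut out = [0u32; LANES];
    let mut acc = prev[LANES - 1];
    for i in 0..LANES {
        acc = acc.wrapping_add(delta[i]);
        out[i] = acc;
    }
    out
}
```

Calling `integrate_delta(prev, compute_delta(curr, prev))` should return `curr` for any inputs, which is the round-trip property the packed format relies on.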
Adds `BitPacker16x`, an AVX-512 (512-int block, 16 lanes) flavor mirroring `BitPacker8x`, with a scalar fallback behind the default-on `bitpacker16x` feature.
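The runtime-detection-with-fallback shape described in this PR can be sketched roughly as follows. This is a hedged illustration, not the crate's implementation: `Backend`, `select_backend`, `num_bits_scalar`, and `BLOCK_LEN` are hypothetical names, and the AVX-512 arm simply reuses the scalar routine as a stand-in for a function that would carry `#[target_feature(enable = "avx512f")]`.

```rust
// Sketch of runtime dispatch with a scalar fallback. All names here are
// hypothetical and chosen for illustration; they are not the crate's API.

/// 32 registers x 16 lanes, as stated in the PR description.
const BLOCK_LEN: usize = 32 * 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx512,
    Scalar,
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn select_backend() -> Backend {
    // Checked at runtime, so one binary runs on CPUs with and without AVX-512.
    if is_x86_feature_detected!("avx512f") {
        Backend::Avx512
    } else {
        Backend::Scalar
    }
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn select_backend() -> Backend {
    Backend::Scalar
}

/// OR every value together, then count the significant bits of the result.
fn num_bits_scalar(block: &[u32]) -> u8 {
    let or_all = block.iter().fold(0u32, |acc, &v| acc | v);
    (32 - or_all.leading_zeros()) as u8
}

fn num_bits(block: &[u32; BLOCK_LEN]) -> u8 {
    match select_backend() {
        // Stand-in: a real AVX-512 path would call an `unsafe` function marked
        // with `#[target_feature(enable = "avx512f")]` here.
        Backend::Avx512 => num_bits_scalar(block),
        Backend::Scalar => num_bits_scalar(block),
    }
}
```

Because both arms must return the same answer for the same input, the bit width chosen for a block (and therefore the packed bytes) does not depend on which backend was selected.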