fix(variants): RaggedVariants.to_packed()/rc_() on sliced/reordered views #210
Merged
to_packed() and rc_() crash on any sliced/reversed/fancy-indexed RaggedVariants (IndexedArray/ListArray layouts) for both alt/ref and numeric fields. Design resolves lazy views via numba packing (seqpro fancy-index for numeric, new _pack_alleles kernel for doubly-nested alt/ref); no ak.to_packed, canonical hot path unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
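For a single-level numeric field, resolving a lazy view by packing comes down to a gather over offsets. Below is a minimal numpy sketch. It assumes the view has already been reduced to an `offsets` array over a flat `data` buffer plus an integer `index` of the rows to keep, which is what a slice, reversal, or fancy index yields. `pack_rows` is a hypothetical helper, not the gvl or seqpro API. Its copy loop is written in the plain-loop style of a numba kernel.

```python
import numpy as np


def pack_rows(offsets, data, index):
    """Gather rows `index` of a ragged array into a fresh contiguous layout.

    offsets: int array of length n_rows + 1; row i is data[offsets[i]:offsets[i + 1]].
    data:    flat values for all rows.
    index:   rows to keep, in output order (may reverse, repeat, or skip rows).
    Returns (new_offsets, new_data) with new_offsets[0] == 0.
    """
    offsets = np.asarray(offsets, dtype=np.int64)
    data = np.asarray(data)
    index = np.asarray(index, dtype=np.int64)
    # Per-output-row lengths, then their running sum gives the new offsets.
    lengths = offsets[index + 1] - offsets[index]
    new_offsets = np.zeros(len(index) + 1, dtype=np.int64)
    new_offsets[1:] = np.cumsum(lengths)
    new_data = np.empty(new_offsets[-1], dtype=data.dtype)
    # Plain per-row copy loop, the kind of body a numba @njit kernel would use.
    for out_row in range(len(index)):
        src = index[out_row]
        lo, hi = new_offsets[out_row], new_offsets[out_row + 1]
        new_data[lo:hi] = data[offsets[src]:offsets[src + 1]]
    return new_offsets, new_data
```

For example, `pack_rows([0, 2, 3, 6], np.array([10, 11, 20, 30, 31, 32]), [2, 0])` gives offsets `[0, 3, 5]` and data `[30, 31, 32, 10, 11]`. The result is canonical again: offsets start at 0 and rows are stored in order.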
Numeric-field half is an upstream seqpro unbox()/_extract_list_offsets() gap on IndexedArray (record-index + field-extract). Fix seqpro first (project Indexed* in the layout walkers), release, bump gvl pin; gvl numeric fields then need no special handling. alt/ref doubly-nested half stays gvl-owned (numba _pack_alleles kernel + generalized _alt_layout_parts). rc_ returns a new object for reordered input.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
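The seqpro half of the fix is summarized only as "project Indexed* in the layout walkers". The sketch below is a hedged illustration of one form such a projection can take. It folds an `IndexedArray` index, or a stack of them, into the per-row `starts`/`stops` of awkward's `ListArray` form and leaves the content buffer untouched. For a record behind an `IndexedArray`, extracting a field keeps the same index over that field's content, so the same projection applies per field. `compose_indices` and `project_indexed_list` are hypothetical names, not seqpro's `unbox()`/`_extract_list_offsets()` code. The sketch also assumes no missing entries (no `-1` values as in `IndexedOptionArray`).

```python
import numpy as np


def compose_indices(*indices):
    """Collapse stacked IndexedArray indices (outermost first) into one index.

    Element i of IndexedArray(a, IndexedArray(b, content)) is content[b[a[i]]].
    Assumes no -1 (missing) entries.
    """
    composed = np.asarray(indices[0], dtype=np.int64)
    for inner in indices[1:]:
        composed = np.asarray(inner, dtype=np.int64)[composed]
    return composed


def project_indexed_list(index, offsets):
    """IndexedArray(index, ListOffsetArray(offsets, content))
    -> ListArray(starts, stops, content).

    Only the small per-row boundary arrays are gathered; content is reused as-is.
    """
    index = np.asarray(index, dtype=np.int64)
    offsets = np.asarray(offsets, dtype=np.int64)
    return offsets[:-1][index], offsets[1:][index]
```

The design point is that projection keeps `content` shared. A walker can therefore return list boundaries for a lazy view without first copying it into packed form.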
Two-repo, TDD plan: Part A fixes seqpro unbox()/_extract_list_offsets() IndexedArray gap (land+release first); Part B bumps the pin and adds a numba _pack_alleles kernel + layout decomposition for non-canonical alt/ref, with rc_ delegating to to_packed() for reordered views.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
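The rule that `rc_()` delegates to `to_packed()` for reordered views can be illustrated on a toy single-level allele container. Everything below is hypothetical. `ToyAlleles` is not gvl's `RaggedVariants`. "Canonical" is approximated as "no row index and offsets starting at 0". Only A/C/G/T (either case) are complemented, and every other byte maps to itself.

```python
import numpy as np

# Complement table over all byte values; bytes not listed map to themselves.
_COMP = np.arange(256, dtype=np.uint8)
for _a, _b in (b"AT", b"TA", b"CG", b"GC", b"at", b"ta", b"cg", b"gc"):
    _COMP[_a] = _b


class ToyAlleles:
    """Hypothetical single-level allele field (not gvl's RaggedVariants).

    Row i is chars[offsets[j]:offsets[j + 1]] with j = index[i], or j = i when
    index is None. index=None with offsets[0] == 0 is treated as canonical.
    """

    def __init__(self, offsets, chars, index=None):
        if isinstance(chars, (bytes, bytearray)):
            chars = np.frombuffer(chars, dtype=np.uint8).copy()  # writable copy
        self.offsets = np.asarray(offsets, dtype=np.int64)
        self.chars = np.asarray(chars, dtype=np.uint8)
        self.index = None if index is None else np.asarray(index, dtype=np.int64)

    def _rows(self):
        n = len(self.offsets) - 1
        return np.arange(n) if self.index is None else self.index

    def is_canonical(self) -> bool:
        return self.index is None and bool(self.offsets[0] == 0)

    def to_packed(self) -> "ToyAlleles":
        """Return self if canonical, else a contiguous copy in view order."""
        if self.is_canonical():
            return self
        pieces = [self.chars[self.offsets[j]:self.offsets[j + 1]] for j in self._rows()]
        lengths = np.array([len(p) for p in pieces], dtype=np.int64)
        offsets = np.concatenate([np.zeros(1, dtype=np.int64), np.cumsum(lengths)])
        chars = np.concatenate(pieces) if pieces else np.empty(0, dtype=np.uint8)
        return ToyAlleles(offsets, chars)

    def rc_(self) -> "ToyAlleles":
        """Reverse-complement every row; callers must use the return value."""
        if not self.is_canonical():
            # Reordered/sliced view: materialize a contiguous copy, then recurse.
            return self.to_packed().rc_()
        for i in range(len(self.offsets) - 1):
            seg = self.chars[self.offsets[i]:self.offsets[i + 1]]
            seg[:] = _COMP[seg[::-1]]  # RHS is a new array, so no aliasing
        return self

    def to_list(self):
        packed = self.to_packed()
        o = packed.offsets
        return [packed.chars[o[i]:o[i + 1]].tobytes() for i in range(len(o) - 1)]
```

The canonical branch mutates in place and returns `self`, while the reordered branch returns a fresh object and leaves the view's buffer alone. Writing `x = x.rc_()` is therefore correct in both cases, and ignoring the return value is not.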
…w (seqpro 0.15.1 fix)
…on-canonical views

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… kernel

Gate the alt/ref branch in RaggedVariants.to_packed() on _is_canonical_alleles: canonical layouts take the existing fast path unchanged; IndexedArray/ListArray layouts (produced by fancy-indexing or reversals) fall through to _decompose_alleles + _pack_alleles for a correct numba-based gather.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
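The gating in this commit can be sketched for a doubly nested field: rows, then variants, then allele bytes, with two offset levels. This is a hedged illustration only. `is_canonical`, `pack_alleles`, `to_packed_alleles` and `to_lists` are hypothetical names. The canonical test (no row-level index, both offset levels starting at 0) is an assumption, not gvl's actual `_is_canonical_alleles`. The two-pass loop mirrors the shape of a numba gather kernel without being gvl's `_pack_alleles`.

```python
import numpy as np


def is_canonical(row_index, var_offsets, allele_offsets) -> bool:
    """Hypothetical gate: no row-level index and both offset levels start at 0."""
    return row_index is None and var_offsets[0] == 0 and allele_offsets[0] == 0


def pack_alleles(row_index, var_offsets, allele_offsets, chars):
    """Gather rows of a doubly nested ragged array (rows -> variants -> bytes).

    Two passes of plain loops, the shape a numba kernel would take.
    Returns fresh (var_offsets, allele_offsets, chars), each starting at 0.
    """
    n_rows = len(row_index)
    out_var = np.zeros(n_rows + 1, dtype=np.int64)
    n_vars = 0
    n_chars = 0
    # Pass 1: size both output levels.
    for r in range(n_rows):
        src = row_index[r]
        v0, v1 = var_offsets[src], var_offsets[src + 1]
        n_vars += v1 - v0
        n_chars += allele_offsets[v1] - allele_offsets[v0]
        out_var[r + 1] = n_vars
    out_allele = np.zeros(n_vars + 1, dtype=np.int64)
    out_chars = np.empty(n_chars, dtype=chars.dtype)
    # Pass 2: copy allele bytes and record per-variant end offsets.
    v_out = 0
    c_out = 0
    for r in range(n_rows):
        src = row_index[r]
        for v in range(var_offsets[src], var_offsets[src + 1]):
            a0, a1 = allele_offsets[v], allele_offsets[v + 1]
            out_chars[c_out:c_out + (a1 - a0)] = chars[a0:a1]
            c_out += a1 - a0
            v_out += 1
            out_allele[v_out] = c_out
    return out_var, out_allele, out_chars


def to_packed_alleles(row_index, var_offsets, allele_offsets, chars):
    """Canonical inputs are returned untouched; anything else goes through the kernel."""
    if is_canonical(row_index, var_offsets, allele_offsets):
        return var_offsets, allele_offsets, chars
    if row_index is None:  # e.g. a slice: offsets no longer start at 0
        row_index = np.arange(len(var_offsets) - 1)
    return pack_alleles(row_index, var_offsets, allele_offsets, chars)


def to_lists(var_offsets, allele_offsets, chars):
    """Decode to nested Python lists of bytes, for inspection."""
    return [
        [chars[allele_offsets[v]:allele_offsets[v + 1]].tobytes()
         for v in range(var_offsets[r], var_offsets[r + 1])]
        for r in range(len(var_offsets) - 1)
    ]
```

Because the gate returns its inputs unchanged, the canonical path does no copying. Only reordered or sliced views pay for the two-pass gather.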
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otation polish

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
seqpro 0.15.1 carries the IndexedArray unbox fix that powers RaggedVariants.to_packed()/rc_() on sliced/reordered views; genoray 2.9.2 relaxes its seqpro pin to <0.16 so the two co-resolve.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
`RaggedVariants.to_packed()` and `rc_()` previously crashed on sliced / reversed / fancy-indexed views (non-canonical awkward layouts: `IndexedArray` / `ListArray`). This PR fixes both:

- Numeric fields (… `start`) work for free via the upstream seqpro `IndexedArray` unbox fix (seqpro 0.15.1).
- `alt`/`ref`: new `_decompose_alleles` + numba `_pack_alleles` kernel, gated by `_is_canonical_alleles` so that the canonical hot path is byte-for-byte unchanged; non-canonical views go through the kernel.
- `rc_()` on a non-canonical view materializes a contiguous copy and then recurses. It returns a new object; the sole caller uses the return value.

Depends on seqpro >= 0.15.1 (`IndexedArray` unbox fix) and genoray >= 2.9.2 (relaxes its `seqpro<0.15` pin to `<0.16`). Both are now on PyPI.

Test plan

- `tests/dataset/test_flat_variants.py`: 19 tests, including `to_packed` reverse / fancy / explicit-`ListArray`, `rc_` reverse / fancy / mixed-mask, ploidy=2 reordered, and a canonical fast-path regression. All green against the published seqpro 0.15.1 / genoray 2.9.2.
- `-m "not slow"` regression: 618 passed, 0 failed.
- Canonical fast path (`_is_canonical_alleles` is True for freshly built arrays) is untouched, guarded by the existing `test_rc_matches_awkward*` / `test_to_packed_matches_awkward*` tests.

🤖 Generated with Claude Code