fix(fast-path): recurse into child fields when computing next_fid#46
Open
FANNG1 wants to merge 1 commit into
Open
fix(fast-path): recurse into child fields when computing next_fid#46FANNG1 wants to merge 1 commit into
FANNG1 wants to merge 1 commit into
Conversation
8479c6a to
e070c3e
Compare
lance_schema.fields() returns only top-level fields. Nested types (struct, fixed_size_list, blob-v2, etc.) have child fields with their own IDs. max(f.id() for top-level f) skips these, so next_fid can collide with an existing child field ID. The committed fragment metadata then maps the new column file to the wrong schema field, causing lance to read null for the new column. Fix: use recursive _max_field_id() that walks all descendants via f.children(). Fixes daft-engine#41
e070c3e to
9233c54
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #41
Problem
FastPathFragmentWritercomputed the field ID for new columns as:lance_schema.fields()returns only top-level fields. A struct column with M children occupies M+1 consecutive Lance field IDs (1 for the parent + M for the children). The top-level-only scan misses the child IDs, sonext_fidcan collide with an existing child.Example for a table with
id(fid=0) and a structmeta(fid=1) with childrenmeta.a(fid=2) andmeta.b(fid=3):"fields": [2]→ Lance maps tometa.aWith the wrong
next_fid, the committed fragment metadata maps the new file to the wrong schema field. Lance finds no file for the new column's actual field ID and returns null for every row.The same problem occurs with any nested type:
fixed_size_list, blob-v2, maps, etc.Fix
Replace the flat
max()with a recursive_max_field_id()that walks all descendants viaf.children():Test
TestRegressions::test_next_fid_skips_nested_child_field_ids— creates a table with a struct column (occupying 3 field IDs), merges a new scalar column via fast path, and asserts the column reads back as non-null with correct values.