fix(fast-path): preserve Arrow type when building new-column table #44

Open
FANNG1 wants to merge 1 commit into
daft-engine:main from
FANNG1:fix/fast-path-type-erasure
@FANNG1 FANNG1 commented Jun 26, 2026

Fixes #40

Problem

FastPathFragmentWriter built the new-column Arrow table via:

arr = pa.array(s.to_pylist() if hasattr(s, "to_pylist") else list(s))

pa.array(s.to_pylist()) discards Arrow type information, because pyarrow has to re-infer the type from plain Python objects. Python floats are inferred as float64 and Python lists carry no fixed-size constraint, so a fixed_size_list<item: float>[N] column (e.g. an embedding vector from a @daft.func.batch(return_dtype=DataType.fixed_size_list(DataType.float32(), 128)) UDF) was silently widened to list<item: double>. Lance then committed the wrong type to the dataset schema.

Fix

Use s.to_arrow() instead, calling combine_chunks() when the result is a ChunkedArray. This preserves the Arrow type that daft declared in the UDF's return_dtype.

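if hasattr(s, "to_arrow"):
    # Keep the Arrow type declared by the UDF's return_dtype.
    arr = s.to_arrow()
    if isinstance(arr, pa.ChunkedArray):
        # Merge the chunks into a single contiguous array.
        arr = arr.combine_chunks()
else:
    # Fallback for plain iterables; the type is inferred from Python values.
    arr = pa.array(list(s))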

Test

TestRegressions::test_fixed_size_list_float32_type_preserved: creates a UDF returning fixed_size_list<float32>[8], merges via the fast path, and asserts that the Lance schema preserves the type and that the values are non-null.

pa.array(s.to_pylist()) loses type information: fixed_size_list<float32>[N]
becomes list<double> because Python floats are float64 and list structure is
inferred from Python lists. Use s.to_arrow().combine_chunks() to preserve
the declared daft return type exactly.

Fixes daft-engine#40
@FANNG1 FANNG1 force-pushed the fix/fast-path-type-erasure branch from bed8628 to 48f49ca Compare June 26, 2026 09:12
Development

Successfully merging this pull request may close these issues.

bug(fast-path): fixed_size_list<float32>[N] type erased to list<float64> by FastPathFragmentWriter