fix(fast-path): use count_rows() instead of collect() in _can_use_fast_path by FANNG1 · Pull Request #45 · daft-engine/daft-lance

FANNG1 · 2026-06-26T01:06:28Z

Fixes #42

Problem

_can_use_fast_path used len(df.collect()) to count rows. collect() populates df._result_cache with the materialised results. When the pipeline includes take_blobs(), those cached results include BlobFile instances — one-shot streams whose .read() returns b"" after the first call.

The fast-path merge subsequently calls df.groupby("fragment_id").map_groups(...). Because _result_cache is already set, Daft returns the same exhausted BlobFile objects instead of re-materialising from Lance. Downstream UDFs that call blob.read() receive empty bytes and produce null or incorrect output.

Fix

Replace len(df.collect()) with df.count_rows(). count_rows() does not set _result_cache, so the pipeline re-executes fresh when the merge runs.

Test

TestRegressions::test_fast_path_check_does_not_set_result_cache — verifies that _can_use_fast_path leaves df._result_cache unset after it returns, confirming count_rows() (not collect()) is used internally.

…t_path collect() populates df._result_cache with one-shot Python objects (e.g. BlobFile from take_blobs()). A subsequent groupby().map_groups() re-uses the same df object, receiving the same exhausted objects from the cache; downstream UDFs that call blob.read() get b'' and produce null/wrong output. count_rows() does not set _result_cache, so the pipeline re-executes fresh. Fixes daft-engine#42

FANNG1 mentioned this pull request Jun 26, 2026

fix(fast-path): three bugs in FastPathFragmentWriter and _can_use_fast_path #43

Closed

FANNG1 force-pushed the fix/fast-path-collect-sideeffect branch from 0e7a835 to 43da0eb Compare June 26, 2026 09:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(fast-path): use count_rows() instead of collect() in _can_use_fast_path#45

fix(fast-path): use count_rows() instead of collect() in _can_use_fast_path#45
FANNG1 wants to merge 1 commit into
daft-engine:mainfrom
FANNG1:fix/fast-path-collect-sideeffect

FANNG1 commented Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

FANNG1 commented Jun 26, 2026

Problem

Fix

Test

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant