fix(reader): graceful handling of missing column index #2693

Open
sandugood wants to merge 2 commits into apache:main from sandugood:fix/2452-graceful-missing-column-index
@sandugood


Which issue does this PR close?

What changes are included in this PR?

Changed get_row_selection_for_filter_predicate to return Ok(None) when the column or offset index is absent, instead of returning an error.
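
A minimal sketch of the pattern described above, for illustration only. The types `FileMetadata` and `RowSelection` and the function `row_selection_for_predicate` are hypothetical stand-ins, not the actual iceberg-rust or parquet crate definitions; the real function signature may differ.

```rust
// Hypothetical stand-ins for illustration; not the real iceberg-rust or parquet types.
#[derive(Debug, Clone, PartialEq)]
pub struct RowSelection(pub Vec<std::ops::Range<usize>>);

#[derive(Debug, Default)]
pub struct FileMetadata {
    /// Per-page (min, max) values; `None` when the file has no column index.
    pub column_index: Option<Vec<(i64, i64)>>,
    /// Per-page row ranges; `None` when the file has no offset index.
    pub offset_index: Option<Vec<std::ops::Range<usize>>>,
}

/// Returns `Ok(None)` when either index is missing, so the caller can skip
/// page-level pruning. Genuinely inconsistent metadata is still an error.
pub fn row_selection_for_predicate(
    meta: &FileMetadata,
    predicate: impl Fn(i64, i64) -> bool,
) -> Result<Option<RowSelection>, String> {
    let (col, off) = match (meta.column_index.as_ref(), meta.offset_index.as_ref()) {
        (Some(col), Some(off)) => (col, off),
        // Missing index: no selection can be computed, but this is not a failure.
        _ => return Ok(None),
    };
    if col.len() != off.len() {
        return Err("column index and offset index disagree on page count".to_string());
    }
    // Keep the row ranges of pages whose (min, max) may satisfy the predicate.
    let kept: Vec<_> = col
        .iter()
        .zip(off)
        .filter(|((min, max), _)| predicate(*min, *max))
        .map(|(_, rows)| rows.clone())
        .collect();
    Ok(Some(RowSelection(kept)))
}
```

Returning `Option` inside `Result` keeps "no page index available" distinct from a real read failure, so callers can fall back without matching on error messages.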

Are these changes tested?

Unit-tested the change in logic, as specified in #2464 (comment).

jd-dlx and others added 2 commits May 18, 2026 08:05
…lection

When row_selection_enabled is true and the Parquet file lacks column or
offset index metadata (common with older/migrated files), the reader now
skips page-level row pruning instead of returning an error.

Row-group filtering via statistics and the ArrowPredicate row filter
still function normally; only page-index-based RowSelection is skipped.
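
The caller-side behavior in this commit message could look roughly like the sketch below. `ReaderPlan` and `build_plan` are hypothetical names invented for illustration, and the single "max statistic per row group" input is a simplifying assumption rather than the reader's real data model.

```rust
use std::ops::Range;

/// Hypothetical record of which pruning stages a scan applies.
#[derive(Debug, Default, PartialEq)]
pub struct ReaderPlan {
    pub row_groups: Vec<usize>,
    pub row_selection: Option<Vec<Range<usize>>>,
    pub row_filter_applied: bool,
}

/// `row_group_max[i]` is the max statistic of row group `i` (assumed layout).
/// `page_selection` is the optional result of page-index pruning; it is `None`
/// when the file carries no column or offset index.
pub fn build_plan(
    row_group_max: &[i64],
    page_selection: Option<Vec<Range<usize>>>,
    lower_bound: i64,
) -> ReaderPlan {
    // Stage 1: row-group pruning from statistics; needs no page index.
    let row_groups = row_group_max
        .iter()
        .enumerate()
        .filter(|(_, max)| **max >= lower_bound)
        .map(|(i, _)| i)
        .collect();
    ReaderPlan {
        row_groups,
        // Stage 2: page-level selection is applied only when one was computed.
        row_selection: page_selection,
        // Stage 3: the row filter still runs on decoded rows in either case.
        row_filter_applied: true,
    }
}
```

With `page_selection` set to `None`, the plan still prunes row groups and still applies the row filter; only the page-level stage is skipped.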

Closes apache#2452
@sandugood

Author

It seems the original PR never received feedback from a committer, and we keep hitting the same bug when reading .parquet files in DataFusion Comet. That is why I iterated on it and introduced the proposed changes.

cc @laskoviymishka, if you have time to review this. Thanks in advance!

@sandugood sandugood changed the title from "fix(reader): graceful missing column index" to "fix(reader): graceful handling of missing column index" Jun 22, 2026
@laskoviymishka

Contributor

I'll take a look this week.

@laskoviymishka laskoviymishka self-requested a review June 22, 2026 09:25
Development

Successfully merging this pull request may close these issues.

Iceberg scan error: Parquet file metadata does not contain a column index

3 participants