
Document repetition / definition levels more #10101

Draft
alamb wants to merge 3 commits into apache:main from alamb:alamb/document_repetition_definition

alamb (Contributor) commented Jun 9, 2026

Which issue does this PR close?

Rationale for this change

While reading #9849, I spent some non-trivial time reviewing definition and repetition levels, and I wanted to encode this knowledge as comments for my future self.

What changes are included in this PR?

Adds comments documenting definition and repetition levels.

Are these changes tested?

Doc-only change, covered by CI.

Are there any user-facing changes?

Docs only (of internal structures).

github-actions bot added the parquet label (Changes to the parquet crate) Jun 9, 2026
Comment thread on `parquet/src/arrow/array_reader/mod.rs`:

```rust
/// * a [`StructArrayReader`] has no `empty` state — only present `d >= D` vs
/// null `d < D`.
///
/// **Repetition level** — where a value attaches relative to this reader's list:
```
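To make the level semantics in the doc comment above concrete, here is a minimal Rust sketch that reassembles rows from levels. It assumes a hypothetical `optional list<optional int32>` column in the standard three-level LIST layout (max definition level 3, max repetition level 1). `assemble_rows` is an illustrative name, not part of the `parquet` crate, and it performs only minimal validation of the two level streams.

```rust
/// Hypothetical sketch, not the `parquet` crate's reader: reassemble rows of an
/// `optional list<optional int32>` column stored with the standard three-level
/// LIST layout, so the max definition level is 3 and the max repetition level is 1.
///
/// Definition levels:
/// * 0: the list itself is null
/// * 1: the list is present but empty
/// * 2: the list has an entry, but the item is null
/// * 3: the item is present and consumes the next value from `values`
///
/// Repetition levels: 0 starts a new row; 1 appends to the current row's list.
fn assemble_rows(rep: &[i16], def: &[i16], values: &[i32]) -> Vec<Option<Vec<Option<i32>>>> {
    assert_eq!(rep.len(), def.len(), "level slices must have equal length");
    let mut values = values.iter().copied();
    let mut rows: Vec<Option<Vec<Option<i32>>>> = Vec::new();
    for (&r, &d) in rep.iter().zip(def) {
        if r == 0 {
            // New row: def 0 means a null list; any higher level means the list exists.
            rows.push(if d == 0 { None } else { Some(Vec::new()) });
        }
        if d < 2 {
            // Null or empty list: there is no item to append.
            continue;
        }
        // d >= 2: this entry is an item of the current row's list.
        let list = rows
            .last_mut()
            .and_then(|row| row.as_mut())
            .expect("an item must belong to a non-null list");
        match d {
            2 => list.push(None),
            3 => list.push(Some(values.next().expect("not enough values"))),
            _ => panic!("definition level {d} exceeds the max of 3"),
        }
    }
    rows
}
```

For example, `rep = [0, 1, 0, 0, 0]` and `def = [3, 2, 0, 1, 3]` with values `[1, 2]` decode to the rows `[[1, null], null, [], [2]]`. Because the sketch trusts the levels, `rep = [0, 1]` with `def = [1, 3]` silently turns the row that def level 1 marked as an empty list into a one-item list.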


I'm now wondering whether there can actually be inconsistencies between repetition and definition levels, which might lead to different implementations interpreting the same file differently. Consider the following level data for a required list with required items:

Rep levels: [0, 1]
Def levels: [0, 1]

The first row should be an empty list according to its def level (0, below the max of 1), but then the second rep level (1) indicates a continuation of, i.e. an insert into, the previously started list.
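The inconsistency described above can be detected mechanically for a single list level. The following Rust sketch is a hypothetical helper (not an existing `parquet` crate API): it flags any entry whose repetition level continues a list that the previous entry's definition level marked as null or empty. For nested lists it is a necessary check, not a complete validation.

```rust
/// Hypothetical helper, not a `parquet` crate API: for a single list level, check
/// that repetition levels never continue a list that the definition levels mark
/// as null or empty.
///
/// * `list_def`: the definition level at which the list has at least one entry
///   (the definition level of its `repeated` group)
/// * `list_rep`: the repetition level of the list (assumed to be >= 1)
///
/// Returns `Err(i)` with the index of the first inconsistent entry.
fn check_list_levels(rep: &[i16], def: &[i16], list_def: i16, list_rep: i16) -> Result<(), usize> {
    assert_eq!(rep.len(), def.len(), "level slices must have equal length");
    for i in 0..rep.len() {
        if rep[i] >= list_rep {
            // A continuation within this list (or a deeper one). It is only valid if
            // there is a preceding entry, that entry was an element of the list
            // (not a null/empty marker), and this entry is an element too.
            if i == 0 || def[i - 1] < list_def || def[i] < list_def {
                return Err(i);
            }
        }
    }
    Ok(())
}
```

With `list_def = 1` and `list_rep = 1` (a required list of required items), the levels from the comment, `rep = [0, 1]` and `def = [0, 1]`, are rejected at index 1, while `rep = [0, 0, 1]` with `def = [0, 1, 1]` (the rows `[[], [a, b]]`) pass.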



3 participants