Skip to content

fix: write valid empty BinaryRow bytes for empty table stats#349

Merged
JingsongLi merged 4 commits into
apache:mainfrom
shyjsarah:worktree-fix-empty-binary-table-stats
Jun 2, 2026
Merged

fix: write valid empty BinaryRow bytes for empty table stats#349
JingsongLi merged 4 commits into
apache:mainfrom
shyjsarah:worktree-fix-empty-binary-table-stats

Conversation

@shyjsarah
Copy link
Copy Markdown
Contributor

Purpose

A non-partitioned table written via paimon-rust (e.g. through fusion / pypaimon) is unreadable by the Java reader. Spark/Flink/Hive crash with BufferUnderflowException inside SerializationUtils.deserializeBinaryRow the moment they try to scan the manifest list:

Caused by: java.nio.BufferUnderflowException
    at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:435)
    at org.apache.paimon.utils.SerializationUtils.deserializeBinaryRow(SerializationUtils.java:85)
    at org.apache.paimon.stats.SimpleStats.fromRow(SimpleStats.java:97)
    at org.apache.paimon.manifest.ManifestFileMetaSerializer.convertFrom(ManifestFileMetaSerializer.java:75)
    ...

Repro — any PK / append table without partitions:

CREATE TABLE test_pk (order_id INT, price STRING, customer STRING, PRIMARY KEY (order_id) NOT ENFORCED);
INSERT INTO test_pk VALUES (1, 'TEST', 'TEST');   -- written via paimon-rust
SELECT * FROM test_pk;                            -- read via Spark → crash

Root causeBinaryTableStats was being constructed with Vec::new() for min_values / max_values when there were no columns to collect stats for. The Java reader uses SerializationUtils.deserializeBinaryRow, which requires at minimum a 4-byte big-endian arity prefix (an EMPTY_ROW serializes to 12 bytes: 4-byte arity + 8-byte null bit set). Zero-length input fails at the very first buffer.getInt().

The single hot path is compute_partition_stats in table_commit.rs, which returns this stats record whenever a non-partitioned table is committed — which is why the failure is so easy to hit. The same wrong empty value is repeated in several other places (Avro decode fallbacks, test fixtures) and would re-emit the same broken bytes if those paths ever round-tripped to disk.

Brief change log

  • Add BinaryTableStats::empty() (crates/paimon/src/spec/stats.rs) that returns stats backed by the existing EMPTY_SERIALIZED_ROW constant (arity-0 BinaryRow, 12 bytes), with a doc comment explaining the Java-side protocol contract.
  • Fix the production write path: compute_partition_stats in crates/paimon/src/table/table_commit.rs now returns BinaryTableStats::empty() instead of zero-length bytes when there are no partition fields or no entries.
  • Fix Avro decode fallbacks so a missing _PARTITION_STATS / key_stats / value_stats reconstitutes as protocol-valid empty stats (crates/paimon/src/spec/avro/manifest_file_meta_decode.rs, crates/paimon/src/spec/avro/manifest_entry_decode.rs).
  • Migrate test fixtures off the bad pattern (crates/paimon/src/spec/manifest.rs, crates/paimon/src/table/referenced_files.rs, crates/paimon/src/table/data_evolution_writer.rs, crates/paimon/src/table/table_commit.rs test helpers).

Tests

Three new unit tests, all round-tripping the empty stats through BinaryRow::from_serialized_bytes — the Rust mirror of Java's SerializationUtils.deserializeBinaryRow, so passing here ≡ passing on the Java reader:

  • spec::stats::tests::empty_stats_carries_arity_prefix_parseable_by_readerBinaryTableStats::empty() produces bytes ≥ 4 bytes long and decodes back to an arity=0 BinaryRow.
  • table::table_commit::tests::compute_partition_stats_no_partition_fields_returns_decodable_empty — directly reproduces the user-visible case (non-partitioned table with one entry).
  • table::table_commit::tests::compute_partition_stats_empty_entries_returns_decodable_empty — covers the other fallback branch (partitioned schema, no entries).

Local verification:

  • cargo build -p paimon — ok
  • cargo test -p paimon --lib — 675 passed, 0 failed
  • cargo clippy -p paimon --all-targets -- -D warnings — clean
  • cargo fmt --check — clean

API and Format

No on-disk schema change. The fix restores conformance with the existing binary format that the Java reference implementation expects.

Public API: adds BinaryTableStats::empty() constructor. No breaking changes — BinaryTableStats::new(...) is untouched and remains the way to construct stats with real values.

Documentation

No documentation changes required. The new empty() method carries a rustdoc comment explaining when to use it and why Vec::new() would break the Java reader; that captures the contract at the point where future contributors will look.

Empty `BinaryTableStats` previously stored `Vec::new()` for `min_values` /
`max_values`. Java readers parse those fields via
`SerializationUtils.deserializeBinaryRow`, which requires at minimum a
4-byte big-endian arity prefix and throws `BufferUnderflowException` on
zero-length input. The most visible casualty is a non-partitioned table
written via paimon-rust: any Spark/Flink read of its manifest list crashes
inside `SimpleStats.fromRow` because `_PARTITION_STATS._MIN_VALUES` is
empty bytes.

Introduce `BinaryTableStats::empty()` that returns stats backed by
`EMPTY_SERIALIZED_ROW` (the arity=0 BinaryRow, 12 bytes) and route every
existing call site through it: the production fix in
`compute_partition_stats`, the Avro decode fallbacks for missing
`_PARTITION_STATS` / `key_stats` / `value_stats`, and the test fixtures
that copied the old pattern.

Add regression tests that round-trip the empty stats through
`BinaryRow::from_serialized_bytes` — the same protocol the Java reader
uses — so future copies of `vec![]` are caught locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. The non-partitioned and empty-entry paths are covered now, but one write path can still emit zero-length min/max stats.

In crates/paimon/src/table/table_commit.rs, compute_partition_stats still calls datums_to_binary_row for the aggregated partition min/max rows. If every value for every partition field is NULL, both mins and maxs remain all None, so datums_to_binary_row returns Vec::new() while null_counts is non-empty. Java SimpleStats.fromRow still calls SerializationUtils.deserializeBinaryRow(row.getBinary(0/1)) before it checks for EMPTY_STATS, so a partitioned table whose committed partition key is null can hit the same BufferUnderflowException this PR is fixing.

Could you make this path write decodable min/max bytes too, for example a serialized BinaryRow with the partition arity and null bits set, and add a regression test for a partitioned table with an all-null partition key?

Verification: cargo test -p paimon passes locally on this PR.

shaoyijie and others added 2 commits June 2, 2026 10:35
The previous commit covered the empty-stats path (no partition columns
or no entries), but `compute_partition_stats` could still emit
zero-length `_MIN_VALUES` / `_MAX_VALUES` for a partitioned table whose
committed entries carry NULL partition values. After the per-entry loop,
when every collected min/max stays `None`, the path called
`spec::datums_to_binary_row(&all_none)` which intentionally collapses
to `vec![]` — a valid "no partition columns" sentinel used by
`partitions_to_bytes`, but a protocol violation here. The accompanying
`null_counts` is non-empty (incremented per null observed), so Java's
`SimpleStats.fromRow` cannot short-circuit to `EMPTY_STATS` and crashes
inside `SerializationUtils.deserializeBinaryRow` on the first
`buffer.getInt()` — the same `BufferUnderflowException` the parent fix
targets.

Stop routing this through `datums_to_binary_row`. Add a tiny
`build_partition_stats_row` helper that always produces a protocol-valid
serialized `BinaryRow` with the partition schema's arity, setting the
null bit for every `None` datum (mirroring Java's
`SimpleStatsConverter.toBinaryAllMode` with a `GenericRow(num_fields)`).
`datums_to_binary_row` is left alone because `partitions_to_bytes`
relies on its `vec![]` sentinel.

Add a regression test
`compute_partition_stats_all_null_partition_values_returns_decodable_bytes`
that hand-builds an entry with a valid arity-1 BinaryRow whose only
partition column is NULL, runs `compute_partition_stats`, and asserts
the min/max bytes decode back to an arity-1 BinaryRow with the null
bit set and `null_counts == [Some(1)]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback the prior commit's inline rationale was too long.
The WHY lives in the commit message and the regression test name;
keep only a one-line doc on the helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shyjsarah
Copy link
Copy Markdown
Contributor Author

Thanks for the fix. The non-partitioned and empty-entry paths are covered now, but one write path can still emit zero-length min/max stats.

In crates/paimon/src/table/table_commit.rs, compute_partition_stats still calls datums_to_binary_row for the aggregated partition min/max rows. If every value for every partition field is NULL, both mins and maxs remain all None, so datums_to_binary_row returns Vec::new() while null_counts is non-empty. Java SimpleStats.fromRow still calls SerializationUtils.deserializeBinaryRow(row.getBinary(0/1)) before it checks for EMPTY_STATS, so a partitioned table whose committed partition key is null can hit the same BufferUnderflowException this PR is fixing.

Could you make this path write decodable min/max bytes too, for example a serialized BinaryRow with the partition arity and null bits set, and add a regression test for a partitioned table with an all-null partition key?

Verification: cargo test -p paimon passes locally on this PR.

Good catch — pushed b23f870 + dc62a5e: stops routing compute_partition_stats's min/max through datums_to_binary_row (whose all-None → vec![] shortcut is a partitions_to_bytes sentinel, not safe here when null_counts is non-empty), uses a new build_partition_stats_row that always emits an arity-N BinaryRow with null bits set, and adds a regression test for the all-null partition row case.

…nary-table-stats

# Conflicts:
#	crates/paimon/src/table/table_commit.rs
@shyjsarah shyjsarah closed this Jun 2, 2026
@shyjsarah shyjsarah reopened this Jun 2, 2026
@JingsongLi JingsongLi closed this Jun 2, 2026
@JingsongLi JingsongLi reopened this Jun 2, 2026
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after the all-null partition stats follow-up. The new build_partition_stats_row path keeps min/max bytes decodable for the arity-N all-null case while preserving datums_to_binary_row for partition sentinel usage, and the added regressions cover the empty/non-partitioned/all-null cases.

I did not find further blocking issues.

Verification:

  • cargo test -p paimon table_commit::tests::compute_partition_stats
  • cargo test -p paimon empty_stats_carries_arity_prefix_parseable_by_reader

@JingsongLi JingsongLi merged commit d23cd95 into apache:main Jun 2, 2026
20 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants