fix: write valid empty BinaryRow bytes for empty table stats#349
Conversation
Empty `BinaryTableStats` previously stored `Vec::new()` for `min_values` / `max_values`. Java readers parse those fields via `SerializationUtils.deserializeBinaryRow`, which requires at minimum a 4-byte big-endian arity prefix and throws `BufferUnderflowException` on zero-length input. The most visible casualty is a non-partitioned table written via paimon-rust: any Spark/Flink read of its manifest list crashes inside `SimpleStats.fromRow` because `_PARTITION_STATS._MIN_VALUES` is empty bytes. Introduce `BinaryTableStats::empty()` that returns stats backed by `EMPTY_SERIALIZED_ROW` (the arity=0 BinaryRow, 12 bytes) and route every existing call site through it: the production fix in `compute_partition_stats`, the Avro decode fallbacks for missing `_PARTITION_STATS` / `key_stats` / `value_stats`, and the test fixtures that copied the old pattern. Add regression tests that round-trip the empty stats through `BinaryRow::from_serialized_bytes` — the same protocol the Java reader uses — so future copies of `vec![]` are caught locally. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JingsongLi
left a comment
There was a problem hiding this comment.
Thanks for the fix. The non-partitioned and empty-entry paths are covered now, but one write path can still emit zero-length min/max stats.
In crates/paimon/src/table/table_commit.rs, compute_partition_stats still calls datums_to_binary_row for the aggregated partition min/max rows. If every value for every partition field is NULL, both mins and maxs remain all None, so datums_to_binary_row returns Vec::new() while null_counts is non-empty. Java SimpleStats.fromRow still calls SerializationUtils.deserializeBinaryRow(row.getBinary(0/1)) before it checks for EMPTY_STATS, so a partitioned table whose committed partition key is null can hit the same BufferUnderflowException this PR is fixing.
Could you make this path write decodable min/max bytes too, for example a serialized BinaryRow with the partition arity and null bits set, and add a regression test for a partitioned table with an all-null partition key?
Verification: cargo test -p paimon passes locally on this PR.
The previous commit covered the empty-stats path (no partition columns or no entries), but `compute_partition_stats` could still emit zero-length `_MIN_VALUES` / `_MAX_VALUES` for a partitioned table whose committed entries carry NULL partition values. After the per-entry loop, when every collected min/max stays `None`, the path called `spec::datums_to_binary_row(&all_none)` which intentionally collapses to `vec![]` — a valid "no partition columns" sentinel used by `partitions_to_bytes`, but a protocol violation here. The accompanying `null_counts` is non-empty (incremented per null observed), so Java's `SimpleStats.fromRow` cannot short-circuit to `EMPTY_STATS` and crashes inside `SerializationUtils.deserializeBinaryRow` on the first `buffer.getInt()` — the same `BufferUnderflowException` the parent fix targets. Stop routing this through `datums_to_binary_row`. Add a tiny `build_partition_stats_row` helper that always produces a protocol-valid serialized `BinaryRow` with the partition schema's arity, setting the null bit for every `None` datum (mirroring Java's `SimpleStatsConverter.toBinaryAllMode` with a `GenericRow(num_fields)`). `datums_to_binary_row` is left alone because `partitions_to_bytes` relies on its `vec![]` sentinel. Add a regression test `compute_partition_stats_all_null_partition_values_returns_decodable_bytes` that hand-builds an entry with a valid arity-1 BinaryRow whose only partition column is NULL, runs `compute_partition_stats`, and asserts the min/max bytes decode back to an arity-1 BinaryRow with the null bit set and `null_counts == [Some(1)]`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per review feedback the prior commit's inline rationale was too long. The WHY lives in the commit message and the regression test name; keep only a one-line doc on the helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Good catch — pushed b23f870 + dc62a5e: stops routing compute_partition_stats's min/max through datums_to_binary_row (whose all-None → vec![] shortcut is a partitions_to_bytes sentinel, not safe here when null_counts is non-empty), uses a new build_partition_stats_row that always emits an arity-N BinaryRow with null bits set, and adds a regression test for the all-null partition row case. |
…nary-table-stats # Conflicts: # crates/paimon/src/table/table_commit.rs
JingsongLi
left a comment
There was a problem hiding this comment.
Re-reviewed after the all-null partition stats follow-up. The new build_partition_stats_row path keeps min/max bytes decodable for the arity-N all-null case while preserving datums_to_binary_row for partition sentinel usage, and the added regressions cover the empty/non-partitioned/all-null cases.
I did not find further blocking issues.
Verification:
cargo test -p paimon table_commit::tests::compute_partition_statscargo test -p paimon empty_stats_carries_arity_prefix_parseable_by_reader
Purpose
A non-partitioned table written via
paimon-rust(e.g. throughfusion/pypaimon) is unreadable by the Java reader. Spark/Flink/Hive crash withBufferUnderflowExceptioninsideSerializationUtils.deserializeBinaryRowthe moment they try to scan the manifest list:Repro — any PK / append table without partitions:
Root cause —
BinaryTableStatswas being constructed withVec::new()formin_values/max_valueswhen there were no columns to collect stats for. The Java reader usesSerializationUtils.deserializeBinaryRow, which requires at minimum a 4-byte big-endian arity prefix (anEMPTY_ROWserializes to 12 bytes: 4-byte arity + 8-byte null bit set). Zero-length input fails at the very firstbuffer.getInt().The single hot path is
compute_partition_statsintable_commit.rs, which returns this stats record whenever a non-partitioned table is committed — which is why the failure is so easy to hit. The same wrong empty value is repeated in several other places (Avro decode fallbacks, test fixtures) and would re-emit the same broken bytes if those paths ever round-tripped to disk.Brief change log
BinaryTableStats::empty()(crates/paimon/src/spec/stats.rs) that returns stats backed by the existingEMPTY_SERIALIZED_ROWconstant (arity-0 BinaryRow, 12 bytes), with a doc comment explaining the Java-side protocol contract.compute_partition_statsincrates/paimon/src/table/table_commit.rsnow returnsBinaryTableStats::empty()instead of zero-length bytes when there are no partition fields or no entries._PARTITION_STATS/key_stats/value_statsreconstitutes as protocol-valid empty stats (crates/paimon/src/spec/avro/manifest_file_meta_decode.rs,crates/paimon/src/spec/avro/manifest_entry_decode.rs).crates/paimon/src/spec/manifest.rs,crates/paimon/src/table/referenced_files.rs,crates/paimon/src/table/data_evolution_writer.rs,crates/paimon/src/table/table_commit.rstest helpers).Tests
Three new unit tests, all round-tripping the empty stats through
BinaryRow::from_serialized_bytes— the Rust mirror of Java'sSerializationUtils.deserializeBinaryRow, so passing here ≡ passing on the Java reader:spec::stats::tests::empty_stats_carries_arity_prefix_parseable_by_reader—BinaryTableStats::empty()produces bytes ≥ 4 bytes long and decodes back to an arity=0 BinaryRow.table::table_commit::tests::compute_partition_stats_no_partition_fields_returns_decodable_empty— directly reproduces the user-visible case (non-partitioned table with one entry).table::table_commit::tests::compute_partition_stats_empty_entries_returns_decodable_empty— covers the other fallback branch (partitioned schema, no entries).Local verification:
cargo build -p paimon— okcargo test -p paimon --lib— 675 passed, 0 failedcargo clippy -p paimon --all-targets -- -D warnings— cleancargo fmt --check— cleanAPI and Format
No on-disk schema change. The fix restores conformance with the existing binary format that the Java reference implementation expects.
Public API: adds
BinaryTableStats::empty()constructor. No breaking changes —BinaryTableStats::new(...)is untouched and remains the way to construct stats with real values.Documentation
No documentation changes required. The new
empty()method carries a rustdoc comment explaining when to use it and whyVec::new()would break the Java reader; that captures the contract at the point where future contributors will look.