[spark] Fix empty projection causing Invalid metadata length for COUNT(*)/COUNT(1) by Kaixuan-Duan · Pull Request #3227 · apache/fluss

Kaixuan-Duan · 2026-04-28T13:54:13Z

Purpose

Linked issue: close #2724

When Spark pushes down an empty column projection for COUNT(*)/COUNT(1) queries, the Fluss server fails with IllegalStateException("Invalid metadata length") in FileLogProjection.project(), causing the client to retry indefinitely and the query to hang.

This PR fixes the issue from two sides:

Server side: reject empty projection early with a clear InvalidColumnProjectionException instead of crashing with an internal error.
Spark connector side: fall back to projecting the first column when Spark pushes down an empty projection, so the row count is preserved without fetching unnecessary data.
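
The two sides of the fix can be sketched in plain Java as follows. This is a minimal illustration, not the code in this PR: `EmptyProjectionSketch`, `validateProjection`, `effectiveProjection`, and `InvalidProjectionSketchException` are hypothetical names, and the exception is only a stand-in for Fluss's InvalidColumnProjectionException.

```java
import java.util.Arrays;

/** Hypothetical stand-in for Fluss's InvalidColumnProjectionException. */
final class InvalidProjectionSketchException extends RuntimeException {
    InvalidProjectionSketchException(String message) {
        super(message);
    }
}

final class EmptyProjectionSketch {

    /** Server side: reject an empty projection up front with a descriptive error. */
    static void validateProjection(int[] selectedFieldPositions) {
        if (selectedFieldPositions.length == 0) {
            throw new InvalidProjectionSketchException(
                    "Column projection must select at least one field.");
        }
    }

    /**
     * Connector side: when the engine pushes down no columns (as for COUNT(*)),
     * project column 0 so that each row still produces one record to count.
     */
    static int[] effectiveProjection(int[] pushedDown) {
        return pushedDown.length == 0 ? new int[] {0} : pushedDown.clone();
    }

    public static void main(String[] args) {
        int[] projection = effectiveProjection(new int[0]);
        validateProjection(projection); // passes, because the fallback is non-empty
        System.out.println(Arrays.toString(projection)); // prints [0]
    }
}
```

With both pieces in place, a connector that applies the fallback never triggers the server-side guard, while a client that still sends an empty projection gets a clear error instead of an internal failure.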

Brief change log

FileLogProjection#setCurrentProjection: add a guard that throws InvalidColumnProjectionException when selectedFieldPositions is empty.
FileLogProjectionTest: add testEmptyProjectionRejectsWithClearError to verify the server-side guard.
FlussBatch#projection / FlussMicroBatchStream#projection: when readSchema yields an empty projection, fall back to Array(0) (first column).
SparkLogTableReadTest: add COUNT(*) and COUNT(1) end-to-end tests for log tables.
SparkPrimaryKeyTableReadTest: add COUNT(*) end-to-end test for primary key tables.

Tests

./mvnw -pl fluss-common -DskipTests=false -Dtest=FileLogProjectionTest#testEmptyProjectionRejectsWithClearError test

./mvnw -pl fluss-spark/fluss-spark-ut -am install -DskipTests
./mvnw -pl fluss-spark/fluss-spark-ut -Dsuites='org.apache.fluss.spark.SparkLogTableReadTest' test
./mvnw -pl fluss-spark/fluss-spark-ut -Dsuites='org.apache.fluss.spark.SparkPrimaryKeyTableReadTest' test

API and Format

Documentation

luoyuxia · 2026-04-30T03:00:55Z

@Kaixuan-Duan Hi, this seems to be the same as #2725. cc @beryllw

Kaixuan-Duan · 2026-04-30T16:26:21Z

@luoyuxia I’m sorry, that was my fault. I didn't notice that there was already a PR for issue #2724.

beryllw · 2026-05-08T07:34:24Z

I've closed PR #2725. Please go ahead.
Also, is it possible to use the pre-computed table statistics directly when dealing with PK tables, instead of scanning the data and counting afterward?

Kaixuan-Duan · 2026-05-09T15:03:13Z

@beryllw Thanks for the suggestion!
Done. The Spark connector now implements SupportsPushDownAggregates and answers COUNT(*) / COUNT(1) / COUNT(non-null col) (no GROUP BY, no DISTINCT) on PK non-lake tables directly via Admin.getTableStats().getRowCount() — no data scan.
Falls back gracefully on UnsupportedVersionException for older clusters. PTAL.
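
The decision logic described above could look roughly like the sketch below. It is an assumption-laden illustration that uses no Spark or Fluss classes: `CountPushDownSketch`, `CountRequest`, and `tryAnswerFromStats` are hypothetical names, the `LongSupplier` stands in for the Admin.getTableStats().getRowCount() call, and `UnsupportedOperationException` stands in for UnsupportedVersionException.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

final class CountPushDownSketch {

    /** Hypothetical summary of the COUNT aggregate the engine wants to push down. */
    static final class CountRequest {
        final boolean hasGroupBy;
        final boolean isDistinct;
        final boolean argumentIsNullable; // false for COUNT(*), COUNT(1), COUNT(non-null col)

        CountRequest(boolean hasGroupBy, boolean isDistinct, boolean argumentIsNullable) {
            this.hasGroupBy = hasGroupBy;
            this.isDistinct = isDistinct;
            this.argumentIsNullable = argumentIsNullable;
        }
    }

    /**
     * Returns the row count from table statistics when the query shape allows it,
     * or an empty result to tell the caller to fall back to a regular scan.
     */
    static OptionalLong tryAnswerFromStats(
            CountRequest request,
            boolean isPrimaryKeyTable,
            boolean isLakeTable,
            LongSupplier rowCountFromStats) {
        if (!isPrimaryKeyTable || isLakeTable) {
            return OptionalLong.empty();
        }
        if (request.hasGroupBy || request.isDistinct || request.argumentIsNullable) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(rowCountFromStats.getAsLong());
        } catch (UnsupportedOperationException e) {
            // Stand-in for an older cluster that does not support table statistics.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        CountRequest countStar = new CountRequest(false, false, false);
        System.out.println(tryAnswerFromStats(countStar, true, false, () -> 42L));
    }
}
```

Returning an empty result rather than throwing keeps the scan path as the default, so any query shape the sketch does not recognize is still answered by reading the data.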

Yohahaha · 2026-05-11T01:53:38Z

> I've closed PR #2725. Please go ahead. Also, is it possible to use the pre-computed table statistics directly when dealing with PK tables, instead of scanning the data and counting afterward?

@beryllw Curious: can we assume table statistics are always accurate in Fluss?

beryllw · 2026-05-19T06:16:34Z

> @beryllw Curious: can we assume table statistics are always accurate in Fluss?

I think we could assume they're always accurate in Fluss — for KV tables it's backed by rowCount, and for log tables it's derived from getHighWatermark() - logStartOffset().

fluss/fluss-server/src/main/java/org/apache/fluss/server/log/LogTablet.java

Lines 248 to 250 in d4cd1a2

    
    public long getRowCount() {
        return getHighWatermark() - logStartOffset();
    }

fluss/fluss-server/src/main/java/org/apache/fluss/server/kv/KvTablet.java

Lines 307 to 318 in c6a1c2d

    
    public long getRowCount() {
        if (rowCount == ROW_COUNT_DISABLED) {
            throw new InvalidTableException(
                    String.format(
                            "Row count is disabled for this table '%s'. This usually happens when the table is"
                                    + "created before v0.9 or the changelog image is set to WAL, "
                                    + "as maintaining row count in WAL mode is costly and not necessary for most use cases. "
                                    + "If you want to enable row count, please set changelog image to FULL.",
                            getTablePath()));
        }
        return rowCount;
    }
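
To make the two quoted methods concrete, here is a small self-contained sketch of what they compute. It does not use Fluss classes: `RowCountSketch` is a hypothetical name, the value `-1` for `ROW_COUNT_DISABLED` is an assumption (the quoted code does not show the constant's value), and `IllegalStateException` stands in for InvalidTableException.

```java
final class RowCountSketch {

    // Assumed sentinel value; the real constant's value is not shown in the quoted code.
    static final long ROW_COUNT_DISABLED = -1L;

    /** Log tablet: the row count is the offset range between log start and high watermark. */
    static long logRowCount(long highWatermark, long logStartOffset) {
        return highWatermark - logStartOffset;
    }

    /** KV tablet: return the maintained counter, or fail when counting is disabled. */
    static long kvRowCount(long rowCount) {
        if (rowCount == ROW_COUNT_DISABLED) {
            throw new IllegalStateException("Row count is disabled for this table.");
        }
        return rowCount;
    }

    public static void main(String[] args) {
        // A log whose start offset is 500 and whose high watermark is 1500.
        System.out.println(logRowCount(1500L, 500L)); // prints 1000
        System.out.println(kvRowCount(7L)); // prints 7
    }
}
```

The disabled case matters for callers that answer COUNT from statistics: when the KV row count is unavailable, they need a fallback such as scanning the table.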

Yohahaha reviewed Apr 30, 2026

Comment thread fluss-common/src/main/java/org/apache/fluss/record/FileLogProjection.java Outdated

Yohahaha reviewed Apr 30, 2026

Comment thread fluss-spark/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkLogTableReadTest.scala Outdated

beryllw reviewed May 8, 2026

Comment thread ...park/fluss-spark-ut/src/test/scala/org/apache/fluss/spark/SparkPrimaryKeyTableReadTest.scala Outdated

Kaixuan-Duan force-pushed the issue-2724-empty-projection branch from 5f4d23a to 91624a4 Compare May 9, 2026 19:29

Kaixuan-Duan requested a review from beryllw May 18, 2026 01:45

Kaixuan-Duan force-pushed the issue-2724-empty-projection branch from 91624a4 to 1fb013b Compare May 26, 2026 14:34

Kaixuan-Duan added 2 commits June 2, 2026 16:06

[spark] Fix empty projection causing Invalid metadata length for COUN…

cf03dee

…T(*)/COUNT(1)

Address feedback

db85a8d

Kaixuan-Duan force-pushed the issue-2724-empty-projection branch from c61edcc to 036af32 Compare June 2, 2026 09:06

Resolve conflicts

8602e38

Kaixuan-Duan force-pushed the issue-2724-empty-projection branch from 036af32 to 8602e38 Compare June 2, 2026 10:25

Kaixuan-Duan wants to merge 3 commits into apache:main from Kaixuan-Duan:issue-2724-empty-projection
