
[server] Add Cluster Health API implementation#3400

Open
swuferhong wants to merge 1 commit into apache:main from swuferhong:fluss-server-recovery


@swuferhong
Contributor

Purpose

Linked issue: close #3399

  • Add GetClusterHealth RPC to Coordinator that computes cluster health from in-memory state
  • Track inactive leaders in CoordinatorContext (marked inactive on NotifyLeaderAndIsr send,
    marked active on successful response when responding server is still the leader)
  • Handle send failures in CoordinatorRequestBatch by synthesizing error responses to clear
    pending inactive state
  • Add client API Admin.getClusterHealth() with ClusterHealth / ClusterHealthStatus types
  • Add ClusterHealthReadinessCheck CLI tool in fluss-dist (exit 0=GREEN, 1=not ready, 2=API unsupported)
  • Add readiness-check.sh two-step readiness probe script (TCP + Cluster Health API)
    with first-boot detection and grace period for API-unsupported (mixed-version rolling upgrade)
  • Wire tablet-server readiness probe to readiness-check.sh in Helm chart
  • Add documentation for Helm deployment and upgrade guide
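The inactive-leader bookkeeping described in the bullets above can be sketched roughly as follows. This is a simplified model rather than the PR's code: the class name `InactiveLeaderTracker`, its method names, and the use of plain `String` bucket ids and `int` server ids are all assumptions made for illustration, and send-failure handling is left out.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the inactive-leader bookkeeping; not the PR's code. */
class InactiveLeaderTracker {
    // Buckets (plain String ids here) whose leader has not yet confirmed leadership.
    private final Set<String> inactive = new HashSet<>();

    /** Called when a NotifyLeaderAndIsr request for the bucket is sent. */
    void onNotifySent(String bucket) {
        inactive.add(bucket);
    }

    /**
     * Called when a response arrives. The bucket becomes active only if the
     * response succeeded AND the responding server is still the current leader.
     */
    void onNotifyResponse(String bucket, int responder, int currentLeader, boolean success) {
        if (success && responder == currentLeader) {
            inactive.remove(bucket);
        }
    }

    boolean isInactive(String bucket) {
        return inactive.contains(bucket);
    }

    public static void main(String[] args) {
        InactiveLeaderTracker tracker = new InactiveLeaderTracker();
        tracker.onNotifySent("bucket-0");
        // A success from a server that is no longer the leader does not activate the bucket.
        tracker.onNotifyResponse("bucket-0", 2, 1, true);
        System.out.println("after stale response, inactive = " + tracker.isInactive("bucket-0"));
        // A success from the current leader does.
        tracker.onNotifyResponse("bucket-0", 1, 1, true);
        System.out.println("after leader response, inactive = " + tracker.isInactive("bucket-0"));
    }
}
```

The check against the current leader matters because leadership can move while a request is in flight; a late acknowledgement from the previous target should not mark the bucket healthy.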

Brief change log

Tests

API and Format

Documentation

@swuferhong force-pushed the fluss-server-recovery branch from fe0c5c1 to 042fc7a on May 29, 2026 03:01
Contributor

@loserwang1024 left a comment

I have left some comments.

* send NotifyLeaderAndIsr until the target server responds successfully confirming it is the
* leader. Also inactive when leader == NO_LEADER.
*/
private final Set<TableBucket> inactiveLeaderBuckets = new HashSet<>();

Naming: inactiveLeaderBuckets → pendingLeaderActivationBuckets

The name over-claims. The set is only populated in sendNotifyLeaderAndIsrRequest — buckets where we've sent a NotifyLeaderAndIsr and await leadership confirmation. Buckets leaderless for other reasons (server offline, rebalance with no new leader yet) aren't in it, though their leader is effectively inactive. So it really means "leader change dispatched, awaiting confirmation" — pendingLeaderActivationBuckets fits. Avoid pendingIsr...: this has nothing to do with ISR (that's a separate dimension driving YELLOW).

Bigger issue: "active leader" is defined twice. computeClusterHealth checks leader != NO_LEADER && liveServers.contains(leader) && !inSet, but the helper only checks the set:

public boolean isLeaderActive(TableBucket bucket) {
    return !inactiveLeaderBuckets.contains(bucket); // ignores liveServers / NO_LEADER
}

So for a leader on a dead server, isLeaderActive() says true while computeClusterHealth() says RED — inconsistent, and a trap for future reuse.

Suggestion: rename the set + make isLeaderActive the single source of truth:

public boolean isLeaderActive(TableBucket tb) {
    return getBucketLeaderAndIsr(tb)
        .map(lai -> lai.leader() != LeaderAndIsr.NO_LEADER
                && liveTabletServers.containsKey(lai.leader())
                && !pendingLeaderActivationBuckets.contains(tb))
        .orElse(false);
}

Then computeClusterHealth just calls ctx.isLeaderActive(tb).
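To make the inconsistency concrete, here is a small self-contained sketch that contrasts the set-only check with the combined check for a leader hosted on a dead server. `LeaderActivityDemo` is a hypothetical stand-in for CoordinatorContext; the `String` bucket ids, the map and set fields, and the `-1` value for `NO_LEADER` are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for CoordinatorContext; names and types are simplified. */
class LeaderActivityDemo {
    static final int NO_LEADER = -1; // assumed sentinel, standing in for LeaderAndIsr.NO_LEADER

    final Map<String, Integer> bucketLeader = new HashMap<>();
    final Set<Integer> liveServers = new HashSet<>();
    final Set<String> pendingLeaderActivationBuckets = new HashSet<>();

    /** Set-only check: ignores liveness and NO_LEADER, as flagged in the review. */
    boolean isLeaderActiveSetOnly(String bucket) {
        return !pendingLeaderActivationBuckets.contains(bucket);
    }

    /** Combined check: leader assigned, on a live server, and not pending activation. */
    boolean isLeaderActive(String bucket) {
        Integer leader = bucketLeader.get(bucket);
        return leader != null
                && leader != NO_LEADER
                && liveServers.contains(leader)
                && !pendingLeaderActivationBuckets.contains(bucket);
    }

    public static void main(String[] args) {
        LeaderActivityDemo ctx = new LeaderActivityDemo();
        ctx.bucketLeader.put("bucket-0", 3); // leader is server 3
        ctx.liveServers.add(1);              // but only server 1 is alive
        System.out.println("set-only check: " + ctx.isLeaderActiveSetOnly("bucket-0"));
        System.out.println("combined check: " + ctx.isLeaderActive("bucket-0"));
    }
}
```

With the leader on a server that is not in `liveServers`, the set-only check reports the leader as active while the combined check does not, which is the divergence the comment warns about.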


// PbClusterHealthStatus: GREEN=0, YELLOW=1, RED=2, UNKNOWN=3
int status;
if (numLeaderReplicas == 0) {

This branch can be merged with the else branch:
else {
    status = 0; // GREEN
}
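A minimal sketch of how the status decision might look once the empty-cluster case falls through to the final branch. The RED and YELLOW conditions here are assumptions inferred from the review comments (an inactive leader yields RED, ISR shrinkage drives YELLOW), and `computeStatus`, `numInactiveLeaders`, and `numUnderReplicated` are hypothetical names; only the numeric values come from the PbClusterHealthStatus comment in the diff.

```java
/** Hypothetical sketch of the status decision; conditions and names are assumed. */
class ClusterHealthStatusSketch {
    // Values taken from the diff comment: GREEN=0, YELLOW=1, RED=2, UNKNOWN=3.
    static final int GREEN = 0;
    static final int YELLOW = 1;
    static final int RED = 2;

    static int computeStatus(int numInactiveLeaders, int numUnderReplicated) {
        int status;
        if (numInactiveLeaders > 0) {
            status = RED;
        } else if (numUnderReplicated > 0) {
            status = YELLOW;
        } else {
            // Also covers a cluster with zero leader replicas: both counters
            // are zero, so no dedicated branch is needed for that case.
            status = GREEN;
        }
        return status;
    }

    public static void main(String[] args) {
        System.out.println("empty cluster -> " + computeStatus(0, 0));
        System.out.println("one inactive leader -> " + computeStatus(1, 0));
    }
}
```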

}

@Override
public CompletableFuture<GetClusterHealthResponse> getClusterHealth(

Why not move this to CoordinatorGateway? Could a tablet server support it?

}
}
if (!offlineReplicas.isEmpty()) {
// trigger replicas to offline

Why was this comment removed?

// re-election via onReplicaBecomeOffline.
List<NotifyLeaderAndIsrResultForBucket> failedResults =
new ArrayList<>();
ApiError sendError =

With this change, even a network timeout exception will cause CoordinatorEventProcessor#processNotifyLeaderAndIsrResponseReceivedEvent to mark the server's replicas as offlineReplicas.

Won't that be a problem?

echo "advertised.listeners: ${ADVERTISED_LISTENERS}" >> $FLUSS_HOME/conf/server.yaml && \

bin/coordinator-server.sh start-foreground
exec bin/coordinator-server.sh start-foreground

Why does this need to change?


Development

Successfully merging this pull request may close these issues.

[server] Support Cluster Health API for safe rolling upgrades