fix: stop reconciling on standby management clusters #7

Open
jiazhiguang wants to merge 1 commit into release-1.10-alauda from fix/stop-reconcile-in-standby-cluster

@jiazhiguang
Collaborator

Add a standby reconciler wrapper backed by the system etcd-sync ConfigMap. Wrapped controllers skip reconciliation while the management cluster is acting as DR standby, and fail closed when the standby state cannot be determined.

Wire the wrapper into core, topology, experimental, runtime, CRD migrator, Machine, MachineSet, MachineDeployment, MachineHealthCheck, MachinePool, ClusterResourceSet, and ClusterResourceSetBinding reconcilers, and expose the system namespace through a manager flag.

Harden Machine node cleanup by validating NodeRef UID and providerID before drain, volume detach waits, or delete operations, and use UID preconditions for Node deletion to avoid mutating replacement nodes.
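
The replacement-node safeguard can be illustrated with a small, dependency-free sketch. All types and function names below are hypothetical, and the policy of skipping a comparison when a recorded value is empty is an assumption, not something this PR states. In a real cluster the UID check on delete is enforced server-side through delete preconditions (`metav1.Preconditions` has a `UID` field); the in-memory `nodeStore` only imitates that behavior.

```go
package main

import "fmt"

// NodeRef is a trimmed-down, hypothetical view of what a Machine records about its Node.
type NodeRef struct {
	Name string
	UID  string
}

// Node is a trimmed-down, hypothetical view of the live Node object.
type Node struct {
	Name       string
	UID        string
	ProviderID string
}

// verifyNodeIdentity returns an error when the live Node is not the one the
// Machine recorded, for example a replacement Node that reused the same name.
func verifyNodeIdentity(ref NodeRef, machineProviderID string, live Node) error {
	if ref.Name != live.Name {
		return fmt.Errorf("node name mismatch: ref %q, live %q", ref.Name, live.Name)
	}
	// Assumption: an empty recorded UID means "nothing to compare".
	if ref.UID != "" && ref.UID != live.UID {
		return fmt.Errorf("node %q UID mismatch: ref %q, live %q", ref.Name, ref.UID, live.UID)
	}
	// Assumption: compare providerIDs only when both sides are set.
	if machineProviderID != "" && live.ProviderID != "" && machineProviderID != live.ProviderID {
		return fmt.Errorf("node %q providerID mismatch: machine %q, live %q",
			ref.Name, machineProviderID, live.ProviderID)
	}
	return nil
}

// nodeStore imitates an API server that honors a UID precondition on delete.
type nodeStore map[string]Node

// deleteWithUID deletes the named Node only if its UID still matches.
func (s nodeStore) deleteWithUID(name, uid string) error {
	live, ok := s[name]
	if !ok {
		return nil // already gone
	}
	if live.UID != uid {
		return fmt.Errorf("precondition failed: node %q has UID %q, expected %q", name, live.UID, uid)
	}
	delete(s, name)
	return nil
}
```

Running the identity check before drain and volume detach waits, and repeating it as a precondition on delete, covers the window in which a Node is replaced between the two steps.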

Add unit coverage for standby detection/wrapping and replacement-node cleanup safeguards.

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • master
  • ^\d.x$

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

jiazhiguang force-pushed the fix/stop-reconcile-in-standby-cluster branch from 666f522 to 871b91a on June 4, 2026 03:08

Make standby handling cluster-aware so standby management clusters continue reconciling the global Cluster while skipping non-global business cluster resources.

Use global-aware wrappers for Cluster, Machine, MachineSet, MachineDeployment, MachineHealthCheck, MachinePool, topology, and ClusterResourceSetBinding reconcilers. Leave global/no-cluster controllers such as ClusterClass, ExtensionConfig, and CRD migrator unguarded for now.
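
The cluster-aware decision reduces to a small predicate. The sketch below is illustrative only: the value `"global"` for `GlobalClusterName` is assumed (the PR names the constant but this page does not show its value), and deriving the owning cluster from the upstream Cluster API `cluster.x-k8s.io/cluster-name` label is one plausible approach, not necessarily the one used here.

```go
package main

// GlobalClusterName is the reserved name of the global cluster.
// The value is an assumption for illustration.
const GlobalClusterName = "global"

// clusterNameLabel is the upstream Cluster API label carrying the owning cluster's name.
const clusterNameLabel = "cluster.x-k8s.io/cluster-name"

// clusterNameOf returns the owning cluster name recorded in an object's labels,
// or "" when the label is absent.
func clusterNameOf(labels map[string]string) string {
	return labels[clusterNameLabel]
}

// shouldReconcile reports whether an object belonging to clusterName may be
// reconciled. On the active cluster everything is reconciled; on standby only
// the global cluster is. An empty cluster name is skipped on standby.
func shouldReconcile(standby bool, clusterName string) bool {
	if !standby {
		return true
	}
	return clusterName == GlobalClusterName
}
```

For example, `shouldReconcile(true, clusterNameOf(machine.Labels))` would skip a Machine of a business cluster on standby while still allowing Machines of the global cluster.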

Filter ClusterResourceSet target clusters in standby so only global clusters are applied. During deletion, keep the ClusterResourceSet finalizer and requeue if standby filtering skipped non-global clusters, so their ClusterResourceSetBindings can be cleaned up after failback.
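
A rough sketch of the filtering and the deletion decision described above, under stated assumptions: clusters are represented by name only, `GlobalClusterName` is assumed to be `"global"`, and `filterForStandby`, `deleteDecision`, and `decideDelete` are hypothetical names rather than functions from this PR.

```go
package main

// GlobalClusterName is the reserved global cluster name (value assumed).
const GlobalClusterName = "global"

// filterForStandby returns the clusters a ClusterResourceSet may act on and
// how many were skipped. On the active cluster nothing is filtered.
func filterForStandby(standby bool, clusters []string) (kept []string, skipped int) {
	if !standby {
		return clusters, 0
	}
	for _, name := range clusters {
		if name == GlobalClusterName {
			kept = append(kept, name)
		} else {
			skipped++
		}
	}
	return kept, skipped
}

// deleteDecision describes what a deleting ClusterResourceSet should do next.
type deleteDecision struct {
	RemoveFinalizer bool
	Requeue         bool
}

// decideDelete keeps the finalizer and asks for a requeue while any cluster
// was skipped, so the bindings of skipped clusters can be cleaned up later.
func decideDelete(skipped int) deleteDecision {
	if skipped > 0 {
		return deleteDecision{RemoveFinalizer: false, Requeue: true}
	}
	return deleteDecision{RemoveFinalizer: true, Requeue: false}
}
```

Keeping the finalizer while clusters were skipped is what prevents the ClusterResourceSet from disappearing with bindings of non-global clusters still pointing at it.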

Use GlobalClusterName for the reserved global cluster name, and do not add standby blocking at the ClusterCache layer.

Guard the Machine infrastructure providerID fallback against nil values.
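
The nil guard amounts to checking every pointer on the fallback path before dereferencing it. The sketch below uses hypothetical types (`infraMachine` and `resolveProviderID` are not names from this PR) and assumes the fallback order is Machine first, then infrastructure object.

```go
package main

// infraMachine is a hypothetical, trimmed-down infrastructure machine.
type infraMachine struct {
	ProviderID *string
}

// resolveProviderID prefers the Machine's own providerID and falls back to the
// infrastructure object's value. Every pointer is checked before it is
// dereferenced, so a missing object or unset field yields "" instead of a panic.
func resolveProviderID(machineProviderID *string, infra *infraMachine) string {
	if machineProviderID != nil && *machineProviderID != "" {
		return *machineProviderID
	}
	if infra != nil && infra.ProviderID != nil {
		return *infra.ProviderID
	}
	return ""
}
```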

jiazhiguang force-pushed the fix/stop-reconcile-in-standby-cluster branch from 871b91a to 5ffb6d4 on June 4, 2026 07:43