Skip to content

<fix>[storage]: ZSTAC-85462 prepare zbs hosts on attach#4092

Closed
ZStack-Robot wants to merge 1 commit into
5.5.22from
sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts
Closed

<fix>[storage]: ZSTAC-85462 prepare zbs hosts on attach#4092
ZStack-Robot wants to merge 1 commit into
5.5.22from
sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts

Conversation

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

Summary

Fix ZBS external primary storage attach so existing connected, non-maintenance KVM hosts are prepared before attach succeeds. This refreshes host-side ZBS client configuration and ensures the heartbeat volume without running host-storage check-host-status in attachHook.

Changes

  • Implement ExternalPrimaryStorage.attachHook() to prepare usable KVM hosts during attach.
  • Run host preparation as deployClient -> activateHeartbeatVolume and fail attach if preparation fails for a usable host.
  • Add ZBS integration coverage for attach preparation, skipped unavailable hosts, reconnect behavior, no attach-time check-host-status, and preparation failure.

Testing

  • rtk git -C zstack diff --check
  • rtk scripts/backend-build-container/container-run java-package zstack/storage
  • rtk scripts/backend-build-container/container-run java-package zstack/plugin/zbs
  • rtk scripts/backend-build-container/container-run test-zstack ZbsPrimaryStorageCase blocked before the case runs by unrelated premium/test-premium compile errors (VmOvsNicConstant, ExternalService.getServiceType()). Direct zstack/test Maven execution is also blocked by unrelated existing KVMHostUtilsTest and CoalesceQueueCase compile errors.

Resolves: ZSTAC-85462

sync from gitlab !9991

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

Review Change Stack

Warning

Review limit reached

@MatheMatrix, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 22 minutes and 33 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 756653be-6b9d-45ee-88b4-b195011b330f

📥 Commits

Reviewing files that changed from the base of the PR and between 6495ac6 and 9022e98.

📒 Files selected for processing (2)
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy

Walkthrough

ExternalPrimaryStorage 添加了 attachHook 方法实现,用于在集群附加时逐台部署主机客户端并激活心跳卷。补充了三个集成测试验证成功、部署失败和心跳卷激活失败的场景。

Changes

外部主存储附加流程实现

Layer / File(s) Summary
ExternalPrimaryStorage的attachHook实现
storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
新增导入 HostStateHostStatusHostVO_;实现 attachHook(String clusterUuid, Completion completion),在集群下查询已连接KVM主机,按列表逐个调用 deployClient 部署客户端,失败时聚合首个错误码,成功后依序激活心跳卷。
附加流程集成测试
test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy
新增导入 SQLHostState/Status/VOPrimaryStorageClusterRefVO;在 test() 调用三个新增测试。testPrepareHostsWhenAttachPrimaryStorageToCluster 验证成功附加时的部署与心跳卷激活;testAttachPrimaryStorageFailsWhenPreparingUsableHostFails 验证部署失败时集群关联未产生且状态不变;testAttachPrimaryStorageFailsWhenActivatingHeartbeatVolumeFails 验证心跳卷失败时的状态回滚。

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

💾 外存链千主机,钩子引流自合心;
部署哈心跳齐唱,失败回滚保幂整。
测试三景验成败,兔子欢舞庆新章。🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title follows the required [scope]: format (55 characters, well under 72-character limit) and directly describes the main change: preparing ZBS hosts during storage attach operation.
Description check ✅ Passed The PR description is comprehensive and directly related to the changeset, clearly explaining the fix for ZBS external primary storage attach and detailing the implementation approach and test coverage.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts

Comment @coderabbitai help to get the list of available commands and usage tips.

@MatheMatrix MatheMatrix force-pushed the sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts branch 7 times, most recently from 210b728 to 6495ac6 Compare May 28, 2026 06:47
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`:
- Around line 122-127: The host filter currently selects hosts by Connected
status and KVM hypervisor but still includes Disabled hosts; update the
Q.New(HostVO.class) query (the chain that uses HostVO_.clusterUuid,
HostVO_.status, HostVO_.hypervisorType and HostVO_.state) to exclude Disabled as
well—e.g. extend the notIn/state check to include HostState.Disabled (or add
.ne(HostVO_.state, HostState.Disabled)) so only Connected + Enabled + KVM hosts
are returned for attach/deploy operations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 77e386bb-7404-4730-a5fc-95e6768c0553

📥 Commits

Reviewing files that changed from the base of the PR and between 210b728 and 6495ac6.

📒 Files selected for processing (2)
  • storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/addon/zbs/ZbsPrimaryStorageCase.groovy

Comment on lines +122 to +127
List<HostVO> hosts = Q.New(HostVO.class)
.eq(HostVO_.clusterUuid, clusterUuid)
.eq(HostVO_.status, HostStatus.Connected)
.eq(HostVO_.hypervisorType, VmInstanceConstant.KVM_HYPERVISOR_TYPE)
.notIn(HostVO_.state, Arrays.asList(HostState.PreMaintenance, HostState.Maintenance))
.list();
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

筛选条件应排除 Disabled 主机,避免被“不可调度主机”误阻塞附加流程。

当前仅排除了 PreMaintenance/MaintenanceDisabled 仍会进入部署列表;若其 deployClient 失败,会直接导致 attach 失败,即使集群里还有可用主机。建议将可用主机收敛为 Connected + Enabled + KVM

建议修改
         List<HostVO> hosts = Q.New(HostVO.class)
                 .eq(HostVO_.clusterUuid, clusterUuid)
                 .eq(HostVO_.status, HostStatus.Connected)
+                .eq(HostVO_.state, HostState.Enabled)
                 .eq(HostVO_.hypervisorType, VmInstanceConstant.KVM_HYPERVISOR_TYPE)
-                .notIn(HostVO_.state, Arrays.asList(HostState.PreMaintenance, HostState.Maintenance))
                 .list();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`
around lines 122 - 127, The host filter currently selects hosts by Connected
status and KVM hypervisor but still includes Disabled hosts; update the
Q.New(HostVO.class) query (the chain that uses HostVO_.clusterUuid,
HostVO_.status, HostVO_.hypervisorType and HostVO_.state) to exclude Disabled as
well—e.g. extend the notIn/state check to include HostState.Disabled (or add
.ne(HostVO_.state, HostState.Disabled)) so only Connected + Enabled + KVM hosts
are returned for attach/deploy operations.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from haidong.pang:

Disable的主机也应当deploy。避免该主机enable后无法连接存储。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatheMatrix,感谢 haidong.pang 的解释!这个设计考量是合理的——Disabled 状态只是管理员主动禁用调度,主机仍处于 Connected 状态,提前部署好 ZBS client 可以确保主机重新 Enabled 后立即能连接存储,不需要再等重连触发。与此同时 deployClient 是幂等操作,多次调用无副作用。

我之前建议排除 Disabled 主机是错误的,请忽略该建议。


✏️ Learnings added
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 4092
File: storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java:122-126
Timestamp: 2026-05-28T07:21:00.466Z
Learning: In ExternalPrimaryStorage.attachHook() (storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java), the host filter intentionally includes Disabled hosts (only PreMaintenance and Maintenance are excluded). This is by design: Disabled hosts are still Connected and should have the ZBS client deployed proactively so that when they are re-enabled they can immediately connect to storage without waiting for a reconnect. deployClient is idempotent so deploying to Disabled hosts is safe. Do NOT suggest filtering out Disabled hosts from this query.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 3826
File: compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java:261-265
Timestamp: 2026-04-22T04:38:00.864Z
Learning: 在 ZStack 的 `Platform.err()` / `Platform.toRawString()` 调用中,传入用于格式化的 `%s` 参数若实现了 `org.zstack.header.core.I18nMessage`,框架会自动通过 `getDetails()` 取得原始文本、并通过 `getI18nDetails()` 取得国际化文本。因此,在代码审查中不应将“直接把实现了 `I18nMessage` 的对象作为 `%s` 参数传入而可能打印出对象地址/不安全”当作问题;此用法是正确且应当被视为安全的(例如将 `HostCandidate.RejectedCandidate` 这类实现了 `I18nMessage` 的对象作为 `%s` 参数传入)。

@MatheMatrix MatheMatrix force-pushed the sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts branch 2 times, most recently from 7b4fbc5 to 09d138e Compare May 28, 2026 08:11
Resolves: ZSTAC-85462

Change-Id: Iba5d8a7f144cdaf367030867d202887f99eef77e
@MatheMatrix MatheMatrix force-pushed the sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts branch from 09d138e to 9022e98 Compare May 28, 2026 09:11
@MatheMatrix MatheMatrix deleted the sync/haidong.pang/fix/ZSTAC-85462-zbs-attach-hosts branch May 28, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants