Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,10 @@
import org.zstack.header.errorcode.ErrorCodeList;
import org.zstack.header.errorcode.OperationFailureException;
import org.zstack.header.host.HostInventory;
import org.zstack.header.host.HostState;
import org.zstack.header.host.HostStatus;
import org.zstack.header.host.HostVO;
import org.zstack.header.host.HostVO_;
import org.zstack.header.image.ImageConstant;
import org.zstack.header.image.ImageInventory;
import org.zstack.header.image.ImageVO;
Expand Down Expand Up @@ -114,6 +117,61 @@ public ExternalPrimaryStorage(ExternalPrimaryStorage other) {
this.selfConfig = other.selfConfig;
}

@Override
public void attachHook(String clusterUuid, Completion completion) {
List<HostVO> hosts = Q.New(HostVO.class)
.eq(HostVO_.clusterUuid, clusterUuid)
.eq(HostVO_.status, HostStatus.Connected)
.notIn(HostVO_.state, Arrays.asList(HostState.PreMaintenance, HostState.Maintenance))
.list();
Comment on lines +122 to +126
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

筛选条件应排除 Disabled 主机,避免被“不可调度主机”误阻塞附加流程。

当前仅排除了 PreMaintenance/MaintenanceDisabled 仍会进入部署列表;若其 deployClient 失败,会直接导致 attach 失败,即使集群里还有可用主机。建议将可用主机收敛为 Connected + Enabled + KVM

建议修改
         List<HostVO> hosts = Q.New(HostVO.class)
                 .eq(HostVO_.clusterUuid, clusterUuid)
                 .eq(HostVO_.status, HostStatus.Connected)
+                .eq(HostVO_.state, HostState.Enabled)
                 .eq(HostVO_.hypervisorType, VmInstanceConstant.KVM_HYPERVISOR_TYPE)
-                .notIn(HostVO_.state, Arrays.asList(HostState.PreMaintenance, HostState.Maintenance))
                 .list();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java`
around lines 122 - 127, The host filter currently selects hosts by Connected
status and KVM hypervisor but still includes Disabled hosts; update the
Q.New(HostVO.class) query (the chain that uses HostVO_.clusterUuid,
HostVO_.status, HostVO_.hypervisorType and HostVO_.state) to exclude Disabled as
well—e.g. extend the notIn/state check to include HostState.Disabled (or add
.ne(HostVO_.state, HostState.Disabled)) so only Connected + Enabled + KVM hosts
are returned for attach/deploy operations.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from haidong.pang:

Disable的主机也应当deploy。避免该主机enable后无法连接存储。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MatheMatrix,感谢 haidong.pang 的解释!这个设计考量是合理的——Disabled 状态只是管理员主动禁用调度,主机仍处于 Connected 状态,提前部署好 ZBS client 可以确保主机重新 Enabled 后立即能连接存储,不需要再等重连触发。与此同时 deployClient 是幂等操作,多次调用无副作用。

我之前建议排除 Disabled 主机是错误的,请忽略该建议。


✏️ Learnings added
Learnt from: MatheMatrix
Repo: MatheMatrix/zstack PR: 4092
File: storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java:122-126
Timestamp: 2026-05-28T07:21:00.466Z
Learning: In ExternalPrimaryStorage.attachHook() (storage/src/main/java/org/zstack/storage/addon/primary/ExternalPrimaryStorage.java), the host filter intentionally includes Disabled hosts (only PreMaintenance and Maintenance are excluded). This is by design: Disabled hosts are still Connected and should have the ZBS client deployed proactively so that when they are re-enabled they can immediately connect to storage without waiting for a reconnect. deployClient is idempotent so deploying to Disabled hosts is safe. Do NOT suggest filtering out Disabled hosts from this query.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 3826
File: compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java:261-265
Timestamp: 2026-04-22T04:38:00.864Z
Learning: 在 ZStack 的 `Platform.err()` / `Platform.toRawString()` 调用中,传入用于格式化的 `%s` 参数若实现了 `org.zstack.header.core.I18nMessage`,框架会自动通过 `getDetails()` 取得原始文本、并通过 `getI18nDetails()` 取得国际化文本。因此,在代码审查中不应将“直接把实现了 `I18nMessage` 的对象作为 `%s` 参数传入而可能打印出对象地址/不安全”当作问题;此用法是正确且应当被视为安全的(例如将 `HostCandidate.RejectedCandidate` 这类实现了 `I18nMessage` 的对象作为 `%s` 参数传入)。

if (hosts.isEmpty()) {
completion.success();
return;
}

List<HostInventory> hostInventories = HostInventory.valueOf(hosts);
// Deploy-client is idempotent. Hosts prepared before an attach failure are left for
// the next attach or reconnect to overwrite.
new While<>(hostInventories).each((host, compl) -> {
node.deployClient(host, new Completion(compl) {
@Override
public void success() {
compl.done();
}

@Override
public void fail(ErrorCode errorCode) {
compl.addError(errorCode);
compl.allDone();
}
});
}).run(new WhileDoneCompletion(completion) {
@Override
public void done(ErrorCodeList errorCodeList) {
if (!errorCodeList.getCauses().isEmpty()) {
completion.fail(errorCodeList.getCauses().get(0));
return;
}

activateHeartbeatVolumeForAttach(hostInventories.get(0), completion);
}
});
}

private void activateHeartbeatVolumeForAttach(HostInventory host, Completion completion) {
node.activateHeartbeatVolume(host, new ReturnValueCompletion<HeartbeatVolumeTopology>(completion) {
@Override
public void success(HeartbeatVolumeTopology returnValue) {
completion.success();
}

@Override
public void fail(ErrorCode errorCode) {
completion.fail(errorCode);
}
});
}

@Override
public void handleMessage(Message msg) {
if (msg instanceof PrimaryStorageMessage && !destMaker.isManagedByUs(((PrimaryStorageMessage) msg).getPrimaryStorageUuid())) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@ import org.zstack.header.storage.addon.primary.ExternalPrimaryStorageSpaceVO_
import org.zstack.header.storage.addon.primary.PrimaryStorageOutputProtocolRefVO
import org.zstack.header.storage.primary.PrimaryStorageCapacityVO
import org.zstack.header.storage.primary.PrimaryStorageCapacityVO_
import org.zstack.header.storage.primary.PrimaryStorageClusterRefVO
import org.zstack.header.storage.primary.PrimaryStorageClusterRefVO_
import org.zstack.header.storage.primary.PrimaryStorageHostRefVO
import org.zstack.header.storage.primary.PrimaryStorageHostRefVO_
import org.zstack.header.storage.primary.PrimaryStorageStatus
Expand Down Expand Up @@ -176,6 +178,9 @@ class ZbsPrimaryStorageCase extends SubCase {
testSyncPrimaryStorageCapacityConcurrently()
testDefaultConfig()
testUpdateExternalPrimaryStorage()
testPrepareHostsWhenAttachPrimaryStorageToCluster()
testAttachPrimaryStorageFailsWhenPreparingUsableHostFails()
testAttachPrimaryStorageFailsWhenActivatingHeartbeatVolumeFails()
testMdsConnectFailed()
testLifecycle()
testDataVolumeLifecycle()
Expand Down Expand Up @@ -486,7 +491,205 @@ class ZbsPrimaryStorageCase extends SubCase {
assert rc.value == ZbsConstants.VOLUME_PHYSICAL_BLOCK_SIZE
}

void testPrepareHostsWhenAttachPrimaryStorageToCluster() {
KVMHostInventory kvm2 = env.inventoryByName("kvm-2") as KVMHostInventory
KVMHostInventory kvm3 = env.inventoryByName("kvm-3") as KVMHostInventory
changeHostState {
uuid = kvm3.uuid
stateEvent = "maintain"
}

List<String> attachCalls = Collections.synchronizedList(new ArrayList<>())
AtomicInteger attachCheckHostStatusCount = new AtomicInteger(0)
env.afterSimulator(ZbsStorageController.DEPLOY_CLIENT_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.DeployClientCmd)
attachCalls.add("deploy:${cmd.ip}".toString())
return rsp
}
env.afterSimulator(ZbsStorageController.CREATE_VOLUME_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.CreateVolumeCmd)
if (cmd.volume == ZbsConstants.ZBS_HEARTBEAT_VOLUME_NAME) {
attachCalls.add("heartbeat:${cmd.logicalPool}".toString())
ZbsStorageController.CreateVolumeRsp createVolumeRsp = new ZbsStorageController.CreateVolumeRsp()
createVolumeRsp.installPath = "zbs://${cmd.logicalPool}/${cmd.volume}"
return createVolumeRsp
}

return rsp
}
env.afterSimulator(ZbsStorageController.CHECK_HOST_STORAGE_CONNECTION_PATH) { rsp, HttpEntity<String> e ->
attachCheckHostStatusCount.incrementAndGet()
return rsp
}

attachPrimaryStorageToCluster {
primaryStorageUuid = ps.uuid
clusterUuid = cluster.uuid
}

assert attachCalls.findAll { it.startsWith("deploy:") }.sort() == ["deploy:127.0.0.1", "deploy:127.0.0.2"]
assert attachCalls.take(2).every { it.startsWith("deploy:") }
assert attachCalls.findAll { it.startsWith("heartbeat:") }.sort() == ["heartbeat:lpool1", "heartbeat:lpool2"]
assert attachCheckHostStatusCount.get() == 0

env.cleanAfterSimulatorHandlers()
changeHostState {
uuid = kvm3.uuid
stateEvent = "enable"
}
retryInSecs {
HostInventory host = queryHost {
conditions = ["uuid=${kvm3.uuid}"]
}[0] as HostInventory
assert host.state == "Enabled"
assert host.status == "Connected"
}

List<String> reconnectCalls = Collections.synchronizedList(new ArrayList<>())
AtomicInteger reconnectCheckHostStatusCount = new AtomicInteger(0)
env.afterSimulator(ZbsStorageController.DEPLOY_CLIENT_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.DeployClientCmd)
if (cmd.ip == "127.0.0.2") {
reconnectCalls.add("deploy:${cmd.ip}".toString())
}
return rsp
}
env.afterSimulator(ZbsStorageController.CREATE_VOLUME_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.CreateVolumeCmd)
if (cmd.volume == ZbsConstants.ZBS_HEARTBEAT_VOLUME_NAME) {
reconnectCalls.add("heartbeat:${cmd.logicalPool}".toString())
ZbsStorageController.CreateVolumeRsp createVolumeRsp = new ZbsStorageController.CreateVolumeRsp()
createVolumeRsp.installPath = "zbs://${cmd.logicalPool}/${cmd.volume}"
return createVolumeRsp
}

return rsp
}
env.afterSimulator(ZbsStorageController.CHECK_HOST_STORAGE_CONNECTION_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.CheckHostStorageConnectionCmd)
if (cmd.hostUuid == kvm2.uuid) {
reconnectCheckHostStatusCount.incrementAndGet()
}
ZbsStorageController.CheckHostStorageConnectionRsp checkHostStorageConnectionRsp = new ZbsStorageController.CheckHostStorageConnectionRsp()
checkHostStorageConnectionRsp.success = true
return checkHostStorageConnectionRsp
}

reconnectHost {
uuid = kvm2.uuid
}

assert reconnectCalls.contains("deploy:127.0.0.2")
assert reconnectCalls.containsAll(["heartbeat:lpool1", "heartbeat:lpool2"])
assert reconnectCheckHostStatusCount.get() > 0

if (Q.New(PrimaryStorageClusterRefVO.class)
.eq(PrimaryStorageClusterRefVO_.primaryStorageUuid, ps.uuid)
.eq(PrimaryStorageClusterRefVO_.clusterUuid, cluster.uuid)
.isExists()) {
detachPrimaryStorageFromCluster {
primaryStorageUuid = ps.uuid
clusterUuid = cluster.uuid
}
}
}

void testAttachPrimaryStorageFailsWhenPreparingUsableHostFails() {
env.cleanAfterSimulatorHandlers()

PrimaryStorageStatus statusBeforeAttach = Q.New(ExternalPrimaryStorageVO.class)
.select(ExternalPrimaryStorageVO_.status)
.eq(ExternalPrimaryStorageVO_.uuid, ps.uuid)
.findValue()
AtomicInteger failedDeployCount = new AtomicInteger(0)
AtomicInteger heartbeatCount = new AtomicInteger(0)
env.afterSimulator(ZbsStorageController.DEPLOY_CLIENT_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.DeployClientCmd)
if (cmd.ip == "127.0.0.1") {
failedDeployCount.incrementAndGet()
ZbsStorageController.DeployClientRsp deployClientRsp = new ZbsStorageController.DeployClientRsp()
deployClientRsp.success = false
deployClientRsp.error = "on purpose"
return deployClientRsp
}

return rsp
}
env.afterSimulator(ZbsStorageController.CREATE_VOLUME_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.CreateVolumeCmd)
if (cmd.volume == ZbsConstants.ZBS_HEARTBEAT_VOLUME_NAME) {
heartbeatCount.incrementAndGet()
}

return rsp
}

expect(AssertionError.class) {
attachPrimaryStorageToCluster {
primaryStorageUuid = ps.uuid
clusterUuid = cluster.uuid
}
}

assert !Q.New(PrimaryStorageClusterRefVO.class)
.eq(PrimaryStorageClusterRefVO_.primaryStorageUuid, ps.uuid)
.eq(PrimaryStorageClusterRefVO_.clusterUuid, cluster.uuid)
.isExists()
assert Q.New(ExternalPrimaryStorageVO.class)
.select(ExternalPrimaryStorageVO_.status)
.eq(ExternalPrimaryStorageVO_.uuid, ps.uuid)
.findValue() == statusBeforeAttach
assert failedDeployCount.get() == 1
assert heartbeatCount.get() == 0
}

void testAttachPrimaryStorageFailsWhenActivatingHeartbeatVolumeFails() {
env.cleanAfterSimulatorHandlers()

PrimaryStorageStatus statusBeforeAttach = Q.New(ExternalPrimaryStorageVO.class)
.select(ExternalPrimaryStorageVO_.status)
.eq(ExternalPrimaryStorageVO_.uuid, ps.uuid)
.findValue()
AtomicInteger deployCount = new AtomicInteger(0)
AtomicInteger failedHeartbeatCount = new AtomicInteger(0)
env.afterSimulator(ZbsStorageController.DEPLOY_CLIENT_PATH) { rsp, HttpEntity<String> e ->
deployCount.incrementAndGet()
return rsp
}
env.afterSimulator(ZbsStorageController.CREATE_VOLUME_PATH) { rsp, HttpEntity<String> e ->
def cmd = JSONObjectUtil.toObject(e.body, ZbsStorageController.CreateVolumeCmd)
if (cmd.volume == ZbsConstants.ZBS_HEARTBEAT_VOLUME_NAME) {
failedHeartbeatCount.incrementAndGet()
ZbsStorageController.CreateVolumeRsp createVolumeRsp = new ZbsStorageController.CreateVolumeRsp()
createVolumeRsp.setError("on purpose")
return createVolumeRsp
}

return rsp
}

expect(AssertionError.class) {
attachPrimaryStorageToCluster {
primaryStorageUuid = ps.uuid
clusterUuid = cluster.uuid
}
}

assert !Q.New(PrimaryStorageClusterRefVO.class)
.eq(PrimaryStorageClusterRefVO_.primaryStorageUuid, ps.uuid)
.eq(PrimaryStorageClusterRefVO_.clusterUuid, cluster.uuid)
.isExists()
assert Q.New(ExternalPrimaryStorageVO.class)
.select(ExternalPrimaryStorageVO_.status)
.eq(ExternalPrimaryStorageVO_.uuid, ps.uuid)
.findValue() == statusBeforeAttach
assert deployCount.get() > 0
assert failedHeartbeatCount.get() == 1
}

void testLifecycle() {
env.cleanAfterSimulatorHandlers()

updateExternalPrimaryStorage {
uuid = ps.uuid
name = "test-zbs-new-name"
Expand Down