<fix>[kvm]: TIC-5776 backport memory hotfix#4066
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (12)
功能概览该 PR 为 ZStack KVM 插件引入 KVM 代理物理内存监控与自适应 ping 跳过机制。当代理内存使用超过硬限制时自动触发重启流程,当代理处于繁忙状态时动态跳过主机 ping,同时支持跨管理节点的任务队列空闲检查以保证重启安全性。 变更内容KVM 代理内存监控与 Ping 自适应控制
🎯 4 (复杂) | ⏱️ ~45 分钟
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java`:
- Around line 130-133: Read skippedPingHostDeadline.getIfPresent(uuid) once into
a local Long variable (e.g., Long deadline =
skippedPingHostDeadline.getIfPresent(uuid)) in HostTrackImpl (the method
containing the shown block), then check deadline != null &&
System.currentTimeMillis() / 1000 <= deadline and use that same deadline
variable in the logger.debug call; this avoids repeated cache reads, possible
NPE/auto-unboxing, and ensures the logged value matches the checked value while
keeping the continueToRunThisTimer() and return behavior unchanged.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 879-895: The current flow always calls trigger.next() even when
ErrorCodeList shows all retry probes failed (the block in the
done(ErrorCodeList) uses errorCodeList.getCauses().size() == retryCount but
still calls trigger.next()), and then sends a fire-and-forget ReconnectHostMsg
which can cause RestartKvmAgentReply to be reported successful prematurely;
change the done(ErrorCodeList) handler to call trigger.fail(...) or set an error
in the workflow when probes exhausted (use the ErrorCode from ErrorCodeList to
construct the failure), and change the subsequent reconnect step to send
ReconnectHostMsg with a callback (bus.send with a reply handler) and only call
trigger.next() when the reconnect reply indicates success; reference
ErrorCodeList, retryCount, trigger.next()/trigger.fail(), ReconnectHostMsg and
the reconnect run(FlowTrigger, Map) to locate and update the code.
- Around line 747-756: GetLocalTaskReply gr may not contain an entry for the
syncSignature so ChainInfo chainInfo = gr.getResults().get(syncSignature) can be
null and calling chainInfo.getRunningTask() causes an NPE; update the block
around GetLocalTaskReply/chainInfo to null-check the result from
gr.getResults().get(syncSignature) (using the local variables gr, syncSignature,
ChainInfo) and treat a null ChainInfo as "no running tasks" (i.e. call
compl.done() and continue) while preserving the existing branch where a
non-empty chainInfo.getRunningTask() sets queueIsEmpty to false, logs via
logger.debug (including self.getUuid(), mnId, buildTaskInfo(...)) and calls
compl.allDone()/return.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java`:
- Around line 544-550: The switch directly uses data.getProcessName() which may
be null and cause an NPE; update KVMHostFactory to first store
data.getProcessName() into a local variable, check for null/empty (log a
suitable message via logger.debug or handle as default) and only then perform
the switch on that non-null value; keep the existing case "zstack-kvmagent"
calling processKvmagentPhysicalMemUsageAbnormal(cmd) and the default branch
logging unknown process names.
In
`@test/src/test/groovy/org/zstack/test/integration/kvm/host/KVMPingCase.groovy`:
- Around line 126-137: This test mutates
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD and
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT without restoring
them, risking cross-test interference; wrap the modifications and assertions
(the blocks using SizeUtils.sizeStringToBytes and retryInSecs) in a try/finally
where you capture the original values of
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD and
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT before changing them
and restore those originals in the finally block so the global config is
returned to its prior state regardless of test outcome.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5dadea20-c8c4-4580-9182-df8d1b308a89
⛔ Files ignored due to path filters (2)
conf/globalConfig/kvm.xmlis excluded by!**/*.xmltest/src/test/resources/globalConfig/kvm.xmlis excluded by!**/*.xml
📒 Files selected for processing (12)
compute/src/main/java/org/zstack/compute/host/HostTrackImpl.javacore/src/main/java/org/zstack/core/CoreManagerImpl.javaheader/src/main/java/org/zstack/header/core/GetLocalTaskMsg.javaheader/src/main/java/org/zstack/header/host/HostCanonicalEvents.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHost.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.javaplugin/kvm/src/main/java/org/zstack/kvm/RestartKvmAgentMsg.javaplugin/kvm/src/main/java/org/zstack/kvm/RestartKvmAgentReply.javatest/src/test/groovy/org/zstack/test/integration/kvm/host/KVMPingCase.groovy
| if (skippedPingHostDeadline.getIfPresent(uuid) != null && System.currentTimeMillis() / 1000 <= skippedPingHostDeadline.getIfPresent(uuid)) { | ||
| logger.debug(String.format("skip tracking host[uuid:%s] this time, deadline %s", uuid, skippedPingHostDeadline.getIfPresent(uuid))); | ||
| continueToRunThisTimer(); | ||
| return; |
There was a problem hiding this comment.
避免重复读取缓存导致潜在空指针
Line [130]-Line [131] 对 getIfPresent(uuid) 进行了重复读取;在并发 invalidate/expire 时,后续读取可能变为 null,触发自动拆箱异常。建议先读取到局部变量再判断与打印。
建议修复
- if (skippedPingHostDeadline.getIfPresent(uuid) != null && System.currentTimeMillis() / 1000 <= skippedPingHostDeadline.getIfPresent(uuid)) {
- logger.debug(String.format("skip tracking host[uuid:%s] this time, deadline %s", uuid, skippedPingHostDeadline.getIfPresent(uuid)));
+ Long deadline = skippedPingHostDeadline.getIfPresent(uuid);
+ if (deadline != null && System.currentTimeMillis() / 1000 <= deadline) {
+ logger.debug(String.format("skip tracking host[uuid:%s] this time, deadline %s", uuid, deadline));
continueToRunThisTimer();
return;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (skippedPingHostDeadline.getIfPresent(uuid) != null && System.currentTimeMillis() / 1000 <= skippedPingHostDeadline.getIfPresent(uuid)) { | |
| logger.debug(String.format("skip tracking host[uuid:%s] this time, deadline %s", uuid, skippedPingHostDeadline.getIfPresent(uuid))); | |
| continueToRunThisTimer(); | |
| return; | |
| Long deadline = skippedPingHostDeadline.getIfPresent(uuid); | |
| if (deadline != null && System.currentTimeMillis() / 1000 <= deadline) { | |
| logger.debug(String.format("skip tracking host[uuid:%s] this time, deadline %s", uuid, deadline)); | |
| continueToRunThisTimer(); | |
| return; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@compute/src/main/java/org/zstack/compute/host/HostTrackImpl.java` around
lines 130 - 133, Read skippedPingHostDeadline.getIfPresent(uuid) once into a
local Long variable (e.g., Long deadline =
skippedPingHostDeadline.getIfPresent(uuid)) in HostTrackImpl (the method
containing the shown block), then check deadline != null &&
System.currentTimeMillis() / 1000 <= deadline and use that same deadline
variable in the logger.debug call; this avoids repeated cache reads, possible
NPE/auto-unboxing, and ensures the logged value matches the checked value while
keeping the continueToRunThisTimer() and return behavior unchanged.
| GetLocalTaskReply gr = r.castReply(); | ||
| ChainInfo chainInfo = gr.getResults().get(syncSignature); | ||
| if (!chainInfo.getRunningTask().isEmpty()) { | ||
| logger.debug(String.format("%s tasks exist on the running task of host %s, mnNodeId %s: %s...", chainInfo.getRunningTask().size(), | ||
| self.getUuid(), mnId, buildTaskInfo(chainInfo.getRunningTask()))); | ||
| queueIsEmpty.set(false); | ||
| compl.allDone(); | ||
| return; | ||
| } | ||
| compl.done(); |
There was a problem hiding this comment.
补上空结果保护,避免跨节点查询在无任务节点上直接 NPE。
这里默认每个管理节点都会返回 syncSignature 对应的 ChainInfo,但没有本地任务的节点很可能不会带这个 key。此时 gr.getResults().get(syncSignature) 会是 null,后面的 chainInfo.getRunningTask() 会让重启流程在正常场景下直接失败。
建议修改
GetLocalTaskReply gr = r.castReply();
- ChainInfo chainInfo = gr.getResults().get(syncSignature);
- if (!chainInfo.getRunningTask().isEmpty()) {
+ ChainInfo chainInfo = gr.getResults() == null ? null : gr.getResults().get(syncSignature);
+ List<? extends TaskInfo> runningTasks = chainInfo == null || chainInfo.getRunningTask() == null
+ ? Collections.emptyList()
+ : chainInfo.getRunningTask();
+ if (!runningTasks.isEmpty()) {
logger.debug(String.format("%s tasks exist on the running task of host %s, mnNodeId %s: %s...", chainInfo.getRunningTask().size(),
- self.getUuid(), mnId, buildTaskInfo(chainInfo.getRunningTask())));
+ self.getUuid(), mnId, buildTaskInfo(runningTasks)));
queueIsEmpty.set(false);
compl.allDone();
return;
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 747 - 756,
GetLocalTaskReply gr may not contain an entry for the syncSignature so ChainInfo
chainInfo = gr.getResults().get(syncSignature) can be null and calling
chainInfo.getRunningTask() causes an NPE; update the block around
GetLocalTaskReply/chainInfo to null-check the result from
gr.getResults().get(syncSignature) (using the local variables gr, syncSignature,
ChainInfo) and treat a null ChainInfo as "no running tasks" (i.e. call
compl.done() and continue) while preserving the existing branch where a
non-empty chainInfo.getRunningTask() sets queueIsEmpty to false, logs via
logger.debug (including self.getUuid(), mnId, buildTaskInfo(...)) and calls
compl.allDone()/return.
| public void done(ErrorCodeList errorCodeList) { | ||
| if (errorCodeList.getCauses().size() == retryCount) { | ||
| logger.debug("waiting for kvmagent to start timeout: " + errorCodeList.getCauses().get(0).getDetails()); | ||
| } | ||
| trigger.next(); | ||
| } | ||
| }); | ||
| } | ||
| }).then(new NoRollbackFlow() { | ||
| String __name__ = String.format("reconnect host %s after restart kvmagent", self.getUuid()); | ||
|
|
||
| public void run(FlowTrigger trigger, Map data) { | ||
| ReconnectHostMsg rmsg = new ReconnectHostMsg(); | ||
| rmsg.setHostUuid(self.getUuid()); | ||
| bus.makeTargetServiceIdByResourceUuid(rmsg, HostConstant.SERVICE_ID, self.getUuid()); | ||
| bus.send(rmsg); | ||
| trigger.next(); |
There was a problem hiding this comment.
不要在 agent 仍未就绪时继续返回成功。
这里即使 60 次探测全部失败也会 trigger.next(),随后 ReconnectHostMsg 还是 fire-and-forget 发送,没有回调校验。结果是 RestartKvmAgentReply 可能在 agent 没起来、甚至重连失败时仍然返回成功,调用方会误判修复已经完成。
建议修改
}).run(new WhileDoneCompletion(trigger) {
`@Override`
public void done(ErrorCodeList errorCodeList) {
if (errorCodeList.getCauses().size() == retryCount) {
- logger.debug("waiting for kvmagent to start timeout: " + errorCodeList.getCauses().get(0).getDetails());
+ trigger.fail(operr("kvmagent on host %s did not become ready after %s retries: %s",
+ self.getUuid(), retryCount, errorCodeList.getCauses().get(0).getDetails()));
+ return;
}
trigger.next();
}
});
}
}).then(new NoRollbackFlow() {
String __name__ = String.format("reconnect host %s after restart kvmagent", self.getUuid());
public void run(FlowTrigger trigger, Map data) {
ReconnectHostMsg rmsg = new ReconnectHostMsg();
rmsg.setHostUuid(self.getUuid());
bus.makeTargetServiceIdByResourceUuid(rmsg, HostConstant.SERVICE_ID, self.getUuid());
- bus.send(rmsg);
- trigger.next();
+ bus.send(rmsg, new CloudBusCallBack(trigger) {
+ `@Override`
+ public void run(MessageReply reply) {
+ if (reply.isSuccess()) {
+ trigger.next();
+ } else {
+ trigger.fail(reply.getError());
+ }
+ }
+ });
}
})🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 879 - 895,
The current flow always calls trigger.next() even when ErrorCodeList shows all
retry probes failed (the block in the done(ErrorCodeList) uses
errorCodeList.getCauses().size() == retryCount but still calls trigger.next()),
and then sends a fire-and-forget ReconnectHostMsg which can cause
RestartKvmAgentReply to be reported successful prematurely; change the
done(ErrorCodeList) handler to call trigger.fail(...) or set an error in the
workflow when probes exhausted (use the ErrorCode from ErrorCodeList to
construct the failure), and change the subsequent reconnect step to send
ReconnectHostMsg with a callback (bus.send with a reply handler) and only call
trigger.next() when the reconnect reply indicates success; reference
ErrorCodeList, retryCount, trigger.next()/trigger.fail(), ReconnectHostMsg and
the reconnect run(FlowTrigger, Map) to locate and update the code.
| switch (data.getProcessName()) { | ||
| case "zstack-kvmagent": | ||
| processKvmagentPhysicalMemUsageAbnormal(cmd); | ||
| break; | ||
| default: | ||
| logger.debug(String.format("unknown process name[%s] in host[uuid:%s]", cmd.getProcessName(), cmd.getHostUuid())); | ||
| } |
There was a problem hiding this comment.
避免在 switch 中直接使用可能为 null 的 processName
Line 544 当前对 data.getProcessName() 直接 switch,上报缺失该字段时会抛出 NPE,导致该回调处理失败。建议先做空值校验,再进行分支判断。
🔧 建议修改
- switch (data.getProcessName()) {
- case "zstack-kvmagent":
- processKvmagentPhysicalMemUsageAbnormal(cmd);
- break;
- default:
- logger.debug(String.format("unknown process name[%s] in host[uuid:%s]", cmd.getProcessName(), cmd.getHostUuid()));
- }
+ String processName = cmd.getProcessName();
+ if (StringUtils.isBlank(processName)) {
+ logger.warn(String.format("empty process name in host[uuid:%s]", cmd.getHostUuid()));
+ return null;
+ }
+
+ if ("zstack-kvmagent".equals(processName)) {
+ processKvmagentPhysicalMemUsageAbnormal(cmd);
+ } else {
+ logger.debug(String.format("unknown process name[%s] in host[uuid:%s]", processName, cmd.getHostUuid()));
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 544
- 550, The switch directly uses data.getProcessName() which may be null and
cause an NPE; update KVMHostFactory to first store data.getProcessName() into a
local variable, check for null/empty (log a suitable message via logger.debug or
handle as default) and only then perform the switch on that non-null value; keep
the existing case "zstack-kvmagent" calling
processKvmagentPhysicalMemUsageAbnormal(cmd) and the default branch logging
unknown process names.
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("100M")) | ||
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("200M")) | ||
| retryInSecs { | ||
| assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("200M") | ||
| assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("100M") | ||
| } | ||
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("5G")) | ||
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("20G")) | ||
| retryInSecs { | ||
| assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("20G") | ||
| assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("5G") | ||
| } |
There was a problem hiding this comment.
测试修改全局配置后应恢复,避免用例串扰
Line [126]-Line [137] 在测试中多次修改全局配置,但未恢复原值,后续用例可能受到影响。建议在该测试内用 try/finally 还原。
建议修复
+ def oldAlarm = KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.value(Long.class)
+ def oldHardLimit = KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.value(Long.class)
+ try {
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("100M"))
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("200M"))
retryInSecs {
assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("200M")
assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("100M")
}
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("5G"))
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("20G"))
retryInSecs {
assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("20G")
assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("5G")
}
+ } finally {
+ KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(oldAlarm)
+ KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(oldHardLimit)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("100M")) | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("200M")) | |
| retryInSecs { | |
| assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("200M") | |
| assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("100M") | |
| } | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("5G")) | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("20G")) | |
| retryInSecs { | |
| assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("20G") | |
| assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("5G") | |
| } | |
| def oldAlarm = KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.value(Long.class) | |
| def oldHardLimit = KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.value(Long.class) | |
| try { | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("100M")) | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("200M")) | |
| retryInSecs { | |
| assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("200M") | |
| assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("100M") | |
| } | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(SizeUtils.sizeStringToBytes("5G")) | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(SizeUtils.sizeStringToBytes("20G")) | |
| retryInSecs { | |
| assert kvmagent_mem_hard_limit == SizeUtils.sizeStringToBytes("20G") | |
| assert kvmagent_mem_threshold == SizeUtils.sizeStringToBytes("5G") | |
| } | |
| } finally { | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD.updateValue(oldAlarm) | |
| KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT.updateValue(oldHardLimit) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/src/test/groovy/org/zstack/test/integration/kvm/host/KVMPingCase.groovy`
around lines 126 - 137, This test mutates
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD and
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT without restoring
them, risking cross-test interference; wrap the modifications and assertions
(the blocks using SizeUtils.sizeStringToBytes and retryInSecs) in a try/finally
where you capture the original values of
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_ALARM_THRESHOLD and
KVMGlobalConfig.KVMAGENT_PHYSICAL_MEMORY_USAGE_HARD_LIMIT before changing them
and restore those originals in the finally block so the global config is
returned to its prior state regardless of test outcome.
1、support agent memory monitoring and alarm 2、support automatic dump of thread stack and tracking of objects when the memory continues to increase 3、automatically restart the process when the memory exceeds the threshold Resolves/Related: ZSTAC-71000 Change-Id: I676f656f6f6d6b6565656e737677666667767170 (cherry picked from commit 8fc2e9d)
1、skip host ping when dumping kvmagent 2、Wait for the kvmagent service to fully start before reconnecting to the host to avoid restarting twice. Resolves/Related: ZSTAC-72552 Change-Id: I676f656f6f6d6b6565656e737677666667767176 (cherry picked from commit 914853d) # Conflicts: # plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java
6197976 to
fb9e84e
Compare
Summary
Backport TIC-5776 kvmagent memory hotfix into feature-ky10-gfb.
Includes the ZSTAC-71000 management-node KVM memory monitor implementation and the ZSTAC-72552 host ping/restart safeguard.
Changes
Testing
Related: TIC-5776
Backports: ZSTAC-71000, ZSTAC-72552
sync from gitlab !9965