Skip to content

<feature>[vm]: optimize the statistics method of OS running time on virtual machines#4096

Open
zstack-robot-2 wants to merge 2 commits into
zsv_5.1.0from
sync/zstackio/ZSV-12297
Open

<feature>[vm]: optimize the statistics method of OS running time on virtual machines#4096
zstack-robot-2 wants to merge 2 commits into
zsv_5.1.0from
sync/zstackio/ZSV-12297

Conversation

@zstack-robot-2
Copy link
Copy Markdown
Collaborator

DBImpact

Resolves: ZSV-12297

Change-Id: I6a746a746b6b6a6172646470627a64636173656c

sync from gitlab !9994

…irtual machines

DBImpact

Resolves: ZSV-12297

Change-Id: I6a746a746b6b6a6172646470627a64636173656c
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Review Change Stack

Warning

Review limit reached

@MatheMatrix, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 32 minutes and 7 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 79e58af8-a57a-4598-b630-8178219f1c20

📥 Commits

Reviewing files that changed from the base of the PR and between 5f2d534 and 53ad7b6.

📒 Files selected for processing (3)
  • compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • conf/db/zsv/V5.1.0__schema.sql

Walkthrough

新增 VmInstance.bootTime 列与视图暴露;新增 VmBootTimeUtils 提供格式化、读写、重置与条件回填;VmInstanceBase 在状态变更时同步 bootTime 并优先本地返回 uptime;KVM 重启事件触发重置。

变更内容

VM 启动时间追踪功能

Layer / File(s) Summary
数据模型与 ORM 映射
conf/db/zsv/V5.1.0__schema.sql, header/src/main/java/org/zstack/header/vm/VmInstanceAO.java, header/src/main/java/org/zstack/header/vm/VmInstanceAO_.java
在 VmInstanceEO 表新增 bootTime(timestamp NULL)并重建视图以暴露该字段;VmInstanceAO 新增持久化字段 bootTime、拷贝构造复制、getter/setter;JPA 元模型新增 bootTime 声明。
启动时间工具类实现
compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java
新增 VmBootTimeUtils,包含常量 STOPPED_UPTIMEUNKNOWN_UPTIME、固定格式化器 yyyy-MM-dd HH:mm:ss;提供 isBootTimeValidStategetBootTimeresetBootTimeclearBootTimebackfillBootTimeIfMissingvmExistssetBootTimeformatBootTimeparseBootTime 等方法,实现基于 DB 的读取、写入与条件回填。
VM 实例状态与 Uptime 查询处理
compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
引入静态 vmBootTimeUtils;在 changeVmStateInDb 写入新状态后,根据旧/新状态与当前值决定将 bootTime 置空、保留或重置为当前时间;APIGetVmUptimeMsg 优先使用本地 vmBootTimeUtils.getBootTime(vmUuid) 返回 uptime,缺失时仍使用总线兜底并在回调成功后调用 backfillBootTimeIfMissing 回填。
KVM 重启事件集成
plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java
在 KVM 重启事件处理(KVM_REPORT_VM_REBOOT_EVENT)回调中调用 vmBootTimeUtils.resetBootTime(cmd.vmUuid),以在主机报告虚拟机重启时同步重置 VM 的启动时间。

🎯 3 (Moderate) | ⏱️ ~20 minutes

"🐰 我数着重启的步伐,
数据库里藏着每次起跑,
时间格式像胡萝卜的纹路,
从 Running 到 Stopped 我默默记录,
兔子也要记日志,咕噜咕噜~"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确概括了主要变更,即优化虚拟机OS运行时间(启动时间/正常运行时间)的统计方法。
Description check ✅ Passed 描述与变更集相关联,虽然信息量有限,但涵盖了issue关联、change-id和同步来源等必要信息,与新增bootTime字段和相关逻辑调整相符。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/zstackio/ZSV-12297

Comment @coderabbitai help to get the list of available commands and usage tips.

@MatheMatrix
Copy link
Copy Markdown
Owner

Comment from yaohua.wu:

Review: MR !9994 — ZSV-12297

Background (preserved across rounds)

  • Jira: ZSV-12297 — VM运行时间保持准确性(Improvement,P2,陈泰岳)
  • Bug summary: 原 APIGetVmUptimeMsg 通过向宿主机 agent 发 GetVmUptimeMsg 获取 VM 实际运行时长,依赖 agent 通道、有可靠性/性能问题。本次将 VM 运行时长来源切换为持久化到 DB 的 bootTime 字段,由状态机和 KVM reboot 事件驱动维护。
  • Intent & scope: 在 VmInstanceEO 新增 bootTime 字段;新增 VmBootTimeUtils 工具类负责读/写/清;在 VmInstanceBase.changeVmStateInDb 状态切换时调用、在 KVMHostFactoryKVM_REPORT_VM_REBOOT_EVENT handler 中调用;APIGetVmUptimeMsg 改为直接读 DB 不再走 host。影响 6 个文件,含 1 个 schema 迁移。
  • Round 1 initial findings: 2 × P0 (API 语义破坏 + 升级后存量 VM 丢失真实 boot time),4 × P1 (状态切换非原子 / 状态机覆盖不全 / TOCTOU / 时区缺失),5 × P2 (静态-实例混用 / 无单测 / 视图列维护 / 文件末尾缺换行 / 复制构造遗漏 reservedMemorySize)
  • Suggested fix direction: (a) 不要悄悄改 APIGetVmUptimeReply.uptime 的语义,或者新增字段 bootTime 同时保留 uptime 为持续时长,给前端/外部消费者一个过渡口子;(b) 升级时对存量 Running VM 做一次性 backfill(或第一次查询时 fallback 到 host agent,再缓存),不能直接把"now"当 boot time 写回;(c) 把 bootTime 更新与 state 更新合到同一事务;(d) 补 Starting/Rebooting/Stopped 之外的 → Running 路径(Unknown/Crashed);(e) 补 VmBootTimeUtils 单测。

🔴 P0 — Critical

# File:Line Issue Fix
1 compute/.../VmInstanceBase.java:3739-3743 & compute/.../VmBootTimeUtils.java:14-27 APIGetVmUptimeReply.uptime 语义被悄悄改了,对外契约不兼容。原实现走 host agent,GetVmUptimeReply.getUptime() 是宿主机/libvirt 报告的 VM 运行时长(一般是 uptime-like 持续时间秒数或 "X days Y hours")。新实现返回 "0""yyyy-MM-dd HH:mm:ss" 格式的绝对时间戳。字段名仍是 uptime,但语义从"持续时长"变成"启动时刻",所有现存监控/计费/SLA/Dashboard 消费者会拿到看不懂的字符串(甚至按时长解析后崩溃)。 两条路任选其一:(a) 保持 uptime 仍为持续时长字符串(在 reply 端基于 now - bootTime 计算后再 setUptime),新增 bootTime 字段单独暴露绝对时间;(b) 若确实要改语义,必须改 API 字段名(如 bootTime)+ 标记原 uptime deprecated + dev-portal 公告 + 同步前端,不能用同名字段塞不同语义的字符串
2 compute/.../VmBootTimeUtils.java:20-27 + conf/db/zsv/V5.1.0__schema.sql:39 升级后存量 Running VM 的 boot time 全部错成"升级时刻"。迁移 INSERT_COLUMN('VmInstanceEO', 'bootTime', ..., 1, NULL, ...) 给所有现存 VM 写入 NULL。升级后首次调用 getUptime 对 Running VM 会走 getBootTime → null → resetBootTime → 写入 now(),把"now"当作启动时刻持久化。一台已经运行 30 天的 VM 升级后 uptime 会回退到 0。原实现是查 host agent 拿真实值,这是对升级用户的硬性回归 需要 backfill。最稳:迁移脚本里对所有 state = 'Running' 的 VM,在升级时由管理节点发 GetVmUptimeMsg 拿到真实 boot time 后写入(一次性、可异步)。退而求其次:在 getUptime 路径中遇到 NULL 时先 fallback 到 host agent 查一次再缓存,不要把"now"当 boot time。最低限度也要返回 STOPPED_UPTIME / 空字符串而非 now,让上层显式知道"未知"。

🟡 P1 — High

# File:Line Issue Fix
1 compute/.../VmInstanceBase.java:398-401 state 切换和 bootTime 更新非原子changeVmStateInDb 改完 state 之后立刻 updateBootTimeAfterStateChanged,但后者是另一条 SQL.New(...).update(),跨事务。如果 state UPDATE 成功提交、bootTime UPDATE 因为偶发 DB 抖动失败,VM 进入 Running 但 bootTime 为 NULL,下一次 getUptime 又会把"now"写进去(同 P0-2)。state 流转链上的回滚逻辑也覆盖不到 bootTime。 两选一:(1) 把 bootTime 字段直接通过 self.setBootTime(...) 走当前 EntityManager 的 merge/persist,让它和 state 在同一事务;(2) 不在 state-change 时主动写 bootTime,让 getUptime 路径懒加载(但要先解决 P0-2 的 fallback)。无论哪种,都不应该让"state=Running 且 bootTime=NULL"成为一个可观察的稳定状态。
2 compute/.../VmInstanceBase.java:430-434 updateBootTimeAfterStateChanged 没覆盖 Unknown→Running 和 Crashed→Running。当前只在 oldState ∈ {Starting, Rebooting, Stopped}newState=Running 时 reset。Unknown→Running(agent 重连后恢复探测)以及 Crashed/Paused 自动恢复路径都不会触发。Unknown→Running 的语义是"我们重新感知到 VM 在跑",并非真的发生过 reboot,但因为 oldState 不在白名单里也不会触发"如果 NULL 就 reset"的兜底,会和 P0-2 叠加放大错误。 列举所有应当 reset 的转换;或反过来——只在 VM 真正重启(Stopped→Running、Rebooting→Running、KVM_REPORT_VM_REBOOT_EVENT)时 reset,Starting→Running 也算(首次启动)。Unknown→Running 不 reset 但要走"如果 NULL 就 fallback host 查一次"的逻辑,避免静默错值。
3 compute/.../VmBootTimeUtils.java:34-40 resetBootTimevmExistssetBootTime 有 TOCTOU。VM 可能在两次 SQL 之间被删,SQL.UPDATE 影响 0 行但方法照样返回一个格式化好的"假"时间戳,调用方拿到一个看似合法的字符串。在并发 delete + 状态切换路径下会出现幽灵 boot time。 直接合到一条 UPDATE 里并检查返回值:int n = SQL.New(...).set(bootTime, ts).update();n == 0return STOPPED_UPTIMEvmExists 这个独立步骤可以删掉。
4 compute/.../VmBootTimeUtils.java:15 yyyy-MM-dd HH:mm:ss 没带时区信息。多管理节点跨时区/DST 切换时,前端拿到的字符串语义不明确,做差值计算 uptime 会算错。原 host 实现返回的是持续时长,没有这个问题。 用 ISO-8601 带 offset(yyyy-MM-dd'T'HH:mm:ssXXX)或者干脆返回 epoch millis 让前端自己 format。如果产品坚持要本地显示字符串,至少在 dev-portal 注明这是"管理节点 local time"。

🟢 P2 — Moderate

# File:Line Issue Fix
1 header/.../VmInstanceAO.java:102-127(MR diff) 复制构造里加了 bootTime 拷贝,但 reservedMemorySize 等字段长期就漏拷贝(参考迁移 view 列表里有 reservedMemorySize)。本次没引入新漏拷贝,但本次正好在动这个方法,顺手对齐 view 字段集即可。 把 copy ctor 字段与 view 列表对齐,并加注释"新增字段必须同时更新此构造器、VmInstanceVO view、VmInstanceAO_ 元模型"。
2 compute/.../VmBootTimeUtils.java:14, 22, 30, 42 静态/实例方法混用getUptime/getBootTime/resetBootTime/clearBootTime 是实例方法,但内部 helper vmExists/setBootTime/formatBootTimestatic。两处调用点都用 private final static VmBootTimeUtils vmBootTimeUtils = new VmBootTimeUtils(); 当单例。要么全 static(推荐,类本身无状态),要么走 Spring @Component autowire,目前的写法二者皆非。 全部改为 public static,去掉 new VmBootTimeUtils() 单例字段。
3 (整 MR) 无单测VmBootTimeUtils 这种纯工具类 + 状态机驱动的逻辑,最容易在 state transition 集合上长草。本次还涉及 API 语义改动,应当至少覆盖:getUptime(stopped) / getUptime(running, has bootTime) / getUptime(running, null bootTime) / state 路径表里每条触发/不触发 reset 的转换。 VmBootTimeUtilsTest 单测;状态机覆盖建议挂到 ZTest 的 StabilityTestCase 里跑 2 次(按 CLAUDE.md 项目惯例)。
4 conf/db/zsv/V5.1.0__schema.sql:72compute/.../VmBootTimeUtils.java:83 两个新增文件都缺末尾换行符\ No newline at end of file)。 文件末尾加换行。
5 conf/db/zsv/V5.1.0__schema.sql:42-71 CREATE OR REPLACE VIEW VmInstanceVO 硬编码 26 列。本身是 ZStack schema 维护的惯例,但任何并行 MR 给 VmInstanceEO 加列又改 view 时,merge 顺序会导致后到的 MR 静默丢列。本次 view 列表和当前 VmInstanceAO 字段集是否完全一致需要在 merge 之前重新核对一次(特别是 reservedMemorySize 已经出现在列表里,但本仓 VmInstanceAO.java 本地版本里没找到对应字段——意味着源分支上还有未在此 MR 里展示的依赖)。 merge 前用 source 分支的 VmInstanceAO 字段集与 view 列表 diff 一次;考虑在迁移注释里写明本次新增 bootTime,并 grep 同窗口期是否有别的 MR 在改 VmInstanceVO

Pre-existing

本轮无 pre-existing 类问题需单独列出。

Coverage

  • 高信号(多角度独立命中):P0-1(语义不兼容)、P0-2(升级回归)、P1-1(非原子)。
  • 单源(需作者/二审复核):P1-2(状态机覆盖)、P1-4(时区)。
  • Upstream freshness:未发现 zsv_5.1.0 上游有冲突文件改动(changes_count=6,base/head 一致)。
  • Residual risks: API 字段语义改动会直接影响 dev-portal/UI/外部 SDK;需要产品 + 前端确认后再合并。
  • Testing gaps: 无任何新增测试;建议覆盖 VmBootTimeUtils 全部分支 + VmInstanceBase state transition 集合 + KVM reboot 事件路径。

Verdict: REVISION_REQUIRED

两条 P0 都属于"会让现网升级用户立即感知到错误"的级别(API 契约变更 + 存量 VM uptime 归零),合并前必须修复或与产品/SDK 团队达成"接受破坏 + 同步外部"的明确决策。P1 列表里的非原子/状态机覆盖问题与 P0-2 是同一类根因,建议一并处理。


🤖 Robot Reviewer

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java (1)

543-545: ⚡ Quick win

建议为重启事件增加 vmUuid 空值短路。

当前新增了 bootTime 重置调用,先做空值防护可避免异常输入导致回调链路抖动。

建议修改
restf.registerSyncHttpCallHandler(KVMConstant.KVM_REPORT_VM_REBOOT_EVENT, ReportVmRebootEventCmd.class, cmd -> {
+    if (StringUtils.isBlank(cmd.vmUuid)) {
+        logger.warn("ignore vm reboot event because vmUuid is blank");
+        return null;
+    }
     vmBootTimeUtils.resetBootTime(cmd.vmUuid);
     evf.fire(VmCanonicalEvents.VM_LIBVIRT_REPORT_REBOOT, cmd.vmUuid);

     return null;
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java` around lines 543
- 545, KVMHostFactory's handler for KVMConstant.KVM_REPORT_VM_REBOOT_EVENT calls
vmBootTimeUtils.resetBootTime(cmd.vmUuid) and evf.fire(...) without guarding
cmd.vmUuid; add a null/empty check for cmd.vmUuid at the start of the lambda
(for ReportVmRebootEventCmd) and short-circuit (return) if missing, optionally
logging a warning, so neither vmBootTimeUtils.resetBootTime nor
evf.fire(VmCanonicalEvents.VM_LIBVIRT_REPORT_REBOOT, cmd.vmUuid) runs with a
null/empty uuid.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 3742-3743: 当前把 APIGetVmUptimeMsg 在 VmInstanceBase 中统一改成读取
vmBootTimeUtils.getUptime(self.getUuid(), self.getState()) 会导致非 KVM 的虚拟机在 guest
重启但 host 状态仍为 Running 时返回错误的累计 uptime;请修改处理逻辑:在 VmInstanceBase 对该消息的处理里根据虚拟机所处
host 的 hypervisor 类型(或通过现有 host factory 判定)仍保留非 KVM 的 host-side 查询兜底(即对非 KVM
调用原来的从 Host/HostFactory 查询 uptime 的代码路径),仅在 KVM(或明确已同步 reboot 事件的
hypervisor)时才使用 vmBootTimeUtils.getUptime(self.getUuid(), self.getState());
参考符号:VmInstanceBase, APIGetVmUptimeMsg, vmBootTimeUtils.getUptime 和
KVMHostFactory 来定位并实现这个条件分支。

---

Nitpick comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java`:
- Around line 543-545: KVMHostFactory's handler for
KVMConstant.KVM_REPORT_VM_REBOOT_EVENT calls
vmBootTimeUtils.resetBootTime(cmd.vmUuid) and evf.fire(...) without guarding
cmd.vmUuid; add a null/empty check for cmd.vmUuid at the start of the lambda
(for ReportVmRebootEventCmd) and short-circuit (return) if missing, optionally
logging a warning, so neither vmBootTimeUtils.resetBootTime nor
evf.fire(VmCanonicalEvents.VM_LIBVIRT_REPORT_REBOOT, cmd.vmUuid) runs with a
null/empty uuid.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b4861025-3169-4860-b204-93e4a4042375

📥 Commits

Reviewing files that changed from the base of the PR and between 8740fbd and 88ff08e.

📒 Files selected for processing (6)
  • compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • conf/db/zsv/V5.1.0__schema.sql
  • header/src/main/java/org/zstack/header/vm/VmInstanceAO.java
  • header/src/main/java/org/zstack/header/vm/VmInstanceAO_.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java

Comment thread compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java Outdated
}
});
reply.setUptime(vmBootTimeUtils.getUptime(self.getUuid(), self.getState()));
bus.reply(msg, reply);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from taiyue.chen:

这个也和“当升级时,bootTime为null,所有虚拟机uptime被重置为0“有关。对于该问题,应当设置一个兜底策略,即当uptime为null时,应当从去从各个host上拿对应的时间。

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

Comment from taiyue.chen:

re.getUptime() 这个返回的格式是 ”yyyy-MM-dd HH:mm:ss“,我已经在真机上测试过。所以,P0-1不成立。
原先版本也是返回,这个是什么时候boot的,所以P0-2也不成立。

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/ZSV-12297 branch 2 times, most recently from a84f8f4 to 3fa4d40 Compare May 26, 2026 05:18
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java (1)

43-50: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

resetBootTime 存在 TOCTOU 竞态。

Line 44 先 vmExists() 再在 Line 49 更新,期间目标记录可能被删除或状态变化,导致返回值与真实写入结果不一致。建议改成单条条件更新并基于 affected rows 判断结果,避免检查与写入分离。

可选修复示例
 public String resetBootTime(String vmUuid) {
-    if (!vmExists(vmUuid)) {
+    if (StringUtils.isBlank(vmUuid)) {
         return STOPPED_UPTIME;
     }

     Timestamp bootTime = Timestamp.valueOf(LocalDateTime.now());
-    setBootTime(vmUuid, bootTime);
-    return formatBootTime(bootTime);
+    int updated = SQL.New(VmInstanceVO.class)
+            .eq(VmInstanceVO_.uuid, vmUuid)
+            .set(VmInstanceVO_.bootTime, bootTime)
+            .update();
+    return updated > 0 ? formatBootTime(bootTime) : STOPPED_UPTIME;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java` around
lines 43 - 50, The resetBootTime method has a TOCTOU race: it calls
vmExists(vmUuid) then setBootTime(vmUuid, bootTime), which can race with
deletes/updates; change resetBootTime to perform a single conditional update
(e.g. "update ... where uuid = ? and <running/exists condition>") in one DB call
and use the returned affected-rows to decide whether to return STOPPED_UPTIME or
formatBootTime(bootTime); remove the separate vmExists check and ensure you call
the same setBootTime/SQL logic (or refactor setBootTime into an
update-returning-rows variant) so the method's return reflects the actual write
result.
🧹 Nitpick comments (1)
compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java (1)

106-109: ⚡ Quick win

解析失败被静默吞掉,排障信号丢失。

Line 108-109 直接忽略 DateTimeParseException,会让 host uptime 格式漂移时难以及时定位。建议至少记录 debug/warn 日志(包含 vmUuid/原始 uptime)。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java` around
lines 106 - 109, The DateTimeParseException in VmBootTimeUtils is currently
swallowed; update the catch block where bootTime is parsed (using
BOOT_TIME_FORMATTER and the bootTime variable) to log a warning or debug message
that includes the vmUuid (if available) and the original bootTime string plus
the exception details, then continue to return null; ensure the log uses the
class logger (VmBootTimeUtils) and includes clear context ("vmUuid", "bootTime",
and exception) so parsing failures are recorded for troubleshooting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java`:
- Around line 19-29: The method VmBootTimeUtils#getUptime(String vmUuid, boolean
isRunning) uses a boolean to convey VM state, which is unclear and error-prone;
change it to a clearer API by either (A) adding two semantic methods
getRunningUptime(String vmUuid) and getStoppedUptime(String vmUuid) that
encapsulate the current logic, or (B) replace the boolean signature with
getUptime(String vmUuid, VmInstanceState state) and branch on the enum (only
call getBootTime when state == VmInstanceState.Running); update all callers in
VmInstanceBase to use the new methods/enum and remove usages that pass raw
booleans. Ensure UNKNOWN_UPTIME/STOPPED_UPTIME behavior is preserved and
deprecate/remove the old boolean-based method.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 424-437: getBootTimeAfterStateChanged currently sets bootTime =
now for broad transitions (e.g., Stopped -> Running) causing incorrect
truncation of uptime and misses the first-run Resuming case; change the logic in
getBootTimeAfterStateChanged to only set Timestamp.valueOf(LocalDateTime.now())
when you can confirm a real start/reboot completion (e.g., oldState ==
VmInstanceState.Starting || oldState == VmInstanceState.Rebooting -> newState ==
Running) and do not unconditionally set now for Stopped -> Running so host-side
checkState/APIGetVmUptimeMsg can correct uptime; additionally, add a branch to
set now when oldState == VmInstanceState.Resuming && currentBootTime == null
(first resume -> Running) so first-time resumes get a bootTime written.

In `@conf/db/zsv/V5.1.0__schema.sql`:
- Line 42: The migration only adds a nullable bootTime column via INSERT_COLUMN
for VmInstanceEO but doesn't backfill historical running VMs, so post-upgrade
callers using VmBootTimeUtils will see unknown uptimes; add an idempotent
backfill step in the same migration that initializes VmInstanceEO.bootTime for
existing rows in running state (use the same source VmBootTimeUtils relies on —
e.g., existing uptime/lastPing/created_at or the calculated candidate from your
service — falling back to created_at or current_timestamp-uptime if available),
and document the approach; ensure the backfill is safe to re-run and preserves
NULL for VMs where no reasonable bootTime can be inferred.

---

Outside diff comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java`:
- Around line 43-50: The resetBootTime method has a TOCTOU race: it calls
vmExists(vmUuid) then setBootTime(vmUuid, bootTime), which can race with
deletes/updates; change resetBootTime to perform a single conditional update
(e.g. "update ... where uuid = ? and <running/exists condition>") in one DB call
and use the returned affected-rows to decide whether to return STOPPED_UPTIME or
formatBootTime(bootTime); remove the separate vmExists check and ensure you call
the same setBootTime/SQL logic (or refactor setBootTime into an
update-returning-rows variant) so the method's return reflects the actual write
result.

---

Nitpick comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java`:
- Around line 106-109: The DateTimeParseException in VmBootTimeUtils is
currently swallowed; update the catch block where bootTime is parsed (using
BOOT_TIME_FORMATTER and the bootTime variable) to log a warning or debug message
that includes the vmUuid (if available) and the original bootTime string plus
the exception details, then continue to return null; ensure the log uses the
class logger (VmBootTimeUtils) and includes clear context ("vmUuid", "bootTime",
and exception) so parsing failures are recorded for troubleshooting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 23762618-a750-44ca-af90-8c5c9e15e6b4

📥 Commits

Reviewing files that changed from the base of the PR and between 88ff08e and 3fa4d40.

📒 Files selected for processing (3)
  • compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • conf/db/zsv/V5.1.0__schema.sql

Comment thread compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java Outdated
Comment thread compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java

-- Feature: VM boot time | ZSV-12297

CALL INSERT_COLUMN('VmInstanceEO', 'bootTime', 'timestamp NULL', 1, NULL, 'state');
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

补齐升级场景的历史数据回填策略。

Line 42 仅新增了可空 bootTime 列,但迁移脚本里没有处理“已在运行中的历史 VM”初始化。结合 VmBootTimeUtils 在 Line 28 的行为,升级后这批 VM 在 bootTime 缺失时会返回未知 uptime,影响升级后可用性与一致性。建议在迁移阶段补充回填方案(或明确并保证可接受的惰性回填路径与行为)。

As per coding guidelines “Upgrading scene has been carefully handled”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/db/zsv/V5.1.0__schema.sql` at line 42, The migration only adds a
nullable bootTime column via INSERT_COLUMN for VmInstanceEO but doesn't backfill
historical running VMs, so post-upgrade callers using VmBootTimeUtils will see
unknown uptimes; add an idempotent backfill step in the same migration that
initializes VmInstanceEO.bootTime for existing rows in running state (use the
same source VmBootTimeUtils relies on — e.g., existing
uptime/lastPing/created_at or the calculated candidate from your service —
falling back to created_at or current_timestamp-uptime if available), and
document the approach; ensure the backfill is safe to re-run and preserves NULL
for VMs where no reasonable bootTime can be inferred.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from taiyue.chen:

这个会在后面兜底时,自动去各个机器上查询其状态进行回填。所以,这边无需这么操作。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ZStack-Robot,感谢 taiyue.chen 的说明。

确认代码中存在兜底路径:当 APIGetVmUptimeMsg 被调用且 bootTimenull 时,VmInstanceBase 会回退到向 Host Agent 发送 GetVmUptimeMsg,成功后通过 backfillBootTimeIfMissing 将结果写回数据库。

需要注意的是,这是按需(request-triggered)的惰性回填,而非升级时的主动回填:

  • 升级后,直到有调用方对某台 VM 发起 getUptime 请求之前,其 bootTime 仍为 null
  • 每次未命中都会产生一次 Host Agent 查询,直到写入成功为止。

如果产品侧和 UI 侧对这个"首次查询延迟"行为可接受,则当前方案是合理的。

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/ZSV-12297 branch 2 times, most recently from d3c25a2 to 5f2d534 Compare May 26, 2026 09:25
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java`:
- Around line 36-43: The current resetBootTime performs a separate vmExists
check then calls setBootTime, creating a TOCTOU race where the VM may be removed
between the check and update; change resetBootTime to perform a single
conditional update by uuid (i.e. issue an UPDATE that sets boot_time = now()
WHERE uuid = :vmUuid) and treat zero rows affected as "not found" instead of
relying on vmExists, or adjust setBootTime to execute that single UPDATE and
return the affected-row count so resetBootTime can call it directly without a
prior vmExists check.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`:
- Around line 3758-3776: The code mixes bootTime and uptime: both the local-path
(vmBootTimeUtils.getBootTime) and the host-path set reply.setUptime to a boot
timestamp, changing semantics; update VmInstanceBase so reply.setUptime always
contains the runtime duration (use GetVmUptimeReply#getUptime when handling the
host response) and do not overwrite it with vmBootTimeUtils.getBootTime; if you
need to persist or backfill boot timestamp keep
vmBootTimeUtils.backfillBootTimeIfMissing(self.getUuid(), re.getUptime()) but
store that value into a new field (e.g., reply.setBootTime or internal-only
variable) rather than reply.setUptime, and ensure the initial fast-path still
computes/returns a duration string rather than a boot timestamp.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f6788aa0-b5f9-4ad4-ad9b-7790249a32de

📥 Commits

Reviewing files that changed from the base of the PR and between 3fa4d40 and 5f2d534.

📒 Files selected for processing (3)
  • compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java
  • compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
  • conf/db/zsv/V5.1.0__schema.sql

Comment thread compute/src/main/java/org/zstack/compute/vm/VmBootTimeUtils.java
Comment on lines 3758 to +3776
@@ -3732,12 +3770,12 @@ private void handle(APIGetVmUptimeMsg msg) {
bus.send(gmsg, new CloudBusCallBack(msg) {
@Override
public void run(MessageReply r) {
if (!r.isSuccess()) {
reply.setSuccess(false);
reply.setError(r.getError());
} else {
if (r.isSuccess()) {
GetVmUptimeReply re = (GetVmUptimeReply) r;
reply.setUptime(re.getUptime());
vmBootTimeUtils.backfillBootTimeIfMissing(self.getUuid(), re.getUptime());
reply.setUptime(StringUtils.defaultString(vmBootTimeUtils.getBootTime(self.getUuid()), VmBootTimeUtils.UNKNOWN_UPTIME));
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

APIGetVmUptimeMsg 现在把 bootTime 塞进了 uptime 字段。

这里两条成功路径最终都执行 reply.setUptime(vmBootTimeUtils.getBootTime(...)),连 host 兜底成功后也不会直接返回 GetVmUptimeReply#getUptime()。这样 uptime 的语义已经变成了“启动时间”,并且返回格式会随本地是否已有 bootTime 而变化。建议保持 uptime 继续返回运行时长,bootTime 另起字段或仅作为内部计算值。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java` around lines
3758 - 3776, The code mixes bootTime and uptime: both the local-path
(vmBootTimeUtils.getBootTime) and the host-path set reply.setUptime to a boot
timestamp, changing semantics; update VmInstanceBase so reply.setUptime always
contains the runtime duration (use GetVmUptimeReply#getUptime when handling the
host response) and do not overwrite it with vmBootTimeUtils.getBootTime; if you
need to persist or backfill boot timestamp keep
vmBootTimeUtils.backfillBootTimeIfMissing(self.getUuid(), re.getUptime()) but
store that value into a new field (e.g., reply.setBootTime or internal-only
variable) rather than reply.setUptime, and ensure the initial fast-path still
computes/returns a duration string rather than a boot timestamp.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from taiyue.chen:

这边upTime实际返回就是bootTime,没有问题。

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zstack-robot-2 明白了,感谢说明。既然 APIGetVmUptimeReply.uptime 在当前约定里实际就是返回 bootTime,那我这条“语义从 uptime 变成 bootTime”的评论不成立;这里按现有实现返回 vmBootTimeUtils.getBootTime(...) 没问题。我会按这个约定看待后续相关改动。

🐰🤔


✏️ Learnings added
Learnt from: zstack-robot-2
Repo: MatheMatrix/zstack PR: 4096
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:3758-3776
Timestamp: 2026-05-26T09:38:38.804Z
Learning: In MatheMatrix/zstack, for the VM uptime API (`APIGetVmUptimeMsg` / `APIGetVmUptimeReply.uptime` in `compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java`), the `uptime` field is expected by maintainers to return the VM boot time value, not necessarily a duration. Do not flag `reply.setUptime(vmBootTimeUtils.getBootTime(...))` as an API semantic break solely because it returns bootTime.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 3826
File: compute/src/main/java/org/zstack/compute/allocator/HostAllocatorChain.java:261-265
Timestamp: 2026-04-22T04:38:00.864Z
Learning: 在 ZStack 的 `Platform.err()` / `Platform.toRawString()` 调用中,传入用于格式化的 `%s` 参数若实现了 `org.zstack.header.core.I18nMessage`,框架会自动通过 `getDetails()` 取得原始文本、并通过 `getI18nDetails()` 取得国际化文本。因此,在代码审查中不应将“直接把实现了 `I18nMessage` 的对象作为 `%s` 参数传入而可能打印出对象地址/不安全”当作问题;此用法是正确且应当被视为安全的(例如将 `HostCandidate.RejectedCandidate` 这类实现了 `I18nMessage` 的对象作为 `%s` 参数传入)。

Learnt from: ZStack-Robot
Repo: MatheMatrix/zstack PR: 3837
File: compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java:8789-8791
Timestamp: 2026-04-22T14:27:42.894Z
Learning: In ZStack compute code, calls to VmInstanceResourceMetadataManager.deleteVmResourceMetadata(...) should generally be treated as best-effort: do not fail/block the primary VM operation when deletion fails (for example, during detach/swap flows such as VmDetachNicFlow, VolumeManagerImpl, KVMHostFactory, and VmInstanceBase when swapping the default CD-ROM). Prefer logging the returned ErrorCode (and relevant context) rather than enforcing strict failure, unless maintainers explicitly request strict enforcement for that specific flow.

Learnt from: zstack-robot-1
Repo: MatheMatrix/zstack PR: 3906
File: compute/src/main/java/org/zstack/compute/zone/ZoneManagerImpl.java:162-164
Timestamp: 2026-05-09T02:01:09.283Z
Learning: In ZStack compute code, when invoking post-commit extension point hooks (for example `ZoneExtensionPointEmitter.afterCreate`, `ClusterExtensionPointEmitter.afterCreate`, and other `*ExtensionPointEmitter.after*` hooks), treat extension implementations as fail-loud by convention. Do not wrap these hook invocations in `try/catch` to suppress or convert exceptions—let exceptions thrown by the extension propagate to the caller so they are detected and handled. If you must do local cleanup, use `finally` for resource cleanup, but do not swallow or mask the thrown exception.

Resolves: ZSV-12297

Change-Id: I617664617a6f6c66736f6b717575636f676f7961
@MatheMatrix MatheMatrix force-pushed the sync/zstackio/ZSV-12297 branch from 5f2d534 to 53ad7b6 Compare May 26, 2026 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants