feat(ci): add integration test workflows (single-node + multinode) by warku123 · Pull Request #6789 · tronprotocol/java-tron

warku123 · 2026-05-20T08:31:45Z

What does this PR do?

Adds two GitHub Actions workflows that gate PRs to release_v4.8.2 (and develop, release_**) with the modernized integration test suite, and removes system-test.yml since the new workflows supersede it. Single-node and multinode workflows build FullNode.jar from the PR, pull troninfra/troninfra-ci:latest (public DockerHub image), and run the full integration test target against the freshly-built jar.

The integration test image is published from a separate upstream-owned repo (tron_integration-test/java-tron_integration-test on internal GitLab); its release pipeline pushes vX.Y.Z + :latest tags to DockerHub. This PR's workflows just consume that published image — no source access required.

Why are these changes required?

The integration test suite (JUnit 5 + AssertJ on TronTestBase fixtures) covers gRPC / HTTP / JSON-RPC, V2 staking, governance, smart contracts, event subscription, multi-witness consensus boundaries, and several other areas that system-test either didn't touch or asserted only weakly on. Up to now it has been a pre-release manual gate; wiring it into PR CI catches integration regressions before merge instead of after.

This PR has been tested by:

Manual Testing — fork PR warku123/java-tron#5 ran Integration Test Single Node (Full), 73/73 pass.
Manual Testing — fork PR warku123/java-tron#7 ran Integration Test Multinode (Full), all multinode tests pass.

Follow up

Two assertions in the integration suite hard-code java.version.startsWith("1.8"). The workflow works around it by setting JAVA_HOME to JDK 8 inside the container; the assertions themselves will be relaxed in a separate MR on the integration-test repo.
If multinode's per-PR runner cost becomes a pain point, it can be moved to nightly + workflow_dispatch later.

Extra details

Wall-clock impact: ~24 min ceiling from multinode (vs ~19.5 min for pr-build) — net +5 min on PR wall-clock.
Image dependency: troninfra/troninfra-ci:latest is public and rolls forward on each vX.Y.Z tag from the integration-test repo. Workflows can be pinned to a specific tag if needed.

Pulls troninfra/troninfra-ci:latest from DockerHub on every PR/dispatch and runs the integration test suite against the FullNode.jar built from the PR. Uses JDK 8 inside the container so FullNode runs on the production runtime; JAVA_HOME_17 keeps Gradle on JDK 17 for the test tooling. Image is built and published from a separate repo (tron_integration-test/java-tron_integration-test) as part of that project's release pipeline (vX.Y.Z tag push).

Mirrors the single-node workflow but starts a 3-witness docker-compose stack via the integration-test image and runs the multinode test target against it. HOST_WORKDIR is set so the script's path-alignment logic resolves compose configs to host paths usable by the host docker daemon (DinD via socket mount). Builds the PR's FullNode.jar, wraps it in a local docker image that inherits from tronprotocol/java-tron:latest, and injects it into the multinode stack via TRON_IMAGE.

Both Integration Test workflows (Single Node / Multinode, Full) cover strictly more than the existing System Test runs. Drop System Test from the PR + push auto-trigger set so every commit doesn't pay the ~7 min runner cost twice for overlapping coverage. workflow_dispatch is kept so a regression suspect can still run stest manually from the Actions UI if needed. To re-enable for routine PR runs, uncomment the push + pull_request blocks at the top of the file.

bladehan1 · 2026-05-21T08:24:22Z

@@ -0,0 +1,117 @@
+name: Integration Test Multinode (Full)


[SHOULD] Sync pr-cancel.yml workflows list with the workflow changes in this PR

.github/workflows/pr-cancel.yml:20 hard-codes the workflows list as ['pr-build.yml', 'system-test.yml', 'codeql.yml']. This PR deletes system-test.yml and adds two new workflows, but does not touch pr-cancel.yml. Once this PR is merged:

github.rest.actions.listWorkflowRuns({workflow_id: 'system-test.yml', ...}) will hit a 404 on the deleted workflow file. actions/github-script@v8 has no try/catch around the loop, so the whole cancel job throws — meaning pr-build.yml, codeql.yml, and the two new integration-test workflows are never cancelled on a closed-unmerged PR.

Multinode timeout is 60 min — a single un-cancelled run consumes a full runner slot.

Suggestion: in this PR, update pr-cancel.yml:20 to remove 'system-test.yml', add 'integration-test-single-node.yml' and 'integration-test-multinode.yml', and wrap each workflow iteration in a try/catch to make the job resilient to future drift.

Good catch — both issues are valid. Fixed in eed9dad7d:

Updated the workflow list: dropped system-test.yml, added integration-test-single-node.yml and integration-test-multinode.yml.

Wrapped each workflow iteration in a try/catch so a missing or renamed file logs a Skipping ... line instead of taking down the whole cancel job. This makes the cancel job resilient to future drift, not just the current rename.

bladehan1 · 2026-05-21T08:24:22Z

+name: Integration Test Single Node (Full)
+
+on:
+  pull_request:


[DISCUSS] Is this workflow intended to be a hard merge gate, or an informational signal?

The PR description frames the new workflows as 'wiring it into PR CI catches integration regressions before merge instead of after' — i.e., a real gate. However, the workflow files alone do not block merge: even if Integration Test Single Node Full or Integration Test Multinode Full fails red, GitHub will still allow merge unless these check names are explicitly added to the repository's required status checks under Branches → Branch protection rules.

That configuration lives outside the repository tree and is not part of this PR, so two questions to clarify before / soon after merge:

Intent: are these meant to be required checks on release_v4.8.2 / develop, or informational only for now (e.g., let the new suite bake for 1–2 release cycles before promoting to required)?

Rollout: if required, who owns adding Integration Test Single Node Full (JDK 8 / x86_64) and Integration Test Multinode Full (JDK 8 / x86_64) to the branch protection rules on which protected branches, and when? Without that follow-up, replacing system-test.yml (which is also not currently required, but conceptually was the gate) silently downgrades the gating posture — failures become advisory.

If gating is the goal, suggest tracking the branch-protection update as an explicit follow-up in the PR description (same vein as the existing 'two java.version assertions' and 'multinode cost' follow-ups). If informational-only is the goal, please say so in the PR description so reviewers don't assume blocking semantics.

Confirming: hard merge gates on release_v4.8.2 and develop is the intent, not informational. You're right that the workflow files alone don't enforce that — the branch protection update is a follow-up that lives outside this PR.

The reasoning: system-test.yml has historically been the on-chain-behavior gate — gRPC/HTTP/JSON-RPC responses, governance lifecycle, V2 staking, transaction validation, multi-witness consensus boundaries. The new workflows cover the same surface with strictly tighter assertions (JUnit 5 + AssertJ hard assertions on exact values, instead of isNotNull/isNotEmpty checks). Replacing stest without making the new workflows required would silently downgrade the gating posture — which is exactly the risk you flagged.

The two workflows are complementary, not redundant:

Integration Test Single Node (Full) runs the suite against one FullNode. Covers the bulk of on-chain behavior: API correctness (gRPC / HTTP / JSON-RPC response shapes and values), transaction validation, governance proposal lifecycle, V2 staking, smart contract execution, event subscription. Anything that depends on a single node's state machine.

Integration Test Multinode (Full) spins up a 3-witness docker-compose stack. Covers what one node can't observe: witness rotation across maintenance boundaries, block propagation between peers, solidification dynamics with multiple confirmations, view-change behavior, isolated-land detection, multi-witness-specific config matrices (RocksDB on node2, native event queue on node3, prometheus on node1, etc.).

A regression in chain parameter logic typically shows up only in single-node; a regression in consensus / P2P / witness scheduling typically shows up only in multinode. Removing either one leaves a class of regressions undetectable at PR time — which is why both need to be required.

Hardcodes getMaintenanceTimeInterval to return 6000L instead of reading from DB. The Integration Test workflows added in this PR should catch this protocol-level regression and turn CI red. REVERT BEFORE MERGE.

This reverts commit 8ce5c51.

… gating Truncates ConsensusDelegate.getActiveWitnesses() to return only the first witness. Single-node setups have 1 witness so this is effectively a no-op there; multinode setups (3 witnesses) lose rotation entirely, which the Integration Test (Multinode Full) workflow should catch while the Single Node workflow stays green. REVERT BEFORE MERGE.

…cific CI gating" This reverts commit 703c087.

warku123 · 2026-05-21T10:13:02Z

To address the concern about whether an integration test failure actually reaches the PR check status:

The "Run integration tests" step runs the container via docker run, which surfaces the container's exit code. Inside the container, ./run-tests.sh invokes Gradle's test target — Gradle exits non-zero on any test failure. That non-zero exit propagates all the way up: step → job → workflow run → PR check (red).

Verified end-to-end on this PR:

The pipeline on 196db87 (before any demo break) was green across all checks.
Commit 8ce5c51 deliberately broke a single line — hardcoded getMaintenanceTimeInterval() to return a wrong value. PR Build (via chainbase UT) and Integration Test (Single Node Full) both went red.
Commit ccbc8f0 reverted that one-line change. The next pipeline returned to green.

So a single bad line of Java did flip the PR from green to red, and the revert flipped it back. Combined with the repo's branch protection (all required checks green + ≥2 maintainer approvals), integration test failures block merge in practice.

system-test.yml is removed in this PR and replaced by two new workflows (integration-test-single-node.yml + integration-test-multinode.yml). Update pr-cancel.yml's hard-coded workflow list to match: drop system-test.yml, add the two new ones. Also wrap each workflow iteration in try/catch so that a missing or renamed workflow file no longer takes down the whole cancel job — the remaining workflows still get processed and a clear "Skipping ..." line is logged for the offending one. Closes the resilience gap flagged in code review.

halibobo1205 · 2026-05-29T03:43:12Z

+name: Integration Test Multinode (Full)
+
+on:
+  pull_request:


Post-merge validation gap

Removing the push: [master, release_**] trigger means the suite only ever runs against a merge preview (refs/pull/N/merge), computed against the base HEAD at the time the PR's CI ran — never against the actual post-merge tree. A semantic conflict between two independently green PRs slips through:

PR A changes a caller: wallet.getReward(addr) → wallet.getReward(addr, cycle).

PR B, opened earlier, adds a new caller of the old signature, getReward(addr), in a different file.

Both PRs run integration tests. Each merge preview contains only its own change, so both go green.

PR A merges. Then PR B merges (its CI ran before A landed and was never re-evaluated against A's change, since "require up to date" is off).

The branch now has a call to a signature that no longer exists. No PR's CI ever saw this combination, and with the push trigger gone, there is no post-merge run to catch it — it surfaces only when someone next builds the release branch.

The old system-test.yml ran on push to master/release_**, so it re-validated the real merged code after every merge and would have caught this. That safety net is dropped in this PR.

Recommended approach (post-merge net as baseline, pre-merge enforcement optional):

Keep single-node as the required per-PR gate.

Add back a push trigger running the full suite as a post-merge catch-net — runs once per merge, not once per base advance:
on: push: branches: [ 'master', 'release_**' ] pull_request: branches: [ 'develop', 'release_**' ] types: [ opened, synchronize, reopened ] workflow_dispatch:

Move multinode off the per-PR path → nightly (schedule) + workflow_dispatch + post-merge push (cost-driven; consistent with the PR's own "move multinode to nightly" note).

Optional: enable "Require branches to be up to date before merging" to catch conflicts pre-merge — but given the 18–25 min suite cost, this forces frequent expensive reruns; if used, scope it to single-node only.

Thanks for catching the post-merge gap — push trigger is back on both single-node and multinode workflows (master + release_**, matching the old system-test.yml).

On moving multinode off per-PR, pushing back gently to gauge whether the cost argument applies here:

Runner cost: java-tron is a public OSS repo, so github-hosted runners are free — no metered minutes to optimize for.

Wall-clock: pr-build already takes ~20 min today. Multinode caps at ~24 min and runs in parallel with pr-build, so the net per-PR wait goes from ~20 min → ~25 min — about +5 min. (Same number as the PR description's "+5 min on PR wall-clock".)

Given multinode is the only gate that exercises witness rotation, block propagation between peers, solidification dynamics, view-change, and the multi-witness config matrix — none of which single-node can observe — is +5 min wall-clock acceptable to keep it as a per-PR gate?

If you'd still prefer cron, two follow-up questions so I can wire it correctly:

Frequency — nightly (1×/day), or less often (e.g., 3×/week, weekly)?

Time — any preferred UTC window (off-peak for the team, aligned with an existing nightly, etc.)?

The old system-test.yml ran on push to master and release_** branches, which re-validated the actual merged tree after every merge. Removing that trigger left a post-merge gap: integration tests only ever run against PR merge previews (computed at PR-CI time against the base HEAD then), so a semantic conflict between two independently-green PRs that land back-to-back is never re-evaluated against the real merged code. Restoring the push trigger on both single-node and multinode workflows brings back the post-merge catch-net.

halibobo1205

LGTM. Post-merge push trigger restored and multinode kept as a per-PR gate — main concerns addressed. Non-blocking follow-ups: pin the :latest images, add permissions: contents: read, and align the concurrency groups. Please make sure branch-protection required checks swap "System Test" for the two new check names when this lands.

bladehan1

LGTM

warku123 added 3 commits May 20, 2026 15:50

github-actions Bot requested review from bladehan1 and halibobo1205 May 20, 2026 08:31

ci: delete System Test workflow (superseded by Integration Test)

196db87

bladehan1 reviewed May 21, 2026

View reviewed changes

warku123 added 3 commits May 21, 2026 16:56

DEMO: break getMaintenanceTimeInterval to prove CI gating

8ce5c51

Hardcodes getMaintenanceTimeInterval to return 6000L instead of reading from DB. The Integration Test workflows added in this PR should catch this protocol-level regression and turn CI red. REVERT BEFORE MERGE.

Revert "DEMO: break getMaintenanceTimeInterval to prove CI gating"

758ad78

This reverts commit 8ce5c51.

halibobo1205 added the topic:CI label May 21, 2026

halibobo1205 added this to the GreatVoyage-v4.8.2 milestone May 21, 2026

Revert "DEMO: break multinode witness rotation to prove multinode-spe…

ccbc8f0

…cific CI gating" This reverts commit 703c087.

halibobo1205 reviewed May 29, 2026

View reviewed changes

halibobo1205 approved these changes May 29, 2026

View reviewed changes

warku123 requested a review from bladehan1 May 29, 2026 07:36

bladehan1 approved these changes May 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ci): add integration test workflows (single-node + multinode)#6789

feat(ci): add integration test workflows (single-node + multinode)#6789
warku123 wants to merge 10 commits into
tronprotocol:release_v4.8.2from
warku123:feat/ci-integration-tests

warku123 commented May 20, 2026 •

edited

Loading

Uh oh!

bladehan1 May 21, 2026

Uh oh!

warku123 May 21, 2026

Uh oh!

bladehan1 May 21, 2026

Uh oh!

warku123 May 21, 2026

Uh oh!

warku123 commented May 21, 2026 •

edited

Loading

Uh oh!

halibobo1205 May 29, 2026

Uh oh!

warku123 May 29, 2026

Uh oh!

halibobo1205 left a comment

Uh oh!

bladehan1 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

warku123 commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bladehan1 May 21, 2026

Choose a reason for hiding this comment

Uh oh!

warku123 May 21, 2026

Choose a reason for hiding this comment

Uh oh!

bladehan1 May 21, 2026

Choose a reason for hiding this comment

Uh oh!

warku123 May 21, 2026

Choose a reason for hiding this comment

Uh oh!

warku123 commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

halibobo1205 May 29, 2026

Choose a reason for hiding this comment

Post-merge validation gap

Uh oh!

warku123 May 29, 2026

Choose a reason for hiding this comment

Uh oh!

halibobo1205 left a comment

Choose a reason for hiding this comment

Uh oh!

bladehan1 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

warku123 commented May 20, 2026 •

edited

Loading

warku123 commented May 21, 2026 •

edited

Loading