ci: fix flaky integration tests by distributing images via GHCR #3582
amir-deris wants to merge 5 commits into
PR Summary
Medium Risk
Overview
Matrix jobs log in to GHCR, Adds
Reviewed by Cursor Bugbot for commit 061d214.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #3582 +/- ##
==========================================
- Coverage 59.22% 58.35% -0.87%
==========================================
Files 2214 2140 -74
Lines 183389 174842 -8547
==========================================
- Hits 108604 102031 -6573
+ Misses 64994 63720 -1274
+ Partials 9791 9091 -700
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Problem
The `Docker Integration Test` workflow packaged the localnode/rpcnode Docker images into a ~1 GB artifact (`integration-docker-images.tar.zst`) that ~40 matrix jobs each downloaded concurrently via `actions/download-artifact@v4`. The action streams and extracts the zip without an end-to-end integrity check, so a prematurely closed connection can leave a truncated file without failing the step. The first step to notice the truncation was `zstd -d | docker load`, which failed with `Read error (39): premature end` / `unexpected EOF` and required a manual rerun. With 40 concurrent 1 GB downloads per run, this flaked regularly.
Fix
Distribute the images via GHCR instead of an artifact. Registry pulls are content-addressed — every layer is sha256-verified and retried automatically by the docker client — so truncation cannot slip through silently.
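As a rough illustration of why content addressing catches truncation, the sketch below checks a blob against its sha256 digest, which is conceptually what a registry client does for each layer. The function names `sha256_digest` and `verify_blob` are hypothetical, and this is a simplified model rather than the Docker client's actual code.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return a registry-style digest string for a blob (hypothetical helper)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_blob(data: bytes, expected_digest: str) -> bool:
    """Accept the blob only if its content hashes to the expected digest."""
    return sha256_digest(data) == expected_digest


# Stand-in for an image layer; real layers are compressed tarballs.
layer = b"layer-bytes" * 1000
digest = sha256_digest(layer)

# Simulate a connection that closed halfway through the download.
truncated = layer[: len(layer) // 2]

print(verify_blob(layer, digest))      # True
print(verify_blob(truncated, digest))  # False
```

A truncated download cannot match the digest, so the failure is detected at pull time instead of surfacing later in an unrelated step.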
- `prepare-cluster` pushes both images to `ghcr.io/sei-protocol/sei-chain-integration-test-{localnode,rpcnode}:<run_id>` using `GITHUB_TOKEN` (no OIDC or external secrets required). The CI artifact now carries only the small `seid` tarball.
- Matrix jobs log in to GHCR, `docker pull` the run-tagged images, and retag them to `sei-chain/{localnode,rpcnode}`. Everything downstream (`docker-cluster-start-ci` etc.) is unchanged.
- The images carry a `sei-chain.ci-run-id` label so every run pushes a unique image digest. Labels are config-only: the layer cache is unaffected, and a cache-hit run uploads just a new config blob and manifest. This avoids the pitfall of retagging a stable digest, where in-flight runs could be affected by tag moves.
- Images are tagged by `run_id` and persist in GHCR across attempts.
- Adds `ghcr-integration-test-cleanup.yml`: a weekly scheduled workflow (Sundays 06:00 UTC) that prunes run-id tags older than 14 days from both GHCR repos while preserving the `:cache` tag. It supports `workflow_dispatch` with a dry-run option.
Advantage over ECR
It avoids roughly $3,000 per month in egress charges from AWS to GitHub runners. In addition, `GITHUB_TOKEN` is automatically available to all workflows, including fork PRs, which removes the need for OIDC role assumption or AWS credentials for image distribution. No IAM setup is required.
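The pruning rule described for `ghcr-integration-test-cleanup.yml` can be sketched as follows. This is a minimal illustration in Python, not the workflow's actual implementation: the function names, the input shape (pairs of tag and creation time), and the assumption that run-id tags are purely numeric are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)   # prune run-id tags older than this
PROTECTED_TAGS = {"cache"}       # the :cache tag is never pruned


def select_tags_to_prune(versions, now):
    """Return tags that look like run ids and are older than the retention window.

    `versions` is a list of (tag, created_at) pairs. Treating run-id tags as
    all-digit strings is an assumption made for this sketch.
    """
    cutoff = now - RETENTION
    return [
        tag
        for tag, created_at in versions
        if tag not in PROTECTED_TAGS and tag.isdigit() and created_at < cutoff
    ]


def prune(versions, now, delete, dry_run=True):
    """Call `delete(tag)` for each selected tag unless dry_run is set."""
    selected = select_tags_to_prune(versions, now)
    if not dry_run:
        for tag in selected:
            delete(tag)
    return selected


# Example data with made-up tags and timestamps.
now = datetime(2025, 1, 5, 6, 0, tzinfo=timezone.utc)
versions = [
    ("1001", now - timedelta(days=20)),   # old run-id tag: pruned
    ("1002", now - timedelta(days=3)),    # recent run-id tag: kept
    ("cache", now - timedelta(days=90)),  # protected tag: kept
]

deleted = []
print(prune(versions, now, deleted.append))  # ['1001']
print(deleted)                               # []
```

With `dry_run=True` the function only reports what it would remove, which mirrors the dry-run option on the manually dispatched workflow.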