Change cloudE2E test to use oblt-cli #7209

Open
michel-laterman wants to merge 9 commits into elastic:main from michel-laterman:ci/cloude2e-oblt-cli

@michel-laterman

Contributor

What is the problem this PR solves?

Change the Buildkite pipelines to use oblt-cli instead of Terraform.

michel-laterman added the backport-skip, Team:Elastic-Agent-Control-Plane, ci, and skip-changelog labels Jun 16, 2026
@michel-laterman

michel-laterman commented Jun 16, 2026

Contributor Author

The cloudE2E tests are showing some instability in the builds: https://buildkite.com/elastic/fleet-server/builds/15100

michel-laterman requested a review from v1v June 16, 2026 17:43
Comment thread .buildkite/pipeline.yml Outdated
Comment thread .buildkite/pipeline.yml Outdated
Comment thread .buildkite/pipeline.yml Outdated
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Comment thread .buildkite/scripts/cloud_e2e_test.sh Outdated
michel-laterman marked this pull request as ready for review June 17, 2026 20:49
michel-laterman requested a review from a team as a code owner June 17, 2026 20:49
Comment thread .buildkite/scripts/cloud_e2e_test.sh Outdated
Co-authored-by: Victor Martinez <victormartinezrubio@gmail.com>
Comment thread .buildkite/scripts/cloud_e2e_test.sh
michel-laterman requested a review from v1v June 18, 2026 23:53
Passing only StackVersion=9.5.0-SNAPSHOT lets oblt-cli resolve to the
latest snapshot build, which may differ from the pinned build in
dev-tools/integration/.env and can include breaking Kibana changes.

Pass ElasticsearchDockerImage and KibanaDockerImage using the full
ES_VERSION (e.g. 9.5.0-335b21fa-SNAPSHOT) read from .env to pin the
cluster to the exact build being tested. Mirrors the approach taken in
elastic/elastic-agent#14985.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
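
The pinning approach described in the commit message above can be sketched roughly as follows. This is not the actual contents of .buildkite/scripts/cloud_e2e_test.sh. The parameter names StackVersion, ElasticsearchDockerImage, and KibanaDockerImage and the ES_VERSION variable come from the commit message; the helper function names, the docker.elastic.co image paths, the JSON shape, and the choice to keep passing StackVersion alongside the image parameters are assumptions made for illustration. The sketch only builds and prints the parameters and does not call oblt-cli.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the value of ES_VERSION from an env file such as
# dev-tools/integration/.env (the first matching line wins).
read_es_version() {
  local env_file="$1"
  grep -m1 '^ES_VERSION=' "$env_file" | cut -d= -f2-
}

# Drop the build hash: 9.5.0-335b21fa-SNAPSHOT -> 9.5.0-SNAPSHOT.
# A version without a hash is returned unchanged.
stack_version() {
  printf '%s\n' "$1" | sed -E 's/-[0-9a-f]+-SNAPSHOT$/-SNAPSHOT/'
}

# Build a JSON object that pins both images to the full version.
# Assumption: the registry and repository paths below are illustrative.
build_parameters() {
  local full_version="$1"
  local registry="${2:-docker.elastic.co}"
  printf '{"StackVersion":"%s","ElasticsearchDockerImage":"%s","KibanaDockerImage":"%s"}\n' \
    "$(stack_version "$full_version")" \
    "${registry}/elasticsearch/elasticsearch:${full_version}" \
    "${registry}/kibana/kibana:${full_version}"
}

# Demo with a throwaway env file holding the example value from the commit message.
demo_env="$(mktemp)"
printf 'ES_VERSION=9.5.0-335b21fa-SNAPSHOT\n' > "$demo_env"
build_parameters "$(read_es_version "$demo_env")"
rm -f "$demo_env"
```

Deriving both image tags from the single ES_VERSION value keeps the cluster and the pinned build in .env from drifting apart, which is the failure mode the commit message describes.
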
@github-actions

Contributor

TL;DR

Both package jobs failed while mage docker:image was resolving the Docker runtime base image, not while running PR-specific Cloud E2E logic. cgr.dev returned 500 Internal Server Error for the anonymous token request for cgr.dev/chainguard/wolfi-base:latest, so retry the package jobs/build.

Remediation

  • Retry Package x86_64 and Package aarch64; the failing operation is an external registry token/metadata fetch during Docker build.
  • If this keeps recurring, pin or mirror the Wolfi runtime image used by Dockerfile:23 so package CI does not depend on resolving cgr.dev/chainguard/wolfi-base:latest on every run.
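
As a complement to the remediation steps above, a transient registry error of this kind could also be absorbed inside a script by retrying the failing command with backoff. The following is a minimal sketch under that assumption. The retry helper is hypothetical and is not part of .buildkite/scripts/release_test.sh; only the mage docker:image command in the usage comment comes from the analysis above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay between attempts.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Hypothetical usage: up to 3 attempts, waiting 5s and then 10s between them.
# retry 3 5 mage docker:image
```

A wrapper like this only helps with short outages; pinning or mirroring the base image, as suggested above, removes the dependency on the registry being reachable for every build.
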
Investigation details

Root Cause

This is an infrastructure/dependency availability failure in the package Docker image build. The failed jobs run .buildkite/scripts/release_test.sh, which calls mage docker:image after package artifact upload/test steps. Docker.Image invokes docker build -f Dockerfile ... from magefile.go:1054-1065, and the Dockerfile runtime stage is pulled from cgr.dev/chainguard/wolfi-base:latest at Dockerfile:23.

Both package architectures failed at the same BuildKit metadata step for that external base image. The PR diff changes Cloud E2E/OBLT wiring and test env variable names, but it does not change .buildkite/scripts/release_test.sh, Dockerfile, or the Docker.Image build path involved here.

Evidence

=> ERROR [internal] load metadata for cgr.dev/chainguard/wolfi-base:latest
Dockerfile:23
--------------------
  23 | >>> FROM cgr.dev/chainguard/wolfi-base:latest
--------------------
ERROR: failed to build: failed to solve: failed to fetch anonymous token: unexpected status from GET request to (cgr.dev/redacted) 500 Internal Server Error
Error: running "docker build --build-arg GO_VERSION=1.26.4 ... -f Dockerfile ..." failed with exit code 1

Verification

  • Not run locally: Docker-in-Docker is unavailable in this workflow environment, and the captured failure depends on Buildkite worker access to cgr.dev during Docker metadata resolution.
  • Checked recent PR detective comments. The latest prior comments covered OBLT Cloud E2E provisioning/auth issues; older comments covered a similar cgr.dev dependency failure in a different E2E job, but not this current package build.
  • Checked for matching flaky-test issues in elastic/fleet-server; none matched this cgr.dev anonymous-token failure.

From workflow: PR Buildkite Detective

v1v previously approved these changes Jun 19, 2026
Comment thread .buildkite/scripts/cloud_e2e_test.sh
Comment thread .buildkite/scripts/cloud_e2e_test.sh
ycombinator previously approved these changes Jun 23, 2026
Labels

  • backport-skip: Skip notification from the automated backport with mergify
  • ci: CI related tasks
  • skip-changelog
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

3 participants