[9.4](backport #7009) fix: enforce policy-based access control on artifact downloads #7162

Open
mergify[bot] wants to merge 1 commit into 9.4 from mergify/bp/9.4/pr-7009

mergify Bot commented Jun 5, 2026

Contributor

What is the problem this PR solves?

The artifact download endpoint (/api/fleet/artifacts/{id}/{sha256}) only validates the agent's API key but never checks whether the requested artifact belongs to the agent's assigned policy. This means an agent enrolled under one policy can download artifacts belonging to a different policy if it knows the artifact ID and SHA256 hash. For example, an agent enrolled under a policy with no integrations can retrieve Elastic Defend trust lists, exception lists, and other security artifacts from another policy.

How does this PR solve the problem?

Implements the authorizeArtifact() function (previously a no-op that returned nil) to enforce policy-based access control:

  1. Adds a GetPolicy(ctx, policyID) method to the policy.Monitor interface that returns the cached policy for a given ID (reloads from ES on cache miss).
  2. In authorizeArtifact, fetches the agent's policy via the monitor using agent.AgentPolicyID and verifies that the requested artifact (identifier + decoded_sha256) appears in the policy's inputs[].artifact_manifest.artifacts.
  3. Returns 403 Forbidden (ErrUnauthorizedArtifact) if the artifact is not listed in the agent's assigned policy.

How to test this PR locally

  1. Set up Fleet Server with Elasticsearch and Kibana
  2. Create two agent policies: Victim-Policy with Elastic Defend integration (add a trusted application), and Attacker-Policy with no integrations
  3. Create an enrollment token for Attacker-Policy and enroll an agent
  4. Attempt to download an artifact belonging to Victim-Policy using the attacker agent's API key. The request should now receive 403 Forbidden instead of the artifact contents
  5. Verify that an agent enrolled under Victim-Policy can still download its own artifacts normally (200 OK)

Design Checklist

  • I have ensured my design is stateless and will work when multiple fleet-server instances are behind a load balancer.
  • I have or intend to scale test my changes, ensuring it will work reliably with 100K+ agents connected.
  • I have included fail safe mechanisms to limit the load on fleet-server: rate limiting, circuit breakers, caching, load shedding, etc.

Checklist

* fix: enforce policy-based access control on artifact downloads

The artifact download endpoint (/api/fleet/artifacts/{id}/{sha256})
previously only validated the agent's API key but never checked whether
the requested artifact belonged to the agent's assigned policy. This
allowed an agent enrolled under one policy to download artifacts from
a different policy if it knew the artifact ID and SHA256 hash.

Add authorizeArtifact implementation that fetches the agent's policy
from the in-memory policy monitor cache and verifies the requested
artifact appears in the policy's artifact_manifest before serving it.
Returns 403 Forbidden if the artifact is not in the agent's policy.

Resolves: https://github.com/elastic/security/issues/8396

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: add changelog fragment for artifact access control fix

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: document race condition tradeoffs in authorizeArtifact

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use any instead of interface{} per Go conventions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: add typed ArtifactManifest struct for policy input parsing

Defines model.ArtifactManifest and model.ManifestEntry structs so
policyHasArtifact no longer navigates untyped map[string]any chains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit caa8b2d)
@mergify mergify Bot requested a review from a team as a code owner June 5, 2026 07:24
@mergify mergify Bot added the backport label Jun 5, 2026
@mergify mergify Bot requested review from lorienhu and macdewee June 5, 2026 07:24
@github-actions github-actions Bot added the bug and Team:Elastic-Agent-Control-Plane labels Jun 5, 2026
github-actions Bot commented Jun 5, 2026

Contributor

TL;DR

The Buildkite E2E step failed (.buildkite/scripts/e2e_test.sh), but the captured log excerpt does not contain the actual failing assertion/test; it only shows a final package-level FAIL after one suite summary that is entirely PASS. Immediate action: rerun the E2E step with full go test output preserved (or attach build/test-e2e-*.out) so the first failing test line is visible.

Remediation

  • Re-run the failing step and capture the full output of mage test:e2e (the first --- FAIL: / panic line is missing in the current artifact).
  • Run targeted repro locally/CI for artifact-related E2E tests introduced in this PR:
    • TEST_RUN='TestStandAloneCurrentAPI/TestArtifact' mage test:e2e
    • TEST_RUN='TestStandAlone20230601API/TestArtifact' mage test:e2e
  • If the failure is a timeout/flaky 403 loop, add explicit logging of response codes in the retry path and fail with the last response body to make future CI failures diagnosable.

Investigation details

Root Cause

I could not conclusively determine a single root cause from the provided Buildkite log file because the failing assertion is not present in the available output.

What is verifiable:

  • Build step runs mage test:e2e test:junitReport in .buildkite/scripts/e2e_test.sh:16.
  • E2E execution itself is go test ... -timeout 30m -race -p 1 ./... from magefile.go:2120 and magefile.go:2162.
  • The captured log shows TestStandAloneRunningSuite subtests all passing, then package-level FAIL for github.com/elastic/fleet-server/testing/e2e.

Given this PR’s scope, the most likely affected surface is the updated artifact E2E flow:

  • testing/e2e/api_version/client_api_current.go:407
  • testing/e2e/api_version/client_api_2023_06_01.go:307
  • testing/e2e/scaffold/scaffold.go:599

These paths now depend on eventual consistency (policy artifact manifest visibility + retry-on-403), which is a common source of CI-only flakiness/timeouts when logs are truncated.

Evidence

  • Build: https://buildkite.com/elastic/fleet-server/builds/15017
  • Job/step: E2E Test (.buildkite/scripts/e2e_test.sh)
  • Key log excerpt:
    • --- PASS: TestStandAloneRunningSuite ...
    • FAIL github.com/elastic/fleet-server/testing/e2e 1378.250s
    • no --- FAIL: block or panic for the actual failing test is present in the provided log artifact.

Verification

  • Not run in this environment (Docker-in-Docker unavailable), so verification is based on Buildkite artifacts and repository source inspection only.

Follow-up

  • I checked for matching flaky-test issues for TestWithElasticsearchConnectionFailures, TestWithElasticsearchConnectionFlakyness, and artifact API suite names; no direct match was found.

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

  • #3909 search_issues: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

What is this? | From workflow: PR Buildkite Detective

Give us feedback! React with 🚀 if perfect, 👍 if helpful, 👎 if not.

mergify Bot commented Jun 8, 2026

Contributor Author

This pull request has not been merged yet. Could you please review and merge it @ycombinator? 🙏

mergify Bot commented Jun 15, 2026

Contributor Author

This pull request has not been merged yet. Could you please review and merge it @ycombinator? 🙏

mergify Bot commented Jun 22, 2026

Contributor Author

This pull request has not been merged yet. Could you please review and merge it @ycombinator? 🙏

ebeahan commented Jun 22, 2026

Member

@ycombinator do we need this backport?

mergify Bot commented Jun 29, 2026

Contributor Author

This pull request has not been merged yet. Could you please review and merge it @ycombinator? 🙏

Labels

backport, bug, Team:Elastic-Agent-Control-Plane

2 participants