feat: pluggable policy sourcing via a Policy Provider subsystem #1713

@dvavili

Problem Statement

OpenShell sources sandbox policy exactly one way: authored, validated, and composed inside the gateway, then enforced by the supervisor (Landlock/seccomp + proxy/OPA). Policy therefore lives entirely in the gateway's trust domain, with a single built-in implementation and no seam to source it elsewhere.

That cannot serve deployments where the policy authority and the gateway are different parties, notably enterprise models that need attestation and independent audit. Those deployments need policy that is signed by a central authority in a separate trust domain, tamper-evident even against a compromised gateway, and independently verifiable by auditors against a signed artifact. The built-in path structurally cannot provide this, because it sits inside the gateway's own trust domain.

Meanwhile OpenShell already makes compute, credentials, and identity pluggable via a driver model (a type = selector + a gRPC driver, RFC 0001). Policy is the one gateway concern that isn't.

Prior Art

The proposal has two halves — a pluggable seam, and sourcing signed policy from outside the gateway — each grounded in established systems.

Pluggable drivers with a neutral core. A thin, versioned contract at the boundary lets an ecosystem add backends without the core growing backend-specific knowledge:

  • HashiCorp Nomad task drivers and Terraform providers — core defines a minimal contract; each backend implements it out-of-process; the core stays agnostic. RFC 0001 cites both as the lineage for OpenShell's own driver model.
  • Kubernetes CRI / CSI — the kubelet and control plane talk to runtime and storage plugins over a stable gRPC contract and pass driver-specific config through without interpreting it, so new drivers ship independently of the Kubernetes release cycle.
  • OpenShell's own out-of-tree compute driver (#1703, RFC #1344) — the gateway already dispatches to an external compute driver over a Unix socket via --compute-driver-socket. This RFC applies the same out-of-tree-over-socket shape to policy.

Sourcing signed policy from outside the consumer's trust domain. A separate process vending verified, signed policy to a local consumer over a host-local contract, backed by an authority in its own trust domain, mirrors:

  • Open Policy Agent (OPA) signed bundles — a central authority signs policy bundles; a co-located OPA verifies the signature before serving decisions, so the consuming service need not trust the bundle's transport.
  • SPIFFE/SPIRE — a host-local agent issues workload identities over a local socket, backed by a separate central server. The same split (local contract, separate authority, consumer verifies) is what makes attested policy tamper-evident even against a compromised gateway.

Proposed Design

Add a Policy subsystem that follows the same driver model OpenShell already uses for compute, credentials, and identity, with two driver types:

  • local (default) — today's store-backed path, unchanged.
  • external — the gateway sources policy from a separate provider process over a gRPC contract (UDS now, network later).

At admission the gateway acquires a per-sandbox handle, fetches a projection (a serialized SandboxPolicy body + digest, optionally signed), verifies the signature against a configured trust store, then relays the body to the supervisor over the existing channel. Everything downstream is unchanged.
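
The verification step in that flow can be sketched roughly as below. This is an illustration under stated assumptions, not the proposed implementation: `Projection`, `verify_projection`, `AdmissionRefused`, the `key_id` field, and the choice of SHA-256 are all hypothetical names and choices, and an HMAC stands in for whatever signature scheme the contract ends up specifying. A real attesting deployment would presumably use an asymmetric signature so the gateway holds only public keys.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Projection:
    """Hypothetical shape of what a provider returns for one sandbox."""
    body: bytes                       # serialized SandboxPolicy
    digest: str                       # hex SHA-256 of body (assumed algorithm)
    key_id: Optional[str] = None      # trust-store entry that signed it, if any
    signature: Optional[bytes] = None


class AdmissionRefused(Exception):
    """Raised when a projection cannot be trusted; admission fails closed."""


def verify_projection(p: Projection, trust_store: Mapping[str, bytes],
                      require_signature: bool = True) -> bytes:
    """Return the body to relay to the supervisor, or raise AdmissionRefused."""
    # 1. The digest must match the body that was actually received.
    actual = hashlib.sha256(p.body).hexdigest()
    if not hmac.compare_digest(actual.encode(), p.digest.encode()):
        raise AdmissionRefused("digest mismatch")

    # 2. Unsigned projections pass only if the operator explicitly allows them.
    if p.signature is None:
        if require_signature:
            raise AdmissionRefused("unsigned projection")
        return p.body

    # 3. The signer must be present in the configured trust store.
    key = trust_store.get(p.key_id or "")
    if key is None:
        raise AdmissionRefused("unknown signer")

    # Stand-in check only: HMAC-SHA256 over the digest, not the real scheme.
    expected = hmac.new(key, p.digest.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, p.signature):
        raise AdmissionRefused("bad signature")
    return p.body
```

Every failure path raises rather than returning a partial or fallback policy, so a caller that relays only the return value never forwards an unverified body.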

One driver per gateway, selected at startup; opt-in and additive (local stays the default).
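
A startup-time selector with those properties might look like the sketch below. The class names, the `select_policy_driver` function, and the `socket` config key are hypothetical; only the `type` selector and the `local` / `external` values come from this proposal.

```python
from typing import Mapping, Union


class LocalPolicyDriver:
    """Stand-in for today's store-backed path."""
    name = "local"


class ExternalPolicyDriver:
    """Stand-in for a driver that talks to a separate provider process."""
    name = "external"

    def __init__(self, socket_path: str) -> None:
        self.socket_path = socket_path


def select_policy_driver(
    config: Mapping[str, str],
) -> Union[LocalPolicyDriver, ExternalPolicyDriver]:
    """Pick exactly one driver at startup; default to local when unset."""
    driver_type = config.get("type", "local")
    if driver_type == "local":
        return LocalPolicyDriver()
    if driver_type == "external":
        # "socket" is a hypothetical config key for the provider's UDS path.
        socket_path = config.get("socket")
        if not socket_path:
            raise ValueError("external policy driver requires a socket path")
        return ExternalPolicyDriver(socket_path)
    # Unknown values are an error, never a silent fallback to local.
    raise ValueError(f"unknown policy driver type: {driver_type!r}")
```

Treating an unrecognized `type` as a startup error, rather than falling back to `local`, keeps a typo in configuration from silently moving policy back into the gateway's trust domain.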

Use Cases

| Scenario | Driver | What it enables |
| --- | --- | --- |
| One party owns OpenShell and policy (status quo) | local | Today's behavior: user-authored policy in the gateway, including the RFC 0002 agent-driven loop. Unchanged. |
| Enterprise with a central policy authority | external (attesting) | Policy signed off-host by a security team in a separate trust domain; tamper-evident against a compromised gateway and independently auditable against the signed artifact. |
| Fleet of gateways sharing one policy source | external (network transport) | A central provider serves many gateways, so policy is administered once and sourced consistently across the fleet. |
| Policy-as-code from a git or bundle server | external | A provider process pulls versioned policy from a git repo or bundle server and projects it, without building that integration into the gateway. |
| Per-tenant / per-user policy | external + runtime-context extensions (tenant_id, …) | The provider returns a different projection per tenant or user, scoped by the runtime context the gateway binds at admission. |

Alternatives Considered

  • Driver in-process / per-sandbox. Collapses the trust split — a per-sandbox provider sits inside the domain it constrains. external keeps the provider a separate process, like OpenShell's other drivers.
  • Do nothing. Fine when one party owns both OpenShell and policy; leaves enterprise deployments trusting the gateway as the policy authority, with no attestation or independent audit.

Compatibility

  • Additive and opt-in. local remains the default; a deployment opts in by setting type = "external". Gateways that never configure it behave exactly as today.
  • No change downstream. The projection body is the existing openshell.sandbox.v1.SandboxPolicy; the supervisor, proxy, and OPA engine are untouched, so the enforcement path is unchanged.
  • Wire-safe. The driver contract is versioned via surface_id / supported_surfaces; the gateway and driver reconcile at startup and fail closed on no overlap rather than enforcing a mismatched schema.
  • No data migration. Switching to external changes where policy is sourced, not how it's stored or enforced; switching back to local restores the built-in path with no migration.
  • Behavioral change to be explicit about. Under a non-mutating external driver, the runtime policy-mutation surface (openshell policy set/update, global policy delete, and the RFC 0002 agent-driven loop) is refused. This is intended, but it is a visible change for callers that rely on those verbs.
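
The fail-closed reconciliation described under "Wire-safe" could be as simple as the following sketch. The function name, the exception, and the example surface identifiers are hypothetical; the proposal only states that `surface_id` / `supported_surfaces` are reconciled at startup and that no overlap is fatal.

```python
from typing import Iterable, Sequence


class NoCommonSurface(Exception):
    """Raised at startup when gateway and driver share no contract version."""


def reconcile_surface(gateway_surfaces: Sequence[str],
                      driver_surfaces: Iterable[str]) -> str:
    """Return the first gateway-preferred surface the driver also supports."""
    offered = set(driver_surfaces)
    for surface_id in gateway_surfaces:  # ordered by gateway preference
        if surface_id in offered:
            return surface_id
    # Fail closed: refuse to start rather than enforce a mismatched schema.
    raise NoCommonSurface(
        f"gateway supports {list(gateway_surfaces)}, driver offers {sorted(offered)}"
    )
```

For example, a gateway that prefers a hypothetical `"policy.v2"` but also accepts `"policy.v1"` would settle on `"policy.v1"` against a driver that only offers the older surface, and would refuse to start against a driver that offers neither.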

Definition of Done

  • A policy subsystem on the gateway with a driver selector, defaulting to today's built-in path.
  • An external driver that sources policy from a separate provider process, while the built-in local path stays unchanged and remains the default.
  • The gateway verifies and enforces sourced policy end-to-end: authentic (signature verified against a configured trust store), complete (the whole policy enforced or admission refused), and unaltered (the policy-mutation surface refused when the provider is read-only).
  • An in-tree example/null provider so the external path can be exercised end-to-end without a real provider behind it.
  • Documentation sufficient for a third party to build a provider against the integration.
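
As one way to picture the "unaltered" criterion, the gateway could gate every policy-mutation verb on a capability the driver reports. The sketch below assumes a hypothetical `mutable` flag and hypothetical names throughout; the proposal does not define how a driver declares itself read-only.

```python
from dataclasses import dataclass


class MutationRefused(Exception):
    """Raised when a mutation verb is attempted under a read-only driver."""


@dataclass(frozen=True)
class DriverCapabilities:
    """Hypothetical capability flags a policy driver reports at startup."""
    name: str
    mutable: bool


def guard_mutation(driver: DriverCapabilities, verb: str) -> None:
    """Call before any policy set/update/delete; raises if the driver is read-only."""
    if not driver.mutable:
        raise MutationRefused(
            f"{verb!r} refused: policy driver {driver.name!r} is read-only"
        )
```

Under this sketch the built-in path would report `mutable=True` and keep today's behavior, while a non-mutating external driver would cause each mutation verb to fail with an explicit error instead of being silently ignored.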

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
