feat: pluggable policy sourcing via a Policy Provider subsystem

## Problem Statement

OpenShell sources sandbox policy exactly one way: authored, validated, and composed **inside the gateway**, then enforced by the supervisor (Landlock/seccomp + proxy/OPA). Policy therefore lives entirely in the gateway's trust domain, with a single built-in implementation and no seam to source it elsewhere.

That can't serve deployments where the policy authority and the gateway are **different parties**, notably enterprise deployment models that need attestation and independent audit. Those require policy that is signed by a central authority in a separate trust domain, tamper-evident even against a compromised gateway, and independently verifiable by auditors against a signed artifact. The built-in path structurally can't provide this because it sits inside the gateway's own trust domain.

Meanwhile OpenShell already makes **compute, credentials, and identity** pluggable via a driver model (a `type =` selector + a gRPC driver, [RFC 0001](https://github.com/NVIDIA/OpenShell/blob/main/rfc/0001-core-architecture/README.md)). Policy is the one gateway concern that isn't.

## Prior Art

The proposal has two halves — a pluggable seam, and sourcing signed policy from outside the gateway — each grounded in established systems.

**Pluggable drivers with a neutral core.** A thin, versioned contract at the boundary lets an ecosystem add backends without the core growing backend-specific knowledge:

- **HashiCorp Nomad task drivers** and **Terraform providers** — core defines a minimal contract; each backend implements it out-of-process; the core stays agnostic. RFC 0001 cites both as the lineage for OpenShell's own driver model.
- **Kubernetes CRI / CSI** — the kubelet and control plane talk to runtime and storage plugins over a stable gRPC contract and pass driver-specific config through without interpreting it, so new drivers ship independently of the Kubernetes release cycle.
- **OpenShell's own out-of-tree compute driver** ([#1703](https://github.com/NVIDIA/OpenShell/pull/1703), RFC [#1344](https://github.com/NVIDIA/OpenShell/pull/1344)) — the gateway already dispatches to an external compute driver over a Unix socket via `--compute-driver-socket`. This RFC applies the same out-of-tree-over-socket shape to policy.

**Sourcing signed policy from outside the consumer's trust domain.** A separate process vending verified, signed policy to a local consumer over a host-local contract, backed by an authority in its own trust domain, mirrors:

- **Open Policy Agent (OPA) signed bundles** — a central authority signs policy bundles; a co-located OPA verifies the signature before serving decisions, so the consuming service need not trust the bundle's transport.
- **SPIFFE/SPIRE** — a host-local agent issues workload identities over a local socket, backed by a separate central server. The same split (local contract, separate authority, consumer verifies) is what makes attested policy tamper-evident even against a compromised gateway.

## Proposed Design

Add a **Policy subsystem** following that same driver model:

- `local` *(default)* — today's store-backed path, unchanged.
- `external` — the gateway sources policy from a separate provider process over a gRPC contract (UDS now, network later).

At admission the gateway acquires a per-sandbox handle, fetches a **projection** (a serialized `SandboxPolicy` body + digest, optionally signed), verifies the signature against a configured trust store, then relays the body to the supervisor over the existing channel. Everything downstream is unchanged.
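
The admission-time check described above could look roughly like the following. This is a minimal sketch, not the proposed implementation: the `Projection` fields, the `verify_projection` helper, the use of SHA-256 for the digest, and the `require_signature` switch are all assumptions made for illustration. HMAC-SHA256 stands in for signature verification only so the example runs on the standard library; a real trust store would hold public keys and verify an asymmetric signature.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Projection:
    """Hypothetical projection shape: policy body, digest, optional signature."""
    body: bytes                        # serialized SandboxPolicy
    digest: str                        # hex SHA-256 of body (algorithm is an assumption)
    signature: Optional[bytes] = None  # absent when the provider does not sign


class AdmissionRefused(Exception):
    """Raised when a projection cannot be verified; admission fails closed."""


def verify_projection(p: Projection, trust_key: bytes,
                      require_signature: bool = True) -> bytes:
    """Return the body to relay to the supervisor, or refuse admission."""
    # 1. The digest must match the body that would actually be relayed.
    actual = hashlib.sha256(p.body).hexdigest()
    if not hmac.compare_digest(actual.encode(), p.digest.encode()):
        raise AdmissionRefused("digest mismatch")

    # 2. Unsigned projections are accepted only if the deployment allows it
    #    (this policy knob is an assumption, not part of the proposal).
    if p.signature is None:
        if require_signature:
            raise AdmissionRefused("unsigned projection")
        return p.body

    # 3. Verify the signature over the digest. HMAC is a stdlib stand-in
    #    for an asymmetric signature check against the trust store.
    expected = hmac.new(trust_key, p.digest.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, p.signature):
        raise AdmissionRefused("signature verification failed")

    return p.body
```

Only the returned body would be handed to the supervisor; any failure refuses admission rather than falling back to an unverified policy.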

One driver per gateway, selected at startup; opt-in and additive (`local` stays the default).
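
Startup selection with `local` as the default can be sketched as below. The nested `policy` / `type` config layout and the `select_policy_driver` name are hypothetical; only the `type` selector and the two driver names come from this proposal.

```python
from typing import Any, Mapping

KNOWN_DRIVERS = ("local", "external")


def select_policy_driver(config: Mapping[str, Any]) -> str:
    """Pick the single policy driver for this gateway at startup."""
    policy_cfg = config.get("policy", {})
    # Gateways that never configure the subsystem keep today's behavior.
    driver = policy_cfg.get("type", "local")
    if driver not in KNOWN_DRIVERS:
        # Refuse to start rather than guess which driver was intended.
        raise ValueError(f"unknown policy driver type: {driver!r}")
    return driver
```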

## Use Cases


| Scenario                                         | Driver                                                   | What it enables                                                                                                                                                             |
| ------------------------------------------------ | -------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| One party owns OpenShell and policy (status quo) | `local`                                                  | Today's behavior — user-authored policy in the gateway, including the RFC 0002 agent-driven loop. Unchanged.                                                                |
| Enterprise with a central policy authority       | `external` (attesting)                                   | Policy signed off-host by a security team in a separate trust domain; tamper-evident against a compromised gateway and independently auditable against the signed artifact. |
| Fleet of gateways sharing one policy source      | `external` (network transport)                           | A central provider serves many gateways, so policy is administered once and sourced consistently across the fleet.                                                          |
| Policy-as-code from a git or bundle server       | `external`                                               | A provider process pulls versioned policy from a git repo or bundle server and projects it, without building that integration into the gateway.                             |
| Per-tenant / per-user policy                     | `external` + runtime-context extensions (`tenant_id`, …) | The provider returns a different projection per tenant or user, scoped by the runtime context the gateway binds at admission.                                               |


## Alternatives Considered

- **Driver in-process / per-sandbox.** Collapses the trust split — a per-sandbox provider sits inside the domain it constrains. `external` keeps the provider a separate process, like OpenShell's other drivers.
- **Do nothing.** Fine when one party owns both OpenShell and policy; leaves enterprise deployments trusting the gateway as the policy authority, with no attestation or independent audit.

## Compatibility

- **Additive and opt-in.** `local` remains the default; a deployment opts in by setting `type = "external"`. Gateways that never configure it behave exactly as today.
- **No change downstream.** The projection body is the existing `openshell.sandbox.v1.SandboxPolicy`; the supervisor, proxy, and OPA engine are untouched, so the enforcement path is unchanged.
- **Wire-safe.** The driver contract is versioned via `surface_id` / `supported_surfaces`; the gateway and driver reconcile at startup and fail closed on no overlap rather than enforcing a mismatched schema.
- **No data migration.** Switching to `external` changes where policy is *sourced*, not how it's stored or enforced; switching back to `local` restores the built-in path with no migration.
- **Behavioral change to be explicit about.** Under a non-mutating `external` driver, the runtime policy-mutation surface (`openshell policy set`/`update`, global `policy delete`, and the RFC 0002 agent-driven loop) is refused. This is intended, but it is a visible change for callers that rely on those verbs.
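
Two of the behaviors above, fail-closed surface reconciliation and refusal of the mutation surface, can be illustrated with a short sketch. The function names, the surface identifiers such as `policy.v1`, the `provider_is_mutating` flag, and the rule of picking the first match in the gateway's preference order are assumptions for illustration; the proposal itself only states that the two sides reconcile at startup and fail closed when there is no overlap.

```python
from typing import Iterable


class NoCommonSurface(Exception):
    """Gateway and driver share no contract version; startup fails closed."""


class MutationRefused(Exception):
    """A policy-mutation verb was rejected under a read-only external driver."""


def reconcile_surfaces(gateway_supported: Iterable[str],
                       driver_supported: Iterable[str]) -> str:
    """Return the first gateway-preferred surface the driver also supports."""
    driver_set = set(driver_supported)
    for surface_id in gateway_supported:  # assumed to be in preference order
        if surface_id in driver_set:
            return surface_id
    # No overlap: refuse to run rather than enforce a mismatched schema.
    raise NoCommonSurface("no overlap between gateway and driver surfaces")


def check_mutation_allowed(driver: str, provider_is_mutating: bool,
                           verb: str) -> None:
    """Raise if a runtime policy mutation must be refused."""
    if driver == "external" and not provider_is_mutating:
        raise MutationRefused(
            f"'{verb}' refused: policy is sourced from a read-only provider"
        )
```

Under `local`, `check_mutation_allowed` never raises, which matches the claim that gateways that do not opt in behave exactly as today.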

## Definition of Done

- [ ] A `policy` subsystem on the gateway with a driver selector, defaulting to today's built-in path.
- [ ] An `external` driver that sources policy from a separate provider process, while the built-in `local` path stays unchanged and remains the default.
- [ ] The gateway verifies and enforces sourced policy end-to-end: authentic (signature verified against a configured trust store), complete (the whole policy enforced or admission refused), and unaltered (the policy-mutation surface refused when the provider is read-only).
- [ ] An in-tree example/null provider so the external path can be exercised end-to-end without a real provider behind it.
- [ ] Documentation sufficient for a third party to build a provider against the integration.

## Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request


