Broken off from:
Summary
Context
Goal: Enable multiple ateom implementations (specifically gVisor + microVM) for substrate worker pools by decoupling sandbox binary configuration from ActorTemplate.
The problem today: ActorTemplate must specify the runsc (gVisor) binary for every architecture (amd64 + arm64), including paths and hashes. This is required because gVisor snapshots aren't compatible across versions. The requirement has two downsides:
Usability — every ActorTemplate author must know about and pin a gVisor version.
Security — anyone who can create an ActorTemplate can point to a potentially compromised sandbox binary.
Proposed changes:
Split runsc config out of ActorTemplate.
Add sandboxClass to ActorTemplate to select a sandbox implementation.
Support per-ateom config formats.
Store binary version with the snapshot: Since the sandbox binary version is really only needed at GoldenSnapshot creation, move binary configuration to a CRD/ConfigMap on the worker pool. When a golden snapshot is created, the worker records everything needed to use it (versions/paths/config) in the snapshot itself; on restore, that config is fetched.
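The record-then-restore flow above can be sketched as follows. This is a minimal illustration, not the proposed format: `POOL_CONFIG`, `BinaryRef`, `SnapshotManifest`, `create_manifest`, and `binary_for_restore` are hypothetical names, JSON is an assumed encoding, and the version, paths, and hashes are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict

# Hypothetical worker-pool configuration (stand-in for the CRD/ConfigMap).
# Version, paths, and hashes are placeholders, not real releases.
POOL_CONFIG = {
    "gvisor": {
        "version": "example-version",
        "binaries": {
            "amd64": {"path": "/example/runsc-amd64", "sha256": "aaaa"},
            "arm64": {"path": "/example/runsc-arm64", "sha256": "bbbb"},
        },
    },
}


@dataclass
class BinaryRef:
    """Hypothetical pointer to one sandbox binary build."""
    path: str
    sha256: str


@dataclass
class SnapshotManifest:
    """Hypothetical record stored with a golden snapshot."""
    sandbox_class: str
    version: str
    binaries: Dict[str, BinaryRef]  # keyed by architecture


def create_manifest(pool_config: dict, sandbox_class: str) -> str:
    """At golden snapshot creation: copy the pool's binary config into a manifest."""
    entry = pool_config[sandbox_class]
    manifest = SnapshotManifest(
        sandbox_class=sandbox_class,
        version=entry["version"],
        binaries={arch: BinaryRef(**ref) for arch, ref in entry["binaries"].items()},
    )
    return json.dumps(asdict(manifest), sort_keys=True)


def binary_for_restore(raw_manifest: str, arch: str) -> BinaryRef:
    """On restore: read the manifest and pick the binary for this architecture."""
    data = json.loads(raw_manifest)
    if arch not in data["binaries"]:
        raise KeyError(f"snapshot has no {data['sandbox_class']} binary for {arch}")
    return BinaryRef(**data["binaries"][arch])
```

Because the manifest is a copy taken at creation time, a later change to the worker pool config would not alter which binary an existing snapshot restores with.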
Benefits:
Reduces trust needed for ActorTemplate creation (binary selection moves to a separate role/config).
Casual users no longer need to know about sandbox binaries.
Per-implementation config without polluting the shared ActorTemplate API.
Downside: A small config artifact must be fetched from the snapshot before the sandbox binaries can be fetched. The cost should be low: binaries are usually cached in atelet, rarely change, and can be prefetched on startup. This is analogous to the top-level manifest of a container image, and it could later enable broader snapshot versioning.
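To illustrate why the extra fetch is expected to be cheap, here is a rough sketch of a hash-keyed binary cache with startup prefetch. `BinaryCache` is a hypothetical in-memory stand-in for whatever atelet actually does, and verifying the SHA-256 digest after download is an assumption, not something the doc specifies.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple


class BinaryCache:
    """Hypothetical in-memory stand-in for a sandbox binary cache."""

    def __init__(self, download: Callable[[str], bytes]) -> None:
        self._download = download            # fetches binary bytes by path
        self._by_hash: Dict[str, bytes] = {}
        self.downloads = 0                   # counts real fetches, for illustration

    def get(self, sha256: str, path: str) -> bytes:
        """Return the binary for this hash, downloading only on a cache miss."""
        cached = self._by_hash.get(sha256)
        if cached is not None:
            return cached
        data = self._download(path)
        self.downloads += 1
        # Assumed check: reject a binary whose digest differs from the recorded one.
        if hashlib.sha256(data).hexdigest() != sha256:
            raise ValueError(f"hash mismatch for {path}")
        self._by_hash[sha256] = data
        return data

    def prefetch(self, refs: Iterable[Tuple[str, str]]) -> None:
        """Warm the cache on startup with (sha256, path) pairs from the pool config."""
        for sha256, path in refs:
            self.get(sha256, path)
```

With a warm cache, a restore only pays for the small config fetch; the binary lookup is a local hit as long as the recorded hash matches a prefetched binary.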
SandboxClass
SandboxClass (deferred): Today the default sandbox (gVisor) is not configurable. The doc proposes a configurable default (like storageClass), possibly per-subspace rather than per-cluster, which ties into the "subspaces" proposal. This part can be deferred; keep gVisor as the default for now.
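One possible reading of the storageClass-like default is a simple fallback chain. The sketch below assumes an explicit `sandboxClass` on the ActorTemplate wins, then a per-subspace default, then the cluster default; `resolve_sandbox_class` and the `"gvisor"` class name are hypothetical, and the precedence order is an assumption rather than something the doc settles.

```python
from typing import Optional

# Assumed name for the built-in default class; the doc only says gVisor stays the default.
CLUSTER_DEFAULT = "gvisor"


def resolve_sandbox_class(
    template_class: Optional[str],
    subspace_default: Optional[str] = None,
    cluster_default: str = CLUSTER_DEFAULT,
) -> str:
    """Pick the sandbox class: explicit template value, then subspace, then cluster."""
    for candidate in (template_class, subspace_default):
        if candidate:  # skips both None and the empty string
            return candidate
    return cluster_default
```

For example, `resolve_sandbox_class(None)` falls through to the cluster default, while `resolve_sandbox_class(None, subspace_default="microvm")` would pick the subspace's choice.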
Full doc: https://docs.google.com/document/d/1U1Q7Njy-XDKwnhhU4t6KJGfWKtBL-bZvmpLIB4MA3RI/edit?tab=t.0
(Shared with the ate-dev mailing list documented in the README; join the list for access.)