2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -119,6 +119,6 @@ make docker-push IMG=<image> # Push container image
- **Condition ownership** — The planner owns all condition management on the owning resource. It sets conditions when creating plans (e.g., `NodeUpdateInProgress=True`) and when observing terminal plans (e.g., `NodeUpdateInProgress=False`). The executor does not set conditions — it only mutates plan/task state and phase transitions.
- **Single-patch model** — All status mutations (plan state, conditions, phase, currentImage) accumulate in-memory during a reconcile and are flushed in a single `Status().Patch()` at the end. Tasks mutate owned resources (StatefulSets, Services, PVCs); the executor mutates plan state in-memory; the reconciler flushes once.
- **Resource generators** live in `internal/noderesource/` — pure functions that produce StatefulSets, Services, and PVCs from a SeiNode spec. Used by both the controller and plan tasks.
- **Platform config** is resolved by `platform.Load` (`internal/platform/load.go`). Infra fields (scheduling, storage, resources, snapshot/genesis/result-export buckets, images) are read from the mounted app-config file (`SEI_CONTROLLER_CONFIG` → `platform.FileConfig`) when present, falling back to their historical env vars — PLT-475, transitional: the env fallback is removed in a follow-up once the ConfigMap is verified populated. Networking/gateway fields (`SEI_GATEWAY_*`, `SEI_P2P_ENDPOINT_DOMAIN`, `SEI_NLB_TARGET_TYPE`) stay env-sourced pending their removal from the controller in PLT-451. The file is read once at startup for infra fields (an infra change needs a restart); the `stateSync` section is re-read per reconcile (it hot-reloads). See `internal/platform/platform.go` for the field list and `docs/controller-app-config.md` for the file schema.
- **Platform config** is resolved by `platform.Load` (`internal/platform/load.go`). Infra fields (scheduling, storage, resources, snapshot/genesis/result-export buckets, images) come from the mounted app-config file (`SEI_CONTROLLER_CONFIG` → `platform.FileConfig`), which is authoritative — a required field unset in the file fails `Config.Validate` at startup. Networking/gateway fields (`SEI_GATEWAY_*`, `SEI_P2P_ENDPOINT_DOMAIN`, `SEI_NLB_TARGET_TYPE`) stay env-sourced pending their removal from the controller in PLT-451. The file is read once at startup for infra fields (an infra change needs a restart); the `stateSync` section is re-read per reconcile (it hot-reloads). See `internal/platform/platform.go` for the field list and `docs/controller-app-config.md` for the file schema.
- **Genesis resolution** is handled by the sidecar autonomously: embedded sei-config for well-known chains, S3 fallback at `{SEI_GENESIS_BUCKET}/{chainID}/genesis.json` for custom chains.
- Config keys in seid's `config.toml` use **hyphens** (e.g., `persistent-peers`, `trust-height`), not underscores.
26 changes: 3 additions & 23 deletions README.md
@@ -79,29 +79,9 @@ spec:

## Platform Configuration

The controller reads all infrastructure-level settings from environment variables. Every field is required — the controller fails fast at startup if any are missing.

| Env var | Description |
|---------|-------------|
| `SEI_NODEPOOL_NAME` | Karpenter NodePool for pod scheduling |
| `SEI_TOLERATION_KEY` | Taint key to tolerate |
| `SEI_TOLERATION_VALUE` | Taint value to tolerate |
| `SEI_SERVICE_ACCOUNT` | ServiceAccount for node pods |
| `SEI_STORAGE_CLASS_PERF` | StorageClass for full/validator/archive nodes |
| `SEI_STORAGE_CLASS_DEFAULT` | StorageClass for other modes |
| `SEI_STORAGE_SIZE_DEFAULT` | PVC size for full/validator nodes |
| `SEI_STORAGE_SIZE_ARCHIVE` | PVC size for archive nodes |
| `SEI_RESOURCE_CPU_ARCHIVE` | CPU request for archive nodes |
| `SEI_RESOURCE_MEM_ARCHIVE` | Memory request for archive nodes |
| `SEI_RESOURCE_CPU_DEFAULT` | CPU request for full/validator nodes |
| `SEI_RESOURCE_MEM_DEFAULT` | Memory request for full/validator nodes |
| `SEI_SNAPSHOT_BUCKET` | S3 bucket for snapshot storage |
| `SEI_SNAPSHOT_REGION` | AWS region for snapshot S3 operations |
| `SEI_RESULT_EXPORT_BUCKET` | S3 bucket for shadow result exports |
| `SEI_RESULT_EXPORT_REGION` | AWS region for result export bucket |
| `SEI_RESULT_EXPORT_PREFIX` | S3 key prefix for result exports |
| `SEI_GENESIS_BUCKET` | S3 bucket for genesis artifacts |
| `SEI_GENESIS_REGION` | AWS region for genesis artifacts bucket |
Infrastructure-level settings (node pools, storage, resources, snapshot/genesis/result-export buckets, sidecar images) are read from the mounted app-config file (`SEI_CONTROLLER_CONFIG` → `platform.FileConfig`), which is authoritative — the controller fails fast at startup if a required field is unset. See [`docs/controller-app-config.md`](docs/controller-app-config.md) for the schema.

Gateway config (`SEI_GATEWAY_NAME`, `SEI_GATEWAY_NAMESPACE`, `SEI_GATEWAY_DOMAIN`) and the config-file path (`SEI_CONTROLLER_CONFIG`) remain environment variables.

## Development

45 changes: 5 additions & 40 deletions config/manager/manager.yaml
@@ -35,46 +35,11 @@ spec:
image: 189176372795.dkr.ecr.us-east-2.amazonaws.com/sei/sei-k8s-controller:8b3f1749067bd07140fccd6d05da7894754688ca
name: manager
env:
- name: SEI_NODEPOOL_NAME
value: sei-node
- name: SEI_NODEPOOL_ARCHIVE
value: sei-archive
- name: SEI_TOLERATION_KEY
value: sei.io/workload
- name: SEI_SERVICE_ACCOUNT
value: seid-node
- name: SEI_STORAGE_CLASS_PERF
value: gp3-10k-750
- name: SEI_STORAGE_CLASS_DEFAULT
value: gp3
- name: SEI_STORAGE_CLASS_ARCHIVE
value: gp3-archive
- name: SEI_STORAGE_SIZE_DEFAULT
value: 2000Gi
- name: SEI_STORAGE_SIZE_ARCHIVE
value: 40Ti
- name: SEI_RESOURCE_CPU_ARCHIVE
value: "48"
- name: SEI_RESOURCE_MEM_ARCHIVE
value: 448Gi
- name: SEI_RESOURCE_CPU_DEFAULT
value: "4"
- name: SEI_RESOURCE_MEM_DEFAULT
value: 32Gi
- name: SEI_SNAPSHOT_BUCKET
value: dev-sei-snapshots
- name: SEI_SNAPSHOT_REGION
value: us-east-2
- name: SEI_RESULT_EXPORT_BUCKET
value: dev-sei-shadow-results
- name: SEI_RESULT_EXPORT_REGION
value: us-east-2
- name: SEI_RESULT_EXPORT_PREFIX
value: shadow-results/
- name: SEI_GENESIS_BUCKET
value: dev-sei-k8s-genesis-artifacts
- name: SEI_GENESIS_REGION
value: us-east-2
# Infra config (node pools, storage, resources, snapshot/genesis/
# result-export buckets, images) is sourced from the mounted
# app-config ConfigMap (SEI_CONTROLLER_CONFIG below); see
# docs/controller-app-config.md. Gateway config stays env-sourced
# pending its removal from the controller in PLT-451.
- name: SEI_GATEWAY_NAME
value: sei-gateway
- name: SEI_GATEWAY_NAMESPACE
66 changes: 32 additions & 34 deletions docs/controller-app-config.md
@@ -15,12 +15,11 @@ Two read paths, by design:
- **`stateSync`** is re-read **per reconcile** so syncer changes hot-reload
without a restart (the directory mount swaps atomically).
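
The two read paths can be illustrated with a minimal sketch. It assumes a hypothetical `source` function standing in for "read the mounted file" and a hypothetical `controller` type; neither is the controller's real API, and the file content is reduced to a single string for brevity.

```go
package main

import "fmt"

// source is a hypothetical stand-in for reading the mounted app-config file.
type source func() (string, error)

// controller is a hypothetical type, not the real reconciler.
type controller struct {
	infra string // captured once at startup; a change needs a restart
	read  source // kept so stateSync can be re-read on every reconcile
}

// newController reads the file once and pins the infra value.
func newController(read source) (*controller, error) {
	v, err := read()
	if err != nil {
		return nil, fmt.Errorf("reading app-config at startup: %w", err)
	}
	return &controller{infra: v, read: read}, nil
}

// reconcile returns the pinned infra value and a freshly re-read stateSync value.
func (c *controller) reconcile() (infra, stateSync string, err error) {
	cur, err := c.read()
	if err != nil {
		return "", "", fmt.Errorf("re-reading stateSync: %w", err)
	}
	return c.infra, cur, nil
}

func main() {
	content := "v1"
	c, err := newController(func() (string, error) { return content, nil })
	if err != nil {
		fmt.Println("startup failed:", err)
		return
	}

	content = "v2" // simulates the ConfigMap mount swapping to new content

	infra, stateSync, err := c.reconcile()
	if err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Printf("infra=%s stateSync=%s\n", infra, stateSync)
}
```

In this sketch the infra value stays at what was seen at startup, while the per-reconcile read picks up the swapped content.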

## Transitional env fallback (PLT-475)
## Source of truth

For each infra field, a non-empty file value wins; an absent one falls back to
its historical `SEI_*` env var. So an unset `SEI_CONTROLLER_CONFIG` reproduces
the original all-env behavior. The fallback is removed in a follow-up once the
ConfigMap is verified populated, after which the file is authoritative.
The file is **authoritative** for infra config: a required field unset in the
file fails `Config.Validate` at startup (the controller does not boot). There is
no env-var fallback for these fields.

Networking/gateway config (`SEI_GATEWAY_*`, `SEI_P2P_ENDPOINT_DOMAIN`,
`SEI_NLB_TARGET_TYPE`) is **not** in the file — it stays env-sourced pending its
@@ -37,46 +36,45 @@ stateSync:
- rpc-1.example.net:26657
- rpc-2.example.net:26657

# --- infra (read once at startup; env-var fallback during PLT-475) ---
# --- infra (authoritative; read once at startup) ---

scheduling:
nodepoolName: sei-node # SEI_NODEPOOL_NAME
nodepoolArchive: sei-archive # SEI_NODEPOOL_ARCHIVE
tolerationKey: sei.io/workload # SEI_TOLERATION_KEY
serviceAccount: seid-node # SEI_SERVICE_ACCOUNT

storage: # note: no sizePerf — matches the historical env layout
classPerf: gp3-10k-750 # SEI_STORAGE_CLASS_PERF
classDefault: gp3 # SEI_STORAGE_CLASS_DEFAULT
classArchive: gp3-archive # SEI_STORAGE_CLASS_ARCHIVE
sizeDefault: 2000Gi # SEI_STORAGE_SIZE_DEFAULT
sizeArchive: 40Ti # SEI_STORAGE_SIZE_ARCHIVE
nodepoolName: sei-node
nodepoolArchive: sei-archive
tolerationKey: sei.io/workload
serviceAccount: seid-node

storage: # no sizePerf — perf is a storage-class tier only
classPerf: gp3-10k-750
classDefault: gp3
classArchive: gp3-archive
sizeDefault: 2000Gi
sizeArchive: 40Ti

resources:
cpuArchive: "48" # SEI_RESOURCE_CPU_ARCHIVE
memArchive: 448Gi # SEI_RESOURCE_MEM_ARCHIVE
cpuDefault: "4" # SEI_RESOURCE_CPU_DEFAULT
memDefault: 32Gi # SEI_RESOURCE_MEM_DEFAULT
cpuArchive: "48"
memArchive: 448Gi
cpuDefault: "4"
memDefault: 32Gi

snapshot:
bucket: sei-snapshots # SEI_SNAPSHOT_BUCKET
region: us-east-2 # SEI_SNAPSHOT_REGION
bucket: sei-snapshots
region: us-east-2

resultExport:
bucket: sei-shadow-results # SEI_RESULT_EXPORT_BUCKET
region: us-east-2 # SEI_RESULT_EXPORT_REGION
prefix: shadow-results/ # SEI_RESULT_EXPORT_PREFIX
bucket: sei-shadow-results
region: us-east-2
prefix: shadow-results/

genesis:
bucket: sei-k8s-genesis # SEI_GENESIS_BUCKET
region: us-east-2 # SEI_GENESIS_REGION
bucket: sei-k8s-genesis
region: us-east-2

images:
sidecar: ghcr.io/sei-protocol/seictl@sha256:... # SEI_SIDECAR_IMAGE
kubeRBACProxy: quay.io/brancz/kube-rbac-proxy:v0.19.1 # SEI_KUBE_RBAC_PROXY_IMAGE
cosmosExporter: ghcr.io/sei-protocol/sei-cosmos-exporter@sha256:... # SEI_COSMOS_EXPORTER_IMAGE
sidecar: ghcr.io/sei-protocol/seictl@sha256:...
kubeRBACProxy: quay.io/brancz/kube-rbac-proxy:v0.19.1
cosmosExporter: ghcr.io/sei-protocol/sei-cosmos-exporter@sha256:...
```

A present-but-unparseable file is a hard startup error — it never silently falls
back to env. Required fields missing from both sources fail `Config.Validate`
with a message naming the file key and the env var.
A present-but-unparseable file is a hard startup error. A required infra field
unset in the file fails `Config.Validate` at startup, naming the file key.
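
As an illustration of the fail-fast rule above, here is a minimal sketch of a validator that names the file key of each unset required field. The `Config` struct is a trimmed, hypothetical stand-in with only three fields, and `missingFileKeys` is a helper invented for this example; the real field list lives in `internal/platform/platform.go` and the real `Config.Validate` may differ in structure and message wording.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a trimmed, hypothetical stand-in for platform.Config.
type Config struct {
	NodepoolName       string
	SnapshotBucket     string
	KubeRBACProxyImage string
}

// missingFileKeys (hypothetical helper) returns the app-config file keys of
// required fields that are empty after trimming whitespace, in a fixed order.
func missingFileKeys(c Config) []string {
	required := []struct{ key, val string }{
		{"scheduling.nodepoolName", c.NodepoolName},
		{"snapshot.bucket", c.SnapshotBucket},
		{"images.kubeRBACProxy", c.KubeRBACProxyImage},
	}
	var missing []string
	for _, r := range required {
		if strings.TrimSpace(r.val) == "" {
			missing = append(missing, r.key)
		}
	}
	return missing
}

// Validate fails when any required field is unset, naming each file key.
func (c Config) Validate() error {
	if missing := missingFileKeys(c); len(missing) > 0 {
		return fmt.Errorf("app-config file is missing required fields: %s",
			strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// images.kubeRBACProxy is deliberately left unset.
	cfg := Config{NodepoolName: "sei-node", SnapshotBucket: "sei-snapshots"}
	if err := cfg.Validate(); err != nil {
		fmt.Println("startup aborted:", err)
		return
	}
	fmt.Println("config ok")
}
```

Reporting every missing key in one error, rather than stopping at the first, is a design choice of this sketch so that a misconfigured ConfigMap can be fixed in a single pass.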
4 changes: 2 additions & 2 deletions internal/noderesource/noderesource.go
@@ -224,7 +224,7 @@ func DefaultResourcesForMode(mode string, p PlatformConfig) corev1.ResourceRequi
// never seid main or any non-sidecar init container.
func GenerateStatefulSet(node *seiv1alpha1.SeiNode, p PlatformConfig) (*appsv1.StatefulSet, error) {
if p.KubeRBACProxyImage == "" {
return nil, fmt.Errorf("SEI_KUBE_RBAC_PROXY_IMAGE is not configured on the controller")
return nil, fmt.Errorf("images.kubeRBACProxy is not configured in the app-config file")
}
one := int32(1)
labels := ResourceLabels(node)
@@ -644,7 +644,7 @@ func cosmosExporterWaitCommand() (command []string, args []string) {
// buildCosmosExporterContainer renders the cosmos-exporter sidecar.
func buildCosmosExporterContainer(p PlatformConfig) (corev1.Container, error) {
if p.CosmosExporterImage == "" {
return corev1.Container{}, fmt.Errorf("SEI_COSMOS_EXPORTER_IMAGE is required on the operator Deployment")
return corev1.Container{}, fmt.Errorf("images.cosmosExporter is required in the app-config file")
}
command, args := cosmosExporterWaitCommand()
return corev1.Container{
2 changes: 1 addition & 1 deletion internal/noderesource/noderesource_test.go
@@ -1349,7 +1349,7 @@ func TestCosmosExporter_ErrorWhenImageUnset(t *testing.T) {
_, err := GenerateStatefulSet(node, cfg)

g.Expect(err).To(HaveOccurred())
g.Expect(err.Error()).To(ContainSubstring("SEI_COSMOS_EXPORTER_IMAGE is required"))
g.Expect(err.Error()).To(ContainSubstring("images.cosmosExporter is required"))
}

func TestCosmosExporter_ReadinessProbe_TargetsExporterListener(t *testing.T) {
2 changes: 1 addition & 1 deletion internal/noderesource/sidecar_proxy_test.go
@@ -101,5 +101,5 @@ func TestGenerateStatefulSet_ProxyImageMissing_Errors(t *testing.T) {
p.KubeRBACProxyImage = ""
_, err := GenerateStatefulSet(newGenesisNode("a", "default"), p)
g.Expect(err).To(HaveOccurred())
g.Expect(err.Error()).To(ContainSubstring("KUBE_RBAC_PROXY_IMAGE"))
g.Expect(err.Error()).To(ContainSubstring("images.kubeRBACProxy"))
}
110 changes: 35 additions & 75 deletions internal/platform/load.go
@@ -9,58 +9,27 @@ import (
)

// Environment-variable names. SEI_CONTROLLER_CONFIG points at the read-only
// app-config file (a GitOps-written ConfigMap mounted as a directory); the rest
// are the historical infra knobs Load falls back to when a field is absent from
// that file. Single source of truth — referenced by Load and Config.Validate.
// app-config file (a GitOps-written ConfigMap mounted as a directory). The
// gateway vars stay env-sourced pending their removal from the controller in
// the GitOps networking move (PLT-451); all other infra config is file-sourced.
const (
envControllerConfig = "SEI_CONTROLLER_CONFIG"

envNodepoolName = "SEI_NODEPOOL_NAME"
envNodepoolArchive = "SEI_NODEPOOL_ARCHIVE"
envTolerationKey = "SEI_TOLERATION_KEY"
envServiceAccount = "SEI_SERVICE_ACCOUNT"

envStorageClassPerf = "SEI_STORAGE_CLASS_PERF"
envStorageClassDefault = "SEI_STORAGE_CLASS_DEFAULT"
envStorageClassArchive = "SEI_STORAGE_CLASS_ARCHIVE"
envStorageSizeDefault = "SEI_STORAGE_SIZE_DEFAULT"
envStorageSizeArchive = "SEI_STORAGE_SIZE_ARCHIVE"

envResourceCPUArchive = "SEI_RESOURCE_CPU_ARCHIVE"
envResourceMemArchive = "SEI_RESOURCE_MEM_ARCHIVE"
envResourceCPUDefault = "SEI_RESOURCE_CPU_DEFAULT"
envResourceMemDefault = "SEI_RESOURCE_MEM_DEFAULT"

envSnapshotBucket = "SEI_SNAPSHOT_BUCKET"
envSnapshotRegion = "SEI_SNAPSHOT_REGION"

envResultExportBucket = "SEI_RESULT_EXPORT_BUCKET"
envResultExportRegion = "SEI_RESULT_EXPORT_REGION"
envResultExportPrefix = "SEI_RESULT_EXPORT_PREFIX"

envGenesisBucket = "SEI_GENESIS_BUCKET"
envGenesisRegion = "SEI_GENESIS_REGION"

envSidecarImage = "SEI_SIDECAR_IMAGE"
envKubeRBACProxyImage = "SEI_KUBE_RBAC_PROXY_IMAGE"
envCosmosExporterImage = "SEI_COSMOS_EXPORTER_IMAGE"

envGatewayName = "SEI_GATEWAY_NAME"
envGatewayNamespace = "SEI_GATEWAY_NAMESPACE"
envGatewayDomain = "SEI_GATEWAY_DOMAIN"
envGatewayPublicDomain = "SEI_GATEWAY_PUBLIC_DOMAIN"
)

// Load resolves the platform Config at startup. A non-empty value in the
// app-config file wins; an absent infra field falls back to its historical env
// var, so an unset SEI_CONTROLLER_CONFIG yields the original all-env behavior.
// That env fallback is transitional — removed once the ConfigMap is populated
// everywhere (PLT-475). Networking/gateway fields and the config-file path
// itself are env-sourced.
// Load resolves the platform Config at startup. Infra config is read from the
// app-config file (SEI_CONTROLLER_CONFIG → FileConfig), which is authoritative.
// The networking/gateway fields and the config-file path itself stay
// env-sourced, pending the gateway fields' removal in the GitOps networking
// move (PLT-451).
//
// The file is read once here; infra changes therefore require a controller
// restart. The stateSync section is read per-reconcile elsewhere (it hot-reloads).
// Caller is expected to run Config.Validate after Load.
// The file is read once here; infra changes require a controller restart. The
// stateSync section is read per-reconcile elsewhere (it hot-reloads). Caller is
// expected to run Config.Validate after Load.
func Load() (Config, error) {
path := strings.TrimSpace(os.Getenv(envControllerConfig))
file, err := ReadFileConfig(path)
@@ -69,38 +38,38 @@ func Load() (Config, error) {
}

return Config{
NodepoolName: fileOrEnv(file.Scheduling.NodepoolName, envNodepoolName),
NodepoolArchive: fileOrEnv(file.Scheduling.NodepoolArchive, envNodepoolArchive),
TolerationKey: fileOrEnv(file.Scheduling.TolerationKey, envTolerationKey),
ServiceAccount: fileOrEnv(file.Scheduling.ServiceAccount, envServiceAccount),
NodepoolName: file.Scheduling.NodepoolName,
NodepoolArchive: file.Scheduling.NodepoolArchive,
TolerationKey: file.Scheduling.TolerationKey,
ServiceAccount: file.Scheduling.ServiceAccount,

StorageClassPerf: fileOrEnv(file.Storage.ClassPerf, envStorageClassPerf),
StorageClassDefault: fileOrEnv(file.Storage.ClassDefault, envStorageClassDefault),
StorageClassArchive: fileOrEnv(file.Storage.ClassArchive, envStorageClassArchive),
StorageSizeDefault: fileOrEnv(file.Storage.SizeDefault, envStorageSizeDefault),
StorageSizeArchive: fileOrEnv(file.Storage.SizeArchive, envStorageSizeArchive),
StorageClassPerf: file.Storage.ClassPerf,
StorageClassDefault: file.Storage.ClassDefault,
StorageClassArchive: file.Storage.ClassArchive,
StorageSizeDefault: file.Storage.SizeDefault,
StorageSizeArchive: file.Storage.SizeArchive,

ResourceCPUArchive: fileOrEnv(file.Resources.CPUArchive, envResourceCPUArchive),
ResourceMemArchive: fileOrEnv(file.Resources.MemArchive, envResourceMemArchive),
ResourceCPUDefault: fileOrEnv(file.Resources.CPUDefault, envResourceCPUDefault),
ResourceMemDefault: fileOrEnv(file.Resources.MemDefault, envResourceMemDefault),
ResourceCPUArchive: file.Resources.CPUArchive,
ResourceMemArchive: file.Resources.MemArchive,
ResourceCPUDefault: file.Resources.CPUDefault,
ResourceMemDefault: file.Resources.MemDefault,

SnapshotBucket: fileOrEnv(file.Snapshot.Bucket, envSnapshotBucket),
SnapshotRegion: fileOrEnv(file.Snapshot.Region, envSnapshotRegion),
SnapshotBucket: file.Snapshot.Bucket,
SnapshotRegion: file.Snapshot.Region,

ResultExportBucket: fileOrEnv(file.ResultExport.Bucket, envResultExportBucket),
ResultExportRegion: fileOrEnv(file.ResultExport.Region, envResultExportRegion),
ResultExportPrefix: fileOrEnv(file.ResultExport.Prefix, envResultExportPrefix),
ResultExportBucket: file.ResultExport.Bucket,
ResultExportRegion: file.ResultExport.Region,
ResultExportPrefix: file.ResultExport.Prefix,

GenesisBucket: fileOrEnv(file.Genesis.Bucket, envGenesisBucket),
GenesisRegion: fileOrEnv(file.Genesis.Region, envGenesisRegion),
GenesisBucket: file.Genesis.Bucket,
GenesisRegion: file.Genesis.Region,

SidecarImage: fileOrEnv(file.Images.Sidecar, envSidecarImage),
KubeRBACProxyImage: fileOrEnv(file.Images.KubeRBACProxy, envKubeRBACProxyImage),
CosmosExporterImage: fileOrEnv(file.Images.CosmosExporter, envCosmosExporterImage),
SidecarImage: file.Images.Sidecar,
KubeRBACProxyImage: file.Images.KubeRBACProxy,
CosmosExporterImage: file.Images.CosmosExporter,

// Networking/gateway: env-only, pending removal in the GitOps networking
// move (PLT-451). Not migrated to the file to avoid migrate-then-delete.
// move (PLT-451).
GatewayName: os.Getenv(envGatewayName),
GatewayNamespace: os.Getenv(envGatewayNamespace),
GatewayDomain: os.Getenv(envGatewayDomain),
@@ -132,12 +101,3 @@ func ReadFileConfig(path string) (FileConfig, error) {
}
return cfg, nil
}

// fileOrEnv returns the file value when non-empty, otherwise the named env var
// (the transitional fallback).
func fileOrEnv(fileVal, envVar string) string {
if strings.TrimSpace(fileVal) != "" {
return fileVal
}
return os.Getenv(envVar)
}
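
The behavioral effect of deleting `fileOrEnv` can be shown with a small sketch. It reproduces the removed helper, but with the env lookup injected as a function parameter (an assumption made here so the example does not depend on the process environment), and contrasts it with a hypothetical `fileOnly` function standing in for the direct field reads that `Load` now performs.

```go
package main

import (
	"fmt"
	"strings"
)

// fileOrEnv reproduces the helper deleted in this change. The getenv
// parameter is an addition for this sketch; the original called os.Getenv.
func fileOrEnv(fileVal, envVar string, getenv func(string) string) string {
	if strings.TrimSpace(fileVal) != "" {
		return fileVal
	}
	return getenv(envVar)
}

// fileOnly (hypothetical name) models the new behavior: the file value is
// taken as-is, and an empty value is left for validation to reject.
func fileOnly(fileVal string) string {
	return fileVal
}

func main() {
	// Simulated environment that still carries the historical variable.
	env := map[string]string{"SEI_NODEPOOL_NAME": "sei-node"}
	getenv := func(k string) string { return env[k] }

	fileVal := "" // scheduling.nodepoolName absent from the file

	fmt.Printf("before: %q\n", fileOrEnv(fileVal, "SEI_NODEPOOL_NAME", getenv))
	fmt.Printf("after:  %q\n", fileOnly(fileVal))
}
```

Under the old path a leftover env var silently filled the gap; under the new path the field stays empty, which is what lets `Config.Validate` reject it at startup.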