feat(platform): make the app-config file authoritative (PLT-475 PR-2)#403

Merged
bdchatham merged 3 commits into main from plt475/pr2-file-authoritative
Jun 13, 2026

@bdchatham bdchatham commented Jun 12, 2026

Collaborator

Draft — gated. Do not merge/deploy until the sei-controller-config ConfigMap is verified populated in every target environment. This PR removes the env fallback, so a controller whose ConfigMap is missing a required infra field fails Config.Validate at boot (crashloop). Build + cross-review are safe; merge is the gated one-way step (ConfigMap must land before this image).

What

PR-2 of PLT-475 — makes the mounted app-config file authoritative, dropping the transitional env fallback from #402.

  • platform.Load reads infra fields straight from FileConfig (no fileOrEnv); gateway fields are still read via os.Getenv (pending PLT-451). The migrated env-name constants are removed.
  • Config.Validate reports the file key for infra fields, and now requires images.cosmosExporter.
  • config/manager/manager.yaml drops the migrated infra env vars; SEI_GATEWAY_* + SEI_CONTROLLER_CONFIG remain.
  • Docs (CLAUDE.md, docs/controller-app-config.md, README.md) + noderesource.go error strings updated to file-authoritative; dead # SEI_* infra references scrubbed.
  • Tests rewritten: file-sourced, infra env ignored, missing field fails Validate, gateway from env.
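
For reviewers who want the shape of the change without opening the diff, here is a minimal sketch of the file-authoritative load described in the bullets above. It is not the actual platform.Load: the struct fields are a small hypothetical subset, the lowercase load helper and its injected getenv parameter are illustrative, and SEI_GATEWAY_NAME and SEI_NODEPOOL_NAME are made-up variable names standing in for the real SEI_GATEWAY_* and removed infra variables.

```go
package main

import (
	"fmt"
	"os"
)

// FileConfig mirrors a small, hypothetical subset of the mounted app-config file.
type FileConfig struct {
	Scheduling struct{ NodepoolName string }
	Images     struct{ CosmosExporter string }
}

// Config is the resolved controller configuration (hypothetical subset).
type Config struct {
	NodepoolName   string
	CosmosExporter string
	GatewayName    string
}

// load maps infra fields only from the file; there is no env fallback.
// The gateway field stays env-sourced. getenv is injected so the lookup
// can be exercised without mutating the process environment.
func load(fc FileConfig, getenv func(string) string) Config {
	return Config{
		NodepoolName:   fc.Scheduling.NodepoolName,
		CosmosExporter: fc.Images.CosmosExporter,
		GatewayName:    getenv("SEI_GATEWAY_NAME"), // hypothetical variable name
	}
}

func main() {
	var fc FileConfig
	fc.Scheduling.NodepoolName = "sei-node"

	// Even if an infra variable such as SEI_NODEPOOL_NAME (hypothetical) were
	// set in the environment, load never consults it.
	cfg := load(fc, os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Because an unset file field simply stays empty here, catching the omission is left to validation at startup rather than to the loader.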

⚠️ Behavior change beyond "drop the fallback"

images.cosmosExporter is newly boot-required. Previously it failed lazily at first node pod-build (noderesource); now Validate requires it at startup. Strict improvement (fail-fast — the cosmos-exporter container is on every pod), but any environment whose ConfigMap omits it will crashloop. The deploy gate covers this field too.

ConfigMap handoff

The infra values previously in manager.yaml must be in the GitOps sei-controller-config ConfigMap before deploy (schema: docs/controller-app-config.md). Repo-base defaults that were removed:

scheduling:   { nodepoolName: sei-node, nodepoolArchive: sei-archive, tolerationKey: sei.io/workload, serviceAccount: seid-node }
storage:      { classPerf: gp3-10k-750, classDefault: gp3, classArchive: gp3-archive, sizeDefault: 2000Gi, sizeArchive: 40Ti }
resources:    { cpuArchive: "48", memArchive: 448Gi, cpuDefault: "4", memDefault: 32Gi }
snapshot:     { bucket: dev-sei-snapshots, region: us-east-2 }
resultExport: { bucket: dev-sei-shadow-results, region: us-east-2, prefix: shadow-results/ }
genesis:      { bucket: dev-sei-k8s-genesis-artifacts, region: us-east-2 }
images:       { sidecar: <digest>, kubeRBACProxy: quay.io/brancz/kube-rbac-proxy:v0.19.1, cosmosExporter: <digest> }

(env-specific values come from platform-repo overlays; these are repo-base defaults.)

Cross-review (4 lenses, clean)

platform / security / kubernetes / idiom all COMPATIBLE. Resolved findings folded in: scrubbed the dead # SEI_* doc/error references (platform + kubernetes), confirmed file-authoritative strictly narrows the trust surface and all failure modes fail closed (security), confirmed the cosmosExporter fold-in is a strict improvement with no test/path regressing (kubernetes).

UnmarshalStrict — decided: stay lenient. ReadFileConfig is shared with the per-reconcile state-sync read; strict there would break state-sync hot-reload on controller/ConfigMap version skew, and strict-at-startup only improves error text over a Validate failure you already get. A startup-only warn (non-fatal) is a possible fast-follow if config typos cause real friction — deferred.
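
The version-skew concern behind staying lenient can be illustrated with a small sketch. It uses the standard library's encoding/json and Decoder.DisallowUnknownFields purely as a stand-in for strict YAML unmarshalling; the controller's ReadFileConfig is not shown here, and the futureKnob key and decode helper are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// FileConfig is a hypothetical one-field subset of the app-config schema.
type FileConfig struct {
	Images struct {
		CosmosExporter string `json:"cosmosExporter"`
	} `json:"images"`
}

// decode parses data into FileConfig. With strict set, any key the struct
// does not declare is an error; otherwise unknown keys are ignored.
func decode(data []byte, strict bool) (FileConfig, error) {
	var fc FileConfig
	dec := json.NewDecoder(bytes.NewReader(data))
	if strict {
		dec.DisallowUnknownFields()
	}
	err := dec.Decode(&fc)
	return fc, err
}

func main() {
	// A newer ConfigMap carries a key this controller build predates.
	data := []byte(`{"images":{"cosmosExporter":"example/exporter:v1"},"futureKnob":true}`)

	_, lenientErr := decode(data, false)
	_, strictErr := decode(data, true)

	fmt.Println("lenient error:", lenientErr)
	fmt.Println("strict error:", strictErr)
}
```

In this sketch the lenient path keeps working when the file is ahead of the binary, while the strict path rejects the same input, which is the failure mode a shared strict read would introduce into the per-reconcile state-sync.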

🤖 Generated with Claude Code

bdchatham and others added 3 commits June 12, 2026 16:10
Drops the transitional env fallback established in PR-1 (#402): infra config
now comes solely from the mounted app-config file.

- platform.Load reads infra fields straight from FileConfig (no fileOrEnv);
  gateway fields stay env-sourced (pending PLT-451). The migrated env-name
  constants are removed.
- Config.Validate reports the file key for infra fields (no "or SEI_*"), and
  now also requires images.cosmosExporter (previously validated lazily at
  pod-build).
- config/manager/manager.yaml drops the migrated infra env vars; gateway +
  SEI_CONTROLLER_CONFIG remain.
- CLAUDE.md + docs/controller-app-config.md updated to "file-authoritative".
- Tests rewritten for file-authoritative behavior (file sourced, infra env
  ignored, missing field fails Validate, gateway from env).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Platform + kubernetes lenses flagged stale env references now that infra config
is file-authoritative:
- docs/controller-app-config.md: drop the per-key # SEI_* annotations and the
  "env-var fallback" framing; the file is the source.
- README.md: Platform Configuration section now points at the app-config file
  (was "reads all settings from environment variables").
- noderesource.go: kubeRBACProxy/cosmosExporter "not configured" errors name
  the file key, not the removed env var.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@bdchatham bdchatham marked this pull request as ready for review June 13, 2026 15:59
@bdchatham bdchatham merged commit 2f6a62c into main Jun 13, 2026
4 checks passed
cursor Bot commented Jun 13, 2026


PR Summary

High Risk
One-way operational change: missing or incomplete app-config causes crashloop at startup; merge/deploy must follow populated ConfigMap in every target environment.

Overview
Makes the mounted app-config file the sole source for controller infra settings, completing PLT-475 by removing the transitional SEI_* env fallback. platform.Load maps infra fields only from FileConfig; gateway settings still come from env (PLT-451).

Config.Validate now reports missing infra using file keys (e.g. scheduling.nodepoolName) and requires images.cosmosExporter at startup (previously failed only when building node pods).

config/manager/manager.yaml drops the migrated infra env block; SEI_GATEWAY_* and SEI_CONTROLLER_CONFIG remain. Docs and noderesource error strings point at app-config keys instead of env var names. load_test asserts infra env is ignored and missing file/fields fail validation.

Deploy gate: environments must have a complete sei-controller-config ConfigMap (including images.cosmosExporter) before rolling this image, or the manager will exit on Validate.

Reviewed by Cursor Bugbot for commit 056102a.

@bdchatham bdchatham deleted the plt475/pr2-file-authoritative branch June 13, 2026 15:59