From 8583a3df0874ef2aa3c209ea934b28c94b8d5314 Mon Sep 17 00:00:00 2001
From: bdchatham
Date: Fri, 12 Jun 2026 16:41:07 -0700
Subject: [PATCH 1/3] docs: add SeiNodeTask CRD operator reference (PLT-488)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds docs/seinode-task.md — the operator/automation reference for
SeiNodeTask: kinds + non-obvious behavior notes, lifecycle (deterministic
task ID, terminal requirePhase timeout), conditions + reason enums, the
3-branch operator-key signing chain, chain-vs-controller idempotency, the
status.outputs-unpopulated reality (overrides the godoc), GitOps patterns,
reconciliation cadence + the fee-floor / value-encoding / voting-window
gotchas, and a vote-fan-out example.

Links it from the README and notes in CLAUDE.md that its headings are cited
anchors for PLT-489.

Design: bdchatham-designs designs/wave/seinode-task-crd-docs.md
(Coral-signed-off).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 CLAUDE.md            |   1 +
 README.md            |   6 ++
 docs/seinode-task.md | 144 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)
 create mode 100644 docs/seinode-task.md

diff --git a/CLAUDE.md b/CLAUDE.md
index 722674f..62a7e0d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -92,6 +92,7 @@ Don't mix polarities for the same subject (no `XReady` + `XFailed` — pick one
 - Edit types in `api/v1alpha1/` (e.g., `seinode_types.go`, `seinodedeployment_types.go`, `validator_types.go`).
 - After any type change, run `make manifests generate` to regenerate CRD YAML and DeepCopy methods.
 - Never hand-edit files in `manifests/` or `zz_generated.deepcopy.go`.
+- When changing `SeiNodeTask` kinds or their operational behavior, update `docs/seinode-task.md`. Its section headings are **cited anchors** for the gov-ops skill (PLT-489) — renaming one is a breaking change for that consumer.
 
 ### RBAC
 - RBAC is generated from `// +kubebuilder:rbac:` markers on controller files.
diff --git a/README.md b/README.md
index a078101..a4afe13 100644
--- a/README.md
+++ b/README.md
@@ -77,6 +77,12 @@ spec:
 | Archive | `spec.archive` set | State sync with archival pruning configuration |
 | Replayer | `spec.replayer` set | Snapshot restore with result export for shadow validation |
 
+### SeiNodeTask
+
+A one-shot operation against a single `SeiNode` (governance votes/proposals, image updates,
+height/condition waits). See **[docs/seinode-task.md](docs/seinode-task.md)** for the kinds,
+lifecycle, signing topology, idempotency, and operational gotchas.
+
 ## Platform Configuration
 
 The controller reads all infrastructure-level settings from environment variables. Every field is required — the controller fails fast at startup if any are missing.
diff --git a/docs/seinode-task.md b/docs/seinode-task.md
new file mode 100644
index 0000000..3f53b85
--- /dev/null
+++ b/docs/seinode-task.md
@@ -0,0 +1,144 @@
+# SeiNodeTask
+
+A `SeiNodeTask` is a **one-shot operation against a single `SeiNode`**, driven to a
+terminal state by the `nodetask` controller. This is the operator/automation
+reference; field-level contracts live in the godoc (`api/v1alpha1/seinodetask_types.go`)
+and the design record in the LLD (PR #277). Field definitions are **not** restated
+here — only the operational behavior that the types don't tell you.
+
+> Anchors below are a cited contract for the gov-ops skill (PLT-489) — **renaming a
+> heading is a breaking change** for its references.
+
+## Kinds
+
+One line per kind; the **behavior note** is the part not obvious from the payload.
+`spec.kind` is **immutable**, and **exactly one** payload sub-spec must match it
+(admission-rejected otherwise).
+
+| kind | does | behavior note (not in the payload) |
+|---|---|---|
+| `GovVote` | `MsgVote` on a proposal | chain-idempotent (last-write-wins per voter) — re-apply is safe |
+| `GovSoftwareUpgrade` | submits a software-upgrade proposal | content hardcoded to `SoftwareUpgradeProposal`; **not** chain-idempotent |
+| `GovParamChange` *(planned, PLT-487)* | submits a `ParameterChangeProposal` | `value` is JSON of the param's type (see [Gotchas](#reconciliation-cadence--gotchas)); **not** idempotent |
+| `AwaitCondition` | waits on a local node condition (height) | `action: SIGTERM_SEID` SIGTERMs seid after the condition — the coordinated halt-at-height primitive |
+| `AwaitNodesAtHeight` | waits for the target to cross a height | **single-node** (`target.nodeRef`); maps to the sidecar `await-condition(height=H)` and **drops `action`** |
+| `UpdateNodeImage` | patches `spec.image`, waits for `currentImage` | **no readiness check** — completes on image observation; green ≠ healthy node |
+
+## Lifecycle
+
+Phases: `Pending → Running → Complete | Failed`. The task ID is **UUIDv5 of
+`(CR UID, spec.kind, 0)`** — so **re-applying the same CR re-joins** the in-flight
+task (never re-submits), and **delete+recreate mints a new ID** (a new run). `status.task.id`
+is stamped **atomically before any side effect**; terminal CRs are **no-op reconciled**.
+
+- `target.requirePhase` (default `Running`) / `requirePhaseTimeout` (default `5m`) gate
+  dispatch; **the timeout is terminal** — a target that never reaches the phase fails the
+  task (delete+recreate to retry), it does not wait forever.
+- `timeoutSeconds` (0 = unbounded) bounds the run from `status.startedAt`.
+- Status writes use the single optimistic-lock patch model (see `CLAUDE.md`).
+
+## Conditions
+
+`Ready`/`Failed` are a **latch pair** (the documented `kubectl wait` exception):
+`Ready=True` only at `Complete`, `Failed=True` only at `Failed`. Treat `reason` as the
+stable public API for runbooks/alerting:
+
+- `TargetReady`: `Resolving` / `PhaseMet` / `PhaseNotMet` / `ResolveTimeout`
+- `Failed`: `TargetResolveFailed` / `UnsupportedKind` / `TaskTerminalError` / `TaskFailed` / `Timeout` / `DeserializeFailed`
+
+> Caveat: the direct condition writes in `nodetask/controller.go` do **not** set
+> `observedGeneration` today (a known `CLAUDE.md` divergence) — do not gate on condition
+> freshness for SeiNodeTask until it is harmonized.
+
+## Signing topology
+
+The operator keyring is **sidecar-resident** (mounted on `sei-sidecar`, not signable from
+the main `seid` container). `keyName` resolves via **`ResolveOperatorKeyringUID`**
+(`api/v1alpha1/validator_types.go`), a three-branch chain:
+
+1. explicit `keyName` → that key;
+2. `.validator.operatorKeyring.secret` set but `keyName` empty → **`node_admin`**;
+3. `.secret` **unset** → **`validator`** (the gentx genesis-ceremony key).
+
+A gentx-bootstrapped validator that assumes `node_admin` signs with the wrong key.
+Out-of-band proposal *submission* needs a sidecar-resident or separately-funded key for
+the same reason — `seid` in the main container has no operator key.
+
+## Idempotency
+
+Two layers, distinct:
+
+- **Chain-level:** `GovVote` is last-write-wins (re-apply safe); `MsgSubmitProposal`
+  (`GovSoftwareUpgrade`/`GovParamChange`) is **not** idempotent — a duplicate submit is a
+  second proposal + second deposit.
+- **Controller-level (at-least-once):** the deterministic ID re-joins on reconciler
+  restart, and a `submittedAt` stamp guards `Execute` to run **once**. So the submit-proposal
+  rehydration risk is a **controller-restart-after-submit** window, not a re-apply window.
+
+## status.outputs reality
+
+`status.outputs` is **unpopulated for all sidecar-backed kinds** (`GovVote`,
+`GovSoftwareUpgrade`, `AwaitCondition`, `AwaitNodesAtHeight`) — **only**
+`UpdateNodeImage.appliedImage` is set. The typed output godoc (`GovVoteOutputs.txHash`,
+etc.) is forward-compatible shape that is **currently never written** — read the **chain**
+for a tx hash or proposal id, not `status`. (This overrides the godoc, which reads as if
+outputs are populated.)
+
+## GitOps patterns
+
+Tasks are ordinary manifests: wire them into a cluster's Flux path and `flux reconcile`.
+Notes: the runtime `proposalId` of a fresh proposal isn't known until submission, so a
+vote manifest's `proposalId` is filled post-submit; after votes complete, the one-shot CRs
+sit terminal (prune via a follow-up PR); fan out **per cluster** (a `nodeRef` only resolves
+in its own cluster).
+
+## Reconciliation cadence & gotchas
+
+Cadence (the latency budget the voting-window gotcha spends): the controller **Watches**
+(not Owns) SeiNode, so a target status change wakes the task; steady poll **15s**,
+target-wait **5s**, sidecar HTTP timeouts **30s/15s**.
+
+Gotchas (invariant first; the canonical home — also restated at point of use):
+
+1. **Fees must clear the chain-enforced min-gas-price**, which CheckTx enforces
+   independent of the validator's local `app.toml`. A too-low fee is a **silent CheckTx
+   code-13 retry-loop**, not a visible rejection — the tally never moves. *(Worked
+   instance: arctic-1 enforces `0.02usei/gas`, not the `0.01` in app.toml — size
+   `fees ≥ gas × 0.02usei`.)*
+2. **A param-change `value` is JSON of the param's registered type** — a scalar (`100`),
+   string (`"86400000000000"`), bool, or object — **not** a re-escaped string. A quoted
+   string double-encodes and **fails at apply** (deposit consumed). *(Pending PLT-487 for
+   the CRD path; applies to the out-of-band `seid tx` path today.)*
+3. **Governance `voting_period` vs the cadence above** — arctic-1's 5-minute window is
+   tight against merge→reconcile→broadcast; size fees right (gotcha 1) and apply promptly,
+   or raise `voting_period` first.
+
+## Example — vote fan-out
+
+Cast a yes-vote from a validator (`keyName` omitted → resolves `node_admin`). `fees` clears
+the chain floor (gotcha 1); `proposalId` is filled after the proposal is submitted.
+
+```yaml
+apiVersion: sei.io/v1alpha1
+kind: SeiNodeTask
+metadata:
+  name: govvote-prop42-validator-0
+  namespace: arctic-1
+spec:
+  kind: GovVote
+  timeoutSeconds: 600
+  target:
+    nodeRef:
+      name: validator-0-0  # the SeiNode, not the SeiNodeDeployment or the pod
+    requirePhase: Running
+    requirePhaseTimeout: 30m  # ride through a transient validator restart
+  govVote:
+    chainId: arctic-1
+    proposalId: 42  # filled post-submission
+    option: "yes"  # quote it — bare yes is YAML true
+    fees: "8000usei"  # >= gas * chain min-gas-price (0.02usei/gas on arctic-1)
+    gas: 300000
+```
+
+Generate one per validator from the live list per cluster, wire into that cluster's Flux
+path, and `flux reconcile`. A param-change submission example follows once PLT-487 lands.

From 9fc33b4df702725dae722a5e70354b1c9c573aba Mon Sep 17 00:00:00 2001
From: bdchatham
Date: Fri, 12 Jun 2026 16:46:02 -0700
Subject: [PATCH 2/3] docs: correct SeiNodeTask doc against current main (review fixes)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add RestartSeid + MarkReady kinds (now wired in the enum)
- timeoutSeconds measured from status.task.executionStartedAt; 0 = per-kind
  default (RestartSeid 10m / MarkReady 2m), not universally unbounded
- TargetReady reasons are PhaseMet/PhaseNotMet only; Failed adds
  ParamsBuildFailed
- ObservedGeneration IS set (setCondition harmonized) — drop the stale caveat
- Signing: resolveSigningUID takes explicit keyName, else
  ResolveOperatorKeyringUID
- Stabilize the status.outputs heading; trim a redundant trailing line
---
 docs/seinode-task.md | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/docs/seinode-task.md b/docs/seinode-task.md
index 3f53b85..e296fb8 100644
--- a/docs/seinode-task.md
+++ b/docs/seinode-task.md
@@ -23,6 +23,8 @@ One line per kind; the **behavior note** is the part not obvious from the payloa
 | `AwaitCondition` | waits on a local node condition (height) | `action: SIGTERM_SEID` SIGTERMs seid after the condition — the coordinated halt-at-height primitive |
 | `AwaitNodesAtHeight` | waits for the target to cross a height | **single-node** (`target.nodeRef`); maps to the sidecar `await-condition(height=H)` and **drops `action`** |
 | `UpdateNodeImage` | patches `spec.image`, waits for `currentImage` | **no readiness check** — completes on image observation; green ≠ healthy node |
+| `RestartSeid` | restarts seid in place | SIGTERMs seid so it re-reads `config.toml` **without bouncing the sidecar** (no mark-ready reapproval gap); empty payload; completion = local RPC serving again, **not** caught-up/voting — gate height with a following `AwaitNodesAtHeight`. Supersedes `RestartPod` |
+| `MarkReady` | re-marks sidecar readiness | fire-and-forget; re-asserts `/v0/healthz=200` to unblock the seid start-gate / proxy probe after a readiness-blind restart or rollout; empty payload; completion = the submit ack (a beat before `/v0/healthz` serves 200) — gate real serving with a following `AwaitCondition`/`AwaitNodesAtHeight` |
 
 ## Lifecycle
 
@@ -34,7 +36,7 @@ is stamped **atomically before any side effect**; terminal CRs are **no-op recon
 - `target.requirePhase` (default `Running`) / `requirePhaseTimeout` (default `5m`) gate
   dispatch; **the timeout is terminal** — a target that never reaches the phase fails the
   task (delete+recreate to retry), it does not wait forever.
-- `timeoutSeconds` (0 = unbounded) bounds the run from `status.startedAt`.
+- `timeoutSeconds` bounds the run from `status.task.executionStartedAt` (the requirePhase wait is **not** charged). `0` means the per-kind default: unbounded for most kinds, but **`RestartSeid` 10m / `MarkReady` 2m**.
 - Status writes use the single optimistic-lock patch model (see `CLAUDE.md`).
 
 ## Conditions
@@ -43,21 +45,22 @@ is stamped **atomically before any side effect**; terminal CRs are **no-op recon
 `Ready=True` only at `Complete`, `Failed=True` only at `Failed`. Treat `reason` as the
 stable public API for runbooks/alerting:
 
-- `TargetReady`: `Resolving` / `PhaseMet` / `PhaseNotMet` / `ResolveTimeout`
-- `Failed`: `TargetResolveFailed` / `UnsupportedKind` / `TaskTerminalError` / `TaskFailed` / `Timeout` / `DeserializeFailed`
+- `TargetReady`: `PhaseMet` / `PhaseNotMet`
+- `Failed`: `TargetResolveFailed` / `ParamsBuildFailed` / `UnsupportedKind` / `DeserializeFailed` / `TaskTerminalError` / `TaskFailed` / `Timeout`
 
-> Caveat: the direct condition writes in `nodetask/controller.go` do **not** set
-> `observedGeneration` today (a known `CLAUDE.md` divergence) — do not gate on condition
-> freshness for SeiNodeTask until it is harmonized.
+All condition writes route through `setCondition`, which stamps
+`observedGeneration = cr.Generation` — so a condition reliably reflects the spec generation
+it was evaluated against.
 
 ## Signing topology
 
 The operator keyring is **sidecar-resident** (mounted on `sei-sidecar`, not signable from
-the main `seid` container). `keyName` resolves via **`ResolveOperatorKeyringUID`**
-(`api/v1alpha1/validator_types.go`), a three-branch chain:
+the main `seid` container). `keyName` resolves in two layers: the params builder
+(`resolveSigningUID`) takes an explicit `keyName` when set; otherwise it derives via
+**`ResolveOperatorKeyringUID`** (`api/v1alpha1/validator_types.go`):
 
-1. explicit `keyName` → that key;
-2. `.validator.operatorKeyring.secret` set but `keyName` empty → **`node_admin`**;
+1. explicit `keyName` → that key (`resolveSigningUID`);
+2. `.validator.operatorKeyring.secret` set, `keyName` empty → **`node_admin`**;
 3. `.secret` **unset** → **`validator`** (the gentx genesis-ceremony key).
 
 A gentx-bootstrapped validator that assumes `node_admin` signs with the wrong key.
@@ -75,7 +78,7 @@ Two layers, distinct:
   restart, and a `submittedAt` stamp guards `Execute` to run **once**. So the submit-proposal
   rehydration risk is a **controller-restart-after-submit** window, not a re-apply window.
 
-## status.outputs reality
+## status.outputs
 
 `status.outputs` is **unpopulated for all sidecar-backed kinds** (`GovVote`,
 `GovSoftwareUpgrade`, `AwaitCondition`, `AwaitNodesAtHeight`) — **only**
@@ -141,4 +144,4 @@ spec:
 ```
 
 Generate one per validator from the live list per cluster, wire into that cluster's Flux
-path, and `flux reconcile`. A param-change submission example follows once PLT-487 lands.
+path, and `flux reconcile`.

From 1a3b1a3f5ddc81d12a50f6022def34d779a6e926 Mon Sep 17 00:00:00 2001
From: bdchatham
Date: Fri, 12 Jun 2026 16:47:28 -0700
Subject: [PATCH 3/3] docs: align TargetReady godoc with emitted reasons (PhaseMet/PhaseNotMet)

Comment-only: the controller no longer emits Resolving/ResolveTimeout; drop
them from the ConditionSeiNodeTaskTargetReady godoc so it matches
docs/seinode-task.md.
---
 api/v1alpha1/seinodetask_types.go | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/api/v1alpha1/seinodetask_types.go b/api/v1alpha1/seinodetask_types.go
index 7b85bec..202ce48 100644
--- a/api/v1alpha1/seinodetask_types.go
+++ b/api/v1alpha1/seinodetask_types.go
@@ -88,8 +88,8 @@ const (
 	ConditionSeiNodeTaskFailed = "Failed"
 
 	// ConditionSeiNodeTaskTargetReady reflects whether the target SeiNode
-	// satisfies spec.target.requirePhase. Reason indicates why (Resolving,
-	// PhaseMet, PhaseNotMet, ResolveTimeout).
+	// satisfies spec.target.requirePhase. Reason indicates why (PhaseMet,
+	// PhaseNotMet).
 	ConditionSeiNodeTaskTargetReady = "TargetReady"
 )
 