deploy to prod by JoaquinBN · Pull Request #875 · genlayer-foundation/points

JoaquinBN · 2026-07-02T15:28:48Z

No description provided.

The version-shame window on the Wall of Shame is no longer hardcoded to three days. It now reads a NODE_VERSION_SHAME_GRACE_DAYS setting (default three days, env-overridable), so the grace period can be tuned per environment without a code change. The version verdict logic also moves out of the wallet viewset into a shared version_status helper, with an explicit node_version override for callers that already know the running version, so the same rule can be reused by the Grafana-driven sync added later in this branch.

## Claude Implementation Notes
- backend/validators/version_status.py: New compute_version_status(wallet, target, now, node_version=...), extracted from the viewset; grace from settings.NODE_VERSION_SHAME_GRACE_DAYS via default_grace_days(); node_version param lets the Grafana sync pass the observed version (compares via NodeVersionMixin._compare_versions, no operator required)
- backend/validators/views.py: _version_context now delegates to the helper; removed hardcoded VERSION_SHAME_GRACE_DAYS constant and unused timedelta import
- backend/tally/settings.py: Add NODE_VERSION_SHAME_GRACE_DAYS (env-overridable, default 3, applied globally at evaluation time)
- backend/validators/tests/test_version_status.py: Unit tests: parity + grace configurable via the setting
- backend/CLAUDE.md, CHANGELOG.md: Document the setting and the shared helper
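The grace-window rule in this commit can be illustrated with a minimal, framework-free sketch. It is not the code in version_status.py: the function name `version_verdict`, the verdict strings `grace` and `behind`, the toy dotted-version parser, and reading the environment variable directly (instead of going through Django settings) are all assumptions made for illustration.

```python
import os
from datetime import datetime, timedelta, timezone


def default_grace_days() -> int:
    """Read the grace period at call time so an env override takes effect."""
    return int(os.environ.get("NODE_VERSION_SHAME_GRACE_DAYS", "3"))


def parse(version: str) -> tuple:
    """Toy parser for plain dotted versions such as '1.2.3' (assumption)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def version_verdict(node_version, target_version, target_set_at, now, grace_days=None):
    """Return 'on', 'grace', or 'behind' (hypothetical verdict names)."""
    if grace_days is None:
        grace_days = default_grace_days()
    # A node at or above the target is never shamed.
    if parse(node_version) >= parse(target_version):
        return "on"
    # A node below the target is tolerated until the grace window closes.
    if now < target_set_at + timedelta(days=grace_days):
        return "grace"
    return "behind"
```

Passing `grace_days` explicitly keeps the rule easy to test, while omitting it falls back to the environment-driven default, mirroring the idea of tuning the window per environment without a code change.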

Every Grafana status sync now captures a per-run observation for each active validator wallet (on-chain status, metrics, logs, and the node version read from the Prometheus `version` label) and latches it into a per-day rollup on the existing daily snapshot. Metrics and logs latch pessimistically — shamed at any point means the day is shamed — while version latches optimistically: once a node has upgraded during a day, an earlier stale reading cannot shame it. Per-day sample counters record whether the node was seen reporting at all, the building blocks for uptime-streak and days-in-shame reporting. Version labels are normalised at ingest ('v' prefix stripped, capped to the column length, and when a node briefly reports two series right after an upgrade the higher parseable one wins), so bad node-reported data can never corrupt or abort a whole network's history. The rollup is fully rebuildable from the raw observation log, whose rows are retained forever by explicit decision. History writing is best-effort and isolated from the live status update, so a failure there never corrupts the Wall of Shame status. No points or public API behaviour changes in this step. 
## Claude Implementation Notes
- backend/validators/models.py: New ValidatorWalletObservation (append-only raw log); extend ValidatorWalletStatusSnapshot with metrics_status/logs_status/version_status, node_version, metrics_samples/logs_samples
- backend/validators/grafana_service.py: PromQL adds the `version` label; parse_response returns a 4-tuple with version_by_address (normalised via _normalize_version, capped to _VERSION_MAX_LENGTH, higher parseable version wins on duplicate series via _safe_parse); sync_network computes per-wallet version_status via compute_version_status; _record_history writes observations + latched rollup (worst-of-day _latch for metrics/logs, best-of-day _latch_version for version)
- backend/validators/management/commands/rebuild_daily_snapshots.py: New command to re-materialise rollups from observations; --days N cutoff snapped to the local-day boundary so the oldest day is never rebuilt from partial observations
- backend/validators/migrations/0015_*: New model + snapshot columns
- backend/validators/tests/test_grafana_service.py: version-label parse (normalised), duplicate-series keeps-higher, overlong-label truncation, observation/rollup writes, both latch directions, no-observations-on-failure, rebuild day-boundary regression
- backend/CLAUDE.md, CHANGELOG.md: Document the observation log, rollup columns, latch directions, rebuild command, and retain-forever decision
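The two latch directions and the ingest normalisation described above can be sketched as follows. This is an illustrative sketch, not the contents of grafana_service.py: the status values `ok` and `shamed`, the function names, and the 32-character cap are assumptions.

```python
# Hypothetical status values; higher severity means worse.
SEVERITY = {"ok": 0, "shamed": 1}


def latch_worst(current, observed):
    """Pessimistic latch: once shamed during a day, the day stays shamed."""
    if current is None:
        return observed
    return max(current, observed, key=SEVERITY.__getitem__)


def latch_best(current, observed):
    """Optimistic latch: once a good reading is seen, the day keeps it."""
    if current is None:
        return observed
    return min(current, observed, key=SEVERITY.__getitem__)


def normalize_version(label, max_length=32):
    """Strip a leading 'v' and cap to the column length (32 is assumed)."""
    label = label.strip()
    if label[:1] in ("v", "V"):
        label = label[1:]
    return label[:max_length]
```

Folding a day's observations through `latch_worst` yields the worst-of-day value used for metrics and logs, while `latch_best` yields the best-of-day value used for the version dimension.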

The Wall of Shame now surfaces how long each validator has gone without being shamed. Every wallet reports a consecutive clean-day streak and the reasons the streak was last broken, and each operator gets a per-network streak using any-node-clean roll-up: a network-day counts as clean if at least one of the operator's nodes was healthy that day. Streaks are computed on read from the daily observability rollup, so they cost one extra snapshot query per request (the endpoint stays cached 60s) and start accumulating from deploy.

## Claude Implementation Notes
- backend/validators/streaks.py: New module. clean_streak(wallet_ids, now, index) walks the daily rollup backward counting consecutive clean days (any-node-clean over the given wallet ids); clean day = active + >=1 metrics & logs sample + no shame dim. A partial today never breaks the streak; broken_by only attributes a reason for observed days (edge-of-history returns []). load_snapshot_index prefetches the window in one query.
- backend/validators/views.py: wall_of_shame builds the snapshot index once and per-wallet streaks, passes them to the serializer via context and into _build_validator_groups; groups gain network_streaks (per-network any-node-clean) and each node entry gains clean_streak_days / clean_streak_broken_by
- backend/validators/serializers.py: WallOfShameSerializer adds clean_streak_days + clean_streak_broken_by (from context, no N+1)
- backend/validators/tests/test_streaks.py: streak counting, shame/gap/version breaks, unsynced-today, any-node-clean operator roll-up
- backend/validators/tests/test_grafana_service.py: endpoint exposes streak fields + network_streaks
- backend/CLAUDE.md, CHANGELOG.md: Document the streak fields
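A single-wallet version of the backward walk might look like the sketch below. It is a simplification under stated assumptions, not the streaks.py implementation: the `DaySnapshot` shape, the `gap` reason, and the choice to skip any not-yet-clean today are hypothetical, and the any-node-clean roll-up across several wallets is omitted.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DaySnapshot:
    """Hypothetical per-day rollup row for one wallet."""
    active: bool
    metrics_samples: int
    logs_samples: int
    shame_dims: tuple = ()  # e.g. ("metrics",) when that dimension was shamed


def is_clean(snap):
    """Clean day = active, at least one metrics and logs sample, no shame."""
    return (
        snap.active
        and snap.metrics_samples >= 1
        and snap.logs_samples >= 1
        and not snap.shame_dims
    )


def clean_streak(index, today, max_days=180):
    """Return (streak_days, broken_by) walking backward from today."""
    streak = 0
    for offset in range(max_days):
        snap = index.get(today - timedelta(days=offset))
        if snap is None:
            if offset == 0:
                continue  # today not synced yet: does not break the streak
            return streak, []  # edge of history: no reason attributed
        if is_clean(snap):
            streak += 1
            continue
        if offset == 0:
            continue  # assumption: a partial today never breaks the streak
        return streak, list(snap.shame_dims) or ["gap"]
    return streak, []
```

Because `index` is a plain mapping from date to snapshot, it can be built once per request and shared across wallets, which is the point of prefetching the window in one query.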

Grafana becomes the single source of truth for validator node versions. The status sync reads each node's reported version and: promotes the fleet's highest stable release to the active upgrade target the first time it is seen (ignoring pre-release and build-tagged versions), keeps each operator's recorded version in step with what their nodes actually run, and awards the node-upgrade contribution directly — with the existing sooner-is-better bonus — the moment a visible operator reaches the target, with no manual submission or steward review. Detection covers every reporting node regardless of on-chain status, so a quarantined validator that upgrades still records it and earns the award. Versions that packaging cannot parse are excluded from comparisons instead of aborting the run, and one operator's failure never blocks version updates or awards for the rest. Because versions are now observed rather than self-reported, the portal stops accepting manual edits: the profile shows the detected version read-only, the two backend write paths are closed, and the old save()-driven pending-submission flow is removed. Dedup on the shared notes key guarantees nothing is ever awarded twice. 
## Claude Implementation Notes
- backend/validators/grafana_service.py: new _sync_node_versions({address_lower: version}): matches wallets of ANY on-chain status, filters to semver-valid AND PEP 440-parseable versions, auto-creates TargetNodeVersion from the highest stable observed (never blindly supersedes an unparseable active target), writes node_version_<network> via Validator.objects.update() (max across the operator's nodes), per-operator try/except fault isolation; _award_node_upgrade creates a direct approved Contribution (early-bonus 4/3/2/1, dedup on `version {v} [{network}]`, multiplier fallback via _allow_missing_multiplier)
- backend/validators/node_version.py: remove NodeVersionMixin.save() and _create_upgrade_submission (dead once the portal can't write versions); keep fields, validation, comparison helpers, calculate_early_upgrade_bonus
- backend/users/serializers.py: UserProfileUpdateSerializer drops the writable node_version fields and the custom update()
- backend/validators/views.py: ValidatorViewSet.my_profile is GET-only (PATCH → 405)
- frontend/src/routes/ProfileEdit.svelte: node version inputs replaced with read-only display ("Not detected yet" fallback, auto-detected hint); removed the related state/change-tracking/save logic
- backend/validators/tests/test_node_version_sync.py: auto-target stable-only guard, supersede/no-op cases, max-across-nodes, single award + dedup, invisible-operator no-award, quarantined-wallet award, PEP 440-invalid isolation, one-failing-operator isolation
- backend/validators/tests/test_node_version_tracking.py, test_api.py: drop save()-driven submission tests; /validators/me PATCH asserts read-only
- backend/CLAUDE.md, frontend/CLAUDE.md, CHANGELOG.md: document Grafana as source of truth and the read-only portal surface
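The "highest stable release, ignoring pre-release and build-tagged versions" selection can be sketched with the standard library alone. The notes above say the real code filters on semver validity and PEP 440 parseability; the strict MAJOR.MINOR.PATCH regex below is a simplified stand-in for both checks, and the function names are hypothetical.

```python
import re

# Stand-in for "stable": exactly MAJOR.MINOR.PATCH, no pre-release or build tag.
STABLE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def stable_key(version):
    """Return a sortable tuple for a stable version, or None otherwise."""
    match = STABLE.fullmatch(version)
    return tuple(int(group) for group in match.groups()) if match else None


def highest_stable(observed):
    """Pick the highest stable version from the observed labels, if any."""
    candidates = [(stable_key(v), v) for v in observed]
    candidates = [(key, v) for key, v in candidates if key is not None]
    if not candidates:
        return None  # nothing trustworthy to promote
    return max(candidates)[1]
```

Unparseable labels simply drop out of the candidate list, so one bad node-reported string cannot abort the selection for the rest of the fleet.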

Grafana's version label is self-reported by the node being judged and rewarded, so the automatic flows no longer trust a single reporter. An upgrade target is only auto-created when the new stable release is seen on portal-known wallets of at least two distinct operators, and a broadcast notification announces it so validators learn about the grace period before they can be shamed. Versions from unknown Prometheus series or banned wallets count for nothing, recorded operator versions only move forward (a skipped scrape can't flash a downgrade onto the Wall of Shame), and removing the node-upgrade multiplier now pauses the auto-award entirely instead of awarding at 1.0. The shame verdict never falls back to lexicographic version comparison: unparseable versions read as version-unknown, and a parseable version series always beats an unparseable duplicate regardless of frame order. A sync run where a whole datasource comes back empty still updates the self-healing live statuses but skips the permanent daily history latch, so an infrastructure blackout can't shame every validator's recorded day. Version detection also runs before the active-wallet early return, so networks with zero active wallets still record versions and awards. Uptime streaks now skip days with no monitoring data instead of breaking, days spent quarantined or inactive break with an explicit status reason, version-only rollups count as observed days, the maximum streak honors the 180-day window, and both snapshot writers share one day-bucketing function so the (wallet, date) key can never split. 
## Claude Implementation Notes
- backend/validators/grafana_service.py: MIN_OPERATORS_FOR_AUTO_TARGET consensus guard + known/non-banned wallet restriction in _sync_node_versions; _broadcast_auto_target helper; monotonic node_version writes; award skipped on missing multiplier (_allow_missing_multiplier escape hatch removed); parseability gate for version_status in sync_network; datasource-blackout guard around _record_history; _sync_node_versions moved before the no-active-wallets early return; prefer-parseable rule in parse_response; defensive handlers log at exception level
- backend/validators/streaks.py: clean_streak skips unobserved days, breaks on non-active snapshot rows, range capped at max_days; _has_observation counts version_status; _shame_dims attributes 'status' from the on-chain column even without Grafana data
- backend/validators/genlayer_validators_service.py: snapshot date uses timezone.localdate() to match the Grafana rollup bucketing
- backend/validators/management/commands/rebuild_daily_snapshots.py: --days 0 no longer means "all"; summary counts during iteration instead of a second full scan
- backend/validators/tests/: coverage for the consensus guard, unknown address rejection, banned exclusion, monotonic writes, multiplier kill switch, auto-target notification, blackout guard, unparseable version verdicts, zero-active-wallet version sync, streak skip/status semantics (80 tests, all passing)
- backend/CLAUDE.md, frontend/CLAUDE.md: docs updated to the new behavior; validators mutation contract corrected to staff-only; ProfileEdit.svelte filename fixed
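The consensus guard amounts to counting distinct operators among known, non-banned wallets that report the candidate version. The sketch below shows that counting rule only; the `Wallet` shape and function names are hypothetical, and the threshold is passed as a parameter rather than read from a constant or setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wallet:
    """Hypothetical portal-known wallet record."""
    address: str
    operator_id: int
    banned: bool = False


def operators_reporting(version, reported, known_wallets):
    """Distinct operators whose known, non-banned wallets report `version`.

    reported: {address_lower: version}; known_wallets: {address_lower: Wallet}
    """
    operators = set()
    for address, seen in reported.items():
        wallet = known_wallets.get(address.lower())
        if wallet is None or wallet.banned:
            continue  # unknown series and banned wallets count for nothing
        if seen == version:
            operators.add(wallet.operator_id)
    return operators


def meets_quorum(version, reported, known_wallets, min_operators=2):
    """True when enough distinct operators corroborate the version."""
    return len(operators_reporting(version, reported, known_wallets)) >= min_operators
```

Counting operators rather than wallets is what stops one operator with several nodes from satisfying the guard alone.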

Closes the still-open CodeRabbit findings: the version verdict shared by the Wall of Shame and the Grafana sync now compares versions only via PEP 440 parsing. An unparseable legacy or vendor-format version reads as 'on' when it exactly equals the target string and 'unknown' otherwise, never a lexicographic comparison that misorders versions. The per-operator network-streak rollup pre-groups wallet ids once instead of rescanning every operator-network pair for every group.

## Claude Implementation Notes
- backend/validators/version_status.py: safe_parse_version added (shared helper); compute_version_status verdicts via parsed comparison with exact-string-equality escape for vendor formats; 'unknown' when incomparable; NodeVersionMixin._compare_versions no longer used here
- backend/validators/grafana_service.py: imports safe_parse_version instead of a local duplicate; sync-loop parseability gate removed in favor of the hardened shared verdict; parse_response docstring documents the parseable-beats-unparseable rule
- backend/validators/views.py: wallet_ids_by_operator pre-grouping removes the O(groups x pairs) scan in _build_validator_groups
- backend/validators/tests/test_version_status.py: unparseable-version and vendor-format-equality verdict coverage
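The hardened verdict can be sketched as "compare parsed versions, fall back to exact string equality, otherwise unknown". In the sketch below the dotted-numeric parser is only a stand-in for PEP 440 parsing, and the `behind` verdict name and the function `verdict` are assumptions; `on` and `unknown` come from the description above.

```python
import re

_NUMERIC = re.compile(r"\d+(\.\d+)*")


def safe_parse_version(text):
    """Stand-in for PEP 440 parsing: only dotted numeric versions parse."""
    if not text or not _NUMERIC.fullmatch(text):
        return None
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Right-pad with zeros so that 1.2 and 1.2.0 compare as equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def verdict(node_version, target_version):
    """Return 'on', 'behind' (hypothetical name), or 'unknown'."""
    node = safe_parse_version(node_version)
    target = safe_parse_version(target_version)
    if node is not None and target is not None:
        node, target = _pad(node, target)
        return "on" if node >= target else "behind"
    # Vendor-format escape: identical strings are treated as on target.
    if node_version and node_version == target_version:
        return "on"
    return "unknown"  # incomparable: never fall back to string ordering
```

The first case in the usage below is the one a lexicographic comparison gets wrong, since the string "1.10.0" sorts before "1.9.0":

```python
print(verdict("1.10.0", "1.9.0"))          # on
print(verdict("vendor-build-7", "1.9.0"))  # unknown
```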

The node-upgrade award dedup now runs inside a transaction holding a lock on the user row, so even the residual stale-lock-takeover window can never double-award the same version. The minimum number of distinct operators required before a new stable release auto-creates the fleet-wide upgrade target is now a setting (default 2), tunable without a code deploy like the shame grace period. The new snapshot and observation model fields document their intent through help_text, matching the model's existing convention.

## Claude Implementation Notes
- backend/validators/grafana_service.py: _award_node_upgrade wraps dedup check + create in transaction.atomic with select_for_update on the user row (no-op on SQLite, real lock on Postgres); module constant MIN_OPERATORS_FOR_AUTO_TARGET replaced by min_operators_for_auto_target() reading NODE_VERSION_MIN_OPERATORS_FOR_AUTO_TARGET at call time
- backend/tally/settings.py: NODE_VERSION_MIN_OPERATORS_FOR_AUTO_TARGET env-driven setting (default 2)
- backend/validators/models.py + migrations/0015: help_text on all new ValidatorWalletStatusSnapshot / ValidatorWalletObservation fields; migration regenerated in place (same name, help_text-only diff, makemigrations --check clean)
- backend/validators/tests/test_node_version_sync.py: threshold configurability test via override_settings
- backend/CLAUDE.md: env var documented; constant reference updated
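The reason for holding a lock across the dedup check and the create can be shown with an in-memory analogy. This is not the Django code: per the notes the real path uses transaction.atomic with select_for_update on the user row, whereas the sketch below substitutes a per-user `threading.Lock`, and the `AwardLedger` class is hypothetical. Only the dedup key format is taken from the commit notes.

```python
import threading
from collections import defaultdict


class AwardLedger:
    """In-memory stand-in for check-then-create under a per-user lock."""

    def __init__(self):
        self._awards = defaultdict(set)             # user_id -> notes keys
        self._locks = defaultdict(threading.Lock)   # stand-in for a row lock
        self._locks_guard = threading.Lock()

    def _lock_for(self, user_id):
        # Serialise lock creation so two threads share the same lock object.
        with self._locks_guard:
            return self._locks[user_id]

    def award_once(self, user_id, version, network):
        """Return True only for the first award of this version and network."""
        key = f"version {version} [{network}]"
        with self._lock_for(user_id):
            # Check and create happen under the same lock, so no other
            # caller can slip in between them.
            if key in self._awards[user_id]:
                return False
            self._awards[user_id].add(key)
            return True
```

Without the lock, two concurrent callers could both pass the existence check before either records the award, which is the double-award window the commit closes.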

Banned users are blocked from submitting contributions everywhere else, so the Grafana version sync must not let them in through the back door: wallets whose linked user is banned no longer count toward the auto-target quorum, no longer get version write-backs, and can never receive the direct node-upgrade award (the award gate also re-checks the ban as a second layer).

## Claude Implementation Notes
- backend/validators/grafana_service.py: wallet query in _sync_node_versions filters operator__user__is_banned=False alongside the on-chain banned-wallet exclusion; award gate re-checks is_banned; docstrings updated
- backend/validators/tests/test_node_version_sync.py: banned-user test covering quorum, version write-back, and award paths

Per product decision, a single validator seen running a new stable release is enough to auto-create the fleet-wide upgrade target; the operator quorum setting now defaults to 1. The setting remains so the bar can be raised without a deploy if version spoofing ever becomes a concern. Known-wallet, banned-wallet, and banned-user restrictions are unchanged.

## Claude Implementation Notes
- backend/tally/settings.py: NODE_VERSION_MIN_OPERATORS_FOR_AUTO_TARGET default 2 -> 1, comment explains the decision and when to raise it
- backend/validators/grafana_service.py: fallback default 1; docstrings and comments updated to match
- backend/validators/tests/test_node_version_sync.py: single-operator target creation is the default-path test again; quorum test now overrides the setting to 2 and covers both rejection and corroboration
- backend/CLAUDE.md: default documented as 1

…istory-streaks Grafana-sourced shame history: uptime streaks + auto node-version

coderabbitai · 2026-07-02T15:28:58Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e24c36c2-762b-44f5-a615-e61b2484ac12


rasca and others added 14 commits July 1, 2026 22:15

Update changelog

7fe932e

Update changelog

8592634

Update changelog

473de76

Update changelog

0da73bd

Merge pull request #867 from genlayer-foundation/feat/grafana-shame-h…

95e042b

…istory-streaks Grafana-sourced shame history: uptime streaks + auto node-version

JoaquinBN merged commit d7461b3 into main Jul 2, 2026
5 checks passed

JoaquinBN temporarily deployed to cron-job July 2, 2026 15:29 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 15:32 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 16:24 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 16:25 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 16:29 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 17:12 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 17:14 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 18:08 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 18:12 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 18:16 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 18:48 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 18:50 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 19:47 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 19:53 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 19:54 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 20:11 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 20:12 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 20:52 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 20:56 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 21:17 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 21:32 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 21:34 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 21:55 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 22:08 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 22:29 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 22:41 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 22:43 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 22:55 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 23:20 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 23:40 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 23:44 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 2, 2026 23:46 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 2, 2026 23:54 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 01:06 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 01:18 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 01:19 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 3, 2026 01:21 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 01:27 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 01:29 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 04:54 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 05:00 — with GitHub Actions Inactive

JoaquinBN temporarily deployed to cron-job July 3, 2026 05:02 — with GitHub Actions Inactive

albert-mr temporarily deployed to cron-job July 3, 2026 05:06 — with GitHub Actions Inactive

JoaquinBN deployed to cron-job July 3, 2026 05:25 — with GitHub Actions Active

JoaquinBN merged 14 commits into main from dev

3 participants
