feat: add attestation aggregate coverage metrics#386
Conversation
🤖 Kimi Code ReviewThis PR adds attestation aggregate coverage metrics following the existing patterns in the codebase. The implementation is correct and idiomatic. Assessment:
Minor observation:
No security vulnerabilities or consensus risks identified—this is pure instrumentation with no state transition logic. Verdict: LGTM. Clean, well-documented addition that follows established patterns. Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt |
🤖 Codex Code ReviewNo findings. This PR is limited to new Prometheus label constants and gauge registration in metrics.rs and startup seeding in metrics.rs. I don’t see any impact on fork choice, attestation validation, justification/finalization, state transition, XMSS handling, or SSZ behavior. The added code is straightforward, uses the existing One residual note: this patch only registers/seeds the new metrics; there are no writer APIs or call sites in this PR, so the metrics will stay at their seeded values until follow-up instrumentation lands. I also couldn’t complete Automated review by OpenAI Codex · gpt-5.4 · custom prompt |
🤖 Claude Code ReviewHere is the full code review: PR #386 —
|
Greptile SummaryRegisters three new
Confidence Score: 5/5This PR is safe to merge — it is purely additive metric registration with no changes to any logic, state, or existing code paths. The change adds three new IntGaugeVec statics and seeds 18 default zero series in init(). No existing code is modified, no consumers of the new metrics exist yet, and the implementation closely mirrors the patterns already established in the file. No files require special attention. The single changed file is straightforward metric registration.
|
| Filename | Overview |
|---|---|
| crates/blockchain/src/metrics.rs | Adds three IntGaugeVec statics and two pub label-set constants; seeds 18 default zero series in init(). Follows all existing patterns; no logic, no consumers yet. |
Reviews (1): Last reviewed commit: "feat: add attestation aggregate coverage..." | Re-trigger Greptile
MegaRedHand
left a comment
There was a problem hiding this comment.
We still need to emit the metrics
|
Instrumentation added in 855f56d. Five emission sites now write to the 18 series registered in e9a04f7:
Store gets a small
Ready for another look. |
|
@MegaRedHand please re-review the PR |
Port leanSpec PR #735: register three IntGaugeVec metrics describing
attestation aggregate coverage, with default zero-valued series so
dashboards render from a fresh node startup.
- lean_attestation_aggregate_coverage_validators (section, subnet)
- lean_attestation_aggregate_coverage_subnets (section)
- lean_attestation_aggregate_coverage_diff_validators (direction)
ATTESTATION_AGGREGATE_COVERAGE_SECTIONS and
ATTESTATION_AGGREGATE_COVERAGE_DIFF_DIRECTIONS are exported as the
single source of truth for label sets. init() forces the new statics
and seeds combined-subnet, section, and direction series to 0 (18
default series total). Per-subnet (subnet="subnet_N") series appear
lazily when instrumentation writes them.
Registration only. The producer side (per-slot coverage computation,
de-duplication across payloads, and the chain-status log line) ports
zeam #876 and lands in a follow-up PR.
The diff_validators help text diverges from upstreams terse phrasing
to spell out the symmetric-difference semantics (block_only: in block
but not in local timely pool; timely_only: the reverse). Metric name,
labels, and values are unchanged.
b879490 to
6622da5
Compare
Port the producer side of zeam #876 on top of the metrics registered in
the previous commit. After this commit, all 18 coverage series receive
real per-slot updates from chain activity.
Five emission sites:
- accept_new_attestations (store.rs): captures `new_payloads`
participant bits BEFORE promote and stashes them as a
CoverageSnapshot on the Store. Read at the next slot boundary to
populate the `timely` section ("prev_new" in zeam).
- on_block_core (store.rs): mirrors the imported block s per-AttData
aggregation bits into Store::last_block_coverage. Observability-only;
fork choice is unchanged.
- on_tick interval 0 (lib.rs): emits the post-block-merge report for
`slot - 1`. Computes `timely`/`late`/`block`/`combined` from the
stashed snapshots and the current `new_payloads`, then emits the
diff_validators direction counts as the symmetric difference between
`block` and `timely`.
- start_aggregation_session (lib.rs): emits `agg_start_new` from the
current `new_payloads` right before fork-choice aggregation runs at
interval 2.
- propose_block (lib.rs): emits `proposal_payloads`,
`proposal_gossip`, and `proposal_combined` after the block is built.
Each validator set in the block is classified by whether the
AttestationData has a matching known-payload proof.
New module crates/blockchain/src/coverage.rs holds the Coverage type
(seen + has_subnet bitsets, derived subnet via vid % committee_count to
match the gossip subnet assignment) plus the 3 emission helpers and 6
unit tests covering add_bits, merge_from, diff_counts, empty/zero/out-
of-range edge cases.
Storage gets a CoverageSnapshot type and two Arc<Mutex<Option<…>>>
fields on Store. No proofs are duplicated — only AggregationBits are
captured, keeping the per-slot allocation in the tens of bytes per
entry. The pre-merge capture happens inside accept_new_attestations
just before promote_new_aggregated_payloads, so consumer-side timing
concerns stay in the existing tick path.
BlockChain::spawn now takes attestation_committee_count as a
parameter; bin/ethlambda/src/main.rs already resolves the value
(CLI > validator-config.yaml > 1) and passes it through. The number
of attestation committees was previously only known to P2P (for
subnet subscriptions); the coverage emitters need it to derive
subnet ids.
Delete crates/blockchain/src/coverage.rs and move the three coverage emitters (post_block, agg_start_new, proposal) into lib.rs as private free functions. The Coverage struct is replaced with Vec<bool> locals (seen for validators, has_subnet for subnets), plus three small helpers (cov_add, cov_record, or_into). Each emit_* function was called from exactly one site, so the previous abstraction added no DRY benefit. The 6 unit tests in coverage.rs were exercising the wrappers bitset ops, not the emission contract, so they go away with the wrapper. Behavior, metric names, labels, and values are unchanged. Net effect on the PR: roughly -200 lines.
…sip coverage emit_proposal_coverage classified block-included validators as payload- vs gossip-sourced by checking known_aggregated_payloads. But ethlambda builds blocks exclusively from that same pool (build_block -> extend_proofs_greedily), so every included validator is by construction payload-seen: proposal_gossip was always 0 and proposal_payloads always equalled proposal_combined. Keep only proposal_combined (the set of validators in the proposed block). This also removes the per-attestation hash_tree_root lookup on the block- building path and the cross-AttestationData payload_seen contamination the old classification carried. Default-seeded series drop from 18 to 14.
| pre_merge_new_coverage: Arc<Mutex<Option<CoverageSnapshot>>>, | ||
| /// Snapshot of the most-recently-imported block's aggregated attestation | ||
| /// participant bits. Reset on each imported block. Observability-only. | ||
| last_block_coverage: Arc<Mutex<Option<CoverageSnapshot>>>, |
There was a problem hiding this comment.
I think we should move these out of the storage crate
…gate-coverage-metrics # Conflicts: # crates/blockchain/src/store.rs
Key every coverage section by the attestation data.slot (voting round)
and fire the per-slot report at interval 1, matching zeam #876:
- timely: CoverageSnapshot now stores (data.slot, bits) per entry and the
report filters by data.slot, removing the non-deterministic slot picked
from HashMap iteration order.
- block: read from the canonical head block at report time (filtered to
data.slot) instead of an unconditionally-overwritten per-block snapshot,
so fork blocks cant poison it. Removes last_block_coverage from Store.
- trigger: emit at interval 1 (not 0) so the block carrying the reporting
slots votes has been received and processed.
- has_any gate: emit nothing when no section has coverage for the round,
instead of a misleading block_only=0 / timely_only=N reading.
Description / Motivation
Ports two upstream changes that together add attestation aggregate coverage observability:
After this PR, all 14 coverage series receive real per-slot updates from chain activity.
What Changed
Registration (
crates/blockchain/src/metrics.rs)pub const &[&str]label-set constants — single source of truth for sections and directions.IntGaugeVecstatics:lean_attestation_aggregate_coverage_validators{section, subnet}—subnet="combined"is the section total;subnet="subnet_N"is per-subnet coverage (lazy).lean_attestation_aggregate_coverage_subnets{section}— covered-subnet count per section.lean_attestation_aggregate_coverage_diff_validators{direction}— symmetric difference between block-included and locally-aggregated pre-merge aggregates.init()forces the statics and seeds 14 default zero-valued series.Sections:
timely,late,block,combined,agg_start_new,proposal_combined.Directions:
block_only,timely_only.Emission (4 sites)
Every per-slot section is keyed by the attestation
data.slot(the voting round being reported), sotimely/late/block/combineddescribe the same cohort (matching zeam #876).accept_new_attestations(blockchain/store.rs) — before each promote, snapshotsnew_payloadsparticipant bits, tagged per entry with theirdata.slot, stashed on the Store. Read at the next slot boundary to populate thetimelysection ("prev_new" in zeam).on_tickinterval 1 (blockchain/lib.rs) — emits the post-block report forslot - 1. Fired at interval 1, not 0, so the block carrying that round's votes (proposed at interval 0 of this slot) has typically been received and processed, letting theblocksection observe the same round. Computestimely(pre-merge snapshot filtered to the round),late(currentnew_payloadsfor the round),block(attestations in the canonical head block filtered to the round — canonical by construction, so a fork block can't poison it), andcombined(their union); then emitsdiff_validatorsas the symmetric difference betweenblockandtimely. Emits nothing when no section has coverage for the round, so a missed slot doesn't surface as a misleadingblock_only=0, timely_only=N.start_aggregation_session(blockchain/lib.rs) — emitsagg_start_newfrom currentnew_payloadsright before fork-choice aggregation at interval 2.propose_block(blockchain/lib.rs) — emitsproposal_combined(the full set of validators included across the block's aggregated attestations) after the block is built. (A payload-vs-gossip split was considered but dropped: ethlambda builds blocks exclusively fromknown_aggregated_payloads, so every included validator is payload-sourced by construction —proposal_gossipwould always be 0 andproposal_payloadswould always equalproposal_combined.)Supporting changes
fns at the bottom ofblockchain/src/lib.rs. They useVec<bool>locals (seenfor validators,has_subnetfor subnets) plus three small helpers (cov_add,cov_record,or_into). Subnet derivation isvid % committee_count, matching the gossip subnet assignment incrates/net/p2p/src/lib.rs.Storegets a singleCoverageSnapshotfield (pre_merge_new_coverage) holding(data.slot, AggregationBits)entries captured before each promote — only participant bits, no proof duplication. Theblocksection needs no stored state: it is read from the canonical head block at report time.BlockChain::spawnnow takesattestation_committee_count(resolved inmain.rsfrom CLI > validator-config.yaml > 1; previously only known to P2P for subnet subscriptions).Operator interpretation of
diff_validatorsThe aggregation pipeline produces two pools for the same slot:
timely(locally aggregated pre-merge) andblock(what the proposer included). The diff counts validators in the symmetric difference:block_onlypersistently high → this node was slow to receive/aggregate via gossip; proposer had a better view.timely_onlypersistently high → proposer omitted attestations the network had time to gossip.Correctness / Behavior Guarantees
diff_validatorshelp text intentionally diverges from upstream's terse phrasing to spell out the symmetric-difference semantics. Metric name, labels, and values are unchanged; dashboards built against any client's schema work as-is.6 × 65 = 390series.Tests Added / Run
No new unit tests. The earlier draft of this PR included unit tests for an internal
Coveragebitset wrapper; once the wrapper was inlined asVec<bool>locals, those tests were exercising trivial iterator ops rather than the emission contract, so they were dropped.Commands run:
cargo fmt --all -- --check— cleanmake lint(clippy-D warnings) — cleancargo test -p ethlambda-blockchain --release --lib --bins— 21 passedcargo test -p ethlambda-storage --release— 38 passedcargo test -p ethlambda-blockchain --release --test forkchoice_spectests— 62 passed, 8 failed (allAttestationTooFarInFuture, pre-existing fixture flakes onmain, unrelated to this PR).Related Issues / PRs
✅ Verification Checklist
make fmt— cleanmake lint(clippy with-D warnings) — cleancargo test --workspace --release— passing modulo 8 pre-existing forkchoice fixture flakes documented above (unrelated to this PR)