diff --git a/.github/actions/claude-pr-review/action.yml b/.github/actions/claude-pr-review/action.yml index c9f3164..4090660 100644 --- a/.github/actions/claude-pr-review/action.yml +++ b/.github/actions/claude-pr-review/action.yml @@ -85,6 +85,20 @@ runs: fi echo "$delim" >> "$GITHUB_OUTPUT" + # Inject the shared review rubric, maintained as a sibling markdown file so it + # can be edited and reviewed as prose rather than buried in a YAML block scalar. + rubric_delim="RUBRIC_$RANDOM$RANDOM" + { + echo "rubric<<$rubric_delim" + cat "$GITHUB_ACTION_PATH/rubric.md" + echo "$rubric_delim" + } >> "$GITHUB_OUTPUT" + + # Stage the on-demand detailed checklist into the workspace so the review agent + # can Read it lazily, only for the areas a given diff actually touches. + mkdir -p "$GITHUB_WORKSPACE/.claude-review" + cp "$GITHUB_ACTION_PATH/rubric-detail.md" "$GITHUB_WORKSPACE/.claude-review/rubric-detail.md" + - uses: anthropics/claude-code-action@c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1 with: claude_code_oauth_token: ${{ inputs.claude_code_oauth_token }} @@ -107,35 +121,11 @@ runs: general guidance below wherever they conflict. Be concise and actionable. Only comment on issues that matter. - Apply the following common review rubric as a baseline. A consumer repo's REVIEW.md and other agent files may override or extend any part of it; defer to those wherever they conflict. - - Mindset: - - The goal is to improve overall code health over time. Approve changes that make the codebase better even if not perfect; do not let "perfect" block "better". - - Review is also knowledge transfer. Critique the code, not the person, and explain why — cite a reason, not just "change this to that". - - Prioritize attention in this order (defer everything below the line to tooling): - 1. Data structures — are the core types right? Bad code around good data structures is fixable; good code around bad data structures is not. - 2. Design — is the approach right and does it fit the architecture? Prefer eliminating edge cases through better design over adding conditionals. - 3. Correctness — edge cases, off-by-ones, race conditions, and especially any change to consensus-, protocol-, or wire-critical code or a backward-compatibility-sensitive contract (extra scrutiny here). - 4. Complexity — could it be simpler? Can the number of concepts be cut? - 5. Breakage — which downstream consumers or public APIs/contracts could this break? A breaking change to a published contract must be called out explicitly. - 6. Tests — present for new or changed logic; deterministic (no sleep/wall-clock/flaky); assert exact values and specific error variants (not just is_ok/is_some/truthiness); cover side effects and the absence of unintended changes. - 7. Naming and comments — names are clear; comments explain why, not what. - 8. Style — defer to the repo's formatter and linter; never spend review effort on formatting. - - Also check: - - PR title follows Conventional Commits and the description explains what changed and why; flag a title or description that is stale or misleading after new commits. - - unsafe code carries a SAFETY comment; errors are checked or have a comment justifying the ignore; no unhandled concurrency hazards (shared mutable state, TOCTOU, deadlocks, dropped errors or promise rejections). - - Public API or interface changes have a documented rationale, and framework conventions are respected rather than replaced. - - Logging uses structured fields at appropriate levels (not string interpolation or println-equivalents); recurring counts and durations are metrics, not logs. - - Documentation proximity matches the scope of the fact being documented: implementation mechanics live inline with the code they describe, while separate design docs focus on architecture and rationale and link to code instead of restating details that will drift. - - Flag the same fact duplicated across inline docs and separate docs; pick one authoritative home and link from the other. - - Do NOT flag: - - Formatting or lint already enforced by the repo's formatter/linter, or type errors caught by its type checker. - - Redundant guards that aid readability, or empirically-tuned constants and thresholds. - - Anything already addressed in the diff under review — read the full diff first. - Avoid reviewer anti-patterns: bikeshedding, rubber-stamping, nitpick avalanches, gatekeeping on style or hypothetical concerns, and manufacturing feedback. If the PR is clean, say so briefly in a structured review body instead of inventing comments. + Apply the common review rubric below as a baseline. A consumer repo's REVIEW.md and + other agent files may override or extend any part of it; defer to those wherever they + conflict. + + ${{ steps.compose.outputs.rubric }} Follow this review contract in order. Be exhaustive before posting. Treat Phases 1 through 5 as an iterative loop: after one pass, revisit the diff, checklist, local guidance, and prior threads again if you found any new risk area, uncertainty, or candidate finding. Keep iterating until a fresh pass produces no new actionable findings. Do not post new review feedback until all analysis phases are complete and you are ready to submit one unified review. diff --git a/.github/actions/claude-pr-review/rubric-detail.md b/.github/actions/claude-pr-review/rubric-detail.md new file mode 100644 index 0000000..098c822 --- /dev/null +++ b/.github/actions/claude-pr-review/rubric-detail.md @@ -0,0 +1,37 @@ +# Detailed review checks (conditional) + +Read only the section(s) the diff actually touches — see "Routing to detailed checks" in the +rubric. These are line-level checks that would clutter a review if applied everywhere. Keep +findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in which case escalate. + +## Persistence + +- **Crash-consistency:** on-disk writes must survive a crash mid-write — write to a temp file + and atomically `rename()` into place; never leave a half-written file on crash or power loss. +- **Atomic multi-step persists:** a sequence that must be all-or-nothing (e.g. clean-old + + write-new) should be one atomic operation; a crash between steps must not leave inconsistent + on-disk state. +- **No silent corruption on serialize:** don't write `…unwrap_or_default()` bytes — empty or + default bytes silently produce a corrupt record that later parses as a wrong-but-valid value. +- **Don't gold-plate internal artifacts:** for non-user-facing on-disk output, optimize for + not-misleading over pretty (e.g. don't spend effort pretty-printing an internal dump). + +## Unit-safety + +- **Annotate ambiguous integers:** two same-typed integers with different units in one + struct or signature is a trap — put the unit in the name or a doc so durations and + comparisons can't be miscomputed. +- **One conversion choke point:** route scaling conversions (`*1000` / `/1000`, ms↔ns, etc.) + through a single named helper with a `_lossy` suffix when it truncates, so the lossy step is + visible at every call site and a fix is a one-file diff. + +## Test-&-perf-surface + +- **Assertion placement:** put an assertion where the test or fuzzer actually observes the + condition — not after an early `return Err` that makes it tautologically true (the failing + case is then never exercised). +- **Evidence for perf-shaped / hot-path changes:** require a benchmark, and an equivalence test + pinning new == old behavior, before trusting an optimization or a hot-path restructure. +- **Dead or test-only scaffolding is a trapdoor:** a `Default`/`Option`/bootstrap field or + method with zero production callers isn't merely unused — it's a latent footgun a future + caller can trip. Confirm zero callers, then delete it. diff --git a/.github/actions/claude-pr-review/rubric.md b/.github/actions/claude-pr-review/rubric.md new file mode 100644 index 0000000..b7d9e99 --- /dev/null +++ b/.github/actions/claude-pr-review/rubric.md @@ -0,0 +1,107 @@ +# MegaETH PR Review Rubric (shared baseline) + +A consumer repo's `REVIEW.md` / `CLAUDE.md` / `AGENTS.md` override or extend anything here; +defer to them on conflict. This is the general baseline only. + +## Stance (governs everything below) + +- Review **structure and design, not lines**. The deepest finding makes a special case + disappear; the shallowest renames a variable. Spend attention high. +- The goal is improving code health over time — approve "better" even when not "perfect." + Critique the code, not the person, and always say _why_. +- **Personal taste or style is never a change request.** If the formatter, a linter, the type + checker, or the author's reasonable judgment can own it, stay silent. +- Severity is **leverage** (how much the design improves), not effort to spot. Default to + silence — a clean PR gets a clean verdict, not invented findings. + +## Lenses — judge in this order; the earlier the lens, the higher its leverage + +### 1. Design & data structures + +Are the core types right? Bad code around good data structures is fixable; the reverse is not. + +- Encode invariants in the type so the compiler catches misuse — prefer a newtype/enum/`OnceCell` + over a primitive-with-sentinel. A config knob or `Default` with only one correct value is a + footgun; remove it. +- Does each piece live where its responsibility lives? One concept = one type across all its + consumers, not a parallel type per subsystem. Decompose god-structs (pure logic that returns + effects + a thin orchestrator). +- Prefer eliminating an edge case through better design over adding a conditional. +- Keep it simple — can the number of concepts be cut? An abstraction is _earned_ only if it + (a) hides an impl detail, (b) enables a test fake, **and (c) keeps its generic bounds + confined** — otherwise inline it; a single call site rarely earns a new trait or indirection. +- Respect existing framework/library conventions rather than replacing them. + +### 2. Correctness & safety + +- Edge cases, off-by-ones, and races; extra scrutiny for consensus-/protocol-/wire-critical or + backward-compat-sensitive code. +- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A _fatal_ + condition must actually terminate the unit, not just the loop that observed it. +- Errors are handled or their ignore is justified with a comment; no silently dropped sends or + join-errors; prefer typed error enums over opaque/stringly errors. +- Unsafe code carries a `SAFETY` comment justifying its invariants. No unhandled concurrency + hazards — shared mutable state, TOCTOU, deadlocks. + +### 3. Surface stewardship + +What does this disturb that others depend on? + +- Call out breaking changes to any published or public API/contract explicitly; public API or + interface changes carry a documented rationale. +- Minimize the diff to vendored/forked/upstream files — even comment-only drift costs rebase + surface; keep customizations in our own modules. +- Don't change the semantics of an existing function — add a new one. +- Treat stable wire, on-disk, and public surfaces as sacred. + +### 4. Change & operate + +Can the next person change and run this safely? + +- **Tests:** present for new/changed logic; deterministic (no sleep/wall-clock); assert exact + values and specific error variants (not `is_ok`/truthiness); cover side effects and the + absence of unintended change. If logic is hard to test, that's a design signal (split pure + logic from I/O). A bool param every caller passes identically is an untested dimension. Don't + build the expected value from the same primitive under test. +- **Naming & docs:** names describe state/intent, not effect or serialization format; drop + redundant qualifiers. Comments and docs explain _why_, not _what_; new modules carry a + one-line orientation. A missing or misleading doc on a non-obvious invariant is a real gap. +- **Observability:** error variants carry the disagreeing values; loss/drop logs carry the + correlating key; fail-open paths expose a scrape-visible counter; long-running spawned tasks + get a tracing span; never print secrets into `--help`. Logging uses structured fields at + appropriate levels (not string interpolation); recurring counts/durations are metrics, not logs. +- **Doc proximity:** implementation mechanics live with the code; design docs hold + architecture/rationale and link to code rather than restating it. Flag the same fact + duplicated across homes — pick one authoritative home. + +## Do NOT flag + +- Formatting, lint, or type errors already owned by the repo's formatter/linter/type-checker — + never spend review effort on style. +- Redundant guards that aid readability; empirically-tuned constants/thresholds. +- Anything already addressed in the diff (read the full diff first). +- Avoid bikeshedding, nitpick avalanches, gatekeeping on style or hypotheticals, and + manufacturing feedback. If the PR is clean, say so. + +## How to deliver + +- **Propose the concrete alternative** (a `suggestion` block, a sketch, a rename target) — not + just a complaint. For a consequential design ask, note the option you considered and rejected + and why. +- Open with what's genuinely good before the asks. +- Calibrate: blocking (correctness / security / breakage / consensus) vs. should-fix vs. + follow-up. Route non-blocking design improvements to a follow-up suggestion; don't block. +- A finding's altitude should match its severity — a `[Nit]` that's really a structural concern + is mis-tagged. +- Confirm the PR title follows Conventional Commits and the description explains what & why; + flag a stale or misleading title/description after new commits. + +## Routing to detailed checks (read on demand) + +Before finalizing, if the diff touches any of these areas, read the matching section of +`.claude-review/rubric-detail.md` and apply it; skip areas the diff does not touch (these are +line-level checks — do not load them when irrelevant): + +- **Persistence / on-disk writes / serialization** → §Persistence +- **Unit / time / scaling math (ratios, conversions, timestamps)** → §Unit-safety +- **Tests, fuzzers, or fault-injection harnesses; perf-shaped changes** → §Test-&-perf-surface