Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 19 additions & 29 deletions .github/actions/claude-pr-review/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,20 @@ runs:
fi
echo "$delim" >> "$GITHUB_OUTPUT"

# Inject the shared review rubric, maintained as a sibling markdown file so it
# can be edited and reviewed as prose rather than buried in a YAML block scalar.
rubric_delim="RUBRIC_$RANDOM$RANDOM"
{
echo "rubric<<$rubric_delim"
cat "$GITHUB_ACTION_PATH/rubric.md"
echo "$rubric_delim"
} >> "$GITHUB_OUTPUT"

# Stage the on-demand detailed checklist into the workspace so the review agent
# can Read it lazily, only for the areas a given diff actually touches.
mkdir -p "$GITHUB_WORKSPACE/.claude-review"
cp "$GITHUB_ACTION_PATH/rubric-detail.md" "$GITHUB_WORKSPACE/.claude-review/rubric-detail.md"

- uses: anthropics/claude-code-action@c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1
with:
claude_code_oauth_token: ${{ inputs.claude_code_oauth_token }}
Expand All @@ -107,35 +121,11 @@ runs:
general guidance below wherever they conflict.
Be concise and actionable. Only comment on issues that matter.

Apply the following common review rubric as a baseline. A consumer repo's REVIEW.md and other agent files may override or extend any part of it; defer to those wherever they conflict.

Mindset:
- The goal is to improve overall code health over time. Approve changes that make the codebase better even if not perfect; do not let "perfect" block "better".
- Review is also knowledge transfer. Critique the code, not the person, and explain why — cite a reason, not just "change this to that".

Prioritize attention in this order (defer everything below the line to tooling):
1. Data structures — are the core types right? Bad code around good data structures is fixable; good code around bad data structures is not.
2. Design — is the approach right and does it fit the architecture? Prefer eliminating edge cases through better design over adding conditionals.
3. Correctness — edge cases, off-by-ones, race conditions, and especially any change to consensus-, protocol-, or wire-critical code or a backward-compatibility-sensitive contract (extra scrutiny here).
4. Complexity — could it be simpler? Can the number of concepts be cut?
5. Breakage — which downstream consumers or public APIs/contracts could this break? A breaking change to a published contract must be called out explicitly.
6. Tests — present for new or changed logic; deterministic (no sleep/wall-clock/flaky); assert exact values and specific error variants (not just is_ok/is_some/truthiness); cover side effects and the absence of unintended changes.
7. Naming and comments — names are clear; comments explain why, not what.
8. Style — defer to the repo's formatter and linter; never spend review effort on formatting.

Also check:
- PR title follows Conventional Commits and the description explains what changed and why; flag a title or description that is stale or misleading after new commits.
- unsafe code carries a SAFETY comment; errors are checked or have a comment justifying the ignore; no unhandled concurrency hazards (shared mutable state, TOCTOU, deadlocks, dropped errors or promise rejections).
- Public API or interface changes have a documented rationale, and framework conventions are respected rather than replaced.
- Logging uses structured fields at appropriate levels (not string interpolation or println-equivalents); recurring counts and durations are metrics, not logs.
- Documentation proximity matches the scope of the fact being documented: implementation mechanics live inline with the code they describe, while separate design docs focus on architecture and rationale and link to code instead of restating details that will drift.
- Flag the same fact duplicated across inline docs and separate docs; pick one authoritative home and link from the other.

Do NOT flag:
- Formatting or lint already enforced by the repo's formatter/linter, or type errors caught by its type checker.
- Redundant guards that aid readability, or empirically-tuned constants and thresholds.
- Anything already addressed in the diff under review — read the full diff first.
Avoid reviewer anti-patterns: bikeshedding, rubber-stamping, nitpick avalanches, gatekeeping on style or hypothetical concerns, and manufacturing feedback. If the PR is clean, say so briefly in a structured review body instead of inventing comments.
Apply the common review rubric below as a baseline. A consumer repo's REVIEW.md and
other agent files may override or extend any part of it; defer to those wherever they
conflict.

${{ steps.compose.outputs.rubric }}

Follow this review contract in order. Be exhaustive before posting. Treat Phases 1 through 5 as an iterative loop: after one pass, revisit the diff, checklist, local guidance, and prior threads again if you found any new risk area, uncertainty, or candidate finding. Keep iterating until a fresh pass produces no new actionable findings. Do not post new review feedback until all analysis phases are complete and you are ready to submit one unified review.

Expand Down
37 changes: 37 additions & 0 deletions .github/actions/claude-pr-review/rubric-detail.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Detailed review checks (conditional)

Read only the section(s) the diff actually touches — see "Routing to detailed checks" in the
rubric. These are line-level checks that would clutter a review if applied everywhere. Keep
findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in which case escalate.

## Persistence

- **Crash-consistency:** on-disk writes must survive a crash mid-write — write to a temp file
and atomically `rename()` into place; never leave a half-written file on crash or power loss.
- **Atomic multi-step persists:** a sequence that must be all-or-nothing (e.g. clean-old +
write-new) should be one atomic operation; a crash between steps must not leave inconsistent
on-disk state.
- **No silent corruption on serialize:** don't write `…unwrap_or_default()` bytes — empty or
default bytes silently produce a corrupt record that later parses as a wrong-but-valid value.
- **Don't gold-plate internal artifacts:** for non-user-facing on-disk output, optimize for
not-misleading over pretty (e.g. don't spend effort pretty-printing an internal dump).

## Unit-safety

- **Annotate ambiguous integers:** two same-typed integers with different units in one
struct or signature is a trap — put the unit in the name or a doc so durations and
comparisons can't be miscomputed.
- **One conversion choke point:** route scaling conversions (`*1000` / `/1000`, ms↔ns, etc.)
through a single named helper with a `_lossy` suffix when it truncates, so the lossy step is
visible at every call site and a fix is a one-file diff.

## Test-&-perf-surface

- **Assertion placement:** put an assertion where the test or fuzzer actually observes the
condition — not after an early `return Err` that makes it tautologically true (the failing
case is then never exercised).
- **Evidence for perf-shaped / hot-path changes:** require a benchmark, and an equivalence test
pinning new == old behavior, before trusting an optimization or a hot-path restructure.
- **Dead or test-only scaffolding is a trapdoor:** a `Default`/`Option`/bootstrap field or
method with zero production callers isn't merely unused — it's a latent footgun a future
caller can trip. Confirm zero callers, then delete it.
107 changes: 107 additions & 0 deletions .github/actions/claude-pr-review/rubric.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
# MegaETH PR Review Rubric (shared baseline)

A consumer repo's `REVIEW.md` / `CLAUDE.md` / `AGENTS.md` override or extend anything here;
defer to them on conflict. This is the general baseline only.

## Stance (governs everything below)

- Review **structure and design, not lines**. The deepest finding makes a special case
disappear; the shallowest renames a variable. Spend attention high.
- The goal is improving code health over time — approve "better" even when not "perfect."
Critique the code, not the person, and always say _why_.
- **Personal taste or style is never a change request.** If the formatter, a linter, the type
checker, or the author's reasonable judgment can own it, stay silent.
- Severity is **leverage** (how much the design improves), not effort to spot. Default to
silence — a clean PR gets a clean verdict, not invented findings.

## Lenses — judge in this order; the earlier the lens, the higher its leverage

### 1. Design & data structures

Are the core types right? Bad code around good data structures is fixable; the reverse is not.

- Encode invariants in the type so the compiler catches misuse — prefer a newtype/enum/`OnceCell`
over a primitive-with-sentinel. A config knob or `Default` with only one correct value is a
footgun; remove it.
- Does each piece live where its responsibility lives? One concept = one type across all its
consumers, not a parallel type per subsystem. Decompose god-structs (pure logic that returns
effects + a thin orchestrator).
- Prefer eliminating an edge case through better design over adding a conditional.
- Keep it simple — can the number of concepts be cut? An abstraction is _earned_ only if it
(a) hides an impl detail, (b) enables a test fake, **and (c) keeps its generic bounds
confined** — otherwise inline it; a single call site rarely earns a new trait or indirection.
- Respect existing framework/library conventions rather than replacing them.

### 2. Correctness & safety

- Edge cases, off-by-ones, and races; extra scrutiny for consensus-/protocol-/wire-critical or
backward-compat-sensitive code.
- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A _fatal_
condition must actually terminate the unit, not just the loop that observed it.
- Errors are handled or their ignore is justified with a comment; no silently dropped sends or
join-errors; prefer typed error enums over opaque/stringly errors.
- Unsafe code carries a `SAFETY` comment justifying its invariants. No unhandled concurrency
hazards — shared mutable state, TOCTOU, deadlocks.

### 3. Surface stewardship

What does this disturb that others depend on?

- Call out breaking changes to any published or public API/contract explicitly; public API or
interface changes carry a documented rationale.
- Minimize the diff to vendored/forked/upstream files — even comment-only drift costs rebase
surface; keep customizations in our own modules.
- Don't change the semantics of an existing function — add a new one.
- Treat stable wire, on-disk, and public surfaces as sacred.

### 4. Change & operate

Can the next person change and run this safely?

- **Tests:** present for new/changed logic; deterministic (no sleep/wall-clock); assert exact
values and specific error variants (not `is_ok`/truthiness); cover side effects and the
absence of unintended change. If logic is hard to test, that's a design signal (split pure
logic from I/O). A bool param every caller passes identically is an untested dimension. Don't
build the expected value from the same primitive under test.
- **Naming & docs:** names describe state/intent, not effect or serialization format; drop
redundant qualifiers. Comments and docs explain _why_, not _what_; new modules carry a
one-line orientation. A missing or misleading doc on a non-obvious invariant is a real gap.
- **Observability:** error variants carry the disagreeing values; loss/drop logs carry the
correlating key; fail-open paths expose a scrape-visible counter; long-running spawned tasks
get a tracing span; never print secrets into `--help`. Logging uses structured fields at
appropriate levels (not string interpolation); recurring counts/durations are metrics, not logs.
- **Doc proximity:** implementation mechanics live with the code; design docs hold
architecture/rationale and link to code rather than restating it. Flag the same fact
duplicated across homes — pick one authoritative home.

## Do NOT flag

- Formatting, lint, or type errors already owned by the repo's formatter/linter/type-checker —
never spend review effort on style.
- Redundant guards that aid readability; empirically-tuned constants/thresholds.
- Anything already addressed in the diff (read the full diff first).
- Avoid bikeshedding, nitpick avalanches, gatekeeping on style or hypotheticals, and
manufacturing feedback. If the PR is clean, say so.

## How to deliver

- **Propose the concrete alternative** (a `suggestion` block, a sketch, a rename target) — not
just a complaint. For a consequential design ask, note the option you considered and rejected
and why.
- Open with what's genuinely good before the asks.
- Calibrate: blocking (correctness / security / breakage / consensus) vs. should-fix vs.
follow-up. Route non-blocking design improvements to a follow-up suggestion; don't block.
- A finding's altitude should match its severity — a `[Nit]` that's really a structural concern
is mis-tagged.
- Confirm the PR title follows Conventional Commits and the description explains what & why;
flag a stale or misleading title/description after new commits.

## Routing to detailed checks (read on demand)

Before finalizing, if the diff touches any of these areas, read the matching section of
`.claude-review/rubric-detail.md` and apply it; skip areas the diff does not touch (these are
line-level checks — do not load them when irrelevant):

- **Persistence / on-disk writes / serialization** → §Persistence
- **Unit / time / scaling math (ratios, conversions, timestamps)** → §Unit-safety
- **Tests, fuzzers, or fault-injection harnesses; perf-shaped changes** → §Test-&-perf-surface
Loading