From 53ac4fdac8b10f57c5f0da93b5c2ef57862b77f9 Mon Sep 17 00:00:00 2001 From: William Aaron Cheung Date: Tue, 30 Jun 2026 20:06:52 +0800 Subject: [PATCH 1/2] feat(pr-review): extract review rubric to files and restructure to 4 lenses Move the inline review rubric out of the action.yml prompt block into a sibling rubric.md, injected at compose time, so it can be edited and reviewed as prose instead of a YAML block scalar. Restructure the eight priority items plus the also-check list into four leverage-ordered lenses (design & data structures, correctness & safety, surface stewardship, change & operate) to keep the reviewer at a structural altitude and curb nit avalanches. Add rubric-detail.md for conditional line-level checks (persistence, unit-safety, test/perf surface), staged into the workspace and read on demand only for the areas a diff touches. No existing checks are dropped. --- .github/actions/claude-pr-review/action.yml | 48 ++++------ .../actions/claude-pr-review/rubric-detail.md | 34 +++++++ .github/actions/claude-pr-review/rubric.md | 95 +++++++++++++++++++ 3 files changed, 148 insertions(+), 29 deletions(-) create mode 100644 .github/actions/claude-pr-review/rubric-detail.md create mode 100644 .github/actions/claude-pr-review/rubric.md diff --git a/.github/actions/claude-pr-review/action.yml b/.github/actions/claude-pr-review/action.yml index c9f3164..4090660 100644 --- a/.github/actions/claude-pr-review/action.yml +++ b/.github/actions/claude-pr-review/action.yml @@ -85,6 +85,20 @@ runs: fi echo "$delim" >> "$GITHUB_OUTPUT" + # Inject the shared review rubric, maintained as a sibling markdown file so it + # can be edited and reviewed as prose rather than buried in a YAML block scalar. + rubric_delim="RUBRIC_$RANDOM$RANDOM" + { + echo "rubric<<$rubric_delim" + cat "$GITHUB_ACTION_PATH/rubric.md" + echo "$rubric_delim" + } >> "$GITHUB_OUTPUT" + + # Stage the on-demand detailed checklist into the workspace so the review agent + # can Read it lazily, only for the areas a given diff actually touches. + mkdir -p "$GITHUB_WORKSPACE/.claude-review" + cp "$GITHUB_ACTION_PATH/rubric-detail.md" "$GITHUB_WORKSPACE/.claude-review/rubric-detail.md" + - uses: anthropics/claude-code-action@c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1 with: claude_code_oauth_token: ${{ inputs.claude_code_oauth_token }} @@ -107,35 +121,11 @@ runs: general guidance below wherever they conflict. Be concise and actionable. Only comment on issues that matter. - Apply the following common review rubric as a baseline. A consumer repo's REVIEW.md and other agent files may override or extend any part of it; defer to those wherever they conflict. - - Mindset: - - The goal is to improve overall code health over time. Approve changes that make the codebase better even if not perfect; do not let "perfect" block "better". - - Review is also knowledge transfer. Critique the code, not the person, and explain why — cite a reason, not just "change this to that". - - Prioritize attention in this order (defer everything below the line to tooling): - 1. Data structures — are the core types right? Bad code around good data structures is fixable; good code around bad data structures is not. - 2. Design — is the approach right and does it fit the architecture? Prefer eliminating edge cases through better design over adding conditionals. - 3. Correctness — edge cases, off-by-ones, race conditions, and especially any change to consensus-, protocol-, or wire-critical code or a backward-compatibility-sensitive contract (extra scrutiny here). - 4. Complexity — could it be simpler? Can the number of concepts be cut? - 5. Breakage — which downstream consumers or public APIs/contracts could this break? A breaking change to a published contract must be called out explicitly. - 6. Tests — present for new or changed logic; deterministic (no sleep/wall-clock/flaky); assert exact values and specific error variants (not just is_ok/is_some/truthiness); cover side effects and the absence of unintended changes. - 7. Naming and comments — names are clear; comments explain why, not what. - 8. Style — defer to the repo's formatter and linter; never spend review effort on formatting. - - Also check: - - PR title follows Conventional Commits and the description explains what changed and why; flag a title or description that is stale or misleading after new commits. - - unsafe code carries a SAFETY comment; errors are checked or have a comment justifying the ignore; no unhandled concurrency hazards (shared mutable state, TOCTOU, deadlocks, dropped errors or promise rejections). - - Public API or interface changes have a documented rationale, and framework conventions are respected rather than replaced. - - Logging uses structured fields at appropriate levels (not string interpolation or println-equivalents); recurring counts and durations are metrics, not logs. - - Documentation proximity matches the scope of the fact being documented: implementation mechanics live inline with the code they describe, while separate design docs focus on architecture and rationale and link to code instead of restating details that will drift. - - Flag the same fact duplicated across inline docs and separate docs; pick one authoritative home and link from the other. - - Do NOT flag: - - Formatting or lint already enforced by the repo's formatter/linter, or type errors caught by its type checker. - - Redundant guards that aid readability, or empirically-tuned constants and thresholds. - - Anything already addressed in the diff under review — read the full diff first. - Avoid reviewer anti-patterns: bikeshedding, rubber-stamping, nitpick avalanches, gatekeeping on style or hypothetical concerns, and manufacturing feedback. If the PR is clean, say so briefly in a structured review body instead of inventing comments. + Apply the common review rubric below as a baseline. A consumer repo's REVIEW.md and + other agent files may override or extend any part of it; defer to those wherever they + conflict. + + ${{ steps.compose.outputs.rubric }} Follow this review contract in order. Be exhaustive before posting. Treat Phases 1 through 5 as an iterative loop: after one pass, revisit the diff, checklist, local guidance, and prior threads again if you found any new risk area, uncertainty, or candidate finding. Keep iterating until a fresh pass produces no new actionable findings. Do not post new review feedback until all analysis phases are complete and you are ready to submit one unified review. diff --git a/.github/actions/claude-pr-review/rubric-detail.md b/.github/actions/claude-pr-review/rubric-detail.md new file mode 100644 index 0000000..01d6be0 --- /dev/null +++ b/.github/actions/claude-pr-review/rubric-detail.md @@ -0,0 +1,34 @@ +# Detailed review checks (conditional) + +Read only the section(s) the diff actually touches — see "Routing to detailed checks" in the +rubric. These are line-level checks that would clutter a review if applied everywhere. Keep +findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in which case escalate. + +## Persistence +- **Crash-consistency:** on-disk writes must survive a crash mid-write — write to a temp file + and atomically `rename()` into place; never leave a half-written file on crash or power loss. +- **Atomic multi-step persists:** a sequence that must be all-or-nothing (e.g. clean-old + + write-new) should be one atomic operation; a crash between steps must not leave inconsistent + on-disk state. +- **No silent corruption on serialize:** don't write `…unwrap_or_default()` bytes — empty or + default bytes silently produce a corrupt record that later parses as a wrong-but-valid value. +- **Don't gold-plate internal artifacts:** for non-user-facing on-disk output, optimize for + not-misleading over pretty (e.g. don't spend effort pretty-printing an internal dump). + +## Unit-safety +- **Annotate ambiguous integers:** two same-typed integers with different units in one + struct or signature is a trap — put the unit in the name or a doc so durations and + comparisons can't be miscomputed. +- **One conversion choke point:** route scaling conversions (`*1000` / `/1000`, ms↔ns, etc.) + through a single named helper with a `_lossy` suffix when it truncates, so the lossy step is + visible at every call site and a fix is a one-file diff. + +## Test-&-perf-surface +- **Assertion placement:** put an assertion where the test or fuzzer actually observes the + condition — not after an early `return Err` that makes it tautologically true (the failing + case is then never exercised). +- **Evidence for perf-shaped / hot-path changes:** require a benchmark, and an equivalence test + pinning new == old behavior, before trusting an optimization or a hot-path restructure. +- **Dead or test-only scaffolding is a trapdoor:** a `Default`/`Option`/bootstrap field or + method with zero production callers isn't merely unused — it's a latent footgun a future + caller can trip. Confirm zero callers, then delete it. diff --git a/.github/actions/claude-pr-review/rubric.md b/.github/actions/claude-pr-review/rubric.md new file mode 100644 index 0000000..2d92f8c --- /dev/null +++ b/.github/actions/claude-pr-review/rubric.md @@ -0,0 +1,95 @@ +# MegaETH PR Review Rubric (shared baseline) + +A consumer repo's `REVIEW.md` / `CLAUDE.md` / `AGENTS.md` override or extend anything here; +defer to them on conflict. This is the general baseline only. + +## Stance (governs everything below) +- Review **structure and design, not lines**. The deepest finding makes a special case + disappear; the shallowest renames a variable. Spend attention high. +- The goal is improving code health over time — approve "better" even when not "perfect." + Critique the code, not the person, and always say *why*. +- **Personal taste or style is never a change request.** If the formatter, a linter, the type + checker, or the author's reasonable judgment can own it, stay silent. +- Severity is **leverage** (how much the design improves), not effort to spot. Default to + silence — a clean PR gets a clean verdict, not invented findings. + +## Lenses — judge in this order; the earlier the lens, the higher its leverage + +### 1. Design & data structures +Are the core types right? Bad code around good data structures is fixable; the reverse is not. +- Encode invariants in the type so the compiler catches misuse — prefer a newtype/enum/`OnceCell` + over a primitive-with-sentinel. A config knob or `Default` with only one correct value is a + footgun; remove it. +- Does each piece live where its responsibility lives? One concept = one type across all its + consumers, not a parallel type per subsystem. Decompose god-structs (pure logic that returns + effects + a thin orchestrator). +- Prefer eliminating an edge case through better design over adding a conditional. +- Keep it simple — can the number of concepts be cut? An abstraction is *earned* only if it + (a) hides an impl detail, (b) enables a test fake, **and (c) keeps its generic bounds + confined** — otherwise inline it; a single call site rarely earns a new trait or indirection. +- Respect existing framework/library conventions rather than replacing them. + +### 2. Correctness & safety +- Edge cases, off-by-ones, and races; extra scrutiny for consensus-/protocol-/wire-critical or + backward-compat-sensitive code. +- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A *fatal* + condition must actually terminate the unit, not just the loop that observed it. +- Errors are handled or their ignore is justified with a comment; no silently dropped sends or + join-errors; prefer typed error enums over opaque/stringly errors. +- Unsafe code carries a `SAFETY` comment justifying its invariants. No unhandled concurrency + hazards — shared mutable state, TOCTOU, deadlocks. + +### 3. Surface stewardship +What does this disturb that others depend on? +- Call out breaking changes to any published or public API/contract explicitly; public API or + interface changes carry a documented rationale. +- Minimize the diff to vendored/forked/upstream files — even comment-only drift costs rebase + surface; keep customizations in our own modules. +- Don't change the semantics of an existing function — add a new one. +- Treat stable wire, on-disk, and public surfaces as sacred. + +### 4. Change & operate +Can the next person change and run this safely? +- **Tests:** present for new/changed logic; deterministic (no sleep/wall-clock); assert exact + values and specific error variants (not `is_ok`/truthiness); cover side effects and the + absence of unintended change. If logic is hard to test, that's a design signal (split pure + logic from I/O). A bool param every caller passes identically is an untested dimension. Don't + build the expected value from the same primitive under test. +- **Naming & docs:** names describe state/intent, not effect or serialization format; drop + redundant qualifiers. Comments and docs explain *why*, not *what*; new modules carry a + one-line orientation. A missing or misleading doc on a non-obvious invariant is a real gap. +- **Observability:** error variants carry the disagreeing values; loss/drop logs carry the + correlating key; fail-open paths expose a scrape-visible counter; long-running spawned tasks + get a tracing span; never print secrets into `--help`. Logging uses structured fields at + appropriate levels (not string interpolation); recurring counts/durations are metrics, not logs. +- **Doc proximity:** implementation mechanics live with the code; design docs hold + architecture/rationale and link to code rather than restating it. Flag the same fact + duplicated across homes — pick one authoritative home. + +## Do NOT flag +- Formatting, lint, or type errors already owned by the repo's formatter/linter/type-checker — + never spend review effort on style. +- Redundant guards that aid readability; empirically-tuned constants/thresholds. +- Anything already addressed in the diff (read the full diff first). +- Avoid bikeshedding, nitpick avalanches, gatekeeping on style or hypotheticals, and + manufacturing feedback. If the PR is clean, say so. + +## How to deliver +- **Propose the concrete alternative** (a `suggestion` block, a sketch, a rename target) — not + just a complaint. For a consequential design ask, note the option you considered and rejected + and why. +- Open with what's genuinely good before the asks. +- Calibrate: blocking (correctness / security / breakage / consensus) vs. should-fix vs. + follow-up. Route non-blocking design improvements to a follow-up suggestion; don't block. +- A finding's altitude should match its severity — a `[Nit]` that's really a structural concern + is mis-tagged. +- Confirm the PR title follows Conventional Commits and the description explains what & why; + flag a stale or misleading title/description after new commits. + +## Routing to detailed checks (read on demand) +Before finalizing, if the diff touches any of these areas, read the matching section of +`.claude-review/rubric-detail.md` and apply it; skip areas the diff does not touch (these are +line-level checks — do not load them when irrelevant): +- **Persistence / on-disk writes / serialization** → §Persistence +- **Unit / time / scaling math (ratios, conversions, timestamps)** → §Unit-safety +- **Tests, fuzzers, or fault-injection harnesses; perf-shaped changes** → §Test-&-perf-surface From c43023ac9fa4dfe03a02bc182fc229f2248ff0dc Mon Sep 17 00:00:00 2001 From: William Aaron Cheung Date: Tue, 30 Jun 2026 20:10:51 +0800 Subject: [PATCH 2/2] style: apply prettier to new rubric markdown files --- .../actions/claude-pr-review/rubric-detail.md | 3 +++ .github/actions/claude-pr-review/rubric.md | 20 +++++++++++++++---- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/.github/actions/claude-pr-review/rubric-detail.md b/.github/actions/claude-pr-review/rubric-detail.md index 01d6be0..098c822 100644 --- a/.github/actions/claude-pr-review/rubric-detail.md +++ b/.github/actions/claude-pr-review/rubric-detail.md @@ -5,6 +5,7 @@ rubric. These are line-level checks that would clutter a review if applied every findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in which case escalate. ## Persistence + - **Crash-consistency:** on-disk writes must survive a crash mid-write — write to a temp file and atomically `rename()` into place; never leave a half-written file on crash or power loss. - **Atomic multi-step persists:** a sequence that must be all-or-nothing (e.g. clean-old + @@ -16,6 +17,7 @@ findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in whic not-misleading over pretty (e.g. don't spend effort pretty-printing an internal dump). ## Unit-safety + - **Annotate ambiguous integers:** two same-typed integers with different units in one struct or signature is a trap — put the unit in the name or a doc so durations and comparisons can't be miscomputed. @@ -24,6 +26,7 @@ findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in whic visible at every call site and a fix is a one-file diff. ## Test-&-perf-surface + - **Assertion placement:** put an assertion where the test or fuzzer actually observes the condition — not after an early `return Err` that makes it tautologically true (the failing case is then never exercised). diff --git a/.github/actions/claude-pr-review/rubric.md b/.github/actions/claude-pr-review/rubric.md index 2d92f8c..b7d9e99 100644 --- a/.github/actions/claude-pr-review/rubric.md +++ b/.github/actions/claude-pr-review/rubric.md @@ -4,10 +4,11 @@ A consumer repo's `REVIEW.md` / `CLAUDE.md` / `AGENTS.md` override or extend any defer to them on conflict. This is the general baseline only. ## Stance (governs everything below) + - Review **structure and design, not lines**. The deepest finding makes a special case disappear; the shallowest renames a variable. Spend attention high. - The goal is improving code health over time — approve "better" even when not "perfect." - Critique the code, not the person, and always say *why*. + Critique the code, not the person, and always say _why_. - **Personal taste or style is never a change request.** If the formatter, a linter, the type checker, or the author's reasonable judgment can own it, stay silent. - Severity is **leverage** (how much the design improves), not effort to spot. Default to @@ -16,7 +17,9 @@ defer to them on conflict. This is the general baseline only. ## Lenses — judge in this order; the earlier the lens, the higher its leverage ### 1. Design & data structures + Are the core types right? Bad code around good data structures is fixable; the reverse is not. + - Encode invariants in the type so the compiler catches misuse — prefer a newtype/enum/`OnceCell` over a primitive-with-sentinel. A config knob or `Default` with only one correct value is a footgun; remove it. @@ -24,15 +27,16 @@ Are the core types right? Bad code around good data structures is fixable; the r consumers, not a parallel type per subsystem. Decompose god-structs (pure logic that returns effects + a thin orchestrator). - Prefer eliminating an edge case through better design over adding a conditional. -- Keep it simple — can the number of concepts be cut? An abstraction is *earned* only if it +- Keep it simple — can the number of concepts be cut? An abstraction is _earned_ only if it (a) hides an impl detail, (b) enables a test fake, **and (c) keeps its generic bounds confined** — otherwise inline it; a single call site rarely earns a new trait or indirection. - Respect existing framework/library conventions rather than replacing them. ### 2. Correctness & safety + - Edge cases, off-by-ones, and races; extra scrutiny for consensus-/protocol-/wire-critical or backward-compat-sensitive code. -- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A *fatal* +- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A _fatal_ condition must actually terminate the unit, not just the loop that observed it. - Errors are handled or their ignore is justified with a comment; no silently dropped sends or join-errors; prefer typed error enums over opaque/stringly errors. @@ -40,7 +44,9 @@ Are the core types right? Bad code around good data structures is fixable; the r hazards — shared mutable state, TOCTOU, deadlocks. ### 3. Surface stewardship + What does this disturb that others depend on? + - Call out breaking changes to any published or public API/contract explicitly; public API or interface changes carry a documented rationale. - Minimize the diff to vendored/forked/upstream files — even comment-only drift costs rebase @@ -49,14 +55,16 @@ What does this disturb that others depend on? - Treat stable wire, on-disk, and public surfaces as sacred. ### 4. Change & operate + Can the next person change and run this safely? + - **Tests:** present for new/changed logic; deterministic (no sleep/wall-clock); assert exact values and specific error variants (not `is_ok`/truthiness); cover side effects and the absence of unintended change. If logic is hard to test, that's a design signal (split pure logic from I/O). A bool param every caller passes identically is an untested dimension. Don't build the expected value from the same primitive under test. - **Naming & docs:** names describe state/intent, not effect or serialization format; drop - redundant qualifiers. Comments and docs explain *why*, not *what*; new modules carry a + redundant qualifiers. Comments and docs explain _why_, not _what_; new modules carry a one-line orientation. A missing or misleading doc on a non-obvious invariant is a real gap. - **Observability:** error variants carry the disagreeing values; loss/drop logs carry the correlating key; fail-open paths expose a scrape-visible counter; long-running spawned tasks @@ -67,6 +75,7 @@ Can the next person change and run this safely? duplicated across homes — pick one authoritative home. ## Do NOT flag + - Formatting, lint, or type errors already owned by the repo's formatter/linter/type-checker — never spend review effort on style. - Redundant guards that aid readability; empirically-tuned constants/thresholds. @@ -75,6 +84,7 @@ Can the next person change and run this safely? manufacturing feedback. If the PR is clean, say so. ## How to deliver + - **Propose the concrete alternative** (a `suggestion` block, a sketch, a rename target) — not just a complaint. For a consequential design ask, note the option you considered and rejected and why. @@ -87,9 +97,11 @@ Can the next person change and run this safely? flag a stale or misleading title/description after new commits. ## Routing to detailed checks (read on demand) + Before finalizing, if the diff touches any of these areas, read the matching section of `.claude-review/rubric-detail.md` and apply it; skip areas the diff does not touch (these are line-level checks — do not load them when irrelevant): + - **Persistence / on-disk writes / serialization** → §Persistence - **Unit / time / scaling math (ratios, conversions, timestamps)** → §Unit-safety - **Tests, fuzzers, or fault-injection harnesses; perf-shaped changes** → §Test-&-perf-surface