From 53ac4fdac8b10f57c5f0da93b5c2ef57862b77f9 Mon Sep 17 00:00:00 2001
From: William Aaron Cheung <git@troublor.xyz>
Date: Tue, 30 Jun 2026 20:06:52 +0800
Subject: [PATCH 1/2] feat(pr-review): extract review rubric to files and
 restructure to 4 lenses

Move the inline review rubric out of the action.yml prompt block into a
sibling rubric.md, injected at compose time, so it can be edited and reviewed
as prose instead of a YAML block scalar. Restructure the eight priority items
plus the also-check list into four leverage-ordered lenses (design & data
structures, correctness & safety, surface stewardship, change & operate) to
keep the reviewer at a structural altitude and curb nit avalanches.

Add rubric-detail.md for conditional line-level checks (persistence,
unit-safety, test/perf surface), staged into the workspace and read on demand
only for the areas a diff touches. No existing checks are dropped.
---
 .github/actions/claude-pr-review/action.yml   | 48 ++++------
 .../actions/claude-pr-review/rubric-detail.md | 34 +++++++
 .github/actions/claude-pr-review/rubric.md    | 95 +++++++++++++++++++
 3 files changed, 148 insertions(+), 29 deletions(-)
 create mode 100644 .github/actions/claude-pr-review/rubric-detail.md
 create mode 100644 .github/actions/claude-pr-review/rubric.md

diff --git a/.github/actions/claude-pr-review/action.yml b/.github/actions/claude-pr-review/action.yml
index c9f3164..4090660 100644
--- a/.github/actions/claude-pr-review/action.yml
+++ b/.github/actions/claude-pr-review/action.yml
@@ -85,6 +85,20 @@ runs:
         fi
         echo "$delim" >> "$GITHUB_OUTPUT"
 
+        # Inject the shared review rubric, maintained as a sibling markdown file so it
+        # can be edited and reviewed as prose rather than buried in a YAML block scalar.
+        rubric_delim="RUBRIC_$RANDOM$RANDOM"
+        {
+          echo "rubric<<$rubric_delim"
+          cat "$GITHUB_ACTION_PATH/rubric.md"
+          echo "$rubric_delim"
+        } >> "$GITHUB_OUTPUT"
+
+        # Stage the on-demand detailed checklist into the workspace so the review agent
+        # can Read it lazily, only for the areas a given diff actually touches.
+        mkdir -p "$GITHUB_WORKSPACE/.claude-review"
+        cp "$GITHUB_ACTION_PATH/rubric-detail.md" "$GITHUB_WORKSPACE/.claude-review/rubric-detail.md"
+
     - uses: anthropics/claude-code-action@c3d45e8e941e1b2ad7b278c57482d9c5bf1f35b3 # v1
       with:
         claude_code_oauth_token: ${{ inputs.claude_code_oauth_token }}
@@ -107,35 +121,11 @@ runs:
           general guidance below wherever they conflict.
           Be concise and actionable. Only comment on issues that matter.
 
-          Apply the following common review rubric as a baseline. A consumer repo's REVIEW.md and other agent files may override or extend any part of it; defer to those wherever they conflict.
-
-          Mindset:
-          - The goal is to improve overall code health over time. Approve changes that make the codebase better even if not perfect; do not let "perfect" block "better".
-          - Review is also knowledge transfer. Critique the code, not the person, and explain why — cite a reason, not just "change this to that".
-
-          Prioritize attention in this order (defer everything below the line to tooling):
-          1. Data structures — are the core types right? Bad code around good data structures is fixable; good code around bad data structures is not.
-          2. Design — is the approach right and does it fit the architecture? Prefer eliminating edge cases through better design over adding conditionals.
-          3. Correctness — edge cases, off-by-ones, race conditions, and especially any change to consensus-, protocol-, or wire-critical code or a backward-compatibility-sensitive contract (extra scrutiny here).
-          4. Complexity — could it be simpler? Can the number of concepts be cut?
-          5. Breakage — which downstream consumers or public APIs/contracts could this break? A breaking change to a published contract must be called out explicitly.
-          6. Tests — present for new or changed logic; deterministic (no sleep/wall-clock/flaky); assert exact values and specific error variants (not just is_ok/is_some/truthiness); cover side effects and the absence of unintended changes.
-          7. Naming and comments — names are clear; comments explain why, not what.
-          8. Style — defer to the repo's formatter and linter; never spend review effort on formatting.
-
-          Also check:
-          - PR title follows Conventional Commits and the description explains what changed and why; flag a title or description that is stale or misleading after new commits.
-          - unsafe code carries a SAFETY comment; errors are checked or have a comment justifying the ignore; no unhandled concurrency hazards (shared mutable state, TOCTOU, deadlocks, dropped errors or promise rejections).
-          - Public API or interface changes have a documented rationale, and framework conventions are respected rather than replaced.
-          - Logging uses structured fields at appropriate levels (not string interpolation or println-equivalents); recurring counts and durations are metrics, not logs.
-          - Documentation proximity matches the scope of the fact being documented: implementation mechanics live inline with the code they describe, while separate design docs focus on architecture and rationale and link to code instead of restating details that will drift.
-          - Flag the same fact duplicated across inline docs and separate docs; pick one authoritative home and link from the other.
-
-          Do NOT flag:
-          - Formatting or lint already enforced by the repo's formatter/linter, or type errors caught by its type checker.
-          - Redundant guards that aid readability, or empirically-tuned constants and thresholds.
-          - Anything already addressed in the diff under review — read the full diff first.
-          Avoid reviewer anti-patterns: bikeshedding, rubber-stamping, nitpick avalanches, gatekeeping on style or hypothetical concerns, and manufacturing feedback. If the PR is clean, say so briefly in a structured review body instead of inventing comments.
+          Apply the common review rubric below as a baseline. A consumer repo's REVIEW.md and
+          other agent files may override or extend any part of it; defer to those wherever they
+          conflict.
+
+          ${{ steps.compose.outputs.rubric }}
 
           Follow this review contract in order. Be exhaustive before posting. Treat Phases 1 through 5 as an iterative loop: after one pass, revisit the diff, checklist, local guidance, and prior threads again if you found any new risk area, uncertainty, or candidate finding. Keep iterating until a fresh pass produces no new actionable findings. Do not post new review feedback until all analysis phases are complete and you are ready to submit one unified review.
 
diff --git a/.github/actions/claude-pr-review/rubric-detail.md b/.github/actions/claude-pr-review/rubric-detail.md
new file mode 100644
index 0000000..01d6be0
--- /dev/null
+++ b/.github/actions/claude-pr-review/rubric-detail.md
@@ -0,0 +1,34 @@
+# Detailed review checks (conditional)
+
+Read only the section(s) the diff actually touches — see "Routing to detailed checks" in the
+rubric. These are line-level checks that would clutter a review if applied everywhere. Keep
+findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in which case escalate.
+
+## Persistence
+- **Crash-consistency:** on-disk writes must survive a crash mid-write — write to a temp file
+  and atomically `rename()` into place; never leave a half-written file on crash or power loss.
+- **Atomic multi-step persists:** a sequence that must be all-or-nothing (e.g. clean-old +
+  write-new) should be one atomic operation; a crash between steps must not leave inconsistent
+  on-disk state.
+- **No silent corruption on serialize:** don't write `…unwrap_or_default()` bytes — empty or
+  default bytes silently produce a corrupt record that later parses as a wrong-but-valid value.
+- **Don't gold-plate internal artifacts:** for non-user-facing on-disk output, optimize for
+  not-misleading over pretty (e.g. don't spend effort pretty-printing an internal dump).
+
+## Unit-safety
+- **Annotate ambiguous integers:** two same-typed integers with different units in one
+  struct or signature is a trap — put the unit in the name or a doc so durations and
+  comparisons can't be miscomputed.
+- **One conversion choke point:** route scaling conversions (`*1000` / `/1000`, ms↔ns, etc.)
+  through a single named helper with a `_lossy` suffix when it truncates, so the lossy step is
+  visible at every call site and a fix is a one-file diff.
+
+## Test-&-perf-surface
+- **Assertion placement:** put an assertion where the test or fuzzer actually observes the
+  condition — not after an early `return Err` that makes it tautologically true (the failing
+  case is then never exercised).
+- **Evidence for perf-shaped / hot-path changes:** require a benchmark, and an equivalence test
+  pinning new == old behavior, before trusting an optimization or a hot-path restructure.
+- **Dead or test-only scaffolding is a trapdoor:** a `Default`/`Option`/bootstrap field or
+  method with zero production callers isn't merely unused — it's a latent footgun a future
+  caller can trip. Confirm zero callers, then delete it.
diff --git a/.github/actions/claude-pr-review/rubric.md b/.github/actions/claude-pr-review/rubric.md
new file mode 100644
index 0000000..2d92f8c
--- /dev/null
+++ b/.github/actions/claude-pr-review/rubric.md
@@ -0,0 +1,95 @@
+# MegaETH PR Review Rubric (shared baseline)
+
+A consumer repo's `REVIEW.md` / `CLAUDE.md` / `AGENTS.md` override or extend anything here;
+defer to them on conflict. This is the general baseline only.
+
+## Stance (governs everything below)
+- Review **structure and design, not lines**. The deepest finding makes a special case
+  disappear; the shallowest renames a variable. Spend attention high.
+- The goal is improving code health over time — approve "better" even when not "perfect."
+  Critique the code, not the person, and always say *why*.
+- **Personal taste or style is never a change request.** If the formatter, a linter, the type
+  checker, or the author's reasonable judgment can own it, stay silent.
+- Severity is **leverage** (how much the design improves), not effort to spot. Default to
+  silence — a clean PR gets a clean verdict, not invented findings.
+
+## Lenses — judge in this order; the earlier the lens, the higher its leverage
+
+### 1. Design & data structures
+Are the core types right? Bad code around good data structures is fixable; the reverse is not.
+- Encode invariants in the type so the compiler catches misuse — prefer a newtype/enum/`OnceCell`
+  over a primitive-with-sentinel. A config knob or `Default` with only one correct value is a
+  footgun; remove it.
+- Does each piece live where its responsibility lives? One concept = one type across all its
+  consumers, not a parallel type per subsystem. Decompose god-structs (pure logic that returns
+  effects + a thin orchestrator).
+- Prefer eliminating an edge case through better design over adding a conditional.
+- Keep it simple — can the number of concepts be cut? An abstraction is *earned* only if it
+  (a) hides an impl detail, (b) enables a test fake, **and (c) keeps its generic bounds
+  confined** — otherwise inline it; a single call site rarely earns a new trait or indirection.
+- Respect existing framework/library conventions rather than replacing them.
+
+### 2. Correctness & safety
+- Edge cases, off-by-ones, and races; extra scrutiny for consensus-/protocol-/wire-critical or
+  backward-compat-sensitive code.
+- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A *fatal*
+  condition must actually terminate the unit, not just the loop that observed it.
+- Errors are handled or their ignore is justified with a comment; no silently dropped sends or
+  join-errors; prefer typed error enums over opaque/stringly errors.
+- Unsafe code carries a `SAFETY` comment justifying its invariants. No unhandled concurrency
+  hazards — shared mutable state, TOCTOU, deadlocks.
+
+### 3. Surface stewardship
+What does this disturb that others depend on?
+- Call out breaking changes to any published or public API/contract explicitly; public API or
+  interface changes carry a documented rationale.
+- Minimize the diff to vendored/forked/upstream files — even comment-only drift costs rebase
+  surface; keep customizations in our own modules.
+- Don't change the semantics of an existing function — add a new one.
+- Treat stable wire, on-disk, and public surfaces as sacred.
+
+### 4. Change & operate
+Can the next person change and run this safely?
+- **Tests:** present for new/changed logic; deterministic (no sleep/wall-clock); assert exact
+  values and specific error variants (not `is_ok`/truthiness); cover side effects and the
+  absence of unintended change. If logic is hard to test, that's a design signal (split pure
+  logic from I/O). A bool param every caller passes identically is an untested dimension. Don't
+  build the expected value from the same primitive under test.
+- **Naming & docs:** names describe state/intent, not effect or serialization format; drop
+  redundant qualifiers. Comments and docs explain *why*, not *what*; new modules carry a
+  one-line orientation. A missing or misleading doc on a non-obvious invariant is a real gap.
+- **Observability:** error variants carry the disagreeing values; loss/drop logs carry the
+  correlating key; fail-open paths expose a scrape-visible counter; long-running spawned tasks
+  get a tracing span; never print secrets into `--help`. Logging uses structured fields at
+  appropriate levels (not string interpolation); recurring counts/durations are metrics, not logs.
+- **Doc proximity:** implementation mechanics live with the code; design docs hold
+  architecture/rationale and link to code rather than restating it. Flag the same fact
+  duplicated across homes — pick one authoritative home.
+
+## Do NOT flag
+- Formatting, lint, or type errors already owned by the repo's formatter/linter/type-checker —
+  never spend review effort on style.
+- Redundant guards that aid readability; empirically-tuned constants/thresholds.
+- Anything already addressed in the diff (read the full diff first).
+- Avoid bikeshedding, nitpick avalanches, gatekeeping on style or hypotheticals, and
+  manufacturing feedback. If the PR is clean, say so.
+
+## How to deliver
+- **Propose the concrete alternative** (a `suggestion` block, a sketch, a rename target) — not
+  just a complaint. For a consequential design ask, note the option you considered and rejected
+  and why.
+- Open with what's genuinely good before the asks.
+- Calibrate: blocking (correctness / security / breakage / consensus) vs. should-fix vs.
+  follow-up. Route non-blocking design improvements to a follow-up suggestion; don't block.
+- A finding's altitude should match its severity — a `[Nit]` that's really a structural concern
+  is mis-tagged.
+- Confirm the PR title follows Conventional Commits and the description explains what & why;
+  flag a stale or misleading title/description after new commits.
+
+## Routing to detailed checks (read on demand)
+Before finalizing, if the diff touches any of these areas, read the matching section of
+`.claude-review/rubric-detail.md` and apply it; skip areas the diff does not touch (these are
+line-level checks — do not load them when irrelevant):
+- **Persistence / on-disk writes / serialization** → §Persistence
+- **Unit / time / scaling math (ratios, conversions, timestamps)** → §Unit-safety
+- **Tests, fuzzers, or fault-injection harnesses; perf-shaped changes** → §Test-&-perf-surface

From c43023ac9fa4dfe03a02bc182fc229f2248ff0dc Mon Sep 17 00:00:00 2001
From: William Aaron Cheung <git@troublor.xyz>
Date: Tue, 30 Jun 2026 20:10:51 +0800
Subject: [PATCH 2/2] style: apply prettier to new rubric markdown files

---
 .../actions/claude-pr-review/rubric-detail.md |  3 +++
 .github/actions/claude-pr-review/rubric.md    | 20 +++++++++++++++----
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/.github/actions/claude-pr-review/rubric-detail.md b/.github/actions/claude-pr-review/rubric-detail.md
index 01d6be0..098c822 100644
--- a/.github/actions/claude-pr-review/rubric-detail.md
+++ b/.github/actions/claude-pr-review/rubric-detail.md
@@ -5,6 +5,7 @@ rubric. These are line-level checks that would clutter a review if applied every
 findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in which case escalate.
 
 ## Persistence
+
 - **Crash-consistency:** on-disk writes must survive a crash mid-write — write to a temp file
   and atomically `rename()` into place; never leave a half-written file on crash or power loss.
 - **Atomic multi-step persists:** a sequence that must be all-or-nothing (e.g. clean-old +
@@ -16,6 +17,7 @@ findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in whic
   not-misleading over pretty (e.g. don't spend effort pretty-printing an internal dump).
 
 ## Unit-safety
+
 - **Annotate ambiguous integers:** two same-typed integers with different units in one
   struct or signature is a trap — put the unit in the name or a doc so durations and
   comparisons can't be miscomputed.
@@ -24,6 +26,7 @@ findings here at `[Minor]`/`[Nit]` unless they cause incorrect behavior, in whic
   visible at every call site and a fix is a one-file diff.
 
 ## Test-&-perf-surface
+
 - **Assertion placement:** put an assertion where the test or fuzzer actually observes the
   condition — not after an early `return Err` that makes it tautologically true (the failing
   case is then never exercised).
diff --git a/.github/actions/claude-pr-review/rubric.md b/.github/actions/claude-pr-review/rubric.md
index 2d92f8c..b7d9e99 100644
--- a/.github/actions/claude-pr-review/rubric.md
+++ b/.github/actions/claude-pr-review/rubric.md
@@ -4,10 +4,11 @@ A consumer repo's `REVIEW.md` / `CLAUDE.md` / `AGENTS.md` override or extend any
 defer to them on conflict. This is the general baseline only.
 
 ## Stance (governs everything below)
+
 - Review **structure and design, not lines**. The deepest finding makes a special case
   disappear; the shallowest renames a variable. Spend attention high.
 - The goal is improving code health over time — approve "better" even when not "perfect."
-  Critique the code, not the person, and always say *why*.
+  Critique the code, not the person, and always say _why_.
 - **Personal taste or style is never a change request.** If the formatter, a linter, the type
   checker, or the author's reasonable judgment can own it, stay silent.
 - Severity is **leverage** (how much the design improves), not effort to spot. Default to
@@ -16,7 +17,9 @@ defer to them on conflict. This is the general baseline only.
 ## Lenses — judge in this order; the earlier the lens, the higher its leverage
 
 ### 1. Design & data structures
+
 Are the core types right? Bad code around good data structures is fixable; the reverse is not.
+
 - Encode invariants in the type so the compiler catches misuse — prefer a newtype/enum/`OnceCell`
   over a primitive-with-sentinel. A config knob or `Default` with only one correct value is a
   footgun; remove it.
@@ -24,15 +27,16 @@ Are the core types right? Bad code around good data structures is fixable; the r
   consumers, not a parallel type per subsystem. Decompose god-structs (pure logic that returns
   effects + a thin orchestrator).
 - Prefer eliminating an edge case through better design over adding a conditional.
-- Keep it simple — can the number of concepts be cut? An abstraction is *earned* only if it
+- Keep it simple — can the number of concepts be cut? An abstraction is _earned_ only if it
   (a) hides an impl detail, (b) enables a test fake, **and (c) keeps its generic bounds
   confined** — otherwise inline it; a single call site rarely earns a new trait or indirection.
 - Respect existing framework/library conventions rather than replacing them.
 
 ### 2. Correctness & safety
+
 - Edge cases, off-by-ones, and races; extra scrutiny for consensus-/protocol-/wire-critical or
   backward-compat-sensitive code.
-- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A *fatal*
+- **Fail loud over silently-wrong:** an explicit error beats a quietly wrong value. A _fatal_
   condition must actually terminate the unit, not just the loop that observed it.
 - Errors are handled or their ignore is justified with a comment; no silently dropped sends or
   join-errors; prefer typed error enums over opaque/stringly errors.
@@ -40,7 +44,9 @@ Are the core types right? Bad code around good data structures is fixable; the r
   hazards — shared mutable state, TOCTOU, deadlocks.
 
 ### 3. Surface stewardship
+
 What does this disturb that others depend on?
+
 - Call out breaking changes to any published or public API/contract explicitly; public API or
   interface changes carry a documented rationale.
 - Minimize the diff to vendored/forked/upstream files — even comment-only drift costs rebase
@@ -49,14 +55,16 @@ What does this disturb that others depend on?
 - Treat stable wire, on-disk, and public surfaces as sacred.
 
 ### 4. Change & operate
+
 Can the next person change and run this safely?
+
 - **Tests:** present for new/changed logic; deterministic (no sleep/wall-clock); assert exact
   values and specific error variants (not `is_ok`/truthiness); cover side effects and the
   absence of unintended change. If logic is hard to test, that's a design signal (split pure
   logic from I/O). A bool param every caller passes identically is an untested dimension. Don't
   build the expected value from the same primitive under test.
 - **Naming & docs:** names describe state/intent, not effect or serialization format; drop
-  redundant qualifiers. Comments and docs explain *why*, not *what*; new modules carry a
+  redundant qualifiers. Comments and docs explain _why_, not _what_; new modules carry a
   one-line orientation. A missing or misleading doc on a non-obvious invariant is a real gap.
 - **Observability:** error variants carry the disagreeing values; loss/drop logs carry the
   correlating key; fail-open paths expose a scrape-visible counter; long-running spawned tasks
@@ -67,6 +75,7 @@ Can the next person change and run this safely?
   duplicated across homes — pick one authoritative home.
 
 ## Do NOT flag
+
 - Formatting, lint, or type errors already owned by the repo's formatter/linter/type-checker —
   never spend review effort on style.
 - Redundant guards that aid readability; empirically-tuned constants/thresholds.
@@ -75,6 +84,7 @@ Can the next person change and run this safely?
   manufacturing feedback. If the PR is clean, say so.
 
 ## How to deliver
+
 - **Propose the concrete alternative** (a `suggestion` block, a sketch, a rename target) — not
   just a complaint. For a consequential design ask, note the option you considered and rejected
   and why.
@@ -87,9 +97,11 @@ Can the next person change and run this safely?
   flag a stale or misleading title/description after new commits.
 
 ## Routing to detailed checks (read on demand)
+
 Before finalizing, if the diff touches any of these areas, read the matching section of
 `.claude-review/rubric-detail.md` and apply it; skip areas the diff does not touch (these are
 line-level checks — do not load them when irrelevant):
+
 - **Persistence / on-disk writes / serialization** → §Persistence
 - **Unit / time / scaling math (ratios, conversions, timestamps)** → §Unit-safety
 - **Tests, fuzzers, or fault-injection harnesses; perf-shaped changes** → §Test-&-perf-surface