Decompose the remaining giant simulation modules (m_rhs first) using the validated split pattern #1577

@sbryngelson

Description

Motivation

The 2026-06 series validated a repeatable, safe recipe for decomposing MFC's giant modules — used on m_boundary_common (2,355 lines → 3 modules, #1555) and m_riemann_solvers (4,706 lines → 6 modules, #1556), both with zero behavior change proven at the emitted-statement level and md5-identical GPU directive sets. The remaining giants in src/simulation/:

| file | lines |
| --- | --- |
| m_rhs.fpp | ~1,850 |
| m_bubbles_EL.fpp | ~1,640 |
| m_start_up.fpp | ~1,500 |
| m_cbc.fpp | ~1,430 |

m_rhs is the highest-value target: it is the per-step orchestrator everyone reads first to understand the solver, and its stage structure (buffer population → reconstruction → Riemann fluxes → source terms → time-derivative assembly) suggests natural seams the way the Riemann dispatcher did.

Proposal & execution sketch (per module, m_rhs first)

The recipe, now twice-proven, per module:

  1. Investigation first (read-only): every routine's size, callers (in-module vs external), module-level state inventory with GPU_DECLARE status, and — the critical lesson from the Riemann split — the call-graph direction: routines called BY the extracted pieces must move to a lower layer than the dispatching core, or a module use-cycle results. The fypp #:def/include inventory comes first (defs do not cross file boundaries).
  2. State placement by the lowest-consumer rule: GPU-resident state moves with its on-device consumers; declares move with declarations (Cray rejects declare-target on use-associated names); lifecycle allocation can stay in the orchestrator via use-association.
  3. Pure motion, one extracted module per commit, smallest first; the original module keeps its public list and re-exports, so zero external callers change.
  4. Verification battery per commit: emitted-statement multiset equivalence (seed-pinned, per backend), GPU-directive multiset md5, declare-scoping check, and the full golden suite for each module split. Anything touching hot loops also gets a per-case benchmark comparison against a pre-series baseline, using the <5% grind-time gate from #1572 ("Phase-3: Riemann hot-path decomposition into shared GPU device helpers"), where per-solver benchmark cases were added for exactly this purpose.
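
The call-graph direction check from step 1 can be run mechanically before any code moves. Below is a minimal Python sketch, not the tooling used in #1555 or #1556. It assumes the planned `use` graph has already been written down as a dictionary, and every module name other than `m_rhs` (`m_rhs_sources`, `m_rhs_helpers`) is hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_use_cycle(uses: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in a module `use` graph as a list of names, or None.

    `uses[m]` is the set of modules that module `m` would `use` after the split.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {mod: WHITE for mod in uses}
    path: List[str] = []

    def visit(mod: str) -> Optional[List[str]]:
        colour[mod] = GREY
        path.append(mod)
        for dep in sorted(uses.get(mod, ())):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: the split has a use-cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[mod] = BLACK
        return None

    for mod in sorted(uses):
        if colour[mod] == WHITE:
            cycle = visit(mod)
            if cycle:
                return cycle
    return None


# Hypothetical plan A: an extracted piece calls back into the dispatching core.
plan_with_cycle = {
    "m_rhs": {"m_rhs_sources"},
    "m_rhs_sources": {"m_rhs"},
}

# Hypothetical plan B: the shared routines sit in a lower layer instead.
plan_layered = {
    "m_rhs": {"m_rhs_sources", "m_rhs_helpers"},
    "m_rhs_sources": {"m_rhs_helpers"},
    "m_rhs_helpers": set(),
}

print(find_use_cycle(plan_with_cycle))
print(find_use_cycle(plan_layered))
```

Plan A reports the cycle through `m_rhs`, while plan B, where the shared routines live below both the core and the extracted piece, reports none. Running this on the proposed layout during the read-only investigation flags the problem before the compiler does.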

Effort: moderate per module (the Riemann split, the largest, was a single focused effort start to finish including independent review). Risk: low — the pattern's two previous applications shipped with zero regressions, and the verification battery catches the known failure modes (it caught real issues both times). Suggested order: m_rhs, then m_cbc (self-contained physics), m_bubbles_EL, with m_start_up last (its read/restart logic is the least seam-friendly).
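
For reference, three of the battery's checks can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the scripts used in the earlier splits: it assumes the inputs are lines of fypp-emitted Fortran, that GPU directives can be recognised by the `!$acc` / `!$omp` sentinels, and that statements are compared after collapsing whitespace. The function names and sample statements are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Tuple


def normalize(line: str) -> str:
    """Collapse runs of whitespace (assumption: spacing is not significant)."""
    return " ".join(line.split())


def statement_multiset(lines: Iterable[str]) -> Counter:
    """Multiset of non-blank emitted statements; order and file location are ignored."""
    return Counter(normalize(line) for line in lines if line.strip())


def directive_md5(lines: Iterable[str],
                  sentinels: Tuple[str, ...] = ("!$acc", "!$omp")) -> str:
    """md5 over the sorted directive lines, so duplicates count but order does not."""
    directives = sorted(
        normalize(line) for line in lines
        if normalize(line).lower().startswith(sentinels)
    )
    return hashlib.md5("\n".join(directives).encode("utf-8")).hexdigest()


def within_grind_gate(baseline: float, candidate: float,
                      tolerance: float = 0.05) -> bool:
    """True if the candidate grind time regresses by less than `tolerance`."""
    return candidate < baseline * (1.0 + tolerance)


# Hypothetical emitted statements before and after a pure-motion commit.
before = [
    "call s_compute_fluxes()",
    "!$acc parallel loop",
    "rhs(i) = 0",
    "!$acc end parallel loop",
]
after_move = [
    "!$acc parallel loop",
    "rhs(i)  =  0",
    "!$acc end parallel loop",
    "call s_compute_fluxes()",
]
after_dropped_directive = [line for line in after_move if "end parallel" not in line]

same_statements = statement_multiset(before) == statement_multiset(after_move)
same_directives = directive_md5(before) == directive_md5(after_move)
lost_directive = directive_md5(before) != directive_md5(after_dropped_directive)
print(same_statements, same_directives, lost_directive)
```

Comparing multisets rather than ordered lists is what lets a routine move between files without a false alarm, while a dropped or duplicated statement or directive still changes the result.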
