Motivation
The 2026-06 series validated a repeatable, safe recipe for decomposing MFC's giant modules. It was used on m_boundary_common (2,355 lines → 3 modules, #1555) and m_riemann_solvers (4,706 lines → 6 modules, #1556), both with zero behavior change proven at the emitted-statement level and md5-identical GPU directive sets. The remaining giants in src/simulation/:

| File | Lines |
| --- | --- |
| m_rhs.fpp | ~1,850 |
| m_bubbles_EL.fpp | ~1,640 |
| m_start_up.fpp | ~1,500 |
| m_cbc.fpp | ~1,430 |
m_rhs is the highest-value target: it is the per-step orchestrator everyone reads first to understand the solver, and its stage structure (buffer population → reconstruction → Riemann fluxes → source terms → time-derivative assembly) suggests natural seams the way the Riemann dispatcher did.
Proposal & execution sketch (per module, m_rhs first)
The recipe, now twice-proven, per module:
Investigation first (read-only): record every routine's size and callers (in-module vs external), inventory the module-level state with its GPU_DECLARE status, and map the call-graph direction. That last item is the critical lesson from the Riemann split: routines called BY the extracted pieces must move to a lower layer than the dispatching core, or a module use-cycle results. The fypp #:def/include inventory comes first (defs do not cross file boundaries).
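The call-graph-direction rule can be checked mechanically before any code moves. The following is a minimal Python sketch, not part of MFC's tooling: it assumes the planned module `use` graph has already been written down as a dictionary, and the module names `m_rhs_fluxes` and `m_rhs_common` are hypothetical, chosen only to illustrate the failure mode and its fix.

```python
def find_use_cycle(uses):
    """Return one use-cycle as a list of module names, or None if acyclic.

    `uses` maps each module name to the modules it `use`s.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(mod):
        color[mod] = GREY
        path.append(mod)
        for dep in uses.get(mod, ()):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: a cycle closes here
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[mod] = BLACK
        return None

    for mod in uses:
        if color.get(mod, WHITE) == WHITE:
            cycle = visit(mod)
            if cycle:
                return cycle
    return None


# Hypothetical bad plan: the core dispatches to the extracted module, but a
# helper that the extracted module calls was left behind in the core.
bad_plan = {"m_rhs": ["m_rhs_fluxes"], "m_rhs_fluxes": ["m_rhs"]}

# Hypothetical fix: the shared helper moves down into a lower-layer module.
good_plan = {
    "m_rhs": ["m_rhs_fluxes", "m_rhs_common"],
    "m_rhs_fluxes": ["m_rhs_common"],
    "m_rhs_common": [],
}

print(find_use_cycle(bad_plan))
print(find_use_cycle(good_plan))
```

Running the check on the planned graph, rather than on the source after the move, keeps this step read-only.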
State placement by the lowest-consumer rule: GPU-resident state moves with its on-device consumers; declares move with declarations (Cray rejects declare-target on use-associated names); lifecycle allocation can stay in the orchestrator via use-association.
Pure motion, one extracted module per commit, smallest first; the original module keeps its public list and re-exports, so zero external callers change.
Verification battery per commit: emitted-statement multiset equivalence (seed-pinned, per backend), GPU-directive multiset md5, declare-scoping check, full golden suite per module-split, and, for anything touching hot loops, the per-case benchmark comparison against a pre-series baseline (the <5% grind-time gate used in #1572, "Phase-3: Riemann hot-path decomposition into shared GPU device helpers"; per-solver benchmark cases were added there for exactly this purpose).
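Three of these checks are simple enough to sketch. The Python below is an illustration under stated assumptions, not the scripts used in the series: it assumes the emitted source is available as a list of lines, that whitespace normalization is an acceptable notion of statement identity, and that GPU directives can be recognized by the `!$acc` and `!$omp` sentinels. All function names are hypothetical, and a real check would also have to account for the `module`/`use` scaffolding that a split adds.

```python
import hashlib
from collections import Counter


def statement_multiset(lines):
    """Count whitespace-normalized, non-blank statements; order is ignored."""
    return Counter(" ".join(line.split()) for line in lines if line.strip())


def directive_md5(lines, sentinels=("!$acc", "!$omp")):
    """md5 over the sorted multiset of GPU directive lines (assumed sentinels)."""
    directives = sorted(
        " ".join(line.split())
        for line in lines
        if line.strip().lower().startswith(sentinels)
    )
    return hashlib.md5("\n".join(directives).encode()).hexdigest()


def within_grind_gate(baseline, candidate, tolerance=0.05):
    """True if the candidate grind time is less than 5% slower than baseline."""
    return candidate < baseline * (1 + tolerance)


# Toy "before" and "after": the same two routines, emitted in a different order.
before = [
    "subroutine a()",
    "  !$acc parallel loop",
    "  x = 1",
    "end subroutine a",
    "subroutine b()",
    "  y = 2",
    "end subroutine b",
]
after = before[4:] + before[:4]

print(statement_multiset(before) == statement_multiset(after))
print(directive_md5(before) == directive_md5(after))
print(within_grind_gate(1.00, 1.04), within_grind_gate(1.00, 1.06))
```

Comparing multisets rather than ordered text is what lets pure motion pass while any edited, dropped, or duplicated statement fails.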
Effort: moderate per module (the Riemann split, the largest, was a single focused effort start to finish including independent review). Risk: low — the pattern's two previous applications shipped with zero regressions, and the verification battery catches the known failure modes (it caught real issues both times). Suggested order: m_rhs, then m_cbc (self-contained physics), m_bubbles_EL, with m_start_up last (its read/restart logic is the least seam-friendly).