Skip to content

Fix three pre-existing multi-rank broadcast defects; deregister dead parameters#1553

Merged
sbryngelson merged 5 commits into
MFlowCode:masterfrom
sbryngelson:broadcast-correctness
Jun 12, 2026
Merged

Fix three pre-existing multi-rank broadcast defects; deregister dead parameters#1553
sbryngelson merged 5 commits into
MFlowCode:masterfrom
sbryngelson:broadcast-correctness

Conversation

@sbryngelson

@sbryngelson sbryngelson commented Jun 11, 2026

Copy link
Copy Markdown
Member

Stacked on #1551 and the executable-dedup PR — this branch includes both; its own content is the top 3 commits. It exists separately because, unlike its parents, it intentionally changes multi-rank behavior (it fixes bugs).

Summary

While converting the MPI broadcast lists to registry-driven generation (parent PR), a member-by-member audit of every namelist parameter's broadcast/consumption pattern turned up three pre-existing defects on master. All three predate this work.

  1. bc_x/y/z%{vb1..3, ve1..3} were never broadcast in pre_process/post_process but are consumed on all ranks by s_slip_wall/s_no_slip_wall in m_boundary_common — rank-dependent ghost cells for any multi-rank wall-velocity BC run in pre/post. Fixed by adding the member broadcasts (simulation already had them).

  2. muscl_eps was never broadcast, and its post-read derivation runs only under f_is_default(muscl_eps) with defaults assigned rank-0-only — so non-root ranks carried divergent values on every multi-rank MUSCL run. Fixed by broadcasting it like every other registered REAL.

  3. 14 dead registrations crash the namelist read: lag_params%{T0, Thost, c0, rho0, x0} and fluid_pp%{mul0, ss, pv, gamma_v, M_v, mu_v, k_v, cp_v, D_v} were removed from the derived types (Refactor subgrid bubble models #1085/fix swapped vapor gas properties #1093) but never deregistered from the toolchain — setting any of them in a case file aborts with a misleading "datatype mismatch". Deregistered; the case validator now rejects them by name instead.

With the dead members gone, the fluid_pp/lag_params broadcast emitters convert from hardcoded member lists to registry walks — a member added to those types is now broadcast by construction, closing the gap that allowed bugs 1–2 to exist.

Verification

  • Full golden suite (sharded, multi-rank) green on the final tip.
  • Broadcast tuple-set comparison per target: deltas are exactly the intended additions (vb/ve members in pre/post, muscl_eps everywhere it is registered) and the dead-member removals; everything else identical.
  • Toolchain pytest green, including new tests pinning the registry-walk emission and the deregistrations.

Issues fixed

Fixes #1559 (muscl_eps never broadcast)
Fixes #1560 (wall-velocity vb/ve members never broadcast in pre/post)
Fixes #1561 (pre_process broadcasts integer BC codes as mpi_p)
Fixes #1562 (14 phantom parameter registrations crash the namelist read)

Copilot AI review requested due to automatic review settings June 11, 2026 01:04

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@sbryngelson

Copy link
Copy Markdown
Member Author

Code review

Found 1 issue:

  1. pre_process broadcast the integer bc_x/y/z%beg/%end members with mpi_p (an 8-byte real transfer over 4-byte integers — undefined behavior that happened to work by struct adjacency), while simulation and post_process correctly use MPI_INTEGER for the same members. Pre-existing on master, but this PR reorganized exactly this section — making it the fourth multi-rank broadcast defect this series fixes. Fixed in ea0f69b, which also adds a test pinning the hand-written vb/ve and BC-code residue so a merge conflict cannot silently drop them.

& 'y_domain%end', 'z_domain%beg', 'z_domain%end', &
& 'bc_x%beg', 'bc_x%end', 'bc_y%beg', 'bc_y%end', &
& 'bc_z%beg', 'bc_z%end', 'bc_x%Twall_in', 'bc_x%Twall_out', &
& 'bc_y%Twall_in', 'bc_y%Twall_out', 'bc_z%Twall_in', &

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@codecov

codecov Bot commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.00%. Comparing base (ac30c32) to head (ca17a0d).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1553      +/-   ##
==========================================
- Coverage   61.17%   61.00%   -0.18%     
==========================================
  Files          74       75       +1     
  Lines       20313    19970     -343     
  Branches     2961     2930      -31     
==========================================
- Hits        12427    12182     -245     
+ Misses       5870     5806      -64     
+ Partials     2016     1982      -34     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

fluid_pp%{mul0,ss,pv,gamma_v,M_v,mu_v,k_v,cp_v,D_v} and
lag_params%{T0,Thost,c0,rho0,x0} were removed from the Fortran derived
types by upstream MFlowCode#1085/MFlowCode#1093 but remained registered in the toolchain.
Setting any of them causes namelist read to abort with a misleading
'datatype mismatch' error.

Verified against src/common/m_derived_types.fpp:
- physical_parameters has: gamma, pi_inf, Re, cv, qv, qvp, G
- bubbles_lagrange_parameters has: solver_approach, cluster_type,
  pressure_corrector, smooth_type, heatTransfer_model, massTransfer_model,
  write_bubbles, write_bubbles_stats, nBubs_glb, epsilonb, charwidth, valmaxvoid

Also remove the now-dead PATTERNS entries in descriptions.py and update
the stale comments in fortran_gen.py (_emit_fluid_pp and _emit_lag_params).
Both emitters now walk the registry rather than hardcoding member lists.
bc_x/y/z%{vb1,vb2,vb3,ve1,ve2,ve3} are read from the namelist on rank 0
and consumed on all ranks by s_slip_wall/s_no_slip_wall in
src/common/m_boundary_common.fpp (~833-1155). These routines are compiled
into all three targets and are reached from pre_process (m_data_output,
m_perturbation) and post_process (m_start_up) code paths.

Without the broadcast, non-root ranks use uninitialised values for the
wall-velocity components, producing rank-dependent ghost cells for any
multi-rank wall-BC pre-process or post-process run. The sim residue has
always broadcast these; pre and post were missing them.

Add the matching broadcasts to both pre and post using the same nested
Fypp loop idiom as the simulation residue.
muscl_eps was excluded from broadcast generation via _BCAST_EXCLUDE on the
incorrect assumption that it is derived post-broadcast. The derivation
(in m_weno or m_muscl) only fires under f_is_default(muscl_eps), and
default values are assigned on rank 0 only. Every multi-rank MUSCL run
therefore had rank-divergent muscl_eps on non-root ranks. Remove it from
the exclusion set.

Tuple-set delta (var, mpi_type, count) vs. HEAD~:
  sim: +1 entry: (muscl_eps, mpi_p, 1)
  pre: no change
  post: no change

_emit_fluid_pp and _emit_lag_params now walk the registry instead of
maintaining hardcoded member lists. After Commit 1 deregistered the
dead members (mul0/ss/pv/gamma_v/M_v/mu_v/k_v/cp_v/D_v for fluid_pp;
T0/Thost/c0/rho0/x0 for lag_params), the registry now matches the
Fortran types exactly. Re(1) count=2 remains sim-only via an explicit
target check with a comment. G is walked as a regular REAL member.

Tests: 3 new tests added — muscl_eps now broadcast in sim, fluid_pp and
lag_params registry walks produce exactly the registered members minus
documented exclusions, dead members absent.
…idue broadcasts

pre_process sent the integer bc_x/y/z%beg/%end with mpi_p (an 8-byte real transfer over 4-byte integers - undefined behavior that happened to work by adjacency); simulation and post_process already used MPI_INTEGER. Also adds a test pinning the hand-written vb/ve and BC-code residue in pre/post so a merge conflict cannot silently drop them. Both from review.
@sbryngelson sbryngelson force-pushed the broadcast-correctness branch from 8eb6e18 to c2cd9c5 Compare June 12, 2026 02:55
@sbryngelson sbryngelson merged commit 6dec86a into MFlowCode:master Jun 12, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment