Symptom (reported by @danieljvickers, confirmed in CI run 27417122769 on a CI-files-only branch, i.e. master's code): test 70EC99CE (2D -> Example -> viscous_shock_tube) fails golden tolerance on Frontier with AMD flang + OpenMP offload. First failing variable is marginal (abs 1.38e-3 vs 1.0e-3 tolerance); the large relative errors sit on near-zero golden values inside the shock. The failure is deterministic — three repeat runs produce the identical candidate value (1.0118017002832).
Bisect evidence (real Frontier hardware, 3x repeats per ref):
Cross-backend evidence: the same test on the same refs PASSES under gpu-omp on NVIDIA (Phoenix), 3/3 both refs, and all CPU lanes are golden-green — so this is a bit-level FP-evaluation-ordering change specific to the AMD flang toolchain, within tolerance everywhere else and over the line on this one razor-margin case. Not wrong physics.
Leading hypothesis (if the midpoint passes): the #1556 module split introduced cross-module device-helper calls; cray_inline=True covers the Cray compiler, but AMD flang may decline to inline across modules, changing FMA/contraction order. Fix family: per-file inline flags for the riemann modules under LLVMFlang in cmake/MFCTargets.cmake (the per-file -Mnoinline NVHPC precedent exists), a flang-side inline hint in the GPU_ROUTINE macro, or — last resort — a per-backend tolerance for this case.
Open secondary question: the #1556 pre-merge CI had both Frontier AMD gpu-omp shards green, which implies the 2D -> Example -> * tests were not in those shards' selection — a GPU CI coverage gap worth confirming and possibly its own issue.
Will update with the midpoint attribution and the fix PR. Part of the post-refactor audit series (cf. #1579, #1580).
Symptom (reported by @danieljvickers, confirmed in CI run 27417122769 on a CI-files-only branch, i.e. master's code): test
70EC99CE(2D -> Example -> viscous_shock_tube) fails golden tolerance on Frontier with AMD flang + OpenMP offload. First failing variable is marginal (abs 1.38e-3 vs 1.0e-3 tolerance); the large relative errors sit on near-zero golden values inside the shock. The failure is deterministic — three repeat runs produce the identical candidate value (1.0118017002832).Bisect evidence (real Frontier hardware, 3x repeats per ref):
a221310b(pre-refactor-series master): PASS 3/3751db879(current master): FAIL 3/3, identical value each runb14def777(Phase-2 structural pilots: IC context type, unified BC dispatcher, boundary module split #1555, before the m_riemann_solvers split): in progress — will attribute to Phase-2 completion: post-process context types, m_riemann_solvers split, cleanup #1556 (split) vs Phase-2 structural pilots: IC context type, unified BC dispatcher, boundary module split #1555-or-earlier.Cross-backend evidence: the same test on the same refs PASSES under gpu-omp on NVIDIA (Phoenix), 3/3 both refs, and all CPU lanes are golden-green — so this is a bit-level FP-evaluation-ordering change specific to the AMD flang toolchain, within tolerance everywhere else and over the line on this one razor-margin case. Not wrong physics.
Leading hypothesis (if the midpoint passes): the #1556 module split introduced cross-module device-helper calls;
cray_inline=Truecovers the Cray compiler, but AMD flang may decline to inline across modules, changing FMA/contraction order. Fix family: per-file inline flags for the riemann modules underLLVMFlangincmake/MFCTargets.cmake(the per-file-MnoinlineNVHPC precedent exists), a flang-side inline hint in the GPU_ROUTINE macro, or — last resort — a per-backend tolerance for this case.Open secondary question: the #1556 pre-merge CI had both Frontier AMD gpu-omp shards green, which implies the
2D -> Example -> *tests were not in those shards' selection — a GPU CI coverage gap worth confirming and possibly its own issue.Will update with the midpoint attribution and the fix PR. Part of the post-refactor audit series (cf. #1579, #1580).