Frontier AMD (flang, gpu-omp): viscous_shock_tube fails golden tolerance deterministically on master

**Symptom** (reported by @danieljvickers, confirmed in CI run 27417122769 on a CI-files-only branch, i.e. master's code): test `70EC99CE` (2D -> Example -> viscous_shock_tube) fails golden tolerance on Frontier with AMD flang + OpenMP offload. First failing variable is marginal (abs 1.38e-3 vs 1.0e-3 tolerance); the large relative errors sit on near-zero golden values inside the shock. The failure is **deterministic** — three repeat runs produce the identical candidate value (1.0118017002832).

**Bisect evidence (real Frontier hardware, 3x repeats per ref):**
- `a221310b` (pre-refactor-series master): **PASS 3/3**
- `751db879` (current master): **FAIL 3/3**, identical value each run
- Midpoint `b14def777` (#1555, before the m_riemann_solvers split): in progress — will attribute to #1556 (split) vs #1555-or-earlier.

**Cross-backend evidence:** the same test on the same refs PASSES under gpu-omp on NVIDIA (Phoenix), 3/3 both refs, and all CPU lanes are golden-green — so this is a bit-level FP-evaluation-ordering change specific to the AMD flang toolchain, within tolerance everywhere else and over the line on this one razor-margin case. Not wrong physics.

**Leading hypothesis** (if the midpoint passes): the #1556 module split introduced cross-module device-helper calls; `cray_inline=True` covers the Cray compiler, but AMD flang may decline to inline across modules, changing FMA/contraction order. Fix family: per-file inline flags for the riemann modules under `LLVMFlang` in `cmake/MFCTargets.cmake` (the per-file `-Mnoinline` NVHPC precedent exists), a flang-side inline hint in the GPU_ROUTINE macro, or — last resort — a per-backend tolerance for this case.

**Open secondary question:** the #1556 pre-merge CI had both Frontier AMD gpu-omp shards green, which implies the `2D -> Example -> *` tests were not in those shards' selection — a GPU CI coverage gap worth confirming and possibly its own issue.

Will update with the midpoint attribution and the fix PR. Part of the post-refactor audit series (cf. #1579, #1580).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Frontier AMD (flang, gpu-omp): viscous_shock_tube fails golden tolerance deterministically on master #1587

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Frontier AMD (flang, gpu-omp): viscous_shock_tube fails golden tolerance deterministically on master #1587

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions