Fix silent viscosity loss on AMD flang GPU (host-capture Re_size in the Riemann solvers) by sbryngelson · Pull Request #1588 · MFlowCode/MFC

sbryngelson · 2026-06-13T16:20:35Z

Closes #1587. (The branch name is stale — this no longer makes Re_size allocatable; see below.)

Summary

Fixes silently-wrong viscous results on the Frontier AMD-flang OpenMP-offload GPU build — the 2D_viscous_shock_tube regression introduced by #1556. 3 files, solver-only; Re_size stays static.

Root cause

On AMD flang OpenMP offload, a cross-translation-unit read of the static declare target Re_size inside a Riemann-solver kernel is unreliable and codegen-dependent: some solver kernels read it as 0, so if (Re_size(i) > 0) is false, Re_L/Re_R collapse to dflt_real, the interface Reynolds number becomes 1e16, and viscosity is silently switched off. CPU and NVIDIA (nvfortran) builds are unaffected — verified across all 14 NVHPC CI lanes.

Why not "make `Re_size` allocatable" (the previous version of this PR)

That approach only moved the bug. On AMD flang it fixed HLL/HLLC but broke the LF (riemann_solver=5) and IBM viscous paths — they pass on master and failed with the allocatable change (an inversion: static breaks some solver kernels, allocatable breaks others). The defect is per-kernel codegen, not a clean static-vs-allocatable rule, so no choice of storage class is reliable for every kernel. (Confirmed on hardware: per-solver GPU_UPDATE workarounds also only fixed some kernels.)

The fix

Keep Re_size static; in each of s_hll/hllc/lf_riemann_solver, capture it into a host-set local Re_size_loc and firstprivate that value into the viscous kernels, so they never read Re_size from declare-target device memory:

integer, dimension(2) :: Re_size_loc   ! host copy
...
Re_size_loc = Re_size
$:GPU_PARALLEL_LOOP(collapse=3, firstprivate='[Re_size_loc]', private='[...]')
...
do q = 1, Re_size_loc(i)   ! was Re_size(i)

Removing the declare-target read removes the variable the compiler bug acts on, so it's immune regardless of which kernel or solver.

Verification (Frontier AMD flang, gpu-omp)

All five previously-affected viscous tests pass with this fix:
70EC99CE viscous_shock_tube, 8EAC3DA7 1D viscous canary, 27B24CD9 + 4C812637 LF-viscous (the two the allocatable approach broke), B0CE19C5 IBM-viscous. format + precheck clean.

Upstream

This is a workaround for an amdflang bug, filed with a standalone minimal reproducer: ROCm/llvm-project#2890, llvm/llvm-project#203711 (still reproduces on the newest amdflang, AFAR 23.2.0 / LLVM 23). The always-run viscous-GPU canary in #1590 is what surfaced the inversion before merge.

Copilot

Pull request overview

This PR addresses an AMD flang OpenMP-offload GPU correctness regression (post-#1556 module split) where Re_size could remain zero on-device in solver translation units, effectively disabling viscous terms without an error. The fix makes Re_size allocatable so it is handled through the OpenMP runtime mapping machinery (consistent with Re_idx / Res_gs) and thus behaves correctly across translation units.

Changes:

Change Re_size from a static integer, dimension(2) to integer, allocatable, dimension(:).
Allocate Re_size(1:2) during global-parameters initialization before first use and keep initialization to zero.
Deallocate Re_size in the module finalization path.

codecov · 2026-06-13T18:26:50Z

Codecov Report

❌ Patch coverage is 42.85714% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.94%. Comparing base (ac5774e) to head (ecf7f22).
⚠️ Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
src/simulation/m_riemann_solver_hllc.fpp	50.00%	3 Missing and 2 partials ⚠️
src/simulation/m_riemann_solver_lf.fpp	42.85%	0 Missing and 4 partials ⚠️
src/simulation/m_riemann_solver_hll.fpp	25.00%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1588   +/-   ##
=======================================
  Coverage   60.94%   60.94%           
=======================================
  Files          82       82           
  Lines       19922    19925    +3     
  Branches     2924     2924           
=======================================
+ Hits        12141    12144    +3     
  Misses       5805     5805           
  Partials     1976     1976

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…host-capture (firstprivate) The declare-target-roulette case shows two identical declare-target arrays disagreeing in one kernel based only on the push mechanism (enter data map vs update to), verified on ROCm 7.2.0 and AFAR drop 23.2.0. Also corrects declare-target-static-tu's Fix section: making the variable allocatable is NOT reliable in the full app (it only moves the bug -- an inversion), the robust fix is host-capture + firstprivate (MFlowCode/MFC#1588).

…plete Riemann private lists) AMD flang reads the static declare-target Re_size stale across translation units, silently disabling viscosity. Host-capture it into Re_size_loc and firstprivate it into the Riemann kernels for all compilers. That firstprivate clause exposed a latent bug: several per-cell scalars (s_M/s_P/xi_M/xi_P in HLL, c_sum_Yi_Phi/flux_ene_e in HLLC, Gamm_L/Gamm_R/flux_tau_L/flux_tau_R in LF) were omitted from private() and relied on CCE-19's defaultmap(firstprivate:scalar) auto-privatization, which the firstprivate clause disrupts (gross wrong physics on Cray). Complete the private lists -- a no-op on every compiler -- so firstprivate is safe everywhere; no compiler gate needed. Verified on Frontier: AMD-flang 29/29 and Cray-CCE (rs=1/2/5) of the previously-failing cases pass.

The host-capture firstprivate(Re_size_loc) added to the LF/HLL/HLLC solvers (the AMD-flang viscosity fix) pushed the longest GPU_PARALLEL_LOOP directive past nvfortran's ~1000-char source-line limit (1011 chars -> 'source line too long' / unbalanced parentheses; AMD and Cray accept the long line, nvfortran caps it). FOLD_DIRECTIVE wraps the assembled directive at whole-clause boundaries with repeated sentinels (!$acc&/!$omp&), so the longest emitted line is prefix+longest-single-clause (817 chars on LF) -- shorter than the single line a build with one fewer clause already compiles. firstprivate is preserved (AMD/Cray correctness) on its own continuation. Validated: nvfortran 25.5 -acc and -mp=gpu compile+run the folded marker-interleaved continuation correctly; the cpp line-markers between continuations are already emitted by master for regular Fortran continuations and compile on Cray/AMD CI. Short directives (<200 chars) are unchanged.

Copilot AI review requested due to automatic review settings June 13, 2026 16:20

Copilot started reviewing on behalf of sbryngelson June 13, 2026 16:20 View session

sbryngelson mentioned this pull request Jun 13, 2026

Harden AMD gpu-omp CI coverage + lint against silent declare-target regressions (follow-up to #1588) #1589

Open

Copilot AI reviewed Jun 13, 2026

View reviewed changes

sbryngelson mentioned this pull request Jun 13, 2026

test: always-run canary smoke set spanning physics modules #1590

Open

sbryngelson force-pushed the fix-amdflang-re-size-allocatable branch from baf0884 to ec9c371 Compare June 13, 2026 19:49

sbryngelson changed the title ~~Make Re_size allocatable: fix silent viscosity loss on AMD flang GPU~~ Fix silent viscosity loss on AMD flang GPU (host-capture Re_size in the Riemann solvers) Jun 13, 2026

sbryngelson force-pushed the fix-amdflang-re-size-allocatable branch 2 times, most recently from dd1eef1 to 203ba70 Compare June 14, 2026 02:45

sbryngelson force-pushed the fix-amdflang-re-size-allocatable branch from 203ba70 to 0d995d5 Compare June 14, 2026 02:59

sbryngelson added 2 commits June 13, 2026 23:50

macros: avoid '...' placeholder in FOLD_DIRECTIVE comment (source lint)

ecf7f22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix silent viscosity loss on AMD flang GPU (host-capture Re_size in the Riemann solvers)#1588

Fix silent viscosity loss on AMD flang GPU (host-capture Re_size in the Riemann solvers)#1588
sbryngelson wants to merge 3 commits into
masterfrom
fix-amdflang-re-size-allocatable

sbryngelson commented Jun 13, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

codecov Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Conversation

sbryngelson commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Why not "make Re_size allocatable" (the previous version of this PR)

The fix

Verification (Frontier AMD flang, gpu-omp)

Upstream

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

codecov Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

sbryngelson commented Jun 13, 2026 •

edited

Loading

Why not "make `Re_size` allocatable" (the previous version of this PR)

codecov Bot commented Jun 13, 2026 •

edited

Loading