Skip to content

1454-projection-optimization #1549

Open
danieljvickers wants to merge 26 commits into
MFlowCode:masterfrom
danieljvickers:1454-projection-optimization
Open

1454-projection-optimization #1549
danieljvickers wants to merge 26 commits into
MFlowCode:masterfrom
danieljvickers:1454-projection-optimization

Conversation

@danieljvickers

@danieljvickers danieljvickers commented Jun 9, 2026

Copy link
Copy Markdown
Member

Description

In many-particle cases, the limiting factor in the IBM compute is by far the time it takes to project the immersed boundaries onto the grid. This is fundamentally rooted in how we parallelize the work. The current code parallelizes of x, y, and z, but sequentially iterates through the IB patches. In cases where there are many IBs that are small, we are launching several (thousands) of GPU kernels each time step, but each kernel only works on hundreds of grid cells. Adding an option to parallelize over the thousands of particles should be significantly larger parallelism and thus optimize the projection.

In order to maintain the functionality of both parallelism methods, I need a separate set of geometry bounding checks. Since we perform a check in icpp patches and now 2 in IB patches, this is a significant amount of redundant code that must be maintained. To be somewhat forward-looking, I opted to merge all geometry checking into a single module that can be called from both forms of IB parallelism and the icpp pre_processing code. This should clean up the code nicely and significantly reduce code maintenance going forward. Since we can now change cylinder orientation with angles, I also deprecate the unneeded cylinder length checks.

The end result is the creation of a new module in common, the deletion of duplicate code, and a new parallelism path for IB patches when utilizing GPU compute.

Closes #1454, #1532, #1543

Type of change (delete unused ones)

  • New feature
  • Refactor

Testing

All tests pass on GNU compiler

Checklist

Check these like this [x] to indicate which of the below applies.

  • I added or updated tests for new behavior
  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not retriggered automatically. To request a review, comment on the PR:

  • @claude full review — Claude full review (also triggers on PR open/reopen/ready)
  • Or add label claude-full-review — Claude full review via label

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Claude Code Review

Head SHA: 4f88e03

Files changed:

  • 13
  • src/simulation/m_ib_patches.fpp
  • src/common/m_patch_geometries.fpp
  • src/pre_process/m_icpp_patches.fpp
  • src/simulation/m_ibm.fpp
  • src/simulation/m_checker.fpp
  • src/simulation/m_global_parameters.fpp
  • src/pre_process/m_global_parameters.fpp
  • src/post_process/m_global_parameters.fpp
  • toolchain/mfc/params/definitions.py
  • toolchain/mfc/params/descriptions.py

Findings

1. Uninitialized k writes in 2D section of s_apply_ib_patches_grid_cell_parallelism — silent wrong-index bug

In src/simulation/m_ib_patches.fpp, the 2D path (else if (num_dims == 2) then) only loops over j and i; k is never assigned. Three geometry handlers write to ib_markers%sf(i, j, k) with that uninitialized value instead of ib_markers%sf(i, j, 0):

  • Geometry 2 (circle): if (f_is_inside_cylinder(...)) ib_markers%sf(i, & j, k) = encoded_patch_id
  • Geometry 3 (rectangle): if (f_is_inside_cuboid(...)) ib_markers%sf(i, j, & k) = encoded_patch_id
  • Geometry 5 (STL model): ib_markers%sf(i, j, k) = encoded_patch_id

Geometries 4 and 6 in the same block correctly use ib_markers%sf(i, j, 0), confirming the inconsistency is a refactoring bug.


2. Uninitialized k writes in all 2D geometries of s_apply_ib_patches_ib_parallelism — silent wrong-index bug

All five geometry branches in the 2D section of s_apply_ib_patches_ib_parallelism write ib_markers%sf(i, j, k) where k is undeclared/uninitialized (the 2D loop only iterates over j and i). All five should use index 0 for the third dimension in 2D:

  • Geometry 2: ib_markers%sf(i, & j, k)
  • Geometry 3: ib_markers%sf(i, j, & k)
  • Geometry 4: ib_markers%sf(i, j, k)
  • Geometry 5: ib_markers%sf(i, j, k)
  • Geometry 6: ib_markers%sf(i, j, & k)

3. Nested GPU_PARALLEL_LOOP and mismatched END_GPU_PARALLEL_LOOP in 2D section of s_apply_ib_patches_ib_parallelism — GPU build failure

The 2D path emits two GPU_PARALLEL_LOOP macros but only one END_GPU_PARALLEL_LOOP, and the outer one wraps the inner one (nesting is forbidden per common-pitfalls.md):

$:GPU_PARALLEL_LOOP(private='[xp, yp, patch_id, center]', collapse=3)
do xp = xp_lower, xp_upper
    do yp = yp_lower, yp_upper
        $:GPU_PARALLEL_LOOP(private='[xp, yp, i, il, ir, ...]')   ! nested — invalid
        do patch_id = 1, num_ibs
            ...
        end do
        $:END_GPU_PARALLEL_LOOP()   ! closes inner only; outer is never closed
    end do
end do

The outer directive also specifies collapse=3 but only two explicit loops (xp, yp) appear before the nested directive begins. The 3D section of the same subroutine correctly places a single GPU_PARALLEL_LOOP/END_GPU_PARALLEL_LOOP around only the patch_id loop without wrapping the periodicity loops.


4. Undeclared bounding_box_corner_distance in GPU_PARALLEL_LOOP private clause — compilation error

In s_apply_ib_patches_ib_parallelism the inner GPU_PARALLEL_LOOP private list references bounding_box_corner_distance:

$:GPU_PARALLEL_LOOP(private='[xp, yp, i, il, ir, j, jl, jr, xyz_local, length, radius, &
                    & bounding_box_corner_distance, patch_id, airfoil_id, ...]')

This variable is not declared anywhere in the subroutine and will cause a compilation error.


5. get_indices_from_bounds lacks GPU_ROUTINE pragma — GPU build failure

get_bounding_indices is decorated with $:GPU_ROUTINE(parallelism='[seq]') and called from inside GPU_PARALLEL_LOOP regions. It calls get_indices_from_bounds:

call get_indices_from_bounds(bbox_min(1), bbox_max(1), x_cc, il, ir)
call get_indices_from_bounds(bbox_min(2), bbox_max(2), y_cc, jl, jr)
if (num_dims == 3) call get_indices_from_bounds(bbox_min(3), bbox_max(3), z_cc, kl, kr)

get_indices_from_bounds has no $:GPU_ROUTINE pragma. In OpenACC and OpenMP offload, every routine called from a device routine must itself be compiled as a device routine. The missing pragma will cause a GPU build failure.

@danieljvickers danieljvickers marked this pull request as ready for review June 10, 2026 16:19
@danieljvickers

Copy link
Copy Markdown
Member Author

All AI review comments are now irrelevant because of significant changes that have occurred between now and that review.

I doubt that this will pass the test suite on the first try, but on the off chance that it does we should not yet merge it. A data product of computational optimization performance should be presented before this PR is merged. Otherwise, it is unnecessary refactoring of the code.

@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 61.15385% with 101 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.67%. Comparing base (ac5774e) to head (ac1681a).

Files with missing lines Patch % Lines
src/simulation/m_ib_patches.fpp 63.08% 58 Missing and 21 partials ⚠️
src/common/m_patch_geometries.fpp 52.77% 9 Missing and 8 partials ⚠️
src/pre_process/m_icpp_patches.fpp 50.00% 2 Missing and 2 partials ⚠️
src/common/m_model.fpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1549      +/-   ##
==========================================
- Coverage   60.94%   60.67%   -0.27%     
==========================================
  Files          82       83       +1     
  Lines       19922    19852      -70     
  Branches     2924     2944      +20     
==========================================
- Hits        12141    12045      -96     
- Misses       5805     5814       +9     
- Partials     1976     1993      +17     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@danieljvickers

Copy link
Copy Markdown
Member Author
runtime_comparison Successful runtime comparison on Frontier. This plot shows run times for the s_apply_ib_patches subroutines.

In this test, I ran with 8k particles being projected onto the grid. Master is the current master branch (before this is merged) as a baseline of performance. The other two plots show the two types of parallelism now available from this branch.

the grid parallelism is the current Master branch functionality. This demonstrates that the performance is equivalent. The IB parallelism is the new parallelism introduced by this feature. In this particular case of 8k sphere with only about 300 grid cells per sphere, there was a 150x performance improvement over the previous parallelism. This this subroutine is called 3x per time step in the test, this is a 327 ms time reduction PER TIME STEP.

This plot should demonstrate that this feature is useful and should be merged. I will now move on to resolving any remaining issues with the test suite and hopefully getting this on master.

Resolves conflicts with the m_global_parameters_common refactor:
- post_process m_global_parameters: keep master's relocation of shear/proc
  vars to m_global_parameters_common; keep PR's ib_airfoil/ib_airfoil_grids
  stubs needed by the new common m_patch_geometries module
- simulation m_global_parameters: keep master's GPU_DECLARE (ib, num_ibs,
  ib_coefficient_of_friction now declared in m_global_parameters_common);
  relocate the PR's stl_models GPU_DECLARE to the TYPED_DECLS gpu flag in
  toolchain/mfc/params/definitions.py (auto-generated for sim)
@sbryngelson

Copy link
Copy Markdown
Member

I've pushed a merge of master into this branch (ac1681a, a true merge commit — no history rewritten) to resolve the conflicts from the recently-merged global-parameters/registry refactors (#1550#1556). What changed and why:

src/post_process/m_global_parameters.fpp — master moved shear_*, proc_coords, start_idx into the new shared m_global_parameters_common; kept your ib_airfoil/ib_airfoil_grids stubs that the new common m_patch_geometries module needs to compile for post_process.

src/simulation/m_global_parameters.fpp — the GPU_DECLARE lines for ib, num_ibs, and ib_coefficient_of_friction now live in m_global_parameters_common, so the manual list reduces to ib_airfoil_grids. Your addition of stl_models to the GPU declare list was relocated to the new parameter pipeline: derived-type declarations (and their sim-side GPU_DECLARE) are auto-generated from the TYPED_DECLS table in toolchain/mfc/params/definitions.py — I flipped the stl_models gpu flag there to True, which emits the same declare in the generated decls.

Heads-up for future commits on this branch: Fortran parameter declarations, namelist bindings, and MPI broadcasts are now generated from toolchain/mfc/params/definitions.py at build time (see docs/documentation/contributing.md). Your many_ib_patch_parallelism registration came through cleanly and its broadcast is auto-generated — no action needed.

Verification on my end: ./mfc.sh format, ./mfc.sh precheck, the toolchain pytest suite (342 passed), and a full CPU build of all three targets all pass on the merged tree. Nothing remains on your side; CI should run on the updated head.

@danieljvickers

Copy link
Copy Markdown
Member Author

I am very glad you have merged this for me because I screwed up the previous one significantly, lol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Projection Optimization

2 participants