
support rocm7.2 #819

Open
Binyang2014 wants to merge 10 commits into main from binyli/rocm7

Binyang2014 (Contributor) commented Jun 17, 2026

This pull request adds ROCm 7.2 support across the build system, CI pipelines, Docker images, and documentation, and also improves ROCm FP8 type selection and CUDA IPC memory handle management. It updates dependencies and configurations for ROCm 7.2 compatibility, adds a CMake option (MSCCLPP_ROCM_USE_FNUZ_FP8) for selecting the native FP8 variant, and refines some benchmarking and internal memory-handling logic.
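The per-file summary describes the reworked IPC handle management in src/core/gpu_ipc_mem.cc as open/close caching built on shared_ptr/weak_ptr ownership. As a rough illustration of that ownership pattern only (not the PR's code), here is a Python analog. `open_fn` and `close_fn` are hypothetical callbacks standing in for runtime calls such as `hipIpcOpenMemHandle`/`hipIpcCloseMemHandle`. The cache holds only weak references, so repeated opens of the same handle share one mapping, and that mapping is closed exactly once when its last strong reference goes away.

```python
import weakref
from typing import Callable, Dict, Hashable


class IpcMapping:
    """Strong owner of one opened IPC mapping (plays the role of shared_ptr)."""

    def __init__(self, key: Hashable, ptr: int) -> None:
        self.key = key
        self.ptr = ptr


class IpcHandleCache:
    """Caches opened mappings by handle key through weak references only
    (plays the role of weak_ptr), so the cache never keeps a mapping alive."""

    def __init__(self, open_fn: Callable[[Hashable], int],
                 close_fn: Callable[[int], None]) -> None:
        self._open_fn = open_fn    # hypothetical stand-in for an IPC open call
        self._close_fn = close_fn  # hypothetical stand-in for an IPC close call
        self._cache: Dict[Hashable, weakref.ref] = {}

    def open(self, key: Hashable) -> IpcMapping:
        ref = self._cache.get(key)
        mapping = ref() if ref is not None else None
        if mapping is not None:
            # Still referenced elsewhere: share it instead of opening the handle again.
            return mapping
        ptr = self._open_fn(key)
        mapping = IpcMapping(key, ptr)
        # Close the mapping exactly once, when its last strong reference is dropped.
        weakref.finalize(mapping, self._release, key, ptr)
        self._cache[key] = weakref.ref(mapping)
        return mapping

    def _release(self, key: Hashable, ptr: int) -> None:
        # Drop the stale entry unless a newer mapping has already replaced it.
        ref = self._cache.get(key)
        if ref is not None and ref() is None:
            del self._cache[key]
        self._close_fn(ptr)
```

In C++ the same idea would typically store `std::weak_ptr` values in the map and put the close call in the `std::shared_ptr` deleter. Here the `weakref.finalize` callback fills that role.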

Please note: there is an issue with ROCm 7.2 (ROCm 7.2 user-space libraries on a ROCm 6.2 driver) when code executes in this order: allocate memory -> IPC communication -> allocate new memory -> free old memory.
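The problematic sequence keeps an old buffer alive while a new one is allocated after IPC exchange. The bench_collective.py change in the per-file summary is described only as freeing cases between iterations and synchronizing ranks. One plausible reading is to release the previous case's buffers and barrier across ranks before allocating the next case, so that "allocate new memory -> free old memory" never happens. The sketch below shows that reading. `allocate`, `free`, `barrier`, and `run` are hypothetical callables, not MSCCL++ APIs.

```python
from typing import Callable, Iterable, Optional


def run_cases(sizes: Iterable[int],
              allocate: Callable[[int], object],
              free: Callable[[object], None],
              barrier: Callable[[], None],
              run: Callable[[object], None]) -> None:
    """Run one benchmark case per size, never holding two cases' buffers at once."""
    previous: Optional[object] = None
    for size in sizes:
        if previous is not None:
            free(previous)   # release the old case first...
            previous = None
            barrier()        # ...and wait until every rank has done the same
        buf = allocate(size)  # only then allocate (and IPC-exchange) the new case
        run(buf)
        previous = buf
    if previous is not None:
        free(previous)
        barrier()
```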

github-advanced-security

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

Binyang2014 marked this pull request as ready for review June 23, 2026 17:57
Binyang2014 requested review from a team and Copilot June 23, 2026 18:00

Copilot AI (Contributor) left a comment

Pull request overview

This pull request expands MSCCL++ support for ROCm 7.2 across packaging, CI, Docker images, and docs, while also refining ROCm FP8 native-type selection and updating CUDA IPC handle lifecycle management to better accommodate ROCm IPC/mapping limits.

Changes:

  • Add ROCm 7.x Python extras/requirements and auto-detect ROCm major version during test deployment installs.
  • Extend CI (GitHub Actions CodeQL + Azure Pipelines) and Docker build targets to include ROCm 7.2 images and test runs.
  • Update ROCm FP8 selection to be controlled via a CMake option/compile definition; improve CUDA IPC handle caching/closing behavior and adjust fullmesh allreduce context/channel setup.
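test/deploy/setup.sh is described as auto-detecting the ROCm major version and installing the matching rocm6/rocm7 extra. The script itself is not shown here, so the following Python sketch only illustrates that selection logic under stated assumptions. It assumes the version string is read from `/opt/rocm/.info/version`, which ROCm installs typically provide but which may differ on your system. It also assumes that only majors 6 and 7 map to extras. `select_rocm_extra` and `detect_rocm_extra` are hypothetical names.

```python
import re
from pathlib import Path

# Extras named in this PR; other majors are rejected rather than guessed.
SUPPORTED_EXTRAS = {6: "rocm6", 7: "rocm7"}


def select_rocm_extra(version: str) -> str:
    """Map a ROCm version string such as '7.2.0-43' to a Python extra name."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"cannot parse ROCm version: {version!r}")
    major = int(match.group(1))
    try:
        return SUPPORTED_EXTRAS[major]
    except KeyError:
        raise ValueError(f"unsupported ROCm major version: {major}") from None


def detect_rocm_extra(version_file: str = "/opt/rocm/.info/version") -> str:
    # Assumed location of the installed ROCm version; adjust for your setup.
    return select_rocm_extra(Path(version_file).read_text())
```

The resulting name would then feed an install such as `pip install ".[rocm7]"` from the repository root.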

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Summary per file (file: description):

  • test/deploy/setup.sh: Auto-detect ROCm major version and install matching Python extra (rocm6/rocm7).
  • src/ext/collectives/include/allreduce/allreduce_fullmesh.hpp: Remove per-input channel cache member from the builder state.
  • src/ext/collectives/allreduce/allreduce_fullmesh.cu: Update fullmesh allreduce kernel launch bounds and rework context/channel initialization and keying.
  • src/core/registered_memory.cc: Strengthen CUDA IPC handle serialization validation and standardize unknown-transport error handling.
  • src/core/gpu_ipc_mem.cc: Refine runtime IPC open/close caching (HIP-focused) using shared_ptr/weak_ptr ownership.
  • python/requirements_rocm7.txt: Add ROCm 7 Python requirements set (incl. hip-python>=7,<8).
  • python/mscclpp_benchmark/tuner.py: Remove redundant reset() between correctness and timing in the tuning loop.
  • python/mscclpp_benchmark/correctness.py: Remove an extra pre-run barrier in correctness iterations.
  • python/mscclpp_benchmark/bench_collective.py: Add a ROCm 7.2 workaround to free cases between iterations and synchronize ranks.
  • pyproject.toml: Add rocm7 extra with ROCm 7-compatible hip-python dependency.
  • include/mscclpp/gpu_data_types.hpp: Make ROCm native FP8 alias selection depend on a build-controlled macro (FNUZ vs HIP default).
  • docs/quickstart.md: Document ROCm 7.2 docker tag and rocm7 install extra.
  • docker/build.sh: Add ROCm 7.2 base image target and related build metadata.
  • docker/base-dev-x.dockerfile: Adjust ROCm package installs and normalize extra selection parsing from TARGET.
  • CMakeLists.txt: Add MSCCLPP_ROCM_USE_FNUZ_FP8 option and update default ROCm arch list.
  • .github/workflows/codeql-analysis.yml: Run CodeQL for both rocm6.2 and rocm7.2 images.
  • .azure-pipelines/ut.yml: Add ROCm 7.2 container matrix entry for unit tests.
  • .azure-pipelines/templates/rccl-test.yml: Add ROCm scratch-reclaim workaround env var for RCCL tests/benchmarks.
  • .azure-pipelines/rccl-api-test.yml: Add ROCm 7.2 container matrix entry for RCCL API tests.
  • .azure-pipelines/codecov.yml: Add ROCm 7.2 container matrix entry for coverage runs.
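src/core/registered_memory.cc is described as strengthening CUDA IPC handle serialization validation and standardizing unknown-transport error handling. The actual wire format is not described on this page. The Python sketch below therefore uses a hypothetical layout: a 1-byte transport tag followed by a 64-byte handle, which is the size of `cudaIpcMemHandle_t`. It illustrates the general idea of checking the exact length before treating bytes as an IPC handle, and of raising one consistent error for any transport tag the reader does not recognize. `TAG_CUDA_IPC`, `UnknownTransportError`, and both functions are hypothetical names.

```python
from typing import Tuple

TAG_CUDA_IPC = 1        # hypothetical transport tag
IPC_HANDLE_SIZE = 64    # sizeof(cudaIpcMemHandle_t)


class UnknownTransportError(ValueError):
    """Single error type for any transport tag the reader does not recognize."""


def serialize_cuda_ipc(handle: bytes) -> bytes:
    if len(handle) != IPC_HANDLE_SIZE:
        raise ValueError(f"IPC handle must be {IPC_HANDLE_SIZE} bytes, got {len(handle)}")
    return bytes([TAG_CUDA_IPC]) + handle


def deserialize(data: bytes) -> Tuple[int, bytes]:
    if len(data) < 1:
        raise ValueError("buffer too short to contain a transport tag")
    tag, payload = data[0], data[1:]
    if tag == TAG_CUDA_IPC:
        # Validate the exact size before trusting the payload as an IPC handle.
        if len(payload) != IPC_HANDLE_SIZE:
            raise ValueError(f"truncated or oversized IPC handle: {len(payload)} bytes")
        return tag, payload
    # Every unrecognized tag goes through the same error path.
    raise UnknownTransportError(f"unknown transport tag: {tag}")
```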

Comment threads:

  • src/ext/collectives/allreduce/allreduce_fullmesh.cu
  • src/core/gpu_ipc_mem.cc (outdated)
  • include/mscclpp/gpu_data_types.hpp (outdated)
  • CMakeLists.txt
  • python/mscclpp_benchmark/bench_collective.py (outdated)
  • src/core/gpu_ipc_mem.cc
Binyang2014 and others added 3 commits June 23, 2026 16:58
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Labels: None yet

Projects: None yet


4 participants