support rocm7.2 #819
Open
Binyang2014 wants to merge 10 commits into
You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or because this pull request contains the workflow file for the Code Scanning tool.
What Enabling Code Scanning Means:
For more information about GitHub Code Scanning, check out the documentation.
Pull request overview
This pull request expands MSCCL++ support for ROCm 7.2 across packaging, CI, Docker images, and docs, while also refining ROCm FP8 native-type selection and updating CUDA IPC handle lifecycle management to better accommodate ROCm IPC/mapping limits.
Changes:
- Add ROCm 7.x Python extras/requirements and auto-detect ROCm major version during test deployment installs.
- Extend CI (GitHub Actions CodeQL + Azure Pipelines) and Docker build targets to include ROCm 7.2 images and test runs.
- Update ROCm FP8 selection to be controlled via a CMake option/compile definition; improve CUDA IPC handle caching/closing behavior and adjust fullmesh allreduce context/channel setup.
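Below is a minimal sketch of the version-to-extra mapping that `test/deploy/setup.sh` is described as performing. It is written in Python for illustration rather than shell. It assumes the ROCm version can be read from `/opt/rocm/.info/version` and that only the `rocm6` and `rocm7` extras exist. The function names are hypothetical and are not taken from the PR.

```python
import re
from typing import Optional

# Assumption: only these majors have matching extras (rocm6 / rocm7).
SUPPORTED_MAJORS = (6, 7)


def rocm_extra_for_version(version: str) -> str:
    """Map a ROCm version string such as "7.2.0" to a pip extra name such as "rocm7"."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"unrecognized ROCm version string: {version!r}")
    major = int(match.group(1))
    if major not in SUPPORTED_MAJORS:
        raise ValueError(f"unsupported ROCm major version: {major}")
    return f"rocm{major}"


def detect_rocm_extra(version_file: str = "/opt/rocm/.info/version") -> Optional[str]:
    """Return the extra for the installed ROCm, or None if the (assumed) version file is missing."""
    try:
        with open(version_file) as f:
            return rocm_extra_for_version(f.read())
    except FileNotFoundError:
        return None
```

Once the extra name is known, the install step would look something like `pip install ".[rocm7]"`.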
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/deploy/setup.sh | Auto-detect ROCm major version and install matching Python extra (rocm6/rocm7). |
| src/ext/collectives/include/allreduce/allreduce_fullmesh.hpp | Remove per-input channel cache member from the builder state. |
| src/ext/collectives/allreduce/allreduce_fullmesh.cu | Update fullmesh allreduce kernel launch bounds and rework context/channel initialization & keying. |
| src/core/registered_memory.cc | Strengthen CUDA IPC handle serialization validation and standardize unknown-transport error handling. |
| src/core/gpu_ipc_mem.cc | Refine runtime IPC open/close caching (HIP-focused) using shared_ptr/weak_ptr ownership. |
| python/requirements_rocm7.txt | Add ROCm 7 Python requirements set (incl. hip-python>=7,<8). |
| python/mscclpp_benchmark/tuner.py | Remove redundant reset() between correctness and timing in the tuning loop. |
| python/mscclpp_benchmark/correctness.py | Remove an extra pre-run barrier in correctness iterations. |
| python/mscclpp_benchmark/bench_collective.py | Add a ROCm 7.2 workaround to free cases between iterations and synchronize ranks. |
| pyproject.toml | Add rocm7 extra with ROCm 7-compatible hip-python dependency. |
| include/mscclpp/gpu_data_types.hpp | Make ROCm native FP8 alias selection depend on a build-controlled macro (FNUZ vs HIP default). |
| docs/quickstart.md | Document ROCm 7.2 docker tag and rocm7 install extra. |
| docker/build.sh | Add ROCm 7.2 base image target and related build metadata. |
| docker/base-dev-x.dockerfile | Adjust ROCm package installs and normalize extra selection parsing from TARGET. |
| CMakeLists.txt | Add MSCCLPP_ROCM_USE_FNUZ_FP8 option and update default ROCm arch list. |
| .github/workflows/codeql-analysis.yml | Run CodeQL for both rocm6.2 and rocm7.2 images. |
| .azure-pipelines/ut.yml | Add ROCm 7.2 container matrix entry for unit tests. |
| .azure-pipelines/templates/rccl-test.yml | Add ROCm scratch-reclaim workaround env var for RCCL tests/benchmarks. |
| .azure-pipelines/rccl-api-test.yml | Add ROCm 7.2 container matrix entry for RCCL API tests. |
| .azure-pipelines/codecov.yml | Add ROCm 7.2 container matrix entry for coverage runs. |
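The `gpu_ipc_mem.cc` row describes caching opened IPC handles with `shared_ptr`/`weak_ptr` ownership. The sketch below is a Python analog of that pattern, not the MSCCL++ code. The cache holds only weak references, so each handle is opened once while any user still holds the mapping, and it is closed exactly once when the last user drops it. `IpcMemCache`, `MappedIpcMem`, and the `open_fn`/`close_fn` callbacks are hypothetical stand-ins for the HIP open/close calls.

```python
import weakref
from typing import Callable, Hashable


class MappedIpcMem:
    """Hypothetical handle to a peer buffer mapped from an IPC handle."""

    def __init__(self, key: Hashable, ptr: int) -> None:
        self.key = key
        self.ptr = ptr


class IpcMemCache:
    """Opens each IPC handle at most once while it is in use (weak-reference cache)."""

    def __init__(self, open_fn: Callable[[Hashable], int], close_fn: Callable[[int], None]) -> None:
        self._open = open_fn
        self._close = close_fn
        # Values are held weakly, like std::weak_ptr: the cache never keeps a mapping alive.
        self._cache = weakref.WeakValueDictionary()

    def get(self, key: Hashable) -> MappedIpcMem:
        mem = self._cache.get(key)
        if mem is None:
            ptr = self._open(key)
            mem = MappedIpcMem(key, ptr)
            # Like a shared_ptr deleter: close once, when the last strong reference dies.
            # (CPython reference counting runs this promptly; other runtimes may defer it.)
            weakref.finalize(mem, self._close, ptr)
            self._cache[key] = mem
        return mem
```

In C++ the same idea is typically an `std::unordered_map<Key, std::weak_ptr<T>>` whose entries are `lock()`ed on lookup, with the close call placed in the `shared_ptr` deleter. Reusing a live mapping instead of reopening the same handle is presumably what keeps the code within the ROCm IPC/mapping limits mentioned in the overview.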
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This pull request introduces support for ROCm 7.2 across the build system, CI pipelines, Docker images, and documentation, while also improving ROCm FP8 type selection and CUDA IPC memory handle management. It updates dependencies and configurations to ensure compatibility with ROCm 7.2, adds new options for native FP8 variants, and refines some benchmarking and internal memory handling logic.
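The native FP8 alias choice (`MSCCLPP_ROCM_USE_FNUZ_FP8`, FNUZ vs. the HIP default) matters because the two common E4M3 conventions interpret the same byte differently. The sketch below decodes one E4M3 byte under each convention:

- OCP E4M3FN: exponent bias 7, NaN at `S.1111.111`, no infinities.
- E4M3FNUZ: exponent bias 8, a single NaN at `0x80`, no negative zero.

This is a format illustration in Python, not the HIP or MSCCL++ type definitions. It makes no assumption about which variant the HIP default resolves to on a given GPU.

```python
def decode_e4m3(byte: int, fnuz: bool) -> float:
    """Decode an 8-bit E4M3 value as OCP E4M3FN (fnuz=False) or E4M3FNUZ (fnuz=True)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF  # 4 exponent bits
    mant = byte & 0x7        # 3 mantissa bits
    if fnuz:
        if byte == 0x80:
            return float("nan")  # FNUZ: the negative-zero pattern is the only NaN
        bias = 8
    else:
        if exp == 0xF and mant == 0x7:
            return float("nan")  # FN: S.1111.111 is NaN; there are no infinities
        bias = 7
    if exp == 0:
        return sign * (mant / 8.0) * 2.0 ** (1 - bias)  # subnormal
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - bias)
```

For example, `decode_e4m3(0x38, fnuz=False)` gives `1.0` while `decode_e4m3(0x38, fnuz=True)` gives `0.5`. For every byte pattern that is finite in both formats, the FNUZ value is exactly half the E4M3FN value. A sender and receiver that disagree on the alias would therefore silently scale data by a factor of two.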
Please note: there is an issue with ROCm 7.2 (ROCm 7.2 user-space libraries running on a ROCm 6.2 driver) when code executes in this order: allocate memory -> IPC communication -> allocate new memory -> free old memory.
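The problematic sequence in the note above can be stated precisely as a check over a recorded trace of allocation events. The helper below is hypothetical, and the event names `alloc`, `ipc`, and `free` are made up for illustration. It only detects the ordering. It does not reproduce or explain the underlying driver issue.

```python
from typing import Iterable, Set, Tuple


def has_risky_order(events: Iterable[Tuple[str, str]]) -> bool:
    """Return True if the trace contains: alloc A -> ipc A -> alloc B -> free A."""
    shared: Set[str] = set()    # live buffers that have been exchanged via IPC
    outlived: Set[str] = set()  # shared buffers that saw a newer allocation before being freed
    for op, buf in events:
        if op == "alloc":
            # Every live shared buffer now has a newer allocation after it.
            outlived.update(b for b in shared if b != buf)
        elif op == "ipc":
            shared.add(buf)
        elif op == "free":
            if buf in outlived:
                return True
            shared.discard(buf)
        else:
            raise ValueError(f"unknown event: {op!r}")
    return False
```

Freeing the old buffer before allocating the new one makes the check pass. This appears consistent with the `bench_collective.py` workaround in this PR, which frees cases between iterations and synchronizes ranks.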