Add support for allgather nvls algorithms #817
Open
Empyreus wants to merge 25 commits into
Binyang2014 (Contributor) reviewed Jun 8, 2026
Pull request overview
This PR extends the MSCCL++ execution/runtime and Python language layer to enable an initial NVLS-based AllGather implementation, including a new NVLS “multicast store” executor op and a new environment knob to disable IB transport for NVLS-focused runs.
Changes:
- Add a new executor op `MULTI_STORE` (plan opcode `gstore`) and a device-side implementation to multicast/broadcast data via NVLS without reduction.
- Add `MSCCLPP_FORCE_DISABLE_IB` to force-disable IB transport selection/registration (useful for MNNVL/NVLink-centric setups).
- Update the Python SwitchChannel broadcast (`GroupStore`) JSON emission to the `src_buff`/`dst_buff` schema, and add an AllGather NVLS zero-copy test program.
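The `MSCCLPP_FORCE_DISABLE_IB` gating can be sketched as below. This is a minimal host-side sketch, not the code in `src/core/env.cpp` or `src/core/executor/executor.cc`: `SketchTransport`, `readBoolEnv`, `transportsToRegister`, and `selectPeerTransport` are hypothetical names, and the accepted spellings of "true" are an assumption about how the variable is parsed.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical transport tags for illustration only; MSCCL++ has its own Transport type.
enum class SketchTransport { CudaIpc, IB };

// Read an on/off environment variable; unset means defaultValue.
// The accepted "true" spellings are an assumption, not MSCCL++'s parsing rules.
bool readBoolEnv(const char* name, bool defaultValue) {
  const char* raw = std::getenv(name);
  if (raw == nullptr) return defaultValue;
  const std::string value(raw);
  return value == "1" || value == "true" || value == "TRUE";
}

// Transports used when registering local memory: CUDA IPC (NVLink) always,
// IB only if it has not been force-disabled.
std::vector<SketchTransport> transportsToRegister(bool forceDisableIb) {
  std::vector<SketchTransport> transports{SketchTransport::CudaIpc};
  if (!forceDisableIb) transports.push_back(SketchTransport::IB);
  return transports;
}

// Transport for a connection to a peer: NVLink-reachable peers use CUDA IPC;
// other peers fall back to IB, which is unavailable when IB is force-disabled.
// Returns false when no transport can be selected.
bool selectPeerTransport(bool nvlinkReachable, bool forceDisableIb, SketchTransport& out) {
  if (nvlinkReachable) {
    out = SketchTransport::CudaIpc;
    return true;
  }
  if (forceDisableIb) return false;
  out = SketchTransport::IB;
  return true;
}
```

A caller would read the flag once, for example with `readBoolEnv("MSCCLPP_FORCE_DISABLE_IB", false)`, and pass the same value to both helpers so that memory registration and connection setup agree on whether IB is in play.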
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/npkit/npkit_trace_generator.py | Extends NPKIT op list to include the new MULTI_STORE executor op. |
| src/core/include/execution_common.hpp | Adds OperationType::MULTI_STORE to the executor op enum. |
| src/core/include/execution_kernel.hpp | Implements the MULTI_STORE device op handler and wires it into the executor dispatch. |
| src/core/executor/execution_plan.cc | Adds plan opcode mapping for gstore → OperationType::MULTI_STORE. |
| src/core/executor/executor.cc | Adds IB-disable env gating to transport selection and local memory registration. |
| src/core/env.cpp | Reads/logs the new MSCCLPP_FORCE_DISABLE_IB env var. |
| include/mscclpp/env.hpp | Documents the new MSCCLPP_FORCE_DISABLE_IB environment option. |
| python/mscclpp/language/internal/operations.py | Updates GroupStore.to_dict() to emit src_buff/dst_buff for NVLS broadcast ops. |
| python/mscclpp/language/tests/single_node/allgather_nvls_zero_copy.py | Adds a single-node NVLS-based AllGather program for testing/benchmarking. |
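The `gstore` → `OperationType::MULTI_STORE` mapping in `execution_plan.cc` amounts to a string-to-enum lookup when the JSON plan is parsed. A sketch, assuming a lookup table of this shape; `SketchOperationType` and `opTypeFromPlanOpcode` are hypothetical names, and the real parser maps many more opcodes than the one shown.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative enum holding only the op type added by this PR.
enum class SketchOperationType { MULTI_STORE };

// Translate an opcode string from the JSON execution plan into an executor op type.
// Unknown opcodes yield std::nullopt so the caller can report a malformed plan.
std::optional<SketchOperationType> opTypeFromPlanOpcode(const std::string& opcode) {
  static const std::unordered_map<std::string, SketchOperationType> kOpcodes = {
      {"gstore", SketchOperationType::MULTI_STORE},  // the mapping this PR adds
  };
  const auto it = kOpcodes.find(opcode);
  if (it == kOpcodes.end()) return std::nullopt;
  return it->second;
}
```

Keeping the mapping in one table means a new op such as `MULTI_STORE` needs a single entry on the parsing side, plus the device handler and the NPKIT op-list update listed in the table.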
Binyang2014 approved these changes Jun 9, 2026
Binyang2014 (Contributor) left a comment:
LGTM, please address the Copilot review comments first.
Comment on lines +582 to +586 (Contributor, Author):
```cpp
// MULTI_STORE is a pure data-movement op: it broadcasts bytes from a local (unicast) source buffer
// to the NVLS multicast destination with no reduction. `multimem.st` writes raw register bits without
// any type conversion, so the data type is irrelevant here -- we move raw bytes using the widest
// available multimem store unit (16 -> 8 -> 4 bytes). This keeps the op fully type agnostic (works for
// any dtype, including uint8_t and FP8, on any arch that supports MULTI_STORE).
```
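The width selection described in that comment can be illustrated with a host-side sketch. It assumes the source and destination base addresses are 16-byte aligned, so only the offsets and the byte count decide the unit, and it uses `memcpy` in place of the device's multimem store; `pickStoreUnit` and `storeRawBytes` are hypothetical names, not functions from `execution_kernel.hpp`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widest-first candidate widths, mirroring the 16 -> 8 -> 4 byte order in the comment.
constexpr std::size_t kStoreUnits[] = {16, 8, 4};

// Return the widest unit that divides both offsets and the byte count, or 0 if even
// 4-byte units do not fit (that case is left unhandled in this sketch).
std::size_t pickStoreUnit(std::size_t srcOffset, std::size_t dstOffset, std::size_t bytes) {
  for (std::size_t unit : kStoreUnits) {
    if (srcOffset % unit == 0 && dstOffset % unit == 0 && bytes % unit == 0) return unit;
  }
  return 0;
}

// Copy `bytes` raw bytes one unit at a time and return the unit used (0 = nothing copied).
// memcpy stands in for one multimem store per unit; the element type never matters.
std::size_t storeRawBytes(const std::uint8_t* src, std::size_t srcOffset, std::uint8_t* dst,
                          std::size_t dstOffset, std::size_t bytes) {
  const std::size_t unit = pickStoreUnit(srcOffset, dstOffset, bytes);
  if (unit == 0) return 0;
  for (std::size_t i = 0; i < bytes; i += unit) {
    std::memcpy(dst + dstOffset + i, src + srcOffset + i, unit);
  }
  return unit;
}
```

Because each unit is copied as opaque bytes, nothing in the loop depends on whether the buffer holds fp16, bf16, `uint8_t`, or FP8 values, which is the property the comment relies on.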
caiomcbr reviewed Jun 17, 2026

- Add support for testing with NVLS by adding `MSCCLPP_FORCE_DISABLE_IB`, which disables IB for NVLS runs.
- Add the functions needed for the AllGather implementation.
- Implement an initial AllGather NVLS algorithm.
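One plausible reading of the zero-copy NVLS AllGather is that each rank broadcasts its input chunk straight into slot `rank` of every rank's output buffer through the multicast address, with no intermediate scratch buffer. The sketch below simulates that on the host under stated assumptions: `multicastStore` and `allGatherByMulticast` are hypothetical names, the multicast write is emulated by copying into every rank's buffer, and all ranks contribute chunks of equal size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Simulated multicast store: one write lands at the same offset in every rank's buffer,
// which is what an NVLS multicast address provides on real hardware.
void multicastStore(std::vector<Buffer>& outputs, std::size_t offset, const Buffer& data) {
  if (data.empty()) return;
  for (Buffer& out : outputs) {
    std::memcpy(out.data() + offset, data.data(), data.size());
  }
}

// AllGather by broadcast: rank r stores its chunk at offset r * chunkBytes through the
// multicast address. After all ranks finish (a barrier on real hardware), every output
// holds the chunks of ranks 0..n-1 in rank order.
std::vector<Buffer> allGatherByMulticast(const std::vector<Buffer>& inputs) {
  const std::size_t nRanks = inputs.size();
  const std::size_t chunkBytes = nRanks == 0 ? 0 : inputs[0].size();
  std::vector<Buffer> outputs(nRanks, Buffer(nRanks * chunkBytes, 0));
  for (std::size_t rank = 0; rank < nRanks; ++rank) {
    assert(inputs[rank].size() == chunkBytes);  // equal-size chunks are assumed
    multicastStore(outputs, rank * chunkBytes, inputs[rank]);
  }
  return outputs;
}
```

In this model each rank issues exactly one broadcast of its own chunk, and the NVSwitch (emulated here by the loop in `multicastStore`) fans it out, which is why no per-peer copies or scratch buffers appear.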
Results with allgather_nvls_zero_copy.py: