GPU acceleration support, Docker environment, and build fixes for H100/A100 #9
Open
ajithsureshtii wants to merge 22 commits into
…assignment on same machine
…ale troy dependency
…nt parallel build race
…ation and per-test options
…ents.sh and run_config.py
…l tests (default=0)
…ESS=0, throughput=24 processes)
Summary
This PR adds full GPU acceleration support for HPMPC on H100 and A100 machines, introduces a reproducible Docker-based development environment, and fixes several build issues encountered when enabling CUDA GEMM and ConvTriple GPU preprocessing.
Changes
Bug Fixes
- Makefile: Fixed hardcoded CUDA library path (`/opt/cuda` → `/usr/local/cuda/lib64`) for `CHEETAH_GPU=1` builds
- Makefile: Added `NVCC_LINKFLAGS` using `-Xlinker` rpath syntax. nvcc rejects `-Wl,` flags directly, which caused all ConvTriple/SEAL/OpenSSL symbols to be unresolved when linking CUDA executables with `USE_CUDA_GEMM`
- `core/cuda/conv_cutlass_int.cu`: Commented out the `uint16_t` explicit instantiation. CUTLASS `cp_async` on sm_80/sm_90 only supports 4/8/16-byte transfers, not 2-byte `uint16_t`
- `core/cuda/gemm_cutlass_int.cu`: Same fix for the GEMM kernel
New Files
- `docker/Dockerfile`: GPU-capable image based on `nvidia/cuda:12.9.1-devel-ubuntu24.04` with gcc-13 (required for C++20 `<format>`), all HPMPC/ConvTriple system dependencies, CUTLASS pre-cloned at `/cutlass`, and the `GPU_ARCHITECTURE` env var preset for troy-nova builds
- `docker-run.sh`: Convenience script to build and launch the container with bind-mount and GPU passthrough; supports the `--build`, `--gpus`, `--no-mount`, and `--image` flags
- `run_cases.sh`: Script to build and run the four standard test cases (CPU-only, GPU online, GPU preprocessing, full GPU), with per-case make and run logs in `logs/`
- `Step-by-step-instructions.md`: End-to-end setup guide covering Docker launch, submodule init, ConvTriple deps, CUDA GEMM build, and all four execution variants for both H100 (sm_90) and A100 (sm_80)
Test Cases
- CPU-only
- GPU online (`USE_CUDA_GEMM=2`)
- GPU preprocessing (`CHEETAH_GPU=1`)
- Full GPU (`CHEETAH_GPU=1 USE_CUDA_GEMM=2`)
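For reviewers who want to see how the four variants differ at build time, here is a minimal dry-run sketch. It only prints the make command for each case and does not build anything. The case slugs (`cpu-only`, `gpu-online`, and so on), the function names, and the bare `make` invocation with no target are assumptions for illustration; the actual `run_cases.sh` in this PR may be organized differently.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the build command for each of the four test cases.
# Hypothetical helper names; only the flag values come from this PR.

# Map a case name to its build flags (empty for the CPU-only case).
flags_for_case() {
  case "$1" in
    cpu-only)          echo "" ;;
    gpu-online)        echo "USE_CUDA_GEMM=2" ;;
    gpu-preprocessing) echo "CHEETAH_GPU=1" ;;
    full-gpu)          echo "CHEETAH_GPU=1 USE_CUDA_GEMM=2" ;;
    *)                 echo "unknown case: $1" >&2; return 1 ;;
  esac
}

# Print the make command for a case. $flags is left unquoted on purpose so
# that an empty flag set yields plain "make" and multiple flags stay
# separate words.
build_command() {
  local flags
  flags="$(flags_for_case "$1")" || return 1
  # shellcheck disable=SC2086
  echo make $flags
}

for c in cpu-only gpu-online gpu-preprocessing full-gpu; do
  printf '%-18s %s\n' "$c" "$(build_command "$c")"
done
```

Keeping the flag mapping in one function makes it easy to check that the GPU online case enables only the CUDA GEMM path and the GPU preprocessing case enables only the ConvTriple GPU path, with the full GPU case combining both.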