
GPU acceleration support, Docker environment, and build fixes for H100/A100 #9

Open
ajithsureshtii wants to merge 22 commits into chart21:aby2-merge from ajithsureshtii:aby2-merge


@ajithsureshtii

Summary

This PR adds full GPU acceleration support for HPMPC on H100 and A100 machines, introduces a reproducible Docker-based development environment, and fixes several build issues encountered when enabling CUDA GEMM and ConvTriple GPU preprocessing.

Changes

Bug Fixes

  • Makefile: Fixed the hardcoded CUDA library path (/opt/cuda → /usr/local/cuda/lib64) for CHEETAH_GPU=1 builds
  • Makefile: Added NVCC_LINKFLAGS using -Xlinker rpath syntax. nvcc rejects -Wl, flags passed directly, which left all ConvTriple/SEAL/OpenSSL symbols unresolved when linking CUDA executables with USE_CUDA_GEMM
  • core/cuda/conv_cutlass_int.cu: Commented out the uint16_t explicit instantiation, because CUTLASS cp_async on sm_80/sm_90 only supports 4-, 8-, and 16-byte transfers, not 2-byte uint16_t transfers
  • core/cuda/gemm_cutlass_int.cu: Same fix for the GEMM kernel
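The -Wl, to -Xlinker translation behind the NVCC_LINKFLAGS fix can be sketched as a small flag rewriter. This is a hypothetical Python helper (`wl_to_xlinker` is not part of the PR, whose Makefile defines NVCC_LINKFLAGS directly). It relies on GCC's documented behaviour of splitting a -Wl, argument at commas, and assumes nvcc forwards each -Xlinker value to the host linker as a separate argument.

```python
"""Sketch: rewrite GCC-style -Wl, linker flags into nvcc -Xlinker form.

Hypothetical helper for illustration only; not taken from the PR's Makefile.
"""


def wl_to_xlinker(flags):
    """Return a new flag list where '-Wl,a,b' becomes '-Xlinker a -Xlinker b'.

    GCC splits the argument of -Wl at commas and hands each piece to the
    linker; emitting one -Xlinker per piece keeps that meaning while avoiding
    the -Wl, prefix that nvcc does not accept. Other flags are copied as-is.
    """
    out = []
    for flag in flags:
        if flag.startswith("-Wl,"):
            for piece in flag[len("-Wl,"):].split(","):
                if piece:  # ignore empty pieces produced by stray commas
                    out.extend(["-Xlinker", piece])
        else:
            out.append(flag)
    return out


if __name__ == "__main__":
    ldflags = ["-L/usr/local/cuda/lib64", "-Wl,-rpath,/usr/local/cuda/lib64", "-lcudart"]
    print(" ".join(wl_to_xlinker(ldflags)))
    # prints: -L/usr/local/cuda/lib64 -Xlinker -rpath -Xlinker /usr/local/cuda/lib64 -lcudart
```

Emitting one -Xlinker per comma-separated piece keeps the rewrite mechanical, so the same rpath list can serve both the host-compiler and nvcc link lines.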
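The uint16_t restriction comes down to transfer size: bytes per access is the element size times the number of elements moved per access, and cp.async only accepts 4, 8, or 16 bytes. A minimal Python sketch, assuming one element per access; the `cp_async_supported` helper and its `elements_per_access` parameter are hypothetical, and the real value depends on the kernel's alignment configuration.

```python
"""Sketch: which element types give a cp.async transfer size CUTLASS accepts.

Assumption: each access moves `elements_per_access` elements (default 1).
Not taken from the PR's kernels; illustration only.
"""

# Transfer sizes supported by cp.async on sm_80/sm_90.
SUPPORTED_CP_ASYNC_BYTES = {4, 8, 16}

# Size in bytes of the integer element types discussed in the PR.
ELEMENT_BYTES = {
    "uint16_t": 2,
    "uint32_t": 4,
    "uint64_t": 8,
}


def cp_async_supported(type_name, elements_per_access=1):
    """True if one access of `elements_per_access` elements is a legal size."""
    return ELEMENT_BYTES[type_name] * elements_per_access in SUPPORTED_CP_ASYNC_BYTES


if __name__ == "__main__":
    for name in ELEMENT_BYTES:
        print(name, cp_async_supported(name))
    # prints:
    # uint16_t False
    # uint32_t True
    # uint64_t True
```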

New Files

  • docker/Dockerfile: GPU-capable image based on nvidia/cuda:12.9.1-devel-ubuntu24.04 with gcc-13 (required for C++20 <format>), all HPMPC/ConvTriple system dependencies, CUTLASS pre-cloned at /cutlass, and GPU_ARCHITECTURE env var preset for troy-nova builds
  • docker-run.sh: Convenience script to build and launch the container with bind-mount and GPU passthrough; supports --build, --gpus, --no-mount, and --image flags
  • run_cases.sh: Script to build and run the four standard test cases (CPU-only, GPU online, GPU preprocessing, full GPU), with per-case make and run logs in logs/
  • Step-by-step-instructions.md: End-to-end setup guide covering Docker launch, submodule init, ConvTriple deps, CUDA GEMM build, and all four execution variants for both H100 (sm_90) and A100 (sm_80)
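The flags docker-run.sh accepts can be illustrated by mapping them onto `docker build` / `docker run` argument lists. This is a hypothetical Python rendering (the real script is shell); the default image tag `hpmpc-gpu`, the `/workspace` mount target, and the `--gpus all` default are assumptions, not values taken from the PR.

```python
"""Sketch: how docker-run.sh's flags could map onto docker commands.

Hypothetical illustration; image tag, mount target, and GPU default are assumed.
"""
import argparse
import os


def build_commands(argv, cwd=None):
    parser = argparse.ArgumentParser(prog="docker-run.sh")
    parser.add_argument("--build", action="store_true", help="build the image first")
    parser.add_argument("--gpus", default="all", help="value passed to docker run --gpus")
    parser.add_argument("--no-mount", action="store_true", help="skip the bind mount")
    parser.add_argument("--image", default="hpmpc-gpu", help="image tag (assumed default)")
    args = parser.parse_args(argv)
    cwd = cwd or os.getcwd()

    commands = []
    if args.build:
        # Build from docker/Dockerfile with the repository root as build context.
        commands.append(["docker", "build", "-f", "docker/Dockerfile", "-t", args.image, "."])
    run = ["docker", "run", "--rm", "-it", "--gpus", args.gpus]
    if not args.no_mount:
        # Bind-mount the checkout so host edits are visible inside the container.
        run += ["-v", f"{cwd}:/workspace", "-w", "/workspace"]
    run.append(args.image)
    commands.append(run)
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["--build"], cwd="/home/me/hpmpc"):
        print(" ".join(cmd))
```

Returning argument lists rather than shell strings keeps paths with spaces intact if the commands are later passed to `subprocess.run`.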
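The guide targets H100 (sm_90) and A100 (sm_80); the tag follows directly from the GPU's compute capability. A minimal Python sketch, assuming the capability arrives as a string such as "9.0"; `sm_arch` is a hypothetical helper, and the exact value format GPU_ARCHITECTURE expects is not stated in the PR, so this only produces the sm_XX tag.

```python
"""Sketch: derive the sm_XX tag for the target GPU from its compute capability.

The H100 -> sm_90 and A100 -> sm_80 pairs come from the setup guide; the helper
and its input format are assumptions.
"""

KNOWN_GPUS = {"H100": "sm_90", "A100": "sm_80"}


def sm_arch(compute_cap):
    """Convert a compute capability string such as "8.0" into "sm_80"."""
    major, minor = compute_cap.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"


if __name__ == "__main__":
    # On recent drivers the capability can be read with:
    #   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
    for gpu, cap in [("H100", "9.0"), ("A100", "8.0")]:
        assert sm_arch(cap) == KNOWN_GPUS[gpu]
        print(gpu, sm_arch(cap))
```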

Test Cases

| Case | Description | Status |
|---|---|---|
| 1 | CPU only, skip preprocessing | ✅ Verified |
| 2 | GPU online phase (USE_CUDA_GEMM=2) | ✅ Verified |
| 3 | GPU preprocessing (CHEETAH_GPU=1) | 🔄 Pending ConvTriple GPU build |
| 4 | Full GPU (CHEETAH_GPU=1 USE_CUDA_GEMM=2) | 🔄 Pending ConvTriple GPU build |
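The four cases reduce to a small matrix of make variables. A minimal Python sketch of that matrix, assuming each case is built with plain `make NAME=VALUE` arguments; the per-case variables come from the table, while the `make_command`/`log_paths` helpers and the `caseN_make.log`/`caseN_run.log` naming are hypothetical, not run_cases.sh's actual code. Case 1 sets no GPU variables; whatever flag it uses to skip preprocessing is not shown in the PR and is omitted.

```python
"""Sketch of the four-case build matrix that run_cases.sh drives.

Make variables per case are from the PR's test-case table; log names and the
make invocation shape are assumptions.
"""
from pathlib import Path

CASES = {
    1: ("CPU only, skip preprocessing", {}),
    2: ("GPU online phase", {"USE_CUDA_GEMM": "2"}),
    3: ("GPU preprocessing", {"CHEETAH_GPU": "1"}),
    4: ("Full GPU", {"CHEETAH_GPU": "1", "USE_CUDA_GEMM": "2"}),
}


def make_command(case):
    """Return the make argv for a case, passing variables as NAME=VALUE."""
    _, variables = CASES[case]
    return ["make"] + [f"{name}={value}" for name, value in sorted(variables.items())]


def log_paths(case, log_dir="logs"):
    """Per-case make and run log files (hypothetical naming scheme)."""
    base = Path(log_dir)
    return base / f"case{case}_make.log", base / f"case{case}_run.log"


if __name__ == "__main__":
    for case, (description, _) in CASES.items():
        make_log, run_log = log_paths(case)
        print(f"[{case}] {description}: {' '.join(make_command(case))} > {make_log}; run > {run_log}")
```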
