GPU acceleration support, Docker environment, and build fixes for H100/A100 by ajithsureshtii · Pull Request #9 · chart21/hpmpc

ajithsureshtii · 2026-06-03T12:42:14Z

Summary

This PR adds full GPU acceleration support for HPMPC on H100 and A100 machines, introduces a reproducible Docker-based development environment, and fixes several build issues encountered when enabling CUDA GEMM and ConvTriple GPU preprocessing.

Changes

Bug Fixes

Makefile: Fixed hardcoded CUDA library path (/opt/cuda → /usr/local/cuda/lib64) for CHEETAH_GPU=1 builds
Makefile: Added NVCC_LINKFLAGS using -Xlinker rpath syntax. nvcc rejects -Wl, flags directly, which left all ConvTriple/SEAL/OpenSSL symbols unresolved when linking CUDA executables with USE_CUDA_GEMM
core/cuda/conv_cutlass_int.cu: Commented out the uint16_t explicit instantiation, because CUTLASS cp_async on sm_80/sm_90 supports only 4-, 8-, and 16-byte transfers, not the 2-byte uint16_t
core/cuda/gemm_cutlass_int.cu: Same fix for the GEMM kernel
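
The -Wl, to -Xlinker rewriting behind NVCC_LINKFLAGS can be sketched in Python as follows. This only illustrates the flag translation and is not the Makefile's actual rule: the helper name `to_nvcc_link_flags` is hypothetical, and the example flag values are assumptions.

```python
def to_nvcc_link_flags(flags):
    """Rewrite gcc-style -Wl, linker flags into nvcc's -Xlinker form.

    Hypothetical helper for illustration only; the PR's Makefile may
    construct NVCC_LINKFLAGS differently.
    """
    out = []
    for flag in flags:
        if flag.startswith("-Wl,"):
            # "-Wl,a,b" means "pass a and b to the linker". With nvcc,
            # each linker argument is introduced by -Xlinker instead.
            for part in flag[len("-Wl,"):].split(","):
                if part:
                    out.extend(["-Xlinker", part])
        else:
            # Flags such as -L and -l are passed through unchanged.
            out.append(flag)
    return out


if __name__ == "__main__":
    example = ["-L/usr/local/cuda/lib64", "-Wl,-rpath,/usr/local/cuda/lib64"]
    print(" ".join(to_nvcc_link_flags(example)))
```

Keeping the translation in one place means the same list of linker flags can serve both the g++ link step and the nvcc link step.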

New Files

docker/Dockerfile: GPU-capable image based on nvidia/cuda:12.9.1-devel-ubuntu24.04 with gcc-13 (required for C++20 <format>), all HPMPC/ConvTriple system dependencies, CUTLASS pre-cloned at /cutlass, and GPU_ARCHITECTURE env var preset for troy-nova builds
docker-run.sh: Convenience script to build and launch the container with bind-mount and GPU passthrough; supports --build, --gpus, --no-mount, and --image flags
run_cases.sh: Script to build and run the four standard test cases (CPU-only, GPU online, GPU preprocessing, full GPU), with per-case make and run logs in logs/
Step-by-step-instructions.md: End-to-end setup guide covering Docker launch, submodule init, ConvTriple deps, CUDA GEMM build, and all four execution variants for both H100 (sm_90) and A100 (sm_80)
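
As a rough illustration of how the four docker-run.sh flags could map onto Docker commands, here is a Python sketch. The flag names come from the description above. Everything else is an assumption: the default image tag `hpmpc-gpu`, the `/hpmpc` mount point, `--gpus` taking a value that defaults to `all`, and the function names. The real script is a shell script and may behave differently.

```python
import argparse


def parse_args(argv):
    """Parse the four flags listed for docker-run.sh (semantics assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of docker-run.sh flags")
    parser.add_argument("--build", action="store_true",
                        help="build the image before launching")
    parser.add_argument("--gpus", default="all",
                        help="value forwarded to 'docker run --gpus'")
    parser.add_argument("--no-mount", action="store_true",
                        help="skip the bind-mount of the repository")
    parser.add_argument("--image", default="hpmpc-gpu",  # hypothetical tag
                        help="image name to build and run")
    return parser.parse_args(argv)


def build_commands(args, repo_dir):
    """Return the docker command lines as argument lists, without running them."""
    commands = []
    if args.build:
        commands.append(["docker", "build", "-f", "docker/Dockerfile",
                         "-t", args.image, "."])
    run = ["docker", "run", "--rm", "-it", "--gpus", args.gpus]
    if not args.no_mount:
        # Bind-mount the checkout; the container path is an assumption.
        run += ["-v", f"{repo_dir}:/hpmpc", "-w", "/hpmpc"]
    run.append(args.image)
    commands.append(run)
    return commands


if __name__ == "__main__":
    for cmd in build_commands(parse_args(["--build"]), "/path/to/hpmpc"):
        print(" ".join(cmd))
```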

Test Cases

| Case | Description | Status |
|------|-------------|--------|
| 1 | CPU only, skip preprocessing | ✅ Verified |
| 2 | GPU online phase (`USE_CUDA_GEMM=2`) | ✅ Verified |
| 3 | GPU preprocessing (`CHEETAH_GPU=1`) | 🔄 Pending ConvTriple GPU build |
| 4 | Full GPU (`CHEETAH_GPU=1 USE_CUDA_GEMM=2`) | 🔄 Pending ConvTriple GPU build |
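
The four cases differ only in which build variables are set, which can be captured as a small lookup table. The Python sketch below is illustrative: the variable values come from the table, while the bare `make` invocation, the log file names, and the function names are assumptions. Whatever option case 1 uses to skip preprocessing is not stated here, so it is left out.

```python
# Build variables per test case, taken from the table above.
CASES = {
    1: {},                                           # CPU only
    2: {"USE_CUDA_GEMM": "2"},                       # GPU online phase
    3: {"CHEETAH_GPU": "1"},                         # GPU preprocessing
    4: {"CHEETAH_GPU": "1", "USE_CUDA_GEMM": "2"},   # full GPU
}


def make_command(case):
    """Return a make command line (as a list) for the given case number."""
    variables = CASES[case]
    return ["make"] + [f"{name}={value}" for name, value in sorted(variables.items())]


def log_paths(case, log_dir="logs"):
    """Return (make_log, run_log) paths; the naming scheme is hypothetical."""
    return (f"{log_dir}/case{case}_make.log", f"{log_dir}/case{case}_run.log")


if __name__ == "__main__":
    for case in sorted(CASES):
        print(case, " ".join(make_command(case)), log_paths(case))
```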

ajithsureshtii added 10 commits June 3, 2026 16:35

Add GPU support, Docker environment, and build fixes for H100/A100

a87ee33

Enable host networking and NET_ADMIN for inter-machine communication

9df6c8f

Add iperf3 and ping to Docker image

1edc4ef

Fix non-deterministic config file ordering across machines in run_config.py

251d986

Stream command output to terminal in real time instead of buffering until completion

3be9b01

Auto-rebuild ConvTriple in Makefile when CHEETAH_GPU changes

2169333

Fix convtriple_check to run before compile_pch to avoid parallel race condition

11c9ce2

GPU build support: Makefile calls build.sh -gpu, update ConvTriple submodule pointer

2ac5991

Add -G player:device flag to run.sh/run_locally.sh for per-party GPU assignment on same machine

cec1e0d

Pass -G player:device flags through run_cases.sh to run.sh for GPU cases

803fba7
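
The `-G player:device` flag introduced in the commits above pairs a party with a GPU. A minimal Python sketch of parsing such values is shown below, assuming both fields are non-negative integers and that each player may appear only once. Neither assumption is confirmed by the commit messages, and `parse_gpu_assignments` is a hypothetical name, not a function from the repository.

```python
def parse_gpu_assignments(values):
    """Parse 'player:device' strings into a {player: device} mapping.

    Assumes integer player and device IDs; the real scripts are shell
    scripts and may validate their input differently.
    """
    assignments = {}
    for value in values:
        player_text, sep, device_text = value.partition(":")
        if not sep:
            raise ValueError(f"expected player:device, got {value!r}")
        player, device = int(player_text), int(device_text)
        if player < 0 or device < 0:
            raise ValueError(f"negative ID in {value!r}")
        if player in assignments:
            raise ValueError(f"player {player} assigned more than once")
        assignments[player] = device
    return assignments


if __name__ == "__main__":
    # Two parties on one machine, each pinned to its own GPU.
    print(parse_gpu_assignments(["0:0", "1:1"]))
```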

ajithsureshtii force-pushed the aby2-merge branch from f6b404a to 803fba7 on June 4, 2026 17:25

ajithsureshtii added 12 commits June 4, 2026 21:32

Force relink executables after ConvTriple GPU/CPU rebuild to avoid stale troy dependency

fa9f77d

Fix compile_executables sequential dependency on compile_pch to prevent parallel build race

b7dccdd

Add run_measurements.sh for GPU/CPU measurement configs with per-iteration and per-test options

41f592a

Update ConvTriple submodule pointer to fork aby2-merge

ba5b6df

Point ConvTriple submodule to fork so clone works out of the box

4ee7e9c

Support single-machine GPU vs CPU comparison: -G flag in run_measurements.sh and run_config.py

bbd3277

Add CHEETAH_GPU_REVERSE flag: auto-rebuild ConvTriple with reversed encryption roles

be7c45c

Add H100 benchmark results (4 tests); fix conv2d_ab_reverse declaration in header

0adf2a5

Add --gemm flag to run_measurements.sh: controls USE_CUDA_GEMM for all tests (default=0)

89f7c26

Prefix GPU config files with a/b so single-batch runs before multi-batch

16e4e0b

Add best-latency and best-throughput GPU configs (c_best_latency, c_best_throughput)

2d91165

Add d_best_latency/throughput (CPU), update c_best configs (GPU COMPRESS=0, throughput=24 processes)

e7f78e4
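
Several commits above concern the order in which config files are processed (deterministic ordering across machines, and a/b/c/d prefixes to control run order). One common way to get this behavior is to sort the directory listing, since `os.listdir` returns entries in arbitrary order. The Python sketch below assumes that approach. It is not taken from run_config.py, and the file names in the demo are hypothetical.

```python
import os


def list_configs(directory):
    """Return the config file names in a directory in a stable order.

    os.listdir makes no ordering guarantee, so two machines can see the
    same files in different orders. Sorting gives every machine the same
    sequence, and a name prefix (a_, b_, ...) then controls which
    config runs first.
    """
    names = [name for name in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, name))]
    return sorted(names)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, created in deliberately shuffled order.
        for name in ["b_multi_batch", "d_best_latency", "a_single_batch"]:
            open(os.path.join(tmp, name), "w").close()
        print(list_configs(tmp))
```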

ajithsureshtii wants to merge 22 commits into chart21:aby2-merge from ajithsureshtii:aby2-merge
