Handle CUDA checkpoint restore arg layouts #2144

Open
kkraus14 wants to merge 1 commit into NVIDIA:main from kkraus14:codex/fix-checkpoint-restoreargs

@kkraus14
Collaborator

Summary

  • Add the CUDA 13.3 CUcheckpointRestoreArgs_st.reserved header rewrite so pyclibrary keeps parsing the struct and typedef.
  • Carry parsed struct member type and array length metadata into the Tempita templates.
  • Generate CUcheckpointRestoreArgs_st for the CUDA 12.x, CUDA 13.2, and CUDA 13.3 restore-argument layouts, including the CUDA 13.2 reserved1 field.
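
As a rough illustration of the first two bullets, the sketch below folds a sizeof-based array length into an integer literal before parsing, then collects each member's type, name, and array length as the kind of metadata a template could consume. It is not the code in cuda_bindings/build_hooks.py: the DemoRestoreArgs_st header text, the ASSUMED_SIZES table, and the helper names fold_sizeof_lengths and parse_members are all hypothetical stand-ins, and the regex-based member parser only takes the place of pyclibrary for the purpose of the example.

```python
import re

# Stand-in header text for illustration only; it is NOT copied from cuda.h.
DEMO_HEADER = """
typedef struct DemoRestoreArgs_st {
    DemoPair *pairs;
    unsigned int pairCount;
    char reserved[64 - sizeof(DemoPair *) - sizeof(unsigned int)];
} DemoRestoreArgs;
"""

# Assumed sizes for a 64-bit target; a real build hook would need its own source.
ASSUMED_SIZES = {"DemoPair *": 8, "unsigned int": 4}

_SIZEOF = re.compile(r"sizeof\(\s*([^()]+?)\s*\)")
_ARRAY_LEN = re.compile(r"\[([^\[\]]*sizeof[^\[\]]*)\]")
_MEMBER = re.compile(
    r"^[ \t]*(?P<type>[\w \t]+?[ \t*]+)(?P<name>\w+)(?:\[(?P<len>\d+)\])?;",
    re.M,
)


def fold_sizeof_lengths(header_text, sizes):
    """Replace array lengths written with sizeof(...) by integer literals."""

    def fold(match):
        expr = _SIZEOF.sub(lambda m: str(sizes[m.group(1)]), match.group(1))
        # Only digits, whitespace and + - * ( ) may remain before evaluating.
        if not re.fullmatch(r"[\d\s+\-*()]+", expr):
            raise ValueError(f"cannot fold array length: {match.group(1)!r}")
        value = eval(expr, {"__builtins__": {}})
        return f"[{value}]"

    return _ARRAY_LEN.sub(fold, header_text)


def parse_members(header_text):
    """Collect (type, name, array_length) for each struct member line."""
    members = []
    for m in _MEMBER.finditer(header_text):
        length = int(m.group("len")) if m.group("len") else None
        members.append((m.group("type").strip(), m.group("name"), length))
    return members


folded = fold_sizeof_lengths(DEMO_HEADER, ASSUMED_SIZES)
members = parse_members(folded)
```

Keeping the array length as a separate field, rather than baked into the type string, is what lets a template emit different member declarations per layout without re-parsing the header.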

Validation

  • Parsed/rendered cuda.h snapshots for CUDA 12.9, 13.2, and 13.3; CUcheckpointRestoreArgs is present in all three and the generated layouts match the headers.
  • Checked CUDA 13.2 vs 13.3 cuda.h sizeof-array expressions and empty named structs; no extra empty named structs remain in 13.3 after the fix.
  • python -m compileall cuda_bindings/build_hooks.py
  • Commit hooks: ruff, SPDX, whitespace, EOF checks passed.
  • pixi build --output-dir /tmp/cuda-bindings-build-check2 --clean no longer hits the CUDA 12.9 checkpoint restore compile error; the build continues through the cuda.bindings.driver link step and then fails on unrelated cufile symbols missing from the selected CUDA 12.x libcufile-dev.
  • A CUDA 13.2-focused pixi build attempt was also blocked by unrelated newer nvrtc symbols / stale parser-cache behavior (nvrtcBundledHeadersInfo, NVRTC_ERROR_BUSY), not by checkpoint restore generation.
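
The layout comparison in the validation notes can be approximated with ctypes. The sketch below is only an assumption-laden example: DemoPair, ReservedOnlyArgs, and PairArgs are hypothetical stand-ins rather than the real CUDA structs, and the 64-byte total is chosen for the demo, not taken from any cuda.h version. It shows how a reserved array sized by a sizeof expression keeps two layouts the same size.

```python
import ctypes


class DemoPair(ctypes.Structure):
    # Hypothetical payload type; only its pointer size matters here.
    _fields_ = [("old_id", ctypes.c_ubyte * 16), ("new_id", ctypes.c_ubyte * 16)]


TOTAL_SIZE = 64  # assumed struct size for this demo

# Mirrors a C declarator such as reserved[64 - sizeof(ptr) - sizeof(unsigned)].
RESERVED_LEN = (
    TOTAL_SIZE
    - ctypes.sizeof(ctypes.POINTER(DemoPair))
    - ctypes.sizeof(ctypes.c_uint)
)


class ReservedOnlyArgs(ctypes.Structure):
    # Layout with nothing but reserved storage.
    _fields_ = [("reserved", ctypes.c_uint64 * 8)]


class PairArgs(ctypes.Structure):
    # Layout that carves named fields out of the reserved storage.
    _fields_ = [
        ("pairs", ctypes.POINTER(DemoPair)),
        ("pairCount", ctypes.c_uint),
        ("reserved", ctypes.c_char * RESERVED_LEN),
    ]


sizes = {cls.__name__: ctypes.sizeof(cls) for cls in (ReservedOnlyArgs, PairArgs)}
print(sizes)
```

If the two sizes ever diverge, the generated binding for one layout would no longer be ABI-compatible with the other, which is the kind of mismatch the snapshot comparison is meant to catch.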

@copy-pr-bot
Contributor

copy-pr-bot Bot commented May 27, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the cuda.bindings Everything related to the cuda.bindings module label May 27, 2026
@kkraus14 kkraus14 marked this pull request as ready for review May 27, 2026 21:02
@kkraus14
Collaborator Author

/ok to test

@kkraus14 kkraus14 added the bug Something isn't working label May 27, 2026
@kkraus14 kkraus14 self-assigned this May 27, 2026
@kkraus14 kkraus14 added P0 High priority - Must do! to-be-backported Trigger the bot to raise a backport PR upon merge and removed to-be-backported Trigger the bot to raise a backport PR upon merge labels May 27, 2026
Collaborator

@rparolin rparolin left a comment

LGTM. Clean fix for the CUDA 13.3 header parsing break.

@kkraus14 kkraus14 force-pushed the codex/fix-checkpoint-restoreargs branch from 8d7dd31 to ae8888c Compare May 28, 2026 02:09
@kkraus14 kkraus14 force-pushed the codex/fix-checkpoint-restoreargs branch from ae8888c to 9ba2616 Compare May 28, 2026 02:12
@kkraus14
Collaborator Author

/ok to test

Labels

bug Something isn't working cuda.bindings Everything related to the cuda.bindings module P0 High priority - Must do!

2 participants