fix(runtime): RuntimeCache pickle path — wrapper exclusion + handle bytes round-trip by tp5uiuc · Pull Request #4368 · pytorch/TensorRT

tp5uiuc · 2026-06-26T20:20:43Z

Summary

This MR adds three independent pickle-correctness fixes that together let torch.save(module) and standalone-pickle of RuntimeCache / RuntimeCacheHandle work cleanly.

Wrapper __getstate__ / __setstate__ on TorchTensorRTModule excludes _runtime_settings and _implicit_cache_handle from the pickle stream and resets both to defaults on load. Mirrors set_extra_state (line 515) and the line-236 design note: "RuntimeSettings are intentionally NOT serialized: they're per-engine, in-memory init values, not part of the engine's identity." Pickle path now matches state_dict + load_state_dict behavior.
Python _RuntimeCacheHandle.__getstate__ / __setstate__ ships a protocol pair pickling as (path, bytes). Default pickle crashes on _thread.lock; the new protocol carries the cache contents (live cache when materialized, _pending_warm_bytes fallback otherwise) and rebuilds a fresh threading.Lock on the load side. Factored a lock-free _serialize_unlocked helper.
Cpp RuntimeCacheHandle::serialize() falls back to pending_warm_bytes_ when trt_handle_ is null, so callers asking for the handle's persistable bytes (save_to_stream, def_pickle) get the right answer in every lifecycle state. The misleading "called before materialized" warning is retired now that the pre-materialize case hooks into serializable bytes.
As a result Cpp def_pickle moves from std::string (path-only) to std::tuple<std::string, at::Tensor> (path + bytes) and round-trips via the existing serialize / deserialize API. Matches the python facade's shape.

Pre-existing CI noise this resolves

test_cross_runtime_serde::test_save_{cpp_load_python,python_load_python,python_load_cpp} : root cause is _thread.lock failing pickle in python-runtime subprocesses. Fixed by the _RuntimeCacheHandle.__getstate__ protocol.
Sibling PR (fix(runtime): autosave engine-implicit RuntimeCache via atexit + weakref #4362) introduces an _atexit_token slot containing a weakref. The wrapper exclusion here keeps that out of pickle entirely, unblocking the sibling PR's CI if needed.

Refs #4359.

Type of change

Bug fix (non-breaking)

Test plan

test_python_handle_pickle_preserves_pending_warm_bytes — python
handle's bytes round-trip end-to-end (path + pending bytes
preserved; fresh lock; no live cache reused).
Full test_000_runtime_cache.py passes locally (24 passed, 1
unrelated cpp-rt-only skip).
pre-commit run clean on touched files (mypy skipped for two
pre-existing errors at _TorchTensorRTModule.py:380 and :508,
unrelated).

narendasan · 2026-06-29T18:56:52Z

I am still trying to figure out if RuntimeCache is runtime specific or engine specific. If its not engine specific then the module should not serialize. The module should be 1:1 to the engine

I see your point Naren. RuntimeCache is a runtime-config knob and is not engine specific and so not a part of the engine's identity. The same instance can be 1:N with engines: that's why we have an in-memory runtime cache object that the users can attach and share across multiple engines.

Also, since the TorchTensorRTModule wrapper is 1:1 with an engine, we should not serialize runtime cache, and so TorchTensorRTModule.getstate drops _runtime_settings (and _implicit_cache_handle, which aliases the same RuntimeCache) on pickle.

This is consistent with the two serialization paths toda:

_pack_engine_info (line 236 (https://github.com/pytorch/TensorRT/blob/main/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py#L236)): "RuntimeSettings are intentionally NOT serialized: they're per-engine, in-memory init values, not part of the engine's identity (see Move TRT-RTX runtime mode controls from CompilationSettings to runtime context managers #4310)."

set_extra_state (line 515 (https://github.com/pytorch/TensorRT/blob/main/py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py#L515)): explicitly resets _runtime_settings = RuntimeSettings() on load — "Caller can reapply via mod.runtime_settings = ...".

This MR ensures that the pickle path also adheres to the same settings. I have added getstate and setstate for the RuntimeCache in case users want to pickle that object separately, independent from the torch module. Hope that makes sense, happy to sync up offline in case more info needed.

tp5uiuc · 2026-06-29T20:33:00Z

[by Claude Code] CI status after rebase on #4367:

Wheel builds (Linux x86_64, SBSA, Windows, RTX, Python-only) all green now that fix: Some stale apis needed updating in the runtime #4367 fixed the CUDAGraph::debug_dump API mismatch. The new cpp serialize() fallback and the def_pickle(path, bytes) tuple compile and link cleanly across all wheel variants.
Python-only dynamo runtime tests summary: 82 passed, 4 failed, 81 skipped — the 4 fails are all test_004_weight_streaming (pre-existing flake on every PR). test_mutable_torchtrt_module::test_save is in the passing set.
The python-handle pickle test (test_python_handle_pickle_preserves_pending_warm_bytes) runs and passes.

Other red is environmental, identical to #4362 and to main:

L0 dynamo converter — torch-nightly SDPA API change in an unrelated test
L0 torchscript — pre-existing Windows torchscript flakes
RTX Python-only dynamo runtime — same test_004_weight_streaming

…ytes round-trip Two independent pickle-correctness fixes that together let ``torch.save(module)`` and other pickle paths handle ``RuntimeCache`` cleanly. These don't depend on any new attribute additions to ``RuntimeCache``; they're standalone improvements. 1. ``TorchTensorRTModule.__getstate__`` / ``__setstate__`` exclude ``_runtime_settings`` and ``_implicit_cache_handle`` from the pickle stream and reset both to defaults on load. This mirrors ``set_extra_state`` (line 515) and the documented intent at line 236: "RuntimeSettings are intentionally NOT serialized: they're per-engine, in-memory init values, not part of the engine's identity (see pytorch#4310)." Aligns the pickle path with the state_dict path -- the caller is expected to reapply any ``runtime_cache`` / strategy / cuda-graph configuration on the loading side. 2. ``_RuntimeCacheHandle`` (python rt) gains ``__getstate__`` / ``__setstate__``. The default pickle walks ``__dict__``, which crashes on the ``threading.Lock`` (``_thread.lock`` is unpicklable). The new protocol pickles ``(path, bytes)``, matching the cpp ``def_pickle`` shape. The bytes blob captures live cache contents when materialized, falling back to ``_pending_warm_bytes`` so the pre-materialize window is preserved end-to-end. Factored a lock-free ``_serialize_unlocked`` helper to share read logic with ``serialize``. 3. Cpp ``RuntimeCacheHandle::serialize()`` now falls back to ``pending_warm_bytes_`` when ``trt_handle_`` is null, so callers asking for the handle's persistable bytes (``save_to_stream``, ``def_pickle``) get the correct answer in every lifecycle state. The misleading "called before materialized" warning is retired now that the pre-materialize case has a real answer. 4. Cpp ``def_pickle`` switches from ``std::string`` (path only) to ``std::tuple<std::string, at::Tensor>`` (path + bytes) and round-trips via the existing ``serialize`` / ``deserialize`` API. The matching python ``_RuntimeCacheHandle.__getstate__`` uses the same shape. Pre-existing CI failures on ``test_cross_runtime_serde::test_save_*`` (python-runtime subprocess save/load) trace to (2) -- the ``_thread.lock`` failure -- and should resolve once this lands. Test ``test_python_handle_pickle_preserves_pending_warm_bytes`` covers the python handle's bytes round-trip end-to-end. Refs pytorch#4359

- ``RuntimeCacheHandle::serialize``: switch the pending-bytes copy from ``std::memcpy`` to ``std::copy`` (with free-function ``std::cbegin`` / ``std::cend``), and use free-function ``std::empty`` for the empty-vector check. Idiomatic C++17 in a hot path that didn't need the C-string-style API. - Drop the explanatory comment block above the pending-bytes branch. - Trim the wrapper ``__getstate__`` docstring: the rationale that named specific picklability hazards (``weakref``, ``threading.Lock``) is implementation noise that doesn't belong in the wrapper's contract. - Drop the cpp-rt skip on ``test_python_handle_pickle_preserves_pending_warm_bytes``. The test instantiates ``_RuntimeCacheHandle`` directly rather than going through ``RuntimeCache._handle`` selection, so the python class is exercisable regardless of which runtime is active. Updated docstring notes the directness. Behavior is unchanged. Addresses review comments on core/runtime/RuntimeSettings.cpp:131, :136 and tests/py/dynamo/runtime/test_000_runtime_cache.py:341.

tp5uiuc · 2026-07-02T22:23:47Z

[by Claude Code] CI on f547d50 (rebased onto main after #4362 merged):

Wheel builds (Linux x86_64 cu130/cu132, Windows, RTX Linux/Windows, SBSA) all green — the earlier GitLab/eigen flake cleared.
test_000_runtime_cache.py: all 5 new TestRuntimeCacheAutosave tests (landed via fix(runtime): autosave engine-implicit RuntimeCache via atexit + weakref #4362) + test_python_handle_pickle_preserves_pending_warm_bytes (this PR) pass. Pytest summary on the RTX runtime target: 1 failed, 138 passed, 52 skipped — the 1 failure is unrelated (see below).
No conflicts remaining; wrapper __getstate__ docstring's stale line 515 reference to set_extra_state was dropped since line numbers shifted after fix(runtime): autosave engine-implicit RuntimeCache via atexit + weakref #4362 merged.

Remaining red is environmental / pre-existing on main:

Failing job	Cause
RTX Python-only dynamo runtime (cu130, cu132 ×2)	`test_multi_optimization_profiles::test_profiles_restored_from_trt_api_without_metadata` — Myelin "Must be called with an execution graph cache". Also fails on main (verified against `97defd7`).
Build Jetpack (cu126 aarch64)	`CUDAGraph::has_graph_exec` accessibility error — Jetpack path did not pick up #4367's fix; fails on main.
L0 dynamo converter (cu130)	torch-nightly SDPA API change — unrelated.
L0 torchscript (cu130)	Pre-existing Windows torchscript flakes — unrelated.

Nothing this PR modifies is on any failure list.

meta-cla Bot added the cla signed label Jun 26, 2026

github-actions Bot added component: tests Issues re: Tests component: core Issues re: The core compiler component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Jun 26, 2026

tp5uiuc mentioned this pull request Jun 26, 2026

fix(runtime): autosave engine-implicit RuntimeCache via atexit + weakref #4362

Merged

6 tasks

tp5uiuc commented Jun 26, 2026

View reviewed changes

Comment thread core/runtime/RuntimeSettings.cpp Outdated

tp5uiuc commented Jun 26, 2026

View reviewed changes

Comment thread core/runtime/RuntimeSettings.cpp Outdated

tp5uiuc commented Jun 26, 2026

View reviewed changes

Comment thread py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py

tp5uiuc commented Jun 26, 2026

View reviewed changes

Comment thread tests/py/dynamo/runtime/test_000_runtime_cache.py Outdated

tp5uiuc self-assigned this Jun 27, 2026

tp5uiuc marked this pull request as ready for review June 27, 2026 00:08

tp5uiuc requested a review from narendasan June 27, 2026 00:08

narendasan reviewed Jun 29, 2026

View reviewed changes

tp5uiuc force-pushed the fix/runtimecache-pickle-roundtrip branch from 1ad40e3 to 9776534 Compare June 29, 2026 19:02

tp5uiuc added 2 commits July 2, 2026 13:05

tp5uiuc force-pushed the fix/runtimecache-pickle-roundtrip branch from 9776534 to f547d50 Compare July 2, 2026 20:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(runtime): RuntimeCache pickle path — wrapper exclusion + handle bytes round-trip#4368

fix(runtime): RuntimeCache pickle path — wrapper exclusion + handle bytes round-trip#4368
tp5uiuc wants to merge 2 commits into
pytorch:mainfrom
tp5uiuc:fix/runtimecache-pickle-roundtrip

tp5uiuc commented Jun 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

narendasan Jun 29, 2026

Uh oh!

tp5uiuc Jun 29, 2026

Uh oh!

tp5uiuc commented Jun 29, 2026

Uh oh!

tp5uiuc commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

tp5uiuc commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Pre-existing CI noise this resolves

Type of change

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

narendasan Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

tp5uiuc Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

tp5uiuc commented Jun 29, 2026

Uh oh!

tp5uiuc commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tp5uiuc commented Jun 26, 2026 •

edited

Loading