Skip to content

fix(runtime): RuntimeCache pickle path — wrapper exclusion + handle bytes round-trip#4368

Open
tp5uiuc wants to merge 2 commits into
pytorch:mainfrom
tp5uiuc:fix/runtimecache-pickle-roundtrip
Open

fix(runtime): RuntimeCache pickle path — wrapper exclusion + handle bytes round-trip#4368
tp5uiuc wants to merge 2 commits into
pytorch:mainfrom
tp5uiuc:fix/runtimecache-pickle-roundtrip

Conversation

@tp5uiuc

@tp5uiuc tp5uiuc commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Summary

This MR adds three independent pickle-correctness fixes that together let torch.save(module) and standalone-pickle of RuntimeCache / RuntimeCacheHandle work cleanly.

  • Wrapper __getstate__ / __setstate__ on TorchTensorRTModule excludes _runtime_settings and _implicit_cache_handle from the pickle stream and resets both to defaults on load. Mirrors set_extra_state (line 515) and the line-236 design note: "RuntimeSettings are intentionally NOT serialized: they're per-engine, in-memory init values, not part of the engine's identity." Pickle path now matches state_dict + load_state_dict behavior.
  • Python _RuntimeCacheHandle.__getstate__ / __setstate__ ships a protocol pair pickling as (path, bytes). Default pickle crashes on _thread.lock; the new protocol carries the cache contents (live cache when materialized, _pending_warm_bytes fallback otherwise) and rebuilds a fresh threading.Lock on the load side. Factored a lock-free _serialize_unlocked helper.
  • Cpp RuntimeCacheHandle::serialize() falls back to pending_warm_bytes_ when trt_handle_ is null, so callers asking for the handle's persistable bytes (save_to_stream, def_pickle) get the right answer in every lifecycle state. The misleading "called before materialized" warning is retired now that the pre-materialize case hooks into serializable bytes.
  • As a result Cpp def_pickle moves from std::string (path-only) to std::tuple<std::string, at::Tensor> (path + bytes) and round-trips via the existing serialize / deserialize API. Matches the python facade's shape.

Pre-existing CI noise this resolves

  • test_cross_runtime_serde::test_save_{cpp_load_python,python_load_python,python_load_cpp} : root cause is _thread.lock failing pickle in python-runtime subprocesses. Fixed by the _RuntimeCacheHandle.__getstate__ protocol.
  • Sibling PR (fix(runtime): autosave engine-implicit RuntimeCache via atexit + weakref #4362) introduces an _atexit_token slot containing a weakref. The wrapper exclusion here keeps that out of pickle entirely, unblocking the sibling PR's CI if needed.

Refs #4359.

Type of change

  • Bug fix (non-breaking)

Test plan

  • test_python_handle_pickle_preserves_pending_warm_bytes — python
    handle's bytes round-trip end-to-end (path + pending bytes
    preserved; fresh lock; no live cache reused).
  • Full test_000_runtime_cache.py passes locally (24 passed, 1
    unrelated cpp-rt-only skip).
  • pre-commit run clean on touched files (mypy skipped for two
    pre-existing errors at _TorchTensorRTModule.py:380 and :508,
    unrelated).

@meta-cla meta-cla Bot added the cla signed label Jun 26, 2026
@github-actions github-actions Bot added component: tests Issues re: Tests component: core Issues re: The core compiler component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Jun 26, 2026
Comment thread core/runtime/RuntimeSettings.cpp Outdated
Comment thread core/runtime/RuntimeSettings.cpp Outdated
Comment thread py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py
Comment thread tests/py/dynamo/runtime/test_000_runtime_cache.py Outdated
@tp5uiuc tp5uiuc self-assigned this Jun 27, 2026
@tp5uiuc tp5uiuc marked this pull request as ready for review June 27, 2026 00:08
@tp5uiuc tp5uiuc requested a review from narendasan June 27, 2026 00:08

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still trying to figure out if RuntimeCache is runtime specific or engine specific. If its not engine specific then the module should not serialize. The module should be 1:1 to the engine

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point Naren. RuntimeCache is a runtime-config knob and is not engine specific and so not a part of the engine's identity. The same instance can be 1:N with engines: that's why we have an in-memory runtime cache object that the users can attach and share across multiple engines.

Also, since the TorchTensorRTModule wrapper is 1:1 with an engine, we should not serialize runtime cache, and so TorchTensorRTModule.getstate drops _runtime_settings (and _implicit_cache_handle, which aliases the same RuntimeCache) on pickle.

This is consistent with the two serialization paths toda:

This MR ensures that the pickle path also adheres to the same settings. I have added getstate and setstate for the RuntimeCache in case users want to pickle that object separately, independent from the torch module. Hope that makes sense, happy to sync up offline in case more info needed.

@tp5uiuc tp5uiuc force-pushed the fix/runtimecache-pickle-roundtrip branch from 1ad40e3 to 9776534 Compare June 29, 2026 19:02
@tp5uiuc

tp5uiuc commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator Author

[by Claude Code] CI status after rebase on #4367:

  • Wheel builds (Linux x86_64, SBSA, Windows, RTX, Python-only) all green now that fix: Some stale apis needed updating in the runtime #4367 fixed the CUDAGraph::debug_dump API mismatch. The new cpp serialize() fallback and the def_pickle(path, bytes) tuple compile and link cleanly across all wheel variants.
  • Python-only dynamo runtime tests summary: 82 passed, 4 failed, 81 skipped — the 4 fails are all test_004_weight_streaming (pre-existing flake on every PR). test_mutable_torchtrt_module::test_save is in the passing set.
  • The python-handle pickle test (test_python_handle_pickle_preserves_pending_warm_bytes) runs and passes.

Other red is environmental, identical to #4362 and to main:

  • L0 dynamo converter — torch-nightly SDPA API change in an unrelated test
  • L0 torchscript — pre-existing Windows torchscript flakes
  • RTX Python-only dynamo runtime — same test_004_weight_streaming

tp5uiuc added 2 commits July 2, 2026 13:05
…ytes round-trip

Two independent pickle-correctness fixes that together let
``torch.save(module)`` and other pickle paths handle ``RuntimeCache``
cleanly. These don't depend on any new attribute additions to
``RuntimeCache``; they're standalone improvements.

1. ``TorchTensorRTModule.__getstate__`` / ``__setstate__`` exclude
   ``_runtime_settings`` and ``_implicit_cache_handle`` from the pickle
   stream and reset both to defaults on load. This mirrors
   ``set_extra_state`` (line 515) and the documented intent at line 236:
   "RuntimeSettings are intentionally NOT serialized: they're per-engine,
   in-memory init values, not part of the engine's identity (see pytorch#4310)."
   Aligns the pickle path with the state_dict path -- the caller is
   expected to reapply any ``runtime_cache`` / strategy / cuda-graph
   configuration on the loading side.

2. ``_RuntimeCacheHandle`` (python rt) gains ``__getstate__`` /
   ``__setstate__``. The default pickle walks ``__dict__``, which
   crashes on the ``threading.Lock`` (``_thread.lock`` is unpicklable).
   The new protocol pickles ``(path, bytes)``, matching the cpp
   ``def_pickle`` shape. The bytes blob captures live cache contents
   when materialized, falling back to ``_pending_warm_bytes`` so the
   pre-materialize window is preserved end-to-end. Factored a
   lock-free ``_serialize_unlocked`` helper to share read logic with
   ``serialize``.

3. Cpp ``RuntimeCacheHandle::serialize()`` now falls back to
   ``pending_warm_bytes_`` when ``trt_handle_`` is null, so callers
   asking for the handle's persistable bytes (``save_to_stream``,
   ``def_pickle``) get the correct answer in every lifecycle state.
   The misleading "called before materialized" warning is retired now
   that the pre-materialize case has a real answer.

4. Cpp ``def_pickle`` switches from ``std::string`` (path only) to
   ``std::tuple<std::string, at::Tensor>`` (path + bytes) and round-trips
   via the existing ``serialize`` / ``deserialize`` API. The matching
   python ``_RuntimeCacheHandle.__getstate__`` uses the same shape.

Pre-existing CI failures on ``test_cross_runtime_serde::test_save_*``
(python-runtime subprocess save/load) trace to (2) -- the
``_thread.lock`` failure -- and should resolve once this lands.

Test ``test_python_handle_pickle_preserves_pending_warm_bytes`` covers
the python handle's bytes round-trip end-to-end.

Refs pytorch#4359
- ``RuntimeCacheHandle::serialize``: switch the pending-bytes copy from
  ``std::memcpy`` to ``std::copy`` (with free-function ``std::cbegin``
  / ``std::cend``), and use free-function ``std::empty`` for the
  empty-vector check. Idiomatic C++17 in a hot path that didn't need
  the C-string-style API.
- Drop the explanatory comment block above the pending-bytes branch.
- Trim the wrapper ``__getstate__`` docstring: the rationale that named
  specific picklability hazards (``weakref``, ``threading.Lock``) is
  implementation noise that doesn't belong in the wrapper's contract.
- Drop the cpp-rt skip on
  ``test_python_handle_pickle_preserves_pending_warm_bytes``. The test
  instantiates ``_RuntimeCacheHandle`` directly rather than going
  through ``RuntimeCache._handle`` selection, so the python class is
  exercisable regardless of which runtime is active. Updated docstring
  notes the directness.

Behavior is unchanged. Addresses review comments on
core/runtime/RuntimeSettings.cpp:131, :136 and
tests/py/dynamo/runtime/test_000_runtime_cache.py:341.
@tp5uiuc tp5uiuc force-pushed the fix/runtimecache-pickle-roundtrip branch from 9776534 to f547d50 Compare July 2, 2026 20:07
@tp5uiuc

tp5uiuc commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

[by Claude Code] CI on f547d50 (rebased onto main after #4362 merged):

Remaining red is environmental / pre-existing on main:

Failing job Cause
RTX Python-only dynamo runtime (cu130, cu132 ×2) test_multi_optimization_profiles::test_profiles_restored_from_trt_api_without_metadata — Myelin "Must be called with an execution graph cache". Also fails on main (verified against 97defd7).
Build Jetpack (cu126 aarch64) CUDAGraph::has_graph_exec accessibility error — Jetpack path did not pick up #4367's fix; fails on main.
L0 dynamo converter (cu130) torch-nightly SDPA API change — unrelated.
L0 torchscript (cu130) Pre-existing Windows torchscript flakes — unrelated.

Nothing this PR modifies is on any failure list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla signed component: api [Python] Issues re: Python API component: core Issues re: The core compiler component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: runtime component: tests Issues re: Tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants