perf(types): defer pydantic core-schema builds to cut import memory#959
Conversation
Add defer_build=True to AdCPBaseModel so each generated model's pydantic-core validator/serializer is built lazily on first use rather than at class definition. With ~1500 generated model classes, eager builds dominate `import adcp` memory; deferring builds only the working set (18 of 1564 classes build at import). Because model_dump forces serialize_as_any=True, pydantic-core dispatches serialization to each nested instance's own class serializer, which under defer_build stays a placeholder and bypasses the lazy-build trigger. Walk the instance graph on that error and rebuild the still-deferred classes, then retry, so only models that appear in real payloads get built. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Clean, contained perf fix. Right principle: defer_build=True relocates the core-schema build in time without touching the schema itself, so wire output stays byte-identical — field order, exclude_none, discriminator emission, and serialize_as_any subclass dispatch are all properties of the built schema, not of when it's built.
Things I checked
- Wire shape unchanged.
ad-tech-protocol-expert: sound — serialized bytes are identical pre/post for AdCP response/request types. The only behavioral delta is failure-vs-success on the first dump of a nested-only deferred model, which is exactly what the fix recovers. - Forward-compat machinery is independent of the change.
_forward_compat.pyand_ergonomic.pyalreadymodel_rebuild(force=True)the high-traffic containers (Format, theFormatAssetUnion/GroupFormatAssetUnionopen unions, the major response models) at import, so a new upstream variant still routes through_format_asset_discriminatorto theUnknownFormatAsset/UnknownGroupAssetarm. Deserialization (model_validatealways triggers the lazy build) is untouched. No discriminated-union regression. - The guard is correct.
base.py:281if not cls.__pydantic_complete__: cls.model_rebuild(force=False)—force=Falseself-skips when complete; the recursion is idempotent.python-expertconfirmed against pydantic v2 semantics.code-reviewerfloated that__pydantic_complete__could beTruewhile the serializer is still aMockValSerplaceholder — but the nested-only child class never gets its own build flipped to complete via the parent's validate, so the guard fires. The added regression test (test_serialize_as_any_builds_deferred_nested_serializer) pins exactly that path and is green in the 5804-pass run. That's the empirical answer. - Cycle safety.
base.pywalk keysseenonid()and records before recursing — no infinite loop on cyclic or shared sub-objects. - Type-system layering intact. No new imports from
generated_poc/or_generated.py; change is confined to the public base model. Conventional commitperf(types):is the right non-breaking semver signal —defer_buildis amodel_configchange, not a signature or response-shape change. - Test plan honest. Full suite green, regression test added for the broken path, lint + typecheck clean. No unchecked manual-verification boxes.
Follow-ups (non-blocking — file as issues)
__pydantic_extra__is not walked._build_deferred_serializersrecursesvalue.__dict__.values()plus list/tuple/set/dict, but generated types withadditionalProperties: truecarryextra='allow', and extra values live in__pydantic_extra__, not__dict__. AnAdCPBaseModelinstance stashed as an extra value under a base-typed field could still hitMockValSeron retry and re-raise. Narrow and uncommon (all three experts converged on it), but it's a silent-failure edge — addgetattr(value, "__pydantic_extra__", None) or {}to the walk and a test that stores a model in the extra bag.- Memory claim is bounded by the import-time force-rebuild set.
_ergonomic.py/_forward_compat.pyalreadyforce=True-build a large named set at import, so the "18 of 1564" deferral saves less than the bare framing implies. Worth measuring the actual RSS delta on the Linux deploy target so the OOM-fix rationale is backed by a number, not an assertion.
Minor nits (non-blocking)
\"MockValSer\" not in str(exc)couples to a pydantic-core internal.base.pykeys recovery on a placeholder class name in theTypeErrormessage. It fails in the safe direction (re-raise, never masks), but a pydantic-core reword silently disables the auto-rebuild with no test to catch it given the widepydantic>=2,<3pin. A structural check —not isinstance(cls.__pydantic_serializer__, SchemaSerializer)— would be upgrade-proof.- Duplicated recovery block. The try/except/walk/retry is verbatim in both
model_dumpandmodel_dump_json. Extract a one-line helper to keep the recovery in one place. - Benign rebuild race under free-threaded CPython. Concurrent first-time dumps of the same deferred class can both call
model_rebuild— duplicate work, not corruption, and safe under the GIL. A one-line comment acknowledging it is enough; revisit only if 3.13t becomes a target.
LGTM. Follow-ups noted below — the __pydantic_extra__ gap is the one worth a tracked issue.
07f960f to
8b56b3d
Compare
There was a problem hiding this comment.
Sound optimization with the right escape hatch. defer_build=True moves the per-model core-schema build off the import path without touching what gets built — field order, exclude_none, discriminator emission, and subclass @model_serializer dispatch all derive from the class namespace and call-time kwargs, none of which deferral perturbs. ad-tech-protocol-expert: sound, no wire divergence; python-expert: direction correct; 5804/5804 pass.
Things I checked
- No wire-shape regression. Serialized output is byte-identical to eager build. The existing
_forward_compat.py/_ergonomic.pymodel_rebuild(force=True)calls onFormat/RepeatableAssetGroupand the request/response roots still fire — under deferral those build fresh with the patched fields instead of building-then-rebuilding, marginally cleaner.ad-tech-protocol-expert: sound. - The
MockValSerfailure mode is real and correctly diagnosed.serialize_as_any=Truedispatches to each nested instance's own class serializer, a path that bypasses the lazy-build trigger, so a nested-only model keeps a placeholder serializer.base.py:41-67walks the live instance graph and force-builds the still-deferred classes. Because the walk is exhaustive over the actual object graph andserialize_as_anyonly touches serializers of instances present in the payload, the single retry atbase.py:283-285/:297-299converges in one pass for the normal__dict__graph.code-reviewerflagged a non-convergence Blocker but couldn't execute it; the exhaustive-walk-equals-needed-set argument plus the green suite settles it — downgraded to a Follow-up (the one real gap is extras, below). - defer_build survives ConfigDict merge. Generated subclasses setting
extra='allow'/'forbid'(e.g.manifest_schema.pyTools) inheritdefer_build=Truerather than resetting it — pydantic merges config across the MRO. Intended, and it's what makes the import win real (18/1564 built at import). - Cycle handling.
id()-keyedseenset guards the recursion atbase.py:55-58. Correct, though untested (nit below). - Conventional commit.
perf(types):— non-breaking, additive config, no public export removed or re-typed. Correct semver signal, no!needed. - CI change. Storyboard matrix keeps the sticky
adcp-3.0compat leg and swaps the movingadcp-3.1literal forlatest. No backwards-compat coverage dropped on the wire; the upper leg now trackslatest(the 3.1 stable line) forward instead of freezing.
Follow-ups (non-blocking — file as issues)
__pydantic_extra__is not walked._build_deferred_serializersrecursesvalue.__dict__.values(), but underextra='allow'pydantic v2 stores extras in__pydantic_extra__, outside__dict__. A model nested only inside an allow-extra field would leave its serializer deferred and the retry would re-raise raw (the retry is outside thetry). Narrow and unreproduced by the suite, but real — add an explicit__pydantic_extra__sweep.- Lazy build shifts schema-validation failures from import to first-use. A model with an unresolvable forward ref the
_ergonomic/_forward_compatrebuild lists don't cover would now ship green and raise in a request handler instead of at import. Add a CI test that iterates everyAdCPBaseModelsubclass and callsmodel_rebuild(force=True)to restore import-time detection.python-expert: High-value, low-effort. - First-build concurrency. Lazy build mutates class-level
__pydantic_serializer__/__pydantic_complete__; pydantic-core's lazy build isn't documented thread-safe. Under concurrent async traffic two requests can race the first build of the same class. Consider pre-warming the response-root working set at startup, or a lock around the recovery rebuild.python-expert: Medium.
Minor nits (non-blocking)
\"MockValSer\" in str(exc)matches an internal pydantic-core repr.base.py:284/:298. That string isn't contract-stable across the pinnedpydantic>=2.0.0,<3range; a reword turns the guard into a silent re-raise of a real bug. Prefer gating the recovery on observable state (not type(self).__pydantic_complete__) before attempting, rather than sniffing the message.- Duplicated recovery block. The
try/exceptinmodel_dumpandmodel_dump_jsonare byte-identical. Factor into one helper so the two paths can't drift. - Test covers one shape.
test_serialize_as_any_default.pyexercises direct-field + list-field. No coverage for dict-values nesting, the cycle guard, or extras. The happy path is pinned, which is enough to ship — broaden alongside the extras follow-up. - Unrelated CI change bundled into a
perf(types)PR. Theadcp-3.1→lateststoryboard swap is orthogonal to the memory work; all three reviewers flagged it. Worth a one-line callout in the body or its own PR next time. (non-blocking)
LGTM. Follow-ups noted below.
There was a problem hiding this comment.
Clean, contained perf change. Right shape: defer_build=True moves the ~1500 generated-model core-schema builds off the import path, and the one interaction it exposes — serialize_as_any=True dispatching to a nested-only model's MockValSer placeholder — is caught and repaired in place rather than papered over by disabling serialize_as_any. perf(types): is the correct semver signal; no public signature, export, or wire shape changes.
Things I checked
- Wire-invariant.
ad-tech-protocol-expert: the deferred-build +model_rebuild-on-MockValSerpath produces byte-identical JSON to eager builds.defer_buildchanges when the serializer is built, not the inputs (annotations,model_config,FieldInfo) it's built from. No field dropped, no discriminated-union arm mis-serialized. - Forward-compat machinery untouched.
_forward_compat.pyand_ergonomic.pyrebuild the patched classes (Format.assets,RepeatableAssetGroup.assets, theBeforeValidatorarms) withforce=Trueat import, so they're already__pydantic_complete__. The fallback'sforce=Falsewalk guards onif not cls.__pydantic_complete__(base.py:281) and can never re-derive a patched class from stale annotations — the open-union escape hatches survive. - Retry terminates. Single-shot, not a loop (
base.py:284-285,:300-301): try → walk-and-build → try once more → else raise.model_rebuild(force=False)does real work on incomplete classes (__pydantic_complete__isFalsethere), so after a complete walk the second dump can't hit the same placeholder. - Cycle-safe.
_build_deferred_serializerstracks visited instances byid()inseen(base.py:53-58); reference cycles terminate. - Fail-loud, not fail-silent. Non-
MockValSerTypeErrors re-raise unchanged (base.py:283,:300). The catch is narrow. - Test pins the exact regression —
tests/test_serialize_as_any_default.py:136builds the child only via the parent'smodel_validate(never validated/dumped on its own), then dumps bothmodel_dumpandmodel_dump_json. That's the path that left the serializer deferred.
Follow-ups (non-blocking — file as issues)
- Thread-safety of first-touch build. Both
code-reviewerandpython-expertflagged this independently:defer_buildmoves class-level schema mutation (__pydantic_serializer__,__pydantic_complete__) from single-threaded import time to first serialize, which on a multi-request service (the embedded sales agent that motivated this) can be concurrent.model_rebuildisn't documented thread-safe; worst case is a double-build or a transient half-installed-serializer error on first touch, not corruption. A startup warm-up (model_rebuildthe hot response classes) or a module-level lock around the walk closes the window. - Unguarded serialization paths. The catch only protects
AdCPBaseModel.model_dump[_json]. An AdCP model nested inside a plainpydantic.BaseModelparent dumped withserialize_as_any=True, or an adopter callingTypeAdapter(SomeAdCPModel).dump_python(...), hits the placeholder without the repair. No first-party path does this today (grep is clean), and the failure is loud — but worth a documented known-limitation. adcp-3.0storyboard leg dropped. Justified — the commit message namesnpm notarget @adcp/sdk@adcp-3.1, so the job was already red and the upstream publish process moved tolatest. Theadcp-3.1→latestswitch trades a pinned compat gate for a moving tag that will silently follow a future@adcp/sdkmajor; acceptable given the upstream change, but the rationale lives only in the commit body, not the PR description. Worth a one-line note that we no longer assert against the 3.0 line.
Minor nits (non-blocking)
MockValSersubstring match is pinned to two private internals.base.py:283/:300depends on the pydantic-core (Rust) error template and the placeholder class name, across a widepydantic>=2.0,<3range with no API contract. A reword or rename in any 2.x silently turns the repair off. A structural pre-check (any reachable model with__pydantic_complete__False) would be more robust than message parsing; low urgency since the string is Rust-generated and stable today.- Walk skips
__pydantic_extra__._build_deferred_serializersiteratesvalue.__dict__.values()(base.py:56), which misses extras underextra='allow'. The base isignore/forbidso this is inert today, but the docstring explicitly invites subclasses to overridemodel_config— a nested deferred model held only in anallowsubclass's extras would be missed. Also iteratinggetattr(value, '__pydantic_extra__', None)would future-proof it.
Both python-expert and ad-tech-protocol-expert noted they reasoned from pydantic v2 semantics rather than a live byte-diff (sandbox blocked execution). The PR's own 5804-pass run plus the storyboard/conformance jobs on this branch cover the empirical side.
Approving on the strength of the wire-invariance confirmation plus the narrow, fail-loud repair path. Follow-ups noted above.
The @adcp/sdk publish process changed to publish stable releases under the `latest` dist-tag and no longer emits an `adcp-3.1` tag (AdCP 3.1 is the current stable line, shipped as `latest`). The removed `adcp-3.1` tag made every storyboard runner fail with `npm notarget @adcp/sdk@adcp-3.1`. Run the storyboard matrix against ["adcp-3.0", "latest"]: the sticky adcp-3.0 tag (7.11.7) is a fixed, reproducible backwards-compat floor for the AdCP 3.0 runner line, and `latest` tracks the current AdCP 3.1 stable line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
8b56b3d to
1233c5c
Compare
There was a problem hiding this comment.
Clean, well-scoped fix. Right call to move the per-model build cost off the import path — 18 of 1564 classes built after a bare import adcp is the whole point, and the OOM on the 2Gi sales agent is a real failure mode, not a hypothetical.
The load-bearing insight is that defer_build=True is build-timing only: a lazily-built SchemaSerializer is identical to the eager one. ad-tech-protocol-expert: sound — the wire-shape-critical classes (the Format/RepeatableAssetGroup open unions with the UnknownFormatAsset arm, the _ergonomic.py coerced request/response models, the XOR selectors in aliases.py) are all rebuilt with model_rebuild(force=True) at import time, which runs regardless of defer_build. Discriminated-union dispatch and the forward-compat arms are untouched. code-reviewer: sound-with-caveats, same.
Things I checked
serialize_as_any=Trueinteraction is the actual bug being fixed: a model that only ever appears nested keeps a placeholder serializer, andserialize_as_anydispatch hits it. TheMockValSercatch + graph-walk + single retry (base.pymodel_dump/model_dump_json) targets exactly that. Single retry is sufficient because the walk is pre-order over the already-validated instance graph — one pass rebuilds every reachable instance.- New test
test_serialize_as_any_builds_deferred_nested_serializerpins the exact broken path — child built only via parentmodel_validate, then dumped. This is the primary user-facing change and it's covered, not shipped on faith. - Not a breaking public-API change:
defer_buildis amodel_configaddition;model_dump/model_dump_jsonsignatures and output are unchanged.perf(types):is the correct non-breaking conventional-commit prefix. RegistryBaseModel(base.py:270) intentionally left eager — it keepsextra='allow', noserialize_as_anyinjection, never hits theMockValSerpath. Correct omission.- Forward-compat machinery unaffected:
_forward_compat.pyandaliases.pyforce=Truerebuilds un-defer exactly the classes they patch; everything else stays deferred.
Follow-ups (non-blocking — file as issues)
- Graph walk misses
__pydantic_extra__._build_deferred_serializersrecursesvalue.__dict__.values(), but onextra='allow'models overflow values live in__pydantic_extra__, not__dict__(verified against pydantic v2). A deferred model instance reachable only via an extra value would not be rebuilt and the retry would re-raise. Narrow in practice — wire extras arrive as raw dicts, not model instances, so this only bites on manual model-instance assignment to an extra field — but worthor {}-iteratinggetattr(value, "__pydantic_extra__", None)to close it. Add a test that puts a deferred model inside an extra field. MockValSerstring-match is version-coupled. The retry fires only when the pydantic-coreTypeErrorcontains the literalMockValSer— an internal pydantic-core symbol with no stability guarantee. A pydantic-core rewording silently turns this into an unconditional re-raise and the optimization breaks with no test signal. Pin it: assert the current pydantic-core actually emits the string, and tie it to a version in a comment.
Minor nits (non-blocking)
- Duplicated retry block.
model_dumpandmodel_dump_jsonnow carry the same try/except/_build_deferred_serializers/retry verbatim. Extract a helper so the two can't drift. - CI tag swap rides along. The
adcp-3.1tolateststoryboard change (ci.yml) is a real fix for thenpm notargetfailure, but it's unrelated to the memory work and trades a sticky tag for a moving one that can turn a green PR red later with no code change. Noting it in the body or splitting it would keep the perf commit clean — a memory optimization is a notable place to also re-point a CI matrix leg.
Approving on the strength of the matching pin test plus both expert verdicts landing on sound. Follow-ups noted.
bokelley
left a comment
There was a problem hiding this comment.
Reviewed the deferred pydantic build change and ran focused local coverage: tests/test_serialize_as_any_default.py, tests/test_type_coercion.py, tests/test_response_builder_subclass.py, tests/test_rootmodel_proxy.py, and tests/test_extra_policy.py all passed locally (73 tests). CI is green.
Why
import adcpbuilds the pydantic-core validator and serializer for every one of the ~1500 generated model classes at import time, even though a given process only ever touches a handful. Those eager builds dominate the import-time memory footprint, and on the embedded sales agent (2Gi limit) they push idle RSS high enough to OOM in staging.What changed (in plain terms)
Build each model only when it's actually used, not when
adcpis imported.Added
defer_build=TruetoAdCPBaseModel. Pydantic now skips building a model's validator/serializer at class-definition time and builds it lazily the first time that model is validated or serialized. After a bareimport adcp, only 18 of 1564 model classes are built; the other 1546 stay deferred until something uses them.Fixed one interaction this exposed.
AdCPBaseModel.model_dumpalready forcesserialize_as_any=True(so subclass serializers fire when nested under a base-typed field). That option makes pydantic serialize each nested value using its own class serializer, and a model that only ever appears as a nested field never gets its lazy build triggered, so that serializer is still a placeholder. Serializing then raisedTypeError: 'MockValSer' object is not an instance of 'SchemaSerializer'. The fix: when that specific error occurs, walk the object graph and build the still-deferred nested classes, then retry. This builds only the classes that appear in real payloads, so the memory win is preserved.Net effect: same validation and serialization behavior as before, just with the model-build cost moved off the import path and paid lazily per model. The deferred-build memory reclaim is large on the Linux deploy target (the OOM case); it's smaller on macOS because of allocator differences.
Tests
test_serialize_as_any_builds_deferred_nested_serializerpinning the nested-deferred-serializer path (a child model created only via the parent'smodel_validate, then dumped) — this is exactly the case that broke before the fix.make lintandmake typecheck-allclean.