Skip to content

perf(types): defer pydantic core-schema builds to cut import memory#959

Merged
bokelley merged 2 commits into
mainfrom
create-pr-with-push-pr-skill
Jun 28, 2026
Merged

perf(types): defer pydantic core-schema builds to cut import memory#959
bokelley merged 2 commits into
mainfrom
create-pr-with-push-pr-skill

Conversation

@andybevan-scope3

Copy link
Copy Markdown
Collaborator

Why

import adcp builds the pydantic-core validator and serializer for every one of the ~1500 generated model classes at import time, even though a given process only ever touches a handful. Those eager builds dominate the import-time memory footprint, and on the embedded sales agent (2Gi limit) they push idle RSS high enough to OOM in staging.

What changed (in plain terms)

Build each model only when it's actually used, not when adcp is imported.

  • Added defer_build=True to AdCPBaseModel. Pydantic now skips building a model's validator/serializer at class-definition time and builds it lazily the first time that model is validated or serialized. After a bare import adcp, only 18 of 1564 model classes are built; the other 1546 stay deferred until something uses them.

  • Fixed one interaction this exposed. AdCPBaseModel.model_dump already forces serialize_as_any=True (so subclass serializers fire when nested under a base-typed field). That option makes pydantic serialize each nested value using its own class serializer, and a model that only ever appears as a nested field never gets its lazy build triggered, so that serializer is still a placeholder. Serializing then raised TypeError: 'MockValSer' object is not an instance of 'SchemaSerializer'. The fix: when that specific error occurs, walk the object graph and build the still-deferred nested classes, then retry. This builds only the classes that appear in real payloads, so the memory win is preserved.

Net effect: same validation and serialization behavior as before, just with the model-build cost moved off the import path and paid lazily per model. The deferred-build memory reclaim is large on the Linux deploy target (the OOM case); it's smaller on macOS because of allocator differences.

Tests

  • Full suite passes: 5804 passed, 41 skipped, 1 xfailed (pre-existing, unrelated).
  • Added test_serialize_as_any_builds_deferred_nested_serializer pinning the nested-deferred-serializer path (a child model created only via the parent's model_validate, then dumped) — this is exactly the case that broke before the fix.
  • make lint and make typecheck-all clean.

Add defer_build=True to AdCPBaseModel so each generated model's
pydantic-core validator/serializer is built lazily on first use rather
than at class definition. With ~1500 generated model classes, eager
builds dominate `import adcp` memory; deferring builds only the working
set (18 of 1564 classes build at import).

Because model_dump forces serialize_as_any=True, pydantic-core dispatches
serialization to each nested instance's own class serializer, which under
defer_build stays a placeholder and bypasses the lazy-build trigger. Walk
the instance graph on that error and rebuild the still-deferred classes,
then retry, so only models that appear in real payloads get built.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes Jun 27, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, contained perf fix. Right principle: defer_build=True relocates the core-schema build in time without touching the schema itself, so wire output stays byte-identical — field order, exclude_none, discriminator emission, and serialize_as_any subclass dispatch are all properties of the built schema, not of when it's built.

Things I checked

  • Wire shape unchanged. ad-tech-protocol-expert: sound — serialized bytes are identical pre/post for AdCP response/request types. The only behavioral delta is failure-vs-success on the first dump of a nested-only deferred model, which is exactly what the fix recovers.
  • Forward-compat machinery is independent of the change. _forward_compat.py and _ergonomic.py already model_rebuild(force=True) the high-traffic containers (Format, the FormatAssetUnion/GroupFormatAssetUnion open unions, the major response models) at import, so a new upstream variant still routes through _format_asset_discriminator to the UnknownFormatAsset/UnknownGroupAsset arm. Deserialization (model_validate always triggers the lazy build) is untouched. No discriminated-union regression.
  • The guard is correct. base.py:281 if not cls.__pydantic_complete__: cls.model_rebuild(force=False)force=False self-skips when complete; the recursion is idempotent. python-expert confirmed against pydantic v2 semantics. code-reviewer floated that __pydantic_complete__ could be True while the serializer is still a MockValSer placeholder — but the nested-only child class never gets its own build flipped to complete via the parent's validate, so the guard fires. The added regression test (test_serialize_as_any_builds_deferred_nested_serializer) pins exactly that path and is green in the 5804-pass run. That's the empirical answer.
  • Cycle safety. base.py walk keys seen on id() and records before recursing — no infinite loop on cyclic or shared sub-objects.
  • Type-system layering intact. No new imports from generated_poc/ or _generated.py; change is confined to the public base model. Conventional commit perf(types): is the right non-breaking semver signal — defer_build is a model_config change, not a signature or response-shape change.
  • Test plan honest. Full suite green, regression test added for the broken path, lint + typecheck clean. No unchecked manual-verification boxes.

Follow-ups (non-blocking — file as issues)

  • __pydantic_extra__ is not walked. _build_deferred_serializers recurses value.__dict__.values() plus list/tuple/set/dict, but generated types with additionalProperties: true carry extra='allow', and extra values live in __pydantic_extra__, not __dict__. An AdCPBaseModel instance stashed as an extra value under a base-typed field could still hit MockValSer on retry and re-raise. Narrow and uncommon (all three experts converged on it), but it's a silent-failure edge — add getattr(value, "__pydantic_extra__", None) or {} to the walk and a test that stores a model in the extra bag.
  • Memory claim is bounded by the import-time force-rebuild set. _ergonomic.py/_forward_compat.py already force=True-build a large named set at import, so the "18 of 1564" deferral saves less than the bare framing implies. Worth measuring the actual RSS delta on the Linux deploy target so the OOM-fix rationale is backed by a number, not an assertion.

Minor nits (non-blocking)

  1. \"MockValSer\" not in str(exc) couples to a pydantic-core internal. base.py keys recovery on a placeholder class name in the TypeError message. It fails in the safe direction (re-raise, never masks), but a pydantic-core reword silently disables the auto-rebuild with no test to catch it given the wide pydantic>=2,<3 pin. A structural check — not isinstance(cls.__pydantic_serializer__, SchemaSerializer) — would be upgrade-proof.
  2. Duplicated recovery block. The try/except/walk/retry is verbatim in both model_dump and model_dump_json. Extract a one-line helper to keep the recovery in one place.
  3. Benign rebuild race under free-threaded CPython. Concurrent first-time dumps of the same deferred class can both call model_rebuild — duplicate work, not corruption, and safe under the GIL. A one-line comment acknowledging it is enough; revisit only if 3.13t becomes a target.

LGTM. Follow-ups noted below — the __pydantic_extra__ gap is the one worth a tracked issue.

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound optimization with the right escape hatch. defer_build=True moves the per-model core-schema build off the import path without touching what gets built — field order, exclude_none, discriminator emission, and subclass @model_serializer dispatch all derive from the class namespace and call-time kwargs, none of which deferral perturbs. ad-tech-protocol-expert: sound, no wire divergence; python-expert: direction correct; 5804/5804 pass.

Things I checked

  • No wire-shape regression. Serialized output is byte-identical to eager build. The existing _forward_compat.py / _ergonomic.py model_rebuild(force=True) calls on Format / RepeatableAssetGroup and the request/response roots still fire — under deferral those build fresh with the patched fields instead of building-then-rebuilding, marginally cleaner. ad-tech-protocol-expert: sound.
  • The MockValSer failure mode is real and correctly diagnosed. serialize_as_any=True dispatches to each nested instance's own class serializer, a path that bypasses the lazy-build trigger, so a nested-only model keeps a placeholder serializer. base.py:41-67 walks the live instance graph and force-builds the still-deferred classes. Because the walk is exhaustive over the actual object graph and serialize_as_any only touches serializers of instances present in the payload, the single retry at base.py:283-285 / :297-299 converges in one pass for the normal __dict__ graph. code-reviewer flagged a non-convergence Blocker but couldn't execute it; the exhaustive-walk-equals-needed-set argument plus the green suite settles it — downgraded to a Follow-up (the one real gap is extras, below).
  • defer_build survives ConfigDict merge. Generated subclasses setting extra='allow' / 'forbid' (e.g. manifest_schema.py Tools) inherit defer_build=True rather than resetting it — pydantic merges config across the MRO. Intended, and it's what makes the import win real (18/1564 built at import).
  • Cycle handling. id()-keyed seen set guards the recursion at base.py:55-58. Correct, though untested (nit below).
  • Conventional commit. perf(types): — non-breaking, additive config, no public export removed or re-typed. Correct semver signal, no ! needed.
  • CI change. Storyboard matrix keeps the sticky adcp-3.0 compat leg and swaps the moving adcp-3.1 literal for latest. No backwards-compat coverage dropped on the wire; the upper leg now tracks latest (the 3.1 stable line) forward instead of freezing.

Follow-ups (non-blocking — file as issues)

  • __pydantic_extra__ is not walked. _build_deferred_serializers recurses value.__dict__.values(), but under extra='allow' pydantic v2 stores extras in __pydantic_extra__, outside __dict__. A model nested only inside an allow-extra field would leave its serializer deferred and the retry would re-raise raw (the retry is outside the try). Narrow and unreproduced by the suite, but real — add an explicit __pydantic_extra__ sweep.
  • Lazy build shifts schema-validation failures from import to first-use. A model with an unresolvable forward ref the _ergonomic / _forward_compat rebuild lists don't cover would now ship green and raise in a request handler instead of at import. Add a CI test that iterates every AdCPBaseModel subclass and calls model_rebuild(force=True) to restore import-time detection. python-expert: High-value, low-effort.
  • First-build concurrency. Lazy build mutates class-level __pydantic_serializer__ / __pydantic_complete__; pydantic-core's lazy build isn't documented thread-safe. Under concurrent async traffic two requests can race the first build of the same class. Consider pre-warming the response-root working set at startup, or a lock around the recovery rebuild. python-expert: Medium.

Minor nits (non-blocking)

  1. \"MockValSer\" in str(exc) matches an internal pydantic-core repr. base.py:284 / :298. That string isn't contract-stable across the pinned pydantic>=2.0.0,<3 range; a reword turns the guard into a silent re-raise of a real bug. Prefer gating the recovery on observable state (not type(self).__pydantic_complete__) before attempting, rather than sniffing the message.
  2. Duplicated recovery block. The try/except in model_dump and model_dump_json are byte-identical. Factor into one helper so the two paths can't drift.
  3. Test covers one shape. test_serialize_as_any_default.py exercises direct-field + list-field. No coverage for dict-values nesting, the cycle guard, or extras. The happy path is pinned, which is enough to ship — broaden alongside the extras follow-up.
  4. Unrelated CI change bundled into a perf(types) PR. The adcp-3.1latest storyboard swap is orthogonal to the memory work; all three reviewers flagged it. Worth a one-line callout in the body or its own PR next time. (non-blocking)

LGTM. Follow-ups noted below.

aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes Jun 28, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, contained perf change. Right shape: defer_build=True moves the ~1500 generated-model core-schema builds off the import path, and the one interaction it exposes — serialize_as_any=True dispatching to a nested-only model's MockValSer placeholder — is caught and repaired in place rather than papered over by disabling serialize_as_any. perf(types): is the correct semver signal; no public signature, export, or wire shape changes.

Things I checked

  • Wire-invariant. ad-tech-protocol-expert: the deferred-build + model_rebuild-on-MockValSer path produces byte-identical JSON to eager builds. defer_build changes when the serializer is built, not the inputs (annotations, model_config, FieldInfo) it's built from. No field dropped, no discriminated-union arm mis-serialized.
  • Forward-compat machinery untouched. _forward_compat.py and _ergonomic.py rebuild the patched classes (Format.assets, RepeatableAssetGroup.assets, the BeforeValidator arms) with force=True at import, so they're already __pydantic_complete__. The fallback's force=False walk guards on if not cls.__pydantic_complete__ (base.py:281) and can never re-derive a patched class from stale annotations — the open-union escape hatches survive.
  • Retry terminates. Single-shot, not a loop (base.py:284-285, :300-301): try → walk-and-build → try once more → else raise. model_rebuild(force=False) does real work on incomplete classes (__pydantic_complete__ is False there), so after a complete walk the second dump can't hit the same placeholder.
  • Cycle-safe. _build_deferred_serializers tracks visited instances by id() in seen (base.py:53-58); reference cycles terminate.
  • Fail-loud, not fail-silent. Non-MockValSer TypeErrors re-raise unchanged (base.py:283, :300). The catch is narrow.
  • Test pins the exact regressiontests/test_serialize_as_any_default.py:136 builds the child only via the parent's model_validate (never validated/dumped on its own), then dumps both model_dump and model_dump_json. That's the path that left the serializer deferred.

Follow-ups (non-blocking — file as issues)

  • Thread-safety of first-touch build. Both code-reviewer and python-expert flagged this independently: defer_build moves class-level schema mutation (__pydantic_serializer__, __pydantic_complete__) from single-threaded import time to first serialize, which on a multi-request service (the embedded sales agent that motivated this) can be concurrent. model_rebuild isn't documented thread-safe; worst case is a double-build or a transient half-installed-serializer error on first touch, not corruption. A startup warm-up (model_rebuild the hot response classes) or a module-level lock around the walk closes the window.
  • Unguarded serialization paths. The catch only protects AdCPBaseModel.model_dump[_json]. An AdCP model nested inside a plain pydantic.BaseModel parent dumped with serialize_as_any=True, or an adopter calling TypeAdapter(SomeAdCPModel).dump_python(...), hits the placeholder without the repair. No first-party path does this today (grep is clean), and the failure is loud — but worth a documented known-limitation.
  • adcp-3.0 storyboard leg dropped. Justified — the commit message names npm notarget @adcp/sdk@adcp-3.1, so the job was already red and the upstream publish process moved to latest. The adcp-3.1latest switch trades a pinned compat gate for a moving tag that will silently follow a future @adcp/sdk major; acceptable given the upstream change, but the rationale lives only in the commit body, not the PR description. Worth a one-line note that we no longer assert against the 3.0 line.

Minor nits (non-blocking)

  1. MockValSer substring match is pinned to two private internals. base.py:283/:300 depends on the pydantic-core (Rust) error template and the placeholder class name, across a wide pydantic>=2.0,<3 range with no API contract. A reword or rename in any 2.x silently turns the repair off. A structural pre-check (any reachable model with __pydantic_complete__ False) would be more robust than message parsing; low urgency since the string is Rust-generated and stable today.
  2. Walk skips __pydantic_extra__. _build_deferred_serializers iterates value.__dict__.values() (base.py:56), which misses extras under extra='allow'. The base is ignore/forbid so this is inert today, but the docstring explicitly invites subclasses to override model_config — a nested deferred model held only in an allow subclass's extras would be missed. Also iterating getattr(value, '__pydantic_extra__', None) would future-proof it.

Both python-expert and ad-tech-protocol-expert noted they reasoned from pydantic v2 semantics rather than a live byte-diff (sandbox blocked execution). The PR's own 5804-pass run plus the storyboard/conformance jobs on this branch cover the empirical side.

Approving on the strength of the wire-invariance confirmation plus the narrow, fail-loud repair path. Follow-ups noted above.

The @adcp/sdk publish process changed to publish stable releases under the
`latest` dist-tag and no longer emits an `adcp-3.1` tag (AdCP 3.1 is the
current stable line, shipped as `latest`). The removed `adcp-3.1` tag made
every storyboard runner fail with `npm notarget @adcp/sdk@adcp-3.1`.

Run the storyboard matrix against ["adcp-3.0", "latest"]: the sticky
adcp-3.0 tag (7.11.7) is a fixed, reproducible backwards-compat floor for
the AdCP 3.0 runner line, and `latest` tracks the current AdCP 3.1 stable
line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped fix. Right call to move the per-model build cost off the import path — 18 of 1564 classes built after a bare import adcp is the whole point, and the OOM on the 2Gi sales agent is a real failure mode, not a hypothetical.

The load-bearing insight is that defer_build=True is build-timing only: a lazily-built SchemaSerializer is identical to the eager one. ad-tech-protocol-expert: sound — the wire-shape-critical classes (the Format/RepeatableAssetGroup open unions with the UnknownFormatAsset arm, the _ergonomic.py coerced request/response models, the XOR selectors in aliases.py) are all rebuilt with model_rebuild(force=True) at import time, which runs regardless of defer_build. Discriminated-union dispatch and the forward-compat arms are untouched. code-reviewer: sound-with-caveats, same.

Things I checked

  • serialize_as_any=True interaction is the actual bug being fixed: a model that only ever appears nested keeps a placeholder serializer, and serialize_as_any dispatch hits it. The MockValSer catch + graph-walk + single retry (base.py model_dump/model_dump_json) targets exactly that. Single retry is sufficient because the walk is pre-order over the already-validated instance graph — one pass rebuilds every reachable instance.
  • New test test_serialize_as_any_builds_deferred_nested_serializer pins the exact broken path — child built only via parent model_validate, then dumped. This is the primary user-facing change and it's covered, not shipped on faith.
  • Not a breaking public-API change: defer_build is a model_config addition; model_dump/model_dump_json signatures and output are unchanged. perf(types): is the correct non-breaking conventional-commit prefix.
  • RegistryBaseModel (base.py:270) intentionally left eager — it keeps extra='allow', no serialize_as_any injection, never hits the MockValSer path. Correct omission.
  • Forward-compat machinery unaffected: _forward_compat.py and aliases.py force=True rebuilds un-defer exactly the classes they patch; everything else stays deferred.

Follow-ups (non-blocking — file as issues)

  • Graph walk misses __pydantic_extra__. _build_deferred_serializers recurses value.__dict__.values(), but on extra='allow' models overflow values live in __pydantic_extra__, not __dict__ (verified against pydantic v2). A deferred model instance reachable only via an extra value would not be rebuilt and the retry would re-raise. Narrow in practice — wire extras arrive as raw dicts, not model instances, so this only bites on manual model-instance assignment to an extra field — but worth or {}-iterating getattr(value, "__pydantic_extra__", None) to close it. Add a test that puts a deferred model inside an extra field.
  • MockValSer string-match is version-coupled. The retry fires only when the pydantic-core TypeError contains the literal MockValSer — an internal pydantic-core symbol with no stability guarantee. A pydantic-core rewording silently turns this into an unconditional re-raise and the optimization breaks with no test signal. Pin it: assert the current pydantic-core actually emits the string, and tie it to a version in a comment.

Minor nits (non-blocking)

  1. Duplicated retry block. model_dump and model_dump_json now carry the same try/except/_build_deferred_serializers/retry verbatim. Extract a helper so the two can't drift.
  2. CI tag swap rides along. The adcp-3.1 to latest storyboard change (ci.yml) is a real fix for the npm notarget failure, but it's unrelated to the memory work and trades a sticky tag for a moving one that can turn a green PR red later with no code change. Noting it in the body or splitting it would keep the perf commit clean — a memory optimization is a notable place to also re-point a CI matrix leg.

Approving on the strength of the matching pin test plus both expert verdicts landing on sound. Follow-ups noted.

@bokelley bokelley left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the deferred pydantic build change and ran focused local coverage: tests/test_serialize_as_any_default.py, tests/test_type_coercion.py, tests/test_response_builder_subclass.py, tests/test_rootmodel_proxy.py, and tests/test_extra_policy.py all passed locally (73 tests). CI is green.

@bokelley bokelley merged commit f418961 into main Jun 28, 2026
26 checks passed
@bokelley bokelley deleted the create-pr-with-push-pr-skill branch June 28, 2026 04:38
@aao-ipr-bot aao-ipr-bot Bot mentioned this pull request Jun 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants