Skip to content

chore(types): prune unused bundled generated modules#960

Merged
bokelley merged 2 commits into
mainfrom
restructure-dedup-completion
Jun 28, 2026
Merged

chore(types): prune unused bundled generated modules#960
bokelley merged 2 commits into
mainfrom
restructure-dedup-completion

Conversation

@andybevan-scope3

Copy link
Copy Markdown
Collaborator

What this does, in plain terms

The library auto-generates Python type files from JSON schemas into
src/adcp/types/generated_poc/. One sub-folder, bundled/, had grown to
dominate that tree: ~92% of all generated classes lived there.

That folder is generated from "bundled" schemas, which inline their entire
reference graph. So every bundled message file re-emitted its own private copy
of the same shared sub-schemas (provenance, verification, assets, etc.) over and
over (e.g. VerifyAgent appeared 3,762 times).

The catch: almost none of it is used. Of ~97 bundled modules, exactly one
is imported by real code: bundled/protocol/get_adcp_capabilities_response.py
(via adcp.types.capabilities, whose typed sub-models only exist in the inlined
form). consolidate_exports.py already excludes the bundled tree from the public
namespace, and it isn't even loaded at import adcp.

So this PR deletes the 95 unreferenced bundled modules and teaches the generator
not to recreate them.

What changed

  • Deleted 95 unused bundled/ modules (kept only the one consumed module +
    its package __init__ files).
  • Added a prune_unused_bundled_modules() step to scripts/generate_types.py
    (runs after post-generation fixes, before consolidation). Generation itself is
    unchanged; the step only removes the unreferenced output afterward, so every
    retained module stays byte-identical. Keep-set lives in the BUNDLED_KEEP
    constant.

Impact

Before After
Generated class definitions 30,173 2,382 (−27,791, −92%)
Generated source lines ~686k ~61k (−625k)
Full test suite 5,824 passed 5,824 passed, 0 failed

make lint, make typecheck, and the full pytest suite all pass.

Scope notes

  • This is a source-tree / class-count cleanup. It does not lower
    import adcp memory (the bundled tree was lazy and never loaded). Import-time
    RSS is a separate concern handled by the defer_build work in the el-paso
    worktree.
  • Follow-up (not in this PR): regeneration still generates bundled/media_buy
    and then prunes it. That split-generation pass is now dead weight and can be
    skipped to speed up make regenerate-schemas.
  • A full make regenerate-schemas run to exercise the generation→prune
    integration end-to-end has not been run yet; the end-state and the prune
    function were verified directly (deletion + full suite green, idempotent prune).

🤖 Generated with Claude Code

The generated_poc/bundled tree is built from self-contained inlined JSON
schemas, so each bundled message re-emits its own copy of the shared
provenance/verification/asset sub-schemas. That accounts for ~92% of the
generated tree (~27.8k of ~30k class definitions), yet only one bundled
module is imported anywhere: protocol/get_adcp_capabilities_response (via
adcp.types.capabilities). consolidate_exports.py already excludes bundled
from the public namespace, and the tree is not loaded at import adcp.

Remove the 95 unreferenced bundled modules and add a post-generation prune
step to generate_types.py so regeneration does not recreate them. Generation
runs unchanged; the step only deletes unreferenced output, leaving every
retained module byte-identical.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes Jun 27, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right cleanup, correctly fenced. The bundled tree is generated dead weight that consolidate_exports.py already excludes from the public namespace — deleting it changes the committed source tree without touching the wire or the public API.

Things I checked

  • Only one bundled module has a real importer, and it survives. src/adcp/types/capabilities.py (L38–L128) imports solely from bundled/protocol/get_adcp_capabilities_response.py, which is in BUNDLED_KEEP. That module inlines its full $ref graph — zero intra-bundled imports — so deleting its 95 siblings can't dangle it.
  • No other runtime importer of bundled.*. client.py:267 and _generated.py:1888 import get_adcp_capabilities_request from the non-bundled protocol/ dir, not the deleted bundled/protocol/ one. The tests/test_migrate_v3_to_v4.py:142,781 hits are codemod string fixtures (bundled.x), not imports.
  • Retained module is byte-identical. get_adcp_capabilities_response.py is not in the diff — only deletions plus the script. The "generation runs unchanged" claim holds; this is prune-after-generate, not a hand-edit to generated source.
  • Prune logic is sound. relative_to(bundled_dir) in BUNDLED_KEEP is exact, parts-anchored Path equality — no prefix false-match over-deletes the kept module. Empty-dir cleanup is deepest-first (key=len(d.parts), reverse=True); bundled/__init__.py is kept so the root never empties. Idempotent, with an exists() guard. code-reviewer: clean.
  • Ordering is correct. Prune runs after post-generation fixes, before consolidation. consolidate_exports.py filters parts[0] != "bundled" (L234) and prefers non-bundled in its first-seen dedup, so removing the dead copies first can only reduce work — nothing downstream reads a pruned module.
  • No public-API surface change, no semver signal needed. chore(types): is the right prefix.

Follow-ups (non-blocking — file as issues)

  • Generation→prune integration not exercised end-to-end. Per the PR's own scope note, make regenerate-schemas hasn't been run to validate the generate-then-prune pass; the end-state and the prune function were verified directly (suite green, idempotent). The shipped artifact — the committed tree — is validated; the generator path that reproduces it is not. Run it once before the next regen relies on it.
  • One-time SCHEMA_DELTAS.md noise. diff_generated_types.snapshot() walks the whole tree, so the first post-merge regeneration diffs the old full-bundled committed tree against the freshly-pruned one and emits a large one-time "removed" delta. Self-corrects on every subsequent run. Worth a line in the PR body so the next regenerator isn't alarmed.
  • bundled/media_buy is still generated then pruned (PR already flags this). That split-generation pass is now dead weight in make regenerate-schemas and can be skipped.

Minor nits (non-blocking)

  1. Stale docstring cross-ref. src/adcp/decisioning/refine.py:57 has a :class: reference to adcp.types.generated_poc.bundled.media_buy, which this PR deletes. Docstring only — no runtime impact — but the Sphinx xref now points at nothing. Repoint it at the non-bundled RefinementApplied.status source.

27,791 fewer class definitions for one retained module is a notable ratio to have been carrying. Approving on the strength of the import audit plus code-reviewer clean.

The @adcp/sdk adcp-3.1 npm dist-tag is not published, so every storyboard
runner job fails at install with ETARGET "No matching version found". Track
the latest stable release via the latest dist-tag and keep the sticky
adcp-3.0 leg as a backwards-compat floor.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix. Deletes 27.8k generated class definitions that were already excluded from the public namespace and never loaded at import adcp — the right cleanup, and it's data-driven (one bundled module is actually consumed, the other 95 are dead weight).

Things I checked

  • Only one bundled module is load-bearing. src/adcp/types/capabilities.py:38 is the sole importer of the bundled tree, and it imports exactly bundled/protocol/get_adcp_capabilities_response — which BUNDLED_KEEP retains. That module is self-contained (imports only adcp.types.base, _str_enum, pydantic — no intra-bundled refs), so deleting its 95 siblings can't break it.
  • Consolidation output is unchanged. consolidate_exports.py:234 and :270 hard-filter parts[0] != "bundled", so the bundled tree never entered the consolidated namespace, the collision map, or the first-seen dedup ordering. Pruning before consolidate_exports.py runs is a no-op for its output.
  • Prune ordering is correct. prune_unused_bundled_modules() runs after apply_post_generation_fixes() and before consolidation (scripts/generate_types.py:610-614). post_generate_fixes.py patches bundled/media_buy/*, so it must run before the prune deletes them — it does.
  • Prune is idempotent and safe. Empty-dir cleanup only rmdirs dirs where not any(directory.iterdir()); bundled/ and bundled/protocol/ keep retained files and survive. restore_unchanged_files() skips non-existent paths, so it won't git checkout-resurrect pruned output. Re-running over a pruned tree removes 0.
  • No public-API / semver impact. Nothing reachable from adcp.* changes — bundled was never on the public surface. chore(types): is the right prefix; no ! needed.
  • _generated.py does not import bundled — confirms the tree was never loaded at import time, as the PR claims.

Follow-ups (non-blocking — file as issues)

  • latest dist-tag is a moving target. Switching the storyboard matrix adcp-3.1latest means a new @adcp/sdk publish can turn the blocking storyboard-required-gate red with no change in this repo. Intended (the comment says so) and the sticky adcp-3.0 leg stays a reproducible floor — but worth a line in the PR body acknowledging the external-publish coupling.
  • Dead-weight split-generation pass. Per the PR body, regen still generates bundled/media_buy then prunes it. Skipping that split-generation pass would speed up make regenerate-schemas. Good follow-up.

Minor nits (non-blocking)

  1. Dangling docstring cross-ref. src/adcp/decisioning/refine.py:57 has a :class: ref to adcp.types.generated_poc.bundled.media_buy's RefinementApplied.status — a package this PR deletes. There's no Sphinx config in the repo so nothing resolves it at build or runtime; purely cosmetic. Repoint it at RefinementStatus or the wire-enum doc on a future pass.

Test-plan note

The PR body is honest that a full make regenerate-schemas round-trip (generation→prune integration) hasn't been run end-to-end — only the end-state deletion plus a green suite and idempotent-prune were verified directly. The prune is the kind of thing that only bites on a clean regen, so exercising that path before the next schema bump is the real validation. Non-blocking: the retained tree is byte-identical and 5,824/5,824 pass.

Approving on the strength of the consolidation-exclusion invariant plus the self-contained kept module — the deletion can't reach anything live.

@bokelley bokelley left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the generated bundled-module pruning and live import surface. Local focused checks passed: tests/test_code_generation.py, tests/test_decisioning_capabilities_submodule.py, and tests/test_asset_aliases_stable.py all passed (89 tests), plus a syntax/import smoke for _generated.py and the retained bundled capabilities module. I also attempted a full local scripts/generate_types.py run; it remained active in datamodel-code-generator for ~13 minutes, so I stopped it and relied on the successful CI schema-validation job for the full regeneration environment.

@bokelley bokelley merged commit 2b9e315 into main Jun 28, 2026
40 of 41 checks passed
@bokelley bokelley deleted the restructure-dedup-completion branch June 28, 2026 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants