audit: classify api round-trip failures by measured divergence and rank a worklist by webern · Pull Request #225 · webern/mx

webern · 2026-06-20T19:20:47Z

Summary

Reworks the api round-trip failure classifier (audit/classify.py) to rank fixes purely by measured behavior, and produces the ranked worklist #212 asked for.

The classifier keyed its categories on the hand-authored support attribute in data/api.features.xml — a prediction of round-trip behavior, not a measurement, and demonstrably wrong (part-group was marked full yet dropped data, #224). Ranking built on a fallible prediction is untrustworthy.

What changed:

- No api.features.xml cross-reference. Whether a drop is intended is a present-day human decision (#214), not something the classifier asserts.
- Every difference becomes a signature: drop:, add:, value:, attr:, or reorder:. A file's distance to passing is its count of unique signatures (a tag dropped many times counts once).
- value: and attr: signatures come from an alignment walk that survives sibling drops; drop: and add: stay on the O(n) multiset comparison.
- Status is PASS, FAIL, or CRASH. A FAIL with no reorder is a candidate; reorders are expected mx::api behavior, deferred to test normalization (#214, #220).
- The worklist ranks signatures by the number of candidate files they are the sole blocker of. A greedy batch plan answers "minimal changes -> most files" directly, and the report adds a distance histogram and distance-1..3 near-miss buckets.
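The measured model in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the code in audit/classify.py: the function names (`classify_status`, `drop_add_signatures`, `sole_blocker_worklist`, `near_misses`), the flat tag lists fed to the multiset pass, and the `kind:path` signature strings are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical sketch of the measured model; names and the "kind:path"
# signature format are illustrative, not taken from audit/classify.py.

def classify_status(signatures, crashed=False):
    """PASS / FAIL / CRASH from the measured signatures of one file."""
    if crashed:
        return "CRASH"
    return "FAIL" if signatures else "PASS"

def drop_add_signatures(expected_tags, actual_tags):
    """O(n) multiset comparison: each tag whose count falls short in the
    actual output yields one drop: signature, each surplus tag one add:
    signature, however many occurrences differ."""
    exp, act = Counter(expected_tags), Counter(actual_tags)
    sigs = {f"drop:{tag}" for tag in exp - act}
    sigs |= {f"add:{tag}" for tag in act - exp}
    return sigs

def distance(signatures):
    """Distance to passing: the number of unique signatures."""
    return len(set(signatures))

def is_candidate(signatures):
    """A FAIL with no reorder: signature."""
    return bool(signatures) and not any(s.startswith("reorder:") for s in signatures)

def sole_blocker_worklist(files):
    """files maps filename -> signature set for failing files. Rank signatures
    by the candidate files they alone block, then by total files they touch."""
    sole, total = Counter(), Counter()
    for sigs in files.values():
        total.update(set(sigs))
        if is_candidate(sigs) and distance(sigs) == 1:
            sole[next(iter(sigs))] += 1
    return sorted(total, key=lambda s: (-sole[s], -total[s], s))

def distance_histogram(files):
    return Counter(distance(sigs) for sigs in files.values())

def near_misses(files, max_distance=3):
    """Candidate files at distance 1..max_distance, bucketed by distance."""
    buckets = {d: [] for d in range(1, max_distance + 1)}
    for name, sigs in sorted(files.items()):
        if is_candidate(sigs) and distance(sigs) <= max_distance:
            buckets[distance(sigs)].append(name)
    return buckets

if __name__ == "__main__":
    files = {
        "a.xml": {"add:encoding"},
        "b.xml": {"add:encoding", "attr:part-name/@print-object"},
        "c.xml": {"add:encoding", "reorder:identification"},
    }
    for name, sigs in sorted(files.items()):
        print(name, classify_status(sigs), distance(sigs), is_candidate(sigs))
    print(sole_blocker_worklist(files))
```

Counting unique signatures rather than raw differences keeps one repeated defect (for example an injected element in every part) from looking like many independent fixes.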

Ran natively over the corpus: 828 failing files, 0 crashes, 550 reorder-free candidates. The top signals are add: signatures, where mx::api injects elements the source lacked (encoding, identification, type); the old support-based classifier could never surface these. The first two fixes (stop emitting an empty <encoding/>, preserve part-name/@print-object) land 26 files; 15 fixes land 71.

The individual fixes are filed as separate issues under #208 / #213.

Testing

- `make test-audit` (classifier unit tests): 19 cases, green
- Rewrote the classifier tests for the measured model (signatures, distance, candidate/reorder split, sole-blocker worklist, greedy batch plan)
- End-to-end: built mxtest-api-roundtrip, ran `make dump-api-roundtrip` (828 files) and `make classify-api-roundtrip`, which produces the worklist and batch plan

References

- Closes #212 (api: rank round-trip failure fixes by files unblocked vs effort)
- Progresses #208 (api: expand corpus coverage of the api round-trip pass-list); feeds #213 (api: implement ranked fixes and grow the round-trip corpus baseline)
- Reworks PR #217 (feat: dump and classify api round-trip failures); motivated by #224 (fix: part-group round-trip fidelity, and a classifier category for supported-element drops)
- Relates to #214 (api: define divergence policy and guardrails for round-trip corpus coverage) / #220 (api reorders identification/encoding children on round-trip); reorders are deferred to the divergence policy. Changes the premise of #219 (investigate api dropping elements marked support=full on round-trip) / #221 (track footnote and level support in api.features.xml), since the api.features.xml cross-reference they describe is removed.

…upport opinion

The classifier keyed categories (B/D/E/G) on the hand-authored `support` attribute in data/api.features.xml -- a prediction of round-trip behavior, not a measurement, and demonstrably wrong (part-group was marked full yet dropped data, #224). Ranking built on a fallible prediction is untrustworthy.

Rework classification to be grounded only in what each expected/actual pair actually shows:

- Drop the api.features.xml cross-reference entirely. Whether a drop is intended is a present-day human call (#214), not something the classifier asserts.
- Reduce every difference to a signature (drop/add/value/attr/reorder); a file's distance to passing is its count of unique signatures (a tag dropped many times is one signature).
- value/attr now come from the alignment walk (recurse SequenceMatcher equal blocks) so they survive sibling drops; drops/adds stay on the O(n) multiset.
- Status is PASS/FAIL/CRASH. A FAIL with no reorder is a candidate; reorders are expected mx::api behavior, deferred to test normalization (#214).
- Worklist ranks signatures by candidate files they are the sole blocker of, then by total files blocked; report adds a distance histogram and distance-1..3 near-miss buckets so small fix-sets are visible.

Rewrite the classifier tests for the measured model and update the design doc, audit/README, the explain-api-roundtrip skill, and the CLI help to match.
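The alignment walk described in the commit message (recursing into SequenceMatcher equal blocks) might look roughly like this using the standard library's `difflib` and `xml.etree.ElementTree`. It is a hedged sketch: `align_walk`, the slash-joined path format, and the whitespace-stripped text comparison are illustrative assumptions, not the classifier's actual code.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical sketch of an alignment walk; names and the path format are
# illustrative, not taken from audit/classify.py.

def align_walk(expected, actual, path="", out=None):
    """Collect value:/attr: signatures for element pairs that align.

    Children are paired by aligning their tag sequences with
    difflib.SequenceMatcher and recursing only into 'equal' blocks.
    Unmatched children (dropped or added siblings) are skipped here;
    the multiset pass reports them as drop:/add:.
    """
    if out is None:
        out = set()
    here = f"{path}/{expected.tag}" if path else expected.tag
    if (expected.text or "").strip() != (actual.text or "").strip():
        out.add(f"value:{here}")
    for name in sorted(expected.attrib.keys() | actual.attrib.keys()):
        if expected.get(name) != actual.get(name):
            out.add(f"attr:{here}/@{name}")
    exp_kids, act_kids = list(expected), list(actual)
    matcher = difflib.SequenceMatcher(
        a=[child.tag for child in exp_kids],
        b=[child.tag for child in act_kids],
        autojunk=False,
    )
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            for e_child, a_child in zip(exp_kids[i1:i2], act_kids[j1:j2]):
                align_walk(e_child, a_child, here, out)
    return out

if __name__ == "__main__":
    expected = ET.fromstring(
        '<score><part-list><score-part id="P1">'
        '<part-name print-object="no">Piano</part-name>'
        '</score-part></part-list><work/><movement-title>X</movement-title></score>'
    )
    actual = ET.fromstring(
        '<score><part-list><score-part id="P1"><part-name>Piano</part-name>'
        '</score-part></part-list><movement-title>Y</movement-title></score>'
    )
    print(sorted(align_walk(expected, actual)))
```

Aligning by tag sequence is what lets value/attr detection survive a dropped sibling: with `<work>` missing from the actual output, a positional zip would compare `<work>` against `<movement-title>` and report spurious differences, whereas the equal blocks still pair the two `<movement-title>` elements and report only the real value change.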

The signature worklist ranks fixes independently, but most candidate files need several fixes before they pass strict comparison. Add build_batch_plan: a greedy set-cover that, at each step, picks the signature clearing the most candidate files outright, answering #212's "minimal changes -> most files" directly. The report gains a batch_plan section and the stdout summary prints it. On the current corpus the first two fixes -- stop emitting an empty <encoding/> and preserve part-name/@print-object -- land 26 files; 15 fixes land 71.
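A greedy set-cover batch plan in the spirit of build_batch_plan can be sketched as follows. This is an assumption-laden illustration, not the PR's function: the input shape (filename to signature set), the return tuples, and the tie-break used when no single fix clears a file outright (prefer the signature touching the most remaining files, then alphabetical order) are hypothetical choices.

```python
from collections import Counter

# Hypothetical sketch of a greedy batch plan; not the PR's build_batch_plan.

def build_batch_plan(candidates, max_steps=None):
    """Greedy set-cover over candidate files.

    candidates maps filename -> set of signatures still blocking that file.
    Each step picks the signature whose fix clears the most files outright
    (files for which it is the last remaining signature). Ties, including the
    case where no single fix clears anything, go to the signature present in
    the most remaining files, then to alphabetical order.
    Returns a list of (signature, files_cleared_by_this_step, cumulative_cleared).
    """
    remaining = {name: set(sigs) for name, sigs in candidates.items() if sigs}
    plan, cumulative = [], 0
    while remaining and (max_steps is None or len(plan) < max_steps):
        clears = Counter(next(iter(s)) for s in remaining.values() if len(s) == 1)
        touches = Counter(sig for s in remaining.values() for sig in s)
        best = min(touches, key=lambda sig: (-clears[sig], -touches[sig], sig))
        for sigs in remaining.values():
            sigs.discard(best)
        cleared = sorted(name for name, sigs in remaining.items() if not sigs)
        for name in cleared:
            del remaining[name]
        cumulative += len(cleared)
        plan.append((best, cleared, cumulative))
    return plan

if __name__ == "__main__":
    candidates = {
        "a.xml": {"add:encoding"},
        "b.xml": {"add:encoding", "attr:part-name/@print-object"},
        "c.xml": {"attr:part-name/@print-object"},
        "d.xml": {"add:encoding", "add:type", "value:x"},
    }
    for step, (sig, cleared, total) in enumerate(build_batch_plan(candidates), 1):
        print(step, sig, cleared, total)
```

Greedy set-cover is not guaranteed to find the smallest set of fixes for a given number of files (exact set cover is NP-hard), but it is cheap, and every prefix of the plan reads directly as "the next k fixes and the files they land".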

github-actions · 2026-06-20T19:21:48Z

gen-quality `gen/`

gen-quality: 84.5 / 100   (floor 84.5, +0.0)

  structure     86.5  x0.50   [fn 90.5 / file 82.6]
  cyclomatic    88.4  x0.25
  cognitive     76.6  x0.25

  409 functions across 31 files, 7702 lines (largest file 1044)
  max cc 56  max cognitive 44  max fn loc 152

Worst offenders (top 5 per axis; full lists in score.json):
  cyclomatic gen/xsd/analyze.py:311     report                             56
  cyclomatic gen/plates/build.py:956    _validate_config_against_ir        35
  cyclomatic gen/press/context.py:145   plate_context                      34
  cyclomatic gen/__main__.py:46         _ir                                23
  cyclomatic gen/tests/test_ir.py:102   _check_references                  20
  cognitive  gen/xsd/analyze.py:311     report                             44
  cognitive  gen/ir/resolve.py:119      flat_elements                      40
  cognitive  gen/tests/test_ir.py:102   _check_references                  38
  cognitive  gen/press/context.py:145   plate_context                      37
  cognitive  gen/xsd/analyze.py:207     _sccs                              37
  size       gen/xsd/analyze.py:311     report                             152
  size       gen/press/context.py:145   plate_context                      96
  size       gen/plates/build.py:533    _value_plate                       89
  size       gen/plates/build.py:956    _validate_config_against_ir        89
  size       gen/ir/resolve.py:119      flat_elements                      78

Commit 6d4db346b0eb9ec5b90b2551208453b9c343e0dd.

github-actions · 2026-06-20T19:35:03Z

Coverage report

Core-dev coverage `src/private/mx/core/`

Metric      Coverage   Covered / Total
Lines       77.9%      28539 / 36624
Functions   74.4%      6360 / 8550
Branches    50.7%      22672 / 44725

API coverage `src/private/mx/{api,impl,utility}/`

Metric      Coverage   Covered / Total
Lines       72.7%      5428 / 7468
Functions   60.3%      1831 / 3034
Branches    43.7%      4532 / 10375

Core HTML report | API HTML report

Commit 6d4db346b0eb9ec5b90b2551208453b9c343e0dd.

webern added 2 commits June 20, 2026 18:19

webern added the testing, non-breaking (fixes or implementation that do not require breaking changes), and ai (issues opened by, or through, a coding agent) labels Jun 20, 2026, with Claude

This was referenced Jun 20, 2026:

- test: api round-trip expects mx's attribution stamp instead of stripping it #232 (merged)
- fix: round-trip part-name/print-object instead of forcing print-object="no" #233 (merged)

webern merged commit 9c19b20 into main Jun 20, 2026
7 checks passed

webern deleted the claude/tender-mccarthy-ye7mii branch June 20, 2026 20:56
