Tracking issue for growing the number of corpus files that round-trip cleanly through the mx::api layer.
Background
The api round-trip is a standalone binary (mxtest-api-roundtrip, built from src/private/mxtest/api/CorpusRoundtripMain.cpp) with two modes:
regression (make test-api-roundtrip) — the CI gate. Runs only the files in src/private/mxtest/api/roundtrip-baseline.txt and fails if any fail. The baseline currently pins exactly one file (ksuite/k016a_Miscellaneous_Fields.xml).
discovery (make discover-api-roundtrip) — walks all ~829 eligible corpus files, prints PASS|FAIL|SKIP|LOADFAIL|GETDATAFAIL|CREATEFAIL per file, always exits 0. This exists to grow the pinned list.
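As a rough illustration of how discovery output could feed the pinned list, the sketch below tallies statuses and collects passing paths. The six status tokens come from the description above; the `STATUS path` line layout and the sample paths other than the pinned file are assumptions, not the binary's documented format.

```python
from collections import Counter

# Status tokens listed in this issue. The "STATUS<space>path" layout is assumed.
STATUSES = {"PASS", "FAIL", "SKIP", "LOADFAIL", "GETDATAFAIL", "CREATEFAIL"}

def parse_discovery(lines):
    """Return (counts, passing_paths) from discovery-mode output lines."""
    counts = Counter()
    passing = []
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2 or parts[0] not in STATUSES:
            continue  # skip banners, blank lines, anything unrecognized
        status, path = parts
        counts[status] += 1
        if status == "PASS":
            passing.append(path)
    return counts, passing

# Hypothetical sample output; only the first path appears in this issue.
sample = [
    "PASS ksuite/k016a_Miscellaneous_Fields.xml",
    "FAIL hypothetical/a.xml",
    "LOADFAIL hypothetical/b.xml",
    "",
]
counts, passing = parse_discovery(sample)
print(dict(counts), passing)
```

Because discovery always exits 0, a script like this has to read the per-file lines rather than the exit code.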
Each file goes through createFromFile → getData → createFromScore → writeToStream, and the output is normalized and compared to the original with the same machinery corert uses (mxtest/corert/Compare + Fixer), after stripping mx's own <software> stamp.
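The strip-then-compare step can be pictured with a small standalone sketch. It is not the mxtest/corert Compare + Fixer machinery: it uses Python's standard-library C14N as a stand-in for normalization, and the `is_own_stamp` predicate and the stamp text are hypothetical.

```python
import xml.etree.ElementTree as ET

def strip_software(xml_text, is_own_stamp):
    """Remove <software> elements whose text satisfies is_own_stamp."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "software" and is_own_stamp(child.text or ""):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

def normalized(xml_text):
    # Canonical XML with surrounding whitespace stripped, so indentation
    # differences between input and output do not count as divergence.
    return ET.canonicalize(xml_text, strip_text=True)

original = (
    "<score-partwise><identification><encoding>"
    "<software>Finale</software>"
    "</encoding></identification></score-partwise>"
)
# Hypothetical round-trip output: re-indented, with an extra stamp added.
output = """<score-partwise>
  <identification>
    <encoding>
      <software>mx (hypothetical stamp)</software>
      <software>Finale</software>
    </encoding>
  </identification>
</score-partwise>"""

cleaned = strip_software(output, lambda text: text.startswith("mx"))
same = normalized(cleaned) == normalized(original)
print(same)
```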
"Increasing the number of round-tripped files" means: find files that pass discovery (now, or after a small api change) and add their paths to roundtrip-baseline.txt. Growth is intentionally manual and monotonic — once pinned, regression mode prevents backsliding.
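The manual, monotonic growth rule could be supported by a helper along these lines. This is a sketch only: it assumes the baseline holds one path per line and that header lines start with `#`, neither of which is confirmed here, and `merge_baseline` is a hypothetical name.

```python
def merge_baseline(baseline_text, new_passes):
    """Append newly passing paths to the baseline; never remove a pinned one."""
    lines = baseline_text.splitlines()
    # Assumed format: one path per line, '#' starts a header/comment line.
    pinned = {
        ln.strip()
        for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    }
    additions = sorted(set(new_passes) - pinned)
    return "\n".join(lines + additions) + "\n"

baseline = "# header (format assumed)\nksuite/k016a_Miscellaneous_Fields.xml\n"
merged = merge_baseline(
    baseline,
    ["hypothetical/new.xml", "ksuite/k016a_Miscellaneous_Fields.xml"],
)
print(merged)
```

Existing lines are kept verbatim and in order, so a diff of the baseline shows only additions.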
The catch: mx::api is a deliberate subset of MusicXML. The historical capture in the baseline header shows that of 829 files, 794 produced output but only 1 survived strict comparison — because the api drops <part-group>, <credit>, most of <defaults>, and reorders <encoding> children. Nearly every real file diverges by design.
Key decision
Whether "round-trip" should mean strict full-DOM fidelity (today's rule — small candidate pool) or api-scoped fidelity (compare only the parts mx::api claims to model, ignoring known-dropped subtrees — far larger pool, more meaningful coverage signal). Recommendation: keep strict regression as the hard gate, and add an api-scoped discovery mode as the growth engine. This decision is resolved in Phase 1/Phase 5.
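To make the api-scoped option concrete, here is a minimal sketch of a comparison that ignores the known-dropped subtrees. The dropped element names come from the Background section; removing `<defaults>` wholesale (the api drops only most of it) and sorting `<encoding>` children are simplifying assumptions, and `api_scoped` is a hypothetical helper, not part of mxtest.

```python
import xml.etree.ElementTree as ET

# Elements this issue says mx::api drops. <defaults> is only "mostly"
# dropped, so pruning it entirely is a simplification.
DROPPED = {"part-group", "credit", "defaults"}

def api_scoped(xml_text):
    """Canonical form of the document restricted to what the api models."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in DROPPED:
                parent.remove(child)
        if parent.tag == "encoding":
            # The api reorders <encoding> children, so compare them
            # order-insensitively by sorting on (tag, text).
            parent[:] = sorted(parent, key=lambda e: (e.tag, e.text or ""))
    return ET.canonicalize(ET.tostring(root, encoding="unicode"), strip_text=True)

original = (
    "<score-partwise><credit><credit-words>Title</credit-words></credit>"
    "<identification><encoding><software>X</software>"
    "<encoding-date>2020-01-01</encoding-date></encoding></identification>"
    '<part-list><part-group type="start" number="1"/>'
    '<score-part id="P1"><part-name>P</part-name></score-part></part-list>'
    "</score-partwise>"
)
roundtripped = (
    "<score-partwise><identification><encoding>"
    "<encoding-date>2020-01-01</encoding-date><software>X</software>"
    "</encoding></identification>"
    '<part-list><score-part id="P1"><part-name>P</part-name></score-part>'
    "</part-list></score-partwise>"
)

strict_equal = ET.canonicalize(original) == ET.canonicalize(roundtripped)
scoped_equal = api_scoped(original) == api_scoped(roundtripped)
print(strict_equal, scoped_equal)
```

The same pair of documents fails the strict rule and passes the scoped one, which is the gap between the two candidate pools.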
Goal
A repeatable analysis pipeline that buckets the corpus by "distance to passing," producing a ranked worklist of small api additions and the files each one unblocks — then executing those and growing the pinned baseline.
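One way to picture the ranked worklist is sketched below. It assumes an earlier step has already produced, for each file, the set of features still blocking a pass; that input shape, the feature names, and the file names are hypothetical. Features are ranked by how many files they alone block, since those files pass as soon as the feature lands.

```python
from collections import defaultdict

def rank_worklist(blockers_by_file):
    """blockers_by_file: {path: set of features still preventing a pass}."""
    sole = defaultdict(list)    # feature -> files that it alone blocks
    touched = defaultdict(int)  # feature -> files it appears in at all
    for path, blockers in blockers_by_file.items():
        for feature in blockers:
            touched[feature] += 1
        if len(blockers) == 1:
            sole[next(iter(blockers))].append(path)
    # Most immediately-unblocked files first, then overall reach, then name.
    ranked = sorted(touched, key=lambda f: (-len(sole[f]), -touched[f], f))
    return [(f, sorted(sole[f]), touched[f]) for f in ranked]

# Hypothetical analysis result.
data = {
    "a.xml": {"credit"},
    "b.xml": {"credit"},
    "c.xml": {"credit", "part-group"},
    "d.xml": {"part-group"},
    "e.xml": set(),  # already passing, contributes nothing
}
worklist = rank_worklist(data)
for feature, unblocked, reach in worklist:
    print(feature, unblocked, reach)
```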
Phases
Related
DirectionWriter ignores orderedComponents - interleaved direction-type children lose original order on write #206, impl: parseWords never sets WordsData.isColorSpecified - words color does not round-trip #207 (round-trip fidelity bugs surfaced in the impl layer)
<divisions> and dropped direction <voice> #237 (segno/coda probes blocked from the strict gate by empty-measure <divisions> + dropped direction <voice>)