
api: expand corpus coverage of the api round-trip pass-list #208

Description

@webern

Tracking issue for growing the number of corpus files that round-trip cleanly through the mx::api layer.

Background

The api round-trip is a standalone binary (mxtest-api-roundtrip, built from src/private/mxtest/api/CorpusRoundtripMain.cpp) with two modes:

  • regression (make test-api-roundtrip): the CI gate. Runs only the files listed in src/private/mxtest/api/roundtrip-baseline.txt and fails if any of them does not round-trip. The baseline currently pins exactly one file (ksuite/k016a_Miscellaneous_Fields.xml).
  • discovery (make discover-api-roundtrip): walks all ~829 eligible corpus files, prints PASS|FAIL|SKIP|LOADFAIL|GETDATAFAIL|CREATEFAIL per file, and always exits 0. This mode exists to grow the pinned list.
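
The difference between the two modes comes down to how per-file results affect the exit code. A minimal sketch of that policy follows. The names `Status`, `Mode`, `FileResult`, `label`, and `exitCodeFor` are hypothetical and not taken from CorpusRoundtripMain.cpp, and treating every non-PASS status as a regression failure is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-file outcome; the labels mirror what discovery prints.
enum class Status { Pass, Fail, Skip, LoadFail, GetDataFail, CreateFail };

enum class Mode { Regression, Discovery };

struct FileResult
{
    std::string path;
    Status status;
};

// Text printed per file in discovery mode.
inline const char* label( Status s )
{
    switch( s )
    {
        case Status::Pass: return "PASS";
        case Status::Fail: return "FAIL";
        case Status::Skip: return "SKIP";
        case Status::LoadFail: return "LOADFAIL";
        case Status::GetDataFail: return "GETDATAFAIL";
        case Status::CreateFail: return "CREATEFAIL";
    }
    assert( false && "unhandled Status" );
    return "UNKNOWN";
}

// Discovery always reports success; regression fails on the first
// pinned file that is anything other than PASS (assumed policy).
inline int exitCodeFor( Mode mode, const std::vector<FileResult>& results )
{
    if( mode == Mode::Discovery )
    {
        return 0;
    }
    for( const auto& r : results )
    {
        if( r.status != Status::Pass )
        {
            return 1;
        }
    }
    return 0;
}
```

Because discovery never fails the build, it can be run against the whole corpus at any time without affecting CI.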

Each file goes through createFromFile → getData → createFromScore → writeToStream, and the output is normalized and compared to the original with the same machinery corert uses (mxtest/corert/Compare + Fixer), after stripping mx's own <software> stamp.
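
The per-file flow can be sketched as a chain of stages where the first failing stage decides the reported status. This is only an illustration (C++17): the `Stages` callables are stand-ins, not the real mx::api signatures, and the mapping of createFromFile to LOADFAIL, getData to GETDATAFAIL, and createFromScore to CREATEFAIL is an assumption based on the status names.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

enum class Outcome { Pass, Fail, LoadFail, GetDataFail, CreateFail };

// Stand-ins for the four api steps plus the normalize-and-compare step.
// A stage signals failure by returning std::nullopt.
struct Stages
{
    std::function<std::optional<int>( const std::string& path )> createFromFile;
    std::function<std::optional<std::string>( int docId )> getData;
    std::function<std::optional<int>( const std::string& score )> createFromScore;
    std::function<std::string( int docId )> writeToStream;
    std::function<bool( const std::string& path, const std::string& output )> matches;
};

// Runs one file through the chain and reports the first stage that failed.
inline Outcome roundTrip( const std::string& path, const Stages& s )
{
    assert( s.createFromFile && s.getData && s.createFromScore && s.writeToStream && s.matches );

    const auto loaded = s.createFromFile( path );
    if( !loaded )
    {
        return Outcome::LoadFail;
    }
    const auto score = s.getData( *loaded );
    if( !score )
    {
        return Outcome::GetDataFail;
    }
    const auto rebuilt = s.createFromScore( *score );
    if( !rebuilt )
    {
        return Outcome::CreateFail;
    }
    const std::string output = s.writeToStream( *rebuilt );
    return s.matches( path, output ) ? Outcome::Pass : Outcome::Fail;
}
```

Injecting the stages keeps the sketch independent of the library; a real driver would call the mx::api functions directly.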

"Increasing the number of round-tripped files" means: find files that pass discovery (now, or after a small api change) and add their paths to roundtrip-baseline.txt. Growth is intentionally manual and monotonic — once pinned, regression mode prevents backsliding.
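
As a rough illustration of the growth step, the candidates for pinning are the discovery passes that are not yet in the baseline, and pinning only ever adds paths. The helper names `unpinnedPasses` and `pin` are hypothetical; in practice the review and the edit to roundtrip-baseline.txt are manual.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Paths that passed discovery but are not yet pinned, in sorted order.
inline std::vector<std::string> unpinnedPasses( const std::set<std::string>& baseline,
                                                const std::set<std::string>& discoveryPasses )
{
    std::vector<std::string> candidates;
    std::set_difference( discoveryPasses.begin(), discoveryPasses.end(),
                         baseline.begin(), baseline.end(),
                         std::back_inserter( candidates ) );
    return candidates;
}

// Adds reviewed candidates to the baseline. Nothing is ever removed,
// so the pinned set grows monotonically.
inline std::set<std::string> pin( std::set<std::string> baseline,
                                  const std::vector<std::string>& reviewed )
{
    const std::size_t before = baseline.size();
    baseline.insert( reviewed.begin(), reviewed.end() );
    assert( baseline.size() >= before );
    return baseline;
}
```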

The catch: mx::api is a deliberate subset of MusicXML. The historical capture in the baseline header shows that of 829 files, 794 produced output but only 1 survived strict comparison — because the api drops <part-group>, <credit>, most of <defaults>, and reorders <encoding> children. Nearly every real file diverges by design.

Key decision

The open question is whether "round-trip" should mean strict full-DOM fidelity (today's rule, which leaves a small candidate pool) or api-scoped fidelity (compare only the parts mx::api claims to model and ignore known-dropped subtrees, which gives a far larger pool and a more meaningful coverage signal). Recommendation: keep strict regression as the hard gate, and add an api-scoped discovery mode as the growth engine. This decision is resolved in Phase 1/Phase 5.
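
To make the distinction concrete, here is a minimal sketch of the two comparison rules over a toy element tree (C++17). The `Node` type and the function names are hypothetical and do not reflect the Compare/Fixer machinery in mxtest/corert. The ignore list is supplied by the caller, and the sketch does not handle the reordering of <encoding> children.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an XML element: name, text, ordered children.
struct Node
{
    std::string name;
    std::string text;
    std::vector<Node> children;
};

// Copy of the tree with every subtree whose element name is ignored removed.
// The root itself must not be an ignored element.
inline Node pruned( const Node& in, const std::set<std::string>& ignored )
{
    assert( ignored.count( in.name ) == 0 );
    Node out{ in.name, in.text, {} };
    for( const auto& child : in.children )
    {
        if( ignored.count( child.name ) == 0 )
        {
            out.children.push_back( pruned( child, ignored ) );
        }
    }
    return out;
}

// Strict rule: names, text, and child order must all match.
inline bool strictEqual( const Node& a, const Node& b )
{
    if( a.name != b.name || a.text != b.text || a.children.size() != b.children.size() )
    {
        return false;
    }
    for( std::size_t i = 0; i < a.children.size(); ++i )
    {
        if( !strictEqual( a.children[i], b.children[i] ) )
        {
            return false;
        }
    }
    return true;
}

// Api-scoped rule: strict comparison after dropping known-ignored subtrees.
inline bool apiScopedEqual( const Node& a, const Node& b, const std::set<std::string>& ignored )
{
    return strictEqual( pruned( a, ignored ), pruned( b, ignored ) );
}
```

A file whose only difference is a dropped <credit> fails `strictEqual` but passes `apiScopedEqual` when "credit" is in the ignore list, which is how the candidate pool would widen.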

Goal

Build a repeatable analysis pipeline that buckets the corpus by "distance to passing" and produces a ranked worklist of small api additions and the files each one unblocks, then execute those additions and grow the pinned baseline.
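
One way such a pipeline could be shaped is sketched below (C++17). It assumes an earlier step has already produced, for each file, the set of api features it is missing; that input, the `Blockers` alias, and the `WorkItem`, `bucketByDistance`, and `rankWorklist` names are all hypothetical. "Distance" here is simply the number of missing features, and a feature is credited only with files for which it is the sole blocker.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// file path -> names of api features the file still needs (hypothetical input)
using Blockers = std::map<std::string, std::set<std::string>>;

struct WorkItem
{
    std::string feature;
    std::vector<std::string> unblockedFiles;
};

// Groups files by how many features stand between them and a pass.
inline std::map<std::size_t, std::vector<std::string>> bucketByDistance( const Blockers& blockers )
{
    std::map<std::size_t, std::vector<std::string>> buckets;
    for( const auto& [file, missing] : blockers )
    {
        buckets[missing.size()].push_back( file );
    }
    return buckets;
}

// Ranks features by the number of files each one would unblock on its own.
inline std::vector<WorkItem> rankWorklist( const Blockers& blockers )
{
    std::map<std::string, std::vector<std::string>> byFeature;
    for( const auto& [file, missing] : blockers )
    {
        if( missing.size() == 1 )
        {
            byFeature[*missing.begin()].push_back( file );
        }
    }
    std::vector<WorkItem> items;
    for( auto& [feature, files] : byFeature )
    {
        assert( !files.empty() );
        items.push_back( WorkItem{ feature, std::move( files ) } );
    }
    // Largest payoff first; ties keep alphabetical feature order.
    std::stable_sort( items.begin(), items.end(), []( const WorkItem& a, const WorkItem& b ) {
        return a.unblockedFiles.size() > b.unblockedFiles.size();
    } );
    return items;
}
```

Files at distance 0 are immediate candidates for pinning, and the top of the worklist names the api addition with the largest immediate payoff.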

Phases

Related

Metadata

Labels

ai (issues opened by, or through, a coding agent), area/mx::api, non-breaking (fixes or implementation that do not require breaking changes), testing
