Drop label-duplicating synonyms (seeder fix + clean 39 records)#116

Merged
realmarcin merged 1 commit into main from claude/synonym-hygiene on Jun 17, 2026

@realmarcin (Contributor)

What

A quality diagnostic found 39 trait records carrying a synonym whose text exactly equals the record's label (e.g. aerobic.yaml has label "aerobic" and synonym "aerobic"). All 39 were introduced by the seeder (source: metpo.owl): the METPO seeder copied each class label into a same-text RELATED or EXACT synonym. Per OBO convention these synonyms are redundant noise; the label already represents that string, so the synonym adds no information.

  • seed_from_metpo.py: at emit time, skip any synonym whose text equals the label (case-insensitive) and de-dupe repeated synonym_texts, so the seeder no longer creates them.
  • Migrated the 39 existing records: removed the redundant synonym (37 RELATED + 2 EXACT) and recorded a REMOVE_REDUNDANT_SYNONYM curation event in each file.
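The two changes above can be sketched in Python. This is a minimal illustration, not the actual code in seed_from_metpo.py or the migration: it assumes each record is a dict with a "label" string and a "synonyms" list of dicts keyed by "synonym_text". The helper names (filter_synonyms, migrate_record), the "predicate" key, and the "curation_history"/"action"/"removed" event fields are all hypothetical.

```python
from typing import Any

Synonym = dict[str, Any]


def filter_synonyms(label: str, synonyms: list[Synonym]) -> list[Synonym]:
    """Emit-time filter (hypothetical helper).

    Skips any synonym whose text equals the label, compared case-insensitively,
    and drops exact repeats of a synonym_text, keeping the first occurrence.
    """
    label_key = label.casefold()
    seen: set[str] = set()
    kept: list[Synonym] = []
    for syn in synonyms:
        text = syn["synonym_text"]
        if text.casefold() == label_key or text in seen:
            continue  # restates the label, or was already emitted
        seen.add(text)
        kept.append(syn)
    return kept


def migrate_record(record: dict[str, Any]) -> bool:
    """One-off migration for an existing record (hypothetical schema).

    Removes label-duplicating synonyms in place and appends a single
    REMOVE_REDUNDANT_SYNONYM curation event. Returns True if anything changed.
    """
    label_key = record["label"].casefold()
    synonyms = record.get("synonyms", [])
    kept = [s for s in synonyms if s["synonym_text"].casefold() != label_key]
    if len(kept) == len(synonyms):
        return False  # nothing redundant: leave the record and its history untouched
    record["synonyms"] = kept
    # Assumed event shape; the real curation-event fields may differ.
    record.setdefault("curation_history", []).append(
        {"action": "REMOVE_REDUNDANT_SYNONYM", "removed": len(synonyms) - len(kept)}
    )
    return True


if __name__ == "__main__":
    # Mirrors the aerobic.yaml case; "aerobe" is an invented extra synonym.
    rec = {
        "label": "aerobic",
        "synonyms": [
            {"synonym_text": "aerobic", "predicate": "RELATED"},
            {"synonym_text": "aerobe", "predicate": "EXACT"},
        ],
    }
    print(migrate_record(rec))  # True
    print([s["synonym_text"] for s in rec["synonyms"]])  # ['aerobe']
    print(migrate_record(rec))  # False: already clean, no second event
```

The sketch compares with str.casefold() as one way to implement "case-insensitive"; the PR does not say which comparison the seeder uses. Returning early when nothing was removed keeps the migration idempotent, so re-running it does not stack duplicate curation events.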

Also checked (no action needed)

The same diagnostic confirmed the corpus is otherwise clean: 0 malformed evidence references, 0 duplicate synonyms, 0 duplicate causal edges, and 0 duplicate labels across records. (The "105 TRAIT-not-an-edge-target" signal is a non-issue: those trait nodes are connected as edge subjects, which is a legitimate modeling direction.)
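The diagnostic script itself is not part of this PR. As a rough illustration of three of its checks (syn==label, duplicate synonyms within a record, duplicate labels across records), here is a sketch over the same hypothetical record shape: a dict mapping file name to a record with "label" and "synonyms". The function name diagnose is hypothetical; counting offending synonym occurrences (not records) and comparing labels by exact text are assumptions.

```python
from collections import Counter
from typing import Any


def diagnose(records: dict[str, dict[str, Any]]) -> dict[str, int]:
    """Count synonym/label hygiene problems across a corpus (hypothetical helper).

    records maps a file name (e.g. "aerobic.yaml") to its parsed record.
    """
    syn_eq_label = 0
    duplicate_synonyms = 0
    for rec in records.values():
        label_key = rec["label"].casefold()
        texts = [s["synonym_text"] for s in rec.get("synonyms", [])]
        # Synonyms that merely restate the label (case-insensitive).
        syn_eq_label += sum(1 for t in texts if t.casefold() == label_key)
        # Exact repeats of a synonym text within one record.
        duplicate_synonyms += len(texts) - len(set(texts))
    # Labels shared by more than one record: count every extra occurrence.
    label_counts = Counter(rec["label"] for rec in records.values())
    duplicate_labels = sum(n - 1 for n in label_counts.values() if n > 1)
    return {
        "syn_eq_label": syn_eq_label,
        "duplicate_synonyms": duplicate_synonyms,
        "duplicate_labels": duplicate_labels,
    }


if __name__ == "__main__":
    corpus = {
        "aerobic.yaml": {"label": "aerobic", "synonyms": [{"synonym_text": "aerobic"}]},
        "anaerobic.yaml": {"label": "anaerobic", "synonyms": []},
    }
    print(diagnose(corpus))
    # {'syn_eq_label': 1, 'duplicate_synonyms': 0, 'duplicate_labels': 0}
```

The evidence-reference and causal-edge checks are left out because the PR does not describe those schemas.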

Verification

  • syn==label count is now 0. just validate-strict reports 477 files, 0 errors. All 90 tests pass. Seeder dry-run is clean (writes nothing). Pages regenerated.

🤖 Generated with Claude Code

The METPO seeder copied each class label into a same-text RELATED/EXACT
synonym (source: metpo.owl) — redundant per OBO convention (the label already
represents that string; no information beyond it). 39 trait records carried
such a synonym.

- seed_from_metpo.py: at emit time, skip any synonym whose text equals the
  label (case-insensitive) and de-dupe repeated synonym_texts, so the seeder
  no longer introduces them.
- Migrated the 39 existing records: removed the redundant synonym (37
  RELATED + 2 EXACT), with a REMOVE_REDUNDANT_SYNONYM curation event per file.

syn==label now 0. validate-strict 0 errors; 90 tests pass; seeder dry-run
clean. Pages regenerated.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
realmarcin merged commit 8ee3afb into main on Jun 17, 2026
2 checks passed
realmarcin deleted the claude/synonym-hygiene branch on June 17, 2026 04:12