
Record id-label residual triage: validate-terms-all blocked upstream #159

Merged: realmarcin merged 1 commit into main from docs/idlabel-residual-triage on Jun 18, 2026

@realmarcin

Contributor

Option 2 of the plan ("mint/clean the 34 residuals so validate-terms-all can go blocking") — triage result.

I re-checked all 34 curator-accepted exception residuals against the current OAK snapshot. The finding is sharper than the old "needs minting" reasons suggest: every residual id resolves to an unrelated entity in the current ChEBI/ENVO/NCBITaxon build, and a broad `runoak search` finds no repoint target for the intended label. The compound or environment is genuinely absent from the ontology, and the placeholder id is semantically wrong.

Examples:

  • CHEBI:33104 "chromium(III) hydroxide" → hydridoarsenic(2.) (triplet)
  • CHEBI:34818 "humic acid" → Leucomycin A8
  • CHEBI:89981 "yeast extract" → LPS with O-antigen
  • ENVO:00000274 "soda lake" → continental rise
  • NCBITaxon:3050471 "Stenotrophomonas goyi" → unclassified Dissulfuribacter

The only near-repoint is sodium metasilicate (CHEBI:86154) → CHEBI:60720 "sodium silicate", a generalization that changes the label, so it was not applied.
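For reviewers who want to reproduce the check, the comparison described above can be sketched as follows. This is a minimal illustration, not the script used for the triage: `Residual`, `triage`, and the `SNAPSHOT` dict are hypothetical names, and the dict stands in for a real label lookup against the ontology snapshot. The labels are copied from the examples in this description.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Residual(NamedTuple):
    curie: str            # id currently stored in the data
    intended_label: str   # label the curator meant


def triage(
    residuals: List[Residual],
    resolve_label: Callable[[str], Optional[str]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (curie, intended, resolved) for each id whose label does not match."""
    mismatches = []
    for r in residuals:
        resolved = resolve_label(r.curie)
        # Case-insensitive comparison; a missing id also counts as a mismatch.
        if resolved is None or resolved.strip().lower() != r.intended_label.strip().lower():
            mismatches.append((r.curie, r.intended_label, resolved))
    return mismatches


# Stand-in for the ontology snapshot (assumption: a real run would query it instead).
SNAPSHOT = {
    "CHEBI:34818": "Leucomycin A8",
    "CHEBI:89981": "LPS with O-antigen",
    "ENVO:00000274": "continental rise",
}

RESIDUALS = [
    Residual("CHEBI:34818", "humic acid"),
    Residual("CHEBI:89981", "yeast extract"),
    Residual("ENVO:00000274", "soda lake"),
]

for curie, intended, resolved in triage(RESIDUALS, SNAPSHOT.get):
    print(f"{curie}: intended {intended!r}, resolves to {resolved!r}")
```

Passing the lookup in as a function keeps the comparison logic testable without network access or a local ontology database.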

Conclusion

validate-terms-all cannot be made a blocking gate now: minting ChEBI/ENVO/GO/NCBITaxon terms is an external OBO process I can't perform. NEXT_TASKS.md item 1 now documents the real fix path (term requests for ~9 CHEBI + 3 ENVO + 2 NCBITaxon terms, plus repoints for obsolete GO terms) and the data-quality caveat that these groundings carry the wrong id in the KGX export, which is worth tracking separately.

Doc-only; no code/data change. validate-products stays green via the exceptions allow-list.
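To illustrate why an allow-list keeps one target green while the stricter one cannot block yet, here is a rough sketch of allow-list gating. It is an assumption about the general pattern, not the repository's actual validator: `gate`, `ALLOW_LIST`, and the exit-code convention are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def gate(mismatched_ids: Iterable[str], allow_list: Set[str]) -> Tuple[List[str], List[str]]:
    """Split mismatched ids into accepted exceptions and blocking failures."""
    seen = set(mismatched_ids)
    accepted = sorted(seen & allow_list)   # known residuals, tolerated
    blocking = sorted(seen - allow_list)   # anything new fails the check
    return accepted, blocking


# Illustrative subset of the residuals named in this description.
ALLOW_LIST = {"CHEBI:33104", "ENVO:00000274"}

accepted, blocking = gate(["CHEBI:33104", "ENVO:00000274"], ALLOW_LIST)
exit_code = 1 if blocking else 0
print(f"accepted={accepted} blocking={blocking} exit_code={exit_code}")
```

With an empty allow-list, the same two ids would land in `blocking`, which is the situation a blocking gate over all terms would face until the upstream terms exist.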

🤖 Generated with Claude Code

…134 follow-up)

Triaged all 34 curator-accepted residuals against the current OAK snapshot.
Finding: each residual id resolves to an UNRELATED entity in the current
ChEBI/ENVO/NCBITaxon build, and broad `runoak search` finds no repoint target for
the intended label — the compound/environment is genuinely absent from the
ontology and the placeholder id is semantically wrong (e.g. CHEBI:33104
"chromium(III) hydroxide" → hydridoarsenic; ENVO:00000274 "soda lake" →
continental rise). Only near-match: sodium metasilicate → CHEBI:60720 sodium
silicate (generalization; not applied).

Conclusion: enabling `validate-terms-all` as a blocking gate is genuinely
upstream-blocked — minting ChEBI/ENVO/GO/NCBITaxon terms is an external OBO
process. Recorded the real fix path (term requests + GO repoints) and the
data-quality caveat (these groundings carry the wrong id in the KGX export) in
NEXT_TASKS.md item 1. No code/data change.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@realmarcin realmarcin merged commit 0538f88 into main Jun 18, 2026
@realmarcin realmarcin deleted the docs/idlabel-residual-triage branch June 18, 2026 02:49