Record id-label residual triage: validate-terms-all blocked upstream #159
Merged
…134 follow-up) Triaged all 34 curator-accepted residuals against the current OAK snapshot. Finding: each residual id resolves to an UNRELATED entity in the current ChEBI/ENVO/NCBITaxon build, and broad `runoak search` finds no repoint target for the intended label — the compound/environment is genuinely absent from the ontology and the placeholder id is semantically wrong (e.g. CHEBI:33104 "chromium(III) hydroxide" → hydridoarsenic; ENVO:00000274 "soda lake" → continental rise). Only near-match: sodium metasilicate → CHEBI:60720 sodium silicate (generalization; not applied). Conclusion: enabling `validate-terms-all` as a blocking gate is genuinely upstream-blocked — minting ChEBI/ENVO/GO/NCBITaxon terms is an external OBO process. Recorded the real fix path (term requests + GO repoints) and the data-quality caveat (these groundings carry the wrong id in the KGX export) in NEXT_TASKS.md item 1. No code/data change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Option 2 of the plan ("mint/clean the 34 residuals so `validate-terms-all` can go blocking"): triage result.

Re-checked all 34 curator-accepted `exceptions` residuals against the current OAK snapshot. The finding is sharper than the old "needs minting" reasons suggest: every residual id resolves to an unrelated entity in the current ChEBI/ENVO/NCBITaxon build, and broad `runoak search` finds no repoint target for the intended label. The compound/environment is genuinely absent from the ontology, and the placeholder id is semantically wrong.

Examples:

- `CHEBI:33104` "chromium(III) hydroxide" → hydridoarsenic(2.) (triplet)
- `CHEBI:34818` "humic acid" → Leucomycin A8
- `CHEBI:89981` "yeast extract" → LPS with O-antigen
- `ENVO:00000274` "soda lake" → continental rise
- `NCBITaxon:3050471` "Stenotrophomonas goyi" → unclassified Dissulfuribacter

Only near-repoint: sodium metasilicate (`CHEBI:86154`) → `CHEBI:60720` "sodium silicate" (a generalization that changes the label; not applied).

Conclusion

`validate-terms-all` cannot be made a blocking gate now: minting ChEBI/ENVO/GO/NCBITaxon terms is an external OBO process I can't perform. Documented in `NEXT_TASKS.md` item 1: the real fix path (term requests for ~9 CHEBI + 3 ENVO + 2 NCBITaxon; GO obsolete repoints), and the data-quality caveat that these groundings carry the wrong id in the KGX export (worth tracking separately).

Doc-only; no code/data change. `validate-products` stays green via the `exceptions` allow-list.

🤖 Generated with Claude Code
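
For reviewers who want to reproduce the id-label check, here is a minimal sketch of the comparison described above. It is not the repository's validator: the names `Residual`, `triage`, `SNAPSHOT`, `RESIDUALS` and `ALLOW_LIST` are hypothetical, and the snapshot is a hard-coded dict holding only the five examples from this PR plus one control entry. In a real run the lookup would presumably be backed by OAK (for instance an oaklib adapter's `label` method), which is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Residual:
    curie: str           # id recorded in the data
    intended_label: str  # label the curator meant


def normalize(label: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not count."""
    return " ".join(label.casefold().split())


def triage(
    residuals: Iterable[Residual],
    lookup_label: Callable[[str], Optional[str]],
    allow_list: frozenset = frozenset(),
):
    """Return (mismatches, blocking).

    mismatches: every (residual, resolved_label) whose id resolves to a
                different label, or to nothing at all.
    blocking:   the subset of mismatches not covered by the allow-list.
    """
    mismatches = []
    for r in residuals:
        resolved = lookup_label(r.curie)
        if resolved is None or normalize(resolved) != normalize(r.intended_label):
            mismatches.append((r, resolved))
    blocking = [m for m in mismatches if m[0].curie not in allow_list]
    return mismatches, blocking


# Hypothetical stand-in for the ontology snapshot; labels are copied from this PR.
SNAPSHOT = {
    "CHEBI:33104": "hydridoarsenic(2.) (triplet)",
    "CHEBI:34818": "Leucomycin A8",
    "CHEBI:89981": "LPS with O-antigen",
    "ENVO:00000274": "continental rise",
    "NCBITaxon:3050471": "unclassified Dissulfuribacter",
    "CHEBI:60720": "sodium silicate",
}

RESIDUALS = [
    Residual("CHEBI:33104", "chromium(III) hydroxide"),
    Residual("CHEBI:34818", "humic acid"),
    Residual("CHEBI:89981", "yeast extract"),
    Residual("ENVO:00000274", "soda lake"),
    Residual("NCBITaxon:3050471", "Stenotrophomonas goyi"),
    # Control entry, not a residual: the id and the label agree.
    Residual("CHEBI:60720", "sodium silicate"),
]

# Curator-accepted exceptions (hypothetical representation of the allow-list).
ALLOW_LIST = frozenset(r.curie for r in RESIDUALS[:5])

mismatches, blocking = triage(RESIDUALS, SNAPSHOT.get, ALLOW_LIST)
for residual, resolved in mismatches:
    print(f"{residual.curie}: wanted {residual.intended_label!r}, got {resolved!r}")
print(f"{len(mismatches)} mismatches, {len(blocking)} blocking")
```

With the allow-list in place the five mismatches are reported but none block, which mirrors how `validate-products` stays green; passing an empty allow-list makes all five blocking, which is the situation `validate-terms-all` would be in as a hard gate.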