Promote per-source thermodynamics to canonical deltag (+ retrained dGPredictor-ModelSEED source) by freiburgermsu · Pull Request #265 · ModelSEED/ModelSEEDDatabase

freiburgermsu · 2026-06-17T22:24:54Z

Summary

Gives 14,141 reactions the canonical free-energy values (deltag/deltagerr/reversibility) they were missing, purely by re-aggregating estimates that already exist in each reaction's additive thermodynamics dict. It also adds the retrained dGPredictor-ModelSEED source, which supplies some of those estimates.

Background: the promotion gap

After the additive-thermodynamics refactor, the Update_Reaction_*_Energies.py scripts write only into each reaction's additive thermodynamics dict and no longer populate the canonical top-level deltag/deltagerr. As a result, 14,141 non-EMPTY reactions carried a perfectly good computed energy in thermodynamics while their canonical deltag stayed at the 10000000 sentinel (and reversibility at "?"), so they read as thermodynamically undefined even though a value already existed.
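The gap condition can be written as a small predicate. This is a minimal sketch that assumes a hypothetical record shape: each reaction is a dict with a top-level "deltag" and a "thermodynamics" dict mapping source name to [deltag, deltagerr, operator]. Only the 10000000 sentinel comes from the description; the real database layout may differ.

```python
from typing import Any

SENTINEL = 10000000  # canonical "no value" placeholder for deltag


def has_promotion_gap(reaction: dict[str, Any]) -> bool:
    """True when the canonical deltag is missing but per-source estimates exist.

    ASSUMPTION: reaction["thermodynamics"] maps source name -> [deltag, deltagerr, operator];
    this record shape is hypothetical.
    """
    canonical_missing = reaction.get("deltag") in (None, SENTINEL)
    return canonical_missing and bool(reaction.get("thermodynamics"))


# Toy reactions with made-up IDs and values, for illustration only.
reactions = [
    {"id": "rxn_A", "deltag": -3.5, "thermodynamics": {"Group contribution": [-3.5, 0.3, "="]}},
    {"id": "rxn_B", "deltag": SENTINEL, "reversibility": "?",
     "thermodynamics": {"dGPredictor": [-8.6, 0.04, ">"]}},
    {"id": "rxn_C", "deltag": SENTINEL, "reversibility": "?", "thermodynamics": {}},
]
print([r["id"] for r in reactions if has_promotion_gap(r)])  # ['rxn_B']
```

Reactions like rxn_C, with no per-source estimate at all, stay outside the gap: there is nothing to promote for them.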

What this PR does

Two commits:

- dGPredictor-ModelSEED source (existing additive work): the dGPredictor group-contribution model retrained on ModelSEED structures, recorded as its own per-method entry in thermodynamics. Additive; supplies energies for reactions the other sources miss.
- Promotion: new Scripts/Thermodynamics/Promote_Reaction_Thermodynamics_to_Canonical.py re-aggregates the stored per-source estimates into the canonical fields. Pure re-aggregation: no new estimation, no external dependencies.
  - Only reactions with a missing canonical deltag are touched; existing canonical values are never overwritten.
  - Selection: prefer the mechanistic/measurement-anchored tier (eQuilibrator, then Group contribution) over the ML tier (dGPredictor-ModelSEED, dGPredictor); within the chosen tier, take the lowest-uncertainty estimate. The within-tier lowest-error rule stops a wildly uncertain ML outlier (e.g. -100 ± 71 kcal/mol) from being promoted over a tight estimate (-8.6 ± 0.04 kcal/mol).
  - Guards reject implausible magnitudes (|dG| > 1000 kcal/mol) and useless uncertainties (> 100 kcal/mol), leaving those reactions undefined rather than promoting garbage.
  - deltagerr is taken from the chosen estimate, and reversibility is set to that estimate's own direction operator (same heuristic as Estimate_Reaction_Reversibility.py, already stored with each per-source energy).
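The selection policy in the list above can be sketched as follows. This is an illustration, not the script's code: it assumes the same hypothetical [deltag, deltagerr, operator] record shape, reads the tiers as two groups, and applies the guards to each estimate before tier selection, so a tier whose estimates all fail the guards falls through to the next tier. The PR text does not say whether the real script does that or simply leaves the reaction undefined.

```python
import math
from typing import Any, Optional

SENTINEL = 10000000
# Tier order as described in the PR; the script's real TIERS constant may be shaped differently.
TIERS = [
    ["eQuilibrator", "Group contribution"],    # mechanistic / measurement-anchored tier
    ["dGPredictor-ModelSEED", "dGPredictor"],  # ML tier
]
MAX_ABS_DG = 1000.0  # kcal/mol: reject |dG| above this
MAX_ERR = 100.0      # kcal/mol: reject uncertainties above this


def usable(dg: float, err: float) -> bool:
    """Guards: finite values, plausible magnitude, non-useless uncertainty."""
    return math.isfinite(dg) and math.isfinite(err) and abs(dg) <= MAX_ABS_DG and err <= MAX_ERR


def select_estimate(thermo: dict[str, list]) -> Optional[tuple[str, float, float, str]]:
    """Return (source, deltag, deltagerr, operator) from the first tier with a usable estimate."""
    for tier in TIERS:
        # ASSUMPTION: guards are applied per estimate before the tier is chosen.
        candidates = [
            (source, *thermo[source][:3])
            for source in tier
            if source in thermo and usable(thermo[source][0], thermo[source][1])
        ]
        if candidates:
            # Within a tier, the lowest-uncertainty estimate wins (no averaging).
            return min(candidates, key=lambda c: c[2])
    return None


def promote(reaction: dict[str, Any]) -> Optional[str]:
    """Fill missing canonical fields in place; return the chosen source, or None."""
    if reaction.get("deltag") not in (None, SENTINEL):
        return None  # never overwrite an existing canonical value
    choice = select_estimate(reaction.get("thermodynamics", {}))
    if choice is None:
        return None  # leave undefined rather than promote garbage
    source, dg, err, op = choice
    reaction.update(deltag=dg, deltagerr=err, reversibility=op)
    return source


# The PR's outlier example: both estimates are in the ML tier, so the lower error wins.
rxn = {"deltag": SENTINEL, "reversibility": "?", "thermodynamics": {
    "dGPredictor": [-100.0, 71.0, ">"],
    "dGPredictor-ModelSEED": [-8.6, 0.04, ">"],
}}
print(promote(rxn), rxn["deltag"], rxn["deltagerr"])  # dGPredictor-ModelSEED -8.6 0.04
```

Tier precedence is applied before uncertainty. A Group contribution estimate therefore beats a tighter ML estimate, and lowest error only decides between sources of the same tier.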

Result

14,141 reactions promoted: Group contribution 1,474 · dGPredictor 8,635 · dGPredictor-ModelSEED 4,032.

Verified: every promoted deltag equals one of that reaction's stored per-source energies; zero pre-existing canonical values, thermodynamics dicts, or other fields changed. (.tsv files update because deltag/deltagerr/reversibility are TSV columns.)
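The verification stated above amounts to a before/after diff. The sketch below is one way to express those checks. It assumes reactions keyed by ID with the hypothetical [deltag, deltagerr, operator] record shape; it is not the check the PR author actually ran.

```python
from typing import Any

SENTINEL = 10000000
CANONICAL_FIELDS = {"deltag", "deltagerr", "reversibility"}


def verify_promotion(before: dict[str, dict[str, Any]],
                     after: dict[str, dict[str, Any]]) -> list[str]:
    """Compare reactions (keyed by ID) before and after promotion; return a list of problems."""
    problems = []
    for rid, old in before.items():
        new = after.get(rid)
        if new is None:
            problems.append(f"{rid}: missing after promotion")
            continue
        changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
        if not changed:
            continue
        # Only reactions whose canonical deltag was missing may change at all.
        if old.get("deltag") not in (None, SENTINEL):
            problems.append(f"{rid}: existing canonical value changed")
        # Nothing outside the canonical fields (thermodynamics included) may change.
        if not changed <= CANONICAL_FIELDS:
            problems.append(f"{rid}: unexpected fields changed: {sorted(changed - CANONICAL_FIELDS)}")
        # The promoted deltag must be copied verbatim from a stored per-source energy.
        stored = [est[0] for est in old.get("thermodynamics", {}).values()]
        if new.get("deltag") not in stored:
            problems.append(f"{rid}: promoted deltag matches no stored estimate")
    return problems
```

An empty list means every changed reaction touched only canonical fields and copied one of its own stored energies. Exact float equality is appropriate here because promotion copies values rather than recomputing them.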

The source-precedence policy is a single editable TIERS constant. Roughly 732 reactions have more than 50 kcal/mol of cross-source disagreement and are worth a curator spot-check; the policy resolves them by tier precedence and lowest error, not by averaging.
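A review list of this kind could be produced by measuring how far each reaction's per-source estimates spread. This sketch uses one plausible definition, max minus min deltag across all stored sources. The PR does not say how the ~732 figure was computed (for example, whether guard-rejected estimates count), so the numbers may not match. It uses the same hypothetical record shape as before.

```python
DISAGREEMENT_KCAL = 50.0  # threshold quoted in the PR text


def cross_source_spread(thermo: dict[str, list]) -> float:
    """Max minus min deltag across sources (0.0 when fewer than two sources)."""
    values = [est[0] for est in thermo.values()]
    return max(values) - min(values) if len(values) > 1 else 0.0


def flag_for_review(reactions: dict[str, dict], threshold: float = DISAGREEMENT_KCAL) -> list[str]:
    """Reaction IDs whose per-source estimates disagree by more than `threshold` kcal/mol."""
    return sorted(
        rid for rid, rxn in reactions.items()
        if cross_source_spread(rxn.get("thermodynamics", {})) > threshold
    )


# Toy data: rxn_B holds the PR's outlier pair (spread of about 91.4 kcal/mol).
reactions = {
    "rxn_A": {"thermodynamics": {"Group contribution": [-5.0, 2.0, "="], "dGPredictor": [-6.0, 0.5, ">"]}},
    "rxn_B": {"thermodynamics": {"dGPredictor": [-100.0, 71.0, ">"], "dGPredictor-ModelSEED": [-8.6, 0.04, ">"]}},
}
print(flag_for_review(reactions))  # ['rxn_B']
```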

🤖 Generated with Claude Code

…urce

Records the dGPredictor group-contribution model retrained on the ModelSEED compound structures as its own per-method entry, "dGPredictor-ModelSEED", in each reaction's `thermodynamics` dict. Purely additive: it sits next to the Group contribution / eQuilibrator / (original KEGG-based) dGPredictor records, and the original "dGPredictor" entry is left untouched. The canonical deltag / deltagerr / reversibility are not changed, and no .tsv or compound files change.

- New staged predictions: Biochemistry/Thermodynamics/dGPredictor/modelseed_retrained_dG.json (31,924 reactions, kJ/mol).
- New writer: Scripts/Thermodynamics/Update_Reaction_dGPredictor_ModelSEED_Energies.py (kJ->kcal via /4.184; operator via reversibility_from_energy).
- 31,924 reactions gain a dGPredictor-ModelSEED record (incl. ~11,400 the original KEGG-based dGPredictor could not reach); 24,088 reactions unchanged.
- Verified: every modified reaction differs from dev ONLY by the added dGPredictor-ModelSEED key; added values equal dG_mean/4.184; the writer is idempotent.
- Docs: sources.yaml, Scripts/Thermodynamics/README.md, Rerun_Thermodynamics.sh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
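The writer's core step (convert kJ/mol to kcal/mol, attach an operator, add one key without touching anything else) can be sketched as below. Several names here are assumptions. The prediction field "dG_std" and the [deltag, deltagerr, operator] record shape are hypothetical, and `stand_in_operator` is a placeholder, not the repository's `reversibility_from_energy`, whose logic is not reproduced here.

```python
from typing import Any, Callable

KJ_PER_KCAL = 4.184
SOURCE = "dGPredictor-ModelSEED"


def add_retrained_source(
    reactions: dict[str, dict[str, Any]],
    predictions: dict[str, dict[str, float]],
    operator_fn: Callable[[float, float], str],
) -> int:
    """Add or refresh the dGPredictor-ModelSEED record; return how many reactions changed.

    ASSUMPTIONS: predictions map reaction ID -> {"dG_mean": kJ/mol, "dG_std": kJ/mol};
    "dG_std" and the [deltag, deltagerr, operator] record shape are hypothetical.
    """
    modified = 0
    for rid, pred in predictions.items():
        rxn = reactions.get(rid)
        if rxn is None:
            continue  # prediction for a reaction not in the database
        dg = pred["dG_mean"] / KJ_PER_KCAL  # kJ/mol -> kcal/mol
        err = pred["dG_std"] / KJ_PER_KCAL
        record = [dg, err, operator_fn(dg, err)]
        thermo = rxn.setdefault("thermodynamics", {})
        # Only the new key is written; other sources and canonical fields stay untouched.
        if thermo.get(SOURCE) != record:
            thermo[SOURCE] = record
            modified += 1
    return modified


def stand_in_operator(dg: float, err: float) -> str:
    """Placeholder only; NOT the repository's reversibility_from_energy heuristic."""
    return "?"


reactions = {"rxn_A": {"deltag": 10000000, "thermodynamics": {"dGPredictor": [-9.0, 1.0, ">"]}}}
predictions = {"rxn_A": {"dG_mean": -41.84, "dG_std": 4.184}}
print(add_retrained_source(reactions, predictions, stand_in_operator))  # 1
print(add_retrained_source(reactions, predictions, stand_in_operator))  # 0
```

The second call reports no changes because the recomputed record equals the stored one. This matches the idempotence property claimed in the commit message. Records are lists rather than tuples so that they still compare equal after a JSON round trip.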

…141 reactions)

After the additive-thermodynamics refactor, the Update_Reaction_*_Energies.py scripts write only into each reaction's additive `thermodynamics` dict and no longer populate the canonical top-level deltag/deltagerr. As a result, 14,141 non-EMPTY reactions carried a perfectly good computed energy in `thermodynamics` while their canonical deltag stayed at the 10000000 sentinel (reversibility "?"), so they read as thermodynamically undefined despite the value already existing.

New Scripts/Thermodynamics/Promote_Reaction_Thermodynamics_to_Canonical.py re-aggregates those existing per-source estimates into the canonical fields. It is pure re-aggregation (no new estimation, no external dependencies):

- Only reactions whose canonical deltag is missing are touched; existing canonical values are never overwritten.
- Selection: prefer the mechanistic/measurement-anchored tier (eQuilibrator, then Group contribution) over the ML tier (dGPredictor-ModelSEED, dGPredictor); WITHIN the chosen tier take the lowest-uncertainty estimate. The within-tier lowest-error rule prevents a wildly uncertain ML outlier (e.g. -100 +/- 71 kcal/mol) from being promoted over a tight estimate (-8.6 +/- 0.04).
- Guards reject implausible magnitudes (|dG| > 1000 kcal/mol) and useless uncertainties (> 100 kcal/mol), leaving those reactions undefined rather than promoting garbage.
- deltagerr is set from the chosen source and reversibility is set to that estimate's own direction operator (same heuristic as Estimate_Reaction_Reversibility, already stored alongside each per-source energy).

Promoted 14,141 reactions: Group contribution 1,474; dGPredictor 8,635; dGPredictor-ModelSEED 4,032.

Verified: every promoted deltag equals one of the reaction's stored per-source energies; zero pre-existing canonical values, thermodynamics dicts, or other fields changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

freiburgermsu and others added 2 commits June 10, 2026 16:25

freiburgermsu wants to merge 2 commits into ModelSEED:dev from freiburgermsu:promote-thermodynamics-to-canonical-deltag
