fix(experiments): pass base_experiment_id to summarize#512

Merged
Abhijeet Prasad (AbhiPrasad) merged 2 commits into main from barrettpyke/base-experiment-id on Jun 15, 2026

@barrettpyke

Contributor

Description

Eval stores base_experiment_id correctly on the experiment, but the final summary does not pass it as the explicit comparison ID. As a result, the summary comparison can fall back to project/default baseline resolution and show diffs against the wrong baseline.

Fix

Pass evaluator.base_experiment_id into experiment.summarize(comparison_experiment_id=...), so score and metric diffs are computed against the explicit experiment baseline.
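A minimal sketch of the call-site change, using stand-in classes rather than the real SDK. The Evaluator and Experiment classes and the two helper functions are hypothetical stubs written for illustration; only the names base_experiment_id, summarize, and comparison_experiment_id come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluator:
    # Hypothetical stand-in: holds the baseline the user configured explicitly.
    base_experiment_id: Optional[str] = None


class Experiment:
    # Hypothetical stand-in: only records which baseline summarize() received.
    def summarize(self, comparison_experiment_id: Optional[str] = None) -> dict:
        return {"comparison_experiment_id": comparison_experiment_id}


def summarize_before(evaluator: Evaluator, experiment: Experiment) -> dict:
    # Old behavior: the explicit baseline is never forwarded.
    return experiment.summarize()


def summarize_after(evaluator: Evaluator, experiment: Experiment) -> dict:
    # Fixed behavior: forward the explicit baseline as the comparison ID.
    return experiment.summarize(
        comparison_experiment_id=evaluator.base_experiment_id
    )
```

With the stub, summarize_before leaves the comparison ID unset even when the evaluator has a baseline, while summarize_after carries it through.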

Also resolve the explicit comparison experiment name so the returned summary displays the correct “compared to” name. Previously, comparison_experiment_id was None, so summarize() called POST /api/base_experiment/get_id; that resolver can apply UI/default-baseline behavior, including letting a project default baseline override the experiment’s explicit base_exp_id.
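The precedence described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's code: resolve_comparison, lookup_name, resolve_default, and the toy NAMES table are all hypothetical, and resolve_default merely stands in for the server-side lookup that may return a project default.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_comparison(
    comparison_experiment_id: Optional[str],
    lookup_name: Callable[[str], Optional[str]],
    resolve_default: Callable[[], Tuple[Optional[str], Optional[str]]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return the (id, name) of the experiment to compare against."""
    if comparison_experiment_id is not None:
        # Explicit baseline: skip default resolution and look up only the
        # name, so the "compared to" label matches the ID actually used.
        return comparison_experiment_id, lookup_name(comparison_experiment_id)
    # No explicit baseline: defer to default resolution, which may pick a
    # project-level default instead of the experiment's own baseline.
    return resolve_default()


# Toy data for illustration only.
NAMES: Dict[str, str] = {
    "exp-explicit": "my-baseline",
    "exp-default": "project-default",
}


def default_resolver() -> Tuple[Optional[str], Optional[str]]:
    # Hypothetical stand-in for the default-baseline lookup.
    return "exp-default", NAMES["exp-default"]
```

Calling resolve_comparison("exp-explicit", NAMES.get, default_resolver) never reaches the default resolver, which is the property the fix relies on.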

Member

base_experiment_name has the same bug here. We need to fix that too.

Member

I think the JS SDK has the same issue. Could you follow up with that?

@AbhiPrasad Abhijeet Prasad (AbhiPrasad) merged commit 76ea5a6 into main Jun 15, 2026
82 checks passed
@AbhiPrasad Abhijeet Prasad (AbhiPrasad) deleted the barrettpyke/base-experiment-id branch June 15, 2026 19:32
@barrettpyke

Contributor Author

Abhijeet Prasad (@AbhiPrasad) will do!

Development

Successfully merging this pull request may close these issues.

Eval summary can compare against wrong baseline

2 participants