pass baseExperimentId to summarize #2124

Open
Barrett Pyke (barrettpyke) wants to merge 2 commits into main from barrettpyke/fix-comparison-exp

Barrett Pyke (barrettpyke) commented Jun 15, 2026

Contributor

Description

Eval stores baseExperimentId correctly on the experiment, but the final summary call does not pass it as the explicit comparison ID. As a result, the summary comparison can fall back to project/default baseline resolution and show diffs against the wrong experiment.

Reproduction

  1. Navigate to Experiments in the Braintrust web app.
  2. Select an Experiment and click Set as default baseline.
  3. Run the following script:
import { Eval } from "braintrust";

const projectName = "<project-name>";
const suffix = Date.now();

console.log("creating baseline...");
const baseline = await Eval(projectName, {
  experimentName: `baseline-${suffix}`,
  data: [{ input: "hello", expected: "hello" }],
  task: (input) => input,
  scores: [
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
  summarizeScores: false,
});

const baseExperimentId = baseline.summary.experimentId;
console.log("baseline experiment id:", baseExperimentId);

console.log("creating comparison with baseExperimentId...");
const comparison = await Eval(projectName, {
  experimentName: `comparison-${suffix}`,
  baseExperimentId,
  data: [{ input: "hello", expected: "hello" }],
  task: (input) => input,
  scores: [
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
});

console.log("comparison experiment id:", comparison.summary.experimentId);
console.log("comparison baseline name:", comparison.summary.comparisonExperimentName);
console.log(JSON.stringify(comparison.summary, null, 2));

Expected
The comparison-${timestamp} experiment should be compared to the baseline-${timestamp} experiment.

Observed
The comparison-${timestamp} experiment is compared to the default baseline experiment selected in step 2.

Fix

  • Use the explicit baseExperimentId in the summarize call. If it is undefined, fall back to the experiment's persisted base_exp_id, which may have been resolved during experiment init from baseExperimentName, BaseExperiment(...) data, or backend baseline resolution.
  • When experiment.summarize() receives an explicit comparisonExperimentId, fetch v1/experiment/${comparisonExperimentId} to resolve the comparison experiment name for display.
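
The two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`pickComparisonExperimentId`, `comparisonExperimentPath`, `resolveComparisonExperimentName`), the injected `fetchJson` function, and the assumption that the response carries a `name` field are all hypothetical.

```typescript
// Hypothetical sketch of the fix described above; none of these helpers
// are the SDK's real internals.

/**
 * Prefer the explicit baseExperimentId passed to Eval. If it is undefined,
 * fall back to the base_exp_id persisted on the experiment.
 */
function pickComparisonExperimentId(
  explicitBaseExperimentId: string | undefined,
  persistedBaseExpId: string | undefined,
): string | undefined {
  return explicitBaseExperimentId ?? persistedBaseExpId;
}

/** Build the path used to look up the comparison experiment. */
function comparisonExperimentPath(comparisonExperimentId: string): string {
  return `v1/experiment/${comparisonExperimentId}`;
}

// Assumed shape: a function that GETs a path and returns parsed JSON
// containing (at least) an optional experiment name.
type FetchJson = (path: string) => Promise<{ name?: string }>;

/**
 * Resolve the display name of the comparison experiment, but only when an
 * ID is available. With no ID, the caller keeps its existing behavior.
 */
async function resolveComparisonExperimentName(
  comparisonExperimentId: string | undefined,
  fetchJson: FetchJson,
): Promise<string | undefined> {
  if (comparisonExperimentId === undefined) {
    return undefined;
  }
  const experiment = await fetchJson(
    comparisonExperimentPath(comparisonExperimentId),
  );
  return experiment.name;
}
```

Using `??` rather than `||` means only `undefined` (or `null`) triggers the fallback, which matches the "if it is undefined" wording above. Injecting `fetchJson` keeps the sketch testable without a network call.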

Testing

  • Tested changes manually with repro steps for baseExperimentId and baseExperimentName
  • Added unit tests for:
    • runEvaluator forwards baseExperimentId to summary
    • runEvaluator forwards the persisted ID resolved from baseExperimentName to summary
    • experiment.summarize resolves explicit comparison experiment name
