XY-899: [ELF benchmark vNext P3] Add qmd and OpenViking strength-profile benchmarks #180

Merged
yvette-carlisle merged 11 commits into main from y/elf-xy-899
Jun 11, 2026

@yvette-carlisle yvette-carlisle commented Jun 11, 2026

Member

Summary

  • Add qmd/OpenViking strength-profile benchmark report artifacts with scenario-level outcome boundaries.
  • Preserve qmd debug/replay ergonomics as not-tested where no equivalent scored ELF surface exists.
  • Keep OpenViking trajectory strengths as research-gate/not_encoded behind typed wrong_result same-corpus evidence.
  • Align external adapter project-count semantics across generator, spec, tests, and measurement coverage audit.
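The claim boundaries above can be pictured as a typed outcome per scenario, where only an explicit pass ever counts as a pass. The following is a minimal sketch under stated assumptions, not the elf-eval implementation: `ScenarioOutcome`, `ScenarioResult`, `passed_scenarios`, `sample_report`, and the scenario names are all hypothetical and chosen only to illustrate the idea.

```rust
// Hypothetical sketch: none of these types or names come from elf-eval.

/// One typed outcome per benchmark scenario.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScenarioOutcome {
    /// The scenario ran and produced the expected result.
    Pass,
    /// The scenario ran on the same corpus and produced a wrong result.
    WrongResult,
    /// No equivalent scored surface exists, so nothing was run.
    NotTested,
    /// The strength is held behind a research gate and is not yet encoded.
    NotEncoded,
}

impl ScenarioOutcome {
    /// Only an explicit `Pass` may back a pass claim; every other
    /// outcome stays visible in the report without being loosened.
    fn counts_as_pass(self) -> bool {
        matches!(self, ScenarioOutcome::Pass)
    }
}

/// A single row of a (hypothetical) strength-profile report.
struct ScenarioResult {
    project: &'static str,
    scenario: &'static str,
    outcome: ScenarioOutcome,
}

/// Returns the (project, scenario) pairs that may be claimed as passes.
fn passed_scenarios(results: &[ScenarioResult]) -> Vec<(&'static str, &'static str)> {
    results
        .iter()
        .filter(|r| r.outcome.counts_as_pass())
        .map(|r| (r.project, r.scenario))
        .collect()
}

/// Illustrative rows only; scenario names are made up for this sketch.
fn sample_report() -> Vec<ScenarioResult> {
    vec![
        ScenarioResult { project: "qmd", scenario: "retrieval", outcome: ScenarioOutcome::Pass },
        ScenarioResult { project: "qmd", scenario: "debug_replay", outcome: ScenarioOutcome::NotTested },
        ScenarioResult { project: "OpenViking", scenario: "same_corpus", outcome: ScenarioOutcome::WrongResult },
        ScenarioResult { project: "OpenViking", scenario: "trajectory", outcome: ScenarioOutcome::NotEncoded },
    ]
}
```

Keeping the outcome as an enum rather than a boolean means a not-tested or not_encoded row cannot be silently read as either a pass or a failure.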

Validation

  • cargo test -p elf-eval --test real_world_job_benchmark qmd_openviking_strength_profile_report_preserves_claim_boundaries --all-features
  • cargo test -p elf-eval --test real_world_job_benchmark real_world_report_includes_external_adapter_coverage_manifest --all-features
  • cargo test -p elf-eval --test real_world_job_benchmark current_benchmark_reports_preserve_live_sweep_boundaries --all-features
  • cargo make real-world-memory-live-adapters
  • ELF_BASELINE_PROJECTS=qmd,OpenViking cargo make baseline-live-docker
  • cargo make fmt
  • cargo make lint-fix
  • cargo make checks

Notes

  • Scoped baseline run exited 0; qmd passed and OpenViking remained typed wrong_result evidence, not a loosened pass.
  • Independent Decodex review checkpoint is clean for head 154bdbf.

@yvette-carlisle yvette-carlisle merged commit 2001703 into main Jun 11, 2026
13 checks passed
@yvette-carlisle yvette-carlisle deleted the y/elf-xy-899 branch June 11, 2026 08:33