
Add Docker live baseline benchmark #119

Merged
yvette-carlisle merged 2 commits into main from xy/full-live-baseline-benchmark on Jun 9, 2026

@yvette-carlisle


Summary

  • Add a Docker-isolated live baseline runner for ELF plus external memory-system baselines.
  • Publish a checked-in June 9, 2026 benchmark report and README benchmark snapshot.
  • Add a Markdown report publisher so future benchmark JSON runs can be promoted into reviewed docs.
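For reviewers unfamiliar with the publishing step: the actual publisher is `scripts/live-baseline-report-to-md.sh`, which reads the run's JSON. What follows is not that script. It is a minimal Rust sketch of the core idea, turning per-system outcomes into a Markdown table that can be reviewed and checked in. It assumes a hypothetical in-memory `BaselineResult` row type and `Outcome` enum, because the real JSON schema is not shown in this PR.

```rust
/// Hypothetical outcome of one system in a baseline run.
/// Assumption: this is NOT the real JSON schema used by the publisher script.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Pass,
    Fail,
    Incomplete,
}

impl Outcome {
    /// Lowercase label shown in the rendered table.
    fn label(self) -> &'static str {
        match self {
            Outcome::Pass => "pass",
            Outcome::Fail => "fail",
            Outcome::Incomplete => "incomplete",
        }
    }
}

/// Hypothetical per-system row (system name, outcome, free-text note).
pub struct BaselineResult<'a> {
    pub system: &'a str,
    pub outcome: Outcome,
    pub note: &'a str,
}

/// Escape pipe characters so free text cannot split a table cell.
fn escape_cell(s: &str) -> String {
    s.replace('|', "\\|")
}

/// Render a titled GitHub-flavored Markdown table, one row per system.
pub fn render_markdown(title: &str, results: &[BaselineResult<'_>]) -> String {
    let mut out = String::new();
    out.push_str(&format!("## {}\n\n", title));
    out.push_str("| System | Outcome | Note |\n");
    out.push_str("| --- | --- | --- |\n");
    for r in results {
        out.push_str(&format!(
            "| {} | {} | {} |\n",
            escape_cell(r.system),
            r.outcome.label(),
            escape_cell(r.note)
        ));
    }
    out
}
```

Keeping rendering separate from running the benchmark mirrors the split described here: raw run artifacts stay uncommitted, and only the rendered, reviewed Markdown gets checked in.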

Benchmark Evidence

  • ELF provider stress test: Qwen3-Embedding-8B (4096 dimensions), 480 docs, 16 queries; 8/8 checks passed in 1163 s.
  • All-project smoke test: ELF and qmd pass all encoded checks; agentmemory passes retrieval but fails or does not complete its lifecycle checks; mem0, memsearch, and claude-mem miss the expected retrieval evidence; OpenViking is incomplete due to a failed local embed install.

Verification

  • cargo make fmt-check
  • cargo check -p elf-worker -p elf-eval --bin live_baseline_elf
  • cargo test -p elf-providers
  • docker compose -f docker-compose.baseline.yml config
  • bash -n scripts/live-baseline-benchmark.sh
  • bash -n scripts/live-baseline-report-to-md.sh
  • ELF_BASELINE_MARKDOWN_REPORT=tmp/live-baseline/rendered-report.md cargo make baseline-live-report
  • git diff --check / git diff --cached --check
  • semantic drift audit passed: README/docs claims match the generated JSON, Makefile tasks, Docker Compose config, and runner/publisher scripts

Notes

The benchmark run artifacts stay under tmp/live-baseline/ and are not committed. The checked-in Markdown report is the reviewed publication artifact.

@yvette-carlisle merged commit badafe5 into main on Jun 9, 2026
10 checks passed
@yvette-carlisle yvette-carlisle deleted the xy/full-live-baseline-benchmark branch June 9, 2026 03:13