XY-861: [ELF benchmark vNext P1] Encode project-decision real-world memory suite #151

Merged: yvette-carlisle merged 2 commits into main from y/elf-xy-861 on Jun 10, 2026

@yvette-carlisle (Member)

Summary:

  • Add checked-in project_decisions real-world memory fixtures for accepted decisions, reversals, current validation policy, tradeoff rationale, and bounded caveats.
  • Surface per-job answer_type, requires_caveat, requires_refusal, and can_answer_unknown in benchmark reports.
  • Add Makefile/docs/test coverage for the project_decisions suite and aggregate expectations.
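
As a rough illustration of the per-job report fields and the aggregate expectations listed above, here is a minimal Rust sketch. Only the four field names (answer_type, requires_caveat, requires_refusal, can_answer_unknown) and the aggregate names (encoded_job_count, score_mean) come from this PR. The struct names, the `job_id` and `score` fields, the field types, and both functions are assumptions made for illustration and may not match the actual ELF benchmark code.

```rust
// Hypothetical sketch: names other than the four per-job fields and the
// two aggregate fields are assumptions, not the PR's actual types.
#[derive(Debug, Clone, PartialEq)]
struct JobReport {
    job_id: String,           // assumed identifier for the encoded job
    answer_type: String,      // surfaced per job in the report
    requires_caveat: bool,    // answer must carry a bounded caveat
    requires_refusal: bool,   // correct behavior is to refuse
    can_answer_unknown: bool, // "unknown" is an acceptable answer
    score: f64,               // assumed per-job score in [0.0, 1.0]
}

#[derive(Debug, PartialEq)]
struct SuiteSummary {
    encoded_job_count: usize,
    score_mean: f64,
}

// Aggregate a suite's jobs; returns None for an empty suite so the
// mean is never computed over zero jobs.
fn summarize(jobs: &[JobReport]) -> Option<SuiteSummary> {
    if jobs.is_empty() {
        return None;
    }
    let total: f64 = jobs.iter().map(|j| j.score).sum();
    Some(SuiteSummary {
        encoded_job_count: jobs.len(),
        score_mean: total / jobs.len() as f64,
    })
}

// Render one job as a single report line exposing the four flags.
fn render_job(job: &JobReport) -> String {
    format!(
        "{}: answer_type={}, requires_caveat={}, requires_refusal={}, can_answer_unknown={}",
        job.job_id,
        job.answer_type,
        job.requires_caveat,
        job.requires_refusal,
        job.can_answer_unknown
    )
}
```

Under these assumptions, a suite of five jobs that each score 1.0 would summarize to encoded_job_count=5 and score_mean=1.0, which is the shape of the readback reported for project_decisions.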

Validation:

  • cargo make fmt
  • cargo make lint-fix
  • cargo make checks
  • cargo make real-world-memory

Benchmark readback:

  • project_decisions: status=pass, encoded_job_count=5, score_mean=1.0, expected_evidence_recall=1.0

yvette-carlisle merged commit 0b8f357 into main on Jun 10, 2026
10 checks passed
yvette-carlisle deleted the y/elf-xy-861 branch on June 10, 2026 at 03:33