Add Anchored Review Format standard for AI review systems #106

Open
dangng2004 wants to merge 1 commit into main from add-anchored-review-standard

@dangng2004

Contributor

Summary

Introduces standard/, an open output standard for AI paper-review systems so a conformant system plugs into the benchmark with no per-system adapter.

  • Payload: a review is a list of comments, each with a verbatim quote and an explanation. Those are the only fields scoring depends on; everything else (title, severity, paragraph_index, paper metadata) is optional.
  • Two integration profiles:
    • profile-cli.md — open systems the benchmark runs locally (one command, paper in, payload out).
    • profile-api.md — closed systems exposed as an async submit-and-poll HTTP API.
  • validate.py — stdlib-only conformance checker (errors vs warnings, --strict, 0/1 exit).
  • reference/review_client.py — stdlib-only caller for the API profile that doubles as a conformance test a system can run against its own staging endpoint.
  • examples/ — minimal, full, and an intentionally-invalid payload.
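
The payload rule and the validator behavior listed above can be sketched roughly as follows. This is not the actual validate.py: the JSON key names (`comments`, `quote`, `explanation`), the decision to warn on unrecognized fields, and the reading of `--strict` as "warnings also fail" are all assumptions made for illustration.

```python
# Hypothetical sketch of a conformance check; not the PR's validate.py.
# Assumed key names: "comments", "quote", "explanation".
REQUIRED_KEYS = ("quote", "explanation")
OPTIONAL_KEYS = {"title", "severity", "paragraph_index"}


def check_payload(payload):
    """Return (errors, warnings) for a decoded review payload."""
    errors, warnings = [], []
    comments = payload.get("comments") if isinstance(payload, dict) else None
    if not isinstance(comments, list):
        return ["payload must be an object with a 'comments' list"], warnings
    for i, comment in enumerate(comments):
        if not isinstance(comment, dict):
            errors.append(f"comments[{i}]: must be an object")
            continue
        # The two fields scoring depends on must be non-empty strings.
        for key in REQUIRED_KEYS:
            value = comment.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"comments[{i}].{key}: required non-empty string")
        # Anything else is optional; unknown keys only produce a warning here.
        for key in comment:
            if key not in REQUIRED_KEYS and key not in OPTIONAL_KEYS:
                warnings.append(f"comments[{i}].{key}: unrecognized field")
    return errors, warnings


def exit_code(errors, warnings, strict=False):
    """0 when conformant, 1 otherwise; strict mode (assumed) fails on warnings."""
    return 1 if errors or (strict and warnings) else 0


minimal = {"comments": [{"quote": "We observe a 12% gain.",
                         "explanation": "No baseline is reported."}]}
invalid = {"comments": [{"quote": ""}]}
```

Under these assumptions, `check_payload(minimal)` reports no errors or warnings, while `invalid` fails on both required fields.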

Scope is limited to one concern: this PR adds only standard/. There are no changes to the reviewer package, the benchmark, or the adapters.

Follow-ups

  • Make the openaireview web backend (openaireview-web-backend) conformant so our own hosted system dogfoods the standard and the reference client runs against it end to end. It is already async submit-and-poll, but today it uses /review + /status|/results, a token field, multipart+email submission, and a nested methods.*.comments body, so it does not yet conform. The fix is a thin POST /v1/reviews + GET /v1/reviews/{id} pair that reuses the existing worker and returns the flat payload.
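
For orientation, a caller of the proposed POST /v1/reviews + GET /v1/reviews/{id} pair might look like the sketch below. It is not reference/review_client.py. Only the two routes come from the description above; the `id` and `status` response fields, the `completed` and `failed` status values, the shape of the submitted `paper` body, and the polling defaults are assumptions.

```python
import json
import time
import urllib.request


def http_json(method, url, body=None):
    """Send a JSON request with the stdlib and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_and_poll(base_url, paper, request=http_json, interval=2.0,
                    max_polls=30, sleep=time.sleep):
    """Submit a paper, then poll until the review finishes or polling gives up.

    Assumed, not taken from the PR: the "id" and "status" fields and the
    "completed" / "failed" status values.
    """
    job = request("POST", f"{base_url}/v1/reviews", paper)
    for _ in range(max_polls):
        result = request("GET", f"{base_url}/v1/reviews/{job['id']}")
        if result.get("status") == "completed":
            return result  # expected to carry the flat comments payload
        if result.get("status") == "failed":
            raise RuntimeError(f"review {job['id']} failed")
        sleep(interval)
    raise TimeoutError(f"review {job['id']} not finished after {max_polls} polls")
```

Passing `request` and `sleep` as parameters keeps the loop testable without a network: a staging check can inject a fake transport, while real use relies on the `http_json` default.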

An open output standard so conformant review systems plug into the benchmark
with no per-system adapter. A review is a list of comments, each with a verbatim
quote and an explanation (the only fields scoring depends on), with everything
else optional.

Includes two integration profiles (CLI for systems the benchmark runs locally,
hosted API for closed systems), a stdlib-only validator, a stdlib-only reference
client for the API profile, and minimal/full/invalid examples.

Adds only standard/; no changes to the reviewer package, benchmark, or adapters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dangng2004 requested a review from chenhaot on July 1, 2026, 06:28