Add Anchored Review Format standard for AI review systems #106
Open
dangng2004 wants to merge 1 commit into
An open output standard so conformant review systems plug into the benchmark with no per-system adapter. A review is a list of comments, each with a verbatim quote and an explanation (the only fields scoring depends on), with everything else optional. Includes two integration profiles (CLI for systems the benchmark runs locally, hosted API for closed systems), a stdlib-only validator, a stdlib-only reference client for the API profile, and minimal/full/invalid examples. Adds only standard/; no changes to the reviewer package, benchmark, or adapters.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
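To make the payload shape and the errors-versus-warnings split concrete, here is a minimal sketch. It is not the `validate.py` shipped in this PR. The top-level `comments` key, the function names `check_payload` and `exit_code`, the "unrecognized field" warning, and the reading of `--strict` as "warnings also fail" are all assumptions made for illustration; the files under `standard/` are authoritative.

```python
import json

REQUIRED = ("quote", "explanation")  # the only fields scoring depends on
OPTIONAL = ("title", "severity", "paragraph_index")  # optional per the PR text


def check_payload(payload):
    """Return (errors, warnings) for a review payload.

    Assumption: the payload is an object with a top-level "comments" list.
    """
    errors, warnings = [], []
    comments = payload.get("comments") if isinstance(payload, dict) else None
    if not isinstance(comments, list):
        return ["payload must be an object with a 'comments' list"], warnings
    for i, comment in enumerate(comments):
        if not isinstance(comment, dict):
            errors.append(f"comments[{i}]: must be an object")
            continue
        for field in REQUIRED:
            value = comment.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"comments[{i}].{field}: required non-empty string")
        for field in comment:
            # Assumption: unknown fields are tolerated but flagged as warnings.
            if field not in REQUIRED and field not in OPTIONAL:
                warnings.append(f"comments[{i}].{field}: unrecognized field")
    return errors, warnings


def exit_code(errors, warnings, strict=False):
    """0 when conformant, 1 otherwise; strict mode (assumed) fails on warnings too."""
    return 1 if errors or (strict and warnings) else 0


# Hypothetical minimal payload: one comment with only the two required fields.
minimal = json.loads(
    '{"comments": [{"quote": "We observe a 12% gain.", '
    '"explanation": "No baseline is reported for this number."}]}'
)
errors, warnings = check_payload(minimal)
print(errors, warnings, exit_code(errors, warnings))
```

Keeping the required set to two fields means a system can emit the smallest possible payload and still be scored, while richer systems add optional fields without breaking conformance.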
Summary

Introduces `standard/`, an open output standard for AI paper-review systems so a conformant system plugs into the benchmark with no per-system adapter.

- A review is a list of comments, each with a verbatim `quote` and an `explanation`. Those are the only fields scoring depends on; everything else (`title`, `severity`, `paragraph_index`, paper metadata) is optional.
- `profile-cli.md`: open systems the benchmark runs locally (one command, paper in, payload out).
- `profile-api.md`: closed systems exposed as an async submit-and-poll HTTP API.
- `validate.py`: stdlib-only conformance checker (errors vs warnings, `--strict`, 0/1 exit).
- `reference/review_client.py`: stdlib-only caller for the API profile that doubles as a conformance test a system can run against its own staging endpoint.
- `examples/`: minimal, full, and intentionally invalid payloads.

Scope is one concern: adds only `standard/`. No changes to the reviewer package, the benchmark, or the adapters.

Follow-ups

- Make the hosted backend (`openaireview-web-backend`) conformant so our own hosted system dogfoods the standard and the reference client runs against it end to end. It is already async submit-and-poll, but today it uses `/review` + `/status`|`/results`, a `token` field, multipart + `email` submission, and a nested `methods.*.comments` body, so it does not yet conform. The fix is a thin `POST /v1/reviews` + `GET /v1/reviews/{id}` pair that reuses the existing worker and returns the flat payload.
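The reshaping step in that follow-up, from the nested `methods.*.comments` body to the flat payload, could look roughly like the sketch below. Both shapes are assumptions inferred from the description: the nested input is taken to be `{"methods": {<name>: {"comments": [...]}}}`, the flat output is taken to use a top-level `comments` list, and `flatten_results` is a hypothetical helper name, not code from the backend.

```python
def flatten_results(nested):
    """Collapse per-method comment lists into one flat comment list.

    Assumed input:  {"methods": {"<name>": {"comments": [ {...}, ... ]}}}
    Assumed output: {"comments": [ {...}, ... ]}
    """
    flat = []
    for result in nested.get("methods", {}).values():
        for comment in result.get("comments", []):
            # Copy each comment so the worker's original result is not mutated.
            flat.append(dict(comment))
    return {"comments": flat}


# Hypothetical worker output with two review methods.
nested = {
    "methods": {
        "method_a": {"comments": [{"quote": "q1", "explanation": "e1"}]},
        "method_b": {"comments": [{"quote": "q2", "explanation": "e2"}]},
    }
}
print(flatten_results(nested))
```

A pure function like this keeps the new `GET /v1/reviews/{id}` handler thin: the existing worker output stays untouched and only the response body changes.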