
Add generator selection-bias analysis for the perturbation benchmark#100

Open
dangng2004 wants to merge 1 commit into main from feat/selection-bias


@dangng2004

Contributor

Quantifies selection bias in the perturbation generator. The LLM chooses which candidate spans to perturb, so any divergence between the feature distribution of the selected spans and that of the full candidate pool measures selection bias directly, with no new LLM calls.
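One way to put a number on that divergence is the total variation distance between the selected and pool distributions of a categorical span feature. This is a minimal sketch under assumptions: the PR does not say which features or which divergence measure it uses, so the choice of total variation distance and the part-of-speech-style labels below are illustrative only.

```python
from collections import Counter
from collections.abc import Hashable, Iterable


def distribution(values: Iterable[Hashable]) -> dict[Hashable, float]:
    """Turn raw feature values into a normalized frequency distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a distribution from zero values")
    return {value: n / total for value, n in counts.items()}


def total_variation(selected: Iterable[Hashable], pool: Iterable[Hashable]) -> float:
    """Total variation distance between two categorical distributions.

    0.0 means the selected spans look exactly like the pool; 1.0 means the
    two share no feature values at all.
    """
    p = distribution(selected)
    q = distribution(pool)
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


# Hypothetical feature: a coarse type label for every candidate span.
pool = ["NOUN", "NOUN", "VERB", "VERB", "ADJ", "ADJ", "NUM", "NUM"]
selected = ["NUM", "NUM", "NOUN", "ADJ"]  # the generator over-picks numbers
print(f"{total_variation(selected, pool):.2f}")  # 0.25
```

Total variation distance is bounded in [0, 1] and symmetric, which makes scores comparable across features with different numbers of categories; a KL-style measure would instead blow up whenever the selection contains a value the pool lacks.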

Walks existing *_perturbations.json files, re-extracts the candidate pool with the same extractor (extract.py), and compares the selected and pool feature distributions.
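The walk described above could look roughly like the sketch below. The JSON schema is a hypothetical stand-in because the PR does not show the file layout: it assumes a top-level "text" field plus a "perturbations" list whose entries carry an "original" span. `extract_candidates` is likewise a placeholder for whatever extract.py exposes; its real name and signature are not in the PR.

```python
import json
from collections.abc import Callable, Iterator
from pathlib import Path


def iter_perturbation_files(root: Path) -> Iterator[Path]:
    """Recursively yield every *_perturbations.json file under root, sorted."""
    yield from sorted(root.rglob("*_perturbations.json"))


def collect_spans(
    root: Path,
    extract_candidates: Callable[[str], list[str]],
) -> tuple[list[str], list[str]]:
    """Return (selected spans, candidate-pool spans) across all files.

    HYPOTHETICAL schema: {"text": str, "perturbations": [{"original": str}, ...]}.
    extract_candidates stands in for the extractor in extract.py; reusing the
    same extractor is what keeps the rebuilt pool comparable to the selection.
    """
    selected: list[str] = []
    pool: list[str] = []
    for path in iter_perturbation_files(root):
        record = json.loads(path.read_text(encoding="utf-8"))
        pool.extend(extract_candidates(record["text"]))
        selected.extend(p["original"] for p in record["perturbations"])
    return selected, pool


if __name__ == "__main__":
    import tempfile

    def numeric_tokens(text: str) -> list[str]:
        """Toy stand-in extractor: every all-digit token is a candidate span."""
        return [tok for tok in text.split() if tok.isdigit()]

    with tempfile.TemporaryDirectory() as tmp:
        doc = {"text": "Paid 40 dollars on day 3 of 7", "perturbations": [{"original": "40"}]}
        (Path(tmp) / "doc1_perturbations.json").write_text(json.dumps(doc), encoding="utf-8")
        selected, pool = collect_spans(Path(tmp), numeric_tokens)
        print(selected, pool)  # ['40'] ['40', '3', '7']
```

Because only saved outputs are read back, this step needs no LLM calls; the feature distributions for both lists can then be computed and compared.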

🤖 Generated with Claude Code

Compares the feature distribution of perturbation spans the generator
selected against the full candidate pool it chose from. Under random
selection the two distributions would match in expectation, so divergence
measures selection bias directly, with no new LLM calls. Walks existing
*_perturbations.json files and re-extracts the candidate pool with the
same extractor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
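The argument that random selection would make the two distributions match suggests a null baseline: draw uniformly random subsets of the pool, each the same size as the actual selection, and check how often chance alone produces a gap as large as the observed one. The sketch below is a Monte Carlo randomization test on a numeric feature; the PR does not state that it runs any significance test, and both the use of span length and the numbers are invented for illustration.

```python
import random
from collections.abc import Sequence
from statistics import mean


def random_selection_p_value(
    selected: Sequence[float],
    pool: Sequence[float],
    n_draws: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo randomization test against uniform random selection.

    Draws n_draws random subsets of the pool, each the same size as the
    observed selection, and returns the fraction whose mean lies at least as
    far from the pool mean as the selection's mean does (two-sided).
    """
    k = len(selected)
    if not 0 < k <= len(pool):
        raise ValueError("selected must be non-empty and no larger than pool")
    rng = random.Random(seed)
    candidates = list(pool)
    pool_mean = mean(candidates)
    observed_gap = abs(mean(selected) - pool_mean)
    extreme = 0
    for _ in range(n_draws):
        draw = rng.sample(candidates, k)  # sampling without replacement
        if abs(mean(draw) - pool_mean) >= observed_gap:
            extreme += 1
    # Add-one correction: a finite number of draws never justifies p = 0.
    return (extreme + 1) / (n_draws + 1)


# Hypothetical feature: token length of each candidate span.
pool_lengths = [1, 1, 1, 1, 2, 2, 2, 2, 8, 8, 9, 9]
selected_lengths = [8, 8, 9, 9]  # the generator picked only the longest spans
print(random_selection_p_value(selected_lengths, pool_lengths))
```

A small p-value here says random picks rarely reproduce the observed preference for long spans; a value near 1 says the selection is indistinguishable from chance on this feature.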
dangng2004 marked this pull request as draft June 5, 2026 22:28
dangng2004 marked this pull request as ready for review June 6, 2026 02:31