Feat/bot leaderboard/v2.3 followup by colesussmeier · Pull Request #4916 · Metaculus/metaculus

colesussmeier · 2026-06-19T20:51:16Z

Batch update for several parameter implementations, bug fixes, and logic updates

Always return tuple[float, float | None] instead of conditionally returning either a bare float or a tuple, so callers have a single shape to unpack. The second element stays None unless include_discrimination is set. Co-authored-by: Cursor <cursoragent@cursor.com>
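A minimal sketch of the single-shape return described in this commit. The function name `estimate_skill` and both statistics are hypothetical placeholders, not the PR's actual fit; only the return contract mirrors the commit message.

```python
from __future__ import annotations


def estimate_skill(
    scores: list[float], include_discrimination: bool = False
) -> tuple[float, float | None]:
    """Hypothetical stand-in: always returns a (skill, discrimination) pair."""
    skill = sum(scores) / len(scores) if scores else 0.0
    discrimination: float | None = None
    if include_discrimination:
        # Placeholder statistic (range of the scores), not the real model output.
        discrimination = max(scores) - min(scores) if scores else 0.0
    return skill, discrimination


# Callers unpack the same way whether or not the flag is set.
skill, discrimination = estimate_skill([1.0, 2.0, 3.0])
```

Because the shape never changes, callers need no `isinstance` check before unpacking.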

Replace the AIB project-id list duplicated inside gather_data with a single module-level AIB_PROJECT_IDS constant. Co-authored-by: Cursor <cursoragent@cursor.com>

Factor the question project filter into a project_filter Q object and add an aib_minibench_only flag that restricts the leaderboard to AIB and Minibench questions. Tag CSV output with the _AIBMiniB suffix. Co-authored-by: Cursor <cursoragent@cursor.com>

Add a min_human_forecasters threshold: on community questions with fewer than that many distinct human forecasters, keep the question but drop the Community Aggregate head-to-head matches. Do the same for minibench questions, which have no real human crowd (also skip building the aggregate for them in gather_data). Tag CSV output with _MinHF. Co-authored-by: Cursor <cursoragent@cursor.com>
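The rule in this commit can be sketched in plain Python as follows. The tuple layout of `matches`, the `human_counts` mapping, and the helper name `drop_cp_matches` are assumptions made for illustration; the PR itself works on parallel lists and a Django queryset.

```python
CP_ID = "Community Aggregate"


def drop_cp_matches(matches, human_counts, minibench_ids, min_human_forecasters):
    """Keep every question, but drop Community Aggregate matches where the crowd is too thin.

    matches: list of (question_id, user1_id, user2_id, score) tuples (assumed layout).
    human_counts: question_id -> number of distinct human forecasters.
    minibench_ids: question ids with no real human crowd.
    """
    low_human = {
        qid for qid, count in human_counts.items() if count < min_human_forecasters
    }
    drop_ids = low_human | set(minibench_ids)
    return [
        match
        for match in matches
        if not (match[0] in drop_ids and CP_ID in (match[1], match[2]))
    ]


matches = [
    (1, "bot_a", CP_ID, 0.2),     # question 1 has too few humans: dropped
    (1, "bot_a", "bot_b", -0.1),  # bot-vs-bot match on the same question: kept
    (2, CP_ID, "bot_b", 0.4),     # question 2 has enough humans: kept
    (3, "bot_a", CP_ID, 0.3),     # question 3 is minibench: dropped
]
kept = drop_cp_matches(
    matches, human_counts={1: 3, 2: 25}, minibench_ids={3}, min_human_forecasters=10
)
```

In this sketch a question missing from `human_counts` is never treated as low-human; whether that matches the PR's behaviour is not stated in the commit message.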

Replace the NotImplementedError with the per-year split for third-party bots: rewrite their head-to-head ids to year-tagged strings ("name (YYYY)"), parallel to the cp/pro aggregate split. This also drops them from non_metac_bot_ids membership so the per-year history bypasses the recency filter. Guard with an assert that include_non_metac_bots is set. Co-authored-by: Cursor <cursoragent@cursor.com>
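A rough illustration of the year-tagging step, assuming each match row already carries the year to tag with. The helper names, the `match_years` input, and the way the year is obtained are hypothetical; only the `"name (YYYY)"` format and the assert guard come from the commit message.

```python
def year_tag(name: str, year: int) -> str:
    """Build the year-tagged id format described in the commit: 'name (YYYY)'."""
    return f"{name} ({year})"


def split_bots_by_year(player_ids, match_years, bot_names, include_non_metac_bots):
    """Rewrite third-party bot ids to year-tagged strings; leave other ids untouched."""
    # The per-year split only makes sense when third-party bots are included at all.
    assert include_non_metac_bots, "non_metac_bots_by_year requires include_non_metac_bots"
    return [
        year_tag(player, year) if player in bot_names else player
        for player, year in zip(player_ids, match_years)
    ]


tagged = split_bots_by_year(
    player_ids=["acme-bot", 101, "acme-bot"],
    match_years=[2024, 2024, 2025],
    bot_names={"acme-bot"},
    include_non_metac_bots=True,
)
```

Since the rewritten ids are new strings, they no longer match the original bot ids, which is consistent with the commit's note that they fall out of `non_metac_bot_ids` membership.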

Add participation_parent_key to map year-split player ids ("... (YYYY)") to their parent, and apply min_participation_count to the parent's combined question set. This keeps an established aggregate/bot from being dropped just because individual per-year slices are sparse. Co-authored-by: Cursor <cursoragent@cursor.com>
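One way the parent mapping and the combined participation check could look. The regular expression, the `questions_by_player` input, and the helper `players_meeting_participation` are assumptions for illustration; the PR's real `participation_parent_key` may differ.

```python
import re
from collections import defaultdict

# Matches a trailing year tag such as " (2025)".
_YEAR_SUFFIX = re.compile(r" \(\d{4}\)$")


def participation_parent_key(player_id):
    """Map a year-split id ('... (YYYY)') to its parent; other ids map to themselves."""
    if isinstance(player_id, str):
        return _YEAR_SUFFIX.sub("", player_id)
    return player_id


def players_meeting_participation(questions_by_player, min_participation_count):
    """Return players whose parent's combined question set is large enough."""
    combined = defaultdict(set)
    for player, question_ids in questions_by_player.items():
        combined[participation_parent_key(player)] |= set(question_ids)
    return {
        player
        for player in questions_by_player
        if len(combined[participation_parent_key(player)]) >= min_participation_count
    }


questions_by_player = {
    "Community Aggregate (2024)": {1, 2},
    "Community Aggregate (2025)": {3},
    "bot_x": {1},
    42: {1, 2, 3},
}
kept_players = players_meeting_participation(questions_by_player, 3)
```

Here both Community Aggregate slices survive a threshold of 3 even though neither slice reaches it alone, which is the behaviour the commit describes.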

Add combine_year_split_players, which collapses per-year community/pro aggregates and non_metac_bots_by_year bots into a single combined entry (contribution-count-weighted mean skill, CI via SE propagation, summed counts), mirroring the front-end re-aggregation. Apply it to the leaderboard DB save and CSV output while keeping the per-year fit intact for the discrimination and distribution diagnostics. Co-authored-by: Cursor <cursoragent@cursor.com>
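The combination rule can be sketched as below, assuming the per-year estimates are independent and the intervals are symmetric 95% intervals (z = 1.96). The dict layout, the constant `Z_95`, and the function name `combine_year_slices` are illustrative assumptions, not the PR's `combine_year_split_players` signature.

```python
import math

Z_95 = 1.96  # assumed z-value; the PR does not state which one it uses


def combine_year_slices(slices):
    """Collapse per-year entries into one combined entry.

    Each slice is a dict with 'score', 'ci_lower', 'ci_upper', 'question_count'.
    """
    weights = [max(s["question_count"], 1) for s in slices]
    total = sum(weights)
    score = sum(w * s["score"] for w, s in zip(weights, slices)) / total

    ci = None
    has_all_bounds = all(
        s.get("ci_lower") is not None and s.get("ci_upper") is not None
        for s in slices
    )
    if has_all_bounds:
        variance = 0.0
        for w, s in zip(weights, slices):
            se = (s["ci_upper"] - s["ci_lower"]) / (2 * Z_95)  # half-width -> SE
            variance += (w / total) ** 2 * se**2
        half_width = Z_95 * math.sqrt(variance)
        ci = (score - half_width, score + half_width)

    return {
        "score": score,
        "ci": ci,
        "question_count": sum(s["question_count"] for s in slices),
    }


combined = combine_year_slices(
    [
        {"score": 10.0, "ci_lower": 10.0 - 1.96, "ci_upper": 10.0 + 1.96, "question_count": 1},
        {"score": 20.0, "ci_lower": 20.0 - 1.96, "ci_upper": 20.0 + 1.96, "question_count": 3},
    ]
)
```

With weights 1 and 3, the combined score is 17.5 and the combined half-width is 1.96 · sqrt(0.625), which is narrower than either input interval. If the per-year estimates are correlated, this propagation would understate the uncertainty.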

Set the Command.handle() run configuration to the current v2.3 defaults (include_minibench, min_human_forecasters, non_metac_bots_by_year, bot recency/score windows, ALS off, etc.) and wire the new aib_minibench_only / min_human_forecasters kwargs through the call. Move the explanatory comments off the function signature. Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai · 2026-06-19T20:51:23Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



github-actions · 2026-06-19T21:02:51Z

🚀 Preview Environment

Your preview environment is ready!

Resource	Details
🌐 Preview URL	https://metaculus-pr-4916-feat-bot-leaderboard-v2-3-foll-preview.mtcl.cc
📦 Docker Image	`ghcr.io/metaculus/metaculus:feat-bot-leaderboard-v2.3-followup-bb7f574`
🗄️ PostgreSQL	NeonDB branch `preview/pr-4916-feat-bot-leaderboard-v2-3-foll`
⚡ Redis	Fly Redis `mtc-redis-pr-4916-feat-bot-leaderboard-v2-3-foll`

Details

Commit: 8df5af98df968ca1dda8d769a4bf7804a78baa0d
Branch: feat/bot-leaderboard/v2.3-followup
Fly App: metaculus-pr-4916-feat-bot-leaderboard-v2-3-foll

ℹ️ Preview Environment Info

Isolation:

PostgreSQL and Redis are fully isolated from production
Each PR gets its own database branch and Redis instance
Changes pushed to this PR will trigger a new deployment

Limitations:

Background workers and cron jobs are not deployed in preview environments
If you need to test background jobs, use Heroku staging environments

Cleanup:

This preview will be automatically destroyed when the PR is closed

lsabor

great!

lsabor · 2026-06-19T21:22:40Z

+            .exclude(post__default_project__slug__startswith="minibench")
+            .annotate(
+                human_forecaster_count=Count(
+                    "user_forecasts__author",


This doesn't matter here, because the filter below (user_forecasts__author__is_bot=False) already joins the forecasts and User tables. If it didn't, you'd want to replace "user_forecasts__author" with "user_forecasts__author_id" to skip the join and count on the integer field directly.

lsabor · 2026-06-19T21:25:48Z

+    # Drop only the community-aggregate matches on low-human / minibench questions,
+    if drop_cp_question_ids:
+        keep = [
+            i
+            for i, (qid, u1, u2) in enumerate(
+                zip(question_ids, user1_ids, user2_ids)
+            )
+            if not (
+                qid in drop_cp_question_ids
+                and ("Community Aggregate" in (u1, u2))
+            )
+        ]
+        user1_ids = [user1_ids[i] for i in keep]
+        user2_ids = [user2_ids[i] for i in keep]
+        question_ids = [question_ids[i] for i in keep]
+        scores = [scores[i] for i in keep]
+        coverages = [coverages[i] for i in keep]
+        timestamps = [timestamps[i] for i in keep]


You should drop these question ids before the gather_data step: just filter out the questions with those ids. That lets you skip the gather_data calculations for those questions instead of computing the results and tossing them later.

lsabor · 2026-06-19T21:30:20Z

+    - score: contribution-count-weighted mean of the per-year skills, with
+      weight = max(distinct-question count, 1).
+    - CI: per-year half-widths are converted to SEs and propagated through the
+      same normalized weights -- se_combined = sqrt(Σ (wᵢ/W)² · seᵢ²), then
+      score ± z·se_combined. Combining estimates *narrows* the interval. CI is
+      dropped unless every member has both bounds.
+    - match count / distinct questions / coverage: combined across the year
+      slices (which are disjoint in questions).


Seems reasonable, and I can't suggest an alternative. I just want to double-check that it's "close enough" that we don't have to recalculate the whole thing with and without the split.

Speaking of which, that is an option: to validate how close this combiner gets, you could run the same data with and without splitting, recombine the split output, and compare the two. If you've already done that, mentioning it here as the justification for this move would be reasonable.

colesussmeier and others added 8 commits June 19, 2026 15:14

Extract AIB_PROJECT_IDS module constant

1e8fbc5


colesussmeier deployed to Preview June 19, 2026 21:00 — with GitHub Actions View deployment

lsabor approved these changes Jun 19, 2026

colesussmeier wants to merge 8 commits into feat/bot-leaderboard/v2.3 from feat/bot-leaderboard/v2.3-followup
