Join us in solving the problem of cheating in AI benchmarking

🎉

Our paper "Benchmarking is Broken - Don't Let AI be its Own Judge" has been accepted to NeurIPS 2025. Read it on arXiv.

We will publish a series of follow-up papers on PeerBench and want to credit the people who make them possible. Get your name in a paper by contributing to the community: create prompts, comment, and review.

PeerBench.ai is an open-source, non-profit community implementation of the NeurIPS paper, bringing the research to life.

General Review

Review general prompts and help improve AI benchmarking quality

Logical Puzzles - English

Review logic puzzles and contribute to evaluating the logical reasoning of AI models

AIME25 Mathematical Reasoning Benchmark - English

Review AIME25 mathematical reasoning prompts

Enhanced History Questions

Review history prompts that combine historical knowledge with reasoning and math skills

Polish Language Mix of Tasks

Review Polish language prompts: culture, language, history, geography, and more

Ukrainian Grammar

Review Ukrainian grammar prompts based on the updated rules of Ukrainian grammar