Join us in solving the problem of cheating in AI benchmarking

🎉 Our paper "Benchmarking is Broken - Don't Let AI be its Own Judge" has been accepted to NeurIPS 2025

Read the paper on arXiv
General Review
Review general prompts and help improve AI benchmarking quality
Logical Puzzles - English
Review puzzles and contribute to logical reasoning AI evaluation
AIME25 Mathematical Reasoning Benchmark - English
Review AIME25 mathematical reasoning prompts
Enhanced History Questions
Review history prompts that combine knowledge with reasoning and math skills
Polish Language Mix of Tasks
Review Polish language prompts: culture, language, history, geography, and more
Ukrainian Grammar
Review prompts on the updated rules of Ukrainian grammar