Join us in solving the problem of cheating in AI benchmarking
🎉
Our paper "Benchmarking is Broken - Don't Let AI be its Own Judge" just got accepted to NeurIPS 2025
Read the paper on arXiv