This benchmark tests an AI model's ability to handle sequential decision-making in business contexts, focusing on failures of game-theoretic and strategic reasoning. It covers sequential-move (Stackelberg-type) games in microeconomics and the correct application of backward induction to find subgame perfect equilibria. Typical AI failure patterns include:

1. Reasoning in the wrong order (e.g., treating the follower as the leader).
2. Ignoring subgame perfect equilibrium logic.
3. Introducing irrelevant “reputation” or “collusion” effects that are not in the prompt.
4. Jumping directly to intuitive but non-equilibrium answers.
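As an illustration of the reasoning the benchmark targets, here is a minimal sketch of backward induction in a Stackelberg quantity game. It is not taken from the benchmark's prompts: the linear inverse demand `P = a - b * (q_leader + q_follower)`, the shared constant marginal cost `c`, the grid search, and all function and parameter names are assumptions made for this example. The follower's best response is solved first, and the leader then optimizes while anticipating it.

```python
def follower_best_response(q_leader, a, b, c):
    """Follower's profit-maximizing quantity once the leader has committed.

    Assumes linear inverse demand P = a - b * (q_leader + q_follower)
    and constant marginal cost c (illustrative assumptions).
    """
    return max(0.0, (a - c - b * q_leader) / (2 * b))


def leader_profit(q_leader, a, b, c):
    """Leader's profit when it anticipates the follower's best response."""
    q_follower = follower_best_response(q_leader, a, b, c)
    price = a - b * (q_leader + q_follower)
    return (price - c) * q_leader


def solve_stackelberg(a, b, c, steps=10_000):
    """Backward induction: second stage first, then the leader's choice.

    The leader's quantity is found by a simple grid search over
    [0, (a - c) / b]; a closed form exists for this linear case, but the
    grid keeps the two-stage logic explicit.
    """
    q_max = (a - c) / b
    grid = [q_max * i / steps for i in range(steps + 1)]
    q_leader = max(grid, key=lambda q: leader_profit(q, a, b, c))
    q_follower = follower_best_response(q_leader, a, b, c)
    return q_leader, q_follower


if __name__ == "__main__":
    q_leader, q_follower = solve_stackelberg(a=100, b=1, c=10)
    print(q_leader, q_follower)
```

Under these assumptions the result should match the textbook closed form, `q_leader = (a - c) / (2 * b)` and `q_follower = (a - c) / (4 * b)`. Swapping the order of the two stages (failure pattern 1) would instead yield the simultaneous-move or reversed-role outcome.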
- Total Prompts: 45
- Scored Responses: 946
- Contributors: 10
- Average Overall Score: 36%
| Rank | Model | Avg. Score | Prompts Tested | Avg. Response Time |
|---|---|---|---|---|
| 1 | google/gemini-3-pro-preview | 0.81 | 37 | 39ms |
| 2 | openai/gpt-5.2 | 0.73 | 37 | 16ms |
| 3 | x-ai/grok-4.1-fast | 0.73 | 37 | 42ms |
| 4 | x-ai/grok-4 | 0.68 | 37 | 145ms |
| 5 | qwen/qwen3-235b-a22b-2507 | 0.59 | 37 | 4ms |