How the five AIs actually did


2 hours ago · 12 min read · 2302 words · Life

The Group Stage of the 2026 World Cup finished today, so it’s time to mark the homework. Back in the launch post, five AI models, a deterministic lookup table, and one football fan predicted the same 72 fixtures before a ball was kicked. I’ve been scoring the results on AIWC26 each day as they came in. I’ve got no real opinion on the teams; I’m interested in what the scoring reveals about the models.

Final table

| Entrant           | Points | Exact scores | Correct results |
|-------------------|--------|--------------|-----------------|
| Claude Sonnet 4.6 | 67     | 10           | 37              |
| Claude Fable 5    | 64     | 9            | 37              |
| Gemini 3.1 Pro    | 62     | 8            | 38              |
| Gemini 3.5 Flash  | 62     | 10           | 32              |
| Ranking Bot       | 59     | 7            | 38              |
| GPT-5.5           | 52     | 6            | 34              |
| Human 1.0         | 52     | 6            | 34              |

The lookup-table-powered Ranking Bot beat one of the five models. Twenty lines of Python, no training data, no judgement. It still out-scored GPT-5.5 and the football fan who claimed to care about this in the first place.

GPT-5.5 and the human didn’t just tie on points; they tied on the breakdown too, with 6 exact scores and 34 correct results each, arrived at by a model reasoning…
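The arithmetic behind the final table can be checked with a short sketch. In every row, the points total happens to equal the correct results plus three times the exact scores. That weighting is inferred from the published numbers here, not quoted from the scoring rules, so treat `EXACT_BONUS` and the helper names below as hypothetical. The same sketch also picks out which entrants share an identical breakdown and not just a points total.

```python
# Final table as published: (entrant, points, exact scores, correct results).
FINAL_TABLE = [
    ("Claude Sonnet 4.6", 67, 10, 37),
    ("Claude Fable 5", 64, 9, 37),
    ("Gemini 3.1 Pro", 62, 8, 38),
    ("Gemini 3.5 Flash", 62, 10, 32),
    ("Ranking Bot", 59, 7, 38),
    ("GPT-5.5", 52, 6, 34),
    ("Human 1.0", 52, 6, 34),
]

# Assumption: this weighting is inferred from the table, not a stated rule.
EXACT_BONUS = 3


def inferred_points(exact, correct):
    """Points under the inferred rule: correct results plus a bonus per exact score."""
    return correct + EXACT_BONUS * exact


def check_table(table):
    """Return the entrants whose published points differ from the inferred rule."""
    return [
        name
        for name, points, exact, correct in table
        if inferred_points(exact, correct) != points
    ]


def tied_breakdowns(table):
    """Group entrants whose (points, exact, correct) rows are identical."""
    groups = {}
    for name, points, exact, correct in table:
        groups.setdefault((points, exact, correct), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    print("Rows that break the inferred rule:", check_table(FINAL_TABLE))
    print("Identical breakdowns:", tied_breakdowns(FINAL_TABLE))
```

Under this inferred rule no row is inconsistent. The two Gemini models tie on 62 points but with different breakdowns, so the only identical rows are GPT-5.5 and Human 1.0.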
