
I’ve heard some buzz around the new GLM 5.2 open-weights model. They say it’s very capable! I won’t run a full comparison benchmark, but I have some credits sloshing around on OpenRouter, so I figured I’d compare GLM 5.2 to the similarly priced Gemini 3 Flash and see where things land. This uses the same setup as the previous benchmark: each LLM gets a few attempts at playing each game, and each attempt is limited to a fixed budget of around $0.15. The LLM doesn’t know it, but the harness tracks achievements for each game and counts how many the LLM earns in each attempt. Here is the number of attempts for each game in this run.
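To make the setup a bit more concrete, here is a minimal Python sketch of a budget-capped attempt that silently counts achievements. It assumes the harness charges each model call against the per-attempt cap and stops as soon as a call would push spending past it. The names `Attempt` and `run_attempt` and the `(cost, achievement)` step format are hypothetical illustrations, not the actual harness from the previous benchmark.

```python
from dataclasses import dataclass, field
from typing import Optional

BUDGET_USD = 0.15  # per-attempt cap, "around $0.15" as in the setup above


@dataclass
class Attempt:
    """One budget-capped play-through (hypothetical model of the harness)."""
    budget_usd: float = BUDGET_USD
    spent_usd: float = 0.0
    # Achievements are recorded by the harness only; the model never sees this.
    achievements: set = field(default_factory=set)

    def charge(self, cost_usd: float) -> bool:
        """Add the cost of one model call; return False once the cap is exceeded."""
        self.spent_usd += cost_usd
        return self.spent_usd <= self.budget_usd

    def unlock(self, name: str) -> None:
        # Using a set means re-triggering an achievement counts only once.
        self.achievements.add(name)

    @property
    def score(self) -> int:
        return len(self.achievements)


def run_attempt(steps: list) -> int:
    """Replay (cost_usd, achievement_or_None) steps until the budget runs out.

    A step whose call would exceed the budget is not applied, so any
    achievement it would have earned is not counted.
    """
    attempt = Attempt()
    for cost_usd, achievement in steps:
        if not attempt.charge(cost_usd):
            break
        if achievement is not None:
            attempt.unlock(achievement)
    return attempt.score
```

For example, four calls at $0.04 each would reach $0.16 on the last one, so under these assumptions that final step and any achievement it carries would be dropped. Whether the real harness cuts off before or after the call that crosses the cap is not stated in the post; this sketch picks "before" for simplicity.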
