Testing MiniMax M3 on real tasks: repo refactor, screenshot debugging, and Spotify recommendations
I got early access to MiniMax M3, so I plugged it into Claude Code and used it on a few tasks I had wanted to complete for some time: a code audit and refactor of my old web game, two UI bugs in that game that I had been putting off, and a music-recommendation experiment built from my Spotify history. I used M3 for the implementation work, then asked Opus 4.8 to review the results.

M3 is the first open-weights model (it will soon be fully open-sourced on HuggingFace and GitHub) to combine three things in one release: frontier-level coding and agentic ability, a 1M-token context window, and native multimodality. I reviewed MiniMax M2.7 earlier, and M3 is a clear step up in the areas I tested.

M3 was most useful when I gave it concrete artifacts: a repo, tests, screenshots, and data exports. It did a lot of real work quickly, but an independent review still caught some regressions. What…