Claude Opus 4.8 vs Claude Fable 5 benchmark: prompt steering, cost, and latency results

4 hours ago · Tech · 0 comments

In April I published a benchmark article where one steering sentence ("Do not invoke any tools. Answer from your own knowledge and reasoning only.") cut Opus 4.7’s session cost by 63 percent, while think-step-by-step and ultrathink pushed the same model’s cost up. The headline was that prompt steering for Opus 4.7, with adaptive thinking enabled, had become model-specific: a wrapper that saved money on one Opus version burned money on the next. Two model generations later, I re-ran the same 200-run matrix on Claude Opus 4.8 [1m] high and Claude Fable 5 [1m] high. These are the 1-million-token context versions, meaning the model can hold roughly a million tokens of conversation and files in memory at once. I used my Claude Code session-metrics plugin to record each benchmark session’s token usage and cost breakdown. The model-specific inversion survived, but it moved. ultrathink now leaves Opus 4.8 at high effort essentially flat at -2.6 percent while inflating Fable 5 at high effort…
