Why are cached input tokens cheaper with AI services?


1 hour ago · Tech

When you look at AI model pricing pages, you usually see things broken down like this:

| Model | Context Length | Max CoT Tokens | Max Output Tokens | Input Price (Cache Hit) | Input Price (Cache Miss) | Output Price |
| --- | --- | --- | --- | --- | --- | --- |
| deepseek-chat | 64K | - | 8K | $0.07 / 1M tokens | $0.27 / 1M tokens | $1.10 / 1M tokens |
| deepseek-reasoner | 64K | 32K | 8K | $0.14 / 1M tokens | $0.55 / 1M tokens | $2.19 / 1M tokens |

Source: DeepSeek API Docs

If you manage to have most of your input tokens be cached, you save a huge amount: in this case $0.20 per million input tokens for deepseek-chat ($0.27 on a cache miss versus $0.07 on a cache hit), and $0.41 per million for deepseek-reasoner. What does this mean though? What does caching do that makes you save so much, in some cases upwards of tens of kilodollars?

"Someone explain the cached vs not thing to me for how this is $10,000 worth of savings lol"
- Chimney Sweepers Local 420 FKA yburyug (@bobbby.online), June 12, 2026 at 12:39 AM

Warning: I'm gonna be totally honest, I barely understand the basic outline of the math involved here. Where possible I aim to not be completely wrong here, but I'm not going to emit something 1:1…
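To make the pricing table above a bit more concrete, here is a rough sketch of the arithmetic in Python. The prices are copied from the DeepSeek table; the helper names (`input_cost`, `savings`) and the idea of a single "cache hit ratio" are my own simplifications for illustration, not anything from the DeepSeek API, and real billing may count tokens differently.

```python
# Input prices in USD per 1M tokens, copied from the DeepSeek table above.
PRICES = {
    "deepseek-chat": {"hit": 0.07, "miss": 0.27},
    "deepseek-reasoner": {"hit": 0.14, "miss": 0.55},
}


def input_cost(model: str, total_tokens: float, cache_hit_ratio: float) -> float:
    """Hypothetical helper: USD cost of input tokens at a given cache hit ratio.

    cache_hit_ratio is the assumed fraction of input tokens billed at the
    cache-hit price (0.0 = nothing cached, 1.0 = everything cached).
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    price = PRICES[model]
    hit_tokens = total_tokens * cache_hit_ratio
    miss_tokens = total_tokens - hit_tokens
    return (hit_tokens * price["hit"] + miss_tokens * price["miss"]) / 1_000_000


def savings(model: str, total_tokens: float, cache_hit_ratio: float) -> float:
    """Hypothetical helper: USD saved versus paying the cache-miss price for everything."""
    uncached = input_cost(model, total_tokens, 0.0)
    return uncached - input_cost(model, total_tokens, cache_hit_ratio)


if __name__ == "__main__":
    # One million deepseek-chat input tokens, all served from cache.
    print(f"{savings('deepseek-chat', 1_000_000, 1.0):.2f}")
    # Fifty billion deepseek-chat input tokens, all served from cache.
    print(f"{savings('deepseek-chat', 50_000_000_000, 1.0):,.2f}")
```

At these particular prices, getting to $10,000 of savings on deepseek-chat would take roughly 50 billion input tokens billed at the cache-hit rate, since each million cached tokens saves $0.20. Other providers price caching differently, so the number in the quoted post may come from very different rates.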
