3 hours ago · Tech

DeepSeek-V4 Review: Why Million-Token Context Needs Efficient Attention, Not Just Larger Windows

Long-context LLMs usually promise a simple capability: put more tokens into the prompt and let the model reason over them. This works up to a point, but it hides a structural bottleneck: a long context window is only useful if the model can actually afford to attend over it during inference, tool use, and long reasoning trajectories.

DeepSeek-V4 shifts the focus from maximum context length to efficient long-horizon computation. Both available models (V4-Pro, with 1.6T total / 49B active parameters, and V4-Flash, with 284B total / 13B active) support 1M-token context windows. The whole architecture is built around making that window usable: hybrid compressed attention (Compressed Sparse, Heavily Compressed, and Sliding Window variants, interleaved across layers), a scaled MoE with Manifold-Constrained Hyper-Connections, the Muon optimizer, reduced KV-cache cost, and a post-training recipe that…
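To see why interleaving cheaper attention variants matters at this scale, a back-of-envelope comparison helps. The sketch below contrasts the per-query attention cost (key/value positions touched) of full attention, sliding-window attention, and compressed attention at a 1M-token context. The window size and compression ratio are hypothetical placeholders for illustration, not DeepSeek-V4's published numbers.

```python
# Illustrative per-query attention cost at a 1M-token context.
# Window size and compression ratio below are assumed values,
# chosen only to show the orders of magnitude involved.

CONTEXT = 1_000_000

def dense_cost(n: int) -> int:
    """Full attention: each new token attends to all n cached tokens."""
    return n

def sliding_window_cost(n: int, window: int = 4_096) -> int:
    """Sliding-window attention: attend only to the last `window` tokens."""
    return min(n, window)

def compressed_cost(n: int, ratio: int = 16) -> int:
    """Compressed attention: attend to roughly n / ratio summary tokens."""
    return -(-n // ratio)  # ceiling division

if __name__ == "__main__":
    for name, cost in [
        ("dense", dense_cost(CONTEXT)),
        ("sliding window (4k)", sliding_window_cost(CONTEXT)),
        ("compressed (16x)", compressed_cost(CONTEXT)),
    ]:
        print(f"{name:>20}: {cost:>9,} key/value positions per query")
```

Even with these made-up constants, the gap is stark: a dense layer touches a million cached positions per query, while the cheaper variants touch thousands to tens of thousands, which is the headroom an interleaved layer schedule spends on a few genuinely global layers.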
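The MoE sizing follows the same efficiency logic: only a small slice of the parameters is active per token. Using the figures quoted above, the active-parameter fractions work out as follows (a trivial calculation, included only to make the ratios concrete):

```python
# Active-parameter ratios for the two V4 variants, from the figures
# quoted in the review (total, active), in billions of parameters.
models = {
    "V4-Pro":   (1600, 49),   # 1.6T total, 49B active
    "V4-Flash": (284, 13),    # 284B total, 13B active
}

for name, (total, active) in models.items():
    print(f"{name}: {active / total:.1%} of parameters active per token")
```

Roughly 3% and 5% of the parameters fire per token, so the compute per step tracks the active count, not the headline total.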
