MODEL
DeepSeek v4
Open-weights frontier model from DeepSeek with a 384K output window.
Model Specifications
- 1 trillion total parameters with ~37B active parameters in mixture-of-experts architecture
- 384K output window — the largest output window of any publicly available model as of April 2026
- Multimodal capability (text and image generation)
- Open-weights release under permissive license
Community Reception & Impact
- April 25, 2026: r/LocalLLaMA threads document the generation of a 100KB self-contained HTML “web OS” from a single prompt. The threads are a practical demonstration that a 384K output window (not just input) opens a different category of agent tasks than the 32K–64K output ceilings most frontier models still ship with. (2026-04-25-AI-Digest)
- The 384K output capability is the headline differentiator, enabling full-application generation and long-horizon code synthesis in a single model turn, tasks that previously had to be stitched together across multiple turns.
- April 26, 2026: On the quiet Sunday following DeepSeek v4’s launch, developments elsewhere continue to reinforce the model’s infrastructure and cost positioning. Xiaomi’s MiMo V2.5 Pro (#54 Artificial Analysis) and Qwen3.6-27B’s single-RTX-5090 throughput milestone (80 tok/sec at 218K context) show that the open-weights frontier keeps moving at an exceptional pace. DeepSeek v4’s 384K output window remains the largest single open-weights capability jump on the output-length dimension, enabling architectural simplifications (full-app generation in a single turn) that were impossible under previous output ceilings. (2026-04-26-AI-Digest)
- April 28, 2026: MIT Technology Review published an analysis framing V4 around “world models,” whereas the model card claims long-horizon reasoning capability, a critical distinction. The actual story is the narrowing gap between cost and capability: V4 still trails Gemini 3.1 Pro by ~3–7 points on broad scientific reasoning benchmarks but dramatically undercuts closed-model pricing. This coverage arc highlights the real DeepSeek win: economic disruption, not an architecture breakthrough. (2026-04-28-AI-Digest)
Technical Significance
DeepSeek v4’s large output window combined with its mixture-of-experts efficiency represents a structural shift in the open-weights competitive landscape. The model demonstrates that open-weights capability can match or exceed frontier closed models on specific dimensions (output length, cost efficiency) even if the overall capability profile remains narrower.
Pricing Tiers
- V4-Pro: standard $1.74/Mtok input, $3.48/Mtok output; cache-hit input drops to ~$0.03625/Mtok (about 98% below the standard input price) through the May 5 promotional window
- V4-Flash: standard $0.14/Mtok input; cache-hit input drops to $0.028/Mtok (80% below the standard input price) through the May 5 promotional window
- Promotional cache-hit pricing is designed to attract RAG, agentic, and repeated-context workloads at cost points substantially below Claude Opus 4.7 and GPT-5.5
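
The arithmetic behind these tiers can be checked with a short sketch. The prices below are the ones listed above; the function names and the idea of a single "cache-hit rate" per workload are illustrative assumptions, not part of any DeepSeek API, and the V4-Flash output price is omitted because it is not listed here.

```python
# USD per million tokens, copied from the Pricing Tiers list.
# V4-Flash output pricing is not given in this entry, so it is left out.
PRICES = {
    "V4-Pro": {"input": 1.74, "cache_hit_input": 0.03625, "output": 3.48},
    "V4-Flash": {"input": 0.14, "cache_hit_input": 0.028},
}


def cache_discount(tier: str) -> float:
    """Fractional reduction of the cache-hit input price vs. the standard input price."""
    p = PRICES[tier]
    return 1.0 - p["cache_hit_input"] / p["input"]


def input_cost(tier: str, input_tokens: int, cache_hit_rate: float) -> float:
    """USD cost of the input side of a workload.

    cache_hit_rate is a hypothetical workload parameter: the fraction of
    input tokens billed at the cache-hit price (0.0 to 1.0).
    """
    p = PRICES[tier]
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    return (hit_tokens * p["cache_hit_input"] + miss_tokens * p["input"]) / 1_000_000


if __name__ == "__main__":
    for tier in PRICES:
        print(f"{tier}: cache-hit discount {cache_discount(tier):.1%}")
    # Hypothetical repeated-context workload: 10M input tokens, 90% cache hits.
    print(f"V4-Pro, no cache:  ${input_cost('V4-Pro', 10_000_000, 0.0):.2f}")
    print(f"V4-Pro, 90% cache: ${input_cost('V4-Pro', 10_000_000, 0.9):.2f}")
```

Under the assumed 90% cache-hit rate, 10M input tokens on V4-Pro cost about $2.07 instead of $17.40, which is why repeated-context workloads such as RAG and agent loops benefit most from the promotion.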
Related Digests
- 2026-04-25-AI-Digest — DeepSeek v4 with 384K output window enabling 100KB app generation
- 2026-04-27-AI-Digest — V4-Pro 75% promotional price cut; V4-Flash cache discount through May 5
- 2026-04-28-AI-Digest — MIT Tech Review’s “world models” framing vs. model card’s long-horizon-reasoning claims; cost-capability narrowing is the real story
- 2026-05-10-AI-Digest — Full V4 paper drops on r/MachineLearning, expanding the April preview with FP4 quantization-aware training applied during late-stage training to MoE expert weights (FP8 elsewhere in the stack), with real FP4 weights used during inference and RL rollout. Honest framing: V4 is a Blackwell validator rather than a threat — FP8+FP4 mix (not end-to-end FP4), built FOR Blackwell’s NVFP4 path, NVIDIA’s own dev blog promotes the integration. First open-weights frontier MoE with FP4 expert weights and a co-released FP4 train+serve stack.
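
To make the FP4 expert-weight idea in the 2026-05-10 entry concrete, here is a minimal sketch of "fake quantization" to a 4-bit floating-point grid, the kind of rounding step that quantization-aware training typically applies to weights in the forward pass. It is not DeepSeek's implementation: the function name, the single shared scale per block, and the simplified tie-breaking are assumptions, and real formats such as NVFP4 store the scale at reduced precision rather than as a full float.

```python
# Non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quantize_block(weights):
    """Round a block of floats to the FP4 grid under one shared scale,
    then scale back, returning floats that carry the quantization error.

    Assumptions (illustrative only): one scale per block, chosen so the
    largest magnitude maps to 6.0; ties go to the smaller grid value.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    scale = max_abs / E2M1_GRID[-1]
    result = []
    for w in weights:
        scaled = abs(w) / scale
        # Nearest representable magnitude on the E2M1 grid.
        mag = min(E2M1_GRID, key=lambda g: abs(scaled - g))
        result.append(mag * scale if w >= 0 else -mag * scale)
    return result


if __name__ == "__main__":
    block = [6.0, -3.0, 1.2, 0.0]
    print(fake_quantize_block(block))
```

With only eight magnitudes per sign available, the choice of scale dominates the error, which is one reason low-bit formats pair the 4-bit values with per-block scale factors.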