MODEL

DeepSeek V4 Pro

modeltopic-note

Overview

DeepSeek V4 Pro is a frontier reasoning model from DeepSeek, a Chinese AI lab. In May 2026, DeepSeek V4 Pro demonstrated strong cost-efficiency on agentic benchmarks, positioning at a fraction of GPT-5.2’s pricing while achieving comparable performance on specific tasks.

Timeline

  • 2026-05-06-AI-Digest — A Resources flair post on r/LocalLLaMA reported that DeepSeek V4 Pro ties GPT-5.2 on FoodTruck Bench — a 30-day agentic benchmark with persistent memory — at roughly 17× lower cost per million tokens ($0.435 / $0.87 input/output for V4 Pro versus $1.75 / $14 for GPT-5.2). The headline framing in the thread is that “the China–US frontier gap has compressed to ten weeks.” Treat that framing carefully: independent evals put DeepSeek’s broader capability lag closer to 6–8 months on reasoning and 12+ months on multimodal and code, so what compressed is one specific benchmark profile (long-horizon agentic with memory), not the overall capability surface. The 17× cost ratio is the load-bearing number for AI-startup unit economics — at GPT-5.2 prices, agentic loops with persistent memory burn through margins fast; at V4 Pro prices the same loop becomes price-insensitive. The capability-parity claim is benchmark-specific and shouldn’t be over-extrapolated; the cost story is structural.

  • 2026-05-11-AI-Digest — r/LocalLLaMA post (“I have DeepSeek V4 Pro at home”, 245 upvotes, 122 comments) documents a successful Q4_K_M run on an EPYC 9374F workstation (12×96 GB RAM, single RTX PRO 6000 Max-Q) using a community CUDA fork of llama.cpp with modified Q4_K_M support. Worked out of the box. The frontier-class MoE model in this weight class is now self-hostable on prosumer hardware budgets — continues narrowing the “you need a cluster for this” envelope.

  • 2026-05-24-AI-DigestDeepSeek formalises the 75% promotional discount as the permanent list rate: $0.435/M input (cache miss), $0.003625/M (cache hit), $0.87/M output. The previous list prices ($1.74/M input, $3.48/M output) are retired; the long-running promo becomes the standard rate. Against GPT-5.5‘s $5/M input and $30/M output that’s roughly 11.5× cheaper on input and 34× cheaper on output, with the cache-hit input rate at sub-cent-per-million economics no US frontier lab is publishing. The price formalisation locks in the China-vs-US frontier-API gap at the ~10–35× range rather than the 3–5× US analysts had assumed would re-converge once promo pricing ended.

  • 2026-05-25-AI-Digest — V4-Pro becomes the load-bearing economics for Reasonix, a community / third-party terminal coding agent (esengine GitHub org, MIT-licensed, npm reasonix, ~5.5k★) engineered explicitly around V4-Pro’s prefix-cache behaviour. Reasonix claims a 99.82% prefix-cache-hit rate and ~93% cost savings against Claude Code equivalents — direct validation that the cache-hit input rate ($0.003625/M) is the price point that changes the architecture of how coding agents structure their context. Lands the day after permanent-pricing went live, on the HN front page (495 pts / 208 cmts). The read is demand-side: practitioners built a working agent around V4-Pro’s prefix cache the day after the pricing formalised, not that DeepSeek is shipping a first-party agent.

Key Developments

  1. FoodTruck Bench Parity with GPT-5.2: Achieves comparable performance on 30-day agentic benchmark with persistent memory at 17× lower cost.

  2. Cost-Efficiency Leadership: Pricing structure ($0.435 / $0.87 per million tokens) enables price-insensitive agentic loops that become margin-critical economics at GPT-5.2 pricing levels.

  3. Capability-vs-Cost Positioning: Benchmark-specific parity does not represent broader capability parity (6–8 month lag on reasoning, 12+ months on multimodal/code); cost advantage is the structural differentiation for agentic workloads.

  4. Prosumer Home Deployment (May 2026): Q4_K_M run on single-RTX PRO 6000 Max-Q workstation working out of the box establishes that frontier-class MoE models at this weight class are now self-hostable on prosumer hardware budgets, validating the continuing local-vs-frontier curve compression.

  5. Permanent 75% Discount as List Pricing (May 24, 2026): The promo rate becomes the permanent list rate — $0.435/M input cache-miss, $0.003625/M cache-hit, $0.87/M output. Roughly 11.5× cheaper input and 34× cheaper output than GPT-5.5. The signal is structural: the China-vs-US frontier-API price gap has been locked in at the ~10–35× range rather than the 3–5× re-convergence US analysts had assumed.