MiniMax M3

Overview

MiniMax M3 is MiniMax’s third-generation flagship model, announced 2026-06-02 with a new MiniMax Sparse Attention architecture targeting long-context efficiency. The model is open-weight, with weights set to drop to Hugging Face and GitHub within 10 days of the announcement. MiniMax claims ~1/20th the compute at 1M tokens, 9× faster input, and 15× faster generation vs dense attention at long context, and says the model was trained on 100T interleaved multimodal tokens. Vendor-published benchmarks (SWE-Bench Pro 59%, BrowseComp 83.5) are unaudited, but credible enough that even partial validation would make M3 the load-bearing open-weights story of its launch week.

Timeline

2026-06-02-AI-Digest: MiniMax announces M3 with MiniMax Sparse Attention, claiming ~1/20th the compute at 1M tokens, 9× faster input, and 15× faster generation vs dense attention at long context. The model was reportedly trained on 100T interleaved multimodal tokens, and weights are set to drop to Hugging Face and GitHub within 10 days. Vendor benchmarks: SWE-Bench Pro 59% (ahead of GPT-5.5 and Gemini 3.1 Pro, just behind Opus 4.7) and BrowseComp 83.5 (beats Opus 4.7’s 79.3). All numbers are vendor-published and unaudited, but if even half of the sparse-attention efficiency holds at scale, this is the actual open-weights story of the week. It lands days after the SimSD speculative-decoding-for-diffusion-LMs paper, which had already made cheaper long-context serving a general topic of discussion.
2026-06-13-AI-Digest: MaxProof (arXiv:2606.13473) lands as a population-level test-time-scaling result built on MiniMax M3. Proof generation, verification, and critique-conditioned repair are trained into one model, with a tournament-selection population search at test time, and the system reportedly scores 35/42 on IMO 2025 and 36/42 on USAMO 2026. Paired with the same-day MiniMax Sparse Attention paper (28.4× compute reduction, 14.2× prefill / 7.6× decode on H800), this amounts to a coordinated mathematical-reasoning and long-context push from one lab in a single drop.
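
The population-level search named in the 2026-06-13 entry can be illustrated with a generic tournament-selection loop. The sketch below is not MaxProof's implementation, whose details are not given here. It is a minimal textbook version under stated assumptions: `score` stands in for a verifier, `repair` stands in for critique-conditioned repair, and the function name, parameters, and elitism step are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tournament_search(
    init_population: List[T],
    score: Callable[[T], float],   # hypothetical stand-in for a verifier
    repair: Callable[[T], T],      # hypothetical stand-in for critique-conditioned repair
    rounds: int = 3,
    tournament_size: int = 2,
    seed: int = 0,
) -> T:
    """Generic tournament-selection search over a population of candidates."""
    rng = random.Random(seed)
    population = list(init_population)
    for _ in range(rounds):
        # Elitism (an assumption of this sketch): carry the current best
        # forward unchanged so the best score never decreases.
        next_population = [max(population, key=score)]
        while len(next_population) < len(population):
            # Draw a small random tournament and keep its highest scorer.
            k = min(tournament_size, len(population))
            contenders = rng.sample(population, k)
            winner = max(contenders, key=score)
            # Try to improve the winner before it joins the next round.
            next_population.append(repair(winner))
        population = next_population
    return max(population, key=score)
```

In a toy setting, candidates could be integers, `score` the negative distance to a target, and `repair` a single step toward it. In a proof-search setting the candidates would be proof attempts, and each call to `score` and `repair` would be a model call, so the number of rounds and the population size set the test-time compute budget.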

Key Developments

MiniMax Sparse Attention, the First Credible Sparse-Attention Numbers at Long Context (June 2, 2026): The architecture’s headline claims (1/20th compute at 1M tokens, 9× faster input, 15× faster generation vs dense at long context) are the first vendor-published sparse-attention efficiency numbers at long context from the open-weight cohort. They pair with the same week’s SimSD speculative-decoding-for-diffusion-LMs result: long-context serving is getting structurally cheaper from two independent architectural directions.
Open-Weights Distribution on a 10-Day Clock: The “weights to HF + GitHub within 10 days” commitment is the credibility test for the sparse-attention claims, because once the weights drop, independent reproduction of the 1M-token throughput numbers becomes feasible. Until then, treat the benchmarks as vendor-published and unaudited.
Benchmark Position: SWE-Bench Pro 59% places M3 ahead of GPT-5.5 and Gemini 3.1 Pro, just behind Claude Opus 4.7; BrowseComp 83.5 beats Opus 4.7’s 79.3. A narrow gap between an open-weights, long-context model and a frontier closed model is the load-bearing capability claim, and it is distinct from the efficiency claim above.
