MODEL
Gemini 3 Pro
model · topic-note · google
Overview
Gemini 3 Pro is Google’s flagship reasoning model as of mid-2026. On the SOOHAK benchmark (CMU/EleutherAI/Seoul National University), it scores approximately 30% on solvable math problems (Avg@3, i.e., accuracy averaged over three samples), leading the frontier tier ahead of GPT-5 (~26%) and Claude Opus 4.5 (~10%). Like every other frontier model tested, it fails to reach 50% at recognizing intentionally unsolvable problems; the best score among all models is ~49%.
Timeline
- 2026-05-18-AI-Digest: Leads the SOOHAK benchmark on solvable, IMO-medalist-validated math problems at ~30% (Avg@3), ahead of GPT-5 (~26%) and Claude Opus 4.5 (~10%). The benchmark’s key findings are that no model, including Gemini 3 Pro, reaches 50% at recognizing unsolvable problems (the best refusal score is ~49%), and that scaling does not meaningfully improve this metric.
Key Developments
- SOOHAK Benchmark Leader (May 2026): Highest solvable-problem score at ~30% Avg@3. The load-bearing diagnostic, however, is the refusal failure it shares with every other frontier model tested: giving confident wrong answers to unsolvable problems, a failure mode that aggregate accuracy benchmarks hide.
Related
See also: Gemini, Google, GPT-5, Claude Opus 4.5, MOC - Major Companies.