MODEL

Gemini 3 Pro

model · topic-note · google

Overview

Gemini 3 Pro is Google’s flagship reasoning model as of mid-2026. On the SOOHAK benchmark (CMU/EleutherAI/Seoul National University), it scores approximately 30% on solvable math problems (Avg@3). That puts it at the top of the frontier tier, ahead of GPT-5 (~26%) and Claude Opus 4.5 (~10%). However, it shares a weakness with every other frontier model tested: none clears 50% at recognizing intentionally unsolvable problems, and the best score from any model is ~49%.

Timeline

  • 2026-05-18-AI-Digest: Leads the SOOHAK benchmark on solvable, IMO-medalist-validated math problems at ~30% (Avg@3), ahead of GPT-5 (~26%) and Claude Opus 4.5 (~10%). The benchmark’s key finding is that no model, Gemini 3 Pro included, clears 50% at recognizing unsolvable problems. The best refusal score is ~49%, and scaling does not meaningfully improve this metric.

Key Developments

  1. SOOHAK Benchmark Leader (May 2026): Highest solvable-problem score at ~30% Avg@3. The load-bearing diagnostic is the refusal failure it shares with all frontier models. Confident wrong answers to unsolvable problems are the failure mode that aggregate accuracy benchmarks hide.

See also: Gemini, Google, GPT-5, Claude Opus 4.5, MOC - Major Companies.