Gemini 3 Pro

Overview

Gemini 3 Pro is Google’s flagship reasoning model as of mid-2026. On the SOOHAK benchmark (CMU/EleutherAI/Seoul National University), it scores approximately 30% on solvable math problems (Avg@3), the highest among the frontier models tested and ahead of GPT-5 (~26%) and Claude Opus 4.5 (~10%). Like every other frontier model tested, it fails to clear 50% on recognizing intentionally unsolvable problems; the best score on that metric is ~49%.

Timeline

2026-05-18-AI-Digest: Leads the SOOHAK benchmark on solvable, IMO-medalist-validated math problems at ~30% (Avg@3), ahead of GPT-5 (~26%) and Claude Opus 4.5 (~10%). The benchmark’s key finding is that no model, Gemini 3 Pro included, clears 50% on recognizing unsolvable problems (best refusal score ~49%), and that scaling does not meaningfully improve this metric.
2026-06-14-AI-Digest: Gemini 3 Pro (3.1 Pro variant) is the base model under Google Research’s new Gemini SQL2, a text-to-SQL specialisation built via prompting/scaffolding on top of Gemini 3 Pro with no fine-tuning. It reaches 80.04% execution accuracy on the BIRD single-model leaderboard, versus ~72.8% for GPT-5.5-xhigh and ~70.9% for Claude Opus 4.6. The structural read: the win comes from prompting on top of an existing flagship, not from a new pre-trained checkpoint. Gemini 3 Pro is the substrate on which the roughly 7-point BIRD lead is built.
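
As a quick arithmetic check on the BIRD margin quoted in the 2026-06-14 entry, the sketch below recomputes the lead from the figures given there. The names `bird_scores` and `lead_over` are illustrative only, and the two competitor scores are approximate in the source, so the resulting margins are approximate as well.

```python
# Execution accuracy (%) on the BIRD single-model leaderboard,
# copied from the 2026-06-14 timeline entry.
bird_scores = {
    "Gemini SQL2": 80.04,
    "GPT-5.5-xhigh": 72.8,    # approximate in the source
    "Claude Opus 4.6": 70.9,  # approximate in the source
}


def lead_over(scores, leader):
    """Return the leader's margin over each other entry, in percentage points."""
    top = scores[leader]
    return {
        name: round(top - value, 2)
        for name, value in scores.items()
        if name != leader
    }


margins = lead_over(bird_scores, "Gemini SQL2")
for name, margin in margins.items():
    print(f"Gemini SQL2 leads {name} by about {margin} points")
```

On these figures the gap to GPT-5.5-xhigh works out to about 7.2 points, which is the "+7-point" lead the entry refers to; the gap to Claude Opus 4.6 is about 9.1 points.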

Key Developments

SOOHAK Benchmark Leader (May 2026): Highest solvable-problem score at ~30% Avg@3. The load-bearing diagnostic is the refusal failure it shares with all frontier models: confidently wrong answers on unsolvable problems are the failure mode that aggregate accuracy benchmarks hide.
