MODEL

GPT-5

model, topic-note, openai

Overview

GPT-5 is OpenAI’s frontier model, cited in benchmark results from May 2026. On the SOOHAK benchmark (CMU/EleutherAI/Seoul National University), GPT-5 scores approximately 26% on solvable math problems (Avg@3), placing it second behind Gemini 3 Pro (~30%) and above Claude Opus 4.5 (~10%). No model, GPT-5 included, clears 50% on recognizing unsolvable problems; the best score is ~49%.
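A minimal sketch of how the two SOOHAK numbers could be computed. This note does not define Avg@3, so the code assumes it means per-problem accuracy averaged over three independent samples, and that the refusal score is the fraction of unsolvable problems the model flags as unsolvable. The function names and the toy data are hypothetical and are not taken from the benchmark.

```python
from statistics import mean


def avg_at_k(results: dict[str, list[bool]], k: int = 3) -> float:
    """Assumed Avg@k: mean over problems of the fraction of the first k samples that are correct."""
    per_problem = []
    for problem_id, attempts in results.items():
        if len(attempts) < k:
            raise ValueError(f"{problem_id} has fewer than {k} samples")
        per_problem.append(sum(attempts[:k]) / k)
    return mean(per_problem)


def refusal_rate(flagged_unsolvable: list[bool]) -> float:
    """Assumed refusal score: share of unsolvable problems the model correctly declines to solve."""
    return sum(flagged_unsolvable) / len(flagged_unsolvable)


# Hypothetical toy data, for illustration only.
solvable = {
    "p1": [True, True, False],
    "p2": [False, False, False],
    "p3": [True, False, False],
    "p4": [False, False, False],
}
unsolvable_flagged = [True, False, False, True]

print(f"Avg@3 on solvable problems: {avg_at_k(solvable):.2f}")
print(f"Refusal rate on unsolvable problems: {refusal_rate(unsolvable_flagged):.2f}")
```

Keeping the two scores separate is the point: a model can rank well on the first number while still answering confidently on problems that have no solution.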

Timeline

  • 2026-05-18-AI-Digest: Cited in SOOHAK benchmark results. GPT-5 scores ~26% (Avg@3) on solvable IMO-medalist-validated math problems, placing second behind Gemini 3 Pro (~30%) and above Claude Opus 4.5 (~10%). On the refusal axis (recognizing intentionally unsolvable problems), no model, GPT-5 included, clears 50%; the paper’s key finding is that scaling does not move the refusal number meaningfully.
  • 2026-05-23-AI-Digest: OpenAI models continue to hold four of the five slots on the Aider polyglot top-5, three of them GPT-5 variants (gpt-5 high 88.0%, gpt-5 medium 86.7%, o3-pro third at 84.9%, gemini-2.5-pro-preview-06-05 32k think at 83.1%, gpt-5 low 81.3%). The board is identical to the previous day’s snapshot; the notable data point is that GPT-5’s lead persists while no Gemini 3.x entry has appeared.

Key Developments

  1. SOOHAK Benchmark (May 2026): Second on solvable problems at ~26% Avg@3. More importantly, GPT-5 fails to clear 50% on the refusal axis, as does every other frontier model. The benchmark’s load-bearing diagnostic is that confident wrong answers on unsolvable problems are the failure mode that aggregate accuracy benchmarks systematically hide.

See also: OpenAI, Gemini 3 Pro, Claude Opus 4.5, MOC - Open Source Models.