MODEL
Claude Opus 4.5
model topic-note anthropic
Overview
Claude Opus 4.5 is an Anthropic model that appears in the May 2026 results of the SOOHAK benchmark (CMU/EleutherAI/Seoul National University). It scores approximately 10% on solvable math problems (Avg@3), placing third behind Gemini 3 Pro (~30%) and GPT-5 (~26%). Like all other frontier models tested, it does not clear 50% on recognizing intentionally unsolvable problems.
Timeline
- 2026-05-18-AI-Digest — Cited in SOOHAK benchmark results: Claude Opus 4.5 scores ~10% (Avg@3) on solvable IMO-medalist-validated math problems, third in the frontier tier behind Gemini 3 Pro (~30%) and GPT-5 (~26%). On the refusal axis (recognizing intentionally unsolvable problems), no model clears 50% — the paper finds scaling does not meaningfully move refusal quality.
Key Developments
- SOOHAK Benchmark (May 2026): Third on solvable problems at ~10% Avg@3. The benchmark’s primary finding is the refusal failure shared by all frontier models, not the solvable-problem leaderboard alone.
Related
See also: Anthropic, Claude, GPT-5, Gemini 3 Pro, MOC - Major Companies.