Claude Opus 4.5

Overview

Claude Opus 4.5 is an Anthropic model evaluated in the SOOHAK benchmark (CMU/EleutherAI/Seoul National University), reported in May 2026. It scores approximately 10% on solvable math problems (Avg@3), placing third behind Gemini 3 Pro (~30%) and GPT-5 (~26%). Like every other frontier model tested, it does not clear 50% on recognizing intentionally unsolvable problems.

Timeline

2026-05-18-AI-Digest — Cited in SOOHAK benchmark results: Claude Opus 4.5 scores ~10% (Avg@3) on solvable IMO-medalist-validated math problems, third in the frontier tier behind Gemini 3 Pro (~30%) and GPT-5 (~26%). On the refusal axis (recognizing intentionally unsolvable problems), no model clears 50% — the paper finds scaling does not meaningfully move refusal quality.

Key Developments

SOOHAK Benchmark (May 2026): Third on solvable problems at ~10% Avg@3. The benchmark’s primary finding is the refusal failure shared by all frontier models, not the solvable-problem leaderboard alone.
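
The Avg@3 figure above is not defined in this note. A minimal sketch of one common reading, assuming it means the per-problem success rate over three independent attempts, averaged across problems, is given below. The function name avg_at_k, the problem IDs, and the toy results are all hypothetical and are not taken from the SOOHAK paper.

```python
from statistics import mean


def avg_at_k(results: dict[str, list[bool]], k: int = 3) -> float:
    """Average per-problem success rate over the first k attempts.

    Assumption: Avg@k is the mean, across problems, of the fraction of
    k independent attempts that were graded correct.
    """
    per_problem = []
    for problem_id, attempts in results.items():
        if len(attempts) < k:
            raise ValueError(f"{problem_id} has fewer than {k} attempts")
        # Fraction of the first k attempts that were correct.
        per_problem.append(sum(attempts[:k]) / k)
    return mean(per_problem)


# Hypothetical grading results: three attempts per problem.
toy_results = {
    "p1": [True, False, False],
    "p2": [False, False, False],
    "p3": [True, True, False],
}

score = avg_at_k(toy_results)
print(f"Avg@3 = {score:.1%}")
```

Under this reading, a model that solves a problem on only one of three attempts earns a third of the credit for it, so Avg@3 rewards consistency rather than a single lucky sample.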

Related