MODEL
Gemini 2.5 Pro
modeltopic-notegoogle
Overview
Gemini 2.5 Pro is Google’s frontier-class reasoning model, the successor to Gemini 2.0 Pro. It appears on the Terminal-Bench 2.0 agentic-coding leaderboard as a reference point for scaffold-sensitivity analysis: the same model scores materially differently depending on whether it is paired with the Gemini CLI or the Terminus 2 scaffold, illustrating that benchmark results for agentic-coding tasks are heavily harness-dependent.
Timeline
- 2026-05-17-AI-Digest — Benchmarked on Terminal-Bench 2.0 with two scaffolds: 19.6% via Gemini CLI and 32.6% via Terminus 2 — a 13-point swing from scaffold choice alone. Cited as the reference point for the digest’s “scaffold matters as much as model” takeaway, framing the Qwen3.6-35B-A3B 24.6% result in context.
Key Developments
- Scaffold Sensitivity on Terminal-Bench 2.0: 13-point gap (19.6% vs 32.6%) between Gemini CLI and Terminus 2 scaffolds on the same model is the clearest single-day illustration of how benchmark ranking on agentic-coding tasks depends on harness architecture, not just model capability.
Related
See also: Gemini, MOC - Agentic Coding, MOC - Open Source Models