TOOL

Aider

tool, topic-note, benchmark

Overview

Aider is an open-source AI pair-programming tool whose polyglot leaderboard has become a canonical practitioner reference for ranking frontier models on real coding tasks. In the AI Digest corpus, Aider’s polyglot top-5 is fetched as a reference signal during composition and cross-checked against the day’s model claims. Across the snapshots logged here (2026-05-29 through 2026-06-07) the board has been static: OpenAI holds four of the five slots (gpt-5 at high, medium, and low, plus o3-pro), with gemini-2.5-pro-preview-06-05 holding the only non-OpenAI slot.

Timeline

  • 2026-05-29-AI-Digest — Aider polyglot top-5 (fetched 2026-05-29) cited as the day’s code-benchmark reference signal: 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. The board is unchanged from prior days and serves as the practitioner cross-check against Anthropic’s same-day Claude Opus 4.8 Terminal-Bench 2.1 claims.
  • 2026-06-03-AI-Digest — Aider polyglot top-5 (fetched 2026-06-03) cited as the day’s code-benchmark reference signal, with the explicit note that the page was last refreshed 2025-11-20, making it a stable snapshot rather than a same-day signal: 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Same top-5, same percentages. The staleness disclaimer has now applied for over six months, and the board has functionally frozen into a reference rather than a leading indicator.
  • 2026-06-06-AI-Digest — Aider polyglot top-5 (fetched 2026-06-06) referenced as the day’s code-benchmark reference signal: 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Unchanged from the prior snapshot. The board is used as the calibration anchor in today’s Key Takeaways — the on-device substrate keeps shifting down a tier (Gemma 4 QAT E2B at ~1 GB) while the practitioner top-5 stays wall-to-wall closed reasoning, framing the day’s “1 GB multimodal on a phone, not open-weights caught up” read.
  • 2026-06-07-AI-Digest — Aider polyglot top-5 (fetched 2026-06-07): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Identical to the 2026-06-06-AI-Digest snapshot — the publicly published board has now been frozen at the 2025-11-20 refresh for over six months and is functioning as a reference floor, not a leading indicator. Honest caveat: several Jun 2026 open-weight releases (DeepSeek V4-Pro, Qwen3-Coder-Next, MiniMax M3) have not been benchmarked here yet, and at least one community-maintained Aider mirror puts DeepSeek-V3.2-Exp into the top-5 at ~74%. Read the public top-5 as a slow-moving anchor for closed-reasoning quality, not as today’s open-vs-closed scorecard.
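
The "unchanged" and "over six months" reads in the timeline above come down to a list comparison and a date subtraction. A minimal sketch of that check, assuming the snapshots are stored by fetch date; the names `SNAPSHOTS`, `is_unchanged`, and `staleness_days` are hypothetical and not part of Aider or the digest tooling:

```python
from datetime import date

# Top-5 exactly as recorded in the timeline entries (label, percent).
TOP5 = [
    ("gpt-5 (high)", 88.0),
    ("gpt-5 (medium)", 86.7),
    ("o3-pro (high)", 84.9),
    ("gemini-2.5-pro-preview-06-05 (32k think)", 83.1),
    ("gpt-5 (low)", 81.3),
]

# Hypothetical snapshot store keyed by fetch date (assumption: one list per fetch).
SNAPSHOTS = {
    date(2026, 5, 29): list(TOP5),
    date(2026, 6, 3): list(TOP5),
    date(2026, 6, 6): list(TOP5),
    date(2026, 6, 7): list(TOP5),
}

# Last refresh date of the leaderboard page, as noted in the 2026-06-03 entry.
LAST_REFRESH = date(2025, 11, 20)


def is_unchanged(snapshots):
    """True when every snapshot matches the earliest one (same order, same scores)."""
    ordered = [snapshots[d] for d in sorted(snapshots)]
    return all(later == ordered[0] for later in ordered[1:])


def staleness_days(fetched, last_refresh=LAST_REFRESH):
    """Days elapsed between the page's last refresh and a given fetch date."""
    return (fetched - last_refresh).days


if __name__ == "__main__":
    print(is_unchanged(SNAPSHOTS))
    print(staleness_days(date(2026, 6, 7)))
```

A staleness figure above roughly 183 days is what backs the "over six months" wording; the threshold itself is an illustrative choice, not something the digest defines.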

Key Developments

  1. Polyglot Leaderboard as Corpus Reference Signal: Aider’s polyglot top-5 is the canonical practitioner code benchmark the digest fetches each day to cross-check frontier-model claims; across the logged snapshots (2026-05-29 through 2026-06-07) OpenAI has held four of the five slots (gpt-5 at high, medium, and low, plus o3-pro), with gemini-2.5-pro-preview-06-05 the lone non-OpenAI presence.
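
The slot count behind that summary can be tallied directly from the top-5 labels. A small sketch, assuming a prefix-based vendor lookup; `VENDOR_BY_PREFIX` and `vendor_of` are illustrative helpers, not anything Aider publishes:

```python
from collections import Counter

# Top-5 labels as listed in the timeline snapshots.
TOP5_LABELS = [
    "gpt-5 (high)",
    "gpt-5 (medium)",
    "o3-pro (high)",
    "gemini-2.5-pro-preview-06-05 (32k think)",
    "gpt-5 (low)",
]

# Assumed prefix-to-vendor mapping, for illustration only.
VENDOR_BY_PREFIX = {"gpt-": "OpenAI", "o3": "OpenAI", "gemini": "Google"}


def vendor_of(label):
    """Return the vendor whose prefix matches the label, or 'unknown'."""
    for prefix, vendor in VENDOR_BY_PREFIX.items():
        if label.startswith(prefix):
            return vendor
    return "unknown"


vendor_counts = Counter(vendor_of(label) for label in TOP5_LABELS)
gpt5_slots = sum(1 for label in TOP5_LABELS if label.startswith("gpt-5"))

if __name__ == "__main__":
    print(vendor_counts)  # slots per vendor
    print(gpt5_slots)     # slots held by gpt-5 configurations
```

Counting this way separates the two claims that are easy to conflate: how many slots one vendor holds versus how many a single model family holds.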

See also: GPT-5, Gemini 2.5 Pro, MOC - Agentic Coding.