Aider

Overview

Aider is an open-source AI pair-programming tool whose polyglot leaderboard has become a canonical practitioner reference for ranking frontier models on real coding tasks. In the AI Digest corpus, Aider’s polyglot top-5 is fetched as a reference signal during composition and cross-checked against the day’s model claims. From late May through mid-July 2026 the board has been static: OpenAI models hold four of the five slots (three gpt-5 settings plus o3-pro), and gemini-2.5-pro-preview-06-05 holds the only non-OpenAI slot.

Timeline

2026-05-29-AI-Digest — Aider polyglot top-5 (fetched 2026-05-29) referenced as the day’s code-benchmark reference signal: 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. The board is unchanged from prior days and serves as the practitioner cross-check against Anthropic’s same-day Claude Opus 4.8 Terminal-Bench 2.1 claims.
2026-06-03-AI-Digest — Aider polyglot top-5 (fetched 2026-06-03) referenced as the day’s code-benchmark reference signal, with the explicit note that the page was last refreshed 2025-11-20, making it a stable snapshot rather than a today-signal: 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Same top-5, same percentages. The page has now gone unrefreshed for over six months, so the board functions as a frozen reference rather than a leading indicator.
2026-06-06-AI-Digest — Aider polyglot top-5 (fetched 2026-06-06) referenced as the day’s code-benchmark reference signal: 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Unchanged from the prior snapshot. The board is used as the calibration anchor in today’s Key Takeaways — the on-device substrate keeps shifting down a tier (Gemma 4 QAT E2B at ~1 GB) while the practitioner top-5 stays wall-to-wall closed reasoning, framing the day’s “1 GB multimodal on a phone, not open-weights caught up” read.
2026-06-07-AI-Digest — Aider polyglot top-5 (fetched 2026-06-07): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Identical to the 2026-06-06-AI-Digest snapshot — the publicly published board has now been frozen at the 2025-11-20 refresh for over six months and is functioning as a reference floor, not a leading indicator. Honest caveat: several Jun 2026 open-weight releases (DeepSeek V4-Pro, Qwen3-Coder-Next, MiniMax M3) have not been benchmarked here yet, and at least one community-maintained Aider mirror puts DeepSeek-V3.2-Exp into the top-5 at ~74%. Read the public top-5 as a slow-moving anchor for closed-reasoning quality, not as today’s open-vs-closed scorecard.
2026-06-09-AI-Digest — Aider polyglot top-5 (fetched 2026-06-09): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Unchanged from 2026-06-08-AI-Digest for the second day running — same five lines, same percentages. The board functions as the calibration anchor for today’s Xiaomi MiMo-v2.5-Pro-UltraSpeed coverage (absent from the top-5) and the capability-ceiling vs cost-disruption vs inference-speed three-races frame.
2026-06-15-AI-Digest — Aider polyglot top-5 (fetched 2026-06-15): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Identical to yesterday’s and Saturday’s row order and percentages — 72 hours frozen. The SWE-Bench Verified top-three from earlier this week (Claude Mythos 5 95.5%, Claude Fable 5 95%, Claude Opus 4.8 88.6%) is unchanged in print, but Mythos 5 and Fable 5 remain globally disabled — the published frontier of SWE-Bench has now been inaccessible to API callers for ~72 hours. The “Aider vs SWE-Bench divergence” thread the corpus has been running since late May stays on pause until either Anthropic reactivates the disabled tier or a fresh tag overtakes. Also today: the SWE-Explore paper proposes Aider beam search (alongside CodeRabbit, Cursor planner) as a token-budget target for the next generation of agent harnesses chasing line-level recall.
2026-06-17-AI-Digest — Aider polyglot top-5 (fetched 2026-06-17): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Identical ordering and percentages to last week: the board is still OpenAI-dominated at four of five (three gpt-5 settings plus o3-pro), gemini-2.5-pro-preview-06-05 holds fourth, and there is no Claude Opus 4.8 entry. Today’s digest’s [!note] frames it explicitly as “no new frontier coding model has cleared the bar this week” rather than a fresh ranking event; the frozen leaderboard is itself the signal in the post-Fable-5 window.
2026-06-18-AI-Digest — Aider polyglot top-5 (fetched 2026-06-18): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Same ordering and percentages as last week, and the same as 2026-06-17-AI-Digest. Today’s “Eight days frozen” callout: the agentic-coding bar has not moved during the entire Fable 5 / Mythos 5 shutdown window. Cross-reference 2026-06-15-AI-Digest’s SWE-Explore numbers for the structural complement: the line-level-recall axis is where the next harness-side gain lands, not the public leaderboard.
2026-06-28-AI-Digest — Aider polyglot top-5 (fetched 2026-06-28): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Day eighteen of the polyglot freeze — same five rows, same percentages as 2026-06-27-AI-Digest and every print back to 2026-06-12-AI-Digest. Today’s digest reframes the freeze: eighteen consecutive days at the same top-5 is the longest unbroken freeze the corpus has recorded, and with GPT-5.6 Sol still under the customer-by-customer access regime and Mythos 5 only just restored to ~100 trusted partners, Aider cannot realistically sample either tier yet — the freeze is now an artifact of gated-access timing, not a benchmark plateau.
2026-06-29-AI-Digest — Aider polyglot top-5 (fetched 2026-06-29): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Day nineteen of the polyglot freeze — same five rows, same percentages as 2026-06-28-AI-Digest and every print back to 2026-06-12-AI-Digest, extending the longest unbroken freeze the corpus has recorded. The corpus framing the digest holds: with GPT-5.6 Sol still under the customer-by-customer access regime and Mythos now restored only to ~100 trusted partners, Aider still cannot realistically sample either of the two highest-altitude tiers — the freeze remains an artifact of gated-access timing, not a benchmark plateau. Secondary axis worth watching: whether GLM 5.2 — given today’s Semgrep cyber-bench result — surfaces on the polyglot leaderboard outside the top-5 cut in the next print.
2026-06-30-AI-Digest — Aider polyglot top-5 (fetched 2026-06-30): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Day twenty of the polyglot freeze — same five rows, same percentages as 2026-06-29-AI-Digest and every print back to 2026-06-12-AI-Digest, extending the longest unbroken freeze the corpus has recorded into its fourth week. Framing from prior digests still holds: GPT-5.6 Sol remains gated under the customer-by-customer access regime and Mythos is restored only to a small trusted-partner allowlist, so Aider cannot realistically sample either of the two highest-altitude tiers; the freeze is a gated-access-timing artifact, not a benchmark plateau. Worth noting in passing: Qwen 3.6’s 27B variant just hit the Hacker News front page on a “sweet spot for local development” pitch. The polyglot leaderboard would be the natural place for that claim to be validated, but no open-weights entry has surfaced in the top-5 cut.
2026-07-14-AI-Digest — Aider polyglot top-5 (fetched 2026-07-14): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Board unchanged again — no Claude Sonnet 5, no Claude Fable 5, no GPT-5.6 Sol entry across three settings. The corpus carries the softened read from 2026-07-12-AI-Digest: leaderboard methodology plus refresh lag rather than a capability-race verdict.
2026-07-15-AI-Digest — Aider polyglot top-5 (fetched 2026-07-15): 1. gpt-5 (high) — 88.0% · 2. gpt-5 (medium) — 86.7% · 3. o3-pro (high) — 84.9% · 4. gemini-2.5-pro-preview-06-05 (32k think) — 83.1% · 5. gpt-5 (low) — 81.3%. Polyglot stasis day thirty-three — no Claude Sonnet 5, no Claude Fable 5, no GPT-5.6 Sol entry. The interesting cross-check today is OpenAI’s July 14 blog post claiming GPT-5.6 is 54% more token-efficient than the next-highest-scoring model on the Artificial Analysis Coding Agent Index: a different leaderboard, a different metric, and a vendor claim, so it does not resolve why polyglot is frozen. The corpus carries the methodology-plus-refresh-lag framing from 2026-07-12-AI-Digest forward another day.

Key Developments

Polyglot Leaderboard as Corpus Reference Signal: Aider’s polyglot top-5 is the canonical practitioner code benchmark the digest fetches each day to cross-check frontier-model claims. From late May through mid-July 2026 OpenAI models have held four of the five slots (three gpt-5 settings plus o3-pro), with gemini-2.5-pro-preview-06-05 the lone non-OpenAI presence.
Eighteen-Day Polyglot Freeze as a Gated-Access Artifact (June 28, 2026): Eighteen consecutive days at the same top-5 ordering and percentages is the longest unbroken freeze the corpus has recorded — but the structural read is not a capability plateau. With GPT-5.6 Sol still under the customer-by-customer access regime and Claude Mythos 5 only just restored to ~100 trusted partners, Aider cannot realistically sample either tier — the freeze is an artifact of gated-access timing, and the corpus should expect it to outlast the next two release cycles until preview-access models become evaluable.
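
The freeze counts quoted in the timeline (day eighteen, day nineteen, and so on) amount to counting consecutive prints whose top-5 is identical to the latest one, and the staleness note is a date difference against the 2025-11-20 refresh. A minimal Python sketch of that bookkeeping follows, assuming each snapshot is stored as a (fetch date, top-5 rows) pair. The names freeze_streak and staleness_days and the snapshot structure are hypothetical illustrations, not part of Aider or of the digest tooling.

```python
from datetime import date

# Top-5 rows exactly as printed in the timeline entries above.
TOP5 = (
    ("gpt-5 (high)", 88.0),
    ("gpt-5 (medium)", 86.7),
    ("o3-pro (high)", 84.9),
    ("gemini-2.5-pro-preview-06-05 (32k think)", 83.1),
    ("gpt-5 (low)", 81.3),
)


def freeze_streak(snapshots):
    """Count the trailing run of prints whose top-5 equals the latest print.

    `snapshots` is a list of (fetch_date, top5) pairs; this structure is an
    assumption made for illustration. Input order does not matter.
    """
    if not snapshots:
        return 0
    ordered = sorted(snapshots, key=lambda pair: pair[0])
    latest_rows = ordered[-1][1]
    streak = 0
    # Walk backwards from the newest print until the rows differ.
    for _, rows in reversed(ordered):
        if rows != latest_rows:
            break
        streak += 1
    return streak


def staleness_days(last_refresh, fetched):
    """Days elapsed between the board's last refresh and the fetch date."""
    return (fetched - last_refresh).days


# Three of the prints recorded in the timeline, deliberately out of order.
snapshots = [
    (date(2026, 6, 9), TOP5),
    (date(2026, 6, 6), TOP5),
    (date(2026, 6, 7), TOP5),
]

print(freeze_streak(snapshots))                              # 3
print(staleness_days(date(2025, 11, 20), date(2026, 6, 3)))  # 195
```

Note that this sketch counts prints, not calendar days: a day on which no digest was fetched does not extend the streak, so the result matches a "day N" label only when the corpus holds one print per day.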
