NVIDIA’s 2026 AI equity commitments cross $40B as the circular-financing critique becomes mainstream-analyst consensus; Box Elder County approves the 9 GW Stratos AI data center over a withdrawn water-rights filing and a planned referendum; and Fields Medalist Tim Gowers reports ChatGPT 5.5 Pro solving previously-open math research problems unaided in under an hour.

AI Digest — May 10, 2026

Your daily deep-dive on AI models, tools, research, and developer ecosystem news.

🔖 Project Releases

Claude Code

No new release in the past 24 hours. The latest is still v2.1.138 from 2026-05-09, covered in 2026-05-09-AI-Digest alongside v2.1.137 and the substantive v2.1.136 cut. The cadence reset that began 2026-05-07 has held — five releases across May 6–9, then a quiet Sunday.

Beads

v1.0.4 shipped 2026-05-09T15:11Z — the first net-new Beads release in over two weeks. The bd close command picks up a --reason-file flag mirroring the existing --body-file pattern, and a central server config library landed (#3258). The headline change is a heavy expansion of the Linear integration: OAuth client-credentials, an ambient staleness signal for auto-fresh data, idempotency markers preventing duplicate issue creation, a per-workspace concurrency lock on sync, decision/spike/story/milestone type mappings, and adoption of issueBatchCreate / issueBatchUpdate for “50x efficiency” (the release notes’ own phrasing) (#3654). For corpus tracking: yesterday’s note that v1.0.3 was fifteen days old is now stale; v1.0.4 dropped roughly fifteen hours after the 2026-05-09-AI-Digest published.

OpenSpec

No new release this week. Latest remains v1.3.1 from 2026-04-21 (19 days old as of today), originally covered in 2026-04-22-AI-Digest and re-flagged in 2026-05-09-AI-Digest.

🧵 From the Community (r/LocalLLaMA & r/MachineLearning)

A user lands 80 tok/sec on Qwen 3.6 35B A3B with the new `llama.cpp` MTP PR — single 12 GB GPU, 128K context

The OP self-text on r/LocalLLaMA reports “over 80 tok/sec with 80%+ draft acceptance rate” running Qwen 3.6 35B A3B at 128K context (`-c 131072`) on an RTX 4070 Super 12 GB, using the new multi-token-prediction (MTP) PR against `llama.cpp` and the `Qwen3.6-35B-A3B-MTP-UD-Q4_K_XL.gguf` quant. Source: r/LocalLLaMA (500 upvotes, 103 comments). Three of today’s top r/LocalLLaMA threads (this one, dual Mi50 MTP, and the Q4_1 quants thread) tell the same MTP-on-modest-VRAM story, suggesting the PR is moving from experimental to default for the on-device crowd.
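To see why a high draft acceptance rate matters for throughput, here is a minimal sketch of the usual speculative-decoding arithmetic. It assumes every drafted token is accepted independently with the same probability and that each verification pass always yields one token from the main model. The draft lengths are hypothetical, since the post does not say how many tokens the MTP head drafts per step, and the function name is illustrative only.

```python
def expected_tokens_per_pass(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumption: each drafted token is accepted independently with the
    same probability, and drafting stops at the first rejection. The
    pass always contributes one token from the main model itself.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    # Geometric series: a^0 + a^1 + ... + a^draft_len.
    return sum(acceptance_rate ** i for i in range(draft_len + 1))


if __name__ == "__main__":
    # 0.8 is the acceptance rate quoted in the post; the draft lengths
    # below are hypothetical values chosen for illustration.
    for draft_len in (1, 2, 3, 4):
        tokens = expected_tokens_per_pass(0.8, draft_len)
        print(f"draft_len={draft_len}: {tokens:.3f} tokens per pass")
```

Under these assumptions, an 0.8 acceptance rate with three drafted tokens yields just under three tokens per main-model pass. Treat that as an upper bound on the speedup, because it ignores the cost of producing the drafts.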

DeepSeek V4’s full paper lands with FP4 QAT for MoE expert weights — but it’s a Blackwell validator, not a Blackwell threat

The full V4 paper expands the April preview with FP4 quantization-aware training applied during late-stage training to MoE expert weights (FP8 elsewhere in the stack), with real FP4 weights used during inference and RL rollout. Source: r/MachineLearning.
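For readers new to quantization-aware training, the sketch below shows the generic "fake quantization" step that QAT places in the forward pass: weights are rounded to the nearest 4-bit value and immediately dequantized, so training sees the rounding error while the master weights stay in higher precision. This is not DeepSeek’s recipe, which the digest does not describe. The grid is the standard E2M1 FP4 value set; the block size of 16 and the unquantized per-block scale are simplifying assumptions, and the function names are illustrative.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fake_quantize_block(block: list[float]) -> list[float]:
    """Round one block of weights to FP4 values and return them dequantized."""
    max_abs = max((abs(w) for w in block), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in block]
    # Map the largest magnitude in the block onto the top grid value (6.0).
    # Assumption: the scale is kept as a plain float, not itself quantized.
    scale = max_abs / FP4_E2M1_GRID[-1]
    out = []
    for w in block:
        target = abs(w) / scale
        nearest = min(FP4_E2M1_GRID, key=lambda g: abs(g - target))
        out.append((nearest if w >= 0 else -nearest) * scale)
    return out


def fake_quantize(weights: list[float], block_size: int = 16) -> list[float]:
    """Apply block-wise fake quantization; block_size=16 is an assumption."""
    out: list[float] = []
    for start in range(0, len(weights), block_size):
        out.extend(fake_quantize_block(weights[start:start + block_size]))
    return out


if __name__ == "__main__":
    weights = [0.91, -0.13, 0.40, -0.77, 0.05, 0.62]
    for original, quantized in zip(weights, fake_quantize(weights)):
        print(f"{original:+.2f} -> {quantized:+.4f}")
```

In a real QAT loop the rounded weights feed the forward pass while gradients update the full-precision copy, typically via a straight-through estimator; that part is omitted here.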

Reddit framing vs. what the paper actually delivers

The thread’s framing (“DeepSeek operationalising FP4 end-to-end resets the cost curve and pressures NVIDIA’s Blackwell FP4 narrative”) overshoots. The model is an FP8+FP4 mix, not end-to-end FP4, and it is built FOR Blackwell’s NVFP4 path; NVIDIA’s own developer blog promotes the integration. The cleaner read is that V4 is the first open-weights frontier MoE with FP4 expert weights and a co-released FP4 train+serve stack, which makes it a validation of Blackwell’s NVFP4 bet. The cost-curve pressure lands on FP8-era incumbents, not on NVIDIA.

NVIDIA ships Star Elastic — a single nested checkpoint containing 30B / 23B / 12B reasoning models, sliceable in place

Star Elastic packages three reasoning-model sizes into one matryoshka-style checkpoint, with zero-shot slicing reportedly preserving quality at each cut. Source: r/LocalLLaMA (115 upvotes, 30 comments). Coverage cites a 360× token-cost reduction vs. training the variants from scratch and 2.4× throughput at the 12B slice on the NVFP4 QAD path. Worth pairing with NVIDIA’s Nemotron-Elastic-12B from November 2025: Star Elastic extends the same elastic-checkpoint research line to 30B/23B/12B reasoning packaging rather than making a clean break from prior work. The deployment-matrix collapse (one artifact, many size budgets) is real if the slicing-preserves-quality claim holds at scale; the corpus framing should be “strong execution on an established matryoshka-style technique,” not “default packaging for local model families.”
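As a toy illustration of what “nested” means in a matryoshka-style checkpoint, the sketch below treats a smaller model’s weight matrix as the top-left corner of the larger one, so one stored artifact serves several size budgets. This is an assumption made for illustration: the digest does not describe how Star Elastic decides which parameters belong to each size, and real elastic checkpoints may select units by learned importance rather than by position. All names here are hypothetical.

```python
def slice_linear(weight: list[list[float]], out_dim: int, in_dim: int) -> list[list[float]]:
    """Return the top-left out_dim x in_dim corner of a weight matrix."""
    if not weight or out_dim > len(weight) or in_dim > len(weight[0]):
        raise ValueError("slice is larger than the parent matrix")
    return [row[:in_dim] for row in weight[:out_dim]]


def matvec(weight: list[list[float]], x: list[float]) -> list[float]:
    """Plain matrix-vector product, used to run a sliced layer."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


if __name__ == "__main__":
    # A hypothetical 4x4 "parent" layer stored once on disk.
    parent = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0],
    ]
    medium = slice_linear(parent, 3, 3)
    small = slice_linear(parent, 2, 2)
    # Nesting property: slicing the medium model gives the small model.
    print(slice_linear(medium, 2, 2) == small)
    print(matvec(small, [1.0, 1.0]))
```

The nesting check is the property that lets a single file stand in for several model sizes; whether quality survives the cut is the empirical claim the coverage attributes to NVIDIA.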

📰 Technical News & Releases

NVIDIA’s 2026 AI equity ledger crosses $40B, and the IREN structure is a textbook circular flow

Source: TechCrunch | CNBC

NVIDIA’s announced AI equity commitments in 2026 have crossed $40B in roughly four months, anchored by the $30B direct equity investment in OpenAI that closed in February. That investment is a restructured replacement for the scrapped $100B / 10 GW framework, agreed after OpenAI pivoted away from owning data centers; it is not a tranche of the old framework. The other named line items: up to $3.2B in Corning equity over three years ($500M of warrants up front, plus rights to invest the remainder) to fund three new US optical-connectivity plants in NC and TX; $2.1B in IREN warrant rights paired with a $3.4B / 5-year managed-GPU-cloud contract back to NVIDIA; seven more multi-billion-dollar public-company deals; and roughly 24 private rounds.

What “$40B” actually means here

The number is announced commitments — warrants, rights-to-invest, and multi-year tranches — not funded cash equity. The Corning headline figure of “$3.2B” is a $500M upfront warrant purchase plus rights to invest the remaining $2.7B over three years. Wedbush’s Matthew Bryson called the structure “circular investment” in the TechCrunch piece, but the framing is now consensus rather than novelty: Mizuho’s Jordan Klein (“smells like you are pre-funding the purchase of your own GPUs”), Bloomberg’s standalone “AI Circular Deals” graphic series, and EU competition staff (March 2026) have all flagged the same loop.
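The gap between headline and upfront figures is easy to check with the numbers quoted in this item. The short sketch below works in millions of USD and uses only figures stated in the digest; counting the IREN warrant rights at face value alongside the OpenAI and Corning figures is an assumption, and the function and key names are illustrative.

```python
def ledger_summary() -> dict[str, float]:
    """Recompute the named line items, in millions of USD."""
    openai_equity = 30_000
    corning_upfront = 500
    corning_follow_on_rights = 2_700
    iren_warrant_rights = 2_100  # assumption: counted at face value

    corning_headline = corning_upfront + corning_follow_on_rights
    named_subtotal = openai_equity + corning_headline + iren_warrant_rights
    return {
        "corning_headline_musd": corning_headline,
        "corning_upfront_share": corning_upfront / corning_headline,
        "named_subtotal_musd": named_subtotal,
    }


if __name__ == "__main__":
    summary = ledger_summary()
    print(f"Corning headline: ${summary['corning_headline_musd']:,}M")
    print(f"Corning paid up front: {summary['corning_upfront_share']:.1%}")
    print(f"Named line items: ${summary['named_subtotal_musd']:,}M")
```

On these inputs less than a sixth of the Corning headline is an upfront purchase, and the three named deals account for about $35.3B of the announced total.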

The IREN structure is the cleanest single instance: capital out (rights to buy IREN equity), revenue in (a contract to buy GPU-cloud capacity from IREN), both denominated in the same NVIDIA hardware. Pair with 2026-05-08-AI-Digest’s xAI Colossus 1 lease and 2026-05-09-AI-Digest’s Anthropic-Akamai $1.8B compute deal: the May 8–9 capex stack and the May 10 capital-flow stack are two views of the same picture. The framing question is whether to read the May 10 beat as “one more datapoint in a months-old narrative” or “the moment the regulatory clock starts.” The honest read is the first; the EU competition flag in March is what would tip it to the second.

Box Elder County approves Stratos — a 9 GW Utah AI data-center buildout — over a withdrawn water-rights filing and a planned referendum

Source: CNN Business | Tom’s Hardware | Salt Lake Tribune

The Box Elder County commission approved the Stratos Project — a 9 GW AI data-center campus on roughly 40,000 acres in Hansel Valley, with power coming from the Ruby Pipeline interstate gas connection — over loud protest from hundreds of residents at the meeting; one commissioner told the room to “grow up” before the panel retreated to a private chamber. Kevin O’Leary is fronting the project alongside Utah’s Military Installation Development Authority. A planned November ballot referendum (5,000+ signatures required) is in motion, and the project’s water-rights request was withdrawn on May 7 following public protest — material context the initial CNN headline dropped.

What “9 GW” is, and what “highest-profile flashpoint” elides

The 9 GW figure is full-buildout aspiration, not committed phase-1 capacity; the capital raise has not commenced and the only stated total is “$1B+.” Stratos is now the most-protested single-site AI data center in the US, but it joins xAI’s Memphis Colossus 1 emissions and Loudoun County grid stress as one of several marquee flashpoints of the build-out’s land/energy footprint; it is not unilaterally “the highest-profile yet.” Heatmap News separately counts 142 organised opposition groups and roughly $64B in blocked or paused AI/cloud projects.

Pair with 2026-05-09-AI-Digest’s PJM Interconnection grid warning: same arc, different layer. The constraint on US compute build-out is firming up at the local-permitting and grid-adjacency layers faster than at the capital-markets one, and the May 10 beat sharpens that reading.

Fields Medalist Tim Gowers reports ChatGPT 5.5 Pro solving previously-open math research problems unaided in under an hour

Source: Gowers’s Weblog | The Decoder

Fields Medalist Tim Gowers published a primary blog post on May 8 describing a session with ChatGPT 5.5 Pro in which the model improved an exponential bound to a quadratic one in 17 minutes 5 seconds, then generalised exponential to polynomial by 31 minutes 40 seconds — both on previously-open problems Gowers gave it cold. Gowers describes the result as “above the lower bound for contributing to mathematics,” and MIT student Isaac Rajagopal called the key idea “completely original.” The OpenAI / arXiv “Early science acceleration experiments with GPT-5” paper (arXiv:2511.16072) is the broader research programme this fits inside.

Worth flagging the corpus context: this is distinct from prior LLM-mathematician headlines. Tim Gowers is a working Fields Medalist posing problems from his own active research, not olympiad problems and not a formal-proof assistant pipeline. The result also doesn’t contradict 2026-05-09-AI-Digest’s “models faking reasoning traces under evaluation pressure” finding: Gowers verified that the output is correct mathematics, not that the chain-of-thought reflects the actual computation. Output correctness and trace faithfulness are orthogonal evidence streams; the May 10 beat moves the first and leaves the second untouched.

Source discipline on this one

Gowers’s blog is the primary; The Decoder is the secondary. The Decoder’s “MIT researchers calling ideas ‘completely original’” framing is a single-student attribution (Isaac Rajagopal), not a generic-researcher claim. Cite the blog where the specifics matter.

Apple removes the 256 GB Mac Studio M3 Ultra from the online store — DRAM shortage, not strategy

Source: MacRumors | 9to5Mac | Macworld

Apple pulled the 256 GB Mac Studio M3 Ultra SKU from the US online store in early May, leaving 96 GB as the maximum-RAM configuration; the 512 GB option had already been pulled in March. MacRumors and 9to5Mac both attribute the cuts to the global DRAM shortage driven by AI-server memory contention, not a deliberate Apple ladder strategy. Macworld separately reports the M5 Mac Studio launch is delayed for the same reason. The story originally broke May 5 — it is surfacing on r/LocalLLaMA today because local-LLM practitioners are reading the SKU shrinkage as a near-term ceiling on the Mac-as-inference-box thesis until DRAM supply clears.

Apple has not signalled the cuts are permanent. But for users who built workflows around 256–512 GB unified-memory configurations, the practical ceiling is moving in the wrong direction this quarter, and the corpus’s high-RAM-Mac inference subthread loses a headline option. The honest read is that the same compute-buildout pressure driving NVIDIA’s $40B equity ledger and the Stratos 9 GW approval is now reaching back into consumer hardware availability: three different layers, one supply story.

Wispr Flow reports India is its fastest-growing market — but on a ₹320/month tier vs. $12 globally

Source: TechCrunch

Bay Area dictation startup Wispr Flow told TechCrunch that India is now its fastest-growing market — 14% of 2.5M downloads (Oct 2025–Apr 2026) but roughly 2% of in-app revenue, with growth jumping from 60% to 100% MoM after a Hinglish-first localisation push. The Indian price tier is ₹320/month (~$3.50) on annual billing, vs. $12/month elsewhere — closer to a 70% discount than a flat low-ARPU bet. Worth holding loosely: download share, revenue share, and MoM growth figures are self-reported by Wispr Flow founders and are not independently audited. The bet matters as a live test of whether frontier voice-AI products can monetise in code-switched, low-ARPU markets — the same wall earlier consumer-AI startups have hit chasing Indian scale.
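The “closer to a 70% discount” reading, and the gap between download share and revenue share, both follow from the quoted figures. The sketch below recomputes them; it assumes the ~$3.50 conversion quoted above and treats the 14% and 2% shares as covering comparable periods, which the self-reported numbers do not guarantee. Function names are illustrative.

```python
def discount_vs_global(local_usd: float, global_usd: float) -> float:
    """Fractional discount of the local price relative to the global one."""
    if global_usd <= 0:
        raise ValueError("global_usd must be positive")
    return 1.0 - local_usd / global_usd


def revenue_intensity(revenue_share: float, download_share: float) -> float:
    """Revenue per download relative to the global average (1.0 = average)."""
    if download_share <= 0:
        raise ValueError("download_share must be positive")
    return revenue_share / download_share


if __name__ == "__main__":
    # Prices: ~$3.50/month in India vs. $12/month elsewhere.
    print(f"discount: {discount_vs_global(3.50, 12.00):.1%}")
    # Shares: roughly 2% of in-app revenue from 14% of downloads.
    print(f"revenue per download vs. average: {revenue_intensity(0.02, 0.14):.2f}x")
```

The discount works out to about 71%, while revenue per download sits near one seventh of the global average. That gap is wider than the price cut alone would explain, which hints at lower paid conversion, though the inputs are too coarse to say so with confidence.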

Source: simonwillison.net

Simon Willison amplified Thariq Shihipar’s (Claude Code team, Anthropic) argument that asking Claude to emit HTML — not Markdown — unlocks SVG diagrams, interactive widgets, in-page navigation, and other rendering the Markdown surface area cannot carry. Willison’s quote: “Asking Claude for an explanation in HTML means it can drop in SVG diagrams, interactive widgets, in-page navigation and all sorts of other neat ways.” His worked example uses ChatGPT 5.5 Pro on a Linux exploit walkthrough. The framing to keep is developer-tooling-affordance discovery — not a new model capability, but a prompt-pattern that surfaces capability the default Markdown framing was muting.

🧭 Key Takeaways

The capital-flow story is now mainstream-analyst consensus, not a contrarian read. Wedbush, Mizuho, Bloomberg’s circular-deals graphic series, and EU competition staff have converged on the same framing of NVIDIA’s $40B equity ledger; the IREN warrant + buy-back-from-IREN structure is the cleanest single instance. The novelty has shifted from “is this circular?” to “what does the second-order regulatory response look like?”
Build-out friction is shifting from financing to politics. Box Elder approving Stratos despite a withdrawn water-rights filing and a planned referendum, Heatmap counting 142 organised opposition groups and ~$64B in blocked or paused projects, and 2026-05-09-AI-Digest’s PJM Interconnection grid warning are three layers of the same arc: the constraint on US compute build-out is firming up at the local-permitting and grid layers faster than at the capital-markets one.
A working Fields Medalist saying ChatGPT 5.5 Pro produced novel math research unaided in under an hour is a real frontier-capability beat, not a recycled “models can do math” headline. Tim Gowers gave the model previously-open problems cold; the work is verified-correct output, distinct from formal-proof tooling and from olympiad headlines, and orthogonal to the chain-of-thought-faithfulness question 2026-05-09-AI-Digest flagged.
DeepSeek V4 is a Blackwell validator, not a Blackwell challenger. The FP4 QAT result is genuinely novel as the first open-weights frontier MoE with FP4 expert weights and a co-released train+serve stack, but it is built FOR Blackwell’s NVFP4 path, and NVIDIA’s own dev blog promotes the integration. The cost-curve pressure lands on FP8-era incumbents.
MTP is moving from experimental to default for on-device LLMs, and the lab side is moving in the same direction with elastic checkpoints. Three of today’s r/LocalLLaMA top threads converge on the new llama.cpp MTP PR delivering step-change throughput on consumer-grade hardware (Qwen 3.6 35B A3B at 80 tok/sec on a 12 GB GPU is the marquee number); Star Elastic packages three reasoning-model sizes into a single sliceable checkpoint on the lab side. Same arc — fewer artifacts, more deployment options — different layer.

Generated on 2026-05-10 by Claude