KKR closes $10B+ for Helix Digital Infrastructure under ex-AWS chief Adam Selipsky as Anthropic ships Claude Code Security in beta to Enterprise and Mistral launches cloud-resident coding agents — three different bets on what 'AI infrastructure' means in practice.

AI Digest — May 3, 2026

Your daily deep-dive on AI models, tools, research, and developer ecosystem news.

🔖 Project Releases

Claude Code

Claude Code v2.1.126 (May 1) shipped after the three-day quiet window flagged in 2026-05-02-AI-Digest. The reported changes are practitioner-relevant rather than headline: the model picker now lists models from the /v1/models endpoint when ANTHROPIC_BASE_URL points at an Anthropic-compatible gateway (relevant for Bedrock and Vertex routing); a new claude project purge [path] command removes all Claude Code state for a project (the kind of teardown affordance that’s been missing); the OAuth Authenticate / Re-authenticate actions in the /mcp menu — previously hidden in some configurations — are surfaced again; and HTTP/SSE MCP servers using custom headers no longer get stuck in “needs authentication” after a transient 401. The Linux x86-64 baseline crash filed April 29 (2026-04-29-AI-Digest) is still the open thread; v2.1.126 doesn’t claim it.

Beads

No new release. Beads v1.0.3 (April 24) is now nine days old — bd gate create, bd prune cascading orphan cleanup, BD_JSON_ENVELOPE=1 (covered in 2026-04-26-AI-Digest). Cadence-normal post-1.x settle.

OpenSpec

No new release. OpenSpec v1.3.1 (April 21) is twelve days old — the canonical artifact path resolution fix and stricter fenced-code-block validation (covered in 2026-04-22-AI-Digest). Also inside the project’s normal cadence; nothing overdue.

🧵 From the Community (r/LocalLLaMA & r/MachineLearning)

Note

Section patched after publication: the original automated run’s r/LocalLLaMA fetch was returned by the WebFetch tool as a saved-to-disk reference (the JSON exceeded inline-return size), and the research subagent bailed instead of reading the file in chunks. A re-fetch with limit=5 returns inline cleanly; community threads below are from that retry.

Qwen3.6-27B with agentic search hits 95.7% SimpleQA on a single 3090

Source: r/LocalLLaMA

The LDR (Local Deep Research) maintainer reports Qwen3.6-27B running on a single RTX 3090 with the project’s langgraph_agent strategy — multi-iteration tool-calling, parallel subtopic decomposition — hitting 95.7% on SimpleQA (287/300) and 77.0% on xbench-DeepSearch (77/100). For comparison, the post cites Perplexity Deep Research at 93.9%. The author flags the obvious caveats: these are agent + search scores, not closed-book; the sample is 200–300 questions; SimpleQA contamination risk on newer base models is real; and the LLM grader is the model self-grading, with Opus-reviewed spot checks. The interesting empirical framing is that performance tracks tool-calling quality more than raw size — a 27B local model with a well-built agent loop competing with a closed-source deep-research stack is a plausible signal that the agent loop is the binding constraint, not the model.
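To make the sample-size caveat concrete, here is a minimal sketch that puts an approximate 95% interval around the two reported scores. It uses the Wilson score interval, which is our choice for illustration and not something the post itself reports; the function name `wilson_interval` is hypothetical, and the only inputs taken from the post are the 287/300 and 77/100 counts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts as reported in the post; everything else here is illustrative.
for name, correct, total in [("SimpleQA", 287, 300), ("xbench-DeepSearch", 77, 100)]:
    lo, hi = wilson_interval(correct, total)
    print(f"{name}: {correct}/{total} = {correct / total:.1%}, "
          f"approx. 95% interval {lo:.1%} to {hi:.1%}")
```

Under these assumptions the SimpleQA interval is a couple of points wide on either side and contains the 93.9% comparison figure, so the gap between the two systems is within sampling noise at this sample size. The interval says nothing about the contamination or self-grading caveats, which are separate sources of error.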

A native-Windows vLLM port runs Qwen3.6-27B at 72 tok/s on an RTX 3090

Source: r/LocalLLaMA

A patched vLLM fork bundled as a portable Windows launcher pushes Qwen3.6-27B to 72 tok/s on a 3090, 53.4 tok/s at 127K context, and 160K context across two 3090s with PP=2. The author is candid that this isn’t the global record — Linux with TurboQuant 3-bit KV reaches 80–82 tok/s, a 5090 hits 160 tok/s — but the pitch is “no WSL, no Docker, no admin install” on Windows, which is the practical bottleneck for a non-trivial slice of practitioners running consumer hardware. A community-maintained vLLM fork shipping prebuilt wheels is the kind of glue work that makes consumer-GPU LLM serving actually usable; Blackwell (50-series) support is on the author’s roadmap but not in today’s release.

A community implementation of Meta’s “Scaling Test-Time Compute for Agentic Coding”

Source: r/MachineLearning

Open-source implementation of the PDR+RTV pipeline from Meta’s recent test-time-compute-scaling paper (arXiv 2604.16529), tested on SWE-bench against Gemini 3.1-Pro. The paper itself is the better citation — the implementation is “first public mirror” rather than a new finding — but the existence of a runnable copy this fast is the practitioner-relevant signal. Test-time compute scaling has shifted from frontier-lab posture to community-experimentable in roughly three weeks.

📰 Technical News & Releases

KKR closes $10B+ for Helix Digital Infrastructure under ex-AWS chief Selipsky

Source: Bloomberg

KKR launched Helix Digital Infrastructure with $10B+ in secured capital — sovereign-wealth and strategic-partner money rather than a fundraise target — to design and operate purpose-built AI infrastructure: data centres, on-site power generation, transmission, and fibre. Helix is led by Adam Selipsky, the former AWS CEO, who’s confirmed as the operational lead, not a passive board figure. The structural play is to sit between hyperscalers and the physical asset stack — Helix takes ownership and operations of the long-lived infrastructure, hyperscalers take compute capacity off the top.

Note

The “$700B hyperscaler capex pipeline” framing some primary writeups attached to this story is doing more work than the deal itself justifies. Helix isn’t unlocking that capex — it’s competing for a slice of it against existing infra REITs, sovereign-backed builders, and the hyperscalers’ own self-build. The genuinely new thing is the deal shape: $10B of patient PE money structured for asset-level partnership rather than the project-finance debt that’s traditionally underwritten this category. Read the news as private equity arriving at scale in AI infrastructure, not as PE inventing the category.

Anthropic ships Claude Code Security in public beta to Enterprise customers

Source: Anthropic

Anthropic launched Claude Code Security in public beta on May 1, available initially to Claude Enterprise customers. The product is built on Claude Opus 4.7 (the April 16 model release) and is pitched as a code scanner that detects vulnerabilities and suggests fixes. It is positioned as developer-side rather than CI-side, integrated into the Claude Code flow developers already use. The Enterprise-only tier gating is explicit, not inferred; this isn’t general availability and won’t be for some time. The strategic contrast with the May 1 Pentagon classified-network exclusion (2026-05-02-AI-Digest) is hard to miss: Anthropic doubles down on commercial-enterprise security tooling the same week the federal classified-network door closes. But the Code Security launch was on a roadmap that predates the Pentagon news. Don’t read causality into adjacency.

Mistral launches Medium 3.5 with cloud-resident “Vibe” remote agents

Source: Mistral AI

Mistral released Mistral Medium 3.5, a 128B dense multimodal model, alongside Vibe remote agents: cloud-resident coding agents that run asynchronously and can be teleported from the local CLI / Le Chat to the cloud sandbox without losing session state. Vibe ships on Pro, Team, and Enterprise plans; API pricing is $1.50 / $7.50 per 1M input/output tokens. The product framing is closer to Claude Code’s recent “session in the cloud” direction than to a pure VS Code copilot; Mistral is positioning Vibe as agent infrastructure, not assistant tooling.
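As a quick way to read the per-token pricing, the sketch below converts token counts into dollars at the quoted $1.50 / $7.50 per 1M rates. The function name `request_cost_usd` and the example session size are hypothetical, and the calculation ignores any caching discounts or plan allowances, which the announcement summary above does not detail.

```python
INPUT_USD_PER_M = 1.50   # quoted price per 1M input tokens
OUTPUT_USD_PER_M = 7.50  # quoted price per 1M output tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload at the quoted per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical agent session: 400k tokens read, 30k tokens written.
print(f"${request_cost_usd(400_000, 30_000):.3f}")  # prints $0.825
```

Output tokens cost five times as much as input tokens at these rates, but in this hypothetical session the input side still accounts for most of the bill ($0.60 of $0.825) because agents read far more than they write.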

Note

Mistral’s published headline number includes a 77.6% on SWE-Bench Verified for Medium 3.5; we couldn’t independently corroborate that placement against the public leaderboard at writing, so treat it as Mistral’s own benchmark unless and until it shows up on the maintained Verified leaderboard. The 128B parameter spec, multi-aspect-ratio vision encoder, and Vibe integration list (GitHub, Linear, Jira, Sentry) are confirmed on the Mistral post itself.

GitHub Copilot moves to token-based billing on June 1

Source: The Decoder | GitHub Blog

GitHub announced that Copilot is moving from premium-request pricing to a token-based credit model effective June 1, 2026. Plan prices stay flat (Pro at $10, Pro+ at $39, Business at $19/user/month), but consumption inside the plan now bills against API token rates (input, output, cached) rather than counting requests. The practitioner-visible consequence is that high-context, agentic workflows, the kind that load a single chat session with thousands of tokens of tool-call output, will burn through the allowance noticeably faster than they did under request counting. Same nominal price, different cost curve. Read the move as cost-mapping clarity rather than a forced repricing; nothing in the announcement frames this as competitive pressure.
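The “different cost curve” point can be sketched with a toy model of an agent loop. This is not GitHub’s billing logic: the function `session_tokens`, the turn counts, and the token sizes are all hypothetical, and the model assumes every turn re-sends the full prior context with no cache discount, which overstates the cost wherever cached-input rates apply.

```python
def session_tokens(turns: int, base_context: int,
                   tool_output_per_turn: int, reply_per_turn: int) -> tuple[int, int, int]:
    """Return (model_turns, input_tokens, output_tokens) for a toy agent loop.

    Assumption: each turn re-sends all prior context, and the context grows
    by the model's reply plus the tool output after every turn.
    """
    context = base_context
    total_in = 0
    total_out = 0
    for _ in range(turns):
        total_in += context              # whole context is read again this turn
        total_out += reply_per_turn
        context += reply_per_turn + tool_output_per_turn
    return turns, total_in, total_out


# Hypothetical workloads: a one-shot chat question vs. a ten-turn agentic task.
print(session_tokens(1, 2_000, 0, 500))
print(session_tokens(10, 2_000, 3_000, 500))
```

In this toy model the ten-turn task reads 177,500 input tokens against 2,000 for the one-shot question, roughly 89 times as many for ten times the model turns. Input volume grows faster than turn count because the context is re-read each turn, which is the reason token metering and request counting diverge for agentic use.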

Musk acknowledges xAI distilled OpenAI models for Grok during testimony

Source: MIT Technology Review | TechCrunch

In testimony during the Musk v. Altman trial’s first week, Elon Musk acknowledged that xAI used knowledge distillation on OpenAI model outputs to accelerate Grok’s training, framing the practice as “a general practice among AI companies.” The admission carries real weight: distillation has been an open secret in industry-watcher circles, but a courtroom-record acknowledgement is new. The statutory-versus-contractual question is the part the headlines are flattening: distillation isn’t clearly illegal under U.S. law, but it routinely violates the terms of service that frontier-lab APIs publish. So this is contractual liability, not statutory liability, at least until courts have ruled.

Note

Subagent reporting on this story sometimes attached a “Musk ranked vendors Anthropic > OpenAI > Google > Chinese OSS” quote to the testimony. We couldn’t find that ranking in primary sources during verification; it appears to be a paraphrased characterisation that didn’t survive cross-checking. Treat the distillation acknowledgement as the load-bearing fact and set the ranking aside.

Meta’s free business AI hits 10M conversations/week, monetisation still future-state

Source: TechCrunch

Meta disclosed that its business AI tools — free across Messenger, WhatsApp, and Instagram — are now facilitating roughly 10 million customer conversations per week, up roughly 10× from ~1 million at the start of 2026. The model behind it is Muse Spark, the in-house LLM Meta debuted in early April. The detail worth holding onto: this is currently free to businesses, not a paid tier, and Meta’s own framing in the announcement is that “as the company makes more progress, it expects to establish a longer-term monetization model” — i.e. the revenue plan is still future-state. Read this as customer-acquisition surface for an eventual paid SMB workflow product rather than a current revenue line.

Karpathy frames “Software 3.0” at Sequoia Ascent — but it’s his workflow, not industry consensus

Source: Karpathy on Bear Blog (Sequoia Ascent 2026 writeup)

Andrej Karpathy published a Sequoia Ascent 2026 writeup on May 2 framing the current moment as “Software 3.0” — prompts plus agents plus context plus verification, distinct from both Software 1.0 (handwritten code) and Software 2.0 (model weights as code). The widely-quoted detail: by December 2025 his personal coding workflow had inverted from ~80% manual + autocomplete to ~80% delegated to agents with human review. The framing is intellectually clean, and Karpathy is one of the practitioner voices whose forecasts have repeatedly pre-figured shifts the news outlets cover months later. But the 80% number is Karpathy’s own workflow, not a measured industry adoption stat, and “biggest programming shift in two decades” is his thesis, not consensus. Read it as one practitioner’s lens on his own desk; treat aggregate “agentic-engineering adoption” claims with that anchor in mind.

🧭 Key Takeaways

AI infrastructure is becoming a capital-allocation thesis, not just a product thesis. Helix’s $10B under Selipsky arrives alongside Anthropic shipping a developer-security product to Enterprise and Mistral wiring agents into cloud sandboxes. Three different layers (physical infra, security tooling, agent compute) are getting funded or shipped in the same week. The market isn’t picking one definition of “AI infrastructure”; it’s funding all of them in parallel.
The Pentagon-Anthropic split is now structural rather than reactive. The May 1 exclusion (2026-05-02-AI-Digest) and the Claude Code Security beta launched the same day read together as Anthropic doubling down on commercial-enterprise as the wedge: not a pivot, just a posture. Enterprise software is the lane that the closing of the federal classified-network door actually clears for Anthropic, and the strategic logic is consistent with the use-restriction posture the company has held publicly.
Karpathy’s Software 3.0 framing is sharp, but treat the 80% number as personal workflow, not industry adoption. The agentic-coding shift is real and Karpathy’s writeup will get heavily cited; the failure mode to avoid is reading his desk as the population. The distance between “Karpathy delegates 80% to agents” and “the industry has shifted to 80% agent-directed work” is large, and most practitioner surveys still show high single-digit-to-mid-twenties percentage adoption of agent-mode workflows even at engineering-forward orgs.
Copilot’s pricing reshape is cost-mapping, not market-pressure. The move to token-based billing is GitHub aligning customer charges to actual model consumption, and the per-tier price is unchanged. The practitioner consequence is real for high-context agentic users, but the right frame is “internal cost transparency” rather than “the AI tooling market is forcing a repricing.”
Distillation is a contractual question, not a statutory one. The Musk testimony’s lasting weight is the courtroom-record acknowledgement, not a settled legal theory. The questions distillation actually raises in current law are around terms-of-service enforceability and competitive harm, not “is distilling another model’s outputs illegal”, and the answers are likely to come from contract litigation rather than statutory carving.

Generated on 2026-05-03 by Claude