Google Threat Intelligence Group reports the first publicly attributed criminal use of an AI-built zero-day exploit, identifying telltale LLM authorship artifacts in a 2FA bypass aimed at an open-source web admin tool.

AI Digest — May 12, 2026

Your daily deep-dive on AI models, tools, research, and developer ecosystem news.

🔖 Project Releases

Claude Code

v2.1.139 shipped 2026-05-11 at 18:43 UTC — about 18 hours after the 2026-05-11-AI-Digest closed on v2.1.138, so today’s note picks up where yesterday’s patch wave left off.

Agent View (Research Preview) — claude agents now surfaces a single unified list of every Claude Code session, with each entry tagged running, blocked-on-you, or done. First time the lifecycle has had a primary CLI surface rather than being scattered across --print invocations and IDE tabs.
/goal command — sets a completion condition and lets Claude work across turns until it’s met, with a live overlay panel showing elapsed time, turn count, and token spend. Works in interactive, -p, and Remote Control modes. The framing puts it in the same surface area as Codex and Cursor’s agent modes; the differentiating bit is the named stopping condition and the instrumentation overlay rather than a category-defining capability shift.
hook continueOnBlock + exec-form args: string[] — PostToolUse hooks can now feed a rejection reason back to Claude and continue the turn instead of halting; hooks also gained a no-shell args: string[] exec form so path placeholders never need quoting. The exec form is the load-bearing part for anyone scripting around paths with spaces or quotes.
Compaction now preserves sensitive user instructions, and MCP stdio servers receive CLAUDE_PROJECT_DIR — aligning them with the hook environment. /mcp Reconnect picks up .mcp.json edits without restart.
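
Why the exec form matters is easier to see in miniature. The sketch below is generic Python, not Claude Code’s hook schema: it contrasts how a shell-style command string gets split on whitespace with how an argv array delivers the same path as one argument. The path, the placeholder `lint` command, and both helper functions are hypothetical and exist only for illustration.

```python
import shlex
import subprocess
import sys

# Hypothetical path containing spaces; not taken from any real project.
TRICKY_PATH = "my notes/draft v2.md"

# Child process that prints the argument list it actually received.
ECHO_ARGS = [sys.executable, "-c", "import sys; print(sys.argv[1:])"]


def shell_form_tokens(path: str) -> list:
    """Tokenize a templated command string the way a POSIX shell would."""
    # "lint" is a placeholder command name; it is never executed here.
    return shlex.split(f"lint {path}")


def exec_form_received(path: str) -> str:
    """Run the child with the path as its own argv element, no shell involved."""
    result = subprocess.run(
        ECHO_ARGS + [path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(shell_form_tokens(TRICKY_PATH))   # path is split into several tokens
    print(exec_form_received(TRICKY_PATH))  # path arrives as a single argument
```

The string form only survives if every placeholder is quoted correctly at templating time; the array form sidesteps the question because no shell ever re-parses the path.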

Beads

No new release since v1.0.4 on 2026-05-09 (covered in 2026-05-11-AI-Digest). Quiet cycle on the public release surface.

OpenSpec

Still on v1.3.1 from 2026-04-21 — 21 days since last tag. Already noted as a quiet cycle in 2026-05-11-AI-Digest; no commits or pre-release activity visible.

🧵 From the Community (r/LocalLLaMA & r/MachineLearning)

Computer build using Intel Optane Persistent Memory running Kimi K2.5 at 4+ tok/s (517 points, Tutorial | Guide flair). A builder used discontinued Intel Optane Persistent Memory — DIMM-form-factor non-volatile memory sitting between DRAM and SSD on the latency curve — to run Kimi K2.5, a 1T-parameter MoE model, locally at over four tokens per second. The post is a full build guide and appears to be the first documented LLM inference build using Optane PMem. Worth holding loosely: Optane has been EOL since 2022, so the parts pool is fixed and shrinking, but the build demonstrates a memory tier that materially expands addressable working-set for very large models on commodity hardware.

MTP on Unsloth (393 points, News flair). Unsloth released GGUF builds of Qwen3.6-27B and Qwen3.6-35B-A3B with the multi-token-prediction layer preserved, enabling speculative-style MTP inference. Users still need to build llama.cpp from the open MTP PR; the model card has the build instructions. MTP can meaningfully lift inference throughput without additional GPU memory, and ready-made GGUFs lower the barrier for the local-inference community to benchmark real-world speed gains rather than treat the feature as theoretical.
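
For a rough sense of where the throughput gain comes from, here is a minimal sketch of the standard speculative-decoding estimate: if each drafted token is accepted independently with probability `accept_rate`, one verification pass yields a geometric-series number of tokens on average. This is a simplification and an assumption on my part, not a description of the llama.cpp MTP PR; the acceptance rates below are hypothetical and drafting overhead is ignored.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each of the `draft_len` drafted tokens is accepted independently
    with probability `accept_rate`, and that the verifier always contributes
    one token of its own (the correction or the bonus token).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        # Every draft accepted: all drafted tokens plus the verifier's token.
        return float(draft_len + 1)
    # Closed form of 1 + a + a^2 + ... + a^draft_len.
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    # Hypothetical acceptance rates, for illustration only.
    for rate in (0.5, 0.7, 0.9):
        print(rate, expected_tokens_per_pass(rate, draft_len=2))
```

The practical point is that the multiplier depends heavily on the acceptance rate, which is exactly what real-world benchmarks of the new GGUFs would measure.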

ExLlamaV3 major updates (145 points, News flair). Turboderp shipped a rapid sequence of ExLlamaV3 releases: Gemma 4 support, improved cache efficiency, and DFlash. The dev branch is accumulating commits at a high cadence. ExLlamaV3 is a primary inference engine for quantized models on consumer GPUs; throughput and model-compatibility changes here propagate directly to anyone running local models on a single workstation card.

PSA: extra spaces in chat-template-kwargs silently break llama-server on Qwen3.6 (50 points, Other flair). A user discovered that whitespace inside the JSON string passed to chat-template-kwargs in llama-server’s models.ini silently disables preserve_thinking for Qwen3.6 — { "preserve_thinking": true } fails where {"preserve_thinking":true} works. A trivial fix once known, but the failure mode is silent, which is exactly the kind of footgun that wastes hours for anyone deploying Qwen3.6 thinking-mode behind llama-server.
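
One low-effort guard is to generate the kwargs string instead of typing it by hand. The sketch below uses Python’s standard `json` module to re-serialize a JSON object with no optional whitespace; the helper name `compact_kwargs` is hypothetical, and whether your deployment needs this depends on the llama-server build in use.

```python
import json


def compact_kwargs(raw: str) -> str:
    """Parse a JSON object string and re-serialize it without optional whitespace."""
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    # separators=(",", ":") removes the spaces json.dumps inserts by default.
    return json.dumps(parsed, separators=(",", ":"))


if __name__ == "__main__":
    print(compact_kwargs('{ "preserve_thinking": true }'))
    # prints {"preserve_thinking":true}
```

Note that plain `json.dumps(obj)` adds a space after each colon and comma, so the explicit `separators` argument is the part that matters here.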

📰 Technical News & Releases

Google publicly attributes the first criminal AI-built zero-day exploit

Source: Bloomberg | Google Cloud Threat Intelligence Blog | BleepingComputer

Google’s Threat Intelligence Group (GTIG) reported “high confidence” that a financially-motivated criminal actor used an AI model to build a working zero-day exploit — a Python script bypassing two-factor authentication in a popular open-source web-based system administration tool. GTIG identified the LLM authorship signature from a cluster of telltale artifacts: “educational docstrings, including a hallucinated CVSS score, and using a structured, textbook Pythonic format highly characteristic of LLM training data.” The specific model used remains unattributed; GTIG explicitly noted Gemini was not involved.

GTIG’s frame is “first identified,” not “first ever”

Google’s own language: “For the first time, GTIG has identified a threat actor using a zero-day exploit we believe was developed with AI.” The report separately notes that PRC- and DPRK-aligned state actors had already demonstrated sophisticated AI-augmented vulnerability discovery before this criminal case. The defensible read is that this is the first publicly attributed criminal use of an AI-built zero-day in the wild, not a historical first for AI-built exploits as a category.

Google worked with the affected vendor to patch silently before a planned mass-exploitation campaign launched. The load-bearing piece for defenders: the LLM-authored code wasn’t just executable, it was good enough that identifying it as AI-built had to lean on stylistic tells rather than functional failure. That moves the offensive baseline forward in a way prior “AI assists offensive workflows” framings didn’t capture.

OpenAI formally launches “The OpenAI Deployment Company” as a Palantir-style subsidiary

Source: OpenAI announcement | The Decoder | Axios

OpenAI announced “The OpenAI Deployment Company” — referred to internally as DeployCo — as a majority-controlled subsidiary backed by $4B in fresh capital at a $10B pre-money valuation, with TPG leading the round. DeployCo absorbs the recent ~150-engineer Tomoro acquisition and ships Forward Deployed Engineers into enterprise customer operations. The Decoder’s headline framing — “adopts Palantir’s playbook” — is press analogy rather than OpenAI’s own language, but the structural resemblance (on-site engineers embedding into client workflows as the moat) is the actual shape of the entity.

The strategic read is in what DeployCo replaces

Accenture stock dipped on the announcement, which is the market saying it reads DeployCo as competition for the AI-implementation consulting category, not as a complement to it. Pair with 2026-05-08-AI-Digest’s note on Anthropic’s parallel Blackstone/Goldman enterprise-services structure: both frontier labs now have explicit forward-deployed-engineering subsidiaries, which moves the competitive front from raw model capability toward workflow ownership.

The Tomoro absorption is the load-bearing part operationally — 150 engineers with implementation expertise inside the org gives DeployCo a head start that the “we’ll embed engineers” announcements from other labs over the past year have lacked.

Baidu reports Ernie 5.1 trained at “6% of comparable cost,” ranks 4th on Arena Search

Source: The Decoder

Baidu published Ernie 5.1 with a claim that pre-training cost ran at six percent of what comparable models require — a 94% reduction. The reported method is an elastic training approach Baidu describes as allowing smaller sub-models to be extracted from a single large-foundation training run (functionally a “once-for-all” pattern, though The Decoder’s article doesn’t use that exact name). On the Arena Search Leaderboard Ernie 5.1 sits at 4th globally with 1,223 points as of 2026-05-09, trailing Claude Opus 4.6 Search (1,255), GPT-5.5 Search (1,242), and Claude Opus 4.7 (1,236).

The 94% figure is Baidu’s claim, not an audited benchmark

The Decoder explicitly flags that no model weights have been released, making independent verification impossible. The defensible read is that this is a strategic positioning claim, directionally consistent with DeepSeek’s cost-efficiency posture (see 2026-05-09-AI-Digest), not an audited cost benchmark. Worth holding loosely until weights or training logs ship.

The narrower piece worth tracking is the Arena Search 4th-place finish, which is at least a third-party benchmark on output quality, even if it tells you nothing about training cost.
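
To put the score gaps quoted above in perspective, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes Arena Search scores sit on a conventional Elo-style scale (base 10, 400-point divisor); that scale is my assumption rather than something stated in the source, so treat the output as a rough reading, not a published statistic.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head votes for A over B under an Elo-style model."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


if __name__ == "__main__":
    # Scores as quoted in this digest (as of 2026-05-09).
    ernie_5_1 = 1223
    leaders = {
        "Claude Opus 4.6 Search": 1255,
        "GPT-5.5 Search": 1242,
        "Claude Opus 4.7": 1236,
    }
    for name, score in leaders.items():
        rate = expected_win_rate(score, ernie_5_1)
        print(f"{name} vs Ernie 5.1: {rate:.3f}")
```

Under that assumption, even the 32-point gap to first place works out to only a modest preference edge, roughly 55 to 45, which is why a 4th-place finish here counts as being in the same tier rather than a distant runner-up.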

DALL-E 2 and DALL-E 3 reach API shutdown today; image generation consolidates on gpt-image-* lineage

Source: OpenAI deprecation notice | OpenAI deprecation docs

Per the November 2025 deprecation notice, DALL-E 2 and DALL-E 3 are shut down today — May 12, 2026 — with OpenAI directing API users to migrate to gpt-image-1.5 or gpt-image-1-mini. The honest read is that this is housekeeping rather than a strategic turning point: the gpt-image-1 launch in April 2025 had already commercially deprioritized the DALL-E line, and today’s API end-of-life is the formal closing of a transition that effectively ended a year ago.

The operational point for anyone still on the old endpoints: migrate today. With the DALL-E endpoints shut down, requests fail until the port is done, and the pricing and capability gap between gpt-image-1.5 and DALL-E 3 is large enough that “we’ll port it next sprint” plans will surface integration issues immediately.
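
As a starting point for that port, here is a minimal sketch of translating a DALL-E 3 style request into arguments for the newer model family. It rests on my understanding of the gpt-image-1 parameter set (no `style` or `response_format`, quality levels of low/medium/high/auto, different non-square sizes) and assumes gpt-image-1.5 follows it; the specific mappings and the helper name are my own choices, not OpenAI guidance, so check the current API reference before relying on them.

```python
# Assumed mappings; verify against the current OpenAI API reference.
QUALITY_MAP = {"standard": "medium", "hd": "high"}
SIZE_MAP = {
    "1024x1024": "1024x1024",
    "1792x1024": "1536x1024",  # nearest landscape size (assumption)
    "1024x1792": "1024x1536",  # nearest portrait size (assumption)
}
# Parameters assumed to be unsupported by the gpt-image-* models.
DROPPED_PARAMS = ("style", "response_format")


def migrate_image_request(old_request: dict, target_model: str = "gpt-image-1.5") -> dict:
    """Return a copy of a DALL-E style request adapted for a gpt-image-* model."""
    new_request = {k: v for k, v in old_request.items() if k not in DROPPED_PARAMS}
    new_request["model"] = target_model
    if "quality" in new_request:
        # Unknown quality values fall back to "auto" rather than guessing.
        new_request["quality"] = QUALITY_MAP.get(new_request["quality"], "auto")
    if "size" in new_request:
        # Sizes with no listed equivalent also fall back to "auto".
        new_request["size"] = SIZE_MAP.get(new_request["size"], "auto")
    return new_request


if __name__ == "__main__":
    old = {
        "model": "dall-e-3",
        "prompt": "a lighthouse at dusk",
        "size": "1792x1024",
        "quality": "hd",
        "style": "vivid",
        "response_format": "url",
    }
    print(migrate_image_request(old))
```

The bigger integration change is likely on the response side: code that reads an image URL from the result would need to handle a base64 payload instead, if the newer models behave as gpt-image-1 does.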

Shopify’s internal coding agent “River” is deliberately public-channel-only

Source: Simon Willison’s Weblog — relaying Tobias Lütke’s X post

Shopify CEO Tobias Lütke described the company’s internal coding agent, River, in an X post that Simon Willison relayed: River refuses direct messages and forces every coding conversation into a public Slack-style channel. Lütke calls this “osmosis learning,” invoking the German Lehrwerkstatt (apprenticeship workshop) concept: the design intent is that engineers see how peers prompt the agent, what context they assemble, and where the agent fails, all by default. Willison likens it to early Midjourney’s Discord-only interaction model.

The pattern is suggestive rather than proven

Willison’s framing is doing the synthesis work here; no other enterprise has publicly described a forced-transparency coding-agent design pattern, and Midjourney’s Discord was a consumer distribution channel rather than an internal enterprise tool, so the parallel is structurally looser than it reads. The defensible read is that this is a design principle Shopify is articulating, not yet a documented cross-enterprise pattern — but the org-design problem (junior engineers learning agent-use by observing senior ones) is real and the proposed solution is novel enough to be worth tracking.

Pair with today’s Claude Code v2.1.139 Agent View blurb above: both moves treat agent-session visibility as a primary surface rather than a debugging affordance, which is the same instinct from two different angles.

🧭 Key Takeaways

The Google zero-day attribution moves the offensive AI baseline. GTIG’s call on AI authorship was made on stylistic tells (a hallucinated CVSS score, textbook docstrings), not on the exploit failing to work. The exploit worked. The authorship signal came from how the code was written, not from any functional flaw. That’s a different regime than “AI helps attackers script known techniques faster.”
OpenAI DeployCo formalises workflow ownership as the new competitive front. Two frontier labs now have explicit forward-deployed-engineering subsidiaries (DeployCo and Anthropic’s Blackstone/Goldman vehicle from 2026-05-08-AI-Digest). The Accenture stock dip is the market reading this as direct AI-implementation-consulting competition, not complementary tooling.
Baidu’s 94% pre-training cost claim is a positioning move, not a benchmark. Closed weights, single-vendor figure, no independent verification. The Arena Search 4th-place finish (third-party measured) is the part with actual signal; the 94% number rides alongside it but should be cited as Baidu’s claim, not as audited data.
Claude Code’s /goal is catching up, not leading. Devin, Cursor Agents, and OpenAI Codex have shipped analogous named-goal interfaces. The differentiating piece in v2.1.139 is the instrumentation overlay (elapsed/turns/tokens) and the Agent View lifecycle list, not the goal command in isolation. The real signal is the cluster of agentic-surface features shipping together.
The local-inference community is in an MTP-and-memory-tier moment. Today’s top r/LocalLLaMA threads tell two stories: Unsloth’s Qwen3.6 MTP GGUFs lowering the throughput-gain barrier, and the Optane Kimi K2.5 build showing prosumer hardware can host 1T-parameter MoE inference if you reach for unusual memory tiers. Together they keep narrowing the “you need a cluster for this” envelope, continuing the arc visible across 2026-05-09-AI-Digest and 2026-05-10-AI-Digest.

Generated on 2026-05-12 by Claude