Daily Digest · Entry № 19 of 43

AI Digest — March 26, 2026

Arm announces first in-house CPU chip in 35 years co-developed with Meta for AI inference.

Your daily deep-dive on AI models, tools, research, and developer ecosystem news.


🔖 Project Releases

Claude Code

v2.1.84 released today (March 26). Hot on the heels of yesterday’s v2.1.83, this release adds a PowerShell tool for Windows (opt-in preview), giving Windows developers a native shell experience alongside the existing Bash tool. New hook events TaskCreated and WorktreeCreate expand the automation surface — WorktreeCreate notably supports type: "http" and can return the created worktree path via hookSpecificOutput.worktreePath, enabling tighter CI/CD integration with isolated worktree-based agent runs.
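As an illustration of the CI/CD use case, here is a hypothetical sketch of a server-side handler for the WorktreeCreate hook. The field names `type: "http"` and `hookSpecificOutput.worktreePath` come from the release notes; the payload shape and the worktree path scheme are assumptions, not Claude Code's documented schema.

```typescript
// Hypothetical WorktreeCreate hook handler: provision one isolated worktree
// per agent run and report its path back via hookSpecificOutput.worktreePath.
// Only the two field names are from the release notes; the rest is a sketch.
interface WorktreeCreateOutput {
  hookSpecificOutput: { worktreePath: string };
}

function handleWorktreeCreate(repoRoot: string, runId: string): WorktreeCreateOutput {
  const worktreePath = `${repoRoot}/.worktrees/${runId}`;
  return { hookSpecificOutput: { worktreePath } };
}

console.log(handleWorktreeCreate("/srv/repo", "run-42").hookSpecificOutput.worktreePath);
// "/srv/repo/.worktrees/run-42"
```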

Enterprise admins get allowedChannelPlugins, a managed setting for defining channel plugin allowlists. New environment variables ANTHROPIC_DEFAULT_{OPUS,SONNET,HAIKU}_MODEL_SUPPORTS let teams override effort/thinking capability detection for pinned default models on third-party providers (Bedrock, Vertex, Foundry), while CLAUDE_STREAM_IDLE_TIMEOUT_MS makes the streaming idle watchdog configurable (default 90s). Deep links (claude-cli://) now open in your preferred terminal rather than the first-detected one, and rules/skills paths: frontmatter now accepts YAML lists of globs for more granular scoping.
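The idle watchdog reduces to an elapsed-time check on the last received stream chunk. A minimal sketch, where the env var name and 90s default are from the release notes and the surrounding logic is illustrative, not Claude Code's implementation:

```typescript
// A stream counts as stalled when no chunk has arrived within the timeout.
// CLAUDE_STREAM_IDLE_TIMEOUT_MS and the 90s default are from the release
// notes; this check is an illustrative sketch only.
function streamIsStalled(lastChunkAtMs: number, nowMs: number, timeoutMs: number): boolean {
  return nowMs - lastChunkAtMs > timeoutMs;
}

const timeoutMs = Number(process.env.CLAUDE_STREAM_IDLE_TIMEOUT_MS ?? 90_000);
console.log(streamIsStalled(0, 100_000, timeoutMs)); // true with the default 90s
console.log(streamIsStalled(0, 30_000, timeoutMs));  // false
```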

Performance and UX improvements include MCP tool descriptions capped at 2KB to prevent context bloating, MCP server deduplication between local and claude.ai configs, a ~30ms interactive startup improvement via parallel setup() execution, and token counts ≥1M now displaying as “1.5m” instead of “1512.6k”. The idle-return prompt nudges users returning after 75+ minutes to /clear, reducing wasteful token re-caching. Bug fixes address voice push-to-talk character leakage, a hang when generating attachment snippets for large edited files, MCP tool/resource cache leaks on server reconnect, and spurious “Not logged in” errors on macOS from transient keychain failures.
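The new token-count display rule can be captured in a few lines. This is a sketch of the behavior described above, not Claude Code's actual formatting code; the thresholds and one-decimal rounding are assumptions consistent with the "1.5m" vs "1512.6k" example:

```typescript
// Illustrative token-count formatter: counts at or above 1M render as "1.5m"
// rather than "1512.6k". Rounding rules are assumed, not taken from the CLI.
function formatTokens(n: number): string {
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}m`;
  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}k`;
  return String(n);
}

console.log(formatTokens(1_512_600)); // "1.5m"
console.log(formatTokens(512_300));   // "512.3k"
```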

If you use Claude Code on Windows, the PowerShell tool preview is the headline feature: opt in and test it. For teams on Bedrock/Vertex/Foundry, the new env vars solve the longstanding pain of capability-detection mismatches with pinned models.

Full release notes: GitHub

Beads

No new release since v0.62.0 reported on March 24. The embedded Dolt backend, Azure DevOps integration, custom status categories, and bd note command remain the latest features.

OpenSpec

No new release since v1.2.0 reported on March 8. The profiles system, propose workflow, and support for Pi and AWS Kiro IDE remain the latest features.


🧵 From the Community (r/LocalLLaMA & r/MachineLearning)

Reddit remains inaccessible via direct fetch. Community discussions are sourced from web search cross-references, secondary aggregators, and content syndicated to other platforms.

OpenClaw security is dominating the local-first AI conversation. The community is split between enthusiasm for running autonomous agents locally without cloud APIs and genuine alarm about the security surface. Multiple independent security audits published this week — from Giskard, PromptArmor, Cisco, and Kaspersky — all converge on the same conclusion: OpenClaw’s default configuration is dangerously permissive. The Telegram/Discord link preview attack vector (where indirect prompt injection via previewed URLs can exfiltrate API keys and environment variables) is particularly concerning for practitioners running agents with production credentials. China’s CNCERT issued an explicit warning, and the practical guidance emerging from the community is clear: if you must run OpenClaw, do it in a fully isolated VM with dedicated, non-privileged credentials accessing only non-sensitive data.

Arm’s AGI CPU announcement is generating infrastructure-level excitement. Platform engineers and MLOps practitioners are dissecting the 136-core Neoverse V3 design, the 96 lanes of PCIe Gen 6, and especially the Meta partnership. The air-cooled 36kW rack reference design (30 blades, 8,160 cores) and liquid-cooled 200kW design (336 CPUs, 45,000+ cores) represent a credible alternative to x86 for inference-heavy workloads. The question the community is debating: does this actually threaten AMD/Intel’s data center dominance, or is it primarily a Meta-driven custom silicon play?

KubeCon Europe wraps up today in Amsterdam. The final day includes sessions on OpenTelemetry profiling (now in alpha), Dapr + OpenTelemetry observability workflows, and continued discussion of Microsoft’s AI Runway and the CNCF’s expanded Kubernetes AI conformance program. The conference has cemented the narrative that AI workloads are Kubernetes-native by default — with 7.3 million AI developers now building on cloud-native infrastructure according to the latest CNCF/SlashData report.


📰 Technical News & Releases

Arm Unveils AGI CPU: First In-House Chip in 35 Years, Co-Developed with Meta

Source: Arm Newsroom | Announcement | Meta Partnership | Tom’s Hardware | Coverage

Announced March 24–25 at HP Imagine week, the Arm AGI CPU marks a historic shift: Arm is producing its own silicon for the first time in its 35+ year history, moving from pure IP licensing to a vertically integrated product. Built on TSMC’s 3nm process with up to 136 Neoverse V3 cores at 300W TDP, the chip targets AI inference and agentic workloads at data center scale. The I/O stack is enterprise-grade: 96 lanes of PCIe Gen 6, CXL 3.0 support, and DDR5-8800 memory.

Meta is the lead development partner and plans to deploy AGI CPUs alongside its custom MTIA accelerators. The reference architecture is ambitious — an air-cooled 36kW rack holding 30 blades with 8,160 total cores, or a liquid-cooled 200kW rack with 336 CPUs delivering over 45,000 cores. Commercial commitments come from Cerebras, Cloudflare, F5, OpenAI, Positron, Rebellions, SAP, and SK Telecom. For infrastructure teams, the key question is whether Arm’s power efficiency advantage (a well-established narrative from AWS Graviton) translates to inference workloads where GPU offload and memory bandwidth matter more than core count. The commitment list suggests serious enterprises are betting it does.
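The published rack figures are internally consistent; a back-of-envelope check (the CPU-per-blade count is inferred from the totals, not stated by Arm):

```typescript
// Sanity-check of Arm's reference-design core counts. 2 CPUs per blade is
// our inference from 30 blades * 136 cores/CPU = 8,160 total cores.
const coresPerCpu = 136;
const airCooledCores = 30 /* blades */ * 2 /* CPUs per blade, inferred */ * coresPerCpu;
const liquidCooledCores = 336 /* CPUs */ * coresPerCpu;

console.log(airCooledCores);    // 8160 — matches the 36kW rack
console.log(liquidCooledCores); // 45696 — i.e. the "45,000+" figure
```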


OpenAI Shuts Down Sora Video Platform, Kills Disney Partnership

Source: Axios | Report | CNN | Coverage | Fortune | Analysis

Sam Altman informed staff on March 24–25 that OpenAI is shuttering Sora, its video generation platform, just six months after launching the standalone app and three months after signing a landmark deal with Disney. The Disney partnership, which would have given OpenAI exclusive rights to incorporate hundreds of Disney characters into AI products, backed by a $1B Disney investment, collapsed before any money changed hands.

The numbers tell the story: Sora downloads plunged ~75% from their November peak, and the compute cost of video generation proved unsustainable relative to user engagement. Video generation is orders of magnitude more compute-intensive than text, and OpenAI explicitly cited the need to make compute allocation trade-offs. Altman teased “Spud” as an upcoming replacement but provided no technical details. For developers building on video generation APIs, this is a significant signal: even the best-funded lab in AI couldn’t make standalone video generation economically viable at consumer scale. The remaining players — Runway, Kling, Lightricks (LTX), and Adobe Firefly — may benefit from reduced competition, but face the same fundamental compute economics.


OpenAI Foundation Pledges $1B Across Four Priority Areas

Source: Fortune | Coverage | Digitimes | Report

Announced March 25, the OpenAI Foundation committed at least $1 billion over the next twelve months — part of a broader $25 billion pledge first announced in October — distributed across four pillars: life sciences and disease research (led by Jacob Trefethen, formerly managing $500M+ in grants at Coefficient Giving), jobs and economic impact research, AI resilience (led by co-founder Wojciech Zaremba), and community programs. The health strand targets Alzheimer’s research, public health data initiatives, and high-mortality diseases. For the AI practitioner community, the jobs and economic impact pillar is the most relevant — this is OpenAI funding independent research into the labor market effects of AI, which may produce data that shapes both policy and enterprise AI adoption strategies.


HP IQ: On-Device AI Orchestrator Running a 20B Model Locally

Source: HP Newsroom | Announcement | Techzine | Analysis

Unveiled at HP Imagine 2026 on March 24, HP IQ is an on-device AI orchestrator for commercial PCs built around a local 20-billion-parameter model with specialized tools and a task coordinator. The architecture is local-first by design: HP IQ runs most intelligence on-device, routing to the cloud only when explicitly permitted by enterprise policy. This keeps sensitive data, proprietary IP, and enterprise knowledge within IT’s control — a direct response to the shadow AI governance concerns that tools like Nudge Security are trying to address (covered yesterday).

The technical approach mirrors what Apple is doing with on-device Siri intelligence: a capable-enough local model handles routine tasks with low latency, while the orchestrator decides when cloud escalation is necessary. HP IQ connects across HP notebooks, desktops, and Poly Studio Video Bars, with limited summer 2026 release and broader availability in H2 2026. For enterprise developers, the interesting signal is that major OEMs are now building AI orchestration into the hardware platform layer — not as an app or browser extension, but as a system-level capability. Whether a 20B model is capable enough for substantive enterprise tasks remains the open question.


Vercel AI SDK 6: Agents as First-Class Abstractions with Full MCP Support

Source: Vercel Blog | Announcement | Documentation

Vercel shipped AI SDK 6, the most significant update to their TypeScript AI framework since its inception. The headline feature is agents as a first-class abstraction: you define an agent once (with its tools, model, and constraints) and reuse it across UIs, background jobs, and API routes via ToolLoopAgent, which handles the automated loop of LLM calls, tool executions, and iterations with configurable stop conditions. This eliminates the boilerplate that previously required manual orchestration of multi-step tool-calling workflows.

Full MCP support is now stable in @ai-sdk/mcp, covering OAuth authentication, resources, prompts, and elicitation — meaning developers can expose data through MCP resources, create reusable prompt templates, and handle server-initiated requests for user input. Tool Execution Approval (human-in-the-loop) lets you flag sensitive tools for manual review, integrating with UI hooks like useChat. New AI SDK DevTools provide full visibility into LLM calls and agent execution. The AI Gateway offers unified access to hundreds of models at zero markup (pay only provider token costs, with $5/month free credit per team). For TypeScript developers building AI applications, this release closes the gap between AI SDK and Python-centric agent frameworks like LangGraph — the composable agent pattern with MCP integration is now production-ready in the JavaScript ecosystem.
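The tool-loop pattern that `ToolLoopAgent` automates looks roughly like the following. This is a language-agnostic sketch with stubbed-out model and tool, not the AI SDK's actual API: call the model, execute any requested tool, feed the result back into context, and repeat until a stop condition fires.

```typescript
// Generic tool-loop sketch (NOT the AI SDK API): model and tool are stubs.
type Step = { toolCall?: { name: string; args: string }; text?: string };

function runToolLoop(
  model: (history: string[]) => Step,
  tools: Record<string, (args: string) => string>,
  maxSteps = 5, // a configurable stop condition
): string {
  const history: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(history);
    if (step.text !== undefined) return step.text; // model produced a final answer
    const { name, args } = step.toolCall!;
    history.push(`${name} -> ${tools[name](args)}`); // tool result back into context
  }
  return "max steps reached";
}

// Stub model: requests the weather tool once, then answers from its result.
const result = runToolLoop(
  (h) => h.length === 0
    ? { toolCall: { name: "weather", args: "Amsterdam" } }
    : { text: `It is ${h[0].split("-> ")[1]} in Amsterdam` },
  { weather: () => "12C" },
);
console.log(result); // "It is 12C in Amsterdam"
```

The real SDK adds streaming, typed tool schemas, and pluggable stop conditions on top of this loop; the value of the abstraction is that the orchestration boilerplate above disappears.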


NVIDIA Nemotron 3 Family: Open Models for Agentic AI at GTC 2026

Source: NVIDIA Newsroom | Announcement | GTC Blog

Announced at GTC 2026 (March 16), the Nemotron 3 family is gaining adoption traction this week. The lineup spans three tiers: Nano (optimized for throughput, 4x faster than Nemotron 2 Nano via a hybrid MoE architecture), Super (120B total / 12B active parameters, scoring 85.6% on PinchBench — a new benchmark for OpenClaw agent performance — making it the top open model in its class), and Ultra (frontier-level intelligence with 5x throughput efficiency using NVFP4 on Blackwell). Complementing the text models, Nemotron 3 Omni integrates audio, vision, and language understanding, while Nemotron 3 VoiceChat supports real-time simultaneous listen-and-respond conversations.

Early adopters include Cursor, Perplexity, CrowdStrike, Palantir, ServiceNow, Siemens, and Zoom. The strategic play is clear: NVIDIA is building an open model ecosystem optimized for its own hardware (Blackwell + NVFP4), creating a tight vertical integration between models and silicon. For developers choosing open models for agentic workflows, Nemotron 3 Super’s combination of agent-optimized benchmarks and NVIDIA hardware acceleration makes it a compelling option if you’re already in the NVIDIA ecosystem.


OpenClaw Security: Multiple Independent Audits Converge on Serious Vulnerabilities

Source: The Hacker News | Coverage | Cisco Blog | Analysis | Giskard | Research | Kaspersky | Advisory

OpenClaw surpassed 250,000 GitHub stars in March and became the most talked-about open-source AI project of 2026, but this week a convergence of independent security research paints a concerning picture. Giskard demonstrated that carefully crafted prompts can extract API keys, environment variables, and secrets from running agents. PromptArmor showed that link preview features in messaging apps (Telegram, Discord) can be weaponized as data exfiltration pathways via indirect prompt injection. Cisco’s blog called personal AI agents like OpenClaw “a security nightmare,” and Kaspersky published an explicit advisory recommending against running it with primary accounts or on devices containing sensitive data.

China’s CNCERT issued an official warning about OpenClaw’s weak default security configurations. AWS launched Managed OpenClaw on Lightsail amid these concerns, suggesting cloud providers see an opportunity in offering hardened, managed deployments. For practitioners: the attack surface is real and well-documented. If you’re running OpenClaw (or similar autonomous agents with system access), the minimum viable security posture is a dedicated VM, non-privileged credentials, no access to sensitive data stores, and careful review of all connected messaging platform integrations. The broader lesson applies to all agentic AI: the more autonomous the agent, the larger the blast radius of a successful prompt injection.

If you have OpenClaw instances connected to production messaging platforms (Slack, Discord, Telegram), review the PromptArmor findings on link preview exploitation immediately. The attack requires zero user interaction beyond the agent processing a previewed link.


IBM Granite 4.0 1B Speech Tops OpenASR Leaderboard for Edge Deployment

Source: IBM | HuggingFace | Blog | MarkTechPost | Coverage

Released March 6–9 and now ranking #1 on the OpenASR leaderboard with a 5.52 average WER and 280.02 RTFx, Granite 4.0 1B Speech is a compact 1-billion-parameter model designed for multilingual ASR and bidirectional speech translation. It supports six languages (English, French, German, Spanish, Portuguese, Japanese), requires under 1.5GB of VRAM, and has half the parameter count of its predecessor (granite-speech-3.3-2b) while adding Japanese ASR and keyword-list biasing.

For developers building voice interfaces or transcription pipelines on edge devices, the combination of top-tier accuracy, minimal resource requirements, and Apache 2.0 licensing makes this immediately practical. The keyword biasing feature is particularly valuable for enterprise deployments where domain-specific vocabulary (medical, legal, technical) needs to be recognized accurately without fine-tuning. This is IBM’s clearest demonstration yet that competitive speech models don’t need billions of parameters or cloud-scale compute.


📄 Papers Worth Reading

BiJEPA: Bi-directional Joint Embedding Predictive Architecture

Source: arXiv | cs.LG March 2026 listings

A new entry in the March 2026 cs.LG listings proposes extending JEPA (Joint Embedding Predictive Architecture, the framework behind Meta’s V-JEPA and I-JEPA) with bidirectional prediction. Standard JEPA predicts target representations from context; BiJEPA adds the reverse direction, forcing both encoder branches to learn more symmetric and complete representations. The practical implication for self-supervised learning practitioners is a potential improvement in representation quality without architectural complexity — the bidirectional objective is a training-time change, not an inference-time one. Worth tracking if you’re working with JEPA-family models for vision or multimodal tasks.
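In our notation (the paper's exact formulation may differ), with context/target encoders $E_c, E_t$, cross-direction predictors $P_{c\to t}, P_{t\to c}$, and stop-gradient $\mathrm{sg}$, a symmetric bidirectional objective of this kind has the form:

```latex
\mathcal{L} \;=\; \big\| P_{c\to t}\big(E_c(x_c)\big) - \mathrm{sg}\big[E_t(x_t)\big] \big\|_2^2
\;+\; \big\| P_{t\to c}\big(E_t(x_t)\big) - \mathrm{sg}\big[E_c(x_c)\big] \big\|_2^2
```

The second term is the addition over standard JEPA: the target branch must also predict the context representation, which is what forces the symmetry described above.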

Exact and Asymptotically Complete Robust Verifications of Neural Networks via Quantum Optimization

Source: arXiv | cs.LG March 2026 listings

This paper explores using quantum optimization algorithms for formal verification of neural network robustness — proving that a network’s outputs are stable under input perturbations. While quantum advantage for practical neural network verification remains theoretical, the paper contributes exact verification methods (not just bounds) for small networks, which could serve as ground-truth benchmarks for classical approximation methods. Primarily of interest to the formal verification and AI safety research communities.


🧭 Key Takeaways

  • Claude Code v2.1.84’s PowerShell tool preview is the signal that Windows-native support is becoming a first-class priority. If you’re on Windows, opt in and test it. The allowedChannelPlugins managed setting and MCP description capping at 2KB also show continued enterprise hardening — context bloat from poorly configured MCP servers was a real problem, and this is a pragmatic fix.

  • Arm’s AGI CPU represents the most credible Arm-based challenge to x86 in data center inference. With Meta as co-development partner and commitments from OpenAI, Cerebras, and Cloudflare, this isn’t a paper launch. The 96-lane PCIe Gen 6 and CXL 3.0 support signal that Arm is targeting the memory bandwidth bottleneck that limits inference throughput — watch for benchmarks against Graviton4 and EPYC Turin in the coming months.

  • OpenAI shutting down Sora is the strongest market signal yet that standalone video generation is economically unsustainable at consumer scale. A 75% download decline and unsustainable compute costs killed it despite a billion-dollar Disney partnership. If you were building on Sora’s API, start evaluating alternatives now — and factor compute economics into your vendor selection.

  • OpenClaw’s security vulnerabilities are well-documented and actively exploitable. The link preview prompt injection attack (Telegram/Discord) requires zero user interaction. If you have any autonomous agents connected to messaging platforms with production credentials, audit them this week. This isn’t theoretical — multiple independent research teams have demonstrated working exploits.

  • Vercel AI SDK 6 makes TypeScript a first-class language for building AI agents. Composable agents with stable MCP support, human-in-the-loop tool approval, and the AI Gateway’s zero-markup model access close the gap with Python agent frameworks. If you’ve been building agents in LangGraph but prefer TypeScript, this is the migration point.

  • The on-device AI trend is accelerating at the OEM level. HP IQ (20B model, local-first orchestrator) joins Apple’s on-device Siri intelligence and Qualcomm’s NPU push. The pattern is clear: enterprises want AI capabilities that don’t send data to third-party clouds. Expect every major PC OEM to ship an on-device AI layer by end of 2026.


Generated on March 26, 2026 by Claude