AI Digest — March 30, 2026

Your daily deep-dive on AI models, tools, research, and developer ecosystem news.


🔖 Project Releases

Claude Code

No new release since v2.1.87, reported on March 29. However, v2.1.86 (March 27), which landed between digest cycles, was a substantial release:

  • Added the X-Claude-Code-Session-Id header for session aggregation.
  • Added .jj and .sl VCS directory exclusions (Jujutsu and Sapling).
  • Improved prompt cache hit rates for Bedrock/Vertex/Foundry.
  • Introduced a compact line-number format for the Read tool, with deduplication of unchanged re-reads.
  • Fixed a cluster of bugs, including --resume failures, out-of-memory crashes on /feedback with long sessions, and --bare mode dropping MCP tools.

If you skipped the v2.1.86 update, it's worth pulling: the cache hit improvements alone may meaningfully reduce your API costs on non-direct providers.
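The new session header is mainly useful on the observability side. Below is a minimal sketch of session-level aggregation in a gateway log pipeline; the record shape and field names are assumptions for illustration — only the X-Claude-Code-Session-Id header name comes from the release notes:

```python
from collections import defaultdict

# Hypothetical gateway log records. Only the X-Claude-Code-Session-Id
# header name is taken from the release notes; the rest of the record
# shape is an assumption for illustration.
requests = [
    {"headers": {"X-Claude-Code-Session-Id": "a1"}, "input_tokens": 1200, "output_tokens": 300},
    {"headers": {"X-Claude-Code-Session-Id": "a1"}, "input_tokens": 800, "output_tokens": 150},
    {"headers": {"X-Claude-Code-Session-Id": "b2"}, "input_tokens": 500, "output_tokens": 90},
]

def aggregate_by_session(records):
    """Group token usage per Claude Code session using the session-id header."""
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "requests": 0})
    for rec in records:
        sid = rec["headers"].get("X-Claude-Code-Session-Id", "unknown")
        totals[sid]["input_tokens"] += rec["input_tokens"]
        totals[sid]["output_tokens"] += rec["output_tokens"]
        totals[sid]["requests"] += 1
    return dict(totals)

print(aggregate_by_session(requests))
```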

Full release notes: GitHub

Beads

No new release since v0.62.0 reported on March 28. Azure DevOps integration, bd note command, custom status categories, and enhanced audit logging remain the latest features.

OpenSpec

No new release since v1.2.0 reported on March 8. Profiles, propose workflow, Pi and AWS Kiro IDE support remain the latest features.


🧵 From the Community (r/LocalLLaMA & r/MachineLearning)

Reddit remains inaccessible via direct fetch. Community discussions are sourced from web search cross-references, secondary aggregators, and content syndicated to other platforms.

GLM-5.1 coding benchmarks are drawing serious scrutiny. Z.ai’s claim that GLM-5.1 reaches 94.6% of Claude Opus 4.6’s coding performance (45.3 vs 47.9 on their internal coding eval) has the community split. Some practitioners are excited about the cost implications — the GLM Coding Plan starts at $3/month versus Anthropic’s API pricing — while others are questioning whether the benchmark methodology is truly apples-to-apples. The broader context matters: GLM-5’s original release on domestic Huawei Ascend 910B chips (zero NVIDIA GPUs) already made it a technical landmark, and this post-training iteration closing the gap further on coding is fueling discussion about whether the US export controls are actually slowing Chinese AI labs.

The Mythos leak is dominating safety-focused ML discussion. The accidental exposure of Anthropic’s draft blog post describing Claude Mythos as posing “unprecedented cybersecurity risks” has created an unusual dynamic: a frontier lab’s own internal safety assessment is now public before the model itself. Researchers are debating whether Anthropic’s proposed “defender-first” release strategy (early access for cyber defense organizations) is genuinely novel responsible deployment or marketing positioning. The fact that cybersecurity stocks dropped measurably on the news suggests the market is taking the capability claims seriously.

Codex plugins are being compared to Claude Code’s skill/plugin system. With OpenAI launching 20+ plugins for Codex that extend it beyond coding into workflow automation (Figma, Notion, Gmail, Slack integrations), the community is drawing direct comparisons to Claude Code’s existing plugin architecture. The consensus so far: Codex’s plugin ecosystem is catching up fast, but Claude Code’s tighter integration with the MCP protocol and its skill system still offers more flexibility for custom agent workflows.


📰 Technical News & Releases

Anthropic’s “Mythos” Model Leaked via Unsecured Data Cache

Source: Fortune (March 26) | Fortune Follow-up (March 27) | CoinDesk (March 28)

A misconfigured content management system at Anthropic left close to 3,000 unpublished blog assets in a publicly accessible data cache, including a draft post describing a new model called Claude Mythos (internal codename “Capybara”). The model represents a new tier above the current Opus line, and Anthropic describes it as “a step change and the most capable we’ve built to date,” scoring “dramatically higher” than Claude Opus 4.6 on coding, academic reasoning, and cybersecurity benchmarks. The cybersecurity angle is the headline: leaked internal assessments warn Mythos is “currently far ahead of any other AI model in cyber capabilities” and could “exploit vulnerabilities in ways that far outpace the efforts of defenders.” Anthropic’s planned release strategy — giving cyber defense organizations early access before general availability — is a novel approach to responsible deployment, but the leak itself (a human configuration error exposing internal safety assessments) is deeply ironic. Cybersecurity stocks dropped measurably on the news, suggesting markets interpret this as an acceleration of AI-driven offensive capabilities.

The Mythos model is NOT released or available. What leaked was internal documentation and safety assessments. If you see anyone claiming API access to “Claude Mythos” or “Capybara,” it’s a scam. The model’s actual release timeline remains unknown.


Z.ai Ships GLM-5.1: Post-Training Upgrade Closes Coding Gap with Opus

Source: BuildFastWithAI | DigitalApplied | Apiyi Analysis

On March 27, Z.ai (formerly Zhipu AI) released GLM-5.1, a post-training upgrade to its GLM-5 foundation model that specifically targets coding performance. The architecture is unchanged: the same 744B-parameter MoE with 40B active parameters per token, 200K context window, and DeepSeek Sparse Attention. The post-training push, however, yields a 28% improvement over GLM-5 on their coding eval (45.3 vs 35.4), reaching 94.6% of Claude Opus 4.6's score of 47.9. The cost story is compelling: the GLM Coding Plan starts at $3/month, making this potentially the most cost-effective near-frontier coding model available. GLM-5.1 is not yet open-source, but Z.ai has teased an open-source release (GLM-5 weights are already on Hugging Face under MIT). For context, the entire GLM-5 family was trained exclusively on Huawei Ascend 910B chips with no NVIDIA hardware; a non-NVIDIA-trained model reaching 94%+ of Opus on coding tasks is a significant data point in the debate over whether US export controls are actually slowing Chinese AI labs.

If you’re evaluating coding models for cost-sensitive production use cases, GLM-5.1 at $3/month is worth benchmarking against your specific workloads. The 28% improvement over GLM-5 is unusually large for a post-training-only update, suggesting Z.ai found significant headroom in their RLHF/DPO pipeline.
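To make the cost claim concrete, here is a back-of-envelope break-even sketch. The per-token Opus rates below are illustrative assumptions, not published Opus 4.6 prices:

```python
# Flat $3/month GLM Coding Plan vs per-token frontier API pricing.
# The per-million-token rates are illustrative assumptions only.
GLM_PLAN_USD_PER_MONTH = 3.00
OPUS_USD_PER_M_INPUT = 15.00   # assumed rate
OPUS_USD_PER_M_OUTPUT = 75.00  # assumed rate

def opus_monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-token API cost for a given monthly token volume."""
    return (input_tokens / 1e6) * OPUS_USD_PER_M_INPUT + \
           (output_tokens / 1e6) * OPUS_USD_PER_M_OUTPUT

# Under these assumed rates, a single moderate coding session
# (~100K input / 20K output tokens) already matches the flat plan price.
cost = opus_monthly_cost(100_000, 20_000)
print(f"${cost:.2f}")  # → $3.00
```

The point is not the exact rates but the shape of the curve: flat-rate pricing dominates as soon as monthly volume clears a single serious session.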


OpenAI Expands Codex Beyond Coding with Plugin System

Source: gHacks (March 29) | SiliconANGLE (March 27) | InfoWorld

OpenAI announced on March 26 that Codex now supports plugins — installable bundles that combine skills, app connectors, and resource configurations into shareable packages. Over 20 plugins are available at launch across the Codex app, CLI, and VS Code extension, with integrations including Figma, Notion, Gmail, Google Drive, and Slack. This is a strategic inflection point: Codex is no longer positioned as a coding assistant but as a general-purpose agentic work platform. The plugin architecture lets teams package custom workflows (planning, research, team coordination) and share them internally or publicly. The timing is notable — Claude Code has offered a similar plugin/skill system for months, and this move brings Codex to feature parity on the extensibility front. With over 1 million developers using Codex monthly, the plugin ecosystem could scale quickly. For enterprises, the governance angle matters: plugins provide a structured way to control what Codex agents can access and do, which is a prerequisite for production deployment in regulated environments.


Google Launches Lyria 3 Pro: 3-Minute AI Music Tracks with Structural Awareness

Source: TechCrunch (March 25) | Google Blog | WinBuzzer

Google released Lyria 3 Pro on March 25, a significant upgrade over Lyria 3 that generates music tracks up to three minutes long (up from 30 seconds) with explicit understanding of song structure: intros, verses, choruses, and bridges can be specified in natural language prompts. The model is available across Vertex AI, Google AI Studio, the Gemini API, Google Vids, the Gemini app (paid subscribers only), and ProducerAI. All generated tracks are watermarked with SynthID. Google states it trained the model on partner data and permissible YouTube/Google data, which positions it more conservatively on copyright than competitors like Suno or Udio, which face ongoing lawsuits. For developers building audio-enabled applications, the Gemini API availability means you can integrate music generation alongside text and image generation in a single API surface. The jump from 30 seconds to 3 minutes with structural awareness is a meaningful capability threshold: it makes the model usable for actual background music, content creation, and prototyping rather than just demos.
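As a sketch of what a structured-prompt integration could look like, the payload builder below is purely illustrative: the model identifier and field names are assumptions, not Google's published API:

```python
import json

def build_music_request(prompt: str, duration_s: int = 180) -> dict:
    """Assemble a hypothetical structured-music-generation payload.

    The "lyria-3-pro" identifier and the field names are illustrative
    assumptions; only the 3-minute ceiling comes from the launch notes.
    """
    return {
        "model": "lyria-3-pro",          # assumed model identifier
        "prompt": prompt,
        "duration_seconds": duration_s,  # up to 3 minutes per the launch notes
    }

payload = build_music_request(
    "Upbeat synthwave: 15s intro, two verses, a chorus after each verse, outro"
)
print(json.dumps(payload, indent=2))
```

The interesting part is the prompt itself: structural awareness means section layout ("intro, verse, chorus, bridge") is expressed in natural language rather than a timeline format.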


OpenAI Codex Security: 1.2M Commits Scanned, 10,561 High-Severity Findings

Source: The Hacker News | OpenAI Blog | SecurityWeek

OpenAI's Codex Security agent (the production evolution of the Aardvark project from October 2025) has now scanned over 1.2 million commits across external repositories during its beta period, identifying 792 critical and 10,561 high-severity vulnerabilities in projects including OpenSSH, GnuTLS, GOGS, Chromium, and PHP. The key technical differentiator is automated validation: Codex Security uses reasoning models to generate a validating exploit for each finding, and OpenAI reports false positive rates have dropped by more than 50% across all repositories over the beta period. The tool is now in research preview for ChatGPT Pro, Enterprise, Business, and Edu customers via Codex web, with free usage for the first month. This positions Codex Security differently from traditional SAST/DAST tools: it scans at commit granularity with model-generated fix suggestions and validation, which is closer to how a human security researcher works than to how existing automated scanners operate. The practical question is whether the false positive rate is low enough for integration into CI/CD pipelines without creating alert fatigue.

If you’re running an open-source project of any significance, your codebase may have already been scanned by Codex Security during the beta. Check OpenAI’s disclosure program for any findings related to your repositories.
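The reported numbers support a quick scale check on the alert-fatigue question:

```python
# Scale check using the figures reported for the Codex Security beta.
commits_scanned = 1_200_000
critical = 792
high = 10_561

findings = critical + high  # 11,353 critical-or-high findings
per_10k_commits = findings / commits_scanned * 10_000
print(f"{per_10k_commits:.0f} findings per 10K commits")  # → 95 findings per 10K commits
```

Roughly one critical-or-high finding per ~106 commits scanned; whether that volume is tractable in CI/CD depends entirely on how many findings survive the exploit-validation step.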


OpenAI Ships Codex Hooks: Extensibility for the Agentic Loop

Source: OpenAI Developers | Codex Changelog | Augment Code Analysis

Codex v0.116.0 (March 19) introduced an experimental hooks engine that lets developers inject custom scripts into the agentic loop at key lifecycle points — SessionStart, Stop, and a new UserPromptSubmit hook that can block or augment prompts before execution. The practical applications are immediately valuable for enterprise deployment: you can pipe all conversations to custom logging/analytics, scan prompts to block accidentally-pasted API keys or secrets, summarize conversations to create persistent memory automatically, and run custom validators before any agent action executes. The UserPromptSubmit hook is particularly interesting — it creates a programmable gateway before any prompt enters the agent’s execution context, which is exactly what compliance and security teams need to feel comfortable with agentic coding tools in production. Combined with the plugin system (covered above), Codex is building the infrastructure layer that enterprises require: extensibility for custom workflows (plugins) plus governance and observability (hooks).
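A minimal sketch of the secret-scanning use case in the spirit of the UserPromptSubmit hook. The hook wiring and exit-code contract are Codex-defined and not shown here, and the regex patterns are illustrative, not exhaustive:

```python
import re

# Illustrative secret heuristics, not an exhaustive ruleset.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                   # AWS access key ID shape
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|token)\s*[:=]\s*\S{16,}"),
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt looks safe, False if it appears to contain a secret."""
    return not any(p.search(prompt) for p in SECRET_PATTERNS)

# A real hook script would read the submitted prompt, call screen_prompt,
# and signal "block" back to Codex (e.g., via a nonzero exit code).
print(screen_prompt("please refactor the parser module"))   # → True
print(screen_prompt("use key AKIAABCDEFGHIJKLMNOP"))        # → False
```

The same gateway shape generalizes to the other uses the release describes: logging, redaction, or prompt augmentation before the agent ever sees the input.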


Anthropic in Talks for October IPO, Eyeing $60B+ Raise

Source: Bloomberg (March 27) | The Tech Portal | CNBC

Bloomberg reported on March 27 that Anthropic executives have discussed an IPO as early as Q4 2026, with bankers expecting the company to raise over $60 billion — which would make it one of the largest tech IPOs in history. The company was valued at $380 billion in its February Series G ($30B raised). Goldman Sachs, JPMorgan, and Morgan Stanley are in early discussions for leading roles. The timing context matters for developers: an IPO-track company has strong incentives to demonstrate revenue growth and platform stickiness, which likely means aggressive feature shipping and competitive API pricing through the rest of 2026. Combined with the Mythos leak (suggesting a major model upgrade is coming) and the recent Claude mobile work tools expansion, Anthropic appears to be building the narrative of a full-stack AI platform ahead of a public listing. No formal filing has been made, and timelines could change.


Chinese State Hackers Targeted Claude Code in Coordinated Campaign

Source: Fortune (March 26) | Crescendo AI

Buried in the broader Mythos leak coverage was a significant security disclosure: Anthropic reported that Chinese state-sponsored hacking groups had been running a coordinated campaign using Claude Code to infiltrate roughly 30 organizations before detection. This is one of the first publicly documented cases of a nation-state actor weaponizing an AI coding agent as an attack tool at scale. The implications extend beyond Anthropic — any agentic coding tool with sufficient capability (Codex, Cursor, Windsurf) could be similarly leveraged. The attack chain reportedly involved using Claude Code’s ability to understand codebases, generate code, and execute shell commands to accelerate reconnaissance and exploitation across target organizations. Coming on the heels of the Langflow RCE and LiteLLM supply chain attacks covered in previous digests, this represents an escalation: from attacking AI infrastructure to using AI infrastructure as an attack vector.

If your organization uses any agentic coding tool with network access and shell execution capabilities, review your security posture. Audit what the tool can access, what credentials it can reach, and what network egress is permitted. The threat model for AI coding agents now explicitly includes nation-state adversaries.
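As a starting point for that audit, here is a sketch under the assumption that the agent inherits its parent process environment; the name heuristics are illustrative:

```python
import os
import re

# Heuristic name patterns for credential-like environment variables.
SENSITIVE_NAME = re.compile(r"(?i)(key|token|secret|password|credential)")

def credential_like_env_vars(environ=os.environ) -> list:
    """Return the names (never the values) of suspicious environment variables
    that an agent process spawned from this environment could read."""
    return sorted(name for name in environ if SENSITIVE_NAME.search(name))

demo_env = {"AWS_SECRET_ACCESS_KEY": "x", "PATH": "/usr/bin", "GH_TOKEN": "y"}
print(credential_like_env_vars(demo_env))  # → ['AWS_SECRET_ACCESS_KEY', 'GH_TOKEN']
```

Environment variables are only one leg of the audit; mounted credential files, keychain access, and network egress rules need the same treatment.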


📄 Papers Worth Reading

GLM-5: From Vibe Coding to Agentic Engineering

Authors: Z.ai Research | arXiv: 2602.15763

The technical report for the GLM-5 family provides detailed documentation of the “Slime” reinforcement learning technique, which cut the hallucination rate from 90% in GLM-4.7 to 34% in GLM-5. The paper details the MoE architecture (744B total, 40B active), the training on 100K Huawei Ascend 910B chips using MindSpore, and the DeepSeek Sparse Attention mechanism. Particularly relevant now that GLM-5.1 has shipped: the post-training pipeline described in this paper is the foundation that yielded the 28% coding improvement, suggesting the architecture has significant remaining headroom for task-specific optimization.

MiMo-V2-Flash Technical Report

Authors: Xiaomi MiMo Team | arXiv: 2601.02780

Worth revisiting in the context of the current open-source model discussions: MiMo-V2-Flash’s hybrid attention architecture (5:1 sliding window to global attention ratio with 128-token windows) achieves nearly 6x KV-cache reduction while maintaining long-context performance via learnable attention sink bias. The paper documents the architectural choices (309B total / 15B active parameters, 150 tok/s inference) that make MiMo-V2-Flash the top-ranked open-source model on SWE-bench Verified at roughly 3.5% of Claude Sonnet 4.5’s cost. The Multi-Token Prediction approach is particularly interesting for anyone building inference-optimized serving infrastructure.
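The KV-cache claim is easy to sanity-check under a simple uniform-per-layer cost model (an assumption of this sketch, not a detail from the paper):

```python
# With a 5:1 ratio of sliding-window layers (128-token window) to global
# layers, average per-layer KV entries shrink as context grows. This
# assumes each layer's cache entry has the same cost, which is a
# simplification of this sketch.
def kv_reduction(context_len: int, window: int = 128, ratio: int = 5) -> float:
    """Dense-attention cache size divided by hybrid-attention cache size."""
    dense = context_len  # every layer caches the full context
    hybrid = (ratio * min(window, context_len) + context_len) / (ratio + 1)
    return dense / hybrid

print(f"{kv_reduction(131_072):.2f}x")  # → 5.97x
```

The ratio approaches exactly 6x as context length grows, matching the paper's "nearly 6x" framing for long contexts.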


🧭 Key Takeaways

  • The Mythos leak is more significant for what it reveals about Anthropic’s internal safety assessment process than for the model itself. A frontier lab’s own unvarnished capability evaluation — including explicit acknowledgment of “unprecedented cybersecurity risks” — is now public. This sets a precedent: future model safety assessments will be written with the assumption they could leak, which may make them less candid.

  • GLM-5.1 reaching 94% of Opus coding performance at $3/month should change your cost modeling. If you’re running high-volume coding tasks against frontier APIs, benchmark GLM-5.1 on your specific workloads. The 28% post-training improvement over GLM-5 suggests Z.ai found significant optimization headroom, and the promised open-source release could eliminate even that $3 price floor.

  • OpenAI’s Codex is now a direct platform competitor to Claude Code, not just a coding tool. Plugins for workflow automation + hooks for governance/observability + Security for vulnerability scanning = a full agentic work platform. If you’ve standardized on one tool, evaluate whether the other’s latest capabilities address gaps in your setup.

  • Nation-state actors are now weaponizing AI coding agents — not just attacking AI infrastructure. The documented Chinese state campaign using Claude Code to infiltrate ~30 organizations is a qualitative escalation from the Langflow/LiteLLM attacks. Your threat model for agentic coding tools needs to include adversarial use by sophisticated actors with access to the same tools your team uses.

  • Google’s Lyria 3 Pro crossing the 3-minute generation threshold with structural awareness makes AI music practically usable. If you’re building content creation tools, the Gemini API integration means music generation is now available alongside text and image in a single API surface — a significant reduction in integration complexity.

  • Codex Security’s automated validation approach (exploit generation + declining false positive rates) is the template for AI-powered security tooling. The 50% false positive reduction over the beta period suggests the model-in-the-loop validation approach works. Watch for this pattern to spread to other security tool categories.


Generated on March 30, 2026 by Claude