AI Digest — March 25, 2026

Codex Security finds 792 critical vulnerabilities in 1.2M commits with major false positive reduction.

Your daily deep-dive on AI models, tools, research, and developer ecosystem news.


🔖 Project Releases

Claude Code

v2.1.83 released today (March 25). This is a substantial release with a focus on enterprise policy management and developer workflow improvements. The headline feature is managed-settings.d/, a drop-in directory alongside managed-settings.json that lets teams deploy independent policy fragments which merge alphabetically — a significant quality-of-life improvement for organizations managing Claude Code across multiple teams with different configuration needs.
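The alphabetical-merge behavior can be modeled in a few lines. This is a minimal sketch, assuming a shallow merge in which keys from alphabetically later fragments override earlier ones — the release notes confirm alphabetical ordering but not the exact conflict-resolution rules, so treat the override semantics here as an assumption:

```python
import json
from pathlib import Path

def merge_managed_settings(dir_path):
    """Merge *.json fragments from a managed-settings.d/-style directory.

    Fragments are applied in alphabetical filename order; keys in later
    files override earlier ones (a shallow merge -- an assumption, since
    Claude Code's actual merge semantics are not fully specified here).
    """
    merged = {}
    for fragment in sorted(Path(dir_path).glob("*.json")):
        merged.update(json.loads(fragment.read_text()))
    return merged
```

With this model, a team could ship `10-security.json` and `20-model-selection.json` as independent files, with later-sorted fragments winning any key collisions.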

New hook events CwdChanged and FileChanged enable reactive environment management (think automatic direnv reloads when switching directories). Security gets a boost with CLAUDE_CODE_SUBPROCESS_ENV_SCRUB=1, which strips Anthropic and cloud provider credentials from subprocess environments — important for teams running Claude Code in CI/CD pipelines where credential leakage is a concern.
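The scrubbing idea is straightforward to illustrate. The sketch below mimics the behavior with an illustrative prefix list — the actual set of variables Claude Code strips under `CLAUDE_CODE_SUBPROCESS_ENV_SCRUB=1` is its own and may differ:

```python
import os

# Illustrative prefixes only -- Claude Code's real scrub list is not
# reproduced here.
SENSITIVE_PREFIXES = ("ANTHROPIC_", "AWS_", "GOOGLE_", "AZURE_")

def scrubbed_env(env=None):
    """Return a copy of the environment with credential-like variables
    removed, approximating what a subprocess-env scrub does before
    spawning child processes in a CI/CD pipeline."""
    env = dict(os.environ if env is None else env)
    return {k: v for k, v in env.items()
            if not k.startswith(SENSITIVE_PREFIXES)}
```

A child process launched with such an environment never sees the parent's provider credentials, which is the property the new flag is after.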

Other notable additions: transcript search (press / in transcript mode, n/N to navigate matches), sandbox.failIfUnavailable to hard-fail when sandboxing can’t start, initialPrompt frontmatter for agents to auto-submit a first turn, and pasted images now inserting [Image #N] chips at the cursor for positional reference.

The bug fix list is extensive — 25+ fixes including a 1–8 second UI freeze on startup caused by eager audio module loading, a ~3s startup delay from waiting for claude.ai MCP config fetch, and --mcp-config CLI flag bypassing managed policy enforcement (a security-relevant fix). Performance improvements include ~600ms faster claude -p startup, plugin disk caching (no re-fetching), and non-streaming fallback token cap increased from 21k to 64k.

If you’re managing Claude Code in a team environment, the managed-settings.d/ directory is the key upgrade here. You can now deploy separate policy files per team or concern (security, model selection, permissions) without merge conflicts in a single JSON file.

Full release notes: GitHub

Beads

No new release since v0.62.0 reported on March 24. The embedded Dolt backend, Azure DevOps integration, custom status categories, and bd note command from that release remain the latest features.

OpenSpec

No new release since v1.2.0 reported on March 8. The profiles system, propose workflow, and support for Pi and AWS Kiro IDE remain the latest features.


🧵 From the Community (r/LocalLLaMA & r/MachineLearning)

Reddit remains inaccessible via direct fetch. Community discussions are sourced from web search cross-references, secondary aggregators, and content syndicated to other platforms.

Moonshot AI’s Attention Residuals paper is generating serious architectural debate. The Kimi team’s paper (arXiv:2603.15031, published March 15) proposes replacing standard residual connections in Transformers with depth-wise attention, allowing each layer to selectively aggregate earlier representations instead of blindly accumulating everything with fixed unit weights. The community is interested because it’s a drop-in replacement that delivers a 1.25x compute efficiency gain — not a new architecture, but a targeted fix for a known weakness (PreNorm signal dilution at depth). Elon Musk publicly praised the research, which amplified community attention. The practical question practitioners are debating: should this be the default for new training runs, or does it add enough complexity to make fine-tuning and inference more fragile?

KubeCon Europe is driving cloud-native AI infrastructure conversations. Beyond Dapr Agents v1.0 (covered yesterday), Microsoft’s announcement of AI Runway — a common Kubernetes API for inference workloads with HuggingFace model discovery and GPU memory fit indicators — is getting attention from platform engineers who need to deploy models without building custom inference pipelines. The CNCF’s report that 7.3 million AI developers are now cloud-native (out of 19.9 million cloud-native developers total) puts numbers on what practitioners already feel: AI workloads are increasingly Kubernetes-native.

Shadow AI agents are becoming a real governance problem. Nudge Security’s AI agent discovery launch (March 24) is resonating with enterprise developers and security teams. The stat that 80% of organizations have already encountered agentic AI risks from unauthorized agent deployments — agents with hardcoded credentials, unauthenticated MCP connections, and public-facing endpoints — validates what many security-conscious developers have been warning about: the ease of deploying agents on platforms like Microsoft Copilot Studio and Salesforce Agentforce is outrunning governance capabilities.


📰 Technical News & Releases

OpenAI Codex Security: AI-Powered Vulnerability Scanning Finds 792 Critical Issues in 1.2M Commits

Source: OpenAI | Announcement | The Hacker News | Coverage | SecurityWeek | Coverage

Launched March 21 in research preview, Codex Security is an evolution of OpenAI’s Aardvark project (private beta since October 2025) — an AI agent that continuously analyzes codebases, identifies vulnerabilities, and proposes automated patches. During a 30-day beta that scanned over 1.2 million commits across external repositories, it found 792 critical and 10,561 high-severity issues in open-source projects including OpenSSH, GnuTLS, PHP, and Chromium.

The performance claims are notable: up to 84% reduction in alert noise and 50%+ cut in false positives compared to traditional static analysis tools. Available to ChatGPT Pro, Enterprise, Business, and Edu customers via the Codex web interface, with free usage for the next month. For security-conscious development teams, the question is whether an AI-driven scanner that produces substantially fewer false positives can replace or augment existing SAST pipelines like Semgrep, CodeQL, or Snyk — and whether the tradeoff of sending code to OpenAI’s infrastructure is acceptable for your threat model.


Moonshot AI’s Attention Residuals: A Drop-In Fix for Transformer Signal Dilution

Source: Moonshot AI / Kimi Team | arXiv Paper | GitHub | MarkTechPost | Coverage

Published March 15 and gaining traction this week, Attention Residuals (AttnRes) is a targeted architectural improvement from Moonshot AI’s Kimi team that addresses a decade-old limitation in Transformer residual connections. Standard PreNorm residual connections accumulate all layer outputs with fixed unit weights, causing hidden-state magnitude to grow with depth and progressively diluting each layer’s contribution. AttnRes replaces this fixed accumulation with softmax attention over preceding layer outputs, making aggregation learned and input-dependent.

The practical variant, Block AttnRes, partitions layers into ~8 blocks, uses standard residuals within each block, and applies attention only over block-level representations. This recovers most of Full AttnRes’s gains while adding marginal overhead. Integrated into the Kimi Linear architecture (48B total / 3B activated MoE), Block AttnRes reached the baseline’s training loss using 1.25x less compute. The architecture used is identical to Kimi Linear (which follows the DeepSeek-V3 design), interleaving Kimi Delta Attention and Multi-Head Latent Attention layers in a 3:1 ratio — the only modification is the residual connection replacement.
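The core mechanism — learned, input-dependent weights over preceding layer outputs instead of a fixed unit-weight sum — can be sketched for a single token position. This is a toy NumPy illustration under stated assumptions: the `query_proj`/`key_proj` matrices are hypothetical stand-ins, and the paper's actual formulation (multi-head structure, value projections, the block partitioning scheme) is richer than this:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attn_residual(layer_outputs, query_proj, key_proj):
    """Aggregate preceding layer outputs for one token position with
    softmax attention, rather than summing them with fixed unit weights.

    layer_outputs: list of (d,) hidden vectors, one per layer so far;
    the query is derived from the most recent layer's output.
    """
    H = np.stack(layer_outputs)           # (L, d) depth-wise stack
    q = query_proj @ H[-1]                # (d,) query from latest layer
    k = H @ key_proj.T                    # (L, d) keys, one per layer
    scores = k @ q / np.sqrt(len(q))      # (L,) scaled dot products
    w = softmax(scores)                   # learned, input-dependent weights
    return w @ H                          # weighted aggregation over depth
```

Because the weights form a convex combination, the aggregated state no longer grows in magnitude with depth — which is exactly the PreNorm dilution pathology the paper targets.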

For practitioners training models from scratch, this is worth serious evaluation: it’s a principled fix for a known problem, it’s drop-in compatible, code is available, and the compute savings are meaningful at scale. For fine-tuning existing models, adoption is less straightforward since the residual structure is baked in at pre-training time.

The key insight: standard residual connections are a known source of signal dilution at depth, but nobody had a practical replacement until now. Block AttnRes is simple enough to implement and impactful enough to matter — expect this to show up in the next generation of open-weight model architectures.


KubeCon Europe 2026: Microsoft AI Runway, HolmesGPT, and CNCF Expands Kubernetes AI Conformance

Source: Microsoft | Blog | CNCF | K8s AI Conformance | Developer Report | SiliconANGLE | Coverage

KubeCon Europe 2026 (Amsterdam, March 23–26) is in full swing and producing significant AI infrastructure announcements beyond Dapr Agents v1.0 (covered yesterday). Microsoft announced AI Runway, a new open-source project introducing a common Kubernetes API for inference workloads. It ships with a web interface for model deployment, built-in HuggingFace model discovery, GPU memory fit indicators, and support for runtimes including NVIDIA Dynamo, KubeRay, and others. For platform teams juggling multiple inference runtimes, AI Runway provides a single abstraction layer — rather than writing custom operators for each runtime.

HolmesGPT joined the CNCF as a Sandbox project, bringing agentic troubleshooting capabilities into the shared cloud-native tooling ecosystem. The CNCF also nearly doubled the number of certified Kubernetes AI platforms in its conformance program, with stricter v1.35 requirements now codified as Kubernetes AI Requirements (KARs) — essentially standardizing what “AI-ready Kubernetes” means across vendors. The CNCF/SlashData developer survey reports 19.9 million cloud-native developers globally, with 7.3 million AI developers now building on cloud-native infrastructure, confirming that AI workloads are increasingly Kubernetes-native by default.


Nudge Security Launches AI Agent Discovery: Governing Shadow Agentic AI

Source: Nudge Security | Announcement | Technical Blog | PR Newswire | Coverage

Announced March 24, Nudge Security’s AI agent discovery capabilities address what’s becoming the “shadow IT” problem of 2026: employees deploying AI agents across platforms like Microsoft Copilot Studio, Salesforce Agentforce, and n8n without security team awareness. The stat backing the product: 80% of organizations have already encountered agentic AI risks from improper data exposure and unauthorized system access.

The platform discovers agents at their creation source, maps their access patterns, and engages the human creators for context. Risk insights flag specific issues: publicly accessible agents, hardcoded credentials, unauthenticated MCP connections, high-risk integrations, and orphaned agents whose creators have left the organization. For existing Nudge Security customers connected to SaaS environments like Salesforce and ServiceNow, agent discovery extends automatically — no new deployment required. This matters for developers because the agents you build in your dev environment can now be discovered and profiled by your security team. If you’re building agents with production credentials or MCP connections, expect governance conversations to start happening.
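The risk categories listed above lend themselves to simple heuristics. The sketch below is purely illustrative — Nudge Security's actual detection logic is proprietary, and the config field names (`public_endpoint`, `mcp_connections`, `owner_active`) are hypothetical:

```python
import re

# Toy credential pattern (OpenAI-style and AWS-style key shapes);
# a real scanner would use a much broader rule set.
CREDENTIAL_PATTERN = re.compile(r"(sk-[A-Za-z0-9]{16,}|AKIA[0-9A-Z]{16})")

def flag_agent_risks(agent):
    """Return risk flags for a hypothetical agent-config dict, mirroring
    the categories named in the announcement."""
    flags = []
    if agent.get("public_endpoint"):
        flags.append("publicly accessible")
    for value in agent.get("config", {}).values():
        if isinstance(value, str) and CREDENTIAL_PATTERN.search(value):
            flags.append("hardcoded credential")
            break
    for conn in agent.get("mcp_connections", []):
        if not conn.get("auth"):
            flags.append("unauthenticated MCP connection")
            break
    if agent.get("owner_active") is False:
        flags.append("orphaned agent")
    return flags
```

Even a checker this crude makes the governance point: most shadow-agent risk is statically visible in the agent's own configuration.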


Adobe Firefly Expands: Custom Models, 30+ Third-Party Models, and Image Model 5 GA

Source: Adobe | Blog

Released March 19 and ramping up this week, Adobe Firefly received its most significant capability expansion since launch. The centerpiece is custom models in public beta: you upload your own images, Firefly trains a model aligned to your specific style (character, illustration, or photographic), and it becomes a reusable asset for ideation. This is Adobe’s answer to the LoRA/DreamBooth workflow that open-source users have been assembling manually — but integrated into a commercially licensed pipeline.

The model marketplace expansion is equally notable: creators now have access to 30+ models from across the industry, including Google’s Nano Banana 2 and Veo 3.1, Runway’s Gen-4.5, and Kling’s 2.5 Turbo, alongside Adobe’s own Firefly Image Model 5 (now generally available). The Quick Cut beta uses AI to automatically assemble uploaded or generated clips into a structured first edit. For creative professionals, the significance is that Firefly is becoming a model hub rather than a single-model tool — you pick the right model for each task within Adobe’s IP-indemnified ecosystem.


NIST AI Agent Standards Initiative: Comment Period Closes, March 31 Deadline Approaching

Source: NIST | Announcement | Initiative Page

The NIST AI Agent Standards Initiative, launched February 17 through the Center for AI Standards and Innovation (CAISI), is reaching key milestones this week. The public comment period closed March 9, and NIST is now processing domain stakeholder input submitted before March 20. The next critical deadline is March 31, when the draft on automated benchmark evaluations closes — this work will likely define how assurance for AI agents is measured and serve as a direct precursor to how auditors and regulators assess agent deployments.

Three focus areas matter most for practitioners building agents: agent identity and authentication (agents will need enterprise-grade identities beyond API keys), stricter authorization models (least privilege, just-in-time access, task-scoped privileges, action-level approvals), and interoperability standards for multi-agent systems. The initiative explicitly aims to facilitate industry-led agent standards and U.S. leadership in international standards bodies. If you’re building production agent systems, the March 31 automated benchmark evaluation deadline is worth tracking — the standards that emerge from this process will shape compliance requirements for enterprise AI agent deployments.
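To make the authorization concepts concrete, here is a minimal sketch of a just-in-time, task-scoped grant with an explicit allow-list and expiry. The class and field names are illustrative inventions, not drawn from the NIST drafts:

```python
import time

class TaskScopedGrant:
    """Toy model of least-privilege agent authorization: a grant is
    scoped to one task, carries an explicit allow-list of actions, and
    expires, forcing the agent to re-request access (just-in-time)."""

    def __init__(self, task_id, allowed_actions, ttl_seconds):
        self.task_id = task_id
        self.allowed = frozenset(allowed_actions)
        self.expires_at = time.monotonic() + ttl_seconds

    def authorize(self, action):
        if time.monotonic() >= self.expires_at:
            return False  # expired grant: no standing privileges
        return action in self.allowed  # explicit allow-list, deny by default
```

The design choice worth noting is deny-by-default with a bounded lifetime: an agent holding this grant can do nothing outside its task's action set, and nothing at all once the task window closes.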


OpenAI Codex CLI Updates: Device-Code Auth, Plugin Setup, and TUI Improvements

Source: OpenAI | Release Notes

OpenAI’s Codex CLI (the open-source coding agent) received app-server TUI updates this week including ChatGPT device-code sign-in (eliminating the need for manual API key configuration), smoother plugin setup workflows, a new user prompt hook for customizing the interaction flow, and improved realtime session behavior. Bug fixes address startup stalls, history restore reliability, Linux sandbox issues, and agent job finalization edge cases.

These are incremental improvements but they collectively address the friction points that have kept Codex CLI from being a daily-driver tool for many developers. The device-code sign-in flow in particular lowers the barrier to entry — you authenticate through your existing ChatGPT account rather than managing separate API keys. Combined with the security scanning capabilities of Codex Security (see above), OpenAI’s coding agent strategy is becoming more cohesive: Codex CLI for interactive development, Codex Security for automated vulnerability scanning.


OpenAI Plans Workforce Expansion to ~8,000 Employees by End of 2026

Source: Multiple reports | Crescendo AI

OpenAI announced plans to nearly double its workforce to approximately 8,000 employees by end of 2026, primarily to push harder into enterprise sales and product delivery. This comes at a time when Atlassian (which cut 1,600 employees, ~10% of workforce, on March 11) and other tech companies are cutting staff explicitly to redirect resources toward AI development. The contrast is striking: the companies building AI are hiring aggressively while the companies deploying AI are cutting the roles they believe AI will replace. For developers, the practical signal is that OpenAI is investing heavily in enterprise-grade tooling and support — expect continued rapid iteration on their API platform, Codex, and enterprise features throughout 2026.


📄 Papers Worth Reading

Attention Residuals (arXiv:2603.15031)

Authors: Kimi Team, Moonshot AI | Published March 15, 2026 | Paper | Code

Covered in detail above, but the paper merits separate mention for its research contribution. The core finding — that replacing fixed-weight residual accumulation with learned, depth-wise attention yields a 1.25x compute advantage at equivalent loss — is both theoretically clean and practically implementable. The Block AttnRes variant (partitioning into ~8 blocks) makes the idea production-viable by limiting attention overhead. The experiments use Kimi Linear’s MoE architecture (48B total / 3B active, 1.4T training tokens), showing improvements across reasoning, coding, and evaluation benchmarks. The paper explicitly draws an analogy between depth-wise aggregation and temporal attention, suggesting this is a generalizable principle. Code and training configurations are released.

When Does Chain-of-Thought Help: A Markovian Perspective

Authors: Listed on arXiv | March 2026 | arXiv

A recent submission to the cs.LG listing that formalizes when Chain-of-Thought (CoT) prompting actually improves model performance versus when it adds noise. The Markovian framing provides testable predictions about which task structures benefit from step-by-step reasoning. Relevant for anyone designing prompting strategies or deciding when to use thinking-enabled models versus standard inference.


🧭 Key Takeaways

  • Claude Code v2.1.83’s managed-settings.d/ directory is a sleeper feature for teams. If you’re managing Claude Code across an organization, you can now deploy independent policy fragments per team or concern without merge conflicts. Combined with CLAUDE_CODE_SUBPROCESS_ENV_SCRUB=1 for CI/CD credential safety, this release is materially more enterprise-ready.

  • OpenAI Codex Security’s 84% alert noise reduction and 50%+ false positive cut could shift the SAST landscape. If those numbers hold in production across diverse codebases, the main barrier becomes whether you’re comfortable sending code to OpenAI — not whether the tool works. Free for a month; worth benchmarking against your current static analysis pipeline.

  • Moonshot AI’s Attention Residuals is the kind of fundamental architecture improvement that propagates slowly but widely. A 1.25x compute efficiency gain from a drop-in residual connection replacement is significant at training scale. Watch for this showing up in next-gen model architectures from labs that train from scratch.

  • Microsoft AI Runway could become the standard Kubernetes inference API. A common abstraction over NVIDIA Dynamo, KubeRay, and other runtimes, with HuggingFace integration and GPU memory fit indicators — this is what platform teams have been building internally. If it gains adoption, it eliminates a significant amount of custom infrastructure work.

  • Shadow AI agents are now an enterprise governance crisis, not a hypothetical risk. Nudge Security’s finding that 80% of organizations already face agentic AI risks from unauthorized deployments means the “secure by default” ship has sailed. If you’re building agents with production credentials or MCP connections, expect security team scrutiny to increase.

  • The workforce signals are increasingly divergent: AI builders are hiring, AI adopters are cutting. OpenAI targeting 8,000 employees while Atlassian cuts 1,600 “to fund AI” illustrates the structural labor market shift. For developers, the implication is clear: building with and on AI tools is the growth vector.


Generated on March 25, 2026 by Claude