AutoMuon

Overview

AutoMuon is a Python package that provides automatic optimizer selection for model training. The tool enables one-line substitution of AdamW with Muon, automatically assigning Muon to 2D matrices while keeping AdamW for embeddings, norms, and biases.
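
To make the assignment rule concrete, here is a minimal sketch in plain Python. It is not AutoMuon's actual API: the function name partition_params, the embedding_hints argument, and the example parameter names are all hypothetical, chosen only for illustration. The sketch assumes each parameter is described by a name and a shape, and that embedding tables can be recognised by a substring in their name. Embeddings need that extra check because they are also 2D; norms and biases are typically 1D and fall through to AdamW on shape alone.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def partition_params(
    named_shapes: Iterable[Tuple[str, Sequence[int]]],
    embedding_hints: Sequence[str] = ("embed",),  # hypothetical naming heuristic
) -> Dict[str, List[str]]:
    """Split parameter names into a Muon group and an AdamW group.

    Rule sketched from the description above: 2D weight matrices go to Muon,
    everything else (embeddings, norms, biases) stays on AdamW.
    """
    groups: Dict[str, List[str]] = {"muon": [], "adamw": []}
    for name, shape in named_shapes:
        is_matrix = len(shape) == 2
        # Embedding tables are 2D too, so they are excluded by name here.
        is_embedding = any(hint in name.lower() for hint in embedding_hints)
        target = "muon" if is_matrix and not is_embedding else "adamw"
        groups[target].append(name)
    return groups


# Illustrative parameter names and shapes for a small transformer block.
example = [
    ("tok_embeddings.weight", (32000, 512)),
    ("layers.0.attn.wq.weight", (512, 512)),
    ("layers.0.attn.wq.bias", (512,)),
    ("layers.0.norm.weight", (512,)),
    ("layers.0.mlp.up.weight", (2048, 512)),
]

groups = partition_params(example)
print("Muon: ", groups["muon"])
print("AdamW:", groups["adamw"])
```

With a PyTorch model, the input could be built from model.named_parameters() as pairs of (name, tuple(p.shape)), and each group then handed to its own optimizer. A name-based check is only one option; inspecting module types is another, and the source does not say which approach AutoMuon itself uses.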

Timeline

2026-04-26-AI-Digest: AutoMuon ships as a one-line drop-in for AdamW (r/MachineLearning, 7 upvotes, 1 comment; low engagement but high signal-to-noise). The friction-reducing framing is the strategic move. Muon’s empirical wins on training speed have been stuck behind an implementation question, “is it worth the per-parameter rewiring?”, that AutoMuon removes. By auto-scanning a model and assigning Muon to 2D matrices while keeping AdamW for embeddings, norms, and biases, it moves Muon from “research optimization” toward “standard practice”. This is the pattern by which an optimization travels from academic paper to production deployment.

Key Developments

Friction Reduction for Muon Adoption: Transforms Muon from a custom-optimizer-per-model decision into an automatic one-liner, removing the engineering friction that keeps optimizations in the research tier.
Selective Optimizer Assignment: The partitioning rule (Muon for 2D matrices, AdamW for embeddings, norms, and biases) is conservative enough to apply broadly without per-model tuning, which is the practical path to adoption.
From Paper to Practice: The framing as a “drop-in replacement” signals the maturation path: an academic optimization becomes a standard library primitive when the friction cost (engineering time, per-model tuning) is removed.