MODEL
EMO
Overview
EMO is a 1B-active / 14B-total mixture-of-experts model released by ai2 (the Allen Institute for AI) on May 9, 2026, trained on 1T tokens. Its main structural distinction is document-level expert routing: experts cluster around domains (health, news, etc.) rather than surface patterns. Most published MoE designs route per-token; document-level routing is closer to retrieval-augmented sparsity than to Mixtral-style per-token gating. The weights are open, with the full collection at allenai/emo on Hugging Face.
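To make the per-token vs. document-level contrast concrete, here is a minimal toy sketch. It is not EMO's actual router: the mean-pooling of router scores over the document, the function names, and the example scores are all assumptions chosen only to illustrate the granularity difference.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Indices of the k largest probabilities, highest first."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def route_per_token(token_scores, k=1):
    """Per-token gating: every token picks its own experts."""
    return [top_k(softmax(s), k) for s in token_scores]

def route_per_document(token_scores, k=1):
    """Hypothetical document-level routing (an assumption, not EMO's method):
    average the router scores over all tokens, pick experts once,
    and reuse that choice for every token in the document."""
    n_experts = len(token_scores[0])
    mean = [sum(s[e] for s in token_scores) / len(token_scores)
            for e in range(n_experts)]
    chosen = top_k(softmax(mean), k)
    return [chosen for _ in token_scores]

# Made-up router scores: 3 tokens x 3 experts.
scores = [
    [2.0, 1.0, 0.0],
    [0.0, 1.5, 1.0],
    [1.8, 0.5, 0.2],
]
per_token = route_per_token(scores)        # expert choice can change token to token
per_document = route_per_document(scores)  # one expert choice for the whole document
```

In this toy input the per-token router sends the middle token to a different expert than its neighbours, while the document-level variant keeps all three tokens on the same expert, which is the property that lets experts specialise by domain.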
Timeline
- 2026-05-09-AI-Digest: ai2 releases EMO on Hugging Face: 1B-active / 14B-total MoE, 1T training tokens, document-level expert routing. Surfaced via an r/LocalLLaMA thread (110 upvotes, 12 comments). Worth flagging because the structural choice (document-level routing) is the substantive angle, not the absolute capability tier.
Key Developments
- Document-Level Routing: Experts cluster around domains (health, news, etc.) at document granularity rather than per-token. The architectural distinction is closer to retrieval-augmented sparsity than to Mixtral-style per-token gating, and is the load-bearing reason to track EMO inside the open-weights MoE cohort.
- 1B-Active / 14B-Total Footprint: Smaller activated-parameter footprint than frontier-tier Chinese-lab MoE releases (DeepSeek V4 at ~37B active, Qwen3.6-27B), positioning EMO inside the consumer-deployable open-weights tier rather than the data-centre-frontier tier.
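As a quick sanity check on the footprint figures above, the share of parameters active per forward pass can be worked out directly. This sketch uses only the rounded 1B and 14B numbers quoted in this note; the helper name is hypothetical and exact parameter counts may differ.

```python
def active_fraction(active_params, total_params):
    """Share of total parameters that are active per forward pass."""
    return active_params / total_params

# Rounded figures from this note: 1B active, 14B total.
emo_fraction = active_fraction(1e9, 14e9)
print(f"EMO active share: {emo_fraction:.1%}")  # prints "EMO active share: 7.1%"
```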
Related
See also: ai2, MOC - Open Source Models.