MODEL
Qwen3.6-35B-A3B
model, topic-note, alibaba, open-source, moe
Overview
Qwen3.6-35B-A3B is a Mixture-of-Experts (MoE) model in Alibaba’s Qwen3.6 family with 35B total parameters and only 3B active parameters per forward pass, making it a sub-10B-active model that runs on modest consumer hardware. It supports Multi-Token Prediction (MTP), which enables speculative-decoding acceleration in llama.cpp and other inference runtimes. It is notable for achieving competitive agentic-coding benchmark scores against models with far larger active parameter counts.
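As a rough illustration of why speculative decoding can speed up generation without changing the output, here is a minimal greedy draft-then-verify sketch. It is a generic toy, not llama.cpp's implementation and not Qwen's MTP heads: `target`, `draft`, `speculative_step`, and `generate` are hypothetical names, and the "models" are plain functions that map a token list to the next token.

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: greedy next token given the context.
NextToken = Callable[[List[Token]], Token]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify round and return the accepted tokens."""
    # 1. The cheap draft proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target checks each proposal. A real runtime would score all
    #    k positions in one batched forward pass; this toy calls the
    #    target sequentially, so it shows the logic, not the speedup.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target contributes one more.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], n_tokens: int, k: int = 4) -> List[Token]:
    """Generate n_tokens after the prompt using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[: len(prompt) + n_tokens]


def greedy(target: NextToken, prompt: List[Token], n_tokens: int) -> List[Token]:
    """Baseline: plain one-token-at-a-time greedy decoding."""
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target(out))
    return out


# Toy models: the target counts upward mod 10; the draft agrees except
# after a 2, where it guesses wrong.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10
```

In this greedy setting every emitted token is one the target itself would have chosen, so `generate(toy_target, toy_draft, [0], 12)` matches `greedy(toy_target, [0], 12)`; the draft only affects how many tokens each round accepts.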
Timeline
- 2026-05-17-AI-Digest: Lands on Terminal-Bench 2.0 leaderboard at 24.6% via the little-coder scaffold, above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). Comparison is scaffold-sensitive: Gemini 2.5 Pro on Terminus 2 scores 32.6%, and the leaderboard top is Claude Opus 4.7 at 90.2%. MTP support also merges into llama.cpp (PR #22673) for the Qwen3.6 family the same day, with community benchmarks reporting up to +111% generation-speed gains on AMD Strix Halo hardware (user-reported, not independently benchmarked).
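To put the figures in this entry in proportion, the short sketch below converts the reported percentage gain into a throughput multiplier and computes the percentage-point gaps between leaderboard scores. The helper names `speedup_factor` and `point_gap` are illustrative, and the numbers are copied from the entry itself; the +111% figure is user-reported.

```python
def speedup_factor(percent_gain: float) -> float:
    """Convert a '+X%' speed gain into a multiplier over the baseline."""
    return 1.0 + percent_gain / 100.0


def point_gap(score_a: float, score_b: float) -> float:
    """Difference between two benchmark scores, in percentage points."""
    return round(score_a - score_b, 1)


# Terminal-Bench 2.0 scores quoted in the entry (percent).
scores = {
    "Qwen3.6-35B-A3B / little-coder": 24.6,
    "Gemini 2.5 Pro / Gemini CLI": 19.6,
    "Qwen3-Coder-480B / Terminus 2": 23.9,
    "Gemini 2.5 Pro / Terminus 2": 32.6,
}

qwen = scores["Qwen3.6-35B-A3B / little-coder"]
for name, value in scores.items():
    if name != "Qwen3.6-35B-A3B / little-coder":
        print(f"{name}: {point_gap(qwen, value):+.1f} points relative to Qwen3.6-35B-A3B")

# A +111% gain means roughly 2.11x the baseline generation speed.
print(f"MTP speedup multiplier: {speedup_factor(111):.2f}x")
```

The same Gemini 2.5 Pro model sits 5.0 points below Qwen3.6-35B-A3B on one scaffold and 8.0 points above it on another, which is the scaffold sensitivity the entry describes.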
Key Developments
- Terminal-Bench 2.0 Leaderboard Debut: 24.6% with the little-coder scaffold is a notable result for a sub-10B-active model; demonstrates that scaffold choice is as important as model size in agentic-coding benchmarks.
- MTP llama.cpp Merge: PR #22673 landing MTP speculative-decoding for the Qwen3.6 family is the most significant local-inference unlock of the week for community users on modest consumer hardware.
Related
See also: Qwen, Alibaba, MOC - Open Source Models, MOC - Agentic Coding