MODEL

Qwen3.6-35B-A3B

model, topic-note, alibaba, open-source, moe

Overview

Qwen3.6-35B-A3B is a Mixture-of-Experts (MoE) model in Alibaba’s Qwen3.6 family with 35B total parameters, of which only about 3B are active per token. That makes it a sub-10B-active model that runs on modest consumer hardware. It supports Multi-Token Prediction (MTP), which enables speculative-decoding acceleration in llama.cpp and other inference runtimes. It is notable for posting agentic-coding benchmark scores that are competitive with models that have far larger active parameter counts.
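As a rough sanity check on the headline figures, the share of parameters that is active can be worked out directly. This is a minimal sketch that uses only the rounded 35B and 3B numbers quoted in this note; the constant and function names are illustrative and do not come from any Qwen tooling.

```python
# Rounded figures quoted in this note, in billions of parameters.
TOTAL_PARAMS_B = 35.0
ACTIVE_PARAMS_B = 3.0


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that is active, as a value in [0, 1]."""
    if total_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_b / total_b


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
print(f"Active share: {fraction:.1%}")
```

With these rounded inputs the active share is about 8.6%. In an MoE model this ratio mainly describes compute per token; the full set of weights generally still has to be stored somewhere the runtime can reach, so memory needs track the 35B total rather than the 3B active figure.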

Timeline

  • 2026-05-17-AI-Digest — Lands on Terminal-Bench 2.0 leaderboard at 24.6% via the little-coder scaffold, above Gemini 2.5 Pro on Gemini CLI (19.6%) and Qwen3-Coder-480B on Terminus 2 (23.9%). Comparison is scaffold-sensitive: Gemini 2.5 Pro on Terminus 2 scores 32.6%, and the leaderboard top is Claude Opus 4.7 at 90.2%. MTP support also merges into llama.cpp (PR #22673) for the Qwen3.6 family the same day, with community benchmarks reporting up to +111% generation-speed gains on AMD Strix Halo hardware (user-reported, not independently benchmarked).
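The two numeric claims in this entry can be made concrete with a little arithmetic. The sketch below uses only the scores and the +111% figure quoted above; the dictionary layout and the helper names `scaffold_spread` and `speedup_multiplier` are hypothetical and exist only for illustration.

```python
# Terminal-Bench 2.0 scores (%) as quoted in this note.
# Keys are (model, scaffold) pairs; this layout is an illustrative assumption.
SCORES = {
    ("Qwen3.6-35B-A3B", "little-coder"): 24.6,
    ("Gemini 2.5 Pro", "Gemini CLI"): 19.6,
    ("Gemini 2.5 Pro", "Terminus 2"): 32.6,
    ("Qwen3-Coder-480B", "Terminus 2"): 23.9,
}


def scaffold_spread(model: str) -> float:
    """Max minus min score (percentage points) for one model across listed scaffolds."""
    values = [score for (name, _), score in SCORES.items() if name == model]
    if not values:
        raise KeyError(f"no scores recorded for {model!r}")
    return round(max(values) - min(values), 1)


def speedup_multiplier(percent_gain: float) -> float:
    """Convert a '+N%' throughput gain into a plain multiplier."""
    return round(1.0 + percent_gain / 100.0, 2)


print(scaffold_spread("Gemini 2.5 Pro"))  # spread caused by the scaffold alone
print(speedup_multiplier(111))            # user-reported MTP gain as a multiplier
```

Read this way, the same Gemini 2.5 Pro model moves by 13.0 percentage points depending on the scaffold, which is far larger than the 0.7-point gap between Qwen3.6-35B-A3B and Qwen3-Coder-480B. A +111% gain corresponds to roughly 2.11x the baseline generation speed, subject to the same user-reported caveat.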

Key Developments

  1. Terminal-Bench 2.0 Leaderboard Debut: 24.6% with the little-coder scaffold is a notable result for a sub-10B-active model. It also suggests that scaffold choice can matter as much as model size in agentic-coding benchmarks: Gemini 2.5 Pro moves from 19.6% on Gemini CLI to 32.6% on Terminus 2.
  2. MTP llama.cpp Merge: PR #22673, which lands MTP speculative decoding for the Qwen3.6 family, is the most significant local-inference unlock of the week for community users on modest consumer hardware. The reported speed gains are user-reported and not yet independently benchmarked.

See also: Qwen, Alibaba, MOC - Open Source Models, MOC - Agentic Coding