MODEL

model · topic-note · open-source · alibaba · qwen

Qwen 3.5

Alibaba’s efficient open-weights small-to-mid-size model series, and the general-purpose default across r/LocalLLaMA by mid-April 2026. Distinguished by its hybrid thinking mode (a per-query chain-of-thought toggle), broad language coverage, mature tokenizer, and strong single-GPU inference profile on commodity hardware. Part of the broader Qwen family; Qwen 3.5 is the lineage the community treats as the practical open-weights baseline even after the Qwen 3.6-Plus closed-source pivot.

Key Specs

  • Model sizes: 0.8B, several mid-range variants, and 9B parameters (the top Qwen 3.5 performer)
  • License: Apache 2.0
  • MMLU-Pro: 82.5 (9B variant)
  • GPQA Diamond: 81.7 (9B variant)
  • Significance: The 9B variant beats models ~13× its parameter count on the benchmarks above
  • Inference profile: The 4-bit quantized 9B fits cleanly on a 24 GB RTX 4090; larger variants are typically run alongside peer models behind a lightweight router
  • Hybrid thinking mode: Chain-of-thought toggled per query rather than globally (see the sketch after this list)
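
A minimal sketch of both specs together, assuming the Qwen3-generation enable_thinking chat-template flag carries over to Qwen 3.5 and that a checkpoint exists under a Hugging Face id like Qwen/Qwen3.5-9B (both are assumptions, not confirmed specifics):

```python
# Sketch: 4-bit load on a 24 GB card plus per-query thinking toggle.
# "Qwen/Qwen3.5-9B" and the enable_thinking flag are assumptions
# carried over from the Qwen3 generation, not documented values.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen3.5-9B"  # hypothetical checkpoint name
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

def ask(prompt: str, think: bool) -> str:
    """Toggle chain-of-thought per query instead of globally."""
    text = tok.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=think,  # Qwen3-style flag; assumed to carry over
    )
    inputs = tok(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    return tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

print(ask("Prove that sqrt(2) is irrational.", think=True))    # pay the CoT latency
print(ask("Translate 'good morning' to French.", think=False)) # skip it
```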

Timeline

  • 2026-03-12-AI-Digest — Qwen 3.5 series initial tracking; small models outperforming much larger competitors. First signal that efficiency-per-parameter is displacing raw scale as the open-weights competitive axis.
  • 2026-03-16-AI-Digest — Performance updates and benchmark releases; community consolidation on Qwen 3.5 as the preferred all-around open model over Llama at comparable sizes.
  • 2026-03-27-AI-Digest — Performance benchmarking against competitor baselines; Qwen 3.5’s small-model dominance confirmed across open-source coding and reasoning evals.
  • 2026-04-03-AI-Digest — Qwen 3.6-Plus closed-source pivot announced by Alibaba; Qwen 3.5 remains the community’s open-weights anchor as the 3.6-Plus tier moves to service-only deployment.
  • 2026-04-09-AI-Digest — With Llama effectively retired from frontier open-weights competition after Meta’s Muse Spark closed-source pivot, r/LocalLLaMA treats Qwen 3.5 and Gemma 4 31B as the new top of the open-weights stack. Qwen 3.5 wins coding and tool calling (with hybrid thinking mode); Gemma 4 wins multimodal, long-context retention, multilingual, and structured output. Both fit on a 24 GB card at 4-bit quantization.
  • 2026-04-12-AI-Digest — One week post-Gemma-4-launch, r/LocalLLaMA sentiment consolidates: Gemma 4 31B Dense as strongest all-around; Qwen 3.5 retains the coding/tool-calling edge. Community practical consensus: run both with a router.
  • 2026-04-14-AI-Digest — qwen-ai/qwen3-coder (128K context, tool calling) surfaces as a top April Hugging Face momentum project (2,800+ stars), cementing Qwen 3 Coder as the community default for code-specialist open-weights workloads. Broader r/LocalLLaMA consensus is a multi-model router pattern combining Qwen 3 Coder, Gemma 4 31B, DeepSeek V3, and Llama Stack (see the router sketch after this timeline).
  • 2026-04-18-AI-Digest — Second-most-active r/LocalLLaMA thread of the week: GLM-5.1 vs Qwen 3.5 as best open-weights coding daily driver. Pro-Qwen camp emphasizes broader language coverage, faster inference on commodity hardware, more mature tokenizer. Pro-GLM camp emphasizes SWE-Bench Pro lead (58.4%) and tool-use reliability on agentic loops. Community working consensus: GLM-5.1 for agentic coding workflows, Qwen 3.5 for everything else, run both if you have the VRAM. Qwen 3.5 holds position as the general-purpose open-weights default even as it cedes the narrow coding-specialist crown.
  • 2026-04-19-AI-Digest — Weekend “open-weights safety floor is a competitive moat” framing. Qwen 3.5 and GLM-5.1 can’t match Claude Opus 4.7’s 87.6% / 64.3%, but also can’t match Claude Mythos Preview’s zero-day discovery or GPT-5.4-Cyber’s defensive-analysis profile. Community pivots to benchmarking open-weights models against frontier labs’ gated models rather than their shipped ones.
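
A minimal sketch of the router pattern the community converged on, assuming local OpenAI-compatible servers (vLLM, llama.cpp server, or similar) and a simplified two-way split; endpoint ports, model names, and the keyword heuristic are illustrative assumptions:

```python
# Sketch of the split from the April digests: code-looking prompts go to a
# coder model, everything else to a generalist. Ports, model names, and the
# heuristic are illustrative, not documented values.
from openai import OpenAI

CODER = OpenAI(base_url="http://localhost:8001/v1", api_key="local")    # e.g., Qwen 3 Coder under vLLM
GENERAL = OpenAI(base_url="http://localhost:8002/v1", api_key="local")  # e.g., Gemma 4 31B or Qwen 3.5

CODE_HINTS = ("def ", "traceback", "compile error", "refactor", "unit test", "regex")

def route(prompt: str) -> str:
    """Crude keyword routing; real setups often use a small classifier model."""
    is_code = any(hint in prompt.lower() for hint in CODE_HINTS)
    client, model = (CODER, "qwen3-coder") if is_code else (GENERAL, "gemma-4-31b")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(route("Refactor this function to avoid the N+1 query."))
print(route("Summarize the causes of the Thirty Years' War."))
```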

Market Position

Qwen 3.5’s performance on r/LocalLLaMA was so strong in Q1 2026 that it dethroned Llama as the community favorite, marking a significant shift in open-source model preferences. By mid-April 2026, it has settled into the role of the general-purpose open-weights default — the model most users recommend when asked “what should I run locally for everything” — while ceding the narrow agentic-coding crown to GLM-5.1 and the multimodal/long-context crown to Gemma 4 31B. The hybrid thinking mode remains Qwen 3.5’s most distinctive capability: toggling chain-of-thought per query rather than paying the latency cost on every inference is a workflow advantage most competing open models don’t match.

The Qwen 3.6-Plus closed-source pivot (April 3, 2026) frames Qwen 3.5 as the “last fully open” generation in the lineage. Alibaba’s strategic read: Qwen 3.5, under Apache 2.0, remains the community anchor while Qwen 3.6-Plus moves to service-only deployment at the frontier tier.

See Also

  • Qwen — parent model family and broader Alibaba context
  • GLM-5.1 — agentic-coding open-weights leader (the other half of the April 2026 open-weights duopoly)
  • Gemma 4 — multimodal/long-context open-weights peer
  • Llama — open-weights competitor retired from frontier competition April 2026
  • MOC - Open Source Models