Qwen3.6-27B: A Dense 27B That Beats Its Own 397B MoE
Qwen3.6-27B (released April 22, 2026, Apache 2.0) is the most surprising open-weight release of the year: a dense 27-billion-parameter model that outperforms Alibaba's own much-larger Qwen3.5-Plus 397B MoE on agentic coding benchmarks. It runs comfortably on a single RTX 5090 at Q4 quantization, making it the best local AI model for users with one high-end consumer GPU. This review covers specs, the dense-vs-MoE explanation, hardware requirements, and benchmarks.
Key takeaways
- Dense 27B beats Alibaba's own 397B MoE — 68.9% SWE-Bench Verified vs 65.4% for Qwen3.5-Plus.
- Single-GPU friendly — fits in ~17 GB VRAM at Q4_K_M; runs on one RTX 5090.
- Apache 2.0 — unlimited commercial use, no royalties.
- 128K context — fits most repositories, full PR diffs, and long specs.
- Strong general-purpose performance — also leads its weight class on MMLU-Pro and AIME.
Quick verdict
If you have one good GPU (RTX 5090, RTX 4090 on a tighter budget, or M3 Max/Ultra) and you want one local model that handles coding + research + general work, Qwen3.6-27B is the right default in May 2026.
For multi-GPU rigs or H100-class hardware, Qwen3-Coder-Next (80B/3B active MoE) edges ahead on coding benchmarks. For frontier-class general capability with 1M context, DeepSeek V4-Flash is the upgrade.
Specs at a glance
| Spec | Value |
|---|---|
| Vendor | Alibaba Qwen |
| Architecture | Dense transformer (no MoE) |
| Parameters | 27 billion (all active) |
| Context window | 128,000 tokens |
| License | Apache 2.0 |
| Storage (BF16) | ~54 GB |
| Storage (Q4_K_M) | ~17 GB |
| Hugging Face | Qwen/Qwen3.6-27B |
Why a 27B dense model beats a 397B MoE
The whole 2024-2025 narrative was that MoE wins by spending compute only where needed. Qwen3.6-27B challenges that for agentic coding workloads. Three reasons it works:
1. All-active compute on every token. A 27B dense model applies all 27B parameters to every input token, while Qwen3.5-Plus 397B MoE activates only 17B per token. For tasks where every step needs deep reasoning (multi-file refactors, debugging loops), all-active beats sparse-active — see the compute sketch below.
2. Stronger inter-layer information flow. Dense transformers route information through every layer in a single integrated path. MoE creates multiple parallel expert paths that have to be reconciled — fine for facts and patterns, less ideal for stepwise reasoning.
3. Better training data. Qwen3.6-27B trained on a curated coding corpus with longer agentic-loop traces, while Qwen3.5-Plus prioritized broad knowledge. For coding benchmarks specifically, that data quality dominates parameter count.
Caveat: dense doesn't beat MoE everywhere. Qwen3.5-Plus 397B still wins on broad knowledge benchmarks (MMLU-Pro 88.4% vs Qwen3.6-27B's 81.7%), where parameter count and breadth matter more than reasoning depth. The right takeaway: pick architecture by workload, not by parameter count.
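To make the all-active point concrete, here is a back-of-the-envelope comparison in Python using the common ~2 FLOPs-per-active-parameter-per-token approximation. This is illustrative only: the exact per-token compute depends on architecture details neither model card publishes.

```python
# Rough forward-pass compute per token: ~2 FLOPs per *active* parameter.
# An illustrative approximation, not an exact architecture-level count.

def flops_per_token(active_params: float) -> float:
    return 2 * active_params

dense = flops_per_token(27e9)  # Qwen3.6-27B: all 27B parameters active
moe = flops_per_token(17e9)    # Qwen3.5-Plus: 397B total, ~17B active per token

print(f"Dense 27B: {dense / 1e9:.0f} GFLOPs/token")  # 54
print(f"397B MoE:  {moe / 1e9:.0f} GFLOPs/token")    # 34
print(f"Ratio:     {dense / moe:.1f}x")              # ~1.6x
```

Despite having roughly 15x fewer total parameters, the dense model spends about 1.6x more compute on every token it generates.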
Hardware & setup
| Hardware | Quant | Throughput |
|---|---|---|
| 1× RTX 5090 (32GB) | Q4_K_M | 60-90 tok/s |
| 1× RTX 4090 (24GB) | Q4_K_M | 35-55 tok/s |
| 1× H100 80GB | BF16 | 120-180 tok/s |
| M3 Max 64GB | Q4_K_M | 25-40 tok/s |
| M3 Ultra 96GB | Q5_K_M | 30-50 tok/s |
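The storage figures in the specs table (~54 GB BF16, ~17 GB Q4_K_M) follow from the usual weights-only estimate: parameter count × effective bits per weight. A minimal sketch, assuming llama.cpp's typical ~4.8 effective bits/weight for Q4_K_M; KV cache and runtime overhead add several GB on top, which is why real VRAM usage runs a little higher.

```python
# Weights-only footprint estimate: params * effective bits per weight / 8.
# The ~4.8 bpw figure for Q4_K_M is llama.cpp's approximate effective rate;
# KV cache and runtime overhead are extra at inference time.

PARAMS = 27e9  # Qwen3.6-27B

def weights_gb(params: float, bits_per_weight: float) -> float:
    return params * bits_per_weight / 8 / 1e9

print(f"BF16:   ~{weights_gb(PARAMS, 16):.0f} GB")   # ~54 GB
print(f"Q4_K_M: ~{weights_gb(PARAMS, 4.8):.0f} GB")  # ~16 GB before overhead
```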
Ollama (5-min install)
```
ollama pull qwen3.6:27b
ollama run qwen3.6:27b
# Or for Cursor/Aider integration:
ollama serve   # listens on :11434/v1
```
vLLM (production)
```
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3.6-27B --max-model-len 131072 --port 8000
```
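Both servers expose an OpenAI-compatible API, so Cursor, Aider, or any script using the official `openai` Python client can point at them directly. A minimal sketch against the local Ollama endpoint, assuming `pip install openai`; for the vLLM server, swap in `http://localhost:8000/v1` and the `Qwen/Qwen3.6-27B` model name from the command above.

```python
# Point the standard OpenAI client at a local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # vLLM: http://localhost:8000/v1
    api_key="ollama",  # local servers ignore the key, but it must be non-empty
)

resp = client.chat.completions.create(
    model="qwen3.6:27b",  # vLLM: "Qwen/Qwen3.6-27B"
    messages=[{"role": "user", "content": "Explain this stack trace."}],
)
print(resp.choices[0].message.content)
```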
Benchmarks
| Benchmark | Qwen3.6-27B | Qwen3.5-Plus 397B | Qwen3-Coder-Next | DeepSeek V4-Flash |
|---|---|---|---|---|
| SWE-Bench Verified | 68.9% | 65.4% | 70.6% | 78.4% |
| LiveCodeBench | 66.2% | 64.8% | 68.4% | 67.2% |
| MMLU-Pro | 81.7% | 88.4% | 81.4% | 83.8% |
| GPQA Diamond | 73.4% | 79.6% | 71.2% | 76.9% |
| AIME 2025 | 79.3% | 82.1% | 76.8% | 82.4% |
When to pick Qwen3.6-27B
- You have one good GPU and want one local model for coding + general work.
- You want simpler deployment than MoE (predictable VRAM, no expert-routing complexity).
- You're on Apple Silicon — dense models run cleanly under Metal on M3 Max/Ultra.
- Your work fits in 128K context (most coding does).
Frequently asked questions
What is Qwen3.6-27B?
A dense 27-billion-parameter open-weight model from Alibaba Qwen, released April 22, 2026 under Apache 2.0, with a 128K context window and class-leading agentic-coding performance.
Why does a 27B dense model beat a 397B MoE?
All 27B parameters are active on every token (the MoE activates only 17B), dense layers keep a single integrated reasoning path, and the training corpus emphasized long agentic coding traces.
How much VRAM does Qwen3.6-27B need?
About 17 GB at Q4_K_M, so one RTX 5090 (32GB) or RTX 4090 (24GB) suffices; BF16 needs ~54 GB.
How do I install Qwen3.6-27B?
Run ollama pull qwen3.6:27b for a five-minute local setup, or serve Qwen/Qwen3.6-27B with vLLM for production (commands above).
Qwen3.6-27B vs Qwen3-Coder-Next: which should I pick?
Qwen3.6-27B for a single consumer GPU; Qwen3-Coder-Next edges ahead on coding benchmarks but wants 1× H100 or 2× RTX 5090.
Is Qwen3.6-27B good for general work too, not just coding?
Yes: it leads its weight class on MMLU-Pro and AIME, though Qwen3.5-Plus 397B still wins on broad-knowledge benchmarks.
Build a single-GPU local-AI stack
The Local AI Master deployment course covers Qwen3.6-27B production setup, Cursor integration, and hybrid routing.
See the course →
Related models
- Qwen3-Coder-Next — MoE upgrade if you have 1× H100 / 2× RTX 5090
- DeepSeek V4-Flash — frontier MoE alternative, 1M context
- GLM-5 — 745B MoE for serious infrastructure
- Mistral Medium 3.5 — dense 128B for 4-GPU rigs
- Best AI models May 2026 — pillar comparison