GLM-5: 745B Open-Weight Frontier Model, MIT Licensed
Zhipu AI's GLM-5 (released February 11, 2026) is one of two open-weight models in the “frontier” tier — alongside DeepSeek V4. 745 billion total parameters with 44 billion active per token (MoE), MIT licensed, 200K context window, 77.8% SWE-Bench Verified. Notable detail: trained primarily on Huawei Ascend NPU clusters, demonstrating that frontier results are reachable without NVIDIA. This review covers GLM-5 and its smaller GLM-5-Air variant, hardware requirements, and benchmarks.
Key takeaways
- → 745B/44B active MoE — frontier capability with a 4× H100 hardware floor.
- → MIT licensed — the same permissive license as DeepSeek V4.
- → 77.8% SWE-Bench Verified — strong coding performance for an open-weight model.
- → GLM-5-Air variant — the distilled ~67B-active model fits on 1× H100 or 2× RTX 5090.
- → Trained on Huawei Ascend — the first frontier model trained primarily off-NVIDIA.
Quick verdict
GLM-5 is the right pick when your hardware budget is 4× H100 and you need a frontier-class open-weight model. Smaller footprint than DeepSeek V4-Pro (8× H100), comparable benchmarks on most tasks, same permissive MIT license.
For individual users, GLM-5-Air (the distilled variant) is the more realistic choice: it runs on a single H100 or 2× RTX 5090. For coding-specific workloads, consider Qwen3-Coder-Next instead; it's smaller and scores only marginally lower on coding benchmarks.
Specs at a glance
| Spec | Value |
|---|---|
| Vendor | Zhipu AI (THUDM) |
| Architecture | Mixture-of-Experts (MoE) |
| Total parameters | 745 billion |
| Active parameters | 44 billion per token |
| Context window | 200,000 tokens |
| License | MIT |
| Storage (BF16) | ~1.5 TB |
| Storage (Q4) | ~380 GB |
| Hugging Face | THUDM/GLM-5 |
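To pull the weights locally, a minimal sketch using the Hugging Face CLI — the THUDM/GLM-5 repo ID comes from the table above; the local directory path is illustrative:

```bash
# Fetch the BF16 weights (~1.5 TB per the table above; check free disk space first)
pip install -U "huggingface_hub[cli]"
huggingface-cli download THUDM/GLM-5 --local-dir ./glm-5
```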
GLM-5 vs GLM-5-Air
| | GLM-5 (full) | GLM-5-Air (distilled) |
|---|---|---|
| Parameters | 745B total / 44B active | ~67B active |
| Minimum hardware | 4× H100 / 2× B200 | 1× H100 / 2× RTX 5090 |
| SWE-Bench Verified | 77.8% | 71.4% |
| Best for | Teams with infrastructure | Individual users / startups |
Hardware & setup
vLLM (full GLM-5, 4× H100)
```bash
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model THUDM/GLM-5 \
  --tensor-parallel-size 4 \
  --max-model-len 200000 \
  --gpu-memory-utilization 0.92 \
  --port 8000
```
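Once the server is up, it speaks the OpenAI-compatible API, so a quick smoke test is one curl away. The prompt and max_tokens below are arbitrary; vLLM expects the model path it was launched with as the model name:

```bash
# Smoke-test the OpenAI-compatible endpoint started above
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "THUDM/GLM-5",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 64
  }'
```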
Ollama (GLM-5-Air, 1× H100 or 2× RTX 5090)
```bash
ollama pull glm-5-air
ollama run glm-5-air
# Or expose an API endpoint:
ollama serve  # listens on :11434/v1
```
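Ollama's server also exposes an OpenAI-compatible endpoint, so the same curl pattern works against GLM-5-Air. The glm-5-air model tag is taken from the pull command above; adjust it if your local tag differs:

```bash
# Query GLM-5-Air through Ollama's OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5-air",
    "messages": [{"role": "user", "content": "Summarize MoE routing in two sentences."}]
  }'
```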
Benchmarks
| Benchmark | GLM-5 | GLM-5-Air | DeepSeek V4-Pro | Claude Sonnet 5 |
|---|---|---|---|---|
| SWE-Bench Verified | 77.8% | 71.4% | 82.6% | 92.4% |
| MMLU-Pro | 84.6% | 81.2% | 86.3% | 87.9% |
| GPQA Diamond | 79.4% | 74.8% | 81.4% | 85.7% |
| AIME 2025 | 85.2% | 79.6% | 88.7% | 91.5% |
| LiveCodeBench | 71.6% | 67.2% | 73.4% | 79.8% |
When to pick GLM-5
- ✓ You have 4× H100 and want frontier-class self-hosting at a smaller footprint than V4-Pro.
- ✓ You operate in a region with NVIDIA export restrictions — GLM-5 is fully usable on Huawei Ascend.
- ✓ Your work fits in a 200K context window (most engineering work does).
- ✓ You want a non-DeepSeek frontier open-weight model to avoid single-vendor risk.
FAQ
What is GLM-5?
Zhipu AI's open-weight frontier model: a 745B-parameter Mixture-of-Experts with 44B active per token, a 200K context window, and an MIT license, released February 11, 2026.
GLM-5 vs DeepSeek V4: which open-weight should I pick?
DeepSeek V4-Pro scores higher on most benchmarks and offers a 1M context window, but it needs 8× H100. GLM-5 delivers comparable results on a 4× H100 footprint under the same MIT license.
How much VRAM does GLM-5 need?
The full model needs 4× H100 (or 2× B200) at minimum; weights are ~1.5 TB in BF16 and ~380 GB at Q4. GLM-5-Air runs on 1× H100 or 2× RTX 5090.
How do I install and serve GLM-5?
Serve the full model with vLLM using tensor parallelism across four GPUs, or run GLM-5-Air locally through Ollama; see Hardware & setup above for the exact commands.
Why does Huawei Ascend matter for GLM-5?
GLM-5 was trained primarily on Huawei Ascend NPU clusters, making it the first frontier model trained largely off-NVIDIA and a practical option in regions with NVIDIA export restrictions.
GLM-5 vs Claude Sonnet 5: how big is the gap?
Claude Sonnet 5 leads on every benchmark in the table above (92.4% vs 77.8% on SWE-Bench Verified), but GLM-5 is open-weight, self-hostable, and MIT licensed.
Related models
- → GLM-4.6 — predecessor; smaller hardware floor
- → DeepSeek V4 — comparable open-weight frontier, 1M context
- → Qwen3-Coder-Next — coding-specialized smaller alternative
- → Kimi K2.6 — agentic-first 1T MoE
- → Best AI models May 2026 — pillar comparison