
Zhipu AI · Open-Weight · MIT

GLM-5: 745B Open-Weight Frontier Model, MIT Licensed

Zhipu AI's GLM-5 (released February 11, 2026) is one of two open-weight models in the “frontier” tier — alongside DeepSeek V4. 745 billion total parameters with 44 billion active per token (MoE), MIT licensed, 200K context window, 77.8% SWE-Bench Verified. Notable detail: trained primarily on Huawei Ascend NPU clusters, demonstrating that frontier results are reachable without NVIDIA. This review covers GLM-5 and its smaller GLM-5-Air variant, hardware requirements, and benchmarks.

📅 Published: May 9, 2026 · 🔄 Last Updated: May 9, 2026 · ✓ Manually Reviewed

Key takeaways

  • 745B/44B active MoE — frontier capability with a 4× H100 hardware floor.
  • MIT licensed — same permissive license as DeepSeek V4.
  • 77.8% SWE-Bench Verified — strong coding for an open-weight model.
  • GLM-5-Air variant — distilled 67B active fits on 1× H100 / 2× RTX 5090.
  • Trained on Huawei Ascend — among the first frontier models trained primarily off NVIDIA hardware.

Quick verdict

GLM-5 is the right pick when your hardware budget is 4× H100 and you need a frontier-class open-weight model. Smaller footprint than DeepSeek V4-Pro (8× H100), comparable benchmarks on most tasks, same permissive MIT license.

For individual users, GLM-5-Air (the distilled variant) is more realistic: it runs on a single H100 or 2× RTX 5090. For coding-specific workloads, consider Qwen3-Coder-Next instead; it's smaller and scores only marginally lower on coding benchmarks.

Specs at a glance

| Spec | Value |
|---|---|
| Vendor | Zhipu AI (THUDM) |
| Architecture | Mixture-of-Experts |
| Total parameters | 745 billion |
| Active parameters | 44 billion per token |
| Context window | 200,000 tokens |
| License | MIT |
| Storage (BF16) | ~1.5 TB |
| Storage (Q4) | ~380 GB |
| Hugging Face | THUDM/GLM-5 |

GLM-5 vs GLM-5-Air

GLM-5 (full)

  • 745B total / 44B active
  • 4× H100 / 2× B200 minimum
  • 77.8% SWE-Bench Verified
  • For teams with infrastructure

GLM-5-Air (distilled)

  • ~67B active (distilled)
  • 1× H100 / 2× RTX 5090
  • 71.4% SWE-Bench Verified
  • For individual users / startups

Hardware & setup

vLLM (full GLM-5, 4× H100)

pip install vllm

# Shards the model across 4 GPUs and exposes an OpenAI-compatible API on :8000
python -m vllm.entrypoints.openai.api_server \
  --model THUDM/GLM-5 \
  --tensor-parallel-size 4 \
  --max-model-len 200000 \
  --gpu-memory-utilization 0.92 \
  --port 8000

Ollama (GLM-5-Air, 1× H100 or 2× RTX 5090)

ollama pull glm-5-air
ollama run glm-5-air
# Or expose API endpoint:
ollama serve  # listens on :11434/v1
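
Both servers speak the OpenAI-compatible chat API, so a quick smoke test looks the same either way. A minimal sketch using the `openai` Python client (`pip install openai`); the base URL and model name below are assumptions, so swap in whichever server and model you started above:

```python
# Smoke test for a local OpenAI-compatible endpoint (vLLM or Ollama).
# Base URL and model name must match what you served:
#   vLLM   -> http://localhost:8000/v1  + "THUDM/GLM-5"
#   Ollama -> http://localhost:11434/v1 + "glm-5-air"
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # or "http://localhost:11434/v1" for Ollama
    api_key="not-needed",                 # local servers ignore it, but the client requires a value
)

response = client.chat.completions.create(
    model="THUDM/GLM-5",  # must match the served model name exactly
    messages=[{"role": "user", "content": "Reverse a string in Python, one line."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```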

Benchmarks

| Benchmark | GLM-5 | GLM-5-Air | DeepSeek V4-Pro | Claude Sonnet 5 |
|---|---|---|---|---|
| SWE-Bench Verified | 77.8% | 71.4% | 82.6% | 92.4% |
| MMLU-Pro | 84.6% | 81.2% | 86.3% | 87.9% |
| GPQA Diamond | 79.4% | 74.8% | 81.4% | 85.7% |
| AIME 2025 | 85.2% | 79.6% | 88.7% | 91.5% |
| LiveCodeBench | 71.6% | 67.2% | 73.4% | 79.8% |

When to pick GLM-5

  • You have 4× H100 and want frontier-class self-hosting at smaller footprint than V4-Pro.
  • You operate in a region with NVIDIA export restrictions — GLM-5 is fully usable on Huawei Ascend.
  • Your work fits in 200K context (most engineering work does).
  • You want a non-DeepSeek frontier open-weight to avoid single-vendor risk.

FAQ

What is GLM-5?
GLM-5 is Zhipu AI's flagship open-weight model released February 11, 2026. It's a 745-billion-parameter Mixture-of-Experts model with 44 billion active per token, MIT licensed for unrestricted commercial use. Notable for being trained primarily on Huawei Ascend NPU clusters (an alternative to NVIDIA infrastructure for sanctioned regions). 77.8% on SWE-Bench Verified, 200K context window. The closest open-weight alternative to DeepSeek V4 with a meaningfully smaller hardware footprint.
GLM-5 vs DeepSeek V4: which open-weight should I pick?
GLM-5 is the more practical choice for most teams. Hardware footprint is smaller (4× H100 vs 8× for V4-Pro), MIT licensed (DeepSeek V4 is also MIT, so identical here), and SWE-Bench Verified scores are competitive (77.8% vs V4-Pro's 82.6%). DeepSeek V4 wins on context length (1M vs 200K) and absolute peak coding quality. Pick GLM-5 if your hardware is 4× H100 or you want to run multiple models on one cluster. Pick V4-Pro if you have 8× H100 and need 1M context for whole-monorepo work.
How much VRAM does GLM-5 need?
GLM-5 at BF16 needs ~1.5 TB total — beyond any consumer setup. At Q4_K_M quantization (~380 GB), 4× H100 80GB is the realistic minimum. 2× B200 180GB also works. For lower-quality but more accessible deployment, GLM-5-Air is a distilled 67B-active variant that fits in ~85 GB Q4 — runs on 1× H100 or 2× RTX 5090. For most individual users, GLM-5-Air is the realistic option; full GLM-5 is for teams with infrastructure.
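Those storage figures are straightforward bytes-per-parameter arithmetic (a MoE stores all experts even though only 44B are active per token). A back-of-envelope sketch, weights only, assuming 2 bytes/param for BF16 and roughly 0.5 bytes/param effective for Q4_K_M; KV cache and runtime overhead come on top:

```python
# Back-of-envelope weight storage for GLM-5 (weights only; KV cache,
# activations, and framework overhead are extra).
TOTAL_PARAMS = 745e9  # all experts count toward storage, not just the 44B active

def storage_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

print(f"BF16:   {storage_gb(TOTAL_PARAMS, 2.0):,.0f} GB")  # ~1,490 GB, i.e. the ~1.5 TB quoted
print(f"Q4_K_M: {storage_gb(TOTAL_PARAMS, 0.5):,.0f} GB")  # ~373 GB, close to the ~380 GB quoted
```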
How do I install and serve GLM-5?
For full GLM-5 (745B): vLLM is the standard. `pip install vllm`, then `python -m vllm.entrypoints.openai.api_server --model THUDM/GLM-5 --tensor-parallel-size 4 --max-model-len 200000 --port 8000`. For GLM-5-Air on Ollama: `ollama pull glm-5-air` (default Q4, ~85 GB). For Cursor/Aider integration with either: point at the local OpenAI-compatible endpoint (`http://localhost:8000/v1` for vLLM, `http://localhost:11434/v1` for Ollama) with model name matching what you served. Full production deployment in our local-AI deployment course.
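One gotcha with editor integrations is the model name: it must match what the server reports, verbatim. A small sketch that lists the served models via the standard `/v1/models` endpoint (supported by both vLLM and Ollama's OpenAI-compatible API; assumes `pip install requests`):

```python
# List the model names a local OpenAI-compatible server exposes;
# Cursor/Aider must be configured with one of these IDs verbatim.
import requests

BASE_URL = "http://localhost:8000/v1"  # vLLM; use http://localhost:11434/v1 for Ollama

resp = requests.get(f"{BASE_URL}/models", timeout=10)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # e.g. "THUDM/GLM-5" (vLLM) or "glm-5-air" (Ollama)
```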
Why does Huawei Ascend matter for GLM-5?
Most frontier models are trained on NVIDIA H100/H200/B200 clusters. GLM-5 is one of the first frontier models trained primarily on Huawei Ascend 910B NPUs — Chinese-domestic AI accelerators. Practical implications: 1) GLM-5 is geopolitically resilient — it ships from Zhipu (Chinese AI lab) and is fully usable regardless of NVIDIA export restrictions. 2) Demonstrates that frontier-class results are reachable without NVIDIA, which matters for sanctioned regions or organizations diversifying away from a single vendor. 3) Inference still works fine on NVIDIA hardware — the Ascend training is a supply-chain story, not a deployment story.
GLM-5 vs Claude Sonnet 5: how big is the gap?
Outside of coding, GLM-5 lands within 3-7 points of Claude Sonnet 5. SWE-Bench Verified: GLM-5 77.8% vs Sonnet 5 92.4%, a 14.6-point gap and a meaningful one for coding. MMLU-Pro: GLM-5 84.6% vs Sonnet 5 87.9%. GPQA Diamond: GLM-5 79.4% vs Sonnet 5 85.7%. The pattern: GLM-5 is competitive on knowledge and reasoning, well behind on coding. For privacy-required general workloads (research, content, analysis), GLM-5 is a strong substitute. For coding-heavy work, Sonnet 5 (or an open alternative like Qwen3-Coder-Next) wins.
