
Zhipu AI · Open-Weight · MIT

GLM-5: 745B Open-Weight Frontier Model, MIT Licensed

Zhipu AI's GLM-5 (released February 11, 2026) is one of two open-weight models in the “frontier” tier — alongside DeepSeek V4. 745 billion total parameters with 44 billion active per token (MoE), MIT licensed, 200K context window, 77.8% SWE-Bench Verified. Notable detail: trained primarily on Huawei Ascend NPU clusters, demonstrating that frontier results are reachable without NVIDIA. This review covers GLM-5 and its smaller GLM-5-Air variant, hardware requirements, and benchmarks.

📅 Published: May 9, 2026 · 🔄 Last Updated: May 9, 2026 · ✓ Manually Reviewed

Key takeaways

  • 745B/44B active MoE — frontier capability with a 4× H100 hardware floor.
  • MIT licensed — same permissive license as DeepSeek V4.
  • 77.8% SWE-Bench Verified — strong coding for an open-weight model.
  • GLM-5-Air variant — distilled 67B active fits on 1× H100 / 2× RTX 5090.
  • Trained on Huawei Ascend — among the first frontier models trained primarily off NVIDIA hardware.

Quick verdict

GLM-5 is the right pick when your hardware budget is 4× H100 and you need a frontier-class open-weight model. Smaller footprint than DeepSeek V4-Pro (8× H100), comparable benchmarks on most tasks, same permissive MIT license.

For individual users, GLM-5-Air (the distilled variant) is more realistic: it runs on a single H100 or 2× RTX 5090. For coding-specific workloads, consider Qwen3-Coder-Next instead; it's smaller and scores only marginally lower on coding benchmarks.

Specs at a glance

| Spec | Value |
|---|---|
| Vendor | Zhipu AI (THUDM) |
| Architecture | Mixture-of-Experts |
| Total parameters | 745 billion |
| Active parameters | 44 billion per token |
| Context window | 200,000 tokens |
| License | MIT |
| Storage (BF16) | ~1.5 TB |
| Storage (Q4) | ~380 GB |
| Hugging Face | THUDM/GLM-5 |

GLM-5 vs GLM-5-Air

GLM-5 (full)

  • 745B total / 44B active
  • 4× H100 / 2× B200 minimum
  • 77.8% SWE-Bench Verified
  • For teams with infrastructure

GLM-5-Air (distilled)

  • ~67B active (distilled)
  • 1× H100 / 2× RTX 5090
  • 71.4% SWE-Bench Verified
  • For individual users / startups

Hardware & setup

vLLM (full GLM-5, 4× H100)

pip install vllm

# Shards the model across 4 GPUs and exposes an OpenAI-compatible API on :8000
python -m vllm.entrypoints.openai.api_server \
  --model THUDM/GLM-5 \
  --tensor-parallel-size 4 \
  --max-model-len 200000 \
  --gpu-memory-utilization 0.92 \
  --port 8000

Ollama (GLM-5-Air, 1× H100 or 2× RTX 5090)

ollama pull glm-5-air
ollama run glm-5-air
# Or expose API endpoint:
ollama serve  # listens on :11434/v1
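
Both servers speak the OpenAI-compatible chat API, so a quick smoke test looks the same either way. A minimal sketch using the `openai` Python client (`pip install openai`); the base URL and model name below are assumptions, so swap in whichever server and model you started above:

```python
# Smoke test for a local OpenAI-compatible endpoint (vLLM or Ollama).
# Base URL and model name must match what you served:
#   vLLM   -> http://localhost:8000/v1  + "THUDM/GLM-5"
#   Ollama -> http://localhost:11434/v1 + "glm-5-air"
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # or "http://localhost:11434/v1" for Ollama
    api_key="not-needed",                 # local servers ignore it, but the client requires a value
)

response = client.chat.completions.create(
    model="THUDM/GLM-5",  # must match the served model name exactly
    messages=[{"role": "user", "content": "Reverse a string in Python, one line."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```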

Benchmarks

| Benchmark | GLM-5 | GLM-5-Air | DeepSeek V4-Pro | Claude Sonnet 5 |
|---|---|---|---|---|
| SWE-Bench Verified | 77.8% | 71.4% | 82.6% | 92.4% |
| MMLU-Pro | 84.6% | 81.2% | 86.3% | 87.9% |
| GPQA Diamond | 79.4% | 74.8% | 81.4% | 85.7% |
| AIME 2025 | 85.2% | 79.6% | 88.7% | 91.5% |
| LiveCodeBench | 71.6% | 67.2% | 73.4% | 79.8% |

When to pick GLM-5

  • You have 4× H100 and want frontier-class self-hosting at smaller footprint than V4-Pro.
  • You operate in a region with NVIDIA export restrictions — GLM-5 is fully usable on Huawei Ascend.
  • Your work fits in 200K context (most engineering work does).
  • You want a non-DeepSeek frontier open-weight to avoid single-vendor risk.

FAQ

What is GLM-5?
GLM-5 is Zhipu AI's flagship open-weight model released February 11, 2026. It's a 745-billion-parameter Mixture-of-Experts model with 44 billion active per token, MIT licensed for unrestricted commercial use. Notable for being trained primarily on Huawei Ascend NPU clusters (an alternative to NVIDIA infrastructure for sanctioned regions). 77.8% on SWE-Bench Verified, 200K context window. The closest open-weight alternative to DeepSeek V4 with a meaningfully smaller hardware footprint.
GLM-5 vs DeepSeek V4: which open-weight should I pick?
GLM-5 is the more practical choice for most teams. Hardware footprint is smaller (4× H100 vs 8× for V4-Pro), MIT licensed (DeepSeek V4 is also MIT, so identical here), and SWE-Bench Verified scores are competitive (77.8% vs V4-Pro's 82.6%). DeepSeek V4 wins on context length (1M vs 200K) and absolute peak coding quality. Pick GLM-5 if your hardware is 4× H100 or you want to run multiple models on one cluster. Pick V4-Pro if you have 8× H100 and need 1M context for whole-monorepo work.
How much VRAM does GLM-5 need?
GLM-5 at BF16 needs ~1.5 TB total — beyond any consumer setup. At Q4_K_M quantization (~380 GB), 4× H100 80GB is the realistic minimum. 2× B200 180GB also works. For lower-quality but more accessible deployment, GLM-5-Air is a distilled 67B-active variant that fits in ~85 GB Q4 — runs on 1× H100 or 2× RTX 5090. For most individual users, GLM-5-Air is the realistic option; full GLM-5 is for teams with infrastructure.
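Those storage figures are straightforward bytes-per-parameter arithmetic (a MoE stores all experts even though only 44B are active per token). A back-of-envelope sketch, weights only, assuming 2 bytes/param for BF16 and roughly 0.5 bytes/param effective for Q4_K_M; KV cache and runtime overhead come on top:

```python
# Back-of-envelope weight storage for GLM-5 (weights only; KV cache,
# activations, and framework overhead are extra).
TOTAL_PARAMS = 745e9  # all experts count toward storage, not just the 44B active

def storage_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

print(f"BF16:   {storage_gb(TOTAL_PARAMS, 2.0):,.0f} GB")  # ~1,490 GB, i.e. the ~1.5 TB quoted
print(f"Q4_K_M: {storage_gb(TOTAL_PARAMS, 0.5):,.0f} GB")  # ~373 GB, close to the ~380 GB quoted
```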
How do I install and serve GLM-5?
For full GLM-5 (745B): vLLM is the standard. `pip install vllm`, then `python -m vllm.entrypoints.openai.api_server --model THUDM/GLM-5 --tensor-parallel-size 4 --max-model-len 200000 --port 8000`. For GLM-5-Air on Ollama: `ollama pull glm-5-air` (default Q4, ~85 GB). For Cursor/Aider integration with either: point at the local OpenAI-compatible endpoint (`http://localhost:8000/v1` for vLLM, `http://localhost:11434/v1` for Ollama) with model name matching what you served. Full production deployment in our local-AI deployment course.
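One gotcha with editor integrations is the model name: it must match what the server reports, verbatim. A small sketch that lists the served models via the standard `/v1/models` endpoint (supported by both vLLM and Ollama's OpenAI-compatible API; assumes `pip install requests`):

```python
# List the model names a local OpenAI-compatible server exposes;
# Cursor/Aider must be configured with one of these IDs verbatim.
import requests

BASE_URL = "http://localhost:8000/v1"  # vLLM; use http://localhost:11434/v1 for Ollama

resp = requests.get(f"{BASE_URL}/models", timeout=10)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # e.g. "THUDM/GLM-5" (vLLM) or "glm-5-air" (Ollama)
```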
Why does Huawei Ascend matter for GLM-5?
Most frontier models are trained on NVIDIA H100/H200/B200 clusters. GLM-5 is one of the first frontier models trained primarily on Huawei Ascend 910B NPUs — Chinese-domestic AI accelerators. Practical implications: 1) GLM-5 is geopolitically resilient — it ships from Zhipu (Chinese AI lab) and is fully usable regardless of NVIDIA export restrictions. 2) Demonstrates that frontier-class results are reachable without NVIDIA, which matters for sanctioned regions or organizations diversifying away from a single vendor. 3) Inference still works fine on NVIDIA hardware — the Ascend training is a supply-chain story, not a deployment story.
GLM-5 vs Claude Sonnet 5: how big is the gap?
Outside of coding, GLM-5 lands within 3-7 points of Claude Sonnet 5. SWE-Bench Verified: GLM-5 77.8% vs Sonnet 5 92.4%, a 14.6-point gap and a meaningful one for coding. MMLU-Pro: GLM-5 84.6% vs Sonnet 5 87.9%. GPQA Diamond: GLM-5 79.4% vs Sonnet 5 85.7%. The pattern: GLM-5 is competitive on knowledge and reasoning, well behind on coding. For privacy-required general workloads (research, content, analysis), GLM-5 is a strong substitute. For coding-heavy work, Sonnet 5 (or an open alternative like Qwen3-Coder-Next) wins.
