Mistral Medium 3.5: 128B Dense, 4-GPU Open-Weight
Mistral Medium 3.5 (April 30, 2026) is the French AI lab Mistral's unified flagship: 128 billion dense parameters, 256K context, 77.6% SWE-Bench Verified, released under a modified MIT license. The big design choice: it replaces three previously separate Mistral models (Magistral / Pixtral / Devstral) with one model that handles general reasoning, vision, and coding equally well. It runs on 4× H100 at full precision, or on 1× H100 / 2× RTX 5090 at Q4 quantization, which makes it the realistic open-weight choice for prosumer hardware.
Key takeaways
- → 128B dense — no MoE complexity, predictable VRAM, simpler deployment.
- → Unified model — replaces Magistral, Pixtral, and Devstral with one model handling all three.
- → 77.6% SWE-Bench Verified — competitive with DeepSeek V4-Flash (78.4%).
- → 256K context — bigger than most prosumer-tier alternatives.
- → Runs on 1× H100 at Q4 — accessible without an 8× H100 cluster.
Quick verdict
Mistral Medium 3.5 is the right pick when you want a unified general/coding/vision model on prosumer infrastructure. Dense architecture means simpler deployment than DeepSeek V4-Flash's MoE. Single H100 at Q4 makes it viable without cluster-grade hardware.
Where it loses: peak coding and reasoning scores vs DeepSeek V4-Flash (78.4% SWE-Bench) and GLM-5 (77.8%), and context length vs DeepSeek V4 (1M, 4× longer). For pure coding workloads on a single GPU, the much smaller Qwen3-Coder-Next or Qwen3.6-27B may be the more efficient choice. For mixed coding + research + vision, Mistral Medium 3.5 is the cleanest single-model option.
Specs at a glance
| Spec | Value |
|---|---|
| Vendor | Mistral AI |
| Architecture | Dense transformer (no MoE) |
| Parameters | 128 billion |
| Context window | 256,000 tokens |
| Modalities | Text · Code · Vision |
| License | Modified MIT |
| Storage (BF16) | ~256 GB |
| Storage (Q4_K_M) | ~80 GB |
| Hugging Face | mistralai/Mistral-Medium-3.5 |
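If you want the raw weights rather than a packaged build, the standard Hugging Face CLI pull is the usual route. A minimal sketch (assumes the repo ID from the table above and ~256 GB of free disk for the BF16 checkpoint):

```bash
# Fetch the full BF16 checkpoint from Hugging Face (~256 GB on disk)
pip install -U "huggingface_hub[cli]"
huggingface-cli download mistralai/Mistral-Medium-3.5 \
  --local-dir ./mistral-medium-3.5
```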
Hardware & setup
| Hardware | Quant | Context | Tokens/sec |
|---|---|---|---|
| 4× H100 80GB | BF16 | 256K | 80-130 tok/s |
| 1× H100 80GB | Q4_K_M | 256K | 35-55 tok/s |
| 2× RTX 5090 (32GB each) | Q4_K_M | 128K (reduced) | 25-40 tok/s |
| 1× M3 Ultra (192GB) | Q5_K_M | 256K | 15-28 tok/s |
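The quant sizes above are easy to sanity-check: weight memory is roughly parameters × bits per weight ÷ 8. A rough sketch (assumes Q4_K_M averages about 4.8 bits per weight; the KV cache for long contexts comes on top):

```bash
# Rough weight-memory estimate: params (billions) × bits/weight ÷ 8 ≈ GB
for bits in 16 4.8; do
  echo "128B @ ${bits}-bit ≈ $(echo "scale=1; 128 * $bits / 8" | bc) GB"
done
# → 256.0 GB at BF16, ~76.8 GB at Q4_K_M. The KV cache takes the remaining
#   headroom, which is why the 2× RTX 5090 row drops to 128K context.
```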
Ollama (single-GPU prosumer)
```bash
ollama pull mistral-medium-3.5
ollama run mistral-medium-3.5
```
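Ollama also exposes a local REST API on port 11434, which is handy for scripting. A quick one-shot call (assumes the model tag above is what your Ollama build registers):

```bash
# One-shot, non-streaming generation via Ollama's local REST API
curl -s http://localhost:11434/api/generate -d '{
  "model": "mistral-medium-3.5",
  "prompt": "In two sentences: dense vs MoE tradeoffs?",
  "stream": false
}'
```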
vLLM (production, 4× H100)
```bash
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Medium-3.5 \
  --tensor-parallel-size 4 \
  --max-model-len 262144 --port 8000
```
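Once the server is up it speaks the OpenAI-compatible API, so any OpenAI client works. A minimal smoke test with curl (assumes the default route and the model ID used at launch):

```bash
# Smoke test against vLLM's OpenAI-compatible chat endpoint
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Medium-3.5",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 32
  }'
```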
Benchmarks
| Benchmark | Mistral Medium 3.5 | DeepSeek V4-Flash | Qwen3-Coder-Next | GLM-5 |
|---|---|---|---|---|
| SWE-Bench Verified | 77.6% | 78.4% | 70.6% | 77.8% |
| MMLU-Pro | 85.2% | 83.8% | 81.4% | 84.6% |
| GPQA Diamond | 76.4% | 76.9% | 71.2% | 79.4% |
| AIME 2025 | 81.6% | 82.4% | 76.8% | 85.2% |
| Vision-MME (image QA) | 73.4% | N/A | N/A | 68.7% |
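Because the model is multimodal, vision requests go through the same chat endpoint using the OpenAI image-content format. A sketch (assumes vLLM has image input enabled for this model and the URL is reachable from the server):

```bash
# Vision request: same chat endpoint, image passed as an image_url part
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Medium-3.5",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this chart in one sentence."},
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}}
      ]
    }],
    "max_tokens": 64
  }'
```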
When to pick Mistral Medium 3.5
- ✓ You want one model for general work + coding + vision (replaces 3 Mistral models).
- ✓ Dense architecture preference (simpler than MoE — predictable VRAM, no expert routing).
- ✓ Single H100 / 2× RTX 5090 hardware (Q4 quantization).
- ✓ EU sovereignty matters — Mistral is Paris-based, GDPR-aligned governance.
FAQ
What is Mistral Medium 3.5?
Mistral's unified open-weight flagship (April 30, 2026): a 128-billion-parameter dense transformer with a 256K context window that handles general reasoning, coding, and vision in a single model, released under a modified MIT license.

How much VRAM does Mistral Medium 3.5 need?
About 256 GB at BF16 (4× H100 80GB). At Q4_K_M the weights drop to ~80 GB, which fits a single H100, or 2× RTX 5090 with context reduced to 128K. See the hardware table above.

Mistral Medium 3.5 vs DeepSeek V4-Flash: which to pick?
DeepSeek V4-Flash scores slightly higher on SWE-Bench Verified (78.4% vs 77.6%), but it is an MoE model. Pick Mistral Medium 3.5 for simpler dense deployment, predictable VRAM, vision support, and single-H100 viability at Q4.

How do I install Mistral Medium 3.5?
For prosumer hardware, `ollama pull mistral-medium-3.5` then `ollama run mistral-medium-3.5`. For production, serve mistralai/Mistral-Medium-3.5 with vLLM on 4× H100 (see Hardware & setup above).

What does the unified design (Magistral + Pixtral + Devstral) mean?
One checkpoint replaces three previously separate Mistral models: Magistral (reasoning), Pixtral (vision), and Devstral (coding). You deploy, update, and prompt a single model for all three workloads.

Why does "modified MIT" license matter?
MIT is among the most permissive licenses, so an MIT-derived license generally allows commercial use, modification, and redistribution. The "modified" part means Mistral added its own terms, so review the license file in the Hugging Face repo before shipping.
Related models
- → Mistral Large 123B — predecessor
- → DeepSeek V4 — frontier MoE alternative, 1M context
- → Qwen3-Coder-Next — smaller, coding-specialized
- → Qwen3.6-27B — single-GPU dense alternative
- → GLM-5 — frontier open weight, 4× H100
- → Best AI models May 2026 — pillar comparison