
Mistral · Open-Weight · Modified MIT

Mistral Medium 3.5: 128B Dense, 4-GPU Open-Weight

Mistral Medium 3.5 (April 30, 2026) is the French AI lab Mistral's unified flagship — 128 billion dense parameters, 256K context, 77.6% SWE-Bench Verified, modified MIT licensed. The big design choice: it replaces three previously-separate Mistral models (Magistral / Pixtral / Devstral) with one model that handles general reasoning, vision, and coding equally well. Runs on 4× H100 at full precision, or 1× H100 / 2× RTX 5090 at Q4 quantization. This is the realistic open-weight choice for prosumer hardware.

📅 Published: May 9, 2026 · 🔄 Last Updated: May 9, 2026 · ✓ Manually Reviewed

Key takeaways

  • 128B dense — no MoE complexity, predictable VRAM, simpler deployment.
  • Unified model — replaces Magistral, Pixtral, Devstral with one model handling all three.
  • 77.6% SWE-Bench Verified — competitive with DeepSeek V4-Flash (78.4%).
  • 256K context — bigger than most prosumer-tier alternatives.
  • Runs on 1× H100 at Q4 — accessible without an 8× H100 cluster.

Quick verdict

Mistral Medium 3.5 is the right pick when you want a unified general/coding/vision model on prosumer infrastructure. Dense architecture means simpler deployment than DeepSeek V4-Flash's MoE. Single H100 at Q4 makes it viable without cluster-grade hardware.

Where it loses: peak coding quality (DeepSeek V4-Flash edges it at 78.4% vs 77.6% on SWE-Bench Verified) and context length (DeepSeek V4-Flash offers 1M tokens, 4× Medium 3.5's 256K). For pure coding workloads on a single GPU, the smaller Qwen3-Coder-Next or Qwen3.6-27B may be the better fit. For mixed coding + research + vision, Mistral Medium 3.5 is the cleanest single-model option.

Specs at a glance

| Spec | Value |
|---|---|
| Vendor | Mistral AI |
| Architecture | Dense transformer (no MoE) |
| Parameters | 128 billion |
| Context window | 256,000 tokens |
| Modalities | Text · Code · Vision |
| License | Modified MIT |
| Storage (BF16) | ~256 GB |
| Storage (Q4_K_M) | ~80 GB |
| Hugging Face | mistralai/Mistral-Medium-3.5 |

Hardware & setup

| Hardware | Quant | Context | Tokens/sec |
|---|---|---|---|
| 4× H100 80GB | BF16 | 256K | 80-130 tok/s |
| 1× H100 80GB | Q4_K_M | 256K | 35-55 tok/s |
| 2× RTX 5090 (32GB each) | Q4_K_M | 128K (reduced) | 25-40 tok/s |
| 1× M3 Ultra (192GB) | Q5_K_M | 256K | 15-28 tok/s |

Ollama (single-GPU prosumer)

ollama pull mistral-medium-3.5
ollama run mistral-medium-3.5
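
Once the pull finishes, a quick smoke test confirms the model is serving. This sketch assumes Ollama's default port (11434) and its OpenAI-compatible endpoint, with the model tag from the pull command above:

# Smoke test via Ollama's OpenAI-compatible API (default port 11434)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-medium-3.5",
    "messages": [{"role": "user", "content": "Write a one-line Python hello world."}]
  }'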

vLLM (production, 4× H100)

python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Medium-3.5 \
  --tensor-parallel-size 4 \
  --max-model-len 262144 --port 8000
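
With the server up, any OpenAI-compatible client can talk to it. A minimal curl check (the model name matches the `--model` flag above; vLLM runs without auth by default):

# Query the vLLM OpenAI-compatible server started above
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Medium-3.5",
    "messages": [{"role": "user", "content": "Summarize the tradeoffs of dense vs MoE models."}],
    "max_tokens": 200
  }'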

Benchmarks

| Benchmark | Mistral Medium 3.5 | DeepSeek V4-Flash | Qwen3-Coder-Next | GLM-5 |
|---|---|---|---|---|
| SWE-Bench Verified | 77.6% | 78.4% | 70.6% | 77.8% |
| MMLU-Pro | 85.2% | 83.8% | 81.4% | 84.6% |
| GPQA Diamond | 76.4% | 76.9% | 71.2% | 79.4% |
| AIME 2025 | 81.6% | 82.4% | 76.8% | 85.2% |
| Vision-MME (image QA) | 73.4% | N/A | N/A | 68.7% |

When to pick Mistral Medium 3.5

  • You want one model for general work + coding + vision (replaces 3 Mistral models).
  • Dense architecture preference (simpler than MoE — predictable VRAM, no expert routing).
  • Single H100 / 2× RTX 5090 hardware (Q4 quantization).
  • EU sovereignty matters — Mistral is Paris-based, GDPR-aligned governance.

FAQ

What is Mistral Medium 3.5?
Mistral Medium 3.5 is the French AI lab Mistral's flagship open-weight model, released April 30, 2026. It's a 128-billion-parameter dense transformer (no MoE) with a 256K context window; it scores 77.6% on SWE-Bench Verified and ships under a modified MIT license that permits commercial use. Mistral Medium 3.5 unifies what were previously three separate models (Magistral for general, Pixtral for vision, Devstral for coding) into one — all three are now retired in favor of the unified Medium 3.5.
How much VRAM does Mistral Medium 3.5 need?
At BF16 (full precision), Mistral Medium 3.5 weights total ~256 GB — needs 4× H100 (80 GB each, 320 GB total) for stable inference. Q4_K_M quantization brings it to ~80 GB, which fits on 1× H100 80GB or 2× RTX 5090 (32 GB each, 64 GB total — tight, requires reduced context). Q5_K_M is ~96 GB. For most teams, 4× H100 with BF16 is the sweet spot. For prosumer/consumer hardware, Q4_K_M on 2× RTX 5090 with 128K context (instead of full 256K) is the realistic config.
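
These figures follow from simple arithmetic: weight footprint ≈ parameter count × bytes per parameter. A back-of-envelope sketch (the bytes-per-param values for the GGUF K-quants are approximate averages, and KV cache plus activations add overhead on top of the weights):

# Rough weight footprint = params * bytes/param (KV cache and activations are extra)
awk 'BEGIN { p = 128e9;
  printf "BF16:   ~%.0f GB\n", p * 2.0   / 1e9;
  printf "Q5_K_M: ~%.0f GB\n", p * 0.75  / 1e9;
  printf "Q4_K_M: ~%.0f GB\n", p * 0.625 / 1e9 }'
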
Mistral Medium 3.5 vs DeepSeek V4-Flash: which to pick?
Both are accessible open-weight options for prosumer infrastructure. Hardware: Mistral Medium 3.5 dense ~80 GB Q4 (1× H100 or 2× RTX 5090) vs DeepSeek V4-Flash ~150 GB Q4 (2× H100). Benchmarks: Medium 3.5 77.6% SWE-Bench Verified vs V4-Flash 78.4% — essentially tied on coding. V4-Flash wins on context length (1M vs 256K). Mistral wins on simplicity (dense, no MoE complexity). For most teams: pick Mistral Medium 3.5 if hardware budget caps at 1-2 GPUs; pick V4-Flash if you have 2× H100 and need the 1M context.
How do I install Mistral Medium 3.5?
Ollama: `ollama pull mistral-medium-3.5` (default Q4_K_M, ~80 GB) then `ollama run mistral-medium-3.5`. For vLLM serving: `python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-Medium-3.5 --tensor-parallel-size 4 --max-model-len 262144 --port 8000`. Apple Silicon: llama.cpp with Metal backend works on M3 Max/Ultra (~25-40 tok/s at Q4). Cursor/Continue/Aider integration: point any tool at the OpenAI-compatible endpoint with model name `mistral-medium-3.5`.
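
For the editor integrations, most OpenAI-compatible tools only need a base URL and a placeholder API key. A sketch using the OpenAI SDK's standard environment variables (individual tools may use their own setting names, so check their docs):

# Point an OpenAI-compatible tool at the local vLLM server from above
# (OPENAI_BASE_URL / OPENAI_API_KEY follow the OpenAI SDK convention;
#  some tools expose equivalent fields in their own config files)
export OPENAI_BASE_URL=http://localhost:8000/v1
export OPENAI_API_KEY=local-no-key-needed
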
What does the unified design (Magistral + Pixtral + Devstral) mean?
Before Medium 3.5, Mistral shipped three specialized models: Magistral (general reasoning), Pixtral (vision), Devstral (coding). Operationally a pain — different APIs, different fine-tunes, different licenses. Medium 3.5 unifies all three into one model with strong performance across all domains. Vision: handles image input natively (no separate Pixtral). Coding: matches old Devstral on SWE-Bench. Reasoning: matches old Magistral on math benchmarks. The benefit is operational simplicity — one model, one deployment, one fine-tuning workflow.
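
Because vision is native, image inputs go through the same endpoint as text. A minimal sketch against the vLLM server from the setup section, assuming the deployment accepts OpenAI-style image_url message parts (the diagram URL is a placeholder):

# Image + text in one request; same server, no separate vision model
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Medium-3.5",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What does this architecture diagram show?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/diagram.png"}}
      ]
    }]
  }'
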
Why does “modified MIT” license matter?
Mistral's modified MIT permits unlimited commercial use, modification, and redistribution. The "modification" adds a clause prohibiting use for training competitive models above a certain scale. In practice this affects almost no one — only AI labs trying to clone Mistral's model would hit the restriction. Day-to-day commercial use, fine-tuning, distillation for product-specific purposes, embedding in SaaS, and self-hosting are all fully permitted with no royalties. Compare to Apache 2.0 (no restrictions) or Llama 4 (modified license with usage thresholds and attribution) — Mistral's license is in between but lenient for typical use cases.
