Mistral AI · Open-Weight Coding Model
Devstral 2 Review: Mistral's Open Agentic Coding Model
Mistral AI released Devstral 2 on December 9, 2025 — a pair of open-weight, agentic coding models. The flagship Devstral 2 (123B) hits 72.2% on SWE-bench Verified, among the best open-weight scores for software engineering, while Devstral Small 2 (24B) scores 68.0% and is light enough to run on a single RTX 4090 or a 32GB Mac. Both share a 256K-token context window and ship with Mistral Vibe CLI, an open-source terminal coding agent. This review covers the specs, benchmarks, the licensing details that actually matter, and how to run Devstral on your own hardware.
Self-hostable. Unlike most frontier coding models, Devstral ships open weights — you can download and run it offline. For more local coding models, see Qwen3-Coder and our best local AI models for programming guide.
Key takeaways
- →Two open models: Devstral 2 (123B) and Devstral Small 2 (24B) — both built for agentic coding, not just autocomplete.
- →SOTA among open models: 72.2% (123B) and 68.0% (24B) on SWE-bench Verified — Mistral's positioning.
- →256K context on both — enough to hold a real codebase in one prompt.
- →Licenses differ: 24B is Apache 2.0 (no restriction); 123B is modified MIT, restricted only for companies above ~$20M/mo revenue.
- →Runs locally: the 24B fits a single 4090 / 32GB Mac. Ships with the open-source Mistral Vibe CLI agent.
Quick verdict
Devstral 2 is the most credible open-weight answer to the wave of proprietary coding agents. The reason it matters here is simple: it is one of the few genuinely strong agentic coders you can download. If you care about keeping source code on your own machine — regulated work, IP-sensitive projects, or just avoiding a metered API — Devstral is the model to reach for, and Devstral Small 2 is the one most people should actually run.
Our take after weighing the specs: don't default to the 123B. Devstral Small 2 (24B) scores 68.0% on SWE-bench Verified — only ~4 points behind the flagship — fits on a single consumer GPU, and ships under the cleaner Apache 2.0 license with no commercial restriction. The 123B is the pick when you have the hardware and need the extra few points on hard, multi-file refactors. For most self-hosters, start with the 24B and the free Cline + Ollama setup and only step up if the smaller model leaves capability on the table.
Specs at a glance
| Attribute | Devstral 2 (123B) | Devstral Small 2 (24B) |
|---|---|---|
| Vendor | Mistral AI | Mistral AI |
| Release date | December 9, 2025 | December 9, 2025 |
| Parameters | 123B (dense) | 24B |
| Context window | 256K tokens | 256K tokens |
| SWE-bench Verified | 72.2% | 68.0% |
| License | Modified MIT* | Apache 2.0 |
| Open weights? | Yes | Yes |
| Local self-hostable? | Yes (high-VRAM / multi-GPU) | Yes (single 4090 / 32GB Mac) |
| Agentic features | Function calling · FIM · multi-file diffs · image input | Function calling · FIM · multi-file diffs · image input |
| Hosted API price | $0.40 / $2.00 per Mtok | $0.10 / $0.30 per Mtok |
| Access | Hugging Face · Ollama · LM Studio · Mistral API | Hugging Face · Ollama · LM Studio · Mistral API |
*Devstral 2 (123B) uses a modified MIT license that restricts use by companies with global consolidated monthly revenue above ~$20M (see Licensing below). Devstral Small 2 (24B) is plain Apache 2.0 with no such restriction. Sources: Mistral AI announcement (Dec 9, 2025); VentureBeat launch coverage. API prices are list rates and change — verify before budgeting.
The Devstral line
Devstral 2 is the second generation of Mistral's code-specialized family. The release ships two models that share an architecture lineage and feature set but target different deployment realities:
Devstral 2 (123B) — the flagship
A 123-billion-parameter dense transformer aimed at complex, enterprise-grade software work: large multi-file edits, refactors, and integration into agentic pipelines. Top of the line at 72.2% SWE-bench Verified, but it needs serious hardware to self-host (or you use Mistral's API).
Devstral Small 2 (24B) — the local one
A 24B model deliberately sized for consumer hardware — a single RTX 4090 or a 32GB Mac. It scores 68.0% SWE-bench Verified, ships under Apache 2.0, and is the model most individuals and small teams will actually run. Fast inference means tight edit-test feedback loops, fully on-device.
Both are built specifically for agentic coding: exploring a repository, editing multiple files in one pass, calling tools, and retrying with corrections when a step fails — plus fill-in-the-middle editing, multi-file diffs, and image input for multimodal workflows. This is the same job class as terminal agents like Cline, Aider, and Claude Code, but with weights you control.
Benchmarks
Mistral positions both models as state-of-the-art among open-weight models at their size on SWE-bench Verified — the benchmark that measures whether a model can resolve real GitHub issues end to end. Treat the cross-size comparison as Mistral's framing; the headline figures are below.
| Benchmark | Devstral 2 (123B) | Devstral Small 2 (24B) | Notes |
|---|---|---|---|
| SWE-bench Verified | 72.2% | 68.0% | Real GitHub-issue resolution; SOTA among open models per Mistral. |
| SWE-bench Multilingual | — | 55.7% | Reported figure for the 24B model. |
| Context window | 256K | 256K | Architecture-level reasoning across large codebases. |
| Cost efficiency | ~7× vs Claude Sonnet | — | Mistral's claim on real-world tasks — vendor figure. |
Source: Mistral AI Devstral 2 announcement (Dec 9, 2025). SWE-bench Verified scores and the cost-efficiency comparison are Mistral's own reported figures; independent third-party benchmark results may differ. We report the vendor numbers and attribute them as such.
Licensing: read this before you deploy
The two models do not share a license, and the difference is the single most important thing to understand before you build on Devstral:
Devstral Small 2 (24B) — Apache 2.0
Standard, permissive open-source license. No revenue gate, no special terms. Free to use, modify, and ship commercially. This is the clean choice for products and startups.
Devstral 2 (123B) — modified MIT
MIT-style permissions, with one carve-out: per Mistral's license text, you are not authorized to use the model if your company's (or your employer's) global consolidated monthly revenue exceeds roughly $20 million for the preceding month — derivatives and combined works included. Large companies above that line must obtain a commercial license (sales@mistral.ai) or use Mistral's hosted service instead. For individuals, indie developers, and small companies, the 123B is usable commercially; only large enterprises hit the wall.
Bottom line: if you are a large enterprise, default to Devstral Small 2 (Apache 2.0) or get a commercial agreement for the 123B. If you are small, both are fine — but always read the actual license file shipped with the weights rather than relying on a summary.
Running Devstral on your own hardware
This is where Devstral earns its place on a local-AI site. Most strong coding agents are API-only; Devstral ships open weights you can pull and run offline. Rough hardware guidance:
| Model | Practical local target | Notes |
|---|---|---|
| Devstral Small 2 (24B) | Single RTX 4090 (24GB) or 32GB Apple Silicon Mac | Runs quantized; the recommended starting point for self-hosting. |
| Devstral 2 (123B) | ~64–80GB+ VRAM at 4-bit (multi-GPU / high-RAM workstation) | Downloadable, but heavy — many will prefer the hosted API here. |
The fastest path: pull Devstral Small 2 through Ollama or LM Studio, then point a coding agent at it. Our Cline + Ollama setup guide walks through wiring a local model into a real editor-based agent. If you are choosing between local coders in this weight class, see best 14B coding models and the broader best local AI models for programming roundup.
Mistral Vibe CLI
Devstral 2 didn't launch alone — Mistral shipped Vibe CLI alongside it: an open-source, terminal-native agentic coding assistant under Apache 2.0. It is project-aware (it scans your repository structure automatically), orchestrates edits across multiple files, and integrates with IDEs — it launched with a Zed extension. It is Mistral's direct entry into the terminal coding-agent race against tools like Claude Code, and it is built to run on Devstral models, including a fully local Devstral Small 2 backend.
The practical implication: with Devstral Small 2 + Vibe CLI you can assemble an entirely self-hosted coding agent — open model, open tooling, no data leaving your machine.
Who should pick Devstral
| If you are… | Best pick | Why |
|---|---|---|
| An indie dev / small team self-hosting | Devstral Small 2 (24B) | Runs on a single 4090 / 32GB Mac, Apache 2.0, 68.0% SWE-bench. |
| Doing hard, multi-file enterprise refactors | Devstral 2 (123B) | Top open-weight SWE-bench (72.2%); check the revenue restriction. |
| A large enterprise (>$20M/mo) | Devstral Small 2 or 123B w/ license | 24B is unrestricted Apache 2.0; 123B needs a commercial agreement. |
| Comparing local coders in this class | Qwen3-Coder | Another strong open coding family worth benchmarking against. |
| Surveying the open-weight field | Best open-source LLMs 2026 | Full landscape of downloadable models, coding and general. |
Frequently asked questions
Can I run Devstral locally?
What is the difference between Devstral 2 and Devstral Small 2?
What license is Devstral under — can I use it commercially?
How good is Devstral 2 at coding?
What is Mistral Vibe CLI?
How much does the Devstral API cost?
How big is Devstral's context window?
Build a self-hosted coding agent
Devstral Small 2 plus an open agent runtime gives you a private, offline coding assistant with zero per-token cost. The Local AI Master deployment course shows you how to pull open-weight models like Devstral and Qwen3-Coder, quantize them for your GPU, and wire them into a real editor-based agent.
See the deployment course →Related models & guides
- → Qwen3-Coder — another strong open-weight coding family to benchmark against
- → Best local AI models for programming — the full self-hostable coder roundup
- → Best 14B coding models — local coders sized for consumer GPUs
- → Cline + Ollama setup — wire a local model into an editor-based agent
- → Best open-source LLMs 2026 — the wider open-weight landscape
Go from reading about AI to building with AI
20 structured courses. Hands-on projects. Runs on your machine. Start free.
Written by the Local AI Master Team
The team behind Local AI Master
We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.