Mistral AI · Open-Weight Coding Model

Devstral 2 Review: Mistral's Open Agentic Coding Model

Name: Devstral 2
Author: Mistral AI

Mistral AI released Devstral 2 on December 9, 2025 — a pair of open-weight, agentic coding models. The flagship Devstral 2 (123B) hits 72.2% on SWE-bench Verified, among the best open-weight scores for software engineering, while Devstral Small 2 (24B) scores 68.0% and is light enough to run on a single RTX 4090 or a 32GB Mac. Both share a 256K-token context window and ship with Mistral Vibe CLI, an open-source terminal coding agent. This review covers the specs, benchmarks, the licensing details that actually matter, and how to run Devstral on your own hardware.

📅 Published: June 20, 2026🔄 Last Updated: June 20, 2026✓ Manually Reviewed

Self-hostable. Unlike most frontier coding models, Devstral ships open weights — you can download and run it offline. For more local coding models, see Qwen3-Coder and our best local AI models for programming guide.

Key takeaways

→Two open models: Devstral 2 (123B) and Devstral Small 2 (24B) — both built for agentic coding, not just autocomplete.
→SOTA among open models: 72.2% (123B) and 68.0% (24B) on SWE-bench Verified — Mistral's positioning.
→256K context on both — enough to hold a real codebase in one prompt.
→Licenses differ: 24B is Apache 2.0 (no restriction); 123B is modified MIT, restricted only for companies above ~$20M/mo revenue.
→Runs locally: the 24B fits a single 4090 / 32GB Mac. Ships with the open-source Mistral Vibe CLI agent.

Quick verdict

Devstral 2 is the most credible open-weight answer to the wave of proprietary coding agents. The reason it matters here is simple: it is one of the few genuinely strong agentic coders you can download. If you care about keeping source code on your own machine — regulated work, IP-sensitive projects, or just avoiding a metered API — Devstral is the model to reach for, and Devstral Small 2 is the one most people should actually run.

Our take after weighing the specs: don't default to the 123B. Devstral Small 2 (24B) scores 68.0% on SWE-bench Verified — only ~4 points behind the flagship — fits on a single consumer GPU, and ships under the cleaner Apache 2.0 license with no commercial restriction. The 123B is the pick when you have the hardware and need the extra few points on hard, multi-file refactors. For most self-hosters, start with the 24B and the free Cline + Ollama setup and only step up if the smaller model leaves capability on the table.

Specs at a glance

Attribute	Devstral 2 (123B)	Devstral Small 2 (24B)
Vendor	Mistral AI	Mistral AI
Release date	December 9, 2025	December 9, 2025
Parameters	123B (dense)	24B
Context window	256K tokens	256K tokens
SWE-bench Verified	72.2%	68.0%
License	Modified MIT*	Apache 2.0
Open weights?	Yes	Yes
Local self-hostable?	Yes (high-VRAM / multi-GPU)	Yes (single 4090 / 32GB Mac)
Agentic features	Function calling · FIM · multi-file diffs · image input	Function calling · FIM · multi-file diffs · image input
Hosted API price	$0.40 / $2.00 per Mtok	$0.10 / $0.30 per Mtok
Access	Hugging Face · Ollama · LM Studio · Mistral API	Hugging Face · Ollama · LM Studio · Mistral API

*Devstral 2 (123B) uses a modified MIT license that restricts use by companies with global consolidated monthly revenue above ~$20M (see Licensing below). Devstral Small 2 (24B) is plain Apache 2.0 with no such restriction. Sources: Mistral AI announcement (Dec 9, 2025); VentureBeat launch coverage. API prices are list rates and change — verify before budgeting.

The Devstral line

Devstral 2 is the second generation of Mistral's code-specialized family. The release ships two models that share an architecture lineage and feature set but target different deployment realities:

Devstral 2 (123B) — the flagship

A 123-billion-parameter dense transformer aimed at complex, enterprise-grade software work: large multi-file edits, refactors, and integration into agentic pipelines. Top of the line at 72.2% SWE-bench Verified, but it needs serious hardware to self-host (or you use Mistral's API).

Devstral Small 2 (24B) — the local one

A 24B model deliberately sized for consumer hardware — a single RTX 4090 or a 32GB Mac. It scores 68.0% SWE-bench Verified, ships under Apache 2.0, and is the model most individuals and small teams will actually run. Fast inference means tight edit-test feedback loops, fully on-device.

Both are built specifically for agentic coding: exploring a repository, editing multiple files in one pass, calling tools, and retrying with corrections when a step fails — plus fill-in-the-middle editing, multi-file diffs, and image input for multimodal workflows. This is the same job class as terminal agents like Cline, Aider, and Claude Code, but with weights you control.

Benchmarks

Mistral positions both models as state-of-the-art among open-weight models at their size on SWE-bench Verified — the benchmark that measures whether a model can resolve real GitHub issues end to end. Treat the cross-size comparison as Mistral's framing; the headline figures are below.

Benchmark	Devstral 2 (123B)	Devstral Small 2 (24B)	Notes
SWE-bench Verified	72.2%	68.0%	Real GitHub-issue resolution; SOTA among open models per Mistral.
SWE-bench Multilingual	—	55.7%	Reported figure for the 24B model.
Context window	256K	256K	Architecture-level reasoning across large codebases.
Cost efficiency	~7× vs Claude Sonnet	—	Mistral's claim on real-world tasks — vendor figure.

Source: Mistral AI Devstral 2 announcement (Dec 9, 2025). SWE-bench Verified scores and the cost-efficiency comparison are Mistral's own reported figures; independent third-party benchmark results may differ. We report the vendor numbers and attribute them as such.

Licensing: read this before you deploy

The two models do not share a license, and the difference is the single most important thing to understand before you build on Devstral:

Devstral Small 2 (24B) — Apache 2.0

Standard, permissive open-source license. No revenue gate, no special terms. Free to use, modify, and ship commercially. This is the clean choice for products and startups.

Devstral 2 (123B) — modified MIT

MIT-style permissions, with one carve-out: per Mistral's license text, you are not authorized to use the model if your company's (or your employer's) global consolidated monthly revenue exceeds roughly $20 million for the preceding month — derivatives and combined works included. Large companies above that line must obtain a commercial license (sales@mistral.ai) or use Mistral's hosted service instead. For individuals, indie developers, and small companies, the 123B is usable commercially; only large enterprises hit the wall.

Bottom line: if you are a large enterprise, default to Devstral Small 2 (Apache 2.0) or get a commercial agreement for the 123B. If you are small, both are fine — but always read the actual license file shipped with the weights rather than relying on a summary.

Running Devstral on your own hardware

This is where Devstral earns its place on a local-AI site. Most strong coding agents are API-only; Devstral ships open weights you can pull and run offline. Rough hardware guidance:

Model	Practical local target	Notes
Devstral Small 2 (24B)	Single RTX 4090 (24GB) or 32GB Apple Silicon Mac	Runs quantized; the recommended starting point for self-hosting.
Devstral 2 (123B)	~64–80GB+ VRAM at 4-bit (multi-GPU / high-RAM workstation)	Downloadable, but heavy — many will prefer the hosted API here.

The fastest path: pull Devstral Small 2 through Ollama or LM Studio, then point a coding agent at it. Our Cline + Ollama setup guide walks through wiring a local model into a real editor-based agent. If you are choosing between local coders in this weight class, see best 14B coding models and the broader best local AI models for programming roundup.

Mistral Vibe CLI

Devstral 2 didn't launch alone — Mistral shipped Vibe CLI alongside it: an open-source, terminal-native agentic coding assistant under Apache 2.0. It is project-aware (it scans your repository structure automatically), orchestrates edits across multiple files, and integrates with IDEs — it launched with a Zed extension. It is Mistral's direct entry into the terminal coding-agent race against tools like Claude Code, and it is built to run on Devstral models, including a fully local Devstral Small 2 backend.

The practical implication: with Devstral Small 2 + Vibe CLI you can assemble an entirely self-hosted coding agent — open model, open tooling, no data leaving your machine.

Who should pick Devstral

If you are…	Best pick	Why
An indie dev / small team self-hosting	Devstral Small 2 (24B)	Runs on a single 4090 / 32GB Mac, Apache 2.0, 68.0% SWE-bench.
Doing hard, multi-file enterprise refactors	Devstral 2 (123B)	Top open-weight SWE-bench (72.2%); check the revenue restriction.
A large enterprise (>$20M/mo)	Devstral Small 2 or 123B w/ license	24B is unrestricted Apache 2.0; 123B needs a commercial agreement.
Comparing local coders in this class	Qwen3-Coder	Another strong open coding family worth benchmarking against.
Surveying the open-weight field	Best open-source LLMs 2026	Full landscape of downloadable models, coding and general.

Frequently asked questions

Can I run Devstral locally?

Yes — that is the whole point of the line. Devstral Small 2 (24B) is built for local deployment: Mistral and early reviewers report it runs on a single RTX 4090 or a 32GB Mac, so your code never leaves your machine. The larger Devstral 2 (123B) is also open-weight and downloadable, but at 123B dense parameters it needs serious hardware (roughly 64–80GB+ of VRAM at 4-bit; comfortably a multi-GPU or high-RAM workstation setup). Both ship on Hugging Face and are available through Ollama and LM Studio.

What is the difference between Devstral 2 and Devstral Small 2?

Size, license, and where you run them. Devstral 2 is a 123B dense transformer under a modified MIT license; it scores 72.2% on SWE-bench Verified and is the flagship for complex, multi-file enterprise work. Devstral Small 2 is a 24B model under the more permissive Apache 2.0 license, scores 68.0% on SWE-bench Verified, and is optimized to run on consumer hardware. Both share the same 256K context window and agentic feature set (function calling, fill-in-the-middle, multi-file diffs, image input). For most people self-hosting, Small 2 is the practical choice; the 4-point SWE-bench gap is small relative to the hardware savings.

What license is Devstral under — can I use it commercially?

It depends which model. Devstral Small 2 (24B) is Apache 2.0 — no usage restriction, free for commercial use. Devstral 2 (123B) is a modified MIT license: free for commercial use UNLESS your company (or your employer) has global consolidated monthly revenue above roughly $20 million, in which case Mistral's license text says you are not authorized to use it without a commercial agreement (contact sales@mistral.ai). For small teams and individuals both models are usable commercially; only very large companies hit the 123B restriction. Always read the actual license file before deploying in production.

How good is Devstral 2 at coding?

Strong for an open-weight model. Devstral 2 reaches 72.2% on SWE-bench Verified and Devstral Small 2 reaches 68.0% — Mistral positions these among the best open-weight scores for software-engineering tasks at their respective sizes. They are tuned specifically for agentic coding: exploring a repo, editing multiple files, running tools, and retrying on failure, rather than just single-snippet completion. They will not top the absolute frontier set by the best proprietary models, but for a model you can download and run yourself, the SWE-bench numbers are competitive.

What is Mistral Vibe CLI?

Vibe CLI is Mistral's open-source, terminal-native agentic coding assistant, released under Apache 2.0 alongside Devstral 2. It is project-aware — it scans your repo structure, orchestrates edits across multiple files, and integrates with IDEs (it shipped as a Zed extension). It is Mistral's answer to terminal coding agents like Claude Code, and it is designed to run on Devstral models, including a fully local Devstral Small 2 backend.

How much does the Devstral API cost?

Mistral's hosted API lists Devstral 2 (123B) at $0.40 per million input tokens and $2.00 per million output tokens, and Devstral Small 2 (24B) at $0.10 / $0.30 per million tokens. Mistral claims Devstral 2 is up to roughly 7× more cost-efficient than Claude Sonnet on real-world tasks (their figure — treat it as a vendor claim). If you self-host the open weights, the API price is moot: you pay for hardware and electricity, not per token.

How big is Devstral's context window?

Both Devstral 2 and Devstral Small 2 have a 256K-token context window — large enough to hold a sizeable codebase, dependency graph, and conversation history in a single prompt. That matters for agentic coding, where the model needs to track state across many files. It is smaller than the 1M-token windows some proprietary models now advertise, but 256K is ample for most real repositories.

Build a self-hosted coding agent

Devstral Small 2 plus an open agent runtime gives you a private, offline coding assistant with zero per-token cost. The Local AI Master deployment course shows you how to pull open-weight models like Devstral and Qwen3-Coder, quantize them for your GPU, and wire them into a real editor-based agent.

See the deployment course →

Related models & guides

→ Qwen3-Coder — another strong open-weight coding family to benchmark against
→ Best local AI models for programming — the full self-hostable coder roundup
→ Best 14B coding models — local coders sized for consumer GPUs
→ Cline + Ollama setup — wire a local model into an editor-based agent
→ Best open-source LLMs 2026 — the wider open-weight landscape

🎯

AI Learning Path

Go from reading about AI to building with AI

20 structured courses. Hands-on projects. Runs on your machine. Start free.

Start free Browse courses first

Or own it for life — Lifetime $149 $599, pay once

Training your whole team? Get a team quote →

Written by the Local AI Master Team

The team behind Local AI Master

We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.

✓ Local AI Curriculum✓ Hands-On Projects✓ Open Source Contributor

GitHub LinkedIn Twitter