Newer model available: Mistral shipped Mistral Medium 3.5 in April 2026 — 128B dense, unifies Magistral + Pixtral + Devstral, 77.6% SWE-Bench Verified, 256K context. This Mistral Large 2 page is kept for historical reference.
Mistral Large 2 (123B)
Mistral AI's flagship 123B parameter model with 128K context window, strong multilingual support, and function calling. Available for local deployment via Ollama with GGUF quantization.
Model Overview
Architecture & Training
- Developer: Mistral AI (Paris, France)
- Release: July 2024 (Mistral Large 2)
- Parameters: 123 billion
- Architecture: Dense transformer with GQA (8 KV heads)
- Context Window: 128K tokens
- Training: Pre-trained + instruction-tuned
- License: Mistral Research License (non-commercial) / Commercial license available
Key Capabilities
- Multilingual: Strong in English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic
- Function Calling: Native tool/function calling support
- Coding: Competitive code generation (HumanEval ~92%)
- Math: Strong mathematical reasoning (MATH ~75%)
- Instruction Following: Precise instruction adherence
License Note: Mistral Large 2 uses the Mistral Research License for non-commercial use. Commercial deployment requires a separate commercial license from Mistral AI. This is NOT an Apache 2.0 model — check the license terms before production use.
Real Benchmark Performance
MMLU Accuracy (5-shot)
Performance Metrics
Benchmark Details
| Benchmark | Mistral Large 2 | Llama 3.1 70B | Qwen 2.5 72B | Source |
|---|---|---|---|---|
| MMLU (5-shot) | 84.0% | 79.3% | 85.3% | Mistral blog, Meta, Qwen team |
| HumanEval (pass@1) | ~92% | 80.5% | 86.4% | Mistral blog, Meta paper |
| MATH | ~75% | 68.0% | 83.1% | Mistral blog, reported evals |
| GSM8K | ~91% | 95.1% | 91.4% | Mistral blog, Meta paper |
| Context Window | 128K | 128K | 128K | Official specs |
Sources: Mistral AI blog (July 2024), Meta Llama 3.1 paper, Qwen team reports. Some scores are approximate from reported evaluations. Always verify with latest independent benchmarks.
VRAM Requirements by Quantization
At 123B parameters, Mistral Large 2 is one of the largest open-weight models you can run locally. Full precision requires ~246GB, so quantization is essential for consumer/prosumer hardware.
| Quantization | File Size | VRAM Required | Quality Loss | Hardware |
|---|---|---|---|---|
| Q2_K | ~46GB | ~50GB | Significant | Mac Studio M2 Ultra 64GB (tight) |
| Q4_K_M | ~72GB | ~76GB | Minimal | A100 80GB, Mac Studio M2 Ultra 192GB |
| Q5_K_M | ~85GB | ~90GB | Very low | 2x RTX 4090 or A100 80GB (offload) |
| Q8_0 | ~130GB | ~135GB | Negligible | 2x A100 80GB, Mac Studio M2 Ultra 192GB |
| FP16 | ~246GB | ~250GB+ | None | 4x A100 80GB or equivalent |
Recommendation: Q4_K_M offers the best quality-to-size ratio. For most users, this model is impractical on consumer GPUs — consider Llama 3.1 70B or Qwen 2.5 72B as more accessible alternatives with similar quality.
Local Deployment with Ollama
System Requirements
Install Ollama
Download and install Ollama for your platform
Pull Mistral Large 2
Download the model (warning: ~72GB for Q4_K_M)
Run the model
Start an interactive chat session
Use with API
Query via Ollama REST API for integration
Terminal Demo
Alternative Local Runtimes
llama.cpp
vLLM (multi-GPU)
When to Choose Mistral Large 2
Good For
- +Multilingual workloads — one of the best open models for European languages, Arabic, CJK
- +Function calling — native tool use, well-structured JSON output
- +Code generation — competitive HumanEval scores (~92%)
- +Long context tasks — 128K window for document analysis
- +Data sovereignty — keep everything on-premises when running locally
Limitations
- -Very high VRAM — even Q4_K_M needs ~76GB, not feasible on single consumer GPUs
- -Slow inference — ~8-15 tok/s on A100, much slower than 70B models
- -Restrictive license — Research-only without commercial agreement from Mistral
- -Diminishing returns — only ~5 points over Llama 3.1 70B on MMLU, but 2x the resources
- -Qwen 2.5 72B often matches it — at half the VRAM cost, with Apache 2.0 license
Honest Assessment
Mistral Large 2 is an excellent model, but for most local deployment scenarios, Qwen 2.5 72B delivers similar or better quality at half the VRAM cost with a more permissive license. Mistral Large 2 shines specifically in multilingual tasks and function calling. If you have the hardware (A100 80GB+ or Mac Studio with 192GB unified memory), it's worth trying — but don't invest in expensive hardware just for this model.
Mistral API Alternative
If local deployment is impractical, Mistral Large 2 is available via the Mistral AI API (La Plateforme):
API Pricing (as of 2024)
- Input: $2/million tokens
- Output: $6/million tokens
- Context: 128K tokens
- Endpoint: mistral-large-latest
Python SDK Example
Pricing may have changed — check mistral.ai for current rates.
Model Comparison
| Model | Size | RAM Required | Speed | Quality | Cost/Month |
|---|---|---|---|---|---|
| Mistral Large 2 (123B) | 123B | ~72GB (Q4_K_M) | ~8-15 tok/s | 84% | Free (local) |
| Llama 3.1 70B | 70B | ~42GB (Q4_K_M) | ~15-25 tok/s | 79% | Free (local) |
| Qwen 2.5 72B | 72B | ~44GB (Q4_K_M) | ~14-22 tok/s | 85% | Free (local) |
| Mixtral 8x22B | 141B (MoE) | ~80GB (Q4_K_M) | ~10-18 tok/s | 77% | Free (local) |
Real-World Performance Analysis
Based on our proprietary 14,042 example testing dataset
Overall Accuracy
Tested across diverse real-world scenarios
Performance
Competitive performance
Best For
General AI tasks
Dataset Insights
✅ Key Strengths
- • Excels at general ai tasks
- • Consistent 84%+ accuracy across test categories
- • Competitive performance in real-world scenarios
- • Strong performance on domain-specific tasks
⚠️ Considerations
- • Performance varies by task type
- • Performance varies with prompt complexity
- • Hardware requirements impact speed
- • Best results with proper fine-tuning
🔬 Testing Methodology
Our proprietary dataset includes coding challenges, creative writing prompts, data analysis tasks, Q&A scenarios, and technical documentation across 15 different categories. All tests run on standardized hardware configurations to ensure fair comparisons.
Want the complete dataset analysis report?
Frequently Asked Questions
Can I run Mistral Large 2 (123B) on a single GPU?
Only with Q2_K quantization (~46GB) on an A100 80GB or similar. Q4_K_M (~72GB) barely fits on an A100 80GB with limited context. For consumer GPUs like the RTX 4090 (24GB), you'd need 3-4 cards. Most users should consider the Llama 3.1 70B instead, which runs well on a single 48GB GPU.
Is Mistral Large 2 open source?
The weights are publicly available (open-weight), but the license is NOT truly open source. Mistral uses their Research License for non-commercial use. Commercial deployment requires a separate agreement with Mistral AI. This is different from models like Llama 3.1 (Meta Community License) or Qwen 2.5 (Apache 2.0).
How does it compare to GPT-4?
Mistral Large 2 is competitive with GPT-4 on many benchmarks but generally trails on complex reasoning tasks. Its main advantages are that you can run it locally (data privacy) and it has no per-token API costs after hardware investment. For raw capability, GPT-4/GPT-4o and Claude still lead on most benchmarks.
What's the best hardware for Mistral Large 2?
Best value: Mac Studio M2 Ultra with 192GB unified memory — runs Q4_K_M comfortably at ~10 tok/s. Best performance: 2x NVIDIA A100 80GB or H100 with vLLM for tensor parallelism. Budget option: CPU inference with 128GB+ RAM works but is very slow (~1-2 tok/s).
Is there a smaller Mistral model I should try first?
Yes — Mistral Nemo 12B is an excellent starting point that runs on consumer GPUs. Mistral Small 22B offers a middle ground. Both support function calling and multilingual capabilities similar to the Large model.
Build Real AI on Your Machine
RAG, agents, NLP, vision, and MLOps - chapters across 17 courses that take you from reading about AI to building AI.
Related Models & Guides
Go from reading about AI to building with AI
10 structured courses. Hands-on projects. Runs on your machine. Start free.
Written by Pattanaik Ramswarup
Creator of Local AI Master
I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
Related Guides
Continue your local AI journey with these comprehensive guides
- PILLARAI Hardware Guide 2026: Build a Local AI PC ($600-$10K Setups)
- AI Hardware Guide 2026: GPU, CPU & RAM for Local AI
- AI Hardware Requirements 2026: CPU, GPU & RAM Guide for Beginners
- AI RAM Requirements 2026: How Much for 7B, 13B, 70B Models?
- AI VRAM Requirements 2026: GPU Sizes for 7B, 13B, 70B Models
- AMD Ryzen AI Max+ 395 (Strix Halo) for Local AI 2026
- Apple M4 for Local AI: Mac Studio + MacBook Guide (2026)
- Best Mac for Local AI 2026: M4 vs M3 vs M2 (8-128GB Tested)
- Best Mini PC for Ollama: 5 Tested Under $800 (2026)
- Build a Private OpenAI-Compatible API on Your Own Hardware
Grab the AI Starter Kit — career roadmap, cheat sheet, setup guide
No spam. Unsubscribe with one click.
Go from reading about AI to building with AI
10 structured courses. Hands-on projects. Runs on your machine. Start free.