Free Tool · No Signup
Coding Model Router: the right local model for your coding job
Pick what you're coding (autocomplete, refactor, debug, agentic multi-file, or tests) and your hardware (VRAM or Apple Silicon memory). You get the best open-weight model that actually fits — Qwen3-Coder, Devstral Small 2, Codestral, or DeepSeek-Coder — plus an honest call on when a cloud model (Claude or GPT) is the smarter route. Two clicks, no signup, deterministic.
How the routing works
Coding is not one workload. It is five, and they make very different demands on a model. Autocomplete is latency-bound and loves a small, fast fill-in-the-middle model. Refactoring and debugging need context and reasoning over existing code. Agentic multi-file work (driving an editor agent like Cline or Aider) needs a model strong enough to plan and edit across files without going off the rails. Test generation sits between those extremes. The router crosses your task with your VRAM tier and returns the smallest open-weight model that does that job well at Q4 quantization, and tells you when the job has outgrown your hardware and cloud is the honest answer.
Model facts are current as of June 2026. Codestral 22B is the fill-in-the-middle specialist (~12GB at Q4; the open 22B weights ship with a 32K context window, while the 256K window belongs to the later API-only Codestral releases). DeepSeek-Coder-V2-Lite 16B is a mixture-of-experts model with only ~2.4B active parameters, so it stays fast on modest GPUs; its instruct variant reports 81.1% on HumanEval (the 90.2% figure belongs to the full 236B DeepSeek-Coder-V2). Devstral Small 2 (24B) is the breakout: 68% on SWE-bench Verified, purpose-built for coding agents, and designed to run on a single RTX 4090 (24GB) or a 32GB Mac, which is why 24GB is the tier where local agentic coding gets genuinely good. For deeper comparisons, start with our pillar on the best local AI models for programming.
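The "~12GB at Q4" figure follows from simple arithmetic, sketched below. The 4.5 bits per weight and the flat 2GB allowance for KV cache and runtime are assumptions chosen for illustration, not values the router is confirmed to use; real footprints vary with the quantization variant and context length.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.5 is an assumed average for a Q4-style quantization.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if the weights plus an assumed flat overhead fit in the given VRAM."""
    return q4_weight_gb(params_billion) + overhead_gb <= vram_gb


for name, size in [
    ("Codestral 22B", 22),
    ("Devstral Small 2 (24B)", 24),
    ("DeepSeek-Coder-V2-Lite 16B", 16),
]:
    print(f"{name}: ~{q4_weight_gb(size):.1f} GB of weights at Q4")
```

Under these assumptions a 24B model leaves room for context on a 24GB card but does not fit on an 8GB one, which matches the tiers described above.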
Worked examples
Autocomplete · 12GB VRAM (RTX 4070)
→ Codestral 22B for state-of-the-art FIM, with Qwen 2.5 Coder 7B as the lighter fallback. Cloud is rarely needed — local latency wins here.
Agentic multi-file · 24GB VRAM (RTX 4090)
→ Devstral Small 2 (24B) — wire it into VS Code with the Cline + Ollama setup guide. Cloud only for the hardest 10-20%.
Debug · 8GB VRAM
→ Qwen 2.5 Coder 7B for in-file fixes; reach for a cloud model on wide cross-file bugs. 8GB caps you at ~7B.
Refactor · CPU only
→ Qwen 2.5 Coder 7B for short edits (slow on CPU); cloud strongly recommended for anything long or multi-file.
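The worked examples can be read as a small lookup: for each task, walk a best-first list of candidates and return the first one whose minimum VRAM tier you meet, falling back to cloud otherwise. The sketch below is illustrative only. The `ROUTES` table, the `Pick` type, and the `route` function are hypothetical names, the table covers just the four examples on this page, and the 8GB floor for the autocomplete fallback is an assumption.

```python
from typing import NamedTuple


class Pick(NamedTuple):
    model: str
    note: str


CLOUD = Pick("cloud model", "job has outgrown the local hardware")

# task -> candidates, best first: (min VRAM in GB, model, note).
# A minimum of 0 means CPU-only is allowed (expect it to be slow).
ROUTES = {
    "autocomplete": [
        (12, "Codestral 22B", "FIM specialist"),
        (8, "Qwen 2.5 Coder 7B", "lighter fallback"),
    ],
    "agentic": [
        (24, "Devstral Small 2 (24B)", "cloud only for the hardest 10-20%"),
    ],
    "debug": [
        (8, "Qwen 2.5 Coder 7B", "in-file fixes; cloud for wide cross-file bugs"),
    ],
    "refactor": [
        (0, "Qwen 2.5 Coder 7B", "short edits; cloud for long or multi-file work"),
    ],
}


def route(task: str, vram_gb: int) -> Pick:
    """Return the first candidate that fits the VRAM tier, else the cloud fallback."""
    if task not in ROUTES:
        raise ValueError(f"unknown task: {task!r}")
    for min_vram, model, note in ROUTES[task]:
        if vram_gb >= min_vram:
            return Pick(model, note)
    return CLOUD


print(route("autocomplete", 12).model)
print(route("agentic", 8).model)
```

Because the table is plain data and the function has no randomness, the same inputs always give the same pick, which is what "deterministic" means for a router like this.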
When cloud is actually worth it
We don't pretend local always wins. Cloud earns its keep in three situations: weak hardware (8GB VRAM or CPU-only, where the strong coding-agent models simply don't fit), the hardest tasks (tangled architectural refactors, subtle concurrency or memory bugs, anything where frontier reasoning still leads), and workloads where raw throughput matters. If you do go cloud, pick the right tier: our guide to the best Claude model for coding breaks down Opus 4.8 vs Sonnet 4.6 vs Haiku so you don't overpay. Most productive setups are hybrid: a local model for the routine 80% of edits and completions, cloud for the gnarly 20%.
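The three situations can be expressed as a simple check that returns the reasons, if any, for escalating a job to cloud. This is a minimal sketch: the function name, the boolean flags, and the 8GB threshold as the cutoff for "weak hardware" are assumptions drawn from the paragraph above, and how you decide that a task is "hard" is left to you.

```python
def cloud_reasons(vram_gb: int, hard_task: bool, needs_throughput: bool) -> list[str]:
    """List the reasons to escalate to cloud; an empty list means stay local."""
    reasons = []
    if vram_gb <= 8:  # 8GB VRAM or CPU-only (0)
        reasons.append("weak hardware")
    if hard_task:  # e.g. architectural refactor, concurrency or memory bug
        reasons.append("hardest tasks")
    if needs_throughput:
        reasons.append("raw throughput")
    return reasons


# Routine edit on a 24GB card: no reason to leave local.
print(cloud_reasons(24, hard_task=False, needs_throughput=False))
# Subtle concurrency bug on an 8GB card: two reasons to go cloud.
print(cloud_reasons(8, hard_task=True, needs_throughput=False))
```

In a hybrid setup, an empty list keeps the job on the local model and anything else sends it to cloud.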
Frequently asked questions
How does the coding model router work?
Which local coding models does it recommend?
When should I use cloud (Claude or GPT) instead of a local model?
Is Devstral Small 2 really good enough to replace cloud for agentic coding?
Is the tool free, and does it send my code anywhere?
Want to wire a local model into your editor?
Once the router picks a model, the next step is connecting it to VS Code as a real coding agent. Our Cline + Ollama guide walks through running Devstral Small 2, Qwen3-Coder, or DeepSeek-Coder locally and driving multi-file edits — no API keys, no per-token bills.
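Before wiring up the editor, it can help to confirm that the local Ollama server answers at all. The sketch below builds a request for Ollama's `/api/generate` endpoint on its default port, 11434. The helper names are hypothetical, and the model tag is left as a parameter because the exact tag depends on what you have pulled locally. Cline talks to Ollama on its own; this is only a smoke test.

```python
import json
import urllib.request


def build_request(model: str, prompt: str,
                  host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage, with the tag replaced by a model you have pulled:
# print(generate("your-model-tag", "Write a Python function that reverses a string."))
```

Everything stays on localhost, so no API key is involved.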
Cline + Ollama setup guide →
Related tools & resources
- → VRAM Calculator — exact VRAM needs for any model + quantization
- → AI Model Finder — match any GPU to the right model (all use cases)
- → Best local AI models for programming — the full coding pillar
- → Best Claude model for coding — Opus 4.8 vs Sonnet 4.6 vs Haiku
- → Mistral Medium 3.5 — the Devstral-class unified model
Written by the Local AI Master Team
The team behind Local AI Master
We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.