
Free Tool · No Signup

Coding Model Router: the right local model for your coding job

Pick what you're coding (autocomplete, refactor, debug, agentic multi-file, or tests) and your hardware (VRAM or Apple Silicon memory). You get the best open-weight model that actually fits — Qwen3-Coder, Devstral Small 2, Codestral, or DeepSeek-Coder — plus an honest call on when a cloud model (Claude or GPT) is the smarter route. Two clicks, no signup, deterministic.

📅 Published: June 20, 2026 · 🔄 Last Updated: June 20, 2026 · ✓ Manually Reviewed


How the routing works

Coding is not one workload; it's five, and they make very different demands on a model. Autocomplete is latency-bound and favors a small, fast fill-in-the-middle (FIM) model. Refactoring and debugging need context and reasoning over existing code. Agentic multi-file work (driving a coding agent such as Cline or Aider) needs a model strong enough to plan and edit across files without going off the rails. Test generation sits in between. The router crosses your task with your VRAM tier and returns the smallest open-weight model that does that job well at Q4 quantization, and it tells you when the job has outgrown your hardware and cloud is the honest answer.
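Because the routing is a fixed mapping from two inputs to one answer, it can be sketched as a pure dictionary lookup. This is a minimal Python sketch, not the tool's actual code: the `Task` and `Tier` enums, the `Route` record, and the `ROUTES` table are hypothetical names, and the table holds only the four worked examples published on this page.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Task(Enum):
    AUTOCOMPLETE = "autocomplete"
    REFACTOR = "refactor"
    DEBUG = "debug"
    AGENTIC = "agentic multi-file"
    TESTS = "tests"


class Tier(Enum):
    CPU_ONLY = "CPU only"
    VRAM_8GB = "8GB VRAM"
    VRAM_12GB = "12GB VRAM"
    VRAM_24GB = "24GB VRAM"


class Route(NamedTuple):
    model: str
    fallback: Optional[str]
    cloud: str  # when a cloud model is the better call


# Hypothetical table: only the worked examples from this page are filled in.
ROUTES = {
    (Task.AUTOCOMPLETE, Tier.VRAM_12GB): Route("Codestral 22B", "Qwen 2.5 Coder 7B", "rarely needed"),
    (Task.AGENTIC, Tier.VRAM_24GB): Route("Devstral Small 2 (24B)", None, "hardest 10-20% only"),
    (Task.DEBUG, Tier.VRAM_8GB): Route("Qwen 2.5 Coder 7B", None, "wide cross-file bugs"),
    (Task.REFACTOR, Tier.CPU_ONLY): Route("Qwen 2.5 Coder 7B", None, "anything long or multi-file"),
}


def route(task: Task, tier: Tier) -> Optional[Route]:
    """Pure lookup: identical (task, tier) inputs always return the identical Route."""
    return ROUTES.get((task, tier))
```

A pure lookup like this is what makes the output deterministic: there is no model call or randomness between your two clicks and the recommendation.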

The model facts are real and current (June 2026). Codestral 22B is the fill-in-the-middle specialist (~12GB at Q4, 256K context). DeepSeek-Coder-V2-Lite 16B scores 81.1% on HumanEval with only ~2.4B active parameters, so it stays fast on modest GPUs. Devstral Small 2 (24B) is the breakout: 68% on SWE-bench Verified, purpose-built for coding agents, and designed to run on a single RTX 4090 (24GB) or a 32GB Mac, which is why 24GB is the tier where local agentic coding gets genuinely good. For deeper comparisons, start with our pillar on the best local AI models for programming.
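The Q4 sizes above follow from simple arithmetic: parameters × bits per weight ÷ 8. A rough Python sketch, assuming ~4.5 bits per weight for a Q4 quant and a flat 1.5GB allowance for KV cache and runtime overhead (both are illustrative assumptions; real usage grows with context length). For Codestral 22B it gives ~12.4GB, in line with the ~12GB figure above.

```python
def q4_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate quantized weight size in GB (1 GB = 10**9 bytes).

    bits_per_weight=4.5 is an assumed average for a Q4 quant, not an exact spec.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """True if the weights plus an assumed KV-cache/runtime allowance fit in memory."""
    return q4_weights_gb(params_billion) + overhead_gb <= vram_gb


print(q4_weights_gb(24))        # 13.5 -> Devstral Small 2 leaves headroom on 24GB
print(q4_weights_gb(16))        # 9.0 -> all 16B MoE weights must be resident
print(fits(7, 8), fits(16, 8))  # True False -> why 8GB caps you at ~7B
```

The middle line is the MoE caveat: DeepSeek-Coder-V2-Lite activates only ~2.4B parameters per token, which sets its speed, but all 16B still have to sit in memory. Active parameters set speed; total parameters set memory.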

Worked examples

Autocomplete · 12GB VRAM (RTX 4070)

Codestral 22B for state-of-the-art FIM (a tight fit: its ~12GB Q4 footprint leaves little room for context on a 12GB card), with Qwen 2.5 Coder 7B as the lighter fallback. Cloud is rarely needed; local latency wins here.
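Fill-in-the-middle means the model sees the code on both sides of the cursor. If you serve the model through Ollama, its `/api/generate` endpoint accepts a `prompt` (text before the cursor) and a `suffix` (text after it). A minimal Python sketch, assuming a local Ollama server on its default port 11434 and a model tag such as `qwen2.5-coder:7b` already pulled; whether `suffix` is honored depends on the model's template supporting FIM.

```python
import json
import urllib.request

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # assumed default local address


def fim_payload(model: str, before_cursor: str, after_cursor: str) -> dict:
    """Build a fill-in-the-middle request body for Ollama's /api/generate."""
    return {
        "model": model,
        "prompt": before_cursor,        # code before the cursor
        "suffix": after_cursor,         # code after the cursor
        "stream": False,                # return one JSON object instead of a token stream
        "options": {"temperature": 0},  # keep completions as repeatable as possible
    }


def complete(payload: dict, url: str = OLLAMA_GENERATE_URL) -> str:
    """POST the request and return the generated middle section."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]


payload = fim_payload("qwen2.5-coder:7b", "def add(a, b):\n    ", "\n\nprint(add(2, 3))")
# print(complete(payload))  # requires a running Ollama server with the model pulled
```

Building the payload separately from sending it lets you inspect exactly what the editor would send without a server running.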

Agentic multi-file · 24GB VRAM (RTX 4090)

→ Devstral Small 2 (24B) — wire it into VS Code with the Cline + Ollama setup guide. Cloud only for the hardest 10-20%.

Debug · 8GB VRAM

Qwen 2.5 Coder 7B for in-file fixes; reach for a cloud model on wide cross-file bugs. 8GB caps you at ~7B.

Refactor · CPU only

Qwen 2.5 Coder 7B for short edits (slow on CPU); cloud strongly recommended for anything long or multi-file.

When cloud is actually worth it

We don't pretend local always wins. Cloud earns its keep in three situations: weak hardware (8GB VRAM or CPU-only, where the strong coding-agent models simply don't fit), the hardest tasks (tangled architectural refactors, subtle concurrency or memory bugs, anything where frontier reasoning still leads), and when raw throughput matters. If you do go cloud, pick the right tier — our guide to the best Claude model for coding breaks down Opus 4.8 vs Sonnet 4.6 vs Haiku so you don't overpay. Most productive setups are hybrid: a local model for the routine 80% of edits and completions, cloud for the gnarly 20%.
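The three cases reduce to a small predicate. A hedged Python sketch: the `Job` fields and helper names are hypothetical, and gating the weak-hardware case on multi-file work is our reading of the worked examples, which still route in-file fixes and short edits on 8GB or CPU-only machines to a local 7B model.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Job:
    vram_gb: int            # 0 means CPU-only
    multi_file: bool        # agentic or cross-file work needing a coding-agent-class model
    hard: bool              # tangled architectural refactor, subtle concurrency/memory bug
    needs_throughput: bool  # raw throughput matters more than privacy or per-token cost


def cloud_reasons(job: Job) -> List[str]:
    """Return which 'cloud is worth it' cases apply; an empty list means stay local."""
    reasons = []
    if job.vram_gb <= 8 and job.multi_file:
        reasons.append("weak hardware: coding-agent models don't fit")
    if job.hard:
        reasons.append("hardest task: frontier reasoning still leads")
    if job.needs_throughput:
        reasons.append("raw throughput matters")
    return reasons


def backend(job: Job) -> str:
    """Hybrid default: local for routine work, cloud only when a reason applies."""
    return "cloud" if cloud_reasons(job) else "local"


routine = Job(vram_gb=24, multi_file=True, hard=False, needs_throughput=False)
print(backend(routine))  # local
```

Most routine edits produce an empty reasons list and stay local, which matches the 80/20 hybrid split described above.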

Frequently asked questions

How does the coding model router work?
You pick two things: the coding job (autocomplete, refactor, debug, agentic multi-file edits, or writing tests) and your hardware tier (VRAM amount or Apple Silicon memory). The router maps that combination to the open-weight model that actually fits and performs well for that job, and tells you honestly when a cloud model (Claude or GPT) is the better call. Output is deterministic — the same two inputs always give the same recommendation.
Which local coding models does it recommend?
Real, currently relevant open-weight models: Qwen 2.5 Coder 7B (small and fast, great at fill-in-the-middle), Codestral 22B (state-of-the-art FIM, 256K context), DeepSeek-Coder-V2-Lite 16B (81.1% HumanEval, MoE with only ~2.4B active parameters), Devstral Small 2 24B (68% SWE-bench Verified, built for coding agents, runs on a single RTX 4090), and Qwen3-Coder for higher tiers. Larger Apple Silicon also unlocks the Devstral-class Mistral Medium 3.5.
When should I use cloud (Claude or GPT) instead of a local model?
Local wins for autocomplete and most everyday edits: it is private, free per token, and low-latency. Cloud is worth it in three cases: (1) you are on weak hardware (8GB VRAM or CPU-only) where the bigger coding-agent models will not fit; (2) the task is a hard multi-file agentic change or a subtle architectural or concurrency bug where frontier reasoning still leads; (3) you need maximum throughput. The router flags which of these applies to your exact combination.
Is Devstral Small 2 really good enough to replace cloud for agentic coding?
For a lot of real work, yes. Devstral Small 2 (24B) scores 68% on SWE-bench Verified — competitive with much larger models — and Mistral designed it to run on a single RTX 4090 (24GB) or a 32GB Mac. That makes the 24GB tier the inflection point where local coding agents (via Cline or Aider) become genuinely useful. The hardest ~10-20% of tasks still favor a frontier cloud model.
Is the tool free, and does it send my code anywhere?
Free, no signup, and it runs entirely in your browser: it only computes a model recommendation from your two selections. It never sees or sends your code. The local models it recommends also keep your code on your machine; only the optional cloud path involves an API.

Want to wire a local model into your editor?

Once the router picks a model, the next step is connecting it to VS Code as a real coding agent. Our Cline + Ollama guide walks through running Devstral Small 2, Qwen3-Coder, or DeepSeek-Coder locally and driving multi-file edits — no API keys, no per-token bills.
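Before pointing an editor agent at a model, it helps to confirm the model is actually pulled. Ollama's `GET /api/tags` endpoint lists local models under a `models` key, each with a `name` such as `qwen2.5-coder:7b`. A minimal Python sketch, assuming the default `localhost:11434` address; `missing_models` is a hypothetical helper, and untagged names are treated as `:latest`, matching Ollama's default tag.

```python
import json
import urllib.request
from typing import List


def list_local_models(base_url: str = "http://localhost:11434") -> List[str]:
    """Return the names of models the local Ollama server has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as response:
        data = json.loads(response.read())
    return [m["name"] for m in data.get("models", [])]


def _normalize(name: str) -> str:
    """Treat an untagged name like 'codestral' as 'codestral:latest'."""
    return name if ":" in name else f"{name}:latest"


def missing_models(wanted: List[str], installed: List[str]) -> List[str]:
    """Return the wanted models that are not yet installed, in the order requested."""
    have = {_normalize(n) for n in installed}
    return [w for w in wanted if _normalize(w) not in have]


# Usage against a live server (not run here):
# print(missing_models(["qwen2.5-coder:7b"], list_local_models()))
```

Anything `missing_models` returns can be fetched with `ollama pull <name>` before you wire the model into Cline.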

Cline + Ollama setup guide →

Written by the Local AI Master Team

The team behind Local AI Master

We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
