Aider + Ollama Setup (2026): Free Local AI Coding Agent
To run Aider fully local on Ollama, install Aider (the quickest way is python -m pip install aider-install then aider-install), pull a coding model with ollama pull qwen2.5-coder:14b, set OLLAMA_API_BASE=http://127.0.0.1:11434, then launch aider --model ollama_chat/qwen2.5-coder:14b inside a git repo. That gives you a free, private, git-native pair programmer in your terminal — no API key, no per-token bill, and every edit auto-committed so you can undo anything. Aider is the most mature terminal coding agent (Apache 2.0, ~41k GitHub stars), and unlike IDE-bound tools it works the same whether you use VS Code, Vim, or no editor at all.
This guide covers the exact install, the one easy-to-miss detail (use the ollama_chat/ prefix, not ollama/), which Ollama models actually code well, how Aider's architect/editor split and repo-map work, and how it compares to Cline and Goose.
Why Aider + Ollama instead of a cloud coding agent?
Aider is a command-line AI pair programmer that edits files in your local git repository. The Ollama pairing matters for three concrete reasons:
- It is free and stays free. Cloud agents meter you per token; a local model on Ollama costs nothing per request after the one-time download. For a tool you leave running all day, that difference compounds fast.
- Your code never leaves the machine. Aider sends file contents, a repo map, and chat history to the model. With Ollama, "the model" is a process on localhost — nothing goes to a third-party API. For proprietary or client code under NDA, that is the whole point.
- It is git-native. Whenever Aider edits a file it commits the change with a descriptive message, so every AI edit is its own reviewable, revertible commit. You get a clean audit trail instead of a mystery diff.
The honest trade-off: a 7B-14B local model is not GPT-5 or Claude. It is genuinely good at focused edits, refactors, and boilerplate, and noticeably weaker than frontier cloud models on sprawling multi-file architecture. The architect/editor split below is how you close some of that gap. If you want the broader picture of building an all-local stack, see our complete 2026 local AI developer toolchain.
How do you install Aider and connect it to Ollama?
You need two things running: Ollama (serving a model) and Aider (the agent). Assuming you already have Ollama installed, here is the full path.
1. Install Aider. The maintainers' quickest method is the bootstrap installer:
python -m pip install aider-install
aider-install
If you prefer a clean, isolated install (recommended on a dev box so Aider's dependencies don't collide with your project's), use uv:
uv tool install --force --python python3.12 --with pip aider-chat@latest
2. Pull a coding model with Ollama:
ollama pull qwen2.5-coder:14b
3. Point Aider at your local Ollama server:
export OLLAMA_API_BASE=http://127.0.0.1:11434 # Mac/Linux
# Windows (PowerShell/CMD): setx OLLAMA_API_BASE http://127.0.0.1:11434 (then restart the shell)
4. Launch Aider in a git repo:
cd your-project
aider --model ollama_chat/qwen2.5-coder:14b
That's it. Aider drops you into a chat prompt; describe a change in plain English and it edits the files and commits the result.
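If the launch fails, the usual culprit is that the Ollama server isn't running or the model was never pulled. Here is a minimal pre-flight sketch, assuming Ollama's `GET /api/tags` endpoint returns a JSON object with a `models` list whose entries carry a `name` field. The helper names `fetch_local_models` and `model_is_pulled` are made up for this example and are not part of Aider or Ollama.

```python
import json
import os
import urllib.request
from typing import Optional

DEFAULT_BASE = "http://127.0.0.1:11434"  # same address used for OLLAMA_API_BASE above


def model_is_pulled(tags_response: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags style response."""
    names = {entry.get("name") for entry in tags_response.get("models", [])}
    return model in names


def fetch_local_models(base_url: Optional[str] = None, timeout: float = 3.0) -> dict:
    """Ask the Ollama server which models are available locally."""
    base = (base_url or os.environ.get("OLLAMA_API_BASE", DEFAULT_BASE)).rstrip("/")
    with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `model_is_pulled(fetch_local_models(), "qwen2.5-coder:14b")` should tell you whether step 2 succeeded; a connection error from `fetch_local_models` (an `OSError` subclass) suggests the server isn't up or `OLLAMA_API_BASE` points at the wrong place.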
Use the ollama_chat/ prefix, not ollama/
This is the one detail people get wrong. Aider's docs explicitly recommend ollama_chat/<model> over ollama/<model> — the chat endpoint produces better results with Aider's prompting. So it is --model ollama_chat/qwen2.5-coder:14b, not --model ollama/qwen2.5-coder:14b.
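If you launch Aider from scripts, a tiny guard keeps the wrong prefix from sneaking in. This is only a sketch; `to_aider_model` is a hypothetical helper name, not something Aider ships.

```python
def to_aider_model(name: str) -> str:
    """Map a bare Ollama tag or an ollama/ name to the ollama_chat/ form."""
    for prefix in ("ollama_chat/", "ollama/"):
        if name.startswith(prefix):
            name = name[len(prefix):]  # strip whichever prefix was supplied
            break
    return f"ollama_chat/{name}"


print(to_aider_model("ollama/qwen2.5-coder:14b"))
```

Whatever form you pass in, the value that reaches `--model` ends up with the `ollama_chat/` prefix.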
Fix the context window (the silent quality killer)
By default Aider sizes Ollama's context window to fit each request plus about 8k tokens for the reply, which is fine for small edits. For real repo work you want a larger, fixed window. Create a .aider.model.settings.yml in your project root:
- name: ollama_chat/qwen2.5-coder:14b
  extra_params:
    num_ctx: 32768
Bump num_ctx as high as your VRAM allows — a bigger window lets Aider hold more of the repo map and more files in chat at once, which is where most of the quality comes from. Not sure what fits your card? Run the numbers through our VRAM calculator before you set it too high.
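To see why the context window eats VRAM, here is a rough back-of-envelope sketch of the key/value cache size. It uses the standard estimate (two tensors per layer, one vector per KV head per token) and assumes an fp16 cache. The architecture numbers for Qwen2.5-Coder-14B (48 layers, 8 KV heads, head dimension 128) are an assumption you should check against the model card, and real usage will differ with runtime overhead or a quantized cache.

```python
def kv_cache_bytes(num_ctx: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Keys plus values: 2 tensors per layer, each num_ctx x kv_heads x head_dim."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * num_ctx


# Assumed architecture for Qwen2.5-Coder-14B; verify against the model card.
QWEN25_CODER_14B = dict(layers=48, kv_heads=8, head_dim=128)

for ctx in (8192, 16384, 32768):
    gib = kv_cache_bytes(ctx, **QWEN25_CODER_14B) / 1024**3
    print(f"num_ctx={ctx:>6}: ~{gib:.1f} GiB of KV cache at fp16")
```

Under these assumptions the cache grows linearly with `num_ctx`, from roughly 1.5 GiB at 8K to roughly 6 GiB at 32K, and that sits on top of the model weights.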
Which Ollama models work best with Aider?
Aider works with any Ollama model, but edit quality varies a lot. These are the local models worth running, all verified against their official model cards:
| Model | Ollama tag | Size | HumanEval | VRAM (Q4_K_M) | Best for |
|---|---|---|---|---|---|
| Qwen2.5-Coder-14B-Instruct | qwen2.5-coder:14b | 14.7B dense | 89.6% | ~9.5 GB | Best balance for one 12-16 GB GPU |
| Qwen2.5-Coder-32B-Instruct | qwen2.5-coder:32b | 32B dense | 92.7% | ~19 GB | Highest quality if you have 24 GB |
| Qwen3-Coder-30B-A3B-Instruct | (community GGUF) | 30.5B MoE / 3.3B active | agentic-focused | ~18 GB | Long-context agentic coding (256K ctx) |
| DeepSeek-Coder-V2-Lite-Instruct | deepseek-coder-v2:16b | 16B MoE / 2.4B active | ~81% (vendor) | ~10.5 GB | Fast first-token, 128K ctx |
| Qwen2.5-Coder-7B-Instruct | qwen2.5-coder:7b | 7.6B dense | 88.4% | ~5 GB | Small GPUs / laptops |
A few honest notes. Qwen2.5-Coder-14B is the default recommendation — 89.6% HumanEval at roughly 9.5 GB makes it the strongest model that comfortably fits a single 12 GB or 16 GB GPU, and it was trained for fill-in-the-middle so it edits cleanly. Step up to the 32B (92.7% HumanEval) only if you have a 24 GB card. Qwen3-Coder-30B-A3B is a Mixture-of-Experts model (30.5B total, ~3.3B active per token) tuned specifically for agentic coding with native 256K context — promising for Aider's longer sessions, but at launch it ships mainly as community GGUF quants rather than an official Ollama-library tag, so confirm the quant before relying on it. DeepSeek-Coder-V2-Lite is the speed pick: its 16B MoE activates only 2.4B params per token, so first tokens come back fast. (Its 236B big sibling hits 90.2% HumanEval but needs a server, not a desktop.) For the full cross-size leaderboard, see our guide to the best Ollama model for coding, and if you're choosing within the 12-16 GB bracket, our best 14B coding models breakdown ranks them by HumanEval and VRAM.
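The selection logic in those notes boils down to "highest HumanEval that still fits your card". A small sketch of that rule, using only the figures from the table: the 2 GB headroom for context and runtime overhead is an assumption, and Qwen3-Coder-30B-A3B is left out because the table gives it no HumanEval number or library tag.

```python
from typing import Optional

# (Ollama tag, HumanEval %, approx. VRAM in GB at Q4_K_M), copied from the table.
MODELS = [
    ("qwen2.5-coder:7b", 88.4, 5.0),
    ("deepseek-coder-v2:16b", 81.0, 10.5),
    ("qwen2.5-coder:14b", 89.6, 9.5),
    ("qwen2.5-coder:32b", 92.7, 19.0),
]


def pick_model(vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the best-scoring tag that fits, or None if nothing does."""
    budget = vram_gb - headroom_gb  # leave room for the context window
    fitting = [m for m in MODELS if m[2] <= budget]
    if not fitting:
        return None
    return max(fitting, key=lambda m: m[1])[0]


for card in (8, 12, 24):
    print(f"{card} GB card -> {pick_model(card)}")
```

HumanEval alone ignores speed, so treat the result as a starting point: on the same budget you might still prefer the DeepSeek MoE for its faster first token.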
What is the architect/editor split, and why use it?
Aider has several chat modes you switch between mid-session: code (default, makes edits), ask (discuss without editing), architect, and help. The architect mode is the one that meaningfully improves results with local models.
In architect mode Aider uses two models in a two-pass design: an architect model reasons about the change and writes a plan in prose, then an editor model translates that plan into precise file edits in Aider's diff format. Splitting "think about the problem" from "produce a perfectly formatted diff" helps, because smaller local models often struggle to do both at once. Launch it with the --architect flag (or --chat-mode architect), and you can set a separate editor model:
aider --architect \
--model ollama_chat/qwen2.5-coder:32b \
--editor-model ollama_chat/qwen2.5-coder:14b
A practical pattern: stay in ask mode while you and the model agree on a plan, then say "go ahead" to execute — or reach for architect mode whenever a change touches more than a couple of files. You switch on the fly with the /code, /ask, /architect, and /help slash commands.
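If you start Aider from a wrapper script, the two launch styles differ only by a couple of flags. A minimal sketch that builds the argument list using just the flags shown above (`--architect`, `--model`, `--editor-model`); the function name `aider_command` is hypothetical.

```python
import shlex
from typing import List, Optional


def aider_command(model: str, editor_model: Optional[str] = None) -> List[str]:
    """Build an aider argv; passing editor_model switches on architect mode."""
    argv = ["aider"]
    if editor_model is not None:
        argv.append("--architect")
    argv += ["--model", f"ollama_chat/{model}"]
    if editor_model is not None:
        argv += ["--editor-model", f"ollama_chat/{editor_model}"]
    return argv


print(shlex.join(aider_command("qwen2.5-coder:32b", "qwen2.5-coder:14b")))
```

The returned list can be handed to `subprocess.run` from inside a git repo; with no `editor_model` it falls back to the plain single-model launch.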
How does the repo map work?
The repo map is what lets a small local model punch above its weight on a large codebase. Aider sends the model three things: the files you've explicitly added to the chat (full contents), a repo map (a concise skeleton of the rest of the repository, listing the key classes and functions with their signatures and trimmed to a token budget), and the conversation history. The model can see how a file relates to the rest of the project without you pasting the entire repo into context.
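To make "skeleton" concrete, here is a toy version of the idea for Python files only. It is not how Aider builds its map (Aider uses its own parsing and ranking across languages); this sketch swaps in Python's standard `ast` module purely for illustration and needs Python 3.9+ for `ast.unparse`. The names `skeleton` and `SAMPLE` are made up.

```python
import ast


def skeleton(source: str) -> list:
    """Return class and function signatures from Python source, bodies omitted."""
    lines = []

    def visit(body, indent):
        for node in body:
            if isinstance(node, ast.ClassDef):
                lines.append(f"{indent}class {node.name}:")
                visit(node.body, indent + "    ")  # descend to pick up methods
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                lines.append(f"{indent}{kind} {node.name}({ast.unparse(node.args)})")

    visit(ast.parse(source).body, "")
    return lines


SAMPLE = '''
class UserService:
    def create_user(self, email: str):
        return email

def health():
    return "ok"
'''

print("\n".join(skeleton(SAMPLE)))
```

The output keeps names and signatures and drops every function body, which is why a map like this costs far fewer tokens than the files themselves.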
Because only the files you add are sent in full, /add is the most useful slash command: it pulls specific files into the chat so the model can edit them directly. The everyday command set:
| Command | What it does |
|---|---|
| /add path/to/file.py | Add a file (or /add src/*.py, or a directory) to the chat for editing |
| /drop file.py | Remove a file from the chat once it's done |
| /ask <question> | Ask about the code without making edits |
| /run <cmd> | Run a shell command and optionally add its output to the chat |
| /diff | Show the diff of changes since the last message |
| /undo | Undo Aider's last commit |
| /clear | Discard the chat history for a fresh start |
Because each edit is its own git commit, /undo is genuinely safe — it just reverts the last commit Aider made. If you've used Continue.dev with Ollama for inline autocomplete, think of Aider as the complementary tool: Continue lives in your editor for completions, Aider lives in the terminal for whole-task, multi-file edits.
Aider vs Cline vs Goose: which local agent fits you?
All three are free, open, and run on local Ollama models, but they live in different places and suit different workflows. Verified facts on each:
| Tool | Where it runs | License | Model wiring | Best when |
|---|---|---|---|---|
| Aider | Terminal (CLI) | Apache 2.0 | --model ollama_chat/<model> + OLLAMA_API_BASE | You want git-native, terminal-first, editor-agnostic edits |
| Cline | VS Code extension | Apache 2.0 | Ollama provider in settings | You live in VS Code and want approve-every-step autonomy + browser/MCP tools |
| Goose | Desktop app + CLI | Apache 2.0 | Ollama provider + MCP extensions | You want a standalone agent (not tied to an editor) with heavy MCP tooling |
The short version: pick Aider if your workflow is terminal- and git-centric and you want the most mature, lowest-overhead option. Pick Cline if you live inside VS Code and want a human-in-the-loop agent that asks before every file change — see our Cline + Ollama setup. Pick Goose if you want a standalone desktop/CLI agent with a large MCP extension ecosystem — covered in Goose + Ollama. None of them locks you in; many developers keep Aider for fast terminal edits and a second tool for editor-integrated work.
A real terminal session (what it actually looks like)
Here is a representative session against a Python project, edited for length:
$ aider --model ollama_chat/qwen2.5-coder:14b
Aider v0.x | Model: ollama_chat/qwen2.5-coder:14b | Git repo: .git
> /add api/users.py
Added api/users.py to the chat
> add input validation to create_user and return 422 on bad email
Editing api/users.py
- imports email-validator, validates payload.email
- raises HTTP 422 with a clear message on failure
Committed: a3f10c2 feat: validate email in create_user, return 422
> /diff
(shows the committed diff)
> /undo
Removed last commit a3f10c2
Notice the loop: add the file, describe the change, Aider edits and commits, you inspect with /diff, and /undo reverts cleanly because it's all git underneath. No copy-paste, no leaving the terminal.
First-hand notes on speed and VRAM
Approximate, from a single machine — treat as ballpark, not a benchmark. On an RTX 3090 (24 GB) running qwen2.5-coder:14b at Q4_K_M, I saw roughly 35-45 tokens/sec generating edits, with the whole model GPU-offloaded; the 32B at the same quant dropped to about 18-22 tokens/sec but produced cleaner first-try diffs on multi-file refactors. The DeepSeek-Coder-V2-Lite MoE felt the snappiest on first-token latency, as expected from its 2.4B active params. The practical lesson held every time: the moment any layer spills from VRAM into system RAM, throughput collapses — keep the entire model on the GPU, and raise num_ctx only as far as your remaining VRAM allows. For a 12 GB card, the 14B at Q4 plus a 16K-32K context is the sweet spot; below that, drop to qwen2.5-coder:7b.
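To translate those tokens/sec figures into waiting time, divide the reply length by the throughput. A quick sketch using the ballpark ranges above; the 600-token reply size is a hypothetical stand-in for one multi-file diff, and prompt processing time is ignored.

```python
def seconds_for_reply(reply_tokens: int, tokens_per_sec: float) -> float:
    """Generation time only; prompt processing is not included."""
    return reply_tokens / tokens_per_sec


REPLY_TOKENS = 600  # hypothetical size of one multi-file diff

# Ballpark throughput ranges (tokens/sec) quoted in the notes above.
RANGES = {"14B": (35, 45), "32B": (18, 22)}

for label, (low, high) in RANGES.items():
    slowest = seconds_for_reply(REPLY_TOKENS, low)
    fastest = seconds_for_reply(REPLY_TOKENS, high)
    print(f"{label}: about {fastest:.0f}-{slowest:.0f} s per {REPLY_TOKENS}-token reply")
```

On these numbers the 32B roughly doubles the wait per edit, so it only pays off when its cleaner first-try diffs save you a retry.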
Key Takeaways
- Aider + Ollama = a free, private, git-native coding agent in your terminal. No API key, no per-token cost, code never leaves localhost, and every edit is its own revertible commit.
- Install fast, then wire it up: pip install aider-install && aider-install (or uv tool install ... aider-chat@latest), set OLLAMA_API_BASE=http://127.0.0.1:11434, and launch with aider --model ollama_chat/<model>.
- Use the ollama_chat/ prefix, not ollama/; it's the maintainers' recommendation and gives better edits.
- Qwen2.5-Coder-14B is the default model pick (89.6% HumanEval, ~9.5 GB); raise num_ctx in .aider.model.settings.yml for real repo work.
- Architect mode + the repo map are how a small local model handles big codebases: a reasoning pass plus a precise-edit pass, with a skeleton of the whole repo always in context.
- Aider is the terminal/git-native pick; Cline is the VS Code pick; Goose is the standalone-agent pick. All three are free and run on local Ollama.
Next Steps
- Verify everything against the source: the Aider GitHub repository and the official Aider + Ollama docs.
- New to Ollama itself? Start with our complete Ollama guide, then come back here.
- Choosing a model for your GPU? Compare the field in best Ollama model for coding and the 12-16 GB best 14B coding models ranking.
- Prefer edits inside your editor? Set up inline completion with Continue.dev + Ollama, or pick a different agent in Cline + Ollama / Goose + Ollama.
- Building the whole local stack? See the complete 2026 local AI developer toolchain.