Roo Code Shut Down — Best Local Alternative (Self-Hosted Coding Agent + Ollama)
Want to go deeper than this article?
Free account unlocks the first chapter of all 20 courses — RAG, agents, MCP, voice AI, MLOps, real GitHub repos.
Ollama’s running. Here’s what to build with it. Go from “ollama run” to RAG apps, agents, and fine-tuned models — structured and hands-on. First chapter free.
Yes — Roo Code shut down. The team announced it on April 21, 2026 and archived the VS Code extension on May 15, 2026, pivoting to a cloud agent (roomote.dev) because they no longer believe the IDE is the future of coding. The best replacement if you want to keep working locally is a fully self-hosted coding agent — Cline (the upstream project Roo forked from, and Roo's own recommended successor) or Kilo Code (an active Roo fork that reads your existing .roomodes and .roo/rules/ files) — pointed at a local Ollama model like Qwen3-Coder 30B A3B. No cloud account, no per-token billing, and your proprietary code never leaves the machine. The archived Roo extension still runs after May 15 (it doesn't self-destruct), but it gets no updates, so a migration is the right move — and it's a good moment to drop cloud lock-in entirely.
This guide covers what actually happened to Roo Code, why a local successor beats jumping onto another cloud agent, how Cline and Kilo Code differ, the exact steps to carry your Roo config across, and an honest look at where local agents still trail frontier cloud models.
Did Roo Code actually shut down?
Yes. This is real and recent, so here are the verified facts as of mid-2026:
- Announced April 21, 2026. Roo Code's Matt Rubens posted that all Roo Code products — the VS Code extension, Roo Code Cloud, and the Roo Code Router — would be discontinued.
- Repository archived May 15, 2026. The
RooCodeInc/Roo-CodeGitHub repo was archived after a final push, at roughly 24,200 stars and 3,300 forks. The extension had passed 3 million installs. - The reason: a cloud pivot. The team stated they "don't believe IDEs are the future of coding" and went all-in on a new cloud agent, Roomote (roomote.dev).
- The archived extension still works. The binary doesn't disappear on May 15 — it just stops getting updates, security fixes, and model support. Running an unmaintained agent against fast-moving model APIs is borrowed time.
- Roo's own recommendation was Cline. Roo Code pointed users back to Cline, the open-source project it originally forked from, for a model-agnostic extension. Cline's team publicly welcomed Roo users.
So Roo Code is genuinely gone as a maintained tool. The open question for you isn't whether to migrate — it's to what. The answer this site cares about: a setup you fully control, running on your own hardware.
Reading articles is good. Building is better.
Free account = 20+ free chapters across 20 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.
Why a local alternative, not the cloud pivot?
Roo's successor (Roomote) is a hosted cloud agent. You can follow it there — but you'd be trading one cloud dependency for another, and the whole reason many people liked Roo was that it was an open extension you ran yourself. Going local instead fixes the failure mode you just lived through:
- No vendor can sunset your setup. Cline and Kilo Code are open-source and model-agnostic. Even if one project pivoted tomorrow, the other — plus the local model — keeps working. You're not betting your workflow on a single company's roadmap.
- Your code never leaves the machine. A local agent pointed at Ollama sends nothing to a third-party server. For proprietary or regulated codebases, that's the difference between "allowed" and "not allowed."
- $0 in subscriptions, no per-token bill. A cloud agent meters every refactor; an autonomous agent on a big task can burn real dollars per run. Local inference is free after the hardware.
- No rate limits, works offline. Hammer it through a refactor on a plane. The only ceiling is your GPU.
The honest trade-off — covered in the limits section — is that a 24–30B local model isn't frontier-cloud-class on the hardest multi-file tasks. But for scoped edits, boilerplate, tests, and refactors on a private repo, a well-configured local agent is a legitimate daily driver, and it can't be shut down out from under you.
Cline vs Kilo Code: which local successor?
Both are open-source VS Code agents that run local models through Ollama, and both descend from the same lineage (Roo forked Cline; Kilo Code forked both). Here's how to choose:
| Cline | Kilo Code | |
|---|---|---|
| Origin | The original upstream Roo forked from | Active fork of both Cline and Roo Code |
| Roo's official pick? | Yes — recommended successor | One of two migration paths Roo named |
| Reads Roo config? | Concepts carried over (plan/act, MCP, diffs); rules need light porting | Reads existing .roomodes and .roo/rules/ directly + publishes a Roo→Kilo migration guide |
| Local models via Ollama | Yes | Yes (also LM Studio, vLLM, OpenAI-compatible) |
| Install base / maturity | Largest install base, original codebase | ~1.5M+ users, well-funded, ships fast |
| Extra features | Mature plan/act, MCP, broad provider support | Orchestrator mode, inline autocomplete, Memory Bank |
| License | MIT (open-source) | Open-source |
Both are legitimate, actively-maintained choices. Star counts and user numbers move; treat the figures above as mid-2026 approximations, not live stats.
Pick Cline if you want the upstream original with the largest community, the project Roo itself pointed you to, and a clean break from forks. Our full walkthrough is the Cline + Ollama setup guide.
Pick Kilo Code if you have a pile of .roomodes / custom rules you don't want to rewrite — it ingests your existing Roo config directly — or you want inline autocomplete in the same extension as the agent.
Either way, the local part is identical: both talk to the same Ollama server, so the model and hardware advice below applies to both.
Migrate your Roo config to a local agent
You need two pieces: Ollama (the local model server) and your chosen extension (Cline or Kilo Code).
1. Install Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download the installer from ollama.com
2. Pull a coding model (pick based on your VRAM — see the model section):
# Strong agentic default if you have ~24GB VRAM:
ollama pull qwen3-coder:30b
# Smaller, code-focused option for ~24GB:
ollama pull devstral-small-2:24b
3. Install the extension in VS Code → Extensions (⇧⌘X / Ctrl+Shift+X) → search "Cline" or "Kilo Code" → Install.
4. Point it at local Ollama. Open the extension panel → settings gear → set API Provider to Ollama, Base URL to http://localhost:11434, and select your pulled model from the dropdown. (If it doesn't appear, confirm it with ollama list.)
5. Carry your Roo config across:
- Custom modes / rules: Kilo Code reads your existing
.roomodesfile and.roo/rules/directory directly — just open the project. For Cline, recreate them as Cline custom instructions / rules (the concepts map cleanly; the file format differs). - MCP servers: both support the Model Context Protocol, so any MCP servers you wired into Roo carry over by re-adding them in the new extension's MCP settings. If MCP is new to you, start with MCP servers explained and Ollama MCP integration.
- The one trap to fix immediately — context window. Ollama defaults a model's context (
num_ctx) to roughly 2K–4K tokens, and an autonomous agent blows past that within a few tool calls, after which it silently loops or "forgets." Bake a larger context into a custom Modelfile:
# Save as Modelfile (no extension)
FROM qwen3-coder:30b
PARAMETER num_ctx 65536
ollama create qwen3-coder-agent-64k -f ./Modelfile
Then select qwen3-coder-agent-64k in the extension's model dropdown. This single step is the #1 reason "the local agent doesn't work" reports happen — the full explanation (and why you can't just crank it to 256K) is in the Cline + Ollama guide.
Reading articles is good. Building is better.
Free account = 20+ free chapters across 20 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.
Which local model should you run?
Agentic coding is harder than autocomplete — the model must follow tool-calling instructions reliably across many turns. In 2026, these open-weight models are the practical picks for a local Roo successor:
| Model | Params (active) | Q4_K_M download | Native context | Best for |
|---|---|---|---|---|
| Qwen3-Coder 30B A3B | 30.5B (3.3B active, MoE) | ~19 GB | 256K (→1M w/ YaRN) | Big-repo context, faster tokens/s on 24GB |
| Devstral Small 2 24B | 24B (dense) | ~15 GB | 256K | Best agentic reliability on a 24GB card |
| Qwen2.5-Coder 14B | 14B (dense) | ~9 GB | 32K | 12–16GB cards |
| Qwen2.5-Coder 7B | 7B (dense) | ~4.7 GB | 32K | 8GB cards, scoped edits |
Download sizes are the Q4_K_M figures Ollama lists per tag; actual VRAM at load is higher once the KV cache and runtime overhead are added, and it grows with the context window you set.
Qwen3-Coder 30B A3B is a strong default for an agent: it's a Mixture-of-Experts model (30.5B total, only ~3.3B active per token), so it's noticeably faster than a dense 30B and ships a huge native context. Devstral Small 2 24B is purpose-built by Mistral and All Hands AI for agentic software engineering, with Mistral reporting it at 68.0% on SWE-bench Verified — reach for it when tool-call reliability matters more than raw speed. For the full tested ranking, see best local AI models for programming and the curated best local AI coding models.
What hardware do you actually need?
The model weights are only half the VRAM story — context (KV cache) is the other half, and an agent pushes context hard. Rough guidance:
| GPU / Unified RAM | Realistic model | Context you can run |
|---|---|---|
| 8 GB | Qwen2.5-Coder 7B (Q4) | ~16–32K |
| 12–16 GB | Qwen2.5-Coder 14B (Q4) | ~32K |
| 24 GB (RTX 3090/4090) | Qwen3-Coder 30B / Devstral Small 2 24B (Q4) | ~64K comfortably |
| 32 GB+ unified (Apple Silicon) | Either 24–30B model | 64–128K |
CPU-only inference works but is slow enough that agent loops get tedious. An Apple Silicon Mac with 32GB+ unified memory or a 24GB NVIDIA card is the practical sweet spot. For a full memory map of every model and quant, see the Ollama RAM/VRAM table.
Local successor vs the cloud version
Being honest matters more than cheerleading. Compared to following Roo to a cloud agent (or using a hosted frontier model):
Where local wins
- Survivability: open-source + your hardware means no company can archive your workflow. You just lived through why that matters.
- Privacy: code never leaves your machine — the reason regulated and proprietary teams use local agents at all.
- Cost: $0 ongoing vs. token bills that can run dollars per task on a big refactor.
- No limits / offline: unlimited runs, works with no network.
Where cloud still wins
- Raw capability: frontier cloud models lead on the hardest multi-file, long-horizon agent tasks. A 24–30B local model is strong, not frontier-class on the toughest SWE-bench problems.
- Context ceiling: cloud hands you 200K+ context with no VRAM math; locally, every extra token of context costs GPU memory.
- Zero setup: no Modelfiles, no
num_ctxtuning, no quant tradeoffs.
The pragmatic pattern most developers settle on: local Cline or Kilo Code for the bulk of day-to-day, private, scoped work; reach for a cloud model only on the occasional gnarly task where the extra capability is worth the dependency. If you mainly want inline completion rather than a full agent, Continue.dev + Ollama is the lighter companion, and the complete Ollama guide covers the server side end to end.
Key Takeaways
- Roo Code shut down for real — announced April 21, 2026, extension archived May 15, 2026, team pivoted to a cloud agent (Roomote). The old extension still runs but gets no updates.
- Go local instead of cloud. The lesson of the shutdown is vendor risk; an open-source agent on your own hardware can't be sunset out from under you, keeps your code private, and costs $0 to run.
- Cline or Kilo Code are the two local successors. Cline is the upstream original (Roo's own recommendation); Kilo Code reads your existing
.roomodes/.roo/rules/directly and adds inline autocomplete. - Both run on the same Ollama setup — Qwen3-Coder 30B A3B (MoE, big context) or Devstral Small 2 24B (best agentic reliability) on a 24GB card.
- Fix
num_ctxfirst. Ollama's tiny default context breaks agents; bake 32K–64K into a Modelfile before you judge the model.
Next Steps
- Ready to set it up? Follow the step-by-step Cline + Ollama setup guide — the same flow works for Kilo Code.
- Want your agent to touch files, the web, and databases? Start with MCP servers explained, then wire in Ollama MCP integration.
- Choosing a model? Read the tested best local AI models for programming ranking and the curated best local AI coding models.
- New to running models locally? The complete Ollama guide covers install, models, and the server end to end.
External references: Cline on GitHub · Qwen3-Coder model card.
Ollama’s running. Here’s what to build with it.
Go from “ollama run” to RAG apps, agents, and fine-tuned models — structured and hands-on. First chapter free.
Liked this? 20 full AI courses are waiting.
From fundamentals to RAG, agents, MCP servers, voice AI, and production deployment with real GitHub repos. First chapter free, every course.
Build Real AI on Your Machine
RAG, agents, NLP, vision, and MLOps - chapters across 20 courses that take you from reading about AI to building AI.
Want structured AI education?
20 courses, 495+ chapters, from $9. Understand AI, don't just use it.
Continue Your Local AI Journey
- PILLARBest Local AI for Coding 2026: 10 Models Tested & Ranked
- 7B vs 14B vs 32B vs 70B for Coding (2026): What Size?
- AI Context Windows: 4K vs 128K vs 1M Tokens Explained (2026)
- AI vs Coding for Kids: Which Should Children Learn First?
- Aider + Ollama Setup (2026): Free Local AI Coding Agent
- Best 14B Coding Models (2026): Ranked by HumanEval + VRAM
- Best AI Coding Models Ranked: SWE-bench Leaderboard
- Best AI for JavaScript & TypeScript 2026: 10 Models Ranked
- Best AI Models for Python Development 2026: Top 10 Ranked
- Best Claude Model for Coding (2026): Opus 4.8 vs Sonnet 4.6 vs Haiku
Comments (0)
No comments yet. Be the first to share your thoughts!