Yes — Roo Code shut down. The team announced it on April 21, 2026 and archived the VS Code extension on May 15, 2026, pivoting to a cloud agent (roomote.dev) because they no longer believe the IDE is the future of coding. The best replacement if you want to keep working locally is a fully self-hosted coding agent — Cline (the upstream project Roo forked from, and Roo's own recommended successor) or Kilo Code (an active Roo fork that reads your existing .roomodes and .roo/rules/ files) — pointed at a local Ollama model like Qwen3-Coder 30B A3B. No cloud account, no per-token billing, and your proprietary code never leaves the machine. The archived Roo extension still runs after May 15 (it doesn't self-destruct), but it gets no updates, so a migration is the right move — and it's a good moment to drop cloud lock-in entirely.

This guide covers what actually happened to Roo Code, why a local successor beats jumping onto another cloud agent, how Cline and Kilo Code differ, the exact steps to carry your Roo config across, and an honest look at where local agents still trail frontier cloud models.

Did Roo Code actually shut down?

Yes. This is real and recent, so here are the verified facts as of mid-2026:

Announced April 21, 2026. Roo Code's Matt Rubens posted that all Roo Code products — the VS Code extension, Roo Code Cloud, and the Roo Code Router — would be discontinued.
Repository archived May 15, 2026. The RooCodeInc/Roo-Code GitHub repo was archived after a final push, at roughly 24,200 stars and 3,300 forks. The extension had passed 3 million installs.
The reason: a cloud pivot. The team stated they "don't believe IDEs are the future of coding" and went all-in on a new cloud agent, Roomote (roomote.dev).
The archived extension still works. The binary doesn't disappear on May 15 — it just stops getting updates, security fixes, and model support. Running an unmaintained agent against fast-moving model APIs is borrowed time.
Roo's own recommendation was Cline. Roo Code pointed users back to Cline, the open-source project it originally forked from, for a model-agnostic extension. Cline's team publicly welcomed Roo users.

So Roo Code is genuinely gone as a maintained tool. The open question for you isn't whether to migrate — it's to what. The answer this site cares about: a setup you fully control, running on your own hardware.

Reading articles is good. Building is better.

Free account = 20+ free chapters across 20 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.

Start free in 30 seconds See pricing

Why a local alternative, not the cloud pivot?

Roo's successor (Roomote) is a hosted cloud agent. You can follow it there — but you'd be trading one cloud dependency for another, and the whole reason many people liked Roo was that it was an open extension you ran yourself. Going local instead fixes the failure mode you just lived through:

No vendor can sunset your setup. Cline and Kilo Code are open-source and model-agnostic. Even if one project pivoted tomorrow, the other — plus the local model — keeps working. You're not betting your workflow on a single company's roadmap.
Your code never leaves the machine. A local agent pointed at Ollama sends nothing to a third-party server. For proprietary or regulated codebases, that's the difference between "allowed" and "not allowed."
$0 in subscriptions, no per-token bill. A cloud agent meters every refactor; an autonomous agent on a big task can burn real dollars per run. Local inference is free after the hardware.
No rate limits, works offline. Hammer it through a refactor on a plane. The only ceiling is your GPU.

The honest trade-off — covered in the limits section — is that a 24–30B local model isn't frontier-cloud-class on the hardest multi-file tasks. But for scoped edits, boilerplate, tests, and refactors on a private repo, a well-configured local agent is a legitimate daily driver, and it can't be shut down out from under you.

Cline vs Kilo Code: which local successor?

Both are open-source VS Code agents that run local models through Ollama, and both descend from the same lineage (Roo forked Cline; Kilo Code forked both). Here's how to choose:

	Cline	Kilo Code
Origin	The original upstream Roo forked from	Active fork of both Cline and Roo Code
Roo's official pick?	Yes — recommended successor	One of two migration paths Roo named
Reads Roo config?	Concepts carried over (plan/act, MCP, diffs); rules need light porting	Reads existing `.roomodes` and `.roo/rules/` directly + publishes a Roo→Kilo migration guide
Local models via Ollama	Yes	Yes (also LM Studio, vLLM, OpenAI-compatible)
Install base / maturity	Largest install base, original codebase	~1.5M+ users, well-funded, ships fast
Extra features	Mature plan/act, MCP, broad provider support	Orchestrator mode, inline autocomplete, Memory Bank
License	MIT (open-source)	Open-source

Both are legitimate, actively-maintained choices. Star counts and user numbers move; treat the figures above as mid-2026 approximations, not live stats.

Pick Cline if you want the upstream original with the largest community, the project Roo itself pointed you to, and a clean break from forks. Our full walkthrough is the Cline + Ollama setup guide.

Pick Kilo Code if you have a pile of .roomodes / custom rules you don't want to rewrite — it ingests your existing Roo config directly — or you want inline autocomplete in the same extension as the agent.

Either way, the local part is identical: both talk to the same Ollama server, so the model and hardware advice below applies to both.

Migrate your Roo config to a local agent

You need two pieces: Ollama (the local model server) and your chosen extension (Cline or Kilo Code).

1. Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download the installer from ollama.com

2. Pull a coding model (pick based on your VRAM — see the model section):

# Strong agentic default if you have ~24GB VRAM:
ollama pull qwen3-coder:30b

# Smaller, code-focused option for ~24GB:
ollama pull devstral-small-2:24b

3. Install the extension in VS Code → Extensions (⇧⌘X / Ctrl+Shift+X) → search "Cline" or "Kilo Code" → Install.

4. Point it at local Ollama. Open the extension panel → settings gear → set API Provider to Ollama, Base URL to http://localhost:11434, and select your pulled model from the dropdown. (If it doesn't appear, confirm it with ollama list.)

5. Carry your Roo config across:

Custom modes / rules: Kilo Code reads your existing .roomodes file and .roo/rules/ directory directly — just open the project. For Cline, recreate them as Cline custom instructions / rules (the concepts map cleanly; the file format differs).
MCP servers: both support the Model Context Protocol, so any MCP servers you wired into Roo carry over by re-adding them in the new extension's MCP settings. If MCP is new to you, start with MCP servers explained and Ollama MCP integration.
The one trap to fix immediately — context window. Ollama defaults a model's context (num_ctx) to roughly 2K–4K tokens, and an autonomous agent blows past that within a few tool calls, after which it silently loops or "forgets." Bake a larger context into a custom Modelfile:

# Save as Modelfile (no extension)
FROM qwen3-coder:30b
PARAMETER num_ctx 65536

ollama create qwen3-coder-agent-64k -f ./Modelfile

Then select qwen3-coder-agent-64k in the extension's model dropdown. This single step is the #1 reason "the local agent doesn't work" reports happen — the full explanation (and why you can't just crank it to 256K) is in the Cline + Ollama guide.

Reading articles is good. Building is better.

Free account = 20+ free chapters across 20 courses, with a per-chapter AI tutor. No card. Cancel anytime if you ever upgrade.

Start free in 30 seconds See pricing

Which local model should you run?

Agentic coding is harder than autocomplete — the model must follow tool-calling instructions reliably across many turns. In 2026, these open-weight models are the practical picks for a local Roo successor:

Model	Params (active)	Q4_K_M download	Native context	Best for
Qwen3-Coder 30B A3B	30.5B (3.3B active, MoE)	~19 GB	256K (→1M w/ YaRN)	Big-repo context, faster tokens/s on 24GB
Devstral Small 2 24B	24B (dense)	~15 GB	256K	Best agentic reliability on a 24GB card
Qwen2.5-Coder 14B	14B (dense)	~9 GB	32K	12–16GB cards
Qwen2.5-Coder 7B	7B (dense)	~4.7 GB	32K	8GB cards, scoped edits

Download sizes are the Q4_K_M figures Ollama lists per tag; actual VRAM at load is higher once the KV cache and runtime overhead are added, and it grows with the context window you set.

Qwen3-Coder 30B A3B is a strong default for an agent: it's a Mixture-of-Experts model (30.5B total, only ~3.3B active per token), so it's noticeably faster than a dense 30B and ships a huge native context. Devstral Small 2 24B is purpose-built by Mistral and All Hands AI for agentic software engineering, with Mistral reporting it at 68.0% on SWE-bench Verified — reach for it when tool-call reliability matters more than raw speed. For the full tested ranking, see best local AI models for programming and the curated best local AI coding models.

What hardware do you actually need?

The model weights are only half the VRAM story — context (KV cache) is the other half, and an agent pushes context hard. Rough guidance:

GPU / Unified RAM	Realistic model	Context you can run
8 GB	Qwen2.5-Coder 7B (Q4)	~16–32K
12–16 GB	Qwen2.5-Coder 14B (Q4)	~32K
24 GB (RTX 3090/4090)	Qwen3-Coder 30B / Devstral Small 2 24B (Q4)	~64K comfortably
32 GB+ unified (Apple Silicon)	Either 24–30B model	64–128K

CPU-only inference works but is slow enough that agent loops get tedious. An Apple Silicon Mac with 32GB+ unified memory or a 24GB NVIDIA card is the practical sweet spot. For a full memory map of every model and quant, see the Ollama RAM/VRAM table.

Local successor vs the cloud version

Being honest matters more than cheerleading. Compared to following Roo to a cloud agent (or using a hosted frontier model):

Where local wins

Survivability: open-source + your hardware means no company can archive your workflow. You just lived through why that matters.
Privacy: code never leaves your machine — the reason regulated and proprietary teams use local agents at all.
Cost: $0 ongoing vs. token bills that can run dollars per task on a big refactor.
No limits / offline: unlimited runs, works with no network.

Where cloud still wins

Raw capability: frontier cloud models lead on the hardest multi-file, long-horizon agent tasks. A 24–30B local model is strong, not frontier-class on the toughest SWE-bench problems.
Context ceiling: cloud hands you 200K+ context with no VRAM math; locally, every extra token of context costs GPU memory.
Zero setup: no Modelfiles, no num_ctx tuning, no quant tradeoffs.

The pragmatic pattern most developers settle on: local Cline or Kilo Code for the bulk of day-to-day, private, scoped work; reach for a cloud model only on the occasional gnarly task where the extra capability is worth the dependency. If you mainly want inline completion rather than a full agent, Continue.dev + Ollama is the lighter companion, and the complete Ollama guide covers the server side end to end.

Key Takeaways

Roo Code shut down for real — announced April 21, 2026, extension archived May 15, 2026, team pivoted to a cloud agent (Roomote). The old extension still runs but gets no updates.
Go local instead of cloud. The lesson of the shutdown is vendor risk; an open-source agent on your own hardware can't be sunset out from under you, keeps your code private, and costs $0 to run.
Cline or Kilo Code are the two local successors. Cline is the upstream original (Roo's own recommendation); Kilo Code reads your existing .roomodes / .roo/rules/ directly and adds inline autocomplete.
Both run on the same Ollama setup — Qwen3-Coder 30B A3B (MoE, big context) or Devstral Small 2 24B (best agentic reliability) on a 24GB card.
Fix num_ctx first. Ollama's tiny default context breaks agents; bake 32K–64K into a Modelfile before you judge the model.

Next Steps

Ready to set it up? Follow the step-by-step Cline + Ollama setup guide — the same flow works for Kilo Code.
Want your agent to touch files, the web, and databases? Start with MCP servers explained, then wire in Ollama MCP integration.
Choosing a model? Read the tested best local AI models for programming ranking and the curated best local AI coding models.
New to running models locally? The complete Ollama guide covers install, models, and the server end to end.

External references: Cline on GitHub · Qwen3-Coder model card.

Roo Code Shut Down — Best Local Alternative (Self-Hosted Coding Agent + Ollama)

Want to go deeper than this article?

Did Roo Code actually shut down?

Reading articles is good. Building is better.

Why a local alternative, not the cloud pivot?

Cline vs Kilo Code: which local successor?

Migrate your Roo config to a local agent

Reading articles is good. Building is better.

Which local model should you run?

What hardware do you actually need?

Local successor vs the cloud version

Key Takeaways

Next Steps

Ollama’s running. Here’s what to build with it.

Liked this? 20 full AI courses are waiting.

Local AI Master Research Team

Build Real AI on Your Machine

Want structured AI education?

Continue Your Local AI Journey

How to Install Your First Local AI Model

How to Choose the Right AI Model for Your Computer

Comments (0)

Ready to Go Beyond Tutorials?

Go from reading about AI to building with AI

Related Guides

Cline + Ollama Setup

Best Local AI Models for Programming

Ollama MCP Integration

Complete Ollama Guide

Written by the Local AI Master Team

Grab the AI Starter Kit — career roadmap, cheat sheet, setup guide

Go from reading about AI to building with AI