Roo Code Shut Down — Best Local Alternative (Self-Hosted Coding Agent + Ollama)

June 21, 2026
12 min read
Local AI Master Research Team

Yes — Roo Code shut down. The team announced it on April 21, 2026 and archived the VS Code extension on May 15, 2026, pivoting to a cloud agent (roomote.dev) because they no longer believe the IDE is the future of coding. The best replacement if you want to keep working locally is a fully self-hosted coding agent — Cline (the upstream project Roo forked from, and Roo's own recommended successor) or Kilo Code (an active Roo fork that reads your existing .roomodes and .roo/rules/ files) — pointed at a local Ollama model like Qwen3-Coder 30B A3B. No cloud account, no per-token billing, and your proprietary code never leaves the machine. The archived Roo extension still runs after May 15 (it doesn't self-destruct), but it gets no updates, so a migration is the right move — and it's a good moment to drop cloud lock-in entirely.

This guide covers what actually happened to Roo Code, why a local successor beats jumping onto another cloud agent, how Cline and Kilo Code differ, the exact steps to carry your Roo config across, and an honest look at where local agents still trail frontier cloud models.

Did Roo Code actually shut down?

Yes. This is real and recent, so here are the verified facts as of mid-2026:

  • Announced April 21, 2026. Roo Code's Matt Rubens posted that all Roo Code products — the VS Code extension, Roo Code Cloud, and the Roo Code Router — would be discontinued.
  • Repository archived May 15, 2026. The RooCodeInc/Roo-Code GitHub repo was archived after a final push, at roughly 24,200 stars and 3,300 forks. The extension had passed 3 million installs.
  • The reason: a cloud pivot. The team stated they "don't believe IDEs are the future of coding" and went all-in on a new cloud agent, Roomote (roomote.dev).
  • The archived extension still works. The binary doesn't disappear on May 15 — it just stops getting updates, security fixes, and model support. Running an unmaintained agent against fast-moving model APIs is borrowed time.
  • Roo's own recommendation was Cline. Roo Code pointed users back to Cline, the open-source project it originally forked from, for a model-agnostic extension. Cline's team publicly welcomed Roo users.

So Roo Code is genuinely gone as a maintained tool. The open question for you isn't whether to migrate — it's to what. The answer this site cares about: a setup you fully control, running on your own hardware.

Why a local alternative, not the cloud pivot?

Roo's successor (Roomote) is a hosted cloud agent. You can follow it there — but you'd be trading one cloud dependency for another, and the whole reason many people liked Roo was that it was an open extension you ran yourself. Going local instead fixes the failure mode you just lived through:

  • No vendor can sunset your setup. Cline and Kilo Code are open-source and model-agnostic. Even if one project pivoted tomorrow, the other — plus the local model — keeps working. You're not betting your workflow on a single company's roadmap.
  • Your code never leaves the machine. A local agent pointed at Ollama sends nothing to a third-party server. For proprietary or regulated codebases, that's the difference between "allowed" and "not allowed."
  • $0 in subscriptions, no per-token bill. A cloud agent meters every refactor; an autonomous agent on a big task can burn real dollars per run. Local inference is free after the hardware.
  • No rate limits, works offline. Hammer it through a refactor on a plane. The only ceiling is your GPU.

The honest trade-off — covered in the limits section — is that a 24–30B local model isn't frontier-cloud-class on the hardest multi-file tasks. But for scoped edits, boilerplate, tests, and refactors on a private repo, a well-configured local agent is a legitimate daily driver, and it can't be shut down out from under you.

Cline vs Kilo Code: which local successor?

Both are open-source VS Code agents that run local models through Ollama, and both descend from the same lineage (Roo forked Cline; Kilo Code forked both). Here's how to choose:

| | Cline | Kilo Code |
|---|---|---|
| Origin | The original upstream Roo forked from | Active fork of both Cline and Roo Code |
| Roo's official pick? | Yes (recommended successor) | One of two migration paths Roo named |
| Reads Roo config? | Concepts carried over (plan/act, MCP, diffs); rules need light porting | Reads existing .roomodes and .roo/rules/ directly; publishes a Roo→Kilo migration guide |
| Local models via Ollama | Yes | Yes (also LM Studio, vLLM, OpenAI-compatible) |
| Install base / maturity | Largest install base, original codebase | ~1.5M+ users, well-funded, ships fast |
| Extra features | Mature plan/act, MCP, broad provider support | Orchestrator mode, inline autocomplete, Memory Bank |
| License | MIT (open-source) | Open-source |

Both are legitimate, actively-maintained choices. Star counts and user numbers move; treat the figures above as mid-2026 approximations, not live stats.

Pick Cline if you want the upstream original with the largest community, the project Roo itself pointed you to, and a clean break from forks. Our full walkthrough is the Cline + Ollama setup guide.

Pick Kilo Code if you have a pile of .roomodes / custom rules you don't want to rewrite — it ingests your existing Roo config directly — or you want inline autocomplete in the same extension as the agent.

Either way, the local part is identical: both talk to the same Ollama server, so the model and hardware advice below applies to both.
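If you're unsure how much Roo config a project actually carries, a quick inventory can inform the Cline-versus-Kilo choice. The sketch below is a minimal Python helper, assuming the config lives in the two locations named above (a .roomodes file at the project root and files under .roo/rules/). The function name find_roo_config is hypothetical and not part of either extension.

```python
from pathlib import Path


def find_roo_config(project_root: str) -> dict[str, list[str]]:
    """List Roo config files under project_root, as paths relative to it."""
    root = Path(project_root)
    found: dict[str, list[str]] = {"modes": [], "rules": []}

    # Custom modes live in a single .roomodes file at the project root.
    if (root / ".roomodes").is_file():
        found["modes"].append(".roomodes")

    # Rules are individual files under .roo/rules/ (searched recursively).
    rules_dir = root / ".roo" / "rules"
    if rules_dir.is_dir():
        found["rules"] = sorted(
            p.relative_to(root).as_posix()
            for p in rules_dir.rglob("*")
            if p.is_file()
        )
    return found
```

Calling find_roo_config(".") from a repo root returns the files you would need to carry across. An empty or short result suggests that recreating the rules by hand in Cline is cheap; a long one tilts the decision toward Kilo Code.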

Migrate your Roo config to a local agent

You need two pieces: Ollama (the local model server) and your chosen extension (Cline or Kilo Code).

1. Install Ollama

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: download the installer from ollama.com

2. Pull a coding model (pick based on your VRAM — see the model section):

# Strong agentic default if you have ~24GB VRAM:
ollama pull qwen3-coder:30b

# Smaller, code-focused option for ~24GB:
ollama pull devstral-small-2:24b

3. Install the extension in VS Code → Extensions (⇧⌘X / Ctrl+Shift+X) → search "Cline" or "Kilo Code" → Install.

4. Point it at local Ollama. Open the extension panel → settings gear → set API Provider to Ollama, Base URL to http://localhost:11434, and select your pulled model from the dropdown. (If it doesn't appear, confirm it with ollama list.)

5. Carry your Roo config across:

  • Custom modes / rules: Kilo Code reads your existing .roomodes file and .roo/rules/ directory directly — just open the project. For Cline, recreate them as Cline custom instructions / rules (the concepts map cleanly; the file format differs).
  • MCP servers: both support the Model Context Protocol, so any MCP servers you wired into Roo carry over by re-adding them in the new extension's MCP settings. If MCP is new to you, start with MCP servers explained and Ollama MCP integration.
  • The one trap to fix immediately — context window. Ollama defaults a model's context (num_ctx) to roughly 2K–4K tokens, and an autonomous agent blows past that within a few tool calls, after which it silently loops or "forgets." Bake a larger context into a custom Modelfile:

# Save as Modelfile (no extension)
FROM qwen3-coder:30b
PARAMETER num_ctx 65536

Build the custom model from that file (run this in a terminal, not inside the Modelfile):

ollama create qwen3-coder-agent-64k -f ./Modelfile

Then select qwen3-coder-agent-64k in the extension's model dropdown. Skipping this step is the most common cause of "the local agent doesn't work" reports; the full explanation (and why you can't just crank it to 256K) is in the Cline + Ollama guide.
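Before selecting the new model in the extension, it can help to confirm that the server actually lists it, which is the programmatic equivalent of running ollama list. The following is a minimal sketch, assuming Ollama is serving on the default http://localhost:11434 and that its /api/tags endpoint returns a JSON object with a "models" array of entries carrying a "name" field. The helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_names(tags_payload: dict) -> list[str]:
    """Pull model names out of a decoded /api/tags response body."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def is_model_available(wanted: str, names: list[str]) -> bool:
    """Check for a model, treating an untagged name as ':latest'."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in names


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama server which models it has locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))


# Usage (needs a running Ollama server, so it is left commented out):
# names = list_local_models()
# print(is_model_available("qwen3-coder-agent-64k", names))
```

If the check comes back False, the ollama create step most likely did not run or was given a different name, and the extension's dropdown will not show the model either.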

Which local model should you run?

Agentic coding is harder than autocomplete — the model must follow tool-calling instructions reliably across many turns. In 2026, these open-weight models are the practical picks for a local Roo successor:

| Model | Params (active) | Q4_K_M download | Native context | Best for |
|---|---|---|---|---|
| Qwen3-Coder 30B A3B | 30.5B (3.3B active, MoE) | ~19 GB | 256K (→1M w/ YaRN) | Big-repo context, faster tokens/s on 24GB |
| Devstral Small 2 24B | 24B (dense) | ~15 GB | 256K | Best agentic reliability on a 24GB card |
| Qwen2.5-Coder 14B | 14B (dense) | ~9 GB | 32K | 12–16GB cards |
| Qwen2.5-Coder 7B | 7B (dense) | ~4.7 GB | 32K | 8GB cards, scoped edits |

Download sizes are the Q4_K_M figures Ollama lists per tag; actual VRAM at load is higher once the KV cache and runtime overhead are added, and it grows with the context window you set.
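To make the "grows with the context window" point concrete, the sketch below uses the common back-of-the-envelope formula for a transformer's KV cache: one key tensor and one value tensor per layer, per KV head, per token. It assumes an unquantized 16-bit cache and ignores runtime overhead, so treat the result as a rough lower bound. The layer and head counts in the usage comment are illustrative placeholders, not a published config for any model in the table.

```python
GIB = 1024 ** 3


def kv_cache_bytes(
    num_ctx: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_elem: int = 2,  # 2 bytes = 16-bit cache (assumption)
) -> int:
    """Approximate KV cache size in bytes for a full context window."""
    # Factor of 2: keys and values are cached separately.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * num_ctx


def kv_cache_gib(num_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int) -> float:
    """Same estimate, expressed in GiB."""
    return kv_cache_bytes(num_ctx, n_layers, n_kv_heads, head_dim) / GIB


# Placeholder architecture numbers, for illustration only:
# kv_cache_gib(65536, n_layers=48, n_kv_heads=4, head_dim=128)  # 6.0
```

Because the estimate is linear in num_ctx, halving the context halves the cache. That is why the context setting is usually the first knob to turn when a model that "should fit" by download size does not.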

Qwen3-Coder 30B A3B is a strong default for an agent: it's a Mixture-of-Experts model (30.5B total, only ~3.3B active per token), so it's noticeably faster than a dense 30B and ships a huge native context. Devstral Small 2 24B is purpose-built by Mistral and All Hands AI for agentic software engineering, with Mistral reporting it at 68.0% on SWE-bench Verified — reach for it when tool-call reliability matters more than raw speed. For the full tested ranking, see best local AI models for programming and the curated best local AI coding models.

What hardware do you actually need?

The model weights are only half the VRAM story — context (KV cache) is the other half, and an agent pushes context hard. Rough guidance:

| GPU / Unified RAM | Realistic model | Context you can run |
|---|---|---|
| 8 GB | Qwen2.5-Coder 7B (Q4) | ~16–32K |
| 12–16 GB | Qwen2.5-Coder 14B (Q4) | ~32K |
| 24 GB (RTX 3090/4090) | Qwen3-Coder 30B / Devstral Small 2 24B (Q4) | ~64K comfortably |
| 32 GB+ unified (Apple Silicon) | Either 24–30B model | 64–128K |

CPU-only inference works but is slow enough that agent loops get tedious. An Apple Silicon Mac with 32GB+ unified memory or a 24GB NVIDIA card is the practical sweet spot. For a full memory map of every model and quant, see the Ollama RAM/VRAM table.
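The rough guidance above can also be written as a small lookup, for example to drive a setup script. This sketch only restates the table, treating each row as a minimum memory tier (so a 10 GB card falls back to the 8 GB row); that tiering rule and the function name suggest_model are assumptions for illustration.

```python
from typing import Optional

# (minimum GB, requires unified memory, model, context) from the table above,
# ordered from the largest tier to the smallest.
TIERS = [
    (32, True, "Qwen3-Coder 30B or Devstral Small 2 24B (Q4)", "64-128K"),
    (24, False, "Qwen3-Coder 30B / Devstral Small 2 24B (Q4)", "~64K"),
    (12, False, "Qwen2.5-Coder 14B (Q4)", "~32K"),
    (8, False, "Qwen2.5-Coder 7B (Q4)", "~16-32K"),
]


def suggest_model(memory_gb: float, unified: bool = False) -> Optional[tuple[str, str]]:
    """Return (model, context guidance) for the largest tier that fits."""
    for min_gb, needs_unified, model, context in TIERS:
        if memory_gb >= min_gb and (unified or not needs_unified):
            return model, context
    return None  # below the smallest tier listed in the table
```

For example, suggest_model(24) returns the 24 GB row, and suggest_model(6) returns None, which matches the note that anything smaller lands in slow CPU-only territory.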

Local successor vs the cloud version

Being honest matters more than cheerleading. Compared to following Roo to a cloud agent (or using a hosted frontier model):

Where local wins

  • Survivability: open-source + your hardware means no company can archive your workflow. You just lived through why that matters.
  • Privacy: code never leaves your machine — the reason regulated and proprietary teams use local agents at all.
  • Cost: $0 ongoing vs. token bills that can run dollars per task on a big refactor.
  • No limits / offline: unlimited runs, works with no network.

Where cloud still wins

  • Raw capability: frontier cloud models lead on the hardest multi-file, long-horizon agent tasks. A 24–30B local model is strong, not frontier-class on the toughest SWE-bench problems.
  • Context ceiling: cloud hands you 200K+ context with no VRAM math; locally, every extra token of context costs GPU memory.
  • Zero setup: no Modelfiles, no num_ctx tuning, no quant tradeoffs.

The pragmatic pattern most developers settle on: local Cline or Kilo Code for the bulk of day-to-day, private, scoped work; reach for a cloud model only on the occasional gnarly task where the extra capability is worth the dependency. If you mainly want inline completion rather than a full agent, Continue.dev + Ollama is the lighter companion, and the complete Ollama guide covers the server side end to end.

Key Takeaways

  1. Roo Code shut down for real — announced April 21, 2026, extension archived May 15, 2026, team pivoted to a cloud agent (Roomote). The old extension still runs but gets no updates.
  2. Go local instead of cloud. The lesson of the shutdown is vendor risk; an open-source agent on your own hardware can't be sunset out from under you, keeps your code private, and costs $0 to run.
  3. Cline or Kilo Code are the two local successors. Cline is the upstream original (Roo's own recommendation); Kilo Code reads your existing .roomodes / .roo/rules/ directly and adds inline autocomplete.
  4. Both run on the same Ollama setup — Qwen3-Coder 30B A3B (MoE, big context) or Devstral Small 2 24B (best agentic reliability) on a 24GB card.
  5. Fix num_ctx first. Ollama's tiny default context breaks agents; bake 32K–64K into a Modelfile before you judge the model.

Next Steps

External references: Cline on GitHub · Qwen3-Coder model card.

Published: June 21, 2026 · Last Updated: June 21, 2026 · Manually Reviewed
