Continue.dev + Ollama: Free Local AI Coding Assistant
Continue + Ollama Quick Start
# 1. Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5-coder:1.5b
ollama pull llama3.1:8b
# 2. Install Continue extension
# VS Code: Search "Continue" in Extensions
# JetBrains: Settings → Plugins → "Continue"
# 3. Start coding with local AI!
What is Continue.dev?
Continue.dev is the leading open-source AI coding assistant, offering a free alternative to GitHub Copilot that runs entirely on your machine with local models.
Key Statistics
| Metric | Value |
|---|---|
| GitHub Stars | 31,300+ |
| Contributors | 450+ |
| License | Apache 2.0 |
| IDE Support | VS Code, JetBrains |
| Backing | Y Combinator (W23) |
Why Continue + Ollama?
- Free forever - No $10-20/month subscription
- 100% private - Code never leaves your machine
- Fully customizable - Any model, any workflow
- Open source - Audit, modify, contribute
- Enterprise-ready - Used by Siemens, Morningstar
Installation
Step 1: Install Ollama
macOS:
brew install ollama
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
Windows: Download from ollama.ai and run the installer.
Step 2: Pull Required Models
# Start Ollama
ollama serve
# Autocomplete model (fast, small)
ollama pull qwen2.5-coder:1.5b
# Chat model (quality, reasoning)
ollama pull llama3.1:8b
# Embeddings for codebase search
ollama pull nomic-embed-text
# Verify
ollama list
Step 3: Install Continue Extension
VS Code:
- Open Extensions (Cmd/Ctrl + Shift + X)
- Search "Continue"
- Click Install
JetBrains:
- Settings → Plugins → Marketplace
- Search "Continue"
- Install and restart
Step 4: Configure Continue
Continue reads configuration from:
- macOS/Linux: ~/.continue/config.yaml
- Windows: %USERPROFILE%\.continue\config.yaml
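If you script against this file, the location can be resolved cross-platform in one expression; a minimal Python sketch (the path convention is the one above, the helper name is ours):

```python
from pathlib import Path

def continue_config_path() -> Path:
    """Resolve the Continue config file location (~/.continue/config.yaml).

    Path.home() maps to $HOME on macOS/Linux and %USERPROFILE% on Windows,
    so this single expression covers both cases listed above.
    """
    return Path.home() / ".continue" / "config.yaml"

print(continue_config_path())
```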
Complete Configuration Guide
Basic config.yaml
name: Local AI Assistant
version: 1.0.0
schema: v1

models:
  # Chat and reasoning (quality model)
  - name: Llama 3.1 8B
    provider: ollama
    model: llama3.1:8b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    defaultCompletionOptions:
      temperature: 0.7
      contextLength: 8192

  # Tab autocomplete (fast model)
  - name: Qwen Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 250
      maxPromptTokens: 1024
      multilineCompletions: auto

  # Embeddings for @codebase
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed

# Context providers
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: folder
  - provider: codebase

# Coding rules
rules:
  - Give concise, focused responses
  - Follow existing code style
  - Prefer TypeScript over JavaScript
Advanced Configuration (24GB+ VRAM)
name: Power User Config
version: 1.0.0
schema: v1

models:
  # Primary reasoning model
  - name: DeepSeek R1 32B
    provider: ollama
    model: deepseek-r1:32b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use  # Enable agent mode
    defaultCompletionOptions:
      temperature: 0.7
      contextLength: 16384
      topP: 0.9

  # Fast autocomplete
  - name: StarCoder 3B
    provider: ollama
    model: starcoder2:3b
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 200
      maxPromptTokens: 2048
      multilineCompletions: auto

  # Embeddings
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed

# Custom slash commands
prompts:
  - name: test
    description: Generate unit tests
    prompt: |
      Write comprehensive unit tests for this code.
      Use Jest/Vitest. Cover edge cases.
  - name: refactor
    description: Refactor for readability
    prompt: |
      Refactor this code for better readability.
      Explain your changes.
  - name: review
    description: Code review
    prompt: |
      Review this code for:
      - Bugs and edge cases
      - Performance issues
      - Security concerns
      - Code style
      Provide actionable feedback.

# MCP servers for extended functionality
mcpServers:
  - name: filesystem
    command: npx
    args:
      - "-y"
      - "@modelcontextprotocol/server-filesystem"
      - "/path/to/allowed/directory"
Autodetect Models (Simplest)
name: Simple Config
schema: v1

models:
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
    roles:
      - chat
      - edit
      - autocomplete
Best Models by Hardware
4-8GB VRAM (RTX 3060, M1/M2 8GB)
| Role | Model | VRAM |
|---|---|---|
| Autocomplete | qwen2.5-coder:1.5b | ~2GB |
| Chat | llama3.1:8b | ~6GB |
| Embeddings | nomic-embed-text | ~1GB |
12-16GB VRAM (RTX 4070, M2 Pro)
| Role | Model | VRAM |
|---|---|---|
| Autocomplete | starcoder2:3b | ~4GB |
| Chat | codellama:13b | ~10GB |
| Embeddings | nomic-embed-text | ~1GB |
24GB+ VRAM (RTX 4090, M3 Max)
| Role | Model | VRAM |
|---|---|---|
| Autocomplete | starcoder2:3b | ~4GB |
| Chat | deepseek-r1:32b | ~20GB |
| Embeddings | nomic-embed-text | ~1GB |
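The VRAM figures above follow a simple rule of thumb: at 4-bit quantization (Ollama's default for most tags), model weights need roughly half a byte per parameter, plus headroom for the KV cache and runtime buffers. A rough estimator as a Python sketch (the formula and the ~20% overhead constant are our approximation, not from Ollama):

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int = 4,
                     overhead: float = 0.2) -> float:
    """Rough VRAM estimate: quantized weights plus a flat fraction
    of overhead for KV cache and runtime buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead) / 1e9

# 8B model at 4-bit: ~4.8 GB, close to the 4.7 GB `ollama ps` reports for llama3.1:8b
print(f"{estimate_vram_gb(8):.1f} GB")
# 32B model at 4-bit: ~19.2 GB, matching the ~20 GB listed for deepseek-r1:32b
print(f"{estimate_vram_gb(32):.1f} GB")
```

Actual usage grows with context length, so budget above the estimate when you raise contextLength.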
Key Features
Tab Autocomplete
Press Tab to accept inline suggestions. Works in any file type.
Config options:
autocompleteOptions:
  debounceDelay: 250          # ms of idle time before triggering
  maxPromptTokens: 1024       # context given to the model
  multilineCompletions: auto
  onlyMyCode: true            # ignore node_modules etc.
Tip: Small fill-in-the-middle code models (qwen2.5-coder, starcoder2) are purpose-trained for completion, and for autocomplete they typically beat much larger general chat models on both latency and suggestion quality.
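To see what debounceDelay buys you, here is a small simulation (our own sketch, not Continue's code): a completion request fires only once the configured idle time follows a keystroke, so a burst of typing produces one request instead of one per keypress.

```python
def completion_requests(keystroke_times_ms, debounce_ms=250):
    """Return the times at which a debounced autocomplete request fires.

    A request fires debounce_ms after a keystroke, but only if no further
    keystroke arrives within that window.
    """
    fires = []
    for i, t in enumerate(keystroke_times_ms):
        next_t = keystroke_times_ms[i + 1] if i + 1 < len(keystroke_times_ms) else float("inf")
        if next_t - t >= debounce_ms:
            fires.append(t + debounce_ms)
    return fires

# Four quick keystrokes, then a pause: a single request fires at t=630ms
print(completion_requests([0, 120, 250, 380]))  # → [630]
```

Raising debounceDelay trades a little responsiveness for fewer model invocations, which is why it appears again under Performance Optimization below.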
Chat Interface
- VS Code: Cmd/Ctrl + L
- JetBrains: Cmd/Ctrl + J
Select code and ask questions. Add context with @ mentions:
- @codebase - Search the entire project
- @file - Reference a specific file
- @folder - Include a directory
- @docs - Documentation context
Edit Mode
Select code → Press Cmd/Ctrl + I → Describe changes → Apply
Continue modifies code while preserving formatting and style.
Agent Mode
Enable with capabilities: [tool_use] for autonomous multi-step tasks:
- Read and write files
- Run terminal commands
- Search codebase
- Make multiple changes
models:
  - name: Agent Model
    provider: ollama
    model: llama3.1:8b
    capabilities:
      - tool_use
Custom Slash Commands
Create shortcuts for common tasks:
prompts:
  - name: doc
    description: Add documentation
    prompt: Add comprehensive JSDoc/docstring to this code.
  - name: optimize
    description: Optimize performance
    prompt: Suggest performance optimizations for this code.
Use with /doc or /optimize in chat.
Performance Optimization
Reduce Autocomplete Latency
- Use small models (1.5B-3B parameters)
- Increase the debounce delay:
  autocompleteOptions:
    debounceDelay: 350
- Disable thinking for Qwen3:
  requestOptions:
    extraBodyProperties:
      think: false
Verify GPU Acceleration
# Check GPU usage
ollama ps
# Should show GPU layers loaded
NAME SIZE PROCESSOR
llama3.1:8b 4.7GB 100% GPU
Debug Issues
# Restart Ollama with debug logging
pkill ollama
OLLAMA_DEBUG=1 ollama serve
# Check Continue logs
cat ~/.continue/logs/core.log
Continue vs Alternatives
| Feature | Continue | GitHub Copilot | Cursor | Cline |
|---|---|---|---|---|
| Price | Free | $10-19/mo | $20-200/mo | Free |
| Open Source | Yes | No | No | Yes |
| Local Models | Full | No | Limited | Yes |
| IDE Support | VS Code, JetBrains | Many | VS Code fork | VS Code |
| Agent Mode | Yes | Yes | Yes | Yes |
| Customization | Excellent | Limited | Good | Good |
When to Choose Continue
- Privacy-critical projects - No code leaves your machine
- Cost-conscious teams - Save $120-240/year per developer
- Custom workflows - Build exactly what your team needs
- Open source preference - Full transparency and control
When to Choose Copilot
- Quick setup priority - Works out of the box
- Enterprise requirements - SSO, compliance features
- Multi-IDE teams - Wider IDE support
- Training data quality - GitHub's massive codebase
Troubleshooting
"Cannot connect to Ollama"
# 1. Ensure Ollama is running
ollama serve
# 2. Verify port
curl http://localhost:11434/api/version
# 3. Check config.yaml apiBase
apiBase: http://localhost:11434
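The same connectivity check can be scripted; a minimal Python sketch using only the standard library (the /api/version endpoint is part of Ollama's HTTP API, the helper name is ours):

```python
import urllib.request
import urllib.error

def ollama_reachable(base: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers on its version endpoint."""
    try:
        with urllib.request.urlopen(f"{base}/api/version", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if not ollama_reachable():
    print("Ollama is not reachable - run `ollama serve` and check apiBase")
```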
Models Not Loading
# Pull models first
ollama pull qwen2.5-coder:1.5b
ollama list # Verify installed
# Restart Continue
# VS Code: Cmd/Ctrl + Shift + P → "Continue: Reload"
Slow Performance
- Switch to smaller autocomplete model
- Increase debounceDelay
- Reduce contextLength
- Check GPU usage:
ollama ps
Config Not Applied
# Validate YAML syntax (expand ~ explicitly; open() does not)
python3 -c "import yaml, os; yaml.safe_load(open(os.path.expanduser('~/.continue/config.yaml')))"
# Restart VS Code after changes
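Beyond syntax, a quick structural sanity check catches the most common misconfiguration: no model assigned to a role. A sketch that validates the parsed config dict (the required keys follow the config format shown earlier; the checks themselves are ours, so extend them to taste):

```python
def validate_continue_config(config: dict) -> list[str]:
    """Return a list of problems found in a parsed Continue config dict."""
    problems = []
    models = config.get("models")
    if not models:
        problems.append("no models defined")
        return problems
    seen_roles = set()
    for m in models:
        if "provider" not in m or "model" not in m:
            problems.append(f"model {m.get('name', '?')} missing provider/model")
        seen_roles.update(m.get("roles", []))
    for role in ("chat", "autocomplete"):
        if role not in seen_roles:
            problems.append(f"no model assigned the '{role}' role")
    return problems

# Mirrors the basic config above: one chat model, one autocomplete model
sample = {"models": [
    {"name": "Llama 3.1 8B", "provider": "ollama", "model": "llama3.1:8b",
     "roles": ["chat", "edit", "apply"]},
    {"name": "Qwen Coder 1.5B", "provider": "ollama",
     "model": "qwen2.5-coder:1.5b", "roles": ["autocomplete"]},
]}
print(validate_continue_config(sample))  # → []
```

In practice you would feed it `yaml.safe_load(...)` of your config.yaml, as in the one-liner above.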
MCP Integration
Extend Continue with Model Context Protocol servers:
mcpServers:
  # Database access
  - name: sqlite
    command: npx
    args: ["-y", "mcp-sqlite", "/path/to/db.sqlite"]

  # GitHub integration
  - name: github
    command: uvx
    args: [mcp-server-github]
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  # File system access
  - name: filesystem
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]
Use in chat: Type @ → Select "MCP" → Choose resource.
Key Takeaways
- Continue + Ollama = Free Copilot alternative with full privacy
- Use small models for autocomplete (1.5B-3B) and large for chat (8B-32B)
- nomic-embed-text enables powerful codebase search
- Agent mode requires capabilities: [tool_use] and 8B+ models
- Custom slash commands automate your team's workflows
- MCP servers extend Continue with external tools and data sources
- 31,300+ GitHub stars reflect broad community trust
Next Steps
- Compare local AI tools for model management
- Explore AI coding agents for autonomous development
- Compare Cursor vs Copilot vs Claude Code for alternatives
- Check VRAM requirements for model sizing
- Learn about MCP servers for tool integration
Continue.dev with Ollama delivers a professional AI coding experience without monthly subscriptions or privacy compromises. Whether you're a solo developer seeking GitHub Copilot features for free, or an enterprise team requiring local deployment for compliance, Continue provides the flexibility and performance to transform your coding workflow.