Continue.dev + Ollama: Free Local AI Coding Assistant

February 6, 2026
18 min read
Local AI Master Research Team
Continue + Ollama Quick Start

# 1. Install Ollama and models
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull qwen2.5-coder:1.5b
ollama pull llama3.1:8b

# 2. Install Continue extension
# VS Code: Search "Continue" in Extensions
# JetBrains: Settings → Plugins → "Continue"

# 3. Start coding with local AI!

Cost: $0/month | Privacy: 100% local | GitHub stars: 31,300+

What is Continue.dev?

Continue.dev is the leading open-source AI coding assistant, offering a free alternative to GitHub Copilot that runs entirely on your machine with local models.

Key Statistics

Metric         Value
GitHub Stars   31,300+
Contributors   450+
License        Apache 2.0
IDE Support    VS Code, JetBrains
Backing        Y Combinator (W23)

Why Continue + Ollama?

  1. Free forever - No $10-20/month subscription
  2. 100% private - Code never leaves your machine
  3. Fully customizable - Any model, any workflow
  4. Open source - Audit, modify, contribute
  5. Enterprise-ready - Used by Siemens, Morningstar

Installation

Step 1: Install Ollama

macOS:

brew install ollama

Linux:

curl -fsSL https://ollama.ai/install.sh | sh

Windows: Download from ollama.ai and run the installer.

Step 2: Pull Required Models

# Start Ollama
ollama serve

# Autocomplete model (fast, small)
ollama pull qwen2.5-coder:1.5b

# Chat model (quality, reasoning)
ollama pull llama3.1:8b

# Embeddings for codebase search
ollama pull nomic-embed-text

# Verify
ollama list

Step 3: Install Continue Extension

VS Code:

  1. Open Extensions (Cmd/Ctrl + Shift + X)
  2. Search "Continue"
  3. Click Install

JetBrains:

  1. Settings → Plugins → Marketplace
  2. Search "Continue"
  3. Install and restart

Step 4: Configure Continue

Continue reads configuration from:

  • macOS/Linux: ~/.continue/config.yaml
  • Windows: %USERPROFILE%\.continue\config.yaml

Complete Configuration Guide

Basic config.yaml

name: Local AI Assistant
version: 1.0.0
schema: v1

models:
  # Chat and reasoning (quality model)
  - name: Llama 3.1 8B
    provider: ollama
    model: llama3.1:8b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    defaultCompletionOptions:
      temperature: 0.7
      contextLength: 8192

  # Tab autocomplete (fast model)
  - name: Qwen Coder 1.5B
    provider: ollama
    model: qwen2.5-coder:1.5b
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 250
      maxPromptTokens: 1024
      multilineCompletions: auto

  # Embeddings for @codebase
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed

# Context providers
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: folder
  - provider: codebase

# Coding rules
rules:
  - Give concise, focused responses
  - Follow existing code style
  - Prefer TypeScript over JavaScript
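Each role in the config ultimately turns into HTTP requests against the local Ollama server. The sketch below (a hypothetical helper, not Continue's internal code) shows the Ollama /api/chat request body that the chat model above implies; note that config.yaml's contextLength corresponds to Ollama's num_ctx option:

```python
import json

def build_chat_payload(model: str, messages: list, temperature: float = 0.7,
                       context_length: int = 8192) -> dict:
    """Map config.yaml fields onto an Ollama /api/chat request body.

    Hypothetical helper: Continue's real request code is internal, but
    `temperature` and `num_ctx` are Ollama's documented option names.
    """
    return {
        "model": model,
        "messages": messages,
        "stream": False,
        "options": {
            "temperature": temperature,
            "num_ctx": context_length,  # contextLength -> Ollama's num_ctx
        },
    }

payload = build_chat_payload("llama3.1:8b",
                             [{"role": "user", "content": "Explain this diff"}])
print(json.dumps(payload, indent=2))
```

POSTing that body to http://localhost:11434/api/chat (the apiBase above plus /api/chat) returns the assistant reply.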

Advanced Configuration (24GB+ VRAM)

name: Power User Config
version: 1.0.0
schema: v1

models:
  # Primary reasoning model
  - name: DeepSeek R1 32B
    provider: ollama
    model: deepseek-r1:32b
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply
    capabilities:
      - tool_use  # Enable agent mode
    defaultCompletionOptions:
      temperature: 0.7
      contextLength: 16384
      top_p: 0.9

  # Fast autocomplete
  - name: StarCoder 3B
    provider: ollama
    model: starcoder2:3b
    roles:
      - autocomplete
    autocompleteOptions:
      debounceDelay: 200
      maxPromptTokens: 2048
      multilineCompletions: auto

  # Embeddings
  - name: Nomic Embed
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed

# Custom slash commands
prompts:
  - name: test
    description: Generate unit tests
    prompt: |
      Write comprehensive unit tests for this code.
      Use Jest/Vitest. Cover edge cases.

  - name: refactor
    description: Refactor for readability
    prompt: |
      Refactor this code for better readability.
      Explain your changes.

  - name: review
    description: Code review
    prompt: |
      Review this code for:
      - Bugs and edge cases
      - Performance issues
      - Security concerns
      - Code style
      Provide actionable feedback.

# MCP servers for extended functionality
mcpServers:
  - name: filesystem
    command: npx
    args:
      - "-y"
      - "@modelcontextprotocol/server-filesystem"
      - "/path/to/allowed/directory"

Autodetect Models (Simplest)

name: Simple Config
schema: v1

models:
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
    roles:
      - chat
      - edit
      - autocomplete

Best Models by Hardware

4-8GB VRAM (RTX 3060, M1/M2 8GB)

Role          Model               VRAM
Autocomplete  qwen2.5-coder:1.5b  ~2GB
Chat          llama3.1:8b         ~6GB
Embeddings    nomic-embed-text    ~1GB

12-16GB VRAM (RTX 4070, M2 Pro)

Role          Model             VRAM
Autocomplete  starcoder2:3b     ~4GB
Chat          codellama:13b     ~10GB
Embeddings    nomic-embed-text  ~1GB

24GB+ VRAM (RTX 4090, M3 Max)

Role          Model             VRAM
Autocomplete  starcoder2:3b     ~4GB
Chat          deepseek-r1:32b   ~20GB
Embeddings    nomic-embed-text  ~1GB
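The VRAM figures above follow a rule of thumb: at 4-bit quantization (Ollama's default q4 variants) each parameter costs roughly half a byte, plus overhead for the KV cache and runtime buffers. A rough estimator (both constants are approximations, not measured values):

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float = 0.57,
                     overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for a 4-bit quantized model.

    bytes_per_param ~0.57 approximates 4-bit weights plus quantization
    scales; overhead_gb covers KV cache and runtime buffers. Both are
    rough assumptions -- real usage varies with context length.
    """
    return params_billion * bytes_per_param + overhead_gb

for name, size in [("llama3.1:8b", 8), ("codellama:13b", 13), ("deepseek-r1:32b", 32)]:
    print(f"{name}: ~{estimate_vram_gb(size):.1f} GB")
```

Longer context windows grow the KV cache, so budget extra headroom if you raise contextLength.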

Key Features

Tab Autocomplete

Press Tab to accept inline suggestions. Works in any file type.

Config options:

autocompleteOptions:
  debounceDelay: 250      # ms before triggering
  maxPromptTokens: 1024   # context for model
  multilineCompletions: auto
  onlyMyCode: true        # ignore node_modules etc.
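debounceDelay works by waiting for a pause in typing before requesting a completion: keystrokes arriving faster than the delay keep resetting the timer, so only the pause at the end fires a request. A simplified simulation of that logic (timestamps in milliseconds, hypothetical helper):

```python
def completions_triggered(keystroke_times_ms, debounce_ms=250):
    """Count how many completion requests a debounce timer would fire.

    A request fires only when the gap after a keystroke (to the next
    keystroke, or to the end of typing) is at least debounce_ms.
    """
    fired = 0
    for i, t in enumerate(keystroke_times_ms):
        if i + 1 < len(keystroke_times_ms):
            gap = keystroke_times_ms[i + 1] - t
        else:
            gap = debounce_ms  # typing stopped; the timer runs out
        if gap >= debounce_ms:
            fired += 1
    return fired

# A rapid burst (50 ms apart) fires a single request at the end:
print(completions_triggered([0, 50, 100, 150, 200]))   # 1
# Slow, deliberate typing fires one request per pause:
print(completions_triggered([0, 300, 600]))            # 3
```

Raising debounceDelay therefore trades suggestion latency for fewer model invocations, which matters most on slower GPUs.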

Tip: Specialized code models (qwen2.5-coder, starcoder2) are trained for fill-in-the-middle completion, so they typically beat much larger general-purpose models on autocomplete quality and latency.

Chat Interface

  • VS Code: Cmd/Ctrl + L
  • JetBrains: Cmd/Ctrl + J

Select code and ask questions. Add context with @ mentions:

  • @codebase - Search entire project
  • @file - Reference specific file
  • @folder - Include directory
  • @docs - Documentation context
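@codebase works by embedding code chunks ahead of time, then retrieving the chunks nearest to your question and adding them to the prompt. The sketch below substitutes a toy bag-of-words embedding for nomic-embed-text to show the retrieval step (cosine similarity over chunk vectors); the real pipeline calls Ollama's embedding endpoint instead.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for nomic-embed-text: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

chunks = [
    "def connect_db(url): open database connection",
    "def render_page(template): html rendering helper",
    "def close_db(conn): close database connection",
]
query = "where do we open the database connection"
ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
print(ranked[0])  # the connect_db chunk scores highest
```

The real embedding model maps semantically similar code close together even without shared words, which is why it retrieves better than plain text search.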

Edit Mode

Select code → Press Cmd/Ctrl + I → Describe changes → Apply

Continue modifies code while preserving formatting and style.

Agent Mode

Enable with capabilities: [tool_use] for autonomous multi-step tasks:

  • Read and write files
  • Run terminal commands
  • Search codebase
  • Make multiple changes
models:
  - name: Agent Model
    provider: ollama
    model: llama3.1:8b
    capabilities:
      - tool_use
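With tool_use enabled, chat becomes a loop: the model either answers or requests a tool call; the client executes the call and feeds the result back until the model produces a final answer. A minimal sketch of that loop, with a scripted fake model and a toy tool standing in for llama3.1:8b and the real file tools:

```python
def fake_model(messages):
    """Scripted stand-in for a tool-capable model."""
    if not any(m["role"] == "tool" for m in messages):
        # No tool result yet: ask the client to read a file.
        return {"tool_call": {"name": "read_file", "args": {"path": "config.yaml"}}}
    return {"answer": "The config sets temperature to 0.7."}

TOOLS = {"read_file": lambda path: "temperature: 0.7"}  # toy tool

def agent_loop(user_msg, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        if "answer" in reply:
            return reply["answer"]
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})
    return "step limit reached"

print(agent_loop("What temperature does my config use?"))
```

The max_steps cap matters in practice: it stops a confused model from looping on tool calls forever.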

Custom Slash Commands

Create shortcuts for common tasks:

prompts:
  - name: doc
    description: Add documentation
    prompt: Add comprehensive JSDoc/docstring to this code.

  - name: optimize
    description: Optimize performance
    prompt: Suggest performance optimizations for this code.

Use with /doc or /optimize in chat.
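A slash command is essentially a prompt template: Continue prepends the command's prompt to the code you have selected and sends the combined message to the chat model. Roughly (hypothetical helper, mirroring the /doc and /optimize commands above):

```python
PROMPTS = {
    "doc": "Add comprehensive JSDoc/docstring to this code.",
    "optimize": "Suggest performance optimizations for this code.",
}

def expand_slash_command(command: str, selected_code: str) -> str:
    """Combine a slash command's prompt with the current selection."""
    return f"{PROMPTS[command]}\n\n```\n{selected_code}\n```"

print(expand_slash_command("doc", "function add(a, b) { return a + b; }"))
```

Because the expansion is just text, a well-written prompt (explicit style, framework, and output format) is what makes a command reliable across a team.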


Performance Optimization

Reduce Autocomplete Latency

  1. Use small models (1.5B-3B parameters)
  2. Increase debounce delay:
    autocompleteOptions:
      debounceDelay: 350
    
  3. Disable thinking for Qwen3:
    requestOptions:
      extraBodyProperties:
        think: false
    

Verify GPU Acceleration

# Check GPU usage
ollama ps

# Should show GPU layers loaded
NAME              SIZE    PROCESSOR
llama3.1:8b       4.7GB   100% GPU

Debug Issues

# Restart Ollama with debug logging
pkill ollama
OLLAMA_DEBUG=1 ollama serve

# Check Continue logs
cat ~/.continue/logs/core.log

Continue vs Alternatives

Feature        Continue            GitHub Copilot  Cursor        Cline
Price          Free                $10-19/mo       $20-200/mo    Free
Open Source    Yes                 No              No            Yes
Local Models   Full                No              Limited       Yes
IDE Support    VS Code, JetBrains  Many            VS Code fork  VS Code
Agent Mode     Yes                 Yes             Yes           Yes
Customization  Excellent           Limited         Good          Good

When to Choose Continue

  • Privacy-critical projects - No code leaves your machine
  • Cost-conscious teams - Save $120-240/year per developer
  • Custom workflows - Build exactly what your team needs
  • Open source preference - Full transparency and control

When to Choose Copilot

  • Quick setup priority - Works out of the box
  • Enterprise requirements - SSO, compliance features
  • Multi-IDE teams - Wider IDE support
  • Training data quality - GitHub's massive codebase

Troubleshooting

"Cannot connect to Ollama"

# 1. Ensure Ollama is running
ollama serve

# 2. Verify port
curl http://localhost:11434/api/version

# 3. Check config.yaml apiBase
apiBase: http://localhost:11434
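The checks above can be scripted. The helper below normalizes the apiBase value (a trailing slash is a common config mistake) and probes Ollama's /api/version endpoint; the endpoint is real, but the helper itself is an assumption, not part of Continue:

```python
import json
import urllib.request
import urllib.error

def normalize_api_base(api_base: str) -> str:
    """Strip a trailing slash so path joins don't produce '//'."""
    return api_base.rstrip("/")

def ollama_reachable(api_base: str = "http://localhost:11434",
                     timeout: float = 2.0):
    """Return (ok, detail) after probing Ollama's /api/version endpoint."""
    url = normalize_api_base(api_base) + "/api/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode())
    except (urllib.error.URLError, OSError) as exc:
        return False, str(exc)

# Usage: ok, detail = ollama_reachable()
# ok is False with a connection error when the server is down.
```

If the probe fails with "connection refused", Ollama is not running; if it times out, check firewalls or a non-default OLLAMA_HOST.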

Models Not Loading

# Pull models first
ollama pull qwen2.5-coder:1.5b
ollama list  # Verify installed

# Restart Continue
# VS Code: Cmd/Ctrl + Shift + P → "Continue: Reload"

Slow Performance

  1. Switch to smaller autocomplete model
  2. Increase debounceDelay
  3. Reduce contextLength
  4. Check GPU usage: ollama ps

Config Not Applied

# Validate YAML syntax (requires PyYAML: pip install pyyaml)
python -c "import yaml, os; yaml.safe_load(open(os.path.expanduser('~/.continue/config.yaml')))"

# Restart VS Code after changes

MCP Integration

Extend Continue with Model Context Protocol servers:

mcpServers:
  # Database access
  - name: sqlite
    command: npx
    args: ["-y", "mcp-sqlite", "/path/to/db.sqlite"]

  # GitHub integration
  - name: github
    command: uvx
    args: [mcp-server-github]
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  # File system access
  - name: filesystem
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/allowed/path"]

Use in chat: Type @ → Select "MCP" → Choose resource.


Key Takeaways

  1. Continue + Ollama = Free Copilot alternative with full privacy
  2. Use small models for autocomplete (1.5B-3B) and large for chat (8B-32B)
  3. nomic-embed-text enables powerful codebase search
  4. Agent mode requires capabilities: [tool_use] and 8B+ models
  5. Custom slash commands automate your team's workflows
  6. MCP servers extend Continue's capabilities infinitely
  7. 31,300+ GitHub stars reflect a large, active community

Next Steps

  1. Compare local AI tools for model management
  2. Explore AI coding agents for autonomous development
  3. Compare Cursor vs Copilot vs Claude Code for alternatives
  4. Check VRAM requirements for model sizing
  5. Learn about MCP servers for tool integration

Continue.dev with Ollama delivers a professional AI coding experience without monthly subscriptions or privacy compromises. Whether you're a solo developer seeking GitHub Copilot features for free, or an enterprise team requiring local deployment for compliance, Continue provides the flexibility and performance to transform your coding workflow.

Written by Pattanaik Ramswarup, AI Engineer & Dataset Architect.