Ollama Not Working? Complete Troubleshooting Guide
Published on April 11, 2026 • 18 min read
I maintain Ollama across 14 machines — three Linux servers, a handful of Macs, and a few Windows workstations. Every error message in this guide is one I have personally hit, diagnosed, and fixed. If Ollama is broken on your machine right now, start with the diagnostic flowchart below and follow the branch that matches your situation.
Diagnostic Flowchart: Find Your Problem Fast
Work through these checkpoints in order. Each "No" answer points you to the right section.
Checkpoint 1 — Is Ollama installed?
ollama --version
- Command not found? Jump to Installation Failures.
- Version prints correctly? Move to Checkpoint 2.
Checkpoint 2 — Is the Ollama server running?
curl http://localhost:11434/api/tags
- Connection refused? Jump to Server Won't Start.
- Returns JSON with your models? Move to Checkpoint 3.
Checkpoint 3 — Can you pull a model?
ollama pull llama3.2:3b
- Network error or stall? Jump to Download and Network Issues.
- Disk space error? Jump to Storage Problems.
- Model downloads fine? Move to Checkpoint 4.
Checkpoint 4 — Can you run the model?
ollama run llama3.2:3b "Say hello"
- "model requires more system memory"? Jump to Memory Errors.
- "failed to load model"? Jump to Model Load Failures.
- "GPU not found" or CUDA errors? Jump to GPU Detection Issues.
- "context length exceeded"? Jump to Context Length Errors.
- Model runs but output is garbage? Jump to Bad Output Quality.
- Model runs but is painfully slow? Jump to Performance Problems.
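The checkpoints above can be sketched as a single script. This is a hypothetical helper, not part of Ollama; it assumes curl is available and covers the first two checkpoints, reporting the first stage that fails:

```shell
#!/usr/bin/env sh
# Hypothetical health check for checkpoints 1 and 2.
# Pass an alternate binary name or host to test other setups.
check_ollama() {
  bin="${1:-ollama}"
  host="${2:-http://localhost:11434}"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "checkpoint 1 FAILED: $bin not installed"
    return 1
  fi
  if ! curl -sf "$host/api/tags" >/dev/null 2>&1; then
    echo "checkpoint 2 FAILED: no server answering at $host"
    return 2
  fi
  echo "checkpoints 1-2 OK: pull and run a model next"
}

# || true so a failed checkpoint doesn't abort a calling script
check_ollama || true
```

If both checkpoints pass, continue with the pull and run checks by hand, since model downloads are too slow to automate usefully here.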
Installation Failures {#installation-failures}
Windows: "ollama" is not recognized
The installer finished, but your terminal cannot find the binary. Three possible causes.
Cause 1: PATH not updated. The Ollama installer adds itself to the system PATH, but open terminal sessions do not pick up PATH changes automatically. Close every terminal window and open a fresh one. If you are using Windows Terminal, close the entire app — not just the tab.
Cause 2: Installer failed silently. Re-download from ollama.com/download/windows and run the installer as Administrator. Right-click the .exe, select "Run as administrator."
Cause 3: Antivirus quarantined the binary. Windows Defender sometimes flags Ollama. Check your quarantine list:
# Check Windows Defender quarantine
Get-MpThreatDetection | Select-Object -Last 5
# Add exclusion for Ollama
Add-MpPreference -ExclusionPath "C:\Users\$env:USERNAME\AppData\Local\Ollama"
For a complete walkthrough of Windows installation, see our Ollama Windows installation guide.
Mac: "command not found: ollama"
Homebrew install: Verify Homebrew itself works (brew --version). If yes:
brew reinstall ollama
# For Apple Silicon, ensure Homebrew path is set
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
source ~/.zprofile
Direct download install: The .app bundle puts the binary in a non-standard location. Add it manually:
sudo ln -sf /Applications/Ollama.app/Contents/Resources/ollama /usr/local/bin/ollama
Gatekeeper blocking: If macOS refuses to open Ollama because it is from an unidentified developer:
xattr -cr /Applications/Ollama.app
Full Mac setup walkthrough: Mac local AI setup guide.
Linux: "ollama: command not found"
Snap/apt install: Check that /usr/local/bin is in your PATH:
echo $PATH | tr ':' '\n' | grep -q '/usr/local/bin' && echo "OK" || echo "MISSING"
# If missing, add it
echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
curl install script failed: The official one-liner sometimes breaks on minimal server images missing curl or systemd:
# Install prerequisites first
sudo apt update && sudo apt install -y curl systemd
# Then run the installer
curl -fsSL https://ollama.com/install.sh | sh
# Verify
ollama --version
Server Won't Start {#server-wont-start}
Error: "bind: address already in use"
Port 11434 is taken. Another Ollama instance is already running, or another service grabbed the port.
# Find what is using port 11434
# Linux/Mac:
lsof -i :11434
# Windows (PowerShell):
Get-NetTCPConnection -LocalPort 11434 | Select-Object OwningProcess
Get-Process -Id (Get-NetTCPConnection -LocalPort 11434).OwningProcess
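If lsof is not installed, bash's /dev/tcp pseudo-device gives a quick sketch of the same check (this is a bash feature, not POSIX sh, and it only tells you the port is taken, not by whom):

```shell
#!/usr/bin/env bash
# Succeeds only if something accepts a TCP connection on the port.
port_in_use() {
  if (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null; then
    echo "port $1: in use"
  else
    echo "port $1: free"
  fi
}

port_in_use 11434
```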
Fix 1: Kill the existing process.
# Linux/Mac
pkill -f ollama
# Wait 2 seconds, then start again
ollama serve
# Windows
taskkill /F /IM ollama.exe
ollama serve
Fix 2: Run on a different port.
# Linux/Mac
OLLAMA_HOST=127.0.0.1:11435 ollama serve
# Then tell the client about the new port
export OLLAMA_HOST=127.0.0.1:11435
ollama list
Server starts then immediately exits
Check the logs for the actual error:
# Linux (systemd)
journalctl -u ollama -n 50 --no-pager
# Mac (Homebrew service)
cat ~/.ollama/logs/server.log
# Windows
type "%LOCALAPPDATA%\Ollama\server.log"
Common causes: corrupted model files (delete ~/.ollama/models and re-pull), or NVIDIA driver mismatch on Linux (covered in GPU section below).
Download and Network Issues {#download-network-issues}
Pull hangs or times out
# Test basic connectivity
curl -v https://registry.ollama.ai/v2/
# If behind a proxy, set the environment variables
export HTTP_PROXY=http://your-proxy:8080
export HTTPS_PROXY=http://your-proxy:8080
export NO_PROXY=localhost,127.0.0.1
# Then retry
ollama pull llama3.2
Download speed is extremely slow
# Rough check — the registry response is tiny, so this measures latency more than real bandwidth
curl -o /dev/null -w "Speed: %{speed_download} bytes/sec\n" https://registry.ollama.ai/v2/
# Try a different DNS
# Linux:
sudo bash -c 'echo "nameserver 1.1.1.1" > /etc/resolv.conf'
# Mac:
sudo networksetup -setdnsservers Wi-Fi 1.1.1.1 8.8.8.8
Corporate firewall blocks the registry
Your IT department may block registry.ollama.ai. Two workarounds:
Option 1: Download the model on a personal network, then copy it:
# On unrestricted machine
ollama pull llama3.2
# Copy the model blob
# Archive the model store with relative paths (-C) so it extracts cleanly
tar -czf llama3.2-model.tar.gz -C ~ .ollama/models
# Transfer to restricted machine and extract into the home directory
tar -xzf llama3.2-model.tar.gz -C ~/
Option 2: Import a GGUF file directly (download from HuggingFace, which may not be blocked):
# Create a Modelfile
echo 'FROM ./llama-3.2-3b.Q4_K_M.gguf' > Modelfile
ollama create my-llama -f Modelfile
For ongoing issues, check Ollama's GitHub issues — connectivity problems sometimes stem from registry outages.
Storage Problems {#storage-problems}
"not enough disk space" or download fails partway
Models are larger than you think. Here is what you need:
| Model | Download Size | Disk After Install |
|---|---|---|
| llama3.2:3b | 2.0 GB | 2.0 GB |
| llama3.1:8b | 4.7 GB | 4.7 GB |
| llama3.3:70b | 40 GB | 40 GB |
| mixtral:8x7b | 26 GB | 26 GB |
| deepseek-r1:32b | 19 GB | 19 GB |
Check your available space:
# Linux/Mac
df -h ~/.ollama
# Windows (PowerShell)
Get-PSDrive C | Select-Object Used, Free
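Putting the size table and the space check together, here is a sketch that fails fast before a large pull. The 40 GB threshold is the llama3.3:70b figure from the table; substitute the size of whatever model you intend to pull:

```shell
#!/usr/bin/env sh
# Compare free space on the filesystem holding the home directory
# (where ~/.ollama lives by default) against a model's download size.
need_gb=40   # llama3.3:70b, from the table above
free_kb=$(df -Pk "$HOME" | awk 'NR==2 {print $4}')
free_gb=$((free_kb / 1024 / 1024))
if [ "$free_gb" -lt "$need_gb" ]; then
  echo "NOT ENOUGH SPACE: need ${need_gb} GB, have ${free_gb} GB free"
else
  echo "OK: ${free_gb} GB free for a ${need_gb} GB pull"
fi
```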
Move the model directory to a bigger drive:
# Linux/Mac — move to external or secondary drive
mv ~/.ollama /mnt/big-drive/ollama
ln -s /mnt/big-drive/ollama ~/.ollama
# Windows — set environment variable
# System Settings → Environment Variables → New
# Variable: OLLAMA_MODELS
# Value: D:\OllamaModels
Clean up old models
# List all models with sizes
ollama list
# Remove models you no longer need
ollama rm mixtral:8x7b
ollama rm codellama:34b
# Check recovered space
du -sh ~/.ollama/models/
Memory Errors {#memory-errors}
"model requires more system memory"
This is the single most common error. The model needs more RAM than your system has available. Not total RAM — available RAM.
Check available memory:
# Linux
free -h | grep Mem
# Mac — vm_stat reports pages; page size is 4096 on Intel, 16384 on Apple Silicon
vm_stat | perl -ne '/Pages free:\s+(\d+)/ && printf "Free: %.1f GB\n", $1*4096/1073741824'
# Windows (PowerShell) — FreePhysicalMemory is reported in KB, so this prints GB
(Get-CimInstance Win32_OperatingSystem).FreePhysicalMemory / 1MB
Fix 1: Close applications. Browsers are the biggest offenders. Chrome with 20 tabs can consume 4-8 GB. Close it. Close Slack, Teams, Docker Desktop — anything you are not actively using.
Fix 2: Use a smaller model. This is not a workaround — it is the correct solution if your hardware cannot support the model. Consult our system requirements guide for exact RAM-to-model mappings.
| Available RAM | Maximum Model Size |
|---|---|
| 4 GB | 1B-3B (phi3:mini, llama3.2:1b) |
| 8 GB | 3B-7B (llama3.2:3b, gemma2:2b) |
| 16 GB | 7B-13B (llama3.1:8b, mistral) |
| 32 GB | 13B-30B (gemma2:27b, deepseek-r1:14b) |
| 64 GB | 30B-70B (llama3.3:70b, mixtral:8x7b) |
Fix 3: Use a more aggressively quantized version. Q4_0 uses roughly half the memory of Q8_0:
# Instead of the default (usually Q4_K_M), pull an explicit quantization.
# Exact tag names vary by model — check the Tags page on ollama.com
ollama pull llama3.1:8b-instruct-q4_0
# Or even smaller — Q2 is lossy but runs on tight systems
ollama pull llama3.1:8b-instruct-q2_K
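The RAM table above follows from a back-of-envelope rule: weight memory in GB is roughly parameters (in billions) times bits per weight, divided by 8, with KV cache and runtime overhead on top. A quick sketch of the arithmetic:

```shell
#!/usr/bin/env sh
# Rough weight-memory estimate: params (billions) x bits per weight / 8 = GB.
# This also shows why Q4 needs about half the memory of Q8.
model_gb() { awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'; }

model_gb 8 4    # 8B model at 4-bit quantization -> prints 4.0
model_gb 8 8    # same model at 8-bit           -> prints 8.0
model_gb 70 4   # 70B model at 4-bit            -> prints 35.0
```

The estimate covers weights only; budget extra headroom for context and the OS.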
Fix 4: Reduce context length. Larger context windows eat RAM. The default 2048 tokens is usually fine. Set it from inside an interactive session:
ollama run llama3.2
# Then, at the >>> prompt:
/set parameter num_ctx 1024
Model Load Failures {#model-load-failures}
"failed to load model"
This error has at least five different root causes. Work through them in order.
Cause 1: Corrupted download. Delete and re-pull.
ollama rm llama3.2
ollama pull llama3.2
Cause 2: Incompatible GGUF file. If you created the model from a manually downloaded GGUF, the file may use a quantization format your Ollama version does not support. Update Ollama:
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Mac
brew upgrade ollama
# Windows — download latest installer from ollama.com
Cause 3: Permission denied. The model directory has wrong ownership (common after running Ollama as root, then as a regular user):
# Linux/Mac
sudo chown -R $(whoami) ~/.ollama
chmod -R 755 ~/.ollama/models
Cause 4: Disk full mid-download. The model file is incomplete. Remove it and re-download:
# Nuclear option — clear all models and re-download what you need
rm -rf ~/.ollama/models/blobs/*
ollama pull llama3.2
Cause 5: CUDA/ROCm library mismatch (GPU systems only). See the GPU section below.
GPU Detection Issues {#gpu-detection-issues}
"GPU not found" / "no compatible GPUs were discovered"
NVIDIA on Linux — the driver dance:
# Check if NVIDIA driver is loaded
nvidia-smi
# If "command not found" — install the driver
sudo apt install -y nvidia-driver-550
sudo reboot
# If nvidia-smi works but Ollama ignores GPU — check CUDA
ls /usr/local/cuda/lib64/libcudart*
# Ollama bundles its own CUDA runtime since v0.3.0
# But driver version must be >= 525.60.13
nvidia-smi | grep "Driver Version"
AMD on Linux:
# ROCm must be installed
rocm-smi
# If missing, install ROCm 6.x
# See: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/
# Verify Ollama sees the GPU
OLLAMA_DEBUG=1 ollama serve 2>&1 | grep -i "gpu\|rocm\|amd"
Windows — CUDA version mismatch:
The most common Windows GPU issue is having an old NVIDIA driver. Ollama requires driver 525+ for CUDA 12.x support.
# Check driver version
nvidia-smi
# If driver is < 525, update from nvidia.com/drivers
# After updating, restart your machine (not just Ollama)
Mac — no discrete GPU needed. Apple Silicon uses Metal automatically. If you see GPU errors on Mac, you are likely on an Intel Mac without a supported GPU. Ollama will fall back to CPU, which is fine:
# Force CPU mode if Metal is causing issues
OLLAMA_METAL=0 ollama serve
"CUDA out of memory"
Your GPU does not have enough VRAM for the model. Unlike system RAM, you cannot use swap space for VRAM.
# Check GPU VRAM
nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv
# Offload some layers to CPU (slower but works).
# num_gpu is a session parameter, set from inside an interactive run:
ollama run llama3.2
# Then, at the >>> prompt:
/set parameter num_gpu 20
# Reduce the number until it fits. 0 = full CPU.
Context Length Errors {#context-length-errors}
"context length exceeded" / "requested context length is too large"
You asked for more context than the model supports or your hardware can handle.
# Check a model's context length (listed in the output)
ollama show llama3.2
# Set an explicit context length from inside a session:
ollama run llama3.2
# Then, at the >>> prompt:
/set parameter num_ctx 4096
# For longer context, you need more RAM:
# 4096 tokens ≈ +0.5 GB RAM
# 8192 tokens ≈ +1 GB RAM
# 32768 tokens ≈ +4 GB RAM
# 131072 tokens ≈ +16 GB RAM
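Those rules of thumb work out to roughly 0.125 MB of extra RAM per token, so a quick estimate for any context size is tokens divided by 8 (approximate, and it varies by model architecture):

```shell
#!/usr/bin/env sh
# ~0.5 GB per 4096 tokens => roughly tokens/8 MB of extra RAM.
ctx_ram_mb() { echo $(( $1 / 8 )); }

ctx_ram_mb 4096     # prints 512   (~0.5 GB)
ctx_ram_mb 32768    # prints 4096  (~4 GB)
ctx_ram_mb 131072   # prints 16384 (~16 GB)
```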
If you routinely need long context, pick a model designed for it. Llama 3.2 supports up to 128K context, but you need the RAM to back it up.
Bad Output Quality {#bad-output-quality}
Model returns gibberish or incoherent text
Cause 1: Wrong quantization. Ultra-low quantization (Q2_K, IQ1) degrades output quality significantly. Step up to Q4_K_M or higher:
ollama rm llama3.1:8b-instruct-q2_K
ollama pull llama3.1:8b-instruct-q4_K_M
Cause 2: Model too small for the task. A 1B parameter model will not write coherent long-form text. It is not broken — it is limited. Move to 7B or larger.
Cause 3: Corrupted model file. If output was previously fine and suddenly degraded, the model file may be partially corrupted:
ollama rm llama3.2
ollama pull llama3.2
Cause 4: Bad system prompt in Modelfile. If you created a custom model, a poorly written system prompt can derail output. Test with the base model first to isolate the issue:
# Test base model (no custom Modelfile)
ollama run llama3.2 "Summarize the benefits of exercise"
# If base model works fine, the problem is your Modelfile
Model ignores instructions or gives wrong language
This usually means you are using a model that was not trained for your task. Some models are English-only, some are code-only. Check the model card on the Ollama library.
Performance Problems {#performance-problems}
Tokens per second is painfully slow
If inference runs but at 2-3 tokens/second when you expected 20+, the problem is almost always one of these:
Problem 1: Model is running on CPU instead of GPU. Verify GPU is being used:
# Check during inference
# Linux NVIDIA:
watch -n 1 nvidia-smi
# Mac:
sudo powermetrics --samplers gpu_power -i 1000 -n 3
# If GPU utilization is 0% during inference, GPU is not being used
# Restart Ollama with debug logging:
OLLAMA_DEBUG=1 ollama serve
Problem 2: Model is too large and spilling to CPU. When a model does not fully fit in VRAM, Ollama splits layers between GPU and CPU. The CPU layers are dramatically slower. Either use a smaller model or reduce layers on GPU until throughput stabilizes:
# See how many layers loaded to GPU vs CPU
OLLAMA_DEBUG=1 ollama run llama3.2 "test" 2>&1 | grep -i "layer\|gpu\|cpu"
Problem 3: Other processes hogging GPU. Common culprit: a web browser using hardware acceleration, or a stuck CUDA process:
# Linux/Windows — find GPU-hogging processes
nvidia-smi
# Kill stuck processes
sudo kill -9 <PID>
Problem 4: Thermal throttling. Laptops throttle GPU and CPU when they get hot. Check temperatures:
# Linux
sensors | grep -i temp
# Mac
sudo powermetrics --samplers thermal -i 1000 -n 1
For a deep dive on every performance optimization, see our dedicated slow local LLM fix guide.
Platform-Specific Issues {#platform-specific-issues}
Windows-Specific Problems
WSL2 vs Native: If you installed Ollama inside WSL2, it may not see your GPU. You need the NVIDIA CUDA driver for WSL:
# Check if WSL sees GPU
wsl nvidia-smi
# If not, install CUDA WSL driver from:
# https://developer.nvidia.com/cuda/wsl
Windows Defender real-time scanning slows model loading dramatically. Exclude the Ollama directory:
Add-MpPreference -ExclusionPath "C:\Users\$env:USERNAME\.ollama"
Add-MpPreference -ExclusionProcess "ollama.exe"
Long path names: Windows has a 260-character path limit by default. If model paths are deep, enable long paths:
# Run as Administrator
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force
Mac-Specific Problems
Rosetta 2 overhead on M1: If you installed the x86 version of Ollama through an x86 Homebrew installation, it runs through Rosetta and loses 30-40% performance:
# Check architecture
file $(which ollama)
# Should say "arm64" not "x86_64"
# If x86, reinstall with native Homebrew
arch -arm64 brew reinstall ollama
macOS memory pressure: When macOS reports memory pressure as "yellow" or "red," Ollama gets throttled by the OS before it runs out of actual RAM:
# Check memory pressure
memory_pressure -S
# If "WARNING" — close apps and reduce model size
# Activity Monitor → Memory tab → sort by Memory to find hogs
Linux-Specific Problems
SELinux blocking Ollama:
# Check for SELinux denials (needs root)
sudo ausearch -m avc -ts recent | grep ollama
# Quick fix (permissive for Ollama only)
sudo semanage permissive -a ollama_t
# Or disable SELinux temporarily for testing
sudo setenforce 0
systemd service fails:
# Check service status
systemctl status ollama
# Common fix: the ollama user does not have GPU access
sudo usermod -aG video ollama
sudo usermod -aG render ollama
sudo systemctl restart ollama
NVIDIA driver mismatch after kernel update:
# After a kernel update, NVIDIA modules need rebuilding
sudo apt install --reinstall nvidia-driver-550
# Or if using DKMS:
sudo dkms autoinstall
sudo reboot
Environment Variable Reference {#env-var-reference}
When nothing else works, these environment variables control Ollama's behavior at a granular level. Set them before running ollama serve:
| Variable | Purpose | Example |
|---|---|---|
| OLLAMA_HOST | Bind address and port | 127.0.0.1:11435 |
| OLLAMA_MODELS | Custom model storage path | /mnt/ssd/models |
| OLLAMA_NUM_PARALLEL | Concurrent request limit | 2 |
| OLLAMA_MAX_LOADED_MODELS | Models kept in memory | 1 |
| OLLAMA_DEBUG | Verbose logging | 1 |
| OLLAMA_METAL | Enable/disable Metal (Mac) | 0 or 1 |
| CUDA_VISIBLE_DEVICES | Select specific GPU | 0 or 0,1 |
| OLLAMA_KEEP_ALIVE | Model unload timeout | 5m or 0 |
| OLLAMA_FLASH_ATTENTION | Enable flash attention | 1 |
# Example: start Ollama with debug logging, custom port, single model loaded
OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0:11435 OLLAMA_MAX_LOADED_MODELS=1 ollama serve
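On Linux systemd installs, variables exported in your shell never reach the service. Persist them with a service override instead; the path and timeout values below are examples:

```shell
# Open an override file for the Ollama service:
sudo systemctl edit ollama
# In the editor that opens, add lines like these:
#   [Service]
#   Environment="OLLAMA_MODELS=/mnt/ssd/models"
#   Environment="OLLAMA_KEEP_ALIVE=10m"
# Then apply the change:
sudo systemctl daemon-reload
sudo systemctl restart ollama
```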
Nuclear Options: Full Reset {#nuclear-options}
If nothing above works and you want to start completely clean:
# 1. Stop Ollama
pkill -f ollama # Linux/Mac
taskkill /F /IM ollama.exe # Windows
# 2. Remove all data
rm -rf ~/.ollama # Linux/Mac
# Windows:
# Delete: %LOCALAPPDATA%\Ollama
# Delete: %USERPROFILE%\.ollama
# 3. Uninstall
# Linux:
sudo rm /usr/local/bin/ollama
# Mac:
brew uninstall ollama
# Windows:
# Use Add/Remove Programs
# 4. Reinstall fresh
# Linux:
curl -fsSL https://ollama.com/install.sh | sh
# Mac:
brew install ollama
# Windows:
# Download from ollama.com/download/windows
# 5. Test
ollama --version
ollama pull llama3.2:3b
ollama run llama3.2:3b "Hello, is this working?"
Getting More Help
If you have worked through every section above and your issue persists:
- Collect debug logs: Run OLLAMA_DEBUG=1 ollama serve and capture the full output.
- Check GitHub issues: Search github.com/ollama/ollama/issues for your specific error message.
- Post a bug report: Include your OS, Ollama version (ollama --version), hardware specs, and the exact error message with debug logs.
Most issues boil down to one of three things: not enough RAM, wrong GPU drivers, or network problems. Fix those and Ollama runs reliably for months at a time.
Running into performance problems specifically? Our local LLM slow fix guide covers every optimization technique from quantization selection to GPU layer splitting. For hardware planning, check the Ollama system requirements page.