Ollama + Open WebUI: Self-Hosted ChatGPT (Docker)
Want to go deeper than this article?
The AI Learning Path covers this topic and more — hands-on chapters across 10 courses.
Ollama + Open WebUI: Build Your Own Self-Hosted ChatGPT
Published on April 10, 2026 — 22 min read
You can have a ChatGPT-quality interface running on your own hardware, with your own models, behind your own firewall. No API keys. No per-token billing. No data leaving your network. The stack is Ollama for inference and Open WebUI for the frontend, tied together with Docker Compose.
I run this setup on a single machine with an RTX 4090 and it serves a team of 8 people. Total cost after hardware: zero dollars per month. Here is the exact configuration.
What this guide covers:
- Docker Compose file for the full stack (Ollama + Open WebUI)
- NVIDIA GPU passthrough configuration
- Nginx reverse proxy with automatic TLS certificates
- User management and access control
- Model management through the web interface
- Production hardening (backups, monitoring, resource limits)
- Troubleshooting the most common deployment failures
Prerequisites:
- A Linux server (Ubuntu 22.04+ recommended) or a machine with Docker Desktop
- Docker Engine 24+ and Docker Compose v2
- NVIDIA GPU with 8GB+ VRAM (optional but strongly recommended)
- Basic familiarity with terminal commands
If you need help installing Ollama on its own first, see the complete Ollama guide. For Mac-specific Docker setup, check the Mac local AI setup.
Table of Contents
- Architecture Overview
- Install Docker and NVIDIA Container Toolkit
- Docker Compose Configuration
- First Launch and Admin Setup
- GPU Passthrough Configuration
- Model Management via the UI
- Nginx Reverse Proxy with TLS
- User Management and Access Control
- Production Hardening
- Troubleshooting
Architecture Overview {#architecture}
The stack has three containers:
[Browser] → [Nginx :443] → [Open WebUI :8080] → [Ollama :11434]
               TLS           Frontend             Inference
- Ollama handles model loading and inference. It exposes a REST API on port 11434.
- Open WebUI is the frontend. It talks to Ollama's API, manages conversation history, handles user auth, and serves the web interface on port 8080.
- Nginx (optional but recommended) sits in front as a reverse proxy, handles TLS termination, and provides rate limiting.
All three run as Docker containers on the same host. Open WebUI stores data (conversations, user accounts, settings) in a SQLite database mounted as a Docker volume. Ollama stores models in a separate volume.
Install Docker and NVIDIA Container Toolkit {#install-docker}
Docker Engine
# Ubuntu/Debian
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Add your user to the docker group (log out and back in after)
sudo usermod -aG docker $USER
NVIDIA Container Toolkit (for GPU passthrough)
# Add the NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU is accessible inside containers
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If nvidia-smi works inside the container, you are good to proceed. If it fails, your GPU drivers need updating — Ollama v0.6+ requires NVIDIA driver 535 or newer.
Docker Compose Configuration {#docker-compose}
Create a project directory and the compose file:
mkdir -p ~/ollama-webui && cd ~/ollama-webui
# docker-compose.yml
# (the top-level "version:" key is obsolete with Compose v2 and omitted here)
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
- OLLAMA_NUM_PARALLEL=4
- OLLAMA_MAX_LOADED_MODELS=2
- OLLAMA_FLASH_ATTENTION=1
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
ports:
- "3000:8080"
volumes:
- openwebui_data:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://ollama:11434
- WEBUI_SECRET_KEY=change-this-to-a-random-string
- WEBUI_AUTH=true
- DEFAULT_USER_ROLE=pending
depends_on:
- ollama
volumes:
ollama_data:
openwebui_data:
Key configuration choices:
- `OLLAMA_NUM_PARALLEL=4` allows 4 simultaneous inference requests. Reduce to 2 on GPUs with less than 12GB VRAM.
- `OLLAMA_MAX_LOADED_MODELS=2` keeps up to 2 models in GPU memory. Saves VRAM on smaller cards.
- `OLLAMA_FLASH_ATTENTION=1` enables Flash Attention for faster inference and lower memory usage.
- `DEFAULT_USER_ROLE=pending` means new signups require admin approval before they can use the system.
- `WEBUI_SECRET_KEY` is used to sign JWT tokens. Generate one with `openssl rand -hex 32`.
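Rather than hard-coding the secret in the compose file, you can keep it in an `.env` file next to `docker-compose.yml`, which Docker Compose reads automatically. A sketch (the only change needed in the compose file is referencing `${WEBUI_SECRET_KEY}` instead of the literal string):

```shell
# Generate a 64-character hex secret and write it to .env
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" > .env

# In docker-compose.yml, reference it instead of hard-coding:
#   - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}

# Confirm the key looks right (exactly 64 hex characters)
grep -E '^WEBUI_SECRET_KEY=[0-9a-f]{64}$' .env
```

Keep `.env` out of version control if you ever commit this directory.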
Launch the Stack
docker compose up -d
# Watch the logs
docker compose logs -f
# Verify both services are running
docker compose ps
Open WebUI should be accessible at http://your-server-ip:3000 within 30 seconds.
First Launch and Admin Setup {#first-launch}
The first user to create an account becomes the admin. Do this immediately after deployment — do not leave the signup page open on a public network.
- Navigate to http://localhost:3000 (or your server IP)
- Click Sign Up
- Enter your name, email, and password
- You are now the admin
Pull Your First Model
Open WebUI can pull models directly through the interface:
- Click the gear icon (top right) → Admin Panel
- Go to Models → Pull a Model
- Type llama3.2 and click Pull
Or pull from the command line:
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull qwen3:8b
docker exec -it ollama ollama pull mistral
Once models are pulled, they appear in the model selector dropdown on the chat page. You can switch between models mid-conversation.
GPU Passthrough Configuration {#gpu-passthrough}
NVIDIA GPUs
The Docker Compose file above already includes NVIDIA GPU passthrough via the deploy.resources.reservations block. Verify it is working:
# Check GPU visibility inside the Ollama container
docker exec -it ollama nvidia-smi
# Check Ollama actually loads onto the GPU: run a model, then inspect the
# PROCESSOR column of "ollama ps" while it is still loaded
docker exec -it ollama ollama run llama3.2 "Say hello" --verbose
docker exec -it ollama ollama ps
# The PROCESSOR column should show "100% GPU"
Multi-GPU Setup
If you have multiple GPUs:
# In docker-compose.yml, under ollama service:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all # Use all GPUs
capabilities: [gpu]
Or restrict to specific GPUs:
# Use only GPU 0 and GPU 1
environment:
- CUDA_VISIBLE_DEVICES=0,1
AMD GPUs (ROCm)
Replace the Ollama image with the ROCm variant:
ollama:
image: ollama/ollama:rocm
devices:
- /dev/kfd
- /dev/dri
group_add:
- video
- render
CPU-Only Setup
Remove the entire deploy block from the Ollama service. It works fine on CPU — just slower. Expect ~5-8 tokens/second for a 7B model on a modern CPU versus ~40 tokens/second on an RTX 4090.
Model Management via the UI {#model-management}
Open WebUI provides a model management interface that makes Ollama's CLI optional for day-to-day operations.
Pulling and Deleting Models
Admin Panel → Models:
- Pull: Enter a model name (e.g., codellama:34b) and click Pull. Progress shows in real time.
- Delete: Click the trash icon next to any model. This frees disk space immediately.
- Model Info: Click a model name to see parameters, size, quantization level, and template.
Creating Custom Models (Modelfiles)
You can create customized model variants directly in the UI:
- Go to Workspace → Modelfiles
- Click Create New
- Define your system prompt, temperature, and base model
Example: A code review assistant based on CodeLlama:
FROM codellama:13b
SYSTEM "You are a senior software engineer performing code review. Be direct. Point out bugs, security issues, and performance problems. Suggest specific fixes with code snippets."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
Recommended Model Selection by VRAM
| VRAM | Recommended Models | Concurrent Users |
|---|---|---|
| 8GB | Llama 3.2 3B, Phi-3 Mini, Gemma 2B | 2-3 |
| 12GB | Llama 3.1 8B, Mistral 7B, Qwen3 8B | 3-4 |
| 24GB | Qwen3 14B, CodeLlama 13B, Mixtral 8x7B (Q4) | 4-6 |
| 48GB | Llama 3.1 70B (Q4), Command R+ (Q4) | 4-8 |
For model selection details, see the best Ollama models guide.
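The table rows follow a rough rule of thumb: a Q4-quantized model takes about 0.5 bytes per parameter, plus roughly 20% overhead for the KV cache and runtime buffers. Both constants are heuristics, not exact figures, but the arithmetic is easy to sketch:

```shell
# Back-of-envelope VRAM estimate for a Q4-quantized model
# (0.5 bytes/param and 20% overhead are rough heuristics, not exact)
vram_estimate() {
  awk -v p="$1" 'BEGIN { printf "~%.1f GB\n", p * 0.5 * 1.2 }'
}

vram_estimate 7    # 7B model  → ~4.2 GB
vram_estimate 13   # 13B model → ~7.8 GB
```

Leave headroom on top of the estimate: larger context windows (`num_ctx`) grow the KV cache well past the 20% figure.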
Nginx Reverse Proxy with TLS {#nginx-tls}
Running Open WebUI behind Nginx gives you TLS, rate limiting, and a proper domain name.
Add Nginx to Docker Compose
# Add to docker-compose.yml services:
nginx:
image: nginx:alpine
container_name: nginx-proxy
restart: unless-stopped
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./certbot/conf:/etc/letsencrypt:ro
- ./certbot/www:/var/www/certbot:ro
depends_on:
- open-webui
certbot:
image: certbot/certbot
container_name: certbot
volumes:
- ./certbot/conf:/etc/letsencrypt
- ./certbot/www:/var/www/certbot
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'"
Nginx Configuration
# nginx.conf
events {
worker_connections 1024;
}
http {
# Rate limiting
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
upstream openwebui {
server open-webui:8080;
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name ai.yourdomain.com;
location /.well-known/acme-challenge/ {
root /var/www/certbot;
}
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl http2;
server_name ai.yourdomain.com;
ssl_certificate /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
# Security headers
add_header X-Frame-Options DENY;
add_header X-Content-Type-Options nosniff;
add_header Strict-Transport-Security "max-age=63072000" always;
# WebSocket support (required for streaming responses)
location / {
proxy_pass http://openwebui;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
limit_req zone=api burst=20 nodelay;
}
# Increase body size for file uploads
client_max_body_size 50M;
}
}
Obtain TLS Certificate
# First, start nginx without TLS for the ACME challenge
# Temporarily comment out the ssl_certificate lines and the 443 server block
# Then run:
docker compose up -d nginx
# Request the certificate
docker compose run --rm certbot certonly \
--webroot --webroot-path=/var/www/certbot \
--email you@example.com --agree-tos --no-eff-email \
-d ai.yourdomain.com
# Uncomment the TLS config and restart
docker compose restart nginx
User Management and Access Control {#user-management}
Open WebUI has a built-in user system. For a team deployment, configure it properly.
User Roles
| Role | Permissions |
|---|---|
| Admin | Full access: models, users, settings, system config |
| User | Chat with models, create conversations, upload documents |
| Pending | Cannot use the system until approved by admin |
Configuration
# In docker-compose.yml, open-webui environment:
environment:
- WEBUI_AUTH=true # Require login
- DEFAULT_USER_ROLE=pending # New users need approval
- ENABLE_SIGNUP=true # Allow self-registration
- ENABLE_LOGIN_FORM=true # Show email/password form
Admin Actions
In the Admin Panel → Users section:
- Approve pending users
- Change roles (promote to admin or demote)
- Disable accounts without deleting conversation history
- Set model access per user (restrict which models specific users can access)
For a small team, the built-in auth is sufficient. For enterprise deployments, Open WebUI supports OAuth2/OIDC providers (Keycloak, Authentik, Azure AD) via the OAUTH_ environment variables.
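As a sketch only, wiring Open WebUI to an OIDC provider via those variables looks roughly like this. The variable names below are taken from recent Open WebUI releases; treat them as assumptions and verify against the documentation for your version:

```yaml
# Hypothetical OIDC sketch for the open-webui service — verify variable
# names against your Open WebUI version's docs before deploying
environment:
  - ENABLE_OAUTH_SIGNUP=true
  - OAUTH_CLIENT_ID=open-webui                # client registered in your IdP
  - OAUTH_CLIENT_SECRET=your-client-secret
  - OPENID_PROVIDER_URL=https://auth.yourdomain.com/realms/main/.well-known/openid-configuration
  - OAUTH_PROVIDER_NAME=Keycloak              # label shown on the login button
```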
Production Hardening {#production}
If you are deploying this for a team or leaving it running 24/7, these configurations prevent data loss and resource exhaustion.
Automated Backups
#!/bin/bash
# backup.sh — run daily via cron
BACKUP_DIR="/backups/ollama-webui/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Back up Open WebUI database
docker exec open-webui cp /app/backend/data/webui.db /tmp/webui.db
docker cp open-webui:/tmp/webui.db "$BACKUP_DIR/webui.db"
# Back up model list (not the models themselves — they are large)
docker exec ollama ollama list > "$BACKUP_DIR/model-list.txt"
# Compress (use -C so the archive stores relative paths, not absolute ones)
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"
# Retain 30 days
find /backups/ollama-webui/ -name "*.tar.gz" -mtime +30 -delete
# Add to crontab
crontab -e
# 2 AM daily
0 2 * * * /home/user/ollama-webui/backup.sh
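The retention rule is worth dry-running against dummy files before trusting it with real backups. This sketch uses a temporary directory and GNU touch's `-d` flag, so nothing under /backups is touched:

```shell
# Simulate one stale and one fresh backup, then apply the same find rule
tmp=$(mktemp -d)
touch -d '40 days ago' "$tmp/old.tar.gz"   # older than the 30-day window
touch "$tmp/new.tar.gz"                    # fresh backup
find "$tmp" -name '*.tar.gz' -mtime +30 -delete
ls "$tmp"   # only new.tar.gz should remain
```

`-mtime +30` matches files modified more than 30 full days ago, so a 31-day-old archive survives one more day than you might expect.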
Resource Limits
# docker-compose.yml — prevent one container from eating all resources
services:
ollama:
# ... existing config ...
deploy:
resources:
limits:
memory: 32G
reservations:
memory: 8G
devices:
- driver: nvidia
count: all
capabilities: [gpu]
open-webui:
# ... existing config ...
deploy:
resources:
limits:
memory: 4G
cpus: "2.0"
Health Checks
services:
ollama:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/version"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
open-webui:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
Log Rotation
# Add to each service in docker-compose.yml
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
Firewall Rules
# Only expose ports 80 and 443 publicly
# Keep 11434 (Ollama) and 3000 (Open WebUI direct) internal only
sudo ufw default deny incoming
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
# Important: Docker-published ports bypass UFW (Docker inserts its own
# iptables rules), so "ufw deny" does NOT block 11434 or 3000 here.
# Instead, bind those ports to localhost in docker-compose.yml:
#   - "127.0.0.1:11434:11434"   # ollama
#   - "127.0.0.1:3000:8080"     # open-webui
Troubleshooting {#troubleshooting}
Open WebUI cannot connect to Ollama
# Check the OLLAMA_BASE_URL environment variable
docker exec open-webui env | grep OLLAMA
# It should be: OLLAMA_BASE_URL=http://ollama:11434
# NOT http://localhost:11434 — containers use service names
# Test connectivity from Open WebUI container
docker exec open-webui curl -s http://ollama:11434/api/version
GPU not detected in container
# Verify NVIDIA runtime is configured
docker info | grep -i runtime
# Should show "nvidia" in the runtimes list
# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
# If it fails, reinstall NVIDIA Container Toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Out of memory during model loading
# Check GPU memory usage
nvidia-smi
# Reduce loaded models
# In docker-compose.yml:
environment:
- OLLAMA_MAX_LOADED_MODELS=1
# Use a smaller model or a tighter quantization
docker exec ollama ollama pull llama3.2:3b
docker exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
Slow responses with multiple users
# Check if requests are queuing
docker exec ollama ollama ps
# Reduce parallelism to prevent thrashing
environment:
- OLLAMA_NUM_PARALLEL=2
# Monitor GPU utilization
watch -n 1 nvidia-smi
WebSocket errors (streaming not working)
This happens when Nginx is not configured for WebSocket upgrades. Ensure your Nginx config includes:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
Container keeps restarting
# Check logs for the failing container
docker compose logs ollama --tail 50
docker compose logs open-webui --tail 50
# Common cause: port conflict
lsof -i :11434
lsof -i :3000
# Another cause: corrupted volume
docker compose down
docker volume rm ollama-webui_openwebui_data
docker compose up -d
# Warning: this deletes all conversations and user accounts
Cost Comparison: Self-Hosted vs Cloud
For context on why this setup is worth the effort:
| | ChatGPT Team | Self-Hosted (this guide) |
|---|---|---|
| Monthly cost (8 users) | $200/mo ($25/user) | $0 (after hardware) |
| Annual cost | $2,400 | ~$80 (electricity) |
| Data privacy | OpenAI servers | Your hardware |
| Model choice | OpenAI models only | Any open model |
| Rate limits | 100 msgs/3hrs | Your GPU is the limit |
| Customization | None | Full (system prompts, fine-tuned models) |
Hardware cost for an RTX 4090 server: ~$2,500. Break-even versus ChatGPT Team at 8 users: 12.5 months. After that, it is pure savings.
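The break-even number is simply hardware cost divided by the avoided subscription spend (electricity excluded for simplicity):

```shell
# Break-even in months: hardware cost / monthly subscription avoided
hardware=2500   # RTX 4090 server, from above
monthly=200     # 8 users x $25/user on ChatGPT Team
awk -v h="$hardware" -v m="$monthly" 'BEGIN { printf "%.1f months\n", h / m }'
# → 12.5 months
```

Fewer users stretch the break-even proportionally: at 4 users it is 25 months, so team size matters as much as hardware price.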
For a detailed cost breakdown, see our local AI vs ChatGPT cost comparison. If you are evaluating other self-hosted chat interfaces, the AnythingLLM setup guide covers an alternative with built-in RAG support.
What to Build Next
Once your Ollama + Open WebUI stack is running:
- Add RAG capabilities — Open WebUI supports document upload natively. Drop PDFs into a conversation and the model answers from their content.
- Connect to IDE tools — Point Continue.dev at your Ollama instance for AI-assisted coding.
- Automate with n8n — Wire Ollama into workflows for email processing, document summarization, or Slack bots.
- Monitor usage — Open WebUI's admin panel shows per-user message counts, model usage, and response times.
Need help choosing models for this setup? See best Ollama models by use case. For Windows-specific Docker issues, check the Ollama Windows installation guide.
Go from reading about AI to building with AI
10 structured courses. Hands-on projects. Runs on your machine. Start free.