Ollama + Open WebUI: Self-Hosted ChatGPT (Docker)
Want to go deeper than this article?
The AI Learning Path covers this topic and more — hands-on chapters across 10 courses.
Ollama + Open WebUI: Build Your Own Self-Hosted ChatGPT
Published on April 10, 2026 — 22 min read
You can have a ChatGPT-quality interface running on your own hardware, with your own models, behind your own firewall. No API keys. No per-token billing. No data leaving your network. The stack is Ollama for inference and Open WebUI for the frontend, tied together with Docker Compose.
I run this setup on a single machine with an RTX 4090 and it serves a team of 8 people. Total cost after hardware: zero dollars per month. Here is the exact configuration.
What this guide covers:
- Docker Compose file for the full stack (Ollama + Open WebUI)
- NVIDIA GPU passthrough configuration
- Nginx reverse proxy with automatic TLS certificates
- User management and access control
- Model management through the web interface
- Production hardening (backups, monitoring, resource limits)
- Troubleshooting the most common deployment failures
Prerequisites:
- A Linux server (Ubuntu 22.04+ recommended) or a machine with Docker Desktop
- Docker Engine 24+ and Docker Compose v2
- NVIDIA GPU with 8GB+ VRAM (optional but strongly recommended)
- Basic familiarity with terminal commands
If you need help installing Ollama on its own first, see the complete Ollama guide. For Mac-specific Docker setup, check the Mac local AI setup.
Table of Contents
- Architecture Overview
- Install Docker and NVIDIA Container Toolkit
- Docker Compose Configuration
- First Launch and Admin Setup
- GPU Passthrough Configuration
- Model Management via the UI
- Nginx Reverse Proxy with TLS
- User Management and Access Control
- Production Hardening
- Troubleshooting
Architecture Overview {#architecture}
The stack has three containers:
[Browser] → [Nginx :443] → [Open WebUI :8080] → [Ollama :11434]
               TLS           Frontend             Inference
- Ollama handles model loading and inference. It exposes a REST API on port 11434.
- Open WebUI is the frontend. It talks to Ollama's API, manages conversation history, handles user auth, and serves the web interface on port 8080.
- Nginx (optional but recommended) sits in front as a reverse proxy, handles TLS termination, and provides rate limiting.
All three run as Docker containers on the same host. Open WebUI stores data (conversations, user accounts, settings) in a SQLite database mounted as a Docker volume. Ollama stores models in a separate volume.
Install Docker and NVIDIA Container Toolkit {#install-docker}
Docker Engine
# Ubuntu/Debian
sudo apt update
sudo apt install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
# Add your user to the docker group (log out and back in after)
sudo usermod -aG docker $USER
NVIDIA Container Toolkit (for GPU passthrough)
# Add the NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure Docker to use NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU is accessible inside containers
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
If nvidia-smi works inside the container, you are good to proceed. If it fails, your GPU drivers need updating — Ollama v0.6+ requires NVIDIA driver 535 or newer.
Docker Compose Configuration {#docker-compose}
Create a project directory and the compose file:
mkdir -p ~/ollama-webui && cd ~/ollama-webui
# docker-compose.yml
# (the top-level "version:" key is obsolete with Compose v2 and omitted here)
services:
ollama:
image: ollama/ollama:latest
container_name: ollama
restart: unless-stopped
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
- OLLAMA_NUM_PARALLEL=4
- OLLAMA_MAX_LOADED_MODELS=2
- OLLAMA_FLASH_ATTENTION=1
open-webui:
image: ghcr.io/open-webui/open-webui:main
container_name: open-webui
restart: unless-stopped
ports:
- "3000:8080"
volumes:
- openwebui_data:/app/backend/data
environment:
- OLLAMA_BASE_URL=http://ollama:11434
- WEBUI_SECRET_KEY=change-this-to-a-random-string
- WEBUI_AUTH=true
- DEFAULT_USER_ROLE=pending
depends_on:
- ollama
volumes:
ollama_data:
openwebui_data:
Key configuration choices:
- `OLLAMA_NUM_PARALLEL=4` allows 4 simultaneous inference requests. Reduce to 2 on GPUs with less than 12GB VRAM.
- `OLLAMA_MAX_LOADED_MODELS=2` keeps up to 2 models in GPU memory. Saves VRAM on smaller cards.
- `OLLAMA_FLASH_ATTENTION=1` enables Flash Attention for faster inference and lower memory usage.
- `DEFAULT_USER_ROLE=pending` means new signups require admin approval before they can use the system.
- `WEBUI_SECRET_KEY` is used to sign JWT tokens. Generate one with `openssl rand -hex 32`.
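Rather than hard-coding the secret in the compose file, you can keep it in an `.env` file next to `docker-compose.yml`, which Docker Compose reads automatically. A sketch (the only change needed in the compose file is referencing `${WEBUI_SECRET_KEY}` instead of the literal string):

```shell
# Generate a 64-character hex secret and write it to .env
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" > .env

# In docker-compose.yml, reference it instead of hard-coding:
#   - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}

# Confirm the key looks right (exactly 64 hex characters)
grep -E '^WEBUI_SECRET_KEY=[0-9a-f]{64}$' .env
```

Keep `.env` out of version control if you ever commit this directory.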
Launch the Stack
docker compose up -d
# Watch the logs
docker compose logs -f
# Verify both services are running
docker compose ps
Open WebUI should be accessible at http://your-server-ip:3000 within 30 seconds.
First Launch and Admin Setup {#first-launch}
The first user to create an account becomes the admin. Do this immediately after deployment — do not leave the signup page open on a public network.
- Navigate to http://localhost:3000 (or your server IP)
- Click Sign Up
- Enter your name, email, and password
- You are now the admin
Pull Your First Model
Open WebUI can pull models directly through the interface:
- Click the gear icon (top right) → Admin Panel
- Go to Models → Pull a Model
- Type llama3.2 and click Pull
Or pull from the command line:
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama pull qwen3:8b
docker exec -it ollama ollama pull mistral
Once models are pulled, they appear in the model selector dropdown on the chat page. You can switch between models mid-conversation.
GPU Passthrough Configuration {#gpu-passthrough}
NVIDIA GPUs
The Docker Compose file above already includes NVIDIA GPU passthrough via the deploy.resources.reservations block. Verify it is working:
# Check GPU visibility inside the Ollama container
docker exec -it ollama nvidia-smi
# Check Ollama actually loads onto the GPU: run a model, then inspect the
# PROCESSOR column of "ollama ps" while it is still loaded
docker exec -it ollama ollama run llama3.2 "Say hello" --verbose
docker exec -it ollama ollama ps
# The PROCESSOR column should show "100% GPU"
Multi-GPU Setup
If you have multiple GPUs:
# In docker-compose.yml, under ollama service:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all # Use all GPUs
capabilities: [gpu]
Or restrict to specific GPUs:
# Use only GPU 0 and GPU 1
environment:
- CUDA_VISIBLE_DEVICES=0,1
AMD GPUs (ROCm)
Replace the Ollama image with the ROCm variant:
ollama:
image: ollama/ollama:rocm
devices:
- /dev/kfd
- /dev/dri
group_add:
- video
- render
CPU-Only Setup
Remove the entire deploy block from the Ollama service. It works fine on CPU — just slower. Expect ~5-8 tokens/second for a 7B model on a modern CPU versus ~40 tokens/second on an RTX 4090.
Model Management via the UI {#model-management}
Open WebUI provides a model management interface that makes Ollama's CLI optional for day-to-day operations.
Pulling and Deleting Models
Admin Panel → Models:
- Pull: Enter a model name (e.g., codellama:34b) and click Pull. Progress shows in real time.
- Delete: Click the trash icon next to any model. This frees disk space immediately.
- Model Info: Click a model name to see parameters, size, quantization level, and template.
Creating Custom Models (Modelfiles)
You can create customized model variants directly in the UI:
- Go to Workspace → Modelfiles
- Click Create New
- Define your system prompt, temperature, and base model
Example: A code review assistant based on CodeLlama:
FROM codellama:13b
SYSTEM "You are a senior software engineer performing code review. Be direct. Point out bugs, security issues, and performance problems. Suggest specific fixes with code snippets."
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
Recommended Model Selection by VRAM
| VRAM | Recommended Models | Concurrent Users |
|---|---|---|
| 8GB | Llama 3.2 3B, Phi-3 Mini, Gemma 2B | 2-3 |
| 12GB | Llama 3.1 8B, Mistral 7B, Qwen3 8B | 3-4 |
| 24GB | Qwen3 14B, CodeLlama 13B, Mixtral 8x7B (Q4) | 4-6 |
| 48GB | Llama 3.1 70B (Q4), Command R+ (Q4) | 4-8 |
For model selection details, see the best Ollama models guide.
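The table rows follow a rough rule of thumb: a Q4-quantized model takes about 0.5 bytes per parameter, plus roughly 20% overhead for the KV cache and runtime buffers. Both constants are heuristics, not exact figures, but the arithmetic is easy to sketch:

```shell
# Back-of-envelope VRAM estimate for a Q4-quantized model
# (0.5 bytes/param and 20% overhead are rough heuristics, not exact)
vram_estimate() {
  awk -v p="$1" 'BEGIN { printf "~%.1f GB\n", p * 0.5 * 1.2 }'
}

vram_estimate 7    # 7B model  → ~4.2 GB
vram_estimate 13   # 13B model → ~7.8 GB
```

Leave headroom on top of the estimate: larger context windows (`num_ctx`) grow the KV cache well past the 20% figure.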
Nginx Reverse Proxy with TLS {#nginx-tls}
Running Open WebUI behind Nginx gives you TLS, rate limiting, and a proper domain name.
Add Nginx to Docker Compose
# Add to docker-compose.yml services:
nginx:
image: nginx:alpine
container_name: nginx-proxy
restart: unless-stopped
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
- ./certbot/conf:/etc/letsencrypt:ro
- ./certbot/www:/var/www/certbot:ro
depends_on:
- open-webui
certbot:
image: certbot/certbot
container_name: certbot
volumes:
- ./certbot/conf:/etc/letsencrypt
- ./certbot/www:/var/www/certbot
entrypoint: "/bin/sh -c 'trap exit TERM; while :; do certbot renew; sleep 12h & wait $${!}; done;'"
Nginx Configuration
# nginx.conf
events {
worker_connections 1024;
}
http {
# Rate limiting
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;
upstream openwebui {
server open-webui:8080;
}
# Redirect HTTP to HTTPS
server {
listen 80;
server_name ai.yourdomain.com;
location /.well-known/acme-challenge/ {
root /var/www/certbot;
}
location / {
return 301 https://$host$request_uri;
}
}
server {
listen 443 ssl http2;
server_name ai.yourdomain.com;
ssl_certificate /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;
ssl_protocols TLSv1.2 TLSv1.3;
# Security headers
add_header X-Frame-Options DENY;
add_header X-Content-Type-Options nosniff;
add_header Strict-Transport-Security "max-age=63072000" always;
# WebSocket support (required for streaming responses)
location / {
proxy_pass http://openwebui;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_read_timeout 600s;
proxy_send_timeout 600s;
limit_req zone=api burst=20 nodelay;
}
# Increase body size for file uploads
client_max_body_size 50M;
}
}
Obtain TLS Certificate
# First, start nginx without TLS for the ACME challenge
# Temporarily comment out the ssl_certificate lines and the 443 server block
# Then run:
docker compose up -d nginx
# Request the certificate
docker compose run --rm certbot certonly \
--webroot --webroot-path=/var/www/certbot \
--email you@example.com --agree-tos --no-eff-email \
-d ai.yourdomain.com
# Uncomment the TLS config and restart
docker compose restart nginx
User Management and Access Control {#user-management}
Open WebUI has a built-in user system. For a team deployment, configure it properly.
User Roles
| Role | Permissions |
|---|---|
| Admin | Full access: models, users, settings, system config |
| User | Chat with models, create conversations, upload documents |
| Pending | Cannot use the system until approved by admin |
Configuration
# In docker-compose.yml, open-webui environment:
environment:
- WEBUI_AUTH=true # Require login
- DEFAULT_USER_ROLE=pending # New users need approval
- ENABLE_SIGNUP=true # Allow self-registration
- ENABLE_LOGIN_FORM=true # Show email/password form
Admin Actions
In the Admin Panel → Users section:
- Approve pending users
- Change roles (promote to admin or demote)
- Disable accounts without deleting conversation history
- Set model access per user (restrict which models specific users can access)
For a small team, the built-in auth is sufficient. For enterprise deployments, Open WebUI supports OAuth2/OIDC providers (Keycloak, Authentik, Azure AD) via the OAUTH_ environment variables.
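As a sketch only, wiring Open WebUI to an OIDC provider via those variables looks roughly like this. The variable names below are taken from recent Open WebUI releases; treat them as assumptions and verify against the documentation for your version:

```yaml
# Hypothetical OIDC sketch for the open-webui service — verify variable
# names against your Open WebUI version's docs before deploying
environment:
  - ENABLE_OAUTH_SIGNUP=true
  - OAUTH_CLIENT_ID=open-webui                # client registered in your IdP
  - OAUTH_CLIENT_SECRET=your-client-secret
  - OPENID_PROVIDER_URL=https://auth.yourdomain.com/realms/main/.well-known/openid-configuration
  - OAUTH_PROVIDER_NAME=Keycloak              # label shown on the login button
```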
Production Hardening {#production}
If you are deploying this for a team or leaving it running 24/7, these configurations prevent data loss and resource exhaustion.
Automated Backups
#!/bin/bash
# backup.sh — run daily via cron
BACKUP_DIR="/backups/ollama-webui/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"
# Back up Open WebUI database
docker exec open-webui cp /app/backend/data/webui.db /tmp/webui.db
docker cp open-webui:/tmp/webui.db "$BACKUP_DIR/webui.db"
# Back up model list (not the models themselves — they are large)
docker exec ollama ollama list > "$BACKUP_DIR/model-list.txt"
# Compress (use -C so the archive stores relative paths, not absolute ones)
tar -czf "$BACKUP_DIR.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"
# Retain 30 days
find /backups/ollama-webui/ -name "*.tar.gz" -mtime +30 -delete
# Add to crontab
crontab -e
# 2 AM daily
0 2 * * * /home/user/ollama-webui/backup.sh
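The retention rule is worth dry-running against dummy files before trusting it with real backups. This sketch uses a temporary directory and GNU touch's `-d` flag, so nothing under /backups is touched:

```shell
# Simulate one stale and one fresh backup, then apply the same find rule
tmp=$(mktemp -d)
touch -d '40 days ago' "$tmp/old.tar.gz"   # older than the 30-day window
touch "$tmp/new.tar.gz"                    # fresh backup
find "$tmp" -name '*.tar.gz' -mtime +30 -delete
ls "$tmp"   # only new.tar.gz should remain
```

`-mtime +30` matches files modified more than 30 full days ago, so a 31-day-old archive survives one more day than you might expect.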
Resource Limits
# docker-compose.yml — prevent one container from eating all resources
services:
ollama:
# ... existing config ...
deploy:
resources:
limits:
memory: 32G
reservations:
memory: 8G
devices:
- driver: nvidia
count: all
capabilities: [gpu]
open-webui:
# ... existing config ...
deploy:
resources:
limits:
memory: 4G
cpus: "2.0"
Health Checks
services:
ollama:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/version"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
open-webui:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 30s
Log Rotation
# Add to each service in docker-compose.yml
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
Firewall Rules
# Only expose ports 80 and 443 publicly
# Keep 11434 (Ollama) and 3000 (Open WebUI direct) internal only
sudo ufw default deny incoming
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
# Important: Docker-published ports bypass UFW (Docker inserts its own
# iptables rules), so "ufw deny" does NOT block 11434 or 3000 here.
# Instead, bind those ports to localhost in docker-compose.yml:
#   - "127.0.0.1:11434:11434"   # ollama
#   - "127.0.0.1:3000:8080"     # open-webui
Troubleshooting {#troubleshooting}
Open WebUI cannot connect to Ollama
# Check the OLLAMA_BASE_URL environment variable
docker exec open-webui env | grep OLLAMA
# It should be: OLLAMA_BASE_URL=http://ollama:11434
# NOT http://localhost:11434 — containers use service names
# Test connectivity from Open WebUI container
docker exec open-webui curl -s http://ollama:11434/api/version
GPU not detected in container
# Verify NVIDIA runtime is configured
docker info | grep -i runtime
# Should show "nvidia" in the runtimes list
# Test GPU access
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
# If it fails, reinstall NVIDIA Container Toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Out of memory during model loading
# Check GPU memory usage
nvidia-smi
# Reduce loaded models
# In docker-compose.yml:
environment:
- OLLAMA_MAX_LOADED_MODELS=1
# Use a smaller model or a tighter quantization
docker exec ollama ollama pull llama3.2:3b
docker exec ollama ollama pull llama3.1:8b-instruct-q4_K_M
Slow responses with multiple users
# Check if requests are queuing
docker exec ollama ollama ps
# Reduce parallelism to prevent thrashing
environment:
- OLLAMA_NUM_PARALLEL=2
# Monitor GPU utilization
watch -n 1 nvidia-smi
WebSocket errors (streaming not working)
This happens when Nginx is not configured for WebSocket upgrades. Ensure your Nginx config includes:
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
Container keeps restarting
# Check logs for the failing container
docker compose logs ollama --tail 50
docker compose logs open-webui --tail 50
# Common cause: port conflict
lsof -i :11434
lsof -i :3000
# Another cause: corrupted volume
docker compose down
docker volume rm ollama-webui_openwebui_data
docker compose up -d
# Warning: this deletes all conversations and user accounts
Cost Comparison: Self-Hosted vs Cloud
For context on why this setup is worth the effort:
| | ChatGPT Team | Self-Hosted (this guide) |
|---|---|---|
| Monthly cost (8 users) | $200/mo ($25/user) | $0 (after hardware) |
| Annual cost | $2,400 | ~$80 (electricity) |
| Data privacy | OpenAI servers | Your hardware |
| Model choice | OpenAI models only | Any open model |
| Rate limits | 100 msgs/3hrs | Your GPU is the limit |
| Customization | None | Full (system prompts, fine-tuned models) |
Hardware cost for an RTX 4090 server: ~$2,500. Break-even versus ChatGPT Team at 8 users: 12.5 months. After that, it is pure savings.
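The break-even number is simply hardware cost divided by the avoided subscription spend (electricity excluded for simplicity):

```shell
# Break-even in months: hardware cost / monthly subscription avoided
hardware=2500   # RTX 4090 server, from above
monthly=200     # 8 users x $25/user on ChatGPT Team
awk -v h="$hardware" -v m="$monthly" 'BEGIN { printf "%.1f months\n", h / m }'
# → 12.5 months
```

Fewer users stretch the break-even proportionally: at 4 users it is 25 months, so team size matters as much as hardware price.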
For a detailed cost breakdown, see our local AI vs ChatGPT cost comparison. If you are evaluating other self-hosted chat interfaces, the AnythingLLM setup guide covers an alternative with built-in RAG support.
What to Build Next
Once your Ollama + Open WebUI stack is running:
- Add RAG capabilities — Open WebUI supports document upload natively. Drop PDFs into a conversation and the model answers from their content.
- Connect to IDE tools — Point Continue.dev at your Ollama instance for AI-assisted coding.
- Automate with n8n — Wire Ollama into workflows for email processing, document summarization, or Slack bots.
- Monitor usage — Open WebUI's admin panel shows per-user message counts, model usage, and response times.
Need help choosing models for this setup? See best Ollama models by use case. For Windows-specific Docker issues, check the Ollama Windows installation guide.
Go from reading about AI to building with AI
10 structured courses. Hands-on projects. Runs on your machine. Start free.