Dify Self-Hosted: Deploy Your Own AI Platform
Published on April 10, 2026 • 22 min read
Quick Start: Dify Running in 5 Minutes
Get Dify running locally with three commands:
- Clone the repo: git clone https://github.com/langgenius/dify.git && cd dify/docker
- Copy config: cp .env.example .env
- Launch everything: docker compose up -d
Open http://localhost and create your admin account. That's your private AI platform.
What you'll get from this guide:
- A fully private Dify instance on your own hardware
- Ollama connected as a zero-cost local model provider
- A working RAG pipeline that indexes your documents
- API endpoints you can call from any application
- Production hardening for team deployments
Dify has crossed 53,000 GitHub stars for a reason. It gives you a visual builder for AI workflows, built-in RAG, prompt management, and API generation -- all behind a clean web interface. The self-hosted version is free under the open-source license, with no feature restrictions compared to their cloud offering.
Most people discover Dify after hitting the walls of simpler tools. If you've outgrown basic chatbot UIs and need structured AI applications -- customer support bots, document analysis pipelines, internal knowledge bases -- Dify handles that without writing a framework from scratch.
If you're already using visual AI builders, you'll want to understand how Dify compares to Flowise and n8n for local AI workflows. For RAG specifically, our RAG local setup guide covers the fundamentals that Dify builds on top of.
Table of Contents
- What Is Dify and Why Self-Host
- Architecture Overview
- System Requirements
- Docker Compose Deployment
- Connecting Ollama as Model Provider
- Building Your First AI App
- RAG Pipeline Setup
- API Access and Integration
- Environment Configuration
- Scaling and Production Hardening
- Dify vs Flowise vs n8n
What Is Dify and Why Self-Host {#what-is-dify}
Dify is an open-source LLM application development platform. Think of it as a middle layer between raw model APIs and your end users. You design AI workflows visually, connect knowledge bases, and Dify generates API endpoints you can plug into any frontend.
Why self-host instead of using Dify Cloud:
| Factor | Dify Cloud | Self-Hosted |
|---|---|---|
| Data privacy | Data on Dify servers | Everything stays on your machine |
| Cost at scale | $59-599/month | Free (you pay for hardware) |
| Model freedom | Limited providers | Any model, including local Ollama |
| Customization | Standard features | Fork and modify anything |
| Latency | Internet round-trip | Sub-100ms on local network |
| Team size limits | Plan-dependent | Unlimited |
The self-hosted version includes every feature: workflow builder, RAG engine, agent capabilities, prompt IDE, annotation/logging, and multi-tenant workspaces. The only thing you lose is managed hosting.
Architecture Overview {#architecture-overview}
Dify's Docker Compose stack runs seven services:
┌─────────────────────────────────────────────┐
│ Nginx (port 80) │
│ Reverse proxy + static │
├──────────────────┬──────────────────────────┤
│ Web Frontend │ API Server │
│ (Next.js) │ (Flask/Python) │
├──────────────────┴──────────────────────────┤
│ Worker (Celery) │
│ Background jobs + indexing │
├─────────────┬───────────┬───────────────────┤
│ PostgreSQL │ Redis │ Weaviate │
│ (metadata) │ (cache) │ (vector store) │
└─────────────┴───────────┴───────────────────┘
│
▼
Ollama (external, port 11434)
- Nginx handles routing and serves the frontend
- API server manages all business logic, model routing, and app execution
- Worker processes async tasks: document indexing, dataset embedding, scheduled runs
- PostgreSQL stores users, apps, conversations, datasets metadata
- Redis handles caching, rate limiting, and Celery task queues
- Weaviate is the default vector database for RAG embeddings
The entire stack consumes about 2.5GB RAM at idle. During active RAG indexing, expect 4-6GB.
System Requirements {#system-requirements}
Minimum Hardware
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8GB (Dify only) | 16GB+ (Dify + Ollama) |
| Storage | 20GB SSD | 100GB+ SSD |
| OS | Linux, macOS, or WSL2 | Ubuntu 22.04+ |
| Docker | v20.10+ | Latest stable |
| Docker Compose | v2.0+ | v2.20+ |
If you're running Ollama on the same machine, add the model's memory requirements on top. A 7B model needs roughly 4.5GB, so 16GB total RAM is the practical minimum for Dify + a usable model.
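The sizing arithmetic above can be turned into a quick rule-of-thumb calculator. This is a rough sketch: the 0.5 bytes-per-weight figure assumes 4-bit quantization (Ollama's default), and the 20% overhead for KV cache and runtime buffers is an estimate, not a measured value.

```python
def estimate_model_ram_gb(params_billions, bytes_per_weight=0.5, overhead=1.2):
    """Rule-of-thumb RAM estimate for a local model.

    bytes_per_weight: ~0.5 for 4-bit quantization (Ollama's default),
    2.0 for fp16. overhead approximates KV cache and runtime buffers.
    These factors are estimates, not measured values.
    """
    return params_billions * bytes_per_weight * overhead

print(round(estimate_model_ram_gb(7), 1))   # 4.2 -- in line with the ~4.5GB above
print(round(estimate_model_ram_gb(7, bytes_per_weight=2.0), 1))  # 16.8 at fp16
```

Add that to Dify's own footprint to decide whether 16GB is enough for your model choice.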
Software Prerequisites
# Check Docker version
docker --version # Need 20.10+
# Check Docker Compose
docker compose version # Need v2.0+
# Check available RAM
free -h # Linux
vm_stat # macOS
# Check disk space
df -h / # Need 20GB+ free
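If you want to script these preflight checks, the version comparison is the only fiddly part. The helper below is a sketch: it only extracts the first major.minor pair from a tool's `--version` output.

```python
import re

def version_ok(version_output, minimum):
    """Check a string like 'Docker version 24.0.7, build afdd53b'
    against a (major, minor) minimum. Only the first major.minor
    pair found in the string is considered."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum

print(version_ok("Docker version 24.0.7, build afdd53b", (20, 10)))   # True
print(version_ok("Docker version 19.03.8, build afacb8b", (20, 10)))  # False
```

In practice you would feed it the captured output of `docker --version` and `docker compose version`.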
Docker Compose Deployment {#docker-compose-deployment}
Step 1: Clone the Repository
git clone https://github.com/langgenius/dify.git
cd dify/docker
Step 2: Configure Environment Variables
cp .env.example .env
Open .env and set these critical values:
# Security - change these immediately
# Note: .env files are read literally, so $(...) will NOT expand.
# Generate values first (e.g. openssl rand -hex 32) and paste them in.
SECRET_KEY=paste-64-char-hex-string-here
INIT_PASSWORD=your-secure-admin-password
# Database
DB_USERNAME=postgres
DB_PASSWORD=paste-random-db-password-here
DB_HOST=db
DB_PORT=5432
DB_DATABASE=dify
# Redis
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_PASSWORD=paste-random-redis-password-here
# Storage (local filesystem for self-hosted)
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=/app/api/storage
# Vector store
VECTOR_STORE=weaviate
WEAVIATE_ENDPOINT=http://weaviate:8080
Step 3: Launch the Stack
docker compose up -d
First launch pulls about 4GB of images. Subsequent starts take under 30 seconds.
Step 4: Verify All Services
# Check all containers are running
docker compose ps
# Expected output:
# dify-api running 0.0.0.0:5001->5001/tcp
# dify-web running 0.0.0.0:3000->3000/tcp
# dify-worker running
# dify-db running 0.0.0.0:5432->5432/tcp
# dify-redis running 0.0.0.0:6379->6379/tcp
# dify-weaviate running 0.0.0.0:8080->8080/tcp
# dify-nginx running 0.0.0.0:80->80/tcp
# Check logs if something fails
docker compose logs api --tail 50
docker compose logs worker --tail 50
Step 5: Create Admin Account
Open http://localhost in your browser. You'll see the setup wizard. Create your admin account with the password you set in INIT_PASSWORD.
Connecting Ollama as Model Provider {#connecting-ollama}
This is where Dify gets interesting for local AI. Instead of paying for OpenAI or Anthropic API calls, you point Dify at your Ollama instance.
Step 1: Make Sure Ollama Is Accessible
If Ollama runs on the same machine as Dify, Docker containers need to reach it via the host network:
# Start Ollama with external access
OLLAMA_HOST=0.0.0.0 ollama serve
# Verify from the host
curl http://localhost:11434/api/tags
# Verify from inside a Dify container (macOS/Windows Docker Desktop)
docker compose exec api curl http://host.docker.internal:11434/api/tags
# On Linux, use your machine's IP instead:
curl http://192.168.1.100:11434/api/tags
Step 2: Pull Models You Need
# Chat model
ollama pull llama3.2:8b
# Embedding model (critical for RAG)
ollama pull nomic-embed-text
# Code model (optional)
ollama pull qwen2.5-coder:7b
Step 3: Add Ollama in Dify Settings
- Go to Settings > Model Providers
- Find Ollama in the list and click Setup
- Enter the base URL:
  - macOS/Windows: http://host.docker.internal:11434
  - Linux: http://YOUR_MACHINE_IP:11434
- Click Save
Dify auto-discovers all models available in your Ollama instance. You'll see them listed immediately.
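Auto-discovery works because Ollama exposes its installed models at the `/api/tags` endpoint. The sketch below shows the shape of that response and how you might list model names yourself; the sample payload is trimmed to just the `name` field (a real response also includes size, digest, and modification time per model).

```python
import json

def list_ollama_models(tags_body):
    """Extract model names from an Ollama GET /api/tags response body."""
    payload = json.loads(tags_body)
    return [m["name"] for m in payload.get("models", [])]

# Sample response body, trimmed to the one field we read
sample = '{"models": [{"name": "llama3.2:8b"}, {"name": "nomic-embed-text:latest"}]}'
print(list_ollama_models(sample))  # ['llama3.2:8b', 'nomic-embed-text:latest']
```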
Step 4: Set Default Models
Go to Settings > Model Providers > System Model Settings and configure:
- Default Chat Model: llama3.2:8b
- Default Embedding Model: nomic-embed-text
- Default Reranking Model: (leave empty unless you have one)
Now every new app you create uses your local models by default. Zero API costs.
Docker Network Configuration (Linux)
On Linux, host.docker.internal may not resolve. Add this to your docker-compose.yml:
services:
api:
extra_hosts:
- "host.docker.internal:host-gateway"
worker:
extra_hosts:
- "host.docker.internal:host-gateway"
Then restart: docker compose up -d
Building Your First AI App {#first-ai-app}
Dify supports four app types. Here's when to use each:
| App Type | Use Case | Example |
|---|---|---|
| Chat | Conversational interface | Customer support bot |
| Completion | Single input/output | Email writer, summarizer |
| Workflow | Multi-step logic | Document processor pipeline |
| Agent | Tool-calling autonomous | Research assistant with web search |
Create a Knowledge Base Chat App
This is the most common use case -- a chatbot that answers questions from your documents.
- Click Create App > Chat App
- Name it "Internal Knowledge Base"
- In the prompt editor, set the system prompt:
You are a helpful assistant that answers questions based on the provided context.
If the context doesn't contain the answer, say so honestly.
Always cite which document your answer comes from.
Do not make up information.
- Under Context, click Add Dataset (we'll create this next)
- Set Model to your Ollama llama3.2:8b
- Set temperature to 0.3 for factual responses
- Click Publish
You now have a working chatbot with an API endpoint. The generated URL looks like:
POST http://localhost/v1/chat-messages
Authorization: Bearer app-xxxxxxxxxxxxxxxx
RAG Pipeline Setup {#rag-pipeline}
RAG (Retrieval-Augmented Generation) is where Dify's self-hosted value really shines. Your documents never leave your server.
Step 1: Create a Dataset
- Go to Knowledge > Create Dataset
- Name it and choose Import from file
- Supported formats: PDF, DOCX, TXT, Markdown, CSV, HTML, XLSX
- Upload your documents
Step 2: Configure Chunking
Dify offers two chunking modes:
Automatic -- Good for most documents:
- Splits by paragraphs and sections
- Chunk size: 500-1000 tokens (default)
- Overlap: 50 tokens
Custom -- Better for structured data:
Chunk size: 800 tokens
Overlap: 100 tokens
Separator: \n\n (double newline)
For technical documentation, I recommend 800-token chunks with 100-token overlap. Smaller chunks give more precise retrieval but lose context. Larger chunks retain context but dilute relevance scoring.
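The trade-off is easy to see in a sliding-window chunker. This sketch counts words as a stand-in for tokens (Dify's splitter is token-based), but the overlap mechanics are the same: each chunk shares its tail with the next chunk's head.

```python
def chunk_words(words, chunk_size=800, overlap=100):
    """Sliding-window chunking: each chunk shares `overlap` words with
    the previous one so context isn't cut mid-thought. Word counts
    stand in for tokens here; Dify counts real tokens."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break
    return chunks

# Tiny numbers to make the overlap visible
print(chunk_words(list(range(10)), chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

Shrinking `chunk_size` yields more, tighter chunks (more precise retrieval, less context each); growing it does the reverse.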
Step 3: Select Embedding Model
Choose nomic-embed-text from your Ollama models. This 137M parameter model produces 768-dimensional embeddings and handles up to 8192 tokens per chunk. It outperforms OpenAI's text-embedding-ada-002 on most retrieval benchmarks while running entirely on your hardware.
Step 4: Index and Test
Click Save and Process. Dify's worker container handles indexing in the background. For a 100-page PDF, expect 2-5 minutes on a 4-core machine.
Test retrieval quality:
- Go to your dataset
- Click Hit Testing
- Enter a question
- Verify the returned chunks contain relevant information
Step 5: Connect Dataset to Your App
Back in your chat app:
- Click Context > Add Dataset
- Select your dataset
- Configure retrieval settings:
- Top K: 3 (number of chunks retrieved)
- Score threshold: 0.5 (minimum relevance)
- Reranking: Enable if you have a reranking model
Test with a question about your documents. The model now answers using your private data.
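Conceptually, Top K and the score threshold combine like this. The function below is an illustrative sketch of the settings above, not Dify's actual implementation.

```python
def retrieve(scored_chunks, top_k=3, score_threshold=0.5):
    """Drop chunks below the relevance threshold, then keep the
    top_k highest-scoring ones.

    scored_chunks: list of (score, text) pairs."""
    kept = [c for c in scored_chunks if c[0] >= score_threshold]
    kept.sort(key=lambda c: c[0], reverse=True)
    return kept[:top_k]

hits = [(0.82, "refund policy"), (0.41, "office hours"),
        (0.66, "returns window"), (0.58, "warranty terms"),
        (0.91, "refunds intro")]
print(retrieve(hits))
# [(0.91, 'refunds intro'), (0.82, 'refund policy'), (0.66, 'returns window')]
```

Raising the threshold trims weakly related chunks at the cost of sometimes returning fewer than Top K results.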
For deeper RAG fundamentals and optimization techniques, check our complete RAG local setup guide.
API Access and Integration {#api-access}
Every Dify app automatically gets a REST API. This is one of Dify's strongest features -- you build the AI logic visually, then consume it as a standard API.
Get Your API Key
- Open your app
- Click Access API in the left sidebar
- Copy the API key
Chat Completion API
curl -X POST 'http://localhost/v1/chat-messages' \
-H 'Authorization: Bearer app-your-api-key' \
-H 'Content-Type: application/json' \
-d '{
"inputs": {},
"query": "What is our refund policy?",
"response_mode": "streaming",
"conversation_id": "",
"user": "user-123"
}'
Streaming Response
import requests
response = requests.post(
"http://localhost/v1/chat-messages",
headers={
"Authorization": "Bearer app-your-api-key",
"Content-Type": "application/json"
},
json={
"inputs": {},
"query": "Summarize Q4 results",
"response_mode": "streaming",
"user": "user-123"
},
stream=True
)
for line in response.iter_lines():
if line:
print(line.decode())
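Each streamed line is a server-sent event of the form `data: {...}`, where `message` events carry incremental `answer` fragments. The parser below sketches how to reassemble them into the full reply; the field names follow Dify's documented streaming format, so verify them against your version.

```python
import json

def extract_answer(sse_lines):
    """Reassemble answer text from Dify-style streaming lines.

    'message' events carry incremental 'answer' fragments; other
    events (e.g. 'message_end') are skipped. Field names assume
    Dify's streaming format -- check them against your version."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("event") == "message":
            parts.append(event.get("answer", ""))
    return "".join(parts)

sample = [
    'data: {"event": "message", "answer": "Q4 revenue "}',
    'data: {"event": "message", "answer": "grew 12%."}',
    'data: {"event": "message_end", "metadata": {}}',
]
print(extract_answer(sample))  # Q4 revenue grew 12%.
```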
Embedding in a Web App
// Next.js / React example
const response = await fetch('http://your-dify-host/v1/chat-messages', {
method: 'POST',
headers: {
'Authorization': 'Bearer app-your-api-key',
'Content-Type': 'application/json',
},
body: JSON.stringify({
inputs: {},
query: userMessage,
response_mode: 'streaming',
conversation_id: conversationId,
user: userId,
}),
})
Dify also provides a built-in embeddable chat widget. Under Access API > Embedded, you get an iframe snippet you can drop into any webpage.
Environment Configuration {#environment-config}
Critical Environment Variables
# .env file reference
# ---- Security ----
SECRET_KEY=your-64-char-hex-string
INIT_PASSWORD=admin-password-change-this
# ---- Performance ----
# Worker concurrency (default 1, increase for heavy indexing)
CELERY_WORKER_CONCURRENCY=4
# API request timeout (seconds)
API_TIMEOUT=300
# Max upload size (MB)
UPLOAD_FILE_SIZE_LIMIT=50
# ---- Model Defaults ----
DEFAULT_LLM_PROVIDER=ollama
DEFAULT_LLM_MODEL=llama3.2:8b
# ---- Logging ----
LOG_LEVEL=INFO
LOG_FILE=/app/api/logs/dify.log
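Before first launch it's worth checking that no required secret was left blank. A minimal .env sanity check might look like this; the required-key list here is an assumption you should adapt to your deployment.

```python
REQUIRED_KEYS = ("SECRET_KEY", "INIT_PASSWORD", "DB_PASSWORD", "REDIS_PASSWORD")

def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return required keys that are absent or left empty in a
    .env file body. Comments and blank lines are ignored."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return sorted(k for k in required if not values.get(k))

sample = "SECRET_KEY=abc123\nINIT_PASSWORD=\n# DB_PASSWORD=commented-out\n"
print(missing_env_keys(sample))
# ['DB_PASSWORD', 'INIT_PASSWORD', 'REDIS_PASSWORD']
```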
Persistent Storage
Docker volumes ensure your data survives container restarts:
volumes:
db_data: # PostgreSQL data
redis_data: # Redis cache
weaviate_data: # Vector embeddings
storage_data: # Uploaded files
Backup Strategy
# Backup PostgreSQL
docker compose exec db pg_dump -U postgres dify > backup_$(date +%Y%m%d).sql
# Backup volumes
docker run --rm -v dify_db_data:/data -v $(pwd):/backup \
alpine tar czf /backup/db_data_backup.tar.gz /data
# Restore PostgreSQL
cat backup_20260410.sql | docker compose exec -T db psql -U postgres dify
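If you schedule that pg_dump via cron, you'll also want to prune old dumps. A retention sketch, assuming the `backup_YYYYMMDD.sql` naming used above:

```python
from datetime import datetime

def backups_to_delete(filenames, keep=7):
    """Given files named backup_YYYYMMDD.sql, return the ones to
    delete so only the `keep` most recent remain."""
    dated = sorted(
        ((datetime.strptime(n[len("backup_"):-len(".sql")], "%Y%m%d"), n)
         for n in filenames),
        reverse=True,  # newest first
    )
    return [name for _, name in dated[keep:]]

files = ["backup_20260401.sql", "backup_20260408.sql", "backup_20260325.sql"]
print(backups_to_delete(files, keep=2))  # ['backup_20260325.sql']
```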
Scaling and Production Hardening {#scaling}
Running Behind a Reverse Proxy
For production, put Dify behind Nginx or Caddy with HTTPS:
# /etc/nginx/sites-available/dify
server {
listen 443 ssl http2;
server_name dify.yourdomain.com;
ssl_certificate /etc/letsencrypt/live/dify.yourdomain.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/dify.yourdomain.com/privkey.pem;
client_max_body_size 50M;
location / {
proxy_pass http://127.0.0.1:80;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support for streaming
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
}
Resource Limits
Add resource constraints in docker-compose.override.yml:
services:
api:
deploy:
resources:
limits:
memory: 2G
cpus: '2.0'
worker:
deploy:
resources:
limits:
memory: 4G
cpus: '4.0'
db:
deploy:
resources:
limits:
memory: 1G
Monitoring
# Watch container resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# Check API health
curl -s http://localhost/health | jq .
# Monitor worker queue depth
docker compose exec redis redis-cli llen celery
Dify vs Flowise vs n8n {#comparison}
All three are excellent self-hosted tools, but they solve different problems:
| Feature | Dify | Flowise | n8n |
|---|---|---|---|
| Primary focus | LLM app platform | LLM flow builder | General automation |
| RAG built-in | Full pipeline | Basic | Via plugins |
| Visual builder | Workflow + prompt IDE | Flow-based | Node-based |
| API generation | Automatic per app | Per chatflow | Per workflow |
| Model providers | 30+ including Ollama | 20+ including Ollama | Via LangChain nodes |
| Multi-user | Built-in workspaces | Basic auth | Role-based |
| Dataset management | Full UI with chunking | Manual setup | Not built-in |
| GitHub stars | 53K+ | 34K+ | 51K+ |
| Best for | Production AI apps | Prototyping chatbots | Business automation |
Pick Dify when you need a production-ready AI application platform with proper dataset management, user workspaces, and auto-generated APIs. It's the most complete package for teams building customer-facing AI products.
Pick Flowise when you want fast prototyping of LangChain-based chatflows without writing code. It's simpler and faster to get started with.
Pick n8n when AI is one part of a larger automation workflow that includes email, databases, webhooks, and third-party services. Read our n8n + Ollama automation guide for that setup.
For the Ollama setup fundamentals that all three tools rely on, see our complete Ollama guide.
Updating Dify
cd dify/docker
# Pull latest images
docker compose pull
# Restart with new images
docker compose down
docker compose up -d
# Check version
docker compose exec api python -c "import app; print(app.__version__)"
Dify follows semantic versioning. Minor updates are safe. Major version bumps may require database migrations -- always check the GitHub release notes before upgrading.
Troubleshooting Common Issues
Weaviate Won't Start
# Check logs
docker compose logs weaviate
# Common fix: increase virtual memory
sudo sysctl -w vm.max_map_count=262144
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
Ollama Connection Refused
# Verify Ollama is listening on all interfaces
curl http://localhost:11434/api/tags
# On Linux, check firewall
sudo ufw status
sudo ufw allow 11434
# Test from inside Docker
docker compose exec api curl http://host.docker.internal:11434/api/tags
Slow Document Indexing
# Increase worker concurrency in .env
CELERY_WORKER_CONCURRENCY=4
# Check worker logs
docker compose logs worker --tail 100 -f
# Monitor embedding throughput
docker compose exec redis redis-cli info | grep instantaneous_ops_per_sec
Conclusion
Self-hosting Dify gives you a production-grade AI platform without recurring API costs or data privacy concerns. The combination of Dify's visual builder with Ollama's local model serving eliminates the two biggest barriers to deploying AI applications: cost and data sovereignty.
Start with a simple chat app connected to one dataset. Once that works, build a workflow app that chains multiple steps -- retrieval, summarization, and structured output. That's where Dify's architecture pays off versus simpler tools.
The official Dify documentation covers advanced topics like custom tool integration, SSO configuration, and multi-tenant isolation.
Running local AI models for the first time? Start with our complete Ollama guide to get your model server running, then come back here to build applications on top of it.