n8n + Ollama: Self-Hosted AI Automation Guide
n8n + Ollama: Build AI Automation Without Paying for APIs
Published on April 10, 2026 — 20 min read
Most AI automation advice starts with "connect to the OpenAI API" and ends with a $500/month bill. There is a better path: run n8n (an open-source workflow engine) alongside Ollama (a local model runtime) on the same machine, and you get unlimited AI automation for the cost of electricity.
I replaced a Zapier Pro + OpenAI API setup that cost $170/month with n8n + Ollama running on a $300 used workstation. Same workflows. Same results. Zero recurring cost. The switch took an afternoon.
This guide shows you how to set up the stack and build three real workflows that handle actual work.
What you will build:
- n8n + Ollama running together in Docker
- Workflow 1: Automatic email summarizer (IMAP trigger → Ollama → Slack)
- Workflow 2: Document processor (webhook → file parse → Ollama → database)
- Workflow 3: Customer support chatbot (webhook → Ollama with context → response)
- Cost comparison and migration strategy from cloud tools
Prerequisites:
- A machine with Docker installed (Linux, macOS, or Windows with WSL2)
- 8GB+ RAM (16GB recommended for running 7B models alongside n8n)
- Basic understanding of REST APIs and JSON
For local AI model setup, see the free local AI models guide. For AI agent patterns that pair well with n8n, check the AI agents local guide.
Table of Contents
- What is n8n and Why Pair It with Ollama
- Docker Setup for n8n + Ollama
- Connecting n8n to Ollama
- Workflow 1: Email Summarizer
- Workflow 2: Document Processor
- Workflow 3: Support Chatbot
- Triggers, Scheduling, and Webhooks
- Cost Comparison: n8n + Ollama vs Cloud
- Performance and Limitations
- Troubleshooting
What is n8n and Why Pair It with Ollama {#what-is-n8n}
n8n is an open-source workflow automation platform. Think Zapier or Make.com, but you host it yourself and there are no per-execution limits. It has a visual editor where you drag and drop nodes — triggers, actions, conditions, loops — to build automation workflows without writing code.
n8n ships with 400+ integrations: Gmail, Slack, PostgreSQL, HTTP webhooks, cron schedules, Google Sheets, Notion, and hundreds more. What makes it powerful for AI automation is its native Ollama node, added in n8n v1.25.
Why this combination works:
| Problem with cloud AI automation | n8n + Ollama solution |
|---|---|
| OpenAI API costs $0.01-0.06 per 1K tokens | Local models: $0 per token |
| Zapier Pro: $49-89/mo for 2,000-5,000 tasks | n8n self-hosted: unlimited tasks |
| Data sent to third-party servers | All data stays on your machine |
| Rate limits during peak usage | No rate limits except your hardware |
| API key management and rotation | No API keys needed |
The tradeoff: local models are slower than GPT-4o and less capable at complex reasoning. For 80% of automation tasks — summarization, classification, extraction, reformatting — a local 7B or 13B model handles it fine. The 20% where you genuinely need GPT-4-level intelligence can still use a cloud API through n8n's OpenAI node.
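That 80/20 split can be implemented directly in a workflow: a Code node tags each item with a target model, and an IF node branches to the Ollama node or the OpenAI node. The task categories and the `routeTask` helper below are illustrative sketches, not part of n8n:

```javascript
// Sketch: decide per item whether a local model is enough or a cloud
// model is warranted. The task types here are illustrative.
const LOCAL_TASKS = new Set(['summarize', 'classify', 'extract', 'reformat']);

function routeTask(task) {
  // Default to the cloud only for task types a 7B-13B model handles poorly.
  return LOCAL_TASKS.has(task.type) ? 'local' : 'cloud';
}

// In an n8n Code node you would return one item per input, tagged with
// the route, then branch on {{ $json.route }} in an IF node:
const items = [
  { type: 'summarize', text: 'Long email...' },
  { type: 'multi_step_reasoning', text: 'Plan a migration...' },
];
const routed = items.map(t => ({ json: { ...t, route: routeTask(t) } }));
console.log(routed.map(i => i.json.route)); // [ 'local', 'cloud' ]
```

The point is that "local by default, cloud by exception" becomes a one-node decision rather than a separate workflow.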
Docker Setup for n8n + Ollama {#docker-setup}
Docker Compose File
mkdir -p ~/n8n-ollama && cd ~/n8n-ollama
# docker-compose.yml
version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_FLASH_ATTENTION=1

  n8n:
    image: docker.n8n.io/n8nio/n8n:latest
    container_name: n8n
    restart: unless-stopped
    ports:
      - "5678:5678"
    volumes:
      - n8n_data:/home/node/.n8n
    environment:
      - N8N_HOST=0.0.0.0
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=http://localhost:5678
      - N8N_DIAGNOSTICS_ENABLED=false
      - N8N_HIRING_BANNER_ENABLED=false
    depends_on:
      - ollama

volumes:
  ollama_data:
  n8n_data:
Launch and Pull Models
# Start both services
docker compose up -d
# Wait 10 seconds for Ollama to initialize, then pull models
docker exec ollama ollama pull llama3.2
docker exec ollama ollama pull qwen3:8b
# Verify both are running
docker compose ps
# n8n is at http://localhost:5678
# Ollama API is at http://localhost:11434
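n8n's Ollama node talks to the same REST API you can also call yourself, for example from an HTTP Request node. The endpoint and fields below (POST /api/generate with model, prompt, stream) come from Ollama's API; the helper function name is my own:

```javascript
// Build the request an HTTP Request node (or curl) would send to Ollama.
function buildGenerateRequest(baseUrl, model, prompt) {
  return {
    url: `${baseUrl}/api/generate`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false makes Ollama return one JSON object instead of
    // newline-delimited chunks, which is easier to handle in a workflow.
    body: JSON.stringify({ model, prompt, stream: false }),
  };
}

const req = buildGenerateRequest('http://ollama:11434', 'llama3.2', 'Say hello');
console.log(req.url); // http://ollama:11434/api/generate
```

Knowing the raw request shape also helps when debugging: you can replay the exact same call with curl from inside the n8n container.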
CPU-Only Setup
If you do not have an NVIDIA GPU, remove the deploy block from the Ollama service. Pull a smaller model:
docker exec ollama ollama pull phi3:mini
docker exec ollama ollama pull gemma:2b
Expect ~5-10 tokens/second on CPU for a 3B model. That is fast enough for background automation tasks where response time is not critical.
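To judge whether CPU speed is acceptable for your use case, the arithmetic is simple: generation time is roughly output tokens divided by tokens per second (ignoring prompt processing, which adds a bit more). A quick sketch:

```javascript
// Back-of-envelope latency estimate for generation time only.
function secondsPerItem(outputTokens, tokensPerSecond) {
  return outputTokens / tokensPerSecond;
}

// A ~100-token email summary at 7 tok/s on CPU:
console.log(secondsPerItem(100, 7).toFixed(1)); // 14.3
```

Fifteen seconds per email is fine for a background summarizer; it would be painful for an interactive chatbot.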
Connecting n8n to Ollama {#connecting}
Step 1: Create Ollama Credentials in n8n
- Open n8n at http://localhost:5678
- Create your admin account (first-time setup)
- Go to Settings → Credentials → Add Credential
- Search for Ollama
- Set the Base URL to http://ollama:11434, using ollama (the Docker service name), not localhost
- Click Save
Step 2: Test the Connection
Create a quick test workflow:
- Click Add Workflow → Add first step
- Add a Manual Trigger node (just to test)
- Add an Ollama Chat Model node
- Configure it:
  - Credential: select the Ollama credential you created
  - Model: llama3.2
- Add a Basic LLM Chain node
- Connect: Manual Trigger → Basic LLM Chain (with Ollama Chat Model as the AI model)
- Set the prompt to: Summarize in one sentence: The quick brown fox jumped over the lazy dog.
- Click Test Workflow
If you get a response, the connection works. If not, check the troubleshooting section.
Workflow 1: Email Summarizer {#email-summarizer}
This workflow checks your inbox every 5 minutes, summarizes new emails with Ollama, and posts the summaries to Slack. Actual time savings: ~15 minutes per day if you get 30+ emails.
Workflow Structure
[IMAP Trigger] → [Filter: skip newsletters] → [Ollama: summarize] → [Slack: post to channel]
Node Configuration
1. IMAP Email Trigger
- Mailbox: INBOX
- Poll interval: 5 minutes
- Credential: Your email (Gmail, Outlook, or any IMAP server)
2. IF Node (Filter)
- Condition: {{ $json.from }} does not contain "newsletter" AND does not contain "noreply"
- This skips marketing emails and only processes real messages
3. Ollama Chat Model + Basic LLM Chain
- Model: llama3.2 (fast enough for summarization)
- System prompt:
You are an email summarizer. For each email, produce:
1. SENDER: who sent it
2. URGENCY: high/medium/low
3. SUMMARY: 2-3 sentences max
4. ACTION NEEDED: yes/no, and what action
Be concise. No filler.
- User prompt:
Summarize this email:\nFrom: {{ $json.from }}\nSubject: {{ $json.subject }}\nBody: {{ $json.text.substring(0, 3000) }}
4. Slack Node
- Channel: #email-summaries
- Message:
*{{ $('IMAP').item.json.subject }}*\n{{ $json.text }}
Performance
With Llama 3.2 3B on an RTX 3060 (12GB), each email takes 3-8 seconds to summarize. A batch of 10 emails processes in under a minute. On CPU, expect 15-30 seconds per email.
Workflow 2: Document Processor {#document-processor}
This workflow accepts PDF uploads via webhook, extracts text, chunks it, sends each chunk to Ollama for analysis, and stores structured output in a database.
Workflow Structure
[Webhook: POST /process] → [Extract PDF text] → [Split into chunks] → [Ollama: extract data] → [PostgreSQL: insert]
Node Configuration
1. Webhook Node
- Method: POST
- Path: /process
- Response mode: Last node (returns result to caller)
2. Extract from File Node
- Operation: Extract text from PDF
- Input: Binary data from webhook
3. Code Node (Text Splitter)
// Split the extracted text into overlapping chunks for the model.
const text = $input.first().json.data;
const chunkSize = 2000;
const overlap = 200;
const chunks = [];
for (let i = 0; i < text.length; i += chunkSize - overlap) {
  chunks.push({
    json: {
      chunk: text.substring(i, i + chunkSize),
      index: chunks.length,
      total: Math.ceil(text.length / (chunkSize - overlap))
    }
  });
}
return chunks;
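You can sanity-check the chunking logic outside n8n with plain Node. This standalone version mirrors the Code node's loop; the sample sizes are arbitrary:

```javascript
// Same chunking logic as the n8n Code node, runnable standalone.
function splitText(text, chunkSize = 2000, overlap = 200) {
  const chunks = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.substring(i, i + chunkSize));
  }
  return chunks;
}

const sample = 'x'.repeat(5000);
const chunks = splitText(sample);
console.log(chunks.length);    // 3 chunks for 5,000 chars at 2000/200
console.log(chunks[1].length); // 2000
```

Each chunk starts 1,800 characters after the previous one, so every boundary sentence appears in two chunks and is never cut in half for the model.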
4. Ollama Chat Model + Basic LLM Chain
- Model: qwen3:8b (good at structured extraction)
- System prompt:
Extract structured data from this document chunk. Return JSON:
{
  "entities": ["list of people, companies, products mentioned"],
  "dates": ["any dates found"],
  "amounts": ["any monetary amounts"],
  "key_facts": ["2-3 important facts"],
  "category": "one of: legal, financial, technical, correspondence, other"
}
Return ONLY valid JSON. No explanation.
- User prompt:
{{ $json.chunk }}
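Local models occasionally wrap their output in a code fence or add commentary despite the "ONLY valid JSON" instruction. A defensive Code node between the LLM chain and the PostgreSQL insert can salvage most of these responses. This is a sketch of my own; the parse_error fallback shape is a convention, not an n8n feature:

```javascript
// Parse the model's output defensively before the database insert.
function parseModelJson(raw) {
  // Strip markdown code fences the model may add despite instructions.
  const cleaned = raw.replace(new RegExp('`{3}(?:json)?', 'g'), '').trim();
  // Grab the first {...} block in case the model added commentary.
  const match = cleaned.match(/\{[\s\S]*\}/);
  if (!match) return { parse_error: true, raw };
  try {
    return JSON.parse(match[0]);
  } catch (e) {
    return { parse_error: true, raw };
  }
}

// A fenced response still parses cleanly:
const fenced = '`'.repeat(3) + 'json\n{"category": "legal", "dates": []}\n' + '`'.repeat(3);
const ok = parseModelJson(fenced);
console.log(ok.category); // legal
```

Rows flagged with parse_error can be routed to a retry branch or logged for review instead of corrupting the table.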
5. PostgreSQL Node
- Operation: Insert
- Table: document_extractions
- Columns: chunk_index, entities, dates, amounts, key_facts, category, processed_at
Triggering the Workflow
# Upload a PDF for processing
curl -X POST http://localhost:5678/webhook/process \
-F "file=@contract.pdf"
Workflow 3: Support Chatbot {#support-chatbot}
A webhook-based chatbot that answers questions using your documentation. This is a lightweight RAG setup without a vector database — suitable for small knowledge bases (under 50 pages).
Workflow Structure
[Webhook: POST /chat] → [Load context docs] → [Build prompt] → [Ollama: answer] → [Respond to webhook]
Node Configuration
1. Webhook Node
- Path: /chat
- Method: POST
- Expected body:
{ "question": "How do I reset my password?" }
2. Read Binary Files Node
- Read from: /home/node/.n8n/knowledge-base/
- Pattern: *.txt
- This loads your documentation files as context
3. Code Node (Build Prompt)
const question = $('Webhook').first().json.body.question;
const docs = $input.all().map(item => item.json.data).join('\n---\n');
return [{
  json: {
    // Note the ${...} template-literal interpolation syntax.
    prompt: `Answer the user's question using ONLY the context below. If the context doesn't contain the answer, say "I don't have information about that."
CONTEXT:
${docs.substring(0, 6000)}
QUESTION: ${question}
ANSWER:`
  }
}];
4. Ollama Chat Model + Basic LLM Chain
- Model: llama3.2
- Temperature: 0.3 (lower = more factual, less creative)
- User prompt:
{{ $json.prompt }}
5. Respond to Webhook Node
- Response body:
{ "answer": "{{ $json.text }}" }
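Pasting {{ $json.text }} into a JSON string breaks if the answer contains quotes or newlines, since nothing escapes them. A safer pattern is to build the response as an object in a Code node and let serialization handle escaping; the arrangement below is a suggestion, not the only way:

```javascript
// Build the response as an object; serializing it escapes quotes and
// newlines correctly, with no manual string templating.
function buildAnswer(text) {
  return { answer: text };
}

const body = JSON.stringify(buildAnswer('Our hours are 9-5.\nSay "hi" anytime.'));
// The awkward characters round-trip intact:
console.log(JSON.parse(body).answer.includes('"hi"')); // true
```

In n8n, returning the object from a Code node and pointing Respond to Webhook at it achieves the same thing.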
Testing
# Ask a question
curl -X POST http://localhost:5678/webhook/chat \
-H "Content-Type: application/json" \
-d '{"question": "What are your business hours?"}'
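As the knowledge base grows past what fits in the context window, concatenating every file stops working. Before reaching for a vector database, a simple keyword-overlap score in the prompt-building Code node can select only the most relevant documents. This scoring scheme is a naive sketch of my own, not an n8n feature:

```javascript
// Score each document by how many of the question's words it contains,
// then keep only the top-k for the prompt context.
function topKDocs(question, docs, k = 3) {
  const words = question.toLowerCase().match(/[a-z]{3,}/g) || [];
  return docs
    .map(doc => ({
      doc,
      score: words.filter(w => doc.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(s => s.doc);
}

const docs = [
  'Password resets are handled on the account page.',
  'Our office is in Berlin.',
  'Billing questions go to billing@example.com.',
];
const top = topKDocs('How do I reset my password?', docs, 1);
console.log(top[0]); // the password-reset document
```

This keeps the prompt small and usually survives to a few hundred pages before real embedding-based retrieval becomes necessary.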
For a more sophisticated RAG setup with vector search and embedding, see the RAG local setup guide.
Triggers, Scheduling, and Webhooks {#triggers}
n8n supports multiple ways to start a workflow:
Cron / Schedule Trigger
Every 5 minutes: */5 * * * *
Every hour: 0 * * * *
Daily at 9 AM: 0 9 * * *
Weekdays at 8 AM: 0 8 * * 1-5
Webhook Trigger
Any external service can POST to your n8n webhook URL. Useful for:
- GitHub push events → AI code review
- Stripe payment events → AI receipt generation
- Form submissions → AI classification and routing
App-Specific Triggers
n8n has native triggers for:
- Gmail / IMAP: New email received
- Slack: New message in channel
- GitHub: Pull request opened, issue created
- Google Sheets: Row added or updated
- Telegram: New message to bot
- RSS: New feed item
Polling Triggers
For services without webhooks, n8n polls on a schedule:
- Check an API every N minutes
- Watch a folder for new files
- Monitor a database table for new rows
See the full integration list at n8n.io/integrations.
Cost Comparison: n8n + Ollama vs Cloud {#cost-comparison}
Real numbers from my migration. These are monthly costs for a small business running 5,000 AI-powered automations per month.
Cloud Stack (Before)
| Service | Monthly Cost |
|---|---|
| Zapier Professional (5,000 tasks) | $89 |
| OpenAI API (~2M tokens/month) | $60-80 |
| Make.com for backup workflows | $16 |
| Total | $165-185/mo |
Self-Hosted Stack (After)
| Item | Monthly Cost |
|---|---|
| Electricity (dedicated workstation) | ~$12 |
| n8n software | $0 (open source) |
| Ollama + models | $0 (open source) |
| Total | ~$12/mo |
Hardware investment: I bought a used Dell Precision T5820 with a Xeon W-2145, 64GB RAM, and an RTX 3060 12GB for $300 on eBay. At the $165/month cloud savings, it paid for itself in under 2 months.
Where Cloud Still Wins
Be honest about the limitations:
- GPT-4o-level reasoning: If your workflow requires complex multi-step reasoning, creative writing, or nuanced understanding, GPT-4o still outperforms local 7B-13B models significantly.
- Zero maintenance: Cloud services handle uptime, scaling, and updates. Self-hosting means you are the ops team.
- First 10 minutes of setup: Zapier + OpenAI works in 10 minutes. This guide takes 30-60 minutes.
My approach: use n8n + Ollama for the 80% of workflows that involve straightforward tasks (summarization, classification, extraction, formatting). Route the remaining 20% through n8n's OpenAI node when you genuinely need it.
Performance and Limitations {#limitations}
Throughput Benchmarks
Tested on an RTX 3060 12GB with Llama 3.2 3B (Q4_K_M):
| Task | Tokens/sec | Time per item | Items/hour |
|---|---|---|---|
| Email summary (300 words in, 100 out) | 32 tok/s | 4-6 seconds | ~700 |
| Document extraction (2K chunk) | 32 tok/s | 8-12 seconds | ~350 |
| Classification (short text) | 32 tok/s | 2-3 seconds | ~1,400 |
| Chatbot response (with context) | 32 tok/s | 6-10 seconds | ~450 |
Bottlenecks
- Model loading: The first request after idle takes 5-15 seconds while the model loads into GPU memory. Set OLLAMA_KEEP_ALIVE=24h to keep models loaded.
- Concurrency: With OLLAMA_NUM_PARALLEL=2, two workflows can process simultaneously. More parallel requests queue. A 24GB GPU can handle OLLAMA_NUM_PARALLEL=4 comfortably.
- Context length: Most local models max out at 8K-32K tokens. If your document chunks are too large, the model truncates or produces garbage at the end. Keep prompts under 4K tokens for consistent results.
- No streaming in workflows: Unlike a chatbot interface, n8n waits for the complete Ollama response before passing it to the next node, so total workflow time includes full generation time.
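A cheap guard against the context-length problem is the common rule of thumb of roughly 4 characters per token for English text. It is a heuristic, not a real tokenizer, but it is good enough for a Code node that caps prompts before they reach Ollama:

```javascript
// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Truncate a prompt so it stays under a token budget.
function capToTokens(text, maxTokens) {
  return text.substring(0, maxTokens * 4);
}

const prompt = 'x'.repeat(20000);
console.log(estimateTokens(prompt));                    // 5000
console.log(estimateTokens(capToTokens(prompt, 4000))); // 4000
```

Dropping this in front of the LLM chain turns silent mid-prompt truncation by the model into an explicit, controlled cut.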
Setting OLLAMA_KEEP_ALIVE
# In docker-compose.yml, under ollama environment:
environment:
- OLLAMA_KEEP_ALIVE=24h # Keep model loaded for 24 hours
- OLLAMA_NUM_PARALLEL=2
This eliminates cold-start latency at the cost of keeping GPU memory occupied.
Troubleshooting {#troubleshooting}
n8n cannot find the Ollama credential type
You need n8n v1.25 or newer. Check your version:
docker exec n8n n8n --version
# If below 1.25, update:
docker compose pull n8n
docker compose up -d n8n
"Connection refused" when n8n connects to Ollama
# The Ollama URL in n8n must use the Docker service name, not localhost
# Correct: http://ollama:11434
# Wrong: http://localhost:11434
# Test from inside the n8n container
docker exec n8n curl -s http://ollama:11434/api/version
Ollama returns empty or garbled responses
# Check if the model is actually loaded
docker exec ollama ollama list
# Test the model directly
docker exec ollama ollama run llama3.2 "Say hello"
# If it works directly but not through n8n, the prompt may be too long
# Reduce chunk sizes or use a model with larger context window
n8n workflow times out
Default timeout for HTTP nodes in n8n is 60 seconds. Ollama can take longer for large prompts on CPU.
# Increase n8n timeout
environment:
- N8N_DEFAULT_TIMEOUT=300
High memory usage
# Check what is consuming memory
docker stats
# n8n typically uses 200-500MB
# Ollama uses 4-12GB depending on the loaded model
# If running out of memory, use a smaller model
docker exec ollama ollama pull phi3:mini # Only ~2GB in VRAM
Workflows stop running after restart
Make sure n8n workflows are set to Active (the toggle in the top-right of the workflow editor). Only active workflows run automatically. Manual triggers require you to click "Test" each time.
Migration from Zapier / Make.com
If you are currently using cloud automation tools, here is a practical migration path:
- Audit your workflows: List every Zap or Scenario. Tag each as "simple AI task" or "needs GPT-4."
- Recreate in n8n: Start with the highest-volume simple workflows. n8n has import guides for Zapier workflows.
- Test side-by-side: Run both for a week. Compare output quality.
- Cut over gradually: Disable cloud workflows one at a time as you validate the n8n replacements.
- Keep a cloud fallback: Keep one OpenAI API node in n8n for tasks that genuinely need GPT-4o. Most workflows will not need it.
Building more complex AI agents? See the AI agents local guide for multi-step reasoning patterns. For a visual flow builder alternative, check out Flowise + Ollama.