Flowise + Ollama: Build AI Chatbots Visually
Published on April 10, 2026 — 19 min read
Flowise is the tool I wish existed two years ago. It is a visual drag-and-drop builder for LLM applications — RAG chatbots, AI agents, document Q&A systems — that actually works in production. Connect it to Ollama and you have a complete AI application platform running on your hardware with no API costs, no coding required for basic setups, and a REST API you can embed anywhere.
I built a customer-facing documentation chatbot with Flowise + Ollama in about 90 minutes. It indexes 200 pages of product docs, answers questions accurately, and serves ~300 queries/day on a machine with an RTX 3060. Here is how to replicate that.
What you will build:
- Flowise + Ollama running in Docker
- A RAG chatbot that answers questions from your documents
- A conversational agent with memory
- API endpoint you can embed in any website or app
- Comparison with Dify, n8n, and LangChain for choosing the right tool
Prerequisites:
- Docker installed on your machine
- 8GB+ RAM (16GB recommended)
- Optional: NVIDIA GPU for faster inference
- Documents you want to make searchable (PDFs, text files, web pages)
For the foundational RAG concepts, see the RAG local setup guide. If you prefer automation workflows over chatbots, the n8n + Ollama guide covers that angle.
Table of Contents
- What is Flowise
- Docker Setup: Flowise + Ollama
- Flowise Interface Overview
- Build a Simple Chatbot (No RAG)
- Build a RAG Chatbot Step-by-Step
- Embedding Models and Vector Stores
- Adding Conversation Memory
- Deploy via API
- Flowise vs Dify vs n8n vs LangChain
- Troubleshooting
What is Flowise {#what-is-flowise}
Flowise is an open-source low-code platform for building LLM applications. It gives you a canvas where you drag nodes (LLMs, vector stores, document loaders, memory modules, tools) and connect them with wires. What you build visually compiles into a LangChain pipeline that runs as a REST API.
Why Flowise, not raw code?
If you are a developer, you might wonder why not just write LangChain/LlamaIndex code directly. Fair question. Here is where Flowise earns its keep:
- Speed: A RAG chatbot that takes 200 lines of Python takes 10 minutes of drag-and-drop in Flowise.
- Iteration: Swap embedding models, change chunk sizes, switch vector stores — all without touching code. Just reconnect nodes.
- Non-developers: Hand it to a product manager or support lead. They can tune prompts and add documents without deploying code.
- Built-in API: Every chatflow automatically gets a REST endpoint. No Flask/FastAPI boilerplate.
The tradeoff: Flowise is less flexible than raw code for complex custom logic. When you need 15-step agent chains with custom tool implementations, write code. For the 80% of use cases that are "load documents → chunk → embed → retrieve → answer," Flowise is faster and easier to maintain.
Docker Setup: Flowise + Ollama {#docker-setup}
Docker Compose File
```bash
mkdir -p ~/flowise-ollama && cd ~/flowise-ollama
```

```yaml
# docker-compose.yml
version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Remove this deploy: block if you don't have an NVIDIA GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_FLASH_ATTENTION=1
      - OLLAMA_KEEP_ALIVE=24h

  flowise:
    image: flowiseai/flowise:latest
    container_name: flowise
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - flowise_data:/root/.flowise
    environment:
      - FLOWISE_USERNAME=admin
      - FLOWISE_PASSWORD=change-this-password
      - APIKEY_PATH=/root/.flowise
      - SECRETKEY_PATH=/root/.flowise
      - LOG_LEVEL=info
    depends_on:
      - ollama

volumes:
  ollama_data:
  flowise_data:
```
Launch and Prepare
```bash
# Start both services
docker compose up -d

# Pull models for chat and embedding
docker exec ollama ollama pull llama3.2
docker exec ollama ollama pull nomic-embed-text

# Verify
docker compose ps
# Flowise should be at http://localhost:3000
# Ollama API at http://localhost:11434
```
Important model choices:
- `llama3.2` — the chat model that generates responses (3B parameters, needs ~2GB VRAM)
- `nomic-embed-text` — the embedding model that converts text to vectors for search (137M parameters, needs ~300MB VRAM)
You need both. The chat model talks to users. The embedding model powers the search. They serve completely different functions.
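To make the split concrete, here is a minimal Python sketch of the two roles, hitting Ollama's documented `/api/embeddings` and `/api/generate` routes directly. Flowise wires all of this up for you; this is only to show what each model is asked to do. The host URL assumes you are calling from outside Docker.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # from the host; Flowise (inside Docker) uses http://ollama:11434

def embed_payload(text):
    """Request body for /api/embeddings — the search side of RAG."""
    return {"model": "nomic-embed-text", "prompt": text}

def chat_payload(prompt):
    """Request body for /api/generate — the answering side."""
    return {"model": "llama3.2", "prompt": prompt, "stream": False}

def _post(path, body):
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def demo():
    """Run with the Docker stack up: one model makes vectors, the other makes words."""
    vec = _post("/api/embeddings", embed_payload("return policy"))["embedding"]
    reply = _post("/api/generate", chat_payload("Say hello in five words."))["response"]
    print(len(vec), reply)  # nomic-embed-text vectors have 768 dimensions
```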
Flowise Interface Overview {#interface-overview}
Open http://localhost:3000 and log in with the credentials from your Docker Compose file.
Key Areas
- Chatflows: The main workspace. Each chatflow is a visual pipeline that becomes an API.
- Agentflows: Similar to chatflows but designed for multi-step agent workflows with tool use.
- Marketplaces: Pre-built templates you can import and customize.
- Tools: Custom tools (API calls, code execution) that agents can use.
- Credentials: API keys and connection strings (including your Ollama endpoint).
- Document Stores: Manage uploaded documents and their vector embeddings.
Setting Up Ollama as a Credential
Before building anything:
- Click Credentials in the left sidebar
- Click Add Credential
- Search for "ChatOllama" → Add New
- Base URL: `http://ollama:11434` (Docker service name, not localhost)
- Save
Repeat for "Ollama Embeddings":
- Add Credential → search "Ollama Embeddings"
- Base URL: `http://ollama:11434`
- Save
Build a Simple Chatbot (No RAG) {#simple-chatbot}
Start simple. A basic chatbot validates that your Ollama connection works before adding complexity.
Step-by-Step
- Click Chatflows → Add New
- Name it "Basic Ollama Chat"
- From the left panel, drag these nodes onto the canvas:
- ChatOllama (under Chat Models)
- Conversation Chain (under Chains)
- Connect ChatOllama → Conversation Chain (into the "Chat Model" input)
- Configure ChatOllama:
  - Base URL: `http://ollama:11434`
  - Model Name: `llama3.2`
  - Temperature: 0.7
- Click Save then use the chat panel on the right to test
Type a message. If you get a response, Ollama is connected and working. Response time on an RTX 3060 with Llama 3.2 (3B): 2-4 seconds for a typical reply.
Adding a System Prompt
Drag a System Message node and connect it to the Conversation Chain's "System Message" input:
```
You are a helpful assistant for [Your Company]. You answer questions about our products and services. Be concise and accurate. If you don't know something, say so — never make up information.
```
Build a RAG Chatbot Step-by-Step {#rag-chatbot}
This is the main event. A RAG (Retrieval-Augmented Generation) chatbot that answers questions by searching your documents first, then generating responses based on what it finds.
Architecture
```
[User Question]
      ↓
[Embedding Model] → converts question to vector
      ↓
[Vector Store] → finds similar document chunks
      ↓
[Retrieved Context + Question]
      ↓
[Chat Model] → generates answer using context
      ↓
[Response to User]
```
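The same flow in plain Python, as a toy sketch — not what Flowise actually generates, and with a hand-rolled cosine similarity standing in for the vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(question_vec, chunks, top_k=4):
    """chunks: list of (text, vector) pairs. Returns the top_k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, context_chunks):
    """Glue the retrieved chunks and the question into the prompt the chat model sees."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline the vectors come from the embedding model and the final prompt goes to `llama3.2`; everything in between is exactly this rank-and-concatenate step.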
Step-by-Step Build
1. Create a new Chatflow
- Name: "Documentation RAG Bot"
2. Drag these nodes onto the canvas:
- ChatOllama (Chat Models)
- Ollama Embeddings (Embeddings)
- Conversational Retrieval QA Chain (Chains)
- In-Memory Vector Store (Vector Stores) — or Chroma/Qdrant for persistence
- Document Loaders — choose based on your documents:
- PDF File for PDFs
- Text File for .txt files
- Cheerio Web Scraper for web pages
- Recursive Character Text Splitter (Text Splitters)
3. Connect the nodes:
```
[PDF File] → [Text Splitter] → [In-Memory Vector Store]
[Ollama Embeddings] → [In-Memory Vector Store]
[In-Memory Vector Store] → [Conversational Retrieval QA Chain]
[ChatOllama] → [Conversational Retrieval QA Chain]
```
4. Configure each node:
ChatOllama:
- Model: `llama3.2`
- Temperature: 0.3 (low for factual accuracy)
- Base URL: `http://ollama:11434`

Ollama Embeddings:
- Model: `nomic-embed-text`
- Base URL: `http://ollama:11434`
Recursive Character Text Splitter:
- Chunk Size: 1000
- Chunk Overlap: 200
These numbers matter. 1000 characters per chunk gives the model enough context without overwhelming it. 200-character overlap ensures sentences are not cut in the middle. For dense technical documents, increase chunk size to 1500. For conversational content, 800 works well.
PDF File:
- Upload your PDF documents directly in the node configuration
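To see how chunk size and overlap interact, here is a simplified sliding-window sketch. Note the assumption: Flowise's Recursive Character Text Splitter actually prefers paragraph and sentence boundaries before falling back to raw characters, so this is an illustration of the two parameters, not the real algorithm.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive character splitter: consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # each new chunk starts 800 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks

# With the defaults, a 2,500-character document becomes three chunks,
# and the last 200 characters of each chunk reappear at the start of the next.
```

The overlap is why a sentence cut at character 1000 still shows up whole in chunk 2 — that is the failure mode the 200-character setting guards against.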
5. Save and test
Upload a document and ask a question about its contents. The chatbot should quote or paraphrase information from the document rather than making up an answer.
Improving Accuracy
If the chatbot gives vague or incorrect answers:
- Increase chunk overlap to 300 — prevents context from being split mid-sentence
- Lower temperature to 0.1 — makes the model stick closer to retrieved context
- Use a larger model — `llama3.1:8b` handles retrieved context better than the default 3B `llama3.2`
- Add more documents — RAG accuracy improves with more source material
- Change the prompt — add "Only answer based on the provided context" to the system message
Embedding Models and Vector Stores {#embeddings}
Embedding Models Available in Ollama
| Model | Dimensions | Size | Speed | Quality |
|---|---|---|---|---|
| nomic-embed-text | 768 | 274MB | Fast | Good |
| mxbai-embed-large | 1024 | 670MB | Medium | Better |
| all-minilm | 384 | 46MB | Fastest | Acceptable |
| snowflake-arctic-embed | 1024 | 670MB | Medium | Better |
```bash
# Pull embedding models
docker exec ollama ollama pull nomic-embed-text
docker exec ollama ollama pull mxbai-embed-large
```
For most use cases, nomic-embed-text hits the sweet spot of speed and quality. It produces 768-dimensional vectors and processes ~500 text chunks per minute on an RTX 3060.
If you need higher retrieval accuracy for technical or domain-specific content, mxbai-embed-large at 1024 dimensions performs noticeably better at the cost of 2x the processing time and storage.
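The dimension column also translates directly into index size. A back-of-the-envelope estimate — vectors only, float32, no metadata, so real stores will use somewhat more:

```python
def index_size_mb(n_chunks, dimensions, bytes_per_float=4):
    """Approximate raw storage for a vector index, in megabytes."""
    return n_chunks * dimensions * bytes_per_float / 1_000_000

# 10,000 chunks (roughly a few thousand pages at ~1,000 chars/chunk):
print(index_size_mb(10_000, 768))   # nomic-embed-text  → 30.72 MB
print(index_size_mb(10_000, 1024))  # mxbai-embed-large → 40.96 MB
```

At this scale either model is cheap to store; the 2x processing time for `mxbai-embed-large` matters more than the extra megabytes.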
Vector Store Options
In-Memory (default):
- No setup required
- Data lost on restart
- Fine for testing and small document sets (<100 pages)
Chroma (recommended for production):
Add Chroma to your Docker Compose:
```yaml
  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
# Also add chroma_data: under the top-level volumes: key
```
In Flowise, replace In-Memory Vector Store with Chroma:
- Collection Name: `my-docs`
- Chroma URL: `http://chroma:8000`
Qdrant (for larger deployments):
```yaml
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
# Also add qdrant_data: under the top-level volumes: key
```
Qdrant handles millions of vectors efficiently and supports filtering, which Chroma does not do as well at scale.
For a deeper comparison of vector databases, see the vector databases comparison guide.
Adding Conversation Memory {#memory}
Without memory, each message is independent — the chatbot forgets the previous exchange. Adding memory makes it conversational.
Buffer Memory
Drag a Buffer Memory node and connect it to the Conversational Retrieval QA Chain's "Memory" input.
Configuration:
- Memory Key: `chat_history`
- Session ID: leave blank (auto-generated per session)
- Memory Size: 5 (remembers last 5 exchanges)
This keeps the last 5 question-answer pairs in context. More than 5 starts consuming significant context window space, which reduces the room available for retrieved documents.
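A sketch of what that 5-exchange window does. This is assumed behavior based on how windowed buffer memories typically work — Flowise's actual Buffer Memory node lives inside LangChain:

```python
class BufferMemory:
    """Windowed chat memory: only the most recent exchanges stay in context."""

    def __init__(self, memory_size=5):
        self.memory_size = memory_size
        self.exchanges = []  # (question, answer) pairs, oldest first

    def add(self, question, answer):
        self.exchanges.append((question, answer))
        # Drop anything beyond the window so context stays small
        self.exchanges = self.exchanges[-self.memory_size:]

    def as_context(self):
        """Render the window as the history block prepended to the next prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.exchanges)
```

After the sixth exchange, the first one silently falls out of the window — which is exactly the trade-off between conversational continuity and context space for retrieved documents.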
DynamoDB / Redis Memory (for persistence)
If your chatbot serves multiple users and you want conversation history to survive restarts:
```yaml
  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
# Also add redis_data: under the top-level volumes: key
```
In Flowise, use the Redis-Backed Chat Memory node instead of Buffer Memory.
Deploy via API {#api-deployment}
Every Flowise chatflow automatically gets a REST API. This is how you embed the chatbot in your website, mobile app, or any other application.
Get the API Endpoint
- Open your chatflow
- Click the </> icon (top right, "API Endpoint")
- Copy the endpoint URL: `http://localhost:3000/api/v1/prediction/<chatflow-id>`
Test with cURL
```bash
curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is your return policy?"}'
```
Response:
```json
{
  "text": "Based on our documentation, the return policy allows returns within 30 days of purchase...",
  "sourceDocuments": [
    {
      "pageContent": "Returns are accepted within 30 days...",
      "metadata": { "source": "return-policy.pdf", "page": 3 }
    }
  ]
}
```
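The same call from application code, using only the Python standard library. The `overrideConfig.sessionId` field is how Flowise keeps separate conversation memory per user; the chatflow ID and host below are placeholders you copy from your own instance:

```python
import json
import urllib.request

FLOWISE_URL = "http://localhost:3000"
CHATFLOW_ID = "your-chatflow-id"  # placeholder: copy from the </> panel in Flowise
API_KEY = None                    # set this if "Require API Key" is enabled

def payload(question, session_id):
    """Request body: the question plus a stable session id for per-user memory."""
    return {"question": question, "overrideConfig": {"sessionId": session_id}}

def ask(question, session_id="demo"):
    """POST a question to the chatflow's prediction endpoint and return the answer text."""
    headers = {"Content-Type": "application/json"}
    if API_KEY:
        headers["Authorization"] = f"Bearer {API_KEY}"
    req = urllib.request.Request(
        f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
        data=json.dumps(payload(question, session_id)).encode(),
        headers=headers,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

Passing each end user their own `session_id` is what keeps one visitor's chat history from leaking into another's.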
Embed in a Website
Flowise generates an embeddable chat widget:
- In the chatflow, click the Share icon
- Copy the embed code:
```html
<script type="module">
  import Chatbot from 'https://cdn.jsdelivr.net/npm/flowise-embed/dist/web.js'
  Chatbot.init({
    chatflowid: '<your-chatflow-id>',
    apiHost: 'http://your-server:3000',
    theme: {
      button: { backgroundColor: '#3B81F6' },
      chatWindow: { title: 'Documentation Assistant' }
    }
  })
</script>
```
Secure the API
```bash
# In Flowise: chatflow → Settings → API Key
# Enable "Require API Key" and generate one, then include it in requests:
curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"question": "What is your return policy?"}'
```
Flowise vs Dify vs n8n vs LangChain {#comparison}
These tools overlap but serve different primary purposes. Picking the wrong one costs you time.
| | Flowise | Dify | n8n | LangChain (code) |
|---|---|---|---|---|
| Primary use | RAG chatbots, AI apps | AI app platform | Workflow automation | Any LLM application |
| Interface | Visual flow builder | Web UI + visual | Visual workflow editor | Python/JS code |
| Ollama support | Native node | Plugin | Native node (v1.25+) | Direct integration |
| RAG built-in | Yes, multiple vector stores | Yes, built-in knowledge base | Basic (Code node) | Yes (loaders + retrievers) |
| API output | Automatic REST API | Automatic REST API | Webhook-based | Manual (Flask/FastAPI) |
| Best for | Building chatbots fast | Full AI app platform | Automating business processes | Custom AI applications |
| Learning curve | Low | Low-Medium | Low | High |
| Self-hosted | Yes, Docker | Yes, Docker | Yes, Docker | N/A (it is a library) |
| License | Apache 2.0 | Custom (restrictive) | Fair-code (restrictions) | MIT |
My recommendation:
- Building a chatbot or document Q&A system? Use Flowise. It is purpose-built for this.
- Automating business workflows with AI? Use n8n. See the n8n + Ollama guide.
- Building a full AI application platform for a team? Dify has more features but a more restrictive license.
- Need maximum flexibility and you know Python? Use LangChain directly.
You can run Flowise and n8n side-by-side. They solve different problems. Flowise builds the AI brain (chatbot). n8n handles the plumbing (triggers, integrations, scheduling).
Troubleshooting {#troubleshooting}
Flowise cannot connect to Ollama
```bash
# Test connectivity from the Flowise container
docker exec flowise wget -qO- http://ollama:11434/api/version

# If it fails, check that both containers are on the same Docker network
docker network ls
docker network inspect flowise-ollama_default

# Verify Ollama is running
docker exec ollama ollama list
```
Embedding fails with "model not found"
```bash
# The embedding model must be pulled separately
docker exec ollama ollama pull nomic-embed-text

# Verify it is available — nomic-embed-text should appear in the list
docker exec ollama ollama list
```
RAG returns irrelevant answers
Common causes:
- Chunk size too large: Reduce from 1000 to 500-800 characters
- Wrong embedding model: Switch from `all-minilm` to `nomic-embed-text` for better quality
- Too few documents: RAG works better with more source material
- Temperature too high: Lower to 0.1-0.3 for factual tasks
- Retrieved chunks insufficient: Increase "Top K" from 4 to 6-8 in the retriever settings
Flowise crashes with large PDFs
```yaml
# Increase the Node.js memory limit.
# In docker-compose.yml, under the flowise service:
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
```
Slow response times
```bash
# Check GPU utilization
nvidia-smi

# If the GPU is not being used, verify Ollama has GPU access
docker exec ollama nvidia-smi

# For CPU-only machines: use a smaller model
docker exec ollama ollama pull phi3:mini
# Also reduce the chunk count retrieved (Top K = 3 instead of 6)
```
Chat history disappears on restart
The default In-Memory Vector Store and Buffer Memory are ephemeral. Switch to:
- Chroma or Qdrant for vector store persistence
- Redis-backed memory for conversation history persistence
Both require adding the respective service to your Docker Compose file (see sections above).
Production Deployment Checklist
Before exposing your Flowise chatbot to users:
- Switch from In-Memory to Chroma or Qdrant vector store
- Enable API key authentication on the chatflow
- Set `FLOWISE_USERNAME` and `FLOWISE_PASSWORD` to strong values
- Add an Nginx reverse proxy with TLS (see the Ollama + Open WebUI guide for the Nginx config)
- Configure Docker resource limits to prevent OOM kills
- Set up automated backups for the Flowise data volume
- Test with adversarial prompts to verify the system prompt holds
- Monitor GPU memory and set `OLLAMA_MAX_LOADED_MODELS=1` if tight on VRAM
What to Build Next
Once you have Flowise + Ollama running:
- Multi-source RAG: Combine PDFs, web pages, and database content in a single chatbot. Flowise supports mixing document loader types in one flow.
- Agent with tools: Build an Agentflow that can search the web, query databases, and calculate — all locally with Ollama.
- Custom embed widget: Style the Flowise chat widget to match your brand. The embed script accepts full theme customization.
- Feedback loop: Use Flowise's built-in feedback collection to identify questions your chatbot cannot answer, then add those documents.
For the underlying RAG architecture in detail, see the RAG local setup guide. To add your Flowise chatbot into broader automation workflows, connect it to n8n via the Flowise API.
Visit flowiseai.com for the official documentation and community templates.
Need help picking the right local model for your chatbot? See the best Ollama models guide. For a different approach to self-hosted AI chat, check the AnythingLLM setup guide.