Flowise + Ollama: Build AI Chatbots Visually
Published on April 10, 2026 — 19 min read
Flowise is the tool I wish existed two years ago. It is a visual drag-and-drop builder for LLM applications — RAG chatbots, AI agents, document Q&A systems — that actually works in production. Connect it to Ollama and you have a complete AI application platform running on your hardware with no API costs, no coding required for basic setups, and a REST API you can embed anywhere.
I built a customer-facing documentation chatbot with Flowise + Ollama in about 90 minutes. It indexes 200 pages of product docs, answers questions accurately, and serves ~300 queries/day on a machine with an RTX 3060. Here is how to replicate that.
What you will build:
- Flowise + Ollama running in Docker
- A RAG chatbot that answers questions from your documents
- A conversational agent with memory
- API endpoint you can embed in any website or app
- Comparison with Dify, n8n, and LangChain for choosing the right tool
Prerequisites:
- Docker installed on your machine
- 8GB+ RAM (16GB recommended)
- Optional: NVIDIA GPU for faster inference
- Documents you want to make searchable (PDFs, text files, web pages)
For the foundational RAG concepts, see the RAG local setup guide. If you prefer automation workflows over chatbots, the n8n + Ollama guide covers that angle.
Table of Contents
- What is Flowise
- Docker Setup: Flowise + Ollama
- Flowise Interface Overview
- Build a Simple Chatbot (No RAG)
- Build a RAG Chatbot Step-by-Step
- Embedding Models and Vector Stores
- Adding Conversation Memory
- Deploy via API
- Flowise vs Dify vs n8n vs LangChain
- Troubleshooting
What is Flowise {#what-is-flowise}
Flowise is an open-source low-code platform for building LLM applications. It gives you a canvas where you drag nodes (LLMs, vector stores, document loaders, memory modules, tools) and connect them with wires. What you build visually compiles into a LangChain pipeline that runs as a REST API.
Why Flowise, not raw code?
If you are a developer, you might wonder why not just write LangChain/LlamaIndex code directly. Fair question. Here is where Flowise earns its keep:
- Speed: A RAG chatbot that takes 200 lines of Python takes 10 minutes of drag-and-drop in Flowise.
- Iteration: Swap embedding models, change chunk sizes, switch vector stores — all without touching code. Just reconnect nodes.
- Non-developers: Hand it to a product manager or support lead. They can tune prompts and add documents without deploying code.
- Built-in API: Every chatflow automatically gets a REST endpoint. No Flask/FastAPI boilerplate.
The tradeoff: Flowise is less flexible than raw code for complex custom logic. When you need 15-step agent chains with custom tool implementations, write code. For the 80% of use cases that are "load documents → chunk → embed → retrieve → answer," Flowise is faster and easier to maintain.
Docker Setup: Flowise + Ollama {#docker-setup}
Docker Compose File
```bash
mkdir -p ~/flowise-ollama && cd ~/flowise-ollama
```

```yaml
# docker-compose.yml
version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Remove this deploy: block if you don't have an NVIDIA GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_FLASH_ATTENTION=1
      - OLLAMA_KEEP_ALIVE=24h

  flowise:
    image: flowiseai/flowise:latest
    container_name: flowise
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - flowise_data:/root/.flowise
    environment:
      - FLOWISE_USERNAME=admin
      - FLOWISE_PASSWORD=change-this-password
      - APIKEY_PATH=/root/.flowise
      - SECRETKEY_PATH=/root/.flowise
      - LOG_LEVEL=info
    depends_on:
      - ollama

volumes:
  ollama_data:
  flowise_data:
```
Launch and Prepare
```bash
# Start both services
docker compose up -d

# Pull models for chat and embedding
docker exec ollama ollama pull llama3.2
docker exec ollama ollama pull nomic-embed-text

# Verify
docker compose ps
# Flowise should be at http://localhost:3000
# Ollama API at http://localhost:11434
```
Important model choices:
- `llama3.2` — the chat model that generates responses (3B parameters, needs ~2GB VRAM)
- `nomic-embed-text` — the embedding model that converts text to vectors for search (137M parameters, needs ~300MB VRAM)
You need both. The chat model talks to users. The embedding model powers the search. They serve completely different functions.
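To make the split concrete, here is a minimal Python sketch of the two roles, hitting Ollama's documented `/api/embeddings` and `/api/generate` routes directly. Flowise wires all of this up for you; this is only to show what each model is asked to do. The host URL assumes you are calling from outside Docker.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # from the host; Flowise (inside Docker) uses http://ollama:11434

def embed_payload(text):
    """Request body for /api/embeddings — the search side of RAG."""
    return {"model": "nomic-embed-text", "prompt": text}

def chat_payload(prompt):
    """Request body for /api/generate — the answering side."""
    return {"model": "llama3.2", "prompt": prompt, "stream": False}

def _post(path, body):
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def demo():
    """Run with the Docker stack up: one model makes vectors, the other makes words."""
    vec = _post("/api/embeddings", embed_payload("return policy"))["embedding"]
    reply = _post("/api/generate", chat_payload("Say hello in five words."))["response"]
    print(len(vec), reply)  # nomic-embed-text vectors have 768 dimensions
```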
Flowise Interface Overview {#interface-overview}
Open http://localhost:3000 and log in with the credentials from your Docker Compose file.
Key Areas
- Chatflows: The main workspace. Each chatflow is a visual pipeline that becomes an API.
- Agentflows: Similar to chatflows but designed for multi-step agent workflows with tool use.
- Marketplaces: Pre-built templates you can import and customize.
- Tools: Custom tools (API calls, code execution) that agents can use.
- Credentials: API keys and connection strings (including your Ollama endpoint).
- Document Stores: Manage uploaded documents and their vector embeddings.
Setting Up Ollama as a Credential
Before building anything:
- Click Credentials in the left sidebar
- Click Add Credential
- Search for "ChatOllama" → Add New
- Base URL: `http://ollama:11434` (Docker service name, not localhost)
- Save
Repeat for "Ollama Embeddings":
- Add Credential → search "Ollama Embeddings"
- Base URL: `http://ollama:11434`
- Save
Build a Simple Chatbot (No RAG) {#simple-chatbot}
Start simple. A basic chatbot validates that your Ollama connection works before adding complexity.
Step-by-Step
- Click Chatflows → Add New
- Name it "Basic Ollama Chat"
- From the left panel, drag these nodes onto the canvas:
- ChatOllama (under Chat Models)
- Conversation Chain (under Chains)
- Connect ChatOllama → Conversation Chain (into the "Chat Model" input)
- Configure ChatOllama:
  - Base URL: `http://ollama:11434`
  - Model Name: `llama3.2`
  - Temperature: 0.7
- Click Save then use the chat panel on the right to test
Type a message. If you get a response, Ollama is connected and working. Response time on an RTX 3060 with Llama 3.2 (3B): 2-4 seconds for a typical reply.
Adding a System Prompt
Drag a System Message node and connect it to the Conversation Chain's "System Message" input:
```
You are a helpful assistant for [Your Company]. You answer questions about our products and services. Be concise and accurate. If you don't know something, say so — never make up information.
```
Build a RAG Chatbot Step-by-Step {#rag-chatbot}
This is the main event. A RAG (Retrieval-Augmented Generation) chatbot that answers questions by searching your documents first, then generating responses based on what it finds.
Architecture
```
[User Question]
      ↓
[Embedding Model] → converts question to vector
      ↓
[Vector Store] → finds similar document chunks
      ↓
[Retrieved Context + Question]
      ↓
[Chat Model] → generates answer using context
      ↓
[Response to User]
```
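The same flow in plain Python, as a toy sketch — not what Flowise actually generates, and with a hand-rolled cosine similarity standing in for the vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(question_vec, chunks, top_k=4):
    """chunks: list of (text, vector) pairs. Returns the top_k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, context_chunks):
    """Glue the retrieved chunks and the question into the prompt the chat model sees."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In the real pipeline the vectors come from the embedding model and the final prompt goes to `llama3.2`; everything in between is exactly this rank-and-concatenate step.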
Step-by-Step Build
1. Create a new Chatflow
- Name: "Documentation RAG Bot"
2. Drag these nodes onto the canvas:
- ChatOllama (Chat Models)
- Ollama Embeddings (Embeddings)
- Conversational Retrieval QA Chain (Chains)
- In-Memory Vector Store (Vector Stores) — or Chroma/Qdrant for persistence
- Document Loaders — choose based on your documents:
- PDF File for PDFs
- Text File for .txt files
- Cheerio Web Scraper for web pages
- Recursive Character Text Splitter (Text Splitters)
3. Connect the nodes:
```
[PDF File] → [Text Splitter] → [In-Memory Vector Store]
[Ollama Embeddings] → [In-Memory Vector Store]
[In-Memory Vector Store] → [Conversational Retrieval QA Chain]
[ChatOllama] → [Conversational Retrieval QA Chain]
```
4. Configure each node:
ChatOllama:
- Model: `llama3.2`
- Temperature: 0.3 (low for factual accuracy)
- Base URL: `http://ollama:11434`

Ollama Embeddings:
- Model: `nomic-embed-text`
- Base URL: `http://ollama:11434`
Recursive Character Text Splitter:
- Chunk Size: 1000
- Chunk Overlap: 200
These numbers matter. 1000 characters per chunk gives the model enough context without overwhelming it. 200-character overlap ensures sentences are not cut in the middle. For dense technical documents, increase chunk size to 1500. For conversational content, 800 works well.
PDF File:
- Upload your PDF documents directly in the node configuration
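To see how chunk size and overlap interact, here is a simplified sliding-window sketch. Note the assumption: Flowise's Recursive Character Text Splitter actually prefers paragraph and sentence boundaries before falling back to raw characters, so this is an illustration of the two parameters, not the real algorithm.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive character splitter: consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # each new chunk starts 800 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reached the end of the text
    return chunks

# With the defaults, a 2,500-character document becomes three chunks,
# and the last 200 characters of each chunk reappear at the start of the next.
```

The overlap is why a sentence cut at character 1000 still shows up whole in chunk 2 — that is the failure mode the 200-character setting guards against.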
5. Save and test
Upload a document and ask a question about its contents. The chatbot should quote or paraphrase information from the document rather than making up an answer.
Improving Accuracy
If the chatbot gives vague or incorrect answers:
- Increase chunk overlap to 300 — prevents context from being split mid-sentence
- Lower temperature to 0.1 — makes the model stick closer to retrieved context
- Use a larger model — `llama3.1:8b` handles retrieved context better than the default 3B `llama3.2`
- Add more documents — RAG accuracy improves with more source material
- Change the prompt — add "Only answer based on the provided context" to the system message
Embedding Models and Vector Stores {#embeddings}
Embedding Models Available in Ollama
| Model | Dimensions | Size | Speed | Quality |
|---|---|---|---|---|
| nomic-embed-text | 768 | 274MB | Fast | Good |
| mxbai-embed-large | 1024 | 670MB | Medium | Better |
| all-minilm | 384 | 46MB | Fastest | Acceptable |
| snowflake-arctic-embed | 1024 | 670MB | Medium | Better |
```bash
# Pull embedding models
docker exec ollama ollama pull nomic-embed-text
docker exec ollama ollama pull mxbai-embed-large
```
For most use cases, nomic-embed-text hits the sweet spot of speed and quality. It produces 768-dimensional vectors and processes ~500 text chunks per minute on an RTX 3060.
If you need higher retrieval accuracy for technical or domain-specific content, mxbai-embed-large at 1024 dimensions performs noticeably better at the cost of 2x the processing time and storage.
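The dimension column also translates directly into index size. A back-of-the-envelope estimate — vectors only, float32, no metadata, so real stores will use somewhat more:

```python
def index_size_mb(n_chunks, dimensions, bytes_per_float=4):
    """Approximate raw storage for a vector index, in megabytes."""
    return n_chunks * dimensions * bytes_per_float / 1_000_000

# 10,000 chunks (roughly a few thousand pages at ~1,000 chars/chunk):
print(index_size_mb(10_000, 768))   # nomic-embed-text  → 30.72 MB
print(index_size_mb(10_000, 1024))  # mxbai-embed-large → 40.96 MB
```

At this scale either model is cheap to store; the 2x processing time for `mxbai-embed-large` matters more than the extra megabytes.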
Vector Store Options
In-Memory (default):
- No setup required
- Data lost on restart
- Fine for testing and small document sets (<100 pages)
Chroma (recommended for production):
Add Chroma to your Docker Compose:
```yaml
  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma
# Also add chroma_data: under the top-level volumes: key
```
In Flowise, replace In-Memory Vector Store with Chroma:
- Collection Name: `my-docs`
- Chroma URL: `http://chroma:8000`
Qdrant (for larger deployments):
```yaml
  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
# Also add qdrant_data: under the top-level volumes: key
```
Qdrant handles millions of vectors efficiently and supports filtering, which Chroma does not do as well at scale.
For a deeper comparison of vector databases, see the vector databases comparison guide.
Adding Conversation Memory {#memory}
Without memory, each message is independent — the chatbot forgets the previous exchange. Adding memory makes it conversational.
Buffer Memory
Drag a Buffer Memory node and connect it to the Conversational Retrieval QA Chain's "Memory" input.
Configuration:
- Memory Key: `chat_history`
- Session ID: leave blank (auto-generated per session)
- Memory Size: 5 (remembers last 5 exchanges)
This keeps the last 5 question-answer pairs in context. More than 5 starts consuming significant context window space, which reduces the room available for retrieved documents.
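A sketch of what that 5-exchange window does. This is assumed behavior based on how windowed buffer memories typically work — Flowise's actual Buffer Memory node lives inside LangChain:

```python
class BufferMemory:
    """Windowed chat memory: only the most recent exchanges stay in context."""

    def __init__(self, memory_size=5):
        self.memory_size = memory_size
        self.exchanges = []  # (question, answer) pairs, oldest first

    def add(self, question, answer):
        self.exchanges.append((question, answer))
        # Drop anything beyond the window so context stays small
        self.exchanges = self.exchanges[-self.memory_size:]

    def as_context(self):
        """Render the window as the history block prepended to the next prompt."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.exchanges)
```

After the sixth exchange, the first one silently falls out of the window — which is exactly the trade-off between conversational continuity and context space for retrieved documents.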
DynamoDB / Redis Memory (for persistence)
If your chatbot serves multiple users and you want conversation history to survive restarts:
```yaml
  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
# Also add redis_data: under the top-level volumes: key
```
In Flowise, use the Redis-Backed Chat Memory node instead of Buffer Memory.
Deploy via API {#api-deployment}
Every Flowise chatflow automatically gets a REST API. This is how you embed the chatbot in your website, mobile app, or any other application.
Get the API Endpoint
- Open your chatflow
- Click the </> icon (top right, "API Endpoint")
- Copy the endpoint URL: `http://localhost:3000/api/v1/prediction/<chatflow-id>`
Test with cURL
```bash
curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is your return policy?"}'
```
Response:
```json
{
  "text": "Based on our documentation, the return policy allows returns within 30 days of purchase...",
  "sourceDocuments": [
    {
      "pageContent": "Returns are accepted within 30 days...",
      "metadata": { "source": "return-policy.pdf", "page": 3 }
    }
  ]
}
```
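The same call from application code, using only the Python standard library. The `overrideConfig.sessionId` field is how Flowise keeps separate conversation memory per user; the chatflow ID and host below are placeholders you copy from your own instance:

```python
import json
import urllib.request

FLOWISE_URL = "http://localhost:3000"
CHATFLOW_ID = "your-chatflow-id"  # placeholder: copy from the </> panel in Flowise
API_KEY = None                    # set this if "Require API Key" is enabled

def payload(question, session_id):
    """Request body: the question plus a stable session id for per-user memory."""
    return {"question": question, "overrideConfig": {"sessionId": session_id}}

def ask(question, session_id="demo"):
    """POST a question to the chatflow's prediction endpoint and return the answer text."""
    headers = {"Content-Type": "application/json"}
    if API_KEY:
        headers["Authorization"] = f"Bearer {API_KEY}"
    req = urllib.request.Request(
        f"{FLOWISE_URL}/api/v1/prediction/{CHATFLOW_ID}",
        data=json.dumps(payload(question, session_id)).encode(),
        headers=headers,
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

Passing each end user their own `session_id` is what keeps one visitor's chat history from leaking into another's.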
Embed in a Website
Flowise generates an embeddable chat widget:
- In the chatflow, click the Share icon
- Copy the embed code:
```html
<script type="module">
  import Chatbot from 'https://cdn.jsdelivr.net/npm/flowise-embed/dist/web.js'
  Chatbot.init({
    chatflowid: '<your-chatflow-id>',
    apiHost: 'http://your-server:3000',
    theme: {
      button: { backgroundColor: '#3B81F6' },
      chatWindow: { title: 'Documentation Assistant' }
    }
  })
</script>
```
Secure the API
```bash
# In Flowise: chatflow → Settings → API Key
# Enable "Require API Key" and generate one, then include it in requests:
curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"question": "What is your return policy?"}'
```
Flowise vs Dify vs n8n vs LangChain {#comparison}
These tools overlap but serve different primary purposes. Picking the wrong one costs you time.
| | Flowise | Dify | n8n | LangChain (code) |
|---|---|---|---|---|
| Primary use | RAG chatbots, AI apps | AI app platform | Workflow automation | Any LLM application |
| Interface | Visual flow builder | Web UI + visual | Visual workflow editor | Python/JS code |
| Ollama support | Native node | Plugin | Native node (v1.25+) | Direct integration |
| RAG built-in | Yes, multiple vector stores | Yes, built-in knowledge base | Basic (Code node) | Yes (loaders + retrievers) |
| API output | Automatic REST API | Automatic REST API | Webhook-based | Manual (Flask/FastAPI) |
| Best for | Building chatbots fast | Full AI app platform | Automating business processes | Custom AI applications |
| Learning curve | Low | Low-Medium | Low | High |
| Self-hosted | Yes, Docker | Yes, Docker | Yes, Docker | N/A (it is a library) |
| License | Apache 2.0 | Custom (restrictive) | Fair-code (restrictions) | MIT |
My recommendation:
- Building a chatbot or document Q&A system? Use Flowise. It is purpose-built for this.
- Automating business workflows with AI? Use n8n. See the n8n + Ollama guide.
- Building a full AI application platform for a team? Dify has more features but a more restrictive license.
- Need maximum flexibility and you know Python? Use LangChain directly.
You can run Flowise and n8n side-by-side. They solve different problems. Flowise builds the AI brain (chatbot). n8n handles the plumbing (triggers, integrations, scheduling).
Troubleshooting {#troubleshooting}
Flowise cannot connect to Ollama
```bash
# Test connectivity from the Flowise container
docker exec flowise wget -qO- http://ollama:11434/api/version

# If it fails, check that both containers are on the same Docker network
docker network ls
docker network inspect flowise-ollama_default

# Verify Ollama is running
docker exec ollama ollama list
```
Embedding fails with "model not found"
```bash
# The embedding model must be pulled separately
docker exec ollama ollama pull nomic-embed-text

# Verify it is available — nomic-embed-text should appear in the list
docker exec ollama ollama list
```
RAG returns irrelevant answers
Common causes:
- Chunk size too large: Reduce from 1000 to 500-800 characters
- Wrong embedding model: Switch from `all-minilm` to `nomic-embed-text` for better quality
- Too few documents: RAG works better with more source material
- Temperature too high: Lower to 0.1-0.3 for factual tasks
- Retrieved chunks insufficient: Increase "Top K" from 4 to 6-8 in the retriever settings
Flowise crashes with large PDFs
```yaml
# Increase the Node.js memory limit.
# In docker-compose.yml, under the flowise service:
    environment:
      - NODE_OPTIONS=--max-old-space-size=4096
```
Slow response times
```bash
# Check GPU utilization
nvidia-smi

# If the GPU is not being used, verify Ollama has GPU access
docker exec ollama nvidia-smi

# For CPU-only machines: use a smaller model
docker exec ollama ollama pull phi3:mini
# Also reduce the chunk count retrieved (Top K = 3 instead of 6)
```
Chat history disappears on restart
The default In-Memory Vector Store and Buffer Memory are ephemeral. Switch to:
- Chroma or Qdrant for vector store persistence
- Redis-backed memory for conversation history persistence
Both require adding the respective service to your Docker Compose file (see sections above).
Production Deployment Checklist
Before exposing your Flowise chatbot to users:
- Switch from In-Memory to Chroma or Qdrant vector store
- Enable API key authentication on the chatflow
- Set `FLOWISE_USERNAME` and `FLOWISE_PASSWORD` to strong values
- Add an Nginx reverse proxy with TLS (see the Ollama + Open WebUI guide for the Nginx config)
- Configure Docker resource limits to prevent OOM kills
- Set up automated backups for the Flowise data volume
- Test with adversarial prompts to verify the system prompt holds
- Monitor GPU memory and set `OLLAMA_MAX_LOADED_MODELS=1` if tight on VRAM
What to Build Next
Once you have Flowise + Ollama running:
- Multi-source RAG: Combine PDFs, web pages, and database content in a single chatbot. Flowise supports mixing document loader types in one flow.
- Agent with tools: Build an Agentflow that can search the web, query databases, and calculate — all locally with Ollama.
- Custom embed widget: Style the Flowise chat widget to match your brand. The embed script accepts full theme customization.
- Feedback loop: Use Flowise's built-in feedback collection to identify questions your chatbot cannot answer, then add those documents.
For the underlying RAG architecture in detail, see the RAG local setup guide. To add your Flowise chatbot into broader automation workflows, connect it to n8n via the Flowise API.
Visit flowiseai.com for the official documentation and community templates.
Need help picking the right local model for your chatbot? See the best Ollama models guide. For a different approach to self-hosted AI chat, check the AnythingLLM setup guide.