Builder Guide

Flowise + Ollama: Build AI Chatbots Visually

April 10, 2026
19 min read
Local AI Master Research Team


Flowise is the tool I wish existed two years ago. It is a visual drag-and-drop builder for LLM applications — RAG chatbots, AI agents, document Q&A systems — that actually works in production. Connect it to Ollama and you have a complete AI application platform running on your hardware with no API costs, no coding required for basic setups, and a REST API you can embed anywhere.

I built a customer-facing documentation chatbot with Flowise + Ollama in about 90 minutes. It indexes 200 pages of product docs, answers questions accurately, and serves ~300 queries/day on a machine with an RTX 3060. Here is how to replicate that.


What you will build:

  • Flowise + Ollama running in Docker
  • A RAG chatbot that answers questions from your documents
  • A conversational agent with memory
  • API endpoint you can embed in any website or app
  • Comparison with Dify, n8n, and LangChain for choosing the right tool

Prerequisites:

  • Docker installed on your machine
  • 8GB+ RAM (16GB recommended)
  • Optional: NVIDIA GPU for faster inference
  • Documents you want to make searchable (PDFs, text files, web pages)

For the foundational RAG concepts, see the RAG local setup guide. If you prefer automation workflows over chatbots, the n8n + Ollama guide covers that angle.

Table of Contents

  1. What is Flowise
  2. Docker Setup: Flowise + Ollama
  3. Flowise Interface Overview
  4. Build a Simple Chatbot (No RAG)
  5. Build a RAG Chatbot Step-by-Step
  6. Embedding Models and Vector Stores
  7. Adding Conversation Memory
  8. Deploy via API
  9. Flowise vs Dify vs n8n vs LangChain
  10. Troubleshooting

What is Flowise {#what-is-flowise}

Flowise is an open-source low-code platform for building LLM applications. It gives you a canvas where you drag nodes (LLMs, vector stores, document loaders, memory modules, tools) and connect them with wires. What you build visually compiles into a LangChain pipeline that runs as a REST API.

Why Flowise, not raw code?

If you are a developer, you might wonder why not just write LangChain/LlamaIndex code directly. Fair question. Here is where Flowise earns its keep:

  • Speed: A RAG chatbot that would take 200 lines of Python comes together in about 10 minutes of drag-and-drop in Flowise.
  • Iteration: Swap embedding models, change chunk sizes, switch vector stores — all without touching code. Just reconnect nodes.
  • Non-developers: Hand it to a product manager or support lead. They can tune prompts and add documents without deploying code.
  • Built-in API: Every chatflow automatically gets a REST endpoint. No Flask/FastAPI boilerplate.

The tradeoff: Flowise is less flexible than raw code for complex custom logic. When you need 15-step agent chains with custom tool implementations, write code. For the 80% of use cases that are "load documents → chunk → embed → retrieve → answer," Flowise is faster and easier to maintain.


Docker Setup: Flowise + Ollama {#docker-setup}

Docker Compose File

Create a project directory:

mkdir -p ~/flowise-ollama && cd ~/flowise-ollama

Then save the following as docker-compose.yml:

version: "3.8"

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # Remove this deploy block on CPU-only machines without an NVIDIA GPU
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    environment:
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_FLASH_ATTENTION=1
      - OLLAMA_KEEP_ALIVE=24h

  flowise:
    image: flowiseai/flowise:latest
    container_name: flowise
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - flowise_data:/root/.flowise
    environment:
      - FLOWISE_USERNAME=admin
      - FLOWISE_PASSWORD=change-this-password
      - APIKEY_PATH=/root/.flowise
      - SECRETKEY_PATH=/root/.flowise
      - LOG_LEVEL=info
    depends_on:
      - ollama

volumes:
  ollama_data:
  flowise_data:

Launch and Prepare

# Start both services
docker compose up -d

# Pull models for chat and embedding
docker exec ollama ollama pull llama3.2
docker exec ollama ollama pull nomic-embed-text

# Verify
docker compose ps
# flowise should be at http://localhost:3000
# ollama API at http://localhost:11434

Important model choices:

  • llama3.2 — the chat model that generates responses (3B parameters in the default tag, needs roughly 3GB VRAM)
  • nomic-embed-text — the embedding model that converts text to vectors for search (137M parameters, needs ~300MB VRAM)

You need both. The chat model talks to users. The embedding model powers the search. They serve completely different functions.


Flowise Interface Overview {#interface-overview}

Open http://localhost:3000 and log in with the credentials from your Docker Compose file.

Key Areas

  • Chatflows: The main workspace. Each chatflow is a visual pipeline that becomes an API.
  • Agentflows: Similar to chatflows but designed for multi-step agent workflows with tool use.
  • Marketplaces: Pre-built templates you can import and customize.
  • Tools: Custom tools (API calls, code execution) that agents can use.
  • Credentials: API keys and connection strings (including your Ollama endpoint).
  • Document Stores: Manage uploaded documents and their vector embeddings.

Setting Up Ollama as a Credential

Before building anything:

  1. Click Credentials in the left sidebar
  2. Click Add Credential
  3. Search for "ChatOllama" → Add New
  4. Base URL: http://ollama:11434 (Docker service name, not localhost)
  5. Save

Repeat for "Ollama Embeddings":

  1. Add Credential → search "Ollama Embeddings"
  2. Base URL: http://ollama:11434
  3. Save

Build a Simple Chatbot (No RAG) {#simple-chatbot}

Start simple. A basic chatbot validates that your Ollama connection works before adding complexity.

Step-by-Step

  1. Click ChatflowsAdd New
  2. Name it "Basic Ollama Chat"
  3. From the left panel, drag these nodes onto the canvas:
    • ChatOllama (under Chat Models)
    • Conversation Chain (under Chains)
  4. Connect ChatOllamaConversation Chain (into the "Chat Model" input)
  5. Configure ChatOllama:
    • Base URL: http://ollama:11434
    • Model Name: llama3.2
    • Temperature: 0.7
  6. Click Save then use the chat panel on the right to test

Type a message. If you get a response, Ollama is connected and working. Response time on an RTX 3060 with Llama 3.2: 2-4 seconds for a typical reply.

Adding a System Prompt

Drag a System Message node and connect it to the Conversation Chain's "System Message" input:

You are a helpful assistant for [Your Company]. You answer questions about our products and services. Be concise and accurate. If you don't know something, say so — never make up information.

Build a RAG Chatbot Step-by-Step {#rag-chatbot}

This is the main event. A RAG (Retrieval-Augmented Generation) chatbot that answers questions by searching your documents first, then generating responses based on what it finds.

Architecture

[User Question]
       ↓
[Embedding Model] → converts question to vector
       ↓
[Vector Store] → finds similar document chunks
       ↓
[Retrieved Context + Question]
       ↓
[Chat Model] → generates answer using context
       ↓
[Response to User]
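The same flow can be sketched in a few lines of Python. This is a toy illustration only — the "embedding" is a bag-of-words count vector, where a real pipeline would call nomic-embed-text through Ollama and use a proper vector store:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector, standing in for
    # the dense vectors a real embedding model would produce.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Index: embed each document chunk
chunks = [
    "Returns are accepted within 30 days of purchase.",
    "Shipping takes 3-5 business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Retrieve: embed the question, pick the most similar chunk
question = "Are returns accepted after purchase?"
best_chunk, _ = max(index, key=lambda item: cosine(embed(question), item[1]))

# 3. Generate: the chat model receives the retrieved context + question
prompt = f"Context: {best_chunk}\n\nQuestion: {question}"
```

The question about returns retrieves the returns chunk, not the shipping chunk — that retrieval step is what keeps the chat model grounded in your documents.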

Step-by-Step Build

1. Create a new Chatflow

  • Name: "Documentation RAG Bot"

2. Drag these nodes onto the canvas:

  • ChatOllama (Chat Models)
  • Ollama Embeddings (Embeddings)
  • Conversational Retrieval QA Chain (Chains)
  • In-Memory Vector Store (Vector Stores) — or Chroma/Qdrant for persistence
  • Document Loaders — choose based on your documents:
    • PDF File for PDFs
    • Text File for .txt files
    • Cheerio Web Scraper for web pages
  • Recursive Character Text Splitter (Text Splitters)

3. Connect the nodes:

[PDF File] → [Text Splitter] → [In-Memory Vector Store]
[Ollama Embeddings] → [In-Memory Vector Store]
[In-Memory Vector Store] → [Conversational Retrieval QA Chain]
[ChatOllama] → [Conversational Retrieval QA Chain]

4. Configure each node:

ChatOllama:

  • Model: llama3.2
  • Temperature: 0.3 (low for factual accuracy)
  • Base URL: http://ollama:11434

Ollama Embeddings:

  • Model: nomic-embed-text
  • Base URL: http://ollama:11434

Recursive Character Text Splitter:

  • Chunk Size: 1000
  • Chunk Overlap: 200

These numbers matter. 1000 characters per chunk gives the model enough context without overwhelming it. A 200-character overlap means a sentence split at a chunk boundary still appears intact in the neighboring chunk. For dense technical documents, increase chunk size to 1500. For conversational content, 800 works well.
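The effect of chunk size and overlap is easier to see in code. Here is a simplified sliding-window sketch of what a splitter does — Flowise's Recursive Character Text Splitter additionally prefers to break on paragraph and sentence boundaries, so real chunk sizes vary:

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    # Simplified sliding-window splitter: each new chunk starts
    # (chunk_size - chunk_overlap) characters after the previous one.
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 2,500 characters of sample text
doc = "".join(str(i % 10) for i in range(2500))
chunks = split_text(doc)
# Produces 3 chunks; the last 200 characters of each chunk reappear
# at the start of the next one.
```

With the defaults above, a 2,500-character document becomes three chunks of 1000, 1000, and 900 characters, with 200-character overlaps between neighbors.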

PDF File:

  • Upload your PDF documents directly in the node configuration

5. Save and test

Upload a document and ask a question about its contents. The chatbot should quote or paraphrase information from the document rather than making up an answer.

Improving Accuracy

If the chatbot gives vague or incorrect answers:

  • Increase chunk overlap to 300 — reduces the chance a relevant passage is split across chunk boundaries
  • Lower temperature to 0.1 — makes the model stick closer to retrieved context
  • Use a larger modelllama3.1:8b follows retrieved context better than the 3B llama3.2
  • Add more documents — RAG accuracy improves with more source material
  • Tighten the prompt — add "Only answer based on the provided context" to the system message

Embedding Models and Vector Stores {#embeddings}

Embedding Models Available in Ollama

| Model | Dimensions | Size | Speed | Quality |
|---|---|---|---|---|
| nomic-embed-text | 768 | 274MB | Fast | Good |
| mxbai-embed-large | 1024 | 670MB | Medium | Better |
| all-minilm | 384 | 46MB | Fastest | Acceptable |
| snowflake-arctic-embed | 1024 | 670MB | Medium | Better |

# Pull embedding models
docker exec ollama ollama pull nomic-embed-text
docker exec ollama ollama pull mxbai-embed-large

For most use cases, nomic-embed-text hits the sweet spot of speed and quality. It produces 768-dimensional vectors and processes ~500 text chunks per minute on an RTX 3060.

If you need higher retrieval accuracy for technical or domain-specific content, mxbai-embed-large at 1024 dimensions performs noticeably better at the cost of 2x the processing time and storage.
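The storage side of that tradeoff is easy to estimate. A back-of-envelope calculation, assuming float32 vectors and ignoring index overhead (which varies by vector store):

```python
# Rough vector storage: dimensions x 4 bytes (float32) per chunk.
def vector_store_bytes(dimensions, num_chunks):
    return dimensions * 4 * num_chunks

num_chunks = 10_000  # roughly a few hundred PDFs at 1,000-char chunks
nomic_bytes = vector_store_bytes(768, num_chunks)   # ~31 MB
mxbai_bytes = vector_store_bytes(1024, num_chunks)  # ~41 MB
```

Even at 10,000 chunks, raw vector storage stays small for either model — the real cost of the larger model is embedding time, not disk.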

Vector Store Options

In-Memory (default):

  • No setup required
  • Data lost on restart
  • Fine for testing and small document sets (<100 pages)

Chroma (recommended for production):

Add Chroma to your Docker Compose:

  chroma:
    image: chromadb/chroma:latest
    container_name: chroma
    restart: unless-stopped
    ports:
      - "8000:8000"
    volumes:
      - chroma_data:/chroma/chroma

In Flowise, replace In-Memory Vector Store with Chroma:

  • Collection Name: my-docs
  • Chroma URL: http://chroma:8000

Qdrant (for larger deployments):

  qdrant:
    image: qdrant/qdrant:latest
    container_name: qdrant
    restart: unless-stopped
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

Qdrant handles millions of vectors efficiently and supports filtering, which Chroma does not do as well at scale.

For a deeper comparison of vector databases, see the vector databases comparison guide.


Adding Conversation Memory {#memory}

Without memory, each message is independent — the chatbot forgets the previous exchange. Adding memory makes it conversational.

Buffer Memory

Drag a Buffer Memory node and connect it to the Conversational Retrieval QA Chain's "Memory" input.

Configuration:

  • Memory Key: chat_history
  • Session ID: leave blank (auto-generated per session)
  • Memory Size: 5 (remembers last 5 exchanges)

This keeps the last 5 question-answer pairs in context. More than 5 starts consuming significant context window space, which reduces the room available for retrieved documents.
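Conceptually, Buffer Memory is just a bounded queue of exchanges. A minimal sketch (class and method names here are illustrative, not Flowise internals):

```python
from collections import deque

class BufferMemory:
    """Keeps only the last `size` question-answer exchanges,
    mirroring a Buffer Memory node with Memory Size = 5."""

    def __init__(self, size=5):
        self.exchanges = deque(maxlen=size)  # old exchanges auto-evicted

    def add(self, question, answer):
        self.exchanges.append((question, answer))

    def as_context(self):
        # Rendered into the prompt ahead of the new question
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.exchanges)

memory = BufferMemory(size=5)
for i in range(7):
    memory.add(f"question {i}", f"answer {i}")
# Only exchanges 2..6 remain; 0 and 1 were evicted.
```

This also shows why memory size trades off against retrieval: everything `as_context()` returns is prepended to the prompt and competes with retrieved chunks for context window space.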

DynamoDB / Redis Memory (for persistence)

If your chatbot serves multiple users and you want conversation history to survive restarts:

  redis:
    image: redis:7-alpine
    container_name: redis
    restart: unless-stopped
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

In Flowise, use the Redis-Backed Chat Memory node instead of Buffer Memory.


Deploy via API {#api-deployment}

Every Flowise chatflow automatically gets a REST API. This is how you embed the chatbot in your website, mobile app, or any other application.

Get the API Endpoint

  1. Open your chatflow
  2. Click the </> icon (top right, "API Endpoint")
  3. Copy the endpoint URL: http://localhost:3000/api/v1/prediction/<chatflow-id>

Test with cURL

curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is your return policy?"}'

Response:

{
  "text": "Based on our documentation, the return policy allows returns within 30 days of purchase...",
  "sourceDocuments": [
    {
      "pageContent": "Returns are accepted within 30 days...",
      "metadata": { "source": "return-policy.pdf", "page": 3 }
    }
  ]
}
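If you consume the endpoint from Python, a small helper can pull out the answer text and its cited sources. The function name is mine; the payload below is the sample response shown above:

```python
import json

def parse_prediction(raw):
    """Extract the answer and source filenames from a Flowise
    prediction response with the shape shown above."""
    data = json.loads(raw)
    sources = [
        doc.get("metadata", {}).get("source", "unknown")
        for doc in data.get("sourceDocuments", [])
    ]
    return data["text"], sources

raw = '''{
  "text": "Based on our documentation, the return policy allows returns within 30 days of purchase...",
  "sourceDocuments": [
    {"pageContent": "Returns are accepted within 30 days...",
     "metadata": {"source": "return-policy.pdf", "page": 3}}
  ]
}'''

answer, sources = parse_prediction(raw)
# sources lists the documents the answer was grounded in
```

Surfacing `sourceDocuments` in your UI is worth the extra few lines — users trust answers far more when they can see which file and page they came from.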

Embed in a Website

Flowise generates an embeddable chat widget:

  1. In the chatflow, click the Share icon
  2. Copy the embed code:
<script type="module">
  import Chatbot from 'https://cdn.jsdelivr.net/npm/flowise-embed/dist/web.js'
  Chatbot.init({
    chatflowid: '<your-chatflow-id>',
    apiHost: 'http://your-server:3000',
    theme: {
      button: { backgroundColor: '#3B81F6' },
      chatWindow: { title: 'Documentation Assistant' }
    }
  })
</script>

Secure the API

# In Flowise, go to the chatflow → Settings → API Key
# Enable "Require API Key" and generate one

# Then include it in requests:
curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"question": "What is your return policy?"}'

Flowise vs Dify vs n8n vs LangChain {#comparison}

These tools overlap but serve different primary purposes. Picking the wrong one costs you time.

| | Flowise | Dify | n8n | LangChain (code) |
|---|---|---|---|---|
| Primary use | RAG chatbots, AI apps | AI app platform | Workflow automation | Any LLM application |
| Interface | Visual flow builder | Web UI + visual | Visual workflow editor | Python/JS code |
| Ollama support | Native node | Plugin | Native node (v1.25+) | Direct integration |
| RAG built-in | Yes, multiple vector stores | Yes, built-in knowledge base | Basic (Code node) | Yes (LlamaIndex) |
| API output | Automatic REST API | Automatic REST API | Webhook-based | Manual (Flask/FastAPI) |
| Best for | Building chatbots fast | Full AI app platform | Automating business processes | Custom AI applications |
| Learning curve | Low | Low-Medium | Low | High |
| Self-hosted | Yes, Docker | Yes, Docker | Yes, Docker | N/A (it is a library) |
| License | Apache 2.0 | Custom (restrictive) | Fair-code (restrictions) | MIT |

My recommendation:

  • Building a chatbot or document Q&A system? Use Flowise. It is purpose-built for this.
  • Automating business workflows with AI? Use n8n. See the n8n + Ollama guide.
  • Building a full AI application platform for a team? Dify has more features but a more restrictive license.
  • Need maximum flexibility and you know Python? Use LangChain directly.

You can run Flowise and n8n side-by-side. They solve different problems. Flowise builds the AI brain (chatbot). n8n handles the plumbing (triggers, integrations, scheduling).


Troubleshooting {#troubleshooting}

Flowise cannot connect to Ollama

# Test connectivity from the Flowise container
docker exec flowise wget -qO- http://ollama:11434/api/version

# If it fails, check that both containers are on the same Docker network
docker network ls
docker network inspect flowise-ollama_default

# Verify Ollama is running
docker exec ollama ollama list

Embedding fails with "model not found"

# You need to pull the embedding model separately
docker exec ollama ollama pull nomic-embed-text

# Verify it is available
docker exec ollama ollama list
# Should show nomic-embed-text in the list

RAG returns irrelevant answers

Common causes:

  1. Chunk size too large: Reduce from 1000 to 500-800 characters
  2. Wrong embedding model: Switch from all-minilm to nomic-embed-text for better quality
  3. Too few documents: RAG works better with more source material
  4. Temperature too high: Lower to 0.1-0.3 for factual tasks
  5. Retrieved chunks insufficient: Increase "Top K" from 4 to 6-8 in the retriever settings

Flowise crashes with large PDFs

# Increase Node.js memory limit
# In docker-compose.yml, under flowise environment:
environment:
  - NODE_OPTIONS=--max-old-space-size=4096

Slow response times

# Check GPU utilization
nvidia-smi

# If GPU is not being used, verify Ollama has GPU access
docker exec ollama nvidia-smi

# For CPU-only: use a smaller model
docker exec ollama ollama pull phi3:mini

# Reduce chunk count retrieved (Top K = 3 instead of 6)

Chat history disappears on restart

The default In-Memory Vector Store and Buffer Memory are ephemeral. Switch to:

  • Chroma or Qdrant for vector store persistence
  • Redis-backed memory for conversation history persistence

Both require adding the respective service to your Docker Compose file (see sections above).


Production Deployment Checklist

Before exposing your Flowise chatbot to users:

  • Switch from In-Memory to Chroma or Qdrant vector store
  • Enable API key authentication on the chatflow
  • Set FLOWISE_USERNAME and FLOWISE_PASSWORD to strong values
  • Add Nginx reverse proxy with TLS (see the Ollama + Open WebUI guide for Nginx config)
  • Configure Docker resource limits to prevent OOM kills
  • Set up automated backups for the Flowise data volume
  • Test with adversarial prompts to verify the system prompt holds
  • Monitor GPU memory and set OLLAMA_MAX_LOADED_MODELS=1 if tight on VRAM
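For the reverse-proxy item, a minimal Nginx sketch looks like the following. The domain and certificate paths are placeholders — substitute your own, and see the Ollama + Open WebUI guide for a fuller config:

```nginx
server {
    listen 443 ssl;
    server_name chat.example.com;  # placeholder domain

    # Placeholder cert paths (e.g. from Let's Encrypt)
    ssl_certificate     /etc/letsencrypt/live/chat.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.example.com/privkey.pem;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        # Flowise can stream responses; avoid buffering them
        proxy_buffering off;
        proxy_read_timeout 300s;
    }
}
```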

What to Build Next

Once you have Flowise + Ollama running:

  • Multi-source RAG: Combine PDFs, web pages, and database content in a single chatbot. Flowise supports mixing document loader types in one flow.
  • Agent with tools: Build an Agentflow that can search the web, query databases, and calculate — all locally with Ollama.
  • Custom embed widget: Style the Flowise chat widget to match your brand. The embed script accepts full theme customization.
  • Feedback loop: Use Flowise's built-in feedback collection to identify questions your chatbot cannot answer, then add those documents.

For the underlying RAG architecture in detail, see the RAG local setup guide. To add your Flowise chatbot into broader automation workflows, connect it to n8n via the Flowise API.

Visit flowiseai.com for the official documentation and community templates.


Need help picking the right local model for your chatbot? See the best Ollama models guide. For a different approach to self-hosted AI chat, check the AnythingLLM setup guide.

Written by Pattanaik Ramswarup
AI Engineer & Dataset Architect