Developer Tutorial

Build a Local AI Slack & Discord Bot with Ollama (Full Tutorial)

April 23, 2026
19 min read
Local AI Master Research Team


The pitch deck for cloud chatbots is always the same: connect your team chat, get an AI assistant. The fine print is also always the same: the assistant reads your messages, your DMs, your customer data, your internal docs, and the vendor stores it long enough to "improve the service." For most engineering teams that is a non-starter.

I built the first version of this bot for a startup whose CTO had a simple request: "Give my team an AI in Slack that doesn't leak our roadmap." The first iteration took an afternoon. The version this guide describes — with RAG over a private docs folder, per-channel rate limiting, threaded replies, and slash commands — took about two days. It has been running on a $40/month Hetzner box for nine months and processed roughly 180,000 messages.

This is the production blueprint. Both Slack and Discord. Real Python code that you can drop into a repo and run today.


Quick Start: 8 Minutes to a Working Bot {#quick-start}

If you just want to see a bot reply in your channel:

# Prerequisites: Python 3.11+, Ollama already installed
ollama pull llama3.1:8b

# Slack version
pip install slack-bolt ollama python-dotenv
export SLACK_BOT_TOKEN=xoxb-...
export SLACK_APP_TOKEN=xapp-...
python slack_bot.py

# Or Discord version
pip install discord.py ollama python-dotenv
export DISCORD_TOKEN=...
python discord_bot.py

The full Slack bot below is under a hundred lines of Python. By the end of this guide you will have something a real team can use without you babysitting it.


Table of Contents

  1. Why Self-Host Your Team AI Bot
  2. Architecture Overview
  3. Hardware & Hosting
  4. Ollama Setup for Bot Workloads
  5. Slack Bot — Full Implementation
  6. Discord Bot — Full Implementation
  7. Adding RAG over Team Docs
  8. Slash Commands & Tools
  9. Rate Limiting & Cost Control
  10. Production Deployment
  11. Pitfalls We Hit (So You Do Not)
  12. FAQs

Why Self-Host Your Team AI Bot {#why-self-host}

Three concrete reasons that came up in customer interviews:

1. Channel content is sensitive by default. Engineering, security, finance, and exec channels routinely contain credentials, customer names, and unannounced product details. Sending them to OpenAI or Anthropic — even with their enterprise privacy promises — creates an audit trail you do not want.

2. Cloud per-message pricing punishes adoption. A 200-person team that adopts an AI bot easily generates 50,000 messages a month routed through the LLM. At GPT-4o pricing that is $300-800/month. A self-hosted bot on a $40 GPU VPS handles the same traffic for the cost of electricity.

3. RAG over private docs needs to stay private. The most useful team bots answer "what does our pricing tier contain?" or "where is the runbook for X service?" That requires indexing internal docs. Cloud RAG means uploading your wiki to a third party. Local RAG keeps it on your hardware.

A well-built local bot is also faster: 200-400ms latency to first token versus 600-1200ms for cloud APIs because the network round trip drops out.


Architecture Overview {#architecture}

┌──────────┐   websocket    ┌────────────┐   HTTP    ┌────────┐
│  Slack   │◄──────────────►│  Bot Proc  │──────────►│ Ollama │
│ Discord  │   socket mode  │  (Python)  │   :11434  │  LLM   │
└──────────┘                └────────────┘           └────────┘
                                  │
                                  ▼
                            ┌─────────────┐
                            │  ChromaDB   │  ← team docs RAG
                            │  (local)    │
                            └─────────────┘

Three components:

  1. Bot process — Python event loop that listens to Slack/Discord events and orchestrates responses.
  2. Ollama — Local LLM server. Same machine or a different one on your network.
  3. ChromaDB (optional) — Vector store for RAG. Docker container on the same host.

No public ingress required. Slack uses Socket Mode and Discord uses websockets, so your bot connects out — no inbound port exposure.


Hardware & Hosting {#hardware}

What you actually need depends on team size:

Team Size    Avg msgs/day     Concurrent    Hardware                      Model
1-10         <500             1-2           16 GB RAM, integrated GPU     llama3.1:8b
10-50        1,000-5,000      3-5           32 GB RAM, RTX 3060 12 GB     qwen2.5:7b
50-200       5,000-20,000     5-15          RTX 4060 Ti 16 GB             qwen2.5:14b
200-1,000    20,000+          15-40         RTX 4090 or 2× RTX 3090       llama3.3:70b-q4

Concrete VPS picks that work:

  • Hetzner GEX44 (RTX 4000 Ada, 16 GB VRAM, 64 GB RAM): €184/month — fits 50-200 user teams
  • Vast.ai RTX 3090: $0.20-0.40/hr on-demand
  • Self-host on a NUC + eGPU: ~$1,500 one-time, runs forever

For team deployments behind your firewall, see Ollama production deployment.


Ollama Setup for Bot Workloads {#ollama-setup}

Default Ollama settings are tuned for single-user laptops. For a bot serving multiple concurrent users, change three things:

# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_NUM_PARALLEL=4"            # 4 concurrent requests
Environment="OLLAMA_MAX_LOADED_MODELS=2"       # keep 2 models hot
Environment="OLLAMA_KEEP_ALIVE=24h"            # don't unload between messages
Environment="OLLAMA_HOST=0.0.0.0:11434"        # if bot is on different host

Then:

sudo systemctl daemon-reload
sudo systemctl restart ollama
ollama pull llama3.1:8b
ollama pull nomic-embed-text   # for RAG

Verify it's serving:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Say hello in 5 words"
}'

Slack Bot — Full Implementation {#slack-bot}

Step 1: Create the Slack App

  1. Go to api.slack.com/apps → Create New App → From scratch
  2. Socket Mode: enable it (you don't need a public URL)
  3. OAuth Scopes (Bot Token Scopes): app_mentions:read, chat:write, channels:history, im:history, im:write, commands
  4. Event Subscriptions → enable → subscribe to app_mention, message.im
  5. Slash Commands → create /ai with description "Ask the local AI"
  6. Install to Workspace — copy the Bot Token (starts with xoxb-)
  7. Basic Information → App-Level Tokens → generate one with connections:write scope (starts with xapp-)

Step 2: The Bot Code

# slack_bot.py
import os
import re
import logging
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
import ollama

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("slack_bot")

OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.1:8b")

app = App(token=os.environ["SLACK_BOT_TOKEN"])
client = ollama.Client(host=OLLAMA_HOST)

SYSTEM_PROMPT = (
    "You are a helpful assistant for an engineering team in Slack. "
    "Be concise. Format code in fenced blocks. "
    "If you do not know, say so. Do not invent facts about the team."
)

def strip_mention(text: str) -> str:
    return re.sub(r"<@[A-Z0-9]+>", "", text).strip()

def ask_ollama(prompt: str, history: list[dict]) -> str:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history[-10:])  # keep last 10 turns
    messages.append({"role": "user", "content": prompt})
    resp = client.chat(model=MODEL, messages=messages,
                       options={"temperature": 0.4, "num_predict": 800})
    return resp["message"]["content"].strip()

# Per-thread conversation memory (production: use Redis)
THREAD_HISTORY: dict[str, list[dict]] = {}

@app.event("app_mention")
def on_mention(event, say, client):  # Bolt injects the Slack WebClient as `client`
    text = strip_mention(event["text"])
    thread_ts = event.get("thread_ts") or event["ts"]
    history = THREAD_HISTORY.setdefault(thread_ts, [])

    # Show a "thinking" reaction while the model runs
    client.reactions_add(channel=event["channel"], name="hourglass_flowing_sand", timestamp=event["ts"])

    try:
        answer = ask_ollama(text, history)
        history.append({"role": "user", "content": text})
        history.append({"role": "assistant", "content": answer})
        say(text=answer, thread_ts=thread_ts)
    except Exception as e:
        log.exception("ollama error")
        say(text=f"Sorry — backend error: {e}", thread_ts=thread_ts)
    finally:
        client.reactions_remove(channel=event["channel"], name="hourglass_flowing_sand", timestamp=event["ts"])

@app.command("/ai")
def slash_ai(ack, respond, command):
    ack()
    prompt = command["text"]
    if not prompt:
        respond("Usage: /ai <your question>")
        return
    answer = ask_ollama(prompt, [])
    respond(text=answer, response_type="in_channel")

@app.event("message")
def on_dm(event, say):
    # Only respond to DMs; channel messages are handled by app_mention
    if event.get("channel_type") != "im":
        return
    if event.get("subtype") or event.get("bot_id"):
        return  # ignore message edits, joins, and other bots (including ourselves)
    text = event.get("text", "").strip()
    if not text:
        return
    history = THREAD_HISTORY.setdefault(event["channel"], [])
    answer = ask_ollama(text, history)
    history.append({"role": "user", "content": text})
    history.append({"role": "assistant", "content": answer})
    say(answer)

if __name__ == "__main__":
    handler = SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"])
    log.info("Slack bot starting...")
    handler.start()

That's a fully functional Slack bot: @bot what does our deploy script do? works in any channel the bot has been added to, the /ai slash command works anywhere, and DMs work too.
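The in-memory THREAD_HISTORY dict is the piece that will not survive a restart, and it grows without bound (pitfall #5 below). A minimal sketch of the Redis-backed version the code comment suggests — get_history, save_turn, and RedisLikeStore are names introduced here, and the store class is an in-memory stand-in so the snippet runs without a live server; in production you would use redis.Redis(decode_responses=True) in its place:

```python
import json
import time

class RedisLikeStore:
    """In-memory stand-in exposing the two redis-py calls used below.
    Production: replace with redis.Redis(decode_responses=True)."""
    def __init__(self):
        self._data: dict[str, tuple[str, float]] = {}

    def get(self, key: str):
        hit = self._data.get(key)
        return hit[0] if hit and hit[1] > time.time() else None

    def setex(self, key: str, ttl: int, value: str):
        self._data[key] = (value, time.time() + ttl)

store = RedisLikeStore()
HISTORY_TTL = 24 * 3600  # expire idle threads after a day (see pitfall #5)

def get_history(thread_ts: str) -> list[dict]:
    raw = store.get(f"hist:{thread_ts}")
    return json.loads(raw) if raw else []

def save_turn(thread_ts: str, user_msg: str, assistant_msg: str) -> None:
    history = get_history(thread_ts)
    history.append({"role": "user", "content": user_msg})
    history.append({"role": "assistant", "content": assistant_msg})
    # keep only the last 10 turns (20 messages) to bound prompt size
    store.setex(f"hist:{thread_ts}", HISTORY_TTL, json.dumps(history[-20:]))
```

With this in place, the handlers call get_history(thread_ts) instead of THREAD_HISTORY.setdefault and save_turn after each answer; the TTL then handles cleanup for you.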


Discord Bot — Full Implementation {#discord-bot}

Step 1: Create the Discord App

  1. Discord Developer Portal → New Application
  2. Bot tab → Reset Token → copy it (this is your DISCORD_TOKEN)
  3. Bot tab → enable Message Content Intent (required to read message text)
  4. OAuth2 → URL Generator → scopes: bot, applications.commands → permissions: Send Messages, Read Messages, Add Reactions, Use Slash Commands → invite the bot to your server

Step 2: The Bot Code

# discord_bot.py
import os
import logging
import asyncio
import discord
from discord import app_commands
import ollama

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("discord_bot")

OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.1:8b")

intents = discord.Intents.default()
intents.message_content = True
client_d = discord.Client(intents=intents)
tree = app_commands.CommandTree(client_d)
ollama_client = ollama.Client(host=OLLAMA_HOST)

SYSTEM_PROMPT = "You are a helpful assistant in Discord. Be concise. Use markdown."

CHANNEL_HISTORY: dict[int, list[dict]] = {}

async def ask_ollama_async(prompt: str, history: list[dict]) -> str:
    def _call():
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        messages.extend(history[-10:])
        messages.append({"role": "user", "content": prompt})
        return ollama_client.chat(model=MODEL, messages=messages,
                                  options={"temperature": 0.4, "num_predict": 800})
    resp = await asyncio.to_thread(_call)
    return resp["message"]["content"].strip()

@client_d.event
async def on_ready():
    await tree.sync()
    log.info(f"Logged in as {client_d.user}")

@client_d.event
async def on_message(message: discord.Message):
    if message.author == client_d.user or message.author.bot:
        return
    # Respond on mention or DM
    is_dm = isinstance(message.channel, discord.DMChannel)
    is_mention = client_d.user in message.mentions
    if not (is_dm or is_mention):
        return

    prompt = (message.content
              .replace(f"<@{client_d.user.id}>", "")
              .replace(f"<@!{client_d.user.id}>", "")  # nickname-mention form
              .strip())
    if not prompt:
        return

    history = CHANNEL_HISTORY.setdefault(message.channel.id, [])
    async with message.channel.typing():
        try:
            answer = await ask_ollama_async(prompt, history)
            history.append({"role": "user", "content": prompt})
            history.append({"role": "assistant", "content": answer})
            # Discord max message length is 2000 chars
            for i in range(0, len(answer), 1900):
                await message.reply(answer[i:i+1900], mention_author=False)
        except Exception as e:
            log.exception("ollama error")
            await message.reply(f"Backend error: {e}")

@tree.command(name="ai", description="Ask the local AI a question")
async def slash_ai(interaction: discord.Interaction, prompt: str):
    await interaction.response.defer()
    answer = await ask_ollama_async(prompt, [])
    for i in range(0, len(answer), 1900):
        await interaction.followup.send(answer[i:i+1900])

if __name__ == "__main__":
    client_d.run(os.environ["DISCORD_TOKEN"])

Mention the bot or DM it — it replies. /ai prompt slash command works server-wide. The 1900-char chunking handles long responses (Discord limits messages to 2000 chars).
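One refinement worth making: a hard answer[i:i+1900] slice can cut a fenced code block or even a word in half. A splitter that prefers newline boundaries avoids most of that — a sketch, with split_message as a helper name introduced here:

```python
def split_message(text: str, limit: int = 1900) -> list[str]:
    """Split text into Discord-sized chunks, preferring newline boundaries."""
    chunks = []
    while len(text) > limit:
        # use the last newline inside the window; fall back to a hard cut
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

Then the range loops in on_message and slash_ai become `for part in split_message(answer): ...`.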


Adding RAG over Team Docs {#rag}

This is the killer feature. The bot answers questions from your internal docs, runbooks, and wikis instead of generic training data.

Step 1: Run ChromaDB

docker run -d -p 8000:8000 -v chroma-data:/chroma/chroma --name chroma chromadb/chroma:latest

Step 2: Index Your Docs

# index_docs.py
import glob
import chromadb
import ollama

ollama_client = ollama.Client(host="http://localhost:11434")
chroma = chromadb.HttpClient(host="localhost", port=8000)
coll = chroma.get_or_create_collection(name="team_docs")

def embed(text: str):
    return ollama_client.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def chunk(text: str, size: int = 800, overlap: int = 100):
    chunks = []
    for i in range(0, len(text), size - overlap):
        chunks.append(text[i:i+size])
    return chunks

for path in glob.glob("./docs/**/*.md", recursive=True):
    with open(path) as f:
        content = f.read()
    for i, ch in enumerate(chunk(content)):
        coll.upsert(
            ids=[f"{path}:{i}"],
            documents=[ch],
            embeddings=[embed(ch)],
            metadatas=[{"source": path, "chunk": i}],
        )
print("Indexed all docs.")

Run python index_docs.py whenever your docs change. For automatic re-indexing on file changes, wrap it in watchdog.

Step 3: RAG-Enabled Chat Function

Replace the ask_ollama function in either bot:

def ask_with_rag(prompt: str, history: list[dict]) -> str:
    q_embedding = ollama_client.embeddings(model="nomic-embed-text", prompt=prompt)["embedding"]
    results = coll.query(query_embeddings=[q_embedding], n_results=5)
    context = "\n\n---\n\n".join(results["documents"][0])

    augmented_system = (
        SYSTEM_PROMPT + "\n\n"
        "Use the following context from internal team docs to answer. "
        "If the answer is not in the context, say so plainly.\n\n"
        f"CONTEXT:\n{context}"
    )
    messages = [{"role": "system", "content": augmented_system}]
    messages.extend(history[-6:])  # shorter history when context is large
    messages.append({"role": "user", "content": prompt})
    resp = ollama_client.chat(model=MODEL, messages=messages,
                              options={"temperature": 0.2, "num_predict": 800})
    return resp["message"]["content"].strip()

Now @bot what is our deploy procedure? returns answers grounded in your actual runbook. For deeper RAG tuning, see RAG local setup guide.
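One cheap trust-builder: the query results already carry the source paths in their metadata, so you can append a source footer to every RAG answer. A sketch — format_sources is a helper name introduced here:

```python
def format_sources(metadatas: list[dict]) -> str:
    """Render a deduplicated source footer from ChromaDB query metadata."""
    seen, sources = set(), []
    for meta in metadatas:
        src = meta.get("source")
        if src and src not in seen:
            seen.add(src)
            sources.append(src)
    if not sources:
        return ""
    return "\n\n_Sources: " + ", ".join(sources) + "_"

# In ask_with_rag, after the coll.query call:
#   answer += format_sources(results["metadatas"][0])
```

Users who can see the answer came from ./docs/runbooks/deploy.md stop asking "did it make this up?" — and they file doc fixes when the source is stale.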


Slash Commands & Tools {#slash-commands}

Add structured commands beyond /ai:

# Slack
@app.command("/summarize")
def summarize_thread(ack, respond, command, client):
    ack()
    channel = command["channel_id"]
    # Fetch the last 50 messages (Slack returns newest first, so reverse them)
    history = client.conversations_history(channel=channel, limit=50)
    text = "\n".join(m["text"] for m in reversed(history["messages"]) if "text" in m)
    summary = ask_ollama(f"Summarize this Slack channel in 5 bullets:\n{text}", [])
    respond(summary, response_type="in_channel")

@app.command("/translate")
def translate(ack, respond, command):
    ack()
    args = command["text"].split(" ", 1)
    if len(args) != 2:
        respond("Usage: /translate <lang_code> <text>")
        return
    lang, text = args
    answer = ask_ollama(f"Translate to {lang}: {text}", [])
    respond(answer)

The Discord equivalents use the @tree.command decorator with the same logic.
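For summarization on either platform, a transcript with author names summarizes noticeably better than raw joined text. A shared helper sketch — render_transcript is a name introduced here, and the Discord wiring in the comment assumes discord.py's channel.history, which yields newest-first by default:

```python
def render_transcript(messages: list[dict], newest_first: bool = True) -> str:
    """Flatten chat messages into 'author: text' lines in chronological order."""
    ordered = list(reversed(messages)) if newest_first else messages
    return "\n".join(f"{m.get('author', 'unknown')}: {m['text']}"
                     for m in ordered if m.get("text"))

# Discord wiring (sketch):
#   @tree.command(name="summarize", description="Summarize recent messages")
#   async def slash_summarize(interaction: discord.Interaction):
#       await interaction.response.defer()
#       msgs = [{"author": m.author.display_name, "text": m.content}
#               async for m in interaction.channel.history(limit=50)]
#       answer = await ask_ollama_async(
#           f"Summarize this channel in 5 bullets:\n{render_transcript(msgs)}", [])
#       await interaction.followup.send(answer[:1900])
```

The same helper drops into the Slack /summarize if you pull the user field from each message and resolve it to a display name.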

Useful slash commands seen in the wild:

  • /summarize — collapse a long channel into bullets
  • /translate — quick translation
  • /sql — natural language to SQL
  • /onboard — generate onboarding steps for a new team member
  • /runbook — pull a runbook by name from RAG

Rate Limiting & Cost Control {#rate-limiting}

Without limits, one curious user will lock up the bot for everyone. The pattern:

import time
from collections import defaultdict, deque

USER_REQUESTS: dict[str, deque] = defaultdict(deque)
USER_RATE_LIMIT = 10      # max requests
USER_RATE_WINDOW = 60     # per 60 seconds
GLOBAL_QUEUE_SIZE = 4     # match OLLAMA_NUM_PARALLEL

def check_rate_limit(user_id: str) -> bool:
    now = time.time()
    q = USER_REQUESTS[user_id]
    while q and q[0] < now - USER_RATE_WINDOW:
        q.popleft()
    if len(q) >= USER_RATE_LIMIT:
        return False
    q.append(now)
    return True

In the message handler:

if not check_rate_limit(event["user"]):
    say("You're sending too fast — try again in a minute.", thread_ts=thread_ts)
    return

For team-wide cost monitoring, track tokens per user in Redis or Postgres. A sensible default is to flag any user who exceeds 50,000 tokens/day for review.

For multi-user rate-limiting at the Ollama layer itself, see Ollama rate limiting for multi-user setups.


Production Deployment {#deployment}

systemd unit (Linux)

# /etc/systemd/system/ai-bot.service
[Unit]
Description=Local AI Slack/Discord bot
After=network.target ollama.service

[Service]
Type=simple
User=botuser
WorkingDirectory=/opt/ai-bot
EnvironmentFile=/opt/ai-bot/.env
ExecStart=/opt/ai-bot/venv/bin/python slack_bot.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Then enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now ai-bot
sudo journalctl -u ai-bot -f

Docker

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "slack_bot.py"]

Build and run:

docker build -t ai-bot .
docker run -d --restart always --env-file .env --name ai-bot --network host ai-bot

Monitoring

Bare minimum: log every request, response length, and latency. Plug into Prometheus for real metrics:

from prometheus_client import Counter, Histogram, start_http_server

REQ = Counter("bot_requests_total", "Total requests", ["channel_type"])
LAT = Histogram("bot_latency_seconds", "End-to-end latency")

start_http_server(9100)  # metrics served at http://localhost:9100/metrics

# Then, inside the message handler:
#   REQ.labels(channel_type="channel").inc()
#   with LAT.time():
#       answer = ask_ollama(text, history)

For full observability, see Ollama Prometheus + Grafana.


Pitfalls We Hit (So You Do Not) {#pitfalls}

  1. Slack rate limits are per-method, not global. If you call reactions_add on every message, you will hit 429s under load. Cache per-channel reaction state.
  2. Discord intents must be enabled in the developer portal AND in code. Forgetting one or the other causes silent message-ignore behavior with no error.
  3. Streaming responses break Slack. Slack does not support edit-as-you-stream. Buffer the full response then post once.
  4. OLLAMA_KEEP_ALIVE matters. Default is 5 minutes; once unloaded, first message after a quiet period takes 8-15 seconds to respond. Set it to 24h.
  5. Thread history grows unbounded in memory. Use Redis with a TTL of 24 hours. Otherwise expect to restart the bot weekly.
  6. The model will roleplay as your CEO if asked. Add a system prompt rule: "Never impersonate specific people. Never claim to be a human."
  7. Bot owners get DM'd weird stuff. Add an admin command /audit that lets you see anonymized log samples to spot abuse patterns.

Wrap-Up

A self-hosted team chat bot is one of the highest-leverage pieces of software you can build for your company in 2026. It costs roughly nothing to run, gives non-technical employees an AI helper without leaking proprietary data, and turns your internal documentation into something people actually read. The Slack version above is in production at three companies I know of — including the original startup that asked for "an AI in Slack that doesn't leak our roadmap."

Start with the Quick Start. Get a basic mention working. Then add RAG over your handbook. Then add the two slash commands your team uses most. By month three, your team will treat the bot like a colleague. The fact that nobody outside the company can see anything it does is the part you sell to your security team.


Want a deeper integration story? Read Ollama function calling and tool use for action-taking bots, or private OpenAI-compatible API to expose your bot's brain to other apps.




Published: April 23, 2026 · Last Updated: April 23, 2026

Written by Pattanaik Ramswarup

Creator of Local AI Master

I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
