Build a Telegram Bot with Local AI (Ollama + Python Tutorial)
Published on April 23, 2026 • 18 min read
I started using Telegram as my "personal AI front-end" two years ago for one specific reason: it works on every device I own, including the kitchen iPad my wife uses, the cheap Android phone I bring camping, and the Linux laptop I do not trust to install any chat client on. Telegram is the universal interface. The trick was making it talk to a model I controlled instead of OpenAI's API.
The bot in this guide is the version I have been running since early 2024. It started as 30 lines that piped messages to Ollama. It now handles voice notes (Whisper transcription), photos (LLaVA vision), long documents (RAG), and streams responses as they generate so it feels like ChatGPT — except the entire pipeline runs on a $40 box on my home network, behind no public ports, with no API keys to anyone outside my house.
This guide is the full thing. Tested, working, with copy-paste code. By the end you will have a private AI assistant in your pocket that costs nothing to run and never sees a third-party server.
Quick Start: Working Bot in 7 Minutes {#quick-start}
# 1. Already have Ollama? Skip to step 2.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
# 2. Get a Telegram bot token
# Open Telegram, message @BotFather, send /newbot, follow prompts.
# Copy the token (looks like 1234567890:ABCdefGhi-JklMnoPqRstUvWxyZ)
# 3. Install Python deps
pip install python-telegram-bot ollama python-dotenv
# 4. Run the bot
export TELEGRAM_TOKEN="1234567890:ABC..."
python telegram_bot.py
The minimal bot is about 70 lines (full code below). Send your bot /start from your Telegram account, then ask it anything. By the end of this guide you will have streaming responses, voice and photo support, RAG over your own docs, and a production deployment that auto-restarts.
Table of Contents
- Why Build a Personal Telegram AI Bot
- Architecture Overview
- Hardware & Hosting Options
- Get Your Bot Token from BotFather
- Minimal Bot — Full Code
- Streaming Responses
- Voice Notes with Whisper
- Photo Understanding with LLaVA
- Adding RAG over Personal Docs
- Allowlist & Security
- Production Deployment
- Pitfalls
- FAQs
Why Build a Personal Telegram AI Bot {#why-telegram}
Three reasons that hold up after two years of running mine:
1. Telegram is the only chat UI that runs everywhere. iOS, Android, Mac, Windows, Linux, web, and a tablet you do not bother updating. One bot, every device, no per-platform app to maintain.
2. Long polling means no public ingress. Telegram bots can use long polling — your bot script connects out to Telegram's servers and asks for new messages. No port forwarding, no SSL certificate, no exposed home IP. From a security standpoint this is dramatically simpler than running your own web app.
3. Telegram's UX matches an AI assistant naturally. Threading via reply, voice notes, image attachments, file uploads, persistent history. You get all of it for free without writing any UI code.
Compared to a Slack/Discord bot (which I also covered in build a local AI Slack & Discord bot), Telegram wins for personal/family use. The team-bot guide wins for workplace use.
Architecture Overview {#architecture}
[Telegram app] ──► Telegram MTProto servers ◄── [Your bot script] ──► [Ollama]
│
├──► [Whisper for voice]
│
└──► [ChromaDB for RAG]
Three things to know:
- No inbound connections to your machine. The bot connects out, polls Telegram, fetches new messages, sends replies. Same direction as a browser checking email.
- Ollama, Whisper, and ChromaDB all run on the same host (or your home network). Anywhere on private network is fine.
- Telegram's servers do relay your prompts and the model's replies. Bot chats are not end-to-end encrypted, so Telegram could theoretically log them (Secret Chats are E2E-encrypted, but they don't support bots). Most users find that acceptable for a personal AI; if you need true zero-knowledge, build the same bot for Signal instead.
Hardware & Hosting Options {#hardware}
| Use Case | Hardware | Monthly Cost |
|---|---|---|
| Just you | Old laptop at home, port-forward not needed | $0 |
| You + family | Mini PC (Intel NUC, Beelink) at home | $0 + electricity |
| Always-on cloud | Hetzner GEX44 (RTX 4000 Ada) | €184 |
| Always-on budget | OVH RISE-1 + ollama on CPU | €17 |
| Burstable | Vast.ai GPU on-demand | $0.20-0.40/hr |
For personal use, the home setup wins on every metric. A 2018 mini PC running Ollama on CPU handles 1-2 users with 7B models at acceptable speed (8-15 tokens/sec). Add a $300 used RTX 3060 12 GB and you have GPU acceleration that can serve a small family.
For a deeper hosting walkthrough, see Ollama production deployment.
Get Your Bot Token from BotFather {#botfather}
1. Open Telegram → search for @BotFather → start a chat
2. Send /newbot
3. Pick a display name (e.g., "My Local AI")
4. Pick a username ending in bot (e.g., mylocalai_bot)
5. BotFather replies with a token like 1234567890:ABCdefGhi-JklMnoPqRstUvWxyZ
Save that token. Anyone with it can impersonate your bot, so treat it like a password. Put it in a .env file, never in git.
While you're with BotFather, configure these for nicer UX:
- /setdescription — text shown when users open a chat with the bot
- /setabouttext — short bio in the user info popup
- /setcommands — type-ahead command list:
start - Start the bot
help - Show available commands
reset - Clear conversation history
voice - Voice note transcription mode
img - Send a photo for analysis
Minimal Bot — Full Code {#minimal-bot}
The minimal version (about 70 lines) that already works:
# telegram_bot.py
import os
import logging
import asyncio
from telegram import Update
from telegram.ext import (
Application, CommandHandler, MessageHandler,
ContextTypes, filters
)
import ollama
logging.basicConfig(level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
log = logging.getLogger("telegram_bot")
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.1:8b")
client = ollama.AsyncClient(host=OLLAMA_HOST)
# Per-chat conversation history (production: use Redis/SQLite)
HISTORY: dict[int, list[dict]] = {}
SYSTEM_PROMPT = (
"You are a helpful, concise personal assistant on Telegram. "
"Use markdown sparingly. Keep responses under 4000 characters."
)
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
await update.message.reply_text(
"Hi! I'm your local AI. Send any message and I'll respond.\n"
"Commands: /reset to clear history, /help for more."
)
async def help_cmd(update: Update, context: ContextTypes.DEFAULT_TYPE):
await update.message.reply_text(
"Send text — I'll reply.\n"
"Send a voice note — I'll transcribe + respond.\n"
"Send a photo — I'll describe what's in it.\n"
"/reset — clear our conversation history\n"
)
async def reset(update: Update, context: ContextTypes.DEFAULT_TYPE):
HISTORY.pop(update.effective_chat.id, None)
await update.message.reply_text("History cleared.")
async def chat(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
history = HISTORY.setdefault(chat_id, [])
user_msg = update.message.text
log.info(f"[{chat_id}] user: {user_msg[:80]}")
# Show typing indicator
await context.bot.send_chat_action(chat_id, "typing")
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
messages.extend(history[-10:])
messages.append({"role": "user", "content": user_msg})
try:
resp = await client.chat(model=MODEL, messages=messages,
options={"temperature": 0.5, "num_predict": 800})
answer = resp["message"]["content"].strip()
history.append({"role": "user", "content": user_msg})
history.append({"role": "assistant", "content": answer})
# Telegram caps message at 4096 chars
for i in range(0, len(answer), 4000):
await update.message.reply_text(answer[i:i+4000])
except Exception as e:
log.exception("ollama error")
await update.message.reply_text(f"Backend error: {e}")
def main():
token = os.environ["TELEGRAM_TOKEN"]
app = Application.builder().token(token).build()
app.add_handler(CommandHandler("start", start))
app.add_handler(CommandHandler("help", help_cmd))
app.add_handler(CommandHandler("reset", reset))
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, chat))
log.info("Bot starting...")
app.run_polling()
if __name__ == "__main__":
main()
Run it with python telegram_bot.py. Open your bot in Telegram, send /start, then ask anything. You should get a response in 1-3 seconds depending on hardware.
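One refinement worth making early: the handler's naive answer[i:i+4000] slicing can cut a word or a code block in half. A sketch of a splitter that prefers newline boundaries (split_message is my own name, not part of any library):

```python
def split_message(text: str, limit: int = 4000) -> list[str]:
    """Split text into Telegram-sized parts, breaking at newlines when possible."""
    parts = []
    while len(text) > limit:
        # Prefer breaking at the last newline inside the window.
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no newline found: hard cut at the limit
        parts.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        parts.append(text)
    return parts
```

Then the reply loop becomes `for part in split_message(answer): await update.message.reply_text(part)`.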
Streaming Responses {#streaming}
The minimal bot above waits for the full response then sends it. ChatGPT-like streaming feels much better. The pattern:
async def chat_streaming(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
history = HISTORY.setdefault(chat_id, [])
user_msg = update.message.text
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
messages.extend(history[-10:])
messages.append({"role": "user", "content": user_msg})
# Send placeholder we will edit
sent = await update.message.reply_text("…")
full = ""
last_edit = 0
    EDIT_EVERY = 100  # characters of new text between edits (~25 tokens)
async for part in await client.chat(model=MODEL, messages=messages, stream=True):
chunk = part["message"]["content"]
full += chunk
if len(full) - last_edit >= EDIT_EVERY:
try:
await context.bot.edit_message_text(
chat_id=chat_id, message_id=sent.message_id, text=full
)
last_edit = len(full)
except Exception:
pass # Telegram throws if text unchanged or rate-limited
# Final edit with complete text
try:
await context.bot.edit_message_text(
chat_id=chat_id, message_id=sent.message_id, text=full
)
except Exception:
pass
history.append({"role": "user", "content": user_msg})
history.append({"role": "assistant", "content": full})
Replace the chat handler with chat_streaming. Two things to know:
- Telegram rate-limits message edits to ~30/minute per chat. Edit every 25 tokens (not every token) to stay under the limit.
- Use parse_mode=None while streaming, because partially streamed markdown breaks Telegram's parser. After the final edit, you can re-edit with parse_mode="Markdown" if you want formatting.
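Counting characters works; an alternative that is robust to chunk-size variance is throttling edits by wall-clock time. A sketch (EditThrottle is my own helper; the 2-second default is an assumption tuned to Telegram's ~30 edits/minute):

```python
import time

class EditThrottle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval: float = 2.0):
        self.interval = interval
        self._last = 0.0  # monotonic timestamp of the last allowed action

    def ready(self) -> bool:
        # True (and records the time) if enough time has passed since
        # the last allowed action; False otherwise.
        now = time.monotonic()
        if now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

In the streaming loop, create one EditThrottle per reply and guard each edit_message_text call with `if throttle.ready():`.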
Voice Notes with Whisper {#voice}
This is the killer feature for mobile use — speak instead of type.
Step 1: Install whisper.cpp
# Mac
brew install whisper-cpp
# Linux
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
bash ./models/download-ggml-model.sh base.en
Step 2: Add Voice Handler
import subprocess
import tempfile
WHISPER_BIN = "/usr/local/bin/whisper-cli" # adjust path
WHISPER_MODEL = "/path/to/whisper.cpp/models/ggml-base.en.bin"
async def voice(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
voice_file = await update.message.voice.get_file()
with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as f:
await voice_file.download_to_drive(f.name)
ogg_path = f.name
# Convert ogg to wav (whisper.cpp needs 16kHz wav)
wav_path = ogg_path.replace(".ogg", ".wav")
subprocess.run([
"ffmpeg", "-y", "-i", ogg_path,
"-ar", "16000", "-ac", "1", wav_path
], check=True, capture_output=True)
# Transcribe
result = subprocess.run([
WHISPER_BIN, "-m", WHISPER_MODEL, "-f", wav_path, "-otxt", "-of", wav_path[:-4]
], check=True, capture_output=True)
with open(wav_path[:-4] + ".txt") as f:
transcript = f.read().strip()
# Cleanup
os.unlink(ogg_path); os.unlink(wav_path); os.unlink(wav_path[:-4] + ".txt")
# Show transcript, then respond as if it were a text message
await update.message.reply_text(f"_Transcribed:_ {transcript}", parse_mode="Markdown")
    # PTB v20+ freezes Telegram objects, so assigning update.message.text
    # directly raises AttributeError; _unfreeze() (private API) lifts that.
    # The cleaner long-term fix is a shared helper that takes the text.
    update.message._unfreeze()
    update.message.text = transcript  # forge text and route to chat handler
    await chat_streaming(update, context)
# Register
app.add_handler(MessageHandler(filters.VOICE, voice))
A 30-second voice note transcribes in 1-2 seconds on a modern laptop. The bot first echoes the transcript (so you can verify) then responds to it.
Photo Understanding with LLaVA {#photos}
Drop a photo into Telegram and have your bot describe it.
ollama pull llava:13b
async def photo(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
photo_file = await update.message.photo[-1].get_file() # highest resolution
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
await photo_file.download_to_drive(f.name)
img_path = f.name
caption = update.message.caption or "Describe this image in detail."
await context.bot.send_chat_action(chat_id, "typing")
resp = await client.chat(
model="llava:13b",
messages=[{"role": "user", "content": caption, "images": [img_path]}]
)
answer = resp["message"]["content"].strip()
await update.message.reply_text(answer)
os.unlink(img_path)
app.add_handler(MessageHandler(filters.PHOTO, photo))
LLaVA 13B handles most everyday images well: receipts, screenshots, photos of whiteboards, food labels in foreign languages. For a true OCR workflow (turning a photo of a document into text + Q&A), see the RAG section below combined with this handler.
Adding RAG over Personal Docs {#rag}
The most useful upgrade for a personal Telegram bot: it knows everything about your notes, recipes, runbooks, or whatever folder you point it at.
Step 1: Run ChromaDB
docker run -d -p 8000:8000 -v chroma:/chroma/chroma \
--name chroma chromadb/chroma:latest
Step 2: Index Your Files
# index.py — run once, then re-run when files change
import os, glob, chromadb, ollama
ollama_client = ollama.Client(host="http://localhost:11434")
chroma = chromadb.HttpClient(host="localhost", port=8000)
coll = chroma.get_or_create_collection(name="personal")
def chunk(text, size=800, overlap=100):
return [text[i:i+size] for i in range(0, len(text), size-overlap)]
def embed(text):
return ollama_client.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
for path in glob.glob("./mynotes/**/*.md", recursive=True):
text = open(path).read()
for i, ch in enumerate(chunk(text)):
coll.upsert(ids=[f"{path}:{i}"], documents=[ch], embeddings=[embed(ch)],
metadatas=[{"source": path}])
print("Indexed.")
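The overlapping chunker is the piece most worth sanity-checking: each window starts size - overlap characters after the previous one, so adjacent chunks share overlap characters of context (restated here so the snippet runs standalone):

```python
def chunk(text, size=800, overlap=100):
    # Windows start every (size - overlap) characters, so consecutive
    # chunks share `overlap` characters of context.
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

pieces = chunk("a" * 2000, size=800, overlap=100)
# Starts at 0, 700, 1400 -> three chunks of 800, 800, and 600 characters.
```

If a chunk ends mid-sentence, the next chunk's leading overlap usually still contains the full sentence, which is why overlap matters for retrieval quality.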
Step 3: RAG-Enabled Chat
async def chat_rag(update: Update, context: ContextTypes.DEFAULT_TYPE):
user_msg = update.message.text
q_emb = ollama_client.embeddings(model="nomic-embed-text", prompt=user_msg)["embedding"]
results = coll.query(query_embeddings=[q_emb], n_results=5)
context_text = "\n\n---\n\n".join(results["documents"][0])
aug_system = (
SYSTEM_PROMPT + "\n\n"
"Use the following context from the user's personal notes to answer. "
"If not in context, use general knowledge but say so.\n\n"
f"CONTEXT:\n{context_text}"
)
# ... rest same as chat handler with aug_system instead of SYSTEM_PROMPT
Now @bot what was that pasta sauce I made last summer? finds the actual recipe in your notes folder. For deeper RAG tuning (chunk size, embedding choice, hybrid search), see RAG local setup guide.
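One practical guard for the RAG handler above: cap how much retrieved context you inject, so five long chunks don't crowd the conversation out of the model's window. A sketch (build_context and the 6000-character budget are my own names and numbers):

```python
def build_context(docs: list[str], budget: int = 6000) -> str:
    # Keep whole chunks in retrieval order until the budget is spent;
    # a truncated chunk is usually worse than one fewer chunk.
    kept, used = [], 0
    for doc in docs:
        if used + len(doc) > budget:
            break
        kept.append(doc)
        used += len(doc) + 7  # 7 chars for the "\n\n---\n\n" separator
    return "\n\n---\n\n".join(kept)
```

Call it as `context_text = build_context(results["documents"][0])` in place of the plain join.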
Allowlist & Security {#security}
Anyone who finds your bot's username can message it. If you do not lock it down, strangers will eat your tokens (and read whatever your RAG returns).
Allowlist by User ID
Find your Telegram user ID by messaging @userinfobot. Then:
ALLOWED_USERS = {123456789, 987654321} # your IDs
async def auth_check(update: Update) -> bool:
user_id = update.effective_user.id
if user_id not in ALLOWED_USERS:
await update.message.reply_text(
"This bot is private. Contact the owner."
)
log.warning(f"Denied user {user_id} ({update.effective_user.username})")
return False
return True
# At top of every handler:
if not await auth_check(update):
return
Rate Limiting
import time
from collections import defaultdict, deque
USER_REQS: dict[int, deque] = defaultdict(deque)
def rate_ok(user_id: int, max_n=20, window=60) -> bool:
now = time.time()
q = USER_REQS[user_id]
while q and q[0] < now - window:
q.popleft()
if len(q) >= max_n:
return False
q.append(now)
return True
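A quick sanity check of the limiter (restated so the snippet runs standalone): with the defaults, the 21st request inside a window is refused.

```python
import time
from collections import defaultdict, deque

USER_REQS: dict[int, deque] = defaultdict(deque)

def rate_ok(user_id: int, max_n=20, window=60) -> bool:
    now = time.time()
    q = USER_REQS[user_id]
    while q and q[0] < now - window:  # drop timestamps older than the window
        q.popleft()
    if len(q) >= max_n:
        return False
    q.append(now)
    return True

results = [rate_ok(1) for _ in range(21)]
# First 20 pass, the 21st is refused.
```

In a handler: `if not rate_ok(update.effective_user.id): await update.message.reply_text("Slow down a bit."); return`.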
Don't Log Sensitive Content
Replace log.info(f"user: {user_msg[:80]}") with log.info(f"user message len={len(user_msg)}") once you start using RAG over personal data. Logs are a side channel.
Production Deployment {#deployment}
systemd
# /etc/systemd/system/tgbot.service
[Unit]
Description=Local AI Telegram bot
After=network.target ollama.service
[Service]
Type=simple
User=botuser
WorkingDirectory=/opt/tgbot
EnvironmentFile=/opt/tgbot/.env
ExecStart=/opt/tgbot/venv/bin/python telegram_bot.py
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now tgbot
journalctl -u tgbot -f
Docker Alternative
FROM python:3.12-slim
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "telegram_bot.py"]
docker build -t tgbot .
docker run -d --restart always --env-file .env --name tgbot --network host tgbot
Persistence
In-memory HISTORY: dict works fine for personal use. For multi-user or long-running bots, use SQLite:
import sqlite3, json
conn = sqlite3.connect("history.db")
conn.execute("CREATE TABLE IF NOT EXISTS h (chat_id INT PRIMARY KEY, history TEXT)")
Persisting saves you when systemd restarts and means the bot remembers conversations across deploys.
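Building on that table, a pair of load/save helpers (a sketch; it JSON-encodes the message list, one row per chat, and the function names are my own):

```python
import json
import sqlite3

conn = sqlite3.connect("history.db")
conn.execute("CREATE TABLE IF NOT EXISTS h (chat_id INT PRIMARY KEY, history TEXT)")

def save_history(chat_id: int, history: list[dict]) -> None:
    # INSERT OR REPLACE keeps exactly one row per chat.
    conn.execute("INSERT OR REPLACE INTO h VALUES (?, ?)",
                 (chat_id, json.dumps(history)))
    conn.commit()

def load_history(chat_id: int) -> list[dict]:
    row = conn.execute("SELECT history FROM h WHERE chat_id = ?",
                       (chat_id,)).fetchone()
    return json.loads(row[0]) if row else []
```

Call load_history on a chat's first message after startup, and save_history after each exchange.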
Monitoring
from prometheus_client import Counter, Histogram, start_http_server
REQ = Counter("tgbot_requests_total", "Total requests", ["kind"])
LAT = Histogram("tgbot_latency_seconds", "Latency seconds")
start_http_server(9101)
For full Prometheus + Grafana setup, see Ollama monitoring guide.
Pitfalls {#pitfalls}
- Forgetting to allowlist. Within hours of going live, random users will find your bot. Add the allowlist before your first session.
- Telegram message length cap (4096 chars). Always split long text with for i in range(0, len(text), 4000); otherwise the send fails with a 400 "message is too long" error.
- Stream edits getting rate-limited. Edit every 25-50 tokens, not every token. Telegram allows ~30 edits/minute per chat.
- Voice transcription latency on first run. Whisper loads the model lazily; the first transcription takes 5-10x longer. "Warm up" Whisper at startup with a dummy file.
- LLaVA model unloading. Ollama unloads models after 5 minutes by default. For mixed text+vision bots, set OLLAMA_KEEP_ALIVE=24h and OLLAMA_MAX_LOADED_MODELS=2.
- Using python-telegram-bot v13 syntax. v20+ is fully async. Many Stack Overflow answers are stale; stick to the official v20+ docs.
- Storing the bot token in code. Use .env and python-dotenv, and add .env to .gitignore.
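For the Whisper warm-up pitfall, a sketch of forcing the model load at startup with one second of generated silence (write_silence_wav and warm_up_whisper are my own names; binary and model paths as in the voice section):

```python
import subprocess
import tempfile
import wave

def write_silence_wav(path: str, seconds: float = 1.0) -> None:
    # 16 kHz mono 16-bit PCM silence -- the format whisper.cpp expects.
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * int(16000 * seconds))

def warm_up_whisper(whisper_bin: str, model_path: str) -> None:
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        write_silence_wav(f.name)
        # Output is discarded; the point is to pay the model-load cost
        # once, before the first real voice note arrives.
        subprocess.run([whisper_bin, "-m", model_path, "-f", f.name],
                       capture_output=True)
```

Call warm_up_whisper(WHISPER_BIN, WHISPER_MODEL) once in main() before run_polling().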
Wrap-Up
Telegram + Ollama is the cleanest "personal AI in your pocket" setup that exists in 2026. It costs nothing once installed, runs on hardware you already own, and gives you the universal interface (works on every device) plus the modalities you actually need (text, voice, photo, document Q&A) without writing a single line of UI code.
The bot in this guide has been the single most-used piece of software I have written in two years. My wife uses it more than ChatGPT. My family group chat has a copy that helps with shopping lists. My parents use the voice transcription mode because typing on a phone is hard for them. The fact that none of those messages, voice notes, or photos ever leave our home network is what makes me comfortable having it become that personal.
If you set this up this weekend, Monday morning will start with you using your bot to summarize your inbox while making coffee, voice-noting your meeting prep on the bus, and querying your RAG-indexed notes folder for the API key you wrote down six months ago. Highly recommended.
Looking for the workplace version? Build a local AI Slack & Discord bot covers team chat. For deeper integration patterns, see Ollama function calling and tool use.