Build a Telegram Bot with Local AI (Ollama + Python Tutorial)
Published on April 23, 2026 • 18 min read
I started using Telegram as my "personal AI front-end" two years ago for one specific reason: it works on every device I own, including the kitchen iPad my wife uses, the cheap Android phone I bring camping, and the Linux laptop I do not trust to install any chat client on. Telegram is the universal interface. The trick was making it talk to a model I controlled instead of OpenAI's API.
The bot in this guide is the version I have been running since early 2024. It started as 30 lines that piped messages to Ollama. It now handles voice notes (Whisper transcription), photos (LLaVA vision), long documents (RAG), and streams responses as they generate so it feels like ChatGPT — except the entire pipeline runs on a $40 box on my home network, behind no public ports, with no API keys to anyone outside my house.
This guide is the full thing. Tested, working, with copy-paste code. By the end you will have a private AI assistant in your pocket that costs nothing to run and never sees a third-party server.
Quick Start: Working Bot in 7 Minutes {#quick-start}
# 1. Already have Ollama? Skip to step 2.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1:8b
# 2. Get a Telegram bot token
# Open Telegram, message @BotFather, send /newbot, follow prompts.
# Copy the token (looks like 1234567890:ABCdefGhi-JklMnoPqRstUvWxyZ)
# 3. Install Python deps
pip install python-telegram-bot ollama python-dotenv
# 4. Run the bot
export TELEGRAM_TOKEN="1234567890:ABC..."
python telegram_bot.py
The minimal bot is about 70 lines (full code below). Send your bot /start from your Telegram account, then ask it anything. By the end of this guide you will have streaming responses, voice and photo support, RAG over your own docs, and a production deployment that auto-restarts.
Table of Contents
- Why Build a Personal Telegram AI Bot
- Architecture Overview
- Hardware & Hosting Options
- Get Your Bot Token from BotFather
- Minimal Bot — Full Code
- Streaming Responses
- Voice Notes with Whisper
- Photo Understanding with LLaVA
- Adding RAG over Personal Docs
- Allowlist & Security
- Production Deployment
- Pitfalls
- FAQs
Why Build a Personal Telegram AI Bot {#why-telegram}
Three reasons that hold up after two years of running mine:
1. Telegram is the only chat UI that runs everywhere. iOS, Android, Mac, Windows, Linux, web, and a tablet you do not bother updating. One bot, every device, no per-platform app to maintain.
2. Long polling means no public ingress. Telegram bots can use long polling — your bot script connects out to Telegram's servers and asks for new messages. No port forwarding, no SSL certificate, no exposed home IP. From a security standpoint this is dramatically simpler than running your own web app.
3. Telegram's UX matches an AI assistant naturally. Threading via reply, voice notes, image attachments, file uploads, persistent history. You get all of it for free without writing any UI code.
Compared to a Slack/Discord bot (which I also covered in build a local AI Slack & Discord bot), Telegram wins for personal/family use. The team-bot guide wins for workplace use.
Architecture Overview {#architecture}
[Telegram app] ──► Telegram MTProto servers ◄── [Your bot script] ──► [Ollama]
│
├──► [Whisper for voice]
│
└──► [ChromaDB for RAG]
Three things to know:
- No inbound connections to your machine. The bot connects out, polls Telegram, fetches new messages, sends replies. Same direction as a browser checking email.
- Ollama, Whisper, and ChromaDB all run on the same host (or your home network). Anywhere on private network is fine.
- Telegram's servers do relay your prompts and the model's replies. Bot chats are not end-to-end encrypted, so Telegram could theoretically log them (Secret Chats are E2E-encrypted, but they don't support bots). Most users find that acceptable for a personal AI; if you need true zero-knowledge, build the same bot for Signal instead.
Hardware & Hosting Options {#hardware}
| Use Case | Hardware | Monthly Cost |
|---|---|---|
| Just you | Old laptop at home, port-forward not needed | $0 |
| You + family | Mini PC (Intel NUC, Beelink) at home | $0 + electricity |
| Always-on cloud | Hetzner GEX44 (RTX 4000 Ada) | €184 |
| Always-on budget | OVH RISE-1 + ollama on CPU | €17 |
| Burstable | Vast.ai GPU on-demand | $0.20-0.40/hr |
For personal use, the home setup wins on every metric. A 2018 mini PC running Ollama on CPU handles 1-2 users with 7B models at acceptable speed (8-15 tokens/sec). Add a $300 used RTX 3060 12 GB and you have GPU acceleration that can serve a small family.
For a deeper hosting walkthrough, see Ollama production deployment.
Get Your Bot Token from BotFather {#botfather}
1. Open Telegram → search for @BotFather → start a chat
2. Send /newbot
3. Pick a display name (e.g., "My Local AI")
4. Pick a username ending in bot (e.g., mylocalai_bot)
5. BotFather replies with a token like 1234567890:ABCdefGhi-JklMnoPqRstUvWxyZ
Save that token. Anyone with it can impersonate your bot, so treat it like a password. Put it in a .env file, never in git.
While you're with BotFather, configure these for nicer UX:
- /setdescription — text shown when users open a chat with the bot
- /setabouttext — short bio in the user info popup
- /setcommands — type-ahead command list:
start - Start the bot
help - Show available commands
reset - Clear conversation history
voice - Voice note transcription mode
img - Send a photo for analysis
Minimal Bot — Full Code {#minimal-bot}
The minimal version (about 70 lines) that already works:
# telegram_bot.py
import os
import logging
import asyncio
from telegram import Update
from telegram.ext import (
Application, CommandHandler, MessageHandler,
ContextTypes, filters
)
import ollama
logging.basicConfig(level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s")
log = logging.getLogger("telegram_bot")
OLLAMA_HOST = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "llama3.1:8b")
client = ollama.AsyncClient(host=OLLAMA_HOST)
# Per-chat conversation history (production: use Redis/SQLite)
HISTORY: dict[int, list[dict]] = {}
SYSTEM_PROMPT = (
"You are a helpful, concise personal assistant on Telegram. "
"Use markdown sparingly. Keep responses under 4000 characters."
)
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE):
await update.message.reply_text(
"Hi! I'm your local AI. Send any message and I'll respond.\n"
"Commands: /reset to clear history, /help for more."
)
async def help_cmd(update: Update, context: ContextTypes.DEFAULT_TYPE):
await update.message.reply_text(
"Send text — I'll reply.\n"
"Send a voice note — I'll transcribe + respond.\n"
"Send a photo — I'll describe what's in it.\n"
"/reset — clear our conversation history\n"
)
async def reset(update: Update, context: ContextTypes.DEFAULT_TYPE):
HISTORY.pop(update.effective_chat.id, None)
await update.message.reply_text("History cleared.")
async def chat(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
history = HISTORY.setdefault(chat_id, [])
user_msg = update.message.text
log.info(f"[{chat_id}] user: {user_msg[:80]}")
# Show typing indicator
await context.bot.send_chat_action(chat_id, "typing")
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
messages.extend(history[-10:])
messages.append({"role": "user", "content": user_msg})
try:
resp = await client.chat(model=MODEL, messages=messages,
options={"temperature": 0.5, "num_predict": 800})
answer = resp["message"]["content"].strip()
history.append({"role": "user", "content": user_msg})
history.append({"role": "assistant", "content": answer})
# Telegram caps message at 4096 chars
for i in range(0, len(answer), 4000):
await update.message.reply_text(answer[i:i+4000])
except Exception as e:
log.exception("ollama error")
await update.message.reply_text(f"Backend error: {e}")
def main():
token = os.environ["TELEGRAM_TOKEN"]
app = Application.builder().token(token).build()
app.add_handler(CommandHandler("start", start))
app.add_handler(CommandHandler("help", help_cmd))
app.add_handler(CommandHandler("reset", reset))
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, chat))
log.info("Bot starting...")
app.run_polling()
if __name__ == "__main__":
main()
Run it with python telegram_bot.py. Open your bot in Telegram, send /start, then ask anything. You should get a response in 1-3 seconds depending on hardware.
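One refinement worth making early: the handler's naive answer[i:i+4000] slicing can cut a word or a code block in half. A sketch of a splitter that prefers newline boundaries (split_message is my own name, not part of any library):

```python
def split_message(text: str, limit: int = 4000) -> list[str]:
    """Split text into Telegram-sized parts, breaking at newlines when possible."""
    parts = []
    while len(text) > limit:
        # Prefer breaking at the last newline inside the window.
        cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no newline found: hard cut at the limit
        parts.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        parts.append(text)
    return parts
```

Then the reply loop becomes `for part in split_message(answer): await update.message.reply_text(part)`.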
Streaming Responses {#streaming}
The minimal bot above waits for the full response then sends it. ChatGPT-like streaming feels much better. The pattern:
async def chat_streaming(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
history = HISTORY.setdefault(chat_id, [])
user_msg = update.message.text
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
messages.extend(history[-10:])
messages.append({"role": "user", "content": user_msg})
# Send placeholder we will edit
sent = await update.message.reply_text("…")
full = ""
last_edit = 0
    EDIT_EVERY = 100  # characters of new text between edits (~25 tokens)
async for part in await client.chat(model=MODEL, messages=messages, stream=True):
chunk = part["message"]["content"]
full += chunk
if len(full) - last_edit >= EDIT_EVERY:
try:
await context.bot.edit_message_text(
chat_id=chat_id, message_id=sent.message_id, text=full
)
last_edit = len(full)
except Exception:
pass # Telegram throws if text unchanged or rate-limited
# Final edit with complete text
try:
await context.bot.edit_message_text(
chat_id=chat_id, message_id=sent.message_id, text=full
)
except Exception:
pass
history.append({"role": "user", "content": user_msg})
history.append({"role": "assistant", "content": full})
Replace the chat handler with chat_streaming. Two things to know:
- Telegram rate-limits message edits to ~30/minute per chat. Edit every 25 tokens (not every token) to stay under the limit.
- Use parse_mode=None while streaming, because partially streamed markdown breaks Telegram's parser. After the final edit, you can re-edit with parse_mode="Markdown" if you want formatting.
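Counting characters works; an alternative that is robust to chunk-size variance is throttling edits by wall-clock time. A sketch (EditThrottle is my own helper; the 2-second default is an assumption tuned to Telegram's ~30 edits/minute):

```python
import time

class EditThrottle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval: float = 2.0):
        self.interval = interval
        self._last = 0.0  # monotonic timestamp of the last allowed action

    def ready(self) -> bool:
        # True (and records the time) if enough time has passed since
        # the last allowed action; False otherwise.
        now = time.monotonic()
        if now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

In the streaming loop, create one EditThrottle per reply and guard each edit_message_text call with `if throttle.ready():`.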
Voice Notes with Whisper {#voice}
This is the killer feature for mobile use — speak instead of type.
Step 1: Install whisper.cpp
# Mac
brew install whisper-cpp
# Linux
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
bash ./models/download-ggml-model.sh base.en
Step 2: Add Voice Handler
import subprocess
import tempfile
WHISPER_BIN = "/usr/local/bin/whisper-cli" # adjust path
WHISPER_MODEL = "/path/to/whisper.cpp/models/ggml-base.en.bin"
async def voice(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
voice_file = await update.message.voice.get_file()
with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as f:
await voice_file.download_to_drive(f.name)
ogg_path = f.name
# Convert ogg to wav (whisper.cpp needs 16kHz wav)
wav_path = ogg_path.replace(".ogg", ".wav")
subprocess.run([
"ffmpeg", "-y", "-i", ogg_path,
"-ar", "16000", "-ac", "1", wav_path
], check=True, capture_output=True)
# Transcribe
result = subprocess.run([
WHISPER_BIN, "-m", WHISPER_MODEL, "-f", wav_path, "-otxt", "-of", wav_path[:-4]
], check=True, capture_output=True)
with open(wav_path[:-4] + ".txt") as f:
transcript = f.read().strip()
# Cleanup
os.unlink(ogg_path); os.unlink(wav_path); os.unlink(wav_path[:-4] + ".txt")
# Show transcript, then respond as if it were a text message
await update.message.reply_text(f"_Transcribed:_ {transcript}", parse_mode="Markdown")
    # PTB v20+ freezes Telegram objects, so assigning update.message.text
    # directly raises AttributeError; _unfreeze() (private API) lifts that.
    # The cleaner long-term fix is a shared helper that takes the text.
    update.message._unfreeze()
    update.message.text = transcript  # forge text and route to chat handler
    await chat_streaming(update, context)
# Register
app.add_handler(MessageHandler(filters.VOICE, voice))
A 30-second voice note transcribes in 1-2 seconds on a modern laptop. The bot first echoes the transcript (so you can verify) then responds to it.
Photo Understanding with LLaVA {#photos}
Drop a photo into Telegram and have your bot describe it.
ollama pull llava:13b
async def photo(update: Update, context: ContextTypes.DEFAULT_TYPE):
chat_id = update.effective_chat.id
photo_file = await update.message.photo[-1].get_file() # highest resolution
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
await photo_file.download_to_drive(f.name)
img_path = f.name
caption = update.message.caption or "Describe this image in detail."
await context.bot.send_chat_action(chat_id, "typing")
resp = await client.chat(
model="llava:13b",
messages=[{"role": "user", "content": caption, "images": [img_path]}]
)
answer = resp["message"]["content"].strip()
await update.message.reply_text(answer)
os.unlink(img_path)
app.add_handler(MessageHandler(filters.PHOTO, photo))
LLaVA 13B handles most everyday images well: receipts, screenshots, photos of whiteboards, food labels in foreign languages. For a true OCR workflow (turning a photo of a document into text + Q&A), see the RAG section below combined with this handler.
Adding RAG over Personal Docs {#rag}
The most useful upgrade for a personal Telegram bot: it knows everything about your notes, recipes, runbooks, or whatever folder you point it at.
Step 1: Run ChromaDB
docker run -d -p 8000:8000 -v chroma:/chroma/chroma \
--name chroma chromadb/chroma:latest
Step 2: Index Your Files
# index.py — run once, then re-run when files change
import os, glob, chromadb, ollama
ollama_client = ollama.Client(host="http://localhost:11434")
chroma = chromadb.HttpClient(host="localhost", port=8000)
coll = chroma.get_or_create_collection(name="personal")
def chunk(text, size=800, overlap=100):
return [text[i:i+size] for i in range(0, len(text), size-overlap)]
def embed(text):
return ollama_client.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
for path in glob.glob("./mynotes/**/*.md", recursive=True):
text = open(path).read()
for i, ch in enumerate(chunk(text)):
coll.upsert(ids=[f"{path}:{i}"], documents=[ch], embeddings=[embed(ch)],
metadatas=[{"source": path}])
print("Indexed.")
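The overlapping chunker is the piece most worth sanity-checking: each window starts size - overlap characters after the previous one, so adjacent chunks share overlap characters of context (restated here so the snippet runs standalone):

```python
def chunk(text, size=800, overlap=100):
    # Windows start every (size - overlap) characters, so consecutive
    # chunks share `overlap` characters of context.
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

pieces = chunk("a" * 2000, size=800, overlap=100)
# Starts at 0, 700, 1400 -> three chunks of 800, 800, and 600 characters.
```

If a chunk ends mid-sentence, the next chunk's leading overlap usually still contains the full sentence, which is why overlap matters for retrieval quality.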
Step 3: RAG-Enabled Chat
async def chat_rag(update: Update, context: ContextTypes.DEFAULT_TYPE):
user_msg = update.message.text
q_emb = ollama_client.embeddings(model="nomic-embed-text", prompt=user_msg)["embedding"]
results = coll.query(query_embeddings=[q_emb], n_results=5)
context_text = "\n\n---\n\n".join(results["documents"][0])
aug_system = (
SYSTEM_PROMPT + "\n\n"
"Use the following context from the user's personal notes to answer. "
"If not in context, use general knowledge but say so.\n\n"
f"CONTEXT:\n{context_text}"
)
# ... rest same as chat handler with aug_system instead of SYSTEM_PROMPT
Now @bot what was that pasta sauce I made last summer? finds the actual recipe in your notes folder. For deeper RAG tuning (chunk size, embedding choice, hybrid search), see RAG local setup guide.
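One practical guard for the RAG handler above: cap how much retrieved context you inject, so five long chunks don't crowd the conversation out of the model's window. A sketch (build_context and the 6000-character budget are my own names and numbers):

```python
def build_context(docs: list[str], budget: int = 6000) -> str:
    # Keep whole chunks in retrieval order until the budget is spent;
    # a truncated chunk is usually worse than one fewer chunk.
    kept, used = [], 0
    for doc in docs:
        if used + len(doc) > budget:
            break
        kept.append(doc)
        used += len(doc) + 7  # 7 chars for the "\n\n---\n\n" separator
    return "\n\n---\n\n".join(kept)
```

Call it as `context_text = build_context(results["documents"][0])` in place of the plain join.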
Allowlist & Security {#security}
Anyone who finds your bot's username can message it. If you do not lock it down, strangers will eat your tokens (and read whatever your RAG returns).
Allowlist by User ID
Find your Telegram user ID by messaging @userinfobot. Then:
ALLOWED_USERS = {123456789, 987654321} # your IDs
async def auth_check(update: Update) -> bool:
user_id = update.effective_user.id
if user_id not in ALLOWED_USERS:
await update.message.reply_text(
"This bot is private. Contact the owner."
)
log.warning(f"Denied user {user_id} ({update.effective_user.username})")
return False
return True
# At top of every handler:
if not await auth_check(update):
return
Rate Limiting
import time
from collections import defaultdict, deque
USER_REQS: dict[int, deque] = defaultdict(deque)
def rate_ok(user_id: int, max_n=20, window=60) -> bool:
now = time.time()
q = USER_REQS[user_id]
while q and q[0] < now - window:
q.popleft()
if len(q) >= max_n:
return False
q.append(now)
return True
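A quick sanity check of the limiter (restated so the snippet runs standalone): with the defaults, the 21st request inside a window is refused.

```python
import time
from collections import defaultdict, deque

USER_REQS: dict[int, deque] = defaultdict(deque)

def rate_ok(user_id: int, max_n=20, window=60) -> bool:
    now = time.time()
    q = USER_REQS[user_id]
    while q and q[0] < now - window:  # drop timestamps older than the window
        q.popleft()
    if len(q) >= max_n:
        return False
    q.append(now)
    return True

results = [rate_ok(1) for _ in range(21)]
# First 20 pass, the 21st is refused.
```

In a handler: `if not rate_ok(update.effective_user.id): await update.message.reply_text("Slow down a bit."); return`.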
Don't Log Sensitive Content
Replace log.info(f"user: {user_msg[:80]}") with log.info(f"user message len={len(user_msg)}") once you start using RAG over personal data. Logs are a side channel.
Production Deployment {#deployment}
systemd
# /etc/systemd/system/tgbot.service
[Unit]
Description=Local AI Telegram bot
After=network.target ollama.service
[Service]
Type=simple
User=botuser
WorkingDirectory=/opt/tgbot
EnvironmentFile=/opt/tgbot/.env
ExecStart=/opt/tgbot/venv/bin/python telegram_bot.py
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now tgbot
journalctl -u tgbot -f
Docker Alternative
FROM python:3.12-slim
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "telegram_bot.py"]
docker build -t tgbot .
docker run -d --restart always --env-file .env --name tgbot --network host tgbot
Persistence
In-memory HISTORY: dict works fine for personal use. For multi-user or long-running bots, use SQLite:
import sqlite3, json
conn = sqlite3.connect("history.db")
conn.execute("CREATE TABLE IF NOT EXISTS h (chat_id INT PRIMARY KEY, history TEXT)")
Persisting saves you when systemd restarts and means the bot remembers conversations across deploys.
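Building on that table, a pair of load/save helpers (a sketch; it JSON-encodes the message list, one row per chat, and the function names are my own):

```python
import json
import sqlite3

conn = sqlite3.connect("history.db")
conn.execute("CREATE TABLE IF NOT EXISTS h (chat_id INT PRIMARY KEY, history TEXT)")

def save_history(chat_id: int, history: list[dict]) -> None:
    # INSERT OR REPLACE keeps exactly one row per chat.
    conn.execute("INSERT OR REPLACE INTO h VALUES (?, ?)",
                 (chat_id, json.dumps(history)))
    conn.commit()

def load_history(chat_id: int) -> list[dict]:
    row = conn.execute("SELECT history FROM h WHERE chat_id = ?",
                       (chat_id,)).fetchone()
    return json.loads(row[0]) if row else []
```

Call load_history on a chat's first message after startup, and save_history after each exchange.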
Monitoring
from prometheus_client import Counter, Histogram, start_http_server
REQ = Counter("tgbot_requests_total", "Total requests", ["kind"])
LAT = Histogram("tgbot_latency_seconds", "Latency seconds")
start_http_server(9101)
For full Prometheus + Grafana setup, see Ollama monitoring guide.
Pitfalls {#pitfalls}
- Forgetting to allowlist. Within hours of going live, random users will find your bot. Add the allowlist before your first session.
- Telegram message length cap (4096 chars). Always split long text with for i in range(0, len(text), 4000); otherwise the send fails with a 400 "message is too long" error.
- Stream edits getting rate-limited. Edit every 25-50 tokens, not every token. Telegram allows ~30 edits/minute per chat.
- Voice transcription latency on first run. Whisper loads the model lazily; the first transcription takes 5-10x longer. "Warm up" Whisper at startup with a dummy file.
- LLaVA model unloading. Ollama unloads models after 5 minutes by default. For mixed text+vision bots, set OLLAMA_KEEP_ALIVE=24h and OLLAMA_MAX_LOADED_MODELS=2.
- Using python-telegram-bot v13 syntax. v20+ is fully async. Many Stack Overflow answers are stale; stick to the official v20+ docs.
- Storing the bot token in code. Use .env and python-dotenv, and add .env to .gitignore.
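For the Whisper warm-up pitfall, a sketch of forcing the model load at startup with one second of generated silence (write_silence_wav and warm_up_whisper are my own names; binary and model paths as in the voice section):

```python
import subprocess
import tempfile
import wave

def write_silence_wav(path: str, seconds: float = 1.0) -> None:
    # 16 kHz mono 16-bit PCM silence -- the format whisper.cpp expects.
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * int(16000 * seconds))

def warm_up_whisper(whisper_bin: str, model_path: str) -> None:
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        write_silence_wav(f.name)
        # Output is discarded; the point is to pay the model-load cost
        # once, before the first real voice note arrives.
        subprocess.run([whisper_bin, "-m", model_path, "-f", f.name],
                       capture_output=True)
```

Call warm_up_whisper(WHISPER_BIN, WHISPER_MODEL) once in main() before run_polling().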
Wrap-Up
Telegram + Ollama is the cleanest "personal AI in your pocket" setup that exists in 2026. It costs nothing once installed, runs on hardware you already own, and gives you the universal interface (works on every device) plus the modalities you actually need (text, voice, photo, document Q&A) without writing a single line of UI code.
The bot in this guide has been the single most-used piece of software I have written in two years. My wife uses it more than ChatGPT. My family group chat has a copy that helps with shopping lists. My parents use the voice transcription mode because typing on a phone is hard for them. The fact that none of those messages, voice notes, or photos ever leave our home network is what makes me comfortable having it become that personal.
If you set this up this weekend, Monday morning will start with you using your bot to summarize your inbox while making coffee, voice-noting your meeting prep on the bus, and querying your RAG-indexed notes folder for the API key you wrote down six months ago. Highly recommended.
Looking for the workplace version? Build a local AI Slack & Discord bot covers team chat. For deeper integration patterns, see Ollama function calling and tool use.