Piper TTS Setup 2026: Fast Offline Voices on Any Hardware
Piper is the fastest fully-offline neural text-to-speech engine you can run in 2026: install it with one command (pip install piper-tts), it speaks in real time on a Raspberry Pi 5 with no GPU, ships 30+ languages and 100+ downloadable voices, and is the default TTS in Home Assistant. Active development now lives at the Open Home Foundation's OHF-Voice/piper1-gpl repo (latest release v1.4.2, April 2026), built on the VITS architecture and exported to ONNX, with embedded espeak-ng for phonemization. It runs from a CLI, a Python API, a tiny web server, or C/C++ — all locally, with nothing leaving your machine.
If you want a private voice for a smart-home assistant, a screen reader, a robotics project, or just text-to-speech that doesn't phone home, Piper is the default answer. It is small (voices are tens of megabytes), CPU-only, and quick enough to feel instant even on a single-board computer. Below is how to install it on every OS, pick a voice, drive it from Python, and wire it into Home Assistant.
What is Piper TTS, and why is it the go-to offline voice?
Piper is a neural TTS system originally built for the Rhasspy voice-assistant project and now maintained by the Open Home Foundation. Each voice is a VITS model exported to the ONNX runtime, paired with a small JSON config; espeak-ng handles turning text into phonemes before the neural network synthesizes audio. That combination is what makes it both natural-sounding and fast on modest hardware.
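To make the model-plus-config pairing concrete, here is a minimal sketch of reading a voice's sample rate from its JSON config. It assumes the config sits next to the model as `<model>.onnx.json` and stores the rate under `audio.sample_rate`; both the file layout and the key names are assumptions to check against the files you actually download, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def config_path_for(model_path: str) -> Path:
    """Guess the config location for a model file (assumed: same name plus .json)."""
    return Path(f"{model_path}.json")


def read_sample_rate(config_text: str) -> int:
    """Pull the output sample rate out of a voice config (assumed key layout)."""
    config = json.loads(config_text)
    return int(config["audio"]["sample_rate"])


# Illustrative config fragment, not copied from a real voice file.
example_config = '{"audio": {"sample_rate": 22050}}'
print(read_sample_rate(example_config))
```

Knowing the sample rate up front is useful when you hand raw audio to a player or another service that needs the format declared.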
Three things make Piper the default choice for local voice:
- It is genuinely fast on CPU. Piper hits real-time synthesis on a Raspberry Pi 5 with no GPU, and roughly an order of magnitude faster than real time on a modern desktop CPU. There is no CUDA dependency to fight.
- It is tiny. A medium voice is a single ONNX file in the tens of megabytes, and inference uses a small RAM footprint — small enough for embedded boards and containers.
- It is everywhere downstream. Piper is the built-in TTS in Home Assistant (via the Wyoming protocol), and it is also used by the NVDA screen reader and LocalAI, so the voices you learn here transfer directly.
A quick licensing note, because it changed and matters for projects: the original rhasspy/piper repository (MIT-licensed) was archived read-only in October 2025, and active development moved to OHF-Voice/piper1-gpl, which is GPL-3.0. If you need a permissive MIT license specifically, you are pinning the old archived code; the current, maintained Piper is GPL-3.0. Check the license terms for your use case before shipping.
How do I install Piper TTS? (Linux, macOS, Windows)
The simplest route on any desktop OS is the Python package, which installs the engine and the CLI in one step:
pip install piper-tts
That gives you the piper command plus the Python module. The package bundles the ONNX runtime and espeak-ng phonemization, so on most systems there are no extra system dependencies to chase. (On some minimal Linux images you may still need espeak-ng data files from your package manager.)
Once installed, download a voice and synthesize a WAV. Voices are model files; here we grab the default English voice and speak a line:
# Download a voice (model + config) into the current folder
python3 -m piper.download_voices en_US-lessac-medium
# Synthesize speech to a WAV file
echo 'Hello from Piper, running entirely on my own machine.' \
| piper -m en_US-lessac-medium -f hello.wav
On macOS and Windows the exact same pip install piper-tts and CLI calls work; the only difference is how you play the resulting WAV (any media player will do). If you prefer not to install Python at all, the project also offers prebuilt binaries and language bindings — but for almost everyone, the pip package is the shortest path.
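If you want a quick sanity check after installing, the sketch below reports whether the `piper` Python module is importable and whether a `piper` executable is on your PATH. It uses only the standard library; the function name is hypothetical, and it assumes the package exposes a top-level module named `piper`, as the import in the Python example later in this article suggests.

```python
import importlib.util
import shutil


def piper_install_status() -> dict[str, bool]:
    """Report what the current environment can see, without importing Piper."""
    return {
        # True when `import piper` would find a module
        "python_module": importlib.util.find_spec("piper") is not None,
        # True when a `piper` executable is reachable from the shell
        "cli_on_path": shutil.which("piper") is not None,
    }


print(piper_install_status())
```

If the module is found but the CLI is not, the usual cause is a virtual environment or user-level scripts directory that is missing from PATH.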
How do I run Piper on a Raspberry Pi (real-time, no GPU)?
This is Piper's home turf. The install is identical to desktop — pip install piper-tts — and a Raspberry Pi 5 synthesizes a medium-quality voice in real time on CPU alone, with no accelerator. A Pi 4 also works but is noticeably slower on medium voices; if you are on a Pi 4 and latency matters, drop to a low or x_low voice.
# On Raspberry Pi OS (64-bit recommended)
pip install piper-tts
python3 -m piper.download_voices en_US-lessac-low
echo 'The smart home is listening, locally.' \
| piper -m en_US-lessac-low -f /tmp/say.wav
aplay /tmp/say.wav
In our own informal testing on Pi-class ARM CPUs, a medium voice synthesizes a short sentence in well under the time it takes to say it: comfortably real-time on a Pi 5, and faster still with the low/x_low tiers. Treat these as ballpark, single-board figures rather than a controlled benchmark; exact numbers depend on your specific board, voice quality, and sentence length. The headline holds: no GPU is needed, and memory use stays small enough to run alongside other services.
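To put a number on your own board, you can measure the real-time factor yourself. The sketch below uses the common convention of synthesis time divided by audio length, so anything below 1.0 is faster than real time. The helper names are hypothetical, and `synthesize` is any callable you supply that writes the WAV file (for example, a wrapper around the CLI call above).

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Length of the audio stored in a WAV file, in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio length; below 1.0 beats real time."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return synthesis_seconds / audio_seconds


def measure(synthesize: Callable[[], None], wav_path: str) -> float:
    """Time one synthesis call, then compare it with the audio it produced."""
    start = time.perf_counter()
    synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(wav_path))
```

Run it a few times with sentences of different lengths and ignore the first call, which typically includes model loading.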
For a fuller offline-audio stack — speech recognition feeding a local model feeding Piper — pair this with a local Whisper speech-to-text setup so the whole listen-think-speak loop stays on your hardware.
Which Piper voice and quality level should I pick?
Voices follow a consistent naming scheme: <language>_<REGION>-<name>-<quality> — for example en_US-lessac-medium. The name comes from the training dataset or speaker, and the quality suffix is the main knob you tune for the speed-versus-fidelity trade-off. Here is how the tiers compare:
| Quality tier | Sample rate | Relative size & speed | Best for |
|---|---|---|---|
| x_low | 16 kHz | Smallest, fastest | Pi 4 / very constrained boards, wake-word style prompts |
| low | 16 kHz | Small, fast | Raspberry Pi, low-latency assistants |
| medium | 22.05 kHz | Balanced (the common default) | Most desktops, Pi 5, Home Assistant default |
| high | 22.05 kHz | Largest, best fidelity | Narration, content where quality > latency |
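The naming scheme is regular enough to parse in code, which helps when a script has to pick the right playback rate for a given voice. The following is a minimal sketch: the regular expression is an assumption based on the pattern described above (two- or three-letter language code, two-letter region), the sample rates are taken from the table, and `parse_voice_name` is a hypothetical helper rather than part of Piper.

```python
import re
from typing import NamedTuple

# Sample rate per quality tier, taken from the table above.
SAMPLE_RATE_HZ = {"x_low": 16000, "low": 16000, "medium": 22050, "high": 22050}

# <language>_<REGION>-<name>-<quality>, for example en_US-lessac-medium
VOICE_NAME = re.compile(
    r"^(?P<language>[a-z]{2,3})_(?P<region>[A-Z]{2})"
    r"-(?P<name>[A-Za-z0-9_]+)-(?P<quality>x_low|low|medium|high)$"
)


class VoiceName(NamedTuple):
    language: str
    region: str
    name: str
    quality: str
    sample_rate_hz: int


def parse_voice_name(voice: str) -> VoiceName:
    """Split a voice name into its parts and look up the tier's sample rate."""
    match = VOICE_NAME.match(voice)
    if match is None:
        raise ValueError(f"not a recognised voice name: {voice!r}")
    quality = match["quality"]
    return VoiceName(
        match["language"],
        match["region"],
        match["name"],
        quality,
        SAMPLE_RATE_HZ[quality],
    )


print(parse_voice_name("en_US-lessac-medium"))
```

Voice names that do not fit the assumed pattern raise `ValueError`, so loosen the expression if you hit one in the official collection.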
Piper publishes 100+ voices across 30+ languages — English (multiple regional accents), Spanish, German, French, Italian, Dutch, Russian, Chinese, and many more — all downloadable as individual ONNX models from the official Hugging Face voice collection. You only download the voices you actually use, which keeps disk footprint tiny. The default English voice, and the one Home Assistant ships with, is en_US-lessac-medium.
You can browse and preview every voice on the official Piper voice samples page before downloading — listen first, then pull the exact model name.
How do I use Piper from Python and the CLI?
Both interfaces are first-class. The CLI is ideal for scripts and pipes; the Python API is what you embed in an app.
CLI — read text on stdin, write a WAV (or stream raw audio):
# To a file
echo 'Generating audio from the command line.' \
| piper -m en_US-lessac-medium -f out.wav
# Stream raw 22.05kHz PCM straight to an audio player
echo 'Low-latency streaming output.' \
| piper -m en_US-lessac-medium --output-raw \
| aplay -r 22050 -f S16_LE -t raw -
Python — load a voice once and synthesize repeatedly:
import wave
from piper import PiperVoice
voice = PiperVoice.load("en_US-lessac-medium.onnx")
with wave.open("python_out.wav", "wb") as wav_file:
    # Write the synthesized speech into the open WAV file
    voice.synthesize_wav(
        "This sentence was synthesized from the Piper Python API.",
        wav_file,
    )
Beyond these, Piper also exposes a lightweight HTTP web server and C/C++ bindings, so you can call it from a microservice or a native app without shelling out. For a desktop-app voice or a custom assistant UI, the Python API plus a loaded voice object is the pattern to reach for.
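If you capture the raw stream from the CLI example instead of playing it, the bytes have no header, so most players will not open them. A minimal sketch of wrapping them in a WAV container with the standard library follows. The 16-bit little-endian format and 22,050 Hz rate come from the `aplay` flags shown earlier; the mono channel count is an assumption, and `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm: bytes, wav_path: str, sample_rate: int = 22050) -> None:
    """Wrap headerless signed 16-bit PCM in a WAV container."""
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)        # assumption: single-channel audio
        wav_file.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)


# One second of silence stands in for captured audio here.
pcm_to_wav(b"\x00\x00" * 22050, "captured.wav")
```

Pass `sample_rate=16000` when the audio came from a low or x_low voice, otherwise playback will sound sped up.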
How do I add Piper to Home Assistant?
Piper is the built-in local TTS for Home Assistant, exposed through the Wyoming protocol. The clean path on Home Assistant OS or Supervised installs:
- Go to Settings → Add-ons → Add-on Store, search for Piper, and install the official add-on.
- Start it. The add-on implements Wyoming and is auto-discovered by the Wyoming integration in Home Assistant — no manual host/port wiring needed in the typical case.
- In the add-on configuration you can choose which voices to download (it defaults to en_US-lessac-medium), which saves disk space versus pulling everything.
- Use Piper as the TTS engine in your Assist pipeline, alongside a local speech-to-text and your chosen conversation agent.
That gives you a fully private voice response path: your smart home speaks without sending text to any cloud service. If you are building the rest of that private stack, our guide to a local AI + Home Assistant setup walks through the wake-word, STT, intent, and TTS pieces end to end.
How does Piper compare to other local TTS engines?
Piper optimizes for speed, size, and offline reliability rather than studio-grade expressiveness or voice cloning. If your priority is real-time response on small hardware, Piper wins; if you need expressive prosody or cloning a specific voice, heavier engines are a better fit.
- Want a slightly higher-fidelity but still tiny model? Compare with our Kokoro TTS local setup — an 82M-parameter open voice model that trades a bit of speed for quality.
- Need to clone a specific voice in many languages? That is a different job; see our XTTS v2 voice cloning guide for multilingual cloning (a much larger model, not real-time on a Pi).
The rule of thumb: Piper for fast, embedded, always-on voice; cloning-capable engines like XTTS for content where a specific identity or maximum expressiveness matters.
Key Takeaways
- Install in one line: pip install piper-tts on Linux, macOS, Windows, and Raspberry Pi. The current, maintained project is OHF-Voice/piper1-gpl (latest v1.4.2, April 2026).
- Real-time, no GPU: Piper synthesizes in real time on a Raspberry Pi 5 on CPU alone, and roughly 10× real-time on a modern desktop CPU, with a small RAM footprint.
- 30+ languages, 100+ voices: all downloadable ONNX models named <language>_<REGION>-<name>-<quality>; tiers are x_low/low (16 kHz) and medium/high (22.05 kHz).
- Native Home Assistant TTS: install the official Piper add-on; it speaks over the Wyoming protocol and is auto-discovered, with en_US-lessac-medium as the default voice.
- License changed: the old MIT rhasspy/piper repo is archived read-only (Oct 2025); active Piper is GPL-3.0. Verify license fit before shipping.
Next Steps
- Add ears to your voice stack with a local Whisper speech-to-text setup so listening stays offline too.
- Build the full private smart-home loop in our local AI + Home Assistant guide.
- Compare Piper against a higher-fidelity tiny model in the Kokoro TTS local setup.
- Need voice cloning instead of a stock voice? See the XTTS v2 voice cloning guide.
- Confirm everything against the source: the official Piper (piper1-gpl) GitHub repository.