To use Coqui TTS in Python in 2026, install the maintained fork with pip install coqui-tts (the original pip install TTS from coqui-ai is abandoned), then run from TTS.api import TTS and call tts.tts_to_file(text=..., file_path="out.wav", speaker_wav="voice.wav", language="en") with the XTTS v2 model. The package you actually want is coqui-tts v0.27.5 (released January 26, 2026), the community-maintained idiap/coqui-ai-TTS fork that works on Python 3.10 through 3.14. XTTS v2 clones a voice from a few seconds of reference audio, speaks 17 languages, and outputs 24 kHz audio.

If you searched "coqui tts python" and landed on a wall of import errors, you are almost certainly fighting the dead original package. This guide gives you the install command that actually works in 2026, the full tts_to_file() API with every argument that matters, voice cloning, streaming, and the four errors that trip up nearly everyone.

Which Coqui TTS package do I install in 2026?

This is the single most important thing on the page, so let's settle it first. Coqui.ai, the startup behind the original library, shut down in early 2024, and the original coqui-ai/TTS repository is no longer maintained. The PyPI package literally named TTS still installs, but it pins old dependencies and breaks on modern Python and PyTorch.

The community moved to the idiap/coqui-ai-TTS fork, published on PyPI as coqui-tts. It is the same codebase and the same TTS.api import path, just patched to run on current Python and dependencies.

Package	PyPI name	Status (mid-2026)	Python support
Coqui TTS (idiap fork)	`coqui-tts`	✅ Maintained — v0.27.5, Jan 26 2026	3.10 – 3.14
Original Coqui	`TTS`	❌ Unmaintained since early 2024	breaks on new Python

So install the fork:

pip install coqui-tts

The import path stays from TTS.api import TTS — the fork deliberately kept it identical so old tutorials still work once you swap the install line. If you only ever copy one line from this page, copy that pip install coqui-tts.
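Because both packages install into the same `TTS` import path, it is not always obvious which one you ended up with. Here is a minimal sketch that asks the standard library's `importlib.metadata` which distribution is installed. The helper names `installed_version` and `tts_provider` are our own and are not part of coqui-tts.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a PyPI distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def tts_provider() -> str:
    """Explain which PyPI package is behind `from TTS.api import TTS`."""
    fork = installed_version("coqui-tts")
    original = installed_version("TTS")
    if fork and original:
        # Both distributions write into the same TTS/ directory, so uninstalling
        # one removes files the other needs; reinstall the fork cleanly instead.
        return "both installed: pip uninstall TTS coqui-tts, then pip install coqui-tts"
    if original:
        return f"original TTS {original}: pip uninstall TTS, then pip install coqui-tts"
    if fork:
        return f"coqui-tts {fork} (maintained fork)"
    if find_spec("TTS") is not None:
        return "TTS is importable but not from a pip-installed distribution"
    return "not installed: pip install coqui-tts"


print(tts_provider())
```

Run it with the same interpreter you use for your project. If it reports the original package, that is the cause of most of the errors covered further down.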

How do I generate speech with tts_to_file()?

Here is the minimal "text to a WAV file" example using the multilingual XTTS v2 model and one of its built-in studio speakers (no reference clip needed):

import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# First run downloads the XTTS v2 model (~1.8 GB) and prompts you to accept the license
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

tts.tts_to_file(
    text="Hello from a fully local text to speech model.",
    file_path="output.wav",
    speaker="Craig Gutsy",   # a built-in XTTS speaker
    language="en",
)

That writes output.wav at a 24 kHz sample rate. The first call downloads the model and shows a license prompt. XTTS v2 ships under the Coqui Public Model License, which has real restrictions, so if you plan to ship a product, read our breakdown of the XTTS commercial license before you build on it.
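The interactive license prompt will stall non-interactive runs such as Docker builds or CI jobs. As far as we know, the coqui-tts model manager skips the prompt when the environment variable `COQUI_TOS_AGREED` is set to `"1"`. Treat that variable name as an assumption and confirm it against the fork's source before relying on it. Only set it once you have actually read and accepted the license.

```python
import os

# Assumption: coqui-tts checks COQUI_TOS_AGREED == "1" and skips the CPML prompt.
# Set this only after you have read and accepted the Coqui Public Model License.
os.environ["COQUI_TOS_AGREED"] = "1"

# The variable must be set before the model is first downloaded, e.g.:
# from TTS.api import TTS
# tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
```

In a container you would usually set the same variable in the image or compose file instead of in Python.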

The key tts_to_file() arguments:

Argument	Type	What it does
`text`	str	The text to synthesize
`file_path`	str	Output WAV path
`language`	str	Language code — `"en"`, `"es"`, `"fr"`, etc. (required for XTTS)
`speaker`	str	Name of a built-in XTTS studio speaker
`speaker_wav`	str / list	Path(s) to reference audio for voice cloning (instead of `speaker`)
`split_sentences`	bool	Auto-split long text into sentences (default `True`)
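The two easy mistakes with these arguments are forgetting `language` and passing both `speaker` and `speaker_wav`. Below is a small sketch of a hypothetical helper, `xtts_kwargs` (our own name, not part of the library), that builds the keyword arguments and fails early on either mistake.

```python
from typing import Optional, Sequence, Union


def xtts_kwargs(
    text: str,
    file_path: str,
    language: str,
    speaker: Optional[str] = None,
    speaker_wav: Optional[Union[str, Sequence[str]]] = None,
    split_sentences: bool = True,
) -> dict:
    """Build keyword arguments for tts.tts_to_file() with XTTS v2.

    Requires a language code and exactly one voice source: a built-in
    `speaker` name or a `speaker_wav` reference clip (or list of clips).
    """
    if not language:
        raise ValueError("XTTS needs a language code such as 'en'")
    if (speaker is None) == (speaker_wav is None):
        raise ValueError("pass exactly one of speaker or speaker_wav")
    kwargs = {
        "text": text,
        "file_path": file_path,
        "language": language,
        "split_sentences": split_sentences,
    }
    if speaker is not None:
        kwargs["speaker"] = speaker
    else:
        kwargs["speaker_wav"] = speaker_wav
    return kwargs
```

Usage: `tts.tts_to_file(**xtts_kwargs("Hello!", "output.wav", "en", speaker="Craig Gutsy"))`.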

How do I clone a voice with speaker_wav?

This is what XTTS v2 is famous for: zero-shot voice cloning from a short reference clip. Swap speaker for speaker_wav and point it at 6–20 seconds of clean audio of the target voice:

tts.tts_to_file(
    text="This sentence is spoken in a cloned voice.",
    file_path="cloned.wav",
    speaker_wav="my_voice_sample.wav",   # 6-20s of clean reference audio
    language="en",
)

A few hard-won notes: the reference clip should be clean (no music or background noise), mono, and ideally 16 kHz or higher. You can pass a list of WAV paths to speaker_wav to average several samples for a more stable clone. And language is mandatory for XTTS. It selects the language-specific text normalization and the language token the model is conditioned on, so it does much more than set an accent, and the wrong code produces garbled output. For a deeper walkthrough with audio tips and longer-form generation, see our full XTTS v2 voice cloning guide.
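You can check the format side of those notes before spending GPU time on a clip. Here is a minimal sketch using Python's built-in `wave` module. `check_reference_clip` is our own name, the defaults mirror the guidance above (6-20 s, mono, at least 16 kHz), and `wave` only reads uncompressed PCM WAV. It also cannot tell whether the clip is noisy; only your ears can.

```python
import wave
from typing import List


def check_reference_clip(path: str, min_s: float = 6.0, max_s: float = 20.0,
                         min_rate: int = 16_000) -> List[str]:
    """Return problems found in a PCM WAV reference clip (empty list = looks OK).

    Only header facts are checked: channel count, sample rate and duration.
    """
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        rate = wf.getframerate()
        duration = wf.getnframes() / rate
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if rate < min_rate:
        problems.append(f"{rate} Hz sample rate, expected at least {min_rate} Hz")
    if not min_s <= duration <= max_s:
        problems.append(f"{duration:.1f} s long, expected {min_s:g}-{max_s:g} s")
    return problems
```

Usage: `print(check_reference_clip("my_voice_sample.wav") or "clip looks usable")`.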

If you want the waveform in memory instead of a file, call tts.tts(...) with the same arguments minus file_path. It returns the audio as a list of floats you can post-process before saving.
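As an example of that post-processing, here is a sketch that peak-normalizes the float list and writes it as 16-bit PCM at 24 kHz using only the standard library. `normalize_peak` and `save_float_wav` are our own helper names, and the 0.9 target peak is an arbitrary choice.

```python
import array
import sys
import wave
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target: float = 0.9) -> List[float]:
    """Scale the waveform so its loudest sample reaches `target` (silence is left as-is)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return [float(s) for s in samples]
    return [s * target / peak for s in samples]


def save_float_wav(samples: Sequence[float], path: str, sample_rate: int = 24_000) -> None:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip first so out-of-range peaks saturate instead of wrapping around
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

Usage: `wav = tts.tts(text="Hello!", speaker="Craig Gutsy", language="en")`, then `save_float_wav(normalize_peak(wav), "output_normalized.wav")`.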

Which languages does XTTS v2 support?

XTTS v2 supports 17 languages out of the box. Pass the code in the language argument:

en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja, hu, ko, hi

In full: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), and Hindi (hi). The same cloned voice can speak any of them — clone once in English, then generate Spanish from the same speaker_wav by changing only the language code.
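That "clone once, speak many languages" pattern fits in a short loop. Here is a sketch that takes any object with a `tts_to_file()` method (such as the `tts` instance loaded earlier) and renders one sentence per language from a single reference clip. `XTTS_LANGUAGES`, `speak_in_languages` and the `cloned_<code>.wav` naming are our own choices.

```python
from pathlib import Path
from typing import Any, List, Mapping

# The 17 XTTS v2 language codes listed above, with their English names
XTTS_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
    "hi": "Hindi",
}


def speak_in_languages(tts: Any, texts: Mapping[str, str], speaker_wav: str,
                       out_dir: str = ".") -> List[Path]:
    """Render each text in its own language with one cloned voice.

    `tts` is anything with a tts_to_file() method, e.g. a loaded TTS.api.TTS;
    `texts` maps an XTTS language code to the sentence to speak in it.
    """
    # Validate every code up front so a typo fails before any GPU work starts
    unknown = [code for code in texts if code not in XTTS_LANGUAGES]
    if unknown:
        raise ValueError(f"unsupported XTTS v2 language codes: {unknown}")
    outputs = []
    for code, text in texts.items():
        out_path = Path(out_dir) / f"cloned_{code}.wav"
        tts.tts_to_file(text=text, file_path=str(out_path),
                        speaker_wav=speaker_wav, language=code)
        outputs.append(out_path)
    return outputs
```

Usage: `speak_in_languages(tts, {"en": "Good morning.", "es": "Buenos días."}, "my_voice_sample.wav")`. Remember that each text must actually be written in its language; XTTS speaks the text you give it and does not translate.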

How do I stream XTTS audio in real time?

For interactive apps you do not want to wait for the whole clip. XTTS exposes a lower-level streaming API, inference_stream(), that yields audio chunks as they are generated — Coqui reports time-to-first-chunk under ~200 ms on a capable GPU. You drop down from the high-level TTS.api wrapper to the model classes:

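import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Assumes the XTTS v2 config.json and checkpoint files sit in ./XTTS-v2/
config = XttsConfig()
config.load_json("XTTS-v2/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="XTTS-v2/")

# Use the GPU when present; CPU works but is far from real-time
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Compute the voice conditioning once and reuse it for every request
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(
    audio_path=["my_voice_sample.wav"]
)

# inference_stream() is a generator: audio is produced as you iterate over it
chunks = model.inference_stream(
    "Streaming text to speech, one chunk at a time.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)

for i, chunk in enumerate(chunks):
    # each chunk is a torch tensor of audio you can pipe to a speaker/socket
    print(f"chunk {i}: {chunk.shape}")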

Streaming needs a GPU to feel real-time; on CPU it works but the sub-200 ms latency claim does not hold.
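In a real app those chunks usually go to an audio device or a network socket as raw PCM, and the number to watch is time to first chunk. Below is a minimal sketch of that hand-off. It assumes you convert each tensor to a list of floats first (for example `chunk.squeeze().cpu().tolist()`) and that your client expects 16-bit little-endian mono PCM at 24 kHz. `pump_stream` and the `send` callback are our own names.

```python
import array
import sys
import time
from typing import Callable, Iterable, Sequence


def pump_stream(chunks: Iterable[Sequence[float]],
                send: Callable[[bytes], None]) -> float:
    """Forward float audio chunks to `send` as 16-bit PCM bytes.

    Returns the time to first chunk in seconds, measured from when iteration
    starts (a generator does no work until it is iterated).
    """
    start = time.perf_counter()
    first_chunk_s = float("nan")
    for i, chunk in enumerate(chunks):
        if i == 0:
            first_chunk_s = time.perf_counter() - start
        # Clip to [-1, 1] and scale to int16 so loud samples do not wrap
        pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in chunk))
        if sys.byteorder == "big":
            pcm.byteswap()  # emit little-endian PCM regardless of host
        send(pcm.tobytes())
    return first_chunk_s
```

Usage with the streaming example: `ttfc = pump_stream((c.squeeze().cpu().tolist() for c in chunks), sock.sendall)`, where `sock` is a connected socket. Measuring inside the loop captures model latency plus your conversion cost, which is what a listener actually experiences.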

Common Coqui TTS errors and fixes

These four cover the vast majority of "coqui tts python" support threads:

Error / symptom	Cause	Fix
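`ModuleNotFoundError: No module named 'TTS'` after install	The install failed (often the old `TTS` package failing to build) or went into a different Python environment than the one running your script	Run `python -m pip install coqui-tts` (the fork, not `TTS`) with the same interpreter you run your code with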
Dependency / build conflicts on Python 3.12+	Old `TTS` package pins ancient deps	Uninstall `TTS`, install `coqui-tts` on Python 3.10–3.14
`weights_only` / unpickling error on load	PyTorch 2.6 changed `torch.load` to default to `weights_only=True`, which older loader code does not opt out of	Use `coqui-tts` 0.27.x, which patches the loader for modern PyTorch
Garbled / wrong-accent output	Missing or wrong `language` code	Always pass `language="en"` (etc.) — it is required for XTTS

The single most common mistake is installing the dead TTS package and then fighting dependency hell. Uninstall it, install coqui-tts, and most "it won't import" problems disappear.
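Two of those rows can be caught before you load anything: a Python version outside the supported range and a coqui-tts release older than the 0.27 line. Here is a minimal preflight sketch using only the standard library. `version_tuple` and `preflight` are our own names, and the 0.27 threshold simply follows the table above.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple


def version_tuple(v: str) -> Tuple[int, ...]:
    """Turn '0.27.5' into (0, 27, 5), stopping at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def preflight(py_version: Tuple[int, int] = sys.version_info[:2]) -> List[str]:
    """Return warnings matching rows of the error table (empty list = all clear)."""
    warnings = []
    if not (3, 10) <= tuple(py_version) <= (3, 14):
        warnings.append(
            f"Python {py_version[0]}.{py_version[1]} is outside the 3.10-3.14 range coqui-tts supports"
        )
    try:
        installed = version("coqui-tts")
    except PackageNotFoundError:
        warnings.append("coqui-tts is not installed")
    else:
        if version_tuple(installed) < (0, 27):
            warnings.append(f"coqui-tts {installed} predates 0.27; upgrade if loading hits a weights_only error")
    return warnings


for warning in preflight():
    print("warning:", warning)
```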

First-hand performance note

On my own RTX 3090 (24 GB), XTTS v2 loaded in roughly 5–8 seconds and generated a single English sentence with a cloned speaker_wav in well under a second of wall-clock time — comfortably faster than real-time playback, so a paragraph renders in a couple of seconds. Treat these as approximate single-machine figures, not a controlled benchmark: model load time, clip length and disk speed all move the numbers. On CPU the same generation ran several times slower than real-time, which is why streaming is GPU-only in practice. VRAM use sat around 4–5 GB during inference, so the model fits comfortably on an 8 GB card.
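"Faster than real-time" is usually expressed as a real-time factor (RTF): generation time divided by the duration of the audio produced, so anything below 1.0 keeps up with playback. The sketch below reproduces that kind of measurement on your own hardware. `real_time_factor` and `measure_rtf` are our own names, and the 24 kHz default matches XTTS v2 output.

```python
import time
from typing import Callable, Sequence


def real_time_factor(num_samples: int, elapsed_s: float, sample_rate: int = 24_000) -> float:
    """Generation time divided by audio duration; below 1.0 is faster than real time."""
    audio_s = num_samples / sample_rate
    if audio_s <= 0:
        raise ValueError("need a non-empty waveform to compute RTF")
    return elapsed_s / audio_s


def measure_rtf(synthesize: Callable[[], Sequence[float]], sample_rate: int = 24_000) -> float:
    """Time one synthesis call and return its real-time factor."""
    start = time.perf_counter()
    wav = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(len(wav), elapsed, sample_rate)


# Worked example: 72,000 samples at 24 kHz is 3 s of audio; 0.75 s to make it gives RTF 0.25
print(real_time_factor(72_000, 0.75))
```

Usage: `measure_rtf(lambda: tts.tts(text="A test sentence.", speaker_wav="my_voice_sample.wav", language="en"))`. Run one throwaway call first so model warm-up does not inflate the number.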

Key Takeaways

Install coqui-tts, not TTS. The original Coqui package is unmaintained since early 2024; the idiap fork (PyPI coqui-tts v0.27.5) is the live one and keeps the same TTS.api import.
tts_to_file() is the one-liner you want — pass text, file_path, language, and either a built-in speaker or a speaker_wav reference clip.
speaker_wav does zero-shot voice cloning from ~6–20 seconds of clean audio; pass a list of clips for a steadier clone.
XTTS v2 speaks 17 languages at 24 kHz and can stream with inference_stream(); Coqui reports time to first chunk under ~200 ms on a capable GPU.
Always set language. It selects the text normalization and language token XTTS is conditioned on; omitting or mismatching it is the top cause of garbled output.

Next Steps

Want the full model overview, hardware needs and license? Read our Coqui XTTS v2 model page.
Building a product on cloned voices? Check the XTTS commercial license rules first — they are stricter than most open models.
Need a step-by-step cloning walkthrough with audio prep tips? See the XTTS v2 voice cloning guide.
Confirm versions and the latest API on the idiap/coqui-ai-TTS GitHub repo and the coqui-tts PyPI page.

Coqui TTS Python Guide: pip install + XTTS API Examples

Want to go deeper than this article?

Which Coqui TTS package do I install in 2026?

Reading articles is good. Building is better.

How do I generate speech with tts_to_file()?

How do I clone a voice with speaker_wav?

Which languages does XTTS v2 support?

Reading articles is good. Building is better.

How do I stream XTTS audio in real time?

Common Coqui TTS errors and fixes

First-hand performance note

Key Takeaways

Next Steps

Voice working locally? Build the whole pipeline.

Liked this? 20 full AI courses are waiting.

Local AI Master Research Team

Build Real AI on Your Machine

Want structured AI education?

Continue Your Local AI Journey

How to Install Your First Local AI Model

How to Choose the Right AI Model for Your Computer

Comments (0)

Ready to Go Beyond Tutorials?

Go from reading about AI to building with AI

Related Guides

Coqui XTTS v2 Model Overview

XTTS Commercial License Explained

XTTS v2 Voice Cloning Guide

Written by the Local AI Master Team