To run Suno Bark locally, install it straight from the official repo with pip install git+https://github.com/suno-ai/bark.git (not pip install bark, which is a different package), then call generate_audio() in a few lines of Python. The full model wants about 12 GB of VRAM; if you have less, set SUNO_USE_SMALL_MODELS=True to fit roughly 8 GB, and add SUNO_OFFLOAD_CPU=True for GPUs under 4 GB. Bark is unusual because it generates not just speech but laughter, sighs, music and sound effects from text tags like [laughs] and [music] — its party trick, and the reason people still reach for it in 2026 even though newer models like Kokoro and Chatterbox are faster for plain narration.

Bark works on Windows, Linux and Apple Silicon, runs entirely offline once the weights are cached, and is MIT-licensed for commercial use. The catch is speed: Bark is an autoregressive transformer that is genuinely slow next to the newer lightweight models, so this guide also covers exactly when you should switch to Kokoro or Chatterbox instead.

What is Bark and what makes it different?

Bark is a text-to-audio model from Suno, released open-source under the MIT license. Unlike a classic text-to-speech engine that only reads words, Bark is a fully generative audio model: it can produce multilingual speech, but also non-verbal sounds — laughter, sighs, gasps, throat-clearing, simple music and ambient effects — directly from the text prompt. That makes it the go-to for expressive, character-driven audio rather than clean audiobook narration.

A few facts worth pinning down before you install, because the package name and model variants trip people up:

| Detail | Bark (full) | Bark-small |
|---|---|---|
| Model author | Suno (`suno/bark` on Hugging Face) | Suno (`suno/bark-small`) |
| License | MIT (commercial use OK) | MIT |
| Approx. VRAM (everything on GPU) | ~12 GB | ~8 GB (with small-models flag) |
| CPU-only RAM | ~8 GB (very slow) | ~8 GB (very slow) |
| Languages supported | 13 | 13 |
| Output sample rate | 24 kHz | 24 kHz |
| Voice cloning | No (uses built-in voice presets) | No |

The 13 supported languages are English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish and Simplified Chinese. Bark does not do zero-shot voice cloning — you steer the voice with built-in history-prompt presets (for example the `v2/en_speaker_6` series), not a reference clip.
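To make the preset mechanism concrete, here is a minimal sketch of choosing a voice by preset name. The helper names `preset_name` and `speak` are hypothetical, and the two-letter language codes and the 0-9 speaker range are assumptions based on how the `v2/` presets are commonly named; check them against the preset list in the Bark repo. The `history_prompt` argument is the parameter `generate_audio()` takes for a preset.

```python
# Assumed language codes for the 13 languages listed above (verify against the repo).
SUPPORTED_LANGS = ("en", "de", "es", "fr", "hi", "it", "ja",
                   "ko", "pl", "pt", "ru", "tr", "zh")


def preset_name(lang: str = "en", speaker: int = 6) -> str:
    """Build a preset id such as 'v2/en_speaker_6' (hypothetical helper)."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    if not 0 <= speaker <= 9:  # assumption: ten presets per language
        raise ValueError("speaker index must be between 0 and 9")
    return f"v2/{lang}_speaker_{speaker}"


def speak(text: str, lang: str = "en", speaker: int = 6):
    """Generate audio with a built-in preset voice instead of a reference clip."""
    # Imported lazily so this file can be loaded before Bark is installed.
    from bark import generate_audio

    return generate_audio(text, history_prompt=preset_name(lang, speaker))
```

Once Bark is installed, calling `speak("Guten Tag!", lang="de", speaker=3)` would request the German preset number 3. The preset steers the voice only loosely, so expect some variation between runs.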


How do you install Bark locally?

There are two supported install paths, and one common mistake. The mistake: do not run pip install bark — that pulls an unrelated package. Install from Suno's repo instead.

First create a clean environment with Python 3.10 or 3.11 and a recent PyTorch (2.0+). Then install Bark:

```bash
# Recommended: install Bark from the official repo
pip install git+https://github.com/suno-ai/bark.git

# Alternative: clone then install
git clone https://github.com/suno-ai/bark
cd bark && pip install .
```

You can also run Bark through Hugging Face Transformers (4.31.0+) if you prefer the `AutoProcessor` / `BarkModel` API — but the native `bark` package above is the simplest first run.
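For readers who do prefer the Transformers route, the following is a rough sketch of the `AutoProcessor` / `BarkModel` flow. The function name `synthesize_with_transformers` and the output filename are hypothetical; the model id `suno/bark-small` and the `voice_preset` value are illustrative choices, not requirements.

```python
def synthesize_with_transformers(
    text,
    model_id="suno/bark-small",        # illustrative; "suno/bark" is the full model
    voice_preset="v2/en_speaker_6",    # illustrative preset
    out_path="bark_hf.wav",            # hypothetical output filename
):
    """Generate speech through the Transformers Bark API and save it as a WAV."""
    # Heavy imports live inside the function so loading this file stays cheap.
    from scipy.io.wavfile import write as write_wav
    from transformers import AutoProcessor, BarkModel

    processor = AutoProcessor.from_pretrained(model_id)
    model = BarkModel.from_pretrained(model_id)

    # The processor tokenizes the text and attaches the chosen voice preset.
    inputs = processor(text, voice_preset=voice_preset)
    audio = model.generate(**inputs).cpu().numpy().squeeze()

    # The sample rate comes from the model's generation config.
    write_wav(out_path, model.generation_config.sample_rate, audio)
    return out_path
```

This path downloads the weights from Hugging Face on first use, just as the native package does, so the first call is slow.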

Now generate your first clip:

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads weights on first run, then caches them

text_prompt = "Hello, my name is Suno. [laughs] And I can also make sound effects."
audio_array = generate_audio(text_prompt)

write_wav("bark_output.wav", SAMPLE_RATE, audio_array)
```

The first call downloads several gigabytes of weights, so it is slow once. After that everything runs offline. `SAMPLE_RATE` is the 24 kHz constant Bark exports — pass it straight to your WAV writer so the pitch is correct.
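One practical wrinkle: Bark returns floating-point samples, and SciPy writes a float array as a 32-bit float WAV, which some players and editors handle poorly. If that bites you, a small sketch like the one below converts the samples to ordinary 16-bit PCM using only the standard library. The helper name `save_pcm16` is hypothetical, and the constant simply restates the 24 kHz rate mentioned above.

```python
import struct
import wave

BARK_SAMPLE_RATE = 24_000  # restates the 24 kHz rate Bark exports as SAMPLE_RATE


def save_pcm16(path, samples, sample_rate=BARK_SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames.append(int(round(clipped * 32767)))    # scale to the int16 range

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # Bark output is mono
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # WAV data is little-endian, hence the "<" in the struct format.
        wav.writeframes(struct.pack(f"<{len(frames)}h", *frames))
    return len(frames)
```

With the clip from the example above you would call `save_pcm16("bark_output_pcm16.wav", audio_array)`. Passing the wrong sample rate here changes playback speed and pitch, which is why the 24 kHz value matters.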

How do you run Bark on a low-VRAM GPU?

This is the question that decides whether Bark even starts on your machine. The full model keeps all three of Bark's transformer sub-models (semantic, coarse acoustic and fine acoustic) plus the EnCodec codec resident in VRAM at once, which takes about 12 GB. Two environment variables shrink that, and they must be set before you import Bark:

```python
import os

# Use the smaller model variants (fits ~8 GB VRAM)
os.environ["SUNO_USE_SMALL_MODELS"] = "True"

# Offload sub-models to CPU between stages (for GPUs under 4 GB VRAM)
os.environ["SUNO_OFFLOAD_CPU"] = "True"

from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()
```

Here is the honest fit guide for picking the right flags:

| Your hardware | Flags to set | What to expect |
|---|---|---|
| 12 GB+ GPU (RTX 3060 12GB and up) | None | Full model, fastest GPU path |
| 8 GB GPU (RTX 3070, 4060) | `SUNO_USE_SMALL_MODELS=True` | Small models, slight quality drop |
| Under 4 GB GPU | `SUNO_USE_SMALL_MODELS=True` + `SUNO_OFFLOAD_CPU=True` | Runs, but offload adds latency |
| No GPU (CPU only) | `SUNO_USE_SMALL_MODELS=True` (the offload flag only matters when a GPU is present) | ~8 GB RAM, very slow; minutes per clip |

On Windows you can set the same flags in a batch file with `set SUNO_USE_SMALL_MODELS=True` before launching Python, or in PowerShell with `$env:SUNO_USE_SMALL_MODELS="True"`. The small-model path is the realistic default for most consumer cards in 2026 — the quality gap is small for the expressive use cases Bark is best at.
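If you would rather pick the flags in code than by hand, the fit guide can be sketched as a small helper. The function names `bark_flags_for` and `apply_bark_flags` are hypothetical, and the handling of cards between 4 GB and 8 GB is my own conservative assumption, since the guidance above only covers "8 GB" and "under 4 GB". The helper covers GPUs only.

```python
import os


def bark_flags_for(vram_gb):
    """Map GPU memory in GB to the Bark environment flags from the fit guide."""
    if vram_gb >= 12:
        return {}  # full model fits, no flags needed
    if vram_gb >= 8:
        return {"SUNO_USE_SMALL_MODELS": "True"}
    # Under 4 GB the guide calls for both flags. Assumption: cards between
    # 4 GB and 8 GB are treated the same way to stay on the safe side.
    return {"SUNO_USE_SMALL_MODELS": "True", "SUNO_OFFLOAD_CPU": "True"}


def apply_bark_flags(vram_gb, environ=os.environ):
    """Set the flags in the environment. Call this before `import bark`."""
    flags = bark_flags_for(vram_gb)
    environ.update(flags)
    return flags
```

A script would call `apply_bark_flags(8)` as its first statement and only then import Bark, because the flags are read at import time.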

How do you run Bark on a Mac (Apple Silicon)?

Bark runs on Apple Silicon, but MPS (Metal) support is experimental, not the polished CUDA path. Some PyTorch operators Bark relies on are not implemented for MPS yet (you may hit errors like `The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device`), so you enable a CPU fallback for the gaps:

```bash
# Mac (Apple Silicon): enable experimental MPS + CPU fallback for unsupported ops
export SUNO_ENABLE_MPS=True
export PYTORCH_ENABLE_MPS_FALLBACK=1
python your_bark_script.py
```

With MPS active, inference is meaningfully faster than pure CPU, but because some stages fall back to the CPU it is still slower than an equivalent NVIDIA card. If MPS gives you trouble, plain CPU mode always works — just slowly. For an M-series Mac, also set `SUNO_USE_SMALL_MODELS=True` if you have limited unified memory.
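The same two variables can also be set from inside a Python script, which helps when one script has to run on both Macs and other machines. This is a sketch under assumptions: the helper name `mps_env` is hypothetical, and it detects Apple Silicon by checking that `platform` reports `Darwin` and `arm64`, which holds for a native (non-Rosetta) Python build.

```python
import os
import platform


def mps_env(system=None, machine=None):
    """Return the MPS-related variables on Apple Silicon, else an empty dict."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return {
            "SUNO_ENABLE_MPS": "True",           # opt in to Bark's experimental MPS path
            "PYTORCH_ENABLE_MPS_FALLBACK": "1",  # run unsupported ops on the CPU
        }
    return {}


# This must run before `import bark`, since the flags are read at import time.
os.environ.update(mps_env())
```

On Linux or Windows the helper returns an empty dict, so the update is a no-op and the rest of the script is unchanged.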


What non-speech sounds can Bark generate?

This is Bark's signature feature and the reason to use it over a plain TTS engine. You embed tags and symbols directly in the text and Bark renders them as audio:

| Tag / symbol | Effect |
|---|---|
| `[laughter]` / `[laughs]` | Laughter |
| `[sighs]` | Sighing |
| `[gasps]` | A gasp |
| `[clears throat]` | Throat-clearing |
| `[music]` | Simple background music |
| `—` or `...` | Hesitations / pauses |
| `♪ lyrics ♪` | Sung lyrics |
| `CAPITALIZATION` | Emphasis on a word |
| `[MAN]` / `[WOMAN]` | Bias toward a male/female voice |

A prompt like `"Honestly? [sighs] I did NOT see that coming. [laughs]"` produces speech with a real sigh and laugh baked in — something Kokoro and most narration-focused models simply cannot do. The trade-off is that these tags are non-deterministic: Bark sometimes ignores or over-applies them, so for production you generate a few takes and keep the best. In my own testing on an RTX 3090, a short one-sentence clip with the full model took roughly 10-20 seconds to generate (approximate, single machine, not a controlled benchmark); the small-models path was faster but the expressive tags landed slightly less reliably.
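The "generate a few takes and keep the best" workflow is easy to script. The sketch below is one way to do it, with hypothetical names throughout (`render_takes`, the `stem` filename prefix, the `synth` parameter). By default it calls Bark's `generate_audio()`; the `synth` argument exists so another generator can be swapped in.

```python
def render_takes(prompt, n_takes=3, stem="clip", synth=None):
    """Generate several takes of one prompt as (filename, audio) pairs."""
    if n_takes < 1:
        raise ValueError("n_takes must be at least 1")
    if synth is None:
        # Imported lazily so this file loads even before Bark is installed.
        from bark import generate_audio
        synth = generate_audio

    takes = []
    for i in range(1, n_takes + 1):
        # Each call samples again, so tags like [sighs] can land differently.
        takes.append((f"{stem}_take{i}.wav", synth(prompt)))
    return takes
```

You would then write each pair to disk with the WAV writer you already use, listen through the files, and keep whichever take rendered the sigh and the laugh most convincingly. Generation time scales linearly with the number of takes.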

When should you pick Kokoro or Chatterbox instead?

Bark is brilliant for expressive, sound-effect-laden audio, but it is slow and cannot clone a specific voice. Two newer open models cover those gaps:

Pick Kokoro (Kokoro-82M) for fast, clean narration. It is a tiny 82M-parameter, Apache-2.0 model (v1.0 shipped January 27, 2025 with 54 voices across 8 languages) that runs in ~2-3 GB of GPU memory and is dramatically faster than Bark — ideal for audiobooks, voiceovers and real-time-ish use. It does not do voice cloning or non-speech sound effects, so it is the opposite trade-off from Bark.
Pick Chatterbox (Resemble AI) for voice cloning. It is an MIT-licensed ~0.5B model that does zero-shot voice cloning from about 5 seconds of reference audio, with emotion control and near real-time generation. If you need a specific person's voice, Bark cannot do it and Chatterbox can.
Stick with Bark when you specifically want laughter, sighs, music and sound effects from text, or expressive character voices, and you can tolerate slower generation.

Quick decision matrix:

| Need | Best pick | Why |
|---|---|---|
| Laughs, sighs, SFX, music from text | Bark | Only one of the three that does non-speech audio |
| Fast, clean narration / audiobooks | Kokoro | 82M params, ~2-3 GB, much faster |
| Clone a specific voice | Chatterbox | Zero-shot cloning from ~5s of audio |
| Lowest VRAM footprint | Kokoro | Runs in ~2-3 GB |
| Commercial license, no fuss | Any of the three | Bark = MIT, Chatterbox = MIT, Kokoro = Apache-2.0 |

For a wider survey of every option side by side, see our roundup of the best local TTS models, and the full spec sheet on the Bark model page.

Key Takeaways

Install Bark with pip install git+https://github.com/suno-ai/bark.git — never pip install bark (that is a different package).
The full model needs ~12 GB VRAM. Set SUNO_USE_SMALL_MODELS=True for ~8 GB cards and add SUNO_OFFLOAD_CPU=True for GPUs under 4 GB — both before importing Bark.
On Mac, MPS is experimental: set SUNO_ENABLE_MPS=True plus PYTORCH_ENABLE_MPS_FALLBACK=1 so unsupported ops fall back to CPU.
Bark's superpower is non-speech audio — laughter, sighs, music and effects via tags like [laughs] and [music] — at the cost of speed and non-determinism.
Switch to Kokoro for fast narration or Chatterbox for voice cloning; keep Bark when you need expressive, sound-effect-rich output across its 13 languages.

Next Steps

Want the alternatives compared head-to-head? Read our best local TTS models guide for the full lineup and trade-offs.
Need fast, clean narration instead of effects? Follow the Kokoro local setup — the tiny 82M model that runs in ~2-3 GB.
Need a specific voice cloned? See the Chatterbox setup guide for zero-shot cloning from a few seconds of audio.
Want the deep spec sheet? The Bark model page lists languages, presets and license details.
Confirm everything against the source: the official Suno Bark GitHub repo and the suno/bark Hugging Face model card.

Run Bark AI Locally 2026: Setup on Windows, Mac & Linux


Written by the Local AI Master Team
