
Run Bark AI Locally 2026: Setup on Windows, Mac & Linux

June 20, 2026
10 min read
Local AI Master Research Team


To run Suno Bark locally, install it straight from the official repo with `pip install git+https://github.com/suno-ai/bark.git` (not `pip install bark`, which is a different package), then call `generate_audio()` in a few lines of Python. The full model wants about 12 GB of VRAM; if you have less, set `SUNO_USE_SMALL_MODELS=True` to fit roughly 8 GB, and add `SUNO_OFFLOAD_CPU=True` for GPUs under 4 GB. Bark is unusual because it generates not just speech but also laughter, sighs, music and sound effects from text tags like `[laughs]` and `[music]`. That is its party trick, and the reason people still reach for it in 2026 even though newer models like Kokoro and Chatterbox are faster for plain narration.

Bark works on Windows, Linux and Apple Silicon, runs entirely offline once the weights are cached, and is MIT-licensed for commercial use. The catch is speed: Bark is an autoregressive transformer that is genuinely slow next to the newer lightweight models, so this guide also covers exactly when you should switch to Kokoro or Chatterbox instead.

What is Bark and what makes it different?

Bark is a text-to-audio model from Suno, released open-source under the MIT license. Unlike a classic text-to-speech engine that only reads words, Bark is a fully generative audio model: it can produce multilingual speech, but also non-verbal sounds — laughter, sighs, gasps, throat-clearing, simple music and ambient effects — directly from the text prompt. That makes it the go-to for expressive, character-driven audio rather than clean audiobook narration.

A few facts worth pinning down before you install, because the package name and model variants trip people up:

| Detail | Bark (full) | Bark-small |
|---|---|---|
| Model author | Suno (`suno/bark` on Hugging Face) | Suno (`suno/bark-small`) |
| License | MIT (commercial use OK) | MIT |
| Approx. VRAM (everything on GPU) | ~12 GB | ~8 GB (with small-models flag) |
| CPU-only RAM | ~8 GB (very slow) | ~8 GB (very slow) |
| Languages supported | 13 | 13 |
| Output sample rate | 24 kHz | 24 kHz |
| Voice cloning | No (uses built-in voice presets) | No |

The 13 supported languages are English, German, Spanish, French, Hindi, Italian, Japanese, Korean, Polish, Portuguese, Russian, Turkish and Simplified Chinese. Bark does not do zero-shot voice cloning — you steer the voice with built-in history-prompt presets (for example the `v2/en_speaker_6` series), not a reference clip.
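As a rough illustration of how a preset is chosen, the sketch below builds a preset name and passes it to `generate_audio()` through its `history_prompt` argument. The helper names `speaker_preset` and `speak` are hypothetical, and the naming pattern `v2/<lang>_speaker_<0-9>` with ten presets per language is an assumption generalized from the `v2/en_speaker_6` example; check the preset library before relying on it.

```python
# Language codes for the 13 languages listed above.
BARK_LANGS = {
    "English": "en", "German": "de", "Spanish": "es", "French": "fr",
    "Hindi": "hi", "Italian": "it", "Japanese": "ja", "Korean": "ko",
    "Polish": "pl", "Portuguese": "pt", "Russian": "ru", "Turkish": "tr",
    "Simplified Chinese": "zh",
}


def speaker_preset(language: str, index: int) -> str:
    """Build a preset name such as 'v2/en_speaker_6' (assumed naming pattern)."""
    if language not in BARK_LANGS:
        raise ValueError(f"unsupported language: {language}")
    if not 0 <= index <= 9:  # assumption: ten presets per language, numbered 0-9
        raise ValueError("speaker index must be between 0 and 9")
    return f"v2/{BARK_LANGS[language]}_speaker_{index}"


def speak(text: str, language: str = "English", index: int = 6):
    """Generate audio with a preset voice. Bark is imported lazily."""
    from bark import generate_audio  # needs the bark package and its weights
    return generate_audio(text, history_prompt=speaker_preset(language, index))
```

Calling `speak("Guten Tag!", "German", 3)` would request the `v2/de_speaker_3` voice, provided that preset exists in your Bark install.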


How do you install Bark locally?

There are two supported install paths, and one common mistake. The mistake: do not run `pip install bark`, which pulls an unrelated package. Install from Suno's repo instead.

First create a clean environment with Python 3.10 or 3.11 and a recent PyTorch (2.0+). Then install Bark:

```bash
# Recommended: install Bark from the official repo
pip install git+https://github.com/suno-ai/bark.git

# Alternative: clone then install
git clone https://github.com/suno-ai/bark
cd bark && pip install .
```

You can also run Bark through Hugging Face Transformers (4.31.0+) if you prefer the `AutoProcessor` / `BarkModel` API — but the native `bark` package above is the simplest first run.
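For readers who prefer the Transformers route, the call sequence looks roughly like the sketch below. The wrapper function `synthesize_with_transformers` is a hypothetical name, and the default `suno/bark-small` checkpoint and `v2/en_speaker_6` preset are illustrative choices rather than recommendations from the Bark authors.

```python
def synthesize_with_transformers(text, voice_preset="v2/en_speaker_6",
                                 model_id="suno/bark-small"):
    """Return (audio_array, sample_rate) using the Transformers Bark classes."""
    # Imported lazily so this file still loads without transformers installed.
    from transformers import AutoProcessor, BarkModel

    processor = AutoProcessor.from_pretrained(model_id)
    model = BarkModel.from_pretrained(model_id)

    inputs = processor(text, voice_preset=voice_preset)
    audio = model.generate(**inputs)  # torch tensor holding the waveform
    sample_rate = model.generation_config.sample_rate
    return audio.cpu().numpy().squeeze(), sample_rate
```

The returned array and sample rate can be handed to the same WAV writer used with the native package.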

Now generate your first clip:

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()  # downloads weights on first run, then caches them

text_prompt = "Hello, my name is Suno. [laughs] And I can also make sound effects."
audio_array = generate_audio(text_prompt)

write_wav("bark_output.wav", SAMPLE_RATE, audio_array)
```

The first call downloads several gigabytes of weights, so it is slow once. After that everything runs offline. `SAMPLE_RATE` is the 24 kHz constant Bark exports — pass it straight to your WAV writer so the pitch is correct.
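To see why the sample rate matters, here is a small standard-library sketch that writes the same samples into two WAV headers. `BARK_SAMPLE_RATE` is a stand-in defined here for `bark.SAMPLE_RATE` so the example runs without Bark installed, and the sine tone stands in for generated speech.

```python
import io
import math
import struct
import wave

BARK_SAMPLE_RATE = 24_000  # stand-in for bark.SAMPLE_RATE (24 kHz)


def tone(seconds: float, freq_hz: float, rate: int) -> bytes:
    """Return a sine tone as 16-bit mono PCM bytes."""
    n = int(seconds * rate)
    samples = (int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n))
    return b"".join(struct.pack("<h", s) for s in samples)


def wav_duration(pcm: bytes, header_rate: int) -> float:
    """Wrap PCM in an in-memory WAV with the given header rate; return seconds."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(header_rate)
        w.writeframes(pcm)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        return r.getnframes() / r.getframerate()


pcm = tone(1.0, 440.0, BARK_SAMPLE_RATE)
correct = wav_duration(pcm, BARK_SAMPLE_RATE)  # header matches the data
wrong = wav_duration(pcm, 48_000)              # header claims twice the rate
print(correct, wrong)
```

With the mismatched header the clip plays in half the time, which a listener hears as speech that is twice as fast and an octave higher.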

How do you run Bark on a low-VRAM GPU?

This is the question that decides whether Bark even starts on your machine. By default Bark keeps its three transformer sub-models (semantic, coarse acoustic and fine acoustic) plus the EnCodec codec resident in VRAM at once, which takes about 12 GB. Two environment variables shrink that, and they must be set before you import Bark:

```python
import os

# Use the smaller model variants (fits ~8 GB VRAM)
os.environ["SUNO_USE_SMALL_MODELS"] = "True"

# Offload sub-models to CPU between stages (for GPUs under 4 GB VRAM)
os.environ["SUNO_OFFLOAD_CPU"] = "True"

from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()
```

Here is the honest fit guide for picking the right flags:

| Your hardware | Flags to set | What to expect |
|---|---|---|
| 12 GB+ GPU (RTX 3060 12GB and up) | None | Full model, fastest GPU path |
| 8 GB GPU (RTX 3070, 4060) | `SUNO_USE_SMALL_MODELS=True` | Small models, slight quality drop |
| Under 4 GB GPU | `SUNO_USE_SMALL_MODELS=True` + `SUNO_OFFLOAD_CPU=True` | Runs, but offload adds latency |
| No GPU (CPU only) | `SUNO_OFFLOAD_CPU=True` (+ small models) | ~8 GB RAM, very slow (minutes per clip) |

On Windows you can set the same flags in a batch file with `set SUNO_USE_SMALL_MODELS=True` before launching Python, or in PowerShell with `$env:SUNO_USE_SMALL_MODELS="True"`. The small-model path is the realistic default for most consumer cards in 2026 — the quality gap is small for the expressive use cases Bark is best at.
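The fit guide can also be expressed as a small helper that works the same on every operating system. This is a sketch only: `bark_flags` and `apply_bark_flags` are hypothetical names, and treating every card under 8 GB like the "under 4 GB" row is an assumption, since the guide does not cover the sizes in between.

```python
import os


def bark_flags(vram_gb):
    """Map GPU memory in GB (or None for CPU-only) to Bark environment flags."""
    both = {"SUNO_USE_SMALL_MODELS": "True", "SUNO_OFFLOAD_CPU": "True"}
    if vram_gb is None:  # no GPU at all
        return dict(both)
    if vram_gb >= 12:    # full model fits
        return {}
    if vram_gb >= 8:     # small models only
        return {"SUNO_USE_SMALL_MODELS": "True"}
    # Assumption: anything below 8 GB gets both flags.
    return dict(both)


def apply_bark_flags(vram_gb, env=os.environ):
    """Write the flags into env. Call this before `import bark`."""
    flags = bark_flags(vram_gb)
    env.update(flags)
    return flags
```

A script would call `apply_bark_flags(8)` on its first lines and only then import `bark`, because the flags are read at import time.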

How do you run Bark on a Mac (Apple Silicon)?

Bark runs on Apple Silicon, but MPS (Metal) support is experimental, not the polished CUDA path. Some PyTorch operators Bark relies on are not implemented for MPS yet (you may hit errors like `The operator 'aten::_weight_norm_interface' is not currently implemented for the MPS device`), so you enable a CPU fallback for the gaps:

```bash
# Mac (Apple Silicon): enable experimental MPS + CPU fallback for unsupported ops
export SUNO_ENABLE_MPS=True
export PYTORCH_ENABLE_MPS_FALLBACK=1
python your_bark_script.py
```

With MPS active, inference is meaningfully faster than pure CPU, but because some stages fall back to the CPU it is still slower than an equivalent NVIDIA card. If MPS gives you trouble, plain CPU mode always works — just slowly. For an M-series Mac, also set `SUNO_USE_SMALL_MODELS=True` if you have limited unified memory.
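If you would rather keep these settings inside the script than in your shell profile, a sketch along these lines sets them only on Apple Silicon. `mps_flags` is a hypothetical helper, and detecting Apple Silicon as `Darwin` plus `arm64` is an assumption that will not match a Python build running under Rosetta.

```python
import os
import platform


def mps_flags(system=None, machine=None, small_models=False):
    """Return the MPS-related env flags when running on Apple Silicon."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system != "Darwin" or machine != "arm64":
        return {}  # not Apple Silicon: leave the environment untouched
    flags = {"SUNO_ENABLE_MPS": "True", "PYTORCH_ENABLE_MPS_FALLBACK": "1"}
    if small_models:  # for Macs with limited unified memory
        flags["SUNO_USE_SMALL_MODELS"] = "True"
    return flags


# Must run before `import torch` or `import bark`, which read these variables.
os.environ.update(mps_flags())
```

On any other platform the call is a no-op, so the same script can be shared between a Mac and a CUDA machine.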


What non-speech sounds can Bark generate?

This is Bark's signature feature and the reason to use it over a plain TTS engine. You embed tags and symbols directly in the text and Bark renders them as audio:

| Tag / symbol | Effect |
|---|---|
| `[laughter]` / `[laughs]` | Laughter |
| `[sighs]` | Sighing |
| `[gasps]` | A gasp |
| `[clears throat]` | Throat-clearing |
| `[music]` | Simple background music |
| `—` or `...` | Hesitations / pauses |
| `♪ lyrics ♪` | Sung lyrics |
| `CAPITALIZATION` | Emphasis on a word |
| `[MAN]` / `[WOMAN]` | Bias toward a male/female voice |

A prompt like `"Honestly? [sighs] I did NOT see that coming. [laughs]"` produces speech with a real sigh and laugh baked in — something Kokoro and most narration-focused models simply cannot do. The trade-off is that these tags are non-deterministic: Bark sometimes ignores or over-applies them, so for production you generate a few takes and keep the best. In my own testing on an RTX 3090, a short one-sentence clip with the full model took roughly 10-20 seconds to generate (approximate, single machine, not a controlled benchmark); the small-models path was faster but the expressive tags landed slightly less reliably.
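The "generate a few takes and keep the best" workflow can be sketched as a short loop. `generate_takes` is a hypothetical helper; it accepts an injectable `generate_fn` so it can be tried without Bark installed, and it only times each take, leaving the choice of the best one to your ears.

```python
import time


def generate_takes(prompt, n_takes=3, generate_fn=None):
    """Generate several takes of one prompt; return a list of (seconds, audio)."""
    if generate_fn is None:
        from bark import generate_audio  # lazy import: needs bark and its weights
        generate_fn = generate_audio
    takes = []
    for _ in range(n_takes):
        start = time.perf_counter()
        audio = generate_fn(prompt)
        takes.append((time.perf_counter() - start, audio))
    return takes
```

With Bark installed, `generate_takes("Honestly? [sighs] I did NOT see that coming. [laughs]")` returns three candidate arrays that you can write to separate WAV files and compare by listening.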

When should you pick Kokoro or Chatterbox instead?

Bark is brilliant for expressive, sound-effect-laden audio, but it is slow and cannot clone a specific voice. Two newer open models cover those gaps:

  • Pick Kokoro (Kokoro-82M) for fast, clean narration. It is a tiny 82M-parameter, Apache-2.0 model (v1.0 shipped January 27, 2025 with 54 voices across 8 languages) that runs in ~2-3 GB of GPU memory and is dramatically faster than Bark — ideal for audiobooks, voiceovers and real-time-ish use. It does not do voice cloning or non-speech sound effects, so it is the opposite trade-off from Bark.
  • Pick Chatterbox (Resemble AI) for voice cloning. It is an MIT-licensed ~0.5B model that does zero-shot voice cloning from about 5 seconds of reference audio, with emotion control and near real-time generation. If you need a specific person's voice, Bark cannot do it and Chatterbox can.
  • Stick with Bark when you specifically want laughter, sighs, music and sound effects from text, or expressive character voices, and you can tolerate slower generation.

Quick decision matrix:

| Need | Best pick | Why |
|---|---|---|
| Laughs, sighs, SFX, music from text | Bark | Only one of the three that does non-speech audio |
| Fast, clean narration / audiobooks | Kokoro | 82M params, ~2-3 GB, much faster |
| Clone a specific voice | Chatterbox | Zero-shot cloning from ~5s of audio |
| Lowest VRAM footprint | Kokoro | Runs in ~2-3 GB |
| Commercial license, no fuss | Any of the three | Bark = MIT, Chatterbox = MIT, Kokoro = Apache-2.0 |
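For anyone wiring this choice into a script, the matrix reduces to a few branches. This is only a sketch: `pick_tts` is a hypothetical function, and returning both Bark and Chatterbox when both needs apply is an assumption, since the matrix does not name a single model for that case.

```python
def pick_tts(needs_sound_effects=False, needs_voice_cloning=False):
    """Return the model name(s) suggested by the decision matrix above."""
    if needs_sound_effects and needs_voice_cloning:
        # Assumption: no single model here covers both, so use one for each job.
        return ["Bark", "Chatterbox"]
    if needs_sound_effects:
        return ["Bark"]
    if needs_voice_cloning:
        return ["Chatterbox"]
    return ["Kokoro"]  # fast, clean narration with the lowest VRAM footprint
```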

For a wider survey of every option side by side, see our roundup of the best local TTS models, and the full spec sheet on the Bark model page.

Key Takeaways

  1. Install Bark with `pip install git+https://github.com/suno-ai/bark.git`, never `pip install bark` (that is a different package).
  2. The full model needs ~12 GB VRAM. Set `SUNO_USE_SMALL_MODELS=True` for ~8 GB cards and add `SUNO_OFFLOAD_CPU=True` for GPUs under 4 GB, both before importing Bark.
  3. On Mac, MPS is experimental: set `SUNO_ENABLE_MPS=True` plus `PYTORCH_ENABLE_MPS_FALLBACK=1` so unsupported ops fall back to CPU.
  4. Bark's superpower is non-speech audio (laughter, sighs, music and effects via tags like `[laughs]` and `[music]`), at the cost of speed and non-determinism.
  5. Switch to Kokoro for fast narration or Chatterbox for voice cloning; keep Bark when you need expressive, sound-effect-rich output across its 13 languages.

Local AI Master Research Team

Creator of Local AI Master. I've built datasets with over 77,000 examples and trained AI models from scratch. Now I help people achieve AI independence through local AI mastery.

📅 Published: June 20, 2026🔄 Last Updated: June 20, 2026✓ Manually Reviewed
