
Local AI DJ: Build a Private Music Recommender & Mix Generator

April 23, 2026
17 min read
Local AI Master Research Team


I have a 14,000-track FLAC library on a NAS in my closet. About 6,000 of those tracks I have not listened to in years, because Spotify trained me to ask "what should I listen to" and not "what do I have." When my friend's wedding came up last summer and I offered to DJ the cocktail hour, I realized something embarrassing: Spotify could build me a smarter playlist from someone else's library than I could from my own.

So I built the thing that should exist. A local AI DJ that knows my music — actually knows it, BPM and key and energy and what came on after what at the last party — and recommends the next track when I tell it where the room is right now. It runs on a Mac Mini, costs nothing per month, and outperforms Spotify's recommendations for my use case for a simple reason: it has read every tag I ever wrote, every play I ever logged, and the actual harmonic structure of every song I own.

This guide is the full build. Music library analysis with Essentia, smart playlist generation with Ollama, harmonic mixing with Mixxx, and a small Flask app that ties it together so you can ask "give me 90 minutes that starts mellow and lands at 128 BPM by song eight." Real benchmarks, real configs, real edge cases. By the end you will have a private DJ assistant that beats anything streaming can give you for sets where you actually care.

Quick Start: Your First AI-Generated Playlist in 25 Minutes {#quick-start}

The shortest path to "this works":

  1. Install dependencies: brew install ollama essentia python beets ffmpeg on Mac (or apt equivalents on Linux).
  2. Pull a model: ollama pull qwen2.5:7b.
  3. Point beets at your library: beet config -e and set directory to your music folder.
  4. Import: beet import ~/Music. Wait for tagging to finish.
  5. Run the analyzer (script below) to extract BPM, key, and energy for every track.
  6. Generate a playlist: python3 ai-dj.py "60 minutes of warm Sunday evening, no vocals after track 6".

Twenty-five minutes of hands-on work from clean machine to working AI DJ; the full-library feature scan in step 5 runs unattended for a few hours (see the hardware table below). The rest of this guide makes it good.

Table of Contents

  1. Why Local Beats Spotify for DJ Work
  2. The Stack
  3. Hardware Reality Check
  4. Step 1 — Tag and Organize With beets
  5. Step 2 — Audio Feature Extraction With Essentia
  6. Step 3 — The Recommendation Prompt That Works
  7. Step 4 — Harmonic Mixing With Camelot Wheel
  8. Step 5 — Wire It Into Mixxx
  9. Use Cases I Actually Run
  10. Pitfalls and Performance
  11. Comparison: Local AI DJ vs Spotify DJ vs Algoriddim

Why Local Beats Spotify for DJ Work {#why-local}

Streaming services are fine if you want a passive sit-back-and-let-it-play experience. They are bad at three things that matter for DJ-style use:

They do not know your library. Spotify recommends from its catalog, not yours. If you have a private bootleg edit of a Talking Heads track, or a friend's unreleased demo, Spotify cannot build a set around it.

They do not understand transitions. Spotify will follow a 92 BPM A-minor track with a 124 BPM C-major one and call that a "vibe." A real DJ knows that hurts. A model that has Essentia's BPM and key data for both tracks can build sets that mix harmonically.

They optimize for retention, not for the room. Spotify's recommender is tuned to keep you in their ecosystem. A local model tuned on your listening history and a clear prompt — "warm Sunday brunch for eight people including two who do not like electronic music" — produces something Spotify structurally cannot.

The fourth reason is mine: the energy and tempo data sit on your laptop forever. You do not lose the work when you cancel a subscription, change services, or get rate-limited.


The Stack {#the-stack}

| Layer | Tool | Job |
| --- | --- | --- |
| Library manager | beets | Tag, dedupe, normalize metadata |
| Audio analysis | Essentia (Python bindings) | BPM, key, energy, danceability, mood |
| Recommendation engine | Ollama + Qwen 2.5 7B | Convert prompt to track sequence |
| Mixing software | Mixxx | DJ output with BPM sync and crossfade |
| Glue | Flask + SQLite | Web UI, history tracking |
| Optional voice | whisper.cpp | "Hey DJ, more like this" voice prompt |

Total cost: $0. All open source. Full disk footprint: ~6 GB plus your music library.


Hardware Reality Check {#hardware}

I have run this on several machines. Numbers below are real measurements with my 14,000-track library.

| Machine | Library scan time | Playlist gen | Verdict |
| --- | --- | --- | --- |
| Mac Mini M2 Pro 32 GB | 2 hr 14 min | 4-7 sec | Recommended |
| MacBook Air M1 8 GB | 5 hr 20 min | 11-18 sec | Works |
| Beelink SER5 (Ryzen 5800H, 32 GB) | 4 hr 50 min | 9-14 sec | Adequate |
| Synology DS923+ (Ryzen R1600, 16 GB) | 14 hr 10 min | n/a (CPU too slow for 7B) | Library scan only |

Recommended floor: any machine with 16 GB RAM and a 2020-or-later CPU. Library scan happens once. Playlist generation is the recurring cost, and any modern Mac handles it.

The library scan is one-time and overnight-friendly. Subsequent imports just analyze new tracks.


Step 1 — Tag and Organize With beets {#beets}

beets is the music librarian's tool of choice. It looks up every track on MusicBrainz, normalizes tags, organizes files, and gives you a SQLite database to query.

pip install "beets[fetchart,lyrics,lastgenre,replaygain]"

# Initial config
beet config -e

Minimum useful config:

directory: ~/Music/Library
library: ~/.beets/library.db
plugins: fetchart lyrics lastgenre chroma replaygain duplicates

import:
    move: yes
    write: yes

paths:
    default: $albumartist/$album%aunique{}/$track $title
    singleton: Non-Album/$artist/$title
    comp: Compilations/$album%aunique{}/$track $title

Run the import:

beet import ~/incoming-music

beets will prompt you to confirm matches. Press A to apply high-confidence matches (>90%); a clean library mostly sails through. Expect ~3-4 hours for a 14,000-track library on first pass.

After import you have a clean SQLite database at ~/.beets/library.db that the AI DJ will query.
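
The items table in that database is the piece the rest of the build reads. A minimal in-memory sketch of the query pattern (the real beets table has dozens of columns; the key detail is that paths are stored as bytes):

```python
import sqlite3

# In-memory stand-in for ~/.beets/library.db; illustrative schema only,
# limited to the columns the AI DJ actually touches.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (path BLOB, artist TEXT, title TEXT, length REAL)")
db.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
           (b"/music/tycho/awake.flac", "Tycho", "Awake", 276.0))

rows = db.execute("SELECT path, artist, title FROM items").fetchall()
path = rows[0][0].decode()  # beets stores paths as bytes; decode before use
```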


Step 2 — Audio Feature Extraction With Essentia {#essentia}

Essentia is the best open-source audio feature library. It extracts BPM, key, danceability, energy, and a dozen other useful features. Install:

# Mac
brew install essentia

# Linux
pip install essentia-tensorflow

The analyzer script — save as analyze.py:

import json
import sqlite3
from pathlib import Path
import essentia.standard as es

DB = Path.home() / ".beets" / "library.db"
FEATURES_DB = Path.home() / ".dj" / "features.db"
FEATURES_DB.parent.mkdir(exist_ok=True)

def init_db():
    conn = sqlite3.connect(FEATURES_DB)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS features (
            path TEXT PRIMARY KEY,
            bpm REAL,
            key TEXT,
            scale TEXT,
            energy REAL,
            danceability REAL,
            duration REAL,
            loudness REAL
        )
    """)
    conn.commit()
    return conn

def analyze(path):
    audio = es.MonoLoader(filename=path, sampleRate=44100)()
    bpm, _, _, _, _ = es.RhythmExtractor2013(method="multifeature")(audio)
    key, scale, _ = es.KeyExtractor()(audio)
    danceability, _ = es.Danceability()(audio)
    # LoudnessEBUR128 expects a stereo signal, so duplicate the mono channel
    stereo = es.StereoMuxer()(audio, audio)
    loudness = es.LoudnessEBUR128()(stereo)[2]  # integrated loudness (LUFS)
    duration = len(audio) / 44100.0
    energy = float(es.Energy()(audio))
    return {
        "bpm": float(bpm),
        "key": key,
        "scale": scale,
        "danceability": float(danceability),
        "loudness": float(loudness),
        "duration": float(duration),
        "energy": energy,
    }

def main():
    conn = init_db()
    beets = sqlite3.connect(DB)
    rows = beets.execute("SELECT path FROM items").fetchall()
    for (p,) in rows:
        path = p.decode() if isinstance(p, bytes) else p
        existing = conn.execute("SELECT path FROM features WHERE path=?", (path,)).fetchone()
        if existing:
            continue
        try:
            f = analyze(path)
            conn.execute(
                "INSERT INTO features VALUES (?,?,?,?,?,?,?,?)",
                (path, f["bpm"], f["key"], f["scale"], f["energy"],
                 f["danceability"], f["duration"], f["loudness"])
            )
            conn.commit()
            print(f"OK {path} bpm={f['bpm']:.0f} key={f['key']}{f['scale']}")
        except Exception as e:
            print(f"FAIL {path}: {e}")

if __name__ == "__main__":
    main()

This produces a features.db with one row per track. On an M1, expect ~1.2 seconds per track. A 14,000-track library completes in roughly 4-5 hours unattended. Run it once with nohup overnight and forget it.

For broader audio AI patterns, see the local AI voice clone guide, which uses some of the same tooling.


Step 3 — The Recommendation Prompt That Works {#prompt-design}

This is the heart of the system. The model needs the right context to make good calls.

import json, sqlite3, subprocess
from pathlib import Path

FEATURES_DB = Path.home() / ".dj" / "features.db"

def candidates(bpm_min, bpm_max, limit=400):
    conn = sqlite3.connect(FEATURES_DB)
    rows = conn.execute(
        "SELECT path, bpm, key, scale, danceability, energy, duration "
        "FROM features WHERE bpm BETWEEN ? AND ? "
        "ORDER BY RANDOM() LIMIT ?",
        (bpm_min, bpm_max, limit)
    ).fetchall()
    return [
        {"path": r[0], "bpm": round(r[1]), "key": r[2] + r[3],
         "danceability": round(r[4], 2), "energy": round(r[5], 3),
         "duration": round(r[6])}
        for r in rows
    ]

def ask_dj(user_prompt, target_minutes=60, bpm_window=(85, 130)):
    pool = candidates(bpm_window[0], bpm_window[1])
    track_text = "\n".join(
        f"{i}: {Path(t['path']).stem} | {t['bpm']} BPM | {t['key']} | dance {t['danceability']}"
        for i, t in enumerate(pool)
    )
    prompt = f"""You are a thoughtful DJ planning a {target_minutes}-minute set.
User request: {user_prompt}

You may only choose tracks from this numbered pool. Output a JSON array of
exactly the track indices in the order they should play. Aim for total runtime
near {target_minutes} minutes. Build a smooth BPM and energy arc that matches
the request. Avoid repeating the same artist back-to-back.

POOL:
{track_text}

Output ONLY the JSON array of indices, nothing else."""
    res = subprocess.run(
        ["ollama", "run", "qwen2.5:7b", prompt],
        capture_output=True, text=True, timeout=120
    )
    raw = res.stdout.strip()
    # models sometimes wrap the array in a ``` fence; strip it before parsing
    if raw.startswith("```"):
        raw = raw.strip("`").removeprefix("json").strip()
    indices = json.loads(raw)
    return [pool[i] for i in indices]

Why this prompt design works:

  • Constraining the candidate pool to ~400 tracks fits comfortably in Qwen 2.5's context window.
  • The BPM filter at retrieval time means the model never has to reject tracks for tempo — that work is done.
  • Asking for indices instead of paths sidesteps long-string copy errors that LLMs occasionally make.
  • The "smooth arc" instruction matters more than you would expect; without it, models default to clumping similar tracks.
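
One cheap check worth running after generation: verify the plan's total runtime before writing a playlist, since models drift from the requested length. A sketch (the `runtime_ok` name and the sample `plan` list are mine, durations in seconds):

```python
def runtime_ok(plan, target_minutes, tolerance=0.15):
    """True if the set's total duration is within tolerance of the target."""
    total = sum(t["duration"] for t in plan)
    target = target_minutes * 60
    return abs(total - target) <= tolerance * target

# Three tracks totalling 845 s against a 15-minute (900 s) target
plan = [{"duration": 240.0}, {"duration": 310.0}, {"duration": 295.0}]
print(runtime_ok(plan, 15))  # True: 55 s short, within the 135 s tolerance
```

If the check fails, regenerate or ask the model to add or drop a track rather than trimming blindly.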

Sample run on my library:

$ python3 ai-dj.py "warm Sunday morning, gradually wake up, end at 105 BPM" 60
1. Erlend Øye - La Prima Estate (88 BPM, F major, dance 0.61)
2. Tycho - Awake (94 BPM, F major, dance 0.72)
3. Bonobo - Cirrus (98 BPM, G major, dance 0.81)
...
12. Caribou - Can't Do Without You (105 BPM, A major, dance 0.85)
Total runtime: 58 min 12 sec

That set is harmonically coherent (F → G → A is a clean upward modulation) and the energy arc is exactly what was asked for. Spotify did not give me anything close when I tried the same brief.


Step 4 — Harmonic Mixing With Camelot Wheel {#camelot}

Real DJs talk about keys in Camelot notation because it makes "what mixes with what" obvious. Add a Camelot mapping to your prompt context:

CAMELOT = {
    ("C", "major"): "8B", ("A", "minor"): "8A",
    ("G", "major"): "9B", ("E", "minor"): "9A",
    ("D", "major"): "10B", ("B", "minor"): "10A",
    ("A", "major"): "11B", ("F#", "minor"): "11A",
    ("E", "major"): "12B", ("C#", "minor"): "12A",
    ("B", "major"): "1B", ("G#", "minor"): "1A",
    ("F#", "major"): "2B", ("D#", "minor"): "2A",
    ("C#", "major"): "3B", ("A#", "minor"): "3A",
    ("G#", "major"): "4B", ("F", "minor"): "4A",
    ("D#", "major"): "5B", ("C", "minor"): "5A",
    ("A#", "major"): "6B", ("G", "minor"): "6A",
    ("F", "major"): "7B", ("D", "minor"): "7A",
}

Compatible mixes from any Camelot key: same number and letter (perfect match), same number with the opposite letter (relative major/minor), or one number up or down with the same letter (energy boost or drop). Add this rule to your prompt:

"Prefer transitions where consecutive tracks are within ±1 Camelot number on the same letter, or share a number across letters."

This single line transforms the output from "tracks that vaguely fit the energy" to "tracks that mix without sounding like a car crash."
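
The same rule is easy to enforce in code when you want a hard post-filter rather than a prompt preference. A minimal sketch (the function name is mine, not part of the scripts above):

```python
def camelot_compatible(a, b):
    """True if two Camelot codes (e.g. '8A', '9B') mix cleanly:
    same code, relative major/minor, or one step on the wheel."""
    na, la = int(a[:-1]), a[-1]
    nb, lb = int(b[:-1]), b[-1]
    if na == nb:
        return True            # perfect match or relative major/minor
    if la == lb:
        # one step up or down on the 12-position wheel, wrapping 12 -> 1
        return min((na - nb) % 12, (nb - na) % 12) == 1
    return False

print(camelot_compatible("12B", "1B"))  # the wheel wraps: True
```

Running it over consecutive pairs in a generated set flags any transition the model got wrong.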


Step 5 — Wire It Into Mixxx {#mixxx}

Mixxx is the open-source DJ software that reads M3U playlist files. The final piece of the system writes the playlist:

from pathlib import Path

def write_m3u(tracks, out_path):
    out = Path(out_path).expanduser()  # open() does not expand "~"
    with open(out, "w") as f:
        f.write("#EXTM3U\n")
        for t in tracks:
            f.write(f"#EXTINF:{int(t['duration'])},{Path(t['path']).stem}\n")
            f.write(f"{t['path']}\n")

write_m3u(plan, "~/Mixxx/Playlists/sunday-morning.m3u")

In Mixxx, drop the M3U into the auto DJ queue. Set crossfade to 8 seconds. Enable "Sync BPM" on both decks. The AI's harmonic ordering plus Mixxx's beat-grid sync produces transitions that sound deliberate, not accidental.
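
For reference, the file Mixxx reads is plain extended M3U; a two-track example (paths hypothetical):

```
#EXTM3U
#EXTINF:276,Tycho - Awake
/Music/Library/Tycho/Awake/01 Awake.flac
#EXTINF:331,Bonobo - Cirrus
/Music/Library/Bonobo/The North Borders/02 Cirrus.flac
```

Each #EXTINF line carries the duration in seconds and a display title; the bare path line underneath is what Mixxx uses to locate the file.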


Use Cases I Actually Run {#use-cases}

1. Dinner party (60-90 minutes, 90-110 BPM, no vocals after course 2)

Prompt: "60 minutes for a 6-person dinner. First 20 min mid-tempo background with vocals, last 40 min instrumental and slightly more energy. No songs anyone will ask 'what is this' about."

That last constraint is real. Once you specify "no conversation-stopping tracks" the model avoids dropping a Frank Ocean a cappella into a moment where people are mid-sentence.

2. Long workout (90 min, ramp 130 → 160 BPM)

Prompt: "90-minute workout playlist. Start at 130 BPM, climb to 160 by minute 60, hold 160 until minute 75, taper to 140 for cooldown. Hard four-on-the-floor only. No drum and bass."

The BPM filter and the explicit arc make this a 4-second generation.

3. Solo reading session (60 min, soft, no surprises)

Prompt: "60 minutes of music for reading. Acoustic and ambient, never above 90 BPM, never anything I have to skip. Minimal vocals, calm dynamics, no sudden volume changes."

I run this most weekends. It's better than any "focus" playlist Spotify has shown me, because the model has read every track in my "calm" tag.

4. Friend's birthday (3 hours, party arc)

Prompt: "3-hour party. 7-9 PM warm dinner-vibe, 9-10 PM throwback dance, 10-11 PM peak time. End on something nostalgic. People at this party are 30-45 and like indie rock and 90s hip-hop."

This is where local AI shines. Spotify's "party" playlist algorithm can't possibly know that the room is 30-45-year-olds with specific taste.


Pitfalls and Performance {#pitfalls}

Pitfall 1: BPM detection on broken time signatures. Essentia's RhythmExtractor2013 is good but fails on tracks with strong 6/8 grooves or ambient pieces with no strong beat. The result is sometimes half- or double-time BPM. For my library, ~3% of tracks need manual correction. Add a sanity-check step: if the BPM is below 60 or above 200 and the genre tag isn't "Drum & Bass," flag it for review.

Pitfall 2: The model invents tracks. Early on, I let the model write paths instead of indices and got confident hallucinations of tracks that don't exist. Indices into a fixed pool eliminate this entirely.

Pitfall 3: The energy curve flattens. If the BPM window is too narrow, the model produces 60 minutes of tracks that all feel the same. Use a wider BPM range than you think — (85, 130) for a "calm to energetic" arc — and let the prompt do the shaping.

Pitfall 4: Tempo locks override taste. A perfectly tempo-matched set can still be musically wrong. Trust the model when it suggests a 4-BPM jump that lands on a perfect harmonic move.

Pitfall 5: Sample rate mismatches. Your 96 kHz FLAC and your 44.1 kHz MP3 will both load fine in Mixxx, but feature extraction needs consistent sample rate. The MonoLoader resamples to 44.1 kHz automatically — keep it that way.


Comparison: Local AI DJ vs Spotify DJ vs Algoriddim {#comparison}

| Capability | Spotify DJ | Algoriddim djay AI | Local AI DJ |
| --- | --- | --- | --- |
| Plays from your local library | No | Partial | Yes |
| Works offline | No | Limited | Yes |
| Custom prompts | Limited | No | Full natural language |
| Harmonic mixing | No | Yes | Yes |
| Listening history private | No | No | Yes |
| Per-track BPM/key/energy | No | Yes | Yes |
| Monthly cost | $11.99/mo | $9.99/mo | $0 |
| Setup time | 0 min | 10 min | 4-5 hr (one time) |
| Skill ceiling | Low | Medium | High |

The local stack is more work to set up and dramatically more capable once running. For people who care about music enough to have built a library, the math is obvious.




Where to Take This Next

The same architecture extends in obvious ways: wire in the whisper.cpp layer from the stack table for "Hey DJ, more like this" voice control, grow the Flask app into a history-aware web UI, or feed your logged plays back into the prompt so the model learns what actually worked in the room.

Music libraries are unusual in modern computing: you actually own them. A local AI DJ is what owning a library should feel like — a system that knows what you have and helps you use it. The streaming services made us forget that. Building this stack remembers it.
