AI on QNAP and TrueNAS: Turn Your NAS Into a Private AI Server
Published April 23, 2026 • 18 min read
Your NAS already runs 24/7. It already has decent CPU, ECC memory on the better units, and on midrange-and-up models there is even a free PCIe slot for a GPU. There is no reason it cannot also be the always-on private AI server for your household, your homelab, or a small business. We deployed Ollama and Open WebUI on three real units — a QNAP TS-464 (Celeron), a QNAP TVS-h674 (Core i5), and a TrueNAS SCALE box with a Ryzen 5 and an RTX 3060 12 GB — and have honest numbers and step-by-step recipes for each.
Quick Start: Ollama on Your NAS in 7 Minutes
If you have a QNAP with Container Station or TrueNAS SCALE with Apps enabled, this is the fastest path:
# SSH into the NAS
ssh admin@nas.local
# Pull and run Ollama (CPU mode, works on any NAS)
docker run -d \
--name ollama \
--restart unless-stopped \
-v /share/Container/ollama:/root/.ollama \
-p 11434:11434 \
ollama/ollama
# Pull a small model
docker exec ollama ollama pull llama3.2:3b
# Test from another machine on the LAN
curl http://nas.local:11434/api/generate \
-d '{"model":"llama3.2:3b","prompt":"hello"}'
That is the bare minimum. On a Celeron-class CPU expect 1-3 tokens/sec on the 3B model. On a Ryzen 5 with discrete GPU passthrough, expect 50-95 tokens/sec. The remainder of this guide explains how to get from "it works" to "the family uses it daily" — including GPU passthrough, Open WebUI, automatic startup, and exposing it safely to clients.
Table of Contents
- Is Your NAS Powerful Enough?
- QNAP Container Station Setup
- TrueNAS SCALE Setup
- GPU Passthrough on TrueNAS
- Adding Open WebUI for a ChatGPT Interface
- Persistence and Auto-Start
- Network Exposure: LAN-Only vs VPN vs Public
- Benchmarks: Three Real NAS Boxes
- Pitfalls We Hit
- FAQ
Is Your NAS Powerful Enough? {#nas-power}
NAS units split sharply into "fine for tiny models" and "fine for real models" based on three factors: CPU class, available RAM, and whether there is a PCIe slot for a GPU.
| NAS Class | Example | Realistic AI Workload |
|---|---|---|
| Celeron / Atom 4-core, 4-8 GB RAM | QNAP TS-464, Synology DS923+ | Phi-3 mini Q4 at 2-4 tok/s. Toy use only. |
| Core i3/i5, 8-16 GB RAM | QNAP TVS-h674, Synology DS1823xs+ | 7B Q4 at 5-9 tok/s on CPU. Single user. |
| Ryzen / Xeon, 32+ GB RAM, PCIe slot | TrueNAS custom, ASUSTOR Lockerstor Gen 2 | 7B-13B with GPU. Family/team. |
| Ryzen + RTX 3060 12 GB | TrueNAS custom build | 7B at 55 tok/s, 13B at 22 tok/s. Real assistant. |
If your NAS is a 4-core ARM unit (Synology DS220j, QNAP TS-233, etc.), do not bother. Buy a $250 mini PC instead and let your NAS keep doing what it is good at. For mini PC options see our best mini PC for Ollama review.
If your NAS is x86 with 8 GB+ RAM, keep reading.
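Not sure which tier you are in? Two minutes over SSH will tell you. These are standard Linux commands available on both QTS and SCALE, though a few may be missing or have fewer flags on older QNAP firmware builds:
# CPU model and core count
grep "model name" /proc/cpuinfo | head -1
nproc
# Total RAM
free -h | grep Mem
# Any discrete GPU already installed?
lspci | grep -iE "vga|3d|nvidia"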
QNAP Container Station Setup {#qnap-setup}
QNAP's Container Station is essentially a managed Docker. The fastest path is the CLI; the GUI is fine but harder to script.
Step 1: Install Container Station
App Center → Container Station → Install. Reboot if prompted (some firmware versions require it).
Step 2: Enable SSH and connect
Control Panel → Network & File Services → Telnet/SSH → Allow SSH. Default port 22. Connect as the admin user:
ssh admin@your-qnap.local
Step 3: Create persistent storage
Use the Container shared folder created by Container Station. Subfolder for Ollama:
mkdir -p /share/Container/ollama
chmod 755 /share/Container/ollama
Step 4: Run Ollama
docker run -d \
--name ollama \
--restart unless-stopped \
-v /share/Container/ollama:/root/.ollama \
-p 11434:11434 \
-e OLLAMA_NUM_PARALLEL=2 \
-e OLLAMA_KEEP_ALIVE=24h \
-e OLLAMA_HOST=0.0.0.0:11434 \
ollama/ollama
OLLAMA_HOST=0.0.0.0 is the important bit — without it, Ollama binds only to the container's localhost and cannot be reached from your LAN clients.
Step 5: Pull a model and test
Pick a model based on your CPU:
# Celeron / Atom: keep it small
docker exec -it ollama ollama pull llama3.2:3b
# Core i5+ / Ryzen: 7B is realistic
docker exec -it ollama ollama pull qwen2.5:7b-instruct-q4_K_M
# Test
docker exec -it ollama ollama run llama3.2:3b "Say hello in 10 words"
Step 6: Verify LAN access
From any other device on the network:
curl http://your-qnap.local:11434/api/tags
You should see JSON listing the model you pulled.
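The exact fields change between Ollama versions, but the response looks roughly like this (abridged, values elided):
{"models":[{"name":"llama3.2:3b","modified_at":"…","size":…,"digest":"…"}]}
If the models array is empty, the pull did not finish; re-run the docker exec pull command.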
TrueNAS SCALE Setup {#truenas-setup}
TrueNAS SCALE is Debian-based and has first-class Docker support since the Electric Eel release. The path is cleaner than QNAP because you have a real Linux shell.
Step 1: Enable Apps and Docker
In the SCALE UI: Apps → Configure Pool. Pick a pool with at least 50 GB free; SCALE creates its Apps/Docker dataset on that pool.
Step 2: Create a dataset for Ollama
Storage → Pools → tank → Add Dataset. Name: ollama. ZFS settings: recordsize=128K, atime=off. Then:
sudo chown -R apps:apps /mnt/tank/ollama
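If you prefer the shell over the UI, the equivalent dataset creation (assuming the pool is named tank, as elsewhere in this guide) is:
sudo zfs create -o recordsize=128K -o atime=off tank/ollama
Then apply the same chown as above.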
Step 3: Deploy via the custom-app YAML
TrueNAS SCALE's Apps section accepts custom Docker compose. Apps → Discover Apps → Custom App → Install via YAML:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_KEEP_ALIVE=24h
Step 4: Confirm and pull a model
sudo docker exec -it ollama ollama pull qwen2.5:7b-instruct-q4_K_M
That is the CPU-only setup complete. For real performance you want GPU passthrough — TrueNAS makes this comparatively easy.
GPU Passthrough on TrueNAS {#gpu-passthrough}
A modest GPU transforms a NAS from "AI is slow" to "AI is genuinely useful." The cheapest sensible option is a used RTX 3060 12 GB at around $230 — the 12 GB VRAM is the magic number for 7B Q4 models. NVIDIA cards work via the official container toolkit. AMD support exists via ROCm but is significantly more painful; we recommend NVIDIA for NAS use.
Step 1: Install the GPU physically
Power down. Add the GPU to the PCIe slot. Verify the NAS PSU has the right power connectors (RTX 3060 needs an 8-pin). Many off-the-shelf NAS units do not have spare PCIe power cables — check before you buy.
Step 2: Verify TrueNAS sees the GPU
sudo lspci | grep -i nvidia
# Should show: ... NVIDIA Corporation GA106 [GeForce RTX 3060 12GB]
Step 3: Install NVIDIA drivers
In SCALE 24.10+ this is a checkbox. First confirm the GPU is NOT isolated for VM passthrough under System Settings → Advanced → Isolated GPU Devices (we want SCALE itself to use it). Then enable the "Install NVIDIA Drivers" checkbox in the Apps settings; the exact location varies slightly between releases.
Older SCALE versions may need:
sudo apt update
sudo apt install nvidia-driver nvidia-container-toolkit
sudo systemctl restart docker
nvidia-smi # should now print GPU info
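Before touching the Ollama app, confirm that containers can reach the GPU at all. A common smoke test is to run nvidia-smi from a throwaway CUDA container (the image tag here is just an example; any CUDA base image works):
sudo docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
# Should print the same GPU table as nvidia-smi on the host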
Step 4: Update the compose file
Edit your custom app YAML to add GPU access:
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - /mnt/tank/ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
      - OLLAMA_NUM_PARALLEL=2
      - OLLAMA_KEEP_ALIVE=24h
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Redeploy. Verify the model is using GPU:
sudo docker exec -it ollama ollama ps
# Look for the SIZE column matching VRAM, and PROCESSOR showing 100% GPU
If ollama ps shows 100% CPU instead of 100% GPU, the container is not seeing the GPU — usually a driver-version mismatch or missing nvidia-container-toolkit.
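Two quick checks usually pinpoint which side is broken. Both assume the NVIDIA container toolkit is installed, which mounts nvidia-smi into GPU-enabled containers; the exact log wording varies between Ollama releases:
# Can the Ollama container itself see the GPU?
sudo docker exec -it ollama nvidia-smi
# Did Ollama detect a CUDA device at startup?
sudo docker logs ollama 2>&1 | grep -i cuda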
Step 5: Throughput sanity check
sudo docker exec -it ollama ollama run qwen2.5:7b-instruct-q4_K_M "Write a haiku about backups" --verbose
The --verbose flag prints throughput at the end. Expect 50-60 tok/s on RTX 3060 with the 7B model. If you see less than 20 tok/s, the model spilled to system RAM — pick a smaller model or smaller quant.
Adding Open WebUI for a ChatGPT Interface {#webui}
The Ollama API works for SDKs, but the household will not use it directly. Add Open WebUI for a polished browser interface.
Compose snippet (TrueNAS custom app or QNAP Container Station)
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - /mnt/tank/open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    depends_on:
      - ollama
For QNAP, replace /mnt/tank/open-webui with /share/Container/open-webui and create the directory first.
Network mode gotcha
If open-webui and ollama are in separate Container Station projects, they cannot reach each other by container name. Use your NAS's LAN IP, or http://host.docker.internal:11434 where your Docker version supports it (on Linux this typically needs an extra host-gateway mapping). The cleanest fix is to keep both services in one compose file so they share a Docker network.
First login
Browse to http://your-nas.local:3000. The first account created is automatically the admin. Disable open registration in Settings → Authentication so only invited users can sign up. For multi-user RAG and per-user document libraries, we walk through the full Open WebUI setup in Open WebUI complete guide.
Persistence and Auto-Start {#persistence}
NAS uptime is the killer feature. Make sure your AI containers actually come back up after firmware updates and power events.
QNAP
# Verify autostart
docker inspect ollama | grep -A 1 RestartPolicy
# Should show: "Name": "unless-stopped"
# Test by simulating a crash
docker kill ollama
# Wait 5 seconds
docker ps | grep ollama # should be running again
QNAP firmware updates sometimes restart Container Station, which restarts Docker. unless-stopped handles this fine.
TrueNAS SCALE
SCALE's Apps engine handles restarts automatically. After a SCALE upgrade, custom apps deployed via YAML occasionally need to be re-applied. We script our YAML in a git repo and re-deploy after every upgrade — takes 30 seconds, eliminates surprises.
Health monitoring
Add a tiny cron job on the NAS that pings the API and restarts the container if it stops answering:
#!/bin/bash
# /share/Container/ollama-healthcheck.sh
RESPONSE=$(curl -s -m 10 http://localhost:11434/api/tags || echo "FAIL")
if [[ "$RESPONSE" == "FAIL" ]]; then
docker restart ollama
echo "$(date): Ollama restarted" >> /share/Container/ollama-restart.log
fi
Schedule every 5 minutes via the NAS's cron UI. Cheap insurance.
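If your NAS exposes a raw crontab rather than a cron UI, the equivalent entry (using the QNAP path from the script header; adjust for TrueNAS) is:
*/5 * * * * /bin/bash /share/Container/ollama-healthcheck.sh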
Network Exposure: LAN-Only vs VPN vs Public {#network}
Decide intentionally. The right answer for most readers is "LAN-only" or "VPN."
LAN-only (recommended default)
This is the default behavior of the configs above: http://nas.local:11434 is reachable from anywhere on your home network and nowhere else, as long as you do not forward the port at your router. No further work needed.
VPN access (recommended for remote)
WireGuard or Tailscale. Both NAS platforms run Tailscale well: install the Tailscale app on QNAP or enable it on TrueNAS, note the MagicDNS hostname, and your phone or laptop on cellular can hit http://nas.tailnet:3000 securely without exposing anything to the public internet.
Public exposure (not recommended)
If you really need this, do not just port-forward 11434. Put a reverse proxy (Caddy or nginx) in front, terminate TLS with a real cert, require basic auth or OAuth, and rate-limit aggressively. Treat Ollama as authenticated infrastructure. The patterns from our securing Ollama guide apply unchanged on a NAS.
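As a rough sketch of the proxy-plus-auth part in Caddy (hostname and username are placeholders; the password hash comes from caddy hash-password):
ai.example.com {
    basicauth {
        alice <hash-from-caddy-hash-password>
    }
    reverse_proxy localhost:3000
}
This proxies to Open WebUI on port 3000 and never exposes the raw Ollama port. Rate limiting is not in the standard Caddy build; use a plugin or a firewall rule in front.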
For wiring other internal services on your network into Ollama (including NFS-mounted shares), see our Ollama production deployment playbook.
Benchmarks: Three Real NAS Boxes {#benchmarks}
All tests use ollama run --verbose on a 200-token completion of the same prompt. Models pre-warmed.
QNAP TS-464 (Celeron N5095, 8 GB RAM, no GPU)
| Model | Tokens/sec | Notes |
|---|---|---|
| Phi-3 mini 3.8B Q4 | 3.1 | Usable for short replies |
| Llama 3.2 3B Q4 | 2.4 | Slow but works |
| Llama 3.1 8B Q4 | 0.8 | Painful, do not use |
| Qwen 2.5 7B Q4 | 0.9 | Painful |
Verdict: Celeron NAS units are CPU-bound to small models. Fine for kids' homework helper, not fine for serious work.
QNAP TVS-h674 (Core i5-12400, 16 GB RAM, no GPU)
| Model | Tokens/sec | Notes |
|---|---|---|
| Llama 3.2 3B Q4 | 14.2 | Snappy |
| Mistral 7B Q4 | 7.8 | Single-user comfortable |
| Qwen 2.5 7B Q4 | 6.2 | Quality > speed pick |
| Llama 3.1 8B Q4 | 5.6 | Workable |
Verdict: A real Core i5 NAS handles single-user 7B fine. A family of four asking concurrent questions will queue.
TrueNAS SCALE custom (Ryzen 5 5600, 32 GB RAM, RTX 3060 12 GB)
| Model | VRAM | Tokens/sec | Notes |
|---|---|---|---|
| Llama 3.2 3B Q4 | 2.5 GB | 95.0 | Nearly instant |
| Mistral 7B Q4 | 4.6 GB | 64.0 | Fast |
| Qwen 2.5 7B Q4 | 5.4 GB | 55.3 | Recommended default |
| Llama 3.1 8B Q4 | 5.7 GB | 50.8 | Workhorse |
| Llama 2 13B Q4 | 8.8 GB | 22.1 | Comfortable on 12 GB GPU |
| Mixtral 8x7B Q4 | 26 GB | 11 (GPU+CPU split) | Edge of feasibility |
Verdict: A homebrew TrueNAS with a $230 used 3060 is a legitimate household-scale AI server. We have run a five-person family on this exact setup for six months without complaint. Power draw idle: 38 W. Under load: 165 W. Annual electricity (24/7 US average): about $90.
For a more powerful card, our used GPU buying guide covers RTX 3090 24 GB at $700 — the same TrueNAS chassis runs 70B-class models on that GPU.
Pitfalls We Hit {#pitfalls}
1. NAS firmware updates rebooting Docker. QNAP firmware upgrades occasionally reset Container Station's network bridge. After every QTS update, verify http://nas.local:11434 still answers from your LAN. Once we lost an hour debugging a "broken" Ollama that turned out to be a firewall rule reverted by a firmware patch.
2. ZFS dataset record size for model files. Default ZFS recordsize 128K is fine for model weights. If you set 4K thinking "my apps are small," sequential reads on a 7B model file go from 6 seconds to 90 seconds. Stick to 128K (or 1M) for the Ollama dataset.
3. atime=on bleeding throughput. Default datasets have access-time updates on. Every model load writes back metadata. atime=off knocks 10-15% off model load latency (see the commands after this list).
4. Container Station's default network bridge. QNAP's default bridge can be slower than host networking. For Ollama, switch to host networking if you see CPU-bound throughput on a powerful CPU — it eliminates an iptables NAT hop (see the example after this list).
5. SCALE Apps engine reset. Major SCALE version upgrades (e.g. Cobia → Dragonfish) sometimes wipe the Apps engine. Custom apps via YAML need re-applying. Keep your YAML in a git repo. We learned this the unfun way.
6. The "where does the SSH go" trap on QNAP. docker commands as the regular admin user often fail with permission errors. Either prefix with sudo (where allowed) or use Container Station's web SSH which has the right privileges.
7. Power and thermals. Most pre-built NAS chassis have minimal airflow over a PCIe slot. An RTX 3060 idling at 12 W is fine. Same card at 165 W under sustained AI load can heat-soak the chassis and cause SATA controller errors. Add a $20 case fan if your NAS will run AI workloads daily.
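Two of the pitfalls above come down to one-liners. For the ZFS dataset (pitfalls 2 and 3), check and correct the properties; for bridge overhead on QNAP (pitfall 4), re-run the container with host networking. Both sketches assume the paths and names used earlier in this guide:
# Check and fix ZFS properties on an existing dataset
sudo zfs get recordsize,atime tank/ollama
sudo zfs set atime=off tank/ollama
sudo zfs set recordsize=128K tank/ollama   # only affects newly written files
# Recreate the QNAP container with host networking (no -p mapping needed)
docker rm -f ollama
docker run -d --name ollama --restart unless-stopped \
  --network host \
  -v /share/Container/ollama:/root/.ollama \
  -e OLLAMA_HOST=0.0.0.0:11434 \
  ollama/ollama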
Frequently Asked Questions {#faq}
Q: Can I run AI on a 4-core ARM Synology like a DS220j?
Technically yes. Practically no. The ARM cores in these units produce around 0.5 tokens/sec on a 3B Q4 model — slower than typing speed. Get a $250 mini PC instead.
Q: Does QNAP's QuTS hero (ZFS) make a difference for AI?
ZFS prefetching helps model load times. The ARC cache effectively keeps recently used models hot in RAM. On a 32 GB QuTS hero, we observed 40% faster cold-start of the second-most-recent model. No effect on inference once loaded.
Q: Will an RTX 3060 work in any NAS?
Three checks. First, the chassis must have a PCIe x16 slot with a real x16 link (some NAS units expose x4 electrically). Second, the PSU must have an 8-pin GPU power connector — most NAS PSUs do not. Third, the case must be tall enough for a dual-slot card. Many off-the-shelf NAS units fail check 2 or 3.
Q: Can I run multiple GPUs in TrueNAS SCALE for bigger models?
Yes. Add both GPUs in the YAML (count: 2). Ollama splits a 70B model across the pair when neither has enough VRAM alone. Performance is roughly 0.7x of a single GPU with enough VRAM, so two 12 GB cards are slower than one 24 GB card for a model that fits the latter.
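The only change from the single-GPU compose is the device reservation; a count of 2 (or explicit device_ids) hands both cards to the container:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]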
Q: How does this compare to running AI on the Mac as a server?
A Mac Studio with 64 GB of unified memory runs 70B-class models comfortably and has a quiet, low-power profile. A NAS with a discrete GPU is cheaper for equivalent throughput at 7B-13B scale. If you already own the NAS, adding a GPU is the cheapest path. If you are starting from scratch and AI is the main workload, see our Mac local AI setup guide for the alternative.
Q: Will running AI on the NAS slow down file serving?
Under normal load, no. Ollama is GPU-bound; SMB and NFS are network and disk bound. We measured 4 KB random read IOPS on a SATA pool with and without 7B inference active — variation was below 2%. Avoid co-locating AI and heavy SMB write workloads (video editing, Time Machine backups) on the same disks.
Q: Can I use the NAS GPU for both Ollama and Plex transcoding?
Plex hardware transcode uses NVENC, which is a separate engine on the GPU. In theory both can run concurrently. In practice, sustained AI load can starve Plex of memory bandwidth and cause stuttering. If both matter, get two GPUs or pick one.
Q: What about running on Unraid instead of TrueNAS?
Unraid has excellent GPU passthrough and a strong community for AI containers. Setup is similar to TrueNAS SCALE — Docker + nvidia-container-toolkit. The benchmarks above translate directly.
Conclusion
A NAS earns its place in the rack by running quietly, all the time, with redundant storage. Adding Ollama and Open WebUI turns it into the always-on private AI server every household and homelab should have. On a Celeron unit you get a toy. On a real x86 NAS with a midrange CPU you get a usable single-user assistant. Add a $230 used GPU and you get a household-scale assistant that runs 7B and 13B models faster than most cloud free tiers, never sends your data anywhere, and costs roughly $7/month to run.
If you are ready to scale beyond a single household, our Ollama production deployment and AI gateway with LiteLLM guides cover putting your NAS behind a real reverse proxy with auth and routing. For picking the right model for the hardware tier you have, see AI on 16GB RAM.
Want more weird-but-practical local AI deployments delivered weekly? Subscribe to the LocalAIMaster newsletter — we write about the setups nobody else does.