AI on Synology NAS: Self-Host Ollama on DSM (2026 Guide)
Published on April 23, 2026 - 19 min read
Quick Start: Ollama in Container Manager in 5 Minutes
Synology DSM 7.2 ships with Container Manager (the rebranded Docker package). On a Plus-series NAS with 8GB+ RAM, you can run a 3B-parameter model on CPU at usable speeds and serve it over your LAN. Steps:
- Install Container Manager from Package Center
- Create a folder /volume1/docker/ollama via File Station
- SSH in and run a single docker run command (below)
- Pull a model: docker exec ollama ollama pull phi3:mini
- Test from any LAN device: curl http://nas.local:11434/api/generate
That gets you a private, always-on AI endpoint that any device on your network can use without sending a byte to the cloud.
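If you want the single docker run command from step 3 spelled out right away, a minimal version looks like this (it assumes the /volume1/docker/ollama folder from step 2; the full command with memory caps and keep-alive tuning is in the Deploying Ollama section below):
sudo docker run -d --name ollama --restart always \
  -p 11434:11434 \
  -v /volume1/docker/ollama:/root/.ollama \
  ollama/ollama:latest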
What this guide covers:
- Compatible Synology models and CPU requirements (x86_64 only)
- DSM 7.2 Container Manager configuration with persistent volumes
- Memory sizing - which models fit on 4GB, 8GB, 16GB, and 32GB upgrades
- Open WebUI deployment on the same NAS
- DSM reverse proxy with HTTPS and Let's Encrypt
- Real benchmarks across DS220+, DS923+, DS1522+, and DS1821+
A Synology Plus-series NAS is one of the most underrated pieces of hardware for hosting a small private AI. It is already running 24/7, already has gigabit networking, already has redundant storage, and the better models (DS923+, DS1522+, DS1821+) ship with x86_64 CPUs and DDR4 ECC slots that scale to 32GB. You spend zero additional electricity beyond the few watts the inference job adds, and every machine on your LAN gets a private GPT-class endpoint at http://nas.local:11434.
For the broader self-hosting decision tree, pair this with the air-gapped AI deployment guide and Open WebUI setup guide. For hardware sizing, the AI hardware requirements guide compares NAS, mini-PC, and full desktop builds.
Table of Contents
- Why Run AI on Your NAS
- Compatibility - Which Synology Models Work
- RAM Sizing for AI Workloads
- Container Manager Setup
- Deploying Ollama
- Adding Open WebUI
- DSM Reverse Proxy and HTTPS
- Benchmarks Across Models
- Storage and Snapshot Strategy
- Pitfalls and Fixes
Why Run AI on Your NAS {#why-nas}
Three reasons people pick a NAS over a mini-PC or repurposed desktop:
Already running. A Synology Plus-series NAS spends most of its life at 8-15W idle. Adding an LLM container brings it to 20-35W under inference load, which adds maybe $2-4/month on a typical electricity rate. Versus standing up a separate mini-PC, you save the $400 hardware cost and the 8-15W idle baseline.
Storage adjacency. If you are running Photos, Drive, or Note Station on the same NAS, your AI sees that data over a Unix socket, not a network share. RAG pipelines that index your photos for "find that screenshot from last March" or your notes for semantic search run dramatically faster when the model and the data live on the same machine.
True private network endpoint. The Synology NAS sits behind your router, not exposed to the internet. Combined with DSM's Let's Encrypt integration and reverse proxy, you get HTTPS internally without anything ever touching a public cloud.
The catch: NAS CPUs are slow by AI-rig standards. The fastest Synology consumer CPU (Ryzen R1600 in DS923+ / DS1522+) does roughly half the inference throughput of a desktop Ryzen 5600. You will run quantized 3B-7B models, not 70B behemoths. For most home and small-team use cases, that is enough.
Compatibility - Which Synology Models Work {#compatibility}
DSM's Container Manager runs standard Docker images, and Ollama publishes x86_64 (amd64) and arm64 builds. Synology's ARM-based NAS units (Realtek RTD1296, RTD1619B) are still out, though: Synology only offers Container Manager on its Intel and AMD models, and those ARM chips have neither the RAM nor the CPU throughput for usable inference. Stick to Plus-series and Value-series units with Intel or AMD CPUs.
| Model | CPU | Default RAM | Max RAM | AI-Capable? |
|---|---|---|---|---|
| DS220+ | Intel J4025 | 2GB | 6GB | Yes (small models only) |
| DS224+ | Intel J4125 | 2GB | 6GB | Yes (small models only) |
| DS423+ | Intel J4125 | 2GB | 6GB | Yes |
| DS920+ | Intel J4125 | 4GB | 8GB | Yes |
| DS923+ | AMD R1600 | 4GB | 32GB | Yes (best mainstream) |
| DS1522+ | AMD R1600 | 8GB | 32GB | Yes (excellent) |
| DS1621+ | AMD V1500B | 4GB | 32GB | Yes |
| DS1821+ | AMD V1500B | 4GB | 32GB | Yes (top tier) |
| DS3622xs+ | Intel D-1531 | 16GB | 48GB | Yes (enterprise) |
| Any j-suffix model (DS220j, etc.) | Realtek/Marvell ARM | varies | varies | No |
The DS923+ at $600 with a 16GB RAM upgrade ($60-80 for a Crucial CT16G4SFRA32A or compatible) is the sweet spot. You get a Ryzen with AVX2/FMA support, 16GB total memory, and four drive bays for around $700 all-in.
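Not sure what your unit's CPU supports? A quick check over SSH (a sketch - avx2 and fma in the output mean the faster math kernels are available; the older Celeron J-series units are still x86_64 but lack AVX2 and run slower):
grep -oE 'avx2|fma' /proc/cpuinfo | sort -u   # expect "avx2" and "fma" on R1600/V1500B units
uname -m                                      # must print x86_64 for the Ollama image to run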
RAM Sizing for AI Workloads {#ram-sizing}
DSM itself uses around 1.5GB for the OS, indexing services, and built-in apps. Plan your model around what is left.
| Total NAS RAM | Reserved for DSM | Available for AI | Realistic Models |
|---|---|---|---|
| 4 GB | 1.5 GB | 2.5 GB | Phi-3 Mini Q4 (tight), Gemma 2B Q4 |
| 8 GB | 1.5 GB | 6.5 GB | Phi-3 Mini, Llama 3.2 3B, Mistral 7B Q4 (tight) |
| 16 GB | 2.0 GB | 14 GB | Mistral 7B, Llama 3.1 8B, Gemma 2 9B |
| 32 GB | 2.5 GB | 29 GB | Llama 3.1 13B, Qwen 14B, Mixtral 8x7B Q3 |
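Before committing to a model, check what the NAS actually has free with all your other packages running. Over SSH (and as a rough rule of thumb, a Q4-quantized model needs roughly 0.6 GB per billion parameters, plus about 1 GB of working overhead):
free -h   # the "available" column is what Ollama can realistically claim
          # keep ~1 GB of headroom for DSM services on top of the model footprint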
Synology officially supports specific RAM SKUs only and may complain if you install third-party RAM. The hardware accepts standard DDR4 SODIMMs (DS923+, DS1522+) or DDR4 ECC UDIMMs (DS1821+, DS3622xs+). DSM displays a warning at boot for unsupported RAM but boots and runs normally. I have run Crucial 16GB and 32GB modules in DS923+ and DS1821+ for over a year with zero issues.
Caveat: third-party RAM voids Synology's warranty support for memory-related issues. If you need warranty coverage on the RAM itself, buy Synology-branded modules.
Container Manager Setup {#container-manager}
DSM 7.2 ships Container Manager. If yours says "Docker" instead, upgrade DSM first.
- Open Package Center, search for Container Manager, and install it.
- Open Container Manager > Settings > General. Enable "Auto-start container after reboot."
- Open File Station and create /volume1/docker/ollama and /volume1/docker/open-webui.
- Note your NAS IP from Control Panel > Network. We will use 192.0.2.10 as a placeholder below - replace it with yours.
Enable SSH (Control Panel > Terminal & SNMP > Enable SSH). Connect from your workstation:
ssh admin@192.0.2.10 -p 22
You will land in your home directory. Sudo to root for docker commands:
sudo -i
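Before deploying anything, it is worth confirming the Docker engine behind Container Manager is reachable from the root shell - a quick sanity check:
docker version                      # client and server sections should both print
docker info | grep -i 'root dir'    # image storage location, normally under /volume1/@docker on DSM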
Deploying Ollama {#deploy-ollama}
Run Ollama as a long-lived container with a persistent volume so models survive restarts:
docker run -d \
--name ollama \
--restart always \
-p 11434:11434 \
-v /volume1/docker/ollama:/root/.ollama \
-e OLLAMA_HOST=0.0.0.0 \
-e OLLAMA_KEEP_ALIVE=30m \
-e OLLAMA_NUM_PARALLEL=2 \
--memory="12g" \
--memory-swap="12g" \
ollama/ollama:latest
Notes on the flags:
- OLLAMA_HOST=0.0.0.0 makes the API listen on all interfaces so other LAN devices can reach it
- OLLAMA_KEEP_ALIVE=30m keeps loaded models in RAM for half an hour after the last request, eliminating cold-start latency for back-to-back chats
- The --memory cap prevents Ollama from spilling into DSM territory and triggering memory pressure on your NAS services
- The -v mount means models are stored on the NAS volume, not inside the container - reinstalling the container does not re-download them
Pull a model:
docker exec -it ollama ollama pull phi3:mini
docker exec -it ollama ollama pull llama3.2:3b
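To confirm the downloads landed on the persistent volume, list what is installed and what is currently loaded in memory:
docker exec -it ollama ollama list   # installed models and their on-disk size
docker exec -it ollama ollama ps     # models currently held in RAM (populated after the first request)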
Test it from your workstation, not the NAS:
curl http://192.0.2.10:11434/api/generate -d '{
"model": "phi3:mini",
"prompt": "Write a haiku about a NAS",
"stream": false
}'
You should see a JSON response with the generated text and timing information. Tokens per second is eval_count divided by eval_duration (reported in nanoseconds), multiplied by 1e9.
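If you would rather not do that division by hand, a small sketch using jq on your workstation (jq is an assumption here - it is not installed on the NAS by default):
curl -s http://192.0.2.10:11434/api/generate \
  -d '{"model": "phi3:mini", "prompt": "Write a haiku about a NAS", "stream": false}' \
  | jq '.eval_count / .eval_duration * 1e9'   # prints tokens per second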
Adding Open WebUI {#open-webui}
Ollama's API is fine for tools like Continue.dev or Obsidian plugins. For a chat interface accessible from any device on your LAN, deploy Open WebUI alongside it:
docker run -d \
--name open-webui \
--restart always \
-p 3000:8080 \
-v /volume1/docker/open-webui:/app/backend/data \
-e OLLAMA_BASE_URL=http://192.0.2.10:11434 \
-e WEBUI_AUTH=true \
--add-host=host.docker.internal:host-gateway \
ghcr.io/open-webui/open-webui:main
Visit http://192.0.2.10:3000 from any browser on your LAN. Create an admin account on first visit. Open WebUI auto-discovers the Ollama endpoint and lists your installed models.
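If you prefer Container Manager's Project feature (docker compose) over two separate docker run commands, the same stack can be written as one compose file. This is a sketch equivalent to the commands above - the /volume1/docker/ai-stack path is an assumption, and OLLAMA_BASE_URL can point at the service name ollama because compose puts both containers on a shared network:
mkdir -p /volume1/docker/ai-stack
cat > /volume1/docker/ai-stack/docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: always
    ports:
      - "11434:11434"
    volumes:
      - /volume1/docker/ollama:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_KEEP_ALIVE=30m
      - OLLAMA_NUM_PARALLEL=2
    mem_limit: 12g
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "3000:8080"
    volumes:
      - /volume1/docker/open-webui:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    depends_on:
      - ollama
EOF
Create a new Project in Container Manager, point it at that folder, and it will bring both containers up together and restart them in the right order after a DSM update.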
For a deeper Open WebUI configuration walkthrough including RAG over your documents, image analysis, and multi-user permissions, see the dedicated guide.
DSM Reverse Proxy and HTTPS {#reverse-proxy}
Open WebUI on plain HTTP is fine for a private LAN, but if you want to use Synology's Let's Encrypt cert and a clean URL like https://ai.yourdomain.com, use the built-in reverse proxy.
- DSM Control Panel > Login Portal > Advanced > Reverse Proxy
- Click Create
- Source:
  - Protocol: HTTPS
  - Hostname: ai.yourdomain.com (or ai.local for LAN-only)
  - Port: 443
- Destination:
  - Protocol: HTTP
  - Hostname: localhost
  - Port: 3000
- Custom Header tab > Create > WebSocket. This is required for Open WebUI's streaming responses.
- Click OK.
Enable HSTS in DSM Network > DSM Settings if you control DNS. Generate a Let's Encrypt cert via Control Panel > Security > Certificate > Add > Get a certificate from Let's Encrypt.
For LAN-only deployments where you do not have a domain, edit your router's DNS to point ai.local at the NAS IP, then use a self-signed certificate.
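To confirm the rule is wired up before opening a browser, two quick checks from your workstation (substitute your real hostname; add -k to curl if you went the self-signed route):
curl -sI https://ai.yourdomain.com | head -n 3   # expect a 200 or a redirect to the login page
curl -sI http://192.0.2.10:3000 | head -n 1      # the backend should still answer directly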
Benchmarks Across Models {#benchmarks}
All benchmarks measured on stock CPU inference (no GPU offload available), 256-token output, 16K context window, default sampling parameters, fresh model load.
| NAS | RAM | Model | Tok/s | First Token | Verdict |
|---|---|---|---|---|---|
| DS220+ | 6 GB | Phi-3 Mini Q4 | 6.2 | 3.1s | Tight, usable for short prompts |
| DS220+ | 6 GB | Llama 3.2 3B Q4 | 4.8 | 3.8s | Borderline, prefer Phi-3 |
| DS923+ | 16 GB | Phi-3 Mini Q4 | 14.1 | 1.6s | Smooth |
| DS923+ | 16 GB | Llama 3.2 3B Q4 | 11.7 | 1.9s | Daily driver |
| DS923+ | 16 GB | Mistral 7B Q4 | 6.8 | 3.4s | Reasoning quality, acceptable speed |
| DS923+ | 32 GB | Llama 3.1 8B Q4 | 5.9 | 4.1s | Best quality, patience required |
| DS1522+ | 32 GB | Mistral 7B Q4 | 7.2 | 3.2s | Same CPU as DS923+, slightly more headroom |
| DS1821+ | 32 GB | Llama 3.1 13B Q4 | 4.3 | 5.8s | Possible but slow |
| DS1821+ | 32 GB | Mixtral 8x7B Q3 | 3.1 | 7.2s | At the edge of usable |
A few takeaways:
- The DS923+ with 16GB RAM and Phi-3 Mini is the value sweet spot. 14 tok/s is faster than human reading speed and the first-token latency under 2 seconds feels interactive.
- Going from 16GB to 32GB on the DS923+ does not speed up smaller models - they already fit. It opens the door to 8B and 13B models at the cost of ~3x slower generation.
- The DS1821+ has more cores but the same per-core speed. Throughput on a single inference is identical to DS923+; you only benefit if you serve multiple concurrent requests.
For comparison, the same Mistral 7B Q4 hits 18 tok/s on a Ryzen 5600 desktop and 35 tok/s on an M2 Air. NAS inference is roughly 2.5x slower than a budget desktop, but the NAS is already running 24/7.
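To reproduce these numbers on your own unit, Ollama's CLI prints timing statistics with the --verbose flag - roughly the method used for the table (fresh model load, short prompt):
docker exec -it ollama ollama run phi3:mini --verbose "Summarize what a NAS does in three sentences."
# the "eval rate" line at the bottom of the output is the tokens-per-second figure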
Storage and Snapshot Strategy {#storage}
Models are large but immutable once on disk. A few practical rules:
Put models on an SSD volume if possible. Even on Plus-series NAS units with rotational drives, you can add a single NVMe drive and create a separate Btrfs volume for Docker and AI models. Model load time for a 7B model drops from 18 seconds (HDD) to 3 seconds (SSD). After loading, inference speed is identical because everything lives in RAM.
Disable Btrfs snapshots on the model directory. Models do not change after download. Snapshotting them wastes space and adds I/O for no benefit. In DSM > Snapshot Replication, exclude /volume1/docker/ollama/models.
Backup the Open WebUI volume, not the Ollama volume. Models are re-downloadable; chat history and RAG indexes are not. Use Hyper Backup to back up only /volume1/docker/open-webui.
Plan disk space for model collection. A typical home AI setup ends up with Phi-3 (2GB), Llama 3.2 3B (2GB), Mistral 7B (4GB), an embedding model (300MB), and maybe Stable Diffusion (4GB). Budget 15-20 GB minimum.
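Two quick checks to see where that space is actually going and what Hyper Backup needs to cover (paths match the folders created earlier):
du -sh /volume1/docker/ollama/models   # re-downloadable - excluded from snapshots and backups
du -sh /volume1/docker/open-webui      # chat history and RAG indexes - this is the part to back up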
Pitfalls and Fixes {#pitfalls}
Container Manager says "image cannot be pulled." DNS issue. DSM > Control Panel > Network > General > set DNS to 1.1.1.1 and 8.8.8.8.
docker: command not found. You are not running as root or sudo. Run sudo -i first. Synology DSM does not put docker on the default user PATH.
Ollama crashes after pulling a 7B model. Out of memory. Check free -h. If "available" drops below 1GB during model load, lower the Ollama --memory cap or pick a smaller model.
LAN devices cannot reach port 11434. Synology firewall. Control Panel > Security > Firewall > Edit Rules > Allow > Custom Ports > TCP > 11434. Apply to your LAN profile only, never to the WAN profile.
Token generation is 3x slower than expected. Check whether Synology Photos or Surveillance Station is actively re-indexing in the background. Both are CPU-heavy and steal cycles from inference. Pause indexing during heavy AI use.
Open WebUI shows "no models found". OLLAMA_BASE_URL pointing to localhost from inside the Open WebUI container resolves to the container itself, not the Ollama container. Use the NAS IP or http://host.docker.internal:11434 with --add-host=host.docker.internal:host-gateway.
HTTPS reverse proxy strips streaming responses. You forgot the WebSocket custom header. Edit the reverse proxy rule, add Custom Header > WebSocket.
RAM upgrade not detected. Synology firmware whitelists. Do a full power cycle (not just reboot - shutdown, unplug, wait 30 seconds, plug back in). DSM redetects memory at cold boot.
Frequently Asked Questions
Q: Will running AI shorten the lifespan of my NAS drives?
A: No. AI inference reads model files into RAM once and runs entirely in memory afterward. Your drives sit idle during the actual inference. The only sustained drive activity is during model downloads, which is no different from a Plex library scan.
Q: Can I use a Synology DiskStation as a GPU passthrough box?
A: No. None of the consumer Synology models support GPU passthrough or PCIe expansion. For GPU-accelerated inference you need a custom build or mini-PC. See the AI hardware requirements guide for alternatives.
Q: How does this compare to running Ollama on a Raspberry Pi 5?
A: A Pi 5 with 8GB RAM hits roughly 4 tok/s on Phi-3 Mini. A DS923+ with 16GB hits 14 tok/s. The NAS wins on both speed and RAM ceiling. The Pi wins on power consumption (5W vs 25W) and price ($80 vs $700).
Q: Can multiple users hit the same Ollama endpoint at once?
A: Yes, with caveats. Ollama queues requests by default. Set OLLAMA_NUM_PARALLEL=2 or higher to allow concurrent inference, but expect throughput per user to drop proportionally - the NAS CPU is the bottleneck.
Q: Will this work over Synology Drive or SMB?
A: The Ollama API speaks HTTP, not SMB. You access it from clients via HTTP (curl, Open WebUI, Continue.dev, etc.). Drive and SMB are for file sharing, which is unrelated.
Q: Is the data really private if I use Synology QuickConnect?
A: QuickConnect routes your traffic through Synology relay servers when direct connections fail. For genuinely private AI, disable QuickConnect on the AI subdomain or use the LAN-only reverse proxy setup. The Ollama models themselves never call home.
Q: Can I run Stable Diffusion on a Synology NAS?
A: Yes for SD 1.5 at very slow speeds (90+ seconds per image at 512x512 on a DS923+). Practically not usable. SDXL needs a real GPU. The NAS is best for text models only.
Q: What about Plex Hardware Transcoding sharing the same Intel iGPU?
A: Plex transcoding uses Intel Quick Sync, which is a fixed-function video block, not the general-purpose GPU. It does not interfere with CPU-based AI inference, even when both run simultaneously.
Conclusion
A Synology Plus-series NAS is a $600-1500 box that already lives in your home, already runs 24/7, and already has a stable network presence. Adding Ollama via Container Manager turns it into a private LAN AI endpoint for the cost of a $60 RAM upgrade and ten minutes of setup. You get GPT-3.5-class quality from Mistral 7B at 7 tokens per second, fully offline, with HTTPS, with redundant storage, and with whatever else you already host on the same hardware.
The right pick for most home users is a DS923+ with 16GB or 32GB of third-party RAM running Phi-3 Mini for fast responses and Mistral 7B for harder questions. Layer Open WebUI on top for a clean interface, point a DSM reverse proxy at it for HTTPS, and you have a self-hosted Anthropic-or-OpenAI replacement for routine work that never leaves your network.
For the next step, check out which Ollama models fit your memory budget and the Open WebUI configuration deep dive.
Want more self-hosted and home-lab AI guides? Join the LocalAIMaster newsletter for weekly hardware and deployment walkthroughs.