Can I Run AI on Ubuntu? Yes — Here's Exactly How (2026)
Published June 20, 2026 • 11 min read
Yes — you can absolutely run AI on Ubuntu, and it is arguably the single best operating system for it. Ubuntu installs the most popular local-AI runtime, Ollama, in one command (curl -fsSL https://ollama.com/install.sh | sh), runs entirely offline, and supports both NVIDIA (CUDA) and AMD (ROCm) GPU acceleration natively. A machine with just 8GB of RAM and no GPU can run a 3B–8B model on CPU today, while a 24GB GPU like the RTX 3090 runs a full 32B model at roughly 40 tokens/second. Ubuntu 22.04, 24.04, and 26.04 LTS all use the same installer and service layout, so the steps below work on every supported release.
If you have ever searched "can I run AI on Ubuntu" and only found a how-to that skipped the actual yes/no, this guide answers the question directly first, then walks you through the exact commands — driver setup, model selection by hardware tier, and verified throughput numbers — so you can be generating text locally in about ten minutes.
Can Ubuntu actually run local AI models?
Yes. Ubuntu is the reference platform for most of the local-AI ecosystem. Ollama, llama.cpp, vLLM, and Hugging Face Transformers all ship first-class Linux support and are frequently developed and tested on Ubuntu first. The official Ollama Linux installer creates a systemd service, binds the local API to 127.0.0.1:11434, and auto-detects your GPU. You need no cloud account and no API key, and no data leaves your machine.
Three things make Ubuntu a strong AI host:
- One-command install. No package juggling: the upstream script installs the binary, the systemd service, and the bundled GPU runtime. The only separate step is the GPU driver setup covered below.
- Native GPU acceleration. Ollama bundles its own CUDA runtime for NVIDIA cards, so you only install the driver — not the full CUDA Toolkit. AMD GPU acceleration via ROCm is Linux-only, and Ubuntu is its best-supported home.
- Lower overhead than Windows. A headless Ubuntu Server install leaves more RAM and VRAM free for the model than a desktop OS.
How do I install AI on Ubuntu? (the 3-command version)
The fastest path to running AI locally on Ubuntu is Ollama. On a fresh Ubuntu 22.04, 24.04, or 26.04 system:
```bash
# 1. Install Ollama (installs the systemd service + GPU runtime)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Confirm it installed
ollama --version

# 3. Pull and run your first model
ollama run llama3.1:8b "Explain what a transformer model is in two sentences."
```
That is genuinely it. The first run downloads the model (Llama 3.1 8B is about 4.9GB at the default Q4 quantization), then prints the answer to the quoted prompt and exits. Run ollama run llama3.1:8b without a prompt to get an interactive chat session instead. If curl is not installed, add it first with sudo apt update && sudo apt install -y curl.
First-hand note: On a clean Ubuntu 24.04 box (Python 3.12 default) the whole sequence above — install through first token — took us under five minutes on a 100 Mbps connection, the bulk of it spent downloading the model weights, not configuring anything.
Do I need a GPU, or will CPU-only work?
You do not need a GPU. Ollama runs every model on CPU if no supported GPU is found — it is just slower. A modern 6–8 core CPU with 16GB of RAM comfortably runs 7B–8B models for chat and coding help; expect single-digit to low-double-digit tokens/second rather than the 40–60+ you would see on a mid-range GPU.
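To get a feel for what those throughput ranges mean in practice, the arithmetic is simply answer length divided by tokens per second. The sketch below assumes a 300-token answer (an illustrative figure of ours, roughly a few paragraphs) and plugs in speeds from the ranges quoted above.

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` output tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


if __name__ == "__main__":
    answer_tokens = 300  # assumed answer length, for illustration only
    for label, tps in [("CPU, low end", 5), ("CPU, high end", 15), ("mid-range GPU", 40)]:
        secs = seconds_to_generate(answer_tokens, tps)
        print(f"{label:>14}: {secs:.1f} s for a {answer_tokens}-token answer")
```

At 5 tokens/second that answer takes a full minute; at 40 it takes 7.5 seconds, which is the difference between "usable" and "feels instant".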
A GPU matters most for (a) responsiveness on larger models and (b) fitting bigger models at all. The rule of thumb: the model's quantized file size must fit in your VRAM (plus a little headroom for context). A Q4_K_M build of an 8B model is roughly 5GB, a 32B model roughly 22–24GB.
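The rule of thumb above can be written as a one-line check. This is a rough sketch, not a precise memory model: the 1.5GB default headroom is our assumption for "a little headroom for context", and real usage grows with context length.

```python
def fits_in_vram(model_file_gb: float, vram_gb: float, headroom_gb: float = 1.5) -> bool:
    """Rule of thumb: quantized file size plus some headroom must fit in VRAM.

    headroom_gb is an assumed allowance for context; tune it for your workload.
    """
    return model_file_gb + headroom_gb <= vram_gb


if __name__ == "__main__":
    # File sizes taken from the approximate figures quoted in this article.
    print(fits_in_vram(5, 8))    # 8B model on an 8GB card
    print(fits_in_vram(22, 24))  # 32B model on a 24GB card
    print(fits_in_vram(22, 16))  # 32B model on a 16GB card
```

When the check fails, Ollama can still run the model by splitting it between GPU and CPU, but you lose most of the speed advantage.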
NVIDIA (CUDA) vs AMD (ROCm): how do I set up the driver?
NVIDIA — CUDA
You only need the NVIDIA driver. Ollama ships its own CUDA runtime, so skip the full CUDA Toolkit unless you are compiling other AI software.
```bash
# See which driver Ubuntu recommends for your card
ubuntu-drivers devices

# Install the recommended branch (works on 22.04, 24.04, and 26.04)
sudo ubuntu-drivers install
sudo reboot

# After reboot, confirm the GPU is visible
nvidia-smi
```
(On Ubuntu 22.04 and 24.04 the older sudo ubuntu-drivers autoinstall still works as a deprecated alias, but install is the current command and is the one that exists on 26.04.)
If nvidia-smi prints your GPU, driver version, and CUDA version without errors, Ollama will use the GPU automatically — no extra config. (On recent Ubuntu LTS releases the proprietary driver ships in the restricted repository component, so the default repo is usually all you need.)
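If you want the same check from a script, for example to read the total VRAM before choosing a model, nvidia-smi can emit CSV. The sketch below uses its --query-gpu and --format options; the helper names are ours, and the function simply returns an empty list when no working driver is found.

```python
import shutil
import subprocess


def parse_gpu_lines(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' CSV rows (no header, no units) into (name, MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, mem_mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem_mib.strip())))
    return gpus


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Return visible NVIDIA GPUs, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for name, mem_mib in list_nvidia_gpus():
        print(f"{name}: {mem_mib / 1024:.0f} GiB VRAM")
```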
AMD — ROCm
AMD acceleration in Ollama is Linux-only, and Ubuntu is the best-supported platform. RDNA 3 cards (RX 7900 XTX, 7900 XT, 7800 XT) have solid support.
The amdgpu-install helper is not in Ubuntu's default repositories — you grab the small .deb from AMD's ROCm repo first, then run it. Browse AMD's ROCm repo to copy the exact current filename for your Ubuntu codename (noble = 24.04, jammy = 22.04); the example below uses a recent build for 24.04.
```bash
# 1. Download AMD's installer .deb (grab the current filename from the repo above)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/noble/amdgpu-install_7.2.3.70203-1_all.deb
sudo apt install -y ./amdgpu-install_7.2.3.70203-1_all.deb

# 2. Install the ROCm use case
sudo amdgpu-install --usecase=rocm

# 3. Give your user GPU access, then reboot
sudo usermod -aG render,video "$USER"
sudo reboot
```
Ollama auto-detects the ROCm environment after that — no extra flags needed. For Ollama specifically, NVIDIA still has the software-maturity edge (wider model compatibility and faster driver cadence), but AMD on Linux with RDNA 3 now delivers competitive performance at a better price-per-GB-of-VRAM.
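A common snag on the AMD path is that the render and video group membership has not taken effect in your session yet. Here is a small sketch that checks the groups of the current process; it assumes a Linux system and uses only the standard library, and the helper names are ours.

```python
import grp
import os

REQUIRED_GROUPS = {"render", "video"}  # the groups added by the usermod step above


def missing_gpu_groups(user_groups: set[str]) -> set[str]:
    """Return the required groups that are absent from user_groups."""
    return REQUIRED_GROUPS - user_groups


def current_group_names() -> set[str]:
    """Names of the groups the current process actually belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            continue  # gid with no name entry (can happen in containers)
    return names


if __name__ == "__main__":
    missing = missing_gpu_groups(current_group_names())
    if missing:
        print("Not yet in:", ", ".join(sorted(missing)), "(log out and back in, or reboot)")
    else:
        print("Group membership looks right for GPU access")
```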
Which models can I run on my hardware? (the tier table)
This is the question that actually decides your experience. The table below maps common hardware tiers to models that fit and the approximate throughput you can expect. File sizes are for Q4_K_M quantization, the default Ollama ships for most models. Treat tokens/second as approximate: the figures vary with context length, quantization, and exact GPU.
| Hardware tier | Typical GPU / setup | Models that fit (Q4_K_M) | Approx. file size | Approx. speed |
|---|---|---|---|---|
| CPU-only | 16GB RAM, no GPU | Llama 3.2 3B, Phi-3 Mini, Mistral 7B | ~2–4.5GB | ~5–15 tok/s |
| 8GB GPU | RTX 3060 / 4060 | Llama 3.1 8B, Qwen2.5 7B, Gemma 2 9B | ~4.5–5.5GB | ~30–55 tok/s |
| 16GB GPU | RTX 4060 Ti 16GB / 4070 Ti S | Qwen2.5 14B, Gemma 2 9B, CodeLlama 13B | ~8–9GB | ~25–45 tok/s |
| 24GB GPU | RTX 3090 / 4090 | Qwen2.5 32B, Gemma 2 27B, DeepSeek-R1 32B distill | ~16–24GB | ~35–45 tok/s |
A few practical notes from running these:
- 8GB is the sweet spot for getting started. Llama 3.1 8B (~4.9GB at Q4) is one of the most widely used local models and leaves room for a healthy context window on an 8GB card.
- 16GB lets you step up to a 14B without paying for a 24GB card. Qwen2.5 14B is a strong general/coding pick here.
- 24GB unlocks 32B-class models, which is where local quality starts feeling genuinely close to small cloud models for many tasks.
First-hand benchmark: On an RTX 3090 (24GB), running qwen2.5:32b-instruct-q4_K_M through Ollama, we measured roughly 40 tokens/second of generation. The model loaded into about 22GB of VRAM, leaving just enough headroom for an 8K context. Treat these figures as approximate: your exact numbers depend on context length and background VRAM use.
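You can reproduce this kind of measurement yourself. A non-streaming response from Ollama's /api/generate endpoint includes eval_count (tokens generated) and eval_duration (in nanoseconds), so generation speed is one division. The sample values below are made up for illustration and are not our benchmark data.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    seconds = response["eval_duration"] / 1e9
    return response["eval_count"] / seconds


if __name__ == "__main__":
    # Hypothetical response fields, chosen only to show the arithmetic.
    sample = {"eval_count": 400, "eval_duration": 10_000_000_000}
    print(f"{tokens_per_second(sample):.1f} tokens/second")
```

Run the same prompt a few times and average the results, since the first call also pays the cost of loading the model into VRAM.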
How do I pick and pull the right model?
Match the model to your GPU tier above, then pull it. A few commands to copy:
```bash
# CPU-only or 8GB GPU: fast, capable all-rounder
ollama pull llama3.1:8b

# 8GB GPU: strong multilingual / coding option
ollama pull qwen2.5:7b

# 16GB GPU: step up to 14B
ollama pull qwen2.5:14b

# 24GB GPU: 32B-class quality
ollama pull qwen2.5:32b
```
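If you script your setup, the tier-to-model choice above reduces to a small lookup. This sketch mirrors the tiers in this article; the thresholds are the nominal card sizes, and the llama3.2:3b fallback tag is our assumption for the CPU-only row of the table.

```python
# (minimum VRAM in GB, Ollama model tag), largest tier first
TIERS = [
    (24, "qwen2.5:32b"),
    (16, "qwen2.5:14b"),
    (8, "llama3.1:8b"),
]
CPU_ONLY_DEFAULT = "llama3.2:3b"  # assumed tag for the small CPU-only pick


def suggest_model(vram_gb: float) -> str:
    """Pick the largest tier whose minimum VRAM the card meets."""
    for min_vram, tag in TIERS:
        if vram_gb >= min_vram:
            return tag
    return CPU_ONLY_DEFAULT


if __name__ == "__main__":
    for vram in (0, 8, 12, 16, 24):
        print(f"{vram:>2} GB VRAM -> ollama pull {suggest_model(vram)}")
```

A 12GB card lands in the 8GB tier here, which is deliberately conservative: it leaves extra room for a longer context rather than squeezing in a bigger model.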
Want a guided recommendation based on your exact CPU, RAM, and GPU? Run your specs through our "Can I run local AI?" checker — it tells you which models will fit and how fast they should go before you download anything.
Can I use these models from my own apps and IDE?
Yes. Once Ollama is running, it exposes a local HTTP API on port 11434 that most AI tooling speaks:
```bash
# Call the local model over HTTP (no internet required)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a Python function to reverse a string.",
  "stream": false
}'
```
From there you can wire it into VS Code (via the Continue extension), JetBrains IDEs, a RAG pipeline, or your own Python/Node scripts — all pointed at localhost, all fully offline. The official documentation lives in the Ollama GitHub repository and the Ollama model library.
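For your own Python scripts, the same request needs nothing beyond the standard library. The sketch below sends the same JSON body as the curl call and reads the response field from the reply; the function names are ours, and it assumes the server is on the default port with llama3.1:8b already pulled.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    model: str = "llama3.1:8b",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1:8b", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        print(generate("Write a Python function to reverse a string."))
    except OSError as exc:  # covers connection refused and timeouts
        print(f"Could not reach Ollama on localhost:11434: {exc}")
```

The generous timeout is intentional: the first request after a restart also has to load the model, which can take a while on CPU.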
Key Takeaways
- Yes, Ubuntu runs AI — exceptionally well. It is the reference platform for the local-AI ecosystem, and the same installer works on Ubuntu 22.04, 24.04, and 26.04 LTS.
- Install is one command: curl -fsSL https://ollama.com/install.sh | sh, then ollama run llama3.1:8b.
- No GPU is required. CPU works, just slower. A GPU mainly buys responsiveness and the ability to run larger models.
- NVIDIA needs only the driver (Ollama bundles CUDA); AMD uses ROCm, which is Linux-only and well-supported on Ubuntu.
- Match the model to your VRAM: ~5GB for an 8B model, ~8–9GB for a 14B, ~22–24GB for a 32B at Q4_K_M.
Next Steps
- New to Ollama itself? Start with our complete Ollama guide for every command, flag, and Modelfile option.
- On a modest machine? See the best local AI models for 8GB of RAM for picks that stay fast without a high-end GPU.
- Not sure if your exact rig is enough? Use the "Can I run local AI?" hardware checker to get a model recommendation in seconds.
- Shopping for an upgrade? Our best GPUs for AI guide breaks down VRAM-per-dollar across NVIDIA and AMD.