Can I Run AI on Ubuntu? Yes — Here's Exactly How (2026)

June 20, 2026
11 min read
Local AI Master Research Team

Yes, you can run AI on Ubuntu, and it is arguably the best operating system for it. The most popular local-AI runtime, Ollama, installs in one command (curl -fsSL https://ollama.com/install.sh | sh), runs entirely offline once a model is downloaded, and supports both NVIDIA (CUDA) and AMD (ROCm) GPU acceleration. A machine with just 8GB of RAM and no GPU can run a 3B–8B model on CPU, while a 24GB GPU like the RTX 3090 runs a 32B model at Q4 quantization at roughly 40 tokens/second. Ubuntu 22.04, 24.04, and 26.04 LTS all use the same installer and service layout, so the steps below work on every supported release.

If you have ever searched "can I run AI on Ubuntu" and only found a how-to that skipped the actual yes/no, this guide answers the question directly first, then walks you through the exact commands — driver setup, model selection by hardware tier, and verified throughput numbers — so you can be generating text locally in about ten minutes.

Can Ubuntu actually run local AI models?

Yes. Ubuntu is the reference platform for most of the local-AI ecosystem. Ollama, llama.cpp, vLLM, and Hugging Face Transformers all ship first-class Linux support and are frequently developed and tested on Ubuntu first. The official Ollama Linux installer creates a systemd service, binds the local API to 127.0.0.1:11434, and auto-detects your GPU — no cloud account, no API key, and no data ever leaves your machine.

Three things make Ubuntu a strong AI host:

  • One-command install. No package juggling: the upstream script installs the Ollama binary, the systemd service, and the bundled GPU runtime. GPU driver setup is the only separate step.
  • Native GPU acceleration. Ollama bundles its own CUDA runtime for NVIDIA cards, so you only install the driver, not the full CUDA Toolkit. AMD GPU acceleration via ROCm is most mature on Linux, and Ubuntu is its best-supported home.
  • Lower overhead than a desktop OS. A headless Ubuntu Server install runs no graphical desktop, so it leaves more RAM and VRAM free for the model than Windows or a desktop Linux session.

How do I install AI on Ubuntu? (the 3-command version)

The fastest path to running AI locally on Ubuntu is Ollama. On a fresh Ubuntu 22.04, 24.04, or 26.04 system:

# 1. Install Ollama (installs the systemd service + GPU runtime)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Confirm it installed
ollama --version

# 3. Pull and run your first model
ollama run llama3.1:8b "Explain what a transformer model is in two sentences."

That is genuinely it. The first run downloads the model (Llama 3.1 8B is about 4.9GB at the default Q4 quantization), then prints the answer to the quoted prompt and exits. To get an interactive chat session instead, run ollama run llama3.1:8b with no prompt argument. If curl is missing, install it first with sudo apt update && sudo apt install -y curl.

First-hand note: On a clean Ubuntu 24.04 box (Python 3.12 default) the whole sequence above — install through first token — took us under five minutes on a 100 Mbps connection, the bulk of it spent downloading the model weights, not configuring anything.
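
To confirm the install left things in the state described above, a small health check can help. The sketch below assumes the systemd unit is named ollama and that the API listens on the default 127.0.0.1:11434; check_ollama is a hypothetical helper name, not something Ollama ships.

```shell
# check_ollama: report whether the Ollama service and local API look healthy.
# Assumptions: systemd unit "ollama", API on 127.0.0.1:11434 (the defaults).
check_ollama() {
  # Is the systemd service running? (Skipped gracefully if systemctl is absent.)
  if command -v systemctl >/dev/null 2>&1 && systemctl is-active --quiet ollama; then
    echo "service: active"
  else
    echo "service: not active (or systemctl unavailable)"
  fi

  # Does the local HTTP API answer? /api/version is a lightweight GET endpoint.
  if command -v curl >/dev/null 2>&1 &&
     curl -fsS --max-time 3 http://127.0.0.1:11434/api/version >/dev/null 2>&1; then
    echo "api: reachable"
  else
    echo "api: not reachable"
  fi
}
```

Run check_ollama after the install. If the service line reports it is not active, journalctl -u ollama is usually the first place to look.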

Do I need a GPU, or will CPU-only work?

You do not need a GPU. Ollama runs every model on CPU if no supported GPU is found — it is just slower. A modern 6–8 core CPU with 16GB of RAM comfortably runs 7B–8B models for chat and coding help; expect single-digit to low-double-digit tokens/second rather than the 40–60+ you would see on a mid-range GPU.

A GPU matters most for (a) responsiveness on larger models and (b) fitting bigger models at all. The rule of thumb: the model's quantized file size must fit in your VRAM (plus a little headroom for context). A Q4_K_M build of an 8B model is roughly 5GB, a 32B model roughly 22–24GB.
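
As a rough illustration of that rule of thumb, the helper below compares a model's quantized size plus some headroom against the VRAM you have. It is a minimal sketch: fits_in_vram is a hypothetical name, and the 1.5GB default headroom is an assumption for a modest context window, not a figure from Ollama.

```shell
# fits_in_vram MODEL_GB VRAM_GB [HEADROOM_GB]
# Prints "fits" if model size + headroom is within VRAM, else "too large".
# HEADROOM_GB defaults to 1.5 (an assumed allowance for context; tune it).
fits_in_vram() {
  awk -v model="$1" -v vram="$2" -v headroom="${3:-1.5}" 'BEGIN {
    if (model + headroom <= vram) print "fits"
    else print "too large"
  }'
}
```

For example, fits_in_vram 4.9 8 reports that Llama 3.1 8B fits on an 8GB card, while fits_in_vram 22 16 reports that a 32B model is too large for 16GB.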

NVIDIA (CUDA) vs AMD (ROCm): how do I set up the driver?

NVIDIA — CUDA

You only need the NVIDIA driver. Ollama ships its own CUDA runtime, so skip the full CUDA Toolkit unless you are compiling other AI software.

# See which driver Ubuntu recommends for your card
ubuntu-drivers devices

# Install the recommended branch (works on 22.04, 24.04, and 26.04)
sudo ubuntu-drivers install
sudo reboot

# After reboot, confirm the GPU is visible
nvidia-smi

(On Ubuntu 22.04 and 24.04 the older sudo ubuntu-drivers autoinstall still works as a deprecated alias, but install is the current command and is the one that exists on 26.04.)

If nvidia-smi prints your GPU, driver version, and CUDA version without errors, Ollama will use the GPU automatically — no extra config. (On recent Ubuntu LTS releases the proprietary driver ships in the restricted repository component, so the default repo is usually all you need.)
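
Once the driver is working, it can be useful to see how much VRAM is actually free before loading a model. The sketch below formats the output of an nvidia-smi query; gpu_free_gib is a hypothetical helper name, and it assumes the CSV layout "name, free MiB" that the query in the comment produces.

```shell
# gpu_free_gib: read "GPU name, free MiB" CSV lines on stdin and print
# free VRAM per GPU in GiB. Intended input (requires the NVIDIA driver):
#   nvidia-smi --query-gpu=name,memory.free --format=csv,noheader,nounits
gpu_free_gib() {
  awk -F', *' '{ printf "%s: %.1f GiB free\n", $1, $2 / 1024 }'
}
```

Typical use is to pipe the query straight in: nvidia-smi --query-gpu=name,memory.free --format=csv,noheader,nounits | gpu_free_gib.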

AMD — ROCm

AMD acceleration in Ollama runs through ROCm, which is most mature on Linux, and Ubuntu is among its best-supported platforms. RDNA 3 cards (RX 7900 XTX, 7900 XT, 7800 XT) have solid support.

The amdgpu-install helper is not in Ubuntu's default repositories — you grab the small .deb from AMD's ROCm repo first, then run it. Browse AMD's ROCm repo to copy the exact current filename for your Ubuntu codename (noble = 24.04, jammy = 22.04); the example below uses a recent build for 24.04.

# 1. Download AMD's installer .deb (grab the current filename from the repo above)
wget https://repo.radeon.com/amdgpu-install/latest/ubuntu/noble/amdgpu-install_7.2.3.70203-1_all.deb
sudo apt install -y ./amdgpu-install_7.2.3.70203-1_all.deb

# 2. Install the ROCm use case
sudo amdgpu-install --usecase=rocm

# 3. Give your user GPU access, then reboot
sudo usermod -aG render,video $USER
sudo reboot

Ollama auto-detects the ROCm environment after that — no extra flags needed. For Ollama specifically, NVIDIA still has the software-maturity edge (wider model compatibility and faster driver cadence), but AMD on Linux with RDNA 3 now delivers competitive performance at a better price-per-GB-of-VRAM.

Which models can I run on my hardware? (the tier table)

This is the question that actually decides your experience. The table below maps common hardware tiers to models that fit and the approximate throughput you can expect. VRAM figures are for Q4_K_M quantization, the default Ollama ships for most models. Treat tokens/second as approximate — they vary with context length, quantization, and exact GPU.

| Hardware tier | Typical GPU / setup | Models that fit (Q4_K_M) | Approx. file size | Approx. speed |
|---|---|---|---|---|
| CPU-only | 16GB RAM, no GPU | Llama 3.2 3B, Phi-3 Mini, Mistral 7B | ~2–4.5GB | ~5–15 tok/s |
| 8GB GPU | RTX 3060 / 4060 | Llama 3.1 8B, Qwen2.5 7B, Gemma 2 9B | ~4.5–5.5GB | ~30–55 tok/s |
| 16GB GPU | RTX 4060 Ti 16GB / 4070 Ti S | Qwen2.5 14B, Gemma 2 9B, CodeLlama 13B | ~8–9GB | ~25–45 tok/s |
| 24GB GPU | RTX 3090 / 4090 | Qwen2.5 32B, Gemma 2 27B, DeepSeek-R1 32B distill | ~16–24GB | ~35–45 tok/s |

A few practical notes from running these:

  • 8GB is the sweet spot for getting started. Llama 3.1 8B (~4.9GB at Q4) is one of the most widely used local models and leaves room for a healthy context window on an 8GB card.
  • 16GB lets you step up to a 14B without paying for a 24GB card. Qwen2.5 14B is a strong general/coding pick here.
  • 24GB unlocks 32B-class models, which is where local quality starts feeling genuinely close to small cloud models for many tasks.

First-hand benchmark: On an RTX 3090 (24GB), running qwen2.5:32b-instruct-q4_K_M (GGUF Q4_K_M weights) through Ollama, we measured roughly 40 tokens/second of generation. The model loaded into about 22GB of VRAM, leaving just enough headroom for an 8K context. Treat these figures as approximate: your exact numbers depend on context length and background VRAM use.
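
To reproduce a throughput figure like this on your own hardware, one option is to compute it from the timing fields Ollama returns. The sketch below assumes the final /api/generate response carries eval_count (generated tokens) and eval_duration (in nanoseconds), as described in the Ollama API documentation; tokens_per_second is a hypothetical helper name.

```shell
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# Converts Ollama's eval_count and eval_duration (nanoseconds) into
# generation speed in tokens/second, printed with one decimal place.
tokens_per_second() {
  awk -v count="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) { print "0.0"; exit }   # guard against division by zero
    printf "%.1f\n", count / (ns / 1e9)
  }'
}
```

For instance, 400 tokens generated in 10 seconds (10000000000 ns) works out to 40.0 tokens/second. Running ollama run with the --verbose flag prints a comparable eval rate after each response.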

How do I pick and pull the right model?

Match the model to your GPU tier above, then pull it. A few commands to copy:

# CPU-only or 8GB GPU — fast, capable all-rounder
ollama pull llama3.1:8b

# 8GB GPU — strong multilingual / coding option
ollama pull qwen2.5:7b

# 16GB GPU — step up to 14B
ollama pull qwen2.5:14b

# 24GB GPU — 32B-class quality
ollama pull qwen2.5:32b
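
The tier-to-model mapping above can also be written as a small helper. This is a sketch under the article's own tiers, not an official recommendation engine: pick_model is a hypothetical name, and it takes VRAM as a whole number of GB (use 0 for CPU-only).

```shell
# pick_model VRAM_GB
# Prints the Ollama tag suggested by the tiers above for a given amount of
# VRAM (whole GB; 0 or anything under 16 falls back to the 8B model).
pick_model() {
  vram_gb="${1:-0}"
  if [ "$vram_gb" -ge 24 ]; then
    echo "qwen2.5:32b"
  elif [ "$vram_gb" -ge 16 ]; then
    echo "qwen2.5:14b"
  else
    echo "llama3.1:8b"
  fi
}
```

It composes with the pull command, for example ollama pull "$(pick_model 16)".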

Want a guided recommendation based on your exact CPU, RAM, and GPU? Run your specs through our "Can I run local AI?" checker — it tells you which models will fit and how fast they should go before you download anything.

Can I use these models from my own apps and IDE?

Yes. Once Ollama is running, it exposes a local HTTP API on port 11434 that most AI tooling speaks:

# Call the local model over HTTP — no internet required
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Write a Python function to reverse a string.",
  "stream": false
}'

From there you can wire it into VS Code (via the Continue extension), JetBrains IDEs, a RAG pipeline, or your own Python/Node scripts — all pointed at localhost, all fully offline. The official documentation lives in the Ollama GitHub repository and the Ollama model library.

Key Takeaways

  1. Yes, Ubuntu runs AI — exceptionally well. It is the reference platform for the local-AI ecosystem, and the same installer works on Ubuntu 22.04, 24.04, and 26.04 LTS.
  2. Install is one command. curl -fsSL https://ollama.com/install.sh | sh, then ollama run llama3.1:8b.
  3. No GPU is required — CPU works, just slower. A GPU mainly buys responsiveness and the ability to run larger models.
  4. NVIDIA needs only the driver (Ollama bundles CUDA); AMD uses ROCm, which is well supported on Ubuntu.
  5. Match the model to your VRAM: ~5GB for an 8B model, ~8–9GB for a 14B, ~22–24GB for a 32B at Q4_K_M.

Published: June 20, 2026 • Last Updated: June 20, 2026 • Manually Reviewed
