
Free Tool · No Signup

Which GPU Should I Buy to Run a Local LLM?

Pick the model you want to run (7B, 14B, 32B, or 70B) and your budget. The tool returns the cheapest real GPU that actually fits — with approximate 2026 prices and a live Amazon price-check link. Short version: a 7B runs on a $300 RTX 3060 12GB, a 32B wants 24GB (a used RTX 3090), and a 70B needs two cards. The tool shows the full list, cheapest first.

📅 Published: June 20, 2026 · 🔄 Last Updated: June 20, 2026 · ✓ Manually Reviewed


How the picker works

Every model size is tagged with the practical VRAM it needs at Q4_K_M quantization, the community-standard quant, which shrinks the weights to roughly 30% of their FP16 size with little quality loss. The numbers include a few GB of headroom for the KV cache at a normal context length, so they reflect what actually fits in practice rather than the bare weight size. The tool then filters our six real GPUs down to the ones that have enough VRAM and fit your budget, and lists them cheapest first. If no single consumer card is big enough (the 70B case), it falls back to the cheapest two-card build.

The six cards are all things you can actually buy in 2026: the RTX 3060 12GB (~$300, new, relaunched this year), Tesla P40 24GB (~$200 used; the best raw VRAM-per-dollar, but slow and fanless), RTX 5060 Ti 16GB (~$480, best new-card value), used RTX 3090 24GB (~$800, the value king), RTX 4090 24GB (~$1,700, fast but out of production), and RTX 5090 32GB (~$2,200, the fastest and the only consumer card above 24GB). All prices are approximate mid-2026 street prices and move constantly.
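The filter-and-sort logic described above can be sketched in a few lines of Python. This is an illustration, not the tool's actual source: the card list and VRAM floors come from the figures on this page, while the `Gpu` and `Build` classes, the `pick` function, and the `mainstream` flag are hypothetical names. The flag is an assumption added so that the slow, fanless P40 is listed after the mainstream cards, which matches the recommendations in the worked examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int
    price_usd: int           # approximate mid-2026 street price quoted on this page
    mainstream: bool = True  # hypothetical flag: False for caveat-heavy cards


@dataclass(frozen=True)
class Build:
    label: str
    vram_gb: int
    price_usd: int
    mainstream: bool


GPUS = [
    Gpu("Tesla P40 24GB (used)", 24, 200, mainstream=False),
    Gpu("RTX 3060 12GB", 12, 300),
    Gpu("RTX 5060 Ti 16GB", 16, 480),
    Gpu("RTX 3090 24GB (used)", 24, 800),
    Gpu("RTX 4090 24GB", 24, 1700),
    Gpu("RTX 5090 32GB", 32, 2200),
]

# Practical Q4_K_M floors (weights plus KV-cache headroom) as quoted on this page.
VRAM_NEEDED_GB = {"7B": 8, "14B": 11, "32B": 22, "70B": 42}


def pick(model_size: str, budget_usd: Optional[int] = None) -> list[Build]:
    """Return builds that fit the model: mainstream cards first, cheapest first."""
    need = VRAM_NEEDED_GB[model_size]
    builds = [
        Build(g.name, g.vram_gb, g.price_usd, g.mainstream)
        for g in GPUS
        if g.vram_gb >= need
    ]
    if not builds:
        # No single card is big enough: fall back to pairs of identical cards.
        builds = [
            Build(f"2x {g.name}", 2 * g.vram_gb, 2 * g.price_usd, g.mainstream)
            for g in GPUS
            if 2 * g.vram_gb >= need
        ]
    if budget_usd is not None:
        builds = [b for b in builds if b.price_usd <= budget_usd]
    return sorted(builds, key=lambda b: (not b.mainstream, b.price_usd))


if __name__ == "__main__":
    for size in VRAM_NEEDED_GB:
        options = pick(size)
        print(size, "->", options[0].label, f"(~${options[0].price_usd})")
```

Passing a budget, for example `pick("32B", budget_usd=500)`, drops every build above that price, which in this sketch leaves only the P40 for a 32B model.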

Worked examples

Run a 7B/8B model

Needs ~8GB. Cheapest sensible new card: RTX 3060 12GB (~$300). Llama 3.1 8B or Qwen 2.5 Coder 7B run smoothly with room to spare. See best GPUs for AI.

Run a 32B model

Needs ~22GB, so you want a 24GB card. Cheapest: used RTX 3090 (~$800), or a Tesla P40 (~$200) if you can accept it running roughly 3x slower than the 3090. Qwen 2.5 32B and QwQ 32B fit at Q4.

Run a 70B model

Needs ~42GB — no single consumer card fits. Cheapest build: 2× used RTX 3090 (48GB, ~$1,600). See how the sizes compare in 7B vs 14B vs 32B vs 70B for coding.

Buying used safely

The 3090 and P40 are both used-market cards. Before you buy, read the used-GPU buying guide and the 5090 vs 4090 benchmark.

Want the exact VRAM number for a specific model, context length, and quant before you spend? Use the VRAM calculator, then come back here to find the card. If you're still unsure which model to run at all, the AI Model Finder matches hardware to the best model.

Frequently asked questions

What is the cheapest GPU to run a local LLM?
For a 7B/8B model, the cheapest sensible new card is the RTX 3060 12GB (approx $300). If you only care about raw VRAM-per-dollar and don't mind a slow, fanless, display-less homelab card, a used Tesla P40 gives you 24GB for around $150-240 — enough for a 32B model at Q4, but roughly 3x slower than an RTX 3090. For most people the RTX 3060 12GB or a used RTX 3090 24GB is the right balance of price and usability.
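The VRAM-per-dollar claim is easy to check with the approximate prices quoted on this page. The sketch below ranks the six cards by GB of VRAM per $100; the `CARDS` table and the `gb_per_100_usd` helper are illustrative names, and the P40 figure uses the ~$200 midpoint. It measures capacity per dollar only and says nothing about speed.

```python
# (VRAM in GB, approximate mid-2026 street price in USD) as quoted on this page.
CARDS = {
    "Tesla P40 24GB": (24, 200),
    "RTX 3060 12GB": (12, 300),
    "RTX 5060 Ti 16GB": (16, 480),
    "RTX 3090 24GB": (24, 800),
    "RTX 4090 24GB": (24, 1700),
    "RTX 5090 32GB": (32, 2200),
}


def gb_per_100_usd(vram_gb: int, price_usd: int) -> float:
    """GB of VRAM bought per $100 spent."""
    return 100 * vram_gb / price_usd


# Highest VRAM-per-dollar first.
ranked = sorted(CARDS, key=lambda name: gb_per_100_usd(*CARDS[name]), reverse=True)

if __name__ == "__main__":
    for name in ranked:
        print(f"{name}: {gb_per_100_usd(*CARDS[name]):.1f} GB per $100")
```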
What GPU do I need to run a 70B model locally?
A 70B model at Q4_K_M needs roughly 40-44GB of VRAM once you add KV-cache headroom — more than any single consumer GPU. The proven cheap build is two used RTX 3090 24GB cards (48GB total, approx $1,400-1,800), with the model split across both GPUs via Ollama or llama.cpp. A single RTX 5090 (32GB) cannot run 70B at Q4; it only fits at very aggressive Q3/Q2 quantization, which costs noticeable quality.
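As a rough illustration of the two-GPU split with llama.cpp, the Python sketch below builds a `llama-server` command line that shares the model across cards in proportion to their VRAM. The flag names (`-m`, `--n-gpu-layers`, `--split-mode`, `--tensor-split`) are the ones llama.cpp uses as far as I know, but check `llama-server --help` for your build. The model filename and the `llama_server_args` helper are hypothetical.

```python
import shlex


def llama_server_args(model_path: str, vram_gb_per_gpu: list[int]) -> list[str]:
    """Build an argv list that splits a GGUF model across several GPUs."""
    # --tensor-split takes relative proportions, so raw VRAM sizes work as weights.
    split = ",".join(str(v) for v in vram_gb_per_gpu)
    return [
        "llama-server",
        "-m", model_path,
        "--n-gpu-layers", "99",   # a large value offloads every layer to the GPUs
        "--split-mode", "layer",  # assign whole layers to each GPU
        "--tensor-split", split,
    ]


if __name__ == "__main__":
    # Hypothetical model file; two 24GB cards as in the 2x RTX 3090 build.
    args = llama_server_args("llama-70b-q4_k_m.gguf", [24, 24])
    print(shlex.join(args))
```

The sketch only prints the command; to launch it, the list could be passed to `subprocess.run(args)` on a machine with llama.cpp installed.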
How much VRAM does each model size need at Q4?
Rough practical floors at Q4_K_M (weights plus a few GB of context headroom): a 7B/8B model needs ~8GB, a 13B/14B needs ~10-11GB, a 30B/32B needs ~20-22GB, and a 70B needs ~40-44GB. These are the numbers this tool uses to decide which cards fit. For an exact figure for a specific model, context length, and quant level, use the VRAM calculator.
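These floors follow from simple arithmetic: parameter count times bits per weight, divided by 8, plus headroom for context. A minimal sketch follows, assuming Q4_K_M averages about 4.8 bits per weight and using a flat 2GB headroom. Both defaults are assumptions made here for illustration, so the results only approximate the figures above, and the real KV-cache size depends on the model and context length.

```python
def estimate_vram_gb(
    params_billion: float,
    bits_per_weight: float = 4.8,  # assumed average for Q4_K_M
    headroom_gb: float = 2.0,      # assumed flat allowance for KV cache and buffers
) -> float:
    """Rough VRAM estimate in GB: quantized weights plus fixed headroom."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + headroom_gb


if __name__ == "__main__":
    for size in (8, 14, 32, 70):
        print(f"{size}B: ~{estimate_vram_gb(size):.1f} GB")
```

Raising `headroom_gb` models a longer context; setting `bits_per_weight=16` gives the FP16 footprint for comparison.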
Is a used RTX 3090 still worth it in 2026?
Yes. For local AI it remains the value king. You get 24GB of fast GDDR6X for roughly $700-900 used, well under the price of a 4090 or 5090, and a pair of them is the standard cheap 70B build. The 4090 (24GB) is much faster but out of production, so it stays expensive; the 5090 (32GB) is the fastest single card and the only consumer card with more than 24GB, but it costs $2,000+.
Are these prices accurate?
They are approximate mid-2026 street-price midpoints (verified against eBay, Amazon, and GPU price trackers) and they move constantly — GPU pricing in 2026 has been volatile due to memory-chip shortages. Treat every number as "approx." The Amazon links open a live search so you can see today's real price for yourself.



Written by the Local AI Master Team

The team behind Local AI Master

We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
