Free Tool · No Signup
Which GPU Should I Buy to Run a Local LLM?
Pick the model you want to run (7B, 14B, 32B, or 70B) and your budget. The tool returns the cheapest real GPU that actually fits — with approximate 2026 prices and a live Amazon price-check link. Short version: a 7B runs on a $300 RTX 3060 12GB, a 32B wants 24GB (a used RTX 3090), and a 70B needs two cards. The tool shows the full list, cheapest first.
How the picker works
Every model size is tagged with the practical VRAM it needs at Q4_K_M quantization, the community-standard quant. It stores weights at a little under 5 bits each, which is a bit under a third of the FP16 size, with only a small quality loss. The numbers include a few GB of headroom for the KV cache at a normal context length, so they reflect what fits in practice rather than the bare weight size. The tool then filters our six real GPUs down to the ones with enough VRAM and sorts them cheapest first against your budget. If no single consumer card is big enough (the 70B case), it falls back to the cheapest two-card build.
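The sizing rule described above is essentially quantized weight size plus headroom. Here is a minimal sketch of that arithmetic, assuming about 4.8 bits per weight for Q4_K_M and a flat 2 GB allowance for the KV cache and runtime. Both defaults are assumptions for illustration, and `estimate_vram_gb` is a hypothetical helper, not the tool's own code, so the picker's tagged numbers may differ by a few GB.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.8,  # assumed average for Q4_K_M
                     headroom_gb: float = 2.0) -> float:  # assumed KV cache + runtime allowance
    """Rough VRAM need in GB: quantized weight size plus a flat headroom."""
    # Billions of parameters times bytes per parameter gives GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + headroom_gb


for size in (7, 14, 32, 70):
    print(f"{size}B: ~{estimate_vram_gb(size):.1f} GB")
```

Under these assumptions a 32B model comes out at about 21 GB, close to the ~22GB figure used for the 24GB-card recommendation. Treat the output as a first estimate only; longer contexts need more headroom than the flat allowance here.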
The six cards are all ones you can actually buy in 2026:
- RTX 3060 12GB (~$300 new, relaunched this year)
- Tesla P40 24GB (~$200 used): the most VRAM per dollar, but slow and fanless (it is passively cooled, so it needs added airflow)
- RTX 5060 Ti 16GB (~$480): the best new-card value
- Used RTX 3090 24GB (~$800): the value king
- RTX 4090 24GB (~$1,700): fast but out of production
- RTX 5090 32GB (~$2,200): the fastest, and the only consumer card above 24GB

All prices are approximate mid-2026 street prices and move constantly.
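The filter, sort, and two-card fallback can be sketched in a few lines. This is an illustrative reconstruction, not the tool's actual source: the `Gpu` class, the `pick` function, and the `slow` flag are hypothetical names. Two further assumptions are that the P40 is excluded unless you opt in (which matches the worked examples, where it appears only as a slower alternative) and that two identical cards pool their VRAM perfectly. Prices and VRAM needs are the approximate figures quoted on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int
    price_usd: int      # approximate price quoted on this page
    slow: bool = False  # assumption: flag the P40 so it is opt-in only


GPUS = [
    Gpu("RTX 3060 12GB", 12, 300),
    Gpu("Tesla P40 24GB (used)", 24, 200, slow=True),
    Gpu("RTX 5060 Ti 16GB", 16, 480),
    Gpu("RTX 3090 24GB (used)", 24, 800),
    Gpu("RTX 4090 24GB", 24, 1700),
    Gpu("RTX 5090 32GB", 32, 2200),
]

# Practical Q4_K_M needs from the worked examples (no 14B figure is quoted).
VRAM_NEEDED_GB = {"7B": 8, "32B": 22, "70B": 42}


def pick(model: str, budget_usd: int,
         allow_slow: bool = False) -> list[tuple[str, int, bool]]:
    """Return (build, total_price, within_budget) for builds that fit, cheapest first."""
    need = VRAM_NEEDED_GB[model]
    cards = [g for g in GPUS if allow_slow or not g.slow]
    # Try single cards first; fall back to two identical cards only if none fits.
    for count in (1, 2):
        fits = [g for g in cards if g.vram_gb * count >= need]
        if fits:
            prefix = "" if count == 1 else f"{count}x "
            builds = [
                (prefix + g.name, g.price_usd * count, g.price_usd * count <= budget_usd)
                for g in fits
            ]
            return sorted(builds, key=lambda b: b[1])
    return []


for model in ("7B", "32B", "70B"):
    print(model, pick(model, budget_usd=2000)[0])
```

With the P40 excluded, the first result for each size matches the worked examples: the RTX 3060 for 7B, a used RTX 3090 for 32B, and two used RTX 3090s at $1,600 for 70B. Passing `allow_slow=True` moves the P40 to the top wherever its 24GB is enough.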
Worked examples
Run a 7B/8B model
Needs ~8GB. Cheapest sensible new card: RTX 3060 12GB (~$300). Llama 3.1 8B or Qwen 2.5 Coder 7B run smoothly with room to spare. See best GPUs for AI.
Run a 32B model
Needs ~22GB, so you want a 24GB card. Cheapest: used RTX 3090 (~$800), or a Tesla P40 (~$200) if you accept roughly 3x slower generation. Qwen 2.5 32B and QwQ 32B fit at Q4.
Run a 70B model
Needs ~42GB — no single consumer card fits. Cheapest build: 2× used RTX 3090 (48GB, ~$1,600). See how the sizes compare in 7B vs 14B vs 32B vs 70B for coding.
Buying used safely
The 3090 and P40 are both used-market cards. Before you buy, read the used-GPU buying guide and the 5090 vs 4090 benchmark.
Want the exact VRAM number for a specific model, context length, and quant before you spend? Use the VRAM calculator, then come back here to find the card. If you're still unsure which model to run at all, the AI Model Finder matches hardware to the best model.
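Context length matters because the KV cache grows linearly with it. Below is a rough sketch of the standard size formula: keys plus values, per layer, per KV head, per token. It assumes an FP16 cache and uses the published Llama 3.1 8B architecture (32 layers, 8 KV heads, head dimension 128). Those values are not from this page, and `kv_cache_gib` is a hypothetical helper, not necessarily how the VRAM calculator computes its result.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB for one sequence (keys + values across all layers)."""
    # Factor of 2 covers the separate key and value tensors.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30


# Llama 3.1 8B: 32 layers, 8 KV heads (grouped-query attention), head dim 128.
for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_gib(32, 8, 128, ctx):.1f} GiB")
```

At 8K tokens the cache is about 1 GiB, which fits in the headroom the picker allows. At 128K tokens it reaches 16 GiB, so a long-context workload can push you up a card tier even when the weights fit.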
Frequently asked questions
What is the cheapest GPU to run a local LLM?
What GPU do I need to run a 70B model locally?
How much VRAM does each model size need at Q4?
Is a used RTX 3090 still worth it in 2026?
Are these prices accurate?
Bought the GPU — now run a model on it
Local AI Master's deployment course walks through standing up Ollama and llama.cpp, splitting a model across two GPUs, picking the right quant, and serving an OpenAI-compatible API from your own box. Real code, full repo.
See the deployment course →
Related tools & resources
- → VRAM Calculator — exact VRAM for any model + quant
- → AI Model Finder — match your hardware to a model
- → Best GPUs for AI — full ranked buyer guide
- → Used-GPU buying guide — how to buy a 3090/P40 safely
- → VRAM requirements 2026 — model-by-model VRAM table
Written by the Local AI Master Team
We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.