Run Llama 3.1 Locally in 10 Minutes
Llama 3.1 running on your own laptop in 10 minutes. No API key, no usage limits, no data leaving your machine.
1. Install Ollama
Linux: `curl -fsSL https://ollama.com/install.sh | sh`. macOS and Windows: download the installer from ollama.com (on macOS, `brew install ollama` also works). Ollama is a small daemon that handles model downloads, GPU offloading, and a local API on localhost:11434, including an OpenAI-compatible endpoint. It runs out of the box on M-series Macs and NVIDIA GPUs, and falls back to CPU on anything else.
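To confirm the daemon is actually up before pulling anything, run `ollama --version`, or hit the root endpoint from Python; Ollama answers with a plain-text status line. A quick sketch, assuming `pip install requests`:

```python
# Quick health check: the root endpoint returns the plain-text
# status "Ollama is running" when the daemon is listening.
import requests

print(requests.get("http://localhost:11434/").text)
```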
2. Pull the model that fits your hardware
Pick by how much memory your machine has:
- 8 GB RAM: `ollama pull llama3.1:8b-instruct-q4_K_M` fits in ~5 GB and runs at roughly 30 tokens/sec on an M2 or RTX 3060.
- 16 GB+: `ollama pull llama3.1:8b-instruct-q8_0` is ~8.5 GB with noticeably higher quality. (The default `llama3.1:8b` tag is also a 4-bit quantization, so pull an explicit higher quantization if you want the quality bump.)
- 48 GB+: `ollama pull llama3.1:70b-instruct-q3_K_M` if you want 70B running locally; the weights alone are ~34 GB, and it is slow without a high-end GPU, but workable.
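If you'd rather not eyeball it, here is a rough sketch that maps total RAM to the tags above. It assumes `pip install psutil`; the thresholds just mirror this step's guidance.

```python
# Rough model picker: maps total RAM to the tags recommended above.
import psutil

ram_gb = psutil.virtual_memory().total / 1e9
if ram_gb >= 48:
    tag = "llama3.1:70b-instruct-q3_K_M"
elif ram_gb >= 16:
    tag = "llama3.1:8b-instruct-q8_0"
else:
    tag = "llama3.1:8b-instruct-q4_K_M"
print(f"{ram_gb:.0f} GB RAM -> ollama pull {tag}")
```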
3. Your first chat
Run `ollama run llama3.1:8b` and type a question. That's it: you have a private LLM. Try `Explain transformers in 3 sentences` to verify the model is responding sensibly. Exit with `/bye`.
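The CLI is a thin client over a local HTTP API, so the same one-shot question works from any script. A minimal sketch against Ollama's native `/api/generate` endpoint, assuming `pip install requests` and the 8B model from step 2:

```python
# One-shot generation against Ollama's native REST API.
# Assumes the daemon is running and llama3.1:8b has been pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",
        "prompt": "Explain transformers in 3 sentences",
        "stream": False,  # one JSON object instead of a token stream
    },
)
print(resp.json()["response"])
```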
4. Use it from Python
`pip install ollama`. Then `import ollama; print(ollama.chat(model="llama3.1:8b", messages=[{"role":"user","content":"hi"}])["message"]["content"])`. Ollama also exposes an OpenAI-compatible API (chat completions at `http://localhost:11434/v1/chat/completions`), so you can point any OpenAI SDK at `base_url="http://localhost:11434/v1"` and you're running locally.
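For example, here is the OpenAI SDK pointed at the local server; a minimal sketch assuming `pip install openai`. The `api_key` is required by the SDK but ignored by Ollama:

```python
# The same local model through the OpenAI SDK.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the SDK, ignored by Ollama
)

resp = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "hi"}],
)
print(resp.choices[0].message.content)
```

Anything already built on the OpenAI SDK can usually be repointed this way by changing only the base URL and the model name.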
5. Make it useful
A bare LLM isn't the goal — it's the foundation. From here, you build retrieval (point it at your docs), tools (let it call functions), agents (multi-step reasoning), and fine-tuning (teach it your style or data). The What is AI course covers the entire path; the local setup you just did is chapter 1c.
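As a taste of the retrieval piece, here is a minimal sketch that embeds a couple of documents with a local embedding model and grounds an answer in the best match. It assumes `ollama pull nomic-embed-text` has been run; the documents and question are placeholders, not course material.

```python
# Toy retrieval: embed docs, pick the one closest to the question,
# and answer with that document as context.
import ollama

docs = [
    "Ollama serves models on localhost:11434.",
    "Llama 3.1 8B fits in about 5 GB at Q4_K_M quantization.",
]

def embed(text: str) -> list[float]:
    return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

question = "How much memory does the quantized 8B model need?"
q_vec = embed(question)
best = max(docs, key=lambda d: cosine(q_vec, embed(d)))

answer = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": f"Context: {best}\n\nQuestion: {question}"}],
)
print(answer["message"]["content"])
```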
Continue with the full What is AI course on Local AI Master.
This page is one chapter of a structured course covering everything from foundations to production. Try Pro free for 7 days: full access to all 264 chapters across 10 courses, no charge until day 8, cancel anytime.