How to Run Llama 3 on Mac (Apple Silicon & Intel)
Run Llama 3 on macOS in 15 Minutes
Published on April 5, 2025 • 13 min read
Apple Silicon turned MacBooks into capable local AI rigs. With the right quantized weights you can run Llama 3 entirely offline—no cloud, no subscriptions, complete privacy. This walkthrough gets you from clean macOS install to a tuned, Metal-accelerated Llama 3 chat in under 15 minutes.
System Snapshot (test machine: MacBook Pro M3 Pro, 18GB unified memory, plugged in)
- Model: Llama 3.1 8B Q4
- Throughput: 28 tokens/sec
- Unified memory used: 12GB
Table of Contents
- Prerequisites
- Step 1 – Install Command Line Tools & Homebrew
- Step 2 – Install Ollama
- Step 3 – Download Llama 3 Models
- Step 4 – Run & Optimize Llama 3
- Troubleshooting
- FAQ
- Next Steps
Prerequisites {#prerequisites}
- macOS 13.5 Ventura or newer (Sonoma recommended)
- 12GB+ unified memory for Llama 3 8B; 64GB+ for the 70B build (see the model table in Step 3)
- 20GB of free SSD space for the 8B model (the 70B download alone is roughly 40GB)
- Command Line Tools + Homebrew (installed in Step 1)
⚡ Quick Tip
Close Chrome and memory-heavy apps before running your first session. This frees up unified memory so the Metal backend can keep the entire model resident.
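Before you start a session, it's easy to check how much unified memory the machine has and how stressed it already is (both tools ship with macOS):
sysctl -n hw.memsize    # total unified memory, in bytes
memory_pressure         # live pressure level and system-wide free percentage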
Step 1 – Install Command Line Tools & Homebrew {#step-1}
- Open Terminal (Spotlight → Terminal).
- Install Command Line Tools:
xcode-select --install
- Install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
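On Apple Silicon, Homebrew installs to /opt/homebrew, which is not on the default PATH. If the brew command isn't found afterward, add it to your shell profile (this mirrors the installer's own "Next steps" output):
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
brew --version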
Step 2 – Install Ollama {#step-2}
- Download the latest Ollama.dmg from ollama.com.
- Drag Ollama.app into Applications.
- Launch Ollama and approve the security prompt (System Settings → Privacy & Security → Allow).
- Start the Ollama service. The menu-bar app starts it automatically on launch; to run it manually from Terminal instead, use:
ollama serve
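To confirm the server is up, hit the local API (Ollama listens on port 11434 by default):
curl http://localhost:11434
# → Ollama is running
If you'd rather manage everything through Homebrew from Step 1, brew install ollama followed by brew services start ollama achieves the same result.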
Step 3 – Download Llama 3 Models {#step-3}
| Model | Recommended Mac | Command |
|---|---|---|
| Llama 3.1 8B Q4_K_M | M1/M2/M3 (8–16GB) | ollama pull llama3.1:8b |
| Llama 3.1 8B Q5_K_M | M3 Pro/Max (18GB+) | ollama pull llama3.1:8b-instruct-q5_K_M |
| Llama 3.1 70B Q4_0 | Mac Studio Ultra (64GB+) | ollama pull llama3.1:70b |
% ollama pull llama3.1:8b
pulling manifest
pulling weights 100% • 4.7 GB
success
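Once the pull finishes, verify the model is installed and inspect its details:
ollama list                  # installed models and their sizes
ollama show llama3.1:8b      # architecture, quantization, default parameters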
Step 4 – Run & Optimize Llama 3 {#step-4}
Run your first chat:
ollama run llama3.1:8b
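Inside the chat, a few slash commands are worth knowing:
>>> /show info                    # parameter count, quantization, context size
>>> /set parameter num_ctx 8192   # raise the context window for this session
>>> /bye                          # exit the chat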
Metal acceleration is automatic on Apple Silicon: Ollama offloads to the GPU whenever the model fits in unified memory, so no extra configuration is needed. To make a larger context window permanent rather than per-session, bake it into a variant with a Modelfile:
FROM llama3.1:8b
PARAMETER num_ctx 8192
Then build and run it (llama3.1-8k is just an example name):
ollama create llama3.1-8k -f Modelfile
ollama run llama3.1-8k
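Ollama also exposes a local HTTP API on port 11434, which is handy for scripts and editor integrations. A minimal request:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Explain unified memory in one sentence.",
  "stream": false
}'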
Troubleshooting {#troubleshooting}
- Model fails to load (out of memory): Pull a smaller quantization, e.g. ollama pull llama3.1:8b-instruct-q4_K_S, and close memory-heavy apps first.
- Fans ramp up immediately: Sustained generation pegs the GPU; for long sessions, enable Low Power Mode with sudo pmset -a lowpowermode 1 (set it back to 0 to restore full speed).
- Slow tokens on Intel: Intel Macs run CPU-only, so lower the context to 2048 (/set parameter num_ctx 2048) and expect single-digit tokens/sec.
- Want automation? Wire an Automator Quick Action that sends highlighted text directly to Llama 3; see the sketch after this list.
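A minimal sketch of that Quick Action, assuming the Ollama CLI symlink lives at /usr/local/bin/ollama (adjust the path if yours is under /opt/homebrew/bin): in Automator, create a Quick Action that receives selected text, add a Run Shell Script action with "Pass input" set to stdin, and use:
# the "Summarize" prompt is just an example; $(cat) is the highlighted text
/usr/local/bin/ollama run llama3.1:8b "Summarize the following text: $(cat)"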
FAQ {#faq}
- Can my M1 MacBook Air run Llama 3? Yes: stick with Q4 builds and keep the device plugged in.
- Do I need a GPU? Apple Silicon Macs have the GPU built in; Intel users can still run CPU-only at lower speed.
- How do I update to the latest weights? Pull the new tag (for example, ollama pull llama3.2) and remove older versions with ollama rm.
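For example, moving from Llama 3.1 to a newer release and reclaiming disk space looks like this:
ollama pull llama3.2      # fetch the newer model
ollama list               # confirm what's installed
ollama rm llama3.1:8b     # delete the old weights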
Next Steps {#next-steps}
- Add vision or code assistants from the models directory.
- Dual-booting Windows? Compare discrete GPU options in the hardware guide.
- Need offline workflows? Read Run AI Offline for privacy hardening.
- Building workflows? Use our Choose the Right AI Model framework to match tasks to models.