Setup Guide

How to Run Llama 3 on Mac (Apple Silicon & Intel)

April 5, 2025
13 min read
Local AI Master Field Team


Apple Silicon turned MacBooks into capable local AI rigs. With the right quantized weights you can run Llama 3 entirely offline—no cloud, no subscriptions, complete privacy. This walkthrough gets you from clean macOS install to a tuned, Metal-accelerated Llama 3 chat in under 15 minutes.

System Snapshot

  • Machine: MacBook Pro M3 Pro (18GB unified memory)
  • Model: Llama 3.1 8B Q4
  • Throughput: 28 tokens/sec
  • Memory in use: 12GB
  • Battery: 78%, plugged in

Table of Contents

  1. Prerequisites
  2. Step 1 – Install Command Line Tools & Homebrew
  3. Step 2 – Install Ollama
  4. Step 3 – Download Llama 3 Models
  5. Step 4 – Run & Optimize Llama 3
  6. Troubleshooting
  7. FAQ
  8. Next Steps

Prerequisites {#prerequisites}

  • macOS 13.5 Ventura or newer (Sonoma recommended)
  • 12GB+ unified memory for Llama 3 8B; 64GB+ for the 70B build
  • 20GB of free SSD space
  • Command Line Tools + Homebrew (installed in Step 1)
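As a rough sanity check on the memory numbers above: a quantized model needs about parameters × bits-per-weight ÷ 8, plus headroom for the KV cache and activations. A quick sketch (the helper name `estimate_gb` and the 20% overhead factor are assumptions for illustration, not measured figures):

```shell
# Rough memory estimate for a quantized model:
# params (billions) x bits-per-weight / 8, plus ~20% headroom
# for KV cache and activations. Ballpark only.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 * 1.2 }'
}

estimate_gb 8 4    # Llama 3 8B at 4-bit: about 4.8 GB
estimate_gb 70 4   # Llama 3 70B at 4-bit: about 42.0 GB
```

The 8B figure lines up with why 12GB of unified memory is a comfortable floor, and the 70B figure with why that build needs a Mac Studio-class machine.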

⚡ Quick Tip

Close Chrome and memory-heavy apps before running your first session. This frees up unified memory so the Metal backend can keep the entire model resident.

Step 1 – Install Command Line Tools & Homebrew {#step-1}

  1. Open Terminal (Spotlight → Terminal).
  2. Install Command Line Tools:
xcode-select --install
  3. Install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 2 – Install Ollama {#step-2}

  1. Download the latest Ollama.dmg from ollama.com.
  2. Drag Ollama.app into Applications.
  3. Launch Ollama and approve the security prompt (System Settings → Privacy & Security → Allow).
  4. The app keeps the background server running for you. To start the server manually from Terminal instead:
ollama serve
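Once the server is up, Ollama answers on a local HTTP API at port 11434. A small polling helper can confirm it before you script against it (the function name `wait_for` and the 30-attempt budget are illustrative choices, not part of Ollama):

```shell
# Poll until a command succeeds or the attempt budget runs out.
wait_for() {
  local tries="$1"; shift
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then echo "ready"; return 0; fi
    i=$((i + 1)); sleep 1
  done
  echo "timed out"; return 1
}

# With Ollama running, this should print "ready" within a few seconds:
# wait_for 30 curl -sf http://localhost:11434/api/version
```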

Step 3 – Download Llama 3 Models {#step-3}

| Model | Recommended Mac | Command |
| --- | --- | --- |
| Llama 3.1 8B Q4_K_M | M1/M2/M3 (8–16GB) | ollama pull llama3.1:8b |
| Llama 3.1 8B Q5_K_M | M3 Pro/Max (18GB+) | ollama pull llama3.1:8b-instruct-q5_K_M |
| Llama 3.1 70B Q4_0 | Mac Studio Ultra (64GB+) | ollama pull llama3.1:70b |
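If you're unsure which row applies, unified memory is the deciding factor. A small helper sketch (the function name `pick_model` is hypothetical; the thresholds mirror the recommendations above, and the long-form q5 tag is worth double-checking against ollama.com/library):

```shell
# Pick a model tag from available unified memory (GB).
# Thresholds follow the table above; adjust to taste.
pick_model() {
  if   [ "$1" -ge 64 ]; then echo "llama3.1:70b"
  elif [ "$1" -ge 18 ]; then echo "llama3.1:8b-instruct-q5_K_M"
  else                       echo "llama3.1:8b"
  fi
}

# On macOS, hw.memsize reports physical memory in bytes:
# pick_model $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
pick_model 18   # prints llama3.1:8b-instruct-q5_K_M
```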

Step 4 – Run & Optimize Llama 3 {#step-4}

Run your first chat:

ollama run llama3.1:8b

Metal acceleration is automatic on Apple Silicon; there is nothing to enable. To run with a larger context window, define a variant in a Modelfile:

FROM llama3.1:8b
PARAMETER num_ctx 4096

Build and launch it:

ollama create llama3.1-4k -f Modelfile
ollama run llama3.1-4k

For a one-off session, you can instead set the context from inside the chat with /set parameter num_ctx 4096.
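Larger context windows aren't free: the KV cache grows linearly with context length. Assuming Llama 3 8B's published architecture (32 transformer layers, 8 KV heads, head dimension 128) and an fp16 cache, each token costs about 128 KB; the helper name `kv_cache_mb` below is illustrative:

```shell
# KV-cache size in MB for a given context length, assuming Llama 3 8B:
# (K + V) x 32 layers x 8 KV heads x head dim 128 x 2 bytes (fp16) per token.
kv_cache_mb() {
  awk -v ctx="$1" 'BEGIN {
    bytes_per_token = 2 * 32 * 8 * 128 * 2
    printf "%.0f\n", ctx * bytes_per_token / 1048576
  }'
}

kv_cache_mb 4096   # about 512 MB
kv_cache_mb 8192   # about 1024 MB
```

So doubling the context from 4096 to 8192 costs roughly another half gigabyte on top of the model weights, which matters on 8GB machines.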

Troubleshooting {#troubleshooting}

  • Model fails to load (out of memory): Pull a smaller quant (ollama pull llama3.1:8b-instruct-q4_K_S) and close memory-heavy apps first.
  • Fans ramp up immediately: Sustained fan noise during prompt processing is normal; for long sessions, enable Low Power Mode (System Settings → Battery) to cap power draw.
  • Slow tokens on Intel: Intel Macs run CPU-only, so pick the smallest quant you can tolerate and lower the context with /set parameter num_ctx 2048 inside the chat.
  • Want automation? Integrate with Automator Quick Actions to send highlighted text directly to Llama 3.
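Pulls can also fail on disk space rather than memory. A portable pre-flight check (the function name `check_space` is a hypothetical helper; df -k's fourth column is available space in 1K blocks on both macOS and Linux):

```shell
# Warn if the target volume lacks enough free space for a model download.
check_space() {
  local need_gb="$1" path="${2:-$HOME}"
  local free_kb free_gb
  free_kb=$(df -k "$path" | awk 'NR==2 { print $4 }')
  free_gb=$(( free_kb / 1024 / 1024 ))
  if [ "$free_gb" -ge "$need_gb" ]; then
    echo "ok: ${free_gb}GB free"
  else
    echo "low: ${free_gb}GB free, need ${need_gb}GB"
  fi
}

check_space 5   # the 8B Q4 download is roughly 5 GB
```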

FAQ {#faq}

  • Can my M1 MacBook Air run Llama 3? Yes—stick with Q4 builds and keep the device plugged in.
  • Do I need a GPU? Apple Silicon GPUs are built in; Intel users can still run CPU-only with lower speed.
  • How do I update to the latest weights? Re-run ollama pull for your current tag to fetch updated weights, or pull a newer release such as ollama pull llama3.2, then remove superseded models with ollama rm.

Next Steps {#next-steps}

Try a larger quantized build as your memory allows, benchmark throughput with ollama run --verbose, and wire Llama 3 into everyday workflows with the Automator Quick Action mentioned in the troubleshooting section.


📅 Published: April 5, 2025 • 🔄 Last Updated: October 15, 2025 • ✓ Manually Reviewed



Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset

I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.
