Docker Model Runner Guide: Run LLMs with Docker
Docker Model Runner Quick Start
3 Commands to Start:
docker model pull ai/llama3.2
docker model run ai/llama3.2 "Hello"
curl localhost:12434/engines/v1/models
Key Features:
• OpenAI-compatible API
• Native Docker Compose support
• GPU: Metal, CUDA, Vulkan
• Models as Docker primitives
What is Docker Model Runner?
Docker Model Runner (DMR) is Docker's official solution for running AI/LLM models locally, launched April 2025. It treats AI models as first-class citizens within the Docker ecosystem—similar to how Docker treats containers, images, and volumes.
Key Characteristics
- Built on llama.cpp: High-performance inference engine (with vLLM and Diffusers support)
- OpenAI-compatible API: Drop-in replacement for OpenAI SDK on localhost:12434
- OCI Artifacts: Models stored on Docker Hub or any OCI-compliant registry
- On-demand Loading: Models load into memory at runtime, unload when idle
- Native Execution: Inference server runs directly on host (not containerized) for GPU access
Why Docker Model Runner?
Docker Model Runner fills a gap in the AI development workflow. Instead of managing separate tools for local LLM inference, you can now use Docker's familiar commands and workflows:
- Run models with docker model run, just like docker run
- Pull models from Docker Hub with docker model pull
- Integrate AI into Docker Compose with native support
- Use existing Docker infrastructure and CI/CD pipelines
Installation and Setup
Docker Desktop (macOS and Windows)
Docker Model Runner comes included with Docker Desktop 4.42 or later.
Step 1: Install/Update Docker Desktop
Download the latest version from docker.com.
Step 2: Enable Docker Model Runner
- Open Docker Desktop
- Go to Settings (gear icon)
- Navigate to AI
- Enable Docker Model Runner
- (Optional) Enable GPU-backed inference if you have an NVIDIA GPU
- (Optional) Enable Host-side TCP support for localhost:12434 access
Step 3: Verify Installation
# Check version
docker model version
# Check status
docker model status
# View help
docker model --help
Linux (Docker Engine)
On Linux, Docker Model Runner is available as a plugin for Docker Engine.
Ubuntu/Debian:
# Update package list
sudo apt-get update
# Install plugin
sudo apt-get install docker-model-plugin
# Verify installation
docker model version
Fedora/RHEL/CentOS:
# Update packages
sudo dnf update
# Install plugin
sudo dnf install docker-model-plugin
# Verify
docker model version
Runner Management (Linux):
# Install runner component
docker model install-runner
# Start the runner
docker model start-runner
# Check status
docker model status
# Stop/restart
docker model stop-runner
docker model restart-runner
# Uninstall
docker model uninstall-runner
Enable TCP Support
To access the API from host applications (outside Docker containers):
- Docker Desktop: Settings > AI > Enable "host-side TCP support"
- This enables connections on port 12434
- API endpoint: http://localhost:12434/engines/v1
Available Models
Docker Model Runner supports GGUF models from Docker Hub and Hugging Face.
Official Docker Hub Models
| Model | Tag | Size | Use Case |
|---|---|---|---|
| SmolLM2 | ai/smollm2:360M-Q4_K_M | 360M | Prototyping, testing |
| Llama 3.2 | ai/llama3.2:3B-Q4_K_M | 3B | General text generation |
| Llama 3.3 | ai/llama3.3 | 70B | Advanced text generation |
| Gemma 3 | ai/gemma3 | Various | Reasoning tasks |
| Phi 4 | ai/phi4 | 14B | Reasoning tasks |
| Qwen 2.5 | ai/qwen2.5 | Various | Code generation |
| Qwen 3 | ai/qwen3 | Various | Code, multilingual |
| Mistral | ai/mistral | 7B | General, code |
| Mistral Nemo | ai/mistral-nemo | 12B | Advanced tasks |
| DeepSeek R1 | ai/deepseek-r1-distill-llama | Various | Advanced reasoning |
| All-MiniLM | ai/all-minilm | 33M | Embeddings |
Model Selection by Use Case
By Hardware:
- Low-end (8GB RAM): ai/smollm2, ai/llama3.2:1B
- Mid-range (16GB RAM): ai/llama3.2:3B, ai/gemma3:4b, ai/phi4
- High-end (32GB+ RAM): ai/llama3.3, ai/deepseek-r1-distill-llama
By Application:
- General chat: ai/llama3.2 or ai/llama3.3
- Code generation: ai/qwen2.5 or ai/mistral
- Reasoning tasks: ai/gemma3 or ai/phi4
- Embeddings/RAG: ai/all-minilm
- Quick prototyping: ai/smollm2
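The hardware guidance above can be condensed into a small helper that suggests a default tag from available RAM. A minimal sketch, using model names from the tables above; the thresholds are illustrative, not official recommendations:

```python
def pick_model(ram_gb: float) -> str:
    """Suggest a Docker Model Runner tag for the available system RAM.

    Thresholds mirror the rough hardware guidance above and are
    illustrative, not official recommendations.
    """
    if ram_gb >= 32:
        return "ai/llama3.3"            # 70B-class, high-end machines
    if ram_gb >= 16:
        return "ai/llama3.2:3B-Q4_K_M"  # mid-range default
    return "ai/smollm2"                 # low-end / prototyping

print(pick_model(8))   # ai/smollm2
print(pick_model(16))  # ai/llama3.2:3B-Q4_K_M
```

A helper like this is handy in CI scripts that need to run the largest model the runner host can actually fit.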
Pulling Models
# From Docker Hub
docker model pull ai/llama3.2
docker model pull ai/llama3.2:3B-Q4_K_M # Specific quantization
# From Hugging Face
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
# From custom registry
docker model pull myregistry.com/models/mistral:latest
# Check downloaded models
docker model list
Model Format Requirements
| Engine | Format | Platform |
|---|---|---|
| llama.cpp (default) | GGUF | All |
| vLLM | Safetensors | Linux + NVIDIA GPU |
| Diffusers | Various | Linux + NVIDIA GPU |
CLI Commands Reference
Model Management
# Pull/download a model
docker model pull ai/llama3.2:3B-Q4_K_M
docker model pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
# List downloaded models
docker model list
# Remove a model
docker model rm ai/llama3.2
# Inspect model details
docker model inspect ai/llama3.2
# View model metadata
docker model inspect ai/llama3.2 --format '{{.Config.Size}}'
Running Models
# Interactive query (load, run, display)
docker model run ai/llama3.2 "Explain Docker in one sentence"
# Interactive chat mode
docker model run ai/llama3.2
# Background/detached mode (pre-load model)
docker model run -d ai/llama3.2
# With options
docker model run --debug ai/llama3.2 "Test prompt"
docker model run --color yes ai/llama3.2 "Hello"
docker model run --ignore-runtime-memory-check ai/llama3.2 "Test"
Run Command Options
| Option | Description |
|---|---|
| --color (auto, yes, no) | Control colorized output |
| --debug | Enable debug logging |
| -d, --detach | Load model in background |
| --ignore-runtime-memory-check | Skip memory validation |
System Commands
# Check runner status
docker model status
# Show version
docker model version
# System information
docker model system info
# Disk usage
docker model system df
# Clean unused models
docker model system prune
# Remove ALL models
docker model system prune -a
Using the OpenAI-Compatible API
Docker Model Runner exposes an OpenAI-compatible API, making it a drop-in replacement for OpenAI's API.
API Endpoint
| Deployment | Endpoint |
|---|---|
| Host (TCP enabled) | http://localhost:12434/engines/v1 |
| Docker containers (Desktop) | http://model-runner.docker.internal/engines/v1 |
| Docker containers (Linux) | http://localhost:12434/engines/v1 |
Using cURL
# Chat completion
curl http://localhost:12434/engines/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/llama3.2",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Docker?"}
],
"temperature": 0.7,
"max_tokens": 500
}'
# List available models
curl http://localhost:12434/engines/v1/models
# Create embeddings
curl http://localhost:12434/engines/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "ai/all-minilm",
"input": "Docker makes it easy to run applications"
}'
Python with OpenAI SDK
from openai import OpenAI
# Connect to Docker Model Runner
client = OpenAI(
base_url="http://localhost:12434/engines/v1",
api_key="not-needed" # No API key required
)
# Chat completion
response = client.chat.completions.create(
model="ai/llama3.2",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Docker Model Runner?"}
],
temperature=0.7,
max_tokens=500
)
print(response.choices[0].message.content)
# Streaming response
stream = client.chat.completions.create(
model="ai/llama3.2",
messages=[{"role": "user", "content": "Write a poem about Docker"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
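If you call the endpoint without the SDK (for example with urllib or a non-Python client), you get the standard OpenAI chat-completions JSON shape back. A sketch of parsing such a response with only the standard library; the payload below is a trimmed, illustrative sample, not captured output:

```python
import json

# A trimmed example payload in the OpenAI chat-completions shape
# (field values are illustrative).
raw = """{
  "model": "ai/llama3.2",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Docker packages apps into containers."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21}
}"""

data = json.loads(raw)
answer = data["choices"][0]["message"]["content"]   # assistant reply text
tokens = data["usage"]["total_tokens"]              # token accounting
print(answer)
print(f"tokens used: {tokens}")
```

Because the shape matches OpenAI's, the same parsing code works unchanged against the hosted OpenAI API.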
Node.js/TypeScript
import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:12434/engines/v1',
apiKey: 'not-needed'
});
async function chat() {
const response = await client.chat.completions.create({
model: 'ai/llama3.2',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(response.choices[0].message.content);
}
chat();
LangChain Integration
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
base_url="http://localhost:12434/engines/v1",
api_key="not-needed",
model="ai/llama3.2"
)
response = llm.invoke("What is Docker?")
print(response.content)
Docker Compose Integration
Docker Compose natively supports AI models as a top-level primitive.
Short Syntax (Simple)
services:
app:
image: my-app
models:
- llm
- embedding-model
models:
llm:
model: ai/llama3.2
embedding-model:
model: ai/all-minilm
Long Syntax (With Configuration)
services:
app:
image: my-app
environment:
- LLM_URL=${LLM_URL}
models:
llm:
endpoint_var: LLM_URL
models:
llm:
model: ai/llama3.2:3B-Q4_K_M
context_size: 4096
runtime_flags:
- "--no-prefill-assistant"
Provider Syntax
services:
chat:
image: my-chat-app
depends_on:
- ai_runner
ai_runner:
provider:
type: model
options:
model: ai/smollm2
context-size: 1024
Auto-Injected Environment Variables
Docker Compose automatically injects environment variables:
| Variable | Description |
|---|---|
| {MODEL}_MODEL | The model name (e.g., ai/llama3.2) |
| {MODEL}_URL | The endpoint URL |
Connecting from Containers
# Docker Desktop
ENDPOINT="http://model-runner.docker.internal/engines/v1"
# Docker Engine (Linux)
ENDPOINT="http://localhost:12434/engines/v1"
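Inside a container, application code can read the Compose-injected variables directly. A minimal sketch for a model named `llm` (so Compose injects `LLM_URL`/`LLM_MODEL`, per the table above), falling back to the Docker Desktop internal endpoint when the variables are absent:

```python
import os

# Compose injects LLM_URL / LLM_MODEL for a model named "llm";
# fall back to the Docker Desktop internal endpoint otherwise.
base_url = os.environ.get(
    "LLM_URL", "http://model-runner.docker.internal/engines/v1"
)
model = os.environ.get("LLM_MODEL", "ai/llama3.2")
print(f"connecting to {base_url} with model {model}")
```

The same two values are exactly what the OpenAI SDK examples above need for `base_url` and `model`, so the app runs unmodified inside or outside Compose.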
Complete Example: AI Chat Application
services:
backend:
image: python:3.11
command: python app.py
environment:
- MODEL_ENDPOINT=http://model-runner.docker.internal/engines/v1
- MODEL_NAME=ai/llama3.2
ports:
- "8000:8000"
models:
- chat-model
- embeddings
frontend:
image: node:20
command: npm start
ports:
- "3000:3000"
depends_on:
- backend
models:
chat-model:
model: ai/llama3.2:3B-Q4_K_M
context_size: 4096
embeddings:
model: ai/all-minilm
GPU Configuration
GPU acceleration dramatically improves inference performance—from 10+ seconds to under 3 seconds for typical responses.
Apple Silicon (Automatic)
GPU acceleration via Metal API is automatically configured on M1, M2, M3, and M4 Macs. No additional setup required.
Docker Model Runner runs natively on the host (not in a VM) for direct GPU access, providing excellent performance on Apple Silicon.
NVIDIA GPU Setup
Docker Desktop (Windows with WSL2):
- Ensure NVIDIA drivers are installed
- Open Docker Desktop > Settings > AI
- Enable GPU-backed inference
- Save and restart
Linux:
# 1. Install NVIDIA Container Runtime
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# 2. Configure Docker daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# 3. Verify configuration
docker info | grep -i runtime
# 4. Run with GPU
docker model run --gpu cuda ai/llama3.2 "Hello"
Best Practices for NVIDIA:
- Always use --gpu cuda explicitly (not --gpu auto)
- Monitor GPU usage with nvidia-smi
- Use quantized models (Q4, Q5, Q8) for better GPU memory efficiency
- Ensure sufficient GPU VRAM for your model size
Vulkan Support (AMD/Intel/Integrated)
Vulkan support enables GPU acceleration on a wider range of hardware:
- AMD GPUs
- Intel GPUs (including integrated)
- Other Vulkan-compatible GPUs
Vulkan detection is automatic—no configuration needed.
GPU Support Matrix
| Platform | GPU | API | Configuration |
|---|---|---|---|
| macOS | Apple Silicon (M1-M4) | Metal | Automatic |
| Windows | NVIDIA | CUDA | Enable in Settings |
| Windows | AMD/Intel | Vulkan | Automatic |
| Linux | NVIDIA | CUDA | --gpu cuda flag |
| Linux | AMD/Intel | Vulkan | Automatic |
Inference Engines
Docker Model Runner supports multiple inference engines for different use cases.
llama.cpp (Default)
- Platform: All (macOS, Windows, Linux)
- Format: GGUF models only
- GPU: Metal (Apple), CUDA (NVIDIA), Vulkan (AMD/Intel)
- Best for: General use, Apple Silicon, development
vLLM
- Platform: Linux x86_64, Windows WSL2 (with NVIDIA GPU)
- Format: Safetensors
- GPU: NVIDIA CUDA required
- Best for: High-throughput production serving
Diffusers
- Platform: Linux x86_64, Linux ARM64 (with NVIDIA GPU)
- Format: Various (Stable Diffusion models)
- GPU: NVIDIA CUDA required
- Best for: Image generation
Performance: Docker Model Runner vs Ollama
Benchmark Comparison
| Metric | Docker Model Runner | Ollama |
|---|---|---|
| Throughput | Baseline to +12% | Baseline |
| Startup Time | 3-6 seconds | 2-5 seconds |
| Memory (7B Q4) | 4-6GB | 4-6GB |
| Apple Silicon | Excellent (native Metal) | Excellent |
| NVIDIA | Good (CUDA) | Good |
Feature Comparison
| Feature | Docker Model Runner | Ollama |
|---|---|---|
| Docker Compose | Native integration | Requires setup |
| API Compatibility | OpenAI-compatible | OpenAI-compatible |
| Model Format | GGUF (+Safetensors vLLM) | GGUF |
| Model Library | Growing | Extensive |
| Ecosystem | New (Docker-focused) | Mature (many integrations) |
| Multimodal | Coming | Supported |
| Custom Models | OCI registries, HF | Modelfile |
When to Choose Each
Choose Docker Model Runner when:
- Your workflow is Docker-centric
- You use Docker Compose for orchestration
- You want models as Docker primitives
- You're on Apple Silicon and want native performance
- You need CI/CD integration with Docker
Choose Ollama when:
- You need the largest model library
- You want maximum ecosystem integrations
- You need multimodal models now
- You prefer Modelfile customization
- You want the most mature solution
Resource Management
System Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 8GB | 16GB+ |
| Storage | Varies | 50GB+ (for multiple models) |
| GPU | Optional | NVIDIA/Apple Silicon |
Model Sizes (Approximate)
| Model | Q4_K_M | Q8_0 | FP16 |
|---|---|---|---|
| SmolLM2 360M | ~250MB | ~400MB | ~720MB |
| Llama 3.2 1B | ~800MB | ~1.2GB | ~2GB |
| Llama 3.2 3B | ~2GB | ~3.5GB | ~6GB |
| Llama 3.3 70B | ~40GB | ~70GB | ~140GB |
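The sizes in the table follow roughly from parameter count times bytes per weight. A hedged back-of-the-envelope estimator; the bits-per-weight figures are approximate effective averages for each quantization scheme, and real GGUF files add some overhead:

```python
def approx_size_gb(params_billion: float, quant: str) -> float:
    """Rough on-disk size of a model: params x bits-per-weight / 8.

    Bits-per-weight values are approximate effective averages for
    each quantization scheme, not exact format constants.
    """
    bits = {"Q4_K_M": 4.8, "Q8_0": 8.5, "FP16": 16.0}[quant]
    # 1e9 params at 1 byte/param is ~1 GB, so this yields GB directly.
    return params_billion * bits / 8

for q in ("Q4_K_M", "Q8_0", "FP16"):
    print(f"Llama 3.2 3B @ {q}: ~{approx_size_gb(3, q):.1f} GB")
```

For example, 70B at FP16 gives 70 x 16 / 8 = 140 GB, matching the table; use this only as a sizing sanity check before pulling a model.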
Memory Management
Models in Docker Model Runner:
- Load on-demand: First request loads the model
- Unload when idle: Memory released after inactivity
- Pre-loading: Use docker model run -d to keep the model loaded
Monitoring and Cleanup
# Check disk usage by models
docker model system df
# View detailed model info
docker model inspect ai/llama3.2
# Remove unused models
docker model system prune
# Remove ALL models (use with caution)
docker model system prune -a
# Monitor GPU (NVIDIA)
nvidia-smi
# Check runner status
docker model status
Troubleshooting
Model Not Responding
# Check runner status
docker model status
# Restart runner (Linux)
docker model restart-runner
# Check if model is loaded
docker model list
# Try with debug output
docker model run --debug ai/llama3.2 "Test"
GPU Not Detected
NVIDIA:
# Verify NVIDIA runtime
docker info | grep -i runtime
# Check GPU visibility
nvidia-smi
# Ensure --gpu cuda flag is used
docker model run --gpu cuda ai/llama3.2 "Test"
Apple Silicon:
- GPU via Metal should work automatically
- Ensure Docker Desktop is up to date
Port 12434 Not Accessible
- Enable Host-side TCP support in Docker Desktop settings
- Check for firewall blocking the port
- Verify with:
curl http://localhost:12434/engines/v1/models
Out of Memory
# Use smaller quantization
docker model pull ai/llama3.2:3B-Q4_K_M
# Clean up unused models
docker model system prune
# Reduce context size in Compose
models:
llm:
model: ai/llama3.2
context_size: 2048 # Smaller context
Best Practices
1. Use Appropriate Quantization
# Q4_K_M: Best balance of quality and memory
docker model pull ai/llama3.2:3B-Q4_K_M
# Q5_K_M: Slightly better quality
docker model pull ai/llama3.2:3B-Q5_K_M
# Q8_0: Near-full quality, more memory
docker model pull ai/llama3.2:3B-Q8_0
2. Pre-load Models for Production
# Load model in background for faster first request
docker model run -d ai/llama3.2
3. Set Appropriate Context Size
Larger context = more memory. Use smallest context that works:
models:
llm:
model: ai/llama3.2
context_size: 2048 # Start small, increase if needed
4. Monitor Resources
# Regular cleanup
docker model system prune
# Check disk usage
docker model system df
Key Takeaways
- Docker Model Runner is Docker's native AI inference solution
- OpenAI-compatible API enables drop-in replacement for existing code
- Docker Compose integration makes AI orchestration familiar and simple
- GPU acceleration works across Apple Silicon, NVIDIA, and Vulkan
- Performance matches Ollama, with Docker-native advantages
- Models load on-demand and unload when idle for efficient memory use
- Best for Docker-centric workflows, Compose deployments, and CI/CD
Next Steps
- Compare with Ollama and LM Studio
- Set up RAG with Docker Model Runner
- Build AI agents using the OpenAI API
- Choose the best models for your use case
Docker Model Runner brings AI inference into the Docker ecosystem, making it easy to run LLMs alongside your containerized applications with familiar tools, commands, and workflows. Whether you're developing locally or deploying to production, Docker Model Runner provides a seamless AI experience for Docker users.