Developer Reference

Ollama Modelfile Mastery: Custom Prompts, Parameters, and Templates

April 23, 2026
18 min read
LocalAimaster Research Team


The Modelfile is the most under-documented part of Ollama. Most users learn ollama run llama3.1 and never look back — which is fine, until you want to ship a coding assistant tuned to your house style, or import a fine-tuned GGUF, or pin a 16k context window for RAG without setting it on every API call. The Modelfile is the answer to all of that. Twenty lines of plain text and you have a custom model variant that behaves exactly the way your team needs.

This is the reference I wish existed when I first wrote one. Every directive, every parameter that actually matters, real recipes for real workflows, and the gotchas that ate hours of my time. Tested on Ollama 0.5.7, April 2026.

Quick Start: Your First Modelfile in 90 Seconds

Save as Modelfile:

FROM llama3.1:8b
PARAMETER temperature 0.2
PARAMETER num_ctx 8192
SYSTEM """
You are a senior Python engineer. Always include working code examples.
Prefer standard library over dependencies. Be concise.
"""

Build and run:

ollama create py-coach -f Modelfile
ollama run py-coach "How do I parse CSV files with type-checked rows?"

You now have a custom model named py-coach that ships with the system prompt baked in, low temperature for focused output, and an 8k context. Anyone on your team can ollama pull it (if you push it) and get the exact same behavior. That is the entire value proposition.
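
The baked-in behavior also applies over the HTTP API, not just the CLI. A minimal check, assuming Ollama is serving on the default port 11434:

curl http://localhost:11434/api/generate -d '{
  "model": "py-coach",
  "prompt": "How do I parse CSV files with type-checked rows?",
  "stream": false
}'

The system prompt, temperature, and num_ctx from the Modelfile apply without the caller passing anything.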

Table of Contents

  1. What a Modelfile Actually Does
  2. Modelfile Syntax Reference
  3. Every PARAMETER Explained
  4. SYSTEM Prompt Patterns
  5. TEMPLATE: Chat Format Mastery
  6. Importing a GGUF File
  7. Adding LoRA Adapters
  8. Real Recipes for Real Use Cases
  9. Sharing and Publishing Modelfiles
  10. Common Pitfalls
  11. FAQs

What a Modelfile Actually Does {#what-it-does}

A Modelfile is to Ollama what a Dockerfile is to Docker. It defines a layered, reproducible model artifact:

  • Base model (FROM) — what to start with.
  • Parameter overrides (PARAMETER) — context size, temperature, GPU offload, etc.
  • System prompt (SYSTEM) — instructions baked into every chat.
  • Chat template (TEMPLATE) — how messages are formatted before tokenization.
  • Adapters (ADAPTER) — fine-tuning weights layered on top of the base.
  • License and metadata (LICENSE, MESSAGE) — provenance and example interactions.

When you run ollama create, Ollama builds a new model layer that combines all of this. The base weights are not duplicated — only the diff (your overrides) is stored — so a custom variant of a 7B model adds maybe 5 KB of disk.

The reason this matters: every team has a "house version" of common models. The marketing team wants a writing assistant with brand voice in the system prompt. Engineering wants a coding model with their style guide. Support wants a chatbot with hard guardrails. Without Modelfiles, every team wires its own prompt layer into application code. With Modelfiles, you ship one artifact that works the same way from cURL, the Ollama CLI, LangChain, Continue.dev, and any other client.

For the broader ecosystem context, our complete Ollama guide is the right starting point if you are new, and our best Ollama models roundup helps you pick the right FROM.


Modelfile Syntax Reference {#syntax}

Modelfile is a simple line-based DSL. Each instruction is a single line (or a triple-quoted block for multi-line strings). Comments start with #.

# This is a comment

FROM llama3.1:8b                      # base model (required)

PARAMETER num_ctx 8192                # parameter override
PARAMETER temperature 0.7
PARAMETER stop "<|eot_id|>"

SYSTEM """                            # multi-line system prompt
You are a helpful coding assistant.
Be concise. Always include working examples.
"""

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

"""

ADAPTER ./my-lora.gguf                # optional LoRA adapter
LICENSE """MIT License..."""          # optional license text
MESSAGE user "Show me a hello world"  # optional example messages
MESSAGE assistant "print(\"hello\")"

The instructions can appear in any order, but convention is FROM → PARAMETER → SYSTEM → TEMPLATE → ADAPTER → LICENSE → MESSAGE.

Build, inspect, push

# Build
ollama create my-model -f Modelfile

# See what was generated
ollama show my-model --modelfile

# Inspect parameters
ollama show my-model --parameters

# Inspect template
ollama show my-model --template

# Push to ollama.com (requires login)
ollama push username/my-model

# Pull from registry
ollama pull username/my-model

ollama show --modelfile is the secret weapon. It dumps the effective Modelfile of any installed model — even base models — so you can see exactly what TEMPLATE and PARAMETER values it ships with. Steal them as starting points for your own variants.
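
A workflow that follows from this: dump a base model's effective Modelfile to a file and edit it, rather than writing a TEMPLATE from scratch (the base model here is just an example):

ollama show llama3.1:8b --modelfile > Modelfile
# edit FROM / SYSTEM / PARAMETER lines, keep the TEMPLATE as-is
ollama create my-variant -f Modelfile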


Every PARAMETER Explained {#parameters}

The full list of PARAMETER directives, what they actually do, and where they matter.

Context and output size

Parameter | Default | Range | What it controls
num_ctx | 2048 | 512–131072 | Context window in tokens. Critical for RAG and long docs.
num_predict | -1 (∞) | 1–∞ | Max tokens generated per response. -1 means until the model emits a stop or hits num_ctx.
num_keep | 4 | 0–num_ctx | Tokens kept from the beginning when the context overflows.

Practical note: most base models in Ollama ship with num_ctx=2048. That is far below what the model actually supports. Llama 3.1 supports 128k. Qwen 2.5 supports 32k natively. If you are doing RAG or long-doc summarization, override with PARAMETER num_ctx 8192 (or higher) — otherwise your retrieved chunks get silently truncated.
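
You can also override num_ctx per request via the API's options field, but that pushes the responsibility onto every caller, which is exactly what baking it into the Modelfile avoids. For comparison, the per-call version (assuming the default port):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "<your prompt plus retrieved chunks>",
  "options": { "num_ctx": 16384 },
  "stream": false
}'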

Sampling and creativity

Parameter | Default | Range | What it controls
temperature | 0.8 | 0.0–2.0 | Randomness. 0 = deterministic, 1 = balanced, 2 = chaos.
top_k | 40 | 0–100 | Sample from the top-k tokens. 0 disables. Lower = more focused.
top_p | 0.9 | 0.0–1.0 | Nucleus sampling: sample from the smallest set of tokens whose cumulative probability reaches top_p.
min_p | 0.05 | 0.0–1.0 | Newer alternative to top_p. Drop tokens with prob < min_p × max_prob.
tfs_z | 1.0 | 1.0–∞ | Tail-free sampling. Higher = more aggressive tail filtering; 1.0 disables.
typical_p | 1.0 | 0.0–1.0 | Locally typical sampling.
repeat_penalty | 1.1 | 0.0–2.0 | Penalize repeating tokens. 1.0 = off, >1.0 reduces repetition.
repeat_last_n | 64 | 0–num_ctx | Window over which repeat_penalty applies.
presence_penalty | 0.0 | -2.0–2.0 | Penalty for any token already present.
frequency_penalty | 0.0 | -2.0–2.0 | Penalty proportional to token frequency.
mirostat | 0 | 0/1/2 | Mirostat sampling. 0 = off, 1 = v1, 2 = v2. Trades diversity for stability.
mirostat_eta | 0.1 | 0.0–1.0 | Mirostat learning rate.
mirostat_tau | 5.0 | 0.0–10.0 | Mirostat target entropy.
seed | 0 | int | Random seed. Set non-zero for reproducible output.

My defaults by use case:

  • Coding / structured output: temperature 0.1, top_p 0.9, repeat_penalty 1.05
  • Conversational chat: temperature 0.7, top_p 0.9, repeat_penalty 1.1
  • Creative writing: temperature 0.95, top_p 0.95, repeat_penalty 1.15
  • RAG (factual): temperature 0.2, top_p 0.85, repeat_penalty 1.05

Stopping

Parameter | Default | What it does
stop | model-dependent | Stop generation when this string is emitted. Repeat the directive for multiple stop strings.
PARAMETER stop "<|eot_id|>"
PARAMETER stop "User:"
PARAMETER stop "</s>"

Hardware and runtime

Parameter | Default | Range | What it controls
num_gpu | -1 (auto) | 0–999 | Layers offloaded to GPU. 999 = max. 0 = CPU only.
num_thread | auto | 1–CPU cores | CPU threads for prompt processing.
num_batch | 512 | 1–num_ctx | Batch size for prompt processing.
f16_kv | true | bool | Use FP16 for the KV cache. Halves VRAM vs FP32.
use_mmap | true | bool | Memory-map the model file. Disable for read-once workloads.
use_mlock | false | bool | Lock the model in RAM so it is never swapped out; helps when memory pressure would otherwise page the model to disk.
numa | false | bool | NUMA-aware memory allocation. For multi-socket servers.
vocab_only | false | bool | Load only the tokenizer. Mostly a diagnostic.

The big knob most people miss: num_gpu. On a Mac with unified memory or a Linux box where Ollama auto-detects correctly, leave it. On systems where Ollama is offloading too few layers (you see "X/Y layers offloaded to GPU" in logs and X is suspiciously low), force-set num_gpu 999 — Ollama silently respects available VRAM and offloads as many as fit.
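
A quick way to sanity-check offload without digging through logs is ollama ps while a model is loaded; the PROCESSOR column reports how much of the model sits on GPU vs CPU (exact column names may vary by version):

# in one terminal: load the model
ollama run my-model
# in a second terminal, while it is loaded
ollama ps    # PROCESSOR shows the split, e.g. "100% GPU" or "22%/78% CPU/GPU"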


SYSTEM Prompt Patterns {#system-prompts}

The SYSTEM directive bakes a system prompt into the model. Patterns that work, ranked by usefulness:

Pattern 1: Persona + format constraints

SYSTEM """
You are a senior backend engineer specializing in Python and PostgreSQL.
- Always include working code examples.
- Prefer standard library; only suggest dependencies when justified.
- Use type hints in all function signatures.
- Be concise. No unnecessary preamble.
"""

This is the workhorse pattern. The persona narrows the model's response style, and the bullets enforce concrete output rules.

Pattern 2: Hard guardrails for support bots

SYSTEM """
You are a customer support agent for ACME Corp.
NEVER discuss competitor products by name.
NEVER promise refunds — direct refund requests to support@acme.com.
NEVER answer questions outside ACME products and account help.
If asked something outside scope, respond: "I can only help with ACME product and account questions. For [topic], please contact [appropriate channel]."
"""

The "NEVER" lines work surprisingly well. Pair with a stop parameter that catches escape attempts.
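
A minimal sketch of that pairing; the stop strings here are illustrative, so pick ones that match the escape patterns you actually see in transcripts:

FROM llama3.1:8b
PARAMETER temperature 0.3
PARAMETER stop "User:"      # cut off attempts to continue the turn as the user
PARAMETER stop "System:"    # cut off attempts to fake a new system message
SYSTEM """
You are a customer support agent for ACME Corp.
NEVER promise refunds — direct refund requests to support@acme.com.
"""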

Pattern 3: RAG-aware system prompt

SYSTEM """
You are a documentation assistant. Answer ONLY using the provided context.
If the context does not contain the answer, say: "I don't have that information in the available documentation."
Cite sources by filename when possible.
Do not invent facts. Do not generalize beyond the context.
"""
PARAMETER temperature 0.1
PARAMETER num_ctx 16384

The combination of low temperature + restrictive system prompt + long context is the standard recipe for a RAG-grounded model variant. Pair with our local RAG setup guide for the retrieval side.

Pattern 4: Style-mimicry for content

SYSTEM """
You write in the voice of Local AI Master:
- Direct, no fluff, no "let's dive in" or "in today's fast-paced world".
- Specific numbers, real benchmarks, concrete examples.
- First-person where appropriate ("I tested...", "we measured...").
- Short paragraphs. Hard returns instead of long sentences.
- Honest about tradeoffs. Acknowledge what doesn't work.
"""

Bake the brand voice into the variant. Every chat starts with the right tone without per-call prompt engineering.

Pattern 5: Multi-shot examples

Sometimes a system prompt is not enough. Use MESSAGE pairs:

SYSTEM "You are a SQL expert. Always show the full query and an explanation."

MESSAGE user "Find users who signed up in the last 7 days."
MESSAGE assistant """
```sql
SELECT id, email, created_at
FROM users
WHERE created_at >= NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
```

This selects all columns we typically care about (id, email, signup time), filters to the last 7 days, and sorts newest-first.
"""

MESSAGE user "Same but only paying customers."
MESSAGE assistant """
```sql
SELECT u.id, u.email, u.created_at
FROM users u
JOIN subscriptions s ON s.user_id = u.id
WHERE u.created_at >= NOW() - INTERVAL '7 days'
  AND s.status = 'active'
ORDER BY u.created_at DESC;
```

Joined to subscriptions and filtered to active subs to capture only paying customers.
"""

The messages are seeded into the conversation as if they had happened. Few-shot examples baked into the model.


TEMPLATE: Chat Format Mastery {#template}

TEMPLATE is the trickiest directive. It defines the exact tokens that wrap system, user, and assistant messages before the model sees them. Get it wrong and the model produces garbage even with correct weights.

Inheriting the base template

If you are extending an existing model, inherit:

FROM llama3.1:8b
SYSTEM "You are a Python tutor."

The TEMPLATE from llama3.1:8b is preserved automatically. You only need to write your own TEMPLATE when you import a raw GGUF.
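
A quick way to confirm the inheritance (assuming you build the snippet above as py-tutor):

ollama create py-tutor -f Modelfile
ollama show py-tutor --template     # should match `ollama show llama3.1:8b --template`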

Writing a TEMPLATE for an imported GGUF

The TEMPLATE uses Go templating with these variables:

Variable | Meaning
.System | The system prompt
.Prompt | The user's current message
.Response | The assistant's response (used during streaming)
.Messages | Array of {Role, Content} for multi-turn conversations
.Tools | Array of tools (for tool-calling models)

The Llama 3.1 TEMPLATE looks like this:

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

"""

For Qwen 2.5:

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

For Mistral / Mixtral:

TEMPLATE """[INST] {{ if .System }}{{ .System }}

{{ end }}{{ .Prompt }} [/INST]"""

Where to find the right template: look up the model on HuggingFace, find tokenizer_config.json, copy the chat_template field, and translate Jinja2 to Go templates (the syntax is similar but not identical).
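
As a rough sketch of that translation for a ChatML-style model (real chat_template values are usually longer, with extra branches for system prompts, tools, and generation flags):

Jinja2 in tokenizer_config.json:

{% for message in messages %}<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{% endfor %}<|im_start|>assistant

The same thing as a Modelfile TEMPLATE:

TEMPLATE """{{ range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""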

Tool-call template

For a tool-calling model:

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}{{ if .Tools }}

You have access to the following tools:
{{ range .Tools }}{{ .Function.Name }}: {{ .Function.Description }}
{{ end }}{{ end }}<|eot_id|>{{ end }}{{ range .Messages }}<|start_header_id|>{{ .Role }}<|end_header_id|>

{{ .Content }}{{ if .ToolCalls }}{{ range .ToolCalls }}
{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}{{ end }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

"""

For deeper tool-calling work, our Ollama tool calling guide covers the runtime side.


Importing a GGUF File {#gguf-import}

This is one of the most useful Modelfile features. Got a GGUF you downloaded from HuggingFace, exported from llama.cpp, or quantized yourself? Import it.

# Modelfile
FROM ./qwen2.5-7b-instruct-q4_k_m.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ range .Messages }}<|im_start|>{{ .Role }}
{{ .Content }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
PARAMETER num_ctx 32768
PARAMETER temperature 0.7

SYSTEM "You are a helpful assistant."

Then build and run:

ollama create my-qwen -f Modelfile
ollama run my-qwen

What just happened: Ollama copied the GGUF into its blob store under ~/.ollama/models/blobs, registered the new model, applied the template and parameters. From now on my-qwen works like any other Ollama model — including pulling it from a remote registry if you push.

Critical: the TEMPLATE has to match the chat format the GGUF was trained with. If you skip TEMPLATE on a non-Llama-format model, the output is garbled. Always check the HuggingFace model card for the chat template.

Multimodal GGUFs (LLaVA, MiniCPM-V, BakLLaVA) are supported natively in Ollama 0.5+. The Modelfile is the same shape — just FROM the multimodal GGUF.


Adding LoRA Adapters {#adapters}

If you fine-tuned a model with axolotl, unsloth, or LLaMA-Factory, you can layer the adapter onto the base in a Modelfile:

FROM llama3.1:8b
ADAPTER ./company-style-lora.gguf

PARAMETER temperature 0.4
SYSTEM "You write in the company style."

The adapter must be in GGUF format. Most fine-tuning frameworks export to safetensors or HuggingFace format — convert with llama.cpp/convert_lora_to_gguf.py:

python llama.cpp/convert_lora_to_gguf.py \
  --base meta-llama/Meta-Llama-3.1-8B-Instruct \
  --outfile company-style-lora.gguf \
  ./my-lora-output/

After ollama create, the adapter is baked into the model variant as its own layer: you never reference the adapter file at runtime, and inference cost is essentially the same as the base model.
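
To confirm the adapter layer made it into the artifact (assuming you named the variant acme-style):

ollama create acme-style -f Modelfile
ollama show acme-style --modelfile   # the ADAPTER layer should appear alongside FROM and SYSTEM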

For the full fine-tuning path that ends with this Modelfile step, see our fine-tune local AI for business guide.


Real Recipes for Real Use Cases {#recipes}

Working Modelfiles for common workflows. Copy, modify, ship.

Recipe 1: Coding assistant with house style

FROM qwen2.5-coder:14b

PARAMETER temperature 0.15
PARAMETER top_p 0.9
PARAMETER num_ctx 16384
PARAMETER repeat_penalty 1.05

SYSTEM """
You are a senior engineer at ACME, writing TypeScript and Python for backend services.
- Use type hints (Python) and strict types (TypeScript).
- Follow our style guide: snake_case in Python, camelCase in TS.
- Always include error handling, never bare except.
- Prefer composition over inheritance.
- Write tests when implementing new functionality (pytest / vitest).
- Be concise. No unnecessary preamble or explanations after the code.
"""

Recipe 2: RAG-grounded answerer

FROM llama3.1:8b

PARAMETER temperature 0.1
PARAMETER top_p 0.85
PARAMETER num_ctx 16384
PARAMETER repeat_penalty 1.05

SYSTEM """
You answer questions using ONLY the provided context.

Rules:
1. If the answer is in the context, give it directly with the source filename.
2. If the answer is not in the context, respond exactly: "I don't have that information in the available documentation."
3. Never invent details. Never extrapolate beyond the context.
4. Quote exact phrases from the context when accuracy matters.
"""

Recipe 3: SQL co-pilot for read-only analytics

FROM qwen2.5-coder:7b

PARAMETER temperature 0.05
PARAMETER num_ctx 8192

SYSTEM """
You are a PostgreSQL expert helping analysts query a read-only analytics database.

Schema:
- users(id, email, created_at, plan)
- subscriptions(id, user_id, status, started_at, canceled_at)
- events(id, user_id, name, properties JSONB, created_at)

Rules:
- Output only the SQL query in a fenced code block, then a one-sentence explanation.
- Always use explicit JOINs. Never SELECT *.
- Use CTEs for queries with more than 2 joins.
- Add comments for non-obvious logic.
- NEVER write INSERT, UPDATE, DELETE, or DDL — this is read-only.
"""

Recipe 4: Email triage assistant

FROM llama3.1:8b

PARAMETER temperature 0.3
PARAMETER num_ctx 4096

SYSTEM """
Classify each email into exactly one category: URGENT, ACTION_REQUIRED, FYI, NEWSLETTER, SPAM.
Return JSON: {"category": "...", "summary": "...", "suggested_action": "..."}.
- URGENT: direct request from a customer or boss requiring response within 4 hours.
- ACTION_REQUIRED: needs a response but not urgent.
- FYI: informational, no action needed.
- NEWSLETTER: bulk content from a list.
- SPAM: cold outreach, suspicious, irrelevant.
Never include explanations outside the JSON.
"""

Once it ships, pair this with our local AI email triage guide for the orchestration side.
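
A minimal client sketch for this recipe. The model name email-triage and the example email are placeholders; the format "json" option asks Ollama to constrain output to valid JSON, and the system prompt is already baked in by the Modelfile, so only the email goes in the user message:

import json
import requests  # any HTTP client works; requests is just the shortest to show

EMAIL = "Subject: Invoice overdue\nHi, invoice #4821 is 3 weeks overdue. Can you confirm payment status today?"

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "email-triage",  # whatever name you used with `ollama create`
        "messages": [{"role": "user", "content": EMAIL}],
        "format": "json",         # constrain output to valid JSON
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
triage = json.loads(resp.json()["message"]["content"])
print(triage["category"], "-", triage["summary"])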

Recipe 5: Long-context document summarizer

FROM llama3.1:8b

PARAMETER temperature 0.2
PARAMETER num_ctx 32768
PARAMETER num_predict 1024

SYSTEM """
Summarize long documents with this structure:
1. TL;DR (2 sentences max).
2. Key Points (5-7 bullets, each <= 15 words).
3. Action Items (only if any are present in the source; otherwise omit).
4. Open Questions (only if applicable).

Be specific. Use numbers and proper nouns from the source. Avoid filler phrases.
"""

Recipe 6: Quantized model with explicit GPU offload

For a system where Ollama is misdetecting VRAM:

FROM ./mistral-nemo-instruct-q5_k_m.gguf

TEMPLATE """[INST] {{ if .System }}{{ .System }}

{{ end }}{{ .Prompt }} [/INST]"""

PARAMETER stop "[INST]"
PARAMETER stop "</s>"
PARAMETER num_ctx 8192
PARAMETER num_gpu 999      # offload all layers to GPU
PARAMETER num_thread 8     # CPU threads for prompt processing
PARAMETER f16_kv true      # halve KV cache VRAM

SYSTEM "You are a helpful assistant."

Sharing and Publishing Modelfiles {#publishing}

Push to the public registry

ollama login
ollama create yourname/py-coach -f Modelfile
ollama push yourname/py-coach

Anyone can now ollama pull yourname/py-coach. The base model is reused (not re-uploaded) — only your overrides ship.

Private sharing within a team

Three options:

  1. Commit Modelfile to a git repo. Teammates clone, run ollama create. Simplest, no infra.
  2. Self-hosted registry. Ollama supports OCI-compatible registries. Push to a private Harbor or AWS ECR with proper auth.
  3. Internal HTTP server with the GGUF + Modelfile. Teammates run a script that downloads + creates.

For 90% of teams, option 1 is the answer. The Modelfile is small enough to PR-review like any other code change.
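
A sketch of what option 1 looks like in practice; the script and model names are illustrative:

#!/usr/bin/env bash
# setup-model.sh — teammates run this once after cloning the repo
set -euo pipefail
ollama create acme/coding-assistant -f ./Modelfile
ollama show acme/coding-assistant --parameters   # sanity-check that the overrides landed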

Versioning

Tag your variants:

ollama create acme/coding-assistant:v1.2 -f Modelfile.v1.2
ollama push acme/coding-assistant:v1.2

Treat Modelfiles like Dockerfiles. Pin versions in app code. Never use :latest in production — a base model update can change behavior subtly.


Common Pitfalls {#pitfalls}

1. Forgetting num_ctx. Default is 2048. Most modern models support 8k-128k. Override to match your use case.

2. Wrong TEMPLATE for an imported GGUF. Output looks like noise. Check HuggingFace tokenizer_config.json for the right format.

3. SYSTEM with a single pair of quotes for multi-line prompts. A plain quoted string only captures one line; use triple double-quotes """...""" for anything multi-line.

4. ADAPTER format mismatch. Adapter must be GGUF, not safetensors. Convert with llama.cpp's converter.

5. Stacking conflicting PARAMETER stop sequences. Each stop is OR'd. Too many fragments cause early termination on incidental matches.

6. Triple-quoted string escaping. Inside SYSTEM """...""" you do not need to escape quotes, but you do need to escape backslashes if you want them literal. Backticks are fine.

7. num_gpu 999 on CPU-only systems. Causes a warning and falls back to CPU, but logs are noisy. Set explicitly to 0 if no GPU.

8. Pushing without licensing. If you fine-tuned on data with restrictions, add a LICENSE block. Be honest about base model licenses (Llama community license, etc.).

9. Modelfile checked in without the GGUF. If your FROM is a local file, teammates cannot ollama create without the GGUF too. Either push the result to a registry, or document where to download the GGUF.

10. Re-creating instead of updating. ollama create my-model overwrites. Use a different name during testing (my-model-test) and only overwrite the production name when validated.

For deeper context on parameters and runtime tuning, the official Ollama Modelfile reference is the authoritative source — bookmark it.


Conclusion

Modelfiles are how Ollama becomes more than ollama run llama3.1. Twenty lines of plain text and you have a model variant tuned for your workflow, sharable across your team, version-controllable like any other artifact. The pattern composes: a base model + a system prompt + a few parameter overrides + maybe an adapter, and you have shipped a custom AI behavior.

The pieces I would internalize first: num_ctx (most users undertune it), SYSTEM with concrete persona + format rules, and temperature per use case. Those three knobs cover 80% of the value. After that, importing GGUFs and layering adapters opens up the long tail — fine-tuned coding assistants, brand-voice writers, RAG-grounded answerers — all running locally, all on hardware you control.

Once you have a Modelfile you trust, push it. Make it the team's default. Wire it into your Ollama production deployment so every API caller gets the same baked-in behavior. That is the moment Ollama stops being a tool and starts being part of your platform.


Subscribe to the Local AI Master newsletter for more Modelfile recipes, parameter tuning experiments, and shareable templates from real production stacks.
