LoRA Fine-Tuning Local Guide: Train Custom AI Models

February 4, 2026
18 min read
Local AI Master Research Team

LoRA (Low-Rank Adaptation) is a fine-tuning technique that trains only 0.1-1% of a model's parameters using small adapter matrices while the base model stays frozen, which sharply reduces VRAM requirements. With QLoRA (LoRA on a 4-bit quantized base model), you can fine-tune a 7B model on about 8GB of VRAM, and a 1K-example dataset trains in roughly 15 minutes on an RTX 4090 using Unsloth. LoRA achieves 90%+ of full fine-tuning quality, and once the adapter is merged and converted to GGUF, the result runs in Ollama as a custom model.

LoRA Training Quick Start

3-Step Process:
1. pip install unsloth
2. Prepare dataset (JSONL format)
3. Run training script (about 15 minutes to 2 hours for a 7B model on an RTX 4090, depending on dataset size)

What is LoRA?

LoRA (Low-Rank Adaptation) fine-tunes large models efficiently by:

  • Training small adapter matrices (~0.1-1% of parameters)
  • Keeping the base model frozen
  • Achieving 90%+ of full fine-tuning quality
  • Using a fraction of the memory of full fine-tuning

Full Fine-tuning:  Train all 7B parameters → 28GB+ VRAM
LoRA Fine-tuning:  Train ~50M parameters → 16GB VRAM (8GB with QLoRA)
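
To see where a figure like "~50M parameters" can come from, here is a rough sketch that counts the adapter weights. The layer dimensions are an assumption (Llama-3.1-8B-style shapes, which this guide does not state), and the helper name `lora_param_count` is hypothetical. Each adapted weight of shape (d_in, d_out) gains two small matrices, so it adds r * (d_in + d_out) trainable parameters.

```python
def lora_param_count(shapes, r, num_layers):
    """Count trainable LoRA parameters.

    Each adapted weight of shape (d_in, d_out) gains two matrices,
    A (r x d_in) and B (d_out x r), so it adds r * (d_in + d_out) parameters.
    """
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


# Assumed Llama-3.1-8B-style dimensions (not stated in this guide):
# hidden size 4096, MLP size 14336, key/value width 1024, 32 layers.
SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

trainable = lora_param_count(SHAPES, r=16, num_layers=32)
print(f"{trainable:,} trainable parameters")  # 41,943,040
print(f"{trainable / 8.03e9:.2%} of an ~8B-parameter base")
```

Under these assumptions, rank 16 on all attention and MLP projections comes to about 42M trainable parameters, roughly 0.5% of the base model, which sits inside the 0.1-1% range quoted above.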

VRAM Requirements

Model Size   Full Fine-tune   LoRA    QLoRA
7B           28GB             16GB    8GB
13B          52GB             24GB    12GB
32B          128GB            48GB    24GB
70B          280GB            96GB    48GB

QLoRA makes consumer GPU training practical.
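
The Full Fine-tune column is consistent with 4 bytes per parameter, meaning 32-bit weights. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it counts the weights only, so gradients, optimizer state, and activations would come on top in practice.

```python
def fp32_weight_gb(params_billions: float) -> float:
    """Memory for the weights alone at 4 bytes per parameter, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = 4
    return params_billions * 1e9 * bytes_per_param / 1e9


for size in (7, 13, 32, 70):
    print(f"{size}B -> {fp32_weight_gb(size):.0f}GB")
```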

Setting Up Unsloth (Fastest Method)

Installation

pip install unsloth
pip install --upgrade transformers datasets

Basic Training Script

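from unsloth import FastLanguageModel, is_bfloat16_supported
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load model with 4-bit quantization (QLoRA)
# Check the exact repository name on Hugging Face before running
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Load dataset (instruction/input/output records, one JSON object per line)
dataset = load_dataset("json", data_files="training_data.jsonl")

# SFTTrainer trains on a single text column, so render each record into
# one prompt string (an Alpaca-style template, chosen here for illustration)
# and end it with EOS so the model learns where to stop
def to_text(example):
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += f"### Response:\n{example['output']}"
    return {"text": prompt + tokenizer.eos_token}

dataset = dataset.map(to_text)

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora_output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size = 2 * 4 = 8
    warmup_steps=10,
    max_steps=100,
    learning_rate=2e-4,
    # Use bf16 where the GPU supports it (e.g. RTX 30/40 series), else fp16
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=10,
    save_steps=50,
)

# Train
# Note: recent TRL releases move dataset_text_field and max_seq_length
# into SFTConfig; adjust if your installed version rejects them here
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    dataset_text_field="text",
    args=training_args,
    max_seq_length=2048,
)

trainer.train()

# Save the LoRA adapter and tokenizer
model.save_pretrained("./my_lora")
tokenizer.save_pretrained("./my_lora")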

Preparing Your Dataset

Dataset Format (JSONL)

{"instruction": "Write a poem about AI", "input": "", "output": "Silicon dreams..."}
{"instruction": "Explain quantum computing", "input": "", "output": "Quantum computing uses..."}
{"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"}

Conversation Format

{"conversations": [
  {"role": "user", "content": "What is AI?"},
  {"role": "assistant", "content": "AI is..."}
]}

(Wrapped here for readability; in a JSONL file each record must sit on a single line.)
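
If your data is in the instruction format but you would rather train on conversations, the two map onto each other directly. Below is a small sketch of one possible conversion; the function name and the choice to append the input below the instruction are assumptions, not something the formats require.

```python
import json


def to_conversation(record: dict) -> dict:
    """Turn an instruction/input/output record into a conversations record."""
    user_content = record["instruction"]
    if record.get("input"):
        # Append the optional input below the instruction
        user_content += "\n\n" + record["input"]
    return {
        "conversations": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": record["output"]},
        ]
    }


example = {
    "instruction": "Translate to French",
    "input": "Hello world",
    "output": "Bonjour le monde",
}
# json.dumps writes the record on a single line, as JSONL expects
print(json.dumps(to_conversation(example), ensure_ascii=False))
```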

Dataset Tips

  1. Quality > Quantity: 500 excellent examples > 5000 mediocre
  2. Diversity: Cover all variations of your use case
  3. Format consistency: Same structure throughout
  4. Length variety: Mix short and long responses
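
Format consistency is easy to check before you spend GPU time. Here is a minimal validator sketch for the instruction format shown earlier; the helper name and the specific checks are illustrative assumptions, so extend them to match your own schema.

```python
import json

REQUIRED_KEYS = {"instruction", "input", "output"}


def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) for records that break the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((number, f"missing keys: {sorted(missing)}"))
        elif not str(record["output"]).strip():
            problems.append((number, "empty output"))
    return problems


sample = [
    '{"instruction": "Write a poem about AI", "input": "", "output": "Silicon dreams..."}',
    '{"instruction": "Explain quantum computing", "output": "Quantum computing uses..."}',
    '{"instruction": "Translate to French", "input": "Hello world"',
]
for number, problem in validate_jsonl_lines(sample):
    print(f"line {number}: {problem}")
```

To check a real file, pass an open file handle instead of the sample list, for example `validate_jsonl_lines(open("training_data.jsonl", encoding="utf-8"))`.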

LoRA Hyperparameters

Parameter        Recommended           Description
r (rank)         8-32                  Higher = more capacity, more VRAM
alpha            16-32                 Scaling factor, commonly 1x-2x rank (the script above uses alpha = r = 16)
dropout          0-0.1                 Regularization
target_modules   All attention + MLP   Which layers to adapt
lr               1e-4 to 2e-4          Learning rate
epochs           1-3                   Often 1 is enough
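
Two of these settings are easier to reason about as numbers. In standard LoRA the adapter update is multiplied by alpha / r, and the number of optimizer steps in one epoch depends on the effective batch size. The sketch below uses hypothetical helper names and assumes a single GPU; the step count is an approximation of how a trainer schedules batches.

```python
import math


def lora_scale(alpha: int, r: int) -> float:
    """Factor applied to the adapter update in standard LoRA: alpha / r."""
    return alpha / r


def steps_per_epoch(num_examples: int, per_device_batch: int, grad_accum: int) -> int:
    """Optimizer steps needed to see every example once on a single GPU."""
    effective_batch = per_device_batch * grad_accum
    return math.ceil(num_examples / effective_batch)


print(lora_scale(alpha=16, r=16))  # 1.0
print(lora_scale(alpha=32, r=16))  # 2.0
print(steps_per_epoch(1000, per_device_batch=2, grad_accum=4))  # 125
```

With the training script's batch size of 2 and 4 accumulation steps, max_steps=100 covers 800 examples, which is slightly less than one epoch of a 1K-example dataset.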

Merging and Deploying

Merge LoRA with Base Model

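# Merge LoRA adapters into the base model and save in 16-bit.
# The base was loaded in 4-bit, so use Unsloth's merged export rather than
# a plain merge_and_unload(): the GGUF converter needs 16-bit weights.
# This also saves the tokenizer alongside the model.
model.save_pretrained_merged(
    "./merged_model",
    tokenizer,
    save_method="merged_16bit",
)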

Convert to GGUF for Ollama

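# Install llama.cpp (current versions build with CMake)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Install the Python dependencies for the conversion script
pip install -r requirements.txt

# Convert to GGUF
python convert_hf_to_gguf.py ../merged_model --outtype f16 --outfile model.gguf

# Quantize
./build/bin/llama-quantize model.gguf model-q4.gguf Q4_K_M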
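
Before handing the file to Ollama, a quick sanity check can confirm the conversion produced a real GGUF file. This sketch reads the fixed-size header and assumes the little-endian layout used by GGUF version 2 and later; the function name is hypothetical.

```python
import struct


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header: magic, version, tensor and metadata counts."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return {"version": version, "tensors": tensor_count, "metadata_keys": kv_count}


# Example, once the quantized file exists:
# print(read_gguf_header("model-q4.gguf"))
```

A tensor count of zero or a ValueError here usually means the conversion step failed or wrote to a different path.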

Create Ollama Model

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./model-q4.gguf
SYSTEM "You are a helpful assistant fine-tuned for specific tasks."
EOF

# Register with Ollama
ollama create my-custom-model -f Modelfile

# Run your model
ollama run my-custom-model
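
Once the model is registered, you can also call it from code through Ollama's local REST API. Below is a minimal sketch using only the standard library. It assumes Ollama is running on its default port (11434), and the helper names `build_request` and `ask` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "my-custom-model") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "my-custom-model") -> str:
    """Send the prompt and return the generated text (requires Ollama to be running)."""
    with urllib.request.urlopen(build_request(prompt, model)) as response:
        return json.loads(response.read())["response"]


# Example (needs a running Ollama server):
# print(ask("Summarize what LoRA does in one sentence."))
```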

Training Time Estimates

Model   Dataset        GPU        Time
7B      1K examples    RTX 4090   15 min
7B      10K examples   RTX 4090   2 hours
14B     1K examples    RTX 4090   30 min
32B     1K examples    RTX 4090   1 hour

Unsloth is ~2x faster than standard training.

Common Use Cases

1. Custom Writing Style

Train on your existing content to match your voice.

2. Domain Expert

Train on medical, legal, or technical documents.

3. Company Knowledge

Train on internal docs to power a support chatbot.

4. Language Adaptation

Improve performance in specific languages.

5. Task Specialist

Optimize for specific tasks (summarization, extraction).

Troubleshooting

Out of Memory

  • Reduce batch size to 1
  • Lower LoRA rank (r=8)
  • Use gradient checkpointing
  • Reduce max_seq_length
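
Dropping the batch size to 1 changes how many examples feed each optimizer step unless you raise gradient accumulation to compensate. Here is a small sketch of that trade; the helper name is hypothetical, and the returned keys match the TrainingArguments names used in the training script.

```python
def batch_one_settings(per_device_batch: int, grad_accum: int) -> dict:
    """Settings with batch size 1 that keep the same effective batch size."""
    return {
        "per_device_train_batch_size": 1,
        # Accumulate more micro-batches so batch * accumulation is unchanged
        "gradient_accumulation_steps": per_device_batch * grad_accum,
    }


print(batch_one_settings(per_device_batch=2, grad_accum=4))
```

For the script's settings (batch size 2, accumulation 4) this gives batch size 1 with 8 accumulation steps, so each optimizer step still sees 8 examples while peak activation memory drops.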

Poor Results

  • More/better training data
  • Lower learning rate
  • Train longer (more steps)
  • Check data formatting

Overfitting

  • Fewer epochs
  • Add dropout (0.05-0.1)
  • More diverse data
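
Overfitting is easiest to spot when you hold some examples back and compare training loss against loss on those unseen examples. The guide does not prescribe a way to do that, so treat the following as one possible sketch: the function name, the 10% fraction, and the seed are all assumptions.

```python
import random


def split_records(records, val_fraction=0.1, seed=42):
    """Shuffle a copy of the records and split off a validation set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    val_size = max(1, int(len(shuffled) * val_fraction))
    return shuffled[val_size:], shuffled[:val_size]


train, val = split_records(range(1000))
print(len(train), len(val))  # 900 100
```

If validation loss starts rising while training loss keeps falling, that is the usual signal to stop earlier or apply the fixes listed above.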

Key Takeaways

  1. LoRA trains ~0.1-1% of parameters for 90%+ of full fine-tuning quality
  2. QLoRA enables training on consumer GPUs
  3. Unsloth trains roughly 2x faster than a standard training setup
  4. Quality data matters more than quantity
  5. Deploy to Ollama for easy local use
  6. 8GB VRAM can train 7B models with QLoRA

Next Steps

  1. Set up Ollama to run your model
  2. Compare models to choose a base
  3. Build RAG with your fine-tuned model

LoRA democratizes AI customization: you can now create specialized AI models on consumer hardware that rival the results of expensive cloud fine-tuning.

Local AI Master Research Team

Creator of Local AI Master. I've built datasets with over 77,000 examples and trained AI models from scratch. Now I help people achieve AI independence through local AI mastery.

Published: February 4, 2026 · Last Updated: February 4, 2026 · Manually Reviewed

Written by Pattanaik Ramswarup

Creator of Local AI Master

I build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
