Training

LoRA Fine-Tuning Local Guide: Train Custom AI Models

February 4, 2026
18 min read
Local AI Master Research Team

Want to go deeper than this article?

The AI Learning Path covers this topic and more, with hands-on chapters across 10 courses.

LoRA (Low-Rank Adaptation) is a fine-tuning technique that trains only 0.1-1% of a model's parameters using small adapter matrices, reducing VRAM requirements by 90%. With QLoRA, you can fine-tune a 7B model on just 8GB VRAM in under 30 minutes using Unsloth. LoRA achieves 90%+ of full fine-tuning quality while keeping the base model frozen, and the resulting adapter deploys directly to Ollama as a custom model.

LoRA Training Quick Start

3-Step Process:
1. pip install unsloth
2. Prepare dataset (JSONL format)
3. Run training script (minutes to a few hours on an RTX 4090, depending on model and dataset size)

What is LoRA?

LoRA (Low-Rank Adaptation) fine-tunes large models efficiently by:

  • Training small adapter matrices (~0.1-1% of parameters)
  • Keeping base model frozen
  • Achieving 90%+ of full fine-tuning quality
  • Using 10x less memory
Full Fine-tuning:  Train all 7B parameters → 28GB+ VRAM
LoRA Fine-tuning:  Train ~50M parameters → 8GB VRAM
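To see where the "~50M parameters" figure comes from, here is a back-of-envelope count. The dimensions below are the standard Llama-7B shapes (hidden size 4096, MLP intermediate 11008, 32 layers); with rank 16 the count lands around 40M, roughly 0.6% of the 7B base:

```python
# Rough trainable-parameter count for LoRA on a Llama-7B-style model.
# Each adapted linear layer of shape (d_in, d_out) gains r * (d_in + d_out)
# trainable parameters: the two low-rank factors A and B.

def lora_param_count(r, hidden=4096, intermediate=11008, layers=32):
    attn = 4 * r * (hidden + hidden)        # q, k, v, o projections
    mlp = 2 * r * (hidden + intermediate)   # gate, up projections
    mlp += r * (intermediate + hidden)      # down projection
    return layers * (attn + mlp)

total = lora_param_count(r=16)
print(f"{total / 1e6:.1f}M trainable params")  # ~40M, vs 7B frozen base weights
```

Doubling the rank doubles the adapter size, which is why rank is the main capacity/VRAM knob.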

VRAM Requirements

Model Size   Full Fine-tune   LoRA   QLoRA
7B           28GB             16GB   8GB
13B          52GB             24GB   12GB
32B          128GB            48GB   24GB
70B          280GB            96GB   48GB

QLoRA makes consumer GPU training practical.
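As a sanity check, a rough memory breakdown shows why the 8GB QLoRA figure for 7B is plausible. The byte-per-parameter costs below are standard but approximate, and the activation term is an assumption that grows with batch size and sequence length:

```python
# Back-of-envelope QLoRA VRAM estimate. Treat this as a sanity check,
# not a guarantee: activations vary with sequence length and batch size.

def qlora_vram_gb(base_params, lora_params=40e6, activation_gb=3.0):
    weights = base_params * 0.5 / 1e9   # 4-bit base weights ~ 0.5 bytes/param
    adapters = lora_params * 2 / 1e9    # fp16 adapter weights
    grads = lora_params * 2 / 1e9       # fp16 gradients (adapters only)
    optim = lora_params * 8 / 1e9       # Adam moment estimates in fp32
    return weights + adapters + grads + optim + activation_gb

print(f"{qlora_vram_gb(7e9):.1f} GB")  # ~7 GB: fits an 8GB card
```

Notice that the adapter-related terms are tiny; the 4-bit base weights and the activations dominate, which is why quantizing the frozen base is the big win.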

Setting Up Unsloth (Fastest Method)

Installation

pip install unsloth
pip install --upgrade transformers datasets trl

Basic Training Script

from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Load dataset (SFTTrainer expects each record formatted into a single
# "text" column -- see "Preparing Your Dataset" below)
dataset = load_dataset("json", data_files="training_data.jsonl")

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora_output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    max_steps=100,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_steps=50,
)

# Train
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    dataset_text_field="text",  # column holding the formatted prompt string
    args=training_args,
    max_seq_length=2048,
)

trainer.train()

# Save LoRA
model.save_pretrained("./my_lora")

Preparing Your Dataset

Dataset Format (JSONL)

{"instruction": "Write a poem about AI", "input": "", "output": "Silicon dreams..."}
{"instruction": "Explain quantum computing", "input": "", "output": "Quantum computing uses..."}
{"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"}
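Trainers such as TRL's SFTTrainer typically consume a single formatted text column (or a formatting function) rather than raw instruction/output fields. A minimal sketch using the common Alpaca-style template (a convention assumed here, not something Unsloth mandates; just keep it identical at training and inference time):

```python
import json

# Collapse instruction/input/output records into one prompt string.

def to_text(rec):
    if rec["input"]:
        return (f"### Instruction:\n{rec['instruction']}\n\n"
                f"### Input:\n{rec['input']}\n\n"
                f"### Response:\n{rec['output']}")
    return (f"### Instruction:\n{rec['instruction']}\n\n"
            f"### Response:\n{rec['output']}")

line = ('{"instruction": "Translate to French", '
        '"input": "Hello world", "output": "Bonjour le monde"}')
print(to_text(json.loads(line)))
```

Map this over your dataset to produce the "text" column before handing it to the trainer.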

Conversation Format

{"conversations": [
  {"role": "user", "content": "What is AI?"},
  {"role": "assistant", "content": "AI is..."}
]}

Dataset Tips

  1. Quality > Quantity: 500 excellent examples > 5000 mediocre
  2. Diversity: Cover all variations of your use case
  3. Format consistency: Same structure throughout
  4. Length variety: Mix short and long responses
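A quick validation pass catches formatting problems before you spend GPU time. This sketch checks the tips above mechanically (the required keys are an assumption; adjust to your schema) and reports row counts, malformed lines, and response-length spread:

```python
import json
from statistics import mean

# Sanity-check a JSONL dataset: consistent keys on every row,
# no malformed lines, and some variety in response length.

def check_dataset(path):
    lengths, bad = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                bad += 1
                continue
            if not {"instruction", "output"} <= rec.keys():
                bad += 1
                continue
            lengths.append(len(rec["output"]))
    return {"rows": len(lengths), "malformed": bad,
            "avg_len": mean(lengths) if lengths else 0,
            "min_len": min(lengths, default=0),
            "max_len": max(lengths, default=0)}
```

If min_len and max_len are nearly identical, revisit tip 4; if malformed is nonzero, fix the data before training.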

LoRA Hyperparameters

Parameter        Recommended           Description
r (rank)         8-32                  Higher = more capacity, more VRAM
alpha            16-32                 Scaling factor, usually 2x rank
dropout          0-0.1                 Regularization
target_modules   All attention + MLP   What to adapt
lr               1e-4 to 2e-4          Learning rate
epochs           1-3                   Often 1 is enough
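The alpha/r relationship is worth internalizing: the adapter's contribution is scaled by alpha/r before it is added to the frozen weight (W' = W + (alpha/r) * B @ A), so keeping alpha at roughly 2x the rank holds the effective scale steady as you sweep r:

```python
# Effective LoRA scale for a few (rank, alpha) pairs.

configs = [(8, 16), (16, 32), (32, 64), (16, 16)]
scales = [alpha / r for r, alpha in configs]
for (r, alpha), s in zip(configs, scales):
    print(f"r={r:<3} alpha={alpha:<3} scale={s}")
# the first three keep scale=2.0; the last (alpha == r) drops it to 1.0
```

This is why the table recommends alpha around 2x rank: changing r alone then changes capacity without also silently changing the update magnitude.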

Merging and Deploying

Merge LoRA with Base Model

# Merge LoRA adapters into the base model. With Unsloth, one call
# saves merged fp16 weights plus the tokenizer:
model.save_pretrained_merged("./merged_model", tokenizer, save_method="merged_16bit")

# (With a plain PEFT model, the equivalent is model.merge_and_unload(),
# then save_pretrained on both the merged model and the tokenizer.)

Convert to GGUF for Ollama

# Install and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build && cmake --build build --config Release

# Convert to GGUF (script name as of recent llama.cpp; older
# checkouts shipped convert.py instead)
python convert_hf_to_gguf.py ../merged_model --outtype f16 --outfile model.gguf

# Quantize
./build/bin/llama-quantize model.gguf model-q4.gguf q4_k_m

Create Ollama Model

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./model-q4.gguf
SYSTEM "You are a helpful assistant fine-tuned for specific tasks."
EOF

# Register with Ollama
ollama create my-custom-model -f Modelfile

# Run your model
ollama run my-custom-model

Training Time Estimates

Model   Dataset        GPU        Time
7B      1K examples    RTX 4090   15 min
7B      10K examples   RTX 4090   2 hours
14B     1K examples    RTX 4090   30 min
32B     1K examples    RTX 4090   1 hour

Unsloth is ~2x faster than standard training.
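These timings can be sanity-checked from the step count: steps = examples x epochs / effective batch size. The seconds-per-step default below is an assumption calibrated to the 4090 rows above, not a measured constant:

```python
# Estimate wall-clock training time from dataset size and batch settings.
# sec_per_step=7.0 is an assumed figure for a 7B QLoRA run on an RTX 4090.

def train_time_min(examples, epochs=1, batch=2, grad_accum=4, sec_per_step=7.0):
    steps = examples * epochs // (batch * grad_accum)  # effective batch = batch * grad_accum
    return steps * sec_per_step / 60

print(f"{train_time_min(1000):.0f} min for 1K examples")  # ~15 min, matching the table
```

Time scales linearly with examples and epochs, so 10K examples lands around ten times the 1K figure on the same hardware.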

Common Use Cases

1. Custom Writing Style

Train on your existing content to match your voice.

2. Domain Expert

Train on medical, legal, or technical documents.

3. Company Knowledge

Train on internal docs for support chatbot.

4. Language Adaptation

Improve performance in specific languages.

5. Task Specialist

Optimize for specific tasks (summarization, extraction).

Troubleshooting

Out of Memory

  • Reduce batch size to 1
  • Lower LoRA rank (r=8)
  • Use gradient checkpointing
  • Reduce max_seq_length
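Most of these fixes are setting changes. The dict below is a hypothetical summary of the overrides (labels only: max_seq_length and r belong to the trainer and get_peft_model calls, the rest to TrainingArguments); note that halving the batch while doubling accumulation preserves the effective batch size, so learning dynamics stay comparable:

```python
# Hypothetical low-memory overrides for the earlier training script.

low_mem = {
    "per_device_train_batch_size": 1,  # was 2
    "gradient_accumulation_steps": 8,  # was 4: keeps effective batch at 8
    "gradient_checkpointing": True,    # recompute activations to save VRAM
    "max_seq_length": 1024,            # was 2048: activation memory scales with length
    "r": 8,                            # was 16: smaller adapters
}

effective_batch = (low_mem["per_device_train_batch_size"]
                   * low_mem["gradient_accumulation_steps"])
print(effective_batch)  # 8, same as 2 x 4 in the original script
```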

Poor Results

  • More/better training data
  • Lower learning rate
  • Train longer (more steps)
  • Check data formatting

Overfitting

  • Fewer epochs
  • Add dropout (0.05-0.1)
  • More diverse data

Key Takeaways

  1. LoRA trains 1% of parameters for 90% of results
  2. QLoRA enables training on consumer GPUs
  3. Unsloth is the fastest training framework
  4. Quality data matters more than quantity
  5. Deploy to Ollama for easy local use
  6. 8GB VRAM can train 7B models with QLoRA

Next Steps

  1. Set up Ollama to run your model
  2. Compare models to choose a base
  3. Build RAG with your fine-tuned model

LoRA democratizes AI customization—you can now create specialized AI models on consumer hardware that rival expensive cloud fine-tuning.





Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset

I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.
