LoRA Fine-Tuning Local Guide: Train Custom AI Models

February 4, 2026
18 min read
Local AI Master Research Team

LoRA Training Quick Start

3-Step Process:
1. pip install unsloth
2. Prepare dataset (JSONL format)
3. Run training script (~1 hour on RTX 4090)

What is LoRA?

LoRA (Low-Rank Adaptation) fine-tunes large models efficiently by:

  • Training small adapter matrices (~0.1-1% of parameters)
  • Keeping base model frozen
  • Achieving 90%+ of full fine-tuning quality
  • Using 10x less memory
Full Fine-tuning:  Train all 7B parameters → 28GB+ VRAM
LoRA Fine-tuning:  Train ~50M parameters → 8GB VRAM
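The "~50M parameters" figure can be sanity-checked with quick arithmetic. A minimal sketch, assuming Llama-2-7B dimensions (hidden size 4096, MLP intermediate size 11008, 32 layers) and rank-16 adapters on all seven projection modules listed later in the training script:

```python
# Rough LoRA parameter-count estimate for a Llama-style 7B model.
# Dimensions are assumptions based on Llama-2-7B; adjust for your base model.
hidden, inter, layers, r = 4096, 11008, 32, 16

# Each adapted weight W (d_out x d_in) gains two small matrices,
# A (r x d_in) and B (d_out x r) -> r * (d_in + d_out) extra parameters.
def lora_params(d_in, d_out, rank=r):
    return rank * (d_in + d_out)

per_layer = (
    4 * lora_params(hidden, hidden)   # q/k/v/o projections
    + 2 * lora_params(hidden, inter)  # gate/up projections
    + lora_params(inter, hidden)      # down projection
)
total = per_layer * layers
print(f"Trainable LoRA params: {total / 1e6:.1f}M "
      f"({100 * total / 7e9:.2f}% of 7B)")
# → Trainable LoRA params: 40.0M (0.57% of 7B)
```

At rank 16 this lands at roughly 40M trainable parameters, about half a percent of the base model, which is why optimizer state and gradients fit in so little extra VRAM.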

VRAM Requirements

Model Size   Full Fine-tune   LoRA    QLoRA
7B           28GB             16GB    8GB
13B          52GB             24GB    12GB
32B          128GB            48GB    24GB
70B          280GB            96GB    48GB

QLoRA (LoRA trained on top of a 4-bit quantized base model) is what makes consumer-GPU training practical.

Setting Up Unsloth (Fastest Method)

Installation

pip install unsloth
pip install --upgrade transformers datasets

Basic Training Script

from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load model with 4-bit quantization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Load dataset
dataset = load_dataset("json", data_files="training_data.jsonl")

# SFTTrainer needs a single text column, so map the
# instruction/input/output records into one formatted prompt
def to_text(example):
    prompt = f"### Instruction:\n{example['instruction']}\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n"
    prompt += f"### Response:\n{example['output']}{tokenizer.eos_token}"
    return {"text": prompt}

dataset = dataset.map(to_text)

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora_output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    max_steps=100,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_steps=50,
)

# Train (in newer TRL versions, dataset_text_field and max_seq_length
# move into SFTConfig, and tokenizer is passed as processing_class)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    args=training_args,
    dataset_text_field="text",
    max_seq_length=2048,
)

trainer.train()

# Save LoRA
model.save_pretrained("./my_lora")

Preparing Your Dataset

Dataset Format (JSONL)

{"instruction": "Write a poem about AI", "input": "", "output": "Silicon dreams..."}
{"instruction": "Explain quantum computing", "input": "", "output": "Quantum computing uses..."}
{"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"}
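One malformed line will abort training partway through, so it's worth validating the file first. A minimal stdlib sketch (the filename and required keys match the examples above; adapt them to your own schema):

```python
import json

# Keys every instruction-format record is expected to carry
REQUIRED = {"instruction", "input", "output"}

def validate_jsonl(path):
    """Return a list of human-readable problems found in a JSONL file."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            if not line.strip():
                continue  # allow blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"line {i}: invalid JSON ({e})")
                continue
            missing = REQUIRED - record.keys()
            if missing:
                errors.append(f"line {i}: missing keys {sorted(missing)}")
    return errors

# Example usage:
# for err in validate_jsonl("training_data.jsonl"):
#     print(err)
```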

Conversation Format

{"conversations": [
  {"role": "user", "content": "What is AI?"},
  {"role": "assistant", "content": "AI is..."}
]}
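Conversation records ultimately get flattened into one training string. In practice you would use `tokenizer.apply_chat_template` from transformers so the markup matches your base model; the hand-rolled template below is only an illustration of what that flattening produces:

```python
# Illustrative only: flatten a conversation record into a single string.
# The <|role|> markers here are made up; real training should use the
# base model's own chat template via tokenizer.apply_chat_template.
def render(conversations):
    parts = []
    for turn in conversations:
        parts.append(f"<|{turn['role']}|>\n{turn['content']}")
    return "\n".join(parts) + "\n<|end|>"

example = [
    {"role": "user", "content": "What is AI?"},
    {"role": "assistant", "content": "AI is..."},
]
print(render(example))
```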

Dataset Tips

  1. Quality > Quantity: 500 excellent examples > 5000 mediocre
  2. Diversity: Cover all variations of your use case
  3. Format consistency: Same structure throughout
  4. Length variety: Mix short and long responses

LoRA Hyperparameters

Parameter        Recommended           Description
r (rank)         8-32                  Higher = more capacity, more VRAM
alpha            16-32                 Scaling factor, usually 2x rank
dropout          0-0.1                 Regularization
target_modules   All attention + MLP   Which weights to adapt
lr               1e-4 to 2e-4          Learning rate
epochs           1-3                   Often 1 is enough
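The "alpha = 2x rank" rule of thumb follows from how LoRA applies its update: the adapter output is scaled by alpha / r, so doubling alpha along with rank keeps the effective scale constant. A quick illustration:

```python
# LoRA scales its update by alpha / r, so pairing alpha = 2 * r keeps
# the effective scale at 2.0 no matter which rank you pick.
def lora_scale(alpha, r):
    return alpha / r

for r, alpha in [(8, 16), (16, 32), (32, 64)]:
    print(f"r={r:<3} alpha={alpha:<3} effective scale={lora_scale(alpha, r)}")
```

Note that the training script earlier uses r=16 with alpha=16, i.e. an effective scale of 1.0; both conventions are common, the important thing is to keep the scale consistent when you change rank.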

Merging and Deploying

Merge LoRA with Base Model

# Merge LoRA adapters into the base model
# (with a 4-bit Unsloth model, recent versions also provide
#  model.save_pretrained_merged(...), which handles dequantization;
#  merge_and_unload() applies to a PEFT model loaded in 16-bit)
merged_model = model.merge_and_unload()

# Save merged model
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")

Convert to GGUF for Ollama

# Install llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Convert to GGUF (recent llama.cpp; older releases shipped convert.py)
python convert_hf_to_gguf.py ../merged_model --outtype f16 --outfile model.gguf

# Quantize (the binary was named ./quantize in older releases)
./llama-quantize model.gguf model-q4.gguf q4_k_m

Create Ollama Model

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./model-q4.gguf
SYSTEM "You are a helpful assistant fine-tuned for specific tasks."
EOF

# Register with Ollama
ollama create my-custom-model -f Modelfile

# Run your model
ollama run my-custom-model

Training Time Estimates

Model   Dataset        GPU        Time
7B      1K examples    RTX 4090   15 min
7B      10K examples   RTX 4090   2 hours
14B     1K examples    RTX 4090   30 min
32B     1K examples    RTX 4090   1 hour

Unsloth is ~2x faster than standard training.

Common Use Cases

1. Custom Writing Style

Train on your existing content to match your voice.

2. Domain Expert

Train on medical, legal, or technical documents.

3. Company Knowledge

Train on internal docs for support chatbot.

4. Language Adaptation

Improve performance in specific languages.

5. Task Specialist

Optimize for specific tasks (summarization, extraction).

Troubleshooting

Out of Memory

  • Reduce batch size to 1
  • Lower LoRA rank (r=8)
  • Use gradient checkpointing
  • Reduce max_seq_length
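Reducing the batch size doesn't have to change your training dynamics: raise gradient_accumulation_steps to compensate, and the effective batch size stays the same while per-step memory drops. A quick sketch of the trade-off (the target of 8 matches the 2 x 4 configuration in the training script above):

```python
# Effective batch = per-device batch x gradient accumulation steps.
# Shrinking the per-device batch while raising accumulation keeps the
# effective batch (and thus the learning-rate behavior) unchanged.
def effective_batch(per_device, accum):
    return per_device * accum

target = 8
for per_device in (8, 4, 2, 1):
    accum = target // per_device
    print(f"batch={per_device} x accum={accum} -> "
          f"effective {effective_batch(per_device, accum)}")
```

The cost is wall-clock time (more forward/backward passes per optimizer step), not model quality.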

Poor Results

  • More/better training data
  • Lower learning rate
  • Train longer (more steps)
  • Check data formatting

Overfitting

  • Fewer epochs
  • Add dropout (0.05-0.1)
  • More diverse data

Key Takeaways

  1. LoRA trains 1% of parameters for 90% of results
  2. QLoRA enables training on consumer GPUs
  3. Unsloth is the fastest training framework
  4. Quality data matters more than quantity
  5. Deploy to Ollama for easy local use
  6. 8GB VRAM can train 7B models with QLoRA

Next Steps

  1. Set up Ollama to run your model
  2. Compare models to choose a base
  3. Build RAG with your fine-tuned model

LoRA democratizes AI customization: you can now create specialized AI models on consumer hardware that rival expensive cloud fine-tuning.

Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset