
Fine-Tune an LLM on Your Own Data

A LoRA fine-tune of Llama 3.1 8B on your own data, trained on a single consumer GPU and runnable in Ollama — start to finish in one afternoon.

3-4 hours (mostly waiting for training) · 🛠 NVIDIA GPU with 12GB+ VRAM (RTX 3060 12GB and up), Python 3.10+, ~50 training examples.
  1. Prepare your dataset

    You need 50-1000 examples in `{instruction, input, output}` JSONL format, one JSON object per line. Quality beats quantity: 100 great examples beat 10,000 mediocre ones. Good sources: support tickets paired with ideal replies, code snippets paired with reviews, raw prompts paired with style-corrected outputs. Save as `train.jsonl`.
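A minimal sketch of this step: write a couple of records and validate the schema before you spend GPU time on training. The record contents here are invented examples; only the three field names come from the format above.

```python
import json
from pathlib import Path

# Hypothetical records in the {instruction, input, output} shape.
examples = [
    {
        "instruction": "Reply to this support ticket in our house style.",
        "input": "My invoice from March is missing.",
        "output": "Thanks for flagging this! I've re-sent your March invoice.",
    },
    {
        "instruction": "Review this Python snippet.",
        "input": "def add(a, b): return a+b",
        "output": "Looks fine; consider type hints and spaces around '+'.",
    },
]

# Write one JSON object per line, as JSONL requires.
path = Path("train.jsonl")
with path.open("w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Validate: every line parses and has exactly the three expected keys.
records = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
assert all(set(r) == {"instruction", "input", "output"} for r in records)
print(f"{len(records)} valid examples")
```

Running the same validation loop over your real `train.jsonl` catches malformed lines before they surface as a cryptic trainer error mid-run.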

  2. Install Unsloth

    `pip install unsloth` plus `pip install transformers datasets peft accelerate bitsandbytes`. Unsloth is a drop-in replacement for Hugging Face Transformers that makes training roughly 4x faster and uses about 60% less memory, so a Llama 3.1 8B LoRA fits in 12GB of VRAM where vanilla HF needs 24GB+.

  3. Train the LoRA

    Load the base model with `FastLanguageModel.from_pretrained("unsloth/llama-3.1-8b-instruct-bnb-4bit", load_in_4bit=True)`. Wrap it with LoRA adapters via `FastLanguageModel.get_peft_model(...)`; target modules `["q_proj", "v_proj"]` are a sensible starting point. Train with `SFTTrainer` on `train.jsonl` using a batch size of 4, a learning rate of 2e-4, and 3 epochs. On a 3060 with 200 examples this takes about 30 minutes.
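One detail the step glosses over: `SFTTrainer` trains on a single text string per example, so each `{instruction, input, output}` record has to be rendered into one string first. A sketch of one way to do that, assuming Llama 3.1's chat markup (the `format_example` helper and the sample record are illustrative, not part of any library):

```python
# Collapse one {instruction, input, output} record into a single training
# string. The header/eot tags follow Llama 3.1's chat markup; if your
# tokenizer's chat template differs, mirror that template instead.
def format_example(ex: dict) -> str:
    # Fold the optional `input` field into the user turn when present.
    user = ex["instruction"] if not ex["input"] else f'{ex["instruction"]}\n\n{ex["input"]}'
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f'{ex["output"]}<|eot_id|>'
    )

record = {
    "instruction": "Summarize:",
    "input": "LoRA trains small adapter matrices.",
    "output": "LoRA = small trainable adapters.",
}
text = format_example(record)
```

You can map a function like this over the dataset to produce a `"text"` column for the trainer, or pass it to `SFTTrainer` via its `formatting_func` argument; either way, getting this template wrong is a common cause of fine-tunes that train fine but chat badly.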

  4. Convert to GGUF

    After training, merge the LoRA weights back into the base model: `model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")`. Convert to GGUF with `python llama.cpp/convert_hf_to_gguf.py merged_model --outfile my-model-f16.gguf --outtype f16` (the converter only emits full- or 8-bit-precision files, not q4_K_M), then quantize with llama.cpp's `llama-quantize my-model-f16.gguf my-model.gguf Q4_K_M`. You now have a quantized GGUF file ready for any local runtime.

  5. Run it in Ollama

    Create a `Modelfile`: `FROM ./my-model.gguf` + `TEMPLATE` matching Llama 3.1's chat format + your system prompt. Then `ollama create my-finetune -f Modelfile`. Run with `ollama run my-finetune` — your LLM, fine-tuned on your data, running locally, no cloud bill.
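A possible `Modelfile`, following the shape of Ollama's stock Llama 3 template. The `TEMPLATE` must match the chat format the adapter was trained on (the one from the training step, if you used it), and the `SYSTEM` line here is a placeholder to replace with your own prompt:

```
FROM ./my-model.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

SYSTEM """You answer in our support team's house style."""

PARAMETER stop <|eot_id|>
```

The `stop` parameter keeps generation from running past the end-of-turn token if the quantized model occasionally fails to halt on its own.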

  6. When fine-tuning is the wrong tool

    Fine-tuning is for changing the *style* or *format* the model uses. For knowledge ("answer questions about my docs"), RAG almost always beats fine-tuning and costs nothing to update. The What is AI fine-tuning chapter covers when to choose which (and the Fine-Tuning + Distillation course goes deeper, currently in early access for Lifetime members).

Want the full path?

Continue with the full What is AI · Fine-tuning chapter course on Local AI Master.

This page is one chapter of a structured course covering everything from foundations to production. Try Pro free for 7 days — full access to all 264 chapters across 10 courses, no charge until day 8, cancel anytime.
