Starter Kit · One-Time Purchase
Fine-Tuning Starter Kit
LoRA fine-tune any model on your data
Worth $49 — you pay $19 today.
Buy now — $19Instant delivery — download from your in-app library right after checkout. Linked to your account; sign in any time to re-download. All sales final.
Overview
The Fine-Tuning Starter Kit is a complete, working LoRA/QLoRA pipeline that takes you from raw client data to a custom model running in Ollama, all on your own (or your client's) hardware. It ships with verified configs for current open base models (Qwen3-8B, Llama 3.1-8B, Gemma 3-4B), sample instruction and multi-turn datasets, a data-prep script, an Unsloth-based trainer with a built-in eval step, a side-by-side base-vs-fine-tuned evaluator, and a one-command GGUF export that writes the Ollama Modelfile for you. `make all` runs data → train → GGUF end to end. Nothing leaves your machine, and the dependency pins sit inside Unsloth's tested window so the trainer actually runs instead of erroring on a version mismatch.
The real value is who this lets you sell to. Plenty of businesses are legally or contractually barred from sending data to OpenAI/Anthropic/Google — law firms, clinics, accounting firms, manufacturers, defense subcontractors, GDPR-bound EU companies, and any org that already banned ChatGPT internally. A private model fine-tuned on their data and run on their hardware is often the only compliant option they have. Most freelancers can't deliver that because they don't have the pipeline. This kit is that pipeline plus an included income playbook (MONETIZE.md) with pricing, a ready-to-send statement of work, and a no-cold-outreach client-finding plan.
Honest framing: this removes the build barrier, not the work. You still find the client, clean their data, run training on real GPU hardware, and stand behind the result. There is no passive income here — there is a real, billable service ($1,500–$4,000 pilots, $5,000–$15,000 production builds, $500–$2,500/mo retainers) that you can now actually deliver.
What's included
- Three verified, ready-to-run training configs pinned to base models that exist on Hugging Face today: qwen3-8b.yaml (default, ~16GB VRAM), llama-3.1-8b.yaml (~16GB), gemma-3-4b.yaml (laptop/small GPU, ~6-8GB)
- prepare_data.py — normalizes JSON, CSV, and JSONL into chat-template JSONL with an automatic train/val split; accepts both instruction format and multi-turn messages[] chat format
- train_lora.py — LoRA/QLoRA fine-tuning powered by Unsloth (2-5x faster, 4-bit QLoRA support) with an automatic eval step when a val file is present
- evaluate.py — generates held-out answers from base vs fine-tuned model side by side, so you have readable proof the model actually improved (this is your client deliverable)
- convert_to_gguf.py — one-command GGUF export at a chosen quant (e.g. Q4_K_M) that auto-detects the chat template and writes a ready Ollama Modelfile (--create runs `ollama create` for you)
- Makefile with one-command workflows: make data | train | eval | gguf | all, all overridable by CONFIG/DATA/TEMPLATE/QUANT on the command line
- Four sample datasets to test the pipeline immediately: customer_support.json (50 examples), code_review.json (30), qa_pairs.json (20), chat_multiturn.jsonl (5 multi-turn chats)
- Verified chat templates for ChatML/Qwen3, Llama 3.1/3.3, and Gemma 3 — the part people usually get wrong and that silently wrecks output quality
- requirements.txt with deliberate version caps (transformers <=5.5, trl <=0.24, datasets <4.4) that sit inside Unsloth’s tested window so training runs instead of crashing
- README with one-command quickstart, manual step-by-step, data-format spec, VRAM-by-model table, and honest notes (model licenses, when to use RAG instead, don’t promise 70B on a laptop)
- MONETIZE.md income playbook — the paid differentiator: the exact service to sell, realistic 2026 pricing tiers, a fill-in-the-brackets Statement of Work, a per-engagement delivery checklist, and scope traps that kill margins
- Notes on going bigger (Llama 3.3 70B config guidance) and exact llama.cpp fallback commands if Unsloth’s built-in GGUF exporter isn’t available
Who it's for
- Freelance ML/AI developers who want to add a high-ticket, hard-to-commoditize service (private fine-tuning) instead of competing on generic prompt/automation gigs
- Solo consultants and small dev shops serving compliance-sensitive verticals (legal, healthcare/dental, accounting, manufacturing, defense) who need an on-prem AI answer
- Local AI / Ollama enthusiasts who can already run models and want to turn that skill into paid client work
- Agencies and fractional CTOs/CISOs who get asked for 'private AI we can run ourselves' and have no pipeline to deliver it
- Developers learning LoRA/QLoRA who want a working, current (June 2026) reference pipeline instead of stitching together broken tutorials and version-mismatched dependencies
Use cases
- Fine-tune a model on a firm's resolved support tickets so new agents draft on-brand replies in the company's tone and policy
- Build an on-prem assistant that answers staff questions from a company's internal SOPs and handbook — without uploading the handbook anywhere
- Train a classifier/router that sorts incoming email or documents into a client's own categories
- Create a field extractor that pulls structured data from a client's forms, contracts, or invoices
- Deliver a house-style first-draft writer (proposals, reports, summaries) that sounds like the client, not like generic ChatGPT
- Sell a fixed-scope 'Private AI Readiness Assessment' (use case + data audit + hardware spec + go/no-go) as an easy first purchase that converts to a build
- Run quarterly model-refresh retainers, re-training on the client's new data as their policies and language drift
Sell compliant, on-prem fine-tuned models to businesses that legally can't touch cloud AI
The service
A private language model, fine-tuned on the client's data, that runs entirely on their own hardware — no data ever leaves their network. Scope each engagement to ONE narrow, valuable task (draft support replies, route documents, answer from internal SOPs, extract form fields, write in house style). 'Build us an AI' is not sellable; 'fine-tune a model on your 2,000 resolved tickets so new agents draft on-brand replies' is.
What to charge
Paid pilot / proof-of-concept (1 task, eval report, runs on 1 GPU): $1,500–$4,000. Production fine-tune (cleaned dataset, 2-3 iterations, Ollama deploy + handoff doc): $5,000–$15,000. + Integration into their tools: +$3,000–$10,000. Retainer / quarterly model refresh: $500–$2,500/mo. Hourly for fuzzy scope: $90–$200/hr. Always take a 50% deposit and bill data cleaning separately or cap it.
How to find clients
- Publish proof, not pitches: write one short case-style post ('How a 10-person law firm got an on-prem AI draft assistant without sending a single file to the cloud') on your site/LinkedIn — compliance buyers search for exactly this
- Partner with people who already hold the trust: MSPs, fractional CISOs/CTOs, compliance consultants, and boutique dev shops have clients asking for 'private AI' and no one to build it — offer a 10-20% referral cut or white-label (highest-ROI channel for on-prem work)
- Answer where they ask: compliance and IT folks ask 'can we run AI without sending data out?' in industry forums, LinkedIn comments, and Slack/Discord communities — give a genuinely useful answer and let your profile sell (skip Reddit)
- Go local + vertical, not global + generic: 'AI consultant' is saturated, 'on-prem AI for dental practices in [your metro]' is not — pick one vertical you understand and become the obvious name
- Run a 30-minute talk/webinar ('Private AI for [industry] — what's actually possible on your own hardware') for a local business or trade group to book pipeline
- Sell a $1,500 fixed-scope 'Private AI Readiness Assessment' as a low-friction tripwire that naturally converts into the full build
The delivery steps
- Qualify before you say yes: one clearly defined task + written success metric, ~300+ good examples (1,000+ comfortable), a confirmed GPU, the data-residency requirement in writing, and a deposit received
- Get their examples in the kit’s format (data/sample-dataset is your spec sheet), run prepare_data.py with the right template, eyeball 20 random rows, and strip PII you don’t need
- Pick the config that fits their GPU (gemma-3-4b for ≤8GB, qwen3/llama-3.1-8b for 16GB — never promise 70B on a laptop) and run train_lora.py with eval on; save the loss curve
- Run evaluate.py for side-by-side base-vs-fine-tuned generations — this readable comparison is your proof and what convinces the client to pay the balance
- Ship: convert_to_gguf.py to GGUF + Modelfile, ollama create on THEIR machine, verify it answers correctly, then write the handoff doc (how to run/update it, eval results, hardware notes, what’s out of scope)
- Do the closeout: delete their data per the SOW and tell them, offer the refresh retainer, and ask "who else do you know with the same privacy constraint?" for the referral
How to market it
- Lead with the compliance angle, not the tech: market 'your data never leaves your building' to law/health/finance/defense buyers — that's the moat, fine-tuning is just how you deliver it
- Pick ONE vertical and own its language: a page or post titled 'On-prem AI for [accounting firms / dental practices / law offices]' beats a generic 'AI services' page because it matches the exact compliance search intent
- Publish a before/after case study using the kit’s evaluate.py output — readable base-vs-fine-tuned samples are concrete proof that converts skeptics far better than claims
- Build a referral pipeline with MSPs and compliance consultants: a one-page 'I build the private AI your clients are asking for' partner sheet with a 10-20% cut is a repeatable channel
- Offer the $1,500 Readiness Assessment as your front-door product — easy yes, low risk, and it pre-qualifies and warms up the real build
- Speak at local business groups, trade associations, and industry webinars — a 30-min 'what's actually possible on your own hardware' talk positions you as the expert and books discovery calls
- Add a 'hire me to build this for you' CTA at the end of any local-AI tutorials or content you already publish, turning existing readers into a warm inbound funnel
Frequently asked questions
Do I need to be a machine-learning expert to use this?
No — you need to be comfortable in a terminal and have a CUDA GPU. The kit handles model selection, LoRA/QLoRA configs, data formatting, training, eval, and Ollama deploy. `make all` runs the whole pipeline. You do need to understand the workflow well enough to stand behind a paid deliverable, which is exactly what the included sample datasets let you practice on first.
What hardware do I need to run this?
A CUDA GPU. QLoRA on the gemma-3-4b config fits ~6-8GB (laptop-class), and the qwen3-8b or llama-3.1-8b configs need ~16GB. For paid client work, the kit's playbook recommends quoting a GPU spec (RTX 4090 / A6000 / a private-cloud A10) and having the CLIENT buy or rent it — you don't want to own their infrastructure, and 'it runs on your box' is part of the compliance pitch. Don't promise 70B fine-tunes on hobby GPUs.
Will this work with current models, or is it already outdated?
It's verified for June 2026 with three base models confirmed to exist on Hugging Face: Qwen3-8B, Llama 3.1-8B, and Gemma 3-4B, each with a matching Ollama tag. The dependency pins are deliberately set inside Unsloth's tested window so the trainer actually runs — newer transformers/trl/datasets majors currently break Unsloth's patches, so 'upgrade to latest' is a trap the README warns you about.
Can I really charge $1,500-$15,000 for this? Isn’t that a lot?
Those are realistic 2026 freelance/small-shop ranges for clients who can’t legally use cloud AI — you’re pricing the compliance and the outcome, not GPU-hours (agencies charge 2-4x more). Honest expectation: your FIRST deal will likely take longer and pay less than the top of the range. Its real product is the case study and testimonial that make deals two through five much easier. The kit gets you to "I can deliver this"; the rest is showing up.
Is fine-tuning even the right approach, or should I just use RAG or a prompt?
Fine-tuning is for style, tone, format, and domain jargon — making the model SOUND like the client without a giant prompt every time — and for cheap on-prem inference on a small model. It is NOT a fact database. For facts that change weekly, that’s RAG’s job. The best deliverable is often fine-tune + a little RAG, and the included MONETIZE.md covers exactly when to sell which. Being honest about this is what earns referrals.
What’s actually in the download, and are model weights included?
Python scripts (prepare_data, train_lora, evaluate, convert_to_gguf), three model configs, a Makefile, four sample datasets, requirements.txt, a thorough README, and the MONETIZE.md income playbook with a fill-in SOW and pricing. No model weights are bundled — base weights download from Hugging Face once (Qwen3 is Apache-2.0; Llama and Gemma require a license click), and everything after that runs locally.
After you buy
Purchases are linked to your account — sign in and head to your product library to download anytime. Bought without an account? Check your email for the download link and a one-click way to set a password.
← Back to all kits, tools & codebases