To train a Stable Diffusion image LoRA locally in 2026, use Kohya's sd-scripts, the standard trainer. An SDXL LoRA needs about 12 GB of VRAM at minimum (16 GB is comfortable). A FLUX.1 LoRA traditionally wanted 24 GB but now runs on 16 GB, and on as little as 4-8 GB with heavy optimization, thanks to Kohya's fused-backward-pass optimizations added in v0.9.0 (January 2025). You need only 10-30 captioned images, a sensible learning rate (around 1e-4), and a LoRA rank of roughly 16-32 for FLUX or 32-64 for SDXL. On a 24 GB card, plan on roughly 30-60 minutes for an SDXL run or 1-3 hours for FLUX.1, then publish the resulting .safetensors file to CivitAI.

This guide is specifically about image LoRAs — teaching a diffusion model a new character, art style, or product. That is a completely different toolchain from training a text LLM LoRA. If you came here looking to fine-tune a language model with Unsloth/PEFT instead, read our separate LLM LoRA fine-tuning guide; everything below targets SDXL and FLUX.1 diffusion models.

What is the standard tool for training an image LoRA locally?

The de facto standard is Kohya's sd-scripts (kohya-ss/sd-scripts on GitHub) — a Python toolkit of training scripts for Stable Diffusion and related image models. It ships sdxl_train_network.py for SDXL LoRAs and flux_train_network.py for FLUX.1 LoRAs, plus support for SD 1.5/2.x, SD3/3.5 and other models. Almost every other local image-LoRA tool wraps these scripts.

Your practical options, all built on the same engine:

Tool	What it is	Best for	Min VRAM (LoRA)
kohya-ss/sd-scripts	The raw upstream training scripts (CLI/TOML config)	Maximum control, repeatable configs	SDXL ~8-12 GB / FLUX ~4-16 GB
bmaltais/kohya_ss	A Gradio GUI wrapping sd-scripts	Beginners who want a UI over the same scripts	Same as above
FluxGym	A "dead simple" low-VRAM web UI for FLUX, wrapping Kohya	FLUX on 12/16/20 GB cards	FLUX ~12 GB
ComfyUI-FluxTrainer (kijai)	A ComfyUI node set wrapping modified Kohya FLUX scripts	People already living in ComfyUI	FLUX ~12 GB (split_mode)

The takeaway: pick the interface you like, but the actual math is Kohya's underneath. You can verify the script list and supported models on the official kohya-ss/sd-scripts repository. If you have never installed a local diffusion stack at all, start with our FLUX local image generation guide first so you already have the base models and Python environment in place.

SDXL LoRA vs FLUX LoRA: which should you train?

This is the first real decision, because it sets your VRAM floor and your settings.

SDXL (Stable Diffusion XL) is a U-Net latent diffusion model with roughly 3.5B parameters in total, of which about 2.6B sit in the U-Net itself. It is the easy, forgiving option: lighter on VRAM, faster to train, very tolerant of imperfect settings, and backed by a massive ecosystem of existing LoRAs and checkpoints.
FLUX.1 is a 12-billion-parameter rectified-flow transformer (a DiT — Diffusion Transformer) from Black Forest Labs. It produces noticeably better prompt adherence and anatomy, but it is heavier and — importantly — more sensitive to learning rate than SDXL. Push the LR too high on FLUX and outputs over-saturate or distort; SDXL shrugs off the same mistake.

Factor	SDXL LoRA	FLUX.1 LoRA
Base architecture	U-Net diffusion, ~3.5B parameters in total	12B rectified-flow DiT
Min VRAM (LoRA)	~12 GB (8-10 GB with optimizations)	~16 GB (4-8 GB heavily optimized)
Recommended VRAM	16 GB	24 GB
Typical LoRA rank (dim)	32-64	16-32
Typical learning rate	~1e-4	~1e-4, but lower-tolerance
LR sensitivity	Forgiving	Touchy — sweep carefully
Training resolution	1024×1024	1024×1024
Image quality ceiling	High	Higher (better hands/text)

Rule of thumb: if you have a 12 GB card, train SDXL. If you have 16 GB, you can do either (FLUX via FluxGym or ComfyUI-FluxTrainer). If you have 24 GB, train FLUX comfortably and stop worrying about optimization flags.
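
The rule of thumb can be written down as a tiny lookup. This is only a sketch of the tiers described in this section; the function name and the exact wording of the returned strings are illustrative, not part of any tool.

```python
def recommend_model(vram_gb: float) -> str:
    """Map GPU VRAM (in GB) to the rule of thumb from this section."""
    if vram_gb >= 24:
        return "FLUX.1 (no special optimization flags needed)"
    if vram_gb >= 16:
        return "SDXL or FLUX.1 (FLUX via FluxGym or ComfyUI-FluxTrainer)"
    if vram_gb >= 12:
        return "SDXL"
    # Below 12 GB the article treats training as possible but slow.
    return "SDXL or FLUX.1 only with heavy optimization; expect slow runs"


for card_gb in (8, 12, 16, 24):
    print(f"{card_gb} GB -> {recommend_model(card_gb)}")
```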

What about FLUX.2? Black Forest Labs released the FLUX.2 series on 25 November 2025 (Pro, Flex, Dev, and an Apache-2.0 "Klein" variant). The open-weight FLUX.2 dev model is larger and heavier than FLUX.1 and realistically wants 24 GB+ to train, while the smaller Klein is aimed at consumer cards. It's newer, and the consumer-GPU training tooling is still settling. This guide deliberately stays on the proven, lighter FLUX.1 and SDXL path, which is what most people training image LoRAs on their own hardware still use — the workflow below carries over to FLUX.2 once your VRAM and tooling are ready for the bigger model.

How much VRAM do you actually need to train an image LoRA?

Here is the honest, by-tier breakdown for LoRA training specifically (full fine-tuning needs far more). The low-end FLUX numbers are real but come with tradeoffs — block swapping to system RAM, smaller batch sizes, and longer training times.

GPU VRAM	SDXL LoRA	FLUX.1 LoRA	Notes
8 GB	⚠️ Possible	⚠️ Possible (heavy optimization)	Slow; gradient checkpointing + 8-bit Adam + block swap mandatory
12 GB	✅ Workable (the practical minimum)	✅ Via FluxGym / ComfyUI split_mode	The practical SDXL sweet spot (e.g. RTX 3060 12GB)
16 GB	✅ Easy, batch 2-4	✅ Good	RTX 4060 Ti 16GB, 16GB Mac viable
24 GB	✅ Headroom to spare	✅ Recommended FLUX tier	RTX 3090 / 4090 — the no-compromise choice
48 GB+	✅	✅ Full fine-tunes too	A6000/A100 territory

The big VRAM unlock was Kohya's fused backward pass in sd-scripts v0.9.0 (January 2025), which is what dropped FLUX LoRA training from a hard 24 GB requirement down toward 4-8 GB on optimized configs. Note that "it fits in 8 GB" and "it trains well in 8 GB" are different statements — expect long runs and small batches at the bottom of this table. For a deeper look at what a 24 GB card buys you across image work, see our RTX 3090 local AI guide.
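
Some back-of-the-envelope arithmetic shows why the tiers fall where they do. The sketch below counts only the memory of the frozen base weights, using the parameter counts quoted in this article. It ignores activations, LoRA weights, optimizer state, and text encoders, so real usage is higher. The fp8 row is included purely for illustration of what lower-precision storage would save.

```python
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_gb(params_billions: float, dtype: str) -> float:
    """Memory for the base weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[dtype]


for name, size_b in (("SDXL (3.5B)", 3.5), ("FLUX.1 (12B)", 12.0)):
    row = ", ".join(
        f"{dtype}: {weight_gb(size_b, dtype):.1f} GB" for dtype in BYTES_PER_PARAM
    )
    print(f"{name}: {row}")
```

At 2 bytes per parameter, a 12B model is about 24 GB before anything else is loaded, which is why techniques that keep fewer weights resident on the GPU (lower precision, or swapping blocks to system RAM) matter so much on smaller cards.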

How do you prepare a dataset for an image LoRA?

The dataset is where most LoRAs are won or lost. Quality and consistency beat quantity every time — 20 sharp, varied images outperform 75 inconsistent ones.

1. Collect images (10-30 is plenty). A character LoRA can train on as few as 10 good images; a style LoRA usually wants 15-30 to capture range. Vary the pose, lighting, and background so the model learns the subject, not the backdrop.

2. Resize to the training resolution. For both SDXL and FLUX, 1024×1024 is the standard. Kohya's bucketing can handle mixed aspect ratios, but consistent square crops are the simplest path.

3. Caption every image. Each image needs a matching .txt caption file. For a character LoRA, use a unique trigger word plus minimal description (e.g. "ohwx woman, smiling, outdoor"). For a style LoRA, describe the content and let the style be learned implicitly, or use a style trigger token. Auto-captioners (BLIP, WD14 tagger) get you 80% there; hand-edit the rest.

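4. Organize folders with a repeat count. Kohya reads folder names like 10_ohwx woman, where 10 is the number of times each image is repeated per epoch. Total steps = repeats × image count × epochs ÷ batch size; for example, 10 repeats × 15 images × 10 epochs at batch size 1 comes to 1,500 steps. If you describe the dataset in a TOML config instead, the repeat count is normally set there (num_repeats) rather than read from the folder name.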
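
Before training, it can help to sanity-check a dataset folder and the step count it implies. The following is a minimal sketch, assuming the `<repeats>_<name>` folder naming and one `.txt` caption per image as described in the steps above. The function names and the list of image extensions are my own choices, and the step count is an approximation that ignores aspect-ratio bucketing and gradient accumulation.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def scan_subset(folder: Path) -> tuple[int, int, list[Path]]:
    """Return (repeats, image_count, images_missing_captions) for one folder.

    Assumes the folder is named '<repeats>_<name>', e.g. '10_ohwx woman'.
    """
    repeats = int(folder.name.split("_", 1)[0])
    images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    missing = [p for p in images if not p.with_suffix(".txt").exists()]
    return repeats, len(images), missing


def total_steps(repeats: int, image_count: int, epochs: int, batch_size: int = 1) -> int:
    """Approximate optimizer steps: samples per epoch / batch size (rounded up) x epochs."""
    steps_per_epoch = -(-(repeats * image_count) // batch_size)  # ceiling division
    return steps_per_epoch * epochs


print(total_steps(repeats=10, image_count=15, epochs=10))                # 1500
print(total_steps(repeats=10, image_count=15, epochs=10, batch_size=2))  # 750
```

Calling `scan_subset(Path("dataset/10_ohwx woman"))` on a real folder (the path here is hypothetical) returns the parsed repeat count, the number of images, and any images that still need a caption file.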

What learning rate, rank, and step count should you use?

These are sane starting points, not gospel — every dataset is different. Treat the FLUX learning rate as something to sweep, because, as noted, FLUX is far less forgiving than SDXL here.

Setting	SDXL LoRA	FLUX.1 LoRA
Network rank (dim)	32 (up to 64)	16-32
Network alpha	16 (half of dim is common)	16
Learning rate	1e-4	~1e-4, sweep 5e-5 → 2e-4
Optimizer	AdamW8bit	AdamW8bit
Resolution	1024	1024
Total steps	~1,000-2,000	~1,000-2,500
Gradient checkpointing	On (saves VRAM)	On (saves VRAM)

A few notes from running these: higher rank = larger LoRA file and more capacity to memorize, but also more risk of overfitting on a small dataset — for a single character, rank 16-32 is usually enough. The AdamW8bit optimizer alone cuts VRAM meaningfully versus full-precision AdamW, with negligible quality loss, which is why it appears in nearly every low-VRAM recipe.
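
To make the rank and alpha trade-off concrete: a LoRA adds two small matrices to each adapted linear layer, so its size grows linearly with rank, and the update is scaled by alpha divided by rank in the usual convention (which, as far as I know, is also what Kohya's LoRA module does). The sketch below uses a hypothetical 2048-wide layer purely for illustration; real layer shapes differ per model.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: down (rank x d_in) + up (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the LoRA update under the alpha / rank convention."""
    return alpha / rank


# Hypothetical 2048 -> 2048 projection, only to show how size scales with rank.
for rank in (16, 32, 64):
    print(
        f"rank={rank:>2}  added params={lora_params(2048, 2048, rank):>7}  "
        f"scale at alpha=16: {lora_scale(16, rank)}"
    )
```

Doubling the rank doubles the added parameters (and the file size), while holding alpha at 16 halves the scale. That coupling is one reason a learning rate that worked at one rank may need retuning at another.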

How do you train SDXL with Kohya sd-scripts (CLI)?

Once your dataset folder and a TOML config are ready, an SDXL LoRA run looks roughly like this. (Paths and the exact flag set depend on your config; this is the shape of the command, not a copy-paste-forever recipe.)

accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="/models/sd_xl_base_1.0.safetensors" \
  --dataset_config="/dataset/config.toml" \
  --output_dir="/output" --output_name="my_sdxl_lora" \
  --network_module=networks.lora \
  --network_dim=32 --network_alpha=16 \
  --learning_rate=1e-4 --optimizer_type=AdamW8bit \
  --max_train_steps=1500 --mixed_precision=bf16 \
  --gradient_checkpointing --save_model_as=safetensors
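
The command above points at a dataset config without showing one. The sketch below renders a minimal single-subset TOML file from Python. The key names (enable_bucket, caption_extension, resolution, batch_size, image_dir, num_repeats) follow the sd-scripts dataset config documentation as I understand it, so check them against the repository before relying on this. The function name and the example path are hypothetical.

```python
import json


def render_dataset_config(
    image_dir: str, num_repeats: int, resolution: int = 1024, batch_size: int = 1
) -> str:
    """Render a minimal one-subset dataset config as TOML text."""
    return "\n".join(
        [
            "[general]",
            "enable_bucket = true",  # allow mixed aspect ratios
            'caption_extension = ".txt"',
            "",
            "[[datasets]]",
            f"resolution = {resolution}",
            f"batch_size = {batch_size}",
            "",
            "  [[datasets.subsets]]",
            # json.dumps produces a quoted, escaped string that is also valid TOML
            f"  image_dir = {json.dumps(image_dir)}",
            f"  num_repeats = {num_repeats}",
            "",
        ]
    )


print(render_dataset_config("/dataset/10_ohwx woman", num_repeats=10))
```

Saving the printed text as /dataset/config.toml would match the --dataset_config path used in the command.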

The FLUX equivalent uses flux_train_network.py and additionally points at the FLUX text encoders (CLIP-L and T5-XXL) and the FLUX VAE/autoencoder. Because of FLUX's size, low-VRAM runs add block-swap flags to offload transformer blocks to system RAM — which is exactly the kind of plumbing FluxGym and ComfyUI-FluxTrainer hide for you.
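
As a rough illustration of that FLUX command shape, the sketch below builds one argument list per learning rate so a small sweep is easy to queue. Treat every flag here as an assumption to verify against the sd-scripts documentation for your version: --clip_l, --t5xxl, --ae, --blocks_to_swap, and the networks.lora_flux module are the names I believe the FLUX script uses, the model paths are hypothetical, and a real run typically needs further FLUX-specific options that are omitted here.

```python
import shlex


def flux_lora_command(
    learning_rate: float, output_name: str, blocks_to_swap: int = 0
) -> list[str]:
    """Build an argv list for one FLUX.1 LoRA run (flag names are assumptions)."""
    cmd = [
        "accelerate", "launch", "flux_train_network.py",
        "--pretrained_model_name_or_path=/models/flux1-dev.safetensors",  # hypothetical path
        "--clip_l=/models/clip_l.safetensors",      # CLIP-L text encoder (hypothetical path)
        "--t5xxl=/models/t5xxl_fp16.safetensors",   # T5-XXL text encoder (hypothetical path)
        "--ae=/models/ae.safetensors",              # FLUX autoencoder (hypothetical path)
        "--dataset_config=/dataset/config.toml",
        "--output_dir=/output", f"--output_name={output_name}",
        "--network_module=networks.lora_flux",
        "--network_dim=16", "--network_alpha=16",
        f"--learning_rate={learning_rate:g}", "--optimizer_type=AdamW8bit",
        "--max_train_steps=1500", "--mixed_precision=bf16",
        "--gradient_checkpointing", "--save_model_as=safetensors",
    ]
    if blocks_to_swap > 0:
        # Offload this many transformer blocks to system RAM on low-VRAM cards.
        cmd.append(f"--blocks_to_swap={blocks_to_swap}")
    return cmd


# Print one command per learning rate in the suggested sweep range.
for lr in (5e-5, 1e-4, 2e-4):
    print(shlex.join(flux_lora_command(lr, f"my_flux_lora_lr{lr:g}", blocks_to_swap=18)))
```

The commands are printed rather than executed, so you can review them and run each one by hand.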

What is the ComfyUI Flux Trainer alternative?

If you already run image generation in ComfyUI, you do not need to touch a terminal. ComfyUI-FluxTrainer (by kijai) is a set of custom nodes that wrap slightly modified Kohya FLUX training scripts, letting you build a training workflow as a node graph — and use the same base models you already generate with. It can run on around 12 GB of VRAM with split_mode enabled (it then leans on roughly 32 GB of system RAM), and you install it straight from the ComfyUI Manager.

The appeal is comparison: because settings are nodes, you can branch a graph to test two ranks or learning rates side by side. The tradeoff is that node-based training is fiddlier to reproduce than a saved TOML. New to ComfyUI itself? Our complete ComfyUI guide covers installation and the node basics you'll need before adding the trainer. You can also read the node documentation on the ComfyUI-FluxTrainer repository.

Recipes: character, style, and product LoRAs

The same engine, tuned differently for the three most common jobs:

Character LoRA. 10-20 images of one subject across varied poses/lighting, a unique trigger token, rank 16-32, ~1,500 steps. Keep backgrounds varied so the LoRA learns the face/body, not a setting. This is the most forgiving recipe and a great first project.
Style LoRA. 15-30 images that share an art style but differ in subject. Caption the content of each image and let the style be the constant the model absorbs. Slightly higher rank (32) helps capture broad stylistic range; watch for the LoRA "leaking" specific subjects from your dataset (a sign of too-high rank or too-few images).
Product LoRA. 15-25 clean shots of the product on neutral and varied backgrounds, multiple angles. Consistency of the object matters most — crop tightly, keep lighting honest, and avoid heavy filters that the model would memorize as part of the product.

How long does training take, and where do you publish it?

Approximate, single-machine figures. On an RTX 3090 (24 GB), I'd ballpark an SDXL LoRA at roughly 30-60 minutes for ~1,500 steps at 1024px, and a FLUX.1 LoRA noticeably longer — on the order of 1-3 hours for a comparable run — because of FLUX's 12B size. These are rough, hardware- and config-dependent numbers, not a benchmark; batch size, resolution, and whether you're block-swapping to system RAM swing them a lot. The moment FLUX has to offload blocks to RAM on a smaller card, training time climbs steeply.
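
Those ballparks translate into per-step speeds, which are easier to compare against what your own trainer reports. The sketch below assumes the FLUX run is also about 1,500 steps, since the text only calls it "comparable". The helper names are illustrative.

```python
def seconds_per_step(total_minutes: float, steps: int) -> float:
    """Average step time implied by a total run time."""
    return total_minutes * 60 / steps


def estimated_minutes(sec_per_step: float, steps: int) -> float:
    """Projected run time from a measured step time."""
    return sec_per_step * steps / 60


# SDXL: 30-60 minutes for ~1,500 steps
print(seconds_per_step(30, 1500), seconds_per_step(60, 1500))    # 1.2 2.4
# FLUX.1: 1-3 hours, assuming a similar ~1,500 steps
print(seconds_per_step(60, 1500), seconds_per_step(180, 1500))   # 2.4 7.2
# Projection: a measured 3.0 s/step over 2,000 steps
print(estimated_minutes(3.0, 2000))                              # 100.0
```

Once a run has warmed up, plugging the observed seconds per step into `estimated_minutes` gives a quick check on whether block swapping has pushed you outside these ranges.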

When training finishes you get a single .safetensors file. Drop it in your models/Lora folder to use locally, and to share it, the standard hub is CivitAI — the largest community catalog of image models, hosting the overwhelming majority of public Stable Diffusion and FLUX LoRAs (tens of thousands of LoRA resources and counting). On CivitAI you upload the file as a "LoRA" resource, set a category and tags, write a description with example prompts, and choose your merge/commercial permissions before publishing.
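
Before uploading, it can be worth checking what the file actually contains. A .safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so it can be inspected with the standard library alone. The sketch below reads that header without loading any tensors. The ss_network_dim and ss_network_alpha keys are what I understand Kohya to write into the metadata; treat them as assumptions and look at the full key list for your own file.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path: str) -> dict:
    """Summarize tensor count and training metadata stored in the file."""
    header = read_safetensors_header(path)
    metadata = header.pop("__metadata__", None) or {}
    return {
        "tensor_count": len(header),
        "network_dim": metadata.get("ss_network_dim"),      # assumed Kohya key
        "network_alpha": metadata.get("ss_network_alpha"),  # assumed Kohya key
        "metadata_keys": sorted(metadata),
    }
```

Running `summarize("/output/my_sdxl_lora.safetensors")` on the file from the earlier command shows the rank and alpha it was trained with, if the trainer recorded them. Any embedded metadata becomes public once the file is published, so it is worth a look first.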

Key Takeaways

Kohya's sd-scripts is the standard image-LoRA trainer. FluxGym, the bmaltais GUI, and ComfyUI-FluxTrainer all wrap it — pick the interface, the engine is the same.
SDXL needs ~12 GB minimum (16 GB comfortable); FLUX wanted 24 GB but now runs on 16 GB and even 4-8 GB thanks to Kohya's fused backward pass (v0.9.0, Jan 2025) — with longer runs at the bottom.
FLUX is more learning-rate sensitive than SDXL. Both sit near 1e-4, but sweep FLUX (5e-5 → 2e-4) and use a lower rank (16-32 vs SDXL's 32-64).
10-30 captioned images is enough. Quality and variety beat quantity; a character LoRA trains on as few as 10 good images.
Train, then publish the .safetensors to CivitAI — the dominant community hub for image LoRAs.

Next Steps

Don't have the base models yet? Set up generation first with our FLUX local image generation guide.
Prefer a visual workflow over the CLI? Read the complete ComfyUI guide, then add ComfyUI-FluxTrainer.
Training a text model instead of an image one? That's a different stack — see the LLM LoRA fine-tuning guide.
Weighing the 24 GB upgrade for faster FLUX runs? Our RTX 3090 local AI guide covers what that VRAM unlocks.

Train an Image LoRA Locally (2026): Kohya, SDXL & FLUX

Local AI Master Research Team
Written by the Local AI Master Team