Image Generation

Train an Image LoRA Locally (2026): Kohya, SDXL & FLUX

June 20, 2026
12 min read
Local AI Master Research Team

To train an image LoRA for SDXL or FLUX.1 locally in 2026, use Kohya's sd-scripts (the standard trainer). An SDXL LoRA needs about 12 GB of VRAM minimum (16 GB comfortable; 8-10 GB is possible with optimizations). A FLUX.1 LoRA traditionally wanted 24 GB but now runs on 16 GB, and as low as 4-8 GB with heavy optimization, thanks to Kohya's fused-backward-pass optimizations added in v0.9.0 (January 2025). You only need 10-30 captioned images, a sensible learning rate (around 1e-4), and a LoRA rank of roughly 16-32 for FLUX or 32-64 for SDXL. On a 24 GB card, plan on roughly 30-60 minutes for an SDXL LoRA and 1-3 hours for a FLUX.1 LoRA, then publish the resulting .safetensors file to CivitAI.

This guide is specifically about image LoRAs — teaching a diffusion model a new character, art style, or product. That is a completely different toolchain from training a text LLM LoRA. If you came here looking to fine-tune a language model with Unsloth/PEFT instead, read our separate LLM LoRA fine-tuning guide; everything below targets SDXL and FLUX.1 diffusion models.

What is the standard tool for training an image LoRA locally?

The de facto standard is Kohya's sd-scripts (kohya-ss/sd-scripts on GitHub) — a Python toolkit of training scripts for Stable Diffusion and related image models. It ships sdxl_train_network.py for SDXL LoRAs and flux_train_network.py for FLUX.1 LoRAs, plus support for SD 1.5/2.x, SD3/3.5 and other models. Almost every other local image-LoRA tool wraps these scripts.

Your practical options, all built on the same engine:

| Tool | What it is | Best for | Min VRAM (LoRA) |
| --- | --- | --- | --- |
| kohya-ss/sd-scripts | The raw upstream training scripts (CLI/TOML config) | Maximum control, repeatable configs | SDXL ~8-12 GB / FLUX ~4-16 GB |
| bmaltais/kohya_ss | A Gradio GUI wrapping sd-scripts | Beginners who want a UI over the same scripts | Same as above |
| FluxGym | A "dead simple" low-VRAM web UI for FLUX, wrapping Kohya | FLUX on 12/16/20 GB cards | FLUX ~12 GB |
| ComfyUI-FluxTrainer (kijai) | A ComfyUI node set wrapping modified Kohya FLUX scripts | People already living in ComfyUI | FLUX ~12 GB (split_mode) |

The takeaway: pick the interface you like, but the actual math is Kohya's underneath. You can verify the script list and supported models on the official kohya-ss/sd-scripts repository. If you have never installed a local diffusion stack at all, start with our FLUX local image generation guide first so you already have the base models and Python environment in place.

SDXL LoRA vs FLUX LoRA: which should you train?

This is the first real decision, because it sets your VRAM floor and your settings.

  • SDXL (Stable Diffusion XL) is a 3.5B-parameter U-Net diffusion model. It is the easy, forgiving option: lighter on VRAM, faster to train, very tolerant of imperfect settings, and backed by a massive ecosystem of existing LoRAs and checkpoints.
  • FLUX.1 is a 12-billion-parameter rectified-flow transformer (a DiT — Diffusion Transformer) from Black Forest Labs. It produces noticeably better prompt adherence and anatomy, but it is heavier and — importantly — more sensitive to learning rate than SDXL. Push the LR too high on FLUX and outputs over-saturate or distort; SDXL shrugs off the same mistake.

| Factor | SDXL LoRA | FLUX.1 LoRA |
| --- | --- | --- |
| Base architecture | 3.5B U-Net diffusion | 12B rectified-flow DiT |
| Min VRAM (LoRA) | ~12 GB (8-10 GB with optimizations) | ~16 GB (4-8 GB heavily optimized) |
| Recommended VRAM | 16 GB | 24 GB |
| Typical LoRA rank (dim) | 32-64 | 16-32 |
| Typical learning rate | ~1e-4 | ~1e-4, but lower-tolerance |
| LR sensitivity | Forgiving | Touchy; sweep carefully |
| Training resolution | 1024×1024 | 1024×1024 |
| Image quality ceiling | High | Higher (better hands/text) |

Rule of thumb: if you have a 12 GB card, train SDXL. If you have 16 GB, you can do either (FLUX via FluxGym or ComfyUI-FluxTrainer). If you have 24 GB, train FLUX comfortably and stop worrying about optimization flags.
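
The rule of thumb above can be written down as a small lookup. This is only a sketch: the function name is hypothetical, and the cutoffs for in-between cards (say, 20 GB) are an assumption that rounds down to the nearest tier mentioned in this guide.

```python
def suggest_base_model(vram_gb: float) -> str:
    """Map GPU VRAM (in GB) to this guide's rule-of-thumb recommendation."""
    if vram_gb >= 24:
        return "FLUX.1 (comfortable, no optimization flags needed)"
    if vram_gb >= 16:
        return "SDXL or FLUX.1 (FLUX via FluxGym or ComfyUI-FluxTrainer)"
    if vram_gb >= 12:
        return "SDXL"
    # Below 12 GB both models are only "possible" with heavy optimization.
    return "SDXL or FLUX.1 with heavy optimization; expect slow runs"


for card_gb in (8, 12, 16, 24):
    print(f"{card_gb} GB -> {suggest_base_model(card_gb)}")
```

Treat the output as a starting point, not a guarantee: batch size, resolution, and optimizer choice all move the real VRAM floor.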

What about FLUX.2? Black Forest Labs released the FLUX.2 series on 25 November 2025 (Pro, Flex, Dev, and an Apache-2.0 "Klein" variant). The open-weight FLUX.2 dev model is larger and heavier than FLUX.1 and realistically wants 24 GB+ to train, while the smaller Klein is aimed at consumer cards. It's newer, and the consumer-GPU training tooling is still settling. This guide deliberately stays on the proven, lighter FLUX.1 and SDXL path, which is what most people training image LoRAs on their own hardware still use — the workflow below carries over to FLUX.2 once your VRAM and tooling are ready for the bigger model.

How much VRAM do you actually need to train an image LoRA?

Here is the honest, by-tier breakdown for LoRA training specifically (full fine-tuning needs far more). The low-end FLUX numbers are real but come with tradeoffs — block swapping to system RAM, smaller batch sizes, and longer training times.

| GPU VRAM | SDXL LoRA | FLUX.1 LoRA | Notes |
| --- | --- | --- | --- |
| 8 GB | ⚠️ Possible | ⚠️ Possible (heavy optimization) | Slow; gradient checkpointing + 8-bit Adam + block swap mandatory |
| 12 GB | ✅ Comfortable | ✅ Via FluxGym / ComfyUI split_mode | The practical SDXL sweet spot (e.g. RTX 3060 12GB) |
| 16 GB | ✅ Easy, batch 2-4 | ✅ Good | RTX 4060 Ti 16GB, 16GB Mac viable |
| 24 GB | ✅ Headroom to spare | ✅ Recommended FLUX tier | RTX 3090 / 4090, the no-compromise choice |
| 48 GB+ | ✅ Full fine-tunes too | ✅ Full fine-tunes too | A6000/A100 territory |

The big VRAM unlock was Kohya's fused backward pass in sd-scripts v0.9.0 (January 2025), which is what dropped FLUX LoRA training from a hard 24 GB requirement down toward 4-8 GB on optimized configs. Note that "it fits in 8 GB" and "it trains well in 8 GB" are different statements — expect long runs and small batches at the bottom of this table. For a deeper look at what a 24 GB card buys you across image work, see our RTX 3090 local AI guide.

How do you prepare a dataset for an image LoRA?

The dataset is where most LoRAs are won or lost. Quality and consistency beat quantity every time — 20 sharp, varied images outperform 75 inconsistent ones.

1. Collect images (10-30 is plenty). A character LoRA can train on as few as 10 good images; a style LoRA usually wants 15-30 to capture range. Vary the pose, lighting, and background so the model learns the subject, not the backdrop.

2. Resize to the training resolution. For both SDXL and FLUX, 1024×1024 is the standard. Kohya's bucketing can handle mixed aspect ratios, but consistent square crops are the simplest path.

3. Caption every image. Each image needs a matching .txt caption file. For a character LoRA, use a unique trigger word plus minimal description (e.g. "ohwx woman, smiling, outdoor"). For a style LoRA, describe the content and let the style be learned implicitly, or use a style trigger token. Auto-captioners (BLIP, WD14 tagger) get you 80% there; hand-edit the rest.

4. Organize folders with a repeat count. Kohya reads folder names like 10_ohwx woman, where 10 is the number of repeats per image per epoch. At batch size 1, repeats × image count × epochs = total steps; a larger batch size divides the step count accordingly.
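
The folder convention and step arithmetic from the steps above can be checked before you launch a run. The helpers below are a hypothetical sketch, not part of sd-scripts: they parse the repeat prefix, list images that lack a .txt caption, and estimate total steps. The estimate ignores aspect-ratio bucketing and gradient accumulation, so the trainer's own count may differ slightly.

```python
from __future__ import annotations

import math
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def parse_repeats(folder_name: str) -> tuple[int, str]:
    """Split a folder name like '10_ohwx woman' into (10, 'ohwx woman')."""
    prefix, _, rest = folder_name.partition("_")
    if not prefix.isdigit() or not rest:
        raise ValueError(f"expected '<repeats>_<name>', got {folder_name!r}")
    return int(prefix), rest


def missing_captions(folder: Path) -> list[str]:
    """Return image file names with no matching .txt caption beside them."""
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and not p.with_suffix(".txt").exists()
    )


def total_steps(repeats: int, num_images: int, epochs: int, batch_size: int = 1) -> int:
    """Approximate steps: ceil(repeats * images / batch_size) per epoch, times epochs."""
    return math.ceil(repeats * num_images / batch_size) * epochs


repeats, name = parse_repeats("10_ohwx woman")
print(repeats, name)
print(total_steps(repeats, num_images=20, epochs=8))
print(total_steps(repeats, num_images=20, epochs=8, batch_size=2))
```

With 20 images, 10 repeats, and 8 epochs this gives 1,600 steps at batch size 1 and 800 at batch size 2, which is a quick way to see whether a planned run lands in the 1,000-2,000 step range discussed later.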

What learning rate, rank, and step count should you use?

These are sane starting points, not gospel — every dataset is different. Treat the FLUX learning rate as something to sweep, because, as noted, FLUX is far less forgiving than SDXL here.

| Setting | SDXL LoRA | FLUX.1 LoRA |
| --- | --- | --- |
| Network rank (dim) | 32 (up to 64) | 16-32 |
| Network alpha | 16 (half of dim is common) | 16 |
| Learning rate | 1e-4 | ~1e-4, sweep 5e-5 → 2e-4 |
| Optimizer | AdamW8bit | AdamW8bit |
| Resolution | 1024 | 1024 |
| Total steps | ~1,000-2,000 | ~1,000-2,500 |
| Gradient checkpointing | On (saves VRAM) | On (saves VRAM) |
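
One way to turn the suggested FLUX sweep range into concrete runs is to space the candidates geometrically, since learning rates are usually compared on a log scale. The spacing and the number of points are assumptions of this sketch; the guide itself only gives the 5e-5 and 2e-4 endpoints.

```python
from __future__ import annotations


def lr_sweep(low: float = 5e-5, high: float = 2e-4, points: int = 3) -> list[float]:
    """Return `points` learning rates spaced geometrically from low to high."""
    if points < 2:
        raise ValueError("need at least two points")
    ratio = (high / low) ** (1 / (points - 1))
    return [low * ratio**i for i in range(points)]


# Print each candidate in the form the trainer's --learning_rate flag expects.
for lr in lr_sweep():
    print(f"--learning_rate={lr:.1e}")
```

With the defaults, the three candidates are the low end, roughly 1e-4 in the middle, and the high end. Run each with an otherwise identical config and compare sample images at the same step count.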

A few notes from running these: higher rank = larger LoRA file and more capacity to memorize, but also more risk of overfitting on a small dataset — for a single character, rank 16-32 is usually enough. The AdamW8bit optimizer alone cuts VRAM meaningfully versus full-precision AdamW, with negligible quality loss, which is why it appears in nearly every low-VRAM recipe.
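
The "higher rank = larger file" point follows directly from how LoRA works: each adapted linear layer gets a down matrix (rank × d_in) and an up matrix (d_out × rank), and the update is scaled by alpha / rank. A small sketch of that arithmetic follows; the 1280×1280 layer size is a hypothetical example chosen for illustration, not a measured SDXL or FLUX dimension.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added to one linear layer: down (rank x d_in) + up (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Multiplier applied to the low-rank update: alpha / rank."""
    return alpha / rank


# Hypothetical 1280x1280 projection, used only to show how the numbers scale.
for rank in (16, 32, 64):
    params = lora_param_count(1280, 1280, rank)
    print(f"rank={rank:>2}  params/layer={params:>7}  scale(alpha=16)={lora_scale(16, rank)}")
```

Parameter count grows linearly with rank, so doubling rank roughly doubles file size. With alpha held at 16, doubling rank also halves the scale factor, which is one reason a rank change can behave like a learning-rate change unless alpha is adjusted with it.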

How do you train SDXL with Kohya sd-scripts (CLI)?

Once your dataset folder and a TOML config are ready, an SDXL LoRA run looks roughly like this. (Paths and the exact flag set depend on your config; this is the shape of the command, not a copy-paste-forever recipe.)

accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="/models/sd_xl_base_1.0.safetensors" \
  --dataset_config="/dataset/config.toml" \
  --output_dir="/output" --output_name="my_sdxl_lora" \
  --network_module=networks.lora \
  --network_dim=32 --network_alpha=16 \
  --learning_rate=1e-4 --optimizer_type=AdamW8bit \
  --max_train_steps=1500 --mixed_precision=bf16 \
  --gradient_checkpointing --save_model_as=safetensors
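
The command above points --dataset_config at a TOML file that this guide does not show. Below is a minimal sketch that generates one from Python. The key names (enable_bucket, caption_extension, resolution, batch_size, image_dir, num_repeats) follow my reading of the sd-scripts dataset config documentation and should be checked against the version you install. The function name and the example path are hypothetical. As far as I can tell, in config-file mode the repeat count comes from num_repeats rather than from the folder-name prefix.

```python
import json


def dataset_config_toml(image_dir: str, repeats: int = 10,
                        resolution: int = 1024, batch_size: int = 1) -> str:
    """Build a minimal single-subset dataset config as TOML text."""
    lines = [
        "[general]",
        "enable_bucket = true",          # allow mixed aspect ratios
        'caption_extension = ".txt"',    # captions sit next to each image
        "",
        "[[datasets]]",
        f"resolution = {resolution}",
        f"batch_size = {batch_size}",
        "",
        "  [[datasets.subsets]]",
        # json.dumps yields a double-quoted, escaped string that is also valid TOML.
        f"  image_dir = {json.dumps(image_dir)}",
        f"  num_repeats = {repeats}",
        "",
    ]
    return "\n".join(lines)


# Hypothetical dataset location, for illustration only.
print(dataset_config_toml("/dataset/ohwx_woman"))
```

Keeping the config in a file (or generating it like this) is what makes a CLI run repeatable: the same TOML plus the same command should reproduce the same training setup.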

The FLUX equivalent uses flux_train_network.py and additionally points at the FLUX text encoders (CLIP-L and T5-XXL) and the FLUX VAE/autoencoder. Because of FLUX's size, low-VRAM runs add block-swap flags to offload transformer blocks to system RAM — which is exactly the kind of plumbing FluxGym and ComfyUI-FluxTrainer hide for you.

What is the ComfyUI Flux Trainer alternative?

If you already run image generation in ComfyUI, you do not need to touch a terminal. ComfyUI-FluxTrainer (by kijai) is a set of custom nodes that wrap slightly modified Kohya FLUX training scripts, letting you build a training workflow as a node graph — and use the same base models you already generate with. It can run on around 12 GB of VRAM with split_mode enabled (it then leans on roughly 32 GB of system RAM), and you install it straight from the ComfyUI Manager.

The appeal is comparison: because settings are nodes, you can branch a graph to test two ranks or learning rates side by side. The tradeoff is that node-based training is fiddlier to reproduce than a saved TOML. New to ComfyUI itself? Our complete ComfyUI guide covers installation and the node basics you'll need before adding the trainer. You can also read the node documentation on the ComfyUI-FluxTrainer repository.

Recipes: character, style, and product LoRAs

The same engine, tuned differently for the three most common jobs:

  • Character LoRA. 10-20 images of one subject across varied poses/lighting, a unique trigger token, rank 16-32, ~1,500 steps. Keep backgrounds varied so the LoRA learns the face/body, not a setting. This is the most forgiving recipe and a great first project.
  • Style LoRA. 15-30 images that share an art style but differ in subject. Caption the content of each image and let the style be the constant the model absorbs. Slightly higher rank (32) helps capture broad stylistic range; watch for the LoRA "leaking" specific subjects from your dataset (a sign of too-high rank or too-few images).
  • Product LoRA. 15-25 clean shots of the product on neutral and varied backgrounds, multiple angles. Consistency of the object matters most — crop tightly, keep lighting honest, and avoid heavy filters that the model would memorize as part of the product.

How long does training take, and where do you publish it?

Approximate, single-machine figures. On an RTX 3090 (24 GB), I'd ballpark an SDXL LoRA at roughly 30-60 minutes for ~1,500 steps at 1024px, and a FLUX.1 LoRA noticeably longer — on the order of 1-3 hours for a comparable run — because of FLUX's 12B size. These are rough, hardware- and config-dependent numbers, not a benchmark; batch size, resolution, and whether you're block-swapping to system RAM swing them a lot. The moment FLUX has to offload blocks to RAM on a smaller card, training time climbs steeply.
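
The ballpark figures above imply a per-step speed that you can reuse to estimate other run lengths. This sketch only rearranges the numbers already quoted (30-60 minutes for ~1,500 SDXL steps); the function names are hypothetical and the result is no more precise than those rough inputs.

```python
def seconds_per_step(total_minutes: float, steps: int) -> float:
    """Per-step time implied by a run of `steps` steps taking `total_minutes`."""
    return total_minutes * 60 / steps


def estimated_minutes(steps: int, sec_per_step: float) -> float:
    """Projected wall-clock minutes for `steps` steps at a given per-step time."""
    return steps * sec_per_step / 60


fast = seconds_per_step(30, 1500)  # optimistic end of the SDXL ballpark
slow = seconds_per_step(60, 1500)  # pessimistic end of the SDXL ballpark
print(f"implied speed: {fast:.1f}-{slow:.1f} s/step")
print(f"2,000 steps: {estimated_minutes(2000, fast):.0f}-{estimated_minutes(2000, slow):.0f} min")
```

A more reliable approach is to time the first 50 or so steps of your own run and feed that per-step figure into the same calculation.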

When training finishes you get a single .safetensors file. Drop it in your models/Lora folder to use locally, and to share it, the standard hub is CivitAI — the largest community catalog of image models, hosting the overwhelming majority of public Stable Diffusion and FLUX LoRAs (tens of thousands of LoRA resources and counting). On CivitAI you upload the file as a "LoRA" resource, set a category and tags, write a description with example prompts, and choose your merge/commercial permissions before publishing.
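
Before uploading, it can be worth checking what is actually inside the .safetensors file. The format starts with an 8-byte little-endian length, followed by a JSON header of that many bytes, so the header can be read with the standard library alone. This is a sketch with hypothetical function names. Which metadata keys appear (for example a recorded network rank) depends on the trainer and is an assumption here.

```python
import json
import struct


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header of a .safetensors file without loading any tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # unsigned 64-bit, little-endian
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path: str) -> dict:
    """Return the tensor count and the free-form metadata stored in the header."""
    header = read_safetensors_header(path)
    metadata = header.pop("__metadata__", {})  # optional string-to-string map
    return {"tensor_count": len(header), "metadata": metadata}
```

Calling summarize on your trained file (the path is whatever --output_dir and --output_name produced) gives a quick sanity check that the file is a LoRA of the size you expect, and shows any training metadata that will be visible to people who download it.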

Key Takeaways

  1. Kohya's sd-scripts is the standard image-LoRA trainer. FluxGym, the bmaltais GUI, and ComfyUI-FluxTrainer all wrap it — pick the interface, the engine is the same.
  2. SDXL needs ~12 GB minimum (16 GB comfortable); FLUX wanted 24 GB but now runs on 16 GB and even 4-8 GB thanks to Kohya's fused backward pass (v0.9.0, Jan 2025) — with longer runs at the bottom.
  3. FLUX is more learning-rate sensitive than SDXL. Both sit near 1e-4, but sweep FLUX (5e-5 → 2e-4) and use a lower rank (16-32 vs SDXL's 32-64).
  4. 10-30 captioned images is enough. Quality and variety beat quantity; a character LoRA trains on as few as 10 good images.
  5. Train, then publish the .safetensors to CivitAI — the dominant community hub for image LoRAs.

Local AI Master Research Team

Creator of Local AI Master. I've built datasets with over 77,000 examples and trained AI models from scratch. Now I help people achieve AI independence through local AI mastery.

Published: June 20, 2026 · Last Updated: June 20, 2026 · Manually Reviewed

Written by the Local AI Master Team

The team behind Local AI Master

We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
