FLUX Local Setup: Run AI Image Generation (2026 Guide)
FLUX Quick Reference
| Variant | Steps | License | VRAM (Q4) |
|---|---|---|---|
| FLUX.1 schnell | 1-4 | Apache 2.0 | 6-8GB |
| FLUX.1 dev | 20-30 | Non-Commercial | 6-8GB |
| FLUX.1 pro | Variable | API Only | N/A |
What is FLUX?
FLUX is a 12-billion parameter text-to-image model from Black Forest Labs, the same team that created Stable Diffusion. Released in 2024, FLUX represents the next generation of open image generation.
Why FLUX Over Stable Diffusion?
| Feature | FLUX.1 | Stable Diffusion 3.5 |
|---|---|---|
| Photorealism | Excellent | Good |
| Typography | Excellent | Good (3.5), Poor (1.5/XL) |
| Human anatomy | Excellent | Struggles with fingers |
| Prompt adherence | Excellent | Good |
| Parameters | 12B | 2-8B |
Company Background
Black Forest Labs secured:
- $300M funding at $3.25B valuation (2025)
- $140M Meta partnership
- NVIDIA Blackwell integration
- Adobe Photoshop integration
FLUX Model Variants
FLUX.1 Family (12B Parameters)
| Variant | Steps | Quality | License |
|---|---|---|---|
| schnell | 1-4 | Good | Apache 2.0 (free commercial) |
| dev | 20-30 | High | Non-commercial |
| pro | Variable | Highest | API only |
FLUX.1 [schnell] ("fast" in German):
- Generates in just 1-4 steps via adversarial distillation
- Free for commercial use (Apache 2.0)
- Best for rapid prototyping
FLUX.1 [dev]:
- Guidance-distilled from pro
- Best quality for local use
- Requires commercial license for business use
FLUX.2 Family (32B Parameters)
Released November 2025 with major improvements:
- Multi-reference support (up to 10 images)
- 4-megapixel editing
- Complex typography and infographics
- Built around a Mistral-3 24B vision-language model as its text encoder
| Variant | Parameters | Notes |
|---|---|---|
| FLUX.2 klein | 4B | Sub-second on consumer hardware |
| FLUX.2 dev/pro | 32B | Requires 54-90GB VRAM |
Hardware Requirements
VRAM by Precision
| Precision | VRAM | Quality | GPU Examples |
|---|---|---|---|
| FP16 (full) | 24-33GB | Maximum | RTX 4090, A6000 |
| FP8 | 12-16GB | Near-identical | RTX 4070 Ti, 3060 12GB |
| GGUF Q8 | 12-16GB | Near-identical | RTX 4070 Ti |
| GGUF Q5 | 8-10GB | 95%+ quality | RTX 4060, 3060 |
| GGUF Q4/NF4 | 6-8GB | Good | RTX 4060, 3060 |
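The table above can be collapsed into a quick rule-of-thumb helper. This is a minimal sketch with thresholds taken from the table; the function name is illustrative, and real headroom depends on resolution, OS overhead, and which text encoder you load:

```python
def pick_flux_precision(vram_gb: float) -> str:
    """Suggest a FLUX.1 precision tier for a given amount of GPU VRAM.

    Thresholds follow the VRAM-by-precision table above; treat the
    result as a starting point, not a hard requirement.
    """
    if vram_gb >= 24:
        return "FP16 (full quality)"
    if vram_gb >= 12:
        return "FP8 or GGUF Q8 (near-identical quality)"
    if vram_gb >= 8:
        return "GGUF Q5 (95%+ quality)"
    if vram_gb >= 6:
        return "GGUF Q4/NF4 (good quality)"
    return "Below 6GB: consider heavier offloading or cloud/API options"

print(pick_flux_precision(24))  # RTX 4090
print(pick_flux_precision(12))  # RTX 3060
print(pick_flux_precision(8))   # RTX 3050
```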
Recommended GPUs
High-End (Full Models):
| GPU | VRAM | Speed |
|---|---|---|
| RTX 5090 | 32GB | ~7 sec/image |
| RTX 4090 | 24GB | ~10-18 sec/image |
| H100 | 80GB | ~1.6 sec/image |
Mid-Range (Quantized):
| GPU | VRAM | Best Quantization |
|---|---|---|
| RTX 4070 Ti Super | 16GB | Q8 |
| RTX 3060 | 12GB | Q5/Q6 |
| RTX 4060 Ti | 16GB | Q6/Q8 |
Budget:
| GPU | VRAM | Max Quantization |
|---|---|---|
| RTX 3050 | 8GB | Q4/Q5 |
| GTX 1660 Ti | 6GB | Q3/Q4 |
Apple Silicon
| Chip | Memory | Time (1024x1024) |
|---|---|---|
| M4 Max | 32-128GB | ~85 sec |
| M3 Max | 32-128GB | ~105 sec |
| M2 Max | 32-96GB | ~145 sec |
Note: 2-4x slower than NVIDIA. Use Draw Things or Stability Matrix for best Mac support.
ComfyUI Setup
Step 1: Install ComfyUI
```bash
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
```
Step 2: Download Required Files
Text Encoders (models/clip/):
| File | Size | Use |
|---|---|---|
| clip_l.safetensors | ~250MB | Required |
| t5xxl_fp16.safetensors | ~9.4GB | High VRAM |
| t5xxl_fp8_e4m3fn.safetensors | ~4.7GB | Low VRAM |
VAE (models/vae/):
| File | Size |
|---|---|
| flux_ae.safetensors | ~335MB |
UNET Model (models/unet/):
| File | VRAM | Quality |
|---|---|---|
| flux1-dev.safetensors | 24GB+ | Maximum |
| flux1-dev-fp8.safetensors | 12-16GB | Excellent |
| flux1-dev-Q8_0.gguf | 12-16GB | Excellent |
| flux1-dev-Q5_0.gguf | 8-10GB | Very good |
| flux1-dev-Q4_0.gguf | 6-8GB | Good |
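One way to script these downloads is with the huggingface_hub library. The sketch below only defines a download plan and helper; the repo IDs are assumptions based on commonly used hosting locations and should be verified on Hugging Face before use (FLUX.1-dev in particular is a gated repo that requires accepting the license and logging in):

```python
# (repo_id, filename, ComfyUI subfolder) -- repo IDs are assumptions; verify
# them on Hugging Face first. Note the VAE is published as ae.safetensors.
DOWNLOAD_PLAN = [
    ("comfyanonymous/flux_text_encoders", "clip_l.safetensors", "models/clip"),
    ("comfyanonymous/flux_text_encoders", "t5xxl_fp8_e4m3fn.safetensors", "models/clip"),
    ("black-forest-labs/FLUX.1-dev", "ae.safetensors", "models/vae"),
    ("black-forest-labs/FLUX.1-dev", "flux1-dev.safetensors", "models/unet"),
]

def fetch_all(comfy_root: str = "ComfyUI") -> None:
    """Download every file in the plan into the matching ComfyUI folder."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    for repo_id, filename, subdir in DOWNLOAD_PLAN:
        hf_hub_download(repo_id=repo_id, filename=filename,
                        local_dir=f"{comfy_root}/{subdir}")

# fetch_all()  # uncomment to download (tens of GB; FLUX.1-dev needs HF login)
```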
Step 3: For GGUF Models (Low VRAM)
- Open ComfyUI Manager
- Install "ComfyUI-GGUF" node
- Restart ComfyUI
- Use GGUF-specific workflow
Step 4: Run ComfyUI
```bash
# Standard
python main.py

# Low VRAM (8-12GB)
python main.py --lowvram

# Very Low VRAM (6-8GB)
python main.py --lowvram --cpu-text-encoder
```
Forge WebUI Setup
Note: Automatic1111 does NOT support FLUX. Use Forge instead.
Installation
- Download Forge one-click package (CUDA 12.1 + PyTorch 2.3.1)
- Extract and run update.bat
- Run run.bat
Model Download
Download flux1-dev-bnb-nf4 from Hugging Face:
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main
Place in: stable-diffusion-webui-forge/models/Stable-diffusion/
Python/Diffusers Setup
```python
import torch
from diffusers import FluxPipeline

# Load model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)

# Memory optimization
pipe.enable_model_cpu_offload()

# Generate image
image = pipe(
    "A photorealistic portrait of a woman, golden hour lighting, "
    "shot on Fujifilm X-T5, 35mm f/1.4",
    num_inference_steps=28,
    guidance_scale=3.5
).images[0]
image.save("output.png")
```
4-bit Quantization (Low VRAM)
```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Quantize only the 12B transformer to 4-bit via bitsandbytes
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```
Prompting Guide
Prompt Structure
Subject + Action + Style + Context
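The Subject + Action + Style + Context structure can be sketched as a tiny helper. This is purely illustrative (the function name and slot order are my own); FLUX favors natural language, so the slots are simply joined into one flowing description:

```python
def build_prompt(subject: str, action: str, style: str, context: str) -> str:
    """Assemble a FLUX prompt from the four slots, skipping empty ones."""
    parts = (subject, action, style, context)
    return ", ".join(p.strip() for p in parts if p.strip())

prompt = build_prompt(
    subject="A weathered fisherman with deep wrinkles",
    action="standing on a wooden dock at golden hour",
    style="shot on Fujifilm X-T5, 35mm f/1.4",
    context="dramatic rim lighting",
)
print(prompt)
```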
Example Prompts
Photorealistic:
A weathered fisherman with deep wrinkles, wearing a yellow raincoat,
standing on a wooden dock at golden hour, dramatic rim lighting,
shot on Fujifilm X-T5, 35mm f/1.4
Artistic:
A bioluminescent forest with crystalline trees, ethereal mist
rising from an obsidian lake, otherworldly atmosphere,
hyper-detailed fantasy illustration
Typography:
A neon sign reading "OPEN 24 HOURS" in pink and blue,
mounted on a brick wall, rain-slicked street reflections,
night photography, shallow depth of field
Prompting Do's and Don'ts
| Do | Don't |
|---|---|
| Write naturally | Use prompt weights |
| Be specific | Use negative prompts |
| Include camera details | Overload with keywords |
| Layer foreground to background | Describe sequential actions |
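Some of the don'ts can be checked mechanically. Below is a toy linter (the regex and messages are my own invention) that flags Stable-Diffusion-style attention weights, which FLUX ignores, and comma-separated keyword soup:

```python
import re

def lint_flux_prompt(prompt: str) -> list[str]:
    """Flag Stable Diffusion prompting habits that don't carry over to FLUX."""
    warnings = []
    # (word:1.2) attention-weight syntax from SD WebUIs
    if re.search(r"\([^()]+:\d+(\.\d+)?\)", prompt):
        warnings.append("prompt weights like (word:1.2) are ignored by FLUX")
    # Keyword soup: many short comma-separated tokens instead of sentences
    parts = [p.strip() for p in prompt.split(",")]
    if len(parts) > 12 and all(len(p.split()) <= 2 for p in parts):
        warnings.append("reads like keyword soup; prefer natural sentences")
    return warnings

print(lint_flux_prompt("a castle, (epic:1.3), sunset"))
print(lint_flux_prompt("A quiet castle at sunset, painted in watercolor"))
```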
Recommended Settings
FLUX.1 [dev]
| Setting | Value |
|---|---|
| Steps | 20-30 (25 optimal) |
| CFG Scale | 3.5 (art) or 1-3 (photo) |
| Sampler | Euler |
| Resolution | 1024x1024 |
| Seed | -1 (variety) |
FLUX.1 [schnell]
| Setting | Value |
|---|---|
| Steps | 1-4 (up to 8 possible) |
| CFG Scale | 1 (schnell is guidance-distilled; higher values unnecessary) |
| Sampler | Euler |
| Resolution | 1024x1024 |
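The two settings tables can be captured in a single lookup, handy for scripting batch runs. Values mirror the tables above (schnell's CFG is pinned at 1 because the model is guidance-distilled); the names are illustrative:

```python
# Recommended starting points per variant; tune per workflow.
RECOMMENDED = {
    "dev":     {"steps": 25, "cfg": 3.5, "sampler": "Euler", "resolution": (1024, 1024)},
    "schnell": {"steps": 4,  "cfg": 1.0, "sampler": "Euler", "resolution": (1024, 1024)},
}

def settings_for(variant: str) -> dict:
    """Return recommended generation settings for a FLUX.1 variant."""
    try:
        return RECOMMENDED[variant]
    except KeyError:
        raise ValueError(f"unknown FLUX variant: {variant!r}") from None

print(settings_for("dev")["steps"])      # 25
print(settings_for("schnell")["steps"])  # 4
```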
Speed LoRAs
Use HyperFlux or FluxTurbo LoRAs to reduce dev from 25 steps to 4-9:
| LoRA | Steps | Quality |
|---|---|---|
| HyperFlux | 4-8 | 90%+ |
| FluxTurbo | 7-9 | 95%+ |
Memory Optimization
ComfyUI Launch Flags
```bash
python main.py \
  --lowvram \
  --cpu-text-encoder \
  --preview-method none \
  --disable-xformers
```
Flag Reference
| Flag | Effect |
|---|---|
| --lowvram | Aggressive memory management |
| --cpu-text-encoder | Offload T5 to CPU (saves 1-2GB) |
| --cpu-vae | Offload VAE to CPU |
| --preview-method none | Disable previews |
General Tips
- Close background apps (browsers, Discord)
- Reduce resolution for testing (768x768)
- Keep batch size at 1
- Use GGUF Q5 - 95%+ quality at 1/4 memory
- Restart ComfyUI between model changes
VRAM Rule of Thumb
GGUF file size ≈ VRAM usage
- Q8: ~12-13GB file = ~12-13GB VRAM
- Q5: ~6-8GB file = ~6-8GB VRAM
- Q4: ~4-6GB file = ~4-6GB VRAM
FLUX ControlNets
Available Tools
| Tool | Purpose | Location |
|---|---|---|
| Canny | Edge-guided | models/diffusion_models/ |
| Depth | Depth-map control | models/diffusion_models/ |
| Redux | Image mixing | models/style_models/ |
| Fill | Inpainting | models/diffusion_models/ |
Download
Full models from Hugging Face:
- flux1-canny-dev.safetensors
- flux1-depth-dev.safetensors
LoRA versions for lower VRAM:
- flux1-canny-dev-lora.safetensors
- flux1-depth-dev-lora.safetensors
Redux requires sigclip_vision encoder in models/clip_vision/.
Performance Benchmarks
Generation Speed
| GPU | Resolution | Steps | Time |
|---|---|---|---|
| RTX 5090 | 1024x1024 | 20 | ~7 sec |
| RTX 4090 | 1024x1024 | 20 | ~10-18 sec |
| RTX 4090 (first run, includes model load) | 1024x1024 | 20 | ~41 sec |
| M4 Max | 1024x1024 | 20 | ~85 sec |
Quality vs Speed Trade-off
| Model | Steps | Speed | Quality |
|---|---|---|---|
| schnell | 4 | Fastest | Good |
| dev + HyperFlux | 8 | Fast | Very good |
| dev | 25 | Moderate | Excellent |
| dev | 30 | Slower | Maximum |
Key Takeaways
- FLUX.1 schnell is Apache 2.0 - Free for commercial use
- 8GB GPUs work with GGUF Q4/Q5 quantization
- RTX 4090 generates in 10-18 seconds at full quality
- Natural language prompting - No weights or negatives
- Use FP8 T5 encoder to save 5GB VRAM
- Apple Silicon is 2-4x slower but works with MPS
- FLUX.2 requires 54-90GB VRAM - Most stay on FLUX.1
Next Steps
- Check VRAM requirements for your GPU
- Compare with RTX 5090 for upgrades
- Learn quantization techniques
- Explore local AI tools for LLMs
- Set up RAG for text-based AI
FLUX represents the cutting edge of open-source image generation, delivering Midjourney-level quality that runs on consumer hardware. Whether you're using a high-end RTX 4090 for instant generation or an 8GB GPU with quantized models, FLUX enables professional-quality AI art creation without cloud dependencies or API costs.