FLUX.1 VRAM Requirements & Local Setup Guide (2026)
FLUX Quick Reference
| Variant | Steps | License | VRAM (Q4) |
|---|---|---|---|
| FLUX.1 schnell | 1-4 | Apache 2.0 | 6-8GB |
| FLUX.1 dev | 20-30 | Non-Commercial | 6-8GB |
| FLUX.1 pro | Variable | API Only | — |
What is FLUX?
FLUX is a 12-billion-parameter text-to-image model from Black Forest Labs, a company founded by the researchers who originally created Stable Diffusion. Released in 2024, FLUX represents the next generation of open image generation.
Why FLUX Over Stable Diffusion?
| Feature | FLUX.1 | Stable Diffusion 3.5 |
|---|---|---|
| Photorealism | Excellent | Good |
| Typography | Excellent | Good (3.5), Poor (1.5/XL) |
| Human anatomy | Excellent | Struggles with fingers |
| Prompt adherence | Excellent | Good |
| Parameters | 12B | 2-8B |
Company Background
Black Forest Labs secured:
- $300M funding at $3.25B valuation (2025)
- $140M Meta partnership
- NVIDIA Blackwell integration
- Adobe Photoshop integration
FLUX Model Variants
FLUX.1 Family (12B Parameters)
| Variant | Steps | Quality | License |
|---|---|---|---|
| schnell | 1-4 | Good | Apache 2.0 (free commercial) |
| dev | 20-30 | High | Non-commercial |
| pro | Variable | Highest | API only |
FLUX.1 [schnell] ("fast" in German):
- Generates in just 1-4 steps via adversarial distillation
- Free for commercial use (Apache 2.0)
- Best for rapid prototyping
FLUX.1 [dev]:
- Guidance-distilled from pro
- Best quality for local use
- Requires commercial license for business use
FLUX.2 Family (32B Parameters)
Released November 2025 with major improvements:
- Multi-reference support (up to 10 images)
- 4-megapixel editing
- Complex typography and infographics
- Pairs a Mistral-3 24B vision-language model with the image generation backbone
| Variant | Parameters | Notes |
|---|---|---|
| FLUX.2 klein | 4B | Sub-second on consumer hardware |
| FLUX.2 dev/pro | 32B | Requires 54-90GB VRAM |
Hardware Requirements
VRAM by Precision
| Precision | VRAM | Quality | GPU Examples |
|---|---|---|---|
| FP16 (full) | 24-33GB | Maximum | RTX 4090, A6000 |
| FP8 | 12-16GB | Near-identical | RTX 4070 Ti, 3060 12GB |
| GGUF Q8 | 12-16GB | Near-identical | RTX 4070 Ti |
| GGUF Q5 | 8-10GB | 95%+ quality | RTX 4060, 3060 |
| GGUF Q4/NF4 | 6-8GB | Good | RTX 4060, 3060 |
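The precision tiers above can be turned into a simple chooser. A minimal sketch (the thresholds and labels come straight from the table; recommend_quant is an illustrative helper, not part of any library):

```python
def recommend_quant(vram_gb: float) -> str:
    """Map a GPU's VRAM to the FLUX.1 precision tier from the table above."""
    if vram_gb >= 24:
        return "FP16"              # full quality: RTX 4090, A6000
    if vram_gb >= 12:
        return "FP8 or GGUF Q8"    # near-identical quality
    if vram_gb >= 8:
        return "GGUF Q5"           # 95%+ quality
    if vram_gb >= 6:
        return "GGUF Q4/NF4"       # good quality
    return "not enough VRAM; use CPU offload or a smaller model"

print(recommend_quant(12))  # a 12GB card like the RTX 3060
```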
Recommended GPUs
High-End (Full Models):
| GPU | VRAM | Speed |
|---|---|---|
| RTX 5090 | 32GB | ~7 sec/image |
| RTX 4090 | 24GB | ~10-18 sec/image |
| H100 | 80GB | ~1.6 sec/image |
Mid-Range (Quantized):
| GPU | VRAM | Best Quantization |
|---|---|---|
| RTX 4070 Ti Super | 16GB | Q8 |
| RTX 3060 | 12GB | Q5/Q6 |
| RTX 4060 Ti | 16GB | Q6/Q8 |
Budget:
| GPU | VRAM | Max Quantization |
|---|---|---|
| RTX 3050 | 8GB | Q4/Q5 |
| GTX 1660 Ti | 6GB | Q3/Q4 |
Apple Silicon
| Chip | Memory | Time (1024x1024) |
|---|---|---|
| M4 Max | 32-128GB | ~85 sec |
| M3 Max | 32-128GB | ~105 sec |
| M2 Max | 32-96GB | ~145 sec |
Note: 2-4x slower than NVIDIA. Use Draw Things or Stability Matrix for best Mac support.
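For diffusers users on Mac, backend selection can be sketched as below. FLUX runs on Apple Silicon through PyTorch's MPS backend; the CUDA-then-MPS-then-CPU fallback order is this sketch's assumption, not a FLUX requirement:

```python
import torch

# Prefer CUDA (NVIDIA), then MPS (Apple Silicon), then plain CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"Running on: {device}")
```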
ComfyUI Setup
Step 1: Install ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt
Step 2: Download Required Files
Text Encoders (models/clip/):
| File | Size | Use |
|---|---|---|
| clip_l.safetensors | ~250MB | Required |
| t5xxl_fp16.safetensors | ~9.4GB | High VRAM |
| t5xxl_fp8_e4m3fn.safetensors | ~4.7GB | Low VRAM |
VAE (models/vae/):
| File | Size |
|---|---|
| flux_ae.safetensors | ~335MB |
UNET Model (models/unet/):
| File | VRAM | Quality |
|---|---|---|
| flux1-dev.safetensors | 24GB+ | Maximum |
| flux1-dev-fp8.safetensors | 12-16GB | Excellent |
| flux1-dev-Q8_0.gguf | 12-16GB | Excellent |
| flux1-dev-Q5_0.gguf | 8-10GB | Very good |
| flux1-dev-Q4_0.gguf | 6-8GB | Good |
Step 3: For GGUF Models (Low VRAM)
- Open ComfyUI Manager
- Install "ComfyUI-GGUF" node
- Restart ComfyUI
- Use GGUF-specific workflow
Step 4: Run ComfyUI
# Standard
python main.py
# Low VRAM (8-12GB)
python main.py --lowvram
# Very Low VRAM (6-8GB)
python main.py --lowvram --cpu-text-encoder
Forge WebUI Setup
Note: Automatic1111 does NOT support FLUX. Use Forge instead.
Installation
- Download the Forge one-click package (CUDA 12.1 + PyTorch 2.3.1)
- Extract the archive
- Run update.bat
- Run run.bat
Model Download
Download flux1-dev-bnb-nf4 from Hugging Face:
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main
Place in: stable-diffusion-webui-forge/models/Stable-diffusion/
Python/Diffusers Setup
import torch
from diffusers import FluxPipeline

# Load model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

# Memory optimization
pipe.enable_model_cpu_offload()

# Generate image
image = pipe(
    "A photorealistic portrait of a woman, golden hour lighting, "
    "shot on Fujifilm X-T5, 35mm f/1.4",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("output.png")
4-bit Quantization (Low VRAM)
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, BitsAndBytesConfig

# Quantize only the 12B transformer to 4-bit; text encoders and VAE stay in bf16
quantization_config = BitsAndBytesConfig(load_in_4bit=True)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
Prompting Guide
Prompt Structure
Subject + Action + Style + Context
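As a toy illustration of this template (build_prompt is a hypothetical helper for this article, not part of any FLUX tooling):

```python
def build_prompt(subject: str, action: str, style: str, context: str) -> str:
    """Join the Subject + Action + Style + Context slots into one
    natural-language prompt, skipping any empty slot."""
    parts = [p.strip() for p in (subject, action, style, context)]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "a weathered fisherman",
    "mending nets on a dock",
    "documentary photography",
    "golden hour, shallow depth of field",
)
print(prompt)
```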
Example Prompts
Photorealistic:
A weathered fisherman with deep wrinkles, wearing a yellow raincoat,
standing on a wooden dock at golden hour, dramatic rim lighting,
shot on Fujifilm X-T5, 35mm f/1.4
Artistic:
A bioluminescent forest with crystalline trees, ethereal mist
rising from an obsidian lake, otherworldly atmosphere,
hyper-detailed fantasy illustration
Typography:
A neon sign reading "OPEN 24 HOURS" in pink and blue,
mounted on a brick wall, rain-slicked street reflections,
night photography, shallow depth of field
Prompting Do's and Don'ts
| Do | Don't |
|---|---|
| Write naturally | Use prompt weights |
| Be specific | Use negative prompts |
| Include camera details | Overload with keywords |
| Layer foreground to background | Describe sequential actions |
Recommended Settings
FLUX.1 [dev]
| Setting | Value |
|---|---|
| Steps | 20-30 (25 optimal) |
| CFG Scale | 3.5 (art) or 1-3 (photo) |
| Sampler | Euler |
| Resolution | 1024x1024 |
| Seed | -1 (variety) |
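These settings map onto the diffusers call from the setup section roughly as follows (a sketch; the keyword names are FluxPipeline parameters and the values mirror the table above):

```python
# Recommended FLUX.1 [dev] defaults as FluxPipeline keyword arguments.
FLUX_DEV_SETTINGS = {
    "num_inference_steps": 25,  # 20-30 range; 25 is a good middle ground
    "guidance_scale": 3.5,      # drop toward 1-3 for photorealistic looks
    "width": 1024,
    "height": 1024,
}

# Usage (assuming `pipe` is an already-loaded FluxPipeline):
# image = pipe("your prompt here", **FLUX_DEV_SETTINGS).images[0]
```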
FLUX.1 [schnell]
| Setting | Value |
|---|---|
| Steps | 1-4 (up to 8 possible) |
| CFG Scale | 1 (schnell is guidance-distilled) |
| Sampler | Euler |
| Resolution | 1024x1024 |
Speed LoRAs
Use HyperFlux or FluxTurbo LoRAs to reduce dev from 25 steps to 4-9:
| LoRA | Steps | Quality |
|---|---|---|
| HyperFlux | 4-8 | 90%+ |
| FluxTurbo | 7-9 | 95%+ |
Memory Optimization
ComfyUI Launch Flags
python main.py \
--lowvram \
--cpu-text-encoder \
--preview-method none \
--disable-xformers
Flag Reference
| Flag | Effect |
|---|---|
| --lowvram | Aggressive memory management |
| --cpu-text-encoder | Offload T5 to CPU (saves 1-2GB) |
| --cpu-vae | Offload VAE to CPU |
| --preview-method none | Disable previews |
General Tips
- Close background apps (browsers, Discord)
- Reduce resolution for testing (768x768)
- Keep batch size at 1
- Use GGUF Q5 - 95%+ quality at 1/4 memory
- Restart ComfyUI between model changes
VRAM Rule of Thumb
GGUF file size ≈ VRAM usage
- Q8: ~12-13GB file = ~12-13GB VRAM
- Q5: ~6-8GB file = ~6-8GB VRAM
- Q4: ~4-6GB file = ~4-6GB VRAM
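The rule of thumb can be sanity-checked from parameter count and bits per weight. A rough sketch (the bits-per-weight figures are approximations for llama.cpp-style quant formats, and the function ignores metadata and mixed-precision layer overhead):

```python
def gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size (and hence VRAM footprint) in GB."""
    return params_billion * bits_per_weight / 8

# FLUX.1 is ~12B parameters; assumed effective bits per weight:
# Q8_0 ~8.5, Q5_0 ~5.5, Q4_0 ~4.5 (llama.cpp-style formats).
for name, bpw in [("Q8_0", 8.5), ("Q5_0", 5.5), ("Q4_0", 4.5)]:
    print(f"{name}: ~{gguf_size_gb(12, bpw):.1f} GB")
```

The Q8_0 estimate lands around 12.8GB, matching the ~12-13GB figure above.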
FLUX ControlNets
Available Tools
| Tool | Purpose | Location |
|---|---|---|
| Canny | Edge-guided | models/diffusion_models/ |
| Depth | Depth-map control | models/diffusion_models/ |
| Redux | Image mixing | models/style_models/ |
| Fill | Inpainting | models/diffusion_models/ |
Download
Full models from Hugging Face:
- flux1-canny-dev.safetensors
- flux1-depth-dev.safetensors
LoRA versions for lower VRAM:
- flux1-canny-dev-lora.safetensors
- flux1-depth-dev-lora.safetensors
Redux requires sigclip_vision encoder in models/clip_vision/.
Performance Benchmarks
Generation Speed
| GPU | Resolution | Steps | Time |
|---|---|---|---|
| RTX 5090 | 1024x1024 | 20 | ~7 sec |
| RTX 4090 | 1024x1024 | 20 | ~10-18 sec |
| RTX 4090 (first run, includes model load) | 1024x1024 | 20 | ~41 sec |
| M4 Max | 1024x1024 | 20 | ~85 sec |
Quality vs Speed Trade-off
| Model | Steps | Speed | Quality |
|---|---|---|---|
| schnell | 4 | Fastest | Good |
| dev + HyperFlux | 8 | Fast | Very good |
| dev | 25 | Moderate | Excellent |
| dev | 30 | Slower | Maximum |
Key Takeaways
- FLUX.1 schnell is Apache 2.0 - Free for commercial use
- 8GB GPUs work with GGUF Q4/Q5 quantization
- RTX 4090 generates in 10-18 seconds at full quality
- Natural language prompting - No weights or negatives
- Use the FP8 T5 encoder to save ~5GB VRAM
- Apple Silicon is 2-4x slower but works with MPS
- FLUX.2 requires 54-90GB VRAM - Most stay on FLUX.1
Next Steps
- Check VRAM requirements for your GPU
- Compare with RTX 5090 for upgrades
- Learn quantization techniques
- Explore local AI tools for LLMs
- Set up RAG for text-based AI
FLUX represents the cutting edge of open-source image generation, delivering Midjourney-level quality that runs on consumer hardware. Whether you're using a high-end RTX 4090 for instant generation or an 8GB GPU with quantized models, FLUX enables professional-quality AI art creation without cloud dependencies or API costs.