
FLUX Local Setup: Run AI Image Generation (2026 Guide)

February 6, 2026
18 min read
Local AI Master Research Team
šŸŽ 4 PDFs included
Newsletter

Before we dive deeper...

Get your free AI Starter Kit

Join 12,000+ developers. Instant download: Career Roadmap + Fundamentals Cheat Sheets.

No spam, everUnsubscribe anytime
12,000+ downloads

FLUX Quick Reference

| Variant | Steps | License | VRAM (Q4) |
|---|---|---|---|
| FLUX.1 schnell | 1-4 | Apache 2.0 | 6-8GB |
| FLUX.1 dev | 20-30 | Non-Commercial | 6-8GB |
| FLUX.1 pro | Variable | API Only | — |

Best for 8GB GPU: GGUF Q4/Q5 | Best for 24GB: FP16 full precision

What is FLUX?

FLUX is a 12-billion-parameter text-to-image model from Black Forest Labs, a company founded by the researchers who originally created Stable Diffusion. Released in 2024, FLUX represents the next generation of open image generation.

Why FLUX Over Stable Diffusion?

| Feature | FLUX.1 | Stable Diffusion 3.5 |
|---|---|---|
| Photorealism | Excellent | Good |
| Typography | Excellent | Good (3.5), Poor (1.5/XL) |
| Human anatomy | Excellent | Struggles with fingers |
| Prompt adherence | Excellent | Good |
| Parameters | 12B | 2-8B |

Company Background

Black Forest Labs has secured major funding and industry partnerships:

  • $300M funding at $3.25B valuation (2025)
  • $140M Meta partnership
  • NVIDIA Blackwell integration
  • Adobe Photoshop integration

FLUX Model Variants

FLUX.1 Family (12B Parameters)

| Variant | Steps | Quality | License |
|---|---|---|---|
| schnell | 1-4 | Good | Apache 2.0 (free commercial) |
| dev | 20-30 | High | Non-commercial |
| pro | Variable | Highest | API only |

FLUX.1 [schnell] ("fast" in German):

  • Generates in just 1-4 steps via adversarial distillation
  • Free for commercial use (Apache 2.0)
  • Best for rapid prototyping

FLUX.1 [dev]:

  • Guidance-distilled from pro
  • Best quality for local use
  • Requires commercial license for business use
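
The licensing split above reduces to a simple decision. As a sketch (the helper name and inputs are our own, covering only the two locally runnable variants):

```python
def pick_flux_variant(commercial_use: bool, want_max_quality: bool) -> str:
    """Pick a locally runnable FLUX.1 variant per the table above.

    schnell is the only Apache-2.0 (commercial-friendly) local variant;
    dev gives the best local quality but needs a license for business use.
    """
    if commercial_use:
        return "FLUX.1 schnell"
    return "FLUX.1 dev" if want_max_quality else "FLUX.1 schnell"

print(pick_flux_variant(commercial_use=False, want_max_quality=True))  # FLUX.1 dev
```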

FLUX.2 Family (32B Parameters)

Released November 2025 with major improvements:

  • Multi-reference support (up to 10 images)
  • 4-megapixel editing
  • Complex typography and infographics
  • Uses the Mistral-3 24B vision-language model as its text encoder

| Variant | Parameters | Notes |
|---|---|---|
| FLUX.2 klein | 4B | Sub-second on consumer hardware |
| FLUX.2 dev/pro | 32B | Requires 54-90GB VRAM |

Hardware Requirements

VRAM by Precision

| Precision | VRAM | Quality | GPU Examples |
|---|---|---|---|
| FP16 (full) | 24-33GB | Maximum | RTX 4090, A6000 |
| FP8 | 12-16GB | Near-identical | RTX 4070 Ti, 3060 12GB |
| GGUF Q8 | 12-16GB | Near-identical | RTX 4070 Ti |
| GGUF Q5 | 8-10GB | 95%+ quality | RTX 4060, 3060 |
| GGUF Q4/NF4 | 6-8GB | Good | RTX 4060, 3060 |
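
These tiers follow almost directly from bits per weight: a model needs roughly params Ɨ bits / 8 bytes just to hold its weights. A back-of-envelope check (our own helper; real GGUF files carry per-block overhead, so effective bits sit slightly above nominal, and the text encoder and VAE add on top):

```python
def weight_vram_gb(params: float, bits_per_weight: float) -> float:
    """Approximate GB needed just to hold the model weights."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 12e9  # FLUX.1 transformer
for name, bits in [("FP16", 16), ("FP8 / Q8", 8), ("Q5", 5.5), ("Q4", 4.5)]:
    print(f"{name:9s} ~{weight_vram_gb(PARAMS, bits):.1f} GB")
```

The FP16 figure (~24 GB) explains why full precision needs a 24GB-class card before encoders and activations are counted.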

High-End (Full Models):

| GPU | VRAM | Speed |
|---|---|---|
| RTX 5090 | 32GB | ~7 sec/image |
| RTX 4090 | 24GB | ~10-18 sec/image |
| H100 | 80GB | ~1.6 sec/image |

Mid-Range (Quantized):

| GPU | VRAM | Best Quantization |
|---|---|---|
| RTX 4070 Ti Super | 16GB | Q8 |
| RTX 3060 | 12GB | Q5/Q6 |
| RTX 4060 Ti | 16GB | Q6/Q8 |

Budget:

| GPU | VRAM | Max Quantization |
|---|---|---|
| RTX 3050 | 8GB | Q4/Q5 |
| GTX 1660 Ti | 6GB | Q3/Q4 |

Apple Silicon

| Chip | Memory | Time (1024x1024) |
|---|---|---|
| M4 Max | 32-128GB | ~85 sec |
| M3 Max | 32-128GB | ~105 sec |
| M2 Max | 32-96GB | ~145 sec |

Note: 2-4x slower than NVIDIA. Use Draw Things or Stability Matrix for best Mac support.


ComfyUI Setup

Step 1: Install ComfyUI

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

Step 2: Download Required Files

Text Encoders (models/clip/):

| File | Size | Use |
|---|---|---|
| clip_l.safetensors | ~250MB | Required |
| t5xxl_fp16.safetensors | ~9.4GB | High VRAM |
| t5xxl_fp8_e4m3fn.safetensors | ~4.7GB | Low VRAM |

VAE (models/vae/):

| File | Size |
|---|---|
| flux_ae.safetensors | ~335MB |

UNET Model (models/unet/):

| File | VRAM | Quality |
|---|---|---|
| flux1-dev.safetensors | 24GB+ | Maximum |
| flux1-dev-fp8.safetensors | 12-16GB | Excellent |
| flux1-dev-Q8_0.gguf | 12-16GB | Excellent |
| flux1-dev-Q5_0.gguf | 8-10GB | Very good |
| flux1-dev-Q4_0.gguf | 6-8GB | Good |
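
Once the downloads finish, the layout can be verified in a few lines. A minimal sketch (filenames taken from the tables above; the Q5 UNET is just one example choice, swap in whichever variant you downloaded):

```python
from pathlib import Path

# Expected ComfyUI model layout for FLUX (per the tables above)
EXPECTED = {
    "models/clip": ["clip_l.safetensors", "t5xxl_fp8_e4m3fn.safetensors"],
    "models/vae": ["flux_ae.safetensors"],
    "models/unet": ["flux1-dev-Q5_0.gguf"],
}

def missing_files(comfy_root: str) -> list[str]:
    """Return the expected model files that are not present yet."""
    root = Path(comfy_root)
    return [f"{d}/{name}" for d, names in EXPECTED.items()
            for name in names if not (root / d / name).exists()]

print(missing_files("ComfyUI"))  # lists whatever still needs downloading
```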

Step 3: For GGUF Models (Low VRAM)

  1. Open ComfyUI Manager
  2. Install "ComfyUI-GGUF" node
  3. Restart ComfyUI
  4. Use GGUF-specific workflow

Step 4: Run ComfyUI

# Standard
python main.py

# Low VRAM (8-12GB)
python main.py --lowvram

# Very Low VRAM (6-8GB)
python main.py --lowvram --cpu-text-encoder

Forge WebUI Setup

Note: Automatic1111 does NOT support FLUX. Use Forge instead.

Installation

  1. Download Forge one-click package (CUDA 12.1 + PyTorch 2.3.1)
  2. Extract and run update.bat
  3. Run run.bat

Model Download

Download flux1-dev-bnb-nf4 from Hugging Face:

https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main

Place in: stable-diffusion-webui-forge/models/Stable-diffusion/


Python/Diffusers Setup

import torch
from diffusers import FluxPipeline

# Load model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16
)

# Memory optimization
pipe.enable_model_cpu_offload()

# Generate image
image = pipe(
    "A photorealistic portrait of a woman, golden hour lighting, "
    "shot on Fujifilm X-T5, 35mm f/1.4",
    num_inference_steps=28,
    guidance_scale=3.5
).images[0]

image.save("output.png")

4-bit Quantization (Low VRAM)

import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# Quantize only the 12B transformer to 4-bit NF4 (bitsandbytes)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer",
    quantization_config=quantization_config, torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer, torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

Prompting Guide

Prompt Structure

Subject + Action + Style + Context
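
That structure can be mechanized with a small helper (the function name is our own) that joins the four parts into the kind of natural-language prompt FLUX responds to:

```python
def build_prompt(subject: str, action: str, style: str, context: str) -> str:
    """Join subject + action + style + context into one natural-language prompt."""
    parts = [p.strip() for p in (subject, action, style, context)]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "A weathered fisherman in a yellow raincoat",
    "standing on a wooden dock",
    "dramatic rim lighting, shot on Fujifilm X-T5",
    "golden hour",
)
print(prompt)
```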

Example Prompts

Photorealistic:

A weathered fisherman with deep wrinkles, wearing a yellow raincoat,
standing on a wooden dock at golden hour, dramatic rim lighting,
shot on Fujifilm X-T5, 35mm f/1.4

Artistic:

A bioluminescent forest with crystalline trees, ethereal mist
rising from an obsidian lake, otherworldly atmosphere,
hyper-detailed fantasy illustration

Typography:

A neon sign reading "OPEN 24 HOURS" in pink and blue,
mounted on a brick wall, rain-slicked street reflections,
night photography, shallow depth of field

Prompting Do's and Don'ts

| Do | Don't |
|---|---|
| Write naturally | Use prompt weights |
| Be specific | Use negative prompts |
| Include camera details | Overload with keywords |
| Layer foreground to background | Describe sequential actions |

Recommended Settings

FLUX.1 [dev]

| Setting | Value |
|---|---|
| Steps | 20-30 (25 optimal) |
| CFG Scale | 3.5 (art) or 1-3 (photo) |
| Sampler | Euler |
| Resolution | 1024x1024 |
| Seed | -1 (variety) |

FLUX.1 [schnell]

| Setting | Value |
|---|---|
| Steps | 1-4 (up to 8 possible) |
| CFG Scale | 4-9 |
| Sampler | Euler |
| Resolution | 1024x1024 |

Speed LoRAs

Use HyperFlux or FluxTurbo LoRAs to reduce dev from 25 steps to 4-9:

| LoRA | Steps | Quality |
|---|---|---|
| HyperFlux | 4-8 | 90%+ |
| FluxTurbo | 7-9 | 95%+ |

Memory Optimization

ComfyUI Launch Flags

python main.py \
  --lowvram \
  --cpu-text-encoder \
  --preview-method none \
  --disable-xformers

Flag Reference

| Flag | Effect |
|---|---|
| --lowvram | Aggressive memory management |
| --cpu-text-encoder | Offload T5 to CPU (saves 1-2GB) |
| --cpu-vae | Offload VAE to CPU |
| --preview-method none | Disable previews |
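
The VRAM tiers used throughout this guide condense into a simple picker. A sketch (the thresholds are our reading of the tiers above, not official ComfyUI guidance):

```python
def comfyui_flags(vram_gb: float) -> list[str]:
    """Suggest ComfyUI launch flags for a given amount of VRAM."""
    if vram_gb >= 16:        # full or FP8/Q8 models fit comfortably
        return []
    if vram_gb >= 10:        # 10-16GB: aggressive memory management
        return ["--lowvram"]
    # 6-8GB: also push the T5 text encoder to system RAM
    return ["--lowvram", "--cpu-text-encoder"]

print("python main.py " + " ".join(comfyui_flags(8.0)))
```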

General Tips

  1. Close background apps (browsers, Discord)
  2. Reduce resolution for testing (768x768)
  3. Keep batch size at 1
  4. Use GGUF Q5 - 95%+ quality at 1/4 memory
  5. Restart ComfyUI between model changes

VRAM Rule of Thumb

GGUF file size ā‰ˆ VRAM usage

  • Q8: ~12-13GB file = ~12-13GB VRAM
  • Q5: ~6-8GB file = ~6-8GB VRAM
  • Q4: ~4-6GB file = ~4-6GB VRAM

FLUX ControlNets

Available Tools

| Tool | Purpose | Location |
|---|---|---|
| Canny | Edge-guided | models/diffusion_models/ |
| Depth | Depth-map control | models/diffusion_models/ |
| Redux | Image mixing | models/style_models/ |
| Fill | Inpainting | models/diffusion_models/ |

Download

Full models from Hugging Face:

  • flux1-canny-dev.safetensors
  • flux1-depth-dev.safetensors

LoRA versions for lower VRAM:

  • flux1-canny-dev-lora.safetensors
  • flux1-depth-dev-lora.safetensors

Redux requires sigclip_vision encoder in models/clip_vision/.


Performance Benchmarks

Generation Speed

| GPU | Resolution | Steps | Time |
|---|---|---|---|
| RTX 5090 | 1024x1024 | 20 | ~7 sec |
| RTX 4090 | 1024x1024 | 20 | ~10-18 sec |
| RTX 4090 (first run) | 1024x1024 | 20 | ~41 sec |
| M4 Max | 1024x1024 | 20 | ~85 sec |
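
For planning batch jobs, these times convert directly to throughput. A quick sketch using midpoints from the table above:

```python
# Seconds per image at 1024x1024, 20 steps (midpoints from the table above)
SEC_PER_IMAGE = {"H100": 1.6, "RTX 5090": 7.0, "RTX 4090": 14.0, "M4 Max": 85.0}

def images_per_hour(gpu: str) -> float:
    """Sustained throughput, ignoring one-time model-load cost."""
    return 3600 / SEC_PER_IMAGE[gpu]

for gpu in SEC_PER_IMAGE:
    print(f"{gpu:9s} ~{images_per_hour(gpu):.0f} images/hour")
```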

Quality vs Speed Trade-off

| Model | Steps | Speed | Quality |
|---|---|---|---|
| schnell | 4 | Fastest | Good |
| dev + HyperFlux | 8 | Fast | Very good |
| dev | 25 | Moderate | Excellent |
| dev | 30 | Slower | Maximum |

Key Takeaways

  1. FLUX.1 schnell is Apache 2.0 - Free for commercial use
  2. 8GB GPUs work with GGUF Q4/Q5 quantization
  3. RTX 4090 generates in 10-18 seconds at full quality
  4. Natural language prompting - No weights or negatives
  5. Use FP8 T5 encoder to save 5GB VRAM
  6. Apple Silicon is 2-4x slower but works with MPS
  7. FLUX.2 requires 54-90GB VRAM - Most stay on FLUX.1

Next Steps

  1. Check VRAM requirements for your GPU
  2. Compare with RTX 5090 for upgrades
  3. Learn quantization techniques
  4. Explore local AI tools for LLMs
  5. Set up RAG for text-based AI

FLUX represents the cutting edge of open-source image generation, delivering Midjourney-level quality that runs on consumer hardware. Whether you're using a high-end RTX 4090 for instant generation or an 8GB GPU with quantized models, FLUX enables professional-quality AI art creation without cloud dependencies or API costs.


Published: February 6, 2026 | Last Updated: February 6, 2026

Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset
