FLUX VRAM Requirements by GPU (2026): 8GB to 24GB Guide
FLUX.1 [dev] (a 12B-parameter model) needs roughly 24 GB of VRAM at full FP16, about 12 GB at FP8, and as little as 6-8 GB with a GGUF Q4 quant, which is why an 8GB card such as the RTX 3060 Ti or 4060 can still run it. The best file to download depends on your VRAM: Q4_K_S/Q4_0 GGUF for 8GB, Q5/Q6 GGUF for 12GB, FP8 (flux1-dev-fp8.safetensors, ~11.9 GB) for 16GB, and the full FP16 flux1-dev.safetensors (~23.8 GB) for 24GB. The newer FLUX.2 [dev] is a 32B model built for data-center GPUs: it overflows a single 80GB H100 at full BF16, needs ~32 GB at FP8, and drops to ~19 GB only as a Q4 GGUF with the text encoder offloaded. For most local GPUs the right answer is therefore still FLUX.1, or the new FLUX.2 [klein] 4B.
This guide gives you a definitive "what runs on my card" table, the exact GGUF/FP8 file to grab per VRAM tier with file sizes in GB, the ComfyUI low-VRAM flags that make it fit, and the generation-time tradeoffs you accept when you quantize down.
What is the best FLUX model for 8GB, 12GB, and 16GB VRAM?
Here is the direct verdict before the detail. These pair a FLUX variant with the specific quantized file most people actually run on each card:
| Your VRAM | Best FLUX model + file | Why |
|---|---|---|
| 8 GB (RTX 3060 Ti / 4060 / 3070) | FLUX.1 [dev] GGUF Q4 (flux1-dev-Q4_K_S.gguf, ~6.8 GB) + FP8 T5 encoder | Smallest dev quant that holds quality; needs --lowvram |
| 12 GB (RTX 3060 12GB / 4070) | FLUX.1 [dev] GGUF Q5_K_S (~8.3 GB) or Q6_K (~9.9 GB) | Sweet spot — near-FP8 quality, fits with context |
| 16 GB (RTX 4060 Ti / 5060 Ti / 4070 Ti S) | FLUX.1 [dev] FP8 (flux1-dev-fp8.safetensors, ~11.9 GB) | Nearly FP16 quality, half the VRAM, simple single file |
| 24 GB (RTX 3090 / 4090) or 32 GB (RTX 5090) | FLUX.1 [dev] FP16 (flux1-dev.safetensors, ~23.8 GB) | Maximum quality, full LoRA/ControlNet ecosystem |
| 12 GB+, newest model | FLUX.2 [klein] 4B (Apache 2.0, ~13 GB FP16) | About a second per image, runs on 12GB+, the only easy FLUX.2 locally |
If you want speed over peak quality on any tier, FLUX.1 [schnell] (Apache 2.0, generates in 1-4 steps) uses the same VRAM as dev but finishes far faster. The catch is that schnell trades some prompt fidelity and fine detail for that speed.
FLUX.1 [dev] VRAM requirements by precision (the core table)
This is the core reference table. FLUX.1 [dev] has ~12 billion parameters; the VRAM you need is the model weights plus the T5-XXL text encoder plus a bit of working memory. The figures below are practical totals people hit in ComfyUI, and they include using an FP8 T5 encoder on the tight tiers (a full-precision T5 is ~9.2 GB by itself).
| Precision / quant | Model file | File size | Practical VRAM | Quality |
|---|---|---|---|---|
| FP16 (full) | flux1-dev.safetensors | ~23.8 GB | ~24 GB (up to 33 GB w/ overhead) | Reference |
| FP8 | flux1-dev-fp8.safetensors | ~11.9 GB | ~12-16 GB | ~99% of FP16 |
| GGUF Q8_0 | flux1-dev-Q8_0.gguf | ~12.7 GB | ~12-14 GB | Excellent |
| GGUF Q6_K | flux1-dev-Q6_K.gguf | ~9.9 GB | ~10-12 GB | Very good |
| GGUF Q5_K_S | flux1-dev-Q5_K_S.gguf | ~8.3 GB | ~8-10 GB | Very good |
| GGUF Q4_K_S | flux1-dev-Q4_K_S.gguf | ~6.8 GB | ~6-8 GB | Good |
| NF4 (4-bit) | flux1-dev-bnb-nf4-v2.safetensors | ~12 GB file (bundles T5+CLIP+VAE) | ~6-8 GB | Good |
A few honest notes. FP8 is the genuine sweet spot: it is a single safetensors file, needs no extra GGUF node, and produces images visually indistinguishable from FP16 in most prompts while halving memory. GGUF Q8_0 is a slightly larger file than FP8 (~12.7 GB vs ~11.9 GB) and slightly higher fidelity, but needs the ComfyUI-GGUF node. Below Q4 (Q3/Q2) the model starts to lose anatomy and text rendering, so Q4 is the practical floor for FLUX.1 [dev]. You can confirm every GGUF size on the city96/FLUX.1-dev-gguf model card.
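The sizes in the table follow from simple arithmetic: parameter count times bits per weight, divided by 8. The sketch below is a minimal illustration, not a measurement. It assumes exactly 12 billion parameters and 1 GB = 10^9 bytes, and the function names are made up for this example. Running the calculation in reverse on the GGUF file sizes from the table gives the effective bits per weight each quant lands at.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def implied_bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Work backwards from a file size to the average bits stored per weight."""
    return file_gb * 8 / params_billion


FLUX1_DEV_PARAMS_B = 12  # assumption: the guide's rounded "~12B" figure

# Forward direction: fixed-width formats.
for label, bits in [("FP16", 16), ("FP8", 8)]:
    print(f"{label}: ~{weight_size_gb(FLUX1_DEV_PARAMS_B, bits):.1f} GB of weights")

# Reverse direction: GGUF file sizes taken from the table above.
gguf_files = [("Q8_0", 12.7), ("Q6_K", 9.9), ("Q5_K_S", 8.3), ("Q4_K_S", 6.8)]
for name, file_gb in gguf_files:
    bits = implied_bits_per_weight(file_gb, FLUX1_DEV_PARAMS_B)
    print(f"{name}: ~{bits:.1f} bits per weight")
```

The result covers weights only; the text encoder and working memory come on top, which is why the "Practical VRAM" column is a range rather than a single number.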
FLUX VRAM by GPU card — what actually fits
Now map that to real hardware. This is the "will it run on my card" reference, including the newer FLUX.2 family so you can see why most people stay on FLUX.1.
| GPU | VRAM | FLUX.1 dev (recommended file) | FLUX.2 klein 4B | FLUX.2 dev (32B) |
|---|---|---|---|---|
| RTX 3060 8GB / 3060 Ti / 4060 / 3070 | 8 GB | Q4_K_S GGUF + --lowvram | Quantized only (FP16 needs ~13 GB) | No |
| RTX 3060 12GB / 4070 | 12 GB | Q5_K_S or Q6_K GGUF | Yes (FP16 / GGUF) | No |
| RTX 4060 Ti / 5060 Ti | 16 GB | FP8 (full single file) | Yes, comfortably | No |
| RTX 3090 / 4090 / 5090 | 24-32 GB | FP16 full quality | Yes | Only Q4 GGUF, text encoder on CPU |
| A6000 / H100 | 48-80 GB | FP16 + big batches | Yes | FP8 (~32 GB) / FP16 |
| Apple Silicon (M-series) | Unified (16-128 GB) | GGUF Q5-Q8 via ComfyUI/Draw Things | Yes (32GB+ recommended) | Quantized only |
The RTX 5060 Ti 16GB deserves a callout because it is the current value pick for FLUX: 16 GB is exactly enough for the FP8 single-file workflow without any low-VRAM gymnastics. We break that card down in detail in the RTX 5060 Ti 16GB for local AI guide. For the wider GPU landscape, see best GPUs for AI.
Which exact file should you download per VRAM tier?
People waste hours downloading the wrong 12 GB file. Here is the precise download recipe per tier. Every FLUX setup also needs three shared support files in addition to the main model: the CLIP-L encoder (clip_l.safetensors), the T5-XXL encoder, and the VAE (ae.safetensors).
- 8 GB GPU: main model = flux1-dev-Q4_K_S.gguf (~6.8 GB) into ComfyUI/models/unet/; text encoder = t5xxl_fp8_e4m3fn.safetensors (the FP8 T5, ~4.6 GB — the FP16 T5 alone would not fit). Install the ComfyUI-GGUF custom node and load the model with its "Unet Loader (GGUF)" node.
- 12 GB GPU: main model = flux1-dev-Q5_K_S.gguf (~8.3 GB) or Q6_K (~9.9 GB) into models/unet/; you can use the FP8 T5 to leave room for a longer prompt and a LoRA.
- 16 GB GPU: main model = flux1-dev-fp8.safetensors (~11.9 GB) into models/diffusion_models/ (or models/unet/); no GGUF node needed; FP16 T5 works but FP8 T5 is safer headroom.
- 24 GB GPU: main model = flux1-dev.safetensors (~23.8 GB) into models/diffusion_models/ with the FP16 t5xxl_fp16.safetensors for full reference quality.
The official ComfyUI FLUX example pages and the black-forest-labs/FLUX.1-dev model card document the exact filenames and folders. For a full step-by-step ComfyUI walkthrough including the workflow graph, see our complete ComfyUI guide.
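The per-tier recipe can also be written as a small lookup. This is only a sketch of the guide's own recommendations: the `FluxDownload` type and `pick_download` function are hypothetical names, the 12 GB tier is pinned to Q5_K_S (Q6_K is the other option listed), and the 16 GB tier uses the FP8 T5 because the guide calls it the safer headroom choice.

```python
from typing import NamedTuple


class FluxDownload(NamedTuple):
    model_file: str
    size_gb: float
    folder: str            # relative to the ComfyUI install directory
    t5_encoder: str
    needs_gguf_node: bool  # True when the ComfyUI-GGUF custom node is required


FP8_T5 = "t5xxl_fp8_e4m3fn.safetensors"
FP16_T5 = "t5xxl_fp16.safetensors"

# (minimum VRAM in GB, recipe), largest tier first so the first match wins.
TIERS = [
    (24, FluxDownload("flux1-dev.safetensors", 23.8, "models/diffusion_models/", FP16_T5, False)),
    (16, FluxDownload("flux1-dev-fp8.safetensors", 11.9, "models/diffusion_models/", FP8_T5, False)),
    (12, FluxDownload("flux1-dev-Q5_K_S.gguf", 8.3, "models/unet/", FP8_T5, True)),
    (8, FluxDownload("flux1-dev-Q4_K_S.gguf", 6.8, "models/unet/", FP8_T5, True)),
]


def pick_download(vram_gb: float) -> FluxDownload:
    """Return the recipe for the largest tier that the card's VRAM reaches."""
    for min_vram, recipe in TIERS:
        if vram_gb >= min_vram:
            return recipe
    raise ValueError("below 8 GB: not covered by the tiers in this guide")


recipe = pick_download(12)
print(f"{recipe.model_file} (~{recipe.size_gb} GB) -> {recipe.folder}")
print(f"text encoder: {recipe.t5_encoder}; GGUF node needed: {recipe.needs_gguf_node}")
```

Every tier still needs clip_l.safetensors and ae.safetensors alongside the files returned here.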
ComfyUI low-VRAM flags that make FLUX fit
If FLUX hits out-of-memory errors, these are the levers, ordered by how much they help:
- Quantize the model — drop from FP16 to FP8 or a GGUF Q-level. This is the single biggest win and the first thing to try.
- Use the FP8 T5 encoder (t5xxl_fp8_e4m3fn.safetensors, ~4.6 GB) instead of the FP16 T5 (~9.2 GB). That alone frees roughly 4-5 GB.
- Set weight_dtype to fp8_e4m3fn in the "Load Diffusion Model" node — this casts the diffusion weights to FP8 on the fly and roughly halves their memory at a tiny quality cost.
- Launch ComfyUI with low-VRAM flags. Start with --lowvram (offloads to system RAM as needed); use --novram only as a last resort on tiny cards because it is much slower. Example: python main.py --lowvram
- Drop resolution to 768×768 or 512×512 while testing; 1024×1024 costs the most working memory.
- Keep batch size at 1 and close other GPU apps (browser tabs, Discord) before generating.
For GGUF models, a useful rule of thumb is that the file size on disk is close to the VRAM the weights occupy, so you can pre-screen a file against your card before downloading. Sizing context and KV-style overhead for any model is covered in our broader VRAM requirements guide.
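That rule of thumb is easy to turn into a pre-download check. The sketch below is an illustration under two loud assumptions: the 1.0 GB working-memory allowance is a placeholder rather than a measured value, and the text encoder is assumed not to sit in VRAM at the same time as the diffusion weights. The function name is hypothetical.

```python
def gguf_fits_fully(model_file_gb: float, vram_gb: float, working_gb: float = 1.0) -> bool:
    """Rough pre-screen: GGUF file size stands in for the VRAM the weights occupy.

    working_gb is an assumed allowance for activations and the VAE, not a
    measured value; raise it for higher resolutions or LoRAs.
    """
    return model_file_gb + working_gb <= vram_gb


# File sizes from the precision table earlier in this guide.
candidates = {"Q4_K_S": 6.8, "Q5_K_S": 8.3, "Q6_K": 9.9, "Q8_0": 12.7}

for vram_gb in (8, 12):
    fitting = [name for name, size in candidates.items() if gguf_fits_fully(size, vram_gb)]
    print(f"{vram_gb} GB card: {', '.join(fitting) or 'nothing fits without offloading'}")
```

A file that fails this check can still run with --lowvram, but it will spill to system RAM and slow down accordingly.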
How quantization affects generation time
Lower quants save VRAM but they are not free on speed — and the relationship is not always intuitive. FP16 and FP8 run at full GPU tensor-core speed. GGUF quants add a small dequantization step per layer, so a Q4 GGUF that fits comfortably can actually be a touch slower per step than FP8 even though it uses less memory. The real cliff is offloading: the moment ComfyUI spills layers to system RAM (which --lowvram does when the model does not fully fit), generation time can multiply several times over.
In my own testing on an RTX 3090 (24GB), FLUX.1 [dev] at full FP16 generates a 1024×1024 image in roughly 10-18 seconds at 20 steps, and FP8 lands in a similar range. A GGUF Q4 on a tight 8GB card with --lowvram offloading stretched to well over a minute per image. These are approximate single-machine numbers at 20 steps, not a controlled benchmark, and your sampler and step count will move them. The takeaway: pick the highest quant that fully fits your VRAM, because fitting fully matters far more than the quant label.
| Setup | Quant | Approx time / 1024px (20 steps) | Note |
|---|---|---|---|
| RTX 3090/4090 24GB | FP16 | ~10-18 s | Reference speed |
| RTX 5060 Ti 16GB | FP8 | ~14-22 s | Fits as a single file |
| RTX 3060 12GB | Q5_K_S GGUF | ~25-40 s | Small dequant overhead |
| RTX 3060 8GB | Q4_K_S + --lowvram | ~60 s+ | Offloading is the bottleneck |
Treat these as ballpark and hardware-dependent; FLUX.1 [schnell] cuts all of them dramatically because it needs only 1-4 steps instead of 20.
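To see roughly what the step count alone does, the ballpark ranges can be scaled linearly from 20 steps down to 4. This is a back-of-the-envelope sketch, not a benchmark: it assumes schnell costs the same per step as dev on the same card, and it ignores fixed costs such as text encoding, VAE decode, and model loading, so real timings will be somewhat higher. The function name is made up for this example.

```python
def scale_time_range(time_range_s: tuple[float, float],
                     from_steps: int, to_steps: int) -> tuple[float, float]:
    """Scale a (low, high) time range linearly with the number of sampling steps."""
    low, high = time_range_s
    factor = to_steps / from_steps
    return (round(low * factor, 1), round(high * factor, 1))


# 20-step ballpark ranges from the table above, in seconds.
dev_20_steps = {
    "RTX 3090/4090 24GB, FP16": (10, 18),
    "RTX 5060 Ti 16GB, FP8": (14, 22),
    "RTX 3060 12GB, Q5_K_S GGUF": (25, 40),
}

for setup, time_range in dev_20_steps.items():
    low, high = scale_time_range(time_range, from_steps=20, to_steps=4)
    print(f"{setup}: ~{low}-{high} s at 4 steps (sampling only)")
```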
FLUX.2 [dev] and [klein] — the new VRAM picture
FLUX.2 changes the math. FLUX.2 [dev] (released November 25, 2025) is a 32B-parameter model, far larger than FLUX.1's 12B. Its weights are roughly 64 GB at BF16; add text-encoder and activation overhead and it does not fit on a single 80GB H100 unoptimized (Black Forest Labs ships a sequential-offload path for H100). FP8 brings it to roughly 32 GB, and a Q4 GGUF compresses it to about 19 GB, which a 24GB RTX 4090 can run only with the text encoder offloaded to the CPU. It is licensed for non-commercial use and built for H100-class GPUs, so it is not a practical local pick for most readers.
FLUX.2 [klein] (released January 15, 2026) is the consumer answer. It ships as a 4B model under an Apache 2.0 license (free commercial use, ~13 GB at FP16, runs on a 12GB+ card and on ~8 GB when quantized) and a 9B model under a non-commercial license (~29 GB at FP16, runs on 16GB quantized). The 4B is step-distilled to about 4 inference steps and generates in roughly a second on a capable GPU — the first FLUX.2 weight most people can actually run locally.
| FLUX.2 variant | Parameters | License | VRAM (FP16) | Local-friendly? |
|---|---|---|---|---|
| FLUX.2 [dev] | 32B | Non-commercial | ~64 GB BF16, >80GB w/ overhead (FP8 ~32 GB, Q4 ~19 GB) | Data-center only |
| FLUX.2 [klein] 9B | 9B | Non-commercial | ~29 GB (16 GB quantized) | 16-24 GB cards |
| FLUX.2 [klein] 4B | 4B | Apache 2.0 | ~13 GB | Yes (12GB+) |
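VRAM is only half of the FLUX.2 decision; the license is the other half. The sketch below filters the three variants on both. The minimum-VRAM thresholds are this guide's approximate quantized figures (about 8 GB for klein 4B, 16 GB for klein 9B, and 24 GB for a Q4 GGUF of dev with the text encoder on the CPU), not official requirements, and the function and field names are hypothetical.

```python
from typing import NamedTuple


class Flux2Variant(NamedTuple):
    name: str
    commercial_ok: bool
    min_vram_gb: int  # smallest card the guide suggests, using a quantized file


VARIANTS = [
    Flux2Variant("FLUX.2 [klein] 4B", commercial_ok=True, min_vram_gb=8),
    Flux2Variant("FLUX.2 [klein] 9B", commercial_ok=False, min_vram_gb=16),
    # dev on 24 GB means Q4 GGUF with the text encoder offloaded to the CPU.
    Flux2Variant("FLUX.2 [dev]", commercial_ok=False, min_vram_gb=24),
]


def flux2_options(vram_gb: float, commercial_use: bool) -> list[str]:
    """List the variants that pass both the VRAM floor and the license filter."""
    return [
        v.name
        for v in VARIANTS
        if vram_gb >= v.min_vram_gb and (v.commercial_ok or not commercial_use)
    ]


print(flux2_options(vram_gb=16, commercial_use=False))
print(flux2_options(vram_gb=24, commercial_use=True))
```

Passing the filter only means the variant can be loaded at all; FLUX.2 [dev] on a 24 GB card remains a heavily compromised setup.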
For now, FLUX.1 [dev]/[schnell] still has the deepest ComfyUI tooling, LoRA library, and ControlNet support, so many local users stay on it even after FLUX.2 launched. A full setup walkthrough for both generations is in our run FLUX.1 locally guide.
FLUX on Apple Silicon (unified memory)
Apple Silicon Macs do not have separate VRAM — the GPU shares the system's unified memory, so a 32GB Mac can load models that would never fit on a 24GB discrete card. FLUX.1 [dev] runs on M-series Macs through ComfyUI (with MPS acceleration) or the Mac-native Draw Things app, which uses Metal and is generally a bit faster than ComfyUI on the same chip. Use a GGUF Q5-Q8 quant to keep memory comfortable; 16GB unified is the practical minimum and 32GB+ is recommended for relaxed use. Expect roughly 1-3 minutes per 1024×1024 image on M-series chips — slower than a fast NVIDIA card, but silent, low-power, and entirely local.
Key Takeaways
- FLUX.1 [dev] needs ~24 GB at FP16, ~12 GB at FP8, and 6-8 GB at GGUF Q4, which is why even an 8GB card such as the RTX 3060 Ti or 4060 can run it with the right file.
- Match the file to your card: Q4_K_S GGUF (~6.8 GB) for 8GB, Q5/Q6 GGUF for 12GB, FP8 (~11.9 GB) for 16GB, FP16 (~23.8 GB) for 24GB.
- FP8 is the sweet spot on 16GB — a single safetensors file, no extra node, ~99% of FP16 quality at half the VRAM.
- The FP8 T5 encoder frees ~4-5 GB and is the second-biggest VRAM lever after quantizing the model; --lowvram is the offload safety net but it slows generation.
- FLUX.2 [dev] (32B) is data-center-class (~64 GB BF16, more than a single 80GB H100 with overhead; ~32 GB FP8); for new-model local use reach for FLUX.2 [klein] 4B (Apache 2.0, ~13 GB) or stay on FLUX.1.
Next Steps
- Setting FLUX up from scratch? Follow our Run FLUX.1 Locally guide with the full file list and 5-minute ComfyUI install.
- Want the workflow graph and node-by-node setup? See the complete ComfyUI guide.
- Sizing other models against your GPU? Read the broader VRAM requirements guide.
- Shopping for a card for FLUX? Compare the value pick in RTX 5060 Ti 16GB for local AI and the full lineup in best GPUs for AI.