
FLUX VRAM Requirements by GPU (2026): 8GB to 24GB Guide

June 20, 2026
12 min read
Local AI Master Research Team


FLUX.1 [dev] (a 12B-parameter model) needs roughly 24 GB of VRAM at full FP16, about 12 GB at FP8, and as little as 6-8 GB if you run a GGUF Q4 quant, which is why an 8GB RTX 3060/4060 can still run it. The best file to download depends on your VRAM: Q4_K_S/Q4_0 GGUF for 8GB, Q5/Q6 GGUF for 12GB, FP8 (flux1-dev-fp8.safetensors, ~11.9 GB) for 16GB, and the full FP16 flux1-dev.safetensors (~23.8 GB) for 24GB. The newer FLUX.2 [dev] is a 32B model built for data-center GPUs: at full BF16 it overflows a single 80GB H100, FP8 needs ~32 GB, and only a Q4 GGUF (~19 GB) fits a 24GB card, with the text encoder offloaded. For most local GPUs the right answer is therefore still FLUX.1, or the new FLUX.2 [klein] 4B.

This guide gives you a definitive "what runs on my card" table, the exact GGUF/FP8 file to grab per VRAM tier with file sizes in GB, the ComfyUI low-VRAM flags that make it fit, and the generation-time tradeoffs you accept when you quantize down.

What is the best FLUX model for 8GB, 12GB, and 16GB VRAM?

Here is the direct verdict before the detail. These pair a FLUX variant with the specific quantized file most people actually run on each card:

| Your VRAM | Best FLUX model + file | Why |
|---|---|---|
| 8 GB (RTX 3060 Ti / 4060 / 3070) | FLUX.1 [dev] GGUF Q4 (flux1-dev-Q4_K_S.gguf, ~6.8 GB) + FP8 T5 encoder | Smallest dev quant that holds quality; needs --lowvram |
| 12 GB (RTX 3060 12GB / 4070) | FLUX.1 [dev] GGUF Q5_K_S (~8.3 GB) or Q6_K (~9.9 GB) | Sweet spot: near-FP8 quality with room to spare |
| 16 GB (RTX 4060 Ti / 5060 Ti / 4070 Ti Super) | FLUX.1 [dev] FP8 (flux1-dev-fp8.safetensors, ~11.9 GB) | Nearly FP16 quality at half the VRAM, in one simple file |
| 24 GB+ (RTX 3090 / 4090; RTX 5090 has 32 GB) | FLUX.1 [dev] FP16 (flux1-dev.safetensors, ~23.8 GB) | Maximum quality, full LoRA/ControlNet ecosystem |
| 12 GB+, newest model | FLUX.2 [klein] 4B (Apache 2.0, ~13 GB FP16) | Sub-second; the only FLUX.2 variant that is easy to run locally |

If you want speed over peak quality on any tier, FLUX.1 [schnell] (Apache 2.0, generates in 1-4 steps) uses the same VRAM as dev but finishes far faster. The catch is that schnell trades some prompt fidelity and fine detail for that speed.


FLUX.1 [dev] VRAM requirements by precision (the core table)

This is the table most readers are looking for. FLUX.1 [dev] has ~12 billion parameters, and the VRAM you need is the model weights plus the T5-XXL text encoder plus some working memory. The figures below are practical totals seen in ComfyUI, and on the tight tiers they assume an FP8 T5 encoder (a full-precision T5 is ~9.2 GB by itself).

| Precision / quant | Model file | File size | Practical VRAM | Quality |
|---|---|---|---|---|
| FP16 (full) | flux1-dev.safetensors | ~23.8 GB | ~24 GB (up to 33 GB with overhead) | Reference |
| FP8 | flux1-dev-fp8.safetensors | ~11.9 GB | ~12-16 GB | ~99% of FP16 |
| GGUF Q8_0 | flux1-dev-Q8_0.gguf | ~12.7 GB | ~12-14 GB | Excellent |
| GGUF Q6_K | flux1-dev-Q6_K.gguf | ~9.9 GB | ~10-12 GB | Very good |
| GGUF Q5_K_S | flux1-dev-Q5_K_S.gguf | ~8.3 GB | ~8-10 GB | Very good |
| GGUF Q4_K_S | flux1-dev-Q4_K_S.gguf | ~6.8 GB | ~6-8 GB | Good |
| NF4 (4-bit) | flux1-dev-bnb-nf4-v2.safetensors | ~12 GB (bundles T5, CLIP, and VAE) | ~6-8 GB | Good |

A few honest notes. FP8 is the genuine sweet spot: it is a single safetensors file, needs no extra GGUF node, and in most prompts produces images visually indistinguishable from FP16 while halving memory. The GGUF Q8_0 file is slightly larger than the FP8 file (~12.7 GB vs ~11.9 GB), so it needs about the same VRAM, but it has marginally higher fidelity and requires the ComfyUI-GGUF node. Below Q4 (Q3/Q2) the model starts to lose anatomy and text rendering, so Q4 is the practical floor for FLUX.1 [dev]. You can confirm every GGUF size on the city96/FLUX.1-dev-gguf model card.
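As a quick cross-check of the table, weight memory is roughly parameter count times bits per weight divided by 8. The minimal sketch below assumes exactly 12 billion parameters and decimal GB (1e9 bytes), and it also runs the arithmetic backwards to get the effective bits per weight implied by each GGUF file size. Those come out above the nominal 4/5/6/8 bits because quantized formats store scale factors alongside the weights; Q8_0, for example, keeps one 16-bit scale per block of 32 weights, which is exactly 8.5 bits per weight. The function names are illustrative.

```python
# Back-of-envelope weight memory for FLUX.1 [dev].
# Assumption: exactly 12e9 parameters (the article's "~12B") and decimal GB (1e9 bytes).
PARAMS = 12e9


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: params * bits / 8 bytes, expressed in GB."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits(file_gb: float, params: float = PARAMS) -> float:
    """Bits per weight implied by a file size (includes scales and metadata)."""
    return file_gb * 1e9 * 8 / params


if __name__ == "__main__":
    for label, bits in [("FP16", 16), ("FP8", 8), ("ideal 4-bit", 4)]:
        print(f"{label:>12}: {weight_gb(PARAMS, bits):5.1f} GB of weights")
    # GGUF file sizes from the table above (city96/FLUX.1-dev-gguf).
    for name, size_gb in [("Q8_0", 12.7), ("Q6_K", 9.9), ("Q5_K_S", 8.3), ("Q4_K_S", 6.8)]:
        print(f"{name:>12}: ~{effective_bits(size_gb):.1f} bits per weight")
```

On these inputs it reports 24, 12, and 6 GB of weights for the three precisions, and roughly 8.5, 6.6, 5.5, and 4.5 bits per weight for the GGUF files, which is why a "Q4" file is closer to 6.8 GB than to the ideal 6 GB.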

FLUX VRAM by GPU card — what actually fits

Now map that to real hardware. This is the "will it run on my card" reference, including the newer FLUX.2 family so you can see why most people stay on FLUX.1.

| GPU | VRAM | FLUX.1 [dev] (recommended file) | FLUX.2 [klein] 4B | FLUX.2 [dev] (32B) |
|---|---|---|---|---|
| RTX 3060 / 4060 / 3070 | 8 GB | Q4_K_S GGUF + --lowvram | FP16 no (needs ~13 GB); quantized (~8 GB) is borderline | No |
| RTX 3060 12GB / 4070 | 12 GB | Q5_K_S or Q6_K GGUF | Yes (FP16 / GGUF) | No |
| RTX 4060 Ti / 5060 Ti | 16 GB | FP8 (full single file) | Yes, comfortably | No |
| RTX 3090 / 4090 (24 GB), RTX 5090 (32 GB) | 24-32 GB | FP16, full quality | Yes | Only Q4 GGUF, with the text encoder on the CPU |
| A6000 / H100 | 48-80 GB | FP16 + big batches | Yes | FP8 (~32 GB); full BF16 needs offloading |
| Apple Silicon (M-series) | Unified (16-128 GB) | GGUF Q5-Q8 via ComfyUI/Draw Things | Yes (32GB+ recommended) | Quantized only |

The RTX 5060 Ti 16GB deserves a callout because it is the current value pick for FLUX: 16 GB is exactly enough for the FP8 single-file workflow without any low-VRAM gymnastics. We break that card down in detail in the RTX 5060 Ti 16GB for local AI guide. For the wider GPU landscape, see best GPUs for AI.

Which exact file should you download per VRAM tier?

People waste hours downloading the wrong 12 GB file. Here is the precise download recipe per tier. Every FLUX setup also needs three shared support files in addition to the main model: the CLIP-L encoder (clip_l.safetensors), the T5-XXL encoder, and the VAE (ae.safetensors).

  • 8 GB GPU: main model = flux1-dev-Q4_K_S.gguf (~6.8 GB) into ComfyUI/models/unet/; text encoder = t5xxl_fp8_e4m3fn.safetensors (the FP8 T5, ~4.6 GB — the FP16 T5 alone would not fit). Install the ComfyUI-GGUF custom node and load the model with its "Unet Loader (GGUF)" node.
  • 12 GB GPU: main model = flux1-dev-Q5_K_S.gguf (~8.3 GB) or Q6_K (~9.9 GB) into models/unet/; you can use the FP8 T5 to leave room for a longer prompt and a LoRA.
  • 16 GB GPU: main model = flux1-dev-fp8.safetensors (~11.9 GB) into models/diffusion_models/ (or models/unet/); no GGUF node needed. The FP16 T5 works, but the FP8 T5 leaves more headroom.
  • 24 GB GPU: main model = flux1-dev.safetensors (~23.8 GB) into models/diffusion_models/ with the FP16 t5xxl_fp16.safetensors for full reference quality.
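The recipe above can also be expressed as data. This is a minimal sketch, not an official tool: `TIERS`, `pick_tier`, and `loader_nodes` are hypothetical names, and the node class names (`UNETLoader` for the core "Load Diffusion Model" node, `UnetLoaderGGUF` from ComfyUI-GGUF, `DualCLIPLoader`, `VAELoader`) are assumed to match ComfyUI's API-format workflow JSON, so check them against a workflow exported from your own install. It emits only the loader nodes, which you would merge into a full exported workflow.

```python
import json

# Hypothetical mapping of the per-tier recipe above; TIERS, pick_tier and
# loader_nodes are illustrative names, not part of ComfyUI.
TIERS = {
    8: {"model": "flux1-dev-Q4_K_S.gguf", "t5": "t5xxl_fp8_e4m3fn.safetensors"},
    12: {"model": "flux1-dev-Q5_K_S.gguf", "t5": "t5xxl_fp8_e4m3fn.safetensors"},  # or Q6_K
    16: {"model": "flux1-dev-fp8.safetensors", "t5": "t5xxl_fp8_e4m3fn.safetensors"},
    24: {"model": "flux1-dev.safetensors", "t5": "t5xxl_fp16.safetensors"},
}


def pick_tier(vram_gb: float) -> int:
    """Largest recipe tier that does not exceed the card's VRAM."""
    eligible = [tier for tier in TIERS if tier <= vram_gb]
    if not eligible:
        raise ValueError("under 8 GB of VRAM is not covered by this recipe")
    return max(eligible)


def loader_nodes(vram_gb: float) -> dict:
    """API-format loader nodes (assumed class names; verify against an exported workflow)."""
    files = TIERS[pick_tier(vram_gb)]
    if files["model"].endswith(".gguf"):
        # GGUF files live in models/unet/ and need the ComfyUI-GGUF loader.
        unet = {"class_type": "UnetLoaderGGUF", "inputs": {"unet_name": files["model"]}}
    else:
        # Core "Load Diffusion Model" node; "default" keeps the file's own precision.
        unet = {
            "class_type": "UNETLoader",
            "inputs": {"unet_name": files["model"], "weight_dtype": "default"},
        }
    return {
        "1": unet,
        "2": {
            "class_type": "DualCLIPLoader",
            "inputs": {
                "clip_name1": files["t5"],
                "clip_name2": "clip_l.safetensors",
                "type": "flux",
            },
        },
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    }


print(json.dumps(loader_nodes(16), indent=2))
```

For a 16 GB card this selects flux1-dev-fp8.safetensors with the core loader. Changing `weight_dtype` from "default" to "fp8_e4m3fn" corresponds to the on-the-fly FP8 cast in the low-VRAM list that follows.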

The official ComfyUI FLUX example pages and the black-forest-labs/FLUX.1-dev model card document the exact filenames and folders. For a full step-by-step ComfyUI walkthrough including the workflow graph, see our complete ComfyUI guide.


ComfyUI low-VRAM flags that make FLUX fit

If you hit out-of-memory errors with FLUX, these are the levers, in order of how much they help:

  1. Quantize the model — drop from FP16 to FP8 or a GGUF Q-level. This is the single biggest win and the first thing to try.
  2. Use the FP8 T5 encoder (t5xxl_fp8_e4m3fn.safetensors, ~4.6 GB) instead of the FP16 T5 (~9.2 GB). That alone frees roughly 4-5 GB.
  3. Set weight_dtype to fp8_e4m3fn in the "Load Diffusion Model" node — this casts the diffusion weights to FP8 on the fly and roughly halves their memory at a tiny quality cost.
  4. Launch ComfyUI with low-VRAM flags. Start with --lowvram (offloads to system RAM as needed); use --novram only as a last resort on tiny cards because it is much slower. Example: python main.py --lowvram.
  5. Drop resolution to 768×768 or 512×512 while testing; 1024×1024 costs the most working memory.
  6. Keep batch size at 1 and close other GPU apps (browser tabs, Discord) before generating.

For GGUF models, a useful rule of thumb is that the file size on disk is close to the VRAM the weights occupy, so you can pre-screen a file against your card before downloading. Overhead sizing for other kinds of models, such as the KV cache of LLMs, is covered in our broader VRAM requirements guide.
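The lever list and the file-size rule of thumb can be combined into a rough pre-screen. This is a minimal sketch under stated assumptions, not ComfyUI's own memory logic (ComfyUI decides offloading itself): the 0.3 GB VAE and 1.5 GB working-memory figures are rough guesses, the text encoders are assumed to be unloaded before sampling, and card VRAM is converted from binary GiB, the unit GPU memory is specified in, to the decimal GB that Hugging Face file listings use. `peak_gb` and `suggest_flags` are hypothetical helpers; only `--lowvram` and `--novram` are real ComfyUI flags.

```python
# Hypothetical pre-screen, not ComfyUI's own memory manager (ComfyUI decides
# offloading itself). Assumptions, all rough:
#   * model file sizes are decimal GB, as Hugging Face lists them;
#   * card VRAM is specified in binary GiB (a "16 GB" card has 16 GiB);
#   * text encoders run first and are unloaded, so the sampling peak is
#     diffusion weights + VAE + working memory.
GIB_TO_GB = 1024**3 / 1e9  # 1 GiB is about 1.074 decimal GB
VAE_GB = 0.3               # ae.safetensors, approximately
WORKING_GB = 1.5           # assumed activations for 1024x1024 at batch size 1


def peak_gb(model_file_gb: float) -> float:
    """Estimated peak VRAM during sampling, in decimal GB."""
    return model_file_gb + VAE_GB + WORKING_GB


def suggest_flags(card_vram_gib: float, model_file_gb: float) -> list[str]:
    """Extra ComfyUI launch flags: none if the setup should fit fully, else --lowvram."""
    if peak_gb(model_file_gb) <= card_vram_gib * GIB_TO_GB:
        return []
    # Offload to system RAM as needed; try --novram only if this still runs out of memory.
    return ["--lowvram"]


if __name__ == "__main__":
    for vram, name, size_gb in [(8, "Q4_K_S", 6.8), (12, "Q6_K", 9.9),
                                (16, "FP8", 11.9), (8, "FP8", 11.9)]:
        cmd = " ".join(["python", "main.py", *suggest_flags(vram, size_gb)])
        print(f"{vram:>2} GB card, {name:<6} peak ~{peak_gb(size_gb):.1f} GB -> {cmd}")
```

Under these assumptions an 8 GB card with Q4_K_S lands only about 10 MB over the line, which matches that tier needing --lowvram but also shows how sensitive the verdict is to the assumed working memory. The 12 GB and 16 GB recipes fit with room to spare.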

How quantization affects generation time

Lower quants save VRAM but they are not free on speed — and the relationship is not always intuitive. FP16 and FP8 run at full GPU tensor-core speed. GGUF quants add a small dequantization step per layer, so a Q4 GGUF that fits comfortably can actually be a touch slower per step than FP8 even though it uses less memory. The real cliff is offloading: the moment ComfyUI spills layers to system RAM (which --lowvram does when the model does not fully fit), generation time can multiply several times over.

In my own testing, FLUX.1 [dev] at full FP16 on an RTX 3090 (24GB) generates a 1024×1024 image in roughly 10-18 seconds at 20 steps, and FP8 lands in a similar range. A GGUF Q4 on a tight 8GB card with --lowvram offloading stretched to well over a minute per image. These are approximate, informal numbers at 20 steps, not a controlled benchmark, and your sampler and step count will move them. The takeaway: pick the highest quant that fully fits your VRAM, because fitting fully matters far more than the quant label.

| Setup | Quant | Approx. time per 1024px image (20 steps) | Note |
|---|---|---|---|
| RTX 3090/4090 24GB | FP16 | ~10-18 s | Reference speed |
| RTX 5060 Ti 16GB | FP8 | ~14-22 s | Fits as a single file |
| RTX 3060 12GB | Q5_K_S GGUF | ~25-40 s | Small dequant overhead |
| RTX 3060 8GB | Q4_K_S + --lowvram | ~60 s+ | Offloading is the bottleneck |

Treat these as ballpark and hardware-dependent; FLUX.1 [schnell] cuts all of them dramatically because it needs only 1-4 steps instead of 20.
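The schnell saving is mostly arithmetic. A rough sketch, assuming per-step cost stays about the same (schnell has the same 12B architecture as dev) and that time scales linearly with step count; fixed costs such as model loading, text encoding, and VAE decode are ignored, so real schnell times will be somewhat higher. `scaled_range` is a hypothetical helper, and the offloaded 8 GB row is left out because offloading, not step count, dominates there.

```python
# Rough schnell estimate from the 20-step dev timings in the table above.
# Assumption: time scales linearly with sampling steps. Fixed costs (model
# loading, text encoding, VAE decode) are ignored, so real times will be higher.
DEV_STEPS = 20


def scaled_range(dev_range_s: tuple[float, float], steps: int) -> tuple[float, float]:
    """Scale a (low, high) seconds range measured at DEV_STEPS to a new step count."""
    low, high = dev_range_s
    return (low * steps / DEV_STEPS, high * steps / DEV_STEPS)


if __name__ == "__main__":
    # The offloaded 8 GB row is left out: there, offloading rather than step count dominates.
    dev_timings = {
        "RTX 3090/4090 24GB, FP16": (10, 18),
        "RTX 5060 Ti 16GB, FP8": (14, 22),
        "RTX 3060 12GB, Q5_K_S": (25, 40),
    }
    for setup, (low, high) in dev_timings.items():
        s_low, s_high = scaled_range((low, high), steps=4)
        print(f"{setup}: dev {low}-{high} s -> schnell (4 steps) ~{s_low:.0f}-{s_high:.0f} s")
```

At 4 steps this predicts about a fifth of the dev time, for example roughly 2-4 s instead of 10-18 s on a 24GB card; treat it as a lower bound.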

FLUX.2 [dev] and [klein] — the new VRAM picture

FLUX.2 changes the math. FLUX.2 [dev] (released November 25, 2025) is a 32B-parameter model, far larger than FLUX.1's 12B. Its weights are roughly 64 GB at BF16; with text-encoder and activation overhead on top, that does not fit on a single 80GB H100 unoptimized (Black Forest Labs ships a sequential-offload path for H100). FP8 brings the weights to roughly 32 GB, and a Q4 GGUF compresses them to about 19 GB, which a 24GB RTX 4090 can run only with the text encoder offloaded to the CPU. Because it is non-commercial and built for H100-class GPUs, it is not a practical local pick for most readers.
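The weight figures follow directly from the parameter count. The worked arithmetic below counts weights only (no text encoder or activations) and assumes exactly 32 billion parameters and decimal GB; the last line back-calculates the effective precision of a ~19 GB Q4 file, which sits above a nominal 4 bits because quantized formats also store per-block scales.

```latex
\begin{aligned}
\text{BF16: } & 32\times10^{9}\ \text{params} \times 2\ \text{bytes} = 64\times10^{9}\ \text{bytes} \approx 64\ \text{GB} \\
\text{FP8: } & 32\times10^{9}\ \text{params} \times 1\ \text{byte} = 32\times10^{9}\ \text{bytes} \approx 32\ \text{GB} \\
\text{Q4 GGUF: } & \frac{19\times10^{9}\ \text{bytes} \times 8\ \text{bits/byte}}{32\times10^{9}\ \text{params}} \approx 4.75\ \text{bits per weight}
\end{aligned}
```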

FLUX.2 [klein] (released January 15, 2026) is the consumer answer. It ships as a 4B model under an Apache 2.0 license (free commercial use, ~13 GB at FP16, runs on a 12GB+ card and on ~8 GB when quantized) and a 9B model under a non-commercial license (~29 GB at FP16, runs on 16GB quantized). The 4B is step-distilled to about 4 inference steps and generates in roughly a second on a capable GPU — the first FLUX.2 weight most people can actually run locally.

| FLUX.2 variant | Parameters | License | VRAM (FP16/BF16) | Local-friendly? |
|---|---|---|---|---|
| FLUX.2 [dev] | 32B | Non-commercial | ~64 GB BF16, >80 GB with overhead (FP8 ~32 GB, Q4 ~19 GB) | Data-center only |
| FLUX.2 [klein] 9B | 9B | Non-commercial | ~29 GB (16 GB quantized) | 16-24 GB cards |
| FLUX.2 [klein] 4B | 4B | Apache 2.0 | ~13 GB | Yes (12GB+) |

For now, FLUX.1 [dev]/[schnell] still has the deepest ComfyUI tooling, LoRA library, and ControlNet support, so many local users stay on it even after FLUX.2 launched. A full setup walkthrough for both generations is in our run FLUX.1 locally guide.

FLUX on Apple Silicon (unified memory)

Apple Silicon Macs do not have separate VRAM — the GPU shares the system's unified memory, so a 32GB Mac can load models that would never fit on a 24GB discrete card. FLUX.1 [dev] runs on M-series Macs through ComfyUI (with MPS acceleration) or the Mac-native Draw Things app, which uses Metal and is generally a bit faster than ComfyUI on the same chip. Use a GGUF Q5-Q8 quant to keep memory comfortable; 16GB unified is the practical minimum and 32GB+ is recommended for relaxed use. Expect roughly 1-3 minutes per 1024×1024 image on M-series chips — slower than a fast NVIDIA card, but silent, low-power, and entirely local.

Key Takeaways

  1. FLUX.1 [dev] needs ~24 GB at FP16, ~12 GB at FP8, and 6-8 GB at GGUF Q4 — which is why even an 8GB RTX 3060/4060 can run it with the right file.
  2. Match the file to your card: Q4_K_S GGUF (~6.8 GB) for 8GB, Q5/Q6 GGUF for 12GB, FP8 (~11.9 GB) for 16GB, FP16 (~23.8 GB) for 24GB.
  3. FP8 is the sweet spot on 16GB — a single safetensors file, no extra node, ~99% of FP16 quality at half the VRAM.
  4. The FP8 T5 encoder frees ~4-5 GB and is the second-biggest VRAM lever after quantizing the model; --lowvram is the offload safety net but it slows generation.
  5. FLUX.2 [dev] (32B) is data-center-class (~64 GB BF16, more than a single 80GB H100 with overhead; ~32 GB FP8); for new-model local use reach for FLUX.2 [klein] 4B (Apache 2.0, ~13 GB) or stay on FLUX.1.

Published: June 20, 2026 · Last Updated: June 20, 2026 · Manually reviewed
