The best FLUX model for an 8GB VRAM card in ComfyUI in 2026 is FLUX.2 [klein] 4B (announced January 15, 2026, Apache 2.0 for the 4B weights), whose GGUF build is only ~2.6 GB at Q4_K_M and generates in just 4 steps — it fits 8GB with room to spare. If you want the classic look instead, run a FLUX.1-dev or FLUX.1-schnell GGUF at Q4_K_S (~6.8 GB) or Q4_0 (~6.8 GB) through the city96 ComfyUI-GGUF loader, and add ComfyUI's --lowvram flag so the weights stream from system RAM. On 6GB cards drop to Q3_K_S (~5.2 GB) or Q2_K (~4.0 GB), or use Klein 4B; below 6GB you must offload to CPU/RAM and accept much slower generations.

This is the no-fluff, low-end version of our setup guides. If you have a roomy 16GB+ card, the full FLUX local image generation guide and the FLUX VRAM requirements by GPU breakdown will serve you better. This page is for people staring at an RTX 3060 Ti, a 2060, a 4060, or an aging GTX card wondering "will FLUX even start?"

What is the single best FLUX pick for an 8GB card?

If you only read one section, read this. For 8GB of VRAM in 2026 you have two genuinely good options, and they answer two different questions:

Want the newest, fastest, smallest model? Use FLUX.2 [klein] 4B. It is a 4-billion-parameter rectified-flow transformer from Black Forest Labs, announced January 15, 2026 under a permissive Apache 2.0 license (the 4B size is Apache 2.0; the larger klein 9B uses BFL's non-commercial license). The full bf16 model wants roughly 13GB VRAM, but the GGUF quants below are what bring it onto an 8GB card. The GGUF build (by unsloth, using the city96 tooling) is tiny — about 2.6 GB at Q4_K_M and 4.3 GB at Q8_0 — so it fits 8GB even at high quant, and it renders in only 4 inference steps.
Want the classic FLUX.1 image look and the huge existing ecosystem of LoRAs/workflows? Use a FLUX.1-dev or FLUX.1-schnell GGUF at Q4_K_S. The dev weights land at ~6.8 GB, which fits 8GB once you enable offloading for the text encoder and VAE.

The honest tradeoff: FLUX.2 Klein 4B is smaller and faster and runs more comfortably, but its outputs look different from the FLUX.1 images most online LoRAs were trained on, and FLUX.1-dev still has the deepest library of community add-ons. We'd point most 8GB owners at Klein 4B first, then keep a FLUX.1-schnell GGUF around for compatibility.

Which FLUX GGUF quant should I download for 6-8GB?

GGUF is the format that makes low-VRAM FLUX possible. It stores the model's weights quantized to fewer bits, shrinking them well below their fp16 size, and ComfyUI's offloading can stream those weights between the GPU and system RAM when they still don't fit. The file size on disk is very close to the VRAM the weights consume, so you can pick a quant by matching its size to your card. The FLUX.1-dev sizes below come from the official city96 FLUX.1-dev GGUF repository; the Klein 4B sizes are for the unsloth GGUF build:

Quant	FLUX.1-dev GGUF size	FLUX.2 Klein 4B GGUF size	Best for (FLUX.1-dev)
Q2_K	~4.0 GB	~1.8 GB	4GB cards / last resort (quality drops)
Q3_K_S	~5.2 GB	~2.1 GB	6GB cards
Q4_0	~6.8 GB	~2.5 GB	8GB cards, broad compatibility
Q4_K_S	~6.8 GB	~2.6 GB	8GB sweet spot (recommended)
Q5_K_S	~8.3 GB	~3.1 GB	8GB with offload / 10-12GB cards
Q6_K	~9.9 GB	~3.4 GB	12GB cards
Q8_0	~12.7 GB	~4.3 GB	16GB+ (near-lossless)

For an 8GB card running FLUX.1-dev, Q4_K_S (~6.8 GB) is the sweet spot — it leaves just enough headroom for the active computation while keeping quality high (FLUX is unusually quantization-resistant, holding up far better at 4-bit than older Stable Diffusion checkpoints). Q4_0 is essentially the same size and a touch more universally compatible across older nodes. Step down to Q3_K_S (~5.2 GB) on a 6GB card, and only fall to Q2_K when nothing else loads — Q2 visibly degrades fine detail and text rendering. For FLUX.2 Klein 4B the files are so small that you can comfortably run Q6_K or even Q8_0 on 8GB.
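
The "match the file size to your card" rule can be sketched in a few lines of Python. This is a minimal sketch, not a tool from city96 or ComfyUI: the sizes are the FLUX.1-dev figures from the table above, the function name is ours, and the 0.75 GB of headroom reserved for activations is an assumption chosen so the picks line up with the recommendations in this section.

```python
# Approximate FLUX.1-dev GGUF file sizes in GB, from the quant table above.
# Listed smallest to largest; Q4_K_S comes after Q4_0 so it wins the tie.
FLUX1_DEV_GGUF_GB = [
    ("Q2_K", 4.0),
    ("Q3_K_S", 5.2),
    ("Q4_0", 6.8),
    ("Q4_K_S", 6.8),
    ("Q5_K_S", 8.3),
    ("Q6_K", 9.9),
    ("Q8_0", 12.7),
]

# Assumption: VRAM kept free for activations during sampling (not an official figure).
HEADROOM_GB = 0.75


def pick_flux1_quant(vram_gb, headroom_gb=HEADROOM_GB):
    """Return the largest quant whose file plus headroom fits in VRAM, or None."""
    budget = vram_gb - headroom_gb
    best = None
    for name, size_gb in FLUX1_DEV_GGUF_GB:
        if size_gb <= budget:
            best = name  # later entries are larger, so keep overwriting
    return best


for vram in (4, 6, 8, 12, 16):
    print(f"{vram}GB card -> {pick_flux1_quant(vram)}")
```

A None result means no FLUX.1-dev quant fits outright; that is the territory where the verdict box points you at Klein 4B, or at Q2_K with --lowvram/--novram.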

You'll also need the T5 text encoder. Grab the GGUF T5 encoder (city96's t5-v1_1-xxl-encoder-gguf) rather than the full fp16 one — the fp16 T5 alone eats ~9 GB and will blow your budget on an 8GB card. The quantized T5 keeps the encoder small enough to coexist with the model.
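
A rough way to see why the fp16 encoder breaks an 8GB budget is to add up what sits in VRAM at each stage. The sketch below is a simplified model under stated assumptions: it ignores activations, assumes ComfyUI moves each component off the GPU before the next stage needs the space, and uses a hypothetical 3.0 GB for the quantized T5 and an approximate 0.3 GB for the VAE; check the real sizes of the files you download.

```python
# Approximate component sizes in GB.
SIZES_GB = {
    "flux1-dev Q4_K_S": 6.8,   # from the quant table above
    "t5xxl fp16": 9.0,         # from this article (~9 GB)
    "t5xxl GGUF": 3.0,         # hypothetical placeholder; use your file's size
    "ae.safetensors": 0.3,     # assumption: approximate FLUX.1 VAE size
}

# Each stage lists what must be on the GPU while it runs:
# text encoding, then sampling, then VAE decoding.
FP16_T5_PLAN = [["t5xxl fp16"], ["flux1-dev Q4_K_S"], ["ae.safetensors"]]
GGUF_T5_PLAN = [["t5xxl GGUF"], ["flux1-dev Q4_K_S"], ["ae.safetensors"]]


def peak_vram_gb(stages, swapped=True):
    """Peak VRAM if components are swapped per stage, or the sum if all stay resident."""
    stage_totals = [sum(SIZES_GB[part] for part in stage) for stage in stages]
    return max(stage_totals) if swapped else sum(stage_totals)


VRAM_GB = 8.0
for label, plan in (("fp16 T5", FP16_T5_PLAN), ("GGUF T5", GGUF_T5_PLAN)):
    peak = peak_vram_gb(plan)
    verdict = "fits" if peak <= VRAM_GB else "over budget"
    print(f"{label}: peak {peak:.1f} GB -> {verdict}")
```

Even with perfect swapping, the fp16 encoder alone overshoots 8GB, while with the GGUF encoder the diffusion model becomes the largest single piece. Keeping everything resident at once (swapped=False) overshoots in both cases, which is why offloading matters.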

How do I set up the GGUF loader in ComfyUI? (Step by step)

The GGUF path requires one custom node and a specific loader. Here is the minimal, current sequence:

Install ComfyUI-GGUF. Open ComfyUI Manager, search for "ComfyUI-GGUF" (by city96), install it, and restart. This adds the Unet Loader (GGUF) and DualCLIPLoader (GGUF) nodes, which can read the .gguf files that the plain "Load Diffusion Model" node cannot.
Place the model file. Put your chosen .gguf model (e.g. flux1-dev-Q4_K_S.gguf or the Klein 4B GGUF) into ComfyUI/models/unet/.
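Place the encoders. Put the quantized t5-v1_1-xxl-encoder GGUF and clip_l.safetensors into ComfyUI/models/clip/, and the FLUX VAE (ae.safetensors) into ComfyUI/models/vae/. This T5 + clip_l + ae.safetensors set belongs to FLUX.1; FLUX.2 models use a different text encoder and VAE, so for Klein 4B download the encoder and VAE listed on its model card.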
Build the graph. Use Unet Loader (GGUF) → point it at your model. Use DualCLIPLoader (GGUF) with type set to flux, loading the T5 GGUF and clip_l. Wire those into the standard FLUX sampling nodes (a basic FLUX text-to-image template works once you swap the loaders).
Set steps and CFG. For FLUX.1-dev use ~20 steps; for FLUX.1-schnell and FLUX.2 Klein 4B set steps to 4 and CFG/guidance to 1.0 — these are 4-step distilled models and more steps just waste time.
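
Before building the graph, it can help to confirm that the files from the placement steps above landed where the loaders look for them. This is a small sketch assuming the default ComfyUI/models layout and the FLUX.1 file set; the function name is hypothetical, not part of ComfyUI, and it assumes your T5 GGUF file name contains "t5".

```python
from pathlib import Path


def missing_flux_files(comfy_root):
    """List FLUX.1 GGUF pieces that are not where the loaders expect them."""
    models = Path(comfy_root) / "models"
    unet, clip, vae = models / "unet", models / "clip", models / "vae"
    checks = {
        "a .gguf diffusion model in models/unet/": unet.is_dir() and any(unet.glob("*.gguf")),
        # assumption: the quantized T5 file name contains "t5"
        "a T5 .gguf encoder in models/clip/": clip.is_dir() and any(clip.glob("*t5*.gguf")),
        "clip_l.safetensors in models/clip/": (clip / "clip_l.safetensors").is_file(),
        "ae.safetensors in models/vae/": (vae / "ae.safetensors").is_file(),
    }
    return [label for label, present in checks.items() if not present]


print(missing_flux_files("ComfyUI") or "All FLUX.1 GGUF files are in place.")
```

Run it from the folder that contains ComfyUI/; an empty list means every piece is in place.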

That's the whole graph. The GGUF loader is the only non-default piece; everything downstream is the same as a normal FLUX workflow.
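
If you drive ComfyUI from a script instead of the canvas, the same graph can be written in ComfyUI's API (JSON) format and queued through its HTTP /prompt endpoint. This is a sketch under assumptions: it targets FLUX.1 with the T5 + clip_l pair, assumes the city96 node class names UnetLoaderGGUF and DualCLIPLoaderGGUF, a local server on the default port 8188, and whatever file names you pass in. Verify node and input names against your installed version; exporting a workflow in API format from ComfyUI shows them.

```python
import json
import urllib.request


def flux1_gguf_prompt(unet_name, t5_name, clip_l_name="clip_l.safetensors",
                      vae_name="ae.safetensors", text="a lighthouse at dusk",
                      steps=20, guidance=3.5, seed=0, width=1024, height=1024):
    """Build an API-format FLUX.1 graph using the ComfyUI-GGUF loaders."""
    return {
        "1": {"class_type": "UnetLoaderGGUF", "inputs": {"unet_name": unet_name}},
        "2": {"class_type": "DualCLIPLoaderGGUF",
              "inputs": {"clip_name1": t5_name, "clip_name2": clip_l_name, "type": "flux"}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": text, "clip": ["2", 0]}},
        # FLUX.1-dev reads guidance from the conditioning; KSampler cfg stays at 1.0
        "4": {"class_type": "FluxGuidance",
              "inputs": {"conditioning": ["3", 0], "guidance": guidance}},
        # KSampler needs a negative input; it is skipped when cfg is 1.0
        "5": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
                         "latent_image": ["6", 0], "seed": seed, "steps": steps,
                         "cfg": 1.0, "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "8": {"class_type": "VAELoader", "inputs": {"vae_name": vae_name}},
        "9": {"class_type": "VAEDecode", "inputs": {"samples": ["7", 0], "vae": ["8", 0]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": "flux_gguf"}},
    }


def queue_prompt(prompt, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server and return its JSON reply."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With ComfyUI running, queue_prompt(flux1_gguf_prompt("flux1-dev-Q4_K_S.gguf", "t5-v1_1-xxl-encoder-Q5_K_M.gguf")) would queue a 20-step FLUX.1-dev render (both file names are examples). For FLUX.1-schnell, pass a schnell GGUF and steps=4, matching the step counts above.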

How do --lowvram, --novram and weight_dtype actually help?

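These are the offloading controls that decide whether a too-big model runs slowly or not at all. --lowvram and --novram are launch flags you pass when starting ComfyUI (for example, python main.py --lowvram); weight_dtype is a setting inside the loader node: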

Setting	Where	What it does	When to use it
(default / --normalvram)	launch flag	ComfyUI auto-manages VRAM, keeping as much on-GPU as fits	12GB+ cards
`--lowvram`	launch flag	Loads the model in pieces, streaming weights from system RAM as needed	4-8GB cards (the main one)
`--novram`	launch flag	Keeps weights on CPU/RAM and only moves the active computation to the GPU	Under ~4GB VRAM, very slow
weight_dtype = `fp8_e4m3fn`	node setting	In the Load Diffusion Model node, halves model memory vs fp16 with a small quality cost	fp8 (non-GGUF) workflows

A few practical notes on how ComfyUI handles these settings:

--lowvram is the workhorse for 6-8GB cards. It enables partial/sequential loading so the model never has to fit entirely in VRAM at once. Expect roughly a 20-30% speed penalty versus a card that holds everything on-GPU — a fair trade for "it runs at all."
--novram is the last resort for very small cards (under ~4GB). It offloads aggressively to system RAM, so make sure you have plenty of it (32GB system RAM is a comfortable target for FLUX), and brace for slow generations.
The weight_dtype = fp8_e4m3fn option lives inside the "Load Diffusion Model" node and applies to fp8 .safetensors checkpoints, not GGUF files. It's a parallel low-VRAM path: set it to fp8_e4m3fn to roughly halve memory. Note that ComfyUI's command-line --fp8_e4m3fn-unet flag is often ignored by FLUX's loader (FLUX defaults its compute dtype internally), so set fp8 in the node, not on the command line, when you go the fp8 route.
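
The flag choice in the table above reduces to two VRAM thresholds. A minimal sketch, assuming you start ComfyUI with python main.py from its folder; the helper name is hypothetical, and treating 8-12GB cards as --lowvram is our assumption, since the table leaves that range open.

```python
def comfy_launch_cmd(vram_gb):
    """Build a ComfyUI launch command from the VRAM thresholds in the flags table."""
    cmd = ["python", "main.py"]
    if vram_gb < 4:
        cmd.append("--novram")   # weights stay in system RAM; very slow
    elif vram_gb < 12:
        cmd.append("--lowvram")  # stream weights in pieces (assumed for 8-12GB too)
    return cmd                   # 12GB+: let ComfyUI auto-manage VRAM


print(" ".join(comfy_launch_cmd(8)))
```

Passing the result to subprocess.run(..., cwd="ComfyUI") would start the server with the chosen flag; use your card's advertised VRAM as the input.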

If you're new to ComfyUI itself, our complete ComfyUI guide walks through installation, the node graph, and the Manager before you start juggling these flags.

How fast is low-VRAM FLUX, really? (measured + honest)

Speed on a low-VRAM card is dominated by one thing: how much of the model has to stream from system RAM each step. The more you offload, the slower it gets. Here are realistic ballpark numbers — treat them as approximate and hardware-dependent, not lab benchmarks.

In my own testing, an 8GB RTX 3060 Ti running a FLUX.1-dev Q4_K_S GGUF with --lowvram produced a 1024×1024, 20-step image in roughly 90-150 seconds once the model was cached in RAM, and FLUX.2 Klein 4B at Q4_K_M (4 steps) came in much faster, in the ballpark of 15-30 seconds. The first generation after launch is always slower because the weights have to load from disk into RAM. These are single-machine figures, so your results will vary with CPU, RAM speed, and resolution.

GPU class	Model + quant	Resolution / steps	Approx time per image
RTX 3060 Ti 8GB	FLUX.1-dev Q4_K_S + --lowvram	1024², 20 steps	~90-150s
RTX 3060 Ti 8GB	FLUX.2 Klein 4B Q4_K_M	1024², 4 steps	~15-30s
RTX 3060 12GB	FLUX.1-dev Q4_K_S	1024², 20 steps	~50-90s
GTX 1060 6GB	FLUX.1 GGUF Q3/Q4 + --lowvram	512², 10 steps	~9 min
GTX 1050 (2-4GB)	FLUX Q2/Q3 + --novram	512², few steps	many minutes (proof-of-concept)

The takeaway is stark: the move to a 4-step model (Klein 4B, or FLUX.1-schnell) is the single biggest speed win on a small card, because you're doing 4 denoising passes instead of 20. On truly old hardware like a GTX 1060 6GB, a single FLUX image at 512×512 can take around 9 minutes — usable for experimentation, painful for iteration. A GTX 1050 with only 2-4GB technically runs FLUX with Q2/Q3 GGUF and --novram, but it's a "prove it's possible" experience, not a workflow. If your card is that old, the 4-step Klein 4B model is the only sane choice.
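
The "4 passes instead of 20" argument is simple arithmetic on the measured numbers above. This sketch assumes time scales linearly with step count and ignores fixed costs such as text encoding and VAE decoding (the optional overhead term is a placeholder, not a measured value); real per-step cost also differs between a 12B and a 4B model.

```python
def seconds_per_step(total_seconds, steps):
    """Average sampling time per denoising step."""
    return total_seconds / steps


def estimated_seconds(steps, per_step, fixed_overhead=0.0):
    """Linear model: steps x per-step cost, plus any fixed per-image cost."""
    return steps * per_step + fixed_overhead


# RTX 3060 Ti 8GB, FLUX.1-dev Q4_K_S + --lowvram: ~90-150 s for 20 steps (table above).
fast, slow = seconds_per_step(90, 20), seconds_per_step(150, 20)
print(f"FLUX.1-dev per step: {fast:.1f}-{slow:.1f} s")

# Same per-step cost, but only 4 steps:
print(f"4 steps at that cost: {estimated_seconds(4, fast):.0f}-{estimated_seconds(4, slow):.0f} s")
```

The resulting 18-30 s range is consistent with the measured ~15-30 s for Klein 4B, in line with the point that step count is the biggest lever on a small card.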

Will it run on my 8GB card? (Verdict box)

The "will FLUX run on my card" verdict

12GB+ (RTX 3060 12GB, 4070, 3090): Yes, comfortably. Run FLUX.1-dev Q5_K_S/Q6_K or FLUX.2 Klein 4B at Q8_0 with no offloading needed.
8GB (RTX 3060 Ti, 2070, 2080, 4060): Yes. Best pick = FLUX.2 Klein 4B (any quant) or FLUX.1-dev Q4_K_S + --lowvram. Use the GGUF T5 encoder.
6GB (RTX 2060, GTX 1660): Yes, with care. Use FLUX.2 Klein 4B, or FLUX.1 Q3_K_S + --lowvram. Keep resolution at 768-1024 and expect slower runs.
4GB (GTX 1650, 1050 Ti): Marginal. FLUX.2 Klein 4B Q3/Q2 or FLUX.1 Q2_K + --lowvram/--novram, 512², low steps. Slow but possible.
Under 4GB / GTX 1050 2GB: Technically yes with --novram and Q2 GGUF, but minutes per image. Treat it as a proof of concept, not a tool.

Rule of thumb: 16GB+ of system RAM (32GB ideal) matters as much as VRAM on low-end cards, because offloading parks the weights there.

Klein 4B vs FLUX.1: which low-VRAM model wins?

To settle the head-to-head for low-VRAM owners, here's how the two main families compare on the things that decide it on a small card:

	FLUX.2 [klein] 4B	FLUX.1-dev	FLUX.1-schnell
Parameters	4B	12B	12B
Released	Jan 15, 2026	Aug 2024	Aug 2024
License	Apache 2.0 (commercial OK)	Non-commercial	Apache 2.0 (commercial OK)
Steps to generate	4	~20	4
GGUF Q4 size	~2.6 GB	~6.8 GB	~6.8 GB
Fits 8GB?	Easily, even at Q8	Yes at Q4 + --lowvram	Yes at Q4 + --lowvram
LoRA ecosystem	Newer, growing	Largest	Large
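
The GGUF sizes in the table track parameter count closely, which a back-of-envelope estimate makes visible: file size ≈ parameters × bits per weight ÷ 8. The sketch below uses the two GGUF formats whose bit widths follow directly from their block layout (Q4_0 stores 32 weights as 4-bit values plus one fp16 scale, 4.5 bits per weight; Q8_0 stores 32 8-bit values plus one fp16 scale, 8.5 bits per weight), plus plain fp16 and fp8 for comparison. Parameter counts are the rounded 12B and 4B figures, and real files keep some tensors at higher precision, so treat the results as estimates.

```python
# Bits stored per weight, including per-block scales.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "fp8": 8.0,    # e.g. fp8_e4m3fn weights
    "Q8_0": 8.5,   # 32 x 8-bit values + one fp16 scale = 34 bytes per block
    "Q4_0": 4.5,   # 32 x 4-bit values + one fp16 scale = 18 bytes per block
}


def est_size_gb(params, fmt):
    """Estimated weight size in decimal GB: params x bits / 8 bytes."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for name, params in (("FLUX.1-dev", 12e9), ("FLUX.2 klein 4B", 4e9)):
    sizes = ", ".join(f"{fmt} ~{est_size_gb(params, fmt):.2f} GB" for fmt in BITS_PER_WEIGHT)
    print(f"{name}: {sizes}")
```

Against the quant table, FLUX.1-dev comes out at ~6.75 vs ~6.8 GB (Q4_0) and ~12.75 vs ~12.7 GB (Q8_0); Klein's estimates run a little under the listed ~2.5 and ~4.3 GB. The same arithmetic shows why the fp8 route from the flags section still needs offloading on 8GB: 12B weights at 8 bits are ~12 GB.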

For a fresh low-VRAM build in 2026, FLUX.2 Klein 4B is the easiest recommendation: smallest weights, 4-step speed, and a commercial-friendly Apache 2.0 license. Keep a FLUX.1-schnell GGUF alongside it if you want the FLUX.1 aesthetic with a permissive license and 4-step speed. Reach for FLUX.1-dev only if a specific LoRA or workflow you need was built for it — its non-commercial license and 20-step default make it the heaviest of the three on a small card. (Note: the full FLUX.2 [dev] model is a 32B monster released November 25, 2025 that needs an H100-class GPU — it is not a low-VRAM option, so don't confuse it with Klein.)

You can confirm all of the model details, licenses, and step counts on the official Black Forest Labs FLUX.2 repository, and download the exact GGUF quants and file sizes from the city96 FLUX.1-dev GGUF model card on Hugging Face.

Key Takeaways

Best FLUX for 8GB VRAM in 2026 = FLUX.2 [klein] 4B (4B weights are Apache 2.0, announced Jan 15 2026), ~2.6 GB at Q4_K_M GGUF, 4 steps. It's the smallest, fastest, and most license-friendly low-VRAM pick.
For the classic FLUX.1 look, use a GGUF at Q4_K_S (~6.8 GB) or Q4_0 through the city96 ComfyUI-GGUF loader, and add --lowvram. Drop to Q3_K_S (~5.2 GB) on 6GB, Q2_K (~4.0 GB) only as a last resort.
Always grab the quantized GGUF T5 encoder, not the fp16 one — the fp16 T5 alone is ~9 GB and won't fit alongside the model on 8GB.
--lowvram is the main offloading flag (4-8GB, ~20-30% slower); --novram is the under-4GB last resort; set weight_dtype = fp8_e4m3fn in the node for fp8 (non-GGUF) workflows.
4-step models win on slow cards. Klein 4B or FLUX.1-schnell render in 4 steps; FLUX.1-dev takes ~20, which is brutal on a GTX 1060 (~9 min/image at 512²). Pair any low-VRAM build with 16-32GB of system RAM.

Next Steps

Need the full per-GPU breakdown? See FLUX VRAM requirements by GPU to size the model to your exact card.
New to running FLUX locally? Start with the FLUX local image generation guide for a complete first-image walkthrough.
Setting up the interface? Our complete ComfyUI guide covers installation, nodes, and the Manager.
Ready for the newest model? Read the FLUX.2 local setup guide for Klein and dev workflows.
Picking a GPU upgrade? Compare the two most popular budget cards in RTX 4060 vs 3060 for AI.

Run FLUX on 6-8GB VRAM (2026): GGUF & Offloading

Local AI Master Research Team

Written by the Local AI Master Team
