The best way to run FLUX.2 locally in 2026 is to pick the variant your VRAM can hold: FLUX.2 [dev] is a 32B model that needs about 64 GB at BF16, ~32 GB at FP8, or roughly 19 GB as a GGUF Q4_K_S on a 24 GB RTX 4090 — while FLUX.2 [klein] 9B fits a 16 GB card at FP8 (~15 GB) and FLUX.2 [klein] 4B runs on a 12 GB card (~13 GB BF16, less when offloaded). Install ComfyUI, drop the matching diffusion-model, text-encoder and VAE files into the right folders, load the official FLUX.2 template, and you are generating. If you only have an 8-12 GB GPU, run Klein 4B or a low GGUF build of dev; if you have no usable GPU at all, a rented cloud GPU is the realistic fallback — there is no fast CPU-only path for a 32B image model.

This guide is FLUX.2-specific. Our older FLUX.1 local setup guide still applies to the FLUX.1 family, but FLUX.2 changed the parameter count, the text encoder, and the VRAM math enough that the FLUX.1 numbers will mislead you. Below is the corrected, verified picture for the FLUX.2 lineup as of mid-2026.

What is FLUX.2 and how is it different from FLUX.1?

FLUX.2 is Black Forest Labs' second-generation open-weight image family. FLUX.2 [dev] shipped on November 25, 2025, and the lighter, distilled FLUX.2 [klein] models (9B and 4B) followed on January 15, 2026. Compared with FLUX.1, the practical wins are:

Much better text rendering. Legible signage, UI mockups, memes and infographics are far more reliable than FLUX.1, which routinely garbled longer strings. Black Forest Labs frames this as production-grade typography rather than the occasional lucky word.
Stronger prompt adherence. FLUX.2 follows multi-part, structured prompts (compositional constraints, "X on the left, Y on the right") more faithfully, which cuts the retry loop.
Higher resolution. FLUX.2 generates up to roughly 4 megapixels, useful for print and aggressive cropping.
Multi-reference editing up to 10 images. You can feed character sheets, product shots or style boards and keep consistency across them — a genuinely new capability versus FLUX.1.
A new text encoder. FLUX.2 [dev] swaps FLUX.1's dual CLIP+T5 setup for a single Mistral Small 24B vision-language encoder; the Klein models use a Qwen3 text embedder (8B for Klein 9B, 4B for Klein 4B). This is why the FLUX.1 download list does not transfer.

The catch is size. FLUX.1 [dev] was a 12B model that fit a 24 GB card at FP16. FLUX.2 [dev] is 32B, so the headline model no longer fits a consumer card without quantization. That single fact reshapes the whole "how much VRAM" conversation, so let's get exact.

How much VRAM does FLUX.2 dev need?

FLUX.2 [dev] is a 32-billion-parameter rectified-flow transformer. Here are the verified figures, sourced from Black Forest Labs, NVIDIA and the FLUX.2 inference repo. "Practical total" assumes the Mistral text encoder and VAE are also resident or streamed; with CPU offload you can push lower at a speed cost.

FLUX.2 [dev] precision	Approx model VRAM	Realistic GPU	Notes
BF16 / FP16 (full)	~64 GB	H100 80G / H200 / B200	Reference quality; not a consumer target
FP8 (fp8_scaled)	~32 GB	H100 PCIe / A100 80G / RTX 5090 32G (tight)	~40% VRAM cut, near-identical quality
GGUF Q4_K_S	~19 GB	RTX 4090 24 GB	The realistic single-consumer-GPU path for dev
Lower GGUF (Q3/Q2) + offload	<16 GB	12-16 GB cards w/ lots of system RAM	Works, but quality and speed drop noticeably

So the honest answer to "flux.2 dev vram requirements 2026": you want a 24 GB card at minimum to run [dev] comfortably, via a GGUF Q4 build at about 19 GB. FP8 (~32 GB) needs a 32 GB+ card. Full BF16 (~64 GB) is a datacenter affair. If your card is below 24 GB, do not fight the 32B [dev] model; use Klein, which the next section covers. To sanity-check any specific quant against your exact card and target resolution, run it through our 2026 VRAM requirements breakdown.
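The figures above follow from simple arithmetic on parameter count and bits per weight. The sketch below reproduces them for the weights only. The bits-per-weight value used for Q4_K_S (about 4.6) is an assumption based on typical K-quant averages, not a number published for FLUX.2, and real usage adds the text encoder, VAE and activations on top.

```python
# Rough weight-only VRAM estimate: parameters x bits per weight.
# Assumption: Q4_K_S averages ~4.6 bits per weight; real GGUF files
# vary with the mix of tensors kept at higher precision.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "gguf_q4_k_s": 4.6,  # assumed average, not an official figure
}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    return params_billion * bits / 8.0


for fmt in BITS_PER_WEIGHT:
    print(f"FLUX.2 [dev] 32B @ {fmt}: ~{weight_gb(32, fmt):.1f} GB")
```

Under these assumptions the GGUF estimate comes out slightly below the ~19 GB quoted above, which is what you would expect from a weights-only approximation.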

FLUX.2 dev vs Klein 9B vs Klein 4B — full spec table

This is the comparison most people actually need: which FLUX.2 variant matches your GPU. Klein is the distilled, consumer-friendly branch (4 inference steps, sub-second on high-end cards); [dev] is the heavyweight.

Variant	Params	Released	Text encoder	Steps	VRAM (native)	VRAM (FP8)	License
FLUX.2 [dev]	32B dense	Nov 25, 2025	Mistral Small 24B	~28-50	~64 GB BF16	~32 GB	FLUX dev non-commercial
FLUX.2 [klein] 9B	9B flow + 8B Qwen3	Jan 15, 2026	Qwen3 8B	4 (distilled)	~29 GB BF16	~15 GB (fits 16 GB)	FLUX non-commercial
FLUX.2 [klein] 4B	4B flow + 4B Qwen3	Jan 15, 2026	Qwen3 4B	4 (distilled)	~13 GB BF16	~8-9 GB measured	Apache 2.0

Three things are worth calling out. First, Klein 4B is the only FLUX.2 weight under a permissive Apache 2.0 license; [dev] and Klein 9B are non-commercial, so if you need commercial use without an API, 4B is your model. Second, Klein 9B at FP8 lands around 15 GB, which is the sweet spot for a 16 GB card (RTX 4070 Ti Super, 4080, or a 16 GB Mac via comparable tooling). Third, Klein 4B is a 12 GB-class model only once it is quantized or offloaded: Black Forest Labs lists ~13 GB at BF16 (RTX 3090/4070 and up), while ComfyUI's own template reports the distilled 4B running in roughly 8.4 GB with offload on a 5090. Treat the low end as offload-assisted, not a guarantee on an 8 GB card.

What's the best way to run FLUX.2 locally? (step-by-step in ComfyUI)

ComfyUI is the path of least resistance — it ships native FLUX.2 templates, so you are not hand-wiring nodes. Here is the clean sequence.

Step 1 — Install or update ComfyUI. Use a recent build (the FLUX.2 nodes and templates are baked into current releases). If you plan to run GGUF [dev] builds, also install the ComfyUI-GGUF custom node by city96.

Step 2 — Pick your variant by VRAM:

8-12 GB GPU → FLUX.2 [klein] 4B (FP8 or GGUF).
16 GB GPU → FLUX.2 [klein] 9B (FP8, ~15 GB).
24 GB GPU → FLUX.2 [dev] as GGUF Q4_K_S (~19 GB), or Klein 9B for speed.
32 GB+ / datacenter → FLUX.2 [dev] FP8 or BF16.
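The Step 2 mapping can be written as a small lookup. This is a minimal sketch of the tiers listed above; the function name is hypothetical, and sending in-between sizes (for example 20 GB) to the next tier down is an assumption rather than something the list states.

```python
def pick_variant(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the variant suggested in Step 2."""
    if vram_gb >= 32:
        return "FLUX.2 [dev] FP8 or BF16"
    if vram_gb >= 24:
        return "FLUX.2 [dev] GGUF Q4_K_S (~19 GB), or Klein 9B for speed"
    if vram_gb >= 16:
        return "FLUX.2 [klein] 9B FP8 (~15 GB)"
    if vram_gb >= 8:
        return "FLUX.2 [klein] 4B (FP8 or GGUF)"
    # Below 8 GB the guide offers no local recommendation.
    return "No local fit; consider a rented cloud GPU"


for size in (8, 12, 16, 24, 32):
    print(f"{size} GB -> {pick_variant(size)}")
```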

Step 3 — Download the three files for your variant and place them in the right folders. For Klein (the consumer case), ComfyUI's template expects, for the 9B:

Diffusion model → ComfyUI/models/diffusion_models/ (e.g. flux-2-klein-9b-fp8.safetensors)
Text encoder → ComfyUI/models/text_encoders/ (e.g. qwen_3_8b_fp8mixed.safetensors)
VAE → ComfyUI/models/vae/ (flux2-vae.safetensors)

For Klein 4B, use flux-2-klein-4b-fp8.safetensors with qwen_3_4b.safetensors and the same flux2-vae.safetensors. For [dev], the diffusion file is your chosen GGUF or FP8 build and the text encoder is mistral_3_small_flux2_fp8.safetensors (or the BF16 version if you have the VRAM).
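Before loading a template, it can help to confirm the three files landed where ComfyUI looks for them. The sketch below checks the Klein 9B FP8 set using the folder and file names listed above; the helper name and the idea of passing in the ComfyUI root path are illustrative assumptions, so adjust the dictionary for your variant.

```python
from pathlib import Path

# Expected files for Klein 9B FP8, taken from the list above.
# Keys are sub-folders of <ComfyUI>/models, values are file names.
KLEIN_9B_FILES = {
    "diffusion_models": "flux-2-klein-9b-fp8.safetensors",
    "text_encoders": "qwen_3_8b_fp8mixed.safetensors",
    "vae": "flux2-vae.safetensors",
}


def missing_files(comfy_root: str, expected: dict[str, str]) -> list[str]:
    """Return the relative paths under comfy_root that do not exist."""
    models = Path(comfy_root) / "models"
    missing = []
    for folder, filename in expected.items():
        if not (models / folder / filename).is_file():
            missing.append(f"models/{folder}/{filename}")
    return missing
```

Calling `missing_files("ComfyUI", KLEIN_9B_FILES)` from the directory that contains your install returns an empty list when everything is in place.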

Step 4 — Load the official FLUX.2 template from ComfyUI's template browser. The Klein templates are named like image_flux2_klein_text_to_image.json (4B) and image_flux2_text_to_image_9b.json (9B), with separate image-edit templates for single- and multi-reference work. The template pre-wires the loaders, sampler and VAE decode so you only set the prompt.

Step 5 — Generate. Klein is distilled to 4 steps, so first images come back fast; [dev] uses more steps and is slower but higher fidelity. Iterate on the prompt, then raise resolution toward the ~4 MP ceiling once you like the composition.
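When stepping resolution up toward the ~4 MP ceiling, it is convenient to derive width and height from a megapixel budget and an aspect ratio. The sketch below does that, with two assumptions that do not come from the FLUX.2 documentation: one megapixel is counted as 1,000,000 pixels, and dimensions are rounded down to a multiple of 16, a common requirement for latent-diffusion pipelines.

```python
import math


def dims_for(megapixels: float, aspect_w: int, aspect_h: int,
             multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) at or below the pixel budget.

    Assumes 1 MP = 1,000,000 pixels and snaps both sides down to
    `multiple` (16 by default, an assumed alignment).
    """
    pixels = megapixels * 1_000_000
    height = math.sqrt(pixels * aspect_h / aspect_w)
    width = height * aspect_w / aspect_h

    def snap(value: float) -> int:
        return int(value // multiple) * multiple

    return snap(width), snap(height)


print(dims_for(1.0, 1, 1))    # square draft size
print(dims_for(4.0, 16, 9))   # widescreen near the ~4 MP ceiling
```

Rounding down rather than to the nearest multiple keeps the result under the budget, which matters most at the top of the range.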

If you are new to ComfyUI's node graph, work through our complete ComfyUI guide first — it covers installation, model folders, custom nodes and the sampler/VAE chain that the FLUX.2 templates assume you understand.

Quantization options: FP8, GGUF and NF4 explained

You will see four formats in the wild. Choose by what your card can hold while keeping quality acceptable:

BF16 / FP16: full precision, reference quality, biggest VRAM. Only realistic for Klein 4B (~13 GB) on consumer cards, or [dev] on datacenter GPUs.
FP8 (fp8_scaled): one byte per weight instead of two, so the weight footprint roughly halves versus BF16 (often quoted as about a 40% cut in overall VRAM) with near-identical quality; the recommended format on RTX 40/50-series. This is how Klein 9B drops to ~15 GB and [dev] to ~32 GB.
GGUF (Q8 down to Q2): flexible quantization with CPU/GPU offload via the ComfyUI-GGUF node. For [dev], Q4_K_S (~19 GB) is the workable 24 GB-card build; lower quants trade quality for fit. Keep FP16 compute even with GGUF weights, since stacking FP8 on top of GGUF buys little and can cause instability.
NF4 / NVFP4: 4-bit formats; NVFP4 is fastest on RTX 50-series specifically. Useful for squeezing more headroom, with a quality trade-off versus FP8.

A practical rule: prefer FP8 if it fits, drop to GGUF Q4 if it does not, and only go below Q4 when you have no other way to load the model.
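That rule is easy to encode as a preference-ordered fit check. In this sketch the format keys (`fp8`, `gguf_q4`, `gguf_low`) are hypothetical labels, the footprints are the approximate figures from this guide, and the 1 GB default headroom is an assumption you should tune for your resolution and workflow.

```python
# Preference order from the rule above: FP8, then GGUF Q4, then lower quants.
PREFERENCE = ["fp8", "gguf_q4", "gguf_low"]

# Approximate footprints in GB, taken from this guide.
DEV = {"fp8": 32.0, "gguf_q4": 19.0}
KLEIN_9B = {"fp8": 15.0}


def choose_format(vram_gb: float, footprints_gb: dict[str, float],
                  headroom_gb: float = 1.0):
    """Return the first preferred format that fits, or None if none does."""
    for fmt in PREFERENCE:
        size = footprints_gb.get(fmt)
        if size is not None and size + headroom_gb <= vram_gb:
            return fmt
    return None


print(choose_format(24, DEV))       # 24 GB card with [dev]
print(choose_format(16, KLEIN_9B))  # 16 GB card with Klein 9B
```

A `None` result is the signal to drop to a smaller variant rather than a lower quant.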

First-hand notes on speed and fit

Treat these as approximate, single-machine observations rather than a controlled benchmark. On a 24 GB RTX 4090, a GGUF Q4_K_S build of FLUX.2 [dev] loads with a few GB of headroom for activations and renders a 1 MP image in the tens-of-seconds range. That is slower than FLUX.1 [dev] was, which is the cost of going from 12B to 32B. FLUX.2 [klein] 9B at FP8 on a 16 GB card felt dramatically snappier because it is distilled to 4 steps; first images come back in seconds, which makes it the better choice for interactive iteration even though [dev] edges it on fine detail. The familiar cliff still applies: the moment weights spill from VRAM into system RAM, generation time balloons, so keep the model resident or use deliberate offload rather than accidental overflow. For a per-GPU view of where these models comfortably sit, our Forge / Stable Diffusion setup guide covers the same VRAM-budgeting habits on a lighter engine.
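One way to catch accidental overflow before it happens is to compare the model's footprint with the VRAM that is actually free. The sketch below parses the CSV output of `nvidia-smi --query-gpu=memory.total,memory.used --format=csv,noheader,nounits`, which reports values in MiB; the helper names are hypothetical, and the check ignores activations and any other process that allocates memory later.

```python
import subprocess


def parse_free_gb(smi_csv: str) -> list[float]:
    """Convert 'total, used' MiB rows into free GB (1e9 bytes) per GPU."""
    free = []
    for line in smi_csv.strip().splitlines():
        total_mib, used_mib = (float(x) for x in line.split(","))
        free.append((total_mib - used_mib) * 1024**2 / 1e9)
    return free


def query_free_gb() -> list[float]:
    """Ask nvidia-smi for per-GPU memory; requires an NVIDIA driver."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_free_gb(out)


def will_spill(model_gb: float, free_gb: float) -> bool:
    """True when the weights alone exceed the free VRAM."""
    return model_gb > free_gb
```

On a machine with an NVIDIA GPU, `will_spill(19.0, query_free_gb()[0])` gives a quick read on whether the Q4_K_S build of [dev] would stay resident on the first card.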

No GPU? The honest fallback

There is no fast CPU-only route for a 32B image model: FLUX.2 [dev] on CPU is effectively unusable, and even Klein is impractical without a GPU. The realistic "no local GPU" options are to rent a cloud GPU by the hour (an A100 80G or H100 runs [dev] at FP8 comfortably; a 24 GB cloud 4090 runs the GGUF Q4 build), or to use Black Forest Labs' hosted FLUX.2 API for the [pro]/[max] tiers if you do not need the weights locally. If your goal is specifically local and free, match the variant to your hardware: Klein 4B is the entry point and is Apache 2.0, so it is the one most people without high-end cards should start on.

Key Takeaways

FLUX.2 [dev] is 32B, not 12B like FLUX.1 [dev] — it needs ~64 GB BF16, ~32 GB FP8, or ~19 GB as a GGUF Q4_K_S on a 24 GB RTX 4090.
Match the variant to your VRAM: Klein 4B for 12 GB (Apache 2.0), Klein 9B FP8 (~15 GB) for 16 GB, [dev] GGUF Q4 for 24 GB, [dev] FP8/BF16 for 32 GB+/datacenter.
FLUX.2 beats FLUX.1 on text rendering, prompt adherence, ~4 MP resolution, and multi-reference editing up to 10 images, with a new Mistral/Qwen3 text-encoder stack.
ComfyUI is the easiest path: install, drop the diffusion/text-encoder/VAE files in their folders, load the native FLUX.2 template, generate. Use the ComfyUI-GGUF node for [dev] GGUF builds.
FP8 first, GGUF Q4 if it doesn't fit. Keep the model fully in VRAM (or use deliberate offload) to avoid the system-RAM speed cliff.

Next Steps

Coming from FLUX.1? Compare against our FLUX.1 local image generation guide to see exactly what changed in the download list and VRAM math.
New to the node editor? Start with the complete ComfyUI guide before loading the FLUX.2 templates.
Want a lighter, faster local image model to pair with Klein? See our writeup on Z-Image Turbo in ComfyUI.
Confirm the official specs on the FLUX.2 inference repo and the FLUX.2 Klein 9B model card.

Run FLUX.2 Locally (2026): Klein 9B/4B VRAM + ComfyUI

Local AI Master Research Team

Written by the Local AI Master Team
