
Run FLUX.2 Locally (2026): Klein 9B/4B VRAM + ComfyUI

June 20, 2026
12 min read
Local AI Master Research Team

The best way to run FLUX.2 locally in 2026 is to pick the variant your VRAM can hold. FLUX.2 [dev] is a 32B model that needs about 64 GB at BF16, ~32 GB at FP8, or roughly 19 GB as a GGUF Q4_K_S build on a 24 GB RTX 4090. FLUX.2 [klein] 9B fits a 16 GB card at FP8 (~15 GB), and FLUX.2 [klein] 4B runs on a 12 GB card at FP8 (~8-9 GB); its BF16 weights need ~13 GB, so at full precision it only fits a 12 GB card with offload. Install ComfyUI, drop the matching diffusion-model, text-encoder and VAE files into the right folders, load the official FLUX.2 template, and you are generating. If you only have an 8-12 GB GPU, run Klein 4B or a low-bit GGUF build of [dev] with offload; if you have no usable GPU at all, a rented cloud GPU is the realistic fallback, because there is no fast CPU-only path for a 32B image model.

This guide is FLUX.2-specific. Our older FLUX.1 local setup guide still applies to the FLUX.1 family, but FLUX.2 changed the parameter count, the text encoder, and the VRAM math enough that the FLUX.1 numbers will mislead you. Below is the corrected, verified picture for the FLUX.2 lineup as of mid-2026.

What is FLUX.2 and how is it different from FLUX.1?

FLUX.2 is Black Forest Labs' second-generation open-weight image family. FLUX.2 [dev] shipped on November 25, 2025, and the lighter, distilled FLUX.2 [klein] models (9B and 4B) followed on January 15, 2026. Compared with FLUX.1, the practical wins are:

  • Much better text rendering. Legible signage, UI mockups, memes and infographics come out far more reliably than with FLUX.1, which routinely garbled longer strings. Black Forest Labs frames this as production-grade typography rather than the occasional lucky word.
  • Stronger prompt adherence. FLUX.2 follows multi-part, structured prompts (compositional constraints, "X on the left, Y on the right") more faithfully, which cuts the retry loop.
  • Higher resolution. FLUX.2 generates up to roughly 4 megapixels, useful for print and aggressive cropping.
  • Multi-reference editing up to 10 images. You can feed character sheets, product shots or style boards and keep consistency across them — a genuinely new capability versus FLUX.1.
  • A new text encoder. FLUX.2 [dev] swaps FLUX.1's dual CLIP+T5 setup for a single Mistral Small 24B vision-language encoder; the Klein models use Qwen3 text embedders (Qwen3 8B for Klein 9B, Qwen3 4B for Klein 4B). This is why the FLUX.1 download list does not transfer.

The catch is size. FLUX.1 [dev] was a 12B model that fit a 24 GB card at FP16. FLUX.2 [dev] is 32B, so the headline model no longer fits a consumer card without quantization. That single fact reshapes the whole "how much VRAM" conversation, so let's get exact.

How much VRAM does FLUX.2 [dev] need?

FLUX.2 [dev] is a 32-billion-parameter rectified-flow transformer. Here are the verified figures, sourced from Black Forest Labs, NVIDIA and the FLUX.2 inference repo. The VRAM column covers the diffusion model itself; the Mistral text encoder and VAE also have to be resident or streamed in, and with CPU offload you can push lower at a speed cost.

| FLUX.2 [dev] precision | Approx. model VRAM | Realistic GPU | Notes |
|---|---|---|---|
| BF16 / FP16 (full) | ~64 GB | H100 80 GB / H200 / B200 | Reference quality; not a consumer target |
| FP8 (fp8_scaled) | ~32 GB | H100 PCIe / A100 80 GB / RTX 5090 32 GB (tight) | ~40% VRAM cut, near-identical quality |
| GGUF Q4_K_S | ~19 GB | RTX 4090 24 GB | The realistic single-consumer-GPU path for [dev] |
| Lower GGUF (Q3/Q2) + offload | <16 GB | 12-16 GB cards with lots of system RAM | Works, but quality and speed drop noticeably |

So the honest answer to how much VRAM FLUX.2 [dev] needs in 2026: you want a 24 GB card at minimum to run [dev] comfortably, via a GGUF Q4 build at about 19 GB. FP8 (~32 GB) needs a 32 GB+ card. Full BF16 (~64 GB) is a datacenter affair. If your card is below 24 GB, do not fight the 32B [dev] model; use Klein, which the next section covers. To sanity-check a specific quant against your exact card, run it through our 2026 VRAM requirements breakdown.
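The figures above follow from simple bytes-per-parameter arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. Here is a minimal sketch covering weights only (no text encoder, VAE or activations). The 4.5 bits per weight for Q4_K_S is an assumption based on the nominal size of llama.cpp's Q4_K blocks; real GGUF files keep some tensors at higher precision, which is likely why the download lands nearer 19 GB than the estimate.

```python
# Rough weight-only VRAM estimate: params x bits-per-weight / 8.
# Excludes the text encoder, VAE, activations and framework overhead.

BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    # Assumption: nominal Q4_K block size; real Q4_K_S files run slightly larger.
    "gguf_q4_k_s": 4.5,
}


def weight_gb(params_billions: float, fmt: str) -> float:
    """Approximate weight footprint in decimal GB (1 GB = 1e9 bytes)."""
    bits = BITS_PER_WEIGHT[fmt]
    # (params_billions * 1e9) params * bits / 8 bytes, then / 1e9 bytes per GB
    return params_billions * bits / 8


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"FLUX.2 [dev] 32B @ {fmt}: ~{weight_gb(32, fmt):.0f} GB")
```

Running the same function on the 24B Mistral text encoder gives about 48 GB at BF16 and 24 GB at FP8, which is a reminder that the encoder has to be streamed or offloaded rather than kept resident next to a Q4 [dev] on a 24 GB card.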

FLUX.2 dev vs Klein 9B vs Klein 4B — full spec table

This is the comparison most people actually need: which FLUX.2 variant matches your GPU. Klein is the distilled, consumer-friendly branch (4 inference steps, sub-second on high-end cards); [dev] is the heavyweight.

| Variant | Params | Released | Text encoder | Steps | VRAM (native) | VRAM (FP8) | License |
|---|---|---|---|---|---|---|---|
| FLUX.2 [dev] | 32B dense | Nov 25, 2025 | Mistral Small 24B | ~28-50 | ~64 GB BF16 | ~32 GB | FLUX dev non-commercial |
| FLUX.2 [klein] 9B | 9B flow + 8B Qwen3 | Jan 15, 2026 | Qwen3 8B | 4 (distilled) | ~29 GB BF16 | ~15 GB (fits 16 GB) | FLUX non-commercial |
| FLUX.2 [klein] 4B | 4B flow + 4B Qwen3 | Jan 15, 2026 | Qwen3 4B | 4 (distilled) | ~13 GB BF16 | ~8-9 GB measured | Apache 2.0 |

Three things worth calling out. First, Klein 4B is the only FLUX.2 weight under a permissive Apache 2.0 license — [dev] and Klein 9B are non-commercial, so if you need commercial use without an API, 4B is your model. Second, Klein 9B at FP8 lands around 15 GB, which is the sweet spot for a 16 GB card (RTX 4070 Ti Super, 4080, or a 16 GB Mac via comparable tooling). Third, Klein 4B is genuinely a 12 GB-class model: Black Forest Labs lists ~13 GB BF16 (RTX 3090/4070 and up), and ComfyUI's own template reports the distilled 4B running in roughly 8.4 GB with offload on a 5090. Treat the low end as offload-assisted, not a guarantee on an 8 GB card.

What's the best way to run FLUX.2 locally? (step-by-step in ComfyUI)

ComfyUI is the path of least resistance — it ships native FLUX.2 templates, so you are not hand-wiring nodes. Here is the clean sequence.

Step 1 — Install or update ComfyUI. Use a recent build (the FLUX.2 nodes and templates are baked into current releases). If you plan to run GGUF [dev] builds, also install the ComfyUI-GGUF custom node by city96.

Step 2 — Pick your variant by VRAM:

  • 8-12 GB GPU → FLUX.2 [klein] 4B (FP8 or GGUF).
  • 16 GB GPU → FLUX.2 [klein] 9B (FP8, ~15 GB).
  • 24 GB GPU → FLUX.2 [dev] as GGUF Q4_K_S (~19 GB), or Klein 9B for speed.
  • 32 GB+ / datacenter → FLUX.2 [dev] FP8 or BF16.
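The VRAM tiers above, plus the licensing point from the spec table (only Klein 4B is Apache 2.0), fit in a small lookup. This is a hypothetical helper, not part of ComfyUI. The thresholds mirror this guide's recommendations; the 80 GB cut-off for BF16 and the sub-8 GB branch are assumptions, since the guide only says BF16 is a datacenter affair and treats the low end as offload-assisted at best.

```python
def pick_flux2_variant(vram_gb: float, commercial_use: bool = False) -> str:
    """Map GPU VRAM in GB to the FLUX.2 variant this guide recommends (hypothetical helper)."""
    if commercial_use:
        # Only Klein 4B is Apache 2.0; [dev] and Klein 9B weights are non-commercial.
        return "FLUX.2 [klein] 4B"
    if vram_gb >= 80:
        # Assumption: leave headroom above the ~64 GB of BF16 weights.
        return "FLUX.2 [dev] BF16"
    if vram_gb >= 32:
        return "FLUX.2 [dev] FP8"
    if vram_gb >= 24:
        return "FLUX.2 [dev] GGUF Q4_K_S"  # or Klein 9B FP8 if you want speed
    if vram_gb >= 16:
        return "FLUX.2 [klein] 9B FP8"
    if vram_gb >= 8:
        return "FLUX.2 [klein] 4B FP8 or GGUF"
    # Assumption: below 8 GB the guide offers no comfortable local option.
    return "FLUX.2 [klein] 4B with offload, or a cloud GPU"
```

For example, pick_flux2_variant(16) points a 16 GB card at Klein 9B FP8, while pick_flux2_variant(24, commercial_use=True) still returns Klein 4B because of the license.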

Step 3 — Download the three files for your variant and place them in the right folders. For Klein (the consumer case), ComfyUI's template expects, for the 9B:

  • Diffusion model → ComfyUI/models/diffusion_models/ (e.g. flux-2-klein-9b-fp8.safetensors)
  • Text encoder → ComfyUI/models/text_encoders/ (e.g. qwen_3_8b_fp8mixed.safetensors)
  • VAE → ComfyUI/models/vae/ (flux2-vae.safetensors)

For Klein 4B, use flux-2-klein-4b-fp8.safetensors with qwen_3_4b.safetensors and the same flux2-vae.safetensors. For [dev], the diffusion file is your chosen GGUF or FP8 build and the text encoder is mistral_3_small_flux2_fp8.safetensors (or the BF16 version if you have the VRAM).
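A misplaced file is the most common reason a template loads with red "missing model" nodes, so it can help to check the folders before opening ComfyUI. Here is a small sketch using only the standard library; the file names are the ones listed above and are assumptions about what you downloaded, so substitute your actual builds (including a GGUF or FP8 [dev] file).

```python
from pathlib import Path

# File names as listed in this guide; substitute the builds you actually downloaded.
KLEIN_9B_FILES = {
    "diffusion_models": "flux-2-klein-9b-fp8.safetensors",
    "text_encoders": "qwen_3_8b_fp8mixed.safetensors",
    "vae": "flux2-vae.safetensors",
}

KLEIN_4B_FILES = {
    "diffusion_models": "flux-2-klein-4b-fp8.safetensors",
    "text_encoders": "qwen_3_4b.safetensors",
    "vae": "flux2-vae.safetensors",
}


def missing_model_files(comfy_root: str, expected: dict[str, str]) -> list[str]:
    """Return 'folder/file' entries that are not present under <comfy_root>/models/."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, filename in expected.items():
        if not (models_dir / folder / filename).is_file():
            missing.append(str(Path(folder) / filename))
    return missing
```

Calling missing_model_files("ComfyUI", KLEIN_9B_FILES) from the directory that contains your ComfyUI install returns an empty list when all three files are in place; anything it returns is a file to move or download.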

Step 4 — Load the official FLUX.2 template from ComfyUI's template browser. The Klein templates are named like image_flux2_klein_text_to_image.json (4B) and image_flux2_text_to_image_9b.json (9B), with separate image-edit templates for single- and multi-reference work. The template pre-wires the loaders, sampler and VAE decode so you only set the prompt.
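If you would rather drive the template from a script than from the browser, ComfyUI's local server accepts workflows over HTTP on its /prompt endpoint. This is a minimal sketch, assuming ComfyUI is running at its default address (http://127.0.0.1:8188), that you have exported the FLUX.2 template in API format, and that you have looked up the id of the positive-prompt node in that export. The helper names are hypothetical, not a ComfyUI library API.

```python
import json
import urllib.request
import uuid
from typing import Optional

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default local address


def set_prompt_text(workflow: dict, node_id: str, text: str) -> None:
    """Overwrite the 'text' input of one prompt node in an API-format workflow."""
    node = workflow[node_id]
    if "text" not in node.get("inputs", {}):
        raise KeyError(f"node {node_id} has no 'text' input")
    node["inputs"]["text"] = text


def build_payload(workflow: dict, client_id: Optional[str] = None) -> dict:
    """Wrap the workflow in the JSON body that POST /prompt expects."""
    return {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}


def queue_prompt(workflow: dict, base_url: str = COMFY_URL) -> dict:
    """Submit the workflow; the JSON reply includes a prompt_id for tracking."""
    body = json.dumps(build_payload(workflow)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Usage, with a hypothetical export named flux2_klein_api.json whose prompt node happens to be "6": load it with json.load, call set_prompt_text(workflow, "6", "a neon sign reading OPEN LATE"), then queue_prompt(workflow). If the template ends in a Save Image node, the result lands in ComfyUI's output folder as it would from the browser.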

Step 5 — Generate. Klein is distilled to 4 steps, so first images come back fast; [dev] uses more steps and is slower but higher fidelity. Iterate on the prompt, then raise resolution toward the ~4 MP ceiling once you like the composition.
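Raising resolution toward the ceiling is easier if you derive width and height from a megapixel budget and an aspect ratio. A small sketch, assuming ComfyUI's convention that 1 megapixel means 1024 × 1024 pixels and that both sides should be multiples of 16; check both assumptions against the latent-size node in your template.

```python
import math


def dims_for_megapixels(megapixels: float, aspect_w: int, aspect_h: int,
                        multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) near the aspect ratio that stays within the pixel budget."""
    # Assumption: 1 MP = 1024 * 1024 pixels, as in ComfyUI's total-pixel scaling.
    budget = int(megapixels * 1024 * 1024)
    # Solve w * h = budget with w / h = aspect_w / aspect_h, using integer math.
    height = math.isqrt(budget * aspect_h // aspect_w)
    width = height * aspect_w // aspect_h
    # Round both sides down so the result never exceeds the budget.
    return (width // multiple) * multiple, (height // multiple) * multiple
```

For a 16:9 frame at the ~4 MP ceiling, dims_for_megapixels(4, 16, 9) returns (2720, 1536); at 1 MP and 1:1 it returns (1024, 1024). Rounding down rather than to the nearest multiple keeps you safely under the budget.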

If you are new to ComfyUI's node graph, work through our complete ComfyUI guide first — it covers installation, model folders, custom nodes and the sampler/VAE chain that the FLUX.2 templates assume you understand.

Quantization options: FP8, GGUF and NF4 explained

You will see four formats in the wild. Choose by what your card can hold while keeping quality acceptable:

  • BF16 / FP16: full precision, reference quality, biggest VRAM. Only realistic for Klein 4B (~13 GB) on consumer cards, or [dev] on datacenter GPUs.
  • FP8 (fp8_scaled): 1 byte per weight instead of 2, so the weights take roughly half the space with near-identical quality (the commonly quoted ~40% saving refers to total VRAM, which also includes parts that FP8 weight storage does not shrink). It is the recommended format on RTX 40/50-series, and it is how Klein 9B drops from ~29 GB to ~15 GB and [dev] from ~64 GB to ~32 GB.
  • GGUF (Q8 down to Q2): flexible CPU/GPU-offloadable quantization via the ComfyUI-GGUF node. For [dev], Q4_K_S (~19 GB) is the workable 24 GB-card build; lower quants trade quality for fit. Keep FP16 compute even with GGUF weights, since stacking FP8 on top of GGUF buys little and can cause instability.
  • NF4 / NVFP4: 4-bit formats; NVFP4 is fastest on RTX 50-series specifically. Useful for squeezing out more headroom, with a quality trade-off versus FP8.

A practical rule: prefer FP8 if it fits, drop to GGUF Q4 if it does not, and only go below Q4 when you have no other way to load the model.
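That rule is easy to encode: walk the formats in order of preference and take the first whose size, plus some headroom, fits your card. A sketch using the [dev] sizes from the VRAM table earlier in this guide; the 2 GB headroom default is an assumption, not a measured value, so raise it if you hit out-of-memory errors or see weights spilling into system RAM.

```python
from typing import Optional

# Approximate FLUX.2 [dev] diffusion-model sizes from this guide, in preference order.
DEV_OPTIONS = [
    ("fp8", 32.0),
    ("gguf_q4_k_s", 19.0),
]


def choose_format(vram_gb: float, options: list[tuple[str, float]],
                  headroom_gb: float = 2.0) -> Optional[str]:
    """Return the first format whose size plus headroom fits in VRAM, else None."""
    for name, size_gb in options:
        if size_gb + headroom_gb <= vram_gb:
            return name
    # Nothing fits: go below Q4 with offload, or switch to a Klein variant.
    return None
```

With the default headroom, a 24 GB card gets gguf_q4_k_s and a 32 GB RTX 5090 also lands on Q4_K_S, which matches the table's "tight" label for FP8; choose_format(32, DEV_OPTIONS, headroom_gb=0) returns fp8 if you want to try it with deliberate offload.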

First-hand notes on speed and fit

Treat these as approximate, single-machine observations rather than a controlled benchmark. On a 24 GB RTX 4090, a GGUF Q4_K_S build of FLUX.2 [dev] loads with a few GB of headroom for activations and renders a 1 MP image in the tens-of-seconds range. That is slower than FLUX.1 [dev] was, which is the cost of going from 12B to 32B. FLUX.2 [klein] 9B at FP8 on a 16 GB card felt dramatically snappier because it is distilled to 4 steps; first images come back in seconds, which makes it the better choice for interactive iteration even though [dev] edges it on fine detail. The familiar cliff still applies: the moment weights spill from VRAM into system RAM, generation time balloons, so keep the model resident or use deliberate offload rather than accidental overflow. For a per-GPU view of where these models comfortably sit, our Forge / Stable Diffusion setup guide covers the same VRAM-budgeting habits on a lighter engine.

No GPU? The honest fallback

There is no fast CPU-only route for a 32B image model: FLUX.2 [dev] on CPU is effectively unusable, and even Klein is impractical without a GPU. The realistic "no local GPU" options are to rent a cloud GPU by the hour (an A100 80G or H100 runs [dev] at FP8 comfortably; a 24 GB cloud 4090 runs the GGUF Q4 build), or to use Black Forest Labs' hosted FLUX.2 API for the [pro]/[max] tiers if you do not need the weights locally. If your goal is specifically local and free, match the variant to your hardware: Klein 4B is the entry point and is Apache 2.0, so it is the one most people without high-end cards should start on.

Key Takeaways

  1. FLUX.2 [dev] is 32B, not 12B like FLUX.1 [dev] — it needs ~64 GB BF16, ~32 GB FP8, or ~19 GB as a GGUF Q4_K_S on a 24 GB RTX 4090.
  2. Match the variant to your VRAM: Klein 4B for 12 GB (Apache 2.0), Klein 9B FP8 (~15 GB) for 16 GB, [dev] GGUF Q4 for 24 GB, [dev] FP8/BF16 for 32 GB+/datacenter.
  3. FLUX.2 beats FLUX.1 on text rendering, prompt adherence, ~4 MP resolution, and multi-reference editing up to 10 images, with a new Mistral/Qwen3 text-encoder stack.
  4. ComfyUI is the easiest path: install, drop the diffusion/text-encoder/VAE files in their folders, load the native FLUX.2 template, generate. Use the ComfyUI-GGUF node for [dev] GGUF builds.
  5. FP8 first, GGUF Q4 if it doesn't fit. Keep the model fully in VRAM (or use deliberate offload) to avoid the system-RAM speed cliff.

📅 Published: June 20, 2026 · 🔄 Last Updated: June 20, 2026 · ✓ Manually Reviewed
