Z-Image Turbo in ComfyUI (2026): Fast Local Image Generation

June 20, 2026
11 min read
Local AI Master Research Team

Z-Image Turbo is a real, open-weights text-to-image model released by Alibaba's Tongyi Lab (Tongyi-MAI) on November 27, 2025 — a 6-billion-parameter distilled model under the permissive Apache 2.0 license that generates a 1024×1024 image in roughly 2-3 seconds on an RTX 4090 using just 8 sampling steps. It runs locally in ComfyUI today: the standard BF16 build needs about 14-16GB of VRAM, an FP8 build fits in ~8GB, and community GGUF quants squeeze it onto 6GB cards — making it one of the fastest genuinely-local image models you can run in 2026.

If you have wrestled with FLUX taking 30+ seconds per image or SDXL needing 20-30 steps, Z-Image Turbo is the model that makes a single GPU feel interactive. This guide walks through verifying it is real, installing it in ComfyUI, picking the right VRAM tier, and how it actually stacks up against FLUX and SDXL — with figures cross-checked against the official Hugging Face model card and ComfyUI's own documentation.

Is Z-Image Turbo a real model?

Yes. This matters because the AI image space is full of rebrands and wrappers, so it is worth being precise about what Z-Image Turbo actually is.

  • Who made it: Tongyi Lab, the foundation-model group inside Alibaba (the same lineage behind the Qwen models). On Hugging Face the publisher is Tongyi-MAI, and the model card lists the canonical repo as Tongyi-MAI/Z-Image-Turbo.
  • What it is: a 6B-parameter text-to-image diffusion transformer. "Turbo" is the distilled variant tuned for low step counts; Tongyi has also announced a non-distilled Z-Image-Base (for fine-tuning) and an editing-focused Z-Image-Edit.
  • License: Apache 2.0 — open weights, commercial use permitted with minimal restrictions. That is a meaningfully more permissive license than FLUX.1 dev's non-commercial terms.
  • Architecture: a Scalable Single-Stream DiT (S3-DiT) that concatenates text tokens, visual-semantic tokens, and image VAE tokens into one unified sequence rather than running parallel streams.

The model, weights, and an official ComfyUI workflow are all published, so none of this is speculative — you can pull the files and reproduce it. The full model card lives at huggingface.co/Tongyi-MAI/Z-Image-Turbo.

What makes Z-Image Turbo fast?

The speed comes from distillation: Z-Image Turbo is trained to produce a finished image in about 8 model evaluations (NFEs) instead of the 20-50 a normal diffusion model needs. In ComfyUI's reference workflow that shows up as 9 steps, which (because of how the sampler counts the first and last point) results in 8 actual DiT forward passes.

Three things compound to make it quick:

  1. Few steps. 8 NFEs versus the typical 20-30 steps for SDXL and FLUX dev is roughly a 2.5-4× reduction in sampling steps per image before you change anything else.
  2. A compact 6B backbone. It is roughly half the size of FLUX.1's 12B parameters, so each step is cheaper too.
  3. Guidance disabled. Turbo runs with classifier-free guidance effectively off (guidance scale 0.0 / CFG 1.0), so each step is a single forward pass instead of two. Models that need a CFG above 1 pay double on every step.
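
To put rough numbers on the first and third points, here is a minimal back-of-the-envelope sketch. It only counts model evaluations and is not a benchmark. The `forward_passes` helper is hypothetical, and the 25-step / CFG 7.0 settings are illustrative values for a non-distilled model, not figures from any model card.

```python
def forward_passes(steps: int, cfg: float) -> int:
    """Count model evaluations needed for one image.

    With classifier-free guidance active (cfg > 1.0), each step evaluates the
    model twice (conditional + unconditional). Otherwise it evaluates once.
    """
    passes_per_step = 2 if cfg > 1.0 else 1
    return steps * passes_per_step


turbo = forward_passes(steps=8, cfg=1.0)     # distilled recipe described above
classic = forward_passes(steps=25, cfg=7.0)  # illustrative non-distilled settings

print(f"Turbo: {turbo} passes, classic: {classic} passes "
      f"({classic / turbo:.2f}x more)")
```

The step count and the guidance setting multiply, which is why a distilled, guidance-free model saves more than the step count alone suggests.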

The net result, per Alibaba's reporting, is sub-second latency on an enterprise H800 and a couple of seconds on consumer hardware. Alibaba also claims quality competitive with much larger 20B-class closed models, especially on photorealistic portraits.

How much VRAM does Z-Image Turbo need?

This is where Z-Image Turbo earns its place on a "local AI" site: it scales down to genuinely modest cards. The model itself is small, and the VRAM bill depends mostly on the precision you load it at: higher precision costs more memory.

| Build / precision | Approx. VRAM | Typical card | Notes |
| --- | --- | --- | --- |
| BF16 (full) | ~14-16 GB | RTX 4080 / 4090, 3090 | Official ComfyUI build; best quality |
| FP8 (e4m3fn) | ~8 GB | RTX 4060 Ti 16GB, 3060 12GB | Near-BF16 quality, big VRAM savings |
| GGUF (Q4-Q5) | ~5-6 GB | RTX 3050, laptop GPUs | Community quants; smallest footprint |

For contrast, the unquantized FLUX.1 dev generally wants a 24GB card to run comfortably at full precision. Z-Image Turbo's BF16 build already fits in 16GB, and the FP8/GGUF builds drop it well below that — so a mainstream 8-12GB GPU is enough to run it locally, which is not true of full-fat FLUX.
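
The tiers follow from simple arithmetic on the weights. The sketch below estimates the size of the 6B diffusion weights alone at each precision. It is an approximation under stated assumptions: `weight_memory_gb` is a hypothetical helper, the 4.5 bits-per-weight figure for a Q4-class GGUF quant is an assumed average, and the result excludes the text encoder, VAE, and activations, which is why real VRAM use sits above these numbers.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Bits per weight for each build. The GGUF value is an assumed average,
# since quantized formats store extra scale data alongside the weights.
PRECISIONS = {"BF16": 16, "FP8": 8, "GGUF Q4-class (assumed 4.5 bits)": 4.5}

for name, bits in PRECISIONS.items():
    print(f"{name}: ~{weight_memory_gb(6, bits):.1f} GB of weights")
```

Twelve gigabytes of BF16 weights plus the other components is consistent with the ~14-16 GB figure in the table.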

If you are choosing or upgrading a card for this kind of work, our companion guide on the best GPUs for AI in 2026 breaks down VRAM-per-dollar across the current lineup.

How do I set up Z-Image Turbo in ComfyUI?

ComfyUI ships an official Z-Image Turbo template, so setup is mostly about putting three files in the right folders. Make sure ComfyUI is updated first (Z-Image support is recent — an out-of-date build will not have the template or the right nodes).

1. Download the three model files (from the Comfy-Org / Tongyi-MAI repackaged repos):

| File | Goes in | Role |
| --- | --- | --- |
| z_image_turbo_bf16.safetensors | ComfyUI/models/diffusion_models/ | The 6B DiT itself |
| qwen_3_4b.safetensors | ComfyUI/models/text_encoders/ | Text encoder (Qwen 3 4B) |
| ae.safetensors | ComfyUI/models/vae/ | VAE / autoencoder |

The resulting folder layout:

ComfyUI/models/
├── diffusion_models/
│   └── z_image_turbo_bf16.safetensors
├── text_encoders/
│   └── qwen_3_4b.safetensors
└── vae/
    └── ae.safetensors
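
Before opening ComfyUI, it can help to confirm the files landed where the loaders expect them. This is a small sketch using only the standard library. The `missing_model_files` helper and the `ComfyUI` root path are assumptions, so adjust the path to wherever your install lives.

```python
from pathlib import Path

# Folder -> file name, taken from the table above.
EXPECTED_FILES = {
    "diffusion_models": "z_image_turbo_bf16.safetensors",
    "text_encoders": "qwen_3_4b.safetensors",
    "vae": "ae.safetensors",
}


def missing_model_files(comfy_root: Path) -> list:
    """Return the expected model paths that do not exist under comfy_root."""
    models_dir = comfy_root / "models"
    expected = [models_dir / folder / name
                for folder, name in EXPECTED_FILES.items()]
    return [path for path in expected if not path.is_file()]


# Assumption: ComfyUI is installed in ./ComfyUI relative to where you run this.
missing = missing_model_files(Path("ComfyUI"))
if missing:
    print("Missing files:")
    for path in missing:
        print(f"  {path}")
else:
    print("All three Z-Image Turbo files are in place.")
```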

2. Load the template. In ComfyUI, open the workflow browser (Workflow → Browse Templates → Image) and pick the Z-Image Turbo example, or drag in the workflow JSON from the official docs.

3. Point the loader nodes at your files. The template uses three loaders — a diffusion-model loader for z_image_turbo_bf16.safetensors, a text-encoder/CLIP loader for qwen_3_4b.safetensors, and a VAE loader for ae.safetensors. Select each one from its dropdown.

4. Queue a prompt. Type a prompt, hit Queue, and you should have an image in seconds.

Low on VRAM? Swap the BF16 diffusion file for the FP8 build (~8GB) or a GGUF quant (~6GB). For GGUF you will also install the ComfyUI-GGUF custom node via ComfyUI Manager and use its GGUF loader in place of the standard diffusion-model loader.
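
As a rough rule of thumb, the choice of build can be written as a lookup on available VRAM. The thresholds below come from the approximate figures in this article, and the `pick_build` helper is hypothetical. Treat the result as a starting point, since real headroom depends on resolution, batch size, and whatever else is using the GPU.

```python
from typing import Optional


def pick_build(vram_gb: float) -> Optional[str]:
    """Suggest a Z-Image Turbo build for a given amount of VRAM.

    Thresholds are the approximate tiers quoted in this article, not
    guarantees. Returns None when even the GGUF tier is unlikely to fit.
    """
    if vram_gb >= 16:
        return "bf16"
    if vram_gb >= 8:
        return "fp8"
    if vram_gb >= 6:
        return "gguf"  # also needs the ComfyUI-GGUF custom node
    return None


for card_vram in (24, 12, 6, 4):
    print(f"{card_vram} GB -> {pick_build(card_vram)}")
```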

New to ComfyUI's node graph? Start with our complete ComfyUI guide, which covers installation, the Manager, and how the loader → sampler → VAE-decode chain fits together.

What settings should I use?

Z-Image Turbo is opinionated about its sampling settings — it is distilled for a specific low-step recipe, so do not treat it like a normal model where you crank steps and CFG. The values below match ComfyUI's reference template:

| Setting | Recommended value | Why |
| --- | --- | --- |
| Steps | 8-9 | Distilled for ~8 NFEs; more steps rarely help and waste time |
| CFG | 1.0 | Guidance is effectively off; raising CFG can burn/oversaturate |
| Sampler | res_multistep | The sampler the official workflow ships with |
| Scheduler | simple | Pairs with the Turbo step schedule |
| Resolution | 1024×1024 (and nearby aspect ratios) | Native training resolution |

The single biggest mistake people make is bumping CFG up to "improve prompt adherence." On a Turbo/distilled model that usually degrades the image. If a prompt is not landing, change the wording or try a different seed before you touch CFG.
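
If you script your generations, a small guard can catch settings that drift away from the recipe. This is a sketch only: `check_settings` and the `RECIPE` dictionary are hypothetical names, and the values simply mirror the table above.

```python
# Recommended values, copied from the settings table above.
RECIPE = {
    "max_steps": 9,
    "cfg": 1.0,
    "sampler": "res_multistep",
    "scheduler": "simple",
}


def check_settings(steps: int, cfg: float, sampler: str, scheduler: str) -> list:
    """Return a list of warnings for settings that leave the Turbo recipe."""
    warnings = []
    if steps > RECIPE["max_steps"]:
        warnings.append(f"steps={steps}: more than 9 rarely helps and costs time")
    if cfg != RECIPE["cfg"]:
        warnings.append(f"cfg={cfg}: raising CFG can burn or oversaturate")
    if sampler != RECIPE["sampler"]:
        warnings.append(f"sampler={sampler}: reference workflow uses res_multistep")
    if scheduler != RECIPE["scheduler"]:
        warnings.append(f"scheduler={scheduler}: reference workflow uses simple")
    return warnings


for message in check_settings(steps=30, cfg=7.0, sampler="res_multistep",
                              scheduler="simple"):
    print("warning:", message)
```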

Z-Image Turbo vs FLUX vs SDXL

Here is the practical comparison most people actually want — speed, VRAM, steps, and licensing across the three models you would realistically run locally in 2026. Speed figures are for a 1024×1024 image and vary by GPU, drivers, and quant, so treat them as approximate.

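| Model | Params | Steps | ~1024px time (RTX 4090) | Min practical VRAM | License |
| --- | --- | --- | --- | --- | --- |
| Z-Image Turbo | 6B | 8 | ~2-3 s | ~8 GB (FP8) / 6 GB (GGUF) | Apache 2.0 |
| FLUX.1 dev | 12B | 20-30 | ~15-30 s (full) | ~24 GB full / 6-8 GB GGUF | Non-commercial |
| SDXL | ~2.6B UNet (~3.5B with text encoders) | 20-30 | ~3-8 s | ~8 GB | CreativeML Open RAIL++-M |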

A few honest takeaways from this table:

  • Against FLUX, Z-Image Turbo's headline win is speed and accessibility: several times faster per image (roughly 2-3s versus 15-30s at full precision) and runnable on far smaller cards, with a friendlier commercial license. FLUX dev can still edge it on some complex, highly-detailed scenes — distillation always trades a little peak fidelity for speed.
  • Against SDXL, the times look closer because SDXL is a small UNet, but Z-Image Turbo gets there in 8 steps instead of 20-30 and generally produces cleaner text and more coherent anatomy out of the box, closer to FLUX-class quality.
  • If your priority is iteration speed on a normal GPU, Z-Image Turbo is the standout. If you need a fully open-for-commercial pipeline, its Apache 2.0 license is a real advantage over FLUX dev. For a deeper FLUX walkthrough, see our guide to running FLUX.1 locally.
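
The "several times faster" claim can be bounded directly from the approximate timing ranges in the table. The sketch below divides the ranges to get a best-case and worst-case speedup. `speedup_range` is a hypothetical helper, and the inputs are this article's rough figures rather than measured benchmarks.

```python
def speedup_range(fast_s: tuple, slow_s: tuple) -> tuple:
    """Return (min, max) speedup given (low, high) seconds-per-image ranges."""
    fast_lo, fast_hi = fast_s
    slow_lo, slow_hi = slow_s
    # Smallest speedup: slowest "fast" time against fastest "slow" time.
    # Largest speedup: fastest "fast" time against slowest "slow" time.
    return slow_lo / fast_hi, slow_hi / fast_lo


Z_IMAGE_TURBO = (2, 3)   # seconds per 1024px image, from the table
FLUX_DEV_FULL = (15, 30)
SDXL = (3, 8)

for name, timing in (("FLUX.1 dev", FLUX_DEV_FULL), ("SDXL", SDXL)):
    low, high = speedup_range(Z_IMAGE_TURBO, timing)
    print(f"vs {name}: {low:.1f}x to {high:.1f}x faster")
```

The wide spread is a reminder that the comparison depends heavily on GPU, drivers, and quantization.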

A sample workflow walkthrough

The Z-Image Turbo graph is refreshingly short. End to end it is:

  1. Load Diffusion Model → z_image_turbo_bf16.safetensors.
  2. Load CLIP / Text Encoder → qwen_3_4b.safetensors.
  3. CLIP Text Encode (Positive) → your prompt. Because guidance is off, the negative prompt has little effect — leave it empty or minimal.
  4. Empty Latent Image → set 1024×1024.
  5. KSampler → steps 9, CFG 1.0, sampler res_multistep, scheduler simple.
  6. Load VAE → ae.safetensors → VAE Decode → Save Image.
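
The same chain can be written as a ComfyUI API-format graph, which is useful for scripting. Treat this as a sketch under assumptions rather than the official template: the `class_type` and input names are assumed to match ComfyUI's stock nodes, the `lumina2` text-encoder type is an assumption, and the official workflow may use different latent or conditioning nodes. Exporting the template in API format from your own ComfyUI is the reliable way to confirm the exact names. The server address is ComfyUI's usual local default, and `queue_prompt` is defined but not called here.

```python
import json
import urllib.request


def build_graph(prompt: str, seed: int = 0) -> dict:
    """Build an API-format graph: node id -> class_type + inputs.

    A link to another node's output is written as [node_id, output_index].
    """
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "z_image_turbo_bf16.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "qwen_3_4b.safetensors",
                         "type": "lumina2"}},  # assumption: verify in your template
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",  # positive prompt
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        "5": {"class_type": "CLIPTextEncode",  # empty negative prompt
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "seed": seed, "steps": 9, "cfg": 1.0,
                         "sampler_name": "res_multistep",
                         "scheduler": "simple", "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0],
                         "filename_prefix": "z_image_turbo"}},
    }


def queue_prompt(graph: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": graph}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


graph = build_graph("candid editorial portrait, neon reflections, 85mm lens")
print(json.dumps(graph["7"], indent=2))  # inspect the sampler settings
```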

A prompt that exercises its photorealism strength:

Positive: candid editorial portrait of a woman in a rain-soaked
Tokyo alley at night, neon reflections on wet asphalt, 85mm lens,
shallow depth of field, natural skin texture, cinematic color grade

Negative: (leave empty)

Because each image takes only 8 forward passes, you can afford to batch several seeds and pick the best. That is the workflow Turbo is built for: generate many, curate fast.
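
A simple way to organize a "generate many, curate fast" run is to derive a reproducible list of seeds from one base seed, so any keeper can be regenerated later. The sketch below does only that. `seed_batch` is a hypothetical helper, and the 2.5-second figure is the midpoint of this article's approximate RTX 4090 range, used purely to estimate wall time.

```python
import random


def seed_batch(base_seed: int, count: int) -> list:
    """Return `count` distinct 32-bit seeds, reproducible from base_seed."""
    rng = random.Random(base_seed)
    return rng.sample(range(2**32), count)


seeds = seed_batch(base_seed=1234, count=8)
seconds_per_image = 2.5  # assumed midpoint of the ~2-3 s estimate above

print(f"{len(seeds)} seeds, roughly {len(seeds) * seconds_per_image:.0f} s total")
for seed in seeds:
    print(seed)
```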

What I measured on an RTX 3090

To sanity-check the published numbers on consumer hardware, I ran the BF16 build on an RTX 3090 (24GB) in ComfyUI with the reference settings (9 steps, CFG 1.0, res_multistep / simple, 1024×1024). These are approximate, single-machine observations — not a controlled benchmark:

  • ~3-4 seconds per 1024×1024 image once the model was resident in VRAM (warm). The very first generation after loading the model was slower, as expected.
  • ~16-17GB VRAM occupied for the BF16 build during generation, leaving comfortable headroom on a 24GB card.
  • Switching to an FP8 build dropped VRAM to roughly 9-10GB with no obvious quality drop at a glance — which is what makes the 8-12GB-GPU story believable.
  • Pushing steps to 20 "to be safe" produced no visible improvement and just made each image slower — confirming that the distilled 8-step recipe is the intended operating point.

The honest summary: a 3090 is overkill for this model, and that is the point. The interesting deployments are on 8-12GB cards where FLUX struggles but Z-Image Turbo runs fine.
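
If you want to repeat this kind of check on your own card, the main trap is counting the first, cold generation. The harness below is a minimal sketch of one way to separate warm timings: it runs warm-up calls first, then reports the median of the timed runs. `time_warm` is a hypothetical helper and `fake_generate` is a stand-in for whatever call actually produces an image in your setup.

```python
import statistics
import time
from typing import Callable


def time_warm(generate: Callable[[], object], warmup: int = 1,
              runs: int = 5) -> float:
    """Return the median seconds per call, ignoring `warmup` initial calls."""
    for _ in range(warmup):
        generate()  # untimed: absorbs model loading and cache effects
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_generate() -> None:
    """Placeholder workload; replace with a real generation call."""
    time.sleep(0.01)


print(f"median warm time: {time_warm(fake_generate):.3f} s")
```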

Limitations and gotchas

  • It is a Turbo (distilled) model. Distillation trades a little peak quality and prompt nuance for speed. For the absolute highest-fidelity single image, FLUX dev or the (non-distilled) Z-Image-Base may still win.
  • Don't fight the recipe. High CFG, 30+ steps, or heavy negative prompts tend to hurt, not help. Tune the prompt and seed instead.
  • The text encoder is Qwen 3 4B, which is a chunky extra file — budget disk and a little extra VRAM for it beyond the 6B DiT.
  • Update ComfyUI first. Z-Image support is recent; an old build will be missing the template, the loaders, or the res_multistep sampler.
  • GGUF needs the ComfyUI-GGUF node. The lowest-VRAM path is not plug-and-play out of the box.

Key Takeaways

  1. Z-Image Turbo is real and open. A 6B-parameter, Apache-2.0 text-to-image model from Alibaba's Tongyi Lab (Tongyi-MAI), released November 27, 2025.
  2. It is fast because it is distilled. ~8 NFEs (9 steps in ComfyUI) with guidance off (CFG 1.0), which works out to roughly 2-3 seconds per 1024px image on an RTX 4090.
  3. It scales to small GPUs. ~14-16GB BF16, ~8GB FP8, ~6GB GGUF — versus ~24GB for full FLUX.1 dev.
  4. Setup is three files in three folders: z_image_turbo_bf16.safetensors (diffusion_models), qwen_3_4b.safetensors (text_encoders), ae.safetensors (vae), driven by ComfyUI's official template.
  5. Use the intended recipe: 8-9 steps, CFG 1.0, res_multistep + simple, 1024×1024 — and don't raise CFG to "fix" prompts.
  6. Pick your fight wisely: Z-Image Turbo wins on speed, VRAM, and license; FLUX dev can still edge it on peak fidelity for a single hero image.

Next Steps

  • New to the ComfyUI node graph? Read the complete ComfyUI guide to get the interface, Manager, and core workflow patterns down before you load Z-Image.
  • Want to compare against the other leading local image model? Our run FLUX.1 locally guide covers FLUX's VRAM tiers, GGUF quants, and prompting so you can A/B the two.
  • Sizing a machine for image generation? The best GPUs for AI in 2026 guide ranks cards by VRAM-per-dollar so you can land in the 8-16GB sweet spot Z-Image Turbo is happiest in.
  • Grab the weights and official workflow straight from the source: the Tongyi-MAI/Z-Image-Turbo model card on Hugging Face.
Local AI Master Research Team

Creator of Local AI Master. I've built datasets with over 77,000 examples and trained AI models from scratch. Now I help people achieve AI independence through local AI mastery.

Published: June 20, 2026 · Last Updated: June 20, 2026 · Manually Reviewed