ComfyUI Complete Guide (2026): Install, Workflows, ControlNet, Flux, SDXL
ComfyUI is a free, open-source node-based interface for running diffusion models locally — you wire each step (load checkpoint, encode prompt, sample, decode) into a visual graph. It is the dominant Stable Diffusion frontend in 2026 because it supports new models on day one and exposes every internal step. To start: download the portable Windows build (or git clone on Linux/Mac), install ComfyUI Manager, drop a .safetensors checkpoint in models/checkpoints/, and click Queue Prompt. Minimum hardware is a 6 GB NVIDIA GPU (SD 1.5); a 12 GB card runs SDXL and Flux Schnell, and 24 GB runs Flux Dev and video models.
This guide covers everything: installation across NVIDIA / AMD / Mac, the node graph mental model, ControlNet, IPAdapter, regional prompting, LoRA stacks, Flux / FLUX.2 [klein] / SDXL / SD 3.5 workflows, video generation (Wan, HunyuanVideo, Mochi), and serious tuning for 24 GB and below.
Whether you are coming from Automatic1111, Forge, or fresh to local image gen, this is the reference.
Table of Contents
- What ComfyUI Is and Why It Wins
- Hardware Requirements
- Installation: Windows, Linux, macOS
- Folder Layout & Where Models Go
- ComfyUI Manager — Mandatory First Install
- The Node Graph Mental Model
- Your First Workflow: SDXL Text-to-Image
- Models in 2026: Flux, SDXL, SD 3.5, Pony, Illustrious
- LoRAs and Embeddings
- ControlNet — Composition, Pose, Depth, Canny
- IPAdapter — Image Prompts and Style Transfer
- Inpainting, Outpainting, and Upscaling
- Regional Prompting and Conditioning Combine
- Flux: Dev, Schnell, GGUF, Quantized
- Video Generation: Wan 2.2, HunyuanVideo, Mochi
- API Mode and Programmatic Use
- VRAM Optimization Tricks
- Speed Tuning: Sage Attention, Triton, Compile
- Common Custom Node Packs Worth Installing
- Troubleshooting
- FAQ
What ComfyUI Is and Why It Wins {#what-is}
Diffusion image generation is a pipeline:
Prompt → Text Encoder → Conditioning
↓
Empty Latent → Sampler ← Model (UNet/DiT)
↓
VAE Decode → Image
A1111 / Forge / Fooocus hide this pipeline behind a tabbed UI. ComfyUI exposes it as a graph: nodes for "Load Checkpoint," "CLIP Text Encode," "KSampler," "VAE Decode," "Save Image," with explicit data flow between them. You can branch, merge, swap, and chain nodes arbitrarily.
This is why ComfyUI is the fastest to support new models: when Flux launched, supporting it was a matter of writing a new "Load Diffusion Model" node and a new sampler — no UI surgery required. Same for SD 3.5, Wan 2.2, HunyuanVideo, and every model since 2023.
The trade-off: more upfront learning. But every workflow is a JSON file you can load with one drag-and-drop. The community ships pre-built workflows for almost every common task.
Hardware Requirements {#requirements}
| Tier | GPU | Models You Can Run |
|---|---|---|
| Minimum | 6 GB VRAM | SD 1.5 only |
| Entry | 8 GB VRAM (RTX 3060 8GB) | SDXL Q8, SD 1.5 |
| Recommended | 12 GB (RTX 3060 12GB / 4070) | SDXL FP16, Flux Schnell GGUF Q4 |
| Sweet spot | 16 GB (RTX 4080 / 5070 Ti) | Flux Dev FP8, SD 3.5 Large |
| High-end | 24 GB (RTX 3090 / 4090) | Flux Dev BF16, Wan 2.2, HunyuanVideo Q4 |
| Top | 32+ GB (RTX 5090 / RTX 6000 Ada) | HunyuanVideo BF16, full Mochi |
RAM: at least 2x your largest model file in system RAM for offload buffers. 32 GB is the realistic minimum, 64 GB recommended for video.
Disk: Flux Dev = 24 GB, SDXL = 7 GB, SD 1.5 = 4 GB, plus VAEs (300 MB each), CLIP (1-5 GB), LoRAs (10-500 MB each), ControlNet (1.5 GB each). Plan for 200-500 GB on NVMe.
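The RAM rule and the disk figures above reduce to simple arithmetic. The sketch below is illustrative only: the helper names and the example model mix are assumptions, and the sizes are the approximate figures quoted in this section.

```python
# Approximate file sizes in GB, taken from the figures quoted above.
MODEL_SIZES_GB = {
    "flux_dev_bf16": 24.0,
    "sdxl": 7.0,
    "sd15": 4.0,
}


def min_system_ram_gb(model_sizes_gb):
    """Rule of thumb from the text: 2x the largest model file."""
    return 2 * max(model_sizes_gb)


def disk_estimate_gb(model_sizes_gb, n_vaes=0, n_loras=0, n_controlnets=0,
                     vae_gb=0.3, lora_gb=0.5, controlnet_gb=1.5):
    """Sum checkpoints plus extras, using worst-case per-file sizes from the text."""
    extras = n_vaes * vae_gb + n_loras * lora_gb + n_controlnets * controlnet_gb
    return sum(model_sizes_gb) + extras


sizes = list(MODEL_SIZES_GB.values())
print("RAM for offload buffers (GB):", min_system_ram_gb(sizes))
print("Disk estimate (GB):", round(disk_estimate_gb(sizes, n_vaes=2, n_loras=20, n_controlnets=4), 1))
```

Text encoders, video models, and generated output come on top of this estimate, which is why the text suggests planning for 200-500 GB.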
AMD: RX 7900 XTX works via ROCm at 70-85% of equivalent NVIDIA speed. See AMD ROCm Setup.
Apple Silicon: M2 or newer works via MPS, but runs 3-5x slower than NVIDIA. Native Mac apps such as Draw Things usually perform better.
Installation: Windows, Linux, macOS {#installation}
Windows (portable — recommended)
- Download the latest ComfyUI_windows_portable.7z from github.com/comfyanonymous/ComfyUI/releases.
- Extract with 7-Zip to e.g. D:\ComfyUI\.
- Run run_nvidia_gpu.bat (NVIDIA) or run_cpu.bat (no GPU).
- Browser opens to http://127.0.0.1:8188.
Linux / Mac (git)
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3.11 -m venv venv
source venv/bin/activate
# NVIDIA CUDA 12.4
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
# AMD ROCm 6.2
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
# Mac MPS
pip install torch torchvision
# ComfyUI dependencies
pip install -r requirements.txt
# Run
python main.py --listen 0.0.0.0
Docker
docker run --gpus all \
-p 8188:8188 \
-v $(pwd)/models:/app/models \
-v $(pwd)/output:/app/output \
-v $(pwd)/workflows:/app/workflows \
yanwk/comfyui-boot:latest
yanwk/comfyui-boot includes Manager, common custom nodes, and starter models.
Useful launch flags
| Flag | Purpose |
|---|---|
| --listen 0.0.0.0 | Accept connections from LAN |
| --port 8188 | Change port |
| --lowvram | Aggressive memory offload (12 GB and below) |
| --novram | CPU offload everything (only for emergencies) |
| --use-pytorch-cross-attention | Force PyTorch SDPA (most stable) |
| --use-sage-attention | Use Sage Attention (faster, see Speed Tuning) |
| --fast | Enable FP8 ops on Ada/Hopper/Blackwell |
| --cache-classic | Old caching behavior (workaround for some custom nodes) |
| --preview-method auto | Show in-progress previews during sampling |
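As a rough illustration of how these flags combine, the sketch below builds a launch command from your hardware. The function name and its parameters are hypothetical; the only thresholds used are the ones stated in the table (--lowvram at 12 GB and below, --fast on FP8-capable GPUs).

```python
def suggest_launch_args(vram_gb, fp8_capable=False, lan_access=False):
    """Build a ComfyUI launch command using only flags from the table above."""
    args = ["python", "main.py"]
    if lan_access:
        # Expose the UI to other machines on the local network.
        args += ["--listen", "0.0.0.0"]
    if vram_gb <= 12:
        # The table suggests aggressive offload at 12 GB and below.
        args.append("--lowvram")
    if fp8_capable:
        # FP8 ops are listed for Ada / Hopper / Blackwell GPUs.
        args.append("--fast")
    args += ["--preview-method", "auto"]
    return args


print(" ".join(suggest_launch_args(vram_gb=12, fp8_capable=True)))
```

Treat the result as a starting point: if generation is stable without --lowvram, drop it, since offloading costs speed.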
Folder Layout & Where Models Go {#folder-layout}
ComfyUI/
├── models/
│ ├── checkpoints/ # SD 1.5, SDXL, SD 3.5, Pony, Illustrious .safetensors
│ ├── unet/ # Flux UNet/DiT files
│ ├── diffusion_models/ # Newer alias for unet/
│ ├── clip/ # T5 / CLIP-L / CLIP-G text encoders
│ ├── vae/ # VAE decoders
│ ├── loras/ # LoRA files
│ ├── controlnet/ # ControlNet models
│ ├── embeddings/ # Textual inversion embeddings
│ ├── ipadapter/ # IPAdapter models
│ ├── upscale_models/ # 4x-UltraSharp, RealESRGAN, etc.
│ └── animatediff_models/
├── custom_nodes/ # Third-party node packs
├── workflows/ # Saved JSON workflows
├── input/ # Images for img2img, ControlNet
└── output/ # Generated images
To share models between A1111/Forge and ComfyUI without duplicating files:
# Edit ComfyUI/extra_model_paths.yaml
a111:
    base_path: /path/to/stable-diffusion-webui/
    checkpoints: models/Stable-diffusion
    loras: models/Lora
    controlnet: models/ControlNet
    upscale_models: models/ESRGAN
    embeddings: embeddings
    vae: models/VAE
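A typo in one of these paths fails quietly: the models just never show up in the dropdown. The sketch below is a small sanity check you could run before restarting ComfyUI. The helper name is hypothetical, and the mapping is copied from the example config above rather than read from the YAML file.

```python
from pathlib import Path

# Mapping copied from the extra_model_paths.yaml example above.
A1111_SUBDIRS = {
    "checkpoints": "models/Stable-diffusion",
    "loras": "models/Lora",
    "controlnet": "models/ControlNet",
    "upscale_models": "models/ESRGAN",
    "embeddings": "embeddings",
    "vae": "models/VAE",
}


def missing_model_dirs(base_path, subdirs=A1111_SUBDIRS):
    """Return the config keys whose folder does not exist under base_path."""
    base = Path(base_path)
    return sorted(key for key, rel in subdirs.items() if not (base / rel).is_dir())


if __name__ == "__main__":
    # Replace with the base_path you put in extra_model_paths.yaml.
    print(missing_model_dirs("/path/to/stable-diffusion-webui/"))
```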
ComfyUI Manager — Mandatory First Install {#manager}
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
Restart ComfyUI. The "Manager" button appears on the right sidebar.
What it does:
- Install Missing Custom Nodes — when you load someone else's workflow.
- Install Models — auto-downloads missing checkpoints/LoRAs/ControlNets.
- Update All — keeps ComfyUI and all custom nodes current.
- Snapshot / Restore — versioned backups before risky updates.
- Disable / Uninstall — surgically remove a custom node.
Without Manager, every missing node is a manual git clone. Install it first; do not skip.
The Node Graph Mental Model {#node-graph}
A workflow is a directed graph. Each node:
- Has inputs (connectors on the left)
- Has outputs (connectors on the right)
- Performs a function (load model, encode text, sample, decode VAE)
Connections carry typed values:
- MODEL: diffusion model
- CLIP: text encoder
- VAE: image encoder/decoder
- CONDITIONING: encoded prompt
- LATENT: latent image (compressed representation)
- IMAGE: pixel image
- MASK: binary mask
- CONTROL_NET: ControlNet model
- STRING, INT, FLOAT: primitives
The default workflow looks like:
[Load Checkpoint] → MODEL ─┐
→ CLIP │
→ VAE ───┼──┐
│ │
[CLIP Text Encode pos] ←──┘ │
↓ CONDITIONING │
[KSampler] ←──────────────────┘
↑ LATENT (from Empty Latent Image)
↓ LATENT (after sampling)
[VAE Decode] → IMAGE
↓
[Save Image]
Master this and you can build anything. Common patterns:
- Two samplers in series — base + refiner (SDXL).
- Conditioning combine: merge two prompts into one conditioning (use Conditioning Average for a weighted blend).
- ControlNet apply — modulate conditioning with control image.
- Latent composite — blend two latents before decoding.
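The same graph can be written down as data. The sketch below expresses the default text-to-image graph in the shape that Save (API Format) exports: each node ID maps to a class_type plus inputs, and a link is a [source_node_id, output_index] pair. The node IDs are arbitrary and the two helper functions are hypothetical; the class names are the ones core ComfyUI uses for these nodes, but confirm them against your own export.

```python
def default_txt2img_graph(ckpt_name, positive, negative, seed=0):
    """Default SDXL text-to-image graph in API-format shape."""
    return {
        # Outputs of CheckpointLoaderSimple: 0 = MODEL, 1 = CLIP, 2 = VAE
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
    }


def upstream_nodes(graph, node_id):
    """Collect every node that feeds, directly or indirectly, into node_id."""
    found, stack = set(), [node_id]
    while stack:
        current = stack.pop()
        for value in graph[current]["inputs"].values():
            is_link = isinstance(value, list) and len(value) == 2 and value[0] in graph
            if is_link and value[0] not in found:
                found.add(value[0])
                stack.append(value[0])
    return found


graph = default_txt2img_graph("sd_xl_base_1.0.safetensors", "a misty forest", "blurry")
print(sorted(upstream_nodes(graph, "5")))
```

Reading the graph this way makes the typed connections concrete: the KSampler depends on the checkpoint, both text encoders, and the empty latent, while the VAE only enters at decode time.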
Your First Workflow: SDXL Text-to-Image {#first-workflow}
- Download sd_xl_base_1.0.safetensors from Hugging Face → put in models/checkpoints/.
- Open ComfyUI → Load Default workflow (right sidebar).
- In Load Checkpoint node, select sd_xl_base_1.0.safetensors.
- In Empty Latent Image, set width/height to 1024×1024.
- In CLIP Text Encode (Positive), write your prompt: e.g. "cinematic photo of a samurai standing in a misty forest, ultra-detailed, 35mm film".
- In CLIP Text Encode (Negative), write: "blurry, deformed, extra fingers, low quality, bad anatomy".
- In KSampler, set steps=25, cfg=7.0, sampler=dpmpp_2m, scheduler=karras.
- Click Queue Prompt.
Expected time on RTX 4090: ~3-5 seconds per 1024² image.
Models in 2026: Flux, SDXL, SD 3.5, Pony, Illustrious {#models}
| Model | Params | License | VRAM (BF16) | Strengths |
|---|---|---|---|---|
| FLUX.2 [klein] | 4B | Apache 2.0 | ~13 GB | Newest (Jan 2026), commercial-friendly, sub-second on consumer GPUs |
| Flux Dev | 12B | Non-commercial | ~24 GB | Best general quality, prompt adherence |
| Flux Schnell | 12B | Apache 2.0 | ~24 GB | 4-step distilled — fastest top-tier |
| SD 3.5 Large | 8B | Stability AI Community | ~16 GB | Permissive license, strong prompts |
| SD 3.5 Medium | 2.5B | Stability AI Community | ~10 GB | Lower VRAM, decent quality |
| SDXL 1.0 / Lightning | 3.5B | OpenRAIL | ~10 GB | Largest LoRA ecosystem |
| Pony Diffusion v6 XL | 3.5B | Fair AI | ~10 GB | Anime / character / NSFW |
| Illustrious XL | 3.5B | Fair AI | ~10 GB | Anime, cleaner than Pony |
| SD 1.5 | 0.9B | OpenRAIL | ~4 GB | Legacy, fast iteration |
Flux GGUF / FP8 quantized
Flux Dev in BF16 needs ~24 GB. To run on 12-16 GB:
- Flux Dev FP8 (flux1-dev-fp8.safetensors): ~12 GB, near-identical quality. Use the --fast flag.
- Flux Dev GGUF Q8_0: ~13 GB.
- Flux Dev GGUF Q4_K_S: ~7 GB. Use the ComfyUI-GGUF custom node by city96.
[UnetLoader (GGUF)] → MODEL
[DualCLIPLoader (GGUF)] → CLIP # T5-XXL + CLIP-L
[Load VAE] → VAE
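The sizes quoted for the quantized Flux variants follow from bits per weight. The sketch below is a back-of-the-envelope estimate for a 12B-parameter model, not a measurement. The bits-per-weight values are the nominal block sizes of each format (Q8_0 stores 32 int8 weights plus a 16-bit scale, which gives 8.5 bits; Q4_K blocks work out to 4.5 bits). Real GGUF files mix tensor types, so actual files land slightly off these numbers. The function name is hypothetical.

```python
# Nominal bits per weight for each storage format (see lead-in for caveats).
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "q8_0": 8.5,
    "q4_k": 4.5,
}


def weights_gb(params_billion, fmt):
    """Approximate weight size in GB: parameters x bits per weight / 8."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8


for fmt in BITS_PER_WEIGHT:
    print(f"Flux Dev 12B as {fmt}: ~{weights_gb(12, fmt):.2f} GB")
```

This covers weights only. Text encoders, the VAE, and activations need VRAM on top, which is why a 12 GB file does not fit comfortably on a 12 GB card without offload.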
LoRAs and Embeddings {#loras}
LoRA chains
[Load Checkpoint] → MODEL ──────┐
→ CLIP ───────┤
↓
[Load LoRA #1]
↓
[Load LoRA #2]
↓
[Load LoRA #3]
↓
MODEL → KSampler
CLIP → CLIP Text Encode
Each LoRA node has strength_model and strength_clip (0.0-1.5 typical, 1.0 default). Stack as many as you want — but >3 strong LoRAs usually conflict.
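In API-format JSON, the chain above is a sequence of LoraLoader nodes where each one takes the MODEL and CLIP outputs of the previous one. The sketch below shows that wiring. The helper name, the starting node ID, and the LoRA filenames are hypothetical, and it uses one strength for both strength_model and strength_clip for brevity.

```python
def chain_loras(graph, model_link, clip_link, loras, first_id=100):
    """Append one LoraLoader node per (filename, strength) pair.

    Returns the MODEL and CLIP links of the last LoRA in the chain, which
    you then feed to KSampler and CLIP Text Encode respectively.
    """
    for offset, (lora_name, strength) in enumerate(loras):
        node_id = str(first_id + offset)
        graph[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {
                "model": model_link,
                "clip": clip_link,
                "lora_name": lora_name,
                "strength_model": strength,
                "strength_clip": strength,
            },
        }
        # LoraLoader outputs: 0 = MODEL, 1 = CLIP
        model_link, clip_link = [node_id, 0], [node_id, 1]
    return model_link, clip_link


graph = {}
model, clip = chain_loras(
    graph, ["1", 0], ["1", 1],
    [("style_example.safetensors", 0.8), ("detail_example.safetensors", 0.6)],
)
print(model, clip)
```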
Embeddings (textual inversion)
Reference in your prompt with embedding:filename (without extension):
Positive: masterpiece, beautiful landscape
Negative: embedding:negative_easy
Place .pt / .safetensors files in models/embeddings/.
Best LoRA sources in 2026
- Civitai — largest collection, NSFW filter optional.
- Hugging Face — official model authors.
- Tensor.Art — curated workflows.
Always check the LoRA's recommended trigger words and base model — an SDXL LoRA does not work on Flux.
ControlNet — Composition, Pose, Depth, Canny {#controlnet}
ControlNet conditions generation on a structural input (pose, depth map, edges, etc.).
Pattern
[Load Image] → IMAGE → [Canny Preprocessor] → IMAGE
↓
[Load ControlNet (canny)] → CONTROL_NET ─────┐ │
↓ ↓
[CLIP Text Encode pos] → CONDITIONING → [Apply ControlNet] → CONDITIONING
↓
[KSampler]
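The same pattern in API-format JSON looks roughly like the sketch below. Treat the details as assumptions: the class names (LoadImage, Canny, ControlNetLoader, ControlNetApply) and their input names are what core ComfyUI has used for these nodes, but newer builds may prefer an advanced apply node, so check them against a workflow exported from your own install. The helper name, node IDs, and filenames are hypothetical.

```python
def add_canny_controlnet(graph, positive_link, image_name, controlnet_name, strength=0.8):
    """Wire Load Image -> Canny -> Apply ControlNet onto a positive conditioning.

    Returns the link to the modified CONDITIONING, to be used as the
    KSampler's positive input.
    """
    graph["20"] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    graph["21"] = {
        "class_type": "Canny",
        "inputs": {"image": ["20", 0], "low_threshold": 0.4, "high_threshold": 0.8},
    }
    graph["22"] = {
        "class_type": "ControlNetLoader",
        "inputs": {"control_net_name": controlnet_name},
    }
    graph["23"] = {
        "class_type": "ControlNetApply",
        "inputs": {
            "conditioning": positive_link,
            "control_net": ["22", 0],
            "image": ["21", 0],
            "strength": strength,
        },
    }
    return ["23", 0]


graph = {}
new_positive = add_canny_controlnet(
    graph, ["2", 0], "reference_example.png", "canny_example.safetensors"
)
print(new_positive)
```

Note that the ControlNet only rewrites the conditioning. The sampler itself is unchanged, which is why you can stack several Apply ControlNet nodes in series.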
Common preprocessors
| Type | Use Case |
|---|---|
| Canny | Preserve edges from reference image |
| Depth (MiDaS, Marigold) | Match 3D structure |
| OpenPose | Match human pose |
| DWPose | Higher-quality OpenPose alternative |
| LineArt | Line drawings, anime |
| Scribble | Rough sketches → finished images |
| Tile | Upscale-friendly preservation |
| Reference | Style match without LoRA |
| InstantID / IP-Composition | Face / composition transfer |
SDXL vs Flux ControlNet
SDXL has the most mature ControlNet ecosystem (xinsir, kohya, lllyasviel). Flux ControlNets are catching up — InstantX, Shakker-Labs, and Black Forest Labs ship Flux ControlNets but coverage is narrower. Still deciding which base model to standardize on? Our SDXL vs Flux local comparison weighs quality, speed, VRAM, and ecosystem maturity head to head.
IPAdapter — Image Prompts and Style Transfer {#ipadapter}
IPAdapter conditions generation on a reference image (instead of a text prompt). Best for style transfer, character consistency, and "make it look like this" workflows.
[Load Image (reference)] → IMAGE
↓
[IPAdapter Unified Loader] → MODEL, IPADAPTER
↓
[IPAdapter Advanced] ← MODEL ←─────┘
↓ MODEL
[KSampler]
Use the ComfyUI_IPAdapter_plus custom node by cubiq.
IPAdapter strength
| Strength | Effect |
|---|---|
| 0.3 | Subtle style hint |
| 0.6 | Clear style influence |
| 0.9-1.0 | Strong reference dominance |
| 1.2+ | Reference overrides prompt |
IPAdapter FaceID
For consistent character across generations: ip-adapter-faceid-portrait_sdxl.bin + face embedding extracted with InsightFace. One reference image → consistent character in any pose / scene / outfit.
Inpainting, Outpainting, and Upscaling {#inpainting}
Inpainting
[Load Image] → IMAGE
[Load Image (mask)] → IMAGE
[Image to Mask] → MASK
[VAE Encode (Inpaint)] → LATENT (with masked region noised)
[KSampler] (with denoise=0.8) → LATENT
[VAE Decode] → IMAGE
Use the dedicated inpainting checkpoint (e.g. sd_xl_base_1.0_inpainting_0.1.safetensors) for best results. Set sampler denoise to 0.7-0.95.
Outpainting
Use Pad Image for Outpainting node → mask the new edges → inpaint.
Upscaling
Two stages:
- Latent upscale (cheap, blurry): Upscale Latent By node, factor 2.0.
- Image upscale model (sharp): Upscale Image (using Model) with 4x-UltraSharp or RealESRGAN_x4plus_anime_6B for anime.
Or iterative upscale: small image → upscale 1.5x → low-denoise sampler pass → upscale 1.5x again. Best quality, slowest. For a deeper comparison of the upscale models themselves (RealESRGAN, 4x-UltraSharp, SUPIR), see our guide to local AI image upscaling.
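For the iterative route, it helps to work out the intermediate resolutions up front. The sketch below computes them for the 1.5x, then 1.5x again, schedule described above. Snapping to a multiple of 8 is an assumption based on SD-family VAEs working at 1/8 resolution. The function name is hypothetical.

```python
def iterative_upscale_sizes(width, height, factor=1.5, passes=2, multiple=8):
    """Return the (width, height) after each upscale pass, snapped to `multiple`."""
    sizes = []
    for _ in range(passes):
        width = int(round(width * factor / multiple)) * multiple
        height = int(round(height * factor / multiple)) * multiple
        sizes.append((width, height))
    return sizes


print(iterative_upscale_sizes(1024, 1024))
print(iterative_upscale_sizes(832, 1216))
```

Two 1.5x passes give 2.25x overall, so a 1024×1024 image ends at 2304×2304 with a low-denoise sampler pass at each stage.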
Regional Prompting and Conditioning Combine {#regional}
To prompt different regions of the image differently (left side: knight, right side: wizard):
Use ComfyUI_Cutoff or ComfyUI-RegionalPrompter custom nodes. Pattern:
[CLIP Text Encode "knight"] → COND_LEFT
[CLIP Text Encode "wizard"] → COND_RIGHT
[Conditioning (Set Area)] (left half) → COND_LEFT_AREA
[Conditioning (Set Area)] (right half) → COND_RIGHT_AREA
[Conditioning Combine] → COND_FINAL → KSampler
Areas are specified in pixel coordinates. Resolution must match Empty Latent Image size.
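Computing the pixel areas by hand is error-prone, so here is a minimal sketch for the left/right split in the knight and wizard example. Both helper names are hypothetical. ConditioningSetArea is the class name behind the Conditioning (Set Area) node, and the input names shown are the ones it has used, but verify them against your own API export.

```python
def split_left_right(canvas_width, canvas_height, multiple=8):
    """Return pixel areas for the left and right halves of the canvas."""
    half = (canvas_width // 2 // multiple) * multiple
    left = {"x": 0, "y": 0, "width": half, "height": canvas_height}
    right = {"x": half, "y": 0, "width": canvas_width - half, "height": canvas_height}
    return left, right


def set_area_node(conditioning_link, area, strength=1.0):
    """Build an API-format Conditioning (Set Area) node for one region."""
    return {
        "class_type": "ConditioningSetArea",
        "inputs": {"conditioning": conditioning_link, "strength": strength, **area},
    }


left, right = split_left_right(1024, 1024)
knight_area = set_area_node(["10", 0], left)   # "10" = CLIP Text Encode "knight"
wizard_area = set_area_node(["11", 0], right)  # "11" = CLIP Text Encode "wizard"
print(knight_area["inputs"], wizard_area["inputs"])
```

The canvas size passed to split_left_right must be the same width and height you set on Empty Latent Image, otherwise the regions will not line up.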
Flux: Dev, Schnell, GGUF, Quantized {#flux}
Flux is a 12B-parameter Diffusion Transformer (DiT) — different architecture from SD's UNet, with better prompt adherence and text rendering.
Files needed
models/unet/flux1-dev.safetensors # 24 GB BF16
models/clip/t5xxl_fp16.safetensors # 9.8 GB
models/clip/clip_l.safetensors # 246 MB
models/vae/ae.safetensors # 335 MB
For 16 GB VRAM use flux1-dev-fp8.safetensors (12 GB) and t5xxl_fp8_e4m3fn.safetensors (5 GB).
Workflow
[Load Diffusion Model] → MODEL (flux1-dev)
[DualCLIPLoader] → CLIP (clip_l + t5xxl)
[Load VAE] → VAE (ae.safetensors)
[CLIP Text Encode] → CONDITIONING
[Empty Latent Image] (1024×1024) → LATENT
[KSamplerAdvanced] (20 steps, cfg=1.0, euler, simple scheduler)
[VAE Decode] → IMAGE
Flux runs at cfg=1.0 (classifier-free guidance disabled), so always set CFG to 1.0. This differs from SD models, which typically use CFG 5-10. For a node-by-node build with screenshots, follow our dedicated ComfyUI Flux workflow guide.
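To see why cfg=1.0 means "no guidance", it helps to look at the standard classifier-free guidance formula, which mixes the prompt-conditioned and unconditioned predictions. The sketch below uses plain lists of numbers in place of real noise-prediction tensors; the function name is hypothetical.

```python
def apply_cfg(cond, uncond, cfg):
    """Standard CFG mix: uncond + cfg * (cond - uncond), element by element."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0]    # prediction with the positive prompt
uncond = [0.25, 2.0]  # prediction with the negative / empty prompt

print(apply_cfg(cond, uncond, 1.0))  # equals cond: the uncond term cancels
print(apply_cfg(cond, uncond, 7.0))  # SD-style guidance pushes away from uncond
```

At cfg=1.0 the result is exactly the conditioned prediction, so the negative prompt has no effect and the second model pass per step is not needed.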
Flux Schnell
Same workflow, but use flux1-schnell.safetensors and 4 steps. ~5x faster, slightly lower quality.
FLUX.2 [klein]
Black Forest Labs released FLUX.2 [klein] in January 2026 — a 4B-parameter distilled model under an Apache 2.0 license (the first fully commercial-friendly Flux). It runs in ~13 GB VRAM, generates in under a second on a consumer GPU, and supports text-to-image, image editing, and multi-reference prompting at quality that punches above its size. ComfyUI added native workflow support, so it loads the same way as Flux Dev (Load Diffusion Model → DualCLIPLoader → Load VAE). Pick it when you need a fast, license-clean Flux for commercial work; Flux Dev still edges it on peak quality.
Flux LoRA
[Load LoRA] (after Load Diffusion Model)
Most Flux LoRAs work at 0.7-1.0 strength. Civitai now has 3,000+ Flux LoRAs. Want to train your own on your face, product, or art style? Our local image LoRA training guide walks through dataset prep and training on a single consumer GPU.
Video Generation: Wan 2.2, HunyuanVideo, Mochi {#video}
Wan 2.2 (recommended starting point)
Alibaba's open-source video model. 5-10 second 720p clips, ~24 GB VRAM at Q8 GGUF. On a smaller card? Our guide to local text-to-video on low VRAM covers the GGUF quants and offload settings that get Wan and HunyuanVideo running on 8-12 GB GPUs.
[UnetLoader (GGUF)] → MODEL (wan2.2-i2v-q8_0.gguf)
[Load CLIP] → CLIP (umt5-xxl encoder, type=wan)
[Load VAE] → VAE
[Load Image] → IMAGE (first frame for image-to-video)
[CLIP Text Encode] → CONDITIONING
[WanImageToVideo] → LATENT (sequence)
[KSampler] (30 steps)
[VAE Decode (sequence)] → IMAGES
[Video Combine (FFmpeg)] → MP4
Render time on RTX 4090: ~6-10 minutes for 5 seconds at 720p.
HunyuanVideo
Tencent's 13B video DiT. Highest quality, ~40 GB BF16 (fits in 24 GB at Q4 GGUF). Use kijai/ComfyUI-HunyuanVideoWrapper. HunyuanVideo 1.5 (released late 2025) is a lighter 8.3B model with native ComfyUI support that hits flagship quality at native 720p (upscalable to 1080p) and runs comfortably on a single 24 GB GPU — start here if you have 24 GB rather than the original 13B model.
Mochi
Genmo's 10B model. Lower VRAM (~16 GB), faster, slightly lower quality than Hunyuan.
Frame interpolation and upscaling
After generating a 24fps video:
- RIFE for interpolation to 60fps (ComfyUI-Frame-Interpolation).
- Real-ESRGAN x4 for upscaling to 1440p / 4K.
API Mode and Programmatic Use {#api}
ComfyUI exposes POST /prompt for queuing workflows programmatically.
import json
import time
import uuid

import requests

BASE_URL = "http://127.0.0.1:8188"

# Load workflow JSON saved from the UI via Save (API Format)
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Modify any node by its ID, e.g. set the positive prompt and the seed.
# IDs "6" and "3" match the default workflow; check them in your own export.
workflow["6"]["inputs"]["text"] = "a cyberpunk samurai at dawn"
workflow["3"]["inputs"]["seed"] = 42

# Submit
client_id = str(uuid.uuid4())
resp = requests.post(f"{BASE_URL}/prompt", json={
    "prompt": workflow,
    "client_id": client_id,
})
resp.raise_for_status()
prompt_id = resp.json()["prompt_id"]

# Poll history: the prompt ID only appears once the run has finished
while True:
    history = requests.get(f"{BASE_URL}/history/{prompt_id}").json()
    if prompt_id in history:
        break
    time.sleep(1)
WebSocket (/ws) streams progress events. Output images are at /view?filename=...&subfolder=....
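Once a prompt has finished, the history entry lists the files each output node wrote. The sketch below turns that entry into /view URLs. The helper name is hypothetical, and the history layout it assumes (outputs → node ID → images, each with filename, subfolder, and type) is what ComfyUI has returned for Save Image nodes; print your own history response to confirm before relying on it.

```python
from urllib.parse import urlencode


def output_image_urls(history, prompt_id, base_url="http://127.0.0.1:8188"):
    """Build /view URLs for every image recorded under a finished prompt."""
    urls = []
    outputs = history.get(prompt_id, {}).get("outputs", {})
    for node_output in outputs.values():
        for image in node_output.get("images", []):
            query = urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            urls.append(f"{base_url}/view?{query}")
    return urls


# Example with a hand-written history entry (not real server output).
example_history = {
    "abc123": {
        "outputs": {
            "9": {"images": [
                {"filename": "ComfyUI_00001_.png", "subfolder": "", "type": "output"},
            ]},
        },
    },
}
print(output_image_urls(example_history, "abc123"))
```

Each URL can then be fetched with a plain GET to download the image bytes.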
VRAM Optimization Tricks {#vram-optimization}
| Trick | VRAM Saved | Quality Cost |
|---|---|---|
| FP8 weights (--fast) | 50% | <1% |
| GGUF Q8_0 | 50% | <1% |
| GGUF Q4_K_S | 75% | 3-5% |
| Tile VAE Decode | ~30% peak | 0% |
| Sequential Offload (--lowvram) | 70% | -30% speed |
| CFG Rescale (skip CFG dup) | 50% during sampling | 0% |
| BFloat16 over FP32 | 50% | 0% |
| Reduce resolution then upscale | proportional | varies |
For 12 GB VRAM running Flux Dev: --fast + Q8 T5 + Tile VAE + --lowvram if needed.
Speed Tuning: Sage Attention, Triton, Compile {#speed-tuning}
Sage Attention
pip install sageattention
python main.py --use-sage-attention
Sage Attention is a faster attention implementation than xformers / SDPA on Ada and Blackwell. 15-30% throughput improvement on Flux and SDXL.
Triton (Linux/WSL2)
pip install triton
Triton enables more efficient kernels. Required by Sage Attention and several custom node packs. Windows native does not officially support Triton; use WSL2.
torch.compile
Some custom node packs (kijai's wrappers, ComfyUI-MultiGPU) expose torch.compile modes. Compilation takes 1-3 minutes on first run but generation is 10-25% faster afterward. Mode max-autotune is fastest but adds ~5 min compile time.
TeaCache / FBCache
Caches diffusion model attention/MLP outputs across consecutive timesteps. 1.5-2.0x speedup with 1-3% quality loss. Custom nodes: ComfyUI-TeaCache, ComfyUI-FBCache.
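The idea behind these caches can be shown with a toy example. The sketch below is not the TeaCache or FBCache implementation: it uses single numbers in place of tensors, and the function name, the relative-change measure, and the threshold are illustrative assumptions. It only demonstrates the principle of reusing the last computed residual while the input has barely changed between steps.

```python
def run_with_step_cache(inputs, expensive_block, threshold):
    """Reuse the last residual (output - input) until inputs drift past `threshold`."""
    outputs = []
    computed_steps = 0
    accumulated = 0.0
    previous = None
    cached_residual = None
    for x in inputs:
        if previous is not None:
            # Accumulate the relative change of the input since the last step.
            accumulated += abs(x - previous) / max(abs(previous), 1e-8)
        if cached_residual is None or accumulated >= threshold:
            y = expensive_block(x)   # full computation
            cached_residual = y - x
            accumulated = 0.0
            computed_steps += 1
        else:
            y = x + cached_residual  # cheap path: reuse cached residual
        outputs.append(y)
        previous = x
    return outputs, computed_steps


outputs, computed = run_with_step_cache(
    [1.0, 1.01, 1.02, 2.0], expensive_block=lambda x: x * 2, threshold=0.1
)
print(outputs, "full computations:", computed)
```

A higher threshold skips more steps and runs faster but drifts further from the uncached result, which is the speed versus quality trade-off quoted above.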
Common Custom Node Packs Worth Installing {#custom-nodes}
| Pack | Purpose |
|---|---|
| ComfyUI-Manager | Mandatory |
| ComfyUI_IPAdapter_plus | IPAdapter |
| ComfyUI-Advanced-ControlNet | Better ControlNet |
| ComfyUI-Impact-Pack | Detailers, face fix |
| rgthree-comfy | Better UI nodes (mute, group bypass) |
| ComfyUI-GGUF | GGUF-quantized model loaders |
| ComfyUI-Frame-Interpolation | RIFE / FILM video frame interp |
| ComfyUI-VideoHelperSuite | Video I/O |
| ComfyUI-TeaCache | 2x speed for diffusion |
| was-node-suite-comfyui | Misc utility nodes |
| ComfyUI-KJNodes | kijai's nodes for Wan, Hunyuan, Mochi |
| ComfyUI-MultiGPU | Run encoder on GPU 1, UNet on GPU 0 |
| ComfyUI-Custom-Scripts | UI improvements |
Troubleshooting {#troubleshooting}
| Symptom | Cause | Fix |
|---|---|---|
| OOM on first generation | VRAM too tight | Add --lowvram or use FP8/GGUF |
| Black output | NaN in VAE | Switch to an fp16-fixed VAE, or launch with --fp32-vae |
| Workflow won't load | Missing custom nodes | Manager → Install Missing Custom Nodes |
| Very slow on RTX 40 | Not using FP8 | Add --fast flag |
| Flux looks washed out | Wrong sampler | Use euler + simple, cfg=1.0 |
| ControlNet has no effect | Wrong base model | SDXL ControlNet on SDXL only, etc. |
| LoRA doesn't trigger | Missing trigger words | Check Civitai page for prompt tokens |
| Video sequences flicker | No frame consistency | Enable fp16_attention in Wan / Hunyuan nodes |
| Crashes on large images | Tile VAE not enabled | Add Tile VAE Decode node |
| AMD: slow generation | Not running on ROCm | Use the ROCm build of PyTorch; see AMD ROCm Setup |
FAQ {#faq}
What is ComfyUI and why is it the dominant Stable Diffusion frontend in 2026?
ComfyUI is a node-based graphical interface for diffusion models. Each step (load checkpoint, encode prompt, sample, decode VAE, save image) is a node, and you wire them into a graph. This visual programming model is more flexible than Automatic1111 / Forge / Fooocus because it exposes every internal step — making it the default choice for advanced workflows like IPAdapter chains, ControlNet stacks, regional prompting, multi-stage refiners, and video pipelines. It also has the fastest support for new models — Flux, FLUX.2 [klein], SD 3.5, Wan 2.2, and HunyuanVideo all shipped with ComfyUI workflows on day one.
What hardware do I need for ComfyUI?
Minimum: NVIDIA GPU with 6 GB VRAM (SD 1.5 only), 16 GB system RAM. Recommended: RTX 3060 12 GB or RTX 4070 12 GB for SDXL and Flux Schnell. Ideal: RTX 4090 24 GB or RTX 5090 32 GB for Flux Dev, SD 3.5 Large, and video models. AMD Radeon 7900 XTX works via ROCm at 70-85% of NVIDIA performance — see our AMD ROCm guide. Apple Silicon (M2 or newer) works via MPS but is 3-5x slower than NVIDIA. CPU-only is not practical — generation times run to tens of minutes per image.
How is ComfyUI different from Automatic1111, Forge, and Fooocus?
Automatic1111 (A1111) is the original tab-based UI — easiest for beginners, slower with new model support. Forge is a fork of A1111 with significantly faster sampling and lower VRAM use; a great middle-ground. Fooocus is a stripped-down ComfyUI backend with a one-click UI optimized for SDXL — best for "just give me good images" users. ComfyUI is the most flexible and the fastest to support new models, but has a steeper learning curve. For most serious users in 2026: start in Fooocus or Forge, graduate to ComfyUI when you need ControlNet stacks, IPAdapter, or video.
How do I install ComfyUI on Windows, Linux, or Mac?
Easiest: download the portable Windows build (ComfyUI_windows_portable.7z), which includes Python, PyTorch with CUDA, and ComfyUI — unzip and run run_nvidia_gpu.bat. For Linux/Mac: git clone https://github.com/comfyanonymous/ComfyUI, create a venv, and pip install -r requirements.txt. The PyTorch index URL must match your hardware: cu124 for NVIDIA, rocm6.2 for AMD, default for Mac MPS. After install, place .safetensors model files in models/checkpoints/ and they appear in the load-checkpoint dropdown.
What is ComfyUI Manager and why do I need it?
ComfyUI Manager is a custom node that adds a Manager button to the UI for installing other custom nodes, missing models, and updates. Without it you must clone repos manually into custom_nodes/. With it, you click "Install Missing Custom Nodes" when you load a workflow that needs them, "Install Models" for missing checkpoints/LoRAs, and "Update All" to keep everything current. Install it once via git clone https://github.com/ltdrdata/ComfyUI-Manager custom_nodes/ComfyUI-Manager and restart. It is effectively mandatory for any non-trivial workflow.
Should I use Flux, SDXL, or SD 3.5 in 2026?
Flux Dev is the best general-purpose model in 2026: superb prompt adherence, photorealism, and text rendering, but 12B parameters means ~24 GB VRAM in BF16 (or ~12 GB as FP8, ~13 GB as Q8 GGUF). FLUX.2 [klein] (4B, Apache 2.0) is the fast, commercial-friendly option that runs in ~13 GB. SDXL is still the best for fine-tuning and the LoRA ecosystem (10,000+ community LoRAs). SD 3.5 Large sits between them on quality with a more permissive license than Flux Dev. In short: Flux Dev for quality, SDXL for LoRAs, SD 3.5 Large when license matters, FLUX.2 [klein] when you need speed plus commercial rights.
How do I run video generation (Wan 2.2, HunyuanVideo, Mochi) in ComfyUI?
All three have official ComfyUI nodes. Wan 2.2 is the easiest entry: it runs in ~24 GB VRAM with Q8 GGUF and produces 5-10 second 720p clips at 24fps (Wan 2.5 exists but is API-only with no public local weights as of mid-2026, so 2.2 remains the latest for local use). HunyuanVideo is higher quality; the lighter HunyuanVideo 1.5 (8.3B) runs natively in 24 GB. Mochi is the lowest-VRAM option (~16 GB). Render times on an RTX 4090: a Wan 2.2 5-second clip at 720p ≈ 6-10 minutes; HunyuanVideo same length ≈ 12-20 minutes.
How do I save and share ComfyUI workflows?
ComfyUI embeds the entire workflow JSON inside every PNG it saves — drag any image generated by ComfyUI back into the canvas and the full workflow loads. To share without an image, use Save (top right) → JSON, or Save (API Format) for programmatic use via the /prompt REST endpoint. The community shares workflows via OpenArt.ai, the Civitai workflows tab, and r/comfyui. When loading someone else's workflow, ComfyUI Manager prompts to install any missing custom nodes automatically.
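You can also pull the embedded workflow out of a PNG without opening ComfyUI. The sketch below walks the PNG chunk list with the standard library only. It assumes the workflow sits in an uncompressed tEXt chunk under the keyword "workflow" (with the API-format graph under "prompt"), which is how ComfyUI's Save Image node has written it. Both function names are hypothetical, and images re-saved by other tools usually lose this metadata.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return chunks


def load_embedded_workflow(path):
    """Return the workflow dict stored in a ComfyUI PNG, or None if absent."""
    with open(path, "rb") as f:
        chunks = read_png_text_chunks(f.read())
    raw = chunks.get("workflow")
    return json.loads(raw) if raw is not None else None
```

Calling load_embedded_workflow("ComfyUI_00001_.png") on a file from your output/ folder should return the same JSON you would get from Save in the UI.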
Sources: ComfyUI GitHub | ComfyUI Manager | Black Forest Labs Flux | Stability AI SD 3.5 | kijai's video nodes | city96 GGUF nodes | Internal benchmarks on RTX 3090, 4090, 5090, RX 7900 XTX, M4 Max.