ComfyUI Complete Guide (2026): Install, Workflows, ControlNet, Flux, SDXL
ComfyUI is the most powerful frontend for local image and video generation in 2026 — a node-based interface that exposes every internal step of diffusion models, supports new architectures on day one, and ships with the fastest sampling implementations. This guide covers everything: installation across NVIDIA / AMD / Mac, the node graph mental model, ControlNet, IPAdapter, regional prompting, LoRA stacks, Flux / SDXL / SD 3.5 workflows, video generation (Wan, HunyuanVideo, Mochi), and serious tuning for 24 GB and below.
Whether you are coming from Automatic1111, Forge, or fresh to local image gen, this is the reference.
Table of Contents
- What ComfyUI Is and Why It Wins
- Hardware Requirements
- Installation: Windows, Linux, macOS
- Folder Layout & Where Models Go
- ComfyUI Manager — Mandatory First Install
- The Node Graph Mental Model
- Your First Workflow: SDXL Text-to-Image
- Models in 2026: Flux, SDXL, SD 3.5, Pony, Illustrious
- LoRAs and Embeddings
- ControlNet — Composition, Pose, Depth, Canny
- IPAdapter — Image Prompts and Style Transfer
- Inpainting, Outpainting, and Upscaling
- Regional Prompting and Conditioning Combine
- Flux: Dev, Schnell, GGUF, Quantized
- Video Generation: Wan 2.2, HunyuanVideo, Mochi
- API Mode and Programmatic Use
- VRAM Optimization Tricks
- Speed Tuning: Sage Attention, Triton, Compile
- Common Custom Node Packs Worth Installing
- Troubleshooting
What ComfyUI Is and Why It Wins {#what-is}
Diffusion image generation is a pipeline:
Prompt → Text Encoder → Conditioning
↓
Empty Latent → Sampler ← Model (UNet/DiT)
↓
VAE Decode → Image
A1111 / Forge / Fooocus hide this pipeline behind a tabbed UI. ComfyUI exposes it as a graph: nodes for "Load Checkpoint," "CLIP Text Encode," "KSampler," "VAE Decode," "Save Image," with explicit data flow between them. You can branch, merge, swap, and chain nodes arbitrarily.
This is why ComfyUI is the fastest to support new models: when Flux launched, supporting it was a matter of writing a new "Load Diffusion Model" node and a new sampler — no UI surgery required. Same for SD 3.5, Wan 2.2, HunyuanVideo, and every model since 2023.
The trade-off: more upfront learning. But every workflow is a JSON file you can load with one drag-and-drop. The community ships pre-built workflows for almost every common task.
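Because workflows are plain JSON, you can inspect or diff one outside the UI. A minimal sketch, assuming a file exported via "Save (API Format)":
import json
# API-format workflows are a flat dict: node id -> {class_type, inputs}.
with open("workflow_api.json") as f:
    workflow = json.load(f)
for node_id, node in workflow.items():
    print(node_id, node["class_type"])
    # Links between nodes are encoded as [source_node_id, output_index].
    for name, value in node["inputs"].items():
        if isinstance(value, list):
            print(f"  {name} <- node {value[0]} output {value[1]}")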
Hardware Requirements {#requirements}
| Tier | GPU | Models You Can Run |
|---|---|---|
| Minimum | 6 GB VRAM | SD 1.5 only |
| Entry | 8 GB VRAM (RTX 3060 8GB) | SDXL Q8, SD 1.5 |
| Recommended | 12 GB (RTX 3060 12GB / 4070) | SDXL FP16, Flux Schnell GGUF Q4 |
| Sweet spot | 16 GB (RTX 4080 / 5070 Ti) | Flux Dev FP8, SD 3.5 Large |
| High-end | 24 GB (RTX 3090 / 4090) | Flux Dev BF16, Wan 2.2, HunyuanVideo Q4 |
| Top | 32+ GB (RTX 5090 / RTX 6000 Ada) | HunyuanVideo BF16, full Mochi |
RAM: at least 2x your largest model file in system RAM for offload buffers. 32 GB is the realistic minimum, 64 GB recommended for video.
Disk: Flux Dev = 24 GB, SDXL = 7 GB, SD 1.5 = 4 GB, plus VAEs (300 MB each), CLIP (1-5 GB), LoRAs (10-500 MB each), ControlNet (1.5 GB each). Plan for 200-500 GB on NVMe.
AMD: RX 7900 XTX works via ROCm at 70-85% of equivalent NVIDIA speed. See AMD ROCm Setup.
Apple Silicon: M2 or newer via MPS, but 3-5x slower than NVIDIA. Use MLX-based alternatives like Draw Things for better performance.
Installation: Windows, Linux, macOS {#installation}
Windows (portable — recommended)
- Download the latest ComfyUI_windows_portable.7z from github.com/comfyanonymous/ComfyUI/releases.
- Extract with 7-Zip to e.g. D:\ComfyUI\.
- Run run_nvidia_gpu.bat (NVIDIA) or run_cpu.bat (no GPU).
- Your browser opens to http://127.0.0.1:8188.
Linux / Mac (git)
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3.11 -m venv venv
source venv/bin/activate
# NVIDIA CUDA 12.4
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
# AMD ROCm 6.2
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2
# Mac MPS
pip install torch torchvision
# ComfyUI dependencies
pip install -r requirements.txt
# Run
python main.py --listen 0.0.0.0
Docker
docker run --gpus all \
-p 8188:8188 \
-v $(pwd)/models:/app/models \
-v $(pwd)/output:/app/output \
-v $(pwd)/workflows:/app/workflows \
yanwk/comfyui-boot:latest
yanwk/comfyui-boot includes Manager, common custom nodes, and starter models.
Useful launch flags
| Flag | Purpose |
|---|---|
| --listen 0.0.0.0 | Accept connections from LAN |
| --port 8188 | Change port |
| --lowvram | Aggressive memory offload (12 GB and below) |
| --novram | CPU offload everything (only for emergencies) |
| --use-pytorch-cross-attention | Force PyTorch SDPA (most stable) |
| --use-sage-attention | Use Sage Attention (faster, see Speed Tuning) |
| --fast | Enable FP8 ops on Ada/Hopper/Blackwell |
| --cache-classic | Old caching behavior (workaround for some custom nodes) |
| --preview-method auto | Show in-progress previews during sampling |
Folder Layout & Where Models Go {#folder-layout}
ComfyUI/
├── models/
│ ├── checkpoints/ # SD 1.5, SDXL, SD 3.5, Pony, Illustrious .safetensors
│ ├── unet/ # Flux UNet/DiT files
│ ├── diffusion_models/ # Newer alias for unet/
│ ├── clip/ # T5 / CLIP-L / CLIP-G text encoders
│ ├── vae/ # VAE decoders
│ ├── loras/ # LoRA files
│ ├── controlnet/ # ControlNet models
│ ├── embeddings/ # Textual inversion embeddings
│ ├── ipadapter/ # IPAdapter models
│ ├── upscale_models/ # 4x-UltraSharp, RealESRGAN, etc.
│ └── animatediff_models/
├── custom_nodes/ # Third-party node packs
├── workflows/ # Saved JSON workflows
├── input/ # Images for img2img, ControlNet
└── output/ # Generated images
To share models between A1111/Forge and ComfyUI without duplicating files:
# Edit ComfyUI/extra_model_paths.yaml
a111:
    base_path: /path/to/stable-diffusion-webui/
    checkpoints: models/Stable-diffusion
    loras: models/Lora
    controlnet: models/ControlNet
    upscale_models: models/ESRGAN
    embeddings: embeddings
    vae: models/VAE
ComfyUI Manager — Mandatory First Install {#manager}
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
Restart ComfyUI. The "Manager" button appears on the right sidebar.
What it does:
- Install Missing Custom Nodes — when you load someone else's workflow.
- Install Models — auto-downloads missing checkpoints/LoRAs/ControlNets.
- Update All — keeps ComfyUI and all custom nodes current.
- Snapshot / Restore — versioned backups before risky updates.
- Disable / Uninstall — surgically remove a custom node.
Without Manager, every missing node is a manual git clone. Install it first; do not skip.
The Node Graph Mental Model {#node-graph}
A workflow is a directed graph. Each node:
- Has inputs (connectors on the left)
- Has outputs (connectors on the right)
- Performs a function (load model, encode text, sample, decode VAE)
Connections carry typed values:
- MODEL — diffusion model
- CLIP — text encoder
- VAE — image encoder/decoder
- CONDITIONING — encoded prompt
- LATENT — latent image (compressed representation)
- IMAGE — pixel image
- MASK — binary mask
- CONTROL_NET — ControlNet model
- STRING, INT, FLOAT — primitives
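These types come from the node definitions themselves. A minimal sketch of a custom node using ComfyUI's node contract (the class, its behavior, and the filename are invented for illustration; INPUT_TYPES, RETURN_TYPES, FUNCTION, and NODE_CLASS_MAPPINGS are the real hooks):
# custom_nodes/blend_latents.py (hypothetical)
class BlendLatents:
    @classmethod
    def INPUT_TYPES(cls):
        # Keys use the same type names as the connector list above.
        return {
            "required": {
                "latent_a": ("LATENT",),
                "latent_b": ("LATENT",),
                "ratio": ("FLOAT", {"default": 0.5, "min": 0.0, "max": 1.0}),
            }
        }
    RETURN_TYPES = ("LATENT",)
    FUNCTION = "blend"
    CATEGORY = "latent"
    def blend(self, latent_a, latent_b, ratio):
        # LATENT values are dicts holding a "samples" tensor.
        samples = latent_a["samples"] * (1 - ratio) + latent_b["samples"] * ratio
        return ({"samples": samples},)
NODE_CLASS_MAPPINGS = {"BlendLatents": BlendLatents}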
The default workflow looks like:
[Load Checkpoint] → MODEL ─┐
→ CLIP │
→ VAE ───┼──┐
│ │
[CLIP Text Encode pos] ←──┘ │
↓ CONDITIONING │
[KSampler] ←──────────────────┘
↑ LATENT (from Empty Latent Image)
↓ LATENT (after sampling)
[VAE Decode] → IMAGE
↓
[Save Image]
Master this and you can build anything. Common patterns:
- Two samplers in series — base + refiner (SDXL).
- Conditioning combine — merge two prompts with weights.
- ControlNet apply — modulate conditioning with control image.
- Latent composite — blend two latents before decoding.
Your First Workflow: SDXL Text-to-Image {#first-workflow}
- Download sd_xl_base_1.0.safetensors from Hugging Face → put in models/checkpoints/.
- Open ComfyUI → Load Default workflow (right sidebar).
- In Load Checkpoint, select sd_xl_base_1.0.safetensors.
- In Empty Latent Image, set width/height to 1024×1024.
- In CLIP Text Encode (Positive), write your prompt, e.g. "cinematic photo of a samurai standing in a misty forest, ultra-detailed, 35mm film".
- In CLIP Text Encode (Negative), write: "blurry, deformed, extra fingers, low quality, bad anatomy".
- In KSampler, set steps=25, cfg=7.0, sampler=dpmpp_2m, scheduler=karras.
- Click Queue Prompt.
Expected time on RTX 4090: ~3-5 seconds per 1024² image.
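Saved via "Save (API Format)", those settings are plain JSON fields. A fragment for reference (node IDs below match the default workflow but may differ in yours):
# KSampler node from the default workflow, API format.
ksampler_node = {
    "class_type": "KSampler",
    "inputs": {
        "seed": 42,
        "steps": 25,
        "cfg": 7.0,
        "sampler_name": "dpmpp_2m",
        "scheduler": "karras",
        "denoise": 1.0,
        "model": ["4", 0],         # MODEL from Load Checkpoint (node 4)
        "positive": ["6", 0],      # positive CLIP Text Encode
        "negative": ["7", 0],      # negative CLIP Text Encode
        "latent_image": ["5", 0],  # Empty Latent Image
    },
}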
Models in 2026: Flux, SDXL, SD 3.5, Pony, Illustrious {#models}
| Model | Params | License | VRAM (BF16) | Strengths |
|---|---|---|---|---|
| Flux Dev | 12B | Non-commercial | ~24 GB | Best general quality, prompt adherence |
| Flux Schnell | 12B | Apache 2.0 | ~24 GB | 4-step distilled — fastest top-tier |
| SD 3.5 Large | 8B | Stability AI Community | ~16 GB | Permissive license, strong prompts |
| SD 3.5 Medium | 2.5B | Stability AI Community | ~10 GB | Lower VRAM, decent quality |
| SDXL 1.0 / Lightning | 3.5B | OpenRAIL | ~10 GB | Largest LoRA ecosystem |
| Pony Diffusion v6 XL | 3.5B | Fair AI | ~10 GB | Anime / character / NSFW |
| Illustrious XL | 3.5B | Fair AI | ~10 GB | Anime, cleaner than Pony |
| SD 1.5 | 0.9B | OpenRAIL | ~4 GB | Legacy, fast iteration |
Flux GGUF / FP8 quantized
Flux Dev in BF16 needs ~24 GB. To run on 12-16 GB:
- Flux Dev FP8 (flux1-dev-fp8.safetensors) — ~12 GB, near-identical quality. Use the --fast flag.
- Flux Dev GGUF Q8_0 — ~13 GB.
- Flux Dev GGUF Q4_K_S — ~7 GB.
For the GGUF variants, use the ComfyUI-GGUF custom node by city96:
[UnetLoader (GGUF)] → MODEL
[DualCLIPLoader (GGUF)] → CLIP # T5-XXL + CLIP-L
[Load VAE] → VAE
LoRAs and Embeddings {#loras}
LoRA chains
[Load Checkpoint] → MODEL ──────┐
→ CLIP ───────┤
↓
[Load LoRA #1]
↓
[Load LoRA #2]
↓
[Load LoRA #3]
↓
MODEL → KSampler
CLIP → CLIP Text Encode
Each LoRA node has strength_model and strength_clip (0.0-1.5 typical, 1.0 default). Stack as many as you want — but >3 strong LoRAs usually conflict.
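In a saved API-format workflow, the chain is just each LoraLoader taking MODEL/CLIP from the previous node. A sketch (node IDs and LoRA filenames are hypothetical):
lora_chain = {
    "10": {
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "detail_tweaker_xl.safetensors",  # hypothetical
            "strength_model": 0.8,
            "strength_clip": 0.8,
            "model": ["4", 0],  # MODEL from Load Checkpoint
            "clip": ["4", 1],   # CLIP from Load Checkpoint
        },
    },
    "11": {
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "film_grain_xl.safetensors",  # hypothetical
            "strength_model": 0.6,
            "strength_clip": 0.6,
            "model": ["10", 0],  # chained from LoRA #1
            "clip": ["10", 1],
        },
    },
}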
Embeddings (textual inversion)
Reference embeddings in your prompt with embedding:filename (without extension). For example, keep the positive prompt as usual ("masterpiece, beautiful landscape") and add embedding:negative_easy to the negative prompt.
Place .pt / .safetensors files in models/embeddings/.
Best LoRA sources in 2026
- Civitai — largest collection, NSFW filter optional.
- Hugging Face — official model authors.
- Tensor.Art — curated workflows.
Always check the LoRA's recommended trigger words and base model — an SDXL LoRA does not work on Flux.
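You can often check a LoRA's base model without loading it: trainers built on the kohya scripts embed ss_* metadata in the safetensors header. A sketch, assuming the safetensors package (keys vary by trainer and may be absent):
from safetensors import safe_open  # pip install safetensors
def lora_metadata(path: str) -> dict:
    # The safetensors header can carry free-form string metadata.
    with safe_open(path, framework="pt") as f:
        return f.metadata() or {}
meta = lora_metadata("models/loras/my_lora.safetensors")  # hypothetical file
print(meta.get("ss_base_model_version"))  # e.g. "sdxl_base_v1-0" for SDXL LoRAs
print(meta.get("ss_network_dim"))         # LoRA rank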
ControlNet — Composition, Pose, Depth, Canny {#controlnet}
ControlNet conditions generation on a structural input (pose, depth map, edges, etc.).
Pattern
[Load Image] → IMAGE → [Canny Preprocessor] → IMAGE
↓
[Load ControlNet (canny)] → CONTROL_NET ─────┐ │
↓ ↓
[CLIP Text Encode pos] → CONDITIONING → [Apply ControlNet] → CONDITIONING
↓
[KSampler]
Common preprocessors
| Type | Use Case |
|---|---|
| Canny | Preserve edges from reference image |
| Depth (MiDaS, Marigold) | Match 3D structure |
| OpenPose | Match human pose |
| DWPose | Higher-quality OpenPose alternative |
| LineArt | Line drawings, anime |
| Scribble | Rough sketches → finished images |
| Tile | Upscale-friendly preservation |
| Reference | Style match without LoRA |
| InstantID / IP-Composition | Face / composition transfer |
SDXL vs Flux ControlNet
SDXL has the most mature ControlNet ecosystem (xinsir, kohya, lllyasviel). Flux ControlNets are catching up — InstantX, Shakker-Labs, and Black Forest Labs ship Flux ControlNets but coverage is narrower.
IPAdapter — Image Prompts and Style Transfer {#ipadapter}
IPAdapter conditions generation on a reference image (instead of a text prompt). Best for style transfer, character consistency, and "make it look like this" workflows.
[Load Image (reference)] → IMAGE
↓
[IPAdapter Unified Loader] → MODEL, IPADAPTER
↓
[IPAdapter Advanced] ← MODEL ←─────┘
↓ MODEL
[KSampler]
Use the ComfyUI_IPAdapter_plus custom node by cubiq.
IPAdapter strength
| Strength | Effect |
|---|---|
| 0.3 | Subtle style hint |
| 0.6 | Clear style influence |
| 0.9-1.0 | Strong reference dominance |
| 1.2+ | Reference overrides prompt |
IPAdapter FaceID
For consistent character across generations: ip-adapter-faceid-portrait_sdxl.bin + face embedding extracted with InsightFace. One reference image → consistent character in any pose / scene / outfit.
Inpainting, Outpainting, and Upscaling {#inpainting}
Inpainting
[Load Image] → IMAGE
[Load Image (mask)] → IMAGE
[Image to Mask] → MASK
[VAE Encode (Inpaint)] → LATENT (with masked region noised)
[KSampler] (with denoise=0.8) → LATENT
[VAE Decode] → IMAGE
Use the dedicated inpainting checkpoint (e.g. sd_xl_base_1.0_inpainting_0.1.safetensors) for best results. Set sampler denoise to 0.7-0.95.
Outpainting
Use Pad Image for Outpainting node → mask the new edges → inpaint.
Upscaling
Two stages:
- Latent upscale (cheap, blurry) — Latent Upscale by node, factor 2.0.
- Image upscale model (sharp) — Upscale Image (using Model) with 4x-UltraSharp, or RealESRGAN_x4plus_anime_6B for anime.
Or iterative upscale: small image → upscale 1.5x → low-denoise sampler pass → upscale 1.5x again. Best quality, slowest.
Regional Prompting and Conditioning Combine {#regional}
To prompt different regions of the image differently (left side: knight, right side: wizard):
Use ComfyUI_Cutoff or ComfyUI-RegionalPrompter custom nodes. Pattern:
[CLIP Text Encode "knight"] → COND_LEFT
[CLIP Text Encode "wizard"] → COND_RIGHT
[Conditioning (Set Area)] (left half) → COND_LEFT_AREA
[Conditioning (Set Area)] (right half) → COND_RIGHT_AREA
[Conditioning Combine] → COND_FINAL → KSampler
Areas are specified in pixel coordinates. Resolution must match Empty Latent Image size.
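In API format the Set Area node is ConditioningSetArea, with the region given in pixels. A sketch for the knight on the left half of a 1024×1024 canvas (node ID hypothetical; verify input names against your ComfyUI version):
set_area_left = {
    "class_type": "ConditioningSetArea",
    "inputs": {
        "conditioning": ["20", 0],  # COND_LEFT from CLIP Text Encode "knight"
        "x": 0,
        "y": 0,
        "width": 512,   # left half of the 1024-wide latent
        "height": 1024,
        "strength": 1.0,
    },
}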
Flux: Dev, Schnell, GGUF, Quantized {#flux}
Flux is a 12B-parameter Diffusion Transformer (DiT) — different architecture from SD's UNet, with better prompt adherence and text rendering.
Files needed
models/unet/flux1-dev.safetensors # 24 GB BF16
models/clip/t5xxl_fp16.safetensors # 9.8 GB
models/clip/clip_l.safetensors # 246 MB
models/vae/ae.safetensors # 335 MB
For 16 GB VRAM use flux1-dev-fp8.safetensors (12 GB) and t5xxl_fp8_e4m3fn.safetensors (5 GB).
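A sketch for fetching the files with huggingface_hub. Flux Dev is a gated repo, so accept the license on Hugging Face and authenticate first; the encoder repo below is the one ComfyUI's examples point at, but verify names before committing to a ~35 GB download:
from huggingface_hub import hf_hub_download  # pip install huggingface_hub
# Requires prior `huggingface-cli login` (or HF_TOKEN) for the gated repo.
hf_hub_download(
    repo_id="black-forest-labs/FLUX.1-dev",
    filename="flux1-dev.safetensors",
    local_dir="models/unet",
)
hf_hub_download(
    repo_id="comfyanonymous/flux_text_encoders",
    filename="t5xxl_fp16.safetensors",
    local_dir="models/clip",
)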
Workflow
[Load Diffusion Model] → MODEL (flux1-dev)
[DualCLIPLoader] → CLIP (clip_l + t5xxl)
[Load VAE] → VAE (ae.safetensors)
[CLIP Text Encode] → CONDITIONING
[Empty Latent Image] (1024×1024) → LATENT
[KSamplerAdvanced] (20 steps, cfg=1.0, euler, simple scheduler)
[VAE Decode] → IMAGE
Flux is guidance-distilled, so always set cfg=1.0 (classifier-free guidance off); SD models, by contrast, use CFG 5-10.
Flux Schnell
Same workflow, but use flux1-schnell.safetensors and 4 steps. ~5x faster, slightly lower quality.
Flux LoRA
[Load LoRA] (after Load Diffusion Model)
Most Flux LoRAs work at 0.7-1.0 strength. Civitai now has 3,000+ Flux LoRAs.
Video Generation: Wan 2.2, HunyuanVideo, Mochi {#video}
Wan 2.2 (recommended starting point)
Alibaba's open-source video model. 5-10 second 720p clips, ~24 GB VRAM at Q8 GGUF.
[UnetLoader (GGUF)] → MODEL (wan2.2-i2v-q8_0.gguf)
[DualCLIPLoader] → CLIP (umt5-xxl encoder)
[Load VAE] → VAE
[Load Image] → IMAGE (first frame for image-to-video)
[CLIP Text Encode] → CONDITIONING
[WanImageToVideo] → LATENT (sequence)
[KSampler] (30 steps)
[VAE Decode (sequence)] → IMAGES
[Video Combine (FFmpeg)] → MP4
Render time on RTX 4090: ~6-10 minutes for 5 seconds at 720p.
HunyuanVideo
Tencent's 13B video DiT. Highest quality, ~40 GB BF16 (fits in 24 GB at Q4 GGUF). Use kijai/ComfyUI-HunyuanVideoWrapper.
Mochi
Genmo's 10B model. Lower VRAM (~16 GB), faster, slightly lower quality than Hunyuan.
Frame interpolation and upscaling
After generating a 24fps video:
- RIFE for interpolation to 60fps (ComfyUI-Frame-Interpolation).
- Real-ESRGAN x4 for upscaling to 1440p / 4K.
API Mode and Programmatic Use {#api}
ComfyUI exposes POST /prompt for queuing workflows programmatically.
import json
import requests
import uuid
# Load workflow JSON saved from UI (Save (API Format))
with open("workflow_api.json") as f:
    workflow = json.load(f)
# Modify any node — e.g., set positive prompt
workflow["6"]["inputs"]["text"] = "a cyberpunk samurai at dawn"
workflow["3"]["inputs"]["seed"] = 42
# Submit
client_id = str(uuid.uuid4())
resp = requests.post("http://127.0.0.1:8188/prompt", json={
    "prompt": workflow,
    "client_id": client_id,
})
prompt_id = resp.json()["prompt_id"]
# Poll history
history = requests.get(f"http://127.0.0.1:8188/history/{prompt_id}").json()
WebSocket (/ws) streams progress events. Output images are at /view?filename=...&subfolder=....
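For live progress, subscribe to the WebSocket with the same client_id. A sketch using the websocket-client package; the message schema follows ComfyUI's bundled websockets_api_example.py and may change between versions:
import json
import websocket  # pip install websocket-client
ws = websocket.WebSocket()
ws.connect(f"ws://127.0.0.1:8188/ws?clientId={client_id}")
while True:
    msg = ws.recv()
    if isinstance(msg, bytes):
        continue  # binary frames carry in-progress preview images
    event = json.loads(msg)
    if event["type"] == "progress":
        data = event["data"]
        print(f"step {data['value']}/{data['max']}")
    elif event["type"] == "executing" and event["data"]["node"] is None:
        break  # node == None signals the prompt has finished
ws.close()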
VRAM Optimization Tricks {#vram-optimization}
| Trick | VRAM Saved | Cost |
|---|---|---|
| FP8 weights (--fast) | 50% | <1% quality |
| GGUF Q8_0 | 50% | <1% quality |
| GGUF Q4_K_S | 75% | 3-5% quality |
| Tile VAE Decode | ~30% peak | none |
| Sequential Offload (--lowvram) | 70% | ~30% slower |
| CFG Rescale (skip CFG dup) | 50% during sampling | none |
| BFloat16 over FP32 | 50% | none |
| Reduce resolution then upscale | proportional | varies |
For 12 GB VRAM running Flux Dev: --fast + Q8 T5 + Tile VAE + --lowvram if needed.
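To see what the table is up against on your card, standard PyTorch calls report free vs. total VRAM:
import torch
# Free vs. total memory on the default CUDA device, in GiB.
free, total = torch.cuda.mem_get_info()
print(f"free {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
# Rough budgeting: FP8 weights need about half their BF16 size, Q4 about a
# quarter, plus headroom for latents, text encoders, and the VAE.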
Speed Tuning: Sage Attention, Triton, Compile {#speed-tuning}
Sage Attention
pip install sageattention
python main.py --use-sage-attention
Sage Attention is a faster attention implementation than xformers / SDPA on Ada and Blackwell. 15-30% throughput improvement on Flux and SDXL.
Triton (Linux/WSL2)
pip install triton
Triton enables more efficient kernels. Required by Sage Attention and several custom node packs. Windows native does not officially support Triton; use WSL2.
torch.compile
Some custom node packs (kijai's wrappers, ComfyUI-MultiGPU) expose torch.compile modes. Compilation takes 1-3 minutes on first run but generation is 10-25% faster afterward. Mode max-autotune is fastest but adds ~5 min compile time.
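Under the hood these wrappers call PyTorch's compiler. A minimal sketch of the same API on a toy module (not ComfyUI-specific code):
import torch
import torch.nn as nn
# torch.compile JIT-compiles the module into fused kernels on its first call.
net = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64)).cuda()
net = torch.compile(net, mode="max-autotune")  # "default" compiles faster
x = torch.randn(8, 64, device="cuda")
y = net(x)  # first call triggers compilation; later calls reuse the kernels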
TeaCache / FBCache
Caches diffusion model attention/MLP outputs across consecutive timesteps. 1.5-2.0x speedup with 1-3% quality loss. Custom nodes: ComfyUI-TeaCache, ComfyUI-FBCache.
Common Custom Node Packs Worth Installing {#custom-nodes}
| Pack | Purpose |
|---|---|
| ComfyUI-Manager | Mandatory |
| ComfyUI_IPAdapter_plus | IPAdapter |
| ComfyUI-Advanced-ControlNet | Better ControlNet |
| ComfyUI-Impact-Pack | Detailers, face fix |
| rgthree-comfy | Better UI nodes (mute, group bypass) |
| ComfyUI-GGUF | GGUF-quantized model loaders |
| ComfyUI-Frame-Interpolation | RIFE / FILM video frame interp |
| ComfyUI-VideoHelperSuite | Video I/O |
| ComfyUI-TeaCache | 2x speed for diffusion |
| was-node-suite-comfyui | Misc utility nodes |
| ComfyUI-KJNodes | kijai's nodes for Wan, Hunyuan, Mochi |
| ComfyUI-MultiGPU | Run encoder on GPU 1, UNet on GPU 0 |
| ComfyUI-Custom-Scripts | UI improvements |
Troubleshooting {#troubleshooting}
| Symptom | Cause | Fix |
|---|---|---|
| OOM on first generation | VRAM too tight | Add --lowvram or use FP8/GGUF |
| Black output | NaN in VAE | Switch to a fixed fp16 VAE, or run with --fp32-vae |
| Workflow won't load | Missing custom nodes | Manager → Install Missing Custom Nodes |
| Very slow on RTX 40 | Not using FP8 | Add --fast flag |
| Flux looks washed out | Wrong sampler | Use euler + simple, cfg=1.0 |
| ControlNet has no effect | Wrong base model | SDXL ControlNet on SDXL only, etc. |
| LoRA doesn't trigger | Missing trigger words | Check Civitai page for prompt tokens |
| Video sequences flicker | No frame consistency | Enable fp16_attention in Wan / Hunyuan nodes |
| Crashes on large images | Tile VAE not enabled | Add Tile VAE Decode node |
| Slow on AMD via Vulkan | Not using ROCm | See AMD ROCm Setup |
Sources: ComfyUI GitHub | ComfyUI Manager | Black Forest Labs Flux | Stability AI SD 3.5 | kijai's video nodes | city96 GGUF nodes | Internal benchmarks on RTX 3090, 4090, 5090, RX 7900 XTX, M4 Max.