Stable Diffusion Local Install (2026): VRAM, Setup, Models

June 20, 2026
Local AI Master Research Team

To run Stable Diffusion locally in 2026 you need an NVIDIA GPU with at least 8 GB of VRAM for SDXL (12 GB is the comfortable floor); SD 1.5 runs on as little as 4 GB, while SD 3.5 Large (8.1B params) wants ~11 GB even with NVIDIA's FP8 build. Pick a checkpoint family (SDXL is still the best all-round choice in 2026 thanks to its enormous LoRA and ControlNet ecosystem), grab the weights from Hugging Face or CivitAI, and run them through a front end like Forge, Fooocus, or ComfyUI. This page is the install-and-hardware hub; the step-by-step UI walkthroughs live in their own guides linked below.

If you only remember one thing: VRAM, not raw GPU speed, decides what you can run. An 8 GB card opens SDXL; 12 GB makes SDXL comfortable and brings SD 3.5 into reach; 16 GB+ removes nearly every limit on the consumer side.

Can my GPU run Stable Diffusion? (Verdict)

Here is the blunt answer by VRAM tier. These are practical, generation-time figures for 1024x1024 (SDXL/SD 3.5) or 512x512 (SD 1.5), not training.

Your VRAM | What you can run | Verdict
4 GB | SD 1.5 only (512x512), slow | ⚠️ Minimum (works, but tight)
6 GB | SD 1.5 comfortably; SDXL with offload tricks | ⚠️ SD 1.5 tier
8 GB | SDXL at 1024x1024; SD 3.5 Medium | ✅ The real entry point
12 GB | SDXL comfortably + SD 3.5 Large (FP8) | ✅ Sweet spot
16 GB | Everything above, faster, bigger batches | ✅ Comfortable
24 GB+ | SDXL/SD 3.5 with headroom, light training | ✅ No limits (consumer)

Short version: 8 GB is the honest entry point for modern image work, 12 GB is the sweet spot, and 4-6 GB confines you to SD 1.5. AMD and Apple Silicon can run Stable Diffusion too, but with caveats covered further down. Not sure how your exact card maps? Run it through our "Can I run local AI?" checker before you download gigabytes of weights.
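If you would rather check from a script, the tier table can be sketched as a small lookup. This is a minimal illustration, not an official tool: the function names are hypothetical, the thresholds and verdicts are copied from the table above, and it assumes an NVIDIA driver with `nvidia-smi` on the PATH (which reports memory in MiB, often a few MiB short of the advertised size, hence the rounding).

```python
import subprocess

# Tier floors in GB and verdicts, copied from the table above.
TIERS = [
    (24, "No limits (consumer)"),
    (16, "Comfortable"),
    (12, "Sweet spot"),
    (8, "The real entry point"),
    (6, "SD 1.5 tier"),
    (4, "Minimum"),
]


def parse_vram_gb(nvidia_smi_output: str) -> int:
    """Return the first GPU's VRAM in whole GB from nvidia-smi's MiB output."""
    first_line = nvidia_smi_output.strip().splitlines()[0]
    # Round to the nearest GB because drivers report e.g. 8188 MiB for an 8 GB card.
    return round(int(first_line) / 1024)


def verdict(vram_gb: int) -> str:
    """Map a VRAM size to the verdict of the highest tier it reaches."""
    for floor, text in TIERS:
        if vram_gb >= floor:
            return text
    return "Below the 4 GB minimum for SD 1.5"


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for total memory per GPU, one bare number (MiB) per line."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a machine with an NVIDIA card, `verdict(parse_vram_gb(query_nvidia_smi()))` returns the verdict for the first GPU. AMD and Apple hardware would need a different query.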

SD 1.5 vs SDXL vs SD 3.5: which checkpoint family?

Stability AI has shipped three major open generations that people still run locally. They are not strictly "newer is better" — each trades VRAM for quality and ecosystem depth, and the parameter counts below explain why bigger models need more memory.

Model | Parameters | Released | Native resolution | Practical VRAM (local)
SD 1.5 | ~0.98B | Oct 2022 | 512x512 | ~4 GB min
SDXL 1.0 | ~3.5B | Jul 2023 | 1024x1024 | 8 GB min / 12 GB comfortable
SD 3.5 Medium | 2.5B | Oct 29, 2024 | up to ~1440px | ~9.9 GB (excl. text encoders)
SD 3.5 Large | 8.1B | Oct 22, 2024 | 1 MP (1024x1024) | ~18 GB FP16; ~11 GB FP8

A few honest notes on these numbers:

  • SD 1.5 is ancient by AI standards (2022) but refuses to die. Its 512x512 base and small ~1B-parameter weights mean it runs on almost anything, and its years-deep library of fine-tunes and LoRAs keeps it relevant for stylized work on weak GPUs.
  • SDXL 1.0 more than tripled the parameter count to ~3.5B and moved to native 1024x1024. This is the model most local creators still build around in 2026.
  • SD 3.5 Large is the highest-quality open Stability model at 8.1B parameters, but at FP16 it needs around 18 GB of VRAM. NVIDIA and Stability shipped an FP8 build that cuts that requirement by roughly 40% to about 11 GB, which is what makes it viable on a 12 GB card.
  • SD 3.5 Medium (2.5B) is the consumer-targeted variant — Stability quotes ~9.9 GB of VRAM excluding text encoders, so plan for a bit more in practice.

SDXL and SD 3.5 are distributed from Stability AI's Hugging Face organization; SD 1.5 was originally published by Runway and is available on Hugging Face through mirrored repositories. SD 3.5 ships under the Stability AI Community License (free for research and for commercial use under $1M annual revenue).
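To see why parameter count drives memory, a back-of-the-envelope sketch helps: the weights alone take roughly parameters times bytes per parameter (2 bytes at FP16, 1 byte at FP8). The helper names below are hypothetical, and the estimate covers weights only; the practical figures in the table run higher, presumably because text encoders, the VAE, and activations also need memory.

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def percent_saved(before_gb: float, after_gb: float) -> float:
    """Relative reduction when a requirement drops from before_gb to after_gb."""
    return 100 * (before_gb - after_gb) / before_gb


# Parameter counts (in billions) from the table above.
MODELS = {
    "SD 1.5": 0.98,
    "SDXL 1.0": 3.5,
    "SD 3.5 Medium": 2.5,
    "SD 3.5 Large": 8.1,
}

for name, params in MODELS.items():
    fp16 = weight_gb(params, 2)
    fp8 = weight_gb(params, 1)
    print(f"{name}: FP16 weights ~{fp16:.1f} GB, FP8 weights ~{fp8:.1f} GB")

# The quoted drop for SD 3.5 Large, ~18 GB at FP16 to ~11 GB at FP8.
print(f"18 GB -> 11 GB is a ~{percent_saved(18, 11):.0f}% reduction")
```

For SD 3.5 Large this puts the FP16 weights at about 16.2 GB, which sits just under the ~18 GB practical figure, and the 18 GB to 11 GB drop works out to a little under 40%.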

Why does SDXL still win in 2026?

Newer does not mean dominant. Even with FLUX and SD 3.5 available, SDXL remains the default for most serious local workflows, and the reason is ecosystem, not base-model quality:

  1. The LoRA library is enormous. CivitAI hosts thousands of SDXL style, character, and concept LoRAs — the deepest catalog of any open image model. If you want a specific art style, character, or look, the odds it already exists as an SDXL LoRA are high.
  2. The ControlNet and inpainting stack is the most mature. SDXL has well-supported ControlNet models (pose, depth, edges, etc.), plus battle-tested inpainting, outpainting, and img2img workflows that "just work" across Forge, ComfyUI, and A1111.
  3. It fits in 8 GB. SDXL runs at native 1024x1024 on an 8 GB card, where SD 3.5 Large and FLUX push you toward 12 GB+ or aggressive quantization.

FLUX produces cleaner prompt-following and better photorealism out of the box, and for some users it is the better default in 2026 — but its LoRA/ControlNet tooling is still catching up to SDXL's. If you depend on specific community fine-tunes, SDXL's ecosystem is currently deeper. For the FLUX side of that trade-off, including its VRAM tiers, see our guide on running FLUX.1 locally.

Where do I get checkpoints and models?

Two sources cover virtually everything:

  • Hugging Face — the official, canonical source for base weights (SD 1.5, SDXL, SD 3.5). Use this for the original Stability models and reputable community releases. Files are usually .safetensors.
  • CivitAI — the community hub for fine-tuned checkpoints, LoRAs, ControlNet models, embeddings, and VAEs. This is where the thousands of SDXL community models live. Filter by base model (SD 1.5 / SDXL / SD 3.5) so you download a LoRA that matches your checkpoint — an SDXL LoRA will not load against an SD 1.5 checkpoint.

Always prefer .safetensors over .ckpt (the older pickle format can execute arbitrary code on load). Drop checkpoints into your UI's models folder (for example, models/Stable-diffusion in A1111/Forge, or models/checkpoints in ComfyUI), and put LoRAs in the matching Lora / loras folder.
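The placement rules can be sketched as a small helper that refuses anything other than `.safetensors` and returns the folder for a given UI. This is only an illustration: the function and table names are hypothetical, the folder names follow the examples above (with `models/Lora` for A1111/Forge and `models/loras` for ComfyUI), and your install may use different paths.

```python
from pathlib import PurePosixPath

# Folder per (UI, model kind), relative to the UI's install directory.
FOLDERS = {
    ("a1111", "checkpoint"): "models/Stable-diffusion",
    ("forge", "checkpoint"): "models/Stable-diffusion",
    ("comfyui", "checkpoint"): "models/checkpoints",
    ("a1111", "lora"): "models/Lora",
    ("forge", "lora"): "models/Lora",
    ("comfyui", "lora"): "models/loras",
}


def destination(ui: str, kind: str, filename: str) -> str:
    """Return where a downloaded file belongs, rejecting non-safetensors files."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix != ".safetensors":
        # .ckpt files are pickles and can run arbitrary code when loaded.
        raise ValueError(f"refusing {filename!r}: expected a .safetensors file")
    folder = FOLDERS[(ui.lower(), kind.lower())]
    return str(PurePosixPath(folder) / filename)


print(destination("ComfyUI", "checkpoint", "example_model.safetensors"))
print(destination("Forge", "lora", "example_style.safetensors"))
```

An unknown UI or model kind raises `KeyError`, which is deliberate here: guessing a folder would only hide the mistake.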

How do I actually install it? (Setup paths)

There is no single "Stable Diffusion installer" — you install a front end (UI) that loads the checkpoints. Pick one based on how much control you want:

Front end | Best for | Difficulty
Fooocus | Beginners: Midjourney-style, near-zero config | Easiest
Forge | A1111 users who want speed + native FLUX | Easy-Medium
AUTOMATIC1111 | The classic full-featured WebUI, max extensions | Medium
ComfyUI | Power users: node graphs, full pipeline control | Advanced

Rather than duplicate hundreds of lines of click-by-click steps here, follow the dedicated walkthrough for whichever UI you choose — each is kept current with 2026 versions:

  • Fooocus guide — the easiest path; great if you just want good SDXL images with almost no setup.
  • SD Forge guide — a faster A1111 fork with native FLUX support; the pragmatic middle ground.
  • AUTOMATIC1111 guide — the original, extension-rich WebUI most tutorials assume.
  • ComfyUI guide — node-based, the most flexible front end and the reference UI for SD 3.5 and FLUX.

The common flow is the same regardless of UI: install Python and Git, clone or download the UI, drop a checkpoint into its models folder, and launch. The process is smoothest on NVIDIA; AMD and Apple need the extra steps below.
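Before cloning anything, a quick preflight check for the two prerequisites can save a failed install. The sketch below uses only the standard library; the function names are hypothetical, and the minimum Python version is a placeholder assumption, so check the README of the UI you pick for the version it actually expects.

```python
import shutil
import sys


def missing_tools(tools=("git",), which=shutil.which):
    """Return the names of command-line tools that are not on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version=sys.version_info, minimum=(3, 10)):
    """Check the interpreter version against a minimum (placeholder value)."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Install these first:", ", ".join(absent))
    if not python_ok():
        print("Python", sys.version.split()[0], "is older than the assumed minimum")
    if not absent and python_ok():
        print("Prerequisites look fine; clone the UI and add a checkpoint.")
```

The `which` parameter exists only so the check can be exercised without touching the real PATH.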

Best Stable Diffusion models per VRAM tier

If you are choosing a checkpoint to match your card, this is the practical mapping. "Comfortable" means 1024x1024 (or 512x512 for SD 1.5) without offload hacks.

Your GPU VRAM | Recommended checkpoint family | Why
4-6 GB | SD 1.5 (and SD 1.5 fine-tunes) | Only family that fits comfortably; huge LoRA library
8 GB | SDXL 1.0 (+ SDXL fine-tunes) | Native 1024x1024 fits; deepest ControlNet/LoRA stack
12 GB | SDXL + SD 3.5 Large (FP8) | SDXL with headroom; SD 3.5 Large becomes viable at FP8 (~11 GB)
16 GB+ | Any: SDXL, SD 3.5 Large, FLUX | Room for big batches, high res, and quantized FLUX

For an 8 GB card the answer is almost always SDXL: it is the highest-quality family that fits at full resolution, and it has the richest tooling. For 12 GB, SDXL stays the workhorse but SD 3.5 Large in FP8 is worth keeping around for its prompt fidelity. To size a specific quantized build against your exact card and workload, our VRAM requirements guide breaks down the math.
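The same mapping can be expressed as a filter over the practical VRAM figures quoted earlier in this article. Treat it as a rough sketch under those numbers: the names are hypothetical, the SD 3.5 Medium figure excludes text encoders, and offloading or quantization can move a model into a lower tier than this strict comparison suggests.

```python
# Practical VRAM floors in GB, taken from the checkpoint-family table in this article.
REQUIREMENTS_GB = [
    ("SD 1.5", 4.0),
    ("SDXL 1.0", 8.0),
    ("SD 3.5 Medium", 9.9),  # excludes text encoders
    ("SD 3.5 Large (FP8)", 11.0),
    ("SD 3.5 Large (FP16)", 18.0),
]


def models_that_fit(vram_gb: float) -> list[str]:
    """List the models whose quoted floor is at or below the given VRAM."""
    return [name for name, need in REQUIREMENTS_GB if vram_gb >= need]


for card in (6, 8, 12, 24):
    print(f"{card} GB:", ", ".join(models_that_fit(card)) or "nothing listed")
```

The list is ordered from smallest to largest, so the last entry returned is the most demanding model the card can hold without offload tricks.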

NVIDIA vs AMD vs Apple Silicon

Stable Diffusion runs on all three, but the experience is not equal:

  • NVIDIA (recommended): CUDA is the native target. Everything — Forge, ComfyUI, A1111, every extension — is built and tested for it first. An RTX-class card with 8-12 GB is the path of least resistance and the fastest per dollar.
  • AMD: Works, with caveats. On Linux with ROCm properly configured, AMD cards reach near-CUDA performance. On Windows you are on DirectML or ZLUDA (community projects such as the AMD-GPU A1111/Forge forks), where performance typically runs roughly 30-50% behind an equivalent NVIDIA card. Doable, but expect setup friction.
  • Apple Silicon (M-series): Runs via the Metal Performance Shaders (MPS) backend. SDXL works on a Mac with 16 GB or more, and native apps like Draw Things use Apple's Metal engine for a smoother experience: community reports put an SDXL 1024x1024 image at roughly 25-40 seconds on an M2 Pro. Usable and convenient, but a Mac generates noticeably more slowly than a comparable CUDA GPU, so it is a good "I already own one" option rather than a machine to buy specifically for image generation.
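In PyTorch terms, the three platforms above come down to which backend is available at runtime. The sketch below assumes PyTorch may or may not be installed and falls back to the CPU if it is missing; `pick_device` and `detect_device` are hypothetical names. ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` interface, so they land in the first branch, while DirectML and ZLUDA setups are not covered here.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA (NVIDIA, or AMD via ROCm), then Apple's MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not have the mps backend module at all.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Selected device:", detect_device())
```

Front ends such as Forge and ComfyUI make this choice for you; the point is only to show where the platform difference enters.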

First-hand notes from a local box

Take these as approximate, single-machine observations, not a controlled benchmark. On an RTX 3060 12 GB with Forge, SDXL at 1024x1024 / ~25 steps lands roughly in the 12-20 second per image range once the model is loaded, and SD 1.5 at 512x512 is near-instant by comparison. SD 3.5 Large in its FP8 build fit on the same 12 GB card but ran appreciably slower and left far less headroom for big batches. The pattern matches the rule above: VRAM determines what loads at all, and once a model fully fits in VRAM, generation is fast — the moment it spills to system RAM, speed collapses. If your generations suddenly crawl, you have almost certainly run out of VRAM and the UI is offloading.
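One way to catch that collapse is to compare each generation time against your own recent baseline. The following is a rough heuristic sketch, not a feature of any UI: the function name, the 3x slowdown factor, and the single warm-up run (the first image includes model loading) are all assumptions to adjust for your setup.

```python
from statistics import median


def looks_like_offload(seconds_per_image, slowdown_factor=3.0, warmup=1):
    """Return True if the latest run is far slower than the earlier median.

    seconds_per_image: timings in order, latest last.
    warmup: how many initial runs to ignore (model loading skews them).
    """
    history = seconds_per_image[warmup:-1]
    if not history:
        return False  # not enough data to judge
    return seconds_per_image[-1] > slowdown_factor * median(history)


# Hypothetical timings: one slow first run, three steady runs, then a crawl.
timings = [30.0, 14.0, 15.0, 16.0, 60.0]
if looks_like_offload(timings):
    print("Latest image was much slower than usual; check VRAM usage.")
```

If the check fires, lowering the batch size or resolution, or closing other GPU-hungry applications, is the usual first step.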

Key Takeaways

  1. 8 GB of NVIDIA VRAM is the honest entry point for modern Stable Diffusion (SDXL). SD 1.5 runs on 4 GB; 12 GB is the comfortable sweet spot and unlocks SD 3.5 Large in FP8.
  2. SDXL (3.5B, 2023) still wins for most local workflows in 2026 — not because the base is best, but because its LoRA, ControlNet, and inpainting ecosystem is the deepest of any open model.
  3. SD 3.5 Large (8.1B) needs ~18 GB at FP16, ~11 GB with the FP8 build; SD 3.5 Medium (2.5B) targets consumer GPUs at ~9.9 GB excluding text encoders.
  4. Get base weights from Hugging Face and community fine-tunes/LoRAs from CivitAI, always prefer .safetensors, and match each LoRA's base model to your checkpoint.
  5. You install a front end, not "Stable Diffusion" itself — Fooocus (easiest), Forge (fast), A1111 (classic), or ComfyUI (most flexible). NVIDIA is smoothest; AMD and Apple work with extra setup and slower speeds.

Local AI Master Research Team

Creator of Local AI Master. I've built datasets with over 77,000 examples and trained AI models from scratch. Now I help people achieve AI independence through local AI mastery.


📅 Published: June 20, 2026🔄 Last Updated: June 20, 2026✓ Manually Reviewed


Written by the Local AI Master Team

The team behind Local AI Master

We build Local AI Master around practical, testable local AI workflows: model selection, hardware planning, RAG systems, agents, and MLOps. The goal is to turn scattered tutorials into a structured learning path you can follow on your own hardware.
