To run Stable Diffusion locally in 2026 you need an NVIDIA GPU with at least 8 GB of VRAM for SDXL (12 GB is the comfortable floor); SD 1.5 runs on as little as 4 GB, while SD 3.5 Large (8.1B params) wants ~11 GB even with NVIDIA's FP8 build. Pick a checkpoint family (SDXL is still the best all-round choice in 2026 thanks to its enormous LoRA and ControlNet ecosystem), grab the weights from Hugging Face or CivitAI, and run them through a front end like Forge, Fooocus, or ComfyUI. This page is the install-and-hardware hub; the step-by-step UI walkthroughs live in their own guides linked below.

If you only remember one thing: VRAM, not raw GPU speed, decides what you can run. An 8 GB card opens SDXL; 12 GB makes SDXL comfortable and brings SD 3.5 into reach; 16 GB+ removes nearly every limit on the consumer side.

Can my GPU run Stable Diffusion? (Verdict)

Here is the blunt answer by VRAM tier. These are practical, generation-time figures for 1024x1024 (SDXL/SD 3.5) or 512x512 (SD 1.5), not training.

Your VRAM	What you can run	Verdict
4 GB	SD 1.5 only (512x512), slow	⚠️ Minimum — works, but tight
6 GB	SD 1.5 comfortably; SDXL with offload tricks	⚠️ SD 1.5 tier
8 GB	SDXL at 1024x1024; SD 3.5 Medium with offloading	✅ The real entry point
12 GB	SDXL comfortably + SD 3.5 Large (FP8)	✅ Sweet spot
16 GB	Everything above, faster, bigger batches	✅ Comfortable
24 GB+	SDXL/SD 3.5 with headroom, light training	✅ No limits (consumer)

Short version: 8 GB is the honest entry point for modern image work, 12 GB is the sweet spot, and 4-6 GB confines you to SD 1.5. AMD and Apple Silicon can run Stable Diffusion too, but with caveats covered further down. Not sure how your exact card maps? Run it through our "Can I run local AI?" checker before you download gigabytes of weights.
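If you want to script that lookup, here is a minimal Python sketch. It maps a VRAM size onto the verdict table above. An optional detection step reads the total memory of the first CUDA device through PyTorch and returns None if PyTorch or CUDA is missing. The names `TIERS`, `verdict_for_vram`, and `detected_vram_gib` are hypothetical helpers, not part of any tool in this guide. Apple Silicon uses unified memory and is not covered by the detection step.

```python
from typing import Optional, Tuple

# Hypothetical lookup mirroring this guide's VRAM verdict table (rules of thumb, not hard limits).
# Ordered from the largest tier down so the first match is the best tier the card reaches.
TIERS = [
    (24, "SDXL/SD 3.5 with headroom, light training", "No limits (consumer)"),
    (16, "Everything above, faster, bigger batches", "Comfortable"),
    (12, "SDXL comfortably + SD 3.5 Large (FP8)", "Sweet spot"),
    (8, "SDXL at 1024x1024; SD 3.5 Medium with offloading", "The real entry point"),
    (6, "SD 1.5 comfortably; SDXL with offload tricks", "SD 1.5 tier"),
    (4, "SD 1.5 only (512x512), slow", "Minimum: works, but tight"),
]


def verdict_for_vram(vram_gb: float) -> Optional[Tuple[str, str]]:
    """Return (what you can run, verdict) for a VRAM size, or None below 4 GB."""
    # Drivers report slightly less than the marketing size (an "8 GB" card may show ~7.9),
    # so round to the nearest whole GB before comparing.
    rounded = round(vram_gb)
    for min_gb, can_run, verdict in TIERS:
        if rounded >= min_gb:
            return can_run, verdict
    return None


def detected_vram_gib() -> Optional[float]:
    """Total memory of CUDA device 0 in GiB, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    vram = detected_vram_gib()
    if vram is None:
        print("No CUDA device detected; pass your card's VRAM to verdict_for_vram() by hand.")
    else:
        print(f"{vram:.1f} GiB ->", verdict_for_vram(vram))
```

Without a GPU in the machine you are planning for, call `verdict_for_vram(12)` directly with the card's advertised VRAM.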

SD 1.5 vs SDXL vs SD 3.5: which checkpoint family?

Stability AI has shipped three major open generations that people still run locally. They are not strictly "newer is better" — each trades VRAM for quality and ecosystem depth, and the parameter counts below explain why bigger models need more memory.

Model	Parameters	Released	Native resolution	Practical VRAM (local)
SD 1.5	~0.98B	Oct 2022	512x512	~4 GB min
SDXL 1.0	~3.5B	Jul 2023	1024x1024	8 GB min / 12 GB comfortable
SD 3.5 Medium	2.5B	Oct 29, 2024	up to ~1440px	~9.9 GB (excl. text encoders)
SD 3.5 Large	8.1B	Oct 22, 2024	1 MP (1024x1024)	~18 GB FP16; ~11 GB FP8

A few honest notes on these numbers:

SD 1.5 is ancient by AI standards (2022) but refuses to die. Its 512x512 base and small weights (about 1B parameters, roughly 2 GB at FP16) mean it runs on almost anything, and its years-deep library of fine-tunes and LoRAs keeps it relevant for stylized work on weak GPUs.
SDXL 1.0 more than tripled the parameter count to ~3.5B and moved to native 1024x1024. This is the model most local creators still build around in 2026.
SD 3.5 Large is the highest-quality open Stability model at 8.1B parameters, but at FP16 it needs around 18 GB of VRAM. NVIDIA and Stability shipped an FP8 build that cuts that requirement by roughly 40% to about 11 GB, which is what makes it viable on a 12 GB card.
SD 3.5 Medium (2.5B) is the consumer-targeted variant — Stability quotes ~9.9 GB of VRAM excluding text encoders, so plan for a bit more in practice.
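The parameter counts above translate into memory with simple arithmetic: weights take roughly parameters × bytes per parameter (2 bytes at FP16, 1 byte at FP8). The Python sketch below is a back-of-envelope helper built only on that assumption. The practical figures in the table are higher because activations at full resolution, the VAE, and (depending on the UI) the text encoders also need memory. Treat these numbers as a floor, not a requirement.

```python
# Back-of-envelope weight memory: parameters x bytes per parameter.
# Covers only the weights; activations, VAE, and text encoders add more on top.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

# Parameter counts (billions) from the table above.
MODELS = {
    "SD 1.5": 0.98,
    "SDXL 1.0": 3.5,
    "SD 3.5 Medium": 2.5,
    "SD 3.5 Large": 8.1,
}


def weights_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, params in MODELS.items():
        fp16 = weights_gb(params, "fp16")
        fp8 = weights_gb(params, "fp8")
        print(f"{name:<14} FP16 ~{fp16:.1f} GB   FP8 ~{fp8:.1f} GB")
```

For SD 3.5 Large this gives about 16.2 GB of weights at FP16 and 8.1 GB at FP8. FP8 halves only the weights, not the rest of the pipeline's memory, which is consistent with the quoted total dropping by roughly 40% (18 GB to 11 GB) rather than a full 50%.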

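SDXL and both SD 3.5 models are distributed from Stability AI's Hugging Face organization; SD 1.5 was originally published by RunwayML and is now hosted on Hugging Face through a community mirror. SD 3.5 ships under the Stability AI Community License (free for research and for commercial use under $1M annual revenue).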

Why does SDXL still win in 2026?

Newer does not mean dominant. Even with FLUX and SD 3.5 available, SDXL remains the default for most serious local workflows, and the reason is ecosystem, not base-model quality:

The LoRA library is enormous. CivitAI hosts thousands of SDXL style, character, and concept LoRAs — the deepest catalog of any open image model. If you want a specific art style, character, or look, the odds it already exists as an SDXL LoRA are high.
The ControlNet and inpainting stack is the most mature. SDXL has well-supported ControlNet models (pose, depth, edges, etc.), plus battle-tested inpainting, outpainting, and img2img workflows that "just work" across Forge, ComfyUI, and A1111.
It fits 8 GB. SDXL runs at native 1024x1024 on an 8 GB card, where SD 3.5 Large and FLUX push you toward 12 GB+ or aggressive quantization.

FLUX follows prompts more closely and produces better photorealism out of the box, and for some users it is the better default in 2026, but its LoRA/ControlNet tooling is still catching up to SDXL's. If you depend on specific community fine-tunes, SDXL's ecosystem is currently deeper. For the FLUX side of that trade-off, including its VRAM tiers, see our guide on running FLUX.1 locally.

Where do I get checkpoints and models?

Two sources cover virtually everything:

Hugging Face — the official, canonical source for base weights (SD 1.5, SDXL, SD 3.5). Use this for the original Stability models and reputable community releases. Files are usually .safetensors.
CivitAI — the community hub for fine-tuned checkpoints, LoRAs, ControlNet models, embeddings, and VAEs. This is where the thousands of SDXL community models live. Filter by base model (SD 1.5 / SDXL / SD 3.5) so you download a LoRA that matches your checkpoint — an SDXL LoRA will not load against an SD 1.5 checkpoint.

Always prefer .safetensors over .ckpt: the older .ckpt format is a Python pickle and can execute arbitrary code when loaded. Drop checkpoints into your UI's models folder (models/Stable-diffusion in A1111/Forge, models/checkpoints in ComfyUI) and LoRAs into the matching LoRA folder (models/Lora in A1111/Forge, models/loras in ComfyUI).
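Part of what makes .safetensors safer is its layout. Per the format's published spec, a file starts with an 8-byte little-endian header length, then a JSON header listing each tensor's dtype and shape, then the raw tensor bytes. That means you can inspect a file without running any code from it. The sketch below uses only the Python standard library to read that header, count parameters, and list dtypes. This can help you check whether a download is a ~1B-parameter SD 1.5-class file or a larger SDXL one. The function names are hypothetical. The `ss_base_model_version` metadata key is an assumption: some LoRA training tools (such as the kohya-ss scripts) write it, but it is often absent.

```python
import json
import struct
from math import prod
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Read just the JSON header of a .safetensors file (no tensor data, nothing executed)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # unsigned 64-bit little-endian
        return json.loads(f.read(header_len))


def summarize(header: dict) -> dict:
    """Count tensors/parameters and collect dtypes from a parsed header."""
    metadata = header.get("__metadata__", {})
    tensors = {name: info for name, info in header.items() if name != "__metadata__"}
    return {
        "tensors": len(tensors),
        # prod([]) == 1, so scalar tensors count as one parameter
        "params": sum(prod(info["shape"]) for info in tensors.values()),
        "dtypes": sorted({info["dtype"] for info in tensors.values()}),
        # Assumption: only some training tools write this key; treat it as a hint.
        "base_model_hint": metadata.get("ss_base_model_version"),
    }
```

Usage: `summarize(read_safetensors_header(Path("model.safetensors")))` returns a dict such as `{"tensors": ..., "params": ..., ...}`. Dividing `params` by 1e9 gives a size you can compare against the parameter table above.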

How do I actually install it? (Setup paths)

There is no single "Stable Diffusion installer" — you install a front end (UI) that loads the checkpoints. Pick one based on how much control you want:

Front end	Best for	Difficulty
Fooocus	Beginners — Midjourney-style, near-zero config	Easiest
Forge	A1111 users who want speed + native FLUX	Easy-Medium
AUTOMATIC1111	The classic full-featured WebUI, max extensions	Medium
ComfyUI	Power users — node graphs, full pipeline control	Advanced

Rather than duplicate hundreds of lines of click-by-click steps here, follow the dedicated walkthrough for whichever UI you choose — each is kept current with 2026 versions:

Fooocus guide — the easiest path; great if you just want good SDXL images with almost no setup.
SD Forge guide — a faster A1111 fork with native FLUX support; the pragmatic middle ground.
AUTOMATIC1111 guide — the original, extension-rich WebUI most tutorials assume.
ComfyUI guide — node-based, the most flexible front end and the reference UI for SD 3.5 and FLUX.

The common flow is the same regardless of UI: install Python + Git, clone or download the UI, drop a checkpoint into its models folder, and launch. NVIDIA is the smoothest path; AMD and Apple need extra setup, as covered in the platform comparison below.
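You can script the "drop a checkpoint into its models folder" step. The sketch below is a hypothetical helper (`install_model` and `MODEL_DIRS` are illustrative names, not part of any UI). It copies a .safetensors file into the folders this guide lists for A1111/Forge and ComfyUI, and it refuses .ckpt files because of the pickle risk described earlier. It assumes `ui_root` is the directory that contains the UI's `models` folder. Check your install, because some portable builds nest that folder one level deeper.

```python
import shutil
from pathlib import Path

# Folder layout per front end, relative to the UI's install root (as listed in this guide).
MODEL_DIRS = {
    "a1111": {"checkpoint": "models/Stable-diffusion", "lora": "models/Lora"},
    "forge": {"checkpoint": "models/Stable-diffusion", "lora": "models/Lora"},
    "comfyui": {"checkpoint": "models/checkpoints", "lora": "models/loras"},
}


def install_model(file: Path, ui_root: Path, ui: str, kind: str = "checkpoint") -> Path:
    """Copy a .safetensors checkpoint or LoRA into the right folder and return its new path."""
    if file.suffix.lower() != ".safetensors":
        # .ckpt files are Python pickles and can run arbitrary code when loaded.
        raise ValueError(f"refusing {file.name}: only .safetensors files are accepted")
    dest_dir = ui_root / MODEL_DIRS[ui][kind]
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / file.name
    shutil.copy2(file, dest)
    return dest
```

For example, `install_model(Path("~/Downloads/sd_xl_base_1.0.safetensors").expanduser(), Path("~/ComfyUI").expanduser(), "comfyui")` would copy the SDXL base checkpoint into `~/ComfyUI/models/checkpoints`. Both paths are illustrative; substitute your own.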

Best Stable Diffusion models per VRAM tier

If you are choosing a checkpoint to match your card, this is the practical mapping. "Comfortable" means 1024x1024 (or 512x512 for SD 1.5) without offload hacks.

Your GPU VRAM	Recommended checkpoint family	Why
4-6 GB	SD 1.5 (and SD 1.5 fine-tunes)	Only family that fits comfortably; huge LoRA library
8 GB	SDXL 1.0 (+ SDXL fine-tunes)	Native 1024x1024 fits; deepest ControlNet/LoRA stack
12 GB	SDXL + SD 3.5 Large (FP8)	SDXL with headroom; SD 3.5 Large becomes viable at FP8 (~11 GB)
16 GB+	Any: SDXL, SD 3.5 Large, FLUX	Room for big batches, high res, and quantized FLUX

For an 8 GB card the answer is almost always SDXL: it is the highest-quality family that fits at full resolution, and it has the richest tooling. For 12 GB, SDXL stays the workhorse, but SD 3.5 Large in FP8 is worth keeping around for its prompt fidelity. To size a specific model and precision against your exact card and target resolution, see our VRAM requirements guide, which breaks down the math.

NVIDIA vs AMD vs Apple Silicon

Stable Diffusion runs on all three, but the experience is not equal:

NVIDIA (recommended): CUDA is the native target. Everything — Forge, ComfyUI, A1111, every extension — is built and tested for it first. An RTX-class card with 8-12 GB is the path of least resistance and the fastest per dollar.
AMD: Works, with caveats. On Linux with ROCm properly configured, AMD cards reach near-CUDA performance. On Windows you go through DirectML or ZLUDA, usually via community AMD-GPU forks of A1111/Forge, and performance typically runs roughly 30-50% behind an equivalent NVIDIA card. Doable, but expect setup friction.
Apple Silicon (M-series): Runs via the Metal Performance Shaders (MPS) backend. SDXL works on a Mac with 16 GB or more of unified memory, and native apps like Draw Things use Apple's Metal engine for a smoother experience. Per community reports, an SDXL 1024x1024 image takes roughly 25-40 seconds on an M2 Pro. It is usable and convenient, but a Mac generates noticeably slower than a comparable CUDA GPU, so it is a good "I already own one" option rather than a machine to buy specifically for image generation.
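Forge, A1111, ComfyUI, and Fooocus are built on PyTorch, and in PyTorch terms the platform choice above comes down to a few availability checks. This is a minimal sketch assuming a standard PyTorch install; `pick_backend` is a hypothetical helper. ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` API, with `torch.version.hip` set, so the sketch tells ROCm apart from CUDA inside the CUDA branch. Windows DirectML setups are not detected here.

```python
def pick_backend() -> str:
    """Return 'cuda', 'rocm', 'mps', or 'cpu' for the current PyTorch install."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing to probe
    if torch.cuda.is_available():
        # ROCm builds reuse the torch.cuda API; torch.version.hip is None on CUDA builds.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on very old PyTorch versions
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon via Metal Performance Shaders
    return "cpu"


if __name__ == "__main__":
    print("backend:", pick_backend())
```

When you pass a device to PyTorch itself, both "cuda" and "rocm" map to `torch.device("cuda")`. The separate label only tells you which driver stack is underneath.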

First-hand notes from a local box

Take these as approximate, single-machine observations, not a controlled benchmark. On an RTX 3060 12 GB with Forge, SDXL at 1024x1024 / ~25 steps takes roughly 12-20 seconds per image once the model is loaded, and SD 1.5 at 512x512 is near-instant by comparison. SD 3.5 Large in its FP8 build fit on the same 12 GB card but ran appreciably slower and left far less headroom for big batches. The pattern matches the rule above: VRAM determines what loads at all. Once a model fully fits in VRAM, generation is fast; the moment it spills to system RAM, speed collapses. If your generations suddenly crawl, you have almost certainly run out of VRAM and the UI is offloading.
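You can check that "does it fit?" condition before loading a model. The sketch below assumes a CUDA (or ROCm) build of PyTorch and uses `torch.cuda.mem_get_info()`, which returns free and total device memory in bytes. `fits_in_vram` and its 1.5 GB `headroom_gb` default are hypothetical choices for illustration, not values measured on the box above. UIs can offload text encoders and the VAE, so a False result here means "expect offloading", not necessarily "will not run".

```python
from typing import Optional


def fits_in_vram(weights_gb: float, free_gb: float, headroom_gb: float = 1.5) -> bool:
    """True if the weights plus a working-memory allowance fit in free VRAM.

    headroom_gb is an illustrative guess for activations, not a measured value.
    """
    return weights_gb + headroom_gb <= free_gb


def free_vram_gb() -> Optional[float]:
    """Free memory on the current CUDA/ROCm device in GB, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1e9


if __name__ == "__main__":
    free = free_vram_gb()
    if free is not None:
        # 8.1B params at FP8 is ~8.1 GB of weights; at FP16 it is ~16.2 GB.
        print("SD 3.5 Large FP8 fits:", fits_in_vram(8.1, free))
        print("SD 3.5 Large FP16 fits:", fits_in_vram(16.2, free))
```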

Key Takeaways

8 GB of NVIDIA VRAM is the honest entry point for modern Stable Diffusion (SDXL). SD 1.5 runs on 4 GB; 12 GB is the comfortable sweet spot and unlocks SD 3.5 Large in FP8.
SDXL (3.5B, 2023) still wins for most local workflows in 2026 — not because the base is best, but because its LoRA, ControlNet, and inpainting ecosystem is the deepest of any open model.
SD 3.5 Large (8.1B) needs ~18 GB at FP16, ~11 GB with the FP8 build; SD 3.5 Medium (2.5B) targets consumer GPUs at ~9.9 GB excluding text encoders.
Get base weights from Hugging Face, community fine-tunes/LoRAs from CivitAI, and always prefer .safetensors — matching the LoRA's base model to your checkpoint.
You install a front end, not "Stable Diffusion" itself — Fooocus (easiest), Forge (fast), A1111 (classic), or ComfyUI (most flexible). NVIDIA is smoothest; AMD and Apple work with extra setup and slower speeds.

Next Steps

Brand new and want images fast? Start with the Fooocus guide — the lowest-effort path to good SDXL output.
Want speed plus FLUX support in a familiar A1111 layout? Follow the SD Forge guide.
Prefer the classic, extension-heavy WebUI? Use the AUTOMATIC1111 guide.
Ready for full pipeline control or SD 3.5 / FLUX? See the ComfyUI guide.
Considering FLUX as your default instead of SDXL? Read Run FLUX.1 locally for its VRAM tiers and trade-offs.
Not sure your GPU is enough? Check the VRAM requirements guide or the can-I-run checker before downloading.

Stable Diffusion Local Install (2026): VRAM, Setup, Models

Local AI Master Research Team

Written by the Local AI Master Team
