Local AI Image Upscaling (2026): ESRGAN, GFPGAN & 4x
The best local AI image upscaler in 2026 is Real-ESRGAN (the open-source xinntao project) for general photos and a 4x ESRGAN model like 4x-UltraSharp for AI art — both run free and fully offline, need only ~2-4 GB of VRAM, and do 4x or 2x upscaling that rivals paid cloud tools. Pair them with GFPGAN v1.4 or CodeFormer to fix faces, and SwinIR when you need denoising and JPEG-artifact removal alongside the upscale. The honest trade-off versus Topaz Gigapixel: local tools are free and private but ask you to pick the right model per image; Topaz is one-click but, as of its October 2025 switch, subscription-only (Gigapixel is about $29/month or $149/year).
If you already generate images with Stable Diffusion or Flux, upscaling is the missing finishing step — it turns a 512x512 or 1024x1024 generation into a clean 2K-4K print without re-rolling the prompt. This guide covers the models that matter, the ComfyUI and Forge nodes that drive them, how little VRAM you actually need, and when a cloud upscaler is still worth paying for.
What is the best local AI image upscaler in 2026?
There is no single winner — the right model depends on the image. Here is the short version, then the detail below:
- Photos and mixed content: Real-ESRGAN (RealESRGAN_x4plus), the most widely used, most robust general 4x upscaler.
- AI art / illustrations / sharp detail: a 4x ESRGAN-architecture model such as 4x-UltraSharp.
- Anime and line art: RealESRGAN_x4plus_anime_6B (a smaller, anime-tuned 6-block model).
- Restoration (denoise + de-JPEG + upscale): SwinIR, a Swin-Transformer restoration model.
- Faces: GFPGAN v1.4 or CodeFormer as a second pass, never as the primary upscaler.
The reason you keep several around is that upscalers are specialists. A model trained to sharpen photographic texture will invent crunchy "hair" detail on a smooth illustration; an anime model will smear photographic skin. Keeping three or four model files on disk (each is roughly 60-350 MB) lets you match the tool to the image, which is the single biggest quality lever in local upscaling.
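The "match the tool to the image" rule can be written down as a small lookup. This is only a sketch: the content-type labels (`photo`, `ai_art`, `anime`, `restoration`) and the `plan_upscale` helper are hypothetical names invented for illustration, while the model names come from the list above.

```python
# Hypothetical content-type labels mapped to the models recommended above.
UPSCALER_BY_CONTENT = {
    "photo": "RealESRGAN_x4plus",
    "ai_art": "4x-UltraSharp",
    "anime": "RealESRGAN_x4plus_anime_6B",
    "restoration": "SwinIR",
}


def plan_upscale(content_type: str, has_faces: bool = False) -> list[str]:
    """Return the ordered passes: primary upscaler first, face restorer second."""
    if content_type not in UPSCALER_BY_CONTENT:
        raise ValueError(f"unknown content type: {content_type!r}")
    passes = [UPSCALER_BY_CONTENT[content_type]]
    if has_faces:
        # Face restoration is always a second pass, never the primary upscaler.
        passes.append("CodeFormer")
    return passes


print(plan_upscale("photo", has_faces=True))
print(plan_upscale("anime"))
```

In a real batch script the content type would come from a folder name or a manual tag; nothing here tries to classify the image automatically.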
Local upscaling models compared (VRAM, scale, best use)
The table below lists the models worth keeping, with their real architectures, typical scale factors, approximate file sizes, and what each is for. ESRGAN-family models (Real-ESRGAN, 4x-UltraSharp, Remacri, etc.) are interchangeable in the same "load upscale model" node: most of them share the RRDBNet architecture and differ mainly in training data and block count.
| Model | Architecture | Native scale | File size (approx) | Best for |
|---|---|---|---|---|
| Real-ESRGAN x4plus | ESRGAN (RRDBNet) | 4x (2x variant exists) | ~64 MB | General photos, mixed content |
| Real-ESRGAN x4plus_anime_6B | ESRGAN (6-block) | 4x | ~18 MB | Anime, line art, flat color |
| realesr-general-x4v3 | SRVGGNetCompact (compact, non-RRDB) | 4x | ~5 MB | Low-VRAM / fast general use |
| 4x-UltraSharp | ESRGAN | 4x | ~67 MB | AI art, crisp edges, illustration |
| SwinIR | Swin Transformer | 2x / 4x | ~60-140 MB (variant-dependent) | Restoration: denoise + de-JPEG + SR |
| GFPGAN v1.4 | StyleGAN2 face prior | face-only pass | ~349 MB | Repairing faces after upscale |
| CodeFormer | Codebook Transformer | face-only pass | ~360 MB | Faces with adjustable fidelity |
A few honest notes. Real-ESRGAN, 4x-UltraSharp and the other ESRGAN models all upscale at a fixed integer factor (usually 4x); if you want 2x, you upscale 4x and then downscale, or use the dedicated RealESRGAN_x2plus weights. GFPGAN and CodeFormer are not general upscalers at all — they only reconstruct faces, so you run them after a normal upscale (or with Real-ESRGAN's built-in --face_enhance flag, which calls GFPGAN under the hood).
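As a rough illustration of those options, the sketch below assembles a command line for the inference script in the Real-ESRGAN repository. It assumes you run it from a checkout of that repo with the weights already downloaded; the `realesrgan_command` helper and the `inputs` / `results` folder names are placeholders, and it is worth confirming the flags against `python inference_realesrgan.py --help` on your version.

```python
import shlex


def realesrgan_command(input_path, output_dir, model="RealESRGAN_x4plus",
                       outscale=4, tile=0, face_enhance=False):
    """Build an argument list for Real-ESRGAN's inference script (not executed here)."""
    cmd = [
        "python", "inference_realesrgan.py",
        "-n", model,                  # which set of weights to load
        "-i", input_path,             # image file or folder
        "-o", output_dir,
        "--outscale", str(outscale),  # final scale; 2 on a 4x model means 4x, then resize
    ]
    if tile > 0:
        cmd += ["--tile", str(tile)]  # 0 means no tiling
    if face_enhance:
        cmd.append("--face_enhance")  # runs GFPGAN on detected faces
    return cmd


cmd = realesrgan_command("inputs", "results", outscale=2, tile=512, face_enhance=True)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the job; building it as a list rather than a string avoids quoting problems with paths that contain spaces.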
How much VRAM does local upscaling need?
This is the good news: upscaling is far lighter than image generation. An ESRGAN-family model like Real-ESRGAN runs comfortably in roughly 2-4 GB of VRAM, and the tiny realesr-general-x4v3 runs in well under 2 GB. That means upscaling works on hardware that struggles to run Stable Diffusion XL, and it even runs acceptably on CPU (just slower).
The thing that blows up memory is not the model — it is the output resolution. Upscaling a 1024x1024 image by 4x produces a 4096x4096 result, and holding that whole tensor in VRAM is what causes out-of-memory errors. The fix is tiling: the image is split into small tiles (e.g. 512x512), each is upscaled, and the tiles are stitched back. Real-ESRGAN exposes this with a --tile option, and in ComfyUI the UltimateSDUpscale node automatically tiles and encodes in blocks when VRAM runs short (it warns that this is slower, which is the expected trade-off).
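To make the tiling idea concrete, here is a minimal sketch of how an image can be cut into overlapping boxes before each box is upscaled on its own. It is an illustration of the general technique, not the code Real-ESRGAN or UltimateSDUpscale actually uses; the `tile_boxes` helper and the 32-pixel overlap are assumptions chosen for the example.

```python
def _starts(length: int, tile: int, step: int) -> list[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    starts = [0]
    while starts[-1] + tile < length:
        starts.append(starts[-1] + step)
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 32):
    """Return (left, top, right, bottom) boxes covering the image, clipped at the edges."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    step = tile - overlap
    return [
        (left, top, min(left + tile, width), min(top + tile, height))
        for top in _starts(height, tile, step)
        for left in _starts(width, tile, step)
    ]


boxes = tile_boxes(1024, 1024)
print(len(boxes), boxes[0], boxes[-1])
```

Each box would be upscaled separately and pasted into the output at four times its coordinates; the overlap gives the stitching step some shared pixels to blend so seams are less visible.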
| Task | Practical VRAM | Notes |
|---|---|---|
| ESRGAN 4x, image fits in memory | ~2-4 GB | Real-ESRGAN, 4x-UltraSharp, etc. |
| ESRGAN 4x to very large output | ~2-4 GB with tiling | Tile size 512 keeps memory flat |
| SwinIR restoration | ~4-6 GB | Transformer, heavier than ESRGAN |
| Face restore (GFPGAN/CodeFormer) | ~2-3 GB | Runs on a cropped face region |
| No GPU at all | CPU works | Minutes per image instead of seconds |
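A quick back-of-the-envelope calculation shows why tiling keeps memory flat. The sketch below only counts the raw output tensor, assuming 3 channels stored as 32-bit floats; the network's intermediate feature maps add more on top, so treat the numbers as a lower bound rather than a prediction of real VRAM use. `tensor_mib` is a hypothetical helper.

```python
def tensor_mib(width: int, height: int, channels: int = 3, bytes_per_value: int = 4) -> float:
    """Size in MiB of one dense image tensor (float32 RGB by default)."""
    return width * height * channels * bytes_per_value / 2**20


# Whole 1024x1024 image upscaled 4x in one go.
full_output = tensor_mib(4096, 4096)

# One 512x512 tile upscaled 4x: the output size no longer depends on the image size.
tile_output = tensor_mib(512 * 4, 512 * 4)

print(f"full 4096x4096 output: {full_output:.0f} MiB")
print(f"one tiled 2048x2048 output: {tile_output:.0f} MiB")
```

The per-tile figure stays the same whether the source is 1024 or 8192 pixels wide, which is the property the table's "with tiling" row relies on.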
On my own machine (an RTX 3090, 24 GB) a single 4x Real-ESRGAN pass on a 1024x1024 image to 4096x4096 finishes in roughly 1-2 seconds, and batch-upscaling a folder of 100 images runs unattended in a couple of minutes. Treat those as approximate, single-machine numbers — the point is that upscaling is fast and cheap on local hardware, not that your exact times will match.
How does ComfyUI handle upscaling? (latent vs model upscale)
ComfyUI gives you two fundamentally different ways to make an image bigger, and mixing them up is the most common beginner mistake. For the full node-graph basics, start with our ComfyUI complete guide; the upscaling-specific distinction is this:
- Model upscale (pixel space) — the Load Upscale Model + Upscale Image (Using Model) nodes. You feed in a finished, decoded image and an ESRGAN model (Real-ESRGAN, 4x-UltraSharp). It intelligently reconstructs detail at 4x. This is the everyday upscaler and the one you want for an already-rendered image.
- Latent upscale — the Upscale Latent / Upscale Latent By nodes operate on the latent tensor before it is decoded to pixels, mid-generation. It is faster and keeps generation coherence, but it is not a detail-adding super-resolution model on its own — you typically follow it with a second sampler pass at low denoise so the model "repaints" the new resolution. Run a latent upscale on an already-decoded image and you have just enlarged pixels, not added detail.
For the best of both, the UltimateSDUpscale node (the ComfyUI port of Coyote-A's Ultimate SD Upscale) combines them: it upscales with an ESRGAN model, splits the result into tiles, and runs a low-denoise img2img pass on each tile to add genuine new detail. Because it tiles, it finishes large outputs on limited VRAM — the official guidance is that if VRAM is short it auto-tiles and encodes in blocks, just more slowly.
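For reference, a pixel-space model upscale is a four-node graph. The sketch below writes it in ComfyUI's API (JSON) workflow format as a Python dict. The `class_type` and input names are what a stock install is expected to use, but treat them as assumptions and compare against a workflow exported from your own ComfyUI in API format; the file names are placeholders.

```python
import json


def model_upscale_workflow(image_name: str, model_name: str = "4x-UltraSharp.pth") -> dict:
    """Build a minimal load -> model upscale -> save graph in API format."""
    return {
        "1": {"class_type": "LoadImage",
              "inputs": {"image": image_name}},
        "2": {"class_type": "UpscaleModelLoader",
              "inputs": {"model_name": model_name}},
        # Links are [source_node_id, output_index].
        "3": {"class_type": "ImageUpscaleWithModel",
              "inputs": {"upscale_model": ["2", 0], "image": ["1", 0]}},
        "4": {"class_type": "SaveImage",
              "inputs": {"images": ["3", 0], "filename_prefix": "upscaled"}},
    }


workflow = model_upscale_workflow("render_1024.png")
print(json.dumps(workflow, indent=2))
```

Note that no sampler, VAE, or latent node appears anywhere in this graph: the image stays in pixel space from start to finish, which is the distinction from a latent upscale.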
How do you upscale in Forge or Automatic1111?
If you prefer a classic WebUI, both Automatic1111 and the faster Forge fork expose two routes. For setup, see our Automatic1111 guide and the leaner SD Forge guide.
- Hires. fix (txt2img): built into the generation tab. It renders at your base size, upscales (you choose an upscaler like R-ESRGAN 4x+ or a 4x ESRGAN model), then runs a second diffusion pass at a denoise you set (~0.3-0.5) to sharpen. It is the simplest "make my generation bigger and better" toggle.
- Extras tab: a pure, non-diffusion upscale. Drop any image in, pick an upscaler (Real-ESRGAN, SwinIR), choose a scale, and it runs the model directly with optional GFPGAN/CodeFormer face restoration. Good for upscaling photos or images you did not generate.
- Ultimate SD Upscale (extension): Coyote-A's tile-and-inpaint script for big, detailed results on any GPU. It breaks the image into 512x512 tiles and runs img2img at a low denoise (roughly 0.3-0.5) per tile, producing fewer seams than the old built-in "SD upscale." One caveat to know going in: some users report it does not activate correctly under Forge and falls back to plain img2img, while it works reliably in A1111. Test it on your build before trusting it for a batch.
Whether you generate with Stable Diffusion or with Flux locally, the upscale step is identical: produce the base image, then run it through one of these passes.
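If you want to script the Extras-tab route instead of clicking through it, the WebUI can be started with its API enabled and sent a JSON request. The sketch below only builds the request body. The field names are assumptions based on the A1111 extras endpoint (`/sdapi/v1/extra-single-image`) and should be checked against the `/docs` page of your own install, since Forge builds may differ; `extras_payload` is a hypothetical helper.

```python
import base64


def extras_payload(image_bytes: bytes, upscaler: str = "R-ESRGAN 4x+",
                   scale: float = 4.0, codeformer_visibility: float = 0.0) -> dict:
    """Build a request body for a pure, non-diffusion upscale (assumed field names)."""
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "upscaler_1": upscaler,           # must match a name shown in the Extras dropdown
        "upscaling_resize": scale,        # multiplier applied to the input size
        "codeformer_visibility": codeformer_visibility,  # 0 disables face restoration
    }


payload = extras_payload(b"not a real image", scale=2.0)
print(sorted(payload))
```

In practice you would read the bytes from a file, POST the payload as JSON, and base64-decode the image in the response.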
How do you fix faces after upscaling?
General upscalers reconstruct texture, not identity, so small or low-quality faces often come out smudged. That is what GFPGAN and CodeFormer are for — they are face-specific restorers you run after the main upscale.
- GFPGAN v1.4 uses a StyleGAN2 facial prior to rebuild realistic faces; v1.4 produces slightly more detail and better identity than v1.3. It is the simplest one-shot option and is what Real-ESRGAN's --face_enhance flag invokes automatically.
- CodeFormer adds a controllable fidelity weight (w from 0 to 1): a lower w yields higher visual quality (the model invents more), a higher w yields higher fidelity to the original face (better identity, less invention). That knob makes CodeFormer the better choice when preserving someone's actual likeness matters, such as in old-photo restoration.
A practical rule: start with CodeFormer at w ≈ 0.5-0.7 for real people, and reach for GFPGAN when you just want a quick, good-looking face on AI-generated portraits. Both run on a cropped face region, so they need only ~2-3 GB of VRAM and add a second or two per image.
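The fidelity weight maps directly onto a command-line option in the CodeFormer repository. The sketch below builds that command and rejects weights outside 0 to 1. It assumes you run it from a checkout of the CodeFormer repo; `codeformer_command` and the `old_photos/` folder are placeholders, and the flags are worth confirming with the script's `--help`.

```python
def codeformer_command(input_path: str, fidelity: float = 0.7) -> list[str]:
    """Build an argument list for CodeFormer's inference script (not executed here)."""
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity weight w must be between 0 and 1")
    return [
        "python", "inference_codeformer.py",
        "-w", str(fidelity),         # lower = more invention, higher = closer to the input face
        "--input_path", input_path,  # image file or folder
    ]


print(" ".join(codeformer_command("old_photos/", fidelity=0.7)))
```

Running the same folder at two or three weights (for example 0.5 and 0.7) and comparing the results side by side is usually quicker than guessing the right value up front.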
Photo restoration: a real local use case
Beyond making AI art bigger, the same toolchain restores damaged real photos — old scans, blurry phone shots, JPEG-mangled images. A reliable local pipeline is: SwinIR (or Real-ESRGAN) to denoise and upscale → CodeFormer to repair faces. SwinIR is purpose-built here: the official paper reports state-of-the-art results on real-world super-resolution, denoising, and JPEG-artifact reduction, beating prior methods by up to 0.14-0.45 dB while using up to 67% fewer parameters. Because everything runs locally, you can restore family photos without uploading them to a stranger's server — a privacy win cloud tools cannot match.
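To put those dB figures in context: assuming they are PSNR gains, as is standard for super-resolution benchmarks, PSNR is defined as 10·log10(MAX² / MSE), so a fixed dB gain corresponds to a fixed ratio of mean squared error. The sketch below computes PSNR on flat lists of pixel values and converts a dB gain into that ratio; both helpers are illustrative and not taken from the SwinIR code.

```python
import math


def psnr(reference, restored, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(db_gain: float) -> float:
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (-db_gain / 10)


print(f"{psnr([0, 128, 255, 64], [2, 126, 253, 66]):.2f} dB")
print(f"0.45 dB gain -> MSE multiplied by {mse_ratio_for_gain(0.45):.3f}")
```

A 0.45 dB improvement therefore means roughly 10% lower mean squared error, which is a meaningful but not dramatic difference on its own.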
4x vs 2x: which scale should you pick?
Always upscale by the smallest factor that hits your target resolution. A 4x pass invents the most new detail, but on a clean source it can also over-sharpen and add texture that was not there. Guidance:
- 2x when the source is already fairly large or high-quality (e.g. a 1024px AI render going to 2K). Less invention, more faithful.
- 4x when the source is small or you need a big print (e.g. a 512px image to 2048px). More reconstruction, watch for artifacts on flat areas.
- Chained / iterative (2x then 2x, or model-upscale then a low-denoise diffusion tile pass) when you want maximum size with controlled detail — this is exactly what Ultimate SD Upscale automates.
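The "smallest factor that hits the target" rule, including the chained case, can be sketched as a small planner. `plan_passes` is a hypothetical helper; it assumes only 2x and 4x models are available and works on the long edge in pixels.

```python
def plan_passes(source_px: int, target_px: int, factors=(2, 4)) -> list[int]:
    """Return the upscale factors to apply in order to reach at least target_px."""
    if source_px <= 0 or target_px <= 0:
        raise ValueError("sizes must be positive")
    remaining = target_px / source_px
    plan = []
    while remaining > 1:
        # Smallest single factor that finishes the job, else the largest available.
        fits = [f for f in sorted(factors) if f >= remaining]
        factor = fits[0] if fits else max(factors)
        plan.append(factor)
        remaining /= factor
    return plan


print(plan_passes(1024, 2048))  # already large source
print(plan_passes(512, 2048))   # small source, big print
print(plan_passes(512, 4096))   # needs chaining
```

When the plan overshoots the target (for example 1024 px to 3000 px gives a single 4x pass), the last step is an ordinary downscale to the exact size.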
Local upscalers vs Topaz Gigapixel (cost and privacy)
The obvious commercial comparison is Topaz Gigapixel AI. It is genuinely excellent and one-click, but two facts changed the math in late 2025: Topaz retired perpetual licenses in October 2025 and moved to subscription-only pricing — a standalone Gigapixel subscription runs about $29/month or $149/year (a Pro tier is $499/year, and Gigapixel is also bundled in the broader Topaz Studio subscription). The old one-time ~$99 Gigapixel license is gone for new buyers.
| Factor | Local (Real-ESRGAN, SwinIR, etc.) | Topaz Gigapixel AI |
|---|---|---|
| Cost | Free, open-source | ~$29/mo or $149/year (subscription since Oct 2025) |
| Privacy | 100% offline, images never leave your PC | Local app, but paid + account-bound |
| Ease | Pick the right model per image | One-click, auto model selection |
| VRAM | ~2-4 GB, runs on modest GPUs/CPU | Optimized desktop app |
| Batch | Free, unlimited, scriptable | Included |
| Updates | Community-driven | Subscription only |
The honest verdict: if you upscale occasionally and value zero cost plus full privacy, the local stack wins outright and runs on hardware you already own. If you upscale professionally at volume and want a polished one-click result without choosing models, Topaz's subscription can be worth it. Many people do both — local for everyday AI-art finishing, Topaz for a handful of client-grade restorations.
Key Takeaways
- Real-ESRGAN is the best general local upscaler in 2026, with 4x-UltraSharp for AI art and the anime 6B model for line art. They are free, offline, and need only ~2-4 GB of VRAM.
- Upscaling is much lighter than generation. The model is small; the output resolution is what eats memory. Use tiling (Real-ESRGAN --tile or UltimateSDUpscale) to keep large outputs within VRAM.
- In ComfyUI, "model upscale" (pixel) and "latent upscale" are different tools. Use model upscale for finished images; latent upscale belongs mid-generation, followed by a sampler pass.
- GFPGAN v1.4 and CodeFormer fix faces after the main upscale. CodeFormer's fidelity weight (lower = more invention, higher = more identity) makes it the pick for restoring real people.
- Local beats Topaz on cost and privacy. Topaz dropped perpetual licenses in Oct 2025 and Gigapixel is now subscription-only (~$29/mo or $149/year); the local stack is free and never uploads your images.
Next Steps
- Generating images first? Set up the engine with our Stable Diffusion local install guide, then add an upscale pass.
- Prefer a node-based workflow? The ComfyUI complete guide walks through building the upscale graph from scratch.
- Want a classic WebUI? Compare the original interface in our Automatic1111 guide with the faster SD Forge guide.
- Using Flux instead of SD? See Flux local image generation — the upscale step plugs in the same way.
- Source models from the official repos: the Real-ESRGAN GitHub for general upscaling and the CodeFormer GitHub for face restoration.