Yes — as of its January 20, 2026 release, Ollama can generate images locally. It is an experimental, macOS-only feature (Windows and Linux are "coming soon") and ships with two models: x/z-image-turbo (Alibaba Tongyi Lab's 6B model, photorealistic with bilingual English/Chinese text) and x/flux2-klein from Black Forest Labs (a fast 4B/9B family). You run them straight from the terminal, e.g. ollama run x/z-image-turbo "a cat holding a sign that says hello world". The best Ollama model for image generation right now is Z-Image-Turbo for quality and FLUX.2 Klein 4B for speed — but for serious work (LoRAs, ControlNet, inpainting) you still want ComfyUI or Forge.

This is genuinely new in 2026, and it corrects a common misconception covered below: Ollama did not generate images before this release. For years it was a text-and-vision (image-in, text-out) runner only. Text-to-image is a fresh, clearly labeled experimental capability, not something that quietly existed.

Did Ollama always generate images? (Clearing up the misconception)

No. This trips a lot of people up, so let's be precise. Before January 2026, Ollama ran language models and vision models — meaning it could read an image you handed it (with multimodal models like LLaVA or Llama 3.2 Vision) and describe it. It could not create images. There was no text-to-image in Ollama at all.

The January 20, 2026 release added experimental image generation as a distinct feature. So if you remember someone saying "Ollama does images" before 2026, they meant image understanding, not image generation. The two rely on completely different model types, and only generation is what this guide is about.

What models can Ollama generate images with in 2026?

There are two, both pulled from Ollama's x/ (experimental) namespace. Here is the verified lineup with real download sizes from the Ollama model pages.

Model tag	Source	Params	Download size	License	Best for
x/z-image-turbo	Alibaba Tongyi Lab	6B	fp8 (default) ~13 GB · bf16 ~33 GB	Apache 2.0	Photorealism, bilingual EN/CN text
x/flux2-klein:4b	Black Forest Labs	4B	~5.7 GB	Apache 2.0	Fast, commercial-friendly, readable text
x/flux2-klein:9b	Black Forest Labs	9B	~12 GB	FLUX Non-Commercial License v2.1	Higher fidelity (non-commercial only)

A few details that matter:

Z-Image-Turbo is a "turbo" few-step model. It was built by Alibaba's Tongyi Lab to produce a 1024×1024 image in roughly 8 sampling steps (8 NFEs), which is why it feels fast despite being the highest-quality option here. Its standout feature is accurate bilingual text rendering in English and Chinese — text inside the image actually reads correctly, which most open models botch.
FLUX.2 Klein comes in two sizes. The 4B is the default (and the one to start with): small, fast, and Apache 2.0 so you can use outputs commercially. The 9B is sharper but carries Black Forest Labs' FLUX Non-Commercial License v2.1, so do not ship its outputs in a paid product without a commercial agreement.
Both also publish explicitly quantized tags (e.g. flux2-klein:4b-fp8, z-image-turbo:fp8) if you want to pin a specific precision. Note that for Z-Image-Turbo the fp8 build is already the default, per the table above.

If you want the deeper architecture and ComfyUI workflow for Z-Image specifically, we have a dedicated walkthrough on running Z-Image-Turbo in ComfyUI.

How do I generate images with Ollama? (Exact commands)

First, make sure you are on macOS with a recent Ollama version — image generation does not run on Windows or Linux yet. Then pull and run a model. The model downloads on first run.

# Z-Image-Turbo (Alibaba Tongyi, 6B) — quality + bilingual text
ollama run x/z-image-turbo "a cat holding a sign that says hello world"

# FLUX.2 Klein 4B (Black Forest Labs) — fast, commercial-friendly
ollama run x/flux2-klein "a neon-lit Tokyo street at night, photorealistic"

# FLUX.2 Klein 9B — higher fidelity (non-commercial license)
ollama run x/flux2-klein:9b "a watercolor fox in a misty forest"

By default the generated image is saved to your current directory. If your terminal supports inline image rendering — Ghostty, iTerm2 and similar — the picture also previews right in the terminal window, no external viewer needed. That terminal-native preview is a deliberate part of the experience.
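Because the feature is macOS-only for now, a script that wraps the commands above may want a platform guard. The sketch below is one way to do it. The function name image_gen_platform_status is hypothetical (it is not part of Ollama); it only checks the kernel name reported by uname -s, which is Darwin on macOS.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ollama: report whether the kernel name
# looks like macOS, the only platform this article lists as supported.
image_gen_platform_status() {
    kernel="${1:-$(uname -s)}"   # accept an explicit value so it can be tested
    case "$kernel" in
        Darwin) echo "supported" ;;
        *)      echo "unsupported" ;;
    esac
}

# Usage sketch: only attempt generation on a supported platform.
# if [ "$(image_gen_platform_status)" = "supported" ]; then
#     ollama run x/flux2-klein "a neon-lit Tokyo street at night, photorealistic"
# else
#     echo "Ollama image generation is macOS-only for now" >&2
# fi
```

The guard says nothing about your Ollama version, so you still need a release recent enough to include image generation.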

Inside an interactive session you can tune generation with slash commands:

# Inside an interactive run:
/set width 1024      # output width in pixels
/set height 1024     # output height in pixels

You can also control the number of steps (fewer = faster, more = more detailed), set a fixed random seed for reproducible results, and supply a negative prompt to steer away from unwanted elements. The official details live on the Ollama image generation blog post and the x/z-image-turbo model page.

How much VRAM / unified memory does Ollama image generation need?

Because the feature is macOS-only today, the practical resource is Apple Silicon unified memory (the same pool the GPU and CPU share). Use the download size as your floor and add headroom, because the model has to stay in memory while it runs. These are approximate practical targets:

Model	Download (default)	Practical unified-memory floor	Comfortable on
x/flux2-klein:4b	~5.7 GB	~10-12 GB	16 GB Mac (M-series)
x/z-image-turbo (fp8)	~13 GB	~16 GB	24 GB+ Mac
x/flux2-klein:9b	~12 GB	~16 GB	24 GB+ Mac
x/z-image-turbo (bf16)	~33 GB	~36 GB+	48 GB / 64 GB Mac

Alibaba states Z-Image-Turbo was designed to fit 16 GB VRAM consumer devices, which lines up with the fp8 row above. The takeaway: a 16 GB Apple Silicon Mac comfortably runs FLUX.2 Klein 4B and can handle Z-Image-Turbo fp8 with little else open; for the bf16 full-precision Z-Image you really want 36 GB+ of unified memory. To sanity-check any model against your machine before downloading 13-33 GB, our VRAM calculator is the quickest gut-check.
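If you would rather check from the terminal, the table above can be turned into a small lookup. This is a rough sketch, not an official tool: the function names are hypothetical, the keys are labels for the table rows (the :fp8 and :bf16 suffixes are used here as labels and are not verified as pullable tags), and the numbers are this article's approximate floors, taking the upper end of the 10-12 GB range for the 4B.

```shell
#!/bin/sh
# Approximate unified-memory floors in GB, copied from the table above.
# Hypothetical helpers; the keys label table rows, not guaranteed tags.
memory_floor_gb() {
    case "$1" in
        x/flux2-klein:4b)     echo 12 ;;  # upper end of the ~10-12 GB range
        x/z-image-turbo:fp8)  echo 16 ;;
        x/flux2-klein:9b)     echo 16 ;;
        x/z-image-turbo:bf16) echo 36 ;;
        *)                    return 1 ;; # unknown row: no estimate
    esac
}

# Print yes/no/unknown for a row label and a whole number of GB.
fits_in_memory() {
    floor="$(memory_floor_gb "$1")" || { echo "unknown"; return 0; }
    if [ "$2" -ge "$floor" ]; then echo "yes"; else echo "no"; fi
}

# Usage sketch on macOS, where hw.memsize reports total memory in bytes:
# total_gb=$(( $(sysctl -n hw.memsize) / 1073741824 ))
# fits_in_memory x/z-image-turbo:fp8 "$total_gb"
```

A "yes" only means you meet the approximate floor. The "Comfortable on" column is the better guide if you keep other apps open.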

What we measured (informal, single machine)

On an M-series Mac with 24 GB unified memory, FLUX.2 Klein 4B returned a 1024×1024 image in a few seconds per generation once the model was warm, and Z-Image-Turbo fp8 took noticeably longer per image but produced cleaner text rendering. Treat these as ballpark, hardware-dependent observations from one machine, not a controlled benchmark. The pattern that holds: the 4B is your "fast iteration" model, Z-Image-Turbo is your "make it look right" model, and cold-start (first run, model still downloading/loading) is always the slow part.
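To get comparable ballpark numbers on your own machine, a simple wall-clock wrapper is enough. The sketch below is not how the observations above were taken; time_command is a hypothetical helper that times any command in whole seconds using date +%s, so it is too coarse for sub-second differences.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print elapsed wall-clock seconds.
# The command's own output is sent to stderr so stdout carries only the number.
time_command() {
    start="$(date +%s)"
    "$@" >&2 || true          # keep going even if the command fails
    end="$(date +%s)"
    echo "$(( end - start ))"
}

# Usage sketch: the first run includes download/load time (cold start);
# a second run shortly after is usually closer to a warm measurement.
# cold="$(time_command ollama run x/flux2-klein "a watercolor fox in a misty forest")"
# warm="$(time_command ollama run x/flux2-klein "a watercolor fox in a misty forest")"
# echo "cold: ${cold}s, warm: ${warm}s"
```

Run the same prompt at the same resolution each time, otherwise the numbers are not comparable.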

How is Ollama image generation different from ComfyUI?

This is the honest heart of the article. Ollama's image generation is fast to start and great for one-off prompts from the terminal, but it is intentionally minimal. ComfyUI (and Forge/A1111) are full image pipelines. Here is the real gap:

Capability	Ollama (2026, experimental)	ComfyUI / Forge
Platforms	macOS only (Win/Linux soon)	Windows, Linux, macOS
Interface	Terminal prompt	Full node graph / web UI
Model selection	2 curated models (Z-Image, FLUX.2 Klein)	Hundreds (SDXL, FLUX, SD3.5, custom)
LoRA support	No (not yet)	Yes
ControlNet	No (not yet)	Yes
Inpainting / outpainting	No	Yes
img2img	No	Yes
Batch + automation	Limited	Extensive (API, workflows)
Setup effort	Trivial (one command)	Moderate (install + nodes)

So when does each win?

Use Ollama image generation when you want a quick image from the command line, you are already running Ollama for text models, you are on a Mac, and you do not need fine control. It is the lowest-friction way to do text-to-image with Ollama.
Use ComfyUI or Forge the moment you need LoRAs, ControlNet, inpainting, img2img, specific checkpoints, or you are on Windows/Linux. For any serious or repeatable image work, ComfyUI is still the tool. Our complete ComfyUI guide covers that workflow end to end, and the broader local FLUX image generation guide goes deeper on the FLUX family outside Ollama.
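The rule of thumb above can be written down as a tiny decision helper. This is only an illustration of the comparison table: choose_image_tool and its keyword list are hypothetical, and the keywords simply mirror the capabilities and platforms that Ollama does not cover yet.

```shell
#!/bin/sh
# Hypothetical helper: given a list of requirements, suggest a tool.
# Any requirement that Ollama's experimental feature lacks tips the
# answer toward ComfyUI or Forge.
choose_image_tool() {
    for need in "$@"; do
        case "$need" in
            lora|controlnet|inpainting|outpainting|img2img|windows|linux)
                echo "comfyui-or-forge"
                return 0 ;;
        esac
    done
    echo "ollama"
}

# Usage sketch:
# choose_image_tool quick-prompt mac        # prints: ollama
# choose_image_tool quick-prompt controlnet # prints: comfyui-or-forge
```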

Which is the best Ollama model for image generation?

For most people: Z-Image-Turbo for quality, FLUX.2 Klein 4B for speed.

Pick x/z-image-turbo if you want the most photorealistic output and especially if your images contain text (signs, logos, UI mockups, English or Chinese). Its bilingual text rendering is the single most differentiated thing in this lineup.
Pick x/flux2-klein:4b if you want fast iteration, a small ~5.7 GB download, and commercial usage rights (Apache 2.0). It is the friendliest starting point on a 16 GB Mac.
Pick x/flux2-klein:9b only if you need extra fidelity and your use is non-commercial — its FLUX Non-Commercial License v2.1 rules out shipping outputs in a paid product without a separate agreement.
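The three picks above reduce to two questions: what you are optimizing for, and whether the output is for commercial use. The sketch below encodes that. pick_image_model is a hypothetical helper, and two branches are assumptions rather than advice from the model authors: falling back to Z-Image-Turbo when you want fidelity but need commercial rights, and defaulting to the 4B for any unrecognized priority.

```shell
#!/bin/sh
# Hypothetical helper: map a priority and a commercial-use flag to a model tag.
#   $1 = quality | speed | fidelity
#   $2 = yes | no   (is the output for commercial use?)
pick_image_model() {
    priority="$1"
    commercial="$2"
    case "$priority" in
        quality) echo "x/z-image-turbo" ;;    # photorealism, in-image text
        speed)   echo "x/flux2-klein:4b" ;;   # small download, Apache 2.0
        fidelity)
            if [ "$commercial" = "no" ]; then
                echo "x/flux2-klein:9b"       # non-commercial license
            else
                echo "x/z-image-turbo"        # assumption: Apache 2.0 fallback
            fi ;;
        *)       echo "x/flux2-klein:4b" ;;   # assumption: friendliest default
    esac
}

# Usage sketch:
# ollama run "$(pick_image_model speed yes)" "a neon-lit Tokyo street at night"
```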

Key Takeaways

Ollama image generation is real and new in 2026 — released January 20, 2026, experimental, macOS only (Windows/Linux coming soon). It did not exist before this; earlier "Ollama + images" meant vision (image-in, text-out), not generation.
Two models ship today: x/z-image-turbo (Alibaba Tongyi, 6B, fp8 ~13 GB, Apache 2.0, bilingual text) and x/flux2-klein (Black Forest Labs, 4B ~5.7 GB Apache 2.0 / 9B ~12 GB non-commercial).
The commands are dead simple: ollama run x/z-image-turbo "your prompt" saves to the current directory and previews inline in Ghostty/iTerm2.
Plan for memory: a 16 GB Apple Silicon Mac runs FLUX.2 Klein 4B comfortably and Z-Image-Turbo fp8 with little else open; full bf16 Z-Image wants 36 GB+ unified memory.
It is not a ComfyUI replacement. No LoRA, no ControlNet, no inpainting/img2img yet. For serious or repeatable work, ComfyUI/Forge are still required.

Next Steps

Want the full Ollama setup (text models, GPU, config) before adding images? Start with our complete Ollama guide.
Curious which text/vision models to run alongside image generation? See the best Ollama models roundup.
Need real control — LoRAs, ControlNet, inpainting? Read the ComfyUI complete guide and our Z-Image-Turbo in ComfyUI walkthrough.
Going deeper on FLUX outside Ollama? The local FLUX image generation guide covers the full family.

Ollama Image Generation: Run Z-Image & FLUX.2 Locally (2026)

Written by the Local AI Master Team
