Ollama Image Generation: Run Z-Image & FLUX.2 Locally (2026)
Yes — as of its January 20, 2026 release, Ollama can generate images locally. It is an experimental, macOS-only feature (Windows and Linux are "coming soon") and ships with two models: x/z-image-turbo (Alibaba Tongyi Lab's 6B model, photorealistic with bilingual English/Chinese text) and x/flux2-klein from Black Forest Labs (a fast 4B/9B family). You run them straight from the terminal, e.g. ollama run x/z-image-turbo "a cat holding a sign that says hello world". The best Ollama model for image generation right now is Z-Image-Turbo for quality and FLUX.2 Klein 4B for speed — but for serious work (LoRAs, ControlNet, inpainting) you still want ComfyUI or Forge.
This is genuinely new in 2026, and it settles a common misconception (covered below): Ollama did not generate images before this release. For years it was a text-and-vision (image-in, text-out) runner only. Text-to-image is a fresh, clearly labeled experimental capability, not something that quietly existed.
Did Ollama always generate images? (Clearing up the misconception)
No. This trips a lot of people up, so let's be precise. Before January 2026, Ollama ran language models and vision models — meaning it could read an image you handed it (with multimodal models like LLaVA or Llama 3.2 Vision) and describe it. It could not create images. There was no text-to-image in Ollama at all.
The January 20, 2026 release added experimental image generation as a distinct feature. So if you remember someone saying "Ollama does images" before 2026, they meant image understanding, not image generation. The two are completely different model types, and only image generation is what this guide covers.
What models can Ollama generate images with in 2026?
There are two, both pulled from Ollama's x/ (experimental) namespace. Here is the verified lineup with real download sizes from the Ollama model pages.
| Model tag | Source | Params | Default quant + size | License | Best for |
|---|---|---|---|---|---|
| x/z-image-turbo | Alibaba Tongyi Lab | 6B | fp8 (default) ~13 GB · bf16 ~33 GB | Apache 2.0 | Photorealism, bilingual EN/CN text |
| x/flux2-klein:4b | Black Forest Labs | 4B | ~5.7 GB | Apache 2.0 | Fast, commercial-friendly, readable text |
| x/flux2-klein:9b | Black Forest Labs | 9B | ~12 GB | FLUX Non-Commercial License v2.1 | Higher fidelity (non-commercial only) |
A few details that matter:
- Z-Image-Turbo is a "turbo" few-step model. It was built by Alibaba's Tongyi Lab to produce a 1024×1024 image in roughly 8 sampling steps (8 NFEs), which is why it feels fast despite being the highest-quality option here. Its standout feature is accurate bilingual text rendering in English and Chinese — text inside the image actually reads correctly, which most open models botch.
- FLUX.2 Klein comes in two sizes. The 4B is the default (and the one to start with): small, fast, and Apache 2.0 so you can use outputs commercially. The 9B is sharper but carries Black Forest Labs' FLUX Non-Commercial License v2.1, so do not ship its outputs in a paid product without a commercial agreement.
- Both also publish smaller quantized tags (e.g. flux2-klein:4b-fp8, z-image-turbo:fp8) if you want a smaller download.
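If you script around these models, the licensing split above can be encoded as a small guard. This is a minimal sketch that assumes the tags and licenses exactly as listed in the table, and that the untagged x/flux2-klein resolves to the 4B default. The function names license_for and commercial_ok are hypothetical helpers, not part of Ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Ollama): map a model tag to the
# license listed in the table above.
license_for() {
  case "$1" in
    x/z-image-turbo*)                echo "Apache 2.0" ;;
    x/flux2-klein:9b*)               echo "FLUX Non-Commercial License v2.1" ;;
    x/flux2-klein|x/flux2-klein:4b*) echo "Apache 2.0" ;;
    *)                               echo "unknown" ;;
  esac
}

# Succeeds (exit status 0) only for tags whose listed license is Apache 2.0.
commercial_ok() {
  [ "$(license_for "$1")" = "Apache 2.0" ]
}
```

A build script could then refuse the 9B model for paid work with a line such as: commercial_ok "$MODEL" || echo "check the license before shipping".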
If you want the deeper architecture and ComfyUI workflow for Z-Image specifically, we have a dedicated walkthrough on running Z-Image-Turbo in ComfyUI.
How do I generate images with Ollama? (Exact commands)
First, make sure you are on macOS with a recent Ollama version — image generation does not run on Windows or Linux yet. Then pull and run a model. The model downloads on first run.
```shell
# Z-Image-Turbo (Alibaba Tongyi, 6B): quality + bilingual text
ollama run x/z-image-turbo "a cat holding a sign that says hello world"

# FLUX.2 Klein 4B (Black Forest Labs): fast, commercial-friendly
ollama run x/flux2-klein "a neon-lit Tokyo street at night, photorealistic"

# FLUX.2 Klein 9B: higher fidelity (non-commercial license)
ollama run x/flux2-klein:9b "a watercolor fox in a misty forest"
```
By default the generated image is saved to your current directory. If your terminal supports inline image rendering — Ghostty, iTerm2 and similar — the picture also previews right in the terminal window, no external viewer needed. That terminal-native preview is a deliberate part of the experience.
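Because the image lands in the current directory, a small wrapper can keep renders out of your project root. The sketch below uses only the ollama run form shown above. The generate_into and is_macos function names and the OLLAMA_BIN override are illustrative assumptions, not Ollama features.

```shell
#!/usr/bin/env bash
# OLLAMA_BIN is an illustrative override (e.g. OLLAMA_BIN=echo for a dry run).
OLLAMA_BIN="${OLLAMA_BIN:-ollama}"

# Image generation is macOS-only for now, so check before calling ollama.
is_macos() {
  [ "$(uname -s)" = "Darwin" ]
}

# Hypothetical wrapper: run one prompt with the working directory set to
# an output folder, since the image is saved to the current directory.
# usage: generate_into <output-dir> <model> <prompt>
generate_into() {
  out_dir=$1; model=$2; prompt=$3
  mkdir -p "$out_dir" || return 1
  # Subshell so the caller's working directory is left unchanged.
  ( cd "$out_dir" && "$OLLAMA_BIN" run "$model" "$prompt" )
}
```

A typical call would be: is_macos && generate_into ./renders x/flux2-klein "a neon-lit Tokyo street at night, photorealistic".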
Inside an interactive session you can tune generation with slash commands:
```
# Inside an interactive run:
/set width 1024 # output width in pixels
/set height 1024 # output height in pixels
```
You can also control the number of steps (fewer = faster, more = more detailed), set a fixed random seed for reproducible results, and supply a negative prompt to steer away from unwanted elements. The official details live on the Ollama image generation blog post and the x/z-image-turbo model page.
How much VRAM / unified memory does Ollama image generation need?
Because the feature is macOS-only today, the practical resource is Apple Silicon unified memory (the single pool the CPU and GPU share). Use the download size as your floor and add headroom, because the model has to stay in memory while it runs. These are approximate practical targets:
| Model | Download (default) | Practical unified-memory floor | Comfortable on |
|---|---|---|---|
| x/flux2-klein:4b | ~5.7 GB | ~10-12 GB | 16 GB Mac (M-series) |
| x/z-image-turbo (fp8) | ~13 GB | ~16 GB | 24 GB+ Mac |
| x/flux2-klein:9b | ~12 GB | ~16 GB | 24 GB+ Mac |
| x/z-image-turbo (bf16) | ~33 GB | ~36 GB+ | 48 GB / 64 GB Mac |
Alibaba states Z-Image-Turbo was designed to fit 16 GB VRAM consumer devices at its native precision, which lines up with the fp8 row above. The takeaway: a 16 GB Apple Silicon Mac comfortably runs FLUX.2 Klein 4B and can handle Z-Image-Turbo fp8 with little else open; for the bf16 full-precision Z-Image you really want 36 GB+ of unified memory. To sanity-check any model against your machine before downloading 13-33 GB, our VRAM calculator is the quickest gut-check.
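To apply the table to your own machine, you can read total memory from macOS and map it to a row. This is a rough sketch: hw.memsize is the macOS sysctl key for physical memory in bytes, the thresholds are the "Comfortable on" column above rather than hard limits, and recommend_model is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read total physical memory on macOS and convert bytes to whole GiB.
# hw.memsize is a macOS sysctl key, so this only works on a Mac.
unified_memory_gb() {
  bytes=$(sysctl -n hw.memsize) || return 1
  echo $(( bytes / 1024 / 1024 / 1024 ))
}

# Hypothetical helper: map memory (GB) to the largest option the table
# lists as "comfortable". Thresholds are the table's rough targets.
recommend_model() {
  mem_gb=$1
  if   [ "$mem_gb" -ge 48 ]; then echo "x/z-image-turbo (bf16)"
  elif [ "$mem_gb" -ge 24 ]; then echo "x/z-image-turbo (fp8)"
  elif [ "$mem_gb" -ge 16 ]; then echo "x/flux2-klein:4b"
  else echo "below the table's practical floor"
  fi
}
```

On a Mac, recommend_model "$(unified_memory_gb)" prints the matching row. The helper is deliberately conservative: a 16 GB machine gets FLUX.2 Klein 4B even though Z-Image-Turbo fp8 can run there with little else open.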
What we measured (informal, single machine)
On an M-series Mac with 24 GB unified memory, FLUX.2 Klein 4B returned a 1024×1024 image in a few seconds per generation once the model was warm; Z-Image-Turbo fp8 took noticeably longer per image but rendered text more cleanly. Treat these as ballpark, hardware-dependent observations from one machine, not a controlled benchmark. The pattern that holds: the 4B is your "fast iteration" model, Z-Image-Turbo is your "make it look right" model, and cold start (first run, with the model still downloading or loading) is always the slow part.
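If you want to repeat this kind of informal comparison yourself, a wall-clock timer around the command is enough. The time_cmd helper below is a hypothetical sketch with whole-second resolution; it is not how the numbers above were collected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run any command, then print the wall-clock seconds
# it took. Whole-second resolution only, which is enough to tell a cold
# start from a warm run.
time_cmd() {
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $(( end - start ))s"
  # Pass the wrapped command's exit status back to the caller.
  return "$status"
}
```

Running time_cmd ollama run x/flux2-klein "a watercolor fox in a misty forest" twice in a row gives a rough cold versus warm comparison, since the first run includes downloading or loading the model.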
How is Ollama image generation different from ComfyUI?
This is the honest heart of the article. Ollama's image generation is fast to start and great for one-off prompts from the terminal, but it is intentionally minimal. ComfyUI (and Forge/A1111) are full image pipelines. Here is the real gap:
| Capability | Ollama (2026, experimental) | ComfyUI / Forge |
|---|---|---|
| Platforms | macOS only (Win/Linux soon) | Windows, Linux, macOS |
| Interface | Terminal prompt | Full node graph / web UI |
| Model selection | 2 curated models (Z-Image, FLUX.2 Klein) | Hundreds (SDXL, FLUX, SD3.5, custom) |
| LoRA support | No (not yet) | Yes |
| ControlNet | No (not yet) | Yes |
| Inpainting / outpainting | No | Yes |
| img2img | No | Yes |
| Batch + automation | Limited | Extensive (API, workflows) |
| Setup effort | Trivial (one command) | Moderate (install + nodes) |
So when does each win?
- Use Ollama image generation when you want a quick image from the command line, you are already running Ollama for text models, you are on a Mac, and you do not need fine control. It is the lowest-friction way to do text-to-image with Ollama.
- Use ComfyUI or Forge the moment you need LoRAs, ControlNet, inpainting, img2img, specific checkpoints, or you are on Windows/Linux. For any serious or repeatable image work, ComfyUI is still the tool. Our complete ComfyUI guide covers that workflow end to end, and the broader local FLUX image generation guide goes deeper on the FLUX family outside Ollama.
Which is the best Ollama model for image generation?
For most people: Z-Image-Turbo for quality, FLUX.2 Klein 4B for speed.
- Pick x/z-image-turbo if you want the most photorealistic output and especially if your images contain text (signs, logos, UI mockups, English or Chinese). Its bilingual text rendering is the single most differentiated thing in this lineup.
- Pick x/flux2-klein:4b if you want fast iteration, a small ~5.7 GB download, and commercial usage rights (Apache 2.0). It is the friendliest starting point on a 16 GB Mac.
- Pick x/flux2-klein:9b only if you need extra fidelity and your use is non-commercial — its FLUX Non-Commercial License v2.1 rules out shipping outputs in a paid product without a separate agreement.
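The three recommendations above reduce to a short decision function. This is a sketch under stated assumptions: choose_model is a hypothetical helper, and the fallback to x/z-image-turbo when you want fidelity for commercial work is our reading of the licenses, not an official rule.

```shell
#!/usr/bin/env bash
# Hypothetical chooser that encodes the three recommendations above.
# usage: choose_model <quality|speed|fidelity> <commercial: yes|no>
choose_model() {
  priority=$1; commercial=$2
  case "$priority" in
    quality)  echo "x/z-image-turbo" ;;
    speed)    echo "x/flux2-klein:4b" ;;
    fidelity)
      if [ "$commercial" = "no" ]; then
        echo "x/flux2-klein:9b"
      else
        # The 9B license is non-commercial, so fall back to an
        # Apache 2.0 model (an assumption of this sketch).
        echo "x/z-image-turbo"
      fi ;;
    *) echo "unknown priority: $priority" >&2; return 1 ;;
  esac
}
```

For example, choose_model speed yes prints x/flux2-klein:4b, which you can pass straight to ollama run.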
Key Takeaways
- Ollama image generation is real and new in 2026 — released January 20, 2026, experimental, macOS only (Windows/Linux coming soon). It did not exist before this; earlier "Ollama + images" meant vision (image-in, text-out), not generation.
- Two models ship today: x/z-image-turbo (Alibaba Tongyi, 6B, fp8 ~13 GB, Apache 2.0, bilingual text) and x/flux2-klein (Black Forest Labs, 4B ~5.7 GB Apache 2.0 / 9B ~12 GB non-commercial).
- The commands are dead simple: ollama run x/z-image-turbo "your prompt" saves to the current directory and previews inline in Ghostty/iTerm2.
- Plan for memory: a 16 GB Apple Silicon Mac runs FLUX.2 Klein 4B and Z-Image-Turbo fp8; full bf16 Z-Image wants 36 GB+ unified memory.
- It is not a ComfyUI replacement. No LoRA, no ControlNet, no inpainting/img2img yet. For serious or repeatable work, ComfyUI/Forge are still required.
Next Steps
- Want the full Ollama setup (text models, GPU, config) before adding images? Start with our complete Ollama guide.
- Curious which text/vision models to run alongside image generation? See the best Ollama models roundup.
- Need real control — LoRAs, ControlNet, inpainting? Read the ComfyUI complete guide and our Z-Image-Turbo in ComfyUI walkthrough.
- Going deeper on FLUX outside Ollama? The local FLUX image generation guide covers the full family.