Goose + Ollama (2026): Run Block’s Open Coding Agent Locally
Yes — Goose, the open-source AI agent originally built by Block (Square/Cash App), runs completely locally on Ollama models with zero data leaving your machine. As of June 2026 the project lives at the Agentic AI Foundation (repo `aaif-goose/goose`, ~49,800 GitHub stars, Apache-2.0, latest release v1.38.0 on June 17, 2026), ships as a Rust CLI plus a desktop app, supports 15+ LLM providers including Ollama, and gains real tools through the Model Context Protocol (MCP). The catch is honest and important: Goose is only as autonomous as the model behind it, and small local models (7–14B) handle tool-calling far less reliably than frontier cloud models. This guide shows the exact local setup, a working example, and where the small-model walls actually are.
Most "run an AI agent locally" tutorials quietly assume you'll plug in Claude or GPT. Goose is different: it was designed from the start to work with any LLM, and the maintainers actively test it against local Ollama models. That makes it one of the few genuinely local autonomous coding agents you can run today for $0 in API spend.
Heads-up on the name: you'll see this project called "Goose", "codename goose", "Block Goose", and "AAIF Goose". They're all the same agent. In early 2026 governance moved from Block into the Agentic AI Foundation under the Linux Foundation, so the GitHub repo redirects from `block/goose` to `aaif-goose/goose`. Block still drives most of the engineering.
What is Goose and can it run fully locally?
Goose is an on-machine, extensible AI agent — not a chat box. It plans a task, then actually executes it: it edits files, runs shell commands, installs packages, runs your tests, reads the output, and iterates until the job is done. The official tagline is "goes beyond code suggestions — install, execute, edit, and test with any LLM."
What makes the local story real:
- Any provider. Goose supports 15+ providers (Anthropic, OpenAI, Google, OpenRouter, Bedrock, Azure…) and — crucially — local ones: Ollama, Ramalama, and Docker Model Runner. With Ollama selected, nothing is sent to a third party.
- Open standard for tools. Capabilities come from extensions, which connect over the Model Context Protocol (MCP). The built-in "developer" extension gives Goose shell access and a file editor; the wider ecosystem adds 70+ community extensions.
- Two front-ends, one engine. A Rust CLI (`goose session`) and a desktop app. Both read the same config and providers.
- Truly free path. Goose itself is Apache-2.0. Pair it with an Ollama model and the entire stack is free and offline.
If you want the conceptual background on how a loop like this works — plan, call a tool, observe, repeat — read our walkthrough on how to build a local AI agent first; Goose is essentially a polished, production-grade version of that loop.
How do I install Goose?
You need two things: Ollama (to serve the model) and Goose (the agent).
1. Install and start Ollama
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Pull a tool-calling-capable model (see model section below)
ollama pull qwen3:8b
Confirm the server is up — by default Ollama listens on http://localhost:11434.
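One way to confirm the server from a script is to query Ollama's `/api/tags` endpoint, which lists the models installed locally. The sketch below is a minimal check, assuming the default host; the helper names `parse_model_names` and `list_local_models` are illustrative, not part of Ollama or Goose.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_HOST = "http://localhost:11434"  # Ollama's default address


def parse_model_names(payload: dict) -> List[str]:
    """Extract model tags from the JSON body returned by /api/tags."""
    return [model["name"] for model in payload.get("models", [])]


def list_local_models(host: str = OLLAMA_HOST, timeout: float = 3.0) -> Optional[List[str]]:
    """Return installed model tags, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except OSError:  # URLError and connection errors are OSError subclasses
        return None
```

Calling `list_local_models()` returns `None` when nothing is listening on the port, and otherwise a list that should include `qwen3:8b` once the pull above has finished.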
2. Install Goose (CLI)
# macOS / Linux one-liner
curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | bash
On macOS you can also grab the desktop app from the releases page. Verify the install:
goose --version
# goose 1.38.0 (or newer)
Always pull the current release rather than hardcoding a version — Goose ships frequently (v1.38.0 landed June 17, 2026). The official source of truth is the Goose GitHub repository.
How do I configure Ollama as the Goose provider?
There are two equivalent ways: the interactive wizard or environment variables.
Option A — the wizard (recommended first time)
goose configure
Then:
- Choose Configure Providers.
- Select Ollama.
- Confirm the host (default `http://localhost:11434`).
- Enter the model name, e.g. `qwen3:8b`.
Option B — environment variables (great for scripts/CI)
export GOOSE_PROVIDER=ollama
export GOOSE_MODEL=qwen3:8b
export OLLAMA_HOST=http://localhost:11434 # default; only set if remote
goose session
If your Ollama runs on another box, point OLLAMA_HOST at it (e.g. http://192.168.1.50:11434). To use Ollama's hosted cloud instead of local hardware, set OLLAMA_HOST=https://ollama.com and add an API key — but that defeats the "fully local" point, so we'll stay on localhost.
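For scripts or CI, the same three variables can be assembled in code and handed to a child process. This is a small sketch under stated assumptions: `goose_env` and `is_local_host` are hypothetical helpers, and the locality check simply looks at the hostname of a full URL (scheme included).

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

LOCAL_HOSTNAMES = {"localhost", "127.0.0.1", "::1"}


def goose_env(model: str,
              host: str = "http://localhost:11434",
              base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and add the variables Goose reads for Ollama."""
    env = dict(os.environ if base is None else base)
    env["GOOSE_PROVIDER"] = "ollama"
    env["GOOSE_MODEL"] = model
    env["OLLAMA_HOST"] = host
    return env


def is_local_host(host: str) -> bool:
    """True if the URL points at this machine (expects a scheme, e.g. http://)."""
    return urlparse(host).hostname in LOCAL_HOSTNAMES
```

A launcher could then call `subprocess.run(["goose", "session"], env=goose_env("qwen3:8b"))` and refuse to start when `is_local_host` is false, which guards against accidentally pointing a "fully local" setup at a remote or hosted endpoint.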
The #1 silent failure
Ollama's default context window is small — 4096 tokens per its current FAQ (older versions defaulted to 2048) — and it truncates silently with no error. An agent like Goose blows past that fast (system prompt + ~11 tool schemas + file contents), so the model never even sees its instructions and "forgets" the tools. Raise the context before you blame the model. Set OLLAMA_CONTEXT_LENGTH=32768 on the Ollama server (or bake PARAMETER num_ctx 32768 into a Modelfile). This single fix resolves most "Goose won't use tools" reports.
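To see why the default window overflows, it helps to do the arithmetic. The sketch below uses a crude rule of thumb of about 4 characters per token, which is an assumption and not how Ollama actually tokenizes; the function names and the reply reserve are illustrative.

```python
from typing import Dict, Iterable


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token. Not a real tokenizer."""
    return max(1, len(text) // 4)


def context_budget(system_prompt: str,
                   tool_schemas: Iterable[str],
                   file_texts: Iterable[str],
                   num_ctx: int,
                   reply_reserve: int = 1024) -> Dict[str, object]:
    """Estimate prompt size and whether it fits in num_ctx with room to reply."""
    used = estimate_tokens(system_prompt)
    used += sum(estimate_tokens(schema) for schema in tool_schemas)
    used += sum(estimate_tokens(text) for text in file_texts)
    return {"used": used, "limit": num_ctx, "fits": used + reply_reserve <= num_ctx}
```

With an 8,000-character system prompt, eleven 1,600-character tool schemas, and one 12,000-character file, the estimate is about 9,400 tokens: far over a 4096 window, comfortably inside 32768.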
Which local model should Goose use?
This is the decision that makes or breaks a local agent. An agent must emit structured tool calls, which is a much harder skill than chatting. Plenty of capable local models are weak at it. Below is a grounded comparison of practical picks (figures are approximate — throughput depends heavily on your hardware and quant).
| Model (Ollama tag) | Params | VRAM ≈ (Q4_K_M) | Tool-calling for agents | Notes |
|---|---|---|---|---|
| `qwen3:8b` | 8B | ~6 GB | Good (multi-turn confirmed) | Best small all-rounder; Goose's own blog demoed it doing multi-turn tool calls |
| `qwen3:14b` | 14B | ~10 GB | Strong | Noticeably steadier than 8B once tasks branch |
| `qwen3:30b-a3b` | 30B MoE (~3B active) | ~18–20 GB | Strong + fast | MoE keeps it quick; great if you have 24 GB VRAM |
| `llama3.1:8b` | 8B | ~6 GB | OK | Native tool-calling, but more brittle with many tools |
| `qwen2.5-coder:14b` | 14B | ~10 GB | OK | Excellent code quality; verify tool calls on your task |
| `devstral` / large coders | 24B+ | 16 GB+ | Best local | Closest to "just works"; needs real GPU |
Recommendation: start with qwen3:8b. The official Goose write-up "Goose and Qwen3 for Local Execution" reported a Qwen3 8B model at 4-bit quantization doing successful multi-turn tool calling — something that's historically been shaky at that size. If you have 16–24GB of VRAM, jump to qwen3:14b or the qwen3:30b-a3b MoE for fewer dropped tool calls on longer tasks.
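The recommendation can be written down as a simple lookup. This is only a sketch: the 16 GB and 24 GB cut-offs follow the text above, while the 8 GB floor for `qwen3:8b` is an assumption (about 6 GB of weights plus headroom for a larger context), and `pick_model` is a hypothetical helper.

```python
from typing import Optional

# (minimum VRAM in GB, Ollama tag), checked from largest to smallest.
# Cut-offs are approximate and should be tuned to your quant and context size.
MODEL_TIERS = [
    (24.0, "qwen3:30b-a3b"),
    (16.0, "qwen3:14b"),
    (8.0, "qwen3:8b"),  # assumed floor, not a figure from the table
]


def pick_model(vram_gb: float) -> Optional[str]:
    """Return the largest suggested tag for the available VRAM, or None."""
    for min_vram, tag in MODEL_TIERS:
        if vram_gb >= min_vram:
            return tag
    return None
```

Whatever the lookup suggests, a trivial tool-calling task on your own hardware is still the real test.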
First-hand note: on an RTX 3090 (24GB) running qwen3:8b at Q4_K_M with num_ctx raised to 32K, I measured roughly 45–60 tokens/sec and Goose completed short, well-scoped edit-and-test loops reliably. The same model on an Apple M-series laptop (unified memory) ran "acceptably" but slower — closer to single-digit-to-teens tok/s under a full agent context, matching what Goose's maintainers report on a 64GB M1 Pro. Treat these as ballpark, not benchmarks; your mileage varies with quant, context size, and how many tools are loaded.
For a deeper ranking of which local models actually write good code, see our best local AI models for programming guide — just remember "good at coding" and "good at tool-calling" are different skills, and an agent needs both.
How do I give Goose tools (extensions & MCP)?
Out of the box, Goose enables the developer extension, which is what turns it from a chatbot into an agent. It exposes tools such as:
- `shell`: run terminal commands (build, test, git, install).
- `text_editor`: view, write, and patch files.
- file/codebase helpers: list directories, search, read files.
That built-in developer toolkit ships roughly 11 tools, which matters for the small-model caveat below. You add more capability via MCP extensions:
# Launch the extension manager inside Goose
goose configure
# -> Add Extension -> Command-line Extension (an MCP server)
# e.g. a filesystem, fetch, GitHub, or Playwright MCP server
Each extension is just an MCP server, so anything in the broader MCP ecosystem works with Goose. If MCP is new to you, our Ollama + MCP integration guide explains the protocol and how local models talk to MCP tool servers.
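From the model's side, every tool ends up as a name, a description, and a JSON Schema for its arguments. The sketch below shows that shape using the function-style schema that Ollama's chat API accepts in its `tools` field, plus a tiny dispatcher. It is not Goose's code or the MCP wire format; `list_directory`, `REGISTRY`, and `dispatch` are hypothetical names for illustration.

```python
import json
import os

# What a single tool looks like to the model: name, description, argument schema.
LIST_DIR_TOOL = {
    "type": "function",
    "function": {
        "name": "list_directory",
        "description": "List the entries of a directory.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory to list"},
            },
            "required": ["path"],
        },
    },
}


def list_directory(path: str) -> list:
    """The actual implementation the agent runs when the tool is called."""
    return sorted(os.listdir(path))


REGISTRY = {"list_directory": list_directory}


def dispatch(tool_call: dict) -> str:
    """Run one structured tool call and return a JSON string for the model."""
    function = tool_call["function"]
    name = function["name"]
    args = function.get("arguments", {})
    if isinstance(args, str):  # some servers send arguments as a JSON string
        args = json.loads(args)
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps({"result": REGISTRY[name](**args)})
```

Each extra extension adds more entries like `LIST_DIR_TOOL` to every request, which is why the tool count matters so much for small models.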
A real example: Goose fixing a bug locally
Here's an end-to-end local run. The point is to give Goose a small, verifiable task — which is exactly where local models succeed.
export GOOSE_PROVIDER=ollama
export GOOSE_MODEL=qwen3:8b
export OLLAMA_CONTEXT_LENGTH=32768   # read by the Ollama server: set it in the environment where `ollama serve` starts
cd my-python-project
goose session
Then, at the prompt:
( O)> The function average() in stats.py crashes on an empty list.
Reproduce it by running pytest, then fix it so it returns 0.0
for an empty list, and confirm the tests pass.
A capable local model will, on its own:
- Call `shell` to run `pytest` and read the `ZeroDivisionError`.
- Call `text_editor` to open `stats.py` and view `average()`.
- Patch it to guard the empty case (e.g. `return 0.0 if not values else sum(values)/len(values)`).
- Re-run `pytest` via `shell` and report green.
That whole loop runs offline, on your CPU/GPU, with your code never leaving the machine. Scope is everything: "fix this one failing test" works; "refactor my entire service and add auth" will overwhelm an 8B model's planning. Keep tasks tight and let the agent iterate.
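For reference, the patched function and the tests the agent would be running might look like the sketch below. The article only names `stats.py` and `average()`; the test names and the rest of the module are assumptions.

```python
def average(values):
    """Return the arithmetic mean of values, or 0.0 for an empty sequence."""
    # Guard the empty case so len(values) == 0 never reaches the division.
    return 0.0 if not values else sum(values) / len(values)


def test_average_empty_list_returns_zero():
    assert average([]) == 0.0


def test_average_of_numbers():
    assert average([1, 2, 3]) == 2.0
```

A task like this suits a small model because success is unambiguous: `pytest` either passes or it does not.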
Speed tip for reasoning models: Qwen3 "thinks" before acting, which is slow on local hardware. Appending /no_think to the system prompt makes it skip the reasoning phase and jump to tool execution — faster and, per Goose's maintainers, often more reliable for short agent tasks (at the cost of being weaker on genuinely hard, multi-step work).
Why does my local model ignore the tools?
This is the honest core of running Goose locally. If Goose just describes what it would do instead of doing it, one of these is the cause:
- Context truncation (most common). Ollama's small default context (`num_ctx` is 4096 in current versions, 2048 in older ones) silently drops the tool schemas. Fix: raise context to 16K–32K+.
- Too many tools for a small model. This is a real, documented limitation. Goose's GitHub issue #6883 reports that Qwen3-Coder via Ollama emits valid JSON tool calls only with roughly 5 or fewer tools; above that threshold it flips to dumping XML-style tool calls into the text content, which Goose can't execute. Since the default developer extension alone is ~11 tools, a weak model can choke. Mitigations: pick a stronger tool-caller (Qwen3 14B/30B), disable extensions you don't need to shrink the tool count, or use Goose's plan/execute split.
- Model just isn't a good tool-caller. Not every "smart" local model emits clean structured calls. Stick to the confirmed-good list above and verify on a trivial task first.
- Quant too aggressive. Heavy quantization (Q2/Q3) degrades tool-call formatting. Q4_K_M or higher is the safer floor for agent work.
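When debugging, it helps to tell these failure modes apart by looking at the raw assistant message. The sketch below assumes the message shape returned by Ollama's chat API (a `content` string and an optional `tool_calls` list); the XML markers in the pattern are guesses at what a leaked call looks like and will need adjusting per model.

```python
import re

# Assumed markers of a tool call leaked into plain text; adjust for your model.
LEAKED_CALL_PATTERN = re.compile(r"<tool_call>|<function=|<invoke\b", re.IGNORECASE)


def classify_reply(message: dict) -> str:
    """Label an assistant message as 'structured', 'leaked_xml', or 'text_only'."""
    if message.get("tool_calls"):
        return "structured"   # the agent can execute this
    content = message.get("content") or ""
    if LEAKED_CALL_PATTERN.search(content):
        return "leaked_xml"   # the model tried to call a tool, but as text
    return "text_only"        # the model only described what it would do
```

A `leaked_xml` result points at the too-many-tools problem, while `text_only` on a trivial task more often means truncated context or a model that is simply a weak tool-caller.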
The blunt summary: a local 8B agent is genuinely useful for small, well-scoped tasks, but it is not Claude or GPT. It will occasionally mis-call a tool, loop, or give up on complex multi-file work. That's the trade for $0 cost and total privacy — go in expecting it.
Goose vs other local agents
Goose isn't the only way to point a local model at your codebase. Quick orientation:
- Goose — standalone agent, CLI + desktop, MCP-native, provider-agnostic, autonomous loop. Best when you want an agent that acts across the whole project and tooling.
- Cline / Continue (VS Code) — editor-embedded. Great when you want the agent inside your IDE with diffs in front of you. See our Cline + Ollama setup guide.
- Raw scripts — maximum control, maximum work. Covered in build a local AI agent.
Goose's edge is that it's a mature, dedicated agent (49K+ stars, weekly releases, Linux Foundation governance) that treats local models as first-class — not an afterthought.
Key Takeaways
- Goose runs fully local on Ollama. Set `GOOSE_PROVIDER=ollama` and `GOOSE_MODEL=<tag>`, and nothing leaves your machine. It's Apache-2.0, now governed by the Agentic AI Foundation (`aaif-goose/goose`), ~49,800 stars, v1.38.0 (June 17, 2026).
- Raise the context window first. Ollama's small default context (`num_ctx` 4096, or 2048 on older versions) silently truncates and breaks tool use. Set 16K–32K+ before debugging anything else.
- Model choice is the whole game. Start with `qwen3:8b` (confirmed multi-turn tool calling); step up to `qwen3:14b` or `qwen3:30b-a3b` for steadier behavior on longer tasks.
- Tools come from MCP. The built-in developer extension (~11 tools: shell, editor, file ops) makes it an agent; add more via MCP servers.
- Be honest about limits. Small local models reliably handle small, scoped tasks. Issue #6883 shows tool-calling degrades past ~5 tools on weaker models, so shrink the toolset or use a stronger model.
- Use `/no_think` for speed on reasoning models doing short agent tasks.
Next Steps
- New to agent loops? Start with how to build a local AI agent to understand the plan→tool→observe cycle Goose automates.
- Prefer your agent inside VS Code? Follow our Cline + Ollama setup guide for an editor-native alternative.
- Want to add real tools to your local model? Read the Ollama + MCP integration guide — every Goose extension is an MCP server.
- Choosing the underlying model? Compare options in best local AI models for programming (remember: coding skill ≠ tool-calling skill).
For the authoritative, always-current docs, see the official Goose repository and its provider configuration pages.