To build a local AI agent with LangGraph and Ollama in 2026, install langgraph langchain langchain-ollama (LangGraph 1.x, langchain-ollama 0.3+), point ChatOllama at a tool-capable local model such as qwen2.5:7b or llama3.1:8b, then either use the prebuilt agent factory or wire your own StateGraph with a ToolNode and a tools_condition edge. Everything runs offline on your machine — no API keys, no per-token billing. The one hard requirement is a model that actually supports tool calling; a base chat model without tool support will just talk back in plain text instead of invoking your functions.

LangGraph is the low-level, graph-based agent framework from the LangChain team. Where a simple chain runs a fixed A-to-B sequence, LangGraph models your agent as a state machine: typed state, nodes that mutate it, and edges (including conditional ones) that decide what runs next. That structure is exactly what you need for agents that loop — call a tool, look at the result, decide whether to call another, and only then answer. Paired with Ollama for fully local inference, you get a private, reproducible agent stack with zero cloud dependency.

This guide is hands-on. We install the stack, build a minimal State graph, drop in a ReAct agent with two real tools, add conditional edges and loops by hand so you understand what the prebuilt agent hides, persist conversations with SqliteSaver, stream tokens as they are generated, and finish with a multi-agent supervisor. Every API shown reflects the current LangGraph 1.x surface (verified June 2026).

What you need before you start

Three pieces, all local:

Component	Minimum version (mid-2026)	Why
Ollama	0.5.2+	Runtime + native tool-calling API for local models
langchain-ollama	0.3.0+	Provides the `ChatOllama` chat model with `.bind_tools()`
langgraph	1.x	State machine: `StateGraph`, `ToolNode`, checkpointers
Python	3.10+	Required by langgraph and langchain-ollama

Install everything in one shot:

pip install -U langgraph langchain langchain-ollama
# checkpointing (separate package):
pip install -U langgraph-checkpoint-sqlite
# multi-agent supervisor (optional, last section):
pip install -U langgraph-supervisor

Then pull a tool-capable model. This is the single most common cause of "my agent ignores its tools" — not every model can emit tool calls. Per Ollama's own tool-support documentation, models like Llama 3.1, Mistral Nemo, Command-R+ and Firefunction v2 expose tool calling, and Qwen2.5 / Qwen3 are reliable choices too. For local agents I default to Qwen2.5:

ollama pull qwen2.5:7b      # strong, reliable tool selection
ollama pull llama3.1:8b     # solid, well-documented alternative

You can confirm a model advertises tools before you build anything:

ollama show qwen2.5:7b      # look for "tools" in the Capabilities line

If tools is missing from the capabilities, swap models. (Note: plain mistral-nemo has been flaky for tool calling in practice — if you want a Mistral, mistral-small is the safer pick.) For a fuller breakdown of which local models behave well as agents, see our roundup of the best Ollama models.
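If you script your setup or check models in CI, you can automate the same check. The sketch below shells out to ollama show and scans the Capabilities section. The helper names (has_tool_capability, model_supports_tools) are hypothetical. The parser assumes the current plain-text layout, where section headers are indented less than their entries. That layout is not a documented, stable format, so treat the parser as a convenience rather than a contract.

```python
import subprocess


def has_tool_capability(show_output: str) -> bool:
    """Return True if the Capabilities section of `ollama show` lists "tools".

    Assumes section headers are indented less than their entries, which is how
    current Ollama builds print it; the text layout is not a stable API.
    """
    header_indent = None
    for raw in show_output.splitlines():
        if not raw.strip():
            continue  # blank lines separate sections
        indent = len(raw) - len(raw.lstrip())
        text = raw.strip().lower()
        if header_indent is None:
            if text == "capabilities":
                header_indent = indent
        elif indent <= header_indent:
            break  # reached the next section header
        elif text == "tools":
            return True
    return False


def model_supports_tools(model: str) -> bool:
    """Run `ollama show <model>` and check its capabilities (needs the ollama CLI)."""
    out = subprocess.run(
        ["ollama", "show", model], capture_output=True, text=True, check=True
    ).stdout
    return has_tool_capability(out)
```

Call model_supports_tools("qwen2.5:7b") before building the graph and fail fast if it returns False. This is cheaper than debugging an agent that silently answers in plain text.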

How do you connect ChatOllama to LangGraph?

The bridge is the ChatOllama class from langchain-ollama. It speaks the standard LangChain chat-model interface, so LangGraph treats it like any other model. The one method that matters for agents is .bind_tools(), which tells the model what functions it is allowed to call.

from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="qwen2.5:7b",
    temperature=0,          # deterministic tool selection
    # base_url="http://localhost:11434",  # default; change for remote Ollama
)

# quick sanity check — pure chat, no graph yet
print(llm.invoke("Say hi in five words.").content)

If that prints a reply, your local model is wired up. Nothing leaves your machine: ChatOllama talks to the Ollama daemon on localhost:11434. For more on the underlying LangChain + Ollama wiring (embeddings, streaming, RAG), see our companion Ollama + LangChain integration guide.

Building a minimal State graph (nodes, edges, State)

Before agents, understand the three primitives LangGraph is built on:

State — a typed dictionary that flows through the graph. You declare its shape with a TypedDict, and you can attach a reducer (like add_messages) so updates append instead of overwrite.
Nodes — plain Python functions that take the state and return a partial update.
Edges — connections that decide which node runs next. Normal edges are fixed; conditional edges call a function to choose at runtime.

Here is the smallest useful graph: one node that calls the model, with the special START and END markers wiring it up.

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5:7b", temperature=0)

class State(TypedDict):
    # add_messages is a reducer: new messages are appended, not replaced
    messages: Annotated[list, add_messages]

def chatbot(state: State):
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
app = graph.compile()

result = app.invoke({"messages": [("user", "What is LangGraph in one sentence?")]})
print(result["messages"][-1].content)

That is the entire mental model: state in, node mutates it, edges route, state out. Everything else in this guide is the same pattern with more nodes and smarter edges.

How do you build a ReAct agent with tools in LangGraph?

The ReAct pattern (Reason + Act) is the workhorse of tool-using agents: the model reasons about the task, acts by calling a tool, reads the observation, and repeats until it can answer. LangGraph ships a prebuilt factory that assembles this loop for you.

First, define two real tools with the @tool decorator. The docstring is not optional — it is what the model reads to decide when to call the tool.

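import ast
import operator

from langchain_core.tools import tool

# Whitelisted operators. Walking the AST instead of calling eval() means the
# model can never execute arbitrary Python through this tool.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,   # allows negative numbers such as '-3 * 2'
}

def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    # anything else (names, calls, **, strings) is rejected with a clear error
    raise ValueError("unsupported expression")

@tool
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression, e.g. '23 * 7 + 1'."""
    return str(_eval(ast.parse(expression, mode="eval").body))

@tool
def word_count(text: str) -> str:
    """Count the number of words in a piece of text."""
    return str(len(text.split()))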

Now build the agent. In LangGraph 1.x the recommended factory is create_agent from the langchain package; the older create_react_agent from langgraph.prebuilt still works but is deprecated in favor of it. Both take a model and a list of tools and return a compiled, runnable graph.

from langchain.agents import create_agent   # LangGraph 1.x recommended factory
# (legacy equivalent: from langgraph.prebuilt import create_react_agent)
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5:7b", temperature=0)

agent = create_agent(llm, tools=[calculator, word_count])

result = agent.invoke(
    {"messages": [("user", "How many words are in 'the quick brown fox', and what is 23*7?")]}
)
print(result["messages"][-1].content)

Under the hood this is just a State graph with two nodes — the model and a tool runner — connected by a conditional edge that loops back whenever the model emits a tool call. Next we'll build that exact loop by hand so nothing is a black box.

Conditional edges and loops (the manual ReAct loop)

The prebuilt agent is convenient, but you give up control. Building the loop yourself with ToolNode and tools_condition shows precisely how the agent decides to call a tool versus finish — and lets you insert your own logic (logging, guardrails, retries) into the cycle.

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_ollama import ChatOllama

tools = [calculator, word_count]
llm = ChatOllama(model="qwen2.5:7b", temperature=0).bind_tools(tools)

class State(TypedDict):
    messages: Annotated[list, add_messages]

def assistant(state: State):
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(State)
graph.add_node("assistant", assistant)
graph.add_node("tools", ToolNode(tools))

graph.add_edge(START, "assistant")
# tools_condition routes to "tools" if the model asked for a tool,
# otherwise to END.
graph.add_conditional_edges("assistant", tools_condition)
# after running tools, loop BACK to the assistant to read the results
graph.add_edge("tools", "assistant")

app = graph.compile()

for chunk in app.stream(
    {"messages": [("user", "What is 144 / 12, then count the words in your answer?")]},
    stream_mode="values",
):
    chunk["messages"][-1].pretty_print()

The key line is add_conditional_edges("assistant", tools_condition). After the assistant node runs, tools_condition inspects the last message: if it contains tool calls it routes to the tools node; if not, it routes to END. The fixed edge from tools back to assistant closes the loop, so the model sees each tool result and can keep going. That cycle — assistant, tools, assistant, tools — is the entire ReAct algorithm expressed as a graph. To go deeper on the agent-from-scratch mindset, our build a local AI agent walkthrough covers the same loop from a different angle.

How do you add memory with SqliteSaver checkpointing?

So far every invoke starts from a blank slate. Checkpointing persists the graph's state after each step, so the agent remembers earlier turns — and you can resume or branch a conversation later. For local apps, SqliteSaver writes that state to a plain SQLite file on disk. It lives in the separate langgraph-checkpoint-sqlite package and is used as a context manager via from_conn_string.

from langgraph.checkpoint.sqlite import SqliteSaver

# `graph` is the StateGraph builder from the manual ReAct loop above
with SqliteSaver.from_conn_string("agent_memory.db") as checkpointer:
    app = graph.compile(checkpointer=checkpointer)

    # a thread_id namespaces one conversation
    config = {"configurable": {"thread_id": "user-42"}}

    app.invoke({"messages": [("user", "My name is Sam.")]}, config)
    out = app.invoke({"messages": [("user", "What is my name?")]}, config)
    print(out["messages"][-1].content)   # the reply should recall "Sam"

Because the state is keyed by thread_id, you can run many independent conversations against the same database file, and the history survives a process restart. For async apps there is a matching AsyncSqliteSaver in langgraph.checkpoint.sqlite.aio. When you outgrow a single file, swap in PostgresSaver from the langgraph-checkpoint-postgres package with no changes to your graph logic.

Streaming output from a local agent

Local models feel a lot snappier when you stream tokens instead of waiting for the whole answer. LangGraph exposes streaming at the graph level through .stream() with several modes. The three you'll reach for most:

stream_mode	What you get	Use it for
`"values"`	The full state after each node runs	Watching the agent step through nodes
`"updates"`	Only the delta each node returns	Logging which node changed what
`"messages"`	LLM tokens as they generate, per node	A ChatGPT-style typing effect

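For a typing effect, use stream_mode="messages", which yields (message_chunk, metadata) tuples token by token from any node that calls the model. The thread_id below points at the checkpointed app from the previous section, so run this inside that same with block; the SQLite connection closes when the block exits.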

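for token, metadata in app.stream(
    {"messages": [("user", "Explain checkpointing in two sentences.")]},
    config={"configurable": {"thread_id": "user-42"}},
    stream_mode="messages",
):
    # print only the assistant's tokens; this mode can also surface tool results
    if metadata["langgraph_node"] == "assistant" and token.content:
        print(token.content, end="", flush=True)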

That prints the answer as it is produced, which on a small local model on consumer hardware is the difference between "instant" and "did it freeze?".

A measured baseline on real hardware

Numbers help you set expectations, so here is what I see on my own machine rather than a vendor chart. On an RTX 3090 (24GB) running qwen2.5:7b at Q4_K_M through ChatOllama, a single ReAct turn that calls one tool completes in roughly 2-4 seconds end to end, with the model itself generating somewhere around 50-70 tokens/sec. Each extra tool-call loop adds another model round-trip, so a two-tool task lands closer to 5-8 seconds. These are approximate, single-machine figures — your throughput moves with the model, quant, GPU and prompt length — but they reflect a real, fully-offline run, not a controlled benchmark. The practical takeaway: keep the model small enough to stay entirely in VRAM (a 7B-8B tool model is the sweet spot for agents), because the moment layers spill to system RAM, each ReAct loop slows enough to make the agent feel sluggish. To size a specific model and quant against your own GPU, run it through our VRAM calculator.
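To produce a baseline like this on your own hardware, time one agent turn and read the generation stats Ollama reports for each model call. The sketch below makes two assumptions. First, each model reply's response_metadata carries Ollama's eval_count (generated tokens) and eval_duration (nanoseconds) fields; recent langchain-ollama releases appear to copy these through, but inspect one message before relying on it. Second, the helper names (tokens_per_second, timed_invoke) are hypothetical.

```python
import time
from typing import Any, Iterable, Optional, Tuple


def tokens_per_second(metadata: Iterable[dict]) -> float:
    """Aggregate generation speed across model calls.

    Assumes Ollama-style fields: eval_count (generated tokens) and
    eval_duration (nanoseconds). Entries without them are skipped.
    """
    tokens = 0
    duration_ns = 0
    for meta in metadata:
        if "eval_count" in meta and meta.get("eval_duration"):
            tokens += meta["eval_count"]
            duration_ns += meta["eval_duration"]
    return tokens / (duration_ns / 1e9) if duration_ns else 0.0


def timed_invoke(runnable: Any, inputs: dict, config: Optional[dict] = None) -> Tuple[Any, float]:
    """Run one agent turn and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = runnable.invoke(inputs, config)
    return result, time.perf_counter() - start


# Usage against the agent built earlier (needs Ollama running):
# result, seconds = timed_invoke(agent, {"messages": [("user", "What is 23*7?")]})
# ai_meta = [m.response_metadata for m in result["messages"] if m.type == "ai"]
# print(f"{seconds:.1f}s, {len(ai_meta)} model calls, {tokens_per_second(ai_meta):.0f} tok/s")
```

The number of AI messages in a turn is the number of model round-trips. This is why each extra tool loop shows up directly in wall-clock time.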

Multi-agent supervisor with LangGraph

When one agent juggling ten tools gets unreliable, split the work across specialists and put a supervisor in charge of routing. The langgraph-supervisor package builds this hierarchy for you: each worker is its own agent, and the supervisor decides which one to hand off to next based on the conversation.

from langchain.agents import create_agent
from langgraph_supervisor import create_supervisor
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen2.5:7b", temperature=0)

# create_agent takes system_prompt (create_react_agent called it prompt)
math_agent = create_agent(
    llm, tools=[calculator], name="math_expert",
    system_prompt="You are a math expert. Use the calculator tool for any arithmetic.",
)
text_agent = create_agent(
    llm, tools=[word_count], name="text_expert",
    system_prompt="You are a text analysis expert. Use word_count for counting words.",
)

supervisor = create_supervisor(
    agents=[math_agent, text_agent],
    model=llm,
    prompt=(
        "You manage a math expert and a text expert. "
        "Delegate each request to the right specialist."
    ),
).compile()

result = supervisor.invoke(
    {"messages": [("user", "Count the words in 'hello there friend' and compute 9*9.")]}
)
print(result["messages"][-1].content)

The supervisor uses tool-based handoffs under the hood — delegating to a worker is itself modeled as a tool call — which is why a tool-capable local model matters here even more than in a single-agent setup. Keep each worker's tool list short; small local models route far more accurately when each specialist owns two or three tools rather than a dozen.

LangGraph vs CrewAI: which should you use locally?

Both run fully local against Ollama, but they sit at different altitudes. LangGraph is a low-level graph framework — you control state, edges and loops explicitly. CrewAI is higher-level and role-based — you describe agents, tasks and a process, and it orchestrates them for you. The trade is control versus speed-to-first-result.

Dimension	LangGraph	CrewAI
Abstraction	Low-level state machine (nodes/edges)	High-level roles, tasks, crews
Best for	Custom control flow, loops, branching, human-in-the-loop	Quick role-based multi-agent teams
State control	Explicit typed State + reducers	Mostly managed for you
Checkpointing	Built-in (SqliteSaver / Postgres)	Less granular
Learning curve	Steeper — you build the graph	Gentler — declarative setup
Local Ollama	Via ChatOllama	Via its LiteLLM-backed LLM class (e.g. ollama/qwen2.5:7b)

Pick LangGraph when you need precise control over the agent's flow — conditional loops, retries, human approval steps, durable state you can resume. Pick CrewAI when you want a role-based team running fast and don't need to hand-tune the control flow; our CrewAI local setup guide walks through that path. Many teams reach for CrewAI to prototype, then drop to LangGraph when they hit a control-flow wall. For the full landscape across both plus alternatives, see our AI agent frameworks comparison.

Key Takeaways

The stack is langgraph + langchain-ollama + a tool-capable local model. Use LangGraph 1.x, langchain-ollama 0.3+, Ollama 0.5.2+, and a model like qwen2.5:7b or llama3.1:8b that supports tool calling.
Verify tool support first. Run ollama show and look for tools in the capabilities — a model without it will reply in text instead of calling your functions, which is the #1 reason agents "ignore" tools.
LangGraph is a state machine. State (a typed dict with reducers), nodes (functions), and edges (including conditional ones via tools_condition) give you explicit control over the ReAct loop.
create_agent is the current factory. It replaces the now-deprecated create_react_agent; both build the assistant-tools-assistant loop you can also wire by hand with ToolNode.
Persist with SqliteSaver, stream with stream_mode="messages". Checkpointing (separate langgraph-checkpoint-sqlite package) gives durable per-thread memory; streaming makes local models feel responsive.
Choose LangGraph for control, CrewAI for speed. Use LangGraph when you need custom loops, branching, or human-in-the-loop; reach for CrewAI for fast role-based teams.

Next Steps

New to running agents locally? Start with our from-scratch build a local AI agent tutorial, then come back for the graph-based version.
Want the simpler LangChain wiring behind ChatOllama (embeddings, RAG, streaming)? Read the Ollama + LangChain integration guide.
Prefer a higher-level, role-based framework? Compare the workflow in our CrewAI local setup guide.
Deciding between frameworks? Our AI agent frameworks comparison ranks LangGraph, CrewAI and the rest.
Not sure which local model to run as the agent brain? See the best Ollama models for tool-capable picks and VRAM tiers.
Official sources: the LangGraph GitHub repo and Ollama's tool-support documentation.

LangGraph + Ollama: Build Local AI Agents (2026 Guide)

Local AI Master Research Team

Written by the Local AI Master Team
