Ollama Tool Calling Guide: Build AI Agents with Local LLMs
Ollama tool calling (function calling) lets your local LLM interact with external tools — search the web, query databases, execute code, read files, and call APIs. Send a tools array in the /api/chat request with function definitions, and compatible models (Llama 3.1+, Qwen 2.5+, Mistral) return structured JSON with the function name and arguments. No cloud API needed — everything runs locally.
This guide covers the complete tool calling workflow: how it works, which models support it, Python and JavaScript implementations, building a multi-tool agent, and production best practices.
Table of Contents
- How Tool Calling Works
- Supported Models
- Basic Example (Python)
- Basic Example (JavaScript)
- Multi-Tool Agent Pattern
- Real-World Tools
- Using with Frameworks
- Best Practices
- FAQ
How Tool Calling Works {#how-tool-calling-works}
Tool calling follows a 4-step loop:
Step 1 — Define tools: You describe your available functions (name, description, parameters) in JSON schema format and send them with your chat request.
Step 2 — Model decides: The LLM reads the user's message and your tool definitions. If a tool is relevant, it returns a tool_calls response instead of regular text.
Step 3 — Execute locally: Your code receives the tool call, runs the actual function (API call, database query, file operation), and gets the result.
Step 4 — Send result back: You send the tool result back to the model as a tool message. The model incorporates the result and generates its final response.
User: "What's the weather in Tokyo?"
→ Model sees get_weather tool → returns: tool_calls: [{name: "get_weather", args: {city: "Tokyo"}}]
→ Your code calls weather API → result: "22°C, partly cloudy"
→ Model receives result → "The weather in Tokyo is 22°C and partly cloudy."
The model never executes code or accesses the internet directly. It only decides which tool to call and what arguments to pass. Your code handles all execution.
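The loop above can be seen purely as a growing message list that you replay to the model on each turn. A minimal sketch of that history's shape, mirroring the weather flow shown above (the tool name and values are illustrative):

```python
# The message history as it grows through the 4-step loop.
# Shapes mirror Ollama's /api/chat format; values are illustrative.
messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
]

# Step 2: the model answers with a tool_calls message instead of text
assistant_turn = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Tokyo"}}}
    ],
}
messages.append(assistant_turn)

# Step 3: your code runs the tool and appends the result as a tool message
messages.append({"role": "tool", "content": "22°C, partly cloudy"})

# Step 4: send the full history back; the model writes the final answer
print([m["role"] for m in messages])  # → ['user', 'assistant', 'tool']
```

Note that the assistant's `tool_calls` message stays in the history — if you drop it, the model has no record of why the tool result is there.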
Supported Models {#supported-models}
Not all Ollama models support tool calling. Here are the confirmed models as of March 2026:
| Model | Size | VRAM (Q4) | Tool Calling Quality | Install |
|---|---|---|---|---|
| Llama 3.1 8B | 8B | 5.5 GB | Good | ollama pull llama3.1 |
| Qwen 2.5 7B | 7B | 5 GB | Good | ollama pull qwen2.5:7b |
| Qwen 2.5 14B | 14B | 9.5 GB | Very Good | ollama pull qwen2.5:14b |
| Qwen 2.5 32B | 32B | 22 GB | Excellent | ollama pull qwen2.5:32b |
| Llama 3.3 70B | 70B | 42 GB | Excellent | ollama pull llama3.3:70b |
| Mistral 7B | 7B | 5 GB | Good | ollama pull mistral |
| Mistral Small | 24B | 15 GB | Very Good | ollama pull mistral-small |
| Llama 4 Scout | 109B MoE | 55 GB | Excellent | ollama pull llama4:scout |
Recommendation: Start with Llama 3.1 8B for development (fast, 5.5GB). Use Qwen 2.5 14B+ for production (more reliable tool selection). Check our VRAM Calculator to verify your GPU can run the model.
Basic Example (Python) {#basic-python-example}
Minimal tool calling example
```python
import requests

# Step 1: Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "Timezone (e.g., 'UTC', 'US/Eastern', 'Asia/Tokyo')"
                    }
                },
                "required": ["timezone"]
            }
        }
    }
]

# Step 2: Send message with tools
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
    "tools": tools,
    "stream": False
})
message = response.json()["message"]

# Step 3: Check for tool calls
if message.get("tool_calls"):
    for tool_call in message["tool_calls"]:
        func_name = tool_call["function"]["name"]
        func_args = tool_call["function"]["arguments"]
        print(f"Model wants to call: {func_name}({func_args})")

        # Execute the function locally
        if func_name == "get_current_time":
            from datetime import datetime
            import pytz
            tz = pytz.timezone(func_args["timezone"])
            result = datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S %Z")

            # Step 4: Send result back
            final = requests.post("http://localhost:11434/api/chat", json={
                "model": "llama3.1",
                "messages": [
                    {"role": "user", "content": "What time is it in Tokyo?"},
                    message,  # Include the assistant's tool_calls message
                    {"role": "tool", "content": result}
                ],
                "stream": False
            })
            print(final.json()["message"]["content"])
else:
    # No tool call — direct response
    print(message["content"])
```
Using the Python library
```python
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression like '2 + 2' or 'sqrt(144)'"}
            },
            "required": ["expression"]
        }
    }
}]

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What is 15% of 847?"}],
    tools=tools
)

if response["message"].get("tool_calls"):
    tool_call = response["message"]["tool_calls"][0]
    expression = tool_call["function"]["arguments"]["expression"]
    result = str(eval(expression))  # In production, use a safe math parser

    # Send result back
    final = ollama.chat(
        model="llama3.1",
        messages=[
            {"role": "user", "content": "What is 15% of 847?"},
            response["message"],
            {"role": "tool", "content": result}
        ]
    )
    print(final["message"]["content"])
```
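The `eval()` above will execute arbitrary Python, so a confused or hostile model output could do real damage. One safer approach is to parse the expression with the stdlib `ast` module and evaluate only arithmetic nodes; a minimal sketch (`safe_eval` is an illustrative name, not part of any library):

```python
import ast
import operator

# Allowed operators; anything outside this table is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression without executing code."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("Disallowed expression")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("2 + 3 * 4"))  # → 14
```

Function calls like `sqrt(144)` would need an explicit allowlist of names added to the walker; as written, anything that isn't a number or arithmetic operator raises `ValueError`.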
Basic Example (JavaScript) {#basic-javascript-example}
```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama()

const tools = [{
  type: 'function',
  function: {
    name: 'search_web',
    description: 'Search the web for current information',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' }
      },
      required: ['query']
    }
  }
}]

// Send message with tools
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Search for the latest Ollama version' }],
  tools
})

if (response.message.tool_calls) {
  for (const toolCall of response.message.tool_calls) {
    console.log(`Calling: ${toolCall.function.name}(${JSON.stringify(toolCall.function.arguments)})`)

    // Execute tool (your implementation)
    const result = await executeSearch(toolCall.function.arguments.query)

    // Send result back
    const final = await ollama.chat({
      model: 'llama3.1',
      messages: [
        { role: 'user', content: 'Search for the latest Ollama version' },
        response.message,
        { role: 'tool', content: result }
      ]
    })
    console.log(final.message.content)
  }
}
```
Multi-Tool Agent Pattern {#multi-tool-agent}
Real agents use multiple tools in a loop. Here is the complete pattern:
```python
import ollama

# Define multiple tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the internet for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a local file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return the output",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"}
                },
                "required": ["code"]
            }
        }
    }
]

# Tool implementations
def execute_tool(name, args):
    if name == "search_web":
        from duckduckgo_search import DDGS
        results = list(DDGS().text(args["query"], max_results=3))
        return "\n".join(f"- {r['title']}: {r['body']}" for r in results)
    elif name == "read_file":
        with open(args["path"]) as f:
            return f.read()[:5000]
    elif name == "run_python":
        import subprocess
        result = subprocess.run(["python3", "-c", args["code"]],
                                capture_output=True, text=True, timeout=10)
        return result.stdout or result.stderr
    return "Unknown tool"

# Agent loop
def run_agent(question, max_iterations=10):
    messages = [{"role": "user", "content": question}]
    for i in range(max_iterations):
        response = ollama.chat(
            model="qwen2.5:14b",
            messages=messages,
            tools=tools
        )
        message = response["message"]
        messages.append(message)

        # If no tool calls, we have the final answer
        if not message.get("tool_calls"):
            return message["content"]

        # Execute each tool call
        for tool_call in message["tool_calls"]:
            name = tool_call["function"]["name"]
            args = tool_call["function"]["arguments"]
            print(f"  [{i+1}] Calling {name}({args})")
            result = execute_tool(name, args)
            messages.append({"role": "tool", "content": str(result)})
    return "Agent reached max iterations"

# Run it
answer = run_agent("Search for the latest Ollama release and tell me what's new")
print(answer)
```
This is the same pattern used in our AI Agent Frameworks Comparison — CrewAI, LangGraph, and AutoGen all implement this loop with additional features like memory, error handling, and parallel tool execution.
Real-World Tools {#real-world-tools}
Here are production-ready tool definitions for common use cases:
Web Search
```json
{
  "type": "function",
  "function": {
    "name": "search_web",
    "description": "Search the internet for current information. Use when the user asks about recent events, current data, or anything that might have changed after your training.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string", "description": "Specific search query"},
        "num_results": {"type": "integer", "description": "Number of results (default 5)"}
      },
      "required": ["query"]
    }
  }
}
```
Database Query
```json
{
  "type": "function",
  "function": {
    "name": "query_database",
    "description": "Run a read-only SQL query against the application database. Only SELECT queries are allowed.",
    "parameters": {
      "type": "object",
      "properties": {
        "sql": {"type": "string", "description": "SQL SELECT query"},
        "limit": {"type": "integer", "description": "Max rows to return (default 10)"}
      },
      "required": ["sql"]
    }
  }
}
```
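The "only SELECT" promise in that description still has to be enforced by your code, since the model can emit any SQL it likes. A minimal guard, sketched against a throwaway in-memory SQLite table (the `users` table and its rows are purely illustrative):

```python
import sqlite3

def query_database(sql: str, limit: int = 10):
    """Run a query; reject anything that isn't a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("Only single SELECT statements are allowed")

    # Illustrative in-memory database standing in for the real one
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace')")
    rows = conn.execute(stripped).fetchmany(limit)
    conn.close()
    return rows

print(query_database("SELECT name FROM users"))  # → [('Ada',), ('Grace',)]
```

String checks alone are easy to slip past, so for a real database prefer a genuinely read-only connection (e.g. `sqlite3.connect("file:app.db?mode=ro", uri=True)`) or a database role without write permissions, and keep the prefix check as a first line of defense.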
Send Email
{
"type": "function",
"function": {
"name": "send_email",
"description": "Send an email. Use only when the user explicitly asks to send an email.",
"parameters": {
"type": "object",
"properties": {
"to": {"type": "string", "description": "Recipient email address"},
"subject": {"type": "string", "description": "Email subject line"},
"body": {"type": "string", "description": "Email body text"}
},
"required": ["to", "subject", "body"]
}
}
}
Using with Frameworks {#frameworks}
LangChain + Ollama
```python
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Your implementation
    return f"Weather in {city}: 22°C, sunny"

llm = ChatOllama(model="llama3.1")
llm_with_tools = llm.bind_tools([get_weather])
result = llm_with_tools.invoke("What's the weather in Paris?")
```
CrewAI + Ollama
```python
from crewai import Agent, Task, Crew
from crewai_tools import tool

@tool("Search Tool")
def search(query: str) -> str:
    """Search the web for information."""
    # Your implementation
    return "search results..."

researcher = Agent(
    role="Researcher",
    goal="Find accurate information",
    tools=[search],
    llm="ollama/llama3.1"
)
```
For a complete framework comparison, see our AI Agent Frameworks guide.
Best Practices {#best-practices}
1. Write clear tool descriptions
The model decides which tool to use based on the description field. Vague descriptions cause wrong tool selection.
Bad: "description": "Get data"
Good: "description": "Search the web for current information. Use when the user asks about recent events or data not in your training."
2. Use smaller, focused tools
Break complex operations into simple tools. Instead of one do_everything tool, create search_web, read_file, run_code separately.
3. Set low temperature for reliability
Tool calling requires structured JSON output. Higher temperatures increase the chance of malformed responses.
```python
ollama.chat(model="llama3.1", messages=messages, tools=tools,
            options={"temperature": 0.1})
```
4. Validate tool arguments
Never trust model-generated arguments blindly. Validate types, sanitize strings, check for path traversal in file operations, and use parameterized SQL queries.
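For file tools, that means resolving the requested path and confirming it cannot escape an allowed directory. A sketch (`SAFE_ROOT` and `validate_path` are illustrative names; `Path.is_relative_to` needs Python 3.9+):

```python
from pathlib import Path

# Assumed configuration: the only directory the agent may read from
SAFE_ROOT = Path("/srv/agent-files").resolve()

def validate_path(user_path: str) -> Path:
    """Resolve the requested path and reject anything escaping SAFE_ROOT."""
    # Joining with an absolute path replaces SAFE_ROOT entirely, and ".."
    # segments can climb out of it — resolve() normalizes both cases.
    candidate = (SAFE_ROOT / user_path).resolve()
    if not candidate.is_relative_to(SAFE_ROOT):
        raise ValueError(f"Path escapes allowed directory: {user_path}")
    return candidate

print(validate_path("notes/todo.txt"))
```

`validate_path("../../etc/passwd")` raises, because the resolved path lands outside `SAFE_ROOT`. If the directory may contain symlinks pointing elsewhere, resolve and re-check the final target too.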
5. Set iteration limits
Always cap the agent loop. Models can get stuck in tool-calling cycles. 5-10 iterations is usually sufficient.
6. Handle errors gracefully
If a tool fails, send the error back as the tool result. The model can often recover and try a different approach.
```python
try:
    result = execute_tool(name, args)
except Exception as e:
    result = f"Error: {str(e)}. Try a different approach."
```
FAQ {#faq}
See answers to common questions about Ollama tool calling below.
Sources: Ollama Tool Calling Documentation | Ollama Blog: Tool Support | LangChain Ollama Integration | CrewAI Documentation