Tutorial

Ollama Tool Calling Guide: Build AI Agents with Local LLMs

March 19, 2026
16 min read
LocalAimaster Research Team
Ollama tool calling (function calling) lets your local LLM interact with external tools — search the web, query databases, execute code, read files, and call APIs. Send a tools array in the /api/chat request with function definitions, and compatible models (Llama 3.1+, Qwen 2.5+, Mistral) return structured JSON with the function name and arguments. No cloud API needed — everything runs locally.

This guide covers the complete tool calling workflow: how it works, which models support it, Python and JavaScript implementations, building a multi-tool agent, and production best practices.

Table of Contents

  1. How Tool Calling Works
  2. Supported Models
  3. Basic Example (Python)
  4. Basic Example (JavaScript)
  5. Multi-Tool Agent Pattern
  6. Real-World Tools
  7. Using with Frameworks
  8. Best Practices
  9. FAQ

How Tool Calling Works {#how-tool-calling-works}

Tool calling follows a 4-step loop:

Step 1 — Define tools: You describe your available functions (name, description, parameters) in JSON schema format and send them with your chat request.

Step 2 — Model decides: The LLM reads the user's message and your tool definitions. If a tool is relevant, it returns a tool_calls response instead of regular text.

Step 3 — Execute locally: Your code receives the tool call, runs the actual function (API call, database query, file operation), and gets the result.

Step 4 — Send result back: You send the tool result back to the model as a tool message. The model incorporates the result and generates its final response.

User: "What's the weather in Tokyo?"
  → Model sees get_weather tool → returns: tool_calls: [{name: "get_weather", args: {city: "Tokyo"}}]
  → Your code calls weather API → result: "22°C, partly cloudy"
  → Model receives result → "The weather in Tokyo is 22°C and partly cloudy."

The model never executes code or accesses the internet directly. It only decides which tool to call and what arguments to pass. Your code handles all execution.
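Concretely, when the model opts to call a tool, the assistant message in the `/api/chat` response carries a `tool_calls` array instead of text content. The shape below is abbreviated (other response fields such as `model` and timing stats are omitted):

```json
{
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "function": {
          "name": "get_weather",
          "arguments": { "city": "Tokyo" }
        }
      }
    ]
  }
}
```

Note that `arguments` arrives as a parsed JSON object, not a string, so your code can use the values directly.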


Supported Models {#supported-models}

Not all Ollama models support tool calling. Here are the confirmed models as of March 2026:

| Model | Size | VRAM (Q4) | Tool Calling Quality | Install |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | 8B | 5.5 GB | Good | `ollama pull llama3.1` |
| Qwen 2.5 7B | 7B | 5 GB | Good | `ollama pull qwen2.5:7b` |
| Qwen 2.5 14B | 14B | 9.5 GB | Very Good | `ollama pull qwen2.5:14b` |
| Qwen 2.5 32B | 32B | 22 GB | Excellent | `ollama pull qwen2.5:32b` |
| Llama 3.3 70B | 70B | 42 GB | Excellent | `ollama pull llama3.3:70b` |
| Mistral 7B | 7B | 5 GB | Good | `ollama pull mistral` |
| Mistral Small | 24B | 15 GB | Very Good | `ollama pull mistral-small` |
| Llama 4 Scout | 109B MoE | 55 GB | Excellent | `ollama pull llama4-scout` |

Recommendation: Start with Llama 3.1 8B for development (fast, 5.5GB). Use Qwen 2.5 14B+ for production (more reliable tool selection). Check our VRAM Calculator to verify your GPU can run the model.


Basic Example (Python) {#basic-python-example}

Minimal tool calling example

import requests
import json

# Step 1: Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "Timezone (e.g., 'UTC', 'US/Eastern', 'Asia/Tokyo')"
                    }
                },
                "required": ["timezone"]
            }
        }
    }
]

# Step 2: Send message with tools
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
    "tools": tools,
    "stream": False
})

message = response.json()["message"]

# Step 3: Check for tool calls
if message.get("tool_calls"):
    for tool_call in message["tool_calls"]:
        func_name = tool_call["function"]["name"]
        func_args = tool_call["function"]["arguments"]
        print(f"Model wants to call: {func_name}({func_args})")

        # Execute the function locally
        if func_name == "get_current_time":
            from datetime import datetime
            import pytz
            tz = pytz.timezone(func_args["timezone"])
            result = datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S %Z")

        # Step 4: Send result back
        final = requests.post("http://localhost:11434/api/chat", json={
            "model": "llama3.1",
            "messages": [
                {"role": "user", "content": "What time is it in Tokyo?"},
                message,  # Include the assistant's tool_calls message
                {"role": "tool", "content": result}
            ],
            "stream": False
        })
        print(final.json()["message"]["content"])
else:
    # No tool call — direct response
    print(message["content"])

Using the Python library

import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression like '2 + 2' or 'sqrt(144)'"}
            },
            "required": ["expression"]
        }
    }
}]

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What is 15% of 847?"}],
    tools=tools
)

if response["message"].get("tool_calls"):
    tool_call = response["message"]["tool_calls"][0]
    expression = tool_call["function"]["arguments"]["expression"]
    result = str(eval(expression))  # In production, use a safe math parser

    # Send result back
    final = ollama.chat(
        model="llama3.1",
        messages=[
            {"role": "user", "content": "What is 15% of 847?"},
            response["message"],
            {"role": "tool", "content": result}
        ]
    )
    print(final["message"]["content"])
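As the comment above notes, `eval` on model-generated input is unsafe. One way to evaluate basic arithmetic safely is to walk the expression's syntax tree with Python's `ast` module and allow only whitelisted operators (a minimal sketch; the `safe_eval` name and supported-operator set are choices for this example):

```python
import ast
import operator

# Whitelisted operators; any other node type is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_eval(expression: str):
    """Evaluate basic arithmetic without eval(); rejects everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("2 + 3 * 4"))  # 14
```

Function calls, attribute access, and names all raise `ValueError`, so an injected `__import__('os')` never executes.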

Basic Example (JavaScript) {#basic-javascript-example}

import { Ollama } from 'ollama'

const ollama = new Ollama()

const tools = [{
  type: 'function',
  function: {
    name: 'search_web',
    description: 'Search the web for current information',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' }
      },
      required: ['query']
    }
  }
}]

// Send message with tools
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Search for the latest Ollama version' }],
  tools
})

if (response.message.tool_calls) {
  for (const toolCall of response.message.tool_calls) {
    console.log(`Calling: ${toolCall.function.name}(${JSON.stringify(toolCall.function.arguments)})`)

    // Execute tool (your implementation)
    const result = await executeSearch(toolCall.function.arguments.query)

    // Send result back
    const final = await ollama.chat({
      model: 'llama3.1',
      messages: [
        { role: 'user', content: 'Search for the latest Ollama version' },
        response.message,
        { role: 'tool', content: result }
      ]
    })
    console.log(final.message.content)
  }
}

Multi-Tool Agent Pattern {#multi-tool-agent}

Real agents use multiple tools in a loop. Here is the complete pattern:

import ollama

# Define multiple tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the internet for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a local file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return the output",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"}
                },
                "required": ["code"]
            }
        }
    }
]

# Tool implementations
def execute_tool(name, args):
    if name == "search_web":
        from duckduckgo_search import DDGS
        results = list(DDGS().text(args["query"], max_results=3))
        return "\n".join(f"- {r['title']}: {r['body']}" for r in results)
    elif name == "read_file":
        with open(args["path"]) as f:
            return f.read()[:5000]  # Cap output so it fits in the context window
    elif name == "run_python":
        import subprocess
        result = subprocess.run(["python3", "-c", args["code"]],
                              capture_output=True, text=True, timeout=10)
        return result.stdout or result.stderr
    return "Unknown tool"

# Agent loop
def run_agent(question, max_iterations=10):
    messages = [{"role": "user", "content": question}]

    for i in range(max_iterations):
        response = ollama.chat(
            model="qwen2.5:14b",
            messages=messages,
            tools=tools
        )

        message = response["message"]
        messages.append(message)

        # If no tool calls, we have the final answer
        if not message.get("tool_calls"):
            return message["content"]

        # Execute each tool call
        for tool_call in message["tool_calls"]:
            name = tool_call["function"]["name"]
            args = tool_call["function"]["arguments"]
            print(f"  [{i+1}] Calling {name}({args})")

            result = execute_tool(name, args)
            messages.append({"role": "tool", "content": str(result)})

    return "Agent reached max iterations"

# Run it
answer = run_agent("Search for the latest Ollama release and tell me what's new")
print(answer)

This is the same pattern used in our AI Agent Frameworks Comparison: CrewAI, LangGraph, and AutoGen all implement this loop with additional features like memory, error handling, and parallel tool execution.


Real-World Tools {#real-world-tools}

Here are production-ready tool definitions for common use cases:

Web Search

{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the internet for current information. Use when the user asks about recent events, current data, or anything that might have changed after your training.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Specific search query"},
                "num_results": {"type": "integer", "description": "Number of results (default 5)"}
            },
            "required": ["query"]
        }
    }
}

Database Query

{
    "type": "function",
    "function": {
        "name": "query_database",
        "description": "Run a read-only SQL query against the application database. Only SELECT queries are allowed.",
        "parameters": {
            "type": "object",
            "properties": {
                "sql": {"type": "string", "description": "SQL SELECT query"},
                "limit": {"type": "integer", "description": "Max rows to return (default 10)"}
            },
            "required": ["sql"]
        }
    }
}
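A backing implementation for this definition should enforce the read-only contract in code, not just in the description. A minimal sketch using `sqlite3` (the `query_database` signature and the connection parameter are assumptions; adapt to your database driver):

```python
import sqlite3

def query_database(conn: sqlite3.Connection, sql: str, limit: int = 10):
    """Run a read-only query, enforcing SELECT-only and a row cap."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT queries are allowed")
    # sqlite3's execute() runs a single statement, which also blocks
    # "SELECT 1; DROP TABLE ..." style piggybacking.
    return conn.execute(sql).fetchmany(limit)

# Demo against an in-memory database (stand-in for your application DB)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
print(query_database(conn, "SELECT name FROM users ORDER BY id"))  # [('Ada',), ('Linus',)]
```

For real deployments, also connect with a database user that only has read permissions, so the guard is defense in depth rather than the only barrier.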

Send Email

{
    "type": "function",
    "function": {
        "name": "send_email",
        "description": "Send an email. Use only when the user explicitly asks to send an email.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string", "description": "Recipient email address"},
                "subject": {"type": "string", "description": "Email subject line"},
                "body": {"type": "string", "description": "Email body text"}
            },
            "required": ["to", "subject", "body"]
        }
    }
}
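Because email sending is irreversible, it helps to separate message construction from delivery so a human (or a confirmation step) can inspect the message first. A sketch using the standard library's `email` module (the `build_email` helper is hypothetical; actual delivery via `smtplib` is deliberately left out):

```python
from email.message import EmailMessage

def build_email(args: dict) -> EmailMessage:
    """Construct (but do not send) a message from validated tool arguments."""
    msg = EmailMessage()
    msg["To"] = args["to"]
    msg["Subject"] = args["subject"]
    msg.set_content(args["body"])
    return msg

msg = build_email({"to": "a@example.com", "subject": "Hi", "body": "Hello"})
print(msg["Subject"])  # Hi
```

Only hand the built message to your SMTP client after the user confirms, matching the "only when the user explicitly asks" constraint in the tool description.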

Using with Frameworks {#frameworks}

LangChain + Ollama

from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Your implementation
    return f"Weather in {city}: 22°C, sunny"

llm = ChatOllama(model="llama3.1")
llm_with_tools = llm.bind_tools([get_weather])

result = llm_with_tools.invoke("What's the weather in Paris?")

CrewAI + Ollama

from crewai import Agent, Task, Crew
from crewai_tools import tool

@tool("Search Tool")
def search(query: str) -> str:
    """Search the web for information."""
    # Your implementation
    return "search results..."

researcher = Agent(
    role="Researcher",
    goal="Find accurate information",
    tools=[search],
    llm="ollama/llama3.1"
)

For a complete framework comparison, see our AI Agent Frameworks guide.


Best Practices {#best-practices}

1. Write clear tool descriptions

The model decides which tool to use based on the description field. Vague descriptions cause wrong tool selection.

Bad: "description": "Get data"

Good: "description": "Search the web for current information. Use when the user asks about recent events or data not in your training."

2. Use smaller, focused tools

Break complex operations into simple tools. Instead of one do_everything tool, create search_web, read_file, run_code separately.
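One way to keep tools small is a registry that maps each tool name to its own function, so dispatch is a dict lookup instead of a growing if/elif chain (a sketch with toy tools; the `TOOL_REGISTRY` and `dispatch` names are illustrative):

```python
# Each focused tool is a plain function taking the parsed arguments dict.
TOOL_REGISTRY = {
    "add_numbers": lambda args: str(args["a"] + args["b"]),
    "upper_case": lambda args: args["text"].upper(),
}

def dispatch(name: str, args: dict) -> str:
    """Look up a tool by name and run it; unknown names become error strings."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'"
    return tool(args)

print(dispatch("add_numbers", {"a": 2, "b": 3}))  # 5
```

Returning the unknown-tool case as a string (rather than raising) lets you feed it back to the model as a tool result, which often prompts it to pick a valid tool on the next turn.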

3. Set low temperature for reliability

Tool calling requires structured JSON output. Higher temperatures increase the chance of malformed responses.

ollama.chat(model="llama3.1", messages=messages, tools=tools,
            options={"temperature": 0.1})

4. Validate tool arguments

Never trust model-generated arguments blindly. Validate types, sanitize strings, check for path traversal in file operations, and use parameterized SQL queries.
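For file tools specifically, path traversal is the classic failure: the model emits `../../etc/passwd` and a naive `open()` obliges. A minimal guard resolves the path and confines it to an allowed base directory (a sketch; the `data` base directory and helper name are assumptions):

```python
import os

def validate_read_file_args(args: dict) -> str:
    """Check model-generated read_file arguments before touching the filesystem."""
    path = args.get("path")
    if not isinstance(path, str):
        raise ValueError("path must be a string")
    # Resolve against the allowed base and reject anything that escapes it.
    base = os.path.realpath("data")
    full = os.path.realpath(os.path.join(base, path))
    if not full.startswith(base + os.sep):
        raise ValueError("path escapes the allowed directory")
    return full

print(validate_read_file_args({"path": "notes.txt"}).endswith("notes.txt"))  # True
```

The same pattern applies to other argument types: coerce and range-check numbers, and pass strings to SQL only as bound parameters, never by string interpolation.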

5. Set iteration limits

Always cap the agent loop. Models can get stuck in tool-calling cycles. 5-10 iterations is usually sufficient.

6. Handle errors gracefully

If a tool fails, send the error back as the tool result. The model can often recover and try a different approach.

try:
    result = execute_tool(name, args)
except Exception as e:
    result = f"Error: {str(e)}. Try a different approach."

FAQ {#faq}

See answers to common questions about Ollama tool calling below.


Sources: Ollama Tool Calling Documentation | Ollama Blog: Tool Support | LangChain Ollama Integration | CrewAI Documentation

Published: March 19, 2026 · Last Updated: March 19, 2026


Written by Pattanaik Ramswarup

AI Engineer & Dataset Architect | Creator of the 77,000 Training Dataset

I've personally trained over 50 AI models from scratch and spent 2,000+ hours optimizing local AI deployments. My 77K dataset project revolutionized how businesses approach AI training. Every guide on this site is based on real hands-on experience, not theory. I test everything on my own hardware before writing about it.
