Ollama Tool Calling Guide: Build AI Agents with Local LLMs
Ollama tool calling (function calling) lets your local LLM interact with external tools — search the web, query databases, execute code, read files, and call APIs. Send a tools array in the /api/chat request with function definitions, and compatible models (Llama 3.1+, Qwen 2.5+, Mistral) return structured JSON with the function name and arguments. No cloud API needed — everything runs locally.
This guide covers the complete tool calling workflow: how it works, which models support it, Python and JavaScript implementations, building a multi-tool agent, and production best practices.
Table of Contents
- How Tool Calling Works
- Supported Models
- Basic Example (Python)
- Basic Example (JavaScript)
- Multi-Tool Agent Pattern
- Real-World Tools
- Using with Frameworks
- Best Practices
- FAQ
How Tool Calling Works {#how-tool-calling-works}
Tool calling follows a 4-step loop:
Step 1 — Define tools: You describe your available functions (name, description, parameters) in JSON schema format and send them with your chat request.
Step 2 — Model decides: The LLM reads the user's message and your tool definitions. If a tool is relevant, it returns a tool_calls response instead of regular text.
Step 3 — Execute locally: Your code receives the tool call, runs the actual function (API call, database query, file operation), and gets the result.
Step 4 — Send result back: You send the tool result back to the model as a tool message. The model incorporates the result and generates its final response.
User: "What's the weather in Tokyo?"
→ Model sees get_weather tool → returns: tool_calls: [{name: "get_weather", args: {city: "Tokyo"}}]
→ Your code calls weather API → result: "22°C, partly cloudy"
→ Model receives result → "The weather in Tokyo is 22°C and partly cloudy."
The model never executes code or accesses the internet directly. It only decides which tool to call and what arguments to pass. Your code handles all execution.
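The loop above can be seen purely as a growing message list that you replay to the model on each turn. A minimal sketch of that history's shape, mirroring the weather flow shown above (the tool name and values are illustrative):

```python
# The message history as it grows through the 4-step loop.
# Shapes mirror Ollama's /api/chat format; values are illustrative.
messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
]

# Step 2: the model answers with a tool_calls message instead of text
assistant_turn = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "get_weather", "arguments": {"city": "Tokyo"}}}
    ],
}
messages.append(assistant_turn)

# Step 3: your code runs the tool and appends the result as a tool message
messages.append({"role": "tool", "content": "22°C, partly cloudy"})

# Step 4: send the full history back; the model writes the final answer
print([m["role"] for m in messages])  # → ['user', 'assistant', 'tool']
```

Note that the assistant's `tool_calls` message stays in the history — if you drop it, the model has no record of why the tool result is there.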
Supported Models {#supported-models}
Not all Ollama models support tool calling. Here are the confirmed models as of March 2026:
| Model | Size | VRAM (Q4) | Tool Calling Quality | Install |
|---|---|---|---|---|
| Llama 3.1 8B | 8B | 5.5 GB | Good | ollama pull llama3.1 |
| Qwen 2.5 7B | 7B | 5 GB | Good | ollama pull qwen2.5:7b |
| Qwen 2.5 14B | 14B | 9.5 GB | Very Good | ollama pull qwen2.5:14b |
| Qwen 2.5 32B | 32B | 22 GB | Excellent | ollama pull qwen2.5:32b |
| Llama 3.3 70B | 70B | 42 GB | Excellent | ollama pull llama3.3:70b |
| Mistral 7B | 7B | 5 GB | Good | ollama pull mistral |
| Mistral Small | 24B | 15 GB | Very Good | ollama pull mistral-small |
| Llama 4 Scout | 109B MoE | 55 GB | Excellent | ollama pull llama4:scout |
Recommendation: Start with Llama 3.1 8B for development (fast, 5.5GB). Use Qwen 2.5 14B+ for production (more reliable tool selection). Check our VRAM Calculator to verify your GPU can run the model.
Basic Example (Python) {#basic-python-example}
Minimal tool calling example
```python
import requests

# Step 1: Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Get the current date and time",
            "parameters": {
                "type": "object",
                "properties": {
                    "timezone": {
                        "type": "string",
                        "description": "Timezone (e.g., 'UTC', 'US/Eastern', 'Asia/Tokyo')"
                    }
                },
                "required": ["timezone"]
            }
        }
    }
]

# Step 2: Send message with tools
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
    "tools": tools,
    "stream": False
})
message = response.json()["message"]

# Step 3: Check for tool calls
if message.get("tool_calls"):
    for tool_call in message["tool_calls"]:
        func_name = tool_call["function"]["name"]
        func_args = tool_call["function"]["arguments"]
        print(f"Model wants to call: {func_name}({func_args})")

        # Execute the function locally
        if func_name == "get_current_time":
            from datetime import datetime
            import pytz
            tz = pytz.timezone(func_args["timezone"])
            result = datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S %Z")

            # Step 4: Send result back
            final = requests.post("http://localhost:11434/api/chat", json={
                "model": "llama3.1",
                "messages": [
                    {"role": "user", "content": "What time is it in Tokyo?"},
                    message,  # Include the assistant's tool_calls message
                    {"role": "tool", "content": result}
                ],
                "stream": False
            })
            print(final.json()["message"]["content"])
else:
    # No tool call — direct response
    print(message["content"])
```
Using the Python library
```python
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression like '2 + 2' or 'sqrt(144)'"}
            },
            "required": ["expression"]
        }
    }
}]

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "What is 15% of 847?"}],
    tools=tools
)

if response["message"].get("tool_calls"):
    tool_call = response["message"]["tool_calls"][0]
    expression = tool_call["function"]["arguments"]["expression"]
    result = str(eval(expression))  # In production, use a safe math parser

    # Send result back
    final = ollama.chat(
        model="llama3.1",
        messages=[
            {"role": "user", "content": "What is 15% of 847?"},
            response["message"],
            {"role": "tool", "content": result}
        ]
    )
    print(final["message"]["content"])
```
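The `eval()` above will execute arbitrary Python, so a confused or hostile model output could do real damage. One safer approach is to parse the expression with the stdlib `ast` module and evaluate only arithmetic nodes; a minimal sketch (`safe_eval` is an illustrative name, not part of any library):

```python
import ast
import operator

# Allowed operators; anything outside this table is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expression: str) -> float:
    """Evaluate a pure arithmetic expression without executing code."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("Disallowed expression")
    return walk(ast.parse(expression, mode="eval"))

print(safe_eval("2 + 3 * 4"))  # → 14
```

Function calls like `sqrt(144)` would need an explicit allowlist of names added to the walker; as written, anything that isn't a number or arithmetic operator raises `ValueError`.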
Basic Example (JavaScript) {#basic-javascript-example}
```javascript
import { Ollama } from 'ollama'

const ollama = new Ollama()

const tools = [{
  type: 'function',
  function: {
    name: 'search_web',
    description: 'Search the web for current information',
    parameters: {
      type: 'object',
      properties: {
        query: { type: 'string', description: 'Search query' }
      },
      required: ['query']
    }
  }
}]

// Send message with tools
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Search for the latest Ollama version' }],
  tools
})

if (response.message.tool_calls) {
  for (const toolCall of response.message.tool_calls) {
    console.log(`Calling: ${toolCall.function.name}(${JSON.stringify(toolCall.function.arguments)})`)

    // Execute tool (your implementation)
    const result = await executeSearch(toolCall.function.arguments.query)

    // Send result back
    const final = await ollama.chat({
      model: 'llama3.1',
      messages: [
        { role: 'user', content: 'Search for the latest Ollama version' },
        response.message,
        { role: 'tool', content: result }
      ]
    })
    console.log(final.message.content)
  }
}
```
Multi-Tool Agent Pattern {#multi-tool-agent}
Real agents use multiple tools in a loop. Here is the complete pattern:
```python
import ollama

# Define multiple tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the internet for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read contents of a local file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": "Execute Python code and return the output",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"}
                },
                "required": ["code"]
            }
        }
    }
]

# Tool implementations
def execute_tool(name, args):
    if name == "search_web":
        from duckduckgo_search import DDGS
        results = list(DDGS().text(args["query"], max_results=3))
        return "\n".join(f"- {r['title']}: {r['body']}" for r in results)
    elif name == "read_file":
        with open(args["path"]) as f:
            return f.read()[:5000]
    elif name == "run_python":
        import subprocess
        result = subprocess.run(["python3", "-c", args["code"]],
                                capture_output=True, text=True, timeout=10)
        return result.stdout or result.stderr
    return "Unknown tool"

# Agent loop
def run_agent(question, max_iterations=10):
    messages = [{"role": "user", "content": question}]
    for i in range(max_iterations):
        response = ollama.chat(
            model="qwen2.5:14b",
            messages=messages,
            tools=tools
        )
        message = response["message"]
        messages.append(message)

        # If no tool calls, we have the final answer
        if not message.get("tool_calls"):
            return message["content"]

        # Execute each tool call
        for tool_call in message["tool_calls"]:
            name = tool_call["function"]["name"]
            args = tool_call["function"]["arguments"]
            print(f"  [{i+1}] Calling {name}({args})")
            result = execute_tool(name, args)
            messages.append({"role": "tool", "content": str(result)})
    return "Agent reached max iterations"

# Run it
answer = run_agent("Search for the latest Ollama release and tell me what's new")
print(answer)
```
This is the same pattern used in our AI Agent Frameworks Comparison — CrewAI, LangGraph, and AutoGen all implement this loop with additional features like memory, error handling, and parallel tool execution.
Real-World Tools {#real-world-tools}
Here are production-ready tool definitions for common use cases:
Web Search
```json
{
  "type": "function",
  "function": {
    "name": "search_web",
    "description": "Search the internet for current information. Use when the user asks about recent events, current data, or anything that might have changed after your training.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": {"type": "string", "description": "Specific search query"},
        "num_results": {"type": "integer", "description": "Number of results (default 5)"}
      },
      "required": ["query"]
    }
  }
}
```
Database Query
```json
{
  "type": "function",
  "function": {
    "name": "query_database",
    "description": "Run a read-only SQL query against the application database. Only SELECT queries are allowed.",
    "parameters": {
      "type": "object",
      "properties": {
        "sql": {"type": "string", "description": "SQL SELECT query"},
        "limit": {"type": "integer", "description": "Max rows to return (default 10)"}
      },
      "required": ["sql"]
    }
  }
}
```
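The "only SELECT" promise in that description still has to be enforced by your code, since the model can emit any SQL it likes. A minimal guard, sketched against a throwaway in-memory SQLite table (the `users` table and its rows are purely illustrative):

```python
import sqlite3

def query_database(sql: str, limit: int = 10):
    """Run a query; reject anything that isn't a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith("select"):
        raise ValueError("Only single SELECT statements are allowed")

    # Illustrative in-memory database standing in for the real one
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace')")
    rows = conn.execute(stripped).fetchmany(limit)
    conn.close()
    return rows

print(query_database("SELECT name FROM users"))  # → [('Ada',), ('Grace',)]
```

String checks alone are easy to slip past, so for a real database prefer a genuinely read-only connection (e.g. `sqlite3.connect("file:app.db?mode=ro", uri=True)`) or a database role without write permissions, and keep the prefix check as a first line of defense.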
Send Email
{
"type": "function",
"function": {
"name": "send_email",
"description": "Send an email. Use only when the user explicitly asks to send an email.",
"parameters": {
"type": "object",
"properties": {
"to": {"type": "string", "description": "Recipient email address"},
"subject": {"type": "string", "description": "Email subject line"},
"body": {"type": "string", "description": "Email body text"}
},
"required": ["to", "subject", "body"]
}
}
}
Using with Frameworks {#frameworks}
LangChain + Ollama
```python
from langchain_ollama import ChatOllama
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Your implementation
    return f"Weather in {city}: 22°C, sunny"

llm = ChatOllama(model="llama3.1")
llm_with_tools = llm.bind_tools([get_weather])
result = llm_with_tools.invoke("What's the weather in Paris?")
```
CrewAI + Ollama
```python
from crewai import Agent, Task, Crew
from crewai_tools import tool

@tool("Search Tool")
def search(query: str) -> str:
    """Search the web for information."""
    # Your implementation
    return "search results..."

researcher = Agent(
    role="Researcher",
    goal="Find accurate information",
    tools=[search],
    llm="ollama/llama3.1"
)
```
For a complete framework comparison, see our AI Agent Frameworks guide.
Best Practices {#best-practices}
1. Write clear tool descriptions
The model decides which tool to use based on the description field. Vague descriptions cause wrong tool selection.
Bad: "description": "Get data"
Good: "description": "Search the web for current information. Use when the user asks about recent events or data not in your training."
2. Use smaller, focused tools
Break complex operations into simple tools. Instead of one do_everything tool, create search_web, read_file, run_code separately.
3. Set low temperature for reliability
Tool calling requires structured JSON output. Higher temperatures increase the chance of malformed responses.
```python
ollama.chat(model="llama3.1", messages=messages, tools=tools,
            options={"temperature": 0.1})
```
4. Validate tool arguments
Never trust model-generated arguments blindly. Validate types, sanitize strings, check for path traversal in file operations, and use parameterized SQL queries.
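For file tools, that means resolving the requested path and confirming it cannot escape an allowed directory. A sketch (`SAFE_ROOT` and `validate_path` are illustrative names; `Path.is_relative_to` needs Python 3.9+):

```python
from pathlib import Path

# Assumed configuration: the only directory the agent may read from
SAFE_ROOT = Path("/srv/agent-files").resolve()

def validate_path(user_path: str) -> Path:
    """Resolve the requested path and reject anything escaping SAFE_ROOT."""
    # Joining with an absolute path replaces SAFE_ROOT entirely, and ".."
    # segments can climb out of it — resolve() normalizes both cases.
    candidate = (SAFE_ROOT / user_path).resolve()
    if not candidate.is_relative_to(SAFE_ROOT):
        raise ValueError(f"Path escapes allowed directory: {user_path}")
    return candidate

print(validate_path("notes/todo.txt"))
```

`validate_path("../../etc/passwd")` raises, because the resolved path lands outside `SAFE_ROOT`. If the directory may contain symlinks pointing elsewhere, resolve and re-check the final target too.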
5. Set iteration limits
Always cap the agent loop. Models can get stuck in tool-calling cycles. 5-10 iterations is usually sufficient.
6. Handle errors gracefully
If a tool fails, send the error back as the tool result. The model can often recover and try a different approach.
```python
try:
    result = execute_tool(name, args)
except Exception as e:
    result = f"Error: {str(e)}. Try a different approach."
```
FAQ {#faq}
See answers to common questions about Ollama tool calling below.
Sources: Ollama Tool Calling Documentation | Ollama Blog: Tool Support | LangChain Ollama Integration | CrewAI Documentation