Chapter 10 · Tool Use & Function Calling

The API contract that turned LLMs into software.


A pure LLM is a wordsmith in a sealed room. It can't look anything up, can't compute, can't act. Tool use — also called function calling — is how we let the model reach out of the room and touch the world.

In plain English. A tool is a function the model can call by name. You define what tools exist; the model decides when to use them; your code runs them.

A tour of useful tool categories

mindmap
  root((Tools you give an agent))
    Read
      Database query
      Web search
      File read
      Vector search
      Logs and metrics
    Write
      Database write
      Email send
      Ticket create
      Code edit
      Git commit
    Compute
      Calculator
      Code runner
      Image generator
      Embedding
    Communicate
      Slack message
      User question
      Approval request
    Control
      Wait
      Retry
      Hand off
      Stop

A useful design rule: read tools should be free; write tools should require a budget; control tools should always be available.
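That rule can be enforced mechanically in the dispatch layer rather than trusted to the model. A minimal sketch, assuming illustrative tool names and a made-up per-conversation budget:

```python
# Budget-gated dispatcher: read tools run freely, write tools consume a
# budget, control tools always run. Tool names and the budget size are
# illustrative assumptions, not part of any provider API.
TOOL_CLASS = {
    "query_db": "read",
    "send_email": "write",
    "stop": "control",
}

class Dispatcher:
    def __init__(self, write_budget=3):
        self.write_budget = write_budget

    def dispatch(self, name, fn, args):
        cls = TOOL_CLASS.get(name, "write")  # unknown tools get the safest class
        if cls == "write":
            if self.write_budget <= 0:
                return {"ok": False, "error": "write_budget_exhausted"}
            self.write_budget -= 1
        return fn(args)
```

The point of classifying unknown tools as write is fail-closed behavior: a tool nobody registered should cost budget, not run for free.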

It is, mechanically, a very simple protocol. But it is the mechanism by which every production agent, every RAG system with routing, and every enterprise AI workflow runs.

10.1 The protocol

You give the model a list of tools — each described by a name, description, and input schema (JSON Schema). The model either:

  1. answers directly in text, or
  2. emits a tool call: the name of one of your tools, plus arguments conforming to its schema.

In the second case, you execute the tool in your code and feed the result back as a tool result message. The model continues the conversation with that new information.

sequenceDiagram
    participant U as User
    participant A as Your app
    participant M as Model
    participant T as Tool (DB / API)
    U->>A: What is the P99 for /checkout?
    A->>M: prompt + tool schemas
    M-->>A: tool_call(query_prometheus, {q: "..."})
    A->>T: HTTP call
    T-->>A: 342 ms
    A->>M: tool_result: 342 ms
    M-->>A: P99 is 342 ms, up 18% WoW.
    A-->>U: P99 is 342 ms, up 18% WoW.
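Concretely, the round-trip in the diagram rides on two message shapes. In Anthropic-style wire format (the id and values below are illustrative), the assistant turn carries a tool_use block and your reply carries a tool_result block that references it by id:

```python
# The two payloads that carry one tool round-trip (Anthropic-style field
# names; the id and values are illustrative).
assistant_turn = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_01A",
        "name": "query_prometheus",
        "input": {"q": "..."},
    }],
}

user_reply = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_01A",  # must match the tool_use id above
        "content": "342 ms",
    }],
}
```

The id pairing is what lets the model match results to requests when it issued several calls in one turn.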

10.2 The schema

A tool definition looks like this (Anthropic's field names; OpenAI and Gemini use the same ingredients under slightly different keys, e.g. parameters instead of input_schema):

{
  "name": "get_order",
  "description": "Fetch an order by ID. Returns order state, line items, and customer.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "UUID or human-readable ID (ORD-12345)"
      },
      "include_items": {
        "type": "boolean",
        "default": true
      }
    },
    "required": ["order_id"]
  }
}

Two things to notice:

  1. The description field is not metadata — it's the prompt. The model chooses which tool to call based on the name and descriptions. Invest as much craft in writing these as you would in writing a function docstring read by a junior engineer.
  2. The schema is enforced (with modern providers, via constrained decoding). You do not need to defensively parse malformed JSON.
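One caveat the schema hides: the default on include_items is documentation, not behavior — providers generally do not inject schema defaults into the arguments, so your handler should apply them itself. A sketch of a handler for the get_order schema above (the fetch stub stands in for a real order lookup and is an assumption):

```python
def handle_get_order(args, fetch=lambda oid: {"id": oid, "state": "shipped",
                                              "items": ["widget"]}):
    """Handler for the get_order tool. The provider guarantees `args`
    matches the schema, but JSON Schema defaults are advisory --
    apply them in your own code."""
    order_id = args["order_id"]                      # required by the schema
    include_items = args.get("include_items", True)  # apply the default here
    order = fetch(order_id)
    if not include_items:
        order.pop("items", None)
    return order
```
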

10.3 Parallel tool use

Modern models (Claude 3.5+, GPT-4 Turbo+, Gemini 2+) can request multiple tool calls in a single turn. Design your runtime to execute them in parallel.

flowchart LR
    M[Model] --> T1[Tool A: get_user]
    M --> T2[Tool B: get_account]
    M --> T3[Tool C: get_history]
    T1 -.parallel.-> R[Combine results]
    T2 -.parallel.-> R
    T3 -.parallel.-> R
    R --> M

Real example: an "account summary" agent might, in one turn, call get_user, get_balance, get_recent_transactions, and get_notifications in parallel. Sequential execution turns a 2-second operation into an 8-second one.
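A runtime sketch of that fan-out using a thread pool. The stub tools here are illustrative stand-ins for real API calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub tools standing in for real network calls (illustrative).
TOOL_FNS = {
    "get_user": lambda args: {"name": "Ava"},
    "get_account": lambda args: {"balance": 120},
    "get_history": lambda args: {"orders": 4},
}

def run_parallel(tool_uses):
    """Execute a batch of (name, args) tool calls concurrently.
    Order is preserved so each result can be matched back to its
    tool_use id when building the tool_result messages."""
    with ThreadPoolExecutor(max_workers=len(tool_uses)) as pool:
        futures = [pool.submit(TOOL_FNS[name], args) for name, args in tool_uses]
        return [f.result() for f in futures]
```

Threads are the right default here because tool calls are I/O-bound; an asyncio variant works equally well if your tool clients are async.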

10.4 Tool design principles

Agents live or die by tool design. A few hard-won rules:

  1. One tool, one job. run_sql that does everything is a footgun. get_customer_by_email, list_orders_by_customer, cancel_order — three safe, typed tools.
  2. Descriptions as prompts. Write them like you're onboarding a new hire. Mention edge cases, return format, failure modes.
  3. Idempotency and safety classes. Mark tools as read-only, write, or destructive. Gate the destructive ones behind confirmations.
  4. Typed outputs. Return JSON, not free text. Future you will thank present you.
  5. Rich structured errors. { "ok": false, "error": "order_not_found", "suggestion": "try search_orders_by_email" } — the model will follow the suggestion.
  6. Bounded outputs. If a tool can return 10k rows, paginate or summarize. Don't flood the context.
  7. Stable interfaces. Once an agent is trained (or a prompt is tuned), renaming a tool mid-flight breaks everything silently.
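Principles 4 and 5 combine naturally in a small wrapper that turns exceptions into typed, suggestion-bearing errors. A sketch (the error code and suggestion text are illustrative):

```python
import json

def tool_result(fn, args, suggestion=None):
    """Run a tool and always return a JSON string with a stable shape:
    {"ok": true, "data": ...} on success, a structured error otherwise."""
    try:
        return json.dumps({"ok": True, "data": fn(args)})
    except KeyError as e:
        return json.dumps({
            "ok": False,
            "error": "not_found",
            "detail": str(e),
            "suggestion": suggestion,  # e.g. "try search_orders_by_email"
        })

orders = {"ORD-1": {"state": "shipped"}}
good = tool_result(lambda a: orders[a["order_id"]], {"order_id": "ORD-1"})
bad = tool_result(lambda a: orders[a["order_id"]], {"order_id": "ORD-9"},
                  suggestion="try search_orders_by_email")
```

Because the shape is stable, the model learns it after one example in the description and reliably follows the suggestion field.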

10.5 An end-to-end example (Python, Anthropic)

from anthropic import Anthropic
import orders_api, customers_api

client = Anthropic()

TOOLS = [
    {
        "name": "get_customer",
        "description": "Look up a customer by email address.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string", "format": "email"}},
            "required": ["email"],
        },
    },
    {
        "name": "list_orders",
        "description": "List the N most recent orders for a customer ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "limit": {"type": "integer", "default": 5, "maximum": 50},
            },
            "required": ["customer_id"],
        },
    },
]

TOOL_FNS = {
    "get_customer": lambda args: customers_api.by_email(args["email"]),
    "list_orders": lambda args: orders_api.list_for(args["customer_id"], args.get("limit", 5)),
}

def ask(question, max_turns=8):
    msgs = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        r = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            tools=TOOLS,
            messages=msgs,
        )
        msgs.append({"role": "assistant", "content": r.content})

        tool_uses = [b for b in r.content if b.type == "tool_use"]
        if not tool_uses:
            return next(b.text for b in r.content if b.type == "text")

        results = []
        for tu in tool_uses:
            try:
                result = TOOL_FNS[tu.name](tu.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": str(result),
                })
            except Exception as e:
                results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": f"error: {e}",
                    "is_error": True,
                })
        msgs.append({"role": "user", "content": results})
    raise RuntimeError("turn budget exceeded")

print(ask("How many orders has ava@example.com placed this year?"))

Run that and you have a working, production-shaped tool-using agent in ~50 lines.

10.6 Tool use in Java

Spring AI makes this nearly identical:

@Tool(description = "Fetch an order by ID")
public Order getOrder(String orderId) {
    return orderService.findById(orderId);
}

// In your ChatClient:
String answer = chatClient.prompt()
    .user(question)
    .tools(new OrderTools())
    .call()
    .content();

Spring AI handles the JSON schema generation (from annotations + types), the dispatch, and the round-trip. LangChain4j offers the same ergonomics.

10.7 Safety: tool use is where AI meets real systems

Nearly every serious AI security incident of 2024–2025 involved tool use in some form. The model isn't malicious; its inputs are. Some defense basics:

  1. Least privilege. Give the agent only the tools the task needs, with the narrowest credentials that work.
  2. Enforce safety classes in code, not prompts. Read-only tools run freely; write tools consume a budget; destructive tools require explicit confirmation (see 10.4).
  3. Treat tool results as untrusted input. Anything a tool fetches from the outside world can carry a prompt injection; fetched text should never, on its own, authorize further actions.
  4. Log everything. Record every tool call and result so an incident can be reconstructed after the fact.

10.8 Structured outputs vs tool use — same mechanism, different framing

Most providers expose "structured output" (return JSON matching a schema) as either a dedicated feature or a degenerate case of tool use (one tool, and the model must call it). Use whichever feels cleaner: a dedicated structured-output mode when you always want JSON back, forced tool use when you want to reuse the schema machinery you already have for tools.
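The "degenerate case" looks like this with Anthropic's API: one tool whose input schema is the output you want, plus tool_choice forcing the model to call it. The extract_invoice tool and its fields are illustrative, and no request is sent here — this only shows the payload shape:

```python
# Structured output as forced tool use: one tool, and the model must call it.
# Tool name and fields are illustrative; this builds the request dict only.
extract_tool = {
    "name": "extract_invoice",
    "description": "Record the structured fields of an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total_cents": {"type": "integer"},
        },
        "required": ["vendor", "total_cents"],
    },
}

request = {
    "model": "claude-opus-4-7",  # same model string as the 10.5 example
    "max_tokens": 256,
    "tools": [extract_tool],
    # Force the model to call this specific tool rather than answer in text.
    "tool_choice": {"type": "tool", "name": "extract_invoice"},
    "messages": [{"role": "user", "content": "Invoice: Acme, $12.50"}],
}
```

The response's tool_use block then carries your JSON, already schema-conformant, with no free text to strip.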

10.9 Tool orchestration patterns

A few patterns you'll reach for repeatedly:

  1. The agent loop. Call the model, execute any requested tools, append results, repeat until the model answers in text — exactly the ask() function in 10.5.
  2. Parallel fan-out. Execute independent tool calls concurrently and combine the results before the next model turn (10.3).
  3. Forced single tool. Constrain the model to exactly one tool to get guaranteed structured output (10.8).

10.10 The MCP shift

In late 2024, Anthropic released MCP (Model Context Protocol) — a standardized way to expose tools that works across clients. Instead of re-implementing tools per client, you write one MCP server and any MCP-compatible client (Claude Desktop, Cursor, ChatGPT, Cowork, your own app) can use it. Chapter 12 dives in.

Further reading & watching