Chapter 10 · Tool Use & Function Calling

The API contract that turned LLMs into software.


A pure LLM is a wordsmith in a sealed room. It can't look anything up, can't compute, can't act. Tool use — also called function calling — is how we let the model reach out of the room and touch the world.

In plain English. A tool is a function the model can call by name. You define what tools exist; the model decides when to use them; your code runs them.

A tour of useful tool categories

mindmap
  root((Tools you give an agent))
    Read
      Database query
      Web search
      File read
      Vector search
      Logs and metrics
    Write
      Database write
      Email send
      Ticket create
      Code edit
      Git commit
    Compute
      Calculator
      Code runner
      Image generator
      Embedding
    Communicate
      Slack message
      User question
      Approval request
    Control
      Wait
      Retry
      Hand off
      Stop

A useful design rule: read tools should be free; write tools should require a budget; control tools should always be available.
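That rule can be enforced mechanically in the dispatch layer rather than trusted to the model. A minimal sketch, assuming illustrative tool names and a made-up per-conversation budget:

```python
# Budget-gated dispatcher: read tools run freely, write tools consume a
# budget, control tools always run. Tool names and the budget size are
# illustrative assumptions, not part of any provider API.
TOOL_CLASS = {
    "query_db": "read",
    "send_email": "write",
    "stop": "control",
}

class Dispatcher:
    def __init__(self, write_budget=3):
        self.write_budget = write_budget

    def dispatch(self, name, fn, args):
        cls = TOOL_CLASS.get(name, "write")  # unknown tools get the safest class
        if cls == "write":
            if self.write_budget <= 0:
                return {"ok": False, "error": "write_budget_exhausted"}
            self.write_budget -= 1
        return fn(args)
```

The point of classifying unknown tools as write is fail-closed behavior: a tool nobody registered should cost budget, not run for free.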

It is, mechanically, a very simple protocol. But it is the mechanism by which every production agent, every RAG system with routing, and every enterprise AI workflow runs.

10.1 The protocol

You give the model a list of tools — each described by a name, description, and input schema (JSON Schema). The model either:

  1. answers directly in text, or
  2. emits a tool call: the name of one of your tools, plus arguments conforming to its schema.

In the second case, you execute the tool in your code and feed the result back as a tool result message. The model continues the conversation with that new information.

sequenceDiagram
    participant U as User
    participant A as Your app
    participant M as Model
    participant T as Tool (DB / API)
    U->>A: What is the P99 for /checkout?
    A->>M: prompt + tool schemas
    M-->>A: tool_call(query_prometheus, {q: "..."})
    A->>T: HTTP call
    T-->>A: 342 ms
    A->>M: tool_result: 342 ms
    M-->>A: P99 is 342 ms, up 18% WoW.
    A-->>U: P99 is 342 ms, up 18% WoW.
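Concretely, the round-trip in the diagram rides on two message shapes. In Anthropic-style wire format (the id and values below are illustrative), the assistant turn carries a tool_use block and your reply carries a tool_result block that references it by id:

```python
# The two payloads that carry one tool round-trip (Anthropic-style field
# names; the id and values are illustrative).
assistant_turn = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_01A",
        "name": "query_prometheus",
        "input": {"q": "..."},
    }],
}

user_reply = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_01A",  # must match the tool_use id above
        "content": "342 ms",
    }],
}
```

The id pairing is what lets the model match results to requests when it issued several calls in one turn.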

10.2 The schema

A tool definition looks like this (Anthropic's field names; OpenAI and Gemini use the same ingredients under slightly different keys, e.g. parameters instead of input_schema):

{
  "name": "get_order",
  "description": "Fetch an order by ID. Returns order state, line items, and customer.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "UUID or human-readable ID (ORD-12345)"
      },
      "include_items": {
        "type": "boolean",
        "default": true
      }
    },
    "required": ["order_id"]
  }
}

Two things to notice:

  1. The description field is not metadata — it's the prompt. The model chooses which tool to call based on the name and descriptions. Invest as much craft in writing these as you would in writing a function docstring read by a junior engineer.
  2. The schema is enforced (with modern providers, via constrained decoding). You do not need to defensively parse malformed JSON.
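One caveat the schema hides: the default on include_items is documentation, not behavior — providers generally do not inject schema defaults into the arguments, so your handler should apply them itself. A sketch of a handler for the get_order schema above (the fetch stub stands in for a real order lookup and is an assumption):

```python
def handle_get_order(args, fetch=lambda oid: {"id": oid, "state": "shipped",
                                              "items": ["widget"]}):
    """Handler for the get_order tool. The provider guarantees `args`
    matches the schema, but JSON Schema defaults are advisory --
    apply them in your own code."""
    order_id = args["order_id"]                      # required by the schema
    include_items = args.get("include_items", True)  # apply the default here
    order = fetch(order_id)
    if not include_items:
        order.pop("items", None)
    return order
```
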

10.3 Parallel tool use

Modern models (Claude 3.5+, GPT-4 Turbo+, Gemini 2+) can request multiple tool calls in a single turn. Design your runtime to execute them in parallel.

flowchart LR
    M[Model] --> T1[Tool A: get_user]
    M --> T2[Tool B: get_account]
    M --> T3[Tool C: get_history]
    T1 -.parallel.-> R[Combine results]
    T2 -.parallel.-> R
    T3 -.parallel.-> R
    R --> M

Real example: an "account summary" agent might, in one turn, call get_user, get_balance, get_recent_transactions, and get_notifications in parallel. Sequential execution turns a 2-second operation into an 8-second one.
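A runtime sketch of that fan-out using a thread pool. The stub tools here are illustrative stand-ins for real API calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub tools standing in for real network calls (illustrative).
TOOL_FNS = {
    "get_user": lambda args: {"name": "Ava"},
    "get_account": lambda args: {"balance": 120},
    "get_history": lambda args: {"orders": 4},
}

def run_parallel(tool_uses):
    """Execute a batch of (name, args) tool calls concurrently.
    Order is preserved so each result can be matched back to its
    tool_use id when building the tool_result messages."""
    with ThreadPoolExecutor(max_workers=len(tool_uses)) as pool:
        futures = [pool.submit(TOOL_FNS[name], args) for name, args in tool_uses]
        return [f.result() for f in futures]
```

Threads are the right default here because tool calls are I/O-bound; an asyncio variant works equally well if your tool clients are async.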

10.4 Tool design principles

Agents live or die by tool design. A few hard-won rules:

  1. One tool, one job. run_sql that does everything is a footgun. get_customer_by_email, list_orders_by_customer, cancel_order — three safe, typed tools.
  2. Descriptions as prompts. Write them like you're onboarding a new hire. Mention edge cases, return format, failure modes.
  3. Idempotency and safety classes. Mark tools as read-only, write, or destructive. Gate the destructive ones behind confirmations.
  4. Typed outputs. Return JSON, not free text. Future you will thank present you.
  5. Rich structured errors. { "ok": false, "error": "order_not_found", "suggestion": "try search_orders_by_email" } — the model will follow the suggestion.
  6. Bounded outputs. If a tool can return 10k rows, paginate or summarize. Don't flood the context.
  7. Stable interfaces. Once an agent is trained (or a prompt is tuned), renaming a tool mid-flight breaks everything silently.
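Principles 4 and 5 combine naturally in a small wrapper that turns exceptions into typed, suggestion-bearing errors. A sketch (the error code and suggestion text are illustrative):

```python
import json

def tool_result(fn, args, suggestion=None):
    """Run a tool and always return a JSON string with a stable shape:
    {"ok": true, "data": ...} on success, a structured error otherwise."""
    try:
        return json.dumps({"ok": True, "data": fn(args)})
    except KeyError as e:
        return json.dumps({
            "ok": False,
            "error": "not_found",
            "detail": str(e),
            "suggestion": suggestion,  # e.g. "try search_orders_by_email"
        })

orders = {"ORD-1": {"state": "shipped"}}
good = tool_result(lambda a: orders[a["order_id"]], {"order_id": "ORD-1"})
bad = tool_result(lambda a: orders[a["order_id"]], {"order_id": "ORD-9"},
                  suggestion="try search_orders_by_email")
```

Because the shape is stable, the model learns it after one example in the description and reliably follows the suggestion field.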

10.5 An end-to-end example (Python, Anthropic)

from anthropic import Anthropic
import orders_api, customers_api

client = Anthropic()

TOOLS = [
    {
        "name": "get_customer",
        "description": "Look up a customer by email address.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string", "format": "email"}},
            "required": ["email"],
        },
    },
    {
        "name": "list_orders",
        "description": "List the N most recent orders for a customer ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "limit": {"type": "integer", "default": 5, "maximum": 50},
            },
            "required": ["customer_id"],
        },
    },
]

TOOL_FNS = {
    "get_customer": lambda args: customers_api.by_email(args["email"]),
    "list_orders": lambda args: orders_api.list_for(args["customer_id"], args.get("limit", 5)),
}

def ask(question, max_turns=8):
    msgs = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        r = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            tools=TOOLS,
            messages=msgs,
        )
        msgs.append({"role": "assistant", "content": r.content})

        tool_uses = [b for b in r.content if b.type == "tool_use"]
        if not tool_uses:
            return next(b.text for b in r.content if b.type == "text")

        results = []
        for tu in tool_uses:
            try:
                result = TOOL_FNS[tu.name](tu.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": str(result),
                })
            except Exception as e:
                results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": f"error: {e}",
                    "is_error": True,
                })
        msgs.append({"role": "user", "content": results})
    raise RuntimeError("turn budget exceeded")

print(ask("How many orders has ava@example.com placed this year?"))

Run that and you have a working, production-shaped tool-using agent in ~50 lines.

10.6 Tool use in Java

Spring AI makes this nearly identical:

@Tool(description = "Fetch an order by ID")
public Order getOrder(String orderId) {
    return orderService.findById(orderId);
}

// In your ChatClient:
String answer = chatClient.prompt()
    .user(question)
    .tools(new OrderTools())
    .call()
    .content();

Spring AI handles the JSON schema generation (from annotations + types), the dispatch, and the round-trip. LangChain4j offers the same ergonomics.

10.7 Safety: tool use is where AI meets real systems

Nearly every serious AI security incident of 2024–2025 involved tool use in some form. The model isn't malicious; its inputs are. Some defense basics:

  1. Least privilege. Give the agent only the tools the task needs, with the narrowest credentials that work.
  2. Enforce safety classes in code, not prompts. Read-only tools run freely; write tools consume a budget; destructive tools require explicit confirmation (see 10.4).
  3. Treat tool results as untrusted input. Anything a tool fetches from the outside world can carry a prompt injection; fetched text should never, on its own, authorize further actions.
  4. Log everything. Record every tool call and result so an incident can be reconstructed after the fact.

10.8 Structured outputs vs tool use — same mechanism, different framing

Most providers expose "structured output" (return JSON matching a schema) as either a dedicated feature or a degenerate case of tool use (one tool, and the model must call it). Use whichever feels cleaner: a dedicated structured-output mode when you always want JSON back, forced tool use when you want to reuse the schema machinery you already have for tools.
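The "degenerate case" looks like this with Anthropic's API: one tool whose input schema is the output you want, plus tool_choice forcing the model to call it. The extract_invoice tool and its fields are illustrative, and no request is sent here — this only shows the payload shape:

```python
# Structured output as forced tool use: one tool, and the model must call it.
# Tool name and fields are illustrative; this builds the request dict only.
extract_tool = {
    "name": "extract_invoice",
    "description": "Record the structured fields of an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total_cents": {"type": "integer"},
        },
        "required": ["vendor", "total_cents"],
    },
}

request = {
    "model": "claude-opus-4-7",  # same model string as the 10.5 example
    "max_tokens": 256,
    "tools": [extract_tool],
    # Force the model to call this specific tool rather than answer in text.
    "tool_choice": {"type": "tool", "name": "extract_invoice"},
    "messages": [{"role": "user", "content": "Invoice: Acme, $12.50"}],
}
```

The response's tool_use block then carries your JSON, already schema-conformant, with no free text to strip.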

10.9 Tool orchestration patterns

A few patterns you'll reach for repeatedly:

  1. The agent loop. Call the model, execute any requested tools, append results, repeat until the model answers in text — exactly the ask() function in 10.5.
  2. Parallel fan-out. Execute independent tool calls concurrently and combine the results before the next model turn (10.3).
  3. Forced single tool. Constrain the model to exactly one tool to get guaranteed structured output (10.8).

10.10 The MCP shift

In late 2024, Anthropic released MCP (Model Context Protocol) — a standardized way to expose tools that works across clients. Instead of re-implementing tools per client, you write one MCP server and any MCP-compatible client (Claude Desktop, Cursor, ChatGPT, Cowork, your own app) can use it. Chapter 12 dives in.

Further reading & watching