The API contract that turned LLMs into software.
A pure LLM is a wordsmith in a sealed room. It can't look anything up, can't compute, can't act. Tool use — also called function calling — is how we let the model reach out of the room and touch the world.
In plain English: a tool is a function the model can call by name. You define what tools exist; the model decides when to use them; your code runs them.
mindmap
  root((Tools you give an agent))
    Read
      Database query
      Web search
      File read
      Vector search
      Logs and metrics
    Write
      Database write
      Email send
      Ticket create
      Code edit
      Git commit
    Compute
      Calculator
      Code runner
      Image generator
      Embedding
    Communicate
      Slack message
      User question
      Approval request
    Control
      Wait
      Retry
      Hand off
      Stop
A useful design rule: read tools should be free; write tools should require a budget; control tools should always be available.
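A minimal sketch of that rule; the tool names, categories, and budget handling here are illustrative assumptions, not any SDK's API:

# Sketch of the read/write/control rule. Unknown tools are treated
# as writes, the conservative default.
TOOL_POLICY = {
    "query_database": "read",     # free: call as often as needed
    "send_email":     "write",    # costs one unit of the write budget
    "stop":           "control",  # always allowed
}

def allow_call(tool_name: str, write_budget: int) -> tuple[bool, int]:
    """Return (allowed, remaining_write_budget) for a proposed tool call."""
    category = TOOL_POLICY.get(tool_name, "write")
    if category in ("read", "control"):
        return True, write_budget
    if write_budget > 0:
        return True, write_budget - 1
    return False, write_budget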
It is, mechanically, a very simple protocol. But it is the mechanism by which every production agent, every RAG system with routing, and every enterprise AI workflow runs.
You give the model a list of tools — each described by a name, description, and input schema (JSON Schema). The model either:

- answers directly in text, because no tool is needed, or
- emits a tool call: the name of one of your tools plus arguments matching its schema.

When it calls a tool, you execute the tool in your code and feed the result back as a tool result message. The model continues the conversation with that new information.
sequenceDiagram
participant U as User
participant A as Your app
participant M as Model
participant T as Tool (DB / API)
U->>A: What is the P99 for /checkout?
A->>M: prompt + tool schemas
M-->>A: tool_call(query_prometheus, {q: "..."})
A->>T: HTTP call
T-->>A: 342 ms
A->>M: tool_result: 342 ms
M-->>A: P99 is 342 ms, up 18% WoW.
A-->>U: P99 is 342 ms, up 18% WoW.
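Concretely, that round-trip is carried by two message blocks, shown here in the Anthropic shape (the ID is illustrative; the API assigns it):

# The assistant turn carries a tool_use block instead of (or alongside) text.
assistant_turn = {
    "role": "assistant",
    "content": [{
        "type": "tool_use",
        "id": "toolu_abc123",        # illustrative ID
        "name": "query_prometheus",
        "input": {"q": "..."},
    }],
}

# Your app runs the tool, then replies with a tool_result tied to that ID.
tool_result_turn = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_abc123",
        "content": "342 ms",
    }],
}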
A tool definition looks like this (shared shape across Claude, OpenAI, Gemini, and others):
{
  "name": "get_order",
  "description": "Fetch an order by ID. Returns order state, line items, and customer.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "UUID or human-readable ID (ORD-12345)"
      },
      "include_items": {
        "type": "boolean",
        "default": true
      }
    },
    "required": ["order_id"]
  }
}
Two things to notice:

- The description field is not metadata — it's the prompt. The model chooses which tool to call based on each tool's name and description. Invest as much craft in writing these as you would in writing a function docstring read by a junior engineer (a weak-versus-strong contrast is sketched after this list).
- Modern models (Claude 3.5+, GPT-4 Turbo+, Gemini 2+) can request multiple tool calls in a single turn. Design your runtime to execute them in parallel, as in the flow below.
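A hypothetical contrast; the weak description gives the model nothing to route on:

# Weak: no hint about when this tool applies or what comes back.
weak = {"name": "get_order", "description": "Gets order data."}

# Strong: states what it returns, echoing the definition above,
# so the model knows when the tool applies.
strong = {
    "name": "get_order",
    "description": "Fetch an order by ID. Returns order state, "
                   "line items, and customer.",
}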
flowchart LR
M[Model] --> T1[Tool A: get_user]
M --> T2[Tool B: get_account]
M --> T3[Tool C: get_history]
T1 -.parallel.-> R[Combine results]
T2 -.parallel.-> R
T3 -.parallel.-> R
R --> M
Real example: an "account summary" agent might, in one turn, call get_user, get_balance, get_recent_transactions, and get_notifications in parallel. Sequential execution turns a 2-second operation into an 8-second one.
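A minimal sketch of that fan-out with a thread pool; tool_fns is the name-to-function dispatch table, like the TOOL_FNS dict in the full example further down:

from concurrent.futures import ThreadPoolExecutor

def run_tool_calls(tool_uses, tool_fns):
    """Run all tool calls from one model turn concurrently.

    tool_uses: the tool_use blocks from the model's response.
    tool_fns:  dict of tool name -> callable.
    Returns tool_result blocks in the same order as the requests.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tool_uses))) as pool:
        outputs = list(pool.map(lambda tu: tool_fns[tu.name](tu.input), tool_uses))
    return [
        {"type": "tool_result", "tool_use_id": tu.id, "content": str(out)}
        for tu, out in zip(tool_uses, outputs)
    ]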
Agents live or die by tool design. A few hard-won rules:

- Prefer narrow, typed tools over one general one: a run_sql that does everything is a footgun. get_customer_by_email, list_orders_by_customer, cancel_order — three safe, typed tools.
- Classify every tool as read-only, write, or destructive. Gate the destructive ones behind confirmations.
- Return structured errors, not stack traces: { "ok": false, "error": "order_not_found", "suggestion": "try search_orders_by_email" } — the model will follow the suggestion.

Here is the whole loop, end to end:

from anthropic import Anthropic
import orders_api, customers_api  # your own service clients

client = Anthropic()

TOOLS = [
    {
        "name": "get_customer",
        "description": "Look up a customer by email address.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string", "format": "email"}},
            "required": ["email"],
        },
    },
    {
        "name": "list_orders",
        "description": "List the N most recent orders for a customer ID.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "limit": {"type": "integer", "default": 5, "maximum": 50},
            },
            "required": ["customer_id"],
        },
    },
]

# Dispatch table: tool name -> the function that actually does the work.
TOOL_FNS = {
    "get_customer": lambda args: customers_api.by_email(args["email"]),
    "list_orders": lambda args: orders_api.list_for(args["customer_id"], args.get("limit", 5)),
}

def ask(question, max_turns=8):
    msgs = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        r = client.messages.create(
            model="claude-opus-4-7",
            max_tokens=1024,
            tools=TOOLS,
            messages=msgs,
        )
        # Keep the assistant turn (text and/or tool_use blocks) in history.
        msgs.append({"role": "assistant", "content": r.content})
        tool_uses = [b for b in r.content if b.type == "tool_use"]
        if not tool_uses:
            # No tool calls: the model has answered in plain text.
            return next((b.text for b in r.content if b.type == "text"), "")
        # Run each requested tool (sequential here for brevity; see the
        # parallel sketch above) and answer every tool_use block by ID.
        results = []
        for tu in tool_uses:
            try:
                result = TOOL_FNS[tu.name](tu.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": str(result),
                })
            except Exception as e:
                # Report failures as error results so the model can retry
                # or choose another tool instead of the loop crashing.
                results.append({
                    "type": "tool_result",
                    "tool_use_id": tu.id,
                    "content": f"error: {e}",
                    "is_error": True,
                })
        # Tool results go back as a user turn; loop so the model continues.
        msgs.append({"role": "user", "content": results})
    raise RuntimeError("turn budget exceeded")

print(ask("How many orders has ava@example.com placed this year?"))
Run that and you have a working, production-shaped tool-using agent in ~50 lines.
Spring AI makes this nearly identical:
class OrderTools {

    // orderService is assumed to be injected or constructed elsewhere
    @Tool(description = "Fetch an order by ID")
    public Order getOrder(String orderId) {
        return orderService.findById(orderId);
    }
}

// In your ChatClient:
String answer = chatClient.prompt()
        .user(question)
        .tools(new OrderTools())
        .call()
        .content();
Spring AI handles the JSON schema generation (from annotations + types), the dispatch, and the round-trip. LangChain4j offers the same ergonomics.
Practically every serious AI security incident in 2024–2025 involved tool use of some kind. The model isn't malicious; its inputs are. Some defense basics:

- Least privilege: give each tool the narrowest credentials and scopes that still work.
- Treat tool output as untrusted input: a fetched web page or document can carry a prompt injection aimed at your other tools.
- Gate destructive tools behind human confirmation, per the read-only/write/destructive rule above.
- Enforce budgets (turns, spend, write operations) so a confused loop can't do unbounded damage.
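A sketch of the confirmation gate; the destructive-tool set and the confirm hook are illustrative assumptions, not part of any SDK:

# Hypothetical guard wrapped around the dispatch step of the agent loop above.
DESTRUCTIVE = {"cancel_order", "delete_customer"}  # illustrative tool names

def dispatch(tu, tool_fns, confirm):
    """confirm(name, args) -> bool is your approval hook: CLI prompt, Slack, UI."""
    if tu.name in DESTRUCTIVE and not confirm(tu.name, tu.input):
        # A structured refusal the model can act on, per the rules above.
        return {"ok": False, "error": "rejected_by_human",
                "suggestion": "explain why the action is needed, then ask again"}
    return tool_fns[tu.name](tu.input)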
Most providers expose "structured output" (return JSON matching a schema) as either a dedicated feature or a degenerate case of tool use (one tool, and the model must call it). Use whichever feels cleaner:
response_format={"type": "json_schema", ...}.A few patterns you'll reach for repeatedly:
A few patterns you'll reach for repeatedly:

- Version your tool names (get_order_v2) so models don't get confused when you change schemas mid-deployment.

In late 2024, Anthropic released MCP (Model Context Protocol) — a standardized way to expose tools that works across clients. Instead of re-implementing tools per client, you write one MCP server and any MCP-compatible client (Claude Desktop, Cursor, ChatGPT, Cowork, your own app) can use it. Chapter 12 dives in.
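As a taste before Chapter 12, here is a minimal MCP server using the official Python SDK's FastMCP helper; the tool itself is an illustrative stub:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")  # server name shown to clients

@mcp.tool()
def get_order(order_id: str) -> str:
    """Fetch an order by ID. Returns order state, line items, and customer."""
    # Stub; a real server would call your orders service here.
    return f"order {order_id}: status=shipped, items=2"

if __name__ == "__main__":
    mcp.run()  # serves over stdio; any MCP client can now call get_order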