A chatbot answers. An agent acts.
The leap from "a model that answers questions" to "a model that accomplishes tasks" is, arguably, the defining arc of 2023–2026. It is also where most production AI engineering effort now goes.
An agent is, in its simplest form, an LLM in a loop with tools and a goal. Nothing more.
In plain English: a chatbot is a pen pal; an agent is an intern with a credit card and a laptop.
stateDiagram-v2
[*] --> Goal
Goal --> Plan: read goal + memory
Plan --> Act: choose tool
Act --> Observe: execute
Observe --> Reflect: read result
Reflect --> Plan: not done
Reflect --> Verify: looks done
Verify --> Plan: failed checks
Verify --> [*]: passed
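The Verify step is the one most early demos skipped, and it needn't be elaborate. A minimal sketch, where the individual check functions are hypothetical stand-ins for whatever invariants your task has:

def verify(result, checks):
    # Gate the loop's exit: accept the result only if every programmatic
    # check passes; on failure, route the agent back to Plan.
    return all(check(result) for check in checks)

# Usage (hypothetical checks): verify(draft, [nonempty, has_citation])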
Real systems add three things to that loop: memory (persistent state across turns), safety checks (don't email the customer, don't drop the table), and budgets (no more than N tool calls, no more than M dollars).
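A budget can be as small as a counter with a ceiling. A minimal sketch, with illustrative names and limits (nothing here comes from a particular framework):

from dataclasses import dataclass

@dataclass
class Budget:
    max_steps: int = 10        # no more than N tool calls
    max_cost_usd: float = 1.0  # no more than M dollars
    steps: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd):
        # Call once per tool call; raises once either ceiling is hit.
        self.steps += 1
        self.cost_usd += cost_usd
        if self.steps > self.max_steps or self.cost_usd > self.max_cost_usd:
            raise RuntimeError("agent budget exhausted")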
flowchart LR
G[Goal] --> P[Plan / think]
P --> A[Act: call a tool]
A --> O[Observe result]
O --> D{Done?}
D -- no --> P
D -- yes --> R[Return result]
That loop — Observe, Orient, Decide, Act — is ancient in AI (it's Boyd's OODA loop, it's classical control theory, it's reinforcement learning). The Transformer-era twist is that the "Decide" node can be an LLM reading natural language and writing natural language.
Yao et al.'s ReAct paper (Oct 2022) showed that interleaving reasoning and action outperformed both pure reasoning (CoT) and pure action (tool use). The model alternates:
Thought: I need the current price of AAPL.
Action: get_quote("AAPL")
Observation: 198.42
Thought: The user asked in euros. I should convert.
Action: get_fx_rate("USD", "EUR")
Observation: 0.92
Thought: 198.42 * 0.92 = 182.55
Final Answer: AAPL is trading at roughly €182.55.
This Thought/Action/Observation format is the ancestor of every agent framework that followed. Modern agents dress it up (JSON instead of free text, parallel tool calls, typed schemas) but the core is unchanged.
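For contrast, here is roughly what the first Thought/Action step looks like in modern dress, sketched as Anthropic-style message dicts; get_quote is the illustrative tool from the trace above and the id is made up:

# The model's turn: reasoning text plus a typed tool call,
# expressed as structured content blocks instead of free text.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "I need the current price of AAPL."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_quote",
         "input": {"symbol": "AAPL"}},
    ],
}

# Our turn: the observation goes back as a tool_result block.
user_turn = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "198.42"},
    ],
}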
By spring 2023, agent demos were everywhere: AutoGPT, BabyAGI, and a wave of give-the-model-a-goal-and-walk-away repos. They were chaotic because the models looped on failing steps, hallucinated tool output, drowned their own context in accumulated history, and ran with no budgets or stop conditions. But they proved the shape.
flowchart TB
subgraph Runtime
L[LLM: reasoning core]
ST[Short-term state<br/>conversation + scratch]
MEM[Long-term memory<br/>vector + summaries]
end
subgraph Tools
T1[Retrieval]
T2[Code execution]
T3[HTTP / API]
T4[DB query]
T5[Filesystem]
T6[Browser]
T7[Domain tools]
end
G[Goal] --> L
L <--> ST
L <--> MEM
L --> T1
L --> T2
L --> T3
L --> T4
L --> T5
L --> T6
L --> T7
T1 --> L
T2 --> L
T3 --> L
T4 --> L
T5 --> L
T6 --> L
T7 --> L
L --> R[Result]
The common pattern is the diagram above: one reasoning core, a different toolbelt per domain. The categories that matter in 2026:
flowchart LR
subgraph Coding
A1[Claude Code]
A2[Cursor agent]
A3[Devin]
end
subgraph Computer-use
B1[Claude CU]
B2[OpenAI Operator]
B3[Project Mariner]
end
subgraph Research
C1[Gemini DR]
C2[ChatGPT DR]
C3[Perplexity]
end
subgraph Ops
D1[Resolve AI]
D2[Runbooks]
end
subgraph Business
E1[Sierra]
E2[Decagon]
E3[11x]
end
Don't. The Anthropic guidance (Building Effective Agents, 2024) is the single best piece of advice here:
Use the simplest thing that works.
Often you don't need an agent; you need a well-structured workflow with an LLM call or two. Full agentic autonomy (the model decides the number of steps and their order) is the most expensive, least predictable, and hardest-to-debug pattern. Use it only when the task is genuinely open-ended, the number of steps can't be enumerated in advance, and mistakes are cheap to detect and recover from.
When a deterministic workflow suffices, write the workflow. When an LLM routes between workflows, call it a "router." Reserve "agent" for the cases where the model really does need to make decisions inside a loop.
flowchart TD
A[Task] --> B{Bounded steps<br/>enumerable?}
B -- yes --> C[Workflow<br/>with LLM calls]
B -- no --> D{Safe to explore?}
D -- yes --> E[Agent]
D -- no --> F[Workflow with<br/>human checkpoints]
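To make the middle branch concrete: a router is one classification call followed by deterministic dispatch, with no loop and no autonomy. A sketch where classify_intent and the three handlers are hypothetical stand-ins for your own code:

def route(request):
    # One LLM call picks a label; plain code does everything else.
    intent = classify_intent(request, labels=["refund", "billing", "other"])
    if intent == "refund":
        return run_refund_workflow(request)
    if intent == "billing":
        return run_billing_workflow(request)
    return escalate_to_human(request)

The model only ever picks a branch, so you keep workflow-grade debuggability with an LLM-grade front door.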
The production art of agents, as learned the hard way: when a tool call fails, don't raise an exception into the void; return a structured error like {ok: false, error: "...", suggestion: "..."} as the observation, and the model self-corrects (a sketch of this wrapper follows the listing below).

Here is the shape. In production you'd use LangGraph, the OpenAI Agents SDK, or the Claude Agent SDK, but seeing it bare is worth a page.
from anthropic import Anthropic
client = Anthropic()
TOOLS = [
{
"name": "search",
"description": "Search the company wiki.",
"input_schema": {
"type": "object",
"properties": {"q": {"type": "string"}},
"required": ["q"],
},
},
{
"name": "finish",
"description": "Return the final answer to the user.",
"input_schema": {
"type": "object",
"properties": {"answer": {"type": "string"}},
"required": ["answer"],
},
},
]
def wiki_search(q):
    # Placeholder so the example runs; swap in a real wiki backend.
    return f"stub result for {q!r}"

def run_tool(name, args):
    if name == "search":
        return wiki_search(args["q"])
    if name == "finish":
        return {"done": True, "answer": args["answer"]}
    return {"error": f"unknown tool {name}"}
def agent(goal, max_steps=10):
messages = [{"role": "user", "content": goal}]
for step in range(max_steps):
resp = client.messages.create(
model="claude-opus-4-7",
max_tokens=1024,
tools=TOOLS,
messages=messages,
)
messages.append({"role": "assistant", "content": resp.content})
tool_uses = [b for b in resp.content if b.type == "tool_use"]
        if not tool_uses:
            # Model answered in plain text without calling a tool.
            return "".join(b.text for b in resp.content if b.type == "text")
results = []
for tu in tool_uses:
result = run_tool(tu.name, tu.input)
if tu.name == "finish":
return result["answer"]
results.append({
"type": "tool_result",
"tool_use_id": tu.id,
"content": str(result),
})
messages.append({"role": "user", "content": results})
raise RuntimeError("agent exceeded step budget")
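And the structured-error lesson from above, applied to this exact loop: a drop-in wrapper around run_tool (the error shape is illustrative):

def safe_run_tool(name, args):
    # Failures become observations the model can read, not exceptions
    # that kill the run; the model gets a chance to self-correct.
    try:
        return {"ok": True, "result": run_tool(name, args)}
    except Exception as e:
        return {"ok": False, "error": str(e),
                "suggestion": "check the arguments or try another tool"}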
Everything else — multi-agent, MCP, long-running workflows — is a variation of this loop.
If you're building an agent, the question to keep asking is: "would a junior engineer with this toolkit and these instructions succeed?" If the answer is "probably not," your agent won't either.