When one agent isn't enough — and the hazards of adding more.
A multi-agent system is a collection of agents that coordinate to solve a task that a single agent could not, or could not cleanly. Think specialists, not clones.
In plain English. A multi-agent system is the difference between a generalist intern and a small startup with founders, an engineer, a designer, and an ops person. Each role does less, but the team does more.
flowchart TB
subgraph A1[Single agent]
SA[Agent]
end
subgraph A2[Orchestrator + workers]
O[Orchestrator] --> W1[SQL]
O --> W2[Code]
O --> W3[Search]
end
subgraph A3[Hierarchical]
P[Planner] --> M1[Manager A] --> E1[Executor]
P --> M2[Manager B] --> E2[Executor]
end
subgraph A4[Critic / debate]
G[Generator] --> C[Critic]
C --> G
C --> J[Judge]
end
subgraph A5[Swarm / blackboard]
B[(Shared blackboard)]
Ag1[Agent] <--> B
Ag2[Agent] <--> B
Ag3[Agent] <--> B
end
In production, you mostly see the single-agent and orchestrator + workers patterns. The other patterns are real, but they pay off only on hard or open-ended tasks.
The most important thing about multi-agent systems is knowing when not to build one. We'll start there.
Every additional agent adds latency, token cost, new failure modes, and another surface to debug and evaluate.
Anthropic's rule, from Building Effective Agents, deserves to be tattooed on every AI engineer:
Use the simplest thing that works.
If a single agent with clear tools can do the job — even if the prompt is long — prefer that. Graduate to multi-agent only when there's a real reason.
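Before reaching for more agents, it's worth seeing how little a single agent with tools actually requires. A minimal sketch, where `call_llm` is a hypothetical stand-in for your model client (stubbed here so the loop is runnable), and the tool registry is the only structure involved:

```python
# Minimal single-agent tool loop. `call_llm` is a stand-in for a real
# model client; here it's stubbed so the control flow is runnable.

def get_time(_: str) -> str:
    """Example tool: return a fixed timestamp (stub for illustration)."""
    return "2026-01-01T00:00:00Z"

TOOLS = {"get_time": get_time}

def call_llm(messages):
    # Stub: a real implementation calls your model provider here.
    # We pretend the model asks for the tool once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_time", "args": ""}
    return {"answer": "It is 2026-01-01T00:00:00Z."}

def run_agent(user_msg: str, max_turns: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = call_llm(messages)
        if "answer" in reply:                         # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])  # run requested tool
        messages.append({"role": "tool", "content": result})
    return "gave up"
```

The whole "agent" is one loop and one dict of tools. That's the baseline any multi-agent design has to beat.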
When splitting does help, it tends to take one of five recurring shapes:
flowchart TB
subgraph pat1[Pattern 1: Router]
R1[Router] --> S1[Specialist A]
R1 --> S2[Specialist B]
R1 --> S3[Specialist C]
end
subgraph pat2[Pattern 2: Orchestrator-Worker]
O[Orchestrator] --> W1[Worker 1]
O --> W2[Worker 2]
O --> W3[Worker 3]
W1 --> O
W2 --> O
W3 --> O
end
subgraph pat3[Pattern 3: Planner-Executor-Reviewer]
P[Planner] --> E[Executor]
E --> V[Reviewer / Critic]
V -->|retry| E
V -->|ok| OUT[Done]
end
subgraph pat4[Pattern 4: Debate]
A1[Agent A] --- A2[Agent B]
A1 --> J[Judge]
A2 --> J
end
subgraph pat5[Pattern 5: Hierarchical]
H[Top planner] --> M1[Manager]
M1 --> EX1[Executor]
M1 --> EX2[Executor]
end
Pattern 1: Router. A thin LLM call selects which downstream agent / prompt / model / workflow to use. Everything else is vanilla. Most common pattern in production.
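A router can be sketched in a few lines. `classify` stands in for the thin LLM call (an assumption, not a real API); here it's a keyword stub so the shape is runnable:

```python
# Router sketch: one cheap classification picks a downstream specialist.
# The specialists themselves are stubs standing in for full agents.

SPECIALISTS = {
    "sql": lambda q: f"[sql agent] {q}",
    "code": lambda q: f"[code agent] {q}",
    "search": lambda q: f"[search agent] {q}",
}

def classify(query: str) -> str:
    # In production this is a small, fast model returning one label.
    if "SELECT" in query.upper():
        return "sql"
    if "def " in query or "bug" in query:
        return "code"
    return "search"

def route(query: str) -> str:
    label = classify(query)
    return SPECIALISTS[label](query)
```

Note that the router itself carries no state and no tools; its only job is to emit a label.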
Pattern 2: Orchestrator-worker. An orchestrator agent decomposes a task into parallel subtasks, dispatches them to workers, and synthesizes the results. Anthropic's own research system uses this shape — one "lead" agent spawning search subagents.
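The decompose / fan-out / synthesize cycle can be sketched with a thread pool. `plan` and `worker` are illustrative stubs, not a real agent API; in production both would be LLM calls:

```python
# Orchestrator-worker sketch: decompose, fan out in parallel, synthesize.
from concurrent.futures import ThreadPoolExecutor

def plan(task: str) -> list[str]:
    # A real orchestrator uses an LLM call to decompose the task.
    return [f"{task}: subtopic {i}" for i in range(3)]

def worker(subtask: str) -> str:
    # Each worker is its own agent with its own tools and context.
    return f"findings for ({subtask})"

def orchestrate(task: str) -> str:
    subtasks = plan(task)
    with ThreadPoolExecutor() as pool:   # dispatch workers in parallel
        results = list(pool.map(worker, subtasks))
    return "\n".join(results)            # synthesize into one answer
```

The parallelism is the point: if the subtasks aren't independent, this pattern buys you nothing over a single agent.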
Pattern 3: Planner-executor-reviewer. One agent plans, one executes, one critiques, looping until the reviewer approves. Used in coding agents, writing pipelines, and scientific-style workflows.
Pattern 4: Debate. Two agents argue different sides; a third judges. Expensive but useful for high-stakes reasoning (constitutional AI research, hard math, adversarial evaluation).
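Structurally, debate is just two generators and a gate. A sketch with stub agents (each stands in for a separate LLM call; the length-based judge is purely illustrative):

```python
# Debate sketch: two agents produce opposing answers; a judge picks one.

def agent_a(question: str) -> str:
    return f"A argues yes to: {question}"

def agent_b(question: str) -> str:
    return f"B argues no to: {question}"

def judge(pro: str, con: str) -> str:
    # A real judge is a third LLM scoring both transcripts against
    # criteria; comparing lengths here just makes the stub deterministic.
    return pro if len(pro) >= len(con) else con

def debate(question: str) -> str:
    pro, con = agent_a(question), agent_b(question)
    return judge(pro, con)
```

Note the cost profile: every question pays for three full model calls before you see an answer, which is why this stays reserved for high-stakes work.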
Pattern 5: Hierarchical. Planner → managers → executors. Scales to long, tree-shaped tasks. Watch out for compounding latency.
Anthropic's public write-up of their research agent gives a clean shape:
flowchart TB
U[User asks complex question] --> L[Lead agent]
L --> P[Plan: decompose into subqueries]
P --> S1[Subagent: topic A]
P --> S2[Subagent: topic B]
P --> S3[Subagent: topic C]
S1 --> SUM[Summaries + citations]
S2 --> SUM
S3 --> SUM
SUM --> L
L --> W["Writer agent<br/>compose final answer"]
W --> C[Citation checker agent]
C --> ANS[Final answer with sources]
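The shape above can be sketched end to end: fan out subqueries, collect summaries with citations, compose, then gate on a citation check. Everything here is an illustrative stub (the function names and example.com URLs are assumptions, not Anthropic's implementation):

```python
# Research-agent sketch: subagents in parallel, then write, then check.
from concurrent.futures import ThreadPoolExecutor

def search_subagent(subquery: str) -> dict:
    # Stub: a real subagent searches and summarizes with its own tools.
    return {"summary": f"notes on {subquery}",
            "source": f"https://example.com/{subquery}"}

def writer(findings: list[dict]) -> str:
    # Stub for the writer agent: compose from summaries, keep sources.
    return "\n".join(f"{f['summary']} [{f['source']}]" for f in findings)

def citation_checker(answer: str) -> bool:
    # Stub: every line of the draft must carry a bracketed source.
    return all("[" in line and "]" in line for line in answer.splitlines())

def research(question: str, topics: list[str]) -> str:
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(search_subagent, topics))
    answer = writer(findings)
    assert citation_checker(answer), "draft missing citations"
    return answer
```

The citation checker as a separate, final gate is the detail worth copying: it fails loudly instead of shipping an unsourced answer.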
Key choices:
That architecture took Anthropic from 45% to 90%+ on their internal research benchmark.
Multi-agent systems need shared state: some substrate on which agents agree about what has happened so far. Options:
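One such option, the blackboard from the swarm diagram earlier, can be sketched in a few lines. This is a minimal in-process version (a real deployment would back it with a database or queue):

```python
# Blackboard sketch: a concurrency-safe store every agent reads and writes.
import threading

class Blackboard:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries: list[dict] = []

    def post(self, agent: str, content: str) -> None:
        with self._lock:                 # serialize concurrent writers
            self._entries.append({"agent": agent, "content": content})

    def read(self) -> list[dict]:
        with self._lock:
            return list(self._entries)   # snapshot, not a live view
```

Agents never talk to each other directly; they only post to and read from the board, which keeps the coordination surface small and inspectable.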
For backend engineers, Temporal + LLMs is a particularly good fit: you get retries, timeouts, visibility, and replay for free, and the LLM call is just another activity.
flowchart LR
subgraph Temporal workflow
S[Start] --> A1["Activity: plan<br/>LLM call"]
A1 --> A2[Activity: execute step 1]
A2 --> A3[Activity: execute step 2]
A3 --> A4["Activity: review<br/>LLM call"]
A4 -->|fail| A2
A4 -->|ok| E[End]
end
A multi-agent system is a tree of LLM calls. Logs won't save you. Traces will.
Emit OpenTelemetry spans for every agent turn and tool call. Hosted options:
Minimum fields in each span: model, prompt hash, input tokens, output tokens, latency, cost, tool name (if applicable), error.
If you're a backend engineer, LangGraph and Temporal are the two I'd start with.
Research and demos showcase swarms of 100+ agents self-organizing to accomplish goals. As of 2026, these are mostly fascinating science projects; production versions remain rare outside narrow domains. The bottlenecks are evaluation, debugging, and cost. Watch the space; don't bet your roadmap on it yet.
Single agent + tools [default]
+ router [if many paths]
+ orchestrator + workers [if parallelizable]
+ planner / executor / reviewer [if quality-critical]
+ hierarchical [if genuinely tree-shaped]
Swarm [research]
Climb the ladder only as the task forces you to.