Chapter 11 · Multi-Agent Systems

When one agent isn't enough — and the hazards of adding more.


A multi-agent system is a collection of agents that coordinate to solve a task that a single agent could not solve, or could not solve cleanly. Think specialists, not clones.

In plain English. A multi-agent system is the difference between a generalist intern and a small startup with founders, an engineer, a designer, and an ops person. Each role does less, but the team does more.

Patterns at a glance

flowchart TB
    subgraph A1[Single agent]
    SA[Agent]
    end
    subgraph A2[Orchestrator + workers]
    O[Orchestrator] --> W1[SQL]
    O --> W2[Code]
    O --> W3[Search]
    end
    subgraph A3[Hierarchical]
    P[Planner] --> M1[Manager A] --> E1[Executor]
    P --> M2[Manager B] --> E2[Executor]
    end
    subgraph A4[Critic / debate]
    G[Generator] --> C[Critic]
    C --> G
    C --> J[Judge]
    end
    subgraph A5[Swarm / blackboard]
    B[(Shared blackboard)]
    Ag1[Agent] <--> B
    Ag2[Agent] <--> B
    Ag3[Agent] <--> B
    end

In production, you mostly see single agent and orchestrator + workers. The other patterns are real, but they pay off only on hard or open-ended tasks.

The most important thing about multi-agent systems is knowing when not to build one. We'll start there.

11.1 The cost of more agents

Every additional agent adds:

Anthropic's rule, from Building Effective Agents, deserves to be tattooed on every AI engineer:

Use the simplest thing that works.

If a single agent with clear tools can do the job — even if the prompt is long — prefer that. Graduate to multi-agent only when there's a real reason.

11.2 When multi-agent earns its keep

Cases where splitting helps:

11.3 The five patterns you'll actually use

flowchart TB
    subgraph pat1[Pattern 1: Router]
    R1[Router] --> S1[Specialist A]
    R1 --> S2[Specialist B]
    R1 --> S3[Specialist C]
    end
    subgraph pat2[Pattern 2: Orchestrator-Worker]
    O[Orchestrator] --> W1[Worker 1]
    O --> W2[Worker 2]
    O --> W3[Worker 3]
    W1 --> O
    W2 --> O
    W3 --> O
    end
    subgraph pat3[Pattern 3: Planner-Executor-Reviewer]
    P[Planner] --> E[Executor]
    E --> V[Reviewer / Critic]
    V -->|retry| E
    V -->|ok| OUT[Done]
    end
    subgraph pat4[Pattern 4: Debate]
    A1[Agent A] --- A2[Agent B]
    A1 --> J[Judge]
    A2 --> J
    end
    subgraph pat5[Pattern 5: Hierarchical]
    H[Top planner] --> M1[Manager]
    M1 --> EX1[Executor]
    M1 --> EX2[Executor]
    end

Pattern 1 — Router

A thin LLM call selects which downstream agent / prompt / model / workflow to use. Everything else is vanilla. Most common pattern in production.
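A minimal sketch of the router shape, assuming three specialists. The `classify` function is a hypothetical stand-in for the thin LLM call; keyword matching replaces the model here only so the example runs offline. The specialist names and the fallback to `search` are illustrative assumptions.

```python
from typing import Callable, Dict


def classify(query: str) -> str:
    """Stand-in for the thin LLM routing call (hypothetical).

    A real router would ask a small model to return one label; keyword
    matching keeps this sketch deterministic.
    """
    q = query.lower()
    if "select" in q or "table" in q:
        return "sql"
    if "traceback" in q or "function" in q:
        return "code"
    return "search"


# Each specialist is a plain callable here; in practice it would be its own
# agent, prompt, model, or workflow.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "sql": lambda q: f"[sql agent] {q}",
    "code": lambda q: f"[code agent] {q}",
    "search": lambda q: f"[search agent] {q}",
}


def route(query: str) -> str:
    label = classify(query)
    # Fall back to a default specialist if the router returns an unknown label.
    handler = SPECIALISTS.get(label, SPECIALISTS["search"])
    return handler(query)
```

The router itself holds no task logic, which is what keeps "everything else vanilla": swapping a specialist means changing one dictionary entry.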

Pattern 2 — Orchestrator + workers

An orchestrator agent decomposes a task into parallel subtasks, dispatches to workers, and synthesizes. Anthropic's own research system uses this shape — one "lead" agent spawning search subagents.
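The decompose, dispatch, synthesize cycle can be sketched with `asyncio`. Everything named here (`decompose`, `worker`, `synthesize`) is a hypothetical placeholder for an LLM call, and the fixed three-way split is an assumption made for illustration.

```python
import asyncio
from typing import List


def decompose(task: str) -> List[str]:
    """Stand-in for the orchestrator's planning call (hypothetical)."""
    return [f"{task} / part {i}" for i in range(1, 4)]


async def worker(subtask: str) -> str:
    """Hypothetical worker agent; the sleep stands in for an LLM or tool call."""
    await asyncio.sleep(0.01)
    return f"result for {subtask}"


def synthesize(results: List[str]) -> str:
    """Stand-in for the final synthesis call."""
    return "\n".join(results)


async def orchestrate(task: str) -> str:
    subtasks = decompose(task)
    # gather runs the workers concurrently and returns results in input order.
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    return synthesize(list(results))


if __name__ == "__main__":
    print(asyncio.run(orchestrate("compare vendors")))
```

Because the workers run concurrently, wall-clock time is roughly that of the slowest worker rather than the sum, which is the main reason to pick this pattern for parallelizable tasks.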

Pattern 3 — Planner → Executor → Reviewer

One agent plans, one executes, one critiques. The executor and reviewer loop until the reviewer approves. Used in coding agents, writing pipelines, and scientific-style workflows.
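A sketch of the plan, execute, review loop with the three roles passed in as callables. The `max_rounds` cap is an assumption added here, not something the pattern prescribes; without some cap, a reviewer that never approves would loop forever.

```python
from typing import Callable, List, Tuple


def run_with_review(
    task: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str], str], str],
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,  # assumed safety cap, tune per task
) -> str:
    """Plan once, then alternate execute and review until approval."""
    steps = plan(task)
    feedback = ""  # empty on the first attempt
    for _ in range(max_rounds):
        draft = execute(steps, feedback)
        approved, feedback = review(draft)
        if approved:
            return draft
    raise RuntimeError(f"reviewer did not approve after {max_rounds} rounds")
```

Each callable would wrap its own agent with its own prompt and context; passing the reviewer's feedback back into `execute` is what distinguishes a retry from a blind rerun.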

Pattern 4 — Debate

Two agents argue different sides; a third judges. Expensive but useful for high-stakes reasoning (constitutional AI research, hard math, adversarial evaluation).

Pattern 5 — Hierarchical

Planner → managers → executors. Scales to long, tree-shaped tasks. Watch out for compounding latency.
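To see why latency compounds, here is a back-of-the-envelope model. It assumes siblings run in parallel, each level makes its LLM calls sequentially, and every call takes the same time; the numbers are illustrative, not measurements.

```python
def critical_path_seconds(
    levels: int, calls_per_level: int, seconds_per_call: float
) -> float:
    """Latency along one root-to-leaf path.

    Siblings run in parallel, so only depth matters, not breadth.
    """
    return levels * calls_per_level * seconds_per_call


# One agent making two calls versus planner -> manager -> executor,
# where each level makes two calls (for example, delegate and summarize).
single = critical_path_seconds(levels=1, calls_per_level=2, seconds_per_call=4.0)
hierarchy = critical_path_seconds(levels=3, calls_per_level=2, seconds_per_call=4.0)
print(single, hierarchy)  # 8.0 24.0
```

Under these assumptions, adding width is nearly free in wall-clock terms, while every extra level adds a full round of calls to the critical path.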

11.4 A concrete orchestrator-worker example

Anthropic's public write-up of their research agent gives a clean shape:

flowchart TB
    U[User asks complex question] --> L[Lead agent]
    L --> P[Plan: decompose into subqueries]
    P --> S1[Subagent: topic A]
    P --> S2[Subagent: topic B]
    P --> S3[Subagent: topic C]
    S1 --> SUM[Summaries + citations]
    S2 --> SUM
    S3 --> SUM
    SUM --> L
    L --> W[Writer agent<br/>compose final answer]
    W --> C[Citation checker agent]
    C --> ANS[Final answer with sources]

Key choices:

Anthropic reports that this architecture outperformed a single-agent baseline by about 90% on their internal research eval.

11.5 Coordination and memory

Multi-agent systems need to agree on something. Options:

For backend engineers, Temporal + LLMs is a particularly good fit: you get retries, timeouts, visibility, and replay for free, and the LLM call is just another activity.

flowchart LR
    subgraph Temporal workflow
    S[Start] --> A1[Activity: plan<br/>LLM call]
    A1 --> A2[Activity: execute step 1]
    A2 --> A3[Activity: execute step 2]
    A3 --> A4[Activity: review<br/>LLM call]
    A4 -->|fail| A2
    A4 -->|ok| E[End]
    end
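To make concrete what "for free" covers, here is roughly the retry plumbing you would otherwise hand-write around each LLM call. This is plain Python, not the Temporal SDK; `run_activity` and its parameters are hypothetical names, and a durable workflow engine would also persist state across process crashes, which this sketch does not.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_activity(
    fn: Callable[[], T], max_attempts: int = 3, backoff_s: float = 0.0
) -> T:
    """Call fn, retrying on any exception with linear backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as err:  # a real system would retry selectively
            last_error = err
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)
    raise RuntimeError(
        f"activity failed after {max_attempts} attempts"
    ) from last_error
```

Wrapping the plan, execute, and review steps this way gives in-process retries only; visibility and replay are the parts that are hard to rebuild by hand.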

11.6 Observability is non-negotiable

A multi-agent system is a tree of LLM calls. Logs won't save you. Traces will.

Emit OpenTelemetry spans for every agent turn and tool call. Hosted options:

Minimum fields in each span: model, prompt hash, input tokens, output tokens, latency, cost, tool name (if applicable), error.
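The minimum field set can be sketched as a record plus a context manager that fills in latency and errors. This is a stdlib-only illustration, not the OpenTelemetry SDK: `SpanRecord`, `traced_call`, and the in-memory `SPANS` list are hypothetical names, and in a real setup these fields would be attached as span attributes and exported.

```python
import hashlib
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class SpanRecord:
    model: str
    prompt_hash: str
    input_tokens: int = 0
    output_tokens: int = 0
    latency_s: float = 0.0
    cost_usd: float = 0.0
    tool_name: Optional[str] = None
    error: Optional[str] = None


SPANS: List[SpanRecord] = []  # stand-in for a trace exporter


@contextmanager
def traced_call(
    model: str, prompt: str, tool_name: Optional[str] = None
) -> Iterator[SpanRecord]:
    # Hash the prompt so traces can be grouped without storing raw text.
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:16]
    span = SpanRecord(model=model, prompt_hash=digest, tool_name=tool_name)
    start = time.perf_counter()
    try:
        yield span  # caller fills in tokens and cost
    except Exception as err:
        span.error = repr(err)
        raise
    finally:
        span.latency_s = time.perf_counter() - start
        SPANS.append(span)
```

A call site would look like `with traced_call("example-model", prompt) as span:` followed by the LLM call and assignments to `span.input_tokens`, `span.output_tokens`, and `span.cost_usd`.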

11.7 Frameworks

If you're a backend engineer, LangGraph and Temporal are the two I'd start with.

11.8 A word on "swarm" and fully autonomous agents

Research and demos showcase swarms of 100+ agents self-organizing to accomplish goals. As of 2026, these are mostly fascinating science projects; production versions remain rare outside narrow domains. The bottlenecks are evaluation, debugging, and cost. Watch the space; don't bet your roadmap on it yet.

11.9 Practical advice

  1. Start with one agent. Add the second only when you can name the specific failure it fixes.
  2. Each subagent: clean context, narrow tool set, focused prompt.
  3. Decide shared state up front. Scratch file, blackboard, or durable workflow.
  4. Budget at the system level. Total tokens, total wall clock, total dollars.
  5. Evals at the system level. End-to-end tasks with expected outcomes.
  6. Treat prompts as code. Version them, review them, test them, deploy them.
  7. Cache aggressively. A multi-agent system redoes enormous amounts of prefix work.
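Item 4 above can be sketched as a single budget object that every agent charges against. `SystemBudget` and `BudgetExceeded` are hypothetical names, and the limits in the usage example are arbitrary.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when the whole system, not one agent, runs out of budget."""


@dataclass
class SystemBudget:
    max_tokens: int
    max_usd: float
    max_seconds: float
    tokens: int = 0
    usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, usd: float) -> None:
        """Record one LLM call's usage, then enforce all three limits."""
        self.tokens += tokens
        self.usd += usd
        self.check()

    def check(self) -> None:
        elapsed = time.monotonic() - self.started
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"tokens: {self.tokens} > {self.max_tokens}")
        if self.usd > self.max_usd:
            raise BudgetExceeded(f"dollars: {self.usd:.2f} > {self.max_usd:.2f}")
        if elapsed > self.max_seconds:
            raise BudgetExceeded(
                f"wall clock: {elapsed:.1f}s > {self.max_seconds:.1f}s"
            )


budget = SystemBudget(max_tokens=200_000, max_usd=5.0, max_seconds=300.0)
budget.charge(tokens=1_500, usd=0.02)  # called after every agent turn
```

Sharing one instance across the orchestrator and all workers is the point: per-agent limits alone cannot stop a system that spawns many cheap agents.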

11.10 A half-page to take away

Single agent + tools                 [default]
+ router                            [if many paths]
+ orchestrator + workers            [if parallelizable]
+ planner / executor / reviewer     [if quality-critical]
+ hierarchical                      [if genuinely tree-shaped]
Swarm                               [research]

Climb the ladder only as the task forces you to.

Further reading & watching