🧠 Multi-Agent Orchestration — AI / ML Interview Guide

What it is

Multi-agent orchestration solves a problem by coordinating several specialized LLM agents instead of asking one agent to do everything. Each agent has a focused role, its own prompt/tools, and a narrow responsibility; an orchestration layer routes work between them, passes messages, and decides who acts next. The win is separation of concerns — a planner that decomposes, workers that execute, a reviewer that checks — so each agent's context stays small and on-task. The cost is coordination: more LLM calls, more places to fail, and the need for clear protocols and stop conditions.

Mental model

Don't picture smarter agents — picture an ORG CHART. One coordinator owns the plan and the handoffs; each worker owns exactly one skill and a small, focused context. Progress is messages moving between roles, and the system is "done" only when the coordinator says so or a guardrail trips. The hard part is never the agents' intelligence — it is the WIRING: who acts next, what context they get, and when to stop.

Theory

There are three canonical patterns. SUPERVISOR/WORKER centralizes control in one agent that routes to specialists — best when you need a clear owner of "who acts and when to stop". PLANNER/EXECUTOR splits "think then do" into a decomposition step and execution — best when an inspectable upfront plan helps. Peer HANDOFFS let agents transfer control directly — natural for pipelines (triage → specialist) but harder to keep from looping without a coordinator.

The useful "math" is accounting, not equations. Cost ≈ Σ over agents of (calls × tokens × price), so every added agent and handoff adds to the LLM bill. Latency ≈ the sum of step latencies on the critical path, though independent subtasks can run in parallel. Reliability ≈ the PRODUCT of per-step success probabilities (assuming steps fail independently), so chaining N agents compounds error: five steps at 90% each succeed end to end only about 59% of the time. A verifier/reviewer step that catches failures and triggers a retry is what raises end-to-end success.

Coordination requires deliberate information flow: explicit message passing plus a shared state/scratchpad (a "blackboard" the orchestrator owns), and a defined protocol for what each handoff must contain (task, context, expected output). Context handed between agents should be deliberate rather than assumed: exactly what the receiver needs and nothing it must not see.
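
One way to make the handoff protocol concrete is a small message type that carries the task, the context, and the expected output. The sketch below is illustrative only: the `Handoff` class, the `make_handoff` helper, and the sample keys are hypothetical names, not part of any framework.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handoff:
    sender: str
    receiver: str
    task: str                    # what the receiver must do
    expected_output: str         # the shape or format of the result
    context: dict = field(default_factory=dict)  # only what the receiver needs


def make_handoff(sender, receiver, task, expected_output, state, allowed_keys):
    """Build a handoff whose context is an explicit subset of the shared state."""
    if not task.strip() or not expected_output.strip():
        raise ValueError("a handoff needs both a task and an expected output")
    missing = [k for k in allowed_keys if k not in state]
    if missing:
        raise KeyError(f"context keys not in shared state: {missing}")
    context = {k: state[k] for k in allowed_keys}
    return Handoff(sender, receiver, task, expected_output, context)


# Hypothetical shared state; the API key must never reach the coder agent.
state = {"goal": "sales report", "research_notes": "Q3 grew", "api_key": "secret"}
msg = make_handoff(
    "supervisor", "coder",
    task="Write the analysis script",
    expected_output="A runnable Python script",
    state=state,
    allowed_keys=["goal", "research_notes"],
)
print(msg.context)
```

Listing the allowed keys at the call site forces the sender to decide what the receiver sees, instead of forwarding the whole state by default.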

Two non-negotiables: guardrails and observability. Always bound the system (max steps/depth, token/dollar budget, timeout, no-progress detector) so it cannot spin forever, and trace every agent's inputs, outputs, and tool calls, because without traces the system is nearly impossible to debug. The default advice: start with a single well-prompted agent and only split when you can name the concrete benefit.

Concrete example

Imagine "build me a small data-analysis report" handled by a crew. A SUPERVISOR agent reads the goal and delegates: it asks a RESEARCH agent to gather and summarize sources, hands those notes to a CODE agent that writes and runs the analysis/plots, then routes the draft to a REVIEW agent that checks correctness and style. If the reviewer flags a bug, the supervisor loops back to the code agent with the feedback; once review passes, the supervisor assembles the final answer. No single agent holds the whole job in its head — the supervisor owns the plan and the handoffs, each worker owns one skill.
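
The control flow of this example can be sketched with plain functions standing in for the agents. Everything here is a stub under stated assumptions: `research_agent`, `code_agent`, and `review_agent` are hypothetical placeholders that return canned strings where a real system would call an LLM.

```python
def research_agent(goal):
    """Stub: a real agent would gather and summarize sources."""
    return f"notes on {goal}"


def code_agent(notes, feedback=None):
    """Stub: the first draft contains a bug; a draft written with feedback does not."""
    return "analysis v2 (fixed)" if feedback else "analysis v1 (bug)"


def review_agent(draft):
    """Stub: returns (passed, feedback)."""
    if "bug" in draft:
        return False, "fix the bug in the aggregation step"
    return True, ""


def supervisor(goal, max_revisions=3):
    """Own the plan and the handoffs: research, then code, then review, looping on failure."""
    notes = research_agent(goal)
    feedback = None
    draft = ""
    for attempt in range(1, max_revisions + 1):
        draft = code_agent(notes, feedback)
        passed, feedback = review_agent(draft)
        if passed:
            return {"status": "done", "report": draft, "attempts": attempt}
    # Guardrail tripped: return the best partial result with a clear status.
    return {"status": "gave_up", "report": draft, "attempts": max_revisions}


print(supervisor("sales trends"))
```

The revision cap matters even in a toy: without it, a reviewer that never passes would keep the loop running indefinitely.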

Key equations

No closed-form equations — orchestration is a control-flow / systems pattern, not a numeric model. The useful "math" is the cost and reliability accounting:
Total cost ≈ Σ_agents (calls × tokens × price) — multi-agent multiplies LLM calls, so cost can balloon vs a single agent.
Latency ≈ Σ steps on the critical path (sequential handoffs add up); independent subtasks can run in parallel to cut wall-clock time.
Reliability ≈ Π_steps p(step succeeds) — chaining N agents compounds error unless you add checks/retries, so a reviewer or validator raises end-to-end success.
Always bound the loop: max_steps / max_cost / max_depth, so the system cannot spin forever.
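
The accounting above can be turned into a few lines of arithmetic. This is a rough sketch: the prices, token counts, and latencies are made-up illustrative numbers, and the reliability functions assume that steps fail independently and that a checker detects every failure.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class AgentUsage:
    name: str
    calls: int             # LLM calls this agent makes in one run
    tokens_per_call: int   # average prompt + completion tokens per call
    price_per_1k: float    # hypothetical price in dollars per 1,000 tokens


def total_cost(agents):
    """Sum over agents of calls x tokens x price."""
    return sum(a.calls * a.tokens_per_call / 1000 * a.price_per_1k for a in agents)


def critical_path_latency(stages):
    """Each stage is a list of step latencies that run in parallel.
    Stages run one after another, so the slowest step of each stage adds up."""
    return sum(max(stage) for stage in stages)


def chain_reliability(step_probs):
    """Product of per-step success probabilities (steps assumed independent)."""
    return prod(step_probs)


def with_retries(p, attempts):
    """Success probability of a step that is retried when a checker flags failure.
    Assumes a perfect checker and independent attempts."""
    return 1 - (1 - p) ** attempts


crew = [
    AgentUsage("supervisor", calls=4, tokens_per_call=1500, price_per_1k=0.01),
    AgentUsage("researcher", calls=3, tokens_per_call=4000, price_per_1k=0.002),
    AgentUsage("coder", calls=2, tokens_per_call=3000, price_per_1k=0.01),
    AgentUsage("reviewer", calls=2, tokens_per_call=2000, price_per_1k=0.002),
]
print(f"cost per run: ${total_cost(crew):.3f}")
print("latency (s):", critical_path_latency([[2.0], [6.0, 4.0], [8.0], [3.0]]))
print("5 steps at 0.9:", round(chain_reliability([0.9] * 5), 3))
print("with one retry each:", round(chain_reliability([with_retries(0.9, 2)] * 5), 3))
```

With these numbers, five chained steps at 0.9 each succeed about 59% of the time, and allowing one checked retry per step lifts that to about 95%, which is the quantitative case for a reviewer or validator.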

Step by step

Define roles: write a focused system prompt and a tool set for each agent (planner, researcher, coder, reviewer, etc.). Narrow roles keep context small and behavior predictable.
Pick an orchestration pattern: SUPERVISOR/WORKER (a central agent routes to specialists), PLANNER/EXECUTOR (one decomposes into a task list, others carry it out), or peer HANDOFFS (agents pass control directly).
Decompose the goal: the supervisor or planner turns the request into subtasks and decides ordering and dependencies (some can run in parallel, some must be sequential).
Pass messages: route each subtask and the relevant context to the chosen agent. Keep a shared scratchpad/state so results accumulate without resending everything.
Execute and observe: each worker runs (calling tools as needed) and returns a result. The orchestrator inspects it and decides the next move.
Review / verify: a critic or reviewer agent (or a deterministic check) validates outputs; failures loop back to the responsible agent with feedback.
Terminate: stop when the goal is met OR a guardrail trips (max steps, budget, or no-progress detector), then the supervisor synthesizes the final answer.
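
The decomposition and message-passing steps above can be sketched as a dependency graph that is executed in batches. The subtask names and worker functions below are hypothetical; the ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), and the batches run sequentially here for simplicity even though each batch could run in parallel.

```python
from graphlib import TopologicalSorter


def parallel_batches(plan):
    """plan maps each subtask to the set of subtasks it depends on.
    Returns batches; subtasks within a batch have no unmet dependencies."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def execute(plan, workers):
    """Run each subtask with only its dependencies' results as input."""
    scratchpad = {}  # shared state: results accumulate here
    for batch in parallel_batches(plan):
        for task in batch:
            inputs = {dep: scratchpad[dep] for dep in plan[task]}
            scratchpad[task] = workers[task](inputs)
    return scratchpad


# Hypothetical plan for the report example.
plan = {
    "research": set(),
    "load_data": set(),
    "analyze": {"research", "load_data"},
    "plot": {"analyze"},
    "review": {"analyze", "plot"},
}
# Stub workers: each just records which inputs it received.
workers = {name: (lambda inputs, n=name: f"{n}({', '.join(sorted(inputs))})") for name in plan}

print(parallel_batches(plan))
print(execute(plan, workers)["review"])
```

Keeping the plan as data makes it inspectable before anything runs, which is the main attraction of the planner/executor split.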

Interview questions & answers

When should you use multiple agents instead of one?

Use multi-agent when the task has distinct sub-skills or tool sets that benefit from isolated context (research vs code vs review), when you want parallelism across independent subtasks, or when one agent's context window/prompt would otherwise become an unfocused mess. For simple or tightly coupled tasks a single well-prompted agent with tools is cheaper, lower-latency, and easier to debug; start there and only split when you can name the benefit.

What are the main failure modes of multi-agent systems?

Infinite or ping-pong loops (agents bouncing work back and forth), cost/latency blowup from too many LLM calls, context loss across handoffs (an agent lacks info another had), error compounding down a long chain, and ambiguous ownership where two agents do the same work or each assumes the other will. Mitigate with hard step/budget limits, explicit shared state, a verifier step, and crisp role boundaries.

Supervisor/worker vs planner/executor vs handoffs — when each?

Supervisor/worker centralizes control: good when you need a clear owner deciding who acts and when to stop. Planner/executor splits "think then do": good when the task benefits from an explicit upfront plan you can inspect. Peer handoffs (agents transfer control directly) suit pipelines/workflows with a natural sequence (e.g. triage → specialist), but are harder to keep from looping without a coordinator.
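
A peer-handoff pipeline can be sketched as agents that return the name of the next agent along with the payload. This is a minimal illustration with hypothetical agent functions and routing rules; the hop limit stands in for the coordinator that this pattern lacks.

```python
def triage(ticket):
    """Route to a specialist; every agent returns (next_agent, payload)."""
    target = "billing" if "invoice" in ticket["text"].lower() else "technical"
    return target, ticket


def billing(ticket):
    return None, {**ticket, "answer": "Billing will resend the invoice."}


def technical(ticket):
    return None, {**ticket, "answer": "Technical support will follow up."}


def run_handoffs(agents, start, payload, max_hops=5):
    """Follow direct handoffs until an agent returns None as the next agent."""
    current, path = start, []
    for _ in range(max_hops):
        path.append(current)
        next_agent, payload = agents[current](payload)
        if next_agent is None:
            return payload, path
        current = next_agent
    raise RuntimeError(f"handoff limit reached, path so far: {path}")


AGENTS = {"triage": triage, "billing": billing, "technical": technical}
result, path = run_handoffs(AGENTS, "triage", {"text": "My invoice is wrong"})
print(path, result["answer"])
```

If two agents keep handing work back to each other, the hop limit turns a silent infinite loop into an explicit error that includes the path taken.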

How do you control cost in a multi-agent system?

Cap calls/steps and a token/dollar budget per run; use a cheaper/smaller model for routing, summarizing, and simple workers and reserve the strong model for hard reasoning; summarize or trim shared state instead of resending full history; cache repeated tool/LLM results; and prefer a single agent when the extra agents don't demonstrably improve the outcome.
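
Three of these levers (a per-run budget, model tiering, and caching) fit in a short sketch. The model names, prices, and the `cached_llm` stand-in are all hypothetical; a real system would substitute its own provider's pricing and client call.

```python
from functools import lru_cache

# Hypothetical model tiers and prices (dollars per 1,000 tokens).
PRICE_PER_1K = {"small": 0.0005, "large": 0.01}
MODEL_FOR_TASK = {"route": "small", "summarize": "small", "reason": "large"}


class BudgetExceeded(RuntimeError):
    pass


class Budget:
    """Per-run dollar budget that refuses a charge which would overshoot."""

    def __init__(self, max_dollars):
        self.max_dollars = max_dollars
        self.spent = 0.0

    def charge(self, model, tokens):
        cost = tokens / 1000 * PRICE_PER_1K[model]
        if self.spent + cost > self.max_dollars:
            raise BudgetExceeded(f"run would exceed ${self.max_dollars:.2f}")
        self.spent += cost
        return cost


def pick_model(task_kind):
    """Send cheap task kinds to the small model; default to the large one."""
    return MODEL_FOR_TASK.get(task_kind, "large")


llm_calls = {"count": 0}


@lru_cache(maxsize=256)
def cached_llm(model, prompt):
    """Stand-in for a real LLM call; identical (model, prompt) pairs hit the cache."""
    llm_calls["count"] += 1
    return f"[{model}] reply to: {prompt}"
```

Caching on the exact prompt only helps when calls repeat verbatim, so it tends to pay off most for routing and tool-result lookups rather than open-ended generation.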

How do agents share information and stay coordinated?

Via explicit message passing and a shared state/scratchpad (often a blackboard the orchestrator owns), plus a defined protocol for what each handoff message must contain (task, context, expected output). Protocols such as the Model Context Protocol (MCP) standardize how agents access tools and context. The key is that context handed between agents is deliberate, not assumed: each agent should receive exactly what it needs and nothing it must not see.
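
A blackboard with per-entry read permissions is one simple way to keep shared state deliberate. The `Blackboard` class and the agent names below are hypothetical, and this is a sketch of the idea rather than any framework's API.

```python
class Blackboard:
    """Shared state owned by the orchestrator; each entry lists who may read it."""

    def __init__(self):
        self._entries = {}   # key -> (value, author, set of readers)
        self.log = []        # ordered record of writes, useful for tracing

    def write(self, key, value, author, readers):
        self._entries[key] = (value, author, set(readers))
        self.log.append((author, key))

    def view_for(self, agent):
        """Return only the entries this agent is allowed to see."""
        return {
            key: value
            for key, (value, _author, readers) in self._entries.items()
            if agent in readers
        }


board = Blackboard()
board.write("goal", "quarterly sales report", "supervisor", ["researcher", "coder", "reviewer"])
board.write("research_notes", "Q3 grew", "researcher", ["coder", "reviewer"])
board.write("db_password", "secret", "supervisor", ["coder"])

print(board.view_for("reviewer"))
```

Here the reviewer sees the goal and the notes but never the database password, which only the coder needs.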

How do you keep a multi-agent loop from running forever?

Enforce guardrails: a max-step / max-depth counter, a cost or token budget, a wall-clock timeout, and a no-progress detector (stop if state hasn't changed across iterations). The orchestrator owns these limits and forces termination, returning the best partial result with a clear status rather than spinning.
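
These four limits can live in one small object that the orchestrator consults before every step. The sketch below uses hypothetical names (`Guardrails`, `run`, `step_fn`) and a deliberately crude no-progress test that compares the printed form of the state between iterations.

```python
import time


class Guardrails:
    def __init__(self, max_steps=10, max_cost=1.0, timeout_s=60.0, patience=2,
                 clock=time.monotonic):
        self.max_steps, self.max_cost = max_steps, max_cost
        self.timeout_s, self.patience = timeout_s, patience
        self.clock = clock
        self.start = clock()
        self.steps = 0
        self.unchanged = 0    # consecutive checks with identical state
        self.last = None

    def check(self, state, cost_so_far):
        """Return a stop reason, or None if the loop may take another step."""
        self.steps += 1
        if self.steps > self.max_steps:
            return "max_steps"
        if cost_so_far > self.max_cost:
            return "max_cost"
        if self.clock() - self.start > self.timeout_s:
            return "timeout"
        snapshot = repr(state)
        if snapshot == self.last:
            self.unchanged += 1
        else:
            self.unchanged = 0
        self.last = snapshot
        if self.unchanged >= self.patience:
            return "no_progress"
        return None


def run(step_fn, state, guard):
    """step_fn(state) returns (new_state, step_cost). Stops on done or a guardrail."""
    cost = 0.0
    while not state.get("done"):
        reason = guard.check(state, cost)
        if reason:
            # Return the best partial result with a clear status.
            return {"status": f"stopped:{reason}", "result": state}
        state, step_cost = step_fn(state)
        cost += step_cost
    return {"status": "done", "result": state}


stuck = run(lambda s: (s, 0.01), {"draft": "v1"}, Guardrails())
print(stuck["status"])
```

Passing the clock in as a parameter keeps the timeout testable without sleeping.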

How do you evaluate and debug a multi-agent system?

Trace every agent's inputs, outputs, and tool calls (observability is essential), evaluate per-agent (did the researcher find good sources?) AND end-to-end (did the final answer meet the goal?), and use an LLM-as-judge or deterministic checks on outputs. Reproduce failures by replaying the trace; most bugs are bad handoffs or unbounded loops, which the trace makes visible.
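
Tracing and replay can be sketched as a thin wrapper around each agent call. The `Tracer` class and `replay` helper are hypothetical names; the replay check assumes agents are deterministic for a given input, which real LLM calls usually are not unless sampling is pinned or responses are recorded.

```python
import json


class Tracer:
    """Record every agent's input and output in call order."""

    def __init__(self):
        self.events = []

    def wrap(self, name, fn):
        def traced(payload):
            output = fn(payload)
            self.events.append({"agent": name, "input": payload, "output": output})
            return output
        return traced

    def dump(self):
        """One JSON object per line, suitable for writing to a log file."""
        return "\n".join(json.dumps(event) for event in self.events)


def replay(events, agents):
    """Re-run each recorded input and list the agents whose output changed."""
    return [
        event["agent"]
        for event in events
        if agents[event["agent"]](event["input"]) != event["output"]
    ]


# Stub agents standing in for LLM-backed ones.
agents = {
    "researcher": lambda goal: f"notes on {goal}",
    "coder": lambda notes: f"script using {notes}",
}
tracer = Tracer()
research = tracer.wrap("researcher", agents["researcher"])
code = tracer.wrap("coder", agents["coder"])
code(research("sales trends"))

print(tracer.dump())
```

Replaying the recorded inputs against a modified agent shows which step's behavior changed, which helps localize a bad handoff without rerunning the whole system.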

Common pitfalls

Infinite / ping-pong loops between agents — always cap steps, budget, and depth, and add a no-progress detector.
Cost blowup: every agent and every handoff adds LLM calls; a multi-agent system can cost several times as much as a single-agent solution with no quality gain.
Over-engineering: reaching for a crew when one agent with tools would do — more agents means more coordination surface and more bugs.
Context loss across handoffs: an agent acts without info a previous agent had; pass deliberate, sufficient context (and nothing it shouldn't see).
Error compounding: a mistake early in a long chain corrupts everything downstream; insert verification/review steps.
Ambiguous ownership: unclear roles cause duplicated or dropped work — give each agent one crisp responsibility.
Missing observability: without per-agent traces these systems are nearly impossible to debug.

Where it shows up

Agent frameworks: LangGraph, AutoGen, CrewAI, OpenAI Swarm — all encode supervisor/worker or handoff orchestration.
Coding assistants that split planning, editing, and test/review across sub-agents.
Deep-research / report-generation systems that fan out retrieval to worker agents and synthesize results.
Customer-support routing: a triage agent hands off to specialist agents (billing, technical, account).
The Model Context Protocol (MCP) — standardizing how agents and tools exchange context across a system.
