🧠 Multi-Agent Orchestration — AI / ML Interview Guide

Agentic Systems · interactive visualization + interview prep


What it is

Multi-agent orchestration solves a problem by coordinating several specialized LLM agents instead of asking one agent to do everything. Each agent has a focused role, its own prompt/tools, and a narrow responsibility; an orchestration layer routes work between them, passes messages, and decides who acts next. The win is separation of concerns — a planner that decomposes, workers that execute, a reviewer that checks — so each agent's context stays small and on-task. The cost is coordination: more LLM calls, more places to fail, and the need for clear protocols and stop conditions.

Mental model

Don't picture smarter agents — picture an ORG CHART. One coordinator owns the plan and the handoffs; each worker owns exactly one skill and a small, focused context. Progress is messages moving between roles, and the system is "done" only when the coordinator says so or a guardrail trips. The hard part is never the agents' intelligence — it is the WIRING: who acts next, what context they get, and when to stop.

Theory

The theory follows from the definition above: each agent has a focused role, its own prompt and tools, and a narrow responsibility, while an orchestration layer routes work, passes messages, and decides who acts next. The benefit is separation of concerns, because each agent's context stays small and on-task; the cost is coordination overhead.

There are three canonical patterns. SUPERVISOR/WORKER centralizes control in one agent that routes to specialists — best when you need a clear owner of "who acts and when to stop". PLANNER/EXECUTOR splits "think then do" into a decomposition step and execution — best when an inspectable upfront plan helps. Peer HANDOFFS let agents transfer control directly — natural for pipelines (triage → specialist) but harder to keep from looping without a coordinator.

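The useful "math" is accounting, not equations. Cost ≈ Σ over agents of (calls × tokens per call × price per token), so adding agents multiplies LLM spend. Latency ≈ the sum of step latencies on the critical path; independent subtasks can run in parallel, and only the slowest of them counts. Reliability ≈ the PRODUCT of per-step success probabilities, so chaining N agents compounds error: five steps that each succeed 90% of the time succeed together only about 59% of the time. A verifier/reviewer step raises end-to-end success because it catches a failed step and lets it be retried before the error propagates.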
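A minimal sketch of this accounting in Python may make the trade-offs concrete. All names, prices, and probabilities below are illustrative assumptions, and the retry model assumes a verifier that catches every failure and allows exactly one retry, which real reviewers only approximate.

```python
from math import prod

def run_cost(agents):
    """Cost ~ sum over agents of calls * tokens per call * price per token."""
    return sum(a["calls"] * a["tokens"] * a["price_per_token"] for a in agents)

def critical_path_latency(stages):
    """Stages run in sequence; subtasks inside one stage run in parallel,
    so each stage costs as much as its slowest subtask."""
    return sum(max(stage) for stage in stages)

def chain_reliability(step_success):
    """End-to-end success ~ product of per-step success probabilities."""
    return prod(step_success)

def with_one_retry(p):
    """Step success when a (hypothetically perfect) verifier catches a
    failure and the step is retried once."""
    return 1 - (1 - p) ** 2

# Hypothetical numbers, for illustration only.
agents = [
    {"calls": 3, "tokens": 2000, "price_per_token": 0.000002},
    {"calls": 5, "tokens": 1500, "price_per_token": 0.000001},
]
cost = run_cost(agents)
latency = critical_path_latency([[2.0], [3.0, 5.0], [1.0]])  # seconds
plain = chain_reliability([0.9] * 5)
verified = chain_reliability([with_one_retry(0.9)] * 5)
```

Under these assumptions, five chained steps at 0.9 each give roughly 0.59 end to end, while the same chain with one verified retry per step gives roughly 0.95. The retry is not free: it adds calls to the cost line.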

Coordination requires deliberate information flow: explicit message passing plus a shared state/scratchpad (a "blackboard" the orchestrator owns), and a defined protocol for what each handoff must contain (task, context, expected output). Context handed between agents should be intentional — exactly what the receiver needs and nothing it must not see — not assumed.
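One way to make the handoff protocol explicit is a small message type plus a blackboard that copies only the requested keys. This is a sketch, not a standard API: the `Handoff` and `Blackboard` names and their fields are hypothetical, chosen to mirror the "task, context, expected output" triple described above.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    task: str              # what the receiver should do
    context: dict          # only the state the receiver needs
    expected_output: str   # what a valid result looks like

@dataclass
class Blackboard:
    """Shared state owned by the orchestrator."""
    state: dict = field(default_factory=dict)

    def write(self, key, value):
        self.state[key] = value

    def handoff(self, task, needs, expected_output):
        # Fail loudly if required context is missing instead of
        # letting the receiver guess.
        missing = [k for k in needs if k not in self.state]
        if missing:
            raise KeyError(f"missing context for handoff: {missing}")
        # Copy only the requested keys; everything else stays private.
        context = {k: self.state[k] for k in needs}
        return Handoff(task, context, expected_output)

board = Blackboard()
board.write("research_notes", "three sources summarized")
board.write("api_key", "secret")  # must not leak to the code agent
msg = board.handoff("write the analysis", ["research_notes"],
                    "python script and plots")
```

Listing the needed keys at the call site keeps the information flow reviewable: anything an agent sees was named on purpose.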

Two non-negotiables: guardrails and observability. Always bound the system (max steps/depth, token/dollar budget, timeout, no-progress detector) so it cannot spin forever, and trace every agent's inputs/outputs/tool-calls or it is nearly impossible to debug. The default advice: start with a single well-prompted agent and only split when you can name the concrete benefit.

Concrete example

Imagine "build me a small data-analysis report" handled by a crew. A SUPERVISOR agent reads the goal and delegates: it asks a RESEARCH agent to gather and summarize sources, hands those notes to a CODE agent that writes and runs the analysis/plots, then routes the draft to a REVIEW agent that checks correctness and style. If the reviewer flags a bug, the supervisor loops back to the code agent with the feedback; once review passes, the supervisor assembles the final answer. No single agent holds the whole job in its head — the supervisor owns the plan and the handoffs, each worker owns one skill.
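The crew above can be sketched with plain functions standing in for the agents. Everything here is a stub under stated assumptions: real agents would call an LLM and tools, the "bug" is simulated with a string, and `max_revisions` is a hypothetical cap added so the review loop cannot run forever.

```python
def research_agent(goal):
    # Stub: a real agent would gather and summarize sources.
    return f"notes on {goal}"

def code_agent(notes, feedback=None):
    # Stub: the first draft contains a simulated bug; once reviewer
    # feedback arrives, the agent returns a corrected draft.
    return "draft v2 (fixed)" if feedback else "draft v1 (bug)"

def review_agent(draft):
    # Returns (passed, feedback).
    if "bug" in draft:
        return False, "fix the bug"
    return True, ""

def supervisor(goal, max_revisions=3):
    """Owns the plan and the handoffs; each worker owns one skill."""
    notes = research_agent(goal)
    feedback = None
    draft = None
    for attempt in range(1, max_revisions + 1):
        draft = code_agent(notes, feedback)
        passed, feedback = review_agent(draft)
        if passed:
            return {"status": "done", "attempts": attempt, "report": draft}
    # Guardrail tripped: return the best partial result with a clear status.
    return {"status": "gave_up", "attempts": max_revisions, "report": draft}

result = supervisor("small data-analysis report")
```

In this stubbed run the reviewer rejects the first draft, the supervisor loops back to the code agent with the feedback, and the second draft passes.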

Key equations
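
The accounting from the Theory section can be written compactly. The symbols are introduced here for illustration and are not from the original page: for agent $i$, $c_i$ is the number of calls, $t_i$ the tokens per call, and $p_i$ the price per token; $\ell_s$ is the latency of step $s$; $q_s$ is the success probability of step $s$.

```latex
\begin{align*}
\text{Cost} &\approx \sum_{i=1}^{N} c_i \, t_i \, p_i \\
\text{Latency} &\approx \sum_{s \,\in\, \text{critical path}} \ell_s \\
\text{Reliability} &\approx \prod_{s=1}^{S} q_s
\end{align*}
```

These are approximations: the product form treats step failures as independent, which is a simplifying assumption.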

Step by step

  1. Define roles: write a focused system prompt and a tool set for each agent (planner, researcher, coder, reviewer, etc.). Narrow roles keep context small and behavior predictable.
  2. Pick an orchestration pattern: SUPERVISOR/WORKER (a central agent routes to specialists), PLANNER/EXECUTOR (one decomposes into a task list, others carry it out), or peer HANDOFFS (agents pass control directly).
  3. Decompose the goal: the supervisor or planner turns the request into subtasks and decides ordering and dependencies (some can run in parallel, some must be sequential).
  4. Pass messages: route each subtask and the relevant context to the chosen agent. Keep a shared scratchpad/state so results accumulate without resending everything.
  5. Execute and observe: each worker runs (calling tools as needed) and returns a result. The orchestrator inspects it and decides the next move.
  6. Review / verify: a critic or reviewer agent (or a deterministic check) validates outputs; failures loop back to the responsible agent with feedback.
  7. Terminate: stop when the goal is met OR a guardrail trips (max steps, budget, or no-progress detector), then the supervisor synthesizes the final answer.
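
Step 3 above (ordering and dependencies) can be sketched with the standard-library `graphlib` module, available in Python 3.9 and later. The subtask names and the `plan_batches` helper are hypothetical; the sketch only shows how a planner's task list might be grouped into batches that could run in parallel.

```python
from graphlib import TopologicalSorter

def plan_batches(dependencies):
    """Group subtasks into batches. Tasks within one batch have all
    their dependencies met, so they could run in parallel; batches
    themselves must run in order."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches

# Each key maps a subtask to the subtasks it depends on.
plan = {
    "gather_sources": [],
    "load_data": [],
    "analyze": ["gather_sources", "load_data"],
    "review": ["analyze"],
}
batches = plan_batches(plan)
```

Here the two independent subtasks land in the first batch, followed by the analysis and then the review. Rejecting cyclic plans up front is a cheap guard against one class of infinite loop.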

Interview questions & answers

When should you use multiple agents instead of one?

Use multi-agent when the task has distinct sub-skills or tool sets that benefit from isolated context (research vs code vs review), when you want parallelism across independent subtasks, or when one agent's context window/prompt would otherwise become an unfocused mess. For simple or tightly-coupled tasks a single well-prompted agent with tools is cheaper, lower-latency, and easier to debug — start there and only split when you can name the benefit.

What are the main failure modes of multi-agent systems?

Infinite or ping-pong loops (agents bouncing work back and forth), cost/latency blowup from too many LLM calls, context loss across handoffs (an agent lacks info another had), error compounding down a long chain, and ambiguous ownership where two agents do the same work or each assumes the other will. Mitigate with hard step/budget limits, explicit shared state, a verifier step, and crisp role boundaries.

Supervisor/worker vs planner/executor vs handoffs — when each?

Supervisor/worker centralizes control: good when you need a clear owner deciding who acts and when to stop. Planner/executor splits "think then do": good when the task benefits from an explicit upfront plan you can inspect. Peer handoffs (agents transfer control directly) suit pipelines/workflows with a natural sequence (e.g. triage → specialist), but are harder to keep from looping without a coordinator.

How do you control cost in a multi-agent system?

Cap calls/steps and set a token/dollar budget per run; use a cheaper/smaller model for routing, summarizing, and simple workers and reserve the strong model for hard reasoning; summarize or trim shared state instead of resending full history; cache repeated tool/LLM results; and prefer a single agent when the extra agents don't demonstrably improve the outcome.
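
A rough sketch of three of these levers together (model tiering, caching, and a hard budget) might look like the following. The prices, the task kinds, the whitespace token estimate, and the `CostController` class are all assumptions for illustration; `fake_llm` stands in for a real model client.

```python
# Hypothetical per-token prices in USD; real prices depend on the provider.
PRICES = {"small": 0.0000005, "large": 0.00001}
CHEAP_KINDS = {"route", "summarize", "simple"}

class BudgetExceeded(Exception):
    """Raised when the next call would push spend past the run budget."""

class CostController:
    def __init__(self, max_dollars):
        self.max_dollars = max_dollars
        self.spent = 0.0
        self.cache = {}

    def call(self, kind, prompt, llm):
        # Cheap model for routing/summarizing, strong model otherwise.
        model = "small" if kind in CHEAP_KINDS else "large"
        key = (model, prompt)
        if key in self.cache:
            return self.cache[key]  # cache hit: no new spend
        tokens = len(prompt.split())  # crude estimate, not a real tokenizer
        cost = tokens * PRICES[model]
        if self.spent + cost > self.max_dollars:
            raise BudgetExceeded(f"budget {self.max_dollars} would be exceeded")
        self.spent += cost
        result = llm(model, prompt)
        self.cache[key] = result
        return result

def fake_llm(model, prompt):
    # Stand-in for a real model call.
    return f"[{model}] {prompt}"

ctl = CostController(max_dollars=0.01)
first = ctl.call("route", "pick an agent for this task", fake_llm)
again = ctl.call("route", "pick an agent for this task", fake_llm)
hard = ctl.call("reason", "derive the result step by step", fake_llm)
```

The budget check runs before the call, so an over-budget request fails without spending anything.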

How do agents share information and stay coordinated?

Via explicit message passing and a shared state/scratchpad (often a blackboard the orchestrator owns), plus a defined protocol for what each handoff message must contain (task, context, expected output). Protocols such as MCP (Model Context Protocol) standardize how agents access tools and context. The key is that context handed between agents is deliberate, not assumed: each agent should receive exactly what it needs and nothing it must not see.

How do you keep a multi-agent loop from running forever?

Enforce guardrails: a max-step / max-depth counter, a cost or token budget, a wall-clock timeout, and a no-progress detector (stop if state hasn't changed across iterations). The orchestrator owns these limits and forces termination, returning the best partial result with a clear status rather than spinning.
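
The four guardrails can be sketched as one object the orchestrator consults after every iteration. The class name, default limits, and the `step_fn` contract (returning the new state, tokens used, and a done flag) are assumptions made for this example.

```python
import time

_UNSET = object()  # sentinel so a state of None is still comparable

class Guardrails:
    def __init__(self, max_steps=10, max_tokens=50_000,
                 timeout_s=60.0, patience=2):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.timeout_s = timeout_s
        self.patience = patience  # allowed consecutive unchanged states
        self.start = time.monotonic()
        self.steps = 0
        self.tokens = 0
        self.stale = 0
        self.last_state = _UNSET

    def check(self, state, tokens_used):
        """Record one iteration; return a stop reason or None to continue."""
        self.steps += 1
        self.tokens += tokens_used
        self.stale = self.stale + 1 if state == self.last_state else 0
        self.last_state = state
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.tokens >= self.max_tokens:
            return "budget"
        if time.monotonic() - self.start >= self.timeout_s:
            return "timeout"
        if self.stale >= self.patience:
            return "no_progress"
        return None

def run(step_fn, state, guard):
    """Loop until the step reports done or a guardrail trips."""
    while True:
        state, tokens, done = step_fn(state)
        if done:
            return {"status": "done", "result": state}
        reason = guard.check(state, tokens)
        if reason:
            # Return the best partial result with a clear status.
            return {"status": f"stopped:{reason}", "result": state}

stuck = run(lambda s: (s, 100, False), "same", Guardrails())
```

A step function that never changes the state is stopped by the no-progress detector well before the step limit is reached.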

How do you evaluate and debug a multi-agent system?

Trace every agent's inputs, outputs, and tool calls (observability is essential), evaluate per-agent (did the researcher find good sources?) AND end-to-end (did the final answer meet the goal?), and use an LLM-as-judge or deterministic checks on outputs. Reproduce failures by replaying the trace; most bugs are bad handoffs or unbounded loops, which the trace makes visible.
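
Tracing and replay can be sketched with a thin wrapper around each agent. The `Tracer` class and `replay` helper are hypothetical, and the sketch assumes agent inputs and outputs are JSON-serializable; tool calls could be recorded as additional events in the same way.

```python
import json

class Tracer:
    def __init__(self):
        self.events = []

    def wrap(self, name, agent_fn):
        """Return a version of agent_fn that records each call."""
        def traced(payload):
            output = agent_fn(payload)
            self.events.append(
                {"agent": name, "input": payload, "output": output}
            )
            return output
        return traced

    def dump(self):
        return json.dumps(self.events)

def replay(trace_json, agents):
    """Re-run every recorded input and return the indices of steps
    whose output no longer matches the trace."""
    mismatches = []
    for i, event in enumerate(json.loads(trace_json)):
        if agents[event["agent"]](event["input"]) != event["output"]:
            mismatches.append(i)
    return mismatches

# Stub agents; real ones would call an LLM.
agents = {
    "researcher": lambda q: f"notes:{q}",
    "reviewer": lambda draft: "pass" if "notes" in draft else "fail",
}
tracer = Tracer()
researcher = tracer.wrap("researcher", agents["researcher"])
reviewer = tracer.wrap("reviewer", agents["reviewer"])
verdict = reviewer(researcher("sales data"))
trace = tracer.dump()
```

Replaying the trace against the unchanged agents reports no mismatches; swapping in a modified agent pinpoints which step diverged. With real LLM agents, outputs are not deterministic, so an exact-match comparison would need to be relaxed.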

Common pitfalls

Where it shows up
