🧠 Multi-Agent Orchestration — AI / ML Interview Guide
Agentic Systems · interactive visualization + interview prep
What it is
Multi-agent orchestration solves a problem by coordinating several specialized LLM agents instead of asking one agent to do everything. Each agent has a focused role, its own prompt/tools, and a narrow responsibility; an orchestration layer routes work between them, passes messages, and decides who acts next. The win is separation of concerns — a planner that decomposes, workers that execute, a reviewer that checks — so each agent's context stays small and on-task. The cost is coordination: more LLM calls, more places to fail, and the need for clear protocols and stop conditions.
Mental model
Don't picture smarter agents; picture an ORG CHART. One coordinator owns the plan and the handoffs; each worker owns exactly one skill and a small, focused context. Progress is messages moving between roles, and the system is "done" only when the coordinator says so or a guardrail trips. The hard part is rarely the agents' intelligence. It is the WIRING: who acts next, what context they get, and when to stop.
Theory
Multi-agent orchestration coordinates several specialized LLM agents instead of asking one agent to do everything. Each agent has a focused role, its own prompt and tools, and a narrow responsibility; an orchestration layer routes work, passes messages, and decides who acts next. The benefit is separation of concerns — each agent's context stays small and on-task; the cost is coordination overhead.
There are three canonical patterns. SUPERVISOR/WORKER centralizes control in one agent that routes to specialists — best when you need a clear owner of "who acts and when to stop". PLANNER/EXECUTOR splits "think then do" into a decomposition step and execution — best when an inspectable upfront plan helps. Peer HANDOFFS let agents transfer control directly — natural for pipelines (triage → specialist) but harder to keep from looping without a coordinator.
The useful "math" is accounting, not equations. Cost ≈ Σ over agents of (calls × tokens × price), so every added agent and handoff adds LLM spend. Latency ≈ the sum of steps on the critical path, though independent subtasks can run in parallel. Reliability ≈ the PRODUCT of per-step success probabilities, so chaining N agents compounds error: five steps that each succeed 95% of the time succeed end to end only about 77% of the time. A verifier/reviewer step raises end-to-end success because it catches a failed step and lets it be retried before the error propagates.
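The accounting can be worked through with a few lines of arithmetic. This is a minimal sketch: the agent names, call counts, token counts, prices, and the single-retry verifier model are all made-up assumptions for illustration, not measurements.

```python
from math import prod

# Hypothetical usage per agent: (LLM calls, average tokens per call, USD per 1,000 tokens).
USAGE = {
    "supervisor": (4, 1_000, 0.01),
    "researcher": (3, 2_000, 0.002),
    "coder": (2, 3_000, 0.01),
    "reviewer": (1, 2_000, 0.01),
}

def total_cost(usage):
    """Cost = sum over agents of calls * tokens * price."""
    return sum(calls * tokens / 1_000 * price for calls, tokens, price in usage.values())

def critical_path_latency(stages):
    """Stages run one after another; steps inside a stage run in parallel,
    so each stage takes as long as its slowest step."""
    return sum(max(stage) for stage in stages)

def chain_reliability(step_probs):
    """End-to-end success is the product of per-step success probabilities."""
    return prod(step_probs)

def with_verified_retry(p, catch_rate):
    """Success probability of one step when a verifier catches a failure with
    probability catch_rate and the step then gets exactly one retry.
    Assumes the retry succeeds independently with the same probability p."""
    return p + (1 - p) * catch_rate * p

if __name__ == "__main__":
    print("cost (USD):", round(total_cost(USAGE), 4))
    print("latency (s):", critical_path_latency([[2.0], [5.0, 3.0], [4.0]]))
    print("no verifier:", round(chain_reliability([0.9] * 4), 3))
    print("with verifier:", round(chain_reliability([with_verified_retry(0.9, 0.8)] * 4), 3))
```

Under these assumed numbers, four steps at 0.9 each succeed end to end about 66% of the time, and a verifier that catches 80% of failures lifts that to about 89%. The verifier itself adds calls, so the cost line grows as the reliability line improves.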
Coordination requires deliberate information flow: explicit message passing plus a shared state/scratchpad (a "blackboard" the orchestrator owns), and a defined protocol for what each handoff must contain (task, context, expected output). Context handed between agents should be intentional — exactly what the receiver needs and nothing it must not see — not assumed.
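One way to make the handoff protocol concrete is a small message type plus an orchestrator-owned blackboard that hands each receiver a filtered view. The class and function names here (Handoff, Blackboard, make_handoff) are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)
class Handoff:
    """One message between agents: task, context, and expected output."""
    to_agent: str
    task: str
    context: dict
    expected_output: str

@dataclass
class Blackboard:
    """Shared state owned by the orchestrator; agents only see filtered views."""
    entries: dict = field(default_factory=dict)

    def write(self, key: str, value: Any) -> None:
        self.entries[key] = value

    def view(self, allowed_keys: set) -> dict:
        """Return only the keys the receiver is allowed to see."""
        return {k: v for k, v in self.entries.items() if k in allowed_keys}

def make_handoff(board: Blackboard, to_agent: str, task: str,
                 needs: set, expected_output: str) -> Handoff:
    """Build a handoff, failing loudly if required context is missing
    instead of letting the receiver act on incomplete information."""
    missing = needs - set(board.entries)
    if missing:
        raise KeyError(f"cannot hand off to {to_agent}: missing {sorted(missing)}")
    return Handoff(to_agent, task, board.view(needs), expected_output)

if __name__ == "__main__":
    board = Blackboard()
    board.write("research_notes", "three sources summarized")
    board.write("api_key", "secret")  # must never reach a worker
    print(make_handoff(board, "coder", "plot the trend",
                       {"research_notes"}, "a runnable script"))
```

Listing what the receiver needs, rather than passing the whole blackboard, covers both halves of the rule: missing context raises an error at handoff time, and anything not requested stays out of the receiver's prompt.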
Two non-negotiables: guardrails and observability. Always bound the system (max steps/depth, token/dollar budget, timeout, no-progress detector) so it cannot spin forever, and trace every agent's inputs, outputs, and tool calls; without traces the system is nearly impossible to debug. The default advice: start with a single well-prompted agent and only split when you can name the concrete benefit.
Concrete example
Imagine "build me a small data-analysis report" handled by a crew. A SUPERVISOR agent reads the goal and delegates: it asks a RESEARCH agent to gather and summarize sources, hands those notes to a CODE agent that writes and runs the analysis/plots, then routes the draft to a REVIEW agent that checks correctness and style. If the reviewer flags a bug, the supervisor loops back to the code agent with the feedback; once review passes, the supervisor assembles the final answer. No single agent holds the whole job in its head — the supervisor owns the plan and the handoffs, each worker owns one skill.
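The report crew can be sketched as a plain control loop. The three workers below are stubs standing in for LLM-backed agents (the first code draft contains a deliberate "bug" so the review loop has something to catch), and the revision cap is an assumed guardrail, not something the example above specifies.

```python
from typing import Optional

def research_agent(goal: str) -> str:
    # Stub: a real agent would search and summarize sources.
    return f"notes on: {goal}"

def code_agent(notes: str, feedback: Optional[str]) -> str:
    # Stub: the first draft is flawed; any feedback produces a fixed draft.
    return "analysis v2 (fixed)" if feedback else "analysis v1 (bug)"

def review_agent(draft: str) -> Optional[str]:
    """Return feedback if the draft fails review, or None if it passes."""
    return "fix the bug" if "bug" in draft else None

def supervisor(goal: str, max_revisions: int = 3) -> dict:
    """Own the plan and the handoffs: research, then code/review until it passes."""
    notes = research_agent(goal)
    feedback, draft = None, ""
    for revision in range(1, max_revisions + 1):
        draft = code_agent(notes, feedback)
        feedback = review_agent(draft)
        if feedback is None:
            # Review passed: the supervisor assembles the final answer.
            return {"status": "done", "revisions": revision,
                    "report": f"{notes}\n{draft}"}
    # Guardrail tripped: return the best partial result with a clear status.
    return {"status": "revision_limit", "revisions": max_revisions, "report": draft}

if __name__ == "__main__":
    print(supervisor("quarterly sales trend"))
```

Each worker sees only its own inputs (the coder never sees the original goal, the reviewer never sees the notes), which is the "small, focused context" the example describes.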
Key equations
No closed-form equations: orchestration is a control-flow / systems pattern, not a numeric model. The useful "math" is the cost and reliability accounting:
- Total cost ≈ Σ_agents (calls × tokens × price). Multi-agent multiplies LLM calls, so cost can balloon vs a single agent.
- Latency ≈ Σ steps on the critical path (sequential handoffs add up); independent subtasks can run in parallel to cut wall-clock time.
- Reliability ≈ Π_steps p(step succeeds). Chaining N agents compounds error unless you add checks/retries, so a reviewer or validator raises end-to-end success.
- Always bound the loop: max_steps / max_cost / max_depth, so the system cannot spin forever.
Step by step
- Define roles: write a focused system prompt and a tool set for each agent (planner, researcher, coder, reviewer, etc.). Narrow roles keep context small and behavior predictable.
- Pick an orchestration pattern: SUPERVISOR/WORKER (a central agent routes to specialists), PLANNER/EXECUTOR (one decomposes into a task list, others carry it out), or peer HANDOFFS (agents pass control directly).
- Decompose the goal: the supervisor or planner turns the request into subtasks and decides ordering and dependencies (some can run in parallel, some must be sequential).
- Pass messages: route each subtask and the relevant context to the chosen agent. Keep a shared scratchpad/state so results accumulate without resending everything.
- Execute and observe: each worker runs (calling tools as needed) and returns a result. The orchestrator inspects it and decides the next move.
- Review / verify: a critic or reviewer agent (or a deterministic check) validates outputs; failures loop back to the responsible agent with feedback.
- Terminate: stop when the goal is met OR a guardrail trips (max steps, budget, or no-progress detector), then the supervisor synthesizes the final answer.
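The decomposition and dispatch steps can be sketched with the standard library: a plan expressed as subtasks with dependencies, executed in waves so that independent subtasks run in parallel. The plan contents and the run_subtask stub are hypothetical; a real worker would call an agent instead of formatting a string.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical plan: each subtask maps to the set of subtasks it depends on.
PLAN = {
    "gather_sources": set(),
    "load_dataset": set(),
    "run_analysis": {"load_dataset"},
    "write_report": {"gather_sources", "run_analysis"},
}

def run_subtask(name: str, inputs: dict) -> str:
    # Stand-in for dispatching the subtask and its context to a worker agent.
    return f"{name} done using {sorted(inputs)}"

def execute(plan: dict):
    """Run the plan wave by wave; returns (results, waves)."""
    sorter = TopologicalSorter(plan)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # subtasks whose dependencies are done
            waves.append(ready)
            futures = {
                name: pool.submit(run_subtask, name,
                                  {dep: results[dep] for dep in plan[name]})
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()  # shared scratchpad accumulates results
                sorter.done(name)
    return results, waves

if __name__ == "__main__":
    results, waves = execute(PLAN)
    for wave in waves:
        print(wave)
```

This sketch waits for a whole wave before starting the next one, which is simpler than strictly necessary; a production scheduler would start each subtask as soon as its own dependencies finish.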
Interview questions & answers
When should you use multiple agents instead of one?
Use multi-agent when the task has distinct sub-skills or tool sets that benefit from isolated context (research vs code vs review), when you want parallelism across independent subtasks, or when one agent's context window/prompt would otherwise become an unfocused mess. For simple or tightly-coupled tasks a single well-prompted agent with tools is cheaper, lower-latency, and easier to debug — start there and only split when you can name the benefit.
What are the main failure modes of multi-agent systems?
Infinite or ping-pong loops (agents bouncing work back and forth), cost/latency blowup from too many LLM calls, context loss across handoffs (an agent lacks info another had), error compounding down a long chain, and ambiguous ownership where two agents do the same work or each assumes the other will. Mitigate with hard step/budget limits, explicit shared state, a verifier step, and crisp role boundaries.
Supervisor/worker vs planner/executor vs handoffs — when each?
Supervisor/worker centralizes control: good when you need a clear owner deciding who acts and when to stop. Planner/executor splits "think then do": good when the task benefits from an explicit upfront plan you can inspect. Peer handoffs (agents transfer control directly) suit pipelines/workflows with a natural sequence (e.g. triage → specialist), but are harder to keep from looping without a coordinator.
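Peer handoffs are the pattern most prone to looping, so a sketch helps show where the bound goes. Each agent below is a stub that returns a reply plus the name of the next agent (or None to finish); the keyword routing and the max_hops value are assumptions made for illustration.

```python
def triage(msg: str):
    # Stub router: a real triage agent would classify with an LLM.
    if "refund" in msg or "invoice" in msg:
        return ("routing to billing", "billing")
    if "error" in msg or "crash" in msg:
        return ("routing to technical", "technical")
    return ("Could you share more detail?", None)

def billing(msg: str):
    return ("Billing has handled your request.", None)

def technical(msg: str):
    return ("Technical support has handled your request.", None)

AGENTS = {"triage": triage, "billing": billing, "technical": technical}

def run_handoffs(msg: str, start: str = "triage", max_hops: int = 4) -> dict:
    """Follow handoffs until an agent finishes or the hop limit trips."""
    current, path, reply = start, [], ""
    while current is not None:
        if len(path) >= max_hops:
            # No coordinator exists to notice a loop, so the runner must.
            return {"status": "hop_limit", "path": path, "reply": reply}
        path.append(current)
        reply, current = AGENTS[current](msg)
    return {"status": "done", "path": path, "reply": reply}

if __name__ == "__main__":
    print(run_handoffs("I need a refund for my last invoice"))
```

The hop limit lives in the runner rather than in any agent, because two agents that keep handing work to each other each behave reasonably from their own point of view.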
How do you control cost in a multi-agent system?
Cap calls/steps and a token/dollar budget per run; use a cheaper/smaller model for routing, summarizing, and simple workers and reserve the strong model for hard reasoning; summarize or trim shared state instead of resending full history; cache repeated tool/LLM results; and prefer a single agent when the extra agents don't demonstrably improve the outcome.
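Three of these levers (model tiering, caching, and a per-run budget) fit in a short sketch. The model names, prices, task kinds, and the word-count token estimate are all hypothetical placeholders, and call_llm fakes the model response rather than calling a real API.

```python
from functools import lru_cache

# Hypothetical model names and prices (USD per 1,000 tokens); not real price lists.
PRICES = {"small-model": 0.0005, "large-model": 0.01}
CHEAP_TASKS = {"route", "summarize", "classify"}

def pick_model(task_kind: str) -> str:
    """Cheap model for routing and summarizing, strong model for everything else."""
    return "small-model" if task_kind in CHEAP_TASKS else "large-model"

class Budget:
    """Per-run dollar budget that refuses a call before it would overspend."""
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, model: str, tokens: int) -> None:
        cost = tokens / 1_000 * PRICES[model]
        if self.spent_usd + cost > self.limit_usd:
            raise RuntimeError("budget exceeded")
        self.spent_usd += cost

budget = Budget(limit_usd=0.05)  # module-level for brevity; pass it explicitly in real code

@lru_cache(maxsize=256)
def call_llm(task_kind: str, prompt: str) -> str:
    """Fake LLM call: identical (task_kind, prompt) pairs are served from cache."""
    model = pick_model(task_kind)
    budget.charge(model, tokens=len(prompt.split()) * 2)  # crude token estimate
    return f"[{model}] response to: {prompt}"

if __name__ == "__main__":
    call_llm("summarize", "condense these notes")
    call_llm("summarize", "condense these notes")  # cache hit, no charge
    print(f"spent so far: ${budget.spent_usd:.6f}")
```

Caching only pays off when prompts repeat exactly, which is another reason to trim shared state: a prompt that carries the full, ever-growing history never matches a cached entry.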
How do agents share information and stay coordinated?
Via explicit message passing and a shared state/scratchpad (often a blackboard the orchestrator owns), plus a defined protocol for what each handoff message must contain (task, context, expected output). Protocols such as the Model Context Protocol (MCP) standardize how agents access tools and context. The key is that context handed between agents is deliberate, not assumed: each agent should receive exactly what it needs and nothing it must not see.
How do you keep a multi-agent loop from running forever?
Enforce guardrails: a max-step / max-depth counter, a cost or token budget, a wall-clock timeout, and a no-progress detector (stop if state hasn't changed across iterations). The orchestrator owns these limits and forces termination, returning the best partial result with a clear status rather than spinning.
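The four guardrails can live in one orchestrator-owned loop. This is a minimal sketch under assumed conventions: step(state) returns the new state and the cost of that step, state supports equality comparison, and the default limits are arbitrary.

```python
import time

def run_bounded(step, is_done, state, max_steps=10, budget_usd=1.0,
                timeout_s=30.0, patience=2):
    """Run step(state) -> (new_state, cost_usd) until done or a guardrail trips."""
    started = time.monotonic()
    spent, stalled, steps = 0.0, 0, 0

    def finish(status):
        # Always hand back the best state so far together with a clear status.
        return {"status": status, "state": state, "steps": steps, "spent_usd": spent}

    while not is_done(state):
        if steps >= max_steps:
            return finish("max_steps")
        if spent >= budget_usd:
            return finish("budget_exhausted")
        if time.monotonic() - started > timeout_s:
            return finish("timeout")
        if stalled >= patience:
            return finish("no_progress")
        new_state, cost = step(state)
        steps += 1
        spent += cost
        # No-progress detector: count consecutive steps that left the state unchanged.
        stalled = stalled + 1 if new_state == state else 0
        state = new_state
    return finish("done")

if __name__ == "__main__":
    print(run_bounded(lambda s: (s + 1, 0.1), lambda s: s >= 3, 0))  # finishes normally
    print(run_bounded(lambda s: (s, 0.0), lambda s: False, 0))       # stuck: no_progress
    print(run_bounded(lambda s: (1 - s, 0.01), lambda s: False, 0))  # ping-pong: max_steps
```

The last call shows why the limits are layered: a ping-pong loop changes the state on every step, so this simple no-progress detector never fires and the step cap is what ends the run.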
How do you evaluate and debug a multi-agent system?
Trace every agent's inputs, outputs, and tool calls (observability is essential), evaluate per-agent (did the researcher find good sources?) AND end-to-end (did the final answer meet the goal?), and use an LLM-as-judge or deterministic checks on outputs. Reproduce failures by replaying the trace; most bugs are bad handoffs or unbounded loops, which the trace makes visible.
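Tracing can start as a decorator that records every agent call, with deterministic checks run over the recorded outputs. The agent stubs, the in-memory TRACE list, and the check functions are hypothetical; a real system would write traces to durable storage.

```python
import functools

TRACE = []  # in-memory log of agent calls, in execution order

def traced(agent_name):
    """Record inputs and output (or error) of every call to the wrapped agent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record = {"agent": agent_name, "input": {"args": args, "kwargs": kwargs}}
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                TRACE.append(record)  # recorded whether the call succeeded or failed
        return wrapper
    return decorator

@traced("researcher")
def research(goal):
    return [f"source about {goal}"]

@traced("writer")
def write(sources):
    return f"report citing {len(sources)} source(s)"

def per_agent_checks(trace, checks):
    """Apply each agent's deterministic check to its recorded output."""
    return [(r["agent"], "error" not in r and checks[r["agent"]](r["output"]))
            for r in trace]

if __name__ == "__main__":
    report = write(research("solar adoption"))
    checks = {"researcher": lambda out: len(out) > 0,
              "writer": lambda out: "report" in out}
    print(per_agent_checks(TRACE, checks))  # per-agent evaluation
    print("1 source" in report)             # end-to-end evaluation
```

Because each record holds the exact inputs an agent received, a failed run can be reproduced by feeding a recorded input back to that one agent in isolation.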
Common pitfalls
- Infinite / ping-pong loops between agents — always cap steps, budget, and depth, and add a no-progress detector.
- Cost blowup: every agent and every handoff adds LLM calls; a multi-agent system can cost several times as much as a single-agent solution with no quality gain.
- Over-engineering: reaching for a crew when one agent with tools would do — more agents means more coordination surface and more bugs.
- Context loss across handoffs: an agent acts without info a previous agent had; pass deliberate, sufficient context (and nothing it shouldn't see).
- Error compounding: a mistake early in a long chain corrupts everything downstream; insert verification/review steps.
- Ambiguous ownership: unclear roles cause duplicated or dropped work — give each agent one crisp responsibility.
- Missing observability: without per-agent traces these systems are nearly impossible to debug.
Where it shows up
- Agent frameworks: LangGraph, AutoGen, CrewAI, OpenAI Swarm — all encode supervisor/worker or handoff orchestration.
- Coding assistants that split planning, editing, and test/review across sub-agents.
- Deep-research / report-generation systems that fan out retrieval to worker agents and synthesize results.
- Customer-support routing: a triage agent hands off to specialist agents (billing, technical, account).
- The Model Context Protocol (MCP) — standardizing how agents and tools exchange context across a system.