🧠 ReAct Agent Loop — AI / ML Interview Guide
Agentic Systems · interactive visualization + interview prep
Open the interactive ReAct Agent Loop visualization on PrepGrind → Step through a live animation, tune the parameters, and read the full theory, math, reference code, and interview Q&A below — free, in your browser.
What it is
A ReAct agent solves a task by interleaving REASoning and ACTing: it writes a Thought, takes an Action (calls a tool), reads the Observation (the tool’s result), and loops — using each observation to decide the next step — until it has enough to give a final Answer.
Mental model
A detective working a case out loud. Instead of guessing the whole answer, the agent forms a hypothesis (Thought), checks ONE clue (Action → tool), reads what it reveals (Observation), updates, and repeats. The growing trace IS its working memory; each observation is a fact it did not have a moment ago, so the next thought is grounded in reality rather than invented.
Theory
ReAct (Reason + Act) interleaves two things an LLM does well separately but poorly alone. Pure chain-of-thought reasons but never checks the world, so it hallucinates facts. Pure tool-calling acts but does not deliberate about what to do next. ReAct alternates them: a Thought decides what is needed, an Action gets it, an Observation grounds the next Thought.
Mechanically it is a loop over a growing text trace. The LLM generates a Thought and an Action; the environment (your code) executes the action and returns an Observation; that observation is appended to the context so the next generation is conditioned on it. The accumulating Thought/Action/Observation trace is the agent's scratchpad and short-term memory.
The reason to interleave rather than plan everything up front is that real tasks depend on information you do not have yet — the population query needs the capital first. Reacting to each observation lets the agent adapt, recover from dead ends, and avoid committing to a stale plan. (When an explicit upfront plan IS worth it, see the Planning concept; hybrids plan then re-plan.)
Termination is a first-class concern: the loop ends when the model emits a final Answer, or when a guardrail trips — max steps, a token/dollar budget, a wall-clock timeout, or a repetition/no-progress detector. Without these an agent can loop forever and burn unbounded cost.
Two practical failure surfaces dominate. Context bloat: large tool outputs appended verbatim blow up the context, so observations should be summarized or trimmed. And trust: the agent treats observations as ground truth, so a wrong or poisoned tool result derails the whole chain — tool output is an untrusted input, not gospel.
Concrete example
Ask "how many people live in the capital of France?" A ReAct agent thinks "I need the capital first", searches → "Paris", thinks "now I need its population", searches → "2.1M", then answers. It chained two tool calls, each informed by the last result — something a single LLM call can’t reliably do.
Key equations
loop: Thought → Action → Observation → (repeat) → Answerthe LLM generates Thought + Action; the environment returns the Observationeach Observation is appended to the context, conditioning the next Thoughta stop condition (Answer / max steps) ends the loop
Step by step
- Thought — the model reasons about what it needs next.
- Action — it calls a tool (search, calculator, API…) with arguments.
- Observation — the tool result comes back into the context.
- Loop — repeat, each step grounded in the previous observation.
- Answer — once it has enough, it stops and responds.
Interview questions & answers
Why interleave reasoning and acting instead of planning everything up front?
Real tasks need information you don’t have yet. ReAct lets the agent adapt: each observation can change the plan, reducing hallucination and handling dead ends — whereas a fixed upfront plan can’t react to what tools actually return.
What stops a ReAct loop from running forever?
A stop condition: the model emits a final Answer action, or you cap max steps / a budget. Production agents also add loop/repetition detection and timeouts.
How is the Observation fed back to the model?
It’s appended to the conversation/context as text (or a tool message), so the next generation is conditioned on it. The growing trace is the agent’s working memory.
ReAct vs plain chain-of-thought?
CoT reasons internally with no external actions — it can still hallucinate facts. ReAct grounds reasoning in real tool observations, so answers are checkable and current.
Common pitfalls
- No step cap → infinite or runaway loops (and runaway cost).
- Feeding huge observations back → context bloat; summarize/trim tool results.
- Trusting tool output blindly — bad observations derail the whole chain.
Where it shows up
- LangChain / LlamaIndex agents, the original ReAct paper
- Tool-using assistants & coding agents
- Function-calling agent loops over APIs
More AI / ML interview concepts
- Neural Networks & Backpropagation
- Gradient Descent & Optimizers
- Activation Functions
- K-Means Clustering
- Self-Attention
- Multi-Head Attention
- Softmax, Temperature & Sampling
- Tokenization (Byte-Pair Encoding)
- Positional Encoding
- KV Cache
- Rotary Position Embedding (RoPE)
- The Transformer Block
- Normalization (LayerNorm / RMSNorm)
- Multi-Query & Grouped-Query Attention
- Flash Attention
- Decoding: Beam Search & Speculative Decoding
- Embeddings & Cosine Similarity
- RAG (Retrieval-Augmented Generation) Pipeline
- Vector Search (HNSW)
- Chunking & Reranking
- Tool / Function Calling
- Multi-Agent Orchestration
- Planning & Task Decomposition
- Agent Memory
- Model Context Protocol (MCP)
- Quantization
- LoRA / PEFT Fine-Tuning
- Mixture of Experts (MoE)
- RLHF / DPO Alignment
- Evals & LLM-as-Judge
- Prompt Injection & Guardrails
- Knowledge Distillation
PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…