🧠 ReAct Agent Loop — AI / ML Interview Guide

Agentic Systems · interactive visualization + interview prep


What it is

A ReAct agent solves a task by interleaving reasoning (Re) and acting (Act): it writes a Thought, takes an Action (calls a tool), reads the Observation (the tool’s result), and loops, using each observation to decide the next step, until it has enough information to give a final Answer.

Mental model

A detective working a case out loud. Instead of guessing the whole answer, the agent forms a hypothesis (Thought), checks ONE clue (Action → tool), reads what it reveals (Observation), updates, and repeats. The growing trace IS its working memory; each observation is a fact it did not have a moment ago, so the next thought is grounded in reality rather than invented.

Theory

ReAct (Reason + Act) combines two capabilities that each fall short on their own. Pure chain-of-thought reasons but never checks the world, so it can hallucinate facts. Pure tool-calling acts but does not deliberate about what to do next. ReAct alternates them: a Thought decides what is needed, an Action gets it, an Observation grounds the next Thought.

Mechanically it is a loop over a growing text trace. The LLM generates a Thought and an Action; the environment (your code) executes the action and returns an Observation; that observation is appended to the context so the next generation is conditioned on it. The accumulating Thought/Action/Observation trace is the agent's scratchpad and short-term memory.
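The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `scripted_model` is a hypothetical stand-in for a real LLM call that picks its next step from the observations already in the trace, and `search` is a hypothetical tool backed by a hard-coded dictionary.

```python
# Hypothetical tool: a hard-coded lookup standing in for a real search API.
FACTS = {"capital of France": "Paris", "population of Paris": "2.1M"}

def search(query):
    return FACTS.get(query, "no result")

TOOLS = {"search": search}
PREFIX = "Observation: "

def scripted_model(trace):
    """Stand-in for an LLM call: chooses the next step from the observations so far."""
    seen = [line[len(PREFIX):] for line in trace if line.startswith(PREFIX)]
    if len(seen) == 0:
        return {"thought": "I need the capital first.",
                "tool": "search", "input": "capital of France"}
    if len(seen) == 1:
        return {"thought": "Now I need its population.",
                "tool": "search", "input": f"population of {seen[0]}"}
    return {"thought": "I have enough.", "answer": f"About {seen[-1]} people"}

def react(question, model=scripted_model, max_steps=5):
    trace = [f"Question: {question}"]            # the growing scratchpad
    for _ in range(max_steps):
        step = model(trace)                      # LLM: Thought plus Action or Answer
        trace.append(f"Thought: {step['thought']}")
        if "answer" in step:
            trace.append(f"Answer: {step['answer']}")
            return step["answer"], trace
        trace.append(f"Action: {step['tool']}[{step['input']}]")
        observation = TOOLS[step["tool"]](step["input"])  # environment executes the tool
        trace.append(f"Observation: {observation}")       # fed back as context
    return None, trace                           # step cap reached without an answer

answer, trace = react("How many people live in the capital of France?")
print("\n".join(trace))
```

The second search query is built from the first observation, which is the point of the pattern: the model cannot write that query until the environment has returned "Paris".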

The reason to interleave rather than plan everything up front is that real tasks depend on information you do not have yet. To answer "how many people live in the capital of France?", for instance, the population query cannot be written until the capital is known. Reacting to each observation lets the agent adapt, recover from dead ends, and avoid committing to a stale plan. (When an explicit upfront plan IS worth it, see the Planning concept; hybrids plan then re-plan.)

Termination is a first-class concern: the loop ends when the model emits a final Answer, or when a guardrail trips — max steps, a token/dollar budget, a wall-clock timeout, or a repetition/no-progress detector. Without these an agent can loop forever and burn unbounded cost.
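A rough sketch of how those guardrails might be tracked alongside the loop. The class name `Guardrails`, its fields, and the default limits are all illustrative assumptions; real limits depend on the task and on how the provider reports token usage.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Guardrails:
    max_steps: int = 8
    max_tokens: int = 20_000
    max_seconds: float = 30.0
    max_repeats: int = 2      # identical actions tolerated before stopping
    started: float = field(default_factory=time.monotonic)
    steps: int = 0
    tokens: int = 0
    seen: dict = field(default_factory=dict)

    def record(self, action: str, tokens_used: int) -> None:
        """Call once per loop iteration, after the model has chosen an action."""
        self.steps += 1
        self.tokens += tokens_used
        self.seen[action] = self.seen.get(action, 0) + 1

    def tripped(self):
        """Return the reason to stop, or None if the loop may continue."""
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.tokens >= self.max_tokens:
            return "token_budget"
        if time.monotonic() - self.started >= self.max_seconds:
            return "timeout"
        if any(count > self.max_repeats for count in self.seen.values()):
            return "repetition"
        return None

# Usage: an agent stuck issuing the same action trips the repetition check.
guard = Guardrails(max_steps=10)
for _ in range(3):
    guard.record("search[capital of France]", tokens_used=100)
reason = guard.tripped()
```

Counting exact duplicate actions is the simplest possible no-progress detector; it will miss loops where the agent rephrases the same query slightly each time.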

Two practical failure surfaces dominate. Context bloat: large tool outputs appended verbatim blow up the context, so observations should be summarized or trimmed. And trust: the agent treats observations as ground truth, so a wrong or poisoned tool result derails the whole chain — tool output is an untrusted input, not gospel.
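Both concerns can be handled at the point where an observation enters the context. The sketch below is one simple approach, with hypothetical helper names `trim_observation` and `as_untrusted`: it truncates long output rather than summarizing it, and wraps the result in delimiters so the system prompt can instruct the model to treat the contents as data. Delimiters alone do not stop prompt injection; they only make the boundary explicit.

```python
def trim_observation(text: str, max_chars: int = 500) -> str:
    """Keep the head and tail of a long tool result and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    half = max(1, max_chars // 2)
    dropped = len(text) - 2 * half
    return f"{text[:half]}\n[... {dropped} characters omitted ...]\n{text[-half:]}"

def as_untrusted(tool_name: str, text: str) -> str:
    """Mark tool output as data, not instructions (the tag format is an assumption)."""
    return f"<tool_output name={tool_name!r}>\n{text}\n</tool_output>"

# Usage: a 10,000-character result is cut to roughly 200 characters plus a marker.
raw = "x" * 10_000
trimmed = trim_observation(raw, max_chars=200)
observation = as_untrusted("search", trimmed)
```

Keeping both ends of the output is a deliberate choice here: error messages and totals often sit at the end of a tool result, so head-only truncation can discard the most useful part.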

Concrete example

Ask "how many people live in the capital of France?" A ReAct agent thinks "I need the capital first", searches → "Paris", thinks "now I need its population", searches → "2.1M", then answers. It chained two tool calls, each informed by the last result — something a single LLM call can’t reliably do.

Key equations

loop: Thought → Action → Observation → (repeat) → Answer
the LLM generates Thought + Action; the environment returns the Observation
each Observation is appended to the context, conditioning the next Thought
a stop condition (Answer / max steps) ends the loop
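The same four statements can be written symbolically. The notation is an assumption of this sketch, not something the guide defines: $c_t$ is the context at step $t$, $\tau_t$ the Thought, $a_t$ the Action, $o_t$ the Observation, $\pi_\theta$ the LLM, $\mathrm{env}$ the tool-executing environment, $\oplus$ concatenation, and $T_{\max}$ the step cap.

```latex
\begin{aligned}
c_0 &= \text{question} \\
(\tau_t, a_t) &\sim \pi_\theta(\,\cdot \mid c_t) && \text{the LLM generates Thought and Action} \\
o_t &= \mathrm{env}(a_t) && \text{the environment returns the Observation} \\
c_{t+1} &= c_t \oplus (\tau_t, a_t, o_t) && \text{the trace grows and conditions the next step} \\
&\text{stop when } a_t = \text{Answer} \ \text{or} \ t = T_{\max}
\end{aligned}
```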

Step by step

Thought — the model reasons about what it needs next.
Action — it calls a tool (search, calculator, API…) with arguments.
Observation — the tool result comes back into the context.
Loop — repeat, each step grounded in the previous observation.
Answer — once it has enough, it stops and responds.
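When the model emits its step as free text, the environment has to decide which of these cases it is looking at. The sketch below assumes a line-based format of `Action: tool[input]` and `Answer: ...`; that format and the function name `parse_step` are illustrative choices, and agents built on structured function-calling APIs would not need this parsing at all.

```python
import re

ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
ANSWER_RE = re.compile(r"^Answer:\s*(.+)$", re.MULTILINE)

def parse_step(text: str) -> dict:
    """Classify one model completion as a final answer, a tool call, or unparseable."""
    answer = ANSWER_RE.search(text)
    if answer:
        return {"kind": "answer", "text": answer.group(1).strip()}
    action = ACTION_RE.search(text)
    if action:
        return {"kind": "action", "tool": action.group(1), "input": action.group(2)}
    # Neither pattern matched: the caller can re-prompt or count this toward a retry cap.
    return {"kind": "invalid", "text": text}

# Usage
step = parse_step("Thought: I need the capital first.\nAction: search[capital of France]")
final = parse_step("Thought: I have enough.\nAnswer: About 2.1M people")
```

Returning an explicit "invalid" result instead of raising keeps malformed output inside the loop, where it can be handled like any other failed step.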

Interview questions & answers

Why interleave reasoning and acting instead of planning everything up front?

Real tasks need information you don’t have yet. ReAct lets the agent adapt: each observation can change the plan, reducing hallucination and handling dead ends — whereas a fixed upfront plan can’t react to what tools actually return.

What stops a ReAct loop from running forever?

A stop condition: the model emits a final Answer, or the loop hits a cap on steps or on token/dollar budget. Production agents also add loop/repetition detection and timeouts.

How is the Observation fed back to the model?

It’s appended to the conversation/context as text (or a tool message), so the next generation is conditioned on it. The growing trace is the agent’s working memory.

ReAct vs plain chain-of-thought?

CoT reasons internally with no external actions — it can still hallucinate facts. ReAct grounds reasoning in real tool observations, so answers are checkable and current.

Common pitfalls

No step cap → infinite or runaway loops (and runaway cost).
Feeding huge observations back → context bloat; summarize/trim tool results.
Trusting tool output blindly — bad observations derail the whole chain.

Where it shows up

LangChain / LlamaIndex agents, the original ReAct paper
Tool-using assistants & coding agents
Function-calling agent loops over APIs
