🧠 Tool / Function Calling — AI / ML Interview Guide

Agentic Systems · interactive visualization + interview prep

Open the interactive Tool / Function Calling visualization on PrepGrind → Step through a live animation, tune the parameters, and read the full theory, math, reference code, and interview Q&A below — free, in your browser.

What it is

An LLM can’t fetch live data or do exact math on its own. Function calling fixes that: you give the model a list of tools (name + typed parameters); instead of answering directly, it emits a structured CALL (tool name + JSON args); your code runs the tool and feeds the result back; the model then answers using it.

Mental model

The LLM is a smart manager who cannot touch the tools. It writes a precise work order — WHICH tool and WHAT arguments, as structured JSON — and your code (the worker) executes it and reports back; the manager then interprets the result for the user. The model only ever CHOOSES; your runtime ACTS. The model never runs code or hits an API itself — a separation that is both the safety boundary and the source of every tool-use bug.

Theory

A language model is a text predictor: it cannot fetch live data, do exact arithmetic reliably, or cause side effects. Function (tool) calling bridges that gap by letting the model emit a STRUCTURED request to call external code instead of answering from memory — turning a closed system into one that can act on the world.

The contract has four parts. (1) You declare tools as schemas: name, a natural-language description, and a JSON-schema for the parameters. (2) The model, given the user message and these schemas, may emit a tool call {name, arguments} instead of prose. (3) Your runtime validates the args, executes the tool, and appends the result as a tool message. (4) The model reads the result and produces the final grounded answer — often looping through several calls first.

How does the model reliably produce valid JSON? It is trained on tool-use data, and most APIs additionally use CONSTRAINED DECODING (a grammar / JSON mode) that masks out tokens which would break the schema, so the output is guaranteed to parse. The description and schema quality directly determine whether the model picks the right tool with the right arguments.

Tool calling is the atomic primitive beneath agents. A single call is one Action+Observation; wrap it in a reasoning loop and you have a ReAct agent; standardize WHERE the tools live and how they are discovered and you have MCP. Understanding this layering is a common interview thread.

The risks are real because the model now triggers real effects: it can pick the wrong tool or arguments, be steered by prompt injection into unintended calls, or invoke something destructive. Mitigations are least-privilege tools, schema validation, allow-lists, explicit confirmation for destructive actions, and always feeding tool ERRORS back so the model can recover rather than stall.

Concrete example

Ask "what’s the weather in Tokyo?" The model returns get_weather({"city":"Tokyo"}) instead of guessing. Your backend calls the real weather API, returns {temp:18, condition:"cloudy"}, and the model replies "It’s 18°C and cloudy in Tokyo." The LLM chose the tool and the arguments; your code did the actual work.

Key equations

you provide: tools = [{ name, description, parameters (JSON schema) }]
model output: a tool call { name, arguments } (validated against the schema)
your runtime executes the tool → returns a result message
the result is appended to context; the model generates the final answer
often loops (multiple calls) before the final response

Step by step

User asks a question that needs external data or computation.
The model emits a structured function call (name + JSON arguments) — not prose.
Your code executes that tool and returns the result.
The model reads the result and writes the grounded final answer.

Interview questions & answers

Does the LLM execute the function?

No. The model only DECIDES which tool to call and with what arguments (a structured JSON object). Your application code executes it and returns the result. The model never runs code or hits APIs itself.

How does the model produce valid JSON arguments?

It’s trained/constrained to emit arguments matching the tool’s JSON schema; many APIs enforce it with constrained decoding (grammar/JSON mode) so the output always parses.

How does this relate to a ReAct agent?

Function calling is the Action+Observation mechanism. A ReAct loop is repeated tool calling with reasoning between calls until the task is done.

What are the main risks?

Wrong tool / wrong args, prompt-injection causing unintended calls, and side effects. Mitigate with schema validation, allow-lists, confirmation for destructive actions, and least-privilege tools.

Common pitfalls

Letting the model call tools with side effects without validation/confirmation.
Vague tool descriptions → the model picks the wrong tool or bad arguments.
Not handling tool errors — the model needs the error back to recover.

Where it shows up

OpenAI/Anthropic tool use & function calling APIs
Every tool-using agent framework (the Action step)
MCP servers exposing tools to models

More AI / ML interview concepts

PrepGrind runs entirely in your browser, free, no installation required. Loading the interactive playground…