Part 3: Tool Calls: How an LLM Takes Action

Part 3 of the Building with LLMs series. A tool call is just the model emitting structured text that asks your code to run a function. The model never acts itself.

Krishna C

April 28, 2026

•

4 min read

TL;DR

An LLM still only does text in, text out. A tool call is just the model emitting structured text that says "call this function with these arguments." Your code runs it, sends the result back as another message, and the model continues. The model never runs anything itself.

This is Part 3 of the Building with LLMs series. Part 1 showed that an LLM is a stateless function: text in, text out. Part 2 showed how to force that text into a structured shape your code can trust. Tool calling sits right on top of both. If a model can't fetch your order status or run a query, it's just a clever writer. Tools are how it actually does things.

The Model Still Only Talks

This is the part people get wrong. They picture the model reaching out and hitting an API. It doesn't. It can't. The model has no network, no database, no file system. All it can do is produce text.

A tool call is text. The model writes "I want to call get_weather with city = Paris," and that's the entire extent of its power. Something else has to read that and actually make the call. That something is your code.

You Hand It a Menu of Tools

The model can't request a tool it doesn't know exists. So in the request, alongside the system and user messages, you include a list of available tools. Each tool has a name, a description, and a parameter schema, the same JSON Schema idea from Part 2.

Handing over the menu doesn't give the model new abilities. It just tells the model what's on offer and how to ask for it. The description matters more than people think. It's how the model decides when a tool is the right move.

The Model Asks, It Doesn't Act

When the model decides a tool is needed, it doesn't return prose. It returns a structured tool-call request: the tool name and the arguments, constrained to that tool's schema. This is literally the structured output from Part 2, with a specific job.

The model then stops and waits. It has done all it can. It asked. Nothing has happened yet.

Your App Runs the Tool and Replies

Your code takes over. You parse the tool-call request, find the matching function, run it with the given arguments, and get a result. Then you append that result to the message chain as a tool message and send the whole thing back to the model, exactly the "resend the chain" idea from Part 1.

The model now sees the tool result in its input and writes a normal answer using it. From the model's side, nothing magic happened. It asked for something in text and got an answer back in text. You did the actual work in between.

Many Tools at Once

Older models asked for one tool, waited, then maybe asked for another. Slow when the calls don't depend on each other. Newer models can request several tool calls in a single turn.

Your code runs the independent ones together, collects every result, appends them all, and sends the chain back once. Fewer round trips, same pattern.

What This Looks Like in Code

You declare a tool with a schema, the same way you declared a response schema in Part 2:

1{
2  "name": "get_weather",
3  "description": "Get current weather for a city.",
4  "parameters": {
5    "type": "object",
6    "properties": { "city": { "type": "string" } },
7    "required": ["city"],
8    "additionalProperties": false
9  }
10}

The model replies with a tool-call request, not prose:

1{ "tool": "get_weather", "arguments": { "city": "Paris" } }

Your app parses it, runs the real function, and appends the result as a message:

1{ "role": "tool", "name": "get_weather", "content": "18C, clear" }

The loop that ties it together is small:

1messages = [system, user]
2loop:
3    response = llm.generate(messages, tools = TOOLS)
4    if response has tool calls:
5        for call in response.tool_calls:        # may be several
6            result = run(call.tool, call.arguments)
7            messages.append(tool_message(call, result))
8    else:
9        return response.text                    # model is done

That loop, run a tool, feed the result back, let the model continue, is the seed of what later becomes an agent.

When Tool Calls Go Wrong

Two kinds of failure, and you handle them differently.

The model's side:

It asks for a tool that doesn't exist, or misspells the name.
It sends arguments that don't match the schema, or invents a parameter.
It calls a tool when it shouldn't, or skips one it should have used.

Your side:

The function throws, times out, or the downstream API is down.
The tool returns something huge or malformed.
The arguments are valid JSON but nonsense, like a city that doesn't exist.

The trick for both: don't crash. Turn the failure into a tool message and send it back. Models are good at reading "that tool doesn't exist, here are the ones that do" or "the API timed out" and adjusting. Schema constraints from Part 2 prevent most malformed argument problems before they ever reach your code. Always cap retries so a stubborn failure can't loop forever.

Thoughts? Hit me up at [email protected]

Building with LLMs — full series

Part 3 of 22

Part 1:How an LLM Actually Works: It Has No Memory
Part 2:Structured Output: Getting Reliable Data Out of an LLM
Part 3:Tool Calls: How an LLM Takes Action(you are here)
Part 4:Streaming: Getting Tokens as the Model Generates Them
Part 5:Memory and Context Engineering (coming soon)
Part 6:Evaluating LLM Output (coming soon)
Part 7:Telemetry and Observability (coming soon)
Part 8:Human in the Loop (coming soon)
Part 9:RAG (coming soon)
Part 10:Agents (coming soon)
Part 11:Frameworks (coming soon)
Part 12:Model Context Protocol (MCP) (coming soon)
Part 13:Skills (coming soon)
Part 14:Agent-to-Agent Communication (A2A) (coming soon)
Part 15:Tokenomics and Cost (coming soon)
Part 16:Browser Use (coming soon)
Part 17:Computer Use (coming soon)
Part 18:Mobile Use (coming soon)
Part 19:LLM-Powered Frontends (coming soon)
Part 20:AI Gateways (coming soon)
Part 21:Design Patterns for Agentic AI (coming soon)
Part 22:Scaling AI Agents (coming soon)

← Previous

Part 4: Streaming: Getting Tokens as the Model Generates Them

Part 4 of the Building with LLMs series. Streaming changes when you get the text, not what. It feels faster, and it quietly breaks structured output.

Part 2: Structured Output: Getting Reliable Data Out of an LLM

Part 2 of the Building with LLMs series. An LLM only produces text. Structured output is how you force that text into a shape your code can parse every time.