Part 2: Structured Output: Getting Reliable Data Out of an LLM

Part 2 of the Building with LLMs series. An LLM only produces text. Structured output is how you force that text into a shape your code can parse every time.

Krishna C

April 21, 2026

TL;DR

An LLM only produces text. If you want to use that output inside a program, you need it structured and predictable, not a paragraph of prose. Structured output is how you force the model's text to fit a shape your code can parse, every single time.

This is Part 2 of the Building with LLMs series. In Part 1 we covered the one idea everything rests on: an LLM is a stateless function. Text in, text out. That's great when a human reads the answer. It's a problem the moment your code has to read it.

Text Is Hard for Code to Use

Say you ask a model to pull the name, email, and company out of an email signature. It might reply:

Sure! The person's name is Priya Nair, her email is [email protected], and she works at Acme Corp.

A human gets that instantly. Your code does not. Now you're writing fragile string parsing, and the next response phrases it differently and breaks everything. The model is helpful, chatty, and inconsistent. Code needs the opposite: boring and exact.

What you actually want back is this:

{ "name": "Priya Nair", "email": "[email protected]", "company": "Acme Corp" }

Same information. One is a story. The other is data.
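
To make the fragility concrete, here is a small sketch. The regex and the second phrasing are made up for illustration, and the email is left out to keep the pattern short. The point is only that a pattern tuned to one reply fails on the next one, while the JSON version needs no pattern at all.

```python
import json
import re

# Pattern written against one specific phrasing of the model's reply.
PATTERN = re.compile(r"name is (?P<name>[^,]+), .* works at (?P<company>[^.]+)\.")

def scrape(reply: str):
    """Return a dict scraped from prose, or None if the phrasing changed."""
    match = PATTERN.search(reply)
    return match.groupdict() if match else None

reply_a = "Sure! The person's name is Priya Nair, and she works at Acme Corp."
reply_b = "Priya Nair (Acme Corp) is the contact in this signature."  # same facts, new wording

scraped_a = scrape(reply_a)  # the pattern matches this phrasing
scraped_b = scrape(reply_b)  # the pattern finds nothing here

# The structured version is one standard-library call.
parsed = json.loads('{"name": "Priya Nair", "company": "Acme Corp"}')
```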

Asking Nicely Isn't Enough

The first thing everyone tries is asking for JSON in the prompt. "Respond only with JSON in this format." It works most of the time, which is the trap. The failures are ugly:

  • It wraps the JSON in a Markdown code fence.
  • It adds "Here is the JSON you asked for:" before it.
  • It invents an extra field, or renames one.
  • One time in fifty it returns a sentence instead.
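
This is why prompt-only JSON tends to grow a cleanup layer. A rough sketch of one, using only the standard library: `salvage_json` is a hypothetical helper name, and its heuristics only cover the code fence, the preamble, and the plain-sentence case.

```python
import json

def salvage_json(reply: str):
    """Best-effort recovery of a JSON object from a chatty reply.

    Returns the parsed dict, or None when nothing parseable is found.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return None  # e.g. the model answered with a plain sentence
    try:
        value = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None

fence = "`" * 3  # a Markdown code fence, built here to keep the example tidy
fenced = f'{fence}json\n{{"name": "Priya Nair"}}\n{fence}'
chatty = 'Here is the JSON you asked for: {"name": "Priya Nair"}'
prose = "The person's name is Priya Nair."

results = [salvage_json(fenced), salvage_json(chatty), salvage_json(prose)]
```

Even when this recovers an object, nothing in it notices a renamed or invented field. You are still guessing at what the model meant.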

"Most of the time" is fine for a demo and unacceptable in a workflow that runs ten thousand times a day. You need a guarantee, not a polite request.

Constrain the Output to a Schema

Many current model APIs let you attach a schema to the request, usually written as JSON Schema. You define the exact shape you want: which fields exist, their types, and which are required. The API then constrains the model's decoding so the text it produces has to fit that schema.

This is different from asking nicely. The model is not free to add a friendly preamble or a stray field, because tokens that would break the schema are not allowed during generation. You go from "usually valid" to "valid by construction."
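
A toy sketch of the idea, not how any particular provider implements it. At each step the decoder drops every candidate token that would make the output impossible to complete into a valid string, then picks the best of what remains. Here the "schema" is just a hypothetical set of two allowed outputs, and the "model" is a fixed score table that happens to prefer a chatty preamble.

```python
import json

# Toy "schema": the complete set of outputs we are willing to accept.
ALLOWED = ['{"ok": true}', '{"ok": false}']

# Toy "model": a fixed preference score per token. The preamble scores highest.
SCORES = {"Sure! ": 0.9, "maybe": 0.7, '{"ok": ': 0.5, "true": 0.4, "false": 0.3, "}": 0.2}

def fits(prefix: str) -> bool:
    """True if prefix can still be extended into an allowed output."""
    return any(full.startswith(prefix) for full in ALLOWED)

def decode(constrained: bool) -> str:
    """Greedy decoding, optionally masking tokens that would break the schema."""
    out = ""
    for _ in range(10):  # hard cap on steps
        if out in ALLOWED:
            break
        candidates = [t for t in SCORES if not constrained or fits(out + t)]
        if not candidates:
            break
        out += max(candidates, key=SCORES.get)
    return out

free_text = decode(constrained=False)           # keeps choosing the preamble token
valid = json.loads(decode(constrained=True))    # always parses in this toy setup
```

The scores never change between the two runs. Only the set of tokens the decoder is allowed to pick from does.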

A schema for the example above looks like this:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "company": { "type": "string" }
  },
  "required": ["name", "email", "company"],
  "additionalProperties": false
}

You send your prompt plus this schema. You get back JSON that matches it. Your code parses it directly.
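
It can still be handy to run the same shape check on your side, for example in tests or when a provider doesn't support constrained decoding. The sketch below hand-checks only the keywords this particular schema uses (`required`, `additionalProperties`, and string `type`). It is not a general JSON Schema validator, `shape_errors` is a made-up helper name, and the example.com address is a placeholder.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
        "company": {"type": "string"},
    },
    "required": ["name", "email", "company"],
    "additionalProperties": False,
}

def shape_errors(data, schema=SCHEMA) -> list[str]:
    """Return a list of shape problems; an empty list means the shape fits."""
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    props = schema["properties"]
    for key in schema["required"]:
        if key not in data:
            errors.append(f"missing required field: {key}")
    for key, value in data.items():
        if key not in props:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
        elif props[key]["type"] == "string" and not isinstance(value, str):
            errors.append(f"field {key} is not a string")
    return errors

good = json.loads('{"name": "Priya Nair", "email": "priya@example.com", "company": "Acme Corp"}')
bad = json.loads('{"full_name": "Priya Nair", "email": null}')

good_errors = shape_errors(good)  # no problems
bad_errors = shape_errors(bad)    # missing, renamed, and wrongly typed fields
```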

Validate, Then Trust

Schema-constrained output is strong, but treat the boundary like any other external input. The structure is guaranteed. The meaning is not. The model can still return "company": "Unknown" or an email that isn't real. Validate values, not just shape.

A small retry loop handles the rare miss. Parse it, check the values you care about, and if something is off, send the response back to the model with a short note about what was wrong. Cap the retries so a bad input can't loop forever.
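
One way to sketch that loop. `call_model` is a stand-in for whatever function sends a prompt to your provider and returns the raw JSON text, and the value checks are examples, not a recommended set.

```python
import json
from typing import Callable

def value_problems(data) -> list[str]:
    """Checks on meaning, not shape. Adjust to what your code depends on."""
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    email = data.get("email") or ""
    company = data.get("company") or ""
    if not isinstance(email, str) or "@" not in email:
        problems.append("email does not look like an address")
    if not isinstance(company, str) or company.strip().lower() in {"", "unknown", "n/a"}:
        problems.append("company is empty or a placeholder")
    return problems

def extract_with_retry(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate, and feed problems back, up to max_attempts times."""
    problems: list[str] = []
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            problems = value_problems(json.loads(raw))
        except json.JSONDecodeError as exc:
            problems = [f"response was not valid JSON: {exc.msg}"]
        if not problems:
            return json.loads(raw)
        # Send the previous answer back with a short note about what was wrong.
        current_prompt = (
            f"{prompt}\n\nYour previous answer was:\n{raw}\n"
            f"Problems: {'; '.join(problems)}. Please correct it."
        )
    raise ValueError(f"no valid result after {max_attempts} attempts: {problems}")
```

Raising after the cap is a deliberate choice here. The caller decides whether a failed extraction gets logged, queued for a human, or dropped.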

It's the Same Trick Behind Tool Calls

This matters beyond data extraction. When a model "calls a tool," it isn't running anything. It's emitting structured output that names a function and its arguments, constrained to a schema you provided. Tool calling is structured output with a specific purpose. Get this part solid and the next one comes almost for free. That's Part 3.
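
A minimal sketch of what your side of a tool call can look like. The `{"name": ..., "arguments": ...}` envelope and the `get_weather` tool are hypothetical, and real providers each define their own field names. The model's output is only data until your code decides to act on it.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool. A real one would call a weather service."""
    return f"Weather lookup requested for {city}"

# Registry of the functions the model is allowed to ask for.
TOOLS = {"get_weather": get_weather}

def run_tool_call(raw: str) -> str:
    """Parse the model's structured request and run the matching function."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"model asked for an unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# What the model emits is just text that fits a schema.
model_output = '{"name": "get_weather", "arguments": {"city": "Pune"}}'
result = run_tool_call(model_output)
```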

What This Looks Like in Code

The shape of the call is simple. You send the prompt and the schema, then parse and validate the result:

import json

schema = {...}  # placeholder: the JSON Schema shown above

# llm.generate stands in for your provider's client call; the real
# method and parameter names vary by SDK.
response = llm.generate(
    prompt="Extract name, email, company from this signature:\n" + text,
    response_schema=schema,
)

data = json.loads(response)  # should parse, barring a refusal or a cut-off response
if not looks_valid(data):    # your own value checks
    # retry once with the error fed back
    ...
use(data)

Notice there's no string scraping anywhere. The schema did the hard part, and your code only checks meaning.

When It Breaks

Even with constraints, plan for these:

  • Refusals. If the request trips a safety filter, you may get a refusal that doesn't fit the schema. Detect it and handle it separately.
  • Empty or null fields. The model fit the shape but had nothing to put in a field, so it used an empty string or null. Decide what your code does with that.
  • Streaming. If you stream the response, you only have valid JSON once it's complete. Don't parse a half-finished object.
  • Schema too strict. Over-constraining can hurt answer quality, because you've removed room for the model to reason. Keep schemas as loose as your code can tolerate.
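
Two of these cases are easy to sketch with the standard library: buffering a streamed response until it is complete, and collapsing empty fields into a single representation. The chunk boundaries below are invented for the example, and `collect_stream` and `normalize_empty` are hypothetical helper names.

```python
import json
from typing import Iterable

def collect_stream(chunks: Iterable[str]) -> dict:
    """Buffer every chunk and parse once, after the stream has ended."""
    return json.loads("".join(chunks))

def normalize_empty(data: dict) -> dict:
    """Map empty or whitespace-only strings to None so callers test one case."""
    return {
        key: None if isinstance(value, str) and not value.strip() else value
        for key, value in data.items()
    }

chunks = ['{"name": "Priya', ' Nair", "comp', 'any": ""}']

# Parsing a partial buffer fails, which is why the parse waits for the end.
try:
    json.loads("".join(chunks[:2]))
    partial_ok = True
except json.JSONDecodeError:
    partial_ok = False

record = normalize_empty(collect_stream(chunks))
```

Whether an empty field becomes `None`, a default, or a rejected record depends on what your code does downstream. Mapping everything to `None` is just one option.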

Thoughts? Hit me up at [email protected]
