Part 1: How an LLM Actually Works: It Has No Memory

Part 1 of the Building with LLMs series. An LLM is a stateless function: prompt in, response out. It remembers nothing, so you resend the whole chain.

Krishna C

April 14, 2026

TL;DR

An LLM is a stateless function. You send it text, it sends text back, and it forgets everything the moment it responds. It has no built-in memory. If you want it to "remember" your conversation, you resend the entire chain every single turn. Once you get this, the rest of agentic AI starts to make sense.

A lot of people I know keep asking me the same thing. Colleagues, friends, folks who haven't touched the agentic world yet. Where do I start? How do I start? Should I even bother learning this at all?

You should start, and you should start now. The way things are going, knowing how to build with LLMs and agents won't be a nice skill to have. It'll be non-negotiable, the same way knowing how to use a computer became non-negotiable.

That's why I'm writing this series. Not theory I picked up from a paper. This is what I know from building these systems, written the way I'd explain it to a friend asking where to begin. Each part builds on the last.

Most people think an LLM is like a chatbot that knows you. It doesn't. It's closer to a calculator. You press buttons, you get a number, and the calculator has no idea you used it five seconds ago. An LLM works the same way, just with text instead of numbers.

An LLM Is Just a Function

Think back to math class. A function takes an input and gives you an output. Same input, same kind of output. Nothing is stored between calls.

An LLM is that, at a huge scale. The input is text. The output is text. There is no hidden state carried over from your last question.

That's the whole model of how it works. Text goes in, text comes out. The interesting part is that you control what goes into that function.

You Design the Input

When you call an LLM, the input usually has two parts:

System prompt: the instructions. Who the model should act as, the rules, the tone, the format you want back.
User prompt: the actual question or task from the person using your app.

The model reads both, reasons over them, and produces one response. We call that response the AI message.

The input side is yours to build. The model doesn't decide what context it gets. You do. Good output starts with a well-designed input, and that's a skill we'll keep coming back to in this series.
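Here's a minimal sketch of putting those two parts together. It assumes the role-and-content message shape that most chat APIs use, and `build_input` is a name I made up for illustration, not something from a library.

```python
def build_input(system_prompt: str, user_prompt: str) -> list[dict[str, str]]:
    """Assemble the two-part input for a single model call."""
    return [
        # Instructions: persona, rules, tone, output format.
        {"role": "system", "content": system_prompt},
        # The actual question or task from the person using the app.
        {"role": "user", "content": user_prompt},
    ]


messages = build_input(
    "You are a helpful geography assistant. Answer in one sentence.",
    "What's the capital of France?",
)
print(messages)
```

Everything the model will know for this call is in that list. If something isn't in there, the model doesn't have it.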

It Forgets Everything

This trips up most beginners. The moment the model finishes its response, it forgets the whole exchange. The next call starts from zero.

The model is stateless. It does not have memory between calls. It only reasons about exactly what you sent it in that one call. Nothing more.

So if you ask "What's the capital of France?" and it says "Paris," then you ask "What's the population there?", the model has no idea what "there" means. That second call is brand new. It never saw the first one.
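To make that concrete, here's a toy sketch. `toy_model` is a hypothetical stand-in for a real LLM call (a real model is far more capable), but it has the one property that matters here: its reply depends only on what's in the input.

```python
Message = dict[str, str]


def toy_model(messages: list[Message]) -> str:
    """Hypothetical stand-in for an LLM: the reply depends only on the input."""
    text = " ".join(m["content"] for m in messages)
    if "population there" in text:
        # "there" is only resolvable if the input itself mentions the place.
        if "France" in text or "Paris" in text:
            return "You mean Paris."
        return "Where is 'there'?"
    if "capital of France" in text:
        return "Paris"
    return "Not sure."


# Two separate calls. Nothing carries over from the first to the second.
first = toy_model([{"role": "user", "content": "What's the capital of France?"}])
second = toy_model([{"role": "user", "content": "What's the population there?"}])

print(first)   # Paris
print(second)  # Where is 'there'?
```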

To "Remember," You Resend the Chain

So how do chat apps feel like they remember you? They cheat, in a good way. They resend the whole conversation every turn.

On turn one, you send the system prompt and the user message. The model replies. On turn two, you don't just send the new question. You send the system prompt, the original user message, the AI message from turn one, and then the new user message. The whole chain.

The model on turn two now has the full history in front of it, so it can resolve "there" to Paris. It didn't remember anything. You handed it the memory as part of the input.

This is why long conversations get slower and cost more. Every turn, the input grows because you're resending the entire history, so the model has more text to read and, on APIs that charge by input size, more to bill you for.
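A rough sketch of that growth. Counting whitespace-separated words is my crude stand-in for tokens here (real tokenizers count differently), and the replies are hardcoded instead of coming from a model.

```python
def rough_size(messages: list[dict[str, str]]) -> int:
    """Crude size proxy: whitespace-separated words, not real tokens."""
    return sum(len(m["content"].split()) for m in messages)


history = [{"role": "system", "content": "You are a helpful geography assistant."}]

# (question, hardcoded reply) pairs standing in for a real conversation.
turns = [
    ("What's the capital of France?", "Paris"),
    ("What's the population there?", "Roughly 2 million in the city itself."),
    ("What's the capital of Germany?", "Berlin"),
]

sizes = []
for question, reply in turns:
    history.append({"role": "user", "content": question})
    sizes.append(rough_size(history))  # size of what you send on this turn
    history.append({"role": "assistant", "content": reply})

print(sizes)       # [11, 16, 28]
print(sum(sizes))  # 55, the total input sent across all three turns
```

Each turn's input contains every earlier turn, so the total you send over a conversation grows much faster than the conversation itself.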

What This Looks Like in Code

In practice, the conversation is just a list of messages with roles. Here's turn one:

[
  { "role": "system", "content": "You are a helpful geography assistant." },
  { "role": "user", "content": "What's the capital of France?" }
]

The model returns "Paris". For turn two, you append the model's reply and the new question, then send the whole list again:

[
  { "role": "system", "content": "You are a helpful geography assistant." },
  { "role": "user", "content": "What's the capital of France?" },
  { "role": "assistant", "content": "Paris" },
  { "role": "user", "content": "What's the population there?" }
]

Notice the list grew. The model never stored the first exchange. Your app did, and replayed it. That growing list is the entire trick behind "memory" in chat apps, and it's why context management becomes a real engineering problem later in this series.
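Here's one way the app side of that could look. This is a sketch, not any provider's SDK: `Chat` is a name I picked, and `call_model` is whatever function actually talks to your model. I pass in a fake one so the example runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Chat:
    """Keeps the history in the app and resends all of it on every turn."""

    def __init__(self, system_prompt: str, call_model: Callable[[list[Message]], str]):
        self.messages: list[Message] = [{"role": "system", "content": system_prompt}]
        self.call_model = call_model

    def ask(self, question: str) -> str:
        self.messages.append({"role": "user", "content": question})
        # The model gets a copy of the whole chain, not just the new question.
        reply = self.call_model(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_model(messages: list[Message]) -> str:
    """Stand-in for a real API call: reports how much history it was handed."""
    return f"(saw {len(messages)} messages)"


chat = Chat("You are a helpful geography assistant.", fake_model)
first = chat.ask("What's the capital of France?")
second = chat.ask("What's the population there?")

print(first)   # (saw 2 messages)
print(second)  # (saw 4 messages)
```

The "memory" lives in `chat.messages`, which is plain application state. Swap `fake_model` for a real call and nothing else changes.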

The Mental Shift

An LLM isn't a person who knows you. It's one step in a workflow, a step that can reason about messy input and give you a useful answer instead of you writing rigid if-else rules for every case.

That's the whole idea. An LLM is a building block you drop into a system, a step that brings a little reasoning where you used to need hardcoded logic. Everything else in agentic AI (tools, memory, agents) is built on top of this one stateless step.
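A small sketch of what "one step in a workflow" can look like, using support-ticket routing as a made-up example. The queue names and `route_ticket` are hypothetical, and `fake_classify` stands in for a real model call so the code runs on its own.

```python
from typing import Callable

# Hypothetical queue names for a support-ticket workflow.
QUEUES = {"billing": "billing-team", "bug": "engineering", "other": "general-support"}


def route_ticket(text: str, classify: Callable[[str], str]) -> str:
    """Route a ticket: one reasoning step, then ordinary code around it."""
    label = classify(text).strip().lower()  # the LLM step
    if label not in QUEUES:
        # Never trust the label blindly. Fall back to a safe default.
        label = "other"
    return QUEUES[label]


def fake_classify(text: str) -> str:
    """Stand-in for an LLM call. A real one would handle messy wording."""
    return "Billing" if "charged" in text.lower() else "something unexpected"


routed = route_ticket("I was charged twice this month", fake_classify)
fallback = route_ticket("The app feels weird lately", fake_classify)

print(routed)    # billing-team
print(fallback)  # general-support
```

The model handles the fuzzy part (reading free text). Regular code handles the rest, including what to do when the model returns something you didn't expect.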

Play to What the Model Already Knows

An LLM learned from a giant pile of the internet. It's already very good at the things the internet has a lot of, and weaker at the things the internet barely has.

Roughly, from most common to least, this is what models have seen mountains of:

Plain prose: articles, blogs, forum posts, documentation. The biggest chunk of all.
HTML and web markup: the entire web is built on it.
Code: millions of public repos, Stack Overflow answers, tutorials. Most mainstream languages are covered deeply.
Markdown: READMEs, docs, Reddit, GitHub issues. It's everywhere.
Config formats: JSON, YAML, TOML, dotenv-style files.
Shell and Bash: install steps, CI scripts, one-liners from a thousand blog posts.
SQL: queries and schemas all over tutorials and answers.
Regular expressions: short, common, and heavily explained.
Templating languages: Jinja, Go text/template, Mustache, Helm chart templates. Common, but less than raw code.
Books and long-form text: large, but a smaller slice than the live web.

Because of this, you usually don't need to teach the model how to write Markdown, format JSON, or explain a bash command. It already knows. Asking it to "return the answer as a Markdown table" just works. You're not spending prompt space on examples for things it has seen a million times.

The flip side matters more. Where the internet is thin, the model is weak. Your company's internal config format, a niche DSL, a brand new library released after training, your team's naming conventions. The model has little to go on. That's where you do the heavy lifting in the prompt: give examples, spell out the rules, show the exact format you want.
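As a sketch of that heavy lifting, here's a helper that packs rules and examples into a prompt. The `DEPLOY` line format is invented for this example (think of it as some internal format the model has never seen), and `build_few_shot_prompt` is a hypothetical name.

```python
def build_few_shot_prompt(task: str, rules: list[str], examples: list[tuple[str, str]]) -> str:
    """Spell out the rules and show worked examples for a format the model hasn't seen."""
    lines = [task, "", "Rules:"]
    lines += [f"- {rule}" for rule in rules]
    lines += ["", "Examples:"]
    for request, expected in examples:
        lines += [f"Input: {request}", f"Output: {expected}", ""]
    lines.append("Now convert the next input. Return only the output line.")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    task="Convert each deployment request into our internal DEPLOY line format.",
    rules=[
        "Format is: DEPLOY <service> @<region> x<replicas>",
        "Service names are lowercase with hyphens",
    ],
    examples=[
        ("Run 3 copies of Checkout API in us-east", "DEPLOY checkout-api @us-east x3"),
        ("One instance of Search in eu-west", "DEPLOY search @eu-west x1"),
    ],
)
print(prompt)
```

The resulting string would typically go into the system prompt, with the new request as the user message. For Markdown or JSON you'd skip all of this. For a format like this one, the examples are doing the work the training data couldn't.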

One caveat. We don't control what goes into the model. The labs decide the training recipe, and it shifts with every release. A good example is chess. Around the GPT-3.5 era, chess data was reportedly added to the training mix, and models got noticeably better at chess. Nothing about the architecture had to change. It came down to what went into training, how the neural net wired those patterns together, and how that knowledge ended up distributed across the model. So treat the ranking above as a rule of thumb, not a law. But the shape of it holds: common on the internet means strong by default, rare means you have to support it.

It's like you learning a new topic. It's easy when you can map it to something you already know well, and hard when there's nothing to anchor it to. The model is the same. When your task looks like something it has seen endlessly, it's on solid ground. When it doesn't, you build that bridge for it in the prompt.
