What is an LLM?

Large language models: how they work, what they’re good at, and where they break.

026 min7 resources

You will learn

Describe tokens, context windows, and their product implications.
List core LLM capabilities and hard limitations.
Choose grounding strategies (tools, retrieval) for factual tasks.

Definition

A large language model (LLM) is a neural network trained to predict text—typically the next token in a sequence. At scale, that objective produces systems that can draft, summarize, translate, reason over context, and follow instructions when fine-tuned or prompted appropriately.

“Large” refers both to parameter count (billions or more) and to training data size (web text, books, code, and curated datasets). Examples include GPT, Claude, Gemini, Llama, and Mistral families.

Tokens and context

LLMs do not read words the way humans do. Text is split into tokens—subword units—and the model processes a fixed context window (e.g. 8k–200k+ tokens depending on the model). Everything in that window—system instructions, conversation history, retrieved documents—competes for the same budget.

Context limits are a practical constraint for product design: long documents must be chunked, summarized, or retrieved selectively rather than pasted in full.

Capabilities

LLMs excel at language-centric tasks: writing, coding assistance, classification via prompts, structured extraction, and multi-step reasoning when given clear goals and examples.

They can be steered with system prompts, few-shot examples, and tool definitions—but they do not inherently know your private data, live systems, or current events unless you connect them via retrieval, APIs, or browsing tools.

Limitations and risks

LLMs can hallucinate plausible-sounding falsehoods, reflect biases in training data, and behave inconsistently across phrasings. They are not databases, calculators, or authorization layers unless you wrap them with verification and guardrails.

Production use requires evaluation suites, monitoring, access control, and human oversight for high-stakes decisions.

Knowledge cutoff — training data has a date; real-time facts need tools or retrieval.
Non-determinism — temperature and sampling change outputs run to run.
Cost and latency — long contexts and large models are expensive at scale.

From chat to systems

A chat interface is one surface. In agentic systems, the LLM is usually a reasoning core inside a loop: observe state, plan, call tools, update memory, repeat. The next topics cover agents and the harnesses that run them reliably.