You know the feeling. You type what seems like a perfectly clear question into an AI assistant, and what comes back is... adjacent. Technically relevant, but missing the real point. So you rephrase. Try again. Get another near-miss. After five rounds you finally have something useful — but it took longer than just doing it yourself.
Here's what's actually happening: the model can only work with what you've put in front of it. It has no idea what your codebase looks like, doesn't know your constraints, can't see what you've already tried, and has zero background on why you're asking. When AI responses miss, the root cause is almost always missing information — not wording, not clever prompt tricks. This is the insight at the heart of what the AI community calls context engineering.
Context engineering is the discipline of deliberately designing the information you give an AI — its role, background, task, constraints, and output format — so it can produce reliably useful results without back-and-forth.
What context engineering actually is
Andrej Karpathy popularised the term in 2025, and the distinction from "prompt engineering" matters. Prompt engineering is about word choice — phrasing requests cleverly. Context engineering goes a level deeper: it's about what information you make available before the model generates anything. The skill of deliberately structuring the model's input so it has everything it needs to produce something genuinely useful, on the first try.
Think of it as mise en place — the chef's practice of having every ingredient prepped and in place before cooking starts. A great chef doesn't improvise around missing ingredients mid-dish. Context engineering is doing that prep work for your AI collaborator. The more complete and well-organized the context, the fewer iterations, the better the first draft.
This shift changes how you approach AI work. Instead of asking "how should I phrase this?", you start asking "what does the model need to know to answer this well?" Those are different questions — and the second one leads to consistently better results.
Anatomy of a context window
Every interaction with an AI happens within a fixed working memory — a context window that holds everything the model can "see" when it generates its response. There are four distinct layers.
The System Prompt is the most powerful layer and the most overlooked. It sets the model's persona, expertise level, and the standing rules that govern every response. A well-crafted system prompt is like giving the model a job description before it starts work.
The Injected Context is where you provide raw material: code files, documentation, error logs, database schemas. This is what the model reasons over — the difference between asking a friend "fix this" versus "here's the code, the error, and the relevant docs." The quality of this layer often determines the quality of the answer more than any other.
The Conversation History accumulates as your session progresses. Every exchange is added to what the model can see — useful (it "remembers" earlier decisions) but also costly (long histories consume tokens that could hold more useful injected context).
The User Message is the tip of the iceberg: the actual question. Despite being what people focus on most, it's often under 5% of total tokens in a well-engineered setup. The real work happens in the layers above it.
Why most prompts fail: the five context gaps
Thin context produces generic output. The model has no way to know what you actually need, so it defaults to the most statistically likely answer — which is usually too broad, too vague, or aimed at the wrong audience. There are five specific gaps that cause this.
No role. Without a defined persona, the model defaults to "generic helpful assistant." An expert-level review lands differently than a junior-friendly explanation, even on the same topic.
No state. The model doesn't know what's been tried, what constraints exist, or what the current situation looks like. You're asking it to solve a puzzle while hiding the puzzle.
Unclear task. "Help me with this" could mean a hundred things. Ambiguous tasks produce hedged, general responses that avoid every decision the model can't make.
No constraints. Without guardrails, the model explores the entire solution space — and may suggest a TypeScript refactor when you're on Python, recommend a library you can't use, or write far more than you need.
Undefined format. Without specifying structure, you get whatever the model thinks is appropriate — a three-paragraph essay when you wanted bullet points, or a one-liner when you needed a detailed walkthrough.
The five pillars of great context
Each gap maps to a pillar. Engineering great context means consciously addressing all five.
Pillar 1: Role
Define who the model should be — their expertise, perspective, communication style. This single change shapes vocabulary, depth, and which trade-offs get emphasized.
| Without role | With role |
|---|---|
Explain authentication. |
You are a senior backend engineer specializing in security, writing for a junior developer who knows JavaScript but has never implemented auth. Explain JWT focusing on the security trade-offs and what can go wrong. |
The role doesn't need to be elaborate. Even "You are an expert TypeScript developer" shifts response quality noticeably. Make it specific to the kind of response you actually want.
Pillar 2: State
Tell the model what already exists: codebase structure, what you've tried, what hasn't worked, decisions made, constraints in play. This is the background that transforms a generic answer into a specific, actually-applicable one.
| Without state | With state |
|---|---|
Fix this bug.(with pasted code, no other context) |
[paste code] This debounce function works fine for regular input events but fails on rapid scroll — fires immediately instead of waiting. I've tried clearTimeout at the start, still fires. Stack: React 18, TypeScript. Console error: [message]. |
State is the most underused pillar. Providing it removes dozens of things the model would otherwise guess — and guess wrong.
Pillar 3: Task
Specify exactly what you want, not a category of it. "Help with my API" is a category. "Write an Express middleware that validates JWT tokens, returns 401 for invalid or missing ones, and attaches the decoded user to req.user" is a task. The more precise, the more the model's creativity is directed at solving your actual problem.
| Vague task | Precise task |
|---|---|
Write some tests for this function. |
Write Jest unit tests for validateEmail. Function is pure. Test: valid formats, missing @, missing TLD, spaces, empty string, null, undefined. Explicit named tests (not test.each) for readability. |
Pillar 4: Constraints
Tell the model what it cannot do, what to avoid, and what guardrails apply. Constraints sound limiting but actually liberate the response — they eliminate the entire class of "technically correct but useless for my situation" answers.
Write constraints as explicit "do not" or "avoid" statements. The model is trained to be helpful and naturally tries to expand scope. Explicit negatives are the most reliable guardrails you have.
Common useful constraints: technology stack, output length, libraries to avoid, whether to explain or just write code, whether to ask clarifying questions or make reasonable assumptions.
Pillar 5: Format
Specify the shape of output: length, structure, level of detail, audience. Without format guidance, the model makes a judgment call that may not match your needs — a 500-word essay when you wanted a one-paragraph summary, or bullet points when you needed running prose.
| Without format | With format |
|---|---|
What's the best database for my app? |
Compare PostgreSQL vs MongoDB for a SaaS app with structured user records and flexible per-user metadata, ~10k users scaling to 1M. Format: comparison table (write/read performance, schema flexibility, ops complexity), then a recommendation with 3 bullet reasons. Under 300 words. |
Anti-patterns that quietly ruin your results
Some context patterns feel natural but consistently produce poor results.
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| The naked paste | Pasting a wall of code with no framing and a vague question. | Add a one-paragraph summary: what the code does, what's wrong, what you've tried. |
| The bundle request | "Review this, write tests, and refactor the naming" — three tasks in one message. | One task per message. The model optimises for each separately. |
| The assumed expert | Internal jargon or abbreviations the model doesn't know. | Treat the model like a smart contractor on day one — explain your specific setup. |
| The context collapse | Asking a new question that depends on decisions made six exchanges ago. | Re-inject key decisions: "We decided to use Redis for caching. Now..." |
| The contradiction | "Be concise but comprehensive. Simple but cover every edge case." | Pick a priority: "Prioritise brevity — mention edge cases exist but don't detail them." |
Building reusable context templates
Once you understand the five pillars, turn them into reusable templates — structured context scaffolds you fill in for recurring tasks. Here's a general-purpose engineering template:
## Role
You are [ROLE], specialized in [DOMAIN].
Your audience is [AUDIENCE] who [KEY CONTEXT].
## Background state
Tech stack: [LANGUAGES / FRAMEWORKS / TOOLS]
What exists: [BRIEF DESCRIPTION]
What I've tried: [ATTEMPTS AND OUTCOMES]
Relevant error: [IF APPLICABLE]
## Task
[ONE PRECISE SENTENCE: verb + object + scope]
## Constraints
- Do NOT use [LIBRARY / APPROACH]
- Keep [ASPECT] under [LIMIT]
- Assume [SAFE ASSUMPTION]
## Output format
[LENGTH] — [STRUCTURE] — [LEVEL OF DETAIL]
You don't need to fill in every field every time. For a simple lookup, role + task + format is enough. For complex debugging, all five matter. The template is a prompt, not a mandatory form.
Applied to a real code review request:
## Role
You are a senior TypeScript engineer who cares about maintainability.
Audience: a mid-level developer who wants to learn, not just a list of fixes.
## Background state
Stack: TypeScript 5.4, React 18, no class components.
This is a custom hooks file handling auth state. No tests exist yet.
## Task
Review for correctness, type safety issues, and maintainability concerns.
## Constraints
- Do NOT suggest moving to a state management library (intentional decision)
- For each issue, show a concrete fix — not just a description
- Maximum 5 issues, prioritized by impact
## Output format
List: issue name | why it matters | code fix (2–5 lines)
Notice the difference: "review my code" becomes a specific, bounded, teachable review. The model knows who it's writing for, what's off-limits, how many issues to flag, and exactly how to present each one.
Context by use case
Different tasks weight the five pillars differently. A quick reference:
| Use case | Critical pillars | Common mistake |
|---|---|---|
| Debugging | State (what failed, error, what was tried) + Task (expected behaviour) | Pasting code without the error message or reproduction steps |
| Code generation | Role + Task (precise spec) + Constraints (stack, patterns) | Not specifying the tech stack — model picks one for you |
| Code review | State (what the code does) + Constraints (scope) + Format (how to present issues) | No context on intent — model critiques deliberate choices |
| Technical writing | Role (audience-matched) + Task (topic + scope) + Format (length, tone) | Not specifying audience — gets either too basic or too advanced |
| Architecture decisions | State (system, constraints, scale) + Task (decision needed) + Format (pros/cons + recommendation) | Asking "best approach?" without the constraints that make it non-trivial |
The token budget question
Context engineering has a practical ceiling: every token you put in is a token that doesn't go to something else. Modern models have large windows (100k–2M tokens), but this creates a different problem: with unlimited space, people dump everything and let the model figure it out. This works, but it's expensive — and it produces worse results than focused context.
A better mental model: the context window is a finite workspace, and you're the one who decides what goes on the desk. Irrelevant information isn't neutral — it's noise the model has to reason through. A codebase with 10 relevant files and 40 irrelevant ones produces worse results than the 10 alone.
Before pasting anything, ask: does this help the model answer the question, or am I including it "just in case"? If it's "just in case", leave it out. The model's attention dilutes across everything in front of it.
Practical tips:
- Paste specific functions, not whole files. If the model needs
validateUser, give itvalidateUser, not the entireauth.ts. - Summarise conversation history instead of letting it grow. After a long session, start fresh with a "here's where we are" paragraph.
- Put standing instructions in the system prompt, not repeated per-message. Say it once.
- Strip boilerplate from pasted code. License headers, import blocks, and comments rarely help the model understand the logic.
Key takeaways
- Context engineering beats prompt engineering: the information you provide matters more than the wording you choose. Better input = better output, every time.
- Four layers, one budget: system prompt, injected context, conversation history, user message — each has a different purpose and competes for the same tokens.
- Five pillars, five gaps: Role (who), State (background), Task (what exactly), Constraints (what not to do), Format (what shape). Cover all five and back-and-forth drops dramatically.
- Build reusable templates: a good code review template is worth building once and reusing hundreds of times. Standardize your context structure for recurring tasks.
- Quality beats quantity: relevant, focused, structured context outperforms large unfocused dumps. Trim aggressively.
- Weight pillars by use case: debugging needs State most; code generation needs Task and Constraints; technical writing needs Role and Format.
Context engineering is a habit of thinking: before you send a prompt, pause and ask what you'd need to explain to a brilliant new colleague to make this task self-contained. Add that. Remove what isn't relevant. You'll spend less time iterating and more time using the output. That's the whole idea.