Nguyen Le Phong

Context Engineering: The Skill That Separates Great AI Users From Everyone Else

Most AI results are disappointing not because the models are bad, but because the context is thin. A practical guide to context engineering — what lives in the model's window, the five pillars of great context, common anti-patterns, and reusable templates that consistently produce better results.

You know the feeling. You type what seems like a perfectly clear question into an AI assistant, and what comes back is... adjacent. Technically relevant, but missing the real point. So you rephrase. Try again. Get another near-miss. After five rounds you finally have something useful — but it took longer than just doing it yourself.

Here's what's actually happening: the model can only work with what you've put in front of it. It has no idea what your codebase looks like, doesn't know your constraints, can't see what you've already tried, and has zero background on why you're asking. When AI responses miss, the root cause is almost always missing information — not wording, not clever prompt tricks. This is the insight at the heart of what the AI community calls context engineering.

Definition

Context engineering is the discipline of deliberately designing the information you give an AI — its role, background, task, constraints, and output format — so it can produce reliably useful results without back-and-forth.

What context engineering actually is

Andrej Karpathy popularised the term in 2025, and the distinction from "prompt engineering" matters. Prompt engineering is about word choice: phrasing requests cleverly. Context engineering goes a level deeper: it's about what information you make available before the model generates anything. It is the skill of deliberately structuring the model's input so it has everything it needs to produce something genuinely useful on the first try.

Think of it as mise en place — the chef's practice of having every ingredient prepped and in place before cooking starts. A great chef doesn't improvise around missing ingredients mid-dish. Context engineering is doing that prep work for your AI collaborator. The more complete and well-organized the context, the fewer iterations, the better the first draft.

This shift changes how you approach AI work. Instead of asking "how should I phrase this?", you start asking "what does the model need to know to answer this well?" Those are different questions — and the second one leads to consistently better results.

Anatomy of a context window

Every interaction with an AI happens within a fixed working memory — a context window that holds everything the model can "see" when it generates its response. There are four distinct layers.

[Figure: the four layers inside an AI context window. System Prompt (persona, role, standing instructions), Injected Context (files, search results, tool outputs, docs), Conversation History (prior messages and decisions), and User Message (the actual question or task, often under 5% of total tokens) feed into the LLM to produce a response. All four layers share the same token budget.]
The four layers of a context window. Most people focus only on layer 4 (the user message). Context engineering is about consciously managing all four — especially the first two, which are the most powerful and most overlooked.

The System Prompt is one of the most powerful layers and the most overlooked. It sets the model's persona, expertise level, and the standing rules that govern every response. A well-crafted system prompt is like giving the model a job description before it starts work.

The Injected Context is where you provide raw material: code files, documentation, error logs, database schemas. This is what the model reasons over — the difference between asking a friend "fix this" versus "here's the code, the error, and the relevant docs." The quality of this layer often determines the quality of the answer more than any other.

The Conversation History accumulates as your session progresses. Every exchange is added to what the model can see — useful (it "remembers" earlier decisions) but also costly (long histories consume tokens that could hold more useful injected context).

The User Message is the tip of the iceberg: the actual question. Despite being what people focus on most, it's often under 5% of total tokens in a well-engineered setup. The real work happens in the layers above it.
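To make the shared budget concrete, here is a minimal Python sketch that holds the four layers as plain strings and estimates what fraction of the window each one takes. The names `ContextWindow`, `estimate_tokens`, and `token_share` are hypothetical, and the four-characters-per-token rule is a rough assumption rather than a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextWindow:
    """The four layers described above, held as plain strings."""
    system_prompt: str
    injected_context: str
    conversation_history: str
    user_message: str

    def layers(self) -> dict[str, str]:
        return {
            "system_prompt": self.system_prompt,
            "injected_context": self.injected_context,
            "conversation_history": self.conversation_history,
            "user_message": self.user_message,
        }


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return (len(text) + 3) // 4


def token_share(window: ContextWindow) -> dict[str, float]:
    """Return each layer's fraction of the estimated total token count."""
    counts = {name: estimate_tokens(text) for name, text in window.layers().items()}
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}
```

Running `token_share` on a session with a short system prompt, a large pasted file, and a one-line question is a quick way to see how small a slice of the window the user message usually is.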

Why most prompts fail: the five context gaps

Thin context produces generic output. The model has no way to know what you actually need, so it defaults to the most statistically likely answer — which is usually too broad, too vague, or aimed at the wrong audience. There are five specific gaps that cause this.

No role. Without a defined persona, the model defaults to "generic helpful assistant." An expert-level review lands differently than a junior-friendly explanation, even on the same topic.

No state. The model doesn't know what's been tried, what constraints exist, or what the current situation looks like. You're asking it to solve a puzzle while hiding the puzzle.

Unclear task. "Help me with this" could mean a hundred things. Ambiguous tasks produce hedged, general responses that avoid every decision the model can't make.

No constraints. Without guardrails, the model explores the entire solution space — and may suggest a TypeScript refactor when you're on Python, recommend a library you can't use, or write far more than you need.

Undefined format. Without specifying structure, you get whatever the model thinks is appropriate — a three-paragraph essay when you wanted bullet points, or a one-liner when you needed a detailed walkthrough.

The five pillars of great context

Each gap maps to a pillar. Engineering great context means consciously addressing all five.

[Figure: the five pillars of great context. 1. Role: who should the model behave as? 2. State: what's known, decided, or already tried? 3. Task: what exactly do you want done? (marked "most critical") 4. Constraints: what to avoid, scope limits, hard rules. 5. Format: output shape, length, level of detail.]
The five pillars. Pillar 3 (Task) gets the most attention — but pillars 1, 2, 4, and 5 are what make the task answerable well. Skip any one and quality drops noticeably.
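The gap-to-pillar mapping can be sketched as a small checker. This is an illustrative Python sketch, not a tool from the article: `DraftContext`, `GAP_NAMES`, and `find_gaps` are hypothetical names, and the check only detects pillars that were left blank. It cannot tell whether a pillar you did fill in is vague.

```python
from dataclasses import dataclass


@dataclass
class DraftContext:
    """One field per pillar; an empty string means the pillar was skipped."""
    role: str = ""
    state: str = ""
    task: str = ""
    constraints: str = ""
    output_format: str = ""


# Maps each pillar field to the gap it leaves when missing.
GAP_NAMES = {
    "role": "No role",
    "state": "No state",
    "task": "Unclear task",
    "constraints": "No constraints",
    "output_format": "Undefined format",
}


def find_gaps(draft: DraftContext) -> list[str]:
    """Return the gap name for every pillar left blank."""
    return [
        gap
        for field_name, gap in GAP_NAMES.items()
        if not getattr(draft, field_name).strip()
    ]
```

For example, `find_gaps(DraftContext(task="Fix this bug."))` reports four of the five gaps, which matches the "thin context" situation described earlier.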

Pillar 1: Role

Define who the model should be — their expertise, perspective, communication style. This single change shapes vocabulary, depth, and which trade-offs get emphasized.

| Without role | With role |
| --- | --- |
| Explain authentication. | You are a senior backend engineer specializing in security, writing for a junior developer who knows JavaScript but has never implemented auth. Explain JWT focusing on the security trade-offs and what can go wrong. |

The role doesn't need to be elaborate. Even "You are an expert TypeScript developer" shifts response quality noticeably. Make it specific to the kind of response you actually want.

Pillar 2: State

Tell the model what already exists: codebase structure, what you've tried, what hasn't worked, decisions made, constraints in play. This is the background that transforms a generic answer into a specific, actually-applicable one.

| Without state | With state |
| --- | --- |
| Fix this bug. (with pasted code, no other context) | [paste code] This debounce function works fine for regular input events but fails on rapid scroll: it fires immediately instead of waiting. I've tried clearTimeout at the start, still fires. Stack: React 18, TypeScript. Console error: [message]. |

State is the most underused pillar. Providing it removes dozens of things the model would otherwise guess — and guess wrong.

Pillar 3: Task

Specify exactly what you want, not a category of it. "Help with my API" is a category. "Write an Express middleware that validates JWT tokens, returns 401 for invalid or missing ones, and attaches the decoded user to req.user" is a task. The more precise, the more the model's creativity is directed at solving your actual problem.

| Vague task | Precise task |
| --- | --- |
| Write some tests for this function. | Write Jest unit tests for validateEmail. Function is pure. Test: valid formats, missing @, missing TLD, spaces, empty string, null, undefined. Explicit named tests (not test.each) for readability. |

Pillar 4: Constraints

Tell the model what it cannot do, what to avoid, and what guardrails apply. Constraints sound limiting but actually liberate the response — they eliminate the entire class of "technically correct but useless for my situation" answers.

Constraint framing

Write constraints as explicit "do not" or "avoid" statements. The model is trained to be helpful and naturally tries to expand scope. Explicit negatives are the most reliable guardrails you have.

Common useful constraints: technology stack, output length, libraries to avoid, whether to explain or just write code, whether to ask clarifying questions or make reasonable assumptions.

Pillar 5: Format

Specify the shape of output: length, structure, level of detail, audience. Without format guidance, the model makes a judgment call that may not match your needs — a 500-word essay when you wanted a one-paragraph summary, or bullet points when you needed running prose.

| Without format | With format |
| --- | --- |
| What's the best database for my app? | Compare PostgreSQL vs MongoDB for a SaaS app with structured user records and flexible per-user metadata, ~10k users scaling to 1M. Format: comparison table (write/read performance, schema flexibility, ops complexity), then a recommendation with 3 bullet reasons. Under 300 words. |

Anti-patterns that quietly ruin your results

Some context patterns feel natural but consistently produce poor results.

| Anti-pattern | Why it fails | Fix |
| --- | --- | --- |
| The naked paste | Pasting a wall of code with no framing and a vague question. | Add a one-paragraph summary: what the code does, what's wrong, what you've tried. |
| The bundle request | "Review this, write tests, and refactor the naming": three tasks in one message. | One task per message. The model optimises for each separately. |
| The assumed expert | Internal jargon or abbreviations the model doesn't know. | Treat the model like a smart contractor on day one: explain your specific setup. |
| The context collapse | Asking a new question that depends on decisions made six exchanges ago. | Re-inject key decisions: "We decided to use Redis for caching. Now..." |
| The contradiction | "Be concise but comprehensive. Simple but cover every edge case." | Pick a priority: "Prioritise brevity: mention edge cases exist but don't detail them." |
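The fix for context collapse, re-injecting key decisions, is easy to automate if you keep a running list of them. A minimal Python sketch, assuming you maintain that list by hand; `with_decisions` is a hypothetical helper, and the follow-up question in the usage note is invented for illustration.

```python
def with_decisions(decisions: list[str], question: str) -> str:
    """Prefix a new question with the earlier decisions it depends on."""
    if not decisions:
        return question
    lines = ["Decisions so far:"]
    lines += [f"- {decision}" for decision in decisions]
    lines += ["", question]
    return "\n".join(lines)
```

Calling `with_decisions(["We decided to use Redis for caching."], "Now design the cache invalidation.")` yields a message that restates the Redis decision before asking the new question, so the model does not have to recover it from six exchanges back.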

Building reusable context templates

Once you understand the five pillars, turn them into reusable templates — structured context scaffolds you fill in for recurring tasks. Here's a general-purpose engineering template:

## Role
You are [ROLE], specialized in [DOMAIN].
Your audience is [AUDIENCE] who [KEY CONTEXT].

## Background state
Tech stack: [LANGUAGES / FRAMEWORKS / TOOLS]
What exists: [BRIEF DESCRIPTION]
What I've tried: [ATTEMPTS AND OUTCOMES]
Relevant error: [IF APPLICABLE]

## Task
[ONE PRECISE SENTENCE: verb + object + scope]

## Constraints
- Do NOT use [LIBRARY / APPROACH]
- Keep [ASPECT] under [LIMIT]
- Assume [SAFE ASSUMPTION]

## Output format
[LENGTH] — [STRUCTURE] — [LEVEL OF DETAIL]

You don't need to fill in every field every time. For a simple lookup, role + task + format is enough. For complex debugging, all five matter. The template is a scaffold, not a mandatory form.
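If you reuse the template often, filling it in from code keeps the section order stable and drops the sections you leave blank. The following is a minimal Python sketch under that assumption; `SECTION_ORDER` and `render_context` are hypothetical names, and the headings simply mirror the template above.

```python
# Section keys paired with the headings used in the template above.
SECTION_ORDER = [
    ("role", "Role"),
    ("state", "Background state"),
    ("task", "Task"),
    ("constraints", "Constraints"),
    ("output_format", "Output format"),
]


def render_context(**fields: str) -> str:
    """Render the filled-in sections in template order, skipping blank ones."""
    unknown = set(fields) - {key for key, _ in SECTION_ORDER}
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    blocks = []
    for key, heading in SECTION_ORDER:
        body = fields.get(key, "").strip()
        if body:
            blocks.append(f"## {heading}\n{body}")
    return "\n\n".join(blocks)
```

For a simple lookup you would pass only `role`, `task`, and `output_format`; the state and constraints headings are omitted rather than left empty.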

Applied to a real code review request:

## Role
You are a senior TypeScript engineer who cares about maintainability.
Audience: a mid-level developer who wants to learn, not just a list of fixes.

## Background state
Stack: TypeScript 5.4, React 18, no class components.
This is a custom hooks file handling auth state. No tests exist yet.

## Task
Review for correctness, type safety issues, and maintainability concerns.

## Constraints
- Do NOT suggest moving to a state management library (intentional decision)
- For each issue, show a concrete fix — not just a description
- Maximum 5 issues, prioritized by impact

## Output format
List: issue name | why it matters | code fix (2–5 lines)

Notice the difference: "review my code" becomes a specific, bounded, teachable review. The model knows who it's writing for, what's off-limits, how many issues to flag, and exactly how to present each one.

Context by use case

Different tasks weight the five pillars differently. A quick reference:

| Use case | Critical pillars | Common mistake |
| --- | --- | --- |
| Debugging | State (what failed, error, what was tried) + Task (expected behaviour) | Pasting code without the error message or reproduction steps |
| Code generation | Role + Task (precise spec) + Constraints (stack, patterns) | Not specifying the tech stack, so the model picks one for you |
| Code review | State (what the code does) + Constraints (scope) + Format (how to present issues) | No context on intent, so the model critiques deliberate choices |
| Technical writing | Role (audience-matched) + Task (topic + scope) + Format (length, tone) | Not specifying audience, so the result is either too basic or too advanced |
| Architecture decisions | State (system, constraints, scale) + Task (decision needed) + Format (pros/cons + recommendation) | Asking "best approach?" without the constraints that make it non-trivial |
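The quick reference can also live in code as a lookup that tells you which critical pillars a draft is still missing. This is a rough Python sketch: `CRITICAL_PILLARS` transcribes the table, while `missing_critical` and the lowercase pillar keys are hypothetical choices made for the example.

```python
# Critical pillars per use case, transcribed from the quick-reference table.
CRITICAL_PILLARS = {
    "debugging": {"state", "task"},
    "code generation": {"role", "task", "constraints"},
    "code review": {"state", "constraints", "format"},
    "technical writing": {"role", "task", "format"},
    "architecture decisions": {"state", "task", "format"},
}


def missing_critical(use_case: str, provided: set[str]) -> list[str]:
    """List the critical pillars for `use_case` that are not in `provided`.

    Raises KeyError if the use case is not in the table.
    """
    required = CRITICAL_PILLARS[use_case.lower()]
    return sorted(required - provided)
```

A debugging request that only states the task would come back with `["state"]`, a reminder to add the error message and what was already tried.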

The token budget question

Context engineering has a practical ceiling: every token you put in takes space that could hold something else. Modern models have large windows (100k–2M tokens), but this creates a different problem: with so much space available, people dump everything in and let the model figure it out. This can work, but it's expensive, and it often produces worse results than focused context.

A better mental model: the context window is a finite workspace, and you're the one who decides what goes on the desk. Irrelevant information isn't neutral — it's noise the model has to reason through. A codebase with 10 relevant files and 40 irrelevant ones produces worse results than the 10 alone.

Token hygiene

Before pasting anything, ask: does this help the model answer the question, or am I including it "just in case"? If it's "just in case", leave it out. The model's attention dilutes across everything in front of it.

Practical tips:

  • Paste specific functions, not whole files. If the model needs validateUser, give it validateUser, not the entire auth.ts.
  • Summarise conversation history instead of letting it grow. After a long session, start fresh with a "here's where we are" paragraph.
  • Put standing instructions in the system prompt, not repeated per-message. Say it once.
  • Strip boilerplate from pasted code. License headers, import blocks, and comments rarely help the model understand the logic.
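Two of the tips above lend themselves to small helpers. The Python sketch below is illustrative only: `extract_function` pulls one function out of a file using the standard `ast` module, which means it works on Python source (the `auth.ts` example would need a TypeScript parser instead), and `compact_history` assumes you write the summary yourself. Both function names are hypothetical.

```python
import ast


def extract_function(source: str, name: str) -> str:
    """Return the source text of the function called `name` (Python only)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise LookupError(f"no function named {name!r}")


def compact_history(messages: list[str], summary: str, keep_last: int = 2) -> list[str]:
    """Replace all but the last `keep_last` messages with one summary line.

    Assumption: `summary` is a hand-written "here's where we are" paragraph.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(messages) <= keep_last:
        return list(messages)
    recent = messages[len(messages) - keep_last:]
    return [f"Summary of earlier discussion: {summary}"] + recent
```

The design choice in both cases is the same: decide what goes on the desk before you paste, rather than handing over the whole file or the whole transcript.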

Key takeaways

  • Context engineering beats prompt engineering: the information you provide matters more than the wording you choose. Better input generally means better output.
  • Four layers, one budget: system prompt, injected context, conversation history, user message — each has a different purpose and competes for the same tokens.
  • Five pillars, five gaps: Role (who), State (background), Task (what exactly), Constraints (what not to do), Format (what shape). Cover all five and back-and-forth drops dramatically.
  • Build reusable templates: a good code review template is worth building once and reusing hundreds of times. Standardize your context structure for recurring tasks.
  • Quality beats quantity: relevant, focused, structured context outperforms large unfocused dumps. Trim aggressively.
  • Weight pillars by use case: debugging needs State most; code generation needs Task and Constraints; technical writing needs Role and Format.

Context engineering is a habit of thinking: before you send a prompt, pause and ask what you'd need to explain to a brilliant new colleague to make this task self-contained. Add that. Remove what isn't relevant. You'll spend less time iterating and more time using the output. That's the whole idea.


Frequently asked questions

What is context engineering and why does it matter for developers?
Context engineering is the skill of deliberately designing the information you provide to an AI (its role, background state, task, constraints, and output format) so it produces useful results without back-and-forth. It matters because the quality of AI output is almost entirely determined by the quality of input: vague, thin context produces generic, off-target answers, while rich, structured context produces specific, actionable ones. Most developers underestimate this and focus on rephrasing prompts instead.

What is the difference between context engineering and prompt engineering?
Prompt engineering is about word choice: how to phrase a request so the model interprets it correctly. Context engineering is a level deeper: it's about what information you make available before the model generates anything. You can have a perfectly phrased prompt and still get a poor answer because the model lacks background, state, or constraints. Context engineering addresses the information architecture; prompt engineering addresses the wording within it.

What are the five pillars of great context?
The five pillars are: (1) Role: who the model should be (persona, expertise, audience perspective); (2) State: what already exists, what's been tried, what constraints apply; (3) Task: exactly what you want done, not a vague category; (4) Constraints: what to avoid, scope limits, hard rules; (5) Format: the shape of the output (length, structure, level of detail). Addressing all five reduces back-and-forth dramatically and improves first-draft quality.

How do I build a reusable context template?
Structure your template around the five pillars: Role (who the model is and for whom), Background State (tech stack, what exists, what's been tried), Task (one precise sentence), Constraints (explicit negatives: what not to do), and Output Format (length, structure, detail level). You don't need to fill in every field every time: for a simple lookup, role + task + format is enough. For complex debugging or code review, all five matter. The template is a scaffold, not a form.

What does 'token budget' mean in context engineering?
Every interaction with an AI happens inside a context window with a fixed size measured in tokens. Everything you include (the system prompt, pasted code, conversation history, your question) competes for space in that window. Token budget management means being selective about what you include: paste specific functions rather than whole files, strip boilerplate from pasted code, summarize conversation history after long sessions, and put standing instructions in the system prompt rather than repeating them per-message. Irrelevant content isn't neutral; it's noise the model has to reason through.