What is cognitive debt in software engineering?

Cognitive debt is the gap between the software that exists and the team's shared understanding of how to change, debug, extend, and operate it safely.

How is cognitive debt different from technical debt?

Technical debt usually lives in code and architecture, such as duplication, tight coupling, or weak tests. Cognitive debt lives in the team's mental model: people no longer understand the system well enough to change it confidently.

Does AI-generated code always create cognitive debt?

No. AI can reduce cognitive debt when used to explain dependencies, draft documentation, generate test matrices, and expose edge cases. The risk appears when AI increases output faster than the team can verify and understand it.

What is one simple rule for teams using AI coding tools?

Do not merge code that no human understands well enough to operate. At least one person should be able to explain the intent, assumptions, invariants, failure modes, and rollback path.

Cognitive Debt in AI-Assisted Development: When Code Grows Faster Than Understanding

AI coding tools can make a team produce code faster than it can understand, review, debug, and safely operate that code. This article explains cognitive debt, how it differs from technical debt and intent debt, why agentic coding makes it more visible, and the practical controls teams can use: smaller batches, better PR rationale, human-owned invariants, disposable prototypes, and AI used to repay understanding instead of only generating more work.

Author: Nguyen Le Phong | April 5, 2026 | 15 min read

AI
Cognitive Debt
AI Agents
Software Engineering
Code Review
Engineering Culture

You have probably seen the pull request already. It arrives late in the afternoon with a friendly note: “AI helped with most of this, tests are green.” The diff is large. The naming is mostly fine. The code compiles. The unit tests pass. The branch deploys to staging. Nothing looks obviously broken.

And still, while reviewing it, something in your stomach stays tense. You cannot quite explain the authorization edge case. You are not sure why the cache TTL is five minutes. The fallback path looks reasonable, but no one remembers whether a revoked user is allowed to keep access during propagation. The code may be correct. The problem is that the team does not understand it well enough to be confident.

[Figure: Nguyen Le Phong and a teammate study an AI-assisted pull request and flow diagram on a large monitor.] A green test suite does not remove the moment when a reviewer realizes the team still cannot explain the change with enough confidence.

That is the shape of cognitive debt in AI-assisted development. It is not simply bad code. It is the gap between the system that exists and the shared understanding the team has about how to change, debug, extend, and operate it safely.

What cognitive debt is

Technical debt is familiar to most engineers. We borrow against the future when we accept duplication, tight coupling, thin tests, rushed design, or an abstraction that will need to be cleaned up later. The debt lives mostly in the code and architecture.

Cognitive debt lives in people. It appears when the team no longer carries a reliable mental model of the system. Code may be typed, formatted, tested, and merged, but the team cannot explain why a flow exists, which invariant protects it, what trade-off was chosen, or what might break if one module changes.

Margaret-Anne Storey has recently framed this as part of a broader triple debt model: technical debt in code, cognitive debt in shared understanding, and intent debt in the externalized rationale that people and AI agents need later. That distinction matters. A codebase can look clean and still be hard to change if the reasoning behind it has disappeared.

Debt type	Where it lives	The broken question
Technical debt	Code, design, architecture	Is this easy to change?
Cognitive debt	The team's mental model	Do we understand this well enough to change it safely?
Intent debt	Decision records, specs, docs, prompts, rationale	Do we know why it was designed this way?

The painful part is that cognitive debt rarely fails a build. It does not show up as a TypeScript error or a lint warning. It usually announces itself later, when someone says, “Do not touch that area, nobody knows how it works.”

Why AI makes it worse

AI changes the speed ratio. Before AI, writing code was often also a way of building the mental model. You discovered the constraints while you typed, hit the edge cases while you tested, and felt the shape of the system while changing it by hand.

With an AI coding assistant or agent, a developer can accept a large patch, a refactor, a SQL query, an integration layer, or a DevOps workflow before building the same depth of understanding. The artifact grows faster than the human model around it. The team gets velocity without always getting comprehension.

[Figure: Nguyen Le Phong sits alone at a home desk at night, comparing code, dependency maps, and handwritten questions.] Cognitive debt starts to shrink only when someone slows down long enough to turn generated output back into a human mental model.

For example, imagine asking an agent to “make payment retries more robust.” It adds exponential backoff, extracts a shared retry helper, catches more provider errors, and updates a few tests. The diff looks like a reliability improvement. But if nobody checks which provider errors are safe to retry, whether every charge carries an idempotency key, and how the partner rate limit behaves under retry storms, the team has not only changed code. It has changed an operational contract it may not understand.
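To make that operational contract concrete, here is a minimal sketch of a retry helper in which the human-owned decisions are explicit rather than buried in a diff. The error classes, the `Charge` type, and the `send` callable are hypothetical stand-ins, not any real provider SDK; the attempt count and delays are illustrative assumptions.

```python
import random
import time
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical error types; a real payment provider SDK defines its own.
class ProviderTimeout(Exception): ...
class RateLimited(Exception): ...
class CardDeclined(Exception): ...


# Human-owned decision: only these errors are treated as safe to retry.
# Anything else (for example CardDeclined) propagates on the first failure.
RETRYABLE = (ProviderTimeout, RateLimited)


@dataclass(frozen=True)
class Charge:
    amount_cents: int
    idempotency_key: str


def charge_with_retry(
    send: Callable[[Charge], Any],
    charge: Charge,
    max_attempts: int = 4,      # assumption: illustrative value
    base_delay: float = 0.5,    # seconds, assumption
    max_delay: float = 8.0,     # cap to limit pressure during retry storms
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    # Invariant: never retry a charge that cannot be deduplicated.
    if not charge.idempotency_key:
        raise ValueError("refusing to retry a charge without an idempotency key")

    for attempt in range(1, max_attempts + 1):
        try:
            return send(charge)
        except RETRYABLE:
            if attempt == max_attempts:
                raise
            # Capped exponential backoff with full jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The point of the sketch is less the loop than the three lines a reviewer can question directly: the `RETRYABLE` tuple, the idempotency check, and the delay cap. Whether those values are right for a given provider and rate limit is exactly what someone on the team still has to verify.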

This is why the recent delivery numbers are worth reading carefully. DORA's report on generative AI in software development found that a 25% increase in AI adoption was associated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability, with larger batch sizes named as one likely mechanism. Sonar's survey on AI coding described a similar verification gap: 96% of developers did not fully trust AI-generated code, but only 48% said they always checked it before committing.

Those numbers do not mean AI is bad for software. They mean speed is not the same thing as throughput, and generated code is not the same thing as understood code. If AI moves the bottleneck from writing to verifying, then the team has to redesign the process around verification, not pretend the old review habits still scale.

The human side: cognitive offloading

There is also a more general cognitive pattern underneath this. Michael Gerlich's 2025 study in Societies, based on 666 participants, found that heavier AI tool usage correlated with more cognitive offloading and with lower critical thinking scores. The important reading is not that using AI automatically makes people worse thinkers. The more useful reading is that when we use AI to avoid doing the hard thinking ourselves, the thinking muscle receives less training.

Software engineering has its own version of that pattern. If the assistant writes the implementation, the tests, the migration, and the explanation, and the human only accepts the final bundle, the human may not have built the mental model needed for the next incident. The code exists, but the understanding was outsourced.

A fair distinction

AI can also reduce cognitive debt when used deliberately: summarizing dependencies, generating test matrices, drafting ADRs, finding missing edge cases, or turning code into a diagram the team can verify. The risk is not AI itself. The risk is using AI only to increase output while shared understanding stays flat.

A concrete example: the authorization layer

Imagine a team asks an AI agent to generate an authorization layer:

User -> Role -> Policy -> Resource -> Action

The demo works. The middleware is clean. There is a cache. There are basic tests. But nobody writes down the real operating rules:

Why can admins bypass some policies?
Why is the cache TTL five minutes?
How long is acceptable between permission revocation and propagation?
Which policies are business-critical and must fail closed?
What must the audit log prove during an incident?

Six months later, there is an incident. A user whose access was revoked can still reach a resource for a few minutes. The code is not necessarily ugly. The failure lives in the lost or never-created understanding around the code. The team accepted a working artifact without capturing the intent and invariants that make the artifact safe to operate.
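One way to keep those operating rules from getting lost is to write them into the code as named constants and explicit branches. The following is a minimal sketch, not the team's implementation: `PolicyCache`, `load_decision`, and the 60-second staleness bound are all assumptions chosen for illustration, and the cache is in-process only.

```python
import time
from typing import Callable, Dict, Tuple

# Assumption: the business rule is that a revoked permission may remain
# usable for at most this long. The value is illustrative.
MAX_STALENESS_SECONDS = 60.0

Key = Tuple[str, str, str]  # (user, resource, action)


class PolicyCache:
    def __init__(
        self,
        load_decision: Callable[[str, str, str], bool],  # hypothetical policy lookup
        ttl: float = MAX_STALENESS_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_decision
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[Key, Tuple[bool, float]] = {}  # key -> (allowed, expires_at)

    def is_allowed(self, user: str, resource: str, action: str) -> bool:
        key = (user, resource, action)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[1] > now:
            return cached[0]  # may be stale for up to ttl seconds
        try:
            allowed = bool(self._load(user, resource, action))
        except Exception:
            # Fail closed: if the policy store cannot answer, deny access
            # and do not cache the denial.
            self._entries.pop(key, None)
            return False
        self._entries[key] = (allowed, now + self._ttl)
        return allowed

    def revoke(self, user: str) -> None:
        # Drop every cached decision for this user so revocation in this
        # process does not have to wait for the TTL to expire.
        for key in [k for k in self._entries if k[0] == user]:
            del self._entries[key]
```

With this shape, the incident in the scenario becomes a question someone can answer from the code: a revoked user keeps access for at most `ttl` seconds unless `revoke` is called, and `revoke` only clears the local process. In a multi-instance deployment, propagating the revocation to other instances is a separate design decision that this sketch does not cover.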

Signs your team is carrying cognitive debt

The signs are usually ordinary at first.

Code runs, but nobody wants to touch it. Every small change requires three people, ten files, and too much manual regression.
PR review becomes a ritual. Reviewers comment on names, style, and test snapshots, but cannot really judge intent because the patch is too large or too unfamiliar.
Onboarding slows down. New teammates read the docs, then read the code, then realize each person explains the system differently.
Debugging depends on the person who remembers. The system has a few human routers. When they are away, delivery stalls.
The team asks AI to explain code no human can verify. That can help, but it becomes dangerous when the explanation has no tests, telemetry, domain rules, or design notes to check against.

The common thread is not that the system is complex. Most useful systems become complex. The problem is that the shared theory of the system is no longer available to the people responsible for it.

Slowing down is an engineering control

It is tempting to answer this with a slogan: slow down. But the point is not nostalgia for hand-written code or suspicion toward tools. The point is more precise: slow down at the moments where understanding is created or lost.

Some friction is not waste. Writing a short design note is friction. Explaining a PR's assumptions is friction. Splitting an AI-generated change into smaller batches is friction. Making a reviewer restate the invariant is friction. But these frictions are the same places where the team rebuilds the mental model that makes future speed possible.

A useful rule for teams is simple: do not merge code that no one understands well enough to operate. At least one human should be able to explain what changed, why it changed, what could fail, how to detect failure, and how to roll back. That person does not need to have typed every line. They do need to be accountable for the system becoming understandable again before it ships.

[Figure: Nguyen Le Phong leads two teammates through a board of small review steps, invariants, and rollback paths during a daylight engineering workshop.] Smaller batches and human-owned checkpoints are not ceremony; they are how a team makes an AI-assisted change understandable again before it becomes production responsibility.

A PR template for the AI era

One practical place to start is the pull request. Do not ask only what changed. Ask for the thinking that lets someone else safely inherit the change.

A lightweight checklist

For meaningful AI-assisted changes, add these fields to the PR:

Why this change?
Key assumptions
New or protected invariants
Failure modes
Rollback path
What AI generated or assisted
What was manually verified

What this looks like

For the authorization example, a useful PR note might say:

Why: revoked permissions must stop working within 60 seconds.
Assumptions: the policy cache may be stale for at most one minute; finance resources must fail closed.
Invariants: every access decision writes an audit event.
Rollback: disable the cache behind the feature flag.
Manual verification: revoke a user in staging, retry the resource, and confirm the audit trail.

This is not bureaucracy for its own sake. It is intent capture. It gives future humans and future agents something better than archeology.

Six practices that reduce cognitive debt

Limit AI-generated batch size. If a patch is too large for a reviewer to reason about, it is too large to merge safely. Ask the agent to produce smaller steps.
Treat AI prototypes as disposable. Use them to explore, then rebuild the result to fit your system's architecture, naming, constraints, and test strategy.
Review for understanding, not only correctness. Ask whether the reviewer can explain the change back. If not, the review is not done.
Externalize intent. Write ADRs, decision logs, domain glossaries, sequence diagrams, runbooks, threat models, and migration notes when the change is meaningful.
Use AI to repay debt. Ask it to generate a test matrix, list edge cases, summarize dependencies, compare implementation with a spec, or draft documentation that humans verify.
Protect human checkpoints. Architecture, public APIs, security boundaries, data migrations, and irreversible operations should remain human-owned decisions, even when AI helps produce the draft.
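The first practice, limiting batch size, is the easiest to automate. As a rough sketch, the helper below sums changed lines from the text that `git diff --numstat` prints (added count, deleted count, and path separated by tabs, with `-` in both count columns for binary files). The 400-line limit and the function names are assumptions for illustration; the right threshold depends on the team and the code.

```python
# Assumption: an illustrative review budget, not a recommended number.
MAX_CHANGED_LINES = 400


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts
        total += int(added) + int(deleted)
    return total


def too_large(numstat: str, limit: int = MAX_CHANGED_LINES) -> bool:
    """True when the diff exceeds the team's review budget."""
    return changed_lines(numstat) > limit
```

Line count is a crude proxy for how hard a change is to reason about. A small diff to a security boundary can deserve more scrutiny than a large mechanical rename, so a gate like this works better as a nudge to split the work than as a hard rule.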

A practical difference: do not ask an agent to “modernize the whole auth module” and merge the result as one impressive diff. Ask it first to add characterization tests around current behavior. Then extract policy lookup. Then add the cache behind a flag. Then document the revoke semantics. Each step leaves the reviewer able to explain one idea, instead of pretending to understand five intertwined changes at once.
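For readers who have not written one, a characterization test pins down what the code does today, whether or not that behavior was ever intended, so later refactoring steps can be checked against it. The sketch below uses a hypothetical `legacy_can_access` function as a stand-in for existing behavior; the roles, resource paths, and expected values are invented for illustration.

```python
import unittest


def legacy_can_access(role: str, resource: str) -> bool:
    # Hypothetical stand-in for existing logic that nobody fully remembers.
    if role == "admin":
        return True
    if role == "editor":
        return not resource.startswith("finance/")
    return resource.startswith("public/")


# Recorded by running the current code, not derived from a specification.
# A surprising row is a question for the team, not something to "fix" here.
OBSERVED = [
    ("admin", "finance/ledger", True),
    ("editor", "finance/ledger", False),
    ("editor", "wiki/home", True),
    ("viewer", "wiki/home", False),
    ("viewer", "public/status", True),
]


class CharacterizationTest(unittest.TestCase):
    def test_current_behavior_is_pinned(self):
        for role, resource, expected in OBSERVED:
            with self.subTest(role=role, resource=resource):
                self.assertEqual(legacy_can_access(role, resource), expected)
```

Once a table like `OBSERVED` exists, the later steps (extracting policy lookup, adding the cache) can each be reviewed against the same pinned behavior, and any intentional change shows up as an explicit edit to a row.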

The leader's role: permission to slow down

The teams most at risk are not always the teams using AI badly. Sometimes they are the teams using AI enthusiastically while leadership quietly rewards only visible output. If the only celebrated number is how much faster code is produced, people will optimize for code volume and hide the cost of understanding.

Leaders need to make a different permission explicit: we are not trying to do everything AI can do. We are trying to do the important work with enough understanding that we can own the result. That means making space for deep work, design review, knowledge sharing, and honest conversations about where AI is helping and where it is creating pressure.

A team can move fast with AI. But it has to move at the speed of verified understanding, not at the speed of generated text.

Key takeaways

Cognitive debt is the gap between the system and the team's understanding of it. Code can compile, tests can pass, and the team can still be unable to change it safely.
Technical debt lives mostly in code; cognitive debt lives in people; intent debt lives in missing rationale. All three interact in AI-assisted development.
AI increases the risk because generation can outrun comprehension. The patch arrives faster than the team's mental model can form.
The bottleneck moves from writing to verification. If review habits do not change, AI can create larger batches that are harder to reason about.
Some friction is healthy. Design notes, smaller PRs, explicit invariants, and rollback plans are not slowdown for its own sake. They are how shared understanding is rebuilt.
Use AI to repay debt, not only create artifacts. Let it draft test matrices, dependency maps, runbooks, and ADRs, then make humans verify them.
Do not merge what no one can operate. At least one human should understand the change deeply enough to debug it at 2 a.m.

The most useful posture toward AI coding is not fear, and it is not surrender. It is stewardship. Let the machine draft, search, compare, and accelerate. But keep the team's understanding close to the system it owns. The future codebase will not be maintained by the prompt that created it. It will be maintained by people who have to understand what was built, why it was built, and how to change it without breaking the world around it.
