Nguyen Le Phong

Caching Strategies and Pitfalls

A practical explanation of caching strategies: where caches help, how cache-aside, read-through, write-through, TTLs, and invalidation differ, and why stale data, stampedes, keys, and observability matter as much as speed.

The page loads slowly after lunch, and everyone feels it before anyone can explain it. A dashboard that was fine yesterday now spins for eight seconds. Someone opens the network tab. Someone else checks the database chart. Then the familiar sentence appears in the chat: we should just add a cache.

It is a reasonable instinct. Caching is one of the simplest ways to make a system feel faster. If many users ask for the same expensive answer, the system should not recompute it every time. A cache is like leaving a frequently used document on the desk instead of walking to the archive room for every small question. The desk is faster. The archive is still the source of truth. The trouble begins when the team forgets which one is which.

The first strategy many teams use is cache-aside. The application asks the cache first. If the value is there, it uses it. If the value is missing, the application reads from the database, stores the answer in the cache, and returns it. This pattern is easy to understand and easy to add incrementally. It works well for product details, profile summaries, permissions, configuration, and other reads that are repeated often enough to justify keeping a copy nearby.
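The read path described above can be sketched in a few lines. This is a minimal illustration, not a production client: `FakeProductDatabase`, `product_cache`, and `get_product` are hypothetical names, and a plain dict stands in for a real cache such as Redis.

```python
from typing import Dict, Optional


class FakeProductDatabase:
    """Hypothetical stand-in for the slow, authoritative store."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {1: {"id": 1, "name": "Desk lamp"}}
        self.reads = 0  # counts trips to the "archive room"

    def fetch(self, product_id: int) -> Optional[dict]:
        self.reads += 1
        return self.rows.get(product_id)


db = FakeProductDatabase()
product_cache: Dict[str, dict] = {}  # a plain dict plays the role of the cache


def get_product(product_id: int) -> Optional[dict]:
    """Cache-aside read: ask the cache, fall back to the source, then populate."""
    key = f"product:{product_id}"
    cached = product_cache.get(key)
    if cached is not None:
        return cached                  # hit: no database work
    value = db.fetch(product_id)       # miss: go to the source of truth
    if value is not None:
        product_cache[key] = value     # keep a copy for the next caller
    return value
```

Calling `get_product(1)` twice reads the database once. Missing rows are deliberately not cached in this sketch, so repeated lookups for an unknown id reach the database every time; whether to cache "not found" is a separate decision.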

Cache-aside also makes the application responsible for the awkward parts. It must choose a good cache key. It must decide how long a value can live. It must handle a miss without making the user wait too long. It must remove or refresh the value when the source changes. A small mistake in any of those choices can turn a performance improvement into a correctness bug. The cache did not become dangerous because it was fast. It became dangerous because it was trusted without enough boundaries.
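Two of those responsibilities, naming the key and removing the value when the source changes, might look like the sketch below. The key layout (`profile:<version>:<id>`), the `version` parameter, and the function names are assumptions made for illustration; deleting the entry after the write is one common choice, and refreshing it in place is another.

```python
from typing import Dict

# Hypothetical stand-ins: dicts play the database and the cache.
user_rows: Dict[int, dict] = {7: {"id": 7, "name": "An"}}
profile_cache: Dict[str, dict] = {}


def profile_key(user_id: int, version: str = "v1") -> str:
    """Name the entity, the shape of the cached value, and the id."""
    return f"profile:{version}:{user_id}"


def rename_user(user_id: int, new_name: str) -> None:
    """Write to the source of truth first, then drop the stale copy."""
    user_rows[user_id] = {**user_rows[user_id], "name": new_name}
    profile_cache.pop(profile_key(user_id), None)  # next read repopulates it
```

Putting a version in the key means a change to the cached shape can be rolled out by bumping the version rather than hunting for old entries.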

Read-through caching moves some of that responsibility into the cache layer. The application asks the cache, and the cache knows how to load the value if it is missing. Write-through caching writes to the cache and the backing store as part of the same flow. Write-behind caching accepts a write quickly and updates the backing store later. Each pattern buys a different kind of simplicity and carries a different kind of risk. Read-through can centralize loading logic. Write-through can keep reads fresh. Write-behind can reduce latency, but it makes durability and recovery harder, because an accepted write can be lost before it reaches the backing store.
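A small sketch can make the difference from cache-aside concrete: here the cache owns the loader and the writer, so callers never touch the store directly. `ThroughCache`, `loader`, and `writer` are hypothetical names, and write-behind is left out on purpose because it needs a queue and a recovery story that do not fit in a few lines.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ThroughCache(Generic[K, V]):
    """Read-through on get, write-through on put."""

    def __init__(self, loader: Callable[[K], V], writer: Callable[[K, V], None]) -> None:
        self._loader = loader
        self._writer = writer
        self._entries: Dict[K, V] = {}

    def get(self, key: K) -> V:
        if key not in self._entries:
            self._entries[key] = self._loader(key)  # the cache loads on a miss
        return self._entries[key]

    def put(self, key: K, value: V) -> None:
        self._writer(key, value)    # backing store first; an error stops here
        self._entries[key] = value  # cache only what the store accepted
```

Writing to the store before the cache is a design choice in this sketch: if the store rejects the write, the cache never holds a value the source of truth does not have.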

Time to live, usually called TTL, is the quiet agreement between speed and freshness. A five-minute TTL may be perfect for a public article list, acceptable for a product catalog, and risky for account permissions. A one-hour TTL may be harmless for a footer menu and harmful for pricing. The important question is not simply how long a cached value should live. The better question is how much stale data this user experience can honestly tolerate, and what happens when the stale value is wrong.
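A TTL is mechanically simple, which is part of why it is easy to choose one carelessly. The sketch below assumes a hypothetical `TTLCache` with an injectable clock so expiry can be tested without waiting; the two TTL values mirror the examples in the text and are illustrations, not recommendations.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Illustrative TTLs in seconds, taken from the examples in the text.
TTL_SECONDS = {"article_list": 5 * 60, "footer_menu": 60 * 60}


class TTLCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat it as a miss
            return None
        return value
```

Until the entry expires, every reader sees the value as it was when it was stored, which is exactly the staleness window the TTL agrees to.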

Invalidation is where many teams learn humility. There is a reason people joke that cache invalidation is one of the hard problems in computer science. The database changed, but which cache entries should be removed? One user changed their name, but the name appears in a profile card, team list, audit view, search index, notification template, and mention autocomplete. If the keys are not designed with the real read paths in mind, invalidation becomes a messy hunt through the system.
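One way to design keys around read paths is to record, for each cached entry, which source entities it was built from. The sketch below is an assumption about how that could look, not a description of any particular library: `TaggedCache`, the `user:42` tag format, and the key names are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


class TaggedCache:
    """Each entry remembers the source entities (tags) it depends on."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._keys_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def set(self, key: str, value: Any, tags: Iterable[str]) -> None:
        self._entries[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Any:
        return self._entries.get(key)

    def invalidate_tag(self, tag: str) -> int:
        """Drop every entry built from `tag`; return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            if key in self._entries:
                del self._entries[key]
                removed += 1
        return removed
```

When user 42 changes their name, `invalidate_tag("user:42")` removes the profile card and the team list in one call, provided both were stored with that tag. The sketch only helps for entries that were tagged correctly when they were written, so the hard part, knowing every read path, does not go away.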

Stale data is not always a failure. Some products can say that a report updates every few minutes. A feed can lag a little. A recommendation list does not need to reflect the last click instantly. But some data carries trust. A permission check, account balance, seat reservation, checkout price, or privacy setting should not quietly rely on a stale copy unless the system has a very deliberate safety model. The cache boundary should respect the promise the product is making.

Another pitfall is the cache stampede. Imagine a popular page whose cached value expires at noon. At 12:00:01, thousands of requests miss the cache together and all rush to the database to rebuild the same answer. The cache was meant to protect the database, but the expiry moment turns into a crowd at one small service counter. Teams reduce this with request coalescing, jittered TTLs, background refresh, soft expiry, or locking so one request does the expensive work while others wait or receive a slightly older value.
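Two of those mitigations, jittered TTLs and per-key locking, can be sketched with the standard library alone. `jittered_ttl` and `SingleFlightCache` are hypothetical names, the 10% spread is an arbitrary assumption, and the cache here has no expiry or lock cleanup, so it shows only the coalescing idea.

```python
import random
import threading
from typing import Any, Callable, Dict


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Vary the TTL by +/- `spread` so hot keys do not all expire together."""
    return base_seconds * random.uniform(1 - spread, 1 + spread)


class SingleFlightCache:
    """On a miss, one caller per key rebuilds the value; the others wait."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the _key_locks dict

    def get(self, key: str, rebuild: Callable[[], Any]) -> Any:
        if key in self._values:
            return self._values[key]  # fast path: hit, no locking
        with self._guard:
            lock = self._key_locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._values:  # re-check: a peer may have filled it
                self._values[key] = rebuild()
            return self._values[key]
```

With twenty threads missing on the same key at once, `rebuild` runs once and the other nineteen receive its result. Across several processes or machines, the same idea needs a shared lock or a soft-expiry scheme instead of `threading.Lock`.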

Memory pressure is quieter but just as real. Caches are finite. If keys are too broad, values too large, or eviction policies poorly matched to usage, the system may spend its time filling and throwing away data. A low hit rate can mean the cache is only adding network calls, serialization cost, and operational complexity. Before celebrating that a cache exists, the team should ask whether it is actually being used well: hit rate, miss latency, eviction rate, object size, error rate, and the age of returned values all tell part of the story.
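Some of those signals are cheap to collect at the cache itself. The following is a rough sketch of a bounded least-recently-used cache that counts hits, misses, and evictions; `MeasuredLRUCache` is a hypothetical name, and LRU is only one possible eviction policy.

```python
from collections import OrderedDict
from typing import Any, Hashable


class MeasuredLRUCache:
    """A bounded LRU cache that counts hits, misses, and evictions."""

    def __init__(self, max_entries: int) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max_entries = max_entries
        self._entries: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return None

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
            self.evictions += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A hit rate that stays low while evictions climb suggests the cache is too small for the working set or the keys are too fine-grained, which is the "filling and throwing away" pattern described above.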

A healthy caching design starts with the user promise. Which answer is slow? Why is it slow? How often does it change? Who is allowed to see it? How wrong can it be, and for how long? From there, the team can choose the smallest useful cache. Maybe it is an in-process cache for reference data. Maybe it is Redis for shared hot reads. Maybe it is CDN caching for public assets. Maybe the better answer is a better database index, a smaller query, or a precomputed read model instead of a cache at all.

Caching is not a decoration added after architecture is done. It becomes part of the architecture because it creates another place where truth can be observed, delayed, or misunderstood. Used carefully, it turns repeated work into quiet speed. Used casually, it hides bugs behind fast responses. The next time a slow page makes someone say "just add a cache," it may be worth pausing for one calmer question: what truth are we copying, and what promise are we making while that copy exists?
