Foundations of Software Architecture, Part 8 of 15

Who Owns the Data? Database-per-Service, Sagas, and Eventual Consistency Without Tears

Splitting code is the easy half — splitting data is where distributed systems humble you. A practical guide to owning data across services: why a shared database is a distributed monolith, the trade from ACID to eventual consistency, the dual-write bug and the outbox that fixes it, sagas with compensating actions, and when CQRS and event sourcing are worth their lifetime cost.

By Nguyen Le Phong | April 10, 2026 | 13 min read

Software Architecture
Distributed Systems
Saga Pattern
CQRS
Event Sourcing
Data Consistency

In the previous part we split a synchronous chain into events. But splitting the code is the easy half. The moment two services stop sharing a database, a quieter and harder question arrives: who owns the data, and what does "true" even mean when the truth is spread across five services?

This is the part of distributed systems that humbles experienced engineers, because it is not really about technology. It is about giving up a comfort you have leaned on your whole career — the single database transaction that either fully happens or fully does not — and learning to build correctly without it.

Figure: The first hard data conversation in a distributed system is usually not about scale. It is about drawing a boundary the team will actually respect when each service owns its truth.

The shared database that couples everyone

When a team splits a monolith, the tempting shortcut is: split the code into services, but let them all keep talking to the same database. It feels pragmatic. It is also the single fastest way to build a distributed monolith — the worst of both worlds.

The reason is invisible coupling. If the orders service and the billing service both read and write the orders table, then that table's schema is silently part of the contract between them. Change a column and you break a service you did not even open. You have all the operational cost of many services and none of the independence.

The line that defines a real split

A service split is only real when each service owns its data privately. Other services may not touch its tables — they ask through its API or react to its events. If two services share tables, you have not built two services; you have built one service with a confusing deployment story.

Database per service: the rule and its bill

The discipline is simple to state: a service's data is private. The only way in is through the service. This is what buys you the independence the whole microservices bet was about — each team can change its schema, pick its storage, and deploy without a cross-team meeting.

The bill arrives immediately, and it is steep:

No more cross-service JOINs. "Show me orders with the customer's name" used to be one query. Now the data lives in two services and you must compose it in code, cache it, or duplicate it.
No more cross-service transactions. You cannot wrap "take payment" and "reserve stock" in one BEGIN / COMMIT when they live in different databases. The safety net is gone.

That second loss is the big one. Everything else in this article is a technique for living without the cross-service transaction you used to take for granted.
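Before moving on, the first loss (the missing JOIN) can be sketched as composition in code. This is a minimal illustration, not a prescribed design: `ordersService`, `customersService`, and their methods are hypothetical in-memory stand-ins for what would be real API calls between services.

```typescript
// Hypothetical shapes; in a real system each service exposes these over its API.
interface Order { id: string; customerId: string; total: number }
interface Customer { id: string; name: string }
interface OrderView { orderId: string; total: number; customerName: string }

// In-memory stand-ins for two services that each own their data privately.
const ordersService = {
  listOrders: (): Order[] => [
    { id: "o1", customerId: "c1", total: 40 },
    { id: "o2", customerId: "c2", total: 15 },
    { id: "o3", customerId: "c1", total: 5 },
  ],
};

const customersService = {
  // Batch lookup: one call for many ids avoids one request per order.
  getCustomersByIds: (ids: string[]): Customer[] => {
    const all: Customer[] = [
      { id: "c1", name: "Ada" },
      { id: "c2", name: "Linus" },
    ];
    return all.filter((c) => ids.indexOf(c.id) !== -1);
  },
};

// The "JOIN" now happens in application code.
function ordersWithCustomerNames(): OrderView[] {
  const orders = ordersService.listOrders();
  const uniqueIds = Array.from(new Set(orders.map((o) => o.customerId)));

  const byId = new Map<string, Customer>();
  for (const c of customersService.getCustomersByIds(uniqueIds)) {
    byId.set(c.id, c);
  }

  return orders.map((o) => ({
    orderId: o.id,
    total: o.total,
    // The customer may be missing or stale; the view has to tolerate that.
    customerName: byId.get(o.customerId)?.name ?? "(unknown)",
  }));
}

const orderViews = ordersWithCustomerNames();
```

The batch lookup is the design choice worth noticing: composing in code is tolerable when it costs two calls, and painful when it costs one call per row.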

The shift: from ACID to "true in a moment"

Inside one database, you get strong consistency: the instant a transaction commits, everyone sees the new truth. Across services, that guarantee is gone. What you get instead is eventual consistency: the system will agree on the truth soon — usually milliseconds, sometimes seconds — but not in the same instant.

This is not a bug to be fixed; it is physics. The CAP theorem says that when the network between services partitions (and it will), the parts that can no longer reach each other must choose between staying available and staying strictly consistent. Most business systems choose availability and design around a brief window where, say, the order exists but the loyalty points have not landed yet.

Aspect	Strong consistency	Eventual consistency
When is it true?	The instant you commit	A moment later, once events propagate
Scope	One database	Across services / regions
You pay in	Coupling, contention, harder scaling	Brief disagreement windows you must design for
Right for	Money inside one ledger, a single aggregate	Cross-service workflows, read models, analytics

The reframe that makes it bearable

Ask the business, not the database: "is it acceptable for this to be true a second later?" For analytics, search indexes, notifications, and recommendations — almost always yes. For "did this exact card already get charged?" — design that boundary so the money lives inside one service's strong transaction, and let the rest be eventual.
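That boundary can be sketched in a few lines. The example below is an assumption-laden toy: `PaymentService`, `LoyaltyService`, the `OrderPaid` event, and the one-point-per-currency-unit rule are all hypothetical. It shows the money decision made inside one service in a single local step, while loyalty points catch up a moment later.

```typescript
// Hypothetical event; names are illustrative only.
interface OrderPaid { orderId: string; customerId: string; amount: number }

class PaymentService {
  private charges = new Map<string, number>(); // orderId -> amount charged
  readonly pendingEvents: OrderPaid[] = [];    // events not yet delivered downstream

  // Strongly consistent: the duplicate check and the write are one local step.
  charge(orderId: string, customerId: string, amount: number): boolean {
    if (this.charges.has(orderId)) return false; // already charged, refuse a second charge
    this.charges.set(orderId, amount);
    this.pendingEvents.push({ orderId, customerId, amount });
    return true;
  }
}

class LoyaltyService {
  private points = new Map<string, number>();

  // Assumed rule for the sketch: one point per whole currency unit.
  apply(event: OrderPaid): void {
    const current = this.points.get(event.customerId) ?? 0;
    this.points.set(event.customerId, current + Math.floor(event.amount));
  }

  pointsFor(customerId: string): number {
    return this.points.get(customerId) ?? 0;
  }
}

const payments = new PaymentService();
const loyalty = new LoyaltyService();

const firstCharge = payments.charge("o1", "c1", 40);
const secondCharge = payments.charge("o1", "c1", 40); // same order again
const pointsDuringWindow = loyalty.pointsFor("c1");   // the event has not propagated yet

// Later: the events are delivered and the two services agree again.
for (const event of payments.pendingEvents.splice(0)) {
  loyalty.apply(event);
}
const pointsAfterDelivery = loyalty.pointsFor("c1");
```

The second charge is rejected immediately because that question never leaves the payment service. The points are briefly zero, which is the window the business agreed to tolerate.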

The dual-write problem (and the outbox)

Here is the bug that catches almost everyone first. A service needs to do two things: save to its own database and publish an event. The naive code does them one after another:

// The dual-write bug: two systems, no shared transaction
await db.orders.insert(order)              // 1) committed to the database
await broker.publish("OrderPlaced", order)   // 2) what if we crash right here?

If the process dies between step 1 and step 2, the order exists but nobody was told. Payment never runs. The order is a ghost. Worse, you cannot fix it by reordering — publish first and you might announce an order that never saved.
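The ghost order is easy to reproduce in a toy model. In this sketch the database and broker are plain in-memory arrays, and the crash is simulated by a hypothetical `crashBeforePublish` flag; none of these names come from a real library.

```typescript
// In-memory stand-ins; the "crash" is simulated by a publish call that throws.
interface PlacedOrder { id: string }

const savedOrders: PlacedOrder[] = [];
const publishedEvents: string[] = [];

function insertOrder(order: PlacedOrder): void {
  savedOrders.push(order); // step 1: committed to the "database"
}

function publish(eventType: string, crashBeforePublish: boolean): void {
  if (crashBeforePublish) throw new Error("process died before publish");
  publishedEvents.push(eventType);
}

// The dual write: two systems, no shared transaction.
function placeOrderDualWrite(order: PlacedOrder, crashBeforePublish: boolean): void {
  insertOrder(order);
  publish("OrderPlaced", crashBeforePublish); // step 2 may never happen
}

let crashed = false;
try {
  placeOrderDualWrite({ id: "o1" }, true);
} catch {
  crashed = true;
}

// One saved row, zero published events: the order exists but nobody was told.
const ghostOrders = savedOrders.length - publishedEvents.length;
```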

The clean fix is the transactional outbox. Instead of publishing directly, you write the event into an outbox table in the same transaction as the order. A separate relay then reads the outbox and publishes. One commit, no gap.

// Outbox: order and event commit together, or not at all
await db.transaction(async (tx) => {
  await tx.orders.insert(order)
  await tx.outbox.insert({ type: "OrderPlaced", payload: order })
})
// A relay polls the outbox (or tails the DB log) and publishes — retrying safely.

Because the relay retries, delivery is at-least-once — which is exactly why the previous part insisted every consumer be idempotent. The two ideas are partners.
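The partnership between relay and idempotent consumer can be sketched as follows. Everything here is a simplified assumption: the outbox is an in-memory array, `handleOrderPlaced` stands in for publishing to a broker plus delivery to a consumer, and the `failBeforeMarking` flag simulates a relay crash.

```typescript
interface OutboxRow { id: number; type: string; payload: string; published: boolean }

// Rows that were committed together with their orders.
const outbox: OutboxRow[] = [
  { id: 1, type: "OrderPlaced", payload: "o1", published: false },
  { id: 2, type: "OrderPlaced", payload: "o2", published: false },
];

// Consumer side: remembers processed event ids so a redelivery is a no-op.
const processedIds = new Set<number>();
const paymentsStarted: string[] = [];

function handleOrderPlaced(row: OutboxRow): void {
  if (processedIds.has(row.id)) return; // duplicate delivery, ignore it
  processedIds.add(row.id);
  paymentsStarted.push(row.payload);
}

// Relay side: publish first, mark as published second.
// If the relay dies between the two, the row is sent again on the next pass.
function relayOnce(failBeforeMarking: boolean): void {
  for (const row of outbox) {
    if (row.published) continue;
    handleOrderPlaced(row);        // stands in for broker publish + delivery
    if (failBeforeMarking) return; // simulated crash: the row stays unpublished
    row.published = true;
  }
}

relayOnce(true);  // delivers event 1, then "crashes" before marking it
relayOnce(false); // redelivers event 1 (ignored by the consumer), then delivers event 2
```

Event 1 is delivered twice but payment starts once per order. Marking before publishing would flip the failure mode to at-most-once, which is how events get lost.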

Figure: The outbox earns its keep in moments like this: the team can point to one local commit, one relay, and a publish path that no longer depends on lucky timing.

Sagas: transactions without a rollback button

Now the hard case: a single business action that spans services — charge the card, reserve the stock, confirm the order — where step three can fail after steps one and two succeeded. There is no ROLLBACK that reaches across three databases. The answer is the saga: break the action into a sequence of local transactions, and for each step define a compensating action that undoes it.

There is no database that can roll all three services back at once. Instead, a failed step triggers compensating actions that walk the completed steps backward — a refund, a stock release — each one a normal local transaction.

If "confirm order" fails, say because final validation rejects the order, the saga does not magically rewind. It runs the undo steps in reverse: release the reservation, refund the card. Each compensation is an ordinary local transaction, and notice that it is a business reversal, not a technical one. A refund is not the same as "the charge never happened"; the customer may have seen it on their statement. Sagas force you to model failure as a real-world event, which is uncomfortable and also more honest.

Sagas come in the two flavours from the last part: orchestrated (a coordinator drives the steps and is easy to follow) or choreographed (services react to each other's events, more decoupled but harder to trace). For anything money touches, most teams prefer an orchestrator they can watch.
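A minimal orchestrated saga might look like the sketch below. `SagaStep` and `runSaga` are hypothetical names, the steps are synchronous closures over an in-memory `state` object rather than real service calls, and the third step is forced to fail so the compensations are visible.

```typescript
interface SagaStep {
  name: string;
  action: () => void;     // a local transaction in one service; throws on failure
  compensate: () => void; // the business reversal of that step
}

// Runs steps in order; on failure, compensates completed steps in reverse.
function runSaga(steps: SagaStep[], log: string[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      log.push(`done: ${step.name}`);
      completed.push(step);
    } catch {
      log.push(`failed: ${step.name}`);
      for (const done of completed.reverse()) {
        done.compensate();
        log.push(`compensated: ${done.name}`);
      }
      return false;
    }
  }
  return true;
}

// Toy state standing in for three separate services.
const state = { charged: 0, refunded: 0, reservedUnits: 0 };

const steps: SagaStep[] = [
  {
    name: "charge card",
    action: () => { state.charged += 40; },
    compensate: () => { state.refunded += 40; }, // a refund, not an erased charge
  },
  {
    name: "reserve stock",
    action: () => { state.reservedUnits += 1; },
    compensate: () => { state.reservedUnits -= 1; },
  },
  {
    name: "confirm order",
    action: () => { throw new Error("order rejected"); },
    compensate: () => {},
  },
];

const sagaLog: string[] = [];
const sagaSucceeded = runSaga(steps, sagaLog);
```

Note that the refund is recorded as its own fact while the charge stays on the books. A production orchestrator would also persist its progress and retry compensations, since a compensation can fail too; this sketch leaves that out.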

Figure: A saga becomes easier to trust when the team names the compensations before the happy path ships, while everyone can still see who must undo what if a later step fails.

CQRS and read models: serving the data you scattered

Database-per-service broke your JOINs. So how do you render a dashboard that needs data from six services? You build a read model: a separate, denormalised copy shaped exactly for that screen, kept up to date by listening to events. This is the read half of CQRS (Command Query Responsibility Segregation), which simply means the model you write through and the model you read from do not have to be the same model.

Write side: small, consistent, validates business rules.
Read side: wide, fast, often eventually consistent, optimised for queries.

CQRS shines when reads and writes have wildly different shapes or scale — a product catalogue read millions of times and written rarely. It is overkill when a plain table serves both fine, which is most of the time. Reach for it to solve a specific read problem, never because it sounds advanced.
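As a rough sketch of the read side, here is a projection that folds events from two services into one dashboard row per customer. The event names, fields, and the `CustomerDashboardReadModel` class are assumptions made for the example, not part of any framework.

```typescript
// Hypothetical events published by a customers service and an orders service.
type DomainEvent =
  | { type: "CustomerRenamed"; customerId: string; name: string }
  | { type: "OrderPlaced"; orderId: string; customerId: string; total: number };

interface DashboardRow {
  customerId: string;
  name: string;
  orderCount: number;
  totalSpent: number;
}

// A denormalised copy shaped for one screen, updated only by events.
class CustomerDashboardReadModel {
  private rows = new Map<string, DashboardRow>();

  private rowFor(customerId: string): DashboardRow {
    let row = this.rows.get(customerId);
    if (!row) {
      row = { customerId, name: "(unknown)", orderCount: 0, totalSpent: 0 };
      this.rows.set(customerId, row);
    }
    return row;
  }

  apply(event: DomainEvent): void {
    const row = this.rowFor(event.customerId);
    switch (event.type) {
      case "CustomerRenamed":
        row.name = event.name;
        break;
      case "OrderPlaced":
        row.orderCount += 1;
        row.totalSpent += event.total;
        break;
    }
  }

  get(customerId: string): DashboardRow | undefined {
    return this.rows.get(customerId);
  }
}

const dashboard = new CustomerDashboardReadModel();
const incoming: DomainEvent[] = [
  { type: "CustomerRenamed", customerId: "c1", name: "Ada" },
  { type: "OrderPlaced", orderId: "o1", customerId: "c1", total: 40 },
  { type: "OrderPlaced", orderId: "o2", customerId: "c1", total: 5 },
];
for (const event of incoming) {
  dashboard.apply(event);
}
const adaRow = dashboard.get("c1");
```

The dashboard query becomes a single lookup. Because delivery is at-least-once, a real projection would also deduplicate by event id before incrementing counters; this sketch skips that for brevity.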

Event sourcing: keep the events, derive the state

The most advanced option flips storage on its head. Instead of saving the current state and overwriting it, you store the full sequence of events that led here — AccountOpened, MoneyDeposited, MoneyWithdrawn — and compute the balance by replaying them. The events become the source of truth; state is just a cached opinion of them.

The upside is real: a perfect audit log for free, the ability to ask "what was true last Tuesday?", and the freedom to build new read models from history. The costs are equally real — you must version events forever, snapshot for performance, and rethink how you delete data under privacy law. Most systems should not start here.
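The replay idea, including the "what was true last Tuesday?" question, fits in a short sketch. The event shapes and the `balanceAsOf` function are illustrative assumptions; the sketch also assumes events are stored in time order with ISO 8601 UTC timestamps, which compare correctly as strings.

```typescript
type AccountEvent =
  | { type: "AccountOpened"; at: string }
  | { type: "MoneyDeposited"; at: string; amount: number }
  | { type: "MoneyWithdrawn"; at: string; amount: number };

// State is never stored; it is folded from the events up to a point in time.
function balanceAsOf(events: AccountEvent[], asOf: string): number {
  let balance = 0;
  for (const event of events) {
    if (event.at > asOf) break; // assumes events are ordered by timestamp
    switch (event.type) {
      case "AccountOpened":
        balance = 0;
        break;
      case "MoneyDeposited":
        balance += event.amount;
        break;
      case "MoneyWithdrawn":
        balance -= event.amount;
        break;
    }
  }
  return balance;
}

const history: AccountEvent[] = [
  { type: "AccountOpened", at: "2026-03-01T09:00:00Z" },
  { type: "MoneyDeposited", at: "2026-03-02T10:00:00Z", amount: 100 },
  { type: "MoneyWithdrawn", at: "2026-03-05T12:00:00Z", amount: 30 },
  { type: "MoneyDeposited", at: "2026-03-09T08:00:00Z", amount: 50 },
];

const balanceNow = balanceAsOf(history, "9999-12-31T23:59:59Z"); // 120
const balanceOnMarch6 = balanceAsOf(history, "2026-03-06T00:00:00Z"); // 70
```

Replaying from the first event on every read is what snapshots exist to avoid: a snapshot is a saved balance at a known event, so the fold only has to cover the events after it.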

A grounded recommendation

Event sourcing is a sharp tool for a few genuinely event-shaped domains — ledgers, audit-heavy workflows, anything where "how we got here" matters as much as "where we are." For everything else, an outbox plus a read model gives you most of the benefit at a fraction of the lifetime cost.

Choosing, without the hype

Problem you actually have	The honest answer
Save a row and tell others	Transactional outbox + idempotent consumers
One business action across services	Saga with compensating actions (prefer orchestrated)
A screen joining many services' data	A read model fed by events (the read half of CQRS)
Reads and writes scale very differently	Full CQRS — separate write and read stores
History and audit are first-class	Event sourcing — and accept its lifetime costs
It all fits in one service / one DB	A single ACID transaction. Do not distribute it.

The honest view by company size

Solo / early startup. One database, real transactions, no sagas. The single biggest data advantage you have is that everything can still be strongly consistent in one COMMIT. Do not give that up for an architecture diagram.
Growing scale-up. As you carve off your first few services, give each its own data and adopt the outbox the day you publish your first event. Introduce a saga only for the one or two workflows that genuinely cross services and money. Add read models when a screen starts fanning out into many calls.
Enterprise. Eventual consistency is the default and teams are fluent in it. The investment shifts to tooling: schema/version governance for events, saga monitoring, and read models as a first-class, owned part of the platform. Event sourcing appears in the few domains that earn it, not everywhere.

Key takeaways

A real split means private data. If services share tables, you built a distributed monolith — all the cost, none of the independence.
You trade the ACID transaction for eventual consistency. Ask the business "is it fine for this to be true a second later?" — and keep money inside one service's strong transaction.
The dual-write bug is real; the outbox fixes it. Commit the event and the row together, then relay — which is why consumers must be idempotent.
Sagas replace rollback with compensation. Model failure as a real-world reversal (a refund), not a technical undo. Prefer an orchestrator for anything touching money.
CQRS and event sourcing are sharp tools, not defaults. Reach for a read model to solve a real query problem; reach for event sourcing only when history itself is the product.

You can now split services, let them talk through events, and keep their data honest. There is one promise left unkept — that the system stays standing when, not if, the network drops a message, a service stalls, or a dependency has a bad day. Paying that "distributed-systems tax" with timeouts, retries, circuit breakers, and idempotency is the final part of this series.
