Nguyen Le Phong

Software Architecture Fundamentals, Part 7 of 9

Events, Queues, and the Art of Not Calling Each Other: Event-Driven Architecture, Explained

When services hold hands through synchronous calls, latency adds up and one slow dependency takes down checkout. A no-hype guide to event-driven architecture: sync vs async, commands vs events, what brokers really promise (at-least-once, not exactly-once), choreography vs orchestration, and exactly when NOT to reach for events.

You have felt this before. A POST /checkout that should take 200 ms is now timing out, and when you open the trace you see why: the order service calls payment, which calls inventory, which calls the email service, which calls a third party that is having a slow afternoon. Four services holding hands, and the customer is staring at a spinner because the last one in the chain is tired.

The first six parts of this series were about structure: how to draw boundaries inside a codebase and between services. This part is about the next problem that shows up the moment you have more than one service — how they talk to each other. Event-driven architecture is the most misunderstood answer to that question: loved as a silver bullet, feared as needless complexity, and rarely explained in plain words. Let us fix that.

The problem: services that hold hands

When one service calls another and waits for the reply, we call it synchronous. It is the natural first instinct, because it reads like a normal function call. The trouble is what happens when you chain them.

  • Latency adds up. If each hop takes 80 ms, a four-hop chain is 320 ms before anyone does real work.
  • Failure spreads. If the email service is down, checkout fails — even though emailing a receipt is not the point of buying something.
  • Coupling hardens. The order service has to know about payment, inventory, and email. Add a fifth reaction and you edit the order service again.

The deep issue is that the caller is doing two different jobs at once: completing the order and orchestrating everyone who cares about the order. Those are not the same job, and bolting them together is what makes the spinner spin.

Two ways for services to talk

There are exactly two shapes of conversation, and most architecture arguments are really arguments about which one to use where.

| | Synchronous (request/response) | Asynchronous (messaging) |
|---|---|---|
| Mental model | A phone call: both sides on the line at once | A letter in the post: sent now, read later |
| Coupling | Caller knows the callee and waits for it | Sender knows only the broker, not the readers |
| Failure | Callee down → caller fails now | Reader down → message waits in the queue |
| Latency | Sum of every hop | Sender returns immediately; work happens in the background |
| Best for | "I need the answer to continue" (read a balance, validate a token) | "This happened; whoever cares can react" (order placed, file uploaded) |

Notice the last row. Synchronous is the right tool when you genuinely need the answer to keep going. Asynchronous is the right tool when you are announcing that something happened and other people's reactions are their business, not yours. Event-driven architecture is what you get when you lean into that second shape on purpose.

Here is the same checkout, before and after. The synchronous version makes the customer wait for everyone:

// Before: checkout owns — and waits on — everybody downstream
async function checkout(cart) {
  const order = await orders.create(cart)
  await payment.charge(order)        // slow
  await inventory.reserve(order)      // slow
  await email.sendReceipt(order)      // slow, and not the point
  return order                        // customer waited for all four
}

The event-driven version finishes the order and announces it. Whoever cares reacts on their own time:

// After: checkout completes, then publishes a fact and returns
async function checkout(cart) {
  const order = await orders.create(cart)
  await broker.publish("orders", { type: "OrderPlaced", order })
  return order                        // fast — payment, inventory, email react later
}

The customer no longer pays the email service's slow afternoon. Just as importantly, adding fraud scoring tomorrow means writing a new subscriber, not editing checkout again.

The vocabulary, in plain words

The jargon is small once you strip the mystique off it.

  • Message — any blob of data one service hands to another through a middleman.
  • Command — a message that asks for something to happen: ChargeCard. It has one intended handler and can be refused.
  • Event — a message that announces something already happened: OrderPlaced. It is a fact in the past tense, addressed to no one in particular.
  • Broker — the middleman that holds messages and delivers them: RabbitMQ, Kafka, NATS, AWS SNS/SQS, Google Pub/Sub.
  • Topic / queue — the named mailbox a message lands in. Producers write to it; consumers read from it.
  • Producer / consumer — the one who publishes, and the one who subscribes and reacts.

The single most important distinction is command vs. event, because it decides who is in charge. A command is "you, do this." An event is "this happened; do whatever you think is right." Confuse the two and you build a system that looks event-driven but is secretly a pile of remote-control buttons.
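To make the difference concrete, here is a minimal sketch of the two message shapes. Every name in it (`handleChargeCard`, `cardLimitCents`, the payload fields) is hypothetical and chosen only for illustration. The point is that a command has one handler that may say no, while an event has any number of listeners and none of them replies.

```javascript
// Hypothetical message shapes, for illustration only.

// A command: imperative name, one intended handler, and it may be refused.
const chargeCard = { type: "ChargeCard", orderId: 1234, amountCents: 4200 }

// An event: past-tense name, no addressee, nothing to refuse.
const orderPlaced = { type: "OrderPlaced", orderId: 1234, amountCents: 4200 }

// The single handler of the command decides whether to accept it.
function handleChargeCard(cmd, cardLimitCents) {
  if (cmd.amountCents > cardLimitCents) {
    return { accepted: false, reason: "limit exceeded" }
  }
  return { accepted: true }
}

// Any number of listeners may react to the event; none of them replies.
const listeners = [
  (evt) => `analytics recorded order ${evt.orderId}`,
  (evt) => `email queued for order ${evt.orderId}`,
]

const refused = handleChargeCard(chargeCard, 1000)
const reactions = listeners.map((react) => react(orderPlaced))
```

If you find yourself publishing an "event" and then waiting for one specific service to answer it, it is probably a command wearing an event's name.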

[Figure: Publish once, react many. The Order Service emits OrderPlaced to the message broker (topic: orders); Payment, Inventory, and Email / Analytics subscribe and react on their own time.]
The publisher does not know who listens. Adding a new reaction — fraud scoring, a loyalty update, a webhook — means adding a subscriber, not editing the order service.
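The same shape can be sketched with a toy in-memory stand-in for a broker. `createBroker` and the handlers are assumptions made up for this example, not a real client library, and a real broker would deliver over the network and asynchronously rather than in a loop.

```javascript
// A toy in-memory broker: NOT a real client library, just the shape.
function createBroker() {
  const subscribers = new Map() // topic -> array of handler functions
  return {
    subscribe(topic, handler) {
      if (!subscribers.has(topic)) subscribers.set(topic, [])
      subscribers.get(topic).push(handler)
    },
    publish(topic, message) {
      // The publisher never sees this list; it only knows the topic name.
      for (const handler of subscribers.get(topic) ?? []) handler(message)
    },
  }
}

const broker = createBroker()
const log = []

broker.subscribe("orders", (msg) => log.push(`payment: charge order ${msg.order.id}`))
broker.subscribe("orders", (msg) => log.push(`inventory: reserve order ${msg.order.id}`))
broker.subscribe("orders", (msg) => log.push(`email: receipt for order ${msg.order.id}`))

// Adding fraud scoring later is one more subscriber; checkout is untouched.
broker.subscribe("orders", (msg) => log.push(`fraud: score order ${msg.order.id}`))

function checkout(cart) {
  const order = { id: cart.id, items: cart.items }
  broker.publish("orders", { type: "OrderPlaced", order })
  return order
}

checkout({ id: 1234, items: ["book"] })
```

After the call, `log` holds four entries, one per subscriber, and `checkout` contains no reference to any of them.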

Three flavours of "event"

"We use events" can mean three quite different things. Knowing which one you mean saves a lot of arguments.

  1. Event notification. A thin ping: "order 1234 was placed." If a consumer needs details, it calls back to ask. Smallest payload, but it can create chatty call-backs.
  2. Event-carried state transfer. The event carries everything a consumer needs — the whole order — so no call-back is required. Heavier messages, but consumers stay independent. This is the workhorse of most event-driven systems.
  3. Event sourcing. The events are the source of truth; current state is rebuilt by replaying them. Powerful and rare: most teams do not need it, and we will treat it as an advanced option in the next part.

A simple default

Start with event-carried state transfer for the handful of facts other teams care about, and plain synchronous calls for everything that needs an immediate answer. You can live a long, happy life on just those two before you ever reach for event sourcing.
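The first two flavours differ mainly in what the payload carries. The sketch below contrasts them; the field names, the `orderDb` map, and the `fetchOrder` helper (standing in for a synchronous call back to the order service) are all hypothetical.

```javascript
// Hypothetical payloads and helpers, for illustration only.
const orderDb = new Map([[1234, { id: 1234, customerId: 77, totalCents: 4200 }]])
let callbacks = 0

// Stand-in for a synchronous call back to the order service.
function fetchOrder(id) {
  callbacks += 1
  return orderDb.get(id)
}

// 1. Event notification: a thin ping; the consumer must call back for details.
const notification = { type: "OrderPlaced", orderId: 1234 }

function receiptFromNotification(evt) {
  const order = fetchOrder(evt.orderId)
  return `Receipt for customer ${order.customerId}: ${order.totalCents} cents`
}

// 2. Event-carried state transfer: the event carries what consumers need.
const stateTransfer = {
  type: "OrderPlaced",
  order: { id: 1234, customerId: 77, totalCents: 4200 },
}

function receiptFromState(evt) {
  return `Receipt for customer ${evt.order.customerId}: ${evt.order.totalCents} cents`
}

const viaNotification = receiptFromNotification(notification) // one call-back made
const viaState = receiptFromState(stateTransfer)              // no call-back needed
```

Both consumers produce the same receipt, but only the notification consumer depends on the order service being reachable at the moment it reacts.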

What the broker actually promises

Here is where good intentions meet physics. Brokers do not promise the clean "exactly once, in order" world you imagine. They promise something more honest and more awkward.

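  • At-least-once delivery is the norm. The broker may deliver the same message twice: after a network blip, a consumer crash, or a missed acknowledgement. End-to-end "exactly once" is mostly marketing, because no broker can stop your handler's side effects (a charge, an email) from running again on redelivery. What real systems do is at-least-once delivery plus idempotent consumers.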
  • Ordering is limited. Global ordering across a topic is expensive; most brokers only guarantee order within a partition or key. Design so that out-of-order arrivals are survivable.
  • Failed messages need a home. A message that keeps failing should land in a dead-letter queue rather than blocking the line forever.

The practical consequence is one rule worth tattooing on the team: every consumer must be safe to run twice. That property is called idempotency, and it is the difference between a robust system and one that emails the customer three receipts.

// Not idempotent: a duplicate delivery double-charges and double-emails
async function onOrderPlaced(evt) {
  await payment.charge(evt.amount)        // runs again on redelivery
  await email.sendReceipt(evt.customerId)
}

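// Idempotent: the event id is the guard. Safe to run twice.
async function onOrderPlaced(evt) {
  if (await seen.has(evt.id)) return   // already handled: do nothing
  // A crash after charge() but before seen.add() still causes a redelivery,
  // so the event id is also passed downstream as an idempotency key
  // (assumed here to be supported by payment and email).
  await payment.charge(evt.amount, { idempotencyKey: evt.id })
  await email.sendReceipt(evt.customerId, { idempotencyKey: evt.id })
  await seen.add(evt.id)
}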
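The ordering caveat from the list above can be handled in a similar spirit. One common approach, sketched here under the assumption that each event carries a per-order `version` number, is to have the consumer ignore anything that is not newer than what it has already applied. The `version` field and `applyOrderStatus` are hypothetical names.

```javascript
// Hypothetical consumer state: latest known status per order id.
const orderStatus = new Map() // orderId -> { version, status }

// Apply an event only if it is newer than what we already hold.
function applyOrderStatus(evt) {
  const current = orderStatus.get(evt.orderId)
  if (current && current.version >= evt.version) return false // stale or duplicate
  orderStatus.set(evt.orderId, { version: evt.version, status: evt.status })
  return true
}

// "shipped" (version 3) arrives before "paid" (version 2), and then arrives again.
const arrivals = [
  { orderId: 1234, version: 1, status: "placed" },
  { orderId: 1234, version: 3, status: "shipped" },
  { orderId: 1234, version: 2, status: "paid" },
  { orderId: 1234, version: 3, status: "shipped" },
]

const applied = arrivals.map((evt) => applyOrderStatus(evt))
```

The late "paid" event and the duplicate "shipped" event are both dropped, and the order ends up as "shipped". This suits data where the latest state wins; a flow in which every intermediate step must be processed needs a different strategy.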

Who is in charge: choreography vs. orchestration

Once work spans several services, something has to coordinate it. There are two philosophies, and the choice shapes how you will debug at 2 a.m.

  • Choreography. No conductor. Each service listens for events and reacts, emitting its own events in turn. Order emits OrderPlaced; payment hears it and emits PaymentTaken; shipping hears that. Beautifully decoupled — and genuinely hard to follow, because the "flow" lives in no single place.
  • Orchestration. A coordinator (an "order saga") explicitly tells each service what to do next and tracks progress. Easier to see and to reason about; the trade-off is that the coordinator becomes a thing you must own and keep simple.

A useful rule of thumb

Use choreography for loose "fan-out" reactions where no one needs to know the whole story (analytics, notifications, search indexing). Use orchestration for a real business transaction with steps that must succeed or be undone together — which is exactly the saga problem we unpack in the next part on data.
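As a rough sketch of what "a coordinator that tracks progress" means, the function below runs steps in order and, if one fails, undoes the completed ones in reverse. `runOrderSaga` and the step objects are hypothetical; in a real system each `run` and `undo` would be a command sent to another service, and progress would be persisted rather than held in a local array.

```javascript
// Hypothetical orchestrator: each step has an action and an undo.
function runOrderSaga(order, steps) {
  const completed = [] // progress tracking lives in one place
  for (const step of steps) {
    try {
      step.run(order)
      completed.push(step)
    } catch (err) {
      // Undo finished steps in reverse order, then report the failure.
      for (const done of completed.reverse()) done.undo(order)
      return { ok: false, failedAt: step.name, reason: err.message }
    }
  }
  return { ok: true, completed: completed.map((s) => s.name) }
}

const trail = []
const steps = [
  { name: "charge", run: () => trail.push("charged"), undo: () => trail.push("refunded") },
  { name: "reserve", run: () => trail.push("reserved"), undo: () => trail.push("released") },
  { name: "ship", run: () => { throw new Error("no courier available") }, undo: () => {} },
]

const result = runOrderSaga({ id: 1234 }, steps)
```

Here the "ship" step fails, so the stock is released and the charge refunded, and the whole flow can be read top to bottom in one function. That readability is what orchestration buys you.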

Where event-driven architecture earns its keep

  • Fan-out. One thing happens and many unrelated reactions follow. Adding the seventh reaction should not mean editing the thing that happened.
  • Spiky or slow work. Video encoding, PDF generation, sending a million emails — push it onto a queue and let workers chew through it without making the user wait.
  • Buffering load. A queue absorbs a traffic spike that would otherwise knock over a synchronous downstream service.
  • Decoupling teams. The team that owns "orders" can publish facts without learning about every team that consumes them.
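The buffering point is easy to see in a toy simulation. The `simulate` function below is an illustrative assumption, not a model of any particular broker: requests arrive per tick, a worker drains at a fixed rate, and whatever does not fit simply waits in the queue instead of failing.

```javascript
// Toy simulation: a queue absorbs a burst while a worker drains at a fixed rate.
// capacityPerTick must be at least 1, or the backlog would never drain.
function simulate(arrivalsPerTick, capacityPerTick) {
  const queue = []
  const depthOverTime = []
  let nextId = 1
  let processed = 0
  let tick = 0
  while (tick < arrivalsPerTick.length || queue.length > 0) {
    const arriving = arrivalsPerTick[tick] ?? 0
    for (let i = 0; i < arriving; i++) queue.push({ id: nextId++ })
    // The worker takes at most `capacityPerTick` jobs; the rest simply wait.
    processed += queue.splice(0, capacityPerTick).length
    depthOverTime.push(queue.length)
    tick += 1
  }
  return { processed, depthOverTime }
}

// A spike of 10 requests in one tick, against a worker that handles 3 per tick.
const outcome = simulate([10, 0, 0], 3)
```

All 10 requests are eventually processed, and the queue depth falls 7, 4, 1, 0 across four ticks. A synchronous downstream with the same capacity would have had to reject or time out seven of them.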

When NOT to reach for events

Events are not free, and reaching for them too early is one of the most common ways to manufacture accidental complexity.

  • You need an immediate answer. "Is this coupon valid?" is a question, not an announcement. Just call the service.
  • It is a two-service app. A broker, a schema registry, and a dead-letter queue are a lot of machinery to avoid a direct call between your only two services.
  • The team has never operated a broker. Async failures are invisible by default. Without tracing and good dashboards, a stuck queue is a silent outage you discover from angry customers.
  • Strong consistency is mandatory. If the business cannot tolerate "true in a second or two," eventual consistency will hurt (see the next part).

The "everything is an event" trap

The failure mode is turning every function call into an event and ending up with a system whose logic is scattered across twenty subscribers and impossible to trace. Asynchrony is a tool for decoupling things that are genuinely independent — not a default for everything. If you cannot explain a flow without a whiteboard and four colours, you have over-applied it.

The honest view by company size

  • Solo / early startup. You almost certainly do not need a broker yet. A background job table in your existing database (poll a jobs table, mark rows done) covers the large majority of "do this later" needs with none of the operational weight.
  • Growing scale-up. Introduce a managed broker (SQS, Pub/Sub, a hosted Kafka) for the two or three genuine fan-out points: order events, file processing, notifications. Keep synchronous calls for everything that needs an answer. Invest in tracing before you invest in more topics.
  • Enterprise. Events become the backbone between teams, with schema governance, dead-letter handling, replay tooling, and explicit ownership of each topic. The hard part is no longer technology; it is contracts — agreeing on what an event means and never breaking it casually.
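For the solo or early-startup case, the "jobs table" idea can be sketched as follows. An in-memory array stands in for the database table, and the column names (`kind`, `status`) and the `pollOnce` function are assumptions. A real version would select and update rows inside a transaction so that two workers cannot pick up the same job.

```javascript
// In-memory stand-in for a `jobs` table in your existing database.
const jobs = [
  { id: 1, kind: "send_receipt", payload: { orderId: 1234 }, status: "pending" },
  { id: 2, kind: "send_receipt", payload: { orderId: 1235 }, status: "pending" },
]

// One handler per job kind (hypothetical).
const handlers = {
  send_receipt: (payload) => `receipt sent for order ${payload.orderId}`,
}

// One polling pass: pick up pending rows, run them, mark them done or failed.
function pollOnce(table, batchSize = 10) {
  const results = []
  const batch = table.filter((job) => job.status === "pending").slice(0, batchSize)
  for (const job of batch) {
    try {
      results.push(handlers[job.kind](job.payload))
      job.status = "done"
    } catch (err) {
      job.status = "failed"
      job.error = err.message
    }
  }
  return results
}

const firstPass = pollOnce(jobs)  // runs both pending jobs
const secondPass = pollOnce(jobs) // nothing left to do
```

In practice `pollOnce` would run on a timer or a cron schedule. The request handler only inserts a row and returns, which gives you "do this later" without a new piece of infrastructure.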

How to start on Monday

  1. Find one place where a caller is waiting on work the user does not need to wait for — sending email, generating a report, syncing to a third party. That is your first event.
  2. Publish a past-tense fact (OrderPlaced), not a command. Let the email service subscribe instead of being called.
  3. Make the consumer idempotent from day one, keyed on the event id. Assume it will be delivered twice.
  4. Add a dead-letter queue and an alert on it before you go live. A growing dead-letter queue is your earliest warning sign.
  5. Add tracing so a single request can be followed across the hop. Async without observability is debugging blindfolded.
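Step 4 can be sketched as a small wrapper around the consumer. Many managed brokers can route repeatedly failing messages to a dead-letter queue through configuration alone, so treat this as an illustration of the behaviour rather than code you must write. `MAX_ATTEMPTS`, `consume`, `deadLetters`, and `shouldAlert` are hypothetical names.

```javascript
// Hypothetical consumer wrapper: retry a few times, then dead-letter.
const MAX_ATTEMPTS = 3
const deadLetters = []

function consume(message, handler) {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      handler(message)
      return { ok: true, attempts: attempt }
    } catch (err) {
      if (attempt === MAX_ATTEMPTS) {
        // Park the message with enough context to debug and replay it later.
        deadLetters.push({ message, error: err.message, attempts: attempt })
      }
    }
  }
  return { ok: false, attempts: MAX_ATTEMPTS }
}

// Alert when the dead-letter queue grows past a threshold.
function shouldAlert(threshold = 0) {
  return deadLetters.length > threshold
}

let calls = 0
const flaky = () => { calls += 1; if (calls < 2) throw new Error("timeout") }
const poison = () => { throw new Error("malformed payload") }

const recovered = consume({ id: "evt-1" }, flaky) // succeeds on the second attempt
const parked = consume({ id: "evt-2" }, poison)   // lands in deadLetters
```

The flaky handler recovers on its own, while the poison message is set aside after three attempts so it cannot block the messages behind it. The alert check is what turns that silent parking into something a human sees.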

Key takeaways

  • Sync is a phone call; async is a letter. Use synchronous calls when you need the answer to continue, and messaging when you are announcing that something happened.
  • Commands ask; events announce. The command/event distinction decides who is in charge — get it right before anything else.
  • Brokers promise at-least-once, not exactly-once. Therefore every consumer must be idempotent and safe to run twice. This is non-negotiable.
  • Choreography decouples; orchestration clarifies. Fan-out reactions love choreography; real multi-step transactions want an orchestrator — the saga of the next part.
  • Events are for genuine independence, not for everything. Reaching for a broker in a two-service app, or turning every call into an event, manufactures the very complexity you were trying to avoid.

Splitting the work was the easy half. The moment services stop sharing a database, a harder question arrives: who owns the data, and what does "true" even mean when the truth is spread across five services? That is the next part of this series.
