API Gateway Patterns

A practical look at API gateway responsibilities: routing, authentication boundaries, rate limiting, request aggregation, backend-for-frontend trade-offs, failure modes, and the observability needed to keep the edge of a system understandable.

By Nguyen Le Phong | April 25, 2026 | 6 min read

Tags: Software Architecture, API Gateway, Backend for Frontend, Rate Limiting, System Design, Observability

The first sign is usually small. A teammate opens the network tab during a morning debug session and sees the page making eleven calls before it can render one customer profile. One call asks for identity, another for billing, another for permissions, another for feature flags, and two more exist because an older mobile app still needs a different shape. Nothing is dramatic yet, but everyone can feel the edge of the system getting noisy.

An API gateway is often introduced at that moment. It gives clients one front door instead of asking every browser, mobile app, partner integration, and internal tool to know the map of all backend services. Used carefully, it makes the outside of the system calmer. Used carelessly, it becomes a second application where routing, security, business logic, and emergency fixes gather until no one is sure what actually lives behind the front door.

A gateway often arrives when the outside of the system starts feeling busier than any one client should have to understand.

The plain responsibility of a gateway is to handle cross-cutting work at the edge. It routes requests to the right service. It enforces authentication and often starts authorization checks. It applies rate limits, request size limits, and sometimes IP or tenant-level rules. It can normalize headers, attach correlation IDs, terminate TLS, and give every backend a consistent way to learn who is calling. These are useful jobs to centralize because every service would otherwise repeat them, and repeated implementations quickly drift apart.

Routing is the first pattern most teams notice. A gateway can send /api/orders to the order service, /api/payments to the payment service, and /api/profile to the identity service without exposing the internal service names to clients. This is not only a convenience. It lets the team move a service, split a route, or run a migration behind the gateway while keeping the public contract stable. The quiet discipline is to keep routing rules boring and visible. If a routing table starts hiding product decisions, the gateway has stopped being a map and started becoming a maze.
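A boring, visible routing table can be as small as a dictionary. The sketch below assumes longest-prefix matching on the public path; the upstream hostnames and the `resolve` helper are hypothetical, and only the path prefixes come from the example above.

```python
from typing import Optional

# Public path prefix -> internal upstream. Hostnames are invented for illustration.
ROUTES = {
    "/api/orders": "http://orders.internal:8080",
    "/api/payments": "http://payments.internal:8080",
    "/api/profile": "http://identity.internal:8080",
}


def resolve(path: str) -> Optional[str]:
    """Return the upstream for the longest matching prefix, or None if no route matches."""
    matches = [
        prefix for prefix in ROUTES
        # Match the prefix exactly or on a path-segment boundary, so
        # "/api/ordersX" does not accidentally reach the order service.
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    return ROUTES[max(matches, key=len)]
```

Because the table is plain data, moving a service behind the gateway is a one-line change that a reviewer can see, and the public path stays the same.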

Image: Two backend teammates map request flow around an API gateway on a whiteboard while sketching how routes reach downstream services. Caption: Routing stays useful when the gateway keeps the map visible instead of hiding product decisions inside edge rules.

Authentication is the next boundary to treat with care. The gateway is a good place to verify a token, reject expired credentials, and attach a trusted identity context before the request reaches a service. But it should not be the only place where important authorization lives. A gateway can say, "this caller is Nguyen, tenant A, with these claims." The order service still needs to decide whether that caller can read this specific order. Keeping that split prevents a dangerous comfort where the inner services quietly assume every request from the gateway is allowed.
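The split between the two checks might look like the sketch below. It is an illustration under stated assumptions, not a production scheme: the HMAC-signed token is a standard-library stand-in for whatever token format a real gateway verifies (often JWTs), and `SECRET`, `sign`, `ORDERS`, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"  # hypothetical shared key, for illustration only


def sign(claims: dict) -> str:
    """Issue a demo token: base64(payload) + '.' + hex HMAC of the payload."""
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def gateway_authenticate(token: str, now: float) -> Optional[dict]:
    """Edge check: is the token authentic and unexpired? Returns an identity context."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] <= now:
        return None
    return {"user": claims["sub"], "tenant": claims["tenant"]}


ORDERS = {"order-1": {"tenant": "A", "owner": "nguyen"}}  # hypothetical order store


def order_service_can_read(identity: dict, order_id: str) -> bool:
    """Service check: may this specific caller read this specific order?"""
    order = ORDERS.get(order_id)
    return (
        order is not None
        and order["tenant"] == identity["tenant"]
        and order["owner"] == identity["user"]
    )
```

The gateway function never looks at an order, and the service function never looks at a signature. A second user with a perfectly valid token still fails the service check for an order they do not own.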

Rate limiting is another edge responsibility that looks simple until it meets real traffic. A useful limit protects the system from accidental loops, scraping, partner bugs, and sudden spikes without punishing normal users. The unit matters: per IP, per user, per tenant, per API key, or per route. A public search endpoint, a payment callback, and an admin export do not deserve the same rule. Good limits also explain themselves with clear status codes (typically HTTP 429 Too Many Requests), retry hints such as a Retry-After header, and logs that tell the operator which key crossed the line. A silent throttle is just a slower outage.
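One common way to implement such a limit is a token bucket per key. The sketch below is a minimal in-memory version, assuming a single gateway process; the `RateLimiter` class, its parameters, and the example key are hypothetical, and a real deployment would usually keep the buckets in a shared store.

```python
import math
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    updated: float


class RateLimiter:
    """Token bucket per key; the choice of key (tenant, route, API key) is the policy."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets: dict = {}

    def check(self, key: tuple, now: float) -> tuple:
        """Return (status_code, headers) for one request arriving at time `now`."""
        bucket = self.buckets.setdefault(key, Bucket(self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.refill_per_sec)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return 200, {}
        # Tell the caller how many whole seconds until one token is available.
        wait = math.ceil((1 - bucket.tokens) / self.refill_per_sec)
        return 429, {"Retry-After": str(wait)}
```

Keying on something like `("tenant-a", "/api/search")` lets a search endpoint and an admin export carry different limits by using separate limiter instances, and the rejected response carries its own retry hint instead of failing silently.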

Some gateways also aggregate requests. Instead of making a mobile app call five services to draw a dashboard, the gateway calls those services and returns one view-shaped response. This can remove network chatter and make clients simpler, especially on mobile connections. The trade-off is ownership. The gateway now knows the shape of a product screen, and when that screen changes, the gateway changes too. That is why request aggregation should usually stay close to a client experience, not become a general place for business rules that belong to the domain services.
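As a rough illustration of aggregation, the sketch below fans out to three downstream calls concurrently and returns one view-shaped response. The `fetch_*` functions are stand-ins that return canned data; in a real gateway they would be HTTP or RPC calls, and all names here are hypothetical.

```python
import asyncio


# Stand-ins for downstream service calls; a real gateway would do network I/O here.
async def fetch_profile(user_id: str) -> dict:
    return {"name": "Nguyen"}


async def fetch_billing(user_id: str) -> dict:
    return {"plan": "pro"}


async def fetch_flags(user_id: str) -> dict:
    return {"new_dashboard": True}


async def dashboard_view(user_id: str) -> dict:
    """Fan out concurrently, then compose one response shaped for the dashboard screen."""
    profile, billing, flags = await asyncio.gather(
        fetch_profile(user_id),
        fetch_billing(user_id),
        fetch_flags(user_id),
    )
    return {"user_id": user_id, "profile": profile, "billing": billing, "flags": flags}
```

Note what the function does not do: it composes results but computes nothing about plans or permissions. Once it starts deciding what a plan means, it has taken on a business rule that belongs to a domain service.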

This is where the backend-for-frontend pattern earns a calm discussion. A BFF is a gateway tailored to one frontend or one family of clients: web, iOS, Android, partner API, internal admin. It can give each client the response shape it needs without forcing every backend service to serve every presentation concern. The cost is more surfaces to own. A web BFF and a mobile BFF can drift in behavior if no one watches the contracts. The pattern is worth it when client needs are genuinely different. It is less useful when it only gives the team another layer to patch because changing the service feels slow.
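A small sketch of the idea, assuming a hypothetical order record and two hypothetical view functions: each BFF shapes the same domain data for its own client without asking the order service to know about either screen.

```python
ORDER = {  # hypothetical domain record as returned by an order service
    "id": "order-1",
    "status": "shipped",
    "total_cents": 4599,
    "currency": "USD",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}


def web_order_view(order: dict) -> dict:
    """Web BFF: the desktop page renders the full line-item table."""
    cents = order["total_cents"]
    return {
        "id": order["id"],
        "status": order["status"],
        "total": f'{cents // 100}.{cents % 100:02d} {order["currency"]}',
        "items": order["items"],
    }


def mobile_order_view(order: dict) -> dict:
    """Mobile BFF: a list row needs only a summary, which keeps the payload small."""
    return {
        "id": order["id"],
        "status": order["status"],
        "item_count": sum(item["qty"] for item in order["items"]),
    }
```

The two functions share the `id` and `status` fields, and that shared part of the contract is exactly where drift tends to appear if nobody watches both BFFs.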

The biggest failure mode is letting the gateway become the new monolith. It starts with harmless glue: one header rewrite, one fallback, one temporary transformation. Six months later, pricing rules, permission exceptions, data joins, A/B logic, and partner-specific behavior all live in the gateway because it was the fastest place to make a change. The system may still look like microservices from the outside, but the real product logic has collected in the edge layer. A helpful rule is simple: the gateway may coordinate and protect the edge; it should not become the owner of domain truth.

Another failure mode is forgetting that the gateway is now on every critical path. If it is down, every service can be healthy and the product still feels down. Timeouts, circuit breakers, graceful degradation, and safe defaults matter here. A gateway that aggregates five downstream calls needs a clear answer for partial failure: should it return a partial profile, hide one widget, serve cached data, or fail the whole request? That decision should be made deliberately, not discovered during an incident while logs are still warming up.
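One way to make that decision explicit is to write it into the aggregation code. The sketch below assumes a per-call timeout budget and one possible policy: the profile is required, billing is optional. The downstream functions are simulated, and every name and threshold is hypothetical.

```python
import asyncio


async def call_with_fallback(coro, timeout: float, fallback):
    """Await one downstream call; on timeout or connection error, return (fallback, False)."""
    try:
        result = await asyncio.wait_for(coro, timeout)
        return result, True
    except (asyncio.TimeoutError, ConnectionError):
        return fallback, False


async def fast_profile() -> dict:
    return {"name": "Nguyen"}


async def slow_billing() -> dict:
    await asyncio.sleep(1.0)  # simulates a backend that exceeds the timeout budget
    return {"plan": "pro"}


async def profile_page() -> dict:
    (profile, profile_ok), (billing, billing_ok) = await asyncio.gather(
        call_with_fallback(fast_profile(), 0.05, None),
        call_with_fallback(slow_billing(), 0.05, None),
    )
    if not profile_ok:
        # Deliberate choice: the page is useless without the profile, so fail whole.
        return {"status": 502, "body": None, "degraded": ["profile"]}
    # Deliberate choice: billing is a widget the client can hide, so degrade instead.
    degraded = [] if billing_ok else ["billing"]
    return {
        "status": 200,
        "body": {"profile": profile, "billing": billing},
        "degraded": degraded,
    }
```

The `degraded` list gives the client something concrete to act on, such as hiding the billing widget, and gives the operator a field to count during an incident.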

Observability is what keeps the gateway honest. Every incoming request should receive a correlation or trace ID that follows it into downstream services. Logs should include the route, caller type, tenant or API key when safe, status code, latency, selected backend, rate-limit decision, and timeout reason. Metrics should show traffic by route, error rate, p95 latency, throttled requests, upstream failures, and cache hit rate if caching exists. Traces should make it obvious whether the gateway itself is slow or whether it is waiting on a backend. Without that visibility, the gateway becomes the place everyone blames and nobody can prove.
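A minimal sketch of the logging half, assuming JSON log lines and an `X-Correlation-ID` header. The header name is a common convention rather than a standard, the field names are hypothetical, and header lookup is case-sensitive here for brevity.

```python
import json
import uuid
from typing import Optional


def ensure_correlation_id(headers: dict) -> dict:
    """Reuse the caller's ID if present, otherwise mint one; return headers to forward."""
    forwarded = dict(headers)
    forwarded.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    return forwarded


def access_log_line(
    headers: dict,
    route: str,
    tenant: str,
    status: int,
    latency_ms: float,
    upstream: str,
    rate_limited: bool,
    timeout_reason: Optional[str] = None,
) -> str:
    """Build one structured log line carrying the fields an operator will search by."""
    return json.dumps(
        {
            "correlation_id": headers["X-Correlation-ID"],
            "route": route,
            "tenant": tenant,
            "status": status,
            "latency_ms": latency_ms,
            "upstream": upstream,
            "rate_limited": rate_limited,
            "timeout_reason": timeout_reason,
        },
        sort_keys=True,
    )
```

Forwarding the same headers to the backend means a single ID ties the gateway's log line to every downstream log line for that request, which is what makes "gateway slow or backend slow?" answerable.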

Image: An on-call engineer and teammate review gateway dashboards with latency charts and trace flows during a partial failure. Caption: Once the gateway sits on every critical path, traces and rate-limit logs are what keep the edge understandable under pressure.

A good API gateway is not exciting. It is more like a well-run reception desk in a busy building: it checks identity, points people to the right room, keeps the queue from overwhelming the staff, and leaves enough notes that someone can understand what happened later. The healthier question is not "can we put this in the gateway?" but "does putting it at the edge make the system clearer, safer, and easier to operate?" If the answer is no, the work probably belongs closer to the service that owns the truth. If you have seen a gateway help a team, or slowly turn into the place every shortcut lands, that contrast is often where the best architecture lesson lives.
