Case study

Designing for Failure: What Happens When Your API Layer Goes Down

Adding resilience to a client app that depended on a flaky third-party API — circuit breakers, retries, and graceful degradation.

Node.js · TypeScript

Context

A side project I built depended on an external API for real-time data. The API was occasionally slow or returned errors, and when it failed, the app would hang or show blank screens, leaving users with no idea what was going on. I needed to design for the case where the API is down, without building a full replica of the external service. The goal: when the API fails, the app should degrade gracefully instead of crashing.

Constraints

  • No control over the external API — we couldn't fix their uptime
  • Solo project — the solution had to be simple enough to implement and maintain alone
  • Users needed some feedback — a blank screen was worse than a 'data temporarily unavailable' message

Architecture

I implemented a three-layer approach: retry with exponential backoff for transient failures, a circuit breaker to stop hammering the API when it's clearly down, and fallback responses when the circuit is open. The circuit breaker counts consecutive failures; after N in a row, it opens and fails fast for a cooldown period. During that period, the UI shows cached data (if we have it) or a clear 'service unavailable' state.

I used a simple in-memory circuit breaker (no Redis, no distributed state) because this was a single-instance app. The key was defining what 'graceful' meant: we could show stale data, we could show a message, but we would never leave the user staring at a spinner forever.
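A minimal sketch of the kind of single-instance, in-memory breaker described above. It is not the project's actual code: the names `CircuitBreaker`, `call`, and `CircuitOpenError`, the injectable clock, and the half-open trial step after the cooldown are assumptions added for illustration.

```typescript
// Hypothetical in-memory circuit breaker: single process, no shared or distributed state.
type BreakerState = "closed" | "open" | "half-open";

class CircuitOpenError extends Error {
  constructor() {
    super("Circuit is open: failing fast without calling the API");
    this.name = "CircuitOpenError";
  }
}

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so tests can drive the cooldown without real waiting.
  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // Reports the current state, moving open -> half-open once the cooldown has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") {
      throw new CircuitOpenError(); // fail fast; the caller renders its fallback
    }
    // Closed or half-open: let the request through and record the outcome.
    // Note: half-open here admits every concurrent caller, not a single probe.
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // A failed trial in half-open re-opens immediately; otherwise wait for N failures in a row.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}
```

The injectable clock also makes it cheap to try different threshold and cooldown values in tests rather than against the live API. A stricter variant would admit exactly one probe request while half-open.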

Alternatives considered

  • Queue all API calls and process async: Overkill for a side project. The user needed near-real-time data; queuing would have added latency and complexity we didn't need.
  • Just retry with a fixed delay: Retrying on a fixed schedule without a circuit breaker means we keep hitting an API that's already down, wasting resources and adding load while it's trying to recover. The circuit breaker stops the bleeding.
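For contrast with a fixed delay, here is a hedged sketch of retry with exponential backoff. It assumes 'full jitter' (a random delay between zero and an exponentially growing ceiling) so many clients don't retry in lockstep; the write-up doesn't say which backoff variant the project used, and the names `retryWithBackoff`, `backoffDelayMs`, and the `shouldRetry` hook are hypothetical.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Full jitter: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs: number,
  maxMs: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

interface RetryOptions {
  maxAttempts: number; // total attempts including the first; assumed >= 1
  baseMs: number;
  maxMs: number;
  shouldRetry?: (err: unknown) => boolean; // hypothetical hook; defaults to retrying everything
}

async function retryWithBackoff<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const retryable = opts.shouldRetry?.(err) ?? true;
      const isLastAttempt = attempt === opts.maxAttempts - 1;
      if (!retryable || isLastAttempt) break;
      // Wait longer after each failure so a struggling API gets room to recover.
      await sleep(backoffDelayMs(attempt, opts.baseMs, opts.maxMs));
    }
  }
  throw lastError;
}
```

Backoff alone still probes a dead API on every user request, which is why it pairs well with a circuit breaker. `shouldRetry` is a natural place to skip errors that won't go away on their own, such as a breaker that is failing fast or a 4xx response.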

Lessons learned

  • Define what 'graceful degradation' means before you build it. For us it was: show cached or fallback UI, never infinite loading.
  • Circuit breakers are simple in concept, tricky in practice. Tuning the threshold and cooldown took a few iterations.
  • Users prefer a clear 'something is wrong' message over a spinner that never resolves.
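One way to enforce 'never infinite loading' is to give every fetch a deadline and map each outcome onto a small, explicit set of UI states: fresh data, stale cached data, or a clear unavailable message. The sketch below assumes an in-memory `Map` cache and hypothetical names (`ViewState`, `withTimeout`, `loadWithFallback`); it illustrates the idea rather than reproducing the project's code.

```typescript
// Every load resolves to exactly one of these states, so the UI never waits indefinitely.
type ViewState<T> =
  | { kind: "fresh"; data: T }
  | { kind: "stale"; data: T; fetchedAt: number }
  | { kind: "unavailable"; message: string };

interface CacheEntry<T> {
  data: T;
  fetchedAt: number; // ms timestamp of the last successful fetch
}

class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Rejects if `promise` hasn't settled within `ms`. This stops the wait, not the request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

async function loadWithFallback<T>(
  key: string,
  fetcher: () => Promise<T>,
  cache: Map<string, CacheEntry<T>>,
  timeoutMs: number,
  now: () => number = () => Date.now(),
): Promise<ViewState<T>> {
  try {
    const data = await withTimeout(fetcher(), timeoutMs);
    cache.set(key, { data, fetchedAt: now() });
    return { kind: "fresh", data };
  } catch {
    // Any failure (HTTP error, timeout, or a breaker failing fast): try the cache, then a message.
    const cached = cache.get(key);
    if (cached) {
      return { kind: "stale", data: cached.data, fetchedAt: cached.fetchedAt };
    }
    return { kind: "unavailable", message: "Data temporarily unavailable. Please try again shortly." };
  }
}
```

`Promise.race` only stops waiting; the underlying request keeps running. If the data comes from `fetch`, passing an `AbortController` signal would cancel the request itself as well.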