Circuit Breaker: Preventing Cascading Failures
Circuit breakers are essential in preventing cascading failures in interconnected systems. Cascading failures happen when an issue in one service causes a domino effect, leading to failures in other dependent services, and ultimately a complete system outage.
Partial errors are tolerable, but a full-blown outage can cripple the system. Let's break it down with an example to understand the problem and its solution better.
The Problem: Cascading Failures
Imagine a social media application where users interact by consuming and posting content.
The Feed: A central feature that relies on two dependencies:
Recommendations
Trending topics
Each of these dependencies further relies on other services:
Recommendations depend on Profile and Post services.
Trending has its own set of dependencies, including Post DB and Profile DB.
This forms a chain of interdependencies. If the Profile DB starts slowing down, the slowdown cascades across the chain: every service that waits on it becomes slow in turn, which eventually slows down or breaks the entire system.
Over time, this cascading slowdown can snowball into a major outage.
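To make the chain concrete, here is a minimal sketch that models the example as a dependency graph and lists every service that would be affected if one dependency degrades. The Feed, Recommendations, and Trending edges come from the example above; the Profile to Profile DB and Post to Post DB edges, along with the function name affected_by, are assumptions made for illustration.

```python
from collections import deque

# Each key calls the services in its list.
# Edges marked "assumed" are not stated in the example above.
DEPENDS_ON = {
    "Feed": ["Recommendations", "Trending"],
    "Recommendations": ["Profile", "Post"],
    "Trending": ["Post DB", "Profile DB"],
    "Profile": ["Profile DB"],  # assumed
    "Post": ["Post DB"],        # assumed
}


def affected_by(failing):
    """Return every service that directly or transitively calls `failing`."""
    # Invert the graph: dependency -> list of services that call it.
    callers = {}
    for service, deps in DEPENDS_ON.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first walk from the failing service up through its callers.
    affected = set()
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


print(sorted(affected_by("Profile DB")))
# ['Feed', 'Profile', 'Recommendations', 'Trending']
```

Under these assumptions, a slowdown in a single database reaches every service above it, including the Feed.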
Why Does This Happen?
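Cascading failures occur when services transitively depend on others, so a problem in one dependency propagates to every caller. Typical triggers include:
A connection timeout in a TCP or HTTP call, which keeps the caller waiting.
A single slow or failing service, which impacts every service that depends on it.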
The Solution: Circuit Breaker
To prevent cascading failures, we can use a circuit breaker.
What Is a Circuit Breaker?
A circuit breaker monitors service health and blocks calls to failing or unhealthy services. If a service is unhealthy, the circuit breaker "trips," and calls to that service are avoided.
How Does It Work?
Before making a call to a service, the circuit breaker checks its health:
If the service is healthy: Proceed with the call.
If the service is unhealthy: Avoid the call to prevent cascading failures.
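The check described above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the class names CircuitBreaker and CircuitOpenError, the fetch_profile function, and the choice to treat unknown services as healthy are all assumptions.

```python
class CircuitOpenError(Exception):
    """Raised when a call is skipped because the target is marked unhealthy."""


class CircuitBreaker:
    def __init__(self):
        # Maps service name -> True (healthy) or False (unhealthy).
        self._healthy = {}

    def set_health(self, service, healthy):
        self._healthy[service] = healthy

    def is_healthy(self, service):
        # Assumption of this sketch: a service with no recorded status is healthy.
        return self._healthy.get(service, True)

    def call(self, service, func, *args, **kwargs):
        # Check health first; only invoke the target if the circuit is closed.
        if not self.is_healthy(service):
            raise CircuitOpenError(f"{service} is marked unhealthy; call skipped")
        return func(*args, **kwargs)


def fetch_profile(user_id):
    # Hypothetical remote call, replaced by a local stub here.
    return {"id": user_id}


breaker = CircuitBreaker()
print(breaker.call("profile", fetch_profile, 42))  # healthy: the call proceeds

breaker.set_health("profile", False)  # the circuit "trips"
try:
    breaker.call("profile", fetch_profile, 42)
except CircuitOpenError as err:
    print(err)  # the caller can fall back instead of waiting on a timeout
```

Raising an exception on a tripped circuit is one possible design; returning a default or cached value would work equally well, depending on what the caller can tolerate.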
Implementation of a Circuit Breaker
Here’s how it can be implemented in practice:
Centralized Configuration:
A common database stores the status of all services.
The configuration includes whether a service is healthy or not.
Local Caching:
Services cache the configuration locally to reduce the load on the central database.
Health Check Before Calls:
Before a service makes a call to another, it checks the cached configuration for the health status of the target service.
If the service is marked as unhealthy, the call is skipped.
Manual Overrides:
If a service is known to be down or experiencing issues, its status can be manually set to unhealthy.
This prevents unnecessary calls and reduces strain on the failing service.
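The four pieces above can be combined in a short sketch. Everything here is an assumption made for illustration: the central database is replaced by an in-memory dictionary, the cache is refreshed on a fixed time-to-live (the text does not say how the cache is refreshed), and the names CentralConfigStore, CachedHealthConfig, and call_if_healthy are hypothetical.

```python
import time


class CentralConfigStore:
    """Stand-in for the common database; an in-memory dict in this sketch."""

    def __init__(self):
        self._status = {}
        self.reads = 0  # counts how often the "database" is queried

    def set_status(self, service, healthy):
        # Also serves as the manual override: an operator marks a service unhealthy.
        self._status[service] = healthy

    def get_all(self):
        self.reads += 1
        return dict(self._status)


class CachedHealthConfig:
    """Local cache of the central configuration, refreshed after a TTL."""

    def __init__(self, store, ttl_seconds=5.0, clock=time.monotonic):
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}
        self._fetched_at = None

    def is_healthy(self, service):
        now = self._clock()
        # Reload from the central store only when the cache is empty or stale.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._cache = self._store.get_all()
            self._fetched_at = now
        # Assumption: services without an entry are treated as healthy.
        return self._cache.get(service, True)


def call_if_healthy(config, service, func, fallback=None):
    # Skip the call entirely when the target is marked unhealthy.
    if not config.is_healthy(service):
        return fallback
    return func()


store = CentralConfigStore()
now = [0.0]  # fake clock so the example is deterministic
config = CachedHealthConfig(store, ttl_seconds=5.0, clock=lambda: now[0])

print(call_if_healthy(config, "profile", lambda: "profile data"))

store.set_status("profile", False)  # manual override in the central store
now[0] = 6.0                        # the TTL has passed, so the cache reloads
print(call_if_healthy(config, "profile", lambda: "profile data", fallback="skipped"))
```

One consequence of this design is that a status change takes up to one TTL to reach each service, so the TTL trades load on the central database against how quickly a trip or an override takes effect.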
Why Use a Circuit Breaker?
Prevents Full Outages: By blocking calls to unhealthy services, you limit the impact of partial failures.
Minimizes Latency: Avoids waiting for timeouts on unhealthy services.
Manual Control: Gives you the ability to proactively manage service status.