Spring Cloud Circuit Breaker Resilience4j
A guide to handling partial failures in microservices
A distributed system comprises many services interacting to achieve business goals, and it is prone to failures along the chain of service dependencies.
Suppose service A calls service B, which calls service C, but C does not respond. Service C may be down or overloaded and take a long time to respond, producing errors that cascade up the chain and bring the whole system down.
In this article, we will explore how Spring Cloud Circuit Breaker can help you design services that prevent partial failures from cascading throughout systems.
Circuit Breaker Pattern
Chris Richardson, author of the book “Microservices Patterns”, proposes the following solution.
A service client should invoke a remote service via a proxy that functions in a similar fashion to an electrical circuit breaker. When the number of consecutive failures crosses a threshold, the circuit breaker trips, and for the duration of a timeout period all attempts to invoke the remote service will fail immediately.
After the timeout expires the circuit breaker allows a limited number of test requests to pass through. If those requests succeed the circuit breaker resumes normal operation. Otherwise, if there is a failure the timeout period begins again.
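To make the pattern concrete, here is a minimal, hand-rolled sketch of such a proxy. It is illustrative only (the class, its fields, and its thresholds are mine, not part of any library), and it deliberately ignores concerns such as error classification and metrics:

import java.util.function.Supplier;

// Illustrative sketch of the Circuit Breaker pattern, not production code.
public class SimpleCircuitBreaker {

    private enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures before tripping
    private final long openTimeoutMillis;  // how long to stay OPEN before testing again

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    public SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    public synchronized <T> T call(Supplier<T> remoteCall) {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt < openTimeoutMillis) {
                throw new IllegalStateException("Circuit is open, failing fast");
            }
            state = State.HALF_OPEN; // timeout expired: let a test request through
        }
        try {
            T result = remoteCall.get();
            consecutiveFailures = 0; // success: resume normal operation
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN; // trip the breaker and restart the timeout period
                openedAt = System.currentTimeMillis();
            }
            throw e;
        }
    }
}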
Resilience4j Circuit Breaker
Resilience4j provides higher-order functions (decorators) to enhance any functional interface, lambda expression or method reference. Circuit Breaker is implemented via a finite state machine with three states: CLOSED, OPEN and HALF_OPEN.
When the “circuit” is CLOSED, requests can reach the service; when it is OPEN, calls fail immediately. After a configured time in the OPEN state, the circuit moves to HALF_OPEN and lets a limited number of test calls through: if their failure rate stays below the threshold, it moves back to CLOSED; otherwise, it returns to OPEN.
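For illustration, the same state machine can be configured and exercised with the plain Resilience4j API. This is a standalone sketch; the instance name and thresholds simply mirror the configuration used later in this article:

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;

import java.time.Duration;
import java.util.function.Supplier;

public class CircuitBreakerStatesExample {

    public static void main(String[] args) {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                        // trip when at least 50% of calls fail
                .waitDurationInOpenState(Duration.ofSeconds(5))  // stay OPEN for 5s, then move to HALF_OPEN
                .permittedNumberOfCallsInHalfOpenState(3)        // test calls allowed in HALF_OPEN
                .minimumNumberOfCalls(5)
                .slidingWindowSize(10)
                .build();

        CircuitBreaker circuitBreaker = CircuitBreakerRegistry.of(config)
                .circuitBreaker("remittance-service");

        // Decorate any Supplier; the decorator records successes and failures and
        // rejects calls with CallNotPermittedException while the circuit is OPEN.
        Supplier<String> decorated = CircuitBreaker.decorateSupplier(
                circuitBreaker, () -> "response from the remote service");

        System.out.println(decorated.get());            // invokes the supplier through the breaker
        System.out.println(circuitBreaker.getState());  // CLOSED while calls keep succeeding
    }
}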
Spring Cloud Circuit Breaker Resilience4j
Scenario
In the following example, clients initiate money transfers while the Remittance Service is down, so the API Gateway ends up proxying these requests to an unhealthy Remittance Service.
A poor implementation could block the client’s call, tying up a thread indefinitely. Eventually, the API Gateway may run out of threads and become unable to handle any incoming requests. As a result, the entire API would become unavailable, causing a system outage.
Maven Dependencies
First, we add the Spring Cloud Starter CircuitBreaker Reactor Resilience4j, Spring Boot Starter WebFlux and Spring Boot Starter Actuator dependencies to our pom.xml.
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
Configuring Circuit Breaker, Retry and Time Limiter Instances
You can configure Resilience4j Circuit Breaker, Retry and Time Limiter instances in your application’s configuration properties file.
resilience4j:
  circuitbreaker:
    instances:
      remittance-service:
        failureRateThreshold: 50
        waitDurationInOpenState: 5s
        permittedNumberOfCallsInHalfOpenState: 3
        minimumNumberOfCalls: 5
        slidingWindowType: COUNT_BASED
        slidingWindowSize: 10
        eventConsumerBufferSize: 10
        registerHealthIndicator: true
  retry:
    instances:
      remittance-service:
        maxAttempts: 3
        waitDuration: 1s
  timelimiter:
    instances:
      remittance-service:
        timeoutDuration: 5s
        cancelRunningFuture: true
Here, we have created instances of remittance-service for the Circuit Breaker, Retry, and Time Limiter configuration in our application.yml file.
Moving on, let’s define CircuitBreaker, Retry, and TimeLimiter as beans.
@Configuration
class CircuitBreakerConfiguration {

    @Bean
    CircuitBreaker remittanceServiceCircuitBreaker(CircuitBreakerRegistry registry) {
        return registry.circuitBreaker("remittance-service");
    }

    @Bean
    Retry remittanceServiceRetry(RetryRegistry registry) {
        return registry.retry("remittance-service");
    }

    @Bean
    TimeLimiter remittanceServiceTimeLimiter(TimeLimiterRegistry registry) {
        return registry.timeLimiter("remittance-service");
    }
}
Here, our CircuitBreaker is obtained by calling CircuitBreakerRegistry.circuitBreaker() and passing remittance-service as the instance name. Similarly, we obtain the Retry through RetryRegistry.retry(), and the TimeLimiter by calling TimeLimiterRegistry.timeLimiter().
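As an optional aside, each of these Resilience4j instances exposes an event publisher, which is handy for watching state transitions while testing. A small sketch (the logging component below is my own addition, not part of the original example):

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
class CircuitBreakerEventLogger {

    private static final Logger log = LoggerFactory.getLogger(CircuitBreakerEventLogger.class);

    CircuitBreakerEventLogger(CircuitBreaker remittanceServiceCircuitBreaker) {
        // Log every state transition, e.g. CLOSED -> OPEN or OPEN -> HALF_OPEN
        remittanceServiceCircuitBreaker.getEventPublisher()
                .onStateTransition(event ->
                        log.info("remittance-service: {}", event.getStateTransition()));
    }
}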
Decorating Mono and Flux
Resilience4j allows us to apply custom Spring Reactor operators. These operators make sure that a downstream subscriber can acquire permission to subscribe to an upstream Publisher.
Below is an example of custom Reactor operators that decorate a Mono (Flux is also supported).
@RestController
public class RemittanceServiceProxy {

    private final WebClient webClient;
    private final CircuitBreaker remittanceServiceCircuitBreaker;
    private final Retry remittanceServiceRetry;
    private final TimeLimiter remittanceServiceTimeLimiter;

    public RemittanceServiceProxy(WebClient webClient,
                                  CircuitBreaker remittanceServiceCircuitBreaker,
                                  Retry remittanceServiceRetry,
                                  TimeLimiter remittanceServiceTimeLimiter) {
        this.webClient = webClient;
        this.remittanceServiceCircuitBreaker = remittanceServiceCircuitBreaker;
        this.remittanceServiceRetry = remittanceServiceRetry;
        this.remittanceServiceTimeLimiter = remittanceServiceTimeLimiter;
    }

    @PostMapping("v1/transfers")
    public Mono<ResponseEntity<Void>> transferV1(@RequestBody Transfer transfer) {
        // Decorate the WebClient call with the Circuit Breaker, Retry and Time Limiter,
        // and fall back to 503 Service Unavailable when the decorated call fails
        return webClient.post()
                .uri("/transfers")
                .body(Mono.just(transfer), Transfer.class)
                .retrieve()
                .toBodilessEntity()
                .transformDeferred(CircuitBreakerOperator.of(remittanceServiceCircuitBreaker))
                .transformDeferred(RetryOperator.of(remittanceServiceRetry))
                .transformDeferred(TimeLimiterOperator.of(remittanceServiceTimeLimiter))
                .onErrorReturn(ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).build());
    }
}
Here, the Mono is decorated by passing the operators to transformDeferred(). The operators are applied in the order given: the Circuit Breaker wraps the WebClient call, the Retry resubscribes through the Circuit Breaker when a call fails, and the Time Limiter bounds the total time of all attempts.
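For completeness, the WebClient injected into the proxy also needs to be defined as a bean pointing at the Remittance Service. A minimal sketch, assuming the base URL is supplied via a property (the property name and default below are illustrative, not from the original example):

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
class WebClientConfiguration {

    @Bean
    WebClient remittanceServiceWebClient(
            @Value("${remittance-service.base-url:http://localhost:8081}") String baseUrl) {
        // The base URL is an assumption; point it at wherever the Remittance Service runs
        return WebClient.builder()
                .baseUrl(baseUrl)
                .build();
    }
}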
Collecting Metrics
Spring Cloud Circuit Breaker Resilience4j includes auto-configuration to set up Actuator metrics collection. The properties below additionally expose circuit breaker state through the health endpoint.
# actuator
management:
  endpoint.health.show-details: always
  health.circuitbreakers.enabled: true
You can then view the circuit breaker health details by calling GET /actuator/health.
{
  "status": "UP",
  "components": {
    "circuitBreakers": {
      "status": "UNKNOWN",
      "details": {
        "remittance-service": {
          "status": "CIRCUIT_OPEN",
          "details": {
            "failureRate": "100.0%",
            "failureRateThreshold": "50.0%",
            "slowCallRate": "0.0%",
            "slowCallRateThreshold": "100.0%",
            "bufferedCalls": 3,
            "slowCalls": 0,
            "slowFailedCalls": 0,
            "failedCalls": 3,
            "notPermittedCalls": 2,
            "state": "OPEN"
          }
        }
      }
    }
  }
}
Thanks for reading. I hope this was helpful!
The example code is available on GitHub.