CircuitBreaker and Retry pattern
When building software, one of the most important aspects to consider is resiliency. In the world of microservices, it’s vital to ensure that a service is resilient: it must be able to react to failure and still remain functional.
Two fundamental resiliency patterns are:
- Circuit Breaker: Ensures that the application refrains from calling an external system when the call is likely to fail.
- Retry: Retries a failed call on the expectation that the failure is temporary and that a subsequent attempt is likely to succeed.
These two patterns must be implemented so that they work together, because there is no point in retrying when the CircuitBreaker is in the OPEN state.
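To make that interplay concrete, here is a minimal plain-Java sketch (it does not use Resilience4j). The names `RetryGuardSketch`, `callWithRetry` and `BreakerOpenException` are hypothetical and exist only for this illustration; the breaker is reduced to a simple "is it open?" check that is consulted before every attempt.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class RetryGuardSketch {

    /** Hypothetical exception: the breaker is open, so the call is not attempted. */
    static class BreakerOpenException extends RuntimeException {
        BreakerOpenException() {
            super("circuit breaker is OPEN");
        }
    }

    /**
     * Runs the action up to maxAttempts times (the first call counts as attempt 1).
     * Before every attempt the breaker is consulted; once it reports OPEN,
     * the remaining attempts are skipped instead of hitting the failing system again.
     */
    static <T> T callWithRetry(Supplier<T> action, BooleanSupplier breakerOpen, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (breakerOpen.getAsBoolean()) {
                throw new BreakerOpenException();
            }
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        try {
            // The simulated breaker "opens" once two attempts have failed.
            callWithRetry(
                    () -> { attempts[0]++; throw new IllegalStateException("boom"); },
                    () -> attempts[0] >= 2,
                    4);
        } catch (BreakerOpenException e) {
            System.out.println("stopped after " + attempts[0] + " attempts: " + e.getMessage());
        }
    }
}
```

In this sketch only 2 of the 4 allowed attempts reach the failing system; the rest are skipped as soon as the breaker reports OPEN.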
Here is an excerpt from ChatGPT:
A circuit breaker is a design pattern that helps prevent an application from trying to perform an action that is likely to fail. It does this by “tripping” a switch when a certain threshold of failures is reached, and then preventing any further action from being taken until the switch is reset.
This helps to protect the application from repeatedly trying to perform an action that is not likely to succeed, which could cause the application to crash or become unresponsive.
The circuit breaker pattern is often used in distributed systems, where a service might depend on other services or resources that might not always be available. By using a circuit breaker, the service can fail gracefully and continue to function, rather than crashing or becoming unresponsive. For example, if a service depends on a database that is experiencing problems, the circuit breaker might trip, preventing the service from trying to access the database until the issue is resolved. This helps to make the service more resilient, as it is able to continue functioning even when one of its dependencies is not available.
Implementing CircuitBreaker and Retries in Spring Boot using Resilience4j
Figure: Resilience4j CircuitBreaker state machine (image credit: Resilience4j)
Terminology
- SlidingWindowType: If the sliding window type is COUNT_BASED, the last N (SlidingWindowSize) calls are recorded and aggregated.
- SlidingWindowSize: Configures the size of the sliding window which is used to record the outcome of calls when the CircuitBreaker is closed.
- FailureRate: By default all exceptions count as a failure. You can define a list of exceptions which should count as a failure. All other exceptions are then counted as a success, unless they are ignored. Exceptions can also be ignored so that they count as neither a failure nor a success in the CircuitBreaker.
- The count-based sliding window aggregates the outcome of the last N calls.
- If your sliding window size is 3, picture an array of size 3 in which the outcome (success or failure) of each call is stored as an entry.
- Note that exceptions ignored by the CircuitBreaker (not to be confused with the ignore-exceptions setting in the retry configuration) are counted as neither a failure nor a success in the sliding window.
- The CLOSED/OPEN state transition is always decided after the failure rate has been computed on the latest sliding window state and compared with the threshold.
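The "array of size N" picture above can be sketched in plain Java. This is an illustration only, not Resilience4j's implementation: the class name `CountBasedWindowSketch` is hypothetical, and returning -1 while too few calls have been recorded is a convention chosen for this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CountBasedWindowSketch {
    private final int size;                 // sliding-window-size
    private final int minimumNumberOfCalls; // minimum-number-of-calls
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failure

    CountBasedWindowSketch(int size, int minimumNumberOfCalls) {
        this.size = size;
        this.minimumNumberOfCalls = minimumNumberOfCalls;
    }

    /**
     * Records one call outcome. When the window is full, the oldest entry is dropped.
     * An ignored exception would simply never be passed to this method.
     */
    void record(boolean failure) {
        if (outcomes.size() == size) {
            outcomes.removeFirst();
        }
        outcomes.addLast(failure);
    }

    /** Failure rate in percent, or -1 while fewer than minimumNumberOfCalls outcomes exist. */
    float failureRate() {
        if (outcomes.size() < minimumNumberOfCalls) {
            return -1f;
        }
        long failures = outcomes.stream().filter(f -> f).count();
        return failures * 100f / outcomes.size();
    }

    public static void main(String[] args) {
        CountBasedWindowSketch window = new CountBasedWindowSketch(3, 3);
        window.record(true);
        window.record(true);
        System.out.println(window.failureRate()); // not enough calls yet
        window.record(true);
        System.out.println(window.failureRate()); // three failures out of three
        window.record(false);                     // the oldest failure is pushed out
        System.out.println(window.failureRate()); // two failures out of three
    }
}
```

The state decision itself is then a comparison of this rate with failure-rate-threshold, made after each recorded call.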
Sample Retries and CircuitBreaker Configuration - For this blog post illustration
Retries:
- max-attempts: 4 # Counts the initial call too, so at most 3 retries are attempted
- wait-duration: 5s
- exponential-backoff-multiplier: 1
Retry wait duration = wait-duration * (exponential-backoff-multiplier)^(retry-count - 1)
1st retry - wait duration = 5 * 1^0 = 5s = 5000ms
2nd retry - wait duration = 5 * 1^1 = 5s = 5000ms
3rd retry - wait duration = 5 * 1^2 = 5s = 5000ms
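The wait schedule can be reproduced with a few lines of plain Java. This is a sketch under one assumption: the first retry waits exactly wait-duration and each later retry multiplies the previous wait by the multiplier. With the multiplier of 1 used in this post, every retry waits 5s whichever exponent convention is used. `RetryWaitSketch` and `waitSchedule` are hypothetical names, not part of Resilience4j.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class RetryWaitSketch {

    /**
     * Returns the wait before each retry. maxAttempts includes the initial call,
     * so there are maxAttempts - 1 retries.
     */
    static List<Duration> waitSchedule(Duration waitDuration, double multiplier, int maxAttempts) {
        List<Duration> waits = new ArrayList<>();
        int retries = maxAttempts - 1;
        for (int retry = 1; retry <= retries; retry++) {
            // Assumption: the first retry uses the base wait, later ones grow by the multiplier.
            long millis = (long) (waitDuration.toMillis() * Math.pow(multiplier, retry - 1));
            waits.add(Duration.ofMillis(millis));
        }
        return waits;
    }

    public static void main(String[] args) {
        // Sample configuration from this post: max-attempts 4, wait-duration 5s, multiplier 1.
        System.out.println(waitSchedule(Duration.ofSeconds(5), 1.0, 4)); // [PT5S, PT5S, PT5S]
        // For comparison, a multiplier of 2 doubles the wait on each retry.
        System.out.println(waitSchedule(Duration.ofSeconds(5), 2.0, 4)); // [PT5S, PT10S, PT20S]
    }
}
```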
CircuitBreaker (CB):
- sliding-window-size: 3
- minimum-number-of-calls: 3
- failure-rate-threshold: 50
- wait-duration-in-open-state: 1m
- automatic-transition-from-open-to-half-open-enabled: true
- permitted-number-of-calls-in-half-open-state: 1
State Transitions:
- CLOSED
  - Transition to OPEN state
    - With NonRetryableException cases (the Retry pattern has no role here)
      - Call 1: The call fails with a NonRetryableException, say status code = 301, reason phrase = Moved Permanently
      - Call 2: The call fails with a NonRetryableException, say status code = 301, reason phrase = Moved Permanently
      - Call 3: The call fails with a NonRetryableException, say status code = 301, reason phrase = Moved Permanently
      - After 3 failures the CB moves from CLOSED to OPEN
      - Why? Because minimum-number-of-calls is 3, the CB computes the failure rate only once the sliding window holds at least 3 recorded outcomes (success or failure). It then compares that rate with the failure-rate-threshold of 50%. 50% of 3 calls is 1.5, so 2 or more failures out of 3 trip the breaker. Here all 3 calls failed (failure rate 100%), so the CB opens.
    - With RetryableException cases (the Retry pattern is in action here)
      - Call 1: The call fails with a RetryableException, say status code = 404, reason phrase = Not Found
        - 3 retries are attempted
        - The retries are exhausted
        - The failure is recorded as the first entry in the sliding window. (If the window already holds 3 entries, a new entry pushes out the oldest one.)
      - Call 2: The call fails with a RetryableException, say status code = 500, reason phrase = Internal Server Error
        - 3 retries are attempted
        - The retries are exhausted
        - The failure is recorded as the second entry in the sliding window.
      - Call 3: The call fails with a RetryableException, say status code = 404, reason phrase = Not Found
        - 3 retries are attempted
        - The retries are exhausted
        - The failure is recorded as the third entry in the sliding window.
      - At this point the CircuitBreaker has the data it needs, i.e., a sliding window of size 3 filled with success/failure entries. It computes the failure rate and moves from CLOSED to OPEN because the failure rate has reached the configured threshold.
- OPEN
  - Transition to HALF_OPEN
    - For the configured wait-duration-in-open-state (1m in this example), the CB stays OPEN and stubbed/canned responses are provided.
    - After the wait duration has elapsed, the CB automatically transitions from OPEN to HALF_OPEN because I have set automatic-transition-from-open-to-half-open-enabled to true.
- HALF_OPEN - An opportunity to move to the CLOSED state, otherwise back to the OPEN state
  - The CB stays in HALF_OPEN indefinitely because I have not configured max-wait-duration-in-half-open-state.
  - In this state, the sliding window/ring buffer and all the CircuitBreaker metrics (failure rate, failed calls count, successful calls count) are reset.
  - The CB waits until all permitted calls (1 in this configuration) have completed, and then moves to CLOSED or OPEN depending on whether they succeeded or failed.
  - In this state the sliding window size is equal to permitted-number-of-calls-in-half-open-state.
  - To decide between OPEN and CLOSED, the CB compares the failure rate with the failure-rate-threshold, which is 50% in this example configuration (50% of 1 call = 0.5 calls). As permitted-number-of-calls-in-half-open-state is 1, if that 1 call fails, the failure rate is 100% and the CB transitions to OPEN; if that 1 call succeeds, the failure rate is 0% and the CB transitions to CLOSED.
  - Transition to OPEN state
    - With NonRetryableException cases (the Retry pattern has no role here)
      - Call 1: The call is made and fails with a NonRetryableException, so there are no retries
      - As the 1 permitted call has failed, the failure rate is 100% and the CircuitBreaker transitions from HALF_OPEN to OPEN
    - With RetryableException cases (the Retry pattern is in action here)
      - Call 1: The call is made and fails
        - 3 retries are attempted
        - As the 1 permitted call has failed, the failure rate is 100% and the CircuitBreaker transitions from HALF_OPEN to OPEN
  - Transition to CLOSED state
    - Call 1: The call is made and succeeds
    - As the 1 permitted call has succeeded, the failure rate is 0% and the CircuitBreaker transitions from HALF_OPEN to CLOSED
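The whole walkthrough can be condensed into a small plain-Java state machine. This is a sketch, not Resilience4j code: `BreakerStateSketch` is a hypothetical class, the constants mirror the sample configuration of this post, `waitDurationElapsed()` stands in for the timer behind wait-duration-in-open-state, and `record` is assumed to receive only the final outcome of a call (after any retries), as the walkthrough describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BreakerStateSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    // Values taken from the sample configuration in this post.
    static final int SLIDING_WINDOW_SIZE = 3;
    static final int MINIMUM_NUMBER_OF_CALLS = 3;
    static final float FAILURE_RATE_THRESHOLD = 50f;
    static final int PERMITTED_CALLS_IN_HALF_OPEN = 1;

    private State state = State.CLOSED;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failure

    State state() {
        return state;
    }

    /** Simulates the automatic OPEN -> HALF_OPEN transition after the wait duration. */
    void waitDurationElapsed() {
        if (state == State.OPEN) {
            state = State.HALF_OPEN;
            window.clear(); // recorded outcomes are reset for the trial calls
        }
    }

    /** Records the final outcome of one call and applies the state transition rules. */
    void record(boolean failure) {
        if (state == State.OPEN) {
            return; // calls are rejected while OPEN, so nothing is recorded
        }
        boolean halfOpen = state == State.HALF_OPEN;
        int capacity = halfOpen ? PERMITTED_CALLS_IN_HALF_OPEN : SLIDING_WINDOW_SIZE;
        int minimumCalls = halfOpen ? PERMITTED_CALLS_IN_HALF_OPEN : MINIMUM_NUMBER_OF_CALLS;

        if (window.size() == capacity) {
            window.removeFirst(); // the oldest entry is pushed out
        }
        window.addLast(failure);
        if (window.size() < minimumCalls) {
            return; // not enough data to compute the failure rate yet
        }

        long failures = window.stream().filter(f -> f).count();
        float failureRate = failures * 100f / window.size();
        if (failureRate >= FAILURE_RATE_THRESHOLD) {
            state = State.OPEN;
            window.clear();
        } else if (halfOpen) {
            state = State.CLOSED;
            window.clear();
        }
    }

    public static void main(String[] args) {
        BreakerStateSketch breaker = new BreakerStateSketch();
        breaker.record(true);
        breaker.record(true);
        breaker.record(true);
        System.out.println(breaker.state()); // OPEN
        breaker.waitDurationElapsed();
        System.out.println(breaker.state()); // HALF_OPEN
        breaker.record(false);               // the single permitted call succeeds
        System.out.println(breaker.state()); // CLOSED
    }
}
```

If the single permitted call in HALF_OPEN fails instead, the same `record` method moves the sketch straight back to OPEN.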
Listening to CircuitBreaker and Retry events in Spring Boot
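Note that when you register a CB state transition listener in Spring Boot, the listener can run on a separate thread named CircuitBreakerAutoTransitionThread rather than on the thread that made the call. This is the thread that performs the automatic transition from OPEN to HALF_OPEN.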
Example snippet:
// cBreakerRegistry is assumed to be an injected CircuitBreakerRegistry field
private CircuitBreaker getCircuitBreaker(String cbInstance) {
    CircuitBreaker circuitBreaker = cBreakerRegistry.circuitBreaker(cbInstance);
    circuitBreaker
        .getEventPublisher()
        .onStateTransition(cbEvent -> { /* code in here will run in separate thread named: CircuitBreakerAutoTransitionThread */ });
    return circuitBreaker;
}
However, the retry event listener below always runs in the current (calling) thread:
// retryRegistry is assumed to be an injected RetryRegistry field
private Retry getRetry(String retryInstance) {
    Retry retry = retryRegistry.retry(retryInstance);
    retry
        .getEventPublisher()
        .onRetry(retryEvent -> { /* will run from current thread */ });
    return retry;
}