Bulkhead Pattern
Prerequisite: thread pools in Java. See the ThreadPools section of my post on Concurrency in Java.
Bulkhead is a fault-tolerance pattern that protects a system from complete failure when one of its components fails or leaks resources. Refer: Bulkhead (partition). It limits the boundary of a failure.
- Problem:
  - Identifying the problem pattern (unresponsive/very slow endpoint):
    - Assume an API endpoint is slow to respond, for example because it depends on multiple external sources. The delay holds client connections open for longer than is reasonable, leading to a socket timeout or read timeout: the HttpResponse is never received, so this is an error on the HTTP-client side and the httpClient fails with a socket timeout.
      - Note: The medicine for this symptom is the Bulkhead pattern. But before concluding that, we have to check whether this is a one-off case or consistent behavior.
      - Do a load test on this endpoint.
      - If the endpoint is consistently slow and, beyond a certain load, the service starts rejecting new requests because of app-server thread starvation, we can conclude that it should be wrapped with the Bulkhead decorator from Resilience4j.
        - Details: Each incoming request to the endpoint opens new HTTP connections. As the load on the problematic endpoint increases, those connections are held open, the app-server threads are starved, and new requests are rejected with 503 Service Unavailable.
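The starvation described above can be reproduced in miniature with plain `java.util.concurrent`. This is only a sketch under stated assumptions: the class `ThreadStarvationDemo`, the method `countRejected`, and the pool and queue sizes are hypothetical, and a small `ThreadPoolExecutor` merely stands in for the app server's worker threads. It is not how Tomcat is implemented. A rejected task plays the role of a request turned away.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThreadStarvationDemo {

    /**
     * Sends `requests` slow tasks to a fixed pool that stands in for the
     * app server's worker threads and returns how many were rejected.
     */
    public static int countRejected(int workerThreads, int queueSize, int requests) {
        ThreadPoolExecutor appServer = new ThreadPoolExecutor(
                workerThreads, workerThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize));
        // Every simulated call blocks on this latch, like a call to a slow dependency.
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    appServer.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // no free worker thread and no queue slot left
                }
            }
        } finally {
            release.countDown(); // let the blocked workers finish
            appServer.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 4 workers + 2 queue slots can absorb 6 slow requests; the other 4 are rejected.
        System.out.println("rejected = " + countRejected(4, 2, 10));
    }
}
```

Because every worker is stuck on the same slow call, requests to any other endpoint served by that pool would be rejected too, which is the failure the bulkhead is meant to contain.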
- Solution:
  - Beyond a certain load threshold, if the API consistently responds with 503, the Bulkhead pattern should be applied.
  - Bulkhead can be of 2 types:
    - SEMAPHORE: Operates on the application server's thread pool.
      - In Spring Boot + Tomcat (thread-per-request model), the default number of app-server threads is 200.
      - Through configuration, a semaphore bulkhead limits how many app-server threads an endpoint may occupy at the same time.
    - THREADPOOL: Operates on a separate (non-app-server) thread pool whose thread names carry the bulkhead name.
      - Note: The real performance potential of the thread-pool-based bulkhead is realised in reactive setups, not in thread-per-request setups, because there the call is still blocking.
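To make the THREADPOOL idea concrete, here is a minimal sketch that uses only the JDK. It is not Resilience4j's implementation: the class `ThreadPoolBulkheadSketch`, its constructor parameters, and the `bulkhead-<name>-<n>` thread naming are assumptions chosen for illustration. The call runs on a small dedicated pool, so a slow dependency can exhaust only that pool and not the app server's threads.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class ThreadPoolBulkheadSketch {

    private final ThreadPoolExecutor pool;

    public ThreadPoolBulkheadSketch(String name, int threads, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger(1);
        // Name the threads after the bulkhead so they are easy to spot in logs and thread dumps.
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable, "bulkhead-" + name + "-" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
        this.pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity), factory);
    }

    /** Runs the call on the bulkhead's own pool instead of the caller's thread. */
    public <T> CompletableFuture<T> submit(Supplier<T> call) {
        return CompletableFuture.supplyAsync(call, pool);
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        ThreadPoolBulkheadSketch bulkhead = new ThreadPoolBulkheadSketch("stubbornService", 2, 1);
        // join() blocks the caller, which is why this variant gains little in a thread-per-request setup.
        String worker = bulkhead.submit(() -> Thread.currentThread().getName()).join();
        System.out.println(worker); // prints "bulkhead-stubbornService-1"
        bulkhead.shutdown();
    }
}
```

The blocking `join()` in `main` illustrates the note above: the caller's thread still waits for the result unless the surrounding code is written to consume the future asynchronously.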
Practical Semaphore-based Bulkhead Implementation in a Java Service using Resilience4j
- Config

      # SRK BulkHead Demo
      spring:
        application:
          name: srk-bh-app
      server:
        port: 8080
      resilience4j:
        bulkhead:
          instances:
            stubbornService:
              max-concurrent-calls: 2   # at most 2 calls in flight
              max-wait-duration: 0ms    # do not wait for a permit when full
- Java Snippet

      import io.github.resilience4j.bulkhead.BulkheadFullException;
      import io.github.resilience4j.bulkhead.annotation.Bulkhead;
      import lombok.extern.slf4j.Slf4j;
      import org.springframework.web.bind.annotation.GetMapping;
      import org.springframework.web.bind.annotation.RestController;

      @Slf4j
      @RestController
      public class ApiController {

        @GetMapping("/call-unresponsive-service")
        @Bulkhead(
            name = "stubbornService",
            type = Bulkhead.Type.SEMAPHORE,
            fallbackMethod = "bulkHeadFallback")
        public String callUnResponsiveService() throws InterruptedException {
          log.info("SRK: calling unresponsive service");
          try {
            return slowRunningService();
          } catch (InterruptedException e) {
            log.error("SRK: Exception calling -> {}", e.getLocalizedMessage());
            throw e;
          } catch (BulkheadFullException e) {
            /*
             * This will never get executed, as this is not the right place to catch it:
             * the bulkhead rejects the call before the method body runs.
             * When no fallback is configured in the @Bulkhead annotation, the caller of
             * this method has to catch this exception.
             */
            log.error("SRK: Bulkhead semaphore is full -> {}", e.getLocalizedMessage());
            throw e;
          }
        }

        // Simulates a dependency that takes 30 seconds to answer.
        public String slowRunningService() throws InterruptedException {
          Thread.sleep(30000);
          return "Stubborn service listened";
        }

        /* Called when the bulkhead is full, because it is configured as fallbackMethod = "bulkHeadFallback". */
        public String bulkHeadFallback(BulkheadFullException e) {
          log.error(
              "SRK: In bulkHeadFallback Bulkhead(Semaphore). ErrorMessage:{}",
              e.getLocalizedMessage());
          return "Stubborn service unresponsive";
        }
      }
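What the configuration above asks for (two permits, zero wait, a fallback on rejection) can be mimicked with a plain `java.util.concurrent.Semaphore`. This is a sketch of the mechanism only, not Resilience4j's code; the class `SemaphoreBulkheadSketch` and its `execute` method are hypothetical names. Nesting the calls keeps two permits held while a third call arrives, which stands in for three overlapping requests without needing real threads.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class SemaphoreBulkheadSketch {

    private final Semaphore permits;

    public SemaphoreBulkheadSketch(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /**
     * Runs `call` on the caller's own thread if a permit is free right now
     * (the equivalent of a 0ms max wait); otherwise runs `fallback`.
     */
    public <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // bulkhead full: reject immediately
        }
        try {
            return call.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    public static void main(String[] args) {
        SemaphoreBulkheadSketch bulkhead = new SemaphoreBulkheadSketch(2);
        Supplier<String> fallback = () -> "Stubborn service unresponsive";

        // Third "request": arrives while the first two still hold their permits.
        Supplier<String> third = () -> bulkhead.execute(() -> "Stubborn service listened", fallback);
        Supplier<String> second = () -> bulkhead.execute(third, fallback);
        String result = bulkhead.execute(second, fallback);

        System.out.println(result); // prints "Stubborn service unresponsive"
    }
}
```

Unlike the thread-pool variant, nothing here creates threads: the semaphore only counts how many callers are inside the protected call at once, and the work still runs on the caller's (app-server) thread.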
References: