Below is an explanation of some drawbacks when the same /health actuator endpoint is used for both liveness and readiness probes, especially in the context of graceful shutdowns and rolling updates:
Lack of Differentiation Between Probes
- Liveness vs. Readiness:
Liveness probes capture whether the application is “alive” (i.e., not in a completely broken state), whereas readiness probes determine whether it is prepared to handle requests. Using a single endpoint for both collapses these two distinct questions into one answer; the sketch after this list makes the distinction concrete.
- During Shutdown:
When the application is beginning a graceful shutdown, it should ideally fail the readiness probe to stop receiving new traffic, even though the core processes are still alive. If both probes share the same endpoint and that endpoint continues to return ‘healthy’, the container may continue to receive traffic, delaying or interfering with the shutdown process.
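To make the two questions concrete, here is a minimal sketch of two separate Spring Boot Actuator health indicators, one answering the liveness question and one answering the readiness question. The class names and the warm-up flag are illustrative assumptions, not Spring Boot built-ins:

```java
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
class ProcessLivenessIndicator implements HealthIndicator {

    @Override
    public Health health() {
        // Liveness: is the process itself functional? This stays UP unless the app is truly wedged.
        return Health.up().build();
    }
}

@Component
class TrafficReadinessIndicator implements HealthIndicator {

    // Hypothetical startup gate; flipped to true elsewhere once warm-up work finishes (not shown).
    private final AtomicBoolean warmupComplete = new AtomicBoolean(false);

    @Override
    public Health health() {
        // Readiness: should this instance receive requests right now?
        return warmupComplete.get() ? Health.up().build() : Health.outOfService().build();
    }
}
```

Keeping the two signals in separate indicators means the readiness answer can change (warm-up, drain, lost dependency) without the liveness answer changing with it.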
Impact on Graceful Shutdown
- Incomplete Shutdown Signals:
A shared /health endpoint might continue to report the application as “healthy” even if the application is in the process of shutting down. This can result in:
- The load balancer or orchestrator (e.g. Kubernetes) continuing to send traffic to the pod.
- New requests arriving and interfering with the orderly completion of in-flight work and the release of resources.
- Shutdown Coordination Challenges:
Graceful shutdown procedures usually require the application to stop accepting new requests while completing current ones. Without a dedicated readiness check that fails during shutdown, the infrastructure may never be told that the pod should no longer receive new traffic; a drain sketch follows below.
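One way to coordinate this by hand is a lifecycle bean that flips the readiness state first and then waits out a drain window before the rest of shutdown proceeds. This is a minimal sketch: the 20-second drain window is an assumption, and recent Spring Boot versions (2.3+) already publish ReadinessState.REFUSING_TRAFFIC when shutdown begins, so treat it as an illustration of the mechanism rather than required code:

```java
import java.time.Duration;

import org.springframework.boot.availability.AvailabilityChangeEvent;
import org.springframework.boot.availability.ReadinessState;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.context.SmartLifecycle;
import org.springframework.stereotype.Component;

@Component
class TrafficDrainLifecycle implements SmartLifecycle {

    private static final Duration DRAIN_WINDOW = Duration.ofSeconds(20); // assumed drain window

    private final ApplicationEventPublisher publisher;
    private volatile boolean running;

    TrafficDrainLifecycle(ApplicationEventPublisher publisher) {
        this.publisher = publisher;
    }

    @Override
    public void start() {
        running = true;
    }

    @Override
    public void stop() {
        // 1. Tell the readiness probe (and anything else listening) that we refuse new traffic.
        AvailabilityChangeEvent.publish(publisher, this, ReadinessState.REFUSING_TRAFFIC);
        // 2. Pause long enough for the orchestrator to observe the failing readiness check
        //    and drain this pod from the load balancer before request handling shuts down.
        try {
            Thread.sleep(DRAIN_WINDOW.toMillis());
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        running = false;
    }

    @Override
    public boolean isRunning() {
        return running;
    }

    @Override
    public int getPhase() {
        // Highest phase => stopped first, so readiness flips before other beans shut down.
        return Integer.MAX_VALUE;
    }
}
```

Note that this only helps if the readiness probe actually reflects the availability state; a shared /health endpoint that ignores it will keep reporting healthy regardless.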
Rolling Updates Concerns
- Premature Traffic Routing:
In rolling updates, if the readiness probe doesn’t clearly signal that a pod is about to shut down (because it is tied to a liveness check that keeps passing), the infrastructure may keep routing new requests to a pod that is mid-shutdown. This can cause:
- Request timeouts
- Increased latency
- Potentially dropped connections for clients
- Inconsistent State Exposure:
A shared endpoint rarely provides enough granularity to express transitional states. For instance, during a rolling update a pod might be partially drained yet still report an overall healthy status, leading to inconsistent behavior across the cluster; the probe check below shows what correctly split signals look like.
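As a quick way to verify that the two signals really are split, you can hit the probe endpoints of a draining pod. This sketch assumes Spring Boot’s separate probe endpoints are exposed at /actuator/health/liveness and /actuator/health/readiness and that the app listens on localhost:8080; adjust the URLs for your setup:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProbeSmokeCheck {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        int liveness = status(client, "http://localhost:8080/actuator/health/liveness");
        int readiness = status(client, "http://localhost:8080/actuator/health/readiness");

        // For a pod that is draining during a rolling update, the desired outcome is
        // liveness=200 (do not restart me) while readiness=503 (stop sending me traffic).
        System.out.println("liveness=" + liveness + " readiness=" + readiness);
    }

    private static int status(HttpClient client, String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }
}
```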
Recommendations
- Separate Endpoints for Liveness and Readiness:
Consider configuring distinct endpoints (or customizing the /health logic per probe) so that the readiness probe reflects whether the application can actually accept traffic, especially during shutdown phases.
- Modify Health Check Logic During Shutdown:
During the shutdown process, have the readiness check report an “unready” status even while the liveness check still passes. This tells the orchestrator to stop routing traffic to the pod while you complete your graceful shutdown procedures; a minimal indicator sketch follows this list.
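If you are on Spring Boot 2.3 or later, the built-in availability support already exposes separate liveness and readiness health groups (automatically on Kubernetes, or via management.endpoint.health.probes.enabled=true elsewhere), which is usually the simplest way to apply both recommendations. For older setups, or when you want explicit control, a hand-rolled readiness indicator that flips on shutdown is one option. The bean name and the “reason” detail below are illustrative assumptions:

```java
import java.util.concurrent.atomic.AtomicBoolean;

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component("shutdownAwareReadiness")
class ShutdownAwareReadinessIndicator implements HealthIndicator {

    private final AtomicBoolean acceptingTraffic = new AtomicBoolean(true);

    @Override
    public Health health() {
        // Readiness-only view: OUT_OF_SERVICE as soon as shutdown begins, while a
        // separate liveness check can keep reporting UP for the whole drain period.
        return acceptingTraffic.get()
                ? Health.up().build()
                : Health.outOfService().withDetail("reason", "shutting down").build();
    }

    @EventListener(ContextClosedEvent.class)
    void onShutdown() {
        acceptingTraffic.set(false);
    }
}
```

The readiness probe would then target only this indicator (for example via a dedicated readiness health group), while the liveness probe keeps using a check that stays up during the drain.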
Summary
Using a single /health endpoint for both probes can result in:
- Inaccurate signaling during graceful shutdown scenarios
- The potential for new traffic being routed to pods that are in a transient state
- Complications during rolling updates: if the shared check is made to fail in order to drain a pod, the liveness probe fails along with it, and the pod may be restarted or taken out of service before in-flight work completes
Managing these concerns typically involves separating the signals for readiness and liveness, ensuring that during shutdown, the pod correctly indicates it is not ready to accept new requests while still being “live” enough to finish processing existing work.