k8s – Liveness and Readiness Probes

Many applications that run for long periods of time eventually transition to broken states and cannot recover except by being restarted. When you run them inside Kubernetes, liveness probes let you detect and remedy such situations. Moreover, if your Container needs to load large data sets or configuration files, or run migrations during startup, specify a readiness probe.
Liveness vs Readiness Probes
Before we begin, let’s have a little bit of theory here.
kubelet
A kubelet is an agent that runs on each node in the cluster. It makes sure containers are running in a Pod, but it doesn’t manage containers that were not created by Kubernetes. It takes a set of PodSpecs (e.g. as YAML files) and ensures that the containers described there are running and healthy. The kubelet has basically one job: given a set of containers to run, make sure they are all running.

liveness
The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running but unable to make progress. Restarting a Container in such a state can help make the application more available despite bugs.
readiness
The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
A side note: both of these healthchecks can define initialDelaySeconds. If this is undefined, they will start counting at the same time, as soon as a Pod is scheduled and created. If you want the livenessProbe to start after the readinessProbe (i.e. wait enough time for readiness to be likely verified first), you will need to adjust their initialDelaySeconds values.
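For example, a sketch with illustrative delays (not values from the article) where the liveness probe only starts counting well after the readiness probe:

readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  initialDelaySeconds: 5
livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000
  initialDelaySeconds: 30   # gives the readiness probe time to pass first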
Kubernetes

The basic YAML template for these probes is really very simple:
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000
You just define path and port for HTTP healthchecks. As I said previously, you can also provide additional configuration for them, such as:
- initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated.
- periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. The minimum value is 1.
- timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. The minimum value is 1.
- successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. The minimum value is 1.
- failureThreshold: When a Pod starts and the probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of a liveness probe means restarting the Pod. In case of a readiness probe, the Pod will be marked Unready. Defaults to 3. The minimum value is 1.
What is more, httpGet probes have additional fields:
- host: Hostname to connect to, defaults to the pod IP.
- scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
- path: Path to access on the HTTP server.
- httpHeaders: Custom headers to set in the request.
- port: Name or number of the port to access on the container. The number must be in the range of 1 to 65535.
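For instance, a probe that uses several of these fields might be sketched as follows (the custom header is hypothetical):

livenessProbe:
  httpGet:
    path: /health/alive
    port: 3000
    scheme: HTTP
    httpHeaders:
      - name: X-Health-Check   # hypothetical custom header
        value: kubelet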
What is even more, there are not only httpGet probes but also tcpSocket and exec (command) probes, but those are beyond the scope of this article.

Elixir
Once we know what Liveness and Readiness Probes are, and since we are able to define them for Kubernetes, let’s finally implement them in our code.
For that purpose, we will leverage the plug library. Plugs are composable middlewares mounted in Controllers, as a part of a Router, or defined for the entire Endpoint.
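As a reminder, a module plug just needs to implement init/1 and call/2 from the Plug behaviour. A minimal, do-nothing sketch (the module name here is made up) looks like this:

defmodule MyApp.NoopPlug do
  @behaviour Plug

  # init/1 prepares the options at compile time
  def init(opts), do: opts

  # call/2 receives every request and must return a %Plug.Conn{}
  def call(conn, _opts), do: conn
end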
Forwarding
The very first approach I suggest is to leverage forward/2 from Plug.Router.
defmodule PlugForward do
  use Plug.Router

  plug(:match)
  plug(:dispatch)

  forward(
    "/health/live",
    to: Liveness
  )
end
With this simple function, you forward all requests on the given path to a particular Plug; in our case, the liveness probe.
Mounting
The other way you may use is to directly mount a Plug inside your router.
defmodule PlainPlug do
  use Plug.Router

  plug(Liveness)

  # regular paths defined here
end
It’s basically as simple as that; there’s no need for any additional configuration, assuming your Plug handles requests correctly.
Configuration
I always like to have my libraries/dependencies configurable. Thus, we can provide both the liveness path and response in our configuration:
config :healthchex,
  liveness_path: "/health/live",
  liveness_response: "OK"
Later on, in our module, we can fetch them like this:
defmodule Healthchex.Probes.Liveness do
  import Plug.Conn

  @default_path Application.get_env(:healthchex, :liveness_path, "/health/live")
  @default_resp Application.get_env(:healthchex, :liveness_response, "OK")

  def init(opts) do
    %{
      path: Keyword.get(opts, :path, @default_path),
      resp: Keyword.get(opts, :resp, @default_resp)
    }
  end

  def call(conn, _opts), do: conn
end
What we do here is fetch our configuration, falling back to defaults if nothing is found, and then turn the opts keyword list into a map with specific keys that will also be used in the call/2 function.
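With init/1 in place, the probe can be mounted with custom options; for instance (the /healthz path is just an illustration):

plug(Healthchex.Probes.Liveness, path: "/healthz", resp: "OK")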
Keep in mind that, when including an external library, its configuration must be provided in the project’s own config.exs file.
Response
Finally, we have to implement a simple function that will respond to the health-check. The response will simply be 200 "OK", but it can of course be configured.
defmodule Healthchex.Probes.Liveness do
  # …

  def call(%Plug.Conn{request_path: path} = conn, %{path: path, resp: resp}) do
    conn
    |> send_resp(200, resp)
    |> halt()
  end

  def call(conn, _opts), do: conn
end
As you can see, we check the actual request_path of the incoming connection. If it matches the one configured previously (either via config.exs or the path option), we halt the connection and return a successful response. Otherwise, we pass the request through.
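To double-check that behaviour, a small ExUnit test using Plug.Test could look like this (a sketch assuming the default /health/live path is in effect):

defmodule Healthchex.Probes.LivenessTest do
  use ExUnit.Case, async: true
  import Plug.Test

  test "responds 200 on the configured path" do
    opts = Healthchex.Probes.Liveness.init([])

    conn =
      conn(:get, "/health/live")
      |> Healthchex.Probes.Liveness.call(opts)

    assert conn.status == 200
    assert conn.halted
  end
end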
If you want to see more usage examples and the readiness probe definition, the entire code is available here:
KamilLelonek/healthchex: A set of Plugs to be used for Kubernetes health-checks
You can use it as a dependency for your project and include both health-checks in your application’s Router.
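For illustration, wiring both probes into a plain Plug.Router might look roughly like this; the Healthchex.Probes.Readiness module name is assumed to mirror the Liveness plug shown above:

defmodule MyAppWeb.Router do
  use Plug.Router

  # Probe plugs run before matching, so health-check requests are answered and halted early.
  plug(Healthchex.Probes.Liveness, path: "/health/live")
  plug(Healthchex.Probes.Readiness, path: "/health/ready")

  plug(:match)
  plug(:dispatch)

  match _ do
    send_resp(conn, 404, "not found")
  end
end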
Summary

To sum up, I’d like to share with you some best practices regarding Liveness and Readiness Probes.
- Avoid checking dependencies in liveness probes. Liveness probes should be inexpensive and have response times with minimal variance.
- The initialDelaySeconds parameter should be longer than the maximum initialization time for the container.
- Regularly restart containers to exercise startup dynamics and to avoid unexpected behavioral changes during initialization.
- If the container evaluates a shared dependency in the Readiness probe, set its timeout longer than the maximum response time for that dependency (see the sketch below).
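To illustrate that last point, a readiness probe for a container that calls a slow shared dependency might be sketched like this; the five-second timeout is an assumed figure standing in for "longer than the dependency's worst-case response time":

readinessProbe:
  httpGet:
    path: /health/ready   # handler that checks the shared dependency
    port: 3000
  periodSeconds: 10
  timeoutSeconds: 5   # longer than the dependency's maximum response time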
I hope you will find these Plugs useful and leverage them in your own applications.
Liveness and Readiness Probes – The Theory
On each node of a Kubernetes cluster there is a Kubelet running which manages the pods on that particular node. It’s responsible for getting images pulled down to the node, reporting the node’s health, and restarting failed containers. But how does the Kubelet know if there is a failed container?
Well, it can use the notion of probes to check on the status of a container. Specifically, a liveness probe.
Liveness probes indicate whether a container is running. Meaning, has the application within the container started running, and is it still running? Even if you haven’t explicitly configured liveness probes for your containers, you’ve probably still seen them in action. When a container gets restarted, it’s generally because of a liveness probe failing. This can happen if your container couldn’t start up, or if the application within the container crashed. The Kubelet will restart the container because the liveness probe is failing in those circumstances. In some circumstances, though, the application within the container is not working but hasn’t crashed. In that case, the container won’t restart unless you provide additional information as a liveness probe.
A readiness probe indicates if the application running inside the container is “ready” to serve requests. As an example, assume you have an application that starts but needs to check on other services like a backend database before finishing its configuration. Or an application that needs to download some data before it’s ready to handle requests. A readiness probe tells the Kubelet that the application can now perform its function and that the Kubelet can start sending it traffic.
There are three different ways these probes can be checked.
- ExecAction: Execute a command within the container
- TCPSocketAction: TCP check against the container’s IP/port
- HTTPGetAction: An HTTP Get request against the container’s IP/Port
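For reference, minimal sketches of the non-HTTP variants could look like this (the command and port below are placeholders, not taken from the article):

livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # ExecAction: healthy if the command exits with 0
readinessProbe:
  tcpSocket:
    port: 5432   # TCPSocketAction: ready if the TCP port accepts a connection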
Let’s look at the two probes in the context of a container starting up. The diagram below shows several states of the same container over time. We have a view into the containers to see what’s going on with the application in relation to the probes.
On the left side, the pod has just been deployed. A liveness probe performed via a TCPSocketAction found that the pod is “alive” even though the application is still doing work (loading data, etc.) and isn’t ready yet. As time moves on, the application finishes its startup routine and is now “ready” to serve incoming traffic.

Let’s take a look at this from a different perspective. Assume we have a deployment already in our cluster, and it consists of a single replica which is displayed on the right side, behind our service. It’s likely that we’ll need to scale the app, or replace it with another version. Now that we know our app isn’t ready to handle traffic right away after being started, we can wait to have our service add the new app to the list of endpoints until the application is “ready”. This is an important thing to consider if your apps aren’t ready as soon as the container starts up. A request could be sent to the container before it’s able to handle it.

Liveness and Readiness Probes – In Action
First, we’ll look to see what happens with a readiness check. For this example, I’ve got a very simple Apache container that displays a pretty elaborate website. I’ve created a YAML manifest to deploy the container, service, and ingress rule.
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: liveness
  name: liveness-http
spec:
  containers:
    - name: liveness
      image: theithollow/hollowapp-blog:liveness
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 3
        periodSeconds: 3
      readinessProbe:
        httpGet:
          path: /health
          port: 80
        initialDelaySeconds: 3
        periodSeconds: 3
---
apiVersion: v1
kind: Service
metadata:
  name: liveness
spec:
  selector:
    app: liveness
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: liveness-ingress
  namespace: default
spec:
  rules:
    - host: liveness.theithollowlab.com
      http:
        paths:
          - backend:
              serviceName: liveness
              servicePort: 80
This manifest includes two probes:
- Liveness check doing an HTTP request against “/”
- Readiness check doing an HTTP request against /health
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 3
readinessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 3
My container uses a script that starts the HTTP daemon right away and then waits 60 seconds before creating a /health page. This simulates some work being done by the application while the app isn’t ready for consumption. This is the entire website, for reference.

And here is my container script.
/usr/sbin/httpd > /dev/null 2>&1 &          # Start HTTP daemon
sleep 60                                    # Wait 60 seconds
echo HealthStatus > /var/www/html/health    # Create health status page
sleep 3600
Deploy the manifest through kubectl apply. Once deployed, I’ve run a --watch command to keep an eye on the deployment. Here’s what it looked like.
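The commands themselves aren’t shown here, but they would be something along these lines (the manifest filename is an assumption):

kubectl apply -f liveness.yaml   # creates the Pod, Service, and Ingress
kubectl get pods --watch         # watch the READY column change from 0/1 to 1/1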

You’ll notice that the ready status showed 0/1 for about 60 seconds. Meaning that my container was not in a ready status for 60 seconds until the /health page became available through the startup script.
As a silly example, what if we modified our liveness probe to look for /health? Perhaps we have an application that sometimes stops working but doesn’t crash. Will the application ever start up? Here’s my new probe in the YAML manifest.
livenessProbe:
  httpGet:
    path: /health
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 3
After deploying this, let’s run another --watch on the pods. Here we see that the pod keeps restarting, and I am never able to access the /health page because the container restarts before it’s ready.

We can see that the liveness probe is failing if we run a describe on the pod.
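In other words, something like the following, then check the Events section at the bottom of the output for the failing probe (pod name taken from the manifest above):

kubectl describe pod liveness-http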
