What Does Readiness Probe Mean?
Also known as: readiness probe, Kubernetes health check, CKAD exam readiness probe, pod design, Kubernetes probes
Quick Definition
A readiness probe is a health check that Kubernetes uses to determine if a container inside a pod is ready to receive network traffic. If the probe fails, Kubernetes stops sending requests to that pod until it recovers. This helps ensure that only healthy, working containers serve users or other services.
Must Know for Exams
The Certified Kubernetes Application Developer (CKAD) exam places a strong emphasis on application health checks, including readiness probes. The current curriculum lists "Implement probes and health checks" under the Application Observability and Maintenance domain (15% of the exam); check the published curriculum for the version you are sitting, because domain names and weights have changed over time. Candidates must be able to define YAML manifests that include readiness probe configurations with appropriate parameters. Exam questions often present a scenario where an application is failing because traffic is routed to pods that are not fully initialized, and the candidate must add a readiness probe to solve the issue.
Questions may test the difference between readiness, liveness, and startup probes. A common topic is understanding that readiness probes only affect traffic routing, not the container lifecycle. For example, if a readiness probe fails, the container is not restarted; only the Service endpoints are updated. This is a key distinction that exam questions exploit. Candidates also need to know the three classic probe mechanisms (HTTP GET, TCP socket, and Exec) and when to use each; recent Kubernetes versions also offer a gRPC mechanism. A scenario might involve a legacy application that does not expose an HTTP endpoint, so the candidate must choose a TCP socket probe instead. Another scenario might involve an application that writes a readiness flag to a file, requiring an Exec probe.
The exam also tests configuration fields: initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold. A typical question might set a periodSeconds of 5 and a failureThreshold of 3, and ask when a container that starts failing is marked not ready. The answer is after three consecutive failed probes, which takes roughly 15 seconds (5 seconds times 3 failures). Candidates must also understand that readiness probes run continuously, not just at startup. The CKAD is a hands-on exam where you write and debug YAML files in a live Kubernetes cluster, so practical knowledge of readiness probes is essential. The CKA (Certified Kubernetes Administrator) exam also covers readiness probes, particularly in the context of Services and network traffic management.
Simple Meaning
Imagine you work at a busy post office that sorts and dispatches packages. Each sorting machine has a green "ready" light that tells the central conveyor belt system it is working properly and can accept new packages. If the light is off, the conveyor belt system skips that machine and sends packages to another one that is ready.
A readiness probe works exactly like that green light for a container running inside a Kubernetes pod. Kubernetes, which orchestrates your applications, uses this probe to ask the container a simple question: "Are you ready to serve traffic?" The container answers by running a small check, such as opening a specific network port or reading a certain file.
If the check passes, Kubernetes keeps the pod in the pool of active servers and routes requests to it. If the check fails, Kubernetes stops sending new requests to that pod, but it does not restart the container. This is different from a crash, where the container stops entirely.
The pod stays alive, and Kubernetes periodically rechecks the probe. Once the container recovers and the probe passes again, Kubernetes resumes sending traffic to it. This mechanism prevents users from hitting a pod that is still starting up, busy processing a heavy workload, or temporarily malfunctioning.
For example, an application might need to load a large configuration file or connect to a database before it can handle user requests. While it is doing that setup, the readiness probe fails, and Kubernetes keeps the pod hidden from incoming traffic. Only after the setup completes and the probe passes does the pod become visible and start serving.
This intelligent gating prevents errors, timeouts, and poor user experiences. Without readiness probes, Kubernetes would blindly send traffic to pods that are not ready, causing failures and frustration. Think of it like a store that only opens its doors when the staff has finished stocking the shelves and the cash registers are working.
The readiness probe is the manager who checks everything before unlocking the door.
Full Technical Definition
A readiness probe is a diagnostic mechanism defined in the Kubernetes PodSpec for each container. It tells the kubelet agent running on each node how to determine whether a container is ready to serve traffic. Readiness probes are part of the Kubernetes container lifecycle and are distinct from liveness probes and startup probes. While a liveness probe checks if the container is still running and restarts it if not, a readiness probe only controls whether the pod is added to the list of endpoints for a Kubernetes Service. This distinction is critical for rolling updates, scaling, and traffic management.
Kubernetes supports three long-standing readiness probe mechanisms: HTTP GET, TCP socket, and Exec. Recent versions also support a gRPC mechanism for applications that implement the gRPC health checking protocol. An HTTP GET probe makes an HTTP request to a specified endpoint, such as /healthz, on the container's IP address and port. If the response status code is at least 200 and below 400, the probe is considered successful. A TCP socket probe attempts to open a TCP connection to a specified port on the container. If the connection succeeds, the probe passes. An Exec probe runs a command inside the container, such as cat /tmp/ready, and checks the exit code. A zero exit code means success; any non-zero code means failure. Every mechanism accepts the same tuning parameters: initialDelaySeconds (how long to wait after the container starts before probing; default 0), periodSeconds (how often to probe; default 10), timeoutSeconds (how long to wait for a response; default 1), successThreshold (how many consecutive successes are needed to mark the container ready after a failure; default 1), and failureThreshold (how many consecutive failures mark the container not ready; default 3).
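As a rough sketch of the three classic mechanisms, the Pod below runs three containers, each with a different readiness probe handler. The Pod name, container names, images, ports, and paths are placeholders chosen for illustration and do not come from any real application.

```yaml
# Illustrative only: all names, images, ports, and paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probe-types-demo
spec:
  containers:
    - name: web                       # HTTP GET probe
      image: example.com/web:1.0      # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /healthz              # a 2xx or 3xx response counts as success
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
        timeoutSeconds: 1
        successThreshold: 1
        failureThreshold: 3
    - name: legacy-tcp                # TCP socket probe
      image: example.com/legacy:1.0   # hypothetical image
      readinessProbe:
        tcpSocket:
          port: 5432                  # passes if the connection can be opened
        periodSeconds: 10
    - name: worker                    # Exec probe
      image: example.com/worker:1.0   # hypothetical image
      readinessProbe:
        exec:
          command: ["cat", "/tmp/ready"]  # exit code 0 means ready
        periodSeconds: 5
```

A container can define only one readinessProbe with one handler, which is why the sketch uses separate containers. The Exec variant assumes the image ships a cat binary.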
When a pod is created, a container that defines a readiness probe starts out not ready, and probing begins after the initial delay. The pod is added to Service endpoints only once the probe has succeeded often enough to meet the success threshold. If a ready container later fails the probe enough times in a row to reach the failure threshold, the kubelet updates the pod's status to indicate it is not ready. The control plane controllers that manage Service endpoints (the Endpoints and EndpointSlice controllers) then remove the pod's IP address from the ready endpoints of any Service that selects the pod. This stops traffic from being routed to the pod. Once the probe starts passing again and meets the success threshold, the pod is marked ready and re-added to the Service's endpoints. Readiness probes are also used during rolling updates. When a new pod version is deployed, old pods keep serving until the new pods pass their readiness probes, which makes zero-downtime deployments possible when the rollout strategy is configured accordingly. Readiness also matters for horizontal autoscaling: when scaling on CPU, the Horizontal Pod Autoscaler sets aside pods that are not yet ready when it computes utilization. The Cluster Autoscaler, by contrast, reacts to pods that cannot be scheduled, not to readiness. Real-world implementations often use a custom HTTP endpoint that checks internal dependencies, such as database connections, cache availability, or upstream API health, before returning a 200 status. This makes the readiness probe a holistic health gate, not just a simple process check.
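To make the endpoint relationship concrete, here is a minimal Service sketch. The Service name, the app: web label, and the port numbers are assumptions for illustration. Only pods that both match the selector and currently report Ready are listed as ready endpoints for this Service.

```yaml
# Illustrative only: the name, label, and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web            # pods carrying this label are endpoint candidates
  ports:
    - port: 80          # port the Service exposes inside the cluster
      targetPort: 8080  # container port that receives the traffic
```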
Real-Life Example
Think of a large office building with a secure entrance that requires a key card to enter. Each employee has a key card that grants access only if they are currently authorized to be in the building. The security desk has a system that checks each card against a database. When you swipe your card, the system runs a readiness probe: it checks your employee status, whether you have completed safety training, and whether your access level matches the current time (for example, after-hours access). If everything is correct, the door unlocks and you can enter. If any check fails, the door stays locked, and you cannot enter the building. This system protects the office from unauthorized or unprepared individuals.
Now map this to Kubernetes. The pod is like an employee arriving at the building. The readiness probe is the security system that checks the pod's credentials. The HTTP endpoint /healthz is like the key card reader that sends your ID to the database. The database is the application inside the container that checks if it can connect to dependencies like a database or a cache. If the application reports success (status 200), the security system unlocks the door, and the pod is added to the Service's endpoint list. Traffic flows to the pod like people entering the building. If the application reports failure (status 500), the door stays locked, and the pod is removed from the Service. The security system retries the check every few seconds (periodSeconds). Once the application recovers and the probe passes again, the door unlocks, and the pod starts receiving traffic again. This analogy also explains the initial delay: the pod needs time to boot up and initialize, just like an employee needs time to scan their card after arriving at the lobby. The readiness probe waits that initial delay before checking, so it does not block the pod while it is still starting up.
Why This Term Matters
In real IT operations, applications must remain available and responsive even when individual components fail or restart. Readiness probes are a fundamental tool for achieving this reliability in containerized environments managed by Kubernetes. Without them, traffic would be routed to containers that are not yet ready, causing timeouts, 503 errors, and a poor user experience. For example, during a rolling update, old pods are gradually replaced with new versions. If a new pod fails its readiness probe, Kubernetes leaves the old pod running and serving traffic. This prevents downtime and ensures users never see error pages. This is critical for e-commerce platforms, banking systems, and any service that must be available 24/7.
Readiness also plays a part in graceful shutdowns. When a pod is deleted for maintenance or scaling, Kubernetes marks it as terminating and stops routing new Service traffic to it, whether or not a readiness probe is defined. An application can additionally fail its readiness probe on purpose, for example once it has received a shutdown signal, so that it is taken out of the Service endpoints while in-flight requests complete. This graceful draining reduces dropped connections and data loss. In cloud-native architectures, where microservices communicate with each other, readiness probes ensure that a service only receives traffic if it can handle it. For instance, a payment processing service might need to verify its connection to the fraud detection database. If that database is unreachable, the readiness probe fails, and the payment service stops accepting new requests until the connection is restored. This limits partial failures and cascading errors across the system. System administrators and SREs rely on readiness probes to maintain service level objectives (SLOs) and reduce mean time to recovery (MTTR). By automating health checks, they reduce the need for manual intervention and enable self-healing infrastructure. In short, readiness probes are a cornerstone of production-grade Kubernetes deployments.
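One commonly used way to give in-flight requests time to drain is to pair the readiness probe with a preStop hook and a termination grace period. The sketch below is an assumption about how such a pod might be configured, not a prescribed setup: the names, image, port, /ready path, and the 5-second sleep are all hypothetical, and the hook assumes the image contains a sleep binary.

```yaml
# Illustrative only: names, image, path, and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo
  labels:
    app: web
spec:
  terminationGracePeriodSeconds: 30   # upper bound for the whole shutdown
  containers:
    - name: web
      image: example.com/web:1.0      # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
      lifecycle:
        preStop:
          exec:
            # Delay the stop signal briefly so endpoint removal can propagate
            # before the process begins shutting down.
            command: ["sleep", "5"]
```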
How It Appears in Exam Questions
In the CKAD exam, readiness probes appear in multiple question formats. The most common is a configuration question: the candidate is given a YAML file for a Deployment or Pod that is missing a readiness probe, and the application is failing because pods are receiving traffic before they are ready. The candidate must add a readiness probe with specific parameters. For example, an HTTP readiness probe at /healthz with an initial delay of 10 seconds, a period of 5 seconds, and a failure threshold of 3. The question may also require the candidate to select the correct probe type based on the application description.
Another pattern is a troubleshooting question. The exam presents a scenario where a Service is not routing traffic to pods even though the pods appear to be running. The candidate must inspect the pod's status using kubectl describe pod and notice that the readiness probe is failing. They then need to identify the cause, such as a wrong port number or a missing endpoint path, and correct the YAML.
There are also scenario-based architecture questions that ask which probe type to use for a specific application. For example, for a web server, use an HTTP probe; for a database that accepts TCP connections, use a TCP probe; for an application that creates a temporary file to indicate readiness, use an Exec probe. Some questions combine readiness probes with rolling updates. The candidate might be asked how to achieve zero-downtime deployments using readiness probes. The correct answer involves ensuring the new pods pass their readiness probe before the old pods are terminated. Questions may also test the behavior of readiness probes in combination with liveness probes. For instance, if both probes fail, the liveness probe restarts the container, while the readiness probe removes the pod from the Service. Understanding this interplay is crucial.
Finally, exam questions may ask about the default behavior when no readiness probe is defined. The answer is that Kubernetes treats each container as ready as soon as it has started, so a Running pod receives traffic immediately, which is often not ideal because the application might still be initializing. This knowledge helps candidates recognize when a readiness probe is necessary.
Example Scenario
A company runs a web application on Kubernetes. The application is a Node.js server that connects to a MongoDB database at startup. The initial connection takes between 5 and 15 seconds depending on network conditions. Without a readiness probe, Kubernetes starts sending user traffic to the pod immediately after the container starts, even though the database connection is not yet established. Users see a "Service Unavailable" error page.
To fix this, the development team adds an HTTP readiness probe to the Deployment manifest. They create a /health endpoint in the Node.js application that returns HTTP 200 only after the database connection is confirmed. The probe is configured with an initialDelaySeconds of 10, a periodSeconds of 5, and a failureThreshold of 3. After the container starts, the pod is not ready and receives no traffic. Kubernetes waits 10 seconds before the first probe and then checks the /health endpoint every 5 seconds. While the database connection is still being established, the endpoint returns HTTP 503, the probe fails, and the pod stays out of the Service endpoints. As soon as the connection succeeds and one probe passes (successThreshold defaults to 1), Kubernetes marks the pod ready and adds it to the Service, and users can access the application without errors. If the application later loses the database connection, three consecutive failures (about 15 seconds) mark the pod as not ready and remove it from the Service endpoints until the probe passes again. This simple change eliminates the startup errors and improves user experience during deployments.
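A Deployment for this scenario might look roughly like the sketch below. The probe path and the three timing values come from the scenario; the Deployment name, labels, image, replica count, and port 3000 are assumptions added to make the manifest complete.

```yaml
# Illustrative only: name, labels, image, replicas, and port are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: node-web
  template:
    metadata:
      labels:
        app: node-web
    spec:
      containers:
        - name: node-web
          image: example.com/node-web:1.0   # hypothetical image
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /health          # returns 200 only once MongoDB is connected
              port: 3000
            initialDelaySeconds: 10  # wait 10s before the first probe
            periodSeconds: 5         # then probe every 5s
            failureThreshold: 3      # 3 consecutive failures mark it not ready
```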
Common Mistakes
Confusing readiness probes with liveness probes and using them interchangeably.
A readiness probe only affects traffic routing and does not restart the container. A liveness probe restarts the container if it fails. Using a readiness probe where a liveness probe is needed (or vice versa) leads to incorrect behavior. For example, if a container deadlocks while its process keeps running, a readiness probe will remove the pod from the Service but will not restart it, so the pod stays out of rotation until someone intervenes.
Remember this distinction: readiness probes answer "Is this container ready to serve traffic?" and liveness probes answer "Is this container still alive?" Use readiness for traffic gating and liveness for crash recovery.
Setting the initialDelaySeconds too low or not setting it at all.
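If initialDelaySeconds is zero or very low, the probe starts checking almost immediately after the container starts, while the application may still be initializing. For a readiness probe this does not take a working pod out of service, because the pod starts out not ready, but it produces a stream of failed-probe events that can obscure real problems. The same low delay on a liveness probe is more dangerous, because early failures can restart the container before it ever finishes starting.
Set initialDelaySeconds based on the known startup time of your application. For example, if your app takes 10 seconds to initialize, set initialDelaySeconds to 12 or 15 to give a buffer. For applications with long or unpredictable startup times, a startup probe is usually a better fit than a very large initial delay.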
Using a readiness probe that checks the same condition as a liveness probe, like a simple process check.
If the readiness probe fails for the same reason the liveness probe fails, you lose the benefit of traffic gating. For example, if both probes check whether the process is running, a crashed process will cause the readiness probe to remove the pod from the Service, and the liveness probe will restart it. But the readiness probe's removal is redundant because the liveness probe will restart the container anyway. Additionally, the readiness probe might fail and remove the pod even when the container is still serving traffic but temporarily slow.
Design the readiness probe to check application-specific readiness conditions, such as dependency availability or cache warmup, not just process health. Let the liveness probe handle process-level health.
Forgetting that readiness probes run continuously, not just at startup.
Some learners think the readiness probe only runs during the initial startup phase. In reality, it runs continuously at the configured period. If the application becomes unhealthy later (for example, it loses its database connection), the readiness probe will fail, and the pod will be removed from the Service. This is a desired behavior, but if you do not account for it, you might be surprised when your pod stops receiving traffic hours after deployment.
Design your application's health endpoint to reflect readiness throughout its lifecycle. If the application becomes temporarily unable to serve traffic (e.g., during a maintenance window), the probe should fail to prevent traffic from being routed to it.
Exam Trap — Don't Get Fooled
A question presents a scenario where a pod's readiness probe fails, and the candidate assumes the pod will be restarted by Kubernetes. Remember the core purpose of each probe. A failing readiness probe only removes the pod from the Service's endpoint list.
It does not restart the pod. To trigger a restart, you need a liveness probe. In the exam, read the question carefully. If it asks about traffic routing or Service endpoints, the answer involves the readiness probe.
If it asks about container restart or crash recovery, the answer involves the liveness probe. Practice writing YAML for both to solidify the difference.
Commonly Confused With
A liveness probe checks if a container is still running and restarts it if the probe fails. A readiness probe checks if the container is ready to serve traffic and only affects traffic routing, not container lifecycle. They serve different purposes and should be configured separately.
If your web server crashes, the liveness probe detects it and restarts the container. If your web server is running but still loading a large configuration file, the readiness probe fails so that no traffic is sent until it is ready.
A startup probe is used for containers that have a slow initialization process, such as legacy applications that take minutes to start. It runs only at startup; until it succeeds, the kubelet holds back the liveness and readiness probes, and once it succeeds they take over. A readiness probe runs continuously throughout the container's life. They are complementary but not interchangeable.
For a Java application that takes 2 minutes to start, use a startup probe whose failureThreshold multiplied by periodSeconds comfortably exceeds 2 minutes. Once the application is up, readiness probes take over to manage traffic routing during normal operation.
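A minimal sketch of that combination follows, assuming a hypothetical image, port 8080, and a /healthz path. The startup probe allows up to 30 failures at 5-second intervals, which gives the application about 150 seconds to come up before the container is restarted.

```yaml
# Illustrative only: name, image, port, path, and timings are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: slow-java-app
spec:
  containers:
    - name: app
      image: example.com/java-app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        failureThreshold: 30    # 30 x 5s = up to 150s allowed for startup
      readinessProbe:           # begins only after the startup probe succeeds
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```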
The /healthz endpoint is a common convention for exposing application health information, but it is not a probe itself. A readiness probe can be configured to use an HTTP GET request to /healthz, but it can also use TCP or Exec methods. The endpoint is the target of the probe, not the probe itself.
Your application might have an /healthz route that returns HTTP 200 when everything is fine. You can then create a readiness probe that calls that route. But you could also create a TCP probe that checks port 8080 directly, without using any HTTP endpoint.
Step-by-Step Breakdown
Define the Probe Type
Choose one of three probe types: HTTP GET, TCP socket, or Exec. HTTP GET is best for web applications that expose a health endpoint. TCP socket is suitable for services that accept TCP connections but do not have an HTTP interface. Exec is for applications that indicate readiness through a file or command. The choice depends on your application's architecture.
Configure the Probe Parameters
Set initialDelaySeconds to give the container time to initialize before the first probe. Set periodSeconds to define how often the probe runs. Set timeoutSeconds for the maximum wait time for a probe response. Set successThreshold and failureThreshold to control how many consecutive successes or failures are needed to change the readiness state. For example, failureThreshold of 3 with periodSeconds of 5 means the pod is marked not ready after 15 seconds of failures.
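The fragment below annotates each parameter with its meaning and the Kubernetes default. It is a fragment of a container spec, not a complete manifest, and the path and port are assumptions.

```yaml
# Fragment of a container spec; the path and port are assumptions.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10  # wait 10s after container start (default 0)
  periodSeconds: 5         # probe every 5s (default 10)
  timeoutSeconds: 2        # each probe must answer within 2s (default 1)
  successThreshold: 1      # 1 success marks the container ready (default 1)
  failureThreshold: 3      # 3 consecutive failures mark it not ready (default 3)
# With periodSeconds 5 and failureThreshold 3, a container that starts failing
# is marked not ready after three failed probes, roughly 3 x 5s = 15s.
```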
Write the YAML Manifest
Add the readinessProbe field under the container specification in your Pod or Deployment YAML. For an HTTP probe, specify the httpGet path, port, and optionally httpHeaders. For a TCP probe, specify the tcpSocket port. For an Exec probe, specify the command to run inside the container. Ensure the YAML is syntactically correct and aligns with your application's capabilities.
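As a sketch of where the field sits, the Pod below places readinessProbe at the container level, alongside image and ports. The names, image, port, path, and the X-Probe header are hypothetical values used only to show the layout.

```yaml
# Illustrative only: names, image, port, path, and header are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: readiness-demo
  labels:
    app: readiness-demo
spec:
  containers:
    - name: app
      image: example.com/app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:              # sibling of image and ports
        httpGet:
          path: /healthz
          port: 8080
          httpHeaders:             # optional custom request headers
            - name: X-Probe
              value: readiness
        initialDelaySeconds: 10
        periodSeconds: 5
```

For a TCP probe, the httpGet block would be replaced by a tcpSocket block with a port; for an Exec probe, by an exec block with a command list.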
Deploy and Verify
Apply the manifest using kubectl apply. Use kubectl get pods to observe the pod's status. Use kubectl describe pod to see the probe configuration and any readiness probe failures, which are recorded as Unhealthy warning events (successful probes do not generate events). Verify that when the probe fails, the READY column of kubectl get pods shows 0/1 for a single-container pod while STATUS remains Running. Also verify that the Service endpoints are updated accordingly using kubectl get endpoints.
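For reference, the status section printed by kubectl get pod with the -o yaml flag looks roughly like the abbreviated sketch below when a readiness probe is failing. The container name web is a placeholder, and real output contains more fields and conditions.

```yaml
# Abbreviated, illustrative status of a pod whose readiness probe is failing.
status:
  phase: Running                   # the pod keeps running
  conditions:
    - type: Ready
      status: "False"              # excluded from Service endpoints
      reason: ContainersNotReady
    - type: ContainersReady
      status: "False"
      reason: ContainersNotReady
  containerStatuses:
    - name: web                    # placeholder container name
      ready: false
      restartCount: 0              # a failing readiness probe does not restart it
```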
Test Behavior During Failures
Simulate a failure by stopping the application's health endpoint or blocking the port. Observe that the pod is marked not ready and removed from the Service. Then restore the health endpoint and confirm the pod becomes ready again and is re-added to the Service. This validates that your readiness probe configuration works correctly in both failure and recovery scenarios.
Integrate with Rolling Updates
In a Deployment, configure a rolling update strategy with a maxSurge and maxUnavailable setting. The readiness probe ensures that new pods are only added to the Service after they pass the probe, and old pods are terminated only after new pods are ready. This creates a seamless zero-downtime deployment. Test by updating the Deployment image and monitoring the rollout process with kubectl rollout status.
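A rolling update strategy combined with a readiness probe might be sketched as follows. The Deployment name, labels, image, replica count, port, and path are assumptions. With maxUnavailable set to 0, an old pod is removed only after a replacement has become ready.

```yaml
# Illustrative only: name, labels, image, replicas, port, and path are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod above the desired count
      maxUnavailable: 0    # never drop below the desired number of ready pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.com/web:2.0   # hypothetical new version
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
```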
Practical Mini-Lesson
Readiness probes are a critical part of Kubernetes pod design that every developer and administrator must understand to build reliable applications. In practice, you will define readiness probes in the YAML manifest of your Deployment, StatefulSet, or standalone Pod. The most common approach is to create a lightweight HTTP endpoint in your application, typically at /healthz or /ready, that returns a 200 status code only when the application is fully prepared to handle requests. This endpoint should check all critical dependencies: database connections, cache availability, upstream service connectivity, and any other resources the application needs to function. If any dependency is missing, the endpoint should return a non-2xx status, such as 503 Service Unavailable.
When configuring the probe parameters, you must understand the startup behavior of your application. For example, a Java application running on a JVM might take 30 seconds to initialize, while a Node.js app might be ready in 5 seconds. Set initialDelaySeconds appropriately to avoid needless probe failures during startup. A good practice is to add a buffer of 5 to 10 seconds beyond the expected startup time. The periodSeconds value should balance quick detection of failures against the load that probing places on the application. A period of 10 seconds, which is also the default, is common for production workloads. The failureThreshold determines how many consecutive failed probes are tolerated before the pod is considered not ready. For applications that experience temporary spikes, keeping the default threshold of 3 or raising it prevents flapping, where the pod oscillates between ready and not ready states.
One common mistake professionals make is using the same endpoint for both readiness and liveness probes. While it is possible, it is often not ideal. The readiness probe should be more detailed, checking specific dependencies that might fail temporarily without crashing the process. The liveness probe should be simpler, checking only that the process is alive and responsive. For example, a database connection loss might be recoverable without restarting the container, so the readiness probe should fail, but the liveness probe should still pass. This allows the container to continue running and retry the database connection without being killed.
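The separation described above can be sketched with two endpoints on the same container. The /ready and /livez paths, the image, and the port are assumptions: /ready is assumed to check dependencies such as the database, while /livez is assumed to report only that the process can respond.

```yaml
# Illustrative only: name, image, port, and both paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: two-probes-demo
spec:
  containers:
    - name: app
      image: example.com/app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:              # detailed: fails when a dependency is down
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 3
      livenessProbe:               # simple: fails only if the process is stuck
        httpGet:
          path: /livez
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
```

With this split, a lost database connection takes the pod out of the Service without restarting the container, so the application can keep retrying the connection.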
In real-world production systems, readiness probes are also used in blue-green deployments and canary releases. During a canary release, a small percentage of traffic is routed to new pods. The readiness probe ensures that if the new version has issues, it is quickly removed from the Service, preventing it from impacting users. This makes readiness probes a foundational tool for safe application rollouts. Professionals should also monitor readiness probe metrics through Kubernetes events or third-party monitoring tools like Prometheus. If a pod's readiness probe fails frequently, it is a sign of underlying issues that need investigation. Understanding readiness probes deeply allows you to design self-healing, resilient systems that minimize downtime and deliver a consistent user experience.
Memory Tip
Readiness Probe: the door guard that blocks traffic until the server says 'I am ready.' Only affects routing, never restarts the container.
Frequently Asked Questions
What is the difference between a readiness probe and a liveness probe?
A readiness probe determines if a container is ready to serve traffic and only affects the Service endpoints. A liveness probe determines if a container is still running and restarts it if the probe fails. They serve different purposes and are often used together.
Can I define both a readiness probe and a liveness probe on the same container?
Yes, you can and often should define both. The readiness probe handles traffic gating, while the liveness probe handles crash recovery. They work independently.
What happens if a readiness probe fails after the container has been running for a long time?
The pod is marked as not ready and is removed from the Service endpoints. New traffic stops flowing to that pod. The probe continues to run, and if the container recovers and the probe passes, the pod is re-added to the Service.
How do I choose the right type of readiness probe?
Use HTTP if your application exposes an HTTP endpoint that can reflect readiness. Use TCP if the application accepts TCP connections but does not have an HTTP health endpoint. Use Exec if you need to run a custom command inside the container to check readiness, like checking for a file.
What value should I set for initialDelaySeconds?
Set initialDelaySeconds based on the time your application takes to start up. Check logs and startup patterns. Add a buffer of 5 to 10 seconds to avoid premature probe failures.
Can a readiness probe cause a pod to be restarted?
No, a failing readiness probe never causes a restart; it only affects traffic routing. Restarts are triggered by a failing liveness probe or a failing startup probe.
Summary
Readiness probes are a Kubernetes mechanism that determines whether a container inside a pod is ready to accept network traffic. They act as a gatekeeper, ensuring that only healthy, fully initialized pods receive requests from users or other services. Unlike liveness probes, which restart failing containers, readiness probes only update the Service endpoint list, removing or adding pods based on their health status.
Understanding readiness probes is essential for the CKAD exam, where candidates must configure them in YAML manifests, choose the correct probe type, and set appropriate parameters. In real-world operations, readiness probes enable zero-downtime deployments, graceful shutdowns, and resilient microservices by preventing traffic from reaching pods that are starting up, overloaded, or temporarily unhealthy. Common mistakes include confusing readiness probes with liveness probes, setting initial delays incorrectly, and using overly simplistic check logic.
Remember the core rule: readiness probes control traffic flow, not container lifecycle. By mastering readiness probes, you build applications that are robust, self-healing, and capable of maintaining high availability even during failures and updates. For exam preparation, focus on hands-on practice with kubectl and YAML, and ensure you can distinguish between probe types and their behaviors in different scenarios.