What Does Resource Requests and Limits Mean?

Also known as: Resource Requests and Limits, Kubernetes CPU request, Kubernetes memory limit, CKAD resource management, Kubernetes resource allocation

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

Quick Definition

Resource Requests and Limits are like reserving a seat and setting a spending cap for each container in a Kubernetes cluster. The Request is the minimum amount of CPU and memory a container is guaranteed to get. The Limit is the maximum amount it can use, even if extra resources are available. These settings help the cluster run reliably by preventing one container from starving others.

Must Know for Exams

The CNCF Certified Kubernetes Application Developer (CKAD) exam explicitly tests Resource Requests and Limits. In the current CKAD curriculum the topic sits in the 'Application Environment, Configuration and Security' domain, which lists understanding requests, limits, and quotas among its objectives. Because the exam is task-based, expect the resources block to appear in at least one task, either on its own or as part of a larger Pod or Deployment exercise.

The exam tests your ability to read and write YAML manifests that include the resources block. You may be asked to modify an existing Deployment to add CPU and memory Requests and Limits. Alternatively, you might need to inspect a running Pod and determine if its Resources are set correctly based on a scenario description. The exam emphasizes hands-on tasks using a live cluster, so you must know the exact syntax and fields.

Questions often present a scenario where a Pod is failing due to OOMKilled status. You must recognize that the memory Limit is too low and increase it. Another common pattern involves scheduling failures: a Pod stays in Pending state because the node has insufficient resources. You need to check the Pod events and adjust Requests to fit the available capacity.

The CKA (Certified Kubernetes Administrator) exam also covers this topic but focuses more on node resource management and troubleshooting. The KCNA (Kubernetes and Cloud Native Associate) exam includes basic understanding of Resource Requests and Limits as part of core Kubernetes concepts. For CKAD, you need practical skills, not just theory. You will be expected to edit manifests, apply changes, and verify results using kubectl commands.

Exam traps include confusing Requests with Limits, using incorrect units (like 'mb' instead of 'Mi'), and forgetting that memory Limits trigger OOM kills while CPU Limits cause throttling. Practicing with real YAML files is the best preparation.

Simple Meaning

Imagine you are in a shared office space with several teams working at desks. Each team needs a certain amount of space to work comfortably, like a desk, a chair, and some storage. The office manager wants to make sure every team has enough room, but also prevents one team from taking up too much space and leaving others cramped. Resource Requests and Limits work exactly like this for computer programs called containers that run on a Kubernetes cluster.

When you deploy a container to Kubernetes, you can tell the system: 'This container needs at least 0.5 of a CPU core and 256 megabytes of memory to run properly.' That is the Resource Request. Kubernetes uses this information to find a worker node with enough free capacity to host the container. If the node does not have at least that much CPU and memory available, Kubernetes will not place the container there. This ensures the container gets the resources it needs to start and run.

You can also say: 'But this container should never use more than 1 CPU core and 512 megabytes of memory.' That is the Resource Limit. If the container tries to use more CPU than the limit, Kubernetes will slow it down. If it tries to use more memory than the limit, Kubernetes will kill the container and restart it. This prevents a runaway program from hogging all the resources on the node and impacting other containers.

Think of it like a library study room with a reservation system. You reserve a spot for one hour (your Request), guaranteeing you have a place. You also agree not to stay longer than one hour (your Limit), so other people can use the room. Without these rules, someone might show up and find no space, or one person might stay all day and lock others out. Resource Requests and Limits keep the Kubernetes cluster fair and predictable.

Full Technical Definition

In Kubernetes, Resource Requests and Limits are fields defined in the containers section of a Pod specification. They are part of the core resource management system that controls how CPU and memory are allocated to containers running on cluster nodes. These settings are defined under spec.containers[].resources in the Pod YAML manifest. The resources block can contain requests and limits objects, each specifying cpu and memory values.

CPU is measured in CPU units: 1 CPU equals one physical core or one virtual CPU (vCPU) on the host node. You can specify fractional values such as 0.5 for half a core, or use millicpu notation such as 500m, which means the same thing. Memory is measured in bytes, written either as a plain integer or with a suffix. Binary suffixes such as Ki, Mi (mebibytes), and Gi (gibibytes) are the usual choice; decimal suffixes such as k, M, and G are also accepted. For example, 256Mi equals 268,435,456 bytes.
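
As a small illustration of these notations, the fragment below shows a resources block as it would appear under one entry of spec.containers[]. The specific values are chosen only to demonstrate the units and are not taken from any real workload.

```yaml
# Fragment of one container entry in a Pod manifest (illustrative values)
resources:
  requests:
    cpu: "0.5"       # half a core; identical in meaning to 500m
    memory: 256Mi    # 256 * 1,048,576 = 268,435,456 bytes
  limits:
    cpu: 1500m       # one and a half cores
    memory: 1Gi      # 1,073,741,824 bytes (1G would be 1,000,000,000)
```

Quoting CPU values such as "0.5" or "1" is optional but keeps the value a string regardless of how the YAML parser would otherwise type it.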

When a Pod is scheduled, the kube-scheduler adds up the Requests of the Pods already assigned to each node and compares what remains of the node's allocatable CPU and memory with the new Pod's total Requests. It places the Pod only on a node where that remainder covers the Requests. This is how every container is assured its requested resources. The scheduler does not consider Limits during placement; Limits are enforced at runtime on the node.

At runtime, the kubelet and the container runtime configure Linux control groups (cgroups) to enforce Limits. For CPU, the kernel's Completely Fair Scheduler (CFS) bandwidth control caps usage: a container that uses up its quota within a scheduling period is throttled, meaning its processes are paused until the next period begins. For memory, exceeding the Limit triggers an Out of Memory (OOM) kill: the kernel terminates the container's process, and the kubelet restarts the container according to the Pod's restart policy.

Resource Requests and Limits are also used by the Vertical Pod Autoscaler (VPA) and the cluster autoscaler. The VPA can adjust Requests and Limits automatically based on actual usage. The cluster autoscaler uses Request values to decide if it needs to add or remove nodes. Requests and Limits also determine a Pod's quality of service (QoS) class. A Pod is Guaranteed when every container sets both CPU and memory Requests and Limits and each Request equals its Limit. A Pod is Burstable when it does not meet that rule but at least one container sets a Request or a Limit. A Pod is BestEffort when no container sets any Requests or Limits. These classes determine how the system handles resource pressure and eviction.

Real-Life Example

Think of a busy airport with a tarmac where planes park at gates. Each airline needs to reserve gate time and fuel for its flights. The Resource Request is like reserving a specific gate for your plane for a scheduled window. The airline says, 'I need gate A3 from 2:00 PM to 3:00 PM, and I need 5,000 liters of fuel.' The airport control checks the schedule. If gate A3 is available and enough fuel is in the tank, they confirm the reservation. This guarantees the plane has a place to park and enough fuel to depart.

Now, imagine the airline also agrees not to exceed a certain fuel usage and not to block the gate for longer than needed. That is the Resource Limit. They promise, 'I will not need more than 6,000 liters of fuel, and I will leave the gate by 3:00 PM at the latest.' The airport uses this to plan resources for other airlines.

If another flight tries to pull into the same gate at the same time, the system prevents it because that gate is already reserved. If a plane tries to take more fuel than its limit, the fuel pump stops. If it stays at the gate past 3:00 PM, airport staff will tow it away. In Kubernetes, if a container tries to use more memory than its limit, the system kills it. If it tries to use more CPU, it gets throttled.

Without these reservations and limits, chaos would ensue. Planes would compete for gates, some would run out of fuel mid-flight, and others would block the tarmac. Similarly, without Resource Requests and Limits, containers could starve each other of CPU and memory, causing crashes, slowdowns, and unpredictable behavior in the entire cluster.

Why This Term Matters

Resource Requests and Limits are fundamental to running reliable and efficient applications on Kubernetes. In real IT work, servers are not infinite. They have finite CPU cores and memory sticks. When you run multiple containers on the same server, they share these resources. Without proper configuration, one container can consume all available CPU or memory, causing other containers to slow down, crash, or fail to start. This leads to downtime, degraded performance, and unhappy users.

For system administrators and platform engineers, setting correct Requests and Limits allows you to plan capacity. You can estimate how many containers fit on a node, how many nodes you need, and when to scale. This directly impacts cloud costs because each node costs money. Overprovisioning wastes money; underprovisioning causes failures. Requests and Limits give you a dial to tune this balance.

In cloud-native environments, Resource Requests and Limits also affect auto-scaling decisions. The horizontal pod autoscaler (HPA) uses the average CPU or memory utilization relative to Requests to decide when to add or remove replicas. The cluster autoscaler uses the sum of Requests on unschedulable Pods to decide if it needs to add a new node. Without Requests, the HPA has no baseline for computing utilization percentages and the cluster autoscaler has nothing to size nodes against.
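
A hedged sketch of how an HPA ties into Requests: the manifest below targets 70% average CPU utilization, where utilization is measured as a percentage of each Pod's CPU Request. The object name web-hpa, the Deployment name web, the replica bounds, and the 70% target are all illustrative assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU Request, averaged over Pods
```

If the containers in that Deployment request 200m of CPU, this target corresponds to an average of roughly 140m of actual usage per Pod.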

From a security and stability perspective, Limits prevent noisy neighbors. In a multi-tenant cluster, a buggy or malicious container could try to consume all node resources, starving other workloads. Limits act as a safety net, capping the damage such containers can cause. This isolation is critical for production environments where uptime and fairness matter.

Finally, many Kubernetes distributions and managed services enforce quotas and admission controllers that require Requests and Limits. Without them, your Pods may not be allowed to run. Understanding this concept is not optional for anyone working with Kubernetes in a professional capacity.

How It Appears in Exam Questions

Resource Requests and Limits appear in several distinct patterns across CKAD and other Kubernetes exams. The most common is the configuration question: you receive a partial YAML file for a Pod or Deployment and must add the resources block. For example, the question might say, 'Create a Pod named nginx-pod that requests 256Mi of memory and 500m of CPU, and limits memory to 512Mi and CPU to 1.' You must edit the YAML and apply it to the cluster.
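
One possible answer to the sample task above is sketched here. The task names only the Pod and its resource values, so the container name and the nginx image are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
    - name: nginx            # assumed container name
      image: nginx           # assumed image; the task does not specify one
      resources:
        requests:
          memory: 256Mi
          cpu: 500m
        limits:
          memory: 512Mi
          cpu: "1"
```

Saved to a file, the manifest is applied with kubectl apply -f followed by the file name, and kubectl describe pod nginx-pod shows the resulting Requests and Limits.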

Another frequent pattern is the troubleshooting question. The question describes a symptom: a Pod is restarting repeatedly with 'OOMKilled' status, or a Pod is stuck in 'Pending' state with an event message like '0/3 nodes are available: 3 Insufficient memory'. You must identify the root cause, then fix it by adjusting the Requests or Limits in the Pod specification. The solution often involves increasing the memory Limit or decreasing the Request to fit available node resources.

Scenario-based questions ask you to analyze resource usage. For instance, you might see a Pod that has no Resources defined, and the question asks why it is performing poorly when other Pods are active. The answer involves explaining that the Pod has a BestEffort QoS class, so it receives the lowest CPU priority when the node is busy and is among the first to be evicted under resource pressure. You must then apply appropriate Requests and Limits.

Comparison questions ask you to differentiate between QoS classes: Guaranteed, Burstable, and BestEffort. You might be given three Pod specs and asked to classify them. To answer correctly, you must know that a Pod is Guaranteed when every container has CPU and memory Requests equal to its Limits, Burstable when at least one container sets a Request or a Limit but the Guaranteed rule is not met, and BestEffort when no container defines any resources.
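
The three cases can be sketched as three minimal Pods. The Pod names, the container name, and the nginx image are placeholders for illustration.

```yaml
# Guaranteed: CPU and memory are both set, and each Request equals its Limit
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests: {cpu: 500m, memory: 256Mi}
        limits: {cpu: 500m, memory: 256Mi}
---
# Burstable: resources are set, but Requests are lower than Limits
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable
spec:
  containers:
    - name: app
      image: nginx
      resources:
        requests: {cpu: 250m, memory: 128Mi}
        limits: {cpu: 500m, memory: 256Mi}
---
# BestEffort: no Requests or Limits on any container
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort
spec:
  containers:
    - name: app
      image: nginx
```

Once a Pod is created, the class Kubernetes assigned is recorded in its status.qosClass field and also appears in the output of kubectl describe pod.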

Finally, multi-resource questions combine Requests and Limits with other concepts like Namespace ResourceQuotas and LimitRanges. You might be asked to explain why a Pod fails to create despite having Requests set, due to a namespace quota. These questions test your understanding of how Kubernetes enforces resource governance at multiple levels.

Example Scenario

You are a developer working on a web application that runs in a Kubernetes cluster. Your application consists of two components: a front-end web server and a back-end database. The front-end normally uses about 0.2 CPU cores and 150 MB of memory during regular traffic, but during a traffic spike, it can surge to 0.8 CPU cores and 400 MB of memory. The database uses about 0.5 CPU cores and 1 GB of memory steadily.

Your manager tells you to deploy these components in a cluster that is shared with other teams' applications. You want to ensure your application always has enough resources to serve users, but you do not want to waste resources or hurt other applications. How do you configure Resource Requests and Limits?

For the front-end, you set a Request of 200m CPU and 150Mi memory, which guarantees the minimum needed for normal operation. You set a Limit of 1 CPU and 500Mi memory, allowing it to burst during traffic spikes but preventing it from consuming more than a full core or half a gigabyte. For the database, you set a Request of 500m CPU and 1Gi memory, matching its steady usage, and a Limit of 1 CPU and 2Gi memory to allow some headroom for background tasks.

Now, when the cluster is under heavy load from other applications, your front-end is guaranteed at least 200m CPU and 150Mi memory, so it stays responsive. If a memory leak causes the database to use more than 2Gi, Kubernetes kills and restarts it, preventing the leak from crashing the entire node. This configuration keeps your application stable and fair to others.
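
The configuration described in this scenario might look like the two Deployments below. This is a sketch: the object names, labels, replica counts, and image references are hypothetical, and only the resource values come from the scenario.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                       # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: web
          image: example.com/frontend:1.0   # hypothetical image
          resources:
            requests:
              cpu: 200m        # normal-traffic baseline
              memory: 150Mi
            limits:
              cpu: "1"         # room to burst during spikes
              memory: 500Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database                       # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      containers:
        - name: db
          image: example.com/database:1.0   # hypothetical image
          resources:
            requests:
              cpu: 500m        # steady-state usage
              memory: 1Gi
            limits:
              cpu: "1"         # headroom for background tasks
              memory: 2Gi
```

Because Requests are lower than Limits in both cases, both sets of Pods fall into the Burstable QoS class.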

Common Mistakes

Setting a Limit but not setting a Request, or vice versa.

Kubernetes uses the two values for different purposes. If you set a Limit without a Request, Kubernetes copies the Limit into the Request, so the container may reserve far more than it needs and become harder to schedule. If you set a Request without a Limit, the container can use whatever spare capacity the node has, potentially starving other containers, unless a LimitRange in the namespace supplies a default Limit.

Define both Requests and Limits explicitly for each container. If every container in the Pod sets CPU and memory Requests equal to its Limits, the Pod gets the Guaranteed QoS class, which provides the highest level of stability.

Using wrong units for memory, like MB or GB instead of Mi or Gi.

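Kubernetes accepts both binary suffixes (Ki, Mi, Gi) and decimal suffixes (k, M, G), and they are not interchangeable: 1Mi equals 1,048,576 bytes, while 1M equals 1,000,000 bytes. MB and GB are not valid suffixes at all, so a manifest that uses them is rejected. A lowercase m means 'milli', so a memory value such as 400m asks for 0.4 bytes rather than 400 mebibytes. Mixing these up leads to incorrect capacity calculations, unexpected OOM kills, or scheduling failures.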

Always use the standard Kubernetes memory units: Mi (mebibytes), Gi (gibibytes), or plain bytes as an integer. For CPU, use millicpus (m) for fractions of a core, like 500m for half a core.

Setting Requests higher than the available resources on any node.

If a Pod's total Requests exceed the allocatable resources of every node, including the largest one, the Pod will never be scheduled and will remain in Pending state indefinitely. This causes application unavailability.

Check the capacity of your nodes using kubectl describe nodes. Ensure that the sum of Requests for your Pod fits within the allocatable resources of at least one node. Use smaller Requests if needed, or add larger nodes to the cluster.

Assuming CPU Limits work the same as memory Limits.

When a container exceeds its memory Limit, it gets killed (OOMKilled). When it exceeds its CPU Limit, it gets throttled — the kernel slows it down, but it does not get killed. Beginners often panic when they see high CPU throttling and think the container is failing.

Understand the difference: exceeding the memory Limit is fatal to the container, while exceeding the CPU Limit only degrades performance. Monitor CPU throttling as a signal to increase the Limit or optimize the application, not as a crash event.

Forgetting that resource limits apply per container, not per Pod.

A Pod can have multiple containers, each with its own resources block. If you set limits only on one container but ignore another, the second container can consume all node resources, affecting the first container.

When writing a multi-container Pod spec, define separate Requests and Limits for every container. Consider the total resource footprint of all containers in the Pod when scheduling.
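
A minimal sketch of a two-container Pod with a separate resources block for each container. The Pod name, the log-shipper sidecar, its command, and all values are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar           # hypothetical name
spec:
  containers:
    - name: web
      image: nginx
      resources:
        requests: {cpu: 250m, memory: 128Mi}
        limits: {cpu: 500m, memory: 256Mi}
    - name: log-shipper            # hypothetical sidecar
      image: busybox
      command: ["sh", "-c", "tail -f /dev/null"]   # placeholder that keeps the container alive
      resources:
        requests: {cpu: 50m, memory: 32Mi}
        limits: {cpu: 100m, memory: 64Mi}
# For scheduling, the Pod's total Requests are 300m CPU and 160Mi memory.
```

Each Limit is enforced on its own container, so the sidecar being capped at 64Mi does not change the 256Mi available to the web container.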

Exam Trap — Don't Get Fooled

An exam question shows a Pod YAML with memory Limit set to 512Mi and memory Request set to 256Mi. The Pod runs fine for a few minutes, then enters CrashLoopBackOff with the reason 'OOMKilled'. The candidate is asked to fix the issue.

Many candidates incorrectly increase the memory Request. Remember that OOMKilled is caused by exceeding the memory Limit, not the memory Request. The fix is to increase the memory Limit (e.g., to 1Gi) to allow the container more headroom. The Request can stay the same or be adjusted proportionally. Always check the Limit first when diagnosing OOM kills.
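
Under the assumptions of this trap, the corrected resources block might look like the fragment below. Only the memory values come from the scenario; any CPU settings in the original manifest would stay as they were.

```yaml
# Fragment of the container spec after the fix
resources:
  requests:
    memory: 256Mi    # unchanged; the Request did not cause the OOM kill
  limits:
    memory: 1Gi      # raised from 512Mi to give the container headroom
```

On many cluster versions the resources of a running Pod cannot be edited in place, so a bare Pod usually has to be deleted and recreated with the new manifest, whereas a Deployment rolls out replacement Pods when its template changes.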

Commonly Confused With

Resource Requests and Limits vs Resource Quota

Resource Requests and Limits apply to individual containers, limiting how much CPU and memory a single container can request or use. A Resource Quota is a namespace-level policy that limits the total amount of resources that all Pods in a namespace can collectively consume. Resource Quota enforces aggregate limits, while Requests and Limits control individual behavior.

You set a Request of 256Mi for container A. That guarantees container A at least 256Mi. A Resource Quota on the namespace might say all containers combined cannot exceed 10Gi of memory. Both work together but at different scopes.
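
A namespace quota along these lines could be written as follows. The 10Gi memory cap mirrors the example above; the object name, the namespace, and the other three values are hypothetical.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota           # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"        # sum of CPU Requests across the namespace
    requests.memory: 8Gi     # sum of memory Requests
    limits.cpu: "8"          # sum of CPU Limits
    limits.memory: 10Gi      # sum of memory Limits
```

Once a quota covers a resource in this way, new Pods in the namespace must declare that Request or Limit, or receive it from a LimitRange default, otherwise their creation is rejected.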

Resource Requests and Limits vs LimitRange

LimitRange is a policy that sets default Requests and Limits for containers in a namespace if they are not specified in the Pod spec. You might set a default Limit of 512Mi memory in a LimitRange so any container without an explicit limit automatically gets that cap. Resource Requests and Limits are the actual values assigned to a specific container, while LimitRange provides defaults and constraints.

If you forget to set a Limit on a container, a LimitRange can automatically assign one. But the container still has its own explicit or defaulted Requests and Limits.
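
A sketch of a LimitRange that supplies the 512Mi default memory Limit mentioned above. The object name, the namespace, and every value other than that 512Mi default are assumptions made for illustration.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # hypothetical name
  namespace: team-a          # hypothetical namespace
spec:
  limits:
    - type: Container
      default:               # used as the Limit when a container sets none
        cpu: 500m
        memory: 512Mi
      defaultRequest:        # used as the Request when a container sets none
        cpu: 250m
        memory: 256Mi
      max:                   # upper bound any container Limit may declare
        cpu: "1"
        memory: 1Gi
```

Defaults are applied when a Pod is admitted, so Pods that already exist in the namespace are not changed by adding a LimitRange later.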

Resource Requests and Limits vs Node Capacity

Node capacity is the total amount of CPU and memory physically available on a Kubernetes worker node. Resource Requests and Limits are settings for a container that describe its needs and caps. Node capacity is the ceiling that all containers on that node share. Requests must fit within the node's allocatable resources, which are its capacity minus what is reserved for the operating system and Kubernetes components, but they do not define the node itself.

A node has 4 CPU cores and 16 GB RAM. Your container has a Request of 1 CPU and 2 GB RAM. The scheduler will only place it if the node has at least 1 CPU and 2 GB of memory not already claimed by the Requests of other Pods. The node capacity is the pool these requests draw from.

Step-by-Step Breakdown

1. Define the container resource requirements

Start by determining how much CPU and memory your application needs under normal load and under peak load. Measure or estimate the minimum guaranteed amount (Request) and the maximum allowed amount (Limit). Write these as values in the containers[].resources field of your Pod spec.

2. Write the YAML manifest with the resources block

In the Pod or Deployment YAML, add a resources field under the container specification. Inside it, add a requests object with cpu and memory keys, and a limits object with the same keys. Use correct units: millicpus (e.g., 500m) for CPU, and Mi or Gi for memory. Apply the manifest to the cluster using kubectl apply.

3. Kubernetes scheduler evaluates Requests for placement

When you create the Pod, the kube-scheduler looks at the sum of Requests from all containers in the Pod. It then checks each node's allocatable resources. It selects a node that has at least that much free CPU and memory. If no node meets the Requests, the Pod stays in Pending state with an event explaining the insufficient resources.

4. Kubelet enforces Limits at runtime using cgroups

Once the Pod runs on a node, the kubelet configures Linux cgroups to enforce the Limits. For CPU, it sets CPU CFS quota and period. For memory, it sets the memory limit in bytes. The container processes are constrained by these cgroup settings, preventing them from exceeding the Limits.
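
As a worked example of that translation, the comments in the fragment below show the values that would correspond to each Limit, assuming the kubelet's default CFS period of 100 ms (100,000 microseconds). The exact cgroup file names differ between cgroup v1 and v2, so they are left out here.

```yaml
# Fragment of a container spec (illustrative values)
resources:
  limits:
    cpu: 500m        # CFS quota = 0.5 * 100,000 us = 50,000 us of CPU time per period
    memory: 256Mi    # memory limit = 268,435,456 bytes
```

A container with this CPU Limit that tries to run flat out gets 50 ms of CPU time in each 100 ms window and is paused for the remainder, which is what shows up as throttling in metrics.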

5. Monitor for throttling or OOM kills

Use kubectl top pod and kubectl describe pod to check resource usage. If a container hits its CPU Limit, you will see high throttling in container metrics. If it hits its memory Limit, the container will restart with an OOMKilled reason. Adjust Limits based on observed usage to balance performance and stability.

6. Adjust Requests and Limits based on observed usage

Production workloads change over time. Use monitoring tools or Kubernetes metrics server to track actual resource consumption. Update the YAML with new Requests and Limits, then roll out the update. Consider using the Vertical Pod Autoscaler in recommendation mode to get suggested values.
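
A minimal sketch of a Vertical Pod Autoscaler in recommendation-only mode. It assumes the VPA add-on and its custom resource definitions are installed in the cluster, which is not the case by default; the object name web-vpa and the target Deployment web are hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                # hypothetical Deployment to observe
  updatePolicy:
    updateMode: "Off"        # compute recommendations only; never change running Pods
```

With updateMode set to "Off", the suggested Requests are written to the object's status, where they can be read and copied into the manifest by hand.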

Practical Mini-Lesson

Resource Requests and Limits are not just theoretical settings; they are practical tools you use daily when deploying and managing applications on Kubernetes. As a developer or administrator, you must know how to determine the right values for your workloads. Start by running your application in a test environment without any limits. Use kubectl top pod or a monitoring tool like Prometheus to observe actual CPU and memory usage over time, especially under load. Record the average and peak values.

For the Request, choose a value slightly above the average usage. This guarantees the container a baseline that keeps it stable during normal operation. For the Limit, choose a value slightly above the peak usage. This allows the container to handle short spikes without being killed, but prevents it from consuming unlimited resources in case of a memory leak or runaway process.

A common production pattern is to set Requests and Limits equal to each other for critical services. This gives the Pod a Guaranteed QoS class, which makes it one of the last candidates for eviction when the node runs short of resources. For less critical batch jobs, you might set Requests lower than Limits to allow bursting, accepting a Burstable QoS class that is evicted before Guaranteed Pods under node pressure.

When configuring, use the correct units. For CPU, 1 core = 1000m. A decimal such as 0.5 is valid and means the same as 500m, but the millicpu form is the more common convention. For memory, use Mi or Gi; MB and GB are not valid suffixes. Remember that 1Gi = 1024Mi. A common pitfall is writing 1G instead of 1Gi, which Kubernetes interprets as 1,000,000,000 bytes instead of 1,073,741,824 bytes, a difference of about 7%.

In real deployments, you will often use Helm charts or Kustomize to manage YAML templates. These tools allow you to set resource values as parameters, making it easy to adjust per environment. For example, you might use 256Mi memory in development and 2Gi in production.

If a container is repeatedly OOMKilled, do not just increase the Limit blindly. Check if the application has a memory leak that needs fixing. Use tools like heap dumps or profiling to understand the memory usage pattern. Similarly, if a container is heavily throttled on CPU, consider optimizing the code or adding more replicas rather than just raising the Limit.

Resource Requests and Limits also interact with cluster autoscaling. If you set Requests too high, the cluster autoscaler may add unnecessary nodes, increasing cloud costs. If you set them too low, Pods may be scheduled on overloaded nodes. Finding the sweet spot requires monitoring and iteration. Many teams use the Vertical Pod Autoscaler to automate this tuning.

Finally, remember that setting no Requests or Limits at all is almost never appropriate for production. It puts your application and other workloads at risk. Always define at least a Limit to protect the node, and a Request to protect your container.

Memory Tip

Remember 'R-L' as 'Reserve Low, Leap High': The Request is the low guaranteed reserve, and the Limit is the high ceiling you can leap up to. Or think 'Request is what you need to start, Limit is the roof you cannot break.'

Frequently Asked Questions

What happens if I set only a Request but no Limit?

The container is guaranteed the requested amount, and beyond that it can use whatever spare CPU and memory the node has, unless a LimitRange in the namespace supplies a default Limit. This can lead to resource starvation for other containers. It is best to set both.

Can I set different Requests and Limits for multiple containers in the same Pod?

Yes, each container in a Pod has its own resources block. You can set different values for each container. The scheduler considers the sum of all Requests in the Pod.

Why does my Pod stay in Pending state after setting Requests?

This usually means no node has enough free resources to meet the total Requests. Check the node capacity and events using kubectl describe pod <pod-name>. Reduce the Requests or add more nodes.

What is the difference between CPU throttling and OOMKilled?

CPU throttling happens when a container exceeds its CPU Limit but stays running, just slower. OOMKilled happens when a container exceeds its memory Limit and is forcefully stopped.

How do I check the current resource usage of a Pod?

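Use the command kubectl top pod <pod-name> to see live CPU and memory usage; this requires the metrics server to be running in the cluster. You can also use kubectl describe pod <pod-name> to see recent events and the last termination reason, such as OOMKilled.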

Do Resource Requests and Limits affect Kubernetes billing in cloud environments?

They do not directly determine billing, but they influence autoscaling and node count. Higher Requests may trigger more nodes, increasing costs. Most cloud providers charge based on node usage, not individual container limits.

Should I set Requests and Limits for all containers in development?

Yes, even in development. It helps you catch configuration errors early and makes your manifests consistent across environments. Development clusters also benefit from resource fairness.

Summary

Resource Requests and Limits are essential Kubernetes settings that control how CPU and memory are allocated to containers. The Request is the minimum amount of resources a container is guaranteed, used by the scheduler for placement decisions. The Limit is the maximum amount a container can use, enforced by the kernel at runtime.

Together, they ensure fair resource sharing, prevent noisy neighbors, and enable reliable application behavior. In certification exams like CKAD, you will be tested on writing YAML with correct syntax and units, troubleshooting OOM kills and scheduling failures, and understanding QoS classes. A common exam trap is confusing the role of Requests with Limits in crash scenarios.

Always set both values for every container, use proper units like Mi and m, and monitor actual usage to tune your settings. Mastering this concept is not optional for Kubernetes practitioners, as it directly impacts cluster stability, cost, and performance.