What Does Metrics Server Mean?

Also known as: Metrics Server, Kubernetes Metrics Server

Related terms: CKAD, CKA, Horizontal Pod Autoscaler

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security

Quick Definition

The Metrics Server is a tool that collects CPU and memory usage from every node and pod in a Kubernetes cluster. It provides this data to other parts of Kubernetes, like the Horizontal Pod Autoscaler, so they can make decisions about scaling applications up or down. Think of it as a central dashboard that constantly takes snapshots of resource consumption across your entire cluster.

Must Know for Exams

The Metrics Server is explicitly tested in the Certified Kubernetes Application Developer (CKAD) exam under the core concepts and configuration topics. The CKAD curriculum expects candidates to understand how to view pod logs, check resource usage, and configure autoscaling. The kubectl top command, which depends entirely on the Metrics Server being installed and running, is a standard tool for these tasks. Candidates may be asked to verify that the Metrics Server is functional as a prerequisite before configuring a HorizontalPodAutoscaler. The CKAD exam also includes questions about resource quotas and limits, and understanding the Metrics Server helps candidates grasp how those limits are enforced and monitored.

For the Certified Kubernetes Administrator (CKA) exam, the Metrics Server is tested in the context of cluster monitoring and troubleshooting. CKA candidates must know how to install the Metrics Server from its YAML manifests, check its logs, and diagnose common issues like missing metrics or incorrect configurations. They may also need to use kubectl top to identify node-level resource pressure as part of troubleshooting workload failures. The Certified Kubernetes Security Specialist (CKS) exam touches on the Metrics Server indirectly, as understanding resource usage can help identify security anomalies such as cryptominers consuming excessive CPU.

Exam questions often present scenarios where a HorizontalPodAutoscaler is not working, and learners need to diagnose whether the Metrics Server is running and collecting data. A typical question might state that an HPA is configured but never triggers scaling, even under load. The correct answer may involve checking the Metrics Server pod logs, verifying that the metrics API is available, or confirming that the kubelet is exposing metrics on the expected port. Another common question type involves interpreting output from kubectl top to determine which node or pod is consuming the most resources. Learners must understand that the output shows current usage, not cumulative usage, and that the Metrics Server refreshes data periodically. The CKAD and CKA exams also test high-level understanding of the Metrics Server's role versus tools like Prometheus, clarifying that the Metrics Server does not store historical data and is not a replacement for full monitoring solutions.

Simple Meaning

Imagine you are running a large apartment building with many different tenants. Each tenant has a unit (like a pod in Kubernetes) and uses resources such as electricity and water (similar to CPU and memory). As the building manager, you need to know how much electricity and water each unit is using so you can adjust supply, plan for expansions, or ensure no single unit is overloading the system. The Metrics Server is like a smart meter system installed throughout the building. Every minute, it reads the electricity and water usage from each unit and sends that data to a central office. This central office (the Kubernetes control plane) can then use the data to decide, for example, if the building needs more water capacity for a new floor of apartments (scaling up).

The Metrics Server itself does not store long-term history or make decisions. It just reads real-time usage data and makes it available quickly. In Kubernetes, it runs as a simple deployment and collects resource metrics through the kubelet on each node. These metrics include CPU usage in millicores and memory usage in bytes. The metrics are then served via the Kubernetes API, which other components like the Horizontal Pod Autoscaler and the kubectl top command can query. Because the Metrics Server only keeps the most recent data (usually for about one minute), it is designed for lightweight, fast, in-memory aggregation rather than as a full monitoring solution. This makes it perfect for autoscaling decisions that need up-to-date information without lag.

A common analogy is a parking lot with hundreds of spaces. The Metrics Server is like a camera system that counts how many cars are parked in each section at this moment. It does not track cars over days or weeks, and it does not predict future parking needs. It just provides the current count to the parking lot manager, who can then decide to open more sections or direct cars elsewhere. Similarly, when your web application experiences a traffic spike, the Metrics Server quickly reports the increased CPU usage of your pods, allowing the Horizontal Pod Autoscaler to add more pod replicas automatically.

Full Technical Definition

The Metrics Server is a component of the Kubernetes ecosystem that implements the Metrics API, a standard API for exposing resource usage metrics. It was created as the replacement for Heapster, which was deprecated in favor of the Metrics Server's simpler, more focused design. The Metrics Server runs as a single deployment within a Kubernetes cluster, typically with one replica, and it collects metrics from each node's kubelet: older releases read the Summary API endpoint, while newer releases read the kubelet's resource metrics endpoint. The kubelet, which runs on every node, gathers resource usage statistics for all pods on that node using cAdvisor, an embedded container monitoring tool. The Metrics Server then aggregates these per-node metrics and makes them available cluster-wide through the Kubernetes API server.

The Metrics Server uses an in-memory storage model, meaning it does not persist metrics to disk or a database. It polls the kubelet on each node at an interval set by the --metric-resolution flag (15 seconds by default in current releases; older releases defaulted to 60 seconds) and retains only the most recent metrics. This design prioritizes low resource overhead and fast response times, which are critical for autoscaling components like the Horizontal Pod Autoscaler (HPA) that need to react quickly to changing workloads. The Metrics Server is also designed to handle large clusters, though it has practical limits that depend on cluster size and configuration. For clusters with more than 100 nodes, users often need to raise the Metrics Server's resource requests and limits or lengthen the scrape interval with --metric-resolution.
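As a rough illustration of the latest-sample model described above, the following Python sketch keeps only the newest scrape in memory. The class and method names are hypothetical, and this is a toy model of the idea, not the Metrics Server's actual implementation.

```python
class LatestSampleStore:
    """Toy in-memory store that retains only the most recent scrape (hypothetical)."""

    def __init__(self):
        self._latest = {}  # pod name -> (cpu_millicores, memory_bytes)

    def record_scrape(self, samples):
        # Replace the whole view: data from earlier scrapes is discarded,
        # and pods missing from the new scrape disappear from the store.
        self._latest = dict(samples)

    def usage(self, pod):
        # Returns None when the pod has no sample in the latest scrape.
        return self._latest.get(pod)


store = LatestSampleStore()
store.record_scrape({"web-1": (250, 134217728), "web-2": (90, 100663296)})
store.record_scrape({"web-1": (910, 146800640)})  # the next scrape overwrites the first
print(store.usage("web-1"))  # (910, 146800640)
print(store.usage("web-2"))  # None
```

Because nothing older than the latest scrape survives, a store like this cannot answer questions about trends, which is why historical analysis needs a separate time-series system.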

From a protocol perspective, the Metrics Server communicates with the kubelet over HTTPS using the kubelet's secure port (typically 10250). It uses the Kubernetes ServiceAccount authentication model to access the kubelet API. The Metrics Server then exposes the aggregated metrics through the Kubernetes API server at the endpoint /apis/metrics.k8s.io/v1beta1. The path carries the API group and version (metrics.k8s.io, currently v1beta1) and follows the standard Kubernetes API conventions. The metrics are returned as NodeMetrics and PodMetrics objects; a PodMetrics object includes the pod name, namespace, and per-container usage values for CPU (shown by kubectl top in millicores, e.g., 100m) and memory (e.g., 50Mi). This data is then used by the HPA, the Vertical Pod Autoscaler (VPA), and the kubectl top command. It is important to note that the Metrics Server is not a full monitoring solution and does not provide metrics like disk I/O, network bandwidth, or application-level metrics. For those, users need tools like Prometheus or Datadog. The Metrics Server is specifically designed for the resource metrics required by Kubernetes autoscaling components and kubectl top.
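To make the response shape and units concrete, here is a minimal Python sketch that totals per-container usage from a pod metrics list (kind PodMetricsList). The sample document is hand-written for illustration, the helper names are hypothetical, and only a few common quantity suffixes are handled, so treat it as a sketch and not a full parser for Kubernetes quantities.

```python
import json

# Hand-written sample in the shape of a metrics.k8s.io/v1beta1 pod metrics list.
SAMPLE_RESPONSE = """
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "items": [
    {
      "metadata": {"name": "web-1", "namespace": "shop"},
      "timestamp": "2024-01-01T12:00:00Z",
      "window": "15s",
      "containers": [
        {"name": "app", "usage": {"cpu": "180000000n", "memory": "122880Ki"}},
        {"name": "sidecar", "usage": {"cpu": "20m", "memory": "30Mi"}}
      ]
    }
  ]
}
"""


def cpu_millicores(quantity):
    """Convert a CPU quantity ('180000000n', '20m', '0.5') to millicores."""
    if quantity.endswith("n"):  # nanocores
        return int(quantity[:-1]) / 1_000_000
    if quantity.endswith("m"):  # millicores
        return float(quantity[:-1])
    return float(quantity) * 1000  # whole cores


def memory_mib(quantity):
    """Convert a memory quantity ('122880Ki', '30Mi', plain bytes) to MiB."""
    if quantity.endswith("Ki"):
        return int(quantity[:-2]) / 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    return int(quantity) / (1024 * 1024)


def pod_usage(response_text):
    """Return {(namespace, pod): (cpu_millicores, memory_mib)} summed over containers."""
    totals = {}
    for item in json.loads(response_text)["items"]:
        key = (item["metadata"]["namespace"], item["metadata"]["name"])
        cpu = sum(cpu_millicores(c["usage"]["cpu"]) for c in item["containers"])
        mem = sum(memory_mib(c["usage"]["memory"]) for c in item["containers"])
        totals[key] = (cpu, mem)
    return totals


print(pod_usage(SAMPLE_RESPONSE))  # {('shop', 'web-1'): (200.0, 150.0)}
```

The pod-level figures that kubectl top pods displays correspond to this kind of per-container sum.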

Real-Life Example

Think of a large public library with many reading rooms, computer stations, and study cubicles. Each section of the library represents a node in a Kubernetes cluster, and each individual study cubicle or computer station is a pod. The library staff needs to know how many people are using each section at any given moment to decide whether to open more rooms or call in extra staff. The Metrics Server is like a librarian who walks through the entire library every five minutes, counting the number of people in each room and at each computer station. This librarian does not record names, check out books, or track reading habits. They simply count current occupancy and report those numbers to the head librarian's office (the Kubernetes control plane).

Now, imagine that the head librarian notices that the computer section is completely full while the quiet reading room is nearly empty. Based on the counts from the Metrics Server librarian, the head librarian decides to move some computers from the quiet reading room into the computer section to better serve patrons. In Kubernetes terms, the Horizontal Pod Autoscaler sees that the CPU usage of the web server pods is high (the computer section is full) and decides to scale up by adding more pod replicas (more computers). The Metrics Server provides the exact usage data needed for that decision.

If the library had no such librarian walking the floors, the head librarian would have to guess about occupancy, rely on outdated reports, or manually call each section. Similarly, without the Metrics Server, Kubernetes autoscalers would lack the real-time data they need to make accurate scaling decisions. The library also does not keep a permanent record of these counts day after day. The librarian only reports the current snapshot. This matches the Metrics Server's in-memory, short-term data storage model. The analogy further illustrates that the librarian is lightweight and does not interfere with library patrons. They simply observe and report, just as the Metrics Server consumes minimal cluster resources and does not affect running applications.

Why This Term Matters

The Metrics Server is a foundational component for any Kubernetes cluster that needs automated scaling, efficient resource allocation, or basic visibility into resource usage. Without it, the Horizontal Pod Autoscaler cannot function because it has no source of current CPU or memory metrics to base its scaling decisions on. The Vertical Pod Autoscaler, which adjusts resource requests and limits for individual pods, also depends on the Metrics Server for data. This means that organizations using Kubernetes for production workloads often cannot achieve cost efficiency or performance stability without it. For example, an e-commerce platform experiencing a flash sale would rely on the Horizontal Pod Autoscaler to increase the number of web server pods as traffic spikes. The Metrics Server provides the real-time CPU metrics that trigger that scaling action. Without it, the platform would either crash under load or waste money by running too many pods all the time.

In real IT work, the Metrics Server also simplifies resource monitoring via the kubectl top command. System administrators use kubectl top nodes and kubectl top pods to quickly see which nodes are overloaded or which pods are consuming excessive memory. This is invaluable for troubleshooting performance issues, capacity planning, and spot-checking cluster health. Note that the default Kubernetes scheduler does not consult the Metrics Server: it places pods according to their declared resource requests, not live usage. The usage figures from kubectl top therefore help operators judge whether those requests are set sensibly, so that the scheduler is not packing pods onto a node that is already close to its real capacity.

The Metrics Server also matters because it is a lightweight, officially supported component of the Kubernetes ecosystem. It is maintained as a Kubernetes SIGs project (kubernetes-sigs/metrics-server) and is the standard source of resource metrics for autoscaling in production clusters. Its simplicity means it is easy to install, configure, and upgrade. Managed services such as Google GKE and Azure AKS include it by default, while Amazon EKS has traditionally required installing it separately, for example as an add-on. For DevOps engineers and platform teams, understanding the Metrics Server is essential for building reliable, scalable infrastructure. It also serves as a gateway to more advanced observability tools. The Metrics Server itself only provides CPU and memory metrics, but the custom and external metrics APIs follow the same aggregated-API pattern, with adapters such as the Prometheus adapter supplying richer metrics for monitoring and autoscaling. Thus, mastering the Metrics Server is not just about one component but about understanding how Kubernetes manages resources at scale.

How It Appears in Exam Questions

In the CKAD and CKA exams, the Metrics Server appears in several distinct question patterns. First, there are configuration questions where candidates must enable or install the Metrics Server. For example, an exam task might say: 'The Metrics Server is not installed in this cluster. Install it from the official YAML manifests located at /opt/kubernetes/metrics-server.yaml.' The candidate would then apply the manifest, verify that the pod is running, and confirm that kubectl top nodes produces output. This tests practical knowledge of Metrics Server installation and troubleshooting.

Second, there are scenario questions involving the Horizontal Pod Autoscaler. A typical prompt might describe a deployment that runs a web application. The candidate must create an HPA that scales the deployment based on CPU utilization at 80%. The HPA will not work unless the Metrics Server is functional. The exam might also ask candidates to check the status of the HPA after creation. They would use kubectl get hpa which shows a TARGETS column with current vs. target CPU usage. If the Metrics Server is missing, the TARGETS column shows '<unknown>' or no data. Candidates must recognize this symptom and take corrective action.

Third, there are troubleshooting questions where the Metrics Server itself is failing. For instance, a question might state: 'Users report that the kubectl top command returns an error: metrics not available yet. Diagnose and fix the issue.' Candidates would check the Metrics Server pod logs for errors, verify network connectivity to kubelets, and ensure that the metrics API version is supported. Common causes include missing --kubelet-insecure-tls flag, node name resolution issues, or the Metrics Server not having enough CPU to collect data in a large cluster.

Fourth, there are architecture-related questions in the CKA exam that test understanding of the Metrics Server's limitations. For example, a question might ask: 'Which of the following metrics can the Metrics Server provide? A) CPU usage B) Memory usage C) Disk I/O D) Network throughput.' The correct answer is CPU and memory only. Another question might ask about the data retention period: metrics are stored in memory and only the most recent sample is retained. These questions assess whether candidates understand the exact scope of the Metrics Server's functionality and how it differs from full monitoring solutions like Prometheus.

Example Scenario

A company runs an online ticketing platform for concerts and events. During a major festival, ticket sales surge, and the traffic to their Kubernetes cluster increases tenfold. The platform's deployment has three pod replicas running a Node.js application. The team has configured a HorizontalPodAutoscaler (HPA) to maintain CPU utilization at 70%. When the surge hits, the existing three pods quickly reach 95% CPU usage.

The HPA controller queries the metrics API every 15 seconds by default and sees that the current CPU usage exceeds the target. It triggers a scale-up event: the standard HPA calculation gives ceil(3 × 95 / 70) = 5 replicas on the first pass, and further passes add more pods if usage stays above the target. After the ticket sale ends, traffic drops, and the HPA scales back down to three replicas.

In this scenario, the Metrics Server is the silent workhorse that makes the autoscaling possible. Without it, the HPA would have no data to act on, and the platform would likely crash or become unresponsive during the flash sale. The scenario also highlights that the Metrics Server must be properly installed and configured before the HPA can function.

A system administrator would use kubectl top pods to verify that the Metrics Server is collecting data before relying on autoscaling. This scenario is representative of real-world event-driven applications that depend on dynamic scaling for availability and cost efficiency.
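The scaling arithmetic in this scenario can be sketched in Python using the formula from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The function name is hypothetical, the default tolerance of 0.1 is assumed, and the sketch leaves out minReplicas and maxReplicas clamping and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     tolerance=0.1):
    """Replica count suggested by the basic HPA formula (no min/max clamping)."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(3, 95, 70))  # 5
print(desired_replicas(3, 72, 70))  # 3 (inside the tolerance band)
print(desired_replicas(6, 30, 70))  # 3 (scale down once load drops)
```

For the numbers in the scenario, a single pass of this formula yields five replicas; reaching six would require usage to stay above the target across further passes. The utilization percentages are relative to the pods' CPU requests, which is why both the Metrics Server and resource requests must be in place.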

Common Mistakes

Thinking the Metrics Server stores historical data

The Metrics Server only keeps the most recent metrics snapshot in memory, with no long-term storage capability. It is not a monitoring database and cannot provide trend analysis or historical graphs.

Use Prometheus or a similar time-series database for historical monitoring. The Metrics Server is only for real-time resource metrics used by autoscaling and kubectl top.

Expecting the Metrics Server to provide application-level metrics like request latency or error rates

The Metrics Server only collects resource metrics (CPU and memory) from the kubelet. It does not instrument applications or expose things like HTTP request count, database query times, or custom business metrics.

For application-level monitoring, use a dedicated APM tool or instrument your application with Prometheus client libraries and scrape those metrics separately.

Assuming the Metrics Server is always installed by default in managed Kubernetes clusters

Managed services differ: Google GKE and Azure AKS install the Metrics Server by default, but Amazon EKS has traditionally left it to the user to install, and some configurations or older versions may omit it. Clusters built with kubeadm do not include it by default, although some lightweight distributions bundle it.

Always verify the Metrics Server installation. Run kubectl get deployment metrics-server -n kube-system and check for a running pod. If missing, follow the official installation guide.

Misinterpreting the output of kubectl top as cumulative usage rather than current usage

The kubectl top command shows the current resource usage at the moment of the last Metrics Server scrape. It is not a total or average over time. Using it to diagnose long-term trends can be misleading.

Remember that the Metrics Server provides point-in-time snapshots. For trend analysis, use a monitoring solution with persistence. To observe changes over a short period, rerun kubectl top at intervals, for example with the watch utility (watch kubectl top pods); kubectl top has no --watch flag of its own.
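To show how a single kubectl top snapshot can be read programmatically, here is a small Python sketch that parses the tabular output and picks the busiest pod. The sample text and pod names are made up for illustration, and the parser assumes CPU is printed in millicores (m) and memory in Mi, as in the sample.

```python
SAMPLE_OUTPUT = """\
NAME                    CPU(cores)   MEMORY(bytes)
web-7d9c5b6f4-abcde     250m         128Mi
web-7d9c5b6f4-fghij     910m         140Mi
cache-5f6d7c8b9-klmno   35m          512Mi
"""


def parse_top_pods(output):
    """Parse `kubectl top pods` text into a list of dicts (assumes m and Mi units)."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, memory = line.split()
        rows.append({
            "name": name,
            "cpu_millicores": int(cpu[:-1]),   # strip the trailing "m"
            "memory_mib": int(memory[:-2]),    # strip the trailing "Mi"
        })
    return rows


rows = parse_top_pods(SAMPLE_OUTPUT)
busiest_cpu = max(rows, key=lambda row: row["cpu_millicores"])
busiest_memory = max(rows, key=lambda row: row["memory_mib"])
print(busiest_cpu["name"])     # web-7d9c5b6f4-fghij
print(busiest_memory["name"])  # cache-5f6d7c8b9-klmno
```

Each value is the usage at the last scrape, so the "busiest" pod here is only the busiest at that moment.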

Exam Trap — Don't Get Fooled

An exam question shows that a HorizontalPodAutoscaler's TARGETS column displays '<unknown>/80%' and asks you to identify the cause. Learners often jump straight to 'The HPA is misconfigured'. Remember that '<unknown>' in the HPA TARGETS field means the HPA could not obtain a current usage value. The usual reason is that the metrics API is not returning data because the Metrics Server is not installed, not running, or not reachable. A less common reason is that the target pods have no CPU requests set, so a utilization percentage cannot be calculated.

Always verify the Metrics Server first by running kubectl get deployment metrics-server -n kube-system and kubectl top pods. Only after confirming the Metrics Server works should you check pod resource requests.
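The symptom can also be checked in a script. The Python sketch below parses one TARGETS cell from kubectl get hpa and reports whether the current value is missing. The function name is hypothetical, and the handling of a metric-name prefix such as "cpu: " is an assumption about how newer kubectl versions print the column.

```python
def parse_targets(cell):
    """Split an HPA TARGETS cell into (current_percent or None, target_percent)."""
    # Assumption: newer kubectl versions may prefix the metric name, e.g. "cpu: 45%/80%".
    value = cell.split(":", 1)[-1].strip()
    current, target = value.split("/", 1)
    current_percent = None if current == "<unknown>" else int(current.rstrip("%"))
    return current_percent, int(target.rstrip("%"))


print(parse_targets("<unknown>/80%"))  # (None, 80)
print(parse_targets("45%/80%"))        # (45, 80)
print(parse_targets("cpu: 45%/80%"))   # (45, 80)
```

A current value of None is the cue to check the Metrics Server before anything else.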

Commonly Confused With

Metrics Server vs Prometheus

Prometheus is a full monitoring and alerting toolkit that collects and stores a wide range of metrics (CPU, memory, disk, network, application-specific) with long-term retention and powerful querying. The Metrics Server, in contrast, only collects CPU and memory metrics, stores no history, and is designed solely for Kubernetes autoscaling and basic real-time checks.

The Metrics Server is like a thermometer that shows the current temperature. Prometheus is like a weather station that records temperature, humidity, rainfall, and wind speed over weeks and months, allowing you to analyze trends.

Metrics Server vs Heapster

Heapster was the predecessor to the Metrics Server in Kubernetes. It aggregated metrics across the cluster and supported multiple backends like InfluxDB. Heapster is now deprecated and removed from Kubernetes. The Metrics Server is the replacement, but it is simpler and only retains metrics in memory for short-term use, whereas Heapster could send data to external storage for history.

Heapster was like a librarian who counted people and also logged the counts into a notebook for future reference. The Metrics Server is like a librarian who only counts and announces the current count, then forgets it.

Metrics Server vs kubectl top

kubectl top is a command-line tool that displays resource usage from the Metrics Server. It is not a separate component but a consumer of Metrics Server data. Learners sometimes confuse the command with the underlying server. The Metrics Server is the data provider, and kubectl top is the interface to view that data.

The Metrics Server is the plumbing system that delivers water (metrics data), while kubectl top is the faucet (command) that lets you access it.

Step-by-Step Breakdown

1. Metrics Server Installation

The Metrics Server is deployed as a Kubernetes Deployment in the kube-system namespace. It requires a ServiceAccount with appropriate RBAC permissions to access kubelets on all nodes. Installation is typically done using the official YAML manifests from the kubernetes-sigs/metrics-server repository.

2. Node Discovery and Communication

Upon startup, the Metrics Server discovers all nodes in the cluster using the Kubernetes API. It then begins polling each node's kubelet on port 10250, using the Summary API endpoint in older releases and the kubelet's resource metrics endpoint in newer ones. It authenticates using its ServiceAccount token and communicates over HTTPS.

3. Data Collection from Kubelet

The kubelet on each node uses cAdvisor to gather resource usage statistics for every pod running on that node. This includes CPU usage (in millicores) and memory usage (in bytes). The kubelet packages this data into a Summary API response and sends it back to the Metrics Server.

4. Data Aggregation and Storage

The Metrics Server aggregates the per-node data into a cluster-wide view. It stores this aggregated data in memory, organized by pod and namespace. Only the most recent data sample is kept; old data is discarded immediately. The Metrics Server repeats this collection cycle every configurable interval, defaulting to 15 seconds.

5. Exposing Metrics via the API

The Metrics Server registers with the Kubernetes API server's aggregation layer through an APIService object. It exposes the collected metrics at the endpoint /apis/metrics.k8s.io/v1beta1. This endpoint returns NodeMetrics and PodMetrics objects that can be consumed by kubectl top, the Horizontal Pod Autoscaler, and the Vertical Pod Autoscaler.

6. Consumption by Autoscalers and CLI

Components like the HorizontalPodAutoscaler periodically query the metrics API to get current resource usage. Based on the target utilization defined in the HPA specification, it calculates whether scaling is needed. Similarly, kubectl top nodes and kubectl top pods render the data for human operators.
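The last two steps come down to requests against paths under /apis/metrics.k8s.io/v1beta1. The following Python sketch builds those paths for nodes and pods; the helper names are hypothetical and the code only constructs strings, it does not contact a cluster.

```python
METRICS_API_ROOT = "/apis/metrics.k8s.io/v1beta1"


def node_metrics_path(node=None):
    """Path for all node metrics, or for one node when a name is given."""
    return f"{METRICS_API_ROOT}/nodes" + (f"/{node}" if node else "")


def pod_metrics_path(namespace=None, pod=None):
    """Path for pod metrics: cluster-wide, per namespace, or for a single pod."""
    if pod and not namespace:
        raise ValueError("a pod name requires a namespace")
    if namespace is None:
        return f"{METRICS_API_ROOT}/pods"
    path = f"{METRICS_API_ROOT}/namespaces/{namespace}/pods"
    return f"{path}/{pod}" if pod else path


print(node_metrics_path())               # /apis/metrics.k8s.io/v1beta1/nodes
print(pod_metrics_path("shop", "web-1"))  # /apis/metrics.k8s.io/v1beta1/namespaces/shop/pods/web-1
```

Any of these paths can be passed to kubectl get --raw to see the JSON that kubectl top and the autoscalers consume, which is a quick way to confirm that the metrics API is actually being served.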

Practical Mini-Lesson

The Metrics Server is one of the first components you should verify when setting up a Kubernetes cluster that will use autoscaling. As a professional, you need to understand not only how to install it but also how to troubleshoot it when things go wrong. Installation is straightforward but often fails due to network issues or missing configuration flags. The most common installation command is kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml. However, in many on-premises or self-managed clusters, this default installation fails because the Metrics Server cannot verify the kubelet's TLS certificate. This occurs when the kubelet uses a self-signed certificate. The quick fix is to add the --kubelet-insecure-tls flag to the Metrics Server deployment; because this skips certificate verification, it is best kept to test clusters. Separately, if node names are not resolvable via DNS, you may need to set the --kubelet-preferred-address-types flag so that the Metrics Server connects by IP address, or configure DNS entries for node names.

Once the Metrics Server is running, validate it with kubectl get deployment metrics-server -n kube-system and check its logs with kubectl logs -n kube-system <metrics-server-pod-name>. The logs should show successful data collection from each node. If the logs show errors like 'failed to get node info' or 'dial tcp: lookup', the issue is usually node name resolution. To resolve this, you can either configure DNS or use the --kubelet-preferred-address-types flag to force the Metrics Server to use internal IPs instead of hostnames.

In production, you may need to tune the Metrics Server for larger clusters. The default resource request is low (e.g., 100m CPU and 200Mi memory), which may be insufficient for clusters with hundreds of nodes. You can increase these values in the deployment manifest. Also, consider adjusting the --metric-resolution flag to a higher value (e.g., 60 seconds) to reduce load on the kubelets. The Metrics Server can also be run with multiple replicas for high availability, but this is rarely needed because it is designed to be lightweight and fast to restart.

A common real-world task is to use the Metrics Server data for capacity reports. While the Metrics Server does not store history, you can write a script that periodically runs kubectl top and saves the output to a file, giving you a rough view of usage over time. For formal capacity planning, you should use Prometheus and Grafana, which collect resource metrics from the kubelets directly and store them long-term. The Metrics Server also feeds the Kubernetes Dashboard, giving you a visual interface for resource usage without needing to install additional monitoring tools. Understanding these practical aspects ensures you can keep your cluster running efficiently and respond to scaling needs effectively. As a Kubernetes administrator, the Metrics Server is a tool you will use daily, whether for debugging why an HPA is not scaling or for quickly checking why a node is overloaded.
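A periodic snapshot script of the kind mentioned above might look like the following Python sketch. The function names and the CSV layout are hypothetical, and it assumes kubectl is on the PATH and already configured for the cluster. The command runner is passed in as a parameter so the logic can be exercised without a cluster.

```python
import csv
import subprocess
from datetime import datetime, timezone


def run_kubectl_top():
    """Run `kubectl top pods` for all namespaces and return its text output."""
    result = subprocess.run(
        ["kubectl", "top", "pods", "--all-namespaces", "--no-headers"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def append_snapshot(csv_path, run=run_kubectl_top, now=None):
    """Append one timestamped row per pod to csv_path; return the number of rows."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    # With --all-namespaces each line is: NAMESPACE NAME CPU MEMORY
    rows = [line.split() for line in run().splitlines() if line.strip()]
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for namespace, pod, cpu, memory in rows:
            writer.writerow([timestamp, namespace, pod, cpu, memory])
    return len(rows)
```

Calling append_snapshot("usage.csv") from a scheduler such as cron would build up a simple usage log. It remains a rough substitute for a real time-series database, since gaps appear whenever the script or the Metrics Server is down.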

Memory Tip

Think of the Metrics Server as the 'fuel gauge' for your Kubernetes cluster: it shows only what you have right now, not your driving history or future range.

Frequently Asked Questions

Is the Metrics Server required for a Kubernetes cluster to run?

No, the Metrics Server is optional. Your cluster will function without it, but you will not be able to use kubectl top, the Horizontal Pod Autoscaler, or the Vertical Pod Autoscaler for CPU and memory metrics.

Can the Metrics Server be used with a managed Kubernetes service like Amazon EKS?

Yes, most managed Kubernetes services install the Metrics Server by default. If not, you can install it manually using the official YAML manifests.

How often does the Metrics Server collect metrics?

By default, the Metrics Server collects metrics every 15 seconds, but this interval is configurable with the --metric-resolution flag.

Does the Metrics Server support custom metrics like application response time?

No, the Metrics Server only supports CPU and memory metrics provided by the kubelet. For custom metrics, use the Kubernetes Custom Metrics API with a solution like Prometheus.

What happens if the Metrics Server stops working mid-operation?

The Horizontal Pod Autoscaler will stop receiving new metrics data and will retain its last known scaling decisions. It will not scale up or down until the Metrics Server resumes providing data.

Can I run multiple replicas of the Metrics Server for high availability?

Yes, you can run multiple replicas, but it is not typically necessary because the Metrics Server is stateless and restarts quickly. If you do run multiple replicas, note that each replica scrapes the kubelets independently instead of coordinating through leader election, so spread the replicas across different nodes to get real redundancy.

Summary

The Metrics Server is a lightweight, cluster-wide aggregator of CPU and memory usage in Kubernetes, providing real-time resource metrics essential for autoscaling and basic monitoring. It collects data from each node's kubelet, stores it temporarily in memory, and exposes it through the Kubernetes metrics API. Without the Metrics Server, the Horizontal Pod Autoscaler and Vertical Pod Autoscaler cannot function, and operators lose visibility into resource consumption via kubectl top.

For certification exams like CKAD and CKA, understanding its installation, troubleshooting, and limitations is critical. Common exam traps include confusing the Metrics Server with full monitoring solutions like Prometheus, and failing to diagnose that an HPA with '<unknown>' targets is often due to a missing or misconfigured Metrics Server. Remember that the Metrics Server is purely for real-time CPU and memory metrics with no historical retention.

It is a foundational tool for any Kubernetes professional, enabling efficient resource utilization, cost management, and automated scaling in production environments. Always verify its presence and health before configuring autoscaling, and be prepared to troubleshoot common issues like TLS certificate validation and node name resolution. This knowledge will serve you well both in exams and in real-world cluster administration.