AZ-104 · Chapter 105 of 168 · Objective 3.3

AKS Scaling: HPA, KEDA, Cluster Autoscaler

This chapter covers the three primary scaling mechanisms for Azure Kubernetes Service (AKS): Horizontal Pod Autoscaler (HPA), Kubernetes Event-Driven Autoscaling (KEDA), and the Cluster Autoscaler. These are critical for the AZ-104 exam under Compute objective 3.3 (Configure and manage scaling of Azure Kubernetes Service). Approximately 10-15% of exam questions touch on AKS scaling, often requiring you to choose the correct autoscaler for a given scenario or to identify misconfigurations. Understanding the distinct roles and interplay of these components is essential for designing cost-efficient, responsive applications on AKS.

25 min read
Intermediate
Updated May 31, 2026

Scaling a Taxi Fleet with Dispatchers

Imagine a taxi company that serves a city. The company has a fleet of taxis (pods) and a central dispatcher (the Kubernetes control plane). Customers request rides (workloads), and the dispatcher assigns each ride to an available taxi. If ride requests surge, the dispatcher can only assign rides to existing taxis; once every taxi is busy, new requests queue up. To handle surges, the company uses three mechanisms:

(1) The Horizontal Pod Autoscaler (HPA) is like a rule that says 'if the average wait time per taxi exceeds 5 minutes, hire 10 more taxis.' The dispatcher checks every 15 seconds and, if the condition is met, requests more taxis from the garage.

(2) KEDA is like a smarter rule that can also look at external signals: 'if the number of pending ride requests in the call center queue exceeds 100, hire 5 taxis.' This allows scaling based on events, not just CPU.

(3) The Cluster Autoscaler is like the garage manager who decides whether to buy more cars (add nodes) or sell some (remove nodes). When the dispatcher requests more taxis but none are idle, the garage manager checks whether there are enough parking spots (node capacity). If not, the manager purchases a new car (provisions a new VM node) and adds it to the fleet. Conversely, if many taxis sit idle for a long time, the garage manager sells some cars (deletes nodes).

The key is that HPA and KEDA scale pods (taxis), while Cluster Autoscaler scales nodes (cars). They work together: HPA requests more pods, and if the cluster is full, Cluster Autoscaler adds nodes to make room. Without Cluster Autoscaler, the new pods that HPA requests would stay Pending whenever the nodes are at capacity.

How It Actually Works

What is AKS Scaling and Why It Exists

Azure Kubernetes Service (AKS) is a managed Kubernetes offering. Kubernetes workloads run in pods, which are scheduled on nodes (VMs). In production, workloads have variable demand. Scaling ensures that the right number of pods and nodes are running to meet demand without overprovisioning (wasting money) or underprovisioning (causing performance issues). AKS provides three complementary scaling tools:

Horizontal Pod Autoscaler (HPA): Scales the number of pod replicas based on observed CPU/memory utilization or custom metrics.

KEDA (Kubernetes Event-Driven Autoscaling): Extends HPA to support event-driven scaling based on external metrics (e.g., queue length, Kafka lag, HTTP requests).

Cluster Autoscaler: Scales the number of nodes in the cluster when pods cannot be scheduled due to resource constraints, and scales down when nodes are underutilized.

How HPA Works Internally

HPA is a Kubernetes resource (horizontalpodautoscaler) that periodically adjusts the number of replicas in a Deployment, ReplicaSet, or StatefulSet to maintain observed metrics close to target values.

Mechanism:

1. Metric Collection: HPA queries the Kubernetes Metrics Server, which collects resource usage (CPU and memory) from the kubelet on each node and exposes it through the Resource Metrics API. By default, the Metrics Server collects metrics every 15 seconds, and HPA evaluates them on its own sync period (default: 15 seconds, set by the controller-manager flag --horizontal-pod-autoscaler-sync-period).

2. Desired Replica Calculation: For each target metric, HPA computes the desired replica count using the formula:

desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

For example, if current CPU utilization is 80% and target is 50%, and there are 10 pods, desired = ceil[10 * (80/50)] = ceil[16] = 16 replicas.

For custom metrics, the calculation is the same but uses the custom metric value in place of utilization.

3. Stabilization and Tolerance: To avoid thrashing (rapid scaling up and down), HPA applies two safeguards:
- Downscale stabilization: When scaling down, HPA uses the highest replica recommendation from the last 5 minutes (the default, controlled by --horizontal-pod-autoscaler-downscale-stabilization, or per HPA through spec.behavior.scaleDown.stabilizationWindowSeconds), so a brief dip in load does not remove pods. There is no equivalent window for scale-up by default; scale-up happens as soon as the condition is met.
- Tolerance: HPA uses a tolerance of 0.1 (10%) to ignore negligible changes. If the ratio of the current metric value to the target value is within 10% of 1.0, no scaling occurs.

4. Scaling Decision: If the desired replica count differs from the current count, HPA updates the target resource's replicas field. The Deployment controller then creates or destroys pods accordingly.

Key Defaults and Timers:
- HPA sync period: 15 seconds (can be changed via --horizontal-pod-autoscaler-sync-period)
- Downscale stabilization: 5 minutes
- Tolerance: 0.1 (10%)
- Metrics Server: collects resource metrics every 15 seconds
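The replica formula and the tolerance check can be worked through in a few lines of Python. This is a minimal sketch of the arithmetic only, not the controller's implementation: the function name and the `min_replicas`/`max_replicas` defaults are illustrative assumptions, and readiness handling, missing metrics, and stabilization are left out.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """Sketch of the HPA replica formula with the tolerance band (illustrative only)."""
    ratio = current_value / target_value
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing so whole-number inputs avoid float rounding surprises.
    raw = math.ceil(current_replicas * current_value / target_value)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(10, 80, 50, max_replicas=20))  # 16: the worked example above
print(desired_replicas(10, 52, 50))                   # 10: ratio 1.04 is within tolerance
print(desired_replicas(10, 80, 50))                   # 10: capped by max_replicas
```

The third call shows why a low maximum can hide a scaling need: the formula asks for 16 replicas, but the bound holds the count at 10.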

KEDA: Event-Driven Autoscaling

KEDA is an open-source project that extends HPA by allowing scaling based on external event sources (e.g., Azure Service Bus, Azure Storage Queue, Kafka, RabbitMQ). It works alongside the standard HPA, not replacing it.

How KEDA Works:

1. ScaledObject: Users define a ScaledObject custom resource that specifies the target deployment and the event source (e.g., an Azure Service Bus queue). The ScaledObject includes the queue name, the authentication details (a connection string or an identity), and a target value (e.g., 10 messages per replica).

2. KEDA Operator: The KEDA operator runs in the cluster and watches ScaledObjects. It creates a corresponding HPA object that targets the deployment, using an external metric provided by KEDA's metrics server.

3. Metric Adapter: KEDA includes a metrics adapter that exposes external metrics to the Kubernetes API. The HPA then queries this adapter to get the current queue length or other event source metric.

4. Scaling: When the queue length per replica exceeds the target, the HPA scales up the deployment. When it drops below, the HPA scales down. The scaling behavior follows HPA rules (stabilization, tolerance).

Key Difference from HPA: HPA can only use built-in resource metrics (CPU/memory) or custom metrics that are available via the custom metrics API. KEDA provides a simple way to expose external metrics from dozens of event sources without writing custom adapters.

Azure-Specific Integration: KEDA can authenticate to Azure services using Pod Managed Identities or Azure AD Workload Identity, avoiding connection strings in secrets.
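For a queue trigger, the HPA that KEDA creates works toward an average number of messages per replica, so the replica count is roughly the queue length divided by the target. The sketch below illustrates that arithmetic under stated assumptions: the function name is hypothetical, the bounds mirror the ScaledObject fields minReplicaCount and maxReplicaCount (assumed defaults of 0 and 100), and activation thresholds, tolerance, and stabilization are ignored.

```python
import math

def keda_desired_replicas(queue_length, target_per_replica,
                          min_replica_count=0, max_replica_count=100):
    """Approximate replica count for a queue trigger with a per-replica target."""
    if queue_length <= 0:
        # No pending work: fall back to the configured floor (which may be zero).
        return min_replica_count
    raw = math.ceil(queue_length / target_per_replica)
    # Clamp to the ScaledObject's replica bounds.
    return max(min_replica_count, min(max_replica_count, raw))

print(keda_desired_replicas(23, 5))    # 5 replicas for 23 messages at 5 per replica
print(keda_desired_replicas(0, 5))     # 0: an empty queue allows scale to zero
print(keda_desired_replicas(1000, 5))  # 100: capped by max_replica_count
```

Scaling to zero is the part plain HPA cannot do on its own; KEDA handles the transition between zero and one replica and leaves the rest to the HPA.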

Cluster Autoscaler

Cluster Autoscaler (CA) is a Kubernetes component that automatically adjusts the number of nodes in a node pool. It is not part of core Kubernetes; in AKS it is a managed feature that you enable on a cluster or node pool, and AKS runs it for you in the managed control plane.

How Cluster Autoscaler Works:

1. Unschedulable Pods: CA watches for pods that are in 'Pending' state because no node has enough resources (CPU, memory, or other constraints like node affinity). It checks every 10 seconds (default scan interval).

2. Scale-Up Decision: If there are unschedulable pods, CA calculates the minimum number of nodes needed to schedule those pods. It considers the resource requests of the pending pods and the capacity of node templates (VM sizes). It then calls the Azure API to add new nodes to the node pool. The new nodes are provisioned and join the cluster (typically takes 2-5 minutes).

3. Scale-Down Decision: CA also checks for nodes that have been underutilized for a configurable period (default: 10 minutes). A node counts as underutilized when the CPU and memory requests of the pods on it, each taken as a fraction of the node's allocatable resources, fall below 50% (configurable via --scale-down-utilization-threshold). Some pods block scale-down of their node: kube-system pods that are not run by a DaemonSet and have no PodDisruptionBudget, and pods that cannot be evicted (e.g., because eviction would violate a PodDisruptionBudget).

4. Scale-Down Process: Once a node is identified for removal, CA first cordons the node (marks it unschedulable) and then evicts pods gracefully (respecting PDBs). After all pods are evicted, the node is deleted. If any pod fails to evict, CA aborts the scale-down for that node.

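Key Defaults and Timers:
- Scan interval: 10 seconds
- Scale-down utilization threshold: 0.5 (50%)
- Scale-down unneeded time: 10 minutes (a node must be underutilized for this long before removal)
- Max node count per node pool: no default; set with --max-count when enabling the autoscaler
- Min node count per node pool: no default; set with --min-count when enabling the autoscaler

In AKS, the timers and thresholds are tuned through the cluster autoscaler profile rather than by passing flags to the autoscaler directly.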
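The scale-down test combines the utilization threshold with the unneeded timer. The following sketch shows one way to express it, assuming utilization is the larger of the CPU and memory request ratios (which is how the upstream autoscaler computes it, to the best of my knowledge). The function names and the sample node sizes are illustrative, and blockers such as PodDisruptionBudgets are not modeled.

```python
def node_utilization(cpu_requested_m, cpu_allocatable_m,
                     mem_requested_mi, mem_allocatable_mi):
    """Utilization by requests: the larger of the CPU and memory ratios."""
    return max(cpu_requested_m / cpu_allocatable_m,
               mem_requested_mi / mem_allocatable_mi)

def is_scale_down_candidate(utilization, unneeded_minutes,
                            threshold=0.5, unneeded_time_minutes=10):
    """A node qualifies only if it is below the threshold for long enough."""
    return utilization < threshold and unneeded_minutes >= unneeded_time_minutes

# Hypothetical node: 4000m CPU and 8192 MiB memory allocatable.
light = node_utilization(1000, 4000, 2048, 8192)   # 25% CPU, 25% memory
busy = node_utilization(3000, 4000, 2048, 8192)    # 75% CPU dominates

print(is_scale_down_candidate(light, unneeded_minutes=12))  # True
print(is_scale_down_candidate(light, unneeded_minutes=5))   # False: timer not reached
print(is_scale_down_candidate(busy, unneeded_minutes=30))   # False: above threshold
```

Note that the check uses requests, not live usage: a node full of idle pods with large requests is not a scale-down candidate.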

Interaction Between HPA, KEDA, and Cluster Autoscaler

These three components work together in a feedback loop:

1. HPA or KEDA scales pods up/down based on metrics.

2. If there are not enough nodes to schedule new pods, pods remain pending.

3. Cluster Autoscaler detects pending pods and adds nodes.

4. After nodes are added, pods get scheduled.

5. When load decreases, HPA/KEDA scales down pods.

6. Cluster Autoscaler detects underutilized nodes and removes them.

Potential Pitfalls:
- Thrashing: If HPA scales up too aggressively and CA adds nodes, then HPA scales down quickly, CA may remove nodes too soon, leading to oscillation. Proper stabilization settings help.
- Delay: CA takes minutes to add nodes, so for sudden spikes, HPA may not be able to scale quickly enough. Consider using Pod Priority and Preemption or overprovisioning.
- Node Pool Constraints: CA only works within a node pool; it cannot add nodes from a different pool or VM size unless configured.
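The scale-out half of this loop, including what happens when the node pool's maximum is reached, can be sketched as simple capacity arithmetic. This is an illustration under strong assumptions: every pod is the same size, `pods_per_node` is a hypothetical fixed packing density, and the function name is made up for the example. The real autoscaler simulates scheduling against actual pod requests.

```python
import math

def plan_scale_out(desired_pods, pods_per_node, current_nodes, max_nodes):
    """Return (nodes_after, scheduled_pods, pending_pods) for one scale-out."""
    needed_nodes = math.ceil(desired_pods / pods_per_node)
    # Nodes are only added, never removed, in this step, and never beyond the max.
    nodes_after = min(max(current_nodes, needed_nodes), max_nodes)
    scheduled = min(desired_pods, nodes_after * pods_per_node)
    return nodes_after, scheduled, desired_pods - scheduled

# HPA asks for 40 pods, 4 pods fit per node, the pool currently has 3 nodes.
print(plan_scale_out(40, 4, 3, max_nodes=50))  # (10, 40, 0): all pods scheduled
print(plan_scale_out(40, 4, 3, max_nodes=5))   # (5, 20, 20): 20 pods stay Pending
```

The second call is the ceiling case: HPA has done its job, but the node pool's max count leaves half the pods unschedulable.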

Configuration and Verification Commands

Enable Cluster Autoscaler during AKS cluster creation:

az aks create --resource-group myRG --name myAKSCluster --node-count 1 --enable-cluster-autoscaler --min-count 1 --max-count 10

Enable on existing cluster:

az aks update --resource-group myRG --name myAKSCluster --enable-cluster-autoscaler --min-count 1 --max-count 10

Create HPA manifest (hpa.yaml):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

Apply HPA:

kubectl apply -f hpa.yaml

Verify HPA status:

kubectl get hpa

Install KEDA via Helm:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

Create ScaledObject (example for Azure Service Bus Queue):

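apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: azure-servicebus-queue-scaler
spec:
  scaleTargetRef:
    name: my-deployment            # Deployment to scale
  triggers:
  - type: azure-servicebus
    metadata:
      queueName: myqueue
      namespace: mynamespace       # Service Bus namespace; needed when authenticating with an identity
      connectionFromEnv: SERVICEBUS_CONNECTION_STRING   # env var on the target container
      messageCount: "5"            # target messages per replica (named queueLength in KEDA v1)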

Check Cluster Autoscaler events:

kubectl get events --namespace kube-system | grep cluster-autoscaler

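View CA status (on AKS the autoscaler runs in the managed control plane, so there is no autoscaler pod to read logs from; its logs are available through the cluster's resource logs, category cluster-autoscaler):

kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml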

Specific Numbers and Defaults to Memorize for Exam

HPA sync period: 15 seconds

HPA downscale stabilization: 5 minutes

HPA tolerance: 0.1 (10%)

Cluster Autoscaler scan interval: 10 seconds

Cluster Autoscaler scale-down utilization threshold: 0.5 (50%)

Cluster Autoscaler scale-down unneeded time: 10 minutes

Cluster Autoscaler max node count per pool: up to 1000 (but practical limits apply)

KEDA does not replace HPA; it creates HPA objects.

Cluster Autoscaler can be enabled per node pool (multiple node pools supported).

For Cluster Autoscaler to work, node pools must have a minimum size (min count) and maximum size (max count).

How These Interact with Other Azure Services

Azure Monitor and Container Insights: Provides metrics for HPA (CPU/memory) and can trigger alerts.

Azure Policy: Can enforce autoscaler configurations via policies.

Azure AD Pod Managed Identity: Used by KEDA to authenticate to Azure services without secrets.

Azure Load Balancer: Distributes traffic to pods; scaling affects load balancing.

Virtual Machine Scale Sets (VMSS): AKS uses VMSS for node pools; Cluster Autoscaler directly manages the VMSS instance count.

Exam Trap Patterns

Confusing HPA with Cluster Autoscaler: A common question describes a scenario where pods are pending due to insufficient node resources. The correct answer is to enable Cluster Autoscaler, not HPA. HPA only scales pods, not nodes.

Assuming KEDA replaces HPA: KEDA works with HPA; it does not replace it. The exam may ask which component to use for event-driven scaling; KEDA is the answer, but it still requires HPA underneath.

Forgetting downscale stabilization: Questions about rapid scaling down may require adjusting the downscale stabilization window.

Cluster Autoscaler minimum node count: If you set min count to 1, CA will never scale below 1 node, even if the cluster is idle. Expecting an idle pool to scale to zero with this setting is a common mistake.

Metrics Server: HPA requires the Metrics Server to be installed. In AKS, it is installed by default, but if you create a custom cluster, you must install it.

Walk-Through

1. Configure HPA for CPU-based scaling

Start by deploying your application and exposing it via a service. Then create an HPA manifest with a target CPU utilization (e.g., 50%). Apply with `kubectl apply -f hpa.yaml`. The HPA controller will begin querying the Metrics Server every 15 seconds. If the average CPU utilization across all pods exceeds 50%, it calculates the desired replica count and updates the deployment's replicas field. The deployment controller then creates new pods. Conversely, if utilization drops below 50% for at least 5 minutes (downscale stabilization), it scales down. Monitor with `kubectl get hpa` to see current and target metrics.

2. Configure KEDA for event-driven scaling

Install KEDA using Helm into the cluster. Create a ScaledObject that references your deployment and specifies an event source (e.g., an Azure Service Bus queue with a target of 5 messages, set by messageCount, which KEDA v1 called queueLength). The KEDA operator creates an HPA object that targets your deployment with an external metric. The HPA then queries KEDA's metrics adapter for the queue length. When the queue length exceeds 5 messages per replica, HPA scales up the deployment. When it drops below, HPA scales down. Ensure your deployment can handle the events and that the connection string or managed identity is correctly set up. Test by sending messages to the queue and observing the pod count.

3. Enable Cluster Autoscaler on node pool

Run `az aks update --resource-group <rg> --name <cluster> --enable-cluster-autoscaler --min-count 1 --max-count 10` for the node pool. AKS runs the Cluster Autoscaler in the managed control plane, so no autoscaler pod appears in kube-system. It scans every 10 seconds for pending pods. If any pods are pending due to resource constraints, it calculates the required nodes and calls the Azure API to add VMs to the VMSS. The new nodes join the cluster after 2-5 minutes. For scale-down, it looks for nodes that have been less than 50% utilized for 10 minutes, then cordons each one, evicts its pods, and deletes the node. Verify with `kubectl get nodes` and `kubectl describe pod` to see events.

4. Test scaling with load generation

Generate load on the application to trigger HPA. For example, use a pod that sends continuous HTTP requests: `kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://<service-ip>; done"`. Monitor HPA with `kubectl get hpa -w` and watch pod count increase. If the cluster is small, you may see pods become pending, which triggers Cluster Autoscaler to add nodes. After stopping load, observe scale-down: first HPA reduces pods, then after 10 minutes, CA removes nodes.

5. Troubleshoot scaling issues

Common issues: HPA not scaling because Metrics Server is not running (check `kubectl top pods`). Cluster Autoscaler not scaling because node pool min/max limits are reached (check `az aks show` for autoscaler config). KEDA not scaling because ScaledObject is misconfigured (check KEDA operator logs). Use `kubectl describe hpa` to see events and conditions. For CA, check `kubectl get events --namespace kube-system | grep cluster-autoscaler`. Ensure that pods have resource requests defined; without them, HPA cannot compute utilization. Also verify that the cluster has enough quota for additional VMs in the region.

What This Looks Like on the Job

In enterprise environments, AKS scaling is critical for cost management and performance. Consider a financial services company running a real-time transaction processing system on AKS. During peak trading hours, transaction volumes spike unpredictably. The team uses HPA with CPU and memory targets to scale the transaction processing pods. CPU utilization alone is not enough, though; they also need to scale on the number of pending messages in an Azure Service Bus queue, so they deploy KEDA to monitor queue length and scale pods accordingly. They also enable Cluster Autoscaler on their node pool (Standard_D4s_v3 VMs) with min=3 and max=50 to handle sudden node demand. During a flash crash, this combination scales out in sequence: KEDA feeds the queue metric to HPA, HPA scales pods, and CA adds nodes. Without CA, pods would remain pending and transactions would be delayed. A common misconfiguration is setting the CA max node count too low (e.g., 5), so scale-up hits the ceiling and pods stay pending. They learned this the hard way during a Black Friday sale, when the cluster hit max nodes and performance degraded.

Another scenario is a SaaS provider with multi-tenant workloads that have predictable daily patterns. They use HPA with custom metrics based on request latency, and PodDisruptionBudgets to ensure availability during scale-down. They set the CA scale-down unneeded time to 30 minutes to avoid removing nodes too quickly during brief lulls. One mistake they encountered was forgetting to set resource requests on pods; HPA could not calculate utilization, so it never scaled up. They now enforce resource requests via Azure Policy.

In a third scenario, a media streaming platform uses KEDA to scale encoding pods based on the length of an Azure Storage Queue. They use GPU-enabled node pools for encoding tasks. Cluster Autoscaler was configured to scale the GPU pool between 0 and 10 nodes, with min=0 to save costs. However, they discovered that scaling from 0 takes 5+ minutes because the VMSS must provision GPU VMs, which are slower to allocate. They mitigated this by keeping one node always running as a buffer. These real-world examples highlight the importance of proper configuration, testing, and monitoring of all three scaling components.

How AZ-104 Actually Tests This

The AZ-104 exam tests AKS scaling under objective 3.3 (Configure and manage scaling of Azure Kubernetes Service). Key areas:

1. HPA vs. Cluster Autoscaler: The exam will present scenarios where pods are pending or performance is poor. You must choose the correct autoscaler. Common wrong answer: selecting HPA when the issue is node capacity. Remember: HPA scales pods, Cluster Autoscaler scales nodes.

2. KEDA use cases: Questions may ask which tool to use for scaling based on Azure Service Bus queue length. Many candidates incorrectly choose HPA alone, forgetting that HPA cannot consume external metrics without a metrics adapter. KEDA is the correct answer, but it works with HPA.

3. Defaults and timers: The exam loves specific numbers. Memorize: HPA sync period 15s, downscale stabilization 5 min, tolerance 0.1; Cluster Autoscaler scan interval 10s, scale-down utilization threshold 0.5, unneeded time 10 min.

4. Configuration commands: You may be asked to enable Cluster Autoscaler on an existing AKS cluster. The correct command is az aks update --enable-cluster-autoscaler --min-count 1 --max-count 10. Wrong answer: using az aks scale (which only sets node count statically).

5. Multiple node pools: Cluster Autoscaler can be enabled per node pool. The exam may present a scenario with multiple node pools and ask which pool will scale. Remember that CA only scales a pool whose nodes could run the unschedulable pods (considering node affinity).

6. PodDisruptionBudget: CA respects PDBs during scale-down. If a PDB prevents eviction, the node may not be removed. This is a common edge case.

7. Resource requests: HPA requires pods to have resource requests defined. Without them, CPU utilization cannot be calculated, and HPA will not scale. This is a frequent trick.

8. Metrics Server: HPA depends on the Metrics Server. If it's not installed, HPA will not work. In AKS, it is pre-installed, but be aware for custom clusters.

9. KEDA authentication: KEDA can use Pod Managed Identity or connection strings. The exam may test secure methods.

10. Scaling behavior: The exam may ask about the order of operations. For example, when load increases: HPA scales pods first, then if nodes are insufficient, CA adds nodes. Not the reverse.

To eliminate wrong answers, focus on the underlying mechanism: if the question mentions 'pending pods', the solution involves Cluster Autoscaler. If it mentions 'metric-based scaling', think HPA or KEDA. If it mentions 'event-driven', think KEDA. Always consider the resource requests and limits.

Key Takeaways

HPA scales pod replicas based on CPU/memory utilization or custom metrics; it does not scale nodes.

Cluster Autoscaler scales nodes up when pods are pending due to resource constraints, and scales down underutilized nodes after 10 minutes.

KEDA enables event-driven scaling by creating HPA objects that consume external metrics from sources like Azure Service Bus.

HPA default sync period is 15 seconds; downscale stabilization is 5 minutes; tolerance is 10%.

Cluster Autoscaler default scan interval is 10 seconds; scale-down utilization threshold is 50%; unneeded time is 10 minutes.

For HPA to work, pods must have resource requests defined.

Cluster Autoscaler must be enabled per node pool with min and max count set.

KEDA does not replace HPA; it works with it.

Scaling order: HPA/KEDA scales pods first; if nodes are insufficient, Cluster Autoscaler adds nodes.

PodDisruptionBudgets can block Cluster Autoscaler from removing nodes.

In AKS, the Metrics Server is pre-installed; in custom clusters, it must be deployed separately.

The command to enable Cluster Autoscaler on an existing AKS cluster is `az aks update --enable-cluster-autoscaler --min-count 1 --max-count 10`.

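Setting min node count to 0 is only possible on user node pools; a system node pool must keep at least 1 node so system pods have somewhere to run, and scaling up from 0 adds startup delay.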

KEDA can use Azure AD Pod Managed Identity for secure authentication to Azure services.

Cluster Autoscaler only scales within a node pool; it cannot add nodes from a different pool.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Horizontal Pod Autoscaler (HPA)

Scales the number of pod replicas.

Uses CPU/memory or custom metrics.

Sync period: 15 seconds.

Downscale stabilization: 5 minutes.

Does not add or remove nodes.

Cluster Autoscaler

Scales the number of nodes in a node pool.

Triggers on unschedulable pods.

Scan interval: 10 seconds.

Scale-down unneeded time: 10 minutes.

Adds or removes VMs via Azure API.

KEDA

Extends HPA to support event-driven scaling.

Supports Azure Service Bus, Kafka, etc.

Requires KEDA operator and ScaledObject CRD.

Can authenticate via managed identity.

Creates HPA objects automatically.

HPA (native)

Built into Kubernetes.

Only supports resource metrics and custom metrics API.

No external event sources natively.

Requires custom adapter for external metrics.

Manual configuration of metrics.

Watch Out for These

Mistake

HPA can scale nodes directly.

Correct

HPA only scales the number of pod replicas. It does not interact with nodes. Node scaling is handled by Cluster Autoscaler.

Mistake

KEDA replaces HPA entirely.

Correct

KEDA works alongside HPA. It creates HPA objects that use custom metrics provided by KEDA. HPA is still responsible for the actual scaling decision.

Mistake

Cluster Autoscaler scales down nodes immediately when pods are removed.

Correct

Cluster Autoscaler has a default scale-down unneeded time of 10 minutes. A node must be underutilized (below 50% of allocatable resources) for that duration before it is considered for removal.

Mistake

Setting min node count to 0 is safe and saves costs.

Correct

If min count is 0, Cluster Autoscaler can scale that pool down to 0 nodes. AKS allows this only on user node pools; a system node pool must keep at least 1 node to run system pods. Even on a user pool, the first new pod then waits several minutes while a node is provisioned, so keep min count at 1 or higher when startup latency matters.

Mistake

HPA works without resource requests defined on pods.

Correct

HPA computes utilization as actual usage divided by the pods' resource requests. The Metrics Server still reports raw usage, but if pods do not have CPU or memory requests, HPA has nothing to divide by, cannot compute a utilization percentage, and will not scale on that metric.
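To see why requests matter, here is a small sketch of how a utilization percentage might be derived from usage and requests. It is illustrative only: the function name and the tuple layout are assumptions, and returning `None` stands in for the error the controller reports when a request is missing.

```python
def cpu_utilization_percent(pods):
    """pods: list of (usage_millicores, request_millicores or None) tuples."""
    if not pods or any(request is None for _, request in pods):
        # Without a request on every pod there is no denominator to divide by.
        return None
    total_usage = sum(usage for usage, _ in pods)
    total_request = sum(request for _, request in pods)
    return 100 * total_usage // total_request

print(cpu_utilization_percent([(200, 500), (300, 500)]))   # 50
print(cpu_utilization_percent([(200, 500), (300, None)]))  # None: one pod lacks a request
```

Usage is known in both calls; only the first yields a percentage that can be compared against a target such as 50%.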


Frequently Asked Questions

What is the difference between HPA and Cluster Autoscaler in AKS?

HPA (Horizontal Pod Autoscaler) adjusts the number of pod replicas based on metrics like CPU utilization. Cluster Autoscaler adjusts the number of nodes in a node pool when pods cannot be scheduled due to insufficient resources. In short, HPA scales pods, Cluster Autoscaler scales nodes. They work together: HPA may request more pods, and if the cluster is full, Cluster Autoscaler adds nodes to accommodate them.

Can I use HPA without the Metrics Server?

No, HPA requires the Metrics Server to collect resource metrics from nodes. Without it, HPA cannot obtain CPU/memory utilization data and will not scale. In AKS, the Metrics Server is installed by default, but if you create a custom cluster, you must install it manually.

How do I enable Cluster Autoscaler on an existing AKS cluster?

Use the Azure CLI command: `az aks update --resource-group <rg> --name <cluster> --enable-cluster-autoscaler --min-count 1 --max-count 10`. You can also enable it during cluster creation with `az aks create --enable-cluster-autoscaler`. The min and max counts are required.

Does KEDA replace HPA?

No, KEDA works alongside HPA. KEDA creates HPA objects that use custom metrics from external event sources. The actual scaling decision and mechanism are still handled by HPA. KEDA simplifies the configuration of event-driven scaling.

What happens if Cluster Autoscaler cannot scale up due to quota limits?

If the Azure subscription has insufficient vCPU quota in the region, Cluster Autoscaler will fail to add nodes. It will log an error and retry periodically. You must increase the quota via Azure support or reduce the max node count.
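A quick way to reason about this limit is to compare the remaining vCPU quota with the size of each node. The sketch below is a rough planning aid, not an Azure API call: the function names are hypothetical, a single quota number is assumed (real subscriptions have both regional and per-VM-family quotas), and 4 vCPUs per node matches a Standard_D4s_v3.

```python
def nodes_addable_within_quota(quota_vcpus, used_vcpus, vcpus_per_node):
    """How many more nodes of this size fit in the remaining vCPU quota."""
    return max(0, (quota_vcpus - used_vcpus) // vcpus_per_node)

def effective_max_nodes(current_nodes, configured_max,
                        quota_vcpus, used_vcpus, vcpus_per_node):
    """The node count the pool can really reach: the lower of max-count and quota."""
    headroom = nodes_addable_within_quota(quota_vcpus, used_vcpus, vcpus_per_node)
    return min(configured_max, current_nodes + headroom)

# 3 nodes of 4 vCPUs already use 12 of a 100 vCPU quota; max-count is 50.
print(nodes_addable_within_quota(100, 12, 4))   # 22
print(effective_max_nodes(3, 50, 100, 12, 4))   # 25: quota, not max-count, is the limit
```

In this example the configured maximum of 50 nodes is unreachable; scale-up would start failing at 25 nodes until the quota is raised.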

Why is my HPA not scaling down even after load decreases?

HPA has a downscale stabilization window (default 5 minutes). When scaling down, it acts on the highest replica recommendation seen during that window, so the replica count only drops once load has stayed low for the whole period. Additionally, check the tolerance (10%): if the ratio of the current metric to the target is within 10% of 1.0, no scaling occurs. Also ensure that the metrics are actually decreasing.
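The stabilization window can be pictured as taking the maximum over recent recommendations. The following is a minimal sketch of that idea, assuming recommendations are recorded as (timestamp, replicas) pairs and that the list includes the recommendation made at `now`; the function name and data layout are illustrative, not the controller's own.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the highest recommendation made within the stabilization window.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs,
    including the one computed at `now`.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    return max(recent)

history = [(0, 10), (120, 6), (240, 4), (300, 4)]
print(stabilized_scale_down(history, now=300))  # 10: the old peak is still in the window

history.append((420, 4))
print(stabilized_scale_down(history, now=420))  # 6: the peak at t=0 has aged out

history.append((600, 4))
print(stabilized_scale_down(history, now=600))  # 4: only low recommendations remain
```

Load dropped at t=240, yet the replica count reaches 4 only at t=600, which is the delay people usually notice.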

Can I use multiple metrics in one HPA?

Yes, HPA supports multiple metrics. It calculates the desired replica count for each metric and then takes the maximum. For example, if CPU suggests 10 replicas and memory suggests 8, HPA will use 10. This is important for exam scenarios where both CPU and memory are monitored.
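The "take the maximum" rule is easy to check by hand. This sketch applies the replica formula per metric and picks the largest proposal; the function names and the dictionary layout are assumptions made for the example, and tolerance and replica bounds are omitted for brevity.

```python
import math

def desired_for_metric(current_replicas, current_value, target_value):
    """Replica formula for a single metric (tolerance and bounds omitted)."""
    return math.ceil(current_replicas * current_value / target_value)

def desired_across_metrics(current_replicas, metrics):
    """metrics: dict of name -> (current_value, target_value)."""
    proposals = {name: desired_for_metric(current_replicas, current, target)
                 for name, (current, target) in metrics.items()}
    # The HPA acts on the largest proposal so that no metric is left over target.
    return max(proposals.values()), proposals

chosen, proposals = desired_across_metrics(
    5, {"cpu": (100, 50), "memory": (80, 50)})
print(proposals)  # {'cpu': 10, 'memory': 8}
print(chosen)     # 10
```

With 5 replicas, CPU at 100% against a 50% target proposes 10 replicas and memory at 80% proposes 8, so the HPA scales to 10.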
