This chapter covers Azure Kubernetes Service (AKS) node pools and cluster upgrades, two critical topics for managing AKS clusters in production. On the AZ-104 exam, these concepts appear in roughly 5–10% of questions, often in scenario-based questions about cost optimization, high availability, and upgrade strategies. You will learn how to configure multiple node pools, manage node scaling, plan and execute cluster upgrades, and avoid common pitfalls that lead to application downtime.
Think of an AKS cluster as an airline company. The cluster itself is the airline brand, and each node pool is a specific fleet of aircraft. The default node pool is your main fleet of workhorse planes (e.g., Boeing 737s) that handle the majority of flights. You can add extra node pools for specialized missions: a pool of small regional jets for short hops (low-CPU pods), a pool of cargo planes for heavy freight (GPU-intensive workloads), and a pool of cheap leased planes that the lessor can recall at any time (spot instances). Each fleet has its own configuration: engine type (VM size), number of planes (node count), and maintenance schedule (upgrade settings). When you need to upgrade the airline's planes to a new model (Kubernetes version), you don't ground all planes at once. Instead, you upgrade each fleet in turn, draining passengers (pods) from each plane before taking it out of service. Take extra care with the default fleet, because it handles critical hub operations (system pods). You can also add a new, upgraded fleet (new node pool), gradually transfer flights, and then retire the old fleet; this is a blue-green upgrade strategy. Because the leased planes (spot instances) can be recalled at any time, you must ensure their cargo (pods) can be moved to other fleets quickly.
What Are Node Pools?
An AKS cluster consists of a control plane (managed by Microsoft) and one or more node pools. A node pool is a group of virtual machines that all run the same configuration: VM size, OS version, Kubernetes version, scaling settings, and node labels. The first node pool created with the cluster is called the default node pool. It is a system node pool, and because a cluster must always contain at least one system node pool, the default pool can be deleted only after another system pool exists. You can add additional node pools (user node pools) to run workloads that require different resources, such as GPU VMs for machine learning or memory-optimized VMs for databases.
Why Multiple Node Pools?
Workload isolation: Run critical system pods (CoreDNS, metrics-server) in the default pool and application pods in user pools. This prevents noisy neighbors from affecting cluster operations.
Cost optimization: Use spot node pools for non-critical, fault-tolerant workloads. Spot VMs offer up to 90% discount compared to pay-as-you-go but can be evicted at any time.
Resource specialization: Different workloads have different resource requirements. A single pool with large VMs wastes money on small pods; a single pool with small VMs cannot run memory-intensive pods. Multiple pools allow right-sizing.
Compliance and security: Some workloads require specific OS versions (e.g., Ubuntu 18.04 vs. 20.04) or need to be on isolated hardware (e.g., confidential computing). Node pools can be configured with different OS SKUs or even different VM series.
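As a rough sketch of the cost-optimization case above, the function below wraps the `az aks nodepool add` call for an autoscaled spot user pool. The function name, its positional arguments, the VM size, and the autoscaler limits are illustrative assumptions and not values from this chapter.

```shell
# Hypothetical helper: add an autoscaled spot user pool to an existing cluster.
# Usage: add_spot_pool <resource-group> <cluster-name> <pool-name>
add_spot_pool() {
  # --spot-max-price -1 means "pay up to the current on-demand price".
  az aks nodepool add \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --mode User \
    --priority Spot \
    --eviction-policy Delete \
    --spot-max-price -1 \
    --node-vm-size Standard_DS2_v2 \
    --enable-cluster-autoscaler \
    --min-count 0 \
    --max-count 5
}
```

A call such as `add_spot_pool myResourceGroup myAKSCluster spotpool` would create the pool. AKS taints spot nodes with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so workloads meant for the pool need a matching toleration.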
Node Pool Configuration
When you create a node pool, you specify:
- Node count: Initial number of nodes. Can be scaled manually or automatically using cluster autoscaler.
- VM size: The Azure VM SKU (e.g., Standard_DS2_v2).
- OS disk size and type: Default is 128 GB Premium SSD. Can be increased up to 4 TB.
- OS type: Linux or Windows. Windows node pools require a Windows OS SKU (e.g., Windows Server 2019).
- Node labels and taints: Used for pod scheduling constraints. For example, taint a GPU pool with nvidia.com/gpu=true:NoSchedule so only pods with a corresponding toleration can run there.
- Availability zones: Distribute nodes across zones for high availability. Default is none (not zone-redundant).
- Scaling method: Manual or autoscaler. The cluster autoscaler automatically adjusts node count based on pod resource requests.
- Upgrade settings: Max surge is set per node pool (system or user) and controls how many extra nodes are created during an upgrade. Automatic upgrades are configured at the cluster level through an auto-upgrade channel such as patch, stable, rapid, or node-image.
System vs. User Node Pools
System node pools: Host critical system pods such as CoreDNS and metrics-server. Every cluster needs at least one, and the first pool created with the cluster is always a system pool. System pools must run Linux. A cluster can have more than one, and a system pool can be deleted only while another system pool remains. They support the CriticalAddonsOnly=true:NoSchedule taint to keep application pods off them.
User node pools: Created for application workloads, on Linux or Windows. A cluster supports up to 100 node pools in total. User pools can be deleted independently. System pods prefer system pools but can still land on a user pool that carries no blocking taint.
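To make the system pool settings concrete, here is a minimal sketch that adds a dedicated system pool carrying the CriticalAddonsOnly taint. The function name, argument order, and node count of 3 are assumptions chosen for illustration.

```shell
# Hypothetical helper: add a dedicated Linux system pool that repels app pods.
# Usage: add_system_pool <resource-group> <cluster-name> <pool-name>
add_system_pool() {
  az aks nodepool add \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --mode System \
    --node-count 3 \
    --node-taints CriticalAddonsOnly=true:NoSchedule
}
```

Once a second system pool like this one is healthy, the original system pool is no longer the only one in the cluster.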
Cluster Upgrades
AKS publishes the supported Kubernetes versions and runs the control plane for you, but you decide when to upgrade the control plane and the node pools, either manually or through an auto-upgrade channel. The control plane version sets the maximum node pool version (nodes cannot be newer than the control plane). You can upgrade the control plane independently of node pools, but upgrading them together is recommended.
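The "nodes cannot be newer than the control plane" rule can be sketched as a small version comparison. The function name is hypothetical, and the sketch assumes a `sort` that supports `-V` (version sort), as GNU coreutils does. It checks only the upper bound, not how far behind a node pool is allowed to lag.

```shell
# Succeeds (exit 0) when a node pool at version $1 is not newer than
# a control plane at version $2.
nodepool_version_allowed() {
  # Version-sort the two values; the pool is allowed if it sorts first (or ties).
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$1" ]
}
```

For example, `nodepool_version_allowed 1.26.3 1.27.1` succeeds, while `nodepool_version_allowed 1.28.0 1.27.1` fails because the pool would be ahead of the control plane.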
Supported versions: AKS supports three minor versions at any time (N-2). For example, if 1.27 is the latest, then 1.26 and 1.25 are supported. You must upgrade before the version goes out of support.
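Before planning an upgrade, it helps to see which versions the cluster can move to. The sketch below wraps `az aks get-upgrades`; the function name and argument order are illustrative assumptions.

```shell
# Hypothetical helper: list the Kubernetes versions this cluster can upgrade to.
# Usage: show_cluster_upgrades <resource-group> <cluster-name>
show_cluster_upgrades() {
  az aks get-upgrades \
    --resource-group "$1" \
    --name "$2" \
    --output table
}
```

To see every version AKS currently offers in a region, independent of any cluster, `az aks get-versions --location <region> --output table` is the companion command.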
Upgrade process:
1. Upgrade the control plane first (or together with node pools).
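2. Upgrade each node pool. AKS performs a rolling update: it adds surge nodes running the new version (as allowed by max surge), cordons and drains an old node, removes or reimages it, and repeats until every node runs the new version. Evicted pods are rescheduled on the remaining nodes.
3. During the upgrade, max surge controls how many extra nodes are created. For example, max-surge=1 creates one extra node before draining the first old node, which reduces disruption. The value can be a node count or a percentage of the pool size. The default does not depend on whether the pool is a system or user pool: it has historically been one extra node, and recent AKS releases document a percentage-based default for new pools.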
4. The entire upgrade can take 10–30 minutes per node pool, depending on node count and workload.
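When max surge is given as a percentage, the resulting node count is worth working out by hand. The sketch below assumes that fractional nodes are rounded up, which is how the AKS documentation describes percentage values; the function name is hypothetical.

```shell
# Print the number of surge nodes for a pool of $1 nodes and a max surge of $2.
# $2 is either an absolute count ("2") or a percentage ("33%").
surge_nodes() {
  count="$1"
  surge="$2"
  case "$surge" in
    *%)
      pct="${surge%\%}"
      # Integer ceiling of count * pct / 100.
      echo $(( (count * pct + 99) / 100 ))
      ;;
    *)
      echo "$surge"
      ;;
  esac
}
```

With these assumptions, `surge_nodes 10 33%` prints 4 (3.3 rounded up) and `surge_nodes 5 1` prints 1. The surge nodes count against the subscription's VM quota while the upgrade runs.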
Upgrade strategies:
- In-place upgrade: The default. Nodes are upgraded one by one. Simple, but the pool may be briefly short on capacity.
- Blue-green upgrade: Create a new node pool with the new version, migrate workloads by cordoning and draining the old pool, then delete the old pool. This needs more VMs temporarily but reduces the risk of downtime.
- Canary upgrade: Create a small node pool with the new version, move a subset of workloads to it, validate, then upgrade the remaining pools.
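The migration step of a blue-green upgrade can be sketched with plain kubectl commands. This assumes that AKS labels every node with `agentpool=<pool name>`; the function name is hypothetical, and the drain flags shown are common choices, not requirements.

```shell
# Hypothetical helper: cordon, then drain, every node in the pool named $1.
# Usage: drain_pool <old-pool-name>
drain_pool() {
  nodes="$(kubectl get nodes -l "agentpool=$1" -o name)"
  # Cordon everything first so evicted pods cannot land on another old node.
  for node in $nodes; do
    kubectl cordon "$node"
  done
  # Drain one node at a time; evictions respect PodDisruptionBudgets.
  for node in $nodes; do
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done
}
```

After `drain_pool oldpool` completes and the workloads are healthy on the new pool, the old pool can be removed with `az aks nodepool delete`.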
Cluster Autoscaler
The cluster autoscaler automatically adjusts the number of nodes in a node pool. It adds nodes when pods are pending because no node can satisfy their resource requests, and it removes nodes that stay underutilized. It works with both system and user pools. It does not scale down below the minimum node count. It respects pod disruption budgets (PDBs) during scale-down to avoid evicting pods that cannot be interrupted. By default it re-evaluates the cluster every 10 seconds (the scan interval), looking for both pending pods and underutilized nodes.
Key settings:
- --min-count and --max-count: Node count limits, set per node pool.
- scale-down-delay-after-add: How long to wait after a scale-up before considering scale-down (default 10 minutes).
- scale-down-unneeded-time: How long a node must be underutilized before being removed (default 10 minutes).
- skip-nodes-with-system-pods: Prevents scaling down nodes running critical system pods (default true).
The last three are cluster-wide values in the cluster autoscaler profile, not per-pool flags.
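As a sketch of how the profile values above are applied, the function below sets them with `az aks update --cluster-autoscaler-profile`. The values shown simply restate the defaults listed in this section; the function name and arguments are assumptions.

```shell
# Hypothetical helper: set cluster-wide autoscaler profile values.
# Usage: set_autoscaler_profile <resource-group> <cluster-name>
set_autoscaler_profile() {
  az aks update \
    --resource-group "$1" \
    --name "$2" \
    --cluster-autoscaler-profile \
      scan-interval=10s \
      scale-down-delay-after-add=10m \
      scale-down-unneeded-time=10m \
      skip-nodes-with-system-pods=true
}
```

Because the profile is cluster-wide, a change here affects every node pool that has the autoscaler enabled.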
Node Image Updates
AKS periodically releases new node images with security patches. You can upgrade node images without changing the Kubernetes version using az aks nodepool upgrade --node-image-only. This is important for staying patched without a full version upgrade.
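A node image refresh could be scripted along these lines. The two function names and their arguments are illustrative assumptions; the first checks what image is available for a pool, and the second applies it.

```shell
# Hypothetical helper: show the latest node image available for a pool.
# Usage: show_latest_node_image <resource-group> <cluster-name> <pool-name>
show_latest_node_image() {
  az aks nodepool get-upgrades \
    --resource-group "$1" \
    --cluster-name "$2" \
    --nodepool-name "$3"
}

# Hypothetical helper: upgrade only the node image, keeping the Kubernetes version.
# Usage: upgrade_node_image <resource-group> <cluster-name> <pool-name>
upgrade_node_image() {
  az aks nodepool upgrade \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --node-image-only
}
```

An image-only upgrade still cordons and drains nodes, so PDBs and spare capacity matter just as they do for a version upgrade.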
Commands and Verification
Create a node pool:
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name gpunodepool \
  --node-count 3 \
  --node-vm-size Standard_NC6 \
  --node-taints nvidia.com/gpu=true:NoSchedule \
  --labels workload=gpu \
  --zones 1 2 3
List node pools:
az aks nodepool list --resource-group myResourceGroup --cluster-name myAKSCluster --output table
Upgrade a node pool:
az aks nodepool upgrade \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool \
  --kubernetes-version 1.26.0 \
  --max-surge 1
Upgrade cluster (control plane and all node pools):
az aks upgrade \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --kubernetes-version 1.26.0
Enable cluster autoscaler:
az aks nodepool update \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name mynodepool \
  --enable-cluster-autoscaler \
  --min-count 2 \
  --max-count 10
How It Interacts with Related Technologies
Azure Policy: You can enforce node pool configurations (e.g., require certain tags) using Azure Policy for Kubernetes.
Azure Monitor: Container Insights provides metrics on node and pod resource usage, helpful for right-sizing node pools.
Azure CNI: Each node pool can use the same or different virtual network subnet. For advanced networking, you can assign a different subnet per node pool.
Microsoft Entra ID (formerly Azure Active Directory): Integration for authentication is configured at the cluster level; node pools have no identity-provider configuration of their own.
Kubernetes RBAC: RBAC controls who can perform which operations against the Kubernetes API. Pod placement is handled by the scheduler using node labels, taints, and tolerations, and node pools themselves are not RBAC objects.
Upgrade Considerations
Pod Disruption Budgets (PDBs): During upgrade, pods are evicted. PDBs can prevent the upgrade from proceeding if too many pods would be disrupted simultaneously. For example, if you have 3 replicas and a PDB of maxUnavailable: 1, the upgrade will only evict one pod at a time.
Node Selectors and Tolerations: Ensure your pods have appropriate node selectors and tolerations to land on the correct node pool. Misconfiguration can cause pods to remain pending.
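Storage: Azure Disks are mounted by one node at a time and, unless zone-redundant, are tied to a single availability zone. When a pod is evicted, its disk must detach from the old node and reattach to the new one, and the replacement node must be in the same zone as the disk. Use the WaitForFirstConsumer volume binding mode so disks are created in the zone where the pod is scheduled, and keep spare capacity in every zone you use.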
Windows nodes: Windows node pools have additional considerations: they require a Windows OS image, and not all Kubernetes versions are available for Windows. Upgrades may take longer due to image size.
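The PDB arithmetic from the list above can be worked through with a small helper. This is a sketch of the maxUnavailable case only, with a hypothetical function name: the allowed disruptions are the healthy pods minus the number that must stay healthy, never below zero.

```shell
# Print how many pods may be evicted right now under a maxUnavailable PDB.
# $1 = desired replicas, $2 = currently healthy replicas, $3 = maxUnavailable
pdb_allowed_disruptions() {
  # Pods that must stay healthy = desired - maxUnavailable.
  allowed=$(( $2 - ($1 - $3) ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

With 3 replicas and maxUnavailable of 1, `pdb_allowed_disruptions 3 3 1` prints 1, and once one pod is already down `pdb_allowed_disruptions 3 2 1` prints 0, which is the state in which a drain has to wait. On a live cluster, `kubectl get pdb --all-namespaces` shows the same figure in its ALLOWED DISRUPTIONS column.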
Default Values and Limits
Default node pool VM size: Standard_DS2_v2.
Default node count: 3 when creating a cluster with the Azure CLI; in the portal it depends on the selected preset.
Maximum node count per node pool: 1000 (with quota).
Maximum node pools per cluster: 100.
Maximum surge: configured per node pool as a node count or a percentage of pool size (up to 100%); the default is the same for system and user pools.
Minimum node count for autoscaler: 0 for user pools, 1 for system pools (a system pool should have at least 2 nodes for redundancy).
Supported Kubernetes versions: N-2 minor versions.
Drain timeout during upgrades: 30 minutes per node by default (configurable).
Create a New Node Pool
First, identify the resource group and AKS cluster name. Use the `az aks nodepool add` command with parameters for VM size, node count, taints, labels, and availability zones. The command creates the VMs in the same virtual network as the cluster. The new nodes automatically join the cluster and are registered as Kubernetes nodes. The process takes 5-10 minutes. Verify with `kubectl get nodes` and look for the new node names (they include the nodepool name).
Schedule Pods to Node Pool
To ensure pods land on the correct node pool, add a nodeSelector and, if the pool is tainted, a matching toleration to the pod spec. For example, if the node pool has a label `workload=gpu`, set `nodeSelector: {workload: gpu}` in the deployment YAML. If the node pool has a taint `nvidia.com/gpu=true:NoSchedule`, add a toleration with `key: nvidia.com/gpu`, `operator: Exists`, and `effect: NoSchedule`. A toleration only permits scheduling onto tainted nodes; the nodeSelector is what steers the pod there. The Kubernetes scheduler matches pods to nodes based on these constraints, and a pod that no node can accept remains pending.
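Put together, the label and taint from this section translate into a pod spec like the one printed below. The function name, pod name, and container image are placeholders chosen for illustration; only the nodeSelector and toleration values come from the text.

```shell
# Print a Pod manifest that targets the GPU pool.
# The pod name and image are placeholders, not values from this chapter.
gpu_pod_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo
spec:
  nodeSelector:
    workload: gpu
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: main
      image: registry.k8s.io/pause:3.9
EOF
}
```

Piping the output to kubectl, as in `gpu_pod_manifest | kubectl apply -f -`, would create the pod, and `kubectl get pod gpu-demo -o wide` would show which node it landed on.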
Scale Node Pool Manually
Use `az aks nodepool scale` to change the node count. For example, `az aks nodepool scale --resource-group myRG --cluster-name myCluster --name mypool --node-count 5`. This adds or removes VMs. When scaling down, nodes are cordoned and drained before deletion. Pods are evicted and rescheduled on remaining nodes. The process respects PDBs. Scaling up adds new VMs that register as nodes, ready to schedule pods. Manual scaling takes effect as soon as you run the command; to automate it, enable the cluster autoscaler on the pool instead.
Enable Cluster Autoscaler
Update the node pool with `--enable-cluster-autoscaler` and set min and max counts. The autoscaler scans the cluster every 10 seconds by default. If pods are pending because no node has room for their resource requests, it increases the node count up to the maximum. If a node's requested CPU and memory stay below 50% of its capacity for 10 minutes, the node becomes a scale-down candidate. The autoscaler will not remove nodes running system pods or pods whose PDBs would be violated. It also waits 10 minutes after a scale-up before considering scale-down.
Upgrade a Node Pool
Use `az aks nodepool upgrade` with the target Kubernetes version. The control plane must already be at that version or higher. The upgrade uses a rolling update: for each node, it creates a new node with the new version (if max-surge > 0), cordons and drains the old node, evicts pods (respecting PDBs), then deletes the old node. The process repeats for all nodes. The entire upgrade can be monitored with `az aks nodepool show` to check provisioning state. If max-surge is set to 1, one extra node is created at a time, minimizing resource impact.
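The monitoring step could be sketched as a polling loop over `az aks nodepool show`. The function name, the 30-second interval, and the set of terminal states checked are assumptions made for illustration.

```shell
# Hypothetical helper: poll a node pool until its provisioning state settles.
# Usage: wait_for_pool <resource-group> <cluster-name> <pool-name>
wait_for_pool() {
  while true; do
    state="$(az aks nodepool show \
      --resource-group "$1" \
      --cluster-name "$2" \
      --name "$3" \
      --query provisioningState \
      --output tsv)"
    echo "provisioningState: $state"
    case "$state" in
      Succeeded) return 0 ;;
      Failed|Canceled) return 1 ;;
    esac
    # Any other state (for example Upgrading) means the operation is still running.
    sleep 30
  done
}
```

While the loop runs, `kubectl get nodes` in another terminal shows surge nodes joining and old nodes being cordoned.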
Enterprise Scenario 1: Multi-Tier Application Isolation
A financial services company runs a three-tier application: a frontend (NGINX), a backend (Node.js), and a database (MongoDB). They use a single node pool with Standard_DS3_v2 VMs. The database pods consume high memory, causing the frontend pods to be evicted during peak hours. The solution: create three node pools. The frontend pool uses small VMs (Standard_DS2_v2) with autoscaling from 2 to 10 nodes. The backend pool uses medium VMs (Standard_DS3_v2) with autoscaling from 3 to 6. The database pool uses memory-optimized VMs (Standard_E4s_v3) with fixed count of 3. Each pool has taints to prevent other tiers from scheduling there. This eliminates resource contention and reduces costs by 30% because the frontend pool can scale down during off-peak hours. The cluster autoscaler is enabled on the frontend and backend pools. They also set PDBs on each tier to ensure availability during upgrades.
Enterprise Scenario 2: GPU Workloads with Spot Instances
A media company runs batch video rendering jobs on AKS. They need GPU VMs (Standard_NC6) for rendering but want to minimize cost. They create two node pools: a regular GPU pool with pay-as-you-go VMs for critical jobs, and a spot GPU pool with Standard_NC6 Spot VMs for non-critical batch jobs. The spot pool has a taint instance-type=spot:NoSchedule and a label workload=render. Batch pods have a toleration for the spot taint. The spot pool is configured with cluster autoscaler from 0 to 20 nodes. When spot VMs are evicted, the pods are rescheduled to the regular pool (which has no taint) or remain pending until new spot nodes become available. This setup saves 70% on compute costs for batch jobs. However, they must ensure the regular pool has enough capacity to absorb evicted pods. They also use PDBs to limit how many batch pods are disrupted at once during planned operations such as upgrades. PDBs do not protect against spot evictions, so the jobs themselves must tolerate interruption.
Common Misconfigurations
Taints without tolerations: Operators add taints to node pools but forget to add tolerations to pods, causing pods to remain pending. Always test with a simple deployment.
Incorrect autoscaler limits: Setting min-count too high (e.g., 5) wastes money when no workloads are running. Setting max-count too low causes pods to remain pending during traffic spikes.
Not upgrading node images: Clusters become vulnerable to CVEs. Schedule monthly node image upgrades.
Mixing system and user pods on same pool: System pods can be starved by user pods. Use a dedicated system pool with taint CriticalAddonsOnly=true:NoSchedule.
AZ-104 Exam Focus on Node Pools and Upgrades
The AZ-104 exam (Objective 3.3) tests your ability to manage AKS clusters, including node pools and upgrades. Questions often appear as scenario-based multiple-choice or drag-and-drop. Here's what you need to know:
Specific Objective Codes:
- 3.3.1: Implement and manage node pools (create, scale, configure taints/labels).
- 3.3.2: Plan and execute cluster upgrades (control plane, node pools, node image).
- 3.3.3: Configure cluster autoscaler (enable, set min/max, understand behavior).
Common Wrong Answers and Why:
1. "Upgrade all node pools at once" – Candidates think you can upgrade all pools simultaneously. Reality: You must upgrade each pool individually or use az aks upgrade which upgrades control plane and all pools sequentially. You cannot upgrade multiple pools in parallel.
2. "Cluster autoscaler scales based on actual resource usage" – Candidates think autoscaler scales based on CPU/memory utilization. Reality: Scale-up is triggered by pending pods that cannot be scheduled. Scale-down looks at the resource requests of the pods on a node, not measured usage: a node becomes a removal candidate after its requests stay below the threshold for 10 minutes.
3. "The default node pool can never be deleted" – Candidates think the first pool is permanently tied to the cluster. Reality: A cluster must always have at least one system node pool, so the default pool cannot be deleted while it is the only system pool. After you add another system pool, the original can be deleted.
4. "Max surge is a single cluster-wide setting" – Candidates think one max surge value covers every pool. Reality: Max surge is configured per node pool, and the default is the same for system and user pools. Higher values speed up upgrades but need more quota and disrupt more pods at once.
Numbers and Values:
Default node pool VM size: Standard_DS2_v2.
Maximum node pools per cluster: 100.
Supported Kubernetes versions: N-2.
Autoscaler scale-up check interval: 10 seconds.
Autoscaler scale-down unneeded time: 10 minutes.
Max surge default: the same for system and user pools (historically 1 extra node).
Minimum node count for system pool: 1 (but recommended 2).
Edge Cases:
- Windows node pools: Cannot be upgraded to a version higher than the control plane. Must use a supported Windows OS SKU.
- Spot node pools: Cannot be used for system pools. Autoscaler cannot scale spot pools below min count if VMs are evicted.
- Node image upgrade without version change: Use --node-image-only flag. This is tested as a way to apply security patches without a full upgrade.
- PodDisruptionBudget: If a PDB prevents pod eviction, the upgrade stalls. Candidates need to know how to check PDBs.
How to Eliminate Wrong Answers:
If a question asks about scaling down, remember the autoscaler's 10-minute timer and that it ignores nodes with system pods.
If a question asks about upgrading, check whether the control plane version is mentioned. Nodes cannot be newer than control plane.
If a question mentions cost savings, consider spot node pools or scaling down non-production pools.
If a question mentions GPU, think about node taints and tolerations.
AKS supports up to 100 node pools per cluster.
The default node pool is a system pool; it can be deleted only after another system pool exists.
System node pools host critical add-ons; protect them with the CriticalAddonsOnly=true:NoSchedule taint.
Cluster autoscaler scales based on pending pods, not actual CPU/memory usage.
Node pools cannot run a higher Kubernetes version than the control plane.
Use `az aks nodepool upgrade --node-image-only` to apply OS patches without changing Kubernetes version.
Spot node pools cannot be used for system pools and are evictable; use tolerations to schedule pods.
Pod Disruption Budgets can block node pool upgrades if they prevent pod eviction.
Max surge controls how many extra nodes are created during upgrade; it is set per node pool, with the same default for system and user pools.
The cluster autoscaler checks for pending pods every 10 seconds and scale-down candidates every 10 seconds.
These come up on the exam all the time. Here's how to tell them apart.
Manual Scaling
You specify exact node count; no automation.
Scaling up/down is immediate upon command.
No dependency on pending pods or resource requests.
Risk of over-provisioning or under-provisioning if not monitored.
Best for predictable workloads with steady demand.
Cluster Autoscaler
Automatically adjusts node count based on pending pods.
Scales up within seconds of detecting pending pods; scales down after 10 minutes of underutilization.
Requires correct pod resource requests to function optimally.
Prevents over-provisioning; can handle traffic spikes.
Best for variable or unpredictable workloads.
Mistake
You can have only one system node pool per cluster.
Correct
A cluster must have at least one system node pool and can have several. System pools must run Linux, and any one of them can be deleted as long as another system pool remains.
Mistake
Cluster autoscaler scales down nodes immediately when they become idle.
Correct
The autoscaler waits 10 minutes (configurable) of consistent underutilization before scaling down. It also respects PDBs and will not scale down nodes running system pods.
Mistake
Upgrading a node pool requires the same Kubernetes version as the control plane.
Correct
Node pools can be at a lower version than the control plane, but they cannot be at a higher version. The control plane must be upgraded first if you want to upgrade nodes to a newer version.
Mistake
The default node pool can never be deleted.
Correct
The default pool is a system pool, and a cluster must always keep at least one system pool. While it is the only system pool it cannot be deleted, but after you create another system pool you can delete the original.
Mistake
Max surge setting applies to all node pools equally.
Correct
Max surge is configured per node pool, so each pool can use a different value. The default is the same for system and user pools. Setting it too high can exhaust quota or disrupt many pods at once.
You can have up to 100 node pools per cluster. Each node pool can have up to 1000 nodes (subject to subscription quota).
The default node pool (the first system pool) can be deleted only after the cluster has another system node pool, because at least one system pool must always exist. Deleting the cluster removes all of its pools.
The control plane runs the Kubernetes API server and etcd. Node pools run your workloads. You can upgrade the control plane independently, but node pools cannot run a version higher than the control plane. To upgrade both, either upgrade the control plane first then node pools, or use `az aks upgrade` which upgrades both together.
The autoscaler considers a node for scale-down if it has been underutilized (CPU and memory requests < 50% of capacity) for a configurable period (default 10 minutes). It will not scale down nodes that are running system pods or that would violate a PodDisruptionBudget.
Max surge is the number of extra nodes created during a rolling upgrade. For example, max-surge=1 creates one new node before draining an old node, so capacity does not drop during the drain. It is set per node pool, and the default is the same for system and user pools. Higher max surge speeds up upgrades but requires more resources.
No, spot VMs are not supported for system node pools because they can be evicted, which would disrupt critical system components. Spot VMs can only be used for user node pools.
Use `az aks nodepool upgrade --node-image-only`. This upgrades the node OS image (which includes security patches) without changing the Kubernetes version. This can be done per node pool.