ACE · Chapter 35 of 101 · Objective 3.1

GKE Node Pools and Cluster Autoscaler

GKE node pools and the Cluster Autoscaler are the two main tools for managing compute capacity in Kubernetes clusters on Google Cloud. A node pool groups nodes with identical configuration, which enables workload segregation and cost optimization. The Cluster Autoscaler automatically adjusts the number of nodes in a pool based on the resource requests of pending pods. These topics fall under the ACE exam's "Deploying and implementing a cloud solution" domain (Objective 3.1) and appear in approximately 10-15% of exam questions. Mastery of node pool design, autoscaling configuration, and the interaction with PodDisruptionBudgets is essential for passing.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security


Warehouse staffing with flexible teams

How can a large warehouse handle a fluctuating number of customer orders? The warehouse has different sections: picking, packing, shipping, and returns. Each section has a dedicated team of workers, but the number of orders fluctuates throughout the day. The warehouse manager wants to avoid overstaffing (wasting money) or understaffing (delaying orders). So, the manager creates a flexible pool of temporary workers who can be assigned to any section as needed.

The manager monitors the queue length in each section. If the picking queue grows beyond 50 items, the manager adds two temporary workers to picking. If the queue shrinks below 10 items for 5 minutes, the manager removes one worker. The manager also sets a minimum of 5 permanent workers per section to handle baseline demand and a maximum of 20 total workers per section to avoid overcrowding. The temporary workers are hired from an agency with a 2-minute notice, but once hired, they must work for at least 10 minutes before being released.

This analogy mirrors GKE node pools and Cluster Autoscaler: node pools are the dedicated sections with specific machine types, and Cluster Autoscaler is the manager that adds or removes nodes based on pod resource requests, respecting minimum and maximum pool sizes, cooldown periods, and node utilization thresholds.

How It Actually Works

What Are GKE Node Pools?

A node pool is a group of nodes within a GKE cluster that all have the same configuration: machine type, image type, disk size, and other node-level settings. When you create a cluster, GKE automatically creates a default node pool, which has three nodes per zone unless you specify --num-nodes. You can then create additional node pools with different configurations. Node pools enable you to:

Run different workloads on different hardware (e.g., compute-optimized for batch jobs, memory-optimized for databases).

Isolate workloads by using node taints and tolerations.

Scale node pools independently.

Perform zero-downtime upgrades by draining nodes in one pool at a time.

How Node Pools Work Internally

Each node pool is backed by a managed instance group (MIG) in Compute Engine, or by one MIG per zone when the pool spans several zones. When you create a node pool, GKE creates the MIG with the specified machine type, boot disk image, and other settings. The MIG ensures that the desired number of nodes is maintained. If a node fails, the MIG automatically recreates it. When you scale a node pool up or down, GKE adjusts the target size of the MIG.

Node pools have a name, machine type (e.g., e2-medium, n1-standard-2), disk type (pd-standard, pd-ssd), disk size (default 100 GB), image type (Container-Optimized OS, Ubuntu), and node version (Kubernetes version). You can also specify node labels, node taints, and metadata.

Cluster Autoscaler Overview

The Cluster Autoscaler is a Kubernetes component that automatically adjusts the size of a node pool based on the resource requests of pods. On GKE it runs as part of the managed control plane, so you do not deploy or operate it yourself. On Standard clusters it is disabled by default and is enabled per node pool, each with its own minimum and maximum size.

How it works:

1. The Cluster Autoscaler continuously monitors pod scheduling status.

2. When it detects pending pods that cannot be scheduled due to insufficient resources (CPU, memory, or GPU), it triggers a scale-up.

3. It calculates how many nodes are needed to accommodate the pending pods, based on the resource requests of the pods and the capacity of the node pool's machine type.

4. It adds nodes by increasing the target size of the underlying MIG. New nodes typically take 2-3 minutes to become ready.

5. When the pods on a node request less than a threshold share of the node's allocatable CPU and memory, the autoscaler considers the node for scale-down. The comparison uses resource requests, not live usage.

6. It first checks if the node can be emptied (i.e., all pods can be rescheduled to other nodes). It respects PodDisruptionBudgets and does not scale down if it would violate them.

7. If safe to remove, it drains the node (evicts pods gracefully) and then reduces the MIG target size, which terminates the node.

Key Configuration Parameters and Defaults

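Scale-up:
- Scale-up trigger: Pending pods that cannot be scheduled on any existing node. The autoscaler re-evaluates the cluster roughly every 10 seconds (the upstream --scan-interval default).
- No scale-up cooldown: If pods are still pending, the autoscaler can scale up again on its next pass. The 10-minute delay that follows a scale-up applies to scale-down only.
- Maximum number of nodes added per scale-up: Limited by the node pool's max size.

Scale-down:
- Scale-down trigger: The sum of the CPU and memory requests of the pods on a node is below 50% of the node's allocatable capacity (upstream flag --scale-down-utilization-threshold, default 0.5).
- Unneeded time: The node must stay below that threshold for 10 minutes before it is removed (upstream flag --scale-down-unneeded-time, default 10 minutes).
- Delay after scale-up: After any scale-up, scale-down evaluation pauses for 10 minutes (upstream flag --scale-down-delay-after-add, default 10 minutes), so a freshly added node is not removed right away.
- Cluster-wide cap: The upstream flag --max-nodes-total limits the total number of nodes across all pools; it does not control how many nodes are removed at once.
- On GKE the autoscaler is managed, so you do not set these upstream flags directly. GKE exposes autoscaling profiles instead (balanced or optimize-utilization, selected with --autoscaling-profile on gcloud container clusters update).

Node pool limits:
- Minimum nodes: 0 is allowed and is set per pool.
- Maximum nodes: Set per pool with --max-nodes; GKE supports up to 1000 nodes per pool.
- In multi-zonal and regional clusters, --min-nodes and --max-nodes apply per zone.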
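The min/max limits above behave like a clamp on whatever pool size the autoscaler would otherwise choose. The following is a minimal sketch of that idea only; `clamp_target_size` and its arguments are hypothetical names for illustration, not part of any GKE or Kubernetes API.

```python
def clamp_target_size(desired: int, min_nodes: int, max_nodes: int) -> int:
    """Keep a desired node count inside the pool's configured bounds."""
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    return max(min_nodes, min(desired, max_nodes))


# A pool created with --min-nodes 1 --max-nodes 10:
print(clamp_target_size(14, 1, 10))  # 10 -> pods needing more nodes stay Pending
print(clamp_target_size(0, 1, 10))   # 1  -> the pool never shrinks below its minimum
print(clamp_target_size(6, 1, 10))   # 6  -> inside the range, unchanged
```

The first case is the one to remember for the exam: once the pool is at its maximum, extra demand shows up as Pending pods rather than as extra nodes.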

Configuring Node Pools and Cluster Autoscaler

Creating a node pool:

gcloud container node-pools create POOL_NAME \
  --cluster CLUSTER_NAME \
  --machine-type e2-standard-4 \
  --num-nodes 3 \
  --min-nodes 1 \
  --max-nodes 10 \
  --enable-autoscaling \
  --node-labels=env=prod \
  --node-taints=dedicated=ml:NoSchedule

Enabling autoscaling on an existing node pool:

gcloud container clusters update CLUSTER_NAME \
  --node-pool POOL_NAME \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10

Disabling autoscaling:

gcloud container clusters update CLUSTER_NAME \
  --node-pool POOL_NAME \
  --no-enable-autoscaling

Viewing autoscaler status:

kubectl get events --field-selector source=cluster-autoscaler
kubectl describe configmap cluster-autoscaler-status -n kube-system

Interaction with Related Technologies

PodDisruptionBudgets (PDBs): The Cluster Autoscaler respects PDBs during scale-down. If a node has pods that are protected by a PDB that would be violated by eviction, the node is not scaled down.

Node Auto-Repair: GKE automatically repairs unhealthy nodes. If a node is in a bad state, it is recreated. This is separate from autoscaling.

Horizontal Pod Autoscaler (HPA): HPA scales the number of pod replicas based on CPU/memory. Cluster Autoscaler scales the number of nodes. They work together: HPA increases pod count, which may create pending pods, triggering Cluster Autoscaler to add nodes.

Vertical Pod Autoscaler (VPA): VPA adjusts pod resource requests/limits. This can affect node utilization and trigger scale-down if requests decrease.

Node taints and tolerations: If a node pool has taints, only pods with matching tolerations can be scheduled there. The Cluster Autoscaler considers taints when deciding which node pool to scale.

Best Practices

Set appropriate min/max values for node pools to control costs and prevent runaway scaling.

Use separate node pools for different workload types (e.g., system workloads vs. user workloads).

Enable autoscaling on all node pools to handle demand spikes.

Use node labels and taints to ensure pods land on the correct node pool.

Monitor autoscaler events and node utilization to fine-tune thresholds.

Common Pitfalls

Forgetting to set resource requests on pods: If pods don't have resource requests, the scheduler sees them as consuming zero resources, and the autoscaler may not scale up correctly.

Setting min nodes too high: This keeps nodes running even when not needed, increasing costs.

Not setting max nodes: A sudden spike could create hundreds of nodes, incurring high costs.

Misconfigured PDBs: A PDB that requires too many pods to be available can prevent scale-down.
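To see why a strict PDB blocks scale-down, it helps to compute how many evictions the budget currently allows. The sketch below assumes a PDB expressed with minAvailable as an absolute number; the function names are hypothetical, and the real autoscaler also has to confirm that evicted pods fit on other nodes.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """How many pods may be evicted right now without breaking the PDB."""
    return max(0, healthy_pods - min_available)


def pdb_blocks_drain(healthy_pods: int, min_available: int) -> bool:
    """True if evicting even one pod would violate the PDB."""
    return disruptions_allowed(healthy_pods, min_available) < 1


# 3 replicas with minAvailable=3: no eviction is ever allowed.
print(pdb_blocks_drain(healthy_pods=3, min_available=3))  # True
# 3 replicas with minAvailable=2: one pod at a time may be evicted.
print(pdb_blocks_drain(healthy_pods=3, min_available=2))  # False
```

A budget where minAvailable equals the replica count leaves zero allowed disruptions, so any node hosting one of those pods cannot be drained.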

Verification Commands

List node pools:

gcloud container node-pools list --cluster CLUSTER_NAME

Describe a node pool:

gcloud container node-pools describe POOL_NAME --cluster CLUSTER_NAME

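Check autoscaler decisions (on GKE the autoscaler runs on the managed control plane, so there are no autoscaler pod logs to read in kube-system; its visibility events are written to Cloud Logging, and PROJECT_ID below is a placeholder):

gcloud logging read 'logName="projects/PROJECT_ID/logs/container.googleapis.com%2Fcluster-autoscaler-visibility"' --limit 20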

Walk-Through

Create a cluster with node pools

First, create a GKE cluster with a default node pool. Then create additional node pools with specific machine types and scaling limits. For example, create a pool for general workloads with e2-medium and a pool for GPU workloads with n1-standard-4 and attached GPUs. Each pool gets a name, machine type, disk size, and optionally node labels and taints. The underlying MIG is created automatically.

Enable Cluster Autoscaler per pool

Enable autoscaling on each node pool by setting minimum and maximum node counts. Use the `--enable-autoscaling` flag with `--min-nodes` and `--max-nodes`. The autoscaler will then manage the number of nodes within that range. If you don't enable autoscaling, the pool size remains fixed.

Deploy pods with resource requests

Deploy workloads with CPU and memory resource requests. Without requests, the scheduler treats pods as consuming zero resources, and the autoscaler may not scale up. Requests must be specified in the pod spec under `resources.requests`. The autoscaler sums the requests of pending pods to determine how many nodes to add.
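As a rough illustration of what summing requests means, the sketch below converts Kubernetes-style quantities to plain numbers and totals them. It is a simplified parser written for this example (only the `m` CPU suffix and the `Ki`/`Mi`/`Gi` memory suffixes are handled), and the function names are hypothetical rather than taken from any Kubernetes library.

```python
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '256Mi' or '1Gi' to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def total_requests(pods: list[dict]) -> tuple[int, int]:
    """Sum (millicores, bytes) over pods; a missing request counts as zero."""
    cpu = sum(cpu_millicores(p.get("cpu", "0")) for p in pods)
    mem = sum(memory_bytes(p.get("memory", "0")) for p in pods)
    return cpu, mem


pods = [
    {"cpu": "500m", "memory": "256Mi"},
    {"cpu": "2", "memory": "1Gi"},
    {},  # no requests: contributes nothing to the totals
]
print(total_requests(pods))  # (2500, 1342177280)
```

The third pod shows the pitfall: with no requests it adds nothing to the totals, so capacity planning behaves as if it were not there.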

Pending pods trigger scale-up

When pods cannot be scheduled due to insufficient resources, they remain in Pending state. The Cluster Autoscaler detects pending pods after about 10 seconds. It calculates the required number of nodes based on the machine type's allocatable resources. It then increases the MIG target size, and new nodes are provisioned.
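The node-count calculation can be approximated as a bin-packing problem. The sketch below uses first-fit decreasing as a stand-in; the real autoscaler simulates the scheduler and also accounts for taints, affinity, and other constraints. The allocatable figures are made-up numbers for illustration, not the actual allocatable capacity of any machine type.

```python
def nodes_needed(pending: list[tuple[int, int]], node_cpu_m: int, node_mem_mib: int) -> int:
    """Estimate new nodes for pending pods given as (cpu millicores, memory MiB)."""
    free = []  # remaining [cpu, mem] on each new node
    for cpu, mem in sorted(pending, reverse=True):
        if cpu > node_cpu_m or mem > node_mem_mib:
            raise ValueError("pod does not fit on an empty node of this type")
        for slot in free:
            if slot[0] >= cpu and slot[1] >= mem:
                slot[0] -= cpu
                slot[1] -= mem
                break
        else:  # no existing new node had room, so open another one
            free.append([node_cpu_m - cpu, node_mem_mib - mem])
    return len(free)


# Five pending pods, each requesting 1500m CPU and 2048 MiB,
# on nodes with an assumed 3920m CPU / 12000 MiB allocatable.
pending = [(1500, 2048)] * 5
print(nodes_needed(pending, node_cpu_m=3920, node_mem_mib=12000))  # 3
```

Only two of these pods fit per node because CPU runs out first, which is why five pods need three nodes even though memory is mostly free.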

Scale-down on low utilization

When the pods on a node request less than 50% of its allocatable CPU and memory for 10 minutes, the autoscaler considers the node for removal. It checks if all pods on the node can be rescheduled to other nodes without violating PDBs. If safe, it drains the node (evicting pods gracefully and honoring each pod's termination grace period, up to a cap of 10 minutes) and then terminates the node by reducing the MIG size.
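The scale-down test can be written down in a few lines. This sketch measures utilization as requested resources divided by allocatable capacity, taking the larger of the CPU and memory ratios, and applies the 50% and 10-minute defaults. The function names are hypothetical, and the PDB and rescheduling checks are deliberately left out.

```python
def request_utilization(pod_requests: list[tuple[int, int]], alloc_cpu_m: int, alloc_mem_mib: int) -> float:
    """Largest of the CPU and memory request ratios for one node."""
    cpu = sum(c for c, _ in pod_requests)
    mem = sum(m for _, m in pod_requests)
    return max(cpu / alloc_cpu_m, mem / alloc_mem_mib)


def is_scale_down_candidate(utilization: float, minutes_unneeded: float,
                            threshold: float = 0.5, unneeded_minutes: float = 10) -> bool:
    """Below the threshold AND unneeded for long enough."""
    return utilization < threshold and minutes_unneeded >= unneeded_minutes


# Node with an assumed 4000m CPU / 16000 MiB allocatable.
util = request_utilization([(500, 1024), (1000, 2048)], 4000, 16000)
print(util)                               # 0.375
print(is_scale_down_candidate(util, 4))   # False: not unneeded for 10 minutes yet
print(is_scale_down_candidate(util, 10))  # True
```

A node whose pods request 2500m of 4000m CPU would sit at 0.625 and never qualify, no matter how idle those pods actually are.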

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Platform with Seasonal Spikes

A large online retailer runs its e-commerce platform on GKE. During Black Friday, traffic surges 10x. They use two node pools: a general-purpose pool (e2-standard-4) for web servers and a compute-optimized pool (c2-standard-8) for recommendation engines. Both pools have autoscaling enabled with min=3, max=50. The Cluster Autoscaler adds nodes as pods scale up via HPA. After the sale, utilization drops, and the autoscaler scales down to 3 nodes per pool. Misconfiguration: If they had set min nodes to 10, they would pay for 10 nodes even during low traffic.
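The hand-off in this scenario (HPA adds replicas, the extra replicas go Pending, Cluster Autoscaler adds nodes) can be sketched with the replica formula from the Kubernetes HPA documentation, desired = ceil(current × currentMetric / targetMetric). The sketch ignores HPA's tolerance band and its min/max replica bounds, and `pending_pods` with its slot count is a hypothetical helper for illustration.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """Replica count HPA aims for, before min/max replica bounds are applied."""
    return math.ceil(current_replicas * current_metric / target_metric)


def pending_pods(desired_replicas: int, schedulable_slots: int) -> int:
    """Replicas that cannot be placed on the nodes that exist right now."""
    return max(0, desired_replicas - schedulable_slots)


# 6 replicas averaging 150% CPU utilization against a 60% target.
desired = hpa_desired_replicas(current_replicas=6, current_metric=150, target_metric=60)
print(desired)                   # 15
print(pending_pods(desired, 8))  # 7 -> these Pending pods trigger a node scale-up
```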

Enterprise Scenario 2: Multi-tenant SaaS Platform

A SaaS company isolates customer workloads using node pools with taints and tolerations. They have a pool for customer A (taint=customer-a:NoSchedule) and another for customer B. Each pool has autoscaling enabled with min=1, max=10. The autoscaler ensures each customer gets enough nodes without interfering. Problem: If a customer's pods don't set resource requests, they always appear to fit on existing nodes, so they never become Pending and the autoscaler never scales up. The nodes become overcommitted and the pods are starved of CPU and memory.
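The isolation in this scenario rests on toleration matching. Below is a simplified sketch of the matching rules (operator Equal or Exists, an empty effect matching any effect, an empty key with Exists matching any taint). The class and function names are hypothetical, and only NoSchedule and NoExecute taints are treated as blocking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # e.g. "NoSchedule"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.key == taint.key and tol.value == taint.value


def pod_fits_pool(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """Every blocking taint on the pool must be tolerated by the pod."""
    blocking = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in blocking)


pool_a = [Taint("taint", "customer-a", "NoSchedule")]
print(pod_fits_pool([Toleration("taint", "Equal", "customer-a", "NoSchedule")], pool_a))  # True
print(pod_fits_pool([Toleration("taint", "Equal", "customer-b", "NoSchedule")], pool_a))  # False
print(pod_fits_pool([], pool_a))                                                          # False
```

A pending pod that fails this check for a pool gives the autoscaler no reason to grow that pool, since a new node there would not help it schedule.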

Enterprise Scenario 3: CI/CD Pipeline with Bursty Jobs

A tech company runs CI/CD jobs in a node pool with preemptible VMs to reduce costs. The pool has autoscaling enabled with min=0, max=20. When a build job is triggered, it creates a pod that requires 4 CPUs. The autoscaler sees the pending pod and adds a preemptible node. After the job completes, the node becomes idle and is scaled down once it has been unneeded for about 10 minutes. Pitfall: Preemptible nodes can be terminated at any time, so jobs must be fault-tolerant. Also, the autoscaler does not trigger preemption; Compute Engine reclaims preemptible VMs on its own, and the autoscaler only changes the pool's target size.

How ACE Actually Tests This

What the ACE Exam Tests

ACE questions on node pools and Cluster Autoscaler primarily test:

- Objective 3.1: Deploying and managing GKE clusters, including node pool configuration.
- Understanding the relationship between node pools, MIGs, and autoscaling.
- How to enable/disable autoscaling on a node pool.
- Default values: scale-down utilization threshold (50%), scale-down unneeded time (10 minutes), scale-down delay after a scale-up (10 minutes).
- The fact that Cluster Autoscaler respects PodDisruptionBudgets.
- The requirement for pods to have resource requests for autoscaling to work.

Common Wrong Answers and Why

"Cluster Autoscaler can scale down nodes immediately when utilization drops." Wrong: A node must be unneeded for 10 minutes before it is removed, and scale-down is paused for 10 minutes after a scale-up.

"You must manually create MIGs for node pools." Wrong: GKE creates MIGs automatically.

"Node pools cannot have different machine types." Wrong: Each node pool can have a different machine type.

"Autoscaling can be enabled only at cluster creation." Wrong: You can enable it on existing node pools using gcloud container clusters update.

Specific Values to Memorize

Default scale-down utilization threshold: 50%

Default scale-down unneeded time: 10 minutes

Default scale-down delay after a scale-up: 10 minutes (there is no scale-up cooldown)

Default autoscaler scan interval: 10 seconds

Maximum nodes per pool: 1000

Default disk size: 100 GB

Edge Cases and Exceptions

If a node pool has min=0 and all pods are removed, the autoscaler will scale to 0 nodes, but only after the scale-down conditions are met. This can save costs.

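DaemonSet pods do not block scale-down: the autoscaler ignores them when deciding whether a node can be removed. Scale-down is blocked by pods that cannot be moved, such as pods not managed by a controller or pods annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: "false".

The autoscaler does not react to node conditions such as disk pressure or memory pressure, or to actual CPU and memory usage. It acts on pending pods (scale-up) and on pod resource requests (scale-down).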

How to Eliminate Wrong Answers

If an answer mentions "immediate" scale-down, it's likely wrong: a node must be unneeded for 10 minutes first.

If an answer suggests that autoscaling requires manual MIG management, it's wrong.

If an answer says node pools must have the same machine type, it's wrong.

If an answer says pods do not need resource requests, it's wrong.

Key Takeaways

Node pools group nodes with identical configuration; each pool has its own MIG.

Cluster Autoscaler scales based on pod resource requests, not actual usage.

Default scale-down utilization threshold is 50% for both CPU and memory.

Default scale-down unneeded time is 10 minutes; nodes must be underutilized for 10 minutes before removal.

Autoscaler respects PodDisruptionBudgets during scale-down.

Pods must have resource requests for autoscaler to work correctly.

You can enable autoscaling on existing node pools using gcloud container clusters update.

Minimum and maximum node counts are set per node pool.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Node Pool without Autoscaling

Fixed number of nodes, manually managed.

Cannot handle traffic spikes automatically.

Requires manual intervention to add or remove nodes.

Costs are predictable but may be wasteful during low demand.

Easier to understand and debug for static workloads.

Node Pool with Autoscaling

Number of nodes adjusts automatically based on demand.

Handles traffic spikes by adding nodes when pods are pending.

Reduces costs during low demand by scaling down.

Requires careful configuration of min/max and thresholds.

Works with HPA and VPA for full elasticity.

Watch Out for These

Mistake

Cluster Autoscaler scales based on actual resource usage of pods.

Correct

Cluster Autoscaler scales based on pod resource requests, not actual usage. It looks at pending pods and their requests to determine if new nodes are needed.

Mistake

You can have only one node pool per cluster.

Correct

A GKE cluster can have multiple node pools, each with different configurations. The default pool is created at cluster creation, but you can add up to 100 node pools per cluster.

Mistake

Cluster Autoscaler scales down nodes immediately when they become idle.

Correct

A node must stay below the 50% utilization threshold (measured against pod resource requests) for at least 10 minutes before it is considered for removal, and scale-down is paused for 10 minutes after a scale-up.

Mistake

Node pools are automatically created for each machine type you specify in a pod spec.

Correct

Node pools must be explicitly created. Pod scheduling uses node affinity, taints, and tolerations to match pods to existing node pools.

Mistake

Cluster Autoscaler can scale up a node pool beyond its maximum setting if needed.

Correct

The autoscaler strictly respects the maximum node count. If pending pods require more nodes than the max, they remain pending.


Frequently Asked Questions

How do I enable Cluster Autoscaler on an existing node pool?

Use the command: `gcloud container clusters update CLUSTER_NAME --node-pool POOL_NAME --enable-autoscaling --min-nodes 1 --max-nodes 10`. This enables autoscaling on the specified pool within the given range.

What happens if a node pool reaches its maximum node count and pods are still pending?

The pods remain in Pending state. The Cluster Autoscaler will not exceed the maximum. You must either increase the max limit or reduce pod resource requests.

Can I have a node pool with min nodes set to 0?

Yes, but the autoscaler will only scale down to 0 if all pods can be moved. This is useful for batch jobs or CI/CD that run intermittently. Note that the first pod may experience a delay while the node is provisioned.

Does Cluster Autoscaler work with preemptible VMs?

Yes, you can create a node pool with preemptible VMs and enable autoscaling. However, preemptible nodes can be terminated at any time, so workloads must be fault-tolerant. The autoscaler treats them like regular nodes for scaling decisions.

How does Cluster Autoscaler interact with Horizontal Pod Autoscaler?

HPA scales the number of pod replicas based on metrics like CPU. When HPA creates more pods, they may become pending if there are insufficient nodes. This triggers Cluster Autoscaler to add nodes. Together, they provide both pod-level and node-level elasticity.

What is the difference between node auto-repair and Cluster Autoscaler?

Node auto-repair automatically recreates unhealthy nodes (e.g., if the kubelet is not reporting). Cluster Autoscaler adds or removes nodes based on resource demand. They are independent; a node can be repaired and then scaled down.

Can I use node pools without enabling autoscaling?

Yes, node pools can have a fixed size. You can manually resize them using `gcloud container clusters resize`. Autoscaling is optional.

