ACE · Chapter 3 of 101 · Objective 1.3

Google Kubernetes Engine (GKE)

This chapter covers Google Kubernetes Engine (GKE), Google Cloud's managed Kubernetes service, which is a core topic for the Associate Cloud Engineer (ACE) exam. GKE questions appear on roughly 20-25% of the exam, making it one of the highest-weighted domains. You will be tested on cluster creation, node pools, networking, storage, security, and monitoring—all from an operational perspective. This chapter provides the deep, mechanistic understanding needed to answer scenario-based questions correctly, including specific commands, default values, and common misconfigurations.

25 min read

Intermediate

Updated May 31, 2026

Reviewed by Johnson Ajibi· Senior Network & Security Engineer · MSc IT Security


GKE as a Managed Restaurant Kitchen

Imagine you own a restaurant that needs to serve hundreds of custom meals (containers) daily. Instead of buying, installing, and maintaining every oven, stove, and prep station yourself, you contract with a professional kitchen service (GKE). The service provides the facility (cluster), hires a head chef (control plane) who manages the line cooks (nodes) and ensures each station has the right ingredients (container images) and utensils (storage). When a customer orders a meal (deployment), the head chef decides which line cook can cook it fastest, assigns the station (scheduling), and monitors the cooking time. If a cook burns a steak (node failure), the head chef immediately reassigns orders to another cook without you lifting a finger (auto-healing). The kitchen also has an automatic dishwasher (auto-scaling) that brings in more plates when the dinner rush hits. You only pay for the kitchen time used, not the equipment. The key mechanism: the head chef never does the actual cooking—that's the line cooks' job—but he manages the flow, health, and scaling. Similarly, GKE's control plane manages nodes but runs no application pods. If you tried to run your own kitchen (self-managed Kubernetes), you'd have to hire the head chef, buy all equipment, and handle repairs—GKE does all that for you, with automated upgrades and monitoring built-in.

How It Actually Works

What is GKE and Why Does It Exist?

Google Kubernetes Engine (GKE) is a managed, production-ready environment for deploying, managing, and scaling containerized applications using Kubernetes on Google Cloud. Before GKE, organizations had to install, configure, and manage Kubernetes clusters themselves—a complex task involving the control plane (etcd, API server, scheduler, controller manager), worker nodes, networking (CNI plugins), and storage. GKE automates the control plane management, node provisioning, upgrades, and repair, allowing you to focus on deploying applications. The ACE exam tests your ability to create and configure GKE clusters, understand node pools, networking modes (VPC-native vs routes-based), and use features like Workload Identity and Config Connector.

How GKE Works Internally

A GKE cluster consists of a control plane (managed by Google) and worker nodes (Compute Engine instances). The control plane runs the Kubernetes components kube-apiserver, kube-controller-manager, kube-scheduler, and etcd. Google manages these components, including high availability (a multi-zonal control plane for regional clusters) and automatic upgrades. Worker nodes run the kubelet, kube-proxy, and a container runtime (containerd; Docker-based node images are not supported in GKE 1.24 and later). When you deploy a workload (e.g., a Deployment), the API server stores the desired state in etcd. The scheduler assigns pods to nodes based on resource requests and constraints. The kubelet on each node pulls container images and runs the pods. GKE also integrates with Google Cloud services like Cloud Load Balancing, Cloud Storage, and Cloud Monitoring.

Key Components, Values, Defaults, and Timers

Cluster Types: Zonal (single zone) or Regional (multi-zone, recommended for production). Regional clusters have replicas of the control plane in three zones within the region, providing higher availability. With gcloud, the cluster type follows the location flag you pass: --zone creates a zonal cluster and --region creates a regional one.

Node Pools: Groups of nodes with the same configuration (machine type, disk size, image type). You can have multiple node pools in a cluster. Default machine type is e2-medium (2 vCPU, 4 GB memory).

Node Image: Default is Container-Optimized OS (COS) with containerd. You can also use Ubuntu with containerd. Docker-based node images are no longer available in current GKE versions.

Cluster Network Mode: VPC-native (using alias IP ranges) is the default and recommended. Routes-based mode is legacy and uses VPC routes. VPC-native clusters use secondary IP ranges for pods and services, enabling direct pod communication without NAT.

Autoscaling: Node auto-provisioning enables automatic creation of node pools when pods are unschedulable. Cluster autoscaler automatically resizes node pools based on pending pods. Default scale-down unneeded time: 10 minutes.

Auto-upgrade: Enabled by default for nodes. Nodes are upgraded in a rolling fashion. You can configure maintenance windows and exclusions.

Auto-repair: Enabled by default. GKE periodically checks node health; unhealthy nodes are drained and recreated.

Workload Identity: Allows pods to impersonate a Google Cloud service account without managing keys. Default: disabled. Must be enabled at cluster creation or update.

Private Clusters: Nodes have internal IPs only, and the control plane is accessed via VPC peering or private endpoint. Default: public endpoint is enabled.

Configuration and Verification Commands

Creating a regional cluster with default settings:

gcloud container clusters create my-cluster --region us-central1

Creating a private cluster with Workload Identity:

gcloud container clusters create my-private-cluster \
    --region us-central1 \
    --enable-private-nodes \
    --enable-ip-alias \
    --workload-pool=my-project.svc.id.goog

Verifying cluster details:

gcloud container clusters describe my-cluster --region us-central1

Getting credentials:

gcloud container clusters get-credentials my-cluster --region us-central1

Listing node pools:

gcloud container node-pools list --cluster my-cluster --region us-central1

Interaction with Related Technologies

Cloud Load Balancing: GKE integrates with Cloud Load Balancing via Ingress resources. The GKE Ingress controller creates an HTTP(S) load balancer. For TCP/UDP traffic, use Service type LoadBalancer.

Cloud Storage: Persistent Volumes can use Compute Engine persistent disks (standard, SSD, or balanced) or Filestore for shared filesystems. You can also use Cloud Storage via a sidecar or CSI driver.

Cloud Monitoring and Logging: GKE automatically sends metrics and logs to Cloud Monitoring and Cloud Logging. You can view cluster and pod metrics, set alerts, and use Cloud Logging for container logs.

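Cloud IAM: GKE uses IAM for project-level and cluster-level permissions (e.g., roles/container.clusterAdmin) and Kubernetes RBAC for fine-grained permissions on objects inside the cluster, such as per-namespace access. Workload Identity links the two by letting a Kubernetes service account act as a Google Cloud service account.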

Cloud NAT: For private clusters to access the internet, you must configure Cloud NAT. Without it, nodes cannot pull images from external registries such as Docker Hub. Artifact Registry remains reachable through Private Google Access.
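The Cloud NAT requirement can be sketched as two gcloud commands: create a Cloud Router, then attach a NAT gateway to it. The names nat-router and nat-config, the default network, and the us-central1 region are placeholders chosen for illustration, not values from this chapter. The run helper is also an assumption of this sketch: it only prints each command unless DRY_RUN=0 is set.

```shell
# Dry-run helper: prints commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Placeholder values (assumptions for illustration).
REGION="us-central1"
NETWORK="default"
ROUTER="nat-router"
NAT="nat-config"

setup_cloud_nat() {
  # Cloud NAT is configured on a Cloud Router in the cluster's VPC and region.
  run gcloud compute routers create "$ROUTER" \
    --network "$NETWORK" --region "$REGION"

  # Let all subnet ranges in the region use NAT with auto-allocated external IPs.
  run gcloud compute routers nats create "$NAT" \
    --router "$ROUTER" --region "$REGION" \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
}

setup_cloud_nat
```

Running it with DRY_RUN=0 executes the two commands against the active gcloud project. The router must be in the same region and network as the cluster's nodes.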

Networking Deep Dive

VPC-native clusters use alias IP ranges. Each cluster has a primary IP range for nodes, and two secondary ranges for pods and services. Pods get IP addresses from the pod secondary range, and services get IPs from the service secondary range. This allows pods to communicate directly using their pod IPs without NAT, and enables network policies. The default maximum pods per node is 110 (with /24 alias range). You can configure this at cluster creation. For routes-based clusters, pods use IP forwarding and VPC routes, which can cause route table pollution at scale. VPC-native is the recommended and default mode.
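The link between max pods per node and range sizes can be checked with a little arithmetic. This sketch assumes GKE's sizing rule that each node reserves a power-of-two block holding at least twice its max pods, which is how 110 pods maps to a /24. The /14 pod range is only an example size, and both function names are hypothetical.

```shell
# Smallest per-node CIDR mask whose block holds at least 2x the max pods per node.
node_mask_for_pods() {
  max_pods=$1
  needed=$((max_pods * 2))
  size=1
  bits=0
  while [ "$size" -lt "$needed" ]; do
    size=$((size * 2))
    bits=$((bits + 1))
  done
  echo $((32 - bits))
}

# Number of per-node blocks (and therefore nodes) that fit in a pod secondary range.
max_nodes_for_range() {
  pod_range_mask=$1
  node_mask=$2
  echo $((1 << (node_mask - pod_range_mask)))
}

node_mask=$(node_mask_for_pods 110)
echo "per-node pod range: /$node_mask"
echo "nodes supported by a /14 pod range: $(max_nodes_for_range 14 "$node_mask")"
```

Lowering max pods per node shrinks each node's block, so the same pod range supports more nodes. For example, 32 pods per node needs only a /26.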

Security Features

Workload Identity: Maps a Kubernetes service account to a Google Cloud service account. Pods can then access Google Cloud APIs without storing service account keys.

Binary Authorization: Enforces signed container images. Only images signed by trusted authorities can be deployed.

Shielded GKE Nodes: Uses shielded VMs to protect against rootkits and bootkits.

Node Auto-Upgrade: Applies security patches automatically.

GKE Sandbox: Provides an additional layer of isolation using gVisor for untrusted workloads.

Monitoring and Logging

GKE integrates with Cloud Operations. You can view cluster dashboards, pod logs, and metrics like CPU and memory utilization. The default metrics collection includes system metrics (e.g., node CPU, memory) and pod metrics. You can enable Managed Service for Prometheus for advanced monitoring. Alerts can be set up for high resource usage, pod failures, or cluster health issues.

Walk-Through

Create a GKE Cluster

Use the gcloud container clusters create command with the desired parameters. For a regional cluster, specify --region. For a zonal cluster, specify --zone. Defaults include Container-Optimized OS, the containerd runtime, VPC-native networking, and auto-upgrade enabled. The command provisions a control plane in the chosen location (with replicas in three zones for a regional cluster) and a default node pool. In a regional cluster, --num-nodes is applied per zone and defaults to 3, so a default regional cluster starts with 9 nodes across three zones. The process typically takes 5-10 minutes. Example: gcloud container clusters create my-cluster --region us-central1 --num-nodes=3.

Configure kubectl Access

After cluster creation, get credentials using gcloud container clusters get-credentials my-cluster --region us-central1. This writes a kubeconfig entry for the cluster. kubectl then authenticates through the gke-gcloud-auth-plugin, which obtains short-lived access tokens from your gcloud credentials (valid for 1 hour by default) and refreshes them as needed. You can verify access with kubectl cluster-info.

Deploy a Workload

Create a Deployment manifest (e.g., nginx-deployment.yaml) with replicas: 3, then run kubectl apply -f nginx-deployment.yaml. The API server stores the Deployment object in etcd. The scheduler assigns pods to nodes based on their resource requests (for example, cpu: 100m and memory: 200Mi). The kubelet on each node pulls the nginx image from Docker Hub or Artifact Registry and starts the container. You can monitor progress with kubectl get pods -o wide.
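A minimal sketch of the manifest this step describes might look like the following. The app: nginx label, the nginx:1.27 image tag, and the explicit resource requests are assumptions made for illustration. The apply step is left as a comment so the block only writes the file.

```shell
# Write a minimal Deployment manifest for the walk-through.
cat > nginx-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.27   # assumed tag; pin the version you actually use
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
EOF

# Then, with kubectl pointed at the cluster:
#   kubectl apply -f nginx-deployment.yaml
#   kubectl get pods -o wide
```

Setting requests explicitly keeps scheduling predictable, because the scheduler places pods based on requests rather than actual usage.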

Expose the Application

Create a Service of type LoadBalancer to expose the deployment externally. Use kubectl expose deployment nginx-deployment --type=LoadBalancer --port=80 --target-port=80. GKE provisions a Cloud Load Balancer with a public IP. The service controller creates forwarding rules and health checks. The load balancer distributes traffic to healthy pods. You can get the external IP with kubectl get svc. This process takes 1-2 minutes.

Scale the Application

Scale the deployment to 10 replicas: kubectl scale deployment nginx-deployment --replicas=10. The deployment controller creates 7 additional pods. If the current node pool doesn't have enough resources, the cluster autoscaler (if enabled) adds nodes. The autoscaler checks for pending pods every 10 seconds and can add a node in 1-2 minutes. You can also scale nodes manually with gcloud container clusters resize or by updating the node pool.
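The two node-scaling options mentioned here could be sketched as follows. The node pool name default-pool and the min/max values are assumptions. The run helper is a convenience of this sketch that prints each command unless DRY_RUN=0 is set.

```shell
# Dry-run helper: prints commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

CLUSTER="my-cluster"
REGION="us-central1"
POOL="default-pool"   # assumed pool name

enable_autoscaling() {
  # In a regional cluster, --min-nodes and --max-nodes apply per zone.
  run gcloud container clusters update "$CLUSTER" --region "$REGION" \
    --node-pool "$POOL" --enable-autoscaling --min-nodes 1 --max-nodes 5
}

resize_manually() {
  # Manual alternative: set the pool's node count directly (also per zone).
  run gcloud container clusters resize "$CLUSTER" --region "$REGION" \
    --node-pool "$POOL" --num-nodes 4
}

enable_autoscaling
resize_manually
```

With autoscaling enabled on a pool, manual resizing is usually unnecessary, since the autoscaler keeps adjusting the pool within its configured bounds.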

Upgrade the Cluster

GKE automatically upgrades the control plane and nodes. You can also trigger a manual upgrade: gcloud container clusters upgrade my-cluster --region us-central1 --master. The control plane is upgraded first, then the nodes. To upgrade nodes, run the same command without --master and specify --node-pool. The upgrade process drains nodes one at a time, evicting pods while respecting PodDisruptionBudgets. Each node takes about 5-10 minutes. You can configure maintenance windows to control when upgrades happen.

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce Platform with Autoscaling

A large e-commerce company runs its microservices on GKE. During Black Friday, traffic spikes 10x. They use cluster autoscaler with node auto-provisioning to automatically add node pools with different machine types (e.g., n2-highcpu for compute-intensive tasks, n2-standard for general). They set pod disruption budgets to ensure at least 50% of pods are available during upgrades. They use VPC-native clusters for direct pod-to-pod communication and network policies to isolate frontend from backend. Common issues: if scale-down is too aggressive, the autoscaler may remove nodes that are about to receive traffic (by default a node must be unneeded for 10 minutes before removal). GKE does not expose that timer as a cluster setting, so they keep the balanced autoscaling profile, which scales down less aggressively than optimize-utilization. Also, if node auto-provisioning is not enabled, pods may remain pending when the existing node pool is full. They enable it with cluster-wide CPU and memory limits.
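The 50% availability rule in this scenario could be expressed as a PodDisruptionBudget along these lines. The name frontend-pdb and the app: frontend label are hypothetical, and the block only writes the manifest to disk.

```shell
# Write a PodDisruptionBudget that keeps at least half of the matching pods running
# during voluntary disruptions such as node drains for upgrades.
cat > frontend-pdb.yaml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: frontend   # assumed label on the frontend pods
EOF

# Then, with kubectl pointed at the cluster:
#   kubectl apply -f frontend-pdb.yaml
```

A budget like this only limits voluntary evictions. It does not prevent pods from being lost when a node fails outright.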

Enterprise Scenario 2: Financial Services with Strict Security

A bank deploys GKE private clusters with no public IP addresses. Nodes pull images from Artifact Registry over Private Google Access, and Cloud NAT provides egress to external endpoints. They enable Workload Identity to allow pods to access Cloud Storage and BigQuery without managing keys. Binary Authorization is enforced: only images signed by their internal CI/CD pipeline can be deployed. They use Shielded GKE Nodes for boot integrity. They also enable GKE Sandbox for untrusted third-party workloads. A common misconfiguration: forgetting to create a Cloud NAT for private clusters, which causes image pull failures from external registries such as Docker Hub. They also set up maintenance exclusions during audit periods to prevent automatic upgrades.

Enterprise Scenario 3: Media Streaming with GPUs

A video streaming company uses GKE to run transcoding jobs on GPU nodes. They create a node pool with accelerator type nvidia-tesla-t4 and enable GPU time-sharing for higher utilization. They use node taints and tolerations to ensure GPU pods are scheduled only on GPU nodes. They use Filestore for shared storage of video files. They enable cluster autoscaler to scale GPU nodes based on pending jobs. Performance consideration: GPU nodes are expensive, so they use preemptible VMs for batch jobs to reduce costs. However, preemptible nodes can be terminated at any time, and pod disruption budgets do not protect against preemption, so they design batch jobs to tolerate interruption and use node pools with regular VMs for critical workloads. A common issue: forgetting to install NVIDIA drivers on COS (they use the GPU driver installation add-on).

How ACE Actually Tests This

The ACE exam tests GKE in the context of operational tasks. The objective code is 1.3: Configuring Google Kubernetes Engine. You will be asked to:

Choose between zonal and regional clusters based on availability requirements.

Understand node pool configuration: machine type, disk size, image type, and autoscaling.

Differentiate VPC-native vs routes-based networking and know default is VPC-native.

Enable and configure Workload Identity.

Use gcloud commands for cluster creation, credential retrieval, and scaling.

Interpret kubectl output to identify pod status and cluster health.

Troubleshoot common issues: pods pending due to resource constraints, image pull failures, and node health.

Common wrong answers:

1. Choosing a zonal cluster when high availability is required. Many candidates pick zonal because it's cheaper, but the exam expects regional for production workloads.

2. Selecting routes-based networking for new clusters because it seems simpler. The exam expects VPC-native as the default and recommended mode.

3. Assuming Workload Identity can only be enabled at cluster creation. You can enable it on an existing cluster with gcloud container clusters update --workload-pool, then update existing node pools with --workload-metadata=GKE_METADATA.

4. Forgetting that private clusters need Cloud NAT for internet egress. Candidates often assume private nodes can pull images directly from external registries.

Specific numbers to memorize:

Default max pods per node: 110 (for VPC-native with /24 alias range).

Default machine type: e2-medium.

Default node image: Container-Optimized OS with containerd.

Cluster autoscaler scale-down unneeded time: 10 minutes.

Workload Identity must be enabled at cluster creation or via update.

Edge cases:

If you create a cluster with --enable-private-nodes but no Cloud NAT, pods cannot reach the internet. The exam may present this as a troubleshooting scenario.

If you use a routes-based cluster, you cannot use network policies (unless you install a CNI plugin).

Node pools can have different machine types, but all nodes in a pool are identical.

To eliminate wrong answers, focus on the underlying mechanism: GKE is managed, so you don't manage the control plane. If a question asks about upgrading the control plane, the answer is that GKE does it automatically (or you can trigger manually). If a question asks about persistent storage, the answer is PersistentVolumeClaims backed by Compute Engine disks or Filestore.

Key Takeaways

GKE is a managed Kubernetes service; you do not manage the control plane.

Regional clusters are recommended for production; zonal for dev/test.

Default networking is VPC-native (alias IP ranges); routes-based is legacy.

Default node image: Container-Optimized OS with containerd.

Default machine type: e2-medium.

Cluster autoscaler scale-down unneeded time: 10 minutes.

Workload Identity maps a KSA to a GSA, eliminating service account keys.

Private clusters require Cloud NAT for internet egress.

Auto-upgrade and auto-repair are enabled by default.

Ingress creates an HTTP(S) load balancer; Service type LoadBalancer creates a TCP/UDP load balancer.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Zonal Cluster

Control plane runs in a single zone.

Lower cost (no multi-zone replication).

Less resilient: if the zone fails, the control plane is unavailable.

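Nodes run in the control plane's zone by default. A multi-zonal cluster can add node zones with --node-locations, but it still has a single control plane.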

Suitable for development or non-critical workloads.

Regional Cluster

Control plane runs in three zones within the region.

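Typically higher cost: node pools are replicated across three zones, so the same --num-nodes value yields three times as many nodes.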

Highly available: control plane survives zone failure.

Nodes can be in multiple zones for high availability.

Recommended for production workloads.

Watch Out for These

Mistake

GKE clusters always have a public endpoint that cannot be disabled.

Correct

You can create a private cluster with --enable-private-endpoint, which makes the control plane accessible only via internal IP from within the VPC. The public endpoint can also be disabled after creation.

Mistake

You must manually upgrade the control plane and nodes.

Correct

GKE offers auto-upgrade for both control plane and nodes (enabled by default). You can also trigger manual upgrades, but it's not required.

Mistake

Workload Identity requires storing service account keys in the cluster.

Correct

Workload Identity eliminates the need for keys. Pods use a Kubernetes service account mapped to a Google Cloud service account, and the GKE metadata server provides short-lived tokens automatically.

Mistake

Cluster autoscaler can scale down nodes immediately when they are idle.

Correct

The default scale-down unneeded time is 10 minutes. Nodes are not removed immediately even if idle, to avoid thrashing. GKE does not expose this timer as a flag; to make scale-down more aggressive, switch the autoscaling profile with --autoscaling-profile optimize-utilization.

Mistake

All GKE clusters use VPC-native networking by default.

Correct

VPC-native is the default for newly created clusters, but it is not universal: older clusters and clusters created with --no-enable-ip-alias use routes-based networking. The exam expects you to know the default is VPC-native.


Frequently Asked Questions

How do I create a GKE cluster with a specific number of nodes?

Use the --num-nodes flag. For example: gcloud container clusters create my-cluster --num-nodes=3 --region us-central1. This creates a default node pool with 3 nodes per zone (for regional, total 9 nodes across 3 zones).
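The per-zone behavior of --num-nodes is easy to get wrong, so here is the arithmetic as a small sketch. The function name total_nodes is hypothetical.

```shell
# Total nodes in a node pool = --num-nodes value x number of node zones.
total_nodes() {
  num_nodes=$1
  zones=$2
  echo $((num_nodes * zones))
}

echo "regional cluster, --num-nodes=3, 3 zones: $(total_nodes 3 3) nodes"
echo "zonal cluster,    --num-nodes=3, 1 zone:  $(total_nodes 3 1) nodes"
```

To get 3 nodes in total from a three-zone regional cluster, pass --num-nodes=1.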

Can I enable Workload Identity on an existing cluster?

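Yes, you can update an existing cluster: gcloud container clusters update my-cluster --region us-central1 --workload-pool=my-project.svc.id.goog. This enables Workload Identity on the cluster. Existing node pools must also be updated with gcloud container node-pools update and --workload-metadata=GKE_METADATA before their pods can use it. You must then create a Kubernetes service account and map it to a Google Cloud service account.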
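The mapping step can be sketched as one IAM binding plus one annotation. The names app-ksa and app-gsa, the default namespace, and the project my-project are placeholders. The block prints the two commands for review instead of running them.

```shell
# Placeholder values (assumptions for illustration).
PROJECT_ID="my-project"
NAMESPACE="default"
KSA_NAME="app-ksa"   # Kubernetes service account
GSA_NAME="app-gsa"   # Google Cloud service account

GSA_EMAIL="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
# Workload Identity member format: PROJECT.svc.id.goog[NAMESPACE/KSA]
MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Print the commands rather than executing them.
cat <<EOF
gcloud iam service-accounts add-iam-policy-binding ${GSA_EMAIL} \\
  --role roles/iam.workloadIdentityUser \\
  --member "${MEMBER}"

kubectl annotate serviceaccount ${KSA_NAME} --namespace ${NAMESPACE} \\
  iam.gke.io/gcp-service-account=${GSA_EMAIL}
EOF
```

The binding lets the Kubernetes service account impersonate the Google Cloud service account, and the annotation tells GKE which account to use for pods running as that Kubernetes service account.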

What is the difference between a zonal and regional cluster?

A zonal cluster has its control plane in a single zone, while a regional cluster has control plane replicas in three zones within the region. Regional clusters are more available and recommended for production. Zonal clusters are cheaper and suitable for dev/test.

How do I expose a deployment on GKE?

You can create a Service of type LoadBalancer (kubectl expose deployment my-deploy --type=LoadBalancer --port=80) which provisions a Cloud Load Balancer. For HTTP(S) traffic, you can create an Ingress resource, which creates an HTTP(S) load balancer.

What happens if a node fails in GKE?

If auto-repair is enabled (default), GKE checks node health periodically. If a node is unhealthy, it is drained (pods are evicted) and recreated. The cluster autoscaler may also add new nodes if needed. Pods from the failed node are recreated by their owning controllers (for example, a Deployment's ReplicaSet) and scheduled onto healthy nodes.

Can I use GPUs with GKE?

Yes, create a node pool with accelerator type (e.g., --accelerator type=nvidia-tesla-t4,count=1). Ensure you enable GPU driver installation (default for COS). Use node taints and tolerations to ensure GPU pods are scheduled on GPU nodes.

How do I configure network policies in GKE?

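Network policies are supported only on VPC-native clusters. You use Kubernetes NetworkPolicy resources to control pod-to-pod traffic, but they are only enforced when network policy enforcement is enabled on the cluster (--enable-network-policy) or the cluster uses GKE Dataplane V2 (--enable-dataplane-v2), which provides built-in enforcement using eBPF.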
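As an illustration, a NetworkPolicy that lets only frontend pods reach backend pods might look like this. The app: frontend and app: backend labels and port 8080 are assumptions, and the block only writes the manifest to disk.

```shell
# Write a NetworkPolicy that allows ingress to backend pods only from frontend pods.
cat > backend-allow-frontend.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend        # assumed label on the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend   # assumed label on the allowed clients
    ports:
    - protocol: TCP
      port: 8080          # assumed backend port
EOF

# Then, with kubectl pointed at the cluster:
#   kubectl apply -f backend-allow-frontend.yaml
```

Once a pod is selected by any Ingress policy, all other inbound traffic to it is denied unless another policy allows it.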

