GCDL, Chapter 4 of 101, Objective 2.2

Compute Options on Google Cloud

This chapter covers the core compute options available on Google Cloud: Compute Engine, Google Kubernetes Engine (GKE), Cloud Run, and Cloud Functions. Understanding when to choose each service is critical for the GCDL exam, as compute decisions appear in roughly 15-20% of questions, often in scenario-based items. We will dissect each service's mechanism, default values, and ideal use cases, so you can confidently select the right compute option for any workload requirement.

25 min read
Intermediate
Updated May 31, 2026

Your Cloud Kitchen: Pick Your Appliance

Imagine you are opening a cloud kitchen. You have different appliances to cook meals, each suited to different needs.

Compute Engine (CE) is like a full-sized oven and stove combo. You can cook any dish (run any application) from scratch, but it takes time to preheat (provision), and you pay for the whole appliance even if you only bake a muffin.

Google Kubernetes Engine (GKE) is like a set of modular induction cooktops that you can rearrange quickly. You standardize on induction-compatible pots (containers), and a chef (Kubernetes) decides which cooktop gets which pot, automatically adjusting when orders surge.

Cloud Run is like a microwave: you pop in a pre-made meal (container), press start, and it runs only as long as needed. You pay for the time spent cooking, and if no one orders, it costs nothing.

Cloud Functions is like a single-serve coffee pod machine: you drop in a pod (function), press one button, and get a small, specific output (e.g., a cup of coffee). It is a good fit for event-driven tasks like 'when a new order email arrives, send a confirmation text.'

The key difference: CE gives you full control over the recipe and temperature, but you manage everything. GKE gives you flexibility and resilience for complex multi-course meals. Cloud Run and Cloud Functions are hands-off and scale to zero when idle, which makes them ideal for sporadic or simple tasks. Your choice depends on how much control you need versus how much management you want to offload.

How It Actually Works

What Are Compute Options and Why Do They Exist?

Compute options on Google Cloud are the services that provide processing power to run applications. They range from raw virtual machines (VMs) to fully managed serverless platforms. The choice between them depends on factors like control, scalability, operational overhead, cost model, and application architecture. The GCDL exam expects you to map business requirements (such as 'need full OS control' or 'want to scale to zero when idle') to the correct service.

Compute Engine: The IaaS Foundation

Compute Engine (CE) provides virtual machines running on Google's infrastructure. Each VM is a virtualized server (x86 for most machine families, Arm for a few) with a configurable number of vCPUs, amount of memory, and choice of local SSDs or persistent disks. You choose a machine family: General-purpose (E2, N2, N2D, N1, C3), Compute-optimized (C2, C2D), Memory-optimized (M1, M2, M3), or Accelerator-optimized (A2, G2) for GPU workloads. TPUs are offered separately through Cloud TPU rather than through these machine families. Each family has predefined machine types (e.g., n2-standard-4: 4 vCPUs, 16 GB RAM). On families that support them, you can also create custom machine types with granular vCPU and memory combinations.

How it works internally: When you create a VM, Google's hypervisor (based on KVM) allocates vCPUs from a physical host; shared-core types such as e2-medium time-share physical cores rather than receiving dedicated ones. The VM runs its own operating system (Linux or Windows) with full kernel access. Persistent disks are network-attached, using Google's internal storage system (Colossus). Live migration moves running VMs between hosts without a reboot during host maintenance. Sustained use discounts apply automatically on eligible machine types: the more of the month a VM runs, the lower the effective per-hour rate (up to 30% off for a full month). Committed use discounts offer up to 57% off in exchange for a 1-year or 3-year commitment.
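The "up to 30% for a full month" figure falls out of tiered pricing. The sketch below assumes the four-tier schedule published for N1-style machine types (each successive quarter of the month billed at 100%, 80%, 60%, and 40% of the base rate); the tier table and the function name are illustrative assumptions, and other machine families use different schedules or have no sustained use discount at all.

```python
# Assumed N1-style sustained use tiers: (share of the month, fraction of base rate billed).
TIERS = [(0.25, 1.00), (0.25, 0.80), (0.25, 0.60), (0.25, 0.40)]


def effective_rate_multiplier(usage_fraction: float) -> float:
    """Return the blended fraction of the base hourly rate paid for the month.

    usage_fraction is the share of the month the VM was running, in (0, 1].
    """
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    remaining = usage_fraction
    weighted = 0.0
    for width, multiplier in TIERS:
        used = min(remaining, width)  # portion of usage that falls in this tier
        weighted += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return weighted / usage_fraction


full_month = effective_rate_multiplier(1.0)
print(f"Full month: pay {full_month:.0%} of the base rate")
```

Under these assumed tiers, a VM that runs only half the month pays about 90% of the base rate for the hours it used, which is why the full 30% only appears for always-on workloads.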

Key defaults and timers:
- Default machine type: e2-medium (2 shared vCPUs, 4 GB)
- Boot disk default: 10 GB persistent disk (standard or balanced)
- Maximum persistent disk per VM: 257 TB (across up to 128 disks)
- Maximum VM life: no limit, but preemptible VMs run for at most 24 hours
- Preemptible VM termination: 30-second notice before shutdown
- Live migration: enabled by default for most machine types, including C2, C2D, and C3; VMs with attached GPUs (such as A2 and G2) cannot live migrate and are stopped and restarted for host maintenance instead

Configuration commands:

gcloud compute instances create my-vm --zone=us-central1-a --machine-type=n1-standard-2 --image-family=debian-12 --image-project=debian-cloud --boot-disk-size=20GB

Google Kubernetes Engine: Container Orchestration

GKE is a managed Kubernetes service. Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. GKE provisions and manages the underlying Compute Engine instances that form the cluster's node pool. You define workloads using Kubernetes objects like Deployments, Services, and Pods.

How it works internally: When you create a GKE cluster, you specify a node pool configuration (e.g., machine type, number of nodes, disk size). GKE creates the VMs (nodes) and installs Kubernetes components (kubelet, kube-proxy, container runtime). The control plane (API server, scheduler, controller manager) is managed by Google and is highly available. You interact with the cluster via kubectl. Pods are scheduled onto nodes based on resource requests and limits. The cluster autoscaler can add or remove nodes based on pending pods. Horizontal Pod Autoscaler (HPA) adjusts the number of pod replicas based on CPU/memory utilization or custom metrics.
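To make the HPA behavior concrete, here is a minimal sketch of the scaling rule described in the Kubernetes documentation: desired replicas = ceil(current replicas x current metric / target metric), with no change when the ratio is within a small tolerance of 1. The function name and the 0.1 tolerance default are assumptions for illustration; the real controller also applies stabilization windows and min/max replica bounds that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return max(1, math.ceil(current_replicas * ratio))


# 4 replicas averaging 120% CPU against an 80% target.
print(desired_replicas(4, 120, 80))
```

With an 80% CPU target, a Deployment running hot at 120% grows by half, while one sitting at 84% is left alone because it is inside the tolerance band.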

Key defaults and timers:
- Default node machine type: e2-medium
- Default disk size: 100 GB
- Cluster autoscaler scale-up: typically within 1-2 minutes
- HPA default metrics: target CPU utilization 80%
- Pod eviction timeout: 5 minutes for node failure
- Container runtime: containerd (since GKE 1.21)
- Maximum pods per node: 110 (default), configurable up to 256
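The pods-per-node limit and the per-pod CPU request together put a floor on how many nodes the cluster autoscaler must provide. The following is a rough sketch of that lower bound, not the autoscaler's actual algorithm: the function name is hypothetical, the 1930m allocatable CPU figure is an illustrative assumption for a 2-vCPU node, and memory, DaemonSets, and affinity rules are ignored.

```python
def min_nodes_needed(pod_count: int, cpu_request_m: int,
                     node_allocatable_cpu_m: int,
                     max_pods_per_node: int = 110) -> int:
    """Lower bound on node count for identical pods, by CPU request and pod limit."""
    pods_per_node_by_cpu = node_allocatable_cpu_m // cpu_request_m
    pods_per_node = min(pods_per_node_by_cpu, max_pods_per_node)
    if pods_per_node == 0:
        raise ValueError("a single pod does not fit on this node size")
    # Ceiling division without floating point.
    return -(-pod_count // pods_per_node)


# 300 pods requesting 250m CPU each on nodes with an assumed 1930m allocatable.
print(min_nodes_needed(300, 250, 1930))
```

For very small pods the CPU bound stops mattering and the 110-pod default becomes the constraint, which is one reason the per-node pod limit is configurable.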

Configuration commands:

gcloud container clusters create my-cluster --zone=us-central1-a --num-nodes=3 --machine-type=e2-standard-2
kubectl create deployment my-app --image=us-docker.pkg.dev/my-project/my-repo/my-image:tag
kubectl expose deployment my-app --type=LoadBalancer --port=80 --target-port=8080

Cloud Run: Serverless Containers

Cloud Run is a managed compute platform that runs stateless containers in a serverless fashion. It abstracts away infrastructure: you provide a container image, specify CPU and memory, and Cloud Run automatically scales from zero up to the configured maximum number of instances based on incoming requests. With the default request-based billing, you pay only for the CPU and memory an instance uses while it is handling requests, rounded up to the nearest 100 ms, plus a per-request fee. There is no one-minute minimum charge per request.

How it works internally: You push a container image to Artifact Registry (the successor to Container Registry) and deploy it to Cloud Run. Each deployment creates a revision, an immutable snapshot of the container image and its configuration. Each revision can have its own scaling settings (min and max instances). Cloud Run implements the Knative Serving API (Knative is an open-source, Kubernetes-based serverless platform), although the fully managed service runs on Google's own infrastructure rather than on a cluster you can see. When a request arrives, the request router (Google Front End) forwards it to an instance of the container. If no instance is ready, the system starts one (a cold start). Cold starts typically take 1-2 seconds, but can be longer if the container is large or slow to initialize. Once an instance goes idle, Cloud Run may keep it warm for up to 15 minutes before shutting it down; setting min instances keeps a baseline of instances warm permanently.

Key defaults and timers:
- Default CPU: 1 vCPU; default memory: 512 MB (configurable up to 8 vCPU, 32 GB)
- Default max instances: 100 (configurable up to 1000)
- Default min instances: 0 (scale to zero)
- Idle instances: kept warm for up to 15 minutes, then shut down (this is not a configurable timer; use min instances to keep instances warm)
- Request timeout: 5 minutes (configurable up to 60 minutes)
- Cold start latency: typically 1-2 seconds
- Concurrent requests per instance: default 80 (configurable up to 1000)
- Billing granularity: rounded up to the nearest 100 ms, with no one-minute minimum per request
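The concurrency and max-instances defaults interact in a way that is easy to estimate. The sketch below uses Little's law (requests in flight = arrival rate x average latency) to approximate how many instances a service needs; the function name is hypothetical, and the real autoscaler also weighs CPU utilization and startup time, so treat the result as a back-of-the-envelope figure.

```python
import math


def estimated_instances(requests_per_second: float, avg_latency_ms: float,
                        concurrency: int = 80, max_instances: int = 100) -> int:
    """Rough instance count for steady traffic, capped at max_instances."""
    in_flight = requests_per_second * avg_latency_ms / 1000
    needed = math.ceil(in_flight / concurrency)
    return min(needed, max_instances)


# 1,000 requests per second at 200 ms each, with the default concurrency of 80.
print(estimated_instances(1000, 200))
```

If the estimate hits the cap, requests start queuing, which is the signal to raise max instances, raise concurrency, or reduce latency.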

Configuration commands:

gcloud run deploy my-service --image=us-docker.pkg.dev/my-project/my-repo/my-image:tag --platform=managed --region=us-central1 --memory=1Gi --cpu=2 --max-instances=50 --concurrency=200

Cloud Functions: Event-Driven Functions

Cloud Functions (rebranded as Cloud Run functions in 2024) is a serverless execution environment for single-purpose functions written in Node.js, Python, Go, Java, .NET, Ruby, or PHP. Functions are triggered by events from Google Cloud services (e.g., Cloud Storage object changes, Pub/Sub messages) or by HTTP requests. Each function runs in its own isolated container, which is ephemeral and scaled automatically.

How it works internally: When you deploy a function, Google packages your code and dependencies into a container image. The function is associated with a trigger (HTTP or event). For event-driven functions, the Cloud Functions service subscribes to the event source (e.g., a Cloud Storage bucket). When an event occurs, the service invokes the function by passing the event data as a parameter. The function executes in a sandboxed environment with limited resources. 2nd gen functions are built on Cloud Run and Eventarc, so they share Cloud Run's infrastructure and limits; 1st gen functions run on an older, separate platform. Cold starts occur when scaling from zero, but can be mitigated by setting min instances.
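A minimal sketch of what "passing the event data as a parameter" looks like for a Python function with a Cloud Storage trigger. It assumes the 1st gen background-function signature (an event dictionary plus a context object) and the bucket and name fields of the Cloud Storage event payload; the function name process_file and the returned message are hypothetical, and a real function would do the actual work where the comment indicates.

```python
def process_file(event: dict, context=None) -> str:
    """Handle a Cloud Storage 'object finalized' event.

    event carries the object metadata; context carries event ID and timestamp
    and is unused in this sketch.
    """
    bucket = event["bucket"]
    name = event["name"]
    message = f"Processing gs://{bucket}/{name}"
    # Real work (resize, transform, notify) would go here.
    print(message)
    return message


# Local smoke test with a hand-built payload standing in for a real event.
process_file({"bucket": "my-bucket", "name": "uploads/photo.jpg"})
```

Because the handler is an ordinary function, it can be exercised locally with a hand-built dictionary before it is ever deployed.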

Key defaults and timers:
- Default memory: 256 MB (both generations)
- Maximum memory: 8 GB (1st gen), 32 GB (2nd gen)
- Default timeout: 60 seconds (both generations)
- Maximum timeout: 540 seconds (1st gen); 3600 seconds for 2nd gen HTTP functions, 540 seconds for 2nd gen event-driven functions
- Concurrency: 1 (1st gen); default 1, configurable up to 1000 (2nd gen)
- Min instances: 0 (default); can be set above 0 to reduce cold starts
- Invocation billing: per 100 ms increments
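The 100 ms billing increment means execution time is rounded up before it is priced. A small sketch of that rounding, with a hypothetical function name and no prices attached, since rates vary by memory tier and region:

```python
def billed_ms(duration_ms: int, increment_ms: int = 100) -> int:
    """Round an execution time up to the next billing increment."""
    if duration_ms <= 0:
        return 0
    # Ceiling division, then scale back to milliseconds.
    return -(-duration_ms // increment_ms) * increment_ms


for duration in (1, 100, 230):
    print(duration, "ms runs are billed as", billed_ms(duration), "ms")
```

The rounding matters most for very short functions: a 1 ms invocation is billed as a full increment, so batching tiny events can lower cost.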

Configuration commands:

gcloud functions deploy hello-world --runtime python312 --trigger-http --allow-unauthenticated --memory=512MB --timeout=120
gcloud functions deploy process-file --runtime nodejs20 --trigger-bucket=my-bucket --entry-point=processFile

How They Interact

These services are not mutually exclusive. A common architecture uses Cloud Functions to process events (e.g., image upload), store results in Cloud Storage, and a Cloud Run service to serve a web frontend. GKE might run a batch processing job that scales to thousands of pods, while Compute Engine hosts a legacy database that cannot be containerized. The GCDL exam tests your ability to select the best option for a given scenario based on trade-offs between control, scalability, cost, and operational overhead.

Walk-Through

1. Choose Compute Engine for full OS control

When you need to install custom kernels, use specific OS configurations, or run software that requires full access to the guest operating system and kernel (e.g., some security tools), Compute Engine is the right option. Note that a VM gives you control of the guest OS, not of Google's hypervisor. You create a VM instance with your desired machine type, boot disk, and network settings. After connecting over SSH (or RDP on Windows), you have root or administrator privileges. You manage patching, scaling (via instance groups), and load balancing yourself. This is ideal for lift-and-shift migrations of legacy applications.

2. Select GKE for container orchestration at scale

If your application is containerized and you need automated deployment, scaling, and management across a cluster, GKE is the choice. You define a Deployment with desired replicas, and GKE ensures the correct number of pods are running. The cluster autoscaler and HPA handle resource scaling. You benefit from managed upgrades, logging with Cloud Logging, and integration with Cloud Build for CI/CD. GKE is suitable for microservices architectures and batch processing.

3. Use Cloud Run for stateless HTTP services

Cloud Run is best when you have a container that serves HTTP requests and you want to avoid managing any infrastructure. You deploy a container image, and Cloud Run automatically scales from zero to handle traffic. You pay only while requests are being handled, which makes it cost-effective for variable or low-traffic services. Cold starts can be mitigated with min instances. Cloud Run is ideal for APIs, web applications, and event-driven processing via Pub/Sub or Cloud Scheduler.

4. Apply Cloud Functions for event-driven, short-lived tasks

Cloud Functions excels for lightweight, event-driven code that runs in response to a specific trigger, such as a file upload or a Pub/Sub message. Functions have a maximum timeout (540 seconds for 1st gen, 3600 for 2nd gen), so they are not suitable for long-running processes. They are perfect for data transformation, notifications, and lightweight API endpoints. The 1st gen functions have a concurrency of 1 per instance, while 2nd gen supports concurrent requests.
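The timeout ceilings quoted in this chapter can be turned into a quick filter for exam scenarios. This sketch uses only the maximums stated in the text (540 seconds for 1st gen functions, 3600 seconds for 2nd gen functions and for Cloud Run requests); the dictionary and function names are hypothetical.

```python
# Maximum request/invocation timeouts in seconds, as quoted in this chapter.
MAX_TIMEOUT_S = {
    "Cloud Functions (1st gen)": 540,
    "Cloud Functions (2nd gen)": 3600,
    "Cloud Run": 3600,
}


def services_that_fit(duration_s: int) -> list[str]:
    """Return the serverless options whose maximum timeout covers the task."""
    return [name for name, limit in MAX_TIMEOUT_S.items() if duration_s <= limit]


# A 30-minute transcoding task rules out 1st gen functions.
print(services_that_fit(30 * 60))
```

An empty result means none of the request-based serverless options fit, which points toward GKE or Compute Engine.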

5. Combine services in a multi-tier architecture

In production, you might use Cloud Functions to process incoming data (e.g., resize images), store results in Cloud Storage, then use Cloud Run to serve a web frontend that displays the images. GKE could run a backend service for heavy computation, while Compute Engine hosts a legacy database. The GCDL exam expects you to understand how these services can be integrated via triggers, load balancers, and service accounts.

What This Looks Like on the Job

Scenario 1: E-commerce Platform with Microservices

A large online retailer uses GKE to deploy over 50 microservices. Each service is containerized and deployed as a set of pods. The cluster autoscaler adds nodes during Black Friday traffic spikes, scaling from 20 to 200 nodes. HPA adjusts replicas per service based on CPU and custom metrics (e.g., queue depth). The platform team uses Cloud Build to build images and deploys them via Helm charts. Cloud SQL is used for stateful databases, while Redis (Memorystore) caches session data. The retailer chose GKE over Cloud Run because some services require StatefulSets, and they needed fine-grained control over pod placement and networking. Early on, misconfigured resource requests led to node overcommitment, causing pod evictions. They resolved this by setting realistic resource requests and limits and by using node taints to isolate critical workloads.

Scenario 2: Serverless Image Processing Pipeline

A photo-sharing app uses Cloud Storage to store user uploads. When a new image is uploaded, a Cloud Function (triggered by the google.storage.object.finalize event) resizes the image to three sizes and saves them back to Cloud Storage. A Cloud Run service serves a web frontend that displays thumbnails. During peak usage, the function scales to hundreds of concurrent invocations. The team chose Cloud Functions over Cloud Run for the processing step because the task is short-lived and event-driven, and they wanted to avoid paying for idle compute. The function initially ran with the default 60-second timeout, which was enough for typical photos but not for very large images, whose invocations timed out. Leaving the timeout too low is a common mistake. They raised the timeout to 120 seconds and moved to 2nd gen functions for better concurrency.

Scenario 3: Lift-and-Shift of a Legacy ERP System

A manufacturing company migrates its on-premises ERP system to Compute Engine. The application requires Windows Server with a specific SQL Server version. They create a VM with 32 vCPUs and 128 GB RAM, attach a persistent disk for data, and use a static internal IP. They set up a managed instance group for high availability. The ERP is not containerized, so GKE and serverless options are not suitable. They use Cloud VPN to connect to on-premises systems. A common issue is misconfiguring disk performance: they initially used standard persistent disks, which caused high I/O latency. They switched to SSD persistent disks and enabled disk snapshots for backup. The cost of running the VM 24/7 is high, but they use committed use discounts for 1 year to reduce costs by 37%.

How GCDL Actually Tests This

What GCDL Tests on This Topic

The GCDL exam objective 2.2 expects you to 'identify the appropriate compute option for a given workload.' Exam questions are scenario-based: you are given a business requirement (e.g., 'need to run a containerized app that scales to zero when not in use') and must choose among Compute Engine, GKE, Cloud Run, Cloud Functions, or App Engine (which is not covered in this chapter but appears in other objectives).

Common Wrong Answers and Why Candidates Choose Them
1. Choosing Compute Engine for a containerized microservice when the requirement is to minimize operational overhead. Candidates think 'I need containers, so I need VMs to run them.' They forget that GKE and Cloud Run manage containers without VM management. The correct answer is Cloud Run if the service is stateless and HTTP-triggered, or GKE if you need orchestration features.
2. Selecting Cloud Functions for a long-running process (e.g., video transcoding that takes 30 minutes). The default timeout for 1st gen functions is 60 seconds; the maximum is 540 seconds. Even the 2nd gen maximum is 3600 seconds. Candidates overlook timeout limits. The correct answer is Cloud Run (max 60 minutes) or GKE/Compute Engine.
3. Choosing GKE for a simple API that serves low traffic. Candidates see 'container' and think 'Kubernetes.' But GKE requires cluster management, even if automated. Cloud Run is simpler and scales to zero, costing less for low traffic.
4. Picking an always-on Compute Engine VM for a batch job that runs once a month for 10 minutes. Candidates think 'VMs are flexible.' But an option that exists only while the job runs, such as Cloud Run (with min instances 0) or a preemptible VM created for the job and deleted afterward, would be cheaper and simpler.

Specific Numbers and Terms on the Exam
- Cloud Run idle instances: kept warm for up to 15 minutes
- Cloud Run request timeout default: 5 minutes (max 60 minutes)
- Cloud Functions 1st gen timeout max: 540 seconds
- GKE cluster autoscaler adds nodes within 1-2 minutes
- Compute Engine sustained use discount: up to 30% for full month
- Preemptible VM max lifetime: 24 hours
- Cloud Run max instances default: 100

Edge Cases and Exceptions
- Cloud Run requires a stateless container; if you need state, use GKE with StatefulSets or Compute Engine.
- Cloud Functions can be triggered by HTTP, but if you need more than 540 seconds timeout, use Cloud Run.
- GKE Autopilot mode abstracts node management entirely; the exam may contrast Autopilot with Standard mode.
- Compute Engine sole-tenant nodes are used for compliance or licensing needs (e.g., Windows Server licensing).

How to Eliminate Wrong Answers
- If the scenario mentions 'no infrastructure management' or 'scale to zero,' eliminate Compute Engine and GKE (unless GKE Autopilot is mentioned).
- If the task is event-driven (e.g., 'when a file is uploaded'), prefer Cloud Functions.
- If the application is a web API or backend service, prefer Cloud Run.
- If the workload requires specific OS or hardware (GPU, TPU), choose Compute Engine.
- If the workload is containerized and needs orchestration (rolling updates, service discovery), choose GKE.
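The elimination rules above can be written down as a small decision function. This is a study aid under simplifying assumptions, not an official selection algorithm: the function name, the four boolean inputs, and the order in which the rules are checked are all choices made for illustration, and real scenarios often involve more than one acceptable answer.

```python
def pick_compute(needs_os_or_hardware_control: bool = False,
                 needs_orchestration: bool = False,
                 event_driven: bool = False,
                 containerized: bool = False) -> str:
    """Map a simplified requirement profile to one compute service."""
    if needs_os_or_hardware_control:
        return "Compute Engine"      # specific OS, kernel, or hardware
    if containerized and needs_orchestration:
        return "GKE"                 # rolling updates, service discovery
    if event_driven and not containerized:
        return "Cloud Functions"     # 'when a file is uploaded'
    if containerized:
        return "Cloud Run"           # stateless web API or backend
    return "Compute Engine"          # fallback for everything else


print(pick_compute(containerized=True))
print(pick_compute(event_driven=True))
```

Checking OS and hardware control first mirrors how the exam scenarios usually work: one hard constraint tends to eliminate the serverless options before cost or convenience comes into play.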

Key Takeaways

Compute Engine provides IaaS with full control; use for lift-and-shift, legacy apps, or workloads requiring specific OS/hardware.

GKE is the managed Kubernetes service; use for containerized microservices needing orchestration, scaling, and self-healing.

Cloud Run runs stateless containers in a serverless model; scales to zero, pay per request; ideal for HTTP APIs and web apps.

Cloud Functions is FaaS for event-driven, short-lived code; triggered by Cloud Storage, Pub/Sub, HTTP, etc.

Cloud Run keeps idle instances warm for up to 15 minutes; the request timeout defaults to 5 minutes and is configurable up to 60 minutes.

Cloud Functions 1st gen max timeout is 540 seconds; 2nd gen max is 3600 seconds.

GKE cluster autoscaler adds nodes in ~1-2 minutes; HPA default target CPU utilization is 80%.

Compute Engine sustained use discounts up to 30% for full month; committed use discounts up to 57% for 1-3 years.

Preemptible VMs run max 24 hours and can be terminated with 30-second notice; use for fault-tolerant batch jobs.

Cloud Run max instances default is 100; can be increased up to 1000.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Compute Engine (IaaS)

Full control over OS, kernel, and software stack

Can run any workload, including stateful and long-running

Pay for VM uptime, billed per second (minimum 1 minute)

Scaling is your responsibility: you configure managed instance groups and an autoscaler

Best for lift-and-shift migrations or legacy apps

Cloud Run (Serverless Containers)

No control over underlying OS; only container-level control

Stateless containers only; request-driven scaling

Pay per use (CPU and memory while handling requests, rounded up to the nearest 100 ms)

Automatic scaling from zero to thousands of instances

Best for stateless HTTP services and APIs

GKE (Container Orchestration)

Manages clusters of VMs running containers

Supports stateful workloads via StatefulSets

Requires cluster management (even with Autopilot)

Ideal for microservices and batch processing

Scaling via cluster autoscaler and HPA

Cloud Functions (FaaS)

No cluster management; function is ephemeral

Stateless and short-lived (max 540s 1st gen, 3600s 2nd gen)

Fully serverless; no infrastructure to manage

Ideal for event-driven, single-purpose tasks

Scaling per function invocation (concurrent requests up to 1000 in 2nd gen)

Watch Out for These

Mistake

Cloud Run requires you to use a specific programming language or runtime.

Correct

Cloud Run runs any container image, so you can use any language or framework as long as it is packaged in a container. There is no restriction on runtime.

Mistake

Cloud Functions can handle long-running processes as long as you set a high timeout.

Correct

Cloud Functions 1st gen has a maximum timeout of 540 seconds (9 minutes); 2nd gen has a maximum of 3600 seconds (60 minutes). Cloud Run services share the same 60-minute request limit, so for processes longer than that you must use Cloud Run jobs, GKE, or Compute Engine.

Mistake

GKE is always more expensive than Compute Engine because you pay for the cluster management fee.

Correct

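GKE charges a cluster management fee of $0.10 per cluster per hour regardless of cluster size, and the free tier provides a monthly credit that covers one zonal or Autopilot cluster per billing account. On top of the fee, you pay for the underlying Compute Engine instances (Standard mode) or for the resources your pods request (Autopilot mode). GKE can still be cost-effective due to autoscaling and higher resource utilization.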

Mistake

Compute Engine is the best choice for all containerized workloads because you have full control.

Correct

While you can run containers on Compute Engine (e.g., with Docker), you lose the orchestration, scaling, and self-healing benefits of GKE. For production containerized workloads, GKE or Cloud Run are almost always better choices.

Mistake

Cloud Run and Cloud Functions are the same service.

Correct

Cloud Run runs containers that can handle multiple concurrent requests and has a longer timeout (up to 60 minutes). Cloud Functions runs single-purpose functions (1st gen: concurrency=1) with shorter timeouts. Cloud Functions is more event-driven; Cloud Run is more request-driven.


Frequently Asked Questions

When should I use Cloud Run vs GKE?

Use Cloud Run when you have a stateless container that serves HTTP requests and you want minimal operational overhead. Cloud Run scales to zero and charges per request. Use GKE when you need more control over the container orchestration, such as StatefulSets, custom networking, or node-level configurations. GKE is also better for workloads that require multiple containers per pod (sidecars) or complex scheduling policies.

Can I run a stateful application on Cloud Run?

Cloud Run is designed for stateless containers. The container instances are ephemeral and can be terminated at any time. If you need to persist state, you must use external services like Cloud SQL, Firestore, or Cloud Storage. For stateful applications that require local disk or network storage attached to the instance, use Compute Engine or GKE with StatefulSets.

What is the difference between Cloud Functions 1st gen and 2nd gen?

1st gen functions have a maximum timeout of 540 seconds, concurrency of 1 per instance, and up to 8 GB of memory. 2nd gen functions are built on Cloud Run and Eventarc and use the CloudEvents format, offering up to 60 minutes timeout for HTTP functions, configurable concurrency (up to 1000), and up to 32 GB memory. Both generations support min instances to reduce cold starts. Through Eventarc, 2nd gen also supports a broader set of event sources.

How does GKE Autopilot differ from Standard mode?

In Standard mode, you manage the node pools (VM instances) and are responsible for node upgrades, scaling, and optimization. In Autopilot mode, Google manages the entire cluster infrastructure, including nodes, scaling, and upgrades. You only define your workloads (pods). Autopilot bills for the CPU, memory, and storage your pods request rather than for whole nodes, so it can cost more per unit of compute than well-packed Standard nodes, but it reduces operational overhead and enforces best practices.

What is the minimum billing for Cloud Run?

Cloud Run bills for CPU and memory usage in 100 ms increments. With the default request-based billing there is no one-minute minimum per request: a request that completes in 200 ms is billed for 200 ms of instance time, and a request that takes 2 minutes is billed for 2 minutes. Because one instance can serve many concurrent requests, overlapping requests share the same billed instance time, which keeps costs down for high-traffic services with short requests.

Can I use GPUs with Cloud Run or Cloud Functions?

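Partly. Cloud Run can attach an NVIDIA L4 GPU to a service in supported regions, which suits workloads such as machine learning inference. Cloud Functions 1st gen does not support GPUs. If you need other GPU types, several GPUs per machine, or long-running training jobs, use Compute Engine (with GPU-attached VMs) or GKE (with node pools that include GPUs).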

What is the maximum number of vCPUs I can get on a single Compute Engine VM?

The maximum number of vCPUs per VM depends on the machine family. General-purpose N2 goes up to 128 vCPUs and N2D up to 224 vCPUs. Compute-optimized C2 goes up to 60 vCPUs. Memory-optimized M2 goes up to 416 vCPUs. Custom machine types are available only on some families, and their vCPU ceiling also depends on the family (for example, 96 vCPUs on N1), subject to quotas.
