This chapter covers the three primary compute options on Google Cloud: Virtual Machines (VMs), Containers, and Serverless. Understanding when to use each is critical for the GCDL exam, as roughly 15% of questions test your ability to compare these models based on control, scalability, operational overhead, and cost. You will learn the underlying mechanisms, key trade-offs, and real-world deployment patterns to make informed architectural decisions.
Imagine you are running a restaurant. You have a full commercial kitchen with ovens, stoves, refrigerators, and prep stations. Each night, you host multiple parties.
In the VM model, you build a separate, fully equipped kitchen for each party: you install a stove, fridge, and prep table inside a soundproof booth, and then you cook each party's meal in its own booth. This is heavy: each booth takes time to set up, uses its own appliances, and requires its own cleaning.
In the container model, you have one shared kitchen with all the appliances, but each party's meal is cooked in its own dedicated set of pots and pans on the same stove. The pots are isolated from each other but share the stove and fridge. This is lighter: you can start cooking faster, use fewer appliances, and clean only the pots.
In the serverless model, you don't even have a kitchen. Instead, you call a delivery service that cooks each dish on demand at a remote central kitchen. You don't manage any appliances; you just specify the recipe and pay per dish. The service handles scaling: if 100 parties order pasta, it fires up 100 burners automatically. You never wait for a kitchen to be set up, but you also have no control over the stove temperature or ingredient sourcing.
The trade-off is control vs. overhead: VMs give you full control but heavy overhead; containers share the OS but isolate processes; serverless abstracts everything except the code.
What Are VMs, Containers, and Serverless?
Compute is the backbone of any cloud workload. Google Cloud offers three broad categories for running applications: Virtual Machines (Compute Engine), Containers (Google Kubernetes Engine and Cloud Run), and Serverless (Cloud Functions and App Engine). Each represents a different level of abstraction, from full control over the OS and hardware to fully managed, event-driven execution.
Virtual Machines (Compute Engine) provide an emulation of a physical computer, including virtualized CPU, memory, storage, and networking. You choose the machine family (general-purpose, compute-optimized, memory-optimized, accelerator-optimized) and machine type (e.g., n2-standard-4: 4 vCPUs, 16 GB RAM). You are responsible for the guest OS, middleware, runtime, and application. Billing is per-second after a 1-minute minimum.
Containers (via GKE or Cloud Run) package an application with its dependencies into a lightweight, portable image. Containers share the host OS kernel but are isolated using Linux namespaces and cgroups. They start in seconds, consume less memory than VMs, and are ideal for microservices. GKE manages clusters of VMs (nodes) that run containers, while Cloud Run is a fully managed serverless container platform.
Serverless (Cloud Functions, App Engine) abstracts away all infrastructure. You provide code (or a container for Cloud Run), and the platform automatically provisions and scales compute resources. Billing is based on execution time, memory allocated, and invocations. There is no idle cost, but there are cold starts (latency when scaling from zero).
How Compute Engine VMs Work
When you create a VM in Compute Engine, the following occurs:
Resource allocation: The API (or gcloud command) specifies the zone, machine type, boot disk image, and network. The hypervisor (KVM-based) allocates vCPUs (hardware threads) and memory from the physical host.
Disk provisioning: A persistent disk is created from the specified image (e.g., Ubuntu 22.04 LTS). The disk is replicated within a single zone (zonal persistent disk) or across two zones in the same region (regional persistent disk).
Networking: An internal IP is assigned from the VPC subnet. An optional external ephemeral IP can be attached. Firewall rules are applied.
Boot: The VM boots using the disk image. You can SSH into it using OS Login or SSH keys.
Lifecycle: The VM can be running, stopped (reported as TERMINATED; no compute cost, only disk cost), suspended, or deleted. Live migration moves running VMs between hosts without a reboot during host maintenance.
Key defaults:
Boot disk: 10 GB standard persistent disk (pd-standard) by default.
Machine type: n1-standard-1 (1 vCPU, 3.75 GB RAM) if not specified.
Billing: per-second after 1 minute minimum. Sustained use discounts apply for running >25% of a month.
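The per-second rule with a 1-minute minimum can be sketched in a few lines of Python. This is only an illustration of the rounding behavior described above; the function name `billable_seconds` is a hypothetical helper, not part of any Google Cloud SDK, and it ignores pricing, discounts, and disk charges.

```python
import math


def billable_seconds(runtime_seconds: float) -> int:
    """Seconds billed for a single VM run: per-second, with a 60-second minimum."""
    if runtime_seconds <= 0:
        return 0  # the VM never ran, so nothing is billed
    # Partial seconds are rounded up, then the 1-minute floor is applied.
    return max(60, math.ceil(runtime_seconds))


print(billable_seconds(30))  # 60: a 30-second run is billed as one minute
print(billable_seconds(90))  # 90: past the first minute, billing is per second
```

The point to retain for the exam is that very short runs are rounded up to one minute, while longer runs are billed for exactly the time used.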
gcloud command example:
gcloud compute instances create my-vm \
--zone=us-central1-a \
--machine-type=n2-standard-4 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=50GB
How Containers Work on GKE and Cloud Run
Containers use kernel features to isolate processes:
- Namespaces: Provide isolated views of the system (PID, network, mount, user, etc.). Each container sees its own process tree, network stack, and filesystem.
- cgroups: Control resource usage (CPU, memory, disk I/O). Limits prevent a container from starving others.
Google Kubernetes Engine (GKE) manages a cluster of Compute Engine VMs (nodes) that run containers. Key components:
- Cluster: A group of nodes (VMs) managed by the Kubernetes control plane.
- Node pool: A subset of nodes with the same configuration (machine type, autoscaling).
- Pod: The smallest deployable unit: one or more containers with shared storage/network.
- Deployment: Declares the desired state (e.g., 3 replicas of a container image).
- Service: Exposes pods via a stable IP or load balancer.
GKE autoscaling: Horizontal Pod Autoscaler (HPA) adjusts replica count based on CPU/memory; Cluster Autoscaler adds/removes nodes based on pod scheduling.
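As a rough illustration of how the Horizontal Pod Autoscaler picks a replica count, the sketch below applies the ratio rule documented for Kubernetes (desired = ceil(current replicas × current metric / target metric)) and then clamps the result. The function name and the default minimum and maximum are assumptions for this example; the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA rule: scale replicas in proportion to metric / target."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Keep the result inside the configured bounds.
    return max(min_replicas, min(max_replicas, raw))


# 3 pods averaging 90% CPU against a 60% target -> scale out to 5 pods.
print(desired_replicas(3, 90, 60))
# 4 pods averaging 30% CPU against a 60% target -> scale in to 2 pods.
print(desired_replicas(4, 30, 60))
```

If the new replica count does not fit on the existing nodes, the pending pods are what prompt the Cluster Autoscaler to add nodes.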
Cloud Run is a fully managed compute platform for containers. It abstracts the cluster: you provide a container image, and Cloud Run automatically scales from 0 to N instances based on requests. Billing is per 100 ms of CPU and memory usage, with a minimum of 100 ms per request. Cold starts occur when scaling from zero; you can mitigate with min-instances (paid).
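The 100 ms billing granularity can be modeled as rounding each request's duration up to the next 100 ms step. The helper below is a hypothetical sketch of that rounding only; it does not include prices, free tiers, or the always-allocated CPU billing mode.

```python
def billable_ms(duration_ms: int, increment_ms: int = 100) -> int:
    """Round a request duration up to the next billing increment."""
    if duration_ms <= 0:
        return 0
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-duration_ms // increment_ms) * increment_ms


print(billable_ms(1))    # 100: even a 1 ms request is billed as 100 ms
print(billable_ms(230))  # 300: rounded up to the next 100 ms step
print(billable_ms(400))  # 400: exact multiples are unchanged
```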
gcloud command example (Cloud Run):
gcloud run deploy my-service \
--image=gcr.io/my-project/my-image:latest \
--region=us-central1 \
--platform=managed \
--allow-unauthenticated
How Serverless Works: Cloud Functions and App Engine
Cloud Functions is a lightweight, event-driven compute service. You write a function in Node.js, Python, Go, Java, .NET, or Ruby, and deploy it. The platform triggers it on events (HTTP, Cloud Storage, Pub/Sub, Firestore, etc.). Execution time is capped: 1st gen functions can run for up to 9 minutes, while 2nd gen allows up to 60 minutes for HTTP functions and up to 9 minutes for event-driven functions. Memory can be set from 128 MB to 8 GB. Billing is per invocation (first 2 million free), compute time (per 100 ms), and memory.
App Engine is a Platform-as-a-Service (PaaS) for web applications. It supports two environments:
- Standard environment: Sandboxed, supports specific runtimes (Python, Java, Go, Node.js, PHP, Ruby). Scales to zero, but has restrictions (no local disk writes, limited libraries).
- Flexible environment: Runs your app in Docker containers on Compute Engine VMs. It does not scale to zero; it keeps at least one instance running. Supports any runtime.
App Engine automatically handles load balancing, scaling, and health checks. You define scaling parameters (min/max instances, idle timeout).
Comparison of Key Characteristics
| Feature | VMs (Compute Engine) | Containers (GKE/Cloud Run) | Serverless (Cloud Functions/App Engine) |
|---------|----------------------|----------------------------|----------------------------------------|
| Abstraction level | Hardware (virtualized) | OS (kernel sharing) | Application (fully managed) |
| Startup time | Minutes | Seconds (Cloud Run: <1s warm) | Milliseconds (warm) to seconds (cold) |
| Billing granularity | Per-second (1-minute minimum) | Per-second (GKE node), per-100ms (Cloud Run) | Per-invocation + per-100ms |
| Scaling | Manual or managed instance groups | Automated (HPA + Cluster Autoscaler) | Automatic from 0 to N |
| Control | Full OS, kernel, hardware | OS (via node), container runtime | None (platform managed) |
| Persistence | Persistent disks, local SSD | Volumes (PersistentVolumeClaim) | Cloud Storage, Firestore (state kept outside the instance) |
| Max execution | Unlimited | Unlimited on GKE; up to 60 minutes per request on Cloud Run | 9 to 60 minutes per invocation (Cloud Functions, depending on generation and trigger type) |
| Networking | VPC, custom firewall | VPC-native (GKE), internal routing | VPC connector for private resources |
When to Use Each
Choose VMs when:
- You need full control over the OS, kernel modules, or hardware (e.g., GPU workloads).
- You are migrating existing applications that are not containerized.
- You require specific static IPs or complex networking configurations.
- You have licensed software tied to a specific OS.
Choose Containers when:
- You are building microservices architectures.
- You need fast startup and high density (many services per host).
- You want consistent environments across dev, test, and prod.
- You need orchestration for rolling updates, canary deployments, and auto-healing.
Choose Serverless when:
- You have event-driven workloads (e.g., processing files on Cloud Storage).
- You want zero idle cost and automatic scaling to zero.
- You have variable or unpredictable traffic.
- You want to focus on code, not infrastructure.
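The selection criteria above can be condensed into a small rule-of-thumb function. This is a deliberately simplified sketch: the `Workload` fields and the order in which the rules are checked are assumptions made for illustration, and real architectural decisions weigh cost and team skills as well.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical flags summarizing the criteria in the lists above.
    needs_os_control: bool = False      # kernel modules, special hardware, OS-tied licenses
    not_containerized: bool = False     # existing application migrated as-is
    event_driven: bool = False          # e.g., reacting to Cloud Storage uploads
    wants_zero_idle_cost: bool = False  # must scale to zero when idle
    microservices: bool = False         # many small services needing orchestration


def suggest_compute(w: Workload) -> str:
    """Return the compute model suggested by the first matching rule."""
    if w.needs_os_control or w.not_containerized:
        return "VMs"
    if w.event_driven or w.wants_zero_idle_cost:
        return "Serverless"
    if w.microservices:
        return "Containers"
    return "Undecided"  # no strong signal; compare cost and operational overhead


print(suggest_compute(Workload(needs_os_control=True)))  # VMs
print(suggest_compute(Workload(event_driven=True)))      # Serverless
print(suggest_compute(Workload(microservices=True)))     # Containers
```

Checking the control requirement first mirrors the exam's elimination logic: a hard need for OS access rules out the more abstract options before anything else is considered.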
Integration with Other Google Cloud Services
VMs integrate with Cloud Load Balancing, Cloud CDN, Cloud NAT, VPC peering, and Cloud Interconnect.
GKE integrates with Cloud Build, Container Registry/Artifact Registry, Cloud Monitoring, Cloud Logging, and Cloud Audit Logs.
Cloud Run integrates with Cloud Scheduler, Pub/Sub, Eventarc, and Workflows.
Cloud Functions integrates natively with all Google Cloud services via triggers.
Exam Relevance
GCDL exam objectives under 2.2 require you to compare and contrast these compute options. You must know:
The key differences in abstraction level, scaling, billing, and operational overhead.
Which compute option is best for specific scenarios (e.g., legacy migration, microservices, event processing).
The names of Google Cloud products: Compute Engine, GKE, Cloud Run, Cloud Functions, App Engine.
That Cloud Run is serverless for containers, while GKE is orchestrated containers.
That App Engine Standard scales to zero, but Flexible does not.
Common exam traps:
Confusing Cloud Run (serverless containers) with GKE (orchestrated containers).
Thinking App Engine Flexible scales to zero – it does not.
Assuming VMs are always more expensive – sustained use discounts and committed use discounts can make them cost-effective for steady workloads.
Believing containers are always faster to deploy than VMs – container images need to be built and pushed, but runtime startup is faster.
Conclusion
Choosing the right compute service depends on your requirements for control, scaling, cost, and operational complexity. VMs offer maximum flexibility but require management. Containers balance portability and efficiency. Serverless minimizes overhead but imposes constraints. The GCDL exam expects you to map business needs to the appropriate Google Cloud compute product.
Provision a Compute Engine VM
You use the Google Cloud Console, gcloud CLI, or API to create a VM. The request includes zone, machine type, boot disk image, and network. The hypervisor allocates vCPUs and memory from the host. A persistent disk is created and attached. The VM boots and gets an internal IP. You can SSH into it. The entire process takes 1-3 minutes. Billing starts when the VM starts, at a per-second rate after the first minute.
Deploy a container to GKE
First, you create a GKE cluster (or use an existing one). You define a Deployment YAML specifying the container image, replicas, and resource requests/limits. Kubernetes schedules pods onto nodes. The kubelet on each node pulls the image from Artifact Registry (the successor to Container Registry) and starts the container through the containerd runtime. Each pod gets its own network namespace and IP. A Service exposes it via a load balancer. Scaling can be manual or via HPA.
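To make the shape of a Deployment concrete, the following sketch builds a minimal manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The service name, image path, and resource values are hypothetical placeholders; only the field names follow the Kubernetes `apps/v1` Deployment schema.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Build a minimal apps/v1 Deployment with resource requests and limits."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,                # desired number of pods
            "selector": {"matchLabels": labels},  # must match the pod template labels
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "resources": {
                            # Example values only; size these for the real workload.
                            "requests": {"cpu": "250m", "memory": "256Mi"},
                            "limits": {"cpu": "500m", "memory": "512Mi"},
                        },
                    }],
                },
            },
        },
    }


# Hypothetical Artifact Registry image path.
manifest = build_deployment("web", "us-central1-docker.pkg.dev/my-project/my-repo/web:1.0")
print(json.dumps(manifest, indent=2))
```

The scheduler uses the requests to place pods on nodes, while the limits are enforced on the node through cgroups.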
Deploy a container to Cloud Run
You build a container image and push it to Artifact Registry. Then you run `gcloud run deploy` with the image name, region, and service name. Cloud Run automatically creates a revision, sets up a HTTPS endpoint, and configures autoscaling (0 to N). When a request arrives, Cloud Run routes it to an instance; if none exists, it starts one (cold start). Billing is per 100 ms of CPU/memory usage. You can set concurrency (max simultaneous requests per instance).
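A back-of-the-envelope way to see how the concurrency setting affects scaling is to estimate in-flight requests with Little's law (arrival rate × average latency) and divide by the per-instance concurrency. The function below is a sketch under that assumption; the parameter names and default bounds are illustrative, and Cloud Run's actual autoscaler also considers CPU utilization.

```python
import math


def estimate_instances(requests_per_second: float, avg_latency_seconds: float,
                       concurrency: int, min_instances: int = 0,
                       max_instances: int = 100) -> int:
    """Estimate how many instances are needed to serve a steady request rate."""
    in_flight = requests_per_second * avg_latency_seconds  # Little's law
    needed = math.ceil(in_flight / concurrency)
    return max(min_instances, min(max_instances, needed))


# 200 req/s at 0.5 s each is about 100 in-flight requests; at 80 per instance, 2 instances.
print(estimate_instances(200, 0.5, 80))
# No traffic and no minimum: the service scales to zero.
print(estimate_instances(0, 0.5, 80))
# Same idle service with min_instances=1 keeps one warm (and billed) instance.
print(estimate_instances(0, 0.5, 80, min_instances=1))
```

Raising concurrency lowers the instance count for the same traffic, which is one reason a single Cloud Run instance can be more cost-efficient than one instance per request.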
Create a Cloud Function
You write a function in your chosen language and deploy it using `gcloud functions deploy`. You specify the trigger (e.g., HTTP, Cloud Storage). The platform packages the code and dependencies into a container image, deploys it, and exposes an endpoint. When the trigger fires, Cloud Functions starts an instance (if not already warm) and runs the function. Execution time is limited: up to 9 minutes for 1st gen functions, and in 2nd gen up to 60 minutes for HTTP functions and 9 minutes for event-driven functions. Logs are sent to Cloud Logging.
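For a sense of what the deployed code looks like, here is a minimal sketch of a Python function for a Cloud Storage trigger. It assumes the 1st gen background-function signature `(event, context)`, where `event` is a dictionary that includes the `bucket` and `name` of the object; the function name, the `thumbnails/` prefix, and the file-extension check are hypothetical, and the actual image processing is omitted. 2nd gen functions use CloudEvents through the Functions Framework instead.

```python
def on_image_upload(event, context):
    """Handle a Cloud Storage object event (1st gen background-function style)."""
    bucket = event["bucket"]
    name = event["name"]

    # Ignore objects that are not images.
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        print(f"Skipping non-image object gs://{bucket}/{name}")
        return None

    # Placeholder for the real work (download, resize, upload).
    target = f"thumbnails/{name}"
    print(f"Would resize gs://{bucket}/{name} into {target}")
    # The platform ignores the return value; it is returned here to ease local testing.
    return target


# Local smoke test with a hand-built event; no context is needed for this sketch.
on_image_upload({"bucket": "uploads", "name": "cat.png"}, None)
```

Because the handler is an ordinary function, it can be exercised locally with a dictionary before being deployed with `gcloud functions deploy`.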
Scale a managed instance group
For VMs, you can create a managed instance group (MIG) with autoscaling. You define a template (machine type, image), min/max instances, and autoscaling metric (e.g., CPU utilization). The MIG creates or deletes VMs based on load. Each VM boots from the template. You can use rolling updates to replace instances with a new template. MIGs provide autohealing: if an instance becomes unhealthy, it is recreated.
Scenario 1: Legacy Enterprise Application Migration
A large financial institution needs to migrate a monolithic Java application running on Windows Server to Google Cloud. The application requires full OS access for custom kernel drivers and has a license tied to specific CPU cores. The team chooses Compute Engine VMs. They create a custom machine type with 32 vCPUs and 128 GB RAM, attach a regional persistent disk for high availability, and use a Windows Server 2019 image. They set up a managed instance group with a fixed size of 2 for redundancy. The VM boots in about 5 minutes. They use Cloud Load Balancing to distribute traffic. The key challenge is cost: running 2 VMs 24/7 is expensive, but committed use discounts (1-year) reduce cost by 20%. They also use live migration to avoid downtime during host maintenance. If they had chosen containers, the kernel driver dependency would have prevented migration.
Scenario 2: Microservices on GKE
A SaaS startup runs 50 microservices on GKE. Each service is containerized and deployed via CI/CD. They use a cluster with 3 node pools: one for general-purpose services (n2-standard-4), one for memory-intensive services (n2-highmem-8), and one for GPU-accelerated ML inference (a2-highgpu-1g). They use Horizontal Pod Autoscaler to scale based on CPU and custom metrics. Cluster Autoscaler adds nodes when pods cannot be scheduled. They use Istio for service mesh. The challenge is cost management: node VMs run 24/7, even if pods scale down. They use preemptible VMs for batch jobs to save 60%. They also use Cloud Run for services with low traffic to avoid paying for idle nodes. The team must understand Kubernetes concepts like pods, deployments, services, and ingress.
Scenario 3: Event-Driven Image Processing
A media company processes user-uploaded images. When a user uploads an image to Cloud Storage, a Cloud Function is triggered. The function reads the image, resizes it using ImageMagick, and writes the thumbnail to another bucket. The function is written in Python, with 2 GB memory and a 540-second timeout. It processes images in parallel (up to 1000 concurrent invocations). The cost is low: the company pays only per invocation and for compute time. Cold starts are acceptable because the function runs infrequently. If the company needed more control over the processing environment (e.g., custom libraries), they might use Cloud Run instead. The main risk is hitting the 9-minute timeout on very large images; because event-driven functions are capped at 9 minutes, the team would then need to move the processing to Cloud Run or GKE.
GCDL Objective 2.2: Compute Comparison
The exam tests your ability to compare VMs, containers, and serverless based on control, scaling, cost, and operational overhead. You must know the Google Cloud product names and their characteristics. Expect 3-5 questions on this topic. Key areas:
Common Wrong Answers:
1. 'Containers are always cheaper than VMs.' Reality: Containers share the OS, reducing overhead, but you still pay for the underlying VMs (nodes) in GKE. Cloud Run is serverless and can be cheaper for low-traffic services. The cost comparison depends on workload density and traffic patterns.
2. 'Cloud Run is a container orchestration service.' Reality: Cloud Run is serverless for containers; it does not orchestrate. GKE is the orchestration service. Cloud Run abstracts the cluster entirely.
3. 'App Engine Flexible scales to zero.' Reality: App Engine Flexible always runs at least one instance. Only App Engine Standard scales to zero.
4. 'VMs are always the best choice for high-performance computing.' Reality: For HPC, VMs are often used, but Google Cloud also offers bare metal solutions and specialized accelerators. Containers can also run HPC workloads with proper configuration.
Specific Values and Terms:
- Compute Engine billing: per-second after 1-minute minimum.
- Cloud Run billing: per 100 ms, minimum 100 ms per request.
- Cloud Functions timeout: 9 minutes (1st gen, and 2nd gen event-driven functions); 60 minutes (2nd gen HTTP functions).
- GKE node pools: can have different machine types.
- App Engine Standard: supports Python, Java, Go, Node.js, PHP, Ruby.
- Cloud Run: supports any container image, but must be stateless.
Edge Cases:
- If a workload needs GPU, VMs or GKE with GPU node pools are required. Cloud Functions and Cloud Run do not support GPUs.
- If a workload needs static IPs, VMs or GKE Services with static IPs are used. Cloud Run and Cloud Functions get ephemeral IPs by default.
- If a workload runs longer than 9 minutes, 1st gen Cloud Functions and 2nd gen event-driven functions are not suitable; use a 2nd gen HTTP function or Cloud Run (both allow requests of up to 60 minutes), or GKE for anything longer.
How to Eliminate Wrong Answers:
- If the question mentions 'full control over OS', eliminate serverless and containers (unless you control the container image, but OS is shared).
- If the question mentions 'event-driven', eliminate VMs (though you can trigger scripts, serverless is native).
- If the question mentions 'microservices', containers or serverless are likely correct.
- If the question mentions 'zero idle cost', serverless (Cloud Functions, Cloud Run, App Engine Standard) is the answer.
Compute Engine VMs offer full control over OS and hardware; billed per-second after 1-minute minimum.
GKE manages containers on a cluster of VMs; supports autoscaling, rolling updates, and GPU node pools.
Cloud Run is serverless for containers: scales to zero, billed per 100 ms, no idle cost.
Cloud Functions is event-driven serverless: 1st gen caps execution at 9 minutes; 2nd gen allows up to 60 minutes for HTTP functions and 9 minutes for event-driven functions.
App Engine Standard scales to zero; App Engine Flexible does not.
Choose VMs for legacy migrations and full OS control; containers for microservices; serverless for event-driven or variable workloads.
Sustained use discounts and committed use discounts reduce VM costs for steady workloads.
Cold starts in serverless can be mitigated with min-instances (paid) or by keeping functions warm.
These come up on the exam all the time. Here's how to tell them apart.
Compute Engine (VMs)
Full control over OS and kernel.
Startup time: 1-5 minutes.
Billing per-second after 1-minute minimum.
Can attach GPUs and local SSDs.
Best for monolithic apps and legacy migrations.
Google Kubernetes Engine (Containers)
Share host OS kernel; control over container runtime.
Startup time: seconds (pod), but node provisioning takes minutes.
Billing per node (VM) plus any premium GKE fees.
GPU support via node pools.
Best for microservices and orchestrated deployments.
Cloud Run (Serverless Containers)
Runs any container image (any runtime/language).
Request timeout configurable up to 60 minutes (billed per 100 ms).
Scales from 0 to N; cold starts possible.
Supports concurrent requests (up to 1000 per instance).
Can use VPC connectors for private networking.
Cloud Functions (Serverless Functions)
Runs only supported runtimes (Node.js, Python, Go, etc.).
Timeout: 9 min (1st gen, and 2nd gen event-driven); 60 min (2nd gen HTTP).
Scales from 0 to N; cold starts possible.
Single request per instance (no concurrency).
Native triggers from Cloud Storage, Pub/Sub, etc.
Mistake
Containers are always faster than VMs.
Correct
Containers start in seconds, VMs in minutes. However, building and pushing container images takes time. For long-running workloads, the startup time difference is negligible. Also, VMs can be pre-warmed with custom images.
Mistake
Serverless is always cheaper than VMs.
Correct
Serverless can be cheaper for low-traffic or bursty workloads because you pay only for execution. For steady, high-traffic workloads, VMs with committed use discounts can be cheaper. Serverless also incurs per-invocation overhead.
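The break-even reasoning can be made concrete with a small cost model. Everything numeric below is a made-up placeholder rather than a real Google Cloud price, and the function names are hypothetical; the sketch only shows how per-invocation pricing grows with traffic while a VM's monthly cost stays flat.

```python
def serverless_monthly_cost(invocations: int, avg_seconds: float,
                            price_per_invocation: float,
                            price_per_compute_second: float) -> float:
    """Monthly serverless cost: each invocation pays a fixed fee plus compute time."""
    return invocations * (price_per_invocation + avg_seconds * price_per_compute_second)


def cheaper_option(invocations: int, avg_seconds: float,
                   price_per_invocation: float, price_per_compute_second: float,
                   vm_monthly_cost: float) -> str:
    """Compare the serverless bill with a flat monthly VM cost."""
    serverless = serverless_monthly_cost(
        invocations, avg_seconds, price_per_invocation, price_per_compute_second)
    return "serverless" if serverless < vm_monthly_cost else "vm"


# Placeholder prices for illustration only (not actual list prices).
PRICES = dict(price_per_invocation=0.0000004,
              price_per_compute_second=0.000025,
              vm_monthly_cost=50.0)

print(cheaper_option(100_000, 0.2, **PRICES))      # low traffic favors serverless
print(cheaper_option(100_000_000, 0.2, **PRICES))  # steady high traffic favors the VM
```

With these placeholder numbers the crossover happens somewhere between the two traffic levels, which is the pattern the exam expects you to recognize rather than any specific price.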
Mistake
Cloud Run is just a managed Kubernetes service.
Correct
Cloud Run is a serverless platform for containers. It does not expose Kubernetes APIs. GKE is the managed Kubernetes service. Cloud Run abstracts clusters entirely.
Mistake
App Engine Flexible scales to zero instances.
Correct
App Engine Flexible always keeps at least one instance running to handle requests. Only App Engine Standard can scale to zero. Flexible uses VMs under the hood.
Mistake
You cannot use GPUs with containers.
Correct
GKE supports GPU node pools. You can schedule pods that request GPUs. Cloud Run and Cloud Functions do not support GPUs. VMs with GPUs are also available.
Cloud Run is a fully managed serverless platform that runs container images. You don't manage any infrastructure; you just deploy a container and it scales automatically from 0 to N. GKE is a managed Kubernetes cluster where you control the node VMs, networking, and orchestration. GKE gives you more control but requires managing the cluster. Cloud Run is simpler and better for stateless services, while GKE is for complex microservices architectures.
Cloud Run is designed for stateless containers. You can use Cloud Storage, Firestore, or Cloud SQL for persistence, but the container itself should not store state locally because instances can be terminated at any time. For stateful workloads, consider GKE with StatefulSets or Compute Engine VMs.
No. App Engine Flexible always keeps at least one instance running. Only App Engine Standard can scale to zero. Flexible runs your app in Docker containers on Compute Engine VMs, so there is always a VM running. Standard uses a sandbox and can shut down all instances when idle.
The limit depends on the generation and trigger type. 1st gen functions can run for up to 9 minutes. In 2nd gen, HTTP-triggered functions can run for up to 60 minutes, while event-driven functions (e.g., Cloud Storage, Pub/Sub) are limited to 9 minutes. You can set the timeout when deploying the function. If your function needs longer, consider Cloud Run (requests of up to 60 minutes) or GKE.
You are billed per second after a 1-minute minimum. For example, if you run a VM for 30 seconds, you are billed for 1 minute. If you run it for 90 seconds, you are billed for 90 seconds. Sustained use discounts apply automatically for VMs running more than 25% of a month. Committed use discounts (1 or 3 years) provide up to 70% discount.
A cold start occurs when a serverless function or container is invoked after being idle, so the platform must provision a new instance. This adds latency (usually 100ms to a few seconds). Cloud Run and Cloud Functions both have cold starts. You can mitigate by setting a minimum number of instances (paid) or by using warm-up requests.
No. Cloud Functions does not support GPUs. For GPU workloads, use Compute Engine VMs with GPU accelerators or GKE with GPU node pools. Cloud Run also does not support GPUs.