GCDL · Chapter 71 of 101 · Objective 4.2

Cloud Run for Serverless Containers

This chapter covers Cloud Run, Google Cloud's fully managed serverless container platform. Cloud Run is a core topic for the GCDL exam, appearing in roughly 10-15% of questions in the Apps domain. Understanding Cloud Run's architecture, scaling behavior, and integration with other services is essential for the Digital Leader certification. This chapter provides the depth needed to answer exam questions confidently.

25 min read

Intermediate

Updated May 31, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security


Cloud Run as a Food Truck Fleet

Imagine a city with thousands of potential customers who want food delivered instantly. Instead of building a full restaurant (a server) that sits empty most of the time, you operate a fleet of food trucks (containers) that can be dispatched on demand. Each food truck is fully stocked with its own kitchen (container image) and can work on several orders at once, up to a limit you set (concurrency, 80 by default). A central dispatcher (Cloud Run) receives orders (HTTP requests) and routes each one to a truck with spare capacity. If every truck is busy, the dispatcher sends out a new one, which takes a few seconds while its kitchen gets ready (a cold start). Once its orders are served, a truck may wait nearby for a while (Cloud Run can keep idle instances for up to 15 minutes) before being decommissioned to save fuel (cost). The fleet can grow from 0 to thousands of trucks based on demand. The dispatcher also handles routing, load balancing, and health checks, and if a truck breaks down, it redirects the order to another truck. This model lets the fleet absorb sudden spikes (like the lunch rush) without paying for idle trucks overnight.

How It Actually Works

What is Cloud Run and Why Does It Exist?

Cloud Run is a managed compute platform that runs stateless containers in a serverless environment. It abstracts away infrastructure management, allowing developers to deploy containerized applications without provisioning clusters, nodes, or virtual machines. The key innovation is that it combines the portability of containers with the operational simplicity of serverless functions (like Cloud Functions). You can bring your own container image (from Artifact Registry or Container Registry) and Cloud Run handles scaling, load balancing, and availability automatically.

Cloud Run exists to solve several pain points:
- Operational overhead: Traditional container orchestration (Kubernetes) requires managing clusters, node pools, and scaling policies.
- Cost inefficiency: Always-on servers incur costs even when idle.
- Scaling complexity: Manually configuring autoscaling for variable traffic is error-prone.
- Cold start latency: Serverless platforms have cold starts; because you control the container, you can optimize it for faster startup.

How Cloud Run Works Internally

Cloud Run implements the Knative Serving API, but the fully managed service runs on Google's own infrastructure rather than on a GKE cluster in your project. When you deploy a container image (already pushed to Artifact Registry), Cloud Run:
1. Resolves the image reference to an immutable digest.
2. Creates a revision (an immutable snapshot of the deployment configuration).
3. Routes traffic to the revision via a managed HTTPS endpoint.
4. Starts container instances on demand, each isolated in its own sandbox (gVisor in the first generation environment, a microVM in the second).

Request Flow:

A client sends an HTTP request to the Cloud Run service URL (e.g., https://service-name-xxxx-uc.a.run.app).

Google Front End (GFE) terminates TLS and forwards the request to Cloud Run's serving infrastructure.

Cloud Run checks whether an existing instance has spare capacity (fewer in-flight requests than its concurrency setting). If yes, it routes the request to that instance. If no, it starts a new instance (a cold start): it loads the container image, starts the container, and waits until the container listens on its port (or passes a startup probe, if one is configured).

The container must respond within the request timeout (default 5 minutes, max 60 minutes).

Cloud Run injects environment variables like K_SERVICE, K_REVISION, and K_CONFIGURATION.

The container handles the request and returns a response. Cloud Run logs the request and metrics.
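A minimal sketch of a container entrypoint that fits this flow, using only Python's standard library. It listens on the port named by `PORT` and echoes the `K_SERVICE`, `K_REVISION`, and `K_CONFIGURATION` values Cloud Run injects. The helper names `listen_port` and `revision_info` and the `"local"` fallbacks (for running outside Cloud Run) are our own, not part of any Cloud Run API.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Dict, Mapping


def listen_port(env: Mapping[str, str] = os.environ) -> int:
    """Cloud Run passes the port to listen on in PORT (8080 unless configured otherwise)."""
    return int(env.get("PORT", "8080"))


def revision_info(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the revision metadata Cloud Run injects; 'local' is a hypothetical fallback."""
    return {
        "service": env.get("K_SERVICE", "local"),
        "revision": env.get("K_REVISION", "local"),
        "configuration": env.get("K_CONFIGURATION", "local"),
    }


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = json.dumps(revision_info()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Listen on all interfaces: a server bound only to 127.0.0.1 is unreachable from Cloud Run.
    # ThreadingHTTPServer lets one instance serve several requests at once (concurrency > 1).
    ThreadingHTTPServer(("0.0.0.0", listen_port()), Handler).serve_forever()
```

The threaded server matters because a single-threaded server would process one request at a time even if the service allows many concurrent requests per instance.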

Key Components, Values, Defaults, and Timers

Container instances: Each instance handles up to 80 concurrent requests by default (concurrency=80). You can set concurrency anywhere from 1 to 1000 in either execution environment.

Max instances: Default is 100, but can be set up to 1000 (or more with quota increase).

Min instances: Default is 0 (scale to zero). Setting >0 reduces cold starts but incurs cost.

Idle instances: There is no configurable idle timeout. After an instance finishes its last request, Cloud Run may keep it idle for up to 15 minutes so the next request avoids a cold start, then shuts it down.

Request timeout: Default 5 minutes (300 seconds). Maximum 60 minutes (3600 seconds) in both execution environments.

Memory: 128 MiB (first gen) or 512 MiB (second gen) up to 32 GiB. Default 512 MiB.

CPU: Allocated only during request processing (throttled) by default. You can enable CPU always on (for background tasks).

Execution environment: First gen (gVisor sandbox, faster cold starts) or second gen (microVM-based, full Linux compatibility, faster CPU and network performance, network file system support).

VPC connector (Serverless VPC Access) or Direct VPC egress: Allows Cloud Run to reach resources on private IPs in a VPC network (e.g., Cloud SQL, Memorystore).

Cloud Run for Anthos (since renamed Knative serving): Deploy on-premises or on GKE clusters using the same API.
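The numeric settings above can be captured as a small configuration check. This is a sketch: the limits encoded below are our reading of Cloud Run's currently documented limits (concurrency 1 to 1000 with default 80, request timeout up to 3600 seconds, memory up to 32 GiB), and they change over time, so verify them against the docs. `ServiceConfig` and `validate` are hypothetical names; gcloud and the Cloud Run API do their own validation.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024  # MiB per GiB


@dataclass
class ServiceConfig:
    concurrency: int = 80        # default max concurrent requests per instance
    timeout_s: int = 300         # default request timeout: 5 minutes
    memory_mib: int = 512        # default memory limit, as we understand it
    min_instances: int = 0       # scale to zero
    max_instances: int = 100     # default cap on instances
    execution_env: str = "gen1"  # hypothetical default for this sketch: "gen1" or "gen2"


def validate(cfg: ServiceConfig) -> List[str]:
    """Return human-readable problems; an empty list means the settings fall within the limits."""
    problems: List[str] = []
    if not 1 <= cfg.concurrency <= 1000:
        problems.append("concurrency must be between 1 and 1000")
    if not 1 <= cfg.timeout_s <= 3600:
        problems.append("timeout must be between 1 and 3600 seconds")
    min_mem = 512 if cfg.execution_env == "gen2" else 128  # second gen needs at least 512 MiB
    if not min_mem <= cfg.memory_mib <= 32 * GIB:
        problems.append(f"memory must be between {min_mem} MiB and 32 GiB")
    if not 0 <= cfg.min_instances <= cfg.max_instances:
        problems.append("min_instances must be between 0 and max_instances")
    return problems
```

For example, `validate(ServiceConfig(timeout_s=3601))` flags the timeout, because 60 minutes is the ceiling.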

Configuration and Verification Commands

Deploy a service:

gcloud run deploy myservice \
  --image us-docker.pkg.dev/myproject/myrepo/myimage:tag \
  --region us-central1 \
  --memory 512Mi \
  --concurrency 80 \
  --timeout 300 \
  --max-instances 50 \
  --min-instances 2 \
  --no-allow-unauthenticated

List services:

gcloud run services list --region us-central1

Describe a service (show revisions, traffic split, etc.):

gcloud run services describe myservice --region us-central1

View logs:

gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=myservice" --limit 10
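Cloud Run forwards whatever the container writes to stdout and stderr to Cloud Logging, and a line that is a single JSON object is stored as a structured entry, with the `severity` and `message` fields recognized specially. That lets you add filters such as `severity>=WARNING` to the `resource.type` query above. A minimal sketch; the helper name `log` and the extra `order_id` field are illustrative.

```python
import json
import sys


def log(message: str, severity: str = "INFO", **fields: object) -> str:
    """Write one JSON log line to stdout; Cloud Logging parses it into a structured entry."""
    line = json.dumps({"severity": severity, "message": message, **fields})
    print(line, file=sys.stdout, flush=True)
    return line  # returned so callers and tests can inspect it


if __name__ == "__main__":
    log("order processed", severity="WARNING", order_id="A123")
```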

Update traffic split (e.g., 50% to revision1, 50% to revision2):

gcloud run services update-traffic myservice --to-revisions=revision1=50,revision2=50 --region us-central1
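When traffic splits are generated by a script (for example in a canary rollout), it helps to check them before calling gcloud. A minimal sketch that formats the `REVISION=PERCENT` pairs used by `--to-revisions`; the function name is hypothetical, and requiring a complete split that totals 100 is a policy choice of this sketch that keeps the intent explicit.

```python
from typing import Dict


def to_revisions_arg(split: Dict[str, int]) -> str:
    """Format a complete traffic split as REVISION=PERCENT pairs for --to-revisions."""
    if not split:
        raise ValueError("split must name at least one revision")
    if any(not 0 <= pct <= 100 for pct in split.values()):
        raise ValueError("each percentage must be between 0 and 100")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"percentages must add up to 100, got {total}")
    return ",".join(f"{rev}={pct}" for rev, pct in split.items())
```

For a 10% canary, `to_revisions_arg({"revision1": 90, "revision2": 10})` returns `"revision1=90,revision2=10"`, which you would pass as `--to-revisions=revision1=90,revision2=10`.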

How Cloud Run Interacts with Related Technologies

Cloud Load Balancing: For global HTTPS load balancing, you can place Cloud Run behind an external HTTPS Load Balancer with Serverless NEGs.

Cloud CDN: Can cache responses from Cloud Run using the load balancer.

Cloud Functions: Similar serverless model but for single-purpose functions (not containers). Cloud Run offers more flexibility (any language, any binary).

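GKE: Cloud Run does not run on a GKE cluster you manage; it implements the Knative Serving API on Google's own infrastructure and abstracts cluster management entirely. Cloud Run for Anthos runs on your GKE cluster.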

Artifact Registry: Stores container images. Cloud Run pulls images from here.

Cloud Scheduler: Can invoke Cloud Run services via HTTP triggers (for cron jobs).

Eventarc: Routes events from Google Cloud services (e.g., Pub/Sub, Cloud Storage) to Cloud Run.

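Cloud SQL: Accessed through Cloud Run's built-in Cloud SQL connection (which uses the Cloud SQL Auth Proxy) or, for private IP, through a VPC connector (Serverless VPC Access) or Direct VPC egress.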

Secret Manager: Mount secrets as environment variables or volumes.
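With event-driven integrations, a Cloud Run service typically receives messages as HTTP POSTs. A Pub/Sub push subscription, for instance, sends a JSON envelope in which `message.data` is base64-encoded and `subscription` names the subscription; returning a 2xx status acknowledges the message, while an error status leads Pub/Sub to retry. A minimal sketch of unwrapping that envelope with the standard library, assuming a UTF-8 text payload; the function name and returned dict shape are our own.

```python
import base64
import json
from typing import Any, Dict


def decode_push(body: bytes) -> Dict[str, Any]:
    """Unwrap a Pub/Sub push request body into its decoded payload and metadata."""
    envelope = json.loads(body)
    message = envelope["message"]
    return {
        "subscription": envelope.get("subscription"),
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        # data is base64-encoded and may be absent when a message carries only attributes
        "data": base64.b64decode(message.get("data", "")).decode("utf-8"),
    }
```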

Scaling Behavior

Cloud Run scales based on request concurrency and CPU utilization. Each instance can handle up to its configured concurrency of requests simultaneously. The autoscaler aims to keep instances at roughly 60% of their maximum concurrency and 60% CPU utilization (measured over a one-minute window); when load pushes past those targets, it starts new instances. Scaling is fast (seconds), but each new instance adds cold-start latency. To reduce cold starts, set min instances >0, use the first generation execution environment (faster cold starts), enable startup CPU boost, and keep container images and startup work small.
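A back-of-the-envelope sketch of why concurrency drives instance count, using Little's law (average in-flight requests = arrival rate × latency). `target_utilization` approximates how full the autoscaler lets each instance get before adding another; 0.6 is an assumption here, and the real autoscaler also weighs CPU utilization, so treat the result as a rough estimate rather than Cloud Run's actual algorithm.

```python
import math


def estimate_instances(rps: float, latency_s: float, concurrency: int,
                       target_utilization: float = 0.6,
                       min_instances: int = 0, max_instances: int = 100) -> int:
    """Rough steady-state instance count; a sketch, not Cloud Run's actual autoscaler."""
    in_flight = rps * latency_s                      # Little's law: average concurrent requests
    per_instance = concurrency * target_utilization  # requests each instance should carry
    needed = math.ceil(in_flight / per_instance) if in_flight > 0 else 0
    return max(min_instances, min(needed, max_instances))
```

Under these assumptions, 200 requests/second at 250 ms latency needs about 2 instances with concurrency 80 but about 84 with concurrency 1, which is why raising concurrency is often the cheapest way to cut cold starts and cost.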

Request Handling and Lifecycle

Request arrives: Cloud Run's frontend receives the request and forwards it to an instance.

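Instance selection: If an existing instance has spare capacity (fewer in-flight requests than its concurrency setting), it takes the request. If all instances are at capacity and max instances hasn't been reached, a new instance is started (cold start).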

Request processing: The container must respond within the request timeout. It must listen on the port given by the PORT environment variable (8080 by default).

Response: The response is sent back through the same path.

Idle period: After the last request finishes, the instance stays idle and can be reused. Cloud Run may keep it for up to 15 minutes before shutting it down; there is no configurable idle timeout.
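A toy model of instance selection, assuming simple first-fit routing: a request goes to the first instance with spare concurrency, otherwise a new instance is cold-started unless `max_instances` has been reached. `Instance` and `ToyRouter` are hypothetical classes; the real scheduler balances load more carefully and accounts for startup time, but the capacity rule is the same idea.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    name: str
    in_flight: int = 0


@dataclass
class ToyRouter:
    concurrency: int = 80
    max_instances: int = 100
    instances: List[Instance] = field(default_factory=list)
    cold_starts: int = 0

    def route(self) -> Optional[Instance]:
        """Send a request to the first instance with spare concurrency, else cold-start one."""
        for inst in self.instances:
            if inst.in_flight < self.concurrency:
                inst.in_flight += 1
                return inst
        if len(self.instances) < self.max_instances:
            inst = Instance(name=f"instance-{len(self.instances) + 1}", in_flight=1)
            self.instances.append(inst)
            self.cold_starts += 1
            return inst
        # At max instances with every instance full: real Cloud Run briefly holds the
        # request and may then reject it (for example with HTTP 429).
        return None

    def finish(self, inst: Instance) -> None:
        """Mark one request on this instance as complete."""
        inst.in_flight -= 1
```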

Security and Authentication

By default, Cloud Run services are private (only accessible by authorized accounts). To allow unauthenticated access, use --allow-unauthenticated or configure IAM policies. Cloud Run integrates with IAM for access control. Service accounts can be attached to revisions to grant permissions to other Google Cloud services.
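For a private service, callers need the `roles/run.invoker` role and must send a Google-signed ID token as a Bearer token. A minimal sketch using Python's standard library, assuming you already have a token (for example from `gcloud auth print-identity-token`, which requires an installed and logged-in gcloud); the helper name is hypothetical and the URL is this chapter's placeholder.

```python
import urllib.request


def authed_request(url: str, id_token: str) -> urllib.request.Request:
    """Build a GET request carrying a Google-signed ID token for a private Cloud Run service."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {id_token}")
    return req


if __name__ == "__main__":
    import subprocess

    # Assumes gcloud is installed and authenticated as a principal with roles/run.invoker.
    token = subprocess.run(["gcloud", "auth", "print-identity-token"],
                           capture_output=True, text=True, check=True).stdout.strip()
    url = "https://myservice-xxxx-uc.a.run.app/"  # placeholder service URL
    with urllib.request.urlopen(authed_request(url, token)) as resp:
        print(resp.status)
```

Without the header, or with a principal lacking the invoker role, the request is rejected before it ever reaches a container.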

Limitations

Stateless: Containers must be stateless. Use Cloud SQL, Firestore, or Cloud Storage for persistence.

Local disk: Each instance has a writable in-memory filesystem; files written there count against the instance's memory limit and are lost when the instance shuts down.

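No raw TCP or UDP: Cloud Run accepts only HTTP-based traffic: HTTP/1.1, end-to-end HTTP/2 (h2c), gRPC (including streaming RPCs), and WebSockets, in both execution environments.

WebSocket duration: WebSockets are supported, but each connection counts as a request and is closed when the request timeout (max 60 minutes) is reached, so clients must reconnect.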

No background activities: CPU is throttled when not handling requests unless CPU always on is enabled.

Pricing

Cloud Run charges based on:
- CPU allocation: per vCPU-second (while handling requests, or continuously if CPU is always allocated).
- Memory allocation: per GiB-second.
- Requests: per million requests.
- Networking: egress traffic is charged at standard rates.
- No charge for idle instances (when min instances = 0).
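The pricing dimensions above combine into a simple estimate. A sketch under stated assumptions: the rates are hypothetical placeholders, not Google's list prices, and the model ignores the free tier, min instances, billing-time rounding, and egress. `avg_overlap` captures how many requests share an instance's billed time on average, which is how concurrency lowers cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Hypothetical placeholder prices, NOT Google's list prices; substitute current pricing.
    per_vcpu_second: float
    per_gib_second: float
    per_million_requests: float


def estimate_cost(requests: int, avg_latency_s: float, vcpu: float, memory_gib: float,
                  rates: Rates, avg_overlap: float = 1.0) -> float:
    """Rough request-based-billing estimate for one period of traffic."""
    # Billable instance-seconds: overlapping requests on one instance share its billed time.
    instance_seconds = requests * avg_latency_s / avg_overlap
    cpu = instance_seconds * vcpu * rates.per_vcpu_second
    memory = instance_seconds * memory_gib * rates.per_gib_second
    request_fee = requests / 1_000_000 * rates.per_million_requests
    return cpu + memory + request_fee
```

With these made-up rates, one million 200 ms requests on 1 vCPU and 0.5 GiB cost about 2.6 units with no overlap, and about 1.025 units when four requests share each billed second on average.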

Exam Relevance

For the GCDL exam, focus on:

Understanding that Cloud Run is for stateless containers, serverless, scales to zero.

Knowing the difference between Cloud Run and Cloud Functions (containers vs. functions).

Recognizing Cloud Run as a good choice for containerized apps that need autoscaling and minimal ops.

Being aware of Cloud Run for Anthos for hybrid deployments.

Understanding that Cloud Run integrates with Cloud Load Balancing, VPC connectors, and Eventarc.

Walk-Through

Deploy container image

The user pushes a container image to Artifact Registry (e.g., `us-docker.pkg.dev/myproject/myrepo/myimage:v1`). Then they run `gcloud run deploy` specifying the image, region, and configuration. Cloud Run validates the image, creates a new revision, and begins routing traffic to it. The revision is immutable; any change creates a new revision. Traffic can be split between revisions.

Request arrives at Cloud Run endpoint

A client sends an HTTPS request to the service URL (e.g., `https://myservice-xxxx-uc.a.run.app`). Google Front End (GFE) terminates TLS and forwards the request to Cloud Run's serving infrastructure, which checks IAM permissions. If the request is unauthenticated and the service requires authentication, it returns 403. Otherwise, it proceeds to route the request.

Instance selection or cold start

Cloud Run checks whether any existing instance has spare capacity (fewer in-flight requests than its concurrency setting). If yes, it forwards the request to that instance. If all instances are at capacity and the max instance count hasn't been reached, it initiates a cold start: pulls the container image from Artifact Registry, creates a sandbox (first gen) or micro-VM (second gen), starts the container, and waits for it to listen on the configured port (default 8080). Cold start latency typically ranges from 1-10 seconds depending on image size and execution environment.

Request processed by container

The container receives the request on the assigned port. It must respond within the request timeout (default 300 seconds). The container can use environment variables like `PORT`, `K_SERVICE`, `K_REVISION`, and `K_CONFIGURATION`. Cloud Run does not forward requests to containers that are still starting up. After processing, the container sends an HTTP response. Cloud Run logs the request method, path, status code, latency, and instance ID.

Instance idle and shutdown

After the response is sent, an instance with no other in-flight requests becomes idle. Cloud Run may keep an idle instance for up to 15 minutes so the next request avoids a cold start, then shuts it down; there is no user-configurable idle timeout. When min instances > 0, Cloud Run keeps at least that many instances warm. Shutdown is graceful: the container receives a SIGTERM signal and has 10 seconds to clean up before SIGKILL.
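A minimal sketch of using that 10-second SIGTERM window in Python. The `cleanup` hook is hypothetical (in practice: flush logs, close connection pools, save a checkpoint), and the `while` loop is a stand-in for a real server loop.

```python
import signal
import sys
import threading

shutting_down = threading.Event()


def cleanup() -> None:
    """Hypothetical cleanup hook: flush logs, close connection pools, save a checkpoint."""
    print("SIGTERM received; cleaning up", file=sys.stderr, flush=True)


def handle_sigterm(signum, frame) -> None:
    """Cloud Run sends SIGTERM, then SIGKILL about 10 seconds later; keep this work short."""
    shutting_down.set()
    cleanup()


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    # Stand-in for a real server loop: keep working until SIGTERM arrives, then exit cleanly.
    while not shutting_down.wait(timeout=1.0):
        pass
    sys.exit(0)
```

Setting an event instead of exiting inside the handler lets in-progress work notice the flag and finish or checkpoint before the process ends.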

What This Looks Like on the Job

Enterprise Scenario 1: E-commerce API Backend

A large online retailer uses Cloud Run to host its product catalog API. The API is containerized with Node.js and connects to Cloud SQL (PostgreSQL) for product data. The service experiences highly variable traffic: low during early morning, spikes during flash sales. They deploy with --min-instances 10 to keep a baseline warm, --max-instances 200 to handle spikes, and --concurrency 80 to maximize throughput. They use a VPC connector to access Cloud SQL privately. During a flash sale, they see cold starts for new instances, but most requests are handled by warm instances. They monitor with Cloud Monitoring and set alerts for request latency >500ms. A common misconfiguration is setting --min-instances too high, leading to unnecessary costs during low traffic. They also enable CPU always on for background cache warming.

Enterprise Scenario 2: Media Processing Pipeline

A media company uses Cloud Run to run a containerized video transcoding service. They receive video files via Cloud Storage, which triggers a Cloud Function that publishes a message to Pub/Sub. A Cloud Run service subscribes to the Pub/Sub topic and processes each video. Each transcoding job takes 10-30 minutes, so they set the request timeout to 60 minutes (the maximum). They use the second generation execution environment for faster CPU and network performance. They set --concurrency 1 because each instance handles one video at a time. They use always-allocated CPU so background processing is not throttled. A pitfall is that if the request timeout is exceeded, Cloud Run terminates the request, leaving the video partially processed. They implement checkpointing to Cloud Storage every few minutes. They also set --max-instances to control costs and avoid overwhelming downstream services.
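The checkpointing idea in this scenario can be sketched as follows, with a plain dict standing in for a Cloud Storage bucket so the example runs anywhere. `process_with_checkpoints`, the `checkpoints/<job_id>.json` object name, and the chunking scheme are hypothetical. If a request dies mid-job, a retried request reads the saved position and skips the chunks that already finished.

```python
import json
from typing import Callable, Dict, List


def process_with_checkpoints(job_id: str, chunks: List[str], store: Dict[str, str],
                             work: Callable[[str], None]) -> int:
    """Process chunks in order, recording progress after each; returns the index resumed from."""
    key = f"checkpoints/{job_id}.json"  # hypothetical object name in a Cloud Storage bucket
    start = json.loads(store[key])["next"] if key in store else 0
    for i in range(start, len(chunks)):
        work(chunks[i])                           # e.g. transcode one segment of the video
        store[key] = json.dumps({"next": i + 1})  # a real service would write this to Cloud Storage
    return start
```

Sizing chunks so each takes a few minutes keeps checkpoint writes infrequent while bounding how much work a timeout can throw away.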

Enterprise Scenario 3: Internal Dashboard

A financial services company runs a Grafana dashboard on Cloud Run for internal use. The dashboard is lightweight and accessed by about 50 employees during business hours. They set --min-instances 0 to save costs overnight. They use IAM with --no-allow-unauthenticated and grant access via Google Groups. They use Cloud Run's managed SSL certificate for HTTPS. They notice that the first request in the morning has a cold start of about 5 seconds, which is acceptable. During the day, steady use keeps an instance warm, and Cloud Run may hold an idle instance for up to 15 minutes between visits, so cold starts are rare (there is no idle-timeout setting to tune). A common mistake is not setting a VPC connector to access internal databases, causing connection failures. They also use Cloud CDN behind a load balancer to cache static assets.

How GCDL Actually Tests This

What GCDL Tests on Cloud Run

The GCDL exam (Objective 4.2: Serverless Compute) tests your understanding of Cloud Run's use cases, benefits, and limitations. You do not need to memorize CLI commands, but you must know:

Cloud Run is for stateless containers that scale to zero.

It supports any language/runtime as long as it's containerized.

It integrates with Eventarc for event-driven architectures.

Cloud Run for Anthos extends Cloud Run to on-premises or GKE.

Pricing is based on CPU/memory allocation and requests (not idle instances).

Common Wrong Answers and Why They Are Chosen

1. Wrong: "Cloud Run requires you to write functions like Cloud Functions." Why chosen: Candidates confuse serverless containers with serverless functions. Cloud Run runs arbitrary containers; Cloud Functions runs code snippets.

2. Wrong: "Cloud Run can run stateful applications with persistent local storage." Why chosen: Candidates assume containers have persistent storage. Cloud Run instances are ephemeral; use external services for state.

3. Wrong: "Cloud Run can accept raw TCP or UDP connections on any port." Why chosen: Candidates assume a container can expose any protocol it likes. Cloud Run accepts only HTTP-based traffic (HTTP/1.1, HTTP/2, gRPC, and WebSockets) on a single port; raw TCP and UDP are not supported.

4. Wrong: "Cloud Run is the best choice for long-running batch jobs." Why chosen: Candidates see containers and think batch. A Cloud Run service request times out after at most 60 minutes; use Batch or GKE for longer jobs.

Specific Numbers and Terms That Appear on the Exam

Default request timeout: 5 minutes (300 seconds)

Maximum request timeout: 60 minutes

Idle instances: no configurable idle timeout; Cloud Run may keep idle instances for up to 15 minutes

Default concurrency: 80 (configurable from 1 to 1000)

Default max instances: 100

Minimum memory: 128 MiB (first gen); 512 MiB (second gen)

Maximum memory: 32 GiB

Execution environments: first gen (sandbox) and second gen (micro-VM)

Cloud Run for Anthos: runs on GKE clusters

Eventarc: triggers Cloud Run from Pub/Sub, Cloud Storage, BigQuery

Edge Cases and Exceptions

If you set --min-instances > 0, you pay for those instances even if they are idle.

CPU is throttled when not handling requests unless you enable CPU always on.

Cloud Run does not support local disk persistence across requests (ephemeral tmpfs).

To reach private IPs in a VPC network, use a VPC connector (Serverless VPC Access) or Direct VPC egress.

Cloud Run services can be invoked via HTTP (including gRPC and WebSockets) but not via raw TCP/UDP.

How to Eliminate Wrong Answers

If a question mentions "functions" or "code snippets" → not Cloud Run.

If a question mentions "persistent storage" or "stateful" → not Cloud Run.

If a question mentions "long-running jobs" (>60 min) → not Cloud Run.

If a question mentions "raw TCP" or "UDP" → not Cloud Run (use GKE or Compute Engine).

If a question mentions "manage clusters" → not Cloud Run (serverless).

Key Takeaways

Cloud Run is a fully managed serverless container platform that scales to zero.

It supports any containerized application, any language, any library.

Default request timeout is 5 minutes; max is 60 minutes.

There is no idle-timeout setting; Cloud Run may keep idle instances for up to 15 minutes before shutting them down.

Concurrency can be set from 1 to 1000 (default 80).

Min instances can be set to reduce cold starts; max instances limit scaling.

Cloud Run integrates with Eventarc, Cloud Scheduler, Cloud Load Balancing, and VPC connectors.

Cloud Run for Anthos runs on GKE clusters for hybrid deployments.

Pricing: pay for CPU, memory, and requests; no charge for idle instances (min=0).

Cloud Run does not support raw TCP/UDP or persistent local storage.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud Run

Runs containers (any language/runtime)

Supports request timeout up to 60 minutes

Concurrency up to 1000 per instance (default 80)

Can use CPU always on for background tasks

Ideal for microservices and APIs

Cloud Functions (1st gen)

Runs functions (code snippets) in supported runtimes (Node.js, Python, Go, etc.)

Request timeout max 9 minutes (540 seconds)

Single-threaded (one request per instance)

No CPU always on (only during execution)

Ideal for event-driven lightweight tasks

Watch Out for These

Mistake

Cloud Run runs containers on virtual machines that I manage.

Correct

Cloud Run is serverless. You do not manage any VMs or clusters. The underlying infrastructure is fully managed by Google. You only provide a container image.

Mistake

Cloud Run supports any protocol, including raw TCP and UDP.

Correct

Cloud Run accepts only HTTP-based traffic: HTTP/1.1, end-to-end HTTP/2 (h2c), gRPC, and WebSockets. Raw TCP and UDP are not supported.

Mistake

Cloud Run instances have persistent local storage.

Correct

Cloud Run provides an ephemeral, in-memory writable filesystem whose contents count against the instance's memory limit and are lost when the instance is shut down. For persistent storage, use Cloud Storage, Cloud SQL, or Firestore.

Mistake

Cloud Run charges for idle instances even when scaled to zero.

Correct

Cloud Run charges only for resources used during request processing (CPU and memory allocation) and per request. When min instances = 0 and no requests are being handled, there is no charge.

Mistake

Cloud Run requires you to write code in a specific language like Node.js or Python.

Correct

Cloud Run runs any containerized application, so you can use any language or runtime (Go, Java, .NET, Rust, etc.) as long as it listens on a port.


Frequently Asked Questions

What is the difference between Cloud Run and Cloud Functions?

Cloud Run runs containers, allowing you to use any language or runtime, and supports longer timeouts (up to 60 minutes) and higher concurrency. Cloud Functions runs code snippets in supported runtimes with shorter timeouts (up to 9 minutes for 1st gen functions). Cloud Run is better for microservices; Cloud Functions is better for lightweight event-driven tasks.

Can Cloud Run access resources in a VPC network?

Yes, by using a VPC connector (Serverless VPC Access) or Direct VPC egress. Either one lets your Cloud Run service reach resources like Compute Engine VMs, Cloud SQL instances, and Memorystore on their private IPs.

How does Cloud Run handle cold starts?

When a request arrives and no instance has spare capacity, Cloud Run starts a new instance and waits for it to listen on its port. Cold start latency depends on image size, execution environment (first gen starts faster), and startup code. To reduce cold starts, set min instances >0, keep images and startup code lean, use the first gen environment, or enable startup CPU boost.

What is the maximum number of instances Cloud Run can scale to?

The default max instances is 100. You can increase it up to 1000 via the console or CLI. For higher limits, you need to request a quota increase from Google Cloud Support.

Can I use Cloud Run for background processing or batch jobs?

Yes, but with limitations. A Cloud Run service request can run for at most 60 minutes. For longer jobs, use Batch or GKE. You can also enable always-allocated CPU to allow background processing after a response is sent, but idle instances can still be shut down.

How do I authenticate requests to my Cloud Run service?

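By default, Cloud Run services require authentication. You can use IAM roles (roles/run.invoker) to grant access to users, service accounts, or groups. Alternatively, you can allow unauthenticated access with --allow-unauthenticated. For programmatic access, the caller sends a Google-signed ID token in the Authorization: Bearer header, obtained for a service account (for example the one attached to the calling workload, a service account key, or workload identity federation).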

What is Cloud Run for Anthos?

Cloud Run for Anthos (since renamed Knative serving) is a version of Cloud Run that runs on your GKE cluster (on-premises or in Google Cloud). It provides the same serverless developer experience but with more control over the underlying infrastructure. It supports custom domains, VPC integration, and advanced networking.
