ACE · Chapter 4 of 101 · Objective 1.4

Cloud Run and App Engine

This chapter covers Cloud Run and App Engine, Google Cloud's two primary serverless compute platforms. These services are central to the ACE exam, appearing in roughly 15-20% of questions across multiple domains, especially in scenarios involving application deployment, scaling, and cost optimization. You will learn their architectures, key differences, and use cases, and how to choose between them based on application requirements. Mastering this content is essential both for passing the exam and for building scalable, cost-effective applications in production.

25 min read

Intermediate

Updated Jul 20, 2026

Reviewed by Johnson Ajibi · Senior Network & Security Engineer · MSc IT Security


Cloud Run and App Engine as Food Trucks vs. Restaurants

App Engine is a full-service restaurant chain. You hand the corporate chef a complete menu (your app code) and specify the cuisine (runtime, e.g., Python, Java). The restaurant chain handles everything: sourcing ingredients (provisioning servers), hiring cooks (scaling instances), managing reservations (load balancing), and even cleaning up (patching). You don't see the kitchen; you just know the food comes out. But you're locked into their menu format: if you want to serve a dish that requires a special stove (custom runtime), you might struggle.

Cloud Run, by contrast, is a food truck lot. You bring your own truck (a container image) with your own stove, ingredients, and chef. The lot owner (Google) provides electricity (compute), water (networking), and a common dining area (managed infrastructure). You can serve any cuisine you want, as long as it fits in a standard truck (container). The lot owner can quickly move trucks around to where customers are (auto-scaling from zero). However, if your truck takes too long to set up (cold start), customers walk away. And you're responsible for ensuring your truck doesn't catch fire (container security).

The key difference: with App Engine, Google dictates the kitchen; with Cloud Run, you bring your own kitchen in a box.

How It Actually Works

What Are Cloud Run and App Engine?

Cloud Run and App Engine are Google Cloud's fully managed serverless platforms for running applications without provisioning or managing servers. Both support automatic scaling, pay-per-use pricing, and integration with other Google Cloud services. However, they differ fundamentally in their execution model and flexibility.

App Engine is a Platform-as-a-Service (PaaS) offering that abstracts away the underlying infrastructure. You deploy your application code and App Engine handles everything from provisioning load balancers to scaling instances. It offers two environments: the Standard environment (sandboxed, language-specific runtimes for Python, Java, Go, PHP, Node.js, and Ruby; faster scaling; lower cost) and the Flexible environment (Docker containers on Compute Engine VMs; supports those languages plus .NET and custom runtimes; more customizable but slower to scale). The Standard environment enforces a specific runtime contract: for example, the local filesystem is read-only except for /tmp, and HTTP requests must finish within 10 minutes under automatic scaling (up to 24 hours under basic or manual scaling).

Cloud Run is a fully managed compute platform that runs stateless containers. You provide a container image (typically stored in Artifact Registry; Container Registry is deprecated), and Cloud Run automatically scales it up and down based on traffic, including scaling to zero when there are no requests. It implements the Knative Serving API (Knative is an open-source, Kubernetes-based platform) and offers more flexibility than App Engine Standard because you can use any runtime, library, or binary that fits in a container. Cloud Run supports both HTTP requests and event-driven invocations (via Eventarc). Key constraints: a service's container must listen on the port defined by the PORT environment variable (default 8080), and each request must complete within the service's timeout (default 5 minutes, maximum 60 minutes). Cloud Run jobs, which run tasks to completion instead of serving requests, have a per-task timeout that defaults to 10 minutes and can be raised to 24 hours.

How Cloud Run Works Internally

When you deploy a container image to Cloud Run, the following steps occur:

Image Upload: You push your container image to Artifact Registry or Container Registry. Cloud Run pulls the image from the registry.

Revision Creation: Cloud Run creates a new revision, an immutable snapshot of your container image, environment variables, and configuration. Every deployment creates a new revision, so you can roll back by routing traffic to an earlier revision.

Routing: Cloud Run creates a stable HTTPS URL (e.g., https://service-name-hash-uc.a.run.app). Incoming requests are routed through Google's frontend load balancers to the Cloud Run service.

Scaling: Cloud Run uses a request-based autoscaler. When a request arrives and no running instance has spare capacity, Cloud Run starts a new container instance (a cold start). Each instance can handle multiple concurrent requests, up to the concurrency setting (default 80, maximum 1000); when existing instances are saturated, Cloud Run starts more, up to the maximum instances setting (default 100; higher values are subject to quota). After traffic drops, Cloud Run may keep idle instances around for up to 15 minutes to absorb new bursts; this idle period is not user-configurable.

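Instance Lifecycle: Each container instance runs in a sandbox: the first-generation execution environment uses gVisor, while the second-generation environment uses a lightweight microVM with fuller Linux compatibility. When an instance is no longer needed, Cloud Run sends it SIGTERM and shuts it down. Scaling to zero means no instances run, and none are billed, when there are no requests.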
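The scaling behavior described above can be approximated with Little's law: the number of requests in flight equals the arrival rate times the average latency, and each instance holds at most `concurrency` of them. The sketch below is a back-of-the-envelope estimate under that assumption, not Cloud Run's actual autoscaler (which also weighs CPU utilization and keeps headroom), so treat the result as a rough lower bound. The function name and parameters are illustrative.

```python
import math


def estimate_instances(requests_per_second: float,
                       avg_latency_seconds: float,
                       concurrency: int = 80,
                       min_instances: int = 0,
                       max_instances: int = 100) -> int:
    """Rough Cloud Run instance estimate (illustrative, not the real autoscaler)."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Little's law: average requests in flight = arrival rate x time per request.
    in_flight = requests_per_second * avg_latency_seconds
    # Each instance holds at most `concurrency` requests at once; round up.
    needed = math.ceil(in_flight / concurrency)
    # Clamp to the configured bounds (min_instances keeps warm capacity).
    return max(min_instances, min(needed, max_instances))


if __name__ == "__main__":
    # 400 req/s at 0.5 s each = 200 in flight; ceil(200 / 80) = 3 instances.
    print(estimate_instances(400, 0.5))
    # Same load with concurrency 1 needs 200 instances, capped at max_instances=100.
    print(estimate_instances(400, 0.5, concurrency=1))
```

With no traffic the estimate falls to `min_instances`, which is the scale-to-zero case when that setting is 0. Lowering concurrency multiplies the instance count, which is why the default of 80 matters for cost.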

How App Engine Works Internally

Standard Environment: Your application runs in a sandboxed, language-specific runtime provided by Google, which starts your app's web server (for example, Python apps without an explicit entrypoint are served by Gunicorn). App Engine starts and stops instances automatically according to the scaling type you configure in app.yaml:

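automatic_scaling: App Engine adjusts the number of instances based on request rate, latency, and other metrics. Parameters include min_instances and max_instances (bounds on the instance count; min_instances defaults to 0, which lets the version scale to zero), min_idle_instances (default 0), max_idle_instances (default automatic), min_pending_latency (default 30ms), max_pending_latency (default automatic), and max_concurrent_requests (default 10).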

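basic_scaling: Instances are created only when requests arrive and are shut down after being idle for idle_timeout (default 5 minutes); max_instances is required. Requests can run for up to 24 hours, which suits intermittent or batch-style work.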

manual_scaling: You specify a fixed number of instances; they keep running regardless of load, and requests can run for up to 24 hours.

Flexible Environment: Your application runs in a Docker container on Compute Engine VM instances. Scaling is slower (minutes rather than seconds) because App Engine has to provision VMs, and a version always keeps at least one instance running. Instances are managed by App Engine, but you can SSH into them and install custom software. You are billed for the VMs' vCPU, memory, and disk for as long as they run, not per request.

Key Components and Defaults

Cloud Run:

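Region: there is no built-in default. gcloud run deploy uses the run/region property if you have set it (gcloud config set run/region REGION); otherwise pass --region or choose one when prompted.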

Default concurrency: 80 (requests per instance)

Default request timeout: 5 minutes for services (maximum 60 minutes); jobs have a separate per-task timeout (default 10 minutes, maximum 24 hours)

Idle instances: not configurable; Cloud Run may keep an idle instance for up to 15 minutes before shutting it down, to reduce cold starts

Default max instances: 100 (can be raised, subject to your project's regional quota)

Minimum instances: 0 (can be set >0 to reduce cold starts)

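CPU allocation: by default, CPU is allocated only while an instance is processing requests (request-based billing). Always-allocated CPU (--no-cpu-throttling, now called instance-based billing) lets instances do background work between requests, but you pay for each instance's whole lifetime. To shorten cold starts, use startup CPU boost (--cpu-boost) and minimum instances.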

VPC access: a Serverless VPC Access connector or Direct VPC egress is required to reach private IP resources in a VPC network (e.g., Memorystore, or Cloud SQL over private IP).

Ingress settings: can restrict to internal traffic, internal and Cloud Load Balancing, or all.

App Engine Standard:

Request timeout: 10 minutes for automatic scaling; up to 24 hours for basic and manual scaling.

Response size limit: 32 MB.

File size limit: 32 MB per application file deployed with gcloud app deploy.

Max concurrent requests per instance: 10 (by default; can be increased via max_concurrent_requests in app.yaml).

Instance hours: free tier includes 28 instance hours per day for Standard.

Memory: varies by instance class (on current runtimes, F1: 384 MB, F2: 768 MB, F4: 1536 MB; the legacy first-generation runtimes used 128, 256, and 512 MB).

App Engine Flexible:

Request timeout: 60 minutes.

Instance size: set with the resources settings in app.yaml (cpu, memory_gb, disk_size_gb); App Engine provisions Compute Engine VMs to match.

Billing: for the VM resources (vCPU, memory, disk) while instances run, regardless of traffic.

Scaling: slower than Standard; uses VM-based scaling.
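Because the request and task time limits are easy to mix up, here they are as a small Python lookup table. The sketch assumes these published maximums: Cloud Run services 60 minutes per request, Cloud Run jobs 24 hours per task, App Engine Standard 10 minutes per request under automatic scaling and 24 hours under basic or manual scaling, and App Engine Flexible 60 minutes. Limits change over time, so verify them against current documentation; the dictionary keys and helper name are made up for illustration.

```python
# Maximum run time for one request (or one Cloud Run job task), in seconds.
# Values are assumptions based on published limits; check current docs.
MAX_SECONDS = {
    "cloud_run/service": 60 * 60,                 # default 5 min, max 60 min
    "cloud_run/job_task": 24 * 60 * 60,           # default 10 min, max 24 h
    "app_engine_standard/automatic": 10 * 60,
    "app_engine_standard/basic": 24 * 60 * 60,
    "app_engine_standard/manual": 24 * 60 * 60,
    "app_engine_flexible": 60 * 60,
}


def options_for(run_seconds: int) -> list[str]:
    """Return every platform/scaling option whose limit covers run_seconds."""
    return sorted(name for name, limit in MAX_SECONDS.items() if run_seconds <= limit)


if __name__ == "__main__":
    # A 2-hour unit of work only fits jobs or Standard basic/manual scaling.
    print(options_for(2 * 60 * 60))
```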

Configuration and Verification Commands

Cloud Run:

Deploy a service:

gcloud run deploy SERVICE_NAME \
  --image IMAGE_URL \
  --region REGION \
  --concurrency 80 \
  --timeout 300 \
  --max-instances 10 \
  --min-instances 1 \
  --cpu-boost \
  --no-cpu-throttling

List services:

gcloud run services list --region REGION

Describe a revision:

gcloud run revisions describe REVISION_NAME --region REGION

App Engine:

Deploy an app:

gcloud app deploy app.yaml --version VERSION --promote

View logs:

gcloud app logs tail -s SERVICE

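Update scaling (App Engine scaling settings live in the automatic_scaling, basic_scaling, or manual_scaling block of app.yaml; edit the file, redeploy, and confirm the new version):

gcloud app deploy app.yaml

gcloud app versions list --service SERVICE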

Interaction with Related Technologies

Both Cloud Run and App Engine integrate with:
- Cloud Build: for continuous deployment.
- Cloud Monitoring: for metrics like request count, latency, and instance count.
- Cloud Logging: for application and request logs.
- Cloud Scheduler: for scheduled work (App Engine also has built-in cron via cron.yaml; Cloud Run uses Cloud Scheduler to call a service's HTTP endpoint or to execute a job).
- Cloud Tasks: for asynchronous task processing.
- Cloud SQL: through the built-in Unix socket connection (enabled with --add-cloudsql-instances on Cloud Run, available by default on App Engine Standard, configured under beta_settings in app.yaml on Flexible, which also supports TCP) or over private IP through VPC access.
- Cloud Storage: for static assets and persistent files.
- Eventarc: for event-driven invocations (Cloud Run).

Exam Emphasis

The ACE exam tests your ability to differentiate between Cloud Run and App Engine based on requirements such as:

Need for custom runtime / container flexibility → Cloud Run

Need for fastest scaling and lowest cost for simple apps → App Engine Standard

Need for long-running work (up to 24 hours) → Cloud Run jobs (per-task timeout up to 24 hours) or App Engine Standard with basic or manual scaling (requests up to 24 hours)

Need to scale to zero → Cloud Run or App Engine Standard with automatic or basic scaling (min_instances defaults to 0); App Engine Flexible and manual scaling always keep instances running

Need for VPC access → Cloud Run (Serverless VPC Access connector or Direct VPC egress), App Engine Standard (Serverless VPC Access connector), or App Engine Flexible (instances run directly in a VPC network)

Walk-Through

Choose between Cloud Run and App Engine

Evaluate the application's requirements. If the app is a standard web app written for a Standard-supported runtime (Python, Java, Go, PHP, Node.js, or Ruby) and does not need custom binaries or a writable filesystem beyond /tmp, App Engine Standard is a strong fit because it scales quickly and is inexpensive. If the app needs a custom runtime, another language, or specific system libraries, choose Cloud Run. If it needs SSH access to instances, App Engine Flexible is the option. For work that runs longer than 60 minutes, use Cloud Run jobs (tasks up to 24 hours) or App Engine Standard with basic or manual scaling. The exam often presents a scenario with constraints like 'must scale to zero' and 'use a custom container'; that combination points to Cloud Run.
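The decision process above can be written as a small Python heuristic. This is a study aid under stated assumptions, not an official rule: the `Requirements` fields and the ordering of checks are invented for illustration, it assumes the published request limits (10 minutes for Standard automatic scaling, 60 minutes for Cloud Run services and Flexible, 24 hours for Standard basic/manual scaling and Cloud Run job tasks), and conflicting combinations (for example, SSH plus 2-hour requests) are not handled.

```python
from dataclasses import dataclass

# App Engine Standard runtimes (.NET is available only on Flexible).
STANDARD_LANGUAGES = {"python", "java", "go", "php", "nodejs", "ruby"}


@dataclass
class Requirements:
    language: str
    custom_container: bool = False
    needs_ssh: bool = False
    serves_http: bool = True
    max_run_minutes: int = 1


def recommend(req: Requirements) -> str:
    """Exam-style heuristic mapping requirements to a service (not an official rule)."""
    if not req.serves_http:
        return "Cloud Run jobs"  # run-to-completion batch work
    if req.needs_ssh:
        return "App Engine Flexible"  # the only option here with SSH into instances
    standard_ok = not req.custom_container and req.language.lower() in STANDARD_LANGUAGES
    if req.max_run_minutes > 60:
        # Only Standard basic/manual scaling allows HTTP requests beyond 60 minutes;
        # otherwise, restructure the work as a Cloud Run job.
        return "App Engine Standard (basic or manual scaling)" if standard_ok else "Cloud Run jobs"
    if not standard_ok or req.max_run_minutes > 10:
        # Custom runtimes, or requests longer than Standard's 10-minute automatic limit.
        return "Cloud Run"
    return "App Engine Standard"


if __name__ == "__main__":
    print(recommend(Requirements(language="r")))
```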

Deploy application to Cloud Run

Create a Dockerfile for an image whose web server listens on the port given by the `PORT` environment variable (default 8080). Build the image and push it to Artifact Registry. Use `gcloud run deploy` to create a service; the command accepts flags for concurrency, timeout, max/min instances, and CPU allocation. Cloud Run creates a revision and provides an HTTPS URL. Verify the deployment with `gcloud run services describe`. The service then scales automatically with incoming requests. Note: if the container fails to start (e.g., it listens on the wrong port), the revision reports an error and Cloud Run does not route traffic to it.
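Here is a minimal sketch of a container entrypoint that honors the PORT contract, using only the Python standard library; a production service would more likely use a framework such as Flask served by Gunicorn. The handler, the response text, and the file name `main.py` are assumptions; the matching Dockerfile could be as small as a Python base image plus `CMD ["python", "main.py"]`.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from Cloud Run\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=None):
    """Create a server on $PORT (Cloud Run sets it), falling back to 8080 locally."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    # Listen on all interfaces; a server bound only to 127.0.0.1 is unreachable in Cloud Run.
    # ThreadingHTTPServer lets one instance serve several concurrent requests.
    return ThreadingHTTPServer(("0.0.0.0", port), HelloHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

A threaded server is used because Cloud Run sends up to 80 concurrent requests to an instance by default; a single-threaded server would handle them one at a time.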

Configure App Engine scaling and environment

In `app.yaml`, define the runtime, instance class, scaling type (automatic, basic, or manual), and environment variables. The Standard environment is the default (you may set `env: standard` explicitly); for Flexible, set `env: flex`. `automatic_scaling` parameters include `min_instances`, `max_instances`, `min_idle_instances`, `max_idle_instances`, `min_pending_latency`, and `max_pending_latency`. Deploy with `gcloud app deploy`; App Engine creates a new version and routes traffic to it when `--promote` is in effect (the default). Check deployments with `gcloud app versions list`. The exam tests knowledge of scaling defaults: under automatic scaling, `min_instances` defaults to 0, so a version can scale to zero when there is no traffic, while `min_idle_instances` (also 0 by default) controls how many spare warm instances are kept ready beyond current load.
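To make the app.yaml structure concrete, this sketch holds an assumed Standard-environment configuration as a Python dict and prints it as YAML, keeping all examples in one language. The runtime ID `python312`, instance class `F2`, and scaling values are example choices, not recommendations; the tiny renderer does no quoting and handles only nested dicts of plain scalars, so use a real YAML library for anything more complex.

```python
def render_app_yaml(config: dict, indent: int = 0) -> str:
    """Render a nested dict of plain scalars as simple YAML (enough for this example)."""
    lines = []
    pad = "  " * indent
    for key, value in config.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(render_app_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# Hypothetical Standard-environment app.yaml content.
STANDARD_APP = {
    "runtime": "python312",
    "instance_class": "F2",
    "automatic_scaling": {
        "min_instances": 0,            # lets the version scale to zero
        "max_instances": 10,           # caps cost during spikes
        "min_idle_instances": 0,       # no spare warm instances
        "max_concurrent_requests": 10, # the documented default
    },
}

if __name__ == "__main__":
    print(render_app_yaml(STANDARD_APP))
```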

Handle traffic splitting and revisions

Cloud Run supports traffic splitting between revisions: `gcloud run services update-traffic` sends a percentage of traffic to specific revisions, which enables canary deployments and rollbacks. App Engine splits traffic between versions with `gcloud app services set-traffic`, by IP address, cookie, or at random. In both cases you can migrate traffic gradually. The exam may ask about rolling back: in Cloud Run, send 100% of traffic to the previous revision; in App Engine, route all traffic back to the previous version. Note: App Engine splits traffic by version, not by revision, and versions are not immutable; redeploying with the same version ID replaces it.
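The following Python sketch builds the split arguments for the two commands named above, which makes the difference in their formats visible. The revision names (`web-00002-new`, `web-00001-old`) and version IDs (`v1`, `v2`) are hypothetical. The sketch assumes Cloud Run percentages are whole numbers and only builds complete splits totaling 100, while App Engine's `--splits` values are treated by gcloud as relative weights.

```python
def cloud_run_split_flag(split: dict[str, int]) -> str:
    """Build the --to-revisions value for `gcloud run services update-traffic`."""
    if any(not isinstance(pct, int) or pct < 0 for pct in split.values()):
        raise ValueError("percentages must be non-negative integers")
    if sum(split.values()) != 100:
        raise ValueError("this sketch only builds complete splits that total 100")
    return "--to-revisions=" + ",".join(f"{rev}={pct}" for rev, pct in split.items())


def app_engine_split_flag(split: dict[str, float]) -> str:
    """Build the --splits value for `gcloud app services set-traffic`."""
    if any(w < 0 for w in split.values()) or sum(split.values()) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    # gcloud treats these values as relative weights between versions.
    return "--splits=" + ",".join(f"{ver}={w}" for ver, w in split.items())


if __name__ == "__main__":
    # Canary: 10% of traffic to the new revision.
    print(cloud_run_split_flag({"web-00002-new": 10, "web-00001-old": 90}))
    # Rollback: all traffic back to the previous revision.
    print(cloud_run_split_flag({"web-00001-old": 100}))
    print(app_engine_split_flag({"v2": 0.1, "v1": 0.9}))
```

The results are appended to the full commands, for example `gcloud run services update-traffic web` followed by the `--to-revisions` value, or `gcloud app services set-traffic default` followed by the `--splits` value and, optionally, `--split-by`.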

Monitor and troubleshoot serverless applications

Use Cloud Logging to view request and application logs. For Cloud Run, query logs with `gcloud logging read`, filtering on the `cloud_run_revision` resource type; for App Engine, stream them with `gcloud app logs tail`. Common issues: cold start latency (mitigated by minimum instances and startup CPU boost), 5xx errors (often from a crashing container), and timeout errors (raise the timeout setting or move the work to a job). The exam tests how to diagnose scaling issues: if latency rises, check whether instances are CPU-bound, whether concurrency is set too high, or whether the service has hit its maximum instances (Cloud Run then starts rejecting requests with 429 errors). Use Cloud Monitoring dashboards to view request count, p50/p99 latency, and instance count.
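As a concrete starting point for the log queries mentioned above, this Python sketch builds a `gcloud logging read` command that returns error-level entries for one Cloud Run service. The service name `checkout` and region are placeholders; the filter uses the `cloud_run_revision` resource type and its `service_name` and `location` labels.

```python
import shlex


def cloud_run_error_logs_cmd(service: str, region: str, limit: int = 50) -> list[str]:
    """Build a `gcloud logging read` command for a Cloud Run service's error logs."""
    log_filter = " AND ".join([
        'resource.type="cloud_run_revision"',
        f'resource.labels.service_name="{service}"',
        f'resource.labels.location="{region}"',
        "severity>=ERROR",
    ])
    return ["gcloud", "logging", "read", log_filter, f"--limit={limit}", "--format=json"]


if __name__ == "__main__":
    # Print a copy-pasteable command; pass the list to subprocess.run() to execute it.
    print(shlex.join(cloud_run_error_logs_cmd("checkout", "us-central1")))
```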

What This Looks Like on the Job

Enterprise Scenario 1: Event-Driven Microservices with Cloud Run

A financial services company needs to process real-time transaction alerts. They use Cloud Run because each alert is a discrete event that needs only a few seconds of processing: the service, written in Go, processes the alert, updates a database, and returns a response. They set --max-instances=50 to control costs and --min-instances=1 to avoid cold starts for the most critical alerts, and they use Eventarc to route Pub/Sub messages to the service. During low-traffic periods the service scales down to that single minimum instance, saving costs. A common mistake is forgetting to enable startup CPU boost (--cpu-boost), which gives the container extra CPU while it starts and shortens cold starts; --no-cpu-throttling is a different setting that keeps CPU allocated between requests. The team also uses Cloud Run's concurrency setting so one instance can process several alerts at once, increasing throughput. They monitor concurrency and instance count to avoid hitting the max instances limit, beyond which requests queue and can fail with 429 errors.
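The core of such an alert handler is unwrapping the Pub/Sub message. The sketch below is in Python rather than the team's Go, and it assumes the Eventarc Pub/Sub trigger delivers the same JSON envelope as a Pub/Sub push subscription, with the payload base64-encoded in `message.data`. The alert fields are hypothetical. Returning a success status (such as 204) acknowledges the message, while an error status causes redelivery.

```python
import base64
import json


def decode_pubsub_push(body: bytes) -> dict:
    """Return the JSON payload carried in a Pub/Sub push-style request body."""
    envelope = json.loads(body)
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError("request body is not a Pub/Sub push envelope")
    # Pub/Sub base64-encodes the message data inside the JSON envelope.
    return json.loads(base64.b64decode(message["data"]))


if __name__ == "__main__":
    alert = {"alert_id": "a-1", "amount": 42}  # hypothetical alert payload
    body = json.dumps({
        "message": {"data": base64.b64encode(json.dumps(alert).encode()).decode()},
        "subscription": "projects/demo/subscriptions/alerts",
    }).encode()
    print(decode_pubsub_push(body))
```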

Enterprise Scenario 2: Traditional Web Application on App Engine Standard

A SaaS company hosts a Django-based web app for project management. They choose App Engine Standard because it provides automatic scaling, integrates seamlessly with Cloud SQL via Unix sockets, and offers a free tier for development. They configure automatic_scaling with min_idle_instances: 0 and max_idle_instances: 1 to minimize cost. During peak hours, App Engine scales up to 20 instances. They use Cloud Tasks to offload long-running operations (e.g., report generation) because App Engine Standard limits requests to 10 minutes under automatic scaling. The team set max_concurrent_requests too high for their CPU-heavy views (the default is 10), so some instances became overloaded and returned 503 errors; lowering it to 5 fixed the problem. They also learned that App Engine Standard's /tmp directory is writable but ephemeral, so they use Cloud Storage for persistent file storage.

Scenario 3: Hybrid with Cloud Run and App Engine

A retail company uses App Engine Standard for their main e-commerce site (Python) and Cloud Run for a custom recommendation engine written in R, which App Engine Standard does not support. The recommendation engine is stateless and receives HTTP requests from the main site, so Cloud Run scales it out with demand and back down when traffic falls. They use a Serverless VPC Access connector so Cloud Run can reach a Redis instance used for caching. The main challenge was keeping latency low for the recommendation service: they set --min-instances=2 so warm instances are always ready, and --no-cpu-throttling so those instances keep their CPU between requests. The exam scenario might ask: 'Which service should host a containerized application that needs to scale to zero and uses a custom binary?' The answer is Cloud Run.

How ACE Actually Tests This

What the ACE Exam Tests

The ACE exam (Objective 1.4) focuses on your ability to select the appropriate serverless compute service based on application requirements. Key decision points include:

Whether the app requires a custom runtime or container → Cloud Run

Whether the app needs to scale to zero → Cloud Run or App Engine Standard (App Engine Flexible cannot)

Whether the app needs long request timeouts (> 60 min) → not Cloud Run services or App Engine Flexible, which both cap requests at 60 minutes; use Cloud Run jobs (up to 24 hours per task) or App Engine Standard with basic or manual scaling (requests up to 24 hours)

Whether the app needs to run in a specific Google Cloud region → Both support multiple regions

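Whether the app needs VPC access → Both: Serverless VPC Access connectors work for Cloud Run and App Engine Standard, Cloud Run also supports Direct VPC egress, and App Engine Flexible instances run inside a VPC network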

Common Wrong Answers and Traps

Choosing App Engine Flexible for a simple web app because it supports custom runtimes. Trap: App Engine Flexible is more expensive and slower to scale than Standard. If the app fits Standard's constraints, Standard is preferred.

Selecting Cloud Run for an application that requires writing to the local filesystem persistently. Trap: Cloud Run containers are ephemeral; filesystem writes are lost when instances are recycled. Use Cloud Storage or other persistent storage.

Thinking App Engine Standard cannot scale to zero. Reality: with min_instances at its default of 0, App Engine Standard scales down to zero instances when there is no traffic. The next request then waits while a new instance starts (a cold start).

Believing Cloud Run supports SSH access. Trap: Cloud Run runs in a sandbox; you cannot SSH into containers. App Engine Flexible allows SSH via gcloud app instances ssh.

Confusing Cloud Run jobs with Cloud Run services. Trap: jobs are for batch workloads that run to completion (per-task timeout up to 24 hours) and do not serve HTTP requests; services handle HTTP requests (up to 60 minutes each). The exam may ask about processing a batch file; the answer is Cloud Run jobs or App Engine with basic/manual scaling.

Specific Numbers and Terms

Cloud Run default concurrency: 80

Cloud Run default timeout: 5 minutes (300 seconds)

Cloud Run max timeout for services: 60 minutes

Cloud Run job task timeout: default 10 minutes, maximum 24 hours

Cloud Run default max instances: 100

App Engine Standard request timeout: 10 minutes (automatic scaling); 24 hours (basic and manual scaling)

App Engine Standard max idle instances: automatic by default; configurable with max_idle_instances in app.yaml

App Engine Flexible request timeout: 60 minutes

App Engine Standard instance classes: F1 (384 MB), F2 (768 MB), F4 (1536 MB) on current runtimes

App Engine Flexible instance size: set with the resources settings (cpu, memory_gb) in app.yaml

Edge Cases

VPC connectivity: App Engine Standard and Cloud Run can both use a Serverless VPC Access connector, which must be in the same region as the service; Cloud Run can also use Direct VPC egress without a connector.

Cron jobs: App Engine has built-in cron; Cloud Run must use Cloud Scheduler to make HTTP requests to the service.

Multiple containers: Cloud Run supports sidecars, so one instance can run several containers (for example, an app plus a proxy). For full Kubernetes pod features, use GKE.

Stateful applications: Neither is ideal; use Compute Engine or GKE with StatefulSets.

How to Eliminate Wrong Answers

When a question asks 'Which service should be used?', look for keywords:
- 'custom container' or 'Dockerfile' → Cloud Run or App Engine Flexible
- 'scale to zero' → Cloud Run or App Engine Standard (not Flexible)
- 'long-running process' over 60 minutes → Cloud Run jobs or App Engine Standard with basic/manual scaling (up to 24 hours)
- 'requires SSH' → App Engine Flexible
- 'supported language' (Python, Java, Go, PHP, Node.js, Ruby) → App Engine Standard
- 'events' or 'Pub/Sub' → Cloud Run (via Eventarc) or App Engine (via Pub/Sub push subscriptions or Cloud Tasks)

Key Takeaways

Cloud Run runs stateless containers and scales to zero; ideal for custom runtimes and event-driven workloads.

App Engine Standard is best for simple web apps using supported languages; offers sub-second scaling and a free tier.

App Engine Flexible is for apps needing custom runtimes or SSH access but is more expensive and slower to scale.

Cloud Run default concurrency is 80; default timeout is 5 minutes (max 60).

App Engine Standard request timeout is 10 minutes under automatic scaling and up to 24 hours under basic or manual scaling.

Both Cloud Run and App Engine connect to Cloud SQL through a built-in Unix socket connection, or over private IP through VPC access.

Cloud Run jobs run batch tasks to completion (up to 24 hours per task); services handle HTTP requests (up to 60 minutes each).

Traffic splitting is supported in both Cloud Run (revisions) and App Engine (versions).

Cold starts can be mitigated by setting minimum instances (Cloud Run) or min_instances / min_idle_instances (App Engine).

The ACE exam tests your ability to choose between these services based on application requirements like runtime flexibility, scaling needs, and cost.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Cloud Run

Runs any container with any runtime or language

Scales to zero by default (minimum instances default to 0)

Under request-based billing (the default), bills only while an instance is handling requests, rounded up to the nearest 100 ms

Max request timeout 60 minutes (services); job tasks can run up to 24 hours

Supports up to 1000 concurrent requests per instance (default 80)

App Engine Standard

Supports only specific runtimes (Python, Java, Go, PHP, Node.js, Ruby)

Can scale to zero but may keep idle instances based on settings

Bills per instance hour (free tier includes 28 instance hours/day)

Max request timeout 10 minutes (automatic scaling) or 24 hours (basic and manual scaling)

Default 10 concurrent requests per instance (configurable)

Cloud Run

Fully managed; no VM access

Scales in seconds; can scale to zero

Bills only while handling requests (request-based billing, the default)

No SSH access

Implements the Knative Serving API (open source)

App Engine Flexible

Runs on Compute Engine VMs; SSH access available

Scales in minutes; cannot scale to zero (always at least 1 instance)

Bills for VM resources while instances run

Supports custom runtimes via Dockerfile

Legacy offering; newer apps prefer Cloud Run

Watch Out for These

Mistake

Cloud Run and App Engine Flexible are the same because both run containers.

Correct

App Engine Flexible runs containers on Compute Engine VMs, which means it takes longer to scale (minutes) and you pay for the VMs while they run. Cloud Run runs containers on Google-managed serverless infrastructure, scales in seconds, and by default bills only while instances handle requests. Cloud Run also scales to zero, while App Engine Flexible always keeps at least one instance running.

Mistake

App Engine Standard cannot scale to zero instances.

Correct

App Engine Standard scales to zero instances when there is no traffic, as long as `min_instances` is 0 (the default). The next request then incurs a cold start while a new instance starts. Some candidates think Standard always keeps at least one instance running, but that only happens if you set `min_instances` (or `min_idle_instances`) above 0, or use manual scaling.

Mistake

Cloud Run supports stateful workloads because you can mount a persistent disk.

Correct

Cloud Run cannot mount Persistent Disk; it is designed for stateless containers. Its local filesystem is in-memory and is lost when the instance shuts down. Cloud Run can mount Cloud Storage buckets or NFS shares (such as Filestore) as volumes, but the instances themselves remain stateless. For stateful workloads, use Compute Engine or GKE.

Mistake

App Engine Flexible is cheaper than App Engine Standard because you can use smaller instances.

Correct

App Engine Flexible is generally more expensive because you pay for the VM instance per hour, regardless of traffic. App Engine Standard charges per instance hour but instances are more efficient and can scale to zero. For low-traffic apps, Standard is cheaper.

Mistake

Cloud Run jobs can handle HTTP requests like Cloud Run services.

Correct

Cloud Run jobs are for batch tasks that run to completion (per-task timeout up to 24 hours). They do not serve HTTP requests. Cloud Run services are for HTTP request handling. If you need to process an HTTP request, use a service. If you need to run a one-time batch job, use a job.


Frequently Asked Questions

Can Cloud Run access a Cloud SQL database?

Yes. The simplest way is the built-in Cloud SQL connection: deploy with `--add-cloudsql-instances=PROJECT:REGION:INSTANCE`, and your code connects through a Unix socket under /cloudsql/. To connect over a private IP instead, give the service VPC access with a Serverless VPC Access connector or Direct VPC egress. You can also run the Cloud SQL Auth Proxy as a sidecar container, or use a Cloud SQL language connector library.
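When you use the built-in connection, the socket path is derived from the instance connection name. This sketch mirrors the paths used in Google's connection samples, under the assumption that MySQL drivers take `/cloudsql/CONNECTION_NAME` as the socket itself, while PostgreSQL drivers such as pg8000 need the `.s.PGSQL.5432` file inside that directory. The helper name and the connection name `my-project:us-central1:orders` are hypothetical.

```python
def cloud_sql_socket_path(connection_name: str, engine: str) -> str:
    """Unix socket path for an instance added with --add-cloudsql-instances (sketch)."""
    # Connection names look like PROJECT:REGION:INSTANCE; rsplit tolerates
    # domain-scoped project IDs that contain a colon themselves.
    parts = connection_name.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected PROJECT:REGION:INSTANCE")
    base = f"/cloudsql/{connection_name}"
    if engine == "mysql":
        return base  # MySQL: the socket file itself
    if engine == "postgres":
        return f"{base}/.s.PGSQL.5432"  # PostgreSQL: socket file inside the directory
    raise ValueError(f"unsupported engine: {engine!r}")


if __name__ == "__main__":
    print(cloud_sql_socket_path("my-project:us-central1:orders", "postgres"))
```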

What is the difference between Cloud Run services and Cloud Run jobs?

Cloud Run services are designed to handle HTTP requests and scale with traffic; each request can run for up to 60 minutes. Cloud Run jobs are for batch workloads that run to completion: they do not serve HTTP requests, can run many tasks in parallel, and each task has a timeout that defaults to 10 minutes and can be raised to 24 hours. Jobs are ideal for data processing, ETL tasks, or any one-off computation. Both are stateless and scale to zero when not in use.
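Parallel job tasks usually divide the input between themselves. Cloud Run sets `CLOUD_RUN_TASK_INDEX` (0-based) and `CLOUD_RUN_TASK_COUNT` in each task's environment; the round-robin split below and the input file names are assumptions for illustration (a real job might list objects in Cloud Storage instead).

```python
import os


def tasks_share(items, task_index=None, task_count=None):
    """Return the items this Cloud Run job task should process (round-robin split)."""
    if task_index is None:
        task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    if task_count is None:
        task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    if not 0 <= task_index < task_count:
        raise ValueError(f"task index {task_index} out of range for {task_count} tasks")
    # Task i takes items i, i + count, i + 2 * count, ... so tasks never overlap.
    return items[task_index::task_count]


if __name__ == "__main__":
    files = [f"input-{n:03d}.csv" for n in range(10)]  # hypothetical input list
    for name in tasks_share(files):
        print("processing", name)
```

Run outside Cloud Run with neither variable set, the function falls back to a single task that processes everything, which is convenient for local testing.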

How does App Engine Standard handle session state?

App Engine Standard does not support session affinity (sticky sessions), so consecutive requests from the same user may reach different instances. If your application needs session state, store it externally, for example in Memorystore for Redis or Firestore in Datastore mode. Cloud Run offers best-effort session affinity (`--session-affinity`), but instances can still be replaced at any time, so an external session store remains the reliable choice.

Can I use Cloud Run to run a long-running process that takes more than 60 minutes?

No. Cloud Run services cap each request at 60 minutes, and so does App Engine Flexible. For longer work, use Cloud Run jobs (per-task timeout up to 24 hours), App Engine Standard with basic or manual scaling (requests up to 24 hours), or Compute Engine. Alternatively, break the process into smaller tasks driven by Cloud Tasks. For very long or heavy batch processing, consider Batch or GKE.

Does App Engine Standard support WebSockets?

App Engine Standard does not support WebSockets. If your application requires WebSocket connections, use App Engine Flexible (which supports WebSockets) or Cloud Run (which supports WebSockets if your container handles them). Cloud Run can handle WebSocket connections as long as the container supports them and the request timeout is sufficient for the connection duration.

What is the default region for Cloud Run if not specified?

Cloud Run has no built-in default region. If you have set the run/region property (`gcloud config set run/region REGION`), gcloud uses it; otherwise pass `--region` or choose a region when prompted. App Engine is different: you choose its region once, when you create the application (e.g., `gcloud app create --region=us-central`), and it cannot be changed afterward.

How do I reduce cold start latency on Cloud Run?

Set `--min-instances` above 0 to keep warm instances ready, and enable startup CPU boost (`--cpu-boost`) to give the container extra CPU while it starts. Always-allocated CPU (`--no-cpu-throttling`) keeps CPU available between requests, which helps background work and keeps warm instances responsive, but it does not by itself prevent cold starts. Also keep the container image small and the startup path short.
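One common way to keep the startup path short is lazy initialization: build heavy dependencies on first use rather than at import time, so the container starts listening sooner. This Python sketch assumes a hypothetical `get_expensive_client` factory (the sleep stands in for loading a model or opening a connection pool). Note the trade-off: the first request that needs the client still pays its cost, so the pattern helps most when some requests never need it.

```python
import functools
import time


@functools.lru_cache(maxsize=None)
def get_expensive_client():
    """Build a heavy dependency on first use instead of at import time (hypothetical)."""
    time.sleep(0.05)  # stand-in for loading a model or opening a connection pool
    return {"created_at": time.monotonic()}


def handle_request(needs_client: bool) -> str:
    if not needs_client:
        return "fast path"  # this request never pays the initialization cost
    client = get_expensive_client()  # first caller pays; later callers reuse the cache
    return f"client created at {client['created_at']:.3f}"
```

With concurrency above 1, two simultaneous first requests can both run the factory, because `lru_cache` does not hold a lock while the wrapped function executes; guard it with a `threading.Lock` if a duplicate initialization matters.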

