
App Engine Standard vs Flexible

This chapter covers the differences between the Google App Engine Standard and Flexible environments, a core topic for the ACE exam's 'Deploying and implementing a cloud solution' domain (Objective 3.1). Approximately 5-8% of exam questions touch on App Engine selection, configuration, and limitations. You will learn the architectural differences, runtime support, scaling behavior, pricing models, and when to choose each environment. Mastering this topic helps you deploy scalable applications with minimal operational overhead.

25 min read
Intermediate
Updated May 31, 2026

App Engine Standard vs Flexible: Prefab vs Custom Kitchen

Choosing between App Engine Standard and Flexible is like deciding between a prefabricated kitchen and a custom-built one. In a prefab kitchen (Standard), the manufacturer provides a fixed set of cabinets, countertops, and appliances. You can choose colors and finishes within a catalog, but you cannot change the layout or install a commercial-grade range. The kitchen is fully assembled and ready to use immediately. It runs efficiently because everything is optimized for the standard design. However, if you need a six-burner stove or a walk-in pantry, you are out of luck. In a custom kitchen (Flexible), you hire a contractor who builds everything to your exact specifications. You can choose any appliance, any countertop material, and any layout. The downside is that the build takes longer, you must manage the contractor, and the kitchen may require more maintenance.

In Google Cloud terms, App Engine Standard runs your code in a sandboxed runtime with limits on supported languages, background processes, and file system access. It can scale to zero and bills instance hours only while instances are running. App Engine Flexible runs your code in Docker containers on Compute Engine VMs, giving you far more control over the environment, including SSH access, any runtime, and background processes. However, it cannot scale to zero (minimum 1 instance) and costs more because you pay for the underlying VM even when idle. The exam tests your ability to match application requirements to the correct environment: Standard for low-cost, auto-scaling, simple apps; Flexible for custom runtimes, OS-level dependencies, or apps that need long-running background processes.

How It Actually Works

What is App Engine?

App Engine is Google Cloud's fully managed platform-as-a-service (PaaS) that automatically scales your web applications. It abstracts away infrastructure management, including servers, networking, and load balancing. You write code, upload it, and App Engine handles the rest. App Engine offers two environments: Standard and Flexible. The choice between them determines the runtime, scaling, networking, and pricing.

App Engine Standard Environment

The Standard environment is a sandboxed, restricted runtime designed for high-performance, auto-scaling applications. It supports a fixed list of language runtimes: Python, Java, Node.js, PHP, Ruby, and Go, each in specific versions. Legacy first-generation runtimes such as Python 2.7, Java 8, and PHP 5.5 carry the tightest sandbox restrictions, including limits on native code extensions. Newer runtimes install dependencies through the language's package manager, but you still cannot install OS-level packages or change the base image. .NET is not offered on Standard; it runs on Flexible.

Key characteristics:
- Scaling: Automatic scaling can drop to zero instances when there is no traffic. You configure min/max instances, and idle instances can be zero. Basic and manual scaling are also available.
- Billing: Pay for instance hours while instances are running. No cost when scaled to zero.
- Instance types: Predefined tiers (e.g., F1, F2, F4, F4_1G) with fixed CPU/memory. F1 is the smallest (600 MHz, 256 MB RAM).
- Networking: No direct connection to VPC networks by default. You must use Serverless VPC Access for private networking. Inbound traffic is HTTP/HTTPS only; no arbitrary TCP/UDP ports.
- Background processes: Not allowed. Threads are restricted and, under automatic scaling, should not be relied on to keep working after the response is sent. Use Cloud Tasks or cron jobs for asynchronous work.
- File system: Read-only except for the /tmp directory. Writes to /tmp are ephemeral (lost on instance restart).
- Session state: Must be stored externally (e.g., Cloud Datastore, Redis) because instances are ephemeral.
- Startup time: Fast. Warm instances respond in milliseconds, and cold starts usually take seconds.
- Deployment: Deploy via gcloud app deploy, which uploads your source directory. No Dockerfile required.
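
As a rough illustration of the file-system rule above, the sketch below writes a scratch file under the system temp directory (which is /tmp on typical Linux runtimes) and returns its path. The helper name `cache_scratch` and the file name are hypothetical; anything written this way should be treated as disposable, because it is lost when the instance is recycled.

```python
import os
import tempfile


def cache_scratch(name: str, data: bytes) -> str:
    """Write disposable data under the temp directory and return its path."""
    # tempfile.gettempdir() resolves to /tmp on typical Linux runtimes.
    path = os.path.join(tempfile.gettempdir(), name)
    with open(path, "wb") as fh:
        fh.write(data)
    return path


if __name__ == "__main__":
    scratch = cache_scratch("thumbnail.bin", b"\x00\x01")
    with open(scratch, "rb") as fh:
        print(len(fh.read()), "bytes cached at", scratch)
```

Anything that must outlive the instance belongs in an external store such as Cloud Storage or Datastore instead.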

App Engine Flexible Environment

The Flexible environment runs your application in Docker containers on Compute Engine VMs. It provides more flexibility at the cost of some automation. It supports any runtime that can run in a Docker container. Google provides base images for Python, Java, Node.js, Go, Ruby, PHP, .NET, and custom runtimes. You can also bring your own container.

Key characteristics:
- Scaling: Automatic, but cannot scale to zero. At least 1 instance must always run. Manual scaling is also possible.
- Billing: Pay for the underlying Compute Engine VM (vCPU, memory, disk) even when idle. Pricing is higher than Standard.
- Instance types: You do not name a machine type. Instead you request vCPUs and memory in the resources section of app.yaml (cpu, memory_gb), and App Engine picks a VM that satisfies the request. Disk size is configurable (default 10 GB, up to 10 TB).
- Networking: Instances run inside a VPC network in your project, so they can reach private resources directly and use Cloud NAT, VPN, or Dedicated Interconnect. Inbound traffic is HTTP/HTTPS by default; additional TCP/UDP ports can be opened with the forwarded_ports network setting.
- Background processes: Allowed. Your container can run background threads, workers, or daemons.
- File system: Read-write local disk on each VM, but it is ephemeral: the disk is re-initialized when the VM restarts and is not shared across instances.
- Session state: Store it externally. Instance identity is not guaranteed, and local disk does not survive restarts.
- Startup time: Slower (minutes) because it builds the Docker container and provisions a VM.
- Deployment: Deploy via gcloud app deploy with an app.yaml that specifies the runtime and, for custom runtimes, a Dockerfile. The image is built using Cloud Build.
- SSH access: You can SSH into the VM for debugging (via gcloud app instances ssh).
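
The background-process allowance is easiest to see with a small worker. The following is a minimal sketch, assuming a Python app inside the Flexible container; `start_worker` and the doubling "work" are hypothetical stand-ins for a real job processor.

```python
import queue
import threading


def start_worker(jobs: "queue.Queue", results: list) -> threading.Thread:
    """Start a daemon thread that processes jobs until it receives None."""

    def run() -> None:
        while True:
            job = jobs.get()
            if job is None:  # sentinel value: stop the worker
                break
            results.append(job * 2)  # stand-in for real work

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    jobs, results = queue.Queue(), []
    worker = start_worker(jobs, results)
    for n in (1, 2, 3):
        jobs.put(n)
    jobs.put(None)
    worker.join()
    print(results)
```

A thread like this keeps running between requests on a Flexible VM; on Standard under automatic scaling the same pattern should not be relied on.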

How They Work Internally

Standard Environment:
1. You define an app.yaml with runtime, handlers, and scaling settings.
2. gcloud app deploy uploads your code to Cloud Storage and triggers a build.
3. App Engine creates a sandboxed instance running the specified runtime. The sandbox isolates the code from the underlying OS.
4. Incoming HTTP requests are routed by the App Engine frontend to an available instance. If no instance is running, a new one is started (cold start).
5. The instance handles the request and may be kept alive for subsequent requests. Idle instances are shut down after a period of inactivity (roughly 15 minutes).
6. The sandbox enforces restrictions: a limited set of syscalls, no inbound sockets other than the HTTP port, and no background processes.

Flexible Environment:
1. You define an app.yaml with env: flex, a runtime (or custom), and, for custom runtimes, a Dockerfile.
2. gcloud app deploy uploads code and triggers Cloud Build to build a Docker image and store it in Artifact Registry (Container Registry in older setups).
3. App Engine creates a Managed Instance Group (MIG) sized to the requested resources. Each VM runs your Docker container.
4. A load balancer (HTTP(S) or network) routes traffic to instances. Health checks ensure only healthy instances receive traffic.
5. Instances can run background processes. They can access the VPC network directly.
6. The MIG auto-scales based on CPU utilization or request rate, but always keeps at least one instance running.
7. Updates are rolled out by creating new instances for the new version.

Key Values and Defaults

Standard scaling settings: automatic_scaling with min_idle_instances (default 0), max_idle_instances (default 1), min_pending_latency (default 30ms), max_pending_latency (default 15s).

Flexible scaling settings: automatic_scaling with min_num_instances (default 2), max_num_instances (default 20), cool_down_period_sec (default 120), cpu_utilization.target_utilization (default 0.6).

Standard instance classes: F1 (256 MB, 600 MHz), F2 (512 MB, 1.2 GHz), F4 (1024 MB, 2.4 GHz), F4_1G (2048 MB, 2.4 GHz).

Flexible resources: set cpu and memory_gb in the resources section of app.yaml rather than naming a machine type; each vCPU needs between 0.9 and 6.5 GB of memory.

Disk size: Flexible default 10 GB, max 10 TB. Standard: no persistent disk (ephemeral /tmp only).

Idle timeout: Under automatic scaling, Standard instances are shut down after roughly 15 minutes of inactivity; min_idle_instances keeps a set number of instances warm regardless. Flexible instances never terminate due to inactivity.

Startup latency: Standard answers in under 100 ms from a warm instance; a cold start can add up to 10 seconds. Flexible: 1-5 minutes for the initial deployment, with later scale-up events bringing new instances online faster (30-60 seconds).

Configuration Commands

Standard app.yaml example:

runtime: python39
instance_class: F1
automatic_scaling:
  min_idle_instances: 0
  max_idle_instances: 1
  min_pending_latency: 30ms
  max_pending_latency: 15s
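
For context, here is a minimal sketch of the application code such a Standard app.yaml could serve. It assumes the Python runtime's default entrypoint is left in place, which typically looks for a WSGI object named `app` in main.py; the response text and the `call_once` helper are illustrative only.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI callable; the runtime's web server invokes it per request."""
    body = b"Hello from App Engine Standard\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_once():
    """Invoke the app once with a synthetic request, for local smoke testing."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_once())
```

Using only the standard library keeps the sketch dependency-free; a real app would more likely use a framework such as Flask or Django, listed in requirements.txt.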

Flexible app.yaml example:

runtime: custom
env: flex
resources:
  cpu: 2
  memory_gb: 4
  disk_size_gb: 20
automatic_scaling:
  min_num_instances: 1
  max_num_instances: 10
  cool_down_period_sec: 120
  cpu_utilization:
    target_utilization: 0.6

Deploy command (both):

gcloud app deploy app.yaml --project=my-project --version=v1

SSH into Flexible instance:

gcloud app instances ssh --service=default --version=v1 INSTANCE_ID

Interaction with Other Services

Cloud Tasks & Cloud Scheduler: Both environments can use these for async processing, but Standard requires them for background work, while Flexible can run its own background threads.

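Cloud SQL: Both environments can connect through a Unix socket at /cloudsql/INSTANCE_CONNECTION_NAME, and Flexible can also connect over TCP. Standard provides the socket without a separately run proxy. Connecting over private IP from Standard requires Serverless VPC Access.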
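
To make the Unix-socket option concrete, here is a small sketch of how the socket location is derived from an instance connection name. The helper name and the sample project, region, and instance values are hypothetical; the `/cloudsql/` prefix and the `PROJECT:REGION:INSTANCE` format follow Cloud SQL's convention, and `.s.PGSQL.5432` is PostgreSQL's standard socket file name.

```python
def cloud_sql_socket(project: str, region: str, instance: str,
                     socket_file: bool = False) -> str:
    """Return the Unix socket location for a Cloud SQL instance."""
    connection_name = f"{project}:{region}:{instance}"
    socket_dir = f"/cloudsql/{connection_name}"
    # Most drivers take the directory; some PostgreSQL drivers (pg8000, for
    # example) want the full path to the socket file inside it.
    return f"{socket_dir}/.s.PGSQL.5432" if socket_file else socket_dir


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(cloud_sql_socket("my-project", "us-central1", "orders-db"))
```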

VPC networks: Standard requires Serverless VPC Access to reach VPC resources. Flexible has direct VPC access.

Cloud Storage: Both can read/write via the Cloud Storage client library.

Identity-Aware Proxy (IAP): Both can use IAP for authentication.

Cloud CDN: Both can be fronted by Cloud CDN.

Walk-Through

1. Define app.yaml configuration

You create an app.yaml file that specifies the runtime, scaling, and resource settings. For Standard, you set runtime (e.g., python39) and, optionally, instance_class. For Flexible, you set env: flex and a runtime; with runtime: custom you also supply a Dockerfile. The app.yaml is the single source of truth for deployment. Common mistakes include forgetting env: flex for Flexible, or declaring more than one scaling type (for example, both manual_scaling and automatic_scaling) in the same file. The exam expects you to know the required fields for each environment.

2. Build and upload code

Run `gcloud app deploy`. The CLI validates your app.yaml, then packages your code and uploads it to Cloud Storage. For Standard, the upload triggers a build whose output runs in sandboxed instances. For Flexible, Cloud Build builds a Docker image from your Dockerfile (or from the runtime's default image) and pushes it to Artifact Registry (Container Registry in older setups); App Engine then creates or updates the Managed Instance Group. If you use a custom runtime in Flexible, you must provide a Dockerfile, and the application inside it must listen on port 8080.
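
The port requirement can be sketched in a few lines. This assumes the platform passes the port in a `PORT` environment variable and that 8080 is the fallback, as the step above describes; `resolve_port` is a hypothetical helper name.

```python
import os

DEFAULT_PORT = 8080  # the port a custom-runtime container is expected to serve on


def resolve_port(env=None) -> int:
    """Return the port to listen on, preferring PORT when it is set."""
    env = os.environ if env is None else env
    return int(env.get("PORT", DEFAULT_PORT))


if __name__ == "__main__":
    print("listening on port", resolve_port())
```

Reading the port from the environment rather than hard-coding it keeps the same container image usable for local testing.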

3. Provision instances

Standard: The App Engine service creates sandboxed instances in a pre-warmed pool. For a new version, it starts one instance immediately. Flexible: The service creates a MIG with the specified machine type. It launches VMs, pulls the Docker image, and starts the container. Health checks begin after the container is running. If the health check fails, the instance is replaced. The provisioning time for Flexible is significantly longer (minutes) compared to Standard (seconds).

4. Route traffic to instances

App Engine's frontend load balancer routes incoming HTTP(S) requests to instances. For Standard, the frontend uses a custom load balancing algorithm that considers pending latency and instance load. For Flexible, the frontend is an HTTP(S) load balancer that distributes traffic across healthy instances. In both cases, you can split traffic between versions for canary deployments. Deploy the new version with `gcloud app deploy --no-promote`, then use `gcloud app services set-traffic` with `--splits` (for example, `--splits=v1=0.9,v2=0.1`) and, optionally, `--split-by` to choose IP, cookie, or random splitting.
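
To build intuition for what an IP-based split does, the sketch below maps a client IP to a version in proportion to the configured weights. It is an illustration of the idea only, not App Engine's actual routing algorithm; `pick_version` and the version names are hypothetical.

```python
import hashlib


def pick_version(client_ip: str, splits: dict) -> str:
    """Map a client IP to a version in proportion to the given weights."""
    total = sum(splits.values())
    # Hash the IP to a stable number in [0, 1) so one client always
    # lands on the same version.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, weight in sorted(splits.items()):
        cumulative += weight / total
        if point < cumulative:
            return version
    return max(splits)  # guard against floating-point rounding at the top edge


if __name__ == "__main__":
    splits = {"v1": 0.9, "v2": 0.1}
    print(pick_version("203.0.113.7", splits))
```

Because the choice depends only on the hash of the IP, a returning client keeps seeing the same version, which is the main reason to prefer IP or cookie splitting over random splitting for a canary.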

5. Auto-scaling and instance management

Standard: The autoscaler monitors request queue depth and latency. If requests wait longer than `max_pending_latency`, it spins up new instances. Idle instances are shut down after roughly 15 minutes. The `min_idle_instances` parameter ensures some instances are always warm. Flexible: The autoscaler monitors CPU utilization (default target 0.6) and scales up or down accordingly. It uses a cool-down period of 120 seconds to avoid thrashing. Flexible never scales to zero: `min_num_instances` defaults to 2, can be lowered to 1, and cannot be set to 0.
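
The CPU-target behavior can be approximated with a simple proportional rule. This is a generic target-tracking sketch, not App Engine's actual autoscaler; the function name is hypothetical, and the defaults mirror the Flexible values quoted in this chapter (target 0.6, minimum 2, maximum 20).

```python
import math


def desired_instances(current: int, observed_cpu: float,
                      target_cpu: float = 0.6,
                      min_instances: int = 2,
                      max_instances: int = 20) -> int:
    """Estimate how many instances bring average CPU back to the target."""
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    # Scale the fleet in proportion to how far CPU is from the target,
    # rounding up so the fleet is never undersized.
    wanted = math.ceil(current * observed_cpu / target_cpu)
    # Clamp to the configured bounds; the floor is why Flexible never hits zero.
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_instances(current=4, observed_cpu=0.8))
```

With 4 instances at 80% CPU and a 0.6 target, the rule asks for 6 instances; at very low CPU the result stops at `min_instances` rather than dropping to zero.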

6. Update and rollback

When you deploy a new version, App Engine creates a new set of instances. By default, `gcloud app deploy` promotes the new version to receive all traffic; pass `--no-promote` to keep traffic on the current version until you migrate or split it yourself. For Flexible, the new version's VMs must start and pass health checks before they serve traffic, so promotion takes minutes rather than seconds. To roll back, route traffic back to a previous version that is still deployed with `gcloud app services set-traffic` (or `gcloud app versions migrate`), or redeploy the earlier code. The exam may ask about traffic migration strategies (e.g., 'migrate' vs 'split').

What This Looks Like on the Job

Scenario 1: Low-Traffic API Backend

A startup builds a REST API for a mobile app that receives sporadic traffic (e.g., 10 requests per hour at night, 1,000 requests per second at peak). They choose App Engine Standard because it scales to zero, minimizing cost during idle periods. They use the Python 3.9 runtime and F1 instances. The API stores data in Cloud Datastore and hands email notifications to Cloud Tasks, since work should not be left running in the instance after the response is sent. They initially configure automatic scaling with min_idle_instances=0 and max_idle_instances=1. During peak, the app scales to 50 instances. At $0.05 per instance hour (F1), peak cost is about $2.50/hour, and idle cost is $0. The main challenge is cold start latency: the first request after a long idle period may take 5-10 seconds. They mitigate this by setting min_idle_instances=1 in production, accepting a small constant cost.
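
The cost figures in this scenario are simple multiplication, sketched below. The $0.05 rate is the scenario's illustrative number rather than a published price, and the 730-hour month and both function names are assumptions made for the example.

```python
F1_RATE_USD = 0.05      # per instance hour; the scenario's illustrative figure
HOURS_PER_MONTH = 730   # assumed average month length in hours


def hourly_cost(instances: int, rate: float = F1_RATE_USD) -> float:
    """Cost of running the given number of instances for one hour."""
    return round(instances * rate, 2)


def monthly_floor(min_idle_instances: int, rate: float = F1_RATE_USD,
                  hours: int = HOURS_PER_MONTH) -> float:
    """Constant monthly cost of keeping warm instances around."""
    return round(min_idle_instances * rate * hours, 2)


if __name__ == "__main__":
    print("peak hour with 50 instances:", hourly_cost(50))
    print("monthly cost of one warm instance:", monthly_floor(1))
```

Under these assumptions, the "small constant cost" of min_idle_instances=1 works out to roughly $36.50 per month, which is the price paid for avoiding cold starts.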

Scenario 2: Machine Learning Inference Service

A company deploys a TensorFlow model for image classification. The serving code depends on native C++ libraries and OS packages that are not part of any Standard runtime, so the team chooses Flexible with a custom Docker image that bundles TensorFlow and the model. App Engine does not offer GPU attachment in either environment, so inference runs on CPU; the team requests 4 vCPUs and 15 GB of memory in the resources section. (A workload that truly needs GPUs belongs on Compute Engine, GKE, or Cloud Run.) The app runs a Flask server that loads the model at startup and processes requests. They configure manual scaling with 3 instances to handle the expected load. Cost is about $0.20 per instance hour in this example. They also need SSH access to debug model-loading issues, which Flexible provides through gcloud app instances ssh. Because Flexible instances run inside the project's VPC network, the app reaches on-premises data over the existing VPN or Interconnect without a Serverless VPC Access connector. The downside: they pay for all three instances even when traffic drops.

Scenario 3: High-Throughput Web Application

A media company runs a Django web app serving 50,000 requests per second. They need background workers for video transcoding and large local scratch space for the files being processed. Standard's /tmp is small and held in memory, and Standard does not allow background processes, so they choose Flexible. They request 16 vCPUs per instance with a 100 GB disk. Because the local disk is re-initialized whenever a VM restarts, sessions and finished videos go to external storage rather than the instance disk. They configure automatic scaling with min_num_instances=10 and max_num_instances=100. They also run Celery workers in the same container for background tasks. The app connects to Cloud SQL via private IP. They use Cloud CDN to cache static assets. The main operational challenge is managing Docker image updates: each new version is deployed with --no-promote and traffic is shifted gradually to avoid downtime. They also monitor instance health and set up Cloud Monitoring (formerly Stackdriver) alerts for high CPU. Cost is significant: ~$0.50 per instance hour, so at 100 instances, $50/hour. They considered Standard but could not meet the background processing requirement.

How ACE Actually Tests This

The ACE exam tests Objective 3.1 'Deploy and implement App Engine Standard and Flexible environments' through scenario-based questions. You must identify which environment is appropriate given specific constraints. The exam focuses on five key differentiators:

1. Scaling to zero: Standard can scale to zero; Flexible cannot. If the question mentions 'cost optimization for low traffic' or 'idle instances', the answer is Standard. Trap: 'Flexible can scale to zero' is a common wrong answer.

2. Background processes: Standard does not allow background processes, and under automatic scaling threads should not be relied on to outlive the request; Flexible allows both. If the app must run a long-lived worker or daemon inside the instance, choose Flexible. Trap: scheduled jobs (cron.yaml or Cloud Scheduler) work in both environments, so a cron requirement alone does not force Flexible. Standard's basic and manual scaling relax request deadlines, but they are not a substitute for a dedicated worker and are rarely the intended answer.

3. Runtime flexibility: Standard supports only specific runtimes with predefined library sets. Flexible supports any runtime via Docker. If the app requires a custom language (e.g., Rust) or native library, choose Flexible.

4. Networking: Standard cannot directly access VPC resources without Serverless VPC Access. Flexible has direct VPC access. If the question mentions 'private network' or 'Cloud SQL private IP', Flexible is likely correct unless Serverless VPC Access is explicitly mentioned.

5. Pricing: Standard is cheaper for sporadic traffic; Flexible is more expensive due to always-on VMs. Exam questions may ask about total cost of ownership.

Common wrong answers:
- 'Standard environment supports GPU': False. Neither App Engine environment offers GPUs; GPU workloads belong on Compute Engine, GKE, or Cloud Run.
- 'Flexible environment can scale to zero': False. Minimum 1 instance.
- 'Standard environment allows SSH access': False. Only Flexible allows SSH.
- 'Standard environment supports any runtime': False. Only specified runtimes.
- 'Flexible environment is always cheaper': False. For low traffic, Standard is cheaper.

Edge cases:

If the app uses WebSockets, only Flexible supports them (Standard does not).

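If the app needs a large writable local disk, Flexible is the better fit (Standard has only an in-memory /tmp). Neither environment keeps local files durably, so data that must survive restarts belongs in Cloud Storage or a database.

If the app uses third-party software that must be installed at the OS level, Flexible is required.

If the app uses Cloud SQL, both can connect. Both expose a Unix socket under /cloudsql/; Standard needs Serverless VPC Access to use private IP.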

Exam tip: When you see a question about App Engine, first identify if the app needs any of the Flexible-only features (custom runtime or OS-level packages, long-running background processes, SSH, WebSockets, VPC access without Serverless VPC Access). If yes, answer Flexible. Otherwise, Standard is likely the correct choice, especially if cost is a concern. If the requirement is GPUs or durable local disk, neither environment fits and the answer is usually another compute product.
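
The checklist in this tip can be written down as a small decision function. This is a study aid under simplifying assumptions, not an official selection rule; the `Requirements` fields and the `choose_environment` name are hypothetical and cover only a subset of the differentiators discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    custom_runtime: bool = False
    needs_ssh: bool = False
    needs_websockets: bool = False
    needs_background_processes: bool = False
    must_scale_to_zero: bool = False


def choose_environment(req: Requirements) -> str:
    """Return 'standard', 'flexible', or 'neither' for a set of requirements."""
    flexible_only = (
        req.custom_runtime
        or req.needs_ssh
        or req.needs_websockets
        or req.needs_background_processes
    )
    if flexible_only and req.must_scale_to_zero:
        # Flexible keeps at least one instance, so these needs conflict.
        return "neither"
    return "flexible" if flexible_only else "standard"


if __name__ == "__main__":
    print(choose_environment(Requirements(needs_websockets=True)))
    print(choose_environment(Requirements(must_scale_to_zero=True)))
```

The "neither" branch is a reminder that some exam scenarios point away from App Engine entirely, for example toward Cloud Run or Compute Engine.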

Key Takeaways

Standard environment scales to zero instances; Flexible always runs at least one.

Standard supports only specific runtimes; Flexible supports any runtime via Docker.

Standard does not allow background processes and restricts threads; Flexible allows both.

Standard has a read-only filesystem except /tmp; Flexible has a larger read-write local disk, but it is re-initialized on VM restart.

Standard costs less for low-traffic apps; Flexible costs more due to always-on VMs.

Standard requires Serverless VPC Access for private networking; Flexible has direct VPC access.

Neither environment offers GPUs; GPU workloads belong on Compute Engine, GKE, or Cloud Run.

SSH access is only available in Flexible.

Standard cold starts take seconds at most; Flexible instances take minutes to start.

Default min instances for Flexible is 2; you can set it to 1 but not 0.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

App Engine Standard

Scales to zero instances when idle

Supports only predefined runtimes (Python, Java, Node.js, PHP, Ruby, Go)

No background processes; threads are restricted

Read-only filesystem except /tmp (ephemeral)

No SSH access

App Engine Flexible

Always runs at least 1 instance

Supports any runtime via custom Docker containers

Allows background threads and processes

Read-write local disk (10 GB default, up to 10 TB), re-initialized on VM restart

SSH access for debugging

App Engine Standard

Pay only for instance hours when serving requests

No GPU support

No direct VPC access (requires Serverless VPC Access)

Warm requests < 100 ms; cold starts up to about 10 seconds

Instance classes: F1, F2, F4, F4_1G

App Engine Flexible

Pay for underlying VM even when idle

No GPU support either (use Compute Engine, GKE, or Cloud Run)

Direct VPC access (no extra configuration)

Cold start 1-5 minutes (initial), 30-60s (subsequent)

vCPU and memory requested via resources in app.yaml

Watch Out for These

Mistake

App Engine Flexible can scale to zero instances.

Correct

Flexible always runs at least one instance. The minimum number of instances is 1 (or 2 by default). Scaling to zero is only possible in Standard.

Mistake

App Engine Standard supports all programming languages.

Correct

Standard supports only specific runtimes: Python, Java, Node.js, PHP, Ruby, and Go, each in a fixed set of versions. .NET and custom runtimes are not available on Standard; they require Flexible.

Mistake

Standard and Flexible have the same pricing model.

Correct

Standard bills per instance hour only when serving requests (no cost when idle). Flexible bills for the underlying Compute Engine VM even when idle, plus additional costs for disk and network.

Mistake

Both environments support background threads.

Correct

Standard does not allow background processes, and threads are restricted: under automatic scaling they should not be relied on after the request completes. Flexible allows background threads and processes because your code runs in a Docker container on its own VM.

Mistake

You can SSH into App Engine Standard instances.

Correct

SSH access is only available for Flexible environment instances. Standard instances are sandboxed and cannot be accessed via SSH.


Frequently Asked Questions

Can I use App Engine Standard with a custom runtime like Rust?

No. App Engine Standard only supports the predefined runtimes: Python, Java, Node.js, PHP, Ruby, and Go. If you need a custom runtime like Rust, you must use App Engine Flexible, which allows you to provide your own Docker container with any runtime.

Does App Engine Flexible support scaling to zero?

No. App Engine Flexible always runs at least one instance. The minimum number of instances is 1 (default is 2). You cannot scale to zero. If you need zero scaling, use App Engine Standard.

How do I connect App Engine Standard to a Cloud SQL instance using private IP?

App Engine Standard cannot directly connect to a VPC network. To use private IP, you must configure Serverless VPC Access, which creates a connector that allows your Standard app to reach resources in your VPC. Alternatively, you can connect over public IP through the built-in Unix socket under /cloudsql/ or a Cloud SQL connector library.

Can I run background workers in App Engine Standard?

No. App Engine Standard does not allow background processes, and threads should not be relied on to keep working after the request completes. For background work, use Cloud Tasks, Cloud Scheduler, or deploy a separate service on App Engine Flexible or Compute Engine.

What is the default machine type for App Engine Flexible?

App Engine Flexible does not use a named default machine type. If app.yaml has no `resources` section, each instance gets 1 vCPU and 0.6 GB of memory. Override this by setting `cpu` and `memory_gb` (and `disk_size_gb`) under `resources`; App Engine then picks a VM shape that satisfies the request.

How do I update an App Engine app without downtime?

For both environments, deploy a new version and gradually migrate traffic. Use `gcloud app deploy --no-promote` to deploy without routing traffic, then use `gcloud app services set-traffic` with the `--splits` flag (and `--split-by` to pick IP, cookie, or random splitting) to gradually shift traffic. For Flexible, allow extra time for the new version's VMs to start and pass health checks before shifting traffic.

Can I use GPU with App Engine Standard?

No. GPU is not supported in App Engine Standard, and App Engine Flexible does not expose a GPU setting either. For GPU workloads, use Compute Engine, GKE, or Cloud Run.
