This chapter covers Google Cloud Load Balancing types, a critical topic for the ACE exam. Load balancing is the backbone of scalable, highly available applications on Google Cloud. Approximately 10-15% of exam questions touch on load balancing concepts, configurations, and use cases. You will learn the differences between global and regional, external and internal, and Layer 4 and Layer 7 load balancers, along with their specific features, quotas, and best practices. Mastery of this topic is essential for designing resilient architectures and passing the exam.
Imagine a major highway connecting two cities. At each end, there are toll booths that manage traffic entering and exiting. The highway itself has multiple lanes, but the toll booths are the only way to get on or off. Now, think of each car as a network packet. The highway is the Google Cloud network backbone. The toll booth at the entry is the load balancer's frontend — it receives all incoming traffic. The toll booth operator (load balancer) must decide which lane (backend instance) each car should take. There are two types of toll booths: one that only looks at the car's destination city (Layer 4, TCP/UDP) and sends it to any lane that can handle that city, and another that also inspects the car's contents (Layer 7, HTTP/HTTPS) to route based on cargo type, package priority, or even the specific delivery address. The first type (Layer 4) is very fast — it just checks the destination IP and port, then forwards the car to a lane based on a simple round-robin or least-connections algorithm. The second type (Layer 7) is slower but smarter: it can read HTTP headers, cookies, and paths to send cars to specialized lanes (e.g., 'images' lane, 'video' lane). Both types also perform health checks: if a lane is closed due to construction (instance down), the toll booth stops sending cars there. Additionally, the load balancer can be global or regional: a global toll booth system has booths at multiple entry points across the country, routing cars to the nearest open lane, while a regional system only manages traffic within one state.
What is Google Cloud Load Balancing?
Google Cloud Load Balancing is a fully distributed, software-defined service that distributes incoming traffic across multiple backend instances (Compute Engine VMs, GKE pods, serverless backends) in one or more regions. Unlike traditional hardware load balancers, there is no virtual appliance to manage — the load balancer is a global or regional service that scales automatically. It uses Google's global network infrastructure to route traffic from users to the closest healthy backend with capacity.
Why It Exists
Applications need to handle varying traffic loads, maintain availability during failures, and provide low latency to users worldwide. Cloud Load Balancing provides:
- High availability: Automatic failover across instances, zones, and regions.
- Scalability: Seamless scaling of backend instances without reconfiguration.
- Global reach: Single anycast IP address for global applications.
- Traffic management: Content-based routing, session affinity, and traffic splitting.
Types of Load Balancers
Google Cloud offers several load balancer types, categorized along three axes:
1. Global vs. Regional: Global load balancers use a single anycast IP address and route traffic to the closest healthy backend. Regional load balancers operate within a single region.
2. External vs. Internal: External load balancers distribute traffic from the internet to your VPC. Internal load balancers distribute traffic within your VPC (private IPs).
3. Layer 4 (TCP/UDP) vs. Layer 7 (HTTP/HTTPS): Layer 4 load balancers forward packets based on IP and port. Layer 7 load balancers inspect application-layer data (HTTP headers, cookies, paths).
#### Global External Load Balancers
- Global External HTTP(S) Load Balancer: Layer 7, supports HTTP, HTTPS, and HTTP/2. Uses URL maps for content-based routing. Supports Cloud CDN, IAP, and SSL offload. Backends can be instance groups, GKE, serverless (Cloud Run, App Engine), or Cloud Storage buckets. Ideal for web applications with global users.
- Global External TCP/UDP Load Balancer: Layer 4 and, despite the name, TCP only. It covers the SSL Proxy and TCP Proxy load balancers. SSL Proxy terminates SSL connections and forwards the traffic to backends over TCP. TCP Proxy forwards TCP traffic without SSL termination. Neither supports UDP. Both are global and are used for non-HTTP protocols that need a global anycast IP.
#### Regional External Load Balancers
- Regional External HTTP(S) Load Balancer: Layer 7, regional scope. Supports HTTP/HTTPS. Uses URL maps. Backends must be in the same region. Used when global load balancing is not needed or when using internal backends with external clients.
- Regional External TCP/UDP Load Balancer: Layer 4, also called the Network Load Balancer (NLB). Supports TCP and UDP. It is regional and uses forwarding rules to direct traffic. It does not support health checks for UDP (only TCP). Used for low-latency, high-throughput applications like gaming or real-time streaming.
#### Internal Load Balancers
- Internal HTTP(S) Load Balancer: Layer 7, regional. Distributes internal traffic (within VPC) for HTTP/HTTPS. Uses URL maps. Backends can be instance groups or serverless. Ideal for microservices communicating internally.
- Internal TCP/UDP Load Balancer: Layer 4, regional. Distributes internal TCP/UDP traffic. Also known as the Internal Load Balancer (ILB). Supports health checks. Used for internal services like databases or middleware.
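The three axes can be read as a lookup table. The sketch below is only a study aid: `pick_lb` is a hypothetical helper, not a gcloud command, and it maps each combination of scope, exposure, and layer to the load balancer name used in this chapter.

```shell
# pick_lb SCOPE EXPOSURE LAYER
#   SCOPE:    global | regional
#   EXPOSURE: external | internal
#   LAYER:    l4 | l7
# Prints the load balancer name used in this chapter, or returns 1 when
# the chapter describes no load balancer for that combination.
pick_lb() {
  case "$1/$2/$3" in
    global/external/l7)   echo "Global External HTTP(S) Load Balancer" ;;
    global/external/l4)   echo "Global External TCP/SSL Proxy Load Balancer (TCP only)" ;;
    regional/external/l7) echo "Regional External HTTP(S) Load Balancer" ;;
    regional/external/l4) echo "Regional External TCP/UDP (Network) Load Balancer" ;;
    regional/internal/l7) echo "Internal HTTP(S) Load Balancer" ;;
    regional/internal/l4) echo "Internal TCP/UDP Load Balancer" ;;
    *) echo "no load balancer in this chapter matches: $1 $2 $3" >&2
       return 1 ;;
  esac
}

pick_lb global external l7    # web app with users worldwide
pick_lb regional internal l4  # private database tier
```

A combination such as `global internal l4` falls through to the error branch, which mirrors the chapter's statement that internal load balancers are regional.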
How Load Balancers Work
All load balancers share a common architecture:
1. Forwarding Rule: Maps an external (or internal) IP address, protocol, and port to a target proxy or backend service.
2. Target Proxy (proxy-based load balancers only: HTTP(S), SSL Proxy, and TCP Proxy): Terminates client connections and sends requests to the backend service. HTTPS and SSL proxies hold the SSL certificates.
3. URL Map (for HTTP(S) only): Defines routing rules based on host and path.
4. Backend Service: Configures backends (instance groups, NEGs), health checks, session affinity, and balancing mode and capacity settings.
5. Health Check: Probes backends at regular intervals (default every 5 seconds) to determine health. Unhealthy instances are removed from rotation.
6. Backend: An instance group (zonal, regional, unmanaged) or network endpoint group (NEG) that serves traffic.
Key Components, Values, and Defaults
Health Check Parameters:
Check interval: 5 seconds (default), minimum 1 second.
Timeout: 5 seconds (default), minimum 1 second.
Healthy threshold: 2 (default), minimum 1.
Unhealthy threshold: 2 (default), minimum 1.
Session Affinity: Options: NONE, CLIENT_IP, GENERATED_COOKIE, HTTP_COOKIE, CLIENT_IP_PORT_PROTO (for TCP/UDP). Default is NONE.
Connection Draining: Default timeout is 300 seconds (5 minutes). Can be set from 0 to 3600 seconds.
Backend Capacity: Each backend can have a max RPS or max connections setting. Default is unlimited.
Quotas:
Max forwarding rules per project: 15 (default, can be increased).
Max backend services per project: 15 (default).
Max health checks per project: 25 (default).
Supported Protocols: HTTP/1.1, HTTP/2, HTTPS, TCP (with SSL), TCP (without SSL), UDP.
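The health check defaults above determine roughly how long a failed backend keeps receiving traffic. As a back-of-the-envelope sketch (the `detection_seconds` helper is hypothetical, and the result ignores probe timeouts and propagation delay), the detection time is the check interval multiplied by the unhealthy threshold:

```shell
# detection_seconds INTERVAL_SECONDS UNHEALTHY_THRESHOLD
# Approximate time until a failed backend is marked unhealthy: the probe
# has to fail UNHEALTHY_THRESHOLD times in a row, one probe per interval.
detection_seconds() {
  echo $(( $1 * $2 ))
}

detection_seconds 5 2   # default interval and threshold: prints 10
detection_seconds 1 2   # minimum interval, default threshold: prints 2
```

Lowering the interval shortens detection at the cost of more probe traffic to every backend.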
Configuration and Verification Commands
Using gcloud compute commands:
#### Create a Health Check
gcloud compute health-checks create tcp my-tcp-health-check --port=80 --check-interval=5 --timeout=5 --unhealthy-threshold=2 --healthy-threshold=2
#### Create a Backend Service
gcloud compute backend-services create my-backend-service --protocol=HTTP --health-checks=my-tcp-health-check --global
#### Add Backend Instance Group
gcloud compute backend-services add-backend my-backend-service --instance-group=my-instance-group --instance-group-zone=us-central1-a --balancing-mode=UTILIZATION --max-utilization=0.8 --global
#### Create a URL Map
gcloud compute url-maps create my-url-map --default-service=my-backend-service
#### Create a Target Proxy
gcloud compute target-http-proxies create my-target-proxy --url-map=my-url-map
#### Create a Forwarding Rule
gcloud compute forwarding-rules create my-forwarding-rule --global --target-http-proxy=my-target-proxy --ports=80
#### Verify Load Balancer Status
gcloud compute backend-services get-health my-backend-service --global
Interaction with Related Technologies
Cloud CDN: Can be enabled on global external HTTP(S) load balancers to cache content at edge locations.
Identity-Aware Proxy (IAP): Can be enabled on load balancers to enforce access control based on user identity.
Cloud Armor: Provides DDoS protection and WAF capabilities at the load balancer level.
Traffic Director: For advanced traffic management in service mesh architectures, but not directly a load balancer.
VPC Network: Internal load balancers use RFC 1918 addresses within the VPC. External load balancers use external IPs.
Exam-Relevant Details
Global vs. Regional: Only the HTTP(S), SSL Proxy, and TCP Proxy load balancers are available as global load balancers. The Network Load Balancer (TCP/UDP), the regional HTTP(S) load balancer, and the internal load balancers are regional.
Anycast IP: Global load balancers use a single anycast IP (e.g., 34.96.128.0/17 range). Regional load balancers use a specific regional IP.
Backend Types: Instance groups (zonal, regional, unmanaged), NEGs (GCE_VM_IP, GCE_VM_IP_PORT, INTERNET_IP_PORT, SERVERLESS, PRIVATE_SERVICE_CONNECT).
Session Affinity: For global HTTP(S), only CLIENT_IP or GENERATED_COOKIE are supported for cross-region affinity. CLIENT_IP_PORT_PROTO is not supported.
Health Check for UDP: The Network Load Balancer does not support health checks for UDP backends. You must use TCP health checks on a related port.
SSL Offload: The Global External HTTP(S) Load Balancer can terminate SSL, reducing backend CPU load.
Internal Load Balancer: Supports proxy protocol (for preserving client IP) and connection tracking (default 5 minutes idle timeout).
Define Backend and Health Check
First, you create a health check that defines how the load balancer probes backend instances. For example, a TCP health check on port 80 with default interval of 5 seconds and timeout of 5 seconds. Then you create a backend service that references this health check. The backend service is the logical group of backends that serve traffic. You specify the protocol (HTTP, HTTPS, TCP, UDP) and balancing mode (UTILIZATION, RATE, CONNECTION). The default balancing mode is UTILIZATION with max utilization of 0.8 (80%). This step is critical because without a proper health check, the load balancer cannot detect failures and traffic will be sent to unhealthy instances.
Add Backends to Backend Service
You add backend instances or instance groups to the backend service. For each backend, you specify the instance group (zonal or regional), the balancing mode (e.g., UTILIZATION, RATE, or CONNECTION), and the capacity settings (e.g., max RPS or max connections). For UTILIZATION mode, you set a max utilization (default 0.8). The load balancer distributes traffic based on these settings. If a backend's utilization exceeds the max, the load balancer sends less traffic to it. You can also set a failover ratio for regional instance groups. This step determines the actual compute resources that will handle traffic.
Create URL Map (Layer 7 Only)
For HTTP(S) load balancers, you create a URL map that defines routing rules based on host and path. The URL map has a default service that handles unmatched requests. You can add path matchers that route requests to different backend services based on URL paths (e.g., /images -> image-backend, /video -> video-backend). You can also define host rules to route based on the Host header (e.g., www.example.com vs api.example.com). The URL map is a critical component for content-based routing and multi-tenant applications. It supports wildcard hosts and paths.
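The path-based routing described above might be configured along these lines. This is a sketch under assumptions: `image-backend` and `video-backend` are hypothetical backend services that must already exist, `media-matcher` is an arbitrary name, and the `run` helper is an addition that only prints each command unless `DRY_RUN=0` is set.

```shell
# Dry-run helper (not part of gcloud): prints the command by default,
# executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Route www.example.com/images/* and /video/* to dedicated backend
# services; anything else on that host goes to my-backend-service.
run gcloud compute url-maps add-path-matcher my-url-map \
  --path-matcher-name=media-matcher \
  --default-service=my-backend-service \
  --new-hosts=www.example.com \
  --path-rules="/images/*=image-backend,/video/*=video-backend"
```

Requests for hosts that match no host rule still fall through to the URL map's own default service.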
Create Target Proxy (Layer 7 Only)
For HTTP(S) load balancers, you create a target proxy that terminates client connections. There are two types: target HTTP proxy (for HTTP) and target HTTPS proxy (for HTTPS). The target proxy references the URL map and, for HTTPS, at least one SSL certificate. An HTTPS proxy terminates the SSL connection and forwards the decrypted request to the backend service, which offloads SSL processing from backends. The proxy also supports HTTP/2 and QUIC. For global load balancers, the proxy is global; for regional, it is regional. This step is where SSL certificates are attached.
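An HTTPS variant of the target proxy step could look like the following sketch. The names `my-cert` and `my-https-proxy` and the domain `www.example.com` are placeholders, and the `run` helper is an addition that prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run helper (not part of gcloud): prints the command by default,
# executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Google-managed certificate for a placeholder domain.
run gcloud compute ssl-certificates create my-cert \
  --domains=www.example.com \
  --global

# HTTPS proxy that terminates SSL and hands requests to the URL map.
run gcloud compute target-https-proxies create my-https-proxy \
  --url-map=my-url-map \
  --ssl-certificates=my-cert
```

A managed certificate typically becomes usable only after DNS for the domain points at the load balancer's IP address, so expect a delay between creating it and serving HTTPS.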
Create Forwarding Rule
The forwarding rule maps an external (or internal) IP address, protocol, and port to the target proxy (for Layer 7) or backend service (for Layer 4). For global load balancers, the IP is anycast and global. For regional, it is a regional IP. The forwarding rule also specifies the IP version (IPv4 or IPv6). You can create multiple forwarding rules for different ports (e.g., 80 and 443). The load balancer listens on the specified IP:port and forwards traffic according to the target. This is the final step that makes the load balancer accessible. Once created, traffic flows from clients to the forwarding rule, through the proxy (if Layer 7), to the backend service, and finally to a healthy backend instance.
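To listen on port 443 as well as port 80, a second forwarding rule can point at an HTTPS proxy. The sketch assumes a target HTTPS proxy named `my-https-proxy` already exists; that name and `my-https-forwarding-rule` are placeholders, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run helper (not part of gcloud): prints the command by default,
# executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Global forwarding rule for HTTPS traffic.
run gcloud compute forwarding-rules create my-https-forwarding-rule \
  --global \
  --target-https-proxy=my-https-proxy \
  --ports=443

# Look up the IP address that clients will connect to.
run gcloud compute forwarding-rules describe my-https-forwarding-rule \
  --global \
  --format="value(IPAddress)"
```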
Scenario 1: Global E-commerce Platform
A multinational retailer deploys a web application on Compute Engine instances across multiple regions (us-central1, europe-west1, asia-east1). They use a Global External HTTP(S) Load Balancer with a single anycast IP. The URL map routes /api to a backend service whose instances run in us-central1, /static to a Cloud Storage bucket via a backend bucket, and all other requests to a default backend service that serves from the instance group closest to the user. Health checks are configured with a 5-second interval and an unhealthy threshold of 2. They enable Cloud CDN for static content and Cloud Armor for WAF rules. During a flash sale, traffic spikes 10x, but the load balancer automatically distributes traffic across all healthy instances. One zone in us-central1 fails; the load balancer detects it within about 10 seconds (2 failed checks * 5 seconds) and stops sending traffic there. The retailer achieves 99.99% availability. Common misconfiguration: forgetting to set session affinity for the cart service, which causes users to lose session data when their requests move to a different backend. They use GENERATED_COOKIE affinity to stick users to the same backend for the session duration.
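Two of the settings in this scenario could be sketched as follows. Cloud Storage content is attached to an HTTP(S) load balancer through a backend bucket, and cookie affinity is set on the backend service. The names `cart-backend`, `static-assets-backend`, and `my-static-assets` are hypothetical, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run helper (not part of gcloud): prints the command by default,
# executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Serve static content from a bucket with Cloud CDN caching enabled.
run gcloud compute backend-buckets create static-assets-backend \
  --gcs-bucket-name=my-static-assets \
  --enable-cdn

# Keep each shopper on the same cart backend via a generated cookie.
run gcloud compute backend-services update cart-backend \
  --global \
  --session-affinity=GENERATED_COOKIE
```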
Scenario 2: Internal Microservices
A fintech company runs microservices on GKE in a single region. They use an Internal HTTP(S) Load Balancer to route traffic between services. The load balancer has a private IP (10.0.0.1) within the VPC. URL maps route /payments to a payment service backend and /auth to an authentication service backend. They use serverless NEGs for Cloud Run services. The load balancer is regional, so all traffic stays within the region, reducing latency and egress costs. They configure health checks with a 1-second interval for fast failover. They also enable logging to monitor traffic patterns. One challenge: the internal load balancer's proxy preserves client IPs via proxy protocol, but the backend services must be configured to parse the proxy protocol header. Misconfiguration leads to incorrect source IP logging. They also set connection draining timeout to 30 seconds to gracefully handle instance shutdowns during rolling updates.
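The fast-failover and draining settings from this scenario might be applied like this. It is a sketch under assumptions: the region `us-central1`, the health check port and path, and the names `payments-hc` and `payments-backend` are all hypothetical, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run helper (not part of gcloud): prints the command by default,
# executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Regional HTTP health check probing every second.
run gcloud compute health-checks create http payments-hc \
  --region=us-central1 \
  --port=8080 \
  --request-path=/healthz \
  --check-interval=1s \
  --timeout=1s

# Give in-flight requests 30 seconds to finish during rolling updates.
run gcloud compute backend-services update payments-backend \
  --region=us-central1 \
  --connection-draining-timeout=30
```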
Scenario 3: Real-time Gaming with UDP
A gaming company hosts a multiplayer game that uses UDP for low-latency communication. They deploy game servers in a single region behind a regional Network Load Balancer (External TCP/UDP). The load balancer has a regional external IP and forwards UDP traffic on port 7777 to a backend instance group. They cannot use health checks for UDP, so they configure a TCP health check on a separate management port (e.g., 8080) that reflects the server's health. They use session affinity based on CLIENT_IP_PORT_PROTO to ensure packets from the same player go to the same server. During a DDoS attack, they use Cloud Armor to filter malicious traffic. The load balancer handles millions of packets per second. Common pitfall: forgetting that the Network Load Balancer does not support UDP health checks, so the TCP health check can disagree with the game port. If the management port goes down while the game port is still up, a working server is pulled out of rotation; if the game process dies while the management port still answers, traffic is blackholed. They tie the management endpoint's status to the game process and monitor both ports.
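A minimal sketch of this setup, assuming a backend-service-based Network Load Balancer in `us-central1`: the region and the names `game-hc`, `game-backend`, and `game-rule` are hypothetical, adding the instance group to the backend service is omitted, and the `run` helper only prints the commands unless `DRY_RUN=0` is set.

```shell
# Dry-run helper (not part of gcloud): prints the command by default,
# executes it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# TCP health check on the management port, standing in for UDP health.
run gcloud compute health-checks create tcp game-hc \
  --region=us-central1 \
  --port=8080

# UDP backend service that pins each player flow to one server.
run gcloud compute backend-services create game-backend \
  --load-balancing-scheme=EXTERNAL \
  --protocol=UDP \
  --region=us-central1 \
  --health-checks=game-hc \
  --health-checks-region=us-central1 \
  --session-affinity=CLIENT_IP_PORT_PROTO

# Regional forwarding rule for the game port.
run gcloud compute forwarding-rules create game-rule \
  --load-balancing-scheme=EXTERNAL \
  --region=us-central1 \
  --ip-protocol=UDP \
  --ports=7777 \
  --backend-service=game-backend \
  --backend-service-region=us-central1
```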
What the ACE Tests
The ACE exam objectives (Domain 2: Planning Solutions, Objective 2.1) require you to 'determine which load balancing type to use based on requirements.' You must understand the differences between global vs. regional, external vs. internal, and Layer 4 vs. Layer 7. Specific objectives include:
- 2.1.1: Differentiate between global and regional load balancers.
- 2.1.2: Determine appropriate load balancer for a given use case (e.g., HTTP vs TCP, external vs internal).
- 2.1.3: Identify key features like health checks, session affinity, and SSL offload.
Common Wrong Answers
Choosing a global TCP load balancer for UDP traffic: Many candidates think the Global External TCP/UDP Load Balancer supports UDP. It does NOT. The global TCP proxy only supports TCP. For UDP, you need a regional Network Load Balancer (External TCP/UDP).
Selecting an internal load balancer for internet-facing traffic: Internal load balancers have private IPs and cannot receive traffic from the internet. Candidates often confuse internal with external.
Assuming all load balancers support health checks for UDP: The Network Load Balancer does not support UDP health checks. You must use TCP health checks on a different port.
Thinking session affinity is required for all applications: Session affinity is optional. The default is NONE. Candidates may think it's always needed, but stateless apps don't need it.
Specific Numbers and Terms
Default health check interval: 5 seconds.
Default unhealthy threshold: 2.
Connection draining timeout: Default 300 seconds, range 0-3600.
Global anycast IP range: 34.96.128.0/17 (for external HTTP(S) load balancers).
Max forwarding rules per project: 15 (default).
Balancing modes: UTILIZATION (default 0.8), RATE (max RPS), CONNECTION (max connections).
Edge Cases and Exceptions
Cross-region session affinity: Only CLIENT_IP and GENERATED_COOKIE are supported for global HTTP(S) load balancers. CLIENT_IP_PORT_PROTO is not supported globally.
Internal HTTP(S) load balancer: Supports only HTTP/HTTPS, not TCP/UDP. For internal TCP/UDP, use Internal TCP/UDP Load Balancer.
Serverless NEGs: Only supported with HTTP(S) load balancers (global or regional). Not supported with TCP/UDP load balancers.
IPv6: Only supported on global external HTTP(S) load balancers and regional external TCP/UDP load balancers. Internal load balancers do not support IPv6.
How to Eliminate Wrong Answers
If the requirement mentions 'global anycast IP' or 'users worldwide', eliminate regional options.
If the requirement mentions 'TCP or UDP', eliminate HTTP(S) load balancers.
If the requirement mentions 'internal traffic' or 'private IP', eliminate external load balancers.
If the requirement mentions 'content-based routing' or 'URL paths', eliminate Layer 4 load balancers.
Pay attention to protocol: UDP traffic can only use the regional Network Load Balancer (External TCP/UDP) for external traffic or the Internal TCP/UDP Load Balancer for internal traffic.
Global External HTTP(S) Load Balancer supports HTTP/HTTPS/HTTP2, Cloud CDN, IAP, and Cloud Armor.
Global External TCP/UDP Load Balancer (TCP Proxy) supports only TCP, not UDP.
Regional Network Load Balancer (External TCP/UDP) supports both TCP and UDP, but no UDP health checks.
Internal load balancers use private IPs and are only for VPC-internal traffic.
Default health check interval is 5 seconds; unhealthy threshold is 2.
Connection draining default timeout is 300 seconds.
Session affinity for global HTTP(S) supports only CLIENT_IP and GENERATED_COOKIE.
Serverless NEGs are only supported with HTTP(S) load balancers.
IPv6 is supported only on global external HTTP(S) and regional external TCP/UDP load balancers.
Max forwarding rules per project is 15 (default).
These come up on the exam all the time. Here's how to tell them apart.
Global External HTTP(S) Load Balancer
Single anycast IP for global traffic distribution.
Supports Cloud CDN, IAP, and Cloud Armor.
Backends can be in any region and are attached as zonal or regional instance groups or as NEGs.
Higher latency for regional-only apps due to cross-region routing.
Quota: 15 global forwarding rules per project.
Regional External HTTP(S) Load Balancer
Regional IP, only distributes traffic within one region.
Does not support Cloud CDN. Supports IAP and Cloud Armor.
Backends must be in the same region as the load balancer.
Lower latency for regional applications.
Quota: 15 regional forwarding rules per project (separate from global).
Network Load Balancer (External TCP/UDP)
External IP, receives traffic from internet.
Supports TCP and UDP.
Regional scope only.
No health checks for UDP; use TCP health checks.
Supports session affinity (CLIENT_IP, CLIENT_IP_PORT_PROTO).
Internal TCP/UDP Load Balancer
Internal IP, only accessible within VPC.
Supports TCP and UDP.
Regional scope only.
Supports health checks for TCP only (not UDP).
Supports session affinity (CLIENT_IP, CLIENT_IP_PORT_PROTO).
Mistake
The Global External TCP/UDP Load Balancer supports both TCP and UDP.
Correct
The Global External TCP/UDP Load Balancer (also called TCP Proxy) only supports TCP. For UDP, you must use a regional Network Load Balancer (External TCP/UDP). The 'TCP/UDP' in the name is misleading: UDP is available only on the regional Network Load Balancer and the Internal TCP/UDP Load Balancer, never on the global proxy load balancers.
Mistake
All load balancers support health checks for any protocol.
Correct
The Network Load Balancer (regional external TCP/UDP) does not support health checks for UDP backends. You must configure a separate TCP health check on a different port. Similarly, internal TCP/UDP load balancers support health checks only for TCP, not UDP.
Mistake
Session affinity is required for all web applications.
Correct
Session affinity is optional and only needed for stateful applications that store session data locally. Stateless applications can distribute requests to any backend. Enabling session affinity can reduce scalability and should be used only when necessary.
Mistake
Internal load balancers can be used to balance traffic from the internet.
Correct
Internal load balancers have private IP addresses (RFC 1918) and can only distribute traffic within the same VPC network or peered networks. They cannot receive traffic directly from the internet. For internet-facing traffic, use external load balancers.
Mistake
Every HTTP(S) load balancer, global or regional, supports Cloud CDN.
Correct
Cloud CDN can be enabled only on the Global External HTTP(S) Load Balancer. Regional HTTP(S) load balancers do not support Cloud CDN. This is a key differentiator.
Global load balancers use a single anycast IP address and distribute traffic to backends in any region, routing users to the closest healthy backend. Regional load balancers operate within a single region and use a regional IP. Global load balancers are ideal for applications with users worldwide, while regional load balancers are suitable for latency-sensitive or region-locked applications. On the ACE exam, remember that only HTTP(S) and TCP/SSL Proxy load balancers can be global; Network Load Balancer (TCP/UDP) is always regional.
No. The Network Load Balancer (regional external TCP/UDP) does not support health checks for UDP backends. You must configure a separate TCP health check on a different port (e.g., a management port) to determine backend health. This is a common exam trap. The health check will probe the TCP port, and if it fails, the backend is marked unhealthy regardless of UDP port status.
For global external HTTP(S) load balancers, the supported session affinity modes are: NONE (default), CLIENT_IP, and GENERATED_COOKIE. CLIENT_IP_PORT_PROTO is not supported globally. For regional HTTP(S) load balancers, you can also use HTTP_COOKIE. On the exam, remember that cross-region affinity requires using CLIENT_IP or GENERATED_COOKIE to keep sessions sticky across regions.
Use a global load balancer if your application serves users worldwide and you want a single anycast IP with automatic routing to the nearest backend. Use a regional load balancer if your application is confined to a single region (e.g., due to data residency) or requires low latency within that region. Also, regional load balancers are the only option for UDP traffic. On the exam, look for keywords like 'global users' vs 'single region' to decide.
No. Internal load balancers have private IP addresses (RFC 1918) and can only be reached from within the same VPC network, peered VPCs, or via Cloud VPN/Interconnect. They cannot receive traffic directly from the internet. For internet-facing traffic, use an external load balancer. This is a common misconception tested on the ACE exam.
Connection draining allows existing connections to complete before a backend is removed from service (e.g., during rolling updates). When a backend is deregistered, the load balancer stops sending new connections to it but continues to forward existing connections for a configurable timeout (default 300 seconds). After the timeout, any remaining connections are closed. This ensures graceful shutdown. On the exam, remember the default timeout is 300 seconds and can be set from 0 to 3600 seconds.
Yes. The Global External HTTP(S) Load Balancer supports WebSockets (upgrade from HTTP to the WebSocket protocol). It also supports HTTP/2 and gRPC. The load balancer forwards the upgrade request and then keeps the persistent connection on the backend that accepted it, so session affinity is not needed to hold an established WebSocket in place. Configure affinity only if reconnecting clients must land on the same backend again.