You are a cloud architect for an e-commerce company. Their application runs on a regional Google Kubernetes Engine (GKE) cluster. The application consists of a frontend service, a backend service, and a Redis cache. Traffic is routed to the frontend through an external HTTP(S) Load Balancer. Recently, customers have reported intermittent 502 Bad Gateway errors during peak hours. The frontend logs show 'upstream connect error or disconnect/reset before headers. retried and limit reset' errors. The backend service is deployed with 3 replicas, each with resource requests of 1 CPU and 2 GB memory. The cluster autoscaler is enabled with a minimum of 3 nodes and a maximum of 10 nodes, using e2-standard-4 instances. The backend service's HPA is configured with a CPU utilization target of 80%. During peak hours, CPU utilization on the backend pods reaches 90%, but the HPA does not scale up. The cluster has sufficient node capacity. What should you do to resolve the issue?
Lowering the HPA CPU target makes scaling trigger earlier, and a higher minimum replica count provides baseline capacity.
Why this answer
The HPA is configured with a CPU utilization target of 80%, yet during peak hours the backend pods reach 90% CPU without any scale-up. The HPA does not act on individual pod readings: it averages CPU usage across all backend pods, expresses that average as a percentage of the pods' CPU requests, and by default skips scaling when the ratio of that average to the target is within 10% of 1.0 (72% to 88% for an 80% target). Pods peaking at 90% can therefore leave the averaged metric inside that band, so an 80% target is too high to react to this load. Lowering the HPA CPU target to 60% makes the HPA trigger scaling earlier, and increasing the minimum replicas to 5 provides baseline capacity to absorb traffic spikes. Together, these changes keep the backend pods from being overwhelmed, which is what produces the upstream connect errors.
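A simplified sketch of the HPA's scaling arithmetic helps show why a 90% peak need not trigger scale-up. It follows the documented Kubernetes formula desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization) with the default 0.1 tolerance, but it ignores pod readiness, missing metrics, and stabilization windows. The per-pod readings and the maximum of 10 replicas are hypothetical values chosen for illustration, and `desired_replicas` is an illustrative helper, not a Kubernetes API.

```python
import math

# Default HPA tolerance (kube-controller-manager flag
# --horizontal-pod-autoscaler-tolerance, default 0.1).
DEFAULT_TOLERANCE = 0.1


def desired_replicas(current_replicas, pod_cpu_utilizations, target_utilization,
                     min_replicas, max_replicas, tolerance=DEFAULT_TOLERANCE):
    """Simplified HPA decision for a single CPU-utilization metric.

    pod_cpu_utilizations: per-pod CPU usage as a percentage of the pod's
    CPU request. Readiness, missing metrics, and stabilization are ignored.
    """
    # The HPA works on the average across pods, not on the busiest pod.
    average = sum(pod_cpu_utilizations) / len(pod_cpu_utilizations)
    ratio = average / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Within the tolerance band: keep the current replica count.
        proposal = current_replicas
    else:
        proposal = math.ceil(current_replicas * ratio)
    # Clamp the proposal to the configured replica bounds.
    return max(min_replicas, min(max_replicas, proposal))


# Original settings: pods peak at 90%, but the average (85%) is within tolerance.
print(desired_replicas(3, [90, 85, 80], 80, 3, 10))
# Proposed settings: 60% target and a floor of 5 replicas.
print(desired_replicas(3, [90, 85, 80], 60, 5, 10))
```

With the original settings the averaged 85% sits inside the tolerance band and the replica count stays at 3, whereas a uniform 90% across all pods would have yielded 4 replicas. With a 60% target and a minimum of 5, the same readings yield 5 replicas.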
Exam trap
Google Cloud exams often test the misconception that adding nodes or changing cluster autoscaler settings fixes a pod-level scaling problem. Here the real problem is the HPA configuration: a target utilization too high to trigger scale-up and too few minimum replicas.
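The cluster autoscaler adds nodes only when pods cannot be scheduled, so node count follows pod count rather than driving it. The toy model below packs pods onto nodes by CPU request alone. The figure of 3.9 vCPU allocatable per e2-standard-4 node (4 vCPUs before GKE system reservations) is an assumption, memory, other workloads, and system pods are ignored, and both helper functions are hypothetical.

```python
import math


def nodes_needed(replicas, cpu_request, allocatable_cpu_per_node):
    """Minimum nodes needed to schedule `replicas` pods by CPU request alone."""
    pods_per_node = int(allocatable_cpu_per_node // cpu_request)
    if pods_per_node == 0:
        raise ValueError("a single pod's CPU request exceeds node allocatable CPU")
    return math.ceil(replicas / pods_per_node)


def autoscaler_adds_nodes(replicas, current_nodes, cpu_request=1.0,
                          allocatable_cpu_per_node=3.9):
    # Scale-up is driven by unschedulable (pending) pods: if every pod
    # already fits on the existing nodes, no node is added.
    return nodes_needed(replicas, cpu_request, allocatable_cpu_per_node) > current_nodes


print(autoscaler_adds_nodes(3, 3))   # HPA stuck at 3 replicas
print(autoscaler_adds_nodes(12, 3))  # HPA requests 12 replicas
```

With the HPA stuck at 3 replicas, no pod is pending, so the autoscaler adds nothing; raising node limits changes nothing until the HPA itself requests more pods.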
How to eliminate wrong answers
Option A is wrong because switching the HPA to memory utilization does not address the root cause. CPU is the bottleneck (90% utilization), and if memory is not the constrained resource, a memory-based HPA would still fail to scale.
Option C is wrong because the error 'upstream connect error or disconnect/reset before headers' indicates connection timeouts or resource exhaustion at the pod level, not a per-pod connection limit. Increasing max connections in the BackendConfig would not resolve the underlying CPU starvation.
Option D is wrong because the cluster already has sufficient node capacity (the cluster autoscaler can add nodes up to the maximum of 10). The problem is that the HPA is not scaling pods, not that nodes are unavailable, and adding more nodes does not force the HPA to scale pods.
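When an HPA lists several metrics, Kubernetes computes a replica proposal for each metric and uses the largest. A memory-only HPA therefore never sees the CPU pressure at all. In this simplified sketch the 50% memory utilization and the maximum of 10 replicas are hypothetical, the helper names are illustrative, and stabilization windows are ignored.

```python
import math


def proposal(current_replicas, current_value, target_value, tolerance=0.1):
    """Replica proposal for one metric, using the HPA ratio formula."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_replicas_multi(current_replicas, metrics, min_replicas, max_replicas):
    """metrics maps a metric name to (current average utilization, target)."""
    proposals = [proposal(current_replicas, cur, tgt) for cur, tgt in metrics.values()]
    # The HPA takes the largest proposal, then clamps it to the replica bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


# Memory-only HPA: CPU pressure is invisible, replicas stay at the minimum.
print(desired_replicas_multi(3, {"memory": (50, 80)}, 3, 10))
# CPU plus memory with a 60% CPU target: the CPU proposal wins.
print(desired_replicas_multi(3, {"cpu": (85, 60), "memory": (50, 80)}, 3, 10))
```

Keeping CPU as a metric (optionally alongside memory) is what lets the HPA respond to the actual bottleneck.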