A team is monitoring a production service on Google Kubernetes Engine (GKE) and notices that a Deployment occasionally returns HTTP 503 errors. The team has configured a Prometheus Operator ServiceMonitor to scrape metrics from the pods. What is the most likely cause of the intermittent 503 errors?
Trap 1: The pods are crashing and restarting frequently.
Frequent restarts can cause 503s, but they are a less likely cause here than readiness probe failures.
Trap 2: The Prometheus scrape interval is too long, causing missed metrics.
Prometheus scraping does not affect pod availability.
Trap 3: The container resource limits are set too low, causing…
OOM errors cause container restarts; they lead to 503s only indirectly, while the pod is unavailable.
- A
The pods are crashing and restarting frequently.
Why wrong: Frequent restarts can cause 503s, but they are a less likely cause here than readiness probe failures.
- B
The Prometheus scrape interval is too long, causing missed metrics.
Why wrong: Prometheus scraping does not affect pod availability.
- C
The readiness probes are failing, causing the pods to be removed from the service endpoints.
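Why correct: When a readiness probe fails, Kubernetes removes the pod from the Service endpoints. If probes fail intermittently and no ready replica remains, requests have no backend to reach and clients see 503s until a pod passes its probe again.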
- D
The container resource limits are set too low, causing out-of-memory errors.
Why wrong: OOM errors cause container restarts; they lead to 503s only indirectly, while the pod is unavailable.
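The mechanism behind option C can be illustrated with a toy model. This is a minimal sketch in plain Python, not Kubernetes code: `Pod`, `ready_endpoints`, and `route_request` are hypothetical names invented for the example, and the routing is deliberately simplified (it always picks the first ready pod). It assumes that whatever sits in front of the Service, for example an ingress or load balancer, answers 503 when no ready backend remains.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    ready: bool  # outcome of the most recent readiness probe (assumed)


def ready_endpoints(pods):
    """Return the names of pods that would remain in the Service endpoints."""
    return [p.name for p in pods if p.ready]


def route_request(pods):
    """Return a (status, backend) pair for one request sent to the Service."""
    endpoints = ready_endpoints(pods)
    if not endpoints:
        # No ready backend is left, so the request is answered with 503.
        return 503, None
    # Simplification: always route to the first ready pod.
    return 200, endpoints[0]


pods = [Pod("web-0", ready=True), Pod("web-1", ready=False)]
print(route_request(pods))  # (200, 'web-0')

# Every replica now fails its readiness probe at the same time.
for p in pods:
    p.ready = False
print(route_request(pods))  # (503, None)
```

As long as one replica stays ready, requests still succeed, which is why the 503s appear only intermittently: they show up during the windows in which every replica is failing its probe.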