A DevOps team notices that a microservice is returning 503 errors intermittently. The service runs in Kubernetes and uses a liveness probe. The team wants to understand the root cause without restarting the pod. Which observability approach should they use first?
Trap 1: Use kubectl describe pod to check recent events
Events are aggregated and kept only for a limited time (one hour by default), so brief or older probe failures may be missing from the output.
Trap 2: Increase log verbosity in the application to capture all requests
Application logs typically do not record probe failures, which the kubelet detects, and higher verbosity can degrade performance.
Trap 3: Enable distributed tracing across the service mesh
Tracing follows request flows between services; it does not record kubelet probe health checks.
- A
Use kubectl describe pod to check recent events
Why wrong: Events are aggregated and kept only for a limited time (one hour by default), so brief or older probe failures may be missing from the output.
- B
Query Prometheus for kubelet metrics on probe successes and failures
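Kubelet metrics such as 'prober_probe_total', labeled by probe type and result, show probe outcomes over time and help pinpoint intermittent failures without touching the pod. (Note that 'probe_success' is a Blackbox Exporter metric, not a kubelet one.)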
- C
Increase log verbosity in the application to capture all requests
Why wrong: Application logs typically do not record probe failures, which the kubelet detects, and higher verbosity can degrade performance.
- D
Enable distributed tracing across the service mesh
Why wrong: Tracing follows request flows between services; it does not record kubelet probe health checks.
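
As an illustration of option B, the following is a minimal sketch of how the probe-failure query might be built and sent to Prometheus. It assumes Prometheus scrapes the kubelet's probe metrics and that the 'prober_probe_total' counter carries 'namespace', 'pod', 'container', 'probe_type', and 'result' labels; label names can differ with relabeling or Kubernetes version. The function names, the namespace and pod values, and the Prometheus URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_probe_failure_query(namespace, pod, probe_type="Liveness", window="5m"):
    """Return PromQL counting failed probes per container over the window.

    Assumes the kubelet counter prober_probe_total with the labels used below.
    """
    selector = (
        f'prober_probe_total{{namespace="{namespace}",pod="{pod}",'
        f'probe_type="{probe_type}",result="failed"}}'
    )
    return f"sum by (container) (increase({selector}[{window}]))"


def query_prometheus(base_url, promql, timeout=10):
    """Run an instant query via the Prometheus HTTP API and return the result list."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    # Hypothetical namespace and pod name; replace with real values.
    print(build_probe_failure_query("shop", "checkout-abc"))
    # To run the query against a reachable Prometheus (hypothetical URL):
    # query_prometheus("http://prometheus.example:9090", build_probe_failure_query("shop", "checkout-abc"))
```

Running the same query with probe_type="Readiness" may be worth trying as well, since readiness failures remove a pod from Service endpoints and are a common source of intermittent 503 responses.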