A financial services company is migrating a monolithic Java application to Google Kubernetes Engine (GKE) for improved scalability and reliability. The application serves real-time trading data and has strict latency requirements. Post-migration, the team observes frequent pod restarts due to OutOfMemory (OOM) errors, increased latency during peak trading hours, and occasional database connection timeouts. The current setup uses a single GKE cluster with a node pool of n1-standard-4 machines. The stateless application runs as a Deployment with resource requests and limits both set to 512 Mi of memory and 1 CPU. The database is a Cloud SQL PostgreSQL instance with 2 vCPUs and 7.5 GB of memory, and the application connects using a hardcoded connection string. The team wants to ensure reliable operation under load and during node maintenance events. Which course of action best addresses the reliability issues?
Correctly addresses all issues: resource tuning for OOM, custom metric HPA for load, cluster autoscaler for capacity, connection pooling for timeouts, and PDB for maintenance.
Why this answer
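Option C addresses every symptom. Right-sized resource requests let the scheduler place pods on nodes with enough capacity, and memory limits sized to the JVM's actual footprint stop the OOM restarts (a limit by itself does not prevent OOM: a container that exceeds its limit is killed, which is what the 512 Mi setting is causing here). An HPA driven by a custom metric such as requests per second scales replicas with trading load, and the cluster autoscaler adds nodes when those replicas no longer fit. Connection pooling for Cloud SQL, with connections made through the Cloud SQL Auth Proxy, prevents connection exhaustion and adds security. A PodDisruptionBudget (PDB) keeps pods available during node maintenance. Option A misses readiness probes and autoscaling; Option B ignores resource limits and connection pooling; Option D uses a StatefulSet unnecessarily and omits connection pooling and HPA on custom metrics.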
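
To make the HPA and connection pooling points concrete, here is a minimal sketch in Python. The first function follows the replica calculation described in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance). The second is a simple budget check for per-pod pool size. The function names, the 0.1 tolerance default, and all numbers used (metric values, a 400-connection limit, 20 reserved connections) are illustrative assumptions, not values from the scenario.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_metric / target_metric
    # Within the tolerance band, the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured min/max bounds.
    return max(min_replicas, min(max_replicas, wanted))


def pool_size_per_pod(max_connections, reserved_connections, max_replicas):
    """Largest per-pod pool size that cannot exhaust the database,
    even when the Deployment is scaled to max_replicas."""
    return (max_connections - reserved_connections) // max_replicas


# Hypothetical peak: 4 pods each seeing 250 req/s against a 100 req/s target.
print(desired_replicas(4, 250, 100, min_replicas=2, max_replicas=20))
# Hypothetical database limit of 400 connections, 20 kept for admin use.
print(pool_size_per_pod(400, 20, max_replicas=10))
```

The two calculations interact: raising the HPA's maximum replica count lowers the pool size each pod can safely hold, so both settings should be chosen together.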