PMLE • Timed Hard Questions
This is a timed practice session. You have 15 minutes to answer 25 questions — approximately 1 minute per question, matching real PMLE exam pace. Answer every question before time expires.
Time remaining
15:00
Exam-pace drill
Allow 1 minute per question. On the real PMLE exam you have approximately 72 seconds per question — this session trains you to maintain that pace under pressure.
A travel booking company has a real-time recommendation system that suggests hotels and flights to users. The model is served using TensorFlow Serving on a Google Kubernetes Engine (GKE) cluster with auto-scaling enabled. The cluster uses n1-standard-4 machine types. The team has set up Cloud Monitoring dashboards and alerts. Last week, during a major holiday promotion, the team noticed that the model's inference latency P99 increased from 150 ms to 450 ms over a 30-minute period, while the request throughput increased from 500 to 1,200 requests per second. CPU utilization across the cluster rose to 95%, but memory utilization remained at 60%. The model version and the serving infrastructure configuration have not changed since the last deployment. Which action should the team take to mitigate the latency issue?