Courseiva — IT Certification Practice Questions

PDE

Q 1 / 90

Operationalizing machine learning models

medium

A team notices that the latency for online predictions from a Vertex AI endpoint has increased significantly over the past hour. The model is a large TensorFlow model deployed with automatic scaling (minReplicaCount=2, maxReplicaCount=10). The CPU utilization of the deployed instances is consistently above 85%. What is the most likely cause of the increased latency?
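
To make the numbers in the scenario concrete, here is a minimal sketch of how a utilization-based autoscaler might pick a replica count. The proportional rule is the generic one used by Kubernetes-style autoscalers and is an assumption here, not Vertex AI's documented algorithm. The 60% CPU target is also an assumption, since the question does not state one. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(current_replicas, observed_cpu, target_cpu,
                     min_replicas, max_replicas):
    """Return the replica count a proportional autoscaler would request.

    Assumption: replicas scale in proportion to observed/target utilization,
    and the result is clamped to the configured [min, max] range.
    """
    raw = math.ceil(current_replicas * observed_cpu / target_cpu)
    return max(min_replicas, min(max_replicas, raw))


# Values from the question: minReplicaCount=2, maxReplicaCount=10, CPU at 85%.
# The 60% target is an assumed value for illustration.
at_min = desired_replicas(2, 85, 60, min_replicas=2, max_replicas=10)
at_max = desired_replicas(10, 85, 60, min_replicas=2, max_replicas=10)
print(at_min, at_max)
```

Under these assumptions, a deployment at 2 replicas would be asked to grow, while one already at 10 replicas would want more than the cap allows and stay at 10. When reading the answer options, it is therefore worth checking how many replicas are actually running: CPU that stays above 85% is consistent with a deployment that has no room left to scale out.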