PMLE Serving and Scaling Models • Set 2
PMLE Serving and Scaling Models Practice Test 2: 15 questions with explanations.
You deployed a model to a Vertex AI endpoint with minReplicas=0 and maxReplicas=5. When you send prediction requests, the first request takes about 30 seconds to return, but subsequent requests are fast. What is the most likely cause?