PMLE Serving and Scaling Models • Set 3
PMLE Serving and Scaling Models Practice Test 3: 15 questions with explanations.
A company deploys a model on Vertex AI Endpoints for real-time inference. They need to minimize latency for prediction requests that are identical to previous requests. Which approach should they use?