A company deploys a TensorFlow model on Vertex AI Prediction with a single node. During peak hours, inference latency increases. What should they do first to reduce latency?
Trap 1: Increase the machine type of the node
A larger machine type raises per-node capacity, but the deployment still has a single node that cannot scale out as traffic grows.
Trap 2: Decrease the min replicas to 0
Lowering the minimum replica count removes capacity instead of adding it, and scaling up from zero can introduce cold starts that increase latency.
Trap 3: Enable automatic batching of requests
Batching improves throughput, but individual requests wait for a batch to fill, which tends to increase per-request latency.
- A
Enable autoscaling for the deployment
Autoscaling adds nodes as traffic rises during peak hours, so requests are spread across more nodes instead of queuing on a single one, which reduces latency.
- B
Increase the machine type of the node
Why wrong: A larger machine type raises per-node capacity, but the deployment still has a single node that cannot scale out as traffic grows.
- C
Decrease the min replicas to 0
Why wrong: Lowering the minimum replica count removes capacity instead of adding it, and scaling up from zero can introduce cold starts that increase latency.
- D
Enable automatic batching of requests
Why wrong: Batching improves throughput, but individual requests wait for a batch to fill, which tends to increase per-request latency.
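
Worked example: the reasoning behind the autoscaling answer can be illustrated with a toy queueing model. This is only a sketch under stated assumptions, not how Vertex AI actually computes replica counts: each node is treated as a simple M/M/1 queue, load is split evenly across nodes, and the numbers (50 ms service time, 18 requests per second at peak, a 60% utilization target, a cap of 5 nodes) and the function names `replicas_needed` and `mean_latency_ms` are all hypothetical.

```python
import math


def replicas_needed(peak_rps, node_capacity_rps, target_utilization=0.6,
                    min_replicas=1, max_replicas=5):
    """Smallest node count that keeps each node at or below the target utilization.

    The 0.6 target and the 1..5 bounds are illustrative assumptions.
    """
    wanted = math.ceil(peak_rps / (node_capacity_rps * target_utilization))
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, wanted))


def mean_latency_ms(peak_rps, replicas, service_time_ms):
    """Toy M/M/1 estimate per node: latency = service_time / (1 - utilization)."""
    node_capacity_rps = 1000.0 / service_time_ms
    utilization = peak_rps / (replicas * node_capacity_rps)
    if utilization >= 1.0:
        return math.inf  # the queue grows without bound
    return service_time_ms / (1.0 - utilization)


SERVICE_TIME_MS = 50.0                        # hypothetical per-request service time
NODE_CAPACITY_RPS = 1000.0 / SERVICE_TIME_MS  # 20 requests per second per node
PEAK_RPS = 18.0                               # hypothetical peak-hour traffic

single_node_latency = mean_latency_ms(PEAK_RPS, 1, SERVICE_TIME_MS)
scaled_replicas = replicas_needed(PEAK_RPS, NODE_CAPACITY_RPS)
scaled_latency = mean_latency_ms(PEAK_RPS, scaled_replicas, SERVICE_TIME_MS)

print(f"1 node: {single_node_latency:.0f} ms")
print(f"{scaled_replicas} nodes: {scaled_latency:.0f} ms")
```

Under these assumptions a single node runs at 90% utilization and the estimated latency is roughly 500 ms, while scaling out to 2 nodes drops utilization to 45% and the estimate to about 91 ms. The same model also shows why a bigger machine alone is a weaker first step: it lowers utilization once, but cannot follow traffic as it keeps growing.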