A company deploys a machine learning model to Vertex AI for real-time predictions. After deployment, they notice that prediction latency spikes during peak traffic hours. Which approach should they take to reduce latency without sacrificing accuracy?
Trap 1: Reduce the number of input features
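Dropping input features can degrade accuracy, which the question rules out, and it does nothing about latency caused by too few serving replicas.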
Trap 2: Switch from online to batch prediction
Batch prediction runs asynchronously over stored data, so it cannot serve real-time requests.
Trap 3: Use a larger machine type for the model
A larger machine type raises the capacity of each replica, but that capacity is fixed; it does not add replicas when traffic spikes.
- A
Configure auto-scaling with higher min and max instances
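Why correct: Auto-scaling adds replicas as traffic rises, and a higher minimum keeps enough capacity warm for the start of peak hours, so requests are not queued behind saturated nodes. The model itself is unchanged, so accuracy is unaffected.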
- B
Reduce the number of input features
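Why wrong: Dropping input features can degrade accuracy, which the question rules out, and it does nothing about latency caused by too few serving replicas.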
- C
Switch from online to batch prediction
Why wrong: Batch prediction runs asynchronously over stored data, so it cannot serve real-time requests.
- D
Use a larger machine type for the model
Why wrong: A larger machine type raises the capacity of each replica, but that capacity is fixed; it does not add replicas when traffic spikes.
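
To make the reasoning behind option A concrete, here is a minimal sketch of a target-utilization scaling rule, clamped to a minimum and maximum replica count. It is a generic illustration, not Vertex AI's internal algorithm. The function name `desired_replicas`, the 60% target, and all numbers are assumptions chosen for the example. The `deploy_kwargs` dictionary shows the kind of settings one might pass to `Model.deploy` in the `google-cloud-aiplatform` SDK; the machine type and counts are hypothetical, and the call itself is left out so the sketch runs without cloud credentials.

```python
import math


def desired_replicas(
    current: int,
    observed_util: float,
    target_util: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Generic target-utilization rule (an assumption, not Vertex AI's exact logic).

    Scale the replica count so that average utilization moves toward the
    target, then clamp the result to the configured bounds.
    """
    raw = math.ceil(current * observed_util / target_util)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical deployment settings; values are illustrative only.
# These keys would be passed as keyword arguments to Model.deploy.
deploy_kwargs = {
    "machine_type": "n1-standard-4",            # assumed machine type
    "min_replica_count": 2,                     # warm capacity before the peak
    "max_replica_count": 10,                    # ceiling for peak traffic
    "autoscaling_target_cpu_utilization": 60,   # percent
}

if __name__ == "__main__":
    # Off-peak, moderate, and peak CPU utilization (percent) on 2 replicas.
    for util in (30, 90, 400):
        replicas = desired_replicas(
            current=2,
            observed_util=util,
            target_util=deploy_kwargs["autoscaling_target_cpu_utilization"],
            min_replicas=deploy_kwargs["min_replica_count"],
            max_replicas=deploy_kwargs["max_replica_count"],
        )
        print(f"utilization={util}% -> replicas={replicas}")
```

The clamp is why both bounds matter: too low a maximum caps capacity during the peak, while too low a minimum leaves the endpoint under-provisioned until new replicas finish starting.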