A company is using AutoML Tables to build a fraud detection model. The dataset has 10 million rows and 100 features and is heavily imbalanced (fraud cases make up 0.1% of rows). Training with default settings produced a model with high precision but very low recall. The model must be deployed to a Vertex AI Endpoint for real-time scoring by a transaction processing system that requires low latency (<100 ms per prediction) and high throughput. Cost is a concern because the endpoint will receive up to 5,000 predictions per second. After deployment, the team notices that endpoint latency occasionally spikes to over 1 second during peak hours. They want to improve both model performance (recall) and serving performance. Which course of action should they take?
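AutoML Tables supports weighting the minority class more heavily during training (class weights, supplied through a weight column), which improves recall on imbalanced data. Vertex AI Prediction with autoscaling adds and removes serving replicas as traffic changes, which keeps latency stable during spikes and avoids paying for peak capacity around the clock.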
Why this answer
Option A is correct because AutoML Tables allows adjusting class weights to handle imbalanced datasets. This directly addresses the low recall: misclassified fraud cases (the minority class) are penalized more heavily, so the model stops favoring the 99.9% majority class. Deploying on Vertex AI Prediction with autoscaling lets the endpoint scale out to handle up to 5,000 predictions per second while keeping latency low. Because the replica count follows traffic, the endpoint adds capacity during peak hours, which is when the 1-second spikes occur, and scales back in afterwards to control cost.
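To make the class-weight idea concrete, here is a minimal sketch of how per-row weights could be derived for this dataset (10 million rows, 0.1% fraud, so 10,000 fraud rows). It uses the common "balanced" heuristic, weight = total rows / (number of classes x rows in that class); this heuristic, the label values "fraud" and "legit", and the column name "sample_weight" are assumptions for illustration and do not come from the question. The resulting column would then be selected as the weight column when training.

```python
def balanced_class_weights(class_counts):
    """Return one weight per class using the 'balanced' heuristic:
    weight = total_rows / (n_classes * rows_in_class).
    Rare classes get proportionally larger weights."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}


def add_weight_column(rows, label_key, weights, weight_key="sample_weight"):
    """Return copies of the rows with a per-row weight taken from the
    row's class label. 'sample_weight' is a hypothetical column name."""
    return [{**row, weight_key: weights[row[label_key]]} for row in rows]


# Class counts implied by the question: 0.1% of 10 million rows are fraud.
counts = {"fraud": 10_000, "legit": 9_990_000}
weights = balanced_class_weights(counts)

# Tiny illustrative sample; real rows would carry the 100 features.
sample = [
    {"amount": 12.5, "label": "legit"},
    {"amount": 980.0, "label": "fraud"},
]
weighted = add_weight_column(sample, "label", weights)
```

With these counts a fraud row weighs 999 times as much as a legitimate row, so the total weight of the two classes is equal and a missed fraud case is no longer cheap for the model to ignore.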
Exam trap
Google Cloud often tests the misconception that exporting a managed model to a custom format (like TensorFlow SavedModel) and deploying on a larger machine type is the best way to optimize serving performance, when in fact autoscaling and class weight adjustments within the managed service are the correct low-code approach.
How to eliminate wrong answers
Option B is wrong because BigQuery ML's logistic regression is a simpler, linear model that may not capture complex patterns across 100 features, and on Cloud Run the per-instance concurrency limit can increase latency under high throughput (5,000 QPS).
Option C is wrong because exporting an AutoML Tables model as a TensorFlow SavedModel gives up the optimized serving infrastructure of AutoML, and a larger machine type with more minimum replicas raises the fixed cost without guaranteeing sub-100 ms latency during traffic spikes, since capacity still does not follow load without autoscaling. It also does nothing to improve recall.
Option D is wrong because using Vertex AI Workbench to manually tune a deep neural network is not a low-code solution, and deploying on App Engine introduces cold-start delays and lacks the low-latency, high-throughput capabilities of Vertex AI Prediction for real-time scoring.
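The difference between fixed replicas and autoscaling can be illustrated with a rough sizing sketch. It applies Little's law (requests in flight = arrival rate x latency) to the figures from the question and then estimates a replica count. The per-replica throughput of 250 QPS and the 60% target utilization are hypothetical values chosen only for illustration; real numbers would have to come from load-testing the deployed model.

```python
import math


def in_flight_requests(qps, latency_ms):
    """Little's law: average number of concurrent requests the endpoint
    must hold at a given arrival rate and per-request latency."""
    return qps * latency_ms / 1000


def replicas_needed(peak_qps, per_replica_qps, target_utilization):
    """Replicas required so that each one runs at or below the target
    utilization at peak traffic. per_replica_qps must be measured."""
    return math.ceil(peak_qps / (per_replica_qps * target_utilization))


# From the question: 5,000 predictions per second, 100 ms latency budget.
concurrent = in_flight_requests(qps=5_000, latency_ms=100)

# Hypothetical measurements: 250 QPS per replica, 60% target utilization.
peak_replicas = replicas_needed(5_000, per_replica_qps=250,
                                target_utilization=0.6)
off_peak_replicas = replicas_needed(500, per_replica_qps=250,
                                    target_utilization=0.6)
```

Under these assumed numbers the peak needs far more replicas than an off-peak load of 500 QPS. A fixed deployment has to pay for the peak count all day or accept spikes when traffic exceeds capacity, whereas autoscaling between a minimum and maximum replica count covers both cases.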