CCNA Architecting low-code ML solutions Questions

MCQ, hard

A company is using AutoML Tables to build a fraud detection model. The dataset has 10 million rows and 100 features and is heavily imbalanced (fraud cases make up 0.1% of rows). They trained with AutoML Tables default settings and achieved high precision but very low recall. They need to deploy the model for real-time scoring on a Vertex AI Endpoint. The model will be used by a transaction processing system that requires low latency (<100 ms per prediction) and high throughput. The team is concerned about cost, as the endpoint will receive up to 5,000 predictions per second. After deploying the model, they notice that the endpoint's latency occasionally spikes to over 1 second during peak hours. The team wants to optimize both model performance (recall) and serving performance. Which course of action should they take?

A. Retrain the model with adjusted class weights in AutoML Tables to increase recall, then deploy using Vertex AI Prediction with autoscaling enabled.

B. Use BigQuery ML to create a logistic regression model with class weights, then deploy it on Cloud Run with maximum concurrency.

C. Export the AutoML Tables model as a TensorFlow SavedModel and deploy it on Vertex AI Prediction with a larger machine type and increased min replicas.

D. Use Vertex AI Workbench to manually tune a deep neural network with class imbalance techniques, then deploy as a custom container on App Engine.

Answer: A

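AutoML Tables can weight the minority class more heavily (for example, through a weight column), which improves recall on imbalanced data. Vertex AI Prediction with autoscaling adds replicas as traffic rises, which helps hold latency down during peaks, and removes them afterwards, which controls cost.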

Why this answer

Option A is correct because AutoML Tables lets you adjust class weights for imbalanced datasets. Weighting the fraud class more heavily penalizes missed fraud cases more during training, which directly addresses the low recall. Deploying on Vertex AI Prediction with autoscaling lets the endpoint scale out toward 5,000 predictions per second and scale back in when traffic drops. Because replicas are added as load rises, the latency spikes seen during peak hours are reduced without paying for peak capacity around the clock.
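To make the class-weight idea concrete, here is a minimal Python sketch of how per-row weights for a weight column could be derived. The `balanced_row_weights` helper and the inverse-frequency formula are illustrative assumptions (the formula is the common "balanced" heuristic), not something the question or AutoML Tables prescribes.

```python
from collections import Counter


def balanced_row_weights(labels):
    """Return one weight per row, inversely proportional to class frequency.

    Hypothetical helper: weight = n_rows / (n_classes * rows_in_class).
    """
    counts = Counter(labels)
    n_rows = len(labels)
    n_classes = len(counts)
    class_weight = {
        cls: n_rows / (n_classes * count) for cls, count in counts.items()
    }
    return [class_weight[y] for y in labels]


# Toy data with the scenario's 0.1% fraud rate: 1 fraud row in 1,000.
labels = [1] + [0] * 999
weights = balanced_row_weights(labels)

# After weighting, both classes contribute about the same total weight,
# so a missed fraud row costs far more than a misclassified legitimate row.
fraud_total = sum(w for w, y in zip(weights, labels) if y == 1)
legit_total = sum(w for w, y in zip(weights, labels) if y == 0)
print(weights[0], round(legit_total, 6))
```

In practice the resulting values would be written to an extra column in the training table and selected as the weight column. Any range limits the service places on weights should be checked before use.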

Exam trap

Google Cloud often tests the misconception that exporting a managed model to a custom format (like TensorFlow SavedModel) and deploying on a larger machine type is the best way to optimize serving performance, when in fact autoscaling and class weight adjustments within the managed service are the correct low-code approach.

How to eliminate wrong answers

Option B is wrong because BigQuery ML's logistic regression is a simpler, linear model that may not capture complex interactions across 100 features. Cloud Run would also require packaging and operating your own serving container, and pushing per-instance concurrency to the maximum tends to raise tail latency at 5,000 QPS rather than lower it.

Option C is wrong because exporting the AutoML Tables model as a TensorFlow SavedModel does nothing for recall, and it gives up the managed serving path for a container you must maintain yourself. A larger machine type with more min replicas raises the fixed cost, and without autoscaling it still does not guarantee sub-100 ms latency when traffic spikes.

Option D is wrong because manually tuning a deep neural network in Vertex AI Workbench is not a low-code solution, and App Engine introduces cold starts and lacks the low-latency, high-throughput serving features of Vertex AI Prediction for real-time scoring.
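The cost argument against fixed capacity can be illustrated with simple sizing arithmetic. The Python sketch below estimates an autoscaling range. The per-replica throughput, off-peak traffic, and headroom values are hypothetical placeholders that would have to come from a load test, and `replicas_needed` is an illustrative helper rather than part of any SDK.

```python
import math


def replicas_needed(qps, per_replica_qps, headroom_pct=30):
    """Replicas required to serve `qps` with a percentage safety headroom."""
    return math.ceil(qps * (100 + headroom_pct) / (100 * per_replica_qps))


PEAK_QPS = 5000         # from the scenario
OFF_PEAK_QPS = 500      # hypothetical off-peak traffic
PER_REPLICA_QPS = 250   # hypothetical; measure this with a load test

# Upper and lower bounds of the autoscaling range.
max_replicas = replicas_needed(PEAK_QPS, PER_REPLICA_QPS)
min_replicas = replicas_needed(OFF_PEAK_QPS, PER_REPLICA_QPS)

print(min_replicas, max_replicas)
```

Under these assumed numbers, a deployment pinned at peak capacity runs 26 replicas all day, while an autoscaled one can idle near 3. In the Vertex AI SDK, bounds like these would typically be passed as `min_replica_count` and `max_replica_count` when deploying a model to an endpoint.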