PMLE Architecting low-code ML solutions Questions

75 questions · Architecting low-code ML solutions · All types, answers revealed

1
MCQ · hard

A company is using AutoML Tables to build a fraud detection model. The dataset has 10 million rows with 100 features, heavily imbalanced (fraud cases 0.1%). They used AutoML Tables with default settings and achieved high precision but very low recall. They need to deploy the model for real-time scoring on a Vertex AI Endpoint. The model will be used by a transaction processing system that requires low latency (<100 ms per prediction) and high throughput. The team is concerned about cost as the endpoint will receive up to 5,000 predictions per second. After deploying the model, they notice that the endpoint's latency occasionally spikes to over 1 second during peak hours. The team wants to optimize both model performance (recall) and serving performance. Which course of action should they take?

A. Retrain the model with adjusted class weights in AutoML Tables to increase recall, then deploy using Vertex AI Prediction with autoscaling enabled.
B. Use BigQuery ML to create a logistic regression model with class weights, then deploy it on Cloud Run with maximum concurrency.
C. Export the AutoML Tables model as a TensorFlow SavedModel and deploy it on Vertex AI Prediction with a larger machine type and increased min replicas.
D. Use Vertex AI Workbench to manually tune a deep neural network with class imbalance techniques, then deploy as a custom container on App Engine.
Answer: A

AutoML Tables supports class weights to handle imbalance, improving recall. Vertex AI Prediction with autoscaling dynamically adjusts resources to maintain latency during spikes and control costs.

Why this answer

Option A is correct because AutoML Tables lets you weight the minority class more heavily during training, which penalizes missed fraud cases and directly addresses the low recall. Deploying on Vertex AI Prediction with autoscaling lets the endpoint add replicas as traffic approaches 5,000 predictions per second, which limits latency spikes at peak hours while avoiding the cost of permanently over-provisioned capacity.
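To make the class-weight idea concrete, the following is a minimal sketch of the common inverse-frequency ("balanced") heuristic applied to the row counts in the question. It is an illustration only: the function name is hypothetical, and it does not claim to show how AutoML Tables computes or applies weights internally.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency weights: n_total / (n_classes * n_class)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


# 10 million rows with 0.1% fraud, as stated in the question.
counts = {"fraud": 10_000, "legit": 9_990_000}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

Under this heuristic a fraud row counts roughly a thousand times more than a legitimate row, which is why a weighted model is pushed toward higher recall on the rare class.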

Exam trap

Google Cloud often tests the misconception that exporting a managed model to a custom format (like TensorFlow SavedModel) and deploying on a larger machine type is the best way to optimize serving performance, when in fact autoscaling and class weight adjustments within the managed service are the correct low-code approach.

How to eliminate wrong answers

Option B is wrong because BigQuery ML's logistic regression is a simpler model that may not capture complex patterns in 100 features, and Cloud Run's maximum concurrency can lead to increased latency under high throughput (5,000 QPS) without dedicated GPU/TPU support for real-time scoring. Option C is wrong because exporting an AutoML Tables model as a TensorFlow SavedModel loses the optimized serving infrastructure of AutoML, and simply using a larger machine type with increased min replicas does not guarantee sub-100ms latency during traffic spikes without autoscaling. Option D is wrong because using Vertex AI Workbench to manually tune a deep neural network is not a low-code solution, and deploying on App Engine introduces cold start issues and lacks the low-latency, high-throughput capabilities of Vertex AI Prediction for real-time scoring.

2
MCQ · easy

A team needs to quickly create a visual interface for data exploration and model building without writing code. They want to run AutoML jobs and visualize results. Which Google Cloud tool should they use?

A. Vertex AI Workbench
B. Cloud Datalab
C. Cloud Composer
D. Google Colab
Answer: A

Provides a managed notebook environment with visual data exploration and one-click AutoML integration.

Why this answer

Vertex AI Workbench provides a managed JupyterLab environment with a low-code interface for data exploration, AutoML model training, and result visualization without writing code. It integrates directly with Vertex AI's AutoML and custom training services, allowing users to run AutoML jobs and view evaluation metrics, feature importance, and predictions through its UI.

Exam trap

Google Cloud often tests the distinction between code-based notebook tools (Colab, Datalab) and managed low-code platforms (Vertex AI Workbench), expecting candidates to recognize that AutoML job execution and visual result exploration require the latter's integrated UI and API access.

How to eliminate wrong answers

Option B (Cloud Datalab) is wrong because it is a deprecated tool that required code-based notebooks and does not support AutoML job execution or low-code visual interfaces. Option C (Cloud Composer) is wrong because it is a workflow orchestration service based on Apache Airflow, designed for scheduling and monitoring pipelines, not for interactive data exploration or AutoML. Option D (Google Colab) is wrong because it is a free, code-centric notebook environment that lacks native integration with Vertex AI AutoML and does not provide a low-code visual interface for model building.

3
MCQ · easy

A startup wants to build a product recommendation engine without writing custom training code. They have user-item interaction data stored in BigQuery. Which Google Cloud service should they use?

A. Cloud Dataflow with ML APIs
B. BigQuery ML matrix factorization
C. Vertex AI AutoML Tables
D. Vertex AI Matching Engine
Answer: B

Train a recommendation model using SQL with no code.

Why this answer

BigQuery ML matrix factorization is the correct choice because it allows building a recommendation engine directly in BigQuery using SQL, without writing custom training code. It supports implicit and explicit user-item interaction data and provides built-in evaluation metrics, making it ideal for low-code ML solutions on existing BigQuery data.

Exam trap

Google Cloud often tests the distinction between services that require custom code (Dataflow) versus those that offer SQL-based low-code ML (BigQuery ML), and the trap here is assuming any ML service like AutoML or Matching Engine is suitable for recommendation without recognizing the specific need for matrix factorization on interaction data.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a data processing pipeline service, not a low-code ML training service; using ML APIs would require custom code to orchestrate and train models. Option C is wrong because Vertex AI AutoML Tables is designed for tabular data with structured features, not specifically for user-item interaction matrices, and requires exporting data from BigQuery. Option D is wrong because Vertex AI Matching Engine is for vector similarity search and nearest neighbor retrieval, not for training matrix factorization models from interaction data.

4
Multi-Select · easy

A data scientist wants to use Vertex AI Pipelines to automate a low-code ML workflow. Which two statements are correct regarding best practices? (Choose TWO.)

Select 2 answers
A. Use pre-built components from Google's curated component library to avoid custom code.
B. Store all intermediate artifacts in Cloud Storage to enable reproducibility and reuse.
C. Avoid using pre-built components because they are not customizable.
D. Use the Vertex AI Experiments to track and compare pipeline runs.
E. Use the Kubeflow Pipelines SDK to define the pipeline, which requires extensive coding.
Answers: A, B

Pre-built components enable low-code pipeline construction.

Why this answer

Option A is correct because Vertex AI Pipelines offers a curated library of pre-built components that encapsulate common ML tasks (e.g., data preprocessing, training, evaluation). Using these components reduces the need for custom code, aligning with the low-code ML workflow requirement. This approach accelerates development while maintaining reliability through Google-tested implementations.

Exam trap

The trap here is that candidates confuse Vertex AI Experiments (a tracking tool) with a pipeline design best practice, or they assume pre-built components are rigid and cannot be customized, leading them to incorrectly select D or C.

5
MCQ · easy

A non-technical user wants to build a binary classification model using Vertex AI. Which UI should they use?

A. Vertex AI AutoML
B. Vertex AI Workbench
C. Vertex AI Pipelines
D. Vertex AI Prediction
Answer: A

Correct: No-code UI for training.

Why this answer

Vertex AI AutoML is the correct choice because it provides a no-code graphical user interface specifically designed for non-technical users to build, train, and deploy machine learning models, including binary classification models, without writing any code. It automates the entire ML pipeline—feature engineering, model selection, hyperparameter tuning—allowing users to simply upload labeled data and get a production-ready model.

Exam trap

Google Cloud often tests the distinction between 'building/training' tools (AutoML) and 'deploying/serving' tools (Prediction), leading candidates to mistakenly choose Vertex AI Prediction because they confuse the deployment phase with the model creation phase.

How to eliminate wrong answers

Option B is wrong because Vertex AI Workbench is a Jupyter notebook-based development environment intended for data scientists and ML engineers who write custom code, not for non-technical users seeking a low-code solution. Option C is wrong because Vertex AI Pipelines is a tool for orchestrating and automating ML workflows using code-defined pipelines (e.g., Kubeflow Pipelines SDK), requiring programming skills to define steps and dependencies. Option D is wrong because Vertex AI Prediction is a serving endpoint for deploying and running inference on already-trained models, not a UI for building or training models from scratch.

6
Multi-Select · hard

A healthcare company uses AutoML Tables to predict patient readmission risk. The dataset contains 500,000 rows and 200 features, including patient demographics, lab results, and medical history. The model accuracy is lower than expected. The engineer wants to improve performance using low-code techniques. Which THREE actions are most effective? (Choose THREE.)

Select 3 answers
A. Increase the training time budget to the maximum allowed.
B. Remove highly correlated features using AutoML Tables' built-in feature importance analysis.
C. Engineer new features such as time since last admission and number of previous admissions.
D. Use a custom model architecture via AutoML Tables advanced options.
E. Enable automated handling of missing values and outliers in the dataset configuration.
Answers: B, C, E

Reduces noise and improves model generalization.

Why this answer

Option B is correct because AutoML Tables provides built-in feature importance analysis that can identify and remove highly correlated features, which reduces noise and multicollinearity, often improving model performance without manual intervention. This is a low-code technique that leverages the platform's automated capabilities to streamline feature selection.

Exam trap

Google Cloud often tests the misconception that increasing training time or using custom architectures is a low-code solution, when in fact low-code techniques rely on platform automation like built-in feature engineering and data preprocessing, not manual tuning or custom coding.

7
MCQ · medium

A company uses Vertex AI Pipelines to orchestrate an AutoML tabular training step followed by a BigQuery ML evaluation step. The pipeline fails because the output of the AutoML step (a model resource name) is not being passed to the BigQuery step. What is the most likely cause?

A. The AutoML training component is implemented as a Python function without proper artifact input/output annotations
B. The pipeline is using a custom pipeline root but the model is in a different region
C. The Vertex AI Pipeline Runner does not have permission to access AutoML models
D. The BigQuery ML evaluation component requires a service agent with Cloud SQL access
Answer: A

Kubeflow Pipelines requires artifact tracking for passing parameters.

Why this answer

In Vertex AI Pipelines, when using the Kubeflow Pipelines SDK, components must explicitly declare their inputs and outputs through type annotations on a function decorated with `@component` (e.g., `Input[Model]`, `Output[Model]`, or an annotated return type). If the AutoML training step is implemented as a plain Python function without these annotations, the pipeline framework cannot serialize the model resource name and pass it to the downstream BigQuery ML evaluation step. The pipeline then fails because the BigQuery step receives no valid model reference.
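The role of the annotations can be shown with a toy, framework-free sketch. It assumes nothing about the real SDK: `Output` and `Model` below are local stand-ins with the same names, and `declared_outputs` is a hypothetical helper that mimics how an orchestrator might discover outputs by inspecting a function signature.

```python
import inspect
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")


class Model:
    """Stand-in artifact type (hypothetical, not the SDK class)."""


class Output(Generic[T]):
    """Stand-in output marker (hypothetical, not the SDK class)."""


def declared_outputs(fn):
    """Return the parameter names annotated as Output[...]."""
    params = inspect.signature(fn).parameters
    return [
        name for name, p in params.items()
        if get_origin(p.annotation) is Output
    ]


def train_plain(dataset, model):
    """No annotations: nothing marks `model` as an output."""


def train_annotated(dataset: str, model: Output[Model]):
    """`model` is declared as an output artifact."""


print(declared_outputs(train_plain))      # → []
print(declared_outputs(train_annotated))  # → ['model']
```

With no declared outputs there is nothing for a downstream step to reference, which matches the failure described in the question.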

Exam trap

Google Cloud often tests the distinction between runtime permission errors (like IAM) and pipeline orchestration errors (like missing artifact passing), leading candidates to incorrectly choose a permissions-related option when the real issue is a component definition flaw.

How to eliminate wrong answers

Option B is wrong because a custom pipeline root or regional mismatch would cause storage or execution errors, not a failure to pass an output artifact between steps; the model resource name is a metadata artifact, not a storage path. Option C is wrong because permission issues would manifest as authorization errors (e.g., 403 Forbidden) when the pipeline runner tries to access the model, not as a missing output artifact; the error described is about data flow, not access control. Option D is wrong because BigQuery ML evaluation does not require Cloud SQL access; it uses BigQuery's own service agent and IAM permissions, and Cloud SQL is a separate database service irrelevant to this pipeline.

8
MCQ · medium

A data analyst wants to use Vision API to detect custom objects in manufacturing images, but the pre-trained API does not recognize their specific components. They have 1000 labeled images. Which path offers the fastest time-to-value with minimal coding?

A. Store images in BigQuery and use ML.PREDICT with a custom model
B. Use AutoML Vision for object detection
C. Use a Cloud Function to call the Vision API and post-process results
D. Train a custom object detection model using TensorFlow on Vertex AI
Answer: B

No-code training and deployment.

Why this answer

AutoML Vision for object detection is the fastest path because it requires no custom coding—users simply upload labeled images, and the platform automatically trains a model tailored to their custom components. This directly addresses the need to detect objects the pre-trained Vision API cannot recognize, while minimizing time-to-value compared to manual TensorFlow training or custom infrastructure setup.

Exam trap

Google Cloud often tests the misconception that any cloud function or API call can be adapted to custom objects via post-processing, but the pre-trained Vision API's fixed label set cannot be extended without retraining, making AutoML the only low-code solution that actually learns new object classes.

How to eliminate wrong answers

Option A is wrong because BigQuery ML.PREDICT is designed for structured data and tabular models, not for image-based object detection; storing images in BigQuery and using ML.PREDICT would require converting images to embeddings or using a pre-trained model, which does not solve the custom object recognition problem efficiently. Option C is wrong because calling the pre-trained Vision API via Cloud Function and post-processing results still relies on the same pre-trained model that cannot recognize the custom components, so it fails to address the core requirement. Option D is wrong because training a custom model using TensorFlow on Vertex AI requires significant coding, manual architecture design, and hyperparameter tuning, which is far slower and more complex than using AutoML Vision's no-code automated training pipeline.

9
MCQ · easy

A marketing team wants to analyze customer reviews for sentiment without writing code. Which Google Cloud service should they use?

A. Cloud Dataflow
B. Vertex AI Workbench
C. BigQuery ML
D. Cloud Natural Language API
Answer: D

Correct: Pre-trained, no-code sentiment analysis.

Why this answer

The Cloud Natural Language API (option D) is the correct choice because it provides pre-trained models for sentiment analysis, entity recognition, and syntax analysis via a simple REST API, requiring no code beyond sending HTTP requests. This aligns perfectly with the requirement to analyze customer reviews for sentiment without writing code, as the API abstracts all ML complexity.
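As a rough illustration of how little is involved, the sketch below builds the JSON body for the API's `documents:analyzeSentiment` REST method and interprets a response. It does not send anything over the network or handle authentication. The helper names and the `neutral_band` cut-off of 0.25 are assumptions made for this example, not values prescribed by the API.

```python
import json

# REST method for sentiment analysis (v1 of the API).
ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text):
    """Request body for a plain-text document."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def label_sentiment(response, neutral_band=0.25):
    """Map the document score (range -1.0 to 1.0) to a coarse label.

    The neutral_band threshold is an arbitrary choice for this sketch.
    """
    score = response["documentSentiment"]["score"]
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


body = json.dumps(build_sentiment_request("The checkout was fast and easy."))
print(body)

# A response shaped like the API's output, written by hand for illustration.
sample_response = {"documentSentiment": {"score": 0.8, "magnitude": 0.8}}
print(label_sentiment(sample_response))
```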

Exam trap

Google Cloud often tests the distinction between services that require coding (like Dataflow or Workbench) versus those that offer pre-built, no-code APIs (like Cloud Natural Language API), leading candidates to mistakenly choose BigQuery ML because it uses SQL, which they perceive as 'low-code' but still requires explicit query writing and model management.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a fully managed stream and batch data processing service based on Apache Beam, requiring users to write code (e.g., Java or Python) to define data pipelines, making it unsuitable for a no-code sentiment analysis task. Option B is wrong because Vertex AI Workbench is a Jupyter-based notebook environment for building and deploying custom ML models, requiring users to write code (e.g., Python) to train or use models, not a no-code solution. Option C is wrong because BigQuery ML allows users to create and execute ML models using SQL queries, but it still requires writing SQL statements and managing model creation, which is not a no-code API for direct sentiment analysis of text.

10
MCQ · hard

What is the root cause of the failure?

A. The budget_milli_node_hours parameter is set to 0, which is below the minimum required value
B. The evaluate_model component expects the model artifact but the autopilot_train component does not output a model artifact
C. The location parameter 'us-central1' is not a valid region for AutoML
D. The threshold parameter is missing in the autopilot_train component
Answer: A

Must be at least 1000 (1 node hour).

Why this answer

Option A is correct because the `budget_milli_node_hours` parameter in Vertex AI AutoML training specifies the maximum compute budget in milli node hours, where 1,000 equals one node hour. Setting it to 0 allocates no compute at all, so the training job fails validation instead of running. The minimum accepted value is 1,000 (or higher depending on the task type), so 0 is invalid.
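A small sketch of the unit conversion and a pre-flight check may help. The helper names are hypothetical, and the default minimum of 1,000 is taken from the answer summary above rather than from any client library.

```python
MILLI_PER_NODE_HOUR = 1000


def node_hours_to_milli(node_hours):
    """Convert node hours to milli node hours (1 node hour = 1000)."""
    return int(round(node_hours * MILLI_PER_NODE_HOUR))


def check_budget(budget_milli_node_hours, minimum=1000):
    """Raise early if the budget is below the assumed minimum."""
    if budget_milli_node_hours < minimum:
        raise ValueError(
            f"budget_milli_node_hours={budget_milli_node_hours} is below "
            f"the minimum of {minimum}"
        )
    return budget_milli_node_hours


print(node_hours_to_milli(1))    # → 1000
print(node_hours_to_milli(2.5))  # → 2500

try:
    check_budget(0)
except ValueError as err:
    print(err)
```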

Exam trap

Google Cloud often tests the misconception that a zero value for a resource allocation parameter is acceptable or defaults to a minimum, when in fact it causes an immediate validation failure.

How to eliminate wrong answers

Option B is wrong because the `autopilot_train` component in Vertex AI AutoML does output a model artifact; the failure is not due to a missing artifact but due to the zero budget parameter. Option C is wrong because `us-central1` is a valid region for AutoML in Vertex AI, and region validity is not the issue here. Option D is wrong because the `threshold` parameter is not a required parameter for the `autopilot_train` component; the failure is caused by the zero budget, not a missing threshold.

11
MCQ · medium

A financial services company uses Vertex AI AutoML Tables to build a credit risk model. The dataset contains 500,000 rows and 50 features, including loan amount, credit score, debt-to-income ratio, and employment length. The target variable is binary: 'default' (1) or 'no default' (0). The data is highly imbalanced, with only 2% defaults. The data scientist trains a model with AutoML Tables using default settings. The evaluation metrics show an AUC of 0.85, but the confusion matrix reveals that the model predicts 'no default' for almost all cases, missing most defaults. The data scientist needs to improve the model's ability to identify defaults without significantly increasing false positives. They have limited time and cannot write custom code. What should they do?

A. Manually split the data into a stratified train/test set to ensure the same proportion of defaults in each.
B. Train multiple models with different algorithms (e.g., XGBoost, Random Forest) and blend them using a custom script.
C. Enable weighted evaluation and set the optimization objective to maximize recall at a specified precision, with a target precision of 0.5.
D. Under-sample the majority class to create a balanced dataset and retrain.
Answer: C

Why C is correct: AutoML Tables supports custom optimization objectives to handle imbalance.

Why this answer

Option C is correct because AutoML Tables lets you choose an optimization objective suited to class imbalance without custom code. Setting the objective to maximize recall at a specified precision (here 0.5) tunes the model to catch as many defaults as possible while keeping at least half of its positive predictions correct, which addresses the need to find more defaults without a large increase in false positives.
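The criterion itself can be read as: among operating points whose precision meets the target, pick the one with the highest recall. The sketch below applies that rule to toy scores by scanning decision thresholds. It illustrates the selection criterion only and is not a description of AutoML's training procedure. The function name and data are invented for the example.

```python
def best_threshold(scores, labels, min_precision=0.5):
    """Return (threshold, recall, precision) with the highest recall among
    thresholds whose precision is at least min_precision, or None.

    Ties on recall are broken in favour of higher precision.
    """
    positives = sum(labels)
    best = None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        if tp + fp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / positives
        if precision < min_precision:
            continue
        if best is None or (recall, precision) > (best[1], best[2]):
            best = (threshold, recall, precision)
    return best


# Toy model scores and true labels (1 = default).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 0, 1, 0, 1, 0, 0, 0]

result = best_threshold(scores, labels)
print(result)
```

On this toy data every default is caught at a threshold of 0.30, with three of the five flagged cases being true defaults.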

Exam trap

Google Cloud often tests the misconception that manual data splitting or resampling is necessary for imbalanced data in AutoML, when in fact AutoML Tables provides built-in optimization objectives and weighted evaluation to handle imbalance without data manipulation.

How to eliminate wrong answers

Option A is wrong because AutoML Tables already performs stratified splitting by default; manually splitting does not change the model's training behavior or address the imbalance issue. Option B is wrong because it requires writing custom code (blending scripts), which violates the constraint of 'cannot write custom code' and is not a native AutoML Tables feature. Option D is wrong because under-sampling the majority class reduces the dataset size and discards valuable data, which can degrade model performance and is not recommended with AutoML Tables' built-in imbalance handling capabilities.

12
MCQ · easy

A company wants to implement a document processing solution that extracts key information from invoices and receipts. They have limited ML expertise and want to use a pre-trained solution as much as possible. Which Google Cloud service should they use?

A. Document AI with a pre-trained invoice processor.
B. AutoML Natural Language with custom entity extraction.
C. Vertex AI Workbench with custom Python scripts.
D. Cloud Vision API with OCR.
Answer: A

Why A is correct: Document AI offers specialized pre-trained processors for invoices.

Why this answer

Document AI with a pre-trained invoice processor is the correct choice because it provides a fully managed, pre-trained solution specifically designed for extracting structured data (e.g., vendor name, invoice number, line items) from invoices and receipts. This aligns with the company's limited ML expertise and desire to use a pre-trained solution, requiring no custom model training or complex coding.

Exam trap

Google Cloud often tests the distinction between raw OCR (Cloud Vision API) and structured document understanding (Document AI), leading candidates to mistakenly choose Cloud Vision API for invoice processing when they only need text extraction, not structured data extraction.

How to eliminate wrong answers

Option B is wrong because AutoML Natural Language with custom entity extraction requires users to train a custom model with labeled data, which contradicts the requirement to use a pre-trained solution as much as possible. Option C is wrong because Vertex AI Workbench with custom Python scripts demands significant ML expertise to write and deploy custom code, which the company lacks. Option D is wrong because Cloud Vision API with OCR only extracts raw text from images, not the structured key-value pairs or specific fields needed for invoice processing.

13
MCQ · medium

A company deploys an AutoML Vision model for real-time defect detection. They notice high inference latency during peak hours. Which configuration change can help?

A. Reduce the model's input resolution
B. Use batch prediction
C. Enable model compression
D. Increase the number of max replicas
Answer: D

Correct: Handles increased load with more parallelism.

Why this answer

Increasing the number of max replicas allows the AutoML Vision endpoint to scale horizontally during peak hours, distributing the inference load across more compute instances. This directly reduces per-request latency by preventing queuing and resource contention, as the Vertex AI Prediction service can spin up additional replicas up to the configured maximum to handle higher throughput.
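A back-of-the-envelope sizing calculation shows how a max-replica setting might be chosen. Every number here is an assumption for illustration (peak load, per-replica throughput, and a 70% utilisation target); real values would come from load testing the deployed model.

```python
import math


def replicas_needed(peak_qps, per_replica_qps, headroom=0.7):
    """Replicas required so each runs at no more than `headroom` utilisation."""
    return math.ceil(peak_qps / (per_replica_qps * headroom))


# Hypothetical figures: 400 requests/s at peak, 50 requests/s per replica.
max_replicas = replicas_needed(peak_qps=400, per_replica_qps=50)
print(max_replicas)  # → 12
```

If the configured maximum is below the figure this kind of estimate produces, requests queue at peak and latency rises, which is the symptom in the question.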

Exam trap

Google Cloud often tests the misconception that reducing input resolution or enabling compression is a safe latency fix, but the PMLE exam expects you to recognize that AutoML Vision models are black-box optimized and that horizontal scaling via max replicas is the proper architectural response to real-time latency spikes.

How to eliminate wrong answers

Option A is wrong because reducing input resolution may lower latency but at the cost of detection accuracy, which is unacceptable for defect detection where fine-grained features matter. Option B is wrong because batch prediction is designed for asynchronous, non-real-time processing of large datasets, not for reducing latency in real-time inference; it actually increases end-to-end latency. Option C is wrong because AutoML Vision models are already optimized by Google's neural architecture search, and enabling model compression (e.g., quantization) is not a supported configuration option for deployed AutoML Vision models; it would require retraining with a different model type.

14
MCQ · medium

A company uses AutoML Tables (Vertex AI AutoML for tabular data) to predict customer churn. Their dataset has 10,000 rows and 50 features. During training, they notice the model's performance is poor. Which approach is most likely to improve the model?

A. Enable automatic feature engineering transformations
B. Switch to BigQuery ML linear regression
C. Increase the training budget to 10 node hours
D. Remove 20 features to reduce noise
Answer: A

AutoML can create new features from existing ones automatically.

Why this answer

AutoML Tables (Vertex AI AutoML for tabular data) includes automatic feature engineering transformations such as scaling, one-hot encoding, and feature cross creation. These transformations are essential for capturing non-linear relationships and interactions between features, which can significantly improve model performance when the default preprocessing is insufficient. Enabling this option directly addresses the poor performance by allowing the model to learn more complex patterns from the data.

Exam trap

Google Cloud often tests the misconception that increasing training budget or reducing features is a universal fix for poor model performance, when in fact the most impactful first step is to enable automatic feature engineering to let the model learn better representations from the data.

How to eliminate wrong answers

Option B is wrong because switching to BigQuery ML linear regression would likely worsen performance, as linear regression assumes a linear relationship between features and target, which is rarely the case in churn prediction; AutoML is designed to handle non-linear patterns. Option C is wrong because increasing the training budget to 10 node hours does not address the root cause of poor performance—it only allows more time for training, but if the model's architecture or preprocessing is inadequate, more budget will not fix the underlying issue. Option D is wrong because removing 20 features arbitrarily may discard valuable information; AutoML Tables can handle high-dimensional data and automatically identify feature importance, so reducing features without analysis can harm performance.

15
MCQ · hard

The pipeline fails during the evaluate component with error "Model not found". What is the most likely cause?

A. The dataset_id is misspelled
B. The model_id parameter is referencing the wrong output
C. The training container did not produce a model artifact
D. The threshold value is invalid
Answer: B

Correct: Output name mismatch causes Model not found.

Why this answer

The error 'Model not found' during the evaluate component indicates that the model_id parameter is referencing an output that does not exist or is incorrectly named. In Vertex AI Pipelines, the evaluate step typically receives the model from a declared output of the upstream training task (in the Kubeflow Pipelines SDK, a reference of the form `train_task.outputs["..."]`), and if model_id points to the wrong output (e.g., a different step's output or a misspelled output name), the pipeline cannot locate the model. This is the most likely cause because the error is specific to model resolution, not dataset or threshold issues.

Exam trap

Google Cloud often tests the distinction between resource resolution errors (like 'Model not found') and data/validation errors, tricking candidates into confusing dataset or threshold issues with pipeline step output references.

How to eliminate wrong answers

Option A is wrong because a misspelled dataset_id would cause a 'Dataset not found' or data loading error, not a 'Model not found' error during evaluation. Option C is wrong because if the training container did not produce a model artifact, the pipeline would fail earlier in the training step with an artifact missing error, not during the evaluate component. Option D is wrong because an invalid threshold value would cause a validation or scoring error within the evaluate step, not a 'Model not found' error, which is a resource resolution issue.

16
MCQ · easy

A data scientist wants to quickly train a binary classification model on a tabular dataset stored in BigQuery without writing any code. They have limited ML experience. Which Google Cloud service should they use?

A. Vertex AI Workbench with a built-in scikit-learn notebook.
B. Dataflow with a TensorFlow pipeline.
C. BigQuery ML with CREATE MODEL statement using SQL.
D. AutoML Tables with a direct BigQuery connection.
Answer: C

BigQuery ML enables model creation with SQL, no coding required.

Why this answer

Option C is correct because BigQuery ML lets a data scientist train a binary classification model directly in BigQuery with a single `CREATE MODEL` SQL statement, without writing application code or moving data. This is the fastest low-code approach for users with limited ML experience, as it relies on familiar SQL syntax and runs entirely within BigQuery's serverless infrastructure.
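For a sense of scale, the sketch below assembles the kind of `CREATE MODEL` statement the explanation refers to. The dataset, table, and column names are placeholders invented for the example, and the helper only builds the SQL text. It does not connect to BigQuery.

```python
def create_model_sql(model_name, source_table, label_column):
    """Build a BigQuery ML statement for a logistic regression classifier.

    Identifiers are interpolated directly, so pass trusted names only.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        "OPTIONS (\n"
        "  model_type = 'LOGISTIC_REG',\n"
        f"  input_label_cols = ['{label_column}']\n"
        ")\n"
        "AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Placeholder names, not taken from the question.
sql = create_model_sql(
    model_name="my_dataset.churn_model",
    source_table="my_dataset.customers",
    label_column="churned",
)
print(sql)
```

The resulting statement could be pasted into the BigQuery console as-is, which is the sense in which the workflow is low-code.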

Exam trap

Google Cloud often tests the distinction between SQL-based 'low-code' services (BigQuery ML) and UI-driven 'no-code' services (AutoML). The trap here is choosing AutoML Tables, which involves more setup, when the data already lives in BigQuery and a single SQL statement is the fastest path.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench requires writing Python code (e.g., scikit-learn) and managing a notebook environment, which is not a no-code solution and exceeds the 'limited ML experience' constraint. Option B is wrong because Dataflow with a TensorFlow pipeline requires writing code for pipeline construction and model training, and is designed for stream/batch data processing, not for quick no-code model training. Option D is wrong because AutoML Tables, while low-code, requires exporting data from BigQuery or connecting via a separate interface, and involves a more complex workflow than directly using BigQuery ML's SQL-based training; the question specifies 'without writing any code' and 'quickly,' and BigQuery ML is the most direct path.

17
MCQ · medium

A retail company wants to build a customer churn prediction model using AutoML Tables. They have a dataset with 5000 rows and 50 features, including customer ID, transaction history, and support tickets. The target is a binary column 'churned'. After training, the model shows high accuracy but low recall for the churned class. What is the most likely cause?

A.The dataset is too small for AutoML to train effectively.
B.The features are not normalized, leading to biased predictions.
C.The churned class is underrepresented, causing the model to favor the majority class.
D.The dataset includes a unique customer ID feature, causing overfitting.
AnswerC

Class imbalance leads to high accuracy but low recall for minority class.

Why this answer

Option C is correct because in imbalanced datasets, AutoML Tables optimizes for overall accuracy, which can be high if the majority class dominates. Low recall for the churned class indicates the model predicts most instances as non-churned, a classic symptom of class imbalance. AutoML Tables provides class weighting and sampling options to mitigate this, but without them, the model favors the majority class.
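
The gap between accuracy and recall can be seen with a small worked example. The sketch below assumes a hypothetical 5% churn rate (250 of the 5000 rows), which the question does not state, and a model that catches only 25 of the churners.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Assumed split: 5000 customers, 250 of whom churned (label 1).
y_true = [1] * 250 + [0] * 4750
# A model that finds only 25 churners and predicts "not churned" for everyone else.
y_pred = [1] * 25 + [0] * 225 + [0] * 4750

acc = accuracy(y_true, y_pred)  # (25 + 4750) / 5000
rec = recall(y_true, y_pred)    # 25 / 250
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Under these assumed numbers the model is right on 95.5% of rows while finding only 10% of churners, which is the pattern the question describes.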

Exam trap

Google Cloud often tests the misconception that high accuracy always means a good model, trapping candidates who overlook class imbalance as the root cause of poor recall for the minority class.

How to eliminate wrong answers

Option A is wrong because 5000 rows is generally sufficient for AutoML Tables to train effectively, especially with 50 features; the issue is class imbalance, not dataset size. Option B is wrong because AutoML Tables automatically handles feature normalization internally, so unnormalized features do not cause biased predictions in this context. Option D is wrong because including a unique customer ID feature can cause overfitting, but the symptom described (high accuracy, low recall) is characteristic of class imbalance, not overfitting; overfitting would typically show high training accuracy but poor generalization, not specifically low recall for a minority class.

18
Multi-Selecteasy

Which THREE of the following are supported output types for BigQuery ML?

Select 3 answers
A.Classification
B.Object detection
C.Anomaly detection
D.Time-series forecasting
E.Regression
AnswersA, D, E

Classification is supported, e.g., through a logistic regression model.

Why this answer

BigQuery ML supports supervised learning tasks like classification and regression, as well as time-series forecasting, through its model types such as `LOGISTIC_REG`, `LINEAR_REG`, and `ARIMA_PLUS`. Classification (option A) is correct because BigQuery ML provides `LOGISTIC_REG` for binary and multi-class classification problems, outputting predicted labels or probabilities.

Exam trap

Google Cloud often tests the distinction between supported BigQuery ML output types and broader ML capabilities, leading candidates to mistakenly include object detection or anomaly detection, which are not native output types in BigQuery ML's SQL-based interface.

19
MCQhard

An e-commerce company deployed a Vertex AI AutoML Tables model to predict customer churn. The model is served via a private endpoint with a dedicated machine type n1-standard-4. After a week, they observe that 5% of predictions fail with 'Request timed out' error. The average prediction time is 1.2 seconds but spikes to 4 seconds during peak hours. The input data is 50 features. They have enabled autoscaling with a min node count of 1 and max of 5. Which action is most likely to resolve the timeout issue without increasing complexity?

A.Reduce the number of features to 30.
B.Increase the max node count to 10.
C.Enable model monitoring to detect data drift.
D.Change the machine type to n1-highmem-4 to increase memory.
AnswerB

More nodes can absorb traffic spikes and reduce timeout errors.

Why this answer

Option B is correct because increasing the max node count allows the endpoint to handle peak traffic better. Option A (reducing features) changes the model and may degrade performance. Option C (model monitoring) does not fix the timeout. Option D (increasing memory) does not address compute demand.

20
MCQeasy

A data analyst wants to create a classification model directly in BigQuery using SQL. Which feature should they use?

A.BigQuery ML
B.Vertex AI
C.Dataflow
D.Cloud ML Engine
AnswerA

BigQuery ML allows creating models using SQL.

Why this answer

BigQuery ML (BQML) enables users to create and execute machine learning models directly in BigQuery using standard SQL syntax, without needing to export data or manage separate ML infrastructure. For a data analyst who wants to build a classification model entirely within BigQuery, BQML provides the CREATE MODEL statement with classification algorithms like logistic regression or XGBoost, making it the correct and most direct feature.

Exam trap

Google Cloud often tests the distinction between services that run inside BigQuery (BQML) versus external ML platforms (Vertex AI), trapping candidates who think any ML service qualifies without checking if it operates directly via SQL in BigQuery.

How to eliminate wrong answers

Option B is wrong because Vertex AI is a full MLOps platform for training, deploying, and managing models, but it is a separate service from BigQuery and does not allow model creation directly in SQL within BigQuery. Option C is wrong because Dataflow is a stream and batch data processing service (based on Apache Beam) used for ETL and data pipelines, not for creating classification models. Option D is wrong because Cloud ML Engine (now part of Vertex AI) is a managed service for training and serving custom ML models, but it does not support SQL-based model creation inside BigQuery.

21
MCQmedium

Refer to the exhibit. A data engineer is defining a Vertex AI Pipeline step to train a model. The pipeline fails with an error: "Failed to create vertex ai custom job: Invalid resource name." What is the most likely cause of the error?

A.The container image URI is incorrect; it should be from gcr.io/vertex-ai/training.
B.The output artifact schema is missing the 'type' property.
C.The training_data input should be a Vertex AI Dataset resource, not a simple string.
D.The machine type n1-standard-4 is not supported for Vertex AI training.
AnswerC

The input expects a dataset resource name, not a raw string.

Why this answer

Option C is correct because Vertex AI Pipeline steps that use a CustomJob to train a model require the training data input to be a Vertex AI Dataset resource (a Dataset object), not a plain string. When a string is passed instead of a Dataset resource, the pipeline attempts to create a custom job with an invalid resource name, as the backend expects a properly formatted Dataset resource name (e.g., projects/{project}/locations/{location}/datasets/{dataset_id}). This mismatch triggers the 'Invalid resource name' error.

Exam trap

Google Cloud often tests the distinction between raw data inputs (like strings or URIs) and managed Vertex AI resources (like Datasets), leading candidates to overlook that the pipeline component expects a resource object, not a simple string.

How to eliminate wrong answers

Option A is wrong because the container image URI does not need to be from gcr.io/vertex-ai/training; any valid container image URI (e.g., from Artifact Registry or a custom registry) is acceptable as long as it is accessible and correctly formatted. Option B is wrong because the output artifact schema's 'type' property is optional in Vertex AI Pipelines; missing it does not cause an 'Invalid resource name' error, which is specific to resource naming issues. Option D is wrong because n1-standard-4 is a supported machine type for Vertex AI training; the error is about resource naming, not machine type availability.

22
Matchingmedium

Match each model evaluation metric to its use case.


Concepts
Matches

Measure of false positives in classification

Measure of false negatives in classification

Harmonic mean of precision and recall

Root mean squared error for regression

Cross-entropy loss for probabilistic classification

Why these pairings

Metrics are critical for model selection and tuning.
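
The descriptions above correspond to metrics that are conventionally called precision, recall, F1 score, RMSE, and log loss. The concept labels are not shown in the list, so those names are an assumption about the intended pairings. A minimal sketch of the standard formulas, using made-up counts and values:

```python
import math


def precision(tp, fp):
    """Share of predicted positives that are correct; drops as false positives rise."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that were found; drops as false negatives rise."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def rmse(y_true, y_pred):
    """Root mean squared error for regression."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def log_loss(y_true, p_pred):
    """Binary cross-entropy; p_pred holds predicted probabilities of class 1."""
    total = sum(t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, p_pred))
    return -total / len(y_true)


# Illustrative counts: 8 true positives, 2 false positives, 8 false negatives.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=8)
f1 = f1_score(p, r)
err = rmse([3.0, 5.0], [2.0, 7.0])
ll = log_loss([1, 0], [0.5, 0.5])
```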

23
Multi-Selectmedium

A manufacturing company uses AutoML Tables to predict equipment failure. They want to improve model performance without increasing manual effort. Which three actions should they take? (Choose THREE.)

Select 3 answers
A.Perform feature engineering using Vertex AI Feature Store.
B.Use BigQuery to aggregate sensor data before training.
C.Enable early stopping to prevent overfitting.
D.Deploy the model on a larger machine type to speed up inference.
E.Increase the training budget (node hours) for AutoML.
AnswersA, C, E

Feature Store helps create and manage features with minimal code.

Why this answer

Option A is correct because Vertex AI Feature Store enables feature engineering and reuse without manual effort, allowing the team to create, store, and serve features consistently for AutoML Tables, which can improve model performance by providing more relevant input data. This aligns with the goal of reducing manual work while enhancing model accuracy through automated feature management.

Exam trap

Google Cloud often tests the distinction between actions that improve model performance (like feature engineering and training budget) versus actions that affect deployment or inference speed, leading candidates to mistakenly choose options like deploying on a larger machine type.

24
Multi-Selecthard

Which THREE actions can help improve the performance of a BigQuery ML model?

Select 3 answers
A.Increase the amount of training data
B.Replace the model with an AutoML model via export
C.Use hypertuning to optimize model parameters
D.Increase the time interval for prediction
E.Perform feature engineering in SQL
AnswersA, C, E

More data often improves model accuracy.

Why this answer

Increasing the amount of training data provides the model with more examples to learn from, which can reduce overfitting and improve generalization, especially for complex patterns. In BigQuery ML, more data often leads to better feature representation and higher accuracy, as long as the data is clean and relevant.

Exam trap

Google Cloud often tests the misconception that exporting a model to AutoML is a valid optimization step, but in reality, BigQuery ML and AutoML are separate services with incompatible model formats and training workflows.

25
MCQmedium

A retailer uses BigQuery ML to build a linear regression model for sales forecasting. The model's evaluation shows high RMSE. Which step should they take first?

A.Use a more complex model like XGBoost
B.Increase the number of features
C.Set a larger training budget
D.Examine the data for outliers and missing values
AnswerD

Correct: Data quality inspection is the first step.

Why this answer

High RMSE in a linear regression model often indicates issues with data quality, such as outliers or missing values, which can disproportionately skew the model's predictions. BigQuery ML's linear regression is sensitive to such anomalies, so examining and cleaning the data is the most appropriate first step before considering model complexity or feature engineering.
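
A first-pass data quality check can be sketched in a few lines. The example below is only an illustration in Python with made-up sales values; in BigQuery the same checks would normally be written in SQL. It counts missing entries and flags outliers with the common 1.5 × IQR rule, which is one of several reasonable choices.

```python
import statistics


def summarize_quality(values):
    """Count missing entries and flag outliers using the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return missing, outliers


# Hypothetical daily sales figures with two gaps and one suspicious spike.
sales = [100, 102, 98, None, 101, 99, 103, 97, 5000, None]
missing, outliers = summarize_quality(sales)
print(missing, outliers)
```

A single extreme value like the spike here can dominate a squared-error metric such as RMSE, which is why this check comes before changing the model.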

Exam trap

Google Cloud often tests the misconception that high RMSE is always a model complexity issue, leading candidates to jump to advanced algorithms or feature engineering without considering fundamental data quality checks.

How to eliminate wrong answers

Option A is wrong because switching to a more complex model like XGBoost without first addressing data quality issues can amplify overfitting and does not fix the root cause of high RMSE. Option B is wrong because blindly increasing the number of features can introduce noise and multicollinearity, potentially worsening RMSE rather than improving it. Option C is wrong because setting a larger training budget in BigQuery ML does not improve model accuracy; it only allocates more resources for training, which is irrelevant when the issue stems from data problems.

26
MCQeasy

A company needs to extract entities (e.g., names, dates) from customer emails using a pre-trained model. Which service should they use?

A.Translation API
B.Natural Language API
C.Dialogflow
D.Vision API
AnswerB

Natural Language API can extract entities from text.

Why this answer

The Natural Language API provides entity extraction as a pre-trained model. Vision API is for images, Translation API for translation, and Dialogflow for conversational agents.

27
MCQeasy

A small business wants to build a sentiment analysis model for customer reviews without writing any code. They have a small labeled dataset with 500 positive and 500 negative reviews. Which Google Cloud service should they use?

A.AutoML Natural Language
B.Natural Language API
C.Vertex AI custom training with PyTorch
D.BigQuery ML with logistic regression
AnswerA

Allows training a custom model with a GUI and no code.

Why this answer

AutoML Natural Language is the correct choice because it allows the business to train a custom sentiment analysis model using their own labeled dataset without writing any code. It provides a low-code interface for uploading data, training, and deploying the model, which aligns with the requirement of no coding and a small labeled dataset.

Exam trap

The trap here is that candidates often confuse the pre-trained Natural Language API with AutoML Natural Language, assuming the API can be customized with labeled data, but the API is fixed and cannot be retrained, while AutoML is designed for custom model training without code.

How to eliminate wrong answers

Option B is wrong because the Natural Language API is a pre-trained model that cannot be fine-tuned with custom labeled data; it only offers general sentiment analysis and would not leverage the business's specific 500/500 dataset. Option C is wrong because Vertex AI custom training with PyTorch requires writing code to define the model architecture and training loop, which violates the 'without writing any code' constraint. Option D is wrong because BigQuery ML with logistic regression is designed for structured tabular data, not for text sentiment analysis, and it would require feature engineering and SQL-based model definition, which is not a true no-code solution for text.

28
MCQhard

A data scientist uses Vertex AI Pipelines to orchestrate an ML workflow. They want to reuse a component from Google's curated repository. What is the recommended way to incorporate it?

A.Import the component from Google Cloud Build
B.Use the 'aiplatform' Python SDK to define the component
C.Use prebuilt components from the Google Cloud Pipeline Components repository
D.Copy the component code into the pipeline definition
AnswerC

These are officially maintained and can be directly used in Vertex AI Pipelines.

Why this answer

Option C is correct because Google provides a curated set of prebuilt components in the Google Cloud Pipeline Components repository, which are designed to be directly imported and used within Vertex AI Pipelines. These components encapsulate common ML tasks (e.g., model training, deployment) and are maintained by Google, ensuring compatibility and reducing custom code. Using them is the recommended approach to avoid reinventing the wheel and to leverage Google's best practices.

Exam trap

The trap here is that candidates may confuse the 'aiplatform' SDK (used for direct API calls) with the pipeline components SDK, or assume that copying code is acceptable for reusability, when Google specifically recommends using the curated prebuilt components to ensure compatibility and reduce maintenance overhead.

How to eliminate wrong answers

Option A is wrong because Google Cloud Build is a CI/CD service for building and testing code, not a repository for reusable pipeline components; importing a component from Cloud Build would not provide the curated, prebuilt component logic needed for Vertex AI Pipelines. Option B is wrong because the 'aiplatform' Python SDK is used to interact with Vertex AI services (e.g., creating datasets, jobs) but does not define or import prebuilt pipeline components; defining a component from scratch would bypass the curated repository. Option D is wrong because copying component code into the pipeline definition defeats the purpose of reusability and maintainability, and it is not the recommended method; the curated repository provides versioned, tested components that should be referenced rather than duplicated.

29
MCQhard

A team is using Vertex AI AutoML to train a forecasting model. They need to retrain the model weekly and only if the new week's data significantly changes the data distribution. What is the most efficient way to achieve this?

A.Use a scheduled pipeline that always retrains
B.Use Cloud Monitoring alerts on data drift to trigger retraining
C.Use Vertex AI Model Monitoring to detect drift and trigger a pipeline
D.Use Cloud Functions on schedule to compare distributions
AnswerC

Correct: Native drift detection and retriggering.

Why this answer

Option C is correct because Vertex AI Model Monitoring can be configured to detect data drift on the model's input features, and when drift exceeds a threshold, it can trigger a Cloud Function or a Vertex AI pipeline to retrain the model. This approach avoids unnecessary retraining when the data distribution has not changed significantly, which is more efficient than always retraining. The integration with Cloud Functions or Pub/Sub allows for a serverless, event-driven retraining pipeline that only runs when needed.
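
The "retrain only on significant drift" decision can be illustrated with a small stand-in. The sketch below compares a baseline feature histogram with the new week's histogram using Jensen-Shannon divergence and a threshold. The histograms and the 0.1 threshold are made up, and this is not the managed service's implementation; it only shows the kind of check that the service performs for you.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def should_retrain(baseline, current, threshold=0.1):
    """Trigger retraining only when drift exceeds the (assumed) threshold."""
    return js_divergence(baseline, current) > threshold


baseline = [0.25, 0.25, 0.25, 0.25]      # training-time distribution of one feature
same_week = [0.25, 0.25, 0.25, 0.25]     # no change: skip retraining
shifted_week = [0.70, 0.10, 0.10, 0.10]  # clear shift: retrain

print(should_retrain(baseline, same_week), should_retrain(baseline, shifted_week))
```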

Exam trap

Google Cloud often tests the distinction between infrastructure monitoring (Cloud Monitoring) and model-specific monitoring (Vertex AI Model Monitoring), and candidates mistakenly choose Cloud Monitoring because they think it can detect data drift, but it lacks the statistical algorithms needed for feature-level distribution comparison.

How to eliminate wrong answers

Option A is wrong because a scheduled pipeline that always retrains ignores the requirement to retrain only when the new week's data significantly changes the data distribution, leading to wasted compute resources and potential model instability from unnecessary retraining. Option B is wrong because Cloud Monitoring alerts are designed for infrastructure and application metrics (e.g., CPU, latency), not for detecting data drift in model features; data drift detection requires feature-level statistical analysis, which is not a built-in capability of Cloud Monitoring. Option D is wrong because using Cloud Functions on a schedule to compare distributions would require custom code to compute statistical tests (e.g., KS test) and manage state, which is less efficient and more error-prone than using Vertex AI Model Monitoring's managed drift detection service.

30
Multi-Selecthard

Which THREE of the following are valid best practices when using Vertex AI AutoML for tabular data?

Select 3 answers
A.Normalize the data into multiple tables to reduce data size
B.Enable automatic feature engineering to improve model performance
C.Disable early stopping for best model quality if budget allows
D.Use the max time budget parameter to control costs
E.Keep training data with heavy class imbalance as-is to let AutoML correct it
AnswersB, C, D

Creates cross features and handles missing values.

Why this answer

Option B is correct because enabling automatic feature engineering in Vertex AI AutoML for tabular data allows the service to automatically create, select, and transform features (e.g., polynomial combinations, cross features, and numerical transformations) to improve model accuracy without manual intervention. This is a built-in capability that leverages Google's AutoML algorithms to discover the most predictive feature representations from the raw data.

Exam trap

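Google Cloud often tests whether candidates know what AutoML automates and what it does not: splitting the data into multiple normalized tables (Option A) and leaving heavy class imbalance untouched (Option E) both look like ways to let AutoML handle the problem, but neither is a best practice. The max time budget is primarily a cost control, and disabling early stopping is only worthwhile when the budget allows for the extra training time.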

31
MCQmedium

A retail company wants to build a product recommendation system using BigQuery ML for their e-commerce platform. The data includes customer purchase history, product metadata, and clickstream logs. The ML engineer needs to minimize manual feature engineering and leverage pre-built solutions. Which approach should the engineer take?

A.Use a pre-built recommendation model from Vertex AI Model Garden and deploy it to an endpoint.
B.Write a custom TensorFlow model using the Vertex AI Training service and deploy it via Vertex AI Prediction.
C.Export the data to CSV and use AutoML Tables to train a recommendation model.
D.Use BigQuery ML's matrix factorization model (CREATE MODEL with model_type='matrix_factorization') to train directly on historical interaction data.
AnswerD

BigQuery ML provides low-code matrix factorization for recommendations.

Why this answer

Option D is correct because BigQuery ML's matrix factorization model (model_type='matrix_factorization') is purpose-built for recommendation systems using implicit or explicit feedback data. It trains directly on historical interaction data (e.g., user-item purchases) without requiring manual feature engineering, which keeps the solution low-code. This approach leverages BigQuery's native SQL interface and scales automatically, making it ideal for the described e-commerce scenario.
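
To show what matrix factorization does conceptually, here is a tiny generic version trained with stochastic gradient descent on made-up (user, item, rating) triples. It is a teaching sketch only: the data, hyperparameters, and training procedure are assumptions and are not the algorithm BigQuery ML runs internally.

```python
import random


def factorize(ratings, n_users, n_items, k=2, steps=200, lr=0.05, reg=0.02, seed=0):
    """Learn user and item factor vectors by SGD on observed (user, item, value) triples."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def predict(U, V, u, i):
    """Predicted score is the dot product of the user and item vectors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


def mse(ratings, U, V):
    """Mean squared error over the observed interactions."""
    return sum((r - predict(U, V, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Hypothetical interactions: (user index, item index, rating).
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
U0, V0 = factorize(ratings, 3, 3, steps=0)  # untrained factors
U, V = factorize(ratings, 3, 3)             # trained factors
print(mse(ratings, U0, V0), mse(ratings, U, V))
```

Once trained, `predict` can score user-item pairs that never appeared in the data, which is what turns the factors into recommendations.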

Exam trap

The trap here is that candidates may assume Vertex AI Model Garden (Option A) is the go-to for pre-built ML, but it does not offer a pre-trained recommendation model that can be directly deployed without custom training on the company's data.

How to eliminate wrong answers

Option A is wrong because Vertex AI Model Garden provides pre-built models for tasks like vision or NLP, not a ready-to-use recommendation model that can be directly deployed without training on the company's specific interaction data. Option B is wrong because writing a custom TensorFlow model and training it via Vertex AI Training contradicts the requirement to minimize manual feature engineering and leverage pre-built solutions. Option C is wrong because exporting data to CSV and using AutoML Tables would require additional data preparation and does not natively handle the user-item interaction structure as efficiently as BigQuery ML's matrix factorization, which operates directly on the data in place.

32
MCQeasy

A marketing team wants to use a pre-built natural language processing (NLP) model from Vertex AI Model Garden to analyze customer feedback. They need to extract sentiment from text data stored in Cloud Storage. The team has no experience with model serving infrastructure. Which deployment option minimizes operational overhead?

A.Deploy the model as a Cloud Function invoked by Cloud Storage events.
B.Deploy the model as a Cloud Run service using a custom Docker container.
C.Deploy the model on App Engine flexible environment.
D.Deploy the model to a Vertex AI Endpoint directly from Model Garden.
AnswerD

Simplest deployment with managed infrastructure.

Why this answer

Option D is correct because deploying directly to a Vertex AI Endpoint from Model Garden eliminates all infrastructure management. Vertex AI handles model serving, scaling, and monitoring automatically, which is ideal for a team with no experience in model serving infrastructure. This is a fully managed, serverless deployment that requires no containerization or server configuration.

Exam trap

The trap here is that candidates often assume Cloud Functions or Cloud Run are simpler because they are 'serverless,' but they fail to recognize that deploying a large NLP model requires specialized infrastructure (GPUs, model serving frameworks) that these services do not natively provide without significant custom work.

How to eliminate wrong answers

Option A is wrong because Cloud Functions are designed for lightweight, stateless event-driven code, not for hosting large NLP models with significant memory and GPU requirements; they also lack built-in model serving capabilities like autoscaling for inference. Option B is wrong because deploying as a Cloud Run service with a custom Docker container requires the team to containerize the model, manage dependencies, and configure scaling, which introduces significant operational overhead for a team with no serving experience. Option C is wrong because App Engine flexible environment still requires the team to build a custom runtime, manage instances, and handle model dependencies, and it is not optimized for ML inference workloads like Vertex AI endpoints.

33
Multi-Selecthard

A team is architecting a low-code ML system for real-time predictions with AutoML. Which THREE considerations are critical for production?

Select 3 answers
A.Enable autoscaling for the endpoint
B.Set up alerts for model performance degradation
C.Monitor prediction drift with Vertex AI Model Monitoring
D.Use a custom container for prediction
E.Use global model endpoints for low latency everywhere
AnswersA, B, C

Correct: Essential for handling variable traffic.

Why this answer

A is correct because autoscaling ensures the prediction endpoint can handle variable request loads without manual intervention, which is critical for production real-time systems. In Vertex AI, you can configure autoscaling with a target utilization level (e.g., 60%) to automatically adjust the number of compute nodes based on incoming traffic, preventing both over-provisioning and latency spikes.
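
The effect of a utilization target and a node cap can be sketched with a simple proportional rule, similar in spirit to the one used by the Kubernetes Horizontal Pod Autoscaler. This is an assumption for illustration, not Vertex AI's documented scaling algorithm, and the utilization figures are made up.

```python
import math


def desired_replicas(current, utilization_pct, target_pct=60, min_nodes=1, max_nodes=5):
    """Scale replicas in proportion to utilization, clamped to [min_nodes, max_nodes]."""
    raw = math.ceil(current * utilization_pct / target_pct)
    return max(min_nodes, min(max_nodes, raw))


print(desired_replicas(2, 90))                # moderate load: scale out
print(desired_replicas(5, 95))                # peak load: capped at max_nodes
print(desired_replicas(5, 95, max_nodes=10))  # higher cap lets the endpoint keep scaling
```

When the cap is reached, extra traffic has nowhere to go, so latency rises even though autoscaling is enabled.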

Exam trap

The trap here is that candidates confuse 'low-code' with 'no-code' and assume custom containers (Option D) are always required for production, when AutoML actually abstracts away container management, and they also mistakenly think a single global endpoint inherently provides low latency, ignoring the need for regional deployment and traffic routing.

34
MCQeasy

A user receives this error when deploying an AutoML model. What should they do?

A.Change the region to us-west1
B.Use machine type n1-highmem-2
C.Increase the min-replica-count to 2
D.Remove the traffic-split flag
AnswerB

Correct: n1-highmem-2 is a supported machine type for AutoML.

Why this answer

The error occurs because AutoML models require a machine type with sufficient memory to load the model and perform predictions. The default machine type may not have enough memory for the model's size, leading to an out-of-memory (OOM) error. Using `n1-highmem-2` provides higher memory per core, which resolves the memory constraint without changing other deployment parameters.

Exam trap

The trap here is that candidates often confuse scaling (increasing replicas) with resource allocation (increasing memory per replica), leading them to choose Option C instead of addressing the per-instance memory bottleneck.

How to eliminate wrong answers

Option A is wrong because changing the region to `us-west1` does not affect the memory capacity of the machine type; the error is due to insufficient memory, not regional availability or latency. Option C is wrong because increasing `min-replica-count` to 2 only adds more replicas for scaling, but each replica still uses the same underpowered machine type, so the OOM error persists. Option D is wrong because removing the `traffic-split` flag would disrupt traffic routing but does not address the root cause of insufficient memory for model loading.

35
MCQhard

A company has a pipeline that uses Vertex AI Pipelines to fetch data from BigQuery, preprocess it with Dataflow, train an AutoML model, and deploy it. They want to reduce cloud costs. The pipeline runs hourly. Which change will most reduce compute costs while maintaining throughput?

A.Decrease the AutoML training budget from 10 to 1 node hour
B.Replace Dataflow preprocessing with a Cloud Function that runs on each file upload
C.Increase Dataflow batch size to process more data per worker
D.Switch from Vertex AI Pipelines to Cloud Composer for orchestration
AnswerC

Reduces the number of worker instances needed.

Why this answer

Option C is correct because increasing the Dataflow batch size allows each worker to process more data per batch, reducing the number of workers needed and the total compute time for the same throughput. This directly lowers Dataflow's compute cost without affecting the pipeline's hourly schedule or the AutoML training budget.

Exam trap

The trap here is that candidates assume reducing AutoML node hours (Option A) is the most direct way to cut costs, but the question specifies 'maintaining throughput' and the pipeline runs hourly, so Dataflow preprocessing is the dominant cost driver, not the model training budget.

How to eliminate wrong answers

Option A is wrong because decreasing the AutoML training budget from 10 to 1 node hour would severely degrade model quality, as AutoML requires sufficient training time to converge, and this does not address the main cost driver (Dataflow preprocessing). Option B is wrong because replacing Dataflow with a Cloud Function triggered on file upload is event-driven and not suitable for the hourly batch pipeline; Cloud Functions (1st gen) have a maximum timeout of 9 minutes and are not designed for large-scale preprocessing, so throughput would drop and costs could increase due to per-invocation overhead. Option D is wrong because switching from Vertex AI Pipelines to Cloud Composer (managed Airflow) adds orchestration complexity and cost (e.g., environment nodes) without reducing compute costs for Dataflow or AutoML; the orchestration layer is not the primary cost driver.

36
MCQhard

An organization uses Vertex AI Pipelines to automate a model training workflow. They want to reuse previously trained models if the data hasn't changed. Which pipeline component best achieves this?

A.Use a caching mechanism in Vertex AI Pipelines
B.Use a Cloud Function to check BigQuery update time
C.Use Artifact Registry to store model versions
D.Use a conditional component that checks data hash
AnswerD

Correct: Conditional logic can skip training when data unchanged.

Why this answer

A conditional component can check whether the data has changed and skip training if it has not. Caching exists but does not cover the whole pipeline; a Cloud Function check is external to the pipeline; and Artifact Registry stores model versions but does not decide whether to retrain.
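
The data-hash check itself is simple. The sketch below shows only the decision logic in plain Python, with made-up rows and a hypothetical stored hash; in a real pipeline this logic would live inside a component whose output feeds a conditional step.

```python
import hashlib


def data_fingerprint(rows):
    """SHA-256 over the rows in order; any change to the data changes the digest."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def needs_training(rows, last_trained_hash):
    """Train only when the current data differs from what the last model saw."""
    return data_fingerprint(rows) != last_trained_hash


rows_v1 = [(1, "a", 10.0), (2, "b", 12.5)]
stored_hash = data_fingerprint(rows_v1)  # recorded when the model was last trained
rows_v2 = rows_v1 + [(3, "c", 9.0)]      # a new row arrives

print(needs_training(rows_v1, stored_hash), needs_training(rows_v2, stored_hash))
```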

37
MCQmedium

A company uses Vertex AI AutoML to train a vision model, but the model has low accuracy. What should they do first?

A.Add more labeled images to the dataset
B.Switch to a custom model
C.Increase the training budget
D.Reduce image size to speed up training
AnswerA

More data often improves model accuracy.

Why this answer

Adding more labeled images directly addresses the most common cause of low accuracy in AutoML vision models: insufficient or unrepresentative training data. Vertex AI AutoML relies on transfer learning from pre-trained models, and its performance is heavily dependent on the quality and quantity of labeled examples. Before adjusting hyperparameters or infrastructure, the first step should always be to improve the dataset, as AutoML is designed to handle model architecture and training budget automatically.

Exam trap

Google Cloud often tests the misconception that AutoML models are 'black boxes' where tuning budgets or switching to custom models is the first fix, when in reality the platform is optimized to handle those aspects automatically, and the primary lever is data quality.

How to eliminate wrong answers

Option B is wrong because switching to a custom model would require manual architecture design and hyperparameter tuning, which contradicts the low-code premise of AutoML and is not the first troubleshooting step. Option C is wrong because increasing the training budget (e.g., node hours) only helps if the model has not converged; with low accuracy, the root cause is typically data quality, not insufficient training time. Option D is wrong because reducing image size may speed up training but can discard critical features, further degrading accuracy; AutoML already handles resizing internally.

38
MCQmedium

A retail company wants to build a product recommendation system using customer purchase history and product attributes. They have limited ML expertise and want to minimize custom code. Which approach should they choose?

A.Use BigQuery ML to create a matrix factorization model.
B.Use Vertex AI Vizier for hyperparameter tuning on a pre-built recommendation model.
C.Use Vertex AI AutoML Tables to train a recommendation model.
D.Use TensorFlow with Keras to build a custom collaborative filtering model.
AnswerC

AutoML Tables can build a recommendation model from tabular data with minimal code.

Why this answer

Vertex AI AutoML Tables is the correct choice because it enables building a recommendation model with minimal ML expertise and custom code, leveraging automated feature engineering, model selection, and hyperparameter tuning on tabular data (customer purchase history and product attributes). It requires no custom code, unlike TensorFlow/Keras, and provides a managed service that handles data preprocessing and training, aligning with the company's limited ML expertise and desire to minimize custom code.

Exam trap

Google Cloud often tests the distinction between model training services (AutoML, BigQuery ML) and optimization/tuning services (Vizier), leading candidates to confuse Vizier as a complete model-building solution when it only tunes hyperparameters for an existing model.

How to eliminate wrong answers

Option A is wrong because BigQuery ML's matrix factorization model is designed for explicit feedback (e.g., ratings) and requires structured SQL-based feature engineering, which still demands ML knowledge and custom SQL code, not a fully low-code solution. Option B is wrong because Vertex AI Vizier is a hyperparameter tuning service, not a model training service; it cannot build a recommendation model on its own and requires a pre-built model to tune, which the company lacks. Option D is wrong because TensorFlow with Keras requires significant custom code and ML expertise to implement collaborative filtering, contradicting the requirement to minimize custom code and limited ML expertise.

39
MCQeasy

A company wants to predict customer churn using a dataset with 10,000 rows and 20 features. They have no ML expertise. Which low-code solution should they use?

A.Kubeflow Pipelines
B.Custom TensorFlow model
C.BigQuery ML
D.Vertex AI AutoML Tables
AnswerD

AutoML Tables provides automated model training and deployment without requiring deep ML knowledge.

Why this answer

Vertex AI AutoML Tables is the correct low-code solution because it allows users with no ML expertise to train high-quality tabular models on structured data (10,000 rows, 20 features) without writing any code. It automates feature engineering, model selection, and hyperparameter tuning, and provides a simple UI to upload data and get predictions. This directly matches the requirement of a low-code, no-expertise solution for a tabular churn prediction problem.

Exam trap

Google Cloud often tests the distinction between low-code/no-code solutions (like AutoML Tables) and platforms that still require coding or infrastructure expertise (like Kubeflow or custom TensorFlow), leading candidates to pick a technically capable but overly complex option.

How to eliminate wrong answers

Option A is wrong because Kubeflow Pipelines is a platform for building and deploying ML pipelines that requires significant coding and Kubernetes expertise, making it unsuitable for users with no ML expertise. Option B is wrong because a custom TensorFlow model requires writing Python code, defining neural network architectures, and tuning hyperparameters, which demands ML expertise. Option C is wrong because BigQuery ML is a low-code option for SQL-based ML, but it requires knowledge of SQL and ML concepts (e.g., creating models with CREATE MODEL statements), and it is less automated than AutoML Tables for users with zero ML background.

40
MCQhard

A financial institution uses BigQuery ML to train a linear regression model to predict loan default risk. The model is trained on a dataset with 100 million rows and 50 features. During inference, the engineer uses the ML.PREDICT function. However, the query takes several minutes to run and times out frequently. The data is static and updated monthly. What is the most cost-effective and low-code solution to improve prediction latency?

A.Export the trained model as a SQL function using the EXPORT MODEL statement, then use it for predictions.
B.Create a Dataflow pipeline to precompute predictions and store them in a separate table.
C.Use a materialized view to precompute the prediction features.
D.Increase the BigQuery compute capacity by reserving more slots.
AnswerA

Exports model as a persistent function for faster inference.

Why this answer

Option A is correct because exporting the trained model as a SQL function via `EXPORT MODEL` converts the linear regression coefficients into a persistent SQL UDF, eliminating the overhead of model loading and serialization during each `ML.PREDICT` call. This approach is low-code (no external pipeline) and cost-effective since predictions are executed as standard SQL without consuming BigQuery ML slot resources for model inference.

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (more slots) or adding external pipelines (Dataflow) is the default solution for ML inference latency, when the correct low-code approach is to leverage BigQuery's native model export to SQL functions for static or batch-updated models.

How to eliminate wrong answers

Option B is wrong because creating a Dataflow pipeline introduces additional operational complexity, cost, and latency for a static dataset that is updated only monthly — the precomputed predictions would be stale until the next batch run, and the pipeline adds unnecessary engineering overhead. Option C is wrong because a materialized view can only store precomputed query results, not the prediction logic itself; it would still require calling `ML.PREDICT` on each refresh, and materialized views cannot directly invoke ML functions without incurring the same inference overhead. Option D is wrong because increasing BigQuery compute capacity (reserving more slots) only addresses resource contention, not the fundamental latency caused by model loading and inference overhead in `ML.PREDICT`; it is also the least cost-effective solution as it incurs ongoing slot costs without fixing the root cause.

41
MCQmedium

What is the most likely cause of the error?

A.The data split column contains only NULL values, so no rows are assigned to the training set
B.The model type 'linear_reg' is incompatible with the column 'price' because of missing values
C.The model creation does not have permission to read the dataset in BigQuery
D.The model creation did not specify a training budget, so default is insufficient
AnswerA

A custom split needs non-NULL values in the split column; with only NULLs, no rows reach the training set.

Why this answer

Option A is correct because when the data split column contains only NULL values, BigQuery ML cannot assign any rows to the training set. The `DATA_SPLIT_METHOD` using a custom column requires non-NULL values in that column to partition data into training and evaluation sets; if all values are NULL, the training set receives zero rows, causing the model creation to fail with an error about insufficient training data.

Exam trap

Google Cloud often tests the subtle distinction between missing values in the label column (which are handled gracefully) versus missing values in the data split column (which can cause a complete failure), leading candidates to incorrectly blame missing values in the target column.

How to eliminate wrong answers

Option B is wrong because the `linear_reg` model type is fully compatible with the `price` column even if it has missing values; BigQuery ML handles NULLs in the label column by excluding those rows during training, but the error here is about no training rows, not missing values. Option C is wrong because if the user lacked permission to read the dataset, the error would be a permissions-related message (e.g., 'Access Denied'), not a training set size error. Option D is wrong because BigQuery ML does not require a training budget for linear regression models; the default settings are sufficient, and the error is not budget-related.
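
The failure mode can be illustrated with a small simulation. This is a sketch in plain Python, not BigQuery ML itself: the `partition` helper, the `split` column name, and the value that marks a training row are all assumptions made for illustration.

```python
def partition(rows, split_col, train_value):
    """Send rows whose split value equals train_value to training.

    Every other row, including rows where the split value is None
    (NULL), is held out of the training set.
    """
    train, held_out = [], []
    for row in rows:
        if row.get(split_col) == train_value:
            train.append(row)
        else:
            held_out.append(row)
    return train, held_out


# Hypothetical table where the split column is NULL on every row.
all_null_rows = [
    {"price": 10.0, "split": None},
    {"price": 12.5, "split": None},
]
train, held_out = partition(all_null_rows, "split", train_value=False)

# The same table after one row is explicitly marked for training.
fixed_rows = all_null_rows + [{"price": 9.0, "split": False}]
train_fixed, _ = partition(fixed_rows, "split", train_value=False)
```

With an all-NULL split column `train` is empty, which is the situation the error describes; once at least one row carries the training value, `train_fixed` is no longer empty.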

42
MCQhard

A financial institution wants to use Natural Language API for sentiment analysis on customer feedback, but the domain-specific language (e.g., 'bullish', 'bearish') is not correctly classified. They have 200 labeled examples. Which approach minimizes coding effort while improving accuracy?

A.Submit a feature request to Google for domain-specific terms
B.Create a custom sentiment dictionary and pass it to the Natural Language API
C.Build a custom TensorFlow model for sentiment
D.Use AutoML Natural Language to train a custom model
AnswerD

No-code training on labeled data for improved accuracy.

Why this answer

Option D is correct because AutoML Natural Language enables you to train a custom model on your 200 labeled examples without writing code, directly improving accuracy for domain-specific terms like 'bullish' and 'bearish'. This approach leverages transfer learning from Google's pre-trained models, minimizing coding effort while adapting to your unique vocabulary and sentiment patterns.

Exam trap

Google Cloud often tests the misconception that the Natural Language API supports custom dictionaries or rule-based overrides, when in fact it only offers a fixed pre-trained model, making AutoML the correct low-code path for domain adaptation.

How to eliminate wrong answers

Option A is wrong because submitting a feature request to Google for domain-specific terms is not a practical solution—Google does not provide custom term updates for individual customers, and the turnaround time is indefinite. Option B is wrong because the Natural Language API does not accept a custom sentiment dictionary; it only supports a static, built-in sentiment model, and passing a dictionary is not a supported feature. Option C is wrong because building a custom TensorFlow model requires significant coding effort, including data preprocessing, model architecture design, training, and deployment, which contradicts the goal of minimizing coding effort.

43
MCQeasy

Refer to the exhibit. What does this command do?

A.Trains a new BigQuery ML model
B.Exports the model to Cloud Storage
C.Evaluates the model's performance
D.Makes predictions using the model
AnswerD

ML.PREDICT generates predictions.

Why this answer

The command shown in the exhibit is a BigQuery ML prediction query (e.g., `SELECT * FROM ML.PREDICT(MODEL mydataset.mymodel, ...)`). This command uses a trained model to generate predictions on new input data, making option D correct. It does not train, export, or evaluate the model.

Exam trap

Google Cloud often tests the distinction between the four key BigQuery ML commands (`CREATE MODEL`, `ML.EVALUATE`, `ML.PREDICT`, `EXPORT MODEL`), and the trap here is confusing the prediction function with the evaluation function, especially when the exhibit shows a query that looks like it might be evaluating performance due to the presence of a model name and input data.

How to eliminate wrong answers

Option A is wrong because training a new BigQuery ML model uses the `CREATE MODEL` statement, not the `ML.PREDICT` function. Option B is wrong because exporting a model to Cloud Storage uses the `EXPORT MODEL` statement, not a prediction query. Option C is wrong because evaluating model performance uses the `ML.EVALUATE` function, which returns metrics like loss and accuracy, not predictions.
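
To make the four statements easier to tell apart, here is a small Python sketch that holds one example of each as a SQL string and classifies a query by its leading keyword or function. The table names, bucket path, and `classify` helper are hypothetical; only `mydataset.mymodel` comes from the explanation above.

```python
import re

# One illustrative statement per BigQuery ML operation (names are made up).
STATEMENTS = {
    "train": (
        "CREATE OR REPLACE MODEL `mydataset.mymodel` "
        "OPTIONS(model_type='linear_reg', input_label_cols=['price']) AS "
        "SELECT * FROM `mydataset.training_data`"
    ),
    "evaluate": (
        "SELECT * FROM ML.EVALUATE(MODEL `mydataset.mymodel`, "
        "(SELECT * FROM `mydataset.eval_data`))"
    ),
    "predict": (
        "SELECT * FROM ML.PREDICT(MODEL `mydataset.mymodel`, "
        "(SELECT * FROM `mydataset.new_data`))"
    ),
    "export": (
        "EXPORT MODEL `mydataset.mymodel` "
        "OPTIONS(URI = 'gs://my-bucket/mymodel')"
    ),
}


def classify(sql):
    """Return which of the four operations a statement performs."""
    text = sql.upper()
    if re.search(r"\bCREATE\s+(OR\s+REPLACE\s+)?MODEL\b", text):
        return "train"
    if re.search(r"\bEXPORT\s+MODEL\b", text):
        return "export"
    if "ML.EVALUATE" in text:
        return "evaluate"
    if "ML.PREDICT" in text:
        return "predict"
    return "unknown"
```

Calling `classify(STATEMENTS["predict"])` returns `"predict"`, which corresponds to option D.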

44
MCQmedium

A company needs to perform sentiment analysis on streaming social media data. Which architecture should they use?

A.Dataflow → Pub/Sub → Natural Language API → BigQuery
B.Pub/Sub → Cloud Functions → Natural Language API → Cloud Storage
C.Cloud Functions → Pub/Sub → Natural Language API → BigQuery
D.Pub/Sub → Dataflow → Natural Language API → BigQuery
AnswerD

This is the recommended architecture for streaming analytics.

Why this answer

Option D is correct because streaming social media data requires a scalable, ordered ingestion pipeline. Pub/Sub ingests the stream, Dataflow processes it in real-time (e.g., windowing, deduplication), the Natural Language API performs sentiment analysis, and BigQuery stores results for querying. This decouples ingestion from processing and storage, enabling exactly-once semantics and auto-scaling.

Exam trap

Google Cloud often tests the misconception that Cloud Functions can replace Dataflow for streaming pipelines, but Cloud Functions lacks stream processing primitives (e.g., windowing, state management) and has a 9-minute timeout, making it unsuitable for continuous sentiment analysis.

How to eliminate wrong answers

Option A is wrong because it reverses the pipeline order: Pub/Sub is the durable buffer that ingests the stream and Dataflow reads from it, so placing Dataflow before Pub/Sub breaks stream ingestion. Option B is wrong because Cloud Functions is not designed for high-throughput streaming; it has a 9-minute timeout and no built-in stream processing (e.g., windowing), making it unsuitable for continuous social media data. Option C is wrong because Cloud Functions should not be the entry point for streaming data; it lacks Pub/Sub's durability and ordering guarantees, and placing Pub/Sub after Cloud Functions risks losing messages before they are buffered.
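
The windowing primitive mentioned above can be sketched in plain Python. This is only an illustration of fixed windows, not Dataflow or Apache Beam code, and `fake_sentiment` is a hypothetical stand-in for a call to the Natural Language API.

```python
from collections import defaultdict


def fake_sentiment(text):
    """Toy scorer used for illustration; NOT the Natural Language API."""
    positive = {"love", "great"}
    negative = {"hate", "bad"}
    words = text.lower().split()
    return sum((w in positive) - (w in negative) for w in words)


def windowed_average(events, window_seconds=60):
    """Group (timestamp, text) events into fixed windows and average scores."""
    buckets = defaultdict(list)
    for ts, text in events:
        # Every event falls into the window that starts at a multiple
        # of window_seconds.
        window_start = (ts // window_seconds) * window_seconds
        buckets[window_start].append(fake_sentiment(text))
    return {start: sum(s) / len(s) for start, s in sorted(buckets.items())}


# Hypothetical messages as they might arrive from a subscription.
events = [(5, "love this brand"), (42, "bad service"), (75, "great great deal")]
result = windowed_average(events)
```

The first two events share the window starting at 0 and average to 0.0, while the third lands in the window starting at 60. A stream processor maintains this kind of per-window state continuously, which is what a short-lived function invocation cannot do on its own.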

45
Multi-Selecteasy

Which TWO of the following are low-code machine learning solutions on Google Cloud?

Select 2 answers
A.TensorFlow
B.scikit-learn
C.PyTorch
D.BigQuery ML
E.Vertex AI AutoML
AnswersD, E

BigQuery ML allows creating models using SQL.

Why this answer

BigQuery ML (D) is a low-code ML solution because it allows users to create, train, and deploy machine learning models using standard SQL queries directly within BigQuery, eliminating the need for custom coding in Python or other programming languages. Vertex AI AutoML (E) is also low-code as it provides a graphical interface and automated pipeline to train high-quality models with minimal manual intervention, handling feature engineering, model selection, and hyperparameter tuning automatically.

Exam trap

Google Cloud often tests the distinction between general-purpose ML frameworks (like TensorFlow, scikit-learn, PyTorch) that require significant coding versus managed services (BigQuery ML, AutoML) that provide low-code or no-code interfaces, leading candidates to mistakenly classify any ML tool on Google Cloud as low-code.

46
MCQeasy

A company wants to classify support ticket text into categories. They have labeled historical tickets. Which Google Cloud service allows them to train a custom classification model with no code?

A.Vertex AI Matching Engine
B.AutoML Natural Language
C.Cloud Natural Language API
D.Document AI
AnswerB

Correct: No-code custom text classification.

Why this answer

AutoML Natural Language (now part of Vertex AI) is the correct service because it enables users to train custom text classification models using labeled data without writing any code. It provides a no-code interface for uploading datasets, training models, and evaluating performance, making it ideal for classifying support ticket text into custom categories.

Exam trap

The trap here is that candidates confuse the pre-trained Cloud Natural Language API (which requires no training but cannot be customized) with AutoML Natural Language (which requires labeled data but allows custom categories), leading them to select Option C incorrectly.

How to eliminate wrong answers

Option A is wrong because Vertex AI Matching Engine is designed for vector similarity search and embeddings, not for training custom classification models with labeled text data. Option C is wrong because Cloud Natural Language API is a pre-trained API that offers sentiment analysis, entity extraction, and syntax analysis, but it cannot be trained on custom labeled data for custom categories. Option D is wrong because Document AI is specialized for document processing (e.g., OCR, form parsing, invoice extraction) and is not intended for general text classification from labeled ticket data.

47
Multi-Selectmedium

A company uses Vertex AI for AutoML training. Which THREE are best practices for managing model versions?

Select 3 answers
A.Deploy each model version to a separate endpoint
B.Use Vertex AI Model Registry to version models
C.Use evaluation metrics to compare versions
D.Use labels to tag models for tracking
E.Automatically delete old versions after 30 days
AnswersB, C, D

Correct: Centralized model versioning.

Why this answer

Vertex AI Model Registry is the central repository for managing and versioning models, allowing you to track iterations, compare performance, and control deployments. It provides a structured way to organize models, roll back to previous versions if needed, and maintain lineage for compliance and reproducibility.

Exam trap

The trap here is that candidates may think deploying each version to a separate endpoint is necessary for isolation, but Vertex AI's traffic splitting on a single endpoint is the correct and cost-effective approach for managing multiple model versions.
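
A short sketch of what a traffic split on a single endpoint means, assuming percentages per deployed version that sum to 100. The `route` function and version names are hypothetical and do not describe how Vertex AI routes requests internally.

```python
def validate_split(traffic_split):
    """Check that the percentages are non-negative and sum to 100."""
    if any(percent < 0 for percent in traffic_split.values()):
        raise ValueError("traffic percentages must be non-negative")
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")


def route(request_id, traffic_split):
    """Deterministically map a request to a version by percentage bucket."""
    validate_split(traffic_split)
    bucket = request_id % 100
    upper = 0
    for version, percent in traffic_split.items():
        upper += percent
        if bucket < upper:
            return version
    raise AssertionError("unreachable when the split sums to 100")


# Hypothetical canary rollout: 90% to the current version, 10% to the new one.
split = {"v1": 90, "v2": 10}
counts = {"v1": 0, "v2": 0}
for request_id in range(1000):
    counts[route(request_id, split)] += 1
```

Both versions are served from one endpoint here, so comparing them does not require a second endpoint.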

48
MCQhard

You are an ML engineer at a logistics company. The company uses a Vertex AI Pipeline with BigQuery ML to train a model that predicts delivery delays based on weather, traffic, and historical order data. The pipeline runs daily and includes steps: (1) data extraction from BigQuery, (2) feature engineering using Dataflow, (3) model training with BigQuery ML (logistic regression), (4) model evaluation, and (5) conditional deployment to a Vertex AI Endpoint if accuracy > 0.85. Recently, the pipeline has been failing at step 5 with the error: "Vertex AI Endpoint creation failed: Quota limit of 1 endpoint per region exceeded." The company has already created one endpoint in the same region for another model. The pipeline is configured to create a new endpoint each time a model is deployed. The engineer needs to fix this with minimal changes to the pipeline code. Which course of action should the engineer take?

A.Submit a quota increase request to Google Cloud for Vertex AI Endpoints in the current region.
B.Change the region in the pipeline configuration to a region with available endpoint quota.
C.Remove the accuracy threshold and deploy every model automatically to a pre-created endpoint.
D.Modify the deployment step to check if an endpoint already exists and, if so, deploy a new model version to the existing endpoint instead of creating a new one.
AnswerD

Reuses the existing endpoint, avoiding quota limits.

Why this answer

Option D is correct because it directly addresses the root cause: the pipeline fails because it tries to create a new endpoint each time, exceeding the regional quota of one endpoint. By modifying the deployment step to check for an existing endpoint and deploying a new model version to it, the engineer avoids quota issues without altering the pipeline's core logic or requiring external approvals. This approach leverages Vertex AI's model versioning capability, which allows multiple model versions under a single endpoint, aligning with minimal code changes.

Exam trap

The trap here is that candidates may focus on quota limits as a resource issue (Option A) or a region issue (Option B), rather than recognizing that the pipeline's deployment logic is architecturally flawed by creating a new endpoint per deployment, which is both inefficient and violates best practices for model serving.

How to eliminate wrong answers

Option A is wrong because submitting a quota increase request is a slow, administrative process that does not constitute a minimal code change and may not be approved quickly, leaving the pipeline broken in the meantime. Option B is wrong because changing the region introduces additional complexity (e.g., data residency, latency, and potential BigQuery dataset location mismatches) and does not address the underlying design issue of creating a new endpoint per deployment. Option C is wrong because removing the accuracy threshold undermines the model quality gate, potentially deploying poor models, and still requires creating a new endpoint each time, which would still hit the quota limit.
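
The fix in option D amounts to a "find, otherwise create" step. The sketch below uses an in-memory stand-in for the endpoint service; the class, method names, and endpoint and model names are hypothetical and are not the Vertex AI SDK.

```python
class FakeEndpointService:
    """In-memory stand-in for an endpoint API with a per-region quota."""

    def __init__(self, quota=1):
        self.quota = quota
        self.endpoints = {}  # endpoint name -> list of deployed models

    def find(self, name):
        """Return the endpoint name if it exists, otherwise None."""
        return name if name in self.endpoints else None

    def create(self, name):
        """Create an endpoint, failing once the quota is used up."""
        if len(self.endpoints) >= self.quota:
            raise RuntimeError("Quota limit of endpoints per region exceeded")
        self.endpoints[name] = []
        return name


def deploy(service, endpoint_name, model_version):
    """Reuse the endpoint when it exists; create it only when missing."""
    endpoint = service.find(endpoint_name) or service.create(endpoint_name)
    service.endpoints[endpoint].append(model_version)
    return endpoint


# The region already holds one endpoint that serves another model.
service = FakeEndpointService(quota=1)
service.create("shared-endpoint")
service.endpoints["shared-endpoint"].append("other-model")

deploy(service, "shared-endpoint", "delay-model-v2")
```

Because the deployment step looks up `shared-endpoint` first, it never calls `create` and so never touches the quota; a call with a new endpoint name would still raise the quota error.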

49
MCQhard

A logistics company uses Vertex AI AutoML Tables to predict delivery delays based on order attributes, weather data, and traffic data. The model is retrained weekly using a Vertex AI Pipeline that runs a BigQuery query to get training data, then triggers AutoML training. Recently, the pipeline fails with the error 'Dataset not found' when the AutoML training step starts. The BigQuery query runs successfully and outputs a table. Which is the most likely cause?

A.The AutoML training step is referencing a different dataset location.
B.The training data has been manually deleted from Cloud Storage.
C.The pipeline's IAM permissions are insufficient to access BigQuery.
D.The BigQuery output table is not being passed as a Vertex AI Dataset resource.
AnswerD

The pipeline must create a Vertex AI Dataset from the BigQuery table for AutoML to use.

Why this answer

The error 'Dataset not found' occurs because AutoML Tables requires a Vertex AI Dataset resource (a metadata wrapper) to reference the training data, not just a BigQuery table. The pipeline's BigQuery query produces a table, but if that table is not explicitly converted into or passed as a Vertex AI Dataset resource (via the `aiplatform.Dataset` creation step), AutoML training cannot locate it. Option D correctly identifies this missing step as the root cause.

Exam trap

Google Cloud often tests the distinction between a raw data source (BigQuery table) and a Vertex AI Dataset resource, trapping candidates who assume AutoML can directly consume a BigQuery table without the required metadata wrapper.

How to eliminate wrong answers

Option A is wrong because the error is 'Dataset not found', not a location mismatch; AutoML Tables uses Dataset resource IDs, not direct paths, so a different dataset location would cause a different error (e.g., 'Permission denied' or 'Table not found'). Option B is wrong because the training data is stored in BigQuery, not Cloud Storage, and the error occurs at the AutoML step, not during data retrieval; manual deletion of a Cloud Storage file would not affect a BigQuery-sourced dataset. Option C is wrong because the BigQuery query runs successfully, proving the pipeline's IAM permissions to access BigQuery are sufficient; insufficient permissions would fail at the query step, not at the AutoML training step.

50
MCQhard

A global e-commerce company uses BigQuery ML to forecast daily sales for 10,000 products. They use a time-series model with a horizon of 7 days. Recently, forecasts for a specific product category have been consistently too high. They suspect the model is not capturing a new seasonal pattern. Which action should they take first to diagnose the issue?

A.Retrain the model with minimal additional data
B.Run ML.EVALUATE on the recent sales data and compare accuracy metrics
C.Increase the forecast horizon to 14 days
D.Switch to AutoML forecasting via Vertex AI AutoML
AnswerB

Allows quantifying drift and identifying underperforming categories.

Why this answer

Running ML.EVALUATE on recent sales data allows you to compute accuracy metrics (e.g., MAE, MAPE) specifically for the period where the model is failing. This isolates whether the error is due to a new seasonal pattern or another cause, without retraining or changing the model architecture. It is the standard first diagnostic step in BigQuery ML for time-series models.

Exam trap

Google Cloud often tests the principle that diagnosis must precede action—candidates mistakenly jump to retraining or switching tools instead of evaluating the existing model's performance on the problematic data window.

How to eliminate wrong answers

Option A is wrong because retraining with minimal additional data does not diagnose why forecasts are too high; it only incorporates more data without identifying the root cause. Option C is wrong because increasing the forecast horizon to 14 days would worsen the problem by extending predictions further into the uncertain future, not addressing the seasonal pattern miss. Option D is wrong because switching to AutoML forecasting via Vertex AI AutoML is a premature architectural change that bypasses the diagnostic step; you should first evaluate the current model to understand the error before migrating.
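
As an illustration of what the accuracy comparison looks for, the sketch below computes MAE, MAPE, and mean error (bias) for a recent window in plain Python. The numbers are made up, and this is not the output format of `ML.EVALUATE`.

```python
def forecast_errors(actuals, forecasts):
    """Return MAE, MAPE (in percent), and bias for paired series."""
    n = len(actuals)
    errors = [f - a for a, f in zip(actuals, forecasts)]
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / a for e, a in zip(errors, actuals)) / n * 100
    # A positive bias means the forecasts are too high on average.
    bias = sum(errors) / n
    return {"mae": mae, "mape": mape, "bias": bias}


# Hypothetical recent daily sales for one product category.
actuals = [100, 80, 50, 40]
forecasts = [110, 100, 75, 60]
metrics = forecast_errors(actuals, forecasts)
```

A bias equal to the MAE, as in this example, means every forecast was above the actual value, which matches the "consistently too high" symptom and supports the seasonal-pattern hypothesis before any retraining is attempted.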

51
MCQeasy

A marketing agency uses Vertex AI AutoML Vision to classify social media images into brand logos and generic content. They have 5,000 images per class. The model achieves 95% accuracy on validation set, but in production it misclassifies many images that contain logos in unusual angles or lighting. They have limited ML expertise and want to improve robustness. Which action should they take?

A.Switch to a custom CNN model trained with data augmentation.
B.Augment the training set with images that have varied angles and lighting.
C.Deploy the model with a lower confidence threshold.
D.Use Vertex AI Matching Engine for similarity search instead.
AnswerB

Simply adding more diverse training images improves model robustness.

Why this answer

Option B is correct because the core issue is a domain shift between the training data (likely clean, canonical logo images) and production data (logos at unusual angles and lighting). Augmenting the training set with those specific variations directly addresses the lack of robustness by exposing the model to the missing edge cases during training, which is the most effective and simplest fix for a team with limited ML expertise using AutoML Vision.

Exam trap

The trap here is that candidates often assume a more complex model (custom CNN) is needed for robustness, when in fact the problem is a data distribution mismatch that can be fixed with simple data augmentation, which is the most practical solution for a team with limited ML expertise using a managed service like AutoML.

How to eliminate wrong answers

Option A is wrong because switching to a custom CNN model requires significant ML expertise to design, train, and tune, which contradicts the team's limited ML expertise; AutoML Vision already uses a CNN-based architecture under the hood, so the issue is data quality, not model architecture. Option C is wrong because lowering the confidence threshold would increase the number of false positives (misclassifying generic content as logos), which does not fix the model's inability to correctly recognize logos at unusual angles—it only changes the decision boundary, not the model's feature representation. Option D is wrong because Vertex AI Matching Engine is designed for similarity search (e.g., finding nearest neighbors in an embedding space), not for classification; it would require generating embeddings for all images and does not directly solve the classification robustness problem, nor does it leverage the existing labeled training data.
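
One way to obtain varied angles and lighting is to generate extra variants of existing labeled images before uploading them. The sketch below shows the idea on a tiny grid of pixel values using only the standard library; real images would normally go through an imaging library, and collecting genuine photos from production is an equally valid route.

```python
def rotate_90(image):
    """Rotate a 2D grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def adjust_brightness(image, factor):
    """Scale pixel values by factor, clamping to the 0-255 range."""
    return [
        [max(0, min(255, round(pixel * factor))) for pixel in row]
        for row in image
    ]


# Hypothetical 2x2 grayscale image.
image = [[0, 100], [200, 250]]

rotated = rotate_90(image)
brighter = adjust_brightness(image, 1.5)
augmented_set = [image, rotated, brighter]
```

Each variant keeps the original label, so the training set grows in exactly the directions where production images differ.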

52
MCQmedium

A team wants to deploy a BigQuery ML model for online prediction. Which approach should they take?

A.Export the model to Cloud Storage and deploy to AI Platform
B.Export the model to Vertex AI and create an endpoint
C.None of these; BigQuery ML models cannot be used for online prediction
D.Use BigQuery ML's ML.PREDICT for online predictions
AnswerB

Vertex AI supports deploying BigQuery ML models for online serving.

Why this answer

BigQuery ML models can be exported directly to Vertex AI for online prediction. Vertex AI provides a managed endpoint that supports real-time serving with low latency, which is required for online prediction. Exporting to Cloud Storage and then deploying to AI Platform is outdated because AI Platform is now part of Vertex AI, and the recommended path is to export the model directly to Vertex AI and create an endpoint.

Exam trap

Google Cloud often tests the distinction between batch prediction (ML.PREDICT) and online prediction (Vertex AI endpoint), and the trap here is that candidates assume BigQuery ML's ML.PREDICT can serve real-time requests, but it is designed for batch processing only.

How to eliminate wrong answers

Option A is wrong because exporting the model to Cloud Storage and deploying to AI Platform is a legacy approach; AI Platform has been integrated into Vertex AI, and the current best practice is to export directly to Vertex AI. Option C is wrong because BigQuery ML models can indeed be used for online prediction by exporting them to Vertex AI and creating an endpoint. Option D is wrong because ML.PREDICT in BigQuery ML is designed for batch predictions, not for real-time online predictions with low-latency requirements.

53
Multi-Selectmedium

Which TWO are best practices when deploying AutoML models to production?

Select 2 answers
A.Monitor for data drift
B.Train the model on a disk to reduce latency
C.Enable Vertex AI Explainability
D.Deploy on sole-tenant nodes
E.Use TPUs for model serving
AnswersA, C

Data drift can degrade performance; monitoring is essential.

Why this answer

Monitoring for data drift (Option A) is a best practice because production models can degrade over time as the statistical properties of input data change. Vertex AI provides a Model Monitoring service that automatically detects skew and drift by comparing serving data distribution against training data distribution, triggering alerts when anomaly thresholds are breached. This ensures model reliability and performance in production.

Exam trap

Google Cloud often tests the misconception that TPUs are suitable for model serving, but TPUs are optimized for training and not supported for Vertex AI AutoML serving, which uses CPUs or GPUs for inference.
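
A minimal sketch of a drift check for one categorical feature, comparing the training and serving distributions with an L-infinity distance. The feature values and the 0.2 alert threshold are hypothetical, and the code is an illustration rather than the Model Monitoring service itself.

```python
from collections import Counter


def distribution(values):
    """Return the relative frequency of each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(training_values, serving_values):
    """Largest absolute difference in category frequency."""
    p = distribution(training_values)
    q = distribution(serving_values)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical payment-method feature at training time and in production.
training = ["card"] * 8 + ["cash"] * 2
serving = ["card"] * 5 + ["cash"] * 5

score = l_infinity_distance(training, serving)
ALERT_THRESHOLD = 0.2  # assumed value, tuned per feature in practice
drifted = score > ALERT_THRESHOLD
```

The share of `card` drops from 0.8 to 0.5, so the score is about 0.3 and the check flags drift.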

54
Multi-Selectmedium

A company wants to build a low-code ML pipeline using Vertex AI Pipelines and BigQuery ML. They need to train, evaluate, and deploy a model. Which TWO statements are correct about the integration between Vertex AI Pipelines and BigQuery ML? (Choose TWO.)

Select 2 answers
A.BigQuery ML models are automatically stored in Vertex AI Model Registry after training.
B.BigQuery ML supports hyperparameter tuning using the CREATE MODEL statement.
C.Vertex AI Pipelines supports automatic retry of failed steps due to transient errors.
D.Vertex AI Pipeline steps can include BigQuery ML training via the BigQueryQueryJob operator.
E.The trained BigQuery ML model can be registered in Vertex AI Model Registry and deployed to an endpoint.
AnswersD, E

BigQuery ML training can be invoked as a SQL query step.

Why this answer

Option D is correct because Vertex AI Pipelines can integrate with BigQuery ML by using the BigQueryQueryJob operator to execute SQL-based training queries, such as `CREATE MODEL`, as a pipeline step. This allows you to orchestrate BigQuery ML model training within a Vertex AI Pipeline, enabling a low-code ML workflow.

Exam trap

Google Cloud often tests the misconception that BigQuery ML models are automatically registered in Vertex AI Model Registry after training, but in reality, you must explicitly export or upload the model to the registry as a separate step.

55
MCQmedium

A financial services firm uses Vertex AI AutoML Natural Language to classify customer feedback into categories (positive, neutral, negative). They notice that the model performs poorly on neutral and negative classes, with high false negatives for negative. The dataset has 10,000 samples: 8,000 positive, 1,000 neutral, 1,000 negative. They have trained the model with automatic data split and default hyperparameters. Which course of action should they take to improve classification of minority classes?

A.Use a custom model with a weighted loss function.
B.Enable the 'weighted' option in AutoML NLP to handle class imbalance.
C.Increase the number of training node hours.
D.Set the data split to 50/25/25 for train/validation/test.
AnswerB

This built-in option adjusts weights for minority classes, improving performance.

Why this answer

Option B is correct because AutoML Natural Language provides a built-in 'weighted' option that automatically adjusts the loss function to penalize misclassifications of minority classes more heavily, directly addressing the class imbalance without requiring custom model development. This is the simplest and most effective way to improve recall for the neutral and negative classes within the AutoML framework.

Exam trap

Google Cloud often tests the misconception that any class imbalance problem requires a custom model or manual data augmentation, when in fact AutoML's built-in 'weighted' option is the prescribed low-code solution for such scenarios.

How to eliminate wrong answers

Option A is wrong because using a custom model with a weighted loss function would require moving away from AutoML's low-code paradigm, which contradicts the scenario's implicit requirement for a low-code solution; AutoML already handles weighting internally via the 'weighted' option. Option C is wrong because increasing training node hours only provides more compute time for the same training process and does not address the fundamental issue of class imbalance; it may lead to overfitting on the majority class. Option D is wrong because changing the data split ratio (e.g., 50/25/25) does not mitigate class imbalance; it merely redistributes the same skewed proportions across training, validation, and test sets, leaving the model still biased toward the majority positive class.
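
To make the idea of class weighting concrete, here is a minimal sketch that computes inverse-frequency weights for the 8,000 / 1,000 / 1,000 split in the question. This is the common "balanced" heuristic and is shown only as an assumption about what weighting means in general; it is not a description of how AutoML computes weights internally.

```python
def inverse_frequency_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts taken from the question.
counts = {"positive": 8000, "neutral": 1000, "negative": 1000}
weights = inverse_frequency_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this heuristic a mistake on a neutral or negative sample costs eight times as much as a mistake on a positive sample, which is why weighting pushes the model toward better minority-class recall.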

56
MCQhard

A healthcare startup deployed a Vertex AI AutoML Vision model to detect anomalies in medical images. The model performs well on the test set but has high latency in production, exceeding the 2-second SLA. The images are stored in Cloud Storage and are processed via a Cloud Function triggered by new uploads. What is the most likely cause?

A.The images are being resized and preprocessed in the Cloud Function, adding latency.
B.The model is deployed on a small machine type with insufficient compute.
C.The Cloud Function has a cold start issue.
D.The AutoML Vision endpoint is not using GPU acceleration.
AnswerB

A small machine type (e.g., n1-standard-2) can cause high inference latency under load.

Why this answer

Option B is correct because the most likely cause of high latency exceeding the 2-second SLA is that the Vertex AI AutoML Vision model is deployed on a small machine type (e.g., n1-standard-2 or lower) with insufficient compute resources (CPU/memory). AutoML Vision endpoints use container-based serving, and underpowered machines cannot handle the inference load efficiently, especially for high-resolution medical images, leading to response times beyond the SLA.

Exam trap

The trap here is that candidates confuse cold start latency (Cloud Function) with inference latency (model serving), or assume GPU acceleration is optional for AutoML endpoints, when in fact AutoML Vision automatically uses GPUs and the real bottleneck is the compute capacity of the serving machine.

How to eliminate wrong answers

Option A is wrong because image resizing and preprocessing in the Cloud Function typically add minimal latency (milliseconds) and are not the primary cause of exceeding a 2-second SLA; the bottleneck is inference, not preprocessing. Option C is wrong because cold starts in Cloud Functions add 1-2 seconds at most and can be mitigated with min instances, but the question states the model performs well on the test set, implying the issue is inference latency, not function initialization. Option D is wrong because AutoML Vision endpoints automatically use GPU acceleration when available and appropriate; the lack of GPU is not a configurable option for AutoML endpoints, and the latency issue is more likely due to insufficient CPU/memory on the serving machine.

57
Multi-Selecteasy

A data analyst wants to build a binary classification model using a low-code ML solution on Google Cloud. The dataset is stored in BigQuery and contains 500,000 rows with 20 features, including categorical and numerical columns. The analyst has minimal coding experience and needs to deploy the model as an API endpoint for real-time predictions. Which two Google Cloud services should the analyst use to accomplish this task with minimal code? Choose two options.

Select 2 answers
A.BigQuery ML
B.Vertex AI Endpoints
C.Cloud Functions
D.Vertex AI Workbench
E.AutoML Tables
AnswersB, E

Vertex AI Endpoints provides a managed way to deploy trained models as REST APIs with autoscaling, which suits real-time predictions with minimal code.

Why this answer

Vertex AI Endpoints is correct because it provides a managed service to deploy trained models as REST API endpoints for real-time predictions with minimal code. The analyst can deploy an AutoML Tables model directly to a Vertex AI Endpoint, enabling low-code deployment and serving.

Exam trap

Google Cloud often tests the distinction between model training services (BigQuery ML, AutoML Tables) and model deployment services (Vertex AI Endpoints), leading candidates to incorrectly select BigQuery ML for real-time API deployment when it only supports batch inference.

58
MCQhard

Refer to the exhibit. What is being configured?

A.A model training pipeline
B.A batch prediction job
C.An endpoint with autoscaling based on request count
D.An endpoint with autoscaling based on CPU utilization
AnswerC

The autoscaling metric is 'prediction/online/requests'.

Why this answer

The exhibit shows a Vertex AI endpoint whose autoscaling is driven by the 'prediction/online/requests' metric. Scaling on this metric adjusts the number of replicas according to the volume of online prediction requests rather than resource usage. Option C is correct because the configured autoscaling metric measures request count.

Exam trap

Google Cloud often tests the distinction between request-count-based and CPU-based autoscaling; the trap here is that candidates see 'autoscaling' and assume CPU utilization is the metric in use, but the exhibit explicitly shows the request-based metric, making Option D a distractor for those who do not read the configuration details carefully.

How to eliminate wrong answers

Option A is wrong because a model training pipeline involves steps like data preprocessing, training, and evaluation, not endpoint scaling settings or replica counts. Option B is wrong because a batch prediction job runs asynchronously over a dataset and does not use a persistent endpoint with autoscaling. Option D is wrong because the exhibit shows 'prediction/online/requests' as the autoscaling metric, not CPU utilization; CPU-based autoscaling would reference a CPU utilization metric instead.
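
The difference between the two autoscaling styles is easier to see with numbers. The following sketch shows the general arithmetic behind request-count-based scaling: divide the observed request rate by a per-replica target and clamp the result. The function and parameter names are hypothetical, and this is a simplification rather than the actual autoscaling algorithm of any managed service.

```python
import math


def target_replicas(
    requests_per_second: float,
    target_per_replica: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Replicas needed to keep each replica at or below its request target."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(requests_per_second / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, needed))


print(target_replicas(450, 100, min_replicas=1, max_replicas=10))
```

A CPU-based policy would use the same clamping logic but feed it a utilization measurement instead of a request rate.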

59
MCQmedium

A company uses AutoML Tables to predict customer churn. The model's AUC is low. Which action is most likely to improve performance?

A.Use a different optimization objective
B.Add more training data
C.Increase the training budget to 10 hours
D.Remove features with low importance
AnswerB

Correct: More data generally improves model performance.

Why this answer

Adding more training data often helps improve model performance. Increasing the training budget alone may not help if data is insufficient. Removing features with low importance could hurt.

Changing the optimization objective may not directly improve AUC.

60
MCQhard

A retail company has been using Vertex AI AutoML to predict store-level demand for each product. They have a pipeline that runs nightly: data is extracted from BigQuery, preprocessed via Dataflow, and then used to train a new AutoML model each night. The model is deployed to a Vertex AI Endpoint for real-time inference. After two months, they notice that predictions for a new product category (recently launched) are consistently inaccurate, with predicted sales far exceeding actuals. They suspect data drift due to the new category. The data scientist has limited coding skills and wants a low-code solution. Which course of action should they take to improve predictions for the new category?

A.Add the product category as a feature in the AutoML dataset and retrain the model with the updated dataset
B.Retrain the model using only data from the new product category to specialize the model for that category
C.Use Vertex AI custom training with a Python script to fine-tune the model on the new category data
D.Remove the new product category from the training data because it causes bias, and rely on the pre-trained model's general pattern
AnswerA

Allows model to learn category-specific demand patterns.

Why this answer

Adding the product category as a feature in the AutoML dataset allows the model to learn the distinct demand patterns of the new category directly from the data. Vertex AI AutoML automatically handles feature engineering and can adjust its predictions based on this categorical input, addressing the data drift without requiring custom code. This low-code approach leverages AutoML's built-in ability to incorporate new features and retrain with minimal manual intervention.

Exam trap

Google Cloud often tests the misconception that specialized models (Option B) or custom training (Option C) are necessary for new data patterns, when in fact AutoML's feature-based retraining is the simplest low-code solution that leverages the model's existing architecture.

How to eliminate wrong answers

Option B is wrong because retraining only on the new category data would discard the valuable historical patterns from other categories, leading to overfitting and poor generalization for the new category. Option C is wrong because it requires custom Python scripting and custom training, which contradicts the low-code requirement and the data scientist's limited coding skills. Option D is wrong because removing the new category from training data would prevent the model from learning its specific patterns, causing the model to continue making inaccurate predictions based on the old distribution.

61
MCQhard

A company uses Vertex AI Pipelines to orchestrate their ML training workflow. The pipeline includes a BigQuery ML training step, a model evaluation step, and a deployment step to Vertex AI Endpoints. The engineer notices that the pipeline fails intermittently due to a quota exceeded error on Vertex AI Endpoints during model deployment. What is the best long-term solution to prevent this failure?

A.Run the pipeline steps sequentially with longer wait times.
B.Add retry logic with exponential backoff to the deployment step in the pipeline.
C.Switch to deploying models using a custom container on Compute Engine.
D.Request a permanent quota increase for Vertex AI Endpoints.
AnswerB

Handles transient quota errors gracefully without manual intervention.

Why this answer

Option B is correct because retry logic with exponential backoff is a resilient pattern for transient quota errors. Option D is wrong because a quota increase requires a support request and may not be granted immediately. Option C is wrong because deploying with a custom container on Compute Engine does not address the quota limit.

Option A is wrong because sequential execution with longer wait times does not prevent quota errors.
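
A minimal sketch of retry with exponential backoff is shown below, assuming the deployment call raises a recognizable exception on a quota error. `QuotaExceededError` and `retry_with_backoff` are hypothetical names introduced for illustration; a real pipeline would catch whatever exception its client library raises.

```python
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a transient quota error from a deployment call."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on a quota error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test: a test can record the requested delays instead of actually waiting.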

62
MCQhard

A data engineering team wants to orchestrate an ML pipeline that includes data preprocessing in Dataflow, AutoML training, and model deployment. They want to minimize operational overhead. Which approach is best?

A.Use Cloud Composer with Apache Airflow DAG
B.Use AI Platform Training with script
C.Use Cloud Scheduler to trigger Cloud Functions
D.Use Vertex AI Pipelines with custom components
AnswerD

Correct: Purpose-built for ML workflows, minimal overhead.

Why this answer

Vertex AI Pipelines with custom components is the best choice because it provides a fully managed, serverless orchestration service that natively integrates with Dataflow, AutoML, and model deployment. This minimizes operational overhead by eliminating the need to manage infrastructure, handle retries, or maintain a separate orchestration server, while offering built-in artifact tracking and pipeline caching.

Exam trap

The trap here is that candidates often confuse 'orchestration' with 'scheduling' and pick Cloud Scheduler, failing to recognize that a multi-step ML pipeline requires workflow orchestration with dependencies and error handling, not just a time-based trigger.

How to eliminate wrong answers

Option A is wrong because Cloud Composer with Apache Airflow DAG requires managing a Kubernetes cluster, Airflow workers, and infrastructure, which increases operational overhead rather than minimizing it. Option B is wrong because AI Platform Training with a script only handles the training step in isolation, not the end-to-end orchestration of preprocessing, training, and deployment. Option C is wrong because Cloud Scheduler to trigger Cloud Functions is a simple time-based trigger that lacks the workflow orchestration capabilities (e.g., conditional branching, parallel steps, dependency management) needed for a multi-step ML pipeline.
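
The distinction between scheduling and orchestration comes down to dependency handling. The sketch below uses Python's standard `graphlib` module to derive a valid execution order from declared dependencies, which is the core service an orchestrator provides and a time-based trigger does not. The step names are hypothetical labels for the three stages in the question, not real pipeline components.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
dependencies = {
    "preprocess_dataflow": set(),
    "train_automl": {"preprocess_dataflow"},
    "deploy_model": {"train_automl"},
}

# static_order() yields the steps in an order that respects every dependency.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```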

63
MCQmedium

A manufacturing company wants to predict equipment failure using sensor data stored in BigQuery. They have limited ML expertise and want to use AutoML Tables. The data includes timestamps, numerical sensor readings, and a boolean 'failure' column. The dataset is highly imbalanced with only 1% failure cases. Which of the following is the most effective approach to handle the imbalance in AutoML Tables?

A.Let AutoML Tables handle the imbalance automatically; it has built-in techniques for class imbalance.
B.Downsample the majority class to balance the dataset.
C.Use a custom loss function in the training configuration.
D.Oversample the minority class using SQL before training.
AnswerA

AutoML Tables automatically adjusts for imbalance.

Why this answer

AutoML Tables has built-in techniques to handle class imbalance, such as automatically adjusting class weights and using stratified sampling during training. This allows the model to learn from the minority class without requiring manual data preprocessing, making it the most effective and simplest approach for users with limited ML expertise.

Exam trap

The trap here is that candidates may assume manual resampling (downsampling or oversampling) is always required for imbalanced datasets, but AutoML Tables abstracts this complexity, and the exam tests whether you trust its built-in capabilities for low-code solutions.

How to eliminate wrong answers

Option B is wrong because downsampling the majority class would discard valuable data, potentially reducing model performance and losing information about normal operating conditions. Option C is wrong because AutoML Tables does not expose a custom loss function configuration; it abstracts away such hyperparameters and uses its own optimized training pipeline. Option D is wrong because oversampling the minority class using SQL before training is unnecessary and could lead to overfitting or data leakage; AutoML Tables handles imbalance internally without manual intervention.

64
MCQhard

A company wants to use low-code ML for time series forecasting with 5 years of hourly data. They need to incorporate holiday effects. Which solution best meets these requirements?

A.Custom LSTM model
B.BigQuery ML ARIMA_PLUS with holiday regression
C.Vertex AI AutoML Tables with timestamp and holiday features
D.Vertex AI AutoML Forecasting with timestamp and holiday feature
AnswerB

ARIMA_PLUS directly supports holiday effects in its model.

Why this answer

BigQuery ML ARIMA_PLUS with holiday regression is the correct choice because it is a low-code solution that natively supports time series forecasting with built-in holiday effect modeling. ARIMA_PLUS automatically handles seasonality, trend, and holiday regression without requiring custom code, making it ideal for 5 years of hourly data.

Exam trap

Google Cloud often tests the distinction between AutoML Forecasting and BigQuery ML ARIMA_PLUS, where candidates mistakenly assume AutoML Forecasting natively handles holiday regression, but it requires explicit feature engineering, while ARIMA_PLUS provides built-in holiday support.

How to eliminate wrong answers

Option A is wrong because a custom LSTM model requires significant coding and ML expertise, violating the low-code requirement. Option C is wrong because Vertex AI AutoML Tables is designed for tabular data and does not natively support time series forecasting with holiday effects; it would require manual feature engineering. Option D is wrong because Vertex AI AutoML Forecasting does not natively incorporate holiday regression; it focuses on time series features but lacks built-in holiday effect handling, requiring additional preprocessing.

65
Multi-Selecteasy

A data analyst wants to use low-code ML to analyze text data. Which TWO Google Cloud services are appropriate?

Select 2 answers
A.Vertex AI Workbench
B.Document AI
C.Cloud Natural Language API
D.AutoML Natural Language
E.BigQuery ML for sentiment
AnswersC, D

Correct: Pre-trained sentiment and entity analysis via API.

Why this answer

Cloud Natural Language API is a low-code ML service that provides pre-trained models for analyzing text, including sentiment analysis, entity recognition, and syntax analysis, without requiring custom model training. It is appropriate for a data analyst who wants to quickly extract insights from text data using simple API calls.

Exam trap

The trap here is that candidates may confuse BigQuery ML's sentiment analysis feature (which is SQL-based and not a dedicated low-code service) with a standalone low-code ML service, or mistakenly think Vertex AI Workbench is low-code when it actually requires coding in Python or other languages.

66
MCQmedium

A developer sees this error when calling the endpoint. What is the most likely cause?

A.The model is still in training
B.The model is deployed but not yet serving
C.The endpoint has no deployed model
D.The request payload size exceeds limit
AnswerB

Correct: Model deployment is still initializing.

Why this answer

The error 'model is not serving' occurs when the endpoint exists and a model has been deployed to it, but the deployment is not yet ready to serve (for example, it is still loading, scaling, or warming up). The deployment has to finish initializing before the endpoint can answer inference requests. Option B correctly identifies that the model is deployed but not yet ready to handle traffic.

Exam trap

Google Cloud often tests the distinction between 'no model deployed' and 'model deployed but not serving', where candidates confuse a deployment that exists but is not yet ready with a missing deployment.

How to eliminate wrong answers

Option A is wrong because a model that is still training cannot have been deployed yet, so the call would fail because the model is missing rather than with a 'not serving' error. Option C is wrong because an endpoint with no deployed model produces an error about the missing deployment, not about its serving state. Option D is wrong because exceeding the payload size limit produces a request-too-large error, not a 'not serving' error.

67
Multi-Selecthard

Which TWO are best practices for implementing a low-code ML solution using Vertex AI AutoML? (Choose 2)

Select 2 answers
A.Use the AutoML recommended data split (train/validation/test) to avoid overfitting.
B.Impute missing values manually before uploading the dataset.
C.Normalize numerical features to zero mean and unit variance.
D.Enable automatic feature engineering by leaving feature columns as raw data.
E.Export the data and train a custom model with a different architecture.
AnswersA, D

Why A is correct: AutoML optimizes split for best performance.

Why this answer

Option A is correct because AutoML's recommended data split (train/validation/test) is designed to prevent overfitting by ensuring the model is evaluated on unseen data. AutoML automatically handles the split ratio (e.g., 80/10/10) and stratification, which is a best practice for low-code ML solutions where manual split logic is error-prone.

Exam trap

Google Cloud often tests the misconception that manual preprocessing (like imputation or normalization) is required for AutoML, when in fact AutoML is designed to handle these steps automatically, and manual intervention can degrade performance or cause errors.

68
MCQmedium

Refer to the exhibit. A data scientist runs the above BigQuery ML query to create a logistic regression model. After training, the model is evaluated using ML.EVALUATE. The evaluation shows poor performance with high bias. Which action would most likely improve the model's performance?

A.Remove the TRANSFORM clause and use raw features.
B.Change the model_type to 'linear_reg'.
C.Add more complex features by including polynomial expansions.
D.Increase the number of training iterations by setting MAX_ITERATIONS.
AnswerC

Polynomial expansions increase model complexity, allowing it to learn non-linear patterns from the data, which addresses high bias.

Why this answer

High bias indicates the model is underfitting the data, meaning it is too simple to capture underlying patterns. Adding polynomial expansions or feature crosses in the TRANSFORM clause increases model complexity, allowing the logistic regression to learn non-linear decision boundaries, which directly addresses underfitting.

Exam trap

Google Cloud often tests the distinction between bias and variance; the trap here is that candidates might confuse high bias (underfitting) with high variance (overfitting) and incorrectly choose to simplify the model or increase iterations, rather than adding complexity.

How to eliminate wrong answers

Option A is wrong because removing the TRANSFORM clause would discard any feature preprocessing, likely making the model even simpler and worsening high bias. Option B is wrong because changing model_type to 'linear_reg' would switch to a regression task, which is inappropriate for classification and does not address bias in a logistic regression model. Option D is wrong because increasing MAX_ITERATIONS only affects convergence of the optimization algorithm; if the model is too simple (high bias), more iterations will not help it learn more complex patterns.
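
To show what a polynomial expansion actually adds, here is a small plain-Python sketch that expands a feature vector to degree 2. It illustrates the general technique only; it is not BigQuery ML's implementation, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_expand(features: list[float], degree: int = 2) -> list[float]:
    """Return every product of the input features up to the given degree."""
    expanded = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(features, d):
            expanded.append(prod(combo))
    return expanded


# Two raw features become five: x1, x2, x1*x1, x1*x2, x2*x2.
print(polynomial_expand([2.0, 3.0]))
```

A linear model trained on the expanded vector can fit curved decision boundaries in the original feature space, which is how the added terms reduce bias.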

69
MCQhard

Refer to the exhibit. A data analyst creates a BigQuery ML logistic regression model for churn prediction. The model evaluation shows high precision but low recall. Which change to the model creation would most likely improve recall?

A.Drop more columns to reduce overfitting.
B.Increase the training data by including customers without churn dates.
C.Use ML.ADJUST_THRESHOLD to lower the classification threshold.
D.Change model_type to 'BOOSTED_TREE_CLASSIFIER'.
AnswerC

Why C is correct: Lowering threshold increases sensitivity, improving recall.

Why this answer

Option C is correct because lowering the classification threshold (e.g., from 0.5 to 0.3) will classify more customers as positive (churn), increasing recall (true positives / (true positives + false negatives)). In BigQuery ML, ML.ADJUST_THRESHOLD directly modifies the decision boundary, trading off precision for recall. This is the most direct way to address low recall without altering the model architecture or training data.

Exam trap

Google Cloud often tests the misconception that changing the model type (e.g., to boosted trees) is the default solution for any performance metric issue, when in fact the threshold adjustment is the simplest and most direct way to trade off precision and recall in a logistic regression model.

How to eliminate wrong answers

Option A is wrong because dropping more columns to reduce overfitting would likely harm recall further by removing potentially informative features, and overfitting typically causes high variance, not low recall. Option B is wrong because including customers without churn dates (non-churners) would increase the class imbalance, making the model even more biased toward the majority class and likely reducing recall further. Option D is wrong because changing model_type to 'BOOSTED_TREE_CLASSIFIER' might improve overall performance but does not specifically target the recall issue; it is a model architecture change that could also reduce recall if the class imbalance is not addressed, and it is not the most direct fix for a threshold-related precision-recall trade-off.
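
The precision-recall trade-off described above can be checked by hand. The sketch below computes both metrics at two thresholds on a tiny, made-up set of scores and labels; the data is purely illustrative and does not come from the exhibit.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for the positive class at a given threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up model scores and true labels (1 = churn).
scores = [0.9, 0.6, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0]

print(precision_recall(scores, labels, 0.5))
print(precision_recall(scores, labels, 0.3))
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 0.75 while precision drops from 1.0 to 0.75.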

70
MCQmedium

Refer to the exhibit. What is this Cloud Build step doing?

A.Uploading a model to Vertex AI Model Registry
B.Deploying a model to a Vertex AI endpoint
C.Creating a custom container for prediction
D.Training a model in Vertex AI
AnswerA

The 'upload' command registers the model.

Why this answer

The Cloud Build step shown uses the `gcloud ai models upload` command, which specifically uploads a model artifact to the Vertex AI Model Registry. This action registers the model metadata and location in Vertex AI, making it available for versioning and later deployment, but does not create an endpoint or perform training.

Exam trap

Google Cloud often tests the distinction between model registration (upload) and model deployment (endpoint creation), leading candidates to confuse the `gcloud ai models upload` step with the actual deployment to an endpoint.

How to eliminate wrong answers

Option B is wrong because deploying a model to a Vertex AI endpoint requires the `gcloud ai endpoints deploy-model` command, not `gcloud ai models upload`. Option C is wrong because creating a custom container for prediction involves building and pushing a Docker image (e.g., via `gcloud builds submit` or `docker push`), not uploading a model to the registry. Option D is wrong because training a model in Vertex AI uses a command such as `gcloud ai custom-jobs create`, not the model upload command.
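
Since the exhibit itself is not reproduced here, the following sketch shows roughly what such a Cloud Build step could look like, expressed as a Python dictionary mirroring the build config structure. The region, display name, artifact path, and serving image are hypothetical placeholders, and the exact step in the exhibit may differ.

```python
def model_upload_step(region, display_name, artifact_uri, serving_image):
    """Build-step definition that runs `gcloud ai models upload` (values are placeholders)."""
    return {
        "name": "gcr.io/google.com/cloudsdktool/cloud-sdk",
        "entrypoint": "gcloud",
        "args": [
            "ai", "models", "upload",
            f"--region={region}",
            f"--display-name={display_name}",
            f"--artifact-uri={artifact_uri}",
            f"--container-image-uri={serving_image}",
        ],
    }


# Hypothetical values, for illustration only.
step = model_upload_step(
    "us-central1",
    "demo-model",
    "gs://demo-bucket/model/",
    "us-docker.pkg.dev/demo-project/demo-repo/serving:latest",
)
print(" ".join([step["entrypoint"], *step["args"]]))
```

Nothing in this step creates an endpoint or starts training, which is the detail the question is checking.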

71
MCQhard

A company is using AutoML Vision for object detection and observes high latency for online predictions. What can they do to reduce latency?

A.Reduce the training budget to create a smaller model
B.Use continuous batch prediction instead of online prediction
C.Deploy the model to a region closer to the users
D.Use a larger batch size in the prediction request
AnswerA

A smaller model has lower inference latency.

Why this answer

Reducing the training budget in AutoML Vision forces the model to use fewer node-hours, which typically results in a smaller and less complex model. A smaller model has fewer parameters and requires less computation during inference, directly reducing the latency for online predictions. This is a trade-off between model accuracy and inference speed.

Exam trap

The trap here is that candidates often confuse network latency with inference latency, assuming that deploying closer to users (Option C) is the primary fix, when in fact the question specifically targets high latency for online predictions caused by model complexity.

How to eliminate wrong answers

Option B is wrong because continuous batch prediction is designed for offline, asynchronous processing of large datasets and does not reduce latency for real-time online predictions; it actually increases end-to-end time. Option C is wrong because deploying the model to a region closer to users reduces network latency but does not address the model inference latency itself, which is the primary bottleneck in AutoML Vision's online prediction. Option D is wrong because AutoML Vision online prediction endpoints do not support user-defined batch sizes; the batch size is fixed by the service, and attempting to use a larger batch size would not be accepted or would increase latency per request.

72
Drag & Dropmedium

Drag and drop the steps to set up data lineage tracking for ML pipelines using Vertex AI Experiments in the correct order.

Why this order

Start with SDK setup, then create an experiment, log metrics, record artifacts, and review lineage.

73
MCQmedium

A retail company wants to build a customer churn prediction model using BigQuery ML. The data is stored in BigQuery tables and includes customer demographics, purchase history, and support interactions. The data scientist wants to experiment with different model types quickly without moving data to another environment. Which approach should they use?

A.Use Cloud Composer to orchestrate a custom training pipeline on Vertex AI.
B.Use AI Platform Notebooks with pandas and scikit-learn.
C.Use BigQuery ML to create and evaluate models directly in BigQuery.
D.Export the data to Cloud Storage and use Vertex AI AutoML Tables.
AnswerC

Why C is correct: BigQuery ML is a low-code solution that works directly on BigQuery data.

Why this answer

BigQuery ML (BQML) allows data scientists to create, train, and evaluate machine learning models directly in BigQuery using SQL, without moving data to another environment. This approach supports rapid experimentation with various model types (e.g., logistic regression, boosted trees, deep neural networks) and is ideal for the stated requirement of quick iteration while keeping data in place.

Exam trap

Google Cloud often tests the candidate's ability to recognize that BigQuery ML is purpose-built for low-code, in-database ML experimentation, and the trap here is assuming that more complex or external tools (like Vertex AI or Cloud Composer) are necessary when the simpler, integrated solution suffices.

How to eliminate wrong answers

Option A is wrong because Cloud Composer is an orchestration tool for workflows, not a direct model training environment; using it to build a custom pipeline on Vertex AI would require moving data and add unnecessary complexity. Option B is wrong because AI Platform Notebooks with pandas and scikit-learn require exporting data from BigQuery to a Python environment, violating the requirement to keep data in BigQuery. Option D is wrong because exporting data to Cloud Storage for Vertex AI AutoML Tables introduces data movement and latency, contradicting the need for quick experimentation without moving data.

74
MCQhard

Refer to the exhibit. A data scientist trained a BigQuery ML classification model to detect fraudulent transactions. The dataset has 95% non-fraud (class 0) and 5% fraud (class 1). The evaluation metrics show high accuracy (0.91) but low recall (0.60) for fraud detection. Which low-code approach should the data scientist take to improve recall without significantly sacrificing precision?

A.Use the ML.PREDICT function with a lower classification threshold (e.g., 0.3 instead of 0.5) to capture more positive cases.
B.Apply feature selection to reduce the number of features and focus on the most predictive ones.
C.Increase the number of training iterations by setting the MAX_ITERATIONS option to a higher value.
D.Re-train the model using AutoML Tables with class weights to penalize false negatives more heavily.
AnswerA

Lowering the threshold increases recall by classifying more instances as positive.

Why this answer

Option A is correct because lowering the classification threshold in ML.PREDICT (e.g., from 0.5 to 0.3) causes the model to classify more transactions as fraud, directly increasing recall. This is a low-code adjustment that does not require retraining or complex feature engineering, and it allows the data scientist to trade off precision for recall as needed.

Exam trap

Google Cloud often tests the misconception that improving recall always requires retraining or complex model changes, when in fact a simple threshold adjustment in ML.PREDICT is a valid low-code technique to shift the precision-recall balance.

How to eliminate wrong answers

Option B is wrong because feature selection reduces the number of input features, which may speed up training or reduce overfitting but does not directly increase recall for a specific class; it can even harm recall if important fraud-indicative features are removed. Option C is wrong because MAX_ITERATIONS only affects how long the training algorithm runs before stopping; if the model has already converged, more iterations will not improve recall and may lead to overfitting. Option D is wrong because AutoML Tables is a separate service from BigQuery ML; class weights can help, but this option requires retraining on a different platform and is not the simplest low-code fix for the model described in the question.

75
Multi-Selectmedium

Which TWO of the following are benefits of using BigQuery ML for low-code model development?

Select 2 answers
A.Train models directly on data in BigQuery without moving it
B.Automatic feature engineering and hyperparameter tuning
C.Automatic scaling to petabytes of data
D.Built-in model explainability for all model types
E.Support for image classification tasks
AnswersA, C

Data stays in BigQuery, eliminating ETL, and training runs on BigQuery's serverless compute, which scales with the data.

Why this answer

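Option A is correct because BigQuery ML lets you train machine learning models with SQL directly on data stored in BigQuery, so nothing has to be exported or moved to a separate environment. This avoids data transfer latency, simplifies security governance, and builds on BigQuery's separation of storage and compute. Option C is correct because BigQuery ML runs on BigQuery's serverless engine, which allocates compute automatically, so the same SQL workflow applies to very large tables without the user provisioning or resizing infrastructure.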
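Training in place amounts to one SQL statement that reads from a table already in BigQuery. The following is a minimal sketch, assuming a logistic regression model. The helper `create_model_sql` and the dataset, table, and column names are hypothetical; the function only builds the statement and does not run it.

```python
def create_model_sql(model: str, source_table: str, label_col: str) -> str:
    """Build a BigQuery ML CREATE MODEL statement (hypothetical helper).

    The training data is selected from a table that already lives in
    BigQuery, so no export step appears anywhere in the workflow.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', "
        f"input_label_cols = ['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


if __name__ == "__main__":
    # All names below are placeholders chosen for illustration.
    print(
        create_model_sql(
            model="my_dataset.fraud_model",
            source_table="my_dataset.transactions",
            label_col="is_fraud",
        )
    )
```

The resulting string could be pasted into the BigQuery console or submitted through a client library; either way the data is read where it is stored.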

Exam trap

Google Cloud often tests the misconception that 'low-code' means 'fully automated'. Candidates mistakenly assume BigQuery ML handles feature engineering and hyperparameter tuning for them, when in fact it mainly reduces the coding effort for model creation: hyperparameter tuning must be requested explicitly, and feature engineering beyond basic automatic preprocessing remains the user's job.
