Knowledge + Practice

Google Professional Machine Learning Engineer (PMLE) — Questions 376–450

506 questions total · 7 pages · All types, answers revealed



376

MCQ · easy

A team just moved a model from prototype to production using Vertex AI. They notice prediction errors for certain inputs that were not present in training data. What should they do to detect such issues automatically?

A. Set up Vertex AI Experiments to compare predictions

B. Use BigQuery ML to analyze prediction requests

C. Enable Cloud Logging and set up alerts for error logs

D. Enable Vertex AI Model Monitoring to detect prediction anomalies

Answer: D

Model Monitoring automatically checks for drift and anomalies.

Why this answer

Option D is correct because Vertex AI Model Monitoring is designed to detect training-serving skew and prediction drift by comparing the feature distributions of production prediction requests against a baseline such as the training data. The team can therefore identify inputs that deviate from the training distribution automatically, even when those exact inputs never appeared during training, without manual inspection.

Exam trap

Google Cloud often tests the distinction between monitoring for operational errors (e.g., HTTP errors) versus monitoring for model-specific issues (e.g., data drift), leading candidates to choose Cloud Logging (Option C) when the correct answer requires a dedicated ML monitoring service.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is used for tracking and comparing model training runs and hyperparameter tuning, not for monitoring production prediction requests or detecting anomalies in real-time. Option B is wrong because BigQuery ML is a tool for creating and executing machine learning models directly in BigQuery using SQL, not for analyzing prediction requests from a deployed Vertex AI model or detecting input anomalies. Option C is wrong because while Cloud Logging can capture error logs, it only reacts to explicit errors (e.g., 4xx/5xx HTTP responses) and cannot automatically detect prediction anomalies like data drift or feature skew that do not generate error logs.
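To make the idea of comparing production requests against the training distribution concrete, here is a minimal sketch in plain Python. It uses the L-infinity distance between category frequencies as the drift statistic. The feature values and the 0.2 threshold are assumptions chosen for illustration, and the code does not call the Vertex AI API.

```python
from collections import Counter


def categorical_distribution(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def l_infinity_distance(baseline, serving):
    """Largest absolute difference in category frequency between two samples."""
    p = categorical_distribution(baseline)
    q = categorical_distribution(serving)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical feature values: "tablet" never appeared in training.
training_devices = ["mobile"] * 60 + ["desktop"] * 40
serving_devices = ["mobile"] * 40 + ["desktop"] * 30 + ["tablet"] * 30

DRIFT_THRESHOLD = 0.2  # assumed alerting threshold, for illustration only
distance = l_infinity_distance(training_devices, serving_devices)
drift_detected = distance > DRIFT_THRESHOLD
```

No request in this example raises an error, which is why log-based alerting alone would stay silent while the distribution check flags the new category.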


377

MCQ · easy

A pharmaceutical company uses Vertex AI Pipelines with custom training containers. Recently, the pipeline has been failing with 'Container failed with exit code 137' (out of memory). The container runs with default memory limit. The team needs to fix this without changing the code. The project quota for CPU and memory is sufficient. What should the team do?

A. Add a resource hint to the container spec for more memory.

B. Set the 'machineType' field for the training task to a higher memory machine.

C. Increase the model parallelism by using multi-worker training.

D. Use a smaller dataset for training.

Answer: B

This directly provides more memory to the container without code changes.

Why this answer

Option B is correct because exit code 137 means the container was killed for exceeding the memory available to it. In Vertex AI Pipelines, the memory a custom training container can use is bounded by the machine it runs on. Setting the 'machineType' field to a higher-memory machine (e.g., n1-highmem-8) gives the container more memory without requiring code changes.

This directly resolves the OOM issue while respecting the constraint of not modifying the code.

Exam trap

Google Cloud often tests the misconception that resource hints or environment variables can override default memory limits in Vertex AI Pipelines, but the correct mechanism is the 'machineType' field in the task specification, not hints or code changes.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines does not support resource hints in the container spec for custom training containers; resource allocation is controlled via the 'machineType' field, not hints. Option C is wrong because multi-worker training (model parallelism) distributes computation across workers but does not increase the memory available to a single container; it would require code changes to implement distributed training, which violates the 'without changing the code' constraint. Option D is wrong because using a smaller dataset may reduce memory usage but changes the training data, which is not a valid fix for an OOM error in a production pipeline; the problem is memory allocation, not dataset size.
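The link between exit code 137 and an out-of-memory kill follows from the shell convention that a process terminated by a signal exits with 128 plus the signal number, and 137 = 128 + 9 (SIGKILL). The helper below is a small sketch of that arithmetic. The function name and the short signal table are illustrative and not part of any Google Cloud tooling.

```python
# Small table of common signals; values follow the POSIX/Linux numbering.
SIGNAL_NAMES = {9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(exit_code):
    """Explain a container exit code, decoding the 128 + signal convention."""
    if exit_code > 128:
        signal_number = exit_code - 128
        name = SIGNAL_NAMES.get(signal_number, f"signal {signal_number}")
        return f"killed by {name}"
    if exit_code == 0:
        return "success"
    return f"application error {exit_code}"
```

For example, `describe_exit_code(137)` reports a SIGKILL, which is what the kernel's out-of-memory killer sends when a container exceeds its memory.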


378

Matching · medium

Match each regularization technique to its effect.

Concepts and matches

L1 regularization: Adds absolute value of weights to loss, induces sparsity

L2 regularization: Adds squared magnitude of weights to loss, prevents overfitting

Dropout: Randomly drops units during training to prevent co-adaptation

Early stopping: Stops training when validation performance stops improving

Data augmentation: Increases training data diversity through transformations

Why these pairings

Regularization helps generalize models.
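The two penalty-based techniques differ only in the term added to the loss, which a short worked example can show. This is a minimal sketch in plain Python. The weights, the data loss of 1.0, and the penalty strengths are made-up values for illustration.

```python
def l1_penalty(weights, strength):
    """Sum of absolute weights, scaled by the regularization strength."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """Sum of squared weights, scaled by the regularization strength."""
    return strength * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Data loss plus optional L1 and L2 penalty terms."""
    return data_loss + l1_penalty(weights, l1) + l2_penalty(weights, l2)


weights = [0.5, -2.0, 0.0, 1.5]  # hypothetical model weights
total = regularized_loss(1.0, weights, l1=0.1, l2=0.01)
```

With these numbers the absolute values sum to 4.0 and the squares sum to 6.5, so the total is 1.0 + 0.1 × 4.0 + 0.01 × 6.5 = 1.465. The zero weight contributes nothing to either penalty, which is the sparsity that L1 tends to encourage.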


379

Drag & Drop · medium

Drag and drop the steps to set up a batch prediction job using Vertex AI in the correct order.


Why this order

Prepare input data, register model, create job, submit, and retrieve results.


380

Multi-Select · hard

Which TWO services are commonly used together to implement an end-to-end ML pipeline that automatically retrains and deploys models on Vertex AI? (Choose two.)

Select 2 answers

A. Cloud Dataflow

B. Vertex AI Pipelines

C. Cloud Composer

D. Cloud Source Repositories

E. Cloud Scheduler

Answers: B, E

Pipelines orchestrate the training and deployment steps.

Why this answer

Vertex AI Pipelines (B) is the correct choice because it provides a serverless, scalable orchestration service specifically designed to build, run, and manage ML pipelines on Vertex AI. It enables you to define a directed acyclic graph (DAG) of steps—including data preprocessing, training, evaluation, and deployment—and can be triggered automatically to retrain and deploy models. Cloud Scheduler (E) is commonly used together with Vertex AI Pipelines to schedule pipeline runs at fixed intervals or in response to time-based triggers, forming a complete end-to-end automated retraining and deployment workflow.

Exam trap

Google Cloud often tests the distinction between general-purpose orchestration tools (Cloud Composer) and ML-native pipeline services (Vertex AI Pipelines). Candidates who are familiar with Airflow tend to pick Cloud Composer, even though Vertex AI Pipelines is the correct, integrated choice for end-to-end ML workflows on Vertex AI.


381

MCQ · easy

A retail company wants to forecast daily sales for inventory planning. They have 3 years of historical sales data with clear weekly and yearly seasonality. Which approach should they use?

A. Call a pre-built Google Cloud API for sales prediction

B. Use a linear regression model in Vertex AI

C. Use Vertex AI AutoML Tables with date as feature

D. Use BigQuery ML to train an ARIMA_PLUS model

Answer: D

ARIMA_PLUS handles seasonality and is optimized for time series.

Why this answer

Option D is correct because ARIMA_PLUS in BigQuery ML is specifically designed for time-series forecasting with multiple seasonalities (weekly and yearly). It automatically handles seasonality detection, trend decomposition, and holiday effects, making it ideal for retail sales data with clear periodic patterns.

Exam trap

The trap here is that candidates often choose AutoML Tables (Option C) thinking it can handle any structured data, but they miss that AutoML Tables is not a dedicated time-series model and requires manual feature engineering to capture seasonality, whereas ARIMA_PLUS is purpose-built for this scenario.

How to eliminate wrong answers

Option A is wrong because calling a pre-built Google Cloud API for sales prediction is vague and not a specific, integrated solution for time-series forecasting with seasonality; such APIs may not exist or may not handle custom seasonality patterns. Option B is wrong because linear regression in Vertex AI is a general-purpose model that does not inherently capture time-series dependencies like weekly and yearly seasonality without extensive feature engineering (e.g., lag features, Fourier terms). Option C is wrong because Vertex AI AutoML Tables with date as a feature treats the problem as a regression on tabular data, not as a dedicated time-series model, and may fail to properly model temporal autocorrelation and multiple seasonalities without manual time-series preprocessing.
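The Fourier terms mentioned for Option B are the kind of manual feature engineering a plain regression would need to capture seasonality. The sketch below builds sine and cosine features for a weekly and a yearly period. The function names and the number of harmonics are assumptions chosen for illustration.

```python
import math


def fourier_terms(day_index, period, order):
    """Return sin/cos pairs for harmonics 1..order of a seasonal period."""
    features = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * day_index / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features


def seasonal_features(day_index):
    """Weekly (period 7) and yearly (period 365.25) features for one day."""
    return fourier_terms(day_index, 7, 2) + fourier_terms(day_index, 365.25, 3)
```

Each day index maps to ten numeric columns here, and the weekly columns repeat every seven days. A model such as ARIMA_PLUS handles this decomposition internally, which is the point the explanation makes.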


382

MCQ · medium

A healthcare organization is building a machine learning model to predict patient readmission risk. They have sensitive data stored in BigQuery that includes protected health information (PHI). The data science team uses Vertex AI Workbench notebooks to explore the data and develop models. The organization's security policy requires that all PHI data must be encrypted at rest and in transit, and that access to the data is logged and audited. They also need to ensure that the data used for model training is de-identified to remove direct identifiers such as patient names and SSNs. The team wants to automate the de-identification process as part of the data pipeline. Which approach meets these requirements?

A. Create a Dataflow pipeline that reads from the original BigQuery table, applies Cloud DLP de-identification transforms, and writes to a new BigQuery table. Grant the data science team access to the de-identified table.

B. Enable Shielded VM on Vertex AI Workbench notebooks and use VPC-SC to restrict data access.

C. Use Cloud Key Management Service to encrypt the PHI columns in BigQuery, and share the encryption key with the data science team.

D. Use BigQuery row-level security to mask PHI columns for the data science team, and train the model directly on the original table.

Answer: A

Dataflow with DLP automates de-identification and creates a safe dataset.

Why this answer

Option A is correct because it uses Cloud DLP within a Dataflow pipeline to automatically de-identify PHI data as it is read from the original BigQuery table and written to a new, de-identified table. This satisfies the requirement for automated de-identification, while the original table remains encrypted at rest (BigQuery default) and in transit (TLS), and access to the original data can be logged via Cloud Audit Logs. The data science team only gets access to the de-identified table, ensuring PHI is not exposed during model development.

Exam trap

Google Cloud often tests the distinction between data masking/encryption (which still exposes PHI to authorized users) and true de-identification (which removes or transforms PHI so it is no longer considered protected health information).

How to eliminate wrong answers

Option B is wrong because Shielded VM and VPC-SC provide infrastructure security (integrity, network perimeter) but do not de-identify PHI data; the data science team would still see raw PHI in the notebooks. Option C is wrong because Cloud KMS encryption protects data at rest but does not remove or mask PHI columns; sharing the encryption key with the data science team would give them access to the raw PHI, violating the de-identification requirement. Option D is wrong because BigQuery row-level security filters which rows a user can query; it does not mask or transform PHI columns. Training directly on the original table would still expose the direct identifiers, and an access-time control is not the permanent de-identification that the automated pipeline requires.
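The difference between de-identifying data and merely restricting access can be sketched with a per-record transform of the kind a pipeline step would apply. The code below is a stand-in written with the Python standard library only. It does not call the Cloud DLP API, and the record fields, the salt, and the function names are hypothetical.

```python
import hashlib
import re

# Matches US Social Security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssn(text):
    """Replace any SSN-shaped token in free text with a placeholder."""
    return SSN_PATTERN.sub("[SSN]", text)


def pseudonymize(value, salt):
    """Replace a direct identifier with a salted, truncated SHA-256 token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]


def deidentify_record(record, salt):
    """Return a copy of the record with direct identifiers removed."""
    return {
        "patient_token": pseudonymize(record["patient_name"], salt),
        "notes": mask_ssn(record["notes"]),
        "age": record["age"],
    }
```

The output record no longer contains the name or the SSN, so the table written downstream is safe to share with the data science team. In a real pipeline the salt would be a managed secret rather than a literal.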


383

MCQ · easy

Refer to the exhibit. The team notices that the pipeline fails to read data from the specified Cloud Storage path. What is the most likely issue?

A. The bucket does not exist

B. The pipeline runner is incorrect

C. The region is mismatched

D. The service account lacks `storage.objectViewer` permission

Answer: D

The Dataflow service account needs read access to Cloud Storage.

Why this answer

The pipeline fails to read data from Cloud Storage because the service account lacks the Storage Object Viewer role (`roles/storage.objectViewer`), which grants the `storage.objects.get` and `storage.objects.list` permissions required to read objects. Without this role, the read request is denied at authorization, even if the bucket and path are correct.

Exam trap

Google Cloud often tests the distinction between bucket-level permissions (like `storage.objectViewer`) and project-level roles, leading candidates to overlook that the service account must have the specific IAM role on the bucket or project, not just any storage role.

How to eliminate wrong answers

Option A is wrong because if the bucket did not exist, the error would typically be a 404 'Bucket not found' or a similar explicit message, not a generic read failure. Option B is wrong because the pipeline runner (e.g., DataflowRunner, DirectRunner) determines where the pipeline logic executes, not whether the service account is authorized to read from Cloud Storage; a wrong runner would cause execution errors, not permission-related read failures. Option C is wrong because Cloud Storage objects can be read from any region when the caller has permission; a region mismatch may add latency or egress cost, but it would not block read access.


384

MCQ · hard

Refer to the exhibit. An engineer notices no drift alerts but the model performance has degraded. What is the likely cause?

A. Feature attribution monitoring is causing too many false positives

B. Drift threshold for income is too high

C. Skew thresholds are not configured for categorical features

D. Concept drift is occurring, which is not captured by drift or skew detection

Answer: D

Concept drift affects the model's predictive relationship, not input distributions.

Why this answer

Concept drift occurs when the relationship between the input features and the target variable changes over time, causing model performance to degrade even when the input data distribution remains stable. Drift detection (e.g., data drift or skew) monitors changes in feature distributions, not the relationship between features and the target. Since no drift alerts were triggered, the input data appears unchanged, but the model's predictive relationship has shifted. This is classic concept drift, which requires performance monitoring (e.g., accuracy, F1-score) against ground-truth labels rather than drift or skew detection.

Exam trap

Google Cloud often tests the distinction between data drift (input feature changes) and concept drift (target relationship changes), trapping candidates who assume that no drift alerts mean the model is healthy, when in fact performance degradation can occur without any feature distribution shift.

How to eliminate wrong answers

Option A is wrong because feature attribution monitoring (e.g., SHAP values) explains model predictions but does not generate false positives for drift; it is unrelated to the absence of drift alerts. Option B is wrong because a drift threshold for income being too high would suppress drift alerts for that feature, but the scenario states no drift alerts at all, and concept drift is not captured by feature-level drift thresholds. Option C is wrong because skew thresholds for categorical features detect distribution shifts in those features, but the problem is a change in the target relationship (concept drift), not a change in feature distributions.
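A toy example can show how performance degrades while the inputs look unchanged. In the sketch below the income values are identical in both periods, so any feature-distribution check reports no drift, yet the labelling rule moves from "income above 50" to "income above 70". The frozen model, the thresholds, and the data are all invented for illustration.

```python
def model_predict(income):
    """Frozen model learned at training time: predict 1 when income > 50."""
    return 1 if income > 50 else 0


def accuracy(incomes, labels):
    """Fraction of examples where the frozen model matches the label."""
    correct = sum(model_predict(x) == y for x, y in zip(incomes, labels))
    return correct / len(incomes)


# 10, 20, ..., 100: the same inputs are observed in both periods.
incomes = list(range(10, 110, 10))

labels_at_training = [1 if x > 50 else 0 for x in incomes]
labels_after_shift = [1 if x > 70 else 0 for x in incomes]  # the concept moved

accuracy_before = accuracy(incomes, labels_at_training)
accuracy_after = accuracy(incomes, labels_after_shift)
```

Accuracy falls from 1.0 to 0.8 because the model now mislabels incomes of 60 and 70, and only a metric computed against fresh ground-truth labels reveals it.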


385

Multi-Select · medium

A team wants to serve a large PyTorch model (3 GB) for online predictions with low latency. Which THREE actions should they take?

Select 3 answers

A. Use a custom container that preloads the model into memory.

B. Use batch prediction instead of online prediction.

C. Use a machine type with a GPU accelerator.

D. Optimize the model using TorchScript or quantization.

E. Deploy in multiple regions with Cloud Load Balancing.

Answers: A, C, D

Preloading avoids loading model on each request, reducing latency.

Why this answer

Options A, C, and D are correct. Option A: a custom container that preloads the model into memory avoids reloading the 3 GB model on each request and reduces cold-start latency. Option C: a GPU accelerator speeds up inference. Option D: model optimization (TorchScript, quantization) reduces inference time.

Option E (multi-region deployment) reduces network latency, not prediction latency. Option B (batch prediction) is not suitable for online serving.
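The benefit of preloading can be sketched by counting how often a model is loaded under each serving pattern. `FakeModel` below is a stand-in with an artificial delay and is not a PyTorch model. The class and function names are hypothetical.

```python
import time


class FakeModel:
    """Stand-in for a large model; loading is artificially slow."""

    load_count = 0

    def __init__(self):
        FakeModel.load_count += 1
        time.sleep(0.01)  # pretend to read several GB of weights

    def predict(self, x):
        return x * 2


class PreloadedServer:
    """Loads the model once at startup and reuses it for every request."""

    def __init__(self):
        self.model = FakeModel()

    def handle(self, x):
        return self.model.predict(x)


def handle_without_preload(x):
    """Anti-pattern: reloads the model on every request."""
    return FakeModel().predict(x)
```

Serving five requests through `PreloadedServer` pays the load cost once, while five calls to `handle_without_preload` pay it five times. That per-request cost is the latency preloading removes.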


386

MCQ · easy

An ML engineer runs this command to upload a model. The model artifact in Cloud Storage is a directory containing model.pkl and a custom preprocessing script. What will happen when he later deploys this model to an endpoint and sends a prediction request?

A. The prediction will succeed because the pre-built container automatically detects and uses the custom preprocessing script.

B. The prediction will succeed only if he also specifies a custom prediction routine.

C. The prediction will fail because the custom preprocessing script is not a standard scikit-learn serialized object.

D. The prediction will fail because the artifact URI must point to a single file not a directory.

Answer: C

The pre-built container only loads the model; custom preprocessing is not executed.

Why this answer

Option C is correct because the pre-built container for scikit-learn expects a single serialized model file (e.g., model.pkl) as the artifact. A directory containing a custom preprocessing script is not a standard scikit-learn serialized object, so the container cannot load or execute it, causing the prediction to fail.

Exam trap

Google Cloud often tests the misconception that a pre-built container can handle arbitrary directories or custom scripts, when in fact it strictly expects a single serialized model file.

How to eliminate wrong answers

Option A is wrong because the pre-built container does not automatically detect or use custom preprocessing scripts; it only loads a single model file. Option B is wrong because specifying a custom prediction routine would not fix the issue—the artifact must still be a single file, and the custom routine would need to be packaged differently (e.g., as a source distribution). Option D is wrong because the artifact URI can point to a directory; the failure is due to the directory containing a non-standard object, not because it is a directory.


387

MCQ · easy

A company deploys a model on Vertex AI Prediction for real-time inference. Users report intermittent high latency during peak hours. The model is deployed on a single machine type with `min_replica_count=1` and `max_replica_count=5`. Autoscaling is enabled based on CPU utilization. What is the most likely cause of the latency spikes?

A. The model server is crashing under load due to memory issues.

B. Autoscaling based on CPU utilization does not react quickly to inference request spikes.

C. The load balancer is misconfigured and routes traffic unevenly.

D. The container image is not optimized for the model.

Answer: B

CPU utilization may lag behind request surges; Vertex AI recommends using target utilization or custom metrics for faster response.

Why this answer

Option B is correct because CPU utilization may not be a good proxy for inference load; the system may not scale up fast enough under sudden traffic bursts. Option A is wrong because Vertex AI automatically manages container health, and the scenario reports latency spikes rather than failed requests. Option C is wrong because Vertex AI endpoints automatically distribute traffic across replicas.

Option D is wrong because nothing in the scenario points to the container image; an unoptimized image would raise latency at all times, not only during peak hours.
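Utilization-based autoscaling is reactive: the replica count is only recomputed from utilization that has already been observed. The sketch below uses the proportional rule popularized by the Kubernetes Horizontal Pod Autoscaler. Treating Vertex AI as following this exact formula is an assumption, and the 0.6 target is an illustrative value.

```python
import math


def desired_replicas(current_replicas, observed_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Proportional scaling rule, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * observed_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))
```

With one replica at 90% CPU and a 0.6 target, the rule asks for two replicas, but only after the spike has already pushed utilization up and the new replica has started. Requests arriving in that window are served by the overloaded replica, which matches the intermittent peak-hour latency in the question.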


388

MCQ · easy

A company has deployed a computer vision model on Vertex AI Prediction using a custom container. The model processes high-resolution images and serves predictions to a mobile application. Recently, users have reported that predictions sometimes take over 10 seconds, and the application times out. The ML engineer's monitoring shows that the endpoint's CPU utilization is consistently high (above 85%) and that the request latency spikes during peak hours. The model is deployed on n1-standard-4 machines with automatic scaling set to minReplicaCount=1 and maxReplicaCount=5. The engineer has observed that the endpoint rarely scales beyond 2 replicas even during peak hours. What should the engineer do to reduce prediction latency?

A. Increase the maxReplicaCount to 20 to allow more instances during spikes.

B. Review the custom container's startup time and consider pre-warming or reducing model loading time.

C. Change the machine type to a higher CPU machine like n1-standard-8.

D. Set the minReplicaCount to 5 to ensure enough capacity at all times.

Answer: B

The endpoint rarely scales beyond 2 replicas due to slow container startup, causing CPU overload on existing instances. Reducing startup time or pre-warming enables faster scaling and lower latency.

Why this answer

Option B is correct because the root cause is likely that the custom container takes a long time to start (model loading), preventing the endpoint from scaling quickly. Pre-warming or reducing model loading time addresses this directly. Option A (increasing max replicas) does not solve the scaling delay, since the endpoint already stays well below its current maximum of 5.

Option C (upgrading machine type) may help but does not address the scaling speed. Option D (increasing min replicas) would be costly and still not handle sudden spikes if new replicas start slowly.


389

MCQ · medium

A manufacturing company wants to predict equipment failure using sensor data stored in BigQuery. They have limited ML expertise and want to use AutoML Tables. The data includes timestamps, numerical sensor readings, and a boolean 'failure' column. The dataset is highly imbalanced with only 1% failure cases. Which of the following is the most effective approach to handle the imbalance in AutoML Tables?

A. Let AutoML Tables handle the imbalance automatically; it has built-in techniques for class imbalance.

B. Downsample the majority class to balance the dataset.

C. Use a custom loss function in the training configuration.

D. Oversample the minority class using SQL before training.

Answer: A

AutoML Tables automatically adjusts for imbalance.

Why this answer

AutoML Tables has built-in techniques to handle class imbalance, such as automatically adjusting class weights and using stratified sampling during training. This allows the model to learn from the minority class without requiring manual data preprocessing, making it the most effective and simplest approach for users with limited ML expertise.

Exam trap

The trap here is that candidates may assume manual resampling (downsampling or oversampling) is always required for imbalanced datasets, but AutoML Tables abstracts this complexity, and the exam tests whether you trust its built-in capabilities for low-code solutions.

How to eliminate wrong answers

Option B is wrong because downsampling the majority class would discard valuable data, potentially reducing model performance and losing information about normal operating conditions. Option C is wrong because AutoML Tables does not expose a custom loss function configuration; it abstracts away such hyperparameters and uses its own optimized training pipeline. Option D is wrong because oversampling the minority class using SQL before training is unnecessary and could lead to overfitting or data leakage; AutoML Tables handles imbalance internally without manual intervention.
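Class weighting, one of the techniques named above, can be illustrated without discarding or duplicating any rows. The sketch below applies the common "balanced" heuristic, n_samples / (n_classes × class_count), to a 1% failure rate. It illustrates the general idea and is not a description of AutoML Tables internals.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 1,000 rows with a 1% failure rate.
labels = [False] * 990 + [True] * 10
weights = balanced_class_weights(labels)
```

Each failure row receives a weight of 50.0 and each normal row about 0.505, so both classes contribute equally to the loss while every original row is kept.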


390

MCQ · hard

A company wants to use low-code ML for time series forecasting with 5 years of hourly data. They need to incorporate holiday effects. Which solution best meets these requirements?

A. Custom LSTM model

B. BigQuery ML ARIMA_PLUS with holiday regression

C. Vertex AI AutoML Tables with timestamp and holiday features

D. Vertex AI AutoML Forecasting with timestamp and holiday feature

Answer: B

ARIMA_PLUS directly supports holiday effects in its model.

Why this answer

BigQuery ML ARIMA_PLUS with holiday regression is the correct choice because it is a low-code solution that natively supports time series forecasting with built-in holiday effect modeling. ARIMA_PLUS automatically handles seasonality, trend, and holiday regression without requiring custom code, making it ideal for 5 years of hourly data.

Exam trap

Google Cloud often tests the distinction between AutoML Forecasting and BigQuery ML ARIMA_PLUS, where candidates mistakenly assume AutoML Forecasting natively handles holiday regression, but it requires explicit feature engineering, while ARIMA_PLUS provides built-in holiday support.

How to eliminate wrong answers

Option A is wrong because a custom LSTM model requires significant coding and ML expertise, violating the low-code requirement. Option C is wrong because Vertex AI AutoML Tables is designed for tabular data and does not natively support time series forecasting with holiday effects; it would require manual feature engineering. Option D is wrong because Vertex AI AutoML Forecasting does not natively incorporate holiday regression; it focuses on time series features but lacks built-in holiday effect handling, requiring additional preprocessing.
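The low-code nature of Option B shows in how little is needed to declare the model. The helper below only assembles a BigQuery ML `CREATE MODEL` statement as a string and does not run it. The model, table, and column names passed in are hypothetical, and the `US` holiday region is an assumed choice.

```python
def arima_plus_create_model_sql(model_name, source_table, timestamp_col,
                                value_col, holiday_region):
    """Build a BigQuery ML CREATE MODEL statement for an ARIMA_PLUS model."""
    return f"""
CREATE OR REPLACE MODEL `{model_name}`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = '{timestamp_col}',
  time_series_data_col = '{value_col}',
  data_frequency = 'HOURLY',
  holiday_region = '{holiday_region}'
) AS
SELECT {timestamp_col}, {value_col}
FROM `{source_table}`
""".strip()


# Hypothetical identifiers, for illustration only.
sql = arima_plus_create_model_sql(
    "demo_dataset.hourly_forecast", "demo_dataset.hourly_readings",
    "reading_time", "reading_value", "US")
```

The holiday effect is requested through a single option, with no custom training code or hand-built holiday features.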


391

MCQ · hard

Your team deploys a multi-model endpoint on Vertex AI with two models: Model A (small, low latency) and Model B (large, high latency). You configure traffic splitting so that 90% goes to Model A and 10% to Model B. However, you notice that the latency for Model A increases when Model B receives traffic. What is the most likely cause?

A. Model A is being overloaded because autoscaling is based on aggregate traffic.

B. The traffic split is misconfigured, causing requests to be routed incorrectly.

C. The models are collocated on the same instances, leading to resource contention.

D. Model B's logging is generating too much output, slowing down the predictor.

Answer: C

Multi-model endpoints share replicas; Model B's work impacts Model A.

Why this answer

In a multi-model endpoint, all models share the underlying infrastructure. When Model B handles requests, it consumes resources (CPU/memory), causing contention that degrades Model A's latency. Collocation of models on the same instance is the issue.


392

MCQ · easy

A startup is deploying its first machine learning model using BigQuery ML. The model is a logistic regression for churn prediction, trained on a dataset of 5 million rows. The pipeline runs every week: it exports training data from BigQuery, trains a model using BigQuery ML, and then deploys the model as a remote model for predictions. The ML engineer wants to set up basic monitoring to ensure the pipeline runs successfully and the model quality does not degrade. Which monitoring approach should the engineer implement first?

A. Set up Cloud Monitoring alerts on the pipeline's execution status and duration, and create a simple dashboard showing these metrics.

B. Export BigQuery audit logs to Cloud Logging and analyze them for any errors.

C. Enable Vertex AI Model Monitoring to detect data drift between training and serving data.

D. Monitor the model's area under the ROC curve (AUC) over time and alert if it drops by more than 0.01.

Answer: A

Fundamental monitoring ensures pipeline runs successfully.

Why this answer

Option A is correct because the first priority in monitoring a new ML pipeline is ensuring it runs successfully and on time. Cloud Monitoring alerts on execution status and duration directly address pipeline reliability, which is the most basic operational concern before model quality metrics like AUC or drift can be meaningful. This approach aligns with the principle of starting with infrastructure health before advanced model monitoring.

Exam trap

Google Cloud often tests the principle of 'start with the basics' — candidates are tempted to jump to advanced monitoring like drift or AUC, but the correct first step is ensuring the pipeline runs reliably.

How to eliminate wrong answers

Option B is wrong because exporting BigQuery audit logs to Cloud Logging and analyzing them for errors is a reactive, post-hoc approach that does not provide real-time pipeline monitoring; it also adds complexity without addressing the immediate need for basic pipeline health checks. Option C is wrong because Vertex AI Model Monitoring for data drift is an advanced monitoring technique that requires a stable serving environment and baseline data, which is premature for a first deployment; it also incurs additional cost and setup time. Option D is wrong because monitoring AUC over time and alerting on a drop of 0.01 assumes the model is already in production with a baseline, but the question asks for the first monitoring step, which should be pipeline execution success, not model performance degradation.
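The AUC check in Option D depends on a baseline value and on labelled predictions, which is why it comes after basic pipeline health. A minimal sketch of what that later check involves is shown below. The pairwise AUC computation is standard, while the function names and the sample scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_degraded(baseline_auc, current_auc, max_drop=0.01):
    """True when AUC has fallen by more than `max_drop` from the baseline."""
    return (baseline_auc - current_auc) > max_drop
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. Without a stored baseline there is nothing to compare that number against.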


393

MCQ · hard

Your model serving endpoint on Vertex AI is experiencing increased memory usage after a recent update. The model was converted from TensorFlow to TF Lite for faster inference. You notice that the endpoint's instances occasionally get killed due to out-of-memory (OOM) errors. What is the most likely cause?

A. The TF Lite model is larger in size than the original model.

B. The Vertex AI endpoint is not configured with enough CPU.

C. The number of inference threads in the TF Lite runtime is set too high, causing memory consumption.

D. The traffic to the endpoint has increased significantly.

Answer: C

TF Lite can use multiple threads; excessive threads increase memory.

Why this answer

TF Lite models can have different memory footprint depending on the number of threads used for inference. If the custom container or the runtime allocates many threads, memory usage can spike. The model conversion itself may not reduce memory; thread count is a key factor.


394

Drag & Drop · medium

Drag and drop the steps to set up model monitoring for drift detection on Vertex AI in the correct order.


Why this order

Deploy first, then enable monitoring, set thresholds, configure alerts, and review.


395

Drag & Drop · medium

Drag and drop the steps to perform a hyperparameter tuning job on Vertex AI in the correct order.


Why this order

Define the search space, then create and run the tuning job, monitor, and select the best parameters.


396

Multi-Select · hard

Which TWO factors should you consider when choosing between BigQuery and Cloud Storage for storing training data? (Choose 2)

Select 2 answers

A. The format of the data: structured vs. unstructured.

B. The need for SQL-based transformations and analysis on the data.

C. The requirement for data encryption at rest.

D. The need for fine-grained access control at the row level.

E. The maximum size of the dataset (BigQuery limit 1 TB).

Answers: A, B

Correct: Cloud Storage is better for unstructured data.

Why this answer

Option A is correct because BigQuery is optimized for structured, tabular data (e.g., data loaded from CSV, Avro, or Parquet) and supports SQL queries, while Cloud Storage is a better fit for unstructured data (e.g., images, videos, raw text files) that does not require schema enforcement. Option B is correct because BigQuery lets you transform and analyze training data in place with SQL, whereas files in Cloud Storage are typically processed by a separate engine. Choosing the right storage depends on whether the training data has a fixed schema and requires relational querying or is blob-based and needs high-throughput access.

Exam trap

Google Cloud often tests the misconception that BigQuery has a hard 1 TB storage limit. In reality BigQuery is built for petabyte-scale datasets and imposes no such cap, so option E is not a valid factor in the choice.


397

MCQ · medium

A machine learning team has a prototype using a custom TensorFlow model trained on a small dataset stored in Cloud Storage. They want to scale the prototype to production with minimal code changes while ensuring the model can handle increased traffic and new data. The model currently loads data using tf.data.Dataset from CSV files. Which approach best meets these requirements?

A. Use Vertex AI Training with hyperparameter tuning and distributed training, then deploy the model to Vertex AI Prediction with autoscaling.

B. Deploy the model to AI Platform (Unified) Prediction with a custom container, and use AI Platform Training to retrain on larger datasets.

C. Migrate the model to BigQuery ML and use SQL for training and prediction to leverage BigQuery's scalability.

D. Package the model as a Cloud Run Function and use Cloud Scheduler to trigger retraining periodically.

Answer: A

Vertex AI provides seamless scaling with minimal code changes and supports tf.data.Dataset.

Why this answer

Vertex AI Prediction with autoscaling directly addresses the need to handle increased traffic without code changes, while Vertex AI Training with hyperparameter tuning and distributed training enables scaling to larger datasets with minimal modifications to the existing tf.data pipeline. This approach keeps the custom TensorFlow model intact and leverages managed infrastructure for both training and serving.

Exam trap

The trap here is that candidates may overcomplicate by choosing containerization (B) or a completely different platform (C), missing that Vertex AI Prediction natively supports TensorFlow models with autoscaling and minimal code changes.

How to eliminate wrong answers

Option B is wrong because it suggests using AI Platform (Unified) Prediction with a custom container, which is unnecessary and adds complexity; the existing model can be deployed directly without containerization, and the requirement is minimal code changes. Option C is wrong because migrating to BigQuery ML would require rewriting the model logic from TensorFlow to SQL, which is a significant code change and not suitable for a custom TensorFlow model. Option D is wrong because Cloud Run Functions are stateless and not designed for serving ML models with autoscaling for prediction traffic; Cloud Scheduler for retraining does not address the need for handling increased traffic or new data in a production serving path.

398

MCQeasy

A company is serving a model for their e-commerce website. They expect traffic to be low at night and very high during flash sales. They want to minimize costs while ensuring availability during spikes. Which autoscaling configuration should they use?

A.min_replica_count=5, max_replica_count=5, target_cpu=60

B.min_replica_count=1, max_replica_count=20, target_cpu=60

C.min_replica_count=10, max_replica_count=10, target_cpu=60

D.min_replica_count=0, max_replica_count=100, target_cpu=80

AnswerB

Scales from 1 to 20 based on load, cost-efficient.

Why this answer

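Setting a high max_replica_count allows the endpoint to scale out during spikes, while a low min_replica_count saves cost during low traffic. A CPU utilization target of 60% leaves headroom for new replicas to start before existing ones saturate. Options A and C pin the replica count, so they either overpay at night or cannot absorb a flash sale.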
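To make the trade-off concrete, the sketch below models target-tracking autoscaling with a simplified rule: desired replicas = ceil(current replicas × observed CPU / target CPU), clamped to the configured bounds. This rule and the function name `desired_replicas` are illustrative assumptions, not Vertex AI's actual scaling algorithm; the traffic numbers are hypothetical.

```python
import math

def desired_replicas(current, cpu_pct, target_cpu_pct, min_replicas, max_replicas):
    """Simplified target-tracking rule (an assumption, not Vertex AI's algorithm)."""
    raw = math.ceil(current * cpu_pct / target_cpu_pct)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))

# Option B: min=1, max=20, target CPU 60%.
night = desired_replicas(current=1, cpu_pct=10, target_cpu_pct=60,
                         min_replicas=1, max_replicas=20)
spike = desired_replicas(current=4, cpu_pct=95, target_cpu_pct=60,
                         min_replicas=1, max_replicas=20)

# Option A: replica count pinned at 5, so it cannot grow during the spike.
pinned = desired_replicas(current=5, cpu_pct=95, target_cpu_pct=60,
                          min_replicas=5, max_replicas=5)

print(night, spike, pinned)
```

Under these assumed numbers, the elastic configuration stays at one replica overnight and grows during the spike, while the pinned configuration stays at five regardless of load.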

399

MCQeasy

A company has a prototype ML model that works well on historical data, but when deployed to production, the model performance degrades over time. The data distribution shifts gradually. Which strategy should they implement to maintain model accuracy?

A.Increase the regularization strength to prevent overfitting.

B.Increase the amount of training data by using more historical records.

C.Implement a retraining pipeline that periodically retrains the model on recent data.

D.Switch to a more complex model architecture to better capture patterns.

AnswerC

Periodic retraining with fresh data helps the model adapt to gradual distribution shifts.

Why this answer

Option C is correct because gradual shifts in the data distribution (data drift, often accompanied by concept drift) require the model to adapt to new patterns over time. A retraining pipeline that periodically retrains on recent data keeps the model aligned with the current production distribution, directly addressing the degradation caused by drift instead of relying on static historical data.

Exam trap

Google Cloud often tests the misconception that overfitting or model complexity is the primary cause of production degradation, leading candidates to choose regularization or more complex architectures instead of recognizing that distribution shift requires data freshness.

How to eliminate wrong answers

Option A is wrong because increasing regularization strength reduces overfitting to historical noise but does not address the root cause—distribution shift—and may actually harm performance on new data by forcing the model to ignore legitimate new patterns. Option B is wrong because adding more historical records only reinforces the old distribution, making the model less responsive to recent shifts and potentially worsening drift. Option D is wrong because switching to a more complex model architecture increases capacity to fit data but does not solve the problem of stale training distribution; it may even overfit to outdated patterns and degrade faster under drift.

400

MCQeasy

A company has deployed a fraud detection model on Vertex AI Prediction. After three months, the model's accuracy has degraded, and the business is losing money due to undetected fraud. What should the team implement to proactively detect such issues?

A.Enable Vertex AI Model Monitoring to track prediction drift and alert when metrics exceed thresholds.

B.Set up Cloud Logging to capture all prediction requests and responses for manual review.

C.Randomly shuffle the training data before retraining to improve robustness.

D.Schedule a monthly job to retrain the model with the latest data without monitoring.

AnswerA

Model Monitoring automatically analyzes input distributions and prediction quality over time.

Why this answer

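Option A is correct because Vertex AI Model Monitoring tracks prediction drift automatically and alerts when configured thresholds are exceeded, which is what proactive detection requires. Option B is wrong because logs capture requests and responses but do not detect drift on their own, and manual review does not scale. Option C is wrong because shuffling the training data does not address drift.

Option D is wrong because scheduled retraining can help accuracy, but retraining without monitoring does not detect degradation.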
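As a rough illustration of what a drift check computes, the sketch below compares a training histogram with serving histograms using Jensen-Shannon divergence, one commonly used distance for this kind of comparison. The histograms, the threshold, and the helper names are hypothetical; this is not the Model Monitoring implementation.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def normalize(counts):
    """Turn raw bucket counts into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical bucket counts for one feature.
training = normalize([50, 30, 15, 5])
serving_stable = normalize([48, 32, 14, 6])
serving_shifted = normalize([10, 20, 30, 40])

THRESHOLD = 0.1  # hypothetical alerting threshold

for name, dist in [("stable", serving_stable), ("shifted", serving_shifted)]:
    score = js_divergence(training, dist)
    status = "ALERT" if score > THRESHOLD else "ok"
    print(f"{name}: {score:.4f} {status}")
```

A monitoring service runs a comparison of this kind on a schedule and notifies the team when the score crosses the threshold, which is the step that manual log review or blind retraining leaves out.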

401

Matchingmedium

Match each ML acronym to its definition.

Concepts

Matches

AUC: Area Under the ROC Curve

MSE: Mean Squared Error

TPU: Tensor Processing Unit

SVM: Support Vector Machine

PCA: Principal Component Analysis

Why these pairings

These are standard ML acronyms used in Google Cloud ML exams.

402

MCQeasy

A global retail company uses Vertex AI Recommendations to provide product recommendations on their website. They have a large catalog and millions of users. The initial deployment works well for active users, but they notice that new users (with no purchase history) receive generic recommendations that are not personalized. The company wants to improve the cold-start experience. They have user demographic data (age, location) available at sign-up. Current recommendation model is a collaborative filtering model using the built-in Vertex AI Recommendations. What should the company do to improve personalization for new users?

A.Collect more historical interaction data before showing recommendations

B.Disable recommendations for new users until they have at least 10 interactions

C.Increase the user exploration parameter in the Vertex AI Recommendations configuration

D.Build a custom two-tower recommendation model using Vertex AI Training

AnswerC

Exploration helps serve diverse items to new users to learn preferences.

Why this answer

Option C is correct because increasing the user exploration parameter in Vertex AI Recommendations instructs the model to allocate a higher percentage of recommendations to items with less historical data, effectively enabling personalized suggestions for cold-start users based on available demographic signals. This parameter directly controls the balance between exploiting known user-item interactions and exploring new or less-seen items, which is the standard mechanism within Vertex AI's built-in collaborative filtering to address the cold-start problem without requiring a custom model.

Exam trap

Google Cloud often tests the misconception that cold-start problems always require custom models or additional data collection, when in fact built-in platform features like exploration parameters are designed specifically to handle this scenario without custom development.

How to eliminate wrong answers

Option A is wrong because collecting more historical interaction data before showing recommendations does not solve the immediate cold-start problem for new users; it merely delays personalization and contradicts the goal of improving the experience from sign-up. Option B is wrong because disabling recommendations for new users until they have at least 10 interactions is a poor user experience and ignores the fact that Vertex AI Recommendations can leverage user demographic data (age, location) to provide personalized suggestions even without purchase history. Option D is wrong because building a custom two-tower recommendation model using Vertex AI Training is unnecessary and over-engineered; Vertex AI's built-in service already supports exploration parameters and can utilize demographic features for cold-start personalization without requiring custom model development.

403

Matchingmedium

Match each ML model interpretability method to its description.

Concepts

Matches

Game-theoretic approach to explain feature contributions

Local surrogate model to explain individual predictions

Ranking features by their impact on model output

Shows marginal effect of a feature on predictions

Measures decrease in performance when feature is shuffled

Why these pairings

Interpretability is key for trustworthy ML.

404

MCQmedium

Refer to the exhibit. A user receives the error shown when trying to upload a model to Vertex AI. What is the most likely cause?

A.The container image 'gcr.io/cloud-aiplatform/prediction/tf2-cpu.2-12:latest' is not accessible.

B.The user does not have the 'roles/aiplatform.admin' or the 'aiplatform.models.upload' permission on the project.

C.The user specified an incorrect region (us-central1) that does not support Vertex AI.

D.The Cloud Storage bucket 'gs://my-model-artifacts/fraud-detection/v2/' does not exist.

AnswerB

Permission denied errors typically indicate missing IAM roles.

Why this answer

The error message indicates a permission issue during model upload. The user lacks the 'aiplatform.models.upload' permission or the broader 'roles/aiplatform.admin' role on the project. Vertex AI requires these IAM permissions to authorize the upload action, regardless of other resource accessibility.

Exam trap

Google Cloud often tests the distinction between permission errors and resource availability errors, trapping candidates who assume the error is due to a missing bucket or container image rather than IAM misconfiguration.

How to eliminate wrong answers

Option A is wrong because if the container image were inaccessible, the error would typically occur during deployment or prediction, not during the upload step, and the error message would reference image pull failures (e.g., 'ImagePullBackOff'). Option C is wrong because us-central1 is a fully supported region for Vertex AI; the error does not mention region unavailability. Option D is wrong because if the Cloud Storage bucket did not exist, the error would be a 404 or 'bucket not found' message, not a permission-denied error.

405

MCQhard

Refer to the exhibit. A user attempts to upload a model to Vertex AI Model Registry using the gcloud CLI. The command fails with the error shown. What is the most likely cause?

A.The region us-central1 does not support Vertex AI

B.The --container-command flag is misspelled

C.The --artifact-uri points to a directory instead of a model file

D.The --container-ports flag expects a comma-separated list

AnswerC

Error indicates URI must point to a single file.

Why this answer

The error indicates that the `--artifact-uri` flag points to a directory (e.g., `gs://bucket/model/`) rather than a specific model file (e.g., `gs://bucket/model/saved_model.pb`). Vertex AI Model Registry requires a direct path to the model artifact file, not a container directory, because the service needs to locate and register the exact model binary for deployment.

Exam trap

Google Cloud often tests the distinction between a directory path and a file path in cloud CLI commands, exploiting the common mistake of assuming a folder URI is acceptable when the service expects a specific artifact file.

How to eliminate wrong answers

Option A is wrong because us-central1 is a fully supported region for Vertex AI, including Model Registry, and the error message does not indicate a regional restriction. Option B is wrong because the `--container-command` flag is correctly spelled in the command; the error is unrelated to flag spelling. Option D is wrong because the `--container-ports` flag does accept a comma-separated list, but the error message points to the `--artifact-uri` value, not to the ports flag.

406

MCQhard

A team deploys a real-time model using a custom container on Vertex AI Prediction. The container is large (5 GB) and cold starts are causing latency spikes. The endpoint is configured with `min_replica_count=0` to reduce cost. The team wants to keep the cost low while reducing cold starts. What is the best approach?

A.Set `min_replica_count=1` to keep at least one replica always warm.

B.Use a prebuilt container for the model framework to reduce image size.

C.Enable container memory optimization to reduce startup time.

D.Provision a Persistent Disk (SSD) for the container image to speed up download.

AnswerA

A single warm replica handles traffic immediately while autoscaling adds more.

Why this answer

Option A is correct because configuring a minimum number of always-on replicas (e.g., 1) eliminates cold starts for most traffic at the cost of a single replica. Option B is wrong because switching to a prebuilt container does not remove the startup overhead of scaling up from zero. Option C is wrong because memory optimization may not help when the container image itself is large.

Option D is wrong because faster storage might speed up the image download but does not eliminate cold-start latency.

407

Multi-Selecteasy

A data analyst wants to use low-code ML to analyze text data. Which TWO Google Cloud services are appropriate?

Select 2 answers

A.Vertex AI Workbench

B.Document AI

C.Cloud Natural Language API

D.AutoML Natural Language

E.BigQuery ML for sentiment

AnswersC, D

Correct: Pre-trained sentiment and entity analysis via API.

Why this answer

Cloud Natural Language API is a low-code ML service that provides pre-trained models for analyzing text, including sentiment analysis, entity recognition, and syntax analysis, without requiring custom model training. It is appropriate for a data analyst who wants to quickly extract insights from text data using simple API calls.

Exam trap

The trap here is that candidates may confuse BigQuery ML's sentiment analysis feature (which is SQL-based and not a dedicated low-code service) with a standalone low-code ML service, or mistakenly think Vertex AI Workbench is low-code when it actually requires coding in Python or other languages.

408

MCQeasy

A machine learning engineer wants to manage multiple model versions and facilitate collaboration across teams. The goal is to track model lineage, versioning, and approvals. Which Vertex AI service should they use?

A.Vertex AI Model Registry

B.Vertex AI ML Metadata

C.Vertex AI Feature Store

D.Vertex AI Vizier

AnswerA

Model Registry is designed for model versioning, lifecycle management, and collaboration.

Why this answer

Option A is correct because Model Registry provides versioning, approval tracking, and integration with Vertex AI Pipelines. Option B is wrong because ML Metadata is lower-level and less user-friendly. Option C is wrong because Feature Store stores features, not models.

Option D is wrong because Vizier is for hyperparameter tuning.

409

MCQmedium

You run the above command to deploy a new model version to an existing endpoint. After deployment, you observe that the endpoint's previous model version is still receiving 100% of traffic. What is the most likely reason for this?

A.The new model is still in the 'creating' state and hasn't been activated.

B.The model ID provided does not exist in the endpoint.

C.The --traffic-split flag is specified incorrectly; it should use model IDs, not '0-100'.

D.The min-replica-count is too high, preventing traffic splitting.

AnswerC

Correct syntax requires model IDs with percentages.

Why this answer

The traffic-split flag syntax is incorrect. The correct syntax for Vertex AI is --traffic-split=<model-id>=<percentage> for each model. Without correct model IDs, the flag is ignored, and no traffic split is applied, so the existing version continues to receive all traffic.
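The difference between the two forms can be sketched with a small parser. The helper `parse_traffic_split` is hypothetical and is not part of gcloud or the Vertex AI SDK; treating the value as comma-separated `ID=PERCENT` pairs that must sum to 100 is an assumption made for illustration.

```python
def parse_traffic_split(value):
    """Parse 'ID=PERCENT,ID=PERCENT' into a dict (hypothetical helper)."""
    split = {}
    for part in value.split(","):
        model_id, sep, pct = part.partition("=")
        if not sep or not model_id or not pct.isdigit():
            raise ValueError(f"expected ID=PERCENT, got {part!r}")
        split[model_id] = int(pct)
    if sum(split.values()) != 100:
        raise ValueError("percentages must sum to 100")
    return split

# Well-formed: old model gets 0%, new model gets 100% (IDs are made up).
print(parse_traffic_split("111=0,222=100"))

# Malformed: '0-100' names no model ID and carries no percentage.
try:
    parse_traffic_split("0-100")
except ValueError as err:
    print("rejected:", err)
```

A value such as `0-100` never associates a percentage with a deployed model, which is why it cannot shift traffic to the new version.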

410

MCQmedium

A developer sees this error when calling the endpoint. What is the most likely cause?

A.The model is still in training

B.The model is deployed but not yet serving

C.The endpoint has no deployed model

D.The request payload size exceeds limit

AnswerB

Correct: Model deployment is still initializing.

Why this answer

The error 'model is not serving' occurs when the endpoint exists and a model is deployed, but the deployment is not yet ready to serve (e.g., still loading, scaling, or warming up). On Vertex AI, the deploy operation must complete and the model server must be healthy before the endpoint can serve inference requests. Option B correctly identifies that the model is deployed but not yet ready to handle traffic.

Exam trap

Google Cloud often tests the distinction between 'no model deployed' and 'model deployed but not serving', where candidates confuse a deployment that exists but is not yet ready with a missing deployment.

How to eliminate wrong answers

Option A is wrong because a model that is still training cannot yet be deployed, so the failure would point to a missing model or missing deployment, not to a 'not serving' state. Option C is wrong because an endpoint with no deployed model has nothing to route requests to, and the error would say so instead of reporting a serving state. Option D is wrong because exceeding the request payload limit (1.5 MB for Vertex AI online prediction) produces a request-size error, not a 'not serving' error.

411

MCQhard

A company deploys a training pipeline on Vertex AI using custom containers. The pipeline includes a hyperparameter tuning job that uses Bayesian optimization. After several runs, they observe that the tuning job is not converging and the search space is large. They want to reduce the number of trials while still finding good hyperparameters. Which strategy should they use?

A.Increase the number of parallel trials to explore more points simultaneously.

B.Use Grid search instead of Bayesian optimization to systematically cover the search space.

C.Implement early stopping by using the 'early_stopping' flag in the hyperparameter tuning job.

D.Reduce the search space by applying feature selection and using prior knowledge.

AnswerD

A smaller search space requires fewer trials to find good hyperparameters.

Why this answer

Option D is correct because reducing the search space using prior knowledge directly decreases the number of trials needed. Option A is wrong because increasing parallel trials does not reduce the total number of trials. Option B is wrong because grid search generally requires more trials than Bayesian optimization.

Option C is wrong because early stopping reduces time per trial but does not reduce the number of trials.
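A quick count shows why pruning the search space matters. The sketch below compares the number of combinations in a broad space with a space narrowed by prior knowledge; the hyperparameter names and candidate values are hypothetical.

```python
from math import prod

def combinations_in(space):
    """Number of distinct combinations in a discrete search space."""
    return prod(len(values) for values in space.values())

# Hypothetical broad search space.
broad = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
    "num_layers": [1, 2, 3, 4, 5],
    "dropout": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
}

# Same parameters, narrowed using prior knowledge about what tends to work.
narrowed = {
    "learning_rate": [1e-3, 1e-2],
    "batch_size": [32, 64],
    "num_layers": [2, 3],
    "dropout": [0.1, 0.2],
}

print(combinations_in(broad), combinations_in(narrowed))
```

Bayesian optimization does not enumerate every combination, but the same intuition applies: the smaller the region it has to model, the fewer trials it needs before its suggestions become useful.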

412

Multi-Selecthard

Which THREE of the following are recommended practices for model governance and lineage in Vertex AI?

Select 3 answers

A.Enable Vertex AI ML Metadata to track artifacts, executions, and contexts.

B.Use Vertex AI Experiments to log parameters and metrics.

C.Store model artifacts in Cloud Storage with metadata in a database.

D.Manually record model lineage in a spreadsheet.

E.Use Vertex AI Model Registry to manage model versions and stages.

AnswersA, B, E

ML Metadata provides automated lineage tracking.

Why this answer

Vertex AI ML Metadata is a fully managed service that automatically tracks artifacts, executions, and contexts across the ML workflow. By enabling it, you create a lineage graph that records every step from data preparation to model deployment, which is essential for auditability and reproducibility. This is a core recommended practice for model governance because it provides an immutable, queryable history of all model-related activities.

Exam trap

Google Cloud often tests the distinction between using native Vertex AI services (like ML Metadata, Experiments, and Model Registry) versus ad-hoc or manual methods (like spreadsheets or custom databases) that lack automated governance and audit trails.

413

Multi-Selecteasy

Which TWO options are best practices for building ML pipelines on Vertex AI?

Select 2 answers

A.Use Cloud Functions to execute individual pipeline steps

B.Hardcode pipeline parameters in the component definitions

C.Use custom container components to encapsulate reusable logic

D.Always use the same compute environment for training and serving to ensure consistency

E.Leverage Vertex ML Metadata to track artifact lineage

AnswersC, E

Reusable components allow sharing across pipelines and reduce duplication.

Why this answer

Option C is correct because custom container components allow you to encapsulate reusable logic with specific dependencies, libraries, and environments, enabling consistent execution across pipeline steps. This is a best practice for building modular, maintainable ML pipelines on Vertex AI, as it decouples step logic from the pipeline orchestration and supports versioning and testing.

Exam trap

Google Cloud often tests the misconception that serverless functions like Cloud Functions are suitable for ML pipeline steps, but the trap is that ML steps require persistent state, longer timeouts, and specialized hardware, which Cloud Functions cannot provide.

414

MCQeasy

A startup wants to deploy a small machine learning model for real-time predictions but has a very limited budget. Traffic is minimal and predictable. They want to avoid paying for idle resources. Which serving option is most cost-effective?

A.Deploy the model on a single Compute Engine VM with a GPU.

B.Use Vertex AI Batch Prediction for each prediction request.

C.Deploy the model as a Cloud Run service using a custom container.

D.Deploy the model to Vertex AI Endpoint with min_replica_count=0.

AnswerC

Cloud Run scales to zero and charges only when serving requests.

Why this answer

Option C is correct because Cloud Run with a custom container can scale to zero when idle, incurring no cost when not in use. Option A is wrong because a Compute Engine VM is billed around the clock even when idle. Option B is wrong because batch prediction is not real-time.

Option D is wrong because a Vertex AI Endpoint requires at least one replica (min_replica_count >= 1).

415

MCQhard

A model serving team notices that during a flash sale, a real-time recommendation model experiences sudden spikes in traffic, causing some requests to time out. The endpoint is configured with `min_replica_count=3`, `max_replica_count=10`, and autoscaling metric set to `target_utilization=0.6` on CPU. Despite this, autoscaling is too slow. What change will most improve the autoscaling responsiveness?

A.Add a custom metric based on GPU utilization, assuming the model uses GPU.

B.Increase the target CPU utilization to 0.8 to reduce the number of replicas and save cost.

C.Reduce `min_replica_count` to 1 to allow more aggressive scaling.

D.Change the autoscaling metric to 'average request count per replica' with an appropriate target.

AnswerD

Request count directly reflects load and scales more quickly than CPU.

Why this answer

Option D is correct because request count per replica is a direct measure of load, so it triggers autoscaling faster than CPU utilization. Option A is wrong because GPU metrics are only relevant for GPU models. Option B is wrong because increasing the target utilization makes scaling slower.

Option C is wrong because reducing min replicas may cause underprovisioning.

416

Multi-Selecthard

Which TWO are best practices for implementing a low-code ML solution using Vertex AI AutoML? (Choose 2)

Select 2 answers

A.Use the AutoML recommended data split (train/validation/test) to avoid overfitting.

B.Impute missing values manually before uploading the dataset.

C.Normalize numerical features to zero mean and unit variance.

D.Enable automatic feature engineering by leaving feature columns as raw data.

E.Export the data and train a custom model with a different architecture.

AnswersA, D

Why A is correct: AutoML optimizes split for best performance.

Why this answer

Option A is correct because AutoML's recommended data split (train/validation/test) is designed to prevent overfitting by ensuring the model is evaluated on unseen data. AutoML automatically handles the split ratio (e.g., 80/10/10) and stratification, which is a best practice for low-code ML solutions where manual split logic is error-prone.

Exam trap

Google Cloud often tests the misconception that manual preprocessing (like imputation or normalization) is required for AutoML, when in fact AutoML is designed to handle these steps automatically, and manual intervention can degrade performance or cause errors.

417

Multi-Selecthard

An e-commerce company uses a recommendation model that suggests products based on user browsing history. The model was trained on data from the past year and has high accuracy on the test set. However, after deployment, the click-through rate (CTR) on recommendations is much lower than expected. Which three steps should the data scientist take to diagnose and improve the model? (Choose THREE)

Select 3 answers

A.Run offline evaluation on a holdout dataset to confirm accuracy

B.Set up an A/B experiment comparing the model's recommendations against a baseline

C.Retrain the model on the most recent three months of data to capture recent trends

D.Check the distribution of predictions versus the training set to detect drift

E.Increase the training dataset size by including data from two years ago

AnswersB, C, D

A/B testing validates the model's real-world performance and identifies issues.

Why this answer

Option B is correct because an A/B experiment directly measures the model's real-world impact by comparing its CTR against a baseline (e.g., random or popularity-based recommendations). This isolates the model's performance from confounding factors like seasonality or user behavior changes, providing a causal estimate of its effectiveness.
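One way to read the result of such an experiment is a two-proportion z-test on CTR. The sketch below is a minimal version using only the standard library; the click and impression counts are hypothetical, and a real experiment would also need a pre-planned sample size and randomized assignment.

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in CTR between B and A."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: arm A is the baseline, arm B is the model.
z, p_value = two_proportion_z(clicks_a=200, views_a=10_000,
                              clicks_b=260, views_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the CTR difference between the arms is unlikely to be noise; a large one suggests the model is not measurably better than the baseline online, whatever its offline accuracy.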

Exam trap

Google Cloud often tests the misconception that high offline accuracy guarantees online success, ignoring that offline metrics can be misleading due to distribution shift, feedback loops, or mismatched optimization objectives (e.g., accuracy vs. CTR).

418

MCQmedium

Refer to the exhibit. A data scientist runs the above BigQuery ML query to create a logistic regression model. After training, the model is evaluated using ML.EVALUATE. The evaluation shows poor performance with high bias. Which action would most likely improve the model's performance?

A.Remove the TRANSFORM clause and use raw features.

B.Change the model_type to 'linear_reg'.

C.Add more complex features by including polynomial expansions.

D.Increase the number of training iterations by setting MAX_ITERATIONS.

AnswerC

Polynomial expansions increase model complexity, allowing it to learn non-linear patterns from the data, which addresses high bias.

Why this answer

High bias indicates the model is underfitting the data, meaning it is too simple to capture underlying patterns. Adding polynomial expansions (for example with ML.POLYNOMIAL_EXPAND) or feature crosses in the TRANSFORM clause increases model complexity, allowing the logistic regression to learn non-linear decision boundaries, which directly addresses underfitting.

Exam trap

Google Cloud often tests the distinction between bias and variance; the trap here is that candidates might confuse high bias (underfitting) with high variance (overfitting) and incorrectly choose to simplify the model or increase iterations, rather than adding complexity.

How to eliminate wrong answers

Option A is wrong because removing the TRANSFORM clause would discard any feature preprocessing, likely making the model even simpler and worsening high bias. Option B is wrong because changing model_type to 'linear_reg' would switch to a regression task, which is inappropriate for classification and does not address bias in a logistic regression model. Option D is wrong because increasing MAX_ITERATIONS only affects convergence of the optimization algorithm; if the model is too simple (high bias), more iterations will not help it learn more complex patterns.
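The polynomial expansion in Option C can be sketched in a few lines. The function below mirrors the idea behind BigQuery ML's ML.POLYNOMIAL_EXPAND (products of feature combinations up to a given degree) but is not its implementation; the feature names and values are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

def polynomial_expand(features, degree=2):
    """Return products of all feature combinations up to `degree`."""
    names = sorted(features)
    expanded = {}
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(names, d):
            # Key is the joined feature names, value is their product.
            expanded["_".join(combo)] = prod(features[n] for n in combo)
    return expanded

row = {"age": 3.0, "income": 2.0}  # hypothetical numeric features
for name, value in polynomial_expand(row).items():
    print(name, value)
```

A linear model fitted on the expanded columns can represent curved decision boundaries in the original feature space, which is how the added terms reduce bias.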

419

MCQeasy

You have deployed a text classification model using Vertex AI Endpoints. The model is performing well, but the operations team wants to be alerted if the endpoint returns an excessive number of HTTP 503 errors. What is the simplest way to achieve this?

A.Configure a Cloud Monitoring uptime check on the endpoint URL.

B.Create a Cloud Monitoring alert based on the metric 'prediction/failed_request_count' with a condition on 5xx errors.

C.Add a logging statement in the custom prediction routine to count errors manually.

D.Export Cloud Logging to BigQuery and run a scheduled query for 503s.

AnswerB

Built-in metric directly reflects HTTP errors.

Why this answer

Option B is correct because Vertex AI Endpoints automatically export the 'prediction/failed_request_count' metric to Cloud Monitoring, which includes a label for HTTP status codes. By creating an alert on this metric with a filter for 5xx errors, you can directly monitor excessive 503 responses without additional infrastructure or custom code.

Exam trap

The trap here is that candidates often confuse uptime checks (which measure availability from external probes) with metric-based alerts (which track internal error counts), leading them to choose Option A despite its inability to specifically detect 503 errors.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring uptime checks test endpoint availability from external locations, but they cannot distinguish between 503 errors and other HTTP statuses; they only report overall uptime/downtime, not specific error counts. Option C is wrong because adding a logging statement in a custom prediction routine requires modifying the deployment code and does not leverage the built-in metrics already available in Vertex AI, making it unnecessarily complex and not the simplest approach. Option D is wrong because exporting logs to BigQuery and running scheduled queries introduces significant latency, cost, and operational overhead compared to using the native Cloud Monitoring alert, which provides real-time detection with minimal configuration.

420

MCQmedium

A team deploys a model on Vertex AI that uses a custom prediction routine (CPR) with a dependency on a native library. The container crashes with 'ImportError: libcudart.so.11.0: cannot open shared object file'. How should they resolve this?

A.Build a custom container image that includes the CUDA runtime library.

B.Submit the model for batch prediction to avoid the error.

C.Request a GPU machine type for the endpoint.

D.Use a Vertex AI pre-built container for PyTorch instead.

AnswerA

Ensures the library is available.

Why this answer

The error 'ImportError: libcudart.so.11.0: cannot open shared object file' indicates that the CUDA runtime library (version 11.0) is missing from the container environment. Since the custom prediction routine (CPR) depends on a native library that requires this CUDA runtime, the correct solution is to build a custom container image that includes the CUDA runtime library. This ensures the shared object is available at runtime, resolving the import error.
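A container can check for the shared library at startup so a missing dependency fails fast with a clear message. This is only a sketch: the helper name `has_shared_library` is hypothetical, and `ctypes.util.find_library` looks up the library by name, so it does not confirm that the version is exactly 11.0.

```python
import ctypes.util


def has_shared_library(name: str) -> bool:
    """Return True if the system's library search can locate lib<name>."""
    return ctypes.util.find_library(name) is not None


if __name__ == "__main__":
    # "cudart" is the CUDA runtime; the lookup is by library name only,
    # so this does not verify a specific version such as 11.0.
    if not has_shared_library("cudart"):
        print("CUDA runtime not found; include it in the container image")
```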

Exam trap

Google Cloud often tests the misconception that requesting a GPU machine type automatically provides the necessary CUDA libraries, but in reality, the CUDA runtime must be explicitly included in the container image, as the GPU machine type only provides the hardware and driver, not the user-space libraries.

How to eliminate wrong answers

Option B is wrong because submitting the model for batch prediction does not change the container environment; the same missing CUDA runtime library will cause the same ImportError during batch prediction. Option C is wrong because requesting a GPU machine type for the endpoint provides GPU hardware but does not install the CUDA runtime library into the container; the library must be present in the container image regardless of the underlying hardware. Option D is wrong because using a Vertex AI pre-built container for PyTorch does not guarantee inclusion of the specific CUDA runtime version 11.0 required by the native library; the pre-built container may have a different CUDA version or omit the library entirely.

Full explanation →

421

MCQeasy

A data scientist runs a BigQuery ML prediction query and gets a region mismatch error. The model is in the US region, but the new_data table is in the EU region. What is the simplest way to resolve this?

A.Recreate the model in the EU region using the same training data

B.Copy the new_data table to the US region using the BigQuery UI or CLI

C.Enable cross-region query in BigQuery settings

D.Export the model from US and import it to EU

AnswerB

Copying the table to the same region resolves the mismatch with minimal effort.

Why this answer

Option B is correct because the simplest fix is to move the new_data table to the same region as the model (US). BigQuery ML requires that the model and the data used for predictions reside in the same multi-region or regional location. Copying the table via the BigQuery UI or CLI (e.g., `bq cp`) is a straightforward, no-code operation that avoids retraining or exporting the model.

Exam trap

The trap here is that candidates may overthink the solution and choose to recreate the model or export/import it, not realizing that the simplest and most efficient fix is to copy the data table to the model's region.

How to eliminate wrong answers

Option A is wrong because recreating the model in the EU region would require retraining the model from scratch, which is unnecessary and time-consuming when a simple data copy resolves the mismatch. Option C is wrong because BigQuery does not support a 'cross-region query' setting; queries are always restricted to a single region or multi-region, and enabling such a feature is not possible. Option D is wrong because exporting and importing a model between regions is more complex and involves additional steps (e.g., using Cloud Storage as an intermediary), whereas copying the table is simpler and directly addresses the region mismatch.

Full explanation →

422

MCQeasy

You are using Vertex AI Training to train a model and then automatically deploy the best candidate to a Vertex AI Prediction endpoint via the Vertex AI Model Registry. However, after deployment, you notice that the endpoint returns predictions for the new model, but they are significantly different from the evaluation metrics computed during training. The training scripts used TensorFlow with a serving input function. What is the most likely issue and how would you fix it?

A.The endpoint is using a different machine type affecting numerical precision; you should use the same machine type as training.

B.The serving input function's preprocessing steps do not match the training preprocessing; you should verify and align them.

C.The model registry deployed a different version; you should check the alias.

D.The model was saved with training-only metrics; you should retrain with evaluation metrics.

AnswerB

Preprocessing mismatch is a common cause for prediction discrepancies.

Why this answer

Option B is correct because the serving input function's preprocessing must match the training preprocessing exactly; any mismatch feeds the model inputs it was not trained on and skews its predictions. Option A is unlikely because numerical precision differences between machine types are minimal. Option C is possible but less likely given how consistent the difference is.

Option D is wrong because the metrics recorded with a model do not change the predictions it serves.
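A toy sketch of how a preprocessing mismatch skews predictions. The linear "model", the scaling constants, and all function names are hypothetical and not taken from the question; the point is only that the serving path should call the same transform as training.

```python
# Hypothetical toy example: all names and constants are illustrative.
TRAIN_MEAN = 50.0
TRAIN_STD = 10.0


def preprocess(raw_value: float) -> float:
    """Standardize a feature with the statistics computed at training time."""
    return (raw_value - TRAIN_MEAN) / TRAIN_STD


def model(scaled_value: float) -> float:
    """Stand-in for a trained model that expects standardized input."""
    return 2.0 * scaled_value + 1.0


def serve_aligned(raw_value: float) -> float:
    # Serving path reuses the training transform.
    return model(preprocess(raw_value))


def serve_mismatched(raw_value: float) -> float:
    # Serving path skips the transform and feeds raw values to the model.
    return model(raw_value)


if __name__ == "__main__":
    raw = 60.0
    print("aligned:", serve_aligned(raw))
    print("mismatched:", serve_mismatched(raw))
```

Keeping the transform in one shared function (or inside the exported serving graph) is one way to rule out this class of discrepancy.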

Full explanation →

423

MCQhard

Refer to the exhibit. A data analyst creates a BigQuery ML logistic regression model for churn prediction. The model evaluation shows high precision but low recall. Which change to the model creation would most likely improve recall?

A.Drop more columns to reduce overfitting.

B.Increase the training data by including customers without churn dates.

C.Use ML.ADJUST_THRESHOLD to lower the classification threshold.

D.Change model_type to 'BOOSTED_TREE_CLASSIFIER'.

AnswerC

Lowering the threshold increases sensitivity, improving recall.

Why this answer

Option C is correct because lowering the classification threshold (e.g., from 0.5 to 0.3) will classify more customers as positive (churn), increasing recall (true positives / (true positives + false negatives)). In BigQuery ML, ML.ADJUST_THRESHOLD directly modifies the decision boundary, trading off precision for recall. This is the most direct way to address low recall without altering the model architecture or training data.
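The precision/recall trade-off can be checked by hand on a small example. The scores and labels below are made up for illustration, and `precision_recall` is a hypothetical helper, not a BigQuery ML function.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when score >= threshold is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up churn probabilities and true labels (1 = churned).
SCORES = [0.9, 0.7, 0.45, 0.4, 0.35, 0.2, 0.1, 0.05]
LABELS = [1, 1, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    for threshold in (0.5, 0.3):
        p, r = precision_recall(SCORES, LABELS, threshold)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this data, moving the threshold from 0.5 to 0.3 captures the two churners scored 0.45 and 0.35 at the cost of one false positive, so recall rises while precision falls.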

Exam trap

Google Cloud often tests the misconception that changing the model type (e.g., to boosted trees) is the default solution for any performance metric issue, when in fact the threshold adjustment is the simplest and most direct way to trade off precision and recall in a logistic regression model.

How to eliminate wrong answers

Option A is wrong because dropping more columns to reduce overfitting would likely harm recall further by removing potentially informative features, and overfitting typically causes high variance, not low recall. Option B is wrong because including customers without churn dates (non-churners) would increase the class imbalance, making the model even more biased toward the majority class and likely reducing recall further. Option D is wrong because changing model_type to 'BOOSTED_TREE_CLASSIFIER' might improve overall performance but does not specifically target the recall issue; it is a model architecture change that could also reduce recall if the class imbalance is not addressed, and it is not the most direct fix for a threshold-related precision-recall trade-off.

Full explanation →

424

MCQhard

Your team has deployed a text classification model on Vertex AI Endpoints. You notice that the model's latency has increased significantly over the last week, but the request rate has remained stable. Which of the following is the most likely cause?

A.A sudden increase in the number of prediction requests

B.The model was replaced with a larger version without updating the endpoint

C.A change in the preprocessing logic that now includes a computationally expensive step

D.A misconfiguration in the autoscaling policy

AnswerC

This increases per-request latency without changing request rate.

Why this answer

A computationally expensive preprocessing step directly increases per-request latency on the inference path, even when request rate is stable. Vertex AI Endpoints execute user-provided preprocessing code before model inference, so adding a heavy operation (e.g., large regex, image resizing, or external API call) will linearly increase response time for every prediction.

Exam trap

The trap here is that candidates confuse 'model latency' with 'request rate' and assume any latency increase must be due to scaling issues, ignoring that preprocessing logic changes can dramatically affect per-request performance without altering throughput.

How to eliminate wrong answers

Option A is wrong because a sudden increase in request rate would cause latency to rise, but the question explicitly states request rate has remained stable. Option B is wrong because replacing the model with a larger version requires deploying a new model to the endpoint or updating the endpoint's deployed model; simply replacing the model binary without updating the endpoint's deployment configuration would not change the model served, so latency would not increase. Option D is wrong because a misconfiguration in autoscaling policy (e.g., too few min replicas) would cause latency to increase only when request rate exceeds the current serving capacity, but request rate is stable and autoscaling would have already scaled to match the stable load.

Full explanation →

425

MCQhard

Refer to the exhibit. A team uses this Cloud Build configuration to deploy a model to a Vertex AI endpoint. The build succeeds up to the 'upload' step, but the 'deploy-model' step fails with an error that the model 'my-model' does not exist. What is the most likely cause?

A.The deploy step uses the display name instead of the model resource ID

B.The model was not uploaded because the artifact URI is a directory, not a valid SavedModel

C.The Vertex AI API was not enabled for the project

D.The region in the deploy step does not match the model's region

AnswerB

The artifact URI must point to a directory that contains a valid SavedModel, not an arbitrary directory.

Why this answer

The 'deploy-model' step fails because the model was not successfully uploaded. Cloud Build's 'upload' step expects a valid SavedModel artifact (a directory containing a saved_model.pb file and variables subdirectory). If the artifact URI points to a directory that is not a valid SavedModel, the upload may appear to succeed but does not register a usable model resource, causing the subsequent deploy step to fail with 'model does not exist'.
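A pipeline could check the artifact layout before the upload step. The sketch below assumes the artifacts have been copied to a local directory (the real artifact URI would be a Cloud Storage path), and `looks_like_saved_model` is a hypothetical helper that checks only for the two items named above.

```python
import tempfile
from pathlib import Path


def looks_like_saved_model(artifact_dir: Path) -> bool:
    """Check for a saved_model.pb file and a variables/ subdirectory."""
    has_graph = (artifact_dir / "saved_model.pb").is_file()
    has_variables = (artifact_dir / "variables").is_dir()
    return has_graph and has_variables


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print("empty directory:", looks_like_saved_model(root))
        (root / "saved_model.pb").touch()
        (root / "variables").mkdir()
        print("with expected layout:", looks_like_saved_model(root))
```

This checks the directory layout only; it does not confirm that the file contents are a loadable model.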

Exam trap

Google Cloud often tests the distinction between a successful upload step and a valid model registration, trapping candidates who assume any directory upload creates a usable model resource.

How to eliminate wrong answers

Option A is wrong because the deploy step uses the model resource ID, not the display name; the error message explicitly says 'my-model' does not exist, indicating the model resource was never created. Option C is wrong because if the Vertex AI API were not enabled, the build would fail at the 'upload' step or earlier with an API enablement error, not specifically at the deploy step. Option D is wrong because region mismatch would cause a different error (e.g., 'model not found in region') or a permission error, but the error message states the model does not exist, implying it was never registered in any region.

Full explanation →

426

MCQeasy

A data scientist trained a model on historical data from 2020-2022 and deployed it in January 2023. In February 2023, the model's accuracy drops significantly. Which monitoring metric would most likely indicate the root cause?

A.Number of unique users calling the endpoint.

B.Prediction latency p99.

C.Number of missing feature values in requests.

D.Training-serving skew detected by Vertex AI Model Monitoring.

AnswerD

Skew indicates that serving data distribution differs from training data, likely causing accuracy drop.

Why this answer

Option D is correct because Vertex AI Model Monitoring specifically detects training-serving skew, which occurs when the distribution of input features at serving time differs from the training data distribution. Since the model was trained on 2020-2022 data and deployed in January 2023, a significant accuracy drop in February 2023 likely indicates that the real-world data distribution has shifted (e.g., seasonal patterns, new user behavior), causing the model to encounter unseen patterns. This skew is a common root cause of performance degradation and is directly monitored by Vertex AI's skew detection feature.

Exam trap

Google Cloud often tests the distinction between model performance metrics (accuracy, precision) and operational metrics (latency, throughput, user count), and the trap here is that candidates may confuse a drop in accuracy with a system-level issue like latency or missing values, rather than recognizing that accuracy degradation is most directly linked to data distribution shifts (skew).

How to eliminate wrong answers

Option A is wrong because the number of unique users calling the endpoint is a business metric, not a model performance metric; it does not directly indicate why accuracy dropped. Option B is wrong because prediction latency p99 measures response time, not prediction quality; high latency could degrade user experience but does not explain a drop in accuracy. Option C is wrong because missing feature values in requests would cause errors or fallback behavior, but the question states accuracy drops, not that predictions fail; missing values are typically handled by imputation or default values and would not necessarily cause a significant accuracy drop unless the model was not trained to handle them.

Full explanation →

427

Drag & Dropmedium

Drag and drop the steps to set up a BigQuery ML linear regression model for forecasting in the correct order.

Why this order

Start by preparing data, then create the model, evaluate, and predict.

Full explanation →

428

Multi-Selectmedium

A company wants to reduce costs for serving a model on Vertex AI Prediction without sacrificing availability. Which THREE strategies should they consider?

Select 3 answers

A.Use larger machine types to reduce the number of replicas

B.Switch to HTTP/2 to reduce network overhead

C.Enable automatic batching to improve throughput per instance

D.Use CPU instead of GPU for models that can run on CPU

E.Use min replicas=0 and enable autoscaling

AnswersC, D, E

Batching increases efficiency, reducing number of instances needed.

Why this answer

Option C is correct because enabling automatic batching on Vertex AI Prediction allows the model server to group multiple inference requests into a single batch, which increases throughput per instance and reduces the total number of compute resources needed. This directly lowers serving costs without sacrificing availability, as the batching is handled transparently by the Vertex AI Prediction infrastructure.

Exam trap

Google Cloud often tests the misconception that reducing replicas with larger machines is cost-effective, but the trap here is that larger machines increase per-unit cost and can lead to idle capacity, whereas autoscaling with min replicas=0 and batching optimizes cost without sacrificing availability.

Full explanation →

429

Multi-Selecthard

Which THREE factors are critical when designing a model serving architecture for a global user base with strict latency SLAs? (Choose 3.)

Select 3 answers

A.Use batch prediction to process requests in bulk for efficiency.

B.Deploy the model in a single region to avoid data sovereignty issues.

C.Enable autoscaling with request-based metrics to handle traffic spikes.

D.Implement request caching for idempotent predictions when appropriate.

E.Use multi-region deployment with Vertex AI Endpoints in multiple locations.

AnswersC, D, E

Autoscaling ensures capacity matches demand.

Why this answer

Options C, D, and E are correct. Option A is wrong because batch processing adds latency. Option B is wrong because a single-region deployment cannot meet latency targets for users around the globe.
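Request caching for idempotent predictions (option D in the list above) can be sketched with the standard library. Here `expensive_predict` is a hypothetical stand-in for a call to the deployed model, and the cache size is arbitrary; a production cache would also need an expiry policy tied to model updates.

```python
from functools import lru_cache

# Counts how often the (stand-in) model is actually invoked.
CALLS = {"model": 0}


def expensive_predict(features: tuple) -> float:
    """Hypothetical stand-in for a round trip to the serving endpoint."""
    CALLS["model"] += 1
    return sum(features) / len(features)


@lru_cache(maxsize=1024)
def cached_predict(features: tuple) -> float:
    # Features must be hashable (a tuple here) to serve as a cache key.
    return expensive_predict(features)


if __name__ == "__main__":
    cached_predict((1.0, 2.0, 3.0))
    cached_predict((1.0, 2.0, 3.0))  # served from the cache
    print("model calls:", CALLS["model"])
```

Caching only helps when identical inputs recur and the same input should always yield the same output, which is why the option is qualified with "when appropriate".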

Full explanation →

430

MCQeasy

A data scientist has trained a scikit-learn model locally and wants to deploy it to Vertex AI for online predictions with low latency. The model is a small RandomForestClassifier (100 MB). What is the recommended way to deploy this model?

A.Deploy the model on a Kubernetes cluster with Istio.

B.Package the model as a Docker container with a custom prediction routine.

C.Upload the model to Vertex AI Model Registry using the pre-built scikit-learn serving container.

D.Export the model as a TensorFlow SavedModel and use the pre-built TF serving container.

AnswerC

Vertex AI offers a pre-built container for scikit-learn that handles prediction out of the box.

Why this answer

Option C is correct because Vertex AI provides a pre-built container for scikit-learn that is optimized for serving predictions with low latency. For a small RandomForestClassifier (100 MB), this container handles model loading, request routing, and scaling automatically, eliminating the need for custom infrastructure. This is the recommended approach for deploying scikit-learn models to Vertex AI for online predictions.

Exam trap

Google Cloud often tests the misconception that any model must be containerized or converted to TensorFlow for deployment, but the correct answer leverages the platform's pre-built container for the specific framework, which is the simplest and most efficient path for small models.

How to eliminate wrong answers

Option A is wrong because deploying on a Kubernetes cluster with Istio adds unnecessary operational complexity and overhead for a small model that can be served directly via Vertex AI's managed infrastructure; it is not the recommended path for a simple scikit-learn model. Option B is wrong because packaging the model as a Docker container with a custom prediction routine is overkill when Vertex AI already offers a pre-built, optimized scikit-learn serving container that handles the prediction logic out of the box. Option D is wrong because exporting a scikit-learn model as a TensorFlow SavedModel is not a direct conversion; scikit-learn models are not natively compatible with TensorFlow Serving, and this would require significant re-engineering or use of ONNX, which is not the recommended path for a RandomForestClassifier.

Full explanation →

431

Multi-Selectmedium

An ML team is designing an automated pipeline to retrain a recommendation model every day using new user interaction data stored in BigQuery. The pipeline must be cost-efficient, scalable, and require minimal manual intervention. Which two approaches should they consider?

Select 2 answers

A.Deploy a custom Kubernetes cron job on GKE to run the training script directly.

B.Use Cloud Composer (Airflow) to schedule the pipeline with a DAG.

C.Use Cloud Scheduler to publish a Pub/Sub message daily, which triggers a Cloud Function that starts the Vertex AI Pipeline.

D.Use Dataflow to continuously read from BigQuery and trigger training when new data arrives.

E.Use Vertex AI Pipelines to define the workflow and preemptible VMs for training to reduce cost.

AnswersC, E

This provides automated daily triggering with minimal overhead.

Why this answer

Option C is correct because Cloud Scheduler triggers a Pub/Sub message that invokes a Cloud Function, which starts a Vertex AI Pipeline. This serverless approach is cost-efficient (no idle compute), scales automatically, and requires minimal manual intervention. Option E is correct because Vertex AI Pipelines natively orchestrates ML workflows, and using preemptible VMs reduces training costs by up to 80% while maintaining scalability.

Exam trap

Google Cloud often tests the distinction between batch scheduling (Cloud Scheduler) and continuous streaming (Dataflow), and candidates mistakenly choose Dataflow because they think 'new data' implies real-time, but the requirement is a daily retrain, not a streaming trigger.

Full explanation →

432

MCQhard

You have a model that requires GPU for efficient inference. You deploy it on Vertex AI with a single NVIDIA T4 GPU accelerator and notice that the GPU utilization hovers around 30%. The endpoint has 10 replicas. What is the best way to improve cost efficiency while maintaining throughput?

A.Use a larger GPU like V100 to process requests faster.

B.Reduce the number of replicas to increase GPU utilization per instance.

C.Enable autoscaling to increase the number of replicas.

D.Switch to a CPU-only instance; the model can run on CPU.

AnswerB

Fewer replicas with same traffic will increase GPU utilization and reduce cost.

Why this answer

If GPU utilization is low, you can reduce the number of replicas or increase the batch size per request to fully utilize the GPU. Reducing replicas directly saves cost. Increasing batch size may also help but requires code changes.
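The replica arithmetic can be sketched as follows, under the simplifying assumption that GPU utilization scales linearly with load per replica. The 60% target is an illustrative choice, not a value from the question, and `replicas_for_target` is a hypothetical helper.

```python
import math


def replicas_for_target(current_replicas: int, current_util_pct: int,
                        target_util_pct: int) -> int:
    """Replicas needed to carry the same total load at the target utilization."""
    total_load = current_replicas * current_util_pct  # in replica-percent units
    return max(1, math.ceil(total_load / target_util_pct))


if __name__ == "__main__":
    # 10 replicas at 30% utilization carry 300 replica-percent of load.
    print(replicas_for_target(10, 30, 60))
```

Under these assumptions, 10 replicas at 30% carry the same load as 5 replicas at 60%, which halves the GPU bill while leaving headroom for spikes.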

Full explanation →

433

MCQeasy

Refer to the exhibit. A data scientist notices that predictions from a deployed model are taking longer than expected. Which Cloud Monitoring metric should be inspected first to identify the bottleneck?

A.Vertex AI - Model - Compute utilization

B.Vertex AI - Endpoint - Prediction latency distribution

C.Vertex AI - Endpoint - Traffic

D.Vertex AI - Endpoint - Online prediction errors

AnswerB

This metric directly shows the distribution of latency for prediction requests, making it the first place to look for a bottleneck.

Why this answer

The data scientist is investigating slow predictions from a deployed model. The most direct metric to identify the latency bottleneck is the prediction latency distribution, which shows the distribution of response times for online prediction requests. This metric allows you to pinpoint whether the delay is due to model inference time, network overhead, or endpoint queuing, making it the first logical place to inspect.

Exam trap

Google Cloud often tests the distinction between metrics that measure performance (latency) versus metrics that measure capacity (utilization, traffic) or errors, leading candidates to mistakenly choose compute utilization or traffic when the question explicitly asks about prediction time.

How to eliminate wrong answers

Option A is wrong because Vertex AI - Model - Compute utilization measures the resource usage (CPU/memory) of the model's compute resources, which can indicate a resource bottleneck but does not directly show prediction latency; it is a secondary metric to investigate after latency is confirmed. Option C is wrong because Vertex AI - Endpoint - Traffic measures the number of requests per second (RPS) to the endpoint, which can indicate load but does not directly measure how long each prediction takes; high traffic can cause latency, but the metric itself is not a latency metric. Option D is wrong because Vertex AI - Endpoint - Online prediction errors tracks the count or rate of failed predictions (e.g., timeouts, invalid inputs), not the latency of successful predictions; errors may be a consequence of latency but are not the primary metric for identifying a latency bottleneck.

Full explanation →

434

MCQhard

A machine learning engineer is training a large-scale text classification model using a distributed strategy on TPUs. The training loss decreases normally but the validation loss starts increasing after a few epochs while training loss continues to decrease. The engineer suspects overfitting. Which technique is most appropriate to address this while scaling training?

A.Add dropout regularization.

B.Use early stopping with patience.

C.Reduce the learning rate.

D.Increase the batch size.

AnswerA

Reduces overfitting by randomly dropping units, effective in distributed settings.

Why this answer

Option A is correct because dropout regularization is a common technique to prevent overfitting in neural networks, and it can be applied in distributed training without major modifications. Option C is wrong because reducing the learning rate may not directly address overfitting. Option D is wrong because increasing the batch size can sometimes help generalization but is not a primary anti-overfitting method.

Option B is wrong because early stopping prevents further overfitting but does not address the cause during training.

Full explanation →

435

MCQeasy

A company deploys a TensorFlow model on Vertex AI Prediction with a single node. During peak hours, inference latency increases. What should they do first to reduce latency?

A.Enable autoscaling for the deployment

B.Increase the machine type of the node

C.Decrease the min replicas to 0

D.Enable automatic batching of requests

AnswerA

Autoscaling adds nodes during peak traffic, reducing latency.

Why this answer

Enabling autoscaling for the deployment is the correct first step because it allows Vertex AI Prediction to dynamically adjust the number of replicas based on incoming traffic. During peak hours, autoscaling can add more nodes to distribute the inference load, directly reducing latency without requiring manual intervention or over-provisioning.

Exam trap

The trap here is that candidates often confuse improving throughput (batching or bigger machines) with reducing latency under load, but the first action should always be to add more replicas via autoscaling to handle concurrent requests, not to optimize a single node's performance.

How to eliminate wrong answers

Option B is wrong because increasing the machine type of the node (e.g., moving to a larger VM) may improve per-node throughput but does not address the root cause of insufficient capacity during traffic spikes; it also increases cost without guaranteeing latency reduction if the single node is already saturated. Option C is wrong because decreasing the min replicas to 0 would cause the deployment to scale down to zero during idle periods, but during peak hours it would still need to scale up from zero, causing cold-start latency and potentially failing to handle the initial burst of requests. Option D is wrong because enabling automatic batching of requests can improve throughput by grouping multiple inference requests into a single batch, but it does not reduce latency for individual requests—in fact, it may increase latency as requests wait for a batch to fill.

Full explanation →

436

MCQeasy

An ML engineer needs to monitor a deployed model for data drift. They want to compare the distribution of incoming predictions against a baseline distribution. Which Vertex AI service should they use?

A.Vertex AI Feature Store

B.Vertex AI Model Monitoring

C.Vertex AI Experiments

D.Vertex AI Explainable AI

AnswerB

Designed for detecting drift and anomalies in prediction data.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to detect data drift and feature skew in deployed models. It continuously compares the distribution of incoming prediction requests against a baseline distribution (e.g., training data or a previous window) and alerts the engineer when statistically significant drift is detected, using metrics like Jensen-Shannon divergence or L-infinity distance.
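The two distance measures named above can be computed by hand for small discrete distributions. This is a from-scratch sketch for intuition, not the service's implementation; the function names and the use of base-2 logarithms (which bound the Jensen-Shannon divergence to the range 0 to 1) are choices made here.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def l_infinity(p, q):
    """Largest absolute difference between corresponding probabilities."""
    return max(abs(pi - qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    baseline = [0.5, 0.3, 0.2]  # e.g. category frequencies in training data
    serving = [0.2, 0.3, 0.5]   # frequencies observed in recent requests
    print("JS divergence:", js_divergence(baseline, serving))
    print("L-infinity:", l_infinity(baseline, serving))
```

Both measures are zero for identical distributions and grow as the serving distribution moves away from the baseline; an alert fires when the value crosses a configured threshold.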

Exam trap

Google Cloud often tests the distinction between monitoring (drift detection) and other MLOps components like feature stores or experiment tracking, so the trap here is that candidates may confuse 'monitoring' with 'storing features' or 'tracking experiments' because all are part of the ML lifecycle but serve different purposes.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is a centralized repository for storing, managing, and serving feature values for training and serving, not for monitoring distributional shifts in predictions. Option C is wrong because Vertex AI Experiments is used for tracking and comparing machine learning experiments (e.g., hyperparameter tuning runs), not for real-time monitoring of deployed model predictions. Option D is wrong because Vertex AI Explainable AI provides feature attributions and explanations for model predictions, but does not perform statistical drift detection or baseline comparison.

Full explanation →

437

MCQmedium

Refer to the exhibit. What is this Cloud Build step doing?

A.Uploading a model to Vertex AI Model Registry

B.Deploying a model to a Vertex AI endpoint

C.Creating a custom container for prediction

D.Training a model in Vertex AI

AnswerA

The 'upload' command registers the model.

Why this answer

The Cloud Build step shown uses the `gcloud ai models upload` command, which specifically uploads a model artifact to the Vertex AI Model Registry. This action registers the model metadata and location in Vertex AI, making it available for versioning and later deployment, but does not create an endpoint or perform training.

Exam trap

Google Cloud often tests the distinction between model registration (upload) and model deployment (endpoint creation), leading candidates to confuse the `gcloud ai models upload` step with the actual deployment to an endpoint.

How to eliminate wrong answers

Option B is wrong because deploying a model to a Vertex AI endpoint requires the `gcloud ai endpoints deploy-model` command, not `gcloud ai models upload`. Option C is wrong because creating a custom container for prediction involves building and pushing a Docker image (e.g., via `gcloud builds submit` or `docker push`), not uploading a model to the registry. Option D is wrong because training a model in Vertex AI uses a command such as `gcloud ai custom-jobs create`, not the model upload command.

Full explanation →

438

MCQhard

A company is using AutoML Vision for object detection and observes high latency for online predictions. What can they do to reduce latency?

A.Reduce the training budget to create a smaller model

B.Use continuous batch prediction instead of online prediction

C.Deploy the model to a region closer to the users

D.Use a larger batch size in the prediction request

AnswerA

A smaller model has lower inference latency.

Why this answer

Reducing the training budget in AutoML Vision forces the model to use fewer node-hours, which typically results in a smaller and less complex model. A smaller model has fewer parameters and requires less computation during inference, directly reducing the latency for online predictions. This is a trade-off between model accuracy and inference speed.

Exam trap

The trap here is that candidates often confuse network latency with inference latency, assuming that deploying closer to users (Option C) is the primary fix, when in fact the question specifically targets high latency for online predictions caused by model complexity.

How to eliminate wrong answers

Option B is wrong because continuous batch prediction is designed for offline, asynchronous processing of large datasets and does not reduce latency for real-time online predictions; it actually increases end-to-end time. Option C is wrong because deploying the model to a region closer to users reduces network latency but does not address the model inference latency itself, which is the primary bottleneck in AutoML Vision's online prediction. Option D is wrong because AutoML Vision online prediction endpoints do not support user-defined batch sizes; the batch size is fixed by the service, and attempting to use a larger batch size would not be accepted or would increase latency per request.

Full explanation →

439

MCQmedium

Refer to the exhibit. A team deploys a model with the above configuration. They observe that during traffic spikes, the endpoint does not scale up quickly enough, causing increased latency. The average CPU utilization never exceeds 50%. What is the most likely reason for the slow scaling?

A.The autoscaling metric is not configured

B.The minReplicaCount is too low

C.The accelerator is causing a bottleneck

D.The machineType does not have enough CPU

AnswerA

The strategy is 'manual', so autoscaling is not configured; changing to 'autoscaling' with a target metric would resolve the issue.

Why this answer

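Option A is correct. The configuration shows strategy: manual, meaning autoscaling is disabled. Without autoscaling, the endpoint does not add instances in response to load.

Option B is wrong because raising minReplicaCount adds baseline capacity but leaves scaling manual. Option C is wrong because an accelerator bottleneck would affect per-request latency, not whether replicas are added. Option D is wrong because changing the machine type also leaves scaling manual, and average CPU utilization never exceeds 50%, so CPU capacity is not the constraint.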
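The difference between a manual and an autoscaled deployment can be sketched in a few lines. This is an illustrative model only, not the Vertex AI implementation: the function name, the `strategy` values, and the proportional scaling rule (replicas scaled by the ratio of observed to target utilization) are assumptions made for the example.

```python
import math

def desired_replicas(strategy, current, utilization, target,
                     min_replicas, max_replicas):
    """Return the replica count after one scaling evaluation (hypothetical model)."""
    if strategy == "manual":
        # Manual scaling: the replica count never reacts to load.
        return current
    # Assumed proportional rule: scale by observed / target utilization.
    wanted = math.ceil(current * utilization / target)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas("manual", 2, 90, 60, 1, 10))       # 2
print(desired_replicas("autoscaling", 2, 90, 60, 1, 10))  # 3
```

Under the manual strategy the count stays fixed regardless of the utilization passed in, which matches the symptom described in the question.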

440

MCQhard

A team is building a CI/CD pipeline for ML using Cloud Build. The pipeline trains a model and deploys it to Vertex AI. Recently, a change in the data processing step caused the model to be trained with a different data version, leading to a failed deployment because the model was invalid. How should the team prevent this in the future?

A.Add a manual review step before training

B.Pin all library versions in the Docker image

C.Use a data versioning tool (e.g., DVC) to track datasets and ensure the pipeline always uses the correct version

D.Schedule a cron job to check for data changes

AnswerC

Data versioning ensures reproducibility and consistency across pipeline runs.

Why this answer

Option C is correct because the root cause is a data version mismatch, not a code or environment issue. A data versioning tool like DVC (Data Version Control) tracks dataset versions via hash-based pointers in Git, ensuring the pipeline retrieves the exact dataset version used during training. This prevents silent failures when data processing steps change the data schema or content, which library pinning or manual reviews cannot guarantee.
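The idea of a hash-based pointer can be illustrated without DVC itself. The sketch below is a simplified stand-in, not DVC's actual mechanism or file format: it pins a dataset by its SHA-256 digest and refuses to proceed when the file on disk no longer matches. The function names and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def check_dataset(path, pinned_digest):
    """Raise if the dataset on disk differs from the pinned version."""
    actual = file_digest(path)
    if actual != pinned_digest:
        raise ValueError(
            f"dataset changed: expected {pinned_digest[:12]}, got {actual[:12]}"
        )
    return True

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"          # hypothetical dataset file
    data.write_text("id,label\n1,0\n")
    pinned = file_digest(data)              # digest committed alongside the code
    print(check_dataset(data, pinned))      # passes: data matches the pin
    data.write_text("id,label\n1,1\n")      # a processing change alters the data
    try:
        check_dataset(data, pinned)
    except ValueError as err:
        print("blocked:", err)
```

Running a check like this as the first pipeline step turns a silent data change into an explicit failure before any training happens.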

Exam trap

The trap here is that candidates confuse environment reproducibility (pinning libraries) with data reproducibility, assuming that locking code dependencies is sufficient to prevent model failures caused by data drift or version changes.

How to eliminate wrong answers

Option A is wrong because a manual review step before training introduces human latency and does not enforce data version consistency; it relies on a person to catch a version mismatch that may not be visually obvious. Option B is wrong because pinning library versions in the Docker image addresses dependency drift in code, not data versioning; the model failed due to a different data version, not a library incompatibility. Option D is wrong because scheduling a cron job to check for data changes is reactive and does not prevent the pipeline from using the wrong data version; it only alerts after the fact, and the pipeline would still train on incorrect data.

441

MCQhard

A company serves a scikit-learn model on Vertex AI Prediction but receives a 400 error with 'Prediction failed: Model evaluation error'. What is the most likely cause?

A.The input data format is incorrect

B.The model was trained with a different framework

C.The model uses a scikit-learn version not supported by Vertex AI

D.The endpoint is overloaded and timing out

AnswerC

Version mismatch causes evaluation failure.

Why this answer

Vertex AI Prediction supports specific versions of scikit-learn for serving models. If the model was trained with a version that is not in the supported list (e.g., 0.19, 0.20, 0.22, 0.23, 0.24, 1.0, 1.1), the prediction endpoint will fail with a 'Model evaluation error' because the underlying runtime cannot load the serialized model (e.g., pickle or joblib file). This is the most likely cause of a 400 error when the input format is otherwise correct.
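A pre-deployment check along these lines can catch a version mismatch before upload. This is a minimal sketch: the version list is copied from the explanation above purely as placeholder data and should be replaced with the versions the current serving containers actually document; the function names are hypothetical.

```python
# Placeholder list taken from the explanation above; verify against current docs.
SUPPORTED_MINORS = ("0.19", "0.20", "0.22", "0.23", "0.24", "1.0", "1.1")

def major_minor(version):
    """Reduce a version such as '1.0.2' to its 'major.minor' part."""
    return ".".join(version.split(".")[:2])

def is_supported(version, supported=SUPPORTED_MINORS):
    """Return True when the training version matches a supported minor release."""
    return major_minor(version) in supported

print(is_supported("1.0.2"))   # True
print(is_supported("1.3.0"))   # False
```

In practice the training version would come from `sklearn.__version__` recorded at training time; exact matching on the minor release avoids treating 1.10 as 1.1.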

Exam trap

Google Cloud often tests the misconception that a 400 error always indicates a client-side input format issue, but here the error message 'Model evaluation error' points to a server-side model loading failure due to version incompatibility, not the input data.

How to eliminate wrong answers

Option A is wrong because an incorrect input data format typically results in a different error message, such as 'Invalid input' or 'Prediction failed: Input parsing error', not 'Model evaluation error'. Option B is wrong because Vertex AI Prediction supports multiple frameworks (TensorFlow, PyTorch, XGBoost, scikit-learn) and will not throw a 'Model evaluation error' solely due to a different training framework; it would fail at model upload or deployment with an unsupported framework error. Option D is wrong because an overloaded endpoint or timeout would return a 429 (Too Many Requests) or 504 (Gateway Timeout) status code, not a 400 error with 'Model evaluation error'.

442

MCQhard

A team is monitoring a production ML system that includes multiple models and data processing pipelines. They want to set up a comprehensive alerting strategy that minimizes false positives while ensuring critical issues are promptly addressed. Which approach is the most effective?

A.Set up alerts for all possible error conditions

B.Use static thresholds based on historical data

C.Rely on manual monitoring during business hours

D.Use AIOps with anomaly detection to dynamically adjust thresholds

AnswerD

AIOps anomaly detection models learn normal behavior and flag deviations, reducing false positives while detecting real anomalies.

Why this answer

Option D is correct because AIOps with anomaly detection uses machine learning to dynamically adjust alert thresholds based on real-time system behavior, reducing false positives while ensuring critical issues are detected promptly. This approach adapts to changing data distributions and traffic patterns, unlike static thresholds that require manual tuning and often miss subtle anomalies. It is the most effective strategy for complex ML production systems where multiple models and pipelines interact, as it can correlate signals across components to identify genuine incidents.
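To make the contrast with static thresholds concrete, here is a minimal sketch of a dynamically adjusted threshold. It is a deliberately simple rolling mean plus k standard deviations rule, not the method of any particular AIOps product; the window size, the multiplier `k`, and the sample latencies are assumptions for illustration.

```python
import statistics

def dynamic_alerts(values, window=5, k=3.0):
    """Return indices whose value exceeds mean + k * std of the preceding window."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.mean(history)
        std = statistics.pstdev(history)
        # The threshold moves with recent behavior instead of being fixed.
        if values[i] > mean + k * std:
            alerts.append(i)
    return alerts

latency_ms = [100, 102, 98, 101, 99, 100, 180, 101, 100]  # hypothetical samples
print(dynamic_alerts(latency_ms))  # [6]
```

Only the spike at index 6 is flagged; the normal values that follow do not trigger alerts even though the spike has inflated the recent window.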

Exam trap

The trap here is that candidates often choose static thresholds (Option B) because they seem simpler and more predictable, but they fail to recognize that production ML systems require adaptive thresholds to handle dynamic data distributions and avoid alert fatigue.

How to eliminate wrong answers

Option A is wrong because setting alerts for all possible error conditions leads to alert fatigue, overwhelming the team with noise and causing critical issues to be missed; it lacks prioritization and ignores the need for intelligent filtering. Option B is wrong because static thresholds based on historical data fail to adapt to concept drift, seasonal patterns, or sudden traffic spikes, resulting in either too many false positives or missed anomalies when the system behavior changes. Option C is wrong because relying on manual monitoring during business hours introduces unacceptable latency for critical issues that occur outside those hours, and human error or fatigue can cause delays in detection; it is not scalable for 24/7 production ML systems.

443

MCQeasy

A machine learning engineer is exporting a trained model from Vertex AI Training to the Model Registry. Which artifact should they upload as the model artifact?

A.The saved model directory containing the model file(s) and any custom dependencies.

B.Only the model checkpoint file (.ckpt or .h5).

C.The entire training directory including training code and logs.

D.A zip file of the training source code.

AnswerA

This is the standard artifact expected by Vertex AI for deployment.

Why this answer

When exporting a trained model from Vertex AI Training to the Model Registry, the correct artifact is the saved model directory that contains the model file(s) (e.g., SavedModel format for TensorFlow, model.pkl for scikit-learn) along with any custom dependencies required for serving. This ensures the model can be deployed consistently to endpoints or batch predictions, as the Model Registry expects a self-contained artifact that includes both the model binary and its runtime dependencies.

Exam trap

Google Cloud often tests the distinction between training artifacts (checkpoints, code) and deployable model artifacts, trapping candidates who confuse a checkpoint (used for resuming training) with a final, serving-ready model.

How to eliminate wrong answers

Option B is wrong because a model checkpoint file (.ckpt or .h5) is an intermediate training state, not a final deployable artifact; it lacks the serialized graph and serving signatures needed for inference. Option C is wrong because uploading the entire training directory, including training code and logs, introduces unnecessary files and violates the Model Registry's expectation of a minimal, serving-ready artifact. Option D is wrong because a zip file of the training source code contains no model weights or architecture, making it useless for deployment.
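A quick sanity check on the artifact directory can separate a serving-ready export from a bare checkpoint. The sketch below is an assumption-laden illustration: the expected file names follow the common convention for prebuilt serving containers (`saved_model.pb`, `model.joblib`, `model.pkl`) and should be confirmed against the container actually in use; the function name is hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed file-name conventions per framework; confirm for your serving container.
EXPECTED_FILES = {
    "tensorflow": ["saved_model.pb"],
    "sklearn": ["model.joblib", "model.pkl"],
}

def find_model_file(artifact_dir, framework):
    """Return the first recognized model file in the directory, or None."""
    for name in EXPECTED_FILES[framework]:
        if (Path(artifact_dir) / name).is_file():
            return name
    return None

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.ckpt").write_bytes(b"")    # checkpoint only
    print(find_model_file(tmp, "sklearn"))         # None
    (Path(tmp) / "model.pkl").write_bytes(b"")     # serving-ready artifact added
    print(find_model_file(tmp, "sklearn"))         # model.pkl
```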

444

MCQmedium

A data science team uses BigQuery to store raw data and Vertex AI for model training. They want to ensure that only authorized users can access training data, and that model artifacts are automatically versioned and tracked. Which combination of Google Cloud services should they use?

A.Dataflow for data access control and Vertex AI Experiments for model tracking

B.Cloud Storage with bucket-level IAM and Cloud Build for versioning

C.Cloud Composer for data access control and Cloud Source Repositories for model versioning

D.Vertex AI Feature Store with access control and Vertex AI ML Metadata for model versioning

AnswerD

Vertex AI Feature Store provides controlled access to features, and ML Metadata tracks model artifacts and versions.

Why this answer

Vertex AI Feature Store provides fine-grained access control to training data, ensuring only authorized users can access it. Vertex AI ML Metadata automatically tracks and versions model artifacts, lineage, and parameters, which aligns with the requirement for automated versioning and tracking.

Exam trap

Google Cloud often tests the distinction between services that handle data processing (Dataflow, Cloud Composer) versus those that handle access control and metadata management (Feature Store, ML Metadata), leading candidates to confuse orchestration or CI/CD tools with versioning and access control solutions.

How to eliminate wrong answers

Option A is wrong because Dataflow is a data processing service, not an access control mechanism; it does not provide data access control for training data in BigQuery or Vertex AI. Option B is wrong because Cloud Storage with bucket-level IAM can control access to stored objects, but Cloud Build is a CI/CD service for building and deploying applications, not for versioning model artifacts automatically. Option C is wrong because Cloud Composer is a workflow orchestration service (based on Apache Airflow), not a data access control solution, and Cloud Source Repositories is a Git repository for source code, not designed for model versioning or tracking.

445

Drag & Dropmedium

Drag and drop the steps to set up data lineage tracking for ML pipelines using Vertex AI Experiments in the correct order.

Why this order

Start with SDK setup, then create an experiment, log metrics, record artifacts, and review lineage.

446

Multi-Selectmedium

Which THREE are key capabilities of Vertex AI Feature Store?

Select 3 answers

A.Automatic generation of feature embeddings

B.Feature monitoring and validation to detect skew

C.Online serving for low-latency feature retrieval

D.Real-time streaming ingestion from Apache Kafka

E.Offline batch serving for training

AnswersB, C, E

Feature Store includes monitoring for distribution changes.

Why this answer

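Option B is correct because Vertex AI Feature Store provides built-in feature monitoring and validation capabilities that detect training-serving skew and data drift. This is critical for maintaining model performance in production, as it alerts when the distribution of feature values changes between training and serving environments. Options C and E are correct because Feature Store serves features two ways: online serving for low-latency retrieval at prediction time, and offline batch serving for assembling training datasets.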
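Skew detection boils down to comparing two feature distributions with a distance score. The sketch below uses the L-infinity distance between category frequencies as one possible score for a categorical feature; it is an illustration of the idea, not Feature Store's internal computation, and the feature values and the 0.2 alert threshold are hypothetical. Both input lists are assumed to be non-empty.

```python
from collections import Counter

def distribution(values):
    """Convert a non-empty list of category values into relative frequencies."""
    counts = Counter(values)
    total = len(values)
    return {k: c / total for k, c in counts.items()}

def l_infinity_distance(train_values, serving_values):
    """Largest absolute difference in frequency across all categories."""
    p = distribution(train_values)
    q = distribution(serving_values)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

train = ["web"] * 8 + ["mobile"] * 2      # hypothetical training feature values
serving = ["web"] * 5 + ["mobile"] * 5    # hypothetical serving feature values
score = l_infinity_distance(train, serving)
print(score > 0.2)  # True: the shift exceeds the assumed alert threshold
```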

Exam trap

Google Cloud often tests the misconception that Vertex AI Feature Store includes automatic embedding generation or direct Kafka integration, when in fact these are separate services or require custom implementation.

447

MCQhard

A Vertex AI pipeline is triggered from Cloud Build using the configuration above. The pipeline fails with an error: 'Unable to submit build: The source code is not available.' What is the most likely cause?

A.The Docker build step failed silently due to a missing dependency.

B.The 'gcloud builds submit' command does not have access to the source code in the Cloud Build environment.

C.The Docker image tag does not include a hash, causing the push to fail.

D.The Cloud Build service account lacks permission to access the Vertex AI Pipeline API.

AnswerB

The source code must be provided or referenced explicitly; using 'gcloud builds submit' in a step requires the source to be available via a trigger or artifact.

Why this answer

The error 'Unable to submit build: The source code is not available' indicates that the Cloud Build environment cannot locate the source code when the 'gcloud builds submit' command is executed. This typically happens when the pipeline is triggered from Cloud Build but the source code is not properly staged or accessible in the build context, often because the build configuration does not include the source directory or the source is not uploaded to Cloud Storage. Option B correctly identifies that the command lacks access to the source code in the Cloud Build environment.

Exam trap

Google Cloud often tests the distinction between source code availability errors and permission or build failures, leading candidates to mistakenly attribute the error to service account permissions or Docker issues when the root cause is a missing or misconfigured source path.

How to eliminate wrong answers

Option A is wrong because a silent Docker build step failure due to a missing dependency would produce a different error, such as 'Failed to build' or 'Docker build failed', not a source code unavailability error. Option C is wrong because the Docker image tag missing a hash would cause a push failure with an error like 'unauthorized' or 'tag invalid', not a source code availability issue. Option D is wrong because a permission issue with the Cloud Build service account accessing the Vertex AI Pipeline API would result in an authorization error (e.g., 'Permission denied'), not a source code unavailability error.

448

MCQmedium

A team uses Vertex AI Pipelines. They need to ensure that only certain team members can deploy models to production. What is the best approach?

A.Use Vertex AI Experiments to track models

B.Store model artifacts in a bucket with bucket-level permissions

C.Use IAM roles with custom permissions on the Vertex AI Model Registry

D.Create separate projects for dev and prod

AnswerC

Model Registry integrates with IAM to grant specific deployment permissions.

Why this answer

Option C is correct because Vertex AI Model Registry supports IAM roles with custom permissions, allowing fine-grained access control over who can promote or deploy models to production. By granting a custom role that carries the deployment permission (for example, `aiplatform.endpoints.deploy`) only to authorized team members, you can restrict deployment actions while still permitting others to view or register models. This approach directly addresses the need to control production deployments without affecting other pipeline stages.

Exam trap

The trap here is that candidates often confuse artifact storage permissions (bucket-level IAM) with deployment permissions (model registry IAM), leading them to choose Option B, even though bucket permissions do not control the Vertex AI deployment API call.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing model training runs (e.g., hyperparameters, metrics), not for controlling access or permissions to deploy models. Option B is wrong because bucket-level permissions control access to the storage location of model artifacts, but they do not govern the deployment action itself within Vertex AI Pipelines; a user with bucket access could still lack deployment permissions, or vice versa. Option D is wrong because creating separate projects for dev and prod is an organizational boundary that can help with isolation, but it does not provide granular control over which specific team members can deploy within the same project; it also introduces overhead in managing multiple projects and does not leverage Vertex AI's native IAM capabilities for model registry operations.

449

MCQhard

Refer to the exhibit. A Machine Learning Engineer attempts to deploy a model to a Vertex AI Endpoint for online predictions but receives an error. What is the most likely cause of this error?

A.The model is not compatible with the selected machine type.

B.The machine type does not support GPU acceleration.

C.The min replica count is set to 0, which is not allowed for online prediction.

D.The endpoint is not in the same region as the model.

AnswerC

The error clearly states that min_replica_count must be at least 1.

Why this answer

Vertex AI online prediction endpoints require at least one replica to serve traffic. Setting `min_replica_count` to 0 is only valid for batch prediction, not for online prediction, because the endpoint must always have a running instance to handle incoming requests. The error occurs because the deployment request violates this constraint, causing the API to reject the configuration.
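The constraint described here can be expressed as a small client-side check run before a deployment request is sent. This sketch mirrors the rule as the explanation states it and is not the service's own validation logic; the function name and error messages are hypothetical.

```python
def validate_online_replicas(min_replica_count, max_replica_count):
    """Validate replica settings for an online prediction deployment (sketch)."""
    if min_replica_count < 1:
        # The rule from the explanation: at least one replica must always run.
        raise ValueError("min_replica_count must be at least 1 for online prediction")
    if max_replica_count < min_replica_count:
        raise ValueError("max_replica_count must be >= min_replica_count")
    return {
        "min_replica_count": min_replica_count,
        "max_replica_count": max_replica_count,
    }

print(validate_online_replicas(1, 3))
try:
    validate_online_replicas(0, 3)
except ValueError as err:
    print("rejected:", err)
```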

Exam trap

Google Cloud often tests the distinction between batch and online prediction configuration requirements, specifically that `min_replica_count = 0` is valid for batch but invalid for online, leading candidates to overlook this subtle but critical constraint.

How to eliminate wrong answers

Option A is wrong because Vertex AI validates model compatibility with the selected machine type at deployment time; if there were an incompatibility, the error would be specific to that mismatch, not a complaint about min_replica_count. Option B is wrong because GPU acceleration is optional and not required for online prediction; the error message would explicitly mention GPU-related issues if that were the cause. Option D is wrong because a model and the endpoint it is deployed to must be in the same region, so a region mismatch would surface as a region-specific error rather than one about the replica count.

450

MCQmedium

A machine learning engineer notices that the Vertex AI Prediction endpoint's error rate has increased over the past week. The model was retrained with new data and redeployed. Which step should the engineer take first to diagnose the issue?

A.Increase the number of replicas to reduce error rate.

B.Compare the input data distribution of recent requests to the training data distribution using Explainable AI.

C.Roll back to the previous model version immediately.

D.Check the Cloud Monitoring dashboard for latency and error codes, and review the model's prediction logs.

AnswerD

Monitoring and logs provide direct evidence to diagnose errors.

Why this answer

Option D is correct because reviewing Cloud Monitoring dashboards and prediction logs provides immediate insight into error patterns and the likely root cause. Option C is premature: rolling back without investigation discards the evidence needed for diagnosis. Option B is a more advanced analysis that requires additional setup and is better done once the logs point to the input data.

Option A might temporarily reduce errors caused by overload but does not address the underlying cause.
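A first pass over prediction logs usually means grouping errors by status code and by model version. The sketch below assumes log entries have already been exported as dictionaries; the `status` and `model_version` field names and the sample entries are hypothetical, not the actual log schema.

```python
from collections import Counter

def summarize_errors(entries):
    """Return the error rate plus error counts by status code and model version."""
    errors = [e for e in entries if e["status"] >= 400]
    rate = len(errors) / len(entries) if entries else 0.0
    by_code = Counter(e["status"] for e in errors)
    by_version = Counter(e["model_version"] for e in errors)
    return rate, by_code, by_version

logs = [  # hypothetical exported log entries
    {"status": 200, "model_version": "v1"},
    {"status": 200, "model_version": "v1"},
    {"status": 400, "model_version": "v2"},
    {"status": 400, "model_version": "v2"},
    {"status": 500, "model_version": "v2"},
    {"status": 200, "model_version": "v2"},
]
rate, by_code, by_version = summarize_errors(logs)
print(rate)              # 0.5
print(dict(by_code))     # {400: 2, 500: 1}
print(dict(by_version))  # {'v2': 3}
```

If errors cluster on the redeployed version, as in this sample, that points at the retrained model rather than at capacity.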
