Knowledge + Practice

Google Professional Machine Learning Engineer (PMLE) — Questions 76–150

506 questions total · 7 pages · All types, answers revealed

Page 2 of 7

76

MCQ · easy

A team is using Cloud Composer to orchestrate ML workflows. They want to allow multiple data scientists to contribute DAGs without interfering with each other. What is the recommended approach?

A.Give each data scientist write access to the DAGs folder in Cloud Storage

B.Use a complex naming convention for DAG files to avoid overwriting

C.Store DAGs in a source control repository and use CI/CD to deploy to Cloud Composer

D.Create a separate Cloud Composer environment for each data scientist

Answer: C

Version control and CI/CD provide collaboration, testing, and safe deployment.

Why this answer

Option C is correct because Cloud Composer (based on Apache Airflow) recommends managing DAGs via source control and CI/CD pipelines to ensure version control, code review, and consistent deployment. This prevents conflicts when multiple data scientists contribute, as each change is tracked and tested before being synced to the DAGs folder in Cloud Storage, avoiding overwrites or broken workflows.

Exam trap

The trap here is that candidates may assume direct write access or naming conventions are sufficient for collaboration, but Google Cloud tests the understanding that production-grade ML workflows require source control and CI/CD to enforce code quality and prevent deployment conflicts.

How to eliminate wrong answers

Option A is wrong because giving each data scientist direct write access to the DAGs folder in Cloud Storage bypasses version control and can lead to accidental overwrites, conflicts, or deployment of untested code, breaking production workflows. Option B is wrong because a complex naming convention does not prevent race conditions or overwrites when multiple data scientists upload files simultaneously; it only reduces the probability of name collisions but does not address the core need for controlled, auditable deployments. Option D is wrong because creating a separate Cloud Composer environment for each data scientist is cost-prohibitive, inefficient, and defeats the purpose of shared orchestration; it also introduces overhead in managing multiple environments and does not solve the collaboration problem at scale.
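
One kind of check a CI pipeline could run before syncing DAGs is sketched below. It is a minimal illustration, not a Cloud Composer feature: the function name, the regular expression, and the file names are all hypothetical, and a real pipeline would more likely import the DAGs with Airflow itself.

```python
import re
from collections import defaultdict

# Hypothetical CI check: flag dag_id values declared in more than one file.
DAG_ID_PATTERN = re.compile(r"dag_id\s*=\s*['\"]([^'\"]+)['\"]")

def find_duplicate_dag_ids(sources):
    """Map each duplicated dag_id to the sorted list of files declaring it.

    `sources` maps a file name to that file's text.
    """
    owners = defaultdict(set)
    for filename, text in sources.items():
        for dag_id in DAG_ID_PATTERN.findall(text):
            owners[dag_id].add(filename)
    return {d: sorted(f) for d, f in owners.items() if len(f) > 1}

# Example input: two contributors accidentally reuse the same dag_id.
sources = {
    "alice_train.py": "DAG(dag_id='train_daily')",
    "bob_train.py": 'DAG(dag_id="train_daily")',
    "carol_eval.py": "DAG(dag_id='eval_weekly')",
}
duplicates = find_duplicate_dag_ids(sources)
```

A failing check like this would block the merge, so the conflict is caught in review rather than in the shared DAGs folder.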

77

Multi-Select · medium

A media company uses a custom Python script on a Compute Engine VM to run batch predictions with a large ML model. The script loads the model from Cloud Storage, processes records from a Pub/Sub pull subscription, and writes results to BigQuery. Predictions are taking too long and the VM often runs out of memory. Which two changes should the company implement to improve performance and scalability? (Choose TWO)

Select 2 answers

A.Deploy the model on Vertex AI Prediction for batch prediction

B.Change Pub/Sub to a push subscription that sends messages to a load-balanced group of VMs

C.Use Dataflow to read from Pub/Sub, run predictions using the model, and write to BigQuery

D.Switch to a larger VM with more memory

E.Store results in Cloud SQL instead of BigQuery

Answers: B, C

Push subscriptions with load balancing allow horizontal scaling across multiple VMs.

Why this answer

Option B is correct because switching to a push subscription with a load-balanced group of VMs distributes the message processing load across multiple instances, preventing any single VM from being overwhelmed. This directly addresses the memory exhaustion issue by parallelizing the work and allowing horizontal scaling.

Exam trap

Google Cloud often tests the distinction between vertical scaling (larger VM) and horizontal scaling (load-balanced VMs or Dataflow), where candidates mistakenly choose a larger VM thinking it solves memory issues without recognizing the scalability bottleneck.

78

Drag & Drop · medium

Drag and drop the steps to set up a distributed training job on Vertex AI using a custom container in the correct order.

Why this order

First prepare the code and container, then push, configure the job, and run.

79

MCQ · hard

A company uses a custom container on Vertex AI Prediction. They want to send custom metrics from their prediction container to Cloud Monitoring. Which method should they use?

A.OpenCensus or OpenTelemetry SDK

B.Vertex AI built-in metrics

C.Stackdriver Monitoring agent installed in the container

D.Cloud Logging log-based metrics

Answer: A

Vertex AI Prediction integrates with OpenTelemetry for custom metrics.

Why this answer

Option A is correct because OpenCensus and OpenTelemetry are the recommended open-source frameworks for exporting custom metrics from custom containers on Vertex AI Prediction to Cloud Monitoring. They provide a standardized way to instrument your application code, collect metrics, and send them directly to Cloud Monitoring via the Cloud Monitoring API, without requiring additional agents or log-based workarounds.

Exam trap

The trap here is that candidates often confuse built-in Vertex AI metrics (which are automatic but limited) with the need for custom metrics, or they incorrectly assume that log-based metrics are the simplest path, when in fact OpenCensus/OpenTelemetry are the direct and recommended method for custom containers.

How to eliminate wrong answers

Option B is wrong because Vertex AI built-in metrics only cover default infrastructure metrics (e.g., CPU, memory, request latency) and cannot capture custom application-level metrics defined by the user. Option C is wrong because the Stackdriver Monitoring agent (now the Ops Agent) is designed for VM-based environments and is not intended to be installed inside a container; it would add unnecessary overhead and is not the recommended pattern for custom containers on Vertex AI. Option D is wrong because Cloud Logging log-based metrics require you to write metrics as structured log entries and then define metric filters, which is an indirect, higher-latency approach compared to directly exporting metrics via OpenCensus/OpenTelemetry, and it is not the standard method for custom containers in Vertex AI Prediction.

80

MCQ · hard

A mobile app company needs to run an image classification model on-device for real-time performance. The model is a ResNet-50 trained in TensorFlow. They need to reduce latency to under 50ms on a mid-range phone. Which optimization should they apply first?

A.Convert the model to TensorFlow Lite

B.Quantize the model weights to 8-bit integers

C.Replace ResNet-50 with MobileNet

D.Apply weight pruning to remove 50% of connections

Answer: B

Quantization reduces model size and speeds up inference significantly.

Why this answer

Quantizing the model weights to 8-bit integers (option B) is the most effective first optimization because it shrinks the weights by about 4x relative to 32-bit floats and lets mobile hardware use integer arithmetic, often cutting inference latency by 2-3x without requiring architectural changes. This is the standard first step for on-device deployment of TensorFlow models, and it typically costs only a small amount of accuracy; whether it reaches the 50ms target on a given mid-range phone still has to be measured.

Exam trap

Google Cloud often tests the misconception that converting to TensorFlow Lite alone is sufficient for latency reduction, but the real performance gain comes from quantization, not the format change.

How to eliminate wrong answers

Option A is wrong because simply converting to TensorFlow Lite (TFLite) without quantization does not reduce latency; TFLite is a runtime format that enables on-device inference but does not inherently speed up computation—quantization must be applied during conversion. Option C is wrong because replacing ResNet-50 with MobileNet is a model architecture change that would require retraining and potentially degrade accuracy for the specific image classification task, and the question asks for the first optimization to apply, not a model swap. Option D is wrong because weight pruning (removing 50% of connections) can reduce model size but often requires specialized hardware or software support for sparse matrix multiplication, which is not universally available on mid-range phones, and the latency improvement is less predictable than quantization.
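
The idea behind 8-bit weight quantization can be sketched in plain Python. This is a generic affine quantization scheme written for illustration, with hypothetical function names; it is not the TensorFlow Lite converter and makes no claim about how that tool chooses its scales.

```python
def quantize_int8(weights):
    """Affine-quantize floats to int8; returns (values, scale, zero_point)."""
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # make sure 0.0 is representable
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 if all weights are 0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
# Each weight now needs 1 byte instead of 4 (float32), hence the ~4x size reduction.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error per weight stays within about one quantization step (`scale`), which is why accuracy usually degrades only slightly.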

81

Drag & Drop · medium

Drag and drop the steps to set up a feature store for ML features using Vertex AI Feature Store in the correct order.

Why this order

First define entity type and features, then ingest data, serve, and monitor.

82

MCQ · medium

A financial services company uses a custom deep learning model on Vertex AI to automatically approve or reject credit card transactions. The model is explainable using Vertex Explainable AI, and the company monitors feature attribution drift with thresholds defined per feature. Last week, the monitoring system flagged that the mean absolute attribution score for the 'transaction_amount' feature increased from 0.35 to 0.55. The overall model accuracy, measured on a daily batch of labeled transactions, has remained around 97%. The operations team is concerned about potential compliance issues due to changing model behavior. What should the data scientist do?

A.Tune the alert threshold for 'transaction_amount' to 0.6 to avoid future false alarms.

B.Retrain the model by increasing regularization to reduce the importance of the 'transaction_amount' feature.

C.Investigate whether there has been a shift in the distribution of 'transaction_amount' values in the recent transaction data, which could explain the attribution change.

D.Disable the feature attribution drift monitoring for 'transaction_amount' since the model accuracy is stable.

Answer: C

A distribution shift in the feature values can cause the model to rely more heavily on that feature, leading to higher attribution scores. Investigating this is the appropriate diagnostic step.

Why this answer

Option C is correct because a shift in the distribution of the 'transaction_amount' feature (e.g., due to seasonality or a new customer segment) can naturally cause its attribution score to change without indicating model degradation. Vertex Explainable AI computes feature attributions relative to the current data distribution; if the input values shift, the model's reliance on that feature may legitimately increase. Investigating the distribution shift is the first diagnostic step before adjusting thresholds or retraining, as stable accuracy does not rule out data drift that could lead to compliance issues.

Exam trap

The trap here is that candidates assume stable accuracy means the model is fine, but the PMLE exam tests that feature attribution drift can indicate a change in model behavior that accuracy alone cannot detect, especially for compliance-sensitive applications.

How to eliminate wrong answers

Option A is wrong because tuning the alert threshold to 0.6 without understanding the root cause ignores the possibility of a real distribution shift or model behavior change, and could mask a genuine compliance risk. Option B is wrong because increasing regularization to reduce the importance of 'transaction_amount' is a premature intervention that could harm model performance and does not address why the attribution changed; it assumes the change is harmful without evidence. Option D is wrong because disabling monitoring for a feature based solely on stable accuracy is dangerous—accuracy can remain high while feature attributions drift, leading to biased or non-compliant decisions that accuracy alone does not capture.
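
A first-pass check for a shift in 'transaction_amount' could look like the sketch below, which computes a population stability index (PSI) between a baseline sample and a recent one. The bin edges and sample values are made up for illustration, and PSI is only one of several possible drift statistics; it is not what Vertex AI computes internally.

```python
import math

def population_stability_index(baseline, recent, edges):
    """PSI between two samples, binned with shared ascending bin edges."""
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value reaches or exceeds.
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    b, r = fractions(baseline), fractions(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))

# Hypothetical transaction amounts: last month versus last week.
baseline = [20, 35, 50, 80, 120, 200, 45, 60]
recent = [150, 300, 450, 90, 600, 220, 700, 500]
edges = [50, 100, 250]

psi = population_stability_index(baseline, recent, edges)
```

A common rule of thumb treats a PSI above roughly 0.25 as a substantial shift, but the right threshold depends on the feature and should be agreed with the compliance team.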

83

MCQ · medium

You are responsible for deploying a real-time recommendation model that uses a large embedding table (5 GB) and a small neural network. The model is served through a custom container on Vertex AI Prediction. The end-to-end latency requirement is under 200 ms. During load testing with 500 QPS, you observe that latency increases linearly with batch size. You are currently using a single replica with an n1-standard-8 machine and one T4 GPU. The embedding table is loaded entirely in GPU memory. However, CPU utilization is at 100% while GPU is at 30%. What is the best approach to meet the latency requirement at scale?

A.Increase the number of replicas and use a global load balancer to distribute traffic.

B.Use a custom container that partitions the embedding table across multiple GPUs within a single replica.

C.Switch to a TPU v2-8 pod slice to accelerate embedding lookups.

D.Use a machine type with more CPU cores to parallelize embedding lookups.

Answer: D

More CPU cores reduce contention and latency for embedding operations.

Why this answer

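Option D is correct because the CPU is the bottleneck (100% utilization while the GPU sits at 30%); a machine type with more CPU cores (e.g., n1-highcpu-16) allows embedding lookups to run in parallel and reduces latency. Option A is wrong because adding replicas spreads the traffic, but each replica would still be CPU-bound on the same machine type. Option B is wrong because partitioning the embedding table across multiple GPUs adds resources where there is no bottleneck; the GPU is already underutilized.

Option C is wrong because a TPU is expensive and may not improve latency if the model is not TPU-compatible.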

84

MCQ · easy

A data scientist wants to perform feature engineering on a large dataset stored in BigQuery before training a model. Which feature engineering tool is most appropriate?

A.Use Vertex AI Feature Store to store engineered features

B.Export data to Cloud Dataproc for feature engineering

C.Create a Dataflow pipeline to compute features

D.Use BigQuery ML TRANSFORM clause

Answer: D

Enables SQL-based feature transformations.

Why this answer

Option D is correct because the BigQuery ML TRANSFORM clause lets you define feature transformations directly in SQL, where the data already lives. Option C is wrong because Dataflow is meant for building data pipelines, not for direct, interactive feature engineering. Option A is wrong because Vertex AI Feature Store stores and serves features that have already been created.

Option B is wrong because Cloud Dataproc runs Hadoop/Spark workloads and requires exporting the data, so it is not as directly integrated with BigQuery.

85

MCQ · medium

A machine learning team wants to perform A/B testing between two model versions (v1 and v2) on Vertex AI Endpoint. They need to gradually route 10% of traffic to v2 while monitoring performance. What is the most efficient way to achieve this?

A.Use a Cloud Load Balancer to route traffic based on a header.

B.Deploy both versions to the same endpoint and set traffic_split to 90% for v1 and 10% for v2.

C.Create two separate endpoints and use a weighted DNS round-robin.

D.Run batch predictions for v2 and log results separately.

Answer: B

Vertex AI Endpoint supports traffic splitting for A/B testing.

Why this answer

Option B is correct because a Vertex AI Endpoint natively supports traffic splitting between deployed model versions. Option A is wrong because Cloud Load Balancing operates at the network level, not the model level. Option C is wrong because creating separate endpoints adds complexity and cost.

Option D is wrong because batch prediction is not suited to real-time A/B testing.
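
What a 90/10 traffic split means in practice can be simulated in a few lines. This is a local sketch only: the `v1`/`v2` labels and both helper functions are hypothetical, and the real split is configured on the endpoint (keyed by deployed model ID, as far as the SDK is concerned) rather than implemented by the caller.

```python
import random
from collections import Counter

def validate_traffic_split(split):
    """Raise ValueError unless percentages are non-negative integers summing to 100."""
    if any(not isinstance(p, int) or p < 0 for p in split.values()):
        raise ValueError("percentages must be non-negative integers")
    if sum(split.values()) != 100:
        raise ValueError("percentages must sum to 100")

def route(split, rng):
    """Pick a model version at random, weighted by its traffic percentage."""
    versions = list(split)
    return rng.choices(versions, weights=[split[v] for v in versions], k=1)[0]

traffic_split = {"v1": 90, "v2": 10}  # hypothetical version labels
validate_traffic_split(traffic_split)

# Simulate 10,000 requests with a seeded generator for repeatability.
rng = random.Random(42)
counts = Counter(route(traffic_split, rng) for _ in range(10_000))
```

Roughly one request in ten lands on `v2`, which is the behavior the team would then monitor before raising the percentage.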

86

MCQ · hard

A team has successfully trained a deep learning model on Vertex AI using a custom container and distributed training with TensorFlow. They want to serve this model for online predictions with low latency. They deploy the model to Vertex AI Endpoint with a single n1-standard-4 machine. During load testing, they observe that the median latency is 200ms, but the 99th percentile latency spikes to 2 seconds. The model is a complex neural network that takes variable-length text as input. Which approach will best reduce tail latency while maintaining throughput?

A.Use autoscaling with a target CPU utilization of 70%.

B.Implement request batching to process multiple inputs per request.

C.Use a GPU machine type like n1-standard-4 with an attached GPU.

D.Increase the machine type to n1-highmem-8 to allocate more memory.

Answer: B

Batching reduces overhead and smooths out latency for variable-length inputs.

Why this answer

Option B is correct because batching multiple inputs together amortizes overhead and reduces per-request latency variability, particularly for variable-length inputs. Option D is wrong because increasing memory does not address compute-bound latency spikes. Option C is wrong because a GPU might improve throughput but would not necessarily reduce tail latency caused by input variability.

Option A is wrong because autoscaling adds replicas over time but does not reduce per-request latency spikes.

87

MCQ · medium

Refer to the exhibit. A team member complains they cannot deploy a model to Vertex AI Endpoints. What is the most likely reason?

A.The policy is missing a condition

B.The policy lacks `roles/aiplatform.deployer`

C.The policy lacks `roles/aiplatform.specialist`

D.The service account needs `roles/aiplatform.user`

Answer: B

The deployer role is required for deploying models to endpoints.

Why this answer

The correct answer is B because deploying a model to Vertex AI Endpoints requires the `roles/aiplatform.deployer` role on the service account. This role grants the necessary permissions to create and manage endpoint deployments. Without it, the deployment operation will fail with an access denied error, even if other roles are present.

Exam trap

Google Cloud often tests the distinction between read-only roles like `roles/aiplatform.user` and write roles like `roles/aiplatform.deployer`, trapping candidates who assume a general user role includes deployment permissions.

How to eliminate wrong answers

Option A is wrong because the policy missing a condition is not the most likely reason; conditions are optional and typically used for context-aware access, not for basic deployment permissions. Option C is wrong because `roles/aiplatform.specialist` is not a predefined Vertex AI IAM role; the correct role for deployment is `roles/aiplatform.deployer`. Option D is wrong because `roles/aiplatform.user` provides read-only access to view resources but does not include the write permissions needed to deploy a model to an endpoint.

88

Multi-Select · medium

A machine learning team is collaborating on a project using Vertex AI Experiments to track model training runs. They want to ensure that all team members can reproduce any experiment by using the same code, data, and environment. Which THREE actions should the team take?

Select 3 answers

A.Store the training code in a Cloud Source Repository and tag commits with the experiment ID.

B.Build a custom container image for training and push it to Artifact Registry with a fixed tag.

C.Record the path and version of the training dataset in the experiment parameters.

D.Share a service account key with all team members so they can access the same resources.

E.Use Vertex AI's hyperparameter tuning job to automatically find the best parameters.

Answers: A, B, C

This ensures the exact code version is tied to the experiment.

Why this answer

Option A is correct because storing training code in a Cloud Source Repository with tags linked to experiment IDs ensures that every team member can retrieve the exact code version used for a given experiment. This is a core reproducibility practice in Vertex AI Experiments, where the code snapshot is a key component of the experiment lineage.

Exam trap

Google Cloud often tests the distinction between actions that enable reproducibility versus actions that improve model performance or access control, so candidates mistakenly select hyperparameter tuning or service account sharing as reproducibility measures.

89

MCQ · easy

A company wants to implement a document processing solution that extracts key information from invoices and receipts. They have limited ML expertise and want to use a pre-trained solution as much as possible. Which Google Cloud service should they use?

A.Document AI with a pre-trained invoice processor.

B.AutoML Natural Language with custom entity extraction.

C.Vertex AI Workbench with custom Python scripts.

D.Cloud Vision API with OCR.

Answer: A

Document AI offers specialized pre-trained processors for invoices.

Why this answer

Document AI with a pre-trained invoice processor is the correct choice because it provides a fully managed, pre-trained solution specifically designed for extracting structured data (e.g., vendor name, invoice number, line items) from invoices and receipts. This aligns with the company's limited ML expertise and desire to use a pre-trained solution, requiring no custom model training or complex coding.

Exam trap

Google Cloud often tests the distinction between raw OCR (Cloud Vision API) and structured document understanding (Document AI). Candidates mistakenly choose Cloud Vision API for invoice processing even though the task calls for structured data extraction, not just text extraction.

How to eliminate wrong answers

Option B is wrong because AutoML Natural Language with custom entity extraction requires users to train a custom model with labeled data, which contradicts the requirement to use a pre-trained solution as much as possible. Option C is wrong because Vertex AI Workbench with custom Python scripts demands significant ML expertise to write and deploy custom code, which the company lacks. Option D is wrong because Cloud Vision API with OCR only extracts raw text from images, not the structured key-value pairs or specific fields needed for invoice processing.

90

MCQ · medium

A company deploys an AutoML Vision model for real-time defect detection. They notice high inference latency during peak hours. Which configuration change can help?

A.Reduce the model's input resolution

B.Use batch prediction

C.Enable model compression

D.Increase the number of max replicas

Answer: D

Adding replicas handles the increased load with more parallelism.

Why this answer

Increasing the number of max replicas allows the AutoML Vision endpoint to scale horizontally during peak hours, distributing the inference load across more compute instances. This directly reduces per-request latency by preventing queuing and resource contention, as the Vertex AI Prediction service can spin up additional replicas up to the configured maximum to handle higher throughput.

Exam trap

Google Cloud often tests the misconception that reducing input resolution or enabling compression is a safe latency fix, but the PMLE exam expects you to recognize that AutoML Vision models are black-box optimized and that horizontal scaling via max replicas is the proper architectural response to real-time latency spikes.

How to eliminate wrong answers

Option A is wrong because reducing input resolution may lower latency but at the cost of detection accuracy, which is unacceptable for defect detection where fine-grained features matter. Option B is wrong because batch prediction is designed for asynchronous, non-real-time processing of large datasets, not for reducing latency in real-time inference; it actually increases end-to-end latency. Option C is wrong because AutoML Vision models are already optimized by Google's neural architecture search, and enabling model compression (e.g., quantization) is not a supported configuration option for deployed AutoML Vision models; it would require retraining with a different model type.

91

MCQ · hard

A large e-commerce company uses Vertex AI to train a recommendation model daily. The training pipeline is built with Vertex AI Pipelines and involves three steps: data preprocessing, training, and model evaluation. The pipeline is triggered by a Cloud Scheduler job every morning at 8 AM. Recently, the pipeline has been failing intermittently during the data preprocessing step, with an error message indicating 'ResourceExhausted: Quota limits exceeded for read api requests.' The team has checked and confirmed that the quota for BigQuery read requests is not exceeded at the project level. The preprocessing step reads data from a BigQuery table with billions of rows. The team has also noticed that the pipeline runs on a custom machine type (n1-standard-4) with a persistent disk. What is the most likely cause of this error?

A.The BigQuery table is partitioned on a date column, and the pipeline is querying a specific partition that exceeds the quota.

B.The Cloud Scheduler job is triggering multiple pipeline runs that overlap, causing concurrent quota usage.

C.The preprocessing component is using a BigQuery client library that does not use exponential backoff for retries.

D.The pipeline is using a shared VPC that has traffic shaping limits.

Answer: C

Without backoff, rapid retries can exhaust per-user read API quotas.

Why this answer

Option C is correct because the error 'ResourceExhausted: Quota limits exceeded for read api requests' indicates that the BigQuery API is throttling requests from the client, even though the project-level quota is not exceeded. The preprocessing component likely uses a BigQuery client library that lacks exponential backoff retry logic, causing rapid, repeated requests that exhaust the per-client or per-connection quota. Implementing exponential backoff would allow the client to back off and retry, preventing quota exhaustion.

Exam trap

The trap here is that candidates assume quota errors always mean the project-level limit is reached, but Google Cloud tests the nuance that per-client or per-connection rate limits can be exhausted independently, especially when retry logic is missing.

How to eliminate wrong answers

Option A is wrong because querying a specific partition does not inherently exceed quota; partitioning actually reduces data scanned and can lower quota usage. Option B is wrong because Cloud Scheduler triggers a single pipeline run at 8 AM, and overlapping runs would require multiple triggers or a long-running pipeline, which is not indicated; the error is specific to read API requests, not concurrency. Option D is wrong because shared VPC traffic shaping limits affect network throughput, not BigQuery read API quota, which is a separate resource governed by Google Cloud's API quota system.
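
Exponential backoff with jitter, the retry behavior the correct answer relies on, can be sketched as follows. The `QuotaExceeded` exception and the `flaky_read` function are hypothetical stand-ins for a real client call; the delays and attempt counts are illustrative defaults, not values taken from any Google Cloud library.

```python
import random
import time

class QuotaExceeded(Exception):
    """Hypothetical stand-in for a retryable quota error."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep, rng=random):
    """Call fn(), retrying on QuotaExceeded with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + rng.uniform(0, delay * 0.1))

# Demo: a fake read that fails twice, then succeeds. Delays are recorded, not slept.
attempts = {"count": 0}

def flaky_read():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise QuotaExceeded("read quota exceeded")
    return "rows"

delays = []
result = call_with_backoff(flaky_read, sleep=delays.append, rng=random.Random(0))
```

The waits roughly double after each failure (about 1 s, then about 2 s here), so a throttled client slows down instead of hammering the API.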

92

MCQ · easy

A data science team is using a shared Cloud Storage bucket to store training data. Multiple team members are simultaneously uploading new data files, and occasionally the wrong version of a file is used in training, leading to inconsistent results. Which best practice should the team implement to ensure data version consistency?

A.Use Cloud Composer to schedule a daily snapshot of the Cloud Storage bucket.

B.Migrate all training data to BigQuery and use time-travel queries to access historical versions.

C.Enable object versioning on the Cloud Storage bucket and use the version ID when referencing data files.

D.Restrict write access to the bucket to only one team member using IAM roles.

Answer: C

Object versioning provides a way to keep multiple versions of an object, ensuring consistency.

Why this answer

Option C is correct because enabling object versioning on a Cloud Storage bucket preserves each object's history, allowing the team to reference a specific version ID when reading data files. This ensures that every training run uses the exact same version of a file, eliminating inconsistency from concurrent uploads. The version ID acts as an immutable pointer, decoupling the training process from the bucket's live state.

Exam trap

Google Cloud often tests the distinction between data versioning (object-level immutability) and data backup (snapshots or time-travel), leading candidates to choose snapshot or database-centric solutions that do not provide per-file version consistency in a shared object store.

How to eliminate wrong answers

Option A is wrong because Cloud Composer schedules workflows (e.g., Airflow DAGs) but does not provide per-object version consistency; a daily snapshot captures a point-in-time state but does not prevent concurrent uploads from overwriting files between snapshots. Option B is wrong because BigQuery time-travel queries access table snapshots within a 7-day window, but the scenario involves files in Cloud Storage, not tables; migrating all training data to BigQuery is an unnecessary architectural change that does not address file-level versioning. Option D is wrong because restricting write access to one team member creates a bottleneck and single point of failure, violating the team's need for simultaneous uploads and not solving the core problem of identifying which version is used.
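
One way to pin a training run to exact file versions is to record each object's generation number alongside its path. The sketch below builds and parses version-specific URIs in the `gs://bucket/object#generation` style; the helper names, bucket, path, and generation value are all hypothetical.

```python
def pinned_uri(bucket, name, generation):
    """Build a version-specific URI of the form gs://bucket/object#generation."""
    return f"gs://{bucket}/{name}#{generation}"

def parse_pinned_uri(uri):
    """Split a pinned URI back into (bucket, object name, generation)."""
    if not uri.startswith("gs://") or "#" not in uri:
        raise ValueError(f"not a version-pinned gs:// URI: {uri!r}")
    path, generation = uri[len("gs://"):].rsplit("#", 1)
    bucket, _, name = path.partition("/")
    return bucket, name, int(generation)

# Hypothetical manifest entry recorded with a training run.
manifest = [pinned_uri("training-data", "images/train.csv", 1700000000000000)]
bucket, name, generation = parse_pinned_uri(manifest[0])
```

Storing such a manifest with each run means a later upload of `images/train.csv` cannot silently change what the run is understood to have read.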

93

MCQ · hard

A machine learning engineer needs to share a trained model with the product team for integration. The model is stored in Cloud Storage, and the product team’s service account needs read access. The engineer wants to follow the principle of least privilege. Which IAM configuration should be used?

A.Generate a signed URL with read access and share it with the product team.

B.Grant the product team's service account the roles/storage.objectViewer role at the bucket level.

C.Grant the product team's service account the roles/storage.objectAdmin role at the bucket level.

D.Grant the product team's service account the roles/storage.objectViewer role at the project level.

Answer: B

Bucket-level grants read access to objects in that bucket only, following least privilege.

Why this answer

Option B is correct because granting the product team's service account the roles/storage.objectViewer role at the bucket level provides read-only access to objects in that specific bucket, adhering to the principle of least privilege. This role allows the service account to list and read objects without granting broader permissions, such as modifying or deleting them, and scoping it to the bucket prevents unnecessary access to other buckets in the project.

Exam trap

The trap here is that candidates may confuse the principle of least privilege with convenience, choosing a signed URL (Option A) because it seems simple, or selecting a project-level role (Option D) without realizing it grants access to all buckets, both of which violate the core requirement of minimal necessary permissions.

How to eliminate wrong answers

Option A is wrong because generating a signed URL with read access creates a time-limited, publicly accessible URL that bypasses IAM authentication, which violates the principle of least privilege by not using the service account's identity and potentially exposing the model to unauthorized users if the URL is leaked. Option C is wrong because granting the roles/storage.objectAdmin role at the bucket level provides full control over objects, including delete and overwrite permissions, which exceeds the required read-only access and violates least privilege. Option D is wrong because granting the roles/storage.objectViewer role at the project level gives read access to all buckets in the project, not just the specific bucket containing the model, which violates least privilege by granting broader access than necessary.
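
The shape of a bucket-level grant can be illustrated with a plain IAM policy dictionary. The sketch below only manipulates the policy data structure; the `grant_role` helper and the service account address are hypothetical, and applying the policy to a real bucket would go through the Cloud Storage API or CLI.

```python
def grant_role(policy, role, member):
    """Return a copy of `policy` with `member` added to the binding for `role`."""
    bindings = [dict(b, members=list(b["members"])) for b in policy.get("bindings", [])]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No existing binding for this role: create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}

# Hypothetical service account for the product team.
member = "serviceAccount:product-team@example-project.iam.gserviceaccount.com"
bucket_policy = {"bindings": []}
updated = grant_role(bucket_policy, "roles/storage.objectViewer", member)
```

Because the binding lives in the bucket's own policy, the read-only access stops at that bucket and does not extend to the rest of the project.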

94

MCQ · hard

A healthcare startup is using Vertex AI to train a deep learning model for detecting anomalies in chest X-rays. The training dataset is 500 GB of images stored in Cloud Storage (GCS). They use a custom training container with TPU v3-32. The training job completes successfully, but the model performance is poor. On investigation, they discover that the input images were not preprocessed correctly: the images were resized to 256x256 instead of the required 512x512. They need to fix the preprocessing and retrain as quickly as possible. The preprocessing pipeline involves decompressing, resizing, normalizing, and augmenting images. They have a small team and limited time. Which approach should they take?

A.Use Vertex AI Batch Transform to preprocess the images

B.Run another Vertex AI Training job with a modified container that preprocesses and trains

C.Use Dataflow with Apache Beam to build a parallel preprocessing pipeline

D.Use Cloud Data Fusion to orchestrate the preprocessing steps

AnswerC

Dataflow scales to process large volumes of data quickly in parallel.

Why this answer

Option C is correct because Dataflow with Apache Beam provides a fully managed, serverless, and highly parallel preprocessing pipeline that can efficiently process 500 GB of images in Cloud Storage. This approach decouples preprocessing from training, allowing the team to fix the resize step (256x256 to 512x512) and run the pipeline independently, then feed the corrected data into a new training job. Dataflow automatically scales resources to handle large datasets, minimizing retraining time without requiring infrastructure management.

Exam trap

Google Cloud often tests the misconception that Vertex AI Training should handle preprocessing inline, but the trap here is that decoupling preprocessing with a scalable, serverless pipeline like Dataflow is faster and more maintainable than modifying the training container or using prediction-oriented services like Batch Transform.

How to eliminate wrong answers

Option A is wrong because Vertex AI Batch Transform is designed for batch predictions on already-preprocessed data, not for transforming raw images (decompressing, resizing, normalizing, augmenting) — it lacks the flexibility to run custom preprocessing logic like image resizing. Option B is wrong because running a combined preprocessing and training container would require modifying the training code and container, which is inefficient for a quick fix; it also ties preprocessing to the training job, preventing parallelization and reuse of the preprocessing step. Option D is wrong because Cloud Data Fusion is a visual data integration tool for ETL/ELT workflows, but it is overkill for image preprocessing and does not natively support the high-throughput, parallel image transformations needed for 500 GB of X-ray images; it is better suited for structured data pipelines.

95

MCQmedium

Your team has a production ML model on Vertex AI that shows a gradual decline in accuracy over the past week. The model is retrained weekly using the latest data. Which monitoring approach should you implement to detect the issue earlier?

A.Configure Vertex AI Model Monitoring to detect feature drift and alert when metrics exceed thresholds.

B.Create a Cloud Monitoring alert for prediction response count.

C.Use BigQuery ML to retrain the model more frequently.

D.Set up a Cloud Monitoring uptime check on the prediction endpoint.

AnswerA

Vertex AI Model Monitoring directly monitors for drift and skew, which helps detect accuracy decline.

Why this answer

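Option A is correct because Vertex AI Model Monitoring can detect training-serving skew and feature drift, which are common causes of accuracy decline, and can alert as soon as a configured threshold is exceeded. Option B is wrong because the number of prediction responses reflects traffic volume, not prediction quality. Option C is wrong because BigQuery ML is a model-building tool, not a monitoring tool, and retraining more often does not detect the problem.

Option D is wrong because an uptime check only verifies that the endpoint is reachable; Cloud Monitoring without custom metrics cannot detect drift.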

96

MCQeasy

A team has a trained TensorFlow model running locally and wants to deploy it for low-latency online predictions on Google Cloud. Which service should they use?

A.Vertex AI Prediction

B.AI Platform Training

C.Cloud Run

D.Cloud Functions

AnswerA

Vertex AI Prediction is purpose-built for low-latency online ML predictions.

Why this answer

Vertex AI Prediction is the correct choice because it is a fully managed service designed specifically for deploying trained ML models for online (real-time) prediction with low latency. It supports importing TensorFlow SavedModel artifacts and automatically scales the serving infrastructure, including GPU/TPU support, to handle request traffic while providing built-in monitoring and explainability features.

Exam trap

Google Cloud often tests the distinction between training and prediction services, and the trap here is that candidates may confuse AI Platform Training (which is for model training) with AI Platform Prediction (now part of Vertex AI), or assume that any serverless compute like Cloud Run or Cloud Functions can handle ML inference without considering the need for GPU/TPU support and optimized serving infrastructure.

How to eliminate wrong answers

Option B (AI Platform Training) is wrong because it is a service for training ML models, not for serving predictions; using it for online predictions would require additional custom infrastructure and does not provide the low-latency serving endpoints needed. Option C (Cloud Run) is wrong because while it can host custom containers, it lacks native ML model serving optimizations such as automatic GPU/TPU acceleration, model versioning, and request batching, and would require you to manually build and manage a prediction server. Option D (Cloud Functions) is wrong because it is a serverless compute platform for event-driven, short-lived functions with limited execution time (9 minutes for 1st gen functions) and no support for GPU/TPU, making it unsuitable for low-latency online predictions that require persistent serving of large ML models.

97

MCQmedium

A company has multiple teams that need to access and manage ML models in Vertex AI. Different teams require different permission levels: the data science team should be able to create and update models, while the MLOps team should have full control. What is the recommended approach to manage access?

A.Grant the 'aiplatform.user' role to a Google Group containing all users

B.Use folders in Google Cloud Resource Manager and assign IAM roles at the folder level

C.Use labels and tags on models to control access

D.Create a separate Google Cloud project for each team

AnswerB

Folders allow hierarchical policy management, and IAM roles can be scoped appropriately for each team.

Why this answer

Option B is correct because Google Cloud Resource Manager folders allow hierarchical IAM policy inheritance, enabling you to assign roles like 'roles/aiplatform.user' (for data science) and 'roles/aiplatform.admin' (for MLOps) at the folder level. This approach scales across multiple projects within the folder, ensuring consistent permissions without per-project duplication. It aligns with the principle of least privilege and centralized access management for Vertex AI resources.

Exam trap

The trap here is that candidates confuse resource labels/tags (which are for organization and cost allocation) with IAM-based access control, leading them to incorrectly select Option C as a viable permission management method.

How to eliminate wrong answers

Option A is wrong because granting 'aiplatform.user' to a Google Group containing all users gives the same permission level to everyone, failing to differentiate between data science (create/update) and MLOps (full control) needs; it violates least privilege. Option C is wrong because labels and tags are metadata for organizing and filtering resources, not IAM mechanisms—they cannot enforce access control or grant permissions to models. Option D is wrong because creating a separate project for each team introduces administrative overhead, breaks centralized model governance, and does not inherently solve fine-grained access within Vertex AI; it also complicates cross-team model sharing and cost tracking.

98

MCQmedium

After deploying a new version of a model to a Vertex AI Endpoint, the team notices that predictions are still returning results from the old version. The deployment command used a traffic split of 100% to the new version. What is the most likely cause?

A.The model artifact uploaded was identical to the old version.

B.The traffic split was not properly updated; the endpoint is still routing 100% to the old version.

C.The new model version failed health checks and was automatically rolled back.

D.The prediction client is caching the old model response.

AnswerB

If the traffic split command is not applied correctly, the old version continues to serve.

Why this answer

Option B is correct because the traffic split update may not have taken effect, for example if the command failed silently, so the endpoint keeps routing traffic to the old version. Option A is wrong because an identical model artifact would affect what the new version returns, not which version receives traffic. Option C is wrong because the deployment reportedly succeeded, so an automatic rollback is unlikely; the traffic split may instead need an explicit update.

Option D is wrong because client-side caching is not a typical issue for Vertex AI Endpoint predictions.
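
One way to confirm this diagnosis is to read back the endpoint's traffic split and check which deployed model actually receives traffic. The sketch below works on a plain dictionary mapping deployed model IDs to percentages; the helper `serving_models` and the IDs are hypothetical, and fetching the real split from the endpoint is left out.

```python
def serving_models(traffic_split):
    """Return the deployed model IDs that receive traffic.

    Raises ValueError if the percentages do not add up to 100.
    """
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"traffic split must sum to 100, got {total}")
    return sorted(
        model_id for model_id, percent in traffic_split.items() if percent > 0
    )


# Hypothetical split read back from the endpoint after the deployment:
# the intended 100% for the new version was never applied.
observed_split = {"old-model-id": 100, "new-model-id": 0}
still_serving = serving_models(observed_split)
```

If `still_serving` lists only the old deployed model, the split was never updated and needs to be set explicitly.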

99

Multi-Selectmedium

A company is evaluating Google Cloud ML solutions. Which TWO services are appropriate for building custom machine learning models (not using pre-built APIs)? (Choose TWO.)

Select 2 answers

A.Vertex AI Workbench

B.Cloud Translation API

C.Vertex AI Training

D.Cloud AutoML

E.Cloud Vision API

AnswersA, C

Notebooks for custom model development.

Why this answer

Vertex AI Workbench is correct because it provides a Jupyter-based development environment where data scientists can write custom code, train models from scratch, and manage the entire ML workflow without relying on pre-built APIs. It supports custom containers, frameworks like TensorFlow and PyTorch, and integrates with Vertex AI Training for distributed training. Vertex AI Training is correct because it runs custom training code, in prebuilt or custom containers, on managed infrastructure.

Exam trap

Google Cloud often tests the distinction between 'building custom models' and 'using pre-built APIs' — candidates mistakenly choose AutoML or pre-built APIs because they think any ML service that trains models qualifies, but the question explicitly requires building from scratch without pre-built models.

100

MCQmedium

A company uses AutoML Tables (Vertex AI AutoML for tabular data) to predict customer churn. Their dataset has 10,000 rows and 50 features. During training, they notice the model's performance is poor. Which approach is most likely to improve the model?

A.Enable automatic feature engineering transformations

B.Switch to BigQuery ML linear regression

C.Increase the training budget to 10 node hours

D.Remove 20 features to reduce noise

AnswerA

AutoML can create new features from existing ones automatically.

Why this answer

AutoML Tables (Vertex AI AutoML for tabular data) includes automatic feature engineering transformations such as scaling, one-hot encoding, and feature cross creation. These transformations are essential for capturing non-linear relationships and interactions between features, which can significantly improve model performance when the default preprocessing is insufficient. Enabling this option directly addresses the poor performance by allowing the model to learn more complex patterns from the data.

Exam trap

Google Cloud often tests the misconception that increasing training budget or reducing features is a universal fix for poor model performance, when in fact the most impactful first step is to enable automatic feature engineering to let the model learn better representations from the data.

How to eliminate wrong answers

Option B is wrong because switching to BigQuery ML linear regression would likely worsen performance, as linear regression assumes a linear relationship between features and target, which is rarely the case in churn prediction; AutoML is designed to handle non-linear patterns. Option C is wrong because increasing the training budget to 10 node hours does not address the root cause of poor performance—it only allows more time for training, but if the model's architecture or preprocessing is inadequate, more budget will not fix the underlying issue. Option D is wrong because removing 20 features arbitrarily may discard valuable information; AutoML Tables can handle high-dimensional data and automatically identify feature importance, so reducing features without analysis can harm performance.

101

Multi-Selecteasy

Which TWO are benefits of using Vertex AI Pipelines for ML workflow orchestration over deploying custom Airflow DAGs in Cloud Composer? (Choose TWO.)

Select 2 answers

A.Managed infrastructure without manual configuration

B.Built-in scheduling capabilities

C.Automatic artifact lineage tracking

D.Native integration with Vertex AI services

E.Support for arbitrary Python code in steps

AnswersC, D

Vertex Pipelines automatically tracks metadata and artifacts.

Why this answer

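Option C is correct because Vertex AI Pipelines automatically captures and tracks artifact lineage (inputs, outputs, and their relationships) as part of the ML metadata store. This built-in lineage tracking is a key differentiator from custom Airflow DAGs, where you must manually implement artifact tracking using external tools or custom code. Option D is correct because Vertex AI Pipelines integrates natively with other Vertex AI services such as training, the model registry, and endpoints, whereas Airflow DAGs reach them through operators or custom code.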

Exam trap

Google Cloud often tests the misconception that managed infrastructure and scheduling are unique to Vertex AI Pipelines, when in fact Cloud Composer also provides these features, so candidates must focus on the specific differentiators like native integration and automatic lineage tracking.

102

MCQmedium

A data engineer is setting up a data pipeline for ML training. The raw data is in Cloud Storage, and they need to transform it into features stored in Vertex AI Feature Store. The pipeline should run daily. Which service should they use?

A.Cloud Composer with Airflow DAG.

B.Cloud Dataproc with Spark.

C.Dataflow with Apache Beam pipeline.

D.Vertex AI Pipelines with custom components.

E.Cloud Functions on a schedule.

AnswerC

Dataflow can read from Cloud Storage, transform, and write to Feature Store efficiently.

Why this answer

Dataflow with Apache Beam is the correct choice because it provides a fully managed, serverless service for both batch and streaming data processing, which is ideal for transforming raw data from Cloud Storage into features for Vertex AI Feature Store on a daily schedule. Dataflow handles auto-scaling, exactly-once processing, and integrates natively with Google Cloud services, making it efficient for ETL pipelines that need to run reliably at scale.

Exam trap

Google Cloud often tests the distinction between orchestration (Cloud Composer) and actual data processing (Dataflow), leading candidates to pick Cloud Composer because they see 'schedule' in the question, but the core requirement is transforming data, not just scheduling it.

How to eliminate wrong answers

Option A is wrong because Cloud Composer with Airflow DAG is primarily an orchestration tool for scheduling and monitoring workflows, not a data processing engine; it would need to delegate the actual transformation to another service like Dataflow or Dataproc. Option B is wrong because Cloud Dataproc with Spark is optimized for big data analytics and interactive queries, but it requires managing clusters and is less suited for a simple, daily batch transformation pipeline that benefits from serverless, auto-scaling execution. Option D is wrong because Vertex AI Pipelines with custom components is designed for orchestrating ML workflows (e.g., training, evaluation, deployment), not for generic data transformation tasks; it adds unnecessary complexity for a simple daily ETL job.

Option E is wrong because Cloud Functions on a schedule is constrained by short execution timeouts (9 minutes for 1st gen functions) and limited memory, making it unsuitable for processing large volumes of raw data from Cloud Storage into features.

103

Multi-Selecthard

A company uses Vertex AI Model Monitoring to detect training-serving skew. They have a categorical feature 'product_category' with high cardinality. The monitoring job alerts for skew, but the data scientists believe the model performance is still acceptable. Which THREE actions should the team take to investigate and resolve the alert?

Select 3 answers

A.Examine which categories have the largest distribution changes to understand the nature of the shift.

B.Adjust the alerting threshold based on historical drift patterns to reduce noise.

C.Compare model performance metrics (e.g., AUC) on the drifted segment vs. the non-drifted segment.

D.Remove the drifted categories from the feature set to eliminate the alert.

E.Ignore the alert because the model is performing well; monitoring alerts are often false positives.

AnswersA, B, C

Identifying specific categories helps assess whether the drift is due to seasonal effects or other benign causes.

Why this answer

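Option A is correct because examining which categories have the largest distribution changes allows the team to pinpoint the root cause of the training-serving skew. In Vertex AI Model Monitoring, the skew alert is based on a statistical distance between the training and serving distributions (L-infinity distance for categorical features such as 'product_category', Jensen-Shannon divergence for numerical features). By drilling down into the specific categories driving the distance, the team can assess whether the shift is benign (e.g., seasonal) or problematic, rather than relying on aggregate model performance alone. Option B is correct because a threshold tuned to historical patterns reduces noisy alerts on a high-cardinality feature, and Option C is correct because comparing metrics on drifted and non-drifted segments shows whether the skew actually affects performance.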

Exam trap

Google Cloud often tests the misconception that a model's aggregate performance metrics (e.g., AUC) are sufficient to dismiss drift alerts, but the trap is that drift can be localized to specific segments without affecting overall metrics, requiring per-segment evaluation.
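
The per-category investigation in Option A can be sketched as follows. The code computes the absolute change in relative frequency for each category and takes the maximum, which is the L-infinity distance between the two distributions. The function names and the sample values are hypothetical, and the sketch makes no claim to reproduce the exact computation inside Vertex AI Model Monitoring.

```python
from collections import Counter


def category_shifts(training_values, serving_values):
    """Return {category: |train_freq - serve_freq|}, largest shift first."""
    if not training_values or not serving_values:
        raise ValueError("both samples must be non-empty")
    train_counts = Counter(training_values)
    serve_counts = Counter(serving_values)
    n_train = len(training_values)
    n_serve = len(serving_values)
    shifts = {}
    for category in set(train_counts) | set(serve_counts):
        train_freq = train_counts[category] / n_train
        serve_freq = serve_counts[category] / n_serve
        shifts[category] = abs(train_freq - serve_freq)
    # Sort so the categories driving the skew appear first.
    return dict(sorted(shifts.items(), key=lambda item: item[1], reverse=True))


def l_infinity_distance(training_values, serving_values):
    """Largest per-category frequency difference between the two samples."""
    return max(category_shifts(training_values, serving_values).values())


# Hypothetical samples of the 'product_category' feature.
train_sample = ["shoes"] * 5 + ["toys"] * 5
serve_sample = ["shoes"] * 2 + ["toys"] * 5 + ["garden"] * 3
shifts = category_shifts(train_sample, serve_sample)
distance = l_infinity_distance(train_sample, serve_sample)
```

Here "shoes" and "garden" each shift by about 0.3 while "toys" is unchanged, so the investigation would start with those two categories.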

104

MCQmedium

An ML team is using Vertex AI Pipelines to automate model training and deployment. They want to reuse components across multiple pipelines. What is the best practice for managing component code?

A.Define components inline in the pipeline definition

B.Embed component code in Cloud Composer DAGs

C.Copy the component definitions into each pipeline's YAML file

D.Use Cloud Functions to define components

E.Store components as container images in Artifact Registry and reference them from pipelines

AnswerE

Centralized, versioned, reusable.

Why this answer

Option E is correct because Vertex AI Pipelines natively supports reusable components by packaging them as container images stored in Artifact Registry. This allows teams to version, share, and reference components across multiple pipelines without duplicating code, ensuring consistency and reducing maintenance overhead. Container images encapsulate the component's runtime environment and logic, making them portable and independently deployable.

Exam trap

Google Cloud often tests the misconception that inline definitions or YAML duplication are acceptable for reuse, but the trap here is that candidates overlook the requirement for versioned, decoupled, and independently deployable components, which only container images in a registry can provide.

How to eliminate wrong answers

Option A is wrong because defining components inline in the pipeline definition tightly couples the component logic to a specific pipeline, preventing reuse across multiple pipelines and making versioning difficult. Option B is wrong because Cloud Composer DAGs are used for orchestrating Apache Airflow workflows, not for defining Vertex AI pipeline components; embedding component code in DAGs would violate separation of concerns and is not a supported pattern for Vertex AI Pipelines. Option C is wrong because copying component definitions into each pipeline's YAML file leads to code duplication, version drift, and increased maintenance burden, contradicting the goal of reusability.

Option D is wrong because Cloud Functions are event-driven serverless functions, not designed to define or host reusable pipeline components; they lack the containerized runtime and dependency management required by Vertex AI Pipelines.

105

MCQhard

The pipeline fails during the evaluate component with error "Model not found". What is the most likely cause?

A.The dataset_id is misspelled

B.The model_id parameter is referencing the wrong output

C.The training container did not produce a model artifact

D.The threshold value is invalid

AnswerB

Correct: Output name mismatch causes Model not found.

Why this answer

The error 'Model not found' during the evaluate component indicates that the model_id parameter is referencing an output that does not exist or is incorrectly named. In Vertex AI Pipelines, which is built on the Kubeflow Pipelines SDK, the evaluate step typically takes the model from a previous training step through that task's outputs, and if the model_id points to a wrong output (e.g., a different step's output or a misspelled reference), the pipeline cannot locate the model. This is the most likely cause because the error is specific to model resolution, not dataset or threshold issues.

Exam trap

Google Cloud often tests the distinction between resource resolution errors (like 'Model not found') and data/validation errors, tricking candidates into confusing dataset or threshold issues with pipeline step output references.

How to eliminate wrong answers

Option A is wrong because a misspelled dataset_id would cause a 'Dataset not found' or data loading error, not a 'Model not found' error during evaluation. Option C is wrong because if the training container did not produce a model artifact, the pipeline would fail earlier in the training step with an artifact missing error, not during the evaluate component. Option D is wrong because an invalid threshold value would cause a validation or scoring error within the evaluate step, not a 'Model not found' error, which is a resource resolution issue.

106

MCQmedium

A company uses Vertex AI for training. They have a large dataset stored in Cloud Storage and need to train a custom model using TensorFlow. The training job is failing with an out-of-memory error. What is the best first step?

A.Reduce model size.

B.Enable data sharding and reduce input pipeline parallelism.

C.Use a larger machine type.

D.Increase the batch size.

AnswerB

Reduces memory footprint of data loading.

Why this answer

Option B is correct because enabling data sharding and reducing input pipeline parallelism lowers the memory used by data loading. Option D is wrong because increasing the batch size would increase memory usage. Option C is wrong as a first step, although it might be a later one, because a larger machine type increases cost.

Option A is wrong because reducing model size may degrade accuracy and is not a first step.
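
As a minimal sketch of what sharding means here, the function below assigns every `num_shards`-th input file to a worker, so each worker reads only its own slice instead of the whole dataset. It is plain Python that mirrors the every-nth-element behaviour of `tf.data.Dataset.shard`; the function name and file paths are hypothetical.

```python
def shard(file_paths, num_shards, index):
    """Return the files that worker `index` (0-based) should read."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    # Take every num_shards-th file, starting at this worker's offset.
    return file_paths[index::num_shards]


# Hypothetical TFRecord files in Cloud Storage.
files = [f"gs://example-bucket/train-{i:05d}.tfrecord" for i in range(10)]
worker_1_files = shard(files, num_shards=4, index=1)
```

With four workers, worker 1 reads files 1, 5, and 9, which keeps per-worker buffering proportional to a quarter of the input.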

107

Multi-Selecteasy

Which THREE factors should be considered when choosing a compute option for serving a deep learning model in production on Google Cloud? (Choose three.)

Select 3 answers

A.Integration with Vertex AI for model monitoring

B.Autoscaling capabilities to handle variable traffic

C.GPU or TPU requirements for model inference

D.The programming language used for training

E.The color of the team's logo

AnswersA, B, C

Monitoring integration is crucial for production.

Why this answer

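A is correct because Vertex AI provides integrated model monitoring capabilities, including feature drift detection, prediction skew analysis, and outlier detection, which are essential for maintaining model performance in production. Without this integration, you would need to build custom monitoring pipelines, increasing operational complexity. B is correct because autoscaling lets the serving infrastructure follow variable traffic without over-provisioning, and C is correct because accelerator requirements determine which serving options can meet the model's latency targets.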

Exam trap

The trap here is that candidates might think the training language (D) matters for serving, but Google Cloud serving infrastructure is language-agnostic as long as the model is exported in a supported format, making this a common distractor.

108

MCQhard

A large e-commerce company uses Vertex AI Pipelines to orchestrate its recommendation model training. The pipeline has several parallel components: feature engineering, model training, and model evaluation. Recently, they noticed that the pipeline often fails due to resource exhaustion in the Vertex AI custom training job for the model training component. The training job consumes significant memory and occasionally exceeds the allocated memory limit, causing the pod to be OOMKilled. The team has already increased the memory to the maximum allowed for the chosen machine type. They need to prevent the pipeline from failing while still using the same machine type. Which approach should they take?

A.Split the training component into multiple smaller steps that process data in chunks to reduce peak memory usage.

B.Use a larger machine type with more memory to accommodate the peaks.

C.Add a memory check step before training that estimates memory usage and skips training if it exceeds the limit.

D.Implement a retry policy with exponential backoff for the training component, so it automatically retries on failure.

AnswerA

This reduces memory footprint and avoids exceeding the limit, allowing successful completion.

Why this answer

Option A is correct because splitting the training component into smaller steps that process data in chunks directly addresses the root cause of OOMKilled failures—peak memory usage exceeding the allocated limit. By reducing the memory footprint per step, the pipeline can stay within the maximum memory of the existing machine type without requiring a larger instance. This approach aligns with best practices for Vertex AI custom training jobs, where resource limits are fixed per machine type and cannot be exceeded.

Exam trap

Google Cloud often tests the misconception that retry policies or pre-checks can solve resource exhaustion, but the correct approach is to redesign the component to reduce peak memory usage, as retries do not fix the underlying OOM condition.

How to eliminate wrong answers

Option B is wrong because it suggests using a larger machine type, which contradicts the requirement to keep the same machine type; it also may increase cost unnecessarily without solving the underlying memory inefficiency. Option C is wrong because adding a memory check step that skips training on high memory usage would cause the pipeline to fail or produce no model, which does not prevent failure—it merely avoids it by not running the component. Option D is wrong because implementing a retry policy with exponential backoff does not address the resource exhaustion; the training job will repeatedly fail with OOMKilled on each retry, wasting time and compute resources without resolving the memory limit issue.
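
The idea behind Option A can be illustrated with a small generator that yields fixed-size chunks, so only one chunk is held in memory at a time. This is a sketch in plain Python; `iter_chunks` and `streaming_mean` are hypothetical helpers, and a running mean stands in for whatever per-chunk work the real training component would do.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size items from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def streaming_mean(rows, chunk_size):
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(rows, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else 0.0


# Peak memory is bounded by chunk_size, not by the size of the input.
mean_value = streaming_mean(range(1, 101), chunk_size=10)
```

The same pattern applies when each chunk is a file or a partition handled by its own pipeline step.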

109

Multi-Selecthard

Which THREE components should you include in a comprehensive model monitoring dashboard for a production ML system?

Select 3 answers

A.Team member roles and responsibilities

B.System resource utilization (CPU, memory, latency)

C.Input data quality metrics (missing values, outliers)

D.Training pipeline code version

E.Model performance metrics (accuracy, precision, recall) over time

AnswersB, C, E

Ensures infrastructure is healthy.

Why this answer

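Option B is correct because system resource utilization metrics (CPU, memory, latency) are essential for monitoring the health and performance of the production infrastructure hosting the ML model. These metrics help detect resource bottlenecks, scaling issues, or degradation that could impact inference latency and throughput, which are critical for maintaining service-level objectives (SLOs). Option C is correct because input data quality issues such as missing values and outliers often precede degraded predictions, and Option E is correct because tracking accuracy, precision, and recall over time shows whether the model itself is degrading.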

Exam trap

Google Cloud often tests the distinction between operational governance artifacts (like team roles) and actual monitoring metrics; the trap here is confusing project management documentation with the technical components of a live monitoring dashboard.

110

MCQhard

An e-commerce company uses a Vertex AI endpoint for product recommendations. Recently, the click-through rate (CTR) dropped significantly. Model monitoring shows no significant data drift or skew. Logs show increased latency but no errors. Which technique should the engineer use to diagnose the issue?

A.Increase the endpoint's request timeout value to accommodate the higher latency.

B.Enable autoscaling on the endpoint to reduce latency by adding more nodes.

C.Retrain the model with the most recent user interaction data.

D.Analyze the prediction output distribution using Vertex AI Model Monitoring for prediction drift and compare to a baseline.

AnswerD

Prediction drift can directly impact CTR even without data drift.

Why this answer

Option D is correct because the drop in CTR despite no data drift or skew suggests that the model's predictions have shifted in distribution (prediction drift), even if the input features remain stable. Vertex AI Model Monitoring can compare the current prediction output distribution against a baseline to detect such drift, which directly explains the CTR decline. The increased latency is a symptom, not the root cause, and fixing latency alone would not restore CTR.

Exam trap

Google Cloud often tests the distinction between data drift (input distribution changes) and prediction drift (output distribution changes), and candidates mistakenly assume that no data drift means the model is fine, overlooking that the model's predictions can still degrade due to concept drift.

How to eliminate wrong answers

Option A is wrong because increasing the request timeout does not address the root cause of the CTR drop; it only masks the latency issue and may lead to worse user experience if predictions are stale. Option B is wrong because enabling autoscaling reduces latency by adding nodes, but the CTR drop is not caused by latency; it is a prediction quality issue, and autoscaling does not fix prediction drift. Option C is wrong because retraining with recent data assumes the model is stale, but monitoring shows no data drift or skew, so the input distribution is fine; the problem is in the output distribution, and retraining without investigating prediction drift may not resolve the issue.

111

MCQmedium

A healthcare startup is developing a diagnostic model using sensitive patient data. They use Vertex AI to manage the training pipeline. They need to ensure that the data is encrypted both at rest and in transit. Additionally, they want to prevent the ML engineers from seeing raw data but still allow them to train models. They use Cloud Storage with CMEK and VPC-SC. They plan to use Vertex AI Training with a custom service account. The data stored in Cloud Storage is encrypted with CMEK. What additional step is needed to allow Vertex AI Training to access the encrypted data?

A.Use a service account with the 'Storage Admin' role and 'Cloud KMS CryptoKey Decrypter' role.

B.Grant the Cloud Storage service agent the Cloud KMS CryptoKey Decrypter role.

C.Disable encryption for the training data to simplify access.

D.Grant the custom service account the Cloud KMS CryptoKey Decrypter role.

AnswerD

The custom service account used by Vertex AI Training must have decrypt permission to read CMEK-encrypted data.

Why this answer

The correct answer is D because Vertex AI Training must use a custom service account that has the Cloud KMS CryptoKey Decrypter role to decrypt the CMEK-encrypted data stored in Cloud Storage. The custom service account is the identity that Vertex AI jobs run as, and it needs explicit permission to decrypt the CMEK key to read the training data. Without this role, the encrypted objects remain inaccessible even if the service account has Storage Object Viewer permissions.

Exam trap

The trap here is that candidates often confuse the Cloud Storage service agent (used for default encryption) with the custom service account that Vertex AI jobs use, leading them to incorrectly grant permissions to the wrong principal.

How to eliminate wrong answers

Option A is wrong because the 'Storage Admin' role is overly permissive and unnecessary; the service account only needs 'Storage Object Viewer' to read data, and the 'Cloud KMS CryptoKey Decrypter' role is required but must be granted to the custom service account, not a generic admin account. Option B is wrong because the Cloud Storage service agent is used for server-side operations like bucket-level encryption, not for granting access to a custom service account used by Vertex AI Training; the decrypter role must be on the custom service account that runs the training job. Option C is wrong because disabling encryption violates the requirement to protect sensitive patient data at rest and in transit, and it is not a valid security practice for a healthcare startup.

112

MCQeasy

A retail company wants to forecast weekly sales for each of its 500 stores. The data includes historical sales, promotions, holidays, and local weather. The company needs to update forecasts every week with new data. Which ML approach should they use?

A.Use BigQuery ML to create a linear regression model on historical data

B.Use Vertex AI Forecasting to train a time-series model with holiday and weather features

C.Export data to AutoML Tables and train a regression model

D.Build a custom LSTM model using TensorFlow on Vertex AI Workbench

AnswerB

Vertex AI Forecasting is designed for time series with multiple features and supports automatic retraining.

Why this answer

Vertex AI Forecasting is purpose-built for time-series forecasting with support for exogenous features like holidays and weather, making it the ideal choice for weekly sales predictions across 500 stores. It handles multiple time series automatically and integrates with the required weekly retraining cycle, unlike generic regression models that lack temporal awareness.

Exam trap

Google Cloud often tests the distinction between general regression (which assumes i.i.d. data) and time-series forecasting (which requires temporal dependencies and exogenous features), leading candidates to pick a simpler regression option like BigQuery ML or AutoML Tables instead of the specialized forecasting service.

How to eliminate wrong answers

Option A is wrong because BigQuery ML linear regression treats data as independent rows, ignoring the temporal ordering and seasonality inherent in sales forecasting, and cannot natively handle multiple time series (500 stores) with exogenous features like holidays. Option C is wrong because AutoML Tables is designed for tabular regression with independent rows, not time-series forecasting, and would require manual feature engineering to capture time dependencies, leading to poor forecast accuracy. Option D is wrong because building a custom LSTM on Vertex AI Workbench is overkill for this problem—Vertex AI Forecasting already provides a managed, scalable time-series solution with built-in support for holiday and weather features, avoiding the operational overhead of custom model development and hyperparameter tuning.

113

MCQmedium

You are using Vertex AI continuous evaluation (model monitoring) for your deployed model. You receive an alert that the prediction distribution is significantly different from the training distribution. What should you do first?

A.Roll back the model to the previous version immediately.

B.Increase the alerting threshold to reduce false positives.

C.Analyze the input data to understand if there is a skew or drift.

D.Retrain the model using the latest data and redeploy.

AnswerC

Diagnosing the cause is the appropriate first step.

Why this answer

When a monitoring alert fires, the first step is to investigate the root cause: check whether the input data has changed, whether there is a data pipeline issue, and whether retraining is actually needed. Rolling back or retraining without that analysis is premature.
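
As a rough illustration of what "analyzing the input data" can look like, the sketch below bins a numeric feature from the training and serving sets and measures how far apart the two histograms are with Jensen-Shannon divergence, one common distance for this purpose. The function names, bin edges, and sample values are hypothetical, and this is not presented as the monitoring service's own implementation.

```python
import math


def histogram(values, edges):
    """Fraction of values in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts) or 1  # avoid dividing by zero on empty input
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2), between 0 and 1."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical feature values: serving data has shifted upward.
edges = [0, 50, 100]
training = histogram([10, 20, 30, 40], edges)
serving = histogram([60, 70, 80, 90], edges)
score = js_divergence(training, serving)
```

A score near 0 suggests the two samples look alike; a score near 1 suggests the feature has moved substantially and is worth tracing back to the data source before deciding on rollback or retraining.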

114

MCQeasy

A data scientist wants to quickly train a binary classification model on a tabular dataset stored in BigQuery without writing any code. They have limited ML experience. Which Google Cloud service should they use?

A.Vertex AI Workbench with a built-in scikit-learn notebook.

B.Dataflow with a TensorFlow pipeline.

C.BigQuery ML with CREATE MODEL statement using SQL.

D.AutoML Tables with a direct BigQuery connection.

AnswerC

BigQuery ML enables model creation with SQL, no coding required.

Why this answer

Option C is correct because BigQuery ML allows a data scientist to train a binary classification model directly in BigQuery using a `CREATE MODEL` SQL statement, without writing any code or moving data. This is the fastest low-code approach for users with limited ML experience, as it leverages familiar SQL syntax and runs entirely within BigQuery's serverless infrastructure.

Exam trap

Google Cloud often tests the distinction between 'low-code' (BigQuery ML) and 'no-code' (AutoML) services. The trap here is that AutoML Tables requires more setup and data movement, while BigQuery ML is the fastest option for users already working in BigQuery because a single SQL statement is all it needs.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench requires writing Python code (e.g., scikit-learn) and managing a notebook environment, which is not a no-code solution and exceeds the 'limited ML experience' constraint. Option B is wrong because Dataflow with a TensorFlow pipeline requires writing code for pipeline construction and model training, and is designed for stream/batch data processing, not for quick no-code model training. Option D is wrong because AutoML Tables, while low-code, requires exporting data from BigQuery or connecting via a separate interface, and involves a more complex workflow than directly using BigQuery ML's SQL-based training; the question specifies 'without writing any code' and 'quickly,' and BigQuery ML is the most direct path.

115

MCQhard

A data engineering team uses Dataflow for preprocessing and wants to integrate with Vertex AI Pipelines. They need to pass the preprocessed data location to the training step. What is the best practice?

A.Store the path in Data Catalog

B.Use Cloud Pub/Sub

C.Use PipelineParam to pass the output path

D.Write the output to a fixed Cloud Storage path and hardcode it in the pipeline

AnswerC

PipelineParam allows dynamic, runtime passing of values between steps.

Why this answer

Option C is correct because PipelineParam is the native mechanism in Vertex AI Pipelines (Kubeflow Pipelines SDK) to pass runtime outputs—such as a Cloud Storage path—between components. It creates a dependency graph that ensures the training step receives the exact output path from the preprocessing step, enabling dynamic, reproducible pipelines without hardcoding.

Exam trap

The trap here is that candidates confuse metadata services (Data Catalog) or messaging systems (Pub/Sub) with pipeline parameter passing, overlooking that Vertex AI Pipelines uses Kubeflow Pipelines' built-in component I/O for deterministic, graph-based data flow.

How to eliminate wrong answers

Option A is wrong because Data Catalog is a metadata management service for discovering and tagging assets, not designed to pass runtime pipeline parameters between steps; it would introduce unnecessary latency and coupling. Option B is wrong because Cloud Pub/Sub is an asynchronous messaging service for event-driven architectures, not a direct parameter-passing mechanism within a single pipeline execution; it would add complexity and potential ordering issues. Option D is wrong because hardcoding a fixed Cloud Storage path defeats pipeline reproducibility and scalability—if the preprocessing step changes its output location (e.g., due to timestamped folders), the training step would fail or use stale data.

116

Multi-Selectmedium

A company is deploying a model for online predictions on Vertex AI. They want to minimize latency while also handling traffic spikes. Which TWO configurations should they choose?

Select 2 answers

A.Use GPU machine type

B.Enable autoscaling with min replicas=1

C.Disable autoscaling and use manual scaling

D.Use CPU machine type with more memory

E.Set a fixed number of replicas equal to peak load

AnswersA, B

GPUs accelerate inference, reducing latency.

Why this answer

Option A is correct because GPU machine types on Vertex AI provide significantly faster inference for deep learning models, reducing latency per prediction. Option B is correct because enabling autoscaling with min replicas=1 ensures the model can handle traffic spikes by dynamically adding replicas while keeping at least one instance running to avoid cold starts.

Exam trap

Google Cloud often tests the misconception that manual scaling or fixed replicas are better for latency, but the correct approach is autoscaling with a minimum replica count to balance cost and responsiveness.

117

MCQmedium

A data scientist deployed a TensorFlow model for sentiment analysis to Vertex AI Prediction. The model expects input key 'text' but the client sends requests with key 'review_text'. Which step should the data scientist take to resolve the error without retraining the model?

A.Use a Cloud Function to strip the 'review_text' key and replace it with 'text'

B.Retrain the model with input key 'review_text'

C.Create a new Vertex AI Endpoint with an alias mapping 'review_text' to 'text'

D.Modify the client code to send requests with input key 'text'

AnswerD

This aligns the request with the model's expected signature without changing the model.

Why this answer

Option D is correct because the most straightforward and reliable solution is to modify the client code to send the request with the expected input key 'text'. This avoids any additional infrastructure, latency, or complexity, and does not require retraining the model or altering the deployed endpoint. Vertex AI Prediction serves the model as-is, so aligning the client's request format with the model's expected input is the simplest and most maintainable fix.

Exam trap

Google Cloud often tests the misconception that you need to add infrastructure (like Cloud Functions) or modify the model to handle input key mismatches, when the correct answer is to adjust the client code to match the model's expected input schema.

How to eliminate wrong answers

Option A is wrong because introducing a Cloud Function adds an unnecessary hop, increases latency, and creates an extra point of failure; it also violates the principle of keeping the architecture simple when a direct client-side fix exists. Option B is wrong because retraining the model is an expensive and time-consuming process that is not needed when the only issue is a key name mismatch in the request payload. Option C is wrong because Vertex AI Endpoints do not support alias mappings for input keys; the endpoint simply forwards the request payload to the model, and the model's input signature is fixed at deployment time.
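
The client-side fix can be sketched as follows. The helper names are hypothetical; the only details taken from the question are the two key names, and the `instances` wrapper is the usual shape of an online prediction request body.

```python
import json


def to_model_instance(record):
    """Rename the client's 'review_text' key to the 'text' key the model expects."""
    instance = dict(record)  # copy so the caller's record is not mutated
    if "review_text" in instance:
        instance["text"] = instance.pop("review_text")
    return instance


def build_predict_body(records):
    """Serialize records into a JSON body of the form {"instances": [...]}."""
    return json.dumps({"instances": [to_model_instance(r) for r in records]})


body = build_predict_body([{"review_text": "Great product"}])
```

Doing the rename where the request is built keeps the deployed model and endpoint untouched.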

118

MCQhard

A data scientist deployed a model to Vertex AI Prediction. When making a prediction request as shown in the exhibit, they receive a 400 error. What is the most likely cause?

A.The request JSON is malformed due to a missing comma between instances.

B.The model was trained on 2 features, but the request provides 3 features.

C.The endpoint path is incorrect; it should include the model version.

D.The request is sending 3 separate instances but the model expects only 1.

AnswerB

The error indicates the model expects 2 features per instance, but the request provides 3.

Why this answer

The 400 error indicates a malformed request, typically due to a mismatch between the input features the model expects and what is provided. Since the model was trained on 2 features but the request includes 3 features, Vertex AI rejects the prediction as invalid input shape mismatch. This is the most common cause of 400 errors in Vertex AI Prediction when the instance structure does not match the model's signature.

Exam trap

The trap here is that candidates confuse a 400 error with a routing or versioning issue (Option C) or assume JSON syntax errors (Option A), but the real cause is a feature count mismatch, which is a common pitfall when deploying models with different training and serving data schemas.

How to eliminate wrong answers

Option A is wrong because a missing comma between instances would cause a JSON parse error (e.g., 400 with 'Invalid JSON payload'), but the exhibit shows valid JSON syntax with commas present. Option C is wrong because the endpoint path does not require a model version; Vertex AI Prediction uses the endpoint resource name, and versioning is handled via traffic splitting or aliases, not in the URL path. Option D is wrong because an online prediction request may carry multiple instances, so sending 3 instances is valid unless the model's serving signature fixes the batch size at 1, which is not indicated here.
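
A small pre-flight check on the client can catch this kind of mismatch before the request is sent. This is a minimal sketch: `EXPECTED_FEATURES` and `find_bad_instances` are hypothetical names, and the value 2 comes from the scenario described in the answer.

```python
EXPECTED_FEATURES = 2  # assumption: the model was trained on 2 features


def find_bad_instances(instances, expected=EXPECTED_FEATURES):
    """Return the indexes of instances whose feature count differs from `expected`."""
    return [i for i, inst in enumerate(instances) if len(inst) != expected]


# Hypothetical request: the second instance carries an extra feature.
instances = [[0.1, 0.2], [0.1, 0.2, 0.3]]
bad = find_bad_instances(instances)
```

An empty result means every instance has the expected width; any index returned points at an instance the model would likely reject.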

119

MCQmedium

Your team manages a production ML pipeline on Google Cloud that trains a fraud detection model every 6 hours using new transaction data. The pipeline steps are: (1) Cloud Function triggered by new files in Cloud Storage to validate data, (2) Dataflow job for feature engineering, (3) Vertex AI CustomJob for training, (4) Cloud Function to deploy the model to a Vertex AI endpoint after evaluation. You notice that the pipeline sometimes fails during the Dataflow job step with an error: 'Workflow failed. Causes: The job encountered a system error. Please try again later.' The error occurs sporadically, and retrying the pipeline manually usually succeeds. The team needs a reliable automated solution. What should you do?

A.Schedule the pipeline to run less frequently to reduce load on the Dataflow service.

B.Use Cloud Tasks to queue the Dataflow job and retry on failure.

C.Increase the number of Dataflow workers and use FlexRS to handle transient errors.

D.Orchestrate the pipeline using Cloud Composer with retry policies on the Dataflow operator.

AnswerD

Cloud Composer (Airflow) can manage the pipeline DAG with automatic retries and dependencies.

Why this answer

Option D is correct because Cloud Composer (Apache Airflow) provides native retry policies on its Dataflow operators, enabling automatic retries of the Dataflow job when it fails due to transient system errors. This addresses the sporadic failure pattern without manual intervention, ensuring the pipeline runs reliably every 6 hours.

Exam trap

The trap here is that candidates confuse scaling solutions (Option C) with fault-tolerance mechanisms, or they choose a generic queuing service (Option B) instead of a dedicated orchestrator with built-in retry policies for pipeline steps.

How to eliminate wrong answers

Option A is wrong because reducing pipeline frequency does not resolve transient system errors in Dataflow; it only delays processing and may cause data staleness. Option B is wrong because Cloud Tasks is a generic task queue that lacks native integration with Dataflow job lifecycle management and retry logic for pipeline-specific errors. Option C is wrong because increasing workers and using FlexRS improves resource availability but does not handle transient system errors that are unrelated to worker count or preemptibility; FlexRS is for cost savings on preemptible VMs, not for retry logic.
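
To make the retry policy concrete, the sketch below shows the kind of `default_args` dictionary an Airflow DAG could pass to its tasks. `retries`, `retry_delay`, and `retry_exponential_backoff` are standard Airflow operator arguments, while the specific values are assumptions. The `run_with_retries` helper is a plain-Python illustration of the semantics, not Airflow code.

```python
from datetime import timedelta

# Assumed values; tune them to how long transient errors usually last.
default_args = {
    "retries": 3,                          # further attempts after the first failure
    "retry_delay": timedelta(minutes=5),   # wait between attempts
    "retry_exponential_backoff": True,     # lengthen the wait on each retry
}


def run_with_retries(task, retries):
    """One initial attempt plus up to `retries` further attempts on RuntimeError."""
    for attempt in range(retries + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == retries:
                raise  # retries exhausted: surface the failure
```

With a policy like this on the Dataflow task, a sporadic system error triggers another attempt automatically instead of requiring someone to rerun the pipeline by hand.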

120

MCQmedium

A retail company wants to build a customer churn prediction model using AutoML Tables. They have a dataset with 5000 rows and 50 features, including customer ID, transaction history, and support tickets. The target is a binary column 'churned'. After training, the model shows high accuracy but low recall for the churned class. What is the most likely cause?

A.The dataset is too small for AutoML to train effectively.

B.The features are not normalized, leading to biased predictions.

C.The churned class is underrepresented, causing the model to favor the majority class.

D.The dataset includes a unique customer ID feature, causing overfitting.

AnswerC

Class imbalance leads to high accuracy but low recall for minority class.

Why this answer

Option C is correct because in imbalanced datasets, AutoML Tables optimizes for overall accuracy, which can be high if the majority class dominates. Low recall for the churned class indicates the model predicts most instances as non-churned, a classic symptom of class imbalance. AutoML Tables provides class weighting and sampling options to mitigate this, but without them, the model favors the majority class.

Exam trap

Google Cloud often tests the misconception that high accuracy always means a good model, trapping candidates who overlook class imbalance as the root cause of poor recall for the minority class.

How to eliminate wrong answers

Option A is wrong because 5000 rows is generally sufficient for AutoML Tables to train effectively, especially with 50 features; the issue is class imbalance, not dataset size. Option B is wrong because AutoML Tables automatically handles feature normalization internally, so unnormalized features do not cause biased predictions in this context. Option D is wrong because including a unique customer ID feature can cause overfitting, but the symptom described (high accuracy, low recall) is characteristic of class imbalance, not overfitting; overfitting would typically show high training accuracy but poor generalization, not specifically low recall for a minority class.
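
The accuracy versus recall gap is easy to reproduce with made-up numbers. The sketch below assumes a hypothetical 5% churn rate in the 5000 rows and a model that catches only 25 of the 250 churners; neither figure comes from the question.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall for the positive class) for binary labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(y_true)
    return correct / len(y_true), true_pos / actual_pos


# Hypothetical split: 250 churned customers, 4750 retained.
y_true = [1] * 250 + [0] * 4750
# The model flags only the first 25 churners and predicts "retained" otherwise.
y_pred = [1] * 25 + [0] * 4975

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.3f} recall={rec:.3f}")  # accuracy=0.955 recall=0.100
```

The model is right on 95.5% of rows while missing 90% of the churners, which is the pattern described in the question.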

121

Multi-Selecteasy

An ML team is converting a prototype model to a production pipeline using Vertex AI. They want to ensure model versioning and lineage. Which two practices should they adopt? (Select TWO)

Select 2 answers

A.Use Vertex AI Model Registry to manage model versions.

B.Only keep the latest model version to save storage.

C.Store model artifacts in Cloud Storage with unique versioned directories.

D.Train models directly in production without tracking.

E.Use a separate GCP project for each model version.

AnswersA, C

Integrates with other Vertex AI services for lineage.

Why this answer

Options A and C are correct. Using Vertex AI Model Registry and storing model artifacts in Cloud Storage with versioned directories provide organized versioning and lineage tracking. Option B is wrong because keeping only the latest version loses history.

Option D is wrong because not tracking versions is poor practice. Option E is wrong because using a separate GCP project per version is unnecessary and complex.

122

MCQeasy

A team prototypes a recommendation model using a Jupyter notebook on Vertex AI Workbench. They want to productionize the model with CI/CD. Which approach should they use to package the model for deployment?

A.Use Cloud Build to deploy the notebook directly as a prediction endpoint

B.Store the model in Cloud Source Repositories and deploy from there

C.Containerize the model and push to Artifact Registry, then deploy via Cloud Run

D.Upload the model to Vertex AI Model Registry and use it for deployment

AnswerD

Model Registry manages versions and deployment targets.

Why this answer

Vertex AI Model Registry is the central repository for managing ML models, enabling versioning, evaluation, and deployment to endpoints. This approach integrates with CI/CD pipelines via the Vertex AI SDK or Cloud Build, allowing automated model promotion and deployment without manual packaging. Option D directly leverages Vertex AI's native deployment workflow, which is the recommended path for productionizing models from Workbench.

Exam trap

Google Cloud often tests the misconception that any storage or code repository (like Cloud Source Repositories or Artifact Registry) can directly serve as a deployment mechanism, when in fact Vertex AI Model Registry is the required service for managing and deploying models within Vertex AI's ecosystem.

How to eliminate wrong answers

Option A is wrong because Cloud Build cannot deploy a Jupyter notebook directly as a prediction endpoint; notebooks contain code and dependencies that must be containerized or exported as a model artifact first. Option B is wrong because Cloud Source Repositories is a code hosting service, not a model deployment mechanism; storing code there does not create a deployable endpoint. Option C is wrong because while containerization and Artifact Registry are valid for custom serving, Vertex AI Model Registry provides built-in model versioning, evaluation, and endpoint management that aligns with Vertex AI's native CI/CD capabilities, making it the more direct and recommended approach for this scenario.

123

MCQmedium

A team of data scientists and ML engineers is collaborating on a project using Vertex AI Workbench. They need to share notebooks and code, but want to avoid conflicts and maintain a history of changes. Which approach should they use?

A.Email notebook files to each other and manually merge changes.

B.Store notebooks in a shared Cloud Storage bucket and access them simultaneously.

C.Use Vertex AI Experiments to share notebook outputs.

D.Use a git repository (e.g., Cloud Source Repositories) to manage code and notebooks.

AnswerD

Git provides branching, merging, and history.

Why this answer

Option D is correct because using a git repository (e.g., Cloud Source Repositories) provides version control, branching, and a full history of changes, which is essential for collaborative development. This approach avoids conflicts by allowing team members to work on separate branches and merge changes systematically, unlike shared storage or manual methods that lack conflict resolution and audit trails.

Exam trap

The trap here is that candidates confuse collaboration tools (like shared storage or experiment tracking) with version control, assuming that any shared access or logging mechanism can replace the structured history and conflict resolution of a git-based workflow.

How to eliminate wrong answers

Option A is wrong because emailing notebook files and manually merging changes is error-prone, lacks any version history or conflict detection, and does not scale for team collaboration. Option B is wrong because storing notebooks in a shared Cloud Storage bucket and accessing them simultaneously can lead to write conflicts, data corruption, and no built-in version history or merge capabilities. Option C is wrong because Vertex AI Experiments is designed for tracking and comparing model training runs and their metrics, not for managing source code or notebook version control.

124

MCQeasy

The exhibit shows a Vertex AI PipelineJob submission command. The pipeline fails because the component cannot find the input data. What is the most likely cause?

A.The pipeline root path is incorrect

B.The pipeline name is misspelled

C.The input data path is not accessible by the Vertex AI Pipelines service account

D.The region does not support the component

AnswerC

The component likely expects a Cloud Storage path for data, and the service account lacks read permissions.

Why this answer

Option C is correct because the most likely cause of the pipeline failing to find input data is that the Vertex AI Pipelines service account lacks the necessary permissions to access the specified input data path. Vertex AI Pipelines uses the Compute Engine default service account (or a custom service account) to read data from Cloud Storage or other sources; if this account does not have the `storage.objectViewer` role (or equivalent) on the bucket or object, the component will fail with a permission-denied error, even if the path is syntactically correct.

Exam trap

Google Cloud often tests the misconception that a misspelled pipeline name or incorrect pipeline root path is the cause of runtime data access failures, when in fact the service account's IAM permissions on the data source are the critical factor.

How to eliminate wrong answers

Option A is wrong because an incorrect pipeline root path would cause a failure to store pipeline artifacts or metadata, not a failure to find input data; the input data path is specified separately in the component's parameters. Option B is wrong because a misspelled pipeline name would cause the pipeline submission to fail at the API validation stage (e.g., an invalid name error), not during runtime when the component tries to access input data. Option D is wrong because the region not supporting the component would result in a resource or API availability error at submission time, not a runtime data access failure.

125

MCQmedium

Refer to the exhibit. A team configured Vertex AI Model Monitoring with skew detection for feature "income" with a threshold of 0.2. However, they have not received any alerts even though they suspect data drift. What is the most likely reason?

A.The monitoring is not enabled for the endpoint

B.The 'income' feature is not present in the serving data

C.The actual skew is below the threshold

D.The drift detection threshold is set higher

AnswerB

If the feature is missing from serving data, skew detection cannot perform comparison and will not generate alerts.

Why this answer

If the 'income' feature is not present in the serving data, skew detection cannot compute a comparison, so no alert is generated even if the data has drifted. A low threshold would produce more alerts, not fewer. Monitoring is most likely enabled, since the configuration is present.

The threshold for drift detection is configured separately from the skew threshold.
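
The "no comparison, no alert" behavior can be mimicked with a toy check. This is a sketch under stated assumptions, not the monitoring service's implementation: the function names are hypothetical, 'income' is represented as a bucketed distribution, and L-infinity distance stands in for whatever score the service computes.

```python
def linf_distance(p, q):
    """Largest absolute difference between two bucketed distributions (dicts)."""
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def skew_alerts(training_stats, serving_stats, thresholds):
    """Return {feature: score} for features whose skew score exceeds its threshold."""
    alerts = {}
    for feature, threshold in thresholds.items():
        if feature not in serving_stats:
            continue  # nothing to compare against, so no score and no alert
        score = linf_distance(training_stats[feature], serving_stats[feature])
        if score > threshold:
            alerts[feature] = score
    return alerts


training = {"income": {"low": 0.5, "high": 0.5}}
thresholds = {"income": 0.2}

silent = skew_alerts(training, {}, thresholds)  # feature missing from serving data
fired = skew_alerts(training, {"income": {"low": 0.9, "high": 0.1}}, thresholds)
```

In the first call the feature is absent from the serving side and the result is empty, even though the second call shows the same feature would exceed the threshold once it is actually logged.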

126

MCQmedium

A data science team deploys a custom container on Vertex AI Prediction for a PyTorch model. After deployment, the model returns predictions that are consistently off by a constant factor. The model performed correctly during local testing. What is the most likely cause?

A.The model is loaded in evaluation mode, but the training mode was used in testing.

B.The serving input function in the container is not applying the same normalization as during training.

C.The container is using a different PyTorch version than the training environment.

D.There is a bug in the custom container's prediction route.

AnswerB

Preprocessing mismatch, such as scaling by different factors, leads to constant offset in predictions.

Why this answer

Option B is correct because a constant factor error typically indicates a preprocessing mismatch, such as different normalization. Option A is wrong because training versus evaluation mode affects dropout and batch normalization, not constant scaling. Option C is wrong because different PyTorch versions may cause other inconsistencies but not a constant factor.

Option D is possible but less specific than a preprocessing mismatch.
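
A toy example shows how a skipped normalization step turns into a constant factor. The weights, the scale of 255, and the input values are all hypothetical, and the exact proportionality holds here only because the stand-in model is linear with no bias term.

```python
WEIGHTS = [0.5, 0.25]  # hypothetical bias-free linear model
SCALE = 255.0          # assumed normalization applied during training


def normalize(raw):
    """Scale raw inputs the same way the training pipeline did."""
    return [x / SCALE for x in raw]


def predict(features):
    """Weighted sum standing in for the real model."""
    return sum(w * x for w, x in zip(WEIGHTS, features))


raw = [51.0, 102.0]
expected = predict(normalize(raw))  # serving path that matches training
observed = predict(raw)             # serving path that skips normalization
ratio = observed / expected
```

Here `ratio` comes out at the normalization scale, so every prediction is inflated by the same factor. A real network is not perfectly linear, but a consistent multiplicative error is still a strong hint to compare the serving preprocessing with the training preprocessing first.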

127

Multi-Selectmedium

A company wants to set up end-to-end monitoring for a Vertex AI model. Which three components should they include?

Select 3 answers

A.Feature store backup status

B.Model performance metrics

C.Data drift and concept drift detection

D.Prediction latency

E.Model training cost

AnswersB, C, D

Performance metrics like AUC or RMSE are essential for model health.

Why this answer

Model performance metrics (Option B) are essential for end-to-end monitoring because they track how well the Vertex AI model is performing over time using key indicators like accuracy, precision, recall, or AUC-ROC. This allows the team to detect degradation in prediction quality, which is a core requirement for maintaining model reliability in production.

Exam trap

The trap here is that candidates often confuse operational or cost-related metrics (like backup status or training cost) with the three core pillars of model monitoring: performance metrics, drift detection, and latency tracking.

128

MCQhard

An organization runs a batch prediction job on Vertex AI for a large dataset (10 TB). The job is configured to use a cluster of 100 n1-standard-16 machines. Midway through, the job fails with 'Out of memory' errors. What is the most effective mitigation strategy?

A.Split the input data into smaller chunks and run multiple jobs.

B.Enable model parallelism within the prediction script.

C.Increase the number of machines to distribute data more.

D.Use a machine type with more memory per instance.

AnswerD

Directly addresses the OOM by providing more memory for each worker.

Why this answer

The 'Out of memory' error indicates that individual worker nodes are running out of RAM when processing their assigned data shards. Using a machine type with more memory per instance (e.g., n1-highmem-16) directly addresses the root cause by providing each node with sufficient memory to hold the model and its intermediate computations, without changing the data distribution or parallelism strategy.

Exam trap

The trap here is that candidates confuse scaling horizontally (adding more machines) with scaling vertically (increasing per-machine resources), assuming that distributing data further will fix memory exhaustion when the bottleneck is per-node RAM capacity, not data volume per node.

How to eliminate wrong answers

Option A is wrong because splitting the input data into smaller chunks and running multiple jobs does not increase the memory available per machine; it only reduces the data per job, but the same memory constraint per node will still cause OOM errors if the model or batch size per node remains unchanged. Option B is wrong because model parallelism splits the model across devices, which is typically used for very large models that cannot fit on a single GPU/TPU, not for batch prediction jobs where the model is already loaded and the issue is data processing memory. Option C is wrong because increasing the number of machines distributes the data across more nodes, but each node still has the same 60 GB of RAM (n1-standard-16), so the per-node memory pressure remains identical and OOM errors will persist.

129

Multi-Selecteasy

Which THREE of the following are supported output types for BigQuery ML?

Select 3 answers

A.Classification

B.Object detection

C.Anomaly detection

D.Time-series forecasting

E.Regression

AnswersA, D, E

e.g., logistic regression model.

Why this answer

BigQuery ML supports supervised learning tasks like classification and regression, as well as time-series forecasting, through its model types such as `LOGISTIC_REG`, `LINEAR_REG`, and `ARIMA_PLUS`. Classification (option A) is correct because BigQuery ML provides `LOGISTIC_REG` for binary and multi-class classification problems, outputting predicted labels or probabilities.

Exam trap

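Google Cloud often tests the distinction between supported BigQuery ML output types and broader ML capabilities, leading candidates to mistakenly include object detection or anomaly detection. BigQuery ML does offer anomaly detection through the `ML.DETECT_ANOMALIES` function, but that function runs on top of other model types (for example `ARIMA_PLUS` or k-means) rather than being a model output type of its own, which is why it is not counted here.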

130

MCQeasy

You are a Machine Learning Engineer at a financial services company. You have trained a large language model (LLM) using a custom container on Vertex AI Training. The model is used for sentiment analysis on financial news articles. You have deployed the model to a Vertex AI Endpoint for online prediction. However, during peak trading hours, users report high latency ( > 5 seconds) and occasional timeout errors. The model is deployed on n1-highmem-8 machines with 1 replica. You monitor the endpoint and see that CPU utilization is high ( > 90%) and memory is near capacity. The queries are relatively small text inputs. Which course of action should you take to reduce latency?

A.Deploy the model to multiple endpoints and use round-robin load balancing.

B.Use Vertex AI Prediction with GPU accelerators like NVIDIA Tesla T4.

C.Increase the machine type to n1-highmem-16 and keep 1 replica.

D.Reduce the batch size for predictions to lower memory usage.

AnswerB

GPUs excel at matrix operations common in LLMs, dramatically reducing inference latency per request.

Why this answer

Option B is correct because the high CPU utilization and memory pressure indicate that the CPU is the bottleneck for inference, not the model size or input volume. Switching to GPU accelerators like NVIDIA Tesla T4 offloads the computationally intensive matrix operations of the LLM to the GPU, drastically reducing per-query latency and freeing CPU resources for preprocessing and I/O. This directly addresses the root cause of >5-second latency during peak hours.

Exam trap

Google Cloud often tests the misconception that scaling up CPU resources (vertical scaling) is the solution for high-latency inference, when in fact the correct approach for deep learning models is to offload computation to specialized hardware like GPUs or TPUs.

How to eliminate wrong answers

Option A is wrong because deploying to multiple endpoints with round-robin load balancing does not reduce per-query latency; it only distributes the load across replicas, but each replica still suffers from the same CPU bottleneck and would likely still time out. Option C is wrong because increasing the machine type to n1-highmem-16 adds more CPU cores and memory, but the inference bottleneck is the CPU's inability to parallelize the LLM's matrix operations efficiently; a larger CPU instance still cannot match GPU throughput for deep learning inference. Option D is wrong because reducing batch size for predictions would actually increase the number of inference calls and overhead, potentially worsening latency; the model already receives small text inputs, so batching is not the issue.

131

MCQhard

An e-commerce company deployed a Vertex AI AutoML Tables model to predict customer churn. The model is served via a private endpoint with a dedicated machine type n1-standard-4. After a week, they observe that 5% of predictions fail with 'Request timed out' error. The average prediction time is 1.2 seconds but spikes to 4 seconds during peak hours. The input data is 50 features. They have enabled autoscaling with a min node count of 1 and max of 5. Which action is most likely to resolve the timeout issue without increasing complexity?

A.Reduce the number of features to 30.

B.Increase the max node count to 10.

C.Enable model monitoring to detect data drift.

D.Change the machine type to n1-highmem-4 to increase memory.

AnswerB

More nodes can absorb traffic spikes and reduce timeout errors.

Why this answer

Option B is correct because increasing the max node count lets the endpoint scale out further and absorb peak traffic. Option A (reducing features) changes the model and may degrade performance. Option C (model monitoring) detects data drift but does not fix the timeout.

Option D (increasing memory) does not address compute demand.
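
As a rough illustration of why a higher node ceiling helps during spikes, the sketch below estimates how many replicas are needed from request rate and latency using Little's law. The traffic figures and the `concurrency_per_replica` parameter are hypothetical; the question does not state the request rate.

```python
import math


def replicas_needed(qps: float, latency_s: float, concurrency_per_replica: int = 1) -> int:
    """Estimate replicas needed so in-flight requests do not queue.

    Little's law: average in-flight requests = arrival rate * time in system.
    """
    if qps < 0 or latency_s < 0 or concurrency_per_replica < 1:
        raise ValueError("qps and latency_s must be >= 0, concurrency >= 1")
    in_flight = qps * latency_s
    # Always keep at least one replica, matching a min node count of 1.
    return max(1, math.ceil(in_flight / concurrency_per_replica))


# Hypothetical load: the same request rate needs more replicas when latency rises.
normal = replicas_needed(qps=8, latency_s=0.5)
peak = replicas_needed(qps=8, latency_s=2.0)
```

Under these assumed numbers the peak estimate exceeds a max node count of 5, which is the situation in which raising the ceiling removes the queueing that causes timeouts.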

132

MCQmedium

Refer to the exhibit. A machine learning engineer deployed a model on Vertex AI using this configuration. When testing the endpoint, the engineer receives a 400 error with the message: 'Invalid argument: Explanation metadata missing required field: `outputs`.' What is the most likely cause?

A.The explanation metadata outputs field is missing the required 'displayName' attribute.

B.The explanation metadata needs a 'baseline' configuration for the input.

C.The explanation metadata inputs field should be wrapped inside a 'visualization' block.

D.The explainability method chosen is not supported for the model type.

AnswerA

Vertex AI requires each output in explanation metadata to have a 'displayName' field.

Why this answer

The error message indicates that the explanation metadata provided in the Vertex AI endpoint configuration is missing the required `outputs` field. In Vertex AI's Explainable AI, the `outputs` field must contain at least one entry with a `displayName` attribute to define which output tensor to explain. Without this, the API rejects the request with a 400 error.

Exam trap

Google Cloud often tests the distinction between required fields in the explanation metadata (inputs vs. outputs) and their sub-attributes (like displayName), leading candidates to confuse a missing baseline or unsupported method with the actual missing outputs field.

How to eliminate wrong answers

Option B is wrong because a `baseline` configuration is required for the input, not the output; the error specifically points to the missing `outputs` field, not the input baseline. Option C is wrong because the `visualization` block is used for image-specific explanations (e.g., integrated gradients with visualization), not for wrapping the inputs field; the error is about the `outputs` field, not the inputs. Option D is wrong because the error message does not mention an unsupported explainability method; it explicitly states that the `outputs` field is missing, which is a metadata configuration issue, not a method compatibility problem.

133

MCQeasy

A data analyst wants to create a classification model directly in BigQuery using SQL. Which feature should they use?

A.BigQuery ML

B.Vertex AI

C.Dataflow

D.Cloud ML Engine

AnswerA

BigQuery ML allows creating models using SQL.

Why this answer

BigQuery ML (BQML) enables users to create and execute machine learning models directly in BigQuery using standard SQL syntax, without needing to export data or manage separate ML infrastructure. For a data analyst who wants to build a classification model entirely within BigQuery, BQML provides the CREATE MODEL statement with classification algorithms like logistic regression or XGBoost, making it the correct and most direct feature.

Exam trap

Google Cloud often tests the distinction between services that run inside BigQuery (BQML) versus external ML platforms (Vertex AI), trapping candidates who think any ML service qualifies without checking if it operates directly via SQL in BigQuery.

How to eliminate wrong answers

Option B is wrong because Vertex AI is a full MLOps platform for training, deploying, and managing models; it can read data from BigQuery, but models are built through its own training services rather than with SQL statements inside BigQuery. Option C is wrong because Dataflow is a stream and batch data processing service (based on Apache Beam) used for ETL and data pipelines, not for creating classification models. Option D is wrong because Cloud ML Engine (now part of Vertex AI) is a managed service for training and serving custom ML models, but it does not support SQL-based model creation inside BigQuery.
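
To make the SQL-only workflow concrete, here is a minimal sketch that assembles a BigQuery ML `CREATE MODEL` statement for logistic regression. The dataset, table, and column names are hypothetical, and the helper assumes trusted identifiers (it does no escaping).

```python
def build_create_model_sql(model_name: str, source_table: str, label_column: str) -> str:
    """Return a BigQuery ML statement that trains a logistic regression classifier.

    All identifiers are placeholders supplied by the caller and are assumed trusted.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names for illustration only.
sql = build_create_model_sql("my_dataset.churn_model", "my_dataset.customers", "churned")
print(sql)
```

The resulting statement can be run like any other query in BigQuery, which is why no data export or separate training infrastructure is needed.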

134

MCQhard

A data science team deploys a large language model (LLM) on Vertex AI Prediction using an NVIDIA A100 GPU. The end-to-end latency is acceptable, but the cost is high due to low GPU utilization. The model is stateless and requests are independent. Which strategy would most effectively reduce cost per prediction?

A.Migrate the model to Cloud TPU using TensorFlow to benefit from higher throughput.

B.Use a smaller GPU, such as NVIDIA T4, and increase the number of replicas to maintain throughput.

C.Reduce the number of min replicas to 0 and scale from 0 on each request.

D.Implement dynamic batching in the serving container to aggregate multiple requests into a single inference call.

AnswerD

Batching improves GPU utilization by processing multiple requests in parallel, lowering cost per inference.

Why this answer

Option D is correct because dynamic request batching increases throughput per GPU, reducing cost per prediction. Option A is wrong because Cloud TPUs are not designed for this model and may increase cost. Option B is wrong because a smaller GPU may not meet latency requirements.

Option C is wrong because scaling down replicas reduces capacity and may cause latency spikes.
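
The idea behind batching can be sketched in a few lines: queued requests are grouped so that the model is invoked once per batch instead of once per request. This is a simplified illustration with hypothetical names; a production dynamic batcher would also flush on a timeout, which is omitted here.

```python
from typing import Callable, List, Sequence, Tuple


def make_batches(pending: Sequence[str], max_batch_size: int) -> List[List[str]]:
    """Group queued requests into batches of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    return [list(pending[i:i + max_batch_size])
            for i in range(0, len(pending), max_batch_size)]


def serve_batched(pending: Sequence[str],
                  model_fn: Callable[[List[str]], List[int]],
                  max_batch_size: int) -> Tuple[List[int], int]:
    """Run one model call per batch; return outputs in request order and the call count."""
    outputs: List[int] = []
    calls = 0
    for batch in make_batches(pending, max_batch_size):
        outputs.extend(model_fn(batch))
        calls += 1
    return outputs, calls


# Stand-in "model" that returns the length of each input text.
outputs, calls = serve_batched(["a", "bb", "ccc", "dddd", "e"],
                               lambda batch: [len(text) for text in batch],
                               max_batch_size=2)
```

Five requests are served with three model calls here; on a GPU, each call processes its batch in parallel, which is where the utilization gain comes from.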

135

MCQeasy

A machine learning engineer wants to monitor model performance on Vertex AI for a regression model. Which metric is most appropriate to track the average prediction error?

A.F1 score

B.Precision

C.Accuracy

D.RMSE

AnswerD

RMSE measures average prediction error in regression.

Why this answer

RMSE (Root Mean Squared Error) is the most appropriate metric for tracking average prediction error in a regression model because it measures the standard deviation of residuals (prediction errors) in the same units as the target variable. On Vertex AI, RMSE is a built-in evaluation metric for regression models, directly quantifying how far predictions deviate from actual values on average.

Exam trap

Google Cloud often tests the distinction between classification and regression metrics, and the trap here is that candidates mistakenly apply classification metrics like F1, precision, or accuracy to a regression problem, not recognizing that RMSE is the standard for continuous prediction error.

How to eliminate wrong answers

Option A is wrong because F1 score is a classification metric that combines precision and recall, not applicable to regression tasks. Option B is wrong because precision measures the proportion of true positive predictions among all positive predictions, used only in classification contexts. Option C is wrong because accuracy is the ratio of correct predictions to total predictions, suitable for classification but meaningless for continuous-valued regression outputs.
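
For reference, RMSE can be computed directly from its definition. The following is a minimal sketch in plain Python with made-up values; it is not tied to any Vertex AI API.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt of the mean of squared residuals."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Every prediction is off by exactly 2, so the RMSE is 2 (same units as the target).
error = rmse([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0])
```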

136

Multi-Selectmedium

A data science team has trained a large deep learning model using Vertex AI Workbench. They want to deploy it to Vertex AI Prediction for online serving. The model is stored in a custom container with a Python-based web server. Which TWO actions should the team take to ensure optimal performance and cost?

Select 2 answers

A.Configure the model to use a larger batch size for inference.

B.Request GPU machine types for the prediction nodes.

C.Set the container's health check path to '/predict'.

D.Use a global load balancer to distribute traffic across regions.

E.Enable autoscaling with a minimum number of replicas.

AnswersB, E

Deep learning models typically require GPUs for low-latency inference.

Why this answer

B is correct because deep learning models, especially large ones, benefit significantly from GPU acceleration for online inference due to their parallel processing capabilities. Vertex AI Prediction supports GPU machine types, and using them reduces latency and improves throughput for compute-intensive model serving, which is critical for optimal performance.

Exam trap

Google Cloud often tests the misconception that health check endpoints should be the same as the prediction endpoint, but in practice, health checks must be lightweight and separate to avoid false positives and resource exhaustion.

137

MCQmedium

A company deploys a custom TensorFlow model to Vertex AI Endpoint for online predictions. After deployment, prediction latency is consistently high (over 500ms) even under low traffic. The model is CPU-only and the default machine type (n1-standard-2) is used. Which action will most likely reduce prediction latency?

A.Increase the max_replica_count to 10 to allow more parallel requests.

B.Change the machine type to n1-highcpu-16 with a GPU accelerator.

C.Set min_replica_count to 3 to ensure always-on capacity.

D.Increase the batch size in the prediction request.

AnswerB

More CPU cores and GPU can reduce inference latency.

Why this answer

Option B is correct because using a machine type with more CPUs or adding a GPU accelerator can reduce inference time for compute-intensive models. Option A is wrong because increasing max replicas does not improve single-request latency. Option C is wrong because increasing min replicas reduces cold start but not steady-state latency.

Option D is wrong because batch size affects throughput, not latency per request.

138

Multi-Selecthard

Which TWO options can help detect model performance degradation in production? (Choose two.)

Select 2 answers

A.Vertex AI Experiments on historical data

B.Cloud Logging for prediction errors

C.Cloud Monitoring custom metrics from serving logs

D.Vertex AI Model Monitoring (drift detection)

E.Using BigQuery to store predictions and compare with ground truth

AnswersD, E

Detects shifts in input distribution that often lead to performance degradation.

Why this answer

Options D and E are correct. Vertex AI Model Monitoring detects drift in input features, which can indicate performance degradation. Storing predictions in BigQuery and comparing with ground truth labels directly measures performance.

Option A (Vertex AI Experiments) is training-time. Option B (Cloud Logging) logs errors but not degradation. Option C (Cloud Monitoring custom metrics) monitors infrastructure, not model performance.
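
Comparing stored predictions with ground truth amounts to a join on a request identifier followed by an accuracy calculation per time window. The sketch below does this in plain Python over in-memory records; the record layout and field names are hypothetical, and in practice the join would typically be a query over the prediction table.

```python
from typing import Dict, Iterable, Tuple


def accuracy_by_day(predictions: Iterable[Tuple[str, str, int]],
                    ground_truth: Dict[str, int]) -> Dict[str, float]:
    """Join predictions (request_id, day, predicted_label) with labels and score each day."""
    totals: Dict[str, Tuple[int, int]] = {}
    for request_id, day, predicted in predictions:
        if request_id not in ground_truth:
            continue  # the true label has not arrived yet, so skip this row
        correct, seen = totals.get(day, (0, 0))
        hit = 1 if predicted == ground_truth[request_id] else 0
        totals[day] = (correct + hit, seen + 1)
    return {day: correct / seen for day, (correct, seen) in totals.items()}


logged = [("r1", "day1", 1), ("r2", "day1", 0),
          ("r3", "day2", 1), ("r4", "day2", 1), ("r5", "day2", 0)]
labels = {"r1": 1, "r2": 0, "r3": 0, "r4": 1}  # r5 has no label yet
daily = accuracy_by_day(logged, labels)
```

A drop in the per-day value is a direct signal of degradation, whereas drift detection on inputs is an earlier but indirect one.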

139

MCQmedium

A data science team is using Vertex AI Feature Store for online serving. They notice that the online serving latency is high. What is the most likely cause?

A.The features are being computed on the fly instead of being precomputed.

B.The feature table has too many rows.

C.The feature values are stored in Cloud Storage.

D.The online store is not configured for high throughput.

E.The serving endpoint is in a different region than the client.

AnswerC

Cloud Storage has high latency for per-request access; online store should use Bigtable or Memorystore.

Why this answer

Option C is correct because Vertex AI Feature Store requires feature values to be stored in a low-latency online store (such as a Bigtable or Redis cluster) for serving. When features are stored in Cloud Storage, each online serving request must read from object storage, which introduces significant latency due to network overhead and lack of indexing. This design violates the fundamental architecture of Feature Store, which expects precomputed features in a key-value store optimized for sub-millisecond lookups.

Exam trap

The trap here is that candidates may assume any cloud storage is acceptable for online serving, but the PMLE exam tests the specific architectural requirement that Vertex AI Feature Store must use a low-latency online store (like Bigtable or Redis) for serving, not Cloud Storage.

How to eliminate wrong answers

Option A is wrong because computing features on the fly would increase latency, but the question states the team is using Vertex AI Feature Store for online serving, which implies features are precomputed; the high latency is not due to on-the-fly computation but rather the storage backend. Option B is wrong because the number of rows in a feature table does not directly cause high online serving latency; Vertex AI Feature Store uses indexing and partitioning to handle large tables efficiently. Option D is wrong because the online store's throughput configuration affects capacity under load, not baseline latency; high latency is more likely a storage or network issue.

Option E is wrong because while cross-region latency can add delay, Vertex AI Feature Store endpoints are regional by default, and the question does not indicate a region mismatch; the more direct cause is the storage layer.

140

Multi-Selecthard

A company trains a model using Vertex AI Training and then deploys it to Vertex AI Prediction. They notice that prediction requests fail with 'InvalidArgument: input tensor shape mismatch'. Which THREE are possible causes?

Select 3 answers

A.The model was exported in a different format than supported

B.The batch size in the request is too large

C.The input data types do not match the expected types (e.g., float vs int)

D.The input data has a different number of features than the model expects

E.The serving function does not include the same preprocessing as training

AnswersC, D, E

Data type mismatch causes shape or value errors.

Why this answer

Option C is correct because Vertex AI Prediction expects the input tensor data types to exactly match those used during model training. If the model was trained with float32 inputs but the prediction request sends int32 values, the serving infrastructure detects the mismatch and returns an 'InvalidArgument: input tensor shape mismatch' error, as TensorFlow Serving (which underlies Vertex AI Prediction) validates dtype consistency at the graph level.

Exam trap

Google Cloud often tests the misconception that 'shape mismatch' only refers to the number of features or dimensions, when in fact it also encompasses data type mismatches and preprocessing inconsistencies that alter the tensor structure before it reaches the model.
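
One way to catch these mismatches early is to validate each instance on the client before sending it. The sketch below checks feature count and element type in plain Python; the expected length and type are hypothetical and would come from the model's actual input signature.

```python
from typing import List, Sequence


def validate_instance(instance: Sequence[object],
                      expected_len: int,
                      expected_type: type = float) -> List[str]:
    """Return a list of problems found; an empty list means the instance looks valid."""
    problems: List[str] = []
    if len(instance) != expected_len:
        problems.append(f"expected {expected_len} features, got {len(instance)}")
    # Exact type check so that an int is not silently accepted where a float is expected.
    bad_positions = [i for i, value in enumerate(instance) if type(value) is not expected_type]
    if bad_positions:
        problems.append(f"non-{expected_type.__name__} values at positions {bad_positions}")
    return problems


ok = validate_instance([1.0, 2.0, 3.0], expected_len=3)
bad = validate_instance([1.0, 2], expected_len=3)
```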

141

MCQmedium

Refer to the exhibit. A data engineer is defining a Vertex AI Pipeline step to train a model. The pipeline fails with an error: "Failed to create vertex ai custom job: Invalid resource name." What is the most likely cause of the error?

A.The container image URI is incorrect; it should be from gcr.io/vertex-ai/training.

B.The output artifact schema is missing the 'type' property.

C.The training_data input should be a Vertex AI Dataset resource, not a simple string.

D.The machine type n1-standard-4 is not supported for Vertex AI training.

AnswerC

The input expects a dataset resource name, not a raw string.

Why this answer

Option C is correct because Vertex AI Pipeline steps that use a CustomJob to train a model require the training data input to be a Vertex AI Dataset resource (a Dataset object), not a plain string. When a string is passed instead of a Dataset resource, the pipeline attempts to create a custom job with an invalid resource name, as the backend expects a properly formatted Dataset resource name (e.g., projects/{project}/locations/{location}/datasets/{dataset_id}). This mismatch triggers the 'Invalid resource name' error.

Exam trap

Google Cloud often tests the distinction between raw data inputs (like strings or URIs) and managed Vertex AI resources (like Datasets), leading candidates to overlook that the pipeline component expects a resource object, not a simple string.

How to eliminate wrong answers

Option A is wrong because the container image URI does not need to be from gcr.io/vertex-ai/training; any valid container image URI (e.g., from Artifact Registry or a custom registry) is acceptable as long as it is accessible and correctly formatted. Option B is wrong because the output artifact schema's 'type' property is optional in Vertex AI Pipelines; missing it does not cause an 'Invalid resource name' error, which is specific to resource naming issues. Option D is wrong because n1-standard-4 is a supported machine type for Vertex AI training; the error is about resource naming, not machine type availability.
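
A quick pre-flight check can distinguish a Dataset resource name from an arbitrary string. The sketch below uses the format quoted in the explanation above; the permissive segment pattern is an assumption rather than the service's exact validation rule.

```python
import re

# Format from the explanation: projects/{project}/locations/{location}/datasets/{dataset_id}
# Each segment is matched loosely as "anything without a slash" (an assumption).
DATASET_NAME = re.compile(r"projects/[^/]+/locations/[^/]+/datasets/[^/]+")


def is_dataset_resource_name(value: str) -> bool:
    """Return True if value has the shape of a Vertex AI Dataset resource name."""
    return DATASET_NAME.fullmatch(value) is not None


# Hypothetical values for illustration.
valid = is_dataset_resource_name("projects/my-project/locations/us-central1/datasets/1234567890")
invalid = is_dataset_resource_name("gs://my-bucket/train.csv")
```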

142

MCQeasy

Your team manages multiple ML models in Vertex AI Model Registry. Each model has several versions deployed to different endpoints for testing and production. You need to implement a process where a model version can be promoted from a staging environment to production only after it has passed automated validation tests and been approved by a designated reviewer. The team uses CI/CD pipelines (Cloud Build) for training and deployment. Currently, model versions are deployed to endpoints using Vertex AI Endpoints with a single traffic split configuration. You want to track promotion requests and enforce approval gates. What should you do?

A.Deploy each model version to a separate endpoint, and use a custom database to track which endpoint is 'production'. Then use migration scripts to switch traffic.

B.Store the model version metadata in a BigQuery table and use a scheduled query to automatically update the endpoint deployment based on validation results.

C.Use Vertex AI Model Registry labels to mark versions as 'staging' or 'production', and create a Cloud Function that checks the label before deploying to the endpoint.

D.Use Vertex AI Model Registry version aliases ('staging', 'production') and configure Cloud Build to trigger a Cloud Run service that handles approval logic, then update the alias upon approval.

AnswerD

Version aliases provide a built-in way to denote environment stages and can be updated programmatically after validation and approval.

Why this answer

Option D is correct because Vertex AI Model Registry version aliases (e.g., 'staging', 'production') are designed to track model version lifecycle stages. By integrating Cloud Build to trigger a Cloud Run service that enforces approval logic before updating the alias, you create a clear promotion gate. This approach natively supports tracking promotion requests and enforcing approval without custom databases or manual scripts, aligning with CI/CD best practices.

Exam trap

Google Cloud often tests the distinction between labels (key-value metadata) and aliases (semantic lifecycle tags) in Vertex AI Model Registry, leading candidates to choose Option C because they confuse labels with the built-in promotion mechanism that aliases provide.

How to eliminate wrong answers

Option A is wrong because deploying each model version to a separate endpoint and using a custom database to track 'production' adds unnecessary complexity and operational overhead; Vertex AI already provides version aliases and traffic splitting to manage promotions. Option B is wrong because using a BigQuery table and scheduled queries to update endpoint deployments introduces latency and lacks real-time approval enforcement; it also bypasses the native Model Registry lifecycle management. Option C is wrong because Vertex AI Model Registry labels are key-value metadata not designed for version promotion workflows; they lack the built-in semantics of aliases and would require custom logic to enforce approval gates, whereas aliases directly support staging/production promotion.

143

MCQhard

Refer to the exhibit. The team wants to automatically deploy the best-performing model version to production. They have set up a Cloud Function triggered by Model Registry events. Which alias should they use in the function to get the latest champion?

A.'champion'

B.''

C.'experiment'

D.'latest'

AnswerA

The 'champion' alias conventionally indicates the best-performing production version.

Why this answer

The 'champion' alias is a user-defined alias that, by convention in model registries such as MLflow Model Registry, denotes the best-performing model version in production. By configuring the Cloud Function to trigger on the assignment of the 'champion' alias, the team ensures that only the model version promoted as the production champion is automatically deployed, aligning with MLOps best practices for staged model promotion.

Exam trap

Google Cloud often tests the distinction between 'champion' (a production alias) and 'latest' (a version number concept), leading candidates to incorrectly choose 'latest' because they confuse chronological recency with performance-based promotion.

How to eliminate wrong answers

Option B is wrong because an empty string is not a valid alias in MLflow; aliases must be non-empty strings, and using an empty string would cause the function to fail or match no events. Option C is wrong because 'experiment' is not a predefined alias in MLflow Model Registry; it refers to an MLflow Experiment, not a model version alias, and would not trigger on model promotion events. Option D is wrong because 'latest' is not a standard alias in MLflow; while MLflow can retrieve the latest model version by version number, the 'latest' alias does not exist, and using it would not capture the champion promotion event.

144

MCQmedium

A team is using AI Platform Data Labeling Service to label data for a classification model. They want to allow a labeler from a different team to work on the same dataset. What is the correct way to grant access?

A.Add the labeler's account as a Project Editor on the project

B.Share the Cloud Storage bucket containing the data with the labeler

C.Export the dataset and have the labeler create a new dataset

D.Add the labeler as a participant in the labeling task and assign IAM roles on the dataset

AnswerD

The Data Labeling Service allows adding participants to tasks, and IAM roles control access.

Why this answer

Option D is correct because labeling tasks are shared by granting the labeler role on the dataset resource. Option A is wrong because sharing the entire project gives too much access. Option B is wrong because the Data Labeling Service does not use Cloud Storage ACLs for task access.

Option C is wrong because exporting and reimporting causes duplication.

145

Matchingmedium

Match each model evaluation metric to its use case.

Concepts

Precision, Recall, F1 score, RMSE, Log loss

Matches

Precision: Measure of false positives in classification

Recall: Measure of false negatives in classification

F1 score: Harmonic mean of precision and recall

RMSE: Root mean squared error for regression

Log loss: Cross-entropy loss for probabilistic classification

Why these pairings

Metrics are critical for model selection and tuning.
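
The classification metrics described above follow directly from confusion-matrix counts. Here is a minimal sketch with made-up counts, showing how false positives lower precision, false negatives lower recall, and F1 combines the two.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    if tp + fp == 0 or tp + fn == 0 or tp == 0:
        raise ValueError("metrics are undefined or zero-valued for these counts")
    precision = tp / (tp + fp)   # penalized by false positives
    recall = tp / (tp + fn)      # penalized by false negatives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Made-up counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```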

146

MCQeasy

You need to serve a TensorFlow model that has a cold start latency of 20 seconds. The model is used for a real-time application with unpredictable traffic, but occasional bursts require immediate responses. What is the best deployment strategy to minimize both cold start impact and cost?

A.Set min_replica_count to 1 to keep at least one instance always warm.

B.Use a larger machine type to reduce cold start time.

C.Set min_replica_count to 0 and rely on autoscaling to handle bursts.

D.Enable serving on Cloud Run for faster cold start.

AnswerA

One warm instance avoids cold start for initial traffic.

Why this answer

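Setting min_replica_count to 1 keeps one instance always warm, so the first requests in a burst avoid the 20-second cold start while the cost stays at a single always-on replica. A larger machine type (Option B) does not remove the cold start, scaling from zero (Option C) exposes the first requests to the full cold start, and moving to Cloud Run (Option D) does not by itself shorten the model's load time.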

147

Multi-Selectmedium

Which THREE considerations are important when setting up a shared feature store in Vertex AI Feature Store for multiple teams?

Select 3 answers

A.Enable feature monitoring for data quality and freshness

B.Use separate BigQuery tables for each team's features

C.Implement data governance policies for feature creation and access

D.Create a feature sharing policy to enable cross-team discovery

E.Allow each team to build independent ingestion pipelines

AnswersA, C, D

Monitoring helps maintain trust in the feature store.

Why this answer

Option A is correct because Vertex AI Feature Store provides built-in feature monitoring that tracks data quality metrics (e.g., fraction of null values, distribution drift) and freshness (e.g., staleness of feature values). Enabling this monitoring is critical when multiple teams share a feature store to ensure that features remain reliable and up-to-date for downstream models, preventing silent degradation.

Exam trap

Google Cloud often tests the misconception that a shared feature store requires separate physical storage per team (Option B) or fully independent ingestion (Option E), when in reality the value lies in centralization with controlled access and standardized pipelines.
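
The two monitoring signals named in the explanation, fraction of null values and staleness, are simple to compute. The sketch below shows both in plain Python with hypothetical thresholds and timestamps; it illustrates the checks themselves, not the Feature Store monitoring API.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def null_fraction(values: Sequence[Optional[float]]) -> float:
    """Share of feature values that are missing (None)."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(value is None for value in values) / len(values)


def is_stale(last_updated: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if the feature was last written longer ago than max_age."""
    return (now - last_updated) > max_age


# Hypothetical feature column and timestamps.
missing = null_fraction([1.0, None, 3.0, None])
stale = is_stale(datetime(2024, 1, 1), datetime(2024, 1, 3), timedelta(days=1))
```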

148

Multi-Selectmedium

A manufacturing company uses AutoML Tables to predict equipment failure. They want to improve model performance without increasing manual effort. Which three actions should they take? (Choose THREE.)

Select 3 answers

A.Perform feature engineering using Vertex AI Feature Store.

B.Use BigQuery to aggregate sensor data before training.

C.Enable early stopping to prevent overfitting.

D.Deploy the model on a larger machine type to speed up inference.

E.Increase the training budget (node hours) for AutoML.

AnswersA, C, E

Feature Store helps create and manage features with minimal code.

Why this answer

Option A is correct because Vertex AI Feature Store enables feature engineering and reuse without manual effort, allowing the team to create, store, and serve features consistently for AutoML Tables, which can improve model performance by providing more relevant input data. This aligns with the goal of reducing manual work while enhancing model accuracy through automated feature management.

Exam trap

Google Cloud often tests the distinction between actions that improve model performance (like feature engineering and training budget) versus actions that affect deployment or inference speed, leading candidates to mistakenly choose options like deploying on a larger machine type.

149

Multi-Selectmedium

Which TWO practices are important when scaling a prototype ML model to production on Google Cloud? (Choose two.)

Select 2 answers

A.Set up model monitoring for data drift and concept drift

B.Manually engineer features for each training iteration

C.Run the model on a single high-memory Compute Engine VM

D.Use proprietary libraries to maximize performance regardless of lock-in

E.Implement CI/CD pipelines for model training and deployment

AnswersA, E

Monitoring is essential for production model health.

Why this answer

Option A is correct because model monitoring for data drift and concept drift is essential in production ML on Google Cloud. Services like Vertex AI Model Monitoring automatically track feature distributions and prediction quality over time, alerting when the statistical properties of incoming data deviate from the training baseline. Without this, a model's accuracy can silently degrade as real-world data shifts, leading to poor business decisions.

Exam trap

Google Cloud often tests the misconception that production ML can rely on manual processes or single-instance deployments, whereas the correct approach emphasizes automation, monitoring, and scalability through managed services.

150

MCQhard

A global retailer has deployed a real-time product recommendation model on Vertex AI Endpoints. The model is a large neural network that runs on a single node with 8 vCPUs and 30 GB memory. Over the past week, the p99 latency has increased from 200ms to 2 seconds, and the error rate has risen to 5%. Cloud Monitoring shows that the endpoint's CPU utilization is consistently near 100%, and memory is at 80%. The ML engineer suspects the model is too large for the node, but model size has not changed. Logs show no increase in request volume (steady at 50 QPS). There are no recent model updates. The engineer has tried to increase the node to 16 vCPUs, but latency decreased only slightly. What is the most likely root cause and the best first step to resolve it?

A.Profile the inference code to identify inefficient operations, such as unnecessary copies or suboptimal batch processing, and optimize the model serving logic.

B.Add more nodes to the endpoint by enabling autoscaling to distribute the load.

C.Retrain the model with a smaller architecture to reduce inference time.

D.Move the model to a machine type with more CPU cores and a GPU to accelerate inference.

AnswerA

The symptoms point to a code-level issue; profiling will reveal bottlenecks.

Why this answer

The p99 latency spike and high CPU utilization despite unchanged model size and request volume indicate a software bottleneck, not a hardware one. Profiling the inference code (Option A) can reveal inefficient operations like unnecessary data copies or suboptimal batch processing that degrade performance on the existing node. Since increasing vCPUs barely helped, the root cause is likely within the serving logic, not the compute capacity.

Exam trap

Google Cloud often tests the misconception that latency and CPU issues are always solved by scaling up hardware, when in fact software inefficiencies in the serving stack are a frequent root cause in ML deployments.

How to eliminate wrong answers

Option B is wrong because adding nodes via autoscaling would not address the root cause of high CPU utilization per node; it would only distribute the load, but each node would still suffer from the same inefficiency, and the steady 50 QPS suggests no need for more nodes. Option C is wrong because retraining with a smaller architecture is a long-term solution that ignores the immediate issue of serving inefficiency; the model size hasn't changed, and the problem is runtime performance, not model accuracy. Option D is wrong because moving to a GPU or more CPU cores treats the symptom (high CPU) rather than the cause; the minimal improvement from doubling vCPUs suggests the bottleneck is in software, not hardware, and a GPU would not fix inefficient code paths.
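
Profiling the serving path can be done with Python's built-in `cProfile`. The sketch below profiles a hypothetical `predict` function whose preprocessing step is deliberately wasteful (it copies the list on every iteration); the function names are illustrative stand-ins for real serving code.

```python
import cProfile
import io
import pstats
from typing import Callable, List, Tuple


def slow_preprocess(tokens: List[str]) -> List[str]:
    """Deliberately inefficient: builds a new list on every iteration."""
    out: List[str] = []
    for token in tokens:
        out = out + [token.lower()]  # unnecessary copy each time
    return out


def predict(tokens: List[str]) -> int:
    """Stand-in for model inference; returns the number of processed tokens."""
    return len(slow_preprocess(tokens))


def profile_call(fn: Callable[..., int], *args: object) -> Tuple[int, str]:
    """Run fn under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, buffer.getvalue()


result, report = profile_call(predict, ["A", "B", "C"])
```

The report lists each function with its call count and cumulative time, which is the kind of evidence needed to confirm a software bottleneck before changing hardware.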
