Google Professional Machine Learning Engineer PMLE Questions 226–300 | Page 4/7

226

MCQ | medium

A startup has developed a prototype ML model using scikit-learn on a single machine. They now need to scale it to handle larger datasets and deploy it for real-time predictions. The team is small and wants minimal operational overhead. Which Google Cloud service should they use?

A. AI Platform Prediction

B. Vertex AI

C. Cloud Functions

D. Compute Engine with TensorFlow Serving

Answer: B

Vertex AI provides managed training, deployment, and autoscaling with minimal operational overhead.

Why this answer

Vertex AI (option B) is the correct choice because it provides a unified, fully managed MLOps platform that integrates model training, deployment, and scaling with minimal operational overhead. It supports scikit-learn models natively, offers auto-scaling for real-time predictions, and eliminates the need to manage infrastructure, making it ideal for a small team transitioning from a prototype.

Exam trap

Google Cloud often tests the misconception that any serverless option (like Cloud Functions) is suitable for ML inference, but the trap here is that Cloud Functions has severe resource and timeout limitations that make it impractical for real-time model serving, whereas Vertex AI is purpose-built for this workload.

How to eliminate wrong answers

Option A (AI Platform Prediction) is wrong because it is a legacy service that has been superseded by Vertex AI; while it could technically serve predictions, it lacks the unified workflow and newer features of Vertex AI, and choosing it carries deprecation risk. Option C (Cloud Functions) is wrong because it is a serverless compute service designed for event-driven, short-lived tasks with per-invocation time and memory limits and cold starts, not for hosting a persistent ML model that must serve real-time, low-latency inference. Option D (Compute Engine with TensorFlow Serving) is wrong because it requires manual setup, scaling, and maintenance of virtual machines, which contradicts the team's goal of minimal operational overhead; TensorFlow Serving also adds an extra layer of complexity for a scikit-learn model that could be served more simply via Vertex AI's prebuilt containers.

227

Multi-Select | easy

An ML team wants to monitor their recommendation model for fairness. Which TWO metrics should they track to detect potential bias? (Select TWO.)

Select 2 answers

A. Pair-wise fairness metrics such as equal opportunity difference.

B. Recall for the minority group only.

C. Overall accuracy on the test set.

D. Average prediction confidence per request.

E. Prediction distribution (e.g., top-K recommendations) across different sensitive attribute groups.

Answers: A, E

Standard fairness metric.

Why this answer

Pair-wise fairness metrics like equal opportunity difference (option A) directly compare model outcomes across sensitive groups, making them a standard tool for detecting bias. Equal opportunity difference measures the gap in true positive rates between privileged and unprivileged groups, where a value close to zero indicates parity. Tracking the prediction distribution across sensitive attribute groups (option E), for example which items appear in the top-K recommendations, complements this by showing whether some groups systematically receive different recommendations. Both metrics are comparative, which aligns with the core principle of monitoring for disparate impact in ML systems.

Exam trap

Google Cloud often tests the misconception that overall accuracy or group-specific recall alone is sufficient for fairness monitoring, when in fact comparative metrics across groups are required to detect bias.
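As a rough illustration of such a comparative metric, the sketch below computes the equal opportunity difference as the gap in true positive rates between two groups. The function names, group labels, and toy data are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def true_positive_rate(y_true, y_pred):
    """Return TP / (TP + FN), or None when the group has no positive labels."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return None
    return sum(1 for _, p in positives if p == 1) / len(positives)


def equal_opportunity_difference(y_true, y_pred, groups, group_a, group_b):
    """Return TPR(group_a) - TPR(group_b); values near 0 suggest parity."""
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    tpr_a = true_positive_rate(*by_group[group_a])
    tpr_b = true_positive_rate(*by_group[group_b])
    if tpr_a is None or tpr_b is None:
        raise ValueError("each group needs at least one positive label")
    return tpr_a - tpr_b


# Hypothetical labels, predictions, and sensitive-attribute groups.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "a", "b"]

eod = equal_opportunity_difference(y_true, y_pred, groups, "a", "b")
print(f"equal opportunity difference: {eod:.2f}")
```

In this toy data, group "a" has a true positive rate of 0.75 and group "b" has 0.50, so the difference is 0.25. Neither number alone would reveal the gap, which is why a per-group recall or an overall accuracy figure is not enough.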

228

MCQ | medium

You deploy a PyTorch model to Vertex AI Online Prediction. After deployment, you observe that inference latency is approximately 300ms per request, but the desired SLA is under 100ms. The model uses a custom container with CPU only. Which action is most likely to reduce latency to the target?

A. Deploy the model on a machine with a GPU accelerator.

B. Switch from online prediction to batch prediction.

C. Increase the min_replica_count to ensure more instances are always available.

D. Use a smaller machine type with less CPU to reduce overhead.

Answer: A

GPU can accelerate PyTorch inference significantly, reducing latency.

Why this answer

A GPU can significantly speed up inference for deep learning models such as this PyTorch model, so it directly targets per-request latency. Raising min_replica_count adds capacity and helps throughput, but it does not shorten the time a single request takes. Switching to batch prediction changes the use case and cannot satisfy an online latency SLA. A smaller machine type provides less compute, so it is more likely to raise per-request latency than to lower it.

229

MCQ | medium

A company deploys a classification model on Vertex AI for loan approval. After a month, they notice the precision has dropped significantly. What should they do first?

A. Retrain the model with more data

B. Increase the number of prediction nodes

C. Check for data drift using Vertex AI Model Monitoring

D. Revert to the previous model version

Answer: C

Model monitoring is designed to detect drift, which could cause precision drop.

Why this answer

Option C is correct because a sudden drop in precision indicates that the model's predictions are no longer aligning with the ground truth, which is a classic symptom of data drift. Vertex AI Model Monitoring can automatically detect drift in feature distributions or prediction output compared to a baseline, allowing you to identify the root cause before taking corrective action. Retraining or reverting without first diagnosing the drift could waste resources or mask the underlying issue.

Exam trap

Google Cloud often tests the misconception that any performance degradation should be immediately fixed by retraining or rolling back, rather than first diagnosing the cause through monitoring tools like Vertex AI Model Monitoring.

How to eliminate wrong answers

Option A is wrong because retraining with more data does not address the root cause if the data distribution has shifted; it may even reinforce the drift if the new data is also drifted. Option B is wrong because increasing prediction nodes only improves throughput and latency, not prediction quality or precision. Option D is wrong because reverting to a previous model version is a reactive rollback that does not diagnose why precision dropped; the old model may also suffer from drift if the environment has changed.
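One way to picture what a drift check does is to compare the distribution of a feature in the training baseline with its distribution in recent serving traffic. The sketch below uses the L-infinity distance between two categorical distributions; the feature values and the threshold are hypothetical, and the statistics a managed monitoring service computes may differ.

```python
from collections import Counter


def distribution(values):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, serving):
    """Largest absolute difference in category frequency between two samples."""
    p, q = distribution(baseline), distribution(serving)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical employment-type feature for a loan model.
baseline = ["salaried"] * 6 + ["self_employed"] * 4
serving = ["salaried"] * 3 + ["self_employed"] * 7

DRIFT_THRESHOLD = 0.2  # hypothetical alerting threshold
score = l_infinity_distance(baseline, serving)
drift_detected = score > DRIFT_THRESHOLD
print(f"drift score={score:.2f}, drift detected={drift_detected}")
```

Here the share of self-employed applicants moves from 40% to 70%, so the score is about 0.3 and exceeds the threshold. A finding like this explains a precision drop and tells the team what kind of data a retrained model would need.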

230

MCQ | easy

A company needs to serve a model with strict latency requirements (<100ms). They are using Vertex AI Prediction with CPU. During testing, latency is 150ms. What should they do?

A. Enable batching to improve throughput

B. Use a smaller machine type with more replicas

C. Export the model to TensorFlow Lite

D. Switch to a GPU machine type

Answer: D

GPUs can reduce inference latency.

Why this answer

The model's latency of 150ms exceeds the 100ms requirement. Switching to a GPU machine type (Option D) is correct because GPUs are optimized for parallel computation, significantly reducing inference latency for many ML models, especially deep learning models, compared to CPUs. Vertex AI Prediction supports GPU machine types, and this change directly addresses the latency bottleneck without altering the model or its serving configuration.

Exam trap

The trap here is that candidates confuse throughput optimization (batching or scaling replicas) with latency reduction, failing to recognize that GPUs directly address compute-bound latency while CPU-based solutions cannot meet strict sub-100ms requirements for complex models.

How to eliminate wrong answers

Option A is wrong because batching improves throughput (requests per second) by grouping multiple inference requests, but it typically increases per-request latency due to queuing and processing delays, making it unsuitable for a strict sub-100ms latency requirement. Option B is wrong because using a smaller machine type with more replicas can improve throughput and availability but does not reduce per-request inference latency; smaller machines often have less compute power, potentially increasing latency. Option C is wrong because exporting the model to TensorFlow Lite is designed for edge or mobile deployment with limited resources, not for optimizing latency in a cloud-based Vertex AI Prediction serving environment; it would require significant model conversion and may not be compatible with all model architectures.

231

MCQ | hard

A company uses Vertex AI Pipelines to train and deploy models. The pipeline has a step that runs a custom container. The step fails intermittently with a timeout error. Which approach should be taken to robustly handle this?

A. Switch to Kubeflow Pipelines

B. Set up a Cloud Composer DAG to monitor and rerun the pipeline

C. Reduce the size of the training data

D. Increase the timeout for the step in the pipeline definition

E. Use Cloud Functions to retry the step

Answer: D

Directly fixes the timeout issue.

Why this answer

Option D is correct because Vertex AI Pipelines (built on Kubeflow Pipelines) allows you to define a `timeout` parameter for each pipeline step. Increasing this timeout directly addresses the intermittent timeout error by giving the custom container more time to complete its work, without changing the pipeline architecture or introducing external monitoring components. This is the most robust and minimal-change solution for a step that occasionally exceeds its current time limit.

Exam trap

The trap here is that candidates may over-engineer the solution by choosing external retry mechanisms (Cloud Functions, Cloud Composer) or changing the pipeline framework, when the simplest and most correct fix is to adjust the step's timeout configuration within the pipeline definition itself.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipelines is already built on Kubeflow Pipelines; switching does not solve a timeout issue and would require re-architecting the pipeline. Option B is wrong because Cloud Composer (Apache Airflow) is an external orchestrator; adding it to monitor and rerun the pipeline adds complexity and latency, and does not fix the root cause of the step timing out. Option C is wrong because reducing training data size may degrade model quality and does not address the timeout—the step might still fail if the container itself is slow for other reasons.

Option E is wrong because Cloud Functions are stateless and event-driven; they cannot directly retry a step within a Vertex AI Pipeline. Retries and timeouts should be configured natively in the pipeline definition, for example with the task-level `set_retry` method of the Kubeflow Pipelines SDK alongside the step's timeout setting.

232

MCQ | easy

A company wants to predict customer churn using a dataset with 10,000 rows and 20 features. They have no ML expertise. Which low-code solution should they use?

A. Kubeflow Pipelines

B. Custom TensorFlow model

C. BigQuery ML

D. Vertex AI AutoML Tables

Answer: D

AutoML Tables provides automated model training and deployment without requiring deep ML knowledge.

Why this answer

Vertex AI AutoML Tables is the correct low-code solution because it allows users with no ML expertise to train high-quality tabular models on structured data (10,000 rows, 20 features) without writing any code. It automates feature engineering, model selection, and hyperparameter tuning, and provides a simple UI to upload data and get predictions. This directly matches the requirement of a low-code, no-expertise solution for a tabular churn prediction problem.

Exam trap

Google Cloud often tests the distinction between low-code/no-code solutions (like AutoML Tables) and platforms that still require coding or infrastructure expertise (like Kubeflow or custom TensorFlow), leading candidates to pick a technically capable but overly complex option.

How to eliminate wrong answers

Option A is wrong because Kubeflow Pipelines is a platform for building and deploying ML pipelines that requires significant coding and Kubernetes expertise, making it unsuitable for users with no ML expertise. Option B is wrong because a custom TensorFlow model requires writing Python code, defining neural network architectures, and tuning hyperparameters, which demands ML expertise. Option C is wrong because BigQuery ML is a low-code option for SQL-based ML, but it requires knowledge of SQL and ML concepts (e.g., creating models with CREATE MODEL statements), and it is less automated than AutoML Tables for users with zero ML background.

233

MCQ | medium

A team is using Vertex AI Experiments to compare different hyperparameters. They want to automatically record the hyperparameters. What is the correct way?

A. Manually log to console

B. Use the `aiplatform.start_run()` context manager

C. Write to a CSV file

D. Use BigQuery

Answer: B

Hyperparameters and metrics logged inside a `start_run()` context are recorded against that run in Vertex AI Experiments.

Why this answer

Option B is correct because the Vertex AI SDK's `aiplatform.start_run()` opens an experiment run and can be used as a context manager, so the run is closed when the block exits. Hyperparameters logged inside the run (for example with `aiplatform.log_params()`) are stored in the experiment run metadata, so no separate logging system is needed. This integrates directly with the Vertex AI SDK, ensuring consistency and traceability across runs.

Exam trap

Google Cloud often tests the misconception that any logging method (console, CSV, BigQuery) is equivalent to native SDK integration, but the key requirement is automatic, structured recording tied to the experiment run, which only the SDK's context manager provides.

How to eliminate wrong answers

Option A is wrong because manually logging to console only outputs data to stdout, which is not persisted in Vertex AI Experiments and cannot be queried or compared programmatically. Option C is wrong because writing to a CSV file requires custom I/O code, lacks integration with Vertex AI's experiment tracking, and does not associate the hyperparameters with a specific experiment run. Option D is wrong because BigQuery is a data warehouse for analytics, not a mechanism for automatically recording hyperparameters during model training; it would require additional infrastructure to capture and store the parameters.

234

Multi-Select | hard

Which THREE should be considered when setting up an automated retraining pipeline using Vertex AI Pipelines and Cloud Composer? (Choose THREE.)

Select 3 answers

A. Setting performance thresholds for new models to decide deployment

B. Including hyperparameter tuning in every retraining run

C. Optimizing resource allocation to control costs

D. Frequency of code commits to the repository

E. Monitoring for data drift to trigger retraining

Answers: A, C, E

Ensure new model is better than current.

Why this answer

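Option A is correct because in an automated retraining pipeline, you must set performance thresholds (e.g., accuracy, precision, recall) for new models to decide whether to deploy them. Vertex AI Pipelines can evaluate model metrics against these thresholds and conditionally deploy only if the new model meets or exceeds the current production model's performance, preventing regressions. Option C is correct because retraining runs repeatedly and can consume significant compute, so machine types and accelerators should be sized to keep costs under control. Option E is correct because data drift is a common sign that the production model is going stale, which makes it a natural trigger for retraining.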
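A deployment gate of this kind can be sketched as a small function. This is a minimal illustration under assumed inputs: the metric names, values, and the `should_deploy` helper are hypothetical and not part of any Vertex AI API.

```python
def should_deploy(candidate, production, thresholds, min_gain=0.0):
    """Decide whether a retrained model may replace the production model.

    candidate / production: dicts mapping metric name -> value (higher is better).
    thresholds: dict mapping metric name -> absolute minimum acceptable value.
    min_gain: required improvement over production for every gated metric.
    """
    for metric, floor in thresholds.items():
        if candidate[metric] < floor:
            return False  # fails the absolute quality bar
        if candidate[metric] < production[metric] + min_gain:
            return False  # would be a regression against production
    return True


# Hypothetical evaluation results.
production = {"precision": 0.90, "recall": 0.85}
thresholds = {"precision": 0.85, "recall": 0.80}

regressing_candidate = {"precision": 0.91, "recall": 0.84}
improved_candidate = {"precision": 0.91, "recall": 0.86}

print(should_deploy(regressing_candidate, production, thresholds))
print(should_deploy(improved_candidate, production, thresholds))
```

The first candidate clears both absolute thresholds but has lower recall than production, so the gate rejects it. In a pipeline, the result of a check like this would typically feed a conditional deployment step.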

Exam trap

Google Cloud often tests the misconception that hyperparameter tuning must be part of every retraining run, but in practice it is a separate, infrequent optimization step to avoid excessive compute costs and pipeline latency.

235

MCQ | easy

A company deploys a model on Vertex AI Endpoints for real-time inference. They notice latency spikes during peak hours. Which action is most effective to reduce latency without sacrificing accuracy?

A. Enable autoscaling based on CPU utilization

B. Use a larger machine type

C. Reduce model size by pruning

D. Implement client-side caching

Answer: A

Autoscaling adds instances during load spikes, maintaining low latency without sacrificing accuracy.

Why this answer

Option A is correct because enabling autoscaling based on CPU utilization dynamically adjusts the number of replicas to handle traffic spikes, which reduces latency during peak hours without changing the model. Option B (a larger machine type) increases cost at all hours without adding scaling elasticity. Option C (pruning) can reduce accuracy. Option D (client-side caching) may help, but not during peaks when most requests are unique.
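To see why autoscaling helps during peaks, consider a simple proportional scaling rule of the kind used by target-utilization autoscalers. The rule and the `desired_replicas` helper below are assumptions for illustration; the source does not describe the exact algorithm Vertex AI uses.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Scale replica count in proportion to observed vs. target CPU utilization.

    Utilization values are percentages (for example 90 means 90%).
    The result is clamped to the [min_replicas, max_replicas] range.
    """
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical peak-hour reading: 2 replicas at 90% CPU, target 60%.
peak = desired_replicas(2, 90, 60, min_replicas=1, max_replicas=10)

# Hypothetical off-peak reading: 4 replicas at 30% CPU, target 60%.
off_peak = desired_replicas(4, 30, 60, min_replicas=2, max_replicas=10)

print(peak, off_peak)
```

Under this rule the endpoint grows from 2 to 3 replicas at peak and shrinks from 4 to 2 off-peak, which is the elasticity a fixed larger machine type cannot provide.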

236

MCQ | hard

A team uses Vertex AI Experiments to track ML training runs. They want to automatically trigger a retraining pipeline when new labeled data arrives in BigQuery, and ensure the pipeline uses only approved libraries from a central artifact registry. Which combination of services should they use?

A. Cloud Composer to orchestrate, with Cloud Storage for libraries.

B. Vertex AI Pipelines with a scheduled trigger, and use Cloud Build to pull libraries from Artifact Registry.

C. Cloud Functions triggered by BigQuery, Cloud Build to run training, and Artifact Registry for libraries.

D. Vertex AI Experiments with continuous evaluation, and a Cloud Run job for training.

E. Dataflow to preprocess, then trigger a Cloud Run job.

Answer: B

Scheduled pipeline can query BigQuery for new data, and Cloud Build ensures consistent library versions.

Why this answer

Option B is correct because Vertex AI Pipelines provides a managed orchestration service for ML workflows, and a scheduled trigger can be set to run the pipeline when new labeled data arrives in BigQuery (e.g., via a Cloud Scheduler or Eventarc trigger). Cloud Build is used to pull approved libraries from Artifact Registry, ensuring only vetted dependencies are used during pipeline execution, which meets the security and compliance requirement.

Exam trap

The trap here is that candidates may confuse Cloud Build (a CI/CD service) with Vertex AI Training (a managed ML training service), or think that Cloud Composer is the only orchestration option for ML pipelines, when Vertex AI Pipelines is the native, more integrated choice for ML workflows on Vertex AI.

How to eliminate wrong answers

Option A is wrong because Cloud Composer (based on Apache Airflow) is a general-purpose workflow orchestrator, not specifically designed for Vertex AI Pipelines, and using Cloud Storage for libraries does not enforce the use of a central artifact registry with version control and access policies. Option C is wrong because Cloud Functions triggered by BigQuery can initiate a retraining pipeline, but Cloud Build is a CI/CD tool, not a managed ML training service; Vertex AI Training or Pipelines should be used for the actual training run, not Cloud Build. Option D is wrong because Vertex AI Experiments tracks runs but does not orchestrate retraining pipelines; continuous evaluation is a monitoring feature, not a trigger mechanism, and Cloud Run is a serverless compute service for containers, not a managed ML training service.

Option E is wrong because Dataflow is a stream/batch data processing service, not a trigger mechanism for retraining, and Cloud Run is a general-purpose serverless container platform rather than a managed ML training service; this option also does nothing to restrict the pipeline to approved libraries from a central artifact registry.

237

MCQ | hard

A financial institution uses BigQuery ML to train a linear regression model to predict loan default risk. The model is trained on a dataset with 100 million rows and 50 features. During inference, the engineer uses the ML.PREDICT function. However, the query takes several minutes to run and times out frequently. The data is static and updated monthly. What is the most cost-effective and low-code solution to improve prediction latency?

A. Export the trained model as a SQL function using the EXPORT MODEL statement, then use it for predictions.

B. Create a Dataflow pipeline to precompute predictions and store them in a separate table.

C. Use a materialized view to precompute the prediction features.

D. Increase the BigQuery compute capacity by reserving more slots.

Answer: A

Exports model as a persistent function for faster inference.

Why this answer

Option A is correct because exporting the trained model as a SQL function via `EXPORT MODEL` converts the linear regression coefficients into a persistent SQL UDF, eliminating the overhead of model loading and serialization during each `ML.PREDICT` call. This approach is low-code (no external pipeline) and cost-effective since predictions are executed as standard SQL without consuming BigQuery ML slot resources for model inference.

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (more slots) or adding external pipelines (Dataflow) is the default solution for ML inference latency, when the correct low-code approach is to leverage BigQuery's native model export to SQL functions for static or batch-updated models.

How to eliminate wrong answers

Option B is wrong because creating a Dataflow pipeline introduces additional operational complexity, cost, and latency for a static dataset that is updated only monthly — the precomputed predictions would be stale until the next batch run, and the pipeline adds unnecessary engineering overhead. Option C is wrong because a materialized view can only store precomputed query results, not the prediction logic itself; it would still require calling `ML.PREDICT` on each refresh, and materialized views cannot directly invoke ML functions without incurring the same inference overhead. Option D is wrong because increasing BigQuery compute capacity (reserving more slots) only addresses resource contention, not the fundamental latency caused by model loading and inference overhead in `ML.PREDICT`; it is also the least cost-effective solution as it incurs ongoing slot costs without fixing the root cause.

238

MCQ | medium

A team wants to implement CI/CD for their ML models using Cloud Build. They have a pipeline that trains a model and deploys it. What is the best practice for triggering the pipeline when a new commit is pushed to the source repository?

A. Set up a Cloud Scheduler job to poll the repository periodically

B. Deploy a custom web service on App Engine to call Cloud Build API

C. Use Pub/Sub to notify Cloud Build of new commits

D. Configure a Cloud Build trigger on the source repository (e.g., Cloud Source Repositories, GitHub)

Answer: D

Cloud Build supports triggers that automatically start a build upon a push to the repository.

Why this answer

Option D is correct because Cloud Build natively supports triggers that automatically start a pipeline when a new commit is pushed to a connected source repository (e.g., Cloud Source Repositories, GitHub, Bitbucket). This is the simplest, most event-driven approach, requiring no polling, custom services, or additional messaging infrastructure. It directly maps the git push event to a build invocation, ensuring near-instantaneous pipeline execution.

Exam trap

The trap here is that candidates may overthink the solution and choose Pub/Sub (Option C) because they know Pub/Sub is used for event-driven architectures, but they miss that Cloud Build triggers already abstract this complexity away, making direct trigger configuration the best practice.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler polling is inefficient, introduces latency (minimum 1-minute intervals), and is not event-driven; it would waste resources and delay pipeline starts. Option B is wrong because deploying a custom web service on App Engine to call the Cloud Build API adds unnecessary complexity, cost, and maintenance overhead, and is not a best practice when native triggers exist. Option C is wrong because while Pub/Sub can be used to trigger builds, it requires an intermediary (e.g., a Cloud Function) to receive the commit notification and call the Cloud Build API, adding latency and complexity compared to a direct Cloud Build trigger.

239

MCQ | hard

A company uses a Cloud Composer DAG to run a daily ML pipeline that includes Dataflow jobs and model training on Vertex AI. The pipeline frequently fails due to insufficient permissions when the Dataflow worker accesses data in Cloud Storage. What is the most efficient way to resolve this issue?

A. Create a custom service account with required permissions and assign it to the Dataflow job.

B. Grant the 'roles/storage.objectViewer' role to 'allUsers' on the Cloud Storage bucket.

C. Use the Composer environment's service account for all pipeline components.

D. Move the Dataflow job to run after the pipeline so that data is already processed.

Answer: A

Lets the Dataflow worker access the data securely.

Why this answer

The most efficient way to resolve insufficient permissions for Dataflow workers accessing Cloud Storage is to create a custom service account with the required roles (e.g., roles/storage.objectViewer) and assign it to the Dataflow job as the worker service account (for example, `--serviceAccount` in the Java SDK or `--service_account_email` in the Python SDK). This follows the principle of least privilege and ensures that only the Dataflow workers have the necessary permissions, without affecting other pipeline components or exposing the bucket publicly.

Exam trap

Google Cloud often tests the misconception that using a single service account for all components (like the Composer environment's service account) is simpler and sufficient, but this ignores the principle of least privilege and can cause security vulnerabilities or permission conflicts in distributed pipelines.

How to eliminate wrong answers

Option B is wrong because granting roles/storage.objectViewer to 'allUsers' makes the Cloud Storage bucket publicly readable, which is a severe security risk and violates least privilege principles. Option C is wrong because the Composer environment's service account typically has broader permissions than needed for Dataflow workers, and using it for all components can lead to over-privileging and potential security issues; moreover, Dataflow workers require a separate identity to access resources independently. Option D is wrong because moving the Dataflow job to run after the pipeline does not address the root cause of insufficient permissions; the Dataflow job will still fail when it tries to access Cloud Storage data, regardless of when it runs.

240

MCQ | medium

What is the most likely cause of the error?

A. The data split column contains only NULL values, so no rows are assigned to the training set

B. The model type 'linear_reg' is incompatible with the column 'price' because of missing values

C. The model creation does not have permission to read the dataset in BigQuery

D. The model creation did not specify a training budget, so default is insufficient

Answer: A

With a custom split, rows whose split column is NULL are never assigned to the training set.

Why this answer

Option A is correct because when the data split column contains only NULL values, BigQuery ML cannot assign any rows to the training set. With `DATA_SPLIT_METHOD = 'CUSTOM'`, the split column is a BOOL: rows with FALSE are used for training, and rows with TRUE or NULL are used for evaluation. If every value is NULL, the training set receives zero rows, and model creation fails with an error about insufficient training data.

Exam trap

Google Cloud often tests the subtle distinction between missing values in the label column (which are handled gracefully) versus missing values in the data split column (which can cause a complete failure), leading candidates to incorrectly blame missing values in the target column.

How to eliminate wrong answers

Option B is wrong because the `linear_reg` model type is fully compatible with the `price` column even if it has missing values; BigQuery ML handles NULLs in the label column by excluding those rows during training, but the error here is about no training rows, not missing values. Option C is wrong because if the user lacked permission to read the dataset, the error would be a permissions-related message (e.g., 'Access Denied'), not a training set size error. Option D is wrong because BigQuery ML does not require a training budget for linear regression models; the default settings are sufficient, and the error is not budget-related.
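The failure mode can be illustrated with a small sketch of how rows are assigned under a custom split, assuming a BOOL split column where FALSE means training and TRUE or NULL means evaluation. The helper name and sample values are hypothetical, and Python's `None` stands in for SQL NULL.

```python
def assign_split(split_value):
    """Map a custom split column value to a dataset partition.

    Assumed rule: False -> training; True or None (NULL) -> evaluation.
    """
    return "train" if split_value is False else "eval"


def count_partitions(split_column):
    """Count how many rows land in each partition."""
    counts = {"train": 0, "eval": 0}
    for value in split_column:
        counts[assign_split(value)] += 1
    return counts


# Hypothetical split columns.
all_null = [None, None, None, None]
mixed = [False, False, True, None]

print(count_partitions(all_null))
print(count_partitions(mixed))
```

With the all-NULL column every row is routed to evaluation and the training partition is empty, which matches the "no training rows" symptom rather than a permissions or label problem.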

241

MCQ | medium

Your organization has a large production system that uses Vertex AI Prediction for an NLP model with a 2 GB memory footprint. The endpoint is configured with 5 replicas, each using an n1-standard-4 with a single T4 GPU. Recently, you observed an increase in 503 errors during peak hours. Cloud Monitoring shows that GPU utilization is consistently above 90% across all replicas, while CPU and memory are below 50%. You have already increased the max replicas to 10, but the errors persist because the increased replicas also become saturated. What should you do to resolve the issue?

A. Switch to a larger GPU such as V100 or A100 to increase per-replica throughput.

B. Implement request batching in the custom container to improve GPU utilization efficiency.

C. Enable model parallelism across multiple GPUs within each replica.

D. Use a high-memory machine type like n1-highmem-16 to reduce memory pressure.

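Answer: A

The GPU is the saturated resource, so a more powerful GPU raises per-replica throughput.

Why this answer

Option A is correct because Cloud Monitoring shows the GPU is the bottleneck: GPU utilization is consistently above 90% while CPU and memory stay below 50%, and adding replicas of the same shape only produces more saturated replicas. A larger GPU such as a V100 or A100 increases the throughput each replica can sustain. Option B is less effective because batching mainly helps an underutilized GPU, and these GPUs are already running above 90%. Option C is wrong because model parallelism is meant for models that do not fit on a single GPU; a 2 GB model fits comfortably on one T4.

Option D is wrong because memory usage is below 50%, so a high-memory machine type adds cost without relieving the GPU bottleneck.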

242

MCQ | medium

You are deploying a scikit-learn model for online predictions. The model size is 200 MB. You want to minimize latency and cost. Which serving option should you choose?

A. Deploy to Vertex AI online prediction using a prebuilt container for scikit-learn.

B. Use Cloud Run with a custom container.

C. Create a Kubernetes cluster on GKE and deploy the model there.

D. Export the model as a Cloud Function.

Answer: A

Vertex AI provides optimized containers and autoscaling for online prediction.

Why this answer

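Vertex AI online prediction with a prebuilt scikit-learn container is well suited to this model: Vertex AI hosts the container and autoscales it, and there is no custom image to build or maintain. Cloud Run with a custom container and GKE both add build and operational work, and a 200 MB model is a poor fit for Cloud Functions, where deployment size limits and cold starts become a problem.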

243

MCQ | easy

A team uses Vertex AI Feature Store for storing features. They want to share feature definitions with other teams in a collaborative manner. What is the best way to collaborate on feature definitions?

A. Use a shared repository with feature definition files and CI/CD to update the feature store.

B. Grant all teams write access to the same feature store so they can modify definitions directly.

C. Export the feature definitions as CSV and email them to the other teams.

D. Use a wiki page to document feature definitions and update it manually.

Answer: A

Using a shared repo with CI/CD provides version control and automated updates, ensuring consistency and traceability.

Why this answer

Option A is correct because using a shared repository with feature definition files and CI/CD pipelines enables version control, peer review, and automated deployment to Vertex AI Feature Store. This approach ensures consistency, traceability, and collaboration without risking direct, uncoordinated changes to the production feature store.

Exam trap

The trap here is that candidates may assume direct write access (Option B) is efficient for collaboration, but the exam tests understanding that feature stores require controlled, versioned updates to maintain data integrity and avoid breaking downstream models.

How to eliminate wrong answers

Option B is wrong because granting all teams write access to the same feature store allows uncoordinated, direct modifications to feature definitions, which can lead to conflicts, data corruption, and lack of version control. Option C is wrong because exporting feature definitions as CSV and emailing them is error-prone, lacks versioning, and does not provide a single source of truth for collaboration. Option D is wrong because using a wiki page for manual documentation is static, easily outdated, and does not integrate with the feature store's actual schema or deployment process.

244

MCQmedium

A team deploys a PyTorch model on Vertex AI for online predictions. They notice that after deployment, the latency increases over time, especially during peak hours. The model is served using a custom container. What is the most likely cause?

A.The custom container does not have a health check, causing instances to be prematurely terminated.

B.The model is not using GPU even though a GPU machine is selected.

C.The model is too large for the machine's memory, causing swapping.

D.The prediction requests are not being batched, and the model inference code is not optimized for concurrency.

AnswerD

Without batching and concurrency, requests queue up, increasing latency under load.

Why this answer

Option D is correct because the latency increase over time, especially during peak hours, indicates that the model inference code is not handling concurrent requests efficiently. Without batching or optimized concurrency, each request is processed sequentially, causing a queue buildup under load. This is a common issue with custom containers on Vertex AI when the prediction handler is single-threaded or lacks async processing.
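The queue buildup described above can be reproduced with a small simulation. This is a minimal sketch, not Vertex AI code: `simulate_latency` and all timing values are hypothetical, and it assumes a single worker whose cost per batch is fixed (30 ms for a batch of four versus 25 ms for a single request).

```python
def simulate_latency(arrival_gap_ms, service_ms, batch_size, n_requests):
    """Per-request latency (ms) for one worker serving requests in arrival order.

    Requests arrive every `arrival_gap_ms`; the worker handles `batch_size`
    requests at a time and each batch costs `service_ms` (an assumption).
    """
    arrivals = [i * arrival_gap_ms for i in range(n_requests)]
    latencies = []
    worker_free_at = 0.0
    for start in range(0, n_requests, batch_size):
        batch = arrivals[start:start + batch_size]
        # A batch starts once its last request has arrived and the worker is idle.
        begin = max(worker_free_at, batch[-1])
        worker_free_at = begin + service_ms
        latencies.extend(worker_free_at - t for t in batch)
    return latencies


# Requests arrive every 10 ms, but one request takes 25 ms: the queue grows.
sequential = simulate_latency(arrival_gap_ms=10, service_ms=25, batch_size=1, n_requests=200)
# Same arrival rate, four requests per 30 ms batch: the worker keeps up.
batched = simulate_latency(arrival_gap_ms=10, service_ms=30, batch_size=4, n_requests=200)

print(f"sequential: first={sequential[0]} ms, last={sequential[-1]} ms")
print(f"batched:    worst={max(batched)} ms")
```

With sequential handling the latency climbs with every request because arrivals outpace service, which matches the "grows during peak hours" symptom; with batching the worst case stays flat.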

Exam trap

Google Cloud often tests the misconception that latency increases are always due to resource exhaustion (memory/CPU) rather than concurrency or request handling inefficiencies, leading candidates to pick Option C.

How to eliminate wrong answers

Option A is wrong because a missing health check would cause instances to be terminated and recreated, leading to intermittent failures or startup latency, not a gradual latency increase over time. Option B is wrong because selecting a GPU machine without using the GPU would result in underutilization but not necessarily increasing latency; the model would still run on CPU, and latency would be constant or high from the start. Option C is wrong because if the model were too large for memory, swapping would cause consistently high latency from the outset, not a gradual increase during peak hours.

245

MCQeasy

A data scientist wants to track the lineage of a dataset used in a training run. Which Vertex AI feature should they use?

A.Vertex ML Metadata

B.Vertex AI Feature Store

C.Vertex AI Experiments

D.Vertex AI Model Registry

AnswerA

ML Metadata tracks data lineage and artifact relationships.

Why this answer

Vertex ML Metadata is the correct choice because it is specifically designed to track the lineage of datasets, models, and other artifacts throughout the ML lifecycle. It records metadata about each step in a pipeline, including the source dataset used for a training run, enabling full provenance tracking. This allows data scientists to trace back which data was used, how it was transformed, and which model version it produced.

Exam trap

The trap here is that candidates often confuse Vertex AI Experiments (which tracks run metrics and parameters) with lineage tracking, but Experiments does not capture the full artifact-to-execution graph that ML Metadata provides for dataset provenance.

How to eliminate wrong answers

Option B is wrong because Vertex AI Feature Store is a centralized repository for storing, serving, and sharing feature values for ML models, not for tracking dataset lineage. Option C is wrong because Vertex AI Experiments is used to track and compare model training runs, hyperparameters, and metrics, but it does not natively capture the lineage of the dataset itself beyond run-level parameters. Option D is wrong because Vertex AI Model Registry is a version control system for trained models, managing model deployments and versions, but it does not track the provenance of the training data used to create those models.

246

MCQhard

A company has multiple teams working on different models. They want to enforce consistent data preprocessing steps across all teams. Which approach should they take?

A.Use Cloud Composer to orchestrate preprocessing

B.Write shared Python packages in Artifact Registry

C.Use Cloud Dataflow templates

D.Create shared Vertex AI Pipelines components

AnswerD

Shared components can be reused across pipelines, enforcing consistent preprocessing.

Why this answer

Vertex AI Pipelines components allow teams to define reusable, versioned, and parameterized preprocessing steps that can be shared across models and pipelines. This ensures consistent execution of data transformations because each component encapsulates the exact code and environment, and pipelines enforce the same DAG of steps regardless of which team triggers them.

Exam trap

Google Cloud often tests the distinction between 'sharing code' (e.g., packages) and 'sharing executable, environment-encapsulated pipeline steps' (e.g., components), leading candidates to choose a code-sharing option like Artifact Registry instead of the pipeline component approach that enforces consistency.

How to eliminate wrong answers

Option A is wrong because Cloud Composer is an orchestration service for workflows (based on Apache Airflow) and does not inherently enforce consistent preprocessing logic across teams; it only schedules and monitors tasks, leaving the actual preprocessing code to be defined separately and potentially inconsistently. Option B is wrong because writing shared Python packages in Artifact Registry provides a way to distribute code, but it does not enforce a standardized execution environment or pipeline structure; teams could still call the packages with different parameters or in different orders, leading to inconsistency. Option C is wrong because Cloud Dataflow templates are used for batch and stream data processing jobs (based on Apache Beam), but they are not designed to be shared as reusable, composable steps across multiple ML pipelines; they lack the pipeline-level DAG enforcement and versioning that Vertex AI Pipelines components provide.

247

MCQhard

A financial institution wants to use Natural Language API for sentiment analysis on customer feedback, but the domain-specific language (e.g., 'bullish', 'bearish') is not correctly classified. They have 200 labeled examples. Which approach minimizes coding effort while improving accuracy?

A.Submit a feature request to Google for domain-specific terms

B.Create a custom sentiment dictionary and pass it to the Natural Language API

C.Build a custom TensorFlow model for sentiment

D.Use AutoML Natural Language to train a custom model

AnswerD

No-code training on labeled data for improved accuracy.

Why this answer

Option D is correct because AutoML Natural Language enables you to train a custom model on your 200 labeled examples without writing code, directly improving accuracy for domain-specific terms like 'bullish' and 'bearish'. This approach leverages transfer learning from Google's pre-trained models, minimizing coding effort while adapting to your unique vocabulary and sentiment patterns.

Exam trap

Google Cloud often tests the misconception that the Natural Language API supports custom dictionaries or rule-based overrides, when in fact it only offers a fixed pre-trained model, making AutoML the correct low-code path for domain adaptation.

How to eliminate wrong answers

Option A is wrong because submitting a feature request to Google for domain-specific terms is not a practical solution—Google does not provide custom term updates for individual customers, and the turnaround time is indefinite. Option B is wrong because the Natural Language API does not accept a custom sentiment dictionary; it only supports a static, built-in sentiment model, and passing a dictionary is not a supported feature. Option C is wrong because building a custom TensorFlow model requires significant coding effort, including data preprocessing, model architecture design, training, and deployment, which contradicts the goal of minimizing coding effort.

248

MCQeasy

A team wants to share a trained model with another team who will deploy it to a different Google Cloud project. Which is the recommended way to transfer the model?

A.Copy the model artifact from one project's Cloud Storage to another using gsutil.

B.Export the model as a SavedModel, store in a shared Cloud Storage bucket, and import into the second project.

C.Package the model in a Docker container and push to a cross-project Container Registry.

D.Use Cloud Marketplace to publish the model.

E.Use Vertex AI Model Registry with cross-project IAM permissions to allow the second project to access the model.

AnswerE

Model Registry maintains version history and metadata while enabling cross-project sharing.

Why this answer

Option E is correct because Vertex AI Model Registry supports cross-project access via IAM permissions, allowing the second project to directly deploy the model without copying artifacts. This approach maintains a single source of truth, avoids data duplication, and leverages Vertex AI's built-in versioning and lineage tracking. It is the recommended pattern for sharing models across projects in Google Cloud.

Exam trap

Google Cloud often tests the misconception that copying artifacts (gsutil) or using shared storage is the simplest approach, but the exam expects candidates to recognize that Vertex AI Model Registry with cross-project IAM is the recommended, managed solution for model sharing across projects.

How to eliminate wrong answers

Option A is wrong because copying model artifacts via gsutil bypasses Vertex AI's model management, losing metadata, versioning, and deployment history, and is not a recommended practice for production model sharing. Option B is wrong because exporting a SavedModel to a shared Cloud Storage bucket still requires manual import and does not leverage Vertex AI's model registry, leading to potential versioning and access control issues. Option C is wrong because packaging the model in a Docker container and pushing to a cross-project Container Registry is more appropriate for containerized inference services, not for sharing a trained model artifact itself, and adds unnecessary complexity.

Option D is wrong because Cloud Marketplace is designed for publishing commercial solutions, not for internal team-to-team model sharing within an organization.

249

MCQhard

A machine learning engineer is scaling a prototype natural language processing model that uses a transformer encoder. The prototype was trained on a small corpus on a single GPU. For production, they need to train on a much larger corpus using TPUs on Vertex AI. They convert the TensorFlow code to work with TPUStrategy. The training starts but after a few steps, the loss becomes NaN and training diverges. The learning rate scheduler uses a warm-up and then linear decay. The initial learning rate is 1e-4. The batch size per TPU core is 32, with 8 cores total (batch size 256). What is the most likely cause?

A.The batch size is too small for TPU.

B.The learning rate is too high for the batch size.

C.The learning rate schedule should be cosine instead of linear.

D.The warm-up steps are insufficient.

AnswerB

Larger batch size requires lower learning rate to maintain stability.

Why this answer

When scaling from a single GPU to 8 TPU cores, the global batch size increases from 32 to 256 (32 per core × 8 cores). The learning rate of 1e-4 was tuned for the single-GPU prototype and was carried over without being re-tuned for the new global batch size. Learning rate and batch size are coupled hyperparameters, so a value that was stable in one configuration is not guaranteed to be stable in the other; here the step size is too large for the new setup, so gradient updates overshoot and the loss diverges to NaN.
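The batch-size arithmetic and the schedule shape described in the question can be sketched in a few lines. This is a minimal illustration only: the function names are made up for the example, and `warmup_steps` and `total_steps` are hypothetical because the question does not state them.

```python
def global_batch_size(per_core_batch, num_cores):
    """Global batch size under data parallelism: every core sees its own batch."""
    return per_core_batch * num_cores


def warmup_linear_decay(step, peak_lr, warmup_steps, total_steps):
    """Linear warm-up to `peak_lr`, then linear decay to zero at `total_steps`."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


batch = global_batch_size(per_core_batch=32, num_cores=8)  # 32 * 8 = 256

# Hypothetical step counts, chosen only to show the shape of the schedule.
for step in (0, 500, 1000, 5500, 10000):
    lr = warmup_linear_decay(step, peak_lr=1e-4, warmup_steps=1000, total_steps=10000)
    print(f"step={step:>5}  lr={lr:.2e}")
```

Printing the schedule like this before launching a TPU job is a cheap way to check which learning rate the optimizer actually sees in the first few hundred steps, where the divergence in the question occurs.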

Exam trap

Google Cloud often tests the misconception that TPU-specific issues (like batch size or hardware compatibility) are the root cause, when in fact the problem is a fundamental hyperparameter scaling error that applies to any distributed training setup.

How to eliminate wrong answers

Option A is wrong because TPUs are designed to handle large batch sizes efficiently; a batch size of 256 is well within typical TPU capabilities and is not the cause of NaN loss. Option C is wrong because the learning rate schedule (linear vs. cosine) is not the primary issue; the fundamental problem is the learning rate magnitude relative to the batch size, not the decay shape. Option D is wrong because insufficient warm-up steps might cause early instability but would not typically lead to persistent NaN loss after several steps; the core issue is the learning rate being too high for the increased batch size.

250

MCQeasy

A data science team deploys a regression model to predict house prices. After one month, the mean absolute error (MAE) on the serving data increases by 20% compared to the test set. Which monitoring strategy should the team implement first to diagnose the issue?

A.Retrain the model daily with the latest data to adapt to changing patterns.

B.Monitor prediction residuals and compute serving-time MAE over sliding windows.

C.Compare the distribution of training labels with serving labels using a two-sample t-test.

D.Monitor input feature distributions for drift using the Kolmogorov-Smirnov test.

AnswerB

Directly tracking MAE on serving data over time is the most straightforward diagnostic for performance degradation.

Why this answer

Option B is correct because the first step in diagnosing a 20% MAE increase on serving data is to monitor prediction residuals over sliding windows. This directly tracks how model errors evolve in production, allowing the team to detect whether performance degradation is sudden or gradual, and to correlate it with specific time windows or data slices. Computing serving-time MAE on sliding windows provides an immediate, interpretable signal of model health without assuming the root cause.
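Sliding-window MAE is simple to compute once ground-truth labels arrive. The following is a minimal sketch, assuming predictions and actual prices are already joined in time order; `sliding_window_mae`, the window size, and the sample prices (in thousands) are hypothetical.

```python
from collections import deque


def sliding_window_mae(y_true, y_pred, window):
    """Yield the MAE over the most recent `window` residuals after each observation."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    for actual, predicted in zip(y_true, y_pred):
        abs_err = abs(actual - predicted)
        if len(recent) == window:
            # The oldest residual is about to be evicted from the window.
            running_sum -= recent[0]
        recent.append(abs_err)
        running_sum += abs_err
        yield running_sum / len(recent)


# Hypothetical serving data: errors grow toward the end of the period.
actual_prices = [300, 310, 320, 330, 340, 350]
predicted_prices = [305, 305, 320, 350, 370, 390]
maes = list(sliding_window_mae(actual_prices, predicted_prices, window=3))
print(maes)
```

A rising sequence of windowed MAE values shows when the degradation started, which narrows the time range to inspect for feature or concept drift afterwards.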

Exam trap

Google Cloud often tests the misconception that the first step in diagnosing model degradation is to check for data drift (Option D), when in fact the correct first step is to confirm and quantify the performance drop itself using serving-time metrics like sliding-window MAE.

How to eliminate wrong answers

Option A is wrong because retraining daily without first diagnosing the cause of the MAE increase is a reactive, resource-intensive approach that may mask underlying issues like data drift or concept drift, and does not help identify whether retraining is even necessary. Option C is wrong because comparing training labels with serving labels using a two-sample t-test checks for label distribution shift, but the MAE increase could be due to feature drift, concept drift, or data quality issues unrelated to label distribution; this test is too narrow and may miss the actual cause. Option D is wrong because monitoring input feature distributions for drift using the Kolmogorov-Smirnov test is a valid technique, but it is a secondary diagnostic step; the first priority should be to confirm and characterize the performance degradation itself via residual monitoring before investigating potential causes.

251

MCQhard

An ML team uses Vertex AI Pipelines to automate model retraining. The pipeline includes a step that queries BigQuery to create a training dataset. The team notices that the pipeline fails intermittently with a '403 Exceeded rate limits' error. What is the most likely cause and solution?

A.The pipeline is issuing too many concurrent queries; use a BigQuery reservation to guarantee slot capacity

B.The training dataset is too large; partition the table and query only the latest partition

C.The pipeline step timeout is too short; increase the timeout to 30 minutes

D.The SQL query is inefficient; rewrite it using materialized views

AnswerA

Reservations provide dedicated slots, avoiding API rate limits.

Why this answer

The 403 'Exceeded rate limits' error in BigQuery indicates that the project is hitting the concurrent query rate limit or the rate of bytes read per second. Using a BigQuery reservation guarantees dedicated slot capacity, which prevents rate-limit errors by ensuring the pipeline has consistent compute resources regardless of other workloads in the project. This is the most direct solution because rate limits are enforced at the project level based on available slots, and a reservation provides a fixed number of slots that bypass those limits.

Exam trap

The trap here is that candidates confuse rate-limit errors with performance or timeout issues, and they choose options that optimize query cost or size (B, D) or adjust timeouts (C), instead of recognizing that a 403 error specifically points to a quota or rate-limit violation that requires resource allocation like a reservation.

How to eliminate wrong answers

Option B is wrong because a large dataset does not cause a 403 rate-limit error; it would cause a 'resources exceeded' or timeout error, not a rate-limit error. Partitioning and querying only the latest partition could reduce bytes processed but does not address the rate limit on concurrent queries or slot usage. Option C is wrong because a timeout error would manifest as a deadline exceeded or 504 error, not a 403 rate-limit error; increasing the timeout does not resolve rate-limiting.

Option D is wrong because an inefficient SQL query would cause high slot consumption or slow performance, but the error is specifically about rate limits, not query efficiency; materialized views could reduce query cost but do not change the project-level rate limit enforcement.

252

MCQmedium

A data scientist uses Vertex AI Workbench to train a model and then deploys it to an endpoint. They want to automate the retraining and redeployment pipeline when new data arrives. Which service should they use?

A.Cloud Composer

B.Vertex AI Pipelines

C.Cloud Scheduler

D.Cloud Functions

AnswerB

Vertex AI Pipelines is purpose-built for ML workflows, allowing easy automation of retraining and redeployment.

Why this answer

Option B is correct because Vertex AI Pipelines provides a serverless, managed pipeline orchestration service that can automate retraining and redeployment. Option A (Cloud Composer) is a workflow orchestration service but is more complex to operate and less tightly integrated with Vertex AI. Option D (Cloud Functions) is event-driven but lacks pipeline capabilities.

Option C (Cloud Scheduler) triggers jobs on a fixed schedule; it does not orchestrate a multi-step retraining workflow or react to new data arriving.

253

MCQhard

A media company wants to build a real-time recommendation system for articles. They have a large user base (10M+) and frequent updates to user interactions. They need to handle cold-start users and new articles. Which architecture on Vertex AI is most suitable?

A.Deploy a Deep Learning Recommendation Model (DLRM) for prediction

B.Use a contextual bandit algorithm for exploration only

C.Use matrix factorization with collaborative filtering

D.Implement a two-tower model (user and item towers) with embeddings and nearest neighbor search

AnswerD

Two-tower models can incorporate side features and enable fast retrieval.

Why this answer

The two-tower model (user and item towers) with embeddings and nearest neighbor search is the most suitable because it handles cold-start users and new articles by learning separate embeddings for users and items, enabling efficient retrieval via approximate nearest neighbor (ANN) search. This architecture supports real-time updates and scales to 10M+ users by decoupling user and item representations, allowing incremental training on new interactions without full retraining.
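The retrieval step can be sketched in plain Python. This is an illustration under strong assumptions: each "tower" is a single linear layer with hypothetical identity weights, the side features are invented, and a brute-force dot-product ranking stands in for an approximate nearest neighbor index.

```python
def tower(features, weights):
    """Stand-in tower: one linear layer mapping side features to an embedding."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]


def top_k(user_vec, item_vecs, k):
    """Brute-force retrieval by dot product (a real system would use an ANN index)."""
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical "trained" weights; identity matrices keep the arithmetic readable.
USER_W = [[1.0, 0.0], [0.0, 1.0]]
ITEM_W = [[1.0, 0.0], [0.0, 1.0]]

# Item side features: [sports_topic, finance_topic]
items = {
    "article_sports": tower([1.0, 0.0], ITEM_W),
    "article_finance": tower([0.0, 1.0], ITEM_W),
}
# A brand-new article has no interactions, but its side features still embed.
items["article_new_markets"] = tower([0.1, 0.9], ITEM_W)

# A cold-start user is embedded from profile features alone.
cold_start_user = tower([0.2, 0.8], USER_W)
recommended = top_k(cold_start_user, items, k=2)
print(recommended)
```

Because embeddings come from features rather than from a row or column of an interaction matrix, the new article and the new user can both be scored without retraining, which is the property the question is testing.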

Exam trap

Google Cloud often tests the misconception that matrix factorization (Option C) is sufficient for cold-start scenarios, but candidates miss that it requires retraining on new data and cannot generate embeddings for unseen users or items without side features.

How to eliminate wrong answers

Option A is wrong because DLRM is a deep learning model for click-through rate prediction that requires retraining on new data and does not natively handle cold-start items or users without additional feature engineering, making it less suitable for frequent updates and real-time recommendation. Option B is wrong because a contextual bandit algorithm for exploration only lacks exploitation of known user preferences, leading to suboptimal recommendations over time, and does not provide a full recommendation system. Option C is wrong because matrix factorization with collaborative filtering cannot handle cold-start users or new articles without retraining the entire model, as it relies on existing interaction matrices and lacks a mechanism for incorporating new entities in real time.

254

MCQmedium

A team uses Vertex AI Feature Store for online serving. They notice high latency during peak hours. They have configured the feature store with Bigtable as the online serving store. What is the most likely cause of the high latency?

A.The Bigtable cluster has too many nodes.

B.Feature data is stored as Avro files.

C.The online serving node count is insufficient for the QPS.

D.Feature values are not pre-cached.

AnswerC

Insufficient nodes cause queuing and higher latency under load.

Why this answer

Option C is correct because Vertex AI Feature Store uses Bigtable as the online serving store, and during peak hours, high query-per-second (QPS) loads can overwhelm the serving nodes if they are under-provisioned. Insufficient node count leads to queuing and increased latency, as Bigtable's performance scales linearly with the number of nodes for read throughput. The most direct remedy is to increase the number of Bigtable nodes to match the QPS demand.

Exam trap

The trap here is that candidates may confuse Bigtable's scaling model with caching solutions (like Redis or Memorystore) and incorrectly assume that pre-caching (Option D) is the fix, when in fact the root cause is insufficient node count for the QPS load.

How to eliminate wrong answers

Option A is wrong because having too many Bigtable nodes would reduce latency, not increase it, as more nodes provide higher read throughput and lower queue depth. Option B is wrong because Avro files are used for offline batch storage or export, not for the online serving store, which uses Bigtable's native storage format; Avro files do not affect online latency. Option D is wrong because Bigtable does not support pre-caching of feature values in the same way as an in-memory cache; the latency issue is due to insufficient node count, not a missing caching mechanism.

255

MCQeasy

Refer to the exhibit. What does this command do?

A.Trains a new BigQuery ML model

B.Exports the model to Cloud Storage

C.Evaluates the model's performance

D.Makes predictions using the model

AnswerD

ML.PREDICT generates predictions.

Why this answer

The command shown in the exhibit is a BigQuery ML prediction query (e.g., `SELECT * FROM ML.PREDICT(MODEL mydataset.mymodel, ...)`). This command uses a trained model to generate predictions on new input data, making option D correct. It does not train, export, or evaluate the model.

Exam trap

Google Cloud often tests the distinction between the four key BigQuery ML commands (`CREATE MODEL`, `ML.EVALUATE`, `ML.PREDICT`, `EXPORT MODEL`), and the trap here is confusing the prediction function with the evaluation function, especially when the exhibit shows a query that looks like it might be evaluating performance due to the presence of a model name and input data.

How to eliminate wrong answers

Option A is wrong because training a new BigQuery ML model uses the `CREATE MODEL` statement, not the `ML.PREDICT` function. Option B is wrong because exporting a model to Cloud Storage uses the `EXPORT MODEL` statement, not a prediction query. Option C is wrong because evaluating model performance uses the `ML.EVALUATE` function, which returns metrics like loss and accuracy, not predictions.

256

Multi-Selectmedium

Which TWO tools can be used to collaborate on feature definitions across teams?

Select 2 answers

A.Cloud Storage

B.Vertex AI Feature Store

C.Cloud Logging

D.Cloud Build

E.Data Catalog

AnswersB, E

Feature Store provides a central repository for features that teams can share.

Why this answer

Options B and E are correct. Vertex AI Feature Store (B) allows teams to share features, and Data Catalog (E) can catalog feature definitions. Cloud Storage (A) is not a collaboration tool.

Cloud Build (D) is for CI/CD. Cloud Logging (C) is for logs.

257

Multi-Selectmedium

Which TWO of the following are recommended methods to ensure data privacy when collaborating with external partners on ML projects?

Select 2 answers

A.Use Vertex AI Feature Store with access controls.

B.Use Cloud DLP to de-identify data before sharing.

C.Grant the partner project's service account direct access to the raw data in BigQuery.

D.Use Confidential VMs for training with sensitive data.

E.Share data via email.

AnswersB, D

DLP can redact, tokenize, or mask sensitive data.

Why this answer

Cloud DLP (Data Loss Prevention) is a recommended method to de-identify sensitive data before sharing it with external partners. It can automatically detect and mask, tokenize, or redact PII, PCI, or other sensitive elements, ensuring that only anonymized data leaves your environment. This aligns with the principle of least privilege and data minimization for external collaboration. Confidential VMs (option D) complement this by keeping sensitive data encrypted while it is in use during training.

Exam trap

Google Cloud often tests the misconception that access controls alone (like IAM or Feature Store ACLs) are sufficient for data privacy with external partners, but the key requirement is de-identification or encryption in use, not just authorization.

258

MCQeasy

A data science team uses Vertex AI Pipelines to build a training pipeline. They notice that when the pipeline fails due to a transient error in a component, the entire pipeline restarts from the beginning, taking a long time. What is the best practice to handle transient errors efficiently?

A.Use Vertex AI Experiment to track runs and manually restart failed components.

B.Configure Vertex AI Pipelines to automatically restart from the last successful state by enabling checkpointing.

C.Wrap the component code in a try-except block and retry indefinitely.

D.Set the component's retry count to 3 in the pipeline definition.

AnswerB

Checkpointing allows the pipeline to resume from the last successful state, minimizing rerun time.

Why this answer

Option B is correct because Vertex AI Pipelines supports checkpointing, which allows a pipeline to resume from the last successful state after a transient failure, avoiding a full restart. This is the most efficient approach for handling transient errors in a managed pipeline service, as it minimizes wasted compute time and resources.

Exam trap

The trap here is that candidates often confuse simple retry logic (Option D) with stateful checkpointing, assuming that retrying a component a few times is sufficient, but they miss that checkpointing preserves the pipeline's progress across failures, which is critical for long-running pipelines.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiment is designed for tracking and comparing runs, not for automating recovery from transient errors; manually restarting failed components defeats the purpose of automation and is inefficient. Option C is wrong because wrapping component code in a try-except block with indefinite retries can lead to infinite loops, resource exhaustion, and does not leverage the pipeline's orchestration capabilities for stateful recovery. Option D is wrong because setting a retry count of 3 in the pipeline definition only retries the failed component from scratch, not from the last successful state, which still wastes time if the component has long-running steps.

259

MCQmedium

A company deploys a batch prediction job on Vertex AI using a custom container. The job completes successfully, but the predictions are later found to be inaccurate. The ML engineer wants to set up monitoring to detect similar issues proactively. Which approach should the engineer take?

A.Use Cloud Monitoring to create a custom metric for prediction confidence and set an alert when confidence drops below 0.8.

B.Use Cloud Logging to export prediction requests and responses, then create a metric based on prediction count.

C.Export batch predictions to BigQuery, and use Vertex AI Model Monitoring to compare prediction distributions against a baseline.

D.Enable Cloud Audit Logs to track when the batch prediction job runs and analyze the logs for anomalies.

AnswerC

Model Monitoring detects drift by comparing predictions to a baseline.

Why this answer

Option C is correct because Vertex AI Model Monitoring can compare the distribution of batch prediction outputs (stored in BigQuery) against a baseline distribution to detect data drift or skew, which is the most direct way to proactively identify prediction inaccuracies. This approach monitors the statistical properties of predictions over time, catching shifts that could cause accuracy degradation even when the job runs successfully.
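To make "compare prediction distributions against a baseline" concrete, the sketch below scores drift between two sets of predictions with Jensen-Shannon divergence over histograms. It illustrates the idea only and is not the service's implementation; the helper names, bin edges, sample values, and the 0.1 alert threshold are all hypothetical.

```python
import math


def histogram(values, edges):
    """Normalized histogram; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


edges = [0.0, 0.5, 1.0]
baseline_preds = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7]  # e.g. from a validated run
current_preds = [0.6, 0.7, 0.8, 0.9, 0.9, 1.0]   # e.g. from the latest batch job

drift = js_divergence(histogram(baseline_preds, edges), histogram(current_preds, edges))
ALERT_THRESHOLD = 0.1  # hypothetical; tune per model
print(f"drift={drift:.3f}  alert={drift > ALERT_THRESHOLD}")
```

The job in the question "completed successfully", so only a check on the output distribution like this one would have flagged the shift.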

Exam trap

The trap here is that candidates assume monitoring prediction confidence or logging request counts is sufficient for detecting inaccuracies, but the PMLE exam specifically tests the concept of distribution drift monitoring as the correct proactive approach for batch prediction quality.

How to eliminate wrong answers

Option A is wrong because prediction confidence is a model-specific output (e.g., softmax probabilities) that may not exist for all models (e.g., regression models), and a fixed threshold of 0.8 is arbitrary; the question requires detecting inaccuracies proactively, not monitoring a single confidence score. Option B is wrong because exporting prediction requests/responses to Cloud Logging and creating a metric based on prediction count only tracks volume, not prediction quality or drift; count metrics cannot detect inaccuracies. Option D is wrong because Cloud Audit Logs track administrative actions (e.g., who ran the job), not the prediction data itself; analyzing audit logs for anomalies would not reveal prediction inaccuracies.

260

MCQhard

You are troubleshooting a Vertex AI endpoint for a customer. The exhibit shows the endpoint configuration. The customer reports that Model A is experiencing high latency during peaks. Model B runs fine. What is the most likely cause?

A.Model A is not autoscaling properly due to minReplicaCount=1.

B.Model A's machine type has insufficient CPU and GPU for the load.

C.Dedicated endpoint is disabled, causing resource sharing between models.

D.The traffic split is unevenly balanced, causing Model A to receive more requests.

AnswerB

Model A uses n1-standard-4 with 1 GPU, while Model B uses n1-standard-8 with 2 GPUs.

Why this answer

Model A has only one GPU and fewer CPU cores compared to Model B. During high traffic, Model A's resources become a bottleneck. The traffic split is equal, so both get similar load, but Model A's hardware is weaker.

261

MCQhard

A financial services company uses Vertex AI to build credit risk models. They have a team of 10 data scientists and 3 ML engineers. They use multiple notebooks in Vertex AI Workbench, storing data in Cloud Storage and BigQuery. The team reports that training jobs sometimes fail with 'Permission denied' errors when reading from certain Cloud Storage buckets. The error occurs intermittently and only for some users. The team uses custom service accounts for each user's notebook instance, but the permissions seem inconsistent. The IT security team has enforced that all service accounts must have least privilege. What is the most effective course of action to resolve the permission issues while maintaining security?

A.Create a single service account with broad permissions for all notebook instances and have users impersonate it.

B.Implement resource-level IAM policies on the specific Cloud Storage buckets used, and audit the existing service account permissions.

C.Grant all data scientists the 'Storage Admin' role on the project to ensure they can access any bucket.

D.Move all training data to BigQuery to avoid Cloud Storage permission issues.

AnswerB

Resource-level policies allow fine-grained control while maintaining least privilege.

Why this answer

Option B is correct because resource-level IAM policies on the specific Cloud Storage buckets grant exactly the permissions each service account needs without overgranting, and auditing the existing service account permissions identifies the gaps behind the intermittent failures. Option A reduces security by concentrating access in a single powerful service account. Option C grants far too broad access, violating least privilege. Option D changes the data architecture unnecessarily.

262

MCQmedium

A company needs to perform sentiment analysis on streaming social media data. Which architecture should they use?

A.Dataflow → Pub/Sub → Natural Language API → BigQuery

B.Pub/Sub → Cloud Functions → Natural Language API → Cloud Storage

C.Cloud Functions → Pub/Sub → Natural Language API → BigQuery

D.Pub/Sub → Dataflow → Natural Language API → BigQuery

AnswerD

This is the recommended architecture for streaming analytics.

Why this answer

Option D is correct because streaming social media data requires a scalable, durable ingestion pipeline. Pub/Sub ingests the stream, Dataflow processes it in real time (e.g., windowing, deduplication), the Natural Language API performs sentiment analysis, and BigQuery stores results for querying. This decouples ingestion from processing and storage, and lets Dataflow provide exactly-once processing and autoscaling.

Exam trap

Google Cloud often tests the misconception that Cloud Functions can replace Dataflow for streaming pipelines, but Cloud Functions lacks stream processing primitives (e.g., windowing, state management) and has a 9-minute timeout, making it unsuitable for continuous sentiment analysis.

How to eliminate wrong answers

Option A is wrong because it reverses the pipeline order: Pub/Sub is the ingestion buffer that Dataflow reads from, so placing Dataflow before Pub/Sub leaves the raw stream without a durable entry point. Option B is wrong because Cloud Functions is not designed for high-throughput streaming; it has a 9-minute timeout and no built-in stream processing (e.g., windowing), making it unsuitable for continuous social media data. Option C is wrong because Cloud Functions should not be the entry point for streaming data; it lacks Pub/Sub's durable buffering, and placing Pub/Sub after Cloud Functions risks losing messages before they are ever queued.
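To make the windowing step concrete, here is a minimal pure-Python sketch of the kind of tumbling-window aggregation a stream processor applies to scored messages. It does not use Apache Beam, Dataflow, or the Natural Language API; the `(timestamp, score)` tuples, the 60-second window, and the `tumbling_window_means` helper are all hypothetical stand-ins.

```python
from collections import defaultdict


def tumbling_window_means(events, window_seconds=60):
    """Group (timestamp_seconds, sentiment_score) pairs into fixed,
    non-overlapping windows and return the mean score per window start."""
    buckets = defaultdict(list)
    for timestamp, score in events:
        # Every event belongs to exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(score)
    return {
        start: sum(scores) / len(scores)
        for start, scores in sorted(buckets.items())
    }


# Hypothetical already-scored messages: (seconds since start, sentiment).
events = [(5, 0.8), (42, -0.2), (61, 0.5), (119, 0.1), (130, -0.9)]
window_means = tumbling_window_means(events)
```

A function invoked once per message has no natural place to hold the per-window buckets between invocations, which is the state-management gap the explanation refers to.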

263

MCQeasy

When distributing training across multiple workers using Vertex AI Training, how should the team share the training dataset?

A.Copy the dataset to each worker's local disk

B.Use NFS

C.Use Cloud Storage

D.Use Google Drive

AnswerC

Cloud Storage provides scalable, shared access to training data.

Why this answer

Vertex AI Training workers need shared, concurrent read access to the training dataset without manual replication. Cloud Storage (GCS) is the recommended and fully integrated solution because it provides a distributed, highly available object store that all workers can read from in parallel via the `tf.io.gfile` API or GCS connector, eliminating data duplication and ensuring consistency across the cluster.

Exam trap

The trap here is that candidates confuse 'shared storage' with 'local copies' or 'user-friendly sync tools,' assuming a self-managed NFS server or Drive is the natural choice for distributed ML, when Cloud Storage is the recommended cloud-native object store for scalability and fault tolerance.

How to eliminate wrong answers

Option A is wrong because copying the dataset to each worker's local disk introduces data duplication, increases startup latency, and risks inconsistency if workers are preempted or auto-scaled; Vertex AI does not manage local disk replication. Option B is wrong because NFS (Network File System) requires provisioning and maintaining a file server, which adds operational overhead, a potential single point of failure, and a throughput bottleneck that GCS avoids with its native parallel read capabilities. Option D is wrong because Google Drive is a user-facing file sync service, not designed for high-throughput, concurrent access by distributed training jobs; it lacks the necessary IAM integration, access controls, and performance guarantees for ML workloads.

264

MCQhard

A financial services company uses Vertex AI to deploy multiple models for fraud detection. The ML team has set up a CI/CD pipeline using Cloud Build and Cloud Deploy. The pipeline builds a custom container with the trained model, pushes it to Artifact Registry, and deploys it to a Vertex AI Endpoint. Recently, a new regulation requires that all model deployments be audited and approved by the compliance team before going live. The compliance team wants to review the model's evaluation metrics and approve the deployment via a ticketing system. Currently, the CI/CD pipeline automatically deploys after the container is built. The team needs to implement a gating process without slowing down the development cycle. What should they do?

A.Use Cloud Composer to orchestrate the deployment and add a sensor that waits for approval from the ticketing system via a custom operator.

B.Use Cloud Build's built-in approval gate feature to require compliance team sign-off before deployment.

C.Modify the CI/CD pipeline to use Cloud Deploy's approval gate feature, requiring a manual approval from the compliance team before the deployment step.

D.Store the model artifacts in Cloud Storage and have the compliance team deploy manually using the gcloud command.

AnswerC

Cloud Deploy supports manual approval gates integrated with the pipeline.

Why this answer

Option C is correct because Cloud Deploy provides a native approval gate feature that can be inserted into a delivery pipeline to require manual sign-off before a deployment proceeds. This allows the compliance team to review model evaluation metrics and approve via a ticketing system without modifying the CI/CD pipeline's build process, thus maintaining development velocity. The approval gate pauses the deployment at a specific stage, waiting for an external approval signal, which integrates seamlessly with Cloud Deploy's rollout management.

Exam trap

The trap here is confusing Cloud Build's approval gates (which operate at the build stage) with Cloud Deploy's approval gates (which operate at the deployment stage), leading candidates to incorrectly select Option B despite it not addressing the deployment gating requirement.

How to eliminate wrong answers

Option A is wrong because Cloud Composer (based on Apache Airflow) is an orchestration tool for workflows, but adding a sensor for ticketing approval introduces unnecessary complexity and overhead, slowing down the development cycle compared to a native approval gate. Option B is wrong because Cloud Build's built-in approval gate feature is designed for build-level approvals (e.g., before pushing an image), not for deployment-stage gating; it would require restructuring the pipeline to pause the build process, which is not aligned with the requirement to gate deployment after the container is built. Option D is wrong because manual deployment via gcloud commands bypasses automation entirely, reintroducing delays and human error, contradicting the goal of not slowing down the development cycle.

265

MCQhard

A financial institution needs to deploy a fraud detection model with strict latency <100ms per prediction and high throughput (1000 predictions/sec). The model is a deep neural network. Which architecture on Google Cloud meets these requirements?

A.Deploy the model on AI Platform Training with a single large VM

B.Deploy the model as a Cloud Function triggered by Cloud Pub/Sub

C.Use Vertex AI Batch Prediction with a fixed number of machines

D.Use Vertex AI Prediction with autoscaling enabled and GPU machine types

AnswerD

Vertex AI Prediction provides real-time endpoints with autoscaling and GPU support for low latency and high throughput.

Why this answer

Vertex AI Prediction with autoscaling and GPU machine types is correct because it provides low-latency online serving with autoscaling to handle high throughput (1000 predictions/sec) while keeping latency under 100ms. GPUs accelerate deep neural network inference, and autoscaling ensures resources match demand without over-provisioning.

Exam trap

Google Cloud often tests the distinction between batch and online prediction services, where candidates mistakenly choose batch prediction for real-time requirements because they focus on throughput without considering latency constraints.

How to eliminate wrong answers

Option A is wrong because AI Platform Training is designed for model training, not real-time serving, and a single large VM cannot guarantee sub-100ms latency under high throughput due to resource contention and lack of autoscaling. Option B is wrong because Cloud Functions have a maximum timeout of 9 minutes (540 seconds) and are not optimized for high-throughput, low-latency ML inference; they also lack GPU support, making deep neural network inference too slow. Option C is wrong because Vertex AI Batch Prediction is for asynchronous, offline predictions on large datasets, not real-time serving with strict latency requirements; it processes jobs in batches and cannot meet sub-100ms per prediction.

266

Multi-Selecteasy

Which TWO of the following are low-code machine learning solutions on Google Cloud?

Select 2 answers

A.TensorFlow

B.scikit-learn

C.PyTorch

D.BigQuery ML

E.Vertex AI AutoML

AnswersD, E

BigQuery ML allows creating models using SQL.

Why this answer

BigQuery ML (D) is a low-code ML solution because it allows users to create, train, and deploy machine learning models using standard SQL queries directly within BigQuery, eliminating the need for custom coding in Python or other programming languages. Vertex AI AutoML (E) is also low-code as it provides a graphical interface and automated pipeline to train high-quality models with minimal manual intervention, handling feature engineering, model selection, and hyperparameter tuning automatically.

Exam trap

Google Cloud often tests the distinction between general-purpose ML frameworks (like TensorFlow, scikit-learn, PyTorch) that require significant coding versus managed services (BigQuery ML, AutoML) that provide low-code or no-code interfaces, leading candidates to mistakenly classify any ML tool on Google Cloud as low-code.

267

Multi-Selectmedium

Which TWO actions are recommended for collaborating on machine learning models using Vertex AI Model Registry?

Select 2 answers

A.Use Cloud Storage object labels to store model descriptions.

B.Use version aliases such as 'champion' and 'challenger' to manage model lifecycle.

C.Deploy all model versions to a single endpoint for comparison.

D.Attach custom metadata (e.g., training dataset, hyperparameters) to each model version.

E.Create a separate model entry for each training run.

AnswersB, D

Aliases enable controlled promotion of models.

Why this answer

Option B is correct because Vertex AI Model Registry supports version aliases like 'champion' and 'challenger' to designate which model version should serve as the production candidate and which is under evaluation, enabling controlled lifecycle management and A/B testing without manual version tracking.

Exam trap

Google Cloud often tests the distinction between a single model entry with multiple versions versus separate model entries per run, and candidates mistakenly think separate entries provide better traceability, but the registry's versioning and alias system is specifically designed to avoid that fragmentation.
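The alias idea can be sketched without any cloud dependency. The class below is a hypothetical in-memory stand-in for one registry entry, not the Vertex AI SDK; the model name, version IDs, and metadata fields are invented for illustration.

```python
class ModelEntry:
    """Hypothetical stand-in for a single registry entry with many versions."""

    def __init__(self, name):
        self.name = name
        self.versions = {}  # version id -> custom metadata
        self.aliases = {}   # alias -> version id

    def add_version(self, version_id, metadata):
        self.versions[version_id] = metadata

    def set_alias(self, alias, version_id):
        if version_id not in self.versions:
            raise KeyError(f"unknown version {version_id!r}")
        # Reassigning an alias moves it; the versions themselves are untouched.
        self.aliases[alias] = version_id

    def resolve(self, alias):
        return self.versions[self.aliases[alias]]


entry = ModelEntry("fraud-detector")
entry.add_version("1", {"auc": 0.91, "training_dataset": "2024-01"})
entry.add_version("2", {"auc": 0.93, "training_dataset": "2024-02"})
entry.set_alias("champion", "1")
entry.set_alias("challenger", "2")

# Promotion: point 'champion' at whatever 'challenger' currently names.
entry.set_alias("champion", entry.aliases["challenger"])
```

Consumers that look up `champion` pick up the promoted version without changing their code, and both versions stay under one entry with their metadata attached.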

268

MCQeasy

A company wants to classify support ticket text into categories. They have labeled historical tickets. Which Google Cloud service allows them to train a custom classification model with no code?

A.Vertex AI Matching Engine

B.AutoML Natural Language

C.Cloud Natural Language API

D.Document AI

AnswerB

Correct: No-code custom text classification.

Why this answer

AutoML Natural Language (now part of Vertex AI) is the correct service because it enables users to train custom text classification models using labeled data without writing any code. It provides a no-code interface for uploading datasets, training models, and evaluating performance, making it ideal for classifying support ticket text into custom categories.

Exam trap

The trap here is that candidates confuse the pre-trained Cloud Natural Language API (which requires no training but cannot be customized) with AutoML Natural Language (which requires labeled data but allows custom categories), leading them to select Option C incorrectly.

How to eliminate wrong answers

Option A is wrong because Vertex AI Matching Engine is designed for vector similarity search and embeddings, not for training custom classification models with labeled text data. Option C is wrong because Cloud Natural Language API is a pre-trained API that offers sentiment analysis, entity extraction, and syntax analysis, but it cannot be trained on custom labeled data for custom categories. Option D is wrong because Document AI is specialized for document processing (e.g., OCR, form parsing, invoice extraction) and is not intended for general text classification from labeled ticket data.

269

MCQhard

Refer to the exhibit. An alert policy is configured to trigger when prediction latency exceeds 500 ms for 5 consecutive minutes. The team is experiencing many false positive alerts during brief latency spikes. Which adjustment would most effectively reduce false positives while still detecting prolonged latency issues?

A.Change the comparison to less than

B.Add a condition that CPU utilization is also high

C.Increase the duration to 30 minutes

D.Increase the threshold to 1000 ms

AnswerC

A longer duration means the condition must persist for 30 minutes, filtering out brief spikes while still catching sustained high latency.

Why this answer

Increasing the duration from 5 to 30 minutes (Option C) directly addresses the problem of false positives from brief latency spikes by requiring the latency to exceed 500 ms for a longer continuous period before triggering an alert. This ensures that only sustained, prolonged latency issues—not transient spikes—activate the policy, aligning with the goal of detecting genuine degradation while ignoring noise.

Exam trap

Google Cloud often tests the distinction between threshold and duration adjustments, trapping candidates who think raising the threshold (Option D) is the only way to reduce false positives, when in fact increasing the evaluation window is more precise for filtering out transient spikes without compromising detection of sustained issues.

How to eliminate wrong answers

Option A is wrong because changing the comparison to 'less than' would invert the logic, triggering alerts when latency is below 500 ms, which is the opposite of detecting high latency and would generate false positives for normal or low-latency conditions. Option B is wrong because adding a condition that CPU utilization is also high introduces an unnecessary dependency that may miss prolonged latency issues caused by other factors (e.g., network bottlenecks, memory pressure, or I/O wait), and it does not address the core problem of brief latency spikes. Option D is wrong because increasing the threshold to 1000 ms would allow sustained latency between 500 ms and 1000 ms to go undetected, failing to capture prolonged issues that still violate the original 500 ms requirement, and it does not filter out brief spikes.
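The difference between raising the threshold and lengthening the duration can be checked with a small simulation. This is a simplified model of a duration-based alert condition, assuming one latency sample per minute; the `alert_fires` helper and the sample series are hypothetical and do not call Cloud Monitoring.

```python
def alert_fires(latencies_ms, threshold_ms, duration_minutes):
    """Return True if latency stays above the threshold for at least
    `duration_minutes` consecutive one-minute samples."""
    consecutive = 0
    for latency in latencies_ms:
        consecutive = consecutive + 1 if latency > threshold_ms else 0
        if consecutive >= duration_minutes:
            return True
    return False


# One sample per minute: a 6-minute spike vs. a 35-minute incident at 800 ms.
brief_spike = [200] * 10 + [800] * 6 + [200] * 10
sustained = [200] * 10 + [800] * 35 + [200] * 10

fires_5min_on_spike = alert_fires(brief_spike, 500, 5)
fires_30min_on_spike = alert_fires(brief_spike, 500, 30)
fires_30min_on_sustained = alert_fires(sustained, 500, 30)
fires_1000ms_on_sustained = alert_fires(sustained, 1000, 5)
```

In this toy data the 30-minute duration ignores the brief spike yet still catches the sustained incident, while the 1000 ms threshold misses an incident that sits at 800 ms.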

270

MCQeasy

A company needs to serve a model for real-time predictions with a strict latency SLA of 100ms at the 99th percentile. The model is lightweight and traffic patterns are highly variable with occasional spikes. Which deployment strategy best meets the SLA while controlling cost?

A.Deploy the model as a Cloud Run service with autoscaling to zero.

B.Deploy to Vertex AI Endpoint with manual scaling and a fixed number of replicas.

C.Use Vertex AI Batch Prediction.

D.Deploy to Vertex AI Endpoint with min_replica_count=3 and autoscaling enabled.

AnswerD

Min replicas provide baseline capacity to absorb spikes, and autoscaling adds replicas as needed.

Why this answer

Option D is correct because setting a minimum number of replicas keeps baseline capacity warm to absorb the start of a spike without cold-start delays, while autoscaling handles larger spikes. Option A is wrong because a Cloud Run service that scales to zero incurs cold starts after idle periods, which can breach a 100 ms p99 SLA. Option B is wrong because a fixed replica count cannot follow variable traffic, so it is either over-provisioned or under-provisioned. Option C is wrong because batch prediction is not real-time.
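How a replica floor interacts with autoscaling can be sketched with simple arithmetic. The function below is a deliberately simplified model that assumes each replica sustains a fixed request rate; real autoscaling reacts to utilization metrics with some delay, and the capacity and traffic figures here are hypothetical.

```python
import math


def replicas_needed(requests_per_second, per_replica_rps,
                    min_replicas, max_replicas):
    """Replica count for a given load, clamped to the configured bounds."""
    demand = math.ceil(requests_per_second / per_replica_rps)
    return max(min_replicas, min(max_replicas, demand))


# Hypothetical figures: each replica handles 50 requests/sec within the SLA.
quiet = replicas_needed(20, 50, min_replicas=3, max_replicas=10)
spike = replicas_needed(420, 50, min_replicas=3, max_replicas=10)
overload = replicas_needed(2000, 50, min_replicas=3, max_replicas=10)

# Capacity that is already warm when a spike begins.
warm_capacity_rps = 3 * 50
```

With a floor of three replicas, up to 150 requests/sec can be served before any new replica has to start, which is the headroom that protects the tail latency during the first moments of a spike.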

271

MCQmedium

Your team is using Vertex AI Feature Store for online predictions. You notice that feature values for some entities are missing in production, leading to failed predictions. Upon investigation, you find that the ingestion pipeline has been failing intermittently. What is the best immediate course of action to prevent prediction failures?

A.Configure default values for missing features in the feature store so that the model can fall back on them.

B.Set up monitoring alerts on the ingestion pipeline to get notified of failures.

C.Change the prediction request to ignore missing features.

D.Manually re-ingest all missing features by running the ingestion pipeline again.

AnswerA

Ensures predictions can be made even when features are not available.

Why this answer

Option A is correct because default values ensure predictions can still be made when features are missing. Option B is wrong because monitoring alone notifies the team but does not prevent failures. Option C is wrong because ignoring missing features does not address the missing values the model still expects. Option D is wrong because manually re-ingesting features takes time and does not fix the ingestion issue.
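The fallback idea can be illustrated with a few lines of Python. This is a sketch of serving-side logic under assumed names, not a Feature Store API: `FEATURE_DEFAULTS`, the feature names, and the `with_defaults` helper are hypothetical.

```python
# Hypothetical defaults, e.g. taken from training-set medians.
FEATURE_DEFAULTS = {
    "avg_order_value": 0.0,
    "days_since_last_login": 30,
    "support_tickets": 0,
}


def with_defaults(fetched, defaults):
    """Return a complete feature dict, substituting the default for any
    feature that is absent or None in the fetched values."""
    return {
        name: fetched[name] if fetched.get(name) is not None else default
        for name, default in defaults.items()
    }


# One feature arrived, one is null, one is missing entirely.
fetched = {"avg_order_value": 42.5, "days_since_last_login": None}
resolved = with_defaults(fetched, FEATURE_DEFAULTS)
```

The model always receives a complete feature vector, so a failed ingestion run degrades prediction quality for the affected entities instead of failing the request.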

272

Multi-Selectmedium

A company uses Vertex AI for AutoML training. Which THREE are best practices for managing model versions?

Select 3 answers

A.Deploy each model version to a separate endpoint

B.Use Vertex AI Model Registry to version models

C.Use evaluation metrics to compare versions

D.Use labels to tag models for tracking

E.Automatically delete old versions after 30 days

AnswersB, C, D

Correct: Centralized model versioning.

Why this answer

Vertex AI Model Registry is the central repository for managing and versioning models, allowing you to track iterations, compare performance, and control deployments. It provides a structured way to organize models, roll back to previous versions if needed, and maintain lineage for compliance and reproducibility.

Exam trap

The trap here is that candidates may think deploying each version to a separate endpoint is necessary for isolation, but Vertex AI's traffic splitting on a single endpoint is the correct and cost-effective approach for managing multiple model versions.

273

Multi-Selectmedium

A machine learning engineer is monitoring a deployed churn prediction model that has shown a gradual decline in accuracy over the past month. The engineer wants to diagnose the root cause of the performance degradation. Which TWO actions should the engineer take? (Choose two.)

Select 2 answers

A.Increase the model's learning rate and fine-tune it on the latest data.

B.Immediately retrain the model using all available historical data to improve accuracy.

C.Deploy a second model in parallel to compare predictions.

D.Use Vertex AI Model Monitoring to detect data drift by comparing the distribution of recent input features against the training data distribution.

E.Monitor the model's prediction accuracy by comparing recent predictions against newly collected ground truth labels.

AnswersD, E

Detecting data drift helps identify if the input distribution has changed, which often causes prediction drift.

Why this answer

Option D is correct because Vertex AI Model Monitoring is specifically designed to detect data drift by comparing the distribution of recent input features against the training data distribution. This allows the engineer to identify if the gradual decline in accuracy is caused by changes in the input data, which is a common root cause for model performance degradation over time.

Exam trap

The trap here is that candidates often confuse reactive retraining (Option B) with diagnostic monitoring, failing to recognize that the first step in troubleshooting performance degradation is to identify the root cause through drift detection and ground truth comparison, not to immediately modify or retrain the model.
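Drift detection boils down to comparing two distributions of the same feature. The sketch below computes the Jensen-Shannon divergence, one common choice for such a comparison, between a training histogram and a serving histogram; the bucket proportions and the 0.05 alert threshold are hypothetical, and the code does not call Vertex AI Model Monitoring.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete
    distributions given as equal-length lists of probabilities."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    midpoint = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, midpoint) + 0.5 * kl(q, midpoint)


# Share of customers per bucket of one feature (hypothetical values).
training = [0.5, 0.3, 0.2]
serving_stable = [0.5, 0.3, 0.2]
serving_shifted = [0.2, 0.3, 0.5]

DRIFT_THRESHOLD = 0.05  # assumed alerting threshold

stable_score = js_divergence(training, serving_stable)
shifted_score = js_divergence(training, serving_shifted)
drift_detected = shifted_score > DRIFT_THRESHOLD
```

A score near zero means the serving inputs still look like the training data; a score above the threshold points to input drift as a candidate root cause, which the ground-truth comparison in Option E can then confirm.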

274

MCQhard

A data science team uses Vertex AI Pipelines to orchestrate ML training. They notice that some pipeline runs are failing because of inconsistent data schemas. They want to enforce schema validation as a gate before the training step executes. Which approach should they implement?

A.Use Cloud Dataflow to validate schema during data ingestion before the pipeline starts.

B.Use BigQuery schema enforcement when importing data.

C.Add a pipeline component that runs schema validation using the TensorFlow Data Validation library.

D.Use TFX ExampleGen with schema_gen to automatically generate and enforce schemas.

AnswerC

A custom component using TFDV can validate schema inside the pipeline and fail early if mismatched.

Why this answer

Option C is correct because the TensorFlow Data Validation (TFDV) library is specifically designed for ML pipeline schema validation. By adding a custom pipeline component that uses TFDV, the team can validate incoming data schemas against a predefined schema directly within the Vertex AI Pipelines orchestration, acting as a gate before the training step executes. This approach integrates seamlessly with the pipeline's component-based architecture and provides detailed anomaly reports.

Exam trap

Google Cloud often tests the distinction between tools that are part of the TFX ecosystem (like ExampleGen) versus standalone libraries (like TFDV) that can be used independently in custom pipeline components, leading candidates to choose D because they associate schema validation with TFX without realizing the integration requirements.

How to eliminate wrong answers

Option A is wrong because Cloud Dataflow is a batch/stream processing service for data transformation, not a schema validation tool; validating schema during ingestion outside the pipeline does not enforce the gate within the pipeline orchestration. Option B is wrong because BigQuery schema enforcement only validates data at the table level during import, but it is not a pipeline component that can be placed as a gate before a training step in Vertex AI Pipelines. Option D is wrong because TFX ExampleGen with schema_gen is part of the TFX framework, which is not directly compatible with Vertex AI Pipelines' custom component model; it would require significant adaptation and does not provide a simple gate component within the pipeline.
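The gate pattern itself is simple: validate, and raise if anything is off so the orchestrator never starts the training step. The sketch below uses a hand-written checker as a stand-in for TFDV so that it runs without extra dependencies; `EXPECTED_SCHEMA`, the column names, and both helper functions are hypothetical.

```python
# Hypothetical schema: column name -> expected Python type.
EXPECTED_SCHEMA = {"customer_id": str, "tenure_months": int, "monthly_spend": float}


def find_schema_anomalies(rows, schema):
    """Return human-readable anomalies for missing, unexpected,
    or wrongly typed columns."""
    anomalies = []
    for index, row in enumerate(rows):
        for column in sorted(schema.keys() - row.keys()):
            anomalies.append(f"row {index}: missing column '{column}'")
        for column in sorted(row.keys() - schema.keys()):
            anomalies.append(f"row {index}: unexpected column '{column}'")
        for column, expected in schema.items():
            if column in row and not isinstance(row[column], expected):
                actual = type(row[column]).__name__
                anomalies.append(
                    f"row {index}: '{column}' is {actual}, expected {expected.__name__}"
                )
    return anomalies


def validation_gate(rows, schema):
    """Raise on any anomaly so a downstream training step never runs."""
    anomalies = find_schema_anomalies(rows, schema)
    if anomalies:
        raise ValueError("Schema validation failed: " + "; ".join(anomalies))
    return True


good_rows = [{"customer_id": "c1", "tenure_months": 12, "monthly_spend": 49.9}]
bad_rows = [{"customer_id": "c2", "tenure_months": "12"}]
bad_anomalies = find_schema_anomalies(bad_rows, EXPECTED_SCHEMA)
```

Wrapped in a pipeline component, a raised exception marks the step as failed, and the training step that depends on it is not executed.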

275

MCQhard

You manage a multi-tenant serving system on Vertex AI Prediction where multiple models are deployed in a single endpoint using model versioning. One particular model version (v2) is consuming excessive resources, causing latency spikes for other versions. You need to isolate this model to prevent interference. The models are all in TensorFlow SavedModel format. What is the best approach?

A.Shard the models across multiple replicas using a custom routing logic in the container.

B.Set resource limits on the container using Kubernetes resource requests/limits, but Vertex AI Prediction does not support that.

C.Use Vertex AI Model Registry to deploy v2 to a dedicated endpoint and update the model alias.

D.Create a separate endpoint for v2 and redirect traffic using a load balancer.

AnswerC

Dedicated endpoint ensures resource isolation.

Why this answer

Option C is correct because deploying v2 to its own dedicated endpoint provides full resource isolation. Option A is complex and error-prone. Option B is not possible in Vertex AI Prediction, as the option itself concedes. Option D also isolates v2 but is less direct, because it adds a load balancer that must be configured and maintained in front of the endpoints.

276

MCQhard

A company needs to maintain an audit trail of model changes for compliance. Multiple teams will be updating models. What is the best approach to track who created, modified, or deployed each model version?

A.Enable Cloud Storage audit logs and require all model files to be stored in a bucket

B.Use Cloud Logging to collect logs from all services and search for model names

C.Use Vertex AI Experiments and Metadata to track model lineage and audit logs

D.Ask team members to maintain a shared spreadsheet of changes

AnswerC

Vertex AI provides built-in audit capabilities with user attribution and metadata.

Why this answer

Option C is correct because Vertex AI automatically logs metadata (including user identity) via Cloud Audit Logs and ML Metadata. Option A is wrong because Cloud Storage logs only show object-level access, not model-specific actions. Option B is wrong because Cloud Logging alone does not correlate events to model versions. Option D is wrong because manual logging is error-prone.

277

MCQmedium

A company deploys a custom ML model on Vertex AI to predict customer churn. The model retrains weekly, and predictions are served via a Vertex AI endpoint. After a recent retraining, the monitoring dashboard shows a sudden increase in prediction requests but a decrease in predicted churn probabilities. The model's accuracy on the validation set remains stable. What is the most likely cause of the observed behavior?

A.A training-serving skew exists between the training pipeline and the serving endpoint.

B.Concept drift has occurred, changing the relationship between features and churn.

C.The incoming data distribution has changed, e.g., due to a new marketing campaign attracting different customers.

D.Data leakage during training caused the model to overfit to historical patterns.

AnswerC

This is covariate shift; the model sees a different input population than it was trained on, which lowers the predicted churn probabilities.

Why this answer

Option C is correct because a sudden increase in prediction requests alongside a decrease in predicted churn probabilities, while validation accuracy remains stable, indicates a shift in the incoming data distribution (covariate shift). This is typical when a new marketing campaign attracts a different customer segment that inherently has lower churn risk. The model itself hasn't degraded; it's simply seeing a different population than it was trained on, which changes the base rate of churn in the live traffic.

Exam trap

Google Cloud often tests the distinction between covariate shift (data distribution change) and concept drift (relationship change), trapping candidates who assume any change in predictions must be due to model degradation or data leakage.

How to eliminate wrong answers

Option A is wrong because training-serving skew refers to a mismatch in feature preprocessing or data format between training and serving, which would typically cause a drop in accuracy or anomalous predictions, not a stable validation accuracy with a shift in prediction distribution. Option B is wrong because concept drift would change the relationship between features and the target (churn), leading to a decline in model accuracy on the validation set, which is explicitly stated as stable. Option D is wrong because data leakage during training would cause overfitting to historical patterns, resulting in poor generalization and a drop in validation accuracy, not a stable accuracy with a shift in prediction probabilities.
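A toy calculation shows how the average prediction can fall while the model itself is unchanged. The "model" below is a hypothetical fixed lookup from plan type to churn probability, and the before/after customer mixes are invented; only the population changes between the two calls.

```python
# Hypothetical fixed model: predicted churn probability by plan type.
CHURN_BY_PLAN = {"monthly": 0.40, "annual": 0.10}


def mean_predicted_churn(plans):
    """Average predicted churn probability over a batch of requests."""
    return sum(CHURN_BY_PLAN[plan] for plan in plans) / len(plans)


# Before the campaign: mostly monthly-plan customers.
before_campaign = ["monthly"] * 8 + ["annual"] * 2
# After the campaign: more traffic, dominated by annual-plan customers.
after_campaign = ["monthly"] * 8 + ["annual"] * 12

mean_before = mean_predicted_churn(before_campaign)
mean_after = mean_predicted_churn(after_campaign)
```

Request volume grows and the mean predicted probability drops from 0.34 to 0.22, yet every individual prediction is exactly what the model would have produced before, which is why validation accuracy stays stable.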

278

MCQmedium

A team wants to deploy two versions of a model (v1 and v2) on Vertex AI Endpoint to conduct an A/B test. They need to split traffic so that 10% of requests go to v2. Which configuration achieves this?

A.Deploy both versions on the same endpoint and use the `traffic_split` parameter to allocate 90% to v1 and 10% to v2.

B.Configure a global load balancer in front of two endpoints and set the weight.

C.Create two separate endpoints, one for each version, and have the client randomly select the endpoint.

D.Deploy v2 as a canary deployment and set the canary rollout to 10% in Cloud Deployment Manager.

AnswerA

Vertex AI endpoints support traffic splitting between deployed models.

Why this answer

Option A is correct because Vertex AI Endpoints support traffic splitting by allocating a percentage of requests to each deployed model. Option B is wrong because routing at a load balancer is not necessary when the endpoint can split traffic itself. Option C is wrong because separate endpoints cannot share a traffic split, so the ratio would depend on client-side logic. Option D is wrong because a canary deployment rolls out gradually rather than holding a fixed split.
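The effect of a 90/10 split can be illustrated with a weighted random router. This is a local simulation only: Vertex AI performs the split server-side, and the `route_requests` helper, the seed, and the request count are hypothetical.

```python
import random
from collections import Counter


def route_requests(traffic_split, num_requests, seed=0):
    """Assign each simulated request to a model version in proportion
    to its traffic percentage and return the per-version counts."""
    rng = random.Random(seed)
    versions = list(traffic_split)
    weights = [traffic_split[version] for version in versions]
    return Counter(rng.choices(versions, weights=weights, k=num_requests))


counts = route_requests({"v1": 90, "v2": 10}, num_requests=10_000)
v2_share = counts["v2"] / 10_000
```

Over many requests the share routed to v2 settles close to 10%, with no coordination required from clients.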

279

MCQeasy

A model deployed on Vertex AI Endpoints shows increasing prediction latency. What is the most scalable way to reduce latency?

A.Switch to a larger machine type

B.Enable autoscaling with min nodes increased

C.Use batch prediction instead

D.Deploy multiple model versions

AnswerB

Autoscaling adds nodes during high load, reducing latency.

Why this answer

Increasing the minimum number of nodes in autoscaling ensures that a baseline of compute capacity is always ready to handle requests, reducing cold-start latency. This is the most scalable approach because it allows the endpoint to dynamically scale up during traffic spikes while maintaining a floor of pre-warmed instances, directly addressing prediction latency without over-provisioning.

Exam trap

The trap here is that candidates confuse 'scalability' with 'raw performance' and choose a larger machine type (A), not realizing that horizontal scaling with pre-warmed nodes is more cost-effective and elastic for reducing latency under variable load.

How to eliminate wrong answers

Option A is wrong because switching to a larger machine type (e.g., more vCPUs or memory) can reduce per-request latency but is not scalable—it increases cost linearly and does not handle traffic bursts efficiently, as it still relies on a single node's capacity. Option C is wrong because batch prediction is designed for offline, asynchronous processing of large datasets and does not reduce real-time prediction latency; it actually increases end-to-end time for individual requests. Option D is wrong because deploying multiple model versions does not inherently reduce latency; it adds routing overhead and does not address compute capacity or cold starts, and is intended for A/B testing or gradual rollouts, not performance optimization.

280

Multi-Selectmedium

Which THREE practices improve collaboration when using Cloud Composer for ML pipelines?

Select 3 answers

A.Keep all pipeline logic in a single large DAG for simplicity.

B.Use a shared Cloud Storage bucket for intermediate artifacts with appropriate permissions.

C.Store DAGs in a version-controlled repository and use CI/CD to deploy them.

D.Embed service account keys directly in DAG code for authentication.

E.Use Airflow variables and connections to parameterize DAGs.

AnswersB, C, E

Facilitates handoff between pipeline steps and teams.

Why this answer

Option B is correct because Cloud Composer workflows often require sharing intermediate data (e.g., transformed datasets, model checkpoints) across multiple DAGs or team members. A shared Cloud Storage bucket with fine-grained IAM permissions enables secure, centralized artifact exchange without duplicating data or exposing it to unauthorized users. This practice avoids hard-coded paths and ensures that all pipeline stages can reliably access the same artifacts, which is critical for reproducibility and collaboration in ML pipelines.

Exam trap

Google Cloud often tests the misconception that a single monolithic DAG simplifies collaboration, when in fact it creates bottlenecks and merge conflicts; the trap is that candidates confuse 'simplicity' with 'ease of collaboration' without considering modularity and CI/CD practices.

281

MCQhard

A recommendation system model is updated daily via a retraining pipeline. After each update, the online prediction latency increases significantly for about 30 minutes before returning to normal. What is the most likely cause and solution?

A.The Vertex AI endpoint autoscaling policy is too aggressive, causing scale-down during retraining.

B.The retraining pipeline runs on a GKE cluster that shares resources with the serving endpoint.

C.The model is being switched from CPU to GPU at deployment.

D.The new model version causes cold start in the serving infrastructure; pre-warm the model by sending a dummy request after deployment.

AnswerD

Pre-warming ensures the model is loaded into memory before serving real traffic.

Why this answer

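Option D is correct because switching to a new model version causes a cold start in the serving infrastructure: the first requests are slow while the model loads into memory and caches warm up. Sending dummy requests right after deployment pre-warms the model before real traffic arrives. Option B is wrong because GKE is not directly involved in the scenario. Option C is wrong because nothing indicates a switch from CPU to GPU at deployment.

Option A is wrong because the issue is not autoscaling or resource contention; the slowdown follows each model update and clears on its own.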
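
The pre-warming step can be sketched as follows. This is a minimal illustration rather than a Vertex AI API: `predict_fn` is a hypothetical callable standing in for whatever client call reaches the newly deployed model, and the default of five warm-up requests is an arbitrary assumption.

```python
import time
from typing import Any, Callable, List


def warm_up(predict_fn: Callable[[Any], Any],
            dummy_instance: Any,
            n_requests: int = 5) -> List[float]:
    """Send a few dummy requests and return each request's latency in seconds.

    predict_fn is a hypothetical stand-in for the real prediction call.
    """
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict_fn(dummy_instance)  # the response is discarded on purpose
        latencies.append(time.perf_counter() - start)
    return latencies
```

Running this as the last step of the deployment job, before traffic is shifted to the new version, keeps the slow first requests away from real users. The returned latencies can be logged to confirm that they settle.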

282

MCQhard

A team uses Vertex AI Pipelines to automate training and deployment. They need to ensure that only models that pass a set of quality checks (e.g., accuracy > 0.9, latency < 100ms) are deployed to production. How should they implement this?

A.Manually review each model before promotion

B.Use Cloud Functions to deploy only if accuracy is reported in BigQuery

C.Set up Cloud Build triggers to deploy every model version

D.Add a Pipeline component that evaluates metrics and uses a conditional gate to deployment

AnswerD

Pipelines support conditional execution based on component outputs.

Why this answer

Vertex AI Pipelines can include custom components that evaluate metrics and proceed to deployment only if the thresholds are met. Option A is manual, Option B lacks conditional logic inside the pipeline, and Option C relies on a different service (Cloud Build) without such built-in gating and deploys every model version.
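
The gating logic itself is small. The sketch below shows only the threshold check, using the example thresholds from the question; the metric key names (`accuracy`, `latency_ms`) are assumptions made for illustration. In a real pipeline the boolean result would feed a conditional construct such as `dsl.Condition` in the Kubeflow Pipelines SDK, which is not shown here.

```python
from typing import Dict


def passes_quality_gate(metrics: Dict[str, float],
                        min_accuracy: float = 0.9,
                        max_latency_ms: float = 100.0) -> bool:
    """Return True only if every quality check passes.

    The key names 'accuracy' and 'latency_ms' are illustrative assumptions.
    """
    # A missing metric fails the gate instead of silently passing it.
    if "accuracy" not in metrics or "latency_ms" not in metrics:
        return False
    return (metrics["accuracy"] > min_accuracy
            and metrics["latency_ms"] < max_latency_ms)
```

Treating a missing metric as a failure is a deliberate choice: a broken evaluation step should block deployment rather than let an unchecked model through.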

283

MCQhard

You are an ML engineer at a logistics company. The company uses a Vertex AI Pipeline with BigQuery ML to train a model that predicts delivery delays based on weather, traffic, and historical order data. The pipeline runs daily and includes steps: (1) data extraction from BigQuery, (2) feature engineering using Dataflow, (3) model training with BigQuery ML (logistic regression), (4) model evaluation, and (5) conditional deployment to a Vertex AI Endpoint if accuracy > 0.85. Recently, the pipeline has been failing at step 5 with the error: "Vertex AI Endpoint creation failed: Quota limit of 1 endpoint per region exceeded." The company has already created one endpoint in the same region for another model. The pipeline is configured to create a new endpoint each time a model is deployed. The engineer needs to fix this with minimal changes to the pipeline code. Which course of action should the engineer take?

A.Submit a quota increase request to Google Cloud for Vertex AI Endpoints in the current region.

B.Change the region in the pipeline configuration to a region with available endpoint quota.

C.Remove the accuracy threshold and deploy every model automatically to a pre-created endpoint.

D.Modify the deployment step to check if an endpoint already exists and, if so, deploy a new model version to the existing endpoint instead of creating a new one.

AnswerD

Reuses the existing endpoint, avoiding quota limits.

Why this answer

Option D is correct because it directly addresses the root cause: the pipeline fails because it tries to create a new endpoint each time, exceeding the regional quota of one endpoint. By modifying the deployment step to check for an existing endpoint and deploying a new model version to it, the engineer avoids quota issues without altering the pipeline's core logic or requiring external approvals. This approach leverages Vertex AI's model versioning capability, which allows multiple model versions under a single endpoint, aligning with minimal code changes.

Exam trap

The trap here is that candidates may focus on quota limits as a resource issue (Option A) or a region issue (Option B), rather than recognizing that the pipeline's deployment logic is architecturally flawed by creating a new endpoint per deployment, which is both inefficient and violates best practices for model serving.

How to eliminate wrong answers

Option A is wrong because submitting a quota increase request is a slow, administrative process that does not constitute a minimal code change and may not be approved quickly, leaving the pipeline broken in the meantime. Option B is wrong because changing the region introduces additional complexity (e.g., data residency, latency, and potential BigQuery dataset location mismatches) and does not address the underlying design issue of creating a new endpoint per deployment. Option C is wrong because removing the accuracy threshold undermines the model quality gate, potentially deploying poor models, and still requires creating a new endpoint each time, which would still hit the quota limit.
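
The check-then-reuse pattern from Option D can be sketched as below. This is an assumption-laden illustration: endpoints are modeled as plain dictionaries, and `list_endpoints` and `create_endpoint` are hypothetical callables standing in for the real client calls.

```python
from typing import Callable, Dict, List


def get_or_create_endpoint(display_name: str,
                           list_endpoints: Callable[[], List[Dict[str, str]]],
                           create_endpoint: Callable[[str], Dict[str, str]]
                           ) -> Dict[str, str]:
    """Return the existing endpoint with this display name, or create one.

    Both callables are hypothetical stand-ins for real API calls.
    """
    for endpoint in list_endpoints():
        if endpoint["display_name"] == display_name:
            return endpoint  # reuse: no new endpoint, so no quota is consumed
    return create_endpoint(display_name)
```

With the Vertex AI SDK for Python, the lookup would typically go through `aiplatform.Endpoint.list`, and the new model would then be deployed to the returned endpoint. Only the deployment step changes; the rest of the pipeline is untouched.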

284

MCQhard

A team of ML engineers is building a real-time fraud detection system. They use Cloud Pub/Sub to stream transactions, Dataflow for feature engineering, and Vertex AI to get predictions. They want to ensure that the data used for training matches the data used for serving to avoid training-serving skew. Which approach should they take?

A.Use a batch processing system for both training and serving to ensure identical feature calculations.

B.Implement separate feature engineering pipelines for training and serving, but document them carefully.

C.Use Vertex AI Feature Store to store features computed during training and retrieve them in the serving pipeline.

D.Ensure that both training and serving read from the same Cloud Storage location.

AnswerC

Feature Store provides a consistent feature definition and computation.

Why this answer

Vertex AI Feature Store ensures that the same feature engineering logic is applied consistently during both training and serving. By storing precomputed features in the Feature Store, the serving pipeline retrieves the exact same feature values that were used during training, eliminating the risk of training-serving skew. This approach is specifically designed for real-time systems where streaming data (via Pub/Sub and Dataflow) must be served with identical transformations.

Exam trap

The trap here is that candidates confuse data consistency (same raw source) with feature consistency (same computed values), leading them to pick Option D, which only addresses raw data location, not the transformation logic.

How to eliminate wrong answers

Option A is wrong because batch processing introduces latency that is incompatible with real-time fraud detection, and it does not guarantee identical feature calculations if the batch and streaming codebases diverge. Option B is wrong because separate pipelines inevitably lead to implementation differences, documentation drift, and training-serving skew — the opposite of the desired outcome. Option D is wrong because reading from the same Cloud Storage location only ensures raw data consistency, not that the feature engineering transformations (e.g., aggregations, windowing, encoding) are identical between training and serving.

285

Multi-Selectmedium

A team of data scientists and ML engineers is collaborating on a shared feature store in Vertex AI Feature Store. They need to ensure that feature definitions are versioned and that changes are reviewed before being used in production pipelines. Which TWO practices should they implement?

Select 2 answers

A.Allow data scientists to edit feature definitions directly in the Vertex AI Feature Store console.

B.Require code reviews for all changes to feature definitions before merging to the main branch.

C.Define multiple feature views in Vertex AI Feature Store for different environments and manage access via IAM.

D.Store feature definition code in a version-controlled repository such as Cloud Source Repositories.

E.Use scheduled batch jobs to synchronize feature definitions from a shared spreadsheet to Vertex AI Feature Store.

AnswersB, D

Code reviews ensure quality and approval.

Why this answer

Option B is correct because requiring code reviews for all changes to feature definitions before merging to the main branch enforces a peer-review gate, ensuring that modifications are validated for correctness, consistency, and compliance before they reach production. This aligns with MLOps best practices for governance and reduces the risk of introducing errors or breaking changes into the feature store.

Exam trap

Google Cloud often tests the distinction between environment isolation (IAM and multiple feature views) and the actual versioning/review process, leading candidates to mistakenly select Option C as a versioning practice when it only addresses access control and environment separation.

286

MCQhard

A logistics company uses Vertex AI AutoML Tables to predict delivery delays based on order attributes, weather data, and traffic data. The model is retrained weekly using a Vertex AI Pipeline that runs a BigQuery query to get training data, then triggers AutoML training. Recently, the pipeline fails with the error 'Dataset not found' when the AutoML training step starts. The BigQuery query runs successfully and outputs a table. Which is the most likely cause?

A.The AutoML training step is referencing a different dataset location.

B.The training data has been manually deleted from Cloud Storage.

C.The pipeline's IAM permissions are insufficient to access BigQuery.

D.The BigQuery output table is not being passed as a Vertex AI Dataset resource.

AnswerD

The pipeline must create a Vertex AI Dataset from the BigQuery table for AutoML to use.

Why this answer

The error 'Dataset not found' occurs because AutoML Tables requires a Vertex AI Dataset resource (a metadata wrapper) to reference the training data, not just a BigQuery table. The pipeline's BigQuery query produces a table, but if that table is not explicitly converted into or passed as a Vertex AI Dataset resource (for example, through a dataset creation step such as `aiplatform.TabularDataset.create` in the Vertex AI SDK), AutoML training cannot locate it. Option D correctly identifies this missing step as the root cause.

Exam trap

Google Cloud often tests the distinction between a raw data source (BigQuery table) and a Vertex AI Dataset resource, trapping candidates who assume AutoML can directly consume a BigQuery table without the required metadata wrapper.

How to eliminate wrong answers

Option A is wrong because the error is 'Dataset not found', not a location mismatch; AutoML Tables uses Dataset resource IDs, not direct paths, so a different dataset location would cause a different error (e.g., 'Permission denied' or 'Table not found'). Option B is wrong because the training data is stored in BigQuery, not Cloud Storage, and the error occurs at the AutoML step, not during data retrieval; manual deletion of a Cloud Storage file would not affect a BigQuery-sourced dataset. Option C is wrong because the BigQuery query runs successfully, proving the pipeline's IAM permissions to access BigQuery are sufficient; insufficient permissions would fail at the query step, not at the AutoML training step.

287

MCQhard

A data science team is deploying a PyTorch model for real-time inference using Vertex AI Endpoints. The model requires a custom container with specific CUDA drivers and Python packages. They have created a Docker image and pushed it to Artifact Registry. The pipeline should automatically retrain the model every week and deploy the new version if it passes validation. However, the deployment step fails intermittently with the error 'The container image is not compatible with the machine type.' What is the most likely cause?

A.The service account does not have permission to pull the container from Artifact Registry.

B.The container image requires GPU support but the machine type specified in the endpoint is a CPU-only machine.

C.The container's health check endpoint is not responding correctly.

D.The model artifact size exceeds the maximum allowed for the machine type.

AnswerB

CUDA drivers require GPU machines; using a CPU machine causes compatibility error.

Why this answer

The error 'The container image is not compatible with the machine type' indicates a mismatch between the container's hardware requirements and the machine type selected for the Vertex AI Endpoint. Since the custom container requires specific CUDA drivers, it is built for GPU acceleration. If the endpoint is configured with a CPU-only machine type (e.g., n1-standard-4), the container will fail to run because the GPU drivers cannot initialize, triggering this incompatibility error.

Exam trap

Google Cloud often tests the distinction between deployment-time compatibility errors and runtime health check failures, tricking candidates into confusing a misconfigured machine type with a failing health probe.

How to eliminate wrong answers

Option A is wrong because a permission issue (e.g., missing artifactregistry.reader role) would produce an 'unauthorized' or 'access denied' error when pulling the image, not a compatibility error. Option C is wrong because a failing health check would cause the deployment to succeed initially but then report the container as unhealthy, not a pre-deployment compatibility error. Option D is wrong because Vertex AI has no per-machine-type artifact size limit; model size constraints are separate and would manifest as a resource-exhausted error, not a compatibility error.

288

MCQhard

A global e-commerce company uses BigQuery ML to forecast daily sales for 10,000 products. They use a time-series model with a horizon of 7 days. Recently, forecasts for a specific product category have been consistently too high. They suspect the model is not capturing a new seasonal pattern. Which action should they take first to diagnose the issue?

A.Retrain the model with minimal additional data

B.Run ML.EVALUATE on the recent sales data and compare accuracy metrics

C.Increase the forecast horizon to 14 days

D.Switch to AutoML forecasting via Vertex AI AutoML

AnswerB

Allows quantifying drift and identifying underperforming categories.

Why this answer

Running ML.EVALUATE on recent sales data allows you to compute accuracy metrics (e.g., MAE, MAPE) specifically for the period where the model is failing. This isolates whether the error is due to a new seasonal pattern or another cause, without retraining or changing the model architecture. It is the standard first diagnostic step in BigQuery ML for time-series models.

Exam trap

Google Cloud often tests the principle that diagnosis must precede action—candidates mistakenly jump to retraining or switching tools instead of evaluating the existing model's performance on the problematic data window.

How to eliminate wrong answers

Option A is wrong because retraining with minimal additional data does not diagnose why forecasts are too high; it only incorporates more data without identifying the root cause. Option C is wrong because increasing the forecast horizon to 14 days would worsen the problem by extending predictions further into the uncertain future, not addressing the seasonal pattern miss. Option D is wrong because switching to AutoML forecasting via Vertex AI AutoML is a premature architectural change that bypasses the diagnostic step; you should first evaluate the current model to understand the error before migrating.
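
To make the diagnostic concrete, the sketch below computes the kinds of error metrics mentioned above (MAE and MAPE) plus a signed bias term for a recent window. It is a stand-alone illustration in plain Python, not BigQuery ML's implementation of ML.EVALUATE, and it assumes that no actual value is zero.

```python
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float],
                    forecast: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MAPE (percent), and mean signed error for a forecast.

    Assumes equal-length, non-empty inputs and no zero values in `actual`.
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        # A positive bias means the forecasts run high on average.
        "bias": sum(errors) / n,
    }
```

A persistently positive bias for one product category, alongside normal values elsewhere, would support the suspicion that the model is missing a pattern specific to that category.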

289

MCQmedium

An MLOps team wants to set up alerts for GPU memory utilization on Vertex AI Training jobs. Which approach is most efficient?

A.Enable Cloud Audit Logs for the training job and parse the logs for GPU memory events.

B.Create a log-based metric from the training job's GPU logs.

C.Add a container sidecar that emits a custom metric for GPU memory usage via OpenCensus.

D.Use the 'compute.googleapis.com/accelerator/memory_utilization' metric with a metric threshold condition.

AnswerD

Automatically collected GPU metric.

Why this answer

Option D is correct because Vertex AI training jobs automatically export the 'compute.googleapis.com/accelerator/memory_utilization' metric to Cloud Monitoring. This metric is natively collected by the Google Cloud agent on the training VM, so you can directly create a metric threshold alert without any custom instrumentation or log parsing. It is the most efficient approach as it requires zero additional code or configuration.

Exam trap

Google Cloud often tests the misconception that custom instrumentation (sidecars or log parsing) is always required for GPU monitoring, when in fact Vertex AI provides a native metric that eliminates that need.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs record administrative actions (e.g., who created a job), not runtime GPU memory utilization; they lack the granularity needed for real-time resource monitoring. Option B is wrong because log-based metrics require you to first generate GPU memory logs (which Vertex AI does not emit by default) and then parse them, adding latency and complexity compared to using a pre-existing metric. Option C is wrong because adding a sidecar container to emit a custom metric via OpenCensus is unnecessary overhead; Vertex AI already exposes the exact GPU memory metric natively, making a sidecar redundant and less efficient.

290

MCQeasy

A marketing agency uses Vertex AI AutoML Vision to classify social media images into brand logos and generic content. They have 5,000 images per class. The model achieves 95% accuracy on validation set, but in production it misclassifies many images that contain logos in unusual angles or lighting. They have limited ML expertise and want to improve robustness. Which action should they take?

A.Switch to a custom CNN model trained with data augmentation.

B.Augment the training set with images that have varied angles and lighting.

C.Deploy the model with a lower confidence threshold.

D.Use Vertex AI Matching Engine for similarity search instead.

AnswerB

Simply adding more diverse training images improves model robustness.

Why this answer

Option B is correct because the core issue is a domain shift between the training data (likely clean, canonical logo images) and production data (logos at unusual angles and lighting). Augmenting the training set with those specific variations directly addresses the lack of robustness by exposing the model to the missing edge cases during training, which is the most effective and simplest fix for a team with limited ML expertise using AutoML Vision.

Exam trap

The trap here is that candidates often assume a more complex model (custom CNN) is needed for robustness, when in fact the problem is a data distribution mismatch that can be fixed with simple data augmentation, which is the most practical solution for a team with limited ML expertise using a managed service like AutoML.

How to eliminate wrong answers

Option A is wrong because switching to a custom CNN model requires significant ML expertise to design, train, and tune, which contradicts the team's limited ML expertise; AutoML Vision already uses a CNN-based architecture under the hood, so the issue is data quality, not model architecture. Option C is wrong because lowering the confidence threshold would increase the number of false positives (misclassifying generic content as logos), which does not fix the model's inability to correctly recognize logos at unusual angles—it only changes the decision boundary, not the model's feature representation. Option D is wrong because Vertex AI Matching Engine is designed for similarity search (e.g., finding nearest neighbors in an embedding space), not for classification; it would require generating embeddings for all images and does not directly solve the classification robustness problem, nor does it leverage the existing labeled training data.

291

MCQmedium

Your ML pipeline uses Vertex AI Feature Store to serve features for online predictions. You need to monitor the freshness of features in the online store. Which approach is most effective?

A.Set up a Cloud Monitoring alert for feature store entity count.

B.Schedule a nightly BigQuery batch job to compare feature values.

C.Create a custom metric in Cloud Monitoring that tracks the time since last feature update, and set an alert threshold.

D.Enable detailed audit logs in Feature Store and export to BigQuery.

AnswerC

Directly measures staleness.

Why this answer

Option C is correct because Cloud Monitoring custom metrics allow you to track the timestamp of the last feature update in Vertex AI Feature Store and set an alert threshold for staleness. This directly measures feature freshness, which is critical for online predictions where stale features can degrade model accuracy. Other options either measure unrelated metrics (entity count), are too slow (nightly batch), or focus on auditing rather than real-time monitoring.

Exam trap

The trap here is that candidates confuse monitoring entity count (a capacity metric) with freshness, or assume that batch comparison or audit logs provide real-time monitoring, when only a custom staleness metric with alerting directly addresses the requirement.

How to eliminate wrong answers

Option A is wrong because monitoring the entity count in the feature store tracks the number of stored feature values, not the time since they were last updated, so it cannot detect staleness. Option B is wrong because a nightly BigQuery batch job introduces latency of up to 24 hours, making it unsuitable for real-time freshness monitoring required for online predictions. Option D is wrong because enabling detailed audit logs and exporting to BigQuery provides an after-the-fact record of changes but does not offer real-time alerting on feature staleness.
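
The staleness value that such a custom metric would report can be sketched as below. The function names and the idea of passing in the last-update timestamp are assumptions for illustration; how that timestamp is obtained and how the value is written to Cloud Monitoring are not shown.

```python
from datetime import datetime, timedelta, timezone


def staleness_seconds(last_update: datetime, now: datetime) -> float:
    """Seconds elapsed since the feature was last updated."""
    return (now - last_update).total_seconds()


def is_stale(last_update: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the feature is older than the allowed maximum age."""
    return staleness_seconds(last_update, now) > max_age.total_seconds()


# Example: a feature updated 45 minutes ago, with a 30-minute freshness target.
_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
_last = _now - timedelta(minutes=45)
example_stale = is_stale(_last, _now, timedelta(minutes=30))  # True
```

Reporting the raw number of seconds as the metric, and leaving the threshold to the alerting policy, lets the team tune the threshold without redeploying code.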

292

MCQeasy

An ML team is using Vertex AI Pipelines to run automated retraining workflows. They want to monitor pipeline execution and receive alerts when a pipeline run fails. Which Google Cloud service should they use to set up such alerts?

A.Vertex AI Metadata

B.Cloud Monitoring

C.Cloud Logging

D.Cloud Scheduler

AnswerB

Cloud Monitoring can be configured with alerts on metrics like pipeline run failure count or success rate.

Why this answer

Cloud Monitoring (formerly Stackdriver Monitoring) is the correct service because it provides alerting policies that can be triggered based on pipeline run status metrics, such as failure counts or run state changes. Vertex AI Pipelines automatically exports execution metrics to Cloud Monitoring, allowing you to define conditions (e.g., metric 'pipeline/run_count' with filter 'status=FAILED') and configure notifications via channels like email, Pub/Sub, or PagerDuty.

Exam trap

The trap here is that candidates confuse Cloud Logging (which stores logs) with Cloud Monitoring (which provides alerting), or assume Vertex AI Metadata can trigger alerts because it tracks pipeline metadata, but it lacks any notification or policy engine.

How to eliminate wrong answers

Option A is wrong because Vertex AI Metadata is a managed metadata store for tracking artifacts, lineage, and executions; it does not provide alerting capabilities. Option C is wrong because Cloud Logging is for storing and querying logs, not for setting up proactive alerts on pipeline failures (though logs can be used to create log-based metrics, the question specifically asks for alerts on pipeline execution, which is natively handled by Cloud Monitoring metrics). Option D is wrong because Cloud Scheduler is a cron job service for triggering workflows on a schedule; it cannot monitor pipeline runs or generate failure alerts.

293

Multi-Selectmedium

Which THREE actions should be taken to automate a machine learning pipeline using Cloud Build and Vertex AI?

Select 3 answers

A.Write a cloudbuild.yaml that builds a training container and submits a Vertex AI PipelineJob

B.Use Cloud Functions to retrain the model each time a build completes

C.Set up a Cloud Scheduler job to poll for new build artifacts

D.Define the training and deployment steps in a Vertex AI Pipeline and submit it from Cloud Build

E.Configure a Cloud Build trigger to run on commits to the source repository

AnswersA, D, E

Cloud Build uses build config to define steps, including submitting pipeline jobs.

Why this answer

Option A is correct because a cloudbuild.yaml file can define steps that build a custom training container and then submit a Vertex AI PipelineJob that uses it. This automates the ML pipeline directly: Cloud Build triggers the Vertex AI pipeline, which is the recommended pattern for CI/CD of ML workflows.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Build triggers, Pub/Sub) and polling mechanisms (Cloud Scheduler, Cloud Functions) — the trap here is that candidates may think polling or separate functions are needed for automation, when in fact Cloud Build's native triggers and pipeline submission are the correct, integrated approach.

294

Multi-Selecteasy

An ML team is deploying a model to Vertex AI for the first time. Which THREE are best practices for scaling from prototype to production?

Select 3 answers

A.Manually scale instances based on historical traffic patterns.

B.Store all features in a Feature Store for consistency.

C.Use a single large instance to simplify management.

D.Monitor model performance for drift and accuracy degradation.

E.Automate model retraining and deployment using Vertex AI Pipelines.

AnswersB, D, E

Feature Store ensures consistent feature computation across training and serving.

Why this answer

Storing all features in a Feature Store (Option B) ensures consistency between training and serving, preventing training-serving skew. Vertex AI Feature Store provides a centralized repository for feature values, enabling reuse, point-in-time lookups, and online serving with low latency, which is critical for production reliability.

Exam trap

Google Cloud often tests the misconception that manual scaling or single-instance architectures are simpler and more reliable, but the PMLE exam emphasizes automated, resilient, and consistent practices like autoscaling and feature stores for production ML workloads.

295

MCQeasy

A team has developed a prototype of a recommendation model using a small dataset on a single VM. They need to scale to a larger dataset for production training. They plan to use Vertex AI training with a custom container. What is the best practice for handling the increased data volume?

A.Increase the batch size to maximum.

B.Use TFRecord format and streaming reads.

C.Store all data in memory before training.

D.Use a single powerful VM with high memory.

AnswerB

Efficiently loads data in batches, leveraging Cloud Storage streaming.

Why this answer

Option B is correct because using TFRecord format with streaming reads allows efficient, scalable data loading from Cloud Storage, reducing memory pressure and improving I/O performance. Option A is wrong because increasing the batch size to the maximum can cause memory issues and may not improve throughput. Option C is wrong because storing all data in memory is not scalable.

Option D is wrong because a single powerful VM still has limits and is not cost-effective for large datasets.
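
The idea behind streaming reads is that only one batch is held in memory at a time. The sketch below illustrates this with a plain Python generator; it does not read TFRecord files, and `records` stands in for any lazily produced stream of examples. In TensorFlow, `tf.data.TFRecordDataset` plays this role for real TFRecord files.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def stream_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive batches without materializing the whole dataset."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))  # pulls at most batch_size items
        if not batch:
            return
        yield batch
```

Because the function consumes an iterator, peak memory depends on the batch size rather than on the dataset size, which is what makes the approach scale to larger training sets.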

296

MCQeasy

Your company deploys batch prediction jobs using Vertex AI Batch Prediction. You need to monitor the jobs for failures and performance. What is the recommended approach?

A.Use Cloud Logging to export batch prediction logs and create log-based metrics.

B.Set up email alerts in the Vertex AI console for failed jobs.

C.Use Cloud Monitoring to create custom dashboards and alerts based on Vertex AI batch prediction metrics.

D.Enable the Recommender to get optimization suggestions for batch jobs.

AnswerC

Cloud Monitoring natively supports Vertex AI metrics for batch predictions.

Why this answer

Option C is correct because Cloud Monitoring (formerly Stackdriver) is the native Google Cloud service for collecting, visualizing, and alerting on metrics from Vertex AI, including batch prediction job success rates, latency, and resource utilization. It provides pre-built dashboards and the ability to create custom alerts, making it the recommended approach for monitoring failures and performance in a centralized, scalable way.

Exam trap

Google Cloud often tests the misconception that Cloud Logging is the primary monitoring tool for metrics, when in fact Cloud Monitoring is the dedicated service for metrics and alerting, while Cloud Logging is for logs and log-based metrics only.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for log data, not structured metrics; while you could create log-based metrics from batch prediction logs, this is an indirect, less efficient method that lacks the pre-built performance metrics and alerting capabilities of Cloud Monitoring. Option B is wrong because email alerts in the Vertex AI console are not a native feature; Vertex AI does not provide a built-in email alerting mechanism for job failures—alerts must be configured through Cloud Monitoring or Cloud Logging. Option D is wrong because the Recommender provides optimization suggestions (e.g., machine type, resource allocation) but does not monitor job failures or performance in real time; it is a post-hoc analysis tool, not a monitoring solution.

297

MCQeasy

A retail company has deployed a machine learning model using Vertex AI Endpoints to predict inventory demand. The model was trained on data from the past two years and has been in production for six months. The team has enabled Vertex AI Model Monitoring to track prediction drift with an alert threshold of 0.2. Last week, they received an alert that the prediction drift score reached 0.35, exceeding the threshold. The engineer checks the monitoring dashboard and sees that the distribution of predictions has shifted noticeably compared to the training data. The engineer also notices that the model's accuracy metrics, computed from weekly ground truth data, have remained within acceptable range. What should the engineer do first?

A.Investigate the input feature distributions for the recent serving requests to identify if data drift is the underlying cause of the prediction drift.

B.Increase the prediction drift alert threshold to 0.4 to reduce the number of false alerts.

C.Retrain the model using the latest three months of data to incorporate recent trends.

D.Roll back to an earlier model version that had lower prediction drift.

AnswerA

By checking input feature distributions, the engineer can confirm whether data drift is present, which commonly causes prediction drift even if accuracy remains temporarily stable.

Why this answer

The prediction drift alert indicates a shift in the prediction distribution, but accuracy is stable. This suggests data drift (a change in input features) rather than concept drift. The engineer should first investigate the input feature distributions to confirm whether data drift is the cause.

Increasing the threshold (B) ignores the underlying issue. Retraining (C) is premature without root cause analysis. Rolling back (D) may not help if the previous version also suffers from the same data drift.
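
Comparing a feature's training distribution with its recent serving distribution requires a distance measure. The sketch below uses Jensen-Shannon divergence, one common choice; it is an illustration under stated assumptions, not a description of how Vertex AI Model Monitoring computes its drift score. Both inputs are assumed to be normalized histograms over the same bins.

```python
import math
from typing import Sequence


def _kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, between 0 (identical) and 1."""
    if len(p) != len(q):
        raise ValueError("histograms must use the same bins")
    # The mixture m is positive wherever p or q is, so the logs are defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_divergence(p, m) + 0.5 * _kl_divergence(q, m)
```

Computing this per feature and ranking the results points to the inputs that shifted most, which is the starting point for deciding whether any action is needed while accuracy is still acceptable.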

298

MCQmedium

A team wants to deploy a BigQuery ML model for online prediction. Which approach should they take?

A.Export the model to Cloud Storage and deploy to AI Platform

B.Export the model to Vertex AI and create an endpoint

C.None of these; BigQuery ML models cannot be used for online prediction

D.Use BigQuery ML's ML.PREDICT for online predictions

AnswerB

Vertex AI supports deploying BigQuery ML models for online serving.

Why this answer

BigQuery ML models can be exported directly to Vertex AI for online prediction. Vertex AI provides a managed endpoint that supports real-time serving with low latency, which is required for online prediction. Exporting to Cloud Storage and then deploying to AI Platform is outdated because AI Platform is now part of Vertex AI, and the recommended path is to export the model directly to Vertex AI and create an endpoint.

Exam trap

Google Cloud often tests the distinction between batch prediction (ML.PREDICT) and online prediction (Vertex AI endpoint), and the trap here is that candidates assume BigQuery ML's ML.PREDICT can serve real-time requests, but it is designed for batch processing only.

How to eliminate wrong answers

Option A is wrong because exporting the model to Cloud Storage and deploying to AI Platform is a legacy approach; AI Platform has been integrated into Vertex AI, and the current best practice is to export directly to Vertex AI. Option C is wrong because BigQuery ML models can indeed be used for online prediction by exporting them to Vertex AI and creating an endpoint. Option D is wrong because ML.PREDICT in BigQuery ML is designed for batch predictions, not for real-time online predictions with low-latency requirements.
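One way to get a BigQuery ML model into Vertex AI is to register it in the Vertex AI Model Registry at creation time through the `model_registry` and `vertex_ai_model_id` options of `CREATE MODEL`. The sketch below only builds that SQL statement as a string; it does not run it. The dataset, table, label, and model ID are hypothetical, the linear regression model type is chosen only for illustration, and creating the endpoint and deploying the registered model remain separate steps in Vertex AI.

```python
def build_create_model_sql(dataset, model, label, source_table, vertex_model_id):
    """Build a CREATE MODEL statement that also registers the model in Vertex AI.

    All identifiers passed in are hypothetical placeholders; validate them
    before interpolating into SQL in real code.
    """
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'LINEAR_REG',\n"
        f"  input_label_cols = ['{label}'],\n"
        "  model_registry = 'VERTEX_AI',\n"
        f"  vertex_ai_model_id = '{vertex_model_id}'\n"
        ") AS\n"
        f"SELECT * FROM `{dataset}.{source_table}`"
    )


if __name__ == "__main__":
    print(
        build_create_model_sql(
            dataset="retail",
            model="demand_model",
            label="units_sold",
            source_table="training_data",
            vertex_model_id="demand_model_bqml",
        )
    )
```

Once the model is registered, it can be deployed to a Vertex AI endpoint for low-latency requests, while `ML.PREDICT` stays the tool for batch scoring inside BigQuery.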

299

Multi-Selecteasy

A team is deploying a new model version. They want to ensure that they can quickly roll back if the new version performs poorly in production. Which TWO actions should they take? (Choose 2.)

Select 2 answers

A.Keep the old model version deployed alongside the new one

B.Configure Vertex AI Model Monitoring to compare predictions

C.Use traffic splitting to gradually shift traffic

D.Set up Cloud Monitoring alerts on model performance

E.Store multiple model versions in the same endpoint

AnswersC, E

Traffic splitting allows you to direct a small percentage of traffic to the new version and easily shift all traffic back if issues arise.

Why this answer

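Option C is correct because traffic splitting allows you to gradually shift a percentage of inference requests from the old model version to the new one. If the new version performs poorly, you can immediately revert the split to 0% for the new version, providing a fast and controlled rollback without redeploying or disrupting service. Option E is correct because a Vertex AI endpoint can host several deployed model versions at once, so the previous version stays live on the same endpoint and can take back all traffic the moment the split is reverted.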

Exam trap

The trap here is that candidates confuse monitoring and alerting (options B and D) with the actual deployment and rollback mechanism, assuming that detecting poor performance is equivalent to being able to quickly roll back, when in fact you need a traffic management feature like traffic splitting to execute the rollback.
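The rollback mechanism can be sketched as a plain Python model of an endpoint's traffic split, which maps each deployed model ID to a percentage that sums to 100. This is an illustration of the idea only: it does not call the Vertex AI API, it assumes exactly two deployed versions, and the IDs `v1` and `v2` and both helper functions are hypothetical.

```python
def shift_traffic(split, new_id, old_id, new_percent):
    """Return a new split sending new_percent to new_id and the rest to old_id.

    Assumes only these two versions are deployed on the endpoint.
    """
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")
    if new_id not in split or old_id not in split:
        raise KeyError("both versions must already be deployed on the endpoint")
    updated = dict(split)  # leave the caller's mapping untouched
    updated[new_id] = new_percent
    updated[old_id] = 100 - new_percent
    return updated


def rollback(split, new_id, old_id):
    """Send all traffic back to the old version; the new one stays deployed at 0%."""
    return shift_traffic(split, new_id, old_id, 0)


if __name__ == "__main__":
    split = {"v1": 100, "v2": 0}                  # both versions on one endpoint
    canary = shift_traffic(split, "v2", "v1", 10)  # 10% canary for the new version
    print(canary)
    print(rollback(canary, "v2", "v1"))
```

Because the old version never leaves the endpoint, rolling back is a change to the split rather than a redeployment, which is why monitoring alone (options B and D) is not enough.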

300

MCQhard

A company has multiple business units using the same Vertex AI environment. They need to enforce that models deployed to production have passed a validation pipeline, and only the ML Engineering team can deploy to production. Which IAM configuration should they use?

A.Use Vertex AI Workbench with user-managed notebooks.

B.Use custom roles with permissions to deploy models, and use Cloud Audit Logs to monitor deployments.

C.Use Binary Authorization to ensure models are signed.

D.Use organization policies to restrict deployment to specific locations.

E.Use Vertex AI Model Registry with automated deployment via Cloud Build, and restrict those permissions to the ML Engineering team using IAM conditions.

AnswerE

This ensures that only approved pipelines trigger deployment and that only the authorized team can initiate it.

Why this answer

Option E is correct because it combines Vertex AI Model Registry (the central record of model versions, from which only versions that passed validation are promoted to production) with Cloud Build for automated deployment, and uses IAM conditions to restrict deployment permissions exclusively to the ML Engineering team. This ensures that models must pass the validation pipeline before deployment, and only authorized personnel can trigger the deployment process.

Exam trap

The trap here is that candidates may confuse monitoring (Audit Logs) or location restrictions (Organization Policies) with enforcing a validation pipeline and team-specific deployment permissions, missing the need for a model registry and automated deployment with IAM conditions.

How to eliminate wrong answers

Option A is wrong because Vertex AI Workbench with user-managed notebooks is a development environment for building and training models, not a mechanism for enforcing deployment validation or restricting deployment permissions. Option B is wrong because custom roles with deployment permissions and Cloud Audit Logs only provide monitoring and access control, but do not enforce that models have passed a validation pipeline before deployment. Option C is wrong because Binary Authorization is designed for container image signing and attestation, not for validating ML model pipelines or restricting deployment to specific teams. Option D is wrong because organization policies can restrict deployment to specific locations (e.g., regions), but they do not enforce model validation or restrict deployment permissions to a specific team.
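The two requirements in this question (a passed validation pipeline and a restricted set of deployers) can be pictured as a single gate with two independent checks. The sketch below is a toy model of that logic, not Google Cloud code: in practice IAM performs the identity check and the Cloud Build pipeline performs the validation check. The `ModelVersion` class, the `validation` label, the member addresses, and `can_deploy` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    """Hypothetical stand-in for a model version recorded in a registry."""
    name: str
    labels: dict = field(default_factory=dict)


# Hypothetical members of the ML Engineering team.
ML_ENGINEERING = {"alice@example.com", "bob@example.com"}


def can_deploy(caller, version, allowed_callers):
    """Allow deployment only if BOTH conditions hold.

    1. The version carries the label the validation pipeline would set.
    2. The caller belongs to the team permitted to deploy.
    """
    validated = version.labels.get("validation") == "passed"
    authorised = caller in allowed_callers
    return validated and authorised


if __name__ == "__main__":
    good = ModelVersion("demand-forecast-v3", {"validation": "passed"})
    untested = ModelVersion("demand-forecast-v4")
    print(can_deploy("alice@example.com", good, ML_ENGINEERING))      # validated + authorised
    print(can_deploy("alice@example.com", untested, ML_ENGINEERING))  # not validated
    print(can_deploy("carol@example.com", good, ML_ENGINEERING))      # not authorised
```

Options B and D each cover at most one of the two checks, which is why only the combined approach in option E satisfies the question.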
