Google Professional Machine Learning Engineer PMLE Questions 301–375 | Page 5/7

301

MCQ (medium)

A team deploys a model using Vertex AI and wants to monitor for concept drift. What should they track?

A. Number of prediction requests

B. Model prediction latency

C. Changes in input data distribution

D. Changes in the relationship between inputs and outputs

Answer: D

Concept drift is a change in the underlying function mapping inputs to outputs.

Why this answer

Concept drift refers to a change in the underlying relationship between the input features and the target variable over time, which degrades model performance. In Vertex AI, monitoring this requires tracking the statistical relationship between inputs and outputs (e.g., via prediction residuals or model performance metrics), not just the input distribution alone. Option D correctly identifies this need, as concept drift is fundamentally about the input-output mapping shifting, even if the input distribution remains stable.

Exam trap

Google Cloud often tests the distinction between data drift (input distribution changes) and concept drift (input-output relationship changes), and the trap here is that candidates confuse the two, picking Option C because they think monitoring input data is sufficient for detecting all model degradation.

How to eliminate wrong answers

Option A is wrong because the number of prediction requests measures traffic volume, not data or concept drift; it is a scaling or operational metric, not a model quality metric. Option B is wrong because prediction latency measures inference speed, which is a performance indicator unrelated to the statistical properties of data or model relationships. Option C is wrong because changes in input data distribution represent data drift (covariate shift), not concept drift; while data drift can cause concept drift, monitoring only input distribution misses shifts in the input-output relationship that occur without distributional changes.
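
The distinction can be illustrated with a toy sketch. The frozen `model`, the feature values, and the shifted 0.75 threshold below are all hypothetical; the point is only that the inputs are identical in both windows while accuracy still drops, which is what input-distribution monitoring alone would miss.

```python
def model(x: float) -> int:
    """A frozen 'deployed' model: predicts 1 when the feature exceeds 0.5."""
    return 1 if x > 0.5 else 0

def accuracy(xs, ys):
    """Fraction of examples where the frozen model matches the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)

# The same feature values appear in both windows (only the order differs),
# so the input distribution has not changed: no data drift.
xs_before = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
xs_after = list(reversed(xs_before))
input_unchanged = sorted(xs_before) == sorted(xs_after)

# Window 1: labels follow the rule the model learned (x > 0.5).
ys_before = [1 if x > 0.5 else 0 for x in xs_before]
# Window 2: the true rule has moved to x > 0.75 (concept drift).
ys_after = [1 if x > 0.75 else 0 for x in xs_after]

acc_before = accuracy(xs_before, ys_before)
acc_after = accuracy(xs_after, ys_after)
print(input_unchanged, acc_before, acc_after)
```

Detecting the drop requires ground-truth labels for recent predictions, which is why concept drift is tracked through performance metrics rather than feature statistics.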

302

Multi-Select (easy)

Which TWO are best practices for building ML pipelines on Vertex AI Pipelines?

Select 2 answers

A. Store all trained models in Cloud Storage without versioning

B. Use Cloud Build as the pipeline orchestrator

C. Use a container-based approach for each component

D. Define pipelines using the Kubeflow Pipelines SDK

E. Use Cloud Composer as the primary pipeline tool

Answers: C, D

Containerized components are reusable and scalable.

Why this answer

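Option C is correct because Vertex AI Pipelines is designed to run container-based components, where each step in the pipeline is a Docker container that encapsulates its dependencies and execution logic. This approach ensures reproducibility, isolation, and scalability, aligning with best practices for ML pipelines on Vertex AI. Option D is correct because Vertex AI Pipelines executes pipelines authored with the Kubeflow Pipelines SDK, so defining the workflow with that SDK is the standard way to build for the service.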

Exam trap

Google Cloud often tests the distinction between general-purpose orchestration tools (Cloud Composer, Cloud Build) and ML-specific pipeline services (Vertex AI Pipelines), expecting candidates to recognize that container-based components and the Kubeflow Pipelines SDK are the correct building blocks for ML pipelines on Vertex AI.

303

Multi-Select (medium)

Which TWO are best practices when deploying AutoML models to production?

Select 2 answers

A. Monitor for data drift

B. Train the model on a disk to reduce latency

C. Enable Vertex AI Explainability

D. Deploy on sole-tenant nodes

E. Use TPUs for model serving

Answers: A, C

Data drift can degrade performance; monitoring is essential.

Why this answer

Monitoring for data drift (Option A) is a best practice because production models can degrade over time as the statistical properties of input data change. Vertex AI provides a Model Monitoring service that automatically detects skew and drift by comparing serving data distribution against training data distribution, triggering alerts when anomaly thresholds are breached. This ensures model reliability and performance in production.

Exam trap

Google Cloud often tests the misconception that TPUs are suitable for model serving, but TPUs are optimized for training and not supported for Vertex AI AutoML serving, which uses CPUs or GPUs for inference.
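
As a rough illustration of what a drift check compares, the sketch below scores a serving histogram against a training histogram with Jensen-Shannon divergence, one common choice of distance. The bucket counts and the `THRESHOLD` value are hypothetical, and this is not the monitoring service's own implementation.

```python
import math

def normalize(counts):
    """Turn raw bucket counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical histograms of one feature over the same four buckets.
training = normalize([50, 30, 15, 5])
serving_stable = normalize([50, 30, 15, 5])
serving_drifted = normalize([10, 20, 30, 40])

THRESHOLD = 0.1  # hypothetical alerting threshold

stable_score = js_divergence(training, serving_stable)
drift_score = js_divergence(training, serving_drifted)
print(stable_score, stable_score > THRESHOLD)
print(round(drift_score, 3), drift_score > THRESHOLD)
```

With base-2 logarithms the score stays between 0 (identical distributions) and 1, which makes a fixed threshold easier to reason about.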

304

MCQ (easy)

A team is using Vertex AI Pipelines to automate their ML workflow. They want to ensure that pipeline runs are reproducible and that artifacts are tracked. Which feature should they use?

A. Vertex AI Feature Store

B. Vertex AI Experiments

C. Vertex AI Model Registry

D. Vertex AI Endpoints

Answer: B

Experiments track parameters, metrics, and artifacts for each run.

Why this answer

Vertex AI Experiments is the correct feature because it captures parameters, metrics, and artifacts for each pipeline run, enabling reproducibility and lineage tracking. This directly supports the team's need to ensure runs are reproducible and artifacts are tracked, as Experiments automatically logs metadata for every execution.

Exam trap

The trap here is that candidates confuse artifact tracking with model management or deployment features, leading them to select Model Registry or Endpoints instead of recognizing that Experiments provides the run-level metadata and lineage required for reproducibility.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is designed for managing and serving feature data for ML models, not for tracking pipeline runs or artifacts. Option C is wrong because Vertex AI Model Registry focuses on managing model versions and deployment, not on capturing run-level metadata or artifact lineage. Option D is wrong because Vertex AI Endpoints are for deploying models to serve predictions, not for tracking reproducibility or artifacts in pipeline runs.

305

MCQ (easy)

You need to set up monitoring for a Vertex AI model that serves predictions in real-time. The model is expected to have a latency SLA of under 100ms. Which metric should you configure an alert on to ensure the SLA is met?

A. p50 latency of prediction requests

B. Prediction drift score

C. p99 latency of prediction requests

D. Number of prediction requests per second

Answer: C

p99 captures tail latency critical for SLA.

Why this answer

Option C is correct because p99 latency is the value that 99% of requests complete at or under, which makes it the standard tail-latency metric for enforcing a strict SLA such as under 100ms. Alerting on p99 surfaces slow outliers that a median would hide: as long as p99 stays below the threshold, at most 1% of requests can exceed it.

Exam trap

Google Cloud often tests the misconception that median (p50) latency is sufficient for SLAs, but the trap is that SLAs require tail-latency guarantees (p99 or p999) to catch performance outliers that violate the threshold.

How to eliminate wrong answers

Option A is wrong because p50 latency (median) ignores the tail latency, meaning half of the requests could exceed 100ms without triggering an alert, failing the SLA. Option B is wrong because prediction drift score measures changes in model input/output distributions over time, not latency, and is irrelevant for SLA compliance. Option D is wrong because the number of prediction requests per second (throughput) does not measure individual request latency; high throughput can occur even if latency spikes above 100ms.
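
The gap between median and tail latency is easy to see on a small sample. The sketch below uses a nearest-rank percentile and a made-up set of 100 latencies; the numbers are illustrative only.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

SLA_MS = 100

# Hypothetical sample: 95 fast requests and 5 slow outliers.
latencies_ms = [40] * 95 + [300] * 5

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"p50={p50}ms within SLA: {p50 < SLA_MS}")
print(f"p99={p99}ms within SLA: {p99 < SLA_MS}")
```

An alert on p50 would stay silent here even though 5% of requests take three times the SLA, while an alert on p99 would fire.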

306

Multi-Select (medium)

A company wants to build a low-code ML pipeline using Vertex AI Pipelines and BigQuery ML. They need to train, evaluate, and deploy a model. Which TWO statements are correct about the integration between Vertex AI Pipelines and BigQuery ML? (Choose TWO.)

Select 2 answers

A. BigQuery ML models are automatically stored in Vertex AI Model Registry after training.

B. BigQuery ML supports hyperparameter tuning using the CREATE MODEL statement.

C. Vertex AI Pipelines supports automatic retry of failed steps due to transient errors.

D. Vertex AI Pipeline steps can include BigQuery ML training via the BigQueryQueryJob operator.

E. The trained BigQuery ML model can be registered in Vertex AI Model Registry and deployed to an endpoint.

Answers: D, E

BigQuery ML training can be invoked as a SQL query step.

Why this answer

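Option D is correct because Vertex AI Pipelines can integrate with BigQuery ML by using the BigQueryQueryJob operator to execute SQL-based training queries, such as `CREATE MODEL`, as a pipeline step. This allows you to orchestrate BigQuery ML model training within a Vertex AI Pipeline, enabling a low-code ML workflow. Option E is correct because a trained BigQuery ML model can be registered in Vertex AI Model Registry and, from there, deployed to an endpoint for online prediction.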

Exam trap

Google Cloud often tests the misconception that BigQuery ML models are automatically registered in Vertex AI Model Registry after training, but in reality, you must explicitly export or upload the model to the registry as a separate step.
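
A minimal sketch of the kind of training query such a pipeline step would run is shown below. The dataset, table, and column names are hypothetical, and the `model_registry` / `vertex_ai_model_id` options reflect my understanding of how BigQuery ML opts a model into Vertex AI Model Registry; treat them as assumptions to check against the current documentation.

```python
def build_create_model_sql(dataset, model, label_col, source_table, vertex_model_id):
    """Build a BigQuery ML CREATE MODEL statement as a plain string.

    All identifiers are caller-supplied placeholders; nothing is executed here.
    """
    return (
        f"CREATE OR REPLACE MODEL `{dataset}.{model}`\n"
        "OPTIONS (\n"
        "  model_type = 'LOGISTIC_REG',\n"
        f"  input_label_cols = ['{label_col}'],\n"
        # Assumed opt-in registration; registration is not automatic.
        "  model_registry = 'VERTEX_AI',\n"
        f"  vertex_ai_model_id = '{vertex_model_id}'\n"
        ") AS\n"
        f"SELECT * FROM `{dataset}.{source_table}`"
    )

# Hypothetical names for illustration only.
sql = build_create_model_sql(
    "demo_ds", "churn_model", "churned", "training_rows", "churn_model"
)
print(sql)
```

In a pipeline, a string like this would be passed as the query of the BigQuery query-job component, so the training step stays pure SQL.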

307

MCQ (medium)

You are an ML engineer at a logistics company. You have deployed a deep learning model on Vertex AI Endpoints using a custom container with GPU acceleration. The model predicts delivery times based on route features. After one week, you notice that the endpoint's GPU utilization is consistently at 10%, but the prediction latency has increased by 50%. The number of prediction requests per second has remained stable. You check the container logs and see no errors. The model is served using TensorFlow Serving with batching enabled (batch size: 32, batch timeout: 100ms). The custom container uses a single NVIDIA T4 GPU. You have also set the Vertex AI endpoint to use autoscaling with minReplicaCount: 1 and maxReplicaCount: 5, and the CPU utilization target is 60%. Which action should you take to reduce latency?

A. Increase the minReplicaCount to 3 to handle requests in parallel.

B. Reduce the CPU utilization target to 40% to trigger more aggressive autoscaling.

C. Quantize the model to FP16 to reduce compute time per inference.

D. Increase the batch size to 64 and batch timeout to 200ms to improve GPU utilization.

Answer: D

Larger batch sizes allow the GPU to process more data per inference, increasing throughput and reducing per-request latency once the batch fills up.

Why this answer

The core issue is low GPU utilization (10%) despite increased latency, indicating that the GPU is underutilized and the bottleneck is likely in batching or data pipeline overhead. Increasing the batch size to 64 and batch timeout to 200ms allows TensorFlow Serving to accumulate more requests per batch, improving GPU throughput and reducing per-request latency by better leveraging GPU parallelism. This directly addresses the mismatch between low GPU utilization and high latency.

Exam trap

The trap here is that candidates focus on scaling or model optimization (A, B, C) without recognizing that low GPU utilization with high latency is a classic sign of inefficient batching, not insufficient compute or replicas.

How to eliminate wrong answers

Option A is wrong because increasing minReplicaCount adds more CPU-bound replicas, which does not address the GPU underutilization and may increase cost without reducing latency, as the GPU is already idle. Option B is wrong because reducing the CPU utilization target triggers more aggressive autoscaling based on CPU, but the bottleneck is GPU utilization, not CPU; this would add more replicas without improving GPU efficiency. Option C is wrong because quantizing to FP16 could reduce compute time per inference, but the problem is low GPU utilization, not compute-bound operations; quantization may not help if the GPU is already idle due to small batch sizes, and it could introduce accuracy loss.

308

MCQ (easy)

A company has deployed a TensorFlow model on Vertex AI Prediction for real-time inference. They notice that during peak hours, the prediction latency increases significantly, and some requests time out. The model requires GPU acceleration. Which action should they take to reduce latency and avoid timeouts?

A. Enable autoscaling with min replicas set to the base load and max replicas set to handle peak load, and ensure GPU quota is sufficient.

B. Switch to a larger machine type with more vCPUs.

C. Increase the number of replicas in the Vertex AI Prediction endpoint statically to handle peak load.

D. Use Cloud Functions to invoke the model asynchronously.

Answer: A

Autoscaling adjusts replicas dynamically, and sufficient GPU quota prevents resource bottlenecks.

Why this answer

Option A is correct because enabling autoscaling with appropriate min and max replicas dynamically adjusts capacity to handle peak load, and ensuring sufficient GPU quota prevents resource constraints. Option B is wrong because a larger machine type with more vCPUs adds CPU resources, which does not address GPU-bound inference. Option C is wrong because statically increasing replicas wastes resources during low traffic and cannot absorb spikes beyond the fixed capacity. Option D is wrong because Cloud Functions is not designed for GPU-accelerated inference and introduces additional latency.

309

MCQ (easy)

A startup wants to add sentiment analysis to their customer feedback app without any labeled data or custom model training. Which Google Cloud service should they use?

A. Cloud Natural Language API

B. AutoML Natural Language with manual labeling

C. Use BigQuery ML to train a text classification model

D. Train a custom sentiment model on Vertex AI

Answer: A

Pre-trained model available via API call.

Why this answer

The Cloud Natural Language API provides pre-trained models for sentiment analysis that require no labeled data or custom training. It offers a ready-to-use sentiment analysis feature via a simple API call, making it ideal for a startup that wants to add sentiment analysis without any machine learning expertise or data preparation.

Exam trap

Google Cloud often tests the distinction between pre-trained APIs and custom training services, where candidates mistakenly choose AutoML or Vertex AI because they think any ML task requires custom training, overlooking the existence of fully managed, pre-trained APIs like Cloud Natural Language API.

How to eliminate wrong answers

Option B is wrong because AutoML Natural Language with manual labeling requires labeled data and custom model training, which contradicts the requirement of no labeled data or custom training. Option C is wrong because BigQuery ML is designed for training models using SQL queries on structured data, not for pre-trained sentiment analysis, and it still requires labeled data and training. Option D is wrong because training a custom sentiment model on Vertex AI involves building, training, and deploying a custom model, which requires labeled data and significant ML effort, not a pre-built solution.

310

MCQ (hard)

A team trains a distributed TensorFlow model using the config above. After training, they deploy the model for online predictions. The model returns poor quality predictions. They suspect that the model was not trained correctly due to a configuration error. What is the most likely mistake?

A. The `scaleTier` is set to 'STANDARD_1' which only supports up to 3 workers.

B. The training job is using a custom container that does not match the requirements.

C. The model was exported incorrectly because the training job did not specify a `--model-export-path`.

D. The parameter server count should be at least equal to the worker count.

Answer: A

STANDARD_1 limits workers to 3; the actual job may have ignored the 10 worker setting.

Why this answer

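Option A is correct because 'STANDARD_1' is a small predefined scale tier whose worker count is capped at 3. The config sets 10 workers, which would be ignored or cause an error, so training may have run with fewer workers than intended, leading to a poor model.

Option B is wrong because a custom container is not required; Option C is wrong because the model-dir setting is fine; Option D is wrong because nothing in the config indicates a parameter server problem.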

311

Multi-Select (hard)

An ML engineer is deploying a large BERT-based natural language processing model for real-time inference on Vertex AI Prediction. The model has a large memory footprint (2GB) and experiences unpredictable traffic spikes up to 10x the baseline. The engineer needs to minimize latency and cost while handling spiky traffic. Which TWO actions should the engineer take? (Choose two.)

Select 2 answers

A. Configure the endpoint to use manual scaling with a fixed number of replicas equal to peak traffic.

B. Enable automatic scaling with a maximum of 3 replicas to limit cost.

C. Use a custom prediction routine with model quantization to reduce model size.

D. Set up model monitoring to detect prediction drift and retrain regularly.

E. Use a GPU machine type (NVIDIA T4) to accelerate inference.

Answers: C, E

Quantization reduces model size and inference latency, improving both cost and speed.

Why this answer

Option C is correct because model quantization reduces the memory footprint of a BERT model (e.g., from 2GB to ~500MB with INT8 quantization), which directly lowers inference latency and cost by enabling faster loading and more efficient use of hardware. This is critical for real-time inference with unpredictable traffic spikes, as smaller models scale more easily and reduce the need for excessive replicas.

Exam trap

The trap here is that candidates often assume GPU acceleration (Option E) is always the best choice for reducing latency, but for a 2GB BERT model with spiky traffic, quantization (Option C) can achieve similar latency improvements at a fraction of the cost, and the question explicitly asks to minimize both latency and cost, making quantization a more balanced solution.
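
The size arithmetic behind quantization can be sketched in a few lines. The example below applies simple symmetric INT8 quantization to a handful of made-up weights and assumes the 2GB footprint is stored as 32-bit floats; real toolchains use more elaborate schemes (per-channel scales, calibration), so this is only the core idea.

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in quantized]

# Hypothetical weights for illustration.
weights = [0.82, -1.27, 0.05, 0.0, 0.4]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage: 4 bytes per FP32 weight versus 1 byte per INT8 weight.
fp32_bytes = 2 * 1024**3          # the 2GB model, assumed to be FP32
int8_bytes = fp32_bytes // 4      # same weight count at 1 byte each

print(quantized, round(max_error, 5))
print(int8_bytes // 1024**2, "MB")
```

The rounding error per weight is bounded by half the scale, which is the accuracy cost traded for the fourfold reduction in size.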

312

MCQ (hard)

You are an ML engineer at a large e-commerce company. Your team has developed a product recommendation model using TensorFlow and deployed it on Vertex AI Endpoints for real-time inference. The model is retrained weekly using a Vertex AI Pipeline that reads new user interaction data from BigQuery, trains the model, evaluates it, and deploys the new version to the endpoint with a traffic split: 10% to the new model and 90% to the previous champion model. Recently, the team noticed that the new model's online prediction latency has increased significantly (from 50ms to 200ms) after deployment, causing timeouts for some requests. The training code has not changed, and the model size is similar. The pipeline uses a custom container with the same TensorFlow Serving image as before. The deployment step uses the same machine type (n1-standard-4) for the endpoint. What is the most likely cause of the latency increase?

A. The endpoint is using a machine type that is not optimized for the new model's computation.

B. The new model has a significantly different architecture that requires more computation.

C. The pipeline now includes a data validation step that modifies the SavedModel's serving signature, adding an extra preprocessing operation.

D. The new model is experiencing data skew because the training data distribution has changed.

Answer: C

A data validation step might have inadvertently added preprocessing ops, increasing latency.

Why this answer

Option C is correct because the pipeline now includes a data validation step that modifies the SavedModel's serving signature, adding an extra preprocessing operation. This additional operation runs during inference on Vertex AI Endpoints, increasing the per-request latency from 50ms to 200ms, even though the model architecture and size remain unchanged. The custom container and machine type are identical, so the latency increase must stem from a change in the serving graph itself.

Exam trap

Google Cloud often tests the concept that changes in the ML pipeline (like adding a data validation step) can alter the serving signature and increase latency, even when the model architecture and infrastructure remain unchanged, tricking candidates into focusing on hardware or data distribution instead.

How to eliminate wrong answers

Option A is wrong because the endpoint uses the same machine type (n1-standard-4) as before, so the machine is not the cause of the latency increase. Option B is wrong because the training code has not changed and the model size is similar, indicating the architecture is not significantly different. Option D is wrong because data skew affects prediction accuracy, not latency; it does not explain a 4x increase in inference time.

313

MCQ (medium)

The exhibit shows part of a Vertex AI Pipeline definition. The pipeline fails at the training step with an error: 'Missing required input: train_data'. What is the most likely cause?

A. The evaluation step expects a metric output but training does not produce it

B. The training step uses the wrong image tag

C. The container command for data_processing is incorrect

D. The data_processing step does not define any outputs

E. The pipeline is missing a deployment step

Answer: D

The pipeline must define an output from data_processing to feed into training.

Why this answer

The error 'Missing required input: train_data' indicates that the training step expects an input artifact named 'train_data', but no upstream step provides it. In Vertex AI Pipelines, a component's output must be explicitly defined and connected to the downstream component's input. Since the data_processing step does not define any outputs, it cannot produce the 'train_data' artifact, causing the training step to fail.

Exam trap

Google Cloud often tests the distinction between runtime errors (e.g., container image issues) and graph validation errors (e.g., missing input/output connections), leading candidates to confuse a missing output definition with a container or command misconfiguration.

How to eliminate wrong answers

Option A is wrong because the error is about a missing input, not a missing metric output; the evaluation step's expectations are irrelevant to the training step's input requirement. Option B is wrong because an incorrect image tag would cause a container runtime error (e.g., 'ImagePullBackOff'), not a 'Missing required input' error, which is a pipeline graph validation issue. Option C is wrong because an incorrect container command for data_processing would cause that step to fail, but the error specifically points to the training step's missing input, not a failure in data_processing.

Option E is wrong because a missing deployment step would not cause a training step input error; deployment occurs after training and evaluation, and its absence would not affect the training step's input requirements.
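
The failure mode can be reproduced with a toy graph check. The sketch below is plain Python, not the Kubeflow Pipelines SDK: the step dictionaries and the `validate` helper are hypothetical stand-ins for a pipeline definition and its wiring validation.

```python
def validate(steps):
    """Return wiring errors for an ordered list of pipeline steps.

    A step's inputs must have been declared as outputs by an earlier step.
    """
    available = set()
    errors = []
    for step in steps:
        for name in step["inputs"]:
            if name not in available:
                errors.append(f"{step['name']}: Missing required input: {name}")
        available.update(step["outputs"])
    return errors

# data_processing declares no outputs, so training has nothing to consume.
broken = [
    {"name": "data_processing", "inputs": [], "outputs": []},
    {"name": "training", "inputs": ["train_data"], "outputs": ["model"]},
]

# Declaring the output on the upstream step resolves the error.
fixed = [
    {"name": "data_processing", "inputs": [], "outputs": ["train_data"]},
    {"name": "training", "inputs": ["train_data"], "outputs": ["model"]},
]

print(validate(broken))
print(validate(fixed))
```

The error is reported against the training step even though the fix belongs in data_processing, which mirrors the situation in the question.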

314

Multi-Select (easy)

Which TWO of the following are best practices for versioning ML models and datasets?

Select 2 answers

A. Use Vertex AI Model Registry for model versioning and lineage tracking.

B. Use semantic versioning for datasets.

C. Store datasets and models in the same Cloud Storage bucket with version prefixes.

D. Use Git LFS for dataset versioning.

E. Use Cloud Data Catalog to tag dataset versions.

Answers: A, B

Model Registry is designed for model versioning and captures lineage.

Why this answer

Vertex AI Model Registry is a managed service that automatically tracks model versions, artifacts, and lineage metadata (e.g., training runs, evaluation metrics, and source datasets). It provides a centralized hub for model governance, enabling reproducibility and auditability without manual versioning overhead. This makes it a best practice for versioning ML models in a production MLOps workflow.

Exam trap

Google Cloud often tests the misconception that storing artifacts in the same bucket with version prefixes is sufficient for versioning, when in fact it lacks lineage tracking, automated metadata, and governance controls that dedicated registries and versioning schemes provide.

315

MCQ (medium)

A financial services firm uses Vertex AI AutoML Natural Language to classify customer feedback into categories (positive, neutral, negative). They notice that the model performs poorly on neutral and negative classes, with high false negatives for negative. The dataset has 10,000 samples: 8,000 positive, 1,000 neutral, 1,000 negative. They have trained the model with automatic data split and default hyperparameters. Which course of action should they take to improve classification of minority classes?

A. Use a custom model with a weighted loss function.

B. Enable the 'weighted' option in AutoML NLP to handle class imbalance.

C. Increase the number of training node hours.

D. Set the data split to 50/25/25 for train/validation/test.

Answer: B

This built-in option adjusts weights for minority classes, improving performance.

Why this answer

Option B is correct because AutoML Natural Language provides a built-in 'weighted' option that automatically adjusts the loss function to penalize misclassifications of minority classes more heavily, directly addressing the class imbalance without requiring custom model development. This is the simplest and most effective way to improve recall for the neutral and negative classes within the AutoML framework.

Exam trap

Google Cloud often tests the misconception that any class imbalance problem requires a custom model or manual data augmentation, when in fact AutoML's built-in 'weighted' option is the prescribed low-code solution for such scenarios.

How to eliminate wrong answers

Option A is wrong because using a custom model with a weighted loss function would require moving away from AutoML's low-code paradigm, which contradicts the scenario's implicit requirement for a low-code solution; AutoML already handles weighting internally via the 'weighted' option. Option C is wrong because increasing training node hours only provides more compute time for the same training process and does not address the fundamental issue of class imbalance; it may lead to overfitting on the majority class. Option D is wrong because changing the data split ratio (e.g., 50/25/25) does not mitigate class imbalance; it merely redistributes the same skewed proportions across training, validation, and test sets, leaving the model still biased toward the majority positive class.
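
To make the idea of class weighting concrete, the sketch below computes inverse-frequency ("balanced") weights for the dataset in the question. This is the general technique, shown under the assumption that a weighted loss scales each example by its class weight; it is not a description of AutoML's internal behaviour.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}

# Class counts from the scenario: 8,000 / 1,000 / 1,000.
counts = {"positive": 8000, "neutral": 1000, "negative": 1000}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")

# Each class now contributes equally to the weighted total.
contributions = {label: counts[label] * weights[label] for label in counts}
ratio = weights["negative"] / weights["positive"]
```

A misclassified negative example then costs eight times as much as a misclassified positive one, which counteracts the 8:1 imbalance.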

316

MCQ (hard)

A healthcare startup deployed a Vertex AI AutoML Vision model to detect anomalies in medical images. The model performs well on the test set but has high latency in production, exceeding the 2-second SLA. The images are stored in Cloud Storage and are processed via a Cloud Function triggered by new uploads. What is the most likely cause?

A. The images are being resized and preprocessed in the Cloud Function, adding latency.

B. The model is deployed on a small machine type with insufficient compute.

C. The Cloud Function has a cold start issue.

D. The AutoML Vision endpoint is not using GPU acceleration.

Answer: B

A small machine type (e.g., n1-standard-2) can cause high inference latency under load.

Why this answer

Option B is correct because the most likely cause of high latency exceeding the 2-second SLA is that the Vertex AI AutoML Vision model is deployed on a small machine type (e.g., n1-standard-2 or lower) with insufficient compute resources (CPU/memory). AutoML Vision endpoints use container-based serving, and underpowered machines cannot handle the inference load efficiently, especially for high-resolution medical images, leading to response times beyond the SLA.

Exam trap

The trap here is that candidates confuse cold start latency (Cloud Function) with inference latency (model serving), or assume GPU acceleration is optional for AutoML endpoints, when in fact AutoML Vision automatically uses GPUs and the real bottleneck is the compute capacity of the serving machine.

How to eliminate wrong answers

Option A is wrong because image resizing and preprocessing in the Cloud Function typically add minimal latency (milliseconds) and are not the primary cause of exceeding a 2-second SLA; the bottleneck is inference, not preprocessing. Option C is wrong because cold starts in Cloud Functions add 1-2 seconds at most and can be mitigated with min instances, but the question states the model performs well on the test set, implying the issue is inference latency, not function initialization. Option D is wrong because AutoML Vision endpoints automatically use GPU acceleration when available and appropriate; the lack of GPU is not a configurable option for AutoML endpoints, and the latency issue is more likely due to insufficient CPU/memory on the serving machine.

317

Matching (medium)

Match each optimization algorithm to its characteristic.

Concepts and matches

SGD: Stochastic gradient descent with constant learning rate

Adam: Adaptive moment estimation with per-parameter learning rates

RMSprop: Root mean square propagation, adapts learning rate per parameter

Adagrad: Adaptive gradient algorithm, reduces learning rate for frequent features

Momentum: Accelerates SGD by adding a fraction of previous update

Why these pairings

Optimizers affect training convergence.
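
The update rules behind these descriptions can be compared on a one-dimensional problem. The sketch below minimizes f(x) = x² with textbook forms of each rule; the learning rates and step count are arbitrary choices for illustration, not recommended settings.

```python
import math

def grad(x):
    """Gradient of f(x) = x**2."""
    return 2.0 * x

def sgd(x, steps=200, lr=0.1):
    # Constant learning rate, no state.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum(x, steps=200, lr=0.1, mu=0.9):
    # Carries a fraction mu of the previous update forward.
    v = 0.0
    for _ in range(steps):
        v = mu * v + lr * grad(x)
        x -= v
    return x

def adagrad(x, steps=200, lr=0.5, eps=1e-8):
    # Accumulates all squared gradients, so the step size only shrinks.
    g2 = 0.0
    for _ in range(steps):
        g = grad(x)
        g2 += g * g
        x -= lr * g / (math.sqrt(g2) + eps)
    return x

def rmsprop(x, steps=200, lr=0.01, rho=0.9, eps=1e-8):
    # Uses a decaying average of squared gradients instead of a running sum.
    s = 0.0
    for _ in range(steps):
        g = grad(x)
        s = rho * s + (1 - rho) * g * g
        x -= lr * g / (math.sqrt(s) + eps)
    return x

def adam(x, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Combines a momentum-style first moment with an RMSprop-style second moment.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)   # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

for optimizer in (sgd, momentum, adagrad, rmsprop, adam):
    print(optimizer.__name__, optimizer(5.0))
```

All five move the starting point 5.0 toward the minimum at 0; they differ in how the step size is scaled and how much history each rule keeps.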

318

MCQ (medium)

Refer to the exhibit. What is the purpose of this query?

A. To detect data drift

B. To find prediction errors in Cloud Logging

C. To count all prediction requests

D. To monitor model latency

Answer: B

The filter uses PredictionError type and ERROR severity.

Why this answer

The query filters Cloud Logging entries for the string 'prediction failed', which directly indicates prediction errors logged by the ML prediction service. This is a common pattern for monitoring model inference failures in production, not for measuring drift, counting requests, or measuring latency.

Exam trap

Google Cloud often tests the distinction between log-based monitoring (for errors) and metric-based monitoring (for counts, latency, drift), so candidates mistakenly choose 'count all prediction requests' when the query clearly filters for failures, not all requests.

How to eliminate wrong answers

Option A is wrong because data drift detection requires comparing feature distributions over time, not searching for error log messages. Option C is wrong because counting all prediction requests would require a metric like `prediction_count` or a log-based metric counting all prediction entries, not filtering for failures. Option D is wrong because monitoring model latency requires timing metrics (e.g., `prediction_latency_ms`) or log entries with duration fields, not a search for 'prediction failed'.

319

Multi-Select (easy)

Your team deploys a model using Vertex AI Endpoints with autoscaling. Which TWO metrics are most important to monitor in order to optimize cost and performance? (Choose two.)

Select 2 answers

A. Number of active nodes in the endpoint.

B. Number of requests per minute.

C. CPU utilization of the serving containers.

D. Error rate (HTTP 4xx/5xx).

E. P99 prediction latency.

Answers: B, C

Indicates traffic patterns.

Why this answer

Option B is correct because the number of requests per minute directly drives autoscaling behavior in Vertex AI Endpoints. Monitoring this metric allows you to right-size the number of serving nodes to match traffic patterns, avoiding over-provisioning (cost) or under-provisioning (performance). Option C is correct because CPU utilization of the serving containers indicates whether the model is compute-bound or idle; high CPU suggests the need for more nodes, while low CPU suggests over-provisioning, directly impacting both cost and latency.

Exam trap

Google Cloud often tests the distinction between metrics that are direct inputs to autoscaling decisions (requests per minute, CPU utilization) versus metrics that are outcomes of scaling (active nodes, latency, error rate), leading candidates to mistakenly select outcome metrics as primary optimization drivers.
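
How CPU utilization feeds a scaling decision can be sketched with a simple target-tracking rule. The formula below is the generic proportional rule used by many autoscalers, assumed here for illustration; it is not Vertex AI's documented algorithm, and the replica limits and utilization figures are made up.

```python
def desired_replicas(current, cpu_pct, target_pct, min_replicas, max_replicas):
    """Scale replicas so average CPU lands near the target, within limits.

    Uses integer ceiling division: ceil(current * cpu_pct / target_pct).
    """
    wanted = (current * cpu_pct + target_pct - 1) // target_pct
    return max(min_replicas, min(max_replicas, wanted))

# Two replicas at 90% CPU against a 60% target: scale out to 3.
print(desired_replicas(2, 90, 60, min_replicas=1, max_replicas=5))

# Two replicas at 20% CPU: scale in to the minimum of 1.
print(desired_replicas(2, 20, 60, min_replicas=1, max_replicas=5))

# Four replicas at 100% CPU would want 7, but the maximum caps it at 5.
print(desired_replicas(4, 100, 60, min_replicas=1, max_replicas=5))
```

Request volume drives the CPU figure, and the CPU figure drives the replica count, which is why both are watched together when tuning cost against performance.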

320

Multi-Select (easy)

A data analyst wants to build a binary classification model using a low-code ML solution on Google Cloud. The dataset is stored in BigQuery and contains 500,000 rows with 20 features, including categorical and numerical columns. The analyst has minimal coding experience and needs to deploy the model as an API endpoint for real-time predictions. Which two Google Cloud services should the analyst use to accomplish this task with minimal code? Choose two options.

Select 2 answers

A.BigQuery ML

B.Vertex AI Endpoints

C.Cloud Functions

D.Vertex AI Workbench

E.AutoML Tables

AnswersB, E

Vertex AI Endpoints provides a managed option to deploy trained models as REST APIs with autoscaling, ideal for real-time predictions with minimal code.

Why this answer

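Vertex AI Endpoints is correct because it provides a managed service to deploy trained models as REST API endpoints for real-time predictions with minimal code. AutoML Tables is correct because it trains a tabular classification model on data imported from BigQuery without the analyst writing training code. The analyst can then deploy the AutoML Tables model directly to a Vertex AI Endpoint, enabling low-code deployment and serving.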

Exam trap

Google Cloud often tests the distinction between model training services (BigQuery ML, AutoML Tables) and model deployment services (Vertex AI Endpoints), leading candidates to incorrectly select BigQuery ML for real-time API deployment when it only supports batch inference.

321

MCQhard

A team uses Vertex AI Pipelines with CustomJob components that pull training code from a Cloud Source Repository. The pipeline fails with a 'Permission denied' error when trying to access the repository. The service account used by the pipeline has the 'Source Repository Viewer' role. What is the likely issue?

A.The training code contains a dependency that is not available in the custom container

B.The 'Source Repository Viewer' role is insufficient; the service account needs 'Source Repository Reader' or higher

C.The pipeline is running in a different project than the repository; cross-project access is not supported

D.The repository URL is incorrectly formatted; use the SSH URL instead of HTTPS

AnswerB

Reader role allows cloning and fetching, while Viewer only allows browsing.

Why this answer

The 'Source Repository Viewer' role only allows listing and viewing repository metadata, not reading the actual source code. To clone or pull code from a Cloud Source Repository, the service account needs the 'Source Repository Reader' role (or higher), which grants the read permissions (such as `source.repos.get`) required for Git operations. The pipeline's CustomJob component fails because the service account lacks these permissions when attempting to access the repository.

Exam trap

Google Cloud often tests the distinction between IAM roles that grant read-only access to metadata versus those that grant actual data access, leading candidates to assume 'Viewer' is sufficient for reading source code.

How to eliminate wrong answers

Option A is wrong because a missing dependency would cause a runtime error during training, not a 'Permission denied' error when accessing the repository. Option C is wrong because cross-project access to Cloud Source Repositories is fully supported as long as the service account has the appropriate IAM roles on the repository's project. Option D is wrong because both HTTPS and SSH URLs are supported for Cloud Source Repositories; the error is a permissions issue, not a URL format issue.

322

MCQeasy

A company uses Cloud Composer to orchestrate their ML pipelines. They notice that tasks are being queued but not executed, causing delays. What is the most likely cause?

A.The Airflow web server is down

B.The DAG file is corrupted

C.The Cloud Storage bucket containing DAGs is not accessible

D.The Airflow worker resources are exhausted

AnswerD

If workers are busy or the cluster is under-provisioned, tasks will be queued.

Why this answer

When tasks are queued but not executed, it typically indicates that the Airflow workers have no available slots to pick up new tasks. In Cloud Composer, the Celery executor distributes tasks to workers; if all worker concurrency slots are saturated or the worker node pool is under-provisioned, tasks remain in the 'queued' state until a worker becomes free. This is the most likely cause given the symptom of tasks being queued without execution.

Exam trap

The trap here is that candidates confuse the roles of Airflow components (web server, scheduler, worker) and assume a UI or DAG access issue causes queued tasks, when in reality the worker capacity is the bottleneck.

How to eliminate wrong answers

Option A is wrong because the Airflow web server serves the UI and does not schedule or execute tasks; if it were down, the UI would be inaccessible but tasks could still be queued and executed by workers. Option B is wrong because a corrupted DAG file would cause a parse error, preventing the DAG from being scheduled or appearing in the UI, not leaving tasks in a queued state. Option C is wrong because if the Cloud Storage bucket containing DAGs were not accessible, the DAGs would not be synced to the Airflow environment at all, resulting in missing DAGs rather than queued tasks.

323

Multi-Selecthard

Which THREE actions should be taken to manage model versions effectively?

Select 3 answers

A.Delete old versions immediately

B.Use Vertex AI Model Registry

C.Set up model evaluation alerts

D.Use the same model name for all versions

E.Assign version aliases like 'champion' and 'experiment'

AnswersB, C, E

Model Registry provides versioning and deployment control.

Why this answer

Vertex AI Model Registry is a centralized repository that tracks, versions, and manages ML models. It enables you to organize models, assign aliases (like 'champion' or 'experiment'), and control deployment, ensuring reproducibility and governance across the model lifecycle.

Exam trap

Google Cloud often tests the misconception that deleting old versions is a best practice for storage optimization, when in reality versioning requires retaining history for reproducibility and rollback, and that aliases are the correct mechanism for labeling model stages.

324

MCQhard

A company serves a PyTorch model using a custom container on Vertex AI Prediction. They notice that after a few hours, the endpoint returns 502 errors. The logs show 'Out of memory' errors. The container has a memory limit of 4GB, and the model loads a 3GB vocabulary file. What is the most likely cause and best fix?

A.Increase the container memory to 8GB.

B.Load the vocabulary file once at startup and reuse it.

C.Increase the number of replicas to distribute load.

D.Switch to Vertex AI Batch Prediction.

AnswerB

Prevents repeated loading, solving OOM.

Why this answer

The 502 errors and 'Out of memory' errors indicate that the container is running out of memory during inference. Since the model loads a 3GB vocabulary file, and the container has only 4GB of memory, loading this file repeatedly for each prediction request (e.g., inside the prediction handler) would quickly exhaust memory. The correct fix is to load the vocabulary file once at container startup and reuse it across all requests, which is a standard best practice for serving models with large static assets.

Exam trap

Google Cloud often tests the misconception that OOM errors are always solved by increasing memory, but the trap here is that the real issue is inefficient resource reuse—loading a large file per request—rather than insufficient total memory.

How to eliminate wrong answers

Option A is wrong because simply increasing memory to 8GB does not address the root cause—the vocabulary file is being loaded repeatedly, which will still cause memory bloat and eventual OOM errors, just at a higher threshold. Option C is wrong because increasing replicas distributes incoming traffic but does not fix the per-container memory leak caused by repeated vocabulary loading; each replica would still suffer the same OOM issue. Option D is wrong because switching to Batch Prediction is for offline, asynchronous processing, not for real-time serving, and does not solve the memory management problem within the container.
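
The load-once pattern can be sketched as follows. This is a minimal illustration, assuming a small in-memory dictionary stands in for the 3GB vocabulary file; `get_vocabulary`, `predict`, and `LOAD_COUNT` are hypothetical names, not part of any Vertex AI API.

```python
import functools

LOAD_COUNT = 0  # counts how often the expensive load actually runs


@functools.lru_cache(maxsize=1)
def get_vocabulary():
    """Load the vocabulary once; later calls reuse the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Stand-in for reading a large vocabulary file from disk (hypothetical data).
    return {"hello": 0, "world": 1, "<unk>": 2}


def predict(tokens):
    """Hypothetical request handler: maps tokens to ids without reloading."""
    vocab = get_vocabulary()
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


get_vocabulary()  # warm the cache once at container startup
for _ in range(1000):
    predict(["hello", "there"])
print(LOAD_COUNT)  # the loader ran a single time despite 1000 requests
```

Because the handler only reads from the cached object, memory use stays flat across requests instead of growing with each call.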

325

Multi-Selectmedium

Your team manages multiple ML models on Vertex AI. You need to implement a centralized monitoring solution to track model performance over time. Which TWO approaches should you consider? (Choose two.)

Select 2 answers

A.Store all prediction logs in BigQuery and analyze using SQL.

B.Use Cloud Source Repositories to track model code versions.

C.Create Cloud Monitoring dashboards and alerts based on Vertex AI metrics.

D.Use Vertex AI Model Monitoring to detect training-serving skew and feature drift for each model.

E.Enable Cloud Billing budgets to track cost per model.

AnswersC, D

Centralized view of all models.

Why this answer

Option C is correct because Cloud Monitoring provides centralized dashboards and alerting for Vertex AI metrics such as prediction latency, request count, and error rates, enabling you to track model performance over time without additional infrastructure. Option D is correct because Vertex AI Model Monitoring is purpose-built to detect training-serving skew and feature drift by comparing serving data distributions to training data, which is essential for maintaining model performance in production.

Exam trap

The trap here is that candidates may confuse logging (Option A) or cost tracking (Option E) with performance monitoring, or mistakenly think version control (Option B) is part of monitoring, when the question specifically asks for centralized monitoring of model performance over time.
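
As an illustration of what a feature-drift check computes, the sketch below compares category frequencies in a training sample against a serving sample using the L-infinity distance (the largest absolute difference in any single category's frequency), a common distance measure for categorical features. The sample data, function names, and the 0.2 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def category_frequencies(values):
    """Return the share of each category in the sample."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, serving):
    """Largest absolute frequency difference across all observed categories."""
    base_freq = category_frequencies(baseline)
    serve_freq = category_frequencies(serving)
    categories = set(base_freq) | set(serve_freq)
    return max(abs(base_freq.get(c, 0.0) - serve_freq.get(c, 0.0))
               for c in categories)


# Hypothetical 'channel' feature: balanced in training, skewed at serving time.
training = ["web"] * 50 + ["mobile"] * 50
serving = ["web"] * 25 + ["mobile"] * 75
distance = l_infinity_distance(training, serving)
DRIFT_THRESHOLD = 0.2  # assumed alerting threshold
print(distance, distance >= DRIFT_THRESHOLD)
```

A managed monitoring service runs this kind of comparison per feature on a schedule, so the team only needs to choose thresholds and alert targets.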

326

MCQhard

A company uses Vertex AI Experiments to track ML training runs. They want to enforce that all training runs use only approved libraries from a central Artifact Registry to ensure compliance. Which approach should they take?

A.Use a startup script in the training VM to install libraries from Artifact Registry.

B.Use Vertex AI Pipelines with a component that pulls libraries from Artifact Registry at runtime.

C.Create a custom Vertex AI training container that installs libraries from Artifact Registry at build time and restrict training job submission to that container using IAM.

D.Configure Vertex AI Training with a custom job configuration that specifies the library sources.

E.Use Cloud Build to build the training image with approved libraries and push to Container Registry, then restrict training jobs to that image.

AnswerC

This encapsulates libraries in the container and controls usage.

Why this answer

Option C is correct because it enforces compliance at the image level: by building a custom container that installs only approved libraries from Artifact Registry at build time, and then restricting training job submission to that specific container using IAM, you ensure that no unauthorized libraries can be introduced at runtime. This approach eliminates the risk of developers injecting unapproved dependencies via startup scripts or runtime pulls, and it aligns with the principle of immutable infrastructure for ML training.

Exam trap

The trap here is that candidates confuse runtime library installation (options A, B, D) with build-time image hardening (option C), overlooking that only a pre-built, IAM-restricted container can truly prevent unauthorized dependencies from being loaded during training.

How to eliminate wrong answers

Option A is wrong because a startup script runs after the VM starts, allowing users to modify or override the library list at runtime, which does not enforce compliance. Option B is wrong because pulling libraries from Artifact Registry at runtime still permits dynamic changes to dependencies, and the pipeline component itself could be modified to pull from other sources. Option D is wrong because a custom job configuration only specifies library sources as metadata; it does not prevent the training job from installing additional or different libraries during execution.

Option E is wrong because it pushes the image to Container Registry (now deprecated in favor of Artifact Registry) and does not restrict training jobs to that image via IAM—any user with permissions could submit a job using a different image.

327

MCQhard

A financial services company uses Vertex AI Pipelines to train and deploy models for fraud detection. The ML team consists of data scientists who develop models and ML engineers who deploy them. They use a CI/CD pipeline with Cloud Build to build and push Docker images to Artifact Registry, then trigger Vertex AI Pipelines. Recently, the team noticed that a model deployed to production was trained on a dataset that had not been approved by the data governance team. Upon investigation, they found that a data scientist accidentally used an unapproved version of the training data by specifying a Cloud Storage path that was not the latest approved dataset. The company needs to enforce that only approved datasets are used in training jobs. Which approach should they take?

A.Implement a manual approval process where data scientists request dataset paths from the data governance team before each training run.

B.After training, run a validation step that checks if the dataset used matches the latest approved version, and roll back if not.

C.Use a curated dataset registry in BigQuery or Cloud Storage with IAM conditions that allow access only to datasets tagged as 'approved'. Modify the CI/CD pipeline to pass only approved dataset references to the training job.

D.Restrict all Cloud Storage buckets to be read-only for the data scientists, and have ML engineers copy approved datasets to a separate bucket.

AnswerC

This automates governance by restricting training to approved datasets via IAM and pipeline configuration.

Why this answer

Option C is correct because it enforces governance at the source by using IAM conditions to restrict access to only approved datasets, preventing unauthorized data from being used in training. This approach integrates with the CI/CD pipeline to automatically pass only approved dataset references, eliminating the risk of human error in specifying Cloud Storage paths.

Exam trap

Google Cloud often tests the distinction between reactive validation (Option B) and proactive enforcement (Option C), where candidates mistakenly choose a post-training check that wastes resources instead of a preventive IAM-based control.

How to eliminate wrong answers

Option A is wrong because a manual approval process is error-prone, slow, and does not scale; it relies on human compliance rather than automated enforcement, leaving the system vulnerable to accidental misuse. Option B is wrong because it is reactive—it detects the issue after training has already occurred, wasting compute resources and potentially exposing the model to unapproved data before rollback. Option D is wrong because it restricts data scientists' access entirely, which hinders their ability to experiment and develop models; it also shifts the burden to ML engineers without addressing the root cause of dataset version control.

328

MCQmedium

A team of ML engineers is collaborating on a project using Vertex AI. They want to ensure that only approved models are deployed to production. Which approach should they use?

A.Store all models in a Cloud Storage bucket and manually control access via IAM permissions.

B.Deploy models directly from training jobs to an endpoint without version tracking.

C.Use Vertex AI Model Registry with version aliases to manage model versions and promote them after approval.

D.Use Cloud Dataflow to transform raw predictions and then store them in BigQuery for analysis.

AnswerC

Model Registry provides version control, staging, and alias-based deployment.

Why this answer

Vertex AI Model Registry provides a centralized repository for managing model versions, with support for version aliases (e.g., 'champion', 'challenger') that allow teams to promote models to production only after approval. This ensures governance and traceability, meeting the requirement that only approved models are deployed.

Exam trap

The trap here is that candidates may confuse storage access control (IAM) with model lifecycle governance, or assume that any data pipeline tool (Dataflow) can manage model approvals, when in fact only a dedicated model registry with version aliases provides the required approval workflow and traceability.

How to eliminate wrong answers

Option A is wrong because storing models in Cloud Storage with manual IAM control lacks version tracking, approval workflows, and integration with Vertex AI's deployment services, making it error-prone and unscalable for production governance. Option B is wrong because deploying directly from training jobs without version tracking bypasses model validation, approval gates, and rollback capabilities, violating the requirement for controlled production deployments. Option D is wrong because Cloud Dataflow is a data processing service for stream/batch pipelines, not a model management or approval mechanism; it is irrelevant to controlling which models are deployed.

329

MCQeasy

A company deploys an online prediction model serving 100 requests per second. They are optimizing for both latency and throughput. Which monitoring strategy should they use?

A.Monitor only the request count and set an alert if it drops below a threshold.

B.Set a single alert on the 99th percentile latency and ignore throughput since it's already high.

C.Monitor the error rate and set an alert if it exceeds 1%.

D.Monitor both the p50 and p99 latency, and the request count. Create a dashboard showing latency vs. throughput at different load levels.

AnswerD

Allows understanding of the relationship.

Why this answer

Option D is correct because monitoring both p50 and p99 latency alongside request count provides a comprehensive view of system performance under load. Latency percentiles reveal tail behavior (p99) and typical user experience (p50), while request count tracks throughput. A dashboard correlating latency vs. throughput at different load levels is essential for identifying performance cliffs or degradation before failures occur, aligning with best practices for production ML inference systems.

Exam trap

The trap here is that candidates often focus on a single metric (e.g., error rate or p99 latency) and overlook the need for multi-metric correlation, especially the latency-throughput trade-off, which is a core concept in monitoring ML systems under production load.

How to eliminate wrong answers

Option A is wrong because monitoring only request count and alerting on a drop below threshold ignores latency and error rate, missing critical issues like increased response times or silent failures that degrade user experience without reducing request count. Option B is wrong because setting a single alert on p99 latency and ignoring throughput neglects the trade-off between latency and throughput; high throughput can mask latency spikes, and p99 alone does not capture system capacity limits or performance under varying load. Option C is wrong because monitoring only error rate and alerting on 1% misses latency degradation and throughput drops; a system can have low error rates but high latency (e.g., due to queue buildup) or reduced throughput, both of which violate performance objectives.
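
To show why p50 and p99 tell different stories, the following sketch computes both from a list of latencies using the nearest-rank method. The latency values are made up for illustration, and a monitoring backend may use a different percentile estimator.

```python
def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) using integers
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [20] * 95 + [400] * 5
summary = {
    "request_count": len(latencies_ms),
    "p50_ms": percentile(latencies_ms, 50),
    "p99_ms": percentile(latencies_ms, 99),
}
print(summary)
```

Here the median looks healthy at 20 ms while the p99 sits at 400 ms, which is the kind of gap a single-metric alert would hide.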

330

Multi-Selectmedium

Which TWO of the following are best practices for managing data in a collaborative machine learning environment on Google Cloud?

Select 2 answers

A.Always replicate data across multiple regions to ensure low latency.

B.Implement fine-grained access control using IAM conditions.

C.Use Cloud Data Catalog to discover and annotate datasets.

D.Store all raw data in a single Cloud Storage bucket for easy access.

E.Use data versioning with tools like DVC or Dataflow to track changes.

AnswersC, E

Data Catalog aids in data governance and collaboration.

Why this answer

Option C is correct because Cloud Data Catalog provides a managed metadata management service that allows teams to discover, annotate, and manage datasets across Google Cloud. It enables data scientists to search for datasets by tags, descriptions, and schema, which is essential for collaboration and data governance in a multi-user ML environment.

Exam trap

Google Cloud often tests the misconception that 'replication equals performance' or that 'single bucket simplicity is best,' when in reality collaborative ML requires discoverability (Data Catalog) and reproducibility (versioning) over raw storage or access control alone.

331

MCQmedium

A company deploys a model on Vertex AI Prediction with autoscaling enabled. They notice that during a traffic spike, new instances take several minutes to become available, causing high latency. What is the best solution?

A.Disable autoscaling and use a fixed number of replicas

B.Increase the max replicas setting

C.Decrease the machine type to reduce provisioning time

D.Set a higher min replicas to maintain a baseline of warm instances

AnswerD

Warm instances reduce latency during spikes.

Why this answer

Option D is correct because setting a higher min replicas ensures that a baseline number of instances are always warm and ready to serve traffic. During a traffic spike, new instances still take time to provision (cold start), but the warm instances handle the initial surge without latency spikes. This directly addresses the observed high latency during spikes.

Exam trap

Google Cloud often tests the misconception that increasing max replicas or decreasing machine type solves cold-start latency, when the real solution is maintaining a warm baseline via min replicas.

How to eliminate wrong answers

Option A is wrong because disabling autoscaling and using a fixed number of replicas eliminates elasticity, leading to either over-provisioning (cost) or under-provisioning (latency) during variable traffic. Option B is wrong because increasing max replicas only raises the ceiling for scaling out; it does not reduce the cold-start provisioning time for new instances during a spike. Option C is wrong because decreasing the machine type reduces compute capacity per instance, which can increase latency under load, and does not meaningfully reduce provisioning time (which is dominated by container image pull and model loading, not machine type).
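
A rough back-of-the-envelope model shows why the warm baseline matters. The sketch below estimates how many requests exceed warm capacity while new replicas are still starting; the traffic rate, per-replica throughput, and cold-start duration are all assumed numbers, and the function is hypothetical.

```python
def unserved_requests(spike_rps, per_replica_rps, min_replicas, cold_start_seconds):
    """Requests exceeding warm capacity while new replicas are still starting."""
    warm_capacity_rps = min_replicas * per_replica_rps
    shortfall_rps = max(0, spike_rps - warm_capacity_rps)
    return shortfall_rps * cold_start_seconds


# Assumed spike of 100 requests/s, 20 requests/s per replica, 120 s cold start.
for min_replicas in (1, 3, 5):
    backlog = unserved_requests(spike_rps=100, per_replica_rps=20,
                                min_replicas=min_replicas, cold_start_seconds=120)
    print(min_replicas, backlog)
```

Under these assumptions the backlog shrinks as min replicas rise and disappears once the warm replicas alone can absorb the spike, at the cost of paying for that baseline during quiet periods.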

332

Drag & Dropmedium

Drag and drop the steps to create and deploy a custom ML model on Vertex AI using a container in the correct order.

Steps

Order

Why this order

First, build and push the container, then register the model, deploy to an endpoint, and finally test.

333

MCQhard

Your team has deployed a PyTorch model using a custom container on Vertex AI Prediction. The model uses dynamic batching to combine incoming requests. You notice that the average latency is 150 ms, but the 99th percentile latency is 2 seconds. Cloud Monitoring shows that the CPU is idle much of the time, and GPU utilization is around 70%. The model is deployed on a single n1-standard-4 with a T4 GPU. You suspect the issue is related to request queuing. Which change would most effectively reduce tail latency?

A.Add a second replica to share the load.

B.Increase the batch timeout to allow larger batches to form, reducing the number of batches.

C.Decrease the batch size to reduce processing time per batch.

D.Implement a priority queue to handle high-priority requests first.

AnswerA

More replicas reduce queue depth and tail latency.

Why this answer

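Option A is correct because adding a replica reduces the queue length per replica, and therefore the time requests spend waiting. Option B is wrong because a longer batch timeout makes requests wait longer for a batch to form, which can increase tail latency. Option C is wrong because a smaller batch size may shorten processing time per batch but does not remove the queuing delay.

Option D is wrong because a priority queue adds complexity and does not address the root cause.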
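
The queuing effect can be sketched with a small simulation. It is deliberately simplified: it ignores dynamic batching, assumes a fixed service time per request, and sends each request to whichever replica becomes free first. The function and the burst of ten requests are illustrative assumptions.

```python
import heapq


def latencies(arrival_times, service_time, replicas):
    """FIFO dispatch: each request goes to the replica that frees up first."""
    free_at = [0.0] * replicas  # min-heap of times at which each replica is free
    heapq.heapify(free_at)
    result = []
    for arrival in sorted(arrival_times):
        start = max(arrival, heapq.heappop(free_at))  # wait if replica is busy
        finish = start + service_time
        heapq.heappush(free_at, finish)
        result.append(finish - arrival)  # queue wait plus service time
    return result


burst = [0.0] * 10  # ten requests arriving at the same instant
one_replica = latencies(burst, service_time=0.1, replicas=1)
two_replicas = latencies(burst, service_time=0.1, replicas=2)
print(max(one_replica), max(two_replicas))
```

The fastest request is served equally quickly in both cases, but the slowest request in the burst waits roughly half as long with two replicas, which is the tail-latency improvement the answer describes.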

334

MCQmedium

A company uses Vertex AI Pipelines for model training and deployment. The pipeline includes a model evaluation step that produces metrics. If the metrics are below a threshold, the pipeline should fail and not deploy. Which component should they use?

A.Use a conditional operator in the pipeline to skip or fail based on metrics.

B.A Python component that uses the SDK to raise an exception if metrics are low.

C.A Vertex AI Model Evaluation component configured with a threshold.

D.Use Cloud Monitoring to trigger an alert and manually stop deployment.

E.A custom container that returns a non-zero exit code on failure.

AnswerA

Conditionals are the standard way to control pipeline flow based on data.

Why this answer

Option A is correct because Vertex AI Pipelines supports conditional execution through the Kubeflow Pipelines SDK, using `dsl.Condition` (or `dsl.If`/`dsl.Else` in newer SDK versions) within the pipeline DAG. This allows you to evaluate model metrics (e.g., accuracy, AUC) and, if they fall below a defined threshold, either skip the deployment step or take a conditional branch that raises an error. This is the native, declarative way to control pipeline flow based on evaluation results without relying on external services or manual intervention.

Exam trap

The trap here is that candidates confuse raising an exception in a component (Option B) with pipeline-level conditional failure, not realizing that exceptions may not propagate correctly in a distributed pipeline and that Vertex AI Pipelines provides explicit conditional operators for this exact purpose.

How to eliminate wrong answers

Option B is wrong because raising an exception inside a Python component using the SDK does not cleanly fail the pipeline in a controlled, observable manner; it may cause the component to retry or hang depending on the pipeline's error handling configuration, and it bypasses the pipeline's built-in conditional logic. Option C is wrong because Vertex AI Model Evaluation component does not have a configurable threshold that automatically fails the pipeline; it only produces evaluation metrics, and the threshold logic must be implemented separately (e.g., via a conditional). Option D is wrong because Cloud Monitoring alerts are for observability and manual intervention, not for programmatically failing a pipeline; this approach introduces latency and human error, and does not integrate with Vertex AI Pipelines' native failure mechanisms.

Option E is wrong because a custom container returning a non-zero exit code will cause the pipeline step to fail, but it does not provide a way to conditionally fail based on metrics without additional logic inside the container; moreover, it is less maintainable and less transparent than using a built-in conditional operator.
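
The threshold check that such a condition evaluates can be sketched in plain Python. This is not pipeline code: `passes_thresholds` and `run_gate` are hypothetical helpers, and the metric names and minimum values are assumptions. In a real pipeline the returned branch would correspond to the tasks placed inside the conditional block.

```python
def passes_thresholds(metrics, thresholds):
    """Return True only if every required metric meets its minimum value."""
    # A missing metric is treated as a failure rather than silently ignored.
    return all(metrics.get(name, float("-inf")) >= minimum
               for name, minimum in thresholds.items())


def run_gate(metrics, thresholds):
    """Hypothetical stand-in for the branch a pipeline condition would select."""
    return "deploy" if passes_thresholds(metrics, thresholds) else "fail"


thresholds = {"auc": 0.90, "accuracy": 0.85}
print(run_gate({"auc": 0.93, "accuracy": 0.88}, thresholds))  # deploy
print(run_gate({"auc": 0.87, "accuracy": 0.88}, thresholds))  # fail
```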

335

MCQhard

Refer to the exhibit. What is being configured?

A.A model training pipeline

B.A batch prediction job

C.An endpoint with autoscaling based on request count

D.An endpoint with autoscaling based on CPU utilization

AnswerC

The autoscaling metric is 'prediction/online/requests'.

Why this answer

The exhibit configures a Vertex AI endpoint whose autoscaling metric is 'prediction/online/requests'. Scaling on this metric adjusts the number of replicas according to request volume, which is autoscaling based on request count. Option C is correct because the configured metric is the request-based one rather than a CPU utilization metric.

Exam trap

Google Cloud often tests the distinction between request-count-based and CPU-based autoscaling; the trap here is that candidates see 'autoscaling' and assume CPU utilization is the default metric, but the exhibit explicitly shows the request-based metric, making Option D a distractor for those who do not read the configuration details carefully.

How to eliminate wrong answers

Option A is wrong because a model training pipeline involves steps like data preprocessing, training, and evaluation, not endpoint scaling settings or replica counts. Option B is wrong because a batch prediction job runs offline against stored inputs and does not use a persistent endpoint with autoscaling metrics. Option D is wrong because the exhibit shows 'prediction/online/requests' as the autoscaling metric, not CPU utilization; CPU-based autoscaling would be configured with a CPU utilization metric instead.

336

MCQhard

The exhibit shows a Cloud Composer environment variable configuration. An ML pipeline DAG fails with an authentication error when trying to access Vertex AI. What is the most likely cause?

A.The Airflow worker does not have the proper scopes to access Vertex AI

B.The service account key in the environment variable is expired

C.The DAG file is missing a required Python library

D.The Cloud Composer environment is in a different project than Vertex AI

AnswerA

The environment variable 'GOOGLE_APPLICATION_CREDENTIALS' is set to a service account key path, but the worker VM may not have the necessary scopes.

Why this answer

The authentication error when accessing Vertex AI from Cloud Composer most likely occurs because the Airflow worker's service account lacks the necessary OAuth scopes or IAM permissions. Cloud Composer uses a worker service account to execute tasks; if this account does not have the `https://www.googleapis.com/auth/cloud-platform` scope or the `aiplatform.user` role, the Airflow worker cannot authenticate to Vertex AI APIs, resulting in a 403 or 401 error.

Exam trap

Google Cloud often tests the distinction between authentication (scopes/identity) and authorization (IAM roles), so candidates mistakenly blame cross-project configuration or missing libraries when the root cause is the worker's service account lacking the required OAuth scopes.

How to eliminate wrong answers

Option B is wrong because expired service account keys would cause a different error (e.g., 'invalid_grant' or 'expired key'), not a generic authentication error, and Cloud Composer typically uses a service account attached to the environment, not a key stored in an environment variable. Option C is wrong because a missing Python library would raise an ImportError or ModuleNotFoundError, not an authentication error. Option D is wrong because Cloud Composer and Vertex AI can be in different projects as long as the service account has cross-project IAM permissions; the error would be a permission denied, not an authentication failure.

337

MCQmedium

A team has a prototype image classification model trained on a small dataset using TensorFlow Keras on a single GPU. They need to train on a larger dataset (1 million images) using a distributed strategy on Vertex AI with 8 GPUs. They implement a MirroredStrategy for data parallelism. During the first few epochs, the training speed does not improve significantly compared to a single GPU, and GPU utilization is low. The data is stored as JPEG files in Cloud Storage, and the input pipeline uses tf.data with map to decode images. What is the most likely cause?

A.The batch size per GPU is too large.

B.The MirroredStrategy is not properly configured.

C.The data loading from Cloud Storage is a bottleneck.

D.The model is too small for distributed training.

AnswerC

I/O bottleneck starves GPUs, causing low utilization.

Why this answer

Option C is correct because reading and decoding JPEG images from Cloud Storage can be I/O-bound, which starves the GPUs and causes low utilization. Option A is wrong because a large batch size per GPU could cause memory issues but not low utilization. Option B is wrong because MirroredStrategy is typically configured correctly, and the symptoms point to the input pipeline rather than the strategy.

Option D is wrong because even if the model is small, distributed training should still improve throughput if the pipeline is not bottlenecked.

338

MCQhard

A company uses Vertex AI Predictions with a custom container that invokes an external API for feature enrichment. The prediction response time is highly variable. The engineer wants to monitor the external API's contribution to latency. What should the engineer do?

A.Instrument the prediction container to emit custom metrics for the time spent in each prediction step, including the external API call.

B.Add a timeout setting to the endpoint's request to limit the external API call duration.

C.Monitor the Vertex AI endpoint latency metric and correlate with system metrics like CPU and memory.

D.Use Cloud Trace to trace the prediction request end-to-end, including the external API call.

AnswerA

Custom metrics provide granular breakdown.

Why this answer

Option A is correct because instrumenting the custom container to emit custom metrics (e.g., using OpenTelemetry or a Prometheus client library) allows the engineer to directly measure the time spent in each prediction step, isolating the external API call's contribution to latency. This provides granular, real-time visibility into the specific bottleneck, which is essential when the response time is highly variable and the external API is a known dependency.

Exam trap

Google Cloud often tests the distinction between monitoring (custom metrics) and tracing (Cloud Trace) — the trap here is that candidates assume Cloud Trace automatically captures all downstream calls, but it requires explicit instrumentation of the external API call to record its duration, whereas custom metrics can be emitted directly from the container code without needing distributed tracing context.

How to eliminate wrong answers

Option B is wrong because adding a timeout setting to the endpoint's request limits the duration of the external API call but does not provide any monitoring data; it only caps the latency, potentially causing failures without diagnosing the root cause. Option C is wrong because monitoring the Vertex AI endpoint latency metric and correlating with CPU/memory only gives aggregate performance data, not the specific contribution of the external API call, making it impossible to isolate the external API's impact. Option D is wrong because Cloud Trace can trace the request end-to-end, but it requires the custom container to be instrumented with trace context propagation; without explicit instrumentation of the external API call, Cloud Trace will not capture the time spent in that external call, leaving the same gap in visibility.
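
A minimal sketch of the per-step timing described in Option A, assuming a Python prediction container. The names `timed`, `call_external_api`, and `predict` are hypothetical, and exporting the recorded durations to a monitoring backend is left out; the sketch only shows how the external call's share of latency can be isolated.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

step_durations = defaultdict(list)  # step name -> durations in seconds


@contextmanager
def timed(step):
    """Record wall-clock time spent inside the with-block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_durations[step].append(time.perf_counter() - start)


def call_external_api(instance):
    """Hypothetical stand-in for the feature-enrichment call."""
    time.sleep(0.02)
    return {**instance, "enriched": True}


def predict(instance):
    with timed("external_api"):
        features = call_external_api(instance)
    with timed("model_inference"):
        score = 0.5  # placeholder for the real model call
    return {"score": score, "features": features}


response = predict({"id": 1})
```

Each entry in `step_durations` could then be emitted as a custom metric labeled by step name, so variable latency can be attributed to the external API or to the model itself.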

339

MCQmedium

A financial services company wants to detect fraudulent transactions in real-time. They have a trained XGBoost model that runs on a single Compute Engine instance. The current solution processes about 100 transactions per second, but they need to scale to 10,000 transactions per second. Which approach should they take?

A.Increase the VM to a machine type with more vCPUs and memory

B.Deploy the model to Vertex AI Prediction with autoscaling enabled

C.Use Dataflow to process transactions in micro-batches every second

D.Rewrite the model as a Cloud Function triggered by Pub/Sub messages

AnswerB

Vertex AI Prediction automatically scales based on traffic.

Why this answer

Vertex AI Prediction with autoscaling is the correct choice because it is purpose-built for serving ML models at scale, automatically adjusting the number of compute nodes based on incoming request traffic. This allows the company to seamlessly handle the increase from 100 to 10,000 transactions per second without manual intervention, while XGBoost is natively supported as a framework.

Exam trap

Google Cloud often tests the misconception that vertical scaling (bigger VM) is sufficient for large throughput increases, when in reality horizontal scaling with a managed service like Vertex AI is required for elasticity and high availability.

How to eliminate wrong answers

Option A is wrong because simply scaling up a single VM (vertical scaling) has hardware limits and cannot reliably handle a 100x increase in throughput; it also introduces a single point of failure and lacks automatic scaling. Option C is wrong because Dataflow is designed for batch and stream processing pipelines, not for low-latency real-time model serving; processing in micro-batches every second would add unacceptable latency for fraud detection. Option D is wrong because Cloud Functions have a maximum timeout of 9 minutes and are not designed for sustained high-throughput inference workloads; they are better suited for lightweight, event-driven tasks, not for serving a complex XGBoost model at 10,000 TPS.
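
To make the scale of the jump concrete, here is a rough capacity calculation. It assumes, purely for illustration, that each serving replica sustains the same 100 transactions per second as the current single instance; `replicas_needed` and the headroom percentage are hypothetical.

```python
import math


def replicas_needed(target_tps, per_replica_tps, headroom_pct=0):
    """Replicas required to serve target_tps with optional percentage headroom."""
    return math.ceil(target_tps * (100 + headroom_pct) / (100 * per_replica_tps))


baseline = replicas_needed(10_000, 100)                      # no spare capacity
with_headroom = replicas_needed(10_000, 100, headroom_pct=20)  # 20% spare capacity
```

Under that assumption the target needs on the order of a hundred replicas, which is why horizontal autoscaling rather than a single larger VM is the expected answer.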

340

Multi-Selecteasy

Which TWO actions are best practices when scaling a prototype ML model to production in Google Cloud?

Select 2 answers

A.Store and manage features in a feature store like Vertex AI Feature Store.

B.Test the model only on a small sample of the production data to save costs.

C.Set up monitoring and logging for model performance and data drift.

D.Manually scale inference instances based on historical traffic patterns.

E.Use one-hot encoding for all categorical features without considering cardinality.

AnswersA, C

Feature store ensures consistency and reuse across models.

Why this answer

Vertex AI Feature Store centralizes feature management, ensuring consistency between training and serving. This eliminates training-serving skew by providing a single source of truth for features, which is critical when scaling from prototype to production.

Exam trap

Google Cloud often tests the misconception that cost-saving shortcuts like limited testing or manual scaling are acceptable in production, when in fact reliability and monitoring are non-negotiable for ML systems at scale.

341

MCQhard

A retail company uses Vertex AI Tabular (AutoML Tables) to build a customer churn prediction model. The training dataset contains 50,000 rows and 30 features, with a 5% churn rate. The model achieves an AUC of 0.85 on the test set. When deployed for online predictions, the average latency is 800ms, while the business requirement is under 200ms. The engineer has already reduced the feature set to 10 features, but latency only dropped to 600ms. The model size is 2GB. The endpoint is in us-central1 using an n1-standard-4 machine with minReplicaCount=1. What should the engineer do to meet the latency requirement?

A.Move the endpoint to a region geographically closer to the majority of customers.

B.Use a larger machine type (e.g., n1-highmem-8) for the endpoint.

C.Convert the model to a custom TensorFlow Lite model and deploy it.

D.Enable model compression in Vertex AI Tabular.

AnswerD

Model compression reduces model size and inference latency, which directly addresses the issue.

Why this answer

Vertex AI Tabular (AutoML Tables) supports model compression, which reduces model size and inference latency by applying techniques like quantization and pruning. Since the model is 2GB and latency is 600ms (still above the 200ms target), enabling compression can shrink the model significantly, often cutting latency by 2-3x, directly meeting the requirement without changing infrastructure or converting to a different framework.

Exam trap

Google Cloud often tests the misconception that latency issues are always solved by scaling up hardware or changing regions, but the real bottleneck here is model size and inference complexity, which Vertex AI Tabular's built-in compression directly addresses without requiring framework conversion or infrastructure changes.

How to eliminate wrong answers

Option A is wrong because moving the endpoint geographically reduces network round-trip time, but the 600ms latency is dominated by model inference time on the server, not network latency; the business requirement is under 200ms total, and network latency is typically <50ms within a region. Option B is wrong because using a larger machine type (e.g., n1-highmem-8) increases CPU/memory resources, but AutoML Tables models are often CPU-bound and the bottleneck is model size and complexity, not compute capacity; a larger machine may only shave off a small fraction of latency and is cost-inefficient. Option C is wrong because Vertex AI Tabular models are not TensorFlow-based; they are ensemble models (e.g., gradient-boosted trees, neural networks) that cannot be directly converted to TensorFlow Lite, which is designed for TensorFlow/Keras models, not AutoML Tables output.

342

MCQhard

A real-time recommendation model deployed on Vertex AI Endpoints is experiencing increased latency, especially during peak hours. The model is hosted on a single machine with 4 CPUs. Which set of actions should you take to diagnose and resolve the issue?

A.Increase the machine type to one with 32 CPUs and disable autoscaling.

B.Switch the endpoint to use GPUs and enable batch requests.

C.Enable autoscaling on the endpoint and analyze request patterns to set min/max instances.

D.Change the serving framework to use TensorFlow Serving with gRPC.

AnswerC

Autoscaling handles peak load efficiently.

Why this answer

Option C is correct because enabling autoscaling on a Vertex AI Endpoint allows the deployment to dynamically adjust the number of serving instances based on real-time traffic, directly addressing peak-hour latency. Analyzing request patterns to set appropriate min/max instances ensures that the endpoint scales proactively without over-provisioning, which is the standard diagnostic and resolution approach for latency issues caused by insufficient capacity under variable load.

Exam trap

Google Cloud often tests the misconception that scaling up (vertical scaling) or changing frameworks is the first step to fix latency, when the correct approach is to first diagnose capacity constraints and then scale out horizontally with autoscaling.

How to eliminate wrong answers

Option A is wrong because simply increasing the machine type to 32 CPUs without autoscaling does not resolve peak-hour latency; it only increases static capacity, leading to over-provisioning during low traffic and still failing under sudden spikes if the single instance is overwhelmed. Option B is wrong because switching to GPUs is not a direct fix for latency caused by CPU-bound serving; GPUs benefit compute-heavy models (e.g., deep learning) but add overhead for small models, and enabling batch requests increases latency for real-time predictions as it waits to accumulate requests. Option D is wrong because changing the serving framework to TensorFlow Serving with gRPC does not address the root cause of insufficient compute capacity; it may improve throughput per instance but cannot compensate for a single machine being overloaded during peak hours.

343

MCQmedium

A company uses AutoML Tables to predict customer churn. The model's AUC is low. Which action is most likely to improve performance?

A.Use a different optimization objective

B.Add more training data

C.Increase the training budget to 10 hours

D.Remove features with low importance

AnswerB

Correct: More data generally improves model performance.

Why this answer

Adding more training data often helps improve model performance. Increasing the training budget alone may not help if data is insufficient. Removing features with low importance could hurt.

Changing the optimization objective may not directly improve AUC.

344

MCQeasy

Refer to the exhibit. What does this query return?

A.The maximum latency per minute

B.The error rate per minute

C.The total number of predictions per minute

D.The average latency per minute for the model

AnswerD

The query uses 'mean' aggregator over 1-minute windows.

Why this answer

The query uses the `rate` function to calculate the per-second rate of increase of the `latency_seconds` counter, and then applies the `avg` aggregator to compute the average latency across all instances over the specified time range. The `by (model)` clause groups the result by the `model` label, so the output is the average latency per minute for each model. This is why option D is correct.

Exam trap

Google Cloud often tests the distinction between `avg` and `max` aggregators in PromQL queries, and candidates mistakenly think `rate` alone implies a maximum or total, rather than understanding that `avg` computes the mean over the rate values.

How to eliminate wrong answers

Option A is wrong because the query uses `avg` to compute the average, not `max` to find the maximum latency per minute. Option B is wrong because the query operates on `latency_seconds`, a latency metric, not on an error counter or error rate metric. Option C is wrong because the query uses `avg` to average latency values, not `sum` or `count` to total the number of predictions per minute.

345

MCQmedium

An MLOps team needs to automatically retrain a model when new training data becomes available. They use Vertex AI Pipelines. What is the recommended way to trigger the pipeline?

A.Use Model Evaluation to decide

B.Set up a trigger in Vertex AI Pipelines

C.Cloud Functions triggered by Cloud Storage events

D.Cloud Scheduler on a daily basis

AnswerC

Cloud Functions can listen for object finalize events in Cloud Storage and start the pipeline.

Why this answer

Option C is correct because Vertex AI Pipelines does not natively support event-driven triggers. The recommended pattern is to use Cloud Functions, which can be triggered by Cloud Storage events (e.g., object finalize/create) when new training data is uploaded. The Cloud Function then programmatically submits the pipeline run via the Vertex AI Pipelines client library or REST API, enabling an automated retraining workflow.

Exam trap

The trap here is that candidates assume Vertex AI Pipelines has a built-in trigger mechanism (Option B) because many CI/CD tools do, but Google Cloud's recommended pattern relies on external event-driven services like Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Model Evaluation is a post-training assessment step, not a trigger mechanism; it cannot initiate pipeline execution. Option B is wrong because Vertex AI Pipelines itself does not provide a built-in trigger; triggers must be implemented externally via Cloud Functions, Cloud Scheduler, or similar services. Option D is wrong because Cloud Scheduler on a daily basis is a time-based trigger, not an event-driven one; it would retrain on a fixed schedule regardless of whether new data has arrived, leading to unnecessary runs or missed retraining opportunities.
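
The event-driven pattern in Option C can be sketched as a small handler. This is an illustration under stated assumptions, not a deployable function: `on_new_training_data`, the `training-data/` prefix, and the `training_data_uri` parameter name are hypothetical, and the pipeline submission is injected as a callable so the sketch runs without any Google Cloud libraries.

```python
def on_new_training_data(event, submit_pipeline, prefix="training-data/"):
    """Handle a Cloud Storage object-finalize event (hypothetical handler).

    `event` is assumed to carry `bucket` and `name` keys. `submit_pipeline`
    is injected; in a real function it would wrap a Vertex AI client call.
    """
    name = event.get("name", "")
    if not name.startswith(prefix):
        return None  # ignore objects outside the training-data prefix
    data_uri = f"gs://{event['bucket']}/{name}"
    return submit_pipeline({"training_data_uri": data_uri})


submitted = []


def fake_submit(params):
    """Test double that records the parameters instead of calling an API."""
    submitted.append(params)
    return "run-1"


run_id = on_new_training_data(
    {"bucket": "example-bucket", "name": "training-data/2024-01-01.csv"},
    fake_submit,
)
skipped = on_new_training_data(
    {"bucket": "example-bucket", "name": "logs/app.log"}, fake_submit
)
```

Filtering on a prefix keeps unrelated uploads to the same bucket from starting retraining runs.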

346

Multi-Selecteasy

Which TWO statements about Vertex AI Feature Store are correct? (Choose 2)

Select 2 answers

A.Feature Store automatically applies feature engineering transformations.

B.Feature Store can only store numerical features.

C.Feature Store can only be used with Vertex AI models.

D.Feature Store provides a centralized repository for feature data.

E.Feature Store supports both online and offline serving.

AnswersD, E

Correct: it centralizes features for reuse.

Why this answer

Option D is correct because Vertex AI Feature Store is designed as a centralized repository that organizes, stores, and serves feature data consistently across different models and pipelines. This centralization ensures feature reuse, consistency, and governance, preventing data silos and duplication across the ML lifecycle.

Exam trap

Google Cloud often tests the misconception that Vertex AI Feature Store is tightly coupled to Vertex AI models or that it performs automatic feature engineering, when in fact it is a decoupled storage and serving layer that supports any ML framework and requires explicit feature engineering steps.

347

MCQeasy

An ML team is using Vertex AI Online Prediction and wants to receive alerts when the 99th percentile latency exceeds 500ms for more than 5 minutes. What is the best practice to set up this alert in Cloud Monitoring?

A.Create a custom metric from the prediction container that emits latency percentiles, then set an alert on that metric.

B.Use the 'aiplatform.googleapis.com/prediction/online_prediction_latencies' metric with a metric threshold condition set to 500ms and a percentile aligner of 99.

C.Use a log-based metric to parse latency from Cloud Logging and alert when the average exceeds 500ms.

D.Export prediction latency logs to BigQuery and run a scheduled query to check the 99th percentile, then trigger a Cloud Function to send an alert.

AnswerB

This directly monitors the 99th percentile latency.

Why this answer

Option B is correct because Cloud Monitoring provides a pre-built metric, `aiplatform.googleapis.com/prediction/online_prediction_latencies`, which directly captures prediction latency. By applying a percentile aligner of 99 and a metric threshold condition of 500ms, you can alert when the 99th percentile latency exceeds 500ms for the specified duration, without needing custom instrumentation or external processing.

Exam trap

Google Cloud often tests the misconception that you must create custom metrics or use log-based solutions for percentile-based alerting, when in fact Cloud Monitoring's distribution metrics and percentile aligners handle this natively.

How to eliminate wrong answers

Option A is wrong because creating a custom metric from the prediction container is unnecessary and adds complexity; Vertex AI already emits the required latency metric natively, and custom metrics would require additional code and maintenance. Option C is wrong because using a log-based metric to parse latency from Cloud Logging and alerting on the average (not the 99th percentile) does not meet the requirement to monitor the 99th percentile latency; log-based metrics also introduce latency and parsing overhead. Option D is wrong because exporting logs to BigQuery and running scheduled queries is an overly complex, non-real-time approach that violates the best practice of using built-in monitoring capabilities; it also introduces additional cost and delay compared to native Cloud Monitoring alerts.
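
The alert condition itself (99th percentile above 500ms, sustained for 5 minutes) can be sketched in plain Python. This is only an approximation of the logic: Cloud Monitoring estimates percentiles from distribution buckets rather than raw samples, and `percentile`, `should_alert`, and the one-minute windows are assumptions made for the example.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (p in 1..100)."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def should_alert(windows, threshold_ms=500, p=99, duration_windows=5):
    """True if the last `duration_windows` windows all breach the threshold."""
    if len(windows) < duration_windows:
        return False
    recent = windows[-duration_windows:]
    return all(percentile(w, p) > threshold_ms for w in recent)


fast = [100] * 100                   # every request takes 100 ms
slow_tail = [100] * 98 + [900] * 2   # p99 is 900 ms, but the mean is only 116 ms

sustained = should_alert([slow_tail] * 5)      # five breaching minutes in a row
brief = should_alert([slow_tail] * 4 + [fast])  # the latest minute recovered
```

The `slow_tail` window also shows why Option C's average-based alert misses the requirement: its mean stays far below 500ms while its 99th percentile does not.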

348

MCQmedium

A company has a TensorFlow model that uses custom operations compiled as .so files. They want to deploy it on Vertex AI for online predictions. The model runs correctly when loaded locally. However, on Vertex AI, the prediction fails with a 'Op type not registered' error. What is the most likely reason?

A.The model is using a deprecated TensorFlow version.

B.The custom ops are not included in the model directory.

C.The prediction request format is incorrect.

D.The custom ops were compiled for a different CPU architecture.

AnswerD

Incompatible instruction sets cause the op to fail to register.

Why this answer

Option D is correct because custom TensorFlow operations compiled as .so files are architecture-specific. If the local machine uses a different CPU architecture (e.g., x86_64 with AVX2) than the Vertex AI serving nodes (e.g., x86_64 without AVX2 or ARM), the dynamic library will fail to load, causing the 'Op type not registered' error. The model runs locally because the ops are available, but on Vertex AI the shared object cannot be loaded, so TensorFlow cannot register the custom kernels.

Exam trap

Google Cloud often tests the misconception that 'Op type not registered' is always due to missing files or version mismatches, but the real trap is that candidates overlook CPU architecture compatibility when deploying compiled custom ops to a cloud environment where the serving hardware may differ from the build environment.

How to eliminate wrong answers

Option A is wrong because a deprecated TensorFlow version would typically cause compatibility warnings or missing API errors, not an 'Op type not registered' error specifically for custom ops; the error indicates the op kernel is missing, not that the version is unsupported. Option B is wrong because if the custom ops were not included in the model directory, the model would fail to load entirely or produce a 'file not found' error, not an 'Op type not registered' error; the error occurs when the .so file is present but cannot be loaded due to architecture mismatch. Option C is wrong because an incorrect prediction request format would result in a 400 Bad Request or a deserialization error, not a TensorFlow runtime error about unregistered ops; the error is raised during model inference, not request parsing.

349

Multi-Selectmedium

Which THREE are best practices for implementing CI/CD for ML pipelines on Google Cloud? (Choose THREE.)

Select 3 answers

A.Maintain separate environments for dev, staging, and production

B.Track all experiments and artifacts using Vertex ML Metadata

C.Use Cloud Build to automate testing, building, and deployment of pipeline components

D.Design pipelines with low-code components to reduce development time

E.Write unit tests for every training job

AnswersA, B, C

Prevents unintended changes to production.

Why this answer

Maintaining separate environments for dev, staging, and production is a core CI/CD best practice because it isolates changes, prevents accidental breakage in production, and allows thorough validation at each stage. On Google Cloud, this aligns with using distinct Vertex AI Pipelines instances or separate projects to enforce environment-specific configurations and access controls.

Exam trap

Google Cloud often tests the distinction between general software CI/CD practices and ML-specific CI/CD needs, trapping candidates who over-apply traditional unit testing or assume low-code tools are always best practices for production ML pipelines.

350

MCQeasy

A data science team has deployed a model on Vertex AI and wants to automatically detect when the distribution of a specific feature shifts significantly from the training data. Which service should they use?

A.Cloud Data Loss Prevention

B.Vertex AI Model Monitoring

C.Vertex AI Explainable AI

D.Cloud Composer

AnswerB

Vertex AI Model Monitoring includes skew detection, which compares training and serving distributions and alerts on significant shifts.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to detect feature distribution drift (skew) between training and serving data for deployed models. It continuously monitors the input features and alerts when statistical metrics like the Jensen-Shannon divergence or the L-infinity distance exceed a configured threshold, enabling proactive model retraining.

Exam trap

Google Cloud often tests the distinction between monitoring model performance (e.g., accuracy, latency) versus monitoring data distribution drift, and candidates may confuse Vertex AI Model Monitoring with Explainable AI because both involve model analysis, but only Model Monitoring tracks shifts over time.

How to eliminate wrong answers

Option A is wrong because Cloud Data Loss Prevention (DLP) is used for inspecting, classifying, and de-identifying sensitive data (e.g., PII, credit card numbers), not for monitoring feature distributions or model drift. Option C is wrong because Vertex AI Explainable AI provides feature attributions and explanations for model predictions (e.g., Shapley values, integrated gradients), but does not monitor distribution shifts over time. Option D is wrong because Cloud Composer is a managed Apache Airflow service for orchestrating workflows and pipelines, not a dedicated tool for detecting feature drift in deployed models.
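
The two distance measures named in the explanation can be written out directly. The sketch below is a plain-Python illustration for categorical distributions, not Model Monitoring's implementation; the category counts and function names are invented for the example, and the Jensen-Shannon divergence uses base-2 logarithms so it falls between 0 and 1.

```python
import math


def normalize(counts):
    """Turn raw category counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def l_infinity(p, q):
    """Largest absolute difference in probability across all categories."""
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL divergence of p and q to their midpoint."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


training = normalize({"a": 50, "b": 50})
serving = normalize({"a": 90, "b": 10})

linf = l_infinity(training, serving)
jsd = js_divergence(training, serving)
```

An alert would fire when a value like `linf` or `jsd` exceeds the threshold configured for that feature.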

351

MCQhard

A company uses Vertex AI Pipelines with Kubeflow DSL for hyperparameter tuning. They notice that some trials fail due to OOM errors. How should they configure the pipeline to automatically handle this?

A.Use a larger machine type for the whole pipeline

B.Use Cloud Composer to catch failures and resubmit

C.Reduce the number of trials

D.Add a retry policy to the hyperparameter tuning step with backoff

E.Increase the memory for all trials in the pipeline definition

AnswerD

Retries failed trials automatically.

Why this answer

Option D is correct because Vertex AI Pipelines supports retry policies on individual pipeline steps, including hyperparameter tuning jobs. By adding a retry policy with exponential backoff, the pipeline can automatically re-run failed trials caused by transient OOM errors without manual intervention, while avoiding immediate retries that could overload resources.

Exam trap

Google Cloud often tests the misconception that retry policies are only for network requests or that OOM errors require permanent resource increases, when in fact transient OOMs in ML pipelines can be handled gracefully with step-level retries and backoff.

How to eliminate wrong answers

Option A is wrong because using a larger machine type for the whole pipeline is inefficient and costly; it does not target only the failing trials and may not resolve OOM errors if the issue is specific to certain hyperparameter configurations. Option B is wrong because Cloud Composer is an orchestration service for Apache Airflow workflows, not designed to catch and resubmit individual Vertex AI pipeline step failures; it adds unnecessary complexity and latency. Option C is wrong because reducing the number of trials limits the search space and may prevent finding the optimal hyperparameters, without addressing the root cause of OOM errors.

Option E is wrong because increasing memory for all trials in the pipeline definition is a blunt approach that wastes resources on trials that do not need extra memory, and it does not handle transient failures that may occur even with sufficient memory.
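
The retry-with-backoff behavior behind Option D can be sketched generically. This is not the Vertex AI Pipelines or Kubeflow API; `retry_with_backoff` and `flaky_trial` are hypothetical, and the out-of-memory failure is simulated with an exception so the example stays runnable.

```python
import time


def retry_with_backoff(fn, max_retries=3, base_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call fn(); on failure wait base_delay * factor**attempt, then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last failure
            sleep(base_delay * factor ** attempt)


calls = {"n": 0}
delays = []


def flaky_trial():
    """Fails twice with a simulated transient OOM, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise MemoryError("simulated transient OOM")
    return "ok"


# Record the delays instead of sleeping so the example finishes instantly.
result = retry_with_backoff(flaky_trial, sleep=delays.append)
```

The growing delay between attempts is what keeps retries from immediately landing back on the same contended resources.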

352

Multi-Selecthard

A machine learning team is deploying a model for real-time predictions using Vertex AI. They need to ensure that the deployment follows best practices for collaboration and governance. Which TWO actions should they take?

Select 2 answers

A.Use a continuous integration/continuous deployment (CI/CD) pipeline to deploy model versions.

B.Store all model artifacts in a local file system to reduce latency.

C.Enable model monitoring to detect data drift and performance degradation.

D.Manually configure autoscaling parameters for the endpoint.

E.Allow any team member to deploy directly to production without review.

AnswersA, C

CI/CD ensures consistent, repeatable deployments.

Why this answer

Option A is correct because using a CI/CD pipeline for deploying model versions ensures automated, repeatable, and auditable deployments, which is a best practice for collaboration and governance. This approach enforces version control, testing, and approval gates, reducing the risk of errors and enabling rollback if needed.

Exam trap

Google Cloud often tests the misconception that local storage or manual configuration is acceptable for governance, when in fact centralized artifact storage and automated scaling are required for collaboration and reliability.

353

MCQmedium

An MLOps team is implementing a CI/CD pipeline for a TensorFlow model on Vertex AI. The model training job takes 2 hours and produces a SavedModel. The team wants to automatically trigger a new pipeline run whenever a change is pushed to the 'main' branch of their source repository. The pipeline should include training, evaluation, and if metrics exceed a threshold, deploy the model to a Vertex AI endpoint. Which trigger configuration should they use?

A.Use Eventarc to listen for Cloud Source Repository push events and invoke a Cloud Run service that starts the pipeline.

B.Use an Artifact Registry trigger to detect new model images and then start the pipeline.

C.Set up a Cloud Scheduler job that runs every 2 hours and triggers a Vertex AI Pipeline run.

D.Configure a Cloud Build trigger that watches the 'main' branch of Cloud Source Repositories; in the build config, use steps to run the pipeline via the Vertex AI API.

AnswerD

Cloud Build triggers are designed for source code events and can orchestrate ML pipelines.

Why this answer

Option D is correct because Cloud Build triggers can be configured to watch a specific branch (e.g., 'main') in Cloud Source Repositories and automatically execute a build configuration. Within that build config, you can use the `gcloud` or `curl` steps to invoke the Vertex AI Pipeline API, which starts the training, evaluation, and conditional deployment workflow. This directly matches the requirement for a branch-based push trigger that orchestrates the full ML pipeline.

Exam trap

Google Cloud often tests the distinction between event-driven triggers (Cloud Build for source code changes) and artifact-based triggers (Artifact Registry for new images), leading candidates to confuse the two when the requirement is to start a pipeline from a code push.

How to eliminate wrong answers

Option A is wrong because Eventarc is designed for event-driven, asynchronous invocations (e.g., from Cloud Storage or Pub/Sub), but it does not natively integrate with Cloud Source Repositories push events; Cloud Build triggers are the correct service for repository push events. Option B is wrong because an Artifact Registry trigger would fire only after a new model image is pushed, but the requirement is to trigger on a source code change (push to 'main'), not on a new artifact. Option C is wrong because a Cloud Scheduler job running every 2 hours is a time-based schedule, not a push-triggered event; it would not respond to code changes and would run even when no changes occur, wasting resources.

354

MCQmedium

A team uses Vertex AI Feature Store to serve features for online predictions. They notice that the online serving latency is high for certain features. The features are stored in a BigQuery source with high cardinality. What is the best practice to reduce latency?

A.Use batch prediction instead of online prediction.

B.Move the features to Cloud Storage and read them directly.

C.Increase the number of nodes in the feature store cluster.

D.Use feature store caching with a larger cache size.

AnswerD

Caching frequently accessed features reduces BigQuery calls and latency.

Why this answer

Option D is correct because caching reduces repeated access to BigQuery. Option C might help but does not directly address high cardinality; Option B would not integrate with Feature Store; Option A is a workaround, not a best practice.

355

MCQhard

A company wants to use ML to predict customer churn. They have user activity logs in Cloud Storage, account data in BigQuery, and want an automated pipeline. Which pipeline architecture on Google Cloud should they use?

A.Load both data sources into AutoML Tables and train directly

B.Export logs from Cloud Storage to Cloud Dataproc for preprocessing, then train

C.Use Cloud Functions to preprocess data, then train on AI Platform

D.Use BigQuery to join logs and account data, train on Vertex AI, deploy to an endpoint

AnswerD

Seamless integration: BigQuery queries external tables, Vertex AI trains from BigQuery, endpoint serves.

Why this answer

Option D is correct because it leverages BigQuery's ability to join structured account data with semi-structured logs (via federated queries or external tables), then uses Vertex AI for end-to-end ML training and deployment. This architecture minimizes data movement, keeps the pipeline serverless, and directly addresses the requirement for an automated pipeline with both data sources.

Exam trap

Google Cloud often tests the misconception that AutoML Tables can handle multi-source data natively, when in fact it requires a single pre-joined dataset, and that Cloud Functions are suitable for heavy preprocessing workloads despite their strict resource limits.

How to eliminate wrong answers

Option A is wrong because AutoML Tables requires data to be in a single table format (CSV/JSON) and cannot directly ingest data from two separate sources without prior joining; it also lacks native pipeline automation. Option B is wrong because Cloud Dataproc (managed Spark/Hadoop) is overkill for simple preprocessing and introduces unnecessary cluster management overhead; BigQuery can perform the join and preprocessing more efficiently without spinning up ephemeral clusters. Option C is wrong because Cloud Functions have a 9-minute timeout and 2GB memory limit, making them unsuitable for preprocessing large-scale log data; Vertex AI is the correct training platform, but the preprocessing should be done in BigQuery, not Cloud Functions.


356

MCQhard

A retail company has been using Vertex AI AutoML to predict store-level demand for each product. They have a pipeline that runs nightly: data is extracted from BigQuery, preprocessed via Dataflow, and then used to train a new AutoML model each night. The model is deployed to a Vertex AI Endpoint for real-time inference. After two months, they notice that predictions for a new product category (recently launched) are consistently inaccurate, with predicted sales far exceeding actuals. They suspect data drift due to the new category. The data scientist has limited coding skills and wants a low-code solution. Which course of action should they take to improve predictions for the new category?

A.Add the product category as a feature in the AutoML dataset and retrain the model with the updated dataset

B.Retrain the model using only data from the new product category to specialize the model for that category

C.Use Vertex AI custom training with a Python script to fine-tune the model on the new category data

D.Remove the new product category from the training data because it causes bias, and rely on the pre-trained model's general pattern

AnswerA

Allows model to learn category-specific demand patterns.

Why this answer

Adding the product category as a feature in the AutoML dataset allows the model to learn the distinct demand patterns of the new category directly from the data. Vertex AI AutoML automatically handles feature engineering and can adjust its predictions based on this categorical input, addressing the data drift without requiring custom code. This low-code approach leverages AutoML's built-in ability to incorporate new features and retrain with minimal manual intervention.

Exam trap

Google Cloud often tests the misconception that specialized models (Option B) or custom training (Option C) are necessary for new data patterns, when in fact AutoML's feature-based retraining is the simplest low-code solution that leverages the model's existing architecture.

How to eliminate wrong answers

Option B is wrong because retraining only on the new category data would discard the valuable historical patterns from other categories, leading to overfitting and poor generalization for the new category. Option C is wrong because it requires custom Python scripting and custom training, which contradicts the low-code requirement and the data scientist's limited coding skills. Option D is wrong because removing the new category from training data would prevent the model from learning its specific patterns, causing the model to continue making inaccurate predictions based on the old distribution.


357

MCQhard

After setting up model monitoring on Vertex AI for a classification model, the engineer sees a high number of anomaly alerts for the "age" feature. Upon investigation, the age distribution in recent predictions is similar to training data. What might be the cause?

A.The feature importance of age has changed

B.The monitoring baseline was incorrectly set

C.The monitoring threshold for age is too low

D.The model is overfitting to age

AnswerC

A low threshold triggers alerts for small, insignificant deviations.

Why this answer

Option C is correct because a high number of anomaly alerts despite the age distribution being similar to the training data indicates that the monitoring threshold for the 'age' feature is set too low. Vertex AI Model Monitoring compares recent prediction inputs against a baseline using a distance score (Jensen-Shannon divergence for numerical features, L-infinity distance for categorical features) and raises an alert when the score exceeds the configured threshold. If the threshold is too sensitive, even minor, insignificant deviations trigger alerts, producing false positives although the distribution is essentially unchanged.
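
The effect of an overly sensitive threshold can be sketched in a few lines. This is a minimal illustration, not the monitoring service's implementation: it uses Jensen-Shannon divergence as the distance measure, and the binned `age` histograms and both threshold values are made-up assumptions.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Kullback-Leibler divergence, skipping empty bins of `a`.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def drift_alert(baseline, recent, threshold):
    """Return True when the distance score exceeds the configured threshold."""
    return js_divergence(baseline, recent) > threshold

# Hypothetical binned "age" histograms (fraction of rows per bin).
training_age = [0.20, 0.30, 0.30, 0.20]
serving_age = [0.21, 0.29, 0.31, 0.19]

distance = js_divergence(training_age, serving_age)
too_sensitive = drift_alert(training_age, serving_age, threshold=0.0001)
reasonable = drift_alert(training_age, serving_age, threshold=0.3)
print(f"distance={distance:.6f} too_sensitive={too_sensitive} reasonable={reasonable}")
```

The two histograms are nearly identical, yet the very small threshold still fires, which mirrors the false-positive pattern described in the question.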

Exam trap

The trap here is that candidates confuse 'anomaly alerts' with 'model performance degradation' or 'data drift,' but the question specifically states the distribution is similar, so the root cause is a misconfigured sensitivity threshold, not a genuine distribution shift.

How to eliminate wrong answers

Option A is wrong because feature importance measures the contribution of a feature to model predictions, not the distribution of the feature values themselves; a change in feature importance would not directly cause distribution-based anomaly alerts. Option B is wrong because if the monitoring baseline were incorrectly set (e.g., using a non-representative sample), the age distribution in recent predictions would likely differ from the training data, but the question states the distribution is similar, so the baseline is not the issue. Option D is wrong because overfitting to age would manifest as poor generalization on unseen data, not as anomaly alerts on the feature distribution; overfitting does not inherently trigger monitoring alerts unless the distribution shifts.


358

MCQhard

A large e-commerce company deploys multiple ML models on Vertex AI Endpoints. They use Vertex AI Model Registry to manage model versions. Recently, a team accidentally deployed an unvalidated model to production, causing a service outage. They want to implement a governance process where models must pass certain validation checks before deployment. The validation includes unit tests, fairness checks, and performance benchmarks. They use CI/CD pipelines (Cloud Build). They also need to allow manual approval for critical models. Which combination of Vertex AI features and Cloud Build steps would enforce the required governance?

A.Use Vertex AI Experiments to log validation results and require manual checks before deployment.

B.Set up Cloud Armor to block deployment of unvalidated models.

C.Implement Cloud Build triggers that run validation steps, then use Vertex AI Model Registry 'state' to mark models as 'validated' before allowing deployment to endpoints.

D.Use Vertex AI Continuous Monitoring to automatically detect issues and roll back deployments.

AnswerC

This enforces a gate where only models with appropriate state can be deployed.

Why this answer

Option C is correct because it combines Cloud Build triggers to run validation steps (unit tests, fairness checks, performance benchmarks) and uses Vertex AI Model Registry's 'state' field to mark models as 'validated' only after passing those checks. This state then acts as a gate in the deployment pipeline, ensuring that only validated models can be deployed to Vertex AI Endpoints. The manual approval for critical models can be integrated as a Cloud Build approval step before the state is set to 'validated'.
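
The gating logic can be sketched independently of any cloud API. Everything below is hypothetical: `ModelVersion`, its `labels` dictionary, and the check names are illustrative stand-ins, and the 'validated' marker is modeled as a plain label rather than a real Model Registry call.

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    """Hypothetical record for a registered model version."""
    name: str
    critical: bool = False
    labels: dict = field(default_factory=dict)

def run_validation(checks):
    """Run each named check (a zero-argument callable returning a bool)."""
    return {name: bool(check()) for name, check in checks.items()}

def mark_if_validated(model, results, manually_approved=False):
    """Mark the model 'validated' only if every check passed and,
    for critical models, a human approved the release."""
    passed = all(results.values())
    approved = manually_approved or not model.critical
    model.labels["state"] = "validated" if passed and approved else "rejected"
    return model

def can_deploy(model):
    """Deployment gate: only validated models may proceed."""
    return model.labels.get("state") == "validated"

checks = {
    "unit_tests": lambda: True,
    "fairness": lambda: True,
    "benchmark": lambda: True,
}
results = run_validation(checks)
standard = mark_if_validated(ModelVersion("ranker-v7"), results)
critical = mark_if_validated(ModelVersion("pricing-v3", critical=True), results)
print(can_deploy(standard), can_deploy(critical))
```

In a real pipeline each check would be a build step, and the manual approval flag would come from the CI/CD system's approval mechanism rather than a function argument.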

Exam trap

The trap here is confusing reactive monitoring (Continuous Monitoring) or unrelated security services (Cloud Armor) with proactive deployment governance, while overlooking that Vertex AI Model Registry's state field is the correct mechanism to enforce pre-deployment validation gates.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing ML experiments, not for enforcing deployment governance or blocking deployments; it cannot prevent an unvalidated model from being deployed. Option B is wrong because Cloud Armor is a web application firewall for protecting against DDoS and OWASP attacks, not a service for validating or blocking ML model deployments. Option D is wrong because Vertex AI Continuous Monitoring detects prediction drift and data quality issues after deployment, but it does not prevent the initial deployment of an unvalidated model; it is a reactive, not proactive, governance tool.


359

MCQhard

Your team is serving a large language model on Vertex AI using a custom container. The endpoint experiences intermittent 502 errors during traffic spikes. The autoscaling configuration uses a CPU utilization target of 60% and the model is deployed on n1-standard-4 instances. The model requires significant memory. Which combination of changes is most likely to resolve the issue?

A.Increase the target CPU utilization to 90% to allow more requests per instance.

B.Switch to a machine type with more memory, e.g., n1-highmem-8, and increase min_replica_count.

C.Enable canary traffic splitting to reduce load on the main endpoint.

D.Reduce the model batch size from 32 to 1 to lower memory per request.

AnswerB

High memory instances reduce memory contention, and more replicas absorb traffic spikes.

Why this answer

The 502 errors likely indicate the instances are overwhelmed or timing out. Increasing the machine type to a high-memory instance reduces memory pressure, and adding more replicas through a lower target scaling metric or higher min replicas provides capacity. Tuning batch size helps but is secondary.

GPU may not help if the issue is memory.


360

MCQeasy

A developer creates a Cloud Build trigger that runs a training pipeline whenever code is pushed to the main branch of the repository. The trigger is configured to use a source archive stored in Cloud Storage. After pushing code to main, the build fails with the error shown. What is the most likely cause of this failure?

A.The build configuration file is missing from the source archive.

B.The included files filter 'train/**' excludes all files outside the train directory, causing the build to have no source.

C.The source archive is not being updated when code is pushed, so the trigger tries to fetch an old or nonexistent object.

D.The service account does not have storage.objectViewer permission on the bucket.

AnswerC

The trigger points to a static archive; pushing new code does not update the archive, leading to missing source.

Why this answer

Option C is correct because the trigger is configured to use a source archive stored in Cloud Storage. When code is pushed to the main branch, the trigger attempts to fetch the archive from the specified Cloud Storage location. If the archive is not updated (e.g., via a separate upload or a Cloud Function that rebuilds the archive on push), the trigger will either fetch an old version or fail if the object does not exist.

The error indicates that the build cannot proceed because the source archive is stale or missing, not because of a missing config file or permission issue.

Exam trap

The trap here is that candidates assume the included files filter (Option B) causes the failure, but the error is about the source archive itself being outdated or missing, not about which files are included within it.

How to eliminate wrong answers

Option A is wrong because the error message does not indicate a missing build configuration file; a missing cloudbuild.yaml would produce a specific 'build configuration file not found' error, not a generic fetch failure. Option B is wrong because the included files filter 'train/**' only restricts which files are included in the build context, but it does not cause the source archive itself to be missing or stale; the error is about fetching the archive, not about empty source. Option D is wrong because if the service account lacked storage.objectViewer permission on the bucket, the error would be a 403 Forbidden or access denied, not a generic build failure related to source archive retrieval.


361

MCQmedium

A company deploys a model on Vertex AI Endpoint and expects high traffic spikes during promotional events. The current configuration uses manual scaling with 2 replicas. Which autoscaling configuration should they use to handle spikes while minimizing cost during normal traffic?

A.Keep manual scaling but increase replicas to 10.

B.Set min_replica_count=2 and max_replica_count=10 with no scaling metric.

C.Enable basic scaling with target_cpu_utilization=0.6 and set min_replica_count=2, max_replica_count=10.

D.Use custom metric scaling with a Cloud Monitoring metric for prediction latency.

AnswerC

Basic scaling adjusts replicas based on CPU load.

Why this answer

Option C is correct because basic scaling with a target metric (e.g., CPU utilization) automatically adjusts replicas based on load, reducing cost during low traffic and scaling up during spikes. Option A is wrong because manual scaling cannot adapt to load: a fixed 10 replicas wastes money during normal traffic and still requires constant manual adjustments. Option B is wrong because, without a scaling metric, the replica count has no signal to adapt to.

Option D is wrong because custom metric scaling is possible but basic scaling is simpler and sufficient for CPU-bound models.
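
To make the scaling behavior concrete, here is a rough sketch of a proportional, target-tracking rule with the bounds from option C. The formula is the one commonly used by target-based autoscalers; the source does not state Vertex AI's exact internal algorithm, so treat it as an assumption. Utilization is passed as an integer percentage to keep the arithmetic exact.

```python
def desired_replicas(current_replicas, cpu_pct, target_pct=60,
                     min_replicas=2, max_replicas=10):
    """Scale replicas in proportion to observed vs. target CPU utilization,
    then clamp the result to the configured bounds."""
    # Integer ceiling division: ceil(current_replicas * cpu_pct / target_pct).
    raw = -(-(current_replicas * cpu_pct) // target_pct)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(2, 20))  # quiet period: stays at the minimum
print(desired_replicas(2, 90))  # moderate spike: scales out
print(desired_replicas(8, 95))  # large spike: capped at the maximum
```

The minimum keeps baseline capacity for normal traffic, while the maximum bounds the cost of a promotional spike.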


362

Multi-Selecteasy

A team has deployed a model on Vertex AI Prediction and wants to monitor for data drift. Which TWO metrics should they use to detect drift in numerical features?

Select 2 answers

A.Pearson correlation coefficient

B.Jensen-Shannon divergence (JSD)

C.Chi-squared statistic

D.Population Stability Index (PSI)

E.Kolmogorov-Smirnov (KS) statistic

AnswersB, E

JSD measures similarity between two probability distributions and works for numerical features after binning.

Why this answer

Jensen-Shannon divergence (JSD) is a symmetric, bounded (0 to 1) measure of the difference between two probability distributions, making it ideal for detecting drift in numerical features by comparing the training distribution to the serving distribution. It is a smoothed and normalized version of Kullback-Leibler divergence, and Vertex AI Prediction's Model Monitoring natively supports JSD for numerical feature drift detection.
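
The other selected metric, the Kolmogorov-Smirnov statistic, works directly on raw numerical samples without binning. The following is a minimal pure-Python sketch of the two-sample statistic (the largest gap between the empirical CDFs); the sample values are invented for illustration.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum absolute gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of values less than or equal to x.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

training = [1, 2, 3, 4]   # hypothetical training values
shifted = [3, 4, 5, 6]    # hypothetical serving values after a shift

print(ks_statistic(training, training))
print(ks_statistic(training, shifted))
```

A value near 0 means the two samples follow similar distributions; a value near 1 means they barely overlap.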

Exam trap

Google Cloud often tests the misconception that Pearson correlation or Chi-squared are appropriate for numerical drift, when in fact Pearson measures correlation between two variables and Chi-squared is for categorical data, leading candidates to overlook the correct distribution-comparison metrics like JSD and KS.


363

Multi-Selectmedium

A team is responsible for monitoring the health of a Vertex AI pipeline that runs daily. Which THREE resources should they use to gain visibility into pipeline performance and failures? (Choose 3.)

Select 3 answers

A.Cloud Trace for analyzing distributed execution

B.Cloud Composer for tracking DAGs

C.Vertex AI Experiments for comparing pipeline runs

D.Cloud Monitoring for metrics and alerts on pipeline runs

E.Cloud Logging for viewing pipeline step logs

AnswersC, D, E

Vertex AI Experiments tracks pipeline runs and allows comparison of metrics across runs over time.

Why this answer

Vertex AI Experiments (Option C) is correct because it provides a systematic way to log, compare, and analyze pipeline runs, including metrics, parameters, and artifacts. This allows the team to track performance trends across daily runs, identify regressions, and correlate failures with specific run configurations, which is essential for monitoring pipeline health over time.

Exam trap

Google Cloud often tests the distinction between monitoring (observing run-level metrics and logs) and tracing (analyzing request-level latency), leading candidates to incorrectly select Cloud Trace for pipeline health visibility when it is actually intended for distributed request tracing.


364

MCQmedium

A data science team uses a shared Cloud Storage bucket to store training datasets. They notice that some team members accidentally overwrite existing datasets, causing issues with reproducibility. Which approach best prevents accidental overwrites while maintaining collaboration?

A.Use a single shared service account with strict IAM roles that allow only append operations.

B.Require team members to manually rename files before uploading.

C.Set bucket permissions to read-only for all team members except the data owner.

D.Enable object versioning on the bucket and use lifecycle rules to manage versions.

AnswerD

Versioning allows recovery of previous versions if overwritten.

Why this answer

Option D is correct because enabling object versioning on a Cloud Storage bucket preserves all versions of an object, so even if a team member overwrites a dataset, the previous version remains accessible. This maintains collaboration (anyone can upload) while preventing permanent data loss. Lifecycle rules can then be used to manage storage costs by automatically deleting old versions after a specified period.
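
A lifecycle rule for noncurrent versions might look like the sketch below, which builds the policy as a Python dictionary and serializes it to JSON. The field names (`rule`, `action`, `condition`, `isLive`, `numNewerVersions`) follow the Cloud Storage lifecycle configuration format as I understand it, and the retention count of 3 is an arbitrary assumption; verify both against the current documentation before applying.

```python
import json

def versioning_lifecycle_policy(keep_newer_versions=3):
    """Build a lifecycle policy that deletes a noncurrent object version
    once at least `keep_newer_versions` newer versions exist."""
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                "condition": {
                    "isLive": False,  # only noncurrent (overwritten) versions
                    "numNewerVersions": keep_newer_versions,
                },
            }
        ]
    }

policy = versioning_lifecycle_policy()
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON would be saved to a file and applied to the bucket after versioning is switched on, for example with the `gsutil versioning set on` and `gsutil lifecycle set` commands.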

Exam trap

The trap here is that candidates may think IAM roles or permissions are the only way to control data integrity, overlooking that object versioning provides a safety net without blocking collaboration.

How to eliminate wrong answers

Option A is wrong because Cloud Storage does not support 'append-only' IAM roles; objects are immutable and must be rewritten entirely, so this approach would not prevent overwrites and would break normal upload workflows. Option B is wrong because relying on manual renaming is error-prone and does not enforce any technical control, so accidental overwrites can still occur. Option C is wrong because making the bucket read-only for most team members prevents them from uploading new datasets at all, which destroys collaboration and is overly restrictive.


365

MCQeasy

You are responsible for maintaining an ML pipeline that runs daily on Vertex AI Pipelines. The pipeline preprocesses data, trains a model, and deploys it to an endpoint. Recently, the pipeline has been failing at the deployment step because the endpoint already exists and the deploy step tries to create a new endpoint instead of updating the existing one. The pipeline code is written using the Kubeflow Pipelines SDK. You need to modify the pipeline to resolve this issue with minimal changes. What should you do?

A.Change the pipeline to use a Cloud Function that triggers the deployment independently, bypassing Vertex AI Pipelines.

B.In the deployment component, add a check to verify if the endpoint exists, and if so, call the update endpoint method instead of create.

C.Set the deploy component's retry policy to infinite so it eventually succeeds.

D.Manually delete the existing endpoint before each pipeline run.

AnswerB

This directly fixes the deployment logic to handle existing endpoints.

Why this answer

Option B is correct because it addresses the root cause: the deployment component should check whether the endpoint exists and update it instead of creating a new one. Option A is wrong because using a Cloud Function bypasses the pipeline orchestration and adds unnecessary complexity. Option C is wrong because retrying will not fix the fundamental issue of trying to create an endpoint that already exists.

Option D is wrong because manual deletion defeats automation and is not a robust solution.
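
The check-then-update logic is a get-or-create pattern. The sketch below demonstrates it against a hypothetical in-memory client (`FakeEndpointClient` is not a real SDK class); in an actual component the `find`, `create`, and `deploy` calls would be replaced by the corresponding Vertex AI SDK operations.

```python
class FakeEndpointClient:
    """Hypothetical in-memory stand-in for an endpoint service."""

    def __init__(self):
        self.endpoints = {}  # display name -> list of deployed model names

    def find(self, display_name):
        return display_name if display_name in self.endpoints else None

    def create(self, display_name):
        if display_name in self.endpoints:
            raise RuntimeError(f"endpoint {display_name!r} already exists")
        self.endpoints[display_name] = []
        return display_name

    def deploy(self, endpoint, model):
        self.endpoints[endpoint].append(model)

def deploy_model(client, display_name, model):
    """Reuse the endpoint if it exists; create it only on the first run."""
    endpoint = client.find(display_name) or client.create(display_name)
    client.deploy(endpoint, model)
    return endpoint

client = FakeEndpointClient()
deploy_model(client, "churn-endpoint", "model-day-1")
deploy_model(client, "churn-endpoint", "model-day-2")  # second run no longer fails
print(client.endpoints)
```

Because the second run finds the existing endpoint, the daily pipeline becomes idempotent with a change confined to the deployment component.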


366

MCQhard

A travel booking company has a real-time recommendation system that suggests hotels and flights to users. The model is served using TensorFlow Serving on a Google Kubernetes Engine (GKE) cluster with auto-scaling enabled. The cluster uses n1-standard-4 machine types. The team has set up Cloud Monitoring dashboards and alerts. Last week, during a major holiday promotion, the team noticed that the model's inference latency P99 increased from 150 ms to 450 ms over a 30-minute period, while the request throughput increased from 500 to 1,200 requests per second. CPU utilization across the cluster rose to 95%, but memory utilization remained at 60%. The model version and the serving infrastructure configuration have not changed since the last deployment. Which action should the team take to mitigate the latency issue?

A.Implement a feature engineering pipeline that compresses the input features to reduce data size and inference time.

B.Deploy a newer version of the model that uses a more efficient architecture to reduce computational complexity.

C.Increase the number of TensorFlow Serving instances by reducing the CPU request per pod in GKE to allow more pods per node.

D.Add more nodes to the GKE cluster to increase the total CPU resources available for serving.

AnswerD

Adding nodes increases compute capacity, allowing more parallel inference and reducing latency under high load.

Why this answer

The latency spike is caused by CPU saturation (95% utilization) under increased load (500 to 1,200 RPS). Adding more nodes to the GKE cluster directly increases the total CPU resources available, allowing the existing TensorFlow Serving pods to handle the higher throughput without contention. This is the most immediate and infrastructure-appropriate fix because the model version and serving configuration have not changed, ruling out model-level or code-level optimizations.

Exam trap

Google Cloud often tests the misconception that reducing per-pod CPU requests (Option C) is a valid scaling strategy, but in reality this increases overcommitment and can worsen latency under high load, whereas adding nodes (Option D) provides dedicated resources without contention.

How to eliminate wrong answers

Option A is wrong because compressing input features may reduce data size but does not address the root cause of CPU saturation; inference latency is dominated by model computation, not I/O, and the feature engineering pipeline is not part of the serving infrastructure. Option B is wrong because deploying a newer, more efficient model is a long-term optimization, not an immediate mitigation; the question states the model version has not changed and the issue is purely resource contention under load. Option C is wrong because reducing the CPU request per pod would allow more pods per node, but this would increase CPU overcommitment and worsen contention on already saturated nodes, potentially causing further latency degradation or pod evictions.


367

MCQmedium

A data science team is collaborating on a project to build a churn prediction model. They use Vertex AI Workbench instances for development. Each data scientist has their own instance with a persistent disk. They share code via a GitHub repository. They want to ensure that the model training is reproducible across different team members' environments. Currently, they manually install Python packages in their instances, and they have noticed that the model metrics differ slightly between runs on different instances. Which of the following is the best action to ensure reproducibility?

A.Standardize the instance machine type and ensure all have the same number of CPUs.

B.Use Cloud Functions to run the training code instead.

C.Use Vertex AI Experiments with a fixed environment by specifying a prebuilt container.

D.Create a custom Docker image with all dependencies and use it in Vertex AI Training jobs.

E.Ask all team members to use the same Python virtual environment and install packages from a requirements.txt file.

AnswerC

Experiments track parameters and metrics while ensuring a consistent environment.

Why this answer

Option C is correct because Vertex AI Experiments with a prebuilt container ensures a fixed, reproducible environment by pinning the exact OS, Python version, and all dependencies. This eliminates the variability introduced by manual package installations and differing instance configurations, directly addressing the team's issue of inconsistent model metrics across runs.

Exam trap

Google Cloud often tests the distinction between environment reproducibility (which requires fixed software stacks) and hardware consistency (which is less critical for deterministic training), leading candidates to mistakenly choose hardware standardization (Option A) or manual dependency management (Option E).

How to eliminate wrong answers

Option A is wrong because standardizing machine type and CPU count does not control for differences in Python package versions or system libraries, which are the primary cause of metric discrepancies. Option B is wrong because Cloud Functions are designed for event-driven, stateless workloads and are not suitable for long-running model training jobs; they also do not inherently enforce a fixed environment. Option D is wrong because while a custom Docker image is a valid approach, it is not the best action here because Vertex AI Experiments with a prebuilt container provides a simpler, managed solution that automatically tracks experiments and environments without requiring the team to build and maintain custom images.

Option E is wrong because manually using a requirements.txt file and virtual environments is error-prone and does not guarantee identical system-level dependencies or Python interpreter versions across different instances, leading to subtle reproducibility issues.


368

MCQmedium

A company uses Vertex AI Model Monitoring to detect data drift. They have a model that predicts house prices. Which dataset should they compare against the training data to detect drift?

A.The entire historical prediction data

B.A random sample of recent predictions

C.The latest batch of predictions

D.The validation data used during training

AnswerC

Comparing the latest serving data distribution to training data detects drift.

Why this answer

Option C is correct because Vertex AI Model Monitoring compares the training data (serving as the baseline) against the latest batch of predictions to detect data drift. This batch represents the most recent inference requests, allowing the monitoring service to compute statistical distribution differences (e.g., Jensen-Shannon divergence) and trigger alerts when drift exceeds a configured threshold. Using the latest batch ensures timely detection of shifts in the production data distribution.

Exam trap

Google Cloud often tests the distinction between 'recent predictions' and 'latest batch' — the trap is that candidates confuse a random sample (which is statistically valid for inference but not for drift detection) with the complete batch that Vertex AI requires for accurate distribution comparison.

How to eliminate wrong answers

Option A is wrong because using the entire historical prediction data would dilute recent drift signals with older, potentially stale distributions, making it harder to detect current drift and violating Vertex AI's requirement for a sliding window of recent predictions. Option B is wrong because a random sample of recent predictions lacks the systematic coverage of the full production traffic; Vertex AI Model Monitoring expects a complete batch to accurately compute per-feature drift metrics, and random sampling can miss localized drift patterns. Option D is wrong because validation data is a static holdout set from training time, not production data; comparing against it would measure generalization error, not data drift, and Vertex AI Model Monitoring is designed to compare against training data, not validation data.


369

MCQhard

A company uses Vertex AI Pipelines to orchestrate their ML training workflow. The pipeline includes a BigQuery ML training step, a model evaluation step, and a deployment step to Vertex AI Endpoints. The engineer notices that the pipeline fails intermittently due to a quota exceeded error on Vertex AI Endpoints during model deployment. What is the best long-term solution to prevent this failure?

A.Run the pipeline steps sequentially with longer wait times.

B.Add retry logic with exponential backoff to the deployment step in the pipeline.

C.Switch to deploying models using a custom container on Compute Engine.

D.Request a permanent quota increase for Vertex AI Endpoints.

AnswerB

Handles transient quota errors gracefully without manual intervention.

Why this answer

Option B is correct because implementing retry logic with exponential backoff is a resilient pattern for transient quota errors. Option A is wrong because sequential execution with longer wait times does not prevent quota errors. Option C is wrong because using a custom container on Compute Engine does not address quota limits.

Option D is wrong because a quota increase requires a support request, may not be granted immediately, and does not make the pipeline resilient to intermittent errors.
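
A retry wrapper with exponential backoff can be sketched as follows. `QuotaExceededError` is a hypothetical exception standing in for whatever error the deployment call raises, and the delays and attempt count are illustrative defaults rather than recommended values.

```python
import random
import time

class QuotaExceededError(Exception):
    """Hypothetical error standing in for a quota-exceeded response."""

def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call `operation`, retrying on quota errors with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except QuotaExceededError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 10% jitter so parallel retries do not synchronize.
            sleep(delay + random.uniform(0, delay * 0.1))

# Demo: a deployment that fails twice, then succeeds. Sleeps are recorded
# instead of executed so the example finishes instantly.
calls = {"count": 0}
sleeps = []

def flaky_deploy():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise QuotaExceededError("quota exceeded")
    return "deployed"

result = retry_with_backoff(flaky_deploy, sleep=sleeps.append)
print(result, len(sleeps))
```

Passing `sleep` as a parameter keeps the wrapper easy to test; in production the default `time.sleep` applies.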


370

MCQhard

A company uses Vertex AI Endpoints for model serving and wants to implement A/B testing between model versions. They need to gradually shift traffic from the old to the new version while monitoring performance. Which Vertex AI feature allows this with minimal operational overhead?

A.Using a custom load balancer with weighted backend services

B.Model Deployments with traffic splitting

C.Vertex AI Experiments for tracking

D.Cloud Run revisions with traffic migration

AnswerB

Why this answer

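Option B is correct because Vertex AI Endpoints allow deploying multiple model versions to the same endpoint and setting a traffic split percentage that can be gradually adjusted. Option A is possible but adds operational overhead, since the team would have to build and manage the load balancer themselves. Option C is for experiment tracking, not serving.

Option D is wrong because Cloud Run revisions are not a Vertex AI feature.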
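
A gradual rollout amounts to a sequence of traffic splits, each mapping deployed model IDs to integer percentages that sum to 100. The sketch below only generates that schedule; the model IDs and step sizes are hypothetical, and applying each split to the endpoint is left to the serving API.

```python
def traffic_split_schedule(old_id, new_id, steps=(10, 25, 50, 100)):
    """Return one traffic split per rollout step, shifting load to `new_id`."""
    schedule = []
    for new_pct in steps:
        if not 0 <= new_pct <= 100:
            raise ValueError(f"invalid percentage: {new_pct}")
        split = {new_id: new_pct}
        if new_pct < 100:
            split[old_id] = 100 - new_pct  # remainder stays on the old version
        schedule.append(split)
    return schedule

schedule = traffic_split_schedule("model-v1", "model-v2")
for split in schedule:
    print(split)
```

Between steps, the team would compare the monitored metrics of both versions and either advance to the next split or revert to the previous one.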


371

Multi-Selecthard

A company has a prototype ML model that predicts equipment failure. They want to deploy it to production using Vertex AI. The model must be retrained weekly with new data. They also need to monitor for data drift and model performance. Which THREE components should they include in their MLOps pipeline? (Choose 3)

Select 3 answers

A.A scheduled training pipeline that retrains the model weekly.

B.A manual QA step where data scientists approve each deployment.

C.A manual review of new data before it is used for training.

D.An automated trigger that redeploys the model when performance drops below a threshold.

E.A monitoring system that checks for data drift and triggers alerts.

AnswersA, D, E

Scheduled retraining is essential for keeping the model up-to-date.

Why this answer

Option A is correct because the requirement specifies weekly retraining, which is best implemented as a scheduled training pipeline in Vertex AI, triggered by Cloud Scheduler or configured as a recurring Vertex AI Pipelines run. This automates the retraining process without manual intervention, ensuring the model stays current with new data.

Exam trap

Google Cloud often tests the distinction between necessary manual oversight and fully automated MLOps practices, leading candidates to overestimate the need for human approval steps in a production pipeline that demands speed and scalability.


372

MCQhard

A team is scaling their prototype inference model to handle high-throughput requests with low latency. They use a custom container on Vertex AI Prediction. They notice that latency spikes occur under heavy load. What is the most effective strategy?

A.Enable auto-scaling with a higher minimum number of replicas.

B.Optimize model serving with batching and model warm-up.

C.Use a larger machine type with more CPUs.

D.Use a GPU-based machine.

AnswerB

Batching reduces overhead per request; warm-up avoids cold start.

Why this answer

Option B is correct because optimizing model serving with batching and model warm-up reduces per-request overhead and ensures consistent latency. Option A is wrong because auto-scaling does not remove latency spikes by itself; it adds replicas over time. Option C is wrong because adding CPUs may not help if the bottleneck is model inference computation.

Option D is wrong because GPU may help but not specifically for latency spikes due to load variation.


373

Multi-Selecthard

A team is serving a large language model (LLM) on Vertex AI using a custom container. They want to reduce tail latency. Which THREE strategies should they consider?

Select 3 answers

A.Increase the number of replicas.

B.Use dynamic batching to combine requests.

C.Implement response caching for common queries.

D.Quantize the model to INT8 to reduce computation.

E.Upgrade to a more powerful GPU type.

AnswersB, C, D

Improves GPU utilization and reduces per-request latency.

Why this answer

Dynamic batching (B) reduces tail latency by grouping multiple inference requests into a single batch, which improves GPU utilization and amortizes overhead across requests. This is particularly effective for LLMs because it allows the model to process more tokens per forward pass, reducing the per-request latency variance that contributes to tail latency.
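
The grouping rule behind dynamic batching can be illustrated with an offline simulation: a batch closes when it is full or when its oldest request has waited too long. This is a simplified sketch with hypothetical parameters (`max_batch_size`, `max_wait_ms`) and timestamps; production serving frameworks implement the same idea with queues and timers.

```python
def dynamic_batches(requests, max_batch_size=4, max_wait_ms=10):
    """Group (arrival_ms, payload) pairs into batches.

    A batch is closed when it reaches `max_batch_size` or when a new
    request arrives more than `max_wait_ms` after the batch was opened.
    """
    batches, current, opened_at = [], [], None
    for arrival_ms, payload in sorted(requests):
        if current and arrival_ms - opened_at > max_wait_ms:
            batches.append(current)  # oldest request waited long enough
            current = []
        if not current:
            opened_at = arrival_ms
        current.append(payload)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full
            current = []
    if current:
        batches.append(current)
    return batches

requests = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e"), (30, "f")]
batches = dynamic_batches(requests)
print(batches)
```

The wait limit is the design trade-off: a longer wait yields fuller batches and better accelerator utilization, while a shorter wait bounds the extra latency any single request can incur.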

Exam trap

The trap here is that candidates confuse scaling strategies (like increasing replicas or upgrading hardware) with latency-optimization techniques, failing to recognize that tail latency is primarily reduced by batching, caching, and quantization, not by adding more compute resources.

374

Matching (medium)

Match each feature engineering technique to its description.

Concepts

Matches

Convert categorical variable into binary columns

Combine two or more features to capture interactions

Normalize numeric features to a standard range

Group continuous values into discrete intervals

Weight term frequency by inverse document frequency

Why these pairings

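Each description corresponds to a standard feature engineering technique: converting a categorical variable into binary columns is one-hot encoding, combining features to capture interactions is a feature cross, normalizing numeric features to a standard range is feature scaling, grouping continuous values into discrete intervals is bucketing (binning), and weighting term frequency by inverse document frequency is TF-IDF.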
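The five descriptions above can each be sketched as a one-function example. The function names use common names for these techniques and are assumptions here, since the concept labels themselves are not shown; the TF-IDF variant is the plain unsmoothed form, and libraries often use smoothed variants instead.

```python
import math


def one_hot(value, categories):
    """Convert a categorical value into binary columns."""
    return [1 if value == c else 0 for c in categories]


def feature_cross(a, b):
    """Combine two features into one to capture their interaction."""
    return f"{a}_x_{b}"


def min_max_scale(values):
    """Normalize numeric values to the range [0, 1]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(value, boundaries):
    """Return the index of the interval a value falls in (sorted boundaries)."""
    return sum(value >= b for b in boundaries)


def tf_idf(term, doc, corpus):
    """Term frequency weighted by inverse document frequency (unsmoothed).

    Assumes the term occurs in at least one document of the corpus.
    """
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in corpus)
    return tf * math.log(len(corpus) / df)


corpus = [["cat", "sat"], ["dog", "sat"], ["cat", "cat", "ran"]]

encoded = one_hot("red", ["red", "green", "blue"])   # [1, 0, 0]
crossed = feature_cross("US", "mobile")              # "US_x_mobile"
scaled = min_max_scale([10, 20, 30])                 # [0.0, 0.5, 1.0]
bucket = bucketize(37, [18, 35, 65])                 # 2
score = tf_idf("dog", corpus[1], corpus)             # 0.5 * ln(3)
```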

375

MCQ (hard)

A data engineering team wants to orchestrate an ML pipeline that includes data preprocessing in Dataflow, AutoML training, and model deployment. They want to minimize operational overhead. Which approach is best?

A.Use Cloud Composer with Apache Airflow DAG

B.Use AI Platform Training with script

C.Use Cloud Scheduler to trigger Cloud Functions

D.Use Vertex AI Pipelines with custom components

Answer: D

Correct: Purpose-built for ML workflows, minimal overhead.

Why this answer

Vertex AI Pipelines with custom components is the best choice because it provides a fully managed, serverless orchestration service that natively integrates with Dataflow, AutoML, and model deployment. This minimizes operational overhead by eliminating the need to manage infrastructure, handle retries, or maintain a separate orchestration server, while offering built-in artifact tracking and pipeline caching.

Exam trap

The trap here is that candidates often confuse 'orchestration' with 'scheduling' and pick Cloud Scheduler, failing to recognize that a multi-step ML pipeline requires workflow orchestration with dependencies and error handling, not just a time-based trigger.

How to eliminate wrong answers

Option A is wrong because Cloud Composer, although a managed Airflow service, still requires provisioning, sizing, and maintaining a long-running Airflow environment backed by a GKE cluster, which adds operational overhead rather than minimizing it. Option B is wrong because AI Platform Training with a script only handles the training step in isolation, not the end-to-end orchestration of preprocessing, training, and deployment. Option C is wrong because Cloud Scheduler triggering Cloud Functions is a simple time-based trigger that lacks the workflow orchestration capabilities (e.g., conditional branching, parallel steps, dependency management) needed for a multi-step ML pipeline.
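The difference between a time-based trigger and dependency-aware orchestration can be shown with a toy runner built on Python's standard `graphlib` module (Python 3.9+). This is not the Vertex AI Pipelines or Kubeflow Pipelines API; the step names and `run_pipeline` are hypothetical, chosen to mirror the scenario. Steps run in dependency order, and a failure stops everything downstream.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each key maps to the set of steps it depends on.
dependencies = {
    "preprocess_dataflow": set(),
    "train_automl": {"preprocess_dataflow"},
    "deploy_model": {"train_automl"},
}


def run_pipeline(dependencies, steps):
    """Run steps in dependency order; stop at the first step that fails."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        succeeded = steps[name]()
        executed.append(name)
        if not succeeded:
            break  # downstream steps must not run after a failure
    return executed


# Each step is a callable returning True on success (stand-ins for real jobs).
steps = {
    "preprocess_dataflow": lambda: True,
    "train_automl": lambda: True,
    "deploy_model": lambda: True,
}

order = run_pipeline(dependencies, steps)
```

A scheduler alone would fire each function at a fixed time regardless of whether the previous step finished or succeeded; an orchestrator tracks that state, which is what the question means by workflow orchestration.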
