Google Professional Machine Learning Engineer PMLE Questions 601–675 | Page 9/14

601

MCQ, easy

Which Vertex AI service allows you to discover, fine-tune, and deploy foundation models with a few clicks, including models like Llama and Gemma?

A. Vertex AI Prediction

B. Vertex AI Vizier

C. Vertex AI Model Garden

D. Vertex AI JumpStart

Answer: C

Model Garden offers discovery, one-click deployment, and fine-tuning of foundation models.

Why this answer

Model Garden is the Vertex AI hub where foundation models, including open models such as Llama and Gemma, are discovered, fine-tuned, and deployed. The one-click quick-start capability that option D names is reached through Model Garden rather than offered as a separate service, so Model Garden is the answer that matches the question.

602

MCQ, easy

A machine learning engineer wants to deploy a trained model to Vertex AI for online predictions. Which Vertex AI resource is required to serve the model and provide an endpoint URL?

A. Vertex AI Pipeline

B. Vertex AI Model Registry

C. Vertex AI Feature Store

D. Vertex AI Endpoint

Answer: D

Correct. An endpoint is required to deploy a model and obtain a URL for online predictions.

Why this answer

Vertex AI Endpoint is the required resource to deploy a trained model for online predictions, as it provides a dedicated endpoint URL that accepts prediction requests and routes them to the model. Without an endpoint, the model cannot be accessed via HTTP/HTTPS for real-time inference, which is the core requirement for online serving.
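To make the "endpoint URL" concrete, here is a minimal sketch that builds the REST `:predict` URL and JSON body for an online prediction request. The project ID, region, endpoint ID, and feature name are placeholders chosen for illustration, and the request is only constructed, not sent.

```python
import json


def build_predict_request(project, region, endpoint_id, instances):
    """Return the (url, json_body) pair for an online prediction call."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    # Online prediction requests carry their inputs under the "instances" key.
    body = json.dumps({"instances": instances})
    return url, body


# Placeholder identifiers, not real resources.
url, body = build_predict_request(
    "my-project", "us-central1", "1234567890", [{"feature_a": 1.0}]
)
print(url)
print(body)
```

In practice the request would also need an OAuth bearer token in the `Authorization` header.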

Exam trap

The trap here is that candidates confuse the Model Registry (which stores and versions models) with the actual serving infrastructure, assuming that registering a model automatically creates an endpoint, when in fact a separate Endpoint resource must be created and the model must be deployed to it.

How to eliminate wrong answers

Option A is wrong because Vertex AI Pipeline is used for orchestrating and automating ML workflows (e.g., training, evaluation), not for serving models or providing an endpoint URL. Option B is wrong because Vertex AI Model Registry is a central repository for managing model versions and metadata, but it does not itself expose an endpoint for predictions; models must be deployed to an endpoint for serving. Option C is wrong because Vertex AI Feature Store is designed for storing, serving, and sharing feature data for training and prediction, not for hosting models or providing inference endpoints.

603

MCQ, hard

A team trains a distributed TensorFlow model using the config above. After training, they deploy the model for online predictions. The model returns poor quality predictions. They suspect that the model was not trained correctly due to a configuration error. What is the most likely mistake?

A. The `scaleTier` is set to 'STANDARD_1' which only supports up to 3 workers.

B. The training job is using a custom container that does not match the requirements.

C. The model was exported incorrectly because the training job did not specify a `--model-export-path`.

D. The parameter server count should be at least equal to the worker count.

Answer: A

STANDARD_1 limits workers to 3; the actual job may have ignored the 10 worker setting.

Why this answer

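Option A is correct because the 'STANDARD_1' scale tier is meant for small-scale jobs and, per this question, supports at most 3 workers. The config requests 10 workers, a setting that would be ignored or cause an error. The training job might therefore have run with fewer workers than intended, leading to a poorly trained model.

Option B is wrong because a custom container is not required. Option C is wrong because specifying the model directory is sufficient for export. Option D is wrong because nothing in the config indicates such a parameter server requirement.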

604

Multi-Select, medium

An ML engineer needs to set up automated retraining triggered by data drift. They have decided to use Cloud Monitoring alerts to detect drift. Which TWO additional services are required to complete the retraining pipeline? (Choose 2)

Select 2 answers

A. Cloud Dataflow

B. Cloud Build

C. Cloud Scheduler

D. Vertex AI Pipeline

E. Cloud Functions

Answers: D, E

Correct: Vertex AI Pipeline orchestrates the training and deployment steps.

Why this answer

The typical architecture uses Cloud Monitoring alert -> Pub/Sub -> Cloud Functions -> Vertex AI Pipeline. Cloud Functions processes the alert and triggers the pipeline. Vertex AI Pipeline orchestrates training and deployment.
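The Cloud Functions hop in that chain can be sketched roughly as follows. This is an illustration under assumptions: the handler name, the `submit_pipeline` callable, and the exact fields read from the alert payload (`incident.state`, `incident.policy_name`) are chosen for the example, and a real function would submit a Vertex AI Pipeline run where the stand-in list is used here.

```python
import base64
import json


def handle_drift_alert(event, submit_pipeline):
    """Decode a Pub/Sub-style message and trigger retraining for open alerts."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    incident = payload.get("incident", {})
    if incident.get("state") != "open":
        return False  # Ignore closed or unrecognized alerts.
    # submit_pipeline is a hypothetical hook that would start the pipeline run.
    submit_pipeline(
        {"trigger": "data_drift", "policy": incident.get("policy_name", "unknown")}
    )
    return True


# Usage with a fake alert and a list standing in for the pipeline submitter.
submitted = []
message = {"incident": {"state": "open", "policy_name": "feature-drift"}}
event = {"data": base64.b64encode(json.dumps(message).encode("utf-8"))}
triggered = handle_drift_alert(event, submitted.append)
print(triggered, submitted)
```

Passing the submitter in as an argument keeps the decoding logic testable without any cloud credentials.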

605

Multi-Select, hard

An ML engineer is deploying a large BERT-based natural language processing model for real-time inference on Vertex AI Prediction. The model has a large memory footprint (2GB) and experiences unpredictable traffic spikes up to 10x the baseline. The engineer needs to minimize latency and cost while handling spiky traffic. Which TWO actions should the engineer take? (Choose two.)

Select 2 answers

A. Configure the endpoint to use manual scaling with a fixed number of replicas equal to peak traffic.

B. Enable automatic scaling with a maximum of 3 replicas to limit cost.

C. Use a custom prediction routine with model quantization to reduce model size.

D. Set up model monitoring to detect prediction drift and retrain regularly.

E. Use a GPU machine type (NVIDIA T4) to accelerate inference.

Answers: C, E

Quantization reduces model size and inference latency, improving both cost and speed.

Why this answer

Option C is correct because model quantization reduces the memory footprint of a BERT model (e.g., from 2GB to ~500MB with INT8 quantization), which directly lowers inference latency and cost by enabling faster loading and more efficient use of hardware. This is critical for real-time inference with unpredictable traffic spikes, as smaller models scale more easily and reduce the need for excessive replicas.
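The size arithmetic behind that estimate can be sketched as below. This assumes the 2 GB footprint comes from 32-bit float weights and uses simple symmetric per-tensor INT8 quantization on a toy weight list; production toolchains use more elaborate schemes.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 1.0]  # toy weights for illustration
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

fp32_bytes = 2 * 1024**3      # the 2 GB footprint from the question
int8_bytes = fp32_bytes // 4  # 4 bytes per weight shrink to 1 byte
print(codes, int8_bytes // 1024**2, "MiB")
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy cost paid for the 4x size reduction.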

Exam trap

The trap here is that candidates often assume GPU acceleration (Option E) is always the best choice for reducing latency, but for a 2GB BERT model with spiky traffic, quantization (Option C) can achieve similar latency improvements at a fraction of the cost, and the question explicitly asks to minimize both latency and cost, making quantization a more balanced solution.

606

MCQ, easy

Your Vertex AI endpoint receives many identical prediction requests (same input features). You want to cache responses to reduce latency and cost. Which Google Cloud service should you use?

A. Cloud Memorystore for Redis

B. Cloud CDN

C. Bigtable

D. Cloud Storage with object versioning

Answer: A

Redis provides low-latency caching of serialized prediction responses keyed by request hash.

Why this answer

Cloud Memorystore (Redis) is ideal for caching prediction results. By hashing the request input, you can store and retrieve cached responses, avoiding redundant inference.
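The hash-and-lookup pattern can be sketched as follows. A plain dictionary stands in for Redis so the example runs anywhere, and `fake_model` is a hypothetical placeholder for the call to the prediction endpoint.

```python
import hashlib
import json


class PredictionCache:
    """Tiny in-memory stand-in for a Redis cache keyed by request hash."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._store = {}
        self.misses = 0

    @staticmethod
    def key_for(instance):
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(instance, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def predict(self, instance):
        key = self.key_for(instance)
        if key not in self._store:
            self.misses += 1  # Cache miss: run inference once and store it.
            self._store[key] = self._predict_fn(instance)
        return self._store[key]


def fake_model(instance):
    """Hypothetical model call used only for this illustration."""
    return {"score": sum(instance.values())}


cache = PredictionCache(fake_model)
first = cache.predict({"a": 1, "b": 2})
second = cache.predict({"b": 2, "a": 1})  # same features, different key order
print(first, second, cache.misses)
```

With a real Redis instance, entries would normally also carry an expiry so that cached responses do not outlive a model update.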

607

MCQ, hard

You are an ML engineer at a large e-commerce company. Your team has developed a product recommendation model using TensorFlow and deployed it on Vertex AI Endpoints for real-time inference. The model is retrained weekly using a Vertex AI Pipeline that reads new user interaction data from BigQuery, trains the model, evaluates it, and deploys the new version to the endpoint with a traffic split: 10% to the new model and 90% to the previous champion model. Recently, the team noticed that the new model's online prediction latency has increased significantly (from 50ms to 200ms) after deployment, causing timeouts for some requests. The training code has not changed, and the model size is similar. The pipeline uses a custom container with the same TensorFlow Serving image as before. The deployment step uses the same machine type (n1-standard-4) for the endpoint. What is the most likely cause of the latency increase?

A. The endpoint is using a machine type that is not optimized for the new model's computation.

B. The new model has a significantly different architecture that requires more computation.

C. The pipeline now includes a data validation step that modifies the SavedModel's serving signature, adding an extra preprocessing operation.

D. The new model is experiencing data skew because the training data distribution has changed.

Answer: C

A data validation step might have inadvertently added preprocessing ops, increasing latency.

Why this answer

Option C is correct because the pipeline now includes a data validation step that modifies the SavedModel's serving signature, adding an extra preprocessing operation. This additional operation runs during inference on Vertex AI Endpoints, increasing the per-request latency from 50ms to 200ms, even though the model architecture and size remain unchanged. The custom container and machine type are identical, so the latency increase must stem from a change in the serving graph itself.

Exam trap

Google Cloud often tests the concept that changes in the ML pipeline (like adding a data validation step) can alter the serving signature and increase latency, even when the model architecture and infrastructure remain unchanged, tricking candidates into focusing on hardware or data distribution instead.

How to eliminate wrong answers

Option A is wrong because the endpoint uses the same machine type (n1-standard-4) as before, so the machine is not the cause of the latency increase. Option B is wrong because the training code has not changed and the model size is similar, indicating the architecture is not significantly different. Option D is wrong because data skew affects prediction accuracy, not latency; it does not explain a 4x increase in inference time.

608

MCQ, medium

The exhibit shows part of a Vertex AI Pipeline definition. The pipeline fails at the training step with an error: 'Missing required input: train_data'. What is the most likely cause?

A. The evaluation step expects a metric output but training does not produce it

B. The training step uses the wrong image tag

C. The container command for data_processing is incorrect

D. The data_processing step does not define any outputs

E. The pipeline is missing a deployment step

Answer: D

The pipeline must define an output from data_processing to feed into training.

Why this answer

The error 'Missing required input: train_data' indicates that the training step expects an input artifact named 'train_data', but no upstream step provides it. In Vertex AI Pipelines, a component's output must be explicitly defined and connected to the downstream component's input. Since the data_processing step does not define any outputs, it cannot produce the 'train_data' artifact, causing the training step to fail.
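The wiring rule can be illustrated with a simplified check. This is not the Kubeflow Pipelines SDK; the step dictionaries and the `find_missing_inputs` helper are hypothetical, and they only model the idea that an input must be produced by some earlier step's declared output.

```python
def find_missing_inputs(steps):
    """Return (step, input) pairs whose input no earlier step produces."""
    available = set()
    missing = []
    for step in steps:
        for name in step["inputs"]:
            if name not in available:
                missing.append((step["name"], name))
        available.update(step["outputs"])
    return missing


# data_processing declares no outputs, so training cannot find train_data.
broken = [
    {"name": "data_processing", "inputs": [], "outputs": []},
    {"name": "training", "inputs": ["train_data"], "outputs": ["model"]},
]
# Declaring the output upstream resolves the missing input.
fixed = [
    {"name": "data_processing", "inputs": [], "outputs": ["train_data"]},
    {"name": "training", "inputs": ["train_data"], "outputs": ["model"]},
]
print(find_missing_inputs(broken))
print(find_missing_inputs(fixed))
```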

Exam trap

Google Cloud often tests the distinction between runtime errors (e.g., container image issues) and graph validation errors (e.g., missing input/output connections), leading candidates to confuse a missing output definition with a container or command misconfiguration.

How to eliminate wrong answers

Option A is wrong because the error is about a missing input, not a missing metric output; the evaluation step's expectations are irrelevant to the training step's input requirement. Option B is wrong because an incorrect image tag would cause a container runtime error (e.g., 'ImagePullBackOff'), not a 'Missing required input' error, which is a pipeline graph validation issue. Option C is wrong because an incorrect container command for data_processing would cause that step to fail, but the error specifically points to the training step's missing input, not a failure in data_processing.

Option E is wrong because a missing deployment step would not cause a training step input error; deployment occurs after training and evaluation, and its absence would not affect the training step's input requirements.

609

Multi-Select, easy

Which TWO of the following are best practices for versioning ML models and datasets?

Select 2 answers

A. Use Vertex AI Model Registry for model versioning and lineage tracking.

B. Use semantic versioning for datasets.

C. Store datasets and models in the same Cloud Storage bucket with version prefixes.

D. Use Git LFS for dataset versioning.

E. Use Cloud Data Catalog to tag dataset versions.

Answers: A, B

Model Registry is designed for model versioning and captures lineage.

Why this answer

Vertex AI Model Registry is a managed service that automatically tracks model versions, artifacts, and lineage metadata (e.g., training runs, evaluation metrics, and source datasets). It provides a centralized hub for model governance, enabling reproducibility and auditability without manual versioning overhead. This makes it a best practice for versioning ML models in a production MLOps workflow.
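For the dataset side of the answer (option B), one possible semantic versioning convention is sketched below. The mapping of change types to version parts is an assumption made for this example, not a rule stated by the source: schema changes bump the major version, new data bumps the minor version, and corrections bump the patch version.

```python
def bump_dataset_version(version, change):
    """Return the next MAJOR.MINOR.PATCH version for a dataset change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "schema":  # incompatible change, e.g. a column is removed
        return f"{major + 1}.0.0"
    if change == "data":    # compatible addition, e.g. new rows appended
        return f"{major}.{minor + 1}.0"
    if change == "fix":     # correction, e.g. a mislabeled row repaired
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")


print(bump_dataset_version("1.4.2", "schema"))
print(bump_dataset_version("1.4.2", "data"))
print(bump_dataset_version("1.4.2", "fix"))
```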

Exam trap

Google Cloud often tests the misconception that storing artifacts in the same bucket with version prefixes is sufficient for versioning, when in fact it lacks lineage tracking, automated metadata, and governance controls that dedicated registries and versioning schemes provide.

610

MCQ, medium

A financial services firm uses Vertex AI AutoML Natural Language to classify customer feedback into categories (positive, neutral, negative). They notice that the model performs poorly on neutral and negative classes, with high false negatives for negative. The dataset has 10,000 samples: 8,000 positive, 1,000 neutral, 1,000 negative. They have trained the model with automatic data split and default hyperparameters. Which course of action should they take to improve classification of minority classes?

A. Use a custom model with a weighted loss function.

B. Enable the 'weighted' option in AutoML NLP to handle class imbalance.

C. Increase the number of training node hours.

D. Set the data split to 50/25/25 for train/validation/test.

Answer: B

This built-in option adjusts weights for minority classes, improving performance.

Why this answer

Option B is correct because AutoML Natural Language provides a built-in 'weighted' option that automatically adjusts the loss function to penalize misclassifications of minority classes more heavily, directly addressing the class imbalance without requiring custom model development. This is the simplest and most effective way to improve recall for the neutral and negative classes within the AutoML framework.
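To see what class weighting means for this dataset, the sketch below computes inverse-frequency weights for the 8,000 / 1,000 / 1,000 split. The source does not say how AutoML derives its weights internally, so the common "balanced" heuristic (total samples divided by number of classes times class count) is assumed here for illustration.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


counts = {"positive": 8000, "neutral": 1000, "negative": 1000}
weights = balanced_class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Under this heuristic a misclassified negative or neutral sample contributes eight times as much to the loss as a misclassified positive one.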

Exam trap

Google Cloud often tests the misconception that any class imbalance problem requires a custom model or manual data augmentation, when in fact AutoML's built-in 'weighted' option is the prescribed low-code solution for such scenarios.

How to eliminate wrong answers

Option A is wrong because using a custom model with a weighted loss function would require moving away from AutoML's low-code paradigm, which contradicts the scenario's implicit requirement for a low-code solution; AutoML already handles weighting internally via the 'weighted' option. Option C is wrong because increasing training node hours only provides more compute time for the same training process and does not address the fundamental issue of class imbalance; it may lead to overfitting on the majority class. Option D is wrong because changing the data split ratio (e.g., 50/25/25) does not mitigate class imbalance; it merely redistributes the same skewed proportions across training, validation, and test sets, leaving the model still biased toward the majority positive class.

611

MCQ, hard

A healthcare startup deployed a Vertex AI AutoML Vision model to detect anomalies in medical images. The model performs well on the test set but has high latency in production, exceeding the 2-second SLA. The images are stored in Cloud Storage and are processed via a Cloud Function triggered by new uploads. What is the most likely cause?

A. The images are being resized and preprocessed in the Cloud Function, adding latency.

B. The model is deployed on a small machine type with insufficient compute.

C. The Cloud Function has a cold start issue.

D. The AutoML Vision endpoint is not using GPU acceleration.

Answer: B

A small machine type (e.g., n1-standard-2) can cause high inference latency under load.

Why this answer

Option B is correct because the most likely cause of high latency exceeding the 2-second SLA is that the Vertex AI AutoML Vision model is deployed on a small machine type (e.g., n1-standard-2 or lower) with insufficient compute resources (CPU/memory). AutoML Vision endpoints use container-based serving, and underpowered machines cannot handle the inference load efficiently, especially for high-resolution medical images, leading to response times beyond the SLA.

Exam trap

The trap here is that candidates confuse cold start latency (Cloud Function) with inference latency (model serving), or assume GPU acceleration is optional for AutoML endpoints, when in fact AutoML Vision automatically uses GPUs and the real bottleneck is the compute capacity of the serving machine.

How to eliminate wrong answers

Option A is wrong because image resizing and preprocessing in the Cloud Function typically add minimal latency (milliseconds) and are not the primary cause of exceeding a 2-second SLA; the bottleneck is inference, not preprocessing. Option C is wrong because cold starts in Cloud Functions add 1-2 seconds at most and can be mitigated with min instances, but the question states the model performs well on the test set, implying the issue is inference latency, not function initialization. Option D is wrong because AutoML Vision endpoints automatically use GPU acceleration when available and appropriate; the lack of GPU is not a configurable option for AutoML endpoints, and the latency issue is more likely due to insufficient CPU/memory on the serving machine.

612

Matching, medium

Match each optimization algorithm to its characteristic.

Concepts and matches

SGD: Stochastic gradient descent with constant learning rate

Adam: Adaptive moment estimation with per-parameter learning rates

RMSProp: Root mean square propagation, adapts learning rate per parameter

Adagrad: Adaptive gradient algorithm, reduces learning rate for frequent features

Momentum: Accelerates SGD by adding a fraction of previous update

Why these pairings

Optimizers differ in how they update learning rates: SGD uses a constant rate; Adam combines momentum and adaptive scaling; RMSProp uses gradient scaling. Common confusions include mixing up adaptive methods or misattributing momentum.
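The differences are easiest to see in the update rules themselves. The sketch below applies one step of each optimizer to a single parameter, using the textbook formulas with illustrative hyperparameters (learning rate 0.1 and the usual default decay rates); it is a teaching aid rather than any framework's implementation.

```python
import math


def sgd(w, g, lr=0.1):
    """Constant learning rate: step is lr times the gradient."""
    return w - lr * g


def momentum(w, g, v, lr=0.1, beta=0.9):
    """Adds a fraction (beta) of the previous update to the current one."""
    v = beta * v + lr * g
    return w - v, v


def adagrad(w, g, acc, lr=0.1, eps=1e-8):
    """Accumulates squared gradients, so frequently updated weights slow down."""
    acc = acc + g * g
    return w - lr * g / (math.sqrt(acc) + eps), acc


def rmsprop(w, g, avg, lr=0.1, rho=0.9, eps=1e-8):
    """Scales the step by a moving average of squared gradients."""
    avg = rho * avg + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(avg) + eps), avg


def adam(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Combines a momentum-style first moment with RMSProp-style scaling."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - b2 ** t)  # bias correction for the second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


# One step on f(w) = w^2 from w = 1.0, where the gradient is 2w = 2.0.
w0, g0 = 1.0, 2.0
w_sgd = sgd(w0, g0)
w_mom, _ = momentum(w0, g0, 0.0)
w_ada, _ = adagrad(w0, g0, 0.0)
w_rms, _ = rmsprop(w0, g0, 0.0)
w_adam, _, _ = adam(w0, g0, 0.0, 0.0, t=1)
print(w_sgd, w_mom, w_ada, w_rms, w_adam)
```

Note that the first Adagrad and Adam steps have magnitude close to the learning rate regardless of how large the gradient is, which is the per-parameter scaling the descriptions refer to.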

613

MCQ, medium

Refer to the exhibit. What is the purpose of this query?

A. To detect data drift

B. To find prediction errors in Cloud Logging

C. To count all prediction requests

D. To monitor model latency

Answer: B

The filter uses PredictionError type and ERROR severity.

Why this answer

The query filters Cloud Logging entries for the string 'prediction failed', which directly indicates prediction errors logged by the ML prediction service. This is a common pattern for monitoring model inference failures in production, not for measuring drift, counting requests, or measuring latency.

Exam trap

Google Cloud often tests the distinction between log-based monitoring (for errors) and metric-based monitoring (for counts, latency, drift), so candidates mistakenly choose 'count all prediction requests' when the query clearly filters for failures, not all requests.

How to eliminate wrong answers

Option A is wrong because data drift detection requires comparing feature distributions over time, not searching for error log messages. Option C is wrong because counting all prediction requests would require a metric like `prediction_count` or a log-based metric counting all prediction entries, not filtering for failures. Option D is wrong because monitoring model latency requires timing metrics (e.g., `prediction_latency_ms`) or log entries with duration fields, not a search for 'prediction failed'.

614

MCQ, easy

Your company runs a high-traffic web application that serves the same machine learning model prediction for many identical requests (e.g., product recommendations for the same user profile). You want to reduce latency and load on the prediction endpoint by caching responses. Which Google Cloud service should you use?

A. Cloud CDN

B. Cloud Memorystore

C. Cloud Spanner

D. BigQuery

Answer: B

Memorystore (Redis) provides low-latency caching for prediction responses.

Why this answer

Cloud Memorystore (B) is correct because it provides a managed in-memory cache (Redis or Memcached) that can store the results of identical prediction requests, reducing latency and load on the prediction endpoint. By caching responses keyed on the user profile or request parameters, subsequent identical requests can be served directly from Memorystore with sub-millisecond latency, avoiding redundant model inference.

Exam trap

The trap here is that candidates confuse caching at the edge (CDN) with caching at the application layer (Memorystore), assuming any cache service works for dynamic API responses, but Cloud CDN cannot cache POST requests or application-specific payloads without significant configuration and still lacks the fine-grained key-value semantics needed for identical prediction requests.

How to eliminate wrong answers

Option A (Cloud CDN) is wrong because it caches static content (e.g., images, CSS) at edge locations, not dynamic API responses for identical requests; it cannot cache POST request payloads or application-level prediction results without complex workarounds. Option C (Cloud Spanner) is wrong because it is a globally distributed relational database designed for transactional consistency and high availability, not for low-latency caching of ephemeral prediction responses. Option D (BigQuery) is wrong because it is a serverless data warehouse for analytical queries on large datasets, not a caching layer for real-time inference results.

615

MCQ, easy

A data scientist wants to quickly experiment with a pre-trained Vision Transformer model from Hugging Face and fine-tune it on a custom dataset using Vertex AI. They want to use a managed environment with minimal setup. Which Vertex AI service should they use?

A. Vertex AI Prediction

B. Vertex AI Workbench

C. Vertex AI JumpStart

D. AI Platform Training

Answer: C

JumpStart offers pre-built models and ML solutions that can be deployed and fine-tuned with minimal effort.

Why this answer

Vertex AI JumpStart is the correct choice because it provides a managed environment with pre-built, optimized containers for popular models like Vision Transformers, enabling one-click deployment and fine-tuning with minimal setup. It abstracts away infrastructure management, allowing the data scientist to quickly experiment without configuring custom training scripts or environments.

Exam trap

Google Cloud often tests the distinction between managed services (JumpStart) and semi-managed environments (Workbench), where candidates mistakenly choose Workbench for its notebook interface, overlooking that JumpStart offers a more streamlined, pre-configured path for quick experimentation.

How to eliminate wrong answers

Option A is wrong because Vertex AI Prediction is designed for serving trained models for inference, not for training or fine-tuning; it lacks the capability to run training jobs. Option B is wrong because Vertex AI Workbench is a Jupyter-based notebook environment that requires manual setup of dependencies and infrastructure, which contradicts the 'minimal setup' requirement. Option D is wrong because AI Platform Training is a legacy service that has been superseded by Vertex AI; it requires more manual configuration and does not offer the same level of pre-built, managed integration for Hugging Face models as JumpStart.

616

Multi-Select, medium

You need to deploy a model that requires a large amount of memory (over 200 GB) for inference. The model is a custom PyTorch model. Vertex AI endpoints have machine type limitations. Which TWO actions can you take to handle this memory requirement? (Choose 2 correct answers)

Select 2 answers

A. Use a machine type from the n1-highmem series, such as n1-highmem-32 (208 GB) or higher.

B. Use multiple replicas and split the model across them.

C. Use a custom container to load the model and optimize its memory footprint.

D. Use batch prediction instead of online prediction.

E. Deploy the model on Cloud Run with 32 GB memory.

Answers: A, C

n1-highmem provides high memory per CPU, with sizes up to 416 GB.

Why this answer

Option A is correct because the n1-highmem-32 machine type provides 208 GB of memory, which meets the requirement of over 200 GB. Vertex AI endpoints support this machine series, allowing you to deploy a custom PyTorch model with sufficient RAM for inference without modification.

Exam trap

Google Cloud often tests the misconception that model parallelism across replicas is a valid strategy for memory constraints, but in Vertex AI, replicas are stateless and cannot share model weights for a single inference request.

617

MCQ, medium

A company has a large dataset of labeled images (e.g., different species of plants). They want to train a custom image classification model with minimal effort and no prior ML experience. Which Google Cloud service should they use?

A. Cloud TPU

B. AutoML Vision

C. Vertex AI Workbench with a custom TensorFlow model

D. Vision API

Answer: B

AutoML Vision allows training custom image classification models with a simple UI, no coding required.

Why this answer

AutoML Vision is the correct choice because it allows users with no prior ML experience to train a custom image classification model using a simple graphical interface, requiring only labeled images as input. It automates model architecture search, hyperparameter tuning, and deployment, minimizing manual effort while delivering a production-ready model.

Exam trap

The trap here is that candidates confuse AutoML Vision (custom model training with minimal effort) with Vision API (pre-trained, no custom training), often picking D because both involve 'Vision' and seem low-code, but Vision API cannot be retrained on custom data.

How to eliminate wrong answers

Option A is wrong because Cloud TPU is a hardware accelerator for training custom models, requiring users to write and manage their own ML code (e.g., TensorFlow/PyTorch), which demands significant ML expertise and effort. Option C is wrong because Vertex AI Workbench with a custom TensorFlow model requires users to write, debug, and train a model from scratch using notebooks, which is not minimal effort and assumes ML experience. Option D is wrong because Vision API is a pre-trained API for general image recognition tasks (e.g., label detection, OCR) and cannot be trained on custom labeled datasets like plant species; it offers no customization for specific classification needs.

618

Multi-Select, easy

Your team deploys a model using Vertex AI Endpoints with autoscaling. Which TWO metrics are most important to monitor in order to optimize cost and performance? (Choose two.)

Select 2 answers

A. Number of active nodes in the endpoint.

B. Number of requests per minute.

C. CPU utilization of the serving containers.

D. Error rate (HTTP 4xx/5xx).

E. P99 prediction latency.

Answers: B, C

Requests per minute indicates traffic patterns.

Why this answer

Option B is correct because the number of requests per minute directly drives autoscaling behavior in Vertex AI Endpoints. Monitoring this metric allows you to right-size the number of serving nodes to match traffic patterns, avoiding over-provisioning (cost) or under-provisioning (performance). Option C is correct because CPU utilization of the serving containers indicates whether the model is compute-bound or idle; high CPU suggests the need for more nodes, while low CPU suggests over-provisioning, directly impacting both cost and latency.
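How CPU utilization translates into a node count can be sketched with a target-tracking rule. The formula below is the generic proportional rule used by many autoscalers and is an assumption for illustration, not a statement of Vertex AI's exact algorithm; the 60% target and the replica bounds are example values.

```python
import math


def desired_replicas(current, cpu_utilization, target=60.0,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count so average CPU moves toward the target."""
    raw = math.ceil(current * cpu_utilization / target)
    return max(min_replicas, min(max_replicas, raw))


scale_up = desired_replicas(current=2, cpu_utilization=90.0)
scale_down = desired_replicas(current=4, cpu_utilization=15.0)
capped = desired_replicas(current=5, cpu_utilization=90.0, max_replicas=6)
print(scale_up, scale_down, capped)
```

The capped case shows why a maximum replica count that is set too low trades cost for latency during traffic spikes.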

Exam trap

Google Cloud often tests the distinction between metrics that are direct inputs to autoscaling decisions (requests per minute, CPU utilization) versus metrics that are outcomes of scaling (active nodes, latency, error rate), leading candidates to mistakenly select outcome metrics as primary optimization drivers.

619

Multi-Select, easy

A data analyst wants to build a binary classification model using a low-code ML solution on Google Cloud. The dataset is stored in BigQuery and contains 500,000 rows with 20 features, including categorical and numerical columns. The analyst has minimal coding experience and needs to deploy the model as an API endpoint for real-time predictions. Which two Google Cloud services should the analyst use to accomplish this task with minimal code? Choose two options.

Select 2 answers

A. BigQuery ML

B. Vertex AI Endpoints

C. Cloud Functions

D. Vertex AI Workbench

E. AutoML Tables

Answers: B, E

Vertex AI Endpoints provides a managed option to deploy trained models as REST APIs with autoscaling, ideal for real-time predictions without code.

Why this answer

Vertex AI Endpoints is correct because it provides a managed service to deploy trained models as REST API endpoints for real-time predictions with minimal code. The analyst can deploy an AutoML Tables model directly to a Vertex AI Endpoint, enabling low-code deployment and serving.

Exam trap

Google Cloud often tests the distinction between model training services (BigQuery ML, AutoML Tables) and model deployment services (Vertex AI Endpoints), leading candidates to incorrectly select BigQuery ML for real-time API deployment when it only supports batch inference.

620

MCQ, easy

A company wants to transcribe customer service calls in real-time to detect sentiment and identify urgent issues. They need a solution with low latency. Which combination of pre-built APIs should they use?

A. Text-to-Speech and Natural Language API

B. Speech-to-Text and Translation API

C. Video Intelligence API

D. Speech-to-Text and Natural Language API

Answer: D

Speech-to-Text transcribes audio, then Natural Language API performs sentiment analysis on the text.

Why this answer

Speech-to-Text can transcribe audio in real-time, and Natural Language API can analyze sentiment from the transcribed text. Text-to-Speech is for generating speech. Video Intelligence is for video content.

621

MCQ, hard

A team uses Vertex AI Pipelines with CustomJob components that pull training code from a Cloud Source Repository. The pipeline fails with a 'Permission denied' error when trying to access the repository. The service account used by the pipeline has the 'Source Repository Viewer' role. What is the likely issue?

A. The training code contains a dependency that is not available in the custom container

B. The 'Source Repository Viewer' role is insufficient; the service account needs 'Source Repository Reader' or higher

C. The pipeline is running in a different project than the repository; cross-project access is not supported

D. The repository URL is incorrectly formatted; use the SSH URL instead of HTTPS

Answer: B

Reader role allows cloning and fetching, while Viewer only allows browsing.

Why this answer

The 'Source Repository Viewer' role only allows listing and viewing repository metadata, not reading the actual source code. To clone or pull code from a Cloud Source Repository, the service account needs the 'Source Repository Reader' role (or higher), which grants the `source.repos.get` and `source.repos.read` permissions required for Git operations. The pipeline's CustomJob component fails because the service account lacks these permissions when attempting to access the repository.

Exam trap

Google Cloud often tests the distinction between IAM roles that grant read-only access to metadata versus those that grant actual data access, leading candidates to assume 'Viewer' is sufficient for reading source code.

How to eliminate wrong answers

Option A is wrong because a missing dependency would cause a runtime error during training, not a 'Permission denied' error when accessing the repository. Option C is wrong because cross-project access to Cloud Source Repositories is fully supported as long as the service account has the appropriate IAM roles on the repository's project. Option D is wrong because both HTTPS and SSH URLs are supported for Cloud Source Repositories; the error is a permissions issue, not a URL format issue.

622

Multi-Selecthard

A team wants to deploy a model on Vertex AI Edge Manager for offline inference on edge devices. Which three steps are required? (Choose 3)

Select 3 answers

A.Enable Vertex AI Model Monitoring

B.Package the model and deploy using Vertex AI Edge Manager

C.Convert the model to TensorFlow Lite or ONNX format

D.Upload the model to Vertex AI Model Registry

E.Create a Vertex AI Endpoint with HTTP endpoint

AnswersB, C, D

Edge Manager handles packaging and pushing to edge devices.

Why this answer

The model must be in a format compatible with edge devices (TFLite/ONNX), and Edge Manager requires packaging and deploying the model to the device.

623

MCQeasy

A company uses Cloud Composer to orchestrate their ML pipelines. They notice that tasks are being queued but not executed, causing delays. What is the most likely cause?

A.The Airflow web server is down

B.The DAG file is corrupted

C.The Cloud Storage bucket containing DAGs is not accessible

D.The Airflow worker resources are exhausted

AnswerD

If workers are busy or the cluster is under-provisioned, tasks will be queued.

Why this answer

When tasks are queued but not executed, it typically indicates that the Airflow workers have no available slots to pick up new tasks. In Cloud Composer, the Celery executor distributes tasks to workers; if all worker concurrency slots are saturated or the worker node pool is under-provisioned, tasks remain in the 'queued' state until a worker becomes free. This is the most likely cause given the symptom of tasks being queued without execution.
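
The capacity reasoning can be sketched with simple arithmetic. The numbers below are hypothetical, and the per-worker limit is assumed to correspond to Airflow's Celery `worker_concurrency` setting:

```python
def queued_backlog(num_workers: int, worker_concurrency: int, runnable_tasks: int) -> int:
    """Return how many runnable tasks must wait because every worker slot is taken."""
    total_slots = num_workers * worker_concurrency
    return max(0, runnable_tasks - total_slots)


# Hypothetical environment: 3 workers with 12 slots each, 50 tasks ready to run.
backlog = queued_backlog(num_workers=3, worker_concurrency=12, runnable_tasks=50)
print(backlog)  # 14 tasks remain in the 'queued' state until slots free up
```

Adding workers or raising per-worker concurrency (if the workers have spare CPU and memory) increases the slot count and shrinks the backlog.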

Exam trap

The trap here is that candidates confuse the roles of Airflow components (web server, scheduler, worker) and assume a UI or DAG access issue causes queued tasks, when in reality the worker capacity is the bottleneck.

How to eliminate wrong answers

Option A is wrong because the Airflow web server only serves the UI and plays no part in executing tasks; if it were down, the UI would be inaccessible, but the scheduler could still queue tasks and workers could still execute them. Option B is wrong because a corrupted DAG file would cause a parse error, preventing the DAG from being scheduled or appearing in the UI, not leaving tasks in a queued state. Option C is wrong because if the Cloud Storage bucket containing DAGs were not accessible, the DAGs would not be synced to the Airflow environment at all, resulting in missing DAGs rather than queued tasks.

624

MCQeasy

You are deploying a model to a Vertex AI endpoint and need to minimize latency for online predictions. Which machine type should you choose?

A.n1-standard-2 with NVIDIA Tesla T4

B.e2-standard-2

C.n1-standard-2

D.n1-highmem-2

AnswerA

GPUs accelerate inference, reducing latency for deep learning models.

Why this answer

GPU-enabled machines (e.g., n1-standard-2 with NVIDIA Tesla T4) accelerate compute-heavy models, reducing prediction latency significantly compared to CPU-only instances.

625

Multi-Selecthard

Which THREE actions should be taken to manage model versions effectively?

Select 3 answers

A.Delete old versions immediately

B.Use Vertex AI Model Registry

C.Set up model evaluation alerts

D.Use the same model name for all versions

E.Assign version aliases like 'champion' and 'experiment'

AnswersB, C, E

Model Registry provides versioning and deployment control.

Why this answer

Vertex AI Model Registry is a centralized repository that tracks, versions, and manages ML models. It enables you to organize models, assign aliases (like 'champion' or 'experiment'), and control deployment, ensuring reproducibility and governance across the model lifecycle.

Exam trap

Google Cloud often tests the misconception that deleting old versions is a best practice for storage optimization, when in reality versioning requires retaining history for reproducibility and rollback, and that aliases are the correct mechanism for labeling model stages.

626

MCQeasy

A data scientist has deployed a classification model on a Vertex AI Endpoint and wants to monitor for feature drift in the serving data compared to the training data. Which Vertex AI service should be used?

A.Vertex AI Explainable AI

B.Vertex AI Model Evaluation

C.Vertex AI Continuous Training

D.Vertex AI Model Monitoring

AnswerD

Correct service for monitoring feature drift and skew.

Why this answer

Vertex AI Model Monitoring is specifically designed to monitor deployed models for feature skew, feature drift, and prediction drift. It scores distribution changes with distance measures such as Jensen-Shannon divergence for numerical features and L-infinity distance for categorical features.
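
As a rough illustration of what a drift score measures, the sketch below computes Jensen-Shannon divergence between a training histogram and a serving histogram. The bin values and the alert threshold are made up for illustration, and this is not Model Monitoring's internal implementation:

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]  # midpoint distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


training = [0.25, 0.50, 0.25]         # hypothetical binned feature distribution
serving_shifted = [0.60, 0.30, 0.10]  # hypothetical serving distribution
DRIFT_THRESHOLD = 0.05                # assumed alert threshold

score = js_divergence(training, serving_shifted)
print(f"JS divergence = {score:.4f}, drift alert = {score > DRIFT_THRESHOLD}")
```

Identical distributions score 0, and the score grows as the serving data moves away from the training baseline.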

627

MCQhard

A company serves a PyTorch model using a custom container on Vertex AI Prediction. They notice that after a few hours, the endpoint returns 502 errors. The logs show 'Out of memory' errors. The container has a memory limit of 4GB, and the model loads a 3GB vocabulary file. What is the most likely cause and best fix?

A.Increase the container memory to 8GB.

B.Load the vocabulary file once at startup and reuse it.

C.Increase the number of replicas to distribute load.

D.Switch to Vertex AI Batch Prediction.

AnswerB

Prevents repeated loading, solving OOM.

Why this answer

The 502 errors and 'Out of memory' errors indicate that the container is running out of memory during inference. Since the model loads a 3GB vocabulary file, and the container has only 4GB of memory, loading this file repeatedly for each prediction request (e.g., inside the prediction handler) would quickly exhaust memory. The correct fix is to load the vocabulary file once at container startup and reuse it across all requests, which is a standard best practice for serving models with large static assets.
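
The load-once pattern can be sketched as follows. The tiny dictionary stands in for the real 3GB vocabulary file, and the function names are hypothetical rather than part of any Vertex AI serving API:

```python
import functools

LOAD_COUNT = 0  # instrumentation for this example only


@functools.lru_cache(maxsize=1)
def get_vocabulary():
    """Load the vocabulary a single time; later calls return the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Stand-in for reading the large vocabulary file from disk.
    return {"hello": 0, "world": 1}


def predict(tokens):
    """Prediction handler: reuses the cached vocabulary instead of reloading it."""
    vocab = get_vocabulary()
    return [vocab.get(token, -1) for token in tokens]


get_vocabulary()  # warm the cache once at container startup
for _ in range(1000):
    predict(["hello", "unknown"])
print(LOAD_COUNT)  # 1
```

Because the handler only reads from the cached object, memory use stays flat no matter how many requests arrive.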

Exam trap

Google Cloud often tests the misconception that OOM errors are always solved by increasing memory, but the trap here is that the real issue is inefficient resource reuse—loading a large file per request—rather than insufficient total memory.

How to eliminate wrong answers

Option A is wrong because simply increasing memory to 8GB does not address the root cause—the vocabulary file is being loaded repeatedly, which will still cause memory bloat and eventual OOM errors, just at a higher threshold. Option C is wrong because increasing replicas distributes incoming traffic but does not fix the per-container memory leak caused by repeated vocabulary loading; each replica would still suffer the same OOM issue. Option D is wrong because switching to Batch Prediction is for offline, asynchronous processing, not for real-time serving, and does not solve the memory management problem within the container.

628

MCQmedium

You are fine-tuning a pre-trained BERT model from Hugging Face on a custom text classification dataset using Vertex AI Training. You want to speed up training by using mixed precision. What should you do?

A.Modify the model to use half-precision layers

B.Use a custom container with TensorFlow instead of PyTorch

C.Enable mixed precision via Vertex AI hyperparameter tuning

D.Set fp16=True in the TrainingArguments

AnswerD

This enables mixed precision in Hugging Face Trainer.

Why this answer

Hugging Face Trainer supports mixed precision via the fp16 argument. Set it to True in the TrainingArguments. No need to modify the model architecture or use a custom container.
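
A minimal sketch of the configuration, assuming the `transformers` library is installed in the training container. Every value other than `fp16` is a placeholder chosen for illustration:

```python
# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "./bert-finetuned",     # hypothetical local output path
    "per_device_train_batch_size": 16,    # placeholder value
    "num_train_epochs": 3,                # placeholder value
    "fp16": True,                         # enable mixed-precision training
}


def build_training_arguments(kwargs):
    """Construct TrainingArguments; imported lazily so this sketch loads without transformers."""
    from transformers import TrainingArguments

    return TrainingArguments(**kwargs)
```

Mixed precision typically needs a supported GPU, so the training job's machine spec should include an accelerator.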

629

Multi-Selectmedium

Your team manages multiple ML models on Vertex AI. You need to implement a centralized monitoring solution to track model performance over time. Which TWO approaches should you consider? (Choose two.)

Select 2 answers

A.Store all prediction logs in BigQuery and analyze using SQL.

B.Use Cloud Source Repositories to track model code versions.

C.Create Cloud Monitoring dashboards and alerts based on Vertex AI metrics.

D.Use Vertex AI Model Monitoring to detect training-serving skew and feature drift for each model.

E.Enable Cloud Billing budgets to track cost per model.

AnswersC, D

Centralized view of all models.

Why this answer

Option C is correct because Cloud Monitoring provides centralized dashboards and alerting for Vertex AI metrics such as prediction latency, request count, and error rates, enabling you to track model performance over time without additional infrastructure. Option D is correct because Vertex AI Model Monitoring is purpose-built to detect training-serving skew and feature drift by comparing serving data distributions to training data, which is essential for maintaining model performance in production.

Exam trap

The trap here is that candidates may confuse logging (Option A) or cost tracking (Option E) with performance monitoring, or mistakenly think version control (Option B) is part of monitoring, when the question specifically asks for centralized monitoring of model performance over time.

630

MCQhard

A company uses Vertex AI Experiments to track ML training runs. They want to enforce that all training runs use only approved libraries from a central Artifact Registry to ensure compliance. Which approach should they take?

A.Use a startup script in the training VM to install libraries from Artifact Registry.

B.Use Vertex AI Pipelines with a component that pulls libraries from Artifact Registry at runtime.

C.Create a custom Vertex AI training container that installs libraries from Artifact Registry at build time and restrict training job submission to that container using IAM.

D.Configure Vertex AI Training with a custom job configuration that specifies the library sources.

E.Use Cloud Build to build the training image with approved libraries and push to Container Registry, then restrict training jobs to that image.

AnswerC

This encapsulates libraries in the container and controls usage.

Why this answer

Option C is correct because it enforces compliance at the image level: by building a custom container that installs only approved libraries from Artifact Registry at build time, and then restricting training job submission to that specific container using IAM, you ensure that no unauthorized libraries can be introduced at runtime. This approach eliminates the risk of developers injecting unapproved dependencies via startup scripts or runtime pulls, and it aligns with the principle of immutable infrastructure for ML training.

Exam trap

The trap here is that candidates confuse runtime library installation (options A, B, D) with build-time image hardening (option C), overlooking that only a pre-built, IAM-restricted container can truly prevent unauthorized dependencies from being loaded during training.

How to eliminate wrong answers

Option A is wrong because a startup script runs after the VM starts, allowing users to modify or override the library list at runtime, which does not enforce compliance. Option B is wrong because pulling libraries from Artifact Registry at runtime still permits dynamic changes to dependencies, and the pipeline component itself could be modified to pull from other sources. Option D is wrong because a custom job configuration only specifies library sources as metadata; it does not prevent the training job from installing additional or different libraries during execution.

Option E is wrong because it pushes the image to Container Registry (now deprecated in favor of Artifact Registry) and does not restrict training jobs to that image via IAM—any user with permissions could submit a job using a different image.

631

Multi-Selecthard

An ML engineer is troubleshooting why a Vertex AI Endpoint is returning high prediction latency. They have enabled request/response logging and see that some requests take >1 second while most are fast. Which THREE actions should they take to diagnose the issue?

Select 3 answers

A.Check CPU and GPU utilisation metrics on the endpoint.

B.Review the request/response logs in BigQuery to identify if large payloads correlate with high latency.

C.Disable Vertex AI Model Monitoring to reduce overhead.

D.Check the p99 latency metric in Cloud Monitoring.

E.Increase the number of replicas to resolve the issue immediately.

AnswersA, B, D

High utilisation can indicate resource bottleneck.

Why this answer

Check Cloud Monitoring metrics for p99 latency trends, CPU/GPU utilisation to identify bottlenecks, and review request/response logs for patterns (e.g., large payloads causing slow inference).
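
The payload check can be sketched on a few hypothetical log rows. In practice the rows would come from the request/response logging table in BigQuery; the field names and values here are assumptions:

```python
from statistics import mean

# Hypothetical rows exported from the request/response logging table.
logs = [
    {"payload_bytes": 2_000, "latency_ms": 40},
    {"payload_bytes": 3_000, "latency_ms": 55},
    {"payload_bytes": 2_500, "latency_ms": 48},
    {"payload_bytes": 900_000, "latency_ms": 1_350},
    {"payload_bytes": 1_200_000, "latency_ms": 1_800},
]

SLOW_MS = 1_000  # requests above this latency count as slow

slow = [row for row in logs if row["latency_ms"] > SLOW_MS]
fast = [row for row in logs if row["latency_ms"] <= SLOW_MS]

avg_slow_payload = mean(row["payload_bytes"] for row in slow)
avg_fast_payload = mean(row["payload_bytes"] for row in fast)
print(avg_slow_payload, avg_fast_payload)
```

A large gap between the two averages suggests that payload size, rather than a general resource shortage, explains the slow requests.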

632

MCQhard

An organization needs to implement MLOps with standardized pipeline templates across multiple teams. Which Vertex AI feature should they use to create reusable pipeline components?

A.Vertex AI Pipelines

B.Vertex AI Experiments

C.Vertex AI Metadata

D.Vertex AI Workbench

AnswerA

Pipelines support reusable components and templates.

Why this answer

Vertex AI Pipelines allows creation of reusable pipeline components and templates that can be shared across teams, enabling standardization.

633

MCQhard

A financial services company uses Vertex AI Pipelines to train and deploy models for fraud detection. The ML team consists of data scientists who develop models and ML engineers who deploy them. They use a CI/CD pipeline with Cloud Build to build and push Docker images to Artifact Registry, then trigger Vertex AI Pipelines. Recently, the team noticed that a model deployed to production was trained on a dataset that had not been approved by the data governance team. Upon investigation, they found that a data scientist accidentally used an unapproved version of the training data by specifying a Cloud Storage path that was not the latest approved dataset. The company needs to enforce that only approved datasets are used in training jobs. Which approach should they take?

A.Implement a manual approval process where data scientists request dataset paths from the data governance team before each training run.

B.After training, run a validation step that checks if the dataset used matches the latest approved version, and roll back if not.

C.Use a curated dataset registry in BigQuery or Cloud Storage with IAM conditions that allow access only to datasets tagged as 'approved'. Modify the CI/CD pipeline to pass only approved dataset references to the training job.

D.Restrict all Cloud Storage buckets to be read-only for the data scientists, and have ML engineers copy approved datasets to a separate bucket.

AnswerC

This automates governance by restricting training to approved datasets via IAM and pipeline configuration.

Why this answer

Option C is correct because it enforces governance at the source by using IAM conditions to restrict access to only approved datasets, preventing unauthorized data from being used in training. This approach integrates with the CI/CD pipeline to automatically pass only approved dataset references, eliminating the risk of human error in specifying Cloud Storage paths.
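
The CI/CD side of this control can be sketched as a lookup that only ever returns approved dataset references. The registry contents and paths below are hypothetical, and the IAM conditions themselves are enforced by Google Cloud, not by this code:

```python
# Hypothetical curated registry: logical dataset name -> approved location.
APPROVED_DATASETS = {
    "fraud-train": "gs://approved-datasets/fraud/v7/train.csv",
}


def resolve_dataset(name: str) -> str:
    """Return the approved URI for a dataset name, or refuse if it is not registered."""
    if name not in APPROVED_DATASETS:
        raise PermissionError(f"Dataset {name!r} is not in the approved registry")
    return APPROVED_DATASETS[name]


# The pipeline receives a logical name, never a free-form Cloud Storage path.
training_data_uri = resolve_dataset("fraud-train")
print(training_data_uri)
```

Passing logical names instead of raw paths removes the opportunity to mistype or pick an outdated location.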

Exam trap

Google Cloud often tests the distinction between reactive validation (Option B) and proactive enforcement (Option C), where candidates mistakenly choose a post-training check that wastes resources instead of a preventive IAM-based control.

How to eliminate wrong answers

Option A is wrong because a manual approval process is error-prone, slow, and does not scale; it relies on human compliance rather than automated enforcement, leaving the system vulnerable to accidental misuse. Option B is wrong because it is reactive—it detects the issue after training has already occurred, wasting compute resources and potentially exposing the model to unapproved data before rollback. Option D is wrong because it restricts data scientists' access entirely, which hinders their ability to experiment and develop models; it also shifts the burden to ML engineers without addressing the root cause of dataset version control.

634

MCQmedium

A team of ML engineers is collaborating on a project using Vertex AI. They want to ensure that only approved models are deployed to production. Which approach should they use?

A.Store all models in a Cloud Storage bucket and manually control access via IAM permissions.

B.Deploy models directly from training jobs to an endpoint without version tracking.

C.Use Vertex AI Model Registry with version aliases to manage model versions and promote them after approval.

D.Use Cloud Dataflow to transform raw predictions and then store them in BigQuery for analysis.

AnswerC

Model Registry provides version control, staging, and alias-based deployment.

Why this answer

Vertex AI Model Registry provides a centralized repository for managing model versions, with support for version aliases (e.g., 'champion', 'challenger') that allow teams to promote models to production only after approval. This ensures governance and traceability, meeting the requirement that only approved models are deployed.
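
The alias-based promotion idea can be illustrated with a toy in-memory registry. This class is a hypothetical stand-in written for this example and is not the Model Registry API:

```python
class AliasRegistry:
    """Toy registry: versions must be approved before an alias can point to them."""

    def __init__(self):
        self.approved = {}  # version id -> approval flag
        self.aliases = {}   # alias name -> version id

    def register(self, version: str) -> None:
        self.approved[version] = False

    def approve(self, version: str) -> None:
        self.approved[version] = True

    def promote(self, alias: str, version: str) -> None:
        if not self.approved.get(version, False):
            raise ValueError(f"Version {version!r} has not been approved")
        self.aliases[alias] = version


registry = AliasRegistry()
registry.register("v1")
registry.register("v2")
registry.approve("v1")
registry.promote("champion", "v1")  # allowed: v1 is approved
print(registry.aliases)             # {'champion': 'v1'}
```

Deployment tooling that resolves the 'champion' alias, rather than a hard-coded version, will only ever pick up versions that passed approval.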

Exam trap

The trap here is that candidates may confuse storage access control (IAM) with model lifecycle governance, or assume that any data pipeline tool (Dataflow) can manage model approvals, when in fact only a dedicated model registry with version aliases provides the required approval workflow and traceability.

How to eliminate wrong answers

Option A is wrong because storing models in Cloud Storage with manual IAM control lacks version tracking, approval workflows, and integration with Vertex AI's deployment services, making it error-prone and unscalable for production governance. Option B is wrong because deploying directly from training jobs without version tracking bypasses model validation, approval gates, and rollback capabilities, violating the requirement for controlled production deployments. Option D is wrong because Cloud Dataflow is a data processing service for stream/batch pipelines, not a model management or approval mechanism; it is irrelevant to controlling which models are deployed.

635

MCQeasy

A company deploys an online prediction model serving 100 requests per second. They are optimizing for both latency and throughput. Which monitoring strategy should they use?

A.Monitor only the request count and set an alert if it drops below a threshold.

B.Set a single alert on the 99th percentile latency and ignore throughput since it's already high.

C.Monitor the error rate and set an alert if it exceeds 1%.

D.Monitor both the p50 and p99 latency, and the request count. Create a dashboard showing latency vs. throughput at different load levels.

AnswerD

Allows understanding of the relationship.

Why this answer

Option D is correct because monitoring both p50 and p99 latency alongside request count provides a comprehensive view of system performance under load. Latency percentiles reveal tail behavior (p99) and typical user experience (p50), while request count tracks throughput. A dashboard correlating latency vs. throughput at different load levels is essential for identifying performance cliffs or degradation before failures occur, aligning with best practices for production ML inference systems.
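
A small worked example shows why p50, p99, and request count belong together. The latency samples are invented, and the nearest-rank percentile used here is one common definition, not necessarily the one Cloud Monitoring applies:

```python
def percentile(values, pct: int):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling of pct% of n
    return ordered[rank - 1]


# Hypothetical latencies (ms) for 100 requests observed in a 1-second window.
latencies_ms = [20] * 95 + [30] * 3 + [900] * 2
window_seconds = 1

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
average = sum(latencies_ms) / len(latencies_ms)
requests_per_second = len(latencies_ms) / window_seconds

print(p50, p99, average, requests_per_second)
```

Here the median is 20 ms and the average is under 40 ms, yet p99 is 900 ms, so only the tail percentile reveals the slow requests at this throughput.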

Exam trap

The trap here is that candidates often focus on a single metric (e.g., error rate or p99 latency) and overlook the need for multi-metric correlation, especially the latency-throughput trade-off, which is a core concept in monitoring ML systems under production load.

How to eliminate wrong answers

Option A is wrong because monitoring only request count and alerting on a drop below threshold ignores latency and error rate, missing critical issues like increased response times or silent failures that degrade user experience without reducing request count. Option B is wrong because setting a single alert on p99 latency and ignoring throughput neglects the trade-off between latency and throughput; high throughput can mask latency spikes, and p99 alone does not capture system capacity limits or performance under varying load. Option C is wrong because monitoring only error rate and alerting on 1% misses latency degradation and throughput drops; a system can have low error rates but high latency (e.g., due to queue buildup) or reduced throughput, both of which violate performance objectives.

636

Multi-Selectmedium

Which TWO of the following are best practices for managing data in a collaborative machine learning environment on Google Cloud?

Select 2 answers

A.Always replicate data across multiple regions to ensure low latency.

B.Implement fine-grained access control using IAM conditions.

C.Use Cloud Data Catalog to discover and annotate datasets.

D.Store all raw data in a single Cloud Storage bucket for easy access.

E.Use data versioning with tools like DVC or Dataflow to track changes.

AnswersC, E

Data Catalog aids in data governance and collaboration.

Why this answer

Option C is correct because Cloud Data Catalog provides a managed metadata management service that allows teams to discover, annotate, and manage datasets across Google Cloud. It enables data scientists to search for datasets by tags, descriptions, and schema, which is essential for collaboration and data governance in a multi-user ML environment.

Exam trap

Google Cloud often tests the misconception that 'replication equals performance' or that 'single bucket simplicity is best,' when in reality collaborative ML requires discoverability (Data Catalog) and reproducibility (versioning) over raw storage or access control alone.

637

MCQmedium

A company deploys a model on Vertex AI Prediction with autoscaling enabled. They notice that during a traffic spike, new instances take several minutes to become available, causing high latency. What is the best solution?

A.Disable autoscaling and use a fixed number of replicas

B.Increase the max replicas setting

C.Decrease the machine type to reduce provisioning time

D.Set a higher min replicas to maintain a baseline of warm instances

AnswerD

Warm instances reduce latency during spikes.

Why this answer

Option D is correct because setting a higher min replicas ensures that a baseline number of instances are always warm and ready to serve traffic. During a traffic spike, new instances still take time to provision (cold start), but the warm instances handle the initial surge without latency spikes. This directly addresses the observed high latency during spikes.
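
The effect of a warm baseline can be estimated with rough arithmetic. All numbers below (spike size, per-replica capacity, cold-start time) are hypothetical:

```python
def overflow_during_cold_start(spike_rps, per_replica_rps, warm_replicas, cold_start_s):
    """Requests that exceed warm capacity while new replicas are still starting."""
    warm_capacity = warm_replicas * per_replica_rps
    overflow_rps = max(0, spike_rps - warm_capacity)
    return overflow_rps * cold_start_s


# Hypothetical spike: 400 requests/s, 100 requests/s per replica, 180 s cold start.
with_one_warm = overflow_during_cold_start(400, 100, warm_replicas=1, cold_start_s=180)
with_four_warm = overflow_during_cold_start(400, 100, warm_replicas=4, cold_start_s=180)
print(with_one_warm, with_four_warm)  # 54000 0
```

The trade-off is cost: the extra warm replicas are billed even when traffic is low.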

Exam trap

Google Cloud often tests the misconception that increasing max replicas or decreasing machine type solves cold-start latency, when the real solution is maintaining a warm baseline via min replicas.

How to eliminate wrong answers

Option A is wrong because disabling autoscaling and using a fixed number of replicas eliminates elasticity, leading to either over-provisioning (cost) or under-provisioning (latency) during variable traffic. Option B is wrong because increasing max replicas only raises the ceiling for scaling out; it does not reduce the cold-start provisioning time for new instances during a spike. Option C is wrong because decreasing the machine type reduces compute capacity per instance, which can increase latency under load, and does not meaningfully reduce provisioning time (which is dominated by container image pull and model loading, not machine type).

638

Drag & Dropmedium

Drag and drop the steps to create and deploy a custom ML model on Vertex AI using a container in the correct order.

Why this order

First, build and push the container, then register the model, deploy to an endpoint, and finally test.

639

MCQmedium

An organization wants to use Vertex AI JumpStart to fine-tune a foundation model for a custom classification task. They have a labeled dataset stored in BigQuery. Which steps should they take?

A.Export the data from BigQuery to CSV files in Cloud Storage, then upload to JumpStart.

B.Write a custom training script using PyTorch and submit to Vertex AI Training.

C.Use the Vertex AI console to select a model from JumpStart, specify the BigQuery table as the source, and launch fine-tuning.

D.Use Vertex AI Model Garden to deploy the model directly without fine-tuning.

AnswerC

JumpStart integrates with BigQuery for data input.

Why this answer

JumpStart fine-tuning typically involves selecting a model, connecting to your dataset (often via BigQuery), configuring training, and launching. The process is largely GUI-based.

640

MCQhard

Your team has deployed a PyTorch model using a custom container on Vertex AI Prediction. The model uses dynamic batching to combine incoming requests. You notice that the average latency is 150 ms, but the 99th percentile latency is 2 seconds. Cloud Monitoring shows that the CPU is idle much of the time, and GPU utilization is around 70%. The model is deployed on a single n1-standard-4 with a T4 GPU. You suspect the issue is related to request queuing. Which change would most effectively reduce tail latency?

A.Add a second replica to share the load.

B.Increase the batch timeout to allow larger batches to form, reducing the number of batches.

C.Decrease the batch size to reduce processing time per batch.

D.Implement a priority queue to handle high-priority requests first.

AnswerA

More replicas reduce queue depth and tail latency.

Why this answer

Option A is correct because adding a replica reduces the queue length per replica, which shortens the time requests spend waiting. Option B might increase tail latency if the batch timeout is too long, because requests wait longer for a batch to fill. Option C could reduce processing time per batch but does not address queuing delay.

Option D adds complexity and doesn't address root cause.
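
The queuing effect can be illustrated with a small simulation. It assumes Poisson arrivals, a fixed service time, and a shared first-in-first-out queue, and it ignores dynamic batching, so it is a simplified model rather than a reproduction of the deployment in the question:

```python
import heapq
import random


def simulate_p99_wait(num_replicas, arrival_rate=9.0, service_s=0.1,
                      n_requests=5000, seed=0):
    """Return the 99th percentile queue wait (seconds) for a shared FIFO queue."""
    rng = random.Random(seed)
    free_at = [0.0] * num_replicas  # time at which each replica becomes free
    heapq.heapify(free_at)
    now, waits = 0.0, []
    for _ in range(n_requests):
        now += rng.expovariate(arrival_rate)  # next arrival time
        earliest_free = heapq.heappop(free_at)
        start = max(now, earliest_free)       # wait if every replica is busy
        waits.append(start - now)
        heapq.heappush(free_at, start + service_s)
    waits.sort()
    return waits[int(0.99 * len(waits)) - 1]


one_replica = simulate_p99_wait(num_replicas=1)
two_replicas = simulate_p99_wait(num_replicas=2)
print(f"p99 wait: 1 replica = {one_replica:.3f}s, 2 replicas = {two_replicas:.3f}s")
```

With one replica running at about 90% utilization, waits pile up in the tail; the second replica halves utilization and the tail wait drops sharply.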

641

MCQmedium

A company uses Vertex AI Pipelines for model training and deployment. The pipeline includes a model evaluation step that produces metrics. If the metrics are below a threshold, the pipeline should fail and not deploy. Which component should they use?

A.Use a conditional operator in the pipeline to skip or fail based on metrics.

B.A Python component that uses the SDK to raise an exception if metrics are low.

C.A Vertex AI Model Evaluation component configured with a threshold.

D.Use Cloud Monitoring to trigger an alert and manually stop deployment.

E.A custom container that returns a non-zero exit code on failure.

AnswerA

Conditionals are the standard way to control pipeline flow based on data.

Why this answer

Option A is correct because Vertex AI Pipelines supports conditional execution through the Kubeflow Pipelines SDK, for example with `dsl.Condition` (or `dsl.If`/`dsl.Else` in newer SDK versions). This lets you compare model metrics (e.g., accuracy, AUC) against a defined threshold and run the deployment step only when the threshold is met, so a model that falls short is never deployed. This is the native, declarative way to control pipeline flow based on evaluation results without relying on external services or manual intervention.
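
The decision logic behind such a gate can be sketched in plain Python. In an actual pipeline the branch would be expressed with the Kubeflow Pipelines SDK's conditional construct (for example `dsl.Condition`); the metric names and thresholds here are hypothetical:

```python
def should_deploy(metrics: dict, thresholds: dict) -> bool:
    """True only if every required metric is present and meets its minimum value."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


thresholds = {"accuracy": 0.90, "auc": 0.85}  # hypothetical minimums

good_run = {"accuracy": 0.93, "auc": 0.91}
weak_run = {"accuracy": 0.93, "auc": 0.80}

print(should_deploy(good_run, thresholds))  # True
print(should_deploy(weak_run, thresholds))  # False
```

Treating a missing metric as a failure keeps an incomplete evaluation from slipping through the gate.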

Exam trap

The trap here is that candidates confuse raising an exception in a component (Option B) with pipeline-level conditional failure, not realizing that exceptions may not propagate correctly in a distributed pipeline and that Vertex AI Pipelines provides explicit conditional operators for this exact purpose.

How to eliminate wrong answers

Option B is wrong because raising an exception inside a Python component using the SDK does not cleanly fail the pipeline in a controlled, observable manner; it may cause the component to retry or hang depending on the pipeline's error handling configuration, and it bypasses the pipeline's built-in conditional logic. Option C is wrong because Vertex AI Model Evaluation component does not have a configurable threshold that automatically fails the pipeline; it only produces evaluation metrics, and the threshold logic must be implemented separately (e.g., via a conditional). Option D is wrong because Cloud Monitoring alerts are for observability and manual intervention, not for programmatically failing a pipeline; this approach introduces latency and human error, and does not integrate with Vertex AI Pipelines' native failure mechanisms.

Option E is wrong because a custom container returning a non-zero exit code will cause the pipeline step to fail, but it does not provide a way to conditionally fail based on metrics without additional logic inside the container; moreover, it is less maintainable and less transparent than using a built-in conditional operator.

642

MCQmedium

A team wants to implement continuous training for their ML model. The pipeline should be triggered when new training data arrives in a Cloud Storage bucket. Which combination of services should they use?

A.Cloud Storage → BigQuery → Vertex AI Pipelines

B.Cloud Storage → Cloud Functions → Vertex AI Pipelines

C.Cloud Storage → Cloud Build → Vertex AI Pipelines

D.Cloud Storage → Cloud Scheduler → Vertex AI Pipelines

AnswerB

Cloud Storage can send notifications to Cloud Functions (via Pub/Sub) which then starts the pipeline.

Why this answer

Event-driven triggers for Vertex AI pipelines are typically implemented using Cloud Storage notifications to Pub/Sub, which triggers a Cloud Function that creates a pipeline job. Cloud Build is for CI/CD, not for event-driven data triggers.
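
The function's logic can be sketched as follows. The `bucket` and `name` fields match the Cloud Storage event payload, while the template path, the parameter name, and the injected `submit` callable are assumptions for illustration. In production, `submit` might wrap `aiplatform.PipelineJob(...).submit()`:

```python
TEMPLATE_PATH = "gs://my-bucket/pipelines/train.json"  # hypothetical compiled pipeline


def build_pipeline_request(event: dict, template_path: str) -> dict:
    """Turn a Cloud Storage 'object finalized' event into pipeline job arguments."""
    data_uri = f"gs://{event['bucket']}/{event['name']}"
    return {
        "template_path": template_path,
        "parameter_values": {"training_data_uri": data_uri},  # hypothetical parameter
    }


def on_new_data(event: dict, submit) -> None:
    """Entry point sketch: start a pipeline run for each new CSV file."""
    if not event["name"].endswith(".csv"):  # hypothetical filter
        return
    submit(build_pipeline_request(event, TEMPLATE_PATH))


# Local usage example with a fake submit function that records the request.
submitted = []
on_new_data({"bucket": "training-data", "name": "2024/batch.csv"}, submitted.append)
on_new_data({"bucket": "training-data", "name": "README.txt"}, submitted.append)
print(submitted)
```

Injecting `submit` keeps the event-handling logic testable without calling Vertex AI.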

643

MCQeasy

A company wants to monitor fairness of a model by evaluating performance metrics across demographic subgroups. They have ground truth labels stored in BigQuery. Which Vertex AI service should they use?

A.Vertex AI Model Monitoring

B.Vertex AI Prediction

C.Vertex AI Explainability

D.Vertex AI Model Evaluation

AnswerD

Model Evaluation's sliced evaluation is designed for fairness assessment.

Why this answer

Vertex AI Model Evaluation supports sliced evaluation, allowing you to compute metrics per subgroup (e.g., by demographic) using data in BigQuery.
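
What a sliced metric computes can be shown on a handful of hypothetical rows. This is a plain-Python illustration of per-subgroup accuracy, not the Model Evaluation service itself:

```python
from collections import defaultdict


def accuracy_by_slice(rows, slice_key):
    """Compute accuracy separately for each value of the slicing column."""
    correct, total = defaultdict(int), defaultdict(int)
    for row in rows:
        group = row[slice_key]
        total[group] += 1
        correct[group] += int(row["label"] == row["prediction"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical evaluation rows with ground truth and predictions.
rows = [
    {"group": "A", "label": 1, "prediction": 1},
    {"group": "A", "label": 0, "prediction": 0},
    {"group": "A", "label": 1, "prediction": 0},
    {"group": "A", "label": 1, "prediction": 1},
    {"group": "B", "label": 1, "prediction": 0},
    {"group": "B", "label": 0, "prediction": 0},
    {"group": "B", "label": 0, "prediction": 1},
    {"group": "B", "label": 1, "prediction": 1},
]

print(accuracy_by_slice(rows, "group"))  # {'A': 0.75, 'B': 0.5}
```

The overall accuracy of 0.625 would hide the gap between the two groups, which is why fairness checks look at each slice.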

644

MCQmedium

A team uses Vertex AI Workbench notebooks for collaborative model development. They want to ensure that code changes are version-controlled, that multiple data scientists can work on the same notebook without conflicts, and that the environment is reproducible across team members. Which approach should they take?

A.Use a shared JupyterLab instance launched on a single VM; data scientists connect simultaneously.

B.Use Vertex AI Workbench managed notebooks with Git integration and a custom container image for environment reproducibility.

C.Store notebooks in Cloud Storage and share the bucket; each user edits their own copy.

D.Use Vertex AI Pipelines to run all code as pipelines; data scientists only view results in notebooks.

AnswerB

Git integration provides version control; custom containers ensure consistent environments across team members.

Why this answer

Vertex AI Workbench supports Git integration for version control, so multiple data scientists can clone the same repository, work in isolation, and merge their changes instead of editing one shared notebook file. Running managed notebooks from a custom container image (Docker) gives every team member the same libraries and versions, which makes the environment reproducible.

User-managed notebooks also support Git and custom containers; what matters for this question is combining Git-based version control with a shared container image.

645

MCQmedium

An ML team wants to implement data versioning for large datasets stored in Google Cloud Storage. They need to track changes over time and reproduce previous data states. Which tool is most appropriate?

A.Cloud Storage Object Versioning

B.BigQuery table snapshots

C.Git LFS

D.DVC

AnswerD

DVC provides data versioning and pipeline reproducibility.

Why this answer

DVC (Data Version Control) is designed specifically for versioning large datasets and ML models, working with cloud storage.
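
The core idea, identifying each data state by a hash of its content, can be sketched as below. This is an illustration of content addressing only: the in-memory store and the choice of SHA-256 are assumptions for the example, not DVC's implementation:

```python
import hashlib

store = {}  # version id -> data; stand-in for objects kept in remote storage


def commit(data: bytes) -> str:
    """Save a data state under its content hash and return the version id."""
    version = hashlib.sha256(data).hexdigest()
    store[version] = data
    return version


def checkout(version: str) -> bytes:
    """Reproduce a previous data state from its version id."""
    return store[version]


v1 = commit(b"id,amount\n1,10\n")
v2 = commit(b"id,amount\n1,10\n2,25\n")  # dataset after new rows arrive

print(v1 != v2)      # True: changed content yields a new version id
print(checkout(v1))  # the original data state is still reproducible
```

Only the short version id needs to live alongside the code in Git, while the large data files stay in storage such as a Cloud Storage bucket.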

646

MCQhard

Refer to the exhibit. What is being configured?

A.A model training pipeline

B.A batch prediction job

C.An endpoint with autoscaling based on request count

D.An endpoint with autoscaling based on CPU utilization

AnswerC

The autoscaling metric is 'prediction/online/requests'.

Why this answer

The exhibit configures an online prediction endpoint whose autoscaling metric is 'prediction/online/requests'. Scaling on this metric adjusts the number of serving replicas according to request volume rather than resource utilization, so Option C is correct.

Exam trap

Google Cloud often tests the distinction between request-count-based and CPU-based autoscaling; the trap here is that candidates see 'autoscaling' and assume CPU utilization is the metric, but the exhibit explicitly shows a request-based metric, making Option D a distractor for those who do not read the configuration details carefully.

How to eliminate wrong answers

Option A is wrong because a model training pipeline involves steps like data preprocessing, training, and evaluation, not endpoint scaling policies or replica count settings. Option B is wrong because a batch prediction job runs asynchronously over a stored dataset and does not use a persistent endpoint with autoscaling on request metrics. Option D is wrong because the exhibit shows a request-based metric as the autoscaling target, not CPU utilization.

Full explanation →

647

Multi-Selecteasy

Which THREE are valid uses of Vertex AI Metadata?

Select 3 answers

A.Record the execution of a pipeline step

B.Track which dataset was used to train a model

C.Query upstream sources of a model

D.Deploy a model to an endpoint

E.Store hyperparameter values of an experiment run

AnswersA, B, C

Executions are tracked in Metadata.

Why this answer

Vertex AI Metadata tracks lineage of artefacts, executions, and contexts across pipelines.

Full explanation →

648

MCQhard

The exhibit shows a Cloud Composer environment variable configuration. An ML pipeline DAG fails with an authentication error when trying to access Vertex AI. What is the most likely cause?

A.The Airflow worker does not have the proper scopes to access Vertex AI

B.The service account key in the environment variable is expired

C.The DAG file is missing a required Python library

D.The Cloud Composer environment is in a different project than Vertex AI

AnswerA

The environment variable 'GOOGLE_APPLICATION_CREDENTIALS' is set to a service account key path, but the worker VM may not have the necessary scopes.

Why this answer

The authentication error when accessing Vertex AI from Cloud Composer most likely occurs because the Airflow worker's service account lacks the necessary OAuth scopes or IAM permissions. Cloud Composer uses a worker service account to execute tasks; if this account does not have the `https://www.googleapis.com/auth/cloud-platform` scope or the `aiplatform.user` role, the Airflow worker cannot authenticate to Vertex AI APIs, resulting in a 403 or 401 error.

Exam trap

Google Cloud often tests the distinction between authentication (scopes/identity) and authorization (IAM roles), so candidates mistakenly blame cross-project configuration or missing libraries when the root cause is the worker's service account lacking the required OAuth scopes.

How to eliminate wrong answers

Option B is wrong because expired service account keys would cause a different error (e.g., 'invalid_grant' or 'expired key'), not a generic authentication error, and Cloud Composer typically uses a service account attached to the environment, not a key stored in an environment variable. Option C is wrong because a missing Python library would raise an ImportError or ModuleNotFoundError, not an authentication error. Option D is wrong because Cloud Composer and Vertex AI can be in different projects as long as the service account has cross-project IAM permissions; the error would be a permission denied, not an authentication failure.

Full explanation →

649

MCQmedium

A team has a prototype image classification model trained on a small dataset using TensorFlow Keras on a single GPU. They need to train on a larger dataset (1 million images) using a distributed strategy on Vertex AI with 8 GPUs. They implement a MirroredStrategy for data parallelism. During the first few epochs, the training speed does not improve significantly compared to a single GPU, and GPU utilization is low. The data is stored as JPEG files in Cloud Storage, and the input pipeline uses tf.data with map to decode images. What is the most likely cause?

A.The batch size per GPU is too large.

B.The MirroredStrategy is not properly configured.

C.The data loading from Cloud Storage is a bottleneck.

D.The model is too small for distributed training.

AnswerC

I/O bottleneck starves GPUs, causing low utilization.

Why this answer

Option C is correct because reading and decoding JPEG images from Cloud Storage can make the input pipeline I/O-bound, so the GPUs sit idle waiting for data. Option A is wrong because a large per-GPU batch size could cause memory issues but not low utilization. Option B is wrong because nothing in the scenario points to a MirroredStrategy misconfiguration; low GPU utilization points to the input pipeline instead.

Option D is wrong because even if the model is small, distributed training should still improve throughput if the pipeline is not bottlenecked.
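To make the bottleneck concrete, here is a rough back-of-the-envelope sketch of how input loading time limits GPU utilization. The timings are hypothetical, and the model is deliberately simple: `parallel_loaders` stands in for parallel decoding (for example `num_parallel_calls` on a `tf.data` `map`), and `prefetch` stands in for overlapping loading with compute (for example `prefetch` in `tf.data`).

```python
def step_time(load_s, compute_s, parallel_loaders=1, prefetch=False):
    """Seconds per training step under a simplified pipeline model."""
    effective_load = load_s / parallel_loaders
    if prefetch:
        # Loading the next batch overlaps with computing the current one.
        return max(effective_load, compute_s)
    # Without overlap, the GPU waits for each batch to load first.
    return effective_load + compute_s


def gpu_utilization(load_s, compute_s, parallel_loaders=1, prefetch=False):
    """Fraction of each step the GPU spends computing."""
    return compute_s / step_time(load_s, compute_s, parallel_loaders, prefetch)


# Hypothetical timings: 0.30 s to read and decode a batch, 0.10 s of GPU compute.
LOAD_S, COMPUTE_S = 0.30, 0.10

naive = gpu_utilization(LOAD_S, COMPUTE_S)
tuned = gpu_utilization(LOAD_S, COMPUTE_S, parallel_loaders=4, prefetch=True)

print(f"naive pipeline utilization: {naive:.2f}")
print(f"parallel + prefetch utilization: {tuned:.2f}")
```

Under these assumed numbers, adding more GPUs does nothing for the naive pipeline, because the step time is dominated by loading rather than compute.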

Full explanation →

650

MCQhard

A company uses Vertex AI Predictions with a custom container that invokes an external API for feature enrichment. The prediction response time is highly variable. The engineer wants to monitor the external API's contribution to latency. What should the engineer do?

A.Instrument the prediction container to emit custom metrics for the time spent in each prediction step, including the external API call.

B.Add a timeout setting to the endpoint's request to limit the external API call duration.

C.Monitor the Vertex AI endpoint latency metric and correlate with system metrics like CPU and memory.

D.Use Cloud Trace to trace the prediction request end-to-end, including the external API call.

AnswerA

Custom metrics provide granular breakdown.

Why this answer

Option A is correct because instrumenting the custom container to emit custom metrics (e.g., using OpenTelemetry or a Prometheus client library) allows the engineer to directly measure the time spent in each prediction step, isolating the external API call's contribution to latency. This provides granular, real-time visibility into the specific bottleneck, which is essential when the response time is highly variable and the external API is a known dependency.
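As a minimal sketch of what such instrumentation could look like inside the container, the snippet below times each prediction step separately using only the standard library. The functions `call_enrichment_api` and `run_model` are hypothetical stand-ins, and exporting the recorded durations to a metrics backend is left out.

```python
import time
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock time per named step, in milliseconds."""

    def __init__(self):
        self.durations_ms = {}

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms


def call_enrichment_api(instance):
    # Hypothetical stand-in for the external feature-enrichment call.
    time.sleep(0.01)
    return {**instance, "enriched": True}


def run_model(features):
    # Hypothetical stand-in for model inference.
    return {"score": 0.5}


def predict(instance, timer):
    with timer.step("external_api"):
        features = call_enrichment_api(instance)
    with timer.step("model_inference"):
        prediction = run_model(features)
    return prediction


timer = StepTimer()
result = predict({"id": 1}, timer)
print(timer.durations_ms)
```

Reporting the per-step durations as separate metrics is what lets the external call's share of latency be read off directly, instead of inferred from the aggregate endpoint latency.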

Exam trap

Google Cloud often tests the distinction between monitoring (custom metrics) and tracing (Cloud Trace) — the trap here is that candidates assume Cloud Trace automatically captures all downstream calls, but it requires explicit instrumentation of the external API call to record its duration, whereas custom metrics can be emitted directly from the container code without needing distributed tracing context.

How to eliminate wrong answers

Option B is wrong because adding a timeout setting to the endpoint's request limits the duration of the external API call but does not provide any monitoring data; it only caps the latency, potentially causing failures without diagnosing the root cause. Option C is wrong because monitoring the Vertex AI endpoint latency metric and correlating with CPU/memory only gives aggregate performance data, not the specific contribution of the external API call, making it impossible to isolate the external API's impact. Option D is wrong because Cloud Trace can trace the request end-to-end, but it requires the custom container to be instrumented with trace context propagation; without explicit instrumentation of the external API call, Cloud Trace will not capture the time spent in that external call, leaving the same gap in visibility.

Full explanation →

651

MCQmedium

You are developing a Vertex AI pipeline that runs multiple parallel training jobs with different hyperparameters, then collects their results and selects the best model. Which KFP SDK v2 construct should you use to run the parallel training tasks?

A.dsl.Metrics

B.dsl.If

C.dsl.ParallelFor

D.dsl.Collected

AnswerC

dsl.ParallelFor creates a parallel for-loop, executing tasks in parallel for each item in a list.

Why this answer

In KFP SDK v2, `dsl.ParallelFor` iterates over a list and runs the tasks in its body in parallel, one per item, which is what launching training jobs with different hyperparameters requires. `dsl.Collected` complements it: it gathers the outputs of the tasks inside the loop into a single list that can be passed to a downstream task, such as a model selector.

Exam trap

Google Cloud often tests the distinction between `dsl.ParallelFor` (which controls the parallel execution) and `dsl.Collected` (which gathers the outputs), leading candidates to mistakenly choose `dsl.Collected` when the question asks how to run the parallel tasks.

How to eliminate wrong answers

Option A is wrong because `dsl.Metrics` is a KFP SDK v2 class for defining and reporting evaluation metrics (e.g., accuracy, loss) from a component, not for running tasks in parallel. Option B is wrong because `dsl.If` is a conditional construct used to execute tasks based on a condition, not for parallel execution. Option D is wrong because `dsl.Collected` only gathers the outputs of tasks that run inside a `dsl.ParallelFor` loop so that a downstream task can consume them; it does not launch the parallel tasks itself.
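The two roles discussed here, fanning work out over a list and gathering the outputs for a downstream step, can be illustrated in plain Python. This is only an analogy built on `concurrent.futures`, not KFP code; the `train` and `select_best` functions and the toy scoring rule are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def train(learning_rate):
    # Hypothetical toy "training": the score peaks at learning_rate = 0.1.
    score = 1.0 - abs(learning_rate - 0.1)
    return {"learning_rate": learning_rate, "score": score}


def select_best(results):
    # Downstream step that consumes the gathered outputs.
    return max(results, key=lambda r: r["score"])


learning_rates = [0.001, 0.01, 0.1, 1.0]

# Fan-out: one task per list item, run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    # Fan-in: the per-task outputs are gathered into a single list.
    results = list(pool.map(train, learning_rates))

best = select_best(results)
print(best)
```

In a pipeline, the fan-out corresponds to the parallel loop and the gathered list corresponds to the collected outputs handed to the selection step.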

Full explanation →

652

MCQmedium

A company wants to build a product recommendation engine for their e-commerce website. They have historical purchase data and user interaction logs. They want a managed service that can quickly generate personalized recommendations without building custom models. Which service should they use?

A.Dataflow with TensorFlow

B.BigQuery ML with MATRIX_FACTORIZATION

C.AutoML Tables

D.Recommendations AI

AnswerD

Recommendations AI provides pre-built models for personalized recommendations, ideal for e-commerce.

Why this answer

Recommendations AI is a managed service specifically for retail recommendation use cases. It offers pre-built models like 'recommended-for-you' and 'frequently-bought-together'. BigQuery ML would require custom model building, and AutoML Tables is for general tabular data, not specifically for recommendations.

Full explanation →

653

MCQeasy

You need to serve a model on an edge device with low latency and offline capability. Which approach should you use?

A.Export the model to TensorFlow Lite and use Vertex AI Edge Manager for deployment.

B.Use Cloud Run for on-device inference.

C.Deploy the model to a Vertex AI endpoint and rely on mobile connectivity.

D.Use AI Platform Prediction (not Vertex AI).

AnswerA

Correct. Edge Manager handles deployment to devices with TFLite or ONNX models.

Why this answer

TensorFlow Lite is specifically designed for on-device inference with low latency and offline capability, converting models into a lightweight format optimized for edge hardware. Vertex AI Edge Manager extends this by providing deployment, monitoring, and management of models on edge devices, ensuring they run efficiently without constant cloud connectivity.

Exam trap

Google Cloud often tests the distinction between cloud-based inference services (like Vertex AI endpoints or Cloud Run) and edge-optimized solutions (like TensorFlow Lite with Edge Manager), trapping candidates who assume any Google Cloud ML service can be deployed offline.

How to eliminate wrong answers

Option B is wrong because Cloud Run is a serverless compute platform for containerized applications in the cloud, not designed for on-device inference; it requires network connectivity and cannot operate offline. Option C is wrong because deploying to a Vertex AI endpoint relies on mobile connectivity for inference requests, introducing latency and failing when offline, contradicting the low-latency and offline requirements. Option D is wrong because AI Platform Prediction (the predecessor to Vertex AI) is a cloud-based prediction service that also requires network connectivity and is not optimized for edge deployment or offline operation.

Full explanation →

654

MCQmedium

A financial services company wants to detect fraudulent transactions in real-time. They have a trained XGBoost model that runs on a single Compute Engine instance. The current solution processes about 100 transactions per second, but they need to scale to 10,000 transactions per second. Which approach should they take?

A.Increase the VM to a machine type with more vCPUs and memory

B.Deploy the model to Vertex AI Prediction with autoscaling enabled

C.Use Dataflow to process transactions in micro-batches every second

D.Rewrite the model as a Cloud Function triggered by Pub/Sub messages

AnswerB

Vertex AI Prediction automatically scales based on traffic.

Why this answer

Vertex AI Prediction with autoscaling is the correct choice because it is purpose-built for serving ML models at scale, automatically adjusting the number of compute nodes based on incoming request traffic. This allows the company to seamlessly handle the increase from 100 to 10,000 transactions per second without manual intervention, while XGBoost is natively supported as a framework.
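A quick capacity estimate shows why horizontal scaling is needed. The sketch below assumes, purely for illustration, that each serving replica sustains the same 100 transactions per second as the current single instance; real per-replica throughput would have to be measured.

```python
import math


def replicas_needed(target_tps, per_replica_tps):
    """Minimum number of replicas to sustain the target throughput."""
    return math.ceil(target_tps / per_replica_tps)


# Assumption: one replica handles 100 TPS, like the current single VM.
current_tps = 100
target_tps = 10_000

print(replicas_needed(target_tps, current_tps))
```

An estimate like this is a starting point for choosing the maximum replica count; the autoscaler then moves between the minimum and maximum as traffic changes.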

Exam trap

Google Cloud often tests the misconception that vertical scaling (bigger VM) is sufficient for large throughput increases, when in reality horizontal scaling with a managed service like Vertex AI is required for elasticity and high availability.

How to eliminate wrong answers

Option A is wrong because simply scaling up a single VM (vertical scaling) has hardware limits and cannot reliably handle a 100x increase in throughput; it also introduces a single point of failure and lacks automatic scaling. Option C is wrong because Dataflow is designed for batch and stream processing pipelines, not for low-latency real-time model serving; processing in micro-batches every second would add unacceptable latency for fraud detection. Option D is wrong because Cloud Functions have a maximum timeout of 9 minutes and are not designed for sustained high-throughput inference workloads; they are better suited for lightweight, event-driven tasks, not for serving a complex XGBoost model at 10,000 TPS.

Full explanation →

655

Multi-Selecteasy

Which TWO actions are best practices when scaling a prototype ML model to production in Google Cloud?

Select 2 answers

A.Store and manage features in a feature store like Vertex AI Feature Store.

B.Test the model only on a small sample of the production data to save costs.

C.Set up monitoring and logging for model performance and data drift.

D.Manually scale inference instances based on historical traffic patterns.

E.Use one-hot encoding for all categorical features without considering cardinality.

AnswersA, C

Feature store ensures consistency and reuse across models.

Why this answer

Vertex AI Feature Store centralizes feature management, ensuring consistency between training and serving. This eliminates training-serving skew by providing a single source of truth for features, which is critical when scaling from prototype to production.

Exam trap

Google Cloud often tests the misconception that cost-saving shortcuts like limited testing or manual scaling are acceptable in production, when in fact reliability and monitoring are non-negotiable for ML systems at scale.

Full explanation →

656

MCQmedium

A team is building a fraud detection model that requires joining real-time transaction features with historical user features. They need to ensure that the training data does not use future information (data leakage). Which Vertex AI Feature Store capability should they use?

A.Online store serving with Bigtable

B.Feature store time travel

C.Point-in-time correct join

D.Feature monitoring for drift

AnswerC

This ensures historical consistency between features and labels.

Why this answer

Point-in-time correct join ensures that at training time, each row is joined with the feature value as it existed at that specific point in time, preventing leakage from future feature values.
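The idea can be sketched in a few lines of plain Python: for each labeled event, look up the most recent feature value whose timestamp is not later than the event. The feature name, timestamps, and values below are hypothetical, and this is an illustration of the join logic rather than the Feature Store API.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_history, as_of):
    """Return the latest feature value with timestamp <= as_of, else None.

    feature_history is a list of (timestamp, value) sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect_right(timestamps, as_of)
    return feature_history[idx - 1][1] if idx > 0 else None


# Hypothetical history of a "transaction count" feature per user.
history = {"user_1": [(10, 2), (20, 5), (30, 9)]}

# Labeled events: (user, event timestamp, label).
labels = [("user_1", 15, 0), ("user_1", 25, 1), ("user_1", 5, 0)]

training_rows = [
    (user, ts, point_in_time_lookup(history[user], ts), label)
    for user, ts, label in labels
]
print(training_rows)
```

The event at timestamp 15 sees the value written at 10, not the later values at 20 or 30, which is exactly the leakage the join is meant to prevent.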

Full explanation →

657

Multi-Selecthard

A team uses Vertex AI Pipelines and wants to track lineage of artifacts and executions. Which three resources should they use? (Choose three.)

Select 3 answers

A.Artifacts

B.Vertex AI Experiments

C.Vertex AI Metadata

D.Model Registry

E.Executions

AnswersA, C, E

Artifacts represent data or model outputs in the lineage graph.

Full explanation →

658

MCQmedium

An ML engineer is using Vertex AI Vizier to tune hyperparameters for a PyTorch model. They want to maximise the chance of finding the global optimum within a fixed trial budget of 50 trials. Which algorithm should they select?

A.Random search

B.Bayesian optimisation

C.Grid search

D.Evolutionary algorithm

AnswerB

Uses probabilistic model to focus on promising regions; best for limited budget.

Why this answer

Bayesian optimisation (option B) is the correct choice because it builds a probabilistic surrogate model of the objective function and uses an acquisition function to balance exploration and exploitation, making it highly sample-efficient. With only 50 trials, Bayesian optimisation maximises the probability of finding the global optimum by focusing trials on the most promising hyperparameter regions, unlike random or grid search which waste trials on unpromising areas.

Exam trap

The trap here is that candidates often choose random search (option A) because they recall it is better than grid search for high-dimensional spaces, but they overlook that Bayesian optimisation is strictly more sample-efficient and is the default recommendation in Vertex AI Vizier for maximising global optimum discovery under a fixed trial budget.

How to eliminate wrong answers

Option A is wrong because random search, while better than grid search in high-dimensional spaces, does not use past trial results to guide future trials, so it wastes trials on suboptimal regions and has a lower probability of finding the global optimum within a fixed budget of 50 trials. Option C is wrong because grid search exhaustively evaluates a fixed set of points, which scales exponentially with the number of hyperparameters and is extremely inefficient for more than a few parameters, often missing the global optimum entirely within a limited budget. Option D is wrong because evolutionary algorithms (e.g., genetic algorithms) are population-based and require many generations to converge, typically needing hundreds or thousands of trials to be effective, making them impractical for a tight budget of 50 trials.

Full explanation →

659

MCQhard

A retail company uses Vertex AI Tabular (AutoML Tables) to build a customer churn prediction model. The training dataset contains 50,000 rows and 30 features, with a 5% churn rate. The model achieves an AUC of 0.85 on the test set. When deployed for online predictions, the average latency is 800ms, while the business requirement is under 200ms. The engineer has already reduced the feature set to 10 features, but latency only dropped to 600ms. The model size is 2GB. The endpoint is in us-central1 using an n1-standard-4 machine with minReplicaCount=1. What should the engineer do to meet the latency requirement?

A.Move the endpoint to a region geographically closer to the majority of customers.

B.Use a larger machine type (e.g., n1-highmem-8) for the endpoint.

C.Convert the model to a custom TensorFlow Lite model and deploy it.

D.Enable model compression in Vertex AI Tabular.

AnswerD

Model compression reduces model size and inference latency, which directly addresses the issue.

Why this answer

Vertex AI Tabular (AutoML Tables) supports model compression, which reduces model size and inference latency by applying techniques like quantization and pruning. Since the model is 2GB and latency is 600ms (still above the 200ms target), enabling compression can shrink the model significantly, often cutting latency by 2-3x, directly meeting the requirement without changing infrastructure or converting to a different framework.

Exam trap

Google Cloud often tests the misconception that latency issues are always solved by scaling up hardware or changing regions, but the real bottleneck here is model size and inference complexity, which Vertex AI Tabular's built-in compression directly addresses without requiring framework conversion or infrastructure changes.

How to eliminate wrong answers

Option A is wrong because moving the endpoint geographically reduces network round-trip time, but the 600ms latency is dominated by model inference time on the server, not network latency; the business requirement is under 200ms total, and network latency is typically <50ms within a region. Option B is wrong because using a larger machine type (e.g., n1-highmem-8) increases CPU/memory resources, but AutoML Tables models are often CPU-bound and the bottleneck is model size and complexity, not compute capacity; a larger machine may only shave off a small fraction of latency and is cost-inefficient. Option C is wrong because Vertex AI Tabular models are not TensorFlow-based; they are ensemble models (e.g., gradient-boosted trees, neural networks) that cannot be directly converted to TensorFlow Lite, which is designed for TensorFlow/Keras models, not AutoML Tables output.

Full explanation →

660

Multi-Selecteasy

A company is deploying a computer vision model on edge devices using TensorFlow Lite. They want to reduce model size without significant accuracy loss. Which TWO model compression techniques are most suitable?

Select 2 answers

A.Quantization-aware training (QAT)

B.Weight pruning

C.Knowledge distillation

D.Post-training float16 quantization

E.Increasing model depth

AnswersB, D

Pruning removes near-zero weights, reducing model size. It can be done post-training with minimal accuracy loss.

Why this answer

Weight pruning (B) is suitable because it removes redundant connections (weights) from the neural network, reducing the model size and computational cost while often preserving accuracy if done gradually. Post-training float16 quantization (D) converts model weights from float32 to float16, halving the storage size with minimal accuracy loss, and is directly supported by TensorFlow Lite for edge deployment.
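The storage arithmetic behind float16 quantization can be checked with the standard library alone. The snippet below is not the TensorFlow Lite converter; it only uses Python's `struct` half-precision format (`e`) on a few made-up weights to show the halved size and the small rounding error.

```python
import struct

# Hypothetical weights, for illustration only.
weights = [0.5, -1.25, 3.0, 0.1]
n = len(weights)

fp32_bytes = struct.pack(f"<{n}f", *weights)  # 4 bytes per weight
fp16_bytes = struct.pack(f"<{n}e", *weights)  # 2 bytes per weight

# Round-trip through float16 to see how much precision is lost.
restored = struct.unpack(f"<{n}e", fp16_bytes)
max_abs_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(fp32_bytes), len(fp16_bytes), max_abs_error)
```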

Exam trap

Google Cloud often tests the distinction between techniques that directly reduce model size (pruning, quantization) versus those that improve accuracy or create new models (QAT, knowledge distillation), leading candidates to select QAT as a compression method when it is actually a training-time optimization.

Full explanation →

661

MCQhard

A real-time recommendation model deployed on Vertex AI Endpoints is experiencing increased latency, especially during peak hours. The model is hosted on a single machine with 4 CPUs. Which set of actions should you take to diagnose and resolve the issue?

A.Increase the machine type to one with 32 CPUs and disable autoscaling.

B.Switch the endpoint to use GPUs and enable batch requests.

C.Enable autoscaling on the endpoint and analyze request patterns to set min/max instances.

D.Change the serving framework to use TensorFlow Serving with gRPC.

AnswerC

Autoscaling handles peak load efficiently.

Why this answer

Option C is correct because enabling autoscaling on a Vertex AI Endpoint allows the deployment to dynamically adjust the number of serving instances based on real-time traffic, directly addressing peak-hour latency. Analyzing request patterns to set appropriate min/max instances ensures that the endpoint scales proactively without over-provisioning, which is the standard diagnostic and resolution approach for latency issues caused by insufficient capacity under variable load.

Exam trap

Google Cloud often tests the misconception that scaling up (vertical scaling) or changing frameworks is the first step to fix latency, when the correct approach is to first diagnose capacity constraints and then scale out horizontally with autoscaling.

How to eliminate wrong answers

Option A is wrong because simply increasing the machine type to 32 CPUs without autoscaling does not resolve peak-hour latency; it only increases static capacity, leading to over-provisioning during low traffic and still failing under sudden spikes if the single instance is overwhelmed. Option B is wrong because switching to GPUs is not a direct fix for latency caused by CPU-bound serving; GPUs benefit compute-heavy models (e.g., deep learning) but add overhead for small models, and enabling batch requests increases latency for real-time predictions as it waits to accumulate requests. Option D is wrong because changing the serving framework to TensorFlow Serving with gRPC does not address the root cause of insufficient compute capacity; it may improve throughput per instance but cannot compensate for a single machine being overloaded during peak hours.

Full explanation →

662

MCQhard

Your team is training a very large transformer model that does not fit on a single GPU. They are using Vertex AI custom training with PyTorch. Which distributed training approach should they use?

A.Data parallelism using PyTorch DistributedDataParallel (DDP)

B.Horovod with allreduce

C.Model parallelism using pipeline parallelism

D.Vertex AI distributed training with TF_CONFIG

AnswerC

Splits layers across GPUs, allowing large models.

Why this answer

When a transformer model is too large to fit on a single GPU, model parallelism (specifically pipeline parallelism) is required because it splits the model's layers across multiple devices, with each device holding a subset of the model's parameters. Data parallelism (DDP) replicates the entire model on each GPU, which fails if the model exceeds a single GPU's memory. Pipeline parallelism allows training very large models by partitioning the model into stages and passing activations and gradients sequentially between devices.
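As a minimal illustration of the partitioning step only, the sketch below splits an ordered list of layers into contiguous stages, one per device. The layer names and stage count are hypothetical, and scheduling of micro-batches between stages is not shown.

```python
def partition_layers(layer_names, num_stages):
    """Split layers into contiguous, near-equal stages (one per device)."""
    base, extra = divmod(len(layer_names), num_stages)
    stages, start = [], 0
    for stage in range(num_stages):
        # The first `extra` stages take one additional layer.
        size = base + (1 if stage < extra else 0)
        stages.append(layer_names[start:start + size])
        start += size
    return stages


# Hypothetical 10-block transformer split across 4 GPUs.
layers = [f"block_{i}" for i in range(10)]
stages = partition_layers(layers, 4)

for device, stage in enumerate(stages):
    print(f"gpu:{device} -> {stage}")
```

Each device then holds only its own stage's parameters, which is why this approach works when the full model does not fit on one GPU.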

Exam trap

Google Cloud often tests the distinction between data parallelism (which replicates the model) and model parallelism (which splits the model), and the trap here is that candidates assume any distributed training framework (like DDP or Horovod) can handle oversized models, ignoring the fundamental memory constraint that data parallelism cannot overcome.

How to eliminate wrong answers

Option A is wrong because PyTorch DistributedDataParallel (DDP) implements data parallelism, which requires the entire model to fit on each GPU; if the model is too large for one GPU, DDP cannot be used. Option B is wrong because Horovod with allreduce is also a data-parallel approach that replicates the model on every worker, suffering the same memory limitation as DDP. Option D is wrong because Vertex AI distributed training with TF_CONFIG is a configuration mechanism for TensorFlow-based distributed training (using MirroredStrategy or MultiWorkerMirroredStrategy), not a PyTorch-native approach, and it still relies on data parallelism unless combined with model parallelism; the question specifies PyTorch, making this option technically incompatible.

Full explanation →

663

MCQmedium

A company uses AutoML Tables to predict customer churn. The model's AUC is low. Which action is most likely to improve performance?

A.Use a different optimization objective

B.Add more training data

C.Increase the training budget to 10 hours

D.Remove features with low importance

AnswerB

Correct: More data generally improves model performance.

Why this answer

Adding more training data often helps improve model performance. Increasing the training budget alone may not help if data is insufficient. Removing features with low importance could hurt.

Changing the optimization objective may not directly improve AUC.

Full explanation →

664

MCQeasy

Refer to the exhibit. What does this query return?

A.The maximum latency per minute

B.The error rate per minute

C.The total number of predictions per minute

D.The average latency per minute for the model

AnswerD

The query uses 'mean' aggregator over 1-minute windows.

Why this answer

The query uses the `rate` function to calculate the per-second rate of increase of the `latency_seconds` counter, and then applies the `avg` aggregator to compute the average latency across all instances over the specified time range. The `by (model)` clause groups the result by the `model` label, so the output is the average latency per minute for each model. This is why option D is correct.

Exam trap

Google Cloud often tests the distinction between `avg` and `max` aggregators in PromQL queries, and candidates mistakenly think `rate` alone implies a maximum or total, rather than understanding that `avg` computes the mean over the rate values.

How to eliminate wrong answers

Option A is wrong because the query uses `avg` to compute the average, not `max` to find the maximum latency per minute. Option B is wrong because the query operates on `latency_seconds`, a latency metric, not on an error counter or error rate metric. Option C is wrong because the query uses `avg` to average latency values, not `sum` or `count` to total the number of predictions per minute.

Full explanation →

665

MCQmedium

An MLOps team needs to automatically retrain a model when new training data becomes available. They use Vertex AI Pipelines. What is the recommended way to trigger the pipeline?

A.Use Model Evaluation to decide

B.Set up a trigger in Vertex AI Pipelines

C.Cloud Functions triggered by Cloud Storage events

D.Cloud Scheduler on a daily basis

AnswerC

Cloud Functions can listen for object finalize events in Cloud Storage and start the pipeline.

Why this answer

Option C is correct because Vertex AI Pipelines does not natively support event-driven triggers. The recommended pattern is to use Cloud Functions, which can be triggered by Cloud Storage events (e.g., object finalize/create) when new training data is uploaded. The Cloud Function then programmatically submits the pipeline run via the Vertex AI Pipelines client library or REST API, enabling an automated retraining workflow.
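A rough sketch of the handler logic follows. It assumes the storage event carries `bucket` and `name` fields, and it takes the actual pipeline submission as an injected callable (`submit_pipeline`, hypothetical) so the example stays independent of any client library; the path prefix, file suffix, and parameter name are also made up.

```python
def should_retrain(event, prefix="training-data/", suffix=".csv"):
    """Only react to new objects that look like training data."""
    name = event.get("name", "")
    return name.startswith(prefix) and name.endswith(suffix)


def on_new_object(event, submit_pipeline):
    """Entry point logic: filter the event, then submit a pipeline run."""
    if not should_retrain(event):
        return None
    params = {"training_data_uri": f"gs://{event['bucket']}/{event['name']}"}
    # In a real function this would call the pipeline client; here it is injected.
    return submit_pipeline(params)


# Simulate two events with a list standing in for the pipeline service.
submitted = []
on_new_object({"bucket": "my-bucket", "name": "training-data/2024-01.csv"}, submitted.append)
on_new_object({"bucket": "my-bucket", "name": "logs/debug.txt"}, submitted.append)
print(submitted)
```

Filtering on the object path keeps unrelated uploads to the same bucket from starting unnecessary retraining runs.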

Exam trap

The trap here is that candidates assume Vertex AI Pipelines has a built-in trigger mechanism (Option B) because many CI/CD tools do, but Google Cloud's recommended pattern relies on external event-driven services like Cloud Functions.

How to eliminate wrong answers

Option A is wrong because Model Evaluation is a post-training assessment step, not a trigger mechanism; it cannot initiate pipeline execution. Option B is wrong because Vertex AI Pipelines itself does not provide a built-in event-driven trigger for new data; such triggers must be implemented externally, for example with Cloud Functions. Option D is wrong because Cloud Scheduler on a daily basis is a time-based trigger, not an event-driven one; it would retrain on a fixed schedule regardless of whether new data has arrived, leading to unnecessary runs or missed retraining opportunities.

Full explanation →

666

Multi-Selecteasy

Which TWO statements about Vertex AI Feature Store are correct? (Choose 2)

Select 2 answers

A.Feature Store automatically applies feature engineering transformations.

B.Feature Store can only store numerical features.

C.Feature Store can only be used with Vertex AI models.

D.Feature Store provides a centralized repository for feature data.

E.Feature Store supports both online and offline serving.

AnswersD, E

Correct: it centralizes features for reuse.

Why this answer

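Option D is correct because Vertex AI Feature Store is designed as a centralized repository that organizes, stores, and serves feature data consistently across different models and pipelines. This centralization supports feature reuse, consistency, and governance, and it prevents data silos and duplication across the ML lifecycle. Option E is correct because Feature Store serves features both online (low-latency lookups at prediction time) and offline (batch retrieval for training and batch prediction).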

Exam trap

Google Cloud often tests the misconception that Vertex AI Feature Store is tightly coupled to Vertex AI models or that it performs automatic feature engineering, when in fact it is a decoupled storage and serving layer that supports any ML framework and requires explicit feature engineering steps.

667

MCQeasy

An ML team is using Vertex AI Online Prediction and wants to receive alerts when the 99th percentile latency exceeds 500ms for more than 5 minutes. What is the best practice to set up this alert in Cloud Monitoring?

A.Create a custom metric from the prediction container that emits latency percentiles, then set an alert on that metric.

B.Use the 'aiplatform.googleapis.com/prediction/online/prediction_latencies' metric with a metric threshold condition set to 500ms and a percentile aligner of 99.

C.Use a log-based metric to parse latency from Cloud Logging and alert when the average exceeds 500ms.

D.Export prediction latency logs to BigQuery and run a scheduled query to check the 99th percentile, then trigger a Cloud Function to send an alert.

AnswerB

This directly monitors the 99th percentile latency.

Why this answer

Option B is correct because Cloud Monitoring provides a pre-built distribution metric, `aiplatform.googleapis.com/prediction/online/prediction_latencies`, which directly captures prediction latency. By applying the 99th-percentile aligner, a metric threshold condition of 500ms, and a 5-minute duration window, you can alert when the 99th percentile latency stays above 500ms for the specified duration, without needing custom instrumentation or external processing.
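
As a rough sketch of what such an alert looks like, the snippet below builds an alert policy body in the JSON shape used by the Cloud Monitoring AlertPolicy resource. The display names and the 60-second alignment period are illustrative choices, and the metric type string should be checked against the Cloud Monitoring metrics list before use.

```python
import json


def p99_latency_alert_policy(threshold_ms: float = 500, duration_s: int = 300) -> dict:
    """Build an AlertPolicy body that fires when p99 latency stays above the threshold."""
    # Metric type to verify against the Cloud Monitoring metrics list.
    metric = "aiplatform.googleapis.com/prediction/online/prediction_latencies"
    return {
        "displayName": "Vertex AI online prediction p99 latency",  # illustrative
        "combiner": "OR",
        "conditions": [
            {
                "displayName": f"p99 latency > {threshold_ms} ms",
                "conditionThreshold": {
                    "filter": (
                        f'metric.type="{metric}" AND '
                        'resource.type="aiplatform.googleapis.com/Endpoint"'
                    ),
                    # Reduce the latency distribution to its 99th percentile per minute.
                    "aggregations": [
                        {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_PERCENTILE_99",
                        }
                    ],
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold_ms,
                    # The condition must hold for this long before the alert fires.
                    "duration": f"{duration_s}s",
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(p99_latency_alert_policy(), indent=2))
```

The `duration` field is what encodes "for more than 5 minutes"; the aligner alone only selects the percentile.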

Exam trap

Google Cloud often tests the misconception that you must create custom metrics or use log-based solutions for percentile-based alerting, when in fact Cloud Monitoring's distribution metrics and percentile aligners handle this natively.

How to eliminate wrong answers

Option A is wrong because creating a custom metric from the prediction container is unnecessary and adds complexity; Vertex AI already emits the required latency metric natively, and custom metrics would require additional code and maintenance. Option C is wrong because using a log-based metric to parse latency from Cloud Logging and alerting on the average (not the 99th percentile) does not meet the requirement to monitor the 99th percentile latency; log-based metrics also introduce latency and parsing overhead. Option D is wrong because exporting logs to BigQuery and running scheduled queries is an overly complex, non-real-time approach that violates the best practice of using built-in monitoring capabilities; it also introduces additional cost and delay compared to native Cloud Monitoring alerts.

668

MCQhard

A data engineering team needs to compute rolling window features (7-day average, 30-day sum) from a high-volume stream of e-commerce events stored in BigQuery. They must output the features to Vertex AI Feature Store for online serving. Which approach is MOST cost-effective and scalable?

A.Use Cloud Composer (Airflow) with a daily DAG to run SQL queries on BigQuery and export results

B.Schedule a query in BigQuery using scheduled queries and export results to Feature Store

C.Use Cloud Functions triggered by Pub/Sub to compute features on the fly

D.Use Dataflow with Apache Beam, reading from BigQuery, computing windowed aggregations, and writing to Vertex AI Feature Store

AnswerD

Dataflow handles large-scale, windowed feature computation efficiently and can write to Feature Store.

Why this answer

Dataflow (Apache Beam) is ideal for processing large-scale batch and streaming data. It can read from BigQuery, perform windowing computations, and write to Feature Store's online store. Cloud Functions have timeouts, Cloud Composer is not optimal for streaming, and BigQuery scheduled queries are not designed for streaming-feature computation.
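
To make "rolling window features" concrete, here is a plain-Python sketch of the aggregation logic itself. It is not Apache Beam code and says nothing about how Dataflow distributes the work. The event shape `(entity_id, event_date, amount)` is hypothetical, and the 7-day average is assumed to mean the window total divided by 7.

```python
from collections import defaultdict
from datetime import date


def rolling_features(events, as_of: date) -> dict:
    """Compute a 7-day average and a 30-day sum per entity, looking back from as_of.

    events: iterable of (entity_id, event_date, amount) tuples (hypothetical shape).
    """
    sum_7d = defaultdict(float)
    sum_30d = defaultdict(float)
    for entity_id, day, amount in events:
        age_days = (as_of - day).days
        if 0 <= age_days < 30:
            sum_30d[entity_id] += amount
            if age_days < 7:
                sum_7d[entity_id] += amount
    # Assumption: the average is per calendar day in the window, not per event.
    return {
        entity: {"avg_7d": sum_7d[entity] / 7, "sum_30d": total}
        for entity, total in sum_30d.items()
    }


if __name__ == "__main__":
    sample = [
        ("u1", date(2024, 1, 30), 70.0),  # inside both windows
        ("u1", date(2024, 1, 10), 30.0),  # inside the 30-day window only
        ("u1", date(2023, 12, 1), 999.0),  # outside both windows
    ]
    print(rolling_features(sample, as_of=date(2024, 1, 31)))
```

In a Beam pipeline the same idea would typically be expressed with windowing and a per-key combine rather than an in-memory loop.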

669

MCQmedium

You have a champion model serving 100% traffic on a Vertex AI endpoint. You want to deploy a challenger model and gradually shift 10% of traffic to it for A/B testing. What is the correct approach?

A.Use Cloud Run to deploy both models and use Cloud Endpoints for traffic splitting.

B.Deploy the challenger on the same endpoint and use the traffic split parameter to allocate 10% traffic to it.

C.Deploy the challenger on a separate endpoint and use Cloud Armor to split traffic.

D.Create a new endpoint for the challenger and route 10% of requests via a load balancer.

AnswerB

Correct. Vertex AI allows multiple deployed models on one endpoint with traffic percentages.

Why this answer

Vertex AI endpoints support traffic splitting by deploying multiple model versions and assigning traffic percentages. You deploy the challenger as a new deployed model on the same endpoint and set traffic split: champion 90%, challenger 10%.
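
The arithmetic behind the split can be sketched as follows. The helper rescales the existing deployed models so that the total stays at 100 and uses the key "0" for the model being deployed, which is the placeholder convention in the deploy request. The resource names and machine type are hypothetical, and the `deploy_challenger` wrapper assumes the `google-cloud-aiplatform` SDK.

```python
def add_challenger(current_split: dict, challenger_pct: int) -> dict:
    """Return a new traffic split that gives challenger_pct to the new model."""
    if not current_split:
        raise ValueError("expected at least one deployed model")
    if not 0 <= challenger_pct <= 100:
        raise ValueError("challenger_pct must be between 0 and 100")
    remaining = 100 - challenger_pct
    total = sum(current_split.values())
    # Scale existing models proportionally, using integer percentages.
    scaled = {mid: (pct * remaining) // total for mid, pct in current_split.items()}
    # Hand any rounding remainder to the model with the largest share.
    top = max(scaled, key=scaled.get)
    scaled[top] += remaining - sum(scaled.values())
    scaled["0"] = challenger_pct  # "0" stands for the model being deployed
    return scaled


def deploy_challenger(endpoint_name: str, model_name: str, challenger_pct: int = 10):
    """Deploy a challenger to an existing endpoint (resource names are placeholders)."""
    # Imported lazily so add_challenger can be tested without the SDK installed.
    from google.cloud import aiplatform

    endpoint = aiplatform.Endpoint(endpoint_name)
    model = aiplatform.Model(model_name)
    # traffic_percentage scales down the already-deployed models to make room.
    endpoint.deploy(
        model=model,
        traffic_percentage=challenger_pct,
        machine_type="n1-standard-4",  # hypothetical choice
    )
```

For example, `add_challenger({"champion-id": 100}, 10)` yields a 90/10 split between the champion and the new model.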

670

MCQmedium

A company has a TensorFlow model that uses custom operations compiled as .so files. They want to deploy it on Vertex AI for online predictions. The model runs correctly when loaded locally. However, on Vertex AI, the prediction fails with a 'Op type not registered' error. What is the most likely reason?

A.The model is using a deprecated TensorFlow version.

B.The custom ops are not included in the model directory.

C.The prediction request format is incorrect.

D.The custom ops were compiled for a different CPU architecture.

AnswerD

Incompatible instruction sets cause the op to fail to register.

Why this answer

Option D is correct because custom TensorFlow operations compiled as .so files are architecture-specific. If the local machine uses a different CPU architecture (e.g., x86_64 with AVX2) than the Vertex AI serving nodes (e.g., x86_64 without AVX2 or ARM), the dynamic library will fail to load, causing the 'Op type not registered' error. The model runs locally because the ops are available, but on Vertex AI the shared object cannot be loaded, so TensorFlow cannot register the custom kernels.
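
One way to check the build target of a shared object before deployment is to read its ELF header. The sketch below uses only the standard library and recognizes just two machine codes. It can distinguish x86_64 from ARM64, but it cannot detect instruction-set extension differences such as AVX2, which are not recorded in the header. The file name in the usage note is hypothetical.

```python
import struct

# e_machine values from the ELF specification (only two are listed here).
ELF_MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the target architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) is 1 for little-endian and 2 for big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (0x{machine:x})")
```

Usage would look like `elf_machine(open("custom_ops.so", "rb").read(20))`, compared against the architecture of the serving container.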

Exam trap

Google Cloud often tests the misconception that 'Op type not registered' is always due to missing files or version mismatches, but the real trap is that candidates overlook CPU architecture compatibility when deploying compiled custom ops to a cloud environment where the serving hardware may differ from the build environment.

How to eliminate wrong answers

Option A is wrong because a deprecated TensorFlow version would typically cause compatibility warnings or missing API errors, not an 'Op type not registered' error specifically for custom ops; the error indicates the op kernel is missing, not that the version is unsupported. Option B is wrong because if the custom ops were not included in the model directory, the model would fail to load entirely or produce a 'file not found' error, not an 'Op type not registered' error; the error occurs when the .so file is present but cannot be loaded due to architecture mismatch. Option C is wrong because an incorrect prediction request format would result in a 400 Bad Request or a deserialization error, not a TensorFlow runtime error about unregistered ops; the error is raised during model inference, not request parsing.

671

Multi-Selectmedium

Which THREE are best practices for implementing CI/CD for ML pipelines on Google Cloud? (Choose THREE.)

Select 3 answers

A.Maintain separate environments for dev, staging, and production

B.Track all experiments and artifacts using Vertex ML Metadata

C.Use Cloud Build to automate testing, building, and deployment of pipeline components

D.Design pipelines with low-code components to reduce development time

E.Write unit tests for every training job

AnswersA, B, C

Prevents unintended changes to production.

Why this answer

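Maintaining separate environments for dev, staging, and production (Option A) is a core CI/CD best practice because it isolates changes, prevents accidental breakage in production, and allows thorough validation at each stage. On Google Cloud, this typically means separate projects, each with its own service accounts, pipeline root, and access controls. Tracking experiments and artifacts in Vertex ML Metadata (Option B) provides the lineage needed to reproduce and audit any pipeline run, and Cloud Build (Option C) automates testing, building, and deploying pipeline components whenever the code changes.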

Exam trap

Google Cloud often tests the distinction between general software CI/CD practices and ML-specific CI/CD needs, trapping candidates who over-apply traditional unit testing or assume low-code tools are always best practices for production ML pipelines.

672

MCQhard

A company uses BigQuery ML to train a boosted tree classifier on a large dataset. After training, they want to understand which features most influence predictions. Which BigQuery ML function should they use?

A.ML.EXPLAIN_PREDICT

B.ML.FEATURE_IMPORTANCE

C.ML.EVALUATE

D.ML.PREDICT

AnswerB

Why this answer

ML.FEATURE_IMPORTANCE returns feature importance for tree-based models in BigQuery ML. ML.EVALUATE gives evaluation metrics, ML.PREDICT gives predictions, ML.EXPLAIN_PREDICT gives local explanations with SHAP values.
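
A short sketch of how the function might be called from Python. The model ID passed in is a placeholder, the ordering by `importance_gain` is just one reasonable choice, and `top_features` assumes the `google-cloud-bigquery` client library with default credentials.

```python
def feature_importance_sql(model_id: str) -> str:
    """Build a query that ranks features of a boosted tree model by gain."""
    return (
        "SELECT feature, importance_weight, importance_gain, importance_cover\n"
        f"FROM ML.FEATURE_IMPORTANCE(MODEL `{model_id}`)\n"
        "ORDER BY importance_gain DESC"
    )


def top_features(model_id: str) -> list:
    """Run the query and return one dict per feature."""
    # Imported lazily so the SQL builder can be tested without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(feature_importance_sql(model_id)).result()
    return [dict(row.items()) for row in rows]
```

These scores are global, per-model values. Per-prediction attributions come from ML.EXPLAIN_PREDICT instead.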

673

MCQmedium

You have a Vertex AI endpoint with min_replica_count=2 and max_replica_count=10. You notice that during a traffic spike, the endpoint does not scale up quickly enough, causing increased latency. What should you do to improve autoscaling responsiveness?

A.Increase max_replica_count to 20.

B.Disable autoscaling and manually manage replicas.

C.Increase min_replica_count to 10.

D.Reduce the target CPU utilization percentage from default to a lower value.

AnswerD

Lower target utilization triggers scaling sooner, improving responsiveness.

Why this answer

Option D is correct because reducing the target CPU utilization percentage (e.g., from the default 60% to a lower value like 40%) causes the autoscaler to trigger scale-up actions sooner, as the threshold for adding replicas is reached at a lower CPU load. This improves responsiveness during traffic spikes by initiating scaling earlier, reducing latency. The endpoint's min_replica_count=2 and max_replica_count=10 remain unchanged, so the scaling range is preserved.
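
The effect of the target can be illustrated with a simplified proportional scaling rule. This is an assumed model for intuition only, not Vertex AI's documented autoscaling algorithm: the desired replica count is taken as the current count scaled by observed utilization over target utilization, then clamped to the configured range.

```python
import math


def desired_replicas(
    current: int,
    cpu_pct: float,
    target_pct: float,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Simplified proportional rule: scale replicas by observed / target utilization."""
    wanted = math.ceil(current * cpu_pct / target_pct)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Same load (55% CPU on 2 replicas), two different targets.
    print(desired_replicas(2, 55, 60))  # target 60%: no scale-up yet
    print(desired_replicas(2, 55, 40))  # target 40%: scale-up is requested
```

In the Vertex AI Python SDK the target is set at deploy time through the `autoscaling_target_cpu_utilization` argument of `Endpoint.deploy` or `Model.deploy`.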

Exam trap

Google Cloud often tests the misconception that increasing max_replica_count or min_replica_count improves scaling speed, when in fact the key lever is the target utilization threshold that controls autoscaler sensitivity.

How to eliminate wrong answers

Option A is wrong because increasing max_replica_count to 20 does not address the speed of scaling; it only expands the upper bound, which may help if the spike exceeds 10 replicas but does not make the autoscaler react faster. Option B is wrong because disabling autoscaling and manually managing replicas removes the ability to dynamically handle traffic spikes, leading to either over-provisioning or under-provisioning and increased latency. Option C is wrong because increasing min_replica_count to 10 forces a minimum of 10 replicas at all times, which wastes resources during low traffic and does not improve the autoscaler's responsiveness to sudden spikes; it only pre-allocates capacity.

674

MCQeasy

A data science team has deployed a model on Vertex AI and wants to automatically detect when the distribution of a specific feature shifts significantly from the training data. Which service should they use?

A.Cloud Data Loss Prevention

B.Vertex AI Model Monitoring

C.Vertex AI Explainable AI

D.Cloud Composer

AnswerB

Vertex AI Model Monitoring includes skew detection, which compares training and serving distributions and alerts on significant shifts.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to detect feature distribution drift (skew) between training and serving data for deployed models. It continuously monitors the input features and alerts when statistical metrics like the Jensen-Shannon divergence or the L-infinity distance exceed a configured threshold, enabling proactive model retraining.
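
The two distance measures named above can be computed by hand on small histograms. The sketch below is for intuition only and does not reproduce how Model Monitoring bins or samples data. Inputs are per-bucket counts over the same buckets, and the Jensen-Shannon divergence uses log base 2, so it ranges from 0 to 1.

```python
import math


def _normalize(counts):
    """Turn raw bucket counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(train_counts, serving_counts) -> float:
    """Jensen-Shannon divergence (base 2) between two histograms."""
    p, q = _normalize(train_counts), _normalize(serving_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Buckets with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def l_infinity(train_counts, serving_counts) -> float:
    """Largest absolute difference in any single category's share."""
    p, q = _normalize(train_counts), _normalize(serving_counts)
    return max(abs(a - b) for a, b in zip(p, q))
```

Identical histograms give a divergence of 0 and completely disjoint ones give 1. An alert corresponds to the score crossing whatever threshold was configured for that feature.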

Exam trap

Google Cloud often tests the distinction between monitoring model performance (e.g., accuracy, latency) versus monitoring data distribution drift, and candidates may confuse Vertex AI Model Monitoring with Explainable AI because both involve model analysis, but only Model Monitoring tracks shifts over time.

How to eliminate wrong answers

Option A is wrong because Cloud Data Loss Prevention (DLP) is used for inspecting, classifying, and de-identifying sensitive data (e.g., PII, credit card numbers), not for monitoring feature distributions or model drift. Option C is wrong because Vertex AI Explainable AI provides feature attributions and explanations for model predictions (e.g., Shapley values, integrated gradients), but does not monitor distribution shifts over time. Option D is wrong because Cloud Composer is a managed Apache Airflow service for orchestrating workflows and pipelines, not a dedicated tool for detecting feature drift in deployed models.

675

MCQmedium

A team is deploying a large PyTorch model for online inference. They want to use NVIDIA Triton Inference Server to optimize serving performance. How can they integrate Triton with Vertex AI?

A.Package the model with Triton in a custom container and deploy it to Vertex AI

B.Vertex AI automatically uses Triton for all PyTorch models

C.Deploy the model to GKE and use Vertex AI as a frontend

D.Use a prebuilt Vertex AI PyTorch container that includes Triton

AnswerA

Custom containers allow full control, including running Triton.

Why this answer

Vertex AI supports custom containers; you can build a Docker image with Triton Inference Server and deploy it as a model on Vertex AI.
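
A minimal sketch of registering such a container with the Vertex AI Python SDK. The image URI, artifact location, and model name are hypothetical. The routes and port assume Triton's default HTTP server on port 8000 with its v2 inference protocol, which should be confirmed against the Triton image actually used.

```python
def triton_upload_args(image_uri: str, artifact_uri: str, model_name: str) -> dict:
    """Build Model.upload arguments that point Vertex AI at Triton's HTTP routes."""
    return {
        "display_name": model_name,
        "artifact_uri": artifact_uri,  # location of the Triton model repository
        "serving_container_image_uri": image_uri,
        # Assumed defaults for Triton's HTTP server; verify for your image.
        "serving_container_predict_route": f"/v2/models/{model_name}/infer",
        "serving_container_health_route": "/v2/health/ready",
        "serving_container_ports": [8000],
    }


def upload_triton_model(image_uri: str, artifact_uri: str, model_name: str):
    """Register the custom container as a Vertex AI model."""
    # Imported lazily so the argument builder can be tested without the SDK installed.
    from google.cloud import aiplatform

    return aiplatform.Model.upload(
        **triton_upload_args(image_uri, artifact_uri, model_name)
    )
```

The returned model can then be deployed to an endpoint like any other custom-container model.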
