Knowledge + Practice

Google Professional Machine Learning Engineer (PMLE) — Questions 826–900

1000 questions total · 14 pages · All types, answers revealed


Page 12 of 14

826

MCQ · hard

A team is fine-tuning a large language model (LLaMA 2) using Vertex AI with a custom container on a multi-node GPU cluster. They need to implement model parallelism because the model does not fit into a single GPU's memory. Which distributed training strategy should they use?

A. Use Vertex AI Hyperparameter Tuning to find optimal model partitioning

B. Use tf.distribute.MirroredStrategy across all GPUs

C. Implement pipeline parallelism by manually splitting the model layers across GPUs and using a framework like PyTorch's RPC or Megatron-LM

D. Use Vertex AI distributed training with TF_CONFIG to set up multi-worker mirrored strategy and rely on XLA to partition the model

Answer: C

Pipeline parallelism is the appropriate model parallelism technique for large models; it must be manually configured.

Why this answer

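Model parallelism, specifically pipeline parallelism, splits the model's layers across devices, which is necessary when the model does not fit on one GPU. Plain data parallelism, including the mirrored strategies in Options B and D, replicates the full model on every device, so it cannot help here. Hyperparameter tuning (Option A) searches over training settings; it does not partition a model.

Vertex AI custom training does not partition the model for you; users must configure model parallelism themselves in the training code, using frameworks like Megatron-LM or DeepSpeed.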
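The layer-splitting step behind pipeline parallelism can be sketched in plain Python. This is a minimal illustration, not how Megatron-LM or DeepSpeed assign layers: the function name `partition_layers` and the layer sizes are hypothetical, and the sketch only decides which contiguous layers would live on which GPU.

```python
from typing import List


def partition_layers(layer_sizes: List[int], num_stages: int) -> List[List[int]]:
    """Split layer indices into contiguous stages of roughly equal total size."""
    if not 1 <= num_stages <= len(layer_sizes):
        raise ValueError("num_stages must be between 1 and the number of layers")
    target = sum(layer_sizes) / num_stages
    stages: List[List[int]] = [[]]
    running = 0.0
    for index, size in enumerate(layer_sizes):
        layers_left = len(layer_sizes) - index
        stages_left = num_stages - len(stages)
        # Open a new stage once the current one has reached its share, or when
        # every remaining layer is needed to fill the remaining stages.
        if stages[-1] and stages_left > 0 and (
            running >= target or layers_left <= stages_left
        ):
            stages.append([])
            running = 0.0
        stages[-1].append(index)
        running += size
    return stages


# Eight equally sized blocks spread over four GPUs (hypothetical sizes).
stages = partition_layers([10] * 8, num_stages=4)
print(stages)
```

Each inner list is one pipeline stage; a real framework would additionally schedule micro-batches between stages so that GPUs do not sit idle.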


827

MCQ · easy

You have deployed a text classification model using Vertex AI Endpoints. The model is performing well, but the operations team wants to be alerted if the endpoint returns an excessive number of HTTP 503 errors. What is the simplest way to achieve this?

A. Configure a Cloud Monitoring uptime check on the endpoint URL.

B. Create a Cloud Monitoring alert based on the metric 'prediction/failed_request_count' with a condition on 5xx errors.

C. Add a logging statement in the custom prediction routine to count errors manually.

D. Export Cloud Logging to BigQuery and run a scheduled query for 503s.

Answer: B

Built-in metric directly reflects HTTP errors.

Why this answer

Option B is correct because Vertex AI Endpoints automatically export the 'prediction/failed_request_count' metric to Cloud Monitoring, which includes a label for HTTP status codes. By creating an alert on this metric with a filter for 5xx errors, you can directly monitor excessive 503 responses without additional infrastructure or custom code.

Exam trap

The trap here is that candidates often confuse uptime checks (which measure availability from external probes) with metric-based alerts (which track internal error counts), leading them to choose Option A despite its inability to specifically detect 503 errors.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring uptime checks test endpoint availability from external locations, but they cannot distinguish between 503 errors and other HTTP statuses; they only report overall uptime/downtime, not specific error counts. Option C is wrong because adding a logging statement in a custom prediction routine requires modifying the deployment code and does not leverage the built-in metrics already available in Vertex AI, making it unnecessarily complex and not the simplest approach. Option D is wrong because exporting logs to BigQuery and running scheduled queries introduces significant latency, cost, and operational overhead compared to using the native Cloud Monitoring alert, which provides real-time detection with minimal configuration.


828

MCQ · medium

A team deploys a model on Vertex AI that uses a custom prediction routine (CPR) with a dependency on a native library. The container crashes with 'ImportError: libcudart.so.11.0: cannot open shared object file'. How should they resolve this?

A. Build a custom container image that includes the CUDA runtime library.

B. Submit the model for batch prediction to avoid the error.

C. Request a GPU machine type for the endpoint.

D. Use a Vertex AI pre-built container for PyTorch instead.

Answer: A

Ensures the library is available.

Why this answer

The error 'ImportError: libcudart.so.11.0: cannot open shared object file' indicates that the CUDA runtime library (version 11.0) is missing from the container environment. Since the custom prediction routine (CPR) depends on a native library that requires this CUDA runtime, the correct solution is to build a custom container image that includes the CUDA runtime library. This ensures the shared object is available at runtime, resolving the import error.

Exam trap

Google Cloud often tests the misconception that requesting a GPU machine type automatically provides the necessary CUDA libraries, but in reality, the CUDA runtime must be explicitly included in the container image, as the GPU machine type only provides the hardware and driver, not the user-space libraries.

How to eliminate wrong answers

Option B is wrong because submitting the model for batch prediction does not change the container environment; the same missing CUDA runtime library will cause the same ImportError during batch prediction. Option C is wrong because requesting a GPU machine type for the endpoint provides GPU hardware but does not install the CUDA runtime library into the container; the library must be present in the container image regardless of the underlying hardware. Option D is wrong because using a Vertex AI pre-built container for PyTorch does not guarantee inclusion of the specific CUDA runtime version 11.0 required by the native library; the pre-built container may have a different CUDA version or omit the library entirely.


829

MCQ · easy

A data scientist runs a BigQuery ML prediction query and gets a region mismatch error. The model is in the US region, but the new_data table is in the EU region. What is the simplest way to resolve this?

A. Recreate the model in the EU region using the same training data

B. Copy the new_data table to the US region using the BigQuery UI or CLI

C. Enable cross-region query in BigQuery settings

D. Export the model from US and import it to EU

Answer: B

Copying the table to the same region resolves the mismatch with minimal effort.

Why this answer

Option B is correct because the simplest fix is to move the new_data table to the same region as the model (US). BigQuery ML requires that the model and the data used for predictions reside in the same multi-region or regional location. Copying the table via the BigQuery UI or CLI (e.g., `bq cp`) is a straightforward, no-code operation that avoids retraining or exporting the model.

Exam trap

The trap here is that candidates may overthink the solution and choose to recreate the model or export/import it, not realizing that the simplest and most efficient fix is to copy the data table to the model's region.

How to eliminate wrong answers

Option A is wrong because recreating the model in the EU region would require retraining the model from scratch, which is unnecessary and time-consuming when a simple data copy resolves the mismatch. Option C is wrong because BigQuery does not support a 'cross-region query' setting; queries are always restricted to a single region or multi-region, and enabling such a feature is not possible. Option D is wrong because exporting and importing a model between regions is more complex and involves additional steps (e.g., using Cloud Storage as an intermediary), whereas copying the table is simpler and directly addresses the region mismatch.


830

MCQ · easy

You are using Vertex AI Training to train a model and then automatically deploy the best candidate to a Vertex AI Prediction endpoint via the Vertex AI Model Registry. However, after deployment, you notice that the endpoint returns predictions for the new model, but they are significantly different from the evaluation metrics computed during training. The training scripts used TensorFlow with a serving input function. What is the most likely issue and how would you fix it?

A. The endpoint is using a different machine type affecting numerical precision; you should use the same machine type as training.

B. The serving input function's preprocessing steps do not match the training preprocessing; you should verify and align them.

C. The model registry deployed a different version; you should check the alias.

D. The model was saved with training-only metrics; you should retrain with evaluation metrics.

Answer: B

Preprocessing mismatch is a common cause for prediction discrepancies.

Why this answer

Option B is correct because the serving input function's preprocessing must match the training preprocessing exactly; any mismatch feeds the model differently scaled or encoded inputs and degrades its predictions. Option A is unlikely because numerical precision differences between machine types are minimal. Option C is possible but less likely, given how consistently the predictions differ.

Option D is wrong because the metrics recorded during training do not affect the predictions a saved model returns.


831

MCQ · hard

Refer to the exhibit. A data analyst creates a BigQuery ML logistic regression model for churn prediction. The model evaluation shows high precision but low recall. Which change to the model creation would most likely improve recall?

A. Drop more columns to reduce overfitting.

B. Increase the training data by including customers without churn dates.

C. Use ML.ADJUST_THRESHOLD to lower the classification threshold.

D. Change model_type to 'BOOSTED_TREE_CLASSIFIER'.

Answer: C

Lowering the classification threshold increases sensitivity, which improves recall.

Why this answer

Option C is correct because lowering the classification threshold (e.g., from 0.5 to 0.3) will classify more customers as positive (churn), increasing recall (true positives / (true positives + false negatives)). In BigQuery ML, ML.ADJUST_THRESHOLD directly modifies the decision boundary, trading off precision for recall. This is the most direct way to address low recall without altering the model architecture or training data.
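The precision-for-recall trade can be checked with a small worked example. The scores and labels below are made up for illustration; only the two thresholds (0.5 and 0.3) come from the explanation above.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[float, float]:
    """Compute precision and recall when scores >= threshold count as churn."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical churn probabilities and true labels (1 = churned).
scores = [0.9, 0.8, 0.45, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

print(precision_recall(scores, labels, 0.5))  # (1.0, 0.5)
print(precision_recall(scores, labels, 0.3))  # (0.8, 1.0)
```

At 0.5 the model catches only two of the four churners; at 0.3 it catches all four at the cost of one false positive.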

Exam trap

Google Cloud often tests the misconception that changing the model type (e.g., to boosted trees) is the default solution for any performance metric issue, when in fact the threshold adjustment is the simplest and most direct way to trade off precision and recall in a logistic regression model.

How to eliminate wrong answers

Option A is wrong because dropping more columns to reduce overfitting would likely harm recall further by removing potentially informative features, and overfitting typically causes high variance, not low recall. Option B is wrong because including customers without churn dates (non-churners) would increase the class imbalance, making the model even more biased toward the majority class and likely reducing recall further. Option D is wrong because changing model_type to 'BOOSTED_TREE_CLASSIFIER' might improve overall performance but does not specifically target the recall issue; it is a model architecture change that could also reduce recall if the class imbalance is not addressed, and it is not the most direct fix for a threshold-related precision-recall trade-off.


832

MCQ · easy

Which machine type is most suitable for a Vertex AI endpoint serving a GPU-accelerated model?

A. n1-standard-4 with attached GPU

B. e2-standard-4

C. n1-highmem-8

D. n1-standard-4

Answer: A

You need a GPU machine type, or an N1 machine type with a GPU attached.

Why this answer

Option A is correct because Vertex AI endpoints require a machine type that supports GPU acceleration, and the n1-standard-4 with an attached GPU provides the necessary CPU-to-GPU balance for inference workloads. The n1 series supports GPU attachments via the `accelerator` configuration, enabling CUDA-based model serving, while the e2 series and n1-highmem-8 without GPU cannot leverage GPU acceleration.

Exam trap

Google Cloud often tests the misconception that any n1 machine type inherently supports GPU acceleration. The trap is that the GPU must be explicitly attached: options like n1-highmem-8 or n1-standard-4 without the `accelerator` configuration are CPU-only and fail to meet the requirement.

How to eliminate wrong answers

Option B is wrong because the E2 machine series does not support GPU attachments, so e2-standard-4 cannot serve a GPU-accelerated model. Option C is wrong because n1-highmem-8, while part of the n1 series, does not include a GPU by default; without an attached GPU, it cannot accelerate model inference, and its extra memory does not compensate for that. Option D is wrong because n1-standard-4 without an attached GPU is a CPU-only instance that cannot utilize GPU acceleration, making it unsuitable for a GPU-accelerated model endpoint.


833

MCQ · medium

You have a Vertex AI endpoint with two deployed models: model A (champion) and model B (challenger). Traffic split is 90:10. You want to gradually increase model B's traffic to 50% over a week. What is the best way to update the traffic split?

A. Use gcloud ai models upload to overwrite model B with new settings.

B. Create a new endpoint and migrate traffic gradually using a load balancer.

C. Use the gcloud ai endpoints update command to change traffic split.

D. Delete model B and redeploy with a new traffic split.

Answer: C

Correct. The gcloud command can update traffic percentages.

Why this answer

The `gcloud ai endpoints update` command allows you to directly modify the traffic split of an existing Vertex AI endpoint without redeploying or recreating the endpoint. This is the intended method for gradually shifting traffic between deployed models, as it supports incremental updates to the `--traffic-split` parameter, enabling a controlled rollout from 90:10 to 50:50 over a week.
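The incremental rollout can be planned as a simple schedule of traffic splits. The sketch below is only arithmetic: the keys `model_a` and `model_b` are placeholders for the real deployed model IDs, and the choice of four equal increments across the week is an assumption, not something the question prescribes.

```python
from typing import Dict, List


def ramp_schedule(start: int, end: int, steps: int) -> List[Dict[str, int]]:
    """Return one traffic split per step, moving the challenger from start% to end%."""
    schedule = []
    for step in range(1, steps + 1):
        challenger = start + round((end - start) * step / steps)
        # The two percentages must always sum to 100.
        schedule.append({"model_a": 100 - challenger, "model_b": challenger})
    return schedule


# Move model B from 10% to 50% in four increments (hypothetical cadence).
schedule = ramp_schedule(start=10, end=50, steps=4)
for split in schedule:
    print(split)
```

Each dictionary is the split that would be applied at one update; checking endpoint metrics between increments is what makes the rollout controlled.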

Exam trap

Google Cloud often tests the misconception that you must recreate or redeploy resources to change configuration, when in fact Vertex AI endpoints support live updates via the `update` command, avoiding unnecessary downtime and complexity.

How to eliminate wrong answers

Option A is wrong because `gcloud ai models upload` is used to upload a new model version, not to update traffic split settings on an existing endpoint; traffic splits are managed at the endpoint level, not the model level. Option B is wrong because creating a new endpoint and using a load balancer adds unnecessary complexity and cost, and Vertex AI endpoints natively support traffic splitting without external load balancers. Option D is wrong because deleting and redeploying model B would cause downtime for the 10% of traffic already served by model B, and the traffic split can be updated without redeployment.


834

MCQ · hard

Your team has deployed a text classification model on Vertex AI Endpoints. You notice that the model's latency has increased significantly over the last week, but the request rate has remained stable. Which of the following is the most likely cause?

A. A sudden increase in the number of prediction requests

B. The model was replaced with a larger version without updating the endpoint

C. A change in the preprocessing logic that now includes a computationally expensive step

D. A misconfiguration in the autoscaling policy

Answer: C

This increases per-request latency without changing request rate.

Why this answer

A computationally expensive preprocessing step directly increases per-request latency on the inference path, even when request rate is stable. Vertex AI Endpoints execute user-provided preprocessing code before model inference, so adding a heavy operation (e.g., large regex, image resizing, or external API call) will linearly increase response time for every prediction.

Exam trap

The trap here is that candidates confuse 'model latency' with 'request rate' and assume any latency increase must be due to scaling issues, ignoring that preprocessing logic changes can dramatically affect per-request performance without altering throughput.

How to eliminate wrong answers

Option A is wrong because a sudden increase in request rate would cause latency to rise, but the question explicitly states request rate has remained stable. Option B is wrong because replacing the model with a larger version requires deploying a new model to the endpoint or updating the endpoint's deployed model; simply replacing the model binary without updating the endpoint's deployment configuration would not change the model served, so latency would not increase. Option D is wrong because a misconfiguration in autoscaling policy (e.g., too few min replicas) would cause latency to increase only when request rate exceeds the current serving capacity, but request rate is stable and autoscaling would have already scaled to match the stable load.


835

MCQ · hard

Refer to the exhibit. A team uses this Cloud Build configuration to deploy a model to a Vertex AI endpoint. The build succeeds up to the 'upload' step, but the 'deploy-model' step fails with an error that the model 'my-model' does not exist. What is the most likely cause?

A. The deploy step uses the display name instead of the model resource ID

B. The model was not uploaded because the artifact URI is a directory, not a valid SavedModel

C. The Vertex AI API was not enabled for the project

D. The region in the deploy step does not match the model's region

Answer: B

The artifact URI must point to a directory that contains a valid SavedModel (a saved_model.pb file and a variables subdirectory), not to an arbitrary directory.

Why this answer

The 'deploy-model' step fails because the model was not successfully uploaded. Cloud Build's 'upload' step expects a valid SavedModel artifact (a directory containing a saved_model.pb file and variables subdirectory). If the artifact URI points to a directory that is not a valid SavedModel, the upload may appear to succeed but does not register a usable model resource, causing the subsequent deploy step to fail with 'model does not exist'.

Exam trap

Google Cloud often tests the distinction between a successful upload step and a valid model registration, trapping candidates who assume any directory upload creates a usable model resource.

How to eliminate wrong answers

Option A is wrong because the deploy step uses the model resource ID, not the display name; the error message explicitly says 'my-model' does not exist, indicating the model resource was never created. Option C is wrong because if the Vertex AI API were not enabled, the build would fail at the 'upload' step or earlier with an API enablement error, not specifically at the deploy step. Option D is wrong because region mismatch would cause a different error (e.g., 'model not found in region') or a permission error, but the error message states the model does not exist, implying it was never registered in any region.


836

MCQ · easy

A data scientist trained a model on historical data from 2020-2022 and deployed it in January 2023. In February 2023, the model's accuracy drops significantly. Which monitoring metric would most likely indicate the root cause?

A. Number of unique users calling the endpoint.

B. Prediction latency p99.

C. Number of missing feature values in requests.

D. Training-serving skew detected by Vertex AI Model Monitoring.

Answer: D

Skew indicates that serving data distribution differs from training data, likely causing accuracy drop.

Why this answer

Option D is correct because Vertex AI Model Monitoring specifically detects training-serving skew, which occurs when the distribution of input features at serving time differs from the training data distribution. Since the model was trained on 2020-2022 data and deployed in January 2023, a significant accuracy drop in February 2023 likely indicates that the real-world data distribution has shifted (e.g., seasonal patterns, new user behavior), causing the model to encounter unseen patterns. This skew is a common root cause of performance degradation and is directly monitored by Vertex AI's skew detection feature.
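To make "the distribution of input features differs" concrete, here is a minimal sketch that compares a categorical feature between training and serving data. The L-infinity distance, the 0.3 threshold, and the `channel` values are illustrative assumptions; the text above does not say which statistic or threshold the monitoring job uses.

```python
from collections import Counter
from typing import Dict, Iterable


def distribution(values: Iterable[str]) -> Dict[str, float]:
    """Turn raw categorical values into relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(train: Iterable[str], serving: Iterable[str]) -> float:
    """Largest absolute difference in frequency over all categories."""
    p, q = distribution(train), distribution(serving)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


# Hypothetical 'channel' feature: mostly web in training, mostly mobile in serving.
train_channel = ["web"] * 6 + ["mobile"] * 4
serving_channel = ["web"] * 2 + ["mobile"] * 8

SKEW_THRESHOLD = 0.3  # assumed alerting threshold, for illustration only
distance = l_infinity_distance(train_channel, serving_channel)
print(round(distance, 2), distance > SKEW_THRESHOLD)
```

A distance above the threshold is the kind of signal that would point to a shifted input distribution rather than to latency or traffic volume.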

Exam trap

Google Cloud often tests the distinction between model performance metrics (accuracy, precision) and operational metrics (latency, throughput, user count), and the trap here is that candidates may confuse a drop in accuracy with a system-level issue like latency or missing values, rather than recognizing that accuracy degradation is most directly linked to data distribution shifts (skew).

How to eliminate wrong answers

Option A is wrong because the number of unique users calling the endpoint is a business metric, not a model performance metric; it does not directly indicate why accuracy dropped. Option B is wrong because prediction latency p99 measures response time, not prediction quality; high latency could degrade user experience but does not explain a drop in accuracy. Option C is wrong because missing feature values in requests would cause errors or fallback behavior, but the question states accuracy drops, not that predictions fail; missing values are typically handled by imputation or default values and would not necessarily cause a significant accuracy drop unless the model was not trained to handle them.


837

Drag & Drop · medium

Drag and drop the steps to set up a BigQuery ML linear regression model for forecasting in the correct order.


Why this order

The correct sequence for setting up a BigQuery ML linear regression model is to first prepare the training data, then create the model using that data, evaluate the model's performance, and finally use it for predictions. This ensures the model is trained on appropriate data, validated for accuracy, and then applied for forecasting.
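The four steps map onto four BigQuery ML statements, sketched here as an ordered list of SQL strings held in Python. Every dataset, table, and column name (`mydataset`, `raw_sales`, `units_sold`, and so on) is hypothetical, and the statements are only assembled and printed, not executed.

```python
from typing import List, Tuple

# Each entry pairs a step name with the SQL that would carry it out.
STEPS: List[Tuple[str, str]] = [
    (
        "prepare training data",
        """CREATE OR REPLACE TABLE `mydataset.sales_training` AS
SELECT store_id, week, promo_flag, units_sold
FROM `mydataset.raw_sales`
WHERE units_sold IS NOT NULL""",
    ),
    (
        "create model",
        """CREATE OR REPLACE MODEL `mydataset.sales_model`
OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['units_sold']) AS
SELECT * FROM `mydataset.sales_training`""",
    ),
    (
        "evaluate model",
        """SELECT * FROM ML.EVALUATE(
  MODEL `mydataset.sales_model`,
  (SELECT * FROM `mydataset.sales_holdout`))""",
    ),
    (
        "predict",
        """SELECT * FROM ML.PREDICT(
  MODEL `mydataset.sales_model`,
  (SELECT * FROM `mydataset.sales_future`))""",
    ),
]

for number, (name, sql) in enumerate(STEPS, start=1):
    print(f"-- Step {number}: {name}\n{sql};\n")
```

Keeping evaluation before prediction means a poorly fitting model is caught before its forecasts are used.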


838

Multi-Select · medium

A company wants to implement a central model governance strategy using Vertex AI. They need to track model lineage, store evaluation metrics, and manage model versions across teams. Which THREE Vertex AI services should they use? (Choose 3)

Select 3 answers

A. Vertex AI Metadata

B. Vertex AI Model Registry

C. Vertex AI Workbench

D. Vertex AI Experiments

E. Vertex AI Feature Store

Answers: A, B, D

Tracks artifact lineage across the ML lifecycle.

Why this answer

Vertex AI Metadata tracks lineage, Model Registry stores models and versions with evaluation metrics, and Experiments captures run parameters and metrics.


839

Multi-Select · medium

A company wants to reduce costs for serving a model on Vertex AI Prediction without sacrificing availability. Which THREE strategies should they consider?

Select 3 answers

A. Use larger machine types to reduce the number of replicas

B. Switch to HTTP/2 to reduce network overhead

C. Enable automatic batching to improve throughput per instance

D. Use CPU instead of GPU for models that can run on CPU

E. Use min replicas=0 and enable autoscaling

Answers: C, D, E

Batching increases efficiency, reducing number of instances needed.

Why this answer

Option C is correct because enabling automatic batching on Vertex AI Prediction allows the model server to group multiple inference requests into a single batch, which increases throughput per instance and reduces the total number of compute resources needed. This directly lowers serving costs without sacrificing availability, as the batching is handled transparently by the Vertex AI Prediction infrastructure.

Exam trap

Google Cloud often tests the misconception that reducing replicas with larger machines is cost-effective, but the trap here is that larger machines increase per-unit cost and can lead to idle capacity, whereas autoscaling with min replicas=0 and batching optimizes cost without sacrificing availability.


840

Multi-Select · hard

Which THREE factors are critical when designing a model serving architecture for a global user base with strict latency SLAs? (Choose 3.)

Select 3 answers

A. Use batch prediction to process requests in bulk for efficiency.

B. Deploy the model in a single region to avoid data sovereignty issues.

C. Enable autoscaling with request-based metrics to handle traffic spikes.

D. Implement request caching for idempotent predictions when appropriate.

E. Use multi-region deployment with Vertex AI Endpoints in multiple locations.

Answers: C, D, E

Autoscaling ensures capacity matches demand.

Why this answer

Options C, D, and E are correct. Option A is wrong because batch prediction processes requests in bulk, which adds latency. Option B is wrong because a single-region deployment cannot meet latency targets for users around the globe.
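Of the listed strategies, request caching for idempotent predictions (the strategy in item D of the option list) is the easiest to show in a few lines. The sketch below uses an in-process `functools.lru_cache`; `run_model` is a hypothetical stand-in for the real prediction call, and a production service would more likely use a shared cache.

```python
from functools import lru_cache
from typing import Tuple

calls = {"model": 0}  # counts how often the (stand-in) model is actually invoked


def run_model(features: Tuple[float, ...]) -> float:
    """Hypothetical stand-in for a remote prediction call."""
    calls["model"] += 1
    return sum(features) / len(features)


@lru_cache(maxsize=1024)
def cached_predict(features: Tuple[float, ...]) -> float:
    """Return a cached result when the exact same features were seen before."""
    return run_model(features)


first = cached_predict((1.0, 2.0, 3.0))
second = cached_predict((1.0, 2.0, 3.0))  # served from the cache
print(first, second, calls["model"])
```

Caching only helps when identical inputs always yield the same output, which is why the option restricts it to idempotent predictions.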


841

MCQ · medium

An engineer needs to deploy multiple models on a single Vertex AI endpoint with separate traffic allocations. What is the maximum number of deployed models that can be assigned traffic on one endpoint?

A. 2

B. Unlimited

C. 10

D. 5

Answer: D

Vertex AI allows up to 5 deployed models per endpoint with traffic splitting.

Why this answer

Vertex AI endpoints allow up to 5 deployed models to receive traffic simultaneously, with each model assigned a traffic percentage that sums to 100%. This limit ensures predictable routing and resource management, preventing overcommitment of the endpoint's underlying infrastructure.

Exam trap

Google Cloud often tests the misconception that Vertex AI endpoints support an unlimited number of deployed models or a higher number like 10, but the actual hard limit is 5, as defined in the Vertex AI quotas documentation.

How to eliminate wrong answers

Option A is wrong because the limit is not 2; Vertex AI supports up to 5 deployed models per endpoint, not just a pair for canary testing. Option B is wrong because the number is not unlimited; there is a hard cap of 5 to maintain endpoint stability and avoid excessive routing complexity. Option C is wrong because the limit is 5, not 10; this is a specific quota enforced by Vertex AI to control resource allocation and prevent performance degradation.


842

MCQ · easy

A data scientist has trained a scikit-learn model locally and wants to deploy it to Vertex AI for online predictions with low latency. The model is a small RandomForestClassifier (100 MB). What is the recommended way to deploy this model?

A. Deploy the model on a Kubernetes cluster with Istio.

B. Package the model as a Docker container with a custom prediction routine.

C. Upload the model to Vertex AI Model Registry using the pre-built scikit-learn serving container.

D. Export the model as a TensorFlow SavedModel and use the pre-built TF serving container.

Answer: C

Vertex AI offers a pre-built container for scikit-learn that handles prediction out of the box.

Why this answer

Option C is correct because Vertex AI provides a pre-built container for scikit-learn that is optimized for serving predictions with low latency. For a small RandomForestClassifier (100 MB), this container handles model loading, request routing, and scaling automatically, eliminating the need for custom infrastructure. This is the recommended approach for deploying scikit-learn models to Vertex AI for online predictions.

Exam trap

Google Cloud often tests the misconception that any model must be containerized or converted to TensorFlow for deployment, but the correct answer leverages the platform's pre-built container for the specific framework, which is the simplest and most efficient path for small models.

How to eliminate wrong answers

Option A is wrong because deploying on a Kubernetes cluster with Istio adds unnecessary operational complexity and overhead for a small model that can be served directly via Vertex AI's managed infrastructure; it is not the recommended path for a simple scikit-learn model. Option B is wrong because packaging the model as a Docker container with a custom prediction routine is overkill when Vertex AI already offers a pre-built, optimized scikit-learn serving container that handles the prediction logic out of the box. Option D is wrong because exporting a scikit-learn model as a TensorFlow SavedModel is not a direct conversion; scikit-learn models are not natively compatible with TensorFlow Serving, and this would require significant re-engineering or use of ONNX, which is not the recommended path for a RandomForestClassifier.


843

MCQ · hard

Your team has built a low-latency similarity search service using Vertex AI Matching Engine (Vector Search). The index is updated daily with new embeddings. You need to serve the latest index without downtime. What is the correct deployment strategy?

A. Use streaming updates to continuously update the index.

B. Create a new index version, deploy it to the same endpoint, and then update the endpoint to use the new index version.

C. Update the existing index in place by calling the index update API.

D. Delete the old index and create a new index each day, then deploy the new index to a new endpoint and update DNS.

Answer: B

Correct. This allows zero-downtime updates.

Why this answer

Option B is correct because Vertex AI Matching Engine supports deploying a new index version to the same endpoint without downtime. You create a new index version from the updated embeddings, deploy it to the existing endpoint, and then update the endpoint to use the new version. This allows traffic to seamlessly switch to the updated index once it is fully deployed, avoiding any service interruption.

Exam trap

The trap here is that candidates may reach for streaming updates (Option A) because the service is low-latency. Streaming updates are meant for indexes that must reflect changes continuously; this scenario describes a once-a-day batch of new embeddings, and the keyed answer is to deploy the rebuilt index to the existing endpoint and switch over once it is ready.

How to eliminate wrong answers

Option A is wrong for this scenario because the embeddings arrive as a daily batch, so continuous streaming updates add operational overhead without being required. Option C is wrong per the answer key because it changes the index that is currently serving traffic, rather than switching to a fully built replacement. Option D is wrong because deleting the old index and creating a new one each day, then deploying to a new endpoint and updating DNS, introduces unnecessary complexity and potential downtime during DNS propagation and endpoint creation.


844

Multi-Select · medium

An ML team is designing an automated pipeline to retrain a recommendation model every day using new user interaction data stored in BigQuery. The pipeline must be cost-efficient, scalable, and require minimal manual intervention. Which two approaches should they consider?

Select 2 answers

A. Deploy a custom Kubernetes cron job on GKE to run the training script directly.

B. Use Cloud Composer (Airflow) to schedule the pipeline with a DAG.

C. Use Cloud Scheduler to publish a Pub/Sub message daily, which triggers a Cloud Function that starts the Vertex AI Pipeline.

D. Use Dataflow to continuously read from BigQuery and trigger training when new data arrives.

E. Use Vertex AI Pipelines to define the workflow and preemptible VMs for training to reduce cost.

Answers: C, E

This provides automated daily triggering with minimal overhead.

Why this answer

Option C is correct because Cloud Scheduler triggers a Pub/Sub message that invokes a Cloud Function, which starts a Vertex AI Pipeline. This serverless approach is cost-efficient (no idle compute), scales automatically, and requires minimal manual intervention. Option E is correct because Vertex AI Pipelines natively orchestrates ML workflows, and using preemptible VMs reduces training costs by up to 80% while maintaining scalability.

Exam trap

Google Cloud often tests the distinction between batch scheduling (Cloud Scheduler) and continuous streaming (Dataflow), and candidates mistakenly choose Dataflow because they think 'new data' implies real-time, but the requirement is a daily retrain, not a streaming trigger.


845

MCQhard

You have a model that requires GPU for efficient inference. You deploy it on Vertex AI with a single NVIDIA T4 GPU accelerator and notice that the GPU utilization hovers around 30%. The endpoint has 10 replicas. What is the best way to improve cost efficiency while maintaining throughput?

A.Use a larger GPU like V100 to process requests faster.

B.Reduce the number of replicas to increase GPU utilization per instance.

C.Enable autoscaling to increase the number of replicas.

D.Switch to a CPU-only instance; the model can run on CPU.

AnswerB

Fewer replicas serving the same traffic raise GPU utilization per replica and reduce cost.

Why this answer

If GPU utilization is low, you can reduce the number of replicas or increase the batch size per request to fully utilize the GPU. Reducing replicas directly saves cost. Increasing batch size may also help but requires code changes.
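
The arithmetic behind reducing replicas can be sketched as follows. This is a back-of-the-envelope estimate that assumes traffic spreads evenly across replicas and that utilization scales linearly with load. The target utilizations of 60% and 90% are illustrative, not values from the question.

```python
def replicas_for_target(current_replicas: int, util_pct: int, target_pct: int) -> int:
    """Smallest replica count that keeps per-replica utilization at or below target."""
    if current_replicas < 1 or util_pct < 0 or not 0 < target_pct <= 100:
        raise ValueError("invalid replica count or utilization percentage")
    total_load = current_replicas * util_pct  # measured in replica-percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-total_load // target_pct))


print(replicas_for_target(10, 30, 60))  # 5
print(replicas_for_target(10, 30, 90))  # 4
```

In practice some headroom is usually kept for traffic spikes, so the lower target is the safer starting point.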

846

MCQeasy

An ML engineer needs to deploy a model from Vertex AI Model Registry to an endpoint. The model has multiple versions. They want to designate one version as the 'champion' for production traffic. How should they do this?

A.Use the 'champion' alias

B.Set the version as default in the registry

C.Use the 'production' alias

D.Deploy the model directly to an endpoint without alias

AnswerA

The champion alias is used to designate the production version.

Why this answer

Vertex AI Model Registry supports aliases; assigning the 'champion' alias to a model version allows routing production traffic to that version.

847

MCQeasy

Refer to the exhibit. A data scientist notices that predictions from a deployed model are taking longer than expected. Which Cloud Monitoring metric should be inspected first to identify the bottleneck?

A.Vertex AI - Model - Compute utilization

B.Vertex AI - Endpoint - Prediction latency distribution

C.Vertex AI - Endpoint - Traffic

D.Vertex AI - Endpoint - Online prediction errors

AnswerB

This metric directly shows the distribution of latency for prediction requests, making it the first place to look for a bottleneck.

Why this answer

The data scientist is investigating slow predictions from a deployed model. The most direct metric to identify the latency bottleneck is the prediction latency distribution, which shows the distribution of response times for online prediction requests. This metric allows you to pinpoint whether the delay is due to model inference time, network overhead, or endpoint queuing, making it the first logical place to inspect.

Exam trap

Google Cloud often tests the distinction between metrics that measure performance (latency) versus metrics that measure capacity (utilization, traffic) or errors, leading candidates to mistakenly choose compute utilization or traffic when the question explicitly asks about prediction time.

How to eliminate wrong answers

Option A is wrong because Vertex AI - Model - Compute utilization measures the resource usage (CPU/memory) of the model's compute resources, which can indicate a resource bottleneck but does not directly show prediction latency; it is a secondary metric to investigate after latency is confirmed. Option C is wrong because Vertex AI - Endpoint - Traffic measures the number of requests per second (RPS) to the endpoint, which can indicate load but does not directly measure how long each prediction takes; high traffic can cause latency, but the metric itself is not a latency metric. Option D is wrong because Vertex AI - Endpoint - Online prediction errors tracks the count or rate of failed predictions (e.g., timeouts, invalid inputs), not the latency of successful predictions; errors may be a consequence of latency but are not the primary metric for identifying a latency bottleneck.

848

MCQhard

A team uses Vertex AI Metadata to track pipeline runs. They need to identify all artifacts that were generated by a particular pipeline execution. Which API method should they use?

A.List executions and then list artifacts separately

B.Use the lineage query API with the execution ID

C.Create a context and query executions

D.Query artifacts by filter on execution ID

AnswerB

The lineage query returns upstream/downstream artifacts for the given execution.

Why this answer

The lineage query API can retrieve upstream and downstream artifacts for a given execution, allowing tracing of artifacts from a specific pipeline run.

849

MCQhard

A machine learning engineer is training a large-scale text classification model using a distributed strategy on TPUs. The training loss decreases normally but the validation loss starts increasing after a few epochs while training loss continues to decrease. The engineer suspects overfitting. Which technique is most appropriate to address this while scaling training?

A.Add dropout regularization.

B.Use early stopping with patience.

C.Reduce the learning rate.

D.Increase the batch size.

AnswerA

Reduces overfitting by randomly dropping units, effective in distributed settings.

Why this answer

Option A is correct because dropout regularization is a common technique to prevent overfitting in neural networks, and it can be applied in distributed training without major modifications. Option C is wrong because reducing the learning rate may not directly address overfitting. Option D is wrong because increasing the batch size can sometimes help generalization but is not a primary anti-overfitting method.

Option B is wrong because early stopping prevents further overfitting but does not address the cause during training.
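
For readers who want to see the mechanism, here is a minimal pure-Python sketch of inverted dropout, the variant most frameworks apply at training time. It is illustrative only: the function name and inputs are hypothetical, and a real TPU job would use the framework's built-in dropout layer.

```python
import random


def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale survivors by 1/(1 - rate).

    Scaling the survivors keeps the expected activation unchanged, so no
    rescaling is needed at inference time, when dropout is switched off.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0, 2.0, 3.0, 4.0]
dropped = inverted_dropout(hidden, rate=0.5, rng=rng)
```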

850

MCQeasy

A company deploys a TensorFlow model on Vertex AI Prediction with a single node. During peak hours, inference latency increases. What should they do first to reduce latency?

A.Enable autoscaling for the deployment

B.Increase the machine type of the node

C.Decrease the min replicas to 0

D.Enable automatic batching of requests

AnswerA

Autoscaling adds nodes during peak traffic, reducing latency.

Why this answer

Enabling autoscaling for the deployment is the correct first step because it allows Vertex AI Prediction to dynamically adjust the number of replicas based on incoming traffic. During peak hours, autoscaling can add more nodes to distribute the inference load, directly reducing latency without requiring manual intervention or over-provisioning.

Exam trap

The trap here is that candidates often confuse improving throughput (batching or bigger machines) with reducing latency under load, but the first action should always be to add more replicas via autoscaling to handle concurrent requests, not to optimize a single node's performance.

How to eliminate wrong answers

Option B is wrong because increasing the machine type of the node (e.g., moving to a larger VM) may improve per-node throughput but does not address the root cause of insufficient capacity during traffic spikes; it also increases cost without guaranteeing latency reduction if the single node is already saturated. Option C is wrong because decreasing the min replicas to 0 would cause the deployment to scale down to zero during idle periods, but during peak hours it would still need to scale up from zero, causing cold-start latency and potentially failing to handle the initial burst of requests. Option D is wrong because enabling automatic batching of requests can improve throughput by grouping multiple inference requests into a single batch, but it does not reduce latency for individual requests—in fact, it may increase latency as requests wait for a batch to fill.

851

MCQeasy

An ML engineer needs to monitor a deployed model for data drift. They want to compare the distribution of incoming predictions against a baseline distribution. Which Vertex AI service should they use?

A.Vertex AI Feature Store

B.Vertex AI Model Monitoring

C.Vertex AI Experiments

D.Vertex AI Explainable AI

AnswerB

Designed for detecting drift and anomalies in prediction data.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to detect data drift and feature skew in deployed models. It continuously compares the distribution of incoming prediction requests against a baseline distribution (e.g., training data or a previous window) and alerts the engineer when statistically significant drift is detected, using metrics like Jensen-Shannon divergence or L-infinity distance.
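
The two distance measures named above can be sketched in a few lines for categorical distributions. This is a textbook formulation, assuming both inputs are probability vectors over the same categories. How the service bins features or picks thresholds is not shown here.

```python
import math


def _kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence in [0, 1] (base-2 logs) between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_divergence(p, mixture) + 0.5 * _kl_divergence(q, mixture)


def l_infinity_distance(p, q):
    """Largest absolute difference in probability for any single category."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    return max(abs(pi - qi) for pi, qi in zip(p, q))


baseline = [0.5, 0.3, 0.2]  # hypothetical training-time category frequencies
serving = [0.2, 0.3, 0.5]   # hypothetical frequencies seen at the endpoint
drift_js = jensen_shannon_divergence(baseline, serving)
drift_linf = l_infinity_distance(baseline, serving)
```

A monitoring job would compare values like `drift_js` and `drift_linf` against a configured threshold and raise an alert when they exceed it.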

Exam trap

Google Cloud often tests the distinction between monitoring (drift detection) and other MLOps components like feature stores or experiment tracking, so the trap here is that candidates may confuse 'monitoring' with 'storing features' or 'tracking experiments' because all are part of the ML lifecycle but serve different purposes.

How to eliminate wrong answers

Option A is wrong because Vertex AI Feature Store is a centralized repository for storing, managing, and serving feature values for training and serving, not for monitoring distributional shifts in predictions. Option C is wrong because Vertex AI Experiments is used for tracking and comparing machine learning experiments (e.g., hyperparameter tuning runs), not for real-time monitoring of deployed model predictions. Option D is wrong because Vertex AI Explainable AI provides feature attributions and explanations for model predictions, but does not perform statistical drift detection or baseline comparison.

852

MCQmedium

Refer to the exhibit. What is this Cloud Build step doing?

A.Uploading a model to Vertex AI Model Registry

B.Deploying a model to a Vertex AI endpoint

C.Creating a custom container for prediction

D.Training a model in Vertex AI

AnswerA

The 'upload' command registers the model.

Why this answer

The Cloud Build step shown uses the `gcloud ai models upload` command, which specifically uploads a model artifact to the Vertex AI Model Registry. This action registers the model metadata and location in Vertex AI, making it available for versioning and later deployment, but does not create an endpoint or perform training.

Exam trap

Google Cloud often tests the distinction between model registration (upload) and model deployment (endpoint creation), leading candidates to confuse the `gcloud ai models upload` step with the actual deployment to an endpoint.

How to eliminate wrong answers

Option B is wrong because deploying a model to a Vertex AI endpoint requires the `gcloud ai endpoints deploy-model` command, not `gcloud ai models upload`. Option C is wrong because creating a custom container for prediction involves building and pushing a Docker image (e.g., via `gcloud builds submit` or `docker push`), not uploading a model to the registry. Option D is wrong because training a model in Vertex AI uses `gcloud ai custom-jobs create` (or the legacy AI Platform command `gcloud ai-platform jobs submit training`), not the model upload command.

853

MCQhard

A company is using AutoML Vision for object detection and observes high latency for online predictions. What can they do to reduce latency?

A.Reduce the training budget to create a smaller model

B.Use continuous batch prediction instead of online prediction

C.Deploy the model to a region closer to the users

D.Use a larger batch size in the prediction request

AnswerA

A smaller model has lower inference latency.

Why this answer

Reducing the training budget in AutoML Vision forces the model to use fewer node-hours, which typically results in a smaller and less complex model. A smaller model has fewer parameters and requires less computation during inference, directly reducing the latency for online predictions. This is a trade-off between model accuracy and inference speed.

Exam trap

The trap here is that candidates often confuse network latency with inference latency, assuming that deploying closer to users (Option C) is the primary fix, when in fact the question specifically targets high latency for online predictions caused by model complexity.

How to eliminate wrong answers

Option B is wrong because continuous batch prediction is designed for offline, asynchronous processing of large datasets and does not reduce latency for real-time online predictions; it actually increases end-to-end time. Option C is wrong because deploying the model to a region closer to users reduces network latency but does not address the model inference latency itself, which is the primary bottleneck in AutoML Vision's online prediction. Option D is wrong because AutoML Vision online prediction endpoints do not support user-defined batch sizes; the batch size is fixed by the service, and attempting to use a larger batch size would not be accepted or would increase latency per request.

854

MCQmedium

Refer to the exhibit. A team deploys a model with the above configuration. They observe that during traffic spikes, the endpoint does not scale up quickly enough, causing increased latency. The average CPU utilization never exceeds 50%. What is the most likely reason for the slow scaling?

A.The autoscaling metric is not configured

B.The minReplicaCount is too low

C.The accelerator is causing a bottleneck

D.The machineType does not have enough CPU

AnswerA

The strategy is 'manual', so autoscaling is not configured; changing to 'autoscaling' with a target metric would resolve the issue.

Why this answer

Option A is correct. The configuration shows strategy: manual, meaning autoscaling is disabled. Without autoscaling, the endpoint does not add instances in response to load.

Option B increases min replicas, but scaling is still manual. Option D changes the machine type, but scaling remains manual. Option C is irrelevant because CPU utilization is low.

855

MCQeasy

You are fine-tuning a BERT model from Hugging Face Transformers on Vertex AI. You want to minimise cost for a short experiment. Which compute configuration should you use?

A.A custom training job with a single NVIDIA T4 GPU using spot VMs

B.A custom training job with a TPU v3-8 pod

C.A custom training job with 8 NVIDIA V100 GPUs using regular VMs

D.A standard n1-highmem-8 machine with no accelerator

AnswerA

Spot VMs lower cost; T4 is sufficient for fine-tuning BERT.

Why this answer

Spot VMs are discounted by up to 60-90% compared with regular VMs, and because the experiment is short, preemption risk is low. A TPU v3-8 or eight V100 GPUs are expensive and overkill for this job. A single T4 GPU is sufficient for fine-tuning BERT, and running it on spot VMs is the most cost-effective choice.

856

MCQmedium

Your team has deployed a model on Vertex AI endpoints. You need to monitor the prediction latency to ensure it meets a 99th percentile SLO of 500ms. You want to set up an alert if the latency exceeds this threshold. Which metric should you use?

A.The 99th percentile of the `prediction/online/response_latencies` metric.

B.The number of prediction requests that timeout.

C.Average prediction latency from the endpoint's logs.

D.The maximum prediction latency from the endpoint's monitoring dashboard.

AnswerA

This metric provides quantile data for latency, allowing you to monitor the 99th percentile.

Why this answer

Option A is correct because the `prediction/online/response_latencies` metric in Vertex AI provides a distribution of latency values, allowing you to query the 99th percentile directly. This aligns with the SLO requirement to monitor the tail latency, not the average or maximum, ensuring that the worst-case performance for 1% of requests stays under 500ms.
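
To see why the percentile matters more than the average or the maximum, consider the small sketch below. It uses the nearest-rank definition of a percentile on raw samples, which is an assumption made for illustration: Cloud Monitoring derives percentiles from bucketed distribution data, so its values are estimates. The latency samples are made up.

```python
from statistics import mean


def percentile_nearest_rank(samples, pct):
    """Smallest sample such that at least pct% of samples are at or below it."""
    if not samples or not 0 < pct <= 100:
        raise ValueError("need samples and an integer percentile in (0, 100]")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceil(pct / 100 * n)
    return ordered[rank - 1]


# 98 fast requests, one slow request, one extreme outlier (milliseconds).
latencies_ms = [100] * 98 + [600, 5000]

print(mean(latencies_ms))                         # 154
print(percentile_nearest_rank(latencies_ms, 99))  # 600
print(max(latencies_ms))                          # 5000
```

The average (154 ms) hides the SLO violation, the maximum (5000 ms) overstates it because of a single outlier, and the 99th percentile (600 ms) shows that the 500 ms target is missed.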

Exam trap

Google Cloud often tests the distinction between tail latency (percentiles) and central tendency (average) or extreme values (maximum), trapping candidates who confuse SLO monitoring with simple failure counts or averages.

How to eliminate wrong answers

Option B is wrong because the number of prediction requests that timeout is a count of failures, not a latency measurement; it does not capture the 99th percentile latency and would miss requests that complete but exceed 500ms. Option C is wrong because average prediction latency can mask high tail latencies; a low average could hide a significant number of requests exceeding 500ms, violating the SLO. Option D is wrong because the maximum prediction latency is a single extreme value, often an outlier due to cold starts or transient spikes, and does not represent the 99th percentile behavior required for the SLO.

857

MCQhard

A team is building a CI/CD pipeline for ML using Cloud Build. The pipeline trains a model and deploys it to Vertex AI. Recently, a change in the data processing step caused the model to be trained with a different data version, leading to a failed deployment because the model was invalid. How should the team prevent this in the future?

A.Add a manual review step before training

B.Pin all library versions in the Docker image

C.Use a data versioning tool (e.g., DVC) to track datasets and ensure the pipeline always uses the correct version

D.Schedule a cron job to check for data changes

AnswerC

Data versioning ensures reproducibility and consistency across pipeline runs.

Why this answer

Option C is correct because the root cause is a data version mismatch, not a code or environment issue. A data versioning tool like DVC (Data Version Control) tracks dataset versions via hash-based pointers in Git, ensuring the pipeline retrieves the exact dataset version used during training. This prevents silent failures when data processing steps change the data schema or content, which library pinning or manual reviews cannot guarantee.
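
The idea of a hash-based pointer can be sketched without any tooling: pin the content hash of the dataset alongside the code and refuse to train if the file on disk does not match. This is not DVC's implementation, only the underlying check. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def file_sha256(path):
    """Content hash of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_dataset_version(path, pinned_hash):
    """Fail fast if the dataset differs from the version pinned in source control."""
    actual = file_sha256(path)
    if actual != pinned_hash:
        raise RuntimeError(
            f"dataset {path} has hash {actual}, expected {pinned_hash}"
        )
```

A pipeline step would call `assert_dataset_version` before training, so a changed data processing step fails the build instead of producing an invalid model.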

Exam trap

The trap here is that candidates confuse environment reproducibility (pinning libraries) with data reproducibility, assuming that locking code dependencies is sufficient to prevent model failures caused by data drift or version changes.

How to eliminate wrong answers

Option A is wrong because a manual review step before training introduces human latency and does not enforce data version consistency; it relies on a person to catch a version mismatch that may not be visually obvious. Option B is wrong because pinning library versions in the Docker image addresses dependency drift in code, not data versioning; the model failed due to a different data version, not a library incompatibility. Option D is wrong because scheduling a cron job to check for data changes is reactive and does not prevent the pipeline from using the wrong data version; it only alerts after the fact, and the pipeline would still train on incorrect data.

858

MCQhard

A company serves a scikit-learn model on Vertex AI Prediction but receives a 400 error with 'Prediction failed: Model evaluation error'. What is the most likely cause?

A.The input data format is incorrect

B.The model was trained with a different framework

C.The model uses a scikit-learn version not supported by Vertex AI

D.The endpoint is overloaded and timing out

AnswerC

Version mismatch causes evaluation failure.

Why this answer

Vertex AI Prediction supports specific versions of scikit-learn for serving models. If the model was trained with a version that is not in the supported list (e.g., 0.19, 0.20, 0.22, 0.23, 0.24, 1.0, 1.1), the prediction endpoint will fail with a 'Model evaluation error' because the underlying runtime cannot load the serialized model (e.g., pickle or joblib file). This is the most likely cause of a 400 error when the input format is otherwise correct.

Exam trap

Google Cloud often tests the misconception that a 400 error always indicates a client-side input format issue, but here the error message 'Model evaluation error' points to a server-side model loading failure due to version incompatibility, not the input data.

How to eliminate wrong answers

Option A is wrong because an incorrect input data format typically results in a different error message, such as 'Invalid input' or 'Prediction failed: Input parsing error', not 'Model evaluation error'. Option B is wrong because Vertex AI Prediction supports multiple frameworks (TensorFlow, PyTorch, XGBoost, scikit-learn) and will not throw a 'Model evaluation error' solely due to a different training framework; it would fail at model upload or deployment with an unsupported framework error. Option D is wrong because an overloaded endpoint or timeout would return a 429 (Too Many Requests) or 504 (Gateway Timeout) status code, not a 400 error with 'Model evaluation error'.

859

MCQhard

A team is monitoring a production ML system that includes multiple models and data processing pipelines. They want to set up a comprehensive alerting strategy that minimizes false positives while ensuring critical issues are promptly addressed. Which approach is the most effective?

A.Set up alerts for all possible error conditions

B.Use static thresholds based on historical data

C.Rely on manual monitoring during business hours

D.Use AIOps with anomaly detection to dynamically adjust thresholds

AnswerD

AIOps anomaly detection models learn normal behavior and flag deviations, reducing false positives while detecting real anomalies.

Why this answer

Option D is correct because AIOps with anomaly detection uses machine learning to dynamically adjust alert thresholds based on real-time system behavior, reducing false positives while ensuring critical issues are detected promptly. This approach adapts to changing data distributions and traffic patterns, unlike static thresholds that require manual tuning and often miss subtle anomalies. It is the most effective strategy for complex ML production systems where multiple models and pipelines interact, as it can correlate signals across components to identify genuine incidents.

Exam trap

The trap here is that candidates often choose static thresholds (Option B) because they seem simpler and more predictable, but they fail to recognize that production ML systems require adaptive thresholds to handle dynamic data distributions and avoid alert fatigue.

How to eliminate wrong answers

Option A is wrong because setting alerts for all possible error conditions leads to alert fatigue, overwhelming the team with noise and causing critical issues to be missed; it lacks prioritization and ignores the need for intelligent filtering. Option B is wrong because static thresholds based on historical data fail to adapt to concept drift, seasonal patterns, or sudden traffic spikes, resulting in either too many false positives or missed anomalies when the system behavior changes. Option C is wrong because relying on manual monitoring during business hours introduces unacceptable latency for critical issues that occur outside those hours, and human error or fatigue can cause delays in detection; it is not scalable for 24/7 production ML systems.

860

Multi-Selecteasy

An ML engineer is creating a Vertex AI Pipeline that includes a loop to train multiple models in parallel on different hyperparameter sets. Which TWO KFP SDK v2 constructs can be used to implement this parallel execution?

Select 2 answers

A.dsl.If

B.dsl.Parallel

C.A for loop in the pipeline function that creates multiple tasks

D.dsl.Pipeline

E.dsl.Collected

AnswersC, E

You can use a Python for loop to generate multiple task invocations, effectively creating parallel tasks.

Why this answer

Option C is correct because in KFP SDK v2 a Python for loop inside the pipeline function runs when the pipeline is compiled and creates one task per hyperparameter set. The Vertex AI Pipelines orchestrator executes these tasks in parallel as long as there are no data dependencies between them. Option E is correct because dsl.Collected works together with dsl.ParallelFor to gather the outputs of the parallel iterations for a downstream step.

Exam trap

Google Cloud often tests the misconception that a dedicated 'Parallel' construct exists in KFP SDK v2. Parallel execution is instead achieved through a Python loop or dsl.ParallelFor, and candidates may confuse dsl.ParallelFor with the nonexistent dsl.Parallel.

861

MCQhard

A company is deploying multiple models on a single Vertex AI endpoint to reduce costs. Each model has different traffic patterns. Which configuration should they use?

A.Use Cloud Run to serve each model as a separate service.

B.Use a single endpoint with multiple deployed models and traffic allocation.

C.Deploy each model to separate endpoints and use a load balancer.

D.Use Vertex AI Matching Engine to serve models.

AnswerB

Multi-model serving allows deploying several models on one endpoint with traffic splitting.

Why this answer

Vertex AI endpoints support deploying multiple models behind a single endpoint with traffic splitting, allowing you to route different percentages of requests to each model based on their traffic patterns. This reduces infrastructure costs compared to separate endpoints, as the endpoint's underlying compute resources are shared. Traffic allocation can be adjusted dynamically to match changing model usage without redeploying.
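
A traffic split is a mapping from deployed model ID to an integer percentage, and the percentages must total 100. The sketch below validates such a mapping and simulates how requests would be distributed. The model IDs are hypothetical, and the router is only an illustration: on a real endpoint the service performs the routing.

```python
import random


def validate_traffic_split(split):
    """Check that percentages are non-negative integers that total 100."""
    if not split:
        raise ValueError("traffic split must not be empty")
    if any(not isinstance(v, int) or v < 0 for v in split.values()):
        raise ValueError("percentages must be non-negative integers")
    if sum(split.values()) != 100:
        raise ValueError("percentages must add up to 100")


def pick_deployed_model(split, rng):
    """Choose a deployed model ID with probability proportional to its share."""
    validate_traffic_split(split)
    model_ids = list(split)
    weights = [split[model_id] for model_id in model_ids]
    return rng.choices(model_ids, weights=weights, k=1)[0]


split = {"model-a": 80, "model-b": 20}  # hypothetical deployed model IDs
rng = random.Random(0)
sample = [pick_deployed_model(split, rng) for _ in range(1000)]
```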

Exam trap

The trap here is that candidates confuse Vertex AI endpoints with generic load balancing (Option C) or think separate services (Option A) are needed for different models, missing the cost-saving capability of traffic splitting on a single endpoint.

How to eliminate wrong answers

Option A is wrong because Cloud Run serves each model as a separate service, which does not consolidate models under a single endpoint and incurs additional networking and management overhead, failing to reduce costs as intended. Option C is wrong because deploying each model to separate endpoints and using a load balancer adds complexity and cost (multiple endpoints, load balancer charges) without leveraging Vertex AI's native traffic splitting, which is more efficient. Option D is wrong because Vertex AI Matching Engine is designed for vector similarity search (e.g., embeddings), not for serving multiple prediction models with traffic allocation on a single endpoint.

862

MCQeasy

A machine learning engineer is exporting a trained model from Vertex AI Training to the Model Registry. Which artifact should they upload as the model artifact?

A.The saved model directory containing the model file(s) and any custom dependencies.

B.Only the model checkpoint file (.ckpt or .h5).

C.The entire training directory including training code and logs.

D.A zip file of the training source code.

AnswerA

This is the standard artifact expected by Vertex AI for deployment.

Why this answer

When exporting a trained model from Vertex AI Training to the Model Registry, the correct artifact is the saved model directory that contains the model file(s) (e.g., SavedModel format for TensorFlow, model.pkl for scikit-learn) along with any custom dependencies required for serving. This ensures the model can be deployed consistently to endpoints or batch predictions, as the Model Registry expects a self-contained artifact that includes both the model binary and its runtime dependencies.

Exam trap

Google Cloud often tests the distinction between training artifacts (checkpoints, code) and deployable model artifacts, trapping candidates who confuse a checkpoint (used for resuming training) with a final, serving-ready model.

How to eliminate wrong answers

Option B is wrong because a model checkpoint file (.ckpt or .h5) is an intermediate training state, not a final deployable artifact; it lacks the serialized graph and serving signatures needed for inference. Option C is wrong because uploading the entire training directory, including training code and logs, introduces unnecessary files and violates the Model Registry's expectation of a minimal, serving-ready artifact. Option D is wrong because a zip file of the training source code contains no model weights or architecture, making it useless for deployment.

863

MCQmedium

A data science team uses BigQuery to store raw data and Vertex AI for model training. They want to ensure that only authorized users can access training data, and that model artifacts are automatically versioned and tracked. Which combination of Google Cloud services should they use?

A.Dataflow for data access control and Vertex AI Experiments for model tracking

B.Cloud Storage with bucket-level IAM and Cloud Build for versioning

C.Cloud Composer for data access control and Cloud Source Repositories for model versioning

D.Vertex AI Feature Store with access control and Vertex AI ML Metadata for model versioning

AnswerD

Vertex AI Feature Store provides controlled access to features, and ML Metadata tracks model artifacts and versions.

Why this answer

Vertex AI Feature Store provides fine-grained access control to training data, ensuring only authorized users can access it. Vertex AI ML Metadata automatically tracks and versions model artifacts, lineage, and parameters, which aligns with the requirement for automated versioning and tracking.

Exam trap

Google Cloud often tests the distinction between services that handle data processing (Dataflow, Cloud Composer) versus those that handle access control and metadata management (Feature Store, ML Metadata), leading candidates to confuse orchestration or CI/CD tools with versioning and access control solutions.

How to eliminate wrong answers

Option A is wrong because Dataflow is a data processing service, not an access control mechanism; it does not provide data access control for training data in BigQuery or Vertex AI. Option B is wrong because Cloud Storage with bucket-level IAM can control access to stored objects, but Cloud Build is a CI/CD service for building and deploying applications, not for versioning model artifacts automatically. Option C is wrong because Cloud Composer is a workflow orchestration service (based on Apache Airflow), not a data access control solution, and Cloud Source Repositories is a Git repository for source code, not designed for model versioning or tracking.

864

MCQmedium

A team wants to collect ground truth labels for their model deployed on Vertex AI Endpoint to perform model quality monitoring. They have a process that generates actual outcomes within 24 hours of prediction. What is the recommended approach for storing these labels?

A.Upload the ground truth labels to a BigQuery table with a schema that includes prediction timestamp and model version.

B.Use Vertex AI Experiments to log ground truth alongside training runs.

C.Store the ground truth labels in Cloud Storage as CSV files and reference them in the monitoring config.

D.Insert ground truth labels directly into the Vertex AI Endpoint's log sink.

AnswerA

Correct approach; BigQuery is required for ground truth storage.

Why this answer

Vertex AI Model Monitoring for model quality requires ground truth data to be uploaded to BigQuery. Labels are stored in a BigQuery table with the prediction timestamp and model ID for comparison.

865

MCQmedium

A retail company wants to predict customer churn using historical purchase data stored in BigQuery. The data includes customer demographics, transaction history, and support interactions. The team is comfortable writing SQL and wants to avoid moving data to a separate environment. Which approach should they take?

A.Use the Cloud Natural Language API to analyze customer support interactions and combine results with purchase data in BigQuery.

B.Export the data to a CSV file and use Vertex AI AutoML Tables to train a classification model.

C.Use BigQuery ML to create a logistic regression model (LOGISTIC_REG) on the data directly in BigQuery.

D.Create a Dataflow pipeline to stream data to Cloud SQL and use Cloud SQL's built-in ML functions.

AnswerC

BigQuery ML supports logistic regression for binary classification and runs entirely in BigQuery using SQL.

Why this answer

Option C is correct because BigQuery ML allows the team to build and train a logistic regression model directly on data stored in BigQuery using SQL syntax, without moving data to a separate environment. The LOGISTIC_REG model type is specifically designed for binary classification tasks like churn prediction, and it runs entirely within BigQuery's serverless infrastructure, satisfying the team's requirement to avoid data movement.
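
As a rough illustration of option C, the sketch below assembles a BigQuery ML training statement from Python. The model, table, and label names (`retail.churn_model`, `retail.customer_features`, `churned`) are hypothetical placeholders; only `CREATE MODEL`, `model_type = 'LOGISTIC_REG'`, and `input_label_cols` come from BigQuery ML itself.

```python
def build_churn_model_sql(model: str, source_table: str, label_col: str) -> str:
    """Return a BigQuery ML CREATE MODEL statement for binary classification."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['{label_col}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names; substitute the team's own dataset, table, and label column.
sql = build_churn_model_sql("retail.churn_model", "retail.customer_features", "churned")
print(sql)
```

The resulting statement would run in the BigQuery console or any BigQuery client, so the training data never leaves BigQuery.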

Exam trap

Google Cloud often tests the misconception that ML requires moving data to a separate platform (like Vertex AI or Cloud SQL), when in fact BigQuery ML provides a low-code, SQL-based solution that keeps data in place and meets the stated constraints.

How to eliminate wrong answers

Option A is wrong because the Cloud Natural Language API is used for text analysis (e.g., sentiment extraction), not for training a predictive churn model; it would require additional steps to combine results and does not provide a built-in classification model. Option B is wrong because exporting data to a CSV file and using Vertex AI AutoML Tables violates the requirement to avoid moving data to a separate environment, and it introduces unnecessary data egress and manual steps. Option D is wrong because Cloud SQL does not have built-in ML functions for training classification models; it is a relational database service, and streaming data through Dataflow to Cloud SQL adds complexity and does not leverage BigQuery's native ML capabilities.

866

Drag & Dropmedium

Drag and drop the steps to set up data lineage tracking for ML pipelines using Vertex AI Experiments in the correct order.

Why this order

Start with SDK setup, then create an experiment, log metrics, record artifacts, and review lineage.

867

Multi-Selectmedium

Which THREE are key capabilities of Vertex AI Feature Store?

Select 3 answers

A.Automatic generation of feature embeddings

B.Feature monitoring and validation to detect skew

C.Online serving for low-latency feature retrieval

D.Real-time streaming ingestion from Apache Kafka

E.Offline batch serving for training

AnswersB, C, E

Feature Store includes monitoring for distribution changes.

Why this answer

Option B is correct because Vertex AI Feature Store provides built-in feature monitoring and validation capabilities that detect training-serving skew and data drift. This is critical for maintaining model performance in production, as it alerts when the distribution of feature values changes between training and serving environments.

Exam trap

Google Cloud often tests the misconception that Vertex AI Feature Store includes automatic embedding generation or direct Kafka integration, when in fact these are separate services or require custom implementation.

868

MCQhard

A Vertex AI pipeline is triggered from Cloud Build using the configuration above. The pipeline fails with an error: 'Unable to submit build: The source code is not available.' What is the most likely cause?

A.The Docker build step failed silently due to a missing dependency.

B.The 'gcloud builds submit' command does not have access to the source code in the Cloud Build environment.

C.The Docker image tag does not include a hash, causing the push to fail.

D.The Cloud Build service account lacks permission to access the Vertex AI Pipeline API.

AnswerB

The source code must be provided or referenced explicitly; using 'gcloud builds submit' in a step requires the source to be available via a trigger or artifact.

Why this answer

The error 'Unable to submit build: The source code is not available' indicates that the Cloud Build environment cannot locate the source code when the 'gcloud builds submit' command is executed. This typically happens when the pipeline is triggered from Cloud Build but the source code is not properly staged or accessible in the build context, often because the build configuration does not include the source directory or the source is not uploaded to Cloud Storage. Option B correctly identifies that the command lacks access to the source code in the Cloud Build environment.

Exam trap

Google Cloud often tests the distinction between source code availability errors and permission or build failures, leading candidates to mistakenly attribute the error to service account permissions or Docker issues when the root cause is a missing or misconfigured source path.

How to eliminate wrong answers

Option A is wrong because a silent Docker build step failure due to a missing dependency would produce a different error, such as 'Failed to build' or 'Docker build failed', not a source code unavailability error. Option C is wrong because the Docker image tag missing a hash would cause a push failure with an error like 'unauthorized' or 'tag invalid', not a source code availability issue. Option D is wrong because a permission issue with the Cloud Build service account accessing the Vertex AI Pipeline API would result in an authorization error (e.g., 'Permission denied'), not a source code unavailability error.

869

MCQeasy

A data engineer wants to compute feature aggregates over a large dataset stored in BigQuery and write the results to Vertex AI Feature Store. The pipeline must handle both batch and streaming data. Which Google Cloud service should they use?

A.BigQuery scheduled queries

B.Cloud Functions triggered by Pub/Sub

C.Cloud Dataproc with Spark

D.Cloud Dataflow with Apache Beam

AnswerD

Dataflow handles both batch and streaming, integrates with BigQuery and Feature Store.

Why this answer

Dataflow (Apache Beam) is the recommended service for both batch and streaming data processing at scale. It can read from BigQuery and write to Feature Store.

870

Multi-Selecthard

A team uses Vertex AI Pipelines for continuous training triggered by model drift. They want to monitor the pipeline execution cost and optimize resource usage. Which THREE metrics should they track? (Choose 3)

Select 3 answers

A.Pipeline execution duration

B.Number of failed pipeline runs

C.Model accuracy on validation set

D.Total GPU hours consumed per pipeline run

E.Cost per pipeline run in Cloud Billing

AnswersA, D, E

Longer duration increases cost; optimizing duration saves money.

Why this answer

Training cost is influenced by GPU hours, machine type, and training duration. Tracking these helps optimize.

871

Multi-Selectmedium

You are using Vertex AI to train a model with a custom container. You need to pass command-line arguments for hyperparameters. Which THREE methods can you use? (Choose 3.)

Select 3 answers

A.Store arguments in a Cloud Storage file and download at runtime.

B.Set environment variables using the 'env' field and reference them in the container.

C.Set hyperparameter values in the 'hyperparameters' field of the worker pool spec.

D.Use the 'command' field to override the entrypoint and include arguments.

E.Specify args in the 'args' field of the container spec.

AnswersB, D, E

Environment variables can also be used to pass arguments.

Why this answer

Option B is correct because Vertex AI custom container training allows you to pass environment variables via the 'env' field in the worker pool spec, which the container can then reference at runtime. Option D is correct because you can override the container's default entrypoint using the 'command' field and include command-line arguments directly in that field. Option E is also correct as the 'args' field in the container spec can be used to pass arguments that are appended to the entrypoint or command.
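
The three mechanisms can be sketched together as follows. The dictionary mirrors the `command`, `args`, and `env` fields of a custom job's container spec; the image URI, module name, and hyperparameter values are hypothetical, and `parse_hyperparameters` is an illustrative stand-in for the training script's entry point rather than anything Vertex AI provides.

```python
import argparse

# Hypothetical worker pool spec; the image URI and values are placeholders.
worker_pool_spec = {
    "machine_spec": {"machine_type": "n1-standard-4"},
    "replica_count": 1,
    "container_spec": {
        "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
        "command": ["python", "-m", "trainer.task"],  # overrides the image entrypoint
        "args": ["--learning-rate", "0.01"],          # passed to the command
        "env": [{"name": "BATCH_SIZE", "value": "64"}],
    },
}


def parse_hyperparameters(argv, environ):
    """What the training code inside the container might do at startup."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning-rate", type=float, default=0.001)
    args = parser.parse_args(argv)
    batch_size = int(environ.get("BATCH_SIZE", "32"))
    return {"learning_rate": args.learning_rate, "batch_size": batch_size}


# Simulate the container receiving its args and environment variables.
spec = worker_pool_spec["container_spec"]
env = {item["name"]: item["value"] for item in spec["env"]}
hparams = parse_hyperparameters(spec["args"], env)
print(hparams)
```

Here the learning rate arrives through `args` and the batch size through `env`; either value could equally be appended to `command`.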

Exam trap

Google Cloud often tests the distinction between the 'hyperparameters' field (used for tuning jobs) and the 'args'/'command' fields (used for passing static arguments), causing candidates to mistakenly select option C as a valid method for passing command-line arguments.

872

MCQmedium

A team uses Vertex AI Pipelines. They need to ensure that only certain team members can deploy models to production. What is the best approach?

A.Use Vertex AI Experiments to track models

B.Store model artifacts in a bucket with bucket-level permissions

C.Use IAM roles with custom permissions on the Vertex AI Model Registry

D.Create separate projects for dev and prod

AnswerC

Model Registry integrates with IAM to grant specific deployment permissions.

Why this answer

Option C is correct because Vertex AI Model Registry supports IAM roles with custom permissions, allowing fine-grained access control over who can promote or deploy models to production. By granting a role that carries the deployment permission (for example, a custom role that includes `aiplatform.endpoints.deploy`) to only authorized team members, you can restrict deployment actions while still permitting others to view or register models. This approach directly addresses the need to control production deployments without affecting other pipeline stages.

Exam trap

The trap here is that candidates often confuse artifact storage permissions (bucket-level IAM) with deployment permissions (model registry IAM), leading them to choose Option B, even though bucket permissions do not control the Vertex AI deployment API call.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing model training runs (e.g., hyperparameters, metrics), not for controlling access or permissions to deploy models. Option B is wrong because bucket-level permissions control access to the storage location of model artifacts, but they do not govern the deployment action itself within Vertex AI Pipelines; a user with bucket access could still lack deployment permissions, or vice versa. Option D is wrong because creating separate projects for dev and prod is an organizational boundary that can help with isolation, but it does not provide granular control over which specific team members can deploy within the same project; it also introduces overhead in managing multiple projects and does not leverage Vertex AI's native IAM capabilities for model registry operations.

873

MCQeasy

A machine learning engineer is building a pipeline with Vertex AI Pipelines and wants to pass a large dataset between components without copying it to the container's memory. What is the best practice for passing data between pipeline components?

A.Mount an NFS volume to all containers and share data via the filesystem.

B.Use Cloud Storage URIs (gs://) to point to the data location.

C.Serialize the dataset to JSON and include it as a pipeline parameter.

D.Use the importer component to load the data into the pipeline as an in-memory artifact.

AnswerB

Using GCS URIs allows components to read/write data efficiently and supports caching.

Why this answer

Option B is correct because Vertex AI Pipelines natively supports passing Cloud Storage URIs (gs://) as artifact references between components, allowing components to read the dataset directly from GCS without copying it into container memory. This avoids memory limits and enables efficient handling of large datasets by leveraging GCS's scalable object storage.
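
The pass-by-reference pattern can be sketched without any pipeline SDK. In the example below a local temporary file stands in for a `gs://` object, and `preprocess` and `train` are hypothetical component functions; the point is only that the second step receives a location string and streams rows from it, rather than receiving the rows themselves.

```python
import csv
import os
import tempfile


def preprocess(output_dir: str) -> str:
    """Upstream step: write the dataset to storage and return only its location."""
    path = os.path.join(output_dir, "train.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["customer_id", "amount"])
        for i in range(1000):
            writer.writerow([i, i * 2])
    return path  # in a real pipeline this would be a gs:// URI


def train(dataset_uri: str) -> int:
    """Downstream step: stream rows from the location, one at a time."""
    total = 0
    with open(dataset_uri, newline="") as f:
        for row in csv.DictReader(f):
            total += int(row["amount"])
    return total


with tempfile.TemporaryDirectory() as tmp:
    uri = preprocess(tmp)   # only this short string crosses the component boundary
    total = train(uri)
print(total)
```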

Exam trap

Google Cloud often tests the misconception that large data must be passed as in-memory artifacts or serialized parameters, when the correct cloud-native pattern is to pass a storage URI and let components read the data lazily from object storage.

How to eliminate wrong answers

Option A is wrong because mounting an NFS volume introduces network filesystem latency, requires additional infrastructure setup, and is not a native or recommended pattern in Vertex AI Pipelines, which is designed for serverless, cloud-native artifact passing. Option C is wrong because serializing a large dataset to JSON and including it as a pipeline parameter would exceed the maximum parameter size limit (typically 512 KB in Vertex AI Pipelines) and would force the entire dataset into memory, defeating the purpose of avoiding memory copies. Option D is wrong because the importer component registers an external artifact (e.g., a GCS URI) into the pipeline's metadata store but does not load data into memory; the misconception is that it creates an in-memory artifact, whereas it merely creates a metadata reference.

874

Multi-Selectmedium

A company is fine-tuning a large language model (Gemma 7B) using Vertex AI Model Garden. They want to reduce the model's memory footprint for deployment on edge devices. Which THREE model compression techniques should they consider?

Select 3 answers

A.Post-training quantization to INT8

B.Using a larger batch size during inference

C.Quantization-aware training

D.Increasing the number of layers

E.Pruning of weights with small magnitude

AnswersA, C, E

Reduces model size and speeds up inference.

Why this answer

Post-training quantization (e.g., to INT8) reduces precision and size. Quantization-aware training simulates quantization during training for better accuracy. Pruning removes redundant weights.

Knowledge distillation trains a smaller model. For edge deployment, quantization (both post-training and quantization-aware) and pruning are common. Distillation is also valid but often considered separate.
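
To make two of these techniques concrete, here is a minimal pure-Python sketch of symmetric INT8 post-training quantization and magnitude pruning on a toy weight vector. The weights and the pruning threshold are made-up values, and production toolchains apply the same ideas per tensor or per channel with calibration data; this is only an illustration of the arithmetic.

```python
def quantize_int8(weights):
    """Symmetric post-training quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in quantized]


def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


weights = [0.50, -1.27, 0.003, 0.80, -0.01]  # toy values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = prune_by_magnitude(weights, threshold=0.05)

print(q)       # one signed byte per weight instead of a 4-byte float
print(pruned)  # small weights removed, leaving a sparser vector
```

Each quantized weight is off by at most half a quantization step (`scale / 2`), which is the precision traded for the roughly 4x reduction in storage relative to 32-bit floats.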

875

MCQhard

A company wants to deploy a TensorFlow model on edge devices for real-time inference without internet connectivity. Which Vertex AI service should they use to manage the deployment?

A.TensorFlow Lite Converter

B.Vertex AI Endpoint

C.Vertex AI Edge Manager

D.AI Platform Prediction (legacy)

AnswerC

Edge Manager specifically manages model deployment on edge devices.

Why this answer

Vertex AI Edge Manager allows you to deploy models to edge devices and manage them centrally.

876

MCQhard

Refer to the exhibit. A Machine Learning Engineer attempts to deploy a model to a Vertex AI Endpoint for online predictions but receives an error. What is the most likely cause of this error?

A.The model is not compatible with the selected machine type.

B.The machine type does not support GPU acceleration.

C.The min replica count is set to 0, which is not allowed for online prediction.

D.The endpoint is not in the same region as the model.

AnswerC

The error clearly states that min_replica_count must be at least 1.

Why this answer

Vertex AI online prediction endpoints require at least one replica to serve traffic. Setting `min_replica_count` to 0 is only valid for batch prediction, not for online prediction, because the endpoint must always have a running instance to handle incoming requests. The error occurs because the deployment request violates this constraint, causing the API to reject the configuration.

Exam trap

Google Cloud often tests the distinction between batch and online prediction configuration requirements, specifically that `min_replica_count = 0` is valid for batch but invalid for online, leading candidates to overlook this subtle but critical constraint.

How to eliminate wrong answers

Option A is wrong because Vertex AI validates model compatibility with the selected machine type at deployment time; an incompatibility would produce an error specific to that mismatch, not one about the replica count. Option B is wrong because GPU acceleration is optional for online prediction; the error message would explicitly mention GPU-related issues if that were the cause. Option D is wrong because a model and the endpoint it is deployed to must be in the same region, and a mismatch would surface as a location or resource-not-found error rather than a validation error on `min_replica_count`.

877

MCQmedium

A machine learning engineer notices that the Vertex AI Prediction endpoint's error rate has increased over the past week. The model was retrained with new data and redeployed. Which step should the engineer take first to diagnose the issue?

A.Increase the number of replicas to reduce error rate.

B.Compare the input data distribution of recent requests to the training data distribution using Explainable AI.

C.Roll back to the previous model version immediately.

D.Check the Cloud Monitoring dashboard for latency and error codes, and review the model's prediction logs.

AnswerD

Monitoring and logs provide direct evidence to diagnose errors.

Why this answer

Option D is correct because reviewing Cloud Monitoring dashboards and logs provides immediate insight into error patterns and the root cause. Option C (rolling back immediately) is premature without investigation. Option B is more advanced and requires setup.

Option A might temporarily reduce errors caused by overload but does not address the underlying cause.

878

Multi-Selecteasy

Which TWO practices help ensure reproducible ML experiments?

Select 2 answers

A.Store all artifacts in a temporary bucket

B.Use a random seed for each run

C.Use Vertex AI Experiments to track parameters and metrics

D.Version control training code with Cloud Source Repositories

E.Use preemptible VMs

AnswersC, D

Experiments record the exact configuration and results.

Why this answer

Vertex AI Experiments automatically logs parameters, metrics, and artifacts for each run, creating a complete lineage that enables exact reproduction of results. By tracking these details alongside the code version, you can recreate the exact environment and configuration that produced a given outcome, which is essential for reproducibility.
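
One parameter worth recording alongside the others is the random seed. The sketch below uses a hypothetical `run_experiment` function to show why: a run that uses a fixed, logged seed can be repeated exactly, whereas a fresh seed on every run cannot.

```python
import random


def run_experiment(seed: int, n: int = 5):
    """Stand-in for a training run whose only randomness comes from the seed."""
    rng = random.Random(seed)  # isolated generator, so global state is untouched
    return [rng.random() for _ in range(n)]


first = run_experiment(seed=42)
repeat = run_experiment(seed=42)   # same logged seed: identical result
other = run_experiment(seed=43)    # different seed: different result

print(first == repeat)
print(first == other)
```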

Exam trap

Google Cloud often tests the distinction between practices that improve reproducibility (like tracking parameters and versioning code) versus practices that improve cost efficiency or speed (like using preemptible VMs or temporary storage), leading candidates to conflate operational convenience with scientific reproducibility.

879

MCQhard

A team uses custom training and deploys a TensorFlow model using Vertex AI Endpoints. They set up Cloud Monitoring alerts for online prediction latency. However, they notice the latency metric shows a spike every hour, but the actual user experience is fine. What could be the cause?

A.The metric includes prediction time plus log writing time

B.The alert threshold is too low

C.The metric is being sampled every hour

D.A monitoring agent on the VM is causing additional latency

AnswerA

Periodic log dumping can cause hourly spikes in measured latency.

Why this answer

Option A is correct because Vertex AI Endpoints' latency metric includes both the model inference time and the time taken to write prediction logs to Cloud Logging. This log writing occurs asynchronously but can cause periodic spikes in the reported latency metric when log buffers flush, even though the actual user-facing prediction latency remains unaffected. The spike every hour aligns with log rotation or buffer flush intervals, not with actual prediction performance degradation.

Exam trap

Google Cloud often tests the misconception that latency metrics reflect only model inference time, when in reality they may include ancillary operations like logging, causing candidates to overlook the logging overhead as the source of periodic spikes.

How to eliminate wrong answers

Option B is wrong because the alert threshold being too low would cause continuous or frequent alerts, not a predictable hourly spike in the latency metric itself. Option C is wrong because sampling every hour would produce a single data point per hour, not a spike within the metric; the metric is reported continuously, and sampling frequency does not create spikes. Option D is wrong because a monitoring agent on the VM would add consistent overhead, not a periodic hourly spike, and Vertex AI Endpoints are managed services where customers do not manage VMs directly for prediction serving.

880

MCQmedium

A company is training a large neural network on Vertex AI and training jobs keep failing with 'Out of memory' errors. The VM uses a standard n1-standard-4 machine with 15 GB RAM. Which action should they take first?

A.Use a larger machine type like n1-standard-16

B.Reduce the batch size in the training script

C.Enable distributed training across multiple VMs

D.Switch the training to CPU only

AnswerB

Smaller batch size reduces peak memory usage.

Why this answer

The 'Out of memory' error on a n1-standard-4 VM (15 GB RAM) indicates the model's memory footprint exceeds available RAM. Reducing the batch size directly decreases the memory required for storing intermediate activations and gradients during training, which is the most immediate and cost-effective fix without changing the underlying infrastructure.
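
A back-of-the-envelope estimate shows why batch size is the first lever to pull. The function below is a rough sketch, not a profiler: it assumes 4-byte values, an optimizer that keeps two extra copies of the weights, and hypothetical figures for parameter count and activations per example.

```python
def estimate_training_memory_gb(param_count, batch_size, activations_per_sample,
                                bytes_per_value=4, optimizer_copies=2):
    """Rough estimate: weights + gradients + optimizer state + activations."""
    # Fixed cost: weights, gradients, and optimizer state do not depend on batch size.
    fixed = param_count * bytes_per_value * (2 + optimizer_copies)
    # Variable cost: stored activations grow linearly with batch size.
    activations = batch_size * activations_per_sample * bytes_per_value
    return (fixed + activations) / 1e9


estimates = {}
for batch_size in (256, 128, 64):
    estimates[batch_size] = estimate_training_memory_gb(
        param_count=200_000_000,            # hypothetical model size
        batch_size=batch_size,
        activations_per_sample=15_000_000,  # hypothetical activations per example
    )
    print(batch_size, round(estimates[batch_size], 2))
```

Under these assumed numbers a batch of 256 needs about 18.6 GB and overflows the 15 GB machine, while halving it to 128 brings the estimate to about 10.9 GB with no change to the infrastructure.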

Exam trap

The trap here is that candidates often jump to scaling up infrastructure (larger machine or distributed training) instead of first tuning the training hyperparameter (batch size) that directly controls memory consumption, which is the simplest and most cost-effective fix.

How to eliminate wrong answers

Option A is wrong because upgrading to a larger machine type (e.g., n1-standard-16) increases cost and may still fail if the batch size is too large for the model; it does not address the root cause of memory pressure. Option C is wrong because enabling distributed training across multiple VMs introduces network overhead and synchronization complexity (e.g., all-reduce with NCCL), and data-parallel training keeps a full model replica on every VM, so per-VM memory consumption does not fall unless the per-worker batch size is also reduced. Option D is wrong because the n1-standard-4 described has no GPU attached, so the job already runs on the CPU and host RAM; switching to CPU only would not reduce memory use.

881

MCQhard

A financial services firm deploys a binary classification model for fraud detection. The model's precision is 0.95 and recall is 0.60 on the test set. After deployment, the fraud rate in production is 0.5% compared to 5% in the test set. The model shows good calibration on the test set (Brier score 0.02) but poor calibration in production (Brier score 0.15). What is the most likely explanation for the calibration degradation?

A.The distribution of input features has shifted significantly, causing the model to produce incorrect probabilities.

B.The model overfits to noise in the training data, leading to poor generalization.

C.The production data has a different class imbalance than the training data, causing the model to be biased toward the majority class.

D.The relationship between features and the target has changed (concept drift), causing the model's probability estimates to be misaligned with the true probabilities.

AnswerD

Concept drift changes the conditional distribution P(Y|X), which directly affects calibration.

Why this answer

The model is well calibrated on the test set, which had a 5% fraud rate, but production has a 0.5% fraud rate. The model's predicted probabilities are conditional on the training distribution, so this change in base rate (prior probability shift) alters the true posterior P(Y|X) by Bayes' rule even if fraudulent and legitimate transactions look the same as before. Option D is correct in the sense the answer key uses it: the relationship between features and target probability that the model learned no longer holds in production, so its probability estimates are misaligned with the true probabilities and the Brier score rises.
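
The effect of the base-rate change on a single score can be worked through with Bayes' rule. The sketch below assumes a pure prior shift, meaning the feature distribution within each class is unchanged and only the fraud rate moves from 5% to 0.5%; that assumption and the example score of 0.60 are illustrative, not taken from the question.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def adjust_for_new_prior(p, train_prior, prod_prior):
    """Re-weight a predicted probability when only the base rate has changed."""
    pos = p * prod_prior / train_prior
    neg = (1 - p) * (1 - prod_prior) / (1 - train_prior)
    return pos / (pos + neg)


p_test = 0.60  # hypothetical score from the deployed model
p_prod = adjust_for_new_prior(p_test, train_prior=0.05, prod_prior=0.005)
print(round(p_prod, 3))
```

Under this assumption a transaction scored 0.60 has a true production fraud probability of only about 0.125, so the raw scores are systematically too high and the Brier score worsens even though the ranking of transactions is unchanged.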

Exam trap

The trap here is that candidates confuse covariate shift (feature distribution change) with prior probability shift (class imbalance change), and incorrectly attribute calibration degradation to feature drift rather than the direct effect of base rate change on probability estimates.

How to eliminate wrong answers

Option A is wrong because input feature distribution shift (covariate shift) would primarily affect the model's feature space and could degrade calibration, but the core issue here is the change in class imbalance (prior probability shift), not feature distribution. Option B is wrong because overfitting to noise would manifest as poor performance on both test and production sets, but the model shows good calibration on the test set (Brier score 0.02), indicating it generalizes well to the test distribution. Option C is wrong because while the production data has a different class imbalance, the model is not necessarily biased toward the majority class; the degradation is due to the mismatch between the training prior and production prior, which directly skews probability estimates regardless of majority class bias.

882

MCQmedium

An engineer is configuring Vertex AI Model Monitoring for a model deployed on an endpoint. They want to monitor feature skew using the training dataset as a baseline. The training dataset is large (10 TB). What is the most efficient way to provide the baseline distribution?

A.Compute the baseline distribution offline and upload a JSON file with feature statistics

B.Upload the entire training dataset as a CSV file to Cloud Storage and reference it in the monitoring config

C.Sample 1% of the training data and use that as baseline

D.Use the BigQuery table containing the training data as the baseline source in the monitoring configuration

AnswerD

Correct: Vertex AI Model Monitoring supports BigQuery tables as baseline sources.

Why this answer

Vertex AI Model Monitoring can automatically compute the baseline distribution from the training dataset stored in BigQuery. This avoids manual computation or exporting large datasets.

883

Multi-Selectmedium

Which TWO options are recommended practices for managing model versions across teams in Google Cloud?

Select 2 answers

A.Store all model files in a GitHub repository

B.Maintain a custom database to map model names to artifact locations

C.Use AI Platform (Unified) Models as the primary model registry

D.Use Vertex AI Model Registry to track model versions and their deployment history

E.Use Cloud Storage buckets with object versioning enabled to store model artifacts

AnswersD, E

Model Registry is the recommended service for managing model versions.

Why this answer

Vertex AI Model Registry is the recommended service for managing model versions across teams because it provides a centralized repository to track model versions, their associated metadata, and deployment history. It integrates natively with Vertex AI endpoints and pipelines, enabling consistent governance and lineage tracking across the ML lifecycle.

Exam trap

Google Cloud often tests the distinction between legacy AI Platform (Unified) Models and the current Vertex AI Model Registry, expecting candidates to recognize that the registry is the recommended service for version management and deployment history, not just a generic model storage location.

884

MCQmedium

A retail company wants to build a customer churn prediction model using BigQuery ML. The data is stored in BigQuery tables and includes customer demographics, purchase history, and support interactions. The data scientist wants to experiment with different model types quickly without moving data to another environment. Which approach should they use?

A.Use Cloud Composer to orchestrate a custom training pipeline on Vertex AI.

B.Use AI Platform Notebooks with pandas and scikit-learn.

C.Use BigQuery ML to create and evaluate models directly in BigQuery.

D.Export the data to Cloud Storage and use Vertex AI AutoML Tables.

AnswerC

Why C is correct: BigQuery ML is a low-code solution that works directly on BigQuery data.

Why this answer

BigQuery ML (BQML) allows data scientists to create, train, and evaluate machine learning models directly in BigQuery using SQL, without moving data to another environment. This approach supports rapid experimentation with various model types (e.g., logistic regression, boosted trees, deep neural networks) and is ideal for the stated requirement of quick iteration while keeping data in place.

Exam trap

Google Cloud often tests the candidate's ability to recognize that BigQuery ML is purpose-built for low-code, in-database ML experimentation, and the trap here is assuming that more complex or external tools (like Vertex AI or Cloud Composer) are necessary when the simpler, integrated solution suffices.

How to eliminate wrong answers

Option A is wrong because Cloud Composer is an orchestration tool for workflows, not a direct model training environment; using it to build a custom pipeline on Vertex AI would require moving data and add unnecessary complexity. Option B is wrong because AI Platform Notebooks with pandas and scikit-learn require exporting data from BigQuery to a Python environment, violating the requirement to keep data in BigQuery. Option D is wrong because exporting data to Cloud Storage for Vertex AI AutoML Tables introduces data movement and latency, contradicting the need for quick experimentation without moving data.

885

MCQhard

You are building a Vertex AI pipeline using the KFP SDK v2. One component processes a large dataset and outputs a metrics artifact. You notice that the component is being cached even when the dataset changes, because the component code and image remain the same. How can you force the component to always re-execute when the dataset changes?

A.Use the dsl.CacheKey annotation to explicitly set cache keys.

B.Set caching_strategy.max_cache_staleness = "0s" on the component.

C.Change the component's image tag to :latest.

D.Add a random integer parameter to the component to vary the inputs.

AnswerB

Setting max_cache_staleness to 0s disables caching for that component, forcing re-execution.

Why this answer

In KFP SDK v2, caching is keyed on the component code, image digest, and all input values. If the dataset is passed as a URI string, changing the URI will invalidate the cache. If the dataset changes without a URI change, you can disable caching for that task, either by setting caching_strategy.max_cache_staleness to a zero duration or, in KFP SDK v2, by calling set_caching_options(False) on the task.
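
The caching behaviour can be illustrated with a simplified fingerprint. The `cache_key` function below is a hypothetical stand-in, not the algorithm KFP or Vertex AI Pipelines actually uses, and the image digest and bucket paths are placeholders; it only shows why overwriting a file in place leaves the fingerprint unchanged while a versioned URI changes it.

```python
import hashlib
import json


def cache_key(image_digest: str, command: list, inputs: dict) -> str:
    """Simplified stand-in for a pipeline cache fingerprint."""
    payload = json.dumps(
        {"image": image_digest, "command": command, "inputs": inputs},
        sort_keys=True,  # stable ordering so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


image = "sha256:abc123"          # hypothetical image digest
cmd = ["python", "process.py"]   # hypothetical component command

key_v1 = cache_key(image, cmd, {"dataset_uri": "gs://my-bucket/data/latest.csv"})
key_same_uri = cache_key(image, cmd, {"dataset_uri": "gs://my-bucket/data/latest.csv"})
key_new_uri = cache_key(image, cmd, {"dataset_uri": "gs://my-bucket/data/2024-06-01.csv"})

print(key_v1 == key_same_uri)  # file overwritten in place: fingerprint unchanged
print(key_v1 == key_new_uri)   # versioned URI: fingerprint changes
```

Writing each dataset snapshot to a new path is therefore an alternative to disabling the cache: unchanged data still gets a cache hit, and new data forces a re-run.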

886

Multi-Selecthard

A team wants to monitor features in Vertex AI Feature Store for drift. Which TWO configurations are required?

Select 2 answers

A.Enable feature monitoring on the feature view

B.Set up a BigQuery sink for monitoring results

C.Configure an alerting channel (e.g., email) for drift notifications

D.Create a Cloud Scheduler job to trigger monitoring

E.Deploy a monitoring model to an endpoint

AnswersA, C

Required to compute statistics and detect drift.

Why this answer

Enable feature monitoring on the feature view and configure an alerting channel (e.g., email, Pub/Sub) for drift notifications.

887

MCQmedium

A data scientist notices that the model's prediction latency has increased over the last week. They need to investigate the root cause by examining request and response logs for the Vertex AI Endpoint. What is the recommended way to capture these logs?

A.Enable Vertex AI Model Monitoring with sampling rate 100%

B.Export logs to Cloud Storage using a sink and use gcloud logging read

C.Use Cloud Monitoring custom metrics to capture latency per request

D.Configure request/response logging by specifying a BigQuery destination in the endpoint deployment

AnswerD

Correct: Vertex AI Endpoints can log request/response to BigQuery when enabled in the deployment.

Why this answer

A Vertex AI Endpoint can be configured to log request/response data directly to a BigQuery table. This data can then be analyzed to understand latency issues.

888

MCQeasy

A team is building a feature pipeline for an ML model. They need to compute aggregate features over a sliding time window from streaming data. Which Google Cloud service is most appropriate for this task?

A.Cloud Dataflow with fixed windows

B.Cloud Pub/Sub for windowing logic

C.BigQuery scheduled queries

D.Cloud Functions with Pub/Sub triggers

AnswerA

Dataflow allows windowed aggregations (sliding, fixed, session) on streaming data.

Why this answer

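Cloud Dataflow is the most appropriate choice because it natively supports windowing and aggregation over streaming data using the Apache Beam programming model. Note that the option text says fixed windows, which are non-overlapping intervals (e.g., one window per 5 minutes); the sliding-window requirement in the question maps to Beam's sliding windows, which overlap (e.g., a 5-minute window that advances every minute). Dataflow supports both, so Option A is the only choice that can compute aggregate features such as sums or averages over streaming data.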
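
To make the difference between fixed and sliding windows concrete, here is a small plain-Python sketch. It is not the Apache Beam API; the function names, window sizes, and sample events are made up for illustration. It assigns each event timestamp to its window or windows and sums the values per window.

```python
from collections import defaultdict


def fixed_window(ts: int, size: int) -> tuple:
    """The single non-overlapping [start, end) window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> list:
    """All [start, start + size) windows, starting every `period`, containing ts."""
    windows = []
    start = ts - ts % period
    while start > ts - size:
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def windowed_sums(events, assign) -> dict:
    """Sum event values per window; `assign` maps a timestamp to its windows."""
    sums = defaultdict(float)
    for ts, value in events:
        for window in assign(ts):
            sums[window] += value
    return dict(sums)


# (timestamp in seconds, value) pairs, for illustration only.
events = [(10, 1.0), (40, 2.0), (70, 3.0)]

fixed = windowed_sums(events, lambda ts: [fixed_window(ts, 60)])
sliding = windowed_sums(events, lambda ts: sliding_windows(ts, 60, 30))
```

With a 60-second window sliding every 30 seconds, each event lands in two windows, so the events at 40 s and 70 s are aggregated together in the window (30, 90) even though they fall into different fixed windows.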

Exam trap

The trap here is that candidates confuse Cloud Pub/Sub's ability to handle streaming data with the ability to perform windowed aggregations, but Pub/Sub is only a transport layer and cannot compute features itself.

How to eliminate wrong answers

Option B is wrong because Cloud Pub/Sub is a messaging service that handles event ingestion and delivery, not windowing logic or stateful aggregation; it has no built-in capability to compute sliding window aggregates. Option C is wrong because BigQuery scheduled queries operate on batch data in tables, not on streaming data in real time, and they lack the low-latency, per-event windowing needed for a streaming feature pipeline. Option D is wrong because Cloud Functions with Pub/Sub triggers are stateless and ephemeral, with a maximum timeout of 9 minutes for event-driven functions (the 60-minute limit applies only to HTTP-triggered 2nd gen functions), making them unsuitable for maintaining sliding window state or performing continuous aggregation over streaming data.

Full explanation →

889

MCQhard

An ML engineer is trying to upload a TensorFlow model to Vertex AI using the gcloud command shown. The model was trained using TensorFlow 2.11 and saved with model.save('model/'). The engineer sees the error. What is the most likely cause?

A.The container port should be 8080 instead of 8501.

B.The service account does not have permission to access the bucket.

C.The container image is for TensorFlow 2.11 but the model was saved with an older version.

D.The model was saved in a format other than SavedModel (e.g., HDF5) or the artifact path does not contain the expected directory structure.

AnswerD

The error explicitly states no saved_model.pb found, indicating the model is not in SavedModel format.

Why this answer

Option D is correct because the error indicates that Vertex AI cannot find the expected SavedModel artifacts (saved_model.pb and variables/ directory) at the specified path. When using model.save('model/') with TensorFlow 2.11, the default format is the SavedModel format, but the artifact path must point to the directory containing the saved_model.pb file, not a parent directory or a model saved in HDF5 format. The gcloud command likely references a path that does not contain the required SavedModel structure, causing the upload to fail.
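
A quick pre-upload check along these lines can catch the problem early. This is a minimal sketch for a local export directory, with a hypothetical helper name; for a Cloud Storage path you would list objects under the prefix instead of using `os.path`.

```python
import os
import tempfile


def looks_like_saved_model(path: str) -> bool:
    """True if `path` directly contains saved_model.pb and a variables/ directory."""
    has_graph = os.path.isfile(os.path.join(path, "saved_model.pb"))
    has_variables = os.path.isdir(os.path.join(path, "variables"))
    return has_graph and has_variables


with tempfile.TemporaryDirectory() as root:
    # Simulate a SavedModel export at <root>/model.
    export_dir = os.path.join(root, "model")
    os.makedirs(os.path.join(export_dir, "variables"))
    open(os.path.join(export_dir, "saved_model.pb"), "wb").close()

    # Simulate an HDF5-only export at <root>/h5_only.
    h5_dir = os.path.join(root, "h5_only")
    os.makedirs(h5_dir)
    open(os.path.join(h5_dir, "model.h5"), "wb").close()

    correct_path_ok = looks_like_saved_model(export_dir)  # the export directory itself
    parent_path_ok = looks_like_saved_model(root)         # one level too high
    h5_path_ok = looks_like_saved_model(h5_dir)           # wrong format
```

Only the export directory itself passes; the parent directory and the HDF5-only directory both fail, which matches the two failure modes named in Option D.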

Exam trap

Google Cloud often tests the distinction between SavedModel and HDF5 formats, and candidates mistakenly assume that any model.save() call produces a valid SavedModel, overlooking that the artifact path must point to the correct directory structure with saved_model.pb.

How to eliminate wrong answers

Option A is wrong because the container port 8501 is the default for TensorFlow Serving's REST API, and Vertex AI's prediction container for TensorFlow models typically uses port 8501 for HTTP requests; port 8080 is used for custom containers, not for standard TensorFlow Serving images. Option B is wrong because the error message in the question does not mention permissions or access to a bucket; a bucket permission issue would produce a 403 or 401 error, not a model format error. Option C is wrong because the question states that the model was trained and saved with TensorFlow 2.11, so there is no version mismatch with a TensorFlow 2.11 container image; in addition, a TensorFlow 2.x serving image can generally load SavedModels written by earlier 2.x releases.

Full explanation →

890

MCQhard

An organization has multiple ML pipelines running on Vertex AI. They want to centralize monitoring and alerting for pipeline failures, including root cause analysis. Which combination of services should they use?

A.Cloud Trace + Cloud Debugger

B.Cloud Logging + Cloud Monitoring + Error Reporting

C.Cloud Operations for GKE + Stackdriver

D.Cloud Audit Logs + Cloud Functions

AnswerB

These services provide log aggregation, metrics, and error analysis for failures.

Why this answer

Option B is correct because Cloud Logging captures pipeline execution logs, Cloud Monitoring provides metrics and alerting on pipeline failures, and Error Reporting aggregates and analyzes errors with stack traces for root cause analysis. Together, they form a centralized observability stack that meets the requirement for monitoring, alerting, and root cause analysis of ML pipeline failures on Vertex AI.

Exam trap

The trap here is that candidates confuse Cloud Trace and Cloud Debugger (debugging tools) with the monitoring and logging services needed for failure detection and root cause analysis, or mistakenly think Cloud Audit Logs (compliance logs) are sufficient for pipeline error monitoring.

How to eliminate wrong answers

Option A is wrong because Cloud Trace is designed for latency analysis of distributed systems, not for monitoring pipeline failures or root cause analysis of errors, and Cloud Debugger inspects live application state without capturing historical failure data. Option C is wrong because Cloud Operations for GKE is specific to Google Kubernetes Engine workloads, not Vertex AI pipelines, and Stackdriver is the legacy name for the suite now called Google Cloud Observability (formerly Cloud Operations), making this option outdated and misaligned with Vertex AI. Option D is wrong because Cloud Audit Logs record administrative actions and access logs, not pipeline execution errors or failures, and Cloud Functions alone cannot provide the centralized monitoring, alerting, and error analysis required.

Full explanation →

891

MCQhard

A team uses Vertex AI Feature Store to serve features for real-time predictions. They notice that feature values are frequently updated from multiple source systems, leading to inconsistencies. They need to ensure that feature values are consistent across all serving endpoints. What should they do?

A.Use batch ingestion with weekly updates to reduce update frequency

B.Increase the offline storage TTL to retain historical feature values

C.Implement a manual approval process for feature updates

D.Use a streaming ingestion pipeline with exactly-once semantics

AnswerD

Exactly-once streaming ensures each update is applied exactly once, maintaining consistency.

Why this answer

Option D is correct because streaming ingestion with exactly-once semantics ensures that each feature update is applied precisely once, preventing duplicates or missed updates that cause inconsistencies. This approach synchronizes feature values across all serving endpoints in near real-time, directly addressing the problem of frequent updates from multiple source systems.

Exam trap

The trap here is that candidates may confuse consistency with data freshness or retention, leading them to choose batch ingestion or TTL adjustments, when the core issue is update semantics in a distributed streaming context.

How to eliminate wrong answers

Option A is wrong because reducing update frequency with batch ingestion does not resolve inconsistencies from frequent updates; it merely delays them and can lead to stale features. Option B is wrong because increasing offline storage TTL retains historical values but does not affect consistency of current feature values across serving endpoints. Option C is wrong because a manual approval process introduces latency and is impractical for real-time predictions, and it does not guarantee consistency across distributed endpoints.

Full explanation →

892

Multi-Selecthard

A team is troubleshooting a Vertex AI Pipelines run that keeps failing at the model evaluation step. The pipeline includes steps: data preprocessing, training, evaluation, and deployment. Which THREE actions should they take to diagnose the issue?

Select 3 answers

A.Verify that the training step output is correctly linked as input to evaluation.

B.Run the evaluation code locally with the same input data.

C.Increase the memory of the evaluation step's machine.

D.Check the logs of the evaluation step in Cloud Logging.

E.Replace the evaluation step with a Vertex AI Model Evaluation service.

AnswersA, B, D

Mismatched outputs are a common pipeline failure cause.

Why this answer

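Option A is correct because Vertex AI Pipelines relies on precise input/output artifact linking between steps. If the training step's output (e.g., a model artifact) is not correctly wired as the input to the evaluation step, the pipeline will fail due to missing or mismatched data. This is a common misconfiguration in the Kubeflow Pipelines DSL, where step outputs must be explicitly passed as arguments to downstream components. Option D is correct because the evaluation step's logs in Cloud Logging contain the actual error message and stack trace. Option B is correct because running the evaluation code locally with the same input data separates code errors from pipeline configuration problems.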

Exam trap

Google Cloud often tests the misconception that resource scaling (Option C) is the first diagnostic step for pipeline failures, when in reality, most failures in Vertex AI Pipelines stem from misconfigured artifact passing or code errors, not hardware limits.

Full explanation →

893

MCQmedium

You are using Vertex AI Matching Engine for similarity search. Your index has 10 million embeddings of 512 dimensions. The query latency requirement is under 10ms for 99th percentile. Which index type should you choose?

A.Brute-force index with cosine distance.

B.Approximate Nearest Neighbor (ANN) index using the ScaNN algorithm.

C.A custom distance-based index using Cloud SQL.

D.A tree-based index from scikit-learn deployed as a custom container.

AnswerB

ANN with ScaNN is designed for low-latency, high-scale similarity search.

Why this answer

Option B is correct because the ScaNN (Scalable Nearest Neighbors) algorithm is specifically designed for high-dimensional, large-scale similarity search with strict latency requirements. With 10 million 512-dimensional embeddings, an ANN index like ScaNN can achieve sub-10ms query latency at the 99th percentile by trading a small amount of recall for dramatic speed improvements, which is exactly what Vertex AI Matching Engine optimizes for.

Exam trap

The trap here is that candidates assume brute-force is the only 'accurate' option and underestimate how severely the curse of dimensionality degrades tree-based and exact methods at 512 dimensions, leading them to pick A or D despite the explicit latency constraint.

How to eliminate wrong answers

Option A is wrong because a brute-force index computes exact distances against all 10 million embeddings, which for 512-dimensional vectors would require O(10M * 512) operations per query, far exceeding the 10ms latency target even with optimized hardware. Option C is wrong because Cloud SQL is a relational database not designed for vector similarity search; it lacks native support for high-dimensional distance computations and would require full table scans, making sub-10ms latency impossible at this scale. Option D is wrong because scikit-learn's tree-based indices (e.g., KD-Tree, Ball Tree) degrade to near-linear search in high dimensions (curse of dimensionality), performing no better than brute force for 512 dimensions, and deploying as a custom container adds unnecessary overhead without addressing the fundamental algorithmic limitation.
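
The brute-force cost in Option A can be worked out directly. The sketch below is back-of-the-envelope arithmetic under stated assumptions (float32 storage, one query at a time, every vector scanned once); it is not a benchmark of any particular hardware.

```python
NUM_VECTORS = 10_000_000
DIMENSIONS = 512
BYTES_PER_FLOAT32 = 4        # assumption: embeddings stored as float32
LATENCY_BUDGET_S = 0.010     # the 10 ms target from the question

# One multiply-add per dimension per stored vector.
multiply_adds_per_query = NUM_VECTORS * DIMENSIONS

# Every stored value must be read once per query.
bytes_scanned_per_query = multiply_adds_per_query * BYTES_PER_FLOAT32

# Sustained rates needed to finish inside the latency budget.
required_multiply_adds_per_s = multiply_adds_per_query / LATENCY_BUDGET_S
required_bandwidth_gb_per_s = bytes_scanned_per_query / LATENCY_BUDGET_S / 1e9
```

That is 5.12 billion multiply-adds and about 20.5 GB of memory traffic per query, so a single query inside 10 ms would need roughly 2 TB/s of sustained memory bandwidth. An ANN index avoids this by scanning only a small fraction of the vectors.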

Full explanation →

894

MCQmedium

A team wants to implement continuous delivery for their ML models. They have a pipeline that trains a model and evaluates it. If the evaluation metrics exceed a threshold, the model should be deployed to a staging endpoint, and after manual approval, to production. Which approach should they use?

A.Use Cloud Build to orchestrate the whole process, with a manual approval step before deploying to production.

B.Use Vertex AI Pipelines to deploy to production directly after evaluation.

C.Use Cloud Scheduler to trigger deployment every hour.

D.Use Cloud Functions to deploy after evaluation without approval.

AnswerA

Cloud Build supports manual approval gates, making it suitable for CD.

Why this answer

Option A is correct because Cloud Build supports manual approval gates: a build trigger can be configured to require approval, so the production deployment build waits until an authorized user approves it. This aligns with the requirement for continuous delivery (not continuous deployment), where a human in the loop approves the final production rollout after the model has been deployed to staging. Vertex AI Pipelines lacks native manual approval gating, and the other options bypass the required manual approval step entirely.
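
The gating logic in the scenario can be written down as a small decision function. This is only a sketch of the control flow with hypothetical names and action labels; in practice the threshold check would live in the pipeline and the approval would be recorded by the CI/CD system.

```python
def next_action(metric: float, threshold: float, production_approved: bool) -> str:
    """Decide the next delivery step for a freshly evaluated model."""
    if metric <= threshold:
        # Evaluation did not exceed the threshold: nothing is deployed.
        return "stop"
    if not production_approved:
        # Good enough for staging, but production waits for a human.
        return "deploy_to_staging"
    return "deploy_to_production"


# Hypothetical metric values, for illustration only.
below_threshold = next_action(metric=0.80, threshold=0.85, production_approved=False)
awaiting_approval = next_action(metric=0.90, threshold=0.85, production_approved=False)
approved = next_action(metric=0.90, threshold=0.85, production_approved=True)
```

The point of the middle branch is that passing evaluation is necessary but not sufficient for production, which is what separates continuous delivery from continuous deployment.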

Exam trap

The trap here is confusing continuous delivery (which includes a manual approval gate) with continuous deployment (which is fully automated), leading candidates to choose options that skip the required human approval step.

How to eliminate wrong answers

Option B is wrong because Vertex AI Pipelines does not have a built-in manual approval step; deploying directly to production after evaluation violates the requirement for manual approval. Option C is wrong because Cloud Scheduler triggers deployments on a fixed schedule (every hour), not based on evaluation metrics exceeding a threshold, and it lacks the manual approval gate. Option D is wrong because Cloud Functions would deploy automatically after evaluation without any manual approval step, contradicting the explicit requirement for human approval before production deployment.

Full explanation →

895

MCQhard

A team is building a continuous training pipeline that retrains a model when new data arrives. They want to detect data drift between the training dataset and the serving data. Which approach should they integrate into the pipeline to compare the distributions of the two datasets?

A.Export metrics to Cloud Monitoring and set up alerting on mean values.

B.Use Vertex AI Model Monitoring with skew detection enabled.

C.Use Cloud DLP to inspect the datasets and generate summary statistics.

D.Compute histograms of features in BigQuery ML and compare them manually.

AnswerB

Model Monitoring provides built-in skew and drift detection by comparing distributions.

Why this answer

Vertex AI Model Monitoring with skew detection enabled is the correct approach because it is specifically designed to detect data drift between training and serving datasets in a continuous training pipeline. It automatically computes distribution statistics (e.g., using Jensen-Shannon divergence or L-infinity distance) for each feature and compares the training data distribution against the serving data distribution, triggering alerts when drift exceeds a configured threshold. This integrates natively with Vertex AI Pipelines, enabling automated retraining workflows.
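
For a categorical feature, the L-infinity distance mentioned above is simply the largest absolute difference between the two category frequency tables. The sketch below is an illustrative plain-Python version, not the service's implementation; the sample data and the 0.1 threshold are made up.

```python
from collections import Counter


def category_frequencies(values) -> dict:
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(p: dict, q: dict) -> float:
    """Largest absolute frequency difference over all categories seen in p or q."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical training and serving samples of one categorical feature.
training = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
serving = ["a"] * 20 + ["b"] * 30 + ["c"] * 50

distance = l_infinity_distance(
    category_frequencies(training), category_frequencies(serving)
)
SKEW_THRESHOLD = 0.1  # hypothetical alerting threshold
skew_detected = distance > SKEW_THRESHOLD
```

Here the share of category "a" drops from 0.5 to 0.2 and "c" rises from 0.2 to 0.5, so the distance is about 0.3 and the skew check fires.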

Exam trap

Google Cloud often tests the distinction between monitoring for data quality (e.g., missing values, outliers) and monitoring for distribution drift. Candidates mistakenly choose Cloud Monitoring or manual histogram comparison because they think any metric or visualization can detect drift, but only a dedicated service such as Vertex AI Model Monitoring provides the built-in distribution distance measures and automated alerting.

How to eliminate wrong answers

Option A is wrong because exporting metrics to Cloud Monitoring and setting up alerting on mean values only monitors a single statistic (the mean) and cannot detect complex distribution shifts like changes in variance, skewness, or multimodal distributions; it also lacks the distribution comparison needed for rigorous drift detection. Option C is wrong because Cloud DLP is a data loss prevention service focused on inspecting and redacting sensitive data (e.g., PII, credit card numbers), not on comparing statistical distributions of datasets for drift detection. Option D is wrong because computing histograms in BigQuery ML and comparing them manually is not automated, does not scale to many features, and lacks the built-in distance measures (Jensen-Shannon divergence, L-infinity distance) and threshold-based alerting that Vertex AI Model Monitoring provides out of the box.

Full explanation →

896

MCQmedium

You are using Vertex AI Vector Search with an approximate nearest neighbor index. You need to update the index with new data every hour. The updates must be available for queries immediately. Which update method should you use?

A.Recreate the index every hour using a scheduled job.

B.Batch update by creating a new index and deploying it.

C.Streaming updates using the streaming API.

D.Use a brute-force index that supports real-time updates.

AnswerC

Streaming updates allow immediate visibility of new data in the index.

Why this answer

Option C is correct because Vertex AI Vector Search supports streaming updates via its streaming API, which allows you to insert, update, or delete vectors in real time. This ensures that new data is immediately available for approximate nearest neighbor (ANN) queries without requiring index recreation or redeployment, meeting the requirement for hourly updates with instant query availability.
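
A streaming upsert is sent as a list of datapoints, each with an ID and a vector. The sketch below only builds such a request body; the field names follow the index `upsertDatapoints` REST method as I understand it, so treat them as an assumption and check the current API reference. The helper name, product IDs, and 4-dimensional vectors are hypothetical, and the index is assumed to have been created with streaming updates enabled.

```python
import json


def build_upsert_body(embeddings: dict, dimensions: int) -> dict:
    """Build a request body that upserts {datapoint_id: vector} pairs.

    Assumption: field names mirror the Vertex AI upsertDatapoints REST method.
    """
    datapoints = []
    for datapoint_id, vector in embeddings.items():
        if len(vector) != dimensions:
            raise ValueError(f"{datapoint_id}: expected {dimensions} dimensions")
        datapoints.append(
            {"datapointId": datapoint_id, "featureVector": [float(x) for x in vector]}
        )
    return {"datapoints": datapoints}


# Hypothetical new embeddings from the last hour (tiny vectors for readability).
new_embeddings = {
    "product-1001": [0.1, 0.2, 0.3, 0.4],
    "product-1002": [0.5, 0.6, 0.7, 0.8],
}
body = build_upsert_body(new_embeddings, dimensions=4)
body_json = json.dumps(body)
```

Upserting by ID means the same call covers both inserting new items and replacing the vectors of existing ones.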

Exam trap

Google Cloud often tests the misconception that batch updates (creating a new index) are the only way to update an ANN index, but the streaming API is designed specifically for low-latency, incremental updates without full index rebuilds.

How to eliminate wrong answers

Option A is wrong because recreating the entire index every hour is inefficient, and new data stays invisible to queries until the rebuild and redeployment finish, failing the requirement for immediate query availability. Option B is wrong because batch updating by creating a new index and deploying it involves a delay for building and deploying the index, so updates are not available immediately for queries. Option D is wrong because switching to a brute-force index abandons the ANN index the scenario specifies; brute-force indexes perform exact nearest neighbor search by scanning every vector, and the index type does not by itself make updates visible immediately.

Full explanation →

897

MCQmedium

A healthcare provider needs to extract structured information from incoming PDF forms (e.g., patient intake forms). They want to automate data extraction without writing custom models. Which Google Cloud service should they use?

A.Document AI with a form parser processor

B.Natural Language API for entity extraction

C.Vision API

D.AutoML Vision for object detection

AnswerA

Document AI's form parser is designed to extract key-value pairs and tables from forms.

Why this answer

Document AI with a form parser processor is the correct choice because it is purpose-built for extracting structured data from PDF forms, including key-value pairs and tables, without requiring custom model development. It uses pre-trained models specifically for form understanding, making it ideal for automating intake form processing.

Exam trap

Google Cloud often tests the distinction between general-purpose OCR (Vision API) and specialized document understanding (Document AI), leading candidates to choose Vision API for form parsing when it lacks the necessary form-field extraction capabilities.

How to eliminate wrong answers

Option B is wrong because Natural Language API is designed for extracting entities and sentiment from unstructured text, not for parsing structured form fields from PDF documents. Option C is wrong because Vision API provides optical character recognition (OCR) to extract raw text from images but lacks the form-specific parsing logic to identify key-value pairs and table structures. Option D is wrong because AutoML Vision for object detection is used for identifying objects within images, not for extracting structured data from forms.

Full explanation →

898

Matchingmedium

Match each ML pipeline component to its description.

Concepts

Matches

Production ML pipeline framework by Google

ML toolkit for Kubernetes-based workflows

Unified stream and batch data processing service

Managed Apache Airflow workflow orchestration

Serverless ML pipeline orchestration on Vertex AI

Why these pairings

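Correct matches: the descriptions correspond to TensorFlow Extended (TFX) for the production ML pipeline framework by Google, Kubeflow for the ML toolkit for Kubernetes-based workflows, Dataflow for unified stream and batch data processing, Cloud Composer for managed Apache Airflow workflow orchestration, and Vertex AI Pipelines for serverless ML pipeline orchestration on Vertex AI. A common confusion is swapping Cloud Composer (general-purpose workflow orchestration) with Vertex AI Pipelines (ML-specific pipeline orchestration).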

Full explanation →

899

MCQhard

You are using Vertex AI Vector Search for a product recommendation system. Your index is updated with new embeddings every hour. To minimize query latency while keeping the index fresh, what should you do?

A.Use streaming updates to insert new embeddings into the deployed index.

B.Rebuild the entire index hourly as a batch job and redeploy it.

C.Create a new index each hour and use traffic splitting to gradually shift traffic.

D.Use a brute-force index instead of ANN to ensure accuracy after updates.

AnswerA

Streaming updates provide low-latency freshness without full rebuild.

Why this answer

Streaming updates allow real-time insertion of new embeddings without rebuilding the entire index, maintaining low latency for queries.

Full explanation →

900

MCQmedium

An engineer wants to configure alerting when the data distribution of a serving feature deviates from the training data distribution. The model is deployed on Vertex AI Endpoints. Which divergence metric should they use to compare the training and serving distributions?

A.Kullback-Leibler divergence

B.Population Stability Index (PSI)

C.Jensen-Shannon divergence

D.Chi-squared test

AnswerC

JS divergence is the recommended metric for detecting distribution skew in Vertex AI Model Monitoring.

Why this answer

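Vertex AI Model Monitoring uses Jensen-Shannon divergence to compare the distributions of numerical features (categorical features are compared with L-infinity distance). It is a symmetric and bounded metric, unlike Kullback-Leibler divergence, which makes it suitable for setting thresholds on feature skew between training and serving data.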
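
The symmetry and boundedness can be checked on binned distributions. The sketch below is a textbook implementation of Jensen-Shannon divergence with base-2 logarithms, not the service's code; the histograms are made-up examples.

```python
import math


def kl_divergence(p, q) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical binned feature distributions (each sums to 1).
training_hist = [0.1, 0.4, 0.4, 0.1]
serving_hist = [0.3, 0.3, 0.2, 0.2]

js_forward = js_divergence(training_hist, serving_hist)
js_backward = js_divergence(serving_hist, training_hist)
kl_forward = kl_divergence(training_hist, serving_hist)
kl_backward = kl_divergence(serving_hist, training_hist)

# Completely disjoint distributions hit the upper bound of 1 bit.
js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Swapping the arguments leaves the Jensen-Shannon value unchanged but changes the Kullback-Leibler value, and KL would be undefined if the serving histogram had an empty bin where training does not; both are reasons a fixed alert threshold is easier to reason about with JS.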

Full explanation →
