CCNA ML Monitoring Questions

1
MCQ · medium

You are responsible for monitoring a batch prediction pipeline that runs daily. Recently, the pipeline started failing intermittently with out-of-memory errors. The input data volume has not changed. What is the most likely cause?

A. A recent code change that loads the entire dataset into memory before processing
B. Increase in model size due to retraining
C. Decrease in the number of worker machines
D. Increase in input data size
Answer: A

This could cause OOM for large datasets.

Why this answer

Option A is correct because a code change that loads the entire dataset into memory before processing would directly cause out-of-memory (OOM) errors, even if the input data volume remains unchanged. In batch prediction pipelines, data is typically streamed or processed in chunks to manage memory efficiently. A change that bypasses this pattern and loads all data at once can exceed the available heap or container memory, leading to intermittent failures depending on data characteristics or concurrent loads.
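The contrast between chunked processing and loading everything at once can be sketched as follows. This is a minimal illustration, not the pipeline from the question: `predict_batch` is a hypothetical scoring function supplied by the caller, and the default chunk size is an arbitrary assumption.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[List]:
    """Yield lists of at most chunk_size rows without materializing the input."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def predict_streaming(rows: Iterable, predict_batch: Callable, chunk_size: int = 1000):
    """Score rows chunk by chunk, so only one chunk is held in memory at a time."""
    for chunk in iter_chunks(rows, chunk_size):
        yield from predict_batch(chunk)


def predict_all_at_once(rows: Iterable, predict_batch: Callable):
    """Anti-pattern: list(rows) holds the entire dataset in memory before scoring."""
    return predict_batch(list(rows))
```

With the streaming version, peak memory depends on `chunk_size` rather than on the size of the dataset, which is why replacing it with the all-at-once version can introduce OOM failures without any change in data volume.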

Exam trap

The trap here is that candidates may assume OOM errors are always caused by increased data volume or resource scaling issues, but the question explicitly states data volume is unchanged, forcing you to consider code-level changes that alter memory access patterns.

How to eliminate wrong answers

Option B is wrong because a larger model after retraining would raise memory use on every run, which would produce consistent failures rather than intermittent ones, and nothing in the question indicates the model was retrained. Option C is wrong because fewer worker machines would reduce total available memory on every run, so with unchanged data volume the failures would be consistent, not intermittent. Option D is wrong because the question explicitly states that input data volume has not changed, so an increase in data size cannot be the cause.

2
Multi-Select · hard

A financial services company has deployed a classification model on Vertex AI to detect fraudulent transactions. The model is monitored using Vertex AI Model Monitoring for skew and drift detection, and also logs predictions to BigQuery for analysis. After a month, the monitoring alerts show a significant drift in one feature (transaction_amount). Which TWO actions should the team take to diagnose and address this issue?

Select 2 answers
A. Compare the feature distribution in the training data with the recent serving data using statistical tests.
B. Retrain the model on the most recent data to incorporate the new distribution.
C. Increase the frequency of model monitoring checks to every hour.
D. Increase the sampling rate for prediction logging to ensure full data capture.
E. Reduce the alert threshold to minimize false positives.
Answers: A, B

This diagnostic step helps understand the nature and extent of the drift.

Why this answer

Option A is correct because comparing the feature distribution of the training data with recent serving data using statistical measures (e.g., the Kolmogorov-Smirnov test or Jensen-Shannon divergence) is the standard first step to quantify the drift and confirm it is statistically significant. This diagnostic action helps the team understand the nature and magnitude of the drift before deciding on remediation steps. Vertex AI Model Monitoring already performs such comparisons, but the team should independently verify the results using the predictions logged to BigQuery. Option B is correct because, once the drift is confirmed, retraining on recent data lets the model learn the new transaction_amount distribution.
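As a rough illustration of the distribution comparison in Option A, the two-sample Kolmogorov-Smirnov statistic can be computed from the two empirical CDFs. This is a minimal pure-Python sketch; the function name and the sample values in the usage note are hypothetical, and in practice the samples would come from the training set and from the serving data logged to BigQuery.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(train: Sequence[float], serving: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(train), sorted(serving)
    # The maximum gap is always attained at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )
```

A value near 0 means the two samples of transaction_amount look alike, while a value near 1 means they barely overlap. Turning the statistic into a significance decision requires a critical value or p-value, which this sketch leaves out.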

Exam trap

The trap here is that candidates often confuse 'detecting drift' with 'fixing drift' and jump straight to retraining (Option B) without pairing it with a diagnostic comparison (Option A), which is a critical step in the ML lifecycle per the PMLE exam's emphasis on systematic troubleshooting.

3
MCQ · hard

Your company uses a custom container for model serving on Vertex AI. After a recent update, the model returns predictions but they are clearly wrong (e.g., negative probabilities for a classification model). The logs show no errors. What is the most likely cause?

A. The preprocessing code in the container was updated but the model was not retrained on the new preprocessing
B. The model file is corrupted
C. The model file was accidentally replaced with a different model
D. The container is using an incompatible version of the serving framework
Answer: A

Feature transformation mismatch leads to incorrect predictions.

Why this answer

Option A is correct because the most likely cause of a model returning predictions without errors, but with clearly wrong outputs like negative probabilities, is a mismatch between the preprocessing logic used during training and inference. If the preprocessing code in the container was updated (e.g., scaling, normalization, or feature engineering steps changed) but the model was not retrained on data processed with that new logic, the model receives inputs that are out of distribution, leading to nonsensical outputs. Vertex AI containers run inference with the deployed code, so any change in preprocessing directly affects the input tensor values without raising runtime errors.
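The silent failure described above can be reproduced with a toy example. Everything here is an assumption made for illustration: the linear scoring head, its weights, and the scaling constants are hypothetical and do not come from the question.

```python
def standardize(x: float, mean: float, std: float) -> float:
    """Scale a raw feature value using the given constants."""
    return (x - mean) / std


def linear_score(z: float, weight: float = 0.2, bias: float = 0.5) -> float:
    """Toy unclamped 'probability' head; valid only for inputs scaled as in training."""
    return weight * z + bias


raw_amount = 120.0

# Constants used when the model was trained (hypothetical).
score_with_training_scaling = linear_score(standardize(raw_amount, mean=100.0, std=50.0))

# Constants introduced by a later container update (hypothetical).
score_with_updated_scaling = linear_score(standardize(raw_amount, mean=1000.0, std=50.0))
```

With the training constants the score stays inside [0, 1]. With the updated constants the same raw input produces a negative score, and no exception is raised at any point, which matches the "wrong predictions, clean logs" symptom in the question.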

Exam trap

Google Cloud often tests the concept that silent prediction errors (no logs, no crashes) are almost always due to data or preprocessing mismatches, not infrastructure or model file issues, which would generate explicit errors.

How to eliminate wrong answers

Option B is wrong because a corrupted model file would typically cause loading failures, runtime errors, or crashes, not silent generation of wrong predictions like negative probabilities. Option C is wrong because replacing the model file with a different model would likely produce predictions that are consistently wrong in a different pattern (e.g., all zeros, constant values) or cause shape mismatches, not specifically negative probabilities from a classification model. Option D is wrong because an incompatible serving framework version would usually manifest as import errors, missing symbols, or version mismatch warnings in logs, not silent incorrect predictions with no errors.

4
MCQ · easy

A machine learning model deployed on Vertex AI is returning erroneous predictions. The team needs to investigate the root cause by examining the prediction request and response details. Which Google Cloud tool is best suited for this?

A. Cloud Monitoring
B. Cloud Debugger
C. Cloud Logging
D. Cloud Trace
Answer: C

Cloud Logging can capture structured logs from Vertex AI predictions, including request and response data for analysis.

Why this answer

Cloud Logging is the correct tool because it captures detailed logs of prediction requests and responses, including input features, model outputs, and any errors. By examining these logs, the team can trace the exact data flow and identify discrepancies causing erroneous predictions, such as data preprocessing issues or model version mismatches.

Exam trap

The trap here is that candidates confuse Cloud Monitoring (which shows aggregate health metrics) with Cloud Logging (which provides granular request/response data), leading them to choose a tool that cannot reveal the specific prediction details needed for root cause analysis.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring focuses on metrics and alerting (e.g., latency, error rates) but does not capture the content of individual prediction requests or responses. Option B is wrong because Cloud Debugger is designed for inspecting live application code state (e.g., variable values) in production, not for logging request/response payloads of ML predictions. Option D is wrong because Cloud Trace provides latency analysis and distributed tracing of requests across services, but it does not log the actual prediction data or response details needed to debug prediction errors.

5
Multi-Select · hard

A financial institution uses a machine learning model to approve loans. They must monitor for fairness and bias. Which THREE Google Cloud tools or features can help them achieve this? (Choose 3.)

Select 3 answers
A. What-If Tool
B. Vertex AI Model Monitoring
C. Cloud Data Loss Prevention
D. Cloud Healthcare API
E. Explainable AI
Answers: A, B, E

The What-If Tool allows testing different scenarios and slicing by protected attributes to evaluate fairness.

Why this answer

The What-If Tool (WIT) is an open-source tool from Google that integrates with Vertex AI and allows users to analyze model behavior across different subsets of data, such as demographic groups. It provides interactive visualizations to test how changes in input features affect predictions, enabling fairness assessments by comparing performance metrics across groups. This directly supports monitoring for bias in loan approval decisions.

Exam trap

Google Cloud often tests the distinction between data security tools (like DLP) and ML fairness tools, so candidates mistakenly select Cloud DLP thinking it addresses bias because it handles sensitive attributes, but DLP does not analyze model predictions or fairness metrics.

6
MCQ · hard

A large enterprise has multiple ML models deployed in production across different regions. They want to implement a centralized monitoring dashboard that tracks key performance indicators such as prediction accuracy, latency, and error rates for all models, with the ability to drill down into individual model versions. Which approach best meets these requirements?

A. Use Vertex AI Experiments to log metrics and compare across runs
B. Use Cloud Logging to search logs from each model and create a dashboard
C. Use BigQuery to store prediction logs and then visualize in Looker
D. Use Cloud Monitoring with custom metrics reported by each model deployment, and create a unified dashboard with filterable resources
Answer: D

Cloud Monitoring supports custom metrics and dashboards that can be filtered by resource labels (e.g., model name, version), providing centralized visibility and drill-down capability.

Why this answer

Option D is correct because Cloud Monitoring with custom metrics allows each model deployment to report key performance indicators (e.g., prediction accuracy, latency, error rates) as metric time series. These custom metrics can be aggregated into a single unified dashboard, and the dashboard can be configured with filterable resources (e.g., region, model version) to enable drill-down into individual model versions. This approach provides centralized, real-time monitoring without relying on log-based or batch analytics.

Exam trap

Google Cloud often tests the distinction between logging (Cloud Logging) and monitoring (Cloud Monitoring), where candidates mistakenly think log-based dashboards are sufficient for real-time KPI tracking, ignoring the need for structured, low-latency custom metrics.

How to eliminate wrong answers

Option A is wrong because Vertex AI Experiments is designed for tracking and comparing training runs (e.g., hyperparameter tuning), not for real-time monitoring of deployed models in production across regions. Option B is wrong because Cloud Logging is a log management service that requires parsing unstructured log entries to extract metrics, which is inefficient for real-time KPIs and lacks native metric aggregation and dashboard drill-down capabilities. Option C is wrong because BigQuery is a data warehouse for storing and querying large datasets, and while Looker can visualize it, this approach introduces latency from batch loading and is not designed for real-time monitoring of live model deployments.

7
MCQ · easy

Refer to the exhibit. A Vertex AI prediction endpoint is failing with a deadline exceeded error. The log shows the following. What is the most likely cause?

A. The prediction request is malformed
B. Insufficient CPU or memory for the load
C. The model is too large for the machine type
D. The model version is corrupted
Answer: B

High CPU and memory utilization indicate the machine type is inadequate for the prediction workload, leading to timeouts.

Why this answer

A deadline exceeded error in Vertex AI prediction endpoints typically indicates that the model is taking too long to respond, often due to insufficient CPU or memory resources for the current load. This causes the request to time out before the inference completes, as the underlying infrastructure cannot process the requests quickly enough.

Exam trap

Google Cloud often tests the distinction between deployment-time errors (like model size) and runtime errors (like timeout), so candidates mistakenly associate a deadline exceeded error with model corruption or malformed requests rather than resource constraints.

How to eliminate wrong answers

Option A is wrong because a malformed request would result in an invalid argument or bad request error (e.g., HTTP 400), not a deadline exceeded (HTTP 504) error. Option C is wrong because a model that is too large for the machine type would cause a resource exhaustion error at deployment time (e.g., 'Insufficient memory to load model'), not a runtime deadline exceeded error. Option D is wrong because a corrupted model version would cause model loading failures or prediction errors (e.g., 'Model not found' or 'Internal server error'), not a timeout-related deadline exceeded error.

8
MCQ · easy

You are monitoring a classification model that predicts loan default. The model was trained on data from 2020-2022. In 2023, the economic conditions changed, and the model's accuracy dropped significantly. Which monitoring approach would best help you detect this issue early?

A. Monitor the accuracy of the model on the latest batch of labeled data
B. Monitor feature distribution drift using KS test
C. Monitor the prediction distribution for significant shift from training distribution
D. Monitor the freshness of the training data
Answer: C

Prediction distribution shift can indicate concept drift even without labels.

Why this answer

Option C is correct because monitoring the prediction distribution for a significant shift from the training distribution directly detects changes in the model's output behavior, which is the earliest indicator of concept drift or data drift caused by economic changes. Unlike accuracy monitoring, this approach does not require labeled data, enabling real-time detection of performance degradation before ground truth labels become available.
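One way to make "a significant shift in the prediction distribution" concrete is to bin the predicted probabilities and compare the bins with Jensen-Shannon divergence. The sketch below is only an illustration: the function names, the bin count, and the assumption that scores lie in [0, 1] are all hypothetical choices, and any alert threshold would have to be tuned separately.

```python
import math
from typing import List, Sequence


def histogram(scores: Sequence[float], bins: int = 10) -> List[float]:
    """Normalized histogram of scores assumed to lie in [0, 1]."""
    counts = [0] * bins
    for s in scores:
        index = min(int(s * bins), bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[index] += 1
    return [c / len(scores) for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)
```

Comparing `histogram(training_scores)` with `histogram(recent_scores)` needs no labels, which is why this kind of check can fire before ground truth for the 2023 loans becomes available.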

Exam trap

The trap here is that candidates often choose monitoring feature drift (Option B) because it sounds technical, but they overlook that concept drift—a change in the relationship between features and the target—is better detected by monitoring prediction distribution shifts, not just feature distribution shifts.

How to eliminate wrong answers

Option A is wrong because monitoring accuracy on labeled data is a reactive approach that requires ground truth labels, which are often delayed or unavailable in real-time, making it too slow to detect early drift. Option B is wrong because monitoring feature distribution drift using the KS test only detects changes in input features, not the relationship between features and the target (concept drift), so it may miss shifts in the decision boundary caused by economic changes. Option D is wrong because monitoring the freshness of training data is a data management practice that does not directly detect model performance degradation or drift; it only ensures the training data is recent, not that the model is still valid under new conditions.

9
MCQ · medium

Your organization has a requirement to monitor fairness of an ML model that predicts loan approvals. You need to set up alerts if the model's predictions show bias against a protected group. Which tool on Google Cloud can you use to monitor this?

A. Cloud Vision API to analyze demographic data.
B. Vertex AI Model Monitoring with Fairness Indicators integration.
C. AutoML Tables fairness evaluation results from training.
D. Cloud DLP (Data Loss Prevention) to inspect input features for bias.
Answer: B

Fairness Indicators can be evaluated and monitored via Vertex AI Model Monitoring.

Why this answer

Vertex AI Model Monitoring with Fairness Indicators integration is the correct tool because it allows you to continuously monitor a deployed model's predictions for bias against protected groups (e.g., race, gender) by analyzing prediction distributions and setting alert thresholds. This is a post-deployment monitoring capability, not a training-time evaluation, and it directly addresses the requirement to set up alerts on live predictions.

Exam trap

The trap here is that candidates confuse training-time fairness evaluation (AutoML Tables) with post-deployment monitoring (Vertex AI Model Monitoring), or they mistakenly think data inspection tools like Cloud DLP or Vision API can perform bias analysis on predictions.

How to eliminate wrong answers

Option A is wrong because Cloud Vision API is an image analysis service for detecting objects, text, and faces in images; it has no capability to analyze demographic data or monitor ML model fairness. Option C is wrong because AutoML Tables fairness evaluation results are generated during model training, not for ongoing post-deployment monitoring; the question specifically requires setting up alerts on predictions, which is a monitoring, not training, task. Option D is wrong because Cloud DLP is designed to inspect and redact sensitive data (e.g., PII) in text, not to analyze model predictions for bias or set fairness alerts.

10
MCQ · medium

A company has deployed a model that predicts customer churn. The model's performance, as measured by AUC, has been declining over the past month. The team suspects data drift. They have enabled Vertex AI Model Monitoring, but no alerts have been triggered. What is a possible reason for the lack of alerts?

A. The monitoring is only sampling 10% of the serving data
B. The drift detection threshold is set too low
C. The model is being retrained daily
D. The drift detection focuses on categorical features only
Answer: A

Low sampling rates mean that Model Monitoring only examines a small fraction of predictions, potentially missing drift if it is not uniformly distributed.

Why this answer

If the sampling rate is low (e.g., 10% of serving data), Model Monitoring may not capture enough data to detect drift, so no alerts fire even if drift exists. Option B is wrong because a low threshold would create more alerts, not fewer. Option C is wrong because daily retraining might correct drift, but drift occurring between retraining runs would still likely trigger alerts. Option D is wrong because restricting detection to categorical features would miss drift in continuous features, but drifting categorical features would still trigger alerts.

11
Multi-Select · hard

You are monitoring a production model that is experiencing gradual decay in AUC. Which THREE metrics should you set up alerts for to diagnose the root cause? (Choose three.)

Select 3 answers
A. Concept drift score measured by comparing predicted vs actual outcomes.
B. Training-serving skew for categorical features with high importance.
C. Average prediction latency over the past hour.
D. Feature drift score for key numerical features.
E. Model staleness (days since last retraining).
Answers: A, B, D

Detects changes in relationship between features and labels.

Why this answer

Option A is correct because concept drift directly measures the degradation of model performance by comparing predicted probabilities against actual outcomes over time. A gradual AUC decay indicates that the relationship between features and the target is shifting, and tracking concept drift via metrics such as the population stability index (PSI) or the distribution of residuals helps isolate whether the model's predictive power is eroding due to changing data patterns.
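Comparing predictions against actual outcomes over time can be sketched as a per-window AUC check. This is a minimal illustration under stated assumptions: the function names, the windowing, and the `max_drop` tolerance are hypothetical, and the pairwise AUC shown here is quadratic in window size, so it suits small windows only.

```python
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_decay_alerts(
    windows: Sequence[Tuple[Sequence[int], Sequence[float]]],
    baseline_auc: float,
    max_drop: float = 0.05,
) -> List[int]:
    """Return indices of windows whose AUC fell more than max_drop below the baseline."""
    return [
        i
        for i, (labels, scores) in enumerate(windows)
        if baseline_auc - auc(labels, scores) > max_drop
    ]
```

This check depends on labels arriving for each window, so it complements the skew and feature drift alerts in Options B and D rather than replacing them.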

Exam trap

Google Cloud often tests the distinction between metrics that indicate a symptom (e.g., latency, staleness) versus metrics that directly measure the cause of performance decay (drift scores), leading candidates to select operational metrics instead of diagnostic ones.

12
Drag & Drop · medium

Drag and drop the steps to set up a distributed training job on Vertex AI using a custom container in the correct order.

Why this order

First prepare the code and container, then push, configure the job, and run.

13
MCQ · hard

A company uses a custom container on Vertex AI Prediction. They want to send custom metrics from their prediction container to Cloud Monitoring. Which method should they use?

A. OpenCensus or OpenTelemetry SDK
B. Vertex AI built-in metrics
C. Stackdriver Monitoring agent installed in the container
D. Cloud Logging log-based metrics
Answer: A

Vertex AI Prediction integrates with OpenTelemetry for custom metrics.

Why this answer

Option A is correct because OpenCensus and OpenTelemetry are the recommended open-source frameworks for exporting custom metrics from custom containers on Vertex AI Prediction to Cloud Monitoring. They provide a standardized way to instrument your application code, collect metrics, and send them directly to Cloud Monitoring via the Cloud Monitoring API, without requiring additional agents or log-based workarounds.

Exam trap

The trap here is that candidates often confuse built-in Vertex AI metrics (which are automatic but limited) with the need for custom metrics, or they incorrectly assume that log-based metrics are the simplest path, when in fact OpenCensus/OpenTelemetry are the direct and recommended method for custom containers.

How to eliminate wrong answers

Option B is wrong because Vertex AI built-in metrics only cover default infrastructure metrics (e.g., CPU, memory, request latency) and cannot capture custom application-level metrics defined by the user. Option C is wrong because the Stackdriver Monitoring agent (now the Ops Agent) is designed for VM-based environments and is not intended to be installed inside a container; it would add unnecessary overhead and is not the recommended pattern for custom containers on Vertex AI. Option D is wrong because Cloud Logging log-based metrics require you to write metrics as structured log entries and then define metric filters, which is an indirect, higher-latency approach compared to directly exporting metrics via OpenCensus/OpenTelemetry, and it is not the standard method for custom containers in Vertex AI Prediction.

14
MCQ · medium

A financial services company uses a custom deep learning model on Vertex AI to automatically approve or reject credit card transactions. The model is explainable using Vertex Explainable AI, and the company monitors feature attribution drift with thresholds defined per feature. Last week, the monitoring system flagged that the mean absolute attribution score for the 'transaction_amount' feature increased from 0.35 to 0.55. The overall model accuracy, measured on a daily batch of labeled transactions, has remained around 97%. The operations team is concerned about potential compliance issues due to changing model behavior. What should the data scientist do?

A. Tune the alert threshold for 'transaction_amount' to 0.6 to avoid future false alarms.
B. Retrain the model by increasing regularization to reduce the importance of the 'transaction_amount' feature.
C. Investigate whether there has been a shift in the distribution of 'transaction_amount' values in the recent transaction data, which could explain the attribution change.
D. Disable the feature attribution drift monitoring for 'transaction_amount' since the model accuracy is stable.
Answer: C

A distribution shift in the feature values can cause the model to rely more heavily on that feature, leading to higher attribution scores. Investigating this is the appropriate diagnostic step.

Why this answer

Option C is correct because a shift in the distribution of the 'transaction_amount' feature (e.g., due to seasonality or a new customer segment) can naturally cause its attribution score to change without indicating model degradation. Vertex Explainable AI computes each attribution relative to a baseline input, so when the input values shift, the aggregated attribution for a feature can legitimately change. Investigating the distribution shift is the first diagnostic step before adjusting thresholds or retraining, as stable accuracy does not rule out data drift that could lead to compliance issues.
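The per-feature threshold check that raised the alert can be sketched as below. The 0.35 and 0.55 values come from the question; the function names and the 0.1 threshold are hypothetical, and this is not how Vertex AI computes its own attribution drift score.

```python
from typing import Dict, Sequence


def mean_abs_attribution(attributions: Sequence[float]) -> float:
    """Mean absolute attribution of one feature over a batch of predictions."""
    return sum(abs(a) for a in attributions) / len(attributions)


def attribution_drift(
    baseline: Dict[str, float],
    current: Dict[str, float],
    thresholds: Dict[str, float],
) -> Dict[str, float]:
    """Return features whose mean |attribution| moved by more than their threshold."""
    flagged = {}
    for feature, threshold in thresholds.items():
        delta = abs(current[feature] - baseline[feature])
        if delta > threshold:
            flagged[feature] = delta
    return flagged


flagged = attribution_drift(
    baseline={"transaction_amount": 0.35},
    current={"transaction_amount": 0.55},
    thresholds={"transaction_amount": 0.1},  # hypothetical per-feature threshold
)
```

A flagged feature is a prompt to compare its recent input distribution with the training distribution, which is the investigation Option C describes, rather than a reason to change the threshold or retrain immediately.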

Exam trap

The trap here is that candidates assume stable accuracy means the model is fine, but the PMLE exam tests that feature attribution drift can indicate a change in model behavior that accuracy alone cannot detect, especially for compliance-sensitive applications.

How to eliminate wrong answers

Option A is wrong because tuning the alert threshold to 0.6 without understanding the root cause ignores the possibility of a real distribution shift or model behavior change, and could mask a genuine compliance risk. Option B is wrong because increasing regularization to reduce the importance of 'transaction_amount' is a premature intervention that could harm model performance and does not address why the attribution changed; it assumes the change is harmful without evidence. Option D is wrong because disabling monitoring for a feature based solely on stable accuracy is dangerous—accuracy can remain high while feature attributions drift, leading to biased or non-compliant decisions that accuracy alone does not capture.

15
MCQ · medium

Your team has a production ML model on Vertex AI that shows a gradual decline in accuracy over the past week. The model is retrained weekly using the latest data. Which monitoring approach should you implement to detect the issue earlier?

A. Configure Vertex AI Model Monitoring to detect feature drift and alert when metrics exceed thresholds.
B. Create a Cloud Monitoring alert for prediction response count.
C. Use BigQuery ML to retrain the model more frequently.
D. Set up a Cloud Monitoring uptime check on the prediction endpoint.
Answer: A

Vertex AI Model Monitoring directly monitors for drift and skew, which helps detect accuracy decline.

Why this answer

Option A is correct because Vertex AI Model Monitoring can detect training-serving skew and data drift, which are common causes of accuracy decline. Option B is wrong because alerting on raw prediction response count is irrelevant to prediction quality. Option C is wrong because BigQuery ML is not a monitoring tool. Option D is wrong because an uptime check only confirms that the endpoint is reachable; Cloud Monitoring without custom metrics cannot detect drift automatically.

16
Multi-Select · hard

A company uses Vertex AI Model Monitoring to detect training-serving skew. They have a categorical feature 'product_category' with high cardinality. The monitoring job alerts for skew, but the data scientists believe the model performance is still acceptable. Which THREE actions should the team take to investigate and resolve the alert?

Select 3 answers
A. Examine which categories have the largest distribution changes to understand the nature of the shift.
B. Adjust the alerting threshold based on historical drift patterns to reduce noise.
C. Compare model performance metrics (e.g., AUC) on the drifted segment vs. the non-drifted segment.
D. Remove the drifted categories from the feature set to eliminate the alert.
E. Ignore the alert because the model is performing well; monitoring alerts are often false positives.
Answers: A, B, C

Identifying specific categories helps assess whether the drift is due to seasonal effects or other benign causes.

Why this answer

Option A is correct because examining which categories have the largest distribution changes allows the team to pinpoint the root cause of the training-serving skew. In Vertex AI Model Monitoring, the skew alert is based on statistical distance metrics (e.g., Jensen-Shannon divergence) between training and serving distributions. By drilling down into the specific categories driving the divergence, the team can assess whether the shift is benign (e.g., seasonal) or problematic, rather than relying on aggregate model performance alone.
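Drilling down into the categories that drive the divergence can be sketched by ranking each category by the change in its share between training and serving data. The function names and the sample categories in the usage note are hypothetical; real inputs would be the 'product_category' values from both datasets.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def category_shares(values: Sequence[str]) -> Dict[str, float]:
    """Fraction of rows that fall into each category."""
    counts = Counter(values)
    return {category: n / len(values) for category, n in counts.items()}


def category_shifts(
    train_values: Sequence[str], serving_values: Sequence[str]
) -> List[Tuple[str, float]]:
    """Change in share per category (serving minus training), largest change first."""
    p, q = category_shares(train_values), category_shares(serving_values)
    shifts = {c: q.get(c, 0.0) - p.get(c, 0.0) for c in set(p) | set(q)}
    return sorted(shifts.items(), key=lambda item: abs(item[1]), reverse=True)
```

For example, `category_shifts(["a"] * 6 + ["b"] * 4, ["a"] + ["b"] * 5 + ["c"] * 4)` ranks "a" first, since its share dropped from 0.6 to 0.1. The largest absolute value in the result is the L-infinity distance between the two categorical distributions, so the ranked list shows which categories account for an aggregate skew score.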

Exam trap

Google Cloud often tests the misconception that a model's aggregate performance metrics (e.g., AUC) are sufficient to dismiss drift alerts, but the trap is that drift can be localized to specific segments without affecting overall metrics, requiring per-segment evaluation.

17
Multi-Select · hard

Which THREE components should you include in a comprehensive model monitoring dashboard for a production ML system?

Select 3 answers
A. Team member roles and responsibilities
B. System resource utilization (CPU, memory, latency)
C. Input data quality metrics (missing values, outliers)
D. Training pipeline code version
E. Model performance metrics (accuracy, precision, recall) over time
Answers: B, C, E

Ensures infrastructure is healthy.

Why this answer

Option B is correct because system resource utilization metrics (CPU, memory, latency) are essential for monitoring the health and performance of the production infrastructure hosting the ML model. These metrics help detect resource bottlenecks, scaling issues, or degradation that could impact inference latency and throughput, which are critical for maintaining service-level objectives (SLOs).

Exam trap

Google Cloud often tests the distinction between operational governance artifacts (like team roles) and actual monitoring metrics; the trap here is confusing project management documentation with the technical components of a live monitoring dashboard.

18
MCQ · hard

An e-commerce company uses a Vertex AI endpoint for product recommendations. Recently, the click-through rate (CTR) dropped significantly. Model monitoring shows no significant data drift or skew. Logs show increased latency but no errors. Which technique should the engineer use to diagnose the issue?

A. Increase the endpoint's request timeout value to accommodate the higher latency.
B. Enable autoscaling on the endpoint to reduce latency by adding more nodes.
C. Retrain the model with the most recent user interaction data.
D. Analyze the prediction output distribution using Vertex AI Model Monitoring for prediction drift and compare to a baseline.
Answer: D

Prediction drift can directly impact CTR even without data drift.

Why this answer

Option D is correct because the drop in CTR despite no data drift or skew suggests that the model's predictions have shifted in distribution (prediction drift), even if the input features remain stable. Vertex AI Model Monitoring can compare the current prediction output distribution against a baseline to detect such drift, which directly explains the CTR decline. The increased latency is a symptom, not the root cause, and fixing latency alone would not restore CTR.

Exam trap

Google Cloud often tests the distinction between data drift (input distribution changes) and prediction drift (output distribution changes), and candidates mistakenly assume that no data drift means the model is fine, overlooking that the model's predictions can still degrade due to concept drift.

How to eliminate wrong answers

Option A is wrong because increasing the request timeout does not address the root cause of the CTR drop; it only masks the latency issue and may lead to worse user experience if predictions are stale. Option B is wrong because enabling autoscaling reduces latency by adding nodes, but the CTR drop is not caused by latency; it is a prediction quality issue, and autoscaling does not fix prediction drift. Option C is wrong because retraining with recent data assumes the model is stale, but monitoring shows no data drift or skew, so the input distribution is fine; the problem is in the output distribution, and retraining without investigating prediction drift may not resolve the issue.

19
MCQ · medium

Refer to the exhibit. A team configured Vertex AI Model Monitoring with skew detection for feature "income" with a threshold of 0.2. However, they have not received any alerts even though they suspect data drift. What is the most likely reason?

A.The monitoring is not enabled for the endpoint
B.The 'income' feature is not present in the serving data
C.The actual skew is below the threshold
D.The drift detection threshold is set higher
AnswerB

If the feature is missing from serving data, skew detection cannot perform comparison and will not generate alerts.

Why this answer

If the 'income' feature is not present in the serving data, skew detection has nothing to compare against the training baseline, so no alert is generated no matter how far the data has shifted. A low threshold would produce more alerts, not fewer, and the presence of a monitoring configuration suggests that monitoring is enabled.

The threshold for drift detection is configured separately from the skew threshold, so it does not suppress skew alerts.

20
Multi-Selectmedium

A company wants to set up end-to-end monitoring for a Vertex AI model. Which three components should they include?

Select 3 answers
A.Feature store backup status
B.Model performance metrics
C.Data drift and concept drift detection
D.Prediction latency
E.Model training cost
AnswersB, C, D

Performance metrics like AUC or RMSE are essential for model health.

Why this answer

Model performance metrics (Option B) are essential for end-to-end monitoring because they track how well the Vertex AI model is performing over time using key indicators like accuracy, precision, recall, or AUC-ROC. This allows the team to detect degradation in prediction quality, which is a core requirement for maintaining model reliability in production.

Exam trap

The trap here is that candidates often confuse operational or cost-related metrics (like backup status or training cost) with the three core pillars of model monitoring: performance metrics, drift detection, and latency tracking.

21
MCQeasy

A machine learning engineer wants to monitor model performance on Vertex AI for a regression model. Which metric is most appropriate to track the average prediction error?

A.F1 score
B.Precision
C.Accuracy
D.RMSE
AnswerD

RMSE measures average prediction error in regression.

Why this answer

RMSE (Root Mean Squared Error) is the most appropriate metric for tracking average prediction error in a regression model because it is the square root of the mean squared residual (prediction error), expressed in the same units as the target variable. Because errors are squared before averaging, large misses weigh more heavily than small ones. On Vertex AI, RMSE is a built-in evaluation metric for regression models, directly quantifying how far predictions typically deviate from actual values.
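
As a rough illustration of the metric itself (not of any Vertex AI API), RMSE can be computed in a few lines. The sample values below are made up for the example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values, chosen only for illustration.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

# Residuals are 1, 0, -2, -1, so the mean squared error is 6 / 4 = 1.5.
print(rmse(actual, predicted))
```

The result is in the same units as the target, which is what makes it easy to read on a monitoring dashboard.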

Exam trap

Google Cloud often tests the distinction between classification and regression metrics, and the trap here is that candidates mistakenly apply classification metrics like F1, precision, or accuracy to a regression problem, not recognizing that RMSE is the standard for continuous prediction error.

How to eliminate wrong answers

Option A is wrong because F1 score is a classification metric that combines precision and recall, not applicable to regression tasks. Option B is wrong because precision measures the proportion of true positive predictions among all positive predictions, used only in classification contexts. Option C is wrong because accuracy is the ratio of correct predictions to total predictions, suitable for classification but meaningless for continuous-valued regression outputs.

22
MCQhard

A global retailer has deployed a real-time product recommendation model on Vertex AI Endpoints. The model is a large neural network that runs on a single node with 8 vCPUs and 30 GB memory. Over the past week, the p99 latency has increased from 200ms to 2 seconds, and the error rate has risen to 5%. Cloud Monitoring shows that the endpoint's CPU utilization is consistently near 100%, and memory is at 80%. The ML engineer suspects the model is too large for the node, but model size has not changed. Logs show no increase in request volume (steady at 50 QPS). There are no recent model updates. The engineer has tried to increase the node to 16 vCPUs, but latency decreased only slightly. What is the most likely root cause and the best first step to resolve it?

A.Profile the inference code to identify inefficient operations, such as unnecessary copies or suboptimal batch processing, and optimize the model serving logic.
B.Add more nodes to the endpoint by enabling autoscaling to distribute the load.
C.Retrain the model with a smaller architecture to reduce inference time.
D.Move the model to a machine type with more CPU cores and a GPU to accelerate inference.
AnswerA

The symptoms point to a code-level issue; profiling will reveal bottlenecks.

Why this answer

The p99 latency spike and high CPU utilization despite unchanged model size and request volume indicate a software bottleneck, not a hardware one. Profiling the inference code (Option A) can reveal inefficient operations like unnecessary data copies or suboptimal batch processing that degrade performance on the existing node. Since increasing vCPUs barely helped, the root cause is likely within the serving logic, not the compute capacity.
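
A minimal sketch of what "profiling the inference code" could look like, using Python's standard `cProfile` module. The `preprocess` and `predict` functions are hypothetical stand-ins for real serving logic; the wasteful list copying is deliberate so that it shows up in the report.

```python
import cProfile
import io
import pstats

def preprocess(batch):
    # Deliberately wasteful: builds a brand-new list on every iteration.
    out = []
    for row in batch:
        out = out + [row * 2]
    return out

def predict(batch):
    # Hypothetical "model": doubles each value, then adds one.
    features = preprocess(batch)
    return [x + 1 for x in features]

def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile_call(predict, list(range(2000)))
print(report)
```

In the report, time spent under `preprocess` should dominate, pointing at the copy-heavy loop rather than at the hardware.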

Exam trap

Google Cloud often tests the misconception that latency and CPU issues are always solved by scaling up hardware, when in fact software inefficiencies in the serving stack are a frequent root cause in ML deployments.

How to eliminate wrong answers

Option B is wrong because adding nodes via autoscaling would not address the root cause of high CPU utilization per node; it would only distribute the load, but each node would still suffer from the same inefficiency, and the steady 50 QPS suggests no need for more nodes. Option C is wrong because retraining with a smaller architecture is a long-term solution that ignores the immediate issue of serving inefficiency; the model size hasn't changed, and the problem is runtime performance, not model accuracy. Option D is wrong because moving to a GPU or more CPU cores treats the symptom (high CPU) rather than the cause; the minimal improvement from doubling vCPUs suggests the bottleneck is in software, not hardware, and a GPU would not fix inefficient code paths.

23
MCQhard

You have a model that predicts equipment failure. The model is retrained every week with new data. You notice that the model's precision is stable but recall drops suddenly. Which monitoring strategy would best help you understand the cause?

A.Monitor feature drift for all input features.
B.Monitor the distribution of the model's predicted probabilities and compare to the empirical failure rate over time.
C.Compare the number of predictions per day with previous weeks.
D.Check the request latency at the endpoint.
AnswerB

This helps detect concept drift: if predicted probabilities shift relative to actual outcomes, recall may drop.

Why this answer

Option B is correct because a drop in recall (more false negatives) while precision stays stable suggests the model's decision threshold may be misaligned with the current data distribution. Monitoring the distribution of predicted probabilities against the empirical failure rate over time directly reveals if the model's confidence calibration has shifted, indicating concept drift or a change in the underlying failure rate that requires threshold recalibration.
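
One way to picture this check, as a sketch only: compare the mean predicted probability with the observed failure rate for each time window, and compute recall at a fixed threshold. The weekly numbers and the 0.5 threshold are assumptions made for the example.

```python
def calibration_gap(predicted_probs, outcomes):
    """Mean predicted probability minus observed positive rate."""
    if len(predicted_probs) != len(outcomes) or len(outcomes) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mean_pred = sum(predicted_probs) / len(predicted_probs)
    observed_rate = sum(outcomes) / len(outcomes)
    return mean_pred - observed_rate

def recall_at(predicted_probs, outcomes, threshold=0.5):
    """Fraction of actual positives scored at or above the threshold."""
    actual_pos = sum(outcomes)
    if actual_pos == 0:
        raise ValueError("no positive outcomes in this window")
    true_pos = sum(
        1 for p, y in zip(predicted_probs, outcomes)
        if y == 1 and p >= threshold
    )
    return true_pos / actual_pos

# Hypothetical windows: (predicted probabilities, actual failures).
week_1 = ([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
week_2 = ([0.9, 0.4, 0.1, 0.3], [1, 1, 0, 1])

for name, (probs, labels) in (("week 1", week_1), ("week 2", week_2)):
    print(name, calibration_gap(probs, labels), recall_at(probs, labels))
```

In the second window the model under-predicts failures (negative gap) and misses two of three actual failures, while the one prediction above the threshold is still correct, so precision is unaffected.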

Exam trap

Google Cloud often tests the distinction between data drift (feature drift) and changes on the label side (concept drift or a shift in the label prior), and the trap here is that candidates assume any performance degradation must be due to feature drift, ignoring that a stable precision with dropping recall specifically signals a threshold or label distribution issue best diagnosed via probability calibration monitoring.

How to eliminate wrong answers

Option A is wrong because monitoring feature drift for all input features is too broad and may not directly explain a recall drop; feature drift can cause both precision and recall to change, but a stable precision with dropping recall points to a threshold or label distribution issue, not necessarily input feature drift. Option C is wrong because comparing the number of predictions per day with previous weeks only detects volume anomalies (e.g., traffic spikes), which do not affect recall directly and would not explain a systematic increase in false negatives. Option D is wrong because checking request latency at the endpoint measures infrastructure performance (e.g., network delays, compute bottlenecks), which has no causal link to model prediction quality like recall degradation.

24
MCQmedium

A team is using Vertex AI Feature Store to manage features for training and serving. They want to monitor the freshness of the features (i.e., how recently each feature was updated). Which approach should they take?

A.Use Cloud Logging to track feature updates
B.Use Vertex AI Feature Store's monitoring dashboard
C.Create a custom Cloud Monitoring metric based on feature ingestion timestamps
D.Use Cloud Audit Logs to monitor API calls
AnswerC

By exporting timestamps as custom metrics, the team can monitor feature freshness in Cloud Monitoring and set alerts.

Why this answer

Vertex AI Feature Store does not provide a built-in monitoring dashboard for feature freshness. To track how recently each feature was updated, you must create a custom Cloud Monitoring metric based on feature ingestion timestamps, which allows you to define alerting thresholds and visualize freshness over time.
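
The freshness calculation behind such a custom metric might look like the sketch below. It assumes the ingestion timestamps have already been collected into a dictionary; how they are read from the feature store and how the values are published to Cloud Monitoring is left out. The feature names and the one-hour limit are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def staleness_seconds(last_ingested, now):
    """Seconds elapsed since each feature was last ingested."""
    return {
        name: (now - ts).total_seconds()
        for name, ts in last_ingested.items()
    }

def stale_features(last_ingested, now, max_age):
    """Names of features whose age exceeds max_age, sorted."""
    ages = staleness_seconds(last_ingested, now)
    limit = max_age.total_seconds()
    return sorted(name for name, age in ages.items() if age > limit)

# Hypothetical snapshot of ingestion timestamps.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_ingested = {
    "income": now - timedelta(minutes=30),
    "age_bucket": now - timedelta(hours=5),
}

print(staleness_seconds(last_ingested, now))
print(stale_features(last_ingested, now, max_age=timedelta(hours=1)))
```

Each staleness value would be the data point written to the custom metric, with an alert policy on the chosen maximum age.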

Exam trap

The trap here is that candidates assume Vertex AI Feature Store has a built-in freshness monitoring dashboard, but it only provides monitoring for distribution drift and skew, not for update timestamps.

How to eliminate wrong answers

Option A is wrong because Cloud Logging captures log entries but is not designed for real-time metric-based monitoring of feature freshness; it would require parsing logs and creating custom metrics, which is less direct than using Cloud Monitoring. Option B is wrong because Vertex AI Feature Store's monitoring dashboard focuses on feature value distribution drift and skew, not on freshness or update timestamps. Option D is wrong because Cloud Audit Logs record API calls for compliance and security, not the actual data update timestamps needed to measure feature freshness.

25
MCQeasy

A data scientist wants to log prediction inputs and outputs for model monitoring. Which Google Cloud service is best suited for this?

A.Cloud Monitoring
B.Cloud Storage
C.Cloud Logging
D.BigQuery
AnswerC

Cloud Logging can ingest and store prediction logs.

Why this answer

Cloud Logging is the best choice because it is designed to ingest, store, and analyze log data, including custom log entries from applications. The data scientist can use the Cloud Logging API to write structured log entries containing prediction inputs and outputs, then query them using Logs Explorer or export them for further analysis. This aligns with the requirement to log prediction inputs and outputs for model monitoring, as Cloud Logging provides a centralized, scalable, and queryable log management service.
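
As a sketch of what a structured prediction log entry could contain, the snippet below builds one JSON line per prediction using only the standard library. It assumes an environment in which JSON lines written to standard output are collected as structured log entries; the field names are illustrative, not a required schema.

```python
import json
from datetime import datetime, timezone

def build_prediction_log(model_version, instance, prediction, logged_at=None):
    """Serialize one prediction (inputs and outputs) as a JSON line."""
    when = logged_at or datetime.now(timezone.utc)
    entry = {
        "severity": "INFO",
        "message": "prediction",
        "model_version": model_version,  # hypothetical field name
        "logged_at": when.isoformat(),   # hypothetical field name
        "instance": instance,            # model inputs
        "prediction": prediction,        # model outputs
    }
    return json.dumps(entry, sort_keys=True)

# Hypothetical request and response.
line = build_prediction_log(
    model_version="v3",
    instance={"income": 52000, "age": 41},
    prediction={"label": "approve", "score": 0.87},
    logged_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(line)
```

Keeping inputs and outputs in the same entry makes it possible to query them together later or export them for drift analysis.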

Exam trap

Google Cloud often tests the distinction between logging (Cloud Logging) and monitoring (Cloud Monitoring), where candidates mistakenly choose Cloud Monitoring because they think 'monitoring' includes logging, but Cloud Monitoring is for metrics and alerts, not for storing and querying log data.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring is focused on collecting metrics, uptime checks, and alerting on system performance (e.g., CPU utilization, latency), not on storing and querying arbitrary log data like prediction inputs and outputs. Option B is wrong because Cloud Storage is an object storage service for unstructured data (e.g., images, backups), not a log management service; it lacks native querying capabilities for log entries and is not designed for real-time log ingestion and search. Option D is wrong because BigQuery is a serverless data warehouse for analytical queries on large structured datasets, not a log management service; while it can store logs exported from Cloud Logging, it is not the primary service for ingesting and querying log entries in real time.

26
Multi-Selecteasy

Which TWO actions are appropriate when you detect that a production model's prediction distribution has shifted significantly from the training distribution?

Select 2 answers
A.Immediately roll back to the previous model version
B.Increase logging for future predictions
C.Retrain the model using the most recent data
D.Investigate the cause of the shift before taking corrective action
E.Reduce the traffic to the model to minimize impact
AnswersC, D

Adapts model to new distribution.

Why this answer

Option C is correct because retraining the model on the most recent data directly addresses the distribution shift by adapting the model to the new data patterns. This is a standard practice in MLOps when the shift is confirmed and the cause is understood, ensuring the model remains accurate and reliable in production.

Exam trap

Google Cloud often tests the misconception that immediate rollback or traffic reduction is the correct first action, when in fact the proper response is to investigate the cause before taking corrective action like retraining.

27
MCQmedium

A company implements an ML pipeline using Vertex AI Pipelines. The pipeline trains a model using custom training jobs and then deploys it to an endpoint. The team notices that the endpoint occasionally serves an older model version for a few minutes after a new pipeline run completes. What is the most likely cause?

A.The new model artifact is temporarily unavailable, so the endpoint falls back to the previous version.
B.The prediction cache is returning cached results from the old model.
C.The pipeline failed to update the endpoint with the new model ID.
D.The endpoint is configured with a canary traffic split, and the old model is still receiving a fraction of traffic during the rollout.
AnswerD

Canary deployments gradually shift traffic, so some requests hit the old model until the rollout is complete.

Why this answer

D is correct because Vertex AI endpoints can be configured with a canary (gradual) traffic rollout strategy. When a new model is deployed, traffic is shifted incrementally from the old model to the new one over a specified duration. During this rollout window, the old model continues to serve a fraction of requests, which explains why users occasionally see the older model version for a few minutes after the pipeline completes.

Exam trap

The trap here is that candidates confuse a canary rollout with a deployment failure or caching issue, assuming the old model persists due to an error rather than recognizing it as an intentional traffic-splitting mechanism during a gradual rollout.

How to eliminate wrong answers

Option A is wrong because Vertex AI endpoints do not automatically fall back to a previous model version when a new artifact is temporarily unavailable; instead, the deployment would fail or the endpoint would return an error. Option B is wrong because Vertex AI endpoints do not have a built-in prediction cache that returns cached results from an old model; caching is not a default behavior for model serving. Option C is wrong because if the pipeline failed to update the endpoint with the new model ID, the endpoint would consistently serve the old model, not just occasionally for a few minutes.

28
MCQeasy

You have an online prediction model that is showing increasing prediction latency. You have already verified that the request rate and input data size are unchanged. Which of the following should you investigate next?

A.Check if the model was recently updated to a larger version
B.Check the monitoring dashboard configuration
C.Check if the feature engineering logic was changed
D.Check the geographic location of the endpoint
AnswerA

Larger model increases inference latency.

Why this answer

If request rate and input data size are unchanged, increased prediction latency often points to a change in the model itself. A larger model (e.g., deeper neural network, more parameters) requires more computation per inference, directly increasing latency. This is a common root cause when monitoring ML pipelines, as model version updates can silently alter performance characteristics.

Exam trap

Google Cloud often tests the distinction between network-level latency (e.g., geographic location) and compute-level latency (e.g., model size), tempting candidates to pick the geographic option when the root cause is model-related.

How to eliminate wrong answers

Option B is wrong because the monitoring dashboard configuration only affects how metrics are displayed or alerted, not the underlying latency of predictions. Option C is wrong because feature engineering logic changes would alter input data size or structure, but the question states input data size is unchanged. Option D is wrong because the geographic location of the endpoint affects network latency, not the model's prediction latency (which is server-side compute time).

29
MCQmedium

You are monitoring a machine learning pipeline that runs on Vertex AI Pipelines. The pipeline occasionally fails with a 'ResourceExhausted' error when attempting to read data from BigQuery. Which action should you take to resolve this issue?

A.Switch from BigQuery to Cloud Storage for data source
B.Increase the memory allocated to the pipeline step
C.Reduce the complexity of the BigQuery query or increase the reservation size
D.Reduce the batch size of the data being read
AnswerC

ResourceExhausted error is due to BigQuery limits; simplifying query or increasing slots can help.

Why this answer

The 'ResourceExhausted' error when reading from BigQuery indicates that the query is consuming more resources than the BigQuery reservation allows. Option C is correct because reducing query complexity (e.g., using fewer JOINs and aggregations, or scanning fewer partitions) or increasing the reservation size directly addresses the root cause by either lowering resource demand or allocating more capacity. Other options like switching to Cloud Storage or adjusting pipeline memory do not fix the BigQuery-specific quota or slot exhaustion.

Exam trap

Google Cloud often tests the misconception that memory or batch size adjustments in the pipeline environment can fix backend service quota errors, when in fact the error is specific to BigQuery's resource management (slots/queries) and requires query optimization or reservation changes.

How to eliminate wrong answers

Option A is wrong because switching to Cloud Storage does not resolve the BigQuery resource exhaustion; it changes the data source but introduces new latency and format compatibility issues without addressing the query's resource consumption. Option B is wrong because increasing memory allocated to the pipeline step only affects the compute environment (e.g., the container running the pipeline), not the BigQuery service's slot or query quota limits. Option D is wrong because reducing the batch size of data being read may reduce memory pressure on the pipeline but does not affect the BigQuery query's resource usage; the error originates from BigQuery's backend, not from the client-side read volume.

30
Multi-Selecthard

An ML engineer is building a monitoring dashboard for a Vertex AI pipeline that includes training, evaluation, and batch prediction. Which THREE components should be included to provide comprehensive observability? (Select THREE.)

Select 3 answers
A.Pipeline execution status, duration, and failure rates for each component.
B.Compute engine CPU and memory logs for each pipeline step.
C.Model evaluation metrics (e.g., accuracy, AUC) after training and validation.
D.Data validation reports showing anomaly counts and feature statistics.
E.Online prediction latency and request count from the deployed model endpoint.
AnswersA, C, D

Core pipeline health metrics.

Why this answer

Option A is correct because pipeline execution status, duration, and failure rates are fundamental metrics for monitoring the health and performance of a Vertex AI pipeline. These metrics allow the ML engineer to quickly identify bottlenecks, track overall workflow progress, and detect failures in training, evaluation, or batch prediction steps, which is essential for comprehensive observability.

Exam trap

The trap here is that candidates often confuse infrastructure monitoring (CPU/memory logs) or serving-layer metrics (online prediction latency) with pipeline-specific observability, leading them to select options that are relevant to different stages of the ML lifecycle rather than the pipeline itself.

31
MCQhard

A retail company deployed a demand forecasting model using TensorFlow on Vertex AI Batch Prediction. The model runs weekly on a large dataset stored in BigQuery. Over the past month, the prediction accuracy has degraded significantly. The ML engineer reviews the monitoring dashboard and sees that the feature distribution for 'product_price' has shifted from a mean of $50 to $55, and the new product category 'electronics' now represents 20% of the data, whereas it was only 5% in training. The model was never retrained after initial deployment six months ago. The engineer also notices that the Vertex Explainable AI feature importance scores have changed: 'product_price' used to be the top feature (importance 0.35) but now ranks third (importance 0.20). The company requires minimal downtime and wants to improve accuracy as quickly as possible without incurring high costs from excessive retraining. Which course of action should the ML engineer take?

A.Increase the complexity of the model by switching from a feedforward neural network to a gradient boosted tree ensemble, and then deploy without retraining.
B.Route all predictions to human reviewers until the model can be re-evaluated, and then manually correct the outputs.
C.Retrain the model using the most recent 3 months of data, including all new product categories, and deploy the updated model via a new Vertex AI endpoint.
D.Adjust the prediction threshold for the 'product_price' feature to account for the price shift, and monitor for another month.
AnswerC

Retraining with recent data addresses both covariate shift and concept drift, and is the standard approach for maintaining accuracy.

Why this answer

The correct action is to retrain the model with the latest data (Option C) because the feature distributions and data composition have changed significantly (covariate shift and concept drift). Switching to a more complex model and deploying it without retraining (Option A) does nothing about the underlying drift and may overfit. Adjusting a threshold for 'product_price' (Option D) is insufficient because the shift affects more than one feature, and the model's predictions are likely inaccurate across the board.

Routing all predictions to human reviewers (Option B) is costly and not scalable; retraining is the proper response.

32
MCQmedium

A machine learning engineer notices that the online prediction latency for a custom TensorFlow model deployed on Vertex AI has increased significantly over the past week. Cloud Monitoring shows that the CPU utilization of the endpoints remains below 40%, but the number of concurrent requests has doubled. What is the most likely cause of the latency increase?

A.Data skew causing longer inference time
B.Memory leak in the serving container
C.Insufficient number of replicas for autoscaling
D.Model overfitting
AnswerC

If the number of replicas is not scaling fast enough to match increased concurrency, requests queue up, leading to higher latency while each replica's CPU is underutilized.

Why this answer

Option C is correct because the CPU utilization remains below 40% while concurrent requests have doubled, indicating that the existing replicas are not saturated on CPU but are bottlenecked by request queuing or thread contention. Vertex AI autoscaling scales based on CPU utilization by default; if the threshold is not crossed, new replicas are not provisioned, causing requests to queue and latency to spike. The engineer should verify the autoscaling configuration and consider scaling on request count or reducing the CPU utilization target.

Exam trap

Google Cloud often tests the misconception that low CPU utilization always means there is spare capacity, when in reality the bottleneck can be request queuing or thread pool exhaustion that does not raise CPU usage.

How to eliminate wrong answers

Option A is wrong because data skew would cause a persistent increase in per-request inference time, but the observation shows CPU utilization is low and latency increased only after request volume doubled, not due to a change in data distribution. Option B is wrong because a memory leak would manifest as increasing memory usage over time, potentially causing OOM kills or garbage collection pauses, but the described symptom is low CPU and doubled concurrency, not memory pressure. Option D is wrong because model overfitting affects prediction accuracy, not inference latency; overfitting does not change the computational cost of a forward pass.

33
MCQeasy

An ML engineer is monitoring a Vertex AI Feature Store used for online serving. Which metrics are most important to track for ensuring low-latency online serving?

A.Number of feature stores and feature values.
B.Storage utilization and write throughput to the feature store.
C.Batch export duration and number of exported features.
D.Feature value retrieval latency (p99) and error rate.
AnswerD

These directly affect online serving performance.

Why this answer

For online serving, the primary concern is the latency and reliability of feature value retrieval at inference time. The p99 retrieval latency directly measures the worst-case delay experienced by users, while the error rate captures failures that could cause serving disruptions. Other metrics like storage utilization or batch export duration are relevant for offline or batch pipelines, not real-time serving.
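
To make the two metrics concrete, here is a small sketch that computes a nearest-rank p99 and an error rate from raw samples. The sample latencies and the convention that status codes of 500 and above count as errors are assumptions for the example.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a sample."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def error_rate(status_codes):
    """Share of responses with a server-side error status."""
    if not status_codes:
        raise ValueError("status_codes must be non-empty")
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)

# Hypothetical samples: latencies of 1..100 ms and 5 failures in 100 calls.
latencies_ms = list(range(1, 101))
codes = [200] * 95 + [500] * 5

print(percentile(latencies_ms, 99))
print(error_rate(codes))
```

The p99 looks only at the slow tail, which is why it reacts to problems that an average latency would hide.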

Exam trap

The trap here is that candidates confuse metrics for offline batch operations (like export duration) with those for online serving, or assume that storage-level metrics (like utilization) are sufficient for performance monitoring, when in fact only retrieval latency and error rate directly reflect the serving quality.

How to eliminate wrong answers

Option A is wrong because the number of feature stores and feature values does not directly impact serving latency; it is a capacity planning metric, not a performance indicator. Option B is wrong because storage utilization and write throughput are important for data ingestion and maintenance, but they do not measure the online retrieval performance that affects inference latency. Option C is wrong because batch export duration and number of exported features pertain to offline batch serving or data export jobs, not the low-latency online serving path.

34
MCQmedium

A data science team uses TFX to train and deploy a model on Vertex AI. They want automated monitoring for pipeline health. Which set of metrics should they monitor to quickly detect issues in the training pipeline?

A.Prediction request count, latency, and error rate on the serving endpoint.
B.Pipeline execution status (success/failure), component completion times, and data validation anomalies.
C.Number of pipeline runs, average CPU utilization, and memory usage.
D.Model accuracy, precision, and recall on the evaluation dataset.
AnswerB

Directly monitors pipeline health including data quality.

Why this answer

Option B is correct because the question specifically asks about monitoring the training pipeline's health, not the serving infrastructure. Pipeline execution status directly indicates whether the pipeline ran successfully, component completion times help identify bottlenecks or failures, and data validation anomalies catch data quality issues early in the pipeline — all of which are essential for detecting issues in the training pipeline itself.

Exam trap

The trap here is that candidates confuse serving endpoint metrics (like latency and error rate) with pipeline health metrics, because both are part of an ML system, but the question explicitly asks about the training pipeline, not the serving infrastructure.

How to eliminate wrong answers

Option A is wrong because prediction request count, latency, and error rate are metrics for monitoring the serving endpoint (model serving), not the training pipeline. Option C is wrong because number of pipeline runs, average CPU utilization, and memory usage are infrastructure-level metrics that do not directly indicate pipeline health or data quality issues. Option D is wrong because model accuracy, precision, and recall are evaluation metrics for model performance, not for detecting issues in the training pipeline's execution or data validation.

35
Multi-Selectmedium

Which TWO metrics should you monitor to detect data drift in a batch prediction pipeline?

Select 2 answers
A.Model accuracy on recent labeled data
B.Model prediction latency
C.Feature distribution drift (e.g., KS test)
D.Prediction distribution drift
E.Training data size
AnswersC, D

Directly measures input drift.

Why this answer

Feature distribution drift (C) is correct because it directly measures changes in the input data distribution over time using statistical tests like the Kolmogorov-Smirnov (KS) test, which compares the cumulative distribution of a feature in the current batch against a reference baseline. This is a primary indicator of data drift, as shifts in feature distributions can degrade model performance even if labels are not immediately available.
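
A minimal, dependency-free sketch of the two-sample KS statistic described above: the largest gap between the empirical CDFs of the baseline and the current batch. It omits the p-value, and any alert threshold applied to the statistic would be a choice left to the team.

```python
import bisect

def ks_statistic(baseline, current):
    """Two-sample KS statistic: max distance between empirical CDFs."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    a = sorted(baseline)
    b = sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Hypothetical feature values from a reference batch and a recent batch.
reference = [1, 2, 3, 4]
recent = [3, 4, 5, 6]

print(ks_statistic(reference, reference))  # identical samples
print(ks_statistic(reference, recent))     # partially shifted sample
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means they do not overlap at all.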

Exam trap

Google Cloud often tests the distinction between monitoring for data drift (input distribution changes) versus monitoring for model performance degradation (accuracy), leading candidates to incorrectly select accuracy as a drift metric when it is actually a downstream effect.

36
MCQmedium

You have deployed a regression model that predicts house prices. Over the past month, the model's predictions have been consistently too high. You suspect data drift in the input features. Which monitoring metric should you prioritize to confirm this?

A.Monitor prediction drift (prediction distribution)
B.Monitor feature distribution drift using a divergence metric like Jensen-Shannon divergence
C.Monitor feature attribution drift using SHAP values
D.Monitor residual distribution drift
AnswerB

Feature drift measures input distribution change.

Why this answer

Option B is correct because the question describes a scenario where predictions are consistently too high, which is a symptom of data drift—a change in the distribution of input features. Monitoring feature distribution drift using a divergence metric like Jensen-Shannon divergence directly measures whether the input data has shifted from the training distribution, which would cause the model to make biased predictions. This is the most direct way to confirm data drift in the input features.
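
For reference, Jensen-Shannon divergence between two binned feature distributions can be sketched as follows. The histogram counts are made up, and the choice of bins is an assumption that a real monitoring job would have to make.

```python
import math

def normalize(counts):
    """Turn histogram counts into probabilities."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, between 0 and 1."""
    if len(p) != len(q):
        raise ValueError("distributions must use the same bins")
    # The midpoint m is positive wherever p or q is, so no division by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical histograms of one feature over the same three bins.
training = normalize([10, 10, 0])
serving = normalize([0, 10, 10])

print(js_divergence(training, training))
print(js_divergence(training, serving))
```

Unlike plain KL divergence, this measure is symmetric and stays finite when a bin is empty in one of the two distributions, which makes it convenient to compare against a fixed threshold.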

Exam trap

Google Cloud often tests the distinction between monitoring prediction drift (output) and feature drift (input), trapping candidates who assume that a change in predictions automatically implies data drift without verifying the input distributions.

How to eliminate wrong answers

Option A is wrong because monitoring prediction drift (prediction distribution) only tells you that the outputs have changed, not why; it does not isolate whether the cause is data drift in features or other issues like concept drift. Option C is wrong because monitoring feature attribution drift using SHAP values measures changes in feature importance, not changes in the feature distributions themselves; it can indicate which features are driving predictions differently but does not directly confirm data drift. Option D is wrong because monitoring residual distribution drift focuses on the errors (residuals) between predictions and actual values, which can be influenced by both data drift and concept drift; it does not specifically confirm data drift in input features.

37
Multi-Selecteasy

An ML team wants to monitor their recommendation model for fairness. Which TWO metrics should they track to detect potential bias? (Select TWO.)

Select 2 answers
A.Pair-wise fairness metrics such as equal opportunity difference.
B.Recall for the minority group only.
C.Overall accuracy on the test set.
D.Average prediction confidence per request.
E.Prediction distribution (e.g., top-K recommendations) across different sensitive attribute groups.
AnswersA, E

Standard fairness metric.

Why this answer

Pair-wise fairness metrics like equal opportunity difference directly compare model outcomes (e.g., true positive rates) across sensitive groups, making them a standard tool for detecting bias in classification tasks. This metric measures the difference in true positive rates between privileged and unprivileged groups, where a value close to zero indicates fairness. Comparing the prediction distribution (e.g., top-K recommendations) across sensitive attribute groups, as in Option E, complements this by showing whether groups receive systematically different recommendations. Tracking both aligns with the core principle of monitoring for disparate impact in ML systems.

Exam trap

Google Cloud often tests the misconception that overall accuracy or group-specific recall alone is sufficient for fairness monitoring, when in fact comparative metrics across groups are required to detect bias.
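
As an illustration of the comparative metric in Option A, the sketch below computes equal opportunity difference from binary labels and predictions. The data and group names are made up, and the sign convention (unprivileged minus privileged) is an assumption; some tools report the opposite sign.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("group has no positive labels")
    return sum(preds_for_positives) / len(preds_for_positives)

def equal_opportunity_difference(y_true, y_pred, group, unprivileged, privileged):
    """TPR(unprivileged) - TPR(privileged); values near 0 suggest parity."""
    def tpr_for(name):
        idx = [i for i, g in enumerate(group) if g == name]
        return true_positive_rate([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])
    return tpr_for(unprivileged) - tpr_for(privileged)

# Hypothetical evaluation data: 1 = relevant item / positive outcome.
y_true = [1, 1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 1, 1, 1]
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]

eod = equal_opportunity_difference(y_true, y_pred, group,
                                   unprivileged="a", privileged="b")
```

Here group "a" has a true positive rate of 2/3 and group "b" has 1, so the difference is negative, meaning the unprivileged group's actual positives are recognized less often.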

38
MCQmedium

A company deploys a classification model on Vertex AI for loan approval. After a month, they notice the precision has dropped significantly. What should they do first?

A.Retrain the model with more data
B.Increase the number of prediction nodes
C.Check for data drift using Vertex AI Model Monitoring
D.Revert to the previous model version
AnswerC

Model monitoring is designed to detect drift, which could cause precision drop.

Why this answer

Option C is correct because a significant drop in precision indicates that the model's predictions are no longer aligning with the ground truth, which is commonly caused by data drift. Vertex AI Model Monitoring can automatically detect drift in feature distributions or prediction output compared to a baseline, allowing you to identify the root cause before taking corrective action. Retraining or reverting without first diagnosing the drift could waste resources or mask the underlying issue.

Exam trap

Google Cloud often tests the misconception that any performance degradation should be immediately fixed by retraining or rolling back, rather than first diagnosing the cause through monitoring tools like Vertex AI Model Monitoring.

How to eliminate wrong answers

Option A is wrong because retraining with more data does not address the root cause if the data distribution has shifted; it may even reinforce the drift if the new data is also drifted. Option B is wrong because increasing prediction nodes only improves throughput and latency, not prediction quality or precision. Option D is wrong because reverting to a previous model version is a reactive rollback that does not diagnose why precision dropped; the old model may also suffer from drift if the environment has changed.

39
MCQeasy

A data science team deploys a regression model to predict house prices. After one month, the mean absolute error (MAE) on the serving data increases by 20% compared to the test set. Which monitoring strategy should the team implement first to diagnose the issue?

A.Retrain the model daily with the latest data to adapt to changing patterns.
B.Monitor prediction residuals and compute serving-time MAE over sliding windows.
C.Compare the distribution of training labels with serving labels using a two-sample t-test.
D.Monitor input feature distributions for drift using the Kolmogorov-Smirnov test.
AnswerB

Directly tracking MAE on serving data over time is the most straightforward diagnostic for performance degradation.

Why this answer

Option B is correct because the first step in diagnosing a 20% MAE increase on serving data is to monitor prediction residuals over sliding windows. This directly tracks how model errors evolve in production, allowing the team to detect whether performance degradation is sudden or gradual, and to correlate it with specific time windows or data slices. Computing serving-time MAE on sliding windows provides an immediate, interpretable signal of model health without assuming the root cause.

Exam trap

Google Cloud often tests the misconception that the first step in diagnosing model degradation is to check for data drift (Option D), when in fact the correct first step is to confirm and quantify the performance drop itself using serving-time metrics like sliding-window MAE.

How to eliminate wrong answers

Option A is wrong because retraining daily without first diagnosing the cause of the MAE increase is a reactive, resource-intensive approach that may mask underlying issues like data drift or concept drift, and does not help identify whether retraining is even necessary. Option C is wrong because comparing training labels with serving labels using a two-sample t-test checks for label distribution shift, but the MAE increase could be due to feature drift, concept drift, or data quality issues unrelated to label distribution; this test is too narrow and may miss the actual cause. Option D is wrong because monitoring input feature distributions for drift using the Kolmogorov-Smirnov test is a valid technique, but it is a secondary diagnostic step; the first priority should be to confirm and characterize the performance degradation itself via residual monitoring before investigating potential causes.
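
The sliding-window MAE described in Option B can be sketched in a few lines. This is a minimal illustration, assuming ground truth arrives for each prediction; the class name and window size are hypothetical.

```python
from collections import deque

class SlidingWindowMAE:
    """Mean absolute error over the most recent `window_size` predictions."""

    def __init__(self, window_size):
        # deque with maxlen silently drops the oldest error when full.
        self.errors = deque(maxlen=window_size)

    def update(self, prediction, actual):
        """Record one residual and return the current windowed MAE."""
        self.errors.append(abs(prediction - actual))
        return sum(self.errors) / len(self.errors)

# Hypothetical house-price predictions vs. observed sale prices.
monitor = SlidingWindowMAE(window_size=3)
history = [monitor.update(p, a) for p, a in [
    (300_000, 310_000),
    (250_000, 245_000),
    (400_000, 460_000),
    (320_000, 400_000),
]]
```

Plotting the returned values over time shows whether the error rose suddenly or gradually, which narrows down what to investigate next.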

40
MCQmedium

A company deploys a batch prediction job on Vertex AI using a custom container. The job completes successfully, but the predictions are later found to be inaccurate. The ML engineer wants to set up monitoring to detect similar issues proactively. Which approach should the engineer take?

A.Use Cloud Monitoring to create a custom metric for prediction confidence and set an alert when confidence drops below 0.8.
B.Use Cloud Logging to export prediction requests and responses, then create a metric based on prediction count.
C.Export batch predictions to BigQuery, and use Vertex AI Model Monitoring to compare prediction distributions against a baseline.
D.Enable Cloud Audit Logs to track when the batch prediction job runs and analyze the logs for anomalies.
AnswerC

Model Monitoring detects drift by comparing predictions to a baseline.

Why this answer

Option C is correct because Vertex AI Model Monitoring can compare the distribution of batch prediction outputs (stored in BigQuery) against a baseline distribution to detect data drift or skew, which is the most direct way to proactively identify prediction inaccuracies. This approach monitors the statistical properties of predictions over time, catching shifts that could cause accuracy degradation even when the job runs successfully.

Exam trap

The trap here is that candidates assume monitoring prediction confidence or logging request counts is sufficient for detecting inaccuracies, but the PMLE exam specifically tests the concept of distribution drift monitoring as the correct proactive approach for batch prediction quality.

How to eliminate wrong answers

Option A is wrong because prediction confidence is a model-specific output (e.g., softmax probabilities) that may not exist for all models (e.g., regression models), and a fixed threshold of 0.8 is arbitrary; the question requires detecting inaccuracies proactively, not monitoring a single confidence score. Option B is wrong because exporting prediction requests/responses to Cloud Logging and creating a metric based on prediction count only tracks volume, not prediction quality or drift; count metrics cannot detect inaccuracies. Option D is wrong because Cloud Audit Logs track administrative actions (e.g., who ran the job), not the prediction data itself; analyzing audit logs for anomalies would not reveal prediction inaccuracies.

41
MCQhard

Refer to the exhibit. An alert policy is configured to trigger when prediction latency exceeds 500 ms for 5 consecutive minutes. The team is experiencing many false positive alerts during brief latency spikes. Which adjustment would most effectively reduce false positives while still detecting prolonged latency issues?

A.Change the comparison to less than
B.Add a condition that CPU utilization is also high
C.Increase the duration to 30 minutes
D.Increase the threshold to 1000 ms
AnswerC

A longer duration means the condition must persist for 30 minutes, filtering out brief spikes while still catching sustained high latency.

Why this answer

Increasing the duration from 5 to 30 minutes (Option C) directly addresses the problem of false positives from brief latency spikes by requiring the latency to exceed 500 ms for a longer continuous period before triggering an alert. This ensures that only sustained, prolonged latency issues—not transient spikes—activate the policy, aligning with the goal of detecting genuine degradation while ignoring noise.

Exam trap

Google Cloud often tests the distinction between threshold and duration adjustments, trapping candidates who think raising the threshold (Option D) is the only way to reduce false positives, when in fact increasing the evaluation window is more precise for filtering out transient spikes without compromising detection of sustained issues.

How to eliminate wrong answers

Option A is wrong because changing the comparison to 'less than' would invert the logic, triggering alerts when latency is below 500 ms, which is the opposite of detecting high latency and would generate false positives for normal or low-latency conditions. Option B is wrong because adding a condition that CPU utilization is also high introduces an unnecessary dependency that may miss prolonged latency issues caused by other factors (e.g., network bottlenecks, memory pressure, or I/O wait), and it does not address the core problem of brief latency spikes. Option D is wrong because increasing the threshold to 1000 ms would allow sustained latency between 500 ms and 1000 ms to go undetected, failing to capture prolonged issues that still violate the original 500 ms requirement, and it does not filter out brief spikes.
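
The effect of the duration setting can be seen in a simplified simulation. This is not Cloud Monitoring's actual evaluation logic; it assumes one latency sample per minute and fires only when the threshold is exceeded for the required number of consecutive minutes.

```python
def should_alert(latencies_ms, threshold_ms, duration_minutes):
    """Return True if latency stays above the threshold for
    `duration_minutes` consecutive one-minute samples."""
    consecutive = 0
    for latency in latencies_ms:
        # Any sample at or below the threshold resets the streak.
        consecutive = consecutive + 1 if latency > threshold_ms else 0
        if consecutive >= duration_minutes:
            return True
    return False

# Hypothetical traffic: a 6-minute spike vs. 45 minutes of sustained latency.
brief_spike = [200] * 10 + [800] * 6 + [200] * 10
sustained = [800] * 45

spike_alert_5 = should_alert(brief_spike, 500, 5)     # fires on the spike
spike_alert_30 = should_alert(brief_spike, 500, 30)   # spike is filtered out
sustained_alert_30 = should_alert(sustained, 500, 30)  # still detected
```

The 500 ms threshold is unchanged in all three calls, so latency between 500 ms and 1000 ms would still be caught if it persisted, unlike with Option D.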

42
Multi-Selectmedium

A machine learning engineer is monitoring a deployed churn prediction model that has shown a gradual decline in accuracy over the past month. The engineer wants to diagnose the root cause of the performance degradation. Which TWO actions should the engineer take? (Choose two.)

Select 2 answers
A.Increase the model's learning rate and fine-tune it on the latest data.
B.Immediately retrain the model using all available historical data to improve accuracy.
C.Deploy a second model in parallel to compare predictions.
D.Use Vertex AI Model Monitoring to detect data drift by comparing the distribution of recent input features against the training data distribution.
E.Monitor the model's prediction accuracy by comparing recent predictions against newly collected ground truth labels.
AnswersD, E

Detecting data drift helps identify if the input distribution has changed, which often causes prediction drift.

Why this answer

Option D is correct because Vertex AI Model Monitoring is specifically designed to detect data drift by comparing the distribution of recent input features against the training data distribution. This allows the engineer to identify if the gradual decline in accuracy is caused by changes in the input data, which is a common root cause for model performance degradation over time. Option E is correct because comparing recent predictions against newly collected ground truth labels quantifies the accuracy decline directly and shows when it began.

Exam trap

The trap here is that candidates often confuse reactive retraining (Option B) with diagnostic monitoring, failing to recognize that the first step in troubleshooting performance degradation is to identify the root cause through drift detection and ground truth comparison, not to immediately modify or retrain the model.

43
MCQmedium

A company deploys a custom ML model on Vertex AI to predict customer churn. The model retrains weekly, and predictions are served via a Vertex AI endpoint. After a recent retraining, the monitoring dashboard shows a sudden increase in prediction requests but a decrease in predicted churn probabilities. The model's accuracy on the validation set remains stable. What is the most likely cause of the observed behavior?

A.A training-serving skew exists between the training pipeline and the serving endpoint.
B.Concept drift has occurred, changing the relationship between features and churn.
C.The incoming data distribution has changed, e.g., due to a new marketing campaign attracting different customers.
D.Data leakage during training caused the model to overfit to historical patterns.
AnswerC

This is covariate shift; the model sees a different input population than it was trained on, leading to lower predicted churn probabilities.

Why this answer

Option C is correct because a sudden increase in prediction requests alongside a decrease in predicted churn probabilities, while validation accuracy remains stable, indicates a shift in the incoming data distribution (covariate shift). This is typical when a new marketing campaign attracts a different customer segment that inherently has lower churn risk. The model itself hasn't degraded; it's simply seeing a different population than it was trained on, which changes the base rate of churn in the live traffic.

Exam trap

Google Cloud often tests the distinction between covariate shift (data distribution change) and concept drift (relationship change), trapping candidates who assume any change in predictions must be due to model degradation or data leakage.

How to eliminate wrong answers

Option A is wrong because training-serving skew refers to a mismatch in feature preprocessing or data format between training and serving, which would typically cause a drop in accuracy or anomalous predictions, not a stable validation accuracy with a shift in prediction distribution. Option B is wrong because concept drift would change the relationship between features and the target (churn), leading to a decline in model accuracy on the validation set, which is explicitly stated as stable. Option D is wrong because data leakage during training would cause overfitting to historical patterns, resulting in poor generalization and a drop in validation accuracy, not a stable accuracy with a shift in prediction probabilities.
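
A small arithmetic sketch shows how a change in the customer mix alone can lower the average predicted churn probability while the model stays the same. The segment names, scores, and request counts are hypothetical.

```python
def mean_prediction(segment_scores, segment_requests):
    """Traffic-weighted average of per-segment mean predicted probability."""
    total = sum(segment_requests.values())
    weighted = sum(segment_scores[name] * count
                   for name, count in segment_requests.items())
    return weighted / total

# Hypothetical average churn score the (unchanged) model gives each segment.
segment_scores = {"existing": 0.30, "campaign": 0.10}

# Before the campaign: only existing customers send requests.
before = mean_prediction(segment_scores, {"existing": 1000})

# After the campaign: request volume doubles with low-risk customers.
after = mean_prediction(segment_scores, {"existing": 1000, "campaign": 1000})
```

Request volume goes up and the mean predicted probability goes down, matching the dashboard pattern in the question, even though the scores the model assigns within each segment did not change.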

44
MCQeasy

A model deployed on Vertex AI Endpoints shows increasing prediction latency. What is the most scalable way to reduce latency?

A.Switch to a larger machine type
B.Enable autoscaling with min nodes increased
C.Use batch prediction instead
D.Deploy multiple model versions
AnswerB

Autoscaling adds nodes during high load, reducing latency.

Why this answer

Increasing the minimum number of nodes in autoscaling ensures that a baseline of compute capacity is always ready to handle requests, reducing cold-start latency. This is the most scalable approach because it allows the endpoint to dynamically scale up during traffic spikes while maintaining a floor of pre-warmed instances, directly addressing prediction latency without over-provisioning.

Exam trap

The trap here is that candidates confuse 'scalability' with 'raw performance' and choose a larger machine type (A), not realizing that horizontal scaling with pre-warmed nodes is more cost-effective and elastic for reducing latency under variable load.

How to eliminate wrong answers

Option A is wrong because switching to a larger machine type (e.g., more vCPUs or memory) can reduce per-request latency but is not scalable—it increases cost linearly and does not handle traffic bursts efficiently, as it still relies on a single node's capacity. Option C is wrong because batch prediction is designed for offline, asynchronous processing of large datasets and does not reduce real-time prediction latency; it actually increases end-to-end time for individual requests. Option D is wrong because deploying multiple model versions does not inherently reduce latency; it adds routing overhead and does not address compute capacity or cold starts, and is intended for A/B testing or gradual rollouts, not performance optimization.

45
MCQhard

A recommendation system model is updated daily via a retraining pipeline. After each update, the online prediction latency increases significantly for about 30 minutes before returning to normal. What is the most likely cause and solution?

A.The Vertex AI endpoint autoscaling policy is too aggressive, causing scale-down during retraining.
B.The retraining pipeline runs on a GKE cluster that shares resources with the serving endpoint.
C.The model is being switched from CPU to GPU at deployment.
D.The new model version causes cold start in the serving infrastructure; pre-warm the model by sending a dummy request after deployment.
AnswerD

Pre-warming ensures the model is loaded into memory before serving real traffic.

Why this answer

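Option D is correct because deploying a new model version causes a cold start: the first requests are slow while the model loads into memory and caches warm up. Sending dummy (warm-up) requests right after deployment, before real traffic arrives, mitigates this.

Option A is wrong because the latency spike follows each deployment and resolves on its own, which does not match an autoscaling scale-down during retraining. Option B is wrong because nothing in the scenario indicates that a GKE cluster is involved or that retraining and serving contend for the same resources. Option C is wrong because nothing in the scenario indicates a switch from CPU to GPU at deployment.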

46
MCQmedium

An MLOps team wants to set up alerts for GPU memory utilization on Vertex AI Training jobs. Which approach is most efficient?

A.Enable Cloud Audit Logs for the training job and parse the logs for GPU memory events.
B.Create a log-based metric from the training job's GPU logs.
C.Add a container sidecar that emits a custom metric for GPU memory usage via OpenCensus.
D.Use the 'compute.googleapis.com/accelerator/memory_utilization' metric with a metric threshold condition.
AnswerD

Automatically collected GPU metric.

Why this answer

Option D is correct because Vertex AI training jobs automatically export the 'compute.googleapis.com/accelerator/memory_utilization' metric to Cloud Monitoring. This metric is natively collected by the Google Cloud agent on the training VM, so you can directly create a metric threshold alert without any custom instrumentation or log parsing. It is the most efficient approach as it requires zero additional code or configuration.

Exam trap

Google Cloud often tests the misconception that custom instrumentation (sidecars or log parsing) is always required for GPU monitoring, when in fact Vertex AI provides a native metric that eliminates that need.

How to eliminate wrong answers

Option A is wrong because Cloud Audit Logs record administrative actions (e.g., who created a job), not runtime GPU memory utilization; they lack the granularity needed for real-time resource monitoring. Option B is wrong because log-based metrics require you to first generate GPU memory logs (which Vertex AI does not emit by default) and then parse them, adding latency and complexity compared to using a pre-existing metric. Option C is wrong because adding a sidecar container to emit a custom metric via OpenCensus is unnecessary overhead; Vertex AI already exposes the exact GPU memory metric natively, making a sidecar redundant and less efficient.

47
MCQmedium

Your ML pipeline uses Vertex AI Feature Store to serve features for online predictions. You need to monitor the freshness of features in the online store. Which approach is most effective?

A.Set up a Cloud Monitoring alert for feature store entity count.
B.Schedule a nightly BigQuery batch job to compare feature values.
C.Create a custom metric in Cloud Monitoring that tracks the time since last feature update, and set an alert threshold.
D.Enable detailed audit logs in Feature Store and export to BigQuery.
AnswerC

Directly measures staleness.

Why this answer

Option C is correct because Cloud Monitoring custom metrics allow you to track the timestamp of the last feature update in Vertex AI Feature Store and set an alert threshold for staleness. This directly measures feature freshness, which is critical for online predictions where stale features can degrade model accuracy. Other options either measure unrelated metrics (entity count), are too slow (nightly batch), or focus on auditing rather than real-time monitoring.

Exam trap

The trap here is that candidates confuse monitoring entity count (a capacity metric) with freshness, or assume that batch comparison or audit logs provide real-time monitoring, when only a custom staleness metric with alerting directly addresses the requirement.

How to eliminate wrong answers

Option A is wrong because monitoring the entity count in the feature store tracks the number of stored feature values, not the time since they were last updated, so it cannot detect staleness. Option B is wrong because a nightly BigQuery batch job introduces latency of up to 24 hours, making it unsuitable for real-time freshness monitoring required for online predictions. Option D is wrong because enabling detailed audit logs and exporting to BigQuery provides an after-the-fact record of changes but does not offer real-time alerting on feature staleness.
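
The value that such a custom metric would report is simply the time since the last feature update. The sketch below computes it and applies a threshold; writing the value to Cloud Monitoring is omitted, and the function names and the 15-minute limit are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def staleness_seconds(last_update, now=None):
    """Seconds elapsed since the feature was last written."""
    now = now or datetime.now(timezone.utc)
    return (now - last_update).total_seconds()

def is_stale(last_update, max_age, now=None):
    """True when the feature is older than the allowed maximum age."""
    return staleness_seconds(last_update, now) > max_age.total_seconds()

# Hypothetical timestamps: the feature was last updated 40 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_update = now - timedelta(minutes=40)

age = staleness_seconds(last_update, now)
alert = is_stale(last_update, max_age=timedelta(minutes=15), now=now)
```

Emitting `age` on a schedule gives a time series that an alerting threshold can be applied to, which is what distinguishes this approach from a nightly comparison or audit logs.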

48
MCQeasy

An ML team is using Vertex AI Pipelines to run automated retraining workflows. They want to monitor pipeline execution and receive alerts when a pipeline run fails. Which Google Cloud service should they use to set up such alerts?

A.Vertex AI Metadata
B.Cloud Monitoring
C.Cloud Logging
D.Cloud Scheduler
AnswerB

Cloud Monitoring can be configured with alerts on metrics like pipeline run failure count or success rate.

Why this answer

Cloud Monitoring (formerly Stackdriver Monitoring) is the correct service because it provides alerting policies that can be triggered based on pipeline run status metrics, such as failure counts or run state changes. Vertex AI Pipelines automatically exports execution metrics to Cloud Monitoring, allowing you to define conditions (e.g., metric 'pipeline/run_count' with filter 'status=FAILED') and configure notifications via channels like email, Pub/Sub, or PagerDuty.

Exam trap

The trap here is that candidates confuse Cloud Logging (which stores logs) with Cloud Monitoring (which provides alerting), or assume Vertex AI Metadata can trigger alerts because it tracks pipeline metadata, but it lacks any notification or policy engine.

How to eliminate wrong answers

Option A is wrong because Vertex AI Metadata is a managed metadata store for tracking artifacts, lineage, and executions; it does not provide alerting capabilities. Option C is wrong because Cloud Logging is for storing and querying logs, not for setting up proactive alerts on pipeline failures (though logs can be used to create log-based metrics, the question specifically asks for alerts on pipeline execution, which is natively handled by Cloud Monitoring metrics). Option D is wrong because Cloud Scheduler is a cron job service for triggering workflows on a schedule; it cannot monitor pipeline runs or generate failure alerts.

49
MCQeasy

Your company deploys batch prediction jobs using Vertex AI Batch Prediction. You need to monitor the jobs for failures and performance. What is the recommended approach?

A.Use Cloud Logging to export batch prediction logs and create log-based metrics.
B.Set up email alerts in the Vertex AI console for failed jobs.
C.Use Cloud Monitoring to create custom dashboards and alerts based on Vertex AI batch prediction metrics.
D.Enable the Recommender to get optimization suggestions for batch jobs.
AnswerC

Cloud Monitoring natively supports Vertex AI metrics for batch predictions.

Why this answer

Option C is correct because Cloud Monitoring (formerly Stackdriver) is the native Google Cloud service for collecting, visualizing, and alerting on metrics from Vertex AI, including batch prediction job success rates, latency, and resource utilization. It provides pre-built dashboards and the ability to create custom alerts, making it the recommended approach for monitoring failures and performance in a centralized, scalable way.

Exam trap

Google Cloud often tests the misconception that Cloud Logging is the primary monitoring tool for metrics, when in fact Cloud Monitoring is the dedicated service for metrics and alerting, while Cloud Logging is for logs and log-based metrics only.

How to eliminate wrong answers

Option A is wrong because Cloud Logging is designed for log data, not structured metrics; while you could create log-based metrics from batch prediction logs, this is an indirect, less efficient method that lacks the pre-built performance metrics and alerting capabilities of Cloud Monitoring. Option B is wrong because email alerts in the Vertex AI console are not a native feature; Vertex AI does not provide a built-in email alerting mechanism for job failures—alerts must be configured through Cloud Monitoring or Cloud Logging. Option D is wrong because the Recommender provides optimization suggestions (e.g., machine type, resource allocation) but does not monitor job failures or performance in real time; it is a post-hoc analysis tool, not a monitoring solution.

50
MCQeasy

A retail company has deployed a machine learning model using Vertex AI Endpoints to predict inventory demand. The model was trained on data from the past two years and has been in production for six months. The team has enabled Vertex AI Model Monitoring to track prediction drift with an alert threshold of 0.2. Last week, they received an alert that the prediction drift score reached 0.35, exceeding the threshold. The engineer checks the monitoring dashboard and sees that the distribution of predictions has shifted noticeably compared to the training data. The engineer also notices that the model's accuracy metrics, computed from weekly ground truth data, have remained within acceptable range. What should the engineer do first?

A.Investigate the input feature distributions for the recent serving requests to identify if data drift is the underlying cause of the prediction drift.
B.Increase the prediction drift alert threshold to 0.4 to reduce the number of false alerts.
C.Retrain the model using the latest three months of data to incorporate recent trends.
D.Roll back to an earlier model version that had lower prediction drift.
AnswerA

By checking input feature distributions, the engineer can confirm whether data drift is present, which commonly causes prediction drift even if accuracy remains temporarily stable.

Why this answer

The prediction drift alert indicates a shift in prediction distribution, but accuracy is stable. This suggests data drift (change in input features) rather than concept drift. The engineer should first investigate input feature distributions to confirm if data drift is the cause.

Retraining (C) is premature without root cause analysis. Increasing the threshold (B) ignores the underlying issue. Rolling back (D) may not help if the previous version also suffers from the same data drift.

51
Multi-Selecteasy

A team is deploying a new model version. They want to ensure that they can quickly roll back if the new version performs poorly in production. Which TWO actions should they take? (Choose 2.)

Select 2 answers
A.Keep the old model version deployed alongside the new one
B.Configure Vertex AI Model Monitoring to compare predictions
C.Use traffic splitting to gradually shift traffic
D.Set up Cloud Monitoring alerts on model performance
E.Store multiple model versions in the same endpoint
AnswersC, E

Traffic splitting allows you to direct a small percentage of traffic to the new version and easily shift all traffic back if issues arise.

Why this answer

Option C is correct because traffic splitting allows you to gradually shift a percentage of inference requests from the old model version to the new one. If the new version performs poorly, you can immediately revert the split to 0% for the new version, providing a fast and controlled rollback without redeploying or disrupting service.

Exam trap

The trap here is that candidates confuse monitoring and alerting (options B and D) with the actual deployment and rollback mechanism, assuming that detecting poor performance is equivalent to being able to quickly roll back, when in fact you need a traffic management feature like traffic splitting to execute the rollback.
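
A Vertex AI endpoint's traffic split maps each deployed model ID to a percentage, and the percentages sum to 100. The sketch below models a gradual rollout and a rollback on such a map; it does not call the Vertex AI API, the helper and the IDs are hypothetical, and it assumes exactly two deployed versions.

```python
def shift_traffic(split, new_id, old_id, new_percent):
    """Return a new traffic split sending `new_percent` to the new version
    and the remainder to the old one (two-version endpoint assumed)."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")
    updated = dict(split)  # leave the caller's dict unchanged
    updated[new_id] = new_percent
    updated[old_id] = 100 - new_percent
    return updated

# Hypothetical deployed model IDs on one endpoint.
OLD, NEW = "1111", "2222"
split = {OLD: 100, NEW: 0}

canary = shift_traffic(split, NEW, OLD, 10)     # start with 10% on the new version
half = shift_traffic(canary, NEW, OLD, 50)      # widen the rollout
rollback = shift_traffic(half, NEW, OLD, 0)     # send everything back to the old version
```

Because the old version stays deployed on the endpoint, the rollback is only a change to the percentages and needs no redeployment.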

52
MCQmedium

A team deploys a model using Vertex AI and wants to monitor for concept drift. What should they track?

A.Number of prediction requests
B.Model prediction latency
C.Changes in input data distribution
D.Changes in the relationship between inputs and outputs
AnswerD

Concept drift is a change in the underlying function mapping inputs to outputs.

Why this answer

Concept drift refers to a change in the underlying relationship between the input features and the target variable over time, which degrades model performance. In Vertex AI, monitoring this requires tracking the statistical relationship between inputs and outputs (e.g., via prediction residuals or model performance metrics), not just the input distribution alone. Option D correctly identifies this need, as concept drift is fundamentally about the input-output mapping shifting, even if the input distribution remains stable.

Exam trap

Google Cloud often tests the distinction between data drift (input distribution changes) and concept drift (input-output relationship changes), and the trap here is that candidates confuse the two, picking Option C because they think monitoring input data is sufficient for detecting all model degradation.

How to eliminate wrong answers

Option A is wrong because the number of prediction requests measures traffic volume, not data or concept drift; it is a scaling or operational metric, not a model quality metric. Option B is wrong because prediction latency measures inference speed, which is a performance indicator unrelated to the statistical properties of data or model relationships. Option C is wrong because changes in input data distribution represent data drift (covariate shift), not concept drift; while data drift can cause concept drift, monitoring only input distribution misses shifts in the input-output relationship that occur without distributional changes.

53
MCQeasy

You need to set up monitoring for a Vertex AI model that serves predictions in real-time. The model is expected to have a latency SLA of under 100ms. Which metric should you configure an alert on to ensure the SLA is met?

A.p50 latency of prediction requests
B.Prediction drift score
C.p99 latency of prediction requests
D.Number of prediction requests per second
AnswerC

p99 captures tail latency critical for SLA.

Why this answer

Option C is correct because p99 latency is the value that 99% of requests complete at or under, which makes it the standard tail-latency metric for enforcing a strict SLA such as under 100 ms. Alerting on p99 catches slow outliers that the median hides; only the slowest 1% of requests can exceed the alerted value without triggering it.

Exam trap

Google Cloud often tests the misconception that median (p50) latency is sufficient for SLAs, but the trap is that SLAs require tail-latency guarantees (p99 or p99.9) to catch performance outliers that violate the threshold.

How to eliminate wrong answers

Option A is wrong because p50 latency (median) ignores the tail latency, meaning half of the requests could exceed 100ms without triggering an alert, failing the SLA. Option B is wrong because prediction drift score measures changes in model input/output distributions over time, not latency, and is irrelevant for SLA compliance. Option D is wrong because the number of prediction requests per second (throughput) does not measure individual request latency; high throughput can occur even if latency spikes above 100ms.
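
The gap between p50 and p99 is easy to see on a small sample. This sketch uses the nearest-rank percentile definition, which may differ slightly from how Cloud Monitoring aggregates latency distributions; the latency values are hypothetical.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct% of the samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies: 97 fast requests and 3 slow ones.
latencies_ms = [40] * 97 + [250] * 3

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

sla_ms = 100
p50_breach = p50 > sla_ms  # median looks healthy
p99_breach = p99 > sla_ms  # tail exposes the slow requests
```

An alert on p50 would stay silent for this traffic, while an alert on p99 would fire, even though the median request is well under the 100 ms target.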

54
MCQmedium

You are an ML engineer at a logistics company. You have deployed a deep learning model on Vertex AI Endpoints using a custom container with GPU acceleration. The model predicts delivery times based on route features. After one week, you notice that the endpoint's GPU utilization is consistently at 10%, but the prediction latency has increased by 50%. The number of prediction requests per second has remained stable. You check the container logs and see no errors. The model is served using TensorFlow Serving with batching enabled (batch size: 32, batch timeout: 100ms). The custom container uses a single NVIDIA T4 GPU. You have also set the Vertex AI endpoint to use autoscaling with minReplicaCount: 1 and maxReplicaCount: 5, and the CPU utilization target is 60%. Which action should you take to reduce latency?

A.Increase the minReplicaCount to 3 to handle requests in parallel.
B.Reduce the CPU utilization target to 40% to trigger more aggressive autoscaling.
C.Quantize the model to FP16 to reduce compute time per inference.
D.Increase the batch size to 64 and batch timeout to 200ms to improve GPU utilization.
AnswerD

Larger batch sizes allow the GPU to process more data per inference, increasing throughput and reducing per-request latency once the batch fills up.

Why this answer

The core issue is low GPU utilization (10%) despite increased latency, indicating that the GPU is underutilized and the bottleneck is likely in batching or data pipeline overhead. Increasing the batch size to 64 and batch timeout to 200ms allows TensorFlow Serving to accumulate more requests per batch, improving GPU throughput and reducing per-request latency by better leveraging GPU parallelism. This directly addresses the mismatch between low GPU utilization and high latency.

Exam trap

The trap here is that candidates focus on scaling or model optimization (A, B, C) without recognizing that low GPU utilization with high latency is a classic sign of inefficient batching, not insufficient compute or replicas.

How to eliminate wrong answers

Option A is wrong because increasing minReplicaCount adds more CPU-bound replicas, which does not address the GPU underutilization and may increase cost without reducing latency, as the GPU is already idle. Option B is wrong because reducing the CPU utilization target triggers more aggressive autoscaling based on CPU, but the bottleneck is GPU utilization, not CPU; this would add more replicas without improving GPU efficiency. Option C is wrong because quantizing to FP16 could reduce compute time per inference, but the problem is low GPU utilization, not compute-bound operations; quantization may not help if the GPU is already idle due to small batch sizes, and it could introduce accuracy loss.

55
MCQmedium

Refer to the exhibit. What is the purpose of this query?

A.To detect data drift
B.To find prediction errors in Cloud Logging
C.To count all prediction requests
D.To monitor model latency
AnswerB

The filter uses PredictionError type and ERROR severity.

Why this answer

The query filters Cloud Logging entries for the string 'prediction failed', which directly indicates prediction errors logged by the ML prediction service. This is a common pattern for monitoring model inference failures in production, not for measuring drift, counting requests, or measuring latency.

Exam trap

Google Cloud often tests the distinction between log-based monitoring (for errors) and metric-based monitoring (for counts, latency, drift), so candidates mistakenly choose 'count all prediction requests' when the query clearly filters for failures, not all requests.

How to eliminate wrong answers

Option A is wrong because data drift detection requires comparing feature distributions over time, not searching for error log messages. Option C is wrong because counting all prediction requests would require a metric like `prediction_count` or a log-based metric counting all prediction entries, not filtering for failures. Option D is wrong because monitoring model latency requires timing metrics (e.g., `prediction_latency_ms`) or log entries with duration fields, not a search for 'prediction failed'.

56
Multi-Selecteasy

Your team deploys a model using Vertex AI Endpoints with autoscaling. Which TWO metrics are most important to monitor in order to optimize cost and performance? (Choose two.)

Select 2 answers
A.Number of active nodes in the endpoint.
B.Number of requests per minute.
C.CPU utilization of the serving containers.
D.Error rate (HTTP 4xx/5xx).
E.P99 prediction latency.
AnswersB, C

Indicates traffic patterns.

Why this answer

Option B is correct because the number of requests per minute directly drives autoscaling behavior in Vertex AI Endpoints. Monitoring this metric allows you to right-size the number of serving nodes to match traffic patterns, avoiding over-provisioning (cost) or under-provisioning (performance). Option C is correct because CPU utilization of the serving containers indicates whether the model is compute-bound or idle; high CPU suggests the need for more nodes, while low CPU suggests over-provisioning, directly impacting both cost and latency.

Exam trap

Google Cloud often tests the distinction between metrics that are direct inputs to autoscaling decisions (requests per minute, CPU utilization) versus metrics that are outcomes of scaling (active nodes, latency, error rate), leading candidates to mistakenly select outcome metrics as primary optimization drivers.

57
Multi-Selectmedium

Your team manages multiple ML models on Vertex AI. You need to implement a centralized monitoring solution to track model performance over time. Which TWO approaches should you consider? (Choose two.)

Select 2 answers
A.Store all prediction logs in BigQuery and analyze using SQL.
B.Use Cloud Source Repositories to track model code versions.
C.Create Cloud Monitoring dashboards and alerts based on Vertex AI metrics.
D.Use Vertex AI Model Monitoring to detect training-serving skew and feature drift for each model.
E.Enable Cloud Billing budgets to track cost per model.
AnswersC, D

Centralized view of all models.

Why this answer

Option C is correct because Cloud Monitoring provides centralized dashboards and alerting for Vertex AI metrics such as prediction latency, request count, and error rates, enabling you to track model performance over time without additional infrastructure. Option D is correct because Vertex AI Model Monitoring is purpose-built to detect training-serving skew and feature drift by comparing serving data distributions to training data, which is essential for maintaining model performance in production.

Exam trap

The trap here is that candidates may confuse logging (Option A) or cost tracking (Option E) with performance monitoring, or mistakenly think version control (Option B) is part of monitoring, when the question specifically asks for centralized monitoring of model performance over time.

58
MCQeasy

A company deploys an online prediction model serving 100 requests per second. They are optimizing for both latency and throughput. Which monitoring strategy should they use?

A.Monitor only the request count and set an alert if it drops below a threshold.
B.Set a single alert on the 99th percentile latency and ignore throughput since it's already high.
C.Monitor the error rate and set an alert if it exceeds 1%.
D.Monitor both the p50 and p99 latency, and the request count. Create a dashboard showing latency vs. throughput at different load levels.
AnswerD

Allows understanding of the relationship.

Why this answer

Option D is correct because monitoring both p50 and p99 latency alongside request count provides a comprehensive view of system performance under load. Latency percentiles reveal tail behavior (p99) and typical user experience (p50), while request count tracks throughput. A dashboard correlating latency vs. throughput at different load levels is essential for identifying performance cliffs or degradation before failures occur, aligning with best practices for production ML inference systems.

Exam trap

The trap here is that candidates often focus on a single metric (e.g., error rate or p99 latency) and overlook the need for multi-metric correlation, especially the latency-throughput trade-off, which is a core concept in monitoring ML systems under production load.

How to eliminate wrong answers

Option A is wrong because monitoring only request count and alerting on a drop below threshold ignores latency and error rate, missing critical issues like increased response times or silent failures that degrade user experience without reducing request count. Option B is wrong because setting a single alert on p99 latency and ignoring throughput neglects the trade-off between latency and throughput; high throughput can mask latency spikes, and p99 alone does not capture system capacity limits or performance under varying load. Option C is wrong because monitoring only error rate and alerting on 1% misses latency degradation and throughput drops; a system can have low error rates but high latency (e.g., due to queue buildup) or reduced throughput, both of which violate performance objectives.

59
MCQhard

A company uses Vertex AI Predictions with a custom container that invokes an external API for feature enrichment. The prediction response time is highly variable. The engineer wants to monitor the external API's contribution to latency. What should the engineer do?

A.Instrument the prediction container to emit custom metrics for the time spent in each prediction step, including the external API call.
B.Add a timeout setting to the endpoint's request to limit the external API call duration.
C.Monitor the Vertex AI endpoint latency metric and correlate with system metrics like CPU and memory.
D.Use Cloud Trace to trace the prediction request end-to-end, including the external API call.
AnswerA

Custom metrics provide granular breakdown.

Why this answer

Option A is correct because instrumenting the custom container to emit custom metrics (e.g., using OpenTelemetry or a Prometheus client library) allows the engineer to directly measure the time spent in each prediction step, isolating the external API call's contribution to latency. This provides granular, real-time visibility into the specific bottleneck, which is essential when the response time is highly variable and the external API is a known dependency.

Exam trap

Google Cloud often tests the distinction between monitoring (custom metrics) and tracing (Cloud Trace) — the trap here is that candidates assume Cloud Trace automatically captures all downstream calls, but it requires explicit instrumentation of the external API call to record its duration, whereas custom metrics can be emitted directly from the container code without needing distributed tracing context.

How to eliminate wrong answers

Option B is wrong because adding a timeout setting to the endpoint's request limits the duration of the external API call but does not provide any monitoring data; it only caps the latency, potentially causing failures without diagnosing the root cause. Option C is wrong because monitoring the Vertex AI endpoint latency metric and correlating with CPU/memory only gives aggregate performance data, not the specific contribution of the external API call, making it impossible to isolate the external API's impact. Option D is wrong because Cloud Trace can trace the request end-to-end, but it requires the custom container to be instrumented with trace context propagation; without explicit instrumentation of the external API call, Cloud Trace will not capture the time spent in that external call, leaving the same gap in visibility.
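
The per-step instrumentation described above can be sketched roughly as follows. This is only an illustration using the standard library: `timed_step`, `call_enrichment_api`, and `run_model` are hypothetical names, the external call is simulated with a sleep, and in a real container the recorded durations would be exported through whatever metrics client the team uses (for example OpenTelemetry) rather than kept in a dictionary.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Durations per step, in milliseconds (stand-in for a real metrics exporter).
step_durations_ms = defaultdict(list)

@contextmanager
def timed_step(name):
    """Record the wall-clock time spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_durations_ms[name].append((time.perf_counter() - start) * 1000)

def call_enrichment_api(instance):
    # Hypothetical stand-in for the external feature-enrichment API.
    time.sleep(0.02)
    return {**instance, "enriched": True}

def run_model(features):
    # Hypothetical stand-in for model inference.
    return {"score": 0.5}

def predict(instance):
    with timed_step("feature_enrichment_api"):
        features = call_enrichment_api(instance)
    with timed_step("model_inference"):
        return run_model(features)

prediction = predict({"user_id": 1})
for step, samples in step_durations_ms.items():
    print(f"{step}: {samples[-1]:.1f} ms")
```

Keeping one timer per step makes the external API's share of total latency directly visible, which an aggregate endpoint latency metric cannot show.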

60
MCQhard

A real-time recommendation model deployed on Vertex AI Endpoints is experiencing increased latency, especially during peak hours. The model is hosted on a single machine with 4 CPUs. Which set of actions should you take to diagnose and resolve the issue?

A.Increase the machine type to one with 32 CPUs and disable autoscaling.
B.Switch the endpoint to use GPUs and enable batch requests.
C.Enable autoscaling on the endpoint and analyze request patterns to set min/max instances.
D.Change the serving framework to use TensorFlow Serving with gRPC.
AnswerC

Autoscaling handles peak load efficiently.

Why this answer

Option C is correct because enabling autoscaling on a Vertex AI Endpoint allows the deployment to dynamically adjust the number of serving instances based on real-time traffic, directly addressing peak-hour latency. Analyzing request patterns to set appropriate min/max instances ensures that the endpoint scales proactively without over-provisioning, which is the standard diagnostic and resolution approach for latency issues caused by insufficient capacity under variable load.

Exam trap

Google Cloud often tests the misconception that scaling up (vertical scaling) or changing frameworks is the first step to fix latency, when the correct approach is to first diagnose capacity constraints and then scale out horizontally with autoscaling.

How to eliminate wrong answers

Option A is wrong because simply increasing the machine type to 32 CPUs without autoscaling does not resolve peak-hour latency; it only increases static capacity, leading to over-provisioning during low traffic and still failing under sudden spikes if the single instance is overwhelmed. Option B is wrong because switching to GPUs is not a direct fix for latency caused by CPU-bound serving; GPUs benefit compute-heavy models (e.g., deep learning) but add overhead for small models, and enabling batch requests increases latency for real-time predictions as it waits to accumulate requests. Option D is wrong because changing the serving framework to TensorFlow Serving with gRPC does not address the root cause of insufficient compute capacity; it may improve throughput per instance but cannot compensate for a single machine being overloaded during peak hours.

61
MCQeasy

Refer to the exhibit. What does this query return?

A.The maximum latency per minute
B.The error rate per minute
C.The total number of predictions per minute
D.The average latency per minute for the model
AnswerD

The query uses 'mean' aggregator over 1-minute windows.

Why this answer

The query uses the `rate` function to calculate the per-second rate of increase of the `latency_seconds` counter, and then applies the `avg` aggregator to compute the average latency across all instances over the specified time range. The `by (model)` clause groups the result by the `model` label, so the output is the average latency per minute for each model. This is why option D is correct.

Exam trap

Google Cloud often tests the distinction between `avg` and `max` aggregators in PromQL queries, and candidates mistakenly think `rate` alone implies a maximum or total, rather than understanding that `avg` computes the mean over the rate values.

How to eliminate wrong answers

Option A is wrong because the query uses `avg` to compute the average, not `max` to find the maximum latency per minute. Option B is wrong because the query operates on `latency_seconds`, a latency metric, not on an error counter or error rate metric. Option C is wrong because the query uses `avg` to average latency values, not `sum` or `count` to total the number of predictions per minute.

62
MCQeasy

An ML team is using Vertex AI Online Prediction and wants to receive alerts when the 99th percentile latency exceeds 500ms for more than 5 minutes. What is the best practice to set up this alert in Cloud Monitoring?

A.Create a custom metric from the prediction container that emits latency percentiles, then set an alert on that metric.
B.Use the 'aiplatform.googleapis.com/prediction/online/prediction_latencies' metric with a metric threshold condition set to 500ms and a percentile aligner of 99.
C.Use a log-based metric to parse latency from Cloud Logging and alert when the average exceeds 500ms.
D.Export prediction latency logs to BigQuery and run a scheduled query to check the 99th percentile, then trigger a Cloud Function to send an alert.
AnswerB

This directly monitors the 99th percentile latency.

Why this answer

Option B is correct because Cloud Monitoring provides a pre-built metric, `aiplatform.googleapis.com/prediction/online/prediction_latencies`, which directly captures prediction latency as a distribution. By applying a percentile aligner of 99 and a metric threshold condition of 500ms with a 5-minute duration, you can alert when the 99th percentile latency stays above 500ms for the specified period, without needing custom instrumentation or external processing.

Exam trap

Google Cloud often tests the misconception that you must create custom metrics or use log-based solutions for percentile-based alerting, when in fact Cloud Monitoring's distribution metrics and percentile aligners handle this natively.

How to eliminate wrong answers

Option A is wrong because creating a custom metric from the prediction container is unnecessary and adds complexity; Vertex AI already emits the required latency metric natively, and custom metrics would require additional code and maintenance. Option C is wrong because using a log-based metric to parse latency from Cloud Logging and alerting on the average (not the 99th percentile) does not meet the requirement to monitor the 99th percentile latency; log-based metrics also introduce latency and parsing overhead. Option D is wrong because exporting logs to BigQuery and running scheduled queries is an overly complex, non-real-time approach that violates the best practice of using built-in monitoring capabilities; it also introduces additional cost and delay compared to native Cloud Monitoring alerts.
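
As a rough illustration, the alert described in option B could be expressed as an alert policy body like the one below, built here as a plain Python dictionary. The field names follow the Cloud Monitoring AlertPolicy resource as I understand it; the metric type string, the `aiplatform.googleapis.com/Endpoint` resource type, the millisecond unit, and the display names are assumptions that should be verified against the project's metric list before use.

```python
import json

# Assumed metric and resource type; verify in Metrics Explorer before relying on them.
LATENCY_FILTER = (
    'metric.type = "aiplatform.googleapis.com/prediction/online/prediction_latencies" '
    'AND resource.type = "aiplatform.googleapis.com/Endpoint"'
)

alert_policy = {
    "displayName": "p99 online prediction latency above 500 ms",  # hypothetical name
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "p99 latency > 500 ms for 5 minutes",  # hypothetical name
            "conditionThreshold": {
                "filter": LATENCY_FILTER,
                "aggregations": [
                    {
                        "alignmentPeriod": "60s",
                        # Reduce each latency distribution to its 99th percentile.
                        "perSeriesAligner": "ALIGN_PERCENTILE_99",
                    }
                ],
                "comparison": "COMPARISON_GT",
                "thresholdValue": 500,  # assumes the metric is reported in ms
                "duration": "300s",     # condition must hold for 5 minutes
            },
        }
    ],
}

print(json.dumps(alert_policy, indent=2))
```

The `duration` field is what encodes "for more than 5 minutes"; the aligner is what turns the distribution metric into a p99 time series.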

63
MCQeasy

A data science team has deployed a model on Vertex AI and wants to automatically detect when the distribution of a specific feature shifts significantly from the training data. Which service should they use?

A.Cloud Data Loss Prevention
B.Vertex AI Model Monitoring
C.Vertex AI Explainable AI
D.Cloud Composer
AnswerB

Vertex AI Model Monitoring includes skew detection, which compares training and serving distributions and alerts on significant shifts.

Why this answer

Vertex AI Model Monitoring is the correct service because it is specifically designed to detect feature distribution drift (skew) between training and serving data for deployed models. It continuously monitors the input features and alerts when statistical metrics like the Jensen-Shannon divergence or the L-infinity distance exceed a configured threshold, enabling proactive model retraining.

Exam trap

Google Cloud often tests the distinction between monitoring model performance (e.g., accuracy, latency) versus monitoring data distribution drift, and candidates may confuse Vertex AI Model Monitoring with Explainable AI because both involve model analysis, but only Model Monitoring tracks shifts over time.

How to eliminate wrong answers

Option A is wrong because Cloud Data Loss Prevention (DLP) is used for inspecting, classifying, and de-identifying sensitive data (e.g., PII, credit card numbers), not for monitoring feature distributions or model drift. Option C is wrong because Vertex AI Explainable AI provides feature attributions and explanations for model predictions (e.g., Shapley values, integrated gradients), but does not monitor distribution shifts over time. Option D is wrong because Cloud Composer is a managed Apache Airflow service for orchestrating workflows and pipelines, not a dedicated tool for detecting feature drift in deployed models.

64
MCQhard

After setting up model monitoring on Vertex AI for a classification model, the engineer sees a high number of anomaly alerts for the "age" feature. Upon investigation, the age distribution in recent predictions is similar to training data. What might be the cause?

A.The feature importance of age has changed
B.The monitoring baseline was incorrectly set
C.The monitoring threshold for age is too low
D.The model is overfitting to age
AnswerC

A low threshold triggers alerts for small, insignificant deviations.

Why this answer

Option C is correct because the high number of anomaly alerts despite the age distribution being similar to training data indicates that the monitoring threshold for the 'age' feature is set too low. In Vertex AI Model Monitoring, anomaly detection compares recent prediction distributions against a baseline using a distance score (e.g., Jensen-Shannon divergence for numerical features) and raises an alert when the score exceeds the configured threshold. If the threshold is too low, even minor, practically insignificant deviations can trigger alerts, leading to false positives even when the distribution is essentially unchanged.

Exam trap

The trap here is that candidates confuse 'anomaly alerts' with 'model performance degradation' or 'data drift,' but the question specifically states the distribution is similar, so the root cause is a misconfigured sensitivity threshold, not a genuine distribution shift.

How to eliminate wrong answers

Option A is wrong because feature importance measures the contribution of a feature to model predictions, not the distribution of the feature values themselves; a change in feature importance would not directly cause distribution-based anomaly alerts. Option B is wrong because if the monitoring baseline were incorrectly set (e.g., using a non-representative sample), the age distribution in recent predictions would likely differ from the training data, but the question states the distribution is similar, so the baseline is not the issue. Option D is wrong because overfitting to age would manifest as poor generalization on unseen data, not as anomaly alerts on the feature distribution; overfitting does not inherently trigger monitoring alerts unless the distribution shifts.
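
A small sketch of how a threshold that is too low produces alerts on nearly identical distributions. The binned "age" histograms and both threshold values are made-up numbers for illustration, and `js_divergence` is a hand-written helper, not a Vertex AI function.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical binned "age" distributions (four age buckets, probabilities sum to 1).
training_age = [0.20, 0.30, 0.30, 0.20]
serving_age = [0.21, 0.29, 0.31, 0.19]

distance = js_divergence(training_age, serving_age)
print(f"JS divergence: {distance:.6f}")

for threshold in (0.0001, 0.1):
    print(f"threshold={threshold}: alert={distance > threshold}")
```

The two histograms differ by one percentage point per bucket, yet the very low threshold still fires; raising the threshold silences the alert without changing the data.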

65
Multi-Selecteasy

A team has deployed a model on Vertex AI Prediction and wants to monitor for data drift. Which TWO metrics should they use to detect drift in numerical features?

Select 2 answers
A.Pearson correlation coefficient
B.Jensen-Shannon divergence (JSD)
C.Chi-squared statistic
D.Population Stability Index (PSI)
E.Kolmogorov-Smirnov (KS) statistic
AnswersB, E

JSD measures similarity between two probability distributions and works for numerical features after binning.

Why this answer

Jensen-Shannon divergence (JSD) is a symmetric, bounded (0 to 1) measure of the difference between two probability distributions, making it ideal for detecting drift in numerical features by comparing the training distribution to the serving distribution. It is a smoothed and normalized version of Kullback-Leibler divergence, and Vertex AI Prediction's Model Monitoring natively supports JSD for numerical feature drift detection. The Kolmogorov-Smirnov (KS) statistic is the maximum gap between the empirical cumulative distribution functions of two samples, so it also applies directly to numerical features and does not require binning.

Exam trap

Google Cloud often tests the misconception that Pearson correlation or Chi-squared are appropriate for numerical drift, when in fact Pearson measures correlation between two variables and Chi-squared is for categorical data, leading candidates to overlook the correct distribution-comparison metrics like JSD and KS.
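
For intuition, the two-sample KS statistic can be computed in a few lines. This is a plain-Python sketch with invented sample values; `ks_statistic` is a hypothetical helper, and a production setup would more likely rely on an established statistics library.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Hypothetical numerical feature values.
training = [1, 2, 3, 4, 5]
serving_same = [1, 2, 3, 4, 5]
serving_shifted = [4, 5, 6, 7, 8]

print("no drift:", ks_statistic(training, serving_same))
print("shifted: ", round(ks_statistic(training, serving_shifted), 3))
```

A value near 0 means the two samples have almost the same distribution, while a value near 1 means they barely overlap.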

66
Multi-Selectmedium

A team is responsible for monitoring the health of a Vertex AI pipeline that runs daily. Which THREE resources should they use to gain visibility into pipeline performance and failures? (Choose 3.)

Select 3 answers
A.Cloud Trace for analyzing distributed execution
B.Cloud Composer for tracking DAGs
C.Vertex AI Experiments for comparing pipeline runs
D.Cloud Monitoring for metrics and alerts on pipeline runs
E.Cloud Logging for viewing pipeline step logs
AnswersC, D, E

Vertex AI Experiments tracks pipeline runs and allows comparison of metrics across runs over time.

Why this answer

Vertex AI Experiments (Option C) is correct because it provides a systematic way to log, compare, and analyze pipeline runs, including metrics, parameters, and artifacts. This allows the team to track performance trends across daily runs, identify regressions, and correlate failures with specific run configurations, which is essential for monitoring pipeline health over time.

Exam trap

Google Cloud often tests the distinction between monitoring (observing run-level metrics and logs) and tracing (analyzing request-level latency), leading candidates to incorrectly select Cloud Trace for pipeline health visibility when it is actually intended for distributed request tracing.

67
MCQhard

A travel booking company has a real-time recommendation system that suggests hotels and flights to users. The model is served using TensorFlow Serving on a Google Kubernetes Engine (GKE) cluster with auto-scaling enabled. The cluster uses n1-standard-4 machine types. The team has set up Cloud Monitoring dashboards and alerts. Last week, during a major holiday promotion, the team noticed that the model's inference latency P99 increased from 150 ms to 450 ms over a 30-minute period, while the request throughput increased from 500 to 1,200 requests per second. CPU utilization across the cluster rose to 95%, but memory utilization remained at 60%. The model version and the serving infrastructure configuration have not changed since the last deployment. Which action should the team take to mitigate the latency issue?

A.Implement a feature engineering pipeline that compresses the input features to reduce data size and inference time.
B.Deploy a newer version of the model that uses a more efficient architecture to reduce computational complexity.
C.Increase the number of TensorFlow Serving instances by reducing the CPU request per pod in GKE to allow more pods per node.
D.Add more nodes to the GKE cluster to increase the total CPU resources available for serving.
AnswerD

Adding nodes increases compute capacity, allowing more parallel inference and reducing latency under high load.

Why this answer

The latency spike is caused by CPU saturation (95% utilization) under increased load (500 to 1,200 RPS). Adding more nodes to the GKE cluster directly increases the total CPU resources available, allowing the existing TensorFlow Serving pods to handle the higher throughput without contention. This is the most immediate and infrastructure-appropriate fix because the model version and serving configuration have not changed, ruling out model-level or code-level optimizations.

Exam trap

Google Cloud often tests the misconception that reducing per-pod CPU requests (Option C) is a valid scaling strategy, but in reality this increases overcommitment and can worsen latency under high load, whereas adding nodes (Option D) provides dedicated resources without contention.

How to eliminate wrong answers

Option A is wrong because compressing input features may reduce data size but does not address the root cause of CPU saturation; inference latency is dominated by model computation, not I/O, and the feature engineering pipeline is not part of the serving infrastructure. Option B is wrong because deploying a newer, more efficient model is a long-term optimization, not an immediate mitigation; the question states the model version has not changed and the issue is purely resource contention under load. Option C is wrong because reducing the CPU request per pod would allow more pods per node, but this would increase CPU overcommitment and worsen contention on already saturated nodes, potentially causing further latency degradation or pod evictions.

68
MCQmedium

A company uses Vertex AI Model Monitoring to detect data drift. They have a model that predicts house prices. Which dataset should they compare against the training data to detect drift?

A.The entire historical prediction data
B.A random sample of recent predictions
C.The latest batch of predictions
D.The validation data used during training
AnswerC

Comparing the latest serving data distribution to training data detects drift.

Why this answer

Option C is correct because Vertex AI Model Monitoring compares the training data (serving as the baseline) against the latest batch of predictions to detect data drift. This batch represents the most recent inference requests, allowing the monitoring service to compute statistical distribution differences (e.g., Jensen-Shannon divergence) and trigger alerts when drift exceeds a configured threshold. Using the latest batch ensures timely detection of shifts in the production data distribution.

Exam trap

Google Cloud often tests the distinction between 'recent predictions' and 'latest batch' — the trap is that candidates confuse a random sample (which is statistically valid for inference but not for drift detection) with the complete batch that Vertex AI requires for accurate distribution comparison.

How to eliminate wrong answers

Option A is wrong because using the entire historical prediction data would dilute recent drift signals with older, potentially stale distributions, making it harder to detect current drift and violating Vertex AI's requirement for a sliding window of recent predictions. Option B is wrong because a random sample of recent predictions lacks the systematic coverage of the full production traffic; Vertex AI Model Monitoring expects a complete batch to accurately compute per-feature drift metrics, and random sampling can miss localized drift patterns. Option D is wrong because validation data is a static holdout set from training time, not production data; comparing against it would measure generalization error, not data drift, and Vertex AI Model Monitoring is designed to compare against training data, not validation data.

69
MCQhard

Refer to the exhibit. An engineer notices no drift alerts but the model performance has degraded. What is the likely cause?

A.Feature attribution monitoring is causing too many false positives
B.Drift threshold for income is too high
C.Skew thresholds are not configured for categorical features
D.Concept drift is occurring, which is not captured by drift or skew detection
AnswerD

Concept drift affects the model's predictive relationship, not input distributions.

Why this answer

Concept drift occurs when the relationship between the input features and the target variable changes over time, causing model performance to degrade even when the input data distribution remains stable. Drift detection (e.g., data drift or skew) monitors changes in feature distributions, not the relationship between features and the target. Since no drift alerts were triggered, the input data appears unchanged, but the model's predictive relationship has shifted. This is classic concept drift, which requires performance monitoring (e.g., accuracy, F1-score) rather than drift or skew detection.

Exam trap

Google Cloud often tests the distinction between data drift (input feature changes) and concept drift (target relationship changes), trapping candidates who assume that no drift alerts mean the model is healthy, when in fact performance degradation can occur without any feature distribution shift.

How to eliminate wrong answers

Option A is wrong because feature attribution monitoring (e.g., SHAP values) explains model predictions but does not generate false positives for drift; it is unrelated to the absence of drift alerts. Option B is wrong because a drift threshold for income being too high would suppress drift alerts for that feature, but the scenario states no drift alerts at all, and concept drift is not captured by feature-level drift thresholds. Option C is wrong because skew thresholds for categorical features detect distribution shifts in those features, but the problem is a change in the target relationship (concept drift), not a change in feature distributions.
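
A toy sketch of concept drift: the inputs are identical in both periods, so any feature-distribution check would report no drift, yet accuracy drops because the labeling rule changed. The threshold model, the input values, and both labeling rules are invented for illustration.

```python
def model(x):
    """Fixed model learned at training time: predict 1 when x exceeds 0.5."""
    return 1 if x > 0.5 else 0

def accuracy(inputs, labels):
    correct = sum(model(x) == y for x, y in zip(inputs, labels))
    return correct / len(inputs)

# The same inputs are served in both periods, so the feature distribution is unchanged.
inputs = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9

# Ground truth at training time matches the model's rule.
labels_at_training = [1 if x > 0.5 else 0 for x in inputs]
# Later, the true relationship shifts: only x above 0.7 is positive.
labels_after_shift = [1 if x > 0.7 else 0 for x in inputs]

print("accuracy before shift:", accuracy(inputs, labels_at_training))
print("accuracy after shift: ", accuracy(inputs, labels_after_shift))
```

Only a metric computed against ground-truth labels reveals the degradation, which is why performance monitoring is needed alongside drift and skew detection.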

70
MCQeasy

A startup is deploying its first machine learning model using BigQuery ML. The model is a logistic regression for churn prediction, trained on a dataset of 5 million rows. The pipeline runs every week: it exports training data from BigQuery, trains a model using BigQuery ML, and then deploys the model as a remote model for predictions. The ML engineer wants to set up basic monitoring to ensure the pipeline runs successfully and the model quality does not degrade. Which monitoring approach should the engineer implement first?

A.Set up Cloud Monitoring alerts on the pipeline's execution status and duration, and create a simple dashboard showing these metrics.
B.Export BigQuery audit logs to Cloud Logging and analyze them for any errors.
C.Enable Vertex AI Model Monitoring to detect data drift between training and serving data.
D.Monitor the model's area under the ROC curve (AUC) over time and alert if it drops by more than 0.01.
AnswerA

Fundamental monitoring ensures pipeline runs successfully.

Why this answer

Option A is correct because the first priority in monitoring a new ML pipeline is ensuring it runs successfully and on time. Cloud Monitoring alerts on execution status and duration directly address pipeline reliability, which is the most basic operational concern before model quality metrics like AUC or drift can be meaningful. This approach aligns with the principle of starting with infrastructure health before advanced model monitoring.

Exam trap

Google Cloud often tests the principle of 'start with the basics' — candidates are tempted to jump to advanced monitoring like drift or AUC, but the correct first step is ensuring the pipeline runs reliably.

How to eliminate wrong answers

Option B is wrong because exporting BigQuery audit logs to Cloud Logging and analyzing them for errors is a reactive, post-hoc approach that does not provide real-time pipeline monitoring; it also adds complexity without addressing the immediate need for basic pipeline health checks. Option C is wrong because Vertex AI Model Monitoring for data drift is an advanced monitoring technique that requires a stable serving environment and baseline data, which is premature for a first deployment; it also incurs additional cost and setup time. Option D is wrong because monitoring AUC over time and alerting on a drop of 0.01 assumes the model is already in production with a baseline, but the question asks for the first monitoring step, which should be pipeline execution success, not model performance degradation.

71
MCQeasy

You have deployed a text classification model using Vertex AI Endpoints. The model is performing well, but the operations team wants to be alerted if the endpoint returns an excessive number of HTTP 503 errors. What is the simplest way to achieve this?

A.Configure a Cloud Monitoring uptime check on the endpoint URL.
B.Create a Cloud Monitoring alert based on the metric 'prediction/failed_request_count' with a condition on 5xx errors.
C.Add a logging statement in the custom prediction routine to count errors manually.
D.Export Cloud Logging to BigQuery and run a scheduled query for 503s.
AnswerB

Built-in metric directly reflects HTTP errors.

Why this answer

Option B is correct because Vertex AI Endpoints automatically export the 'prediction/failed_request_count' metric to Cloud Monitoring, which includes a label for HTTP status codes. By creating an alert on this metric with a filter for 5xx errors, you can directly monitor excessive 503 responses without additional infrastructure or custom code.
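As a rough illustration of what "a condition on 5xx errors" means, the sketch below applies a ratio threshold to a window of response codes. It does not call the Cloud Monitoring API; the function names, the sample window, and the thresholds are assumptions made for the example.

```python
def count_5xx(response_codes: list[int]) -> int:
    """Count responses whose HTTP status is in the 5xx range."""
    return sum(1 for code in response_codes if 500 <= code <= 599)


def should_alert(response_codes: list[int], max_5xx_ratio: float) -> bool:
    """Fire when the share of 5xx responses in the window exceeds the threshold."""
    if not response_codes:
        return False  # no traffic in the window, nothing to evaluate
    return count_5xx(response_codes) / len(response_codes) > max_5xx_ratio


# Hypothetical 100-request window: 95 successes and five 503 responses.
window = [200] * 95 + [503] * 5
print(should_alert(window, max_5xx_ratio=0.01))
```

A managed alerting policy evaluates the same kind of condition continuously, which is why it is simpler than custom logging or scheduled queries.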

Exam trap

The trap here is that candidates often confuse uptime checks (which measure availability from external probes) with metric-based alerts (which track internal error counts), leading them to choose Option A despite its inability to specifically detect 503 errors.

How to eliminate wrong answers

Option A is wrong because Cloud Monitoring uptime checks send synthetic probes to a URL at fixed intervals; they report whether each probe succeeded, not how many real prediction requests returned 503, so they cannot detect an excessive error count. Option C is wrong because adding a logging statement in a custom prediction routine requires modifying the deployment code and ignores the built-in metrics already available in Vertex AI, making it unnecessarily complex and not the simplest approach. Option D is wrong because exporting logs to BigQuery and running scheduled queries introduces significant latency, cost, and operational overhead compared to using the native Cloud Monitoring alert, which provides near-real-time detection with minimal configuration.

72
MCQhard

Your team has deployed a text classification model on Vertex AI Endpoints. You notice that the model's latency has increased significantly over the last week, but the request rate has remained stable. Which of the following is the most likely cause?

A.A sudden increase in the number of prediction requests
B.The model was replaced with a larger version without updating the endpoint
C.A change in the preprocessing logic that now includes a computationally expensive step
D.A misconfiguration in the autoscaling policy
AnswerC

This increases per-request latency without changing request rate.

Why this answer

A computationally expensive preprocessing step directly increases per-request latency on the inference path, even when the request rate is stable. When the endpoint runs user-provided preprocessing code before model inference (for example, in a custom prediction routine), adding a heavy operation such as a large regex, image resizing, or an external API call adds its cost to the response time of every prediction.
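One way to confirm this kind of regression is to time the preprocessing and inference stages separately. The sketch below is a minimal example under stated assumptions: `timed_predict` is a hypothetical helper, and the `preprocess` and `infer` callables stand in for whatever the serving code actually does.

```python
import time
from typing import Any, Callable


def timed_predict(
    raw_input: Any,
    preprocess: Callable[[Any], Any],
    infer: Callable[[Any], Any],
) -> tuple[Any, dict[str, float]]:
    """Run one request and report how long each stage took, in seconds."""
    start = time.perf_counter()
    features = preprocess(raw_input)
    after_preprocess = time.perf_counter()
    prediction = infer(features)
    after_inference = time.perf_counter()
    timings = {
        "preprocess_s": after_preprocess - start,
        "inference_s": after_inference - after_preprocess,
    }
    return prediction, timings


# Stand-in stages: lowercase the text, then "predict" its length.
prediction, timings = timed_predict("Hello World", str.lower, len)
print(prediction, sorted(timings))
```

If `preprocess_s` grows after a code change while `inference_s` and the request rate stay flat, the new preprocessing step is the likely cause.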

Exam trap

The trap here is that candidates confuse 'model latency' with 'request rate' and assume any latency increase must be due to scaling issues, ignoring that preprocessing logic changes can dramatically affect per-request performance without altering throughput.

How to eliminate wrong answers

Option A is wrong because a sudden increase in request rate would cause latency to rise, but the question explicitly states request rate has remained stable. Option B is wrong because replacing the model with a larger version requires deploying a new model to the endpoint or updating the endpoint's deployed model; simply replacing the model binary without updating the endpoint's deployment configuration would not change the model served, so latency would not increase. Option D is wrong because a misconfiguration in autoscaling policy (e.g., too few min replicas) would cause latency to increase only when request rate exceeds the current serving capacity, but request rate is stable and autoscaling would have already scaled to match the stable load.

73
MCQeasy

A data scientist trained a model on historical data from 2020-2022 and deployed it in January 2023. In February 2023, the model's accuracy drops significantly. Which monitoring metric would most likely indicate the root cause?

A.Number of unique users calling the endpoint.
B.Prediction latency p99.
C.Number of missing feature values in requests.
D.Training-serving skew detected by Vertex AI Model Monitoring.
AnswerD

Skew indicates that serving data distribution differs from training data, likely causing accuracy drop.

Why this answer

Option D is correct because Vertex AI Model Monitoring specifically detects training-serving skew, which occurs when the distribution of input features at serving time differs from the training data distribution. Since the model was trained on 2020-2022 data and deployed in January 2023, a significant accuracy drop in February 2023 likely indicates that the real-world data distribution has shifted (e.g., seasonal patterns, new user behavior), causing the model to encounter unseen patterns. This skew is a common root cause of performance degradation and is directly monitored by Vertex AI's skew detection feature.
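Skew detection boils down to comparing a feature's training distribution with its serving distribution and alerting when the distance exceeds a threshold. The sketch below uses Jensen-Shannon divergence over histogram counts as one common distance; it is a generic illustration, does not reproduce Vertex AI's internal computation, and uses invented bucket counts.

```python
import math


def normalize(counts: list[float]) -> list[float]:
    """Turn histogram counts into probabilities that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(train_counts: list[float], serve_counts: list[float]) -> float:
    """Jensen-Shannon divergence: 0 for identical histograms, 1 for disjoint ones."""
    p = normalize(train_counts)
    q = normalize(serve_counts)
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mixture) + 0.5 * kl_divergence(q, mixture)


# Hypothetical bucket counts for one feature: 2020-2022 training vs. February 2023 serving.
training = [50, 30, 20]
serving = [20, 30, 50]
print(round(js_divergence(training, serving), 4))
```

A score near zero means serving data still resembles the training data; a score above the chosen threshold is the skew signal that would explain the accuracy drop.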

Exam trap

Google Cloud often tests the distinction between model performance metrics (accuracy, precision) and operational metrics (latency, throughput, user count), and the trap here is that candidates may confuse a drop in accuracy with a system-level issue like latency or missing values, rather than recognizing that accuracy degradation is most directly linked to data distribution shifts (skew).

How to eliminate wrong answers

Option A is wrong because the number of unique users calling the endpoint is a business metric, not a model performance metric; it does not directly indicate why accuracy dropped. Option B is wrong because prediction latency p99 measures response time, not prediction quality; high latency could degrade user experience but does not explain a drop in accuracy. Option C is wrong because missing feature values in requests would cause errors or fallback behavior, but the question states accuracy drops, not that predictions fail; missing values are typically handled by imputation or default values and would not necessarily cause a significant accuracy drop unless the model was not trained to handle them.

74
MCQeasy

Refer to the exhibit. A data scientist notices that predictions from a deployed model are taking longer than expected. Which Cloud Monitoring metric should be inspected first to identify the bottleneck?

A.Vertex AI - Model - Compute utilization
B.Vertex AI - Endpoint - Prediction latency distribution
C.Vertex AI - Endpoint - Traffic
D.Vertex AI - Endpoint - Online prediction errors
AnswerB

This metric directly shows the distribution of latency for prediction requests, making it the first place to look for a bottleneck.

Why this answer

The data scientist is investigating slow predictions from a deployed model. The most direct metric is the prediction latency distribution, which shows the distribution of response times for online prediction requests. It confirms how slow requests actually are and whether the slowdown affects every request or only the tail, which then guides whether to look next at model inference time, network overhead, or endpoint queuing. That makes it the first logical place to inspect.
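To see why a distribution is more informative than a single average, the sketch below computes nearest-rank percentiles from a handful of latency samples. The `percentile` helper and the sample values are illustrative assumptions, not the way Cloud Monitoring aggregates its metric.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds; one request is far slower than the rest.
latencies_ms = [120, 80, 95, 110, 900, 100, 105, 90, 85, 115]
for pct in (50, 90, 99):
    print(f"p{pct} = {percentile(latencies_ms, pct)} ms")
```

Here the median stays near 100 ms while the p99 is 900 ms, which points at a tail problem rather than a uniform slowdown.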

Exam trap

Google Cloud often tests the distinction between metrics that measure performance (latency) versus metrics that measure capacity (utilization, traffic) or errors, leading candidates to mistakenly choose compute utilization or traffic when the question explicitly asks about prediction time.

How to eliminate wrong answers

Option A is wrong because Vertex AI - Model - Compute utilization measures the resource usage (CPU/memory) of the model's compute resources, which can indicate a resource bottleneck but does not directly show prediction latency; it is a secondary metric to investigate after latency is confirmed. Option C is wrong because Vertex AI - Endpoint - Traffic measures the number of requests per second (RPS) to the endpoint, which can indicate load but does not directly measure how long each prediction takes; high traffic can cause latency, but the metric itself is not a latency metric. Option D is wrong because Vertex AI - Endpoint - Online prediction errors tracks the count or rate of failed predictions (e.g., timeouts, invalid inputs), not the latency of successful predictions; errors may be a consequence of latency but are not the primary metric for identifying a latency bottleneck.

75
MCQhard

A team is monitoring a production ML system that includes multiple models and data processing pipelines. They want to set up a comprehensive alerting strategy that minimizes false positives while ensuring critical issues are promptly addressed. Which approach is the most effective?

A.Set up alerts for all possible error conditions
B.Use static thresholds based on historical data
C.Rely on manual monitoring during business hours
D.Use AIOps with anomaly detection to dynamically adjust thresholds
AnswerD

AIOps anomaly detection models learn normal behavior and flag deviations, reducing false positives while detecting real anomalies.

Why this answer

Option D is correct because AIOps with anomaly detection uses machine learning to dynamically adjust alert thresholds based on real-time system behavior, reducing false positives while ensuring critical issues are detected promptly. This approach adapts to changing data distributions and traffic patterns, unlike static thresholds that require manual tuning and often miss subtle anomalies. It is the most effective strategy for complex ML production systems where multiple models and pipelines interact, as it can correlate signals across components to identify genuine incidents.
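The difference between a static and a dynamically adjusted threshold can be shown with a deliberately simple stand-in for anomaly detection: flag a value when it exceeds the recent mean by more than k standard deviations. Real AIOps tooling uses more sophisticated models; the function names, k = 3, and the sample histories here are assumptions for illustration only.

```python
import statistics


def dynamic_threshold(history: list[float], k: float = 3.0) -> float:
    """Threshold derived from recent behavior: mean plus k population standard deviations."""
    return statistics.mean(history) + k * statistics.pstdev(history)


def is_anomalous(history: list[float], value: float, k: float = 3.0) -> bool:
    """Flag a value only if it is unusual relative to the recent history."""
    return value > dynamic_threshold(history, k)


# Hypothetical latency histories (ms) for a quiet period and a busy period.
quiet_period = [100, 102, 98, 101, 99]
busy_period = [200, 204, 196, 202, 198]

print(is_anomalous(quiet_period, 110))  # unusual for the quiet period
print(is_anomalous(busy_period, 205))   # normal for the busy period
```

A static threshold tuned on the quiet period would fire on every busy-period value, which is the false-positive problem that adaptive thresholds are meant to avoid.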

Exam trap

The trap here is that candidates often choose static thresholds (Option B) because they seem simpler and more predictable, but they fail to recognize that production ML systems require adaptive thresholds to handle dynamic data distributions and avoid alert fatigue.

How to eliminate wrong answers

Option A is wrong because setting alerts for all possible error conditions leads to alert fatigue, overwhelming the team with noise and causing critical issues to be missed; it lacks prioritization and ignores the need for intelligent filtering. Option B is wrong because static thresholds based on historical data fail to adapt to concept drift, seasonal patterns, or sudden traffic spikes, resulting in either too many false positives or missed anomalies when the system behavior changes. Option C is wrong because relying on manual monitoring during business hours introduces unacceptable latency for critical issues that occur outside those hours, and human error or fatigue can cause delays in detection; it is not scalable for 24/7 production ML systems.
