CCNA Monitoring ML solutions Questions — Page 2 of 2

MCQ (hard)

A team uses custom training and deploys a TensorFlow model using Vertex AI Endpoints. They set up Cloud Monitoring alerts for online prediction latency. However, they notice the latency metric shows a spike every hour, but the actual user experience is fine. What could be the cause?

A. The metric includes prediction time plus log writing time

B. The alert threshold is too low

C. The metric is being sampled every hour

D. A monitoring agent on the VM is causing additional latency

Answer: A

Periodic log dumping can cause hourly spikes in measured latency.

Why this answer

Option A is correct because Vertex AI Endpoints' latency metric includes both the model inference time and the time taken to write prediction logs to Cloud Logging. This log writing occurs asynchronously but can cause periodic spikes in the reported latency metric when log buffers flush, even though the actual user-facing prediction latency remains unaffected. The spike every hour aligns with log rotation or buffer flush intervals, not with actual prediction performance degradation.

Exam trap

Google Cloud often tests the misconception that latency metrics reflect only model inference time, when in reality they may include ancillary operations like logging, causing candidates to overlook the logging overhead as the source of periodic spikes.

How to eliminate wrong answers

Option B is wrong because the alert threshold being too low would cause continuous or frequent alerts, not a predictable hourly spike in the latency metric itself. Option C is wrong because sampling every hour would produce a single data point per hour, not a spike within the metric; the metric is reported continuously, and sampling frequency does not create spikes. Option D is wrong because a monitoring agent on the VM would add consistent overhead, not a periodic hourly spike, and Vertex AI Endpoints are managed services where customers do not manage VMs directly for prediction serving.


MCQ (hard)

A financial services firm deploys a binary classification model for fraud detection. The model's precision is 0.95 and recall is 0.60 on the test set. After deployment, the fraud rate in production is 0.5% compared to 5% in the test set. The model shows good calibration on the test set (Brier score 0.02) but poor calibration in production (Brier score 0.15). What is the most likely explanation for the calibration degradation?

A. The distribution of input features has shifted significantly, causing the model to produce incorrect probabilities.

B. The model overfits to noise in the training data, leading to poor generalization.

C. The production data has a different class imbalance than the training data, causing the model to be biased toward the majority class.

D. The relationship between features and the target has changed (concept drift), causing the model's probability estimates to be misaligned with the true probabilities.

Answer: D

Concept drift changes the conditional distribution P(Y|X), which directly affects calibration.

Why this answer

The model is well calibrated on the test set, where the fraud rate is 5%, but poorly calibrated in production, where the fraud rate is 0.5%. The model's predicted probabilities implicitly encode the base rate of the data it was fitted on. By Bayes' rule, a tenfold drop in the base rate changes the true posterior P(Y|X) for the same features, even if fraudulent and legitimate transactions look the same as before. Option D is correct because the relationship between features and the probability of fraud in production no longer matches what the model learned, so its estimates are systematically too high and the Brier score rises. Strictly speaking, a pure base-rate change is usually called prior probability shift (label shift) rather than concept drift; the question groups it under option D because the effect is a change in P(Y|X).

Exam trap

The trap here is that candidates confuse covariate shift (feature distribution change) with prior probability shift (class imbalance change), and incorrectly attribute calibration degradation to feature drift rather than the direct effect of base rate change on probability estimates.

How to eliminate wrong answers

Option A is wrong because the scenario reports a change in the fraud base rate (prior probability shift), not a change in the features themselves; covariate shift could also degrade calibration, but it is not the change the question describes. Option B is wrong because overfitting to noise would show up as poor performance on the test set as well, yet the model is well calibrated there (Brier score 0.02), indicating that it generalizes to the test distribution. Option C is wrong because, although the production data does have a different class imbalance, the model is not necessarily biased toward the majority class; the degradation comes from the mismatch between the training prior and the production prior, which skews the probability estimates regardless of any majority-class bias.
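
The effect of the base-rate change on probability estimates can be illustrated with the standard Bayes prior-shift adjustment. This is a minimal sketch that assumes only the class prior changed while the class-conditional feature distributions stayed the same, which the question does not guarantee; `adjust_for_prior` is a hypothetical helper name, not part of any Vertex AI API.

```python
def adjust_for_prior(p: float, train_prior: float, prod_prior: float) -> float:
    """Rescale a predicted positive-class probability from the training
    base rate to a new base rate, assuming P(X|Y) is unchanged."""
    pos = p * prod_prior / train_prior
    neg = (1.0 - p) * (1.0 - prod_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


# Base rates from the question: 5% fraud in the test set, 0.5% in production.
raw_score = 0.50
adjusted = adjust_for_prior(raw_score, train_prior=0.05, prod_prior=0.005)
print(f"raw={raw_score:.2f} adjusted={adjusted:.3f}")
```

Under these assumptions, a raw score of 0.50 corresponds to a production probability of roughly 0.09, which is why uncorrected scores look badly calibrated once the base rate drops.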


MCQ (medium)

An e-commerce company uses a recommendation model deployed on Vertex AI Endpoints. The model's latency increases gradually over two weeks, causing timeouts. The model is served using a custom container. What is the most likely root cause and corrective action?

A. The model is receiving more traffic; scale the number of replicas.

B. The custom container has a memory leak; implement memory monitoring and set container resource limits.

C. The Vertex AI endpoint has changed its URL; update the client application.

D. The model file size has grown due to feature engineering; reduce feature count.

Answer: B

Memory leaks are a common cause of gradual performance degradation in long-running containers.

Why this answer

A gradual increase in latency over two weeks, without a sudden spike, strongly indicates a memory leak in the custom container. As the leak accumulates, the container's garbage collection becomes less effective, leading to increased GC pauses and eventual timeouts. Setting resource limits and monitoring memory usage can prevent the container from exhausting host memory and causing performance degradation.

Exam trap

The trap here is that candidates confuse a gradual latency increase with a traffic scaling issue (Option A), but the slow, steady degradation over weeks is the hallmark of a resource leak, not a demand spike.

How to eliminate wrong answers

Option A is wrong because a gradual latency increase over two weeks is not characteristic of a sudden traffic surge; traffic spikes would cause immediate latency jumps, not a slow creep. Option C is wrong because Vertex AI endpoint URLs are stable and do not change over time; a URL change would cause immediate 404 errors, not gradual latency increases. Option D is wrong because model file size does not change dynamically during serving; feature engineering changes would require a new model deployment, not cause a gradual latency increase in an already deployed model.
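
The "slow creep" pattern described above can be checked by fitting a straight line to periodic memory samples. The sketch below is illustrative only: the sample values, the `slope_per_hour` helper, and the 1 MB/hour threshold are assumptions, not values from Cloud Monitoring.

```python
def slope_per_hour(hours, mem_mb):
    """Least-squares slope of memory usage (MB) against time (hours)."""
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(mem_mb) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, mem_mb))
    sxx = sum((x - mean_x) ** 2 for x in hours)
    return sxy / sxx


# Hypothetical daily samples of container memory usage.
hours = [0, 24, 48, 72, 96]
mem_mb = [512, 560, 610, 655, 705]

growth = slope_per_hour(hours, mem_mb)
LEAK_THRESHOLD_MB_PER_HOUR = 1.0  # assumed threshold, tune per workload
leak_suspected = growth > LEAK_THRESHOLD_MB_PER_HOUR
print(f"growth={growth:.2f} MB/hour, leak_suspected={leak_suspected}")
```

A steadily positive slope under roughly constant traffic points toward a leak, whereas a traffic-driven problem would track request volume instead.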


Matching (medium)

Match each Google Cloud storage option to its best use case.


Concepts

Matches

Unstructured object storage for any type of data

NoSQL wide-column database for low-latency, high-throughput

Serverless data warehouse for analytics at scale

Relational database for OLTP workloads

NoSQL document database for mobile/web apps

Why these pairings

Choosing the right storage is crucial for ML data pipelines.


MCQ (medium)

A financial services company uses a custom container to serve a fraud detection model on Vertex AI Endpoints. The model requires a feature store lookup for each prediction. Recently, the feature store (Cloud Bigtable) experienced a brief outage, causing some predictions to fail. After the outage resolved, the endpoint's CPU utilization dropped significantly, and prediction latency improved. However, the model's false positive rate increased sharply. The ML engineer suspects the model is using stale features because the feature store outage caused missing lookups. Cloud Monitoring for the endpoint shows no errors after the outage, but the number of feature store read requests per prediction decreased by 30%. Which metric should the engineer use to confirm the hypothesis of stale features?

A. Monitor the prediction request latency to see if it remains low.

B. Use Vertex AI Model Monitoring to compare the prediction distribution before and after the outage; significant drift indicates stale features.

C. Verify the feature store's read throughput and latency metrics to ensure it is healthy.

D. Check the error rate for the endpoint; if no errors, then features were retrieved correctly.

Answer: B

Drift detection directly reveals changes in model behavior due to input changes.

Why this answer

Option B is correct because Vertex AI Model Monitoring can detect prediction distribution drift, which directly indicates that the model is receiving different input features than expected. A significant drift after the outage, combined with the 30% drop in feature store read requests, confirms that stale or default features were substituted for missing lookups, causing the false positive rate to spike.

Exam trap

The trap here is that candidates assume no errors means no problem, but the question explicitly describes a silent failure where the model uses stale features without raising any error, so metrics like latency or error rate are irrelevant for detecting feature staleness.

How to eliminate wrong answers

Option A is wrong because low prediction latency does not confirm stale features; it only indicates that the endpoint is processing requests faster, which could be due to fewer feature store reads (as observed) but does not prove that the features used are stale. Option C is wrong because verifying the feature store's health metrics (read throughput, latency) only confirms that Bigtable is operational now, not whether the model used stale features during the outage or after. Option D is wrong because the absence of endpoint errors does not guarantee correct feature retrieval; the model can silently use default or cached values without raising errors, which is exactly what happened here.
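
Comparing prediction distributions before and after an event can be sketched with a Jensen-Shannon divergence over binned scores. This is an illustration under stated assumptions: the scores are made up, and `histogram` and `js_divergence` are hypothetical helpers rather than Vertex AI Model Monitoring calls.

```python
import math


def histogram(scores, bins=5):
    """Bin scores in [0, 1] into equal-width buckets and normalize."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(scores) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical fraud scores logged before and after the outage.
before = [0.05, 0.10, 0.10, 0.15, 0.20, 0.30, 0.10, 0.05, 0.90, 0.12]
after = [0.05, 0.60, 0.70, 0.15, 0.80, 0.90, 0.10, 0.65, 0.90, 0.75]

drift = js_divergence(histogram(before), histogram(after))
print(f"prediction drift (JS divergence) = {drift:.3f}")
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1, so a value well above the usual day-to-day level would support the stale-feature hypothesis.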


Multi-Select (medium)

A data science team uses Vertex AI Model Monitoring to detect data quality issues in a production model. Which TWO metrics should they enable to identify problems with missing values in predictions? (Select TWO.)

Select 2 answers

A. Feature value distribution skew (distance metrics).

B. Training-serving skew detection for all features.

C. Total count of missing values across all features.

D. Prediction confidence score.

E. Missing value ratio per feature.

Answers: A, E

Distribution skew can detect shifts caused by missing values being treated differently.

Why this answer

Option A is correct because Vertex AI Model Monitoring's feature value distribution skew detection uses distance metrics (e.g., Jensen-Shannon divergence, L-infinity) to compare the distribution of feature values in the serving data against the training data. A sudden increase in missing values in a feature will shift its distribution, triggering a skew alert. This allows the team to detect missing value problems indirectly by monitoring distributional drift. Option E is correct because a per-feature missing value ratio measures the problem directly and shows which feature is affected.

Exam trap

Google Cloud often tests the distinction between aggregate metrics (like total count) and per-feature metrics (like ratio). Candidates mistakenly select 'total count of missing values across all features' because it appears to address missing values directly, but Vertex AI Model Monitoring only supports per-feature missing value ratios.
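
Both selected metrics can be sketched in a few lines. The example below is a simplified illustration, not the service's implementation: the rows, the feature names, and the choice to map a missing category to a `"MISSING"` bucket are all assumptions.

```python
from collections import Counter


def missing_ratio_per_feature(rows, features):
    """Fraction of rows in which each feature is None."""
    n = len(rows)
    return {f: sum(1 for r in rows if r.get(f) is None) / n for f in features}


def l_infinity_distance(train_values, serving_values):
    """Largest absolute difference in category frequency between two samples."""
    p, q = Counter(train_values), Counter(serving_values)
    n_p, n_q = len(train_values), len(serving_values)
    return max(abs(p[k] / n_p - q[k] / n_q) for k in p.keys() | q.keys())


# Hypothetical serving rows; None marks a missing value.
serving_rows = [
    {"amount": 10.0, "country": "US"},
    {"amount": None, "country": "US"},
    {"amount": 5.0, "country": None},
    {"amount": None, "country": "DE"},
]
train_countries = ["US", "US", "DE", "DE"]

ratios = missing_ratio_per_feature(serving_rows, ["amount", "country"])
serving_countries = [r["country"] or "MISSING" for r in serving_rows]
skew = l_infinity_distance(train_countries, serving_countries)
print(ratios, skew)
```

The ratio pinpoints which feature is losing values, while the distance metric shows how far the serving distribution has moved from training once missing values are counted as their own category.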


MCQ (hard)

A company has deployed a machine learning model that uses a large input tensor. They notice that the prediction latency varies significantly between requests of the same size. Cloud Monitoring shows that the serving endpoint's CPU utilization is consistently below 50%, but memory utilization fluctuates between 70% and 95%. What is the most likely cause?

A. The model is performing garbage collection cycles

B. The model is using excessive memory due to a memory leak

C. The prediction latency is being affected by CPU throttling

D. The model is hitting a cold start due to autoscaling

Answer: A

Garbage collection pauses can cause latency spikes without high CPU usage, as memory utilization fluctuates during GC.

Why this answer

The correct answer is A because the described symptoms—low CPU utilization (below 50%) and high, fluctuating memory utilization (70%–95%) with variable latency—are classic indicators of garbage collection (GC) pauses in a managed runtime like Python or Java. When the model processes large input tensors, it allocates significant memory; as memory pressure builds, the garbage collector runs more frequently, causing stop-the-world pauses that increase latency unpredictably, even though CPU is not fully utilized.

Exam trap

Google Cloud often tests the misconception that high memory utilization always indicates a memory leak, but the key differentiator is the pattern of fluctuation versus monotonic increase, and the fact that GC pauses cause latency spikes without high CPU usage.

How to eliminate wrong answers

Option B is wrong because a memory leak would cause memory utilization to steadily increase over time (monotonically) rather than fluctuate between 70% and 95%, and it would eventually lead to an out-of-memory crash, not just variable latency. Option C is wrong because CPU throttling (e.g., due to thermal limits or cloud provider CPU credits exhaustion) would manifest as sustained high CPU utilization or a hard cap on CPU speed, not consistently below 50% utilization. Option D is wrong because cold starts due to autoscaling occur when new instances are spun up to handle increased load, which would show a correlation with request volume spikes and initial high latency on the first request, not persistent latency variation across all requests of the same size.


Multi-Select (medium)

A company uses Vertex AI Model Monitoring. Which two configuration options can be set to reduce false positive drift alerts?

Select 2 answers

A. Use a sample percentage of predictions

B. Set a shorter alerting window

C. Increase the drift threshold

D. Decrease the drift threshold

E. Enable feature attribution monitoring

Answers: A, C

Sampling reduces the volume of data compared, potentially reducing noise-induced false alarms.

Why this answer

Option A is correct because using a sample percentage of predictions reduces the volume of data analyzed for drift, which lowers the chance of detecting statistically insignificant fluctuations that could trigger false positive alerts. This is a common technique to filter out noise in high-throughput production systems. Option C is correct because a higher drift threshold requires a larger change in the distribution before an alert fires.

Exam trap

Google Cloud often tests the misconception that increasing sensitivity (lowering thresholds or shortening windows) reduces false positives, when in fact the opposite is true—these actions increase alert volume and false positives.
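
The relationship between threshold and alert volume can be shown with a toy example. The drift scores and thresholds below are invented for illustration and do not come from any monitoring job.

```python
def count_alerts(drift_scores, threshold):
    """Number of monitoring windows whose drift score exceeds the threshold."""
    return sum(1 for score in drift_scores if score > threshold)


# Hypothetical per-window drift scores for one feature.
drift_scores = [0.02, 0.05, 0.12, 0.08, 0.31, 0.04, 0.11]

alerts_low_threshold = count_alerts(drift_scores, threshold=0.1)
alerts_high_threshold = count_alerts(drift_scores, threshold=0.3)
print(alerts_low_threshold, alerts_high_threshold)
```

Raising the threshold can only keep or reduce the number of alerts for the same scores, which is why lowering it (option D) works against the goal.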


Multi-Select (hard)

A team is monitoring a batch prediction job on Vertex AI. Which two metrics should they monitor to ensure the job completes successfully without errors?

Select 2 answers

A. Data size of input

B. Prediction requests per second

C. Job failure rate

D. Model endpoint latency

E. Number of preempted workers

Answers: C, E

Failure rate directly indicates job success.

Why this answer

Option C is correct because the job failure rate directly indicates whether the batch prediction job is completing successfully or encountering errors. Monitoring this metric allows the team to detect and respond to failures in the prediction pipeline. Option E is correct because preempted workers interrupt processing in a distributed batch job, so a rising preemption count is an early sign that the job may stall or fail.

Exam trap

Google Cloud often tests the distinction between batch and online prediction metrics, and the trap here is that candidates mistakenly apply online serving metrics (like latency or requests per second) to batch jobs, or overlook worker preemption as a critical failure indicator in distributed batch processing.


Multi-Select (medium)

A financial services company has deployed a credit risk ML model on Vertex AI. They want to monitor the model for fairness across demographic groups to ensure no biased outcomes. Which TWO actions should they take as best practices? (Choose TWO.)

Select 2 answers

A. Eliminate all features that are correlated with protected attributes from the model input to ensure fairness.

B. Use Vertex Explainable AI to understand feature attributions and compare their distributions across demographic groups.

C. Periodically compare the model's performance metrics (e.g., AUC) on the overall population versus the holdout test set.

D. Store all model predictions in BigQuery but do not capture ground truth labels to avoid privacy issues.

E. Set up alerts on the Vertex AI Model Monitoring fairness metrics, such as equal opportunity difference, and configure a Slack channel for notifications.

Answers: B, E

Feature attribution analysis helps identify if the model relies disproportionately on sensitive attributes.

Why this answer

Option B is correct because Vertex Explainable AI provides feature attribution scores that can be compared across demographic groups to detect if the model relies on sensitive attributes or proxies. This enables fairness auditing by revealing whether the model's decision logic differs systematically for protected groups, which is a best practice for monitoring bias.

Exam trap

Google Cloud often tests the misconception that removing protected attributes or correlated features is sufficient for fairness, when in reality proxy features and complex interactions can still cause bias, making monitoring with explainability and fairness metrics essential.
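
Equal opportunity difference, the fairness metric named in option E, is the gap in true positive rate between two groups. The sketch below uses the common textbook definition with invented labels and group names; it is not a description of how any Vertex AI feature computes it.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the model predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("no positive examples in this group")
    return sum(preds_for_positives) / len(preds_for_positives)


def equal_opportunity_difference(y_true, y_pred, groups, group_a, group_b):
    """TPR of group_a minus TPR of group_b; 0 means equal opportunity."""
    def tpr_for(name):
        idx = [i for i, g in enumerate(groups) if g == name]
        return true_positive_rate([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])

    return tpr_for(group_a) - tpr_for(group_b)


# Hypothetical labels (1 = creditworthy), predictions, and group membership.
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

eod = equal_opportunity_difference(y_true, y_pred, groups, "A", "B")
print(f"equal opportunity difference = {eod:.3f}")
```

Computing this requires ground truth labels per group, which is one reason option D (discarding labels) undermines fairness monitoring.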


MCQ (hard)

You are the ML engineer for a financial services company. You have deployed a fraud detection model on Vertex AI Endpoints using a custom container. The model is a gradient boosting model trained on transactional data. Over the past week, the model's precision has dropped from 95% to 80%, while recall has remained stable. The input data volume and distribution have not changed significantly. The model is served on a single endpoint with autoscaling enabled (min replicas=2, max replicas=10). You notice that the average CPU utilization of the serving containers has increased from 40% to 90%, and the p99 latency has increased from 50ms to 200ms. The model is retrained weekly using the latest data, and the last retraining was 3 days ago. The logs show no errors, and the model version is unchanged. Given these symptoms, what is the most likely cause of the precision drop?

A. The autoscaling policy is not scaling up fast enough, causing increased latency and prediction errors.

B. The model is overfitting to recent transaction patterns due to weekly retraining.

C. A recent change in the preprocessing code in the container transformed features differently than what the model expects, causing incorrect predictions.

D. The model was replaced with a different version without updating the endpoint.

Answer: C

A feature transformation mismatch can cause a precision drop without affecting recall.

Why this answer

Option C is correct because the precision drop without a change in input distribution or recall strongly indicates a systematic error in predictions, not a data shift. A preprocessing code change in the custom container would cause the model to receive features transformed differently than during training, leading to incorrect probability estimates. The increased CPU utilization and latency are consistent with the container performing additional or different preprocessing steps, not with autoscaling issues or model version changes.

Exam trap

The trap here is that candidates often attribute latency increases and precision drops to autoscaling or model drift, but the key clue is that recall remains stable, which points to a systematic prediction error (preprocessing mismatch) rather than a data distribution shift or infrastructure scaling problem.

How to eliminate wrong answers

Option A is wrong because autoscaling delays cause increased latency and potential timeouts, but they do not directly cause a precision drop; precision depends on the correctness of predictions, not on response time. Option B is wrong because overfitting to recent patterns would typically cause a drop in recall as well, not just precision, and the input distribution has not changed significantly. Option D is wrong because the logs show no errors and the model version is unchanged, so the endpoint is still serving the same model; a version replacement would require an explicit update and would likely trigger a deployment event.
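
The "precision down, recall stable" pattern corresponds to more false positives with the same number of true positives and false negatives. The confusion-matrix counts below are invented to reproduce the 95% and 80% precision figures from the question.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model caught."""
    return tp / (tp + fn)


# Hypothetical counts: only the false positives change between the two weeks.
before = {"tp": 76, "fp": 4, "fn": 24}
after = {"tp": 76, "fp": 19, "fn": 24}

for label, c in (("before", before), ("after", after)):
    print(label,
          f"precision={precision(c['tp'], c['fp']):.2f}",
          f"recall={recall(c['tp'], c['fn']):.2f}")
```

Because the true positives and false negatives are unchanged, recall stays at 0.76 while precision falls from 0.95 to 0.80, matching the symptom the question describes.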
