The correct answer is that data drift is the most likely cause of both the performance degradation and the latency increase. When the production data distribution no longer matches the training data, the model makes more incorrect predictions, and out-of-distribution inputs can trigger additional computation, such as fallback logic or uncertainty estimation, which raises latency. On the Google Professional Machine Learning Engineer exam, this scenario tests your understanding of how data drift affects both accuracy and operational metrics; a common trap is to overlook that handling degraded predictions can consume extra resources. A memory tip is the "drift double-hit": accuracy drops while latency rises, because the serving path does extra work on unfamiliar data.
PMLE Scaling prototypes into ML models Practice Question
This PMLE practice question tests your understanding of scaling prototypes into ML models. Read the scenario carefully and evaluate each option against the stated constraints before committing to an answer. Then compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.
Exhibit
Refer to the exhibit.
```
Model accuracy: 0.92
Training data: 10,000 records
Online prediction latency: 95th percentile = 450ms
QPS: 50
After moving to production:
- New data from users: 100,000 records/day
- Data distribution shift detected (new features emerge)
- Prediction latency increases to 95th percentile = 1200ms
- QPS drops to 30
```
A team deployed a prototype classification model to Vertex AI Prediction. After a week, they notice the metrics shown in the exhibit. What is the most likely cause of the performance degradation and latency increase?
Clue words in this question
Noticing these words before you look at the options changes how you read each choice.
Clue: "most likely"
Why it matters: Probability qualifier — the question wants the most probable cause or outcome, not a guaranteed one. Eliminate low-probability options.
A
The prediction endpoint's autoscaling is too slow, causing requests to queue and time out.
Why wrong: Autoscaling may contribute but does not explain the accuracy drop.
B
The prediction requests are too large, exceeding the maximum request size limit for Vertex AI.
Why wrong: Request size limit would cause errors, not a gradual latency increase.
C
The training data does not represent the current production data distribution, causing the model to make incorrect predictions and requiring more computation.
Why correct: Data distribution shift degrades accuracy and can increase latency if the serving path does extra work on uncertain predictions.
D
The custom prediction container uses outdated libraries that are incompatible with Vertex AI's runtime.
Why wrong: Library incompatibility would cause errors, not gradual latency increase and accuracy drop.
Correct answer & explanation
✓
The training data does not represent the current production data distribution, causing the model to make incorrect predictions and requiring more computation.
The exhibit shows a detected distribution shift alongside higher latency and lower throughput, and the question stem adds that performance has degraded. Option C is correct because when the production data distribution shifts away from the training data (data drift), the model makes more incorrect predictions, which can trigger additional computation (e.g., retries, fallback logic, or uncertainty estimation) and cause latency spikes. Vertex AI Prediction does not inherently add computation for wrong predictions, but the model's internal confidence thresholds or post-processing steps may consume extra resources when handling out-of-distribution inputs.
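The mechanism described above (extra work on low-confidence inputs) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Vertex AI Prediction or any particular model behaves: `toy_primary`, `toy_fallback`, the 0.7 threshold, and the sample values are all hypothetical names and numbers invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A "model" here is any callable returning (label, confidence). Hypothetical.
Model = Callable[[float], Tuple[int, float]]


@dataclass
class Prediction:
    label: int
    confidence: float
    used_fallback: bool


def predict_with_fallback(x: float, primary: Model, fallback: Model,
                          threshold: float = 0.7) -> Prediction:
    """Run the primary model; run a second, costlier path if it is unsure."""
    label, confidence = primary(x)
    if confidence >= threshold:
        return Prediction(label, confidence, used_fallback=False)
    # Low confidence: a second call is made, so this request does more work.
    fb_label, fb_confidence = fallback(x)
    return Prediction(fb_label, fb_confidence, used_fallback=True)


def fallback_rate(batch: Sequence[float], primary: Model, fallback: Model,
                  threshold: float = 0.7) -> float:
    """Fraction of requests in the batch that took the extra fallback path."""
    results = [predict_with_fallback(x, primary, fallback, threshold) for x in batch]
    return sum(r.used_fallback for r in results) / len(results)


def toy_primary(x: float) -> Tuple[int, float]:
    # Assumption: the model is confident only inside its training range [0, 1].
    in_training_range = 0.0 <= x <= 1.0
    return int(x > 0.5), 0.95 if in_training_range else 0.55


def toy_fallback(x: float) -> Tuple[int, float]:
    # Stand-in for a slower path (ensemble, rules, uncertainty sampling).
    return int(x > 0.5), 0.80


training_like = [0.1, 0.4, 0.6, 0.9]   # all inside the training range
drifted = [0.2, 1.5, 2.0, -0.3]        # three of four fall outside it

baseline_rate = fallback_rate(training_like, toy_primary, toy_fallback)
drifted_rate = fallback_rate(drifted, toy_primary, toy_fallback)
print(f"fallback rate before drift: {baseline_rate:.2f}")
print(f"fallback rate after drift:  {drifted_rate:.2f}")
```

If each fallback call costs additional time, a higher fallback rate translates into higher tail latency, which is the link between drift and latency that option C relies on.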
Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Answer analysis
Option-by-option breakdown
For each option: why learners choose it and why it is or isn't the right answer here.
✗
The prediction endpoint's autoscaling is too slow, causing requests to queue and time out.
Why it's wrong here
Autoscaling may contribute but does not explain the accuracy drop.
✗
The prediction requests are too large, exceeding the maximum request size limit for Vertex AI.
Why it's wrong here
Request size limit would cause errors, not a gradual latency increase.
✓
The training data does not represent the current production data distribution, causing the model to make incorrect predictions and requiring more computation.
Why this is correct
Data distribution shift degrades accuracy and can increase latency if the model is uncertain.
Clue confirmation
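The clue word "most likely" in the question points toward this answer: drift is the only option that accounts for both the degraded predictions and the latency increase.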
Related concept
Read the scenario before looking for a memorised answer.
✗
The custom prediction container uses outdated libraries that are incompatible with Vertex AI's runtime.
Why it's wrong here
Library incompatibility would cause errors, not gradual latency increase and accuracy drop.
Common exam traps
Common exam trap: answer the scenario, not the keyword
Google Cloud often tests the misconception that latency increase must be caused by infrastructure issues (autoscaling or request size) rather than model behavior, but the key clue is the simultaneous accuracy degradation, which points to data drift as the root cause.
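To put numbers on the operational side of the exhibit, the relative changes can be computed directly from its figures. This is a small worked calculation; `relative_change` is a helper name introduced here for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before


# Figures taken from the exhibit.
latency_change = relative_change(450, 1200)   # p95 latency, ms
qps_change = relative_change(50, 30)          # queries per second
daily_to_training_ratio = 100_000 / 10_000    # new records per day vs training set size

print(f"p95 latency change: {latency_change:+.0%}")
print(f"QPS change: {qps_change:+.0%}")
print(f"daily production records / training records: {daily_to_training_ratio:.0f}x")
```

Each day of production traffic is ten times the size of the original training set, which is consistent with the training data no longer being representative.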
Detailed technical explanation
How to think about this question
Data drift is often detected by comparing the distribution of prediction inputs against the training features, using measures such as the Kolmogorov-Smirnov test or the Population Stability Index (PSI). In Vertex AI, you can enable Model Monitoring to detect training-serving skew and prediction drift and send alerts, which a team can use to trigger retraining. A real-world scenario is a retail demand model trained on pre-pandemic data failing during a pandemic because customer behavior shifted, causing accuracy loss and, where the serving path includes uncertainty handling (e.g., Bayesian neural networks that draw extra samples), increased latency.
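The two measures named above can be sketched with the Python standard library alone. This is an illustrative implementation for a single numeric feature, not the method Vertex AI Model Monitoring uses internally; the function names, the 10-bin layout, the `eps` floor, and the synthetic Gaussian data are assumptions made for the example.

```python
import math
import random
from bisect import bisect_right
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index using equal-width bins over the expected range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


rng = random.Random(0)  # fixed seed so the run is repeatable
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
serving_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
serving_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean moved by 1 std

for name, serving in [("same", serving_same), ("shifted", serving_shifted)]:
    print(f"{name}: PSI={psi(training, serving):.3f}  KS={ks_statistic(training, serving):.3f}")
```

A commonly quoted rule of thumb treats a PSI below roughly 0.1 as stable and above roughly 0.25 as a significant shift; these cutoffs are conventions rather than guarantees, so alert thresholds are usually tuned per feature.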
Key Concepts to Remember
Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.
Exam Day Tips
→ Watch for words such as best, first, most likely and least administrative effort.
→ Review why wrong options are wrong, not only why the correct option is correct.
Key takeaway
Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Real-world example
How this comes up in practice
A cloud solutions architect for a retail company is evaluating services for a new workload. The correct answer here reflects best practice for the specific scenario described, not a general cloud recommendation. Cloud exam questions reward reading the constraint carefully: the same technology can be right or wrong depending on the use case.
What to study next
Got this wrong? Here's your next step.
Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.
Scaling prototypes into ML models: this question tests the Scaling prototypes into ML models domain. Read the scenario before looking for a memorised answer.
About these practice questions
Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.