Question 494 of 1,000
Operationalizing machine learning models · Hard · Multiple Select · Objective-mapped

Diagnosing AUC Drop Despite Training Validation

This PDE practice question tests your understanding of operationalizing machine learning models. The scenario asks you to isolate a root cause: eliminate options that address a different problem before choosing. The key principle to apply is data drift. Once you have made your selection, read the full explanation to reinforce the concept and understand why each distractor is designed to mislead on exam day.

An MLOps team manages a pipeline that retrains an XGBoost classifier weekly using BigQuery data. The pipeline is orchestrated with Cloud Composer and deploys the new model to a Vertex AI endpoint if the validation metric threshold (AUC > 0.9) is met. Over the past month, the deployed model's AUC has dropped from 0.95 to 0.88, even though the training pipeline consistently reports AUC > 0.9. Which THREE steps should the team take to diagnose and fix this issue?

Quick Answer

The three correct steps are Options B, C, and E: add a canary deployment so that each new model version receives a small percentage of traffic before full rollout, compare feature distributions between the training data and online serving data using Vertex AI Model Monitoring, and implement model validation on the deployed endpoint by logging predictions and comparing them against actuals for a sample of traffic. Together they address the core issue, a gap between offline validation and online performance: the model scores well on the validation data held out during training but degrades on the real-world data distribution it encounters at inference. The team diagnoses the degradation by capturing live prediction data, measuring which input features have drifted from the training dataset, and using feature attributions to see why the deployed AUC is dropping while the pipeline's validation keeps passing. On the Google Professional Data Engineer exam, this scenario tests your understanding of MLOps monitoring and the difference between offline validation and online evaluation, a common trap where candidates focus only on retraining or hyperparameter tuning. A useful memory tip is "validate where you serve": always monitor the endpoint, not just the pipeline, to catch silent model decay.

Correct answer & explanation

Add a canary deployment step where new model version receives a small percentage of traffic before full rollout.

Option B is one of the three correct answers, alongside Options C and E. A canary deployment rolls the new model out to a small percentage of traffic, so performance degradation in production can be detected before a full rollout. This directly addresses the discrepancy between training metrics and live performance by exposing the model to real-world data patterns that may differ from the training set. On Vertex AI, a canary can be implemented with the endpoint's traffic split, which routes a fraction of requests to the new model version, while Cloud Composer orchestrates the deployment step. Measuring the canary's AUC requires ground-truth labels, which often arrive after a delay, so the canary is most useful when combined with prediction logging.
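
As a rough sketch of the traffic-split idea, assume the endpoint's split is a mapping from deployed-model ID to an integer percentage that sums to 100. The helper below carves out a canary share for a new version and scales the existing versions down. The function name `canary_split` and the example model IDs are hypothetical; the key "0" mirrors the placeholder the Vertex AI API uses for the model being deployed in the same request. The mapping it returns is the kind of value a deployment step could pass as the endpoint's traffic split.

```python
def canary_split(current, canary_percent, new_key="0"):
    """Return a new traffic split that gives `canary_percent` to a new model.

    `current` maps deployed-model IDs to integer percentages summing to 100.
    `new_key` is the entry for the incoming model (hypothetical default "0").
    """
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")
    if sum(current.values()) != 100:
        raise ValueError("current split must sum to 100")

    remaining = 100 - canary_percent
    # Scale every existing model down proportionally, rounding down.
    scaled = {k: (v * remaining) // 100 for k, v in current.items()}
    # Give any percentage points lost to rounding to the largest existing model.
    leftover = remaining - sum(scaled.values())
    largest = max(current, key=current.get)
    scaled[largest] += leftover

    scaled[new_key] = canary_percent
    return scaled


if __name__ == "__main__":
    # One model serving all traffic; send 10% to the new version.
    print(canary_split({"1111": 100}, 10))
```

Keeping the split calculation separate from the deployment call makes it easy to unit-test in the Cloud Composer DAG and to reverse: rolling back is a matter of reapplying the previous mapping.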

Key principle: Data Drift

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • Review the training pipeline's hyperparameter tuning configuration to ensure it is not overfitting to stale data.

    Why it's wrong here

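    Option A targets the wrong problem. The pipeline's offline AUC still exceeds 0.9, so the model is not failing on the data it is tuned against. The gap appears only in production, which points to training-serving skew or drift rather than hyperparameters.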

  • Add a canary deployment step where new model version receives a small percentage of traffic before full rollout.

    Why this is correct

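    Option B limits exposure. A canary shows how the new version behaves on live traffic, so performance issues can be caught, and the version rolled back, before the model is fully deployed.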

    Related concept

    Data Drift

  • Compare feature distributions between the training data and online serving data using Vertex AI Model Monitoring.

    Why this is correct

    Option C detects skew between training and serving feature distributions, a common cause of performance degradation when offline metrics still look healthy.

    Related concept

    Data Drift

  • Retrain the model using a longer training history to include older data that may still be relevant.

    Why it's wrong here

    Option D may dilute recent patterns with older data. The symptoms point to skew or drift, not to a lack of training data.

  • Implement model validation on the deployed endpoint by logging predictions and comparing against actuals for a sample of traffic using Vertex Explainable AI.

    Why this is correct

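    Option E measures the model's actual performance in production. Joining logged predictions with ground-truth outcomes yields a live AUC that offline validation cannot provide, which confirms the degradation and helps detect drift.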

    Related concept

    Data Drift
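
The validation step in the last option can be sketched as follows: join a sample of logged predictions with ground-truth labels once they arrive, then compute AUC on the joined pairs. The log layout, the request ID key, and both function names are hypothetical, and in practice the logs would more likely sit in a BigQuery table than in memory. The AUC here uses the pairwise-ranking definition, which is fine for a small sample but quadratic in its size.

```python
def join_with_actuals(prediction_log, actuals):
    """Pair each logged score with its label; skip requests with no label yet.

    prediction_log: list of (request_id, score) tuples from the endpoint.
    actuals: dict mapping request_id to the true label (0 or 1).
    """
    labels, scores = [], []
    for request_id, score in prediction_log:
        if request_id in actuals:
            labels.append(actuals[request_id])
            scores.append(score)
    return labels, scores


def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    log = [("r1", 0.9), ("r2", 0.8), ("r3", 0.3), ("r4", 0.6), ("r5", 0.2)]
    truth = {"r1": 1, "r2": 0, "r3": 0, "r4": 1}  # r5 has no label yet
    labels, scores = join_with_actuals(log, truth)
    online_auc = roc_auc(labels, scores)
    # Compare the online value with the pipeline's offline gate (AUC > 0.9).
    print(online_auc, online_auc > 0.9)
```

Tracking this online AUC next to the pipeline's reported AUC makes the gap in the scenario visible: the offline gate can keep passing while the online value falls.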

Common exam traps

Common exam trap: answer the scenario, not the keyword

A common mistake is assuming that retraining with more historical data (Option D) or tuning hyperparameters (Option A) will solve the performance drop. However, the real issue is often data drift or a mismatch between training and serving environments. In Google Cloud, the correct approach is to use Vertex AI Model Monitoring to compare feature distributions, add a canary deployment step to the Cloud Composer pipeline so that new models are tested against live traffic, and validate the deployed endpoint by logging predictions and comparing them against actual outcomes, with Vertex Explainable AI attributions showing which features drive the change.

Detailed technical explanation

How to think about this question

Vertex AI Model Monitoring (Option C) automatically detects training-serving skew by comparing feature distributions between training data and online serving data, and drift by comparing serving data over time. It uses distance measures rather than significance tests: Jensen-Shannon divergence for numerical features and L-infinity distance for categorical features. When a model's AUC drops in production despite a high training AUC, it often indicates covariate shift (a change in the feature distribution) or concept drift (a change in the relationship between features and target). Feature-distribution monitoring catches the former, while only a comparison against actuals exposes the latter. Canary deployments (Option B) combined with prediction logging and actuals comparison (Option E) form a robust feedback loop that catches such drift early, allowing rollback before full rollout.
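
The two distance measures named above can be sketched in a few lines. This is an illustrative implementation over distributions that have already been binned and normalized, not the service's internal code. The function names and the 0.3 alert threshold are assumptions chosen for the example.

```python
import math


def l_infinity(p, q):
    """Largest absolute difference in category frequency (categorical features).

    p and q map category -> probability; a missing category counts as 0.
    """
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two binned distributions.

    p and q are equal-length lists of bin probabilities (numerical features).
    The result is 0 for identical distributions and 1 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must use the same bins")
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a zero numerator contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    THRESHOLD = 0.3  # hypothetical alert threshold

    train_country = {"US": 0.5, "DE": 0.5}
    serve_country = {"US": 0.2, "DE": 0.3, "BR": 0.5}
    print("categorical skew:", l_infinity(train_country, serve_country) > THRESHOLD)

    train_amount = [0.25, 0.25, 0.25, 0.25]
    serve_amount = [0.25, 0.25, 0.25, 0.25]
    print("numerical skew:", js_divergence(train_amount, serve_amount) > THRESHOLD)
```

Both measures are bounded between 0 and 1 (with the base-2 logarithm used here), which is what makes a single per-feature threshold workable.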

Key Concepts to Remember

  • Data Drift
  • Canary Deployment
  • Model Validation

Exam Day Tips

  • Watch for words such as best, first, most likely and least administrative effort.
  • Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Data Drift

Real-world example

How this comes up in practice

A cloud solutions architect for a retail company is evaluating services for a new workload. The correct answer here reflects best practice for the specific scenario described, not a general cloud recommendation. Google Cloud exam questions on data drift reward reading the constraint carefully: the same technology can be right or wrong depending on the use case.

What to study next

Got this wrong? Here's your next step.

Review data drift, then practise related PDE questions on the same topic to reinforce the concept.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Last reviewed: Jul 4, 2026
