A company uses SageMaker to deploy a model for predicting customer churn. The model was trained on historical data and achieves 85% accuracy on the test set. After deployment, the model's predictions are significantly worse on new data due to changes in customer behavior. What is the MOST likely cause?
Changes in customer behavior cause concept drift, reducing model accuracy over time.
Why this answer
The model scored well on the held-out test set but degrades on new data, which is the classic symptom of concept drift. Concept drift occurs when the relationship between the input features and the target variable (customer churn) changes over time, here because customer behavior has shifted, so the decision boundary the model learned from historical data no longer matches reality. The model served from the SageMaker endpoint is static: it keeps applying the patterns it learned at training time while the underlying data distribution has evolved, so it no longer generalizes to the current environment.
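The temporal signature described above can be made concrete with a small monitoring sketch. This is a minimal illustration under stated assumptions, not a SageMaker API: the `RollingAccuracyMonitor` class, the `months_inactive` feature, the churn thresholds, the window size, and the tolerance are all hypothetical. It assumes that true churn labels eventually become available for production predictions.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent `window` labeled predictions.

    Hypothetical helper for illustration; not part of any SageMaker SDK.
    """

    def __init__(self, baseline_accuracy, window=100, tolerance=0.10):
        self.baseline = baseline_accuracy    # e.g. the 85% test-set accuracy
        self.tolerance = tolerance           # allowed drop before flagging
        self.outcomes = deque(maxlen=window) # True where prediction was right

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def drift_suspected(self):
        # Only judge once the window is full, to avoid noisy early alarms.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


def model_predicts_churn(months_inactive):
    # Frozen rule learned from historical data (assumed for illustration).
    return months_inactive >= 3


def simulate(monitor, true_churn_threshold, n=60):
    """Feed n labeled examples whose true rule is `months_inactive >= threshold`."""
    for i in range(n):
        months_inactive = i % 6
        actual = months_inactive >= true_churn_threshold
        monitor.record(model_predicts_churn(months_inactive), actual)


if __name__ == "__main__":
    monitor = RollingAccuracyMonitor(baseline_accuracy=0.85, window=60)
    simulate(monitor, true_churn_threshold=3)  # behavior matches training data
    print(monitor.accuracy(), monitor.drift_suspected())
    simulate(monitor, true_churn_threshold=1)  # customers now churn sooner
    print(monitor.accuracy(), monitor.drift_suspected())
```

The model itself never changes in this sketch; only the true rule that generates the labels does. That is the sense in which drift is a property of the environment rather than a defect introduced during training.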
Exam trap
Candidates often confuse concept drift with overfitting and assume that any performance drop after deployment means the model memorized noise. The key differentiator is timing: here the degradation appears after deployment and is tied to changing customer behavior, whereas overfitting is a static training-data issue that would already be visible on the test set.
How to eliminate wrong answers
Option A is wrong because data leakage would artificially inflate test set accuracy, so the model would fail immediately on new data rather than after a period of deployment. The scenario attributes the drop to changing customer behavior, not to a flaw in how the model was trained.
Option B is wrong because a small training dataset typically causes high bias or high variance, leading to poor accuracy on both the test set and new data, whereas here the model achieved 85% accuracy on the test set.
Option D is wrong because overfitting would show up as poor performance on the test set (not 85% accuracy) and would not explain a delayed degradation tied to changing customer behavior. Overfitting is a static issue, not a temporal one.
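The elimination reasoning can be sketched as a rough decision rule over accuracy measured at different stages. This is only a heuristic: the function name `likely_cause`, the 0.10 gap, and every accuracy value other than the 85% test score are assumptions, since the question does not supply them.

```python
def likely_cause(train_acc, test_acc, early_prod_acc, late_prod_acc, gap=0.10):
    """Map an accuracy pattern to the most likely explanation (heuristic only)."""
    if train_acc - test_acc > gap:
        # The gap is already visible before deployment.
        return "overfitting"
    if test_acc - early_prod_acc > gap:
        # The test score was inflated, so the model fails as soon as it is deployed.
        return "data leakage"
    if early_prod_acc - late_prod_acc > gap:
        # The model was fine at first and degraded later.
        return "concept drift"
    return "no clear degradation"


if __name__ == "__main__":
    # Hypothetical numbers shaped like the scenario: good test score,
    # good early production accuracy, then a decline.
    print(likely_cause(0.87, 0.85, 0.84, 0.65))
```

The checks run in chronological order, so a problem that existed before deployment is reported ahead of one that only appears later.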