A dataset for binary classification has a severe class imbalance (5% positive class). Which two data preparation techniques can help address this imbalance? (Choose two.)
Reduces majority class size to balance with minority class.
Why this answer
Option D is correct because undersampling removes instances of the majority class until the two classes are closer in size, which keeps the model from being biased toward the majority class. The technique is simple and can work well when the majority class contains redundant or noisy samples, but it discards data and therefore risks losing valuable information.
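As an illustration, random undersampling can be sketched in a few lines of plain Python. This is a minimal sketch, not a reference implementation: the function name `undersample_majority`, the toy dataset of 1,000 rows, and the choice to balance the classes exactly 1:1 are all assumptions made for the example.

```python
import random
from collections import Counter


def undersample_majority(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal counts.

    Hypothetical helper for illustration; assumes binary labels.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority_label = min(counts, key=counts.get)
    n_minority = counts[minority_label]

    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]

    # Keep every minority row and an equally sized random subset of majority rows.
    kept_majority = rng.sample(majority_idx, n_minority)
    keep = sorted(minority_idx + kept_majority)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy dataset (assumed): 1,000 rows with a 5% positive class.
labels = [1] * 50 + [0] * 950
rows = list(range(1000))

balanced_rows, balanced_labels = undersample_majority(rows, labels)
print(Counter(labels))           # class counts before undersampling
print(Counter(balanced_labels))  # class counts after undersampling
```

Note the cost: 900 of the 950 majority rows are thrown away here, which is the information loss mentioned above.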
Exam trap
AWS often tests the distinction between techniques that change the class distribution of the dataset (such as undersampling and oversampling) and techniques that only preserve the existing distribution across training and evaluation splits (such as stratified splitting). Candidates often mistakenly select stratified splitting as a balancing technique.
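To see why stratified splitting is not a balancing technique, the sketch below splits a toy dataset per class and checks the positive rate in each split. The helper `stratified_split`, the 80/20 split, and the dataset itself are assumptions chosen for the example.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so each class keeps the same share in train and test.

    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# Toy dataset (assumed): 1,000 rows with a 5% positive class.
labels = [1] * 50 + [0] * 950

train_idx, test_idx = stratified_split(labels)
train_pos_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_pos_rate = sum(labels[i] for i in test_idx) / len(test_idx)

# Both rates match the original imbalance rather than correcting it.
print(train_pos_rate, test_pos_rate)
```

The split keeps the 5% positive rate in both partitions, so the imbalance is carried over unchanged into training.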