A data scientist is training a binary classification model using Amazon SageMaker. The dataset is highly imbalanced (99% negative class, 1% positive class). The model currently achieves 99% accuracy but fails to detect most positive cases. Which metric should the data scientist primarily use to evaluate model performance?
Trap 1: ROC AUC
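ROC AUC can look overly optimistic on imbalanced data: the false positive rate is divided by the large number of negatives, so even many false positives barely move the curve.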
Trap 2: Recall
Recall alone ignores false positives: a model that predicts positive for every case reaches 100% recall.
Trap 3: Accuracy
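Accuracy is misleading for imbalanced datasets: a model that predicts negative for every case reaches 99% accuracy on this dataset while detecting no positives.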
- A
ROC AUC
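Why wrong: ROC AUC can look overly optimistic on imbalanced data. The false positive rate is divided by the large number of negatives, so even many false positives barely move the curve.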
- B
F1 score
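The F1 score is the harmonic mean of precision and recall, so it is high only when the model both detects positive cases and limits false positives, which makes it suitable for imbalanced data.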
- C
Recall
Why wrong: Recall alone ignores false positives. A model that predicts positive for every case reaches 100% recall.
- D
Accuracy
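Why wrong: Accuracy is misleading for imbalanced datasets. A model that predicts negative for every case reaches 99% accuracy on this dataset while detecting no positives.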
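
The difference between these metrics can be made concrete with a small sketch. The labels and predictions below are hypothetical, chosen only to mirror the 99%/1% split in the question; they are not output from any SageMaker job, and the helper `classification_metrics` is an illustrative name, not a library function.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical dataset: 990 negatives followed by 10 positives (99% / 1%).
y_true = [0] * 990 + [1] * 10

# Hypothetical model A: 2 false alarms, and only 2 of the 10 positives detected.
model_a = [0] * 988 + [1] * 2 + [1] * 2 + [0] * 8

# Hypothetical model B: predicts positive for every case.
model_b = [1] * 1000

metrics_a = classification_metrics(y_true, model_a)
metrics_b = classification_metrics(y_true, model_b)

for name, metrics in (("model A", metrics_a), ("model B", metrics_b)):
    summary = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
    print(f"{name}: {summary}")
```

Under these assumed predictions, model A keeps 99% accuracy while its F1 is roughly 0.29, and model B reaches perfect recall while its F1 is roughly 0.02. In both cases F1 exposes the weakness that the single-sided metric hides.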