A data scientist is training a binary classification model using imbalanced data where the positive class is only 1% of the dataset. The scientist wants to maximize the recall for the positive class while maintaining reasonable precision. Which evaluation metric is most appropriate to tune during model selection?
Trap 1: Log loss
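Log loss scores the quality of predicted probabilities averaged over all samples, so with 1% positives it is driven mostly by the negative class and says little about recall or precision on the minority class.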
Trap 2: Area under the ROC curve (AUC)
ROC AUC measures how well the model ranks positives above negatives across all thresholds. It does not reflect precision, and it says nothing about recall at the threshold actually used.
Trap 3: Accuracy
With 1% positives, a model that predicts all negatives reaches 99% accuracy while its recall on the positive class is zero.
- A
Log loss
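Why wrong: Log loss scores the quality of predicted probabilities averaged over all samples, so with 1% positives it is driven mostly by the negative class and says little about recall or precision on the minority class.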
- B
Area under the ROC curve (AUC)
Why wrong: ROC AUC measures how well the model ranks positives above negatives across all thresholds. It does not reflect precision, and it says nothing about recall at the threshold actually used.
- C
F1 score
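Why correct: F1 is the harmonic mean of precision and recall, so it is high only when both are high. It ignores true negatives, which makes it suitable for imbalanced classes where performance on the positive class is what matters.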
- D
Accuracy
Why wrong: With 1% positives, a model that predicts all negatives reaches 99% accuracy while its recall on the positive class is zero.
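
The contrast between accuracy and F1 can be illustrated with a small sketch in plain Python. The dataset and both sets of predictions below are hypothetical, invented only for illustration: 1,000 samples with 10 positives (1%), an all-negative baseline, and a model that finds 8 of the 10 positives at the cost of 12 false alarms. The helper names (`confusion_counts`, `metrics`) are likewise assumptions, not part of any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a ratio is undefined (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 10 positives followed by 990 negatives (1% positive class).
y_true = [1] * 10 + [0] * 990

# Baseline that always predicts the negative class.
all_negative = [0] * 1000

# Hypothetical model: 8 true positives, 2 misses, 12 false alarms, 978 true negatives.
model_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

baseline = metrics(y_true, all_negative)
model = metrics(y_true, model_pred)

for name, result in (("all-negative baseline", baseline), ("hypothetical model", model)):
    print(name, {k: round(v, 3) for k, v in result.items()})
```

In this made-up case the baseline has the higher accuracy, even though it never finds a positive, while F1 separates the two clearly. Selecting on accuracy would therefore favor the useless baseline, and selecting on F1 would favor the model that captures the minority class.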