A data scientist is building a model to predict whether a loan application will default. The dataset has 10,000 labeled examples, of which 1,000 are defaults (a 10% positive rate). Which metric is MOST appropriate for evaluating this imbalanced binary classification problem?
Trap 1: Precision
Precision only considers the correctness of positive predictions, ignoring false negatives.
Trap 2: Recall
Recall only measures how many actual positives are captured, ignoring false positives.
Trap 3: Accuracy
A model that predicts 'no default' for every application scores 90% accuracy on this dataset (9,000 of 10,000 correct) while identifying none of the 1,000 defaults.
- A
Precision
Why wrong: Precision only considers the correctness of positive predictions, ignoring false negatives.
- B
AUC-ROC
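AUC-ROC measures how well the model ranks defaults above non-defaults across all classification thresholds. Because the true positive rate and the false positive rate are each computed within a single class, the score does not depend on the class ratio.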
- C
Recall
Why wrong: Recall only measures how many actual positives are captured, ignoring false positives.
- D
Accuracy
Why wrong: A model that predicts 'no default' for every application scores 90% accuracy on this dataset (9,000 of 10,000 correct) while identifying none of the 1,000 defaults.
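
The option rationales can be checked numerically. The sketch below is a minimal illustration in plain Python, not part of the original question: it builds labels that match the stated class counts (1,000 defaults, 9,000 non-defaults) and scores a hypothetical baseline that always predicts 'no default'. The helper names (`confusion_counts`, `roc_auc`) and the constant-score baseline are assumptions made for the example. AUC is computed with the rank-based (Mann-Whitney) formula, with ties given their average rank.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Labels matching the question: 1,000 defaults (1) and 9,000 non-defaults (0).
y_true = [1] * 1000 + [0] * 9000

# Hypothetical baseline: always predict 'no default' with a constant score.
y_pred = [0] * len(y_true)
scores = [0.0] * len(y_true)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
# Precision is undefined when nothing is predicted positive; report 0.0 here.
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
auc = roc_auc(y_true, scores)

print(f"accuracy={accuracy}, precision={precision}, recall={recall}, auc={auc}")
# accuracy=0.9, precision=0.0, recall=0.0, auc=0.5
```

The baseline looks strong on accuracy (0.9) yet has an AUC of 0.5, the same as random ranking, which is why accuracy alone is misleading for this dataset.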