A data scientist is building a classification model to detect fraudulent transactions. The dataset is highly imbalanced, with only 1% fraudulent cases. Which metric should the scientist use to evaluate model performance most effectively?
Trap 1: Accuracy
Accuracy is not suitable for imbalanced datasets: with only 1% fraud, a model that labels every transaction as legitimate scores 99% accuracy while detecting no fraud at all.
Trap 2: Recall
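Recall is the share of actual fraud cases the model catches, so it penalizes false negatives but ignores false positives. A model that flags every transaction as fraud reaches perfect recall, so recall alone is insufficient.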
Trap 3: Precision
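Precision is the share of flagged transactions that are actually fraudulent, so it penalizes false positives but ignores false negatives. A model can score high precision while missing most fraud, so precision alone does not fully capture fraud-detection performance.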
- A
F1 score
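F1 score is the harmonic mean of precision and recall, so it is high only when both are high. It does not count true negatives, which keeps the large legitimate class from inflating the score on an imbalanced dataset.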
- B
Accuracy
Why wrong: With only 1% fraud, a model that labels every transaction as legitimate scores 99% accuracy while detecting no fraud at all.
- C
Recall
Why wrong: Recall penalizes false negatives but ignores false positives, so a model that flags every transaction as fraud reaches perfect recall. It is insufficient alone.
- D
Precision
Why wrong: Precision penalizes false positives but ignores false negatives, so a model can score high precision while missing most fraud. It does not fully capture model performance on fraud detection.
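
The reasoning behind the four options can be checked with a small sketch. The data below is hypothetical (1,000 transactions, 10 of them fraudulent, to match the 1% rate in the question), and the four "models" are made-up prediction patterns chosen to expose each trap. The helper names `confusion_counts` and `metrics` are illustrative and not taken from any library.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN, treating label 1 (fraud) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1; undefined ratios become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 10 fraudulent transactions followed by 990 legitimate ones.
y_true = [1] * 10 + [0] * 990

# Made-up prediction patterns, one per trap plus a reasonable model.
models = {
    "always_legit": [0] * 1000,                             # accuracy trap
    "flag_everything": [1] * 1000,                          # recall trap
    "one_sure_flag": [1] + [0] * 999,                       # precision trap
    "balanced_model": [1] * 6 + [0] * 4 + [1] * 4 + [0] * 986,  # 6 caught, 4 false alarms
}

results = {name: metrics(y_true, preds) for name, preds in models.items()}

for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

Under these assumptions, `always_legit` reaches 0.99 accuracy with an F1 of 0, `flag_everything` reaches a recall of 1.0 with an F1 of about 0.02, and `one_sure_flag` reaches a precision of 1.0 with an F1 of about 0.18. Only `balanced_model`, with precision and recall both at 0.6, gets a comparatively high F1 of 0.6.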