A data scientist is training a binary classification model to detect fraudulent transactions. The dataset has 99% legitimate transactions and 1% fraudulent. The model achieves 99% accuracy but fails to catch most fraud. Which metric should the team prioritize to evaluate model performance?
Recall measures the ability to catch fraudulent transactions, which is the primary goal.
Why this answer
Recall (sensitivity) is the proportion of actual positive cases (fraud) that the model correctly identifies: TP / (TP + FN). A model that reaches 99% accuracy yet misses most fraud is biased toward the majority class (legitimate transactions), so recall is the metric that shows whether fraud detection is actually improving.
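As a minimal sketch of that definition, recall can be computed directly from confusion-matrix counts. The function name `recall` and the counts used here (10,000 transactions, 100 fraudulent, 20 caught) are illustrative assumptions, not figures from the question.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual positives (fraud) that the model flagged."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        # Recall is undefined with no positives; this sketch reports 0.0.
        return 0.0
    return true_positives / actual_positives


# Hypothetical counts: 100 fraud cases among 10,000 transactions, 20 caught.
tp, fn = 20, 80        # fraud caught / fraud missed
tn, fp = 9_880, 20     # legitimate passed / legitimate wrongly flagged

accuracy = (tp + tn) / (tp + tn + fp + fn)
fraud_recall = recall(tp, fn)

print(f"accuracy={accuracy:.2%}, recall={fraud_recall:.2%}")
```

With these assumed counts the model is right on 99% of transactions while catching only one in five fraud cases, which is the situation the question describes.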
Exam trap
CompTIA often tests the misconception that high accuracy implies good model performance, especially in imbalanced datasets, leading candidates to overlook recall as the appropriate metric for minority class detection.
How to eliminate wrong answers
Option A is wrong because the F1 score is the harmonic mean of precision and recall; it is useful on imbalanced data, but it blends both error types into a single number and so does not isolate the model's ability to catch fraud, which is what the scenario asks about.
Option B is wrong because precision measures how many predicted frauds are actually fraud, so it penalizes false positives; the problem here is missed fraud (false negatives), which precision does not capture.
Option C is wrong because accuracy is misleading in imbalanced datasets; 99% accuracy can be achieved by simply predicting 'legitimate' for every transaction, which is how a model can score 99% while detecting little or no fraud.
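The eliminations above can be checked with a small sketch that scores two hypothetical models on a 1%-fraud dataset. The helper `metrics`, the label encoding (1 = fraud, 0 = legitimate), and both prediction lists are assumptions made for illustration only.

```python
def metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Undefined ratios (zero denominators) are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical dataset: 100 fraudulent and 9,900 legitimate transactions.
y_true = [1] * 100 + [0] * 9_900

# Model 1: always predicts "legitimate".
always_legit = metrics(y_true, [0] * 10_000)

# Model 2: flags only 10 transactions, all of them truly fraudulent.
cautious = metrics(y_true, [1] * 10 + [0] * 9_990)

print(always_legit)
print(cautious)
```

Under these assumptions the always-legitimate model reaches 99% accuracy with zero recall, and the cautious model has perfect precision while still missing 90% of fraud. Recall is the only one of the four numbers that reports the missed fraud directly.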