Quick Answer
The answer is recall. Recall, also known as sensitivity or the true positive rate, is the metric to use here because it measures how many of the actual positive cases (here, fraudulent transactions) the model correctly identifies. In this scenario, a model that predicts “legitimate” for every transaction achieves 99% accuracy but zero true positives, giving a recall of 0%, which immediately exposes its complete failure to detect fraud. On the Microsoft Azure AI Fundamentals AI-900 exam, this tests your understanding that accuracy is misleading on imbalanced datasets: the high accuracy figure is the trap, and recall is the metric that reveals the poor performance. A common memory tip is “Recall catches the rare cases”: think of it as the metric that recalls the positive class from hiding, while accuracy can be fooled by a majority class.
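As a worked check, assume a hypothetical test set of 1,000 transactions (990 legitimate, 10 fraudulent; the size is an assumption, only the 99/1 split comes from the question). The all-legitimate model then has TP = 0, FN = 10, FP = 0 and TN = 990:

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} = \frac{0 + 990}{0 + 990 + 0 + 10} = 0.99 \\
\text{Recall} &= \frac{TP}{TP + FN} = \frac{0}{0 + 10} = 0
\end{aligned}
```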
AI-900 Practice Question: Describe fundamental principles of machine learning on Azure
This AI-900 practice question tests your understanding of the objective "Describe fundamental principles of machine learning on Azure". The scenario asks which metric exposes one specific failure, so eliminate options that measure something else before choosing. A key principle to apply: recall measures the proportion of actual positive cases correctly identified. Once you have made your selection, read the full explanation to reinforce the concept and understand why each distractor is designed to mislead on exam day.
A data scientist trains a binary classification model to detect fraudulent transactions. The dataset contains 99% legitimate transactions (negative class) and 1% fraudulent transactions (positive class). The model predicts 'legitimate' for every transaction in the test set and achieves 99% accuracy. Which metric would best reveal that the model is failing to identify any fraudulent transactions?
Clue words in this question
Noticing these words before you look at the options changes how you read each choice.
Clue: "best"
Why it matters: Signals that multiple options may be partially correct. Choose the option that most directly solves the exact problem described, not the one that sounds most complete.
Answer choices
- A. Accuracy
- B. Precision
- C. Recall
- D. F1-score
Correct answer & explanation
Recall
Recall (also known as sensitivity or true positive rate) measures the proportion of actual positive cases (fraudulent transactions) that the model correctly identifies. With 99% accuracy but zero true positives, recall is 0%, which immediately reveals the model's complete failure to detect fraud. In Azure Machine Learning, evaluation output that breaks results down by class, such as the confusion matrix, would show that no fraudulent transaction was predicted as fraud, so recall for the positive class is 0.0 despite the high accuracy.
Key principle: Recall measures the proportion of actual positive cases correctly identified.
Answer analysis
Option-by-option breakdown
For each option: why it is or isn't the right answer here.
- ✗ Accuracy
Why it's wrong here
Accuracy is high (99%) only because the model predicts the majority class every time, which masks its inability to detect fraud; it is not a reliable metric for imbalanced classes.
- ✗ Precision
Why it's wrong here
Precision for the fraud class, TP / (TP + FP), is undefined (0 / 0) because the model never predicts fraud, so it does not reveal the failure clearly. Some tools report it as 0 with a warning, which still says nothing about the frauds that were missed.
- ✓ Recall
Why this is correct
Recall for the fraud class is 0 because no fraudulent transactions are identified; this directly shows the model's failure to catch any positive cases.
Clue confirmation
The clue word "best" in the question points toward this answer.
- ✗ F1-score
Why it's wrong here
F1-score would be 0, which does indicate failure, but it blends precision and recall; recall is the more fundamental metric and the one that directly shows the model is missing all positive cases.
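The four options can be compared side by side with a short sketch in plain Python. It assumes a hypothetical test set of 1,000 transactions (990 legitimate labelled 0, 10 fraudulent labelled 1), and the helper names are illustrative, not part of any Azure or scikit-learn API. An undefined metric is returned as None instead of raising an error:

```python
from typing import Dict, Optional, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num: int, den: int) -> Optional[float]:
    """Return num / den, or None when the denominator is 0 (metric undefined)."""
    return None if den == 0 else num / den


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, Optional[float]]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        # Undefined (0 / 0) when the model never predicts the positive class
        "precision": safe_div(tp, tp + fp),
        # TP / (TP + FN): share of actual positives that were caught
        "recall": safe_div(tp, tp + fn),
        # 2TP / (2TP + FP + FN) equals 2PR / (P + R) whenever P + R > 0,
        # and stays defined here even though precision is not
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical test set: 990 legitimate (0) and 10 fraudulent (1) transactions
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # the model predicts "legitimate" for every transaction

print(binary_metrics(y_true, y_pred))
```

This reports accuracy 0.99, precision None, recall 0.0 and F1 0.0. For comparison, scikit-learn's `precision_score` reports an undefined precision as 0.0 by default and emits an `UndefinedMetricWarning` (controlled by its `zero_division` parameter).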
Common exam traps
Common exam trap: answer the scenario, not the keyword
A common trap, especially with imbalanced data, is the assumption that high accuracy implies a good model. It leads candidates to overlook that recall (sensitivity) is the critical metric for detecting failures on the minority class.
Detailed technical explanation
How to think about this question
Recall is calculated as TP / (TP + FN). In this scenario, TP = 0 and FN = all actual frauds (1% of the dataset), so recall = 0. Azure automated ML reports averaged recall metrics (macro, micro, and weighted), and averages can hide class-level failures: weighted recall is dominated by the majority class and here equals the 99% accuracy, while macro recall is (1.0 + 0.0) / 2 = 0.5, which looks mediocre rather than catastrophic. Per-class recall for the positive class (0.0) is the definitive indicator. In real-world fraud detection, a recall below a business threshold (for example, 90%) may trigger model retraining or threshold tuning, because missed fraud carries a high financial cost.
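The masking effect of averaged recall can be sketched in plain Python. This is a minimal illustration, not the Azure automated ML implementation; the helper names are hypothetical and the data assumes the same hypothetical 1,000-transaction test set with 10 frauds. For single-label classification, weighted recall always works out to the same value as accuracy:

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Recall for each class c: correct predictions of c / actual occurrences of c."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}


def macro_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def weighted_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class recalls weighted by each class's share of the data."""
    recalls = per_class_recall(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    return sum(recalls[c] * support[c] / n for c in recalls)


y_true = [0] * 990 + [1] * 10  # hypothetical: 990 legitimate, 10 fraud
y_pred = [0] * 1000            # always predicts "legitimate"

print(per_class_recall(y_true, y_pred))  # {0: 1.0, 1: 0.0}
print(macro_recall(y_true, y_pred))      # 0.5
print(weighted_recall(y_true, y_pred))   # 0.99
```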
Key Concepts to Remember
- Recall measures the proportion of actual positive cases correctly identified.
- High recall indicates a model finds most of the actual positive instances.
- Recall is crucial when the cost of false negatives is high (e.g., missing fraud).
- Recall is calculated as True Positives / (True Positives + False Negatives).
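Recall matters most when false negatives are costly, and threshold tuning (mentioned in the technical explanation) is one common lever. The sketch below assumes a model that outputs a fraud score between 0 and 1; the labels and scores are made-up illustrative values and the helper names are hypothetical. Lowering the decision threshold raises recall at the cost of more false positives:

```python
from typing import Sequence


def recall_at_threshold(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Predict fraud when score >= threshold, then return TP / (TP + FN).

    Assumes y_true contains at least one actual fraud (label 1).
    """
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    return tp / (tp + fn)


def false_positives_at_threshold(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> int:
    """Count legitimate transactions (label 0) that would be flagged as fraud."""
    return sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)


# Made-up labels (1 = fraud) and model scores, for illustration only
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.35, 0.60, 0.55, 0.80]

for threshold in (0.7, 0.5, 0.3):
    r = recall_at_threshold(y_true, scores, threshold)
    fp = false_positives_at_threshold(y_true, scores, threshold)
    print(f"threshold={threshold}: recall={r:.2f}, false positives={fp}")
```

In this toy data, recall rises from 1/3 at a 0.7 threshold to 1.0 at 0.3, while false positives grow from 0 to 3; where to stop is a business decision about the cost of each error type.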
Exam Day Tips
- Watch for words such as best, first, most likely and least administrative effort.
- Review why wrong options are wrong, not only why the correct option is correct.
Real-world example
How this comes up in practice
The same pattern appears whenever the event you care about is rare: fraud, rare-disease screening, loan defaults, or spam, as in the variations below. A model can report high accuracy while its recall for the rare class is close to zero, so the metric to check depends on which mistakes are costly. Exam questions reward reading the constraint carefully: the same metric can be right or wrong depending on the use case.
What to study next
Got this wrong? Here's your next step.
Review how recall measures the proportion of actual positive cases correctly identified, then practise related AI-900 questions on the same topic to reinforce the concept.
FAQ
Questions learners often ask
What does this AI-900 question test?
It tests the AI-900 objective "Describe fundamental principles of machine learning on Azure", specifically the principle that recall measures the proportion of actual positive cases correctly identified.
What is the correct answer to this question?
The correct answer is Recall. Recall (also known as sensitivity or true positive rate) measures the proportion of actual positive cases (fraudulent transactions) that the model correctly identifies. A model that predicts "legitimate" for every transaction has zero true positives, so its recall is 0% even though its accuracy is 99%.
What should I do if I get this AI-900 question wrong?
Review how recall measures the proportion of actual positive cases correctly identified, then practise related AI-900 questions on the same topic to reinforce the concept.
Are there clue words in this question I should notice?
Yes, watch for "best". It signals that multiple options may be partially correct. Choose the option that most directly solves the exact problem described, not the one that sounds most complete.
What is the key concept behind this question?
Recall measures the proportion of actual positive cases correctly identified.
About these practice questions
Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.
Same concept, more angles
7 more ways this is tested on AI-900
These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.
Variation 1. A data scientist trains a binary classification model to detect fraudulent transactions. The dataset contains only 1% fraudulent cases. The model predicts 'not fraudulent' for all transactions and achieves 99% accuracy. Which metric would best reveal the model's poor performance on fraud detection?
Difficulty: medium
- A. Precision
- ✓ B. Recall
- C. F1 score
- D. Accuracy
Why B: Recall (sensitivity) measures the proportion of actual positive cases (fraudulent transactions) correctly identified by the model. With 1% fraud, a model that predicts 'not fraudulent' for all transactions will have a recall of 0% because it fails to catch any true positives, despite 99% accuracy. This makes recall the best metric to reveal the model's inability to detect fraud.
Variation 2. A data scientist trains a binary classification model to detect fraudulent transactions. The dataset contains only 2% fraudulent transactions. The model achieves 98% overall accuracy, but it fails to detect any fraudulent transactions, classifying all transactions as legitimate. Which metric would most clearly reveal this failure?
Difficulty: hard
- A. Precision
- ✓ B. Recall
- C. F1 score
- D. Specificity
Why B: Recall (also known as sensitivity or true positive rate) measures the proportion of actual positive cases (fraudulent transactions) that were correctly identified by the model. In this scenario, the model classifies all transactions as legitimate, so it detects zero fraudulent transactions, yielding a recall of 0%. Despite 98% overall accuracy, the recall metric clearly exposes the model's complete failure to identify any fraud.
Variation 3. A data scientist trains a binary classification model to detect a rare disease. The dataset contains 99% negative cases and only 1% positive cases. The model predicts all cases as negative, achieving an accuracy of 99% on the test set. However, the business requires the model to identify as many positive cases as possible. Which metric should the data scientist examine to best reveal that the model is failing to identify any positive cases?
Difficulty: medium
- A. Precision
- ✓ B. Recall
- C. F1 score
- D. AUC-ROC
Why B: Recall (sensitivity) measures the proportion of actual positive cases correctly identified by the model. With all predictions as negative, recall is 0%, directly revealing the model's failure to detect any positive cases despite the high accuracy.
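AUC-ROC, a distractor in this variation, is computed from scores rather than hard labels. A minimal sketch, assuming a model that gives every case the same score (a hypothetical constant of 0.1), shows its AUC is 0.5, no better than chance. That hints at trouble but, unlike a recall of 0, does not say that no positive case was found. The `roc_auc` helper is an illustrative pairwise version, not a library function:

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P * N) pairwise version is for illustration only.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [0] * 990 + [1] * 10   # hypothetical: 1% positive cases
constant_scores = [0.1] * 1000  # assumed: the model gives every case the same score

print(roc_auc(y_true, constant_scores))  # 0.5
```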
Variation 4. A data scientist trains a binary classification model to detect spam emails. The dataset contains 95% legitimate emails (negative class) and 5% spam (positive class). The model predicts all emails as legitimate. The accuracy is 95%, but the model is useless. Which metric would best indicate the model's failure?
Difficulty: hard
- A. Precision
- ✓ B. Recall
- C. F1 score
- D. Specificity
Why B: Recall (sensitivity) measures the proportion of actual positive cases correctly identified. With 5% spam and the model predicting all as legitimate, recall is 0% because no spam emails are detected. This directly exposes the model's failure to identify the positive class despite high accuracy.
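Specificity, a distractor in Variations 2 and 4, is the mirror image of recall: TN / (TN + FP), the share of actual negatives correctly identified. A minimal sketch, assuming a hypothetical set of 100 emails of which 5 are spam, shows why it cannot expose this failure: an all-negative model scores a perfect specificity while its recall is 0. The helper name is illustrative:

```python
from typing import Sequence, Tuple


def specificity_and_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (specificity, recall) with 1 as the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tn / (tn + fp), tp / (tp + fn)


y_true = [0] * 95 + [1] * 5  # hypothetical: 100 emails, 5 of them spam
y_pred = [0] * 100           # every email predicted legitimate

print(specificity_and_recall(y_true, y_pred))  # (1.0, 0.0)
```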
Variation 5. A data scientist trains a binary classification model to predict loan defaults. The dataset contains 98% non-default cases and only 2% default cases. The model predicts 'non-default' for every instance, achieving 98% accuracy on the test set. Which metric would best reveal that the model fails to identify any actual defaults?
Difficulty: hard
- ✓ A. Recall for the default class
- B. Precision for the default class
- C. F1 score for the default class
- D. Accuracy
Why A: Recall for the default class measures the proportion of actual default cases that the model correctly identifies. With the model predicting 'non-default' for every instance, recall for the default class is 0%, because it fails to capture any true positives. This directly reveals the model's inability to detect any actual defaults, despite the high overall accuracy.
Variation 6. A data scientist trains a binary classification model to predict whether a loan applicant will default (positive class) or not (negative class). The training data contains 5% default cases. The model predicts 'no default' for every applicant in the test set and achieves 95% accuracy. Which evaluation metric best reveals that the model is failing to identify any default cases?
Difficulty: medium
- A. Precision for the default class
- ✓ B. Recall for the default class
- C. F1-score for the default class
- D. Overall accuracy
Why B: Recall for the default class (positive class) measures the proportion of actual default cases that the model correctly identifies. With a model that predicts 'no default' for every applicant, recall for the default class is 0% because it fails to identify any true positive cases. This metric directly reveals the model's inability to detect defaults, despite the high overall accuracy of 95%.
Variation 7. A data scientist is building a classification model to detect fraudulent transactions. The dataset has 1,000,000 legitimate transactions and only 1,000 fraudulent ones. The model achieves 99.9% accuracy on the test set, but it fails to catch most fraudulent cases. Which metric should the data scientist prioritize to better evaluate the model's performance on this imbalanced dataset?
Difficulty: hard
- A. Accuracy
- B. Mean Squared Error
- ✓ C. Recall
- D. R-squared
Why C: Recall measures the proportion of actual positive cases (fraudulent transactions) correctly identified by the model. With only 1,000 fraud cases out of 1,001,000 total transactions, a model that predicts 'legitimate' for every transaction would achieve 99.9% accuracy but 0% recall, making recall the critical metric for imbalanced fraud detection.
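The arithmetic behind that explanation, for a model that labels every transaction legitimate (TP = 0, FN = 1,000, TN = 1,000,000, FP = 0):

```latex
\text{Accuracy} = \frac{1{,}000{,}000}{1{,}000{,}000 + 1{,}000} = \frac{1{,}000{,}000}{1{,}001{,}000} \approx 0.999,
\qquad
\text{Recall} = \frac{TP}{TP + FN} = \frac{0}{0 + 1{,}000} = 0
```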
Last reviewed: Jun 30, 2026
This AI-900 practice question is part of Courseiva's free Microsoft certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the AI-900 exam.