- A
Use random undersampling of the majority class to balance the dataset
Why wrong: Undersampling can improve recall but often at the cost of losing information and reducing precision.
- B
Use oversampling techniques like SMOTE to create synthetic samples of the minority class
Why correct: SMOTE generates synthetic minority samples, which helps the model learn a better decision boundary for the minority class and improves recall with less precision loss.
- C
Change the decision threshold to 0.3
Why wrong: Lowering the threshold increases recall but can significantly reduce precision, which may not be acceptable.
- D
Increase the regularization strength (C) in logistic regression
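Why wrong: Stronger regularization shrinks the coefficients and reduces model complexity. It does nothing to counter class imbalance and typically worsens recall. (In scikit-learn's LogisticRegression, C is the inverse of regularization strength, so stronger regularization means a smaller C.)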
Quick Answer
The answer is to use oversampling techniques like SMOTE to create synthetic samples of the minority class. SMOTE (Synthetic Minority Oversampling Technique) interpolates between existing minority instances to generate new data points, which balances the class distribution without simply duplicating records and directly addresses the 20% recall. The logistic regression model can then learn a better decision boundary for the default class, moving recall toward the 70% target. Compared with naive oversampling, SMOTE carries less overfitting risk, which helps maintain precision. On the AWS Certified Machine Learning Specialty (MLS-C01) exam, this scenario tests how to handle imbalanced classification without discarding data. A common trap is choosing undersampling, which discards majority samples and can hurt precision. Memory tip: SMOTE "smooths" the imbalance by creating synthetic neighbors, not clones.
MLS-C01 Modeling Practice Question
This MLS-C01 practice question tests your understanding of modeling. Read the scenario carefully and evaluate each option against the stated constraints before committing to an answer. Then compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.
A data scientist is working on a binary classification problem to predict loan default. The dataset has 200,000 samples and 50 features. The target variable is imbalanced: 5% default, 95% non-default. The scientist trains a logistic regression model and achieves 95% accuracy, but the recall for the default class is only 20%. The business requires that at least 70% of actual defaults be identified (recall >= 0.7). Which approach should the scientist take to improve recall without significantly sacrificing precision?
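It can help to turn the scenario's percentages into counts before reading the options. The sketch below derives the confusion matrix implied by the stated figures, assuming they are exact; the variable names are illustrative and the precision value is derived here, not stated in the question.

```python
# Confusion-matrix counts implied by the scenario, assuming the stated
# percentages are exact. All names are illustrative.
n_samples = 200_000
n_pos = n_samples * 5 // 100           # 5% defaults -> 10,000
n_neg = n_samples - n_pos              # 190,000 non-defaults

tp = n_pos * 20 // 100                 # recall 20% -> 2,000 defaults caught
fn = n_pos - tp                        # 8,000 defaults missed
n_correct = n_samples * 95 // 100      # accuracy 95% -> 190,000 correct
tn = n_correct - tp                    # 188,000 non-defaults correctly passed
fp = n_neg - tn                        # 2,000 false alarms

precision = tp / (tp + fp)             # 0.5
recall = tp / (tp + fn)                # 0.2
baseline_accuracy = n_neg / n_samples  # always predicting non-default: 0.95
tp_needed = n_pos * 70 // 100          # recall >= 0.7 -> at least 7,000 caught

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"baseline_accuracy={baseline_accuracy:.2f} tp_needed={tp_needed}")
```

Under these assumptions the model is no more accurate than always predicting non-default, which is why accuracy alone hides the recall problem.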
Clue words in this question
Noticing these words before you look at the options changes how you read each choice.
Clue: "at least"
Why it matters: In this question, "at least 70%" sets a minimum recall the chosen approach must reach. It is a hard constraint on recall, not a request for the option with the lowest overhead or fewest steps.
Correct answer & explanation
Use oversampling techniques like SMOTE to create synthetic samples of the minority class
SMOTE (Synthetic Minority Oversampling Technique) creates synthetic samples for the minority class by interpolating between existing minority instances, which increases the representation of the default class in the training data. This directly addresses the low recall (20%) by providing the logistic regression model with more balanced class distributions, enabling it to learn decision boundaries that capture more true positives without discarding majority class information. Unlike simple oversampling, SMOTE reduces overfitting risk by generating novel samples rather than duplicating existing ones, which helps maintain precision while improving recall.
Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Answer analysis
Option-by-option breakdown
For each option: why learners choose it and why it is or isn't the right answer here.
- ✗
Use random undersampling of the majority class to balance the dataset
Why it's wrong here
Undersampling can improve recall but often at the cost of losing information and reducing precision.
- ✓
Use oversampling techniques like SMOTE to create synthetic samples of the minority class
Why this is correct
SMOTE generates synthetic minority samples, helping the model learn better decision boundaries for the minority class, improving recall with less precision loss.
Clue confirmation
The clue word "least" appears in "at least 70% of actual defaults", a minimum-recall constraint that points toward this answer.
Related concept
Read the scenario before looking for a memorised answer.
- ✗
Change the decision threshold to 0.3
Why it's wrong here
Lowering the threshold increases recall but can significantly reduce precision, which may not be acceptable.
- ✗
Increase the regularization strength (C) in logistic regression
Why it's wrong here
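Stronger regularization shrinks the coefficients and reduces model complexity. It does nothing to counter class imbalance and typically worsens recall. (In scikit-learn's LogisticRegression, C is the inverse of regularization strength, so stronger regularization means a smaller C.)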
Common exam traps
Common exam trap: answer the scenario, not the keyword
The trap here is that candidates often choose threshold adjustment (Option C) as a quick fix for recall. Lowering the threshold changes only where the cut-off is applied to the model's scores. It does not change what the model has learned, and it typically buys recall at the expense of precision. SMOTE, by contrast, addresses the imbalance in the training data itself.
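The threshold trade-off can be seen on a small example. The sketch below uses hypothetical labels and scores (not taken from the scenario) and a hypothetical helper, `precision_recall`, to compare a 0.5 cut-off with the 0.3 cut-off from Option C.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall for the positive class at a given cut-off."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical data: 5 defaults (1) and 7 non-defaults (0) with model scores.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.45, 0.35, 0.1, 0.55, 0.4, 0.32, 0.2, 0.15, 0.1, 0.05]

at_050 = precision_recall(y_true, scores, 0.5)  # precision 2/3, recall 0.4
at_030 = precision_recall(y_true, scores, 0.3)  # precision 4/7, recall 0.8

print("threshold 0.5:", at_050)
print("threshold 0.3:", at_030)
```

On this toy data recall doubles while precision falls from about 0.67 to about 0.57. How large the precision loss is in practice depends on how well the model's scores separate the two classes.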
Detailed technical explanation
How to think about this question
SMOTE works by selecting a minority class sample, finding its k-nearest neighbors (typically k=5), and generating synthetic instances along the line segments connecting the sample to randomly chosen neighbors in feature space. This interpolation preserves the underlying data distribution better than random oversampling, which simply duplicates existing samples and can lead to overfitting. In practice, SMOTE is often combined with undersampling of the majority class (e.g., SMOTEENN or SMOTETomek) to clean noisy borderline samples, but the core mechanism of synthetic generation is what directly improves recall for imbalanced datasets.
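The interpolation step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation used by any particular library: the function name `smote_sketch`, the toy `minority` points, and the choice of Euclidean distance are assumptions made for the example.

```python
import math
import random

def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points, each on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical two-feature minority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.4, 1.2)]
new_points = smote_sketch(minority, n_synthetic=4)
print(new_points)
```

Because every synthetic point lies between two real minority samples, it stays inside the region the minority class already occupies. Synthetic samples should be generated from the training split only, so that evaluation data contains real observations.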
Key Concepts to Remember
- Read the scenario before looking for a memorised answer.
- Find the constraint that changes the correct option.
- Eliminate answers that are true in general but not in this case.
Exam Day Tips
- Watch for words such as best, first, most likely and least administrative effort.
- Review why wrong options are wrong, not only why the correct option is correct.
Key takeaway
Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Real-world example
How this comes up in practice
A data scientist at a lender is asked to catch more loan defaults with an existing logistic regression model. The correct answer here reflects best practice for the specific scenario described, not a general recommendation. Answer the scenario, not the keyword: identify the specific constraint (here, recall of at least 70% without a large loss of precision) before choosing the most familiar-sounding option. Exam questions reward reading the constraint carefully: the same technique can be right or wrong depending on the use case.
FAQ
Questions learners often ask
What does this MLS-C01 question test?
This question tests the Modeling domain. The key habit it rewards is reading the scenario before looking for a memorised answer.
What should I do if I get this MLS-C01 question wrong?
Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.
Are there clue words in this question I should notice?
Yes. Watch for "least": here it appears in "at least 70% of actual defaults", which sets a minimum recall the chosen approach must reach.
What is the key concept behind this question?
Read the scenario before looking for a memorised answer.
About these practice questions
Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.
Same concept, more angles
1 more way this is tested on MLS-C01
These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.
Variation 1. A data scientist is training a binary classifier using imbalanced data. Which TWO techniques can help improve model performance on the minority class? (Choose two.)
Difficulty: easy
- A. Undersample the majority class randomly.
- B. Use accuracy as the evaluation metric.
- ✓ C. Use the F1 score as the evaluation metric.
- ✓ D. Oversample the minority class using SMOTE.
- E. Apply L1 regularization to the model.
Why C: The F1 score is the harmonic mean of precision and recall, making it a robust evaluation metric for imbalanced datasets because it captures both false positives and false negatives. Unlike accuracy, which can be misleadingly high when the majority class dominates, the F1 score provides a balanced measure of model performance on the minority class.
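The contrast between accuracy and F1 is easy to check with numbers. The sketch below uses two hypothetical confusion matrices on a 1,000-sample set with 5% positives; the helper names `f1_score` and `accuracy` are defined here for illustration and are not a library API.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical classifier 1: always predicts the majority (negative) class.
always_negative = dict(tp=0, fp=0, fn=50, tn=950)
# Hypothetical classifier 2: finds 35 of the 50 positives with 15 false alarms.
useful_model = dict(tp=35, fp=15, fn=15, tn=935)

for name, cm in [("always_negative", always_negative), ("useful_model", useful_model)]:
    acc = accuracy(**cm)
    f1 = f1_score(cm["tp"], cm["fp"], cm["fn"])
    print(f"{name}: accuracy={acc:.2f} f1={f1:.2f}")
```

The majority-class predictor scores 0.95 accuracy but an F1 of 0, while the second classifier's F1 of about 0.70 reflects that it actually finds minority-class cases.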
Last reviewed: Jun 24, 2026