- A
Using a complex model like a deep neural network on a small dataset
Complex models on small data overfit.
- B
Model has too many parameters relative to the number of training samples
Too many parameters allow memorization.
- C
Having a large dataset with many samples
Why wrong: More data reduces overfitting.
- D
Training for too many epochs
Long training can lead to overfitting.
- E
Using regularization techniques
Why wrong: Regularization prevents overfitting.
Quick Answer
The correct answers are A, B, and D: using a complex model on a small dataset, having too many parameters relative to the number of training samples, and training for too many epochs. Each lets the model memorize noise and irrelevant patterns in the training data rather than learn generalizable features. A high-capacity model such as a deep neural network overfits easily on a small dataset because it can capture spurious correlations that do not reflect the true underlying distribution, and excessive epochs give it more opportunity to do so. On the AWS Certified Machine Learning Specialty MLS-C01 exam, this concept tests your understanding of the bias-variance tradeoff and regularization; a common trap is confusing overfitting with underfitting or assuming more epochs always improve accuracy. To remember the epoch-related cause, use the mnemonic "Epochs Overfit Noise" (EON): excessive epochs on limited data lead to memorization, not generalization.
MLS-C01 Modeling Practice Question
This MLS-C01 practice question tests your understanding of modeling. It asks you to identify causes of overfitting, so eliminate options that describe countermeasures before choosing. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.
Which THREE of the following are common causes of overfitting in machine learning models?
Answer choices
Why each option matters
Correct answer & explanation
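A. Using a complex model like a deep neural network on a small dataset
B. Model has too many parameters relative to the number of training samples
D. Training for too many epochs
Options A, B, and D are correct. A complex model like a deep neural network has high capacity and can easily memorize noise and patterns specific to a small dataset rather than learning generalizable features (A). The same holds whenever the parameter count is large relative to the number of training samples (B), and training for too many epochs gives the model time to fit that noise (D). With limited training samples, the model fails to capture the underlying data distribution, leading to poor performance on unseen data.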
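The memorization effect described above can be sketched with a toy experiment. This is a minimal illustration, not part of the original question: it uses a 1-nearest-neighbour classifier as a stand-in for a high-capacity model (an assumption made for simplicity, since it memorizes every training point), and the dataset, noise rate, and function names are all hypothetical.

```python
import random


def make_noisy_data(n, rng, noise=0.2):
    """Points x in [0, 1) labelled 1 if x > 0.5, each label flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # label noise that a memorizing model will also learn
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Return the label of the closest training point (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of points in `data` that the 1-NN model built on `train` labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


rng = random.Random(0)              # fixed seed so the run is repeatable
train = make_noisy_data(20, rng)    # small training set
test = make_noisy_data(1000, rng)   # larger held-out set from the same distribution

train_acc = accuracy(train, train)  # every training point is its own nearest neighbour
test_acc = accuracy(train, test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy is perfect by construction, because each training point is its own nearest neighbour. Held-out accuracy is lower because the flipped labels were memorized along with the real pattern, which is the train/test gap the explanation describes.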
Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Answer analysis
Option-by-option breakdown
For each option: why learners choose it and why it is or isn't the right answer here.
- ✓
Using a complex model like a deep neural network on a small dataset
Why this is correct
Complex models on small data overfit.
Related concept
Read the scenario before looking for a memorised answer.
- ✓
Model has too many parameters relative to the number of training samples
Why this is correct
Too many parameters allow memorization.
- ✗
Having a large dataset with many samples
Why it's wrong here
More data reduces overfitting.
- ✓
Training for too many epochs
Why this is correct
Long training can lead to overfitting.
- ✗
Using regularization techniques
Why it's wrong here
Regularization prevents overfitting.
Common exam traps
Common exam trap: answer the scenario, not the keyword
The MLS-C01 exam often tests the misconception that more data or regularization causes overfitting, when in fact both are standard countermeasures; the trap is mistaking a remedy for a cause because the option mentions a familiar overfitting-related term.
Detailed technical explanation
How to think about this question
Overfitting occurs when a model learns the training data too well, including its noise and outliers, resulting in high variance. This is often quantified by a large gap between training and validation loss. In practice, techniques like early stopping, cross-validation, and data augmentation are used alongside regularization to mitigate overfitting, especially in deep learning where the number of parameters can exceed the number of samples by orders of magnitude.
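The train/validation gap and early stopping mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the loss values are made up, and `generalization_gap`, `early_stopping_epoch`, `patience`, and `min_delta` are hypothetical names chosen for this example rather than any particular library's API.

```python
def generalization_gap(train_loss, val_loss):
    """Gap between validation and training loss; a large positive gap suggests overfitting."""
    return val_loss - train_loss


def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1  # never triggered: ran all epochs


# Made-up losses: training loss keeps falling while validation loss turns upward.
train_losses = [1.0, 0.7, 0.5, 0.35, 0.2, 0.1]
val_losses = [1.1, 0.8, 0.65, 0.7, 0.8, 0.95]

best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
gap_at_stop = generalization_gap(train_losses[stop_epoch], val_losses[stop_epoch])
print(best_epoch, stop_epoch)  # best_epoch == 2, stop_epoch == 4
```

In practice the weights saved at `best_epoch` would be restored, since later epochs only widened the gap between training and validation loss.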
Key Concepts to Remember
- Read the scenario before looking for a memorised answer.
- Find the constraint that changes the correct option.
- Eliminate answers that are true in general but not in this case.
Exam Day Tips
- Watch for words such as best, first, most likely and least administrative effort.
- Review why wrong options are wrong, not only why the correct option is correct.
Key takeaway
Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.
Real-world example
How this comes up in practice
A healthcare organisation trains a deep neural network on a few hundred labelled patient records. Training loss keeps falling with each epoch while validation loss starts to rise, and the deployed model performs poorly on new patients. Questions like this test whether you can recognise the causes of overfitting (high model capacity, too few samples, too many epochs) and tell them apart from countermeasures such as more data and regularization.
What to study next
Got this wrong? Here's your next step.
Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.
- Modeling study guide chapter: learn the concepts, then practise the questions
- Modeling practice questions: targeted practice on this topic area only
- All MLS-C01 questions: 1,755 questions across all exam domains
- AWS Certified Machine Learning Specialty MLS-C01 study guide: full concept coverage aligned to exam objectives
- MLS-C01 practice test guide: how to use practice tests most effectively before exam day
Related practice questions
Related MLS-C01 practice-question pages
Use these pages to review the topic behind this question. This is how one missed question becomes focused revision.
Data Engineering practice questions
Practise MLS-C01 questions linked to Data Engineering.
Machine Learning Implementation and Operations practice questions
Practise MLS-C01 questions linked to Machine Learning Implementation and Operations.
Modeling practice questions
Practise MLS-C01 questions linked to Modeling.
Exploratory Data Analysis practice questions
Practise MLS-C01 questions linked to Exploratory Data Analysis.
MLS-C01 fundamentals practice questions
Practise MLS-C01 questions linked to MLS-C01 fundamentals.
MLS-C01 scenario practice questions
Practise MLS-C01 questions linked to MLS-C01 scenario.
MLS-C01 troubleshooting practice questions
Practise MLS-C01 questions linked to MLS-C01 troubleshooting.
FAQ
Questions learners often ask
What does this MLS-C01 question test?
This question tests Modeling. Key concept: read the scenario before looking for a memorised answer.
What is the correct answer to this question?
The correct answers are A, B, and D: using a complex model like a deep neural network on a small dataset, having too many parameters relative to the number of training samples, and training for too many epochs. A high-capacity model can easily memorize noise and patterns specific to a small dataset rather than learning generalizable features, and long training gives it more time to do so. With limited training samples, the model fails to capture the underlying data distribution, leading to poor performance on unseen data.
What should I do if I get this MLS-C01 question wrong?
Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.
What is the key concept behind this question?
Read the scenario before looking for a memorised answer.
About these practice questions
Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.
Same concept, more angles
2 more ways this is tested on MLS-C01
These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.
Variation 1. A data scientist is training a neural network for a multi-class classification problem with 100 classes. The model uses a softmax output layer and cross-entropy loss. During training, the loss decreases steadily but the accuracy on the validation set plateaus early. Which of the following is the most likely cause?
Difficulty: hard
- A. Batch size is too large
- ✓ B. The model is overfitting the training data
- C. Number of epochs is too small
- D. Learning rate is too high
Why B: When the validation accuracy plateaus early while training loss continues to decrease, it indicates that the model is memorizing the training data rather than learning generalizable patterns. This is classic overfitting, where the softmax output layer produces high-confidence predictions for training samples but fails to generalize to unseen validation data, causing cross-entropy loss to drop on the training set while validation accuracy stagnates.
Variation 2. A data scientist is training a neural network for a multi-class classification problem. The model is overfitting. Which TWO of the following techniques can help reduce overfitting? (Choose two.)
Difficulty: medium
- A. Increase the number of hidden layers.
- ✓ B. Add dropout layers after the hidden layers.
- C. Decrease the learning rate.
- ✓ D. Add L2 regularization to the loss function.
- E. Reduce the batch size.
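Why B and D: Option B is correct because dropout layers randomly deactivate a fraction of neurons during training, which prevents the network from relying too heavily on any single neuron and forces it to learn more robust features. This reduces co-adaptation among neurons and is a standard regularization technique to combat overfitting in neural networks. Option D is correct because L2 regularization adds a penalty proportional to the sum of squared weights to the loss, which discourages large weights and limits the model's effective capacity.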
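Both techniques from Variation 2 can be sketched in plain Python. This is an illustrative sketch rather than how any specific framework implements them: the function names, the `lam` coefficient, and the sample values are assumptions, and the dropout shown is the common "inverted" variant that rescales the surviving activations during training.

```python
import random


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def inverted_dropout(activations, drop_prob, rng):
    """Zero each activation with probability `drop_prob` and rescale the survivors.

    Rescaling by 1 / keep_prob keeps the expected activation unchanged, so no
    adjustment is needed at inference time (when dropout is switched off).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


weights = [1.0, -2.0]      # hypothetical weights
data_loss = 0.30           # hypothetical cross-entropy value
total_loss = data_loss + l2_penalty(weights, lam=0.1)  # penalty is 0.1 * (1 + 4)

rng = random.Random(42)    # fixed seed so the mask is repeatable
hidden = [0.5, 1.5, 2.0, 0.25]
dropped = inverted_dropout(hidden, drop_prob=0.5, rng=rng)
print(total_loss, dropped)
```

The penalty grows with the squared magnitude of the weights, so minimizing `total_loss` trades fit against weight size, while dropout forces the network to cope with a different random subset of neurons on each training step.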
Last reviewed: Jun 24, 2026
This MLS-C01 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the MLS-C01 exam.