MLS-C01 · topic practice

Modeling practice questions

Practise Modeling questions for the AWS Certified Machine Learning Specialty (MLS-C01) exam: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Modeling

What the exam tests

What to know about Modeling

Modeling questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Modeling exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Modeling questions

20 questions · select your answer, then reveal the explanation

Question 1 · easy · multiple choice
Read the full Modeling explanation →

A data scientist is training a binary classification model using Amazon SageMaker. The dataset is highly imbalanced (99% negative class, 1% positive class). The model currently achieves 99% accuracy but fails to detect most positive cases. Which metric should the data scientist primarily use to evaluate model performance?

Question 2 · easy · multiple choice
Read the full Modeling explanation →

A team is building a product recommendation system using matrix factorization in Amazon SageMaker. They notice that the model's training loss decreases steadily but validation loss starts increasing after 5 epochs. What is the most likely cause?
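
The pattern in this scenario can be located programmatically from per-epoch loss histories. The sketch below is illustrative only: the loss values are hypothetical numbers chosen so that validation loss bottoms out at epoch 5, matching the question, and the variable names are assumptions rather than anything SageMaker emits.

```python
# Hypothetical per-epoch losses (not taken from a real training job).
train_loss = [0.90, 0.70, 0.55, 0.45, 0.38, 0.32, 0.27, 0.23]
val_loss = [0.95, 0.78, 0.66, 0.60, 0.57, 0.59, 0.63, 0.68]

# Epoch (1-based) at which validation loss is lowest.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Does validation loss rise on every epoch after the best one?
rising_after_best = all(
    val_loss[i] > val_loss[i - 1] for i in range(best_epoch, len(val_loss))
)

# Does training loss keep falling on every epoch?
train_still_falling = all(
    train_loss[i] < train_loss[i - 1] for i in range(1, len(train_loss))
)

print(f"validation loss is lowest at epoch {best_epoch}")
print(f"validation rising afterwards: {rising_after_best}")
print(f"training still falling: {train_still_falling}")
```

When both flags are true, the two curves have diverged, which is the situation the question asks you to diagnose.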

Question 3 · medium · multiple choice
Read the full Modeling explanation →

A company is using Amazon SageMaker to train a deep learning model on a large dataset. The training job is taking too long. The team wants to reduce training time without changing the model architecture. Which action should they take?

Question 4 · medium · multiple choice
Read the full Modeling explanation →

A data scientist is deploying a regression model in Amazon SageMaker that predicts housing prices. The model shows high bias (underfitting). Which action is most likely to reduce bias?

Question 5 · hard · multiple choice
Read the full Modeling explanation →

A machine learning engineer is training a neural network on Amazon SageMaker using a custom Docker container. The training job fails with an error: 'CUDA out of memory.' The training instance is an ml.p3.2xlarge with 16 GB GPU memory. The model and data fit into memory when using batch size 32, but the engineer wants to maximize GPU utilization. Which approach should the engineer use to fix the out-of-memory error while maintaining efficient training?

Question 6 · hard · multi select
Read the full Modeling explanation →

A data scientist is training a deep learning model using Amazon SageMaker. The training loss is decreasing, but the validation loss starts increasing after 10 epochs. The model is overfitting. Which TWO actions should the data scientist take to reduce overfitting? (Choose 2.)

Question 7 · hard · multi select
Read the full Modeling explanation →

A company is using Amazon SageMaker to tune hyperparameters for a gradient boosting model. The objective is to minimize root mean squared error (RMSE). The data scientist wants to explore the hyperparameter space efficiently. Which THREE hyperparameter tuning strategies should the data scientist consider? (Choose 3.)

Question 8 · medium · multiple choice
Read the full Modeling explanation →

A data scientist is training a binary classifier on an imbalanced dataset where the positive class represents 1% of the data. The model currently achieves 99% accuracy but a recall of only 10% on the positive class. Which metric combination should the data scientist prioritize to evaluate model improvements?
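
To make the scenario's numbers concrete, the sketch below reconstructs one set of confusion-matrix counts consistent with them. The total of 10,000 samples is an assumption chosen for round numbers; only the 1% positive rate, 99% accuracy, and 10% recall come from the question.

```python
# Hypothetical dataset size; the question fixes only the rates.
total = 10_000
positives = total // 100        # 1% positive class
negatives = total - positives

# A recall of 10% on the positive class fixes true positives and false negatives.
tp = positives // 10
fn = positives - tp

# An accuracy of 99% fixes the number of correct predictions,
# and therefore true negatives and false positives.
correct = total * 99 // 100
tn = correct - tp
fp = negatives - tn

accuracy = (tp + tn) / total
recall = tp / (tp + fn)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
print(f"missed positives: {fn} of {positives}")
```

Under these assumed counts the model misses 90 of the 100 positive cases while still scoring 99% accuracy, which is why the question asks you to look beyond accuracy.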

Question 9 · hard · multiple choice
Read the full Modeling explanation →

An e-commerce company uses a linear regression model to predict customer lifetime value (LTV). The model shows high variance on the test set, with training RMSE much lower than test RMSE. Which of the following is the MOST effective approach to reduce overfitting?
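
The symptom described here, training RMSE much lower than test RMSE, can be checked with a few lines of arithmetic. This is a minimal sketch with made-up LTV values and predictions; the `rmse` helper and all numbers are hypothetical and are not part of the question.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical LTV targets and model predictions.
train_true = [100.0, 200.0, 300.0, 400.0]
train_pred = [102.0, 198.0, 301.0, 399.0]
test_true = [150.0, 250.0, 350.0, 450.0]
test_pred = [120.0, 290.0, 310.0, 500.0]

train_rmse = rmse(train_true, train_pred)
test_rmse = rmse(test_true, test_pred)

print(f"train RMSE = {train_rmse:.2f}")
print(f"test RMSE  = {test_rmse:.2f}")
print(f"test/train ratio = {test_rmse / train_rmse:.1f}")
```

A test error many times larger than the training error is the gap the scenario refers to as high variance.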

Question 10 · easy · multiple choice
Read the full Modeling explanation →

A company wants to use Amazon SageMaker to train a deep learning model using a custom TensorFlow script. The data is stored in an S3 bucket. Which SageMaker API operation should be used to launch the training job?

Question 11 · hard · multiple choice
Read the full Modeling explanation →

A data scientist is building a multi-class classification model with 10 classes. The dataset has 100,000 samples. After training a random forest with 100 trees, the model achieves 85% accuracy on the test set. However, the data scientist notices that for one rare class (1% of data), recall is only 5%. Which technique is MOST likely to improve recall for the rare class without significantly reducing overall accuracy?

Question 12 · medium · multiple choice
Read the full Modeling explanation →

A company uses an XGBoost model to predict equipment failures. The model has high precision but low recall. The business impact of a false negative is very high (missing a failure). Which action would MOST effectively increase recall while keeping precision reasonably high?

Question 13 · medium · multi select
Read the full Modeling explanation →

Which TWO metrics are MOST appropriate for evaluating a regression model that predicts house prices, where the business is most sensitive to large errors? (Choose 2.)

Question 14 · hard · multi select
Read the full Modeling explanation →

Which THREE techniques can help reduce overfitting in a neural network trained on a small dataset? (Choose 3.)

Question 15 · hard · multiple choice
Read the full Modeling explanation →

A data scientist runs a SageMaker training job that fails with the error shown in the exhibit. The S3 bucket and object exist, and the IAM role has s3:GetObject permission. What is the MOST likely cause?

Exhibit

Refer to the exhibit.

```
Training job status: Failed
Error: ClientError: Data download failed.
The downloaded file size (0 bytes) does not match expected size (1024 bytes).
Check that the S3 object exists and is readable.
```

Question 16 · easy · multiple choice
Read the full Modeling explanation →

A data scientist is trying to run a SageMaker training job that writes output to an S3 bucket 'my-bucket'. The IAM policy is shown in the exhibit. The training job fails with an AccessDenied error when trying to write to S3. What is the reason?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

Question 17 · medium · multiple choice
Read the full Modeling explanation →

A data scientist is training a binary classifier to predict customer churn. The dataset has 10,000 samples, with 500 churners (positive class). The scientist trains a logistic regression model and obtains an F1-score of 0.6. To improve the F1-score, which approach is MOST likely to be effective?

Question 18 · hard · multiple choice
Read the full Modeling explanation →

A company is deploying a real-time fraud detection system using a gradient boosting model on Amazon SageMaker. The model uses 200 features and is trained on 50 GB of data. The inference latency requirement is under 10 ms per request. During load testing, the endpoint shows average latency of 15 ms. Which change is MOST likely to reduce latency below 10 ms?

Question 19 · easy · multiple choice
Read the full Modeling explanation →

A machine learning team is training a deep learning model on Amazon SageMaker and notices that the training loss is decreasing but the validation loss is increasing. What is the most likely cause?

Question 20 · medium · multiple choice
Read the full Modeling explanation →

A company is building a recommendation system for an e-commerce platform. They have user-item interaction data (clicks, purchases) and want to use matrix factorization. They plan to use Amazon SageMaker to train the model. Which dataset format is MOST appropriate for the built-in Factorization Machines algorithm?

Focused Modeling sessions

Start a Modeling-only practice session

Every question in these sessions is drawn from the Modeling domain — nothing else.

Related practice questions

Related MLS-C01 topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the MLS-C01 exam test about Modeling?
Modeling questions test whether you can apply the concept in context, not just recognise a definition.

How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong; this active recall approach builds retention far faster than re-reading notes.

Can I practise just Modeling questions in a focused session?
Yes. The session launcher on this page draws every question from the Modeling domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.

Where can I practise other MLS-C01 topics?
Use the topic links above to move to related areas, or go back to the MLS-C01 question bank to see all topics.

Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the MLS-C01 exam covers. They are not copied from any real exam or dump site.