
Scenario-based practice

Select Two (Multi-Select) Questions

Practise with AWS Certified Machine Learning Specialty (MLS-C01) questions: original exam-style scenarios covering every exam domain, with detailed explanations, wrong-answer analysis, and common exam traps.

Scenario questions: 20
Exam code: MLS-C01
Vendor: Amazon Web Services

Scenario guide

How to approach select two (multi-select) questions

Multi-select questions tell you to 'Choose TWO' or 'Choose THREE'. There is no partial credit: you must select every correct answer and no incorrect ones. The stem always states how many to choose, so trust it. These questions reward precision, not best-guess elimination.
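
The all-or-nothing rule described above can be sketched in a few lines of Python. This is only an illustration of the scoring idea, not the exam's actual scoring code; the function name `score_multi_select` and the option letters are hypothetical.

```python
def score_multi_select(selected, correct):
    """Return 1 only when the selection matches the answer key exactly, else 0."""
    return 1 if set(selected) == set(correct) else 0


# Hypothetical answer key for a 'Choose TWO' question.
key = {"A", "D"}

print(score_multi_select({"A", "D"}, key))  # both correct options, nothing extra
print(score_multi_select({"A"}, key))       # one correct option is missing
print(score_multi_select({"A", "C"}, key))  # one incorrect option is included
```

Only the first selection scores; a selection that is half right earns the same as one that is entirely wrong.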

Quick answer

Select Two (multi-select) questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Related practice questions

Related MLS-C01 topic practice pages

Scenario questions usually connect to one or more exam topics. Use these links to review the underlying concepts behind the scenario.

Practice set

Practice scenarios

Question 1 (medium, multi-select)

A data engineering team is designing a data lake on AWS for machine learning workloads. The data includes structured, semi-structured, and unstructured data. The team needs to ensure that the data is cataloged, easily discoverable, and can be queried by Amazon Athena and Amazon EMR. The team also wants to enforce fine-grained access control at the column and row level for sensitive data. Which combination of AWS services should the team use? (Select TWO.)

Question 2 (hard, multi-select)

A company needs to build a data lake on AWS for analytics. The data includes structured, semi-structured, and unstructured data. The solution must support schema-on-read, provide fine-grained access control, and be cost-effective for storing rarely accessed data. Which THREE services should be used? (Choose THREE)

Question 3 (hard, multi-select)

A data scientist is training a deep learning model using Amazon SageMaker. The training loss is decreasing, but the validation loss starts increasing after 10 epochs. The model is overfitting. Which TWO actions should the data scientist take to reduce overfitting? (Choose TWO.)
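
The symptom in this stem, training loss still falling while validation loss turns upward, can be spotted programmatically. The sketch below only illustrates the pattern and says nothing about which answer options are correct; the helper `first_divergence_epoch` and the loss values are made up for the example.

```python
def first_divergence_epoch(train_loss, val_loss):
    """Return the 1-based epoch where validation loss first rises while
    training loss is still falling, or None if that never happens."""
    for i in range(1, min(len(train_loss), len(val_loss))):
        train_improving = train_loss[i] < train_loss[i - 1]
        val_worsening = val_loss[i] > val_loss[i - 1]
        if train_improving and val_worsening:
            return i + 1
    return None


# Hypothetical per-epoch losses: validation loss bottoms out at epoch 10.
train = [1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20]
val = [1.10, 1.00, 0.90, 0.80, 0.72, 0.66, 0.62, 0.60, 0.59, 0.58, 0.60, 0.63]

print(first_divergence_epoch(train, val))
```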

Question 4 (hard, multi-select)

A data scientist is analyzing a dataset of customer reviews. The dataset contains a text column 'review' and a numerical rating from 1 to 5. The data scientist wants to create features for sentiment analysis. Which THREE preprocessing steps should be applied to the text data before feature extraction? (Choose THREE.)

Question 5 (medium, multi-select)

Which TWO of the following are appropriate techniques for detecting outliers in a univariate continuous feature?

Question 6 (hard, multi-select)

Which THREE of the following are best practices when performing exploratory data analysis on a dataset with both numerical and categorical features?

Question 7 (hard, multi-select)

A company is using Amazon SageMaker to tune hyperparameters for a gradient boosting model. The objective is to minimize root mean squared error (RMSE). The data scientist wants to explore the hyperparameter space efficiently. Which THREE hyperparameter tuning strategies should the data scientist consider? (Choose THREE.)

Question 8 (medium, multi-select)

Which TWO metrics are MOST appropriate for evaluating a regression model that predicts house prices, where the business is most sensitive to large errors?
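
To see what "sensitive to large errors" means in practice, it can help to compare a metric that averages absolute errors with one that squares them first. The sketch below uses mean absolute error (MAE) and RMSE as two familiar examples; it does not claim these are the answer options, which are not shown on this page, and the price values are invented.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: errors are squared before averaging."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical house prices in thousands; both prediction sets have the same total error.
actual = [300, 300, 300, 300]
even_errors = [290, 310, 290, 310]    # four errors of 10
one_big_error = [300, 300, 300, 340]  # a single error of 40

print(mae(actual, even_errors), rmse(actual, even_errors))
print(mae(actual, one_big_error), rmse(actual, one_big_error))
```

MAE is identical for the two prediction sets, while RMSE doubles when the same total error is concentrated in one large miss.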

Question 9 (hard, multi-select)

Which THREE techniques can help reduce overfitting in a neural network trained on a small dataset?

Question 10 (medium, multi-select)

A data scientist is training a neural network for image classification. The training loss is not decreasing significantly, and the validation loss is high. Which TWO actions should the scientist take to address potential vanishing gradients?
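
For background on the term "vanishing gradients", the sketch below shows how a backpropagated signal can shrink as it passes through many layers. It is a deliberately simplified model: it assumes sigmoid activations (the stem does not name an activation function), treats every weight as 1, and uses the hypothetical helper `gradient_scale`.

```python
import math


def sigmoid_derivative(x):
    """Derivative of the sigmoid function; its maximum value is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def gradient_scale(num_layers, pre_activation=0.0):
    """Product of sigmoid derivatives across num_layers layers.

    Assumes the same pre-activation value in every layer and weights of 1,
    so this is an upper-bound style illustration, not a real backward pass.
    """
    return sigmoid_derivative(pre_activation) ** num_layers


for depth in (2, 5, 10):
    print(depth, gradient_scale(depth))
```

Even in this best case for sigmoid, the factor reaching the earliest layer falls below one millionth at a depth of 10.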

Question 11 (hard, multi-select)

A company is using Amazon SageMaker to train a large language model. The training job is taking too long. The data scientist wants to reduce training time without sacrificing model accuracy. Which THREE strategies are MOST appropriate?

Question 12 (medium, multi-select)

A company is using Amazon SageMaker to build a machine learning pipeline. The pipeline includes data preprocessing, training, and evaluation steps. The company wants to ensure that the pipeline is reproducible and that artifacts are versioned. Which TWO actions should be taken? (Choose TWO.)

Question 13 (hard, multi-select)

A data scientist is deploying a model on Amazon SageMaker for real-time inference. The model is a PyTorch model that requires custom inference code. The data scientist needs to handle variable-length inputs and optimize inference latency. Which THREE steps should the data scientist take? (Choose THREE.)

Question 14 (hard, multi-select)

A machine learning team is building a multi-class image classifier using a pre-trained ResNet-50 model in Amazon SageMaker. The dataset has 10 classes but is highly imbalanced, with one class representing 80% of the samples. The team wants to improve model performance on the minority classes. Which TWO of the following approaches are most likely to help? (Select TWO.)
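
The imbalance in this stem is worth quantifying before looking at the options. The sketch below builds a hypothetical label set with the same 80% majority share across 10 classes and scores a trivial model that always predicts the majority class; the sample counts and helper names are assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_recall(y_true, y_pred):
    """Average of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels: class 0 has 72 of 90 samples (80%); classes 1-9 have 2 each.
y_true = [0] * 72 + [c for c in range(1, 10) for _ in range(2)]
y_pred = [0] * len(y_true)  # a model that always predicts the majority class

print(accuracy(y_true, y_pred))
print(macro_recall(y_true, y_pred))
```

Accuracy looks respectable at 0.8 while macro recall sits near 0.1, which is why overall accuracy alone says little about minority-class performance.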

Question 15 (hard, multi-select)

A data science team is training a large deep learning model using Amazon SageMaker. The training job is taking a long time because the model has many layers and the dataset is large. The team wants to reduce training time by distributing the training across multiple GPUs on a single instance, as well as across multiple instances. Which TWO actions should the team take? (Choose two.)

Question 16 (medium, multi-select)

A data scientist is analyzing a dataset with 100 features and 10,000 observations. The target variable is binary (0/1). Initial exploratory data analysis reveals that many features have missing values, high correlation with each other, and non-normal distributions. The data scientist wants to identify the most important features for predicting the target while reducing dimensionality. Which TWO actions should the data scientist take? (Choose two.)

Question 17 (hard, multi-select)

A company is building a real-time anomaly detection system for network traffic logs. The logs are ingested via Amazon Kinesis Data Streams and processed with an Amazon SageMaker endpoint for inference. The team needs to ensure that the inference results are stored durably and can be replayed for model retraining. The system must handle at least 10,000 records per second with low latency. Which three AWS services should the team use to build this architecture? (Select THREE.)

Question 18 (medium, multi-select)

A data engineer is designing a streaming pipeline using Amazon Kinesis Data Analytics for Apache Flink. The pipeline reads from a Kinesis data stream and writes to an Amazon S3 bucket. The job must recover quickly from failures without reprocessing large amounts of data. Which TWO configurations should be used? (Choose TWO.)

Question 19 (medium, multi-select)

A data scientist is building a recommender system using Amazon SageMaker. The dataset contains user-item interactions with implicit feedback (clicks). Which THREE evaluation metrics are appropriate for this use case?

Question 20 (hard, multi-select)

Which THREE of the following are common causes of overfitting in machine learning models?

These MLS-C01 practice questions are part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style MLS-C01 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.