How many Select Two (Multi-Select) questions are on this page?

This page has 20 Select Two (Multi-Select) scenario questions for the MLS-C01 exam, each with detailed explanations and wrong-answer analysis.

How should I approach MLS-C01 scenario questions?

Read the full scenario before looking at the answer options. Identify the constraint or requirement in the scenario, then eliminate options that are generally true but wrong for this specific case. Scenario questions reward careful reading over pattern matching.


Scenario-based practice

Select Two (Multi-Select) Questions

Practise original exam-style scenarios for the AWS Certified Machine Learning Specialty (MLS-C01) exam, covering every exam domain, with detailed explanations, wrong-answer analysis, and common exam traps.


Exam code: MLS-C01

Vendor: Amazon Web Services

Scenario guide

How to approach select two (multi-select) questions

Multi-select questions tell you to 'Choose TWO' or 'Choose THREE'. There is no partial credit: you must select every correct answer and no incorrect ones. The stem always states how many options to choose, so trust that number. These questions reward precision rather than best-guess elimination.
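The all-or-nothing rule amounts to an exact set comparison. The sketch below is a hypothetical scorer written only for illustration, assuming the rule stated above (exact match, no partial credit); it is not the real exam scoring engine, and the function name `score_multi_select` is invented.

```python
def score_multi_select(selected: set[str], correct: set[str], required: int) -> bool:
    """Return True only if the response earns credit under all-or-nothing scoring.

    Hypothetical helper: assumes no partial credit, so the selection must
    match the set of correct options exactly.
    """
    # The stem states how many options to choose; any other count cannot match.
    if len(selected) != required:
        return False
    # Credit requires every correct option and no incorrect option.
    return selected == correct


# Question 1 on this page: options A and C are the correct pair.
correct_q1 = {"A", "C"}
print(score_multi_select({"A", "C"}, correct_q1, 2))  # True
print(score_multi_select({"A", "B"}, correct_q1, 2))  # False: one wrong option
print(score_multi_select({"A"}, correct_q1, 2))       # False: too few options
```

Choosing one right and one wrong option scores exactly the same as choosing two wrong ones, which is why elimination has to be precise on these questions.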

Quick answer

Select Two (Multi-Select) questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Practice scenarios

Question 1 (medium, multi-select)


A data engineering team is designing a data lake on AWS for machine learning workloads. The data includes structured, semi-structured, and unstructured data. The team needs to ensure that the data is cataloged, easily discoverable, and can be queried by Amazon Athena and Amazon EMR. The team also wants to enforce fine-grained access control at the column and row level for sensitive data. Which combination of AWS services should the team use? (Select TWO.)

A
AWS Lake Formation
Correct: Lake Formation provides fine-grained (column-, row-, and cell-level) access control and uses the Glue Data Catalog as its metadata store.
B
AWS Identity and Access Management (IAM)
Why wrong: IAM controls access to AWS resources such as S3 buckets and Glue tables, but it cannot enforce column- or row-level permissions on data lake tables.
C
AWS Glue Data Catalog
Correct: the Glue Data Catalog is the central metadata repository that Athena, EMR, and other services use to discover and query data.
D
Amazon RDS for PostgreSQL
Why wrong: RDS is a relational database, not a data catalog for the data lake.
E
Amazon DynamoDB
Why wrong: DynamoDB is a key-value store and not used for data lake cataloging.
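In Question 1, the fine-grained control from option A is expressed as Lake Formation permissions on Glue Data Catalog tables. Below is a minimal sketch, assuming the AWS SDK for Python (boto3). The helper builds the keyword arguments for the Lake Formation `grant_permissions` call, granting SELECT on selected columns only. The helper name, role ARN, database, table, and column names are all hypothetical. The code only builds the request, so it runs without AWS credentials.

```python
def column_grant_request(principal_arn: str, database: str, table: str,
                         columns: list[str]) -> dict:
    """Build kwargs for Lake Formation grant_permissions (hypothetical helper).

    Grants SELECT on only the listed columns of a Glue Data Catalog table.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,  # columns outside this list stay hidden
            }
        },
        "Permissions": ["SELECT"],
    }


request = column_grant_request(
    "arn:aws:iam::111122223333:role/AnalystRole",  # hypothetical role
    "ml_lake",                                     # hypothetical database
    "customers",                                   # hypothetical table
    ["customer_id", "region", "signup_date"],      # non-sensitive columns only
)
# With credentials configured, the request would be sent with:
#   boto3.client("lakeformation").grant_permissions(**request)
print(request["Resource"]["TableWithColumns"]["ColumnNames"])
```

A column grant like this covers the column requirement. Row-level restrictions are defined separately as Lake Formation data filters (the `create_data_cells_filter` API in boto3), which are then granted to principals. IAM alone has no equivalent, which is why option B falls short.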


Related MLS-C01 topic practice pages

Data Engineering practice questions

Machine Learning Implementation and Operations practice questions

Modeling practice questions

Exploratory Data Analysis practice questions

MLS-C01 fundamentals practice questions

MLS-C01 scenario practice questions

MLS-C01 troubleshooting practice questions

More practice scenarios

A data scientist is training a deep learning model using Amazon SageMaker. The training loss is decreasing, but the validation loss starts increasing after 10 epochs. The model is overfitting. Which TWO actions should the data scientist take to reduce overfitting?

Which TWO of the following are appropriate techniques for detecting outliers in a univariate continuous feature?

Which THREE of the following are best practices when performing exploratory data analysis on a dataset with both numerical and categorical features?

Which TWO metrics are MOST appropriate for evaluating a regression model that predicts house prices, where the business is most sensitive to large errors?

Which THREE techniques can help reduce overfitting in a neural network trained on a small dataset?

A data scientist is training a neural network for image classification. The training loss is not decreasing significantly, and the validation loss is high. Which TWO actions should the scientist take to address potential vanishing gradients?

A company is using Amazon SageMaker to train a large language model. The training job is taking too long. The data scientist wants to reduce training time without sacrificing model accuracy. Which THREE strategies are MOST appropriate?

A company is using Amazon SageMaker to build a machine learning pipeline. The pipeline includes data preprocessing, training, and evaluation steps. The company wants to ensure that the pipeline is reproducible and that artifacts are versioned. Which TWO actions should be taken?

A data scientist is building a recommender system using Amazon SageMaker. The dataset contains user-item interactions with implicit feedback (clicks). Which THREE evaluation metrics are appropriate for this use case?

Which THREE of the following are common causes of overfitting in machine learning models?
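Several of these stems turn on the gap between training and validation loss. One widely used response is patience-based early stopping, which halts training once validation loss has stopped improving. The minimal sketch below implements that rule in plain Python. The loss values are invented to mirror the first scenario in this list (validation loss rising after epoch 10). The function name and the `patience` and `min_delta` values are illustrative assumptions, not SageMaker settings.

```python
from typing import Optional, Sequence, Tuple


def early_stopping(val_losses: Sequence[float], patience: int = 3,
                   min_delta: float = 0.0) -> Tuple[Optional[int], int]:
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once validation loss has failed to beat the best value
    seen so far by more than min_delta for `patience` consecutive epochs.
    stop_epoch is None if the rule never triggers.
    """
    best_loss = float("inf")
    best_epoch = 0
    stale_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, stale_epochs = loss, epoch, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical curve: improves through epoch 10, then starts to overfit.
val = [1.0, 0.8, 0.65, 0.55, 0.48, 0.43, 0.40, 0.38, 0.37, 0.36,
       0.37, 0.39, 0.42, 0.46]
print(early_stopping(val))  # (13, 10)
```

Tracking `best_epoch` separately reflects common practice: training halts a few epochs after the minimum, and the weights saved at the best epoch are the ones kept.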