
Scenario-based practice

Select Two (Multi-Select) Questions

Practise AWS Certified Machine Learning Engineer Associate (MLA-C01) questions: original exam-style scenarios covering every exam domain, with detailed explanations, wrong-answer analysis, and common exam traps.

20 scenario questions
Exam code: MLA-C01
Vendor: Amazon Web Services

Scenario guide

How to approach select two (multi-select) questions

Multi-select questions tell you to 'Choose TWO' or 'Choose THREE'. There is no partial credit: you must select every correct answer and no incorrect ones. The stem always states how many to choose, so trust it. These questions reward precision, not best-guess elimination.
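The all-or-nothing rule can be sketched as a small scoring function. This is only an illustration of the rule as stated on this page, not the exam's actual scoring code; `score_multi_select` and the option letters are hypothetical.

```python
def score_multi_select(selected, correct):
    """Return 1 only when the selection matches the answer key exactly, else 0."""
    # Comparing sets ignores the order in which options were ticked.
    return 1 if set(selected) == set(correct) else 0


# Hypothetical "Choose TWO" question whose correct options are A and C.
answer_key = {"A", "C"}

print(score_multi_select({"A", "C"}, answer_key))       # every correct option, nothing extra
print(score_multi_select({"A"}, answer_key))            # one correct option missing
print(score_multi_select({"A", "B", "C"}, answer_key))  # one incorrect option included
```

Only the first call scores a point: a missing correct option and an extra incorrect option are penalised in the same way.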

Quick answer

Select Two (multi-select) questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Related practice questions

Related MLA-C01 topic practice pages

Scenario questions usually connect to one or more exam topics. Use these links to review the underlying concepts behind the scenario.

Practice set

Practice scenarios

Question 1 (hard, multi-select)

A company is running a SageMaker endpoint serving multiple models. They need to monitor for data drift and model quality. Which THREE actions are necessary? (Choose three.)

Question 2 (medium, multi-select)

A team is training a deep learning model on Amazon SageMaker using a custom Docker container. Which three practices should they follow to optimize training performance? (Choose three.)

Question 3 (medium, multi-select)

A machine learning team is preparing a dataset for a regression model. The dataset contains numerical features that are on different scales (e.g., age 0-100, income 0-1,000,000). The team plans to use Amazon SageMaker to train a linear regression model. Which THREE data preparation steps should the team take to ensure the model performs well? (Select THREE.)
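To make the scale mismatch in this scenario concrete, here is a minimal sketch of z-score standardisation, one common way to put features such as age and income on a comparable scale. It is not presented as the answer key for the question; the `standardize` helper and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample values on very different scales.
ages = [20, 40, 60, 80]
incomes = [20_000, 250_000, 600_000, 1_000_000]

print(standardize(ages))
print(standardize(incomes))
```

In a real pipeline the mean and standard deviation would be computed on the training split only and then reused for validation and inference data.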

Question 4 (hard, multi-select)

A team is preparing text data for a natural language processing (NLP) model. They have a corpus of customer reviews. Which THREE preprocessing steps are essential to reduce noise and improve model performance?

Question 5 (medium, multi-select)

A machine learning engineer is preparing a dataset for a multiclass classification task. The dataset has 10 features and 100,000 rows. Which TWO techniques should the engineer use to reduce the risk of overfitting during data preparation?

Question 6 (medium, multi-select)

A data scientist is building a text classification model using a pre-trained BERT model from the Hugging Face library on SageMaker. The scientist wants to fine-tune the model on a custom dataset. Which TWO steps are necessary to set up the fine-tuning job? (Select TWO.)

Question 7 (hard, multi-select)

A machine learning engineer is deploying a custom PyTorch model to a SageMaker endpoint for real-time inference. The model requires GPU acceleration. The engineer wants to minimize latency and cost. Which THREE actions should the engineer take? (Select THREE.)

Question 8 (easy, multi-select)

A company ingests daily log data into an S3 bucket. They need to update the existing ML training dataset with new data without reprocessing the entire history. Which two strategies should they adopt? (Choose two.)

Question 9 (medium, multi-select)

A data scientist is performing feature engineering for a dataset with both numerical and categorical features. The data scientist wants to apply transformations that preserve the interpretability of the features. Which TWO transformations should the data scientist use? (Select TWO)

Question 10 (hard, multi-select)

A data scientist is working with a dataset containing customer demographics and purchase history. The dataset includes categorical variables with high cardinality (e.g., ZIP code, product ID). The data scientist wants to perform feature engineering to improve model performance. Which THREE feature engineering techniques should the data scientist consider? (Choose three.)
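As background for this scenario, the sketch below shows frequency encoding, one technique sometimes used for high-cardinality categorical columns such as ZIP code. It is an illustration only and is not claimed to be among the expected answers; `frequency_encode` and the sample ZIP codes are hypothetical.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with the share of rows in which it appears."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


# Hypothetical ZIP code column with one repeated value.
zip_codes = ["98101", "98101", "10001", "60601"]
print(frequency_encode(zip_codes))
```

The encoding produces a single numeric column regardless of how many distinct categories exist, which is why it scales better than one-hot encoding for columns like product ID. The frequencies would normally be learned from the training split and then applied to new data.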

Question 11 (medium, multi-select)

A dataset for binary classification has a severe class imbalance (5% positive class). Which two data preparation techniques can help address this imbalance? (Choose two.)
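To see what a 5% positive class means in practice, here is a minimal sketch of inverse-frequency ("balanced") class weights, one common way to compensate for imbalance alongside resampling. It is not presented as the answer key; `balanced_class_weights` and the 1,000-row label list are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 50 positives (5%) and 950 negatives.
labels = [1] * 50 + [0] * 950
print(balanced_class_weights(labels))
```

With these counts the positive class receives a weight of 10.0 and the negative class a weight of roughly 0.53, so each class contributes equally to a weighted loss.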

Question 12 (easy, multi-select)

A data scientist is evaluating data quality for a machine learning project. The dataset has missing values, outliers, and inconsistent formatting. Which TWO steps should the data scientist perform during the data preparation phase? (Choose 2.)

Question 13 (hard, multi-select)

A company is preparing a large dataset for a SageMaker built-in XGBoost model. The dataset has missing values in both numeric and categorical features, and some categorical features have high cardinality. Which THREE data preparation steps should the company take to optimize model performance? (Choose three.)

Question 14 (hard, multi-select)

A data engineer is optimizing Amazon Athena queries on large datasets stored in S3 for machine learning data preparation. Which THREE practices improve query performance?

Question 15 (hard, multi-select)

A company is using an AWS Step Functions state machine to orchestrate a multi-step ML deployment. The workflow includes: training a model, evaluating it, registering the model, and deploying to a staging endpoint. They need to implement an approval gate before deploying to production. Which THREE components are necessary to achieve this? (Choose three.)

Question 16 (hard, multi-select)

A company is deploying a machine learning model using Amazon SageMaker. The model is a large deep learning model that requires GPU for inference. The company expects unpredictable traffic patterns with occasional bursts. They want to minimize cost while ensuring low latency during bursts. Which TWO actions should they take? (Select TWO.)

Question 17 (medium, multi-select)

Which TWO of the following are best practices for deploying machine learning models on SageMaker? (Select TWO.)

Question 18 (hard, multi-select)

A healthcare company deploys a model to predict patient readmission risk. The model was trained on historical data and is now showing signs of concept drift. The team needs to implement a monitoring solution that can detect drift and automatically retrain the model when drift is detected. Which THREE steps should the team take to build this solution? (Choose THREE.)

Question 19 (hard, multi-select)

A company is building a real-time inference pipeline for an ML model. The raw data arrives in JSON format via Amazon Kinesis Data Streams. Before invoking the SageMaker endpoint, the data must be preprocessed to match the training data format. Which THREE steps should be included in the preprocessing function? (Select THREE)
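A preprocessing function for this kind of pipeline might look roughly like the sketch below. It assumes the function receives records in the shape AWS Lambda uses for Kinesis events, where the payload is base64-encoded under `kinesis.data`, and that the model expects a CSV row with features in a fixed order. The feature names, `FEATURE_ORDER`, and `record_to_csv_row` are hypothetical, and the steps shown are not claimed to be the expected answers.

```python
import base64
import json

# Hypothetical feature order; it must match the column order used in training.
FEATURE_ORDER = ["age", "income"]


def record_to_csv_row(kinesis_record, feature_order=FEATURE_ORDER):
    """Decode one Kinesis record and format it as a CSV row for the model."""
    # Kinesis payloads arrive base64-encoded inside the record.
    raw = base64.b64decode(kinesis_record["kinesis"]["data"])
    payload = json.loads(raw)

    # Fail fast if a feature the model was trained on is absent.
    missing = [name for name in feature_order if name not in payload]
    if missing:
        raise ValueError(f"missing features: {missing}")

    # Keep only the trained features, in the trained order, as floats.
    return ",".join(str(float(payload[name])) for name in feature_order)


# Build a sample record the same way Kinesis would deliver it.
sample = {"income": 52000, "age": 41, "session_id": "abc"}
record = {"kinesis": {"data": base64.b64encode(json.dumps(sample).encode()).decode()}}
print(record_to_csv_row(record))
```

The resulting string is what would be sent as the request body to the endpoint; extra fields such as `session_id` are dropped because the model never saw them during training.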

Question 20 (hard, multi-select)

You are preparing a time-series dataset for a forecasting model. Which three steps are critical to prevent data leakage during preprocessing? (Choose three.)
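The leakage risk in this scenario comes from letting information from later timestamps influence how earlier data is prepared. A minimal sketch of a chronological split, with statistics fitted on the training window only, is shown below. It illustrates the general idea rather than the answer key; `chronological_split` and the sample rows are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def chronological_split(rows, train_fraction=0.8):
    """Split (timestamp, value) rows so every training row precedes every test row."""
    ordered = sorted(rows, key=lambda row: row[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical daily series, deliberately supplied out of order.
rows = [(date(2024, 1, 1) + timedelta(days=i), float(i)) for i in range(10)][::-1]

train, test = chronological_split(rows)

# Fit preprocessing statistics on the training window only.
train_mean = mean(value for _, value in train)
test_centered = [(ts, value - train_mean) for ts, value in test]

print(len(train), len(test), train_mean)
```

A random shuffle before splitting, or a mean computed over the whole series, would both leak future values into the training features.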

These MLA-C01 practice questions are part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style MLA-C01 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.