How many hard-difficulty questions are on this page?

This page has 20 hard-difficulty scenario questions for the MLA-C01 exam, each with detailed explanations and wrong-answer analysis.

How should I approach MLA-C01 scenario questions?

Read the full scenario before looking at the answer options. Identify the constraint or requirement in the scenario, then eliminate options that are generally true but wrong for this specific case. Scenario questions reward careful reading over pattern matching.

Scenario-based practice

Hard Difficulty Questions

Practise for the AWS Certified Machine Learning Engineer Associate (MLA-C01) exam with original exam-style scenarios covering every exam domain, each with a detailed explanation, wrong-answer analysis, and common exam traps.

Scenario questions: 20

Exam code: MLA-C01

Vendor: Amazon Web Services

Scenario guide

How to approach hard difficulty questions

These are the questions most candidates get wrong. They require connecting multiple concepts, reading tricky output, or knowing edge-case behaviour that isn't on most study cards. Practising them trains you to operate under uncertainty — a necessary skill on the real exam.

Quick answer

Hard-difficulty questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Practice scenarios

Question 1 (hard, multi-select)

A company is running a SageMaker endpoint serving multiple models. They need to monitor for data drift and model quality. Which THREE actions are necessary? (Choose three.)

A. Deploy a shadow endpoint for comparison
Why wrong: Shadow testing compares a candidate model against the production model on mirrored traffic; it does not monitor for drift.

B. Enable data capture on the endpoint
Why correct: Data capture logs inference requests and responses, which the monitoring jobs then analyse.

C. Use SageMaker Debugger for monitoring
Why wrong: Debugger is for debugging training jobs, not for monitoring production endpoints.

D. Create a SageMaker Model Monitor schedule
Why correct: The schedule defines how often the monitoring jobs run.

E. Configure baseline constraints from training data
Why correct: Baseline constraints define the expected statistical properties that captured data is compared against to detect drift.
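
To make the relationship between the three correct actions concrete, here is a minimal conceptual sketch in plain Python. It is not the SageMaker Model Monitor API: the function names `build_baseline` and `check_drift`, the sample data, and the three-standard-deviation threshold are all hypothetical and chosen only for illustration. The baseline is computed from training data, the captured rows stand in for what data capture would log, and each call to `check_drift` stands in for one scheduled monitoring run.

```python
from statistics import mean, stdev


def build_baseline(training_rows):
    """Summarise each feature of the training data (hypothetical 'baseline constraints')."""
    features = list(training_rows[0].keys())
    baseline = {}
    for f in features:
        values = [row[f] for row in training_rows]
        baseline[f] = {"mean": mean(values), "std": stdev(values)}
    return baseline


def check_drift(baseline, captured_rows, threshold=3.0):
    """Return features whose captured mean lies more than `threshold`
    baseline standard deviations from the baseline mean."""
    violations = []
    for f, stats in baseline.items():
        if stats["std"] == 0:
            continue  # a constant feature cannot be scored this way; skipped in this sketch
        captured_mean = mean([row[f] for row in captured_rows])
        shift = abs(captured_mean - stats["mean"]) / stats["std"]
        if shift > threshold:
            violations.append(f)
    return violations


# Illustrative data only (assumed, not from any real endpoint).
training = [
    {"age": 30, "income": 50},
    {"age": 40, "income": 60},
    {"age": 50, "income": 70},
]
captured = [  # stands in for requests logged by data capture
    {"age": 41, "income": 120},
    {"age": 39, "income": 130},
]

baseline = build_baseline(training)
drifted = check_drift(baseline, captured)  # one "scheduled" monitoring run
print(drifted)
```

Without captured data there is nothing to check, without a baseline there is nothing to compare against, and without a schedule the check never runs, which is why all three options are needed together.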

Practice scenarios

A data scientist trained a logistic regression model on a dataset with 100 features. After training, the training accuracy is 0.99 but validation accuracy is 0.75. Which action is MOST likely to reduce overfitting?

A data engineer is processing a large dataset in Amazon S3 with AWS Glue ETL. The dataset contains timestamps in multiple time zones. The engineer needs to create a feature for hour-of-day consistent across all records. Which approach ensures correctness?

A dataset contains a numerical feature with extreme outliers. The outliers are genuine (not errors), and the ML model is a linear regression which is sensitive to outliers. Which data transformation should be applied to reduce the impact of outliers while preserving the data?

A team is preparing text data for a natural language processing (NLP) model. They have a corpus of customer reviews. Which THREE preprocessing steps are essential to reduce noise and improve model performance?

A machine learning engineer is deploying a custom PyTorch model to a SageMaker endpoint for real-time inference. The model requires GPU acceleration. The engineer wants to minimize latency and cost. Which THREE actions should the engineer take? (Select THREE.)

A company wants to use a pre-trained NLP model from SageMaker JumpStart for sentiment analysis. Which step is required to make predictions?

A team is building a regression model on a dataset with missing values in multiple features. They decide to use a k-Nearest Neighbors (k-NN) imputer. The dataset has 100,000 rows and 50 features. Which step should the team take to ensure the imputation is efficient and accurate?

Refer to the exhibit. A data engineer deploys this Glue job via CloudFormation. When running, the job fails with a timeout after 2 hours. The job processes a large dataset and is expected to take 3 hours. Which change would resolve the issue?

A data engineer is optimizing Amazon Athena queries on large datasets stored in S3 for machine learning data preparation. Which THREE practices improve query performance?

Exhibit

Refer to the exhibit. A SageMaker Pipeline fails with 'Invalid output reference' at the TrainingStep. What is the most likely cause?

Exhibit