
Scenario-based practice

Troubleshooting Scenario Questions

Practise for the AWS Certified Machine Learning Engineer Associate (MLA-C01) exam with original exam-style scenarios that cover every exam domain and include detailed explanations, wrong-answer analysis, and common exam traps.

Scenario questions: 13
Exam code: MLA-C01
Vendor: Amazon Web Services

Scenario guide

How to approach troubleshooting scenario questions

These questions describe a symptom, such as a failed training job or an endpoint that returns errors, and ask you to identify the root cause or the correct fix. They appear across all certification exams and reward systematic thinking over memorisation. The best candidates follow a consistent troubleshooting framework even under time pressure.

Quick answer

Troubleshooting scenario questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Practice set

Practice scenarios

Question 1 (easy, multiple choice)

An ML engineer runs the CLI command shown in the exhibit. However, the training job fails immediately with an error: 'Unable to assume role'. What is the most likely cause?

Exhibit

Refer to the exhibit.

aws sagemaker create-training-job \
    --training-job-name my-training-job \
    --algorithm-specification 'TrainingImage=123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-training:latest,TrainingInputMode=File' \
    --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
    --input-data-config '[{"ChannelName":"train","DataSource":{"S3DataSource":{"S3Uri":"s3://my-bucket/train/","S3DataType":"S3Prefix"}},"ContentType":"text/csv"}]' \
    --output-data-config '{"S3OutputPath":"s3://my-bucket/output/"}' \
    --resource-config '{"InstanceType":"ml.m5.large","InstanceCount":1,"VolumeSizeInGB":30}' \
    --vpc-config '{"SecurityGroupIds":["sg-12345678"],"Subnets":["subnet-12345678"]}'

Question 2 (easy, multiple choice)

Refer to the exhibit. A user is unable to invoke a SageMaker endpoint. The IAM policy shown is attached to the user. Which permission is missing to allow invocation?

Exhibit

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:DescribeEndpoint",
        "sagemaker:ListEndpoints"
      ],
      "Resource": "*"
    }
  ]
}
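
One way to reason about a policy like the one in the exhibit is to check whether a given action is matched by any Allow statement. The sketch below is a simplified illustration only: the helper `is_action_allowed` is hypothetical, it ignores Deny statements, conditions, resources, and the other policy types that real IAM evaluation considers, and it assumes that invoking a real-time endpoint requires the `sagemaker:InvokeEndpoint` action.

```python
import json
from fnmatch import fnmatchcase

# The identity policy from the exhibit, copied verbatim.
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:DescribeEndpoint",
        "sagemaker:ListEndpoints"
      ],
      "Resource": "*"
    }
  ]
}
"""


def is_action_allowed(policy: dict, action: str) -> bool:
    """Return True if any Allow statement lists an action pattern matching `action`.

    Hypothetical helper: it does not model Deny statements, conditions,
    resource matching, or permission boundaries.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive and may contain * and ? wildcards.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


policy = json.loads(POLICY_JSON)
describe_allowed = is_action_allowed(policy, "sagemaker:DescribeEndpoint")
invoke_allowed = is_action_allowed(policy, "sagemaker:InvokeEndpoint")
print(f"DescribeEndpoint allowed: {describe_allowed}")
print(f"InvokeEndpoint allowed:   {invoke_allowed}")
```

Running the check against the exhibit shows that the describe and list actions are covered while the invoke action is not matched by any statement.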

Question 3 (medium, multiple choice)

A machine learning engineer is troubleshooting a model that is producing unexpectedly low accuracy in production. The engineer examines the model's training data and finds that the distribution of the target variable in production is significantly different from the training set. What type of drift is the model experiencing?
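
A shift in the target variable like the one described can be made concrete by comparing class proportions between training labels and production labels. The sketch below is illustrative only: the label lists are made up, the helper names are hypothetical, and it assumes ground-truth labels for production traffic are available, which in practice often arrive with a delay.

```python
from collections import Counter


def class_proportions(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two class distributions."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Made-up labels: 10% positives in training, 40% positives in production.
train_labels = [0] * 90 + [1] * 10
prod_labels = [0] * 60 + [1] * 40

train_dist = class_proportions(train_labels)
prod_dist = class_proportions(prod_labels)
tvd = total_variation_distance(train_dist, prod_dist)
print(f"training: {train_dist}")
print(f"production: {prod_dist}")
print(f"total variation distance: {tvd:.2f}")
```

A distance of 0 means the two label distributions match, and 1 means they do not overlap at all. Any alerting threshold would be a project-specific choice.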

Question 4 (medium, multiple choice)

A SageMaker Processing job fails with the error: 'Unable to parse CSV file due to inconsistent number of columns'. The data is stored as CSV in S3. What is the most likely cause?
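
When a parser reports an inconsistent number of columns, a quick first check is to count the fields in each row and list the rows that differ from the header. The sketch below uses Python's standard `csv` module on made-up sample data. The helper `find_inconsistent_rows` is hypothetical, and downloading the object from S3 is not shown. One common cause it would surface is an unquoted delimiter inside a field.

```python
import csv
import io

# Made-up sample: row 3 contains an unquoted comma inside the city field,
# while row 4 quotes the same kind of value correctly.
SAMPLE_CSV = (
    "id,name,city\n"
    "1,Alice,Seattle\n"
    "2,Bob,Portland, OR\n"
    '3,"Carol","Austin, TX"\n'
)


def find_inconsistent_rows(text):
    """Return (expected_width, [(row_number, width), ...]) for mismatched rows.

    Row numbers are 1-based and count parsed rows, with the header as row 1.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return 0, []
    expected = len(rows[0])
    bad = [
        (row_number, len(row))
        for row_number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]
    return expected, bad


expected, bad = find_inconsistent_rows(SAMPLE_CSV)
print(f"expected {expected} columns")
for row_number, width in bad:
    print(f"row {row_number} has {width} columns")
```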

Question 5 (hard, multiple choice)

A team uses SageMaker Neo to compile a model for deployment on a target device. After compilation, they deploy the compiled model to a SageMaker endpoint using the Neo-optimized container. The endpoint fails to start with the error "RuntimeError: Unable to load model". What could be the issue?

Question 6 (medium, multiple choice)

A team used the configuration shown in the exhibit to create an endpoint. However, invocations of the endpoint fail with a "ModelError". What is the most likely cause?

Exhibit

Refer to the exhibit.

"ModelName": "my-model",
"PrimaryContainer": {
  "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:1.8.1-cpu-py36-ubuntu16.04",
  "ModelDataUrl": "s3://my-bucket/model.tar.gz"
},
"ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole"

And a CLI command:

aws sagemaker create-endpoint-config --endpoint-config-name my-config --production-variants VariantName=variant1

Question 7 (medium, multiple choice)

A company is training a deep learning model on Amazon SageMaker. The training job started but has been stuck in 'InProgress' state for an unusually long time with low CPU utilization. The data scientist suspects a bottleneck. What should be the first troubleshooting step?

Question 8 (easy, multiple choice)

A company uses Amazon SageMaker to deploy a real-time inference endpoint. They notice increased latency in predictions during peak hours. Which should they investigate first to address the issue?

Question 9 (medium, multiple choice)

Refer to the exhibit. A data engineer investigates why a SageMaker endpoint is returning errors. The endpoint configuration has been updated to point to a new model version. What is the MOST likely cause of the error?

Exhibit

[ERROR] 2024-03-15 10:23:45,123 - sagemaker - 1321 - root - ERROR - InvocationException: Received response status code 404 from container. Error: ResourceNotFoundException: Model 'my-model-v2' is not found. You may be using an outdated endpoint configuration.

Question 10 (medium, multiple choice)

A data scientist is training a binary classification model using Amazon SageMaker. The dataset has a severe class imbalance (95% negative, 5% positive). The model achieves 99% accuracy but fails to identify positive cases correctly. Which action should the data scientist take to improve the model's ability to detect positive cases?
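
The gap between accuracy and positive-class detection in a scenario like this can be seen with simple arithmetic. The sketch below assumes a hypothetical dataset of 1,000 samples with the 95% / 5% split from the question and a trivial classifier that always predicts the negative class. The counts and the helper `accuracy_and_recall` are illustrative, not part of any SageMaker API.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Compute accuracy and positive-class recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    positives = tp + fn
    recall = tp / positives if positives else 0.0
    return accuracy, recall


# Hypothetical 1,000-sample dataset: 950 negatives, 50 positives.
# A classifier that always predicts "negative" gets every negative right
# and misses every positive.
baseline_accuracy, baseline_recall = accuracy_and_recall(tp=0, fp=0, tn=950, fn=50)
print(f"accuracy: {baseline_accuracy:.2%}")
print(f"recall:   {baseline_recall:.2%}")
```

The trivial classifier reaches 95% accuracy while detecting none of the positive cases, which is why recall, precision, or F1 on the positive class is usually more informative than accuracy on imbalanced data.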

Question 11 (hard, multiple choice)

A company is deploying an ML model for real-time fraud detection using SageMaker. The model must process requests within 50 ms and scale to handle up to 10,000 requests per second during peak hours. The data includes PII, so all traffic must stay within a VPC. The team has configured the SageMaker endpoint with a VPC and an internet gateway for model downloads. During a load test, the endpoint fails to achieve the required throughput. Which change would most likely resolve the issue?

Question 12 (hard, multiple choice)

A team is using SageMaker Pipelines to train a model. The pipeline has multiple steps: data processing, training, evaluation, and registration. They use a Condition step to evaluate the model's accuracy and, if it exceeds a threshold, register the model. They run the pipeline and the training step succeeds, but the pipeline fails at the Condition step with the error "Unable to evaluate condition: the property 'Accuracy' does not exist." The evaluation step output is a JSON file with the key 'accuracy'. What is the most likely cause?
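
The mismatch between 'Accuracy' and 'accuracy' in this scenario comes down to JSON object keys being case-sensitive. The sketch below shows this with plain Python rather than the SageMaker Pipelines SDK. The metric value 0.93 and the helper `get_metric` are made up for illustration, and the error text only mimics the message in the question.

```python
import json

# Made-up evaluation report using the lowercase key from the scenario.
evaluation_report = json.loads('{"accuracy": 0.93}')


def get_metric(report, key):
    """Look up a metric by exact key; JSON keys are matched case-sensitively."""
    if key not in report:
        raise KeyError(f"the property '{key}' does not exist")
    return report[key]


print(get_metric(evaluation_report, "accuracy"))  # exact match succeeds

try:
    get_metric(evaluation_report, "Accuracy")  # differs only in case
except KeyError as err:
    print(f"lookup failed: {err}")
```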

Question 13 (hard, multiple choice)

A financial services company uses Amazon SageMaker to deploy a fraud detection model for real-time inference. The model is deployed on an ml.m5.large instance with a SageMaker real-time endpoint. The endpoint has an auto scaling policy configured using a custom scaling policy based on average CPU utilization, with scale out threshold at 70% and scale in threshold at 30%. During a flash sale event, the traffic to the endpoint spikes tenfold within minutes. The endpoint fails to handle the load, resulting in increased latency and timeouts. The data science team needs to improve the scalability of the endpoint to handle sudden traffic spikes. Which solution should the team implement?

These MLA-C01 practice questions are part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style MLA-C01 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.