Amazon Web Services exam questions

AWS Certified Machine Learning Engineer Associate MLA-C01 practice test

Practise machine learning engineering questions covering data preparation, model training, deployment, and monitoring on AWS for the MLA-C01 exam.

Practice questions: 507
Topics covered: 4
Exam code: MLA-C01
Vendor: Amazon Web Services

Study modes

Three ways to study

Start with the Study Sheet to learn the material, switch to Practice Tests for active recall, then take a Mock Exam to simulate the real thing.

Study Sheet

All 507 questions with correct answers and explanations already visible. Read at your own pace — no time pressure.

Practice Test

Answer first, then see feedback and explanation. Tracks your score per session. Best for active recall and identifying weak areas.

Mock Exam

Full timed simulation with countdown. Answers hidden until the end. Includes all question types just like the real exam.

Study Sheet

All 507 MLA-C01 questions with answers

Every question in the bank, paginated 75 per page. Correct answers and full explanations are revealed upfront — ideal for first-pass learning and pre-exam review.

7 pages · 75 questions per page · 507 total

Domain practice

Study MLA-C01 by domain

Each domain has its own study sheet and practice test. Target the areas where you're weakest instead of repeating questions you already know.

Related practice questions

Study MLA-C01 by topic

Topic pages go deep on individual concepts — each one covers a specific exam topic with questions, explanations, and study notes.

Courseiva uses original exam-style practice questions created for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Sample questions

AWS Certified Machine Learning Engineer Associate MLA-C01 practice questions

A company is running a SageMaker endpoint serving multiple models. They need to monitor for data drift and model quality. Which THREE actions are necessary? (Choose three.)

A data scientist trained a logistic regression model on a dataset with 100 features. After training, the training accuracy is 0.99 but validation accuracy is 0.75. Which action is MOST likely to reduce overfitting?

A team is training a deep learning model on Amazon SageMaker using a custom Docker container. Which three practices should they follow to optimize training performance? (Choose three.)

A company is using SageMaker to train a neural network for image classification. The training job is taking too long. The team wants to reduce training time without sacrificing model accuracy. Which approach should the team take?

A team is developing a model to predict customer churn. The dataset has 10,000 samples with 20 features. The target variable is binary with 15% churn rate. The team wants to use logistic regression. Which data preprocessing step is MOST important to ensure proper convergence?

A data engineer is processing a large dataset in Amazon S3 with AWS Glue ETL. The dataset contains timestamps in multiple time zones. The engineer needs to create a feature for hour-of-day consistent across all records. Which approach ensures correctness?
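Study note: one way to think about this scenario is to normalise every timestamp to a single reference zone before extracting the hour. The sketch below is plain Python rather than Glue code, and it assumes each record carries an explicit UTC offset in ISO 8601 form; the function name and sample records are hypothetical.

```python
from datetime import datetime, timezone


def utc_hour_of_day(iso_timestamp: str) -> int:
    """Return the hour of day (0-23) in UTC for an offset-aware ISO 8601 timestamp."""
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        # Without an offset the source time zone is unknown, so refuse to guess.
        raise ValueError("timestamp has no UTC offset; its time zone is ambiguous")
    return parsed.astimezone(timezone.utc).hour


# Hypothetical records: three different local times that are the same instant.
records = [
    "2024-03-10T23:30:00-05:00",
    "2024-03-11T04:30:00+00:00",
    "2024-03-11T13:30:00+09:00",
]
hours = [utc_hour_of_day(ts) for ts in records]
```

Whether UTC or a business-local zone is the right reference depends on what the feature is meant to capture; the point of the sketch is only that the conversion happens before the hour is extracted.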

A dataset contains a numerical feature with extreme outliers. The outliers are genuine (not errors), and the ML model is a linear regression, which is sensitive to outliers. Which data transformation should be applied to reduce the impact of outliers while preserving the data?
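Study note: a log transform is one common way to compress a long right tail without dropping any rows. The sketch below assumes the feature is non-negative; the function name and sample values are hypothetical.

```python
import math


def log1p_transform(values):
    """Apply log(1 + x) to non-negative values, compressing large values the most."""
    if any(v < 0 for v in values):
        raise ValueError("log1p_transform expects non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical feature with one extreme but genuine value.
incomes = [0.0, 9.0, 99.0, 999_999.0]
transformed = log1p_transform(incomes)
```

The transform is monotonic, so the ordering of records is preserved while the gap between typical and extreme values shrinks.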

A data scientist is preparing a dataset for a binary classification model to predict customer churn. The dataset contains a timestamp column 'signup_date' that is not relevant for the prediction. What is the most appropriate action to handle this column?

An ML team wants to deploy a model that was trained using XGBoost in SageMaker. They want to use the built-in XGBoost algorithm container for inference. Which inference option requires the least custom code?

An ML engineer runs the CLI command shown in the exhibit. However, the training job fails immediately with an error: 'Unable to assume role'. What is the most likely cause?

Exhibit

Refer to the exhibit.

aws sagemaker create-training-job \
    --training-job-name my-training-job \
    --algorithm-specification 'TrainingImage=123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-training:latest,TrainingInputMode=File' \
    --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
    --input-data-config '[{"ChannelName":"train","DataSource":{"S3DataSource":{"S3Uri":"s3://my-bucket/train/","S3DataType":"S3Prefix"}},"ContentType":"text/csv"}]' \
    --output-data-config '{"S3OutputPath":"s3://my-bucket/output/"}' \
    --resource-config '{"InstanceType":"ml.m5.large","InstanceCount":1,"VolumeSizeInGB":30}' \
    --vpc-config '{"SecurityGroupIds":["sg-12345678"],"Subnets":["subnet-12345678"]}'
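Study note: with an 'Unable to assume role' error, one cause worth ruling out is the role's trust policy, which has to allow the SageMaker service principal to call sts:AssumeRole. The sketch below inspects a trust policy document that has already been loaded as a Python dict; the function name and both sample policies are hypothetical, and it does not call AWS.

```python
def service_can_assume_role(trust_policy, service="sagemaker.amazonaws.com"):
    """Return True if an Allow statement lets `service` call sts:AssumeRole."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        principal = statement.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        if service in services:
            return True
    return False


# Hypothetical trust policies for illustration only.
ec2_only = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"Service": "ec2.amazonaws.com"}, "Action": "sts:AssumeRole"}
    ],
}
sagemaker_trusted = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Principal": {"Service": ["sagemaker.amazonaws.com"]}, "Action": "sts:AssumeRole"}
    ],
}
```

This is a simplified check: it ignores conditions and wildcard principals, so treat it as a first pass rather than a full policy evaluation.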

Refer to the exhibit. A data scientist creates a SageMaker Pipeline definition using the JSON shown. The pipeline runs successfully, but the scientist notices that the training step did not use the parameter 'TrainingInstanceCount' defined in Parameters. Why did this happen?

Exhibit

{
  "PipelineExperimentConfig": {
    "ExperimentName": "my-experiment",
    "TrialName": "my-trial"
  },
  "Parameters": {
    "TrainingInstanceType": "ml.m5.large",
    "TrainingInstanceCount": 2,
    "MaxRuntimeInSeconds": 86400
  },
  "Steps": [
    {
      "Name": "Preprocess",
      "Type": "Processing",
      "ProcessingJobName": "preprocess-job",
      "ProcessingResources": {
        "ClusterConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m5.large",
          "VolumeSizeInGB": 30
        }
      }
    },
    {
      "Name": "Train",
      "Type": "Training",
      "TrainingJobName": "train-job",
      "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
        "TrainingInputMode": "File"
      },
      "ResourceConfig": {
        "InstanceCount": 2,
        "InstanceType": "ml.m5.large",
        "VolumeSizeInGB": 30
      }
    }
  ]
}
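Study note: declaring a parameter does not by itself connect it to a step; the step has to reference it. The sketch below assumes the reference form {"Get": "Parameters.<name>"} used in SageMaker pipeline definition JSON, and walks a list of step definitions to report declared parameters that nothing references. The helper names and both sample steps are hypothetical.

```python
def find_parameter_refs(node):
    """Recursively collect names referenced as {"Get": "Parameters.<name>"}."""
    refs = set()
    if isinstance(node, dict):
        target = node.get("Get")
        if isinstance(target, str) and target.startswith("Parameters."):
            refs.add(target[len("Parameters."):])
        for value in node.values():
            refs |= find_parameter_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= find_parameter_refs(item)
    return refs


def unused_parameters(declared_names, steps):
    """Return declared parameter names that no step references, sorted."""
    return sorted(set(declared_names) - find_parameter_refs(steps))


declared = ["TrainingInstanceType", "TrainingInstanceCount"]

# Hard-coded value, as in the exhibit: the parameter is never read.
hardcoded_step = {"Name": "Train", "ResourceConfig": {"InstanceCount": 2}}

# Hypothetical variant that reads the parameter instead.
referencing_step = {
    "Name": "Train",
    "ResourceConfig": {"InstanceCount": {"Get": "Parameters.TrainingInstanceCount"}},
}
```

Calling unused_parameters(declared, [hardcoded_step]) flags both parameters, while the referencing variant leaves only TrainingInstanceType unused.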

A machine learning team is preparing a dataset for a regression model. The dataset contains numerical features that are on different scales (e.g., age 0-100, income 0-1,000,000). The team plans to use Amazon SageMaker to train a linear regression model. Which THREE data preparation steps should the team take to ensure the model performs well? (Select THREE.)
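Study note: standardisation is one of the usual preparation steps when features sit on very different scales. The sketch below shows z-score scaling in plain Python, learning the statistics from the training split only so they can be reused on validation and test data; the function names and sample columns are hypothetical, and it covers only one of the steps the question asks about.

```python
from statistics import mean, pstdev


def fit_standard_scaler(train_column):
    """Learn mean and population standard deviation from the training split only."""
    return mean(train_column), pstdev(train_column)


def apply_standard_scaler(column, mu, sigma):
    """Rescale values to z-scores; a constant column maps to zeros."""
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical training columns on very different scales.
train_age = [20, 40, 60]
train_income = [10_000, 50_000, 900_000]

age_mu, age_sigma = fit_standard_scaler(train_age)
income_mu, income_sigma = fit_standard_scaler(train_income)

scaled_age = apply_standard_scaler(train_age, age_mu, age_sigma)
scaled_income = apply_standard_scaler(train_income, income_mu, income_sigma)
```

After scaling, both columns have mean 0 and standard deviation 1 on the training split, so neither dominates purely because of its units.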

A data scientist is working on a time series forecasting problem. The dataset contains a column 'sales' with occasional negative values due to returns. The model expects non-negative input. Which data preparation step should be taken?

Question 14 · hard · multi-select

A team is preparing text data for a natural language processing (NLP) model. They have a corpus of customer reviews. Which THREE preprocessing steps are essential to reduce noise and improve model performance?

A team is using Amazon SageMaker Processing for data preprocessing. They have a Parquet dataset in Amazon S3. Which configuration will provide the most efficient reading of the dataset during processing?

A machine learning engineer is preparing a dataset for a multiclass classification task. The dataset has 10 features and 100,000 rows. Which TWO techniques should the engineer use to reduce the risk of overfitting during data preparation?

A machine learning engineer is preparing a dataset that contains both numerical and categorical features. The categorical features have high cardinality (e.g., zip code with thousands of unique values). Which technique is most appropriate for encoding these high-cardinality categorical features?

A data scientist is building a text classification model using a pre-trained BERT model from the Hugging Face library on SageMaker. The scientist wants to fine-tune the model on a custom dataset. Which TWO steps are necessary to set up the fine-tuning job? (Select TWO.)

A machine learning engineer is deploying a custom PyTorch model to a SageMaker endpoint for real-time inference. The model requires GPU acceleration. The engineer wants to minimize latency and cost. Which THREE actions should the engineer take? (Select THREE.)

A data scientist is preparing a large dataset for training a binary classification model. The dataset has a severe class imbalance (95% negative, 5% positive). Which data preparation technique should the scientist use to address this imbalance without losing too much data?
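Study note: random oversampling is one technique that rebalances classes while keeping every original row. The sketch below duplicates minority-class rows until each class matches the majority count; it is a plain Python illustration with hypothetical names and data, and synthetic methods such as SMOTE or class weighting are alternatives it does not cover.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)  # keep all original rows
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical dataset with the 95% / 5% split from the question.
labels = [0] * 95 + [1] * 5
rows = list(range(100))
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling is normally applied to the training split only, after the split has been made, so that duplicated rows cannot leak into validation or test data.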

A company wants to use a pre-trained NLP model from SageMaker JumpStart for sentiment analysis. Which step is required to make predictions?

A data scientist is training a binary classification model using imbalanced data where the positive class is only 1% of the dataset. The scientist wants to maximize the recall for the positive class while maintaining reasonable precision. Which evaluation metric is most appropriate to tune during model selection?
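Study note: accuracy is uninformative at a 1% positive rate, so metrics built from precision and recall are the usual candidates here. The sketch below computes precision, recall, and F-beta from confusion-matrix counts; a beta above 1 weights recall more heavily than precision. The function name and the counts are hypothetical.

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical counts: most positives are found, at the cost of false alarms.
precision, recall, f1 = precision_recall_fbeta(tp=8, fp=12, fn=2, beta=1.0)
_, _, f2 = precision_recall_fbeta(tp=8, fp=12, fn=2, beta=2.0)
```

With these counts recall is higher than precision, so F2 comes out above F1, which is the behaviour you want from a metric that rewards recall.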

An ML engineer needs to split a dataset into training, validation, and test sets. The dataset has a time-based column that should not be leaked. Which split method is most appropriate?
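Study note: when time order matters, a chronological split keeps every training record earlier than every validation record, and every validation record earlier than every test record. The sketch below is a minimal plain Python version; the function name, the "day" field, and the fractions are hypothetical.

```python
def chronological_split(records, time_key, train_frac=0.7, val_frac=0.15):
    """Sort by time, then cut into train / validation / test without shuffling."""
    ordered = sorted(records, key=lambda record: record[time_key])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Hypothetical records arriving out of order.
records = [{"day": d, "value": d * 2} for d in (5, 1, 9, 3, 7, 2, 8, 4, 10, 6)]
train, val, test = chronological_split(records, "day", train_frac=0.6, val_frac=0.2)
```

A random shuffle before splitting would mix future records into the training set, which is the leakage the question describes.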

A team is building a regression model on a dataset with missing values in multiple features. They decide to use a k-Nearest Neighbors (k-NN) imputer. The dataset has 100,000 rows and 50 features. Which step should the team take to ensure the imputation is efficient and accurate?

Exam question guide

How to use these MLA-C01 questions

Use these questions as active recall, not passive reading. Try the question first, review the answer choices, then open the explanation and connect the result back to the exam topic.

Quick answer

MLA-C01 tests your ability to prepare data, train and evaluate models, and deploy and monitor ML workloads on AWS, mainly with Amazon SageMaker.

Preparing data: scaling, encoding, imputing missing values, and handling class imbalance

Choosing evaluation metrics and reducing overfitting during model training

Configuring SageMaker training jobs, pipelines, and inference endpoints

Monitoring deployed endpoints for data drift and model quality

These MLA-C01 practice questions are part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style MLA-C01 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.