Back to AWS Certified Machine Learning Specialty MLS-C01

Amazon Web Services exam questions

AWS Certified Machine Learning Specialty MLS-C01 practice test

Practise RAM questions covering identification, installation, speeds, dual-channel, and troubleshooting for the MLS-C01 exam.

1,755
practice questions
4
topics covered
MLS-C01
exam code
Amazon Web Services
vendor

Study modes

Three ways to study

Start with the Study Sheet to learn the material, switch to Practice Tests for active recall, then take a Mock Exam to simulate the real thing.

Study Sheet

All 1,755 questions with correct answers and explanations already visible. Read at your own pace — no time pressure.

Start reading →

Practice Test

Answer first, then see feedback and explanation. Tracks your score per session. Best for active recall and identifying weak areas.

Mock Exam

Full timed simulation with countdown. Answers hidden until the end. Includes all question types just like the real exam.

Start mock exam →

Study Sheet

All 1,755 MLS-C01 questions with answers

Every question in the bank, paginated 75 per page. Correct answers and full explanations are revealed upfront — ideal for first-pass learning and pre-exam review.

24 pages · 75 questions per page · 1,755 total

Domain practice

Study MLS-C01 by domain

Each domain has its own study sheet and practice test. Target the areas where you're weakest instead of repeating questions you already know.

All domains with question counts →

Related practice questions

Study MLS-C01 by topic

Topic pages go deep on individual concepts — each one covers a specific exam topic with questions, explanations, and study notes.

Courseiva uses original exam-style practice questions created for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps. Learn the difference →

Sample questions

AWS Certified Machine Learning Specialty MLS-C01 practice questions

Start practice test
Question 1mediummultiple choice
Read the full Data Engineering explanation →

A company is using Amazon Kinesis Data Streams to ingest real-time clickstream data. The data is consumed by a Lambda function that writes to an S3 bucket. Recently, the Lambda function started failing with 'ProvisionedThroughputExceededException' errors. What is the MOST likely cause?

A team is building a data pipeline to process terabytes of log data daily using Amazon EMR. The data arrives in 5-minute windows and must be available for querying within 30 minutes. The data is originally in gzip-compressed CSV files. Which approach will minimize processing time and cost?

Question 3mediummultiple choice
Read the full Data Engineering explanation →

A data science team is building a real-time fraud detection system. Transactions are streamed via Amazon Kinesis Data Streams, and a Lambda function performs feature engineering and invokes an Amazon SageMaker endpoint for predictions. The team notices that the Lambda function is timing out and causing data loss. Which solution should the team implement to process the stream reliably and at low latency?

A company uses Amazon SageMaker to train and deploy machine learning models. The training data is stored in Amazon S3 (Parquet format, 10 TB). The data scientists have been running training jobs using the File mode input, but the jobs are taking too long due to data download time. They want to reduce the training start-up time and overall training time. Which solution is MOST cost-effective and efficient?

Question 5easymultiple choice
Read the full NAT/PAT explanation →

A data engineer is building a data pipeline to process user clickstream data. The data arrives as JSON files in an S3 bucket. The pipeline must transform the JSON into Parquet format and partition by date and event type, then make the data available for Amazon Athena queries. The engineer needs a fully managed, serverless solution with minimal operational overhead. Which combination of AWS services should the engineer use?

Question 6mediummulti select
Read the full NAT/PAT explanation →

A data engineering team is designing a data lake on AWS for machine learning workloads. The data includes structured, semi-structured, and unstructured data. The team needs to ensure that the data is cataloged, easily discoverable, and can be queried by Amazon Athena and Amazon EMR. The team also wants to enforce fine-grained access control at the column and row level for sensitive data. Which combination of AWS services should the team use? (Select TWO.)

A data scientist needs to transform raw JSON data from an S3 bucket into Parquet format using AWS Glue. The job must be cost-effective and run only when new data arrives. Which solution should be used?

Question 8mediummultiple choice
Read the full Data Engineering explanation →

A company uses AWS Glue to catalog data in S3. Data is partitioned by year, month, day. The Glue crawler runs daily but sometimes misses new partitions. What should be done to ensure all partitions are cataloged?

A company needs to build a data lake on AWS for analytics. The data includes structured, semi-structured, and unstructured data. The solution must support schema-on-read, provide fine-grained access control, and be cost-effective for storing rarely accessed data. Which THREE services should be used? (Choose THREE)

A data engineer created an IAM policy to allow a Glue ETL job to read and write objects to an S3 bucket. The ETL job fails when writing data with the error 'Access Denied'. The job is configured to use SSE-S3 (AES256) encryption. What is the likely issue?

Exhibit

Refer to the exhibit.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-data-lake/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    }
  ]
}

A company runs a real-time fraud detection system using Amazon Kinesis Data Streams with 100 shards. Data is consumed by a custom Java application running on Amazon EC2 instances in an Auto Scaling group. The application processes records and writes results to a DynamoDB table. Over the past month, the application has experienced intermittent slowdowns and the DynamoDB write capacity has been fully utilized during peak hours. The team wants to improve throughput without losing the ability to reprocess failed records. The application currently uses the Kinesis Client Library (KCL) with DynamoDB as the lease table. The team is considering the following changes: A. Increase the number of EC2 instances to match the number of shards. B. Switch to using AWS Lambda as the consumer to handle scaling automatically. C. Increase the write capacity of the DynamoDB lease table to handle more workers. D. Use enhanced fan-out to have each consumer receive its own 2 MB/second shard throughput. Which change should the team implement first to address the issue?

A data scientist is performing EDA on a dataset with 1,000 features and 10,000 rows. The target variable is binary. After checking for multicollinearity, the scientist finds many pairs of features with correlation > 0.95. Which action should be taken to prepare the data for modeling?

Match each hyperparameter tuning strategy to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts
Matches

Exhaustive search over specified hyperparameter values

Random sampling of hyperparameter combinations

Probabilistic model to guide search

Early stopping and resource allocation

SageMaker automatic tuning

Match each AWS AI service to its capability.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts
Matches

Natural language processing

Language translation

Text-to-speech

Speech-to-text

Conversational chatbots

A data scientist is analyzing a dataset with missing values in 30% of the rows for the 'age' column. The data scientist decides to impute the missing values with the median of the observed 'age' values. What is a potential drawback of this approach?

Question 16hardmultiple choice
Read the full Modeling explanation →

An e-commerce company uses a linear regression model to predict customer lifetime value (LTV). The model shows high variance on the test set, with training RMSE much lower than test RMSE. Which of the following is the MOST effective approach to reduce overfitting?

Question 17hardmulti select
Read the full Modeling explanation →

A data scientist is training a deep learning model using Amazon SageMaker. The training loss is decreasing, but the validation loss starts increasing after 10 epochs. The model is overfitting. Which TWO actions should the data scientist take to reduce overfitting? (Choose 2.)

A data scientist is exploring a dataset with 500 features and 10,000 samples. The data scientist computes the pairwise correlation matrix and finds that many features have correlations above 0.9. The data scientist wants to reduce the dataset to 50 features while preserving as much variance as possible. Which technique should be used?

A data scientist is analyzing a dataset of customer reviews. The dataset contains a text column 'review' and a numerical rating from 1 to 5. The data scientist wants to create features for sentiment analysis. Which THREE preprocessing steps should be applied to the text data before feature extraction? (Choose THREE.)

During EDA, a data scientist notices that a feature has a high proportion of missing values (e.g., 70%). The feature is continuous and expected to be important based on domain knowledge. What is the best approach to handle this?

Question 21easymultiple choice
Read the full NAT/PAT explanation →

During EDA, a data scientist creates a scatter matrix of numerical features and notices that some features have a funnel-shaped pattern (variance increases with the mean). What is the appropriate transformation to stabilize variance?

Which TWO of the following are appropriate techniques for detecting outliers in a univariate continuous feature?

Which THREE of the following are best practices when performing exploratory data analysis on a dataset with both numerical and categorical features?

Question 24mediummultiple choice
Read the full NAT/PAT explanation →

A data scientist is performing exploratory data analysis on a dataset containing customer transactions. The dataset has 1 million rows with 50 features, including numerical and categorical variables. The goal is to identify patterns and potential data quality issues before building a model. Which approach should the data scientist take to efficiently explore the data?

Question Discussion

Share a tip, memory trick, or ask about the reasoning behind this question. Do not post real exam questions, leaked content, braindumps, or copyrighted exam material. Comments are moderated and may be removed without notice.

Loading comments…

Sign in to join the discussion.

Exam question guide

How to use these MLS-C01 questions

Use these questions as active recall, not passive reading. Try the question first, review the answer choices, then open the explanation and connect the result back to the exam topic.

Quick answer

RAM tests your ability to identify, install, and troubleshoot memory types, speeds, and configurations for PCs.

Identifying DDR3 vs DDR4 vs DDR5 physical and electrical differences

Matching RAM speed (MHz) to motherboard and CPU support

Calculating total memory capacity from module size and slots

Troubleshooting common RAM errors like beep codes and blue screens

These MLS-C01 practice questions are part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style MLS-C01 questions with detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics.