Question 974 of 1,755

Modeling · hard · Multiple Choice · Objective-mapped

Quick Answer

The answer is one ml.c5.xlarge instance with auto-scaling up to 2 instances. The ml.c5.xlarge offers 4 vCPUs and 8 GB of memory, comfortably exceeding the 4 GB the large ensemble model requires, and its compute-optimized design helps keep real-time inference latency under 200 ms. Auto-scaling to a maximum of two instances lets the endpoint handle 100 concurrent requests at peak without over-provisioning, and scaling in during low-traffic periods minimizes cost. On the AWS Certified Machine Learning Specialty MLS-C01 exam, this scenario tests your ability to balance cost and latency when choosing SageMaker instances for real-time inference. A common trap is selecting a larger single instance (like ml.c5.2xlarge) that wastes resources, or a memory-optimized family (like ml.r5) that isn't needed for compute-bound models. Remember the mnemonic “C for Compute, Scale for Savings”: the c5 family handles compute-heavy inference, and auto-scaling trims costs.

MLS-C01 Modeling Practice Question

This MLS-C01 practice question tests your understanding of modeling. This is an instance-selection task: choose the configuration that satisfies every stated requirement. Small differences, such as the instance family or the memory available per instance, change whether the answer is correct. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to reinforce the concept and understand why each distractor is designed to mislead on exam day.

A machine learning engineer is deploying a model to an Amazon SageMaker endpoint for real-time inference. The model is a large ensemble that requires 4 GB of memory. The engineer wants to minimize cost while ensuring the endpoint can handle up to 100 concurrent requests with a latency under 200 ms. Which instance configuration is most appropriate?

Clue words in this question

Noticing these words before you look at the options changes how you read each choice.

Clue: "minimum / minimize"
Why it matters: Asks for the least resource use: the smallest, cheapest configuration that still meets every stated requirement. Eliminate over-provisioned options even if they would technically work.

A
Two ml.t3.medium instances behind a load balancer.
Why wrong: ml.t3.medium has only 4 GB of memory per instance, which leaves no headroom for a 4 GB model, and its burstable CPU is not optimized for sustained inference load.
B
One ml.c5.xlarge instance with auto-scaling up to 2 instances.
Why correct: ml.c5.xlarge has 4 vCPUs and 8 GB of memory, is cost-effective, and auto-scaling handles the load.
C
One ml.m5.2xlarge instance.
Why wrong: ml.m5.2xlarge has 32 GB of memory, far more than needed, and is more expensive.
D
One ml.p3.2xlarge instance.
Why wrong: GPU instance is overkill and costly.

Correct answer & explanation

✓

One ml.c5.xlarge instance with auto-scaling up to 2 instances.

Option B is correct because the ml.c5.xlarge instance provides sufficient compute (4 vCPUs, 8 GB memory) for the 4 GB model, and auto-scaling up to 2 instances allows handling 100 concurrent requests with low latency while minimizing cost during low traffic. The ml.c5 family is optimized for compute-intensive inference, and auto-scaling ensures the endpoint scales out only when needed, avoiding over-provisioning.
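
The memory side of this reasoning can be checked with a short sketch. The vCPU and memory figures are the published specs for the four instance types in the options. The 25% headroom for the container and serving runtime is an assumed value chosen for illustration, not a figure from the question, and using memory size as a stand-in for cost is a simplification.

```python
# Published specs for the four answer options: (vCPUs, memory in GiB).
INSTANCE_SPECS = {
    "ml.t3.medium": (2, 4),
    "ml.c5.xlarge": (4, 8),
    "ml.m5.2xlarge": (8, 32),
    "ml.p3.2xlarge": (8, 61),
}

MODEL_MEMORY_GB = 4.0
# Assumption: reserve 25% on top of the model for the container and runtime.
HEADROOM_FRACTION = 0.25


def fits_in_memory(instance_type: str) -> bool:
    """Return True if the model plus the assumed headroom fits on the instance."""
    _, memory_gb = INSTANCE_SPECS[instance_type]
    required_gb = MODEL_MEMORY_GB * (1 + HEADROOM_FRACTION)
    return memory_gb >= required_gb


def smallest_fitting_instance() -> str:
    """Pick the fitting instance with the least memory (a rough proxy for cost)."""
    candidates = [name for name in INSTANCE_SPECS if fits_in_memory(name)]
    return min(candidates, key=lambda name: INSTANCE_SPECS[name][1])


if __name__ == "__main__":
    for name in INSTANCE_SPECS:
        print(name, "fits" if fits_in_memory(name) else "too small")
    print("smallest fit:", smallest_fitting_instance())
```

Under these assumptions the ml.t3.medium is ruled out on memory alone, and the ml.c5.xlarge is the smallest option that still leaves room for the model.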

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

✗
Two ml.t3.medium instances behind a load balancer.
Why it's wrong here
ml.t3.medium has only 4 GB of memory per instance, which leaves no headroom for a 4 GB model, and its burstable CPU is not optimized for sustained inference load.
✓
One ml.c5.xlarge instance with auto-scaling up to 2 instances.
Why this is correct
ml.c5.xlarge has 4 vCPUs and 8 GB of memory, is cost-effective, and auto-scaling handles the load.
Clue confirmation
The clue word "minimum / minimize" in the question points toward this answer.
Related concept
Read the scenario before looking for a memorised answer.
✗
One ml.m5.2xlarge instance.
Why it's wrong here
ml.m5.2xlarge has 32 GB of memory, far more than needed, and is more expensive.
✗
One ml.p3.2xlarge instance.
Why it's wrong here
GPU instance is overkill and costly.

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is that candidates often choose a single large instance (like ml.m5.2xlarge) thinking it simplifies management, but auto-scaling with a smaller instance type is more cost-effective and still meets latency requirements under variable load.

Detailed technical explanation

How to think about this question

SageMaker endpoint auto-scaling typically uses a target tracking scaling policy on the predefined metric 'SageMakerVariantInvocationsPerInstance' (surfaced in CloudWatch as 'InvocationsPerInstance'), which adjusts the instance count to hold the average number of invocations per instance near a target value. The ml.c5.xlarge uses Intel Xeon Scalable processors with AVX-512 instructions, which can accelerate ensemble model inference through vectorized operations. This explanation assumes that a single ml.c5.xlarge can handle roughly 50 concurrent requests with sub-200 ms latency for a 4 GB model, so scaling to 2 instances covers the 100-request peak. Actual throughput depends on the model and should be confirmed with load testing.
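
As a rough sketch of how this scaling setup might be expressed, the Python below derives the maximum instance count from the figures quoted above and builds the two request payloads that Application Auto Scaling expects for a SageMaker endpoint variant. The endpoint name, variant name, policy name, and target value are hypothetical placeholders. The target value in particular is an assumption: the predefined metric counts invocations per minute per instance, not concurrent requests, so a real value would come from load testing.

```python
import math

PEAK_CONCURRENT_REQUESTS = 100
# Per-instance figure quoted in the explanation; measure it for a real model.
REQUESTS_PER_INSTANCE = 50


def max_instances(peak: int, per_instance: int) -> int:
    """Smallest instance count whose combined capacity covers the peak."""
    return math.ceil(peak / per_instance)


def scalable_target(endpoint_name: str, variant_name: str, max_capacity: int) -> dict:
    """Keyword arguments for registering the variant as a scalable target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 1,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy(target: dict, target_value: float) -> dict:
    """Keyword arguments for a target tracking policy on the same variant."""
    return {
        "PolicyName": "invocations-target-tracking",  # hypothetical name
        "ServiceNamespace": target["ServiceNamespace"],
        "ResourceId": target["ResourceId"],
        "ScalableDimension": target["ScalableDimension"],
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Assumed placeholder: invocations per minute per instance.
            "TargetValue": target_value,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }


if __name__ == "__main__":
    cap = max_instances(PEAK_CONCURRENT_REQUESTS, REQUESTS_PER_INSTANCE)
    target = scalable_target("demo-endpoint", "AllTraffic", cap)  # hypothetical names
    policy = target_tracking_policy(target, target_value=1000.0)
    print(target)
    print(policy)
```

With AWS credentials configured, these dictionaries could be passed as keyword arguments to the boto3 `application-autoscaling` client's `register_scalable_target` and `put_scaling_policy` calls. The sketch stops at building the payloads so that it runs without an AWS account.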

Key Concepts to Remember

Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.

Exam Day Tips

Watch for words such as best, first, most likely and least administrative effort.
Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

A startup's cloud architect reviews their monthly bill and notices costs are higher than expected for a long-running batch job. Switching from on-demand instances to Reserved Instances — or using Spot/Preemptible VMs — can reduce compute costs by up to 72 %. Questions like this test whether you understand the tradeoffs between commitment, flexibility, and cost across cloud pricing models.

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Same concept, more angles

2 more ways this is tested on MLS-C01

These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.

Variation 1. A machine learning engineer is using Amazon SageMaker to deploy a model for real-time inference. The model is a large ensemble that requires 4 GB of memory and has a latency requirement of 100 ms. Which instance type and deployment configuration should the engineer choose to optimize cost while meeting requirements?

hard

✓ A. ml.m5.large (2 vCPU, 8 GB memory)
B. SageMaker Serverless Inference
C. ml.c5.large (2 vCPU, 4 GB memory)
D. ml.p3.2xlarge (8 vCPU, 61 GB memory, 1 GPU)

Why A: ml.m5.large provides 8 GB memory and is cost-effective for real-time inference with moderate latency requirements. Option B is wrong because Serverless Inference has cold start latency that may exceed 100 ms. Option C is wrong because ml.c5.large has only 4 GB memory, insufficient for a 4 GB model plus overhead. Option D is wrong because ml.p3.2xlarge is GPU-accelerated and expensive, overkill for this model.

Variation 2. A machine learning engineer is using Amazon SageMaker to deploy a model for real-time inference. The model must respond within 100 milliseconds. The initial deployment uses a single ml.m5.large instance, but latency is too high. Which change should the engineer make to reduce latency?

easy

✓ A. Switch to a compute-optimized instance like ml.c5.2xlarge.
B. Use batch transform instead of real-time endpoint.
C. Deploy to a single ml.t2.medium instance to reduce cost.
D. Deploy the model on a multi-model endpoint.

Why A: Option A is correct because a more powerful, compute-optimized instance reduces inference time. Option B is wrong because batch transform is for offline predictions, not real-time responses. Option C is wrong because moving to a smaller instance reduces resources and would increase latency. Option D is wrong because a multi-model endpoint can lead to resource contention.
