Modeling · Easy · Multiple choice · Objective-mapped

Quick Answer

The answer is SageMaker automatic scaling, the feature that scales SageMaker endpoints for real-time inference. It is powered by Application Auto Scaling: you register the endpoint's production variant as a scalable target and attach a scaling policy based on a metric such as invocations per instance, CPU utilization, or model latency. The service then adjusts the number of instances to follow variable traffic while keeping latency low. On the AWS Certified Machine Learning Specialty (MLS-C01) exam, this concept tests whether you know the managed scaling path for endpoints as opposed to manual instance management. Common traps are Amazon EC2 Auto Scaling, which manages EC2 Auto Scaling groups rather than SageMaker endpoints, and features that solve different problems: multi-model endpoints host many models behind one endpoint to reduce cost, and Elastic Inference attaches low-cost GPU acceleration to an instance. Remember that automatic scaling is about dynamic capacity, not hardware acceleration. A useful memory tip is "traffic-responsive instance count": when traffic spikes, instances scale out; when it drops, they scale in, so you pay for the capacity you need without sacrificing performance.

MLS-C01 Modeling Practice Question

This MLS-C01 practice question tests your understanding of the Modeling domain. Read the scenario carefully and evaluate each option against the stated constraints before committing to an answer. Then read the full explanation and the wrong-answer breakdown below to reinforce the concept and see why each distractor is designed to mislead on exam day.

A machine learning engineer is deploying a model to Amazon SageMaker for real-time inference. The model requires low latency and must handle variable traffic patterns. Which SageMaker feature should the engineer use to automatically scale the number of instances based on demand?

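Answer choices

  • SageMaker automatic scaling
  • Amazon EC2 Auto Scaling
  • Elastic Inference
  • SageMaker Batch Transform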

Correct answer & explanation

SageMaker automatic scaling

SageMaker automatic scaling is the correct feature. It uses Application Auto Scaling to let the engineer define scaling policies (for example, a target value for invocations per instance or CPU utilization) that automatically adjust the number of instances behind a SageMaker endpoint as real-time traffic changes. This keeps latency low by adding capacity during spikes and reduces cost by removing it during lulls, without manual intervention.
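The scaling setup described above can be sketched with the request shapes used by boto3's Application Auto Scaling client (`register_scalable_target` and `put_scaling_policy`). This is a minimal sketch, assuming a hypothetical endpoint named `my-endpoint` with a production variant `AllTraffic`, a 2 to 20 instance range, and a target of 100 invocations per instance per minute; none of these values come from the question. The helper functions (`resource_id`, `scalable_target_request`, `invocations_policy_request`, `apply_scaling`) are illustrative names, and they only build request dictionaries so the logic can be inspected without AWS credentials.

```python
"""Minimal sketch: SageMaker endpoint auto scaling via Application Auto Scaling.

Hypothetical values: endpoint "my-endpoint", variant "AllTraffic",
2-20 instances, target of 100 invocations per instance per minute.
"""
from typing import Any, Dict

SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def resource_id(endpoint: str, variant: str) -> str:
    # Application Auto Scaling identifies a SageMaker production variant this way.
    return f"endpoint/{endpoint}/variant/{variant}"


def scalable_target_request(endpoint: str, variant: str,
                            min_capacity: int, max_capacity: int) -> Dict[str, Any]:
    # MinCapacity/MaxCapacity bound the instance count (and therefore cost).
    # This sketch requires at least one instance.
    if not 1 <= min_capacity <= max_capacity:
        raise ValueError("need 1 <= min_capacity <= max_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(endpoint, variant),
        "ScalableDimension": SCALABLE_DIMENSION,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def invocations_policy_request(endpoint: str, variant: str,
                               target_per_instance: float) -> Dict[str, Any]:
    # Target tracking on the predefined invocations-per-instance metric.
    return {
        "PolicyName": f"{endpoint}-{variant}-invocations",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id(endpoint, variant),
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }


def apply_scaling(client: Any, target_req: Dict[str, Any],
                  policy_req: Dict[str, Any]) -> None:
    # `client` is expected to be boto3.client("application-autoscaling").
    # The scalable target is registered before the policy is attached to it.
    client.register_scalable_target(**target_req)
    client.put_scaling_policy(**policy_req)


if __name__ == "__main__":
    target = scalable_target_request("my-endpoint", "AllTraffic", 2, 20)
    policy = invocations_policy_request("my-endpoint", "AllTraffic", 100.0)
    print(target["ResourceId"])  # endpoint/my-endpoint/variant/AllTraffic
    # With AWS credentials configured:
    # import boto3
    # apply_scaling(boto3.client("application-autoscaling"), target, policy)
```

Keeping request construction separate from the API calls is a design choice for this sketch: the dictionaries can be reviewed or unit-tested, and the same `apply_scaling` call works with any client object exposing those two methods.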

Key principle: answer the scenario, not the keyword. Identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • SageMaker automatic scaling

    Why this is correct

    SageMaker integrates with Application Auto Scaling to adjust the instance count of an endpoint's production variant based on demand.

    Related concept

    Read the scenario before looking for a memorised answer.

  • Amazon EC2 Auto Scaling

    Why it's wrong here

    Amazon EC2 Auto Scaling manages EC2 Auto Scaling groups in your account. SageMaker endpoint instances are managed by SageMaker, so they are scaled through Application Auto Scaling instead.

  • Elastic Inference

    Why it's wrong here

    Elastic Inference attaches low-cost GPU acceleration to an instance to speed up inference, but it does not change the number of instances behind an endpoint.

  • SageMaker Batch Transform

    Why it's wrong here

    Batch Transform runs offline inference jobs over stored datasets; it does not serve real-time requests and does not scale an endpoint with traffic.

Common exam traps

Common exam trap: answer the scenario, not the keyword

The trap here is that candidates confuse Amazon EC2 Auto Scaling (which scales EC2 instances in an Auto Scaling group) with SageMaker automatic scaling (which scales SageMaker endpoint instances through Application Auto Scaling), leading them to pick Amazon EC2 Auto Scaling even though it does not apply to SageMaker endpoints.

Detailed technical explanation

How to think about this question

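Under the hood, SageMaker automatic scaling uses AWS Application Auto Scaling. You register the endpoint variant as a scalable target with a MinCapacity and MaxCapacity to bound cost, then attach a target tracking policy on a metric such as the predefined SageMakerVariantInvocationsPerInstance (average invocations per instance per minute) or a custom CloudWatch metric such as CPUUtilization. Scale-in and scale-out cooldown periods (300 seconds by default) prevent rapid oscillation. In a real-world scenario, a model serving a mobile app with unpredictable user spikes (for example, on Black Friday) could use this to grow from 2 to 20 instances as traffic builds and shrink back afterwards. How quickly new instances absorb load depends on instance provisioning and model loading time, so a latency goal such as sub-100 ms still needs to be validated with load testing.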
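The custom-metric case mentioned above can be sketched as a target tracking policy request on the `CPUUtilization` instance metric, which SageMaker publishes in the `/aws/sagemaker/Endpoints` CloudWatch namespace with `EndpointName` and `VariantName` dimensions. This is a minimal sketch, assuming a hypothetical endpoint `my-endpoint`, variant `AllTraffic`, and a 70 percent CPU target chosen only for illustration; `cpu_policy_request` is an illustrative helper name. The scalable target must already be registered before this policy is submitted. Also note that SageMaker reports CPUUtilization as the sum across cores, so on a 4-vCPU instance it can reach 400 percent; choose the target with the instance's vCPU count in mind.

```python
"""Minimal sketch: target tracking on a SageMaker endpoint's CPUUtilization metric.

Hypothetical values: endpoint "my-endpoint", variant "AllTraffic", 70% CPU target.
"""
from typing import Any, Dict


def cpu_policy_request(endpoint: str, variant: str, target_cpu: float,
                       scale_in_cooldown: int = 300,
                       scale_out_cooldown: int = 300) -> Dict[str, Any]:
    # Cooldowns are in seconds; 300 mirrors the default cited in the text.
    return {
        "PolicyName": f"{endpoint}-{variant}-cpu",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            # CPUUtilization is not a predefined SageMaker metric type, so it is
            # referenced as a customized CloudWatch metric scoped to this variant.
            "CustomizedMetricSpecification": {
                "MetricName": "CPUUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint},
                    {"Name": "VariantName", "Value": variant},
                ],
                "Statistic": "Average",
                "Unit": "Percent",
            },
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


if __name__ == "__main__":
    # Illustrative choice: add capacity quickly, remove it cautiously.
    req = cpu_policy_request("my-endpoint", "AllTraffic", 70.0,
                             scale_in_cooldown=600, scale_out_cooldown=60)
    print(req["TargetTrackingScalingPolicyConfiguration"]["ScaleOutCooldown"])  # 60
    # With AWS credentials and a registered scalable target:
    # import boto3
    # boto3.client("application-autoscaling").put_scaling_policy(**req)
```

A shorter scale-out cooldown than scale-in cooldown is one common way to favour latency during spikes while avoiding premature scale-in; the right values depend on how long new instances take to become ready.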

Key Concepts to Remember

  • Read the scenario before looking for a memorised answer.
  • Find the constraint that changes the correct option.
  • Eliminate answers that are true in general but not in this case.

Exam Day Tips

  • Watch for words such as best, first, most likely and least administrative effort.
  • Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

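Match the requirement (automatically scaling real-time endpoint instances) to the service that owns it (Application Auto Scaling for SageMaker endpoints), rather than to the most familiar-sounding auto-scaling product name.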

Real-world example

How this comes up in practice

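A team serves a model behind a SageMaker real-time endpoint for a mobile app. Traffic is light overnight and spikes sharply during promotions, so a fleet sized for the peak wastes money most of the day, while a fleet sized for the average misses latency targets during spikes. Registering the endpoint variant with Application Auto Scaling and attaching a target tracking policy lets the instance count follow demand between the configured minimum and maximum. Questions like this test whether you can map a requirement (automatic, demand-driven capacity for real-time inference) to the service that provides it.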

What to study next

Got this wrong? Here's your next step.

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

FAQ

Questions learners often ask

What does this MLS-C01 question test?

It tests the Modeling domain: choosing the SageMaker feature that automatically scales a real-time endpoint's instance count with demand.

What is the correct answer to this question?

The correct answer is SageMaker automatic scaling. It uses Application Auto Scaling policies (for example, on invocations per instance or CPU utilization) to adjust the number of instances behind a SageMaker endpoint as traffic changes, keeping latency low during spikes and reducing cost during lulls without manual intervention.

What should I do if I get this MLS-C01 question wrong?

Identify which exam domain this question belongs to, review the core concept, then practise similar questions from the same domain.

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Same concept, more angles

One more way this is tested on MLS-C01

These questions test the same concept from different angles. Work through them to make sure you can recognise it however the exam phrases it.

Variation 1. A machine learning engineer is deploying a model using Amazon SageMaker and wants to automatically scale the endpoint based on the number of incoming requests. Which scaling policy should be used?

easy
  • A. Step scaling
  • B. Scheduled scaling
  • C. Target tracking scaling
  • D. Simple scaling

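Why C: SageMaker endpoints scale through Application Auto Scaling, which supports target tracking policies on a metric such as SageMakerVariantInvocationsPerInstance (based on the InvocationsPerInstance metric). You set a target value and the service adds or removes instances to stay near it. Option A: Step scaling is also supported but is more complex, because you must define the CloudWatch alarms and step adjustments yourself. Option B: Scheduled scaling suits predictable traffic patterns, not scaling on incoming request volume. Option C: Target tracking scaling is correct. Option D: Simple scaling is an Amazon EC2 Auto Scaling policy type that Application Auto Scaling does not offer for SageMaker endpoints.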
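The intuition behind target tracking can be shown with a deliberately simplified calculation: if each instance is receiving more requests than the target value, proportionally more instances are needed, and the result is clamped to the scalable target's minimum and maximum. This is a minimal sketch, not AWS's exact algorithm (the real service works from CloudWatch alarms, applies cooldowns, and scales in more cautiously); `desired_capacity` is a hypothetical helper, and the numbers are illustrative invocations-per-instance values with a target of 100 and a 2 to 20 instance range.

```python
"""Simplified model of target tracking arithmetic (illustration only)."""
import math


def desired_capacity(current: int, metric_per_instance: float,
                     target_per_instance: float,
                     min_cap: int, max_cap: int) -> int:
    if current < 1 or target_per_instance <= 0:
        raise ValueError("current must be >= 1 and target must be positive")
    # Total load is current * metric; divide by the per-instance target and
    # round up to get the instance count that brings the metric back to target.
    needed = math.ceil(current * metric_per_instance / target_per_instance)
    # Clamp to the scalable target's MinCapacity/MaxCapacity.
    return max(min_cap, min(max_cap, needed))


if __name__ == "__main__":
    print(desired_capacity(2, 400, 100, 2, 20))   # spike: 8
    print(desired_capacity(8, 25, 100, 2, 20))    # lull: 2
    print(desired_capacity(10, 300, 100, 2, 20))  # capped at max: 20
```

The clamp is why MinCapacity and MaxCapacity matter on the exam: they keep a baseline for latency and put a ceiling on cost regardless of what the metric does.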

Last reviewed: Jun 24, 2026

This MLS-C01 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the MLS-C01 exam.