MLA-C01 Deployment and Orchestration of ML Workflows Practice Question

This MLA-C01 practice question tests your understanding of deployment and orchestration of ML workflows. The scenario asks you to isolate a root cause, so eliminate options that address a different problem before choosing. After answering, compare your reasoning against the explanation and wrong-answer breakdown below to see why each distractor is designed to mislead on exam day.

A financial services company uses Amazon SageMaker to deploy a fraud detection model for real-time inference. The model is deployed to a SageMaker real-time endpoint on an ml.m5.large instance. The endpoint uses a custom auto scaling policy based on average CPU utilization, with a scale-out threshold of 70% and a scale-in threshold of 30%. During a flash sale event, traffic to the endpoint spikes tenfold within minutes. The endpoint fails to handle the load, resulting in increased latency and timeouts. The data science team needs to improve the scalability of the endpoint to handle sudden traffic spikes. Which solution should the team implement?

Question 1 · hard · multiple choice

A
Implement a SageMaker Model Ensemble with two additional models to balance the load.
Why wrong: Adding models increases the computational load per request, worsening the latency issue.
B
Replace the custom scaling policy with a target tracking scaling policy based on the number of invocations per instance, with a target value of 1000.
Why correct: Target tracking on request count provides faster reaction to traffic spikes because it directly measures the traffic, whereas CPU utilization is a lagging indicator.
C
Implement a SageMaker Inference Pipeline with a pre-processing step to reduce model input size.
Why wrong: An inference pipeline adds extra pre-processing time, increasing latency, and does not improve scaling responsiveness.
D
Switch to a GPU instance type, such as ml.p3.2xlarge, to increase compute capacity.
Why wrong: While GPU instances may process requests faster, the scaling policy still lags, so the endpoint will still fail to scale in time.

Correct answer & explanation

✓

Replace the custom scaling policy with a target tracking scaling policy based on the number of invocations per instance, with a target value of 1000.

Option B is correct because a target tracking scaling policy based on invocations per instance responds faster to traffic spikes than CPU-based scaling, which lags behind the load. Option A is incorrect because a model ensemble increases the compute load per request. Option C is incorrect because an inference pipeline adds latency rather than reducing it. Option D is incorrect because GPU instances do not address the scaling policy lag.
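As a rough illustration of what the correct option could look like in practice, the sketch below builds a target tracking configuration on the predefined `SageMakerVariantInvocationsPerInstance` metric and registers it through Application Auto Scaling with boto3. The endpoint name, variant name, capacity limits, cooldown values and policy name are hypothetical placeholders, not values from the question; only the target value of 1000 comes from the scenario.

```python
def build_target_tracking_policy(target_invocations_per_instance=1000.0,
                                 scale_out_cooldown_s=60,
                                 scale_in_cooldown_s=300):
    """Build a TargetTrackingScalingPolicyConfiguration dictionary.

    The cooldown defaults are illustrative assumptions, not recommendations.
    """
    return {
        "TargetValue": float(target_invocations_per_instance),
        "PredefinedMetricSpecification": {
            # Average invocations per minute per instance for the variant.
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleOutCooldown": scale_out_cooldown_s,
        "ScaleInCooldown": scale_in_cooldown_s,
    }


def apply_policy(endpoint_name, variant_name, min_instances, max_instances,
                 policy_config, policy_name="invocations-target-tracking"):
    """Register the variant as a scalable target and attach the policy.

    Requires boto3 and AWS credentials; all names passed in are hypothetical.
    """
    import boto3  # imported here so the config builder runs without boto3

    client = boto3.client("application-autoscaling")
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    client.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_instances,
        MaxCapacity=max_instances,
    )
    client.put_scaling_policy(
        PolicyName=policy_name,
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=policy_config,
    )


policy = build_target_tracking_policy()
print(policy)
```

Calling `apply_policy("fraud-endpoint", "AllTraffic", 1, 20, policy)` would attach the policy to a hypothetical endpoint; the call is left out of the block so that it runs without AWS access.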

Key principle: Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

✗
Implement a SageMaker Model Ensemble with two additional models to balance the load.
Why it's wrong here
Adding models increases the computational load per request, worsening the latency issue.
✓
Replace the custom scaling policy with a target tracking scaling policy based on the number of invocations per instance, with a target value of 1000.
Why this is correct
Target tracking on request count provides faster reaction to traffic spikes because it directly measures the traffic, whereas CPU utilization is a lagging indicator.
Related concept
Read the scenario before looking for a memorised answer.
✗
Implement a SageMaker Inference Pipeline with a pre-processing step to reduce model input size.
Why it's wrong here
An inference pipeline adds extra pre-processing time, increasing latency, and does not improve scaling responsiveness.
✗
Switch to a GPU instance type, such as ml.p3.2xlarge, to increase compute capacity.
Why it's wrong here
While GPU instances may process requests faster, the scaling policy still lags, so the endpoint will still fail to scale in time.
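The point that request count measures traffic directly can be made concrete with a little arithmetic. The sketch below is a simplified model, not the actual Application Auto Scaling algorithm (which also involves CloudWatch alarms and cooldowns): it estimates how many instances keep the per-instance invocation rate at or below the target. The baseline of 2,000 invocations per minute and the 20-instance ceiling are hypothetical; the tenfold spike and the target of 1000 come from the scenario.

```python
import math


def instances_needed(invocations_per_minute, target_per_instance,
                     min_instances=1, max_instances=20):
    """Estimate the instance count that keeps per-instance load at the target.

    Simplified model: total traffic divided by the per-instance target,
    rounded up and clamped to the configured capacity limits.
    """
    raw = math.ceil(invocations_per_minute / target_per_instance)
    return max(min_instances, min(max_instances, raw))


baseline_rpm = 2000            # hypothetical normal traffic
spike_rpm = baseline_rpm * 10  # tenfold flash-sale spike from the scenario

print(instances_needed(baseline_rpm, 1000))  # 2
print(instances_needed(spike_rpm, 1000))     # 20
```

Because the metric is proportional to traffic, a tenfold spike translates immediately into a tenfold capacity target, whereas CPU utilization only rises after requests have already queued on the existing instance.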

Common exam traps

Common exam trap: answer the scenario, not the keyword

Many certification questions include familiar terms but test a specific constraint. Read the exact wording before choosing an answer that is generally true but wrong for this case.

Detailed technical explanation

How to think about this question

This question should be treated as a scenario, not a definition check. Identify the problem, the constraint and the best action. Then compare each option against those facts.

Key Concepts to Remember

Read the scenario before looking for a memorised answer.
Find the constraint that changes the correct option.
Eliminate answers that are true in general but not in this case.
Use explanations to understand the rule behind the answer.

Exam Day Tips

Underline the problem statement mentally.
Watch for words such as best, first, most likely and least administrative effort.
Review why wrong options are wrong, not only why the correct option is correct.

Key takeaway

Answer the scenario, not the keyword: identify the specific constraint before choosing the most familiar-sounding option.

Real-world example

How this comes up in practice

An e-commerce site experiences heavy traffic on Black Friday and near-zero traffic during off-peak weeks. Rather than provisioning permanently large instances, the team uses auto scaling to add capacity automatically under load and reduce it when traffic drops. Questions like this test whether you understand elasticity and how the choice of scaling metric affects how quickly capacity follows demand.
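In practice, a team adopting invocation-based scaling also has to pick the target value. AWS's load-testing guidance for SageMaker endpoints describes deriving it from the peak requests per second a single instance can sustain, multiplied by a safety factor and converted to a per-minute figure. The sketch below follows that shape; the measured peak of 40 requests per second is a hypothetical load-test result, and the 0.5 safety factor is an assumed default.

```python
def target_invocations_per_minute(max_rps, safety_factor=0.5):
    """Convert a load-tested per-instance peak into a per-minute target.

    max_rps: peak requests per second one instance handled in a load test.
    safety_factor: headroom multiplier between 0 and 1 (assumed value).
    """
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in the range (0, 1]")
    return max_rps * safety_factor * 60


# Hypothetical load test: one instance sustains 40 requests per second.
print(target_invocations_per_minute(40))  # 1200.0
```

A lower safety factor scales out earlier at the cost of running more instances; the right value depends on how much latency headroom the workload needs.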

What to study next

Got this wrong? Here's your next step.

Identify which MLA-C01 exam domain this question belongs to, then review the specific concept being tested. Practise related questions in that domain and focus on understanding why each wrong answer is tempting — not just why the correct answer is right.

Related MLA-C01 practice-question pages

Use these pages to review the topic behind this question. This is how one missed question becomes focused revision.

Data Preparation for Machine Learning practice questions

Practise MLA-C01 questions linked to Data Preparation for Machine Learning.

ML Model Development practice questions

Practise MLA-C01 questions linked to ML Model Development.

Deployment and Orchestration of ML Workflows practice questions

Practise MLA-C01 questions linked to Deployment and Orchestration of ML Workflows.

ML Solution Monitoring, Maintenance and Security practice questions

Practise MLA-C01 questions linked to ML Solution Monitoring, Maintenance and Security.

MLA-C01 fundamentals practice questions

Practise MLA-C01 questions linked to MLA-C01 fundamentals.

MLA-C01 scenario practice questions

Practise MLA-C01 questions linked to MLA-C01 scenario.

MLA-C01 troubleshooting practice questions

Practise MLA-C01 questions linked to MLA-C01 troubleshooting.

FAQ

Questions learners often ask

What does this MLA-C01 question test?

This question tests Deployment and Orchestration of ML Workflows. The related concept is to read the scenario before looking for a memorised answer.

What is the correct answer to this question?

The correct answer is B: Replace the custom scaling policy with a target tracking scaling policy based on the number of invocations per instance, with a target value of 1000. Option B is correct because a target tracking scaling policy based on invocations per instance responds faster to traffic spikes than CPU-based scaling, which lags behind the load. Option A is incorrect because a model ensemble increases the compute load per request. Option C is incorrect because an inference pipeline adds latency rather than reducing it. Option D is incorrect because GPU instances do not address the scaling policy lag.

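What should I do if I get this MLA-C01 question wrong?

Identify which MLA-C01 exam domain this question belongs to, then review the specific concept being tested and practise related questions in that domain.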

What is the key concept behind this question?

Read the scenario before looking for a memorised answer.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content. Learn why practice questions differ from exam dumps →

How Courseiva writes practice questions · Editorial policy

This MLA-C01 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the MLA-C01 exam.