Question 78 of 1,755
Machine Learning Implementation and Operations · hard · Multiple Select · Objective-mapped

Quick Answer

The answer is to use GPU instances, enable SageMaker Neo, and reduce the input data size. These three measures cut inference latency directly: GPUs accelerate the computation, Neo compiles the model for the target hardware, and smaller inputs reduce the work done per request. On the AWS Certified Machine Learning Specialty (MLS-C01) exam, this question tests your understanding of real-time endpoint optimization. Multi-model endpoints and increased batch size are the distractors: multi-model endpoints can add latency when a model has to be loaded, and larger batches usually increase per-request latency even though they can improve throughput. A memory tip is the three Gs: GPU, Graph optimization (Neo), and Gigabytes reduced (smaller input).
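The effect of a smaller input can be made concrete with a rough operation count. The sketch below is illustrative only: it assumes a single stride-1, same-padded 2D convolution, and the layer sizes (3 input channels, 64 output channels, 3x3 kernel) are hypothetical values chosen for the example, not taken from any particular model.

```python
def conv2d_macs(height, width, in_channels, out_channels, kernel_size):
    """Multiply-accumulate count for one stride-1, same-padded 2D convolution."""
    # Each output pixel needs kernel_size^2 * in_channels MACs per output channel,
    # and same padding with stride 1 keeps the output at height x width.
    return height * width * in_channels * out_channels * kernel_size * kernel_size


# Hypothetical layer: 3 -> 64 channels, 3x3 kernel.
full = conv2d_macs(224, 224, 3, 64, 3)
half = conv2d_macs(112, 112, 3, 64, 3)
ratio = full / half

print(f"224x224 input: {full:,} MACs")
print(f"112x112 input: {half:,} MACs")
print(f"reduction factor: {ratio:.1f}x")  # halving each side quarters the work
```

Under these assumptions, halving the image side length cuts the convolution work by a factor of four. Real latency gains depend on the whole network and the hardware, so treat this as an estimate of the trend rather than a prediction.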

MLS-C01 Practice Question: Machine Learning Implementation and Operations

This MLS-C01 practice question tests your understanding of machine learning implementation and operations. Read the scenario carefully and evaluate each option against the stated constraints before committing to an answer. Once you have made your selection, read the explanation and wrong-answer breakdown below to reinforce the concept and understand why each distractor is designed to mislead on exam day.

Which THREE measures can help reduce inference latency for a deep learning model deployed on SageMaker real-time endpoints? (Select THREE.)

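Answer choices

  • Enable SageMaker Neo to compile the model.
  • Increase the batch size for inference.
  • Use GPU instances for inference.
  • Reduce the input data size (e.g., lower resolution images).
  • Use a multi-model endpoint to share the instance.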

Correct answer & explanation

  • Enable SageMaker Neo to compile the model.
  • Use GPU instances for inference.
  • Reduce the input data size (e.g., lower resolution images).

The three correct measures are: use GPU instances, enable model compilation with SageMaker Neo, and reduce the input data size. GPUs accelerate deep learning inference, Neo optimizes the model for the target hardware, and smaller inputs reduce computation per request. Multi-model endpoints share resources across models, but they can add latency when switching models. Increasing the batch size usually increases latency per request, although it can improve throughput.
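The batch-size trade-off can be illustrated with simple arithmetic. This is a minimal sketch under a stated assumption: batch processing time is modeled as a fixed overhead plus a per-item cost, and the default values of `fixed_ms` and `per_item_ms` are hypothetical numbers, not measurements. It also ignores the time a request spends waiting for a batch to fill, which would add further latency.

```python
def batch_stats(batch_size, fixed_ms=5.0, per_item_ms=2.0):
    """Return (latency_ms, throughput_rps) for one batch under a linear cost model."""
    # Every request in the batch waits for the whole batch to finish,
    # so per-request latency equals the batch processing time.
    latency_ms = fixed_ms + per_item_ms * batch_size
    # Throughput counts how many requests complete per second.
    throughput_rps = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput_rps


for b in (1, 8, 32):
    latency, throughput = batch_stats(b)
    print(f"batch={b:>2}  latency={latency:5.1f} ms  throughput={throughput:6.1f} req/s")
```

In this model, both numbers rise with the batch size: the fixed overhead is amortized over more requests, which helps throughput, while each individual request takes longer to return.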

Key principle: Per-request latency falls when each request does less work or runs on faster, better-matched hardware. Sharing an instance or batching requests trades latency for resource sharing or throughput.

Answer analysis

Option-by-option breakdown

For each option: why learners choose it and why it is or isn't the right answer here.

  • Enable SageMaker Neo to compile the model.

    Why this is correct

    Neo optimizes models for target hardware, reducing latency.

  • Increase the batch size for inference.

    Why it's wrong here

    Larger batch sizes increase latency per request, though throughput may improve.

  • Use GPU instances for inference.

    Why this is correct

    GPUs accelerate deep learning inference.

  • Reduce the input data size (e.g., lower resolution images).

    Why this is correct

    Smaller inputs reduce computation time.

  • Use a multi-model endpoint to share the instance.

    Why it's wrong here

    Multi-model endpoints can add latency when loading models.
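The loading overhead in the last option can be pictured with a toy simulation. The sketch below is not SageMaker's implementation. It assumes a host that keeps a limited number of models in memory and evicts the least recently used one, and the class name `ToyMultiModelHost`, the capacity, and the `load_ms` and `infer_ms` costs are all hypothetical.

```python
from collections import OrderedDict


class ToyMultiModelHost:
    """Toy host that keeps at most `capacity` models in memory (LRU eviction)."""

    def __init__(self, capacity, load_ms=800.0, infer_ms=20.0):
        self.capacity = capacity
        self.load_ms = load_ms        # hypothetical cost of loading a model
        self.infer_ms = infer_ms      # hypothetical cost of a warm inference
        self.loaded = OrderedDict()   # model name -> True, ordered by recency

    def invoke(self, model_name):
        """Return the simulated latency in ms for one request."""
        latency = self.infer_ms
        if model_name in self.loaded:
            self.loaded.move_to_end(model_name)   # mark as most recently used
        else:
            latency += self.load_ms               # cold request: load the model first
            if len(self.loaded) >= self.capacity:
                self.loaded.popitem(last=False)   # evict the least recently used model
            self.loaded[model_name] = True
        return latency


host = ToyMultiModelHost(capacity=2)
trace = ["a", "a", "b", "c", "a"]
latencies = [host.invoke(name) for name in trace]
print(latencies)  # [820.0, 20.0, 820.0, 820.0, 820.0]
```

Only the second request hits a model that is already in memory. Every request that triggers a load pays the extra cost, which is why sharing an instance across many models is a resource measure rather than a latency measure.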

Common exam traps

Common exam trap: throughput is not latency

A larger batch size can improve throughput, but it usually increases latency per request. Likewise, a multi-model endpoint shares an instance across models, yet it can add latency when a model has to be loaded. Watch for options that improve efficiency without making an individual request faster.

Detailed technical explanation

How to think about this question

This kind of question is testing the difference between making a single request faster and making the endpoint more efficient overall. GPU instances, SageMaker Neo compilation, and smaller inputs each shorten the time spent on one request. A larger batch size and multi-model endpoints target throughput or resource sharing, and can make an individual request slower.

Key Concepts to Remember

  • GPU instances accelerate deep learning inference.
  • SageMaker Neo optimizes a model for the target hardware.
  • Smaller inputs reduce computation time per request.
  • Larger batches and multi-model endpoints can add per-request latency.

Exam Day Tips

  • Do not assume that higher throughput means lower latency.
  • Look for words such as real-time, latency, per request or throughput.
  • Separate measures that speed up a single request from measures that share or batch work before choosing the answer.

Key takeaway

Per-request latency falls when each request does less work or runs on faster, better-matched hardware. Sharing an instance or batching requests trades latency for resource sharing or throughput.

Real-world example

How this comes up in practice

A cloud solutions architect for a retail company is evaluating services for a new workload. The correct answer here reflects best practice for the specific scenario described, not a general cloud recommendation. For a real-time endpoint where latency is the constraint, GPU instances, SageMaker Neo compilation, and smaller inputs help, while larger batches and multi-model endpoints do not. Cloud exam questions reward reading the constraint carefully: the same technology can be right or wrong depending on the use case.

What to study next

Got this wrong? Here's your next step.

Review SageMaker real-time endpoint optimization: GPU instances for inference, model compilation with SageMaker Neo, and the trade-off between latency and throughput when batching. Then practise related MLS-C01 questions on model deployment and inference.

FAQ

Questions learners often ask

What does this MLS-C01 question test?

This question tests Machine Learning Implementation and Operations, specifically how to reduce inference latency on SageMaker real-time endpoints.

What is the correct answer to this question?

The correct answers are: use GPU instances for inference, enable SageMaker Neo to compile the model, and reduce the input data size. Multi-model endpoints can add latency when switching models, and increasing the batch size usually increases latency per request, although it can improve throughput.

What should I do if I get this MLS-C01 question wrong?

Review SageMaker real-time endpoint optimization: GPU instances for inference, model compilation with SageMaker Neo, and the trade-off between latency and throughput when batching. Then practise related MLS-C01 questions on model deployment and inference.

What is the key concept behind this question?

Per-request latency falls when each request does less work or runs on faster, better-matched hardware.

About these practice questions

Courseiva creates original exam-style practice questions with explanations and wrong-answer analysis. It does not publish real exam questions, exam dumps, or protected exam content.

Last reviewed: Jun 20, 2026

This MLS-C01 practice question is part of Courseiva's free Amazon Web Services certification practice question bank. Courseiva provides original exam-style practice questions with explanations, topic-based practice, mock exams, readiness tracking, and study analytics to help learners prepare for the MLS-C01 exam.