Knowledge + Practice

CCNA MLA Deployment Orchestration Questions

1

MCQ · hard

A team needs to deploy a model that has compliance requirements to log all inference requests and responses for auditing. The model will be served using a real-time endpoint. How can they achieve this without custom code?

A. Enable SageMaker Data Capture on the endpoint

B. Add a custom Lambda function using a container

C. Use SageMaker Debugger to monitor inference

D. Enable CloudTrail for the endpoint

Answer: A

Data Capture logs requests and responses to S3 automatically.

Why this answer

SageMaker Data Capture is the native, no-code feature that automatically logs inference requests and responses for real-time endpoints. It captures payload data to an S3 bucket without requiring any custom code, directly meeting the compliance requirement for audit logging.

Exam trap

The trap here is that candidates often confuse CloudTrail (which logs API calls) with Data Capture (which logs payloads), or they assume Debugger can be repurposed for inference logging, but Debugger only works during training.

How to eliminate wrong answers

Option B is wrong because adding a custom Lambda function using a container introduces custom code, which the question explicitly states should be avoided. Option C is wrong because SageMaker Debugger is designed for monitoring training jobs and debugging model performance, not for capturing inference request/response logs for auditing. Option D is wrong because AWS CloudTrail logs API calls to the SageMaker endpoint (e.g., InvokeEndpoint actions) but does not capture the actual inference request and response payloads.
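
As a minimal sketch of what enabling Data Capture looks like, the helper below builds the `DataCaptureConfig` section that would be passed to `create_endpoint_config` in boto3. The bucket name and the helper name `data_capture_config` are hypothetical, and no AWS call is made here.

```python
def data_capture_config(s3_uri: str, sampling_percent: int = 100) -> dict:
    """Build the DataCaptureConfig section of a CreateEndpointConfig request."""
    if not 0 <= sampling_percent <= 100:
        raise ValueError("sampling_percent must be between 0 and 100")
    return {
        "EnableCapture": True,
        # 100 means every request is captured, which suits an audit requirement.
        "InitialSamplingPercentage": sampling_percent,
        "DestinationS3Uri": s3_uri,
        # Capture both the request payload (Input) and the response (Output).
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# Hypothetical bucket, for illustration only.
config = data_capture_config("s3://example-audit-bucket/capture/")
```

The resulting dictionary would be supplied as the `DataCaptureConfig` argument alongside the production variants of the endpoint configuration.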

2

Multi-Select · hard

An MLOps team is designing a SageMaker Pipeline to automate model retraining. The pipeline must: (1) run training only if new training data is available, (2) register the model in SageMaker Model Registry only if evaluation metrics exceed a threshold, (3) deploy the approved model to a staging endpoint automatically. Which THREE steps should they include? (Choose THREE.)

Select 3 answers

A. ConditionStep to check if evaluation metrics exceed the threshold

B. RegisterModel step to register the model in the Model Registry

C. TuningStep to perform hyperparameter optimization

D. TransformStep to deploy the model to a staging endpoint

E. ProcessingStep to check for new training data availability

Answers: A, B, E

A ConditionStep allows branching: if metrics exceed threshold, proceed to register; otherwise skip.

Why this answer

A ProcessingStep (E) can check whether new training data is available. A ConditionStep (A) evaluates a condition and routes the pipeline to different branches, and the RegisterModel step (B) registers the model in the Model Registry.

Option D is wrong because a TransformStep runs batch transform inference and does not deploy an endpoint; deployment needs a CreateModel step followed by endpoint creation, which none of the options lists. Option C is wrong because a TuningStep performs hyperparameter tuning, which is not one of the stated requirements. Training itself is handled by a TrainingStep.
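
The gating logic can be sketched in plain Python. This is not the SageMaker SDK; the function `plan_pipeline_run` and the step labels are hypothetical and only mirror the order in which the pipeline steps would run under each condition.

```python
from typing import List


def plan_pipeline_run(new_data_available: bool, metric: float, threshold: float) -> List[str]:
    """Return the steps that would execute, mirroring the pipeline's two gates."""
    steps = ["CheckForNewData"]          # role of the ProcessingStep
    if not new_data_available:
        return steps                     # nothing new: skip training entirely
    steps += ["Train", "Evaluate"]
    if metric > threshold:               # role of the ConditionStep
        steps.append("RegisterModel")    # role of the RegisterModel step
    return steps


print(plan_pipeline_run(True, 0.91, 0.85))
print(plan_pipeline_run(True, 0.70, 0.85))
print(plan_pipeline_run(False, 0.0, 0.85))
```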

3

MCQ · medium

A company wants to run inference on a large dataset stored in S3 using a pre-trained model. The inference can tolerate latency from minutes to hours, and they want a fully managed solution that autoscales to handle large volumes. Which SageMaker inference option is most suitable?

A. Batch transform

B. Real-time endpoint

C. Asynchronous inference

D. Serverless inference

Answer: A

Batch transform processes large S3 datasets offline with automatic scaling, ideal for this use case.

Why this answer

Batch transform is the most suitable option because the company needs to run inference on a large dataset stored in S3 with latency tolerance from minutes to hours, and requires a fully managed, autoscaling solution. SageMaker Batch Transform processes the entire dataset as a single job, automatically provisions and scales compute resources, and writes results back to S3 without the need for persistent endpoints.

Exam trap

The exam often tests the distinction between latency tolerance and workload type, where candidates mistakenly choose asynchronous inference because they see 'tolerates latency' and 'autoscaling' without recognizing that batch transform is the only option designed for processing entire datasets stored in S3 as a single job, not individual requests.

How to eliminate wrong answers

Option B (Real-time endpoint) is wrong because it is designed for low-latency, synchronous inference (milliseconds to seconds) and requires a persistent endpoint that incurs ongoing costs, so it is not suitable for large batch jobs with hour-long latency tolerance. Option C (Asynchronous inference) is wrong because it is intended for near-real-time requests with payloads up to 1 GB and queues individual requests for processing, but it still relies on an endpoint and is optimized for request-by-request workloads, not offline processing of an entire dataset. Option D (Serverless inference) is wrong because it is designed for intermittent, short-lived inference requests with automatic scaling to zero, but each request must complete within a short invocation timeout (60 seconds), and it is not suitable for processing large datasets as a single job.
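
For illustration, a batch transform job is described by a single request that points at an S3 prefix for input and another for output. The sketch below builds the arguments for boto3's `create_transform_job`; the helper name, job name, model name, and S3 paths are hypothetical, and the instance type and count are assumptions passed explicitly by the caller.

```python
def transform_job_request(job_name, model_name, input_s3, output_s3,
                          instance_type="ml.m5.xlarge", instance_count=2):
    """Build keyword arguments for sagemaker_client.create_transform_job."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            # Read every object under the S3 prefix as job input.
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "ContentType": "text/csv",
            "SplitType": "Line",  # treat each line as one record
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": instance_count},
    }


request = transform_job_request(
    "nightly-scoring", "example-model",
    "s3://example-bucket/input/", "s3://example-bucket/output/",
)
```

No endpoint appears anywhere in the request, which is the practical difference from the other three options.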

4

MCQ · easy

A data scientist wants to compare the performance of two model versions (V1 and V2) in production by splitting traffic between them. They want to gradually increase the percentage of traffic to the new version while monitoring metrics. Which SageMaker feature enables this?

A. SageMaker production variants with traffic splitting

B. SageMaker shadow testing

C. SageMaker blue/green deployment

D. SageMaker canary deployment

Answer: A

Production variants allow splitting traffic between model versions for A/B testing and gradual rollout.

Why this answer

Production variants with traffic splitting allow routing a percentage of inference requests to different model versions. By updating the variant weights on the endpoint, the data scientist can gradually shift traffic from V1 to V2. Shadow testing mirrors traffic to the new version but does not affect the responses returned to real users.

Blue/green describes a deployment pattern rather than a way to hold a traffic split between two versions for comparison. Canary deployment is likewise a pattern; in this scenario SageMaker implements it through production variants.
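
A small sketch of how variant weights translate into traffic share: each variant receives its weight divided by the sum of all weights. The second helper builds the `DesiredWeightsAndCapacities` list for boto3's `update_endpoint_weights_and_capacities`. Both helper names are hypothetical.

```python
def traffic_shares(weights: dict) -> dict:
    """Fraction of requests each variant receives: weight / sum of weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


def desired_weights(weights: dict) -> list:
    """Payload for update_endpoint_weights_and_capacities(DesiredWeightsAndCapacities=...)."""
    return [{"VariantName": name, "DesiredWeight": float(weight)}
            for name, weight in weights.items()]


shares = traffic_shares({"V1": 9, "V2": 1})
payload = desired_weights({"V1": 5, "V2": 5})  # a later step: an even split
```

Shifting traffic gradually then amounts to calling the update API repeatedly with a larger weight for V2 each time.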

5

MCQ · easy

A data scientist needs to deploy a single ML model that will serve real-time predictions with low latency (under 10 ms) for a high-traffic web application. The model fits in memory and requires GPU acceleration. Which SageMaker inference option is MOST suitable?

A. Real-time endpoint on ml.m5 instances

B. Batch Transform

C. Real-time endpoint on ml.g4dn instances

D. Serverless Inference

Answer: C

ml.g4dn instances offer GPU acceleration and are designed for low-latency, real-time inference.

Why this answer

Real-time endpoints on GPU instances (ml.g4dn) provide low latency and GPU acceleration, ideal for high-traffic, latency-sensitive workloads.

6

Multi-Select · medium

A team wants to deploy a new model using a canary deployment strategy on SageMaker. Which TWO configurations are necessary? (Choose two.)

Select 2 answers

A. Create a CloudWatch alarm to automatically rollback

B. Set the initial traffic distribution (e.g., 90% old, 10% new)

C. Enable data capture on the endpoint

D. Create a new endpoint configuration with two production variants, each pointing to a different model

E. Use SageMaker Model Registry to approve the new model

Answers: B, D

Initial weights define the canary traffic percentage.

Why this answer

A canary deployment requires two production variants (old and new) with traffic distribution. Gradual traffic shifting can be achieved by updating the endpoint's variant weights over time.
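
The two required pieces (two variants, plus an initial split) can be sketched as one endpoint configuration request. The helper below builds arguments for boto3's `create_endpoint_config`; the helper name, variant names, model names, and instance type are hypothetical choices for illustration.

```python
def canary_endpoint_config(name, old_model, new_model, canary_percent=10,
                           instance_type="ml.m5.large"):
    """Build a CreateEndpointConfig request with a current and a canary variant."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 0 and 100, exclusive")

    def variant(variant_name, model_name, weight):
        return {
            "VariantName": variant_name,
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
            "InitialVariantWeight": float(weight),
        }

    return {
        "EndpointConfigName": name,
        "ProductionVariants": [
            variant("current", old_model, 100 - canary_percent),
            variant("canary", new_model, canary_percent),
        ],
    }


cfg = canary_endpoint_config("example-canary-config", "model-v1", "model-v2")
```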

7

MCQ · easy

A data scientist wants to version and manage trained models, require approval before deployment, and enable cross-account deployment. Which SageMaker feature provides these capabilities?

A. SageMaker Neo

B. SageMaker Pipelines

C. Amazon Elastic Inference

D. SageMaker Model Registry

Answer: D

Why this answer

SageMaker Model Registry is the correct choice because it provides a centralized catalog for versioning trained models, supports approval workflows (e.g., pending, approved, rejected) to gate deployment, and enables cross-account deployment by sharing model package ARNs across AWS accounts via AWS Resource Access Manager (RAM) or cross-account IAM roles. This directly satisfies all three requirements: versioning, approval before deployment, and cross-account deployment.

Exam trap

The trap here is that candidates may confuse SageMaker Pipelines (which orchestrates the ML workflow) with Model Registry (which manages model versions and approvals), but Pipelines lacks native versioning and approval gatekeeping, while Model Registry is specifically designed for those governance tasks.

How to eliminate wrong answers

Option A is wrong because SageMaker Neo is a model optimization and compilation service that converts trained models into efficient runtime code for target hardware (e.g., ARM, Intel, NVIDIA), but it does not provide model versioning, approval workflows, or cross-account deployment capabilities. Option B is wrong because SageMaker Pipelines is a CI/CD orchestration service for building, training, and deploying ML pipelines, but it does not natively include a model registry with approval gates or cross-account deployment features; while it can integrate with Model Registry, Pipelines itself does not offer versioning or approval management. Option C is wrong because Amazon Elastic Inference is a service that attaches low-cost GPU-powered inference acceleration to SageMaker endpoints, but it has no role in model versioning, approval workflows, or cross-account deployment.
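
As a rough sketch of the approval gate, a model package version carries an approval status that can be changed with boto3's `update_model_package`. The helper below only builds that request; the helper name and the ARN are hypothetical placeholders.

```python
VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def approval_request(model_package_arn: str, status: str, description: str = "") -> dict:
    """Build keyword arguments for sagemaker_client.update_model_package."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    request = {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}
    if description:
        request["ApprovalDescription"] = description
    return request


# Placeholder ARN, for illustration only.
arn = "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/3"
req = approval_request(arn, "Approved", "Passed offline evaluation")
```

A deployment workflow would typically act only on versions whose status is `Approved`.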

8

MCQ · hard

A company uses SageMaker Pipelines to orchestrate their ML workflow. They notice that if a pipeline step fails due to a transient error (e.g., a brief network issue), the entire pipeline fails and they must manually rerun from the beginning. They want to automatically retry failed steps a few times before failing. What should they do?

A. Use a Lambda function to catch step failures and re-invoke the step

B. Use the CreatePipelineExecution API with a flag to ignore failures

C. Configure a RetryPolicy in the pipeline step definition to specify the number of retry attempts and backoff

D. Use AWS Step Functions to orchestrate the workflow instead of SageMaker Pipelines

Answer: C

SageMaker Pipelines supports RetryPolicy to automatically retry steps on failure.

Why this answer

SageMaker Pipelines supports retry policies for steps. By setting a RetryPolicy in the step definition with a maximum number of retry attempts, the pipeline will automatically retry the step on failure. The other options do not achieve automatic retry within the pipeline: Step Functions would require rebuilding the pipeline, Lambda cannot retry pipeline steps, and CreatePipelineExecution does not handle retries.
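
To make "retry attempts and backoff" concrete, the sketch below computes the wait before each retry under exponential backoff. The parameter names `interval_seconds`, `backoff_rate`, and `max_attempts` follow the names I understand the SDK's retry policy to use, but the exact timing applied by the service is an assumption here; this is plain Python, not the SageMaker SDK.

```python
def retry_delays(interval_seconds: float, backoff_rate: float, max_attempts: int) -> list:
    """Wait time before each retry: interval * backoff_rate ** attempt_index."""
    if max_attempts < 0:
        raise ValueError("max_attempts must not be negative")
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


delays = retry_delays(interval_seconds=10, backoff_rate=2.0, max_attempts=3)
print(delays)  # [10.0, 20.0, 40.0]
```

With these values a step that keeps failing is retried three times before the pipeline execution is marked as failed.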

9

Multi-Select · medium

A company is using SageMaker to serve a model for real-time predictions. They want to test a new model version by routing a small percentage of live traffic to it while the rest goes to the current model. They also need to compare performance metrics. Which TWO actions should they take? (Select TWO.)

Select 2 answers

A. Deploy the new model to a separate endpoint and use Route 53 to split traffic

B. Compile the new model with SageMaker Neo before deployment

C. Use SageMaker Batch Transform to evaluate the new model

D. Monitor the performance of both variants using SageMaker CloudWatch metrics

E. Configure a production variant with the new model and set initial traffic weight to a small percentage

Answers: D, E

Why this answer

Option D is correct because Amazon CloudWatch provides built-in metrics for SageMaker endpoints, including latency, invocation counts, and error rates, which can be monitored per production variant. This allows the company to compare the performance of the new model version against the current model in real time. Option E is correct because SageMaker endpoints support multiple production variants, and you can set an initial traffic weight (e.g., 5%) to route a small percentage of live traffic to the new model while the rest goes to the existing variant.

Exam trap

The exam often tests the misconception that you need an external load balancer or DNS service (like Route 53) to split traffic, when SageMaker's built-in production variant feature handles this natively.
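
Comparing variants comes down to querying the same metric with a different `VariantName` dimension. The sketch below builds arguments for boto3's CloudWatch `get_metric_statistics`; the helper name, endpoint name, and variant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def variant_latency_query(endpoint: str, variant: str, hours: int = 1, now=None) -> dict:
    """Build keyword arguments for cloudwatch_client.get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint},
            {"Name": "VariantName", "Value": variant},  # one query per variant
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,               # 5-minute buckets
        "Statistics": ["Average"],
    }


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
queries = [variant_latency_query("example-endpoint", v, now=fixed_now)
           for v in ("current", "canary")]
```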

10

MCQ · medium

An ML engineer needs to orchestrate a multi-step workflow that includes data preprocessing on Spark, model training on SageMaker, and deployment to a production endpoint. They require tight integration with other AWS services and the ability to add custom logic. Which AWS service should they use alongside SageMaker?

A. AWS Step Functions

B. AWS CloudFormation

C. SageMaker Pipelines

D. Amazon EventBridge

Answer: A

Why this answer

AWS Step Functions is the correct choice because it provides a serverless workflow orchestration service that can coordinate multi-step ML pipelines involving Spark on AWS Glue or EMR, SageMaker training jobs, and endpoint deployments. It offers tight integration with over 200 AWS services via direct SDK integrations, supports custom logic through Lambda functions, and includes built-in error handling, retries, and parallel execution — making it ideal for complex, heterogeneous ML workflows that extend beyond SageMaker's native capabilities.

Exam trap

The trap here is that candidates confuse SageMaker Pipelines (a SageMaker-native orchestrator) with a general-purpose orchestrator, overlooking the requirement for tight integration with non-SageMaker services like Spark and custom logic — Step Functions is the correct choice for heterogeneous, multi-service ML workflows.

How to eliminate wrong answers

Option B (AWS CloudFormation) is wrong because it is an Infrastructure as Code (IaC) service for provisioning and managing AWS resources declaratively, not a workflow orchestrator — it cannot sequence steps like 'run Spark job, then train model, then deploy endpoint' with conditional logic or dynamic state management. Option C (SageMaker Pipelines) is wrong because while it can orchestrate SageMaker-native steps (training, tuning, batch transform), it lacks direct integration with external services like Spark on EMR or Glue and cannot easily incorporate custom logic outside the SageMaker ecosystem — the question explicitly requires tight integration with other AWS services and custom logic beyond SageMaker. Option D (Amazon EventBridge) is wrong because it is an event bus service for routing events between services based on rules, not a workflow orchestrator — it cannot manage sequential dependencies, retries, or stateful execution of a multi-step pipeline.

11

MCQ · easy

A machine learning engineer needs to optimize a trained TensorFlow model for deployment on edge devices with limited compute. Which SageMaker feature should they use to compile the model for target hardware?

A. SageMaker Model Monitor

B. SageMaker Neo

C. SageMaker Debugger

D. SageMaker Elastic Inference

Answer: B

Neo compiles models for target hardware, optimizing for edge deployment.

Why this answer

SageMaker Neo is the correct choice because it is specifically designed to compile trained machine learning models into an optimized format for target hardware architectures, such as ARM, Intel, or NVIDIA, enabling efficient inference on edge devices with limited compute resources. It uses a compiler to apply hardware-specific optimizations like operator fusion and memory layout tuning, reducing latency and memory footprint without requiring manual code changes.

Exam trap

The trap here is that candidates confuse SageMaker Neo with SageMaker Elastic Inference, mistakenly thinking Elastic Inference compiles models for edge devices, when in fact Elastic Inference only accelerates cloud inference by attaching a fractional GPU and does not perform compilation or target edge hardware.

How to eliminate wrong answers

Option A is wrong because SageMaker Model Monitor is used for detecting data drift and model quality degradation in production, not for compiling or optimizing models for hardware. Option C is wrong because SageMaker Debugger is a tool for monitoring training jobs, capturing tensors and metrics to debug issues like vanishing gradients, not for post-training compilation or hardware-specific optimization. Option D is wrong because SageMaker Elastic Inference attaches a separate accelerator to an endpoint for low-cost GPU acceleration, but it does not compile or optimize the model for edge hardware; it is a runtime acceleration service for cloud inference, not for edge deployment.
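
A Neo compilation is submitted as a job that names the framework, the input shape, and the target device. The sketch below builds arguments for boto3's `create_compilation_job`; the helper name, role ARN, S3 paths, input tensor name, and the choice of `jetson_nano` as target are hypothetical.

```python
import json


def compilation_job_request(job_name, role_arn, model_s3, output_s3,
                            target_device, input_shape):
    """Build keyword arguments for sagemaker_client.create_compilation_job."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3,
            # Input tensor name -> shape, passed as a JSON string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": "TENSORFLOW",
        },
        "OutputConfig": {"S3OutputLocation": output_s3, "TargetDevice": target_device},
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


job = compilation_job_request(
    "example-neo-job",
    "arn:aws:iam::111122223333:role/ExampleNeoRole",
    "s3://example-bucket/model/model.tar.gz",
    "s3://example-bucket/compiled/",
    target_device="jetson_nano",
    input_shape={"input_1": [1, 224, 224, 3]},
)
```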

12

MCQ · medium

An ML engineer wants to use MLflow on SageMaker to track experiments and log metrics. They have set up MLflow on an EC2 instance. How can they best integrate MLflow tracking with SageMaker training jobs?

A. Install MLflow on the SageMaker notebook instance only

B. Use the SageMaker Experiments integration with MLflow

C. Set the MLFLOW_TRACKING_URI environment variable in the training job and use the mlflow library in the training script

D. Use SageMaker Processing to run MLflow after training

Answer: C

This is the standard way to use MLflow with SageMaker; it allows logging metrics, parameters, and artifacts.

Why this answer

Option C is correct because the ML engineer can set the `MLFLOW_TRACKING_URI` environment variable in the SageMaker training job definition and use the `mlflow` library inside the training script to log parameters, metrics, and artifacts directly to the MLflow tracking server running on the EC2 instance. This approach allows the training job to communicate with the external MLflow server over HTTP/HTTPS without requiring any additional SageMaker integrations.

Exam trap

The exam often tests the misconception that SageMaker Experiments is required for tracking with MLflow, but the correct approach is to directly configure the MLflow tracking URI and use the mlflow library in the training script, as SageMaker does not natively block outbound HTTP connections to an external MLflow server.

How to eliminate wrong answers

Option A is wrong because installing MLflow only on the SageMaker notebook instance does not enable tracking from SageMaker training jobs, which run on separate, ephemeral compute instances that do not have access to the notebook instance's local MLflow server. Option B is wrong because SageMaker Experiments is a separate tracking service that does not natively integrate with an external MLflow server; using it would require additional custom code to bridge the two systems, and it does not replace the need to set the tracking URI. Option D is wrong because SageMaker Processing is designed for data preprocessing and postprocessing, not for real-time metric logging during training; running MLflow after training would miss the ability to log metrics incrementally during the training run.
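
A minimal sketch of both halves of this setup: the environment passed to the training job, and the logging call inside the training script. The helper names, the tracking server URL, and the experiment name are hypothetical; `mlflow` is imported lazily so the first helper runs without it installed.

```python
def training_environment(tracking_uri: str, experiment: str) -> dict:
    """Environment variables to pass to the training job (e.g. the Environment field)."""
    return {
        "MLFLOW_TRACKING_URI": tracking_uri,      # read automatically by the mlflow client
        "MLFLOW_EXPERIMENT_NAME": experiment,
    }


def log_epoch(epoch: int, loss: float) -> None:
    """Called from the training script; sends one metric point to the tracking server."""
    import mlflow  # requires the mlflow package inside the training container
    mlflow.log_metric("loss", loss, step=epoch)


# Hypothetical server address on the EC2 instance.
env = training_environment("http://mlflow.example.internal:5000", "churn-model")
```

The training instances need network access to the EC2 host for the logging calls to succeed.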

13

Multi-Select · medium

A company wants to deploy a containerized inference application that includes a custom pre-processing script and a TensorFlow model on SageMaker. They need the ability to independently scale the pre-processing and model serving components. Which TWO SageMaker features support this? (Select TWO.)

Select 2 answers

A. SageMaker inference pipeline

B. SageMaker Batch Transform

C. Multi-model endpoint

D. Multiple real-time endpoints with an Application Load Balancer

E. Multi-container endpoint

Answers: D, E

Why this answer

Multi-container endpoints (E) allow running multiple containers (for example, pre-processing and model serving) on the same endpoint, although scaling applies per endpoint instance. To scale the components independently, use separate real-time endpoints behind a load balancer (D). Multi-model endpoints (C) host multiple models, not multiple containers. AWS App Mesh is another way to scale components independently, but it is not a SageMaker feature.

14

MCQ · medium

A startup wants to deploy a model that has variable traffic patterns, with some periods of no traffic and occasional spikes. They want to pay only for what they use and do not want to manage instances. Which SageMaker inference option should they choose?

A. Batch transform

B. Real-time endpoint with auto-scaling

C. Serverless inference

D. Multi-model endpoint

Answer: C

Serverless inference scales to zero and charges per request, perfect for variable traffic.

Why this answer

Serverless inference is the correct choice because it automatically scales to zero during periods of no traffic and scales up to handle spikes, charging only for the compute time used. This eliminates the need to manage underlying instances, making it ideal for variable and intermittent traffic patterns.

Exam trap

The trap here is that candidates often confuse auto-scaling with the ability to scale to zero, but real-time endpoints with auto-scaling still maintain a minimum number of instances, incurring costs during idle periods, whereas serverless inference truly scales to zero.

How to eliminate wrong answers

Option A is wrong because batch transform is designed for offline, asynchronous predictions on large datasets, not for real-time or variable traffic patterns with occasional spikes. Option B is wrong because real-time endpoints with auto-scaling still require provisioning and managing underlying instances, and they cannot scale to zero, meaning you incur costs even during no traffic. Option D is wrong because multi-model endpoints reduce hosting costs by sharing instances across models but still require managing instances and cannot scale to zero, so you pay for idle capacity.
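
In configuration terms, a serverless variant replaces the instance type and count with a `ServerlessConfig` block. The sketch below builds such a production variant; the helper name, variant name, and model name are hypothetical, and the allowed memory sizes and concurrency ceiling reflect my understanding of the service limits rather than anything stated above.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_variant(model_name: str, memory_mb: int = 2048, max_concurrency: int = 5) -> dict:
    """Build a production variant that uses ServerlessConfig instead of instances."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        # No InstanceType or InitialInstanceCount: capacity is managed by the service.
        "ServerlessConfig": {"MemorySizeInMB": memory_mb, "MaxConcurrency": max_concurrency},
    }


variant = serverless_variant("example-model", memory_mb=4096, max_concurrency=10)
```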

15

MCQ · medium

A machine learning team needs to deploy a new model version for A/B testing, gradually shifting traffic from the old version to the new version over 24 hours. Which deployment strategy should they use?

A. Blue/green deployment

B. Shadow testing

C. Direct deployment with immediate full traffic

D. Canary deployment

Answer: D

Canary deployment progressively shifts traffic percentage over time, enabling monitoring and rollback.

Why this answer

Canary deployment is the correct strategy because it allows gradual traffic shifting from the old model version to the new one over a specified time period (e.g., 24 hours) while monitoring for errors or performance degradation. This approach minimizes risk by exposing only a small percentage of users to the new version initially, then incrementally increasing traffic as confidence grows, which aligns perfectly with the A/B testing requirement.

Exam trap

The exam often tests the distinction between canary and blue/green deployment, where candidates mistakenly choose blue/green because both involve two versions, but blue/green is an instant switch, not a gradual traffic shift.

How to eliminate wrong answers

Option A is wrong because blue/green deployment involves switching all traffic from the old environment (blue) to the new environment (green) in a single cutover, not gradual traffic shifting over 24 hours. Option B is wrong because shadow testing runs the new model version in parallel with the old one but sends traffic only to the old version, comparing outputs offline without affecting live users, so it does not gradually shift traffic. Option C is wrong because direct deployment with immediate full traffic replaces the old version instantly, providing no gradual rollout or A/B testing capability.

16

MCQ · hard

An ML team uses AWS Step Functions to orchestrate a retraining pipeline triggered by EventBridge when new training data arrives. The pipeline includes a SageMaker training job and a model evaluation. If evaluation fails, the team wants to send an alert. How should they implement this?

A. Use SQS dead-letter queue for failed training jobs

B. Add a Catch rule in the Step Functions state machine to invoke a Lambda alert function

C. Configure SageMaker training job to publish to SNS on failure

D. Use EventBridge to monitor the training job status

Answer: B

Catch rules in Step Functions handle errors and route to fallback states.

Why this answer

Step Functions supports error handling via Catch rules; a Catch on the training or evaluation task can transition to a Lambda function that sends an alert.
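
The Catch pattern can be sketched as an Amazon States Language definition built in Python. The function name, state names, and Lambda ARNs are hypothetical, evaluation is assumed to run as a Lambda function, and most training job parameters are omitted, so this definition is illustrative rather than deployable as written.

```python
import json


def retraining_state_machine(evaluate_lambda_arn: str, alert_lambda_arn: str) -> dict:
    """Return a state machine definition where any failure routes to an alert state."""
    catch = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "SendAlert"}]

    def invoke(function_arn, next_state):
        return {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
            "Next": next_state,
        }

    return {
        "StartAt": "Train",
        "States": {
            "Train": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                # Remaining CreateTrainingJob fields are omitted in this sketch.
                "Parameters": {"TrainingJobName.$": "$.job_name"},
                "Catch": catch,
                "Next": "Evaluate",
            },
            "Evaluate": {**invoke(evaluate_lambda_arn, "Done"), "Catch": catch},
            "SendAlert": invoke(alert_lambda_arn, "Failed"),
            "Done": {"Type": "Succeed"},
            "Failed": {"Type": "Fail", "Error": "RetrainingFailed"},
        },
    }


definition = retraining_state_machine(
    "arn:aws:lambda:us-east-1:111122223333:function:example-evaluate",
    "arn:aws:lambda:us-east-1:111122223333:function:example-alert",
)
definition_json = json.dumps(definition, indent=2)
```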

17

MCQ · hard

A company needs to deploy a large language model (LLM) on SageMaker with the Triton Inference Server to maximize GPU utilization and reduce latency. They have an NVIDIA A100 GPU. Which SageMaker inference option supports Triton?

A. SageMaker Batch Transform with Triton

B. SageMaker real-time endpoint using a Triton Inference Server container

C. SageMaker Serverless Inference with a custom container

D. SageMaker Neo compiled model on a CPU endpoint

Answer: B

Why this answer

SageMaker real-time endpoints support the Triton Inference Server through a pre-built container that integrates with NVIDIA A100 GPUs, enabling dynamic batching and concurrent model execution to maximize GPU utilization and reduce latency. Triton is designed for high-throughput inference on GPU hardware, making it the correct choice for this scenario.

Exam trap

The trap here is that candidates may confuse SageMaker Batch Transform with real-time endpoints, assuming Triton can be used for batch processing, but Triton is specifically designed for real-time, low-latency inference and is not supported in Batch Transform jobs.

How to eliminate wrong answers

Option A is wrong because SageMaker Batch Transform does not support the Triton Inference Server; it is designed for offline, asynchronous inference on large datasets without real-time GPU optimization features. Option C is wrong because SageMaker Serverless Inference does not support GPU instances or custom containers with Triton; it is limited to CPU-based inference and automatically managed scaling. Option D is wrong because Neo is a model compilation service rather than a way to serve models with Triton, and using a CPU endpoint would not leverage the A100 GPU's capabilities.

18

MCQ · medium

A data science team uses SageMaker Pipelines for automated training. They need to conditionally register a model only if evaluation metrics exceed a threshold. Which pipeline step type should they use after the evaluation step?

A. Condition step

B. Processing step

C. Transform step

D. RegisterModel step

Answer: A

Condition step allows branching based on a Boolean condition, such as metric threshold.

Why this answer

The Condition step evaluates a condition and branches the pipeline; if the condition is met, the pipeline proceeds to register the model.

19

MCQ · medium

A company runs a batch inference job on 10 TB of image data stored in S3. Each image needs to be processed by a GPU-accelerated model. The job is not time-sensitive and cost is the primary concern. Which SageMaker option is MOST appropriate?

A. SageMaker Serverless Inference

B. SageMaker Batch Transform with GPU instance and spot instances

C. SageMaker Async Inference with GPU

D. SageMaker real-time endpoint on GPU instances

Answer: B

Why this answer

Option B is correct because Batch Transform with GPU spot instances is the most cost-effective choice for a non-time-sensitive, large-scale batch inference job on 10 TB of data. Spot instances offer up to 90% cost savings over on-demand, and Batch Transform natively handles splitting the dataset, distributing work across instances, and writing results to S3 without requiring a persistent endpoint.

Exam trap

The trap here is that candidates confuse 'batch inference' with 'async inference' and choose Option C, not realizing that Async Inference still requires a running endpoint and is designed for near-real-time processing, not cost-optimized offline batch jobs.

How to eliminate wrong answers

Option A is wrong because SageMaker Serverless Inference is designed for intermittent, low-latency workloads with a maximum payload size of 4 MB and a maximum concurrency of 200, making it unsuitable for processing 10 TB of image data. Option C is wrong because SageMaker Async Inference is optimized for near-real-time requests with large payloads (up to 1 GB) and requires a persistent endpoint, incurring higher costs than a batch job that can use spot instances. Option D is wrong because SageMaker real-time endpoints are provisioned 24/7 and designed for low-latency, high-throughput serving, which is wasteful and expensive for a non-time-sensitive batch job that can tolerate startup delays and interruptions.

20

MCQ · easy

A company wants to version and track ML models, with an approval workflow for promoting models from staging to production. Which SageMaker feature should they use?

A. SageMaker Model Monitor

B. SageMaker Experiments

C. SageMaker Pipelines

D. SageMaker Model Registry

Answer: D

Model Registry offers versioning, approval workflow, and deployment to production.

Why this answer

SageMaker Model Registry is the correct choice because it provides a centralized repository to catalog, version, and manage ML models, and it supports approval workflows (e.g., PendingManualApproval, Approved, Rejected) to promote models from staging to production. This directly addresses the requirement for version tracking and an approval gate for model promotion.

Exam trap

The trap here is that candidates confuse SageMaker Pipelines (which orchestrates the workflow) with SageMaker Model Registry (which manages the model versions and approvals), but the question specifically asks for the feature that handles versioning and approval workflow, not the orchestration of the pipeline itself.

How to eliminate wrong answers

Option A is wrong because SageMaker Model Monitor is designed for detecting data and model quality drift in production, not for versioning or approval workflows. Option B is wrong because SageMaker Experiments is used for tracking and comparing training runs (e.g., hyperparameters, metrics), not for managing model versions or approval states. Option C is wrong because SageMaker Pipelines orchestrates end-to-end ML workflows (e.g., data processing, training, deployment) but does not natively provide a model version registry or approval workflow; it can integrate with Model Registry for that purpose.

21

MCQ · medium

A company is deploying a large NLP model on SageMaker for real-time inference. They want to reduce inference latency and cost by optimizing the model for the target hardware. The model is trained in PyTorch. Which SageMaker feature should they use to compile the model for best performance on the chosen instance?

A. SageMaker Neo

B. AWS Step Functions

C. Amazon Elastic Inference

D. SageMaker Triton Inference Server

Answer: A

Neo optimizes models for target hardware to improve inference speed and reduce cost.

Why this answer

SageMaker Neo is the correct choice because it is specifically designed to compile trained models (including PyTorch models) into an optimized binary for a target hardware instance, reducing inference latency and improving throughput. Neo applies hardware-specific optimizations such as operator fusion, memory layout tuning, and quantization, which directly address the need for best performance on the chosen SageMaker instance.

Exam trap

The trap here is that candidates confuse model compilation (Neo) with inference serving (Triton) or hardware acceleration (Elastic Inference), leading them to pick a service that addresses a different part of the inference pipeline.

How to eliminate wrong answers

Option B is wrong because AWS Step Functions is a serverless workflow orchestration service, not a model compilation tool; it cannot optimize model performance for hardware. Option C is wrong because Amazon Elastic Inference attaches a separate accelerator to an instance for cost-effective inference, but it does not compile or optimize the model itself; it only provides additional compute resources. Option D is wrong because SageMaker Triton Inference Server is a high-performance inference server that supports multiple frameworks and model formats, but it does not compile the model for the target hardware; it serves models as-is, relying on the underlying framework's runtime.

22

MCQmedium

A team wants to implement an event-driven retraining pipeline that triggers retraining when new data arrives in an S3 bucket. The pipeline should include preprocessing, training, evaluation, and conditional registration. Which AWS services should they combine?

A.S3 + SQS + SageMaker Batch Transform

B.S3 + Step Functions + SageMaker Training Job

C.S3 + Lambda + SageMaker Pipeline

D.S3 + EventBridge + SageMaker Model Registry

AnswerC

S3 event notifies Lambda, which starts a SageMaker Pipeline for the ML workflow.

Why this answer

Option C is correct because S3 triggers a Lambda function on new data arrival, which invokes SageMaker Pipeline for the full ML workflow (preprocessing, training, evaluation, and conditional registration). SageMaker Pipeline natively supports conditional steps for model registration, making it the only option that directly satisfies the conditional registration requirement without additional custom logic.
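The trigger path could be sketched as a Lambda handler like the one below. The pipeline name, the `InputDataUri` pipeline parameter, and the optional `sagemaker_client` argument (added only so the handler can be exercised without AWS) are assumptions for illustration.

```python
import os
from urllib.parse import unquote_plus

# Hypothetical pipeline name, normally supplied through the function's environment.
PIPELINE_NAME = os.environ.get("PIPELINE_NAME", "retraining-pipeline")


def lambda_handler(event, context, sagemaker_client=None):
    """Start one SageMaker Pipeline execution per new S3 object in the event."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the handler can be tested offline
        sagemaker_client = boto3.client("sagemaker")

    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = sagemaker_client.start_pipeline_execution(
            PipelineName=PIPELINE_NAME,
            PipelineParameters=[
                {"Name": "InputDataUri", "Value": f"s3://{bucket}/{key}"}
            ],
        )
        started.append(response["PipelineExecutionArn"])
    return {"started": started}
```

Preprocessing, training, evaluation, and the registration gate all stay inside the pipeline definition; the function only hands over the location of the new data.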

Exam trap

The exam often tests the misconception that Step Functions or EventBridge can stand in for SageMaker Pipeline's native conditional registration, but of the options listed only SageMaker Pipeline provides a built-in `ConditionStep` that gates a Model Registry registration step inside the same workflow.

How to eliminate wrong answers

Option A is wrong because SQS provides message queuing but no orchestration for preprocessing, training, evaluation, or conditional registration; SageMaker Batch Transform is for inference, not training pipelines. Option B is wrong because Step Functions can orchestrate SageMaker Training Jobs, but the option covers only the training job: preprocessing, evaluation, and gated registration would each have to be built as custom states, whereas SageMaker Pipeline provides these step types and a condition step out of the box. Option D is wrong because EventBridge can react to S3 events, but SageMaker Model Registry only stores and versions models; it does not orchestrate preprocessing, training, or evaluation steps.

23

Multi-Selectmedium

A company uses SageMaker Pipelines to automate their ML workflow. They want to ensure that pipeline steps are not re-executed if the inputs and parameters have not changed since the last successful run. Which THREE features can help achieve this? (Choose three.)

Select 3 answers

A.Use a ConditionStep to skip steps if data is unchanged

B.Deploy a real-time endpoint for data validation

C.Use SageMaker Model Monitor

D.Enable pipeline caching on each step

E.Leverage SageMaker Experiments lineage to compare input checksums

AnswersA, D, E

A ConditionStep can branch based on data checksums, skipping unnecessary steps.

Why this answer

Pipeline caching reuses step outputs when inputs/parameters are identical. SageMaker Experiments lineage can track previous run metadata. Using a ConditionStep with checksums can skip steps based on data content.

24

MCQmedium

A team wants to orchestrate a multi-step ML workflow that includes data preprocessing, hyperparameter tuning, model training, evaluation, and conditional deployment to staging or production based on evaluation metrics. The workflow should run on a schedule and track lineage. Which service should they use?

A.SageMaker Pipelines

B.Amazon MWAA (Managed Workflows for Apache Airflow)

C.AWS Glue workflows

D.AWS Step Functions with Lambda functions for each step

AnswerA

SageMaker Pipelines provides DAG-based orchestration with all the required step types and automatic lineage tracking.

Why this answer

SageMaker Pipelines is the correct choice because it is purpose-built for orchestrating multi-step ML workflows, including data preprocessing, hyperparameter tuning, model training, evaluation, and conditional deployment. It natively supports scheduling via EventBridge or a cron expression, tracks lineage automatically through SageMaker Experiments and artifact tracking, and allows conditional branching (e.g., deploy to staging or production based on evaluation metrics) using `ConditionStep`.
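For the scheduling part, one possible shape is an EventBridge rule whose target is the pipeline itself. The sketch below only builds the arguments for `put_rule` and `put_targets`; the rule name, ARNs, cron expression, and the `DeployTarget` pipeline parameter are hypothetical.

```python
def build_pipeline_schedule(rule_name, pipeline_arn, events_role_arn,
                            cron="cron(0 2 * * ? *)"):
    """Return (put_rule kwargs, put_targets kwargs) for a scheduled pipeline run."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": cron,  # 02:00 UTC every day
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "retraining-pipeline",
                "Arn": pipeline_arn,
                # Role that EventBridge assumes to start the pipeline execution.
                "RoleArn": events_role_arn,
                "SageMakerPipelineParameters": {
                    "PipelineParameterList": [
                        {"Name": "DeployTarget", "Value": "staging"}
                    ]
                },
            }
        ],
    }
    return rule, targets


rule, targets = build_pipeline_schedule(
    rule_name="nightly-retraining",
    pipeline_arn="arn:aws:sagemaker:us-east-1:123456789012:pipeline/example-pipeline",
    events_role_arn="arn:aws:iam::123456789012:role/ExampleEventsRole",
)
# With AWS credentials configured:
#   events = boto3.client("events")
#   events.put_rule(**rule)
#   events.put_targets(**targets)
```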

Exam trap

The trap here is that candidates often choose AWS Step Functions or MWAA because they are familiar general-purpose orchestrators, but they overlook that SageMaker Pipelines is the only service that provides native, end-to-end ML workflow orchestration with built-in lineage tracking, conditional deployment, and direct integration with SageMaker training, tuning, and model registry.

How to eliminate wrong answers

Option B (Amazon MWAA) is wrong because while Apache Airflow can orchestrate ML workflows, it is a general-purpose workflow engine that requires significant custom setup for ML-specific features like hyperparameter tuning, model evaluation, and lineage tracking; it lacks native integration with SageMaker's conditional deployment and artifact lineage. Option C (AWS Glue workflows) is wrong because Glue workflows are designed for ETL and data preparation tasks, not for orchestrating ML training, hyperparameter tuning, or conditional model deployment; they do not support SageMaker training jobs or endpoint deployment natively. Option D (AWS Step Functions with Lambda functions for each step) is wrong because although Step Functions can orchestrate steps, running each ML step inside a Lambda function is constrained by Lambda's 15-minute maximum execution duration, cold-start latency, and the 256 KB limit on Step Functions state payloads, making it impractical for long-running training jobs or hyperparameter tuning; it also lacks built-in lineage tracking and conditional deployment logic specific to ML models.

25

MCQmedium

A data science team needs to deploy a PyTorch model that performs real-time inference with sub-100ms latency. The model requires GPU acceleration, but the team wants to minimize cost by sharing GPU instances across multiple models. Which SageMaker hosting option should they choose?

A.SageMaker real-time endpoint with Multi-Model Endpoint (MME) on an ml.g4dn instance

B.SageMaker real-time endpoint with a single model per ml.g4dn instance

C.SageMaker Serverless Inference

D.SageMaker Asynchronous Inference

AnswerA

MME on GPU instances allows multiple models to share the same GPU, reducing cost while meeting latency requirements.

Why this answer

Option A is correct because SageMaker Multi-Model Endpoint (MME) allows multiple PyTorch models to share a single GPU instance (e.g., ml.g4dn), reducing cost while meeting sub-100ms latency requirements. MME dynamically loads and unloads models into GPU memory based on traffic, enabling real-time inference with GPU acceleration without dedicating a full instance per model.

Exam trap

The trap here is that candidates often confuse SageMaker Serverless Inference with GPU support, but Serverless does not provide GPU acceleration, making it unsuitable for this latency-sensitive GPU workload.

How to eliminate wrong answers

Option B is wrong because deploying a single model per ml.g4dn instance would increase cost significantly, as the team wants to share GPU instances across multiple models. Option C is wrong because SageMaker Serverless Inference does not support GPU acceleration, so it cannot meet the sub-100ms latency requirement for PyTorch models needing GPU. Option D is wrong because SageMaker Asynchronous Inference is designed for large payloads and longer processing times (typically seconds to minutes), not for real-time sub-100ms inference.

26

MCQhard

A team uses SageMaker Pipelines with a Condition step to decide whether to register a model based on evaluation metrics. They want to also store the evaluation results for lineage tracking. Which step should they use to record the metrics?

A.Condition step

B.Training step

C.RegisterModel step

D.Processing step

AnswerC

RegisterModel step registers the model and can include evaluation metrics as metadata.

Why this answer

The RegisterModel step in SageMaker Pipelines is designed to create a model version in the SageMaker Model Registry, and it can accept evaluation metrics through its `model_metrics` argument (a `ModelMetrics` object that points to the evaluation report in S3). This allows the team to store evaluation results alongside the model for lineage tracking, fulfilling the requirement to record metrics after a Condition step approves registration.
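At the API level, the registered version corresponds to a `CreateModelPackage` request in which the evaluation report is attached under `ModelMetrics`. The following sketch builds such a request; the group name, image URI, and S3 locations are hypothetical, and the content types are assumed to be CSV.

```python
def build_model_package_request(group_name, image_uri, model_data_url, evaluation_s3_uri):
    """Return keyword arguments for the SageMaker CreateModelPackage API."""
    return {
        "ModelPackageGroupName": group_name,
        "ModelApprovalStatus": "PendingManualApproval",
        "InferenceSpecification": {
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
        # Evaluation results travel with the model version for later audits.
        "ModelMetrics": {
            "ModelQuality": {
                "Statistics": {
                    "ContentType": "application/json",
                    "S3Uri": evaluation_s3_uri,
                }
            }
        },
    }


package_request = build_model_package_request(
    group_name="example-model-group",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    model_data_url="s3://example-bucket/training/output/model.tar.gz",
    evaluation_s3_uri="s3://example-bucket/evaluation/evaluation.json",
)
# With AWS credentials configured:
#   boto3.client("sagemaker").create_model_package(**package_request)
```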

Exam trap

The trap here is that candidates often assume the Condition step or Processing step can directly store metrics for lineage, but only the RegisterModel step can bind evaluation results to a model version in the Model Registry, which is the explicit requirement for lineage tracking.

How to eliminate wrong answers

Option A is wrong because the Condition step only evaluates a boolean expression (e.g., comparing metrics against a threshold) to control pipeline flow; it does not have the capability to store or persist metrics. Option B is wrong because the Training step outputs a model artifact and training metrics, but it does not record evaluation metrics from a separate evaluation job into the Model Registry for lineage tracking. Option D is wrong because a Processing step can compute evaluation metrics, but it does not inherently register them with the model in the Model Registry; it would need an additional step (like RegisterModel) to persist those metrics for lineage.

27

Multi-Selecthard

A team is deploying a model using SageMaker real-time endpoint with an ml.m5.large instance. They notice high latency under peak load. They want to reduce latency without increasing instance size. Which THREE actions could help? (Select THREE.)

Select 3 answers

A.Quantize the model to reduce its size

B.Increase the number of instances in the endpoint

C.Compile the model with SageMaker Neo for the ml.m5 instance

D.Attach Amazon Elastic Inference to the endpoint

E.Change the instance type to ml.g4dn.xlarge

AnswersA, C, D

Why this answer

SageMaker Neo compiles the model for the target hardware, reducing latency. Elastic Inference attaches GPU acceleration to a CPU instance. Model quantization reduces model size and speeds up inference.

Increasing instance count does not reduce per-request latency (it increases throughput). Changing to a GPU instance increases instance size.

28

MCQmedium

A data scientist wants to train a model on SageMaker using a custom PyTorch script, then register the best model in the SageMaker Model Registry. The training job is part of a SageMaker Pipeline. Which pipeline step should be used to register the model?

A.RegisterModelStep

B.CreateModelStep

C.TrainingStep

D.TransformStep

AnswerA

RegisterModelStep registers a trained model into the Model Registry.

Why this answer

The `RegisterModelStep` is specifically designed to create a model resource and register it in the SageMaker Model Registry as part of a pipeline. It takes the training output (e.g., model artifacts from a `TrainingStep`) and packages it with the specified inference image and metadata, then creates a model package group version. This is the correct step for registering a model after training, as it directly integrates with the Model Registry for versioning and approval workflows.

Exam trap

The trap here is that candidates confuse `CreateModelStep` (which creates a deployable model resource) with `RegisterModelStep` (which creates a model package version in the registry), assuming both serve the same purpose of model registration.

How to eliminate wrong answers

Option B is wrong because `CreateModelStep` only creates a SageMaker model resource (for deployment or batch inference) but does not register it in the Model Registry; it lacks the versioning and metadata capabilities needed for registry management. Option C is wrong because `TrainingStep` is used to run a training job and produce model artifacts, but it has no built-in functionality to register the model into the Model Registry; registration requires a separate step. Option D is wrong because `TransformStep` is used for batch inference (transform jobs) on existing models, not for registering models into the registry.

29

MCQmedium

A company is using SageMaker Pipelines to orchestrate their ML workflow. They have a Condition step that checks if a model's accuracy exceeds 0.9. If true, they want to register the model in the model registry; otherwise, they want to run a retraining step. Which step type should they use for the decision?

A.Condition step

B.Transform step

C.Processing step

D.Tuning step

AnswerA

Condition step allows branching in the pipeline based on a Boolean condition.

Why this answer

The Condition step in SageMaker Pipelines allows you to choose between two branches based on a condition. The other options are not designed for branching: Transform is for batch inference, Tuning is for hyperparameter optimization, and Processing is for data processing.

30

MCQmedium

A company wants to deploy 50 small models (each ~100 MB) for real-time inference. They need to minimize hosting costs while maintaining low latency. Which SageMaker hosting option is most cost-effective?

A.SageMaker Serverless Inference

B.SageMaker Asynchronous Inference

C.SageMaker Multi-Model Endpoint (MME)

D.SageMaker real-time endpoint with one instance per model

AnswerC

MME allows multiple models to share a single instance, reducing cost.

Why this answer

Multi-Model Endpoint (MME) allows hosting multiple models on the same instance, sharing resources. Since the models are small, MME is cost-effective. Real-time endpoints would require separate instances.

Serverless is for on-demand but may incur cold starts. Asynchronous is for batch-like workloads.

31

MCQeasy

A data science team has trained a PyTorch model for real-time inference and needs to deploy it on AWS with GPU acceleration while minimizing cold-start latency. Which SageMaker inference option should they choose?

A.Serverless inference

B.Batch transform

C.Asynchronous inference endpoint

D.Real-time endpoint with ml.g4dn instance

AnswerD

GPU-instance-backed real-time endpoints offer low latency and GPU compute, ideal for real-time inference with minimal cold-start.

Why this answer

Real-time endpoints with GPU instances (e.g., ml.g4dn) provide low latency and support GPU acceleration, suitable for interactive inference. Serverless inference does not support GPU instances, asynchronous inference is for non-real-time, and batch transform is for offline predictions.

32

MCQhard

A data science team wants to host 50 different models for a recommendation engine. Each model is small (under 100 MB) and traffic patterns are unpredictable. They need to minimize cost and operational overhead. Which approach should they take?

A.Deploy each model to its own real-time endpoint

B.Use SageMaker serverless inference for each model

C.Use a single multi-model endpoint (MME)

D.Use a single multi-container endpoint

AnswerC

MME allows hosting many models on shared instances, loading models on demand, reducing cost and overhead.

Why this answer

Option C is correct because a single multi-model endpoint (MME) allows hosting multiple models (up to thousands) on the same endpoint, sharing the underlying compute instance. This minimizes cost and operational overhead for small models (under 100 MB) with unpredictable traffic, as the endpoint dynamically loads and unloads models from Amazon S3 into memory based on incoming requests, eliminating the need for separate endpoints or idle compute.
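A rough sketch of the two pieces that make an endpoint multi-model: a container definition in `MultiModel` mode that points at an S3 prefix, and a `TargetModel` on each invocation. The image URI, bucket, endpoint name, and artifact names are hypothetical.

```python
import json


def build_mme_container(image_uri, model_prefix_s3_uri):
    """Container definition for CreateModel that serves every artifact under a prefix."""
    return {
        "Image": image_uri,
        "Mode": "MultiModel",
        "ModelDataUrl": model_prefix_s3_uri,  # an S3 prefix, not a single archive
    }


def build_mme_invoke_args(endpoint_name, model_artifact, payload):
    """Keyword arguments for InvokeEndpoint that select one model on the endpoint."""
    return {
        "EndpointName": endpoint_name,
        # Path of the artifact relative to the S3 prefix above.
        "TargetModel": model_artifact,
        "ContentType": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),
    }


container = build_mme_container(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    "s3://example-bucket/recommenders/",
)
invoke_args = build_mme_invoke_args("recommender-mme", "model-17.tar.gz", {"user_id": 42})
# With AWS credentials configured:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**invoke_args)
```

Adding a 51st model under this design means uploading another archive under the prefix; the endpoint itself does not need to be redeployed.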

Exam trap

The exam often tests the distinction between multi-model endpoints (many independent models loaded on demand into a shared container) and multi-container endpoints (a fixed set of containers, invoked serially or directly), leading candidates to confuse the two and incorrectly choose option D.

How to eliminate wrong answers

Option A is wrong because deploying each model to its own real-time endpoint would require 50 separate endpoints, each with its own compute instance, leading to high cost and operational overhead due to idle resources during unpredictable traffic patterns. Option B is wrong because SageMaker serverless inference is designed for infrequent or sporadic traffic, but it still requires a separate serverless endpoint per model, incurring per-request costs and cold-start latency for each model, which does not minimize cost or overhead for 50 models. Option D is wrong because a single multi-container endpoint hosts a small, fixed set of containers (up to 15), typically for a serial inference pipeline such as pre-processing followed by inference, or for models that need different frameworks; it does not support dynamic model loading and would require a separate container for each model, so it cannot consolidate 50 models.

33

MCQhard

A company wants to serve a large ensemble of models using NVIDIA Triton Inference Server on SageMaker for high throughput GPU inference. Which SageMaker inference option supports this?

A.Asynchronous Inference

B.Multi-model endpoint

C.Serverless Inference

D.Real-time endpoint with a custom container running Triton

AnswerD

Customers can bring their own Triton container to SageMaker real-time endpoints for optimal GPU inference.

Why this answer

SageMaker supports Triton Inference Server through a custom real-time endpoint container, as Triton is optimized for GPU serving on NVIDIA hardware.

34

MCQhard

A team uses SageMaker Pipelines to train and register a model. They want to conditionally run a hyperparameter tuning step only if the data quality check passes. Which pipeline step type should they use to branch the execution?

A.TuningStep

B.TrainingStep

C.ConditionStep

D.TransformStep

AnswerC

Why this answer

The ConditionStep allows comparing values and branching to different steps. If data quality passes, the tuning step runs; otherwise, the pipeline stops or runs an alternative step. Other steps do not provide conditional branching.

35

MCQmedium

A company needs to deploy a new model version to a SageMaker real-time endpoint. They want to route 5% of traffic to the new version initially to monitor for errors before full rollout. Which deployment strategy should they use?

A.Blue/green deployment

B.Shadow testing

C.Canary deployment with production variants

D.Multi-model endpoint

AnswerC

Production variants allow traffic splitting; setting initial weight to 5% on the new variant achieves a canary.

Why this answer

Option C is correct because a canary deployment with production variants allows you to route a specific percentage of traffic (e.g., 5%) to the new model version by adjusting the `InitialVariantWeight` parameter in the production variant configuration. This enables gradual traffic shifting while monitoring errors, and you can later increase the weight to 100% for full rollout. SageMaker real-time endpoints support this natively by hosting multiple model variants behind the same endpoint.
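Variant weights are relative, so the share each variant receives is its weight divided by the sum of all weights. A small sketch, with hypothetical model names, variant names, and instance settings:

```python
def build_canary_variants(current_model, new_model, canary_percent=5,
                          instance_type="ml.m5.large"):
    """Return a ProductionVariants list for CreateEndpointConfig with a canary split."""
    def variant(name, model_name, weight):
        return {
            "VariantName": name,
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": float(weight),
        }

    return [
        variant("current", current_model, 100 - canary_percent),
        variant("canary", new_model, canary_percent),
    ]


def traffic_share(variants):
    """Fraction of requests each variant receives (weight / sum of weights)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


variants = build_canary_variants("model-v1", "model-v2", canary_percent=5)
shares = traffic_share(variants)
# With AWS credentials configured:
#   boto3.client("sagemaker").create_endpoint_config(
#       EndpointConfigName="example-canary-config", ProductionVariants=variants)
```

Once the canary looks healthy, the weights can be shifted in place with `update_endpoint_weights_and_capacities` rather than by redeploying the endpoint.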

Exam trap

The trap here is that candidates confuse canary deployment with shadow testing, mistakenly thinking shadow testing also routes live user traffic, when in fact shadow testing only duplicates traffic for validation without affecting the user experience.

How to eliminate wrong answers

Option A is wrong because blue/green deployment switches all traffic from the old version to the new version at once, not a gradual 5% routing, which defeats the purpose of initial error monitoring. Option B is wrong because shadow testing sends a copy of live traffic to the new version but does not serve responses to users; it is used for validation without impacting production traffic, not for routing a percentage of user-facing traffic. Option D is wrong because a multi-model endpoint hosts multiple models on the same endpoint but does not provide traffic splitting or weighted routing between model versions; it is designed for cost efficiency with many models, not gradual rollout.

36

MCQhard

An ML team uses SageMaker Pipelines to automate model retraining. They want to skip redundant training steps when input data has not changed. Which feature should they enable?

A.Pipeline caching

B.Pipeline variable expressions

C.Model registry approval

D.Step parallelism

AnswerA

Caching compares step hash and skips execution if unchanged.

Why this answer

SageMaker Pipelines caching stores step outputs; if the step configuration and inputs are identical, the pipeline reuses the cached output, skipping execution.
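The idea can be illustrated with a toy cache keyed on a hash of the step's type and arguments. This is a conceptual sketch only, not SageMaker's internal implementation; in the SageMaker Python SDK, caching is switched on per step by passing a `CacheConfig` with `enable_caching=True`.

```python
import hashlib
import json


def step_cache_key(step_type, arguments):
    """Hash the step type and arguments into a deterministic cache key."""
    canonical = json.dumps({"type": step_type, "arguments": arguments}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StepCache:
    """Toy cache: reuse a step's output when its configuration and inputs match."""

    def __init__(self):
        self._outputs = {}

    def run(self, step_type, arguments, execute):
        key = step_cache_key(step_type, arguments)
        if key in self._outputs:
            return self._outputs[key], True   # cache hit, execution skipped
        output = execute(arguments)
        self._outputs[key] = output
        return output, False                  # cache miss, step executed


cache = StepCache()
calls = []


def fake_training(arguments):
    """Stand-in for a training job; records that it actually ran."""
    calls.append(arguments)
    return {"model": "s3://example-bucket/model.tar.gz"}


args = {"data": "s3://example-bucket/train/", "epochs": 3}
first = cache.run("Training", args, fake_training)
second = cache.run("Training", dict(args), fake_training)              # identical inputs
third = cache.run("Training", {**args, "epochs": 5}, fake_training)    # changed input
```

In this toy run the second call is served from the cache, while the third call, with a changed hyperparameter, executes again.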

37

MCQmedium

A company wants to deploy a machine learning model using infrastructure as code to ensure reproducibility. They need to define the SageMaker Studio domain, user profiles, and the endpoint configuration. Which tool should they use?

A.AWS CloudFormation or AWS CDK

B.SageMaker Pipelines

C.AWS Step Functions

D.SageMaker Studio

AnswerA

Both are IaC services that can define and provision SageMaker resources in a reproducible manner.

Why this answer

AWS CloudFormation and AWS CDK are infrastructure-as-code (IaC) tools that allow you to define, provision, and manage AWS resources declaratively. For this use case, they can model the entire SageMaker Studio domain, user profiles, and endpoint configuration in templates or code, ensuring reproducibility and version control. This aligns directly with the requirement to deploy ML infrastructure as code.
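As a minimal sketch of the endpoint portion, the snippet below assembles a CloudFormation template as a Python dictionary and serializes it to JSON. The image URI, role ARN, S3 path, and instance type are hypothetical, and the Studio domain and user profiles (`AWS::SageMaker::Domain`, `AWS::SageMaker::UserProfile`) are left out for brevity.

```python
import json


def build_endpoint_template(image_uri, model_data_url, role_arn,
                            instance_type="ml.m5.large"):
    """Return a CloudFormation template (as a dict) for a model, config, and endpoint."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Model": {
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ExecutionRoleArn": role_arn,
                    "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
                },
            },
            "EndpointConfig": {
                "Type": "AWS::SageMaker::EndpointConfig",
                "Properties": {
                    "ProductionVariants": [
                        {
                            "VariantName": "AllTraffic",
                            # Resolved by CloudFormation to the generated model name.
                            "ModelName": {"Fn::GetAtt": ["Model", "ModelName"]},
                            "InstanceType": instance_type,
                            "InitialInstanceCount": 1,
                            "InitialVariantWeight": 1.0,
                        }
                    ]
                },
            },
            "Endpoint": {
                "Type": "AWS::SageMaker::Endpoint",
                "Properties": {
                    "EndpointConfigName": {
                        "Fn::GetAtt": ["EndpointConfig", "EndpointConfigName"]
                    }
                },
            },
        },
    }


template = build_endpoint_template(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    model_data_url="s3://example-bucket/models/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
template_json = json.dumps(template, indent=2)
```

Keeping the template in version control is what provides the reproducibility the question asks for; the same file can be deployed unchanged to another account or Region.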

Exam trap

The trap here is that candidates confuse SageMaker Pipelines (a CI/CD service for ML steps) with infrastructure-as-code tools, forgetting that Pipelines does not manage underlying infrastructure resources like Studio domains or endpoint configurations.

How to eliminate wrong answers

Option B (SageMaker Pipelines) is wrong because it is a purpose-built CI/CD service for ML workflows (training, tuning, batch transforms), not for defining and provisioning infrastructure resources like Studio domains or endpoint configurations. Option C (AWS Step Functions) is wrong because it is a serverless workflow orchestration service for coordinating distributed applications and microservices, not for defining infrastructure resources declaratively. Option D (SageMaker Studio) is wrong because it is the web-based IDE for ML development, not a tool for defining or deploying infrastructure as code.

38

MCQeasy

Which SageMaker feature compiles a trained model into an optimized binary for a specific hardware target (e.g., Intel, ARM, NVIDIA, or edge devices) to improve inference performance?

A.SageMaker Model Monitor

B.SageMaker Neo

C.Amazon Elastic Inference

D.SageMaker Clarify

AnswerB

Neo compiles models to run efficiently on target hardware including edge devices.

Why this answer

SageMaker Neo is a model compilation service that optimizes models for specific hardware targets. Amazon Elastic Inference attaches GPU acceleration to endpoints but does not compile models. Model Monitor detects data and model quality drift in production, and SageMaker Clarify detects bias and explains predictions.

39

MCQmedium

A company uses SageMaker Model Registry to manage model versions. They want to automate the approval of models that pass automated evaluation, but require manual approval for others. Which Model Registry feature supports this workflow?

A.Approval workflow via pipeline Condition step

B.Cross-account deployment

C.Model versioning

D.Model lineage

AnswerA

A pipeline Condition step can set the model status to Approved or PendingManualApproval based on metrics.

Why this answer

Model Registry supports approval statuses (PendingManualApproval, Approved, Rejected). Automated evaluation can set status to Approved, while borderline cases can be set to PendingManualApproval.
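One way to express that gate is a small function that maps the evaluation metric to an approval status and builds the matching `UpdateModelPackage` request. The 0.90 threshold and the model package ARN are hypothetical.

```python
def choose_approval_status(accuracy, auto_approve_at=0.90):
    """Approve automatically above the threshold, otherwise queue for a human."""
    return "Approved" if accuracy >= auto_approve_at else "PendingManualApproval"


def build_status_update(model_package_arn, accuracy, auto_approve_at=0.90):
    """Return keyword arguments for the SageMaker UpdateModelPackage API."""
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": choose_approval_status(accuracy, auto_approve_at),
    }


ARN = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-group/3"
strong = build_status_update(ARN, accuracy=0.94)
borderline = build_status_update(ARN, accuracy=0.88)
# With AWS credentials configured:
#   boto3.client("sagemaker").update_model_package(**strong)
```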

40

MCQmedium

A company needs to serve real-time predictions from a large ensemble of three deep learning models, each requiring different inference environments (PyTorch, TensorFlow, MXNet). Which SageMaker endpoint type supports running multiple inference containers together?

A.Multi-model endpoint

B.Real-time endpoint with a single container

C.Multi-container endpoint

D.Asynchronous endpoint

AnswerC

Multi-container endpoints support multiple inference containers, each with its own environment.

Why this answer

Option C is correct because Amazon SageMaker multi-container endpoints allow you to run multiple inference containers (e.g., PyTorch, TensorFlow, MXNet) within a single endpoint, each handling different models or inference environments. This is achieved by deploying multiple containers behind a single endpoint with a serial or direct invocation pattern, enabling real-time predictions from the ensemble without managing separate endpoints.
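A sketch of how the three framework containers might be declared on one model with direct invocation, and how a request would pick a container. The hostnames, image URIs, S3 paths, and endpoint name are hypothetical.

```python
def build_multi_container_model(model_name, role_arn, containers, mode="Direct"):
    """Return CreateModel kwargs; `containers` is a list of (hostname, image, data_url)."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"ContainerHostname": host, "Image": image, "ModelDataUrl": data_url}
            for host, image, data_url in containers
        ],
        # "Direct" lets each request address one container; "Serial" chains them.
        "InferenceExecutionConfig": {"Mode": mode},
    }


def build_direct_invoke_args(endpoint_name, container_hostname, body):
    """Return InvokeEndpoint kwargs that route the request to one container."""
    return {
        "EndpointName": endpoint_name,
        "TargetContainerHostname": container_hostname,
        "ContentType": "application/json",
        "Body": body,
    }


ECR = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
model_request = build_multi_container_model(
    "ensemble-model",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    [
        ("pytorch-member", f"{ECR}/example-pytorch:latest", "s3://example-bucket/pt/model.tar.gz"),
        ("tensorflow-member", f"{ECR}/example-tensorflow:latest", "s3://example-bucket/tf/model.tar.gz"),
        ("mxnet-member", f"{ECR}/example-mxnet:latest", "s3://example-bucket/mx/model.tar.gz"),
    ],
)
invoke_args = build_direct_invoke_args("ensemble-endpoint", "tensorflow-member", b'{"x": [1, 2]}')
```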

Exam trap

The trap here is that candidates often confuse 'multi-model endpoint' (multiple models in one container) with 'multi-container endpoint' (multiple containers with different environments), leading them to incorrectly select Option A.

How to eliminate wrong answers

Option A is wrong because a multi-model endpoint hosts multiple models within a single container, not multiple containers with different inference environments; it uses a shared serving container and loads models dynamically from Amazon S3. Option B is wrong because a real-time endpoint with a single container can only run one inference environment, making it impossible to serve the three different deep learning frameworks required by the ensemble. Option D is wrong because an asynchronous endpoint is designed for large payloads and long processing times, not for real-time predictions, and it still uses a single container per endpoint.

41

Multi-Selecthard

A machine learning engineer is designing a SageMaker Pipeline that includes a training step, a processing step for evaluation, and a condition step to decide whether to register the model. The pipeline should support caching to avoid redundant runs when inputs haven't changed. Which three steps must have caching enabled? (Select THREE.)

Select 3 answers

A.Training step

B.Transform step (if used)

C.Processing step (evaluation)

D.Condition step

E.RegisterModel step

AnswersA, C, E

Training outputs a model artifact; caching avoids retraining if inputs unchanged.

Why this answer

For caching to avoid redundant runs, the steps that produce outputs that can be reused must have caching enabled. The processing step (evaluation) and training step both generate outputs that can be cached if their inputs (code, data, hyperparameters) remain the same. The condition step does not produce outputs to cache; it just branches.

The RegisterModel step typically registers metadata, but its inputs (model artifact, metrics) may be generated by previous steps; enabling caching on the RegisterModel step can also avoid re-running if the same model artifact is already registered.

42

MCQmedium

A company wants to serve 200 different PyTorch models. Each model is small (under 1 GB) and only a fraction are used at any time. To minimize cost and management overhead, which SageMaker inference option should be used?

A.Use batch transform for all models

B.Create a separate real-time endpoint for each model

C.Use a multi-container endpoint

D.Use a multi-model endpoint

AnswerD

Multi-model endpoints allow hosting hundreds of models on one endpoint, loading them dynamically and reducing cost.

Why this answer

A multi-model endpoint (MME) is the correct choice because it allows you to host multiple small PyTorch models (under 1 GB each) on a single endpoint, sharing the underlying compute instance. This minimizes cost by only paying for the active instances, and reduces management overhead since you don't need to create or manage separate endpoints for each model. SageMaker dynamically loads and unloads models from the container's memory based on invocation patterns, which is ideal for a scenario where only a fraction of the 200 models are used at any time.

Exam trap

The exam often tests the distinction between multi-container endpoints (for multi-stage inference pipelines) and multi-model endpoints (for hosting many independent models), and the trap here is that candidates confuse 'multiple containers' with 'multiple models' and incorrectly choose the multi-container option.

How to eliminate wrong answers

Option A is wrong because batch transform is designed for offline inference on a complete dataset, not for serving real-time predictions on demand, and it would incur cost for processing all models even when not needed. Option B is wrong because creating a separate real-time endpoint for each of the 200 models would be prohibitively expensive and introduce significant management overhead, as each endpoint requires its own compute instance and incurs hourly charges regardless of usage. Option C is wrong because a multi-container endpoint runs a fixed set of containers (up to 15), for example pre-processing, inference, and post-processing stages, so it cannot host 200 independent models; it does not support dynamic loading/unloading of different model artifacts.

43

MCQmedium

A team uses MLflow on SageMaker for experiment tracking. They want to automatically deploy the best-performing model from an MLflow run to a SageMaker endpoint for real-time inference. What is the MOST efficient way to achieve this?

A.Use AWS Step Functions to trigger an MLflow run and then call SageMaker CreateEndpoint

B.Use SageMaker Pipelines with the MLflow integration to register the model and deploy via a Transform step

C.Set up an EventBridge rule to trigger a Lambda that deploys the model whenever a new MLflow run is logged

D.Manually export the model artifact from MLflow and upload to S3, then create a SageMaker model and endpoint

AnswerB

SageMaker Pipelines can automate the workflow: get best run from MLflow, register model, and deploy using a Transform or endpoint deployment step.

Why this answer

The MLflow Model Registry can be integrated with SageMaker via the MLflow plugin for SageMaker, which allows direct deployment from the registry to an endpoint. Alternatively, using SageMaker Pipelines with the MLflow integration is more automated and production-grade.

44

MCQeasy

A machine learning engineer needs to deploy a model that requires less than 100 ms inference latency for real-time predictions. The model is a small PyTorch model that fits in a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?

A.Asynchronous inference endpoint

B.Real-time endpoint on ml.g4dn.xlarge

C.Batch transform job

D.Serverless inference with max concurrency set to 10

AnswerD

Serverless inference scales to zero when idle and charges only for the compute time used, making it cost-effective for low and variable traffic.

Why this answer

For low latency and occasional traffic, serverless inference is cost-effective because it scales to zero when not in use and charges per inference. Real-time endpoints incur cost even when idle, batch transform is for offline processing, and asynchronous inference has higher latency.
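For reference, a serverless endpoint is declared by giving the production variant a `ServerlessConfig` instead of an instance type and count. A sketch with a hypothetical model name and an assumed 2 GB memory size:

```python
VALID_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def build_serverless_variant(model_name, memory_mb=2048, max_concurrency=10):
    """Return one ProductionVariants entry that uses Serverless Inference."""
    if memory_mb not in VALID_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {VALID_MEMORY_MB}")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        # No InstanceType or InitialInstanceCount: capacity is managed by SageMaker.
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


serverless_variant = build_serverless_variant("small-pytorch-model", max_concurrency=10)
# With AWS credentials configured:
#   boto3.client("sagemaker").create_endpoint_config(
#       EndpointConfigName="example-serverless-config",
#       ProductionVariants=[serverless_variant])
```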

45

MCQmedium

A company uses SageMaker Model Registry to manage model versions. They want to enforce that only models with an 'Approved' status can be deployed to production endpoints. How can they enforce this?

A.Use AWS Lambda to check the model status during deployment

B.Set IAM policies with a condition on sagemaker:ModelVersionStatus

C.Use SageMaker Pipelines to deploy only approved models

D.Configure SCPs to block deployment of unapproved models

AnswerB

IAM condition keys allow restricting CreateEndpointConfig to only approved models.

Why this answer

SageMaker Model Registry supports approval workflows. By using IAM policies that conditionally allow deployment only when the model version status is 'Approved', the company can enforce governance.

Practice this question →

46

Multi-Selectmedium

A company wants to test a new ML model in production with minimal risk before shifting full traffic. They have an existing real-time endpoint serving model version A. They need to route 5% of live traffic to model version B and monitor performance for 24 hours. Which TWO steps should they take? (Choose TWO.)

Select 2 answers

A.Deploy model B using SageMaker batch transform and compare offline metrics

B.Configure a CloudWatch alarm to roll back if error rate exceeds a threshold

C.Use SageMaker's blue/green deployment and shift 5% traffic initially

D.Create a new endpoint with model B and use Amazon Route 53 to split 5% of traffic

E.Update the existing endpoint to include two production variants: variant A with 95% traffic and variant B with 5% traffic

AnswersB, E

CloudWatch alarms can be set on endpoint metrics (e.g., error rate, latency) to trigger automatic rollback or alert the team.

Why this answer

Blue/green deployment creates a new endpoint with the new model and swaps all traffic at once, not a gradual shift. Canary deployment routes a small percentage of traffic to the new version for testing. SageMaker supports canary deployments by updating the endpoint with multiple production variants and specifying initial traffic weights.

The existing endpoint should be updated to include both variants.

Practice this question →

47

MCQeasy

A company needs to deploy a model that processes large payloads (up to 1 GB) asynchronously. The results should be written to S3, and the team needs SNS notifications upon completion. Which SageMaker inference option is MOST suitable?

A.Asynchronous Inference

B.Batch Transform

C.Real-time endpoint

D.Serverless Inference

AnswerA

Designed for large payloads, writes results to S3, and can send SNS notifications.

Why this answer

Asynchronous Inference is designed for large payloads, processes requests asynchronously, and supports SNS notifications on completion.
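The S3 output location and SNS notifications mentioned above are set in the endpoint configuration. The following is a minimal sketch of the request body for the SageMaker CreateEndpointConfig API; the bucket, topic ARNs, model name, and instance type are hypothetical placeholders, and the boto3 call is shown only as a comment.

```python
import json


def build_async_endpoint_config(config_name, model_name, output_s3_uri,
                                success_topic_arn, error_topic_arn,
                                instance_type="ml.m5.xlarge"):
    """Return keyword arguments for CreateEndpointConfig with async inference."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,  # assumed instance type
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            "OutputConfig": {
                # Results are written here instead of being returned inline.
                "S3OutputPath": output_s3_uri,
                # SNS topics notified when a request succeeds or fails.
                "NotificationConfig": {
                    "SuccessTopic": success_topic_arn,
                    "ErrorTopic": error_topic_arn,
                },
            },
        },
    }


# All names and ARNs below are hypothetical.
request = build_async_endpoint_config(
    config_name="async-config-example",
    model_name="large-payload-model",
    output_s3_uri="s3://example-bucket/async-results/",
    success_topic_arn="arn:aws:sns:us-east-1:111122223333:inference-success",
    error_topic_arn="arn:aws:sns:us-east-1:111122223333:inference-error",
)
print(json.dumps(request, indent=2))
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("sagemaker").create_endpoint_config(**request)
```

Callers then submit work with InvokeEndpointAsync, passing an S3 location for the input payload rather than the payload itself.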

Practice this question →

48

MCQhard

A company deploys a large NLP model on a SageMaker real-time endpoint using an ml.p3.2xlarge instance. To reduce inference cost without sacrificing throughput, they want to compile the model for their target hardware. Which service should they use?

A.SageMaker Neo

B.Triton Inference Server on SageMaker

C.Amazon Elastic Inference

D.SageMaker Inference Recommender

AnswerA

Neo compiles models for target hardware, optimizing for performance and cost.

Why this answer

SageMaker Neo is the correct service because it compiles trained machine learning models into an optimized binary for a specific target hardware (e.g., ml.p3.2xlarge with NVIDIA GPUs). This reduces inference latency and cost by applying hardware-specific optimizations such as kernel fusion and memory layout tuning, while preserving the original model's throughput. The compilation process uses Apache TVM under the hood to generate efficient code for the target instance type.
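As an illustration of what "compiling for target hardware" looks like in practice, here is a hedged sketch of a CreateCompilationJob request. The job name, role ARN, S3 URIs, and input shape are hypothetical; `ml_p3` is the Neo target device that corresponds to the ml.p3 instance family in the question.

```python
import json


def build_compilation_job(job_name, role_arn, model_s3_uri, output_s3_uri,
                          framework, input_shape, target_device):
    """Return keyword arguments for SageMaker's CreateCompilationJob API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Neo expects the input names and shapes as a JSON string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": framework,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


# Hypothetical values for an NLP model with a (batch, sequence) input.
job = build_compilation_job(
    job_name="nlp-model-neo-p3",
    role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    model_s3_uri="s3://example-bucket/models/nlp/model.tar.gz",
    output_s3_uri="s3://example-bucket/models/nlp/compiled/",
    framework="PYTORCH",
    input_shape={"input0": [1, 128]},
    target_device="ml_p3",
)
print(json.dumps(job, indent=2))
# With boto3 available: boto3.client("sagemaker").create_compilation_job(**job)
```

The compiled artifact lands in the output location and is then deployed like any other model, using a Neo-compatible inference container for the same instance family.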

Exam trap

The exam often tests the distinction between model compilation (Neo) and runtime serving optimizations (Triton) or hardware acceleration (Elastic Inference), leading candidates to confuse a compile-time optimization service with a runtime serving framework or a hardware add-on.

How to eliminate wrong answers

Option B is wrong because Triton Inference Server on SageMaker is a model serving framework that supports multiple backends (e.g., TensorRT, ONNX Runtime) and dynamic batching, but it does not perform ahead-of-time model compilation for a specific hardware target; it optimizes runtime execution, not compile-time optimization. Option C is wrong because Amazon Elastic Inference attaches a separate accelerator to a CPU instance for cost savings, but it is not a model compilation service; it is a hardware attachment that does not compile the model for the target instance. Option D is wrong because SageMaker Inference Recommender is a tool for benchmarking and recommending instance types and endpoint configurations based on load tests, not for compiling or optimizing the model binary for a specific hardware target.

Practice this question →

49

Multi-Selecthard

A team is optimizing a deep learning model for deployment on SageMaker using SageMaker Neo. Which THREE of the following are valid optimization techniques that Neo can apply? (Choose THREE.)

Select 3 answers

A.Pruning (removing redundant weights)

B.Operator fusion (combining adjacent operations)

C.Knowledge distillation

D.Hyperparameter tuning

E.Quantisation (e.g., FP16, INT8)

AnswersA, B, E

Neo can prune model weights to reduce model size and computational cost.

Why this answer

SageMaker Neo performs hardware-specific optimizations including quantisation (reducing precision), pruning (removing redundant weights), and operator fusion (combining operations). Knowledge distillation is a training-time technique, not part of Neo. Hyperparameter tuning is done by SageMaker Tuning jobs, not Neo.

Practice this question →

50

MCQmedium

A team wants to use AWS Step Functions to orchestrate a retraining workflow that is triggered when new data arrives in an S3 bucket. They also need to monitor model drift. Which event-driven approach should they use?

A.Configure EventBridge to capture S3 PutObject events and target an AWS Step Functions state machine that runs the retraining pipeline

B.Use a cron-based Step Function schedule that checks for new data every hour

C.Set up an S3 event notification to invoke a Lambda function that starts a SageMaker training job directly

D.Use SageMaker Pipelines with a schedule trigger

AnswerA

EventBridge triggers the Step Functions workflow upon new data arrival, allowing orchestration of retraining and drift monitoring.

Why this answer

Option A is correct because Amazon EventBridge can receive S3 object-creation events (once EventBridge notifications are enabled on the bucket) and invoke a Step Functions state machine directly as a rule target. This creates a fully event-driven, serverless orchestration for the retraining pipeline without polling or custom code. Step Functions then coordinates the retraining steps, including model drift monitoring, in a reliable and auditable manner.
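The event-driven trigger described above comes down to an EventBridge rule whose event pattern matches new objects in the training-data bucket. The sketch below builds such a pattern; the bucket name and key prefix are hypothetical, and `would_match` is a simplified local check for illustration only, not EventBridge's matching engine.

```python
import json


def s3_object_created_pattern(bucket, key_prefix):
    """Event pattern for S3 'Object Created' events delivered via EventBridge."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }


def would_match(pattern, event):
    """Simplified check covering only the fields used in the pattern above."""
    detail = pattern["detail"]
    prefix = detail["object"]["key"][0]["prefix"]
    return (
        event.get("source") in pattern["source"]
        and event.get("detail-type") in pattern["detail-type"]
        and event["detail"]["bucket"]["name"] in detail["bucket"]["name"]
        and event["detail"]["object"]["key"].startswith(prefix)
    )


pattern = s3_object_created_pattern("example-training-data", "incoming/")
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-training-data"},
        "object": {"key": "incoming/2024-06-01.csv"},
    },
}
print(json.dumps(pattern))
print(would_match(pattern, sample_event))
```

The rule's target would be the ARN of the Step Functions state machine, together with an IAM role that allows EventBridge to start executions.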

Exam trap

The trap here is that candidates often confuse S3 event notifications (which directly invoke Lambda) with EventBridge (which can target Step Functions), and they overlook that Step Functions is the recommended orchestration service for complex ML workflows, not just Lambda or SageMaker Pipelines alone.

How to eliminate wrong answers

Option B is wrong because a cron-based schedule polls for new data on a fixed interval, which is not event-driven; it introduces latency and unnecessary invocations when no new data has arrived, and it does not react immediately to S3 events. Option C is wrong because while S3 event notifications can invoke a Lambda function, this approach bypasses Step Functions orchestration, making it harder to manage complex retraining workflows, error handling, and monitoring model drift as part of a coordinated pipeline. Option D is wrong because SageMaker Pipelines with a schedule trigger is not event-driven; it relies on a time-based trigger rather than reacting to new data arrival in S3, and it lacks the flexibility of Step Functions for integrating with other AWS services for drift monitoring.

Practice this question →

51

Multi-Selecthard

A team uses SageMaker Pipelines to train and evaluate a model. They want to run the training step only if the data quality check passes, otherwise skip. Which TWO pipeline step types are required? (Select TWO.)

Select 2 answers

A.RegisterModel step

B.Condition step

C.Processing step

D.Training step

E.Transform step

AnswersB, D

Evaluates the condition and determines the next step.

Why this answer

The Condition step (B) is required because SageMaker Pipelines uses a Condition step to evaluate a boolean expression—such as whether a data quality check passed—and then conditionally execute subsequent steps. The Training step (D) is required because it is the step that actually runs the model training job, and it must be placed inside the 'If' branch of the Condition step to run only when the condition is true.

Exam trap

The trap here is that candidates often think a Processing step (C) can handle conditional logic because it runs custom code, but SageMaker Pipelines requires a dedicated Condition step for branching; the Processing step is only for data processing, not for pipeline control flow.

Practice this question →

52

MCQhard

A team has a large deep learning model that needs to be deployed for real-time inference with GPU acceleration. They want to use the Triton Inference Server on SageMaker to maximize throughput. Which instance type and configuration should they choose?

A.Deploy on an ml.g4dn.xlarge instance using the SageMaker Triton Inference Server container

B.Deploy on an ml.c5.2xlarge instance using a PyTorch container

C.Deploy on an ml.p3.2xlarge instance using the SageMaker built-in XGBoost container

D.Deploy on an ml.m5.large instance with a standard TensorFlow serving container

AnswerA

The ml.g4dn instance has NVIDIA GPU, and the Triton container is optimized for high throughput inference.

Why this answer

Option A is correct because the Triton Inference Server is specifically designed for high-performance inference on large deep learning models, supporting GPU acceleration and dynamic batching to maximize throughput. The ml.g4dn.xlarge instance provides a cost-effective GPU (T4) with sufficient memory for many models, and SageMaker's pre-built Triton container enables seamless deployment with features like model concurrency and request scheduling.

Exam trap

The exam often tests the misconception that any GPU instance (like ml.p3) is suitable for deep learning inference, but the key is matching the container (Triton) to the workload, not just the instance type, and avoiding CPU-only instances for GPU-accelerated tasks.

How to eliminate wrong answers

Option B is wrong because ml.c5.2xlarge is a CPU-only instance (no GPU), which cannot provide the GPU acceleration required for real-time inference of large deep learning models, leading to high latency and low throughput. Option C is wrong because the SageMaker built-in XGBoost container is designed for gradient-boosted tree models, not deep learning models, and ml.p3.2xlarge (V100 GPU) is overkill for XGBoost and incompatible with the container. Option D is wrong because ml.m5.large is a general-purpose CPU instance with no GPU, and the standard TensorFlow Serving container lacks the advanced features of Triton (e.g., dynamic batching, model ensembles, concurrent model execution) needed to maximize throughput for large models.

Practice this question →

53

MCQmedium

A company has 200 small models (each ~100 MB) that serve different customers. They want to minimize costs while keeping low latency for each customer. Which SageMaker deployment approach is MOST suitable?

A.Deploy each model on a separate real-time endpoint

B.Use a single multi-model endpoint (MME) on an ml.c5.large instance

C.Use a multi-container endpoint with one container per model

D.Use serverless inference for each model

AnswerB

MME hosts many models on shared instances, reducing cost while maintaining low latency for small models.

Why this answer

A single multi-model endpoint (MME) on an ml.c5.large instance is the most suitable because it can host all 200 small models (each ~100 MB) behind one endpoint, loading models from Amazon S3 on demand and unloading them when memory is needed for others. This minimizes costs by sharing a single instance across all models while maintaining low latency for each customer, as the models are small enough to be cached in memory and loaded quickly on first use.
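To make the multi-model mechanics concrete, here is a minimal sketch of the two pieces involved: a container definition in `MultiModel` mode that points at an S3 prefix, and an invocation that names the customer's artifact through `TargetModel`. The image URI, bucket, endpoint name, and the one-artifact-per-customer naming scheme are assumptions for illustration.

```python
import json


def mme_container(image_uri, model_prefix_s3_uri):
    """Container definition for CreateModel; all artifacts live under one S3 prefix."""
    return {
        "Image": image_uri,
        "ModelDataUrl": model_prefix_s3_uri,
        "Mode": "MultiModel",
    }


def invoke_args(endpoint_name, customer_id, payload):
    """Keyword arguments for the runtime InvokeEndpoint call for one customer."""
    return {
        "EndpointName": endpoint_name,
        # Path of the artifact relative to the S3 prefix (assumed naming scheme).
        "TargetModel": f"{customer_id}.tar.gz",
        "ContentType": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),
    }


container = mme_container(
    "111122223333.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    "s3://example-bucket/customer-models/",
)
args = invoke_args("customer-mme", "customer-042", {"features": [1.0, 2.0]})
print(container["Mode"], args["TargetModel"])
# With boto3 available:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**args)
```

The first request for a given `TargetModel` pays the load time from S3; later requests are served from the instance's cache until the model is evicted.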

Exam trap

The exam often tests the misconception that multi-container endpoints are equivalent to multi-model endpoints, but the trap here is that multi-container endpoints are for chaining containers (e.g., pre-processing + inference) rather than hosting many independent models, leading candidates to overcomplicate the solution.

How to eliminate wrong answers

Option A is wrong because deploying each model on a separate real-time endpoint would require 200 endpoints, each with its own instance, leading to significantly higher costs from idle resources, with no benefit for such small models. Option C is wrong because a multi-container endpoint supports at most 15 containers and is designed for combining different containers (e.g., pre-processing and inference) rather than hosting many independent models; it cannot hold 200 separate models. Option D is wrong because serverless inference has a maximum concurrency limit (default 50, can be increased to 200) and a cold start latency that can exceed acceptable thresholds for real-time customer requests, plus it is typically more expensive per invocation for high-frequency inference compared to a dedicated instance with MME.

Practice this question →

54

Multi-Selectmedium

A company wants to deploy a new model using a canary deployment strategy on SageMaker. Which two actions should they take? (Select TWO.)

Select 2 answers

A.Register both models in the Model Registry with 'Approved' status

B.Use SageMaker Model Monitor to compare model performance

C.Create a new endpoint with two production variants

D.Enable data capture on the endpoint

E.Set the initial traffic weights for the variants (e.g., 95% and 5%)

AnswersC, E

Two variants enable traffic splitting between current and new models.

Why this answer

To implement a canary deployment, create two production variants (current and new) with initial traffic weights (e.g., 95% and 5%), then update the variant weights on the running endpoint to shift traffic gradually toward the new model as confidence grows.
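The two-variant setup can be sketched as the `ProductionVariants` list passed to CreateEndpointConfig. Model names, variant names, and the instance type below are hypothetical; the helper `traffic_share` only illustrates that SageMaker routes traffic in proportion to each weight divided by the sum of weights.

```python
def canary_variants(current_model, new_model, canary_weight=5,
                    instance_type="ml.m5.xlarge"):
    """Two production variants: the current model and a small canary."""
    return [
        {
            "VariantName": "current",
            "ModelName": current_model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": float(100 - canary_weight),
        },
        {
            "VariantName": "canary",
            "ModelName": new_model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": float(canary_weight),
        },
    ]


def traffic_share(variants):
    """Fraction of requests each variant receives (weight / total weight)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


variants = canary_variants("model-v1", "model-v2", canary_weight=5)
share = traffic_share(variants)
print(share)
# With boto3 available:
#   boto3.client("sagemaker").create_endpoint_config(
#       EndpointConfigName="canary-config", ProductionVariants=variants)
```

Because weights are relative, 95 and 5 behave the same as 19 and 1; using percentages simply keeps the configuration easy to read.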

Practice this question →

55

MCQmedium

A team has a SageMaker Pipeline that trains a model and registers it in the Model Registry. They want to automate the deployment of the approved model to a staging environment. Which event-driven approach should they use?

A.Use an SQS queue to store approval messages and have a cron job process them

B.Set up a CloudWatch alarm on the Model Registry's ApprovalStatus metric

C.Use Amazon EventBridge to listen for Model Registry approval events and trigger an AWS Lambda function that deploys the model

D.Configure an AWS Step Functions state machine to poll the Model Registry every minute

AnswerC

This is a serverless, event-driven pattern that reacts immediately to approval.

Why this answer

Amazon EventBridge receives SageMaker Model Registry state-change events (for example, when a model version's approval status becomes Approved). A rule on those events can invoke a Lambda function that deploys the model to a staging endpoint. Step Functions could run the deployment itself, but the trigger should still be EventBridge rather than polling.

CloudWatch alarms are for monitoring metrics.
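A hedged sketch of this pattern follows: an EventBridge event pattern for model-package approval, plus a Lambda-style handler that extracts what a deployment would need. The model package group name, staging endpoint name, and the shape of the handler's return value are hypothetical, and the actual deployment calls are left as a comment.

```python
STAGING_ENDPOINT = "staging-endpoint"  # hypothetical endpoint name


def approval_pattern(model_package_group):
    """Event pattern matching approved versions in one model package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def handler(event, context=None):
    """Lambda entry point: decide whether and what to deploy."""
    detail = event.get("detail", {})
    if detail.get("ModelApprovalStatus") != "Approved":
        return {"deploy": False}
    # A real function would now create a model and endpoint config from the
    # model package ARN and create or update the staging endpoint.
    return {
        "deploy": True,
        "model_package_arn": detail["ModelPackageArn"],
        "endpoint_name": STAGING_ENDPOINT,
    }


sample_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "example-group",
        "ModelApprovalStatus": "Approved",
        "ModelPackageArn": "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/3",
    },
}
plan = handler(sample_event)
print(plan)
```

Filtering on the approval status in the rule itself means the function is not invoked for rejected or pending versions; the check inside the handler is a second line of defense.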

Practice this question →

56

Multi-Selectmedium

A data scientist needs to deploy an anomaly detection model that processes large payloads (up to 10 MB per request) and expects inference times of up to 10 minutes. The team wants to minimize cost and pay only while requests are being processed. Which SageMaker inference option meets these requirements?

Select 1 answer

A.Batch transform

B.Real-time endpoint

C.Serverless inference

D.Asynchronous inference endpoint

AnswerD

Asynchronous inference can handle large payloads (up to 1 GB) and long processing times (up to 1 hour), and its instance count can scale to zero when the queue is empty, so cost accrues only while requests are being processed.

Why this answer

Option D is correct because asynchronous inference endpoints are designed for large payloads (up to 1 GB) and long processing times (up to 1 hour), making them ideal for this 10 MB, 10-minute inference workload. With auto scaling configured, the endpoint can scale down to zero instances when there are no requests to process, so the team pays only while inference is running, which minimizes cost.

Exam trap

The trap here is that candidates often confuse serverless inference with asynchronous inference, but serverless inference is limited to small payloads (4 MB) and short processing times (60 seconds), while asynchronous inference supports up to 1 GB and 1 hour, making it the correct choice for large, long-running requests.

Practice this question →

57

MCQmedium

A startup wants to deploy a containerized ML application that includes both a model inference server and a preprocessing component in the same endpoint. Which SageMaker endpoint type supports running multiple containers?

A.Asynchronous Inference

B.Multi-container endpoint

C.Multi-model endpoint

D.Real-time endpoint

AnswerB

Supports multiple containers sharing the same instance, e.g., preprocessing and inference.

Why this answer

Multi-container endpoints allow running multiple containers, enabling preprocessing and inference in the same endpoint.

Practice this question →

58

MCQhard

A team deploys a model on a SageMaker real-time endpoint using an ml.m5.xlarge instance. The model has high latency due to a large neural network. The team wants to reduce latency without changing the model code. Which option should they use?

A.Increase the instance size to ml.m5.4xlarge

B.Attach Amazon Elastic Inference to the endpoint

C.Use SageMaker Neo to compile the model

D.Switch to a GPU instance like ml.g4dn.xlarge

AnswerB

Elastic Inference provides GPU acceleration at lower cost than a full GPU instance, reducing inference latency.

Why this answer

Amazon Elastic Inference attaches a fixed amount of GPU acceleration to an EC2 instance, providing cost-effective acceleration for deep learning inference without needing a full GPU instance.

Practice this question →

59

MCQeasy

A company wants to update an existing SageMaker real-time endpoint to serve a new model version. They need to route a small percentage of traffic to the new version initially and monitor for errors before switching fully. Which deployment pattern supports this?

A.Shadow testing

B.A/B testing with traffic splitting

C.Canary deployment with weighted production variants

D.Blue/green deployment

AnswerC

Canary deployment uses weighted variants to send a small percentage of traffic to the new model, enabling monitoring.

Why this answer

Option C is correct because SageMaker real-time endpoints support canary deployments by configuring multiple production variants with weighted traffic distribution. You can assign a small weight (e.g., 5%) to the new model version variant and 95% to the existing one, then monitor CloudWatch metrics for errors before shifting all traffic to the new variant. This matches the requirement for a gradual, monitored rollout.

Exam trap

The exam often tests the distinction between canary deployment and blue/green deployment, where candidates mistakenly choose blue/green because it sounds like a safe rollout, but it lacks the gradual traffic shifting required for monitoring a small percentage first.

How to eliminate wrong answers

Option A is wrong because shadow testing (also called mirroring) sends a copy of live traffic to the new model but returns only the existing model's responses to callers; it evaluates the new version without routing any user-facing traffic to it, so it does not match a rollout that serves a small percentage of real requests. Option B is wrong because A/B testing with traffic splitting is a broader concept that could be implemented via weighted variants, but the specific pattern described in the question (routing a small percentage of traffic to a new version and monitoring before switching fully) is precisely a canary deployment, not just any A/B test. Option D is wrong because blue/green deployment, as the term is used here, switches all traffic at once from the old (blue) to the new (green) environment, which does not allow for a small percentage of traffic to be routed initially for monitoring.

Practice this question →

60

MCQmedium

An organization wants to ensure that only approved model versions can be deployed to production. They use the SageMaker Model Registry to track model versions. How can they enforce that only approved models are deployed?

A.Manually review each model before deployment

B.Use SageMaker Model Monitor to check model quality after deployment

C.Use IAM policies to restrict deployment to only Approved model versions

D.Store model metadata in a DynamoDB table and check it before deployment

AnswerC

IAM policies can be written to allow SageMaker CreateEndpoint only for models with an Approved approval status, which is best practice.

Why this answer

Option C is correct because AWS IAM policies can be used to conditionally restrict SageMaker API actions (e.g., CreateEndpointConfig, CreateModel) based on the model version's approval status. By evaluating the `sagemaker:ModelPackageApprovalStatus` condition key in an IAM policy, you can enforce that only model versions with an `Approved` status can be deployed, providing a native, automated, and auditable enforcement mechanism without manual intervention or external dependencies.

Exam trap

The trap here is that candidates confuse SageMaker Model Monitor (post-deployment monitoring) with pre-deployment approval enforcement, or they assume custom external checks (DynamoDB) are necessary when SageMaker provides native IAM-based conditional enforcement.

How to eliminate wrong answers

Option A is wrong because manual review is not a technical enforcement mechanism; it introduces human error, lacks auditability, and does not prevent unauthorized deployments via API or automation. Option B is wrong because SageMaker Model Monitor is a post-deployment tool that detects data drift and quality issues after the model is already serving traffic; it cannot prevent the deployment of unapproved models. Option D is wrong because storing metadata in DynamoDB and checking it before deployment requires custom code, introduces latency, and is not a native SageMaker enforcement mechanism; it also bypasses the built-in approval tracking in the Model Registry.

Practice this question →

61

Multi-Selectmedium

A team wants to deploy a single SageMaker real-time endpoint that serves both a PyTorch model for NLP and a TensorFlow model for image classification. Each model requires a different inference container. Which two features can they use together to achieve this? (Select TWO.)

Select 2 answers

A.Multi-model endpoint

B.Multi-container endpoint

C.Production variants

D.SageMaker inference components

E.SageMaker Neo compilation

AnswersB, D

Multi-container endpoints can run different containers for different models.

Why this answer

A multi-container endpoint allows running multiple containers (e.g., PyTorch and TensorFlow) on the same endpoint. With inference components, each container can be associated with a specific model, and the routing logic directs requests to the appropriate container based on the model name.

Practice this question →

62

MCQmedium

A company uses SageMaker Model Registry to manage model versions. They have a cross-account deployment requirement: models approved in the development account must be deployed to a production account. Which approach is the MOST secure and recommended?

A.Export the model from Model Registry to a tar.gz file and upload to the production account manually

B.Copy the model artifact to a public S3 bucket and then create the model in the production account

C.Use a Lambda function in the development account to call CreateEndpoint in the production account using cross-account IAM roles

D.Share the model package group from the development account to the production account using AWS RAM, then create a model version in the production account

AnswerD

AWS Resource Access Manager allows sharing model packages across accounts securely, and then the production account can deploy.

Why this answer

Cross-account deployment can be achieved by sharing the model package across accounts using AWS Resource Access Manager (RAM) or by exporting the model artifact to an S3 bucket with appropriate cross-account permissions, then creating the model in the target account.

Practice this question →

63

MCQeasy

A company uses SageMaker Neo to compile a trained model for deployment on edge devices. What is the primary benefit of using Neo?

A.It monitors model drift in production

B.It reduces model size and improves inference speed on target hardware

C.It automatically retrains the model on new data

D.It provides a serverless inference endpoint

AnswerB

Neo uses hardware-specific optimizations like kernel fusion and quantization to improve performance.

Why this answer

SageMaker Neo optimizes models for specific hardware architectures (e.g., ARM, Intel, NVIDIA) to achieve faster inference and lower memory footprint.

Practice this question →

64

MCQeasy

A data science team needs to deploy a trained PyTorch model for real-time inference with sub-100ms latency. The model fits on a single GPU. Which SageMaker inference option is MOST cost-effective while meeting the latency requirement?

A.SageMaker Batch Transform

B.SageMaker real-time endpoint on ml.g4dn.xlarge

C.SageMaker Async Inference

D.SageMaker Serverless Inference

AnswerB

Why this answer

SageMaker real-time endpoints provide dedicated, persistent instances that can handle synchronous inference with sub-100ms latency. The ml.g4dn.xlarge instance includes a single NVIDIA T4 GPU, which is sufficient for the model size and offers the lowest cost among GPU instances that meet the latency requirement. This option balances performance and cost for real-time, low-latency inference.

Exam trap

The trap here is that candidates often choose SageMaker Serverless Inference for its cost-saving potential, but they overlook the cold start latency and lack of GPU support, which makes it unsuitable for real-time, sub-100ms inference with PyTorch models.

How to eliminate wrong answers

Option A is wrong because SageMaker Batch Transform is designed for asynchronous, offline inference on large datasets, not for real-time sub-100ms latency; it processes data in batches and returns results only after the job completes. Option C is wrong because SageMaker Async Inference queues inference requests and processes them asynchronously, which introduces unpredictable latency and is not suitable for sub-100ms real-time requirements. Option D is wrong because SageMaker Serverless Inference auto-scales from zero and has a cold start latency that can exceed 100ms, especially for GPU-based models, making it unsuitable for strict real-time latency demands.

Practice this question →

65

Multi-Selectmedium

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but wants to minimize costs by using a smaller attached GPU. Which options can they use? (Select TWO.)

Select 2 answers

A.Amazon Elastic Inference

B.SageMaker Neo compilation

C.Use a smaller GPU instance like ml.g4dn.xlarge instead of ml.p3.2xlarge

D.Quantize the model to INT8 precision

E.Use SageMaker serverless inference with GPU

AnswersA, C

Elastic Inference attaches a GPU accelerator to a CPU instance, providing GPU acceleration at lower cost.

Why this answer

Amazon Elastic Inference (Option A) allows you to attach a smaller, configurable GPU acceleration resource to a SageMaker endpoint, enabling GPU-accelerated inference without the cost of a full GPU instance. This directly meets the requirement of minimizing costs by using a smaller attached GPU.

Exam trap

The trap here is that candidates may confuse SageMaker Neo compilation (a model optimization technique) with hardware acceleration, or mistakenly think SageMaker serverless inference supports GPU, when in fact it only supports CPU-based compute.

Practice this question →

66

MCQmedium

A financial services company needs to enforce that only approved model versions are deployed to production. They use SageMaker Model Registry to track versions, with an approval workflow. Which action must they take in the model registry to ensure only approved models can be deployed?

A.Set the model version status to 'Approved' in the Model Registry

B.Tag the model version as 'production-ready'

C.Manually move the model artifact to a production S3 bucket

D.Use AWS IAM policies to restrict deployment to specific model ARNs

AnswerA

The Approved status is the Model Registry's built-in signal that a version is cleared for deployment; deployment automation and access policies gate on it.

Why this answer

Option A is correct because the SageMaker Model Registry uses a status field to control the lifecycle of model versions. By setting the model version status to 'Approved', the company can enforce that only approved models are deployable, as SageMaker's deployment APIs (e.g., CreateModel, CreateEndpointConfig) can be configured to require an 'Approved' status. This integrates with the approval workflow, ensuring that unapproved or pending versions are blocked from production deployment.

Exam trap

The trap here is that candidates may confuse tagging (a flexible but non-enforceable mechanism) with the Model Registry's built-in approval status, which is specifically designed to enforce deployment gates in SageMaker.

How to eliminate wrong answers

Option B is wrong because tagging a model version as 'production-ready' is a metadata label that does not enforce any deployment restrictions; SageMaker does not natively use tags to gate deployments. Option C is wrong because manually moving the model artifact to a production S3 bucket bypasses the Model Registry's approval workflow entirely, offering no governance or audit trail. Option D is wrong because while IAM policies can restrict deployment to specific model ARNs, they do not leverage the Model Registry's approval status; this approach would require manual ARN management and does not integrate with the approval workflow.

Practice this question →

67

MCQmedium

A machine learning team has a model that needs to serve predictions with very low latency (under 10 ms) for a real-time web application. The model is a small ensemble of three neural networks that fits in memory. Which SageMaker inference option is MOST appropriate?

A.SageMaker batch transform

B.SageMaker real-time endpoint

C.SageMaker asynchronous inference

D.SageMaker serverless inference

AnswerB

Real-time endpoints are always running and can achieve sub-10 ms latency with appropriately sized instances.

Why this answer

SageMaker real-time endpoints are designed for low-latency, synchronous inference, making them the best fit for a model that must serve predictions in under 10 ms. Since the ensemble of three neural networks fits in memory, a real-time endpoint can keep the model loaded and respond to each request with minimal overhead, typically using HTTPS and the SageMaker InvokeEndpoint API.

Exam trap

The trap here is that candidates confuse 'low latency' with 'serverless' or 'asynchronous' options, not realizing that serverless inference has cold starts and asynchronous inference adds queueing delays, both of which break the sub-10 ms requirement.

How to eliminate wrong answers

Option A is wrong because SageMaker batch transform is an asynchronous, offline inference option that processes large datasets in batches and does not provide real-time, low-latency responses. Option C is wrong because SageMaker asynchronous inference is designed for requests with large payloads or long processing times, and it introduces queueing and callback mechanisms that add latency beyond the 10 ms requirement. Option D is wrong because SageMaker serverless inference auto-scales from zero and has a cold-start latency that can exceed 10 ms, making it unsuitable for sub-10 ms real-time predictions.


68

MCQmedium

A team needs to deploy a new model version to production while minimizing risk. They want to route 5% of live traffic to the new model and 95% to the current model, and then gradually increase the new model's traffic. Which SageMaker deployment pattern should they use?

A.Shadow testing

B.Blue/green deployment

C.A/B testing with production variants

D.Canary deployment using production variants

AnswerD

Canary deployment with production variants allows gradual traffic shift from 5% to 100%.

Why this answer

Canary deployment uses production variants with weighted traffic allocation. By setting the new model variant to 5% and the current variant to 95%, and later adjusting the weights, the team can gradually shift traffic. Blue/green is a full switch, and shadow testing duplicates traffic without affecting live responses. A/B testing also uses weighted variants, but its purpose is to compare variants rather than to roll one out progressively.
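The weighted split can be sketched as a small helper that builds the `DesiredWeightsAndCapacities` list accepted by the SageMaker `update_endpoint_weights_and_capacities` call. The variant names and the function name `canary_weights` are hypothetical, and the rollout schedule in the comment is only an illustration.

```python
def canary_weights(new_share, current_variant="current-model", new_variant="new-model"):
    """Build variant weights that send `new_share` of traffic to the new model."""
    if not 0.0 <= new_share <= 1.0:
        raise ValueError("new_share must be between 0 and 1")
    # Variant weights are relative; using fractions that sum to 1 keeps them readable.
    return [
        {"VariantName": current_variant, "DesiredWeight": round(1.0 - new_share, 4)},
        {"VariantName": new_variant, "DesiredWeight": round(new_share, 4)},
    ]


# Applying one stage with a real client (needs AWS credentials and an endpoint):
# import boto3
# boto3.client("sagemaker").update_endpoint_weights_and_capacities(
#     EndpointName="my-endpoint",
#     DesiredWeightsAndCapacities=canary_weights(0.05),
# )
# Later stages might call canary_weights(0.25), canary_weights(0.5), canary_weights(1.0).
```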


69

MCQeasy

An ML engineer needs to compile a trained TensorFlow model to run efficiently on a target edge device with an ARM CPU. Which AWS service should they use?

A.SageMaker Debugger

B.AWS Inferentia

C.SageMaker Neo

D.Amazon Elastic Inference

AnswerC

Neo optimizes models for target hardware, including ARM CPUs, using its compiler.

Why this answer

SageMaker Neo compiles trained models for specific hardware targets, including ARM CPUs, to optimize inference performance.
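As a rough illustration, a Neo compilation job for a TensorFlow model targeting a Linux ARM64 device could be described with a request like the one below, which mirrors the parameters of the SageMaker `create_compilation_job` call. The job name, role ARN, S3 locations, input tensor name and shape, and the choice of `LINUX`/`ARM64` are all assumptions for the example.

```python
import json


def neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri, input_shape):
    """Build a create_compilation_job request for a TensorFlow model on Linux ARM64."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Neo expects the input tensor names and shapes as a JSON string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": "TENSORFLOW",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            # Assumed target: a Linux device with a 64-bit ARM CPU.
            "TargetPlatform": {"Os": "LINUX", "Arch": "ARM64"},
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


request = neo_compilation_request(
    job_name="demo-neo-job",                                # hypothetical
    role_arn="arn:aws:iam::123456789012:role/DemoNeoRole",  # hypothetical
    model_s3_uri="s3://demo-bucket/model/model.tar.gz",     # hypothetical
    output_s3_uri="s3://demo-bucket/compiled/",             # hypothetical
    input_shape={"input_1": [1, 224, 224, 3]},              # hypothetical
)
# Submitting it needs AWS credentials:
# import boto3
# boto3.client("sagemaker").create_compilation_job(**request)
```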


70

MCQhard

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

A.Enable pipeline caching by setting the CacheConfig property for each step

B.Configure the pipeline to run on a schedule instead of on-demand

C.Use the Parameter step to pass previous execution ID

D.Use Lambda step to check data changes before running

AnswerA

Caching causes the pipeline to skip steps if inputs and configuration haven't changed, saving time and cost.

Why this answer

Option A is correct because SageMaker Pipelines supports step caching via the `CacheConfig` property. When caching is enabled, the pipeline looks for a previous successful run of the step with the same arguments (for example, input data locations, parameters, and code location). If it finds one, the step is skipped and the previous output is reused, eliminating redundant compute. The comparison is made on the step's arguments, not on the contents of the data those arguments point to.
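The skip-or-run decision can be pictured with a toy cache keyed on a step's name and arguments. This is only a conceptual model written for illustration, not SageMaker's implementation; the class `StepCache` and the argument names are hypothetical.

```python
import hashlib
import json


class StepCache:
    """Toy cache: a step is skipped when it was already run with identical arguments."""

    def __init__(self):
        self._outputs = {}

    @staticmethod
    def _key(step_name, arguments):
        # Sort keys so that argument order does not change the cache key.
        canonical = json.dumps(arguments, sort_keys=True)
        return hashlib.sha256(f"{step_name}:{canonical}".encode()).hexdigest()

    def run(self, step_name, arguments, work):
        """Return (output, cache_hit); `work` is called only on a cache miss."""
        key = self._key(step_name, arguments)
        if key in self._outputs:
            return self._outputs[key], True
        output = work(arguments)
        self._outputs[key] = output
        return output, False
```

A second call with the same arguments returns the stored output without invoking `work`, while changing any argument (for example, the input location) produces a new key and triggers a fresh run.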

Exam trap

The trap here is that candidates may think caching requires external logic (like a Lambda step) or scheduling, when SageMaker Pipelines has a native `CacheConfig` property that directly addresses redundant runs with minimal configuration.

How to eliminate wrong answers

Option B is wrong because scheduling the pipeline does not prevent redundant runs; it only triggers execution at fixed intervals, which could still re-run all steps even when data hasn't changed. Option C is wrong because passing a previous execution ID via a Parameter step does not enable caching; it merely provides a reference but does not automatically skip unchanged steps. Option D is wrong because using a Lambda step to check data changes before running adds custom logic but is not a built-in mechanism for step-level caching; SageMaker Pipelines already provides `CacheConfig` for this purpose, making a Lambda workaround unnecessary and less efficient.


71

MCQmedium

A team uses SageMaker Pipelines to automate retraining. They want to skip the training step if the data has not changed since the last run. Which feature should they enable?

A.Parameterized executions

B.Lineage tracking

C.Step caching

D.Condition step with a custom check

AnswerC

Why this answer

Step caching in SageMaker Pipelines allows you to reuse the output from a previous execution of a step if its input data and configuration parameters have not changed. By enabling caching on the training step, the pipeline automatically skips re-executing that step when the data is identical, saving time and cost. This directly addresses the requirement to skip retraining when data has not changed.

Exam trap

The exam often tests the distinction between step caching (automatic, built-in) and a Condition step (manual, custom logic), leading candidates to overthink and choose the more complex option D when the simpler caching feature is the correct answer.

How to eliminate wrong answers

Option A is wrong because parameterized executions allow you to pass different parameters into a pipeline run, but they do not automatically skip steps based on data changes; they simply enable dynamic input values. Option B is wrong because lineage tracking records the relationships between artifacts and steps for governance and reproducibility, but it does not provide any mechanism to skip step execution. Option D is wrong because while a Condition step can branch pipeline execution based on a custom check, it requires you to implement the logic to compare data versions manually, whereas step caching provides built-in, automatic detection of unchanged inputs.


72

Multi-Selecthard

An ML engineer needs to deploy a model that requires GPU acceleration but wants to reduce inference cost by optimizing the model. They are considering SageMaker Neo compilation and Amazon Elastic Inference. Which TWO statements are correct about these services? (Choose two.)

Select 2 answers

A.Amazon Elastic Inference attaches a dedicated GPU accelerator to a CPU instance, reducing cost compared to a full GPU instance

B.SageMaker Neo and Amazon Elastic Inference cannot be used together

C.SageMaker Neo provides a GPU acceleration service similar to Elastic Inference

D.SageMaker Neo optimizes the model by compiling it for the target hardware, reducing inference latency

E.Amazon Elastic Inference compiles the model to run on GPU hardware

AnswersA, D

Elastic Inference provides GPU acceleration at lower cost.

Why this answer

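Option A is correct because Amazon Elastic Inference allows you to attach a fraction of a GPU accelerator to an Amazon EC2 CPU instance, providing GPU acceleration at a lower cost than using a full GPU instance. This reduces inference cost because you pay only for the GPU compute you need, without the overhead of a dedicated GPU instance. Option D is correct because SageMaker Neo compiles a trained model for the target hardware, which reduces inference latency. Neo optimizes the model itself, while Elastic Inference supplies acceleration hardware.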

Exam trap

The trap here is confusing model compilation (SageMaker Neo) with hardware acceleration (Elastic Inference), leading candidates to think they are mutually exclusive or that Elastic Inference performs compilation.


73

MCQhard

A machine learning engineer deploys a new model version to a SageMaker endpoint with production variants. They want to gradually shift traffic from the old model to the new model, monitoring for errors, and automatically roll back if the error rate exceeds 5%. Which deployment pattern should they use?

A.Canary deployment with CloudWatch alarms

B.A/B testing with traffic splitting

C.Blue/green deployment

D.Shadow testing

AnswerA

Why this answer

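Canary deployments gradually shift traffic to the new model and allow automated rollback based on CloudWatch alarms, such as one that fires when the error rate exceeds 5%. Blue/green deployment switches all traffic at once. A/B testing is for comparing variants, not for a guarded rollout. Shadow testing mirrors traffic to the new model but does not return its responses to users.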
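A hedged sketch of the corresponding configuration: SageMaker's `update_endpoint` call accepts a `DeploymentConfig` in which canary traffic shifting is set under the `BlueGreenUpdatePolicy` key and rollback alarms under `AutoRollbackConfiguration`. The alarm name, canary size, and wait times below are assumptions, and the CloudWatch alarm for the 5% error rate would have to be created separately.

```python
def canary_deployment_config(alarm_names, canary_percent=10, bake_seconds=600):
    """Build a DeploymentConfig with canary traffic shifting and alarm-based rollback."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                # Share of capacity that receives traffic first.
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # How long to watch the canary before shifting the rest.
                "WaitIntervalInSeconds": bake_seconds,
            },
            "TerminationWaitInSeconds": 300,
        },
        "AutoRollbackConfiguration": {
            # Any of these alarms firing triggers an automatic rollback.
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


config = canary_deployment_config(["endpoint-error-rate-above-5pct"])  # hypothetical alarm
# Applying it needs AWS credentials and existing resources:
# import boto3
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="my-endpoint",
#     EndpointConfigName="my-new-endpoint-config",
#     DeploymentConfig=config,
# )
```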


74

MCQhard

A team uses SageMaker Pipelines to retrain a model nightly. They want to skip the training step if the new data is unchanged (same checksum as previous run) to save cost and time. Which pipeline configuration achieves this?

A.Enable pipeline caching on the training step

B.Use a Lambda step to check data before running the training step

C.Use a ConditionStep that compares the current data checksum to the previous run's checksum, and branch to a NoOp step if unchanged

D.Set the training step's CacheConfig with a TTL of 24 hours

AnswerC

This allows skipping the training step dynamically based on data content changes.

Why this answer

Option C is correct because a ConditionStep in SageMaker Pipelines lets you evaluate a condition, such as whether the current data checksum equals the checksum stored from the previous run, and branch accordingly. If the checksums match, the pipeline follows a branch that does no work (the NoOp step in the option) instead of executing the training step, thereby skipping training and saving cost and time. Conditional execution of this kind is what ConditionStep is designed for.
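The comparison that feeds such a condition can be sketched in plain Python. Where the previous checksum is stored, and which earlier step computes the current one, are left open here; the function names are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 checksum of a data blob."""
    return hashlib.sha256(data).hexdigest()


def should_train(current_checksum, previous_checksum):
    """Train when there is no previous run or when the data checksum has changed."""
    return previous_checksum is None or current_checksum != previous_checksum


previous = sha256_of(b"col_a,col_b\n1,2\n")
unchanged = sha256_of(b"col_a,col_b\n1,2\n")
changed = sha256_of(b"col_a,col_b\n1,3\n")

print(should_train(unchanged, previous))  # False: same bytes, same checksum
print(should_train(changed, previous))    # True: data differs
print(should_train(changed, None))        # True: first run
```

In a pipeline, the boolean (or the two checksums) would be emitted by an earlier step so that the condition can branch on it.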

Exam trap

The trap here is that candidates confuse pipeline caching (which caches based on step input parameters) with conditional branching based on external data state, leading them to pick Option A or D, which do not actually evaluate data checksums.

How to eliminate wrong answers

Option A is wrong because pipeline caching in SageMaker reuses a step's output only if the step's input parameters and source code are unchanged; it does not evaluate external data checksums, so it cannot decide based on the content of the new data. Option B is wrong because a Lambda step can check the data but cannot skip the training step on its own; a ConditionStep is still needed to branch on the Lambda's result, so the Lambda step alone is insufficient. Option D is wrong because a CacheConfig with a TTL of 24 hours only limits how long a cached result can be reused; within that window the step can be skipped even if the data has changed, and no checksums are compared.


75

MCQmedium

A company has 200 small PyTorch models that are each used infrequently but need to be available for real-time inference. To minimize costs, they want to host all models on a single endpoint. Which SageMaker feature should they use?

A.Multi-model endpoint (MME)

B.Multi-container endpoint

C.Batch Transform job

D.Asynchronous inference endpoint

AnswerA

Why this answer

Multi-model endpoints allow hosting hundreds of models on a single endpoint, automatically loading and unloading models based on traffic. Multi-container endpoints are for hosting different containers on one endpoint, not for large numbers of models that share a container. Batch Transform and asynchronous inference do not provide real-time responses.
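On a multi-model endpoint the caller picks the model per request through the `TargetModel` parameter of `invoke_endpoint`. The sketch below only builds the request arguments; the endpoint name, artifact name, and payload shape are hypothetical.

```python
import json


def mme_invoke_kwargs(endpoint_name, model_artifact, payload):
    """Build invoke_endpoint arguments that select one model on a multi-model endpoint."""
    return {
        "EndpointName": endpoint_name,
        # Path of the model artifact relative to the endpoint's S3 model prefix.
        "TargetModel": model_artifact,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


kwargs = mme_invoke_kwargs("shared-endpoint", "model-017.tar.gz", {"inputs": [[0.1, 0.2]]})
# Sending the request needs AWS credentials and a deployed endpoint:
# import boto3
# response = boto3.client("sagemaker-runtime").invoke_endpoint(**kwargs)
```

The first request for a model that is not yet loaded is typically slower, because the endpoint has to fetch and load the artifact before serving it.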

