MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 5 — 15 Questions

Question 1

A data science team has trained a PyTorch model for real-time inference and needs to deploy it on AWS with GPU acceleration while minimizing cold-start latency. Which SageMaker inference option should they choose?

Accepted Answer

Real-time endpoint with ml.g4dn instance. Real-time endpoints keep their instances provisioned, so requests do not wait on a cold start, and GPU instance types such as ml.g4dn provide the acceleration needed for low-latency interactive inference. Serverless inference does not support GPU instances and can incur cold starts, asynchronous inference queues requests rather than answering them interactively, and batch transform is for offline predictions.

Answer

Serverless inference

Answer

Batch transform

Answer

Asynchronous inference endpoint
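
As a rough sketch of the accepted answer for Question 1, the endpoint configuration below requests a single always-on GPU instance. The configuration and model names are hypothetical, and the dictionary follows the shape of the SageMaker `CreateEndpointConfig` request as used through boto3; nothing is sent to AWS here.

```python
def gpu_endpoint_config(config_name: str, model_name: str) -> dict:
    """Build CreateEndpointConfig arguments for one GPU-backed variant."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.g4dn.xlarge",  # GPU family named in the answer
                "InitialInstanceCount": 1,  # provisioned up front, so no cold start
            }
        ],
    }


# Hypothetical names, for illustration only.
request = gpu_endpoint_config("pytorch-gpu-config", "pytorch-model")
# With AWS credentials configured, this could be sent with:
#   boto3.client("sagemaker").create_endpoint_config(**request)
```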

Question 2

A company has 50 small PyTorch models that are used infrequently for inference. They want to minimize costs while maintaining the ability to serve all models from a single endpoint. Which SageMaker feature should they use?

Accepted Answer

Multi-model endpoint. Multi-model endpoints (MME) allow hosting multiple models on a single endpoint, loading models dynamically based on the target model in the request. This reduces cost for many small, infrequently used models by sharing the underlying instance.

Answer

Multi-container endpoint

Answer

Batch transform job

Answer

Real-time endpoint with 50 production variants
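
A minimal sketch of the multi-model endpoint from the accepted answer for Question 2, assuming all model archives sit under one S3 prefix. The image URI placeholder, bucket, endpoint name, and artifact names are hypothetical; the dictionaries mirror the boto3 `create_model` container definition and the `invoke_endpoint` arguments.

```python
def mme_container(image_uri: str, model_prefix: str) -> dict:
    """Container definition that serves every archive under one S3 prefix."""
    return {
        "Image": image_uri,
        "ModelDataUrl": model_prefix,  # prefix holding the model .tar.gz files
        "Mode": "MultiModel",
    }


def mme_invoke_args(endpoint_name: str, artifact: str, payload: bytes) -> dict:
    """InvokeEndpoint arguments that pick one model by its artifact name."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": artifact,  # path relative to ModelDataUrl
        "ContentType": "application/json",
        "Body": payload,
    }


# Hypothetical values, for illustration only.
container = mme_container("<pytorch-inference-image-uri>", "s3://example-bucket/models/")
invoke = mme_invoke_args("shared-endpoint", "model-17.tar.gz", b'{"inputs": [1, 2, 3]}')
# boto3.client("sagemaker-runtime").invoke_endpoint(**invoke) would load
# model-17 on first use and serve it from the shared instance.
```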

Question 3

A machine learning team uses SageMaker Pipelines to automate retraining. They want to avoid re-running data processing steps if the data has not changed since the last successful pipeline run. Which built-in feature should they enable?

Accepted Answer

Pipeline caching. With step caching enabled, SageMaker Pipelines looks for a previous successful execution of the same step with identical arguments and, on a cache hit, reuses that step's outputs instead of running it again. The match is made on the step's arguments (such as the input S3 URIs), not on the contents of the data or code those arguments point to, so new data reliably invalidates the cache only when it arrives under a changed argument, for example a new S3 prefix.

Answer

Model lineage tracking

Answer

Parameterized pipeline executions

Answer

Step parallelism
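
The idea behind the accepted answer for Question 3 can be illustrated with a toy cache lookup. This is not SageMaker's implementation; it only assumes that a cache key is derived from the step type and its arguments, and the step arguments shown are hypothetical.

```python
import hashlib
import json


def cache_key(step_type: str, arguments: dict) -> str:
    """Toy cache key: a hash of the step type and its arguments."""
    canonical = json.dumps({"type": step_type, "args": arguments}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical processing-step arguments.
base = {"input_uri": "s3://example-bucket/raw/2024-01/", "instance_type": "ml.m5.xlarge"}
same_again = dict(base)
new_prefix = {**base, "input_uri": "s3://example-bucket/raw/2024-02/"}

# Identical arguments produce the same key, so the earlier output is reused.
cache_hit = cache_key("Processing", base) == cache_key("Processing", same_again)
# A changed argument produces a different key, so the step runs again.
cache_miss = cache_key("Processing", base) != cache_key("Processing", new_prefix)
```

In the SageMaker Python SDK, caching is switched on per step by passing a `CacheConfig` (from `sagemaker.workflow.steps`) with `enable_caching=True` and an expiry period.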

Question 4

A team needs to deploy a new model version to production while minimizing risk. They want to route 5% of live traffic to the new model and 95% to the current model, and then gradually increase the new model's traffic. Which SageMaker deployment pattern should they use?

Accepted Answer

Canary deployment using production variants. A canary deployment uses production variants with weighted traffic allocation. By setting the new model variant's weight to 5% and the current variant's to 95%, and later adjusting the weights, the team can shift traffic gradually. An all-at-once blue/green deployment switches traffic in a single step, and shadow testing sends a copy of the traffic to the new model without returning its responses to callers.

Answer

Shadow testing

Answer

Blue/green deployment

Answer

A/B testing with production variants
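
A short sketch of the weighted split behind the accepted answer for Question 4. Variant names, model names, and the instance type are hypothetical; the variant dictionaries follow the `ProductionVariants` shape of `CreateEndpointConfig`, where each variant's share of traffic is its weight divided by the sum of all weights.

```python
def canary_variants(current_model: str, new_model: str, canary_percent: int = 5) -> list:
    """Two production variants whose weights split traffic current/canary."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")

    def variant(name: str, model: str, weight: int) -> dict:
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": "ml.m5.large",  # hypothetical instance choice
            "InitialInstanceCount": 1,
            "InitialVariantWeight": weight,
        }

    return [
        variant("current", current_model, 100 - canary_percent),
        variant("canary", new_model, canary_percent),
    ]


def traffic_shares(variants: list) -> dict:
    """Share of traffic per variant: its weight over the total weight."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


shares = traffic_shares(canary_variants("model-v1", "model-v2"))
```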

Question 5

A company wants to run inference on a large dataset stored in S3 using a pre-trained model. The inference can tolerate latency from minutes to hours, and they want a fully managed solution that autoscales to handle large volumes. Which SageMaker inference option is most suitable?

Accepted Answer

Batch transform. Batch transform fits because the data already sits in S3, results can arrive minutes to hours later, and the team wants a fully managed option. A Batch Transform job provisions the requested instances, distributes the input objects across them, writes the predictions back to S3, and releases the instances when the job finishes, so there is no persistent endpoint to run or pay for.

Answer

Real-time endpoint

Answer

Asynchronous inference

Answer

Serverless inference
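
For the accepted answer to Question 5, a batch transform job might be described as below. Job, model, and bucket names are hypothetical, and CSV input is an assumption; the dictionary follows the boto3 `create_transform_job` request shape.

```python
def transform_job_request(job_name: str, model_name: str,
                          input_prefix: str, output_prefix: str) -> dict:
    """Build CreateTransformJob arguments for line-delimited CSV input in S3."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_prefix}
            },
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",  # one record per line
        },
        "TransformOutput": {"S3OutputPath": output_prefix, "AssembleWith": "Line"},
        # Instances exist only for the duration of the job.
        "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 2},
    }


request = transform_job_request(
    "nightly-scoring", "pretrained-model",
    "s3://example-bucket/input/", "s3://example-bucket/output/",
)
# boto3.client("sagemaker").create_transform_job(**request) would start the job.
```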

Question 6

A machine learning engineer needs to optimize a trained TensorFlow model for deployment on edge devices with limited compute. Which SageMaker feature should they use to compile the model for target hardware?

Accepted Answer

SageMaker Neo. SageMaker Neo is the correct choice because it is specifically designed to compile trained machine learning models into an optimized format for target hardware architectures, such as ARM, Intel, or NVIDIA, enabling efficient inference on edge devices with limited compute resources. It uses a compiler to apply hardware-specific optimizations like operator fusion and memory layout tuning, reducing latency and memory footprint without requiring manual code changes.

Answer

SageMaker Model Monitor

Answer

SageMaker Debugger

Answer

SageMaker Elastic Inference
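
A hedged sketch of a Neo compilation request for the accepted answer to Question 6. The job name, role ARN, S3 locations, input tensor name and shape, and the choice of `jetson_nano` as the target device are all assumptions for illustration; the dictionary follows the boto3 `create_compilation_job` request shape.

```python
import json


def neo_compilation_request(job_name: str, role_arn: str, model_uri: str,
                            output_uri: str, target_device: str) -> dict:
    """Build CreateCompilationJob arguments for a TensorFlow model."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_uri,
            # Input tensor name and shape are assumptions about the model.
            "DataInputConfig": json.dumps({"input": [1, 224, 224, 3]}),
            "Framework": "TENSORFLOW",
        },
        "OutputConfig": {"S3OutputLocation": output_uri, "TargetDevice": target_device},
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


request = neo_compilation_request(
    "tf-edge-compile",
    "arn:aws:iam::123456789012:role/ExampleNeoRole",  # placeholder ARN
    "s3://example-bucket/model/model.tar.gz",
    "s3://example-bucket/compiled/",
    "jetson_nano",
)
# boto3.client("sagemaker").create_compilation_job(**request) would submit it.
```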

Question 7

A team uses SageMaker Pipelines with a Condition step to decide whether to register a model based on evaluation metrics. They want to also store the evaluation results for lineage tracking. Which step should they use to record the metrics?

Accepted Answer

RegisterModel step. The RegisterModel step in SageMaker Pipelines creates a model version (model package) in the SageMaker Model Registry and accepts evaluation results through its `model_metrics` argument, a `ModelMetrics` object that points to the evaluation report written to S3 by the evaluation step. This stores the evaluation results alongside the model version for lineage tracking, so the metrics are recorded once the Condition step approves registration.

Answer

Condition step

Answer

Training step

Answer

Processing step
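
To make the accepted answer for Question 7 concrete, the sketch below builds the metrics section that accompanies a registered model version. It uses the low-level `ModelMetrics` shape of the `CreateModelPackage` API; the S3 location of the evaluation report is hypothetical.

```python
def model_quality_metrics(evaluation_report_uri: str) -> dict:
    """ModelMetrics section pointing at an evaluation report stored in S3."""
    return {
        "ModelQuality": {
            "Statistics": {
                "ContentType": "application/json",
                "S3Uri": evaluation_report_uri,
            }
        }
    }


# Hypothetical location written by the evaluation (processing) step.
metrics = model_quality_metrics("s3://example-bucket/evaluation/evaluation.json")
```

In the SageMaker Python SDK, the equivalent object is `ModelMetrics` with a `MetricsSource`, passed to the registration step as `model_metrics`.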

Question 8

A company wants to deploy a PyTorch model on SageMaker using the NVIDIA Triton Inference Server for GPU acceleration. They have an existing Triton configuration. Which approach should they take?

Accepted Answer

Use the SageMaker Triton Inference Server container from the Deep Learning Containers. AWS provides a pre-built SageMaker Triton Inference Server container as part of the Deep Learning Containers (DLCs). It is built for GPU instances, integrates directly with SageMaker hosting endpoints, and serves a standard Triton model repository, so the team can reuse its existing Triton configuration while keeping Triton features such as dynamic batching and concurrent model execution.

Answer

Use SageMaker Neo to compile the model for Triton

Answer

Package Triton as a custom container and use SageMaker batch transform

Answer

Use the standard SageMaker PyTorch container and install Triton at runtime

Question 9

A team wants to implement an event-driven retraining pipeline that triggers retraining when new data arrives in an S3 bucket. The pipeline should include preprocessing, training, evaluation, and conditional registration. Which AWS services should they combine?

Accepted Answer

S3 + Lambda + SageMaker Pipeline. An S3 event notification invokes a Lambda function when new data arrives, and the function starts a SageMaker Pipelines execution that runs the full workflow (preprocessing, training, evaluation, and conditional registration). SageMaker Pipelines natively supports condition steps for model registration, making this the only option that satisfies the conditional registration requirement without additional custom logic.

Answer

S3 + SQS + SageMaker Batch Transform

Answer

S3 + Step Functions + SageMaker Training Job

Answer

S3 + EventBridge + SageMaker Model Registry
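
The Lambda piece of the accepted answer for Question 9 could look roughly like this. The pipeline name and the `InputDataUri` pipeline parameter are hypothetical, and the optional `sagemaker_client` argument is an addition that allows a local dry run without AWS access; `start_pipeline_execution` is the boto3 SageMaker client call.

```python
from urllib.parse import unquote_plus

PIPELINE_NAME = "retraining-pipeline"  # hypothetical pipeline name


def handler(event, context, sagemaker_client=None):
    """Start one pipeline execution for the object named in an S3 event."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the dry run below needs no AWS SDK
        sagemaker_client = boto3.client("sagemaker")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # S3 event keys are URL-encoded

    response = sagemaker_client.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
        PipelineParameters=[
            {"Name": "InputDataUri", "Value": f"s3://{bucket}/{key}"}
        ],
    )
    return response["PipelineExecutionArn"]


class RecordingClient:
    """Stand-in client that records the call instead of contacting AWS."""

    def start_pipeline_execution(self, **kwargs):
        self.kwargs = kwargs
        return {"PipelineExecutionArn": "arn:example:pipeline-execution"}


fake = RecordingClient()
sample_event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                    "object": {"key": "incoming/new+data.csv"}}}]}
arn = handler(sample_event, None, sagemaker_client=fake)
```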

Question 10

A team needs to deploy a model that has compliance requirements to log all inference requests and responses for auditing. The model will be served using a real-time endpoint. How can they achieve this without custom code?

Accepted Answer

Enable SageMaker Data Capture on the endpoint. SageMaker Data Capture is the native, no-code feature that automatically logs inference requests and responses for real-time endpoints. It captures payload data to an S3 bucket without requiring any custom code, directly meeting the compliance requirement for audit logging.

Answer

Add a custom Lambda function using a container

Answer

Use SageMaker Debugger to monitor inference

Answer

Enable CloudTrail for the endpoint
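
As a sketch of the accepted answer for Question 10, data capture is configured on the endpoint configuration rather than in model code. The destination bucket is hypothetical, and 100% sampling is an assumption made because the scenario asks for every request to be logged; the dictionary follows the `DataCaptureConfig` shape of `CreateEndpointConfig`.

```python
def audit_capture_config(destination_uri: str) -> dict:
    """DataCaptureConfig that records every request and response to S3."""
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": 100,  # capture all traffic for auditing
        "DestinationS3Uri": destination_uri,
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# Hypothetical audit bucket.
capture = audit_capture_config("s3://example-audit-bucket/endpoint-capture/")
# Passed as DataCaptureConfig=capture to
# boto3.client("sagemaker").create_endpoint_config(...).
```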

Question 11

A company wants to version and track ML models, with an approval workflow for promoting models from staging to production. Which SageMaker feature should they use?

Accepted Answer

SageMaker Model Registry. SageMaker Model Registry provides a centralized repository to catalog, version, and manage ML models, and each model version carries an approval status (`PendingManualApproval`, `Approved`, or `Rejected`) that can gate promotion from staging to production. This directly addresses the requirement for version tracking and an approval gate for model promotion.

Answer

SageMaker Model Monitor

Answer

SageMaker Experiments

Answer

SageMaker Pipelines
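
The approval gate in the accepted answer for Question 11 comes down to changing a model version's status. The sketch below builds the arguments for the boto3 `update_model_package` call; the model package ARN and the description text are placeholders.

```python
VALID_STATUSES = {"PendingManualApproval", "Approved", "Rejected"}


def approval_request(model_package_arn: str, status: str, note: str = "") -> dict:
    """Build UpdateModelPackage arguments that change the approval status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown approval status: {status}")
    request = {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}
    if note:
        request["ApprovalDescription"] = note
    return request


# Placeholder ARN, for illustration only.
arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-group/3"
approve = approval_request(arn, "Approved", "Passed staging validation")
# boto3.client("sagemaker").update_model_package(**approve) would apply it.
```

A deployment job can then be triggered only for versions whose status is `Approved`.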

Question 12

A startup wants to deploy a model that has variable traffic patterns, with some periods of no traffic and occasional spikes. They want to pay only for what they use and do not want to manage instances. Which SageMaker inference option should they choose?

Accepted Answer

Serverless inference. Serverless inference is the correct choice because it automatically scales to zero during periods of no traffic and scales up to handle spikes, charging only for the compute time used. This eliminates the need to manage underlying instances, making it ideal for variable and intermittent traffic patterns.

Answer

Batch transform

Answer

Real-time endpoint with auto-scaling

Answer

Multi-model endpoint
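
For the accepted answer to Question 12, a serverless variant replaces the instance type and count with a `ServerlessConfig`. The variant and model names and the chosen memory and concurrency values are hypothetical; the allowed memory sizes listed are the 1 GB steps from 1024 MB to 6144 MB.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_variant(model_name: str, memory_mb: int = 2048,
                       max_concurrency: int = 20) -> dict:
    """Production variant that uses ServerlessConfig instead of instances."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        # No InstanceType or InitialInstanceCount: capacity is managed by AWS.
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


variant = serverless_variant("startup-model")
```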

Question 13

A company wants to deploy a new model using a canary deployment strategy on SageMaker. Which two actions should they take? (Select TWO.)

Accepted Answer

Create a new endpoint with two production variants. To implement a canary deployment, create two production variants (current and new) with initial traffic weights (e.g., 95% and 5%), then update the variant weights on the endpoint to shift traffic gradually. The weights can be changed on a live endpoint, for example with the `UpdateEndpointWeightsAndCapacities` API, without redeploying either model.

Answer

Register both models in the Model Registry with 'Approved' status

Answer

Use SageMaker Model Monitor to compare model performance

Answer

Enable data capture on the endpoint
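
The gradual traffic shift described in the explanation for Question 13 can be sketched as a series of weight updates. The endpoint name, variant names, and rollout percentages are hypothetical; each dictionary follows the boto3 `update_endpoint_weights_and_capacities` request shape.

```python
def weight_update(endpoint_name: str, canary_percent: int) -> dict:
    """UpdateEndpointWeightsAndCapacities arguments for one rollout stage."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": "current", "DesiredWeight": 100 - canary_percent},
            {"VariantName": "canary", "DesiredWeight": canary_percent},
        ],
    }


# Hypothetical rollout: 5% -> 25% -> 50% -> 100% of traffic on the new model.
rollout = [weight_update("prod-endpoint", pct) for pct in (5, 25, 50, 100)]
# Each stage would be applied, after checking metrics, with:
#   boto3.client("sagemaker").update_endpoint_weights_and_capacities(**stage)
```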

Question 14

A machine learning engineer is designing a SageMaker Pipeline that includes a training step, a processing step for evaluation, and a condition step to decide whether to register the model. The pipeline should support caching to avoid redundant runs when inputs haven't changed. Which three steps must have caching enabled? (Select THREE.)

Accepted Answer

Training step. For caching to avoid redundant runs, the steps that produce outputs that can be reused must have caching enabled. The processing step (evaluation) and training step both generate outputs that can be cached if their inputs (code, data, hyperparameters) remain the same. The condition step does not produce outputs to cache; it just branches. The RegisterModel step typically registers metadata, but its inputs (model artifact, metrics) may be generated by previous steps; enabling caching on the RegisterModel step can also avoid re-running if the same model artifact is already registered.

Answer

Transform step (if used)

Answer

Condition step

Question 15

A team wants to deploy a single SageMaker real-time endpoint that serves both a PyTorch model for NLP and a TensorFlow model for image classification. Each model requires a different inference container. Which two features can they use together to achieve this? (Select TWO.)

Accepted Answer

Multi-container endpoint. A multi-container endpoint allows running multiple containers (e.g., PyTorch and TensorFlow) on the same endpoint. With inference components, each container can be associated with a specific model, and the routing logic directs requests to the appropriate container based on the model name.

Answer

Multi-model endpoint

Answer

Production variants

Answer

SageMaker Neo compilation
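
A minimal sketch of the multi-container part of the answer to Question 15, assuming direct invocation so that each request names the container it targets. Hostnames, image URI placeholders, S3 paths, the role ARN, and the endpoint name are hypothetical; the dictionaries follow the boto3 `create_model` and `invoke_endpoint` request shapes.

```python
def multi_container_model(model_name: str, role_arn: str, containers: list) -> dict:
    """CreateModel arguments for containers that are invoked individually."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"ContainerHostname": host, "Image": image, "ModelDataUrl": data}
            for host, image, data in containers
        ],
        # Direct mode routes each request to one named container.
        "InferenceExecutionConfig": {"Mode": "Direct"},
    }


def invoke_container_args(endpoint_name: str, hostname: str, payload: bytes) -> dict:
    """InvokeEndpoint arguments that target one container by hostname."""
    return {
        "EndpointName": endpoint_name,
        "TargetContainerHostname": hostname,
        "ContentType": "application/json",
        "Body": payload,
    }


model = multi_container_model(
    "nlp-and-vision",
    "arn:aws:iam::123456789012:role/ExampleHostingRole",  # placeholder ARN
    [
        ("nlp-pytorch", "<pytorch-inference-image-uri>", "s3://example-bucket/nlp/model.tar.gz"),
        ("vision-tf", "<tensorflow-inference-image-uri>", "s3://example-bucket/vision/model.tar.gz"),
    ],
)
nlp_call = invoke_container_args("shared-endpoint", "nlp-pytorch", b'{"text": "hello"}')
```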