MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 2 — 15 Questions

Question 1

A machine learning engineer needs to deploy a model that requires less than 100 ms inference latency for real-time predictions. The model is a small PyTorch model that fits in a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?

Accepted Answer

Serverless inference with max concurrency set to 10. Serverless inference scales to zero when idle and bills only for the compute used to process requests, so it is cost-effective for low-latency workloads with occasional traffic. Real-time endpoints incur cost even when idle, batch transform is for offline processing, and asynchronous inference queues requests and therefore has higher latency.
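
As a rough illustration of the accepted answer, the sketch below builds the keyword arguments one might pass to the SageMaker `CreateEndpointConfig` API for a serverless variant. The model name, endpoint config name, and memory size are hypothetical; only the max concurrency of 10 comes from the answer above.

```python
def serverless_endpoint_config(model_name: str,
                               max_concurrency: int = 10,
                               memory_mb: int = 2048) -> dict:
    """Build keyword arguments for SageMaker's CreateEndpointConfig API."""
    return {
        "EndpointConfigName": f"{model_name}-serverless",  # hypothetical naming scheme
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # A serverless variant sets ServerlessConfig instead of
                # InstanceType / InitialInstanceCount.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,        # assumed value
                    "MaxConcurrency": max_concurrency,  # 10, as in the answer
                },
            }
        ],
    }


config = serverless_endpoint_config("pytorch-small")  # hypothetical model name
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("sagemaker").create_endpoint_config(**config)
```

No instance type appears in the variant, which is what lets the endpoint scale to zero between requests.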

Answer

Asynchronous inference endpoint

Answer

Real-time endpoint on ml.g4dn.xlarge

Answer

Batch transform job

Question 2

A team built a SageMaker Pipeline that includes a training step and a model evaluation step. They want to automatically register a model in SageMaker Model Registry only if the evaluation metric (accuracy) exceeds 0.9. Which pipeline step should be used to implement this conditional logic?

Accepted Answer

Condition step. The Condition step in SageMaker Pipelines allows you to add conditional branching logic, such as evaluating a metric and proceeding only if a condition is met. In this scenario, you would use a ConditionStep to check if the accuracy metric from the evaluation step exceeds 0.9, and then conditionally execute a RegisterModel step to register the model in SageMaker Model Registry.
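
The gating logic can be sketched in plain Python. This is not the SageMaker SDK itself: in a real pipeline the same check would typically be expressed with `ConditionStep` and `ConditionGreaterThan`. The layout of the evaluation report (`metrics.accuracy.value`) is an assumption made for illustration.

```python
import json

ACCURACY_THRESHOLD = 0.9  # threshold from the question


def should_register(evaluation_report: str,
                    threshold: float = ACCURACY_THRESHOLD) -> bool:
    """Return True only if the reported accuracy strictly exceeds the threshold."""
    report = json.loads(evaluation_report)
    accuracy = report["metrics"]["accuracy"]["value"]  # assumed report layout
    return accuracy > threshold


# Hypothetical evaluation output written by the evaluation step.
report_json = json.dumps({"metrics": {"accuracy": {"value": 0.93}}})

# The "if" branch would hold the RegisterModel step; the "else" branch is empty.
next_step = "RegisterModel" if should_register(report_json) else "Skip"
```

The comparison is strict because the question says the metric must exceed 0.9, so an accuracy of exactly 0.9 does not register the model.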

Answer

RegisterModel step

Answer

Processing step

Answer

Transform step

Question 3

A company has 200 small models (each ~100 MB) that serve different customers. They want to minimize costs while keeping low latency for each customer. Which SageMaker deployment approach is MOST suitable?

Accepted Answer

Use a single multi-model endpoint (MME) on an ml.c5.large instance. A multi-model endpoint hosts all 200 small models (each ~100 MB) behind one endpoint and shares the instance across them, which minimizes cost. SageMaker loads a model from Amazon S3 the first time it is invoked, caches it on the instance, and unloads less recently used models when memory runs low. Because each model is small, loading on demand is quick and frequently used models stay in memory, which keeps latency low for each customer.
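
With a multi-model endpoint, the caller selects the model per request through the `TargetModel` parameter of `InvokeEndpoint`. The sketch below only builds the request arguments; the endpoint name, the one-artifact-per-customer naming scheme, and the payload are hypothetical.

```python
def invoke_args_for_customer(endpoint_name: str,
                             customer_id: str,
                             payload: bytes) -> dict:
    """Build keyword arguments for sagemaker-runtime InvokeEndpoint on an MME."""
    return {
        "EndpointName": endpoint_name,
        # Path of the model artifact relative to the endpoint's S3 model prefix.
        "TargetModel": f"{customer_id}.tar.gz",  # assumed naming scheme
        "ContentType": "application/json",
        "Body": payload,
    }


args = invoke_args_for_customer("customers-mme", "customer-042", b'{"x": [1, 2, 3]}')
# With boto3 installed and credentials configured:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**args)
```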

Answer

Deploy each model on a separate real-time endpoint

Answer

Use a multi-container endpoint with one container per model

Answer

Use serverless inference for each model

Question 4

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

Accepted Answer

Enable pipeline caching by setting the CacheConfig property for each step. SageMaker Pipelines supports step caching via the `CacheConfig` property. When caching is enabled, the pipeline checks whether the step's inputs (including source data, parameters, and code) have changed since the last successful run. If nothing has changed, the step is skipped and its previous output is reused, eliminating redundant compute.
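
The idea behind step caching can be illustrated with a small, self-contained sketch. This is not how SageMaker implements it internally; it only shows the principle of keying a step's output on its arguments. In the SageMaker Python SDK the feature is switched on by passing a `CacheConfig` object (for example with `enable_caching=True`) to a step. All names below are hypothetical.

```python
import hashlib
import json

_cache = {}  # maps cache key -> previous step output


def step_cache_key(step_name: str, arguments: dict) -> str:
    """Hash the step name and its arguments into a stable cache key."""
    canonical = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_step(step_name: str, arguments: dict, work):
    """Run `work(arguments)` unless an identical call was already cached.

    Returns (output, cache_hit).
    """
    key = step_cache_key(step_name, arguments)
    if key in _cache:
        return _cache[key], True
    output = work(arguments)
    _cache[key] = output
    return output, False


calls = []


def fake_training(arguments: dict) -> str:
    calls.append(arguments)  # record that real work happened
    return "s3://example-bucket/model.tar.gz"  # hypothetical output location


step_args = {"input": "s3://example-bucket/train.csv", "epochs": 5}
first = run_step("Train", step_args, fake_training)
second = run_step("Train", step_args, fake_training)                  # unchanged inputs
third = run_step("Train", {**step_args, "epochs": 6}, fake_training)  # changed input
```

Only the first and third calls execute the work function; the second reuses the stored output because its arguments hash to the same key.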

Answer

Configure the pipeline to run on a schedule instead of on-demand

Answer

Use the Parameter step to pass previous execution ID

Answer

Use Lambda step to check data changes before running

Question 5

A company wants to update an existing SageMaker real-time endpoint to serve a new model version. They need to route a small percentage of traffic to the new version initially and monitor for errors before switching fully. Which deployment pattern supports this?

Accepted Answer

Canary deployment with weighted production variants. SageMaker real-time endpoints support canary deployments by configuring multiple production variants with weighted traffic distribution. You can route a small share of traffic (e.g., 5%) to the variant serving the new model version and the remaining 95% to the existing one, then monitor CloudWatch metrics for errors before shifting all traffic to the new variant. This matches the requirement for a gradual, monitored rollout.
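
A minimal sketch of the weighted-variant setup follows. It builds the `ProductionVariants` list for `CreateEndpointConfig` and computes the resulting traffic shares; the variant names, model names, and instance type are hypothetical.

```python
def canary_variants(current_model: str, new_model: str,
                    canary_share: float = 0.05) -> list:
    """Two production variants whose weights split traffic 95/5 by default."""
    def variant(name: str, model: str, weight: float) -> dict:
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": "ml.m5.large",  # assumed instance type
            "InitialInstanceCount": 1,
            "InitialVariantWeight": weight,
        }
    return [
        variant("current", current_model, 1.0 - canary_share),
        variant("canary", new_model, canary_share),
    ]


def traffic_shares(variants: list) -> dict:
    """Each variant receives its weight divided by the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


variants = canary_variants("model-v1", "model-v2")
shares = traffic_shares(variants)
# Once the canary looks healthy, the weights could be shifted in place with
# UpdateEndpointWeightsAndCapacities rather than by recreating the endpoint.
```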

Answer

Shadow testing

Answer

A/B testing with traffic splitting

Answer

Blue/green deployment

Question 6

A team uses MLflow on SageMaker for experiment tracking. They want to automatically deploy the best-performing model from an MLflow run to a SageMaker endpoint for real-time inference. What is the MOST efficient way to achieve this?

Accepted Answer

Use SageMaker Pipelines with the MLflow integration to register the model and deploy via a Transform step. The MLflow Model Registry can be integrated with SageMaker via the MLflow plugin for SageMaker, which allows direct deployment from the registry to an endpoint. Alternatively, using SageMaker Pipelines with the MLflow integration is more automated and production-grade.

Answer

Use AWS Step Functions to trigger an MLflow run and then call SageMaker CreateEndpoint

Answer

Set up an EventBridge rule to trigger a Lambda that deploys the model whenever a new MLflow run is logged

Answer

Manually export the model artifact from MLflow and upload to S3, then create a SageMaker model and endpoint

Question 7

A company deploys a large NLP model on a SageMaker real-time endpoint using an ml.p3.2xlarge instance. To reduce inference cost without sacrificing throughput, they want to compile the model for their target hardware. Which service should they use?

Accepted Answer

SageMaker Neo. SageMaker Neo is the correct service because it compiles trained machine learning models into an optimized binary for a specific target hardware (e.g., ml.p3.2xlarge with NVIDIA GPUs). This reduces inference latency and cost by applying hardware-specific optimizations such as kernel fusion and memory layout tuning, while preserving the original model's throughput. The compilation process uses Apache TVM under the hood to generate efficient code for the target instance type.
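
For orientation, a Neo compilation job is requested through the `CreateCompilationJob` API. The sketch below only assembles the request; the job name, role ARN, S3 locations, input shape, and the choice of PyTorch as the framework are all assumptions, since the question does not state them.

```python
import json


def neo_compilation_job(job_name: str, role_arn: str, model_s3_uri: str,
                        output_s3_uri: str, input_shape: dict) -> dict:
    """Build keyword arguments for SageMaker's CreateCompilationJob API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Input tensor names and shapes, passed as a JSON string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": "PYTORCH",  # assumed framework
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": "ml_p3",  # target family for ml.p3.2xlarge
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},  # assumed limit
    }


job = neo_compilation_job(
    job_name="nlp-model-neo",                                   # hypothetical
    role_arn="arn:aws:iam::111122223333:role/ExampleNeoRole",   # hypothetical
    model_s3_uri="s3://example-bucket/model/model.tar.gz",      # hypothetical
    output_s3_uri="s3://example-bucket/compiled/",              # hypothetical
    input_shape={"input_ids": [1, 128]},                        # hypothetical
)
# With boto3: boto3.client("sagemaker").create_compilation_job(**job)
```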

Answer

Triton Inference Server on SageMaker

Answer

Amazon Elastic Inference

Answer

SageMaker Inference Recommender

Question 8

A company uses SageMaker Model Registry to manage model versions. They have a cross-account deployment requirement: models approved in the development account must be deployed to a production account. Which approach is the MOST secure and recommended?

Accepted Answer

Share the model package group from the development account to the production account using AWS RAM, then create a model version in the production account. Cross-account deployment can be achieved by sharing the model package across accounts using AWS Resource Access Manager (RAM) or by exporting the model artifact to an S3 bucket with appropriate cross-account permissions, then creating the model in the target account.

Answer

Export the model from Model Registry to a tar.gz file and upload to the production account manually

Answer

Copy the model artifact to a public S3 bucket and then create the model in the production account

Answer

Use a Lambda function in the development account to call CreateEndpoint in the production account using cross-account IAM roles

Question 9

A team wants to use AWS Step Functions to orchestrate a retraining workflow that is triggered when new data arrives in an S3 bucket. They also need to monitor model drift. Which event-driven approach should they use?

Accepted Answer

Configure EventBridge to capture S3 PutObject events and target an AWS Step Functions state machine that runs the retraining pipeline. Amazon EventBridge can receive S3 object-creation events (once EventBridge notifications are enabled on the bucket) and invoke a Step Functions state machine directly as a rule target. This creates a fully event-driven, serverless orchestration for the retraining pipeline without polling or custom code. Step Functions then coordinates the retraining steps, including model drift monitoring, in a reliable and auditable manner.
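
The rule behind this answer is driven by an event pattern. The sketch below shows one plausible pattern for S3 object-creation events, plus a deliberately simplified matcher to make the filtering visible. The matcher covers only exact-value matching and is not EventBridge's full pattern language; the bucket and key names are hypothetical.

```python
TRAINING_BUCKET = "example-training-data"  # hypothetical bucket name

# Event pattern for an EventBridge rule on S3 object creation.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": [TRAINING_BUCKET]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern leaf is a list of allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


new_data_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": TRAINING_BUCKET},
        "object": {"key": "incoming/batch-001.csv"},  # hypothetical key
    },
}
other_bucket_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": "some-other-bucket"}, "object": {"key": "x"}},
}

starts_retraining = matches(event_pattern, new_data_event)
ignored = not matches(event_pattern, other_bucket_event)
# The rule's target would be the state machine ARN, together with an IAM role
# that EventBridge can assume to start executions.
```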

Answer

Use a cron-based Step Function schedule that checks for new data every hour

Answer

Set up an S3 event notification to invoke a Lambda function that starts a SageMaker training job directly

Answer

Use SageMaker Pipelines with a schedule trigger

Question 10

A company wants to deploy a PyTorch model that uses dynamic batching and model ensemble. They need to serve multiple models with different frameworks (PyTorch, TensorFlow) within the same endpoint. Which SageMaker feature should they use?

Accepted Answer

Multi-container endpoint. A multi-container endpoint allows you to run multiple containers (e.g., one for PyTorch, one for TensorFlow) within the same SageMaker endpoint, enabling model ensemble and dynamic batching across different frameworks. This feature supports serving models with heterogeneous frameworks and dependencies without needing separate endpoints, while still providing a single inference endpoint for clients.

Answer

Triton Inference Server on SageMaker

Answer

Separate endpoints for each framework

Answer

Multi-model endpoint (MME)

Question 11

A team uses SageMaker real-time endpoints for inference. They want to deploy a new model version and compare its performance with the current version under live traffic without affecting user experience. Which method should they use?

Accepted Answer

Shadow testing with SageMaker. Shadow testing (or shadow deployment) sends a copy of live traffic to the new model variant while the current variant serves the actual response. The shadow variant's performance can be monitored without impacting the user.
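
In API terms, a shadow variant is declared alongside the production variant in the endpoint configuration through the `ShadowProductionVariants` field of `CreateEndpointConfig`. The sketch below assembles such a request; the names and instance type are hypothetical.

```python
def shadow_endpoint_config(current_model: str, candidate_model: str) -> dict:
    """Build CreateEndpointConfig arguments with one production and one shadow variant."""
    def variant(name: str, model: str) -> dict:
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": "ml.m5.large",  # assumed instance type
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }
    return {
        "EndpointConfigName": "shadow-test-config",  # hypothetical name
        # Serves the responses that callers actually receive.
        "ProductionVariants": [variant("production", current_model)],
        # Receives a copy of the traffic; its responses are not returned to callers.
        "ShadowProductionVariants": [variant("shadow", candidate_model)],
    }


shadow_config = shadow_endpoint_config("model-v1", "model-v2")
# With boto3: boto3.client("sagemaker").create_endpoint_config(**shadow_config)
```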

Answer

A/B testing with production variant traffic splitting

Answer

Batch transform on a holdout test set

Answer

Blue/green deployment

Question 12

A company wants to deploy a machine learning model using infrastructure as code to ensure reproducibility. They need to define the SageMaker Studio domain, user profiles, and the endpoint configuration. Which tool should they use?

Accepted Answer

AWS CloudFormation or AWS CDK. AWS CloudFormation and AWS CDK are infrastructure-as-code (IaC) tools that allow you to define, provision, and manage AWS resources declaratively. For this use case, they can model the entire SageMaker Studio domain, user profiles, and endpoint configuration in templates or code, ensuring reproducibility and version control. This aligns directly with the requirement to deploy ML infrastructure as code.
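
To make this concrete, the sketch below assembles a small CloudFormation template as a Python dictionary and serializes it to JSON. It is an outline under stated assumptions rather than a deployable template: the VPC, subnet, role ARN, and model name are placeholders, and reading the domain ID through `Fn::GetAtt` reflects the author's understanding of the `AWS::SageMaker::Domain` return values.

```python
import json

EXECUTION_ROLE = "arn:aws:iam::111122223333:role/ExampleStudioRole"  # placeholder

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "StudioDomain": {
            "Type": "AWS::SageMaker::Domain",
            "Properties": {
                "DomainName": "ml-team-domain",      # placeholder
                "AuthMode": "IAM",
                "VpcId": "vpc-EXAMPLE",              # placeholder
                "SubnetIds": ["subnet-EXAMPLE"],     # placeholder
                "DefaultUserSettings": {"ExecutionRole": EXECUTION_ROLE},
            },
        },
        "DataScientistProfile": {
            "Type": "AWS::SageMaker::UserProfile",
            "Properties": {
                # Ties the profile to the domain defined above.
                "DomainId": {"Fn::GetAtt": ["StudioDomain", "DomainId"]},
                "UserProfileName": "data-scientist-1",  # placeholder
            },
        },
        "ModelEndpointConfig": {
            "Type": "AWS::SageMaker::EndpointConfig",
            "Properties": {
                "ProductionVariants": [
                    {
                        "VariantName": "AllTraffic",
                        "ModelName": "example-model",   # placeholder
                        "InstanceType": "ml.m5.large",  # assumed
                        "InitialInstanceCount": 1,
                        "InitialVariantWeight": 1.0,
                    }
                ]
            },
        },
    },
}

template_json = json.dumps(template, indent=2)
resource_types = sorted(r["Type"] for r in template["Resources"].values())
```

Keeping the template in version control is what provides the reproducibility the question asks for.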

Answer

SageMaker Pipelines

Answer

AWS Step Functions

Answer

SageMaker Studio

Question 13

A data scientist needs to deploy an anomaly detection model that processes large payloads (up to 10 MB per request) and expects inference times of up to 10 minutes. The team wants to minimize cost and only pay per inference. Which TWO SageMaker inference options meet these requirements? (Choose TWO.)

Accepted Answer

Asynchronous inference endpoint. Asynchronous inference endpoints are designed for large payloads (up to 1 GB) and long processing times (up to 1 hour), so they comfortably handle this 10 MB, 10-minute workload. They queue incoming requests and can scale the instance count down to zero when the queue is empty, so the team pays only while requests are being processed.
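
An asynchronous endpoint differs from a real-time one mainly through the `AsyncInferenceConfig` block of `CreateEndpointConfig` and through `InvokeEndpointAsync`, which takes an S3 input location instead of an inline body. The sketch below builds both requests; every name, S3 path, and the concurrency value is hypothetical.

```python
def async_endpoint_config(model_name: str, output_s3_uri: str) -> dict:
    """Build CreateEndpointConfig arguments for an asynchronous endpoint."""
    return {
        "EndpointConfigName": f"{model_name}-async",  # hypothetical naming scheme
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.m5.xlarge",  # assumed instance type
                "InitialInstanceCount": 1,
            }
        ],
        "AsyncInferenceConfig": {
            # Results are written to S3 rather than returned inline.
            "OutputConfig": {"S3OutputPath": output_s3_uri},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 2},  # assumed
        },
    }


def async_invoke_args(endpoint_name: str, input_s3_uri: str) -> dict:
    """Build InvokeEndpointAsync arguments; the payload is referenced by S3 URI."""
    return {"EndpointName": endpoint_name, "InputLocation": input_s3_uri}


async_config = async_endpoint_config("anomaly-detector", "s3://example-bucket/async-out/")
request = async_invoke_args("anomaly-endpoint", "s3://example-bucket/async-in/payload.json")
# With boto3:
#   boto3.client("sagemaker").create_endpoint_config(**async_config)
#   boto3.client("sagemaker-runtime").invoke_endpoint_async(**request)
```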

Answer

Batch transform

Answer

Real-time endpoint

Answer

Serverless inference

Question 14

A team is optimizing a deep learning model for deployment on SageMaker using SageMaker Neo. Which THREE of the following are valid optimization techniques that Neo can apply? (Choose THREE.)

Accepted Answer

Pruning (removing redundant weights). SageMaker Neo performs hardware-specific optimizations including quantisation (reducing precision), pruning (removing redundant weights), and operator fusion (combining operations). Knowledge distillation is a training-time technique, not part of Neo. Hyperparameter tuning is done by SageMaker Tuning jobs, not Neo.

Answer

Knowledge distillation

Answer

Hyperparameter tuning

Question 15

A company uses SageMaker Pipelines to automate their ML workflow. They need to add model versioning and approval workflow. Which THREE steps should they include in their pipeline to achieve this? (Choose THREE.)

Accepted Answer

RegisterModel step. The RegisterModel step is correct because it creates a model package in SageMaker Model Registry, which enables versioning and approval workflows. This step registers the trained model artifact along with metadata, allowing the pipeline to track model versions and trigger approval processes for deployment.
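
Once a model package is registered, the approval part of the workflow comes down to changing its `ModelApprovalStatus` through the `UpdateModelPackage` API. The helper below is a hypothetical wrapper that validates the status and builds that request; the ARN is a placeholder.

```python
VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def approval_update(model_package_arn: str, status: str) -> dict:
    """Build UpdateModelPackage arguments, rejecting unknown approval statuses."""
    if status not in VALID_STATUSES:
        raise ValueError(f"Unknown approval status: {status!r}")
    return {"ModelPackageArn": model_package_arn, "ModelApprovalStatus": status}


# Placeholder ARN for version 3 of a hypothetical model package group.
package_arn = (
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/3"
)
update = approval_update(package_arn, "Approved")
# With boto3: boto3.client("sagemaker").update_model_package(**update)
```

A common pattern is to register new versions as `PendingManualApproval` and let a reviewer or an automated check flip them to `Approved`.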

Answer

Training step

Answer

Transform step