MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 3 — 15 Questions

Question 1

A machine learning team has a model that needs to serve predictions with very low latency (under 10 ms) for a real-time web application. The model is a small ensemble of three neural networks that fits in memory. Which SageMaker inference option is MOST appropriate?

Accepted Answer

SageMaker real-time endpoint. SageMaker real-time endpoints are designed for low-latency, synchronous inference, making them the best fit for a model that must serve predictions in under 10 ms. Since the ensemble of three neural networks fits in memory, a real-time endpoint can keep the model loaded and respond to each request with minimal overhead, typically using HTTPS and the SageMaker InvokeEndpoint API.

Answer

SageMaker batch transform

Answer

SageMaker asynchronous inference

Answer

SageMaker serverless inference
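
As a rough illustration of the accepted answer to Question 1, the sketch below builds the arguments for a synchronous InvokeEndpoint call. The endpoint name `ensemble-endpoint` and the JSON payload shape are assumptions made for illustration, and the boto3 call itself appears only as a comment so the snippet runs without AWS credentials.

```python
import json


def build_invoke_request(endpoint_name, features):
    """Build keyword arguments for the sagemaker-runtime InvokeEndpoint API."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Accept": "application/json",
        # The payload layout is an assumption; it depends on the model container.
        "Body": json.dumps({"instances": [features]}),
    }


# "ensemble-endpoint" is a hypothetical endpoint name.
request = build_invoke_request("ensemble-endpoint", [0.2, 1.5, 3.1])

# With credentials configured, the request could be sent like this:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(**request)
#   prediction = json.loads(response["Body"].read())
```

Whether the 10 ms target is met depends on the instance type, the model, and network distance, so latency should be measured rather than assumed.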

Question 2

A company wants to deploy a single model that processes images from a production line. The images are uploaded to an S3 bucket every few minutes, and the inference results must be stored back to S3. The team wants to avoid paying for idle compute and prefers a fully managed, on-demand solution. Which SageMaker inference option should they use?

Accepted Answer

SageMaker asynchronous inference. Asynchronous inference fits this use case: it reads the request payload from S3, writes results to an S3 output location, is fully managed, and can scale the instance count to zero when the queue is empty (with auto scaling configured), so the team does not pay for idle compute. Real-time endpoints keep instances running and incur cost when idle. Batch transform is designed for large offline jobs rather than a steady stream of small, on-demand requests. Serverless inference also avoids idle cost, but its payload limit and cold starts may not suit image payloads.

Answer

SageMaker batch transform

Answer

SageMaker real-time endpoint with auto scaling

Answer

SageMaker serverless inference
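
The asynchronous inference setup described in the accepted answer to Question 2 can be sketched as two request fragments: the `AsyncInferenceConfig` for the endpoint configuration and an Application Auto Scaling target whose minimum capacity is zero. The bucket, endpoint, and variant names are hypothetical, and the AWS calls are shown only as comments.

```python
def build_async_inference_config(output_s3_uri, max_concurrent=4):
    """AsyncInferenceConfig fragment for CreateEndpointConfig."""
    return {
        "OutputConfig": {"S3OutputPath": output_s3_uri},
        "ClientConfig": {"MaxConcurrentInvocationsPerInstance": max_concurrent},
    }


def build_scalable_target(endpoint_name, variant_name, max_instances):
    """Application Auto Scaling target that lets the variant scale in to zero."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # zero instances while no requests are queued
        "MaxCapacity": max_instances,
    }


# All names below are assumptions for illustration.
async_config = build_async_inference_config("s3://example-bucket/inference-output/")
target = build_scalable_target("line-inspection", "AllTraffic", max_instances=2)

# With credentials configured:
#   boto3.client("sagemaker").create_endpoint_config(
#       ..., AsyncInferenceConfig=async_config)
#   boto3.client("application-autoscaling").register_scalable_target(**target)
```

A scaling policy on the queue backlog would still be needed to bring instances back when new images arrive; it is omitted here for brevity.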

Question 3

A data science team wants to host 50 different models for a recommendation engine. Each model is small (under 100 MB) and traffic patterns are unpredictable. They need to minimize cost and operational overhead. Which approach should they take?

Accepted Answer

Use a single multi-model endpoint (MME). An MME hosts many models (up to thousands) behind one endpoint and shares the underlying compute instances among them. For small models (under 100 MB) with unpredictable traffic, this minimizes cost and operational overhead: the endpoint loads models from Amazon S3 into memory on demand and unloads them when memory is needed, so there is no need for 50 separate endpoints or for compute sitting idle behind rarely used models.

Answer

Deploy each model to its own real-time endpoint

Answer

Use SageMaker serverless inference for each model

Answer

Use a single multi-container endpoint
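
To make the accepted answer to Question 3 concrete, the sketch below shows how a caller might select one of the 50 models on a multi-model endpoint by setting `TargetModel` on each request. The endpoint name, artifact naming scheme, and CSV payload are assumptions for illustration.

```python
def build_mme_request(endpoint_name, model_artifact, payload):
    """Keyword arguments for InvokeEndpoint against a multi-model endpoint."""
    return {
        "EndpointName": endpoint_name,
        # Artifact path relative to the S3 prefix configured on the MME model.
        "TargetModel": model_artifact,
        "ContentType": "text/csv",
        "Body": payload,
    }


# Hypothetical naming scheme: model-00.tar.gz ... model-49.tar.gz
mme_requests = [
    build_mme_request("recsys-mme", f"model-{i:02d}.tar.gz", "1,2,3")
    for i in range(50)
]

# With credentials configured, each request could be sent with:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**mme_requests[0])
```

The first request for a given model typically pays a loading delay while the artifact is fetched from S3, which is the trade-off for sharing instances.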

Question 4

A machine learning engineer needs to deploy a new version of a model gradually, initially sending 5% of traffic to the new version and 95% to the current version, while monitoring for errors. Which deployment pattern should they use?

Accepted Answer

Canary deployment. A canary deployment routes a small percentage of traffic (here 5%) to the new model version while the majority (95%) stays on the current version. The engineer can monitor the canary for errors during the gradual rollout and, if issues appear, quickly shift all traffic back to the stable version.

Answer

Blue/green deployment

Answer

Shadow testing

Answer

Rolling deployment
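
One way to express the 95/5 split from Question 4 is with two production variants on the same endpoint, where each variant's share of traffic is its weight divided by the sum of all weights. This is a sketch under assumptions: the variant names, model names, and instance type are hypothetical.

```python
def build_production_variants(current_model, new_model, canary_percent,
                              instance_type="ml.m5.large"):
    """ProductionVariants list for CreateEndpointConfig with a canary split."""
    plan = [
        ("current", current_model, 100 - canary_percent),
        ("canary", new_model, canary_percent),
    ]
    return [
        {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": float(weight),
        }
        for name, model, weight in plan
    ]


def traffic_share(variants):
    """Fraction of requests each variant receives (weight / total weight)."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


variants = build_production_variants("model-v1", "model-v2", canary_percent=5)
shares = traffic_share(variants)
```

Here `shares` maps `current` to 0.95 and `canary` to 0.05, matching the split in the question.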

Question 5

A company is using SageMaker Pipelines to orchestrate their ML workflow. They have a Condition step that checks if a model's accuracy exceeds 0.9. If true, they want to register the model in the model registry; otherwise, they want to run a retraining step. Which step type should they use for the decision?

Accepted Answer

Condition step. The Condition step in SageMaker Pipelines allows you to choose between two branches based on a condition. The other options are not designed for branching: Transform is for batch inference, Tuning is for hyperparameter optimization, and Processing is for data processing.

Answer

Transform step

Answer

Processing step

Answer

Tuning step

Question 6

An organization wants to ensure that only approved model versions can be deployed to production. They use the SageMaker Model Registry to track model versions. How can they enforce that only approved models are deployed?

Accepted Answer

Use IAM policies to restrict deployment to only Approved model versions. AWS IAM policies can conditionally restrict SageMaker API actions (e.g., CreateEndpointConfig, CreateModel) based on the model version's approval status. By evaluating the `sagemaker:ModelPackageApprovalStatus` condition key in an IAM policy, you can enforce that only model versions with an `Approved` status can be deployed. This gives a native, automated, and auditable enforcement mechanism without manual intervention or external dependencies.

Answer

Manually review each model before deployment

Answer

Use SageMaker Model Monitor to check model quality after deployment

Answer

Store model metadata in a DynamoDB table and check it before deployment

Question 7

A team has a large deep learning model that needs to be deployed for real-time inference with GPU acceleration. They want to use the Triton Inference Server on SageMaker to maximize throughput. Which instance type and configuration should they choose?

Accepted Answer

Deploy on an ml.g4dn.xlarge instance using the SageMaker Triton Inference Server container. Triton Inference Server is designed for high-performance inference on deep learning models and supports GPU acceleration and dynamic batching to maximize throughput. The ml.g4dn.xlarge instance provides a cost-effective GPU (T4) with sufficient memory for many models, and SageMaker's pre-built Triton container supports features such as concurrent model execution and request scheduling.

Answer

Deploy on an ml.c5.2xlarge instance using a PyTorch container

Answer

Deploy on an ml.p3.2xlarge instance using the SageMaker built-in XGBoost container

Answer

Deploy on an ml.m5.large instance with a standard TensorFlow serving container

Question 8

A company uses SageMaker Pipelines to automate their ML workflow. They notice that the pipeline reruns all steps even when the input data has not changed. Which feature should they enable to avoid unnecessary recomputation?

Accepted Answer

Enable pipeline caching. Pipeline caching in SageMaker Pipelines automatically reuses the output of a step if its inputs (including parameters, data, and code) have not changed since the last successful execution. This avoids recomputation by comparing a hash of the step's dependencies against previous runs, making it the correct feature to prevent unnecessary reruns when input data remains identical.

Answer

Use a Lambda step to check input changes

Answer

Use a Conditional step to skip steps

Answer

Set the pipeline execution mode to 'Parallel'
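
The idea behind the accepted answer to Question 8 (reuse a step's output when its inputs are unchanged) can be illustrated with a small hash-keyed cache. This is a conceptual sketch only, not SageMaker's implementation; `StepCache`, `preprocess`, and the S3 URIs are hypothetical. In the SageMaker Python SDK, caching is enabled per step by passing a `CacheConfig` to the step.

```python
import hashlib
import json


class StepCache:
    """Toy cache keyed by a hash of the step name and its arguments."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def _key(step_name, arguments):
        canonical = json.dumps({"step": step_name, "args": arguments}, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, step_name, arguments, compute):
        """Return (result, cache_hit); compute only on a cache miss."""
        key = self._key(step_name, arguments)
        if key in self._results:
            return self._results[key], True
        result = compute(arguments)
        self._results[key] = result
        return result, False


calls = []


def preprocess(args):
    calls.append(args["input_uri"])  # record each real execution
    return f"processed:{args['input_uri']}"


cache = StepCache()
first = cache.run("Preprocess", {"input_uri": "s3://bucket/data-v1"}, preprocess)
second = cache.run("Preprocess", {"input_uri": "s3://bucket/data-v1"}, preprocess)
third = cache.run("Preprocess", {"input_uri": "s3://bucket/data-v2"}, preprocess)
```

The second call is served from the cache because its arguments hash to the same key, while the third runs again because the input URI changed.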

Question 9

An ML engineer wants to use MLflow on SageMaker to track experiments and log metrics. They have set up MLflow on an EC2 instance. How can they best integrate MLflow tracking with SageMaker training jobs?

Accepted Answer

Set the MLFLOW_TRACKING_URI environment variable in the training job and use the mlflow library in the training script. With `MLFLOW_TRACKING_URI` set in the SageMaker training job definition, the training script can use the `mlflow` library to log parameters, metrics, and artifacts directly to the MLflow tracking server running on the EC2 instance. The training job communicates with the external MLflow server over HTTP/HTTPS, provided the network allows it, without requiring any additional SageMaker integrations.

Answer

Install MLflow on the SageMaker notebook instance only

Answer

Use the SageMaker Experiments integration with MLflow

Answer

Use SageMaker Processing to run MLflow after training
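
The approach in the accepted answer to Question 9 has two halves: the environment passed to the training job and the logging calls inside the training script. The sketch below shows both. The tracking server address and experiment name are assumptions, and `log_run` is only defined here, since it needs `mlflow` installed in the training container and a reachable server.

```python
def build_training_environment(tracking_uri, experiment_name):
    """Environment variables to pass to the SageMaker training job."""
    return {
        "MLFLOW_TRACKING_URI": tracking_uri,
        "MLFLOW_EXPERIMENT_NAME": experiment_name,
    }


def log_run(params, metrics):
    """Meant to be called from the training script inside the container."""
    import mlflow  # imported lazily; requires mlflow in the training image

    # mlflow picks up MLFLOW_TRACKING_URI from the environment by default.
    with mlflow.start_run():
        mlflow.log_params(params)
        for name, value in metrics.items():
            mlflow.log_metric(name, value)


# Hypothetical private address of the EC2-hosted tracking server.
environment = build_training_environment("http://10.0.0.12:5000", "mla-c01-demo")

# The dictionary could then be supplied as the `environment` argument of a
# SageMaker Python SDK estimator, or as `Environment` in CreateTrainingJob.
```

The training job also needs a network path to the EC2 instance (for example, the same VPC and a security group rule for the server port).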

Question 10

A company needs to update a model in production without any downtime. They currently have a single real-time endpoint serving traffic. Which approach allows them to deploy a new model version and switch traffic gradually while being able to roll back quickly?

Accepted Answer

Use a canary deployment by creating a new production variant with the new model and shifting traffic incrementally. SageMaker supports production variants with traffic splitting. By creating a new variant with the new model and shifting traffic gradually, the old variant remains available for rollback. Blue/green deployment with a new endpoint and endpoint configuration swap also allows quick rollback. The key is to have both variants active during the transition.

Answer

Use a multi-model endpoint and replace the model file

Answer

Stop the endpoint, update the model, and restart the endpoint

Answer

Update the existing endpoint's model directly using UpdateEndpoint
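
The incremental shift described in the accepted answer to Question 10 can be sketched as a series of weight updates for the UpdateEndpointWeightsAndCapacities API. The endpoint name, the variant names `blue` and `green`, and the 5/25/50/100 schedule are assumptions for illustration.

```python
def build_weight_update(endpoint_name, new_share_percent,
                        old_variant="blue", new_variant="green"):
    """Keyword arguments for UpdateEndpointWeightsAndCapacities."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": old_variant,
             "DesiredWeight": float(100 - new_share_percent)},
            {"VariantName": new_variant,
             "DesiredWeight": float(new_share_percent)},
        ],
    }


# Shift traffic in stages, checking error metrics between stages.
schedule = [build_weight_update("prod-endpoint", pct) for pct in (5, 25, 50, 100)]

# Rolling back is the same call with all weight on the old variant.
rollback = build_weight_update("prod-endpoint", 0)

# With credentials configured, each stage could be applied with:
#   boto3.client("sagemaker").update_endpoint_weights_and_capacities(**schedule[0])
```

Because both variants stay deployed during the transition, the rollback request takes effect without provisioning new instances.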

Question 11

Which SageMaker feature compiles a trained model into an optimized binary for a specific hardware target (e.g., Intel, ARM, NVIDIA, or edge devices) to improve inference performance?

Accepted Answer

SageMaker Neo. SageMaker Neo is a model compilation service that optimizes models for specific hardware targets. Amazon Elastic Inference attaches GPU acceleration to endpoints, but does not compile models. Model Monitor monitors quality. SageMaker Clarify explains predictions.

Answer

SageMaker Model Monitor

Answer

Amazon Elastic Inference

Answer

SageMaker Clarify
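
For the accepted answer to Question 11, a Neo compilation is requested through the CreateCompilationJob API. The sketch below assembles such a request; the job name, role ARN, S3 locations, PyTorch framework, input shape, and `jetson_nano` target are all assumptions chosen for illustration.

```python
import json


def build_compilation_job(job_name, role_arn, model_s3_uri, output_s3_uri,
                          target_device, input_shape):
    """Keyword arguments for the SageMaker CreateCompilationJob API."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Input name and shape expected by the model, as a JSON string.
            "DataInputConfig": json.dumps({"input0": input_shape}),
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


job = build_compilation_job(
    job_name="resnet-neo-demo",
    role_arn="arn:aws:iam::123456789012:role/ExampleNeoRole",  # placeholder ARN
    model_s3_uri="s3://example-bucket/model/model.tar.gz",
    output_s3_uri="s3://example-bucket/compiled/",
    target_device="jetson_nano",
    input_shape=[1, 3, 224, 224],
)

# With credentials configured:
#   boto3.client("sagemaker").create_compilation_job(**job)
```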

Question 12

A team has a SageMaker Pipeline that trains a model and registers it in the Model Registry. They want to automate the deployment of the approved model to a staging environment. Which event-driven approach should they use?

Accepted Answer

Use Amazon EventBridge to listen for Model Registry approval events and trigger an AWS Lambda function that deploys the model. The Amazon EventBridge integration with SageMaker can trigger on Model Registry status changes (e.g., when a model version is approved). A Lambda function can then deploy the model to a staging endpoint. Step Functions can be used, but the trigger should be EventBridge. CloudWatch alarms are for monitoring metrics.

Answer

Use an SQS queue to store approval messages and have a cron job process them

Answer

Set up a CloudWatch alarm on the Model Registry's ApprovalStatus metric

Answer

Configure an AWS Step Functions state machine to poll the Model Registry every minute
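
The event-driven flow in the accepted answer to Question 12 can be sketched as an EventBridge event pattern plus a Lambda handler that decides whether to deploy. The staging endpoint name is hypothetical, and the actual deployment calls are left as comments so the handler can be exercised locally.

```python
# EventBridge rule pattern matching approved model package versions.
APPROVAL_EVENT_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def lambda_handler(event, context):
    """Return a deployment decision for a Model Registry state-change event."""
    detail = event.get("detail", {})
    if detail.get("ModelApprovalStatus") != "Approved":
        return {"deploy": False}

    model_package_arn = detail["ModelPackageArn"]
    # A real handler would now call the SageMaker API, for example:
    #   sm = boto3.client("sagemaker")
    #   sm.create_model(..., Containers=[{"ModelPackageName": model_package_arn}])
    #   sm.create_endpoint_config(...)
    #   sm.create_endpoint(...) or sm.update_endpoint(...)
    return {
        "deploy": True,
        "model_package_arn": model_package_arn,
        "endpoint_name": "staging-endpoint",  # hypothetical name
    }


sample_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelApprovalStatus": "Approved",
        "ModelPackageArn": "arn:aws:sagemaker:us-east-1:123456789012:model-package/demo/1",
    },
}
decision = lambda_handler(sample_event, None)
```

Filtering on the approval status in the rule pattern keeps the Lambda function from being invoked for rejected or pending versions; the check inside the handler is a second line of defense.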

Question 13

An ML engineer is designing a SageMaker Pipeline for a computer vision model. The pipeline includes steps for data processing, training, evaluation, and registration. The engineer wants to enable caching to avoid reprocessing when step inputs have not changed. For which steps is caching supported? (Select TWO.)

Accepted Answer

Processing step. Caching is supported for the following step types: Processing, Training, Tuning, Transform, and AutoML. Condition steps and Lambda steps do not support caching because they are control flow steps.

Answer

Condition step

Answer

Lambda step

Answer

RegisterModel step

Question 14

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but wants to minimize costs by using a smaller attached GPU. Which options can they use? (Select TWO.)

Accepted Answer

Amazon Elastic Inference. Elastic Inference lets you attach a smaller, configurable GPU acceleration resource to a SageMaker endpoint, enabling GPU-accelerated inference without the cost of a full GPU instance. This directly meets the requirement of minimizing costs by using a smaller attached GPU.

Answer

SageMaker Neo compilation

Answer

Quantize the model to INT8 precision

Answer

Use SageMaker serverless inference with GPU

Question 15

A team is migrating their ML infrastructure to AWS and wants to use infrastructure as code to manage SageMaker Studio domains, user profiles, and associated resources. Which services can they use for this purpose? (Select THREE.)

Accepted Answer

AWS CDK (Cloud Development Kit). AWS CDK lets you define AWS infrastructure, including SageMaker Studio domains and user profiles, in familiar programming languages such as Python or TypeScript. CDK synthesizes these definitions into CloudFormation templates, enabling infrastructure as code (IaC) for SageMaker resources. It also provides type safety and high-level abstractions, which makes it suitable for managing complex ML environments.

Answer

SageMaker Python SDK

Answer

Boto3