MLA-C01 Deployment and Orchestration of ML Workflows — 25 Questions

Question 1

A data scientist needs to deploy a single ML model that will serve real-time predictions with low latency (under 10 ms) for a high-traffic web application. The model fits in memory and requires GPU acceleration. Which SageMaker inference option is MOST suitable?

Accepted Answer

Real-time endpoint on ml.g4dn instances. Real-time endpoints on GPU instances (ml.g4dn) provide low latency and GPU acceleration, ideal for high-traffic, latency-sensitive workloads.

Answer

Real-time endpoint on ml.m5 instances

Answer

Batch Transform

Answer

Serverless Inference

Question 2

A team has 200 small ML models that need to be served via HTTPS endpoints. Each model is used infrequently, and the team wants to minimize hosting costs. Which SageMaker deployment approach is MOST cost-effective?

Accepted Answer

Use a single multi-model endpoint (MME). Multi-model endpoints (MME) allow hosting multiple models on a single endpoint, sharing instances and reducing costs, especially for infrequently used models.

Answer

Use SageMaker Serverless Inference for each model

Answer

Deploy each model on a separate real-time endpoint

Answer

Use Batch Transform for all models
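
For Question 2, a request to a multi-model endpoint names the model artifact it wants through the `TargetModel` parameter of `invoke_endpoint`. The sketch below only builds the keyword arguments; the endpoint name, artifact key, and payload are hypothetical placeholders.

```python
# Sketch only: endpoint name, artifact key, and payload are hypothetical.
def build_mme_request(endpoint_name, model_key, payload):
    """Keyword arguments for sagemaker-runtime invoke_endpoint on an MME."""
    return {
        "EndpointName": endpoint_name,
        # TargetModel selects which artifact, relative to the endpoint's
        # S3 model prefix, should serve this request.
        "TargetModel": model_key,
        "ContentType": "text/csv",
        "Body": payload,
    }


request = build_mme_request("shared-mme", "customer-042.tar.gz", b"1.0,2.0,3.0")
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("sagemaker-runtime").invoke_endpoint(**request)
```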

Question 3

An ML team uses SageMaker Pipelines to automate model retraining. They want to skip redundant training steps when input data has not changed. Which feature should they enable?

Accepted Answer

Pipeline caching. SageMaker Pipelines caching stores step outputs; if the step configuration and inputs are identical, the pipeline reuses the cached output, skipping execution.

Answer

Pipeline variable expressions

Answer

Model registry approval

Answer

Step parallelism

Question 4

A company needs to deploy a new model version to a SageMaker real-time endpoint. They want to route 5% of traffic to the new version initially to monitor for errors before full rollout. Which deployment strategy should they use?

Accepted Answer

Canary deployment with production variants. A canary deployment with production variants routes a specific percentage of traffic (for example, 5%) to the new model version by setting the `InitialVariantWeight` parameter in each production variant's configuration. The team can monitor errors at that traffic share and later raise the weight to 100% for full rollout. SageMaker real-time endpoints support this natively by hosting multiple model variants behind the same endpoint.

Answer

Blue/green deployment

Answer

Shadow testing

Answer

Multi-model endpoint
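
For Question 4, the 95/5 split can be sketched as a `ProductionVariants` list of the kind passed to `create_endpoint_config`. Variant names, model names, and instance settings are assumptions for illustration; each variant's traffic share is its weight divided by the sum of all weights.

```python
# Sketch only: variant names, model names, and instance types are hypothetical.
production_variants = [
    {
        "VariantName": "current",
        "ModelName": "model-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 2,
        "InitialVariantWeight": 95.0,
    },
    {
        "VariantName": "canary",
        "ModelName": "model-v2",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 5.0,
    },
]


def traffic_shares(variants):
    """Return each variant's share of traffic: weight / sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


shares = traffic_shares(production_variants)
```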

Question 5

An ML engineer needs to compile a trained TensorFlow model to run efficiently on a target edge device with an ARM CPU. Which AWS service should they use?

Accepted Answer

SageMaker Neo. SageMaker Neo compiles trained models for specific hardware targets, including ARM CPUs, to optimize inference performance.

Answer

SageMaker Debugger

Answer

AWS Inferentia

Answer

Amazon Elastic Inference
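
For Question 5, a Neo compilation job for a Linux ARM64 target could be described with arguments like the ones below, in the shape expected by `create_compilation_job`. The S3 locations, role ARN, input tensor name, and input shape are hypothetical; the real values depend on the trained model.

```python
import json


# Sketch only: all names, URIs, and the input shape are hypothetical.
def build_compilation_job(job_name, model_s3_uri, output_s3_uri, role_arn):
    """Keyword arguments for the SageMaker create_compilation_job call."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Maps the (assumed) input tensor name to its shape.
            "DataInputConfig": json.dumps({"input_1": [1, 224, 224, 3]}),
            "Framework": "TENSORFLOW",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            # Compile for an ARM CPU running Linux.
            "TargetPlatform": {"Os": "LINUX", "Arch": "ARM64"},
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


job = build_compilation_job(
    "tf-arm64-compile",
    "s3://example-bucket/model/model.tar.gz",
    "s3://example-bucket/compiled/",
    "arn:aws:iam::123456789012:role/ExampleNeoRole",
)
```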

Question 6

A data science team uses SageMaker Pipelines for automated training. They need to conditionally register a model only if evaluation metrics exceed a threshold. Which pipeline step type should they use after the evaluation step?

Accepted Answer

Condition step. The Condition step evaluates a condition and branches the pipeline; if the condition is met, the pipeline proceeds to register the model.

Answer

Processing step

Answer

Transform step

Answer

RegisterModel step

Question 7

A company wants to serve a large ensemble of models using NVIDIA Triton Inference Server on SageMaker for high throughput GPU inference. Which SageMaker inference option supports this?

Accepted Answer

Real-time endpoint with a custom container running Triton. SageMaker supports Triton Inference Server through a custom real-time endpoint container, as Triton is optimized for GPU serving on NVIDIA hardware.

Answer

Asynchronous Inference

Answer

Multi-model endpoint

Answer

Serverless Inference

Question 8

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

Accepted Answer

Increase the instance count for the Processing step. Setting instance_count > 1 runs the SageMaker Processing job on multiple nodes, which can dramatically reduce wall-clock time for embarrassingly parallel workloads like feature engineering. The speedup is not automatic: by default every instance receives a full copy of the input data, so the script must either use a distributed framework such as PySpark or the input must be sharded across instances (for example, with the ShardedByS3Key distribution type) so that each node processes a different subset. When that holds, adding instances parallelizes the computation directly.

Answer

Enable pipeline caching for the Processing step

Answer

Use a larger instance type with more vCPUs

Answer

Use a Tuning step instead
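
For Question 8, the fragment below sketches the two settings that matter when scaling a Processing job out: the instance count and the S3 data distribution type. The keys follow the `create_processing_job` request shape; the input name, S3 prefix, instance type, and volume size are hypothetical.

```python
# Sketch only: input name, S3 prefix, instance type, and sizes are hypothetical.
def sharded_processing_config(input_s3_uri, instance_count):
    """Fragment of a create_processing_job request that shards input by S3 key."""
    return {
        "ProcessingInputs": [
            {
                "InputName": "raw",
                "S3Input": {
                    "S3Uri": input_s3_uri,
                    "LocalPath": "/opt/ml/processing/input",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                    # Each instance receives a different subset of the objects
                    # instead of a full copy (the default is FullyReplicated).
                    "S3DataDistributionType": "ShardedByS3Key",
                },
            }
        ],
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": instance_count,
                "InstanceType": "ml.m5.xlarge",
                "VolumeSizeInGB": 30,
            }
        },
    }


config = sharded_processing_config("s3://example-bucket/raw/", 4)
```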

Question 9

A company uses SageMaker Model Registry to manage model versions. They want to automate the approval of models that pass automated evaluation, but require manual approval for others. Which Model Registry feature supports this workflow?

Accepted Answer

Approval workflow via pipeline Condition step. Model Registry supports approval statuses (PendingManualApproval, Approved, Rejected). Automated evaluation can set status to Approved, while borderline cases can be set to PendingManualApproval.

Answer

Cross-account deployment

Answer

Model versioning

Answer

Model lineage
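
For Question 9, the approval decision can be sketched as a small function that maps an evaluation metric to a Model Registry status. The 0.90 threshold and the model package ARN are assumptions; the resulting dictionary has the shape accepted by `update_model_package`.

```python
# Sketch only: the threshold and the ARN are hypothetical.
def approval_status(accuracy, auto_approve_at=0.90):
    """Approve automatically at or above the threshold, else queue for review."""
    return "Approved" if accuracy >= auto_approve_at else "PendingManualApproval"


def build_status_update(model_package_arn, accuracy):
    """Keyword arguments for the SageMaker update_model_package call."""
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": approval_status(accuracy),
    }


arn = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example-group/1"
auto = build_status_update(arn, 0.94)
manual = build_status_update(arn, 0.87)
```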

Question 10

A company needs to deploy a model that processes large payloads (up to 1 GB) asynchronously. The results should be written to S3, and the team needs SNS notifications upon completion. Which SageMaker inference option is MOST suitable?

Accepted Answer

Asynchronous Inference. Asynchronous Inference is designed for large payloads, processes requests asynchronously, and supports SNS notifications on completion.

Answer

Batch Transform

Answer

Real-time endpoint

Answer

Serverless Inference
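
For Question 10, the S3 output location and SNS topics are set in the `AsyncInferenceConfig` block of the endpoint configuration. The sketch below builds arguments in the shape of `create_endpoint_config`; every name, ARN, and the instance type are hypothetical.

```python
# Sketch only: names, ARNs, S3 paths, and instance type are hypothetical.
def build_async_endpoint_config(config_name, model_name, output_s3_uri,
                                success_topic_arn, error_topic_arn):
    """Keyword arguments for create_endpoint_config with async inference."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": "ml.m5.xlarge",
                "InitialInstanceCount": 1,
            }
        ],
        "AsyncInferenceConfig": {
            "OutputConfig": {
                # Results are written here instead of being returned inline.
                "S3OutputPath": output_s3_uri,
                "NotificationConfig": {
                    "SuccessTopic": success_topic_arn,
                    "ErrorTopic": error_topic_arn,
                },
            },
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
        },
    }


async_config = build_async_endpoint_config(
    "async-config",
    "large-payload-model",
    "s3://example-bucket/async-results/",
    "arn:aws:sns:us-east-1:123456789012:inference-success",
    "arn:aws:sns:us-east-1:123456789012:inference-error",
)
```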

Question 11

An ML team uses AWS Step Functions to orchestrate a retraining pipeline triggered by EventBridge when new training data arrives. The pipeline includes a SageMaker training job and a model evaluation. If evaluation fails, the team wants to send an alert. How should they implement this?

Accepted Answer

Add a Catch rule in the Step Functions state machine to invoke a Lambda alert function. Step Functions supports error handling via Catch rules; a Catch on the training or evaluation task can transition to a Lambda function that sends an alert.

Answer

Use SQS dead-letter queue for failed training jobs

Answer

Configure SageMaker training job to publish to SNS on failure

Answer

Use EventBridge to monitor the training job status
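
For Question 11, the Catch rule can be sketched as an Amazon States Language definition built in Python. The state names and Lambda function names are hypothetical, the evaluation is assumed to run as a Lambda task, and the training job parameters are left empty because the question does not specify them.

```python
import json

# Any error in training or evaluation is routed to the alert state.
catch_all = [{"ErrorEquals": ["States.ALL"], "Next": "SendAlert"}]

# Sketch only: state names and Lambda function names are hypothetical.
state_machine = {
    "StartAt": "TrainModel",
    "States": {
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {},  # training job settings omitted in this sketch
            "Catch": catch_all,
            "Next": "EvaluateModel",
        },
        "EvaluateModel": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "evaluate-model"},
            "Catch": catch_all,
            "End": True,
        },
        "SendAlert": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            # Pass the error details through to the alert function.
            "Parameters": {"FunctionName": "send-alert", "Payload.$": "$"},
            "End": True,
        },
    },
}

definition = json.dumps(state_machine, indent=2)
```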

Question 12

A startup wants to deploy a containerized ML application that includes both a model inference server and a preprocessing component in the same endpoint. Which SageMaker endpoint type supports running multiple containers?

Accepted Answer

Multi-container endpoint. Multi-container endpoints allow running multiple containers, enabling preprocessing and inference in the same endpoint.

Answer

Asynchronous Inference

Answer

Multi-model endpoint

Answer

Real-time endpoint
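
For Question 12, a model with a preprocessing container followed by an inference container can be sketched as `create_model` arguments with a `Containers` list. Image URIs, the artifact location, and the role ARN are hypothetical. `Serial` mode chains the containers so that each one's output feeds the next.

```python
# Sketch only: image URIs, model artifact location, and role ARN are hypothetical.
def build_multi_container_model(model_name, role_arn):
    """Keyword arguments for create_model with two chained containers."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {
                "ContainerHostname": "preprocess",
                "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
            },
            {
                "ContainerHostname": "inference",
                "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest",
                "ModelDataUrl": "s3://example-bucket/model/model.tar.gz",
            },
        ],
        # Serial: requests flow through the containers in list order.
        # Direct: each container can be invoked individually instead.
        "InferenceExecutionConfig": {"Mode": "Serial"},
    }


model_args = build_multi_container_model(
    "preprocess-and-predict",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
```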

Question 13

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

Accepted Answer

Configure two production variants on the endpoint, each with an initial weight. SageMaker endpoints support multiple production variants, each with an assigned weight that determines the proportion of traffic routed to that variant. By setting initial weights (for example, 50/50 or 90/10), the team can split traffic between two model variants for A/B testing without deploying separate endpoints.

Answer

Use SageMaker Model Registry to approve both variants

Answer

Deploy each variant to a separate endpoint and use Route53 weighted routing

Answer

Enable shadow testing on the endpoint

Question 14

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

Accepted Answer

SageMaker (training jobs or pipelines), Amazon EventBridge, and AWS Lambda. Amazon SageMaker provides the training jobs and pipelines that execute the ML retraining workflow. Amazon EventBridge acts as the event bus that triggers retraining based on events such as new data arrival or model drift detection. AWS Lambda serves as the lightweight compute layer that can preprocess events, invoke SageMaker APIs, or apply conditional logic before starting a training job.

Answer

AWS Glue

Answer

Amazon CloudWatch Logs
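
For Question 14, the three services can be wired together with a Lambda handler that receives an EventBridge event and starts a SageMaker pipeline execution. This is a sketch under assumptions: the pipeline name and the `InputDataUri` parameter are hypothetical, and the event is assumed to be an S3 "Object Created" event delivered through EventBridge. The client is injectable so the handler can be exercised without AWS access.

```python
# Sketch only: pipeline name and parameter name are hypothetical.
def build_start_args(event, pipeline_name="retraining-pipeline"):
    """Translate an S3 'Object Created' EventBridge event into pipeline arguments."""
    detail = event["detail"]
    data_uri = "s3://{}/{}".format(detail["bucket"]["name"], detail["object"]["key"])
    return {
        "PipelineName": pipeline_name,
        "PipelineParameters": [{"Name": "InputDataUri", "Value": data_uri}],
    }


def handler(event, context, sagemaker_client=None):
    """Lambda entry point: start one pipeline execution per event."""
    if sagemaker_client is None:
        import boto3  # available in the Lambda Python runtime

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.start_pipeline_execution(**build_start_args(event))
    return response["PipelineExecutionArn"]


sample_event = {
    "detail": {"bucket": {"name": "training-data"}, "object": {"key": "2024/new.csv"}}
}
start_args = build_start_args(sample_event)
```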

Question 15

A team uses SageMaker Pipelines to train and evaluate a model. They want to run the training step only if the data quality check passes, otherwise skip. Which TWO pipeline step types are required? (Select TWO.)

Accepted Answer

Condition step and Training step. The Condition step is required because SageMaker Pipelines uses it to evaluate a boolean expression, such as whether a data quality check passed, and then conditionally execute subsequent steps. The Training step is required because it is the step that actually runs the model training job; it must be placed inside the 'If' branch of the Condition step so that it runs only when the condition is true.

Answer

RegisterModel step

Answer

Processing step

Answer

Transform step

Question 16

A machine learning engineer needs to deploy a model that requires less than 100 ms inference latency for real-time predictions. The model is a small PyTorch model that fits in a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?

Accepted Answer

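Serverless inference with max concurrency set to 10. Assuming traffic is intermittent (the scenario does not state a traffic pattern), serverless inference is cost-effective because it scales to zero when not in use and charges per inference. Serverless Inference runs on CPU only and can add cold-start latency, so this answer holds only if the small model meets the 100 ms target without a GPU. Real-time endpoints incur cost even when idle, Batch Transform is for offline processing, and Asynchronous Inference queues requests and has higher latency.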

Answer

Asynchronous inference endpoint

Answer

Real-time endpoint on ml.g4dn.xlarge

Answer

Batch transform job
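
For Question 16, a serverless endpoint replaces the instance settings in the production variant with a `ServerlessConfig` block. The sketch below uses the concurrency from the accepted answer; the variant name, model name, and memory size are assumptions.

```python
# Sketch only: variant name, model name, and memory size are hypothetical.
def build_serverless_variant(model_name, memory_mb=2048, max_concurrency=10):
    """Production variant for create_endpoint_config using Serverless Inference."""
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        # No InstanceType or InitialInstanceCount: capacity is managed by
        # SageMaker and scales to zero when there are no requests.
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


serverless_variant = build_serverless_variant("small-pytorch-model")
```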

Question 17

A team built a SageMaker Pipeline that includes a training step and a model evaluation step. They want to automatically register a model in SageMaker Model Registry only if the evaluation metric (accuracy) exceeds 0.9. Which pipeline step should be used to implement this conditional logic?

Accepted Answer

Condition step. The Condition step in SageMaker Pipelines allows you to add conditional branching logic, such as evaluating a metric and proceeding only if a condition is met. In this scenario, you would use a ConditionStep to check if the accuracy metric from the evaluation step exceeds 0.9, and then conditionally execute a RegisterModel step to register the model in SageMaker Model Registry.

Answer

RegisterModel step

Answer

Processing step

Answer

Transform step
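
For Question 17, the branching that a Condition step performs can be illustrated with plain Python. This is a toy model of the control flow, not the SageMaker SDK; in the SDK the equivalent is a `ConditionStep` with a `ConditionGreaterThan` condition and `if_steps` / `else_steps` lists. The step names below are hypothetical.

```python
# Toy illustration of Condition-step branching; not the SageMaker SDK.
def steps_to_run(accuracy, threshold=0.9):
    """Return the branch that would execute for a given evaluation accuracy."""
    if_steps = ["RegisterModel"]  # runs only when accuracy exceeds the threshold
    else_steps = []               # nothing runs otherwise
    return if_steps if accuracy > threshold else else_steps


passing = steps_to_run(0.93)
borderline = steps_to_run(0.90)  # "exceeds" is strict, so 0.90 does not register
```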

Question 18

A company has 200 small models (each ~100 MB) that serve different customers. They want to minimize costs while keeping low latency for each customer. Which SageMaker deployment approach is MOST suitable?

Accepted Answer

Use a single multi-model endpoint (MME) on an ml.c5.large instance. A single multi-model endpoint (MME) is the most suitable because it hosts all 200 small models (each ~100 MB) behind one endpoint, dynamically loading models from Amazon S3 on demand and unloading them when memory runs low. This minimizes costs by sharing an instance across all models. Latency stays low for frequently used models, which remain cached in instance memory and on local disk; a model that is not cached incurs a short load delay on its first request.

Answer

Deploy each model on a separate real-time endpoint

Answer

Use a multi-container endpoint with one container per model

Answer

Use serverless inference for each model

Question 19

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

Accepted Answer

Enable pipeline caching by setting the CacheConfig property for each step. SageMaker Pipelines supports step caching via the `CacheConfig` property. When enabled, the pipeline compares the step's arguments (such as input data locations, parameters, and code references) with those of a previous successful run inside the cache expiry window. If they match, the step is skipped and the previous output is reused, eliminating redundant compute. The comparison covers the arguments, not the contents they point to, so new data written to the same S3 location does not invalidate the cache.

Answer

Configure the pipeline to run on a schedule instead of on-demand

Answer

Use the Parameter step to pass previous execution ID

Answer

Use Lambda step to check data changes before running
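
For Question 19, the idea behind step caching can be illustrated with a toy cache keyed on a hash of the step's arguments. This is a conceptual sketch, not how SageMaker implements it; in the SageMaker Python SDK caching is enabled by passing `CacheConfig(enable_caching=True, expire_after="PT1H")` to a step. The S3 URIs and hyperparameter are hypothetical.

```python
import hashlib
import json

_cache = {}


def step_signature(arguments):
    """Stable hash of a step's arguments; identical arguments give the same key."""
    encoded = json.dumps(arguments, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def run_step(arguments, execute):
    """Return (output, cache_hit); execute only when the signature is new."""
    key = step_signature(arguments)
    if key in _cache:
        return _cache[key], True
    _cache[key] = execute(arguments)
    return _cache[key], False


calls = []


def fake_training(arguments):
    """Stand-in for an expensive training job."""
    calls.append(arguments)
    return "s3://example-bucket/model-%d" % len(calls)


args = {"InputDataUri": "s3://example-bucket/train/", "MaxDepth": 6}
first, first_hit = run_step(args, fake_training)
second, second_hit = run_step(dict(args), fake_training)            # same arguments
third, third_hit = run_step({**args, "MaxDepth": 8}, fake_training)  # changed argument
```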

Question 20

A company wants to update an existing SageMaker real-time endpoint to serve a new model version. They need to route a small percentage of traffic to the new version initially and monitor for errors before switching fully. Which deployment pattern supports this?

Accepted Answer

Canary deployment with weighted production variants. SageMaker real-time endpoints support canary deployments by configuring multiple production variants with weighted traffic distribution. The team can assign a small weight (for example, 5%) to the new model version and 95% to the existing one, then monitor CloudWatch metrics for errors before shifting all traffic to the new variant. This matches the requirement for a gradual, monitored rollout.

Answer

Shadow testing

Answer

A/B testing with traffic splitting

Answer

Blue/green deployment
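
For Question 20, shifting traffic on a live endpoint can be sketched as a series of `update_endpoint_weights_and_capacities` requests, one per rollout stage. The endpoint name, variant names, and the 5% / 25% / 100% schedule are assumptions; in practice each stage would be applied only after the monitoring metrics look healthy.

```python
# Sketch only: endpoint name, variant names, and rollout schedule are hypothetical.
def weight_update(endpoint_name, canary_weight):
    """Keyword arguments for update_endpoint_weights_and_capacities."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": "current", "DesiredWeight": 100.0 - canary_weight},
            {"VariantName": "canary", "DesiredWeight": canary_weight},
        ],
    }


# One request per stage of the rollout.
rollout = [weight_update("prod-endpoint", w) for w in (5.0, 25.0, 100.0)]
```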