MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 1 — 15 Questions

Question 1

A data scientist needs to deploy a single ML model that will serve real-time predictions with low latency (under 10 ms) for a high-traffic web application. The model fits in memory and requires GPU acceleration. Which SageMaker inference option is MOST suitable?

Accepted Answer

Real-time endpoint on ml.g4dn instances. Real-time endpoints on GPU instances (ml.g4dn) provide low latency and GPU acceleration, ideal for high-traffic, latency-sensitive workloads.

Answer

Real-time endpoint on ml.m5 instances

Answer

Batch Transform

Answer

Serverless Inference

Question 2

A team has 200 small ML models that need to be served via HTTPS endpoints. Each model is used infrequently, and the team wants to minimize hosting costs. Which SageMaker deployment approach is MOST cost-effective?

Accepted Answer

Use a single multi-model endpoint (MME). Multi-model endpoints (MME) allow hosting multiple models on a single endpoint, sharing instances and reducing costs, especially for infrequently used models.

Answer

Use SageMaker Serverless Inference for each model

Answer

Deploy each model on a separate real-time endpoint

Answer

Use Batch Transform for all models
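
A minimal sketch of how the accepted answer to Question 2 works in practice: one container definition points at an S3 prefix holding every model archive, and each request names the archive it wants. The code only builds the dictionaries that would be passed to boto3's `create_model` and `invoke_endpoint`; it makes no AWS calls. The image URI, role ARN, bucket, endpoint name, and model file name are placeholders, not values from the question.

```python
import json

def mme_model_request(model_name, image_uri, role_arn, s3_prefix):
    """Build kwargs for create_model with a multi-model container."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": s3_prefix,  # an S3 prefix, not a single model.tar.gz
            "Mode": "MultiModel",
        },
    }

def mme_invoke_request(endpoint_name, target_model, payload):
    """Build kwargs for invoke_endpoint that select one model by file name."""
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,  # path relative to the S3 prefix above
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }

# Placeholder values for illustration only.
model_req = mme_model_request(
    "shared-models",
    "111122223333.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    "s3://example-bucket/models/",
)
invoke_req = mme_invoke_request(
    "shared-endpoint", "model-042.tar.gz", {"features": [1.0, 2.0]}
)
print(invoke_req["TargetModel"])
```

Because models are loaded on demand, the first request to a rarely used model can be slower than later ones.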

Question 3

An ML team uses SageMaker Pipelines to automate model retraining. They want to skip redundant training steps when input data has not changed. Which feature should they enable?

Accepted Answer

Pipeline caching. SageMaker Pipelines caching stores step outputs; if the step configuration and inputs are identical, the pipeline reuses the cached output, skipping execution.

Answer

Pipeline variable expressions

Answer

Model registry approval

Answer

Step parallelism
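
The caching idea behind the accepted answer to Question 3 can be illustrated with a small stand-alone sketch. This is not the SageMaker implementation: `StepCache` and `fake_train` are hypothetical names, and the cache key here is simply a hash of the step name, configuration, and input arguments. In the SageMaker Python SDK, caching is enabled per step with `CacheConfig(enable_caching=True, expire_after=...)`; as far as the documentation describes it, a cache hit depends on the step's arguments (such as S3 URIs) rather than on the bytes stored at those locations.

```python
import hashlib
import json

class StepCache:
    """Toy cache: reuse a step's output when its name, config, and inputs match."""

    def __init__(self):
        self._store = {}
        self.executions = 0  # how many times a step actually ran

    @staticmethod
    def _key(step_name, config, inputs):
        blob = json.dumps(
            {"step": step_name, "config": config, "inputs": inputs}, sort_keys=True
        )
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def run(self, step_name, config, inputs, fn):
        key = self._key(step_name, config, inputs)
        if key in self._store:
            return self._store[key]  # cache hit: skip execution
        self.executions += 1
        result = fn(config, inputs)
        self._store[key] = result
        return result

def fake_train(config, inputs):
    """Stand-in for a training step (hypothetical)."""
    return {"model": f"{inputs['data_uri']}@epochs={config['epochs']}"}

cache = StepCache()
v1 = {"data_uri": "s3://example-bucket/train/v1"}
v2 = {"data_uri": "s3://example-bucket/train/v2"}
first = cache.run("Train", {"epochs": 5}, v1, fake_train)
second = cache.run("Train", {"epochs": 5}, v1, fake_train)  # identical: reused
third = cache.run("Train", {"epochs": 5}, v2, fake_train)   # new input: runs
print(cache.executions)
```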

Question 4

A company needs to deploy a new model version to a SageMaker real-time endpoint. They want to route 5% of traffic to the new version initially to monitor for errors before full rollout. Which deployment strategy should they use?

Accepted Answer

Canary deployment with production variants. Hosting the current and new model versions as two production variants behind the same endpoint lets you send a fixed share of traffic (for example, 5%) to the new version by setting each variant's `InitialVariantWeight` in the endpoint configuration. You can monitor the new variant for errors and then raise its weight until it receives 100% of traffic. SageMaker real-time endpoints support this natively.

Answer

Blue/green deployment

Answer

Shadow testing

Answer

Multi-model endpoint
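
For the accepted answer to Question 4, the sketch below shows how variant weights translate into traffic shares. It builds only the `ProductionVariants` list that would go into an endpoint configuration; the variant names, model names, and instance type are placeholders. Each variant's share of traffic is its weight divided by the sum of all weights.

```python
def canary_variants(current_model, new_model, canary_percent,
                    instance_type="ml.g4dn.xlarge"):
    """Build two production variants with a canary_percent / remainder split."""
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 0 and 100 (exclusive)")

    def variant(name, model, weight):
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": float(weight),
        }

    return [
        variant("current", current_model, 100 - canary_percent),
        variant("canary", new_model, canary_percent),
    ]

def traffic_share(variants):
    """Share of requests per variant: its weight over the sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

variants = canary_variants("model-v1", "model-v2", 5)
shares = traffic_share(variants)
print(shares)
```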

Question 5

An ML engineer needs to compile a trained TensorFlow model to run efficiently on a target edge device with an ARM CPU. Which AWS service should they use?

Accepted Answer

SageMaker Neo. SageMaker Neo compiles trained models for specific hardware targets, including ARM CPUs, to optimize inference performance.

Answer

SageMaker Debugger

Answer

AWS Inferentia

Answer

Amazon Elastic Inference
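
As a rough illustration of the accepted answer to Question 5, the sketch below builds the request that would be passed to boto3's `create_compilation_job` for a TensorFlow model targeting a 64-bit ARM Linux device. Nothing is sent to AWS. The job name, role ARN, S3 locations, input tensor name, and shape are assumptions chosen for the example.

```python
import json

def neo_compilation_request(job_name, role_arn, model_s3_uri,
                            output_s3_uri, input_shape):
    """Build kwargs for a SageMaker Neo compilation job (ARM64 Linux target)."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Neo expects the input names and shapes as a JSON string.
            "DataInputConfig": json.dumps(input_shape),
            "Framework": "TENSORFLOW",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetPlatform": {"Os": "LINUX", "Arch": "ARM64"},
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }

# Placeholder values for illustration only.
request = neo_compilation_request(
    "example-neo-job",
    "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    "s3://example-bucket/model/model.tar.gz",
    "s3://example-bucket/compiled/",
    {"input_1": [1, 224, 224, 3]},
)
print(request["OutputConfig"]["TargetPlatform"])
```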

Question 6

A data science team uses SageMaker Pipelines for automated training. They need to conditionally register a model only if evaluation metrics exceed a threshold. Which pipeline step type should they use after the evaluation step?

Accepted Answer

Condition step. The Condition step evaluates a condition and branches the pipeline; if the condition is met, the pipeline proceeds to register the model.

Answer

Processing step

Answer

Transform step

Answer

RegisterModel step
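
The branching behaviour described in the accepted answer to Question 6 can be sketched without the SageMaker SDK. The function below is a hypothetical stand-in, not the SDK's `ConditionStep`: it returns the steps that would run next given an evaluation metric and a threshold. In the SDK, the equivalent would be a `ConditionStep` with a condition such as `ConditionGreaterThan`, an `if_steps` list, and an optional `else_steps` list.

```python
def condition_step(left, right, if_steps, else_steps=()):
    """Choose a branch the way a greater-than condition would."""
    return list(if_steps) if left > right else list(else_steps)

# Hypothetical evaluation output and threshold.
evaluation = {"accuracy": 0.91}
threshold = 0.85

passed = condition_step(evaluation["accuracy"], threshold, ["RegisterModel"])
failed = condition_step(0.80, threshold, ["RegisterModel"])
print(passed, failed)
```

With no `else_steps`, a failed condition simply ends that branch of the pipeline without registering the model.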

Question 7

A company wants to serve a large ensemble of models using NVIDIA Triton Inference Server on SageMaker for high throughput GPU inference. Which SageMaker inference option supports this?

Accepted Answer

Real-time endpoint with a custom container running Triton. SageMaker supports Triton Inference Server through a custom real-time endpoint container, as Triton is optimized for GPU serving on NVIDIA hardware.

Answer

Asynchronous Inference

Answer

Multi-model endpoint

Answer

Serverless Inference

Question 8

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

Accepted Answer

Increase the instance count for the Processing step. Setting instance_count > 1 runs the Processing job on multiple instances. The work is divided only if the input is sharded across instances (for example, by setting the input's S3 data distribution type to ShardedByS3Key instead of the default FullyReplicated) or if the script uses a distributed framework such as PySpark; otherwise every instance receives and processes the full dataset. For embarrassingly parallel workloads such as per-file feature engineering, spreading the shards across instances can substantially reduce wall-clock time.

Answer

Enable pipeline caching for the Processing step

Answer

Use a larger instance type with more vCPUs

Answer

Use a Tuning step instead
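
For the accepted answer to Question 8, adding instances helps only when each one works on a different part of the input. The sketch below illustrates the idea of sharding input objects across instances, which in SageMaker Processing corresponds to setting `s3_data_distribution_type="ShardedByS3Key"` on a `ProcessingInput` (the default, `FullyReplicated`, copies every object to every instance). The round-robin assignment is an assumption made for illustration; the actual assignment SageMaker uses may differ. The object keys are made up.

```python
def shard_keys(keys, instance_count):
    """Assign object keys to instances round-robin (illustrative only)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    shards = [[] for _ in range(instance_count)]
    for index, key in enumerate(sorted(keys)):
        shards[index % instance_count].append(key)
    return shards

# Ten hypothetical input files spread over four instances.
keys = [f"features/part-{i:04d}.csv" for i in range(10)]
shards = shard_keys(keys, 4)
for instance, shard in enumerate(shards):
    print(instance, len(shard))
```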

Question 9

A company uses SageMaker Model Registry to manage model versions. They want to automate the approval of models that pass automated evaluation, but require manual approval for others. Which Model Registry feature supports this workflow?

Accepted Answer

Approval workflow via pipeline Condition step. Model Registry supports approval statuses (PendingManualApproval, Approved, Rejected). Automated evaluation can set status to Approved, while borderline cases can be set to PendingManualApproval.

Answer

Cross-account deployment

Answer

Model versioning

Answer

Model lineage
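
A minimal sketch of the approval logic in the accepted answer to Question 9: models that clear an automated threshold are marked `Approved`, and the rest are left as `PendingManualApproval` for a reviewer. The function builds the kwargs that would be passed to boto3's `update_model_package`; the ARN, metric, and threshold are placeholders.

```python
def approval_request(model_package_arn, accuracy, auto_approve_threshold):
    """Build kwargs that set the approval status from an evaluation metric."""
    if accuracy >= auto_approve_threshold:
        status = "Approved"
    else:
        status = "PendingManualApproval"  # a human decides later
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
    }

# Placeholder ARN for illustration only.
arn = "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/1"
auto = approval_request(arn, accuracy=0.93, auto_approve_threshold=0.90)
manual = approval_request(arn, accuracy=0.88, auto_approve_threshold=0.90)
print(auto["ModelApprovalStatus"], manual["ModelApprovalStatus"])
```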

Question 10

A company needs to deploy a model that processes large payloads (up to 1 GB) asynchronously. The results should be written to S3, and the team needs SNS notifications upon completion. Which SageMaker inference option is MOST suitable?

Accepted Answer

Asynchronous Inference. Asynchronous Inference is designed for large payloads, processes requests asynchronously, and supports SNS notifications on completion.

Answer

Batch Transform

Answer

Real-time endpoint

Answer

Serverless Inference
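
To make the accepted answer to Question 10 concrete, the sketch below builds an endpoint configuration with an `AsyncInferenceConfig` block (S3 output path plus SNS topics for success and error) and the kwargs for `invoke_endpoint_async`, which takes the S3 location of the input rather than the payload itself. No AWS calls are made; all names, ARNs, S3 paths, and the instance type are placeholders.

```python
def async_endpoint_config(config_name, model_name, output_s3,
                          success_topic_arn, error_topic_arn):
    """Build kwargs for create_endpoint_config with asynchronous inference."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            "OutputConfig": {
                "S3OutputPath": output_s3,  # results are written here
                "NotificationConfig": {
                    "SuccessTopic": success_topic_arn,
                    "ErrorTopic": error_topic_arn,
                },
            },
        },
    }

def async_invoke_request(endpoint_name, input_s3_uri):
    """Build kwargs for invoke_endpoint_async: the payload is read from S3."""
    return {"EndpointName": endpoint_name, "InputLocation": input_s3_uri}

config = async_endpoint_config(
    "example-async-config",
    "example-model",
    "s3://example-bucket/async-output/",
    "arn:aws:sns:us-east-1:111122223333:inference-success",
    "arn:aws:sns:us-east-1:111122223333:inference-error",
)
invoke = async_invoke_request("example-async-endpoint",
                              "s3://example-bucket/async-input/request-001.json")
print(config["AsyncInferenceConfig"]["OutputConfig"]["S3OutputPath"])
```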

Question 11

An ML team uses AWS Step Functions to orchestrate a retraining pipeline triggered by EventBridge when new training data arrives. The pipeline includes a SageMaker training job and a model evaluation. If evaluation fails, the team wants to send an alert. How should they implement this?

Accepted Answer

Add a Catch rule in the Step Functions state machine to invoke a Lambda alert function. Step Functions supports error handling via Catch rules; a Catch on the training or evaluation task can transition to a Lambda function that sends an alert.

Answer

Use SQS dead-letter queue for failed training jobs

Answer

Configure SageMaker training job to publish to SNS on failure

Answer

Use EventBridge to monitor the training job status
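
The error handling in the accepted answer to Question 11 can be sketched as an Amazon States Language definition built in Python. Both task states carry a `Catch` that routes any error to an alert state. Several details are assumptions: evaluation is modelled as a Lambda function named `evaluate-model`, the alert function name is a placeholder, and the training job parameters are omitted, so the definition would need them filled in before it could run.

```python
import json

def retraining_state_machine(alert_function_name):
    """Build a state machine definition that alerts on any task failure."""
    catch = [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",  # keep the error details for the alert
        "Next": "SendAlert",
    }]
    return {
        "Comment": "Retrain, evaluate, and alert on failure (sketch)",
        "StartAt": "TrainModel",
        "States": {
            "TrainModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {},  # training job settings omitted in this sketch
                "Catch": catch,
                "Next": "EvaluateModel",
            },
            "EvaluateModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": "evaluate-model", "Payload.$": "$"},
                "Catch": catch,
                "End": True,
            },
            "SendAlert": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": alert_function_name, "Payload.$": "$"},
                "Next": "Failed",
            },
            "Failed": {"Type": "Fail", "Error": "RetrainingFailed"},
        },
    }

definition = retraining_state_machine("send-retraining-alert")
print(json.dumps(definition, indent=2))
```

Ending in a `Fail` state after the alert keeps the execution marked as failed, so it still shows up as such in execution history.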

Question 12

A startup wants to deploy a containerized ML application that includes both a model inference server and a preprocessing component in the same endpoint. Which SageMaker endpoint type supports running multiple containers?

Accepted Answer

Multi-container endpoint. Multi-container endpoints allow running multiple containers, enabling preprocessing and inference in the same endpoint.

Answer

Asynchronous Inference

Answer

Multi-model endpoint

Answer

Real-time endpoint
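
For the accepted answer to Question 12, the sketch below builds a `create_model` request with two containers behind one endpoint. It assumes the preprocessing container should feed the inference container, so the execution mode is `Serial`; `Direct` mode would instead let callers invoke each container individually. The image URIs, role ARN, and names are placeholders.

```python
def multi_container_model_request(model_name, role_arn,
                                  preprocess_image, inference_image,
                                  model_data_url):
    """Build kwargs for create_model with a two-container serial pipeline."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"ContainerHostname": "preprocess", "Image": preprocess_image},
            {
                "ContainerHostname": "inference",
                "Image": inference_image,
                "ModelDataUrl": model_data_url,
            },
        ],
        # Serial: each request passes through the containers in order.
        "InferenceExecutionConfig": {"Mode": "Serial"},
    }

# Placeholder values for illustration only.
request = multi_container_model_request(
    "example-pipeline-model",
    "arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    "111122223333.dkr.ecr.us-east-1.amazonaws.com/example-preprocess:latest",
    "111122223333.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    "s3://example-bucket/model/model.tar.gz",
)
print([c["ContainerHostname"] for c in request["Containers"]])
```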

Question 13

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

Accepted Answer

Configure two production variants on the endpoint, each with an initial weight. SageMaker endpoints support multiple production variants, and each variant's weight determines the proportion of traffic routed to it. By setting initial weights (e.g., 50/50 or 90/10), you can split traffic between two model variants for A/B testing without deploying separate endpoints.

Answer

Use SageMaker Model Registry to approve both variants

Answer

Deploy each variant to a separate endpoint and use Route53 weighted routing

Answer

Enable shadow testing on the endpoint

Question 14

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

Accepted Answer

SageMaker (training jobs or pipelines). Amazon SageMaker provides the training jobs and pipelines that execute the ML retraining workflow. Amazon EventBridge acts as the event bus that triggers retraining based on events such as new data arrival or model drift detection. AWS Lambda serves as the lightweight compute layer that can preprocess events, invoke SageMaker APIs, or orchestrate conditional logic before starting a training job.

Answer

AWS Glue

Answer

Amazon CloudWatch Logs
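
The event-driven flow in the accepted answer to Question 14 can be sketched in two parts: an EventBridge event pattern that matches new objects under a training-data prefix, and the logic a Lambda function could use to turn that event into a `start_pipeline_execution` request. No AWS calls are made. The bucket, prefix, pipeline name, and the `InputDataUri` pipeline parameter are hypothetical.

```python
def s3_object_created_pattern(bucket, prefix):
    """EventBridge pattern for S3 'Object Created' events under a prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }

def start_pipeline_request(event, pipeline_name):
    """Build kwargs for start_pipeline_execution from the S3 event detail."""
    detail = event["detail"]
    data_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    return {
        "PipelineName": pipeline_name,
        "PipelineParameters": [
            {"Name": "InputDataUri", "Value": data_uri},  # hypothetical parameter
        ],
    }

pattern = s3_object_created_pattern("example-bucket", "train/")

# A trimmed example of the event a Lambda function would receive.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "train/2024-06-01.csv"},
    },
}
request = start_pipeline_request(sample_event, "example-retraining-pipeline")
print(request["PipelineParameters"][0]["Value"])
```

For this pattern to match, the bucket would need EventBridge notifications enabled.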

Question 15

A team uses SageMaker Pipelines to train and evaluate a model. They want to run the training step only if the data quality check passes, otherwise skip. Which TWO pipeline step types are required? (Select TWO.)

Accepted Answer

Condition step. The Condition step is required because SageMaker Pipelines uses it to evaluate a boolean expression, such as whether a data quality check passed, and then conditionally execute subsequent steps. The Training step is required because it is the step that actually runs the model training job; it must be placed in the "if" branch of the Condition step so that it runs only when the condition is true.

Answer

RegisterModel step

Answer

Processing step

Answer

Transform step