MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 4 — 15 Questions

Question 1

A data science team needs to deploy a PyTorch model that performs real-time inference with sub-100ms latency. The model requires GPU acceleration, but the team wants to minimize cost by sharing GPU instances across multiple models. Which SageMaker hosting option should they choose?

Accepted Answer

SageMaker real-time endpoint with Multi-Model Endpoint (MME) on an ml.g4dn instance. A Multi-Model Endpoint lets multiple PyTorch models share a single GPU instance (e.g., ml.g4dn), which reduces cost compared with dedicating an instance to each model. MME loads models into GPU memory on demand and unloads them when memory is needed, so frequently invoked models stay resident and can meet the sub-100ms target; the first request to a model that is not yet loaded incurs extra load latency.
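As a rough illustration of the MME setup described above, the sketch below builds the CreateModel arguments and routes a request to one model. The names, URIs, and JSON content type are hypothetical placeholders, and the clients are assumed to be boto3 "sagemaker" and "sagemaker-runtime" clients passed in by the caller.

```python
def build_mme_model_request(model_name, image_uri, role_arn, model_prefix):
    """Build CreateModel arguments for a multi-model endpoint.

    All values are hypothetical placeholders supplied by the caller.
    """
    if not model_prefix.endswith("/"):
        model_prefix += "/"  # MME points at an S3 prefix holding many model archives
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_prefix,
            "Mode": "MultiModel",  # marks the container as multi-model
        },
    }


def invoke_target_model(runtime_client, endpoint_name, target_model, payload):
    """Send one request to a specific model behind the shared endpoint."""
    return runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,  # archive path relative to the S3 prefix
        ContentType="application/json",  # assumed payload format
        Body=payload,
    )
```

The request dictionary would be passed as `sagemaker_client.create_model(**request)`; each invocation then names the model archive it wants through `TargetModel`.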

Answer

SageMaker real-time endpoint with a single model per ml.g4dn instance

Answer

SageMaker Serverless Inference

Answer

SageMaker Asynchronous Inference

Question 2

A machine learning engineer wants to automatically trigger a retraining pipeline whenever new training data arrives in an S3 bucket. The pipeline uses SageMaker Pipelines. Which AWS service should be used to detect the S3 event and start the pipeline?

Accepted Answer

Amazon EventBridge. With EventBridge notifications enabled on the bucket, an EventBridge rule can match S3 "Object Created" events and then start the SageMaker Pipeline execution, either by targeting the pipeline directly or by invoking a Lambda function that calls StartPipelineExecution. Step Functions could orchestrate a workflow but does not detect S3 events. SageMaker Pipelines does not natively listen to S3 events. CloudWatch Logs stores log data and is not an event trigger; it should not be confused with CloudWatch Events, the older name for EventBridge.
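A minimal sketch of the EventBridge-to-pipeline trigger, assuming the bucket has EventBridge notifications enabled. The pipeline name `retraining-pipeline` and the parameter `InputDataUri` are hypothetical; the optional `sagemaker_client` argument exists only so the handler can be exercised without AWS access.

```python
def s3_object_created_pattern(bucket, prefix):
    """EventBridge event pattern matching new objects under a prefix."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }


def handler(event, context, sagemaker_client=None):
    """Lambda entry point: start the pipeline for the object that arrived."""
    if sagemaker_client is None:
        import boto3  # available in the Lambda runtime
        sagemaker_client = boto3.client("sagemaker")
    detail = event["detail"]
    s3_uri = f"s3://{detail['bucket']['name']}/{detail['object']['key']}"
    response = sagemaker_client.start_pipeline_execution(
        PipelineName="retraining-pipeline",  # hypothetical pipeline name
        PipelineParameters=[{"Name": "InputDataUri", "Value": s3_uri}],
    )
    return response["PipelineExecutionArn"]
```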

Answer

SageMaker Pipelines native S3 trigger

Answer

AWS Step Functions

Answer

Amazon CloudWatch Logs

Question 3

A financial services company needs to deploy a machine learning model for real-time fraud detection. The model must be highly available across multiple Availability Zones and must support automatic scaling based on request volume. The company also needs to perform canary deployments to test new model versions with a small percentage of traffic before full rollout. Which SageMaker feature should they use?

Accepted Answer

SageMaker real-time endpoint with production variants. SageMaker real-time endpoints with production variants enable canary deployments by routing a small percentage of traffic to a new model version while the majority goes to the current version. This feature also supports multi-AZ deployment for high availability and automatic scaling based on request volume via Application Auto Scaling, meeting all the stated requirements.
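The configuration described above could look roughly like the following. This sketch builds an endpoint config with a current and a candidate variant, plus a target-tracking scaling policy for Application Auto Scaling. Variant names, instance type, counts, and the policy name are illustrative assumptions.

```python
def canary_endpoint_config(config_name, current_model, candidate_model,
                           canary_fraction=0.1, instance_type="ml.m5.xlarge",
                           instance_count=2):
    """CreateEndpointConfig arguments sending a small traffic share to the candidate."""
    def variant(name, model, weight):
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            # Two or more instances let SageMaker spread them across AZs.
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": weight,
        }
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            variant("current", current_model, 1.0 - canary_fraction),
            variant("candidate", candidate_model, canary_fraction),
        ],
    }


def target_tracking_policy(endpoint_name, variant_name, invocations_per_instance):
    """PutScalingPolicy arguments that scale a variant on request volume."""
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-invocations",  # hypothetical
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
```

The first dictionary is meant for the SageMaker client's `create_endpoint_config`, the second for the Application Auto Scaling client's `put_scaling_policy` after the variant has been registered as a scalable target.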

Answer

SageMaker Multi-Model Endpoint

Answer

SageMaker Batch Transform

Answer

SageMaker Serverless Inference

Question 4

A team wants to use MLflow on SageMaker to track experiments and manage model lifecycle. They need to register models in the SageMaker Model Registry after training. Which approach allows them to use MLflow for experiment tracking and then register the best model to SageMaker Model Registry?

Accepted Answer

Use MLflow tracking server on SageMaker, then use the SageMaker MLflow plugin to register the model in SageMaker Model Registry. The SageMaker MLflow plugin (sagemaker-mlflow) lets the team use an MLflow tracking server hosted on SageMaker for experiment tracking and then register the best model into the SageMaker Model Registry through the plugin's integration. This avoids manual export steps and keeps model lifecycle management in SageMaker's native registry, which is the team's stated goal.

Answer

Use MLflow to track experiments and register models in MLflow's native registry, then export to SageMaker

Answer

Use SageMaker Experiments for tracking, then manually register the model using SageMaker console

Answer

MLflow cannot be used with SageMaker; use SageMaker Experiments instead

Question 5

A company wants to deploy a trained XGBoost model for batch inference on a large dataset stored in S3. The inference job should be cost-effective and does not require real-time responses. Which SageMaker inference option should they use?

Accepted Answer

SageMaker Batch Transform. Batch Transform is designed for offline inference on large datasets in S3: it provisions instances for the job, processes the input in chunks, writes results to S3, and releases the instances when the job finishes, so the company pays only for the job's duration. Real-time endpoints keep instances running for low-latency requests. Serverless Inference targets intermittent on-demand requests, not bulk datasets. Asynchronous Inference queues individual requests with S3 input and output for near-real-time use and is not intended for processing an entire dataset in one job.
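For illustration, a Batch Transform job over CSV data might be requested as sketched below. The job name, model name, S3 locations, and instance settings are assumptions, and `sm_client` is assumed to be a boto3 "sagemaker" client.

```python
def batch_transform_request(job_name, model_name, input_prefix, output_prefix,
                            instance_type="ml.m5.xlarge", instance_count=1):
    """CreateTransformJob arguments for line-delimited CSV input in S3."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "BatchStrategy": "MultiRecord",  # pack several records into each request
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_prefix}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",  # each line is one record
        },
        "TransformOutput": {"S3OutputPath": output_prefix, "AssembleWith": "Line"},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def start_batch_transform(sm_client, request):
    """Submit the job; instances exist only while the job runs."""
    return sm_client.create_transform_job(**request)["TransformJobArn"]
```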

Answer

SageMaker real-time endpoint

Answer

SageMaker Asynchronous Inference

Answer

SageMaker Serverless Inference

Question 6

A machine learning engineer needs to deploy a TensorFlow model that requires a custom inference environment with specific system libraries. The model will be used in a real-time application with variable traffic. They want to minimize cold start latency. Which SageMaker hosting option should they choose?

Accepted Answer

SageMaker real-time endpoint with a custom container. SageMaker real-time endpoints with a custom container are the correct choice because they provide persistent, always-on infrastructure that eliminates cold start latency. By packaging the TensorFlow model with required system libraries in a custom Docker image, the engineer ensures the inference environment is ready immediately, and the endpoint can scale to handle variable traffic with minimal delay.

Answer

SageMaker Serverless Inference with a custom container

Answer

SageMaker Multi-Model Endpoint with a custom container

Answer

SageMaker Asynchronous Inference with a custom container

Question 7

A company uses SageMaker Pipelines to orchestrate their ML workflow. They notice that if a pipeline step fails due to a transient error (e.g., a brief network issue), the entire pipeline fails and they must manually rerun from the beginning. They want to automatically retry failed steps a few times before failing. What should they do?

Accepted Answer

Configure a RetryPolicy in the pipeline step definition to specify the number of retry attempts and backoff. SageMaker Pipelines supports retry policies for steps. By setting a RetryPolicy in the step definition with a maximum number of retry attempts, the pipeline will automatically retry the step on failure. The other options do not achieve automatic retry within the pipeline: Step Functions would require rebuilding the pipeline, Lambda cannot retry pipeline steps, and CreatePipelineExecution does not handle retries.
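To make the retry settings concrete, the sketch below builds a retry-policy entry in the shape the SageMaker Python SDK is understood to emit for `StepRetryPolicy` (treat the exact field names as an assumption to verify against the SDK), and computes the wait schedule that the interval and backoff rate imply. The schedule function only illustrates exponential backoff; it is not SageMaker's internal implementation.

```python
def step_retry_policy(max_attempts=3, interval_seconds=30, backoff_rate=2.0):
    """Retry-policy entry for transient step-level faults (assumed JSON shape)."""
    return {
        "ExceptionType": ["Step.SERVICE_FAULT", "Step.THROTTLING"],
        "IntervalSeconds": interval_seconds,
        "BackoffRate": backoff_rate,
        "MaxAttempts": max_attempts,
    }


def retry_waits(interval_seconds, backoff_rate, max_attempts):
    """Seconds waited before each retry under exponential backoff."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]


policy = step_retry_policy()
waits = retry_waits(policy["IntervalSeconds"], policy["BackoffRate"],
                    policy["MaxAttempts"])
print(waits)
```

With the defaults shown, the waits grow from 30 to 60 to 120 seconds before the step is finally marked as failed.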

Answer

Use a Lambda function to catch step failures and re-invoke the step

Answer

Use the CreatePipelineExecution API with a flag to ignore failures

Answer

Use AWS Step Functions to orchestrate the workflow instead of SageMaker Pipelines

Question 8

A data scientist wants to compare the performance of two model versions (V1 and V2) in production by splitting traffic between them. They want to gradually increase the percentage of traffic to the new version while monitoring metrics. Which SageMaker feature enables this?

Accepted Answer

SageMaker production variants with traffic splitting. Production variants route a configurable percentage of inference requests to each model version hosted behind one endpoint. By updating the variant weights (the UpdateEndpointWeightsAndCapacities API), the data scientist can gradually shift traffic from V1 to V2 while comparing per-variant metrics. Shadow testing mirrors traffic to the new version but its responses are not returned to callers. Blue/green and canary are deployment strategies for rolling out an endpoint update, not mechanisms for running two versions side by side under a controlled split; that side-by-side comparison is what production variants provide.
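A gradual shift between two variants can be sketched as a sequence of weight updates. The variant names `V1` and `V2` and the percentage steps are illustrative, and `sm_client` is assumed to be a boto3 "sagemaker" client.

```python
def shift_plan(v2_percentages):
    """Yield variant weights for each stage of a gradual shift to V2."""
    for pct in v2_percentages:
        if not 0 <= pct <= 100:
            raise ValueError("percentage must be between 0 and 100")
        yield [
            {"VariantName": "V1", "DesiredWeight": float(100 - pct)},
            {"VariantName": "V2", "DesiredWeight": float(pct)},
        ]


def apply_stage(sm_client, endpoint_name, weights):
    """Apply one stage; metrics should be checked before the next one."""
    sm_client.update_endpoint_weights_and_capacities(
        EndpointName=endpoint_name,
        DesiredWeightsAndCapacities=weights,
    )
```

A caller would iterate over something like `shift_plan([10, 25, 50, 100])`, pausing between stages to review per-variant metrics before continuing.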

Answer

SageMaker shadow testing

Answer

SageMaker blue/green deployment

Answer

SageMaker canary deployment

Question 9

A company is deploying a large NLP model on SageMaker for real-time inference. They want to reduce inference latency and cost by optimizing the model for the target hardware. The model is trained in PyTorch. Which SageMaker feature should they use to compile the model for best performance on the chosen instance?

Accepted Answer

SageMaker Neo. SageMaker Neo is the correct choice because it is specifically designed to compile trained models (including PyTorch models) into an optimized binary for a target hardware instance, reducing inference latency and improving throughput. Neo applies hardware-specific optimizations such as operator fusion, memory layout tuning, and quantization, which directly address the need for best performance on the chosen SageMaker instance.
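As a hedged example, a Neo compilation job for a PyTorch model might be requested as follows. The input name `input0`, the input shape, the default target device, and all names and URIs are assumptions chosen for illustration; `sm_client` is assumed to be a boto3 "sagemaker" client.

```python
import json


def neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                            input_shape, framework_version,
                            target_device="ml_c5"):
    """CreateCompilationJob arguments for a PyTorch model archive in S3."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # Neo needs the input tensor shape, passed as a JSON string.
            "DataInputConfig": json.dumps({"input0": list(input_shape)}),
            "Framework": "PYTORCH",
            "FrameworkVersion": framework_version,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # hardware family to compile for
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }


def start_compilation(sm_client, request):
    """Submit the compilation job and return its ARN."""
    return sm_client.create_compilation_job(**request)["CompilationJobArn"]
```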

Answer

AWS Step Functions

Answer

Amazon Elastic Inference

Answer

SageMaker Triton Inference Server

Question 10

A team needs to deploy a PyTorch model that uses custom CUDA kernels. They want to use NVIDIA Triton Inference Server on SageMaker for high-performance serving. Which SageMaker configuration is required to use Triton?

Accepted Answer

Use the SageMaker pre-built Triton Inference Server container available in Amazon ECR. SageMaker provides a pre-built Triton Inference Server container in Amazon ECR that is optimized for high-performance model serving, including models with custom CUDA kernels. Using it removes the need to build a custom image from scratch, ensures compatibility with SageMaker's deployment infrastructure, and reduces operational overhead.

Answer

Create a custom container from scratch with Triton and deploy on SageMaker

Answer

Use a Multi-Model Endpoint with Triton

Answer

Attach an Amazon Elastic Inference accelerator to the endpoint

Question 11

A financial services company needs to enforce that only approved model versions are deployed to production. They use SageMaker Model Registry to track versions, with an approval workflow. Which action must they take in the model registry to ensure only approved models can be deployed?

Accepted Answer

Set the model version status to 'Approved' in the Model Registry. Each model version in the SageMaker Model Registry carries an approval status (PendingManualApproval, Approved, or Rejected). Setting a version's status to 'Approved' is the registry action that marks it as deployable. The status does not by itself block deployment API calls, so the deployment automation (for example, a CI/CD pipeline triggered by the status change) should be built to deploy only versions whose status is 'Approved', leaving pending or rejected versions out of production.
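One way to wire approval into deployment is sketched below: one function approves a version, and another looks up the newest approved version so that a deployment script never picks an unapproved one. The group name and ARN are placeholders, and `sm_client` is assumed to be a boto3 "sagemaker" client.

```python
def approve_version(sm_client, model_package_arn, note):
    """Mark one model version as approved in the Model Registry."""
    sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )


def latest_approved_arn(sm_client, group_name):
    """Return the newest approved version's ARN, or None if there is none."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="Approved",  # server-side filter on status
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    return summaries[0]["ModelPackageArn"] if summaries else None
```

A deployment job that calls `latest_approved_arn` and stops when it returns `None` keeps pending and rejected versions out of production.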

Answer

Tag the model version as 'production-ready'

Answer

Manually move the model artifact to a production S3 bucket

Answer

Use AWS IAM policies to restrict deployment to specific model ARNs

Question 12

A company wants to deploy 50 small models (each ~100 MB) for real-time inference. They need to minimize hosting costs while maintaining low latency. Which SageMaker hosting option is most cost-effective?

Accepted Answer

SageMaker Multi-Model Endpoint (MME). Multi-Model Endpoint (MME) allows hosting multiple models on the same instance, sharing resources. Since the models are small, MME is cost-effective. Real-time endpoints would require separate instances. Serverless is for on-demand but may incur cold starts. Asynchronous is for batch-like workloads.

Answer

SageMaker Serverless Inference

Answer

SageMaker Asynchronous Inference

Answer

SageMaker real-time endpoint with one instance per model

Question 13

A data science team uses SageMaker Pipelines to automate their ML workflow. They want to reduce costs by reusing outputs from previous pipeline runs when the input data and code have not changed. Which TWO actions should they take? (Choose two.)

Accepted Answer

Set the StepStatus of successful steps to 'Cached'. The two accepted options are this one and enabling step caching in the pipeline definition (Option E). Enabling caching is the action the team actually takes: for each cache-enabled step, SageMaker Pipelines compares the step's inputs, code, and parameters with previous successful runs and, on a match, skips re-execution and reuses the earlier output, which reduces compute cost. The 'Cached' status is the result of that mechanism rather than something set by hand, so the first option should be read as describing steps that end up served from the cache.
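Enabling caching for a step can be pictured as adding a cache configuration to that step's definition. The sketch below assumes the pipeline-definition JSON shape (`CacheConfig` with `Enabled` and `ExpireAfter`) that the SageMaker Python SDK's `CacheConfig(enable_caching=True, expire_after=...)` is understood to produce; the step dictionary itself is a simplified, hypothetical stand-in.

```python
import copy


def with_step_caching(step_definition, expire_after="P30D"):
    """Return a copy of a step definition with caching enabled.

    expire_after is an ISO 8601 duration; P30D means cached results
    older than 30 days are not reused.
    """
    step = copy.deepcopy(step_definition)  # leave the caller's dict untouched
    step["CacheConfig"] = {"Enabled": True, "ExpireAfter": expire_after}
    return step


# Simplified, hypothetical training step for illustration only.
train_step = {"Name": "TrainModel", "Type": "Training", "Arguments": {}}
cached_train_step = with_step_caching(train_step)
```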

Answer

Use parallel execution of pipeline steps

Answer

Create multiple pipeline versions for each run

Answer

Disable caching for all steps to avoid unnecessary storage costs

Question 14

An ML engineer needs to deploy a model that requires GPU acceleration but wants to reduce inference cost by optimizing the model. They are considering SageMaker Neo compilation and Amazon Elastic Inference. Which TWO statements are correct about these services? (Choose two.)

Accepted Answer

Amazon Elastic Inference attaches a dedicated GPU accelerator to a CPU instance, reducing cost compared to a full GPU instance. Amazon Elastic Inference attaches a fractional GPU-powered accelerator to a CPU instance, providing GPU acceleration at a lower cost than a full GPU instance. The engineer pays only for the amount of acceleration needed rather than for a dedicated GPU instance.

Answer

SageMaker Neo and Amazon Elastic Inference cannot be used together

Answer

SageMaker Neo provides a GPU acceleration service similar to Elastic Inference

Answer

Amazon Elastic Inference compiles the model to run on GPU hardware

Question 15

A company is using AWS Step Functions to orchestrate their ML retraining pipeline. They want to trigger retraining when new data arrives, but only if the model's performance has degraded below a threshold. Which THREE AWS services should they use together to achieve this? (Choose three.)

Accepted Answer

AWS Step Functions. The three services that work together are Amazon EventBridge, AWS Lambda, and AWS Step Functions: EventBridge detects S3 events (new data) and invokes a Lambda function, the function checks model performance (e.g., via SageMaker Model Monitor or custom metrics), and it starts the Step Functions workflow only if degradation is detected. Of the other services: SageMaker Pipelines could replace Step Functions but is not listed as an option; SageMaker Model Monitor can track performance but is not an event source; CloudWatch Logs is not directly involved in the trigger logic.
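The gating logic inside the Lambda function could be sketched as below. How the current metric is obtained is left to the caller, since the source does not specify it; the threshold, the state machine ARN, and the input field `input_data_uri` are hypothetical. `sfn_client` is assumed to be a boto3 "stepfunctions" client.

```python
import json


def should_retrain(current_metric, threshold):
    """Retrain only when performance has dropped below the threshold."""
    return current_metric < threshold


def maybe_start_retraining(sfn_client, state_machine_arn, s3_uri,
                           current_metric, threshold):
    """Start the Step Functions workflow if the model has degraded.

    Returns the execution ARN, or None when no retraining is needed.
    """
    if not should_retrain(current_metric, threshold):
        return None
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"input_data_uri": s3_uri}),  # hypothetical input field
    )
    return response["executionArn"]
```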

Answer

Amazon CloudWatch Logs

Answer

SageMaker Model Registry