MLA-C01 Deployment and Orchestration of ML Workflows Practice Test 8 — 15 Questions

Question 1

Your team manages a SageMaker real-time endpoint for a financial services application that requires low latency for fraud detection. The model is a 1 GB XGBoost model. The endpoint is deployed on two ml.m5.xlarge instances with target tracking auto-scaling based on average CPU utilization at 70%. During peak hours, the endpoint receives a sudden burst of traffic that increases from 500 requests per second to 2000 requests per second within 30 seconds. Many requests start failing with 503 errors. The CPU utilization metric shows that the instances are at 90% before the scaling policy launches new instances. However, by the time the new instances are added (approximately 3 minutes), the burst has subsided. You need to prevent these failures during future bursts while keeping costs reasonable. Which action would be MOST effective?

Accepted Answer

Reduce the target tracking scaling metric to 45% CPU utilization and set a warm-up time of 120 seconds.

Lowering the target to 45% CPU utilization makes the policy start scaling out earlier, before the burst pushes CPU to 90%. The 120-second warm-up gives new instances time to initialize before they are treated as ready to serve traffic, which reduces the 503 errors caused by the roughly 3-minute lag in instance availability.

Answer

Change the scaling policy to step scaling with a lower cooldown (60 seconds) and add an alarm on invocation count.

Answer

Replace the two m5.xlarge instances with one m5.2xlarge instance and keep the same scaling policy.

Answer

Implement scheduled scaling to add two instances 5 minutes before the expected peak hour.
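
The accepted answer for Question 1 can be sketched as a request for Application Auto Scaling's `put_scaling_policy` call. This is a minimal sketch under assumptions, not a definitive configuration: the endpoint and variant names are hypothetical, and because the target tracking configuration exposes `ScaleOutCooldown` and `ScaleInCooldown` rather than a field named warm-up, the 120 seconds from the answer is mapped onto `ScaleOutCooldown` here as an assumption.

```python
def cpu_target_tracking_policy(endpoint_name, variant_name,
                               target_cpu=45.0, scale_out_cooldown=120):
    """Build keyword arguments for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": f"{endpoint_name}-cpu-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            # CPU utilization is not a predefined SageMaker scaling metric,
            # so it is referenced as a customized CloudWatch metric.
            "CustomizedMetricSpecification": {
                "MetricName": "CPUUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                    {"Name": "VariantName", "Value": variant_name},
                ],
                "Statistic": "Average",
                "Unit": "Percent",
            },
            "ScaleOutCooldown": scale_out_cooldown,  # assumption: stands in for "warm-up"
            "ScaleInCooldown": 300,                  # assumption: conservative scale-in
        },
    }


policy = cpu_target_tracking_policy("fraud-endpoint", "AllTraffic")
# With AWS credentials available, the request could be sent with:
#   boto3.client("application-autoscaling").put_scaling_policy(**policy)
```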

Question 2

Your company uses SageMaker batch transform to process a large dataset (5 TB) of customer transactions every night. The batch transform job uses a single ml.c5.4xlarge instance and takes about 6 hours to complete. However, the job recently started failing with an error message: 'Timed out waiting for transformation to complete. The maximum job duration is 3600 seconds.' You check the input data and notice that one of the input files is a single large JSON file of 50 GB, while the rest are smaller files. The job is configured with a batch strategy of 'MultiRecord' and a maximum payload size of 6 MB. What is the most likely cause of the timeout and which fix should you apply?

Accepted Answer

Split the large JSON file into smaller files (e.g., 100 MB each) before feeding them to the batch transform job.

The job times out because the single 50 GB JSON file cannot be processed within the 3600-second (1-hour) limit reported in the error. With the 'MultiRecord' batch strategy and a 6 MB maximum payload size, SageMaker must cut the large file into many small batches, but it still reads that one file sequentially, which takes too long. Splitting it into smaller files (e.g., 100 MB each) lets SageMaker distribute the work across files and complete the transform within the limit.

Answer

Set the batch strategy to 'SingleRecord' so that each record is processed individually.

Answer

Increase the job timeout to 7200 seconds.

Answer

Increase the number of instances to 5 in the batch transform job.
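
The file split from the accepted answer for Question 2 could look like the sketch below. It assumes the 50 GB file is in JSON Lines format (one record per line), which the question does not state; a single large JSON array would need a streaming parser instead. The function name and the part file naming are hypothetical.

```python
from pathlib import Path


def split_jsonl(source, out_dir, max_bytes=100 * 1024 * 1024):
    """Split a JSON Lines file into parts of at most max_bytes each.

    Records are never broken across parts; a single record larger than
    max_bytes is written to a part of its own.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    part = None
    written = 0
    with open(source, "rb") as src:
        for line in src:
            # Start a new part for the first record or when the current one is full.
            if part is None or (written > 0 and written + len(line) > max_bytes):
                if part is not None:
                    part.close()
                path = out_dir / f"part-{len(parts):05d}.jsonl"
                part = open(path, "wb")
                parts.append(path)
                written = 0
            part.write(line)
            written += len(line)
    if part is not None:
        part.close()
    return parts
```

The resulting parts would then be uploaded under one S3 prefix and that prefix used as the batch transform input.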

Question 3

A startup is building a serverless inference API using AWS Lambda. They have a TensorFlow model that is 400 MB in size. They packaged the model and inference code into a Lambda function using a container image. When they test the function with a small input, it consistently times out after 3 seconds. The Lambda function has 512 MB of memory and a timeout of 30 seconds. The business requirement is that inference must complete in less than 5 seconds under normal conditions. What is the most likely cause of the slow performance, and which change should they make?

Accepted Answer

The Lambda function memory is insufficient for the model size; increase memory to 1024 MB or higher.

The most likely cause is that 512 MB of memory is not enough to hold the 400 MB TensorFlow model alongside the runtime and inference code, which leads to memory pressure or out-of-memory errors that drastically slow inference. Because Lambda allocates CPU in proportion to the configured memory, increasing memory to 1024 MB or higher provides both more memory and more CPU, allowing the model to fit and inference to complete within the required 5 seconds.

Answer

The function timeout is too low; increase the timeout to 60 seconds.

Answer

The function is experiencing a cold start; use provisioned concurrency to keep the container warm.

Answer

Use a Lambda function with a GPU container to accelerate inference.

Question 4

A company deploys a deep learning model to a real-time SageMaker endpoint. After deployment, users report high inference latency. Which action is the MOST effective first step to reduce latency?

Accepted Answer

Compile the model using SageMaker Neo to optimize for the target instance.

SageMaker Neo compiles the trained model to optimize it for the target instance hardware, reducing inference latency without requiring additional resources. This is the most effective first step because it directly addresses model execution efficiency, often yielding significant speedups for deep learning models.

Answer

Switch to a larger instance type with more GPU memory.

Answer

Enable SageMaker Model Monitor to capture inference data.

Answer

Increase the number of instances in the endpoint to handle more requests.

Question 5

A company uses SageMaker Pipelines to automate model retraining. The pipeline runs daily but sometimes fails due to data quality issues. What is the best design to handle this?

Accepted Answer

Add a data quality check step with a ConditionStep to skip training if the data fails the check.

SageMaker Pipelines supports a data quality check step whose result can feed a ConditionStep. If the check fails, the ConditionStep skips the training step entirely, so the pipeline does not fail because of bad data. This design lets the pipeline complete successfully (or exit gracefully) without wasting compute resources on training with invalid data.

Answer

Use SageMaker Debugger to monitor training.

Answer

Use SageMaker Model Registry to track model versions.

Answer

Increase the instance size for the training step.
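
The gating logic behind the accepted answer for Question 5 can be illustrated in plain Python. This is a stand-in for the decision a ConditionStep would make, not SageMaker SDK code: the quality rule (share of rows missing a required field), the 5% limit, and the step names are all assumptions chosen for illustration.

```python
def data_quality_passes(rows, required_fields, max_missing_rate=0.05):
    """Return True when the share of rows missing a required field is within the limit."""
    if not rows:
        return False  # an empty dataset is treated as a failed check
    bad = sum(
        1 for row in rows
        if any(row.get(field) is None for field in required_fields)
    )
    return bad / len(rows) <= max_missing_rate


def next_steps(rows, required_fields):
    """Mimic the two branches of a condition: train on good data, otherwise stop cleanly."""
    if data_quality_passes(rows, required_fields):
        return ["train", "evaluate", "register"]  # hypothetical step names
    return ["notify_and_stop"]                    # hypothetical step name
```

In an actual pipeline the first branch would correspond to the steps run when the condition holds and the second to the steps run otherwise, so a bad batch ends the run without a training failure.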

Question 6

A machine learning engineer is deploying a model using SageMaker and needs to ensure that the endpoint can automatically scale based on traffic patterns. Which TWO actions should the engineer take? (Choose two.)

Accepted Answer

Define a scaling policy using Application Auto Scaling for the SageMaker endpoint variant.

SageMaker endpoints use Application Auto Scaling to adjust the number of instances automatically based on traffic. You define a scaling policy (e.g., target tracking or step scaling) that references a CloudWatch metric. The second correct choice relies on the InvocationsPerInstance metric, a standard SageMaker endpoint metric that reflects the load per instance; a CloudWatch alarm on this metric triggers the scaling policy to add or remove instances as traffic changes.

Answer

Enable SageMaker Model Monitor to detect data drift.

Answer

Configure a multi-model endpoint to serve multiple models.

Answer

Use SageMaker batch transform to handle variable traffic.
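
The two actions described for Question 6 map onto two Application Auto Scaling requests: registering the endpoint variant as a scalable target and attaching a target tracking policy on invocations per instance. The sketch below only builds the request dictionaries; the endpoint name, capacity bounds, and target value are assumptions.

```python
def endpoint_scaling_requests(endpoint_name, variant_name,
                              min_instances=1, max_instances=4,
                              invocations_per_instance=1000.0):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",  # hypothetical
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return register, policy


register, policy = endpoint_scaling_requests("churn-endpoint", "AllTraffic")
# With AWS credentials available:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**register)
#   client.put_scaling_policy(**policy)
```

With a target tracking policy, Application Auto Scaling creates and manages the underlying CloudWatch alarms itself, so no separate alarm has to be defined by hand.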

Question 7

A company is building a CI/CD pipeline for ML models using AWS CodePipeline and SageMaker. The pipeline should include steps to automatically retrain, evaluate, and deploy models. Which THREE components are essential for this pipeline? (Choose three.)

Accepted Answer

SageMaker Pipelines to orchestrate training and evaluation steps.

SageMaker Pipelines is essential because it provides a native orchestration service to define, automate, and manage the end-to-end ML workflow, including training, evaluation, and conditional deployment steps. It integrates directly with other SageMaker components and CodePipeline, enabling a seamless CI/CD pipeline without requiring custom orchestration logic.

Answer

Amazon CloudWatch to log API calls.

Answer

AWS Lambda function to trigger evaluation.

Question 8

A company has deployed a SageMaker real-time endpoint for a model that predicts customer churn. The endpoint uses a single ml.m5.large instance. After deployment, the team notices that during peak hours, the endpoint returns 5xx errors for about 20% of requests. The endpoint has not been configured with any scaling policy. The team needs to resolve this issue with minimal cost increase. Which solution should the team implement?

Accepted Answer

Enable Auto Scaling for the endpoint with a target tracking policy based on the average InvocationsPerInstance metric.

A target tracking policy on the average InvocationsPerInstance metric adjusts the number of instances dynamically in response to traffic spikes, preventing 5xx errors during peak hours without over-provisioning. This minimizes cost by scaling only when needed, unlike a permanent instance upgrade, which raises baseline cost, or batch transform, which introduces latency.

Answer

Deploy the model to a multi-model endpoint to reduce resource utilization.

Answer

Increase the instance type to ml.m5.xlarge to handle more concurrent requests.

Answer

Use SageMaker batch transform instead of real-time inference to process peak traffic asynchronously.

Question 9

A data science team uses SageMaker notebooks to develop models. They want to automate the process of training and registering models whenever new data arrives in an S3 bucket. The team has limited DevOps experience and needs a solution that requires minimal maintenance. Which approach should the team use?

Accepted Answer

Configure an S3 event notification to trigger an AWS Step Functions state machine that runs a SageMaker Pipeline.

S3 event notifications, delivered through Amazon EventBridge, can start an AWS Step Functions state machine, which runs a SageMaker Pipeline to automate model training and registration when new data arrives. This serverless approach requires minimal maintenance and suits the team's limited DevOps experience, because Step Functions handles retries, error handling, and workflow coordination without custom infrastructure.

Answer

Use AWS Glue to detect new data and trigger a SageMaker training job via a Lambda function.

Answer

Write a Python script that runs on a scheduled EC2 instance to check S3 for new data and trigger training.

Answer

Use Amazon EventBridge to schedule a SageMaker training job every hour, regardless of whether new data exists.
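
For Question 9, the trigger can be sketched as an EventBridge event pattern for S3 "Object Created" events, which a rule would route to the Step Functions state machine. The bucket name and key prefix are hypothetical, the bucket is assumed to have EventBridge notifications enabled, and the `matches` helper is only a simplified local approximation of EventBridge matching for checking the pattern.

```python
def s3_object_created_pattern(bucket, prefix=""):
    """Build an EventBridge event pattern for new objects in one bucket."""
    detail = {"bucket": {"name": [bucket]}}
    if prefix:
        detail["object"] = {"key": [{"prefix": prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


def matches(pattern, event):
    """Simplified matcher: exact values and {"prefix": ...} filters only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching part of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            allowed = any(
                isinstance(actual, str) and actual.startswith(item["prefix"])
                if isinstance(item, dict) else actual == item
                for item in expected
            )
            if not allowed:
                return False
    return True
```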

Question 10

A company has trained a custom model using PyTorch on Amazon SageMaker. The model achieves high accuracy, but the inference latency on a real-time endpoint is above the required 100ms SLA. The model is a large neural network with many layers. The company wants to reduce latency without significantly impacting accuracy. Which approach should the machine learning engineer take?

Accepted Answer

Use SageMaker Neo to compile the model for the target hardware.

SageMaker Neo compiles trained models into an optimized binary for the target hardware (e.g., CPU, GPU, or Inferentia). It applies graph-level optimizations such as operator fusion to reduce inference latency while preserving model accuracy. This directly addresses the need to bring latency below 100 ms without retraining or sacrificing significant accuracy.

Answer

Reduce the batch size used during inference.

Answer

Increase the instance size of the endpoint.

Answer

Implement a cache for frequent inference requests.
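
The Neo compilation from the accepted answer for Question 10 is started with SageMaker's `create_compilation_job` API. The sketch below builds such a request for a PyTorch model; the job name, role ARN, S3 locations, input shape, framework version, and target device are placeholders that would have to match the real model and endpoint instance family.

```python
import json


def neo_compilation_request(job_name, role_arn, model_s3_uri, output_s3_uri,
                            input_shape, framework_version, target_device="ml_c5"):
    """Build keyword arguments for the SageMaker create_compilation_job call."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            # For PyTorch, inputs are described by name and shape as a JSON string.
            "DataInputConfig": json.dumps({"input0": list(input_shape)}),
            "Framework": "PYTORCH",
            "FrameworkVersion": framework_version,
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": target_device,  # should match the endpoint instance family
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},  # assumption
    }

# With AWS credentials available, the request could be sent with:
#   boto3.client("sagemaker").create_compilation_job(**request)
```

The compiled artifact written to the output location is then deployed in place of the original model, so the latency comparison against the 100 ms SLA should be repeated on the compiled version.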

Question 11

A machine learning team is deploying a fraud detection model using SageMaker. They use the SageMaker Model Registry to track model versions. They want to automatically deploy the latest approved model to a production endpoint whenever a new model version is approved. The team uses a CI/CD pipeline with AWS CodePipeline. The pipeline currently includes a source stage (S3), a build stage (CodeBuild), and a deploy stage (manual approval). They want to automate the deployment of approved models. Which solution will meet these requirements with the least operational overhead?

Accepted Answer

Configure an EventBridge rule to trigger a CodePipeline execution when the model approval status changes.

This option connects SageMaker Model Registry approval events to CodePipeline through EventBridge, enabling fully automated deployment of the latest approved model to a production endpoint with minimal operational overhead. It avoids custom code and additional pipeline stages by using native AWS event-driven architecture to start the pipeline only when a model version is approved.

Answer

Add a custom action to CodePipeline that uses a SageMaker deployment step.

Answer

Create a Lambda function that triggers on Model Registry approval events and updates the endpoint using the boto3 SDK.

Answer

Use SageMaker Pipelines to deploy the model directly upon training completion.
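
The EventBridge rule from the accepted answer for Question 11 is driven by an event pattern on Model Registry state changes. The following sketch builds that pattern for one model package group; the group name and rule name are hypothetical.

```python
import json


def model_approval_event_pattern(model_package_group):
    """Event pattern matching model versions that reach the Approved status."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [model_package_group],
            "ModelApprovalStatus": ["Approved"],
        },
    }


pattern_json = json.dumps(model_approval_event_pattern("fraud-detection-models"))
# With AWS credentials available, the rule could be created with:
#   boto3.client("events").put_rule(Name="deploy-on-approval", EventPattern=pattern_json)
# and the CodePipeline pipeline added as the rule's target with put_targets.
```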

Question 12

A company uses SageMaker for training and inference. They have a model that retrains weekly. After each retraining, the model is evaluated on a held-out test set. If the evaluation metrics meet a threshold, the model is registered as 'Approved' in the SageMaker Model Registry. The team manually deploys the approved model to a production endpoint. They want to automate this deployment process to reduce manual errors. However, the deployment should only proceed if the new model passes a canary test in a staging environment. Which combination of AWS services should the team use to achieve this?

Accepted Answer

SageMaker Pipelines with a conditional deployment step that includes a canary test.

SageMaker Pipelines natively supports conditional execution steps, allowing you to add a canary test step that evaluates the new model in a staging environment before automatically promoting it to production. This directly addresses the requirement for automated deployment gated by a canary test, without needing external orchestration services.

Answer

AWS CodeDeploy with a blue/green deployment strategy.

Answer

AWS Lambda to deploy to staging, then automatically promote to production if staging tests pass.

Answer

Amazon EKS with a custom inference container and use ArgoCD for automated deployments.

Question 13

A large enterprise has multiple SageMaker endpoints serving models for different business units. Each endpoint uses a separate instance type and scaling policy. The enterprise wants to implement a unified monitoring and logging solution to track endpoint health, latency, and errors across all endpoints. They also want to set up alerts when the error rate exceeds 5% over a 5-minute period. The solution must be centralized and use AWS-native services. Which solution should the team implement?

Accepted Answer

Use Amazon CloudWatch dashboards to aggregate metrics from all endpoints, and create a composite alarm based on the Sum of 5xx error counts across endpoints.

Amazon CloudWatch receives SageMaker endpoint metrics (e.g., 5xx error counts, latency, and invocation counts) without additional configuration. A CloudWatch dashboard aggregates the metrics from all endpoints into a single view. Per-endpoint alarms that use the Sum statistic over a 5-minute period, with the error rate computed as 5xx errors divided by invocations, can be combined into one composite alarm that fires when any endpoint's error rate exceeds 5%. This approach is fully centralized, uses only AWS-native services, and requires no custom code or data streaming.

Answer

Enable SageMaker Model Monitor data capture on each endpoint and stream captured data to Amazon Kinesis for analysis.

Answer

Use AWS CloudTrail to audit all API calls to SageMaker and set up alarms on error responses.

Answer

Use Amazon CloudWatch Logs to collect logs from each endpoint, and use a Lambda function to parse logs and calculate error rates, then publish custom metrics.
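
The 5% error-rate alert from Question 13 can be sketched as a CloudWatch metric math alarm per endpoint plus a composite alarm rule over those alarms. The functions below only build the arguments for `put_metric_alarm` and the rule string for `put_composite_alarm`; the alarm names, the `AllTraffic` variant name, and the missing-data handling are assumptions.

```python
def error_rate_alarm(endpoint_name, variant_name="AllTraffic", threshold_percent=5.0):
    """Build kwargs for cloudwatch put_metric_alarm on the 5-minute 5XX error rate."""
    dimensions = [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ]

    def summed(metric_id, metric_name):
        # 5-minute Sum of one endpoint metric, used only as math input.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/SageMaker",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{endpoint_name}-5xx-error-rate",  # hypothetical naming scheme
        "ComparisonOperator": "GreaterThanThreshold",
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "TreatMissingData": "notBreaching",  # assumption: no traffic is not an error
        "Metrics": [
            summed("errors", "Invocation5XXErrors"),
            summed("invocations", "Invocations"),
            {
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "5XX error rate (%)",
                "ReturnData": True,
            },
        ],
    }


def composite_rule(alarm_names):
    """Rule string for put_composite_alarm: fire when any child alarm is in ALARM."""
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)
```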

Question 14

A financial services company is deploying a credit risk model using SageMaker. They require that the model always uses the latest approved version from the Model Registry. They also need to maintain a detailed audit trail of all model version transitions (e.g., from PendingApproval to Approved). The deployment should be fully automated and must roll back immediately if the new model's error rate exceeds the old model's error rate by more than 2% during a canary deployment. Which solution meets these requirements with the least custom code?

Accepted Answer

Use SageMaker Pipelines with a conditional step to deploy the model after approval, and include a canary deployment using a weighted endpoint variant. Use CloudWatch alarms to trigger automatic rollback.

SageMaker Pipelines supports conditional execution, and a canary deployment can be run by sending a small share of traffic to the new model through weighted endpoint variants. CloudWatch alarms then trigger an automated rollback when the new model's error rate exceeds the old model's by more than the 2% threshold. This approach requires minimal custom code because it relies on built-in SageMaker capabilities for Model Registry integration, deployment, and monitoring.

Answer

Use AWS CodePipeline with a deployment action that uses AWS CloudFormation to update the endpoint. Add a manual approval step for rollback.

Answer

Create an AWS Lambda function that is triggered by Model Registry events, deploys the model to a staging endpoint, runs a canary test, and if successful, updates the production endpoint.

Answer

Use an Amazon EKS cluster with a custom inference container and use ArgoCD for automated deployments.
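
The canary split and rollback rule from Question 14 can be illustrated with two small helpers. This is a sketch under assumptions: the variant names and the 10% canary share are hypothetical, and "exceeds by more than 2%" is read as 2 percentage points of absolute difference, which the question does not make explicit.

```python
def canary_weights(current_variant, canary_variant, canary_share=0.1):
    """Build DesiredWeightsAndCapacities for update_endpoint_weights_and_capacities."""
    if not 0.0 < canary_share < 1.0:
        raise ValueError("canary_share must be between 0 and 1")
    return [
        {"VariantName": current_variant, "DesiredWeight": 1.0 - canary_share},
        {"VariantName": canary_variant, "DesiredWeight": canary_share},
    ]


def should_roll_back(old_errors, old_requests, new_errors, new_requests,
                     max_increase_points=2.0):
    """Return True when the canary's error rate exceeds the old rate by more than the limit."""
    if old_requests <= 0 or new_requests <= 0:
        raise ValueError("both variants need traffic before they can be compared")
    old_rate = 100.0 * old_errors / old_requests
    new_rate = 100.0 * new_errors / new_requests
    return new_rate - old_rate > max_increase_points
```

In the automated setup described by the answer, the comparison would be evaluated by a CloudWatch alarm rather than by application code; the function only makes the threshold arithmetic explicit.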

Question 15

A company deployed a machine learning model on an Amazon SageMaker real-time endpoint. Over several weeks, they notice that inference latency has been gradually increasing, especially during peak business hours. The model and instance type have remained unchanged. What is the most likely cause of the increased latency?

Accepted Answer

The SageMaker endpoint auto scaling is not configured to scale out quickly enough under increasing traffic.

The gradual increase in latency over several weeks, especially during peak hours, suggests that the endpoint is not scaling out adequately to handle growing traffic. Model size is not the cause because the model and instance type have not changed. Data capture does not inherently make latency increase over time. Batch processing is not the cause because it is not used for real-time endpoints.

Answer

The inference script is not using batch processing.

Answer

The model size is too large for the instance type.

Answer

The endpoint has data capture enabled, causing additional overhead.