Free MLA-C01 Deployment and Orchestration of ML Workflows Practice Questions (2026)

Q: How many Deployment and Orchestration of ML Workflows questions are on the MLA-C01 exam?

Deployment and Orchestration of ML Workflows is Domain 3 of the MLA-C01 exam and accounts for 22% of scored content according to the AWS exam guide. The Courseiva question bank has 124 practice questions for this domain.

Q: How can I practice Deployment and Orchestration of ML Workflows questions for MLA-C01?

Click any of the 124 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Deployment and Orchestration of ML Workflows domain.

All MLA-C01 Deployment and Orchestration of ML Workflows questions (124)

Click any question to see the full explanation and answer options, or start a focused practice session above.

A data science team has trained a PyTorch model using Amazon SageMaker and wants to deploy it with a custom inference container that includes a pre-processing step. The team needs to minimize latency and ensure the pre-processing runs only once per request. Which SageMaker real-time inference option should they use?

A company is deploying a real-time inference endpoint for a natural language processing model using Amazon SageMaker. The model requires GPU acceleration and must handle variable traffic patterns, including sudden spikes. The team wants to minimize costs while maintaining low latency during spikes. Which endpoint configuration strategy should they use?

A machine learning engineer is deploying a model using AWS Lambda for inference. The model is a small scikit-learn classifier with a size of 50 MB. The Lambda function is invoked by an API Gateway REST API. The engineer notices that cold starts are causing high latency. Which action would most effectively reduce cold start latency without increasing costs significantly?

A company uses Amazon SageMaker to train and deploy machine learning models. The security team requires that all data in transit between the training job and S3 be encrypted, and that no data traverses the public internet. Which configuration should the company use?

A team is deploying a deep learning model on a SageMaker real-time endpoint. The model has high memory requirements, and the team wants to minimize instance cost while ensuring the endpoint can handle up to 10 concurrent requests. They plan to use a single ml.p3.2xlarge instance (8 vCPUs, 61 GB memory). Which SageMaker endpoint configuration will allow the endpoint to handle 10 concurrent requests without errors?

A company wants to deploy a machine learning model that was trained on-premises using TensorFlow. The model is a TensorFlow SavedModel. The company uses AWS and wants to minimize operational overhead. Which deployment option meets these requirements?

A team is using AWS Step Functions to orchestrate a machine learning workflow that includes data preprocessing, training, and model evaluation. The team wants to run the workflow whenever new data arrives in an S3 bucket. Which approach should they use to trigger the Step Functions workflow?

A company is deploying a machine learning model using Amazon SageMaker. The model is a large deep learning model that requires GPU for inference. The company expects unpredictable traffic patterns with occasional bursts. They want to minimize cost while ensuring low latency during bursts. Which TWO actions should they take? (Select TWO.)

An MLOps engineer is designing a CI/CD pipeline for deploying machine learning models to a production SageMaker endpoint. The pipeline should include automated testing, approval gates, and rollback capability. Which THREE components should be included in the pipeline? (Select THREE.)

A company is using Amazon SageMaker to deploy a model for real-time inference. The model requires access to a private S3 bucket that contains reference data. The company wants to ensure that the endpoint can access the S3 bucket without using a public internet connection. Which TWO actions should they take? (Select TWO.)

A data scientist is trying to create a SageMaker endpoint configuration with 6 instances of ml.c5.large for a production variant. The creation fails with the error shown in the exhibit. Which action should the data scientist take to resolve this issue?

A machine learning engineer has configured a SageMaker Model Monitor schedule for data quality monitoring as shown in the exhibit. The schedule is set to run hourly. However, the engineer notices that the monitoring jobs are not producing output in the specified S3 bucket. What is the most likely cause?

A data science team has trained a model using SageMaker and wants to deploy it for real-time inference with automatic scaling based on request latency. The deployment must handle unpredictable traffic spikes without manual intervention. Which combination of SageMaker features should the team use?

A machine learning engineer is deploying a PyTorch model for real-time inference on SageMaker. The model requires GPU for low-latency predictions. The deployment fails with the error: 'The primary container does not support the requested instance type.' The instance type is ml.p3.2xlarge. Which action should the engineer take to resolve the issue?

A company is using SageMaker Pipelines to automate a multi-step ML workflow. The pipeline includes data preprocessing, training, and model evaluation. The team wants to ensure that if the evaluation step fails, the pipeline stops and sends an alert to the operations team. Which SageMaker Pipelines feature should they use?

A team is deploying a machine learning model using Amazon SageMaker. They need to serve predictions with sub-100ms latency for a real-time application. The model is a large ensemble that requires 4 GB of memory. The team expects traffic of 100 requests per second initially, but it may double during peak hours. Which instance type and deployment configuration should the team choose to minimize cost while meeting the latency requirement?

A company has a SageMaker endpoint running a model that provides real-time recommendations. Recently, the model's accuracy has degraded due to data drift. The team wants to automatically retrain the model when a drift metric exceeds a threshold and deploy the new model without downtime. Which architecture should the team implement?

A machine learning team needs to deploy a model that was built using scikit-learn. They want to use SageMaker for hosting. Which approach should they take?

Which TWO of the following are best practices for deploying machine learning models on SageMaker? (Select TWO.)

A media company uses SageMaker to host a real-time video recommendation model. The model is deployed on a single ml.c5.xlarge endpoint. During a major live event, traffic surges to 10 times the normal load, and the endpoint becomes unresponsive, causing high latency and errors. The team had set up an Application Auto Scaling target tracking policy based on CPU utilization with a target of 70%. However, scaling did not trigger quickly enough. After the event, the team reviews CloudWatch metrics and notices that CPU utilization never exceeded 70% during the surge, but memory utilization peaked at 95%. The model is memory-bound. The team wants to ensure the endpoint scales automatically before performance degrades during future events. What should the team do?

A financial services company has a SageMaker pipeline that trains a fraud detection model daily. The pipeline consists of three steps: preprocessing (using a Spark script), training (XGBoost), and evaluation. The evaluation step calculates the F1 score and compares it to a threshold of 0.95. If the F1 score is below 0.95, the pipeline should fail and notify the team via email. The team implemented this using a Condition step that checks if the F1 score is greater than or equal to 0.95. If true, the pipeline proceeds to register the model; if false, the pipeline fails. However, the team notices that even when the F1 score is 0.94, the pipeline continues to the registration step. The evaluation script outputs the F1 score as a float with two decimal places in a JSON file. The Condition step uses the expression: $.evaluation.metrics.f1_score >= 0.95. What is the most likely cause of the issue?

A data science team needs to deploy a PyTorch model for real-time inference with low latency. The model requires GPU acceleration. Which SageMaker endpoint configuration should they use?

A team is using SageMaker Pipelines to automate retraining and deployment. They want to trigger the pipeline automatically when new training data is available in an S3 bucket. Which approach should they use?

A financial services company deploys a fraud detection model on a SageMaker real-time endpoint. The inference logic includes a pre-processing step that requires access to a DynamoDB table for user metadata. The model container is a custom Docker image. How should the team grant the endpoint access to DynamoDB?

An ML team wants to deploy a model that was trained using XGBoost in SageMaker. They want to use the built-in XGBoost algorithm container for inference. Which inference option requires the least custom code?

A company is deploying a large number of small models (each < 100 MB) for different customers. They want to minimize costs and management overhead while serving traffic that varies significantly. Which SageMaker endpoint type should they choose?

During a blue/green deployment of a SageMaker endpoint, the team notices that traffic is not being fully shifted to the new variant after the update. The endpoint has two variants with equal initial weights (50% each). The team wants to shift 100% traffic to the new variant. What is the most likely cause?
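As background for the variant weights in this scenario: SageMaker routes traffic to each production variant in proportion to its weight divided by the sum of all variant weights. The sketch below only illustrates that arithmetic. The variant names `blue` and `green` and the helper `traffic_shares` are hypothetical and not part of any AWS SDK.

```python
def traffic_shares(weights):
    """Return each variant's fraction of traffic, given relative weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Equal weights split traffic evenly between the two variants.
before = traffic_shares({"blue": 1.0, "green": 1.0})

# A full shift needs the old variant's weight at 0, not just a larger new weight.
after = traffic_shares({"blue": 0.0, "green": 1.0})

print(before)  # {'blue': 0.5, 'green': 0.5}
print(after)   # {'blue': 0.0, 'green': 1.0}
```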

An ML engineer needs to deploy a model as an AWS Lambda function for serverless inference. The model is a scikit-learn pipeline serialized as a pickle file. What is the best way to include the model in the Lambda deployment?

A company uses SageMaker Pipelines to train and register models. They want to automate the deployment of approved models from the model registry to a staging endpoint. Which service should they use to orchestrate the deployment workflow?

A team is deploying a TensorFlow model on a SageMaker real-time endpoint with automatic scaling. They set the scaling policy to target an average CPU utilization of 50%. However, during traffic spikes, the endpoint experiences high latency and 503 errors. The instance type is ml.c5.large. What should the team do to resolve this while minimizing cost?

A company is deploying a machine learning model using SageMaker hosting. They need to support multiple versions of the model for A/B testing. Which TWO actions are required to set up the A/B test? (Choose two.)

A company is using an AWS Step Functions state machine to orchestrate a multi-step ML deployment. The workflow includes: training a model, evaluating it, registering the model, and deploying to a staging endpoint. They need to implement an approval gate before deploying to production. Which THREE components are necessary to achieve this? (Choose three.)

A company wants to deploy a model on SageMaker serverless inference. Which TWO of the following are limitations of serverless endpoints compared to real-time endpoints? (Choose two.)

A company trained a model using SageMaker and wants to deploy it with low latency for real-time inference. Which SageMaker feature is MOST suitable?

A data scientist wants to automate retraining of a model weekly and deploy the new model automatically after passing validation. Which AWS service combination is best?

A company has a model that receives low traffic but needs to handle sudden spikes. Which deployment option is most cost-effective?

A team notices that inference requests to their SageMaker endpoint are failing with '504 Gateway Timeout' for large payloads. What change should be made?

A company is using SageMaker Model Registry to manage model versions. They want to automatically deploy the latest approved model to production after retraining. Which approach is best?

A company is deploying multiple models on a single endpoint to reduce costs. They need to update one model without affecting others. Which solution should they use?

A company uses SageMaker endpoints with auto-scaling based on CPU utilization. During a flash sale, latency increases despite low CPU. What should be done?

A company is deploying a large model (10 GB) for real-time inference. The inference latency is too high. What optimization technique can help?

A team uses SageMaker Pipelines for CI/CD. The training step fails due to insufficient memory. How can the team fix this without rewriting code?

A company wants to deploy a trained model to a SageMaker endpoint with automatic scaling based on traffic. Which TWO configurations are required? (Choose two.)

A company uses SageMaker Pipelines for model training and wants to incorporate model evaluation before deployment into production. Which THREE components are essential? (Choose three.)

A company is running a SageMaker endpoint serving multiple models. They need to monitor for data drift and model quality. Which THREE actions are necessary? (Choose three.)

Refer to the exhibit. A user is unable to invoke a SageMaker endpoint. The IAM policy shown is attached to the user. Which permission is missing to allow invocation?

Refer to the exhibit. A SageMaker endpoint is logging this error when processing inference requests that require database access. What is the most likely cause?

Refer to the exhibit. A SageMaker Pipeline fails with 'Invalid output reference' at the TrainingStep. What is the most likely cause?

A data science team deploys a PyTorch model on Amazon SageMaker for real-time inference. The model requires GPU for low latency. Which instance type is MOST cost-effective while meeting the GPU requirement?

A company uses Amazon SageMaker Pipelines to automate its ML workflow. The pipeline includes a training step and a model evaluation step. If the evaluation step fails, the pipeline should stop and notify the team. How should the company configure the pipeline?

A financial services company deploys multiple models on a single Amazon SageMaker endpoint using a multi-model endpoint (MME). The models are stored in Amazon S3. Each model is approximately 500 MB and is loaded on demand. Users report high latency for cold-start scenarios. What should the company do to reduce cold-start latency?

An e-commerce company uses Amazon SageMaker to deploy a real-time inference endpoint for product recommendations. The endpoint receives bursty traffic, with occasional spikes. The company wants to minimize cost while ensuring that latency remains under 100 ms. Which approach should the company take?

A machine learning engineer needs to deploy a TensorFlow model to Amazon SageMaker and wants to use the built-in TensorFlow Serving container. What should the engineer provide in the model archive?

A data scientist is using Amazon SageMaker Studio to develop a model. The training job is taking longer than expected. The data scientist suspects that the data is being downloaded from Amazon S3 each time the training starts. What is the BEST way to reduce data loading time?

A healthcare company is deploying a model for predicting patient outcomes. The model must be deployed across multiple AWS accounts to meet compliance requirements. Each account has its own Amazon SageMaker endpoint. The company wants to centralize monitoring of model performance without exposing data across accounts. Which solution should the company use?

A company wants to automate its machine learning pipeline using AWS CodePipeline and Amazon SageMaker. The pipeline should train a model, evaluate it, and if the evaluation passes, register the model in the SageMaker Model Registry. Which service should the company use to orchestrate the training and evaluation steps?

A company deploys a model on Amazon SageMaker for real-time inference. The inference latency is too high. The model is a large deep learning model. The company wants to reduce latency without significantly impacting accuracy. Which approach should the company consider?

Refer to the exhibit. An AWS IAM policy is attached to a role used by a CI/CD pipeline to deploy SageMaker endpoints. The pipeline attempts to create an endpoint configuration with a VPC subnet that is not subnet-0123456789abcdef0. What will happen when the pipeline tries to create the endpoint configuration?

Refer to the exhibit. A data scientist creates a SageMaker Pipeline definition using the JSON shown. The pipeline runs successfully, but the scientist notices that the training step did not use the parameter 'TrainingInstanceCount' defined in Parameters. Why did this happen?

Refer to the exhibit. A company configures a SageMaker Model Monitor Data Quality monitoring schedule as shown. The schedule runs every hour. However, the team notices that the monitoring job fails intermittently with an AccessDenied error when accessing the S3 bucket for output. The IAM role SageMakerMonitorRole has permissions to write to s3://my-bucket/monitor-output. What is the MOST likely cause of the failure?

A company uses Amazon SageMaker to deploy a model for real-time inference. They want to perform A/B testing between two model versions. Which TWO actions should the company take to set up A/B testing? (Choose TWO.)

A machine learning team is building a CI/CD pipeline to train and deploy models using Amazon SageMaker. They want to ensure that the deployment step only proceeds if the model evaluation metrics exceed a certain threshold. Which THREE components should the team include in the pipeline? (Choose THREE.)

A data science team is deploying a model on Amazon SageMaker and wants to protect the endpoint from unauthorized access. Which TWO methods can the team use to secure the endpoint? (Choose TWO.)

A data science team needs to deploy a frequently updated PyTorch model for real-time inference. The model is retrained weekly and versioned using SageMaker Model Registry. Which deployment strategy minimizes downtime and allows easy rollback?

An ML team is deploying a model using SageMaker. The model requires GPU inference and must be available in multiple AWS regions for low latency. The team has created a multi-model endpoint with GPU instances. After deployment, they notice high latency spikes when a new model is loaded. What is the most likely cause?

A company is deploying an ML model for real-time fraud detection using SageMaker. The model must process requests within 50 ms and scale to handle up to 10,000 requests per second during peak hours. The data includes PII, so all traffic must stay within a VPC. The team has configured the SageMaker endpoint with a VPC and an internet gateway for model downloads. During a load test, the endpoint fails to achieve the required throughput. Which change would most likely resolve the issue?

A team wants to automate the retraining and deployment of an ML model whenever new labeled data arrives in S3. The workflow includes data preprocessing, training, evaluation, and conditional deployment. Which AWS service is best suited for orchestrating this end-to-end pipeline?

An MLOps engineer is setting up a SageMaker endpoint for a model that performs inference on large images. The model is containerized and expects input in a specific format. The team wants to preprocess the images (resize and normalize) before passing them to the model. What is the most efficient way to implement this?

A company deploys a model using SageMaker real-time endpoint with auto scaling. They observe that during a traffic spike, the endpoint quickly scales up to 10 instances, but after the spike, it takes a long time to scale down, leading to high costs. The scaling policy is based on a simple average CPU utilization threshold. Which adjustment would optimize the scaling down behavior?

A team wants to use a custom container for inference on SageMaker. The container needs to implement a web server that responds to API requests. Which protocol and port must the container listen on to be compatible with SageMaker hosting?
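For orientation, the SageMaker hosting contract for custom containers expects an HTTP server on port 8080 that answers `GET /ping` health checks and `POST /invocations` inference requests. The following is a minimal standard-library sketch of that shape, not a production server. `predict` and `make_server` are hypothetical names, and `predict` is a placeholder for real model inference.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload: bytes) -> dict:
    # Hypothetical placeholder for model inference: reports the payload size.
    return {"received_bytes": len(payload)}


class InferenceHandler(BaseHTTPRequestHandler):
    def _send(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check route: an empty 200 response signals a healthy container.
        self._send(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._send(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        result = predict(self.rfile.read(length))
        self._send(200, json.dumps(result).encode("utf-8"))

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real container would log requests


def make_server(port=8080):
    # Port 8080 is the port SageMaker hosting sends requests to.
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

A container entrypoint would call `make_server().serve_forever()`. Real deployments typically put a production-grade server in front of the model instead of `http.server`.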

An ML team is using SageMaker Model Registry to manage model versions. After training a new model version, they register it with an 'Approved' status. The CI/CD pipeline automatically deploys the latest approved model to a staging endpoint. However, the pipeline fails with an error: 'Cannot deploy model because the model version is not approved.' The model version is clearly approved in the registry. What is the most likely cause?

A company uses SageMaker Ground Truth to create a labeled dataset, then trains a model using SageMaker Training. They want to automate the pipeline so that whenever a labeling job is completed, it triggers the training job. Which architecture meets this requirement with minimal latency?

A company deploys a model on SageMaker that serves predictions to a web application. The model's performance degrades over time due to data drift. The company wants to set up continuous monitoring. Which TWO actions should the company take to monitor and retrain the model effectively? (Choose TWO.)

A team is deploying a model using SageMaker Pipelines. They have defined a pipeline with steps: preprocessing, training, evaluation, and conditional registration. The evaluation step produces a JSON file with metrics. If accuracy > 0.9, the model is registered; else, the pipeline fails. Which TWO statements about this pipeline are correct? (Choose TWO.)

A company wants to deploy its trained model to edge devices such as cameras and IoT devices. The model must run efficiently with low latency and minimal memory footprint. Which THREE actions should the company take to prepare the model for edge deployment? (Choose THREE.)

An engineer runs: aws sagemaker describe-endpoint --endpoint-name my-endpoint and receives the exhibit output. The engineer wants to update the endpoint to use a new model version stored in ECR with tag ':2'. Which step is necessary to perform the update?

A SageMaker endpoint is failing with the exhibited error. What is the most likely cause of this error?

An ML engineer runs the CLI command shown in the exhibit. However, the training job fails immediately with an error: 'Unable to assume role'. What is the most likely cause?

A data science team has trained a model using SageMaker and wants to deploy it to a production endpoint with automatic scaling based on request volume. Which SageMaker feature should they use to configure scaling?

A company is deploying a multi-model endpoint using SageMaker to serve multiple models from a single endpoint. They notice that one model consumes excessive memory and impacts others. What is the BEST practice to isolate resource usage?

An ML team uses SageMaker Pipelines to automate retraining. After a pipeline failure, they need to reprocess only the failed step without rerunning the entire pipeline. What should they do?

A company wants to deploy a PyTorch model on SageMaker for real-time inference. Which TWO steps are required? (Select TWO.)

A company uses SageMaker to orchestrate a training pipeline with multiple steps including preprocessing, training, and evaluation. They want to ensure that each step can be reused and tracked. Which THREE SageMaker features support this? (Select THREE.)

An organization is deploying a large language model on SageMaker and needs to optimize inference costs while maintaining low latency. Which THREE strategies should they consider? (Select THREE.)

A team used the config shown in the exhibit to create an endpoint. However, invocations of the endpoint fail with a "ModelError". What is the most likely cause?

A data scientist runs this pipeline but the Train step fails with "ResourceLimitExceeded". What is the most likely cause?

During deployment of a Hugging Face model, the endpoint logs show this error. Which step was likely missed?

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but also needs to keep costs low when traffic is low. Which SageMaker feature should they use?

A team has a large number of models that need to be deployed for batch inference weekly. They want to minimize cost and management overhead. Which approach is MOST efficient?

An ML team uses SageMaker Model Registry to manage model versions. They want to automatically deploy a model to a staging endpoint when a new version is approved. Which AWS service can orchestrate this?

A company deploys a model using SageMaker and enables data capture for monitoring. After a week, they notice that the captured data is not being written to the specified S3 bucket. The endpoint is running and invocations are successful. What is the most likely cause?

A team uses SageMaker Neo to compile a model for deployment on a target device. After compilation, they deploy the compiled model to a SageMaker endpoint using the Neo-optimized container. The endpoint fails to start with error "RuntimeError: Unable to load model". What could be the issue?

A company wants to use SageMaker to serve real-time predictions with a model that has a large memory footprint. They need to ensure the endpoint can handle traffic spikes. Which scaling policy should they use?

A data science team wants to deploy a real-time inference endpoint on Amazon SageMaker for a model that requires low latency (under 100 ms). The model is a small ensemble of three tree-based models, each about 50 MB. The team expects around 1000 requests per minute, with occasional spikes to 5000 requests per minute. Which instance type and deployment strategy would be MOST cost-effective while meeting the latency requirement?

A company has a SageMaker endpoint that was deployed successfully and is in service. However, when the team sends test inferences using the InvokeEndpoint API, they receive a 500 internal server error. The endpoint logs in CloudWatch show a stack trace indicating 'OutOfMemoryError: Java heap space'. The model is a large XGBoost model (2 GB) and the endpoint is using an ml.m5.large instance with 8 GB of memory. What is the MOST likely cause and solution?

A company is running multiple SageMaker endpoints for different models, each serving a separate business unit. The total cost is growing rapidly. The ML engineering team wants to reduce costs without sacrificing performance or isolation. They are considering either consolidating models into a Multi-Model Endpoint (MME) or onto a Multi-Container Endpoint (MCE). The models vary in size from 100 MB to 5 GB, and traffic patterns are unpredictable. Which recommendation is MOST appropriate?

A company uses Amazon SageMaker to train and deploy machine learning models. They need to run batch predictions on 10 TB of data stored in Amazon S3 every night. The model is a PyTorch neural network that fits in GPU memory. The predictions are not time-sensitive, but the job must complete within 8 hours. Which approach would be the MOST cost-effective?

A machine learning engineer is configuring auto-scaling for a SageMaker real-time endpoint. The endpoint is expected to have steady traffic during business hours and low traffic at night. The engineer wants to minimize costs by scaling in during low traffic, but the model container has a long start-up time (about 5 minutes). Which scaling policy should the engineer use to prevent request drops during sudden traffic spikes?

A company uses SageMaker endpoint with production variants for canary deployments. The team wants to gradually shift traffic from the old model variant (variant A) to the new model variant (variant B) over a period of 10 minutes. After the shift, if the new variant's error rate increases by more than 5%, they want to roll back automatically. Which solution meets these requirements with minimal manual intervention?

A data scientist needs to version and manage multiple models for a team of five. The team frequently experiments with different algorithms and hyperparameters. They need a centralized registry to store, deploy, and compare model versions. Which AWS service should the data scientist use?

An ML team uses AWS Step Functions to orchestrate a multi-step inference pipeline: data preprocessing, model inference, and postprocessing. The pipeline runs on demand for single records. The team notices that the pipeline occasionally fails due to timeouts in the preprocessing step. They want to implement retries with exponential backoff and a maximum retry count of 3 for that step. How should they configure this?
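To make the retry terms in this scenario concrete: in Amazon States Language, a Task state's `Retry` field takes `ErrorEquals`, `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. The sketch below builds such a state as a plain dictionary and computes the resulting wait times. The resource ARN, the next-state name, and the 2-second base interval are illustrative assumptions, and the error names to match depend on how the timeout actually surfaces in the workflow.

```python
import json

# Retry policy: up to 3 retries, with the wait doubling each time.
retry_policy = {
    "ErrorEquals": ["States.Timeout"],  # assumption: timeout surfaces under this name
    "IntervalSeconds": 2,               # assumed base wait before the first retry
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}

preprocess_state = {
    "Type": "Task",
    # Placeholder ARN, not a real resource.
    "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:preprocess",
    "Retry": [retry_policy],
    "Next": "ModelInference",  # hypothetical name of the following state
}


def retry_delays(policy):
    """Seconds waited before each retry: interval * rate**attempt_index."""
    return [
        policy["IntervalSeconds"] * policy["BackoffRate"] ** attempt
        for attempt in range(policy["MaxAttempts"])
    ]


print(json.dumps(preprocess_state, indent=2))
print(retry_delays(retry_policy))  # [2.0, 4.0, 8.0]
```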

A company is deploying a machine learning model for real-time inference using Amazon SageMaker. The model is a CPU-intensive XGBoost model that performs well on CPU. However, the team wants to minimize latency further by using hardware acceleration. They are considering Amazon Elastic Inference (EI) or moving to a GPU instance. The model is not optimized for GPU, so significant code changes would be required. Which approach is the MOST cost-effective way to reduce latency without changing the model code?

An ML team is running multiple SageMaker endpoints for various models. The monthly cost is higher than expected. Which TWO actions would help reduce costs without negatively impacting performance?

A company deploys a SageMaker endpoint that is InService, but inference requests are returning 503 Service Unavailable errors when traffic is high. The endpoint uses three ml.m5.large instances with target tracking scaling based on CPU utilization. The team has confirmed the model container is healthy. Which TWO possible issues could cause 503 errors?

A company is adopting Amazon SageMaker Pipelines to automate their ML workflow. They want to choose three key benefits that SageMaker Pipelines provides over traditional manual scripts and ad-hoc steps. Which THREE benefits are correct?

Your team manages a SageMaker real-time endpoint for a financial services application that requires low latency for fraud detection. The model is a 1 GB XGBoost model. The endpoint is deployed on two ml.m5.xlarge instances with target tracking auto-scaling based on average CPU utilization at 70%. During peak hours, the endpoint receives a sudden burst of traffic that increases from 500 requests per second to 2000 requests per second within 30 seconds. Many requests start failing with 503 errors. The CPU utilization metric shows that the instances are at 90% before the scaling policy launches new instances. However, by the time the new instances are added (approximately 3 minutes), the burst has subsided. You need to prevent these failures during future bursts while keeping costs reasonable. Which action would be MOST effective?

Your company uses SageMaker batch transform to process a large dataset (5 TB) of customer transactions every night. The batch transform job uses a single ml.c5.4xlarge instance and takes about 6 hours to complete. However, the job recently started failing with an error message: 'Timed out waiting for transformation to complete. The maximum job duration is 3600 seconds.' You check the input data and notice that one of the input files is a single large JSON file of 50 GB, while the rest are smaller files. The job is configured with a batch strategy of 'MultiRecord' and a maximum payload size of 6 MB. What is the most likely cause of the timeout and which fix should you apply?

A startup is building a serverless inference API using AWS Lambda. They have a TensorFlow model that is 400 MB in size. They packaged the model and inference code into a Lambda function using a container image. When they test the function with a small input, it consistently times out after 3 seconds. The Lambda function has 512 MB of memory and a timeout of 30 seconds. The business requirement is that inference must complete in less than 5 seconds under normal conditions. What is the most likely cause of the slow performance, and which change should they make?

109

A company deploys a deep learning model to a real-time SageMaker endpoint. After deployment, users report high inference latency. Which action is the MOST effective first step to reduce latency?

110

A company uses SageMaker Pipelines to automate model retraining. The pipeline runs daily but sometimes fails due to data quality issues. What is the best design to handle this?

111

A machine learning engineer is deploying a model using SageMaker and needs to ensure that the endpoint can automatically scale based on traffic patterns. Which TWO actions should the engineer take? (Choose two.)

112

A company is building a CI/CD pipeline for ML models using AWS CodePipeline and SageMaker. The pipeline should include steps to automatically retrain, evaluate, and deploy models. Which THREE components are essential for this pipeline? (Choose three.)

113

A company has deployed a SageMaker real-time endpoint for a model that predicts customer churn. The endpoint uses a single ml.m5.large instance. After deployment, the team notices that during peak hours, the endpoint returns 5xx errors for about 20% of requests. The endpoint has not been configured with any scaling policy. The team needs to resolve this issue with minimal cost increase. Which solution should the team implement?

114

A data science team uses SageMaker notebooks to develop models. They want to automate the process of training and registering models whenever new data arrives in an S3 bucket. The team has limited DevOps experience and needs a solution that requires minimal maintenance. Which approach should the team use?

115

A company has trained a custom model using PyTorch on Amazon SageMaker. The model achieves high accuracy, but the inference latency on a real-time endpoint is above the required 100ms SLA. The model is a large neural network with many layers. The company wants to reduce latency without significantly impacting accuracy. Which approach should the machine learning engineer take?

116

A machine learning team is deploying a fraud detection model using SageMaker. They use the SageMaker Model Registry to track model versions. They want to automatically deploy the latest approved model to a production endpoint whenever a new model version is approved. The team uses a CI/CD pipeline with AWS CodePipeline. The pipeline currently includes a source stage (S3), a build stage (CodeBuild), and a deploy stage (manual approval). They want to automate the deployment of approved models. Which solution will meet these requirements with the least operational overhead?

117

A company uses SageMaker for training and inference. They have a model that retrains weekly. After each retraining, the model is evaluated on a held-out test set. If the evaluation metrics meet a threshold, the model is registered as 'Approved' in the SageMaker Model Registry. The team manually deploys the approved model to a production endpoint. They want to automate this deployment process to reduce manual errors. However, the deployment should only proceed if the new model passes a canary test in a staging environment. Which combination of AWS services should the team use to achieve this?

118

A large enterprise has multiple SageMaker endpoints serving models for different business units. Each endpoint uses a separate instance type and scaling policy. The enterprise wants to implement a unified monitoring and logging solution to track endpoint health, latency, and errors across all endpoints. They also want to set up alerts when the error rate exceeds 5% over a 5-minute period. The solution must be centralized and use AWS-native services. Which solution should the team implement?

119

A financial services company is deploying a credit risk model using SageMaker. They require that the model always uses the latest approved version from the Model Registry. They also need to maintain a detailed audit trail of all model version transitions (e.g., from PendingApproval to Approved). The deployment should be fully automated and must roll back immediately if the new model's error rate exceeds the old model's error rate by more than 2% during a canary deployment. Which solution meets these requirements with the least custom code?

120

A company deployed a machine learning model on an Amazon SageMaker real-time endpoint. Over several weeks, they notice that inference latency has been gradually increasing, especially during peak business hours. The model and instance type have remained unchanged. What is the most likely cause of the increased latency?

121

A data science team deploys a TensorFlow model for real-time inference using the Amazon SageMaker model configuration shown. They observe high latency during the first few requests after deployment. Which TWO actions would reduce cold start latency? (Choose two.)

122

A financial services company uses Amazon SageMaker to deploy a fraud detection model for real-time inference. The model is deployed on a SageMaker real-time endpoint backed by an ml.m5.large instance. The endpoint has a custom auto scaling policy based on average CPU utilization, with a scale-out threshold of 70% and a scale-in threshold of 30%. During a flash sale event, traffic to the endpoint spikes tenfold within minutes. The endpoint fails to handle the load, resulting in increased latency and timeouts. The data science team needs to improve the scalability of the endpoint to handle sudden traffic spikes. Which solution should the team implement?

123

An ML team at a financial services company has developed a fraud detection model using Amazon SageMaker. The model is currently deployed to a production endpoint with a single variant using the previous model version. The team wants to deploy a new model version with a canary deployment where 10% of traffic goes to the new version and 90% remains on the old version for 30 minutes before shifting all traffic to the new version if no issues are detected. Which step is essential to achieve this safe rollout?

124

A streaming media company uses Amazon SageMaker to host a recommendation model on a real-time endpoint. The model is updated weekly, and the team deploys new model versions using SageMaker blue/green deployments. Recently, after a deployment, the new endpoint variant began returning HTTP 503 (Service Unavailable) errors for approximately 5 minutes before stabilizing. The deployment uses a linear transition with a 10-minute window. The old variant continues to serve traffic during the transition. The team notices that the error rate spikes right after the new variant becomes active. The endpoint is configured with two instances for each variant. Instance logs show that the new model container is taking longer than expected to load and initialize (e.g., downloading model artifacts from S3 and loading them into memory). The team needs to resolve this issue without changing the model or container image. Which combination of actions should the team take to eliminate the 503 errors?

Practice all 124 Deployment and Orchestration of ML Workflows questions

Other MLA-C01 exam domains

Data Preparation for Machine Learning
ML Model Development
ML Solution Monitoring, Maintenance and Security

Frequently asked questions

What does the Deployment and Orchestration of ML Workflows domain cover on the MLA-C01 exam?

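The Deployment and Orchestration of ML Workflows domain is part of the MLA-C01 exam blueprint published by Amazon Web Services. The questions listed on this page span topics such as SageMaker real-time endpoints and auto scaling, batch transform, Lambda-based inference, SageMaker Pipelines, CI/CD with AWS CodePipeline, the SageMaker Model Registry, and canary and blue/green deployments. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all MLA-C01 domains, with no account required.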

How many Deployment and Orchestration of ML Workflows questions are in the MLA-C01 question bank?

The Courseiva MLA-C01 question bank contains 124 questions in the Deployment and Orchestration of ML Workflows domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Deployment and Orchestration of ML Workflows for MLA-C01?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Deployment and Orchestration of ML Workflows questions for MLA-C01?

Yes — the session launcher on this page draws questions exclusively from the Deployment and Orchestration of ML Workflows domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
