Free MLA-C01 Deployment and Orchestration of ML Workflows Practice Questions (2026)

Q: How many Deployment and Orchestration of ML Workflows questions are on the MLA-C01 exam?

The Deployment and Orchestration of ML Workflows domain is one of the weighted domains on the MLA-C01 exam. The Courseiva question bank has 108 practice questions for this domain.

Q: How can I practice Deployment and Orchestration of ML Workflows questions for MLA-C01?

Click any of the 108 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30, or 50 questions drawn only from the Deployment and Orchestration of ML Workflows domain.

All MLA-C01 Deployment and Orchestration of ML Workflows questions (108)

Click any question to see the full explanation and answer options, or start a focused practice session above.

A data scientist needs to deploy a single ML model that will serve real-time predictions with low latency (under 10 ms) for a high-traffic web application. The model fits in memory and requires GPU acceleration. Which SageMaker inference option is MOST suitable?

A team has 200 small ML models that need to be served via HTTPS endpoints. Each model is used infrequently, and the team wants to minimize hosting costs. Which SageMaker deployment approach is MOST cost-effective?

An ML team uses SageMaker Pipelines to automate model retraining. They want to skip redundant training steps when input data has not changed. Which feature should they enable?

A company needs to deploy a new model version to a SageMaker real-time endpoint. They want to route 5% of traffic to the new version initially to monitor for errors before full rollout. Which deployment strategy should they use?

An ML engineer needs to compile a trained TensorFlow model to run efficiently on a target edge device with an ARM CPU. Which AWS service should they use?

A data science team uses SageMaker Pipelines for automated training. They need to conditionally register a model only if evaluation metrics exceed a threshold. Which pipeline step type should they use after the evaluation step?

A company wants to serve a large ensemble of models using NVIDIA Triton Inference Server on SageMaker for high throughput GPU inference. Which SageMaker inference option supports this?

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

A company uses SageMaker Model Registry to manage model versions. They want to automate the approval of models that pass automated evaluation, but require manual approval for others. Which Model Registry feature supports this workflow?

A company needs to deploy a model that processes large payloads (up to 1 GB) asynchronously. The results should be written to S3, and the team needs SNS notifications upon completion. Which SageMaker inference option is MOST suitable?

An ML team uses AWS Step Functions to orchestrate a retraining pipeline triggered by EventBridge when new training data arrives. The pipeline includes a SageMaker training job and a model evaluation. If evaluation fails, the team wants to send an alert. How should they implement this?

A startup wants to deploy a containerized ML application that includes both a model inference server and a preprocessing component in the same endpoint. Which SageMaker endpoint type supports running multiple containers?

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

A team uses SageMaker Pipelines to train and evaluate a model. They want to run the training step only if the data quality check passes, otherwise skip. Which TWO pipeline step types are required? (Select TWO.)

A machine learning engineer needs to deploy a model that requires less than 100 ms inference latency for real-time predictions. The model is a small PyTorch model that fits in a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?

A team built a SageMaker Pipeline that includes a training step and a model evaluation step. They want to automatically register a model in SageMaker Model Registry only if the evaluation metric (accuracy) exceeds 0.9. Which pipeline step should be used to implement this conditional logic?

A company has 200 small models (each ~100 MB) that serve different customers. They want to minimize costs while keeping low latency for each customer. Which SageMaker deployment approach is MOST suitable?

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

A company wants to update an existing SageMaker real-time endpoint to serve a new model version. They need to route a small percentage of traffic to the new version initially and monitor for errors before switching fully. Which deployment pattern supports this?

A team uses MLflow on SageMaker for experiment tracking. They want to automatically deploy the best-performing model from an MLflow run to a SageMaker endpoint for real-time inference. What is the MOST efficient way to achieve this?

A company deploys a large NLP model on a SageMaker real-time endpoint using an ml.p3.2xlarge instance. To reduce inference cost without sacrificing throughput, they want to compile the model for their target hardware. Which service should they use?

A company uses SageMaker Model Registry to manage model versions. They have a cross-account deployment requirement: models approved in the development account must be deployed to a production account. Which approach is the MOST secure and recommended?

A team wants to use AWS Step Functions to orchestrate a retraining workflow that is triggered when new data arrives in an S3 bucket. They also need to monitor model drift. Which event-driven approach should they use?

A company wants to deploy a PyTorch model that uses dynamic batching and model ensemble. They need to serve multiple models with different frameworks (PyTorch, TensorFlow) within the same endpoint. Which SageMaker feature should they use?

A team uses SageMaker real-time endpoints for inference. They want to deploy a new model version and compare its performance with the current version under live traffic without affecting user experience. Which method should they use?

A company wants to deploy a machine learning model using infrastructure as code to ensure reproducibility. They need to define the SageMaker Studio domain, user profiles, and the endpoint configuration. Which tool should they use?

A data scientist needs to deploy an anomaly detection model that processes large payloads (up to 10 MB per request) and expects inference times of up to 10 minutes. The team wants to minimize cost and only pay per inference. Which TWO SageMaker inference options meet these requirements? (Choose TWO.)

A team is optimizing a deep learning model for deployment on SageMaker using SageMaker Neo. Which THREE of the following are valid optimization techniques that Neo can apply? (Choose THREE.)

A company uses SageMaker Pipelines to automate their ML workflow. They need to add model versioning and approval workflow. Which THREE steps should they include in their pipeline to achieve this? (Choose THREE.)

A machine learning team has a model that needs to serve predictions with very low latency (under 10 ms) for a real-time web application. The model is a small ensemble of three neural networks that fits in memory. Which SageMaker inference option is MOST appropriate?

A company wants to deploy a single model that processes images from a production line. The images are uploaded to an S3 bucket every few minutes, and the inference results must be stored back to S3. The team wants to avoid paying for idle compute and prefers a fully managed, on-demand solution. Which SageMaker inference option should they use?

A data science team wants to host 50 different models for a recommendation engine. Each model is small (under 100 MB) and traffic patterns are unpredictable. They need to minimize cost and operational overhead. Which approach should they take?

A machine learning engineer needs to deploy a new version of a model gradually, initially sending 5% of traffic to the new version and 95% to the current version, while monitoring for errors. Which deployment pattern should they use?

A company is using SageMaker Pipelines to orchestrate their ML workflow. They need to check whether a model's accuracy exceeds 0.9. If it does, they want to register the model in the model registry; otherwise, they want to run a retraining step. Which step type should they use for the decision?

An organization wants to ensure that only approved model versions can be deployed to production. They use the SageMaker Model Registry to track model versions. How can they enforce that only approved models are deployed?

A team has a large deep learning model that needs to be deployed for real-time inference with GPU acceleration. They want to use the Triton Inference Server on SageMaker to maximize throughput. Which instance type and configuration should they choose?

A company uses SageMaker Pipelines to automate their ML workflow. They notice that the pipeline reruns all steps even when the input data has not changed. Which feature should they enable to avoid unnecessary recomputation?

An ML engineer wants to use MLflow on SageMaker to track experiments and log metrics. They have set up MLflow on an EC2 instance. How can they best integrate MLflow tracking with SageMaker training jobs?

A company needs to update a model in production without any downtime. They currently have a single real-time endpoint serving traffic. Which approach allows them to deploy a new model version and switch traffic gradually while being able to roll back quickly?

Which SageMaker feature compiles a trained model into an optimized binary for a specific hardware target (e.g., Intel, ARM, NVIDIA, or edge devices) to improve inference performance?

A team has a SageMaker Pipeline that trains a model and registers it in the Model Registry. They want to automate the deployment of the approved model to a staging environment. Which event-driven approach should they use?

An ML engineer is designing a SageMaker Pipeline for a computer vision model. The pipeline includes steps for data processing, training, evaluation, and registration. The engineer wants to enable caching to avoid reprocessing when step inputs have not changed. For which steps is caching supported? (Select TWO.)

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but wants to minimize costs by using a smaller attached GPU. Which options can they use? (Select TWO.)

A team is migrating their ML infrastructure to AWS and wants to use infrastructure as code to manage SageMaker Studio domains, user profiles, and associated resources. Which services can they use for this purpose? (Select THREE.)

A data science team needs to deploy a PyTorch model that performs real-time inference with sub-100ms latency. The model requires GPU acceleration, but the team wants to minimize cost by sharing GPU instances across multiple models. Which SageMaker hosting option should they choose?

A machine learning engineer wants to automatically trigger a retraining pipeline whenever new training data arrives in an S3 bucket. The pipeline uses SageMaker Pipelines. Which AWS service should be used to detect the S3 event and start the pipeline?

A financial services company needs to deploy a machine learning model for real-time fraud detection. The model must be highly available across multiple Availability Zones and must support automatic scaling based on request volume. The company also needs to perform canary deployments to test new model versions with a small percentage of traffic before full rollout. Which SageMaker feature should they use?

A team wants to use MLflow on SageMaker to track experiments and manage model lifecycle. They need to register models in the SageMaker Model Registry after training. Which approach allows them to use MLflow for experiment tracking and then register the best model to SageMaker Model Registry?

A company wants to deploy a trained XGBoost model for batch inference on a large dataset stored in S3. The inference job should be cost-effective and does not require real-time responses. Which SageMaker inference option should they use?

A machine learning engineer needs to deploy a TensorFlow model that requires a custom inference environment with specific system libraries. The model will be used in a real-time application with variable traffic. They want to minimize cold start latency. Which SageMaker hosting option should they choose?

A company uses SageMaker Pipelines to orchestrate their ML workflow. They notice that if a pipeline step fails due to a transient error (e.g., a brief network issue), the entire pipeline fails and they must manually rerun from the beginning. They want to automatically retry failed steps a few times before failing. What should they do?

A data scientist wants to compare the performance of two model versions (V1 and V2) in production by splitting traffic between them. They want to gradually increase the percentage of traffic to the new version while monitoring metrics. Which SageMaker feature enables this?

A company is deploying a large NLP model on SageMaker for real-time inference. They want to reduce inference latency and cost by optimizing the model for the target hardware. The model is trained in PyTorch. Which SageMaker feature should they use to compile the model for best performance on the chosen instance?

A team needs to deploy a PyTorch model that uses custom CUDA kernels. They want to use NVIDIA Triton Inference Server on SageMaker for high-performance serving. Which SageMaker configuration is required to use Triton?

A financial services company needs to enforce that only approved model versions are deployed to production. They use SageMaker Model Registry to track versions, with an approval workflow. Which action must they take in the model registry to ensure only approved models can be deployed?

A company wants to deploy 50 small models (each ~100 MB) for real-time inference. They need to minimize hosting costs while maintaining low latency. Which SageMaker hosting option is most cost-effective?

A data science team uses SageMaker Pipelines to automate their ML workflow. They want to reduce costs by reusing outputs from previous pipeline runs when the input data and code have not changed. Which TWO actions should they take? (Choose two.)

An ML engineer needs to deploy a model that requires GPU acceleration but wants to reduce inference cost by optimizing the model. They are considering SageMaker Neo compilation and Amazon Elastic Inference. Which TWO statements are correct about these services? (Choose two.)

A company is using AWS Step Functions to orchestrate their ML retraining pipeline. They want to trigger retraining when new data arrives, but only if the model's performance has degraded below a threshold. Which THREE AWS services should they use together to achieve this? (Choose three.)

A data science team has trained a PyTorch model for real-time inference and needs to deploy it on AWS with GPU acceleration while minimizing cold-start latency. Which SageMaker inference option should they choose?

A company has 50 small PyTorch models that are used infrequently for inference. They want to minimize costs while maintaining the ability to serve all models from a single endpoint. Which SageMaker feature should they use?

A machine learning team uses SageMaker Pipelines to automate retraining. They want to avoid re-running data processing steps if the data has not changed since the last successful pipeline run. Which built-in feature should they enable?

A team needs to deploy a new model version to production while minimizing risk. They want to route 5% of live traffic to the new model and 95% to the current model, and then gradually increase the new model's traffic. Which SageMaker deployment pattern should they use?

A company wants to run inference on a large dataset stored in S3 using a pre-trained model. The inference can tolerate latency from minutes to hours, and they want a fully managed solution that autoscales to handle large volumes. Which SageMaker inference option is most suitable?

A machine learning engineer needs to optimize a trained TensorFlow model for deployment on edge devices with limited compute. Which SageMaker feature should they use to compile the model for target hardware?

A team uses SageMaker Pipelines with a Condition step to decide whether to register a model based on evaluation metrics. They want to also store the evaluation results for lineage tracking. Which step should they use to record the metrics?

A company wants to deploy a PyTorch model on SageMaker using the NVIDIA Triton Inference Server for GPU acceleration. They have an existing Triton configuration. Which approach should they take?

A team wants to implement an event-driven retraining pipeline that triggers retraining when new data arrives in an S3 bucket. The pipeline should include preprocessing, training, evaluation, and conditional registration. Which AWS services should they combine?

A team needs to deploy a model that has compliance requirements to log all inference requests and responses for auditing. The model will be served using a real-time endpoint. How can they achieve this without custom code?

A company wants to version and track ML models, with an approval workflow for promoting models from staging to production. Which SageMaker feature should they use?

A startup wants to deploy a model that has variable traffic patterns, with some periods of no traffic and occasional spikes. They want to pay only for what they use and do not want to manage instances. Which SageMaker inference option should they choose?

A company wants to deploy a new model using a canary deployment strategy on SageMaker. Which two actions should they take? (Select TWO.)

A machine learning engineer is designing a SageMaker Pipeline that includes a training step, a processing step for evaluation, and a condition step to decide whether to register the model. The pipeline should support caching to avoid redundant runs when inputs haven't changed. Which three steps must have caching enabled? (Select THREE.)

A team wants to deploy a single SageMaker real-time endpoint that serves both a PyTorch model for NLP and a TensorFlow model for image classification. Each model requires a different inference container. Which two features can they use together to achieve this? (Select TWO.)

A company wants to serve 200 different PyTorch models. Each model is small (under 1 GB) and only a fraction are used at any time. To minimize cost and management overhead, which SageMaker inference option should be used?

A machine learning team needs to deploy a new model version for A/B testing, gradually shifting traffic from the old version to the new version over 24 hours. Which deployment strategy should they use?

A team uses SageMaker Pipelines to retrain a model nightly. They want to skip the training step if the new data is unchanged (same checksum as previous run) to save cost and time. Which pipeline configuration achieves this?

A company needs to serve real-time predictions from a large ensemble of three deep learning models, each requiring different inference environments (PyTorch, TensorFlow, MXNet). Which SageMaker endpoint type supports running multiple inference containers together?

A data scientist wants to train a model on SageMaker using a custom PyTorch script, then register the best model in the SageMaker Model Registry. The training job is part of a SageMaker Pipeline. Which pipeline step should be used to register the model?

A team wants to deploy a model that performs inference on large video files (up to 2 GB each) uploaded to an S3 bucket. The inference can tolerate a few minutes of latency. Which SageMaker inference option is most cost-effective?

A company uses SageMaker Neo to compile a trained model for deployment on edge devices. What is the primary benefit of using Neo?

A team wants to orchestrate a multi-step ML workflow that includes data preprocessing, hyperparameter tuning, model training, evaluation, and conditional deployment to staging or production based on evaluation metrics. The workflow should run on a schedule and track lineage. Which service should they use?

A team deploys a model on a SageMaker real-time endpoint using an ml.m5.xlarge instance. The model has high latency due to a large neural network. The team wants to reduce latency without changing the model code. Which option should they use?

A company uses SageMaker Model Registry to manage model versions. They want to enforce that only models with an 'Approved' status can be deployed to production endpoints. How can they enforce this?

A team uses MLflow on SageMaker for experiment tracking. They want to automate the retraining of a model when new training data arrives in an S3 bucket. Which combination of services should they use?

A company wants to deploy a model using a serverless inference endpoint that can automatically scale to zero when not in use and has a configurable maximum concurrency. Which SageMaker inference option meets these requirements?

A machine learning engineer is deploying a TensorFlow model for real-time inference. The model has high latency on CPU. Which TWO actions can reduce inference latency? (Choose two.)

A team wants to deploy a new model using a canary deployment strategy on SageMaker. Which TWO configurations are necessary? (Choose two.)

A company uses SageMaker Pipelines to automate their ML workflow. They want to ensure that pipeline steps are not re-executed if the inputs and parameters have not changed since the last successful run. Which THREE features can help achieve this? (Choose three.)

A data science team needs to deploy a trained PyTorch model for real-time inference with sub-100ms latency. The model fits on a single GPU. Which SageMaker inference option is MOST cost-effective while meeting the latency requirement?

A company has 200 small PyTorch models that are each used infrequently but need to be available for real-time inference. To minimize costs, they want to host all models on a single endpoint. Which SageMaker feature should they use?

A machine learning engineer deploys a new model version to a SageMaker endpoint with production variants. They want to gradually shift traffic from the old model to the new model, monitoring for errors, and automatically roll back if the error rate exceeds 5%. Which deployment pattern should they use?

A team uses SageMaker Pipelines to automate retraining. They want to skip the training step if the data has not changed since the last run. Which feature should they enable?

A company needs to deploy a large language model (LLM) on SageMaker with the Triton Inference Server to maximize GPU utilization and reduce latency. They have an NVIDIA A100 GPU. Which SageMaker inference option supports Triton?

A data scientist wants to version and manage trained models, require approval before deployment, and enable cross-account deployment. Which SageMaker feature provides these capabilities?

A company runs a batch inference job on 10 TB of image data stored in S3. Each image needs to be processed by a GPU-accelerated model. The job is not time-sensitive and cost is the primary concern. Which SageMaker option is MOST appropriate?

A team uses SageMaker Pipelines to train and register a model. They want to conditionally run a hyperparameter tuning step only if the data quality check passes. Which pipeline step type should they use to branch the execution?

An ML engineer needs to orchestrate a multi-step workflow that includes data preprocessing on Spark, model training on SageMaker, and deployment to a production endpoint. They require tight integration with other AWS services and the ability to add custom logic. Which AWS service should they use alongside SageMaker?

A company wants to trigger a model retraining pipeline whenever new training data arrives in an S3 bucket. They also need to send a notification to a Slack channel when the retraining completes. Which TWO AWS services should they use to implement this event-driven workflow? (Select TWO.)

A machine learning team needs to deploy a PyTorch model that has been compiled with SageMaker Neo to improve inference performance on edge devices. Which TWO statements about SageMaker Neo are correct? (Select TWO.)

A company is using SageMaker to serve a model for real-time predictions. They want to test a new model version by routing a small percentage of live traffic to it while the rest goes to the current model. They also need to compare performance metrics. Which TWO actions should they take? (Select TWO.)

An ML engineer is designing a SageMaker Pipeline for model training and registration. They need to ensure that the pipeline can be re-run with different datasets without manual intervention, and that the steps are only re-executed if inputs have changed. Which THREE features should they configure? (Select THREE.)

A company wants to deploy a containerized inference application that includes a custom pre-processing script and a TensorFlow model on SageMaker. They need the ability to independently scale the pre-processing and model serving components. Which TWO SageMaker features support this? (Select TWO.)

A team is deploying a model using SageMaker real-time endpoint with an ml.m5.large instance. They notice high latency under peak load. They want to reduce latency without increasing instance size. Which THREE actions could help? (Select THREE.)

A data science team is deploying a PyTorch model for real-time inference with sub-second latency requirements. They need to minimize cost while handling variable traffic. Which TWO approaches should they consider? (Choose TWO.)

An MLOps team is designing a SageMaker Pipeline to automate model retraining. The pipeline must: (1) run training only if new training data is available, (2) register the model in SageMaker Model Registry only if evaluation metrics exceed a threshold, (3) deploy the approved model to a staging endpoint automatically. Which THREE steps should they include? (Choose THREE.)

A company wants to test a new ML model in production with minimal risk before shifting full traffic. They have an existing real-time endpoint serving model version A. They need to route 5% of live traffic to model version B and monitor performance for 24 hours. Which TWO steps should they take? (Choose TWO.)

Practice all 108 Deployment and Orchestration of ML Workflows questions

Other MLA-C01 exam domains

ML Model Development · Data Preparation for Machine Learning · ML Solution Monitoring, Maintenance, and Security

Frequently asked questions

What does the Deployment and Orchestration of ML Workflows domain cover on the MLA-C01 exam?

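The Deployment and Orchestration of ML Workflows domain covers deploying and orchestrating ML workflows on AWS, as set out in the MLA-C01 exam blueprint published by Amazon Web Services. The questions on this page span choosing a SageMaker inference option (real-time, serverless, asynchronous, batch, multi-model, and multi-container endpoints), rollout strategies such as canary deployments and A/B traffic splitting, SageMaker Pipelines (condition steps, step caching, retries), Model Registry approval workflows, SageMaker Neo compilation, Triton Inference Server, MLflow integration, event-driven retraining with EventBridge and Step Functions, and infrastructure as code. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all MLA-C01 domains, with no account required.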

How many Deployment and Orchestration of ML Workflows questions are in the MLA-C01 question bank?

The Courseiva MLA-C01 question bank contains 108 questions in the Deployment and Orchestration of ML Workflows domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Deployment and Orchestration of ML Workflows for MLA-C01?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Deployment and Orchestration of ML Workflows questions for MLA-C01?

Yes — the session launcher on this page draws questions exclusively from the Deployment and Orchestration of ML Workflows domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.

Free forever · No credit card required

Track your MLA-C01 domain progress

Save your results, see per-domain analytics, and get readiness scores — free, for every certification.
