MLA-C01 Deployment and Orchestration of ML Workflows — All Questions With Answers

Question 1 (easy, multiple choice)

A data scientist needs to deploy a single ML model that will serve real-time predictions with low latency (under 10 ms) for a high-traffic web application. The model fits in memory and requires GPU acceleration. Which SageMaker inference option is MOST suitable?

Question 2 (medium, multiple choice)

A team has 200 small ML models that need to be served via HTTPS endpoints. Each model is used infrequently, and the team wants to minimize hosting costs. Which SageMaker deployment approach is MOST cost-effective?

Question 3 (hard, multiple choice)

An ML team uses SageMaker Pipelines to automate model retraining. They want to skip redundant training steps when input data has not changed. Which feature should they enable?

Question 4 (medium, multiple choice)

A company needs to deploy a new model version to a SageMaker real-time endpoint. They want to route 5% of traffic to the new version initially to monitor for errors before full rollout. Which deployment strategy should they use?

Question 5 (easy, multiple choice)

An ML engineer needs to compile a trained TensorFlow model to run efficiently on a target edge device with an ARM CPU. Which AWS service should they use?

Question 6 (medium, multiple choice)

A data science team uses SageMaker Pipelines for automated training. They need to conditionally register a model only if evaluation metrics exceed a threshold. Which pipeline step type should they use after the evaluation step?

Question 7 (hard, multiple choice)

A company wants to serve a large ensemble of models using NVIDIA Triton Inference Server on SageMaker for high-throughput GPU inference. Which SageMaker inference option supports this?

Question 8 (medium, multiple choice)

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

Question 9 (medium, multiple choice)

A company uses SageMaker Model Registry to manage model versions. They want to automate the approval of models that pass automated evaluation, but require manual approval for others. Which Model Registry feature supports this workflow?

Question 10 (easy, multiple choice)

A company needs to deploy a model that processes large payloads (up to 1 GB) asynchronously. The results should be written to S3, and the team needs SNS notifications upon completion. Which SageMaker inference option is MOST suitable?

Question 11 (hard, multiple choice)

An ML team uses AWS Step Functions to orchestrate a retraining pipeline triggered by EventBridge when new training data arrives. The pipeline includes a SageMaker training job and a model evaluation. If evaluation fails, the team wants to send an alert. How should they implement this?
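
For orientation, the "new training data arrives" trigger in this scenario can be sketched as an EventBridge event pattern for S3 "Object Created" events. This is a minimal sketch, not the exam's reference answer: the bucket name and key prefix are hypothetical, and the `matches` helper is only a simplified local stand-in for EventBridge's own pattern matching.

```python
import json

# Hypothetical bucket name and key prefix, chosen only for illustration.
BUCKET = "example-training-data"
PREFIX = "incoming/"

# Event pattern in the shape EventBridge uses for S3 "Object Created" events.
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": [BUCKET]},
        "object": {"key": [{"prefix": PREFIX}]},
    },
}


def matches(event):
    """Simplified local check that mirrors EVENT_PATTERN (not the real matcher)."""
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.s3"
        and event.get("detail-type") == "Object Created"
        and detail.get("bucket", {}).get("name") == BUCKET
        and detail.get("object", {}).get("key", "").startswith(PREFIX)
    )


sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {"bucket": {"name": BUCKET}, "object": {"key": "incoming/batch-001.csv"}},
}

print(json.dumps(EVENT_PATTERN, indent=2))
print(matches(sample_event))
```

Note that S3 only sends these events to EventBridge when EventBridge notifications are enabled on the bucket.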

Question 12 (medium, multiple choice)

A startup wants to deploy a containerized ML application that includes both a model inference server and a preprocessing component in the same endpoint. Which SageMaker endpoint type supports running multiple containers?

Question 13 (medium, multi-select)

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

Question 14 (medium, multi-select)

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

Question 15 (hard, multi-select)

A team uses SageMaker Pipelines to train and evaluate a model. They want to run the training step only if the data quality check passes and to skip it otherwise. Which TWO pipeline step types are required? (Select TWO.)

Question 16 (easy, multiple choice)

A machine learning engineer needs to deploy a model that requires less than 100 ms inference latency for real-time predictions. The model is a small PyTorch model that fits in a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?

Question 17 (medium, multiple choice)

A team built a SageMaker Pipeline that includes a training step and a model evaluation step. They want to automatically register a model in SageMaker Model Registry only if the evaluation metric (accuracy) exceeds 0.9. Which pipeline step should be used to implement this conditional logic?

Question 18 (medium, multiple choice)

A company has 200 small models (each ~100 MB) that serve different customers. They want to minimize costs while keeping low latency for each customer. Which SageMaker deployment approach is MOST suitable?

Question 19 (hard, multiple choice)

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

Question 20 (easy, multiple choice)

A company wants to update an existing SageMaker real-time endpoint to serve a new model version. They need to route a small percentage of traffic to the new version initially and monitor for errors before switching fully. Which deployment pattern supports this?

Question 21 (medium, multiple choice)

A team uses MLflow on SageMaker for experiment tracking. They want to automatically deploy the best-performing model from an MLflow run to a SageMaker endpoint for real-time inference. What is the MOST efficient way to achieve this?

Question 22 (hard, multiple choice)

A company deploys a large NLP model on a SageMaker real-time endpoint using an ml.p3.2xlarge instance. To reduce inference cost without sacrificing throughput, they want to compile the model for their target hardware. Which service should they use?

Question 23 (medium, multiple choice)

A company uses SageMaker Model Registry to manage model versions. They have a cross-account deployment requirement: models approved in the development account must be deployed to a production account. Which approach is the MOST secure and recommended?

Question 24 (medium, multiple choice)

A team wants to use AWS Step Functions to orchestrate a retraining workflow that is triggered when new data arrives in an S3 bucket. They also need to monitor model drift. Which event-driven approach should they use?

Question 25 (easy, multiple choice)

A company wants to deploy a PyTorch model that uses dynamic batching and model ensemble. They need to serve multiple models with different frameworks (PyTorch, TensorFlow) within the same endpoint. Which SageMaker feature should they use?

Question 26 (hard, multiple choice)

A team uses SageMaker real-time endpoints for inference. They want to deploy a new model version and compare its performance with the current version under live traffic without affecting user experience. Which method should they use?

Question 27 (medium, multiple choice)

A company wants to deploy a machine learning model using infrastructure as code to ensure reproducibility. They need to define the SageMaker Studio domain, user profiles, and the endpoint configuration. Which tool should they use?

Question 28 (medium, multi-select)

A data scientist needs to deploy an anomaly detection model that processes large payloads (up to 10 MB per request) and expects inference times of up to 10 minutes. The team wants to minimize cost and only pay per inference. Which TWO SageMaker inference options meet these requirements? (Choose TWO.)

Question 29 (hard, multi-select)

A team is optimizing a deep learning model for deployment on SageMaker using SageMaker Neo. Which THREE of the following are valid optimization techniques that Neo can apply? (Choose THREE.)

Question 30 (medium, multi-select)

A company uses SageMaker Pipelines to automate their ML workflow. They need to add model versioning and approval workflow. Which THREE steps should they include in their pipeline to achieve this? (Choose THREE.)

Question 31 (medium, multiple choice)

A machine learning team has a model that needs to serve predictions with very low latency (under 10 ms) for a real-time web application. The model is a small ensemble of three neural networks that fits in memory. Which SageMaker inference option is MOST appropriate?

Question 32 (medium, multiple choice)

A company wants to deploy a single model that processes images from a production line. The images are uploaded to an S3 bucket every few minutes, and the inference results must be stored back to S3. The team wants to avoid paying for idle compute and prefers a fully managed, on-demand solution. Which SageMaker inference option should they use?

Question 33 (hard, multiple choice)

A data science team wants to host 50 different models for a recommendation engine. Each model is small (under 100 MB) and traffic patterns are unpredictable. They need to minimize cost and operational overhead. Which approach should they take?

Question 34 (easy, multiple choice)

A machine learning engineer needs to deploy a new version of a model gradually, initially sending 5% of traffic to the new version and 95% to the current version, while monitoring for errors. Which deployment pattern should they use?
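
The 5%/95% split in this scenario can be checked with a little arithmetic. On a SageMaker endpoint with several production variants, each variant receives a share of traffic equal to its weight divided by the sum of all variant weights. The sketch below only reproduces that arithmetic; the variant names are hypothetical and no AWS call is made.

```python
def traffic_shares(weights):
    """Return each variant's fraction of traffic: weight / sum of all weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical variant names; the weights are relative, not percentages.
shares = traffic_shares({"current-model": 95, "new-model": 5})
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Because the weights are relative, `{"current-model": 19, "new-model": 1}` yields the same split.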

Question 35 (medium, multiple choice)

A company is using SageMaker Pipelines to orchestrate their ML workflow. They have a Condition step that checks if a model's accuracy exceeds 0.9. If true, they want to register the model in the model registry; otherwise, they want to run a retraining step. Which step type should they use for the decision?

Question 36 (medium, multiple choice)

An organization wants to ensure that only approved model versions can be deployed to production. They use the SageMaker Model Registry to track model versions. How can they enforce that only approved models are deployed?

Question 37 (hard, multiple choice)

A team has a large deep learning model that needs to be deployed for real-time inference with GPU acceleration. They want to use the Triton Inference Server on SageMaker to maximize throughput. Which instance type and configuration should they choose?

Question 38 (easy, multiple choice)

A company uses SageMaker Pipelines to automate their ML workflow. They notice that the pipeline reruns all steps even when the input data has not changed. Which feature should they enable to avoid unnecessary recomputation?

Question 39 (medium, multiple choice)

An ML engineer wants to use MLflow on SageMaker to track experiments and log metrics. They have set up MLflow on an EC2 instance. How can they best integrate MLflow tracking with SageMaker training jobs?

Question 40 (hard, multiple choice)

A company needs to update a model in production without any downtime. They currently have a single real-time endpoint serving traffic. Which approach allows them to deploy a new model version and switch traffic gradually while being able to roll back quickly?

Question 41 (easy, multiple choice)

Which SageMaker feature compiles a trained model into an optimized binary for a specific hardware target (e.g., Intel, ARM, NVIDIA, or edge devices) to improve inference performance?

Question 42 (medium, multiple choice)

A team has a SageMaker Pipeline that trains a model and registers it in the Model Registry. They want to automate the deployment of the approved model to a staging environment. Which event-driven approach should they use?

Question 43 (hard, multi-select)

An ML engineer is designing a SageMaker Pipeline for a computer vision model. The pipeline includes steps for data processing, training, evaluation, and registration. The engineer wants to enable caching to avoid reprocessing when step inputs have not changed. For which steps is caching supported? (Select TWO.)

Question 44 (medium, multi-select)

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but wants to minimize costs by using a smaller attached GPU. Which options can they use? (Select TWO.)

Question 45 (medium, multi-select)

A team is migrating their ML infrastructure to AWS and wants to use infrastructure as code to manage SageMaker Studio domains, user profiles, and associated resources. Which services can they use for this purpose? (Select THREE.)

Question 46 (medium, multiple choice)

A data science team needs to deploy a PyTorch model that performs real-time inference with sub-100ms latency. The model requires GPU acceleration, but the team wants to minimize cost by sharing GPU instances across multiple models. Which SageMaker hosting option should they choose?

Question 47 (easy, multiple choice)

A machine learning engineer wants to automatically trigger a retraining pipeline whenever new training data arrives in an S3 bucket. The pipeline uses SageMaker Pipelines. Which AWS service should be used to detect the S3 event and start the pipeline?

Question 48 (hard, multiple choice)

A financial services company needs to deploy a machine learning model for real-time fraud detection. The model must be highly available across multiple Availability Zones and must support automatic scaling based on request volume. The company also needs to perform canary deployments to test new model versions with a small percentage of traffic before full rollout. Which SageMaker feature should they use?

Question 49 (medium, multiple choice)

A team wants to use MLflow on SageMaker to track experiments and manage model lifecycle. They need to register models in the SageMaker Model Registry after training. Which approach allows them to use MLflow for experiment tracking and then register the best model to SageMaker Model Registry?

Question 50 (easy, multiple choice)

A company wants to deploy a trained XGBoost model for batch inference on a large dataset stored in S3. The inference job should be cost-effective and does not require real-time responses. Which SageMaker inference option should they use?

Question 51 (medium, multiple choice)

A machine learning engineer needs to deploy a TensorFlow model that requires a custom inference environment with specific system libraries. The model will be used in a real-time application with variable traffic. They want to minimize cold start latency. Which SageMaker hosting option should they choose?

Question 52 (hard, multiple choice)

A company uses SageMaker Pipelines to orchestrate their ML workflow. They notice that if a pipeline step fails due to a transient error (e.g., a brief network issue), the entire pipeline fails and they must manually rerun from the beginning. They want to automatically retry failed steps a few times before failing. What should they do?
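
The behavior asked for here, "retry a few times before failing", is a generic retry-with-backoff pattern. The sketch below illustrates it in plain Python; it is not the SageMaker Pipelines API, and `TransientError`, `run_with_retries`, and the default attempt and delay values are all hypothetical names and numbers chosen for illustration.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a brief network issue."""


def run_with_retries(step, max_attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call step(); on TransientError, wait and retry up to max_attempts in total."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure
            sleep(delay)
            delay *= backoff  # exponential backoff between attempts


# Simulated step that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_step():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated network blip")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky_step, max_attempts=3, sleep=waits.append)
print(result, attempts["count"], waits)
```

Only errors classified as transient are retried; any other exception propagates on the first attempt.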

Question 53 (easy, multiple choice)

A data scientist wants to compare the performance of two model versions (V1 and V2) in production by splitting traffic between them. They want to gradually increase the percentage of traffic to the new version while monitoring metrics. Which SageMaker feature enables this?

Question 54 (medium, multiple choice)

A company is deploying a large NLP model on SageMaker for real-time inference. They want to reduce inference latency and cost by optimizing the model for the target hardware. The model is trained in PyTorch. Which SageMaker feature should they use to compile the model for best performance on the chosen instance?

Question 55 (hard, multiple choice)

A team needs to deploy a PyTorch model that uses custom CUDA kernels. They want to use NVIDIA Triton Inference Server on SageMaker for high-performance serving. Which SageMaker configuration is required to use Triton?

Question 56 (medium, multiple choice)

A financial services company needs to enforce that only approved model versions are deployed to production. They use SageMaker Model Registry to track versions, with an approval workflow. Which action must they take in the model registry to ensure only approved models can be deployed?

Question 57 (medium, multiple choice)

A company wants to deploy 50 small models (each ~100 MB) for real-time inference. They need to minimize hosting costs while maintaining low latency. Which SageMaker hosting option is most cost-effective?

Question 58 (medium, multi-select)

A data science team uses SageMaker Pipelines to automate their ML workflow. They want to reduce costs by reusing outputs from previous pipeline runs when the input data and code have not changed. Which TWO actions should they take? (Choose two.)

Question 59 (hard, multi-select)

An ML engineer needs to deploy a model that requires GPU acceleration but wants to reduce inference cost by optimizing the model. They are considering SageMaker Neo compilation and Amazon Elastic Inference. Which TWO statements are correct about these services? (Choose two.)

Question 60 (medium, multi-select)

A company is using AWS Step Functions to orchestrate their ML retraining pipeline. They want to trigger retraining when new data arrives, but only if the model's performance has degraded below a threshold. Which THREE AWS services should they use together to achieve this? (Choose three.)

Question 61 (easy, multiple choice)

A data science team has trained a PyTorch model for real-time inference and needs to deploy it on AWS with GPU acceleration while minimizing cold-start latency. Which SageMaker inference option should they choose?

Question 62 (easy, multiple choice)

A company has 50 small PyTorch models that are used infrequently for inference. They want to minimize costs while maintaining the ability to serve all models from a single endpoint. Which SageMaker feature should they use?

Question 63 (medium, multiple choice)

A machine learning team uses SageMaker Pipelines to automate retraining. They want to avoid re-running data processing steps if the data has not changed since the last successful pipeline run. Which built-in feature should they enable?

Question 64 (medium, multiple choice)

A team needs to deploy a new model version to production while minimizing risk. They want to route 5% of live traffic to the new model and 95% to the current model, and then gradually increase the new model's traffic. Which SageMaker deployment pattern should they use?

Question 65 (medium, multiple choice)

A company wants to run inference on a large dataset stored in S3 using a pre-trained model. The inference can tolerate latency from minutes to hours, and they want a fully managed solution that autoscales to handle large volumes. Which SageMaker inference option is most suitable?

Question 66 (easy, multiple choice)

A machine learning engineer needs to optimize a trained TensorFlow model for deployment on edge devices with limited compute. Which SageMaker feature should they use to compile the model for target hardware?

Question 67 (hard, multiple choice)

A team uses SageMaker Pipelines with a Condition step to decide whether to register a model based on evaluation metrics. They want to also store the evaluation results for lineage tracking. Which step should they use to record the metrics?

Question 68 (medium, multiple choice)

A company wants to deploy a PyTorch model on SageMaker using the NVIDIA Triton Inference Server for GPU acceleration. They have an existing Triton configuration. Which approach should they take?

Question 69 (medium, multiple choice)

A team wants to implement an event-driven retraining pipeline that triggers retraining when new data arrives in an S3 bucket. The pipeline should include preprocessing, training, evaluation, and conditional registration. Which AWS services should they combine?

Question 70 (hard, multiple choice)

A team needs to deploy a model that has compliance requirements to log all inference requests and responses for auditing. The model will be served using a real-time endpoint. How can they achieve this without custom code?

Question 71 (easy, multiple choice)

A company wants to version and track ML models, with an approval workflow for promoting models from staging to production. Which SageMaker feature should they use?

Question 72 (medium, multiple choice)

A startup wants to deploy a model that has variable traffic patterns, with some periods of no traffic and occasional spikes. They want to pay only for what they use and do not want to manage instances. Which SageMaker inference option should they choose?

Question 73 (medium, multi-select)

A company wants to deploy a new model using a canary deployment strategy on SageMaker. Which TWO actions should they take? (Select TWO.)

Question 74 (hard, multi-select)

A machine learning engineer is designing a SageMaker Pipeline that includes a training step, a processing step for evaluation, and a condition step to decide whether to register the model. The pipeline should support caching to avoid redundant runs when inputs haven't changed. Which THREE steps must have caching enabled? (Select THREE.)

Question 75 (medium, multi-select)

A team wants to deploy a single SageMaker real-time endpoint that serves both a PyTorch model for NLP and a TensorFlow model for image classification. Each model requires a different inference container. Which TWO features can they use together to achieve this? (Select TWO.)

Question 76 (medium, multiple choice)

A company wants to serve 200 different PyTorch models. Each model is small (under 1 GB) and only a fraction are used at any time. To minimize cost and management overhead, which SageMaker inference option should be used?

Question 77 (medium, multiple choice)

A machine learning team needs to deploy a new model version for A/B testing, gradually shifting traffic from the old version to the new version over 24 hours. Which deployment strategy should they use?

Question 78 (hard, multiple choice)

A team uses SageMaker Pipelines to retrain a model nightly. They want to skip the training step if the new data is unchanged (same checksum as previous run) to save cost and time. Which pipeline configuration achieves this?
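
The "same checksum as previous run" check in this scenario can be sketched in plain Python. This is an illustration of the comparison only, not a SageMaker Pipelines configuration; `sha256_of` and `should_train` are hypothetical helper names, and how the previous checksum is stored between runs is left open.

```python
import hashlib


def sha256_of(chunks):
    """Hash an iterable of byte chunks, for example a file read in blocks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def should_train(new_checksum, previous_checksum):
    """Train on the first run (no previous checksum) or when the data changed."""
    return previous_checksum is None or new_checksum != previous_checksum


previous = sha256_of([b"row1\n", b"row2\n"])
unchanged = sha256_of([b"row1\nrow2\n"])  # same bytes, different chunking
changed = sha256_of([b"row1\n", b"row3\n"])

print(should_train(unchanged, previous))
print(should_train(changed, previous))
```

The hash depends only on the byte stream, not on how it is chunked, so reading the same data in different block sizes still counts as unchanged.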

Question 79 (medium, multiple choice)

A company needs to serve real-time predictions from a large ensemble of three deep learning models, each requiring different inference environments (PyTorch, TensorFlow, MXNet). Which SageMaker endpoint type supports running multiple inference containers together?

Question 80 (medium, multiple choice)

A data scientist wants to train a model on SageMaker using a custom PyTorch script, then register the best model in the SageMaker Model Registry. The training job is part of a SageMaker Pipeline. Which pipeline step should be used to register the model?

Question 81 (easy, multiple choice)

A team wants to deploy a model that performs inference on large video files (up to 2 GB each) uploaded to an S3 bucket. The inference can tolerate a few minutes of latency. Which SageMaker inference option is most cost-effective?

Question 82 (easy, multiple choice)

A company uses SageMaker Neo to compile a trained model for deployment on edge devices. What is the primary benefit of using Neo?

Question 83 (medium, multiple choice)

A team wants to orchestrate a multi-step ML workflow that includes data preprocessing, hyperparameter tuning, model training, evaluation, and conditional deployment to staging or production based on evaluation metrics. The workflow should run on a schedule and track lineage. Which service should they use?

Question 84 (hard, multiple choice)

A team deploys a model on a SageMaker real-time endpoint using an ml.m5.xlarge instance. The model has high latency due to a large neural network. The team wants to reduce latency without changing the model code. Which option should they use?

Question 85 (medium, multiple choice)

A company uses SageMaker Model Registry to manage model versions. They want to enforce that only models with an 'Approved' status can be deployed to production endpoints. How can they enforce this?

Question 86 (medium, multiple choice)

A team uses MLflow on SageMaker for experiment tracking. They want to automate the retraining of a model when new training data arrives in an S3 bucket. Which combination of services should they use?

Question 87 (easy, multiple choice)

A company wants to deploy a model using a serverless inference endpoint that can automatically scale to zero when not in use and has a configurable maximum concurrency. Which SageMaker inference option meets these requirements?

Question 88 (hard, multi-select)

A machine learning engineer is deploying a TensorFlow model for real-time inference. The model has high latency on CPU. Which TWO actions can reduce inference latency? (Choose two.)

Question 89 (medium, multi-select)

A team wants to deploy a new model using a canary deployment strategy on SageMaker. Which TWO configurations are necessary? (Choose two.)

Question 90 (medium, multi-select)

A company uses SageMaker Pipelines to automate their ML workflow. They want to ensure that pipeline steps are not re-executed if the inputs and parameters have not changed since the last successful run. Which THREE features can help achieve this? (Choose three.)

Question 91 (easy, multiple choice)

A data science team needs to deploy a trained PyTorch model for real-time inference with sub-100ms latency. The model fits on a single GPU. Which SageMaker inference option is MOST cost-effective while meeting the latency requirement?

Question 92 (medium, multiple choice)

A company has 200 small PyTorch models that are each used infrequently but need to be available for real-time inference. To minimize costs, they want to host all models on a single endpoint. Which SageMaker feature should they use?

Question 93 (hard, multiple choice)

A machine learning engineer deploys a new model version to a SageMaker endpoint with production variants. They want to gradually shift traffic from the old model to the new model, monitoring for errors, and automatically roll back if the error rate exceeds 5%. Which deployment pattern should they use?
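
The rollback rule in this scenario ("roll back if the error rate exceeds 5%") can be written as a small decision function. This is a sketch under stated assumptions, not a SageMaker feature: the function name, the returned action strings, and the `min_requests` guard against judging on too little traffic are all hypothetical.

```python
def next_action(errors, requests, max_error_rate=0.05, min_requests=100):
    """Decide what to do with the new variant after a monitoring window."""
    if requests < min_requests:
        return "wait"  # too little traffic to judge the error rate
    if errors / requests > max_error_rate:
        return "rollback"  # error rate strictly exceeds the threshold
    return "continue"  # safe to shift more traffic


print(next_action(errors=6, requests=100))
print(next_action(errors=5, requests=100))
print(next_action(errors=0, requests=10))
```

The comparison is strict, so an error rate of exactly 5% does not trigger a rollback, matching the wording "exceeds 5%".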

Question 94 (medium, multiple choice)

A team uses SageMaker Pipelines to automate retraining. They want to skip the training step if the data has not changed since the last run. Which feature should they enable?

Question 95 (hard, multiple choice)

A company needs to deploy a large language model (LLM) on SageMaker with the Triton Inference Server to maximize GPU utilization and reduce latency. They have an NVIDIA A100 GPU. Which SageMaker inference option supports Triton?

Question 96 (easy, multiple choice)

A data scientist wants to version and manage trained models, require approval before deployment, and enable cross-account deployment. Which SageMaker feature provides these capabilities?

Question 97 (medium, multiple choice)

A company runs a batch inference job on 10 TB of image data stored in S3. Each image needs to be processed by a GPU-accelerated model. The job is not time-sensitive and cost is the primary concern. Which SageMaker option is MOST appropriate?

Question 98 (hard, multiple choice)

A team uses SageMaker Pipelines to train and register a model. They want to conditionally run a hyperparameter tuning step only if the data quality check passes. Which pipeline step type should they use to branch the execution?

Question 99 (medium, multiple choice)

An ML engineer needs to orchestrate a multi-step workflow that includes data preprocessing on Spark, model training on SageMaker, and deployment to a production endpoint. They require tight integration with other AWS services and the ability to add custom logic. Which AWS service should they use alongside SageMaker?

Question 100 (easy, multi select)

A company wants to trigger a model retraining pipeline whenever new training data arrives in an S3 bucket. They also need to send a notification to a Slack channel when the retraining completes. Which TWO AWS services should they use to implement this event-driven workflow? (Select TWO.)

Question 101 (medium, multi select)

A machine learning team needs to deploy a PyTorch model that has been compiled with SageMaker Neo to improve inference performance on edge devices. Which TWO statements about SageMaker Neo are correct? (Select TWO.)

Question 102 (medium, multi select)

A company is using SageMaker to serve a model for real-time predictions. They want to test a new model version by routing a small percentage of live traffic to it while the rest goes to the current model. They also need to compare performance metrics. Which TWO actions should they take? (Select TWO.)

Question 103 (hard, multi select)

An ML engineer is designing a SageMaker Pipeline for model training and registration. They need to ensure that the pipeline can be re-run with different datasets without manual intervention, and that the steps are only re-executed if inputs have changed. Which THREE features should they configure? (Select THREE.)

Question 104 (medium, multi select)

A company wants to deploy a containerized inference application that includes a custom pre-processing script and a TensorFlow model on SageMaker. They need the ability to independently scale the pre-processing and model serving components. Which TWO SageMaker features support this? (Select TWO.)

Question 105 (hard, multi select)

A team is deploying a model using SageMaker real-time endpoint with an ml.m5.large instance. They notice high latency under peak load. They want to reduce latency without increasing instance size. Which THREE actions could help? (Select THREE.)

Question 106 (medium, multi select)

A data science team is deploying a PyTorch model for real-time inference with sub-second latency requirements. They need to minimize cost while handling variable traffic. Which TWO approaches should they consider? (Choose TWO.)

Question 107 (hard, multi select)

An MLOps team is designing a SageMaker Pipeline to automate model retraining. The pipeline must: (1) run training only if new training data is available, (2) register the model in SageMaker Model Registry only if evaluation metrics exceed a threshold, (3) deploy the approved model to a staging endpoint automatically. Which THREE steps should they include? (Choose THREE.)

Question 108 (medium, multi select)

A company wants to test a new ML model in production with minimal risk before shifting full traffic. They have an existing real-time endpoint serving model version A. They need to route 5% of live traffic to model version B and monitor performance for 24 hours. Which TWO steps should they take? (Choose TWO.)
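The 5% / 95% split in this scenario comes down to simple arithmetic on variant weights: each variant's share of traffic is its weight divided by the sum of all weights. The sketch below shows that calculation on plain dictionaries. The keys mirror the `VariantName` and `InitialVariantWeight` fields of a SageMaker production variant, but the variant names are hypothetical and nothing here calls AWS.

```python
from typing import Dict, List


def traffic_shares(variants: List[Dict[str, object]]) -> Dict[str, float]:
    """Return each variant's fraction of traffic: weight / sum of weights."""
    total = sum(float(v["InitialVariantWeight"]) for v in variants)
    if total <= 0:
        raise ValueError("variant weights must sum to a positive number")
    return {
        str(v["VariantName"]): float(v["InitialVariantWeight"]) / total
        for v in variants
    }


# Hypothetical variants: version A keeps most traffic, version B gets a small slice.
variants = [
    {"VariantName": "model-a", "InitialVariantWeight": 95},
    {"VariantName": "model-b", "InitialVariantWeight": 5},
]
shares = traffic_shares(variants)
print(shares)
```

Because only the ratio matters, weights of 19 and 1 would produce the same split as 95 and 5.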

Question 7 (hard, multiple choice)

A company wants to serve a large ensemble of models using NVIDIA Triton Inference Server on SageMaker for high throughput GPU inference. Which SageMaker inference option supports this?

Question 8 (medium, multiple choice)

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

Question 9 (medium, multiple choice)

A company uses SageMaker Model Registry to manage model versions. They want to automate the approval of models that pass automated evaluation, but require manual approval for others. Which Model Registry feature supports this workflow?

Question 10 (easy, multiple choice)

A company needs to deploy a model that processes large payloads (up to 1 GB) asynchronously. The results should be written to S3, and the team needs SNS notifications upon completion. Which SageMaker inference option is MOST suitable?

Question 11 (hard, multiple choice)

An ML team uses AWS Step Functions to orchestrate a retraining pipeline triggered by EventBridge when new training data arrives. The pipeline includes a SageMaker training job and a model evaluation. If evaluation fails, the team wants to send an alert. How should they implement this?

Question 12 (medium, multiple choice)

A startup wants to deploy a containerized ML application that includes both a model inference server and a preprocessing component in the same endpoint. Which SageMaker endpoint type supports running multiple containers?

Question 13 (medium, multi select)

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

Question 14 (medium, multi select)

An organization wants to automate ML retraining using an event-driven architecture. Which THREE services should they combine? (Select THREE.)

Question 15 (hard, multi select)

A team uses SageMaker Pipelines to train and evaluate a model. They want to run the training step only if the data quality check passes, otherwise skip. Which TWO pipeline step types are required? (Select TWO.)

Question 16 (easy, multiple choice)

A machine learning engineer needs to deploy a model that requires less than 100 ms inference latency for real-time predictions. The model is a small PyTorch model that fits on a single GPU. Which SageMaker inference option is MOST cost-effective for this scenario?

Question 17 (medium, multiple choice)

A team built a SageMaker Pipeline that includes a training step and a model evaluation step. They want to automatically register a model in SageMaker Model Registry only if the evaluation metric (accuracy) exceeds 0.9. Which pipeline step should be used to implement this conditional logic?

Question 18 (medium, multiple choice)

A company has 200 small models (each ~100 MB) that serve different customers. They want to minimize costs while keeping low latency for each customer. Which SageMaker deployment approach is MOST suitable?

Question 19 (hard, multiple choice)

A data science team uses SageMaker Pipelines to orchestrate their ML workflow. They noticed that even when source data hasn't changed, the pipeline re-runs all steps, wasting compute time. What should they enable to avoid redundant runs?

Question 20 (easy, multiple choice)

A company wants to update an existing SageMaker real-time endpoint to serve a new model version. They need to route a small percentage of traffic to the new version initially and monitor for errors before switching fully. Which deployment pattern supports this?

Question 21 (medium, multiple choice)

A team uses MLflow on SageMaker for experiment tracking. They want to automatically deploy the best-performing model from an MLflow run to a SageMaker endpoint for real-time inference. What is the MOST efficient way to achieve this?

Question 22 (hard, multiple choice)

A company deploys a large NLP model on a SageMaker real-time endpoint using an ml.p3.2xlarge instance. To reduce inference cost without sacrificing throughput, they want to compile the model for their target hardware. Which service should they use?

Question 23 (medium, multiple choice)

A company uses SageMaker Model Registry to manage model versions. They have a cross-account deployment requirement: models approved in the development account must be deployed to a production account. Which approach is the MOST secure and recommended?

Question 24 (medium, multiple choice)

A team wants to use AWS Step Functions to orchestrate a retraining workflow that is triggered when new data arrives in an S3 bucket. They also need to monitor model drift. Which event-driven approach should they use?

Question 25 (easy, multiple choice)

A company wants to deploy a PyTorch model that uses dynamic batching and model ensemble. They need to serve multiple models with different frameworks (PyTorch, TensorFlow) within the same endpoint. Which SageMaker feature should they use?

Question 26 (hard, multiple choice)

A team uses SageMaker real-time endpoints for inference. They want to deploy a new model version and compare its performance with the current version under live traffic without affecting user experience. Which method should they use?

Question 27 (medium, multiple choice)

A company wants to deploy a machine learning model using infrastructure as code to ensure reproducibility. They need to define the SageMaker Studio domain, user profiles, and the endpoint configuration. Which tool should they use?

Question 28 (medium, multi select)

A data scientist needs to deploy an anomaly detection model that processes large payloads (up to 10 MB per request) and expects inference times of up to 10 minutes. The team wants to minimize cost and only pay per inference. Which TWO SageMaker inference options meet these requirements? (Choose TWO.)

Question 29 (hard, multi select)

A team is optimizing a deep learning model for deployment on SageMaker using SageMaker Neo. Which THREE of the following are valid optimization techniques that Neo can apply? (Choose THREE.)

Question 30 (medium, multi select)

A company uses SageMaker Pipelines to automate their ML workflow. They need to add model versioning and approval workflow. Which THREE steps should they include in their pipeline to achieve this? (Choose THREE.)

Question 31 (medium, multiple choice)

A machine learning team has a model that needs to serve predictions with very low latency (under 10 ms) for a real-time web application. The model is a small ensemble of three neural networks that fits in memory. Which SageMaker inference option is MOST appropriate?

Question 32 (medium, multiple choice)

A company wants to deploy a single model that processes images from a production line. The images are uploaded to an S3 bucket every few minutes, and the inference results must be stored back to S3. The team wants to avoid paying for idle compute and prefers a fully managed, on-demand solution. Which SageMaker inference option should they use?

Question 33 (hard, multiple choice)

A data science team wants to host 50 different models for a recommendation engine. Each model is small (under 100 MB) and traffic patterns are unpredictable. They need to minimize cost and operational overhead. Which approach should they take?

Question 34 (easy, multiple choice)

A machine learning engineer needs to deploy a new version of a model gradually, initially sending 5% of traffic to the new version and 95% to the current version, while monitoring for errors. Which deployment pattern should they use?

Question 35 (medium, multiple choice)

A company is using SageMaker Pipelines to orchestrate their ML workflow. They have a Condition step that checks if a model's accuracy exceeds 0.9. If true, they want to register the model in the model registry; otherwise, they want to run a retraining step. Which step type should they use for the decision?

Question 36 (medium, multiple choice)

An organization wants to ensure that only approved model versions can be deployed to production. They use the SageMaker Model Registry to track model versions. How can they enforce that only approved models are deployed?

Question 37 (hard, multiple choice)

A team has a large deep learning model that needs to be deployed for real-time inference with GPU acceleration. They want to use the Triton Inference Server on SageMaker to maximize throughput. Which instance type and configuration should they choose?

Question 38 (easy, multiple choice)

A company uses SageMaker Pipelines to automate their ML workflow. They notice that the pipeline reruns all steps even when the input data has not changed. Which feature should they enable to avoid unnecessary recomputation?

Question 39 (medium, multiple choice)

An ML engineer wants to use MLflow on SageMaker to track experiments and log metrics. They have set up MLflow on an EC2 instance. How can they best integrate MLflow tracking with SageMaker training jobs?

Question 40 (hard, multiple choice)

A company needs to update a model in production without any downtime. They currently have a single real-time endpoint serving traffic. Which approach allows them to deploy a new model version and switch traffic gradually while being able to roll back quickly?

Question 41 (easy, multiple choice)

Which SageMaker feature compiles a trained model into an optimized binary for a specific hardware target (e.g., Intel, ARM, NVIDIA, or edge devices) to improve inference performance?

Question 42 (medium, multiple choice)

A team has a SageMaker Pipeline that trains a model and registers it in the Model Registry. They want to automate the deployment of the approved model to a staging environment. Which event-driven approach should they use?

Question 43 (hard, multi select)

An ML engineer is designing a SageMaker Pipeline for a computer vision model. The pipeline includes steps for data processing, training, evaluation, and registration. The engineer wants to enable caching to avoid reprocessing when step inputs have not changed. For which steps is caching supported? (Select TWO.)

Question 44 (medium, multi select)

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but wants to minimize costs by using a smaller attached GPU. Which options can they use? (Select TWO.)

Question 45 (medium, multi select)

A team is migrating their ML infrastructure to AWS and wants to use infrastructure as code to manage SageMaker Studio domains, user profiles, and associated resources. Which services can they use for this purpose? (Select THREE.)

Question 46 (medium, multiple choice)

A data science team needs to deploy a PyTorch model that performs real-time inference with sub-100ms latency. The model requires GPU acceleration, but the team wants to minimize cost by sharing GPU instances across multiple models. Which SageMaker hosting option should they choose?

Question 47 (easy, multiple choice)

A machine learning engineer wants to automatically trigger a retraining pipeline whenever new training data arrives in an S3 bucket. The pipeline uses SageMaker Pipelines. Which AWS service should be used to detect the S3 event and start the pipeline?

Question 48 (hard, multiple choice)

A financial services company needs to deploy a machine learning model for real-time fraud detection. The model must be highly available across multiple Availability Zones and must support automatic scaling based on request volume. The company also needs to perform canary deployments to test new model versions with a small percentage of traffic before full rollout. Which SageMaker feature should they use?

Question 49 (medium, multiple choice)

A team wants to use MLflow on SageMaker to track experiments and manage model lifecycle. They need to register models in the SageMaker Model Registry after training. Which approach allows them to use MLflow for experiment tracking and then register the best model to SageMaker Model Registry?

Question 50 (easy, multiple choice)

A company wants to deploy a trained XGBoost model for batch inference on a large dataset stored in S3. The inference job should be cost-effective and does not require real-time responses. Which SageMaker inference option should they use?

Question 51 (medium, multiple choice)

A machine learning engineer needs to deploy a TensorFlow model that requires a custom inference environment with specific system libraries. The model will be used in a real-time application with variable traffic. They want to minimize cold start latency. Which SageMaker hosting option should they choose?

Question 52 (hard, multiple choice)

A company uses SageMaker Pipelines to orchestrate their ML workflow. They notice that if a pipeline step fails due to a transient error (e.g., a brief network issue), the entire pipeline fails and they must manually rerun from the beginning. They want to automatically retry failed steps a few times before failing. What should they do?

Question 53 (easy, multiple choice)

A data scientist wants to compare the performance of two model versions (V1 and V2) in production by splitting traffic between them. They want to gradually increase the percentage of traffic to the new version while monitoring metrics. Which SageMaker feature enables this?

Question 54 (medium, multiple choice)

A company is deploying a large NLP model on SageMaker for real-time inference. They want to reduce inference latency and cost by optimizing the model for the target hardware. The model is trained in PyTorch. Which SageMaker feature should they use to compile the model for best performance on the chosen instance?

Question 55 (hard, multiple choice)

A team needs to deploy a PyTorch model that uses custom CUDA kernels. They want to use NVIDIA Triton Inference Server on SageMaker for high-performance serving. Which SageMaker configuration is required to use Triton?

Question 56 (medium, multiple choice)

A financial services company needs to enforce that only approved model versions are deployed to production. They use SageMaker Model Registry to track versions, with an approval workflow. Which action must they take in the model registry to ensure only approved models can be deployed?

Question 57 (medium, multiple choice)

A company wants to deploy 50 small models (each ~100 MB) for real-time inference. They need to minimize hosting costs while maintaining low latency. Which SageMaker hosting option is most cost-effective?

Question 58 (medium, multi select)

A data science team uses SageMaker Pipelines to automate their ML workflow. They want to reduce costs by reusing outputs from previous pipeline runs when the input data and code have not changed. Which TWO actions should they take? (Choose two.)

Question 59 (hard, multi select)

An ML engineer needs to deploy a model that requires GPU acceleration but wants to reduce inference cost by optimizing the model. They are considering SageMaker Neo compilation and Amazon Elastic Inference. Which TWO statements are correct about these services? (Choose two.)

Question 60 (medium, multi select)

A company is using AWS Step Functions to orchestrate their ML retraining pipeline. They want to trigger retraining when new data arrives, but only if the model's performance has degraded below a threshold. Which THREE AWS services should they use together to achieve this? (Choose three.)

Question 61 (easy, multiple choice)

A data science team has trained a PyTorch model for real-time inference and needs to deploy it on AWS with GPU acceleration while minimizing cold-start latency. Which SageMaker inference option should they choose?

Question 62 (easy, multiple choice)

A company has 50 small PyTorch models that are used infrequently for inference. They want to minimize costs while maintaining the ability to serve all models from a single endpoint. Which SageMaker feature should they use?

Question 63 (medium, multiple choice)

A machine learning team uses SageMaker Pipelines to automate retraining. They want to avoid re-running data processing steps if the data has not changed since the last successful pipeline run. Which built-in feature should they enable?

Question 64 (medium, multiple choice)

A team needs to deploy a new model version to production while minimizing risk. They want to route 5% of live traffic to the new model and 95% to the current model, and then gradually increase the new model's traffic. Which SageMaker deployment pattern should they use?

Question 65 (medium, multiple choice)

A company wants to run inference on a large dataset stored in S3 using a pre-trained model. The inference can tolerate latency from minutes to hours, and they want a fully managed solution that autoscales to handle large volumes. Which SageMaker inference option is most suitable?

Question 66 (easy, multiple choice)

A machine learning engineer needs to optimize a trained TensorFlow model for deployment on edge devices with limited compute. Which SageMaker feature should they use to compile the model for target hardware?

Question 67 (hard, multiple choice)

A team uses SageMaker Pipelines with a Condition step to decide whether to register a model based on evaluation metrics. They want to also store the evaluation results for lineage tracking. Which step should they use to record the metrics?

Question 68 (medium, multiple choice)

A company wants to deploy a PyTorch model on SageMaker using the NVIDIA Triton Inference Server for GPU acceleration. They have an existing Triton configuration. Which approach should they take?

Question 69 (medium, multiple choice)

A team wants to implement an event-driven retraining pipeline that triggers retraining when new data arrives in an S3 bucket. The pipeline should include preprocessing, training, evaluation, and conditional registration. Which AWS services should they combine?

Question 70 (hard, multiple choice)

A team needs to deploy a model that has compliance requirements to log all inference requests and responses for auditing. The model will be served using a real-time endpoint. How can they achieve this without custom code?

Question 71 (easy, multiple choice)

A company wants to version and track ML models, with an approval workflow for promoting models from staging to production. Which SageMaker feature should they use?

Question 72 (medium, multiple choice)

A startup wants to deploy a model that has variable traffic patterns, with some periods of no traffic and occasional spikes. They want to pay only for what they use and do not want to manage instances. Which SageMaker inference option should they choose?

Question 73 (medium, multi select)

A company wants to deploy a new model using a canary deployment strategy on SageMaker. Which TWO actions should they take? (Select TWO.)

Question 74 (hard, multi select)

A machine learning engineer is designing a SageMaker Pipeline that includes a training step, a processing step for evaluation, and a condition step to decide whether to register the model. The pipeline should support caching to avoid redundant runs when inputs haven't changed. Which THREE steps must have caching enabled? (Select THREE.)

Question 75 (medium, multi select)

A team wants to deploy a single SageMaker real-time endpoint that serves both a PyTorch model for NLP and a TensorFlow model for image classification. Each model requires a different inference container. Which TWO features can they use together to achieve this? (Select TWO.)

Question 76 (medium, multiple choice)

A company wants to serve 200 different PyTorch models. Each model is small (under 1 GB) and only a fraction are used at any time. To minimize cost and management overhead, which SageMaker inference option should be used?

Question 77 (medium, multiple choice)

A machine learning team needs to deploy a new model version for A/B testing, gradually shifting traffic from the old version to the new version over 24 hours. Which deployment strategy should they use?

Question 78 (hard, multiple choice)

A team uses SageMaker Pipelines to retrain a model nightly. They want to skip the training step if the new data is unchanged (same checksum as previous run) to save cost and time. Which pipeline configuration achieves this?

Question 79 (medium, multiple choice)

A company needs to serve real-time predictions from a large ensemble of three deep learning models, each requiring different inference environments (PyTorch, TensorFlow, MXNet). Which SageMaker endpoint type supports running multiple inference containers together?

Question 80 (medium, multiple choice)

A data scientist wants to train a model on SageMaker using a custom PyTorch script, then register the best model in the SageMaker Model Registry. The training job is part of a SageMaker Pipeline. Which pipeline step should be used to register the model?

Question 81 (easy, multiple choice)

A team wants to deploy a model that performs inference on large video files (up to 2 GB each) uploaded to an S3 bucket. The inference can tolerate a few minutes of latency. Which SageMaker inference option is most cost-effective?

Question 82 (easy, multiple choice)

A company uses SageMaker Neo to compile a trained model for deployment on edge devices. What is the primary benefit of using Neo?

Question 83 (medium, multiple choice)

A team wants to orchestrate a multi-step ML workflow that includes data preprocessing, hyperparameter tuning, model training, evaluation, and conditional deployment to staging or production based on evaluation metrics. The workflow should run on a schedule and track lineage. Which service should they use?

Question 84 (hard, multiple choice)

A team deploys a model on a SageMaker real-time endpoint using an ml.m5.xlarge instance. The model has high latency due to a large neural network. The team wants to reduce latency without changing the model code. Which option should they use?

Question 85 (medium, multiple choice)

A company uses SageMaker Model Registry to manage model versions. They want to enforce that only models with an 'Approved' status can be deployed to production endpoints. How can they enforce this?

Question 86 (medium, multiple choice)

A team uses MLflow on SageMaker for experiment tracking. They want to automate the retraining of a model when new training data arrives in an S3 bucket. Which combination of services should they use?

Question 87 (easy, multiple choice)

A company wants to deploy a model using a serverless inference endpoint that can automatically scale to zero when not in use and has a configurable maximum concurrency. Which SageMaker inference option meets these requirements?

Question 88 (hard, multi select)

A machine learning engineer is deploying a TensorFlow model for real-time inference. The model has high latency on CPU. Which TWO actions can reduce inference latency? (Choose TWO.)

Question 89 (medium, multi select)

A team wants to deploy a new model using a canary deployment strategy on SageMaker. Which TWO configurations are necessary? (Choose TWO.)

Question 90 (medium, multi select)

A company uses SageMaker Pipelines to automate their ML workflow. They want to ensure that pipeline steps are not re-executed if the inputs and parameters have not changed since the last successful run. Which THREE features can help achieve this? (Choose THREE.)

Question 91 (easy, multiple choice)

A data science team needs to deploy a trained PyTorch model for real-time inference with sub-100ms latency. The model fits on a single GPU. Which SageMaker inference option is MOST cost-effective while meeting the latency requirement?

Question 92 (medium, multiple choice)

A company has 200 small PyTorch models that are each used infrequently but need to be available for real-time inference. To minimize costs, they want to host all models on a single endpoint. Which SageMaker feature should they use?

Question 93 (hard, multiple choice)

A machine learning engineer deploys a new model version to a SageMaker endpoint with production variants. They want to gradually shift traffic from the old model to the new model, monitoring for errors, and automatically roll back if the error rate exceeds 5%. Which deployment pattern should they use?

Question 94 (medium, multiple choice)

A team uses SageMaker Pipelines to automate retraining. They want to skip the training step if the data has not changed since the last run. Which feature should they enable?

Question 95 (hard, multiple choice)

A company needs to deploy a large language model (LLM) on SageMaker with the Triton Inference Server to maximize GPU utilization and reduce latency. They have an NVIDIA A100 GPU. Which SageMaker inference option supports Triton?

Question 96 (easy, multiple choice)

A data scientist wants to version and manage trained models, require approval before deployment, and enable cross-account deployment. Which SageMaker feature provides these capabilities?

Question 97 (medium, multiple choice)

A company runs a batch inference job on 10 TB of image data stored in S3. Each image needs to be processed by a GPU-accelerated model. The job is not time-sensitive and cost is the primary concern. Which SageMaker option is MOST appropriate?

Question 98 (hard, multiple choice)

A team uses SageMaker Pipelines to train and register a model. They want to conditionally run a hyperparameter tuning step only if the data quality check passes. Which pipeline step type should they use to branch the execution?

Question 99 (medium, multiple choice)

An ML engineer needs to orchestrate a multi-step workflow that includes data preprocessing on Spark, model training on SageMaker, and deployment to a production endpoint. They require tight integration with other AWS services and the ability to add custom logic. Which AWS service should they use alongside SageMaker?

Question 100 (easy, multi select)

A company wants to trigger a model retraining pipeline whenever new training data arrives in an S3 bucket. They also need to send a notification to a Slack channel when the retraining completes. Which TWO AWS services should they use to implement this event-driven workflow? (Select TWO.)

Question 101 (medium, multi select)

A machine learning team needs to deploy a PyTorch model that has been compiled with SageMaker Neo to improve inference performance on edge devices. Which TWO statements about SageMaker Neo are correct? (Select TWO.)

Question 102 (medium, multi select)

A company is using SageMaker to serve a model for real-time predictions. They want to test a new model version by routing a small percentage of live traffic to it while the rest goes to the current model. They also need to compare performance metrics. Which TWO actions should they take? (Select TWO.)

Question 103 (hard, multi select)

An ML engineer is designing a SageMaker Pipeline for model training and registration. They need to ensure that the pipeline can be re-run with different datasets without manual intervention, and that the steps are only re-executed if inputs have changed. Which THREE features should they configure? (Select THREE.)

Question 104 (medium, multi select)

A company wants to deploy a containerized inference application that includes a custom pre-processing script and a TensorFlow model on SageMaker. They need the ability to independently scale the pre-processing and model serving components. Which TWO SageMaker features support this? (Select TWO.)

Question 105 (hard, multi select)

A team is deploying a model using a SageMaker real-time endpoint with an ml.m5.large instance. They notice high latency under peak load. They want to reduce latency without increasing instance size. Which THREE actions could help? (Select THREE.)

Question 106 (medium, multi select)

A data science team is deploying a PyTorch model for real-time inference with sub-second latency requirements. They need to minimize cost while handling variable traffic. Which TWO approaches should they consider? (Choose TWO.)

Question 107 (hard, multi select)

An MLOps team is designing a SageMaker Pipeline to automate model retraining. The pipeline must: (1) run training only if new training data is available, (2) register the model in SageMaker Model Registry only if evaluation metrics exceed a threshold, (3) deploy the approved model to a staging endpoint automatically. Which THREE steps should they include? (Choose THREE.)

Question 108 (medium, multi select)

A company wants to test a new ML model in production with minimal risk before shifting full traffic. They have an existing real-time endpoint serving model version A. They need to route 5% of live traffic to model version B and monitor performance for 24 hours. Which TWO steps should they take? (Choose TWO.)