MLA-C01 Deployment and Orchestration of ML Workflows — All Questions With Answers

Question 1 (medium, multiple choice)

A data science team has trained a PyTorch model using Amazon SageMaker and wants to deploy it with a custom inference container that includes a pre-processing step. The team needs to minimize latency and ensure the pre-processing runs only once per request. Which SageMaker real-time inference option should they use?

Question 2 (hard, multiple choice)

A company is deploying a real-time inference endpoint for a natural language processing model using Amazon SageMaker. The model requires GPU acceleration and must handle variable traffic patterns, including sudden spikes. The team wants to minimize costs while maintaining low latency during spikes. Which endpoint configuration strategy should they use?

Question 3 (easy, multiple choice)

A machine learning engineer is deploying a model using AWS Lambda for inference. The model is a small scikit-learn classifier with a size of 50 MB. The Lambda function is invoked by an API Gateway REST API. The engineer notices that cold starts are causing high latency. Which action would most effectively reduce cold start latency without increasing costs significantly?

Question 4 (medium, multiple choice)

A company uses Amazon SageMaker to train and deploy machine learning models. The security team requires that all data in transit between the training job and S3 be encrypted, and that no data traverses the public internet. Which configuration should the company use?

Question 5 (hard, multiple choice)

A team is deploying a deep learning model on a SageMaker real-time endpoint. The model has high memory requirements, and the team wants to minimize instance cost while ensuring the endpoint can handle up to 10 concurrent requests. They plan to use a single ml.p3.2xlarge instance (8 vCPUs, 61 GB memory). Which SageMaker endpoint configuration will allow the endpoint to handle 10 concurrent requests without errors?

Question 6 (easy, multiple choice)

A company wants to deploy a machine learning model that was trained on-premises using TensorFlow. The model is a TensorFlow SavedModel. The company uses AWS and wants to minimize operational overhead. Which deployment option meets these requirements?

Question 7 (medium, multiple choice)

A team is using AWS Step Functions to orchestrate a machine learning workflow that includes data preprocessing, training, and model evaluation. The team wants to run the workflow whenever new data arrives in an S3 bucket. Which approach should they use to trigger the Step Functions workflow?
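
One common way to wire this kind of trigger is an Amazon EventBridge rule that matches S3 "Object Created" events and targets the state machine. The sketch below only builds the event pattern as a dictionary and includes a deliberately simplified local matcher for experimenting with it. The bucket name `training-data-bucket` and the prefix `incoming/` are hypothetical, and the sketch assumes EventBridge notifications are enabled on the bucket.

```python
def s3_object_created_pattern(bucket, prefix=None):
    """Build an EventBridge event pattern for new objects in one bucket."""
    detail = {"bucket": {"name": [bucket]}}
    if prefix:
        detail["object"] = {"key": [{"prefix": prefix}]}
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": detail,
    }


def matches(pattern, event):
    """Simplified matcher: handles only exact values and the 'prefix' operator."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
            continue
        hit = False
        for candidate in expected:
            if isinstance(candidate, dict) and "prefix" in candidate:
                hit = hit or (isinstance(actual, str)
                              and actual.startswith(candidate["prefix"]))
            else:
                hit = hit or actual == candidate
        if not hit:
            return False
    return True


# Hypothetical bucket and prefix, for illustration only.
pattern = s3_object_created_pattern("training-data-bucket", "incoming/")
```

The real rule would be created with this pattern and a Step Functions target; the local `matches` helper is not the EventBridge matching engine and exists only to make the pattern's intent visible.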

Question 8 (hard, multi-select)

A company is deploying a machine learning model using Amazon SageMaker. The model is a large deep learning model that requires GPU for inference. The company expects unpredictable traffic patterns with occasional bursts. They want to minimize cost while ensuring low latency during bursts. Which TWO actions should they take? (Select TWO.)

Question 9 (medium, multi-select)

An MLOps engineer is designing a CI/CD pipeline for deploying machine learning models to a production SageMaker endpoint. The pipeline should include automated testing, approval gates, and rollback capability. Which THREE components should be included in the pipeline? (Select THREE.)

Question 10 (easy, multi-select)

A company is using Amazon SageMaker to deploy a model for real-time inference. The model requires access to a private S3 bucket that contains reference data. The company wants to ensure that the endpoint can access the S3 bucket without using a public internet connection. Which TWO actions should they take? (Select TWO.)

Question 11 (hard, multiple choice)

A data scientist is trying to create a SageMaker endpoint configuration with 6 instances of ml.c5.large for a production variant. The creation fails with the error shown in the exhibit. Which action should the data scientist take to resolve this issue?

Exhibit

Refer to the exhibit.

Error log from SageMaker endpoint creation:
```
ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateEndpointConfig operation: The account-level service limit for 'ml.c5.large for real-time endpoints' is 5. You have requested 6 instances. Please use AWS Service Quotas to request an increase.
```
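
To make the numbers in the exhibit explicit, the following sketch pulls the quota name, the current limit, and the requested count out of the error text. The regular expression is an assumption based solely on the message shown in the exhibit; other SageMaker error messages may be worded differently.

```python
import re

# Error text copied from the exhibit.
ERROR = (
    "ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when "
    "calling the CreateEndpointConfig operation: The account-level service "
    "limit for 'ml.c5.large for real-time endpoints' is 5. You have requested "
    "6 instances. Please use AWS Service Quotas to request an increase."
)


def parse_limit_error(message):
    """Return quota details from the message, or None if it does not match."""
    found = re.search(
        r"service limit for '([^']+)' is (\d+)\. "
        r"You have requested (\d+) instances",
        message,
    )
    if found is None:
        return None
    limit = int(found.group(2))
    requested = int(found.group(3))
    return {
        "quota_name": found.group(1),
        "limit": limit,
        "requested": requested,
        "shortfall": max(0, requested - limit),
    }


details = parse_limit_error(ERROR)
```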

Question 12 (medium, multiple choice)

A machine learning engineer has configured a SageMaker Model Monitor schedule for data quality monitoring as shown in the exhibit. The schedule is set to run hourly. However, the engineer notices that the monitoring jobs are not producing output in the specified S3 bucket. What is the most likely cause?

Exhibit

Refer to the exhibit.

SageMaker Model Monitor schedule configuration:
```
{
  "ScheduleConfig": {
    "ScheduleExpression": "cron(0 * * * ? *)",
    "DataAnalysisStartTime": "2023-01-01T00:00:00Z",
    "DataAnalysisEndTime": "2023-01-01T23:59:00Z"
  },
  "JobDefinition": {
    "Environment": {
      "output_path": "s3://my-bucket/reports/"
    }
  },
  "MonitoringType": "DataQuality"
}
```
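
The schedule expression `cron(0 * * * ? *)` in the exhibit fires at minute 0 of every hour. As a rough illustration, the sketch below lists the next few nominal trigger times after a given moment. It models only this one hourly expression, not cron syntax in general, and actual monitoring jobs may start somewhat later than the nominal time.

```python
from datetime import datetime, timedelta, timezone


def next_hourly_runs(after, count=3):
    """Nominal trigger times for cron(0 * * * ? *) strictly after `after`."""
    first = after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return [first + timedelta(hours=i) for i in range(count)]


# Example moment chosen inside the date used in the exhibit.
runs = next_hourly_runs(datetime(2023, 1, 1, 10, 23, tzinfo=timezone.utc))
```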

Question 13 (medium, multiple choice)

A data science team has trained a model using SageMaker and wants to deploy it for real-time inference with automatic scaling based on request latency. The deployment must handle unpredictable traffic spikes without manual intervention. Which combination of SageMaker features should the team use?

Question 14 (hard, multiple choice)

A machine learning engineer is deploying a PyTorch model for real-time inference on SageMaker. The model requires GPU for low-latency predictions. The deployment fails with the error: 'The primary container does not support the requested instance type.' The instance type is ml.p3.2xlarge. Which action should the engineer take to resolve the issue?

Question 15 (easy, multiple choice)

A company is using SageMaker Pipelines to automate a multi-step ML workflow. The pipeline includes data preprocessing, training, and model evaluation. The team wants to ensure that if the evaluation step fails, the pipeline stops and sends an alert to the operations team. Which SageMaker Pipelines feature should they use?

Question 16 (medium, multiple choice)

A team is deploying a machine learning model using Amazon SageMaker. They need to serve predictions with sub-100ms latency for a real-time application. The model is a large ensemble that requires 4 GB of memory. The team expects traffic of 100 requests per second initially, but it may double during peak hours. Which instance type and deployment configuration should the team choose to minimize cost while meeting the latency requirement?
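
The sizing arithmetic behind a scenario like this can be sketched as follows. The per-instance throughput of 60 requests per second and the 70% headroom factor are hypothetical values that would have to come from a load test against the latency target; they are not given in the question.

```python
import math


def instances_needed(peak_rps, per_instance_rps, headroom=0.7):
    """Instances required so each one stays at or below `headroom` of its capacity."""
    usable_rps = per_instance_rps * headroom
    return math.ceil(peak_rps / usable_rps)


# 100 requests per second initially, doubling to 200 at peak (from the question).
baseline = instances_needed(100, per_instance_rps=60)
peak = instances_needed(200, per_instance_rps=60)
```

The gap between `baseline` and `peak` is the range an auto scaling policy would need to cover.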

Question 17 (hard, multiple choice)

A company has a SageMaker endpoint running a model that provides real-time recommendations. Recently, the model's accuracy has degraded due to data drift. The team wants to automatically retrain the model when a drift metric exceeds a threshold and deploy the new model without downtime. Which architecture should the team implement?

Question 18 (easy, multiple choice)

A machine learning team needs to deploy a model that was built using scikit-learn. They want to use SageMaker for hosting. Which approach should they take?

Question 19 (medium, multi-select)

Which TWO of the following are best practices for deploying machine learning models on SageMaker? (Select TWO.)

Question 20 (medium, multiple choice)

A media company uses SageMaker to host a real-time video recommendation model. The model is deployed on a single ml.c5.xlarge endpoint. During a major live event, traffic surges to 10 times the normal load, and the endpoint becomes unresponsive, causing high latency and errors. The team had set up an Application Auto Scaling target tracking policy based on CPU utilization with a target of 70%. However, scaling did not trigger quickly enough. After the event, the team reviews CloudWatch metrics and notices that CPU utilization never exceeded 70% during the surge, but memory utilization peaked at 95%. The model is memory-bound. The team wants to ensure the endpoint scales automatically before performance degrades during future events. What should the team do?
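
For reference, a target tracking policy on a metric other than CPU can be expressed through Application Auto Scaling's `CustomizedMetricSpecification`. The sketch below only assembles the request dictionaries for `register_scalable_target` and `put_scaling_policy`; it does not call AWS. The endpoint name, variant name, capacity bounds, cooldowns, and the 60% target are hypothetical, and tracking memory is shown as one possible choice rather than as the expected answer.

```python
def scalable_target_kwargs(endpoint, variant, min_instances, max_instances):
    """Arguments for application-autoscaling register_scalable_target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }


def memory_policy_kwargs(endpoint, variant, target_percent):
    """Arguments for application-autoscaling put_scaling_policy."""
    return {
        "PolicyName": f"{endpoint}-{variant}-memory-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_percent),
            "CustomizedMetricSpecification": {
                "MetricName": "MemoryUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint},
                    {"Name": "VariantName", "Value": variant},
                ],
                "Statistic": "Average",
                "Unit": "Percent",
            },
            # Hypothetical cooldowns, in seconds.
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 300,
        },
    }


# Hypothetical names and numbers, for illustration only.
target = scalable_target_kwargs("video-recs", "AllTraffic", 1, 10)
policy = memory_policy_kwargs("video-recs", "AllTraffic", 60)
```

With boto3 installed and credentials configured, these dictionaries would be passed as keyword arguments to the `application-autoscaling` client methods of the same names.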

Question 21 (hard, multiple choice)

A financial services company has a SageMaker pipeline that trains a fraud detection model daily. The pipeline consists of three steps: preprocessing (using a Spark script), training (XGBoost), and evaluation. The evaluation step calculates the F1 score and compares it to a threshold of 0.95. If the F1 score is below 0.95, the pipeline should fail and notify the team via email. The team implemented this using a Condition step that checks if the F1 score is greater than or equal to 0.95. If true, the pipeline proceeds to register the model; if false, the pipeline fails. However, the team notices that even when the F1 score is 0.94, the pipeline continues to the registration step. The evaluation script outputs the F1 score as a float with two decimal places in a JSON file. The Condition step uses the expression: $.evaluation.metrics.f1_score >= 0.95. What is the most likely cause of the issue?
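
The threshold check described here can be reproduced locally to rule out problems in the evaluation report itself. The sketch below assumes the JSON layout implied by the path `$.evaluation.metrics.f1_score` and coerces the value to a float before comparing, so a metric accidentally written as a string is still compared numerically. It is a plain-Python stand-in and does not reproduce how SageMaker Pipelines evaluates conditions.

```python
import json


def f1_passes(evaluation_json, threshold=0.95):
    """Return True when the reported F1 score meets the threshold."""
    report = json.loads(evaluation_json)
    raw = report["evaluation"]["metrics"]["f1_score"]
    # Coerce explicitly: the file may hold 0.94 (number) or "0.94" (string).
    return float(raw) >= threshold


# Hypothetical report contents, for illustration only.
failing = f1_passes('{"evaluation": {"metrics": {"f1_score": 0.94}}}')
passing = f1_passes('{"evaluation": {"metrics": {"f1_score": "0.95"}}}')
```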

Question 22 (easy, multiple choice)

A data science team needs to deploy a PyTorch model for real-time inference with low latency. The model requires GPU acceleration. Which SageMaker endpoint configuration should they use?

Question 23 (medium, multiple choice)

A team is using SageMaker Pipelines to automate retraining and deployment. They want to trigger the pipeline automatically when new training data is available in an S3 bucket. Which approach should they use?

Question 24 (hard, multiple choice)

A financial services company deploys a fraud detection model on a SageMaker real-time endpoint. The inference logic includes a pre-processing step that requires access to a DynamoDB table for user metadata. The model container is a custom Docker image. How should the team grant the endpoint access to DynamoDB?

Question 25 (easy, multiple choice)

An ML team wants to deploy a model that was trained using XGBoost in SageMaker. They want to use the built-in XGBoost algorithm container for inference. Which inference option requires the least custom code?

Question 26 (medium, multiple choice)

A company is deploying a large number of small models (each < 100 MB) for different customers. They want to minimize costs and management overhead while serving traffic that varies significantly. Which SageMaker endpoint type should they choose?

Question 27 (hard, multiple choice)

During a blue/green deployment of a SageMaker endpoint, the team notices that traffic is not being fully shifted to the new variant after the update. The endpoint has two variants with equal initial weights (50% each). The team wants to shift 100% traffic to the new variant. What is the most likely cause?

Question 28 (easy, multiple choice)

An ML engineer needs to deploy a model as an AWS Lambda function for serverless inference. The model is a scikit-learn pipeline serialized as a pickle file. What is the best way to include the model in the Lambda deployment?

Question 29 (medium, multiple choice)

A company uses SageMaker Pipelines to train and register models. They want to automate the deployment of approved models from the model registry to a staging endpoint. Which service should they use to orchestrate the deployment workflow?

Question 30 (hard, multiple choice)

A team is deploying a TensorFlow model on a SageMaker real-time endpoint with automatic scaling. They set the scaling policy to target an average CPU utilization of 50%. However, during traffic spikes, the endpoint experiences high latency and 503 errors. The instance type is ml.c5.large. What should the team do to resolve this while minimizing cost?

Question 31 (medium, multi-select)

A company is deploying a machine learning model using SageMaker hosting. They need to support multiple versions of the model for A/B testing. Which TWO actions are required to set up the A/B test? (Choose two.)
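
When several production variants share one endpoint, each variant's share of traffic is its weight divided by the sum of all weights. The sketch below computes those shares from a list shaped like the `ProductionVariants` field of an endpoint configuration. The model names, instance type, and weights are hypothetical.

```python
def traffic_shares(variants):
    """Map each variant name to its fraction of traffic, based on weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


# Hypothetical A/B split: 90% to the current model, 10% to the candidate.
variants = [
    {"VariantName": "model-a", "ModelName": "recs-v1",
     "InstanceType": "ml.c5.xlarge", "InitialInstanceCount": 1,
     "InitialVariantWeight": 9},
    {"VariantName": "model-b", "ModelName": "recs-v2",
     "InstanceType": "ml.c5.xlarge", "InitialInstanceCount": 1,
     "InitialVariantWeight": 1},
]
shares = traffic_shares(variants)
```

Weights are relative, so `9` and `1` produce the same split as `0.9` and `0.1`.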

Question 32 (hard, multi-select)

A company is using an AWS Step Functions state machine to orchestrate a multi-step ML deployment. The workflow includes: training a model, evaluating it, registering the model, and deploying to a staging endpoint. They need to implement an approval gate before deploying to production. Which THREE components are necessary to achieve this? (Choose three.)

Question 33 (easy, multi-select)

A company wants to deploy a model on SageMaker serverless inference. Which TWO of the following are limitations of serverless endpoints compared to real-time endpoints? (Choose two.)

Question 34 (easy, multiple choice)

A company trained a model using SageMaker and wants to deploy it with low latency for real-time inference. Which SageMaker feature is MOST suitable?

Question 35 (easy, multiple choice)

A data scientist wants to automate retraining of a model weekly and deploy the new model automatically after passing validation. Which AWS service combination is best?

Question 36 (easy, multiple choice)

A company has a model that receives low traffic but needs to handle sudden spikes. Which deployment option is most cost-effective?

Question 37 (medium, multiple choice)

A team notices that inference requests to their SageMaker endpoint are failing with '504 Gateway Timeout' for large payloads. What change should be made?

Question 38 (medium, multiple choice)

A company is using SageMaker Model Registry to manage model versions. They want to automatically deploy the latest approved model to production after retraining. Which approach is best?

Question 39 (medium, multiple choice)

A company is deploying multiple models on a single endpoint to reduce costs. They need to update one model without affecting the others. Which solution meets this requirement?

Question 40 (hard, multiple choice)

A company uses SageMaker endpoints with auto-scaling based on CPU utilization. During a flash sale, latency increases despite low CPU. What should be done?

Question 41 (hard, multiple choice)

A company is deploying a large model (10GB) for real-time inference. The inference latency is too high. What optimization technique can help?

Question 42 (hard, multiple choice)

A team uses SageMaker Pipelines for CI/CD. The training step fails due to insufficient memory. How can the team fix this without rewriting code?

Question 43 (easy, multi-select)

A company wants to deploy a trained model to a SageMaker endpoint with automatic scaling based on traffic. Which TWO configurations are required? (Choose two.)

Question 44 (medium, multi-select)

A company uses SageMaker Pipelines for model training and wants to incorporate model evaluation before deployment into production. Which THREE components are essential? (Choose three.)

Question 45 (hard, multi-select)

A company is running a SageMaker endpoint serving multiple models. They need to monitor for data drift and model quality. Which THREE actions are necessary? (Choose three.)

Question 46 (easy, multiple choice)

Refer to the exhibit. A user is unable to invoke a SageMaker endpoint. The IAM policy shown is attached to the user. Which permission is missing to allow invocation?

Exhibit

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:DescribeEndpoint",
        "sagemaker:ListEndpoints"
      ],
      "Resource": "*"
    }
  ]
}
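
A policy like the one in the exhibit can be checked locally against a candidate action before attaching it. The sketch below is a heavily simplified check that looks only at `Allow` statements and action names (with `*` wildcards, case-insensitively); it ignores `Resource`, `Condition`, and `Deny`, so it is not a substitute for the IAM policy simulator. `sagemaker:InvokeEndpoint` is the action used for real-time endpoint invocation.

```python
from fnmatch import fnmatchcase

# Policy document copied from the exhibit.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:DescribeEndpoint", "sagemaker:ListEndpoints"],
            "Resource": "*",
        }
    ],
}


def is_allowed(policy, action):
    """True if any Allow statement lists an action pattern matching `action`."""
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        # IAM action names are case-insensitive.
        if any(fnmatchcase(action.lower(), p.lower()) for p in patterns):
            return True
    return False


can_describe = is_allowed(POLICY, "sagemaker:DescribeEndpoint")
can_invoke = is_allowed(POLICY, "sagemaker:InvokeEndpoint")
```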

Question 47 (medium, multiple choice)

Refer to the exhibit. A SageMaker endpoint is logging this error when processing inference requests that require database access. What is the most likely cause?

Exhibit

[ERROR] 2024-03-15 10:23:45,234 - botocore.exception.ConnectTimeoutError: Connect timeout on endpoint URL: "https://database.example.com:5432"

Question 48 (hard, multiple choice)

Refer to the exhibit. A SageMaker Pipeline fails with 'Invalid output reference' at the TrainingStep. What is the most likely cause?

Exhibit

TrainingStep(
    name="TrainModel",
    step_args=train_args,
    depends_on=[tuning_step]
)
tuning_step = TuningStep(...) # produces multiple artifacts

Question 49 (easy, multiple choice)

A data science team deploys a PyTorch model on Amazon SageMaker for real-time inference. The model requires GPU for low latency. Which instance type is MOST cost-effective while meeting the GPU requirement?

Question 50 (medium, multiple choice)

A company uses Amazon SageMaker Pipelines to automate its ML workflow. The pipeline includes a training step and a model evaluation step. If the evaluation step fails, the pipeline should stop and notify the team. How should the company configure the pipeline?

Question 51 (hard, multiple choice)

A financial services company deploys multiple models on a single Amazon SageMaker endpoint using a multi-model endpoint (MME). The models are stored in Amazon S3. Each model is approximately 500 MB and is loaded on demand. Users report high latency for cold-start scenarios. What should the company do to reduce cold-start latency?

Question 52 (medium, multiple choice)

An e-commerce company uses Amazon SageMaker to deploy a real-time inference endpoint for product recommendations. The endpoint receives bursty traffic, with occasional spikes. The company wants to minimize cost while ensuring that latency remains under 100 ms. Which approach should the company take?

Question 53 (easy, multiple choice)

A machine learning engineer needs to deploy a TensorFlow model to Amazon SageMaker and wants to use the built-in TensorFlow Serving container. What should the engineer provide in the model archive?

Question 54 (medium, multiple choice)

A data scientist is using Amazon SageMaker Studio to develop a model. The training job is taking longer than expected. The data scientist suspects that the data is being downloaded from Amazon S3 each time the training starts. What is the BEST way to reduce data loading time?

Question 55 (hard, multiple choice)

A healthcare company is deploying a model for predicting patient outcomes. The model must be deployed across multiple AWS accounts to meet compliance requirements. Each account has its own Amazon SageMaker endpoint. The company wants to centralize monitoring of model performance without exposing data across accounts. Which solution should the company use?

Question 56 (easy, multiple choice)

A company wants to automate its machine learning pipeline using AWS CodePipeline and Amazon SageMaker. The pipeline should train a model, evaluate it, and if the evaluation passes, register the model in the SageMaker Model Registry. Which service should the company use to orchestrate the training and evaluation steps?

Question 57 (medium, multiple choice)

A company deploys a model on Amazon SageMaker for real-time inference. The inference latency is too high. The model is a large deep learning model. The company wants to reduce latency without significantly impacting accuracy. Which approach should the company consider?

Question 58 (hard, multiple choice)

Refer to the exhibit. An AWS IAM policy is attached to a role used by a CI/CD pipeline to deploy SageMaker endpoints. The pipeline attempts to create an endpoint configuration with a VPC subnet that is not subnet-0123456789abcdef0. What will happen when the pipeline tries to create the endpoint configuration?

Exhibit

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:CreateModel",
      "Resource": "arn:aws:sagemaker:us-east-1:123456789012:model/*"
    },
    {
      "Effect": "Deny",
      "Action": "sagemaker:CreateEndpointConfig",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "sagemaker:VpcSubnets": "subnet-0123456789abcdef0"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "sagemaker:CreateEndpoint",
      "Resource": "*"
    }
  ]
}
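
IAM evaluation follows a fixed order: an explicit `Deny` that matches wins, otherwise a matching `Allow` permits the call, and with neither the request is implicitly denied. The sketch below applies that order to the statements in the exhibit. It is a simplification under stated assumptions: it supports only exact action names and the `StringNotEquals` operator, ignores `Resource`, and treats `sagemaker:VpcSubnets` as a single string even though real requests can carry several subnets. The second subnet ID is hypothetical.

```python
# The three statements from the exhibit, with Resource omitted for brevity.
STATEMENTS = [
    {"Effect": "Allow", "Action": "sagemaker:CreateModel"},
    {"Effect": "Deny", "Action": "sagemaker:CreateEndpointConfig",
     "Condition": {"StringNotEquals":
                   {"sagemaker:VpcSubnets": "subnet-0123456789abcdef0"}}},
    {"Effect": "Allow", "Action": "sagemaker:CreateEndpoint"},
]


def condition_holds(condition, context):
    """Evaluate a Condition block; only StringNotEquals is modeled."""
    for operator, checks in condition.items():
        if operator != "StringNotEquals":
            raise NotImplementedError(operator)
        for key, value in checks.items():
            # A missing context key makes a negated operator evaluate to true.
            if context.get(key) == value:
                return False
    return True


def evaluate(statements, action, context):
    """Return 'explicit deny', 'allow', or 'implicit deny'."""
    allowed = False
    for statement in statements:
        if statement["Action"] != action:
            continue
        if not condition_holds(statement.get("Condition", {}), context):
            continue
        if statement["Effect"] == "Deny":
            return "explicit deny"
        allowed = True
    return "allow" if allowed else "implicit deny"


other_subnet = evaluate(STATEMENTS, "sagemaker:CreateEndpointConfig",
                        {"sagemaker:VpcSubnets": "subnet-0aaaaaaaaaaaaaaaa"})
named_subnet = evaluate(STATEMENTS, "sagemaker:CreateEndpointConfig",
                        {"sagemaker:VpcSubnets": "subnet-0123456789abcdef0"})
```

Comparing the two results shows the difference between a request that hits the `Deny` statement and one that simply has no matching `Allow`.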

Question 59 (medium, multiple choice)

Refer to the exhibit. A data scientist creates a SageMaker Pipeline definition using the JSON shown. The pipeline runs successfully, but the scientist notices that the training step did not use the parameter 'TrainingInstanceCount' defined in Parameters. Why did this happen?

Exhibit

{
  "PipelineExperimentConfig": {
    "ExperimentName": "my-experiment",
    "TrialName": "my-trial"
  },
  "Parameters": {
    "TrainingInstanceType": "ml.m5.large",
    "TrainingInstanceCount": 2,
    "MaxRuntimeInSeconds": 86400
  },
  "Steps": [
    {
      "Name": "Preprocess",
      "Type": "Processing",
      "ProcessingJobName": "preprocess-job",
      "ProcessingResources": {
        "ClusterConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m5.large",
          "VolumeSizeInGB": 30
        }
      }
    },
    {
      "Name": "Train",
      "Type": "Training",
      "TrainingJobName": "train-job",
      "AlgorithmSpecification": {
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
        "TrainingInputMode": "File"
      },
      "ResourceConfig": {
        "InstanceCount": 2,
        "InstanceType": "ml.m5.large",
        "VolumeSizeInGB": 30
      }
    }
  ]
}

Question 60 (hard, multiple choice)

Refer to the exhibit. A company configures a SageMaker Model Monitor Data Quality monitoring schedule as shown. The schedule runs every hour. However, the team notices that the monitoring job fails intermittently with an AccessDenied error when accessing the S3 bucket for output. The IAM role SageMakerMonitorRole has permissions to write to s3://my-bucket/monitor-output. What is the MOST likely cause of the failure?

Exhibit

{
  "MonitoringScheduleName": "model-quality-monitor",
  "EndpointName": "my-endpoint",
  "MonitoringType": "DataQuality",
  "MonitoringScheduleConfig": {
    "ScheduleExpression": "cron(0 * * * ? *)",
    "MonitoringJobDefinition": {
      "BaselineConfig": {
        "BaseliningJobName": "baseline-job",
        "ConstraintsResource": {
          "S3Uri": "s3://my-bucket/baseline/constraints.json"
        },
        "StatisticsResource": {
          "S3Uri": "s3://my-bucket/baseline/statistics.json"
        }
      },
      "MonitoringInputs": [
        {
          "EndpointInput": {
            "EndpointName": "my-endpoint",
            "LocalPath": "/opt/ml/processing/input/endpoint",
            "S3DataDistributionType": "FullyReplicated",
            "S3InputMode": "File"
          }
        }
      ],
      "MonitoringOutputConfig": {
        "MonitoringOutputs": [
          {
            "S3Output": {
              "S3Uri": "s3://my-bucket/monitor-output",
              "LocalPath": "/opt/ml/processing/output",
              "S3UploadMode": "Continuous"
            }
          }
        ]
      },
      "MonitoringResources": {
        "ClusterConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m5.large",
          "VolumeSizeInGB": 20
        }
      },
      "RoleArn": "arn:aws:iam::123456789012:role/SageMakerMonitorRole"
    }
  }
}

Question 61 (medium, multi-select)

A company uses Amazon SageMaker to deploy a model for real-time inference. They want to perform A/B testing between two model versions. Which TWO actions should the company take to set up A/B testing? (Choose TWO.)

Question 62 (hard, multi-select)

A machine learning team is building a CI/CD pipeline to train and deploy models using Amazon SageMaker. They want to ensure that the deployment step only proceeds if the model evaluation metrics exceed a certain threshold. Which THREE components should the team include in the pipeline? (Choose THREE.)

Question 63 (easy, multi-select)

A data science team is deploying a model on Amazon SageMaker and wants to protect the endpoint from unauthorized access. Which TWO methods can the team use to secure the endpoint? (Choose TWO.)

Question 64 (easy, multiple choice)

A data science team needs to deploy a frequently updated PyTorch model for real-time inference. The model is retrained weekly and versioned using SageMaker Model Registry. Which deployment strategy minimizes downtime and allows easy rollback?

Question 65 (medium, multiple choice)

An ML team is deploying a model using SageMaker. The model requires GPU inference and must be available in multiple AWS regions for low latency. The team has created a multi-model endpoint with GPU instances. After deployment, they notice high latency spikes when a new model is loaded. What is the most likely cause?

Question 66 (hard, multiple choice)

A company is deploying an ML model for real-time fraud detection using SageMaker. The model must process requests within 50 ms and scale to handle up to 10,000 requests per second during peak hours. The data includes PII, so all traffic must stay within a VPC. The team has configured the SageMaker endpoint with a VPC and an internet gateway for model downloads. During a load test, the endpoint fails to achieve the required throughput. Which change would most likely resolve the issue?

Question 67 (easy, multiple choice)

A team wants to automate the retraining and deployment of an ML model whenever new labeled data arrives in S3. The workflow includes data preprocessing, training, evaluation, and conditional deployment. Which AWS service is best suited for orchestrating this end-to-end pipeline?

Question 68 (medium, multiple choice)

An MLOps engineer is setting up a SageMaker endpoint for a model that performs inference on large images. The model is containerized and expects input in a specific format. The team wants to preprocess the images (resize and normalize) before passing them to the model. What is the most efficient way to implement this?

Question 69 (hard, multiple choice)

A company deploys a model using SageMaker real-time endpoint with auto scaling. They observe that during a traffic spike, the endpoint quickly scales up to 10 instances, but after the spike, it takes a long time to scale down, leading to high costs. The scaling policy is based on a simple average CPU utilization threshold. Which adjustment would optimize the scaling down behavior?

Question 70 (easy, multiple choice)

A team wants to use a custom container for inference on SageMaker. The container needs to implement a web server that responds to API requests. Which protocol and port must the container listen on to be compatible with SageMaker hosting?
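
For orientation, SageMaker hosting sends health checks to `GET /ping` and inference requests to `POST /invocations` over HTTP on port 8080 inside the container. The sketch below is a minimal standard-library server with that shape. The echo behavior in `/invocations` is a placeholder assumption standing in for real model code, and a production container would normally use a proper serving stack rather than `http.server`.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Handles the two routes SageMaker hosting calls on a container."""

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: reply 200 once the model is ready to serve.
        self._reply(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        # Placeholder: echo the payload where a real model would predict.
        self._reply(200, payload)

    def log_message(self, *args):
        # Keep the example quiet; remove to restore request logging.
        pass


def make_server(port=8080):
    """Create (but do not start) the server on the given port."""
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

Calling `make_server().serve_forever()` as the container's entry point would start it on port 8080.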

Question 71 (medium, multiple choice)

An ML team is using SageMaker Model Registry to manage model versions. After training a new model version, they register it with an 'Approved' status. The CI/CD pipeline automatically deploys the latest approved model to a staging endpoint. However, the pipeline fails with an error: 'Cannot deploy model because the model version is not approved.' The model version is clearly approved in the registry. What is the most likely cause?

Question 72 (hard, multiple choice)

A company uses SageMaker Ground Truth to create a labeled dataset, then trains a model using SageMaker Training. They want to automate the pipeline so that whenever a labeling job is completed, it triggers the training job. Which architecture meets this requirement with minimal latency?

Question 73 (medium, multi-select)

A company deploys a model on SageMaker that serves predictions to a web application. The model's performance degrades over time due to data drift. The company wants to set up continuous monitoring. Which TWO actions should the company take to monitor and retrain the model effectively? (Choose TWO.)

Question 74 (hard, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A team is deploying a model using SageMaker Pipelines. They have defined a pipeline with steps: preprocessing, training, evaluation, and conditional registration. The evaluation step produces a JSON file with metrics. If accuracy > 0.9, the model is registered; else, the pipeline fails. Which TWO statements about this pipeline are correct? (Choose TWO.)
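
The branching rule in this scenario can be sketched in plain Python. This is not the SageMaker SDK; in the SDK the same decision is typically expressed with a `ConditionStep` using `ConditionGreaterThan` and `JsonGet` over a property file, with a `FailStep` on the else branch. The JSON layout of the evaluation report and the branch names below are assumptions for illustration.

```python
import json

ACCURACY_THRESHOLD = 0.9  # threshold taken from the question


def next_step(evaluation_json: str, threshold: float = ACCURACY_THRESHOLD) -> str:
    """Return the branch the condition selects for a given evaluation report."""
    report = json.loads(evaluation_json)
    # Hypothetical report layout: {"metrics": {"accuracy": {"value": 0.93}}}
    accuracy = report["metrics"]["accuracy"]["value"]
    # Strictly greater than: an accuracy of exactly 0.9 does not register.
    return "RegisterModel" if accuracy > threshold else "Fail"
```

Note the boundary case: with a strict comparison, a model scoring exactly 0.9 takes the failure branch.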

Question 75 (easy, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company wants to deploy its trained model to edge devices such as cameras and IoT devices. The model must run efficiently with low latency and minimal memory footprint. Which THREE actions should the company take to prepare the model for edge deployment? (Choose THREE.)

Question 76 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

An engineer runs: aws sagemaker describe-endpoint --endpoint-name my-endpoint and receives the exhibit output. The engineer wants to update the endpoint to use a new model version stored in ECR with tag ':2'. Which step is necessary to perform the update?

Exhibit

Refer to the exhibit.

{
  "EndpointName": "my-endpoint",
  "EndpointConfigName": "my-endpoint-config-v1",
  "ProductionVariants": [
    {
      "VariantName": "v1",
      "DeployedImages": [
        {
          "SpecifiedImage": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-model:1",
          "ResolvedImage": "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-model:1@sha256:abc123"
        }
      ],
      "CurrentWeight": 1.0,
      "DesiredWeight": 1.0,
      "CurrentInstanceCount": 2,
      "DesiredInstanceCount": 2
    }
  ],
  "EndpointStatus": "InService"
}
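
One common sequence for moving an endpoint to a new image can be sketched as follows. The function takes a client object that behaves like `boto3.client("sagemaker")` and uses the `create_model`, `create_endpoint_config`, and `update_endpoint` operations. The naming scheme, the instance type (not shown in the exhibit), and the variant name are assumptions.

```python
def roll_out_image(sm, endpoint_name, image_uri, role_arn, model_data_url=None,
                   instance_type="ml.m5.large", instance_count=2, suffix="v2"):
    """Create a model and endpoint config for `image_uri`, then update the endpoint.

    `sm` is expected to behave like boto3.client("sagemaker").
    """
    model_name = f"{endpoint_name}-model-{suffix}"    # hypothetical naming scheme
    config_name = f"{endpoint_name}-config-{suffix}"  # hypothetical naming scheme

    container = {"Image": image_uri}
    if model_data_url:
        container["ModelDataUrl"] = model_data_url

    # 1. A model resource that references the new image tag.
    sm.create_model(ModelName=model_name, PrimaryContainer=container,
                    ExecutionRoleArn=role_arn)
    # 2. A new endpoint config that points at the new model.
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": suffix,
            "ModelName": model_name,
            "InstanceType": instance_type,  # assumed; not shown in the exhibit
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1.0,
        }],
    )
    # 3. Switch the live endpoint to the new config.
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return config_name
```

With real credentials this might be called as `roll_out_image(boto3.client("sagemaker"), "my-endpoint", "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-model:2", role_arn)`, where `role_arn` is the execution role for the model.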

Question 77 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A SageMaker endpoint is failing with the exhibited error. What is the most likely cause of this error?

Exhibit

Refer to the exhibit.

[ERROR] 2022-12-01 10:15:30,123 – model_server – ModelLoadFailed: Unable to load model from /opt/ml/model. Parsed error: FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/model/classes.txt'

This log is from a SageMaker endpoint instance. The model was packaged as a tar.gz containing model.pth, classes.txt, and inference.py. The Docker container uses the SageMaker inference toolkit.
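
When debugging this kind of error, it can help to check where files sit inside the archive, because SageMaker extracts `model.tar.gz` into `/opt/ml/model` and a file nested in a subdirectory will not appear at the path the code expects. The check below is a minimal sketch; the helper name is hypothetical and the expected file list is taken from the question.

```python
import io
import tarfile

EXPECTED = {"model.pth", "classes.txt", "inference.py"}  # files named in the question


def missing_at_root(archive_bytes: bytes, expected=EXPECTED) -> set:
    """Return expected files that are not at the top level of a model.tar.gz."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # Normalise "./classes.txt" to "classes.txt"; nested paths keep their "/".
        names = {
            m.name[2:] if m.name.startswith("./") else m.name
            for m in tar.getmembers()
            if m.isfile()
        }
    return set(expected) - names
```

An archive built from the parent directory (so that entries look like `model/classes.txt`) would report `classes.txt` as missing at the root.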

Question 78 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

An ML engineer runs the CLI command shown in the exhibit. However, the training job fails immediately with an error: 'Unable to assume role'. What is the most likely cause?

Exhibit

Refer to the exhibit.

aws sagemaker create-training-job \
    --training-job-name my-training-job \
    --algorithm-specification 'TrainingImage=123456789012.dkr.ecr.us-west-2.amazonaws.com/my-custom-training:latest,TrainingInputMode=File' \
    --role-arn arn:aws:iam::123456789012:role/SageMakerExecutionRole \
    --input-data-config '[{"ChannelName":"train","DataSource":{"S3DataSource":{"S3Uri":"s3://my-bucket/train/","S3DataType":"S3Prefix"}},"ContentType":"text/csv"}]' \
    --output-data-config '{"S3OutputPath":"s3://my-bucket/output/"}' \
    --resource-config '{"InstanceType":"ml.m5.large","InstanceCount":1,"VolumeSizeInGB":30}' \
    --vpc-config '{"SecurityGroupIds":["sg-12345678"],"Subnets":["subnet-12345678"]}'

Question 79 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A data science team has trained a model using SageMaker and wants to deploy it to a production endpoint with automatic scaling based on request volume. Which SageMaker feature should they use to configure scaling?

Question 80 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company is deploying a multi-model endpoint using SageMaker to serve multiple models from a single endpoint. They notice that one model consumes excessive memory and impacts others. What is the BEST practice to isolate resource usage?

Question 81 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

An ML team uses SageMaker Pipelines to automate retraining. After a pipeline failure, they need to reprocess only the failed step without rerunning the entire pipeline. What should they do?

Question 82 (medium, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company wants to deploy a PyTorch model on SageMaker for real-time inference. Which two steps are required? (Select TWO.)

Question 83 (medium, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company uses SageMaker to orchestrate a training pipeline with multiple steps including preprocessing, training, and evaluation. They want to ensure that each step can be reused and tracked. Which three SageMaker features support this? (Select THREE.)

Question 84 (hard, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

An organization is deploying a large language model on SageMaker and needs to optimize inference costs while maintaining low latency. Which three strategies should they consider? (Select THREE.)

Question 85 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A team used the above config to create an endpoint. However, the endpoint fails to invoke because of a "ModelError". What is the most likely cause?

Network Topology

Question 86 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A data scientist runs this pipeline but the Train step fails with "ResourceLimitExceeded". What is the most likely cause?

Exhibit

Refer to the exhibit.
Pipeline definition snippet:
```
{
  "Steps": [
    {
      "Name": "Preprocess",
      "Type": "Processing",
      "Arguments": {
        "ProcessingResources": {
          "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.large",
            "VolumeSizeInGB": 10
          }
        }
      }
    },
    {
      "Name": "Train",
      "Type": "Training",
      "DependsOn": ["Preprocess"],
      "Arguments": {
        "AlgorithmSpecification": {
          "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest",
          "TrainingInputMode": "File"
        },
        "ResourceConfig": {
          "InstanceCount": 2,
          "InstanceType": "ml.p3.2xlarge",
          "VolumeSizeInGB": 30
        }
      }
    }
  ]
}
```

Question 87 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

During deployment of a Hugging Face model, the endpoint logs show this error. Which step was likely missed?

Exhibit

Refer to the exhibit.
CloudWatch Logs from a SageMaker endpoint:
```
[ERROR] Runtime.ImportModuleError: Unable to import module 'inference': No module named 'transformers'
```

Question 88 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company wants to use SageMaker to deploy a model that requires GPU acceleration for inference but also needs to keep costs low when traffic is low. Which SageMaker feature should they use?

Question 89 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A team has a large number of models that need to be deployed for batch inference weekly. They want to minimize cost and management overhead. Which approach is MOST efficient?

Question 90 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

An ML team uses SageMaker Model Registry to manage model versions. They want to automatically deploy a model to a staging endpoint when a new version is approved. Which AWS service can orchestrate this?
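
Approval events from the Model Registry can be picked up with an event pattern. The sketch below shows a pattern of the kind used in an Amazon EventBridge rule, plus a tiny matcher that covers only exact-value lists and nested objects (a small subset of real EventBridge matching). The model package group name is hypothetical.

```python
APPROVED_MODEL_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["my-model-group"],  # hypothetical group name
        "ModelApprovalStatus": ["Approved"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested object: recurse into the matching part of the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True
```

A rule with this pattern would typically target whatever runs the deployment, for example a pipeline or a function; that target is outside the scope of this sketch.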

Question 91 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company deploys a model using SageMaker and enables data capture for monitoring. After a week, they notice that the captured data is not being written to the specified S3 bucket. The endpoint is running and invocations are successful. What is the most likely cause?

Question 92 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A team uses SageMaker Neo to compile a model for deployment on a target device. After compilation, they deploy the compiled model to a SageMaker endpoint using the Neo-optimized container. The endpoint fails to start with error "RuntimeError: Unable to load model". What could be the issue?

Question 93 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company wants to use SageMaker to serve real-time predictions with a model that has a large memory footprint. They need to ensure the endpoint can handle traffic spikes. Which scaling policy should they use?

Question 94 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A data science team wants to deploy a real-time inference endpoint on Amazon SageMaker for a model that requires low latency (under 100 ms). The model is a small ensemble of three tree-based models, each about 50 MB. The team expects around 1000 requests per minute, with occasional spikes to 5000 requests per minute. Which instance type and deployment strategy would be MOST cost-effective while meeting the latency requirement?

Question 95 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company has a SageMaker endpoint that was deployed successfully and is in service. However, when the team sends test inferences using the InvokeEndpoint API, they receive a 500 internal server error. The endpoint logs in CloudWatch show a stack trace indicating 'OutOfMemoryError: Java heap space'. The model is a large XGBoost model (2 GB) and the endpoint is using an ml.m5.large instance with 8 GB of memory. What is the MOST likely cause and solution?

Question 96 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company is running multiple SageMaker endpoints for different models, each serving a separate business unit. The total cost is growing rapidly. The ML engineering team wants to reduce costs without sacrificing performance or isolation. They are considering either consolidating models into a Multi-Model Endpoint (MME) or onto a Multi-Container Endpoint (MCE). The models vary in size from 100 MB to 5 GB, and traffic patterns are unpredictable. Which recommendation is MOST appropriate?

Question 97 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company uses Amazon SageMaker to train and deploy machine learning models. They need to run batch predictions on 10 TB of data stored in Amazon S3 every night. The model is a PyTorch neural network that fits in GPU memory. The predictions are not time-sensitive, but the job must complete within 8 hours. Which approach would be the MOST cost-effective?

Question 98 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A machine learning engineer is configuring auto-scaling for a SageMaker real-time endpoint. The endpoint is expected to have steady traffic during business hours and low traffic at night. The engineer wants to minimize costs by scaling in during low traffic, but the model container has a long start-up time (about 5 minutes). Which scaling policy should the engineer use to prevent request drops during sudden traffic spikes?

Question 99 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company uses a SageMaker endpoint with production variants for canary deployments. The team wants to gradually shift traffic from the old model variant (variant A) to the new model variant (variant B) over a period of 10 minutes. After the shift, if the new variant's error rate increases by more than 5%, they want to roll back automatically. Which solution meets these requirements with minimal manual intervention?

Question 100 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A data scientist needs to version and manage multiple models for a team of five. The team frequently experiments with different algorithms and hyperparameters. They need a centralized registry to store, deploy, and compare model versions. Which AWS service should the data scientist use?

Question 101 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

An ML team uses AWS Step Functions to orchestrate a multi-step inference pipeline: data preprocessing, model inference, and postprocessing. The pipeline runs on demand for single records. The team notices that the pipeline occasionally fails due to timeouts in the preprocessing step. They want to implement retries with exponential backoff and a maximum retry count of 3 for that step. How should they configure this?
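
In Amazon States Language, retries are declared per state with a `Retry` array whose entries carry `ErrorEquals`, `IntervalSeconds`, `MaxAttempts`, and `BackoffRate`. The sketch below builds such a state as a Python dict and computes the resulting wait schedule. The function ARN, state names, and interval values are hypothetical, and which error name applies (`States.Timeout` here) depends on how the timeout surfaces in the real workflow.

```python
def retry_policy(interval_seconds=2, max_attempts=3, backoff_rate=2.0):
    """Retry block for a Task state: exponential backoff on timeouts."""
    return [{
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": interval_seconds,  # wait before the first retry
        "MaxAttempts": max_attempts,          # retries, not counting the first try
        "BackoffRate": backoff_rate,          # multiplier applied per retry
    }]


def wait_schedule(policy):
    """Seconds waited before each retry: interval * rate ** attempt_index."""
    rule = policy[0]
    return [
        rule["IntervalSeconds"] * rule["BackoffRate"] ** i
        for i in range(rule["MaxAttempts"])
    ]


PREPROCESS_STATE = {
    "Type": "Task",
    # Hypothetical function ARN for the preprocessing step.
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:preprocess",
    "TimeoutSeconds": 30,
    "Retry": retry_policy(),
    "Next": "Inference",
}
```

With the defaults above, the three retries wait 2, 4, and 8 seconds respectively.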

Question 102 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company is deploying a machine learning model for real-time inference using Amazon SageMaker. The model is a CPU-intensive XGBoost model that performs well on CPU instances. However, the team wants to reduce latency further by using hardware acceleration. They are considering Amazon Elastic Inference (EI) or moving to a GPU instance. The model is not optimized for GPU, so significant code changes would be required. Which approach is the MOST cost-effective way to reduce latency without changing the model code?

Question 103 (medium, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

An ML team is running multiple SageMaker endpoints for various models. The monthly cost is higher than expected. Which TWO actions would help reduce costs without negatively impacting performance?

Question 104 (hard, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company deploys a SageMaker endpoint that is InService, but inference requests are returning 503 Service Unavailable errors when traffic is high. The endpoint uses three ml.m5.large instances with target tracking scaling based on CPU utilization. The team has confirmed the model container is healthy. Which TWO possible issues could cause 503 errors?

Question 105 (easy, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company is adopting Amazon SageMaker Pipelines to automate its ML workflow. The team wants to identify the key benefits that SageMaker Pipelines provides over manual scripts and ad hoc steps. Which THREE benefits are correct?

Question 106 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

Your team manages a SageMaker real-time endpoint for a financial services application that requires low latency for fraud detection. The model is a 1 GB XGBoost model. The endpoint is deployed on two ml.m5.xlarge instances with target tracking auto-scaling based on average CPU utilization at 70%. During peak hours, the endpoint receives a sudden burst of traffic that increases from 500 requests per second to 2000 requests per second within 30 seconds. Many requests start failing with 503 errors. The CPU utilization metric shows that the instances are at 90% before the scaling policy launches new instances. However, by the time the new instances are added (approximately 3 minutes), the burst has subsided. You need to prevent these failures during future bursts while keeping costs reasonable. Which action would be MOST effective?

Question 107 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

Your company uses SageMaker batch transform to process a large dataset (5 TB) of customer transactions every night. The batch transform job uses a single ml.c5.4xlarge instance and takes about 6 hours to complete. However, the job recently started failing with an error message: 'Timed out waiting for transformation to complete. The maximum job duration is 3600 seconds.' You check the input data and notice that one of the input files is a single large JSON file of 50 GB, while the rest are smaller files. The job is configured with a batch strategy of 'MultiRecord' and a maximum payload size of 6 MB. What is the most likely cause of the timeout and which fix should you apply?

Question 108 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A startup is building a serverless inference API using AWS Lambda. They have a TensorFlow model that is 400 MB in size. They packaged the model and inference code into a Lambda function using a container image. When they test the function with a small input, it consistently times out after 3 seconds. The Lambda function has 512 MB of memory and a timeout of 30 seconds. The business requirement is that inference must complete in less than 5 seconds under normal conditions. What is the most likely cause of the slow performance, and which change should they make?

Question 109 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company deploys a deep learning model to a real-time SageMaker endpoint. After deployment, users report high inference latency. Which action is the MOST effective first step to reduce latency?

Question 110 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company uses SageMaker Pipelines to automate model retraining. The pipeline runs daily but sometimes fails due to data quality issues. What is the best design to handle this?

Question 111 (medium, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A machine learning engineer is deploying a model using SageMaker and needs to ensure that the endpoint can automatically scale based on traffic patterns. Which TWO actions should the engineer take? (Choose two.)
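
Endpoint auto scaling is configured through Application Auto Scaling: the variant is registered as a scalable target and a scaling policy is attached to it. The sketch below only builds the two request payloads, so it runs without AWS credentials. The policy name, target value, and cooldowns are illustrative assumptions.

```python
def scaling_requests(endpoint_name, variant_name, min_capacity, max_capacity,
                     invocations_per_instance):
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target number of invocations per instance per minute (assumed value).
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds; illustrative
            "ScaleOutCooldown": 60,   # seconds; illustrative
        },
    }
    return target, policy
```

With a real client, the payloads would be passed as `aas = boto3.client("application-autoscaling")`, then `aas.register_scalable_target(**target)` and `aas.put_scaling_policy(**policy)`.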

Question 112 (hard, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company is building a CI/CD pipeline for ML models using AWS CodePipeline and SageMaker. The pipeline should include steps to automatically retrain, evaluate, and deploy models. Which THREE components are essential for this pipeline? (Choose three.)

Question 113 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company has deployed a SageMaker real-time endpoint for a model that predicts customer churn. The endpoint uses a single ml.m5.large instance. After deployment, the team notices that during peak hours, the endpoint returns 5xx errors for about 20% of requests. The endpoint has not been configured with any scaling policy. The team needs to resolve this issue with minimal cost increase. Which solution should the team implement?

Question 114 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A data science team uses SageMaker notebooks to develop models. They want to automate the process of training and registering models whenever new data arrives in an S3 bucket. The team has limited DevOps experience and needs a solution that requires minimal maintenance. Which approach should the team use?

Question 115 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company has trained a custom model using PyTorch on Amazon SageMaker. The model achieves high accuracy, but the inference latency on a real-time endpoint is above the required 100ms SLA. The model is a large neural network with many layers. The company wants to reduce latency without significantly impacting accuracy. Which approach should the machine learning engineer take?

Question 116 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A machine learning team is deploying a fraud detection model using SageMaker. They use the SageMaker Model Registry to track model versions. They want to automatically deploy the latest approved model to a production endpoint whenever a new model version is approved. The team uses a CI/CD pipeline with AWS CodePipeline. The pipeline currently includes a source stage (S3), a build stage (CodeBuild), and a deploy stage (manual approval). They want to automate the deployment of approved models. Which solution will meet these requirements with the least operational overhead?

Question 117 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company uses SageMaker for training and inference. They have a model that retrains weekly. After each retraining, the model is evaluated on a held-out test set. If the evaluation metrics meet a threshold, the model is registered as 'Approved' in the SageMaker Model Registry. The team manually deploys the approved model to a production endpoint. They want to automate this deployment process to reduce manual errors. However, the deployment should only proceed if the new model passes a canary test in a staging environment. Which combination of AWS services should the team use to achieve this?

Question 118 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A large enterprise has multiple SageMaker endpoints serving models for different business units. Each endpoint uses a separate instance type and scaling policy. The enterprise wants to implement a unified monitoring and logging solution to track endpoint health, latency, and errors across all endpoints. They also want to set up alerts when the error rate exceeds 5% over a 5-minute period. The solution must be centralized and use AWS-native services. Which solution should the team implement?
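
The "error rate above 5% over 5 minutes" condition can be expressed as a CloudWatch alarm that uses metric math over the `AWS/SageMaker` metrics `Invocation5XXErrors` and `Invocations`. The sketch below builds the keyword arguments for `put_metric_alarm` without calling AWS. The alarm name is hypothetical, and counting only 5XX errors is an assumption; 4XX errors could be added to the expression in the same way.

```python
def error_rate_alarm(endpoint_name, variant_name, threshold_percent=5.0,
                     period_seconds=300):
    """Build put_metric_alarm kwargs for a per-variant 5XX error-rate alarm."""
    dimensions = [
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": variant_name},
    ]

    def summed(metric_id, metric_name):
        # Raw metric, summed per period; used only as input to the expression.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/SageMaker",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period_seconds,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-error-rate",  # hypothetical
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 1,
        "TreatMissingData": "notBreaching",
        "Metrics": [
            summed("errors", "Invocation5XXErrors"),
            summed("total", "Invocations"),
            {
                "Id": "rate",
                "Expression": "100 * errors / total",
                "Label": "5XX error rate (%)",
                "ReturnData": True,  # the alarm evaluates this series
            },
        ],
    }
```

One such alarm per endpoint variant could be created in a loop with `boto3.client("cloudwatch").put_metric_alarm(**error_rate_alarm(name, variant))`.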

Question 119 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A financial services company is deploying a credit risk model using SageMaker. They require that the model always uses the latest approved version from the Model Registry. They also need to maintain a detailed audit trail of all model version transitions (e.g., from PendingApproval to Approved). The deployment should be fully automated and must roll back immediately if the new model's error rate exceeds the old model's error rate by more than 2% during a canary deployment. Which solution meets these requirements with the least custom code?

Question 120 (easy, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A company deployed a machine learning model on an Amazon SageMaker real-time endpoint. Over several weeks, they notice that inference latency has been gradually increasing, especially during peak business hours. The model and instance type have remained unchanged. What is the most likely cause of the increased latency?

Question 121 (medium, multi select)

Read the full Deployment and Orchestration of ML Workflows explanation →

A data science team deploys a TensorFlow model for real-time inference using the Amazon SageMaker model configuration shown. They observe high latency during the first few requests after deployment. Which TWO actions would reduce cold start latency? (Choose two.)

Exhibit

Refer to the exhibit.

{
  "ModelName": "my-model",
  "PrimaryContainer": {
    "Image": "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.11-cpu",
    "Environment": {
      "SAGEMAKER_PROGRAM": "inference.py",
      "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code"
    }
  },
  "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole"
}

Question 122 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A financial services company uses Amazon SageMaker to deploy a fraud detection model for real-time inference. The model is deployed to a SageMaker real-time endpoint on an ml.m5.large instance. The endpoint uses a custom auto scaling policy based on average CPU utilization, with a scale-out threshold of 70% and a scale-in threshold of 30%. During a flash sale event, the traffic to the endpoint spikes tenfold within minutes. The endpoint fails to handle the load, resulting in increased latency and timeouts. The data science team needs to improve the scalability of the endpoint to handle sudden traffic spikes. Which solution should the team implement?

Question 123 (medium, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

An ML team at a financial services company has developed a fraud detection model using Amazon SageMaker. The model is currently deployed to a production endpoint with a single variant using the previous model version. The team wants to deploy a new model version with a canary deployment where 10% of traffic goes to the new version and 90% remains on the old version for 30 minutes before shifting all traffic to the new version if no issues are detected. Which step is essential to achieve this safe rollout?
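
One way to picture the 90/10 split, assuming production variants with weights rather than deployment guardrails, is that each variant receives traffic in proportion to its weight divided by the sum of all weights. The variant names, model names, and instance settings below are hypothetical.

```python
CANARY_VARIANTS = [
    {
        "VariantName": "current",
        "ModelName": "fraud-model-v1",   # hypothetical model name
        "InstanceType": "ml.m5.large",   # assumed instance type
        "InitialInstanceCount": 2,
        "InitialVariantWeight": 9.0,
    },
    {
        "VariantName": "canary",
        "ModelName": "fraud-model-v2",   # hypothetical model name
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]


def traffic_shares(variants):
    """Fraction of requests each variant receives: weight / sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}
```

A list like `CANARY_VARIANTS` is the shape accepted by the `ProductionVariants` argument of `create_endpoint_config`. After the 30-minute observation window, the weights could be changed with `update_endpoint_weights_and_capacities` to send all traffic to the new variant.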

Question 124 (hard, multiple choice)

Read the full Deployment and Orchestration of ML Workflows explanation →

A streaming media company uses Amazon SageMaker to host a recommendation model at a real-time endpoint. The model is updated weekly, and the team deploys new model versions using SageMaker's blue/green deployments. Recently, after a deployment, the new endpoint variant began returning HTTP 503 errors (Service Unavailable) for approximately 5 minutes before stabilizing. The deployment uses a linear transition with a 10-minute window. The old variant continues to serve traffic during the transition. The team notices that the error rate spikes right after the new variant becomes active. The endpoint is configured with two instances for each variant. Instance logs show that the new model container is taking longer than expected to load and initialize (e.g., downloading model artifacts from S3 and loading into memory). The team needs to resolve this issue without changing the model or container image. Which combination of actions should the team take to eliminate the 503 errors?