MLA-C01 ML Solution Monitoring, Maintenance and Security — All Questions With Answers

Question 1 (easy, multiple choice)

A machine learning engineer at a retail company is monitoring a production model that predicts inventory demand. The model's prediction accuracy has dropped significantly over the past week. The engineer checks the model's input data and notices a new product category was introduced with a different distribution. Which concept is most likely causing the performance degradation?
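The scenario above turns on comparing the input distribution seen in production with the one seen at training time. The sketch below shows one simple way to quantify such a shift for a categorical feature, using total variation distance. The category names and counts are hypothetical, and this is only an illustration of the idea, not how any particular AWS service computes drift.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the observations."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def total_variation_distance(train_values, prod_values):
    """Half the sum of absolute share differences: 0 = identical, 1 = disjoint."""
    p = category_shares(train_values)
    q = category_shares(prod_values)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical product-category feature: training data vs. last week's traffic.
train = ["grocery"] * 60 + ["apparel"] * 40
prod = ["grocery"] * 45 + ["apparel"] * 30 + ["electronics"] * 25

tvd = total_variation_distance(train, prod)
unseen = sorted(set(prod) - set(train))  # categories the model never saw in training

print(f"total variation distance: {tvd:.2f}")
print(f"categories unseen in training: {unseen}")
```

A category that appears only in production is a strong signal that the model is being asked to predict on inputs outside its training distribution.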

Question 2 (medium, multiple choice)

A data science team is using Amazon SageMaker to train and deploy a binary classification model. They want to continuously monitor the model for data drift in production. Which combination of AWS services and SageMaker features should they use to implement automated drift detection with minimal operational overhead?

Question 3 (hard, multiple choice)

A financial services company uses a custom container on Amazon SageMaker to serve a fraud detection model. The model's inference latency has recently increased, causing timeouts for some requests. The team reviews the SageMaker logs and finds that the container is consuming more memory than allocated. What should the team do to maintain service quality while ensuring cost-effectiveness?

Question 4 (medium, multi-select)

A machine learning team is building a CI/CD pipeline for model deployment using Amazon SageMaker. They need to ensure that all model artifacts are encrypted at rest and in transit, and that access to the models is controlled via IAM. Which TWO actions should the team take to meet these requirements? (Choose TWO.)

Question 5 (hard, multi-select)

A healthcare company deploys a model to predict patient readmission risk. The model was trained on historical data and is now showing signs of concept drift. The team needs to implement a monitoring solution that can detect drift and automatically retrain the model when drift is detected. Which THREE steps should the team take to build this solution? (Choose THREE.)

Question 6 (easy, multi-select)

A company is using Amazon SageMaker to host a real-time inference endpoint. They want to restrict access to the endpoint to only a specific VPC and require authentication using AWS IAM. Which TWO configuration steps should they take to achieve this? (Choose TWO.)

Question 7 (medium, multiple choice)

A machine learning engineer is troubleshooting a model that is producing unexpectedly low accuracy in production. The engineer examines the model's training data and finds that the distribution of the target variable in production is significantly different from the training set. What type of drift is the model experiencing?

Question 8 (hard, multiple choice)

A team deploys a machine learning model using a SageMaker endpoint with an ML.T4 instance. After a week, they notice that the endpoint's CPU utilization is consistently below 10% and latency is low. However, the endpoint is incurring high costs. Which action should the team take to reduce costs while maintaining the ability to serve traffic?

Question 9 (easy, multiple choice)

A company is using Amazon SageMaker to train a model on sensitive customer data. The security team requires that all data be encrypted in transit and at rest, and that the training job does not have internet access. Which configuration should the team use to meet these requirements?

Question 10 (easy, multiple choice)

A company has a SageMaker endpoint that uses a trained model to classify images. The endpoint is experiencing high latency and the team suspects it is due to the model size. Which action can the team take to reduce latency without significantly impacting accuracy?

Question 11 (easy, multiple choice)

A data science team deploys a regression model using Amazon SageMaker. After one week, the model's prediction accuracy drops significantly. The team needs to detect this degradation automatically and trigger retraining. Which AWS service should they use to monitor the model's performance over time and set up alerts?

Question 12 (medium, multiple choice)

A company uses Amazon SageMaker to host a real-time inference endpoint for a fraud detection model. The endpoint is deployed with three instances of ml.m5.large. The model processes each request in about 200 ms. Lately, users report occasional timeouts (requests taking >5 seconds). The team suspects model drift or data skew. What is the MOST likely cause and solution?

Question 13 (hard, multiple choice)

A machine learning engineer deploys a model to an Amazon SageMaker endpoint with data capture enabled. The endpoint uses a production variant with initial instance count of 2. After a week, they notice that the captured data is not being sent to the specified Amazon S3 bucket. The IAM role used by the endpoint has the following policy attached. What is the MOST likely reason for the failure?

Question 14 (easy, multiple choice)

A company uses Amazon Rekognition to moderate user-generated images. They want to set up a monitoring system that alerts the team if the number of inappropriate images flagged by the model exceeds a threshold. Which combination of AWS services should they use?

Question 15 (medium, multiple choice)

A team deploys a PyTorch model on Amazon SageMaker for real-time inference. They notice that inference latency is higher than expected. They suspect the serialization format used for input data is inefficient. Which approach would MOST likely reduce latency?

Question 16 (hard, multiple choice)

A financial services company deploys a credit risk model using an Amazon SageMaker endpoint with data capture enabled. The model uses a custom container. The compliance team requires that all inference requests and responses are logged to an S3 bucket with server-side encryption using AWS KMS. The IAM role for the endpoint has the following policy. What must be added to meet the compliance requirement?
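For context on the scenario above, data capture is configured on the endpoint configuration. The sketch below builds the `DataCaptureConfig` structure accepted by the boto3 SageMaker client's `create_endpoint_config` call, including the optional `KmsKeyId` field. The helper name, bucket URI, and key ARN are hypothetical placeholders, and the sketch covers only the capture configuration; the IAM and key-policy permissions the question asks about are separate.

```python
def build_data_capture_config(s3_uri, kms_key_id, sampling_percentage=100):
    """Build a DataCaptureConfig dict for sagemaker.create_endpoint_config."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": s3_uri,
        # KMS key used to encrypt the captured data written to S3.
        "KmsKeyId": kms_key_id,
        # Capture both the request payload and the model response.
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# Hypothetical values for illustration only.
capture_config = build_data_capture_config(
    s3_uri="s3://example-capture-bucket/credit-risk/",
    kms_key_id="arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
)
print(capture_config)
```

The resulting dict would be passed as the `DataCaptureConfig` argument when creating the endpoint configuration.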

Question 17 (easy, multi-select)

Which TWO actions are recommended best practices for securing an Amazon SageMaker notebook instance? (Select TWO.)

Question 18 (medium, multi-select)

Which THREE components are required to set up automated model retraining in response to performance degradation using Amazon SageMaker? (Select THREE.)

Question 19 (hard, multiple choice)

A company operates an e-commerce platform that uses a machine learning model to recommend products to users. The model is deployed on an Amazon SageMaker endpoint with automatic scaling enabled based on average CPU utilization. The model was trained on historical data and is updated weekly. Recently, the platform experienced a flash sale event that caused a sudden spike in traffic. During the event, the endpoint's latency increased dramatically, and many requests timed out. After the event, the team reviews the CloudWatch metrics and notices that the CPU utilization never exceeded 70%, and the scaling policy was triggered but instances took several minutes to become available. The team wants to prevent similar issues in future flash sales. Which course of action would be MOST effective?

Question 20 (medium, multiple choice)

A healthcare company deploys a model that predicts patient readmission risk. The model is deployed using a SageMaker real-time endpoint with data capture enabled. The compliance team requires that all inference data be encrypted at rest in S3 using AWS KMS with a customer managed key. The team has configured the endpoint to use an IAM role that includes the necessary KMS permissions. However, after deployment, the captured data is not being written to the S3 bucket. The team checks the CloudWatch logs for the endpoint and finds no errors. The S3 bucket policy is as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}

The bucket also has a default KMS key. What is the MOST likely reason that the captured data is not being written?

Question 21 (easy, multiple choice)

A data science team deploys a real-time inference endpoint on Amazon SageMaker. They want to monitor for data drift in the input features over time. Which AWS service should they use to capture and analyze the input data distribution?

Question 22 (medium, multiple choice)

A company uses SageMaker endpoints with auto-scaling. The endpoint is experiencing high latency during peak hours. The metrics show that CPU utilization is low but memory utilization is high. What is the most likely cause?

Question 23 (hard, multiple choice)

An ML team trained a model using SageMaker and stored the model artifacts in S3 with server-side encryption using AWS KMS (SSE-KMS). They need to deploy the model to a SageMaker endpoint that uses a different KMS key for inference data encryption. What must they do to ensure the endpoint can decrypt the model artifacts?

Question 24 (easy, multiple choice)

A company wants to audit all API calls made to SageMaker endpoints for security compliance. Which AWS service should they enable?

Question 25 (medium, multiple choice)

A machine learning model is deployed on SageMaker and its predictions are used in a production application. The model's accuracy has degraded over time. What is the most likely cause?

Question 26 (hard, multiple choice)

A company uses SageMaker training jobs that need to access data in an S3 bucket in a different AWS account. The bucket uses a bucket policy that allows access only from a specific VPC. How should they configure the training job?

Question 27 (easy, multiple choice)

A team wants to automatically retrain a model when new labeled data arrives. Which SageMaker feature can orchestrate this workflow?

Question 28 (medium, multiple choice)

A model deployed on a SageMaker endpoint is returning predictions. The team wants to log all predictions to an S3 bucket for auditing. What is the most efficient way to achieve this?

Question 29 (hard, multiple choice)

A company wants to restrict access to a SageMaker notebook instance so that only a specific IAM role can open the notebook via JupyterLab. The notebook instance is associated with a lifecycle configuration that installs custom packages. What is the correct way to enforce access control?

Question 30 (easy, multi-select)

A team uses SageMaker Ground Truth to create labeled datasets. They need to ensure labeling jobs are cost-effective. Which TWO measures should they take? (Select TWO.)

Question 31 (medium, multi-select)

An ML engineer is setting up monitoring for a SageMaker endpoint. Which THREE metrics should be monitored to detect performance issues? (Select THREE.)

Question 32 (hard, multi-select)

A company needs to secure a SageMaker notebook instance that contains sensitive data. Which THREE of the following are effective security measures? (Select THREE.)

Question 33 (easy, multiple choice)

Refer to the exhibit. A team has configured data capture for a SageMaker endpoint. The endpoint is returning predictions but no captured data appears in the S3 bucket. What is the most likely cause?

Exhibit: Network Topology

Question 34 (medium, multiple choice)

Refer to the exhibit. A data scientist tries to deploy a model from an S3 bucket encrypted with SSE-KMS. What should the administrator do to resolve this?

Exhibit

Error from SageMaker: ClientError: Cannot use encrypted model artifact. 
The SageMaker execution role (arn:aws:iam::123456789012:role/SageMakerRole) 
must have kms:Decrypt permission on the KMS key (arn:aws:kms:us-east-1:123456789012:key/abcd1234-...)

Question 35 (hard, multiple choice)

Refer to the exhibit. An IAM policy is attached to a user to allow invoking a SageMaker endpoint. A developer tries to call the endpoint from a laptop with IP 203.0.113.5 and receives an access denied error. What is the most likely reason?

Exhibit

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sagemaker:InvokeEndpoint",
      "Resource": "arn:aws:sagemaker:us-east-1:123456789012:endpoint/my-endpoint",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "10.0.0.0/8"
        }
      }
    }
  ]
}
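
The `IpAddress` condition in the exhibit is a CIDR membership test on the caller's source IP. The sketch below reproduces only that membership check with Python's standard `ipaddress` module; it is not a model of full IAM policy evaluation, and the helper name and the second test address are hypothetical.

```python
import ipaddress


def source_ip_matches(source_ip, cidr):
    """Return True if source_ip falls inside the CIDR block."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)


# The developer's laptop address from the question, against the policy's CIDR.
laptop_allowed = source_ip_matches("203.0.113.5", "10.0.0.0/8")

# A hypothetical address inside the 10.0.0.0/8 range, for comparison.
private_allowed = source_ip_matches("10.1.2.3", "10.0.0.0/8")

print(f"203.0.113.5 in 10.0.0.0/8: {laptop_allowed}")
print(f"10.1.2.3 in 10.0.0.0/8: {private_allowed}")
```

When the condition on an Allow statement does not match, the statement grants nothing, and the request falls back to the implicit deny.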

Question 36 (easy, multiple choice)

A company uses Amazon SageMaker to deploy a real-time inference endpoint. They notice increased latency in predictions during peak hours. What should they investigate first to address the issue?

Question 37 (medium, multiple choice)

A company trains a model daily using Amazon SageMaker and uses the model for real-time inference. They want to detect data drift between the training data and the inference data to decide when to retrain. Which AWS service should they use for this purpose?

Question 38 (hard, multiple choice)

A company deploys a machine learning model as a SageMaker real-time endpoint. They need to implement a mechanism to automatically roll back to the previous model version if performance degrades after a deployment. Which approach should they use?

Question 39 (easy, multiple choice)

A company wants to ensure that only authorized users and services can invoke a SageMaker real-time endpoint. Which AWS service can be used to manage access control?

Question 40 (medium, multiple choice)

A team deploys a model with SageMaker and notices that the model returns inconsistent results during inference. They suspect a mismatch in feature transformation between the training pipeline and the inference pipeline. Which SageMaker feature can help compare the feature distributions?

Question 41 (hard, multiple choice)

A company uses Amazon SageMaker Ground Truth to create a labeled dataset. They want to monitor the accuracy of human labelers during the labeling process. Which metric should they track?

Question 42 (easy, multiple choice)

A company stores its model training data in Amazon S3. To meet compliance requirements, all data in transit between the S3 bucket and SageMaker must be encrypted. What should the company enforce?

Question 43 (medium, multiple choice)

A team uses AWS Auto Scaling for a SageMaker real-time endpoint. They notice that when scaling in, the latest instance is always terminated first, causing disruption to recent requests. How can they configure the scaling policy to terminate the oldest instance first?

Question 44 (hard, multiple choice)

A company deploys a SageMaker model using AWS KMS for encryption at rest. They have a compliance requirement to rotate the KMS key every year without causing downtime for the inference endpoint. Which approach should they take?

Question 45 (medium, multi-select)

A company uses Amazon SageMaker Model Monitor to track data quality. The monitoring job triggers an alert indicating that the data distribution has shifted beyond the configured threshold. Which TWO actions should the team take? (Choose TWO.)

Question 46 (hard, multi-select)

A company wants to monitor their machine learning model for bias over time. Which THREE AWS services or features can they use to achieve this? (Choose THREE.)

Question 47 (easy, multi-select)

A company stores training data in Amazon S3 and uses Amazon SageMaker for model training. They need to ensure data is encrypted at rest. Which THREE encryption options are supported by SageMaker for data stored in S3? (Choose THREE.)

Question 48 (medium, multiple choice)

Refer to the exhibit. A team observes that their SageMaker endpoint scales out quickly when load increases, but scales in very slowly when load decreases, causing over-provisioning. What is the most likely cause?

Exhibit

{
    "PolicyARN": "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:policy-1",
    "PolicyName": "SageMakerEndpointScalingPolicy",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 600,
        "ScaleOutCooldown": 200
    }
}

Question 49 (hard, multiple choice)

Refer to the exhibit. A team receives an error when running a SageMaker Model Monitor schedule for data quality. What should they do to resolve this issue?

Exhibit

2024-01-01 12:00:00 ERROR - Baseline configuration is missing for data quality monitoring. Unable to evaluate constraints.
2024-01-01 12:00:01 ERROR - Monitoring job failed.

Question 50 (easy, multiple choice)

Refer to the exhibit. A user has the IAM policy shown in the exhibit attached but cannot access files in SageMaker Studio. What additional permission is most likely needed?

Exhibit

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:CreatePresignedDomainUrl",
            "Resource": "*"
        }
    ]
}

Question 51 (easy, multiple choice)

A data science team deploys a machine learning model to a SageMaker endpoint for real-time inference. They need to monitor the model for feature distribution drift over time to ensure the model's predictions remain accurate. Which AWS service should they use?

Question 52 (medium, multiple choice)

A company's SageMaker endpoint is experiencing increased latency during peak hours. The endpoint uses a single ml.m5.large instance. The deployment is critical and must maintain low latency. Which action is MOST effective to reduce latency without sacrificing cost efficiency?

Question 53 (hard, multiple choice)

A financial services company uses SageMaker to train and deploy models. They must ensure that all model artifacts stored in S3 are encrypted at rest using customer-managed KMS keys. Additionally, only the SageMaker service role should have access to the encryption key for decrypting artifacts during inference. Which IAM policy configuration meets these requirements?

Question 54 (easy, multi-select)

A machine learning engineer is monitoring a production SageMaker endpoint using Amazon CloudWatch. They want to set up alarms for anomalous behavior. Which TWO CloudWatch metrics are MOST appropriate for detecting a sudden increase in request latency? (Choose TWO.)

Question 55 (medium, multi-select)

A data science team detects that a deployed model's prediction accuracy is degrading over time due to concept drift. They need to implement a retraining strategy. Which THREE actions are recommended best practices for handling concept drift? (Choose THREE.)

Question 56 (hard, multi-select)

A company operates multiple AWS accounts with SageMaker workloads. They need to implement governance and security controls for model monitoring and maintenance. Which THREE actions should they take to meet compliance requirements? (Choose THREE.)

Question 57 (medium, multiple choice)

Refer to the exhibit. A data engineer investigates why a SageMaker endpoint is returning errors. The endpoint configuration has been updated to point to a new model version. What is the MOST likely cause of the error?

Exhibit

[ERROR] 2024-03-15 10:23:45,123 - sagemaker - 1321 - root - ERROR - InvocationException: Received response status code 404 from container. Error: ResourceNotFoundException: Model 'my-model-v2' is not found. You may be using an outdated endpoint configuration.

Question 58 (hard, multiple choice)

Refer to the exhibit. A data scientist uses a SageMaker notebook instance to read a model file from the S3 bucket 'my-bucket'. The bucket uses SSE-KMS encryption with a KMS key. The IAM role attached to the notebook has the policy shown in the exhibit. However, reading the file fails. What is the MOST likely reason?

Exhibit

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": "arn:aws:kms:us-east-1:123456789012:key/abc123"
        }
    ]
}

Question 59 (easy, multiple choice)

Refer to the exhibit. A data engineer runs a SageMaker processing job that fails. What is the MOST likely cause of the failure?

Exhibit

{
    "ProcessingJobName": "my-processing-job",
    "ProcessingJobStatus": "Failed",
    "FailureReason": "ClientError: Unable to read data from input source: s3://my-bucket/input/data.csv. Please check the path and ensure the file exists."
}

Question 60 (easy, multiple choice)

A company wants to maintain multiple versions of a trained model in a central repository and track metadata such as training metrics, hyperparameters, and approval status. Which SageMaker feature should they use?

Question 61 (medium, multiple choice)

An e-commerce company uses a machine learning model to predict customer churn. They notice that the model's performance degrades after a major marketing campaign changes customer behavior. Which approach is MOST effective to detect and respond to this type of concept drift?

Question 62 (hard, multiple choice)

A company's ML pipeline runs in multiple AWS accounts (dev, test, prod). They want to enforce that only approved models from a central Model Registry can be deployed to the production account. Which combination of services is MOST appropriate to implement this governance?

Question 63 (medium, multiple choice)

A SageMaker training job has been running for several hours but shows no progress. The job is using a custom Docker container. The engineer suspects a bug in the training script. Which tool is BEST to debug the training job without stopping it?

Question 64 (hard, multiple choice)

A company runs a real-time inference endpoint with an auto-scaling policy based on average CPU utilization. During a traffic spike, the endpoint scales out but takes several minutes to become healthy, causing increased latency. The endpoint uses a large instance type. Which change would MOST effectively reduce the time to scale out?

Question 65 (easy, multiple choice)

A company wants to automate the deployment of a SageMaker model into production whenever a new model version is approved in the Model Registry. Which service can be used to trigger the deployment pipeline?

Question 66 (medium, multiple choice)

A data science team deployed a model on Amazon SageMaker and enabled Model Monitor to detect data drift. After a week, they receive alerts indicating that the distribution of a key feature has shifted significantly. However, the model's accuracy on the recent production data remains high. Which action should the team take next?

Question 67 (hard, multiple choice)

A company's SageMaker real-time endpoint is experiencing high latency under load. The CloudWatch metrics show that the ModelLatency is acceptable, but the OverheadLatency is spiking. What is the most likely cause?
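To observe the pattern described above before users report it, a team could alarm on `OverheadLatency` separately from `ModelLatency`. The sketch below builds keyword arguments for the boto3 CloudWatch client's `put_metric_alarm` call. The helper name, endpoint name, variant name, period, and threshold are hypothetical choices; SageMaker publishes both latency metrics in microseconds under the `AWS/SageMaker` namespace.

```python
def overhead_latency_alarm(endpoint_name, variant_name, threshold_ms):
    """Build kwargs for cloudwatch.put_metric_alarm on OverheadLatency."""
    return {
        "AlarmName": f"{endpoint_name}-{variant_name}-overhead-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "OverheadLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,  # consecutive breaching datapoints (assumed)
        # The metric is reported in microseconds, so convert from milliseconds.
        "Threshold": threshold_ms * 1000,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical endpoint and variant names.
alarm = overhead_latency_alarm("my-endpoint", "AllTraffic", threshold_ms=250)
print(alarm)
```

The dict could then be passed as `cloudwatch.put_metric_alarm(**alarm)`, typically with alarm actions added for notification.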

Question 68 (easy, multiple choice)

A machine learning engineer wants to encrypt model artifacts stored in Amazon S3. The artifacts are created and used by SageMaker training jobs and endpoints. What is the simplest way to ensure encryption at rest?

Question 69 (medium, multiple choice)

A team uses SageMaker Clarify to monitor bias drift in production. They schedule weekly analysis. After a month, Clarify reports a significant increase in a bias metric. What should the team do first?

Question 70 (hard, multiple choice)

A model deployed on SageMaker uses custom inference code. The endpoint is showing intermittent 500 errors. CloudWatch logs reveal 'TimeoutError: Request timed out after 60 seconds'. The model takes on average 55 seconds to process. What is the most effective solution?

Question 71 (easy, multiple choice)

A company requires that all SageMaker notebook instances be created within a private VPC without internet access. Which configuration step is mandatory?

Question 72 (medium, multiple choice)

A team uses SageMaker Model Monitor to track data quality. They notice that the monitor's constraint violations are increasing but the model performance remains good. What should they do?

Question 73 (hard, multiple choice)

A machine learning engineer is setting up automated retraining for a model using SageMaker Pipelines. The pipeline should trigger when a data drift alert is received from Model Monitor. Which event source should the engineer use to initiate the pipeline?

Question 74 (easy, multiple choice)

A data scientist wants to version control trained models and manage approvals for deployment. Which SageMaker feature should they use?

Question 75 (medium, multi-select)

A company wants to secure access to a SageMaker real-time endpoint. Which TWO actions should be taken? (Select TWO.)

Question 76 (hard, multi-select)

A machine learning team is setting up Model Monitor for a deployed model. Which THREE factors should they consider when configuring the monitoring schedule? (Select THREE.)

Question 77 (easy, multi-select)

A data scientist wants to monitor a deployed model for performance degradation. Which TWO metrics from Amazon CloudWatch should they use to detect issues? (Select TWO.)

Question 78 (medium, multiple choice)

Refer to the exhibit. A SageMaker endpoint is failing health checks. What is the most likely cause?

Exhibit

[2024-01-15 10:23:45.123] [ERROR] [ContainerHealthCheck] 
Health check failed: Error: Unable to connect to endpoint.
[2024-01-15 10:23:45.456] [INFO] [ModelServer] 
Starting model server...
[2024-01-15 10:23:50.789] [ERROR] [ModelServer] 
Model server failed to start: OSError: [Errno 24] Too many open files

Question 79 (hard, multiple choice)

Refer to the exhibit. A SageMaker training job using this IAM role fails with an access denied error when trying to read a file from s3://my-bucket/training-data/model_input.csv. However, a different file at s3://my-bucket/training-data/input/data.csv can be read successfully. What is the most likely reason?

Exhibit

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {
        "StringEquals": {
          "s3:prefix": "training-data/"
        }
      }
    }
  ]
}

Question 80easymultiple choice

Refer to the exhibit. A team configured a SageMaker Model Monitor schedule for data quality. The baseline was created from a training dataset. After running for a day, the monitoring results show frequent violations. What is the most likely cause?

Exhibit

{
  "MonitoringScheduleName": "data-quality-monitor",
  "MonitoringType": "DataQuality",
  "ScheduleConfig": {
    "ScheduleExpression": "cron(0 * * * ? *)"
  },
  "MonitoringJobDefinition": {
    "BaseliningJobDefinition": {
      "BaselineJobName": "baseline-job-1",
      "BaseliningJobOutputConfig": {
        "MonitoringOutputS3Uri": "s3://my-bucket/baseline/"
      }
    },
    "MonitoringOutputConfig": {
      "MonitoringOutputS3Uri": "s3://my-bucket/monitoring-results/"
    },
    "Environment": {
      "max_runtime_in_seconds": "3600"
    }
  }
}

Question 81easymultiple choice

A data science team deploys a regression model to Amazon SageMaker for real-time inference. After one month, the model's prediction errors increase significantly, but data distributions remain unchanged. Which monitoring approach is MOST suitable for detecting this issue?

Question 82easymultiple choice

A company uses an Amazon SageMaker endpoint for real-time inference. The security team requires that all traffic between the endpoint and the client application be encrypted in transit. Which configuration ensures this?

Question 83mediummultiple choice

A machine learning team deploys a custom container image for an Amazon SageMaker training job. The container needs to access an S3 bucket that contains sensitive data. The team wants to follow the principle of least privilege. How should the team grant access?

Question 84mediummultiple choice

A company has a batch transform job in Amazon SageMaker that processes large datasets every night. Recently, the job has been failing sporadically with an out-of-memory error. The data size has not increased. What is the MOST likely cause?

Question 85hardmultiple choice

An e-commerce company uses a multi-model endpoint on Amazon SageMaker to serve several deep learning models. After a new model version is deployed, the endpoint starts returning 503 errors for some models. Monitoring shows that the endpoint's memory utilization is near 100%. What should the team do to resolve this issue while minimizing operational overhead?

Question 86easymulti select

A machine learning engineer is setting up an Amazon SageMaker notebook instance. The instance needs to access a private S3 bucket that contains training data. The notebook instance is in a VPC. Which combination of steps will grant access to the S3 bucket? (Choose TWO.)

Question 87mediummulti select

A company is using an Amazon SageMaker pipeline for automated retraining. The pipeline fails intermittently due to transient errors in the training job. Which steps should the team take to ensure the pipeline completes successfully? (Choose THREE.)

Question 88hardmulti select

A financial services company must ensure that all data used by Amazon SageMaker training jobs is encrypted at rest. The company wants to use a customer-managed key (CMK) for the encryption. Which steps are necessary to achieve this? (Choose TWO.)

Question 89mediummulti select

A team deploys a machine learning model using an Amazon SageMaker endpoint. They need to monitor for data drift and model quality issues. Which AWS services or features should they use? (Choose THREE.)

Question 90hardmultiple choice

Refer to the exhibit. A SageMaker execution role has the IAM policy shown. The team attempts to run a training job that writes results to 's3://my-bucket/training/output/model.tar.gz'. What will happen?

Exhibit

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateTrainingJob",
                "sagemaker:DescribeTrainingJob",
                "sagemaker:StopTrainingJob"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::my-bucket/training/*"
        },
        {
            "Effect": "Deny",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/training/sensitive-data.csv"
        }
    ]
}
```
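
One way to reason about the exhibit is to model IAM's core precedence rule: a matching explicit Deny wins, otherwise a matching Allow grants access, otherwise the request is implicitly denied. The sketch below is a deliberately simplified, hypothetical evaluator (`evaluate`, `_matches`, and `POLICY` are illustrative names, not an AWS API). It ignores conditions, principals, bucket policies, and case-insensitive action matching, and treats `*` as a plain wildcard.

```python
from fnmatch import fnmatchcase

# Statements copied from the exhibit, reduced to the fields this sketch uses.
POLICY = [
    {
        "Effect": "Allow",
        "Action": [
            "sagemaker:CreateTrainingJob",
            "sagemaker:DescribeTrainingJob",
            "sagemaker:StopTrainingJob",
        ],
        "Resource": "*",
    },
    {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::my-bucket/training/*",
    },
    {
        "Effect": "Deny",
        "Action": "s3:PutObject",
        "Resource": "arn:aws:s3:::my-bucket/training/sensitive-data.csv",
    },
]


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def _matches(statement, action, resource):
    """True if both the action and the resource match the statement's patterns."""
    action_ok = any(fnmatchcase(action, a) for a in _as_list(statement["Action"]))
    resource_ok = any(fnmatchcase(resource, r) for r in _as_list(statement["Resource"]))
    return action_ok and resource_ok


def evaluate(policy, action, resource):
    """Simplified precedence: explicit Deny > Allow > implicit deny."""
    effects = [s["Effect"] for s in policy if _matches(s, action, resource)]
    if "Deny" in effects:
        return "ExplicitDeny"
    if "Allow" in effects:
        return "Allow"
    return "ImplicitDeny"
```

Under these assumptions, `evaluate(POLICY, "s3:PutObject", "arn:aws:s3:::my-bucket/training/output/model.tar.gz")` returns `"Allow"`, because the Deny statement names a different object.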

Question 91easymultiple choice

Refer to the exhibit. A data scientist reviews the CloudWatch Logs from an Amazon SageMaker real-time endpoint. What is the MOST likely root cause of the NaN output?

Exhibit

```
2024-01-15 10:23:45,123 [INFO] Starting inference at endpoint ...
2024-01-15 10:23:45,456 [ERROR] Model output contains NaN values.
2024-01-15 10:23:45,457 [WARN] Input feature x has value -9999.0 which is unusual.
```
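
The warning about `-9999.0` hints that a missing-value placeholder may be reaching the model unhandled. Here is a minimal sketch of a pre-inference check, assuming (the exhibit does not state this) that `-9999.0` is such a sentinel. `find_suspect_features` and `SENTINELS` are hypothetical names.

```python
import math

# Assumption: -9999.0 is used upstream as a "missing value" placeholder.
SENTINELS = {-9999.0}


def find_suspect_features(features, sentinels=SENTINELS):
    """Return names of features whose value is NaN, infinite, or a known sentinel."""
    suspects = []
    for name, value in features.items():
        if math.isnan(value) or math.isinf(value) or value in sentinels:
            suspects.append(name)
    return suspects
```

A caller could reject or impute any request for which this function returns a non-empty list before passing it to the model.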

Question 92mediummultiple choice

A company uses an Amazon SageMaker endpoint with auto-scaling. They notice that during traffic bursts, new instances take several minutes to become healthy, causing 503 errors. What is the BEST way to reduce the time to serve requests during scaling events?

Question 93hardmultiple choice

A team is deploying a model that requires GPU acceleration for inference. They are using an Amazon SageMaker real-time endpoint. The model is a large language model (LLM) that does not fit on a single GPU. Which configuration should they use to minimize latency while fitting the model?

Question 94mediummultiple choice

A company uses Amazon SageMaker Pipelines for automated retraining. The pipeline includes a processing step that runs a Python script. The script uses the boto3 library to call an AWS service, but the calls are being throttled. What is the MOST effective way to address this within the pipeline?
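
Throttled API calls are commonly handled by retrying with exponential backoff and jitter. The sketch below shows that pattern in plain Python. `ThrottlingError` is a hypothetical stand-in for whatever throttling exception the real client raises, and `call_with_backoff` is an illustrative helper, not part of boto3.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a throttling error raised by an AWS client."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=20.0,
                      sleep=time.sleep):
    """Call func(), retrying on ThrottlingError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; surface the error to the caller
            # Cap the exponential delay, then sleep a random fraction of it.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

In practice it may be simpler to rely on the SDK's built-in retries: boto3 clients accept a botocore `Config` whose `retries` setting takes `max_attempts` and a `mode` such as `standard` or `adaptive`.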

Question 95hardmultiple choice

A healthcare company uses Amazon SageMaker to deploy a real-time inference endpoint for a diagnostic model. The endpoint is configured with a single ml.p3.2xlarge instance. The model processes patient data and returns a risk score. Recently, the endpoint has been experiencing intermittent 504 errors along with increased latency. The team uses Amazon CloudWatch to monitor the endpoint's InvocationsPerInstance and ModelLatency metrics. They observe that InvocationsPerInstance is well below the throttling threshold, but ModelLatency shows periodic spikes lasting 5-10 seconds. The endpoint's CPU utilization remains below 60%, but memory utilization occasionally rises to 90% during those latency spikes. The team has checked the inference code and found no obvious memory leaks or performance bottlenecks in the custom logic. The model itself is a deep neural network hosted using Apache MXNet. The team suspects that the issue might be related to resource contention or an external dependency. What should the team do FIRST to diagnose and resolve the issue?

Question 96mediummultiple choice

A company uses Amazon SageMaker to train and deploy a machine learning model. After deployment, they notice that the model's accuracy drops significantly over time due to changes in the underlying data distribution. Which monitoring solution should they implement to detect this issue automatically?

Question 97hardmultiple choice

A team is deploying a real-time inference endpoint in SageMaker. The model requires access to an S3 bucket containing customer data, which is encrypted with SSE-KMS. The team needs to ensure that the endpoint can decrypt the data. Which IAM role configuration is necessary?

Question 98easymultiple choice

A data scientist trained a model using SageMaker and wants to automate the retraining process when new data becomes available. Which AWS service is best suited to trigger a SageMaker training job based on an S3 event?

Question 99mediummultiple choice

After deploying a model to a SageMaker endpoint, the operations team notices high inference latency. They suspect it is due to insufficient instance capacity. What should they do FIRST to diagnose the issue?

Question 100hardmultiple choice

A company is using a SageMaker notebook instance to develop models. The security team requires that all data in the notebook be encrypted at rest and in transit, and that internet access be restricted. Which configuration meets these requirements?

Question 101easymultiple choice

A team uses SageMaker Pipelines to automate model retraining. After a successful pipeline run, they want to register the new model version in the SageMaker Model Registry so that it can be reviewed for approval. Which step type should they add to the pipeline?

Question 102mediummultiple choice

A financial services company is deploying a model for loan approval. They must ensure that the model's predictions do not show bias against protected groups. They plan to monitor for bias drift after deployment. Which SageMaker feature should they use?

Question 103mediummultiple choice

A company is using SageMaker endpoints for inference. To reduce costs, they want to use Automatic Scaling. However, they observe that scaling up takes several minutes, causing latency spikes during traffic bursts. What should they do to mitigate this?

Question 104hardmulti select

A company has deployed a model to a SageMaker endpoint. The security team wants to ensure that all traffic between the endpoint and the client application is encrypted and that the endpoint is not accessible from the internet. Which TWO actions should the company take? (Choose TWO.)

Question 105mediummulti select

A data science team uses SageMaker Studio to collaborate. They need to restrict access to certain SageMaker Studio applications (e.g., only JupyterLab, no RStudio). Which THREE steps should they take? (Choose THREE.)

Question 106easymulti select

A company uses SageMaker Model Monitor to detect drift. They want to receive notifications when drift is detected. Which TWO services can be used together to send notifications? (Choose TWO.)

Question 107hardmultiple choice

A retail company has deployed a real-time recommendation model on a SageMaker endpoint. The model is trained daily using SageMaker Pipelines that process user interaction data from a large S3 bucket. Recently, the operations team noticed that the endpoint's predictions have become stale; users are seeing recommendations based on data from days ago. The pipeline runs successfully every day at 2 AM UTC, but the endpoint continues to serve the old model version. The team checks the pipeline and finds no errors. The model registry contains multiple model versions approved automatically. The endpoint is configured with production variants, but only one variant is active. The team suspects the issue is with the deployment step in the pipeline. They want to automatically deploy new model versions to the endpoint as soon as they are registered and approved. What should they do?

Question 108mediummultiple choice

A healthcare company is subject to HIPAA and uses SageMaker to train models on patient data. The data is stored in an S3 bucket with server-side encryption using a customer-managed KMS key. The training job uses a custom Docker container that needs to read the data. The security team is concerned about unauthorized access to the data during training. They want to ensure that only the specific training job can access the decryption key. The training runs in a VPC. What should they do?

Question 109easymultiple choice

A startup is using SageMaker to train a deep learning model on GPU instances. The training job takes about 8 hours. The team notices that the job sometimes fails with an error message indicating that the instance was terminated because the Amazon EBS volume was underprovisioned. The team is using the default EBS volume size for the training instance. They want to avoid this error without over-provisioning. What should they do?

Question 110mediummultiple choice

A media company uses SageMaker endpoints to serve a model that predicts video engagement. They have two production variants: Variant A (ml.c5.large) for regular traffic and Variant B (ml.c5.xlarge) for burst traffic. They use weighted routing (90% to A, 10% to B). Recently, during peak hours, Variant A's latency has increased, causing many requests to time out. The metrics show that both variants are under similar CPU load, but the number of concurrent requests to Variant A is very high. The team wants to ensure that burst traffic is handled properly without manual intervention. What should they do?

Question 111mediummulti select

A company has deployed a SageMaker endpoint for real-time inference. The security team needs to monitor for potential security threats such as unauthorized access attempts and tampering with the model configuration. Which TWO actions should the team take? (Choose TWO.)

Question 112easymultiple choice

A machine learning team at a retail company has deployed a product recommendation model using Amazon SageMaker. The model is updated weekly with new data. Recently, the team noticed that the model's accuracy on a holdout evaluation set has been declining over the past month. The data pipeline that feeds the training job has not changed. The team suspects data drift. They have SageMaker Model Monitor enabled on the inference endpoint and have set up Amazon CloudWatch metrics for feature distribution distances. Upon reviewing the CloudWatch dashboards, they see that the feature distribution distance metric for the most important feature 'product_category' has increased significantly. However, the team is unsure if this is the root cause. Which remediation step should the team take FIRST?

Question 113mediummultiple choice

A financial services company uses an Amazon SageMaker endpoint for real-time credit scoring. The endpoint is deployed with an ml.c5.2xlarge instance. Recently, the data science team has received complaints from users about slow response times. The team monitors the endpoint using CloudWatch metrics. They observe that the InvocationsPerSecond metric averages 50, the ModelLatency metric averages 200 milliseconds, and the CPUUtilization metric averages 95%. The team has also noticed that the endpoint occasionally returns HTTP 503 (Service Unavailable) errors during peak hours. The team needs to reduce latency and eliminate 503 errors while minimizing cost increase. Which solution should the team implement?
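
As a back-of-the-envelope aid for the figures in this scenario, Little's law estimates the average number of requests in flight as arrival rate multiplied by average latency. The sketch below applies it to the stated numbers. `average_concurrency` is a hypothetical helper, and the result is only a rough average that says nothing about peak-hour bursts.

```python
def average_concurrency(requests_per_second, latency_ms):
    """Little's law: average in-flight requests = arrival rate x average latency."""
    return requests_per_second * (latency_ms / 1000.0)


# Figures from the scenario: 50 requests per second at 200 ms average latency.
in_flight = average_concurrency(50, 200)
print(f"about {in_flight:.0f} requests in flight on average")
```

Roughly ten concurrent requests on one instance that is already at 95% CPU leaves little headroom for peaks.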

Question 114hardmultiple choice

A healthcare startup has deployed a machine learning model on Amazon SageMaker that predicts patient readmission risks. The model uses sensitive health data stored in an S3 bucket encrypted with AWS KMS. The SageMaker endpoint is configured with an IAM role that has the following policy attached: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::healthcare-data/*", "Condition": { "Bool": { "aws:SecureTransport": "true" } } }, { "Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*" } ] }. During a security audit, the team discovers that the IAM role's KMS permission is too permissive because it allows decryption of any KMS key in the account. The team needs to modify the policy to follow the principle of least privilege while still allowing the SageMaker endpoint to read the encrypted data. Which modification should the team make?

Question 115easymultiple choice

An e-commerce company uses a SageMaker endpoint to serve a product recommendation model. The model is retrained every month using batch transforms. The ML team has set up a retraining pipeline using SageMaker Processing jobs and Step Functions. Recently, the Step Functions workflow has been failing at the retraining step with an error: 'AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/RetrainingRole/abc123 is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::training-data/processed/latest.parquet'. The team confirms that the S3 bucket exists and the object is present. The retraining role has the following policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::training-data/*" } ] }. The team also verifies that the bucket policy does not explicitly deny access. What is the MOST likely cause of the AccessDenied error?

Question 116mediummultiple choice

A gaming company uses a SageMaker endpoint for real-time player churn prediction. The model is updated weekly. After a recent retraining, the team notices that the endpoint's predicted probabilities for churn have shifted dramatically: the average predicted probability dropped from 0.3 to 0.05. The team suspects concept drift (the relationship between features and target changed) rather than data drift. They have SageMaker Model Monitor set up for data drift and quality metrics, but not for bias or explainability. The team needs to confirm concept drift and take corrective action. Which approach should the team take FIRST?

Question 117easymulti select

A machine learning team has deployed a model using Amazon SageMaker and wants to set up continuous monitoring for data drift. Which TWO actions are essential for ongoing data drift detection? (Choose TWO.)

Question 118hardmultiple choice

A financial services company has deployed a machine learning model using Amazon SageMaker to predict loan default risk. The model is hosted on a real-time endpoint and uses a SageMaker Model Monitor schedule to check for data drift every hour. The monitoring schedule has been running for a month without issues. Starting last week, the data science team noticed that the endpoint's invocation latency has increased by 300% and error rates have spiked to 5% from a baseline of 0.1%. The team suspects the model is receiving out-of-distribution data that is causing longer processing times and occasional timeouts. They have active CloudWatch alarms on latency and error rates but no alarms on data drift. The Model Monitor schedule shows no failures in its status. The team needs to quickly identify whether data drift is the root cause and take corrective action. Which course of action should the team take to diagnose and address the issue?

Question 119easymulti select

A company wants to monitor its Amazon SageMaker real-time endpoint for data quality issues. Which TWO actions should the company take? (Choose TWO.)

Question 120mediummultiple choice

Refer to the exhibit. A machine learning engineer sees this error in Amazon CloudWatch Logs for a SageMaker endpoint. What is the most likely cause?

Exhibit

CloudWatch Logs excerpt:
```
2024-09-21T14:22:10Z ERROR - Model endpoint 'fraud-model-v2' returned unexpected response: {"prediction": 0.95}. Expected format: {"predictions": [{"score": 0.95}]}. Check inference code and response structure.
```
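
The log line contrasts the response the endpoint returned with the shape the caller expected. The sketch below illustrates that mismatch by wrapping the flat response into the expected structure. `to_expected_format` is a hypothetical helper, and placing this logic in the inference code's output handling is an assumption, not something the exhibit specifies.

```python
import json


def to_expected_format(raw_body):
    """Wrap a flat {"prediction": x} body as {"predictions": [{"score": x}]}."""
    payload = json.loads(raw_body)
    if "predictions" in payload:
        # Already in the expected shape; pass it through unchanged.
        return json.dumps(payload)
    return json.dumps({"predictions": [{"score": payload["prediction"]}]})
```

For the body in the exhibit, `to_expected_format('{"prediction": 0.95}')` yields JSON equivalent to `{"predictions": [{"score": 0.95}]}`.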

Question 121hardmultiple choice

A financial services company operates a real-time inference endpoint for a fraud detection model on Amazon SageMaker. The model was trained on historical transaction data from 2023. Over the past month, the model's precision has dropped from 92% to 78%, while recall remains high at 95%. The data science team suspects data drift and has already enabled SageMaker Model Monitor with data capture and a baseline from the training data. The latest monitoring report indicates no statistically significant drift in any of the input features. The team also verified that the inference code and model artifact have not changed. Despite the stable feature distributions, the model is misclassifying an increasing number of legitimate transactions as fraudulent (false positives). The business is concerned about the impact on customer experience. What is the best course of action?
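
To see why rising false positives lower precision while leaving recall untouched, recall the definitions: precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below uses made-up confusion-matrix counts, chosen only to reproduce the percentages in the scenario. They are illustrative assumptions, not data from the question.

```python
def precision(tp, fp):
    """Share of transactions flagged as fraud that really were fraud."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual fraud that the model flagged."""
    return tp / (tp + fn)


# Hypothetical counts: true positives and false negatives stay fixed,
# and only the number of false positives grows.
tp, fn = 950, 50
fp_before, fp_after = 83, 268

print(round(precision(tp, fp_before), 2))  # 0.92
print(round(precision(tp, fp_after), 2))   # 0.78
print(round(recall(tp, fn), 2))            # 0.95 in both periods
```

Because recall does not depend on false positives, it stays at 0.95 while precision falls.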