Free MLA-C01 ML Solution Monitoring, Maintenance and Security Practice Questions (2026)

Q: How can I practice ML Solution Monitoring, Maintenance and Security questions for MLA-C01?

Click any of the 121 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the ML Solution Monitoring, Maintenance and Security domain.

All MLA-C01 ML Solution Monitoring, Maintenance and Security questions (121)

A machine learning engineer at a retail company is monitoring a production model that predicts inventory demand. The model's prediction accuracy has dropped significantly over the past week. The engineer checks the model's input data and notices a new product category was introduced with a different distribution. Which concept is most likely causing the performance degradation?
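
One way to quantify the kind of input shift described here, where a new category appears with a different distribution, is the population stability index (PSI). This is an illustrative sketch only. PSI is a commonly used drift statistic, not necessarily the one SageMaker Model Monitor computes, and the category names and sample counts are hypothetical.

```python
import math
from collections import Counter


def category_shares(values, categories, eps=1e-6):
    """Fraction of observations per category; eps avoids log(0) for unseen categories."""
    counts = Counter(values)
    total = len(values)
    return {c: max(counts.get(c, 0) / total, eps) for c in categories}


def psi(expected_values, actual_values):
    """Population stability index between a baseline sample and a production sample."""
    categories = set(expected_values) | set(actual_values)
    expected = category_shares(expected_values, categories)
    actual = category_shares(actual_values, categories)
    return sum(
        (actual[c] - expected[c]) * math.log(actual[c] / expected[c])
        for c in categories
    )


# Hypothetical data: production traffic now includes a "garden" category
# that never appeared in the training data.
training = ["toys"] * 50 + ["books"] * 50
production = ["toys"] * 30 + ["books"] * 30 + ["garden"] * 40

print(f"PSI = {psi(training, production):.2f}")
```

A PSI above roughly 0.25 is often treated as a major shift. That cutoff is an industry rule of thumb, not an AWS threshold. The `eps` floor keeps the unseen category from producing a division by zero, at the cost of a very large term for that category.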

A data science team is using Amazon SageMaker to train and deploy a binary classification model. They want to continuously monitor the model for data drift in production. Which combination of AWS services and SageMaker features should they use to implement automated drift detection with minimal operational overhead?

A financial services company uses a custom container on Amazon SageMaker to serve a fraud detection model. The model's inference latency has recently increased, causing timeouts for some requests. The team reviews the SageMaker logs and finds that the container is consuming more memory than allocated. What should the team do to maintain service quality while ensuring cost-effectiveness?

A machine learning team is building a CI/CD pipeline for model deployment using Amazon SageMaker. They need to ensure that all model artifacts are encrypted at rest and in transit, and that access to the models is controlled via IAM. Which TWO actions should the team take to meet these requirements? (Choose TWO.)

A healthcare company deploys a model to predict patient readmission risk. The model was trained on historical data and is now showing signs of concept drift. The team needs to implement a monitoring solution that can detect drift and automatically retrain the model when drift is detected. Which THREE steps should the team take to build this solution? (Choose THREE.)

A company is using Amazon SageMaker to host a real-time inference endpoint. They want to restrict access to the endpoint to only a specific VPC and require authentication using AWS IAM. Which TWO configuration steps should they take to achieve this? (Choose TWO.)

A machine learning engineer is troubleshooting a model that is producing unexpectedly low accuracy in production. The engineer examines the model's training data and finds that the distribution of the target variable in production is significantly different from the training set. What type of drift is the model experiencing?

A team deploys a machine learning model using a SageMaker endpoint with an ML.T4 instance. After a week, they notice that the endpoint's CPU utilization is consistently below 10% and latency is low. However, the endpoint is incurring high costs. Which action should the team take to reduce costs while maintaining the ability to serve traffic?

A company is using Amazon SageMaker to train a model on sensitive customer data. The security team requires that all data be encrypted in transit and at rest, and that the training job does not have internet access. Which configuration should the team use to meet these requirements?

A company has a SageMaker endpoint that uses a trained model to classify images. The endpoint is experiencing high latency and the team suspects it is due to the model size. Which action can the team take to reduce latency without significantly impacting accuracy?

A data science team deploys a regression model using Amazon SageMaker. After one week, the model's prediction accuracy drops significantly. The team needs to detect this degradation automatically and trigger retraining. Which AWS service should they use to monitor the model's performance over time and set up alerts?

A company uses Amazon SageMaker to host a real-time inference endpoint for a fraud detection model. The endpoint is deployed with three instances of ml.m5.large. The model processes each request in about 200 ms. Lately, users report occasional timeouts (requests taking >5 seconds). The team suspects model drift or data skew. What is the MOST likely cause and solution?

A machine learning engineer deploys a model to an Amazon SageMaker endpoint with data capture enabled. The endpoint uses a production variant with an initial instance count of 2. After a week, they notice that the captured data is not being sent to the specified Amazon S3 bucket. The IAM role used by the endpoint has the following policy attached. What is the MOST likely reason for the failure?

A company uses Amazon Rekognition to moderate user-generated images. They want to set up a monitoring system that alerts the team if the number of inappropriate images flagged by the model exceeds a threshold. Which combination of AWS services should they use?

A team deploys a PyTorch model on Amazon SageMaker for real-time inference. They notice that inference latency is higher than expected. They suspect the serialization format used for input data is inefficient. Which approach would MOST likely reduce latency?

A financial services company deploys a credit risk model using an Amazon SageMaker endpoint with data capture enabled. The model uses a custom container. The compliance team requires that all inference requests and responses are logged to an S3 bucket with server-side encryption using AWS KMS. The IAM role for the endpoint has the following policy. What must be added to meet the compliance requirement?
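
For reference, data capture is configured through the `DataCaptureConfig` structure of the SageMaker `CreateEndpointConfig` API. The sketch below only builds that request fragment as a Python dict and makes no AWS calls. The bucket, prefix, and key ARN are hypothetical, and whether a given exhibit is missing a field or a permission still has to be checked against the actual policy.

```python
def build_data_capture_config(bucket, prefix, kms_key_id=None, sampling_percentage=100):
    """Build the DataCaptureConfig fragment of a CreateEndpointConfig request."""
    if not 0 <= sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be between 0 and 100")
    config = {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": f"s3://{bucket}/{prefix}",
        # Capture both the request payload and the model response.
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }
    if kms_key_id is not None:
        # Customer managed key used to encrypt the captured objects in S3.
        config["KmsKeyId"] = kms_key_id
    return config


config = build_data_capture_config(
    bucket="example-capture-bucket",  # hypothetical bucket
    prefix="credit-risk/capture",  # hypothetical prefix
    kms_key_id="arn:aws:kms:us-east-1:111122223333:key/example",  # hypothetical key ARN
)
print(config)
```

With boto3, this dict would be passed as `DataCaptureConfig=config` to `create_endpoint_config`. The endpoint's execution role also needs permission to write to the destination prefix and to use the key (for SSE-KMS writes this typically includes `kms:GenerateDataKey`).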

Which TWO actions are recommended best practices for securing an Amazon SageMaker notebook instance? (Select TWO.)

Which THREE components are required to set up automated model retraining in response to performance degradation using Amazon SageMaker? (Select THREE.)

A company operates an e-commerce platform that uses a machine learning model to recommend products to users. The model is deployed on an Amazon SageMaker endpoint with automatic scaling enabled based on average CPU utilization. The model was trained on historical data and is updated weekly. Recently, the platform experienced a flash sale event that caused a sudden spike in traffic. During the event, the endpoint's latency increased dramatically, and many requests timed out. After the event, the team reviews the CloudWatch metrics and notices that the CPU utilization never exceeded 70%, and the scaling policy was triggered but instances took several minutes to become available. The team wants to prevent similar issues in future flash sales. Which course of action would be MOST effective?

A healthcare company deploys a model that predicts patient readmission risk. The model is deployed using a SageMaker real-time endpoint with data capture enabled. The compliance team requires that all inference data be encrypted at rest in S3 using AWS KMS with a customer managed key. The team has configured the endpoint to use an IAM role that includes the necessary KMS permissions. However, after deployment, the captured data is not being written to the S3 bucket. The team checks the CloudWatch logs for the endpoint and finds no errors. The S3 bucket policy is as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}

The bucket also uses default encryption with a KMS key. What is the MOST likely reason that the captured data is not being written?

A data science team deploys a real-time inference endpoint on Amazon SageMaker. They want to monitor for data drift in the input features over time. Which AWS service should they use to capture and analyze the input data distribution?

A company uses SageMaker endpoints with auto-scaling. The endpoint is experiencing high latency during peak hours. The metrics show CPU utilization is low but memory is high. What is the most likely cause?

An ML team trained a model using SageMaker and stored the model artifacts in S3 with server-side encryption using AWS KMS (SSE-KMS). They need to deploy the model to a SageMaker endpoint that uses a different KMS key for inference data encryption. What must they do to ensure the endpoint can decrypt the model artifacts?

A company wants to audit all API calls made to SageMaker endpoints for security compliance. Which AWS service should they enable?

A machine learning model is deployed on SageMaker and its predictions are used in a production application. The model's accuracy has degraded over time. What is the most likely cause?

A company uses SageMaker training jobs that need to access data in an S3 bucket in a different AWS account. The bucket uses a bucket policy that allows access only from a specific VPC. How should they configure the training job?

A team wants to automatically retrain a model when new labeled data arrives. Which SageMaker feature can orchestrate this workflow?

A model deployed on a SageMaker endpoint is returning predictions. The team wants to log all predictions to an S3 bucket for auditing. What is the most efficient way to achieve this?

A company wants to restrict access to a SageMaker notebook instance so that only a specific IAM role can open the notebook via JupyterLab. The notebook instance is associated with a lifecycle configuration that installs custom packages. What is the correct way to enforce access control?

A team uses SageMaker Ground Truth to create labeled datasets. They need to ensure labeling jobs are cost-effective. Which TWO measures should they take? (Select TWO.)

An ML engineer is setting up monitoring for a SageMaker endpoint. Which THREE metrics should be monitored to detect performance issues? (Select THREE.)

A company needs to secure a SageMaker notebook instance that contains sensitive data. Which THREE of the following are effective security measures? (Select THREE.)

Refer to the exhibit. A team has configured data capture for a SageMaker endpoint. The endpoint is returning predictions but no captured data appears in the S3 bucket. What is the most likely cause?

Refer to the exhibit. A data scientist tries to deploy a model from an S3 bucket encrypted with SSE-KMS. What should the administrator do to resolve this?

Refer to the exhibit. An IAM policy is attached to a user to allow invoking a SageMaker endpoint. A developer tries to call the endpoint from a laptop with IP 203.0.113.5 and receives an access denied error. What is the most likely reason?
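
The exhibit is not reproduced here, but IAM policies commonly restrict callers with an `IpAddress` condition on the `aws:SourceIp` key. The sketch below mimics only that one check with Python's `ipaddress` module. The allowed CIDR range is a hypothetical stand-in for whatever the exhibit contains, and real IAM evaluation involves much more, such as explicit denies, other conditions, and requests that arrive through VPC endpoints.

```python
import ipaddress


def source_ip_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any allowed CIDR block (IpAddress condition only)."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Hypothetical policy condition: only the range 198.51.100.0/24 may invoke the endpoint.
allowed = ["198.51.100.0/24"]
print(source_ip_allowed("203.0.113.5", allowed))    # False: laptop outside the range
print(source_ip_allowed("198.51.100.20", allowed))  # True: host inside the range
```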

A company uses Amazon SageMaker to deploy a real-time inference endpoint. They notice increased latency in predictions during peak hours. What should they investigate first to address the issue?

A company trains a model daily using Amazon SageMaker and uses the model for real-time inference. They want to detect data drift between the training data and the inference data to decide when to retrain. Which AWS service should they use for this purpose?

A company deploys a machine learning model as a SageMaker real-time endpoint. They need to implement a mechanism to automatically roll back to the previous model version if performance degrades after a deployment. Which approach should they use?

A company wants to ensure that only authorized users and services can invoke a SageMaker real-time endpoint. Which AWS service can be used to manage access control?

A team deploys a model with SageMaker and notices that the model returns inconsistent results during inference. They suspect a mismatch in feature transformation between the training pipeline and the inference pipeline. Which SageMaker feature can help compare the feature distributions?

A company uses Amazon SageMaker Ground Truth to create a labeled dataset. They want to monitor the accuracy of human labelers during the labeling process. Which metric should they track?

A company stores its model training data in Amazon S3. To meet compliance requirements, all data in transit between the S3 bucket and SageMaker must be encrypted. What should the company enforce?

A team uses AWS Auto Scaling for a SageMaker real-time endpoint. They notice that when scaling in, the latest instance is always terminated first, causing disruption to recent requests. How can they configure the scaling policy to terminate the oldest instance first?

A company deploys a SageMaker model using AWS KMS for encryption at rest. They have a compliance requirement to rotate the KMS key every year without causing downtime for the inference endpoint. Which approach should they take?

A company uses Amazon SageMaker Model Monitor to track data quality. The monitoring job triggers an alert indicating that the data distribution has shifted beyond the configured threshold. Which TWO actions should the team take? (Choose TWO.)

A company wants to monitor their machine learning model for bias over time. Which THREE AWS services or features can they use to achieve this? (Choose THREE.)

A company stores training data in Amazon S3 and uses Amazon SageMaker for model training. They need to ensure data is encrypted at rest. Which THREE encryption options are supported by SageMaker for data stored in S3? (Choose THREE.)

Refer to the exhibit. A team observes that their SageMaker endpoint scales out quickly when load increases, but scales in very slowly when load decreases, causing over-provisioning. What is the most likely cause?
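
SageMaker endpoint scaling is configured through Application Auto Scaling. The dict below sketches the arguments a target-tracking policy takes (in the shape boto3's `put_scaling_policy` accepts). The endpoint and variant names and all numeric values are hypothetical. When scale-in lags behind scale-out, the cooldown settings are among the first things to compare: `ScaleInCooldown` is the number of seconds after a scale-in activity completes before another scale-in can start.

```python
def target_tracking_policy(endpoint_name, variant_name, target_invocations,
                           scale_in_cooldown=300, scale_out_cooldown=60):
    """Arguments for an Application Auto Scaling target-tracking policy on a SageMaker variant."""
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-invocations",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            # Seconds to wait after a scale-in / scale-out activity before the next one.
            "ScaleInCooldown": scale_in_cooldown,
            "ScaleOutCooldown": scale_out_cooldown,
        },
    }


policy = target_tracking_policy("recs-endpoint", "AllTraffic", 70)  # hypothetical names
print(policy["TargetTrackingScalingPolicyConfiguration"])
```

The same configuration block also accepts a `DisableScaleIn` flag, which is worth checking if scale-in never happens at all.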

Refer to the exhibit. A team receives an error when running a SageMaker Model Monitor schedule for data quality. What should they do to resolve this issue?

Refer to the exhibit. A user has the above IAM policy attached but cannot access files in SageMaker Studio. What additional permission is most likely needed?

A data science team deploys a machine learning model to a SageMaker endpoint for real-time inference. They need to monitor the model for feature distribution drift over time to ensure the model's predictions remain accurate. Which AWS service should they use?

A company's SageMaker endpoint is experiencing increased latency during peak hours. The endpoint uses a single ml.m5.large instance. The deployment is critical and must maintain low latency. Which action is MOST effective to reduce latency without sacrificing cost efficiency?

A financial services company uses SageMaker to train and deploy models. They must ensure that all model artifacts stored in S3 are encrypted at rest using customer-managed KMS keys. Additionally, only the SageMaker service role should have access to the encryption key for decrypting artifacts during inference. Which IAM policy configuration meets these requirements?

A machine learning engineer is monitoring a production SageMaker endpoint using Amazon CloudWatch. They want to set up alarms for anomalous behavior. Which TWO CloudWatch metrics are MOST appropriate for detecting a sudden increase in request latency?

A data science team detects that a deployed model's prediction accuracy is degrading over time due to concept drift. They need to implement a retraining strategy. Which THREE actions are recommended best practices for handling concept drift?

A company operates multiple AWS accounts with SageMaker workloads. They need to implement governance and security controls for model monitoring and maintenance. Which THREE actions should they take to meet compliance requirements?

Refer to the exhibit. A data engineer investigates why a SageMaker endpoint is returning errors. The endpoint configuration has been updated to point to a new model version. What is the MOST likely cause of the error?

Refer to the exhibit. A data scientist uses a SageMaker notebook instance to read a model file from S3 bucket 'my-bucket'. The bucket uses SSE-KMS encryption with a KMS key. The IAM role attached to the notebook has the above policy. However, reading the file fails. What is the MOST likely reason?

Refer to the exhibit. A data engineer runs a SageMaker processing job that fails. What is the MOST likely cause of the failure?

A company wants to maintain multiple versions of a trained model in a central repository and track metadata such as training metrics, hyperparameters, and approval status. Which SageMaker feature should they use?

An e-commerce company uses a machine learning model to predict customer churn. They notice that the model's performance degrades after a major marketing campaign changes customer behavior. Which approach is MOST effective to detect and respond to this type of concept drift?

A company's ML pipeline runs in multiple AWS accounts (dev, test, prod). They want to enforce that only approved models from a central Model Registry can be deployed to the production account. Which combination of services is MOST appropriate to implement this governance?

A SageMaker training job has been running for several hours but shows no progress. The job is using a custom Docker container. The engineer suspects a bug in the training script. Which tool is BEST to debug the training job without stopping it?

A company runs a real-time inference endpoint with an auto-scaling policy based on average CPU utilization. During a traffic spike, the endpoint scales out but takes several minutes to become healthy, causing increased latency. The endpoint uses a large instance type. Which change would MOST effectively reduce the time to scale out?

A company wants to automate the deployment of a SageMaker model into production whenever a new model version is approved in the Model Registry. Which service can be used to trigger the deployment pipeline?
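
A common pattern is to react to Model Registry approval-state changes in Amazon EventBridge. This sketch assumes the event uses source `aws.sagemaker`, detail type `SageMaker Model Package State Change`, and a `ModelApprovalStatus` field in `detail`; verify these against the current EventBridge documentation. The `matches` function is a deliberately simplified, hypothetical stand-in that supports only exact-value lists, not EventBridge's full pattern syntax. The model package group name and the sample event are also illustrative.

```python
# Event pattern: fire only when a package in the given group is approved.
PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {
        "ModelPackageGroupName": ["churn-model-group"],  # hypothetical group name
        "ModelApprovalStatus": ["Approved"],
    },
}


def matches(pattern, event):
    """Simplified EventBridge-style matching: every pattern key must be present and its
    value must be one of the listed values; nested dicts are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Abbreviated, illustrative event body.
sample_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "churn-model-group",
        "ModelApprovalStatus": "Approved",
    },
}
print(matches(PATTERN, sample_event))
```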

A data science team deployed a model on Amazon SageMaker and enabled Model Monitor to detect data drift. After a week, they receive alerts indicating that the distribution of a key feature has shifted significantly. However, the model's accuracy on the recent production data remains high. Which action should the team take next?

A company's SageMaker real-time endpoint is experiencing high latency under load. The CloudWatch metrics show that the ModelLatency is acceptable, but the OverheadLatency is spiking. What is the most likely cause?
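
When alarming on these metrics, note that SageMaker publishes `ModelLatency` and `OverheadLatency` in microseconds, so a threshold expressed in milliseconds has to be converted. The sketch builds keyword arguments in the shape boto3's CloudWatch `put_metric_alarm` accepts. The endpoint name, variant name, and thresholds are hypothetical.

```python
def latency_alarm_kwargs(endpoint_name, variant_name, metric_name, threshold_ms,
                         period_seconds=60, evaluation_periods=3):
    """Build put_metric_alarm arguments for a SageMaker latency metric (reported in microseconds)."""
    if metric_name not in ("ModelLatency", "OverheadLatency"):
        raise ValueError("expected ModelLatency or OverheadLatency")
    return {
        "AlarmName": f"{endpoint_name}-{metric_name}-high",
        "Namespace": "AWS/SageMaker",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold_ms * 1000.0,  # milliseconds -> microseconds
        "ComparisonOperator": "GreaterThanThreshold",
    }


kwargs = latency_alarm_kwargs("fraud-endpoint", "AllTraffic", "OverheadLatency", threshold_ms=50)
print(kwargs["Threshold"])
```

With credentials configured, `boto3.client("cloudwatch").put_metric_alarm(**kwargs)` would create the alarm.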

A machine learning engineer wants to encrypt model artifacts stored in Amazon S3. The artifacts are created and used by SageMaker training jobs and endpoints. What is the simplest way to ensure encryption at rest?

A team uses SageMaker Clarify to monitor bias drift in production. They schedule weekly analysis. After a month, Clarify reports a significant increase in a bias metric. What should the team do first?

A model deployed on SageMaker uses custom inference code. The endpoint is showing intermittent 500 errors. CloudWatch logs reveal 'TimeoutError: Request timed out after 60 seconds'. The model takes on average 55 seconds to process. What is the most effective solution?

A company requires that all SageMaker notebook instances be created within a private VPC without internet access. Which configuration step is mandatory?

A team uses SageMaker Model Monitor to track data quality. They notice that the monitor's constraint violations are increasing but the model performance remains good. What should they do?

A machine learning engineer is setting up automated retraining for a model using SageMaker Pipelines. The pipeline should trigger when a data drift alert is received from Model Monitor. Which event source should the engineer use to initiate the pipeline?

A data scientist wants to version control trained models and manage approvals for deployment. Which SageMaker feature should they use?

A company wants to secure access to a SageMaker real-time endpoint. Which TWO actions should be taken? (Select two.)

A machine learning team is setting up Model Monitor for a deployed model. Which THREE factors should they consider when configuring the monitoring schedule? (Select three.)

A data scientist wants to monitor a deployed model for performance degradation. Which TWO metrics from Amazon CloudWatch should they use to detect issues? (Select two.)

Refer to the exhibit. A SageMaker endpoint is failing health checks. What is the most likely cause?

Refer to the exhibit. A SageMaker training job using this IAM role fails with an access denied error when trying to read a file from s3://my-bucket/training-data/model_input.csv. However, a different file at s3://my-bucket/training-data/input/data.csv can be read successfully. What is the most likely reason?
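
One way to reason about this kind of prefix mismatch is to check each object ARN against the policy's `Resource` patterns using IAM's wildcard rules: `*` matches any sequence of characters, including `/`, and `?` matches exactly one character. This sketch assumes, hypothetically, that the exhibit grants `s3:GetObject` on `arn:aws:s3:::my-bucket/training-data/input/*`. It ignores actions, conditions, and explicit denies.

```python
import re


def arn_matches(pattern, arn):
    """IAM-style wildcard match: '*' = any run of characters, '?' = exactly one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, arn) is not None


granted = "arn:aws:s3:::my-bucket/training-data/input/*"  # hypothetical Resource value
for key in ("training-data/model_input.csv", "training-data/input/data.csv"):
    arn = f"arn:aws:s3:::my-bucket/{key}"
    print(key, "->", arn_matches(granted, arn))
```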

Refer to the exhibit. A team configured a SageMaker Model Monitor schedule for data quality. The baseline was created from a training dataset. After running for a day, the monitoring results show frequent violations. What is the most likely cause?

A data science team deploys a regression model to Amazon SageMaker for real-time inference. After one month, the model's prediction errors increase significantly, but data distributions remain unchanged. Which monitoring approach is MOST suitable for detecting this issue?
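
When the inputs have not shifted, degradation shows up only when predictions are compared with ground-truth labels as they arrive. SageMaker Model Monitor's model quality monitoring reports regression metrics such as RMSE for this purpose. The pure-Python sketch below shows only the comparison logic; the baseline RMSE, the 20% tolerance, and the sample values are assumptions.

```python
import math


def rmse(predictions, actuals):
    """Root-mean-square error between predictions and ground-truth labels."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(predictions))


def quality_violation(baseline_rmse, predictions, actuals, tolerance=1.2):
    """Flag a violation when the current RMSE exceeds the baseline by more than `tolerance` times."""
    current = rmse(predictions, actuals)
    return current > baseline_rmse * tolerance, current


# Hypothetical: baseline RMSE from validation was 2.0; this week's labeled traffic follows.
violated, current = quality_violation(2.0, [10.0, 12.0, 9.0], [13.0, 15.0, 12.0])
print(violated, round(current, 2))  # True 3.0
```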

A company uses an Amazon SageMaker endpoint for real-time inference. The security team requires that all traffic between the endpoint and the client application be encrypted in transit. Which configuration ensures this?

A machine learning team deploys a custom container image for an Amazon SageMaker training job. The container needs to access an S3 bucket that contains sensitive data. The team wants to follow the principle of least privilege. How should the team grant access?

A company has a batch transform job in Amazon SageMaker that processes large datasets every night. Recently, the job has been failing sporadically with an out-of-memory error. The data size has not increased. What is the MOST likely cause?

An e-commerce company uses a multi-model endpoint on Amazon SageMaker to serve several deep learning models. After a new model version is deployed, the endpoint starts returning 503 errors for some models. Monitoring shows that the endpoint's memory utilization is near 100%. What should the team do to resolve this issue while minimizing operational overhead?

A machine learning engineer is setting up an Amazon SageMaker notebook instance. The instance needs to access a private S3 bucket that contains training data. The notebook instance is in a VPC. Which combination of steps will grant access to the S3 bucket? (Choose TWO.)

A company is using an Amazon SageMaker pipeline for automated retraining. The pipeline fails intermittently due to transient errors in the training job. Which steps should the team take to ensure the pipeline completes successfully? (Choose THREE.)

A financial services company must ensure that all data used by Amazon SageMaker training jobs is encrypted at rest. The company wants to use a customer-managed key (CMK) for the encryption. Which steps are necessary to achieve this? (Choose TWO.)

A team deploys a machine learning model using an Amazon SageMaker endpoint. They need to monitor for data drift and model quality issues. Which AWS services or features should they use? (Choose THREE.)

Refer to the exhibit. A SageMaker execution role has the IAM policy shown. The team attempts to run a training job that writes results to 's3://my-bucket/training/output/model.tar.gz'. What will happen?

Refer to the exhibit. A data scientist reviews the CloudWatch Logs from an Amazon SageMaker real-time endpoint. What is the MOST likely root cause of the NaN output?

A company uses an Amazon SageMaker endpoint with auto-scaling. They notice that during traffic bursts, new instances take several minutes to become healthy, causing 503 errors. What is the BEST way to reduce the time to serve requests during scaling events?

A team is deploying a model that requires GPU acceleration for inference. They are using an Amazon SageMaker real-time endpoint. The model is a large language model (LLM) that does not fit on a single GPU. Which configuration should they use to minimize latency while fitting the model?

A company uses Amazon SageMaker Pipelines for automated retraining. The pipeline includes a processing step that runs a Python script. The script uses the boto3 library to call an AWS service, but the calls are being throttled. What is the MOST effective way to address this within the pipeline?

A healthcare company uses Amazon SageMaker to deploy a real-time inference endpoint for a diagnostic model. The endpoint is configured with a single ml.p3.2xlarge instance. The model processes patient data and returns a risk score. Recently, the endpoint has been experiencing intermittent 504 errors along with increased latency. The team uses Amazon CloudWatch to monitor the endpoint's InvocationsPerInstance and ModelLatency metrics. They observe that InvocationsPerInstance is well below the throttling threshold, but ModelLatency shows periodic spikes lasting 5-10 seconds. The endpoint's CPU utilization remains below 60%, but memory utilization occasionally spikes to 90% during those spikes. The team has checked the inference code and found no obvious memory leaks or performance bottlenecks in the custom logic. The model itself is a deep neural network hosted using Apache MXNet. The team suspects that the issue might be related to resource contention or an external dependency. What should the team do FIRST to diagnose and resolve the issue?

A company uses Amazon SageMaker to train and deploy a machine learning model. After deployment, they notice that the model's accuracy drops significantly over time due to changes in the underlying data distribution. Which monitoring solution should they implement to detect this issue automatically?

A team is deploying a real-time inference endpoint in SageMaker. The model requires access to an S3 bucket containing customer data, which is encrypted with SSE-KMS. The team needs to ensure that the endpoint can decrypt the data. Which IAM role configuration is necessary?

A data scientist trained a model using SageMaker and wants to automate the retraining process when new data becomes available. Which AWS service is best suited to trigger a SageMaker training job based on an S3 event?

After deploying a model to a SageMaker endpoint, the operations team notices high inference latency. They suspect it is due to insufficient instance capacity. What should they do first to diagnose the issue?

A company is using a SageMaker notebook instance to develop models. The security team requires that all data in the notebook be encrypted at rest and in transit, and that internet access be restricted. Which configuration meets these requirements?

A team uses SageMaker Pipelines to automate model retraining. After a successful pipeline run, they want to register the new model version in the SageMaker Model Registry so that it can be reviewed for approval. Which step type should they add to the pipeline?

A financial services company is deploying a model for loan approval. They must ensure that the model's predictions do not show bias against protected groups. They plan to monitor for bias drift after deployment. Which SageMaker feature should they use?

A company is using SageMaker endpoints for inference. To reduce costs, they want to use Automatic Scaling. However, they observe that scaling up takes several minutes, causing latency spikes during traffic bursts. What should they do to mitigate this?

A company has deployed a model to a SageMaker endpoint. The security team wants to ensure that all traffic between the endpoint and the client application is encrypted and that the endpoint is not accessible from the internet. Which TWO actions should the company take? (Choose TWO.)

A data science team uses SageMaker Studio to collaborate. They need to restrict access to certain SageMaker Studio applications (e.g., only JupyterLab, no RStudio). Which THREE steps should they take? (Choose THREE.)

A company uses SageMaker Model Monitor to detect drift. They want to receive notifications when drift is detected. Which TWO services can be used together to send notifications? (Choose TWO.)

A retail company has deployed a real-time recommendation model on a SageMaker endpoint. The model is trained daily using SageMaker Pipelines that process user interaction data from a large S3 bucket. Recently, the operations team noticed that the endpoint's predictions have become stale; users are seeing recommendations based on data from days ago. The pipeline runs successfully every day at 2 AM UTC, but the endpoint continues to serve the old model version. The team checks the pipeline and finds no errors. The Model Registry contains multiple model versions that were approved automatically. The endpoint is configured with production variants, but only one variant is active. The team suspects the issue is with the deployment step in the pipeline. They want to automatically deploy new model versions to the endpoint as soon as they are registered and approved. What should they do?

A healthcare company is subject to HIPAA and uses SageMaker to train models on patient data. The data is stored in an S3 bucket with server-side encryption using a customer-managed KMS key. The training job uses a custom Docker container that needs to read the data. The security team is concerned about unauthorized access to the data during training. They want to ensure that only the specific training job can access the decryption key. The training runs in a VPC. What should they do?

A startup is using SageMaker to train a deep learning model. They use GPU instances for training. The training job takes about 8 hours. The team notices that the training job sometimes fails with an error message indicating that the instance was terminated because its Amazon EBS volume was underprovisioned. The team is using the default EBS volume size for the training instance. They want to avoid this error without over-provisioning. What should they do?
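
The training volume size is set through `VolumeSizeInGB` in the `ResourceConfig` of a `CreateTrainingJob` request. Rather than guessing, one option is to size it from what the job actually writes to local disk; input data counts only when it is downloaded to the volume, as in File input mode. This is a rough estimate under stated assumptions: the input, checkpoint, and output sizes and the 25% headroom are hypothetical.

```python
import math


def training_volume_gb(input_gb, checkpoint_gb, checkpoints_kept, output_gb, headroom=0.25):
    """Estimate an EBS volume size for a training job, rounded up to whole GB."""
    needed = input_gb + checkpoint_gb * checkpoints_kept + output_gb
    return math.ceil(needed * (1 + headroom))


def resource_config(instance_type, volume_gb, volume_kms_key_id=None):
    """ResourceConfig fragment of a CreateTrainingJob request."""
    config = {"InstanceType": instance_type, "InstanceCount": 1, "VolumeSizeInGB": volume_gb}
    if volume_kms_key_id is not None:
        config["VolumeKmsKeyId"] = volume_kms_key_id  # encrypts the attached storage volume
    return config


# Hypothetical: 60 GB of input (File mode), 4 GB checkpoints with 5 kept, 2 GB of output.
size = training_volume_gb(60, 4, 5, 2)
print(resource_config("ml.p3.2xlarge", size))
```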

A media company uses SageMaker endpoints to serve a model that predicts video engagement. They have two production variants: Variant A (ml.c5.large) for regular traffic and Variant B (ml.c5.xlarge) for burst traffic. They use weighted routing (90% to A, 10% to B). Recently, during peak hours, latency on Variant A has increased, causing many requests to time out. The metrics show that both variants are under similar CPU load, but the number of concurrent requests to Variant A is very high. The team wants to ensure that burst traffic is handled properly without manual intervention. What should they do?

A company has deployed a SageMaker endpoint for real-time inference. The security team needs to monitor for potential security threats such as unauthorized access attempts and tampering with the model configuration. Which TWO actions should the team take? (Choose TWO.)

A machine learning team at a retail company has deployed a product recommendation model using Amazon SageMaker. The model is updated weekly with new data. Recently, the team noticed that the model's accuracy on a holdout evaluation set has been declining over the past month. The data pipeline that feeds the training job has not changed. The team suspects data drift. They have SageMaker Model Monitor enabled on the inference endpoint and have set up Amazon CloudWatch metrics for feature distribution distances. Upon reviewing the CloudWatch dashboards, they see that the feature distribution distance metric for the most important feature 'product_category' has increased significantly. However, the team is unsure if this is the root cause. Which remediation step should the team take FIRST?

113

A financial services company uses an Amazon SageMaker endpoint for real-time credit scoring. The endpoint is deployed with an ml.c5.2xlarge instance. Recently, the data science team has received complaints from users about slow response times. The team monitors the endpoint using CloudWatch metrics. They observe that the InvocationsPerSecond metric averages 50, the ModelLatency metric averages 200 milliseconds, and the CPUUtilization metric averages 95%. The team has also noticed that the endpoint occasionally returns HTTP 503 (Service Unavailable) errors during peak hours. The team needs to reduce latency and eliminate 503 errors while minimizing any cost increase. Which solution should the team implement?
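A quick back-of-the-envelope check helps interpret these averages. Little's law (average requests in the system = arrival rate × average time in the system) is a general queueing relation, not an AWS feature; applying it to the figures in the scenario gives a rough estimate of how many requests the single instance is handling at once. The function name below is hypothetical. Because end-to-end response time also includes SageMaker overhead on top of ModelLatency, the real in-flight count is likely somewhat higher than this estimate.

```python
def average_in_flight(arrival_rate_per_s: float, latency_ms: float) -> float:
    """Little's law: L = lambda * W, with W converted from ms to seconds."""
    return arrival_rate_per_s * (latency_ms / 1000.0)


# Figures from the scenario: ~50 invocations/s at ~200 ms model latency
in_flight = average_in_flight(50, 200)
print(f"~{in_flight:.0f} requests in flight on average")
```

Little's law describes steady-state averages only, so it understates what happens during peak-hour bursts, when arrivals temporarily exceed what an instance already near 95% CPU can absorb.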

114

A healthcare startup has deployed a machine learning model on Amazon SageMaker that predicts patient readmission risks. The model uses sensitive health data stored in an S3 bucket encrypted with AWS KMS. The SageMaker endpoint is configured with an IAM role that has the following policy attached: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::healthcare-data/*", "Condition": { "Bool": { "aws:SecureTransport": "true" } } }, { "Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*" } ] }. During a security audit, the team discovers that the IAM role's KMS permission is too permissive because it allows decryption of any KMS key in the account. The team needs to modify the policy to follow the principle of least privilege while still allowing the SageMaker endpoint to read the encrypted data. Which modification should the team make?
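The audit finding in this scenario can be expressed as a simple policy check. The sketch below parses the policy quoted in the scenario and flags any Allow statement that grants a `kms:` action on `"Resource": "*"`. It is a deliberately simplified, hypothetical linter: it does not handle `"Action": "*"`, `NotAction`/`NotResource`, or condition keys, and it is not a substitute for AWS tooling such as IAM Access Analyzer.

```python
import json

# IAM policy copied from the scenario above
POLICY = json.loads("""
{ "Version": "2012-10-17", "Statement": [
  { "Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::healthcare-data/*",
    "Condition": { "Bool": { "aws:SecureTransport": "true" } } },
  { "Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*" } ] }
""")


def as_list(value):
    """IAM accepts either a single string/object or a list in these fields."""
    return value if isinstance(value, list) else [value]


def wildcard_kms_grants(policy):
    """Return Allow statements that grant a kms: action on Resource '*'.

    Simplified check (hypothetical helper): ignores NotAction, NotResource,
    'Action': '*', and conditions.
    """
    findings = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        if any(a.lower().startswith("kms:") for a in actions) and "*" in resources:
            findings.append(stmt)
    return findings


for finding in wildcard_kms_grants(POLICY):
    print("Overly broad KMS grant:", finding["Action"], "on", finding["Resource"])
```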

115

An e-commerce company uses a SageMaker endpoint to serve a product recommendation model. The model is retrained every month using batch transforms. The ML team has set up a retraining pipeline using SageMaker Processing jobs and Step Functions. Recently, the Step Functions workflow has been failing at the retraining step with an error: 'AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/RetrainingRole/abc123 is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::training-data/processed/latest.parquet'. The team confirms that the S3 bucket exists and the object is present. The retraining role has the following policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::training-data/*" } ] }. The team also verifies that the bucket policy does not explicitly deny access. What is the MOST likely cause of the AccessDenied error?
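One useful first reasoning step is to confirm whether the quoted identity policy actually covers the denied request. The sketch below compares the action and resource from the error message against the policy's single statement, using Python's `fnmatchcase` as a stand-in for IAM's `*` wildcard (both let `*` match across `/`). This is a simplified, hypothetical check: real IAM evaluation also treats action names case-insensitively and combines identity policies with other policy types and service-specific permissions.

```python
from fnmatch import fnmatchcase

# Identity policy from the scenario, reduced to its single statement
ALLOWED_ACTIONS = ["s3:GetObject", "s3:PutObject"]
ALLOWED_RESOURCE = "arn:aws:s3:::training-data/*"

# The request that was denied, taken from the error message
REQUESTED_ACTION = "s3:GetObject"
REQUESTED_RESOURCE = "arn:aws:s3:::training-data/processed/latest.parquet"


def identity_policy_allows(action: str, resource: str) -> bool:
    """Simplified match: exact action name plus glob-style resource pattern."""
    return action in ALLOWED_ACTIONS and fnmatchcase(resource, ALLOWED_RESOURCE)


print(identity_policy_allows(REQUESTED_ACTION, REQUESTED_RESOURCE))
```

Since the identity policy does match the request, the denial has to come from some other part of how the request is evaluated, not from a missing statement in this policy.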

116

A gaming company uses a SageMaker endpoint for real-time player churn prediction. The model is updated weekly. After a recent retraining, the team notices that the endpoint's predicted probabilities for churn have shifted dramatically: the average predicted probability dropped from 0.3 to 0.05. The team suspects concept drift (the relationship between features and target changed) rather than data drift. They have SageMaker Model Monitor set up for data drift and quality metrics, but not for bias or explainability. The team needs to confirm concept drift and take corrective action. Which approach should the team take FIRST?

117

A machine learning team has deployed a model using Amazon SageMaker and wants to set up continuous monitoring for data drift. Which TWO actions are essential for ongoing data drift detection?

118

A financial services company has deployed a machine learning model using Amazon SageMaker to predict loan default risk. The model is hosted on a real-time endpoint and uses a SageMaker Model Monitor schedule to check for data drift every hour. The monitoring schedule has been running for a month without issues. Starting last week, the data science team noticed that the endpoint's invocation latency has increased by 300% and error rates have spiked to 5% from a baseline of 0.1%. The team suspects the model is receiving out-of-distribution data that is causing longer processing times and occasional timeouts. They have active CloudWatch alarms on latency and error rates but no alarms on data drift. The Model Monitor schedule shows no failures in its status. The team needs to quickly identify whether data drift is the root cause and take corrective action. Which course of action should the team take to diagnose and address the issue?

119

A company wants to monitor its Amazon SageMaker real-time endpoint for data quality issues. Which TWO actions should the company take?

120

A machine learning engineer sees the above error in Amazon CloudWatch Logs for a SageMaker endpoint. What is the most likely cause?

121

A financial services company operates a real-time inference endpoint for a fraud detection model on Amazon SageMaker. The model was trained on historical transaction data from 2023. Over the past month, the model's precision has dropped from 92% to 78%, while recall remains high at 95%. The data science team suspects data drift and has already enabled SageMaker Model Monitor with data capture and a baseline from the training data. The latest monitoring report indicates no statistically significant drift in any of the input features. The team also verified that the inference code and model artifact have not changed. Despite the stable feature distributions, the model is misclassifying an increasing number of legitimate transactions as fraudulent (false positives). The business is concerned about the impact on customer experience. What is the best course of action?
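The metric pattern in this scenario follows directly from the definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts below are hypothetical, chosen only so the outputs match the reported figures: when true positives and missed fraud stay the same, recall stays at 0.95, while a growing number of legitimate transactions flagged as fraud (false positives) pulls precision down from about 0.92 to about 0.78.

```python
def precision(tp: int, fp: int) -> float:
    """Share of transactions flagged as fraud that really are fraud."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual fraud cases that the model flags."""
    return tp / (tp + fn)


# Hypothetical counts chosen to match the scenario's reported metrics
TP, FN = 950, 50                # fraud caught / fraud missed
FP_BEFORE, FP_AFTER = 83, 268   # legitimate transactions flagged as fraud

print(f"recall:           {recall(TP, FN):.2f}")
print(f"precision before: {precision(TP, FP_BEFORE):.2f}")
print(f"precision after:  {precision(TP, FP_AFTER):.2f}")
```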


Other MLA-C01 exam domains

Data Preparation for Machine Learning
ML Model Development
Deployment and Orchestration of ML Workflows

Frequently asked questions

What does the ML Solution Monitoring, Maintenance and Security domain cover on the MLA-C01 exam?

The ML Solution Monitoring, Maintenance and Security domain covers monitoring model inference (for example, detecting data drift and model quality degradation), monitoring and optimizing infrastructure and costs, and securing AWS resources, as described in the MLA-C01 exam guide published by Amazon Web Services. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all MLA-C01 domains, with no account required.

How many ML Solution Monitoring, Maintenance and Security questions are in the MLA-C01 question bank?

The Courseiva MLA-C01 question bank contains 121 questions in the ML Solution Monitoring, Maintenance and Security domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice ML Solution Monitoring, Maintenance and Security for MLA-C01?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only ML Solution Monitoring, Maintenance and Security questions for MLA-C01?

Yes — the session launcher on this page draws questions exclusively from the ML Solution Monitoring, Maintenance and Security domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
