CCNA ML Solution Monitoring, Maintenance and Security Questions

76

MCQeasy

A company uses Amazon SageMaker to deploy a real-time inference endpoint. They notice increased latency in predictions during peak hours. Which should they investigate first to address the issue?

A.Review the endpoint auto-scaling policy

B.Check the data labeling job status

C.Modify the training instance type

D.Increase the model artifact size

AnswerA

Auto-scaling policy determines how instances are added/removed; insufficient capacity causes high latency.

Why this answer

Option A is correct because latency typically rises when the endpoint is under-provisioned, and the auto-scaling policy controls how capacity is added under load. Option B is wrong because data labeling jobs are unrelated to inference latency. Option C is wrong because the training instance type affects training, not inference.

Option D is wrong because a larger model artifact does nothing to reduce latency and can make it worse.

77

Multi-Selectmedium

A company is using an Amazon SageMaker pipeline for automated retraining. The pipeline fails intermittently due to transient errors in the training job. Which steps should the team take to ensure the pipeline completes successfully? (Choose THREE.)

Select 3 answers

A.Enable managed spot training for cost savings and use checkpointing to resume from interruptions.

B.Use a larger instance type for the training job to reduce the chance of failure.

C.Implement automatic model checkpointing by setting the CheckpointConfig in the pipeline step.

D.Configure the SageMaker pipeline step to retry on failure with a maximum number of attempts.

E.Add exponential backoff in any custom Python code that makes API calls to AWS services.

AnswersA, D, E

Spot instances can be interrupted; checkpointing helps.

Why this answer

Options A, D, and E are correct. A: Managed spot training with checkpointing lets the job resume after an interruption. D: A retry policy on the pipeline step reruns the training job after a transient failure, up to a maximum number of attempts.

E: Exponential backoff in custom code keeps throttled or briefly failing AWS API calls from failing the run. Option B is wrong because a larger instance type does not address transient errors; it only adds cost. Option C is wrong because setting CheckpointConfig does not checkpoint automatically; the training code must still save and restore its own checkpoints.
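
The exponential backoff from Option E can be sketched roughly as follows. This is a minimal illustration rather than AWS SDK code: `call_with_backoff`, its parameters, and the `TransientError` class are hypothetical names, and the injectable `sleep` argument exists only so the example can run without real waiting.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as API throttling."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles each attempt: base, 2*base, 4*base, ...
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

In real code, `fn` would wrap the AWS call and the `except` clause would match whichever exception the SDK raises for throttling or other retryable failures.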

78

MCQmedium

A machine learning team deploys a custom container image for an Amazon SageMaker training job. The container needs to access an S3 bucket that contains sensitive data. The team wants to follow the principle of least privilege. How should the team grant access?

A.Create an IAM role with S3 access and assign it as the SageMaker execution role for the training job.

B.Attach an IAM instance profile to the training instance with permissions to the bucket.

C.Configure an S3 bucket policy that grants access to the training job's ARN.

D.Store AWS access keys in the container image and use them to access the bucket.

AnswerA

This is the standard secure method.

Why this answer

Option A is correct because assigning a SageMaker execution role to the training job is the best practice. Option B is wrong because instance profiles are an EC2 mechanism; SageMaker training jobs get their permissions from the execution role. Option C is wrong because an S3 bucket policy cannot name the training job ARN as its principal; access is granted to the execution role instead.

Option D is wrong because hardcoding access keys in the container image is insecure.

79

Multi-Selecteasy

A machine learning team has deployed a model using Amazon SageMaker and wants to set up continuous monitoring for data drift. Which TWO actions are essential for ongoing data drift detection?

Select 2 answers

A.Set up Amazon CloudWatch alarms on the endpoint's invocation latency metric.

B.Enable data capture on the SageMaker endpoint to store inference data in Amazon S3.

C.Configure Amazon SageMaker Model Monitor to run hourly monitoring schedules.

D.Deploy a shadow endpoint to compare predictions from the current model and a challenger model.

E.Create a baseline from the training data to serve as a reference distribution.

AnswersB, C

Data capture is necessary to collect the inference data for monitoring.

Why this answer

Option B (enable data capture) is essential because data capture collects the inference requests and responses that monitoring analyzes. Option C (configure Model Monitor to run hourly) is essential because Model Monitor analyzes the captured data against a baseline to detect drift. Option E (create a baseline) is a prerequisite but not an ongoing action.

Options A and D are unrelated to data drift monitoring.

80

MCQeasy

A company wants to maintain multiple versions of a trained model in a central repository and track metadata such as training metrics, hyperparameters, and approval status. Which SageMaker feature should they use?

A.SageMaker Pipelines

B.SageMaker Feature Store

C.SageMaker Model Registry

D.SageMaker Experiments

E.SageMaker Studio

AnswerC

Correct. Model Registry provides a central repository for model versions, metadata, and approval status.

Why this answer

SageMaker Model Registry is designed for model versioning, metadata tracking, and approval workflows.

81

MCQeasy

A company is using Amazon SageMaker to train a model on sensitive customer data. The security team requires that all data be encrypted in transit and at rest, and that the training job does not have internet access. Which configuration should the team use to meet these requirements?

A.Configure the training job to run in a public subnet with a security group that blocks outbound traffic

B.Configure the training job to run in a private subnet, but disable encryption to reduce latency

C.Configure the training job to run in a private subnet with no internet access, and use a KMS key for encryption

D.Configure the training job to run in a VPC with a NAT gateway, and use default SageMaker encryption

AnswerC

Private subnet restricts internet; KMS encrypts data.

Why this answer

Option C is correct because running the SageMaker training job in a private subnet with no internet access ensures the job cannot reach the public internet, satisfying the no-internet-access requirement. Using an AWS KMS key for encryption at rest (for the S3 bucket and EBS volumes) and enforcing encryption in transit (via HTTPS/TLS for SageMaker and S3 endpoints) meets the encryption requirements. SageMaker training jobs in a private subnet use VPC endpoints (e.g., S3 and SageMaker API endpoints) to communicate securely without internet access.

Exam trap

The trap here is that candidates often confuse a private subnet with a NAT gateway as providing no internet access, but a NAT gateway actually enables outbound internet connectivity, which violates the requirement.

How to eliminate wrong answers

Option A is wrong because a public subnet inherently provides internet access via an internet gateway, violating the no-internet-access requirement; blocking outbound traffic with a security group does not prevent the instance from having a public IP or being reachable from the internet. Option B is wrong because disabling encryption violates the requirement that all data be encrypted in transit and at rest; encryption does not inherently increase latency in a meaningful way for SageMaker training jobs. Option D is wrong because a NAT gateway provides outbound internet access for instances in a private subnet, which violates the no-internet-access requirement; default SageMaker encryption uses AWS-managed keys, not a customer-managed KMS key, which may not satisfy the security team's requirement for explicit encryption control.
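
The configuration in Option C might be expressed as the following fragment of a `CreateTrainingJob` request. This is a sketch under assumptions: the helper name `private_training_settings` and every ID passed to it are hypothetical, the subnets are assumed to have no route to a NAT or internet gateway, and the fragment must be merged with the other required request fields before it can be used.

```python
def private_training_settings(subnet_ids, security_group_ids, kms_key_id,
                              output_s3_uri):
    """Build the network and encryption fields of a CreateTrainingJob request."""
    return {
        # Private subnets (assumed to have no NAT or internet gateway route).
        "VpcConfig": {
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        # Encrypts traffic between training containers in distributed jobs.
        "EnableInterContainerTrafficEncryption": True,
        # Encrypts the model artifacts written to S3 with the given KMS key.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri,
                             "KmsKeyId": kms_key_id},
        # Encrypts the attached storage volume. Instance type, instance count,
        # and volume size still have to be added to ResourceConfig.
        "ResourceConfig": {"VolumeKmsKeyId": kms_key_id},
    }
```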

82

MCQhard

A financial services company uses SageMaker to train and deploy models. They must ensure that all model artifacts stored in S3 are encrypted at rest using customer-managed KMS keys. Additionally, only the SageMaker service role should have access to the encryption key for decrypting artifacts during inference. Which IAM policy configuration meets these requirements?

A.Set the S3 bucket policy to require aws:SourceArn to match the SageMaker endpoint and allow kms:GenerateDataKey and kms:Decrypt.

B.Create a KMS grant to allow the SageMaker service to use the key on behalf of the role, and set the S3 bucket to use AWS-managed SSE-S3.

C.Configure the KMS key policy to allow s3:PutObject and s3:GetObject for the SageMaker role, and enable S3 default encryption with the KMS key.

D.Use envelope encryption by generating a data key and storing it alongside the model artifact.

E.Attach a policy to the SageMaker role that allows kms:Decrypt on the KMS key, and set an S3 bucket policy that denies all access unless the request uses server-side encryption with the KMS key.

AnswerE

Correct. The role can decrypt, and the bucket policy enforces SSE-KMS, preventing unencrypted access.

Why this answer

The role must have kms:Decrypt permission, and the S3 bucket policy must enforce SSE-KMS to ensure encryption with the correct key.
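
The two policies described in Option E could look roughly like the sketch below. The helper name `sse_kms_policies` and its arguments are hypothetical. The bucket policy follows the common pattern of denying uploads that do not request SSE-KMS, which is one way, not the only way, to enforce the requirement.

```python
import json


def sse_kms_policies(bucket, kms_key_arn):
    """Return (role_policy, bucket_policy) as plain dictionaries."""
    role_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": kms_key_arn,  # scoped to one key, not "*"
        }],
    }
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUploadsWithoutSseKms",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Reject any upload whose encryption header is not aws:kms.
            "Condition": {"StringNotEquals": {
                "s3:x-amz-server-side-encryption": "aws:kms"}},
        }],
    }
    return role_policy, bucket_policy


if __name__ == "__main__":
    role, bucket = sse_kms_policies(
        "example-model-artifacts",  # hypothetical bucket name
        "arn:aws:kms:us-east-1:123456789012:key/example-key-id")  # hypothetical
    print(json.dumps(bucket, indent=2))
```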

83

Multi-Selecthard

A healthcare company deploys a model to predict patient readmission risk. The model was trained on historical data and is now showing signs of concept drift. The team needs to implement a monitoring solution that can detect drift and automatically retrain the model when drift is detected. Which THREE steps should the team take to build this solution? (Choose THREE.)

Select 3 answers

A.Deploy SageMaker Model Monitor to track prediction quality over time

B.Disable the existing endpoint to prevent stale predictions during retraining

C.Set up a process to collect ground truth labels from patient outcomes

D.Manually compare the model's predictions against a holdout validation set each week

E.Use AWS Lambda to invoke a SageMaker training job when drift is detected

AnswersA, C, E

Model Monitor can detect drift using ground truth.

Why this answer

A is correct because Amazon SageMaker Model Monitor can continuously track prediction quality metrics (e.g., accuracy, precision) over time by analyzing data captured from the endpoint. This allows the team to detect concept drift by comparing live predictions against a baseline, triggering alerts when performance degrades. It provides a managed, automated way to monitor model quality without manual intervention.

Exam trap

The trap here is that candidates might think disabling the endpoint (Option B) is necessary to prevent stale predictions, but AWS best practice is to keep the endpoint live and use a separate pipeline (e.g., Lambda triggering a training job) to retrain and then update the endpoint without downtime.
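
The retraining trigger in Option E might be structured along these lines. This is a sketch, not a complete Lambda function: `start_retraining` and `job_template` are hypothetical names, the template is assumed to hold an otherwise complete `create_training_job` request, and the client is passed in so that the logic can be exercised without AWS access.

```python
import time


def start_retraining(sagemaker_client, job_template, now=None):
    """Start a training job from a template, giving it a unique name.

    sagemaker_client: any object with a create_training_job(**kwargs) method,
        for example boto3.client("sagemaker") inside a Lambda function.
    job_template: assumed to contain a complete CreateTrainingJob request
        whose TrainingJobName is used as a name prefix.
    now: optional epoch seconds, used here only to make the name predictable.
    """
    timestamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    request = dict(job_template)  # shallow copy; the template is left unchanged
    request["TrainingJobName"] = f"{job_template['TrainingJobName']}-{timestamp}"
    sagemaker_client.create_training_job(**request)
    return request["TrainingJobName"]
```

The existing endpoint keeps serving traffic while the job runs; the endpoint is updated only after the new model has been trained and evaluated.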

84

Multi-Selecthard

A company has deployed a model to a SageMaker endpoint. The security team wants to ensure that all traffic between the endpoint and the client application is encrypted and that the endpoint is not accessible from the internet. Which TWO actions should the company take? (Choose TWO.)

Select 2 answers

A.Place the endpoint behind an API Gateway and call it from the client.

B.Configure the SageMaker endpoint to be VPC-only by setting the endpoint's VPC configuration.

C.Create the endpoint with a public endpoint and allow only the client's IP address via security group.

D.Enable HTTPS on the endpoint by using a custom certificate from ACM.

E.Use AWS KMS to encrypt data in transit between the client and the endpoint.

AnswersB, D

VPC-only endpoints are not publicly accessible.

Why this answer

Options B and D are correct. A VPC endpoint (PrivateLink) enables private connectivity from clients in the same VPC, and using HTTPS ensures encryption in transit. Option C (public endpoint) is wrong because it leaves the endpoint reachable from the internet.

Option E (AWS KMS) doesn't directly encrypt network traffic. Option A (API Gateway) is unnecessary if the clients are in the VPC.

85

MCQeasy

A data science team deploys a regression model to Amazon SageMaker for real-time inference. After one month, the model's prediction errors increase significantly, but data distributions remain unchanged. Which monitoring approach is MOST suitable for detecting this issue?

A.Set up Amazon SageMaker Model Monitor to track model performance metrics against ground truth labels as they arrive.

B.Use Amazon SageMaker Clarify to monitor feature attribution drift.

C.Enable Amazon CloudWatch to monitor model endpoint latency.

D.Configure Amazon SageMaker Model Monitor to track data drift on the input features.

AnswerA

Model performance monitoring directly detects concept drift by comparing predictions to actuals.

Why this answer

Option A is correct because the model's prediction errors increased despite no data drift, indicating concept drift. Model performance monitoring compares predictions against ground truth. Option D is wrong because data drift monitoring tracks input distribution changes, which the scenario says have not occurred.

Option B is wrong because feature attribution drift tracks changes in feature importance, not prediction error. Option C is wrong because endpoint latency is an operational metric, not a measure of accuracy.

86

Multi-Selecteasy

A machine learning engineer is setting up an Amazon SageMaker notebook instance. The instance needs to access a private S3 bucket that contains training data. The notebook instance is in a VPC. Which combination of steps will grant access to the S3 bucket? (Choose TWO.)

Select 2 answers

A.Create a VPC endpoint for S3 in the same VPC and subnet.

B.Assign a public IP address to the notebook instance.

C.Set up a NAT gateway in the public subnet.

D.Create an IAM role with S3 access permissions and attach it to the notebook instance.

E.Attach an internet gateway to the VPC.

AnswersA, D

Allows private connectivity to S3.

Why this answer

Options A and D are correct. The notebook needs a VPC endpoint for S3 (A) to reach the bucket privately and an IAM role with S3 permissions (D) to be authorized. Option E is wrong because an internet gateway is not needed when a VPC endpoint is used.

Option B is wrong because assigning a public IP address is not necessary for private access. Option C is wrong because a NAT gateway is not required when a VPC endpoint is used.

87

MCQhard

A company uses SageMaker training jobs that need to access data in an S3 bucket in a different AWS account. The bucket uses a bucket policy that allows access only from a specific VPC. How should they configure the training job?

A.Use AWS DataSync to copy data to the training account's S3.

B.Create an IAM role in the source account and assume it from the training account.

C.Use an S3 VPC endpoint in the training job's VPC and attach a bucket policy that allows the VPC.

D.Use cross-account access with an IAM role and add a bucket policy allowing the training job's VPC.

AnswerD

This combines IAM role assumption and VPC condition to meet both requirements.

Why this answer

The training job should be launched in a VPC with an S3 VPC endpoint, and the bucket policy must allow that VPC. Additionally, an IAM role in the source account with cross-account trust is needed. Option D combines both requirements.

88

MCQmedium

A company uses an Amazon SageMaker endpoint with auto-scaling. They notice that during traffic bursts, new instances take several minutes to become healthy, causing 503 errors. What is the BEST way to reduce the time to serve requests during scaling events?

A.Set up a scheduled scaling policy to pre-warm instances before known traffic bursts.

B.Decrease the cooldown period for the scaling policy to add instances faster.

C.Use a larger instance type so that fewer instances are needed, and the scaling threshold is triggered less often.

D.Increase the maximum number of instances to allow more capacity.

AnswerC

Larger instances can serve more traffic, reducing scaling events.

Why this answer

Option C is correct because a larger instance type with more compute resources handles more requests per instance, reducing the need to scale as aggressively. Option A is wrong because scheduled scaling can help with known bursts but does not reduce the time an instance takes to become healthy. Option B is wrong because decreasing the cooldown period could cause thrashing.

Option D is wrong because increasing the maximum number of instances does not speed up each instance's startup.

89

Multi-Selecteasy

A company wants to monitor its Amazon SageMaker real-time endpoint for data quality issues. Which TWO actions should the company take?

Select 2 answers

A.Create a baseline from the training data to compare against live data.

B.Use SageMaker Debugger to analyze training jobs.

C.Set up an AWS Lambda function to preprocess incoming requests.

D.Configure Amazon S3 bucket notifications for model artifacts.

E.Enable data capture on the SageMaker endpoint.

AnswersA, E

A baseline provides the expected statistics and constraints for the data.

Why this answer

To monitor data quality with SageMaker Model Monitor, you need to enable data capture on the endpoint and create a baseline from the training data. The other options are not directly required for data quality monitoring.
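
Enabling data capture comes down to a `DataCaptureConfig` block in the endpoint configuration. The sketch below builds that block as a plain dictionary. The helper name `data_capture_config` and the S3 URI in the usage line are hypothetical, and the baseline job is a separate step that is not shown.

```python
def data_capture_config(destination_s3_uri, sampling_percentage=100):
    """Build a DataCaptureConfig block for an endpoint configuration."""
    if not 0 < sampling_percentage <= 100:
        raise ValueError("sampling_percentage must be in the range (0, 100]")
    return {
        "EnableCapture": True,
        # Share of requests to record; 100 captures everything.
        "InitialSamplingPercentage": sampling_percentage,
        "DestinationS3Uri": destination_s3_uri,
        # Capture both the request payload and the model's response.
        "CaptureOptions": [{"CaptureMode": "Input"},
                           {"CaptureMode": "Output"}],
    }


config = data_capture_config("s3://example-bucket/datacapture/")  # hypothetical URI
```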

90

MCQeasy

A data science team deploys a machine learning model to a SageMaker endpoint for real-time inference. They need to monitor the model for feature distribution drift over time to ensure the model's predictions remain accurate. Which AWS service should they use?

A.Amazon CloudWatch Evidently

B.AWS Glue DataBrew

C.SageMaker Clarify

D.SageMaker Model Monitor

E.SageMaker Debugger

AnswerD

Correct. SageMaker Model Monitor monitors data and model quality, including drift detection.

Why this answer

SageMaker Model Monitor is specifically designed to detect drift in feature distributions and prediction quality over time.
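
To make "feature distribution drift" concrete, the sketch below compares a baseline sample with a live sample using the population stability index (PSI). This is a simple stand-in chosen for illustration and is not the computation Model Monitor performs; the function name, the bin count, and the sample data are all assumptions.

```python
import math


def population_stability_index(baseline, live, bins=10):
    """Score how far the live distribution has moved from the baseline."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range fall into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids taking log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


baseline = [i / 100 for i in range(100)]   # training-time feature values
shifted = [x + 0.5 for x in baseline]      # live values after a shift
same_score = population_stability_index(baseline, baseline)
drift_score = population_stability_index(baseline, shifted)
```

A score near zero means the two samples have nearly the same binned distribution, and larger scores indicate a larger shift.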

91

Multi-Selectmedium

A company has deployed a SageMaker endpoint for real-time inference. The security team needs to monitor for potential security threats such as unauthorized access attempts and tampering with the model configuration. Which TWO actions should the team take? (Choose TWO.)

Select 2 answers

A.Enable AWS CloudTrail for the SageMaker endpoint API calls

B.Enable AWS Config to monitor endpoint configuration changes

C.Enable SageMaker Data Capture on the endpoint

D.Enable SageMaker Model Monitor for the endpoint

E.Enable Amazon GuardDuty for the endpoint

AnswersA, B

CloudTrail logs all API calls, providing an audit trail for security analysis.

Why this answer

CloudTrail logs all API calls to the SageMaker endpoint, including who made the call and from where, which helps identify unauthorized access. AWS Config continuously monitors endpoint configuration changes and can trigger alerts when changes are made without authorization. SageMaker Model Monitor is for data drift, not security.

Data Capture captures input/output for monitoring model performance, not security. GuardDuty is a threat detection service for AWS accounts and workloads, but it does not directly monitor SageMaker endpoints specifically.

92

MCQmedium

A machine learning engineer is troubleshooting a model that is producing unexpectedly low accuracy in production. The engineer examines the model's training data and finds that the distribution of the target variable in production is significantly different from the training set. What type of drift is the model experiencing?

A.Prior probability shift

B.Concept drift

C.Data drift

D.Covariate shift

AnswerB

Concept drift is a change in the statistical properties of the target variable.

Why this answer

Option B is correct because a change in the target variable distribution is concept drift. Option D is wrong because covariate shift is a change in the input distribution. Option A is wrong because prior probability shift is a type of concept drift, but not the best answer here.

Option C is wrong because data drift is a general term.

93

MCQmedium

A media company uses SageMaker endpoints to serve a model that predicts video engagement. They have two production variants: Variant A (ml.c5.large) for regular traffic and Variant B (ml.c5.xlarge) for burst traffic. They use weighted routing (90% to A, 10% to B). Recently, during peak hours, Variant A's latency increase causes many requests to time out. The metrics show that both variants are under similar CPU load, but the number of concurrent requests to Variant A is very high. The team wants to ensure that burst traffic is handled properly without manual intervention. What should they do?

A.Increase the traffic weight to Variant B to 70% and reduce Variant A to 30%.

B.Configure Application Auto Scaling for each variant with a target tracking scaling policy based on the number of concurrent requests per instance.

C.Set a CloudWatch alarm on Variant A's p99 latency and trigger a step scaling policy to add instances.

D.Create a separate endpoint for burst traffic and route peak traffic to it via DNS.

AnswerB

Autoscaling adjusts capacity based on load, preventing timeouts.

Why this answer

Option B is correct because changing to target tracking scaling based on the number of concurrent requests (or InvocationsPerInstance) ensures each variant scales based on its load. Option A (swap weights) doesn't fix scaling. Option C (p99 latency alarm) might trigger too late.

Option D (separate endpoint) is not necessary.
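
A target tracking setup for one variant could be assembled as below. This is a sketch under assumptions: the helper name, policy name, and capacity limits are hypothetical, and it uses the predefined `SageMakerVariantInvocationsPerInstance` metric as the load signal. The two dictionaries are shaped as keyword arguments for the Application Auto Scaling `register_scalable_target` and `put_scaling_policy` calls.

```python
def target_tracking_requests(endpoint_name, variant_name, target_value,
                             min_instances=1, max_instances=4):
    """Return (scalable_target_kwargs, scaling_policy_kwargs) for one variant."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    scalable_target = {**common,
                       "MinCapacity": min_instances,
                       "MaxCapacity": max_instances}
    scaling_policy = {
        **common,
        "PolicyName": f"{variant_name}-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Instances are added or removed to hold the metric near this value.
            "TargetValue": float(target_value),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return scalable_target, scaling_policy
```

Calling the helper once per variant gives Variant A and Variant B independent policies, so each scales on its own load.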

94

MCQhard

A company deploys a SageMaker model using AWS KMS for encryption at rest. They have a compliance requirement to rotate the KMS key every year without causing downtime for the inference endpoint. Which approach should they take?

A.Use AWS Certificate Manager (ACM) for encryption

B.Create a new KMS key and update the endpoint configuration

C.Manually rotate the key by recreating the endpoint

D.Enable automatic key rotation on the existing KMS key

AnswerD

Automatic rotation rotates the key material without changing the key ID, causing no downtime.

Why this answer

Option D is correct because AWS KMS supports automatic key rotation, which rotates the key material yearly without requiring any changes to the endpoint. Options A, B, and C would cause downtime or are unnecessary.

95

Multi-Selecthard

A machine learning team is setting up Model Monitor for a deployed model. Which THREE factors should they consider when configuring the monitoring schedule? (Select three.)

Select 3 answers

A.The monitoring job can be configured to send notifications via Amazon SNS.

B.The frequency of monitoring should be at least daily.

C.The monitoring job should analyze a sufficient sample size to be statistically significant.

D.The monitoring job should run on a schedule that aligns with data arrival patterns.

E.The constraints file must be updated after each monitoring run.

AnswersA, C, D

SNS notifications can alert teams when violations are detected.

Why this answer

Options A, C, and D are correct. A: SNS notifications can be set up for alerts. C: A sufficient sample size ensures statistical significance.

D: The schedule should align with data arrival patterns to detect drift promptly. Option B is not necessarily correct; the right frequency depends on data volume. Option E is incorrect because constraints are updated manually or via baseline jobs, not automatically after each run.

96

MCQmedium

A company uses SageMaker endpoints with auto-scaling. The endpoint is experiencing high latency during peak hours. The metrics show CPU utilization is low but memory is high. What is the most likely cause?

A.The model is not optimized for inference, causing memory leaks.

B.The auto-scaling policy is based on CPU utilization, which does not trigger scaling.

C.The instance type has insufficient network bandwidth.

D.The endpoint is deployed in a VPC without a NAT gateway.

AnswerB

CPU is low so scaling not triggered, but memory high indicates need for more instances.

Why this answer

The auto-scaling policy is likely based on CPU utilization, which does not trigger scaling during memory pressure. Memory leaks could be a secondary cause, but the primary issue is the scaling metric.

97

MCQhard

A financial services company deploys a credit risk model using an Amazon SageMaker endpoint with data capture enabled. The model uses a custom container. The compliance team requires that all inference requests and responses are logged to an S3 bucket with server-side encryption using AWS KMS. The IAM role for the endpoint has the following policy. What must be added to meet the compliance requirement?

A.Add kms:GenerateDataKey and kms:Decrypt permissions to the IAM role.

B.Add s3:PutObjectAcl permission to the IAM role.

C.Enable S3 default encryption on the bucket.

D.Modify the container to handle encryption internally.

AnswerA

These permissions are necessary to write to a KMS-encrypted bucket.

Why this answer

The correct answer is A because the IAM role for the SageMaker endpoint needs permission to generate data keys (kms:GenerateDataKey) for encrypting captured data and to use the key for decryption (kms:Decrypt) when writing to the S3 bucket. Without these, the endpoint cannot use the customer-managed KMS key for server-side encryption, even if the bucket policy allows it.

Exam trap

The trap here is that candidates often assume enabling S3 default encryption (Option C) is sufficient, but SageMaker data capture requires explicit KMS permissions in the endpoint's IAM role to use the customer-managed key.

How to eliminate wrong answers

Option B is wrong because s3:PutObjectAcl is not required for server-side encryption with KMS; it is used for managing object-level access control lists, not encryption. Option C is wrong because enabling S3 default encryption on the bucket does not satisfy the requirement for server-side encryption using AWS KMS for data captured by SageMaker; the endpoint must explicitly use the KMS key via the IAM role. Option D is wrong because modifying the container to handle encryption internally would bypass the managed data capture feature and is not necessary; SageMaker data capture already supports KMS encryption natively.

98

MCQmedium

A team uses AWS Auto Scaling for a SageMaker real-time endpoint. They notice that when scaling in, the latest instance is always terminated first, causing disruption to recent requests. How can they configure the scaling policy to terminate the oldest instance first?

A.Configure the termination policy as 'OldestInstance'

B.No action needed; this is the default behavior

C.Use lifecycle hooks

D.Use AWS CloudFormation to manage the endpoint

AnswerA

You can set the termination policy to 'OldestInstance' in the scaling policy configuration.

Why this answer

Option A is correct because Application Auto Scaling allows setting a termination policy. Option C is wrong because lifecycle hooks are for cleanup actions. Option D is wrong because CloudFormation is for infrastructure as code.

Option B is incorrect; the default may not be the oldest instance.

99

MCQhard

Refer to the exhibit. A SageMaker execution role has the IAM policy shown. The team attempts to run a training job that writes results to 's3://my-bucket/training/output/model.tar.gz'. What will happen?

A.The training job will fail because the Deny statement blocks all PutObject actions.

B.The training job will succeed and write the model artifact.

C.The training job will fail because the Deny statement overrides the Allow.

D.The training job will succeed, but the output file will be encrypted with a different key.

AnswerB

The Deny does not affect this resource.

Why this answer

Option B is correct. The Deny statement blocks PutObject on the specific object 'sensitive-data.csv', but the write to 'model.tar.gz' is allowed by the second statement. There is no explicit deny on 'model.tar.gz'.

Option A is wrong because the Deny is specific to one object. Option C is wrong because there is no conflict; the Deny applies only to that one object. Option D is wrong because the Deny does not affect this write.
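
The evaluation order behind this answer (an explicit Deny wins, then an Allow, otherwise an implicit deny) can be sketched with a toy evaluator. The exhibit is not reproduced in the text, so `EXAMPLE_POLICY` below is a hypothetical reconstruction from the explanation, including the object paths. Real IAM evaluation also handles conditions, principals, and other policy types that this sketch ignores.

```python
from fnmatch import fnmatchcase

# Hypothetical policy reconstructed from the explanation, not the real exhibit.
EXAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Deny", "Action": "s3:PutObject",
         "Resource": "arn:aws:s3:::my-bucket/training/sensitive-data.csv"},
        {"Effect": "Allow", "Action": "s3:PutObject",
         "Resource": "arn:aws:s3:::my-bucket/training/*"},
    ],
}


def _as_list(value):
    return value if isinstance(value, list) else [value]


def is_allowed(policy, action, resource):
    """Simplified check: explicit Deny wins, then Allow, else implicit deny."""
    effects = [
        stmt["Effect"]
        for stmt in policy["Statement"]
        if any(fnmatchcase(action, a) for a in _as_list(stmt["Action"]))
        and any(fnmatchcase(resource, r) for r in _as_list(stmt["Resource"]))
    ]
    if "Deny" in effects:
        return False
    return "Allow" in effects
```

Under this reconstruction, the write to `training/output/model.tar.gz` matches only the Allow statement, so it succeeds.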

100

MCQhard

A healthcare startup has deployed a machine learning model on Amazon SageMaker that predicts patient readmission risks. The model uses sensitive health data stored in an S3 bucket encrypted with AWS KMS. The SageMaker endpoint is configured with an IAM role that has the following policy attached: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::healthcare-data/*", "Condition": { "Bool": { "aws:SecureTransport": "true" } } }, { "Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*" } ] }. During a security audit, the team discovers that the IAM role's KMS permission is too permissive because it allows decryption of any KMS key in the account. The team needs to modify the policy to follow the principle of least privilege while still allowing the SageMaker endpoint to read the encrypted data. Which modification should the team make?

A.Change the KMS statement Action to "kms:DescribeKey" instead of "kms:Decrypt"

B.Add a condition to the KMS statement: "Condition": { "StringEquals": { "kms:ViaService": "s3.us-east-1.amazonaws.com" } }

C.Remove the KMS statement entirely, as S3 bucket policies with SSE-KMS do not require KMS permissions

D.Change the KMS statement to: "Action": "kms:Decrypt", "Resource": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"

AnswerD

Restricting the Resource to the specific KMS key ARN ensures that the role can only decrypt the key used for the healthcare data, adhering to least privilege.

Why this answer

The current policy allows kms:Decrypt on any KMS key (*). To follow least privilege, the team should restrict the Resource to the specific KMS key used to encrypt the S3 bucket. Option D (keep the Action as kms:Decrypt and restrict Resource to the specific key ARN) is correct.

Option C (remove the KMS statement entirely) would break the endpoint because it could not decrypt the data. Option B (add a kms:ViaService condition) is good practice but still allows decryption with any key when the condition is met, so it is not least privilege. Option A (use kms:DescribeKey instead of kms:Decrypt) does not allow decryption.
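
The change in Option D can be shown by applying it to the policy from the question. The helper `restrict_kms_resource` is a hypothetical name used only for this illustration; the policy and key ARN are copied from the question text.

```python
import copy

ORIGINAL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*",
         "Resource": "arn:aws:s3:::healthcare-data/*",
         "Condition": {"Bool": {"aws:SecureTransport": "true"}}},
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*"},
    ],
}
KEY_ARN = ("arn:aws:kms:us-east-1:123456789012:"
           "key/1234abcd-12ab-34cd-56ef-1234567890ab")


def restrict_kms_resource(policy, key_arn):
    """Return a copy in which wildcard KMS statements target one key."""
    tightened = copy.deepcopy(policy)  # leave the caller's policy untouched
    for stmt in tightened["Statement"]:
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        if stmt.get("Resource") == "*" and any(a.startswith("kms:") for a in actions):
            stmt["Resource"] = key_arn
    return tightened


TIGHTENED_POLICY = restrict_kms_resource(ORIGINAL_POLICY, KEY_ARN)
```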

101

MCQhard

Refer to the exhibit. A team receives an error when running a SageMaker Model Monitor schedule for data quality. What should they do to resolve this issue?

A.Update the IAM role to allow S3 access

B.Restart the monitoring schedule

C.Enable data capture on the endpoint

D.Create a baseline job using the training dataset

AnswerD

A baseline must be generated from training data to compare inference data against.

Why this answer

Option D is correct because Model Monitor requires a baseline (constraints and statistics) generated from the training data. The error indicates the baseline is missing. Option C enables capture but does not provide the missing baseline.

Option B is incorrect because the schedule itself is fine; restarting it does not create the baseline. Option A is about permissions, not the baseline.

102

MCQeasy

An e-commerce company uses a SageMaker endpoint to serve a product recommendation model. The model is retrained every month using batch transforms. The ML team has set up a retraining pipeline using SageMaker Processing jobs and Step Functions. Recently, the Step Functions workflow has been failing at the retraining step with an error: 'AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/RetrainingRole/abc123 is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::training-data/processed/latest.parquet'. The team confirms that the S3 bucket exists and the object is present. The retraining role has the following policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::training-data/*" } ] }. The team also verifies that the bucket policy does not explicitly deny access. What is the MOST likely cause of the AccessDenied error?

A.The Step Functions execution role does not have permission to invoke the SageMaker Processing job

B.The path in the error message is misspelled; the actual object is at a different key

C.The S3 bucket has a bucket policy that denies access to the retraining role based on a condition like aws:SourceIp

D.The training data object uses server-side encryption with AWS KMS (SSE-KMS), and the retraining role lacks kms:Decrypt permission on the KMS key

AnswerD

If the object is encrypted with SSE-KMS, the role needs both s3:GetObject and kms:Decrypt. The current IAM policy does not include KMS permissions.

Why this answer

The error indicates that the retraining role is not authorized to GetObject on the specific object. Even though the policy allows 'arn:aws:s3:::training-data/*', if the object is encrypted with SSE-KMS, the role also needs kms:Decrypt permission on the KMS key. The bucket policy might also require encryption.

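Option D is the most likely cause. Option A (the Step Functions execution role lacking permission to invoke the Processing job) would produce a different error, not an s3:GetObject denial for the retraining role. Option C (a conditional deny in the bucket policy) is not the issue because the team verified there is no explicit deny.

Option B (path typo) does not fit because the team confirmed the object is present at that key.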
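As a rough sketch of the fix, the retraining role's policy can be extended with a KMS statement. The key ARN below (including its region) is a hypothetical placeholder, since the question does not name the key, and the snippet only builds and prints the policy document; it does not call AWS.

```python
import json

# Hypothetical key ARN; the question does not say which KMS key encrypts the object.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"


def add_kms_decrypt(policy: dict, key_arn: str) -> dict:
    """Return a copy of the policy with a kms:Decrypt statement scoped to one key."""
    updated = json.loads(json.dumps(policy))  # deep copy via a JSON round trip
    updated["Statement"].append(
        {"Effect": "Allow", "Action": ["kms:Decrypt"], "Resource": key_arn}
    )
    return updated


# The identity policy quoted in the question.
retraining_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::training-data/*",
        }
    ],
}

fixed_policy = add_kms_decrypt(retraining_policy, KMS_KEY_ARN)
print(json.dumps(fixed_policy, indent=2))
```

Scoping the new statement to a single key ARN, not to "*", keeps the role close to least privilege.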

Practice this question →

103

Multi-Selecthard

A company needs to secure a SageMaker notebook instance that contains sensitive data. Which THREE of the following are effective security measures? (Select THREE.)

Select 3 answers

A.Use IAM policies to restrict who can access the notebook instance.

B.Disable direct internet access and use a VPC with a NAT gateway for outbound.

C.Attach a lifecycle configuration that runs a script to download data from a public S3 bucket.

D.Enable AWS CloudTrail to log all notebook API calls.

E.Encrypt the notebook instance's EBS volume using AWS KMS.

AnswersA, D, E

IAM policies can limit which users can create presigned URLs for the notebook.

Why this answer

Encrypting the EBS volume with KMS protects data at rest, IAM policies control access, and CloudTrail provides auditing. Disabling internet access is also good, but the question asks for three from the list.

Practice this question →

104

MCQmedium

An e-commerce company uses a machine learning model to predict customer churn. They notice that the model's performance degrades after a major marketing campaign changes customer behavior. Which approach is MOST effective to detect and respond to this type of concept drift?

A.Deploy an A/B test to compare the current model with a baseline.

B.Use SageMaker Model Monitor to track prediction distribution and trigger retraining.

C.Manually review model accuracy each month.

D.Set up a weekly batch transform job to compute accuracy against historical data.

E.Increase the number of instances for the endpoint.

AnswerB

Correct. Model Monitor continuously checks for drift and can initiate automated retraining.

Why this answer

SageMaker Model Monitor can automatically detect drift in prediction distributions and trigger retraining pipelines.

Practice this question →

105

MCQhard

An e-commerce company uses a multi-model endpoint on Amazon SageMaker to serve several deep learning models. After a new model version is deployed, the endpoint starts returning 503 errors for some models. Monitoring shows that the endpoint's memory utilization is near 100%. What should the team do to resolve this issue while minimizing operational overhead?

A.Increase the number of instances for the endpoint and configure an auto-scaling policy based on memory utilization.

B.Deploy each model on its own separate endpoint to isolate memory usage.

C.Use Amazon SageMaker Model Monitor to detect memory leaks and send alerts.

D.Use SageMaker's built-in model scaling feature to allocate more memory to the affected model.

AnswerA

Adds capacity and auto-scales.

Why this answer

Option A is correct because increasing the endpoint's instance count spreads the memory load, and a target tracking auto-scaling policy then adjusts the instance count based on memory utilization. Option D is wrong because SageMaker does not support per-model scaling; scaling is per endpoint. Option B is wrong because moving to single-model endpoints would increase operational overhead and cost.

Option C is wrong because Model Monitor doesn't help with scaling.

Practice this question →

106

MCQhard

A model deployed on SageMaker uses custom inference code. The endpoint is showing intermittent 500 errors. CloudWatch logs reveal 'TimeoutError: Request timed out after 60 seconds'. The model takes on average 55 seconds to process. What is the most effective solution?

A.Increase the invocation timeout in the SageMaker API call.

B.Increase the SageMaker endpoint's model container timeout setting.

C.Optimize the inference code to reduce latency.

D.Increase the endpoint's instance count.

AnswerC

Reducing inference latency below the timeout threshold is the most direct and effective solution, as it addresses the root cause.

Why this answer

Option C is correct because requests average 55 seconds against a 60-second timeout, so ordinary variation pushes some of them over the limit; optimizing the code addresses that root cause. Option D might help with load but not per-request latency. Option A does not help because the invocation timeout is set by the client, while the server-side timeout is 60 seconds by default.

Option B: the container timeout can be increased from its 60-second default, but doing so might mask performance issues.
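To see why a 55-second average produces intermittent rather than constant errors, consider a small worked example. The latency samples and the 20% speed-up are made up for illustration; only the 55-second average and 60-second timeout come from the question.

```python
def timeout_fraction(latencies_s, timeout_s=60.0):
    """Return the fraction of requests whose latency exceeds the timeout."""
    if not latencies_s:
        return 0.0
    over = sum(1 for t in latencies_s if t > timeout_s)
    return over / len(latencies_s)


# Made-up latencies (seconds) that average 55 s, as in the question.
before = [48, 52, 61, 50, 63, 55, 49, 62, 57, 53]

# Hypothetical latencies after an optimization that trims 20% off each request.
after = [t * 0.8 for t in before]

print(sum(before) / len(before))  # 55.0
print(timeout_fraction(before))   # 0.3
print(timeout_fraction(after))    # 0.0
```

With these numbers, three of ten requests time out even though the mean is under the limit, and a modest reduction in per-request latency removes the failures.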

Practice this question →

107

MCQeasy

A company wants to automate the deployment of a SageMaker model into production whenever a new model version is approved in the Model Registry. Which service can be used to trigger the deployment pipeline?

A.AWS Lambda

B.Amazon CloudWatch Events (EventBridge)

C.Amazon S3 Events

D.Amazon SNS

E.AWS Config

AnswerB

Correct. EventBridge can capture Model Registry events and trigger downstream actions like CodePipeline.

Why this answer

Amazon EventBridge can respond to Model Registry events (e.g., approval status change) and start an automated pipeline.
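A minimal sketch of what such a rule might match is shown below. The source and detail-type strings reflect the event format SageMaker publishes for model package changes as best understood here, so check them against the current documentation; the `matches` helper is a toy stand-in for EventBridge's own matching, written only to exercise the pattern locally.

```python
import json

# Event pattern for an EventBridge rule that fires on Model Registry approvals.
# Assumption: field names follow SageMaker's model package state change events.
APPROVAL_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def matches(pattern, event):
    """Toy matcher: each pattern key must exist and its value must be listed."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical sample events.
approved_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {"ModelApprovalStatus": "Approved"},
}
pending_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {"ModelApprovalStatus": "PendingManualApproval"},
}

print(json.dumps(APPROVAL_PATTERN))
print(matches(APPROVAL_PATTERN, approved_event))
print(matches(APPROVAL_PATTERN, pending_event))
```

The rule's target would then be the deployment pipeline (for example CodePipeline, as the answer note mentions).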

Practice this question →

108

Multi-Selecthard

A financial services company must ensure that all data used by Amazon SageMaker training jobs is encrypted at rest. The company wants to use a customer-managed key (CMK) for the encryption. Which steps are necessary to achieve this? (Choose TWO.)

Select 2 answers

A.Enable SageMaker's default encryption for the training job by setting the EnableDefaultEncryption flag.

B.Create a CMK in AWS KMS and add the SageMaker service principal to the key policy to allow it to use the key.

C.Enable S3 default encryption using the CMK on all buckets containing training data.

D.Specify the CMK's ARN in the VolumeKmsKeyId parameter when creating the training job.

E.Use CloudWatch Logs encryption to protect the training logs.

AnswersB, D

SageMaker needs permission to use the CMK.

Why this answer

Options B and D are correct. B: Grant SageMaker permission to use the CMK. D: Specify the CMK in the VolumeKmsKeyId parameter of the training job.

Option C is wrong because S3 default encryption applies to the buckets and does not cover SageMaker's internal storage. Option A is wrong because SageMaker encrypts at rest by default with AWS-managed keys, but to use a CMK you must specify it. Option E is wrong because log encryption protects the training logs, not the training data.
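For illustration, the volume-encryption setting sits in the ResourceConfig section of a CreateTrainingJob request. The sketch below only builds that fragment; the key ARN, instance type, and volume size are hypothetical placeholders, and no AWS call is made.

```python
import json

# Hypothetical customer-managed key ARN.
CMK_ARN = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"


def resource_config(instance_type, instance_count, volume_gb, kms_key_arn):
    """Build the ResourceConfig section of a CreateTrainingJob request."""
    return {
        "InstanceType": instance_type,
        "InstanceCount": instance_count,
        "VolumeSizeInGB": volume_gb,
        # Encrypts the storage volume attached to the training instances.
        "VolumeKmsKeyId": kms_key_arn,
    }


config = resource_config("ml.m5.xlarge", 1, 50, CMK_ARN)
print(json.dumps(config, indent=2))
```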

Practice this question →

109

MCQeasy

A data scientist wants to version control trained models and manage approvals for deployment. Which SageMaker feature should they use?

A.SageMaker Model Registry.

B.SageMaker Experiments.

C.SageMaker Feature Store.

D.SageMaker Ground Truth.

AnswerA

Model Registry provides version control for models and supports approval workflows for deployment.

Why this answer

Option A is correct because SageMaker Model Registry is purpose-built for model versioning and approval workflows. Option B is for experiment tracking, not deployment. Option D is for labeling data.

Option C is for feature storage.

Practice this question →

110

MCQmedium

A company has a batch transform job in Amazon SageMaker that processes large datasets every night. Recently, the job has been failing sporadically with an out-of-memory error. The data size has not increased. What is the MOST likely cause?

A.The custom inference code has a memory leak that gradually consumes available memory.

B.The data distribution has shifted, causing different memory usage patterns.

C.The instance type is not large enough to handle the dataset.

D.The batch transform input data has increased in size.

AnswerA

A memory leak can cause OOM even with the same data size.

Why this answer

Option A is correct because a memory leak in custom code would cause increasing memory usage over time within a single job, leading to OOM. Option C is wrong because the instance type is fixed; if it worked before, the instance type is not the issue. Option D is wrong because the question states that the data size has not increased.

Option B is wrong because a change in data distribution doesn't directly cause OOM; it might change how records are processed, but not necessarily exhaust memory.

Practice this question →

111

MCQmedium

Refer to the exhibit. A SageMaker endpoint is failing health checks. What is the most likely cause?

A.The endpoint is not correctly configured with VPC settings.

B.The model is too large for the instance memory.

C.The inference code has a file descriptor leak.

D.The model server is using an incorrect port.

AnswerC

The error explicitly indicates too many open files, which is a classic symptom of a file descriptor leak.

Why this answer

Option C is correct because the error 'Too many open files' indicates a file descriptor leak in the inference code. Option A would show network errors. Option B would show memory errors.

Option D would show incorrect port errors.

Practice this question →

112

MCQeasy

Refer to the exhibit. A team configured a SageMaker Model Monitor schedule for data quality. The baseline was created from a training dataset. After running for a day, the monitoring results show frequent violations. What is the most likely cause?

A.The baseline was created from a dataset that does not represent production data.

B.The environment variable max_runtime_in_seconds is too low.

C.The schedule runs too often (every hour), causing overload.

D.The monitoring output destination is incorrect.

AnswerA

If the baseline does not reflect real-world data, constraints will be frequently violated.

Why this answer

Option A is correct because the baseline from training data may not represent the production data distribution, causing frequent violations. Option C is not likely because hourly monitoring is typical. Option D would cause job failures, not violations.

Option B would cause timeouts, not violations.
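The effect of an unrepresentative baseline can be illustrated with a toy range check. This is not Model Monitor's actual algorithm, and the feature values are made up; it only shows how constraints learned from one population flag most records from a different one.

```python
def build_baseline(values):
    """Record the observed range of a numeric feature in the baseline data."""
    return {"min": min(values), "max": max(values)}


def violation_rate(baseline, values):
    """Return the fraction of values that fall outside the baseline range."""
    outside = sum(1 for v in values if v < baseline["min"] or v > baseline["max"])
    return outside / len(values)


# Made-up customer ages: the training set covers a narrower population
# than the one the endpoint sees in production.
training_ages = [21, 25, 30, 34, 38, 40]
production_ages = [19, 27, 45, 52, 33, 61, 36, 70]

baseline = build_baseline(training_ages)
print(baseline)                                   # {'min': 21, 'max': 40}
print(violation_rate(baseline, production_ages))  # 0.625
```

Rebuilding the baseline from data that reflects production traffic would bring the violation rate down without any change to the schedule.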

Practice this question →

113

MCQeasy

Refer to the exhibit. A user has the above IAM policy attached but cannot access files in SageMaker Studio. What additional permission is most likely needed?

A.sagemaker:ListApps

B.s3:GetObject on the relevant S3 buckets

C.sagemaker:DescribeUserProfile

D.kms:Decrypt

AnswerB

To read files in Studio, the user must have S3 access permissions.

Why this answer

Option B is correct because SageMaker Studio users need S3 read/write permissions to access data files stored in S3 buckets. The policy only allows creating a presigned URL for Studio, but not S3 access. Option A is for listing apps, not files.

Option C is for user profiles. Option D is for KMS decryption if applicable, but not the most common cause.

Practice this question →

114

MCQmedium

A company uses Amazon SageMaker to train and deploy a machine learning model. After deployment, they notice that the model's accuracy drops significantly over time due to changes in the underlying data distribution. Which monitoring solution should they implement to detect this issue automatically?

A.Set up Amazon SageMaker Model Monitor with data quality monitoring.

B.Configure AWS Config rules to check the model accuracy metric.

C.Use AWS CloudTrail to monitor changes to the model's S3 bucket.

D.Enable Amazon CloudWatch Logs on the endpoint and set alarms on inference latency.

AnswerA

SageMaker Model Monitor automatically detects drift in data quality and model quality.

Why this answer

Option A is correct because Amazon SageMaker Model Monitor can monitor data quality and model quality drift. Option D (CloudWatch Logs) is for logs, not drift detection. Option C (CloudTrail) tracks API calls.

Option B (AWS Config) tracks resource configuration.

Practice this question →

115

MCQmedium

A model deployed on a SageMaker endpoint is returning predictions. The team wants to log all predictions to an S3 bucket for auditing. What is the most efficient way to achieve this?

A.Enable SageMaker endpoint data capture to the S3 bucket.

B.Configure CloudWatch Logs to export to S3.

C.Modify the inference code to write logs to S3.

D.Use Amazon Kinesis Data Firehose to stream predictions to S3.

AnswerA

Data capture is built-in and efficient.

Why this answer

SageMaker data capture is designed for this purpose and can be enabled on the endpoint configuration to automatically capture input and output data to S3. Modifying inference code is custom and less efficient, Firehose adds complexity, and CloudWatch Logs export is for logs.
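As a sketch, data capture is switched on through the DataCaptureConfig block of the endpoint configuration. The bucket URI below is a hypothetical placeholder, and the snippet only builds the configuration fragment instead of calling AWS.

```python
import json


def data_capture_config(s3_uri, sampling_percent=100):
    """Build a DataCaptureConfig block that captures both inputs and outputs."""
    if not 0 <= sampling_percent <= 100:
        raise ValueError("sampling_percent must be between 0 and 100")
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": sampling_percent,
        "DestinationS3Uri": s3_uri,
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


# Hypothetical audit bucket; capture every request for auditing purposes.
capture = data_capture_config("s3://example-audit-bucket/capture")
print(json.dumps(capture, indent=2))
```

A sampling percentage of 100 matches the requirement to log all predictions; lower values capture only a sample.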

Practice this question →

116

Multi-Selectmedium

Which THREE components are required to set up automated model retraining in response to performance degradation using Amazon SageMaker? (Select THREE.)

Select 3 answers

A.An Amazon SNS topic with a subscription to send a manual approval email.

B.A CloudWatch alarm that triggers when a quality metric falls below a threshold.

C.A SageMaker Model Monitor schedule to capture inference data and compute quality metrics.

D.An AWS Lambda function that starts a SageMaker training job or pipeline execution.

E.A production variant with a canary traffic shift configuration.

AnswersB, C, D

The alarm detects degradation and triggers the retraining.

Why this answer

Option B is correct because a CloudWatch alarm can monitor a SageMaker Model Monitor quality metric (e.g., accuracy, precision) and trigger an alarm when the metric falls below a defined threshold. This alarm acts as the event source to initiate automated retraining, forming the monitoring and alerting backbone of the retraining pipeline.

Exam trap

The trap here is that candidates often confuse the monitoring and alerting components (CloudWatch alarm and Model Monitor) with deployment or notification mechanisms, mistakenly selecting manual approval (SNS) or traffic shifting (canary) as part of the automated retraining workflow.
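The Lambda piece of this workflow could look roughly like the sketch below. It assumes the function is invoked with a CloudWatch alarm state change event delivered through EventBridge, and the pipeline name is a hypothetical placeholder. The optional `sagemaker_client` parameter is an addition for local testing, not part of the standard handler signature.

```python
PIPELINE_NAME = "example-retraining-pipeline"  # hypothetical pipeline name


def handler(event, context, sagemaker_client=None):
    """Start the retraining pipeline when the alarm transitions to ALARM."""
    # Assumed event shape: CloudWatch alarm state change delivered by EventBridge.
    state = event.get("detail", {}).get("state", {}).get("value")
    if state != "ALARM":
        return {"started": False}

    if sagemaker_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        sagemaker_client = boto3.client("sagemaker")

    response = sagemaker_client.start_pipeline_execution(PipelineName=PIPELINE_NAME)
    return {"started": True, "execution_arn": response["PipelineExecutionArn"]}
```

Ignoring non-ALARM transitions avoids starting a retraining run when the alarm merely returns to OK.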

Practice this question →

117

MCQeasy

Refer to the exhibit. A data engineer runs a SageMaker processing job that fails. What is the MOST likely cause of the failure?

A.The processing instance type is too small.

B.The processing job code has a bug.

C.The S3 bucket is in a different region.

D.The input file does not exist at the specified S3 path.

E.The IAM role does not have s3:GetObject permission.

AnswerD

Correct. The error directly points to a missing file or incorrect path.

Why this answer

The failure reason explicitly states the input file cannot be read and advises checking the path or file existence.

Practice this question →

118

MCQmedium

Refer to the exhibit. A team observes that their SageMaker endpoint scales out quickly when load increases, but scales in very slowly when load decreases, causing over-provisioning. What is the most likely cause?

A.TargetValue is too high

B.ScaleOutCooldown is too low

C.ScaleInCooldown is too high

D.Wrong predefined metric selected

AnswerC

A high ScaleInCooldown delays scale-in responses.

Why this answer

Option C is correct because the ScaleInCooldown is 600 seconds (10 minutes), meaning the system waits 10 minutes after a scale-in activity before triggering another scale-in action. This delay causes slow scale-in. Option B would affect scale-out speed.

Option A relates to the target value. Option D is incorrect because the metric is appropriate.
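A small sketch shows how the cooldown bounds scale-in speed. The exhibit is not reproduced here, so the target value, scale-out cooldown, and the shorter 120-second alternative are hypothetical; only the 600-second ScaleInCooldown comes from the explanation. The per-hour figure is a simplification that assumes every scale-in activity waits out the full cooldown.

```python
def target_tracking_config(target_value, scale_in_cooldown_s, scale_out_cooldown_s):
    """Build a target tracking configuration for a SageMaker endpoint variant."""
    return {
        "TargetValue": target_value,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": scale_in_cooldown_s,
        "ScaleOutCooldown": scale_out_cooldown_s,
    }


def max_scale_ins_per_hour(scale_in_cooldown_s):
    """Upper bound on scale-in activities per hour under the stated simplification."""
    return 3600 // scale_in_cooldown_s


slow = target_tracking_config(100.0, 600, 60)  # cooldown from the explanation
fast = target_tracking_config(100.0, 120, 60)  # hypothetical shorter cooldown

print(max_scale_ins_per_hour(slow["ScaleInCooldown"]))  # 6
print(max_scale_ins_per_hour(fast["ScaleInCooldown"]))  # 30
```

Lowering the cooldown trades faster cost recovery for a higher risk of removing capacity just before load returns.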

Practice this question →

119

MCQmedium

A healthcare company deploys a model that predicts patient readmission risk. The model is deployed using a SageMaker real-time endpoint with data capture enabled. The compliance team requires that all inference data be encrypted at rest in S3 using AWS KMS with a customer managed key. The team has configured the endpoint to use an IAM role that includes the necessary KMS permissions. However, after deployment, the captured data is not being written to the S3 bucket. The team checks the CloudWatch logs for the endpoint and finds no errors. The S3 bucket policy is as follows: { "Version": "2012-10-17", "Statement": [ { "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::my-bucket/*", "Condition": { "Bool": { "aws:SecureTransport": "false" } } } ] } The bucket also has a default KMS key. What is the MOST likely reason that the captured data is not being written?

A.The bucket policy includes an explicit deny that overrides any allow.

B.The bucket policy denies all PutObject requests because aws:SecureTransport is false.

C.The KMS key policy does not grant the SageMaker execution role the kms:GenerateDataKey permission.

D.The S3 bucket does not exist.

AnswerC

Even if the IAM role has KMS permissions, the key policy might not allow the role to use the key for encryption.

Why this answer

The correct answer is C because SageMaker data capture encrypts captured data at rest in S3 using server-side encryption with AWS KMS (SSE-KMS). When a customer managed KMS key is used, the SageMaker execution role must have the kms:GenerateDataKey permission to encrypt the data before writing it to S3. Even if the IAM role has other KMS permissions, without kms:GenerateDataKey, the data capture write operation fails silently, and CloudWatch logs may not show errors because the failure occurs at the KMS encryption step before the S3 PutObject call.

Exam trap

The trap here is that candidates focus on the S3 bucket policy's explicit Deny and assume it blocks all writes, but they overlook the condition key aws:SecureTransport, which makes the Deny only apply to non-HTTPS requests, and they miss the subtle KMS permission requirement for data capture encryption.

How to eliminate wrong answers

Option A is wrong because the bucket policy does not contain an explicit deny that overrides all allows; the Deny statement only applies when aws:SecureTransport is false, which is a condition that is not met (the request uses HTTPS). Option B is wrong because the bucket policy denies PutObject only when aws:SecureTransport is false, but SageMaker data capture uses HTTPS (SecureTransport is true), so the Deny does not apply. Option D is wrong because if the S3 bucket did not exist, SageMaker would log an error in CloudWatch logs (e.g., NoSuchBucket), but the question states no errors are found in the logs.
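The reasoning about the Deny statement can be checked with a toy evaluator. It handles only the single Bool condition on aws:SecureTransport from the policy in the question and is not a general IAM policy engine.

```python
# The bucket policy quoted in the question.
BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}


def deny_applies(statement, secure_transport):
    """Toy check: does this Deny statement apply to a request sent with or without TLS?"""
    expected = statement["Condition"]["Bool"]["aws:SecureTransport"]
    actual = "true" if secure_transport else "false"
    return statement["Effect"] == "Deny" and actual == expected


statement = BUCKET_POLICY["Statement"][0]
print(deny_applies(statement, secure_transport=True))   # False: HTTPS requests pass
print(deny_applies(statement, secure_transport=False))  # True: plain HTTP is denied
```

Because the Deny applies only to non-HTTPS requests, it cannot explain the missing captured data, which leaves the KMS permission as the likely cause.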

Practice this question →

120

MCQmedium

Refer to the exhibit. A data scientist tries to deploy a model from an S3 bucket encrypted with SSE-KMS. What should the administrator do to resolve this?

A.Change the model artifact encryption to SSE-S3.

B.Add kms:Decrypt permission to the SageMaker execution role for the KMS key.

C.Re-upload the model artifact without encryption.

D.Attach the AWS managed policy 'AmazonSageMakerFullAccess' to the role.

AnswerB

This directly addresses the missing permission.

Why this answer

The error indicates the execution role lacks kms:Decrypt permission on the KMS key used to encrypt the model artifact. Adding this permission resolves the issue.

Practice this question →

121

MCQeasy

A company wants to audit all API calls made to SageMaker endpoints for security compliance. Which AWS service should they enable?

A.AWS CloudTrail

B.Amazon GuardDuty

C.AWS Config

D.Amazon Macie

AnswerA

CloudTrail records API calls for auditing.

Why this answer

AWS CloudTrail records all API calls for auditing. GuardDuty is for threat detection, Macie for sensitive data discovery, and Config for configuration changes.

Practice this question →
