AWS Certified Machine Learning Engineer - Associate (MLA-C01): Questions 751–825

1000 questions total · 14 pages · All types, answers revealed

751

MCQ · medium

A company wants to automatically trigger a retraining pipeline when concept drift is detected in their deployed model. Which combination of services should they use?

A. SageMaker Model Monitor → Lambda

B. CloudWatch Events → SageMaker Training Job

C. SageMaker Model Monitor → CloudWatch Alarm → SNS → Lambda

D. SageMaker Clarify → SNS → Step Functions

Answer: C

This is the standard architecture for automated drift-based retraining.

Why this answer

Option C is correct because SageMaker Model Monitor detects concept drift by analyzing model predictions against a baseline, then publishes metrics to CloudWatch. A CloudWatch Alarm triggers when drift exceeds a threshold, sending a notification via SNS to invoke a Lambda function, which starts the retraining pipeline. This end-to-end integration ensures automated, event-driven retraining without manual intervention.

Exam trap

AWS often tests the distinction between monitoring services (Model Monitor for drift vs. Clarify for bias) and the correct event chain (Model Monitor → CloudWatch → SNS → Lambda) versus incomplete chains like direct Lambda invocation or using the wrong service for drift detection.

How to eliminate wrong answers

Option A is wrong because SageMaker Model Monitor alone cannot directly invoke Lambda; it requires CloudWatch Alarms and SNS to bridge the monitoring output to Lambda execution. Option B is wrong because CloudWatch Events (now EventBridge) can trigger SageMaker Training Jobs, but it lacks the concept drift detection capability provided by Model Monitor, so it cannot determine when retraining is needed. Option D is wrong because SageMaker Clarify is designed for bias detection and explainability, not concept drift monitoring; using SNS and Step Functions without drift detection would not trigger retraining based on model performance degradation.
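
The last hop of the chain in Option C can be sketched as a Lambda handler. This is a minimal sketch, not a reference implementation: the pipeline name `retraining-pipeline` is hypothetical, and the optional `sm_client` argument is an assumption added so the function can be exercised with a stub instead of a real AWS client.

```python
import json

PIPELINE_NAME = "retraining-pipeline"  # hypothetical pipeline name


def handler(event, context, sm_client=None):
    """Start a SageMaker pipeline run for each ALARM notification in an SNS event."""
    if sm_client is None:
        import boto3  # imported lazily so the handler can be tested with a stub
        sm_client = boto3.client("sagemaker")

    started = []
    for record in event.get("Records", []):
        # SNS delivers the CloudWatch alarm as a JSON string in Sns.Message.
        alarm = json.loads(record["Sns"]["Message"])
        if alarm.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        response = sm_client.start_pipeline_execution(PipelineName=PIPELINE_NAME)
        started.append(response["PipelineExecutionArn"])
    return {"started": started}
```

Filtering on `NewStateValue` keeps the pipeline from being restarted when the alarm returns to the OK state.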

752

MCQ · medium

A company uses SageMaker endpoints with auto-scaling. The endpoint is experiencing high latency during peak hours. The metrics show CPU utilization is low but memory is high. What is the most likely cause?

A. The model is not optimized for inference, causing memory leaks.

B. The auto-scaling policy is based on CPU utilization, which does not trigger scaling.

C. The instance type has insufficient network bandwidth.

D. The endpoint is deployed in a VPC without a NAT gateway.

Answer: B

CPU utilization stays low, so the CPU-based policy never triggers scaling, while high memory usage shows that more instances are needed.

Why this answer

Option B is correct because the auto-scaling policy is based on CPU utilization, which remains low during the memory-bound issue. Since the scaling trigger is not met, the endpoint does not add more instances to handle the increased load, leading to high latency. Memory pressure without CPU spikes indicates the bottleneck is memory, not compute, so a CPU-based metric fails to scale appropriately.

Exam trap

The trap here is that candidates assume high latency always means CPU is the bottleneck, but the exam tests understanding that auto-scaling must be based on the correct metric; memory pressure can cause latency without CPU spikes, and a CPU-based policy will fail to scale.

How to eliminate wrong answers

Option A is wrong because a memory leak would cause memory to increase over time, not specifically during peak hours, and would likely degrade performance gradually rather than cause latency spikes tied to load. Option C is wrong because insufficient network bandwidth would manifest as network-related errors or timeouts, not high memory utilization with low CPU; network metrics would show saturation. Option D is wrong because a VPC without a NAT gateway affects outbound internet access, not inbound inference requests to the endpoint; SageMaker endpoints in a VPC can receive traffic via VPC endpoints or public endpoints without a NAT gateway.
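
One way to act on this diagnosis is to track memory instead of CPU. The sketch below only builds the arguments for Application Auto Scaling's `put_scaling_policy`; the policy name, the 70 percent target, and the endpoint and variant names in the usage note are assumptions chosen for illustration.

```python
def memory_scaling_policy(endpoint_name, variant_name, target_percent=70.0):
    """Build put_scaling_policy arguments that track endpoint memory, not CPU."""
    return {
        "PolicyName": f"{endpoint_name}-memory-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_percent,  # assumed target, tune per workload
            "CustomizedMetricSpecification": {
                "MetricName": "MemoryUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                    {"Name": "VariantName", "Value": variant_name},
                ],
                "Statistic": "Average",
            },
        },
    }
```

After the variant has been registered as a scalable target, the result could be passed as `boto3.client("application-autoscaling").put_scaling_policy(**memory_scaling_policy("my-endpoint", "AllTraffic"))`.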

753

Multi-Select · easy

Which TWO actions are recommended best practices when preparing training data for a machine learning model in AWS? (Choose two.)

Select 2 answers

A. Remove all outliers from the dataset.

B. Train the model on the entire dataset to maximize data usage.

C. Check for and handle missing values appropriately.

D. Split the data into training, validation, and test sets.

E. Always normalize all features to a [0,1] range.

Answers: C, D

Missing values can cause errors or bias if not addressed.

Why this answer

Option C is correct because missing values can introduce bias or cause algorithms to fail, so handling them (e.g., via imputation or removal) is a critical data preparation step in AWS SageMaker. Option D is correct because splitting data into training, validation, and test sets allows you to evaluate model performance on unseen data and prevent overfitting, which is a standard practice in SageMaker's built-in algorithms and training jobs.

Exam trap

The trap here is that candidates assume all outliers must be removed (Option A) or that normalization is always required (Option E), but the exam tests nuanced understanding that these steps depend on the algorithm and data characteristics, not blanket rules.
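
Both correct options can be combined in a few lines. This is a minimal sketch with made-up data: the `age` column, the 70/15/15 split, and median imputation are assumptions, and the fill value is learned from the training split only so that no information leaks from the validation or test sets.

```python
import random
from statistics import median


def split_rows(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle and split rows into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def fill_missing(rows, column, fill_value):
    """Return copies of rows with None in `column` replaced by fill_value."""
    return [{**r, column: fill_value if r[column] is None else r[column]}
            for r in rows]


# Illustrative data with two missing values.
rows = [{"age": a} for a in [25, None, 40, 31, None, 58, 47, 36, 29, 52]]
train, val, test = split_rows(rows)

# Learn the fill value from the training split only, then reuse it everywhere.
train_median = median(r["age"] for r in train if r["age"] is not None)
train, val, test = (fill_missing(part, "age", train_median)
                    for part in (train, val, test))
```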

754

MCQ · medium

A team is using SageMaker Pipelines to automate retraining and deployment. They want to trigger the pipeline automatically when new training data is available in an S3 bucket. Which approach should they use?

A. Create an Amazon EventBridge rule that triggers the pipeline execution on S3 PutObject events

B. Register the pipeline as a model package in SageMaker Model Registry

C. Configure a cron job to run the pipeline every hour

D. Use AWS Step Functions to poll the S3 bucket and start the pipeline when a new object appears

Answer: A

EventBridge can detect S3 events and start pipeline executions.

Why this answer

Option A is correct because Amazon EventBridge can directly capture S3 PutObject events and invoke a SageMaker Pipeline execution as a target. This provides a fully event-driven, serverless integration without polling or manual intervention, aligning with best practices for automating ML workflows when new data arrives.

Exam trap

The trap here is that candidates may overcomplicate the solution by choosing Step Functions (Option D) for orchestration, not realizing that EventBridge provides a simpler, event-driven trigger without the need for polling or additional state machines.

How to eliminate wrong answers

Option B is wrong because registering a pipeline as a model package in SageMaker Model Registry is for versioning and managing trained models, not for triggering pipeline executions based on S3 events. Option C is wrong because a cron job runs on a fixed schedule, which is inefficient and may miss data arrivals or run unnecessarily, whereas the requirement is to trigger only when new data appears. Option D is wrong because using AWS Step Functions to poll S3 introduces latency, cost, and complexity compared to the native event-driven approach with EventBridge, which reacts instantly to S3 events.
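
The EventBridge rule from Option A could look roughly like this. It is a sketch under stated assumptions: it uses the "Object Created" events that S3 sends to EventBridge once EventBridge notifications are enabled on the bucket (rather than CloudTrail-based PutObject events), and the bucket name, key prefix, target ID, and ARNs are hypothetical.

```python
import json


def s3_object_created_pattern(bucket, prefix):
    """Event pattern matching new objects under a prefix in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket]},
            "object": {"key": [{"prefix": prefix}]},
        },
    }


def pipeline_target(pipeline_arn, role_arn):
    """Target entry that starts a SageMaker pipeline when the rule matches."""
    return {
        "Id": "start-retraining-pipeline",  # hypothetical target ID
        "Arn": pipeline_arn,
        "RoleArn": role_arn,  # role EventBridge assumes to start the pipeline
    }


pattern = s3_object_created_pattern("training-data-bucket", "incoming/")
print(json.dumps(pattern, indent=2))
```

The pattern would be passed as `EventPattern=json.dumps(pattern)` to `put_rule`, and the target as an element of `Targets` in `put_targets`.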

755

MCQ · hard

A data scientist is preparing a dataset for a regression model that predicts house prices. The dataset includes a `neighborhood` feature with 500 distinct categories. The data scientist wants to encode this feature without increasing dimensionality too much and while capturing the target relationship. Which encoding technique should be used?

A. Target encoding (mean encoding)

B. One-hot encoding

C. Frequency encoding

D. Label encoding

Answer: A

Target encoding captures target relationship with low dimensionality.

Why this answer

Target encoding (mean encoding) is the correct choice because it replaces each of the 500 neighborhood categories with the mean of the target variable (house price) for that category. This captures the relationship between the neighborhood and the target while adding only one new feature column, thus avoiding the massive dimensionality explosion that would occur with one-hot encoding (which would create 500 binary columns).

Exam trap

AWS often tests the trade-off between dimensionality and information retention, and the trap here is that candidates may choose one-hot encoding out of habit, failing to recognize that 500 categories make it impractical, or choose label encoding because it seems simple, ignoring the ordinal assumption it imposes.

How to eliminate wrong answers

Option B (One-hot encoding) is wrong because it would create 500 binary columns, drastically increasing dimensionality and leading to the curse of dimensionality, sparsity, and overfitting. Option C (Frequency encoding) is wrong because it replaces categories with their count/frequency, which does not capture the relationship with the target variable (house price) and loses predictive signal. Option D (Label encoding) is wrong because it assigns arbitrary integer labels (e.g., 1, 2, 3) that imply an ordinal relationship, which is inappropriate for a nominal feature like neighborhood and can mislead the regression model into assuming a false order.
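
Target encoding itself fits in a short function. The sketch below is illustrative only: the `smoothing` parameter (which pulls rare categories toward the global mean to limit overfitting) is an addition not mentioned in the question, and the prices are made up. In practice the encoding should be fitted on the training split only.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target values seen for it."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1
    encoding = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, default):
    """Replace categories by their encoded value; unseen ones get `default`."""
    return [encoding.get(c, default) for c in categories]


neighborhoods = ["A", "A", "B"]
prices = [100.0, 200.0, 400.0]
# smoothing=0 gives the plain per-category mean described in the explanation.
encoding, global_mean = fit_target_encoding(neighborhoods, prices, smoothing=0.0)
encoded = apply_target_encoding(["A", "B", "C"], encoding, default=global_mean)
```

The single output column replaces what would otherwise be 500 one-hot columns.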

756

MCQ · medium

A company wants to deploy a single model that processes images from a production line. The images are uploaded to an S3 bucket every few minutes, and the inference results must be stored back to S3. The team wants to avoid paying for idle compute and prefers a fully managed, on-demand solution. Which SageMaker inference option should they use?

A. SageMaker batch transform

B. SageMaker asynchronous inference

C. SageMaker real-time endpoint with auto scaling

D. SageMaker serverless inference

Answer: B

Asynchronous inference is ideal for near-real-time, event-driven workloads with S3 input/output and scales to zero when idle.

Why this answer

Asynchronous inference is designed for this use case: it processes images from S3 input, writes results to S3 output, scales to zero when idle, and is fully managed. Real-time endpoints are always running and incur cost when idle. Batch transform is not event-driven.

Serverless inference is event-driven but has a payload limit and cold start that may not be suitable for image payloads.
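
What distinguishes an asynchronous endpoint is the `AsyncInferenceConfig` block in its endpoint configuration. The following sketch only assembles the arguments for `create_endpoint_config`; the variant name, instance type, and concurrency value are assumptions for illustration.

```python
def async_endpoint_config(config_name, model_name, output_s3_uri,
                          instance_type="ml.g4dn.xlarge"):
    """Build create_endpoint_config arguments for an asynchronous endpoint."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,  # assumed instance type
            "InitialInstanceCount": 1,
        }],
        "AsyncInferenceConfig": {
            # Results are written here instead of being returned inline.
            "OutputConfig": {"S3OutputPath": output_s3_uri},
            "ClientConfig": {"MaxConcurrentInvocationsPerInstance": 4},
        },
    }
```

Requests are then submitted with `invoke_endpoint_async(EndpointName=..., InputLocation="s3://...")` on the `sagemaker-runtime` client. Scaling to zero additionally depends on an auto scaling configuration whose minimum capacity is 0.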

757

MCQ · easy

A data scientist is working on a time series forecasting problem. The dataset contains a column 'sales' with occasional negative values due to returns. The model expects non-negative input. Which data preparation step should be taken?

A. Clip negative sales values to zero

B. Apply log transformation after adding a constant

C. Remove all rows with negative sales values

D. Impute negative values with the mean

Answer: A

Clipping sets returns to zero, which is appropriate for sales data.

Why this answer

Option A is correct because clipping negative sales values to zero directly addresses the model's requirement for non-negative input while preserving the data's temporal structure. This approach is appropriate for time series forecasting where returns cause occasional negative values, as it treats returns as zero sales rather than removing or distorting the data points.

Exam trap

AWS often tests the misconception that removing or imputing negative values is safe in time series, but the trap here is that these actions break temporal dependencies and introduce bias, whereas clipping preserves the sequence structure.

How to eliminate wrong answers

Option B is wrong because applying a log transformation after adding a constant does not guarantee non-negative values; it only compresses the scale and can introduce bias, especially with negative values that require arbitrary shifting. Option C is wrong because removing all rows with negative sales values disrupts the time series continuity and can lead to loss of important temporal patterns, such as seasonality or trends. Option D is wrong because imputing negative values with the mean introduces statistical bias and distorts the underlying distribution, which is particularly problematic in time series where data points are sequentially dependent.
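
Clipping is a one-line transformation that leaves the number and order of time steps unchanged. A minimal sketch with made-up daily sales values:

```python
def clip_non_negative(series):
    """Replace negative values with 0.0 while keeping every time step."""
    return [max(0.0, float(v)) for v in series]


daily_sales = [120.0, 95.5, -30.0, 110.0, -5.0, 140.0]  # illustrative values
clipped = clip_non_negative(daily_sales)
# → [120.0, 95.5, 0.0, 110.0, 0.0, 140.0]
```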

758

MCQ · easy

Refer to the exhibit. A user is unable to invoke a SageMaker endpoint. The IAM policy shown is attached to the user. Which permission is missing to allow invocation?

A. sagemaker:InvokeEndpoint

B. sagemaker:DescribeEndpoint

C. sagemaker:CreateEndpoint

D. sagemaker:ListEndpoints

Answer: A

InvokeEndpoint is required to send inference requests.

Why this answer

To invoke a SageMaker endpoint, the user needs the `sagemaker:InvokeEndpoint` permission. The IAM policy shown lacks this action, which is required for making real-time inference requests to the endpoint. Without it, any attempt to call the endpoint via the SDK or CLI will fail with an access denied error.

Exam trap

AWS often tests the distinction between read-only permissions (like `DescribeEndpoint` or `ListEndpoints`) and the specific action required to perform an operation, leading candidates to confuse metadata access with actual invocation capability.

How to eliminate wrong answers

Option B is wrong because `sagemaker:DescribeEndpoint` only allows retrieving metadata about an endpoint, not invoking it for inference. Option C is wrong because `sagemaker:CreateEndpoint` is for creating new endpoints, not for sending inference requests to an existing one. Option D is wrong because `sagemaker:ListEndpoints` only lists endpoints in the account, which does not grant the ability to invoke them.
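
The exhibit is not reproduced here, so the following is not the policy from the question. It is a sketch of a policy statement that would grant the missing action, with a hypothetical region, account ID, and endpoint name.

```python
import json


def invoke_endpoint_policy(region, account_id, endpoint_name):
    """Identity policy allowing inference calls to one specific endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": (f"arn:aws:sagemaker:{region}:{account_id}"
                         f":endpoint/{endpoint_name}"),
        }],
    }


policy = invoke_endpoint_policy("us-east-1", "123456789012", "my-endpoint")
print(json.dumps(policy, indent=2))
```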

759

MCQ · hard

A company uses Amazon SageMaker Feature Store to store features for a real-time recommendation model. The feature data is updated continuously, and the model must use the most recent feature values for each user at inference time. Which type of Feature Store should the company use for serving features to the model?

A. Offline store

B. Point-in-time queries

C. Online store

D. Both online and offline store

Answer: C

Online store provides low-latency reads for real-time inference with latest feature values.

Why this answer

Online store provides low-latency access to the latest feature values for real-time inference. Offline store is for batch training and analytics. Point-in-time queries are for historical retrieval, not real-time serving.
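
Reading from the online store at inference time is a single `get_record` call on the Feature Store runtime client. In this sketch the feature group name and record ID are supplied by the caller, and the optional `client` argument is an assumption added so the function can be tried with a stub.

```python
def latest_features(feature_group, record_id, client=None):
    """Return the most recent feature values for one record as a dict."""
    if client is None:
        import boto3  # imported lazily so the function can be tested with a stub
        client = boto3.client("sagemaker-featurestore-runtime")
    response = client.get_record(
        FeatureGroupName=feature_group,
        RecordIdentifierValueAsString=record_id,
    )
    # "Record" is absent when no record exists for the identifier.
    return {f["FeatureName"]: f["ValueAsString"]
            for f in response.get("Record", [])}
```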

760

MCQ · medium

A company uses Amazon SageMaker Data Wrangler to create a data flow for a classification model. The dataset contains a high-cardinality categorical feature 'product_id' with 50,000 unique values. The data scientist wants to reduce dimensionality while preserving predictive power. Which approach is most effective?

A. Apply one-hot encoding to the 'product_id' column.

B. Perform target encoding by replacing each product ID with the average target value for that product.

C. Use feature hashing to map product IDs to a fixed number of buckets (e.g., 100).

D. Drop the 'product_id' column entirely.

Answer: B

Target encoding condenses information into a single numerical feature while retaining predictive signals.

Why this answer

Target encoding is the most effective approach for high-cardinality categorical features because it replaces each category with the mean of the target variable, preserving predictive signal while drastically reducing dimensionality. In SageMaker Data Wrangler, this can be implemented as a step in the data flow, for example with a custom transform, which avoids the explosion of features caused by one-hot encoding and retains the relationship between product IDs and the target.

Exam trap

AWS often tests the misconception that feature hashing is always safe for high-cardinality features, but the trap here is that hash collisions can degrade model performance, making target encoding a better choice when the target variable is available and predictive.

How to eliminate wrong answers

Option A is wrong because one-hot encoding on a feature with 50,000 unique values would create 50,000 binary columns, leading to extreme dimensionality and sparsity, which degrades model performance and increases computational cost. Option C is wrong because feature hashing maps product IDs to a fixed number of buckets (e.g., 100), which can cause hash collisions and loss of information, reducing predictive power compared to target encoding. Option D is wrong because dropping the column entirely discards all predictive information contained in the product IDs, which is likely to harm model accuracy.
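
The collision problem with Option C follows from simple counting: 50,000 IDs in 100 buckets means an average of 500 IDs per bucket. The sketch below makes that concrete; the ID format and the use of MD5 as the hash function are assumptions for illustration.

```python
import hashlib
from collections import Counter


def hash_bucket(value, n_buckets):
    """Map a string to one of n_buckets using a stable hash."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


product_ids = [f"product-{i}" for i in range(50_000)]  # synthetic IDs
bucket_sizes = Counter(hash_bucket(p, 100) for p in product_ids)

# Every bucket value is shared by many unrelated products.
print("largest bucket:", max(bucket_sizes.values()))
print("smallest bucket:", min(bucket_sizes.values()))
```

All products in a bucket become indistinguishable to the model, which is the loss of information the explanation refers to.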

761

MCQ · hard

A financial services company deploys a credit risk model using an Amazon SageMaker endpoint with data capture enabled. The model uses a custom container. The compliance team requires that all inference requests and responses are logged to an S3 bucket with server-side encryption using AWS KMS. The IAM role for the endpoint has the following policy. What must be added to meet the compliance requirement?

A. Add kms:GenerateDataKey and kms:Decrypt permissions to the IAM role.

B. Add s3:PutObjectAcl permission to the IAM role.

C. Enable S3 default encryption on the bucket.

D. Modify the container to handle encryption internally.

Answer: A

These permissions are necessary to write to a KMS-encrypted bucket.

Why this answer

The correct answer is A because the endpoint's IAM role needs permission to use the KMS key when data capture writes to the bucket: kms:GenerateDataKey to obtain the data key that encrypts each captured object, and kms:Decrypt, which S3 requires for multipart uploads to a KMS-encrypted bucket. Without these, the endpoint cannot use the customer-managed KMS key for server-side encryption, even if the bucket policy allows it.

Exam trap

The trap here is that candidates often assume enabling S3 default encryption (Option C) is sufficient, but SageMaker data capture requires explicit KMS permissions in the endpoint's IAM role to use the customer-managed key.

How to eliminate wrong answers

Option B is wrong because s3:PutObjectAcl is not required for server-side encryption with KMS; it is used for managing object-level access control lists, not encryption. Option C is wrong because enabling S3 default encryption on the bucket does not satisfy the requirement for server-side encryption using AWS KMS for data captured by SageMaker; the endpoint must explicitly use the KMS key via the IAM role. Option D is wrong because modifying the container to handle encryption internally would bypass the managed data capture feature and is not necessary; SageMaker data capture already supports KMS encryption natively.
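
The two pieces involved can be sketched side by side: the `DataCaptureConfig` that names the KMS key, and the statement added to the endpoint role. The policy from the question is not reproduced in the source, so this is a generic sketch; the S3 URI and key ARN are hypothetical, and 100 percent sampling is an assumption based on the "all requests" requirement.

```python
def data_capture_config(destination_s3_uri, kms_key_arn):
    """DataCaptureConfig block for create_endpoint_config with KMS encryption."""
    return {
        "EnableCapture": True,
        "InitialSamplingPercentage": 100,  # capture every request and response
        "DestinationS3Uri": destination_s3_uri,
        "KmsKeyId": kms_key_arn,
        "CaptureOptions": [{"CaptureMode": "Input"}, {"CaptureMode": "Output"}],
    }


def kms_statement(kms_key_arn):
    """IAM statement letting the endpoint role use the customer-managed key."""
    return {
        "Effect": "Allow",
        "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
        "Resource": kms_key_arn,
    }


key_arn = "arn:aws:kms:us-east-1:123456789012:key/example-key-id"  # hypothetical
capture = data_capture_config("s3://compliance-bucket/capture/", key_arn)
statement = kms_statement(key_arn)
```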

762

Multi-Select · medium

A machine learning team needs to monitor a deployed model for both data drift and concept drift. Which TWO approaches should they implement? (Select TWO.)

Select 2 answers

A. Set up SageMaker Model Monitor for data quality monitoring

B. Use SageMaker Clarify for bias monitoring

C. Configure CloudWatch Logs Insights to query inference logs

D. Set up SageMaker Model Monitor for model quality monitoring

E. Enable SageMaker Debugger during inference

Answers: A, D

Data quality monitoring detects drift in input features.

Why this answer

SageMaker Model Monitor can be configured for data quality (data drift) and model quality (concept drift) monitoring. Data drift monitors input distribution changes, while model quality monitors prediction accuracy against ground truth.

763

Multi-Select · medium

An MLOps team is designing a CI/CD pipeline for deploying machine learning models to production on Amazon SageMaker. They want to ensure that the deployment process is automated and that models are automatically rolled back if performance degrades. Which of the following AWS services or features should they use to achieve this? (Choose THREE.)

Select 3 answers

A. Amazon SageMaker Model Registry

B. Amazon SageMaker Ground Truth

C. Amazon CloudWatch

D. Amazon SageMaker Pipelines

E. AWS CloudTrail

Answers: A, C, D

Model Registry manages model versions and approvals.

Why this answer

Amazon SageMaker Model Registry is correct because it provides a centralized catalog for managing, versioning, and approving ML models. It enables automated deployment by triggering CI/CD pipelines when a model version is approved, and supports automatic rollback by allowing you to revert to a previous approved version if performance degrades, as detected by monitoring metrics.

Exam trap

The trap here is that candidates may confuse SageMaker Ground Truth (a data labeling service) or CloudTrail (an auditing service) with the core MLOps components needed for automated deployment and rollback, overlooking that Model Registry, Pipelines, and CloudWatch are the precise services that form the CI/CD and monitoring backbone.

764

MCQ · medium

A healthcare company is developing a predictive model to identify patients at risk of readmission within 30 days after discharge. The dataset contains electronic health record (EHR) data from multiple hospitals, stored as Parquet files in Amazon S3. The data includes patient demographics, diagnoses (ICD-10 codes), medications, lab results, and length of stay. A data scientist notices that the 'lab_result' column has a high number of null values (over 60%) because some tests are not applicable to all patients. Additionally, the 'diagnosis_code' column has over 10,000 unique ICD-10 codes. The company wants to build a model that complies with HIPAA and performs well. The data scientist must prepare the features efficiently using AWS services. Which combination of steps should the data scientist take? (Assume the company can use any AWS service.)

A. Use AWS Glue ETL to impute missing lab results with a value predicted from other features using a model like XGBoost, and apply count encoding to diagnosis codes based on their frequency of occurrence.

B. Replace missing lab results with the overall mean, and use a binary flag for nullness. For diagnosis codes, apply one-hot encoding after grouping codes into 20 categories based on clinical relevance.

C. Drop all records where lab_result is null, and use one-hot encoding for diagnosis codes.

D. Use Amazon SageMaker Data Wrangler's built-in 'Fill missing' with KNN imputation for lab results, and apply ordinal encoding to diagnosis codes based on the order of ICD-10 chapters.

Answer: A

Predictive imputation leverages other features to estimate missing values, retaining data. Count encoding reduces the cardinality of diagnosis codes.

Why this answer

Option A is correct because it uses AWS Glue ETL to impute missing lab results with a predictive model (XGBoost), which is appropriate for high missingness (>60%) where simple imputation would bias the model, and applies count encoding to the high-cardinality diagnosis codes (10,000+ unique values) to avoid the dimensionality explosion of one-hot encoding while preserving frequency information. This approach balances HIPAA compliance (data stays within AWS) with model performance.

Exam trap

The trap here is that candidates often choose simple mean imputation (Option B) or dropping rows (Option C) without considering the impact of high missingness on bias and data loss, or they overcomplicate encoding (Option D) without recognizing that ordinal encoding implies a false order for categorical codes.

How to eliminate wrong answers

Option B is wrong because replacing 60%+ missing lab results with the overall mean ignores the non-random missingness (tests not applicable to all patients) and introduces severe bias, and grouping 10,000+ ICD-10 codes into only 20 categories based on clinical relevance loses granularity and may not reflect readmission risk patterns. Option C is wrong because dropping all records with null lab results would discard over 60% of the data, leading to massive data loss and a non-representative dataset, and one-hot encoding 10,000+ diagnosis codes creates an unmanageable feature space (sparse matrix) that degrades model performance. Option D is wrong because KNN imputation on a dataset with >60% missingness in the same column is computationally expensive and unreliable (neighbors themselves may have missing values), and ordinal encoding based on ICD-10 chapter order imposes an arbitrary ordinal relationship that does not reflect clinical risk or readmission likelihood.

765

MCQ · easy

An ML engineer runs the CLI command shown in the exhibit. However, the training job fails immediately with an error: 'Unable to assume role'. What is the most likely cause?

A. The IAM role 'SageMakerExecutionRole' does not have permission to create the training job.

B. The training image in ECR does not exist.

C. The S3 bucket 'my-bucket' does not exist.

D. The IAM role's trust policy does not grant SageMaker permission to assume the role.

Answer: D

Without proper trust policy, SageMaker cannot assume the role, causing immediate failure.

Why this answer

The 'Unable to assume role' error indicates that SageMaker cannot assume the IAM role specified in the CLI command. This is a trust policy issue: the role's trust policy must include SageMaker as a trusted service (i.e., `"Service": "sagemaker.amazonaws.com"`). Without this, SageMaker is not authorized to assume the role, regardless of the role's permissions.

Exam trap

AWS often tests the distinction between IAM role permissions (what the role can do) and trust policies (who can assume the role), leading candidates to mistakenly select a permission-related option when the error is about trust.

How to eliminate wrong answers

Option A is wrong because the error is about assuming the role, not about the role's permissions to create the training job; permission errors would appear as 'AccessDenied' or similar, not 'Unable to assume role'. Option B is wrong because a missing ECR image would cause an error like 'Image not found' or 'RepositoryNotFoundException', not an assume role error. Option C is wrong because a non-existent S3 bucket would result in an error like 'NoSuchBucket' or 'AccessDenied' when SageMaker tries to access it, not an assume role failure.
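
For reference, a trust policy that lets SageMaker assume the role has the following shape. The role in the exhibit is not reproduced here, so this is a generic sketch rather than the corrected version of that role.

```python
import json

# Trust policy: states WHO may assume the role (here, the SageMaker service).
# The role's permission policies, which state WHAT it may do, are separate.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

print(json.dumps(TRUST_POLICY, indent=2))
```

The JSON string could be supplied as `AssumeRolePolicyDocument` when creating the role, or via `update_assume_role_policy` on an existing one.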

766

Multi-Select · easy

A company is using Amazon SageMaker to deploy a model for real-time inference. The model requires access to a private S3 bucket that contains reference data. The company wants to ensure that the endpoint can access the S3 bucket without using a public internet connection. Which TWO actions should they take? (Select TWO.)

Select 2 answers

A. Configure the endpoint's security group to allow outbound traffic to the S3 bucket's IP range.

B. Attach the endpoint to a VPC that has a VPC endpoint for S3.

C. Ensure the SageMaker execution role has an IAM policy that grants s3:GetObject access to the bucket.

D. Attach the endpoint to a VPC with an internet gateway and route the S3 traffic through the internet gateway.

E. Attach the endpoint to a VPC with a NAT gateway to route traffic to S3.

Answers: B, C

VPC endpoints allow private connectivity to S3 without internet.

Why this answer

Option B is correct because attaching the SageMaker endpoint to a VPC with a VPC endpoint for S3 (Gateway type) allows the endpoint to access the S3 bucket using AWS's private network, bypassing the public internet. This ensures traffic stays within the AWS backbone, meeting the requirement for no public internet connection. Option C is also correct because the SageMaker execution role must have an IAM policy with s3:GetObject permissions to authorize the read access to the private S3 bucket, which is a prerequisite for any S3 operation.

Exam trap

The trap here is that candidates often confuse VPC endpoints (which keep traffic private) with NAT gateways or internet gateways (which route traffic over the public internet), and they may overlook the mandatory IAM permissions required for S3 access even when using a VPC endpoint.
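
The gateway endpoint from Option B is created with a single EC2 API call. This sketch only builds the arguments for `create_vpc_endpoint`; the VPC ID, region, and route table ID shown are hypothetical.

```python
def s3_gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Arguments for ec2.create_vpc_endpoint that add a private route to S3."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Routes to S3 are added to these tables; no NAT or internet gateway needed.
        "RouteTableIds": list(route_table_ids),
    }


params = s3_gateway_endpoint_params("vpc-0abc1234", "us-east-1", ["rtb-0def5678"])
```

The result could be passed as `boto3.client("ec2").create_vpc_endpoint(**params)`. The execution role still needs s3:GetObject on the bucket, as Option C states.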

767

MCQ · hard

A team is deploying a model that requires low-latency inference for real-time predictions. They are using a SageMaker endpoint with a single instance. During testing, they observe high latency. Which change would most effectively reduce latency?

A. Use a multi-model endpoint

B. Add Elastic Inference

C. Enable SageMaker Batch Transform

D. Switch to a larger instance type

Answer: D

Larger instances provide more CPU or GPU capacity for faster inference.

Why this answer

Switching to a larger instance type (Option D) directly increases the compute and memory resources available to the SageMaker endpoint, which reduces inference latency by allowing the model to process requests faster. Since the team is using a single instance, scaling up is the most straightforward way to handle the computational load and meet real-time latency requirements.

Exam trap

The trap here is that candidates often confuse scaling up (larger instance) with scaling out (multiple instances) or assume that Elastic Inference always reduces latency, but Elastic Inference adds network latency and is better for cost savings on large models, not for minimizing per-request latency.

How to eliminate wrong answers

Option A is wrong because a multi-model endpoint is designed to host multiple models on a single instance to improve resource utilization, not to reduce latency for a single model; it can actually increase latency due to model loading and unloading overhead. Option B is wrong because Elastic Inference attaches a separate accelerator for deep learning inference, but it adds network round-trip time between the instance and the accelerator, which can increase latency for real-time predictions, especially for small models or low-latency requirements. Option C is wrong because SageMaker Batch Transform is an asynchronous batch processing service that processes large datasets offline, not suitable for real-time, low-latency predictions.

768

MCQ · medium

A data engineer is using Amazon SageMaker Ground Truth to create a labeled dataset for an object detection task. The dataset contains millions of images, and the labeling budget is limited. Which approach can reduce labeling costs while maintaining high model accuracy?

A. Enable active learning in Ground Truth to automatically select a subset of images for human labeling

B. Label 100% of the images using a pre-built worker template to ensure accuracy

C. Use Amazon SageMaker Data Wrangler to annotate images

D. Use automated labeling with a pre-trained model for all images and skip human review

Answer: A

Active learning iteratively selects the most valuable samples for human labeling, reducing cost while maintaining model performance.

Why this answer

Active learning in Ground Truth selects the most informative samples (e.g., uncertain predictions) for human labeling, reducing the number of labels needed while maximizing model improvement. This is a built-in feature.

769

MCQ · medium

A data scientist trains a neural network on SageMaker using the TensorFlow framework. The training accuracy is lower than expected, and the scientist suspects vanishing gradients. How can the scientist leverage SageMaker Debugger to diagnose this?

A. Increase the number of training epochs to allow gradients to propagate.

B. Export model summaries to TensorBoard for manual inspection.

C. Reduce the learning rate to prevent gradient explosion.

D. Use a built-in Debugger rule to monitor gradient magnitudes during training.

Answer: D

Built-in rules like VanishingGradient can detect and alert when gradients become too small.

Why this answer

SageMaker Debugger provides built-in rules that automatically monitor tensors such as gradients during training. By enabling the `VanishingGradient` rule, the data scientist is alerted when gradient magnitudes fall below a threshold, which diagnoses the vanishing gradient problem directly without manual inspection or code changes.

Exam trap

AWS often tests the distinction between vanishing and exploding gradients, expecting candidates to know that reducing the learning rate addresses exploding gradients, while monitoring gradients with Debugger is the correct diagnostic step for vanishing gradients.

How to eliminate wrong answers

Option A is wrong because increasing epochs does not address the root cause of vanishing gradients; it merely extends training time without fixing gradient propagation. Option B is wrong because exporting model summaries to TensorBoard requires manual setup and does not provide automated monitoring or alerts for gradient magnitudes during training. Option C is wrong because reducing the learning rate helps prevent gradient explosion (large gradients), not vanishing gradients (small gradients), and may even exacerbate the vanishing problem by further reducing weight updates.


770

MCQmedium

A company wants to deploy a PyTorch model on SageMaker using the NVIDIA Triton Inference Server for GPU acceleration. They have an existing Triton configuration. Which approach should they take?

A.Use SageMaker Neo to compile the model for Triton

B.Package Triton as a custom container and use SageMaker batch transform

C.Use the SageMaker Triton Inference Server container from the Deep Learning Containers

D.Use the standard SageMaker PyTorch container and install Triton at runtime

AnswerC

The SageMaker Triton DLC is pre-configured for Triton and supports PyTorch models.

Why this answer

Option C is correct because AWS provides a pre-built SageMaker Triton Inference Server container as part of the Deep Learning Containers (DLCs), which is optimized for GPU acceleration and supports the existing Triton configuration without modification. This container integrates directly with SageMaker hosting endpoints, enabling seamless deployment of PyTorch models with Triton's features like dynamic batching and model concurrency.

Exam trap

The trap here is that candidates may assume SageMaker Neo is a universal compilation tool for any inference server, but Neo is specifically for hardware-specific optimization and does not support Triton's runtime environment, leading them to incorrectly select Option A.

How to eliminate wrong answers

Option A is wrong because SageMaker Neo compiles models for specific hardware targets (e.g., Intel, ARM) and does not support compilation for the NVIDIA Triton Inference Server; Neo is designed for edge devices and does not integrate with Triton's serving architecture. Option B is wrong because while packaging Triton as a custom container is possible, using SageMaker batch transform is not the recommended approach for real-time inference with GPU acceleration; batch transform is for offline, asynchronous processing, not for low-latency serving. Option D is wrong because installing Triton at runtime on the standard PyTorch container is inefficient and error-prone; it adds startup latency, may cause dependency conflicts, and bypasses the pre-optimized, tested Triton container that AWS provides.


771

MCQmedium

A team uses AWS Auto Scaling for a SageMaker real-time endpoint. They notice that when scaling in, the latest instance is always terminated first, causing disruption to recent requests. How can they configure the scaling policy to terminate the oldest instance first?

A.Configure the termination policy as 'OldestInstance'

B.No action needed; this is the default behavior

C.Use lifecycle hooks

D.Use AWS CloudFormation to manage the endpoint

AnswerA

You can set the termination policy to 'OldestInstance' in the scaling policy configuration.

Why this answer

Option A is correct because AWS Auto Scaling for SageMaker endpoints supports a termination policy of 'OldestInstance', which explicitly instructs the scaling process to terminate the instance that has been running the longest. By default, Auto Scaling terminates the newest instance (the default termination policy), which can disrupt recent requests. Configuring the termination policy to 'OldestInstance' ensures that the oldest, most stable instance is removed first, minimizing disruption to in-flight requests.

Exam trap

The trap here is that candidates assume the default termination policy is 'OldestInstance' or that lifecycle hooks can influence instance selection, when in fact the default is 'NewestInstance' and lifecycle hooks only add a delay, not a selection rule.

How to eliminate wrong answers

Option B is wrong because the default behavior of AWS Auto Scaling is to terminate the newest instance first (the 'Default' termination policy), not the oldest, so action is needed to change this. Option C is wrong because lifecycle hooks are used to perform custom actions (e.g., draining connections) before an instance is terminated or launched, but they do not control which instance is selected for termination; they only add a pause in the lifecycle. Option D is wrong because AWS CloudFormation is an infrastructure-as-code service for provisioning resources, not a mechanism to configure the termination policy of an Auto Scaling group; the termination policy must be set directly on the Auto Scaling group or via the SageMaker endpoint configuration.


772

MCQmedium

A team is using Amazon SageMaker to train a neural network. They want to minimize training time while effectively exploring the hyperparameter space. Which approach should they use?

A.Random search

B.Bayesian optimization

C.Grid search

D.Manual tuning

AnswerB

Bayesian optimization uses past evaluations to focus on promising regions, reducing training time.

Why this answer

Bayesian optimization is the correct approach because it builds a probabilistic model of the objective function and uses it to select the most promising hyperparameters to evaluate next, balancing exploration and exploitation. This method converges to optimal hyperparameters in fewer iterations than random or grid search, significantly reducing training time for expensive neural network models.
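As a rough illustration, the tuning configuration passed to the SageMaker `CreateHyperParameterTuningJob` API selects the strategy with a single field. The sketch below only builds that configuration dictionary; the helper name, the hyperparameter names (`learning_rate`, `batch_size`), and their ranges are assumptions made up for the example.

```python
def bayesian_tuning_config(metric_name, max_jobs, max_parallel):
    """Build a HyperParameterTuningJobConfig dict that uses Bayesian search."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": metric_name,
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # Range bounds are strings in this API.
            "ContinuousParameterRanges": [
                {"Name": "learning_rate", "MinValue": "0.0001",
                 "MaxValue": "0.1", "ScalingType": "Logarithmic"},
            ],
            "IntegerParameterRanges": [
                {"Name": "batch_size", "MinValue": "32",
                 "MaxValue": "256", "ScalingType": "Linear"},
            ],
        },
    }


config = bayesian_tuning_config("validation:loss", max_jobs=20, max_parallel=2)
print(config["Strategy"])  # Bayesian
```

Keeping `MaxParallelTrainingJobs` small is a deliberate choice in this sketch: Bayesian search learns from completed trials, so fewer concurrent jobs give each new trial more prior results to draw on.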

Exam trap

AWS often tests the misconception that random search is always the best for hyperparameter tuning, but the question explicitly asks to minimize training time, which favors Bayesian optimization's efficient use of prior evaluations.

How to eliminate wrong answers

Option A is wrong because random search, while better than grid search for high-dimensional spaces, does not use past evaluation results to inform future trials, leading to more wasted iterations and longer training time. Option C is wrong because grid search exhaustively evaluates all combinations of a predefined set of hyperparameter values, which is computationally prohibitive for neural networks with many hyperparameters and does not scale efficiently. Option D is wrong because manual tuning relies on human intuition and trial-and-error, which is slow, non-reproducible, and cannot systematically explore the hyperparameter space to minimize training time.


773

MCQhard

A company needs to update a model in production without any downtime. They currently have a single real-time endpoint serving traffic. Which approach allows them to deploy a new model version and switch traffic gradually while being able to roll back quickly?

A.Use a canary deployment by creating a new production variant with the new model and shifting traffic incrementally

B.Use a multi-model endpoint and replace the model file

C.Stop the endpoint, update the model, and restart the endpoint

D.Update the existing endpoint's model directly using UpdateEndpoint

AnswerA

This allows gradual traffic shift and the old variant can be used for rollback if needed.

Why this answer

SageMaker supports production variants with traffic splitting. By creating a new variant with the new model and shifting traffic gradually, the old variant remains available for rollback. Blue/green deployment with a new endpoint and endpoint configuration swap also allows quick rollback.

The key is to have both variants active during the transition.
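The gradual shift can be sketched as a sequence of weight pairs, one per step. Each list produced below has the shape of the `DesiredWeightsAndCapacities` argument of the SageMaker `UpdateEndpointWeightsAndCapacities` API; the helper name, the variant names (`blue`, `green`), and the step percentages are assumptions for illustration only.

```python
def canary_steps(old_variant, new_variant, new_shares):
    """Return one weight assignment per step of a gradual traffic shift.

    `new_shares` lists the percentage of traffic the new variant should
    receive at each step, for example [10, 50, 100].
    """
    steps = []
    for share in new_shares:
        if not 0 <= share <= 100:
            raise ValueError(f"share must be between 0 and 100, got {share}")
        steps.append([
            {"VariantName": old_variant, "DesiredWeight": float(100 - share)},
            {"VariantName": new_variant, "DesiredWeight": float(share)},
        ])
    return steps


for step in canary_steps("blue", "green", [10, 50, 100]):
    print(step)

# Rolling back is the same operation with the new variant's share set to 0.
rollback = canary_steps("blue", "green", [0])[0]
```

Because the old variant stays deployed throughout, rollback is a weight change rather than a redeployment.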


774

Multi-Selectmedium

A data science team deploys a TensorFlow model for real-time inference using the Amazon SageMaker model configuration shown. They observe high latency during the first few requests after deployment. Which TWO actions would reduce cold start latency? (Choose two.)

Select 2 answers

A.Enable data capture on the endpoint

B.Set the SAGEMAKER_PROGRAM environment variable to a more optimized entry point

C.Add a secondary container for model ensemble

D.Configure a Production Variant with an initial instance count greater than zero

E.Use Amazon SageMaker Multi-Model Endpoints

AnswersD, E

Setting an initial instance count ensures that instances are always running, preventing cold start.

Why this answer

Option D is correct because setting an initial instance count greater than zero ensures that SageMaker provisions and initializes the endpoint instances before traffic arrives, eliminating the cold start delay caused by model loading and container startup. Option E is correct because Multi-Model Endpoints keep multiple models loaded in memory on the same instance, reducing the need to load a model from Amazon S3 for each new request, which directly mitigates cold start latency.

Exam trap

AWS often tests the misconception that environment variables like SAGEMAKER_PROGRAM control inference behavior, when in fact they are only relevant for training jobs, leading candidates to incorrectly select Option B.


775

MCQeasy

An ML team wants to use Amazon SageMaker Ground Truth to create a labeled dataset for a multi-class image classification task. They have a large set of unlabeled images and want to minimize labeling costs while maintaining high accuracy. Which Ground Truth feature should they enable?

A.Active learning

B.Annotation consolidation

C.Data labeling workforce management

D.Consolidated labeling

AnswerA

Active learning selects the most uncertain or informative samples for labeling, minimizing cost while maximizing model improvement.

Why this answer

Active learning in SageMaker Ground Truth automatically selects the most informative unlabeled images for human labeling, reducing the total number of labels needed while maintaining model accuracy. By iteratively training a model on a small labeled subset and then using that model to identify uncertain predictions, the system focuses labeling effort on the data that will most improve the model, directly minimizing labeling costs.

Exam trap

The trap here is that candidates may confuse 'annotation consolidation' (a post-labeling quality step) with a cost-reduction feature, or think that workforce management alone reduces costs, when in fact active learning is the specific feature designed to minimize the number of labels required.

How to eliminate wrong answers

Option B (Annotation consolidation) is wrong because it refers to combining multiple annotations for the same data point to produce a ground truth label, which does not reduce the number of labels needed. Option C (Data labeling workforce management) is wrong because it involves managing human labelers (e.g., public, private, or vendor workforces) but does not inherently reduce labeling volume or cost. Option D (Consolidated labeling) is not a distinct SageMaker Ground Truth feature; it is a generic term that might be confused with annotation consolidation, and it does not address cost minimization through selective labeling.


776

MCQhard

Refer to the exhibit. A SageMaker execution role has the IAM policy shown. The team attempts to run a training job that writes results to 's3://my-bucket/training/output/model.tar.gz'. What will happen?

A.The training job will fail because the Deny statement blocks all PutObject actions.

B.The training job will succeed and write the model artifact.

C.The training job will fail because the Deny statement overrides the Allow.

D.The training job will succeed, but the output file will be encrypted with a different key.

AnswerB

The Deny does not affect this resource.

Why this answer

The training job will succeed because the Allow statement in the IAM policy explicitly grants s3:PutObject on the specific object 's3://my-bucket/training/output/model.tar.gz', and the Deny statement only blocks PutObject on objects with a 'training/' prefix in the key. Since the target object key is 'training/output/model.tar.gz', it does not start with 'training/' (the prefix is 'training/output/'), so the Deny does not apply. The Allow is therefore effective, and the model artifact is written successfully.

Exam trap

The trap here is that candidates assume the Deny statement blocks all PutObject actions to the 'training/' directory, but they overlook that the target object's key includes a subdirectory ('output/'), so the prefix 'training/' does not match the full key path 'training/output/model.tar.gz'.

How to eliminate wrong answers

Option A is wrong because the Deny statement does not block all PutObject actions; it only denies PutObject on objects whose key starts with 'training/', and the target object key 'training/output/model.tar.gz' does not match that prefix. Option C is wrong because the Deny statement does not override the Allow in this case; the Deny only applies when the condition (key starting with 'training/') is met, which it is not. Option D is wrong because there is no mention of encryption keys in the policy; the policy only controls access permissions, not encryption behavior.


777

MCQeasy

Which SageMaker feature automatically generates model cards, feature importance, and bias reports without requiring manual coding?

A.SageMaker Autopilot

B.SageMaker Experiments

C.SageMaker Clarify

D.SageMaker Model Monitor

AnswerA

Autopilot automatically creates model cards, feature importance, and bias reports.

Why this answer

SageMaker Clarify provides bias detection and feature importance, and it can generate reports. SageMaker Autopilot generates model cards and explanations. SageMaker Experiments tracks experiments.

SageMaker Model Monitor is for monitoring. Autopilot is the correct answer because it automates the entire pipeline including model cards and explanations.


778

MCQhard

A data scientist is using Amazon SageMaker Data Wrangler for feature engineering on a large dataset stored in S3. The dataset has a column 'ProductCategory' with 1000+ unique values. To reduce dimensionality, they want to group categories that appear less than 1% of the time into an 'Other' category. Which Data Wrangler transform should they use?

A.Group similar categories

B.Custom transform with Python

C.Handle rare values

D.One-hot encode with threshold

AnswerC

This built-in transform can group categories below a frequency threshold into an 'Other' value.

Why this answer

The 'Handle rare values' transform in SageMaker Data Wrangler is specifically designed to group infrequent category values into a single 'Other' bucket based on a frequency threshold (e.g., less than 1%). This directly addresses the need to reduce dimensionality by consolidating rare categories without requiring custom code or manual grouping.
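The effect of threshold-based grouping can be shown in a few lines of plain Python. This is a sketch of the logic only, not how Data Wrangler implements the transform; the function name `group_rare` and the sample data are assumptions for the example.

```python
from collections import Counter


def group_rare(values, min_share=0.01, other="Other"):
    """Replace categories whose relative frequency is below min_share."""
    counts = Counter(values)
    total = len(values)
    keep = {v for v, c in counts.items() if c / total >= min_share}
    return [v if v in keep else other for v in values]


# 1,000 rows: "C" (0.3%) and "D" (0.2%) fall below the 1% threshold.
data = ["A"] * 600 + ["B"] * 395 + ["C"] * 3 + ["D"] * 2
grouped = group_rare(data)
print(Counter(grouped))  # Counter({'A': 600, 'B': 395, 'Other': 5})
```

After grouping, a downstream encoder sees three categories instead of four; with 1000+ raw categories the reduction is correspondingly larger.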

Exam trap

The trap here is that candidates may confuse the 'Handle rare values' transform with the 'One-hot encode with threshold' transform, mistakenly thinking the threshold in one-hot encoding serves the same purpose as grouping rare categories, when in fact it limits the number of one-hot columns created, not the grouping of infrequent values.

How to eliminate wrong answers

Option A is wrong because 'Group similar categories' is a manual grouping transform that requires the user to explicitly define which categories to combine, not an automated threshold-based grouping of rare values. Option B is wrong because while a custom Python transform could technically achieve this, it is unnecessary and less efficient when a built-in, optimized transform ('Handle rare values') exists for this exact purpose. Option D is wrong because 'One-hot encode with threshold' applies to one-hot encoding (creating binary columns) and its threshold controls the maximum number of one-hot features, not the grouping of rare categories into an 'Other' bucket.


779

MCQmedium

A data scientist deploys a model and wants to monitor the endpoint's invocation latency. They notice that the CloudWatch metric 'ModelLatency' is high, but 'OverheadLatency' is low. Which statement correctly interprets these metrics?

A.The SageMaker overhead is causing the delay; check endpoint configuration

B.The model inference time is the bottleneck; consider optimizing the model or using a faster instance type

C.The endpoint is overloaded; increase the number of instances

D.The network latency is high; move the endpoint closer to clients

AnswerB

High ModelLatency indicates inference time is the issue.

Why this answer

The 'ModelLatency' metric measures the time taken by the SageMaker model container to process a single request, including inference and any preprocessing/postprocessing within the container. 'OverheadLatency' measures the time spent on SageMaker infrastructure (e.g., network I/O, request queuing, and response handling). When ModelLatency is high and OverheadLatency is low, the bottleneck is clearly the model inference time itself, not the infrastructure overhead. Therefore, optimizing the model (e.g., quantization, pruning) or upgrading to a faster instance type (e.g., GPU vs. CPU) is the correct remediation.
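The comparison described above can be sketched as a small helper that takes the two metric values and reports where most of the time goes. Both metrics are reported in microseconds; the function name and the 50% cut-off used to label the bottleneck are arbitrary assumptions for illustration.

```python
def latency_bottleneck(model_latency_us, overhead_latency_us):
    """Label the dominant latency source from two CloudWatch metric values."""
    total = model_latency_us + overhead_latency_us
    if total <= 0:
        raise ValueError("combined latency must be positive")
    model_share = model_latency_us / total
    # Illustrative rule: whichever side takes at least half is the bottleneck.
    source = "model container" if model_share >= 0.5 else "SageMaker overhead"
    return source, round(model_share, 2)


source, share = latency_bottleneck(model_latency_us=800_000,
                                   overhead_latency_us=20_000)
print(source)  # model container
```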

Exam trap

The trap here is that candidates confuse 'ModelLatency' with overall endpoint latency and assume any high latency is due to infrastructure or scaling issues, when in fact the metric explicitly isolates the model's own inference time from overhead.

How to eliminate wrong answers

Option A is wrong because high ModelLatency with low OverheadLatency indicates the delay is inside the model container, not in SageMaker's infrastructure overhead; checking endpoint configuration would not address the model's own inference time. Option C is wrong because endpoint overload typically manifests as increased OverheadLatency (due to request queuing) or increased Invocations and 5xx errors, not as isolated high ModelLatency with low OverheadLatency. Option D is wrong because network latency is captured within OverheadLatency, not ModelLatency; moving the endpoint closer to clients would reduce OverheadLatency but would not affect the model's inference computation time.


780

MCQeasy

A data engineer is preparing a large dataset of 10 TB for ML training on Amazon SageMaker. The data is stored in Amazon S3 as CSV files. To reduce training time and cost, the engineer wants to use a columnar format that is optimized for analytical queries. Which format should the engineer convert the data to?

A.XML

B.Parquet

C.ORC

D.JSON Lines

AnswerB

Parquet is a columnar format that speeds up data access and reduces storage costs.

Why this answer

Parquet is a columnar storage format that is highly optimized for analytical queries and is natively supported by Amazon SageMaker for efficient data loading. By converting the 10 TB of CSV data to Parquet, the data engineer can reduce I/O and storage costs because columnar formats allow SageMaker to read only the columns needed for training, rather than scanning entire rows. This directly addresses the goal of reducing training time and cost for ML workloads.

Exam trap

AWS often tests the distinction between columnar formats (Parquet vs. ORC) by making both appear correct, but the trap here is that ORC is tightly coupled with Hive and less commonly used with SageMaker, while Parquet is the de facto standard for AWS-native ML and analytics services.

How to eliminate wrong answers

Option A (XML) is wrong because XML is a verbose, row-oriented text format that is not optimized for analytical queries; it would increase storage size and I/O overhead, making training slower and more expensive. Option C (ORC) is also a columnar format optimized for analytical queries, but it is primarily designed for and tightly integrated with the Apache Hive ecosystem, whereas Parquet is the more universally supported and recommended format for Amazon SageMaker and AWS analytics services. Option D (JSON Lines) is wrong because it is a row-oriented, text-based format that lacks the compression and columnar pruning benefits of Parquet, leading to higher storage costs and slower data access for ML training.


781

MCQmedium

A team is tuning hyperparameters for a neural network using SageMaker's HyperparameterTuningJob with Bayesian optimization. After several trials, the objective metric has not improved significantly. Which action is most likely to help continue making progress?

A.Expand the hyperparameter ranges

B.Switch to random search strategy

C.Use a warm start with previous tuning results

D.Switch to Bayesian search

AnswerB

Random search introduces exploration and can discover new promising regions beyond the current exploitation focus.

Why this answer

Option B is correct because switching to random search introduces exploration and can help escape local optima that Bayesian optimization might be stuck exploiting. Option D (switch to Bayesian search) is wrong because that strategy is already in use. Option C (warm start) reuses previous results but does not change the search strategy.

Option A (expand ranges) might help if the optimum lies outside the current ranges, but stagnation more often calls for additional exploration.


782

MCQmedium

A company deploys a real-time inference endpoint and wants to be alerted if the number of 4XX errors exceeds 10 per minute over a 5-minute period. Which steps should they take?

A.Create a CloudWatch alarm on the 4XXError metric with a threshold of 10 and an evaluation period of 5 minutes, and configure SNS notification

B.Create a CloudWatch alarm on the Invocations metric and set a threshold

C.Enable endpoint auto-scaling with a target tracking policy

D.Use SageMaker Model Monitor to capture invocations and trigger an SNS topic

AnswerA

Correct metric, threshold, and action.

Why this answer

Option A is correct because a CloudWatch alarm on the `4XXError` metric with a threshold of 10 and an evaluation period of 5 minutes directly monitors the rate of HTTP 4XX errors from the SageMaker real-time inference endpoint. When the alarm state transitions to ALARM (i.e., the average 4XX errors per minute exceeds 10 over the 5-minute window), it triggers an SNS notification to alert the team. This is the standard approach for real-time metric-based alerting in AWS.
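One way to express this alarm is as the keyword arguments for the CloudWatch `PutMetricAlarm` API. The sketch below only builds the parameter dictionary and makes several assumptions: that the metric the question calls `4XXError` is the `Invocation4XXErrors` metric in the `AWS/SageMaker` namespace, that "over a 5-minute period" means five consecutive one-minute datapoints, and that the endpoint, variant, and topic names are placeholders.

```python
def error_alarm_params(endpoint_name, variant_name, sns_topic_arn):
    """Build PutMetricAlarm arguments for a 4XX error alarm on an endpoint."""
    return {
        "AlarmName": f"{endpoint_name}-4xx-errors",
        "Namespace": "AWS/SageMaker",
        "MetricName": "Invocation4XXErrors",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Sum",        # count of errors in each period
        "Period": 60,              # one-minute datapoints
        "EvaluationPeriods": 5,    # look at the last five minutes
        "DatapointsToAlarm": 5,    # all five minutes must breach
        "Threshold": 10,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


params = error_alarm_params(
    "my-endpoint", "AllTraffic",
    "arn:aws:sns:us-east-1:123456789012:ml-alerts",  # placeholder ARN
)
print(params["MetricName"], params["Threshold"])
```

With credentials configured, these arguments could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`.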

Exam trap

The trap here is that candidates confuse metric-based alerting (CloudWatch alarms on `4XXError`) with monitoring services (Model Monitor) or scaling mechanisms (auto-scaling), leading them to pick options that address different operational concerns.

How to eliminate wrong answers

Option B is wrong because the `Invocations` metric counts total requests, not 4XX errors, so it cannot detect error rate thresholds. Option C is wrong because endpoint auto-scaling with a target tracking policy adjusts capacity based on a target metric (e.g., Invocations per instance), not on error counts, and it does not generate alerts. Option D is wrong because SageMaker Model Monitor is designed for data quality, bias, and drift detection on captured payloads, not for real-time HTTP error rate monitoring; it cannot directly trigger alerts on 4XX error counts per minute.


783

MCQeasy

A data scientist is using SageMaker to train a linear regression model. After training, they evaluate the model on the test set and get an R² of 0.95. However, when they deploy the model to a SageMaker endpoint and run predictions on new data, the predictions are far off. What is the most likely cause?

A.The endpoint is using a different inference script.

B.The test set is not representative of the production data distribution.

C.The model was trained with a wrong algorithm.

D.The model is overfitting the training data.

AnswerB

Correct: Data drift causes model to perform poorly on new data despite good test metrics.

Why this answer

A high R² of 0.95 on the test set indicates the model fits the test data well, but if the test set was drawn from the same distribution as the training data and does not reflect the real-world production data, the model will fail to generalize. In SageMaker, the endpoint serves predictions on live data that may have different statistical properties, leading to poor performance despite high test-set metrics. This is a classic case of dataset shift, not a model training or deployment configuration issue.

Exam trap

The trap here is that candidates confuse high test-set R² with model generalization, overlooking that the test set itself may be non-representative of production data, which is a core concept in the MLA-C01 exam under 'Model Evaluation and Validation'.

How to eliminate wrong answers

Option A is wrong because a different inference script would cause runtime errors or incorrect preprocessing, not systematically poor predictions on new data; SageMaker endpoints use the same inference code as the training container unless explicitly changed. Option C is wrong because using a wrong algorithm would typically result in poor training metrics (e.g., low R² on the test set), not a high R² of 0.95; the model converged well on the given data. Option D is wrong because overfitting would manifest as a large gap between training and test set performance (e.g., R² near 1.0 on training but much lower on test), but here the test R² is 0.95, suggesting the model generalizes to the test set; the issue is with production data differing from the test set.


784

MCQmedium

A machine learning engineer notices that the latency of a SageMaker endpoint has increased over time. They need to identify which component (model inference vs. pre/post-processing) contributes most to the latency. Which CloudWatch metrics should they examine?

A.Latency and ModelLatency

B.Invocations and 4XXError

C.5XXError and MemoryUtilization

D.ModelLatency and OverheadLatency

AnswerD

ModelLatency shows inference time inside the container; OverheadLatency shows SageMaker overhead. Comparing them pinpoints the latency source.

Why this answer

SageMaker endpoints emit CloudWatch metrics that break latency down into the time spent inside the model container (ModelLatency) and the overhead SageMaker adds around that call (OverheadLatency). By comparing these two metrics, the engineer can pinpoint whether the bottleneck is in the code running in the container or in the surrounding request handling. Option D directly provides both metrics needed for this root-cause analysis.

Exam trap

The trap here is that candidates confuse the total Latency metric with a breakdown metric, assuming it alone can identify the bottleneck, when in fact only the pair of ModelLatency and OverheadLatency provides the necessary decomposition.

How to eliminate wrong answers

Option A is wrong because Latency is the total end-to-end response time, and ModelLatency alone only covers inference; together they do not isolate the pre/post-processing component. Option B is wrong because Invocations and 4XXError track request count and client-side errors, not latency breakdown. Option C is wrong because 5XXError indicates server-side failures and MemoryUtilization shows resource pressure, but neither metric decomposes latency into inference vs. overhead.


785

MCQhard

A healthcare startup has deployed a machine learning model on Amazon SageMaker that predicts patient readmission risks. The model uses sensitive health data stored in an S3 bucket encrypted with AWS KMS. The SageMaker endpoint is configured with an IAM role that has the following policy attached: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::healthcare-data/*", "Condition": { "Bool": { "aws:SecureTransport": "true" } } }, { "Effect": "Allow", "Action": "kms:Decrypt", "Resource": "*" } ] }. During a security audit, the team discovers that the IAM role's KMS permission is too permissive because it allows decryption of any KMS key in the account. The team needs to modify the policy to follow the principle of least privilege while still allowing the SageMaker endpoint to read the encrypted data. Which modification should the team make?

A.Change the KMS statement Action to "kms:DescribeKey" instead of "kms:Decrypt"

B.Add a condition to the KMS statement: "Condition": { "StringEquals": { "kms:ViaService": "s3.us-east-1.amazonaws.com" } }

C.Remove the KMS statement entirely, as S3 bucket policies with SSE-KMS do not require KMS permissions

D.Change the KMS statement to: "Action": "kms:Decrypt", "Resource": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"

AnswerD

Restricting the Resource to the specific KMS key ARN ensures that the role can only decrypt the key used for the healthcare data, adhering to least privilege.

Why this answer

The current policy allows kms:Decrypt on any KMS key (*). To follow least privilege, the team should restrict the Resource to the specific KMS key used to encrypt the S3 bucket. Option D (keep the Action as kms:Decrypt and restrict the Resource to the specific key ARN) is correct.

Option C (remove the KMS statement entirely) would break the endpoint because it could no longer decrypt the data. Option B (add a kms:ViaService condition) is good practice but still allows decryption with any key in the account when the request comes through S3, so it is not least privilege. Option A (use kms:DescribeKey instead of kms:Decrypt) does not allow decryption.
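For reference, the corrected policy can be assembled as follows. The S3 statement is copied unchanged from the question, and only the KMS `Resource` is narrowed; the helper name is an assumption, and the key ARN is the example value from the answer option.

```python
import json


def least_privilege_policy(bucket, kms_key_arn):
    """Return the question's policy with kms:Decrypt scoped to one key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"Bool": {"aws:SecureTransport": "true"}},
            },
            {
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                # Scoped to a single key instead of "*".
                "Resource": kms_key_arn,
            },
        ],
    }


policy = least_privilege_policy(
    "healthcare-data",
    "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab",
)
print(json.dumps(policy, indent=2))
```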


786

MCQhard

Refer to the exhibit. A team receives an error when running a SageMaker Model Monitor schedule for data quality. What should they do to resolve this issue?

A.Update the IAM role to allow S3 access

B.Restart the monitoring schedule

C.Enable data capture on the endpoint

D.Create a baseline job using the training dataset

AnswerD

A baseline must be generated from training data to compare inference data against.

Why this answer

The error occurs because SageMaker Model Monitor requires a baseline to compare against live data. Without a baseline job created from the training dataset, the monitoring schedule fails. Option D resolves this by generating the necessary statistics and constraints that define expected data quality.

Exam trap

The trap here is that candidates often assume the error is a permissions or configuration issue (S3 access or data capture), but the root cause is the mandatory prerequisite of a baseline job before a monitoring schedule can run.

How to eliminate wrong answers

Option A is wrong because the IAM role likely already has S3 access if the endpoint and model artifacts are deployed; the error is not a permissions issue. Option B is wrong because restarting the monitoring schedule does not address the missing baseline; the schedule will fail again. Option C is wrong because data capture must already be enabled on the endpoint for Model Monitor to collect inference data; the error indicates a missing baseline, not a missing data capture configuration.


787

MCQeasy

A data scientist is preparing a dataset for a linear regression model. The dataset has a few missing values in a numerical feature with a normal distribution and no outliers. Which imputation method is most appropriate?

A.Impute with mode

B.Impute with mean

C.Impute with median

D.Drop rows with missing values

AnswerB

Mean is appropriate for normally distributed numerical data without outliers.

Why this answer

For a numerical feature with a normal distribution and no outliers, the mean is the most appropriate imputation method because it preserves the central tendency of the data without introducing bias. In linear regression, mean imputation maintains the expected value of the feature, which is critical for unbiased coefficient estimates when data are missing completely at random (MCAR).
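A short plain-Python sketch shows why mean imputation preserves the feature's central tendency: the mean of the column is the same before and after filling. The function name and the sample values are assumptions for the example, with `None` standing in for a missing value.

```python
from statistics import fmean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


column = [10.0, None, 12.0, 14.0]
filled = impute_mean(column)
print(filled)  # [10.0, 12.0, 12.0, 14.0]
```

The trade-off is that every imputed row sits exactly at the mean, so the column's variance shrinks slightly as the share of missing values grows.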

Exam trap

The trap here is that candidates often reach for the median out of habit. For a normal distribution with no outliers the mean and median estimate the same center, but the sample median is the less efficient estimator, while the sample mean is the maximum likelihood estimator of the center for normally distributed data.

How to eliminate wrong answers

Option A is wrong because the mode is intended for categorical data, not for a normally distributed numerical feature, and it would distort the distribution by replacing missing values with the most frequent value rather than the central tendency. Option C is wrong because the median's main advantage is robustness to outliers and skew; with a normal distribution and no outliers that advantage does not apply, and the sample median is a less efficient estimate of the center than the mean. Option D is wrong because dropping rows with missing values reduces sample size and can introduce bias if the missingness is not completely random, whereas imputation with the mean is a standard technique for MCAR data in linear regression.
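
As a rough illustration of the reasoning above, the following sketch imputes missing entries with the mean using only the Python standard library. The feature values and the helper name `impute_mean` are made up for this example. It shows that the feature's mean is preserved, and also that filling gaps with a single constant shrinks the variance somewhat, which is a known side effect of any constant imputation.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed_values = [v for v in values if v is not None]
    fill = statistics.fmean(observed_values)
    return [fill if v is None else v for v in values]

# Hypothetical, roughly symmetric feature with two missing entries.
feature = [4.0, 5.0, None, 6.0, 5.0, None, 4.0, 6.0]
observed = [v for v in feature if v is not None]
imputed = impute_mean(feature)

print(statistics.fmean(observed), statistics.fmean(imputed))          # means match
print(statistics.pvariance(observed), statistics.pvariance(imputed))  # variance shrinks
```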

788

MCQmedium

A company wants to deploy a machine learning model that makes real-time predictions for a mobile app. The model is a deep neural network with a large model size (500 MB). Which SageMaker endpoint configuration is most cost-effective while meeting low-latency requirements?

A.Multi-model endpoint

B.Serverless inference

C.Real-time endpoint with a single instance

D.Batch transform

AnswerC

Ensures low latency and is cost-effective for a single model with sustained traffic.

Why this answer

Option C is correct because a real-time endpoint with a single instance provides the lowest latency for a 500 MB deep neural network model, as it keeps the model loaded in memory and ready for inference without cold starts or multi-model overhead. This configuration is also cost-effective for consistent traffic patterns, as you pay for the instance uptime rather than per-invocation or for multiple model loads.

Exam trap

AWS often tests the misconception that multi-model endpoints are always the more cost-effective choice. The trap here is that a multi-model endpoint loads models on demand, so a large model that is not already in memory adds load time to the request, which conflicts with the low-latency requirement for a single 500 MB model.

How to eliminate wrong answers

Option A is wrong because multi-model endpoints are designed to host many models behind one endpoint, loading and unloading them on demand; that overhead adds latency and brings no benefit for a single 500 MB model that needs real-time predictions. Option B is wrong because serverless inference is subject to cold starts, and loading a 500 MB model into a freshly provisioned container makes those cold starts longer, so it cannot guarantee consistently low latency. Option D is wrong because batch transform is an offline inference method designed for large-scale batch processing; it does not serve real-time requests from a mobile app.

789

MCQhard

A machine learning engineer is deploying a PyTorch model for real-time inference on SageMaker. The model requires GPU for low-latency predictions. The deployment fails with the error: 'The primary container does not support the requested instance type.' The instance type is ml.p3.2xlarge. Which action should the engineer take to resolve the issue?

A.Use SageMaker Neo to compile the model for the target instance type

B.Request a service quota increase for the ml.p3.2xlarge instance type

C.Verify that the PyTorch framework version specified in the SageMaker estimator matches a version that supports GPU instances

D.Create a custom inference container and use it with the SageMaker model

AnswerC

Older PyTorch versions may not support GPU; using a supported version resolves the error.

Why this answer

Option C is correct because the error 'The primary container does not support the requested instance type' typically occurs when the specified PyTorch framework version in the SageMaker estimator does not include GPU support for the chosen instance type (ml.p3.2xlarge). SageMaker's prebuilt PyTorch containers are version-specific and only certain versions are compiled with CUDA and GPU libraries; using a version that lacks GPU support causes the container to reject GPU instance types. Verifying and selecting a PyTorch version that explicitly supports GPU instances resolves the mismatch.

Exam trap

The trap here is that candidates often assume the error is due to resource limits (quota) or hardware incompatibility (Neo), rather than recognizing it as a framework version and container image mismatch specific to GPU support.

How to eliminate wrong answers

Option A is wrong because SageMaker Neo compiles models for edge devices or optimized inference on specific hardware, but it does not fix a container-instance type compatibility error; the error occurs before model compilation. Option B is wrong because a service quota increase addresses insufficient capacity or account limits for the instance type, not a container-level compatibility error; the error indicates the container rejects the instance type, not that the instance is unavailable. Option D is wrong because creating a custom inference container is unnecessary when the issue is simply a version mismatch in the prebuilt container; the error can be resolved by selecting a supported PyTorch version without custom container overhead.

790

MCQmedium

A company wants to build a customer service chatbot that answers questions about their internal policy documents. The documents are updated monthly, and the team cannot afford to retrain a model each time. Which approach is MOST appropriate?

A.Use a larger foundation model with a longer context window and paste all documents into each prompt

B.Fine-tune a base LLM on the policy documents monthly

C.Train a custom model from scratch on the policy documents each month

D.Use Retrieval-Augmented Generation (RAG) with the policy documents indexed in a vector store

AnswerD

RAG retrieves relevant document chunks at query time, ensuring the chatbot always answers from the latest uploaded documents without any model retraining.

Why this answer

RAG (Retrieval-Augmented Generation) allows the LLM to retrieve relevant document sections at inference time, so knowledge stays current without retraining. The other options either require expensive retraining for each update or lack document grounding.

791

MCQeasy

A data scientist is training a regression model in Amazon SageMaker. The dataset contains missing values in several features. The scientist wants to handle missing values as part of the training pipeline to ensure consistency between training and inference. Which approach should the scientist use?

A.Impute missing values in a separate Jupyter notebook and save the cleaned data.

B.Use SageMaker Autopilot to automatically handle missing values.

C.Drop all rows with missing values before training.

D.Use a scikit-learn container in SageMaker to create a preprocessing step that imputes missing values and include it in the inference pipeline.

AnswerD

Packaging imputation as a pipeline step keeps preprocessing identical in training and inference.

Why this answer

Option D is correct because it uses a scikit-learn container within SageMaker to create a preprocessing step that imputes missing values, then includes that step in the inference pipeline. This ensures the same imputation logic (e.g., mean, median, or custom strategy) is applied consistently during both training and inference, preventing training-serving skew and maintaining reproducibility. SageMaker Pipelines or the built-in scikit-learn container allow the preprocessing to be serialized as part of the model artifact, so inference requests automatically undergo the same transformation.

Exam trap

The trap here is that candidates often assume SageMaker Autopilot (Option B) is the correct choice because it automates preprocessing, but they miss that the question specifically requires a custom, reproducible pipeline that ensures consistency between training and inference, which Autopilot does not expose for custom control.

How to eliminate wrong answers

Option A is wrong because handling missing values in a separate Jupyter notebook and saving the cleaned data breaks the training-inference consistency; the imputation logic is not captured in a reusable pipeline, leading to potential mismatch when new data arrives during inference. Option B is wrong because SageMaker Autopilot is an automated machine learning service that handles missing values internally during model selection, but it does not allow the data scientist to control the imputation method or integrate a custom preprocessing step into a production inference pipeline. Option C is wrong because dropping all rows with missing values can discard valuable data, reduce model performance, and is not feasible when missing values appear in inference-time data, as the pipeline would have no strategy to handle them.
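
To make the consistency argument concrete, here is a minimal sketch of the underlying idea: imputation statistics are learned once from training data, saved as an artifact, and reloaded unchanged at inference time. The `MeanImputer` class and its JSON format are hypothetical stand-ins written for this example, not the scikit-learn or SageMaker API; in practice a fitted scikit-learn transformer would play this role. The sketch assumes every column has at least one observed value.

```python
import json

class MeanImputer:
    """Learns per-column means at training time and reuses them at inference."""

    def fit(self, rows):
        self.means = []
        for column in zip(*rows):
            seen = [v for v in column if v is not None]
            self.means.append(sum(seen) / len(seen))
        return self

    def transform(self, rows):
        return [
            [self.means[i] if v is None else v for i, v in enumerate(row)]
            for row in rows
        ]

    def to_json(self):
        # Serialized statistics play the role of the saved model artifact.
        return json.dumps({"means": self.means})

    @classmethod
    def from_json(cls, payload):
        restored = cls()
        restored.means = json.loads(payload)["means"]
        return restored

# Training time: fit on training rows and save the artifact.
train_rows = [[1.0, 10.0], [3.0, None], [None, 30.0]]
artifact = MeanImputer().fit(train_rows).to_json()

# Inference time: reload the artifact and apply the same statistics.
inference_rows = [[None, 25.0], [2.5, None]]
served = MeanImputer.from_json(artifact).transform(inference_rows)
print(served)
```

Because the inference path never recomputes the means from incoming data, a request with a missing value is filled exactly as it would have been during training.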

792

MCQhard

A machine learning engineer is training a deep learning model using TensorFlow in SageMaker. The training runs on an ml.p3.16xlarge instance (8 GPUs). The engineer notices that GPU utilization is low (~30%) and time per epoch is high. The model uses a custom training loop. Which configuration change is most likely to improve GPU utilization?

A.Increase the batch size to match GPU memory

B.Reduce the number of data loading workers

C.Use mixed precision training

D.Enable SageMaker Managed Warm Pools

AnswerA

Larger batch size increases the amount of computation per step, keeping GPUs more fully utilized.

Why this answer

Low GPU utilization (~30%) with a custom training loop on an 8-GPU instance typically indicates that each GPU is not receiving enough work to keep its compute units busy. Increasing the batch size to match GPU memory allows each GPU to process more data per step, improving arithmetic intensity and reducing the overhead of frequent weight updates, which directly raises GPU utilization.

Exam trap

AWS often tests the misconception that mixed precision training is the universal fix for low GPU utilization. In reality it reduces memory use and speeds up the arithmetic, which does not help when the GPUs are underused because each step gives them too little work.

How to eliminate wrong answers

Option B is wrong because reducing the number of data loading workers would likely worsen data pipeline bottlenecks, further starving the GPUs and decreasing utilization. Option C is wrong because mixed precision training primarily improves memory usage and throughput by using FP16, but it does not address the root cause of low utilization from insufficient per-GPU workload; it may even reduce utilization if the batch size remains small. Option D is wrong because SageMaker Managed Warm Pools reduce cold start times for subsequent training jobs, but they have no effect on GPU utilization during an active training run.

793

MCQhard

A data scientist is training a model using SageMaker and wants to use spot instances to reduce costs. The training job is checkpointed every 5 minutes. However, the job gets interrupted frequently and never completes. What is the MOST likely cause?

A.The checkpoint interval is too long relative to the interruption frequency

B.The checkpoint S3 URI is incorrect

C.The instance type is too small for the training job

D.The job is configured with too few max retries

AnswerA

If interruptions occur more often than checkpoints, progress is lost and job may never complete.

Why this answer

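Spot instances can be reclaimed with little notice. If the checkpoint interval is longer than the typical time between interruptions, the job is interrupted before it writes a new checkpoint, so every restart resumes from the same point and the job never finishes. Using a smaller instance type reduces cost but does not change how often interruptions occur.

An incorrect checkpoint S3 URI would cause checkpoint saves to fail outright. Too few max retries would cause the job to stop after a few interruptions rather than keep being interrupted.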
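
The relationship between checkpoint interval and interruption frequency can be sketched with a small simulation. This is a simplified model written for illustration only: it assumes interruptions arrive at a fixed spacing and that checkpoints are written at fixed multiples of elapsed work, neither of which holds exactly for real Spot capacity. The function name and parameters are hypothetical.

```python
def runs_to_finish(total_min, checkpoint_every, minutes_between_interruptions,
                   max_runs=1000):
    """Return how many runs the job needs, or None if it stops making progress.

    Each run resumes from the last checkpoint and is interrupted after a
    fixed number of minutes (a simplification of real Spot behavior).
    """
    saved = 0  # minutes of work captured in the latest checkpoint
    for run in range(1, max_runs + 1):
        reached = saved + minutes_between_interruptions
        if reached >= total_min:
            return run  # this run finishes before the next interruption
        # Only work up to the most recent checkpoint boundary survives.
        new_saved = (reached // checkpoint_every) * checkpoint_every
        if new_saved == saved:
            return None  # interrupted before any new checkpoint was written
        saved = new_saved
    return None

# 60 minutes of work, checkpoint every 5 minutes.
print(runs_to_finish(60, 5, 12))  # interruptions rarer than checkpoints
print(runs_to_finish(60, 5, 4))   # interruptions more frequent than checkpoints
```

In the second call the job is interrupted every 4 minutes but only checkpoints every 5, so no run ever advances the saved state and the function reports that the job cannot complete.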

794

Multi-Selecthard

A machine learning engineer is deploying a TensorFlow model for real-time inference. The model has high latency on CPU. Which TWO actions can reduce inference latency? (Choose two.)

Select 2 answers

A.Enable SageMaker Model Monitor

B.Switch to a multi-model endpoint

C.Attach Amazon Elastic Inference to the endpoint

D.Use a larger instance type with more vCPUs

E.Compile the model with SageMaker Neo

AnswersC, E

Elastic Inference adds GPU acceleration, reducing latency.

Why this answer

Compiling with SageMaker Neo optimizes the model for the target hardware. Attaching Elastic Inference provides GPU acceleration without moving to a full GPU instance.

795

MCQmedium

A data scientist is using Amazon SageMaker Ground Truth to create a labeled dataset for an object detection model. The dataset contains 1 million images, and the team wants to reduce labeling cost by labeling only the most informative samples. Which feature of Ground Truth should they use?

A.Active learning

B.Automated data labeling

C.Pre-built annotation worker UI

D.Consolidated labeling

AnswerA

Active learning selects samples where the model is uncertain, maximizing labeling efficiency.

Why this answer

Ground Truth offers active learning, which automatically selects the most informative samples to label, reducing cost. Option A is correct. Options B, C, and D do not provide automatic sample selection for labeling.

796

MCQhard

A data scientist is running a SageMaker training job with a custom PyTorch image. The training script loads a large dataset into memory, and the job fails with an out-of-memory error after a few minutes. The instance type is ml.m5.xlarge (16 GB RAM). What should the data scientist do to resolve this issue without changing the instance type?

A.Enable SageMaker Managed Spot Training to free memory

B.Implement data loading with multiprocessing and increase the number of workers

C.Reduce the batch size in the training script

D.Use SageMaker Pipe mode to stream data from S3

AnswerC

Smaller batch sizes reduce memory consumption per step, helping to fit within the available RAM.

Why this answer

Reducing the batch size decreases the amount of data loaded into memory at once, directly addressing the out-of-memory error without changing the instance type. Since the training script loads a large dataset into memory and fails after a few minutes, a smaller batch size reduces peak memory consumption per iteration, allowing the job to fit within the 16 GB RAM of ml.m5.xlarge.

Exam trap

The trap here is that candidates confuse streaming data (Pipe mode) with reducing in-memory data loading, not realizing that the script's explicit load into memory bypasses any streaming benefit.

How to eliminate wrong answers

Option A is wrong because SageMaker Managed Spot Training provides cost savings via discounted spare EC2 capacity but does not free or reduce memory usage; it can even cause interruptions that require checkpointing. Option B is wrong because increasing the number of workers with multiprocessing increases memory overhead due to data duplication across processes, exacerbating the out-of-memory issue. Option D is wrong because SageMaker Pipe mode streams data from S3 directly to the training algorithm without writing to disk, but the training script still loads the dataset into memory, so the memory footprint remains unchanged.

797

MCQeasy

A company has deployed a SageMaker real-time endpoint for a model that predicts customer churn. The endpoint uses a single ml.m5.large instance. After deployment, the team notices that during peak hours, the endpoint returns 5xx errors for about 20% of requests. The endpoint has not been configured with any scaling policy. The team needs to resolve this issue with minimal cost increase. Which solution should the team implement?

A.Deploy the model to a multi-model endpoint to reduce resource utilization.

B.Enable Auto Scaling for the endpoint with a target tracking policy based on the average InvocationsPerInstance metric.

C.Increase the instance type to ml.m5.xlarge to handle more concurrent requests.

D.Use SageMaker batch transform instead of real-time inference to process peak traffic asynchronously.

AnswerB

Auto Scaling adds instances only when needed, minimizing cost while handling peak load.

Why this answer

Option B is correct because enabling Auto Scaling with a target tracking policy based on the average InvocationsPerInstance metric dynamically adjusts the number of instances in response to traffic spikes, preventing 5xx errors during peak hours without over-provisioning. This approach minimizes cost by scaling only when needed, unlike manual instance upgrades or batch transforms that either increase baseline cost or introduce latency.

Exam trap

The trap here is that candidates often confuse 'scaling up' (increasing instance size) with 'scaling out' (adding more instances), and overlook that Auto Scaling with a target tracking policy is the most cost-effective way to handle variable traffic, as it matches capacity to demand in real time.

How to eliminate wrong answers

Option A is wrong because deploying to a multi-model endpoint reduces resource utilization by sharing a single container across multiple models, but it does not address the root cause of insufficient capacity for a single model under peak load; it may even exacerbate contention. Option C is wrong because increasing the instance type to ml.m5.xlarge provides more compute per instance but incurs a fixed higher cost regardless of traffic, failing the 'minimal cost increase' requirement and not dynamically adapting to variable load. Option D is wrong because SageMaker batch transform is designed for asynchronous, offline inference on large datasets, not for real-time requests; it would introduce unacceptable latency and cannot serve interactive predictions, thus not resolving the immediate 5xx errors during peak hours.
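
A target tracking policy of this kind is typically registered through Application Auto Scaling. The sketch below only builds the request parameters as plain dictionaries so it can run without AWS credentials. The endpoint name, variant name, policy name, capacity limits, cooldowns, and target value are all placeholder assumptions, and the target should be derived from load testing rather than copied from here.

```python
def build_scaling_requests(endpoint_name, variant_name, target_invocations,
                           min_instances=1, max_instances=4):
    """Build parameters for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    register = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    policy = dict(
        common,
        PolicyName="invocations-target-tracking",  # hypothetical policy name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_invocations),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # placeholder: react quickly to spikes
            "ScaleInCooldown": 300,   # placeholder: scale in more cautiously
        },
    )
    return register, policy

register, policy = build_scaling_requests("churn-endpoint", "AllTraffic", 100)
print(policy["TargetTrackingScalingPolicyConfiguration"])

# With boto3 installed and credentials configured, these could be sent as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**register)
#   client.put_scaling_policy(**policy)
```

Keeping `MinCapacity` at 1 means the baseline cost stays at a single ml.m5.large, with extra instances billed only while the peak lasts.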

798

Multi-Selectmedium

An MLOps engineer is designing a CI/CD pipeline for deploying machine learning models to a production SageMaker endpoint. The pipeline should include automated testing, approval gates, and rollback capability. Which THREE components should be included in the pipeline? (Select THREE.)

Select 3 answers

A.A step to register the model in SageMaker Model Registry.

B.A CloudFormation template to deploy the endpoint infrastructure, enabling rollback via stack update.

C.A separate staging endpoint to validate the model before production deployment.

D.A manual approval step after staging testing.

E.A step to run SageMaker Debugger to monitor training.

AnswersB, C, D

Infrastructure as code allows precise rollback by redeploying a previous CloudFormation stack.

Why this answer

Option B is correct because using a CloudFormation template to deploy the SageMaker endpoint infrastructure enables rollback via stack update. If a deployment fails, CloudFormation can automatically roll back the stack to the previous known good state, ensuring infrastructure consistency and reducing downtime.

Exam trap

The trap here is that candidates confuse model registry steps (Option A) or training monitoring tools (Option E) with deployment pipeline components, but the question specifically asks for components that enable automated testing, approval gates, and rollback capability in the CI/CD pipeline for deploying to a production SageMaker endpoint.

799

MCQhard

A machine learning engineer is deploying a pre-trained NLP model on Amazon SageMaker for real-time inference. The model expects input sequences of variable length, and performance is critical. The engineer wants to minimize latency while handling the variable-length inputs efficiently. Which approach should the engineer choose?

A.Reduce the model size by pruning and quantization.

B.Pad all input sequences to the maximum length in the batch.

C.Use dynamic batching with a custom inference script that groups requests by sequence length.

D.Process each request individually to avoid padding overhead.

AnswerC

Dynamic batching reduces padding and latency.

Why this answer

Option C is correct because dynamic batching with a custom inference script that groups requests by sequence length minimizes padding overhead and maximizes hardware utilization. By batching similar-length sequences together, the model avoids excessive padding to the maximum length in the batch, which reduces wasted computation and latency. This approach is particularly effective for variable-length NLP inputs on SageMaker, where the inference container can be customized to implement the grouping logic.

Exam trap

AWS often tests the misconception that padding to the maximum length is always necessary or efficient, but the trap here is that dynamic batching with length-based grouping is a more sophisticated technique that balances batching efficiency with minimal padding overhead.

How to eliminate wrong answers

Option A is wrong because pruning and quantization reduce model size and can improve latency, but they do not address the core issue of efficiently handling variable-length input sequences; they are orthogonal optimizations. Option B is wrong because padding all sequences to the maximum length in the batch introduces significant wasted computation and memory, especially when sequence lengths vary widely, leading to higher latency. Option D is wrong because processing each request individually eliminates batching benefits, resulting in lower throughput and higher per-request latency due to underutilized hardware accelerators.
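
The padding argument can be checked with simple arithmetic. The sketch below compares the number of token positions processed when six requests are padded as one batch versus when they are first grouped into length buckets. The sequence lengths and the bucket width of 64 are arbitrary values chosen for illustration, and the helper names are hypothetical; a real inference script would also need a batching timeout, which is omitted here.

```python
from collections import defaultdict

def padded_tokens(batches):
    """Token positions processed when each batch is padded to its longest item."""
    return sum(max(batch) * len(batch) for batch in batches)

def bucket_by_length(lengths, bucket_width):
    """Group sequence lengths so each batch holds sequences of similar length."""
    buckets = defaultdict(list)
    for n in lengths:
        buckets[n // bucket_width].append(n)
    return [buckets[key] for key in sorted(buckets)]

lengths = [12, 480, 15, 500, 9, 470]          # hypothetical request lengths
single_batch = [lengths]                       # Option B: pad everything together
bucketed = bucket_by_length(lengths, bucket_width=64)

print(sum(lengths))                  # real tokens
print(padded_tokens(single_batch))   # padded as one batch
print(padded_tokens(bucketed))       # padded per length bucket
```

With these numbers, one mixed batch processes roughly twice as many positions as the real token count, while the bucketed batches stay close to it.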

800

Multi-Selectmedium

A machine learning engineer is using SageMaker Autopilot for AutoML. Which TWO outputs does Autopilot produce?

Select 2 answers

A.A hyperparameter tuning job summary

B.An ensemble of candidate models

C.A data labeling pipeline

D.A single optimal model

E.An explainability report

AnswersB, E

801

MCQmedium

An ML team at a financial services company has developed a fraud detection model using Amazon SageMaker. The model is currently deployed to a production endpoint with a single variant using the previous model version. The team wants to deploy a new model version with a canary deployment where 10% of traffic goes to the new version and 90% remains on the old version for 30 minutes before shifting all traffic to the new version if no issues are detected. Which step is essential to achieve this safe rollout?

A.Use the 'Deploy' method on the model object with the 'mode' parameter set to 'canary' within the built-in XGBoost algorithm container.

B.Update the endpoint with a new production variant for the new model version and set the 'InitialVariantWeight' to 10 for the new variant and 90 for the old variant, specifying a 'BlueGreenUpdatePolicy' with a 'TrafficRoutingConfiguration' for canary.

C.Ensure the endpoint is hosted on at least two instances to enable load balancing, then deploy the new model version as a separate variant and manually adjust the endpoint's DNS to split traffic.

D.Deploy the new model as a separate endpoint and use a SageMaker predictor to randomly route 10% of inference requests to the new endpoint.

AnswerB

This configuration uses SageMaker's blue/green deployment with canary traffic shifting, which is the correct approach.

Why this answer

Option B is correct because it uses the SageMaker endpoint update with a new production variant and sets 'InitialVariantWeight' to 10 for the new model and 90 for the old model, which routes 10% of traffic to the new version. Additionally, specifying a 'BlueGreenUpdatePolicy' with a 'TrafficRoutingConfiguration' for canary enables the automatic shift of all traffic to the new variant after 30 minutes if no issues are detected, achieving the desired safe rollout.

Exam trap

The trap here is that candidates may think canary deployments require manual traffic splitting or separate endpoints, but SageMaker's native 'BlueGreenUpdatePolicy' with 'TrafficRoutingConfiguration' automates the entire process, including traffic shifting and rollback, without needing custom code or DNS manipulation.

How to eliminate wrong answers

Option A is wrong because the 'Deploy' method on a model object does not have a 'mode' parameter set to 'canary'; SageMaker's built-in XGBoost container does not support canary deployment via a 'mode' parameter, and canary deployments are managed at the endpoint configuration level, not within the algorithm container. Option C is wrong because hosting the endpoint on at least two instances is not a requirement for canary deployments, and manually adjusting the endpoint's DNS to split traffic is not a supported or reliable method in SageMaker; traffic splitting is done via variant weights in the endpoint configuration. Option D is wrong because deploying the new model as a separate endpoint and using a SageMaker predictor to randomly route 10% of inference requests is not a built-in feature of SageMaker; it would require custom code and does not provide the automatic traffic shifting after 30 minutes, nor does it integrate with SageMaker's native deployment monitoring and rollback capabilities.
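
For orientation, the canary guardrail named in Option B is usually supplied as a deployment configuration on the endpoint update call. The sketch below builds such a structure as a plain dictionary so it runs without AWS access. The alarm name and termination wait are placeholder assumptions, and the exact limits on canary size and wait interval should be checked against the UpdateEndpoint API reference before use.

```python
def build_canary_deployment_config(canary_percent, wait_seconds, alarm_names):
    """Build a DeploymentConfig-style dict for a canary blue/green update."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                # Share of capacity that receives traffic first.
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # Bake time before the remaining traffic is shifted.
                "WaitIntervalInSeconds": wait_seconds,
            },
            "TerminationWaitInSeconds": 300,  # placeholder value
        },
        "AutoRollbackConfiguration": {
            # CloudWatch alarms that trigger a rollback if they fire.
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        },
    }

# 10% canary, 30-minute wait, one hypothetical alarm.
deployment_config = build_canary_deployment_config(10, 30 * 60, ["fraud-endpoint-5xx"])
print(deployment_config["BlueGreenUpdatePolicy"]["TrafficRoutingConfiguration"])

# With boto3 and credentials configured, this could be passed as:
#   boto3.client("sagemaker").update_endpoint(
#       EndpointName=..., EndpointConfigName=..., DeploymentConfig=deployment_config)
```

The rollback alarms are what turn "if no issues are detected" into something the service can evaluate automatically during the wait interval.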

802

MCQmedium

An ML pipeline uses SageMaker Processing to run a feature engineering script. The script takes a long time and the team wants to speed up pipeline execution. What is the MOST effective approach?

A.Increase the instance count for the Processing step

B.Enable pipeline caching for the Processing step

C.Use a larger instance type with more vCPUs

D.Use a Tuning step instead

AnswerA

More instances allow distributed processing, reducing wall clock time.

Why this answer

Increasing the instance count for the SageMaker Processing step enables distributed execution of the feature engineering script across multiple nodes. Setting instance_count > 1 launches several instances, which can dramatically reduce wall-clock time for embarrassingly parallel workloads like feature engineering. The speedup is not automatic, though: the input must be sharded across instances (for example with the ShardedByS3Key S3 data distribution type) or the script must use a distributed framework such as PySpark; otherwise every instance processes the full dataset. Under that condition this is the most effective approach, because it parallelizes the computation directly.

Exam trap

AWS often tests the distinction between vertical scaling (larger instance) and horizontal scaling (more instances), where candidates mistakenly choose a larger instance type thinking it's always faster, but for distributed workloads like feature engineering, horizontal scaling is more effective and cost-efficient.

How to eliminate wrong answers

Option B is wrong because pipeline caching only avoids re-running the step if the inputs and parameters haven't changed; it does not speed up the execution of the step itself when it must run. Option C is wrong because using a larger instance type with more vCPUs provides vertical scaling, which has diminishing returns due to CPU/memory bottlenecks and does not scale as effectively as horizontal scaling (multiple instances) for large-scale feature engineering. Option D is wrong because a Tuning step is designed for hyperparameter optimization, not for running feature engineering scripts; it would not execute the script and would add unnecessary complexity and cost.

803

Multi-Selecthard

A data scientist is working with a dataset containing customer demographics and purchase history. The dataset includes categorical variables with high cardinality (e.g., ZIP code, product ID). The data scientist wants to perform feature engineering to improve model performance. Which THREE feature engineering techniques should the data scientist consider? (Choose three.)

Select 3 answers

A.Principal Component Analysis (PCA) to reduce dimensionality of numerical features.

B.Domain-specific feature engineering based on business rules.

C.Target encoding for high-cardinality categorical variables.

D.Frequency encoding to represent categories by their occurrence count.

E.One-hot encoding all categorical features.

AnswersA, C, D

PCA can reduce noise and multicollinearity.

Why this answer

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms correlated numerical features into a smaller set of uncorrelated principal components, capturing the maximum variance in the data. This is correct because the dataset includes numerical features (e.g., purchase amounts, age) where PCA can reduce noise and multicollinearity, improving model performance without losing critical information.

Exam trap

AWS often tests the distinction between techniques that are universally applicable (like PCA for numerical features) versus those that are specifically designed to handle high-cardinality categorical variables (like target encoding and frequency encoding), tempting candidates to choose one-hot encoding without considering its impracticality for high cardinality.
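
The two encodings aimed at high-cardinality columns can be sketched in a few lines of standard-library Python. The ZIP codes, purchase labels, and smoothing value below are invented for illustration. The target encoder shrinks each category's mean toward the global mean, which is one common smoothing choice among several.

```python
from collections import Counter, defaultdict

def frequency_encode(categories):
    """Replace each category with how often it occurs in the column."""
    counts = Counter(categories)
    return [counts[c] for c in categories]

def target_encode(categories, targets, smoothing=1.0):
    """Replace each category with a smoothed mean of the target for that category."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), Counter()
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    encoding = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return [encoding[c] for c in categories]

zip_codes = ["98101", "98101", "10001", "98101", "60601"]  # hypothetical data
purchased = [1, 0, 1, 1, 0]

freq = frequency_encode(zip_codes)
tgt = target_encode(zip_codes, purchased)
print(freq)
print([round(v, 2) for v in tgt])
```

Both produce a single numeric column regardless of how many distinct ZIP codes exist, whereas one-hot encoding would add one column per distinct value. In practice the target encoding should be computed on training folds only, since using the label of a row to encode that same row leaks the target.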

804

MCQeasy

A company uses SageMaker Pipelines to automate their ML workflow. They notice that the pipeline reruns all steps even when the input data has not changed. Which feature should they enable to avoid unnecessary recomputation?

A.Enable pipeline caching

B.Use a Lambda step to check input changes

C.Use a Conditional step to skip steps

D.Set the pipeline execution mode to 'Parallel'

AnswerA

Caching stores step outputs and reuses them when inputs are identical, preventing unnecessary reruns.

Why this answer

Pipeline caching in SageMaker Pipelines automatically reuses the output of a step if its inputs (including parameters, data, and code) have not changed since the last successful execution. This avoids recomputation by comparing a hash of the step's dependencies against previous runs, making it the correct feature to prevent unnecessary reruns when input data remains identical.

Exam trap

The trap here is that candidates confuse caching with conditional branching or parallel execution, assuming that skipping steps via conditions or running steps in parallel will avoid recomputation, when in fact only caching directly reuses prior outputs based on input immutability.

How to eliminate wrong answers

Option B is wrong because a Lambda step is used for custom processing or integration (e.g., invoking external APIs), not for detecting input changes or caching step outputs; it would add complexity without solving the core caching requirement. Option C is wrong because a Conditional step evaluates a condition to branch the pipeline (e.g., skip a step based on a metric), but it does not automatically detect unchanged inputs or cache results; it requires manual logic and still incurs overhead for the condition check. Option D is wrong because setting the pipeline execution mode to 'Parallel' controls whether steps run sequentially or concurrently, but it does not prevent recomputation of steps whose inputs have not changed; it only affects execution order, not caching.
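
The caching idea described above can be illustrated with a toy cache keyed on a hash of the step name and its arguments. This is a conceptual sketch only and does not reproduce how SageMaker Pipelines decides on a cache hit; the `StepCache` class, the `featurize` function, and the S3 URI are hypothetical. In the SageMaker Python SDK the feature is switched on per step through a `CacheConfig` object rather than implemented by hand.

```python
import hashlib
import json

class StepCache:
    """Reuses a step's result when the step name and arguments are unchanged."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # counts how many times a step actually ran

    def run(self, step_name, arguments, fn):
        key_material = json.dumps({"step": step_name, "args": arguments},
                                  sort_keys=True)
        key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.executions += 1
            self._results[key] = fn(**arguments)
        return self._results[key]

def featurize(input_uri, scale):
    # Stand-in for an expensive feature engineering step.
    return f"features({input_uri}, scale={scale})"

cache = StepCache()
same_args = {"input_uri": "s3://example-bucket/raw/", "scale": 2}
first = cache.run("featurize", same_args, featurize)
second = cache.run("featurize", same_args, featurize)           # cache hit
third = cache.run("featurize", dict(same_args, scale=3), featurize)  # changed input
print(cache.executions)
```

Only the first and third calls execute the step; the second returns the stored result because nothing in its inputs changed.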

805

MCQmedium

A company uses SageMaker endpoints for real-time inference. They want to automatically scale the number of instances based on the number of outstanding requests. Which auto-scaling policy type should they choose?

A.Scheduled scaling

B.Step scaling

C.Target tracking scaling

D.Simple scaling

AnswerC

Target tracking automatically adjusts capacity to keep the specified metric at the target value.

Why this answer

Target tracking scaling adjusts the instance count to maintain a target metric value (e.g., average invocation count per instance). Step scaling applies predefined scaling adjustments when alarm thresholds are breached but does not track a target value directly. Simple scaling is not recommended for production.

Scheduled scaling suits predictable traffic patterns, not dynamic, demand-driven scaling.

806

Multi-Selectmedium

A data science team is deploying a PyTorch model for real-time inference with sub-second latency requirements. They need to minimize cost while handling variable traffic. Which TWO approaches should they consider? (Choose TWO.)

Select 2 answers

A.Compile the model with SageMaker Neo

B.Attach Amazon Elastic Inference to a real-time endpoint

C.Use a batch transform job to process requests in batches

D.Use SageMaker serverless inference with a configured max concurrency

E.Use a multi-model endpoint (MME) to host the model

AnswersA, D

Neo optimizes the model for the target hardware, reducing inference latency and often allowing a smaller instance type.

Why this answer

Serverless inference auto-scales to zero when not in use and charges per request, minimizing cost for variable traffic. SageMaker Neo compiles the model for optimal hardware performance, achieving low latency. Multi-model endpoints (MME) are for hosting multiple models, not single-model optimization.

Elastic Inference adds GPU acceleration at lower cost than a full GPU instance, but with Neo compilation the team may not need it. Batch transform is for offline, not real-time.
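
For serverless inference, the production variant carries a `ServerlessConfig` instead of an instance type and count. The sketch below builds such a variant as a plain dictionary; the model name, memory size, and concurrency are assumed values, and `serverless_variant` is a hypothetical helper rather than part of any SDK.

```python
# Memory sizes accepted for serverless endpoints, in MB (1 GB steps).
VALID_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}

def serverless_variant(model_name, memory_mb, max_concurrency):
    """Build a production variant entry that uses serverless inference."""
    if memory_mb not in VALID_MEMORY_MB:
        raise ValueError(f"unsupported memory size: {memory_mb}")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }

# Hypothetical model name for a Neo-compiled PyTorch model.
variant = serverless_variant("pytorch-neo-model", 2048, 10)
```

The dictionary would be passed in the `ProductionVariants` list of an endpoint configuration.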

807

MCQeasy

A machine learning team wants to monitor bias in a deployed model's predictions on an ongoing basis. Which AWS service should they use to schedule bias monitoring jobs and generate reports?

A.Amazon QuickSight with Athena queries

B.AWS CloudTrail for prediction logging

C.SageMaker Model Monitor with data quality monitoring

D.SageMaker Clarify with bias drift monitoring

AnswerD

Clarify offers bias monitoring after deployment, detecting shifts in fairness metrics.

Why this answer

SageMaker Clarify is the correct choice because it provides built-in bias drift monitoring capabilities that can be scheduled to run on a recurring basis. It evaluates predictions against pre-training and post-training bias metrics (e.g., DPL, DI, CDDL) and generates detailed reports, making it the only service designed specifically for ongoing bias monitoring in deployed models.

Exam trap

The trap here is that candidates confuse SageMaker Model Monitor (which handles data and model quality drift) with SageMaker Clarify (which handles bias and explainability drift), leading them to select Option C even though it does not support bias-specific monitoring.

How to eliminate wrong answers

Option A is wrong because Amazon QuickSight with Athena queries is a business intelligence and visualization service, not a bias monitoring tool; it lacks the ability to schedule bias detection jobs or compute bias metrics. Option B is wrong because AWS CloudTrail logs API calls for auditing and governance, not model predictions or bias metrics; it cannot schedule bias monitoring or generate bias reports. Option C is wrong because SageMaker Model Monitor with data quality monitoring focuses on detecting data drift (e.g., feature distribution changes) and model quality degradation (e.g., accuracy), not bias drift; it does not compute fairness metrics like disparate impact or equal opportunity.
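
To make the fairness metrics concrete, the sketch below computes disparate impact (DI) and the difference in positive proportions in predicted labels (DPPL) by hand for two small groups. The prediction lists are made-up illustrative data, and the helper names are hypothetical; Clarify computes these metrics itself in a monitoring job.

```python
def positive_rate(predictions):
    """Fraction of predictions that are the positive class (1)."""
    return sum(predictions) / len(predictions)

def disparate_impact(preds_advantaged, preds_disadvantaged):
    """Ratio of positive rates: disadvantaged group over advantaged group."""
    return positive_rate(preds_disadvantaged) / positive_rate(preds_advantaged)

def dppl(preds_advantaged, preds_disadvantaged):
    """Difference of positive rates: advantaged minus disadvantaged."""
    return positive_rate(preds_advantaged) - positive_rate(preds_disadvantaged)

# Made-up binary predictions for two groups.
group_a = [1, 1, 1, 0]  # advantaged group, positive rate 0.75
group_d = [1, 0, 0, 0]  # disadvantaged group, positive rate 0.25

di = disparate_impact(group_a, group_d)
gap = dppl(group_a, group_d)
```

A DI well below 1 or a DPPL well above 0, as here, indicates that the disadvantaged group receives fewer positive predictions; bias drift monitoring tracks how such values move relative to a baseline.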

808

MCQmedium

A team is fine-tuning a Hugging Face BERT model for text classification using SageMaker. They want to use the Hugging Face estimator for convenience. Which parameter must be set to use a custom training script?

A.framework_version

B.instance_type

C.hyperparameters

D.entry_point

AnswerD

entry_point points to the custom training script.

Why this answer

The entry_point parameter specifies the path to the training script. The instance_type is for hardware selection. The hyperparameters dictionary passes parameters to the script.

The framework_version specifies the Hugging Face version.

809

MCQmedium

A machine learning engineer is training a TensorFlow model using SageMaker with distributed training. They need to implement data parallelism across multiple GPUs. Which SageMaker feature should they use to distribute the training?

A.SageMaker Distributed Data Parallelism

B.SageMaker Automatic Model Tuning

C.SageMaker Debugger

D.SageMaker Model Parallelism

AnswerA

This library implements data parallelism for SageMaker training.

Why this answer

SageMaker's distributed data parallelism library handles splitting data across GPUs and synchronizing gradients, optimized for TensorFlow and PyTorch.
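
The core idea of data parallelism, splitting data across workers and averaging their gradients, can be shown with a toy example. This is a plain Python sketch for a one-parameter model, not the SageMaker library; the averaging step stands in for the all-reduce that a real library performs across GPUs.

```python
def local_gradient(w, shard):
    """Gradient of mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

# Toy dataset where the true relationship is y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[:2], data[2:]]  # two equally sized shards, one per worker

w = 0.0
grads = [local_gradient(w, shard) for shard in shards]
avg_grad = sum(grads) / len(grads)   # stand-in for an all-reduce average
full_grad = local_gradient(w, data)  # gradient on the whole batch
```

With equally sized shards, the averaged gradient matches the full-batch gradient, which is why each worker can apply the same update and stay in sync.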

810

MCQeasy

Refer to the exhibit. A data scientist reviews the output of a SageMaker training job. The model has 95% training accuracy and 92% validation accuracy. Which statement is true?

A.The model has acceptable performance with a small generalization gap

B.The model is underfitting because the validation accuracy is too low

C.The model needs more epochs to improve validation accuracy

D.The model is overfitting because the training accuracy is higher than validation accuracy

AnswerA

The 3% gap is typical and the accuracy values are high.

Why this answer

Option A is correct because a 3% gap between training and validation accuracy is typically considered a small generalization gap and indicates acceptable performance. Option D (overfitting) would show a much larger gap. Option B (underfitting) would show low accuracy on both sets.

Option C (more epochs) may not help if the model is already converging.

811

Multi-Selectmedium

A machine learning engineer is training a neural network using Amazon SageMaker. The training job uses a single GPU instance. To improve training speed using distributed training, which two steps should they take? (Select TWO.)

Select 2 answers

A.Split the dataset into smaller files

B.Use SageMaker's distributed data parallelism library

C.Modify the training script to use Horovod or PyTorch DistributedDataParallel

D.Enable automatic mixed precision

E.Increase the number of worker instances in the training job

AnswersC, E

These frameworks enable multi-GPU communication and are necessary for distributed training.

Why this answer

Option C is correct because distributed training requires a framework-level approach such as Horovod or PyTorch DistributedDataParallel (DDP) to coordinate gradient computation across multiple GPUs; modifying the training script to use one of them enables allreduce-based gradient synchronization. Option E is correct because a job that currently runs on a single GPU instance needs additional workers before there is anything to distribute the work across. SageMaker's distributed data parallelism library (Option B) is designed for multi-instance setups, not single-instance multi-GPU scenarios.

Exam trap

The trap here is that candidates often confuse SageMaker's distributed data parallelism library (which is for multi-instance training) with framework-level parallelism tools like Horovod or DDP, or mistakenly believe that data splitting or mixed precision alone constitutes distributed training.

812

MCQeasy

An e-commerce company uses a SageMaker endpoint to serve a product recommendation model. The model is retrained every month using batch transforms. The ML team has set up a retraining pipeline using SageMaker Processing jobs and Step Functions. Recently, the Step Functions workflow has been failing at the retraining step with an error: 'AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/RetrainingRole/abc123 is not authorized to perform: s3:GetObject on resource: arn:aws:s3:::training-data/processed/latest.parquet'. The team confirms that the S3 bucket exists and the object is present. The retraining role has the following policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject" ], "Resource": "arn:aws:s3:::training-data/*" } ] }. The team also verifies that the bucket policy does not explicitly deny access. What is the MOST likely cause of the AccessDenied error?

A.The Step Functions execution role does not have permission to invoke the SageMaker Processing job

B.The path in the error message is misspelled; the actual object is at a different key

C.The S3 bucket has a bucket policy that denies access to the retraining role based on a condition like aws:SourceIp

D.The training data object uses server-side encryption with AWS KMS (SSE-KMS), and the retraining role lacks kms:Decrypt permission on the KMS key

AnswerD

If the object is encrypted with SSE-KMS, the role needs both s3:GetObject and kms:Decrypt. The current IAM policy does not include KMS permissions.

Why this answer

The error indicates that the retraining role is not authorized to GetObject on the specific object. Even though the policy allows 'arn:aws:s3:::training-data/*', if the object is encrypted with SSE-KMS, the role also needs kms:Decrypt permission on the KMS key. The bucket policy might also require encryption.

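Option D is the most likely cause. Option A (Step Functions execution role) does not fit because the denied principal in the error is RetrainingRole and the denied action is s3:GetObject, not a SageMaker call. Option C (a conditional deny in the bucket policy) is ruled out because the team verified that the bucket policy does not explicitly deny access.

Option B (wrong object key) does not fit because the team confirmed that the object is present.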
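
A corrected policy for the retraining role could look like the sketch below, which adds a KMS statement next to the existing S3 statement. The key ARN is a hypothetical placeholder (the question does not name the key), and `kms:GenerateDataKey` is included on the assumption that the role also writes SSE-KMS encrypted objects with `s3:PutObject`.

```python
import json

def retraining_policy(bucket, kms_key_arn):
    """Build an IAM policy document that allows S3 access plus KMS decryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Needed to read (and write) objects encrypted with SSE-KMS.
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }

# Hypothetical key ARN; substitute the key that encrypts the bucket's objects.
policy = retraining_policy(
    "training-data",
    "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID",
)
policy_json = json.dumps(policy, indent=2)
```

The key policy on the KMS key must also permit the role to use the key.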

813

MCQmedium

A data engineer needs to join two large datasets from Amazon S3: one containing customer demographics and another containing transaction history. The join key is `customer_id`. To minimize data shuffling and improve performance, the engineer decides to use Amazon SageMaker Processing with Spark. Which configuration should the engineer use?

A.Use a bucketed join with the same number of buckets

B.Broadcast join the larger dataset

C.Use a bucketed join with the same number of buckets and co-location

D.Use a repartition on the join key before join

AnswerC

Bucketing with co-location allows Spark to perform the join without shuffling.

Why this answer

Option C is correct because bucketed joins with the same number of buckets and co-location ensure that data with the same `customer_id` hash is physically stored together on the same nodes. This eliminates the need for expensive shuffles during the join, as Spark can perform the join locally within each executor, dramatically improving performance for large datasets in SageMaker Processing.

Exam trap

The trap here is that candidates assume bucketing alone (same number of buckets) is sufficient, but without co-location, Spark still performs a shuffle to align the data, so both conditions are required for a shuffle-free join.

How to eliminate wrong answers

Option A is wrong because bucketed joins require both datasets to have the same number of buckets AND co-location (data physically stored together); without co-location, Spark still shuffles data to align partitions, negating the performance benefit. Option B is wrong because broadcast join is only efficient when one dataset is small enough to fit in memory (typically <100 MB); the transaction history dataset is large, so broadcasting it would cause out-of-memory errors or severe performance degradation. Option D is wrong because repartitioning on the join key before the join adds an extra shuffle step, increasing overhead rather than reducing it; bucketing with co-location avoids shuffles entirely.
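
Why matching buckets avoid a shuffle can be seen in a toy model. The sketch below is plain Python rather than Spark code: both datasets are bucketed by the same function of `customer_id` (a simple modulo here, standing in for a real hash), so bucket i of one dataset only ever needs to meet bucket i of the other. The data and helper names are invented for illustration.

```python
def bucketize(rows, num_buckets):
    """Place each row in a bucket chosen by its customer_id."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row["customer_id"] % num_buckets].append(row)
    return buckets

# Made-up demographics and transaction rows.
customers = [{"customer_id": i, "age": 20 + i} for i in range(6)]
transactions = [{"customer_id": i % 6, "amount": 10 * i} for i in range(12)]

NUM_BUCKETS = 3  # both datasets must use the same bucket count
cust_buckets = bucketize(customers, NUM_BUCKETS)
txn_buckets = bucketize(transactions, NUM_BUCKETS)

joined = []
# Join bucket i with bucket i only; no row ever crosses buckets.
for cust_bucket, txn_bucket in zip(cust_buckets, txn_buckets):
    by_id = {c["customer_id"]: c for c in cust_bucket}
    for txn in txn_bucket:
        joined.append({**by_id[txn["customer_id"]], **txn})
```

If the two datasets used different bucket counts, matching keys could land in different bucket indexes and the pairwise join above would miss them, which is the situation that forces a shuffle.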

814

Multi-Selectmedium

A company uses SageMaker to deploy a model and wants to perform A/B testing by splitting traffic between two model variants. Which TWO actions should they take? (Select TWO.)

Select 2 answers

A.Configure two production variants on the endpoint, each with an initial weight

B.Use SageMaker Model Registry to approve both variants

C.Use the UpdateEndpointWeightsAndCapacities API to adjust traffic after analysis

D.Deploy each variant to a separate endpoint and use Route53 weighted routing

E.Enable shadow testing on the endpoint

AnswersA, C

This defines the variants and their traffic split.

Why this answer

Option A is correct because SageMaker endpoints support multiple production variants, each with an assigned weight that determines the proportion of traffic routed to that variant. By setting initial weights (e.g., 50/50 or 90/10), you can split traffic between two model variants for A/B testing without deploying separate endpoints.

Exam trap

The trap here is that candidates confuse shadow testing (Option E) with A/B testing, but shadow testing does not split live traffic—it only mirrors requests for offline analysis, while A/B testing requires actual traffic distribution between variants.
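
The relationship between variant weights and traffic share can be sketched as follows. The dictionaries mirror the shape of `ProductionVariants` in an endpoint configuration and of `DesiredWeightsAndCapacities` in a weight update; the variant names, model names, and instance type are hypothetical, and no AWS call is made.

```python
def traffic_shares(variants):
    """Each variant receives weight / sum(weights) of the traffic."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

# Two production variants on one endpoint with a 90/10 initial split.
production_variants = [
    {
        "VariantName": "model-a",
        "ModelName": "recsys-a",  # hypothetical model name
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 9.0,
    },
    {
        "VariantName": "model-b",
        "ModelName": "recsys-b",  # hypothetical model name
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": 1.0,
    },
]
shares = traffic_shares(production_variants)

# After analysis, shift to an even split without redeploying the endpoint.
desired_weights = [
    {"VariantName": "model-a", "DesiredWeight": 1.0},
    {"VariantName": "model-b", "DesiredWeight": 1.0},
]
```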

815

Multi-Selecteasy

A data scientist is using SageMaker Autopilot to automatically build a model. Which TWO aspects does Autopilot handle? (Choose TWO.)

Select 2 answers

A.Data ingestion

B.Model deployment

C.Feature engineering

D.Data labeling

E.Hyperparameter tuning

AnswersC, E

Correct: Autopilot automatically explores different feature transformations.

Why this answer

Option C is correct because SageMaker Autopilot automatically performs feature engineering, which includes data preprocessing, feature transformation, and selection of the most relevant features to improve model performance. This is a core capability of Autopilot, as it analyzes the dataset and applies techniques like one-hot encoding, scaling, and imputation without manual intervention.

Exam trap

The trap here is that candidates often confuse SageMaker Autopilot's automated capabilities with full MLOps automation, mistakenly thinking it handles data ingestion or deployment, when in fact it focuses solely on model building tasks like feature engineering and hyperparameter tuning.

816

MCQhard

A data scientist is preparing data for a regression model. The target variable has a skewed distribution. The scientist wants to apply a log transformation to make it closer to normal. Which step should be taken before applying log transformation?

A.Standardize the data to zero mean and unit variance

B.Remove outliers using IQR

C.Ensure all values are positive

D.Center the data by subtracting the mean

AnswerC

Log is undefined for zero and negative values. If present, add a constant or use other transformations.

Why this answer

The log transformation is defined only for positive real numbers; applying it to zero or negative values results in undefined or complex outputs. Therefore, before applying a log transformation, you must ensure all values in the target variable are positive, typically by adding a constant (e.g., log(x + 1)) if zeros are present. This step is a fundamental data preparation requirement for log transformations in regression modeling.

Exam trap

AWS often tests the assumption that candidates will confuse data normalization or centering with the domain restriction of the log function, leading them to pick standardization or mean-centering as a preparatory step.

How to eliminate wrong answers

Option A is wrong because standardizing to zero mean and unit variance (z-score normalization) does not guarantee all values become positive; it centers data around zero, which can produce negative values, making log transformation invalid. Option B is wrong because removing outliers using IQR is not a prerequisite for log transformation; while outliers can affect model performance, the log transformation itself can help mitigate skewness and reduce the influence of outliers, and removing them beforehand is an optional, separate step. Option D is wrong because centering data by subtracting the mean shifts values to have a mean of zero, which inevitably introduces negative values, directly contradicting the requirement for positive inputs for log transformation.
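
The positivity check can be made explicit in code. The sketch below uses `math.log1p`, which computes log(1 + x) and therefore tolerates zeros; the sales values and the `safe_log_transform` helper are illustrative assumptions, and the helper deliberately rejects any negative value rather than trying to shift the data.

```python
import math

def safe_log_transform(values):
    """Apply log(1 + x) after confirming no value is negative."""
    if min(values) < 0:
        raise ValueError("negative values found; shift the data or use another transform")
    return [math.log1p(v) for v in values]

# Made-up skewed target values, including a zero.
sales = [0, 9, 99, 999]
transformed = safe_log_transform(sales)
```

The transformed values grow by roughly the same amount for each tenfold increase in the input, which is what pulls in the long right tail.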

817

MCQmedium

A team is fine-tuning a large language model (LLM) using SageMaker and wants to reduce memory footprint during training. Which technique should they use?

A.Use LoRA (Low-Rank Adaptation) with fp32 precision

B.Use QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization

C.Use SageMaker Model Parallelism with tensor parallelism

D.Full fine-tuning on a p3.16xlarge instance

AnswerB

QLoRA uses 4-bit quantization to drastically lower memory usage while preserving performance.

Why this answer

QLoRA (Quantized Low-Rank Adaptation) combines 4-bit quantization with low-rank adapters, significantly reducing GPU memory usage while maintaining model quality.
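
A back-of-the-envelope calculation shows where the saving comes from. The sketch below estimates memory for the base model weights only, assuming a hypothetical 7-billion-parameter model; it ignores activations, optimizer state, the adapter weights, and quantization overhead, so real usage is higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 7_000_000_000  # hypothetical 7B-parameter model

fp32_gb = weight_memory_gb(params, 32)  # full precision
fp16_gb = weight_memory_gb(params, 16)  # half precision
int4_gb = weight_memory_gb(params, 4)   # 4-bit quantized base model
```

Under these assumptions the 4-bit weights take one eighth of the fp32 footprint, and because only the small low-rank adapters are trained, the gradient and optimizer memory shrinks as well.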

818

Multi-Selectmedium

Which TWO options are recommended best practices for monitoring model performance in production on SageMaker? (Choose 2.)

Select 2 answers

A.Retrain the model daily based on recent data without evaluation.

B.Use SageMaker Clarify for bias monitoring and feature importance drift.

C.Enable SageMaker Model Monitor to capture data drift and model quality metrics.

D.Set up a CloudWatch alarm on the endpoint's Invocations metric.

E.Manually compare prediction distributions weekly.

AnswersB, C

Clarify can monitor bias and explainability over time.

Why this answer

SageMaker Clarify is a recommended best practice for monitoring model performance because it provides automated bias detection and feature importance drift analysis, helping to identify when model predictions become unfair or when the relationships between features and predictions change over time. This is critical for maintaining model integrity and compliance in production.

Exam trap

The trap here is that candidates often confuse operational metrics (like invocation count or latency) with model performance monitoring, leading them to select CloudWatch alarms on Invocations instead of the specialized drift and bias detection tools.

819

MCQeasy

A data scientist is using SageMaker Automatic Model Tuning to find the best hyperparameters for a model. They want to reduce the total tuning time for a given number of training jobs. Which tuning strategy should they choose?

A.Hyperband

B.Grid search

C.Random search

D.Bayesian optimization

AnswerA

Hyperband uses early stopping to prune bad trials, reducing total tuning time for the same number of jobs.

Why this answer

Hyperband is an early stopping strategy that allocates resources to promising configurations and stops poor performers early, reducing total tuning time compared to random search or Bayesian optimization without early stopping.
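
The saving from early stopping can be estimated with successive halving, the building block that Hyperband runs in several brackets. The sketch below is a simplified budget calculation, not the SageMaker implementation: it assumes each round retrains survivors from scratch, keeps the top 1/eta of configurations, and multiplies the epoch budget by eta.

```python
def successive_halving_budget(num_configs, min_epochs, eta):
    """Return (total epochs spent, epochs given to the final survivor)."""
    total = 0
    epochs = min_epochs
    remaining = num_configs
    while True:
        total += remaining * epochs  # train every surviving config
        if remaining == 1:
            break
        remaining = max(1, remaining // eta)  # keep the best 1/eta
        epochs *= eta                         # give survivors more budget
    return total, epochs

# 27 configurations, starting at 1 epoch, keeping one third each round.
total_epochs, final_epochs = successive_halving_budget(27, 1, 3)

# Baseline: train all 27 configurations for the full final budget.
baseline_epochs = 27 * final_epochs
```

In this example the pruned schedule spends 108 epochs in total, against 729 epochs for training every configuration to completion.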

820

MCQeasy

A data science team has trained a model using SageMaker and wants to deploy it to a production endpoint with automatic scaling based on request volume. Which SageMaker feature should they use to configure scaling?

A.SageMaker Endpoint Autoscaling

B.SageMaker Debugger

C.SageMaker Model Registry

D.SageMaker Pipelines

AnswerA

Endpoint Autoscaling automatically adjusts the number of instances based on demand.

Why this answer

SageMaker Endpoint Autoscaling is the correct feature because it automatically adjusts the number of instances behind a SageMaker hosted endpoint based on a target metric (e.g., requests per minute, CPU utilization) using Application Auto Scaling. This allows the endpoint to handle varying request volumes without manual intervention, ensuring cost efficiency and performance.

Exam trap

The trap here is that candidates may confuse SageMaker Debugger (a training debugger) or SageMaker Pipelines (a workflow tool) with scaling features, when only Endpoint Autoscaling directly manages production instance count based on request volume.

How to eliminate wrong answers

Option B (SageMaker Debugger) is wrong because it is a monitoring and debugging tool for training jobs, not for scaling production endpoints. Option C (SageMaker Model Registry) is wrong because it is a catalog for versioning and managing trained models, not a scaling mechanism. Option D (SageMaker Pipelines) is wrong because it is a workflow orchestration service for building and automating ML pipelines, not for configuring endpoint scaling.

821

MCQmedium

A team uses MLflow on SageMaker for experiment tracking. They want to automate the retraining of a model when new training data arrives in an S3 bucket. Which combination of services should they use?

A.SageMaker Pipelines scheduled trigger every hour

B.EventBridge -> Lambda -> SageMaker Training Job

C.S3 Event Notifications -> SQS -> SageMaker Training Job

D.AWS Step Functions with S3 poller

AnswerB

EventBridge captures S3 events, Lambda initiates training, and MLflow can log the run.

Why this answer

EventBridge can detect S3 PutObject events and trigger a Lambda function that starts a SageMaker training job, possibly using MLflow for tracking.
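
The Lambda side of this chain could be sketched as below. The code only parses the event and delegates to an injected `start_training` callable, which is a hypothetical stand-in for a function that would call SageMaker's `create_training_job`; the bucket and key in the sample event are made up, and the bucket is assumed to have EventBridge notifications enabled.

```python
def training_input_from_event(event):
    """Extract the S3 URI of the new object from an 'Object Created' event."""
    detail = event["detail"]
    return f"s3://{detail['bucket']['name']}/{detail['object']['key']}"

def handler(event, context, start_training=None):
    """Lambda entry point; start_training is injected so the sketch runs offline."""
    s3_uri = training_input_from_event(event)
    if start_training is not None:
        start_training(s3_uri)  # would start the SageMaker training job
    return {"training_input": s3_uri}

# Made-up sample of the event shape EventBridge delivers for S3.
sample_event = {
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "training-data"},
        "object": {"key": "incoming/2024-01.parquet"},
    },
}

started = []
result = handler(sample_event, None, start_training=started.append)
```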

822

MCQmedium

A company uses SageMaker Inference Recommender to select the optimal endpoint configuration. After running the recommender, they receive a recommendation for a specific instance type and initial instance count. What should they do next to optimize costs over time?

A.Use the recommended configuration without changes, as it is already optimal

B.Purchase a Savings Plan for the recommended instance type to reduce hourly cost

C.Set up auto-scaling with a target tracking policy based on the recommended metric

D.Manually adjust the instance count daily based on observed traffic

AnswerC

Auto-scaling adjusts capacity to demand, minimizing cost while meeting performance.

Why this answer

SageMaker Inference Recommender provides a baseline configuration. To optimize costs, they should apply auto-scaling with a target tracking policy based on the recommended metric, such as invocation count or latency.

823

MCQhard

A data scientist creates a feature group as shown in the exhibit. When ingesting data with an 'age' column of integer values, the ingestion fails. What is the most likely cause?

A.The role does not have permissions to write to the feature store.

B.The `age` feature type should be `Integral`, not `String`.

C.The `OnlineStoreConfig` must include a `SecurityConfig`.

D.The `EventTimeFeatureName` is incorrectly spelled.

AnswerB

The feature type must match the ingested data type.

Why this answer

Option B is correct because the feature group definition specifies the 'age' column as a `String` type, but the ingested data contains integer values. Amazon SageMaker Feature Store requires that the data types of ingested records match the schema defined in the feature group. When a mismatch occurs, such as providing an integer for a string field, the ingestion fails with a type conversion error.

Exam trap

AWS often tests the distinction between schema definition and actual data types, trapping candidates who overlook that the feature group schema must exactly match the ingested data's types, not just the column names.

How to eliminate wrong answers

Option A is wrong because nothing in the scenario points to a permissions problem; a permissions error would typically surface as an access-denied response at the API call level, not as a failure tied to the values in a specific column. Option C is wrong because `SecurityConfig` is an optional field in `OnlineStoreConfig` (it specifies a KMS key for encrypting the online store); only `EnableOnlineStore` is needed to turn the online store on. Option D is wrong because the `EventTimeFeatureName` is spelled correctly as 'EventTime' in the exhibit, and a misspelling would cause a different error rather than a data type mismatch.
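
A pre-ingestion check for this kind of mismatch could look like the sketch below. The exhibit is not reproduced here, so the feature definitions and record are assumed examples; the helpers are hypothetical, while `Integral`, `Fractional`, and `String` are the Feature Store feature types.

```python
def infer_feature_type(value):
    """Map a Python value to the Feature Store type it would need."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return "Integral"
    if isinstance(value, float):
        return "Fractional"
    if isinstance(value, str):
        return "String"
    raise TypeError(f"unsupported value: {value!r}")

def mismatches(feature_definitions, record):
    """List feature names whose declared type differs from the record's data."""
    declared = {d["FeatureName"]: d["FeatureType"] for d in feature_definitions}
    return [
        name for name, value in record.items()
        if declared.get(name) != infer_feature_type(value)
    ]

# Assumed schema in which 'age' was declared as String by mistake.
definitions = [
    {"FeatureName": "customer_id", "FeatureType": "String"},
    {"FeatureName": "age", "FeatureType": "String"},
]
record = {"customer_id": "c-1", "age": 42}

problems = mismatches(definitions, record)

# Declaring 'age' as Integral resolves the mismatch.
definitions[1]["FeatureType"] = "Integral"
problems_after_fix = mismatches(definitions, record)
```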

824

MCQhard

A machine learning engineer is building a time-series forecasting model to predict daily sales for the next 30 days. The dataset spans two years of daily sales data. To evaluate model performance, the engineer needs to simulate a realistic forecasting scenario where the model is trained on past data and tested on future data without leakage. Which data splitting strategy should they use?

A.Hold-out validation using a random 80/20 split

B.Walk-forward validation with an expanding window

C.Stratified sampling based on sales volume

D.k-fold cross-validation with random shuffling

AnswerB

Walk-forward validation trains on all past data and evaluates on the next unobserved time step, mimicking real-world forecasting.

Why this answer

Walk-forward validation (also known as time-series cross-validation) is specifically designed for time-dependent data. It trains on an expanding window of historical data and tests on the next period, respecting temporal order.
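
An expanding-window split can be generated with a few lines of index arithmetic. The sketch below assumes 730 daily observations, an initial training window of 640 days, and a 30-day test horizon; these numbers and the `walk_forward_splits` helper are illustrative choices, not part of the question.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) with an expanding training window."""
    splits = []
    train_end = initial_train
    while train_end + horizon <= n_samples:
        train = range(0, train_end)                   # all data up to the cutoff
        test = range(train_end, train_end + horizon)  # the next unseen period
        splits.append((train, test))
        train_end += horizon                          # expand the window
    return splits

splits = walk_forward_splits(n_samples=730, initial_train=640, horizon=30)
```

Every test window starts where its training window ends, so no future observation ever leaks into training.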

825

MCQeasy

A company uses SageMaker Studio for collaborative ML development. The security team requires that all SageMaker Studio notebooks run within a VPC and cannot access the public internet. Which configuration should the administrator set?

A.Enable VPC-only mode for the SageMaker Studio domain

B.Use SageMaker notebook instances instead of Studio

C.Apply an SCP that denies internet access for all IAM users

D.Set the SageMaker Studio domain to use a public subnet with a NAT Gateway

AnswerA

This restricts all traffic to the VPC, no internet access.

Why this answer

Option A is correct because enabling VPC-only mode for the SageMaker Studio domain ensures that all Studio notebooks and apps are launched within the specified VPC and cannot access the public internet. This mode enforces that all network traffic, including internet-bound traffic, is routed through the VPC, and it blocks direct internet access by default, meeting the security team's requirement.

Exam trap

The trap here is that candidates may confuse VPC-only mode with simply using a private subnet. VPC-only mode is a specific SageMaker Studio domain setting that disables the direct internet access SageMaker otherwise provides and routes Studio traffic through the domain's VPC. The subnets must also have no route to a NAT Gateway, because a private subnet with a NAT Gateway still allows outbound internet traffic.

How to eliminate wrong answers

Option B is wrong because using SageMaker notebook instances instead of Studio does not inherently enforce VPC-only internet restrictions; notebook instances can still be configured with public internet access unless explicitly blocked via VPC settings. Option C is wrong because an SCP that denies internet access for all IAM users is an organization-level policy that does not directly control the network configuration of SageMaker Studio notebooks; it would affect user permissions but not the VPC routing or internet access of the Studio environment. Option D is wrong because setting the SageMaker Studio domain to use a public subnet with a NAT Gateway would actually provide outbound internet access to the notebooks, which violates the requirement that notebooks cannot access the public internet.
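
The domain setting in Option A corresponds to the `AppNetworkAccessType` parameter when the domain is created. The sketch below only assembles the request as a dictionary; the domain name, VPC ID, subnet ID, and role ARN are hypothetical placeholders, and the subnets are assumed to have no route to the internet.

```python
def vpc_only_domain_request(domain_name, vpc_id, subnet_ids, execution_role_arn):
    """Build keyword arguments for creating a Studio domain in VPC-only mode."""
    return {
        "DomainName": domain_name,
        "AuthMode": "IAM",
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # The alternative value, "PublicInternetOnly", allows direct internet access.
        "AppNetworkAccessType": "VpcOnly",
    }

# Hypothetical identifiers for illustration only.
request = vpc_only_domain_request(
    "ml-team-domain",
    "vpc-0123456789abcdef0",
    ["subnet-0123456789abcdef0"],
    "arn:aws:iam::123456789012:role/StudioExecutionRole",
)
```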
