AWS Certified Machine Learning Engineer - Associate (MLA-C01): Questions 451-507

507 questions total · 7 pages · All types, answers revealed

Page 7 of 7

451
Multi-Select · hard

A data engineer is optimizing Amazon Athena queries on large datasets stored in S3 for machine learning data preparation. Which THREE practices improve query performance?

Select 3 answers
A. Partition the data by a frequently filtered column, such as date
B. Use uncompressed CSV files for simplicity
C. Partition the data by every column to maximize filtering
D. Store data in columnar formats like Parquet or ORC
E. Compress the data with Snappy or gzip
Answers: A, D, E

Partition pruning limits scanned data.

Why this answer

Partitioning by a frequently filtered column, such as date, allows Athena to use partition pruning. When a query includes a filter on the partition column, Athena can skip entire directories of data in S3, drastically reducing the amount of data scanned and improving query performance while also lowering cost.

Exam trap

AWS often tests the misconception that more partitions always improve performance, but in reality, over-partitioning leads to metastore overhead and small file problems that degrade query performance.
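The pruning idea can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical Hive-style layout (`events/dt=YYYY-MM-DD/part-N.parquet`); the prefix and file names are made up for the example, and the code only models pruning as key filtering rather than calling Athena.

```python
# Hypothetical object keys in a Hive-style, date-partitioned layout.
KEYS = [
    "events/dt=2024-01-01/part-0.parquet",
    "events/dt=2024-01-01/part-1.parquet",
    "events/dt=2024-01-02/part-0.parquet",
    "events/dt=2024-01-03/part-0.parquet",
]


def partition_value(key, column="dt"):
    """Return the value of a `column=value` path segment, or None if absent."""
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == column:
            return value
    return None


def prune(keys, column, wanted):
    """Keep only the objects whose partition value is in `wanted`."""
    return [k for k in keys if partition_value(k, column) in wanted]


# A filter on the partition column means only matching prefixes are read.
scanned = prune(KEYS, "dt", {"2024-01-02"})
print(f"{len(scanned)} of {len(KEYS)} objects scanned: {scanned}")
```

With a filter on `dt`, only one of the four objects needs to be read; a filter on a non-partition column would not narrow the list this way.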

452
MCQ · easy

Refer to the exhibit. A user has the above IAM policy attached but cannot access files in SageMaker Studio. What additional permission is most likely needed?

A. sagemaker:ListApps
B. s3:GetObject on the relevant S3 buckets
C. sagemaker:DescribeUserProfile
D. kms:Decrypt
Answer: B

To read files in Studio, the user must have S3 access permissions.

Why this answer

Option B is correct because SageMaker Studio users need S3 read/write permissions to access data files stored in S3 buckets. The policy only allows creating a presigned URL for Studio; it does not grant S3 access. Option A lists apps and Option C describes user profiles, so neither grants access to files. Option D (kms:Decrypt) is needed only when the objects are encrypted with a KMS key, so it is not the most likely cause.

453
MCQ · easy

A machine learning engineer needs to deploy a TensorFlow model to Amazon SageMaker and wants to use the built-in TensorFlow Serving container. What should the engineer provide in the model archive?

A. A frozen graph of the TensorFlow model.
B. A tar.gz file containing the TensorFlow SavedModel.
C. Model artifacts and a Python inference script.
D. A Dockerfile and model artifacts.
Answer: B

SageMaker's TensorFlow serving container expects a SavedModel packaged as tar.gz.

Why this answer

The built-in TensorFlow Serving container in Amazon SageMaker expects a TensorFlow SavedModel packaged in a tar.gz archive. This is because TensorFlow Serving natively loads models from the SavedModel format, which includes the model's computational graph, weights, and assets in a standardized directory structure. Providing a tar.gz of the SavedModel ensures compatibility with the container's default serving stack without requiring custom inference code.

Exam trap

AWS often tests the misconception that a frozen graph (Option A) is sufficient for TensorFlow Serving, but the exam expects candidates to know that TensorFlow Serving specifically requires the SavedModel format with its directory structure, not just a single protobuf file.

How to eliminate wrong answers

Option A is wrong because a frozen graph (typically a .pb file) is a legacy TensorFlow format that lacks the full SavedModel structure (e.g., variables and assets), and TensorFlow Serving does not natively support frozen graphs as a direct input; it requires the SavedModel format. Option C is wrong because a Python inference script is unnecessary when using the built-in TensorFlow Serving container, which handles inference automatically via the SavedModel; custom scripts are only needed for bring-your-own-container scenarios. Option D is wrong because a Dockerfile is not part of the model archive; SageMaker's built-in containers are pre-built, and providing a Dockerfile would indicate a custom container approach, which contradicts the requirement to use the built-in TensorFlow Serving container.
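Packaging a SavedModel as a tar.gz can be sketched with the standard library. This is a minimal sketch, assuming an archive layout of a numeric version directory holding `saved_model.pb` and `variables/`; that layout and the placeholder files are assumptions for illustration, so check the container documentation for the exact structure it expects.

```python
import os
import tarfile
import tempfile


def package_saved_model(export_dir, archive_path):
    """Archive a SavedModel export directory as a gzip-compressed tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(export_dir)):
            # Store each version directory at the archive root.
            tar.add(os.path.join(export_dir, name), arcname=name)
    return archive_path


# Build a placeholder export tree: 1/saved_model.pb and 1/variables/.
workdir = tempfile.mkdtemp()
export_dir = os.path.join(workdir, "export")
os.makedirs(os.path.join(export_dir, "1", "variables"))
with open(os.path.join(export_dir, "1", "saved_model.pb"), "wb") as f:
    f.write(b"")  # placeholder bytes, not a real model

archive = package_saved_model(export_dir, os.path.join(workdir, "model.tar.gz"))
with tarfile.open(archive, "r:gz") as tar:
    members = sorted(tar.getnames())
print(members)
```

No inference script or Dockerfile goes into the archive in this sketch, which matches the reasoning for eliminating Options C and D.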

454
MCQ · hard

A financial services company is building a fraud detection model using transactional data stored in Amazon S3. The data includes transaction_id, timestamp, amount, merchant_category, and fraud_label (0/1). The data is collected from multiple sources and has inconsistencies: timestamps are in different timezones (UTC and EST), merchant categories are sometimes misspelled (e.g., 'RESTAURANT', 'Restaurant', 'restaurant'), and the fraud_label is missing for about 5% of records. The data science team uses AWS Glue for ETL. They need to prepare a clean dataset for training. The final dataset must have consistent timestamps in UTC, standardized merchant categories, and no missing fraud labels. The team also wants to minimize data loss. Which set of actions should the team take?

A. Use AWS Glue to convert all timestamps to UTC, apply a mapping function to correct merchant category misspellings to a standard list, and drop records with missing fraud_label.
B. Use AWS Glue to convert timestamps to UTC, use a fuzzy matching algorithm to standardize merchant categories, and replace missing fraud_label with the mean value (0.05).
C. Use AWS Glue to convert timestamps to UTC, correct merchant categories by mapping known misspellings to correct names, and drop records with missing fraud_label.
D. Use AWS Glue to convert timestamps to UTC, use a mapping table to group similar merchant categories (e.g., all restaurant variants to 'Restaurant'), and impute missing fraud_label using mode (most frequent value).
Answer: D

Mode imputation preserves the majority class and avoids data loss, while timestamp conversion and category mapping clean the data correctly.

Why this answer

Option D is correct because it preserves data by imputing missing fraud labels using the mode (most frequent value), which is appropriate for a binary classification label where the majority class is likely 0. It also standardizes timestamps to UTC and uses a mapping table to group merchant category variants, ensuring consistency without data loss. Dropping records (as in A and C) would reduce the dataset size, and imputing with the mean (as in B) is invalid for a categorical label.

Exam trap

The trap here is that candidates often choose to drop missing values (options A and C) to avoid imputation complexity, not realizing that minimizing data loss is explicitly stated as a requirement, and that mode imputation is a standard technique for categorical labels in ML pipelines.

How to eliminate wrong answers

Option A is wrong because dropping records with missing fraud_label causes unnecessary data loss (5% of records) when imputation is feasible, and the mapping function for merchant categories is vague and not standardized. Option B is wrong because replacing missing fraud_label with the mean (0.05) is inappropriate for a binary categorical variable; mean imputation can introduce fractional values that are meaningless for classification. Option C is wrong because dropping records with missing fraud_label again causes data loss, and correcting merchant categories by mapping known misspellings is less robust than using a mapping table to group all variants, which better handles unseen misspellings.
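The three cleaning steps in Option D can be sketched in plain Python. This is an illustration only, not AWS Glue code: the mapping table, the sample values, and the treatment of EST as a fixed UTC-5 offset are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

EST = timezone(timedelta(hours=-5))  # assumed fixed offset; ignores daylight saving

# Hypothetical mapping table: every known variant points to one canonical name.
CATEGORY_MAP = {"restaurant": "Restaurant", "resturant": "Restaurant", "grocery": "Grocery"}


def to_utc(ts, source_tz):
    """Interpret a naive ISO timestamp in `source_tz` and convert it to UTC."""
    return datetime.fromisoformat(ts).replace(tzinfo=source_tz).astimezone(timezone.utc)


def standardize_category(raw):
    """Trim and case-fold, then look up the canonical name (unknowns pass through)."""
    return CATEGORY_MAP.get(raw.strip().lower(), raw.strip())


def impute_mode(labels):
    """Replace None with the most frequent observed label."""
    mode = Counter(l for l in labels if l is not None).most_common(1)[0][0]
    return [mode if l is None else l for l in labels]


utc_ts = to_utc("2024-03-01T09:30:00", EST)
categories = [standardize_category(c) for c in ["RESTAURANT", "Restaurant", " restaurant "]]
labels = impute_mode([0, 0, 1, None, 0])
print(utc_ts.isoformat(), categories, labels)
```

Because the lookup is keyed on the trimmed, lower-cased value, all three spellings from the question collapse to the same category without dropping any record.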

455
Multi-Select · easy

A company wants to deploy its trained model to edge devices such as cameras and IoT devices. The model must run efficiently with low latency and minimal memory footprint. Which THREE actions should the company take to prepare the model for edge deployment? (Choose THREE.)

Select 3 answers
A. Use SageMaker Edge Manager to package and manage the model on devices.
B. Quantize the model to reduce precision and memory footprint.
C. Increase the model's complexity to improve accuracy on edge devices.
D. Use SageMaker Neo to compile the model for the target edge hardware.
E. Deploy the model directly as a SageMaker endpoint and have the edge devices call it over the internet.
Answers: A, B, D

Edge Manager provides tools for model packaging, deployment, and monitoring on edge.

Why this answer

SageMaker Edge Manager is purpose-built to package, optimize, and manage machine learning models on edge devices. It provides model packaging, runtime monitoring, and over-the-air updates, ensuring the model runs efficiently with low latency and minimal memory footprint on resource-constrained hardware like cameras and IoT devices.

Exam trap

AWS often tests the misconception that edge deployment can rely on cloud endpoints for inference, but the correct approach is to optimize and run the model locally on the device to achieve low latency and offline operation.

456
Multi-Select · medium

A team is training a deep learning model on Amazon SageMaker using a custom Docker container. Which three practices should they follow to optimize training performance? (Choose three.)

Select 3 answers
A. Store training data in Amazon S3 in a shuffled and compressed format
B. Use the largest instance type available
C. Increase the number of layers in the model to improve accuracy
D. Use SageMaker Managed Spot Training with checkpointing
E. Use Pipe mode to stream data instead of File mode
Answers: A, D, E

Shuffling prevents bias and compression reduces transfer time, improving training performance.

Why this answer

Storing training data in Amazon S3 in a shuffled and compressed format (Option A) optimizes training performance because shuffling prevents biased gradient updates during stochastic gradient descent, while compression reduces I/O overhead and network transfer time. SageMaker's Pipe mode can then stream this compressed data directly to the training algorithm without intermediate disk writes, further accelerating throughput.

Exam trap

AWS often tests the misconception that bigger instances always mean faster training, but the real optimization lies in data pipeline efficiency (e.g., Pipe mode, compression, and shuffling) and cost management (e.g., Managed Spot Training with checkpointing).

457
MCQ · easy

A data scientist is preparing a dataset for training a binary classification model. The dataset has 100,000 rows and 50 features. The target variable is imbalanced, with only 5% positive cases. Which technique should the data scientist apply to address the class imbalance BEFORE training?

A. Principal Component Analysis (PCA) dimensionality reduction
B. Random oversampling of the minority class
C. Standard scaling of numerical features
D. One-hot encoding of categorical variables
Answer: B

Random oversampling is a valid technique to balance classes by replicating minority samples.

Why this answer

Random oversampling of the minority class (Option B) directly addresses the class imbalance by duplicating examples from the positive class until the class distribution is more balanced. This prevents the binary classification model from being biased toward the majority class, which is critical when only 5% of the 100,000 rows are positive cases. Oversampling is applied before training to ensure the model sees sufficient minority examples during learning.

Exam trap

AWS often tests whether candidates confuse data preprocessing techniques (scaling, encoding, dimensionality reduction) with methods that directly modify the class distribution, leading them to pick a plausible but irrelevant option like PCA or scaling.

How to eliminate wrong answers

Option A is wrong because PCA dimensionality reduction reduces the number of features but does not alter the class distribution; it would not fix the 5% imbalance and could even discard variance useful for separating the minority class. Option C is wrong because standard scaling normalizes numerical feature ranges but has no effect on the ratio of positive to negative samples; it addresses feature magnitude, not class imbalance. Option D is wrong because one-hot encoding converts categorical variables into binary columns but does not change the target variable's distribution; it is a preprocessing step for feature representation, not for balancing classes.
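Random oversampling is simple enough to sketch directly. The example below is a minimal illustration on toy data (100 rows instead of 100,000, with the same 5% positive rate); the function name and the choice to balance the classes exactly are assumptions for the example.

```python
import random


def random_oversample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_count = len(labels) - len(minority_idx)
    # Draw (with replacement) as many extra minority rows as needed to match.
    extra = [rng.choice(minority_idx) for _ in range(majority_count - len(minority_idx))]
    keep = list(range(len(labels))) + extra
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data with the 5% positive rate from the question.
labels = [1] * 5 + [0] * 95
rows = [[float(i)] for i in range(100)]
new_rows, new_labels = random_oversample(rows, labels)
print(sum(new_labels), len(new_labels) - sum(new_labels))
```

Resampling is normally applied to the training split only, after the data has been split, so that duplicated rows do not leak into validation or test data.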

458
MCQ · hard

During deployment of a Hugging Face model, the endpoint logs show this error. Which step was likely missed?

A. The inference container does not include the transformers library; the team should use a pre-built Hugging Face container.
B. The IAM role does not have permissions to download additional libraries.
C. The model artifact was not packaged correctly; the inference script is missing.
D. The endpoint configuration specifies the wrong instance type.
Answer: A

Hugging Face containers are pre-built with transformers and other dependencies.

Why this answer

The error indicates that the inference container cannot find the `transformers` library, which is required to load and run the Hugging Face model. By using a pre-built Hugging Face container from AWS, the team ensures that all necessary dependencies (like `transformers`, `tokenizers`, and `torch`) are pre-installed and compatible with the SageMaker inference environment. Option A is correct because the most likely missed step was selecting a generic container instead of the purpose-built Hugging Face container.

Exam trap

The trap here is that candidates confuse runtime dependency issues (missing Python libraries) with infrastructure or configuration problems (IAM permissions, instance types, or packaging), leading them to select a plausible-sounding but incorrect option like B or C.

How to eliminate wrong answers

Option B is wrong because IAM role permissions control access to AWS services (e.g., S3, ECR) and cannot prevent the container from downloading Python libraries at runtime; missing libraries are a container image issue, not an IAM issue. Option C is wrong because the error message specifically mentions a missing Python module (`transformers`), not a missing inference script or packaging error; if the inference script were missing, the error would be about a missing entry point or handler function. Option D is wrong because the instance type affects compute capacity and pricing, not the availability of Python libraries inside the container; an incorrect instance type would cause resource errors (e.g., memory or GPU), not an `ImportError`.

459
MCQ · medium

A company is training a deep learning model on Amazon SageMaker. The training job started but has been stuck in 'InProgress' state for an unusually long time with low CPU utilization. The data scientist suspects a bottleneck. What should be the first troubleshooting step?

A. Switch the training job to use Spot instances to reduce cost and potentially improve throughput.
B. Increase the number of training instances to parallelize data loading.
C. Stop and restart the training job with a different instance type.
D. Review CloudWatch Logs for the training container to identify errors or warnings.
Answer: D

Logs often show the exact cause of hanging, such as waiting for data or resource constraints.

Why this answer

Option D is correct because the CloudWatch Logs for the training container can reveal the cause of the stall, such as resource limits or data loading issues. Option A is wrong because Spot instances reduce cost but do not address a stuck job. Option B is wrong because adding instances may not help if the bottleneck is elsewhere. Option C is wrong because restarting with a different instance type without investigating may waste time and leave the cause unresolved.

460
MCQ · medium

A team notices that inference requests to their SageMaker endpoint are failing with '504 Gateway Timeout' for large payloads. What change should be made?

A. Enable data capture on the endpoint
B. Increase the endpoint's invocation timeout
C. Deploy a shadow endpoint for testing
D. Switch to a multi-model endpoint
Answer: B

Increasing the invocation timeout allows more time for large payloads to be processed.

Why this answer

A 504 Gateway Timeout indicates that the SageMaker endpoint's invocation timeout (default 60 seconds) was exceeded while processing a large payload. Increasing the invocation timeout allows the endpoint more time to complete inference for large payloads, resolving the timeout error.

Exam trap

The trap here is that candidates confuse a 504 timeout with a 413 payload too large error, leading them to incorrectly consider multi-model endpoints or data capture instead of adjusting the invocation timeout.

How to eliminate wrong answers

Option A is wrong because enabling data capture logs inference requests and responses but does not affect the endpoint's timeout behavior or ability to handle large payloads. Option C is wrong because deploying a shadow endpoint is used for A/B testing or canary deployments, not for resolving timeout issues on the existing endpoint. Option D is wrong because switching to a multi-model endpoint improves resource utilization for multiple models but does not change the per-invocation timeout limit.

461
MCQ · medium

A company deploys a model on Amazon SageMaker for real-time inference. The inference latency is too high. The model is a large deep learning model. The company wants to reduce latency without significantly impacting accuracy. Which approach should the company consider?

A. Increase the batch size for inference.
B. Use a smaller instance type to reduce inference time.
C. Use SageMaker Inference Recommender to test different instance types and optimizations.
D. Enable SageMaker Model Monitor to detect performance issues.
Answer: C

Inference Recommender helps find the optimal configuration for low latency.

Why this answer

SageMaker Inference Recommender is designed specifically to automate load testing and benchmarking across various instance types and model optimizations (e.g., Elastic Inference, GPU acceleration, serialization formats). It provides latency and throughput metrics to identify the optimal configuration for reducing inference latency while maintaining accuracy, making it the correct choice for a large deep learning model with high latency.

Exam trap

AWS often tests the misconception that reducing instance size or increasing batch size directly reduces latency, when in fact these actions typically increase latency or degrade throughput for real-time inference.

How to eliminate wrong answers

Option A is wrong because increasing batch size typically increases throughput but also increases per-request latency, as the model must process more data before returning results, which is counterproductive for real-time inference. Option B is wrong because using a smaller instance type generally reduces computational capacity, leading to longer inference times and higher latency, not lower. Option D is wrong because SageMaker Model Monitor is for detecting data drift, model quality degradation, and bias over time, not for optimizing inference performance or reducing latency.

462
Multi-Select · hard

A data scientist is cleaning a text dataset for natural language processing. The raw data contains HTML tags, URLs, and special characters. Which THREE steps should be taken to preprocess the text data? (Choose 3.)

Select 3 answers
A. Convert all text to lowercase
B. Encode the text using one-hot encoding
C. Remove HTML tags using a regular expression
D. Perform stemming or lemmatization
E. Remove stop words
Answers: A, C, D

Lowercasing standardizes text and reduces vocabulary size.

Why this answer

Converting all text to lowercase (Option A) is a standard text normalization step in NLP preprocessing. It reduces the vocabulary size by treating words like 'Apple' and 'apple' as the same token, which helps downstream models avoid treating case variations as distinct features. This is typically done early in the pipeline before tokenization or vectorization.

Exam trap

AWS often tests the distinction between preprocessing steps that clean raw data (like removing HTML tags and normalizing case) versus later feature engineering steps (like encoding or stop word removal), causing candidates to mistakenly select stop word removal as a cleaning step when it is actually a filtering step applied after tokenization.
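The cleaning steps can be sketched with the standard library. This is a minimal illustration, assuming simple regular expressions are good enough for the input (real HTML may need a proper parser); stemming or lemmatization is left out because it typically relies on an NLP library such as NLTK or spaCy.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")            # naive HTML tag matcher
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")   # anything except letters, digits, whitespace


def clean_text(raw):
    """Strip tags and URLs, lowercase, drop special characters, collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)   # turn entities such as &amp; into characters
    text = URL_RE.sub(" ", text)
    text = text.lower()
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


sample = '<p>Visit <a href="https://example.com">our site</a> &amp; SAVE 20%!</p>'
cleaned = clean_text(sample)
print(cleaned)
```

Lowercasing happens before the special-character filter so the character class only needs to list lowercase letters.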

463
Multi-Select · hard

A data scientist is building a text classification model using Amazon SageMaker. The dataset is stored as a CSV file in Amazon S3. The scientist wants to use the SageMaker built-in BlazingText algorithm. Which of the following steps are required to prepare the data for training? (Choose TWO.)

Select 2 answers
A. Convert the text to one-hot encoded vectors.
B. Tokenize and remove stop words from the text.
C. Convert the CSV file to the format of a single file with one instance per line.
D. Upload the data to an Amazon SageMaker notebook instance.
E. Ensure each line in the training file contains a single text instance with the label prefixed by '__label__'.
Answers: C, E

BlazingText expects a single file with one instance per line.

Why this answer

Option C is correct because BlazingText expects input data in a single file where each line represents one training instance. This is a specific requirement of the algorithm's input format, not a general SageMaker practice. The CSV file must be converted to this line-per-instance format for BlazingText to process it correctly.

Exam trap

The trap here is that candidates assume general NLP preprocessing (like tokenization or stop word removal) is always required, but BlazingText is designed to handle raw text and expects a specific line format, not preprocessed vectors.
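Converting a CSV into the one-instance-per-line, `__label__`-prefixed format can be sketched as follows. The column names (`label`, `text`) and the sample rows are hypothetical; the source only states that the data starts as a CSV file.

```python
import csv
import io


def to_blazingtext_lines(csv_text, label_col="label", text_col="text"):
    """Yield one '__label__<label> <text>' line per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Collapse internal whitespace and newlines so each instance stays on one line.
        text = " ".join(row[text_col].split())
        yield f"__label__{row[label_col]} {text}"


# Hypothetical two-column CSV; the column names are illustrative.
SAMPLE = 'label,text\nspam,"Win a prize now"\nham,"Lunch at\nnoon?"\n'
lines = list(to_blazingtext_lines(SAMPLE))
print("\n".join(lines))
```

The whitespace collapse matters because a quoted CSV field may contain a newline, which would otherwise split one instance across two lines of the training file.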

464
MCQ · easy

Which technique is commonly used to handle missing values in a categorical feature?

A. One-hot encoding
B. Mean imputation
C. Mode imputation
D. Standard scaling
Answer: C

Mode imputation replaces missing categorical values with the most frequent category, a common practice.

Why this answer

Mode imputation (replacing missing values with the most frequent category) is a standard method for categorical data. Mean imputation is for numerical data, standard scaling is for feature scaling, and one-hot encoding encodes categories without handling missing values.

465
MCQ · hard

An MLOps engineer is building an automated retraining pipeline for a fraud detection model. The model must be retrained weekly, and the new model should only be promoted to production if it meets predefined performance thresholds compared to the current model. Which combination of SageMaker capabilities should the engineer use?

A. Amazon SageMaker Debugger and Amazon SageMaker Clarify
B. Amazon SageMaker Model Monitor and Amazon SageMaker Ground Truth
C. Amazon SageMaker Autopilot and Amazon SageMaker Experiments
D. Amazon SageMaker Pipelines and Amazon SageMaker Model Registry
Answer: D

Pipelines orchestrate the workflow, Model Registry manages model versions and approvals.

Why this answer

Option D is correct because Amazon SageMaker Pipelines provides the orchestration for the automated retraining workflow (including weekly scheduling and conditional logic), while SageMaker Model Registry enables versioning, approval, and promotion of models based on performance thresholds. Together, they allow the engineer to define a pipeline that trains a new model, evaluates it against the current production model, and only registers it for deployment if it meets the predefined criteria.

Exam trap

AWS often tests the distinction between monitoring tools (Model Monitor, Debugger) and orchestration/registry services (Pipelines, Model Registry), so the trap here is that candidates may confuse Model Monitor's drift detection with the need for a retraining pipeline, overlooking that the question specifically requires automated retraining and conditional promotion.

How to eliminate wrong answers

Option A is wrong because SageMaker Debugger monitors training metrics and detects anomalies (e.g., vanishing gradients), but it does not orchestrate retraining pipelines or manage model promotion. SageMaker Clarify is used for bias detection and feature importance, not for automated retraining workflows. Option B is wrong because SageMaker Model Monitor detects data drift in production, not for retraining orchestration, and SageMaker Ground Truth is a labeling service for creating training datasets, not for pipeline automation or model promotion.

Option C is wrong because SageMaker Autopilot automates model building (feature engineering, algorithm selection) but does not provide pipeline orchestration or model registry capabilities for conditional promotion; SageMaker Experiments tracks trial runs but lacks the workflow automation and approval gates needed for this use case.

466
MCQ · medium

A company uses Amazon SageMaker to train and deploy a machine learning model. After deployment, they notice that the model's accuracy drops significantly over time due to changes in the underlying data distribution. Which monitoring solution should they implement to detect this issue automatically?

A. Set up Amazon SageMaker Model Monitor with data quality monitoring.
B. Configure AWS Config rules to check the model accuracy metric.
C. Use AWS CloudTrail to monitor changes to the model's S3 bucket.
D. Enable Amazon CloudWatch Logs on the endpoint and set alarms on inference latency.
Answer: A

SageMaker Model Monitor automatically detects drift in data quality and model quality.

Why this answer

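Option A is correct because Amazon SageMaker Model Monitor can detect data quality and model quality drift on a deployed endpoint. Option B is wrong because AWS Config tracks resource configuration, not model accuracy. Option C is wrong because CloudTrail records API calls, not changes in data distribution. Option D is wrong because CloudWatch Logs and latency alarms cover logs and latency, not drift.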

467
MCQ · easy

A team uses SageMaker for training. They need to monitor training progress and view metrics like loss and accuracy. Which SageMaker feature should they use?

A. SageMaker Ground Truth
B. SageMaker Debugger
C. SageMaker Model Monitor
D. SageMaker Experiments
Answer: B

Debugger can output tensors and metrics during training for real-time monitoring.

Why this answer

SageMaker Debugger captures real-time training metrics and provides alerts, making it ideal for monitoring progress. Other features serve different purposes.

468
MCQ · easy

A data scientist is preparing a dataset for a binary classification model. The dataset has 10,000 records with 100 features. The target variable is imbalanced, with 95% negative class and 5% positive class. Which data preparation step should the data scientist take to address the imbalance before training?

A. Normalize all features to a 0-1 range
B. Use cross-validation to handle imbalance
C. Remove enough instances of the negative class to achieve balance
D. Apply SMOTE to oversample the positive class
Answer: D

SMOTE generates synthetic samples for the minority class, effectively balancing the dataset.

Why this answer

Option D is correct because SMOTE (Synthetic Minority Oversampling Technique) generates synthetic samples for the minority class (positive class, 5%) by interpolating between existing minority instances. This addresses the severe class imbalance (95:5) without discarding data, allowing the model to learn decision boundaries for the minority class more effectively than simple duplication.

Exam trap

AWS often tests the misconception that any data preprocessing step (like normalization or cross-validation) can fix class imbalance, when in fact only resampling techniques (oversampling, undersampling, or synthetic generation) directly alter the class distribution.

How to eliminate wrong answers

Option A is wrong because normalizing features to a 0-1 range addresses feature scaling, not class imbalance; it does not change the class distribution. Option B is wrong because cross-validation is a model evaluation technique that helps assess performance but does not modify the training data to correct imbalance; it would still train on the imbalanced dataset. Option C is wrong because removing instances of the negative class (random undersampling) discards potentially valuable data, which can lead to loss of information and reduced model performance, especially when the negative class represents 95% of the data.
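The interpolation step that distinguishes SMOTE from plain duplication can be sketched in a few lines. This is a simplified illustration of the idea, not a reference implementation; the toy points, `k=3`, and the function name are assumptions, and in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating toward a random k-nearest neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of `base`, excluding `base` itself.
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority-class points in two dimensions.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
print(len(new_points), new_points[0])
```

Each synthetic point lies on a line segment between two real minority points, so the new samples are not exact copies of existing rows.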

469
MCQ · easy

A data scientist wants to evaluate the performance of a binary classification model. The dataset is highly imbalanced with only 5% positive class. Which metric should be used to evaluate the model?

A. Accuracy
B. Mean Squared Error
C. R-squared
D. F1-score
Answer: D

F1-score considers both precision and recall, providing a balanced measure for imbalanced classes.

Why this answer

F1-score balances precision and recall, making it suitable for imbalanced datasets. Accuracy can be misleading (e.g., 95% if always predicting negative). Mean Squared Error and R-squared are for regression.
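The gap between accuracy and F1 on a 5% positive dataset can be checked with a small worked example. The two prediction vectors below are hypothetical, chosen so that both models reach the same accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
# Hypothetical second model: finds 4 of 5 positives and raises 4 false alarms.
some_recall = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91

for name, pred in [("always negative", always_negative), ("some recall", some_recall)]:
    print(name, accuracy(y_true, pred), round(f1_score(y_true, pred), 3))
```

Both models score 0.95 on accuracy, but the always-negative model has an F1 of 0.0 while the second reaches about 0.615, which is the distinction the question is testing.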

470
MCQ · medium

A model deployed on a SageMaker endpoint is returning predictions. The team wants to log all predictions to an S3 bucket for auditing. What is the most efficient way to achieve this?

A. Enable SageMaker endpoint data capture to the S3 bucket.
B. Configure CloudWatch Logs to export to S3.
C. Modify the inference code to write logs to S3.
D. Use Amazon Kinesis Data Firehose to stream predictions to S3.
Answer: A

Data capture is built-in and efficient.

Why this answer

SageMaker data capture is designed for this purpose and can be enabled on the endpoint configuration to automatically capture input and output data to S3. Modifying inference code is custom and less efficient, Firehose adds complexity, and CloudWatch Logs export is for logs.

471
MCQ · hard

Refer to the exhibit. An AWS IAM policy is attached to a role used by a CI/CD pipeline to deploy SageMaker endpoints. The pipeline attempts to create an endpoint configuration with a VPC subnet that is not subnet-0123456789abcdef0. What will happen when the pipeline tries to create the endpoint configuration?

A. The action will be denied because the Deny statement explicitly blocks CreateEndpointConfig when the subnet does not match.
B. The action will be allowed because the CreateEndpoint statement allows all endpoints.
C. The action will be allowed only if the endpoint configuration uses a VPC with multiple subnets.
D. The action will be allowed because the policy lacks a Deny on the subnet condition for the endpoint resource.
Answer: A

An explicit Deny overrides any Allow, and the condition is not met.

Why this answer

Option A is correct because the IAM policy includes a Deny statement with a condition that explicitly blocks the `CreateEndpointConfig` action when the subnet specified in the request does not match `subnet-0123456789abcdef0`. Since the pipeline is attempting to create an endpoint configuration with a different subnet, the Deny statement overrides any Allow statements, resulting in the action being denied.

Exam trap

The trap here is that candidates may assume an Allow statement for `CreateEndpoint` would permit the action, but they overlook that an explicit Deny on the `CreateEndpointConfig` action with a subnet condition takes precedence, causing the request to fail.

How to eliminate wrong answers

Option B is wrong because the policy contains a Deny statement that explicitly restricts the subnet condition for `CreateEndpointConfig`, so the Allow on `CreateEndpoint` does not override the Deny; IAM Deny statements always take precedence. Option C is wrong because the policy does not grant any special permission for multiple subnets; the Deny condition applies regardless of the number of subnets used. Option D is wrong because the policy does include a Deny on the subnet condition for the `sagemaker:CreateEndpointConfig` action, not for the endpoint resource, so the action is blocked.
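The precedence rule (explicit Deny, then Allow, then implicit Deny) can be modeled with a toy evaluator. The exhibit is not reproduced in the text, so the `POLICY` list below is a hypothetical stand-in, and the evaluator is a deliberately simplified model rather than the actual IAM evaluation logic or policy syntax.

```python
def evaluate(statements, action, context):
    """Any matching Deny wins; otherwise any matching Allow; otherwise implicit Deny."""
    matched = [
        s["effect"]
        for s in statements
        if action in s["actions"] and s.get("condition", lambda ctx: True)(context)
    ]
    if "Deny" in matched:
        return "Deny"
    return "Allow" if "Allow" in matched else "Deny"


ALLOWED_SUBNET = "subnet-0123456789abcdef0"

# Hypothetical stand-in for the exhibit: one broad Allow and one conditional Deny.
POLICY = [
    {"effect": "Allow",
     "actions": {"sagemaker:CreateEndpoint", "sagemaker:CreateEndpointConfig"}},
    {"effect": "Deny",
     "actions": {"sagemaker:CreateEndpointConfig"},
     "condition": lambda ctx: ctx.get("subnet") != ALLOWED_SUBNET},
]

other_subnet = evaluate(POLICY, "sagemaker:CreateEndpointConfig",
                        {"subnet": "subnet-0fedcba9876543210"})
same_subnet = evaluate(POLICY, "sagemaker:CreateEndpointConfig",
                       {"subnet": ALLOWED_SUBNET})
print(other_subnet, same_subnet)
```

In this model the request with a different subnet matches both statements and is denied, while the request with the expected subnet matches only the Allow.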

472
MCQ · hard

A machine learning team wants to detect bias in a binary classification model before deployment. They use SageMaker Clarify. Which type of bias metric should they compute to understand whether the model treats different demographic groups unfairly in predictions?

A. SHAP (SHapley Additive exPlanations) values from the test predictions.
B. A re-run of the training job with a fairness constraint.
C. Pre-training bias metrics like Class Imbalance (CI) and Difference in Positive Proportions in Labels (DPPL).
D. Feature importance values after training.
Answer: C

Pre-training metrics identify bias in the training data that could lead to unfair models.

Why this answer

Option C is correct because pre-training bias metrics (such as Class Imbalance and Kullback-Leibler divergence) reveal bias in the data before modeling, while post-training metrics assess bias in predictions. Option A is wrong because SHAP values are for model interpretability, not bias metrics per se. Option D is wrong because feature importance explains model behavior but does not measure bias.

Option B is wrong because retraining does not detect bias.

473
MCQhard

A machine learning team is processing a large dataset in Amazon SageMaker using a processing job. The data is stored in S3 in CSV format. The team wants to split the data into training, validation, and test sets (70/20/10) while ensuring that the distribution of a categorical feature 'region' is preserved across splits. Which SageMaker SDK method should they use to write the output?

A.Use sagemaker.sklearn.processing.SKLearnProcessor with a script that uses sklearn's StratifiedShuffleSplit
B.Use sagemaker.xgboost.processing.XGBoostProcessor with a script that uses random split
C.Use sagemaker.processing.Processor.run() with a custom script that uses train_test_split
D.Use sagemaker.processing.FrameworkProcessor with a script that uses pandas.sample
AnswerA

StratifiedShuffleSplit ensures the 'region' distribution is maintained across splits.

Why this answer

Option A is correct because `SKLearnProcessor` runs a custom Python script that can use `sklearn.model_selection.StratifiedShuffleSplit`, which preserves the distribution of the categorical 'region' feature across the training, validation, and test splits. It is the only option whose script performs a stratified split, so it can reach the 70/20/10 ratio while keeping each region's share consistent across splits.

Exam trap

The trap here is that candidates often confuse generic processing methods (like `Processor.run()` or `FrameworkProcessor`) with the specific processor that supports stratified splitting, or they assume `train_test_split` with a random state is sufficient for preserving categorical distributions, ignoring the need for stratification.

How to eliminate wrong answers

Option B is wrong because its script uses a random split, which does not preserve the 'region' distribution; choosing `XGBoostProcessor` does not change that. Option C is wrong because `Processor.run()` only launches a processing job and provides no built-in stratification; `train_test_split` as described, with no stratification on 'region', performs a random split. Option D is wrong because `FrameworkProcessor` is a generic base class for framework processors, and `pandas.sample` performs random sampling without stratification, so it does not maintain the categorical feature distribution across splits.
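The stratification idea behind option A can be sketched in standard-library Python. This is not `StratifiedShuffleSplit` itself, and a real processing script would more likely call scikit-learn; the function name `stratified_split` and the sample rows are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, fractions=(0.7, 0.2, 0.1), seed=42):
    """Split rows into train/validation/test while keeping the share of
    each `key` value roughly equal across the three splits."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    train, val, test = [], [], []
    for members in groups.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Hypothetical data: 'us' is three times as common as 'eu'.
rows = [{"region": "us", "id": i} for i in range(300)]
rows += [{"region": "eu", "id": i} for i in range(100)]
train, val, test = stratified_split(rows, key="region")
```

Because each region is split separately, the 3:1 ratio between 'us' and 'eu' carries over to all three outputs, which a plain random split does not guarantee.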

474
MCQmedium

A healthcare company is building a model to predict patient readmission rates. The dataset contains a mix of numeric features (age, blood pressure, lab test results) and categorical features (gender, diagnosis code, hospital department). The dataset has 2 million rows. The data is stored in an Amazon S3 bucket, and they use AWS Glue to catalog and preprocess the data. The data scientist notices that the 'diagnosis_code' column has 10,000 unique codes, and 20% of the rows have missing values for 'blood_pressure'. They plan to use a SageMaker built-in XGBoost model. For optimal model performance, which preprocessing steps should they apply using AWS Glue ETL?

A.Impute missing 'blood_pressure' with the mean, and apply label encoding to 'diagnosis_code'.
B.Impute missing 'blood_pressure' with median, and apply integer encoding to 'diagnosis_code'.
C.Replace missing 'blood_pressure' with -1 and apply one-hot encoding to 'diagnosis_code' after grouping rare codes into 'other'.
D.Apply one-hot encoding to 'diagnosis_code' and drop rows with missing 'blood_pressure'.
AnswerB

Median is robust; integer encoding is sufficient for tree-based models like XGBoost.

Why this answer

Option B is correct because median imputation for 'blood_pressure' is robust to outliers and preserves the shape of the distribution better than the mean, while integer encoding (label encoding) for 'diagnosis_code' with 10,000 unique values is compact and avoids the dimensionality explosion of one-hot encoding. XGBoost can also handle missing values natively, so imputation is a choice rather than a requirement. In an AWS Glue Spark job, these transformations can be applied with Spark MLlib's `Imputer` and `StringIndexer` without excessive memory overhead.

Exam trap

The trap here is that candidates overestimate the need for one-hot encoding with high-cardinality categorical features, forgetting that tree-based models like XGBoost can effectively use integer encoding, and they may also default to mean imputation without considering outlier sensitivity.

How to eliminate wrong answers

Option A is wrong because mean imputation for 'blood_pressure' is sensitive to outliers, which can skew the model, and label encoding is a form of integer encoding but the term 'label encoding' often implies ordinal mapping that may introduce unintended ordinal relationships; however, the primary flaw is the mean imputation choice. Option C is wrong because replacing missing 'blood_pressure' with -1 introduces an arbitrary value that XGBoost may misinterpret as a valid numeric pattern, and one-hot encoding 'diagnosis_code' with 10,000 categories (even after grouping rare codes) still creates a very high-dimensional sparse matrix that degrades performance and increases memory usage in Glue ETL. Option D is wrong because dropping 20% of rows with missing 'blood_pressure' leads to significant data loss and potential bias, and one-hot encoding 'diagnosis_code' with 10,000 categories is computationally prohibitive and unnecessary for tree-based models like XGBoost.
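To make the two transformations concrete, here is a minimal standard-library sketch. It is not Glue or Spark code; the helper names and the sample values are hypothetical, and only the column names come from the question.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def integer_encode(categories):
    """Map each distinct category to a stable integer index."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


# Hypothetical sample: one extreme reading and two missing values.
blood_pressure = [118, None, 125, 240, None, 121]
diagnosis_code = ["E11", "I10", "E11", "J45", "I10", "E11"]

bp_filled = impute_median(blood_pressure)
codes, mapping = integer_encode(diagnosis_code)
```

In this sample the median of the observed readings is 123, while the mean is 151 because of the single 240 reading, which is the outlier sensitivity the explanation refers to. The encoded column stays one integer per row no matter how many distinct codes exist.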

475
Multi-Selectmedium

A company deploys a model on SageMaker that serves predictions to a web application. The model's performance degrades over time due to data drift. The company wants to set up continuous monitoring. Which TWO actions should the company take to monitor and retrain the model effectively? (Choose TWO.)

Select 2 answers
A.Manually review model performance monthly and retrain if necessary.
B.Configure an Amazon EventBridge rule to start a retraining pipeline when the Model Monitor detects violations.
C.Enable SageMaker Model Monitor to capture inference data and run monitoring schedules.
D.Use Amazon CloudWatch Logs Insights to query inference logs for anomalies.
E.Deploy the model on multiple endpoints with A/B testing to compare performance.
AnswersB, C

EventBridge can react to Model Monitor violation events to trigger automatic retraining.

Why this answer

Option B is correct because Amazon EventBridge can be configured to trigger a retraining pipeline automatically when SageMaker Model Monitor detects data drift or other violations, enabling a closed-loop monitoring and retraining system. Option C is correct because SageMaker Model Monitor must first be enabled to capture inference data and run monitoring schedules, which is the prerequisite for detecting drift and triggering automated actions.

Exam trap

The trap here is that candidates may confuse general monitoring tools like CloudWatch Logs Insights with the specialized, model-aware monitoring capabilities of SageMaker Model Monitor, or they may overlook that EventBridge automation requires Model Monitor to be enabled first.

476
Multi-Selectmedium

A data scientist is building a text classification model using Amazon SageMaker. The dataset is large and includes imbalanced classes. Which three techniques can help improve model performance? (Choose three.)

Select 3 answers
A.Performing feature extraction using TF-IDF
B.Using cost-sensitive learning
C.Oversampling the minority class
D.Using a linear classifier only
E.Using SMOTE
AnswersB, C, E

Assigns higher misclassification costs to the minority class, improving performance.

Why this answer

Oversampling, SMOTE, and cost-sensitive learning are standard approaches to handle class imbalance. Using a linear classifier only is limiting, and TF-IDF is a feature extraction method that does not address imbalance directly.
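One common way to set the costs in cost-sensitive learning is to weight each class inversely to its frequency. The sketch below assumes the widely used heuristic n_samples / (n_classes * class_count); the function name and the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so rarer
    classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical 10/90 imbalance.
labels = ["spam"] * 10 + ["ham"] * 90
weights = balanced_class_weights(labels)
```

The minority class ends up with a weight nine times larger than the majority class, so each of its misclassifications counts for more in the loss.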

477
MCQmedium

Your company uses SageMaker batch transform to process a large dataset (5 TB) of customer transactions every night. The batch transform job uses a single ml.c5.4xlarge instance and takes about 6 hours to complete. However, the job recently started failing with an error message: 'Timed out waiting for transformation to complete. The maximum job duration is 3600 seconds.' You check the input data and notice that one of the input files is a single large JSON file of 50 GB, while the rest are smaller files. The job is configured with a batch strategy of 'MultiRecord' and a maximum payload size of 6 MB. What is the most likely cause of the timeout and which fix should you apply?

A.Set the batch strategy to 'SingleRecord' so that each record is processed individually.
B.Split the large JSON file into smaller files (e.g., 100 MB each) before feeding to the batch transform job.
C.Increase the job timeout to 7200 seconds.
D.Increase the number of instances to 5 in the batch transform job.
AnswerB

SageMaker batch transform splits input on file boundaries; small files allow parallel processing and stay within time limits.

Why this answer

The batch transform job is timing out because the single 50 GB JSON file cannot be processed within the default 3600-second (1-hour) timeout. With a 'MultiRecord' batch strategy and a 6 MB maximum payload size, SageMaker must split the large file into many small batches, but the job still tries to read the entire file sequentially, causing excessive processing time. Splitting the large file into smaller files (e.g., 100 MB each) allows SageMaker to parallelize and complete the transform within the timeout.

Exam trap

AWS often tests the misconception that increasing instances or timeout alone can solve performance bottlenecks caused by a single large input file, when in fact SageMaker batch transform processes each file on a single instance and requires file-level splitting for parallelism.

How to eliminate wrong answers

Option A is wrong because setting the batch strategy to 'SingleRecord' would process each record individually, which would increase the number of API calls and likely worsen the timeout issue, not resolve it. Option C is wrong because increasing the job timeout to 7200 seconds only masks the underlying problem of the oversized file; the job may still fail due to resource constraints or eventually hit other limits. Option D is wrong because increasing the number of instances does not help when a single massive file cannot be split across instances—SageMaker batch transform assigns each file to a single instance, so the 50 GB file would still be processed by one instance, causing the same timeout.
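The splitting step in option B might look like the sketch below, assuming the large file can be treated as newline-delimited JSON records (the question does not say so). The function name `chunk_records` and the sample records are hypothetical.

```python
def chunk_records(lines, max_bytes):
    """Group newline-delimited records into chunks of at most max_bytes each.
    A single record larger than max_bytes gets a chunk of its own."""
    chunk, size = [], 0
    for line in lines:
        line_size = len(line.encode("utf-8"))
        if chunk and size + line_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_size
    if chunk:
        yield chunk


# Hypothetical records, 10 bytes each; a real run would use ~100 MB chunks.
records = ['{"id": %d}\n' % i for i in range(10)]
chunks = list(chunk_records(records, max_bytes=30))
```

Each chunk would then be written to its own S3 object, so the work can be spread across files instead of being tied to one 50 GB object.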

478
MCQmedium

A company is deploying a large number of small models (each < 100 MB) for different customers. They want to minimize costs and management overhead while serving traffic that varies significantly. Which SageMaker endpoint type should they choose?

A.A batch transform job
B.A multi-model endpoint on a GPU instance
C.A multi-variant endpoint to route traffic to different model versions
D.A serverless endpoint
AnswerB

MME allows hosting many models on one instance, reducing costs.

Why this answer

A multi-model endpoint (MME) on a GPU instance is the best choice because it allows you to host multiple small models (< 100 MB each) on a single endpoint, sharing the underlying GPU instance to reduce costs. SageMaker MME dynamically loads and unloads models based on traffic, which minimizes management overhead and handles variable traffic patterns efficiently without provisioning separate endpoints per model.

Exam trap

The trap here is that candidates confuse 'multi-model endpoint' (hosting many models on one endpoint) with 'multi-variant endpoint' (routing traffic to different versions of the same model), leading them to select option C incorrectly.

How to eliminate wrong answers

Option A is wrong because batch transform jobs are designed for offline, asynchronous inference on large datasets, not for serving real-time traffic that varies significantly. Option C is wrong because a multi-variant endpoint is used to route traffic between different versions (variants) of the same model for A/B testing or gradual rollouts, not to host multiple distinct models per customer. Option D is wrong because serverless endpoints scale to zero automatically but do not support GPU instances and impose limits on payload size and invocation duration, so they cannot serve models that require GPU-accelerated inference.

479
Multi-Selecthard

Which TWO tools are specifically designed for debugging and analyzing training jobs in SageMaker?

Select 2 answers
A.SageMaker Autopilot
B.SageMaker Experiments
C.SageMaker Debugger
D.SageMaker Clarify
E.SageMaker Model Monitor
AnswersB, C

Experiments organizes training runs for analysis and comparison.

Why this answer

SageMaker Debugger provides real-time monitoring and debugging of training jobs, and SageMaker Experiments helps track and compare runs. Model Monitor is for deployed endpoints, Clarify for bias, and Autopilot for automated model creation.

480
MCQhard

A data scientist is using Amazon SageMaker Debugger to monitor training metrics. They want to stop training automatically if the model is overfitting. Which action should they take?

A.Define a Debugger rule that monitors the loss plateau
B.Configure a custom rule that triggers a STOP training action when validation loss stops decreasing
C.Create a SageMaker Training Compiler
D.Use a built-in rule that checks for vanishing gradients
AnswerB

A custom rule can monitor validation loss and stop training when it plateaus or increases, indicating overfitting.

Why this answer

SageMaker Debugger allows custom rules with actions like STOP training. A built-in rule for overfitting may not exist, so a custom rule is needed. The rule should check if validation loss stops decreasing (plateau) or starts increasing, and trigger STOP.

Other options monitor different issues.
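The check such a custom rule would apply can be sketched as a patience test on the validation-loss history. This is plain Python, not the Debugger rule API; the function name and the `patience` and `min_delta` parameters are hypothetical.

```python
def should_stop(val_losses, patience=3, min_delta=0.0):
    """Return True when validation loss has not improved by more than
    min_delta over the last `patience` evaluations."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best >= best_before - min_delta


# Hypothetical histories: one that turns upward, one still improving.
overfitting = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
improving = [1.0, 0.8, 0.7, 0.6, 0.5, 0.4]
```

With these inputs, `should_stop(overfitting)` is true because the last three values never beat the earlier best of 0.7, while `should_stop(improving)` is false.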

481
MCQmedium

Refer to the exhibit. A data scientist receives an AccessDenied error when trying to create a training job using SageMaker. What is the most likely cause?

A.Missing s3:PutObject permission
B.Missing sagemaker:CreateTrainingJob permission
C.Missing sagemaker:DescribeTrainingJob permission
D.Using wrong AWS region
AnswerA

Training jobs require put access to S3 for outputs and logs.

Why this answer

The policy allows sagemaker:CreateTrainingJob and s3:GetObject, but training jobs also need to write logs and output to S3 (s3:PutObject). The missing s3:PutObject permission causes the AccessDenied error.

482
MCQeasy

A data engineer notices that an AWS Glue ETL job is failing with an Out of Memory error when processing a large dataset. The dataset is 500 GB in size, and the worker type is G.1X. Which change is MOST likely to resolve the issue?

A.Partition the input data into smaller files
B.Use a Spark DataFrame instead of RDD
C.Increase the number of workers
D.Use a larger worker type like G.2X
AnswerD

G.2X provides double the memory of G.1X, resolving the OOM.

Why this answer

The G.1X worker type provides 16 GB of memory per worker. A 500 GB dataset requires sufficient aggregate memory across workers for processing. Increasing the worker type to G.2X (which doubles memory to 32 GB per worker) increases the memory per executor, allowing each task to handle larger data partitions without running out of memory.

This directly addresses the Out of Memory error by providing more heap space for Spark operations.

Exam trap

The trap here is that candidates often assume adding more workers (scaling out) always solves memory issues, but the real bottleneck is per-executor memory, which is only addressed by using a larger worker type (scaling up).

How to eliminate wrong answers

Option A is wrong because partitioning input data into smaller files does not increase the available memory per worker; it only changes how data is read and may reduce parallelism but does not resolve an OOM caused by insufficient executor memory. Option B is wrong because using a Spark DataFrame instead of RDD does not inherently reduce memory usage; DataFrames use Catalyst optimizer and Tungsten execution for better performance, but they still operate within the same memory constraints and will OOM if memory per worker is insufficient. Option C is wrong because increasing the number of workers distributes the data across more executors but does not increase the memory per executor; if each executor still has only 16 GB, a single large partition or shuffle operation can still cause OOM on an individual executor.

483
Multi-Selectmedium

Which THREE components are required to set up automated model retraining in response to performance degradation using Amazon SageMaker? (Select THREE.)

Select 3 answers
A.An Amazon SNS topic with a subscription to send a manual approval email.
B.A CloudWatch alarm that triggers when a quality metric falls below a threshold.
C.A SageMaker Model Monitor schedule to capture inference data and compute quality metrics.
D.An AWS Lambda function that starts a SageMaker training job or pipeline execution.
E.A production variant with a canary traffic shift configuration.
AnswersB, C, D

The alarm detects degradation and triggers the retraining.

Why this answer

Option B is correct because a CloudWatch alarm can monitor a SageMaker Model Monitor quality metric (e.g., accuracy, precision) and trigger an alarm when the metric falls below a defined threshold. This alarm acts as the event source to initiate automated retraining, forming the monitoring and alerting backbone of the retraining pipeline.

Exam trap

The trap here is that candidates often confuse the monitoring and alerting components (CloudWatch alarm and Model Monitor) with deployment or notification mechanisms, mistakenly selecting manual approval (SNS) or traffic shifting (canary) as part of the automated retraining workflow.
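The Lambda component (option D) could be as small as the sketch below, which starts a SageMaker pipeline execution when invoked. The pipeline name is hypothetical, how the alarm is wired to the function is left out, and the optional `sagemaker_client` argument is an assumption added so the handler can be exercised without AWS credentials.

```python
PIPELINE_NAME = "retraining-pipeline"  # hypothetical pipeline name


def handler(event, context, sagemaker_client=None):
    """Start a retraining pipeline execution when invoked by the alarm path."""
    if sagemaker_client is None:
        import boto3  # imported lazily so the sketch can run without AWS

        sagemaker_client = boto3.client("sagemaker")
    response = sagemaker_client.start_pipeline_execution(
        PipelineName=PIPELINE_NAME,
    )
    return {"pipelineExecutionArn": response["PipelineExecutionArn"]}
```

Model Monitor produces the metric, the CloudWatch alarm decides when it has degraded, and this function is the piece that turns the alarm into a retraining run.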

484
MCQmedium

A data science team has trained a PyTorch model using Amazon SageMaker and wants to deploy it with a custom inference container that includes a pre-processing step. The team needs to minimize latency and ensure the pre-processing runs only once per request. Which SageMaker real-time inference option should they use?

A.Deploy the model on a multi-model endpoint and include pre-processing in the model code.
B.Use a batch transform job with a pre-processing script.
C.Package pre-processing and inference in a single container with a custom entry point.
D.Create a SageMaker inference pipeline with two containers: one for pre-processing and one for inference.
AnswerD

An inference pipeline chains containers sequentially, allowing pre-processing to run once per request with low latency.

Why this answer

Option D is correct because a SageMaker inference pipeline allows you to chain two containers in a single endpoint, where the first container handles pre-processing and the second runs inference. This ensures that pre-processing runs exactly once per request, minimizing latency by avoiding redundant processing and keeping the request within the same HTTP connection.

Exam trap

AWS often tests the distinction between a single-container approach (Option C) and a multi-container pipeline (Option D), where candidates mistakenly think a single custom container is simpler and sufficient, but the pipeline is required to guarantee that pre-processing runs exactly once per request and to allow independent scaling or updates of the pre-processing logic.

How to eliminate wrong answers

Option A is wrong because a multi-model endpoint hosts multiple models on the same container, but it does not support a separate pre-processing step; any pre-processing would be embedded in the model code and run per model load, not once per request, and it cannot guarantee a separate container for pre-processing. Option B is wrong because a batch transform job is designed for asynchronous, offline processing of large datasets, not for real-time inference with low latency requirements. Option C is wrong because packaging pre-processing and inference in a single container with a custom entry point runs both steps sequentially per request, but it does not leverage SageMaker's built-in pipeline orchestration, and if the pre-processing logic changes, the entire container must be rebuilt, whereas a pipeline allows independent updates.

485
MCQhard

A financial services company has a SageMaker pipeline that trains a fraud detection model daily. The pipeline consists of three steps: preprocessing (using a Spark script), training (XGBoost), and evaluation. The evaluation step calculates the F1 score and compares it to a threshold of 0.95. If the F1 score is below 0.95, the pipeline should fail and notify the team via email. The team implemented this using a Condition step that checks if the F1 score is greater than or equal to 0.95. If true, the pipeline proceeds to register the model; if false, the pipeline fails. However, the team notices that even when the F1 score is 0.94, the pipeline continues to the registration step. The evaluation script outputs the F1 score as a float with two decimal places in a JSON file. The Condition step uses the expression: $.evaluation.metrics.f1_score >= 0.95. What is the most likely cause of the issue?

A.The evaluation step must be split into two steps: one for evaluation and one for condition check
B.The evaluation script outputs the F1 score as a string, and string comparison '0.94' >= '0.95' evaluates to true because it is lexicographically compared
C.The Condition step cannot be used to check metric values; it can only check step status
D.The threshold should be set to 0.95 but the Condition step uses a less than or equal operator
AnswerB

If the evaluation script writes the F1 score as a string rather than a number, the Condition step does not compare it numerically and can produce an unexpected result. The fix is to output the score as a JSON number.

Why this answer

The most likely cause is that the evaluation script outputs the F1 score as a string (e.g., "0.94") rather than a numeric value. In Amazon SageMaker Pipelines, the Condition step reads the value from the evaluation JSON, and when that value is a string the comparison against 0.95 is no longer a plain numeric comparison, so the outcome can differ from what the team expects. Note that a strictly lexicographic comparison would still place "0.94" below "0.95" (the strings first differ at '4' versus '5'), so the reliable fix is to write the score as a JSON number and keep both sides of the comparison numeric.

Exam trap

AWS often tests the subtle distinction between numeric and string comparisons in AWS Step Functions and SageMaker Pipelines, where candidates assume that a value that looks like a number will be compared numerically, but the actual behavior depends on the data type in the JSON output.

How to eliminate wrong answers

Option A is wrong because splitting the evaluation step into two steps would not fix the root cause—the issue is a data type mismatch, not a step separation problem. Option C is wrong because the Condition step can absolutely check metric values using JSONPath expressions; it is not limited to checking step status. Option D is wrong because the operator used (>=) is correct for the intended logic (pass if F1 >= 0.95); the issue is that the comparison is lexicographic due to string values, not that the operator is wrong.
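A defensive check on the evaluation output makes the type problem visible before the Condition step runs. The sketch below is plain Python, not SageMaker Pipelines code; the JSON layout follows the `$.evaluation.metrics.f1_score` expression from the question, and the function name `read_f1` is hypothetical.

```python
import json


def read_f1(report_text):
    """Return the F1 score from an evaluation report, insisting it is numeric."""
    value = json.loads(report_text)["evaluation"]["metrics"]["f1_score"]
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"f1_score must be a JSON number, got {type(value).__name__}"
        )
    return value


# Same score, written once as a number and once as a string.
good = json.dumps({"evaluation": {"metrics": {"f1_score": 0.94}}})
bad = json.dumps({"evaluation": {"metrics": {"f1_score": "0.94"}}})

passes_threshold = read_f1(good) >= 0.95  # numeric comparison
```

Here `read_f1(bad)` raises a `TypeError` instead of letting a string reach the comparison, and the numeric report correctly fails the 0.95 threshold.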

486
MCQmedium

Refer to the exhibit. A data engineer runs a Glue ETL job that uses a Python script. The job fails because of a missing module `scikit-learn`. Which fix is MOST appropriate?

A.Modify the script to install scikit-learn using pip at runtime
B.Add a --additional-python-modules argument to the job with scikit-learn
C.Switch to a Glue job using Spark instead of Python
D.Use a Glue Python shell job instead
AnswerD

Python shell jobs come with common libraries such as scikit-learn pre-installed and suit lightweight Python scripts. However, they are not designed for heavy ETL.

Why this answer

Option D is correct because a Glue Python shell job includes pre-installed libraries like scikit-learn, eliminating the missing module error without additional configuration. This job type is designed for lightweight Python scripts that do not require the distributed processing of Spark, making it the most appropriate fix for a simple dependency issue.

Exam trap

The trap here is that candidates assume all Glue jobs require Spark or that pip install at runtime is a valid workaround, but the exam expects you to recognize that Glue Python shell jobs are purpose-built for simple Python scripts and come with pre-installed ML libraries like scikit-learn.

How to eliminate wrong answers

Option A is wrong because modifying the script to install scikit-learn at runtime using pip is inefficient, may fail due to network restrictions or permission issues in the Glue environment, and violates best practices for dependency management. Option B is wrong because the --additional-python-modules argument is used with Glue Spark jobs, not Python shell jobs, and it requires specifying a compatible module version; it does not apply to the Python shell job type. Option C is wrong because switching to a Spark-based Glue job is an overengineered solution that introduces unnecessary complexity and cost for a simple Python script that does not require distributed data processing.

487
Multi-Selecteasy

A data science team is deploying a model on Amazon SageMaker and wants to protect the endpoint from unauthorized access. Which TWO methods can the team use to secure the endpoint? (Choose TWO.)

Select 2 answers
A.Configure the endpoint to be deployed within a VPC and control traffic using security groups and network ACLs.
B.Use a resource-based IAM policy on the endpoint to restrict invocation.
C.Place an Amazon API Gateway in front of the endpoint with AWS WAF.
D.Attach a security group directly to the SageMaker endpoint.
E.Use an IAM policy that requires authentication for the sagemaker:InvokeEndpoint action.
AnswersA, E

Deploying inside a VPC allows network-level access control.

Why this answer

Option A is correct because deploying a SageMaker endpoint within a VPC allows you to control inbound and outbound traffic using security groups and network ACLs, effectively restricting network-level access to the endpoint. This is a fundamental network security measure that prevents unauthorized network traffic from reaching the endpoint.

Exam trap

The trap here is that candidates often confuse resource-based IAM policies (which are not supported for SageMaker endpoints) with identity-based policies, or they assume that attaching a security group directly to an endpoint is possible without deploying it in a VPC.

488
MCQeasy

Refer to the exhibit. A data engineer runs a SageMaker processing job that fails. What is the MOST likely cause of the failure?

A.The processing instance type is too small.
B.The processing job code has a bug.
C.The S3 bucket is in a different region.
D.The input file does not exist at the specified S3 path.
E.The IAM role does not have s3:GetObject permission.
AnswerD

Correct. The error directly points to a missing file or incorrect path.

Why this answer

The failure reason explicitly states the input file cannot be read and advises checking the path or file existence.

489
MCQmedium

A company is using Amazon SageMaker to train a large deep learning model. The training job is taking a very long time. The data scientist suspects that the GPU utilization is low due to inefficient data loading. Which action should the data scientist take to diagnose and address this issue?

A.Switch to a CPU-only instance to reduce overhead.
B.Check GPU utilization using Amazon CloudWatch metrics, and if low, optimize the data loading pipeline by using Pipe mode or faster data formats.
C.Reduce the batch size to speed up training.
D.Increase the number of GPUs in the training instance.
AnswerB

Monitoring GPU utilization and optimizing data loading addresses the bottleneck.

Why this answer

Option B is correct because low GPU utilization during deep learning training often indicates a data loading bottleneck, where the GPU spends cycles waiting for data. Amazon CloudWatch provides GPU utilization metrics for SageMaker training jobs, and if utilization is low, optimizing the data pipeline with Pipe mode (streaming data directly from Amazon S3) or using faster data formats like RecordIO or TFRecord can reduce I/O overhead and keep the GPU busy.

Exam trap

The trap here is that candidates often assume adding more GPUs or reducing batch size will speed up training, but without addressing the data pipeline bottleneck, these changes can actually worsen GPU utilization and training time.

How to eliminate wrong answers

Option A is wrong because switching to a CPU-only instance would eliminate GPU acceleration entirely, making training even slower, and does not address the root cause of inefficient data loading. Option C is wrong because reducing the batch size typically decreases GPU utilization further, as the GPU processes fewer samples per step, increasing the relative overhead of data loading and model synchronization. Option D is wrong because increasing the number of GPUs does not fix a data loading bottleneck; it can actually exacerbate the issue by requiring even more data to be fed to multiple GPUs, potentially lowering per-GPU utilization further.

490
MCQmedium

Refer to the exhibit. A team observes that their SageMaker endpoint scales out quickly when load increases, but scales in very slowly when load decreases, causing over-provisioning. What is the most likely cause?

A.TargetValue is too high
B.ScaleOutCooldown is too low
C.ScaleInCooldown is too high
D.Wrong predefined metric selected
AnswerC

A high ScaleInCooldown delays scale-in responses.

Why this answer

Option C is correct because the ScaleInCooldown is 600 seconds (10 minutes), meaning the system waits 10 minutes after a scale-in activity before triggering another scale-in action. This delay causes slow scale-in. Option B would affect scale-out speed.

Option A relates to the target value. Option D is incorrect because the metric is appropriate.
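The exhibit is not reproduced here, so the following is only a sketch of what a target-tracking configuration with these cooldowns might look like. The key names follow the Application Auto Scaling target-tracking configuration; the 600-second scale-in cooldown comes from the explanation, and every other value is a hypothetical placeholder.

```python
def target_tracking_config(target_value, scale_in_cooldown, scale_out_cooldown):
    """Build a target-tracking configuration for a SageMaker endpoint variant."""
    return {
        "TargetValue": target_value,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
        },
        "ScaleInCooldown": scale_in_cooldown,    # seconds between scale-in actions
        "ScaleOutCooldown": scale_out_cooldown,  # seconds between scale-out actions
    }


# 600 s matches the slow scale-in described above; 120 s is a hypothetical fix.
slow_scale_in = target_tracking_config(70.0, 600, 60)
faster_scale_in = target_tracking_config(70.0, 120, 60)
```

A dictionary like this would be passed as `TargetTrackingScalingPolicyConfiguration` when registering the scaling policy; lowering only `ScaleInCooldown` leaves scale-out behavior untouched.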

491
Multi-Selectmedium

A machine learning engineer is preparing a dataset for a multiclass classification task. The dataset has 10 features and 100,000 rows. Which TWO techniques should the engineer use to reduce the risk of overfitting during data preparation?

Select 2 answers
A.Data augmentation (e.g., adding noise)
B.SMOTE to balance classes
C.One-hot encoding of all categorical features
D.Log transformation of skewed features
E.Feature selection using correlation analysis
AnswersA, E

Increases training data diversity, reducing overfitting.

Why this answer

Data augmentation (A) is correct because it artificially increases the diversity of the training set by adding noise or transformations, which helps the model generalize better and reduces overfitting. Feature selection using correlation analysis (E) is correct because it removes redundant or highly correlated features, simplifying the model and minimizing the risk of learning noise from irrelevant predictors.

Exam trap

AWS often tests the distinction between techniques that address overfitting versus those that handle other data issues like imbalance or skewness, leading candidates to confuse SMOTE or log transforms as overfitting remedies.
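Feature selection by correlation analysis can be sketched as dropping any feature that is nearly collinear with one already kept. This standard-library example uses Pearson correlation with a hypothetical 0.95 threshold; the feature names and values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.95):
    """Keep each feature unless it is highly correlated with one already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # near-duplicate of height_cm
    "age": [34, 21, 45, 29, 52],
}
selected = drop_correlated(features)
```

The redundant `height_in` column is removed while `age` survives, leaving the model fewer ways to fit noise.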

492
MCQmedium

An e-commerce company is building a recommendation system using user interaction data stored in Amazon DynamoDB. The data includes user_id, product_id, timestamp, event_type (click, add_to_cart, purchase), and session_id. The data science team exports the data to Amazon S3 as JSON files. During preprocessing, they discover that the 'event_type' field contains inconsistent values due to logging errors: 'Click', 'click', 'CLICK', and 'clck' all appear. Also, there are duplicate records where the same user_id, product_id, and timestamp appear multiple times with the same event_type. The team wants to use AWS Glue to clean the data for training a sequence-based recommendation model. Which set of actions should they perform?

A.Use AWS Glue to group records by session_id and aggregate event_types into a list per session. Then apply a mapping function to standardize event_type names.
B.Use AWS Glue to drop exact duplicate rows (all columns identical). Then apply a mapping function to standardize event_type to a controlled vocabulary (e.g., 'click', 'add_to_cart', 'purchase').
C.Use AWS Glue to drop duplicate records based on all columns. Then drop the event_type column and use only numeric features for training.
D.Use AWS Glue to impute event_type with the mode for records with inconsistent values. Then drop duplicate records based on user_id, product_id, and timestamp.
AnswerB

Deduplication removes redundant records, and mapping standardizes event_type, both essential for clean sequence data.

Why this answer

Option B is correct because it addresses both data quality issues: first, dropping exact duplicate rows (all columns identical) removes redundant records that would bias the sequence model; second, standardizing event_type to a controlled vocabulary ensures consistent categorical input for ML training. AWS Glue's DynamicFrame with DropDuplicates and Map transformations are the appropriate tools for this ETL task.

Exam trap

The trap here is that candidates may think grouping by session_id is necessary for sequence modeling, but the question asks for cleaning steps, not feature engineering—duplicate removal and standardization must come first to avoid propagating errors into the sequence aggregation.

How to eliminate wrong answers

Option A is wrong because grouping by session_id and aggregating event_types into a list per session loses the individual event timestamps and ordering, which are critical for sequence-based recommendation models. Option C is wrong because dropping the event_type column removes the target label for the recommendation model, and using only numeric features would discard the core behavioral signal. Option D is wrong because imputing event_type with the mode is inappropriate for categorical data with logging errors (e.g., 'clck' should be mapped to 'click', not replaced by the most frequent value), and dropping duplicates only on user_id, product_id, and timestamp may remove legitimate distinct events that differ in event_type.
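The two cleaning steps in Option B can be sketched in plain Python. This is not Glue code; it only illustrates the logic a Glue job would apply. The typo mapping (`clck` to `click`) and the helper names are assumptions for illustration.

```python
CANONICAL_EVENTS = {"click", "add_to_cart", "purchase"}

# Assumed mapping of known logging typos to canonical values.
KNOWN_TYPOS = {"clck": "click"}


def standardize_event_type(raw):
    """Lowercase the value, fix known typos, and reject anything unmapped."""
    value = raw.strip().lower()
    value = KNOWN_TYPOS.get(value, value)
    if value not in CANONICAL_EVENTS:
        raise ValueError(f"unmapped event_type: {raw!r}")
    return value


def clean_events(rows):
    """Drop exact duplicate rows first, then standardize event_type."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(sorted(row.items()))  # all columns identical => same key
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**row, "event_type": standardize_event_type(row["event_type"])})
    return cleaned


rows = [
    {"user_id": "u1", "product_id": "p1", "timestamp": 100, "event_type": "Click", "session_id": "s1"},
    {"user_id": "u1", "product_id": "p1", "timestamp": 100, "event_type": "Click", "session_id": "s1"},
    {"user_id": "u1", "product_id": "p1", "timestamp": 105, "event_type": "clck", "session_id": "s1"},
    {"user_id": "u1", "product_id": "p2", "timestamp": 110, "event_type": "purchase", "session_id": "s1"},
]
cleaned = clean_events(rows)
```

Because the sketch follows Option B's order, rows that become identical only after standardization (for example 'Click' and 'click' with the same keys) would need a second deduplication pass.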

493
MCQmedium

A machine learning engineer is developing a text classification model using Amazon SageMaker. The dataset consists of 1 million customer reviews, with labels indicating sentiment (positive, negative, neutral). The engineer uses a pre-trained BERT model from the Hugging Face Model Hub and fine-tunes it on the dataset using SageMaker's Hugging Face estimator with an ml.p3.2xlarge instance. After 2 hours of training, the training job fails with a 'ResourceExhaustedError: CUDA out of memory' error. The error occurs during the forward pass of the first epoch. The engineer confirms that the batch size is set to 32, the maximum sequence length is 512 tokens, and the dataset is stored in an S3 bucket in the same AWS region. The engineer needs to complete fine-tuning without increasing instance costs. Which course of action should the engineer take?

A.Reduce the batch size to 8 and enable gradient accumulation with 4 steps to maintain effective batch size.
B.Enable SageMaker Managed Spot Training to reduce costs and use the savings to upgrade to a ml.p3.8xlarge instance.
C.Switch to a CPU-based instance like ml.c5.2xlarge to avoid GPU memory constraints.
D.Reduce the maximum sequence length to 128 tokens to lower memory consumption.
AnswerA

Reducing batch size lowers GPU memory usage, and gradient accumulation allows the model to see the same number of samples per update without increasing memory.

Why this answer

Option A is correct because reducing the batch size to 8 directly lowers GPU memory usage per forward pass, and enabling gradient accumulation with 4 steps allows the model to simulate the original effective batch size of 32 (8 × 4 = 32) without increasing memory footprint. This approach resolves the CUDA out-of-memory error while keeping the same instance type (ml.p3.2xlarge) and without incurring additional costs.

Exam trap

The trap here is that candidates may think reducing sequence length (Option D) is the simplest fix, but they overlook that it can severely impact model performance for sentiment analysis on long reviews, while gradient accumulation (Option A) is the standard technique to handle GPU memory limits without sacrificing batch size or accuracy.

How to eliminate wrong answers

Option B is wrong because upgrading to a ml.p3.8xlarge instance increases costs (it has 4× the GPU memory and is more expensive per hour), and Managed Spot Training only reduces cost but does not change the instance type; the engineer explicitly needs to avoid increasing instance costs. Option C is wrong because switching to a CPU-based instance (ml.c5.2xlarge) would dramatically increase training time for a BERT model (which relies on GPU parallelism) and may still run out of memory for sequence length 512, while also violating the requirement to complete fine-tuning efficiently. Option D is wrong because reducing the maximum sequence length to 128 tokens would truncate input texts, potentially losing critical context in customer reviews and degrading model accuracy; the engineer needs to maintain model quality while fixing the memory error.
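The arithmetic behind gradient accumulation (8 × 4 = 32) can be checked with a toy example. The sketch below uses a one-parameter linear model in plain Python rather than BERT or a deep learning framework; the model, data, and function names are assumptions chosen only to show that averaging four micro-batch gradients reproduces the full-batch gradient.

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error of y ~ w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, micro_batch_size):
    """Average the gradients of equal-sized micro-batches before one update."""
    micro_batches = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    total = 0.0
    for micro in micro_batches:
        # Only one micro-batch is processed at a time, which is what keeps
        # peak memory at the micro-batch level in a real training loop.
        total += mse_gradient(w, micro) / len(micro_batches)
    return total, len(micro_batches)


# 32 synthetic samples, standing in for one batch of size 32.
data = [(float(i), 3.0 * i + 1.0) for i in range(32)]

full = mse_gradient(0.5, data)                      # one pass over all 32 samples
accum, steps = accumulated_gradient(0.5, data, 8)   # 4 passes over 8 samples each
```

The equivalence holds here because the micro-batches are equal in size and the loss is a mean over samples; layers whose behavior depends on batch statistics would not be exactly equivalent.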

494
MCQhard

A team is using AWS Glue to process streaming data from Amazon Kinesis. The streaming data contains both structured and semi-structured fields. The team needs to flatten the semi-structured fields into columns for downstream ML training. Which Glue feature is BEST suited?

A.Relationalize transform
B.Spigot transform
C.ResolveChoice transform
D.ApplyMapping transform
AnswerA

Relationalize recursively flattens nested data into separate tables or columns.

Why this answer

The Relationalize transform is specifically designed to flatten nested JSON or semi-structured fields into a relational structure, making it ideal for converting complex streaming data from Kinesis into flat columns for ML training. It automatically handles arrays and structs by creating separate tables or columns, which is exactly what the team needs for downstream processing.

Exam trap

The trap here is that candidates confuse 'flattening semi-structured data' with simple schema operations like type resolution or column mapping, leading them to choose ResolveChoice or ApplyMapping instead of the specialized Relationalize transform.

How to eliminate wrong answers

Option B is wrong because the Spigot transform is used to sample or write a subset of data to a specified location for debugging or testing, not for flattening semi-structured fields. Option C is wrong because the ResolveChoice transform resolves ambiguity when a column has multiple data types (e.g., string vs. int) by casting to a chosen type, but it does not flatten nested structures. Option D is wrong because the ApplyMapping transform renames, casts, or drops columns based on a mapping specification, but it cannot flatten nested JSON or semi-structured data into separate columns.
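As a rough illustration of what flattening means, the sketch below turns nested structs into top-level columns in plain Python. It is not the Glue Relationalize API; it handles nested dicts only, leaves arrays untouched, and the dotted column naming is an assumption made for this sketch.

```python
def flatten_struct(record, parent_key="", sep="."):
    """Recursively lift nested dict fields into top-level keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Nested struct: recurse and merge its fields into the flat record.
            flat.update(flatten_struct(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical streaming record with one structured and one nested field.
event = {"id": 7, "device": {"os": "ios", "geo": {"country": "DE"}}}
flat_event = flatten_struct(event)
```

The result has one column per leaf field, which is the shape downstream ML training usually expects.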

495
MCQhard

A company uses SageMaker endpoints with auto-scaling based on CPU utilization. During a flash sale, latency increases despite low CPU. What should be done?

A.Use a custom metric such as memory utilization or request count for auto-scaling
B.Increase the instance size
C.Disable auto-scaling and use a larger instance
D.Switch to GPU instances
AnswerA

Custom metrics can better capture the actual load and scale appropriately.

Why this answer

Option A is correct because CPU utilization is a poor scaling metric for inference workloads that are I/O or memory-bound. During a flash sale, increased request concurrency can cause queuing and latency spikes even when CPU is low. Using a custom metric like request count per instance or memory utilization directly reflects the load on the inference endpoint, enabling the Application Auto Scaling target tracking policy to scale out proactively before latency degrades.

Exam trap

The trap here is that candidates assume CPU utilization is always the best scaling metric for compute-bound workloads, but the MLA-C01 exam specifically tests the understanding that inference endpoints can be I/O-bound, making request count or memory utilization more appropriate for auto-scaling.

How to eliminate wrong answers

Option B is wrong because increasing the instance size does not address the root cause—auto-scaling is not triggering due to an inappropriate metric; it merely shifts the bottleneck to a larger instance without solving the scaling policy issue. Option C is wrong because disabling auto-scaling removes elasticity entirely, which is counterproductive for handling unpredictable traffic spikes like a flash sale; a static larger instance will either be over-provisioned or still suffer latency under extreme load. Option D is wrong because GPU instances are designed for compute-heavy workloads like deep learning inference, not for resolving latency caused by request queuing or I/O bottlenecks; they add cost without fixing the scaling metric problem.
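A target tracking policy on a custom metric replaces the predefined metric block with a `CustomizedMetricSpecification`. The sketch below only builds that configuration as a dict. The endpoint name, variant name, target value of 60, and the choice of the `MemoryUtilization` metric in the `/aws/sagemaker/Endpoints` namespace are assumptions to verify against your own CloudWatch metrics.

```python
def build_custom_metric_config(endpoint_name, variant_name, metric_name, target_value):
    """Return a target tracking config that scales on a CloudWatch metric."""
    return {
        "TargetValue": float(target_value),
        "CustomizedMetricSpecification": {
            "MetricName": metric_name,
            # Assumed namespace for endpoint instance metrics.
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint_name},
                {"Name": "VariantName", "Value": variant_name},
            ],
            "Statistic": "Average",
        },
    }


# Hypothetical endpoint and variant names.
config = build_custom_metric_config(
    "flash-sale-endpoint", "AllTraffic", "MemoryUtilization", 60
)
```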

496
MCQmedium

A data science team has trained a model using SageMaker and wants to deploy it for real-time inference with automatic scaling based on request latency. The deployment must handle unpredictable traffic spikes without manual intervention. Which combination of SageMaker features should the team use?

A.Create a SageMaker endpoint with an Application Auto Scaling target tracking policy based on the SageMakerVariantInvocationsPerInstance metric
B.Deploy the model on a multi-model endpoint and manually adjust the number of instances via the AWS Management Console
C.Deploy the model on an Elastic Inference accelerator and use AWS Auto Scaling with a scheduled policy
D.Create a batch transform job with a scheduled Lambda function to trigger scaling
AnswerA

SageMaker endpoints support Application Auto Scaling with target tracking on invocations per instance, handling spikes.

Why this answer

Option A is correct because it uses a SageMaker endpoint with an Application Auto Scaling target tracking policy based on the SageMakerVariantInvocationsPerInstance metric. This allows the endpoint to automatically scale the number of instances as the load per instance changes; invocations per instance act as a proxy for the load that drives request latency. The target tracking policy adjusts capacity to maintain a target value for the metric, handling unpredictable traffic spikes without manual intervention.

Exam trap

The trap here is that candidates may confuse automatic scaling with manual adjustments or batch processing, or mistakenly think that Elastic Inference or scheduled policies can handle real-time, unpredictable traffic spikes, when only a target tracking policy on a SageMaker endpoint provides the required dynamic, latency-aware scaling.

How to eliminate wrong answers

Option B is wrong because manually adjusting instances via the AWS Management Console does not provide automatic scaling, which is required to handle unpredictable traffic spikes without manual intervention. Option C is wrong because Elastic Inference accelerators are used to reduce the cost of deep learning inference by attaching a fraction of GPU power to an instance, not for scaling based on latency; AWS Auto Scaling with a scheduled policy is not suitable for unpredictable spikes as it relies on predefined schedules. Option D is wrong because a batch transform job is designed for offline, asynchronous inference on large datasets, not for real-time inference, and a scheduled Lambda function cannot dynamically scale based on real-time latency metrics.

497
MCQmedium

A company uses SageMaker for training and inference. They have a model that retrains weekly. After each retraining, the model is evaluated on a held-out test set. If the evaluation metrics meet a threshold, the model is registered as 'Approved' in the SageMaker Model Registry. The team manually deploys the approved model to a production endpoint. They want to automate this deployment process to reduce manual errors. However, the deployment should only proceed if the new model passes a canary test in a staging environment. Which combination of AWS services should the team use to achieve this?

A.AWS CodeDeploy with a blue/green deployment strategy.
B.SageMaker Pipelines with a conditional deployment step that includes a canary test.
C.AWS Lambda to deploy to staging, then automatically promote to production if staging tests pass.
D.Amazon EKS with a custom inference container and use ArgoCD for automated deployments.
AnswerB

Pipelines natively support conditional steps, so promotion to production can be gated on the result of a canary test in staging.

Why this answer

SageMaker Pipelines natively supports conditional execution steps, allowing you to add a canary test step that evaluates the new model in a staging environment before automatically promoting it to production. This directly addresses the requirement for automated deployment gated by a canary test, without needing external orchestration services.

Exam trap

The trap here is that candidates may overthink the solution and choose a generic CI/CD tool like CodeDeploy or Lambda, missing that SageMaker Pipelines already provides a fully managed, ML-specific orchestration with conditional deployment and canary testing capabilities.

How to eliminate wrong answers

Option A is wrong because AWS CodeDeploy with blue/green deployment is a general-purpose deployment service for EC2, Lambda, or ECS, not integrated with SageMaker Model Registry or SageMaker endpoints, and lacks native canary testing for ML models. Option C is wrong because using AWS Lambda to deploy to staging and then promote to production would require custom code to manage the canary test logic, state tracking, and rollback, which is less reliable and maintainable than SageMaker Pipelines' built-in conditional steps. Option D is wrong because Amazon EKS with ArgoCD is designed for Kubernetes container orchestration, not for managing SageMaker endpoints or Model Registry, and introduces unnecessary complexity for a SageMaker-native workflow.

498
MCQmedium

A data scientist runs the exhibit AWS Glue ETL job. The job fails with a Spark stage failure error. What is the most likely cause?

A.The output path is missing.
B.The S3 bucket does not exist.
C.The job does not have enough memory.
D.The data type mapping in ApplyMapping is incorrect; "value" column contains non-numeric strings that cannot be cast to double.
AnswerD

Casting string to double fails on non-numeric data, causing task failure.

Why this answer

The Spark stage failure error in an AWS Glue ETL job is most likely caused by a data type mismatch during the ApplyMapping transformation. When the 'value' column contains non-numeric strings that cannot be cast to double, Spark throws a stage failure because it cannot complete the required type conversion, leading to task failures and job termination.

Exam trap

The trap here is that candidates often attribute Spark stage failures to resource issues (memory or missing paths) rather than recognizing that data type casting errors during transformations are a primary cause of stage-level failures in Glue ETL jobs.

How to eliminate wrong answers

Option A is wrong because a missing output path would cause a different error, such as 'Path does not exist' or 'FileNotFoundException', not a Spark stage failure. Option B is wrong because a non-existent S3 bucket would result in an 'AccessDenied' or 'NoSuchBucket' error at the job start, not during a Spark stage. Option C is wrong because insufficient memory typically manifests as an 'OutOfMemoryError' or 'Container killed by YARN' error, not a generic stage failure; stage failures are more commonly tied to data processing errors like type casting issues.
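One way to confirm a casting problem before rerunning the job is to scan a sample of the column for values that cannot be parsed as numbers. The sketch below does this in plain Python. The sample values are invented, and Python's `float` parsing rules are only an approximation of how a Spark cast behaves.

```python
def find_uncastable(values):
    """Return (index, value) pairs that cannot be parsed as a float."""
    bad = []
    for index, value in enumerate(values):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append((index, value))
    return bad


# Hypothetical sample of the "value" column.
sample = ["1.5", "42", "N/A", "", "3e2"]
bad_rows = find_uncastable(sample)
```

Rows flagged this way can be filtered or mapped to a default before the string-to-double mapping is applied.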

499
MCQhard

A company has a SageMaker endpoint running a model that provides real-time recommendations. Recently, the model's accuracy has degraded due to data drift. The team wants to automatically retrain the model when a drift metric exceeds a threshold and deploy the new model without downtime. Which architecture should the team implement?

A.Use SageMaker Model Monitor to collect drift metrics, and have a data scientist manually analyze the metrics and trigger retraining via the SageMaker console
B.Use SageMaker Model Monitor to trigger an Amazon EventBridge event that starts a SageMaker Pipeline, which retrains the model, registers it in the Model Registry, and then updates the existing endpoint with a new production variant
C.Schedule a daily SageMaker Pipeline that retrains the model and deploys it using a new endpoint, then updates the application to point to the new endpoint
D.Use SageMaker Model Monitor to publish drift metrics to Amazon CloudWatch, and create a CloudWatch alarm that triggers an AWS Lambda function to retrain and deploy the model
AnswerB

EventBridge triggers pipeline on drift; pipeline retrains, registers, and uses production variant to shift traffic gradually with no downtime.

Why this answer

Option B is correct because it uses SageMaker Model Monitor to detect data drift and emit an EventBridge event, which triggers a SageMaker Pipeline to retrain the model, register it in the Model Registry, and then update the existing endpoint with a new production variant. This architecture enables automatic retraining and zero-downtime deployment by leveraging the endpoint's production variants for a blue/green deployment.

Exam trap

AWS often tests the distinction between automatic drift-triggered retraining with zero-downtime deployment (Option B) versus scheduled retraining or manual intervention, and candidates may overlook the need to update the existing endpoint rather than creating a new one.

How to eliminate wrong answers

Option A is wrong because it relies on manual analysis and triggering, which does not meet the requirement for automatic retraining. Option C is wrong because scheduling a daily pipeline ignores the data drift trigger and deploys a new endpoint instead of updating the existing one, causing downtime or requiring application changes to point to the new endpoint. Option D is wrong because while it uses CloudWatch alarms and Lambda for automation, it lacks the integration with SageMaker Model Registry and the ability to update the existing endpoint with a new production variant, potentially causing downtime or manual intervention.

500
MCQmedium

A data scientist is exploring data stored in an Amazon Redshift cluster. The data includes timestamp columns with different formats. The scientist wants to create a new column that standardizes the timestamp format to UTC. Which approach is MOST efficient?

A.Use AWS Glue to read the Redshift table and apply a custom transform
B.Use a SELECT with CONVERT_TIMEZONE in Redshift and export to S3
C.Use a SageMaker notebook to query Redshift and transform
D.Use Amazon QuickSight to transform the timestamp
AnswerB

CONVERT_TIMEZONE is a built-in Redshift function that efficiently converts timestamps.

Why this answer

Option B is correct because `CONVERT_TIMEZONE` in Amazon Redshift is a native SQL function that directly converts timestamps to UTC without moving data outside the cluster. This approach avoids the overhead of external services, leverages Redshift's massively parallel processing (MPP) engine, and is the most efficient for in-database transformations.

Exam trap

The trap here is that candidates assume external ETL tools (Glue, SageMaker) are always necessary for complex transforms, overlooking Redshift's powerful built-in SQL functions that can perform the same task with zero data egress.

How to eliminate wrong answers

Option A is wrong because AWS Glue would require reading the entire Redshift table into a separate Spark environment, adding network latency and compute costs, which is far less efficient than a native SQL transform. Option C is wrong because a SageMaker notebook would need to query Redshift via a JDBC/ODBC connection, pulling data into the notebook's memory for transformation, introducing unnecessary data movement and serialization overhead. Option D is wrong because Amazon QuickSight is a visualization and dashboarding service, not a data transformation engine; it cannot create new columns or modify schemas in Redshift.

501
MCQmedium

A healthcare company deploys a model that predicts patient readmission risk. The model is deployed using a SageMaker real-time endpoint with data capture enabled. The compliance team requires that all inference data be encrypted at rest in S3 using AWS KMS with a customer managed key. The team has configured the endpoint to use an IAM role that includes the necessary KMS permissions. However, after deployment, the captured data is not being written to the S3 bucket. The team checks the CloudWatch logs for the endpoint and finds no errors. The S3 bucket policy is as follows: { "Version": "2012-10-17", "Statement": [ { "Effect": "Deny", "Principal": "*", "Action": "s3:PutObject", "Resource": "arn:aws:s3:::my-bucket/*", "Condition": { "Bool": { "aws:SecureTransport": "false" } } } ] } The bucket also has a default KMS key. What is the MOST likely reason that the captured data is not being written?

A.The bucket policy includes an explicit deny that overrides any allow.
B.The bucket policy denies all PutObject requests because aws:SecureTransport is false.
C.The KMS key policy does not grant the SageMaker execution role the kms:GenerateDataKey permission.
D.The S3 bucket does not exist.
AnswerC

Even if the IAM role has KMS permissions, the key policy might not allow the role to use the key for encryption.

Why this answer

The correct answer is C because SageMaker data capture encrypts captured data at rest in S3 using server-side encryption with AWS KMS (SSE-KMS). When a customer managed KMS key is used, the SageMaker execution role must have the kms:GenerateDataKey permission to encrypt the data before writing it to S3. Even if the IAM role has other KMS permissions, without kms:GenerateDataKey, the data capture write operation fails silently, and CloudWatch logs may not show errors because the failure occurs at the KMS encryption step before the S3 PutObject call.

Exam trap

The trap here is that candidates focus on the S3 bucket policy's explicit Deny and assume it blocks all writes, but they overlook the condition key aws:SecureTransport, which makes the Deny only apply to non-HTTPS requests, and they miss the subtle KMS permission requirement for data capture encryption.

How to eliminate wrong answers

Option A is wrong because the bucket policy does not contain an explicit deny that overrides all allows; the Deny statement only applies when aws:SecureTransport is false, which is a condition that is not met (the request uses HTTPS). Option B is wrong because the bucket policy denies PutObject only when aws:SecureTransport is false, but SageMaker data capture uses HTTPS (SecureTransport is true), so the Deny does not apply. Option D is wrong because if the S3 bucket did not exist, SageMaker would log an error in CloudWatch logs (e.g., NoSuchBucket), but the question states no errors are found in the logs.
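The reasoning about the Deny condition can be checked with a toy evaluator. The sketch below parses the bucket policy from the question and tests only the effect, action, and `aws:SecureTransport` condition; it ignores principals and resources and is in no way a full IAM policy evaluator.

```python
import json

# Bucket policy copied from the question.
BUCKET_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*",
      "Condition": {"Bool": {"aws:SecureTransport": "false"}}
    }
  ]
}
""")


def deny_applies(statement, action, secure_transport):
    """Toy check: does this Deny statement match the given request?"""
    if statement["Effect"] != "Deny" or statement["Action"] != action:
        return False
    expected = statement["Condition"]["Bool"]["aws:SecureTransport"]
    # The condition matches only when the request's value equals "false".
    return str(secure_transport).lower() == expected


statement = BUCKET_POLICY["Statement"][0]
https_denied = deny_applies(statement, "s3:PutObject", secure_transport=True)
http_denied = deny_applies(statement, "s3:PutObject", secure_transport=False)
```

Under these simplifications the Deny matches only the plain-HTTP request, which is why Options A and B can be ruled out.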

502
Multi-Selecteasy

A company wants to deploy a trained model to a SageMaker endpoint with automatic scaling based on traffic. Which TWO configurations are required? (Choose two.)

Select 2 answers
A.Use a multi-model endpoint
B.Enable data capture
C.Set up an Application Auto Scaling policy
D.Configure a lifecycle configuration
E.Create a CloudWatch alarm
AnswersC, E

Auto Scaling policy defines how to scale the endpoint.

Why this answer

Option C is correct because Application Auto Scaling is the AWS service that automatically adjusts the number of instances for a SageMaker endpoint based on demand. You define a scaling policy (e.g., target tracking, step scaling) that tells Auto Scaling when to add or remove instances, which is essential for handling variable traffic without manual intervention.

Exam trap

The trap here is that candidates often confuse 'required configurations for scaling' with 'optional features that improve monitoring or cost efficiency,' leading them to select data capture or multi-model endpoints instead of recognizing that a CloudWatch alarm is the trigger mechanism for the scaling policy.

503
MCQmedium

Refer to the exhibit. A data scientist tries to deploy a model from an S3 bucket encrypted with SSE-KMS. What should the administrator do to resolve this?

A.Change the model artifact encryption to SSE-S3.
B.Add kms:Decrypt permission to the SageMaker execution role for the KMS key.
C.Re-upload the model artifact without encryption.
D.Attach the AWS managed policy 'AmazonSageMakerFullAccess' to the role.
AnswerB

This directly addresses the missing permission.

Why this answer

The error indicates the execution role lacks kms:Decrypt permission on the KMS key used to encrypt the model artifact. Adding this permission resolves the issue.
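A minimal sketch of the kind of policy statement that grants this permission is shown below, built as a Python dict and serialized to JSON. The exhibit is not reproduced here, so the key ARN is a placeholder and the helper name is hypothetical.

```python
import json


def kms_decrypt_statement(key_arn):
    """Build an Allow statement for kms:Decrypt on a single key."""
    return {
        "Effect": "Allow",
        "Action": "kms:Decrypt",
        "Resource": key_arn,
    }


# Placeholder ARN; substitute the key that encrypts the model artifact.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"

policy = {
    "Version": "2012-10-17",
    "Statement": [kms_decrypt_statement(KEY_ARN)],
}
policy_json = json.dumps(policy, indent=2)
```

Scoping `Resource` to the one key, rather than `*`, keeps the execution role's access limited to the artifact's encryption key.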

504
MCQeasy

A company wants to audit all API calls made to SageMaker endpoints for security compliance. Which AWS service should they enable?

A.AWS CloudTrail
B.Amazon GuardDuty
C.AWS Config
D.Amazon Macie
AnswerA

CloudTrail records API calls for auditing.

Why this answer

AWS CloudTrail records all API calls for auditing. GuardDuty is for threat detection, Macie for sensitive data discovery, and Config for configuration changes.

505
Multi-Selecteasy

A data engineer is using SageMaker Pipelines to automate data preparation. Which TWO statements about data validation within a pipeline are correct?

Select 2 answers
A.The pipeline can be configured to fail if data quality checks do not meet thresholds
B.SageMaker Pipelines has a built-in 'CheckDataQuality' step for data validation
C.Data validation can only be performed on training data, not inference data
D.Data validation steps cannot pass results to subsequent steps
E.Data validation requires a trained model to evaluate predictions
AnswersA, B

You can set conditions to fail the pipeline.

Why this answer

Option A is correct because SageMaker Pipelines allows you to define conditions that evaluate the output of data quality checks (e.g., using Amazon SageMaker Model Monitor or custom validation scripts). If the checks fail to meet specified thresholds (e.g., missing values exceed 5%), the pipeline can be configured to fail, stopping execution and preventing downstream steps from processing invalid data.

Exam trap

The trap here is that candidates assume data validation requires a trained model or is limited to training data, but SageMaker Pipelines supports rule-based validation on any dataset, including inference data, without needing a model.
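The rule-based check described above (fail when missing values exceed 5%) needs no trained model. The sketch below shows the logic in plain Python, as a custom validation script might implement it. The column name, sample rows, and function names are assumptions, and this is not the SageMaker Pipelines API.

```python
def missing_ratio(rows, column):
    """Fraction of rows where the column is absent, None, or empty."""
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def check_data_quality(rows, column, max_missing_ratio=0.05):
    """Raise if the missing ratio exceeds the threshold, else return it."""
    ratio = missing_ratio(rows, column)
    if ratio > max_missing_ratio:
        # Raising here is what would stop the pipeline in a real validation step.
        raise ValueError(
            f"{column}: missing ratio {ratio:.2%} exceeds {max_missing_ratio:.2%}"
        )
    return ratio


# Hypothetical datasets: one clean, one with 10% missing values.
clean_rows = [{"age": 30}] * 10
dirty_rows = [{"age": 30}] * 9 + [{"age": None}]

ok_ratio = check_data_quality(clean_rows, "age")
```

Calling `check_data_quality(dirty_rows, "age")` raises `ValueError`, which a pipeline step would surface as a failed execution.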

506
MCQmedium

A team is deploying a machine learning model using Amazon SageMaker. They need to serve predictions with sub-100ms latency for a real-time application. The model is a large ensemble that requires 4 GB of memory. The team expects traffic of 100 requests per second initially, but it may double during peak hours. Which instance type and deployment configuration should the team choose to minimize cost while meeting the latency requirement?

A.Deploy on one ml.c5.large instance with an Application Auto Scaling target tracking policy based on memory utilization
B.Deploy on one ml.t2.medium instance with an Application Auto Scaling target tracking policy based on CPU utilization
C.Deploy on one ml.p3.2xlarge instance with provisioned concurrency
D.Deploy on two ml.m5.large instances behind a load balancer with manual scaling
AnswerA

ml.c5.large has 4 GB memory, suitable; one instance can handle 100 RPS; auto-scaling handles peak.

Why this answer

Option A is correct because the ml.c5.large instance provides 4 GB of memory, which meets the model's requirement, and its compute-optimized nature ensures low-latency inference. Using Application Auto Scaling with a target tracking policy based on memory utilization allows the instance to scale out during traffic spikes (up to 200 requests per second) while minimizing cost by running a single instance during normal load.

Exam trap

The trap here is that candidates often choose GPU instances (like p3) for any 'large' model, but the question specifies memory and latency requirements, not GPU compute needs, and they overlook that burstable instances (t2) cannot sustain low latency under continuous load due to CPU credit exhaustion.

How to eliminate wrong answers

Option B is wrong because the ml.t2.medium instance also has 4 GB of memory but uses burstable CPU (t2 series), which cannot sustain sub-100ms latency under sustained load due to CPU credit exhaustion, especially at 100-200 requests per second. Option C is wrong because the ml.p3.2xlarge instance is a GPU-accelerated instance that is over-provisioned and costly for this memory-bound ensemble model, and provisioned concurrency applies to AWS Lambda and SageMaker serverless endpoints, not to instance-based real-time endpoints. Option D is wrong because deploying two ml.m5.large instances (each with 8 GB memory) behind a load balancer with manual scaling is over-provisioned for the initial 100 requests per second, increasing cost unnecessarily, and manual scaling cannot handle peak traffic without manual intervention.

507
MCQhard

A company is deploying a real-time inference endpoint for a natural language processing model using Amazon SageMaker. The model requires GPU acceleration and must handle variable traffic patterns, including sudden spikes. The team wants to minimize costs while maintaining low latency during spikes. Which endpoint configuration strategy should they use?

A.Use a single large GPU instance with provisioned concurrency.
B.Use a serverless endpoint with GPU support.
C.Use a single GPU instance in multiple Availability Zones with an Application Load Balancer.
D.Use a multi-model endpoint on a GPU instance with Auto Scaling based on invocation count.
AnswerD

Multi-model endpoints share instances across models, and Auto Scaling adjusts capacity for spikes.

Why this answer

Option D is correct because a multi-model endpoint on a GPU instance with Auto Scaling based on invocation count allows multiple models to share a single GPU, maximizing utilization and reducing cost. Auto Scaling based on invocation count dynamically adjusts the number of instances to handle traffic spikes while maintaining low latency, as it scales out quickly when the invocation count exceeds a threshold.

Exam trap

The trap here is that candidates assume serverless endpoints support GPU acceleration, but SageMaker serverless endpoints are CPU-only, making Option B invalid despite its cost-saving appeal.

How to eliminate wrong answers

Option A is wrong because a single large GPU instance with provisioned concurrency does not scale to handle sudden spikes; provisioned concurrency pre-warms instances but does not add more instances during a spike, leading to latency increases or throttling. Option B is wrong because serverless endpoints with GPU support are not available in SageMaker; serverless endpoints only support CPU instances, so they cannot meet the GPU acceleration requirement. Option C is wrong because using a single GPU instance in multiple Availability Zones with an Application Load Balancer does not provide horizontal scaling; it only adds redundancy across zones, but a single instance cannot handle spikes in traffic without Auto Scaling to add more instances.
