CCNA ML Modeling Questions

75 of 624 questions · ML Modeling topic

451
MCQ · hard

A team deployed a SageMaker endpoint for real-time inference using a PyTorch model. After monitoring, they notice that the latency is highly variable, with p99 latency 10x the p50 latency. The endpoint uses a single ml.c5.2xlarge instance with auto-scaling based on average CPU utilization. Which change is most likely to reduce latency variability?

A. Increase the batch size for inference
B. Pre-warm the model by sending dummy requests every minute
C. Switch to a GPU instance type
D. Change the auto-scaling metric to 'InvocationsPerInstance'
Answer: D

Scaling on invocations per instance prevents overload and reduces queueing.

Why this answer

Option D is correct because high p99 latency often results from cold starts or queueing when traffic spikes. Scaling on invocations per instance adds instances as request volume rises, so fewer requests wait in a queue, whereas average CPU utilization can lag behind sudden bursts of requests. Option C (GPU) may not help if the model is CPU-bound.

Option A (larger batch size) can increase per-request latency. Option B (warm-up requests) helps with cold starts but not with queueing.
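As a sketch of what "change the auto-scaling metric" means in practice, the function below builds the keyword arguments for Application Auto Scaling's `put_scaling_policy` with the predefined `SageMakerVariantInvocationsPerInstance` metric. The endpoint name, variant name, target value, and cooldowns are hypothetical placeholders, and it assumes the variant was already registered as a scalable target with a maximum capacity above 1 (the scenario starts from a single instance).

```python
def invocations_scaling_policy(endpoint_name: str, variant_name: str,
                               target_invocations: float) -> dict:
    """Build kwargs for application-autoscaling put_scaling_policy (hypothetical names/values)."""
    return {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Invocations per instance per minute the policy tries to hold.
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds; assumed value
            "ScaleInCooldown": 300,   # seconds; assumed value
        },
    }


kwargs = invocations_scaling_policy("churn-endpoint", "AllTraffic", 100.0)
print(kwargs["ResourceId"])
# With AWS credentials and a registered scalable target, the call would be:
# import boto3
# boto3.client("application-autoscaling").put_scaling_policy(**kwargs)
```

The API call itself is left commented out so the sketch runs without AWS access; only the request shape is shown.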

452
MCQ · hard

A company is building a sentiment analysis model using Amazon SageMaker BlazingText. The training data consists of 100,000 product reviews. The data scientist wants to use the Word2Vec algorithm to generate word embeddings. Which configuration is required to use the continuous bag-of-words (CBOW) architecture?

A. Set the mode parameter to 'supervised'.
B. Set the mode parameter to 'batch_skipgram'.
C. Set the mode parameter to 'cbow'.
D. Set the mode parameter to 'skipgram'.
Answer: C

The 'cbow' mode enables the continuous bag-of-words architecture in BlazingText.

Why this answer

In BlazingText, the 'mode' parameter controls the training objective. Setting 'mode' to 'cbow' enables the continuous bag-of-words architecture. 'skipgram' is for skip-gram. 'batch_skipgram' is for large-scale skip-gram. 'supervised' is for text classification.
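To make the CBOW objective concrete, the sketch below builds the (context, target) pairs that CBOW trains on: the surrounding words within a window are used to predict the center word (skip-gram inverts this and predicts each context word from the center word). The function name and window size are illustrative assumptions; this is not BlazingText's internal implementation.

```python
def cbow_pairs(tokens, window):
    """Return (context_words, target_word) pairs as a CBOW model would see them."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        # Every word in the window except the target itself is context.
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs


tokens = "great battery poor screen".split()
for context, target in cbow_pairs(tokens, window=1):
    print(context, "->", target)
```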

453
MCQ · hard

A data scientist is building a binary classification model to predict customer churn. The dataset is highly imbalanced, with only 5% of customers churning. The scientist evaluates several models using accuracy, precision, recall, and F1 score. Which metric is most appropriate for comparing model performance in this scenario?

A. Accuracy
B. F1 score
C. Precision
D. Recall
Answer: B

F1 score balances precision and recall, making it suitable for imbalanced datasets.

Why this answer

F1 score is the harmonic mean of precision and recall and is suitable for imbalanced datasets where accuracy can be misleading. Accuracy would be 95% even for a model that never predicts churn. Precision and recall each capture only one side of the trade-off, while F1 balances both.
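The "95% accuracy with no churn predicted" point can be checked directly. This sketch computes the four metrics for a hypothetical model that always predicts "no churn" on data with 5% churners; precision is reported as 0.0 when there are no positive predictions, which is a convention assumed here.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # no positive predictions -> 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


y_true = [1] * 5 + [0] * 95   # 5% churners
y_pred = [0] * 100            # "never churns" model
print(binary_metrics(y_true, y_pred))  # (0.95, 0.0, 0.0, 0.0)
```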

454
Multi-Select · easy

A data scientist is building a binary classification model to predict customer churn. The dataset has 10,000 samples with 500 churners (positive class). Which TWO techniques should be used to address the class imbalance? (Choose 2.)

Select 2 answers
A. Use a higher learning rate during training
B. Use L1 regularization on the model
C. Use random undersampling of the majority class
D. Use SMOTE to generate synthetic samples for the minority class
E. Use principal component analysis (PCA) to reduce dimensionality
Answers: C, D

Undersampling reduces majority class samples, balancing the dataset.

Why this answer

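SMOTE (D) and random undersampling (C) are standard techniques for handling class imbalance: one adds synthetic minority samples, the other removes majority samples. A higher learning rate (A), L1 regularization (B), and PCA (E) change the optimization, the model complexity, or the dimensionality, but leave the 500 vs. 9,500 class ratio unchanged.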
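A minimal SMOTE-style sketch with NumPy, to show what "synthetic samples" means: each new point lies on the line segment between a minority sample and one of its k nearest minority neighbours. The function name and parameters are assumptions for illustration; in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import numpy as np


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    # Pairwise distances between minority samples only.
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours
    synthetic = np.empty((n_new, minority.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(minority))         # random minority sample
        j = neighbours[i, rng.integers(k)]      # one of its neighbours
        lam = rng.random()                      # position along the segment, in [0, 1)
        synthetic[s] = minority[i] + lam * (minority[j] - minority[i])
    return synthetic


minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(smote(minority, n_new=6))
```

Because every synthetic point is an interpolation, it stays inside the region spanned by existing minority samples rather than duplicating them exactly.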

455
MCQ · easy

A data scientist is training a binary classification model on a dataset where the positive class represents only 1% of the data. The model's accuracy is 99%, but the recall for the positive class is 0%. Which metric should the scientist use to evaluate the model's performance effectively?

A. Area under the ROC curve (ROC AUC)
B. Area under the Precision-Recall curve (PR AUC)
C. Accuracy
D. F1 score
Answer: B

PR AUC is robust to class imbalance.

Why this answer

In a highly imbalanced dataset where the positive class is only 1%, accuracy is misleading because a model can achieve 99% accuracy by simply predicting the negative class for all samples, resulting in 0% recall for the positive class. The Area under the Precision-Recall curve (PR AUC) is the correct metric because it focuses on the performance of the positive class by evaluating the trade-off between precision and recall, making it sensitive to changes in the minority class. Unlike ROC AUC, which can be overly optimistic in imbalanced settings due to the large number of true negatives, PR AUC provides a more realistic assessment of model performance for rare events.

Exam trap

The trap here is that candidates often choose ROC AUC (Option A) because it is a common default metric, but they fail to recognize that in severe class imbalance, ROC AUC can be artificially inflated by the dominance of true negatives, whereas PR AUC is the correct choice for evaluating minority class performance.

How to eliminate wrong answers

Option A is wrong because ROC AUC evaluates the trade-off between true positive rate and false positive rate, and in highly imbalanced datasets with a large number of true negatives, it can remain high even when the model fails to identify positive samples, giving a false sense of good performance. Option C is wrong because accuracy is a global metric that counts overall correct predictions; in this 1% positive class scenario, a model that always predicts the negative class achieves 99% accuracy but has 0% recall, making it completely useless for detecting the positive class. Option D is wrong because the F1 score, while better than accuracy, is computed at a single decision threshold; with 0% recall it is simply 0 here, and it does not capture performance across all thresholds the way PR AUC does.
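The gap between ROC AUC and PR AUC under heavy imbalance can be reproduced with a toy ranking. In this hypothetical example, 10 positives and 990 negatives are scored so that 50 negatives outrank every positive. ROC AUC is computed as the fraction of positive/negative pairs ranked correctly, and PR AUC is estimated by average precision (mean precision at the rank of each positive), one common way to summarize the precision-recall curve.

```python
def roc_auc(y, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, scores):
    """Mean of the precision values at the rank of each positive (no-ties case)."""
    ranked = [t for _, t in sorted(zip(scores, y), key=lambda pair: -pair[0])]
    hits, precisions = 0, []
    for rank, t in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# 1% positives: 50 negatives score above all 10 positives, 940 negatives score below.
y = [0] * 50 + [1] * 10 + [0] * 940
scores = list(range(1000, 0, -1))  # strictly decreasing scores, so no ties
print(round(roc_auc(y, scores), 3))            # 0.949
print(round(average_precision(y, scores), 3))  # 0.097
```

ROC AUC looks excellent because the 940 correctly ranked negatives dominate, while average precision exposes that most flagged cases near the top are false positives.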

456
MCQ · easy

A data scientist is using Amazon SageMaker to train a model and wants to automatically stop the training job if the loss does not improve for a certain number of epochs. Which SageMaker feature can be used for this purpose?

A. SageMaker Experiments
B. Custom early stopping callback in the training script
C. SageMaker Automatic Model Tuning
D. SageMaker Debugger
Answer: B

Implementing a custom callback that stops training when loss stagnates is the most direct method.

Why this answer

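The 'StoppingCondition' parameter in a SageMaker training job definition (for example, MaxRuntimeInSeconds) only caps the wall-clock run time; it does not monitor the loss. To stop when the loss has not improved for a given number of epochs, the data scientist should implement an early stopping callback in the training script. SageMaker Automatic Model Tuning (C) applies early stopping only to training jobs launched by a tuning job, and SageMaker Experiments (A) tracks runs without stopping them.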
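A minimal, framework-agnostic sketch of such a callback, with a patience counter that resets whenever the loss improves. The class and method names are hypothetical; deep learning frameworks ship their own versions (for example, Keras's `tf.keras.callbacks.EarlyStopping`).

```python
class EarlyStopping:
    """Signal a stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss        # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1    # no improvement this epoch
        return self.bad_epochs >= self.patience


losses = [0.90, 0.70, 0.60, 0.61, 0.60, 0.62, 0.55]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(losses):
    if stopper.should_stop(loss):
        print(f"stopping at epoch {epoch}")  # stopping at epoch 5
        break
```

Inside a training script, `should_stop` would be called once per epoch with the validation loss, and the script would exit the training loop (and save the model) when it returns True.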

457
MCQ · easy

A company has deployed a real-time inference endpoint using SageMaker for a fraud detection model. The model uses a Random Forest classifier. The endpoint returns predictions, but the latency is too high: p99 latency is 500 ms, and the requirement is under 200 ms. The team has already upgraded the instance type to the largest allowed by their budget. Which option will MOST effectively reduce latency while maintaining acceptable accuracy?

A. Switch to a linear model like Logistic Regression
B. Reduce the number of trees in the Random Forest model
C. Enable SageMaker's batch transform
D. Add more instances to the endpoint
Answer: B

Fewer trees mean faster inference, though accuracy may drop slightly; it's a direct latency reduction.

Why this answer

Reducing the number of trees directly reduces inference time of Random Forest, but may degrade accuracy. However, it's the most direct method for latency reduction while keeping the model type. Switching to Logistic Regression may reduce latency but likely reduces accuracy significantly.

Batch transform is not suitable for real-time. Adding instances helps throughput but not per-request latency.

458
Multi-Select · medium

Which TWO metrics are appropriate for evaluating a binary classification model when the cost of false negatives is high?

Select 2 answers
A. Accuracy
B. AUC-ROC
C. Recall
D. F1 score
E. Precision
Answers: C, D

Recall measures the proportion of actual positives correctly identified.

Why this answer

When false negatives are costly, we want to minimize them, so recall (true positive rate) is important. Precision is also important to avoid too many false positives. F1 score balances both, but recall directly measures false negatives.

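AUC-ROC summarizes ranking quality across all thresholds rather than directly measuring missed positives, and accuracy can be misleading. So recall and F1 score are the appropriate choices.

Option C (Recall) and Option D (F1 score) are correct. Option A (Accuracy), Option B (AUC-ROC), and Option E (Precision) are not the best choices in this context.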

459
MCQ · hard

A machine learning engineer is using SageMaker to train an XGBoost model on a dataset with a severe class imbalance (1:1000). The goal is to maximize recall on the minority class. Which hyperparameter tuning strategy is MOST appropriate?

A. Set max_delta_step to a high value
B. Increase subsample ratio to 1.0
C. Set scale_pos_weight to the ratio of negative to positive samples
D. Set objective to 'binary:logistic' and tune max_depth
Answer: C

This parameter adjusts the weight of the minority class, improving recall.

Why this answer

XGBoost's 'scale_pos_weight' parameter can be set to the ratio of negative to positive instances to help the model focus on the minority class. Adjusting max_delta_step or subsample may help, but these are secondary. Setting the objective to 'binary:logistic' is the standard choice for binary classification but does not address the imbalance by itself, and tuning max_depth does not target the minority class either.
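A small sketch of deriving the starting value from the labels. The label list and the hyperparameter dictionary are illustrative assumptions; `scale_pos_weight`, `objective`, and `eval_metric` (with `aucpr`) are standard XGBoost hyperparameter names.

```python
def scale_pos_weight(labels):
    """Suggested starting value: count(negative) / count(positive)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    return negatives / positives


labels = [1] * 10 + [0] * 10_000   # hypothetical 1:1000 imbalance
hyperparameters = {
    "objective": "binary:logistic",
    "eval_metric": "aucpr",         # PR-based metric suits the rare positive class
    "scale_pos_weight": scale_pos_weight(labels),  # 1000.0
}
print(hyperparameters)
```

The ratio is a starting point; the value can still be tuned, since the best weight for recall depends on how many extra false positives are acceptable.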

460
MCQ · medium

Refer to the exhibit. A data scientist is configuring SageMaker Model Monitor for data quality checks. The configuration above is used. What is the purpose of the `ProbabilityThresholdAttribute` set to "0.5"?

A. It filters the input data to only include predictions above the threshold
B. It specifies the threshold for sampling data for monitoring
C. It sets the threshold for the accuracy metric
D. It defines the probability threshold used to convert model output to binary predictions for monitoring
Answer: D

This threshold is used to compute predicted labels for monitoring purposes.

Why this answer

In Model Monitor, `ProbabilityThresholdAttribute` is used for binary classification models to define the threshold for converting predicted probabilities into predicted labels, so the captured predictions can be monitored as labels (for example, for drift in the predicted-label distribution). It does not set the threshold the endpoint itself uses for inference; that is done in the model or inference code.

Option D is correct. Option B is wrong because it does not control data sampling.

Option C is wrong because it does not set a threshold for the accuracy metric. Option A is wrong because it does not filter input data.

461
MCQ · hard

A data scientist is building a recommender system using collaborative filtering. The dataset is sparse (99% missing values). Which algorithm is best suited?

A. Random Forest
B. K-Nearest Neighbors
C. Matrix Factorization (e.g., SVD)
D. Hidden Markov Model
Answer: C

Matrix factorization works well on sparse data.

Why this answer

Matrix factorization (e.g., SVD) is best suited for sparse collaborative filtering because it learns latent factors that capture underlying user-item interactions, effectively handling the 99% missing values by generalizing patterns rather than relying on explicit pairwise similarities. Unlike memory-based methods, it decomposes the sparse user-item matrix into lower-dimensional representations, enabling accurate predictions even when most entries are unobserved.

Exam trap

A common exam trap is the misconception that K-Nearest Neighbors (KNN) is the default for collaborative filtering. Extreme sparsity (99% missing) makes pairwise similarity calculations unreliable, whereas matrix factorization explicitly models latent factors to overcome data sparsity.

How to eliminate wrong answers

Option A is wrong because Random Forest is a supervised ensemble method that requires a dense feature matrix and cannot inherently handle missing values in a collaborative filtering context; it would fail to leverage the implicit feedback structure of the sparse user-item matrix. Option B is wrong because K-Nearest Neighbors (KNN) is a memory-based collaborative filtering approach that computes similarities between users or items, but with 99% missing values, pairwise distances become unreliable and the algorithm suffers from poor scalability and the 'curse of dimensionality'. Option D is wrong because Hidden Markov Model (HMM) is designed for sequential or temporal data with hidden states, not for static user-item interaction matrices; it does not model the latent factor structure needed for collaborative filtering in sparse settings.
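A minimal sketch of matrix factorization trained only on the observed cells, using plain stochastic gradient descent (Funk-SVD style) rather than a classical SVD, which would need a fully observed matrix. The ratings, rank, learning rate, and regularization are hypothetical values chosen for illustration.

```python
import numpy as np


def factorize(ratings, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn user and item factors from observed (user, item, rating) triples only."""
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = 0.1 * rng.standard_normal((n_users, rank))  # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, rank))  # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:  # missing cells are never visited
            err = r - P[u] @ Q[i]
            # Simultaneous SGD update: the right-hand side uses the old P[u] and Q[i].
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return P, Q


def observed_rmse(P, Q, ratings):
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))


# Hypothetical 4 users x 3 items matrix with only 8 of 12 ratings observed.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P, Q = factorize(ratings)
print(observed_rmse(P, Q, ratings))
print(P[0] @ Q[2])  # predicted rating for a cell that was never observed
```

The missing cells are filled by the dot product of learned user and item vectors, which is how the model generalizes without needing reliable pairwise similarities.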

462
Multi-Select · hard

Which TWO SageMaker features can be used to perform hyperparameter optimization? (Choose 2)

Select 2 answers
A. SageMaker Debugger
B. SageMaker Pipelines
C. SageMaker Model Monitor
D. SageMaker automatic model tuning
E. SageMaker Experiments
Answers: D, E

SageMaker automatic model tuning is the built-in hyperparameter tuning service.

Why this answer

Option D (SageMaker automatic model tuning) is the built-in hyperparameter optimization (HPO) service: it launches training jobs over the hyperparameter ranges you define and selects the best one according to an objective metric.

Option E (SageMaker Experiments) does not run tuning by itself, but it organizes, tracks, and compares the trials that HPO produces, which is why the answer key accepts it as the second feature used for HPO.

Option A (SageMaker Debugger) and Option C (SageMaker Model Monitor) are not for HPO. Option B (SageMaker Pipelines) can orchestrate a tuning step, but it is a workflow service rather than a tuning feature.

463
MCQ · easy

During training of a SageMaker built-in object detection algorithm, the loss is not decreasing after several epochs. Which troubleshooting step should be taken first?

A. Increase the mini-batch size
B. Add more classes to the dataset
C. Check whether the learning rate is appropriate
D. Increase the number of epochs
Answer: C

Learning rate is a critical hyperparameter; incorrect value often causes loss not to decrease.

Why this answer

When the loss is not decreasing during training of a SageMaker built-in object detection algorithm, the most common cause is an inappropriate learning rate. A learning rate that is too high can cause the loss to oscillate or diverge, while one that is too low can cause the loss to plateau. Checking and adjusting the learning rate is the first troubleshooting step because it directly controls the step size of gradient updates and is a fundamental hyperparameter in optimization.

Exam trap

The trap here is that candidates often assume increasing the number of epochs (Option D) will always reduce loss, but they fail to recognize that a plateauing loss is typically a sign of a hyperparameter issue like learning rate, not insufficient training time.

How to eliminate wrong answers

Option A is wrong because increasing the mini-batch size typically stabilizes gradient estimates but does not directly address a plateauing loss; it can even slow convergence if the batch size becomes too large. Option B is wrong because adding more classes to the dataset increases task complexity and would likely worsen the loss, not help it decrease. Option D is wrong because increasing the number of epochs does not fix the underlying optimization issue; if the loss is not decreasing due to a poor learning rate, more epochs will simply continue the same ineffective training.
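The effect of the learning rate is easy to see on the simplest possible loss, loss(w) = w², whose gradient is 2w. This toy sketch (not the object detection algorithm itself) runs plain gradient descent with a learning rate that is too high, one that is too low, and one that is reasonable.

```python
def gradient_descent(lr, steps=50, w0=5.0):
    """Minimise loss(w) = w**2 (gradient 2*w); return the loss after each step."""
    w, history = w0, []
    for _ in range(steps):
        w -= lr * 2 * w
        history.append(w * w)
    return history


for lr in (1.1, 0.001, 0.1):
    print(lr, gradient_descent(lr)[-1])
# lr=1.1: each step overshoots and the loss grows (divergence)
# lr=0.001: the loss barely moves from its starting value of 25 (plateau)
# lr=0.1: the loss shrinks towards 0 (convergence)
```

Running more epochs with lr=0.001 would only continue the slow crawl, which is why checking the learning rate comes before adding epochs.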

464
MCQ · medium

A data scientist is using Amazon SageMaker to train a linear regression model. After training, the scientist notices that the model has a high bias. What is the most likely cause?

A. The training dataset has too many features
B. The model is too complex and overfits the data
C. The regularization parameter is too high
D. The model is too simple and underfits the data
Answer: D

Linear regression can underfit if the relationship is nonlinear.

Why this answer

High bias is typically caused by a model that is too simple to capture the patterns in the data. Option B is wrong because an overly complex model that overfits has high variance, not high bias. Option C can raise bias somewhat, since strong regularization pushes a model toward underfitting, but the most general and likely cause is a model that is too simple for the data.

Option A is wrong because too many features tends to increase variance, not bias.
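A toy NumPy illustration of high bias: fitting a straight line to data generated by a quadratic leaves a large error even on the training set, while a model with enough capacity fits it. The data is hypothetical and noise-free so that all remaining error is bias.

```python
import numpy as np


def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    x = np.linspace(-3, 3, 50)
    y = x ** 2  # true relationship is quadratic (noise-free toy data)
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


print(train_mse(1))  # straight line: large error even on the training data (high bias)
print(train_mse(2))  # quadratic: essentially zero error
```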

465
MCQ · medium

Refer to the exhibit. A data scientist is trying to run a SageMaker training job using a script that reads data from the S3 bucket 'my-bucket' and writes the model artifact to the same bucket. The training job fails with an access denied error. What is the likely cause?

A. The IAM role does not have permission to write to the S3 bucket for the model artifact
B. The IAM role does not have sagemaker:CreateModel permission
C. The IAM role does not have s3:ListBucket permission
D. The IAM role does not have ec2:DescribeInstances permission
Answer: A

The policy only allows PutObject on training-data/*, but the model artifact might be saved to a different prefix (e.g., output/).

Why this answer

The training job fails with an access denied error because the IAM role used by SageMaker lacks the s3:PutObject permission (or equivalent write access) for the S3 bucket 'my-bucket'. While the script reads data from the bucket, writing the model artifact requires explicit write permissions on the same bucket. Without this, SageMaker cannot upload the model artifact, causing the job to fail.

Exam trap

The trap here is that candidates may focus on the read operation (data input) and overlook the write operation (model artifact output), or confuse S3 permissions with SageMaker-specific API actions like CreateModel.

How to eliminate wrong answers

Option B is wrong because sagemaker:CreateModel is a permission for creating a SageMaker model resource after training, not for writing to S3 during the training job; the error occurs during training, not model creation. Option C is wrong because s3:ListBucket is a read permission for listing objects, and the job already reads data successfully (the error is on write), so lack of ListBucket would cause a different error (e.g., 403 on list). Option D is wrong because ec2:DescribeInstances is unrelated to S3 access; it is used for managing EC2 instances, not for SageMaker training jobs writing to S3.
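The prefix mismatch described above can be sketched with a simplified allow-list check. The policy statements are an assumption reconstructed from the explanation (writes allowed only under `training-data/`, artifact written under `output/`), since the exhibit itself is not shown; real IAM evaluation also considers explicit denies, conditions, and bucket policies, which this sketch ignores.

```python
from fnmatch import fnmatchcase

# Hypothetical allow statements: (action, resource ARN pattern).
ALLOWED = [
    ("s3:GetObject", "arn:aws:s3:::my-bucket/*"),
    ("s3:PutObject", "arn:aws:s3:::my-bucket/training-data/*"),
]


def is_allowed(action, resource_arn):
    """Simplified check: some statement matches both the action and the resource."""
    return any(action == a and fnmatchcase(resource_arn, pattern)
               for a, pattern in ALLOWED)


print(is_allowed("s3:PutObject", "arn:aws:s3:::my-bucket/training-data/part-0.csv"))  # True
print(is_allowed("s3:PutObject", "arn:aws:s3:::my-bucket/output/model.tar.gz"))       # False
```

The fix is either to allow `s3:PutObject` on the output prefix (for example, `arn:aws:s3:::my-bucket/output/*`) or to point the job's output path at a prefix the role can already write to.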

466
Multi-Select · medium

A data scientist is building a binary classifier to predict customer churn. The dataset is highly imbalanced (5% churn). Which TWO techniques can help improve the model's ability to detect churn?

Select 2 answers
A. Downsample the majority class to balance the dataset
B. Use Synthetic Minority Over-sampling Technique (SMOTE)
C. Use class weights in the loss function to penalize misclassifications of the minority class
D. Use accuracy as the evaluation metric
E. Increase the model complexity by adding more layers
Answers: B, C

SMOTE generates synthetic samples for the minority class.

Why this answer

Correct options: B (Synthetic Minority Over-sampling Technique) and C (class weights in the loss function). SMOTE generates synthetic samples for the minority class, and class weights penalize misclassifications of the minority class more heavily. Option A (downsampling the majority class) can also help, but it discards majority-class data, which is why B and C are preferred here.

Option D (using accuracy as the evaluation metric) is misleading on imbalanced data and does not change what the model learns. Option E (increasing model complexity) may cause overfitting.
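One common way to choose class weights is the "balanced" heuristic, weight = n_samples / (n_classes × count of the class), which is the formula scikit-learn uses for `class_weight="balanced"`. The sketch below applies it to a hypothetical 1,000-customer dataset with 5% churn; the resulting weights would then multiply each sample's loss term.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


labels = [1] * 50 + [0] * 950   # 5% churn
weights = balanced_class_weights(labels)
print(weights)  # churners get weight 10.0, non-churners about 0.53
```

With these weights, the 50 churners contribute as much total loss as the 950 non-churners, so missing a churner is no longer cheap for the model.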

467
MCQ · easy

A machine learning engineer is deploying a model using Amazon SageMaker and wants to automatically scale the endpoint based on the number of incoming requests. Which scaling policy should be used?

A. Step scaling
B. Scheduled scaling
C. Target tracking scaling
D. Simple scaling
Answer: C

Target tracking automatically adjusts capacity based on a target metric.

Why this answer

SageMaker endpoints support Application Auto Scaling, which can use a target tracking scaling policy based on a metric like InvocationsPerInstance. Simple scaling and step scaling are also possible but target tracking is simpler. Scheduled scaling is for predictable traffic.

Option C (target tracking scaling) is correct. Option D (simple scaling) requires manually defined thresholds and adjustments. Option A (step scaling) is more complex to configure.

Option B (scheduled scaling) is for predictable patterns.
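For intuition only, target tracking can be approximated as scaling capacity in proportion to how far the metric is from its target. AWS does not document the service as using exactly this formula (it also applies cooldowns and its own rounding), so treat the function below as a simplified, hypothetical model rather than the actual algorithm.

```python
import math


def desired_capacity(current_instances, metric_value, target_value,
                     min_capacity=1, max_capacity=10):
    """Simplified target tracking: scale capacity in proportion to metric / target."""
    desired = math.ceil(current_instances * metric_value / target_value)
    return max(min_capacity, min(max_capacity, desired))


# 2 instances each seeing 300 invocations/min against a target of 100:
print(desired_capacity(2, 300, 100))  # 6
```

The appeal of target tracking is visible here: the engineer supplies only the target value (for example, for InvocationsPerInstance), and the policy works out how many instances are needed.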

468
MCQ · easy

A data scientist wants to evaluate the performance of a multiclass classification model. The model outputs probabilities for 10 classes. Which metric is most appropriate for evaluating the model's ranking performance across all classes?

A. F1 score (macro-averaged)
B. Accuracy
C. Mean Absolute Error
D. Log loss (cross-entropy)
E. ROC AUC (one-vs-rest macro-averaged)
Answer: D

Log loss directly measures the quality of probability predictions for multiclass problems.

Why this answer

Option D is correct because log loss measures the quality of a classifier's predicted probabilities and heavily penalizes confident wrong predictions. Option B (Accuracy) ignores the predicted probabilities entirely. Option E (one-vs-rest ROC AUC) scores each class as a separate binary problem and ignores how well the probabilities are calibrated.

Option A (F1 score) is computed from hard class predictions, per class or averaged. Option C (Mean Absolute Error) is for regression.
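A short sketch of multiclass log loss: the mean of −ln(probability assigned to the true class). The probability clipping with `eps` is a common safeguard against log(0) and is an assumption of this sketch. A model that spreads probability uniformly over 10 classes scores ln(10), while a confident correct model scores much lower.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probs):
        total -= math.log(min(max(p[label], eps), 1.0))  # clip to avoid log(0)
    return total / len(y_true)


uniform = [[0.1] * 10, [0.1] * 10]
confident = [[0.9] + [0.1 / 9] * 9, [0.1 / 9] * 9 + [0.9]]
print(multiclass_log_loss([0, 9], uniform))    # about 2.303, i.e. ln(10)
print(multiclass_log_loss([0, 9], confident))  # about 0.105, i.e. -ln(0.9)
```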

469
MCQ · easy

A machine learning team needs to deploy a model that makes real-time predictions with latency under 100 ms. The model is a deep neural network with 500 MB of parameters. Which AWS service should they use?

A. AWS Glue
B. AWS Lambda with a container image
C. Amazon SageMaker real-time endpoint
D. Amazon EMR
Answer: C

SageMaker real-time endpoints provide low-latency inference for large models.

Why this answer

Amazon SageMaker real-time endpoints are designed for low-latency inference and keep the model loaded in memory between requests. Option B is wrong because, although Lambda container images can be up to 10 GB, loading a 500 MB model on each cold start adds latency that can exceed a 100 ms budget, and Lambda offers no GPU acceleration. Option D is wrong because Amazon EMR is for big data processing, not real-time inference.

Option A is wrong because AWS Glue is for ETL jobs.

470
MCQ · hard

A machine learning team is deploying a real-time inference endpoint for a recommendation model using Amazon SageMaker. The model takes a long time to load (several minutes) due to its size (5 GB). Which deployment strategy minimizes the cold start latency?

A. Use a single instance with a large memory size
B. Use Multi-Model Endpoints to keep the model loaded between invocations
C. Use SageMaker Serverless Inference
D. Use a larger instance type with more vCPUs
Answer: B

Multi-Model Endpoints allow models to stay loaded in memory, reducing cold start.

Why this answer

Option B is correct because a Multi-Model Endpoint loads a model on its first invocation and then keeps it cached in memory, so later requests do not pay the multi-minute load cost again (with a 5 GB model, that first load is still slow). Option D is wrong because a larger instance type with more vCPUs does not significantly shorten the time needed to load the model.

Option C is wrong because Serverless Inference has cold starts. Option A is wrong because a single instance with more memory does not significantly reduce load time.

471
Multi-Select · medium

A data scientist is training a deep learning model for object detection using Amazon SageMaker. The training job is using a single GPU instance and is taking too long. Which THREE actions can reduce training time? (Choose THREE.)

Select 3 answers
A. Use a CPU instance instead of GPU
B. Enable mixed precision training with FP16
C. Use a GPU instance with more GPUs, such as p3.16xlarge
D. Reduce the batch size
E. Use distributed training across multiple instances
Answers: B, C, E

Mixed precision uses half-precision floats, speeding up computation and reducing memory usage.

Why this answer

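Option B is correct because mixed precision training with FP16 reduces memory usage and accelerates computation by using half-precision floating-point numbers where possible, which is particularly effective on NVIDIA GPUs with Tensor Cores (e.g., V100, A100). It can nearly double throughput for many deep learning models, typically with little or no loss of accuracy, because numerically sensitive operations still use FP32 precision. Options C and E reduce training time by spreading the work across more GPUs, either on one larger instance or across multiple instances. Option A is wrong because CPU instances are much slower than GPUs for deep learning training.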

Exam trap

The trap here is that candidates often confuse reducing batch size with speeding up training, but in practice, smaller batches increase the number of gradient updates and can lead to longer wall-clock time, especially on GPU instances where larger batches better utilize parallel hardware.

472
MCQ · medium

A data scientist is deploying a regression model in Amazon SageMaker that predicts housing prices. The model shows high bias (underfitting). Which action is most likely to reduce bias?

A. Reduce the amount of training data
B. Increase regularization strength
C. Use a simpler model
D. Add more features or increase model complexity
Answer: D

More complex models can capture patterns better.

Why this answer

High bias (underfitting) means the model is too simple to capture the underlying patterns in the data. Adding more features or increasing model complexity (e.g., using polynomial features, deeper trees, or a more flexible algorithm) directly addresses underfitting by giving the model greater capacity to learn from the data. In Amazon SageMaker, this could involve using a more complex built-in algorithm like XGBoost with deeper trees or adding feature engineering transformations in a processing job.

Exam trap

The trap here is that candidates often confuse bias with variance and incorrectly choose regularization or simpler models, which are solutions for overfitting (high variance), not underfitting (high bias).

How to eliminate wrong answers

Option A is wrong because reducing the amount of training data would exacerbate underfitting by providing even less information for the model to learn from. Option B is wrong because increasing regularization strength penalizes model complexity further, which would increase bias and worsen underfitting. Option C is wrong because using a simpler model would reduce capacity even more, directly increasing bias rather than reducing it.
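To see why stronger regularization (Option B) moves in the wrong direction, this NumPy sketch fits ridge regression in closed form on hypothetical noise-free data and reports the training error for increasing penalty strength. The error on the training data itself rises as the penalty grows, which is the signature of added bias.

```python
import numpy as np


def ridge_train_mse(X, y, lam):
    """Closed-form ridge regression w = (X^T X + lam*I)^-1 X^T y; return training MSE."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([3.0, -2.0, 1.0])   # noise-free toy target

for lam in (0.0, 10.0, 1000.0):
    print(lam, round(ridge_train_mse(X, y, lam), 4))
```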

473
Multi-Select · easy

A data scientist is evaluating a regression model. Which TWO metrics are appropriate for evaluating regression performance?

Select 2 answers
A. Root Mean Squared Error (RMSE)
B. F1 score
C. Area Under the ROC Curve (AUC)
D. R-squared
E. Precision
Answers: A, D

RMSE measures average prediction error.

Why this answer

Correct options: A (Root Mean Squared Error) and D (R-squared). RMSE measures the typical magnitude of prediction errors in the target's units, and R-squared measures the proportion of variance explained. Option B (F1 score) is for classification.

Option E (Precision) is for classification. Option C (AUC) is for classification.
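Both regression metrics are short to compute by hand. The sketch below uses a small hypothetical set of targets and predictions: RMSE is the square root of the mean squared error, and R-squared is 1 minus the ratio of residual to total sum of squares.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of the target's variance explained by the model."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(rmse(y_true, y_pred), 4))       # 0.6124
print(round(r_squared(y_true, y_pred), 4))  # 0.9486
```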

474
MCQ · easy

A company is using Amazon SageMaker to train an XGBoost model for predicting customer churn. The training data is stored in an S3 bucket as CSV files. The data scientist runs a hyperparameter tuning job with 50 training jobs. The tuning job completes, but the best model's accuracy on the holdout set is lower than expected. The data scientist suspects that the hyperparameter ranges are too narrow. Which corrective action is most appropriate?

A. Increase the number of training jobs in the tuning job
B. Switch to a different algorithm like Random Forest
C. Expand the hyperparameter ranges for key parameters such as 'max_depth', 'learning_rate', and 'subsample'
D. Change the tuning strategy from random search to Bayesian optimization
Answer: C

Wider ranges allow the tuning job to explore more of the hyperparameter space, potentially finding better configurations.

Why this answer

Option C is correct because the data scientist suspects the hyperparameter ranges are too narrow, which directly limits the model's ability to find an optimal configuration. Expanding ranges for key XGBoost parameters like 'max_depth', 'learning_rate', and 'subsample' allows the tuning job to explore a broader space of model complexities and regularization levels, potentially improving accuracy on the holdout set. This is the most direct fix for the stated problem, as it addresses the root cause rather than increasing job count or changing the search strategy.

Exam trap

The trap here is that candidates often confuse 'more training jobs' (Option A) with 'broader search space', failing to recognize that increasing jobs only refines sampling within existing bounds, not expands them.

How to eliminate wrong answers

Option A is wrong because increasing the number of training jobs does not address the core issue of narrow hyperparameter ranges; it only samples the same limited space more densely, which may not yield a better model if the true optimum lies outside the current bounds. Option B is wrong because switching to a different algorithm like Random Forest is an unnecessary and drastic change; the problem is explicitly about hyperparameter ranges, not algorithm suitability, and XGBoost is a strong choice for tabular churn data. Option D is wrong because changing from random search to Bayesian optimization improves sampling efficiency but does not expand the search space; if the ranges are too narrow, even a more intelligent search cannot find a better configuration outside those bounds.
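A toy random search makes the point concrete. The objective below is hypothetical, with its optimum at max_depth=8, outside the narrow range [2, 4]. Ten times more trials inside the narrow range cannot beat the best value at its boundary, while the expanded range can reach the optimum.

```python
import random


def validation_score(max_depth):
    """Hypothetical objective whose optimum (max_depth=8) lies outside the narrow range."""
    return -(max_depth - 8) ** 2


def random_search(low, high, n_trials, seed=0):
    """Best score from n_trials integer samples of max_depth in [low, high]."""
    rng = random.Random(seed)
    return max(validation_score(rng.randint(low, high)) for _ in range(n_trials))


print(random_search(2, 4, 50))    # narrow range: can never exceed -16 (max_depth=4)
print(random_search(2, 4, 500))   # 10x the trials: still capped at -16
print(random_search(2, 12, 50))   # expanded range: can reach the optimum
```

A smarter search strategy such as Bayesian optimization would hit the same ceiling, because it also samples only within the configured bounds.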

475
MCQ · hard

A company is using Amazon SageMaker to host a model for real-time inference. The model is a large ensemble of 10 XGBoost models, each 2 GB. The endpoint uses a single ml.c5.18xlarge instance. The inference latency is high (average 2 seconds). Which change would most effectively reduce latency?

A. Use SageMaker Multi-Model Endpoints to serve each model independently
B. Switch to a GPU instance type
C. Add more instances behind a load balancer
D. Use SageMaker Batch Transform instead of real-time endpoint
Answer: A

Multi-Model Endpoints reduce serialization overhead by loading models on demand.

Why this answer

Option A is correct because serialization/deserialization of large models is a bottleneck; SageMaker Multi-Model Endpoints can reduce overhead by loading only the requested model. Option B (GPU) may not help if the bottleneck is CPU. Option D (batch transform) is for offline inference.

Option C (more instances) helps throughput but not per-request latency.

476
MCQ · easy

A data scientist is using Amazon SageMaker to train a linear regression model. The training data has 10 features, and the scientist wants to interpret the model's coefficients. Which algorithm should they use?

A. Amazon SageMaker XGBoost
B. Amazon SageMaker K-Means
C. Amazon SageMaker Factorization Machines
D. Amazon SageMaker Linear Learner
Answer: D

Produces linear coefficients for interpretation.

Why this answer

Option D is correct because Linear Learner trains a linear model whose coefficients can be interpreted directly. Option A is wrong because XGBoost is tree-based and less interpretable. Option B is wrong because K-Means is unsupervised.

Option C is wrong because Factorization Machines are designed for high-dimensional sparse data, such as recommendation problems.
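What "interpretable coefficients" means can be shown with ordinary least squares in NumPy, used here as a stand-in for Linear Learner (which trains with stochastic gradient descent rather than a closed-form solve). On hypothetical noise-free data with 3 features instead of 10, the fitted coefficients recover the per-feature effects exactly: each one is the change in the prediction per unit change in that feature.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0                       # intercept 4.0, no noise (toy data)

X1 = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, weights = coef[0], coef[1:]
print(round(float(intercept), 3), np.round(weights, 3))
```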

477
MCQ · medium

A company uses an XGBoost model to predict equipment failures. The model has high precision but low recall. The business impact of a false negative is very high (missing a failure). Which action would MOST effectively increase recall while keeping precision reasonably high?

A. Increase the regularization parameter lambda
B. Set the objective to 'reg:squarederror'
C. Decrease the probability threshold for the positive class
D. Increase the number of boosting rounds
Answer: C

Lower threshold increases recall but may reduce precision.

Why this answer

Decreasing the probability threshold for the positive class means the model will classify a case as a failure at a lower predicted probability, which captures more true positives (increases recall). However, this also allows more false positives, so precision may drop, but the trade-off is acceptable given the high cost of false negatives. This is a standard post-training calibration technique for imbalanced classification problems.

Exam trap

AWS often tests the misconception that increasing boosting rounds or regularization directly improves recall. In fact, the probability threshold is the primary lever for trading precision against recall after training.

How to eliminate wrong answers

Option A is wrong because increasing the regularization parameter lambda (L2 regularization) reduces model complexity and can lead to underfitting, which typically decreases both precision and recall, not selectively increase recall. Option B is wrong because setting the objective to 'reg:squarederror' treats the problem as regression, not classification, so the model outputs continuous values without a probability threshold, making it unsuitable for recall-focused binary classification. Option D is wrong because increasing the number of boosting rounds can lead to overfitting, which may increase variance and actually degrade recall on unseen data, and does not directly control the trade-off between precision and recall.

478
MCQeasy

A company wants to deploy a machine learning model that provides real-time inference with low latency. The model is a small ensemble of three tree-based models. Which Amazon SageMaker approach is most appropriate?

A.Use a SageMaker real-time endpoint with a single inference container.
B.Use a SageMaker batch transform job.
C.Use AWS Lambda with the model packaged in a layer.
D.Use a SageMaker Serverless Inference endpoint.
AnswerA

Real-time endpoints provide low-latency inference.

Why this answer

A SageMaker real-time endpoint with a single inference container is the most appropriate approach. It keeps the models loaded in memory and handles requests synchronously, which gives persistent, low-latency inference. For a small ensemble of three tree-based models, a single container with a custom inference script can load all three models and combine their predictions, delivering sub-second response times that meet the real-time requirement.
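A minimal sketch of how one inference script could average an ensemble. It assumes the `model_fn`/`predict_fn` handler convention used by SageMaker's framework serving containers. The three "stump" functions are hypothetical stand-ins for real tree models, which would normally be deserialized from the model directory inside `model_fn`.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for three trained tree models (each returns P(class = 1)).
def stump_a(row: Sequence[float]) -> float:
    return 0.9 if row[0] > 0.5 else 0.2

def stump_b(row: Sequence[float]) -> float:
    return 0.8 if row[1] > 1.0 else 0.3

def stump_c(row: Sequence[float]) -> float:
    return 0.7 if row[0] + row[1] > 1.0 else 0.1

def model_fn(model_dir: str) -> List[Callable[[Sequence[float]], float]]:
    """Load the ensemble once per container; here it just returns the stand-ins."""
    return [stump_a, stump_b, stump_c]

def predict_fn(input_data: List[Sequence[float]], models) -> List[float]:
    """Average the positive-class probability of every ensemble member per row."""
    return [sum(m(row) for m in models) / len(models) for row in input_data]

models = model_fn("/opt/ml/model")
print(predict_fn([[0.8, 1.5], [0.1, 0.2]], models))
```

Because all three models sit in one process that stays loaded, each request pays only for the three in-memory predictions.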

Exam trap

The trap here is that candidates often confuse 'real-time inference' with 'serverless' or 'batch processing,' assuming that serverless or Lambda are always cheaper or simpler, but they fail to account for cold-start latency and execution limits that break low-latency requirements.

How to eliminate wrong answers

Option B is wrong because SageMaker batch transform jobs are designed for asynchronous, offline inference on large datasets and do not provide real-time, low-latency responses. Option C is wrong because AWS Lambda is not a SageMaker hosting option. Serving the models from Lambda also adds cold-start latency and packaging constraints, because layers count toward Lambda's 250 MB unzipped deployment size limit. Both work against a consistent low-latency requirement. Option D is wrong because SageMaker Serverless Inference endpoints automatically scale to zero when not in use, which incurs cold-start latency that can exceed acceptable thresholds for real-time inference. Serverless endpoints are optimized for intermittent or bursty traffic, not sustained low-latency workloads.

479
MCQeasy

A data scientist is training a linear regression model to predict house prices. The dataset contains 10 features. After training, the data scientist notices that the model has high bias (underfitting). Which action should the data scientist take to reduce bias?

A.Reduce the amount of training data
B.Add more features, such as polynomial features
C.Increase the regularization strength
D.Use a simpler model, such as ridge regression
AnswerB

Adding features increases model complexity, reducing bias.

Why this answer

High bias (underfitting) means the model is too simple to capture the underlying patterns in the data. Adding more features, such as polynomial features, increases model complexity, allowing the linear regression model to fit non-linear relationships and reduce bias. This directly addresses the underfitting issue by giving the model more expressive power.
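A minimal sketch with synthetic data (an assumption, not the house-price dataset). The target is quadratic in `x`, so a straight-line fit underfits, and adding an `x**2` column lets the same least-squares model capture the curve.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = 2.0 * x**2 - x + 1.0 + rng.normal(scale=0.5, size=x.size)  # quadratic ground truth

def fit_mse(X, y):
    """Ordinary least-squares fit; returns the training MSE."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ coef - y) ** 2))

ones = np.ones_like(x)
X_linear = np.column_stack([ones, x])       # intercept + x
X_poly = np.column_stack([ones, x, x**2])   # intercept + x + x^2 (polynomial feature)

print("linear MSE:", fit_mse(X_linear, y))  # high: the model is too simple (bias)
print("with x^2  :", fit_mse(X_poly, y))    # close to the noise variance (0.25)
```

The model is still linear in its coefficients. Only the feature set grew, which is why this reduces bias without switching algorithms.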

Exam trap

AWS often tests the bias-variance tradeoff by making candidates confuse regularization, which reduces variance, with the need to increase model complexity to fix underfitting. Increasing regularization or using a simpler model seems like a "safe" choice, but it actually worsens bias.

How to eliminate wrong answers

Option A is wrong because reducing the amount of training data would increase variance and potentially worsen bias, as the model would have even less information to learn from. Option C is wrong because increasing regularization strength penalizes model complexity, which would further increase bias by forcing the model to be simpler. Option D is wrong because using a simpler model, such as ridge regression (which is a regularized linear model), would also increase bias by constraining the coefficients, making underfitting worse.

480
MCQmedium

A data scientist is using Amazon SageMaker to train a deep learning model using a built-in algorithm. The training job uses an ml.p3.2xlarge instance and takes 10 hours to complete. The scientist wants to reduce training time without changing the algorithm or model architecture. The instance's GPU utilization is consistently at 95%, but CPU utilization is only 20%. The data input pipeline uses SageMaker Pipe mode with the 'TrainingInputMode' set to 'Pipe'. The training dataset is 200 GB in CSV format stored in S3. Which approach is most likely to reduce training time?

A.Switch from Pipe mode to File mode to reduce I/O overhead
B.Use Pipe mode with 'S3DataType' as 'AugmentedManifestFile'
C.Use a larger instance type with more GPUs, such as ml.p3.8xlarge
D.Reduce the batch size to improve GPU utilization
AnswerC

More GPUs can parallelize computation and reduce training time.

Why this answer

Option C is correct. GPU utilization is high (95%), so the GPU is the bottleneck. Moving to an instance with more GPUs (e.g., ml.p3.8xlarge with 4 GPUs) can reduce training time by parallelizing computation, provided the built-in algorithm supports multi-GPU training.

Option A is wrong because the input pipeline is not the bottleneck, and File mode must download the full 200 GB dataset before training starts. Option B is wrong because Pipe mode is already in use, and changing the input format does not relieve a GPU bottleneck. Option D is wrong because reducing the batch size could underutilize the GPU further.

481
MCQeasy

A data scientist wants to build a binary classifier to predict customer churn. The dataset has 10,000 records with 500 churners (5%). Which technique should the data scientist use to address class imbalance?

A.Randomly undersample the majority class.
B.Use SMOTE (Synthetic Minority Over-sampling Technique) to create synthetic samples.
C.Assign higher class weights to the minority class.
D.Downsample the majority class to match the minority class size.
AnswerB

SMOTE generates synthetic examples, effectively balancing the dataset.

Why this answer

SMOTE generates synthetic samples for the minority class, which is appropriate for imbalanced datasets. Option A (random undersampling of the majority class) discards data. Option C (higher class weights for the minority class) is a valid alternative that reweights the loss instead of adding samples.

Option D (downsampling the majority class to 500 records) discards 9,000 of the 9,500 non-churners.
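A minimal NumPy sketch of the SMOTE idea. It is not the imbalanced-learn implementation (`imblearn.over_sampling.SMOTE`), which is what you would normally use. Each synthetic point lies on the segment between a minority sample and one of its `k` nearest minority neighbors. The feature matrix is hypothetical random data.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a minority point and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest neighbors
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))           # random minority sample
        b = neighbors[a, rng.integers(k)]      # one of its neighbors
        gap = rng.random()                     # position along the segment [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

X_min = np.random.default_rng(1).normal(size=(500, 4))  # 500 churners, 4 hypothetical features
X_new = smote(X_min, n_new=9000)                        # 500 + 9,000 = 9,500 churners
print(X_new.shape)  # (9000, 4)
```

Apply oversampling only to the training split, so that synthetic points never leak into validation or test data.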

482
MCQeasy

A data scientist is using SageMaker to train a linear regression model. The target variable has a long-tail distribution. Which data transformation is LEAST likely to improve model performance?

A.Add interaction terms between features
B.Apply log transformation to the target variable
C.Normalize all feature values to [0,1]
D.Remove outliers from the target variable
AnswerC

Rescaling features changes the size of the coefficients but not the predictions of an ordinary least-squares fit, so normalization is not needed.

Why this answer

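Option C (normalizing features) is least likely to help because an unregularized least-squares linear regression is invariant to rescaling the features. The coefficients change scale, but the fitted predictions do not. With L1/L2 penalties or gradient-descent training, scaling can still matter. Option B (log transformation of the target) can reduce the skewness of a long-tailed target. Option D (removing outliers) can improve the fit.

Option A (adding interaction terms) can capture relationships between features.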
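A minimal sketch on synthetic data (an assumption) that checks the scale-invariance claim for plain OLS. It fits the same model on raw and min-max-scaled features, then compares predictions and coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform([0, 100], [1, 5000], size=(50, 2))  # features on very different scales
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(size=50)

def ols_predictions(X, y):
    """Fit OLS with an intercept; return in-sample predictions and coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef, coef

X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # min-max to [0, 1]

pred_raw, coef_raw = ols_predictions(X, y)
pred_scaled, coef_scaled = ols_predictions(X_scaled, y)
print(np.allclose(pred_raw, pred_scaled))  # same fitted values
print(coef_raw[1:], coef_scaled[1:])       # coefficients differ in scale
```

The intercept absorbs the shift, and each coefficient absorbs its feature's range, so the fitted values are unchanged.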

483
MCQeasy

A company wants to build a model to detect fraudulent transactions. The dataset has a highly imbalanced class distribution. Which technique should be used during training to handle class imbalance?

A.Add more features to the dataset
B.Use SageMaker's built-in fraud detection algorithm that applies random under-sampling
C.Reduce the learning rate
D.Increase the tree depth in XGBoost
AnswerB

The algorithm handles imbalance by under-sampling.

Why this answer

Option B is correct because the built-in fraud detection algorithm in SageMaker uses random under-sampling of the majority class. Option A is wrong because adding more features does not directly handle imbalance. Option D is wrong because increasing tree depth may overfit.

Option C is wrong because reducing the learning rate does not address imbalance.
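A generic NumPy sketch of random under-sampling, the technique named in the answer. It is not SageMaker's internal implementation, and the labels are hypothetical. It keeps every fraud row and a random subset of legitimate rows.

```python
import numpy as np

def random_undersample(X, y, ratio=1.0, seed=0):
    """Keep every minority (y == 1) row and a random subset of majority rows.

    ratio is the desired majority:minority count after sampling (1.0 = balanced).
    """
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), int(ratio * len(minority)))
    kept_majority = rng.choice(majority, size=n_keep, replace=False)
    idx = rng.permutation(np.concatenate([minority, kept_majority]))
    return X[idx], y[idx]

y = np.array([1] * 20 + [0] * 980)       # hypothetical: 2% fraud
X = np.arange(len(y)).reshape(-1, 1)     # placeholder feature (the row index)
X_bal, y_bal = random_undersample(X, y)
print(np.bincount(y_bal))                # [20 20]
```

The cost is that 960 legitimate transactions are never seen in training, which is why `ratio` is often set above 1.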

484
MCQhard

A team is training a deep learning model on SageMaker using a custom PyTorch container. Training takes 24 hours on a single ml.p3.2xlarge instance. The team wants to reduce training time using distributed training. Which strategy is MOST appropriate?

A.Use data parallelism with Horovod across multiple instances
B.Use model parallelism to split the model across multiple GPUs
C.Use SageMaker Managed Spot Training to reduce cost
D.Use SageMaker Automatic Model Tuning to find optimal hyperparameters
AnswerB

Model parallelism splits the model across devices, suitable for large models.

Why this answer

Option B (model parallelism) is correct for deep learning models that are too large to fit on a single GPU. Option A (data parallelism) requires the full model to fit on each GPU, which may not be the case. Option D (hyperparameter tuning) does not reduce training time directly.

Option C (Managed Spot Training) reduces cost, but interruptions can lengthen training, and it does not guarantee a speedup.

485
MCQmedium

A data scientist is using an IAM role with the policy shown in the exhibit to train a model in SageMaker. The training job fails with a permissions error. What is the missing permission?

A.sagemaker:InvokeEndpoint
B.sagemaker:DescribeTrainingJob
C.s3:ListBucket
D.iam:PassRole
AnswerD

SageMaker requires iam:PassRole to use the execution role.

Why this answer

The training job fails because the identity that calls CreateTrainingJob must pass an IAM execution role to SageMaker. SageMaker then assumes that role to read training data from S3 and write model artifacts. Passing a role requires the `iam:PassRole` permission on that role. Without it, IAM rejects the request, so SageMaker never receives the role it needs to access resources such as the training data in S3.
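A sketch of an identity policy that includes the missing statement, built as a Python dict and serialized to JSON. The account ID and role name are placeholders. The `iam:PassedToService` condition restricts the role so that it can be passed only to SageMaker.

```python
import json

# Placeholder account ID and role name; substitute your own execution role ARN.
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sagemaker:CreateTrainingJob",
            "Resource": "*",
        },
        {
            # Lets the caller hand this execution role to SageMaker, and only to SageMaker.
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": EXECUTION_ROLE_ARN,
            "Condition": {
                "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
            },
        },
    ],
}

print(json.dumps(policy, indent=2))
```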

Exam trap

The trap here is that candidates often focus on S3 or SageMaker-specific actions (like `s3:GetObject` or `sagemaker:CreateTrainingJob`) and overlook the prerequisite `iam:PassRole` permission, which is required for SageMaker to assume the role on behalf of the user.

How to eliminate wrong answers

Option A is wrong because `sagemaker:InvokeEndpoint` is used for invoking a deployed endpoint for inference, not for training jobs. Option B is wrong because `sagemaker:DescribeTrainingJob` is a read-only action that allows viewing training job metadata, not a permission required to launch or execute a training job. Option C is wrong because `s3:ListBucket` is an S3 action that might be needed for listing objects in a bucket, but the core issue is that SageMaker cannot assume the IAM role at all, so S3 permissions are irrelevant until the role is passed.

486
MCQhard

A data scientist is training a binary classifier using logistic regression. The dataset has 100,000 samples and 500 features. After training, the model achieves 95% accuracy on the training set but only 70% on the test set. The data scientist suspects overfitting. Which technique would best reduce overfitting while preserving interpretability?

A.Apply L1 regularization (Lasso)
B.Increase the maximum number of iterations
C.Add polynomial features
D.Use a random forest model instead
AnswerA

L1 regularization performs feature selection, reducing overfitting and keeping the model interpretable.

Why this answer

L1 regularization (Lasso) adds a penalty equal to the absolute value of the magnitude of coefficients, which drives many feature weights to exactly zero. This performs automatic feature selection, reducing model complexity and overfitting while keeping the model as a simple linear logistic regression, thus preserving interpretability.
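A minimal NumPy sketch of L1-regularized logistic regression trained with proximal gradient descent (ISTA). The data is synthetic, with 10 hypothetical features of which only the first two matter. The soft-thresholding step is what sets weights to exactly zero. In practice, scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` or `solver="saga"` does this for you.

```python
import numpy as np

def l1_logistic_regression(X, y, lam, n_iter=3000):
    """Fit L1-regularized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * ||w||_1 (no intercept, for brevity).
    """
    n, p = X.shape
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the log-loss gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        probs = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (probs - y) / n
        z = w - step * grad
        # Soft-thresholding (the L1 proximal step) sets small weights to exactly 0.
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
logits = 3.0 * X[:, 0] - 3.0 * X[:, 1]          # only the first two features matter
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

w = l1_logistic_regression(X, y, lam=0.05)
print(np.round(w, 3))
```

The surviving nonzero coefficients are read the same way as in ordinary logistic regression, which is why L1 keeps the model interpretable.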

Exam trap

AWS often tests the distinction between regularization techniques that shrink coefficients (L2/Ridge) versus those that zero them out (L1/Lasso), and candidates may mistakenly choose L2 or fail to recognize that L1 directly improves interpretability by removing irrelevant features.

How to eliminate wrong answers

Option B is wrong because increasing the maximum number of iterations only ensures the optimization algorithm converges; it does not address overfitting and may even lead to further overfitting if the model is already fitting noise. Option C is wrong because adding polynomial features increases model complexity and the number of parameters, which would worsen overfitting rather than reduce it. Option D is wrong because while a random forest can reduce overfitting through ensemble averaging, it is a non-linear black-box model that sacrifices the interpretability of logistic regression's coefficient-based explanations.

487
MCQhard

A company is using Amazon SageMaker to train a time series forecasting model using the DeepAR algorithm. The training data contains multiple time series. The model is overfitting. Which action is LEAST likely to reduce overfitting?

A.Decrease the number of layers in the neural network.
B.Increase the dropout rate.
C.Decrease the context length.
D.Reduce the number of time series in the training set.
AnswerD

Less data may worsen overfitting.

Why this answer

Option D is correct because reducing the number of time series in the training set reduces the diversity of training data, which typically increases overfitting rather than reducing it. DeepAR relies on learning patterns across multiple related time series to generalize well; fewer time series mean less shared statistical strength, making the model more likely to memorize noise in the remaining series.
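For reference, options A to C correspond to DeepAR hyperparameters (`num_layers`, `dropout_rate`, `context_length`). Option D changes the training data instead. The sketch below shows how such changes might look in a hyperparameter dict. The values are illustrative assumptions, not recommendations. They are written as strings because SageMaker passes built-in algorithm hyperparameters as strings.

```python
# Illustrative DeepAR hyperparameters (values are assumptions, not tuned settings).
baseline = {
    "time_freq": "D",
    "prediction_length": "14",
    "context_length": "56",
    "epochs": "100",
    "num_layers": "3",
    "num_cells": "40",
    "dropout_rate": "0.05",
}

# Capacity and regularization changes matching options A, B and C.
regularized = {**baseline, "num_layers": "2", "dropout_rate": "0.15", "context_length": "28"}

changed = {k: (baseline[k], v) for k, v in regularized.items() if baseline[k] != v}
print(changed)
```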

Exam trap

The trap here is that candidates mistakenly think reducing training data always reduces overfitting, but in time series forecasting with DeepAR, fewer time series actually weaken the cross-series learning that regularizes the model, making overfitting worse.

How to eliminate wrong answers

Option A is wrong because decreasing the number of layers reduces the model's capacity, which directly combats overfitting by limiting the complexity of learned representations. Option B is wrong because increasing the dropout rate randomly drops neurons during training, which acts as a regularization technique to prevent co-adaptation and reduce overfitting. Option C is wrong because decreasing the context length shortens the look-back window, forcing the model to rely on fewer historical points and reducing its ability to memorize long-term patterns, which helps mitigate overfitting.

488
MCQhard

A company uses Amazon SageMaker to host a model for real-time inference. The model is a large ensemble of 10 deep learning models, each 500 MB. The total model size is 5 GB, which exceeds the 5 GB limit for SageMaker real-time endpoints. The data scientist wants to reduce the model size without significantly impacting accuracy. The ensemble uses averaging of predictions from all models. The scientist has access to a validation set with 10,000 samples. Which technique should the scientist use to reduce the model size?

A.Use model distillation to train a smaller model that approximates the ensemble
B.Use a more expensive instance type to host the model
C.Use SageMaker Neo to compile and optimize the model
D.Apply weight pruning to each model in the ensemble
AnswerA

Distillation produces a compact model with similar performance.

Why this answer

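Option A is correct. Model distillation trains a smaller student model to mimic the ensemble's averaged predictions, reducing size while largely preserving accuracy. Option B is wrong because a more expensive instance type does not reduce model size.

Option C is wrong because SageMaker Neo compiles models for faster inference on target hardware rather than shrinking a ten-model ensemble. Option D is wrong because pruning alone may not reduce size enough: unless the pruned weights are stored in a sparse or compressed format, the model files stay roughly the same size.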
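A minimal NumPy sketch of the distillation target. It averages the class probabilities of 10 hypothetical teachers, matching the question's averaging ensemble, and measures the student's cross-entropy against those soft targets. The student training loop and temperature scaling are omitted, and all logits are random placeholders.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_soft_targets(teacher_logits):
    """Average teacher probabilities: shape (models, n, classes) -> (n, classes)."""
    return softmax(teacher_logits).mean(axis=0)

def distillation_loss(student_logits, soft_targets):
    """Cross-entropy between the ensemble's averaged probabilities and the student's."""
    log_p = np.log(softmax(student_logits))
    return float(-(soft_targets * log_p).sum(axis=-1).mean())

rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(10, 4, 3))  # 10 teachers, 4 samples, 3 classes
targets = ensemble_soft_targets(teacher_logits)

# A student that reproduces the targets exactly reaches the minimum loss
# (the entropy of the targets); any other student scores higher.
perfect_student = np.log(targets)
random_student = rng.normal(size=(4, 3))
print(distillation_loss(perfect_student, targets), distillation_loss(random_student, targets))
```

Minimizing this loss over a single small network is what lets one compact model stand in for all ten teachers.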

489
MCQmedium

A company uses Amazon SageMaker to train a deep learning model for image classification. The training job is taking longer than expected. The data scientist observes that GPU utilization is low (around 30%) and CPU utilization is high. Which action is most likely to reduce training time?

A.Reduce the batch size
B.Increase the batch size
C.Increase the learning rate
D.Increase the number of data loading workers
AnswerD

More data loading workers can parallelize data preprocessing and reduce I/O bottleneck, improving GPU utilization.

Why this answer

Option D is correct because low GPU utilization indicates that the data pipeline is not feeding data fast enough, which leaves the GPU idle. Increasing the number of data loading workers can improve data throughput. Option B is wrong because larger batch sizes may increase memory usage and do not directly address the bottleneck.

Option A is wrong because reducing the batch size may further underutilize the GPU. Option C is wrong because increasing the learning rate does not address a data loading bottleneck.

490
MCQmedium

A data scientist is building a fraud detection model using a highly imbalanced dataset. The model uses a random forest classifier. The recall for the minority class is 0.6, and precision is 0.9. The business requires recall above 0.8. Which action should the data scientist take to improve recall?

A.Perform feature selection to remove noisy features.
B.Increase the maximum depth of the trees.
C.Increase the class weight for the minority class in the algorithm.
D.Decrease the probability threshold for classifying a transaction as fraudulent.
E.Increase the number of trees in the random forest.
AnswerD

Lower threshold increases true positives (recall) but may reduce precision.

Why this answer

Option D is correct because decreasing the classification threshold for the positive class increases recall (more transactions are predicted as fraudulent) at the cost of precision. Option E (more trees) reduces variance but may not improve recall. Option C (higher class weights) can also help, but it requires retraining and does not directly target a specific recall level.

Option A (feature selection) may reduce recall. Option B (increasing max depth) could lead to overfitting and does not necessarily improve recall.

491
MCQhard

A data scientist runs a SageMaker training job and receives the above error. The S3 bucket 'my-bucket' contains a folder 'data' with a file 'data.csv'. What is the MOST likely cause of the error?

A.The instance type ml.m5.large does not have enough memory
B.The VolumeSizeInGB is too small to download the data
C.The S3 URI should be s3://my-bucket/data/data.csv instead of s3://my-bucket/data
D.The S3 bucket and the training job are in different regions
AnswerC

If the training script expects a single file, the S3 URI must point to the file directly.

Why this answer

The error points to the input location rather than to compute or storage. With S3DataType 'S3Prefix', SageMaker treats `s3://my-bucket/data` as a key prefix, not a folder. It matches every object whose key starts with `data`, which includes `data/data.csv` but also keys such as `data_backup.csv` or `database/...` if they exist. If the training job expects the single file, pointing the channel at `s3://my-bucket/data/data.csv` gives the exact object path.
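A small sketch of S3 prefix matching on a hypothetical bucket listing, followed by the shape of a CreateTrainingJob `InputDataConfig` channel that points at the single object. The channel name and content type are assumptions.

```python
# Hypothetical bucket listing, to show how an S3 key *prefix* matches objects.
keys = ["data/data.csv", "data_backup.csv", "database/dump.csv", "other/data.csv"]

def keys_under_prefix(keys, prefix):
    """S3 prefixes are plain string prefixes of object keys, not directories."""
    return [k for k in keys if k.startswith(prefix)]

print(keys_under_prefix(keys, "data"))           # also matches data_backup.csv and database/...
print(keys_under_prefix(keys, "data/data.csv"))  # exactly the intended object

# Shape of a CreateTrainingJob input channel that targets the single object.
input_data_config = [{
    "ChannelName": "train",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/data/data.csv",
            "S3DataDistributionType": "FullyReplicated",
        }
    },
    "ContentType": "text/csv",
}]
```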

Exam trap

The trap here is that candidates treat S3 prefixes as folders. In S3, a prefix is only the leading part of an object key, so a URI that names a prefix can match more objects, or different objects, than the one file the training code expects.

How to eliminate wrong answers

Option A is wrong because the error concerns the S3 input location, not instance memory; ml.m5.large has sufficient memory for typical CSV processing. Option B is wrong because a VolumeSizeInGB that is too small for File-mode data produces an out-of-disk-space failure, not an input-location error. Option D is wrong because a bucket in a different Region would surface as a Region or access error (e.g., 'Access Denied' or 'BucketRegionError'), not the error shown.

492
Multi-Selecthard

Which THREE factors should be considered when selecting the appropriate algorithm for a regression problem? (Choose 3.)

Select 3 answers
A.The number of features relative to the number of samples
B.The interpretability requirements of the business stakeholders
C.The presence of non-linear relationships in the data
D.The time of day the training will occur
E.The color of the data scientist's laptop
AnswersA, B, C

High-dimensional data may require regularization.

Why this answer

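Option A is correct because the ratio of features to samples directly affects model complexity and overfitting risk. When there are more features than samples (p >> n), ordinary least squares has no unique solution because X^T X is singular, so regularized methods (Ridge, Lasso) or tree-based models become necessary. This is a core consideration in the bias-variance tradeoff for regression problems. Option B matters because linear models offer coefficients that stakeholders can read directly, while ensembles are harder to explain. Option C matters because linear models cannot capture non-linear relationships without feature engineering, whereas tree-based or neural models can.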
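A minimal NumPy sketch of the p >> n problem on random data (an assumption). The Gram matrix X^T X cannot have full rank when there are fewer samples than features. Adding a ridge penalty `alpha * I` makes the system uniquely solvable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 50                        # more features than samples
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

gram = X.T @ X                       # p x p matrix, but its rank is at most n
print(np.linalg.matrix_rank(gram))   # at most 20, less than p = 50, so gram is singular

# Ridge: gram + alpha * I is positive definite, so the normal equations have a unique solution.
alpha = 1.0
w_ridge = np.linalg.solve(gram + alpha * np.eye(p), X.T @ y)
print(w_ridge.shape)                 # (50,)
```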

Exam trap

AWS often tests the distinction between operational concerns (like training time or hardware) and core modeling factors, expecting candidates to recognize that irrelevant options (time of day, laptop color) are clear distractors while the three correct factors directly influence algorithm performance and business suitability.

493
MCQmedium

A company is using SageMaker built-in object detection algorithm to detect defects in manufacturing images. The model is trained on 10,000 labeled images and achieves 95% accuracy. However, in production, the model misclassifies many defective items as non-defective (false negatives). The business requires recall > 90% for the defect class. Which action should they take?

A.Use a different algorithm such as semantic segmentation
B.Adjust the decision threshold of the model to increase recall at the expense of precision
C.Use SageMaker's Automatic Model Tuning to find better hyperparameters
D.Retrain the model with more images of non-defective items
AnswerB

Lowering the threshold increases recall for the positive class.

Why this answer

Threshold tuning directly optimizes recall for a given class.

494
Multi-Selecthard

A company is deploying a machine learning model for fraud detection. The model outputs a probability score. The cost of false negatives is very high. Which TWO metrics should the company focus on optimizing?

Select 2 answers
A.Precision
B.False positive rate (FPR)
C.F1 score
D.Area under the ROC curve (AUC-ROC)
E.Recall
AnswersC, E

F1 is the harmonic mean of precision and recall; it rewards recall while penalizing a collapse in precision.

Why this answer

Recall (true positive rate) measures ability to find positives; minimizing false negatives is optimizing recall. AUC-ROC summarizes overall performance but not specific to false negatives. Precision focuses on false positives.

FPR is about false positives. F1 balances precision and recall, but recall directly addresses false negatives.
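A short sketch with hypothetical confusion-matrix counts for two fraud models. It shows why recall is the metric tied to false negatives: the model with the higher F1 still misses three times as many frauds.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Two hypothetical fraud models evaluated on the same 100 fraud cases.
print(classification_metrics(tp=70, fp=5, fn=30))   # F1 = 0.80, recall = 0.70 (misses 30 frauds)
print(classification_metrics(tp=90, fp=60, fn=10))  # F1 = 0.72, recall = 0.90 (misses 10 frauds)
```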

495
MCQmedium

A team is deploying a real-time inference endpoint using Amazon SageMaker. The model is a large deep learning model that requires GPU for inference. The endpoint must handle variable traffic patterns with minimal latency. Which deployment strategy should the team use?

A.Deploy a single model endpoint with an auto-scaling policy.
B.Use a SageMaker multi-model endpoint with GPU instance type.
C.Deploy a serverless endpoint using SageMaker Serverless Inference.
D.Use SageMaker Batch Transform to process requests in batches.
AnswerB

Multi-model endpoints allow hosting multiple models on GPU instances, handling variable traffic efficiently.

Why this answer

B is correct because SageMaker multi-model endpoints (MMEs) can host multiple models on a single GPU-backed endpoint. Models are loaded dynamically when first invoked, and idle models are unloaded when memory is needed. This lowers cost compared with one endpoint per model while still providing GPU acceleration for deep learning inference. The trade-off is extra latency on the first request to a model that is not yet loaded. MMEs suit variable traffic patterns because the endpoint can still scale horizontally while sharing GPU resources across models.

Exam trap

The trap here is that candidates often assume serverless inference (Option C) is suitable for GPU workloads, but AWS SageMaker Serverless Inference only supports CPU instances, making it incompatible with large deep learning models that require GPU acceleration.

How to eliminate wrong answers

Option A is wrong because a single model endpoint with auto-scaling can handle variable traffic but does not optimize GPU utilization for multiple models; it would require separate endpoints for each model, increasing cost and management overhead. Option C is wrong because SageMaker Serverless Inference does not support GPU instances; it uses CPU-based compute, which is unsuitable for large deep learning models requiring GPU acceleration. Option D is wrong because SageMaker Batch Transform is designed for offline, asynchronous batch processing, not real-time inference with minimal latency; it cannot handle variable traffic patterns dynamically.

496
MCQmedium

A data scientist is training a deep learning model on Amazon SageMaker using a custom Docker container. The training job fails with an error 'OutOfMemoryError: CUDA out of memory'. The instance type is ml.p3.2xlarge (one NVIDIA V100 GPU with 16 GB of memory). The model has 50 million parameters. What is the most likely cause and solution?

A.The instance type is insufficient; switch to ml.p3.8xlarge
B.The batch size is too large; reduce batch size
C.Enable gradient checkpointing to reduce memory
D.The model uses FP32 precision; enable mixed precision training
AnswerD

Mixed precision (FP16) roughly halves activation memory, helping the job fit in GPU memory.

Why this answer

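Option D is correct because mixed precision stores activations and most intermediate values in FP16 instead of FP32. This roughly halves the memory those tensors need, and activations are typically the largest consumer during training. The 50 million parameters themselves are not the problem: their FP32 weights take about 200 MB. Option B (reducing the batch size) also cuts activation memory and is a valid fix, but it changes the training configuration rather than the numeric format.

Option A (a larger instance) is unnecessary if mixed precision resolves the error. ml.p3.8xlarge adds GPUs, but each has the same memory as the single GPU in ml.p3.2xlarge. Option C (gradient checkpointing) also reduces training memory by recomputing activations during the backward pass, at the cost of extra compute.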
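A back-of-the-envelope sketch of the parameter-related memory, assuming FP32 storage and the Adam optimizer (two state values per parameter). Activations are excluded, because they depend on batch size and architecture, which the question does not give.

```python
def static_training_memory_gb(n_params, bytes_per_weight=4, optimizer_states=2):
    """Rough memory for weights + gradients + optimizer state, excluding activations.

    optimizer_states=2 corresponds to Adam's two moment estimates per parameter.
    """
    per_param = bytes_per_weight * (1 + 1 + optimizer_states)  # weights, grads, states
    return n_params * per_param / 1e9

print(static_training_memory_gb(50_000_000))  # 0.8 (GB) in FP32
```

Under these assumptions, roughly 0.8 GB is far below the GPU's capacity, so the out-of-memory error most likely comes from activations. Mixed precision and smaller batches both shrink that part.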

497
MCQeasy

A data scientist is training a binary classification model using Amazon SageMaker. The dataset is highly imbalanced (99% negative class, 1% positive class). The model currently achieves 99% accuracy but fails to detect most positive cases. Which metric should the data scientist primarily use to evaluate model performance?

A.ROC AUC
B.F1 score
C.Recall
D.Accuracy
AnswerB

F1 score balances precision and recall, suitable for imbalanced data.

Why this answer

In highly imbalanced datasets (99% negative, 1% positive), accuracy is misleading because a model can achieve 99% accuracy by simply predicting the majority class for all instances, failing to detect any positive cases. The F1 score (option B) is the harmonic mean of precision and recall, providing a balanced measure that penalizes models that trade off recall for precision or vice versa. This makes it the primary metric for evaluating binary classification performance on imbalanced data, as it directly reflects the model's ability to correctly identify positive cases while minimizing false positives.
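A minimal sketch of the accuracy paradox with hypothetical labels (1% positives). A model that always predicts "negative" scores 99% accuracy while its recall and F1 are both zero.

```python
import numpy as np

y_true = np.array([1] * 10 + [0] * 990)  # hypothetical: 1% positives
y_pred = np.zeros_like(y_true)           # a model that always predicts "negative"

accuracy = np.mean(y_pred == y_true)
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)         # equivalent form of the harmonic mean

print(accuracy, recall, f1)              # 0.99 0.0 0.0
```

The `2 * tp / (2 * tp + fp + fn)` form avoids dividing by zero when the model makes no positive predictions, which is exactly the situation here.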

Exam trap

The trap here is that candidates see 99% accuracy and assume the model is performing well, failing to recognize that accuracy is meaningless on imbalanced datasets, and they may incorrectly choose ROC AUC because it is commonly used for binary classification without understanding its limitations with extreme class imbalance.

How to eliminate wrong answers

Option A (ROC AUC) is wrong because it measures the model's ability to rank positive instances higher than negative ones across all thresholds, which can be overly optimistic on highly imbalanced datasets and does not directly reflect precision or recall for the minority class. Option C (Recall) is wrong because while it captures the proportion of actual positives correctly identified, it ignores false positives, so a model could achieve high recall by predicting all instances as positive, which is not useful. Option D (Accuracy) is wrong because it is dominated by the majority class; a model that always predicts the negative class achieves 99% accuracy but fails entirely to detect positive cases, making it a poor metric for imbalanced classification.

498
MCQeasy

A machine learning team is using SageMaker to train a model with the built-in Linear Learner algorithm. The dataset has 1 million rows and 20 features. The training completes, but the model's mean squared error (MSE) is high. Which parameter adjustment is most likely to reduce MSE?

A.Increase the mini-batch size
B.Change the loss function to cross-entropy
C.Increase the number of epochs
D.Increase the learning rate
AnswerC

More epochs allow the algorithm to converge to a lower loss.

Why this answer

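Option C is correct because increasing the number of epochs gives the optimizer more passes over the data, which helps it converge if training stopped early. Option D (increasing the learning rate) may cause instability. Option A (increasing the mini-batch size) can slow convergence.

Option B is wrong because cross-entropy is a classification loss and does not suit a regression target measured by MSE.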
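A generic NumPy sketch of mini-batch SGD on squared error, using synthetic data with 20 features (an assumption). It is not Linear Learner's internal optimizer. It shows training MSE dropping when the same optimizer gets more epochs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))   # 20 features, as in the question (fewer rows)
true_w = rng.normal(size=20)
y = X @ true_w + rng.normal(scale=0.1, size=1000)

def train_mse(epochs, lr=0.01, batch_size=100):
    """Mini-batch SGD on squared error; returns training MSE after `epochs` passes."""
    w = np.zeros(20)
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            xb, yb = X[start:start + batch_size], y[start:start + batch_size]
            grad = 2 * xb.T @ (xb @ w - yb) / len(xb)  # gradient of mean squared error
            w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

print(train_mse(1), train_mse(20))  # MSE drops as the optimizer gets more passes
```

More epochs help only while the model is still converging. Once the loss plateaus, extra passes mainly cost time.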

499
MCQhard

A machine learning team is using SageMaker to train a custom TensorFlow model on a dataset that fits in memory. The training job is taking too long. The team wants to reduce training time without changing the model architecture. Which approach is most effective?

A.Switch the input mode from File to Pipe
B.Use SageMaker managed spot training
C.Use Amazon EFS as the input data source instead of S3
D.Use a larger instance type with more vCPUs
AnswerA

Pipe mode streams data directly, reducing I/O wait time and speeding up training.

Why this answer

Switching the input mode from File to Pipe is the most effective option listed because it streams data from Amazon S3 into the training container instead of downloading the full dataset to local storage before training begins. Training can start while data is still arriving, which shortens the time spent waiting on I/O.

Exam trap

AWS often tests the misconception that larger instances always reduce training time. The trap here is that when the job is waiting on data loading rather than on computation, streaming the input with Pipe mode helps more than adding compute.

How to eliminate wrong answers

Option B is wrong because SageMaker managed spot training reduces cost by using spare EC2 capacity, but it does not inherently reduce training time. Interruptions and checkpoint restarts can even increase total time. Option C is wrong because Amazon EFS does not support Pipe mode, and its throughput depends on how the file system is provisioned, so switching to it is not a reliable way to shorten this job. Option D is wrong because more vCPUs add compute but do not remove the time spent loading data, which is the bottleneck this question targets.

500
MCQhard

Refer to the exhibit. An IAM policy is attached to a SageMaker notebook instance role. A data scientist is trying to train a model using the SageMaker built-in XGBoost algorithm with training data in 'my-bucket/training-data/' and expects output in 'my-bucket/output/'. The training job fails with an access denied error. What is the most likely missing permission?

A.iam:PassRole on the SageMaker execution role.
B.ecr:GetAuthorizationToken on the ECR repository.
C.s3:ListBucket on the S3 bucket.
D.sagemaker:DescribeTrainingJob on the training job.
AnswerA

The policy is missing iam:PassRole, which is required to allow SageMaker to assume the execution role for the training job.

Why this answer

The policy grants s3:GetObject on my-bucket/training-data/ and s3:PutObject on my-bucket/output/, and it allows the training job to be created, but it does not include iam:PassRole. Every CreateTrainingJob request names an execution role that SageMaker assumes to run the job, and IAM lets a principal hand a role to an AWS service only if that principal has iam:PassRole on the role. Without it, the request is rejected with an access denied error.

Option B is less likely because ECR permissions such as ecr:GetAuthorizationToken are chiefly a concern for custom images stored in your own repositories, not for the AWS-managed built-in XGBoost image. Option C is less likely because s3:ListBucket matters when the job enumerates objects under an S3 prefix, but iam:PassRole is the permission most commonly missing when a notebook role launches a training job, and the policy shown omits it. Option D is wrong because sagemaker:DescribeTrainingJob is only used to check a job's status; it is not required to run the job.
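For reference, a sketch of the kind of statement the notebook role would need, written as a Python dict and serialized with json so it can be pasted into a policy. The account ID and role name are hypothetical placeholders; iam:PassedToService is a standard IAM condition key that restricts which service may receive the role.

```python
import json

# Hypothetical execution role ARN; use the role named in CreateTrainingJob.
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"

# Statement to add to the notebook role's policy so it can hand the
# execution role to SageMaker when it creates a training job.
pass_role_statement = {
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": EXECUTION_ROLE_ARN,
    # Only allow this role to be passed to the SageMaker service.
    "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
    },
}

policy_document = {"Version": "2012-10-17", "Statement": [pass_role_statement]}

print(json.dumps(policy_document, indent=2))
```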

501
MCQmedium

A data scientist is training a gradient boosting model using SageMaker's built-in XGBoost algorithm. The model is overfitting on the training data. Which hyperparameter adjustment is most likely to reduce overfitting?

A.Increase learning rate (eta)
B.Increase max_depth
C.Increase num_round
D.Increase lambda (L2 regularization)
AnswerD

Higher lambda penalizes large weights, reducing overfitting.

Why this answer

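Increasing the L2 regularization term (lambda) penalizes large leaf weights, shrinking each tree's contribution and reducing overfitting. Option A is wrong because a higher learning rate (eta) makes each boosting step larger, which fits the training data more aggressively rather than regularizing. Option B is wrong because increasing max_depth lets each tree capture more complex interactions, increasing model complexity. Option C is wrong because adding boosting rounds (num_round) without early stopping keeps fitting the training data and can increase overfitting.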
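To see why a larger lambda shrinks leaves, recall that XGBoost sets each leaf's weight to w* = -G / (H + lambda), where G and H are the sums of the first and second derivatives of the loss over the examples in that leaf. The sketch below evaluates that formula for squared-error loss (gradient pred - y, hessian 1) on three hypothetical targets, ignoring the L1 term alpha.

```python
def optimal_leaf_weight(gradients, hessians, reg_lambda):
    """XGBoost's closed-form leaf weight: w* = -G / (H + lambda)."""
    G = sum(gradients)
    H = sum(hessians)
    return -G / (H + reg_lambda)

# Squared-error loss with a current prediction of 0 for every example:
# gradient g_i = pred - y_i, hessian h_i = 1.
targets = [4.0, 5.0, 6.0]
grads = [0.0 - y for y in targets]
hess = [1.0] * len(targets)

for lam in (0.0, 1.0, 10.0):
    print(lam, round(optimal_leaf_weight(grads, hess, lam), 3))
```

The printed weights are 5.0, 3.75, and 1.154 for lambda of 0, 1, and 10: the same leaf is pulled toward zero as lambda grows, so each tree contributes less and the ensemble fits noise less aggressively. XGBoost's default lambda is 1.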

502
MCQhard

A machine learning engineer is building a binary classification model to predict customer churn. The dataset is highly imbalanced (5% churn). The engineer wants to use Amazon SageMaker's built-in XGBoost algorithm. Which combination of hyperparameters is most appropriate for this scenario?

A.scale_pos_weight=19, subsample=0.8
B.scale_pos_weight=0.05, subsample=0.8
C.scale_pos_weight=19, subsample=1.0
D.scale_pos_weight=1, subsample=1.0
AnswerA

scale_pos_weight=19 matches the 19:1 negative-to-positive ratio, and subsample=0.8 adds row-sampling regularization.

Why this answer

In a highly imbalanced dataset with only 5% churn, the ratio of negative to positive classes is 95:5, or 19:1. The `scale_pos_weight` hyperparameter in XGBoost should be set to this ratio (19) to penalize misclassifications of the minority class more heavily. A `subsample` of 0.8 introduces stochasticity and helps prevent overfitting, which is especially important when the minority class is small.

Exam trap

The trap here is that candidates often confuse `scale_pos_weight` with a simple class weight or mistakenly think a value less than 1 is needed for the minority class, when in fact it should be the ratio of majority to minority class counts.

How to eliminate wrong answers

Option B is wrong because `scale_pos_weight=0.05` would down-weight the minority class, making the model even more likely to ignore churn cases. Option C is wrong because `subsample=1.0` uses the full dataset for every tree, which increases the risk of overfitting on the minority class without any regularization from row sampling. Option D is wrong because `scale_pos_weight=1` treats both classes equally, failing to address the 19:1 class imbalance, and `subsample=1.0` again provides no overfitting protection.
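The XGBoost documentation suggests sum(negative instances) / sum(positive instances) as a starting value for scale_pos_weight. A small sketch of that calculation on hypothetical labels:

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples, the usual starting value."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples")
    return negatives / positives

# Hypothetical churn labels with 5% positives (50 churners out of 1,000).
labels = [1] * 50 + [0] * 950
print(scale_pos_weight(labels))  # 19.0
```

In the SageMaker Python SDK, values like these would be passed to the XGBoost estimator through set_hyperparameters (for example scale_pos_weight=19, subsample=0.8); treat 19 as a starting point to tune rather than a fixed rule.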

503
MCQmedium

A company uses SageMaker to deploy a real-time inference endpoint for a fraud detection model. The model is an XGBoost model trained on 50 features. The endpoint receives 100 requests per second, but latency is higher than the required 200 ms. The team wants to reduce latency without retraining. What should they do?

A.Increase the number of instances behind the endpoint
B.Use SageMaker's batch transform instead of real-time endpoint
C.Reduce the number of features by selecting the most important ones
D.Use SageMaker's Elastic Inference to attach an acceleration to the endpoint
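AnswerA

Adding instances spreads the 100 requests per second across more hosts, reducing queueing delay without changing the model.

Why this answer

At 100 requests per second, a small endpoint fleet can spend much of each request's time waiting in a queue. Increasing the instance count behind the endpoint (or enabling auto scaling) lowers the load per instance and therefore the latency, and it requires no change to the model. Option B is wrong because batch transform is for offline predictions and does not serve real-time requests. Option C is wrong because the model was trained on 50 features, so using fewer features would require retraining, which the requirement rules out. Option D is wrong because Elastic Inference accelerators support deep learning frameworks such as TensorFlow, MXNet, and PyTorch, not XGBoost.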

504
MCQhard

A machine learning engineer is deploying a PyTorch model to SageMaker. The model requires custom inference logic. Which approach should the engineer use?

A.Use a SageMaker built-in PyTorch container as-is
B.Use SageMaker Ground Truth to deploy the model
C.Use SageMaker Processing to run inference
D.Create a custom inference script and use the SageMaker PyTorch container
AnswerD

SageMaker PyTorch container supports custom entry points.

Why this answer

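Option D is correct because the SageMaker PyTorch container accepts a custom inference script (entry point) that overrides how the model is loaded, how requests are deserialized, how predictions are made, and how responses are serialized. Option A is wrong because using the container as-is applies only its default handling, with no place for custom logic. Option B is wrong because SageMaker Ground Truth is a data labeling service, not a deployment option.

Option C is wrong because SageMaker Processing runs data processing and evaluation jobs, not a hosted inference endpoint.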
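The SageMaker PyTorch serving container lets an entry-point script override model_fn, input_fn, predict_fn, and output_fn. A minimal sketch of that contract follows. To keep it runnable without PyTorch, the "model" is a hypothetical linear scorer stored as model.json; a real script would load a TorchScript or state_dict file in model_fn and run the network in predict_fn.

```python
import json
import os

def model_fn(model_dir):
    """Load the model once when the container starts."""
    # Hypothetical artifact: {"weights": [...], "bias": ...}.
    # A real PyTorch script would load a .pt file from model_dir instead.
    with open(os.path.join(model_dir, "model.json")) as f:
        return json.load(f)

def input_fn(request_body, request_content_type):
    """Deserialize the request body into model inputs."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)["instances"]

def predict_fn(input_data, model):
    """Custom inference logic: score each instance, then apply a threshold."""
    results = []
    for features in input_data:
        score = sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]
        results.append({"score": score, "label": int(score > 0)})
    return results

def output_fn(prediction, accept):
    """Serialize predictions for the HTTP response."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps({"predictions": prediction})
```

The script would be supplied as the entry_point when creating the PyTorch model or estimator in the SageMaker Python SDK; the container calls these functions in order for each request.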

505
MCQmedium

Refer to the exhibit. A data scientist runs the above CLI command to create a SageMaker training job. The job fails with an error 'Unable to read data from s3://bucket/train/'. What is the MOST likely cause?

A.The training image is not accessible
B.The instance type does not support the required memory
C.The IAM role does not have permissions to read from the S3 bucket
D.The training job is in a different region than the S3 bucket
AnswerC

The role must have s3:GetObject permission on the training objects (and, for a prefix input, s3:ListBucket on the bucket).

Why this answer

The error 'Unable to read data from s3://bucket/train/' indicates that the SageMaker training job cannot access its S3 input data. The most common cause is that the IAM role specified in the command lacks s3:GetObject on the objects (or s3:ListBucket on the bucket, which is needed to enumerate objects under the prefix). SageMaker assumes this role to read the training data, so without S3 read access the job fails at the data loading stage.

Exam trap

The trap here is that candidates may read the error message as a network or region issue, but an 'Unable to read data' error most often points to missing IAM permissions rather than a connectivity or resource constraint.

How to eliminate wrong answers

Option A is wrong because if the training image were not accessible, the error would typically be 'Unable to pull image' or 'Image not found', not a data read error from S3. Option B is wrong because insufficient memory would cause an out-of-memory or resource-exhausted error, not a failure to read data from S3. Option D is less likely because, although SageMaker does require the input bucket to be in the same Region as the training job and a mismatch can also make the data unreadable, a missing S3 read permission on the execution role is the far more common cause of this error.

506
MCQhard

A data scientist is using Amazon SageMaker to train a custom TensorFlow model. The training job is failing with the error: 'OutOfRangeError: End of sequence'. The input data is stored in TFRecord format in S3. What is the most likely cause?

A.The TFRecord files are corrupted.
B.The number of training steps or epochs specified exceeds the dataset size.
C.The instance type does not have enough memory.
D.The shuffle buffer size is too large.
AnswerB

The training loop continues beyond available data, causing the error.

Why this answer

The 'OutOfRangeError: End of sequence' error in TensorFlow occurs when the training loop attempts to read more data than is available in the dataset. This typically happens when the number of training steps or epochs specified exceeds the total number of records in the TFRecord files, causing the iterator to reach the end of the dataset prematurely.

Exam trap

The trap here is that candidates often confuse 'OutOfRangeError' with data corruption or memory issues, but the error specifically indicates the dataset has been fully iterated, not that the data is damaged or resources are insufficient.

How to eliminate wrong answers

Option A is wrong because corrupted TFRecord files would typically cause parsing errors (e.g., 'DataLossError' or 'InvalidArgumentError'), not an 'End of sequence' error, which indicates the iterator has exhausted valid data. Option C is wrong because insufficient memory would manifest as an out-of-memory failure (in TensorFlow, typically a 'ResourceExhaustedError') or the container being killed, not a dataset iteration boundary error. Option D is wrong because a large shuffle buffer size may increase memory usage but does not cause an 'End of sequence' error; it only affects the randomness of data ordering within the available dataset.
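One common fix is to derive the step count from the dataset size instead of hard-coding it (or, in tf.data, to call dataset.repeat() so the iterator does not run out). The arithmetic, as a sketch with hypothetical record counts:

```python
import math

def steps_per_epoch(num_records, batch_size, drop_remainder=False):
    """Number of batches one pass over the data yields."""
    if drop_remainder:
        return num_records // batch_size
    return math.ceil(num_records / batch_size)

def max_steps(num_records, batch_size, epochs, drop_remainder=False):
    """Steps available if the pipeline repeats exactly `epochs` times;
    requesting more than this exhausts the iterator (End of sequence)."""
    return steps_per_epoch(num_records, batch_size, drop_remainder) * epochs

# Hypothetical: 10,000 TFRecord examples, batch size 64, 5 epochs.
print(steps_per_epoch(10_000, 64))   # 157
print(max_steps(10_000, 64, 5))      # 785
```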

507
Multi-Selecthard

A data scientist is using Amazon SageMaker to train a deep learning model. The training job is taking too long. Which THREE actions can reduce training time?

Select 3 answers
A.Use incremental training to continue from a previous model
B.Use Spot Instances to reduce cost
C.Use Pipe input mode to stream data directly from Amazon S3
D.Decrease the batch size to reduce memory usage
E.Use a GPU instance type for faster computation
AnswersA, C, E

Incremental training starts from an existing model, requiring fewer epochs.

Why this answer

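Incremental training starts from a previously trained model, so the model does not need to learn from scratch; SageMaker loads the existing model artifacts and continues training on new data, which usually needs far fewer epochs to converge than full retraining. Pipe input mode streams data from Amazon S3 into the container, so training starts without first downloading the full dataset. GPU instances accelerate the large matrix operations that dominate deep learning training. Spot Instances (B) lower cost but do not speed up training, and a smaller batch size (D) increases the number of steps per epoch and the per-step overhead.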

Exam trap

The trap here is that candidates often confuse cost-saving techniques (like Spot Instances) with performance-improving techniques, or they mistakenly think decreasing batch size always speeds up training, when in fact it can slow it down due to increased overhead.

508
Multi-Selectmedium

A data scientist is building a deep learning model using Amazon SageMaker. The model is overfitting the training data. Which THREE actions can help reduce overfitting?

Select 3 answers
A.Add L2 regularization to the loss function.
B.Use data augmentation to increase the training dataset size.
C.Increase the number of layers in the network.
D.Reduce the learning rate.
E.Use dropout layers in the network.
AnswersA, B, E

L2 regularization penalizes large weights, reducing overfitting.

Why this answer

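Overfitting can be reduced by regularization (L2), dropout, data augmentation (which increases the effective size of the training data), early stopping, and reducing model complexity. Option C is wrong because adding layers increases model complexity and makes overfitting worse. Option D is wrong because a lower learning rate changes how quickly the model converges, not how much it can memorize, so it is not a regularization technique. The correct answers are A, B, and E.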
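Dropout (option E) is simple enough to sketch directly. The version below is inverted dropout, as applied by common frameworks such as PyTorch and Keras during training: each unit is zeroed with probability p and survivors are scaled by 1 / (1 - p), so no rescaling is needed at inference time. It is a plain-Python illustration, not a framework layer.

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p and scale the
    survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=rng))
```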

509
MCQmedium

A team is deploying a SageMaker endpoint for a model that was trained with scikit-learn. The endpoint receives spikes in traffic during business hours. The team wants to minimize cost while ensuring availability during spikes. Which endpoint configuration is MOST appropriate?

A.Use SageMaker Serverless Inference
B.Use a production variant endpoint with auto-scaling based on CPU utilization
C.Use a multi-model endpoint with a single instance type
D.Deploy a single large instance that can handle peak load
AnswerB

Auto-scaling handles traffic spikes efficiently.

Why this answer

Option B is correct because a production variant endpoint with auto-scaling based on CPU utilization allows the SageMaker endpoint to dynamically adjust the number of instances in response to traffic spikes, ensuring availability during business hours while minimizing cost by scaling down during off-peak periods. This approach is ideal for a scikit-learn model, which is CPU-bound, making CPU utilization a relevant and effective scaling metric.

Exam trap

The trap here is that candidates often confuse serverless inference with cost optimization for predictable spikes, overlooking that auto-scaling with a relevant metric like CPU utilization provides both cost efficiency and availability for scheduled traffic patterns.

How to eliminate wrong answers

Option A is wrong because SageMaker Serverless Inference is designed for intermittent or unpredictable traffic that can tolerate cold starts; for consistent daily spikes during business hours, cold starts can add latency and sustained load can make it costlier than auto-scaled instances. Option C is wrong because a multi-model endpoint is built to host many models behind one endpoint to save cost; with a single instance it adds no capacity to absorb traffic spikes for this one model. Option D is wrong because deploying a single large instance that can handle peak load results in over-provisioning and higher costs during off-peak hours, as the instance remains fully running regardless of actual traffic, contradicting the goal of minimizing cost.
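A sketch of how option B could be configured with Application Auto Scaling, building the keyword arguments for boto3's register_scalable_target and put_scaling_policy calls without calling AWS. The endpoint name, capacity limits, target value, and cooldowns are hypothetical; AllTraffic is the SageMaker Python SDK's default variant name. The endpoint's CPUUtilization metric sums utilization across vCPUs (it can exceed 100 on multi-core instances), so the target value should be chosen with the instance's vCPU count in mind.

```python
def scaling_requests(endpoint_name, variant_name, min_instances, max_instances, target_cpu):
    """Build kwargs for Application Auto Scaling's register_scalable_target
    and put_scaling_policy calls for a SageMaker endpoint variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-cpu-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            # Endpoint CPU is a CloudWatch metric, so it is supplied as a
            # customized metric rather than a predefined one.
            "CustomizedMetricSpecification": {
                "MetricName": "CPUUtilization",
                "Namespace": "/aws/sagemaker/Endpoints",
                "Dimensions": [
                    {"Name": "EndpointName", "Value": endpoint_name},
                    {"Name": "VariantName", "Value": variant_name},
                ],
                "Statistic": "Average",
            },
            "ScaleOutCooldown": 60,   # seconds: react quickly to spikes
            "ScaleInCooldown": 300,   # seconds: scale in more conservatively
        },
    }
    return target, policy

target, policy = scaling_requests("sklearn-endpoint", "AllTraffic", 1, 4, 60.0)
# With boto3 (not executed here):
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```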

510
Multi-Selecteasy

Which TWO of the following are examples of unsupervised learning tasks?

Select 2 answers
A.Classifying emails as spam or not spam
B.Dimensionality reduction using PCA
C.Sentiment analysis of product reviews
D.Clustering customer segments
E.Predicting house prices
AnswersB, D

PCA reduces features without labels.

Why this answer

Principal Component Analysis (PCA) is an unsupervised learning technique used for dimensionality reduction. It identifies the directions (principal components) that maximize variance in the data without requiring any labeled target variable, so it learns patterns solely from the input features. Clustering customer segments is also unsupervised: an algorithm such as k-means groups customers by similarity in their features, with no predefined segment labels.

Exam trap

The exam often tests the distinction between supervised and unsupervised learning by mixing label-free tasks (like clustering and dimensionality reduction) with tasks that require labeled outputs (like spam classification, sentiment analysis, or price prediction); any task that learns to predict a target variable is supervised.
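A toy illustration of option D: plain k-means (Lloyd's algorithm) on a single hypothetical feature. No labels are supplied; the segments emerge only from distances between values. Initialization is deterministic here to keep the example reproducible; real implementations typically use randomized or k-means++ seeding.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's k-means on one feature; uses no labels, only distances."""
    ordered = sorted(values)
    # Deterministic initialization: spread starting centroids across the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical annual spend per customer; no segment labels are provided.
spend = [120, 135, 150, 900, 950, 1010, 4000, 4200]
centroids, clusters = kmeans_1d(spend, k=3)
print(clusters)
```

With these values the three clusters are [120, 135, 150], [900, 950, 1010], and [4000, 4200].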

511
MCQmedium

A company is building a recommender system using matrix factorization. The dataset contains user-item interactions. The model is trained on a large dataset, but the recommendations for new users are poor. Which approach would MOST effectively address this cold-start problem?

A.Incorporate user demographic features as side information
B.Switch to item-based collaborative filtering only
C.Increase the number of latent factors in the model
D.Use only implicit feedback signals for training
AnswerA

Side information helps generalize to new users by leveraging metadata.

Why this answer

Matrix factorization models learn latent factors only from user-item interactions. For new users with no history, the model cannot compute a meaningful latent vector, leading to poor recommendations. Incorporating user demographic features as side information allows the model to initialize or infer latent factors for new users based on their attributes, directly addressing the cold-start problem.

Exam trap

The trap here is that candidates may think increasing latent factors or switching to implicit feedback improves generalization, but neither addresses the fundamental lack of user interaction data for new users.

How to eliminate wrong answers

Option B is wrong because switching to item-based collaborative filtering still relies on user-item interactions and does not solve the cold-start problem for new users with no history. Option C is wrong because increasing the number of latent factors may improve model capacity but does not provide any information about new users, so it cannot mitigate the cold-start issue. Option D is wrong because using only implicit feedback signals does not introduce any new user attributes; it still requires historical interactions to generate recommendations, leaving the cold-start problem unresolved.
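One way to use side information, similar in spirit to factorization machines and hybrid recommenders such as LightFM, is to represent a user as the sum of embeddings for their features (user ID plus demographics). A brand-new user has no trained ID embedding but still gets a usable vector from demographic embeddings. The embeddings below are hypothetical hand-picked numbers standing in for trained values.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def user_vector(features, embeddings, dim):
    """Sum the embeddings of every known feature; unknown ones add nothing."""
    vec = [0.0] * dim
    for f in features:
        for i, v in enumerate(embeddings.get(f, [0.0] * dim)):
            vec[i] += v
    return vec

DIM = 2
# Hypothetical learned embeddings for ID and demographic features.
feature_embeddings = {
    "user:42": [0.9, 0.1],
    "age:18-24": [0.6, -0.2],
    "region:EU": [0.1, 0.4],
}
item_embeddings = {"item:headphones": [1.0, 0.0], "item:garden-hose": [0.0, 1.0]}

# New user: the ID was never seen in training, but demographics are known.
new_user = user_vector(["user:999", "age:18-24", "region:EU"], feature_embeddings, DIM)
scores = {item: dot(new_user, vec) for item, vec in item_embeddings.items()}
print(scores)
```

With only an unseen user ID, the vector would be all zeros and every item would score the same; the demographic embeddings are what let the model rank items for this user.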

512
MCQmedium

A data scientist is training a neural network on image data using TensorFlow with GPU instances on SageMaker. The training is slow because the GPU utilization is low. The data pipeline uses tf.data with a large number of preprocessing operations. Which action would most likely increase GPU utilization?

A.Increase the learning rate to converge faster.
B.Increase the prefetch buffer size in the tf.data pipeline.
C.Reduce the batch size to speed up each step.
D.Increase the number of CPU instances in the training job.
E.Use smaller image sizes to reduce computation.
AnswerB

Prefetching overlaps CPU data preparation with GPU computation, improving GPU utilization.

Why this answer

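Option B is correct because prefetching lets the CPU prepare the next batches while the GPU computes on the current one, reducing GPU idle time. Option C is wrong because a smaller batch gives the GPU less work per step, which tends to lower utilization further. Option D is wrong because adding separate CPU instances does not speed up the input pipeline feeding each GPU unless that pipeline is parallelized and overlapped with computation.

Option E is wrong because smaller images reduce computation but do not remove the input bottleneck, so the GPU still waits on data. Option A is wrong because the learning rate affects convergence, not data throughput.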
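In tf.data, option B is usually written as dataset.map(preprocess_fn, num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE), where preprocess_fn is a placeholder for your own function. The idea behind prefetching, a bounded buffer filled by a background producer while the consumer works, can be sketched in plain Python; the preprocess generator is a hypothetical stand-in for decoding and augmentation.

```python
import queue
import threading

_SENTINEL = object()

def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable` while a background thread prepares up to
    `buffer_size` items ahead, overlapping producer and consumer work.
    Errors raised in the producer are not propagated in this sketch."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(_SENTINEL)  # signal end of data

    # Daemon thread so an abandoned consumer does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is _SENTINEL:
            return
        yield item

def preprocess(batches):
    """Stand-in for CPU-heavy decoding and augmentation."""
    for b in batches:
        yield [x * 2 for x in b]

batches = [[1, 2], [3, 4], [5, 6]]
result = list(prefetch(preprocess(batches), buffer_size=2))
print(result)
```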

513
MCQeasy

A company is using SageMaker to train a text classification model with the built-in BlazingText algorithm. The dataset has 500,000 documents, each labeled with one of 10 categories. The training job runs on a single ml.m4.xlarge instance with default hyperparameters and is taking longer than expected. The company has a fixed budget and wants to minimize cost while reducing training time. Which option should the data scientist choose?

A.Increase the learning rate significantly
B.Use the 'mode' hyperparameter set to 'batch_skipgram' instead of 'supervised'
C.Use SageMaker Managed Spot Training
D.Use a larger instance type, such as ml.m4.4xlarge
AnswerC

Spot instances reduce cost, allowing more resources for same budget.

Why this answer

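Option C is the best choice because Managed Spot Training can substantially cut compute cost for the same instance type, and those savings let the team afford more or larger instances within the fixed budget. Spot capacity does not make training faster by itself, and interruptions can add wall-clock time, so checkpointing should be enabled. Option D increases cost at on-demand rates. Option A can prevent the model from converging.

Option B is wrong because 'batch_skipgram' is a Word2Vec (unsupervised embedding) mode, so it would no longer train the required text classifier.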

514
MCQmedium

Refer to the exhibit. A data scientist is using Amazon SageMaker Ground Truth to label a dataset. The output manifest file references S3 objects with metadata. The scientist notices that a training job using the labeled data yields poor accuracy. What is the most likely issue?

A.The labeled dataset has missing labels for some records.
B.The training data is in an incorrect format for the algorithm.
C.The IAM role used for training does not have permissions to read the manifest file.
D.The data distribution differs significantly between the training set and the real-world inference data.
AnswerB

If the data format does not match the algorithm's expectations, training may complete but produce poor results.

Why this answer

The exhibit shows an S3 head-object response for a source data object (its metadata and size), not the labels, so nothing in that metadata points to a defect. Because the training job completes but accuracy is poor, the problem is a silent one rather than a failure. A Ground Truth output manifest is an augmented manifest in JSON Lines format, where each line holds a source-ref to the data object plus a label attribute named after the labeling job. If the training job does not read it that way (for example, the channel is not configured as an augmented manifest, or its AttributeNames do not match the manifest's keys), the algorithm can train on misread inputs and still produce a model, just a poor one.

Option A is less likely because records with missing labels usually make the job fail or get skipped rather than quietly degrade accuracy. Option C is wrong because missing IAM permissions would cause an access denied error and a failed job, not a finished job with poor accuracy. Option D describes a real risk in general, but the scenario gives no information about inference data; the question is about the labeled training data itself.
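A quick sanity check is to confirm that every manifest line carries the keys the training channel will be told to read. In the sketch below, the attribute name product-labels and the S3 paths are hypothetical; in a real job the keys should match the AttributeNames configured on the augmented-manifest channel (for example ["source-ref", "product-labels"]).

```python
import json

def check_manifest(lines, label_attribute):
    """Return (line_number, problem) pairs for records a training job
    reading `source-ref` plus `label_attribute` would misread."""
    problems = []
    for n, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        if "source-ref" not in record:
            problems.append((n, "missing source-ref"))
        if label_attribute not in record:
            problems.append((n, f"missing {label_attribute}"))
    return problems

# Hypothetical manifest lines; the second uses a mismatched label key.
manifest = [
    '{"source-ref": "s3://my-bucket/img/1.jpg", "product-labels": 0, '
    '"product-labels-metadata": {"class-name": "shoe"}}',
    '{"source-ref": "s3://my-bucket/img/2.jpg", "product-label": 1}',
]
print(check_manifest(manifest, "product-labels"))
```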

515
MCQhard

A data scientist is training a neural network for a multi-class classification problem with 100 classes. The model uses a softmax output layer and cross-entropy loss. During training, the loss decreases steadily but the accuracy on the validation set plateaus early. Which of the following is the most likely cause?

A.Batch size is too large
B.The model is overfitting the training data
C.Number of epochs is too small
D.Learning rate is too high
AnswerB

Overfitting occurs when the model learns training data noise, causing training loss to keep decreasing while validation performance stagnates.

Why this answer

When the validation accuracy plateaus early while training loss continues to decrease, it indicates that the model is memorizing the training data rather than learning generalizable patterns. This is classic overfitting, where the softmax output layer produces high-confidence predictions for training samples but fails to generalize to unseen validation data, causing cross-entropy loss to drop on the training set while validation accuracy stagnates.

Exam trap

AWS often tests the distinction between overfitting and underfitting by pairing a decreasing training loss with a plateauing validation metric, tricking candidates into choosing learning rate or epoch issues when the real problem is memorization.

How to eliminate wrong answers

Option A is wrong because a large batch size mainly affects gradient noise and convergence speed; it does not by itself explain training loss that keeps falling while validation accuracy stalls. Option C is wrong because too few epochs would cause both training and validation accuracy to be low and still improving, not a plateau in validation accuracy alone. Option D is wrong because a learning rate that is too high usually causes the loss to diverge or oscillate, not a steady decrease in training loss with a plateau in validation accuracy.

516
MCQhard

A machine learning engineer is deploying a model that predicts loan defaults. The model uses features like income, credit score, and debt-to-income ratio. After deployment, the model's performance degrades over time. Which concept best describes this phenomenon?

A.Data drift
B.Concept drift
C.Overfitting
D.Model drift
AnswerD

Model drift is the degradation of model performance over time.

Why this answer

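Option D is correct because model drift is the general term for a deployed model's predictive performance degrading over time, whatever the underlying cause. Option A is wrong because data drift refers specifically to a change in the input feature distribution (for example, applicant incomes shifting), which is one possible cause of model drift rather than the degradation itself. Option B is wrong because concept drift refers specifically to a change in the relationship between the inputs and the target (for example, the same credit score now implying a different default risk), another possible cause.

Option C is wrong because overfitting is a training-time problem, not a time-dependent degradation.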
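One common way to watch for data drift in a feature such as credit score is the population stability index (PSI), which compares the share of records in each bin at training time with the share seen in production. A minimal sketch with hypothetical bin proportions; a frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but thresholds are a judgment call. PSI tracks inputs only, so detecting concept drift still requires comparing predictions with actual outcomes.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as proportions per bin:
    sum((a - e) * ln(a / e)). A value of 0 means identical distributions."""
    if len(expected) != len(actual):
        raise ValueError("bin counts differ")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical share of applicants per credit-score band: training vs. last month.
training = [0.10, 0.20, 0.40, 0.30]
recent = [0.25, 0.25, 0.30, 0.20]
print(round(population_stability_index(training, recent), 3))
```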

517
MCQeasy

A company is using Amazon SageMaker to train a linear regression model. The data scientist notices that the training loss is decreasing but the validation loss has started to increase after a few epochs. What is the most likely cause?

A.The model is underfitting the training data.
B.There is data leakage from the validation set into the training set.
C.The model is overfitting the training data.
D.The learning rate is too high.
AnswerC

Decreasing training loss with increasing validation loss is a classic sign of overfitting.

Why this answer

When training loss decreases but validation loss increases, the model is fitting noise in the training data rather than patterns that generalize, which is the classic sign of overfitting. Underfitting would leave both losses high.

A learning rate that is too high would make the training loss oscillate or diverge rather than decrease steadily. Data leakage would make both losses artificially low.
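Early stopping is the usual response to this pattern: keep the model from the epoch with the lowest validation loss and stop once it has not improved for a few epochs. A sketch with hypothetical losses (SageMaker's Linear Learner exposes a similar early_stopping_patience hyperparameter):

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping once validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation losses: improving, then rising as the model overfits.
val = [0.90, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]
print(early_stopping_epoch(val))  # 3
```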

518
MCQhard

A company uses Amazon SageMaker to host a model for real-time inference. The model is a large ensemble that takes 2 seconds to load into memory. To reduce cold start latency, the data scientist uses SageMaker's managed warm pools. However, they notice that during a sudden traffic spike, new instances still experience high latency. What is the BEST way to ensure consistently low latency for all requests?

A.Use a larger instance type to reduce model loading time.
B.Configure auto scaling based on the number of active invocations to maintain a buffer of warmed instances.
C.Reduce the number of instances to minimize cold start frequency.
D.Switch to SageMaker Serverless Inference.
AnswerB

Auto scaling with a buffer ensures that new instances are provisioned ahead of demand, reducing cold start impact.

Why this answer

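Option B is correct because auto scaling on a target metric such as active invocations adds instances before the existing ones saturate, keeping a buffer of warm, model-loaded instances ready for spikes. Option A is wrong because a larger instance shortens per-request time, but new instances still have to load the model when they start.

Option D is wrong because Serverless Inference has its own cold start behavior. Option C is wrong because running fewer instances leaves less headroom, so a spike forces more new instances to start cold.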

519
MCQeasy

A retail company uses Amazon SageMaker to train a model for product demand forecasting. The dataset contains daily sales data for 10,000 products over 3 years. The data includes features like price, promotions, holidays, and seasonality. The data scientist uses a linear regression model and gets an RMSE of 50 units. However, the business requires more accurate forecasts, especially for products with high variability. The scientist notices that the residuals show a pattern: the model underestimates demand during promotional periods. Which approach should the scientist take to improve the model?

A.Add interaction features between promotion and other variables.
B.Collect more historical data for training.
C.Use a deep learning model like LSTM.
D.Remove promotion features to simplify the model.
AnswerA

Interaction terms capture combined effects.

Why this answer

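Option A is correct because interaction terms between promotion and other features (such as price or holidays) let a linear model learn that the promotional effect depends on context, which addresses the systematic underestimation during promotions. Option B is wrong because more data of the same form does not remove a bias caused by a missing feature. Option C is wrong because switching to a deep learning model such as an LSTM may be overkill when a targeted feature fix addresses the observed pattern.

Option D is wrong because removing promotion features discards the very information the model needs.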
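A minimal sketch of option A as a feature-engineering step. The column names (promo, price, is_holiday) are hypothetical. Without these columns, a linear model can only add a fixed promotional uplift; with them, the uplift can vary with price or holidays.

```python
def add_promotion_interactions(row, partners=("price", "is_holiday")):
    """Return a copy of `row` with promo x feature columns added, so a linear
    model can learn a different slope for each feature during promotions."""
    out = dict(row)
    for name in partners:
        out[f"promo_x_{name}"] = row["promo"] * row[name]
    return out

row = {"promo": 1, "price": 9.99, "is_holiday": 1}
print(add_promotion_interactions(row))
```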

520
MCQeasy

A data scientist is training a binary classification model on a highly imbalanced dataset (99% negative class, 1% positive class). The model currently achieves 99% accuracy but only identifies 0.5% of true positives. Which metric should the data scientist focus on to improve model performance?

A.Precision
B.Root Mean Squared Error (RMSE)
C.Recall
D.Accuracy
AnswerC

Recall measures the ability to find all positive samples, which is crucial for imbalanced data.

Why this answer

Recall (sensitivity) measures the proportion of actual positives correctly identified, which is critical here: the dataset is highly imbalanced (99% negative, 1% positive) and the model finds only 0.5% of the actual positives, a recall of 0.005. Improving recall directly addresses the model's inability to capture the minority class, even if it reduces precision or accuracy. In binary classification with severe class imbalance, accuracy is misleading because a model can achieve 99% accuracy by simply predicting the majority class, as seen here.

Exam trap

The trap here is that candidates see 99% accuracy and assume the model is performing well, failing to recognize that accuracy is a deceptive metric in imbalanced datasets, while recall directly measures the model's ability to find the rare positive class.

How to eliminate wrong answers

Option A is wrong because precision focuses on the proportion of predicted positives that are actually positive, which does not address the low true positive rate (0.5%); improving precision could even further reduce recall by making the model more conservative. Option B is wrong because Root Mean Squared Error (RMSE) is a regression metric that measures the average magnitude of errors in continuous predictions, not applicable to binary classification outcomes like true positive identification. Option D is wrong because accuracy is already 99% and is a poor metric for imbalanced datasets; optimizing for accuracy encourages the model to predict the majority class (negative) for all instances, which is exactly why only 0.5% of positives are found.
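The arithmetic behind the trap, using hypothetical confusion-matrix counts constructed to match the scenario (100,000 records, 1% positive, 5 of 1,000 positives found):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = classification_metrics(tp=5, fp=5, fn=995, tn=98_995)
print(m)
```

Accuracy comes out at 0.99 while recall is 0.005 and F1 is below 0.01, which is exactly why accuracy hides the problem.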

521
MCQhard

A data scientist is setting up a SageMaker training job and has attached this IAM policy to the execution role. The training job fails with an access denied error when trying to write to the output path 's3://my-bucket/output/model.tar.gz'. What additional permission is needed?

A.s3:ListBucket
B.s3:GetObject for the output path
C.s3:DeleteObject
D.iam:PassRole on the role itself
AnswerA

SageMaker requires ListBucket permission to access the bucket.

Why this answer

The training job fails because SageMaker checks the output S3 location before uploading the model artifact, and that check requires s3:ListBucket on the bucket. s3:ListBucket grants permission to list a bucket's contents, which services use to confirm that a bucket and prefix exist and are accessible. Without it, the job can fail with an access denied error even though s3:PutObject is allowed.

Exam trap

The trap here is that candidates assume only s3:PutObject is needed for writing to S3, but AWS services like SageMaker often require s3:ListBucket to verify the bucket exists before performing write operations.

How to eliminate wrong answers

Option B is wrong because s3:GetObject is a read permission used for retrieving objects, not for writing output; the training job needs write access (s3:PutObject) to create the model artifact. Option C is wrong because s3:DeleteObject is unrelated to writing output; it is used for removing objects, and the training job does not need to delete anything. Option D is wrong because iam:PassRole is required to pass the execution role to the SageMaker service, but the question states the role is already attached to the training job, so this permission is not missing; the error occurs specifically at the S3 write step.
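The resource split matters here: s3:ListBucket is a bucket-level action, so it must name the bucket ARN, while object actions such as s3:PutObject name object ARNs. A sketch of the two statements, using the bucket name from the question; the output/* scope is an assumption. Granting s3:ListBucket on arn:aws:s3:::my-bucket/* would not work, because that ARN matches objects, not the bucket.

```python
import json

BUCKET = "my-bucket"

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Bucket-level action: Resource is the bucket ARN itself.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Object-level action: Resource is an object ARN pattern.
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/output/*",
        },
    ],
}

print(json.dumps(policy_document, indent=2))
```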

522
MCQeasy

A machine learning engineer is using Amazon SageMaker to deploy a model for real-time inference. The model must respond within 100 milliseconds. The initial deployment uses a single ml.m5.large instance, but latency is too high. Which change should the engineer make to reduce latency?

A.Switch to a compute-optimized instance like ml.c5.2xlarge.
B.Use batch transform instead of real-time endpoint.
C.Deploy to a single ml.t2.medium instance to reduce cost.
D.Deploy the model on a multi-model endpoint.
AnswerA

Compute-optimized instances provide higher CPU performance, reducing prediction latency.

Why this answer

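Option A is correct because ml.c5.2xlarge is a compute-optimized instance with 8 vCPUs, compared with 2 vCPUs on ml.m5.large, so CPU-bound inference completes faster per request. Option B is wrong because batch transform is for offline predictions and cannot serve real-time requests. Option C is wrong because ml.t2.medium is a smaller, burstable instance; scaling down reduces resources and would increase latency.

Option D is wrong because a multi-model endpoint hosts several models on shared instances, which can add model-loading delays and resource contention rather than reduce latency.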
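One way to apply option A to an existing endpoint is to create a new endpoint configuration on the compute-optimized instance type and then update the endpoint in place. The sketch below assumes boto3 and uses hypothetical endpoint, model, and config names. `create_endpoint_config` and `update_endpoint` are SageMaker API calls. During the update, SageMaker provisions the new instances before retiring the old ones, so the endpoint keeps serving traffic.

```python
def variant_spec(model_name: str, instance_type: str, count: int = 1) -> dict:
    """Describe one production variant for an endpoint configuration."""
    return {
        "VariantName": "AllTraffic",  # hypothetical variant name
        "ModelName": model_name,
        "InitialInstanceCount": count,
        "InstanceType": instance_type,
    }


def move_endpoint_to(endpoint: str, model_name: str, instance_type: str,
                     config_name: str) -> None:
    """Create a new endpoint config on `instance_type` and point the endpoint at it."""
    import boto3  # imported lazily so variant_spec works without AWS credentials

    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[variant_spec(model_name, instance_type)],
    )
    # Switches the endpoint to the new configuration without deleting it.
    sm.update_endpoint(EndpointName=endpoint, EndpointConfigName=config_name)
```

For example, `move_endpoint_to("churn-endpoint", "churn-model", "ml.c5.2xlarge", "churn-config-c5")` would move a hypothetical endpoint onto ml.c5.2xlarge. After the update, measure latency again against the 100 ms budget.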

523
MCQmedium

A company is building a binary classifier to detect fraudulent transactions. The dataset is highly imbalanced (99% legitimate, 1% fraudulent). Which metric is most appropriate for evaluating the model?

A.Accuracy
B.Mean Squared Error
C.F1-score
D.Area Under the ROC Curve (AUC-ROC)
AnswerC

F1-score combines precision and recall, which makes it suitable for imbalanced data.

Why this answer

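Accuracy is misleading here: a model that predicts every transaction as legitimate achieves 99% accuracy while detecting no fraud at all. Precision and recall focus on the minority (fraudulent) class, and F1-score, their harmonic mean, balances the two in a single number. Mean Squared Error is a regression metric. AUC-ROC is threshold-independent, but it can look overly optimistic when negatives vastly outnumber positives.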
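The 99% accuracy trap can be checked with a small sketch on a made-up 1,000-transaction sample that has the 99%/1% split. The "always legitimate" model scores 0.99 accuracy and 0.0 F1. A hypothetical detector that catches 8 of 10 frauds with 12 false alarms scores lower accuracy but much higher F1. It uses the identity F1 = 2TP / (2TP + FP + FN).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN where 1 = fraudulent and 0 = legitimate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); defined as 0 if the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up data: 990 legitimate, then 10 fraudulent transactions.
y_true = [0] * 990 + [1] * 10
always_legit = [0] * 1000
# Hypothetical detector: 12 false alarms on legitimate rows, catches 8 of 10 frauds.
detector = [0] * 978 + [1] * 12 + [1] * 8 + [0] * 2

print(accuracy(y_true, always_legit), f1(y_true, always_legit))  # 0.99 0.0
print(accuracy(y_true, detector), round(f1(y_true, detector), 3))  # 0.986 0.533
```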

524
Multi-Selecteasy

A company is using Amazon SageMaker to train a model. Which TWO metrics should be used to evaluate a binary classification model?

Select 2 answers
A.Accuracy
B.Perplexity
C.AUC
D.F1 score
E.Mean Absolute Error
AnswersC, D

AUC and F1 score are both standard metrics for binary classification.

Why this answer

AUC (Area Under the ROC Curve) is a threshold-independent metric that measures the model's ability to rank positive examples above negative ones across all classification thresholds. It summarizes overall performance in a single scalar and is less sensitive to class imbalance than accuracy. F1 score, the harmonic mean of precision and recall at a chosen threshold, complements AUC by focusing on how well the model handles the positive class. Both are standard evaluation metrics for binary classification.
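AUC is threshold-independent because of its probabilistic reading. ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below uses that pairwise definition and made-up scores. It is quadratic in the number of examples, which is fine for illustration; libraries typically use a sort-based method. F1, by contrast, changes as the decision threshold moves.

```python
from itertools import product


def roc_auc(y_true, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at(y_true, scores, threshold):
    """F1 score after converting scores to labels at a single threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.3, 0.4, 0.7, 0.6, 0.9]  # made-up model scores

print(roc_auc(y_true, scores))  # 0.875, computed without any threshold
for t in (0.5, 0.8):
    print(t, f1_at(y_true, scores, t))  # 0.8 at 0.5, about 0.667 at 0.8
```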

Exam trap

The trap here is that candidates often pick Accuracy (A) as a default metric without considering class imbalance, or confuse regression metrics like MAE (E) with classification evaluation, while perplexity (B) is a distractor from NLP contexts.

525
MCQeasy

A machine learning engineer is deploying a model to Amazon SageMaker for real-time inference. The model requires low latency and must handle variable traffic patterns. Which SageMaker feature should the engineer use to automatically scale the number of instances based on demand?

A.SageMaker automatic scaling
B.Amazon EC2 Auto Scaling
C.Elastic Inference
D.SageMaker Batch Transform
AnswerA

SageMaker integrates with Application Auto Scaling to scale the number of instances based on demand.

Why this answer

SageMaker automatic scaling (Application Auto Scaling) is the correct feature because it lets the engineer define scaling policies that automatically adjust the number of instances behind a SageMaker endpoint in response to real-time traffic. Examples include target tracking on the predefined SageMakerVariantInvocationsPerInstance metric, or on custom CloudWatch metrics such as CPU utilization. This keeps latency low by adding capacity during spikes and reduces cost during lulls, without manual intervention.
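A minimal sketch of enabling target-tracking scaling on an endpoint variant with the boto3 `application-autoscaling` client. The endpoint name, variant name, capacity limits, target value, and cooldowns are all hypothetical placeholders. The namespace, resource ID format, scalable dimension, and predefined metric type follow the Application Auto Scaling conventions for SageMaker. The request dictionaries are built by plain functions, so they can be inspected without AWS access.

```python
def scaling_target(endpoint: str, variant: str, min_count: int, max_count: int) -> dict:
    """Arguments for register_scalable_target on a SageMaker endpoint variant."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_count,
        "MaxCapacity": max_count,
    }


def target_tracking_policy(endpoint: str, variant: str, target: float) -> dict:
    """Arguments for put_scaling_policy: keep invocations per instance near `target`."""
    return {
        "PolicyName": f"{endpoint}-invocations-target",  # hypothetical policy name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint}/variant/{variant}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            # Illustrative cooldowns (seconds): scale out quickly, scale in cautiously.
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    }


def enable_autoscaling(endpoint, variant, min_count, max_count, target):
    """Register the variant as scalable, then attach the target-tracking policy."""
    import boto3  # lazy import: the builders above need no AWS access

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scaling_target(endpoint, variant, min_count, max_count))
    client.put_scaling_policy(**target_tracking_policy(endpoint, variant, target))
```

Registering the scalable target first is required; the policy refers to the same ResourceId and ScalableDimension.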

Exam trap

The trap here is that candidates confuse Amazon EC2 Auto Scaling (which scales EC2 instances in an Auto Scaling group) with SageMaker automatic scaling (which scales SageMaker endpoint instances via Application Auto Scaling), leading them to pick B even though it does not directly apply to SageMaker endpoints.

How to eliminate wrong answers

Option B (Amazon EC2 Auto Scaling) is wrong because it operates on EC2 Auto Scaling groups, not on SageMaker endpoints. SageMaker endpoints are managed resources that scale through Application Auto Scaling with a SageMaker scalable target (scalable dimension sagemaker:variant:DesiredInstanceCount). Option C (Elastic Inference) is wrong because it attaches a GPU-powered accelerator to an instance to speed up deep learning inference. It does not change the number of instances based on demand. Option D (SageMaker Batch Transform) is wrong because it is designed for offline, asynchronous batch predictions on large datasets, not for real-time inference with variable traffic patterns.
