CCNA Modeling Questions

75 of 624 questions · Page 5/9 · Modeling · Answers revealed

301
MCQmedium

A data scientist is working on a regression problem to predict house prices. The dataset has 80 features, including categorical variables with high cardinality (e.g., zip code with 10,000 unique values). The target variable is log-transformed. The data scientist trains a linear regression model and obtains an R² of 0.45 on the test set. Which of the following approaches is MOST likely to improve the model's performance?

A.Remove categorical features and use polynomial features
B.Target encoding + XGBoost
C.One-hot encoding + Ridge regression
D.PCA on all features before linear regression
AnswerB

Target encoding reduces dimensionality and XGBoost captures complex patterns.

Why this answer

Target encoding efficiently handles high-cardinality features, and tree-based models like XGBoost can capture non-linear relationships and interactions, likely improving R². One-hot encoding would create too many features, causing sparsity. Removing categories loses information.

PCA may discard important information.
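To make the target-encoding idea concrete, here is a minimal sketch in plain Python. It assumes a smoothed-mean variant of target encoding; the function name, the `smoothing` parameter, and the sample zip codes and log prices are illustrative and not taken from the question. In practice the mapping would be fitted on training folds only, to avoid leaking the target into the features.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    encoded = (sum_of_targets + smoothing * global_mean) / (count + smoothing)
    Rare categories are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


# Hypothetical data: one numeric column replaces a high-cardinality feature.
zip_codes = ["98101", "98101", "98101", "10001", "10001", "73301"]
log_prices = [13.0, 13.2, 13.1, 13.8, 14.0, 12.4]

encoding = smoothed_target_encoding(zip_codes, log_prices, smoothing=2.0)
encoded_column = [encoding[z] for z in zip_codes]
```

The encoded column has one value per row regardless of how many distinct zip codes exist, which is why it scales better than one-hot encoding for this kind of feature.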

302
Multi-Selecteasy

A data scientist is evaluating a binary classification model. The model's AUC-ROC is 0.95. Which TWO statements are true?

Select 2 answers
A.The model has no false positives
B.The model has excellent discriminative ability
C.The model's performance is independent of the decision threshold
D.The model is well-calibrated
E.The model's accuracy is at least 95%
AnswersB, C

AUC close to 1 indicates strong separation between classes.

Why this answer

AUC-ROC measures the model's ability to distinguish between classes across all thresholds. A high AUC (close to 1) indicates good performance. AUC-ROC is threshold-independent.

It does not directly indicate accuracy or calibration.

303
Multi-Selectmedium

A data scientist is training a linear regression model on a dataset with 10 numerical features. After training, the model's R-squared value is 0.99 on the training set but only 0.60 on the test set. Which TWO of the following are appropriate actions to reduce overfitting? (Choose TWO.)

Select 2 answers
A.Normalize the features
B.Add more features to the model
C.Use a subset of the most important features
D.Increase the number of training epochs
E.Apply L2 regularization (Ridge regression)
AnswersC, E

Reducing the number of features reduces model complexity and overfitting.

Why this answer

Regularization (L1 or L2) penalizes large coefficients and reduces overfitting. Reducing model complexity by using fewer features or simplifying the model also helps. Adding more features would increase complexity and overfitting.

Increasing the number of epochs is not relevant for linear regression (which has a closed-form solution).
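As a rough illustration of how an L2 penalty shrinks coefficients, the sketch below uses the simplest possible case: one feature, no intercept, and the closed-form minimizer of the penalized squared error. The function name and the sample data are assumptions made for this example.

```python
def ridge_slope(xs, ys, l2=0.0):
    """Closed-form slope w minimizing sum((y - w*x)**2) + l2 * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


# Hypothetical data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Larger penalties pull the slope toward zero.
slopes = {l2: ridge_slope(xs, ys, l2) for l2 in (0.0, 1.0, 10.0)}
```

The same formula also shows why epochs are irrelevant here: the solution is computed directly rather than by iterating.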

304
Multi-Selecteasy

Which TWO actions are valid ways to handle missing data in a dataset before training a machine learning model? (Select TWO.)

Select 2 answers
A.Delete rows with missing values
B.Remove all features that have any missing values
C.Replace missing values with the maximum value
D.Ignore missing values and train the model
E.Impute missing values with the mean
AnswersA, E

Row deletion is valid if missingness is random.

Why this answer

Option A is correct because deleting rows with missing values (listwise deletion) is a straightforward and valid approach when the missing data is random and the dataset is large enough that the loss of rows does not significantly reduce statistical power or introduce bias. This method avoids the need to estimate missing values and is commonly used in practice when the proportion of missing data is low.

Exam trap

The exam often tests the misconception that 'ignoring missing values' is acceptable because some algorithms like tree-based models can technically handle missing values internally, but the exam expects explicit data preprocessing steps as part of the modeling pipeline.

305
MCQmedium

A company uses Amazon SageMaker to train a time-series forecasting model using the built-in DeepAR algorithm. The training data consists of daily sales for 1000 products over 2 years. The model performs well on most products, but for a few products with intermittent demand (sporadic sales), the predictions are poor. Which action should the data scientist take to improve predictions for these products?

A.Create a separate forecasting model specifically for intermittent demand products, using a model designed for such patterns (e.g., Croston's method).
B.Use a linear regression model for all products.
C.Increase the context length of the DeepAR model to capture longer history.
D.Add more training data by including additional product categories.
AnswerA

Intermittent demand requires specialized models like Croston's method or TSB.

Why this answer

Option A is correct. Creating separate models for different demand patterns allows specialized treatment of intermittent series. Option B is wrong because a linear model may underfit.

Option C is wrong because the two-year history is already long enough, so a longer context length does not address sporadic demand. Option D is wrong because adding data from other product categories does not help with intermittent patterns.
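For readers unfamiliar with Croston's method, the following is a minimal sketch of its core idea: smooth the non-zero demand sizes and the intervals between them separately, then forecast their ratio. The initialization (first non-zero demand and its position) and the default `alpha` are assumptions; published variants differ on both.

```python
def croston_forecast(demand, alpha=0.1):
    """Return a per-period demand-rate forecast for an intermittent series."""
    size = None        # smoothed size of non-zero demands
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Assumed initialization: use the first observed values.
                size, interval = float(d), float(periods_since)
            else:
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval


# Hypothetical series: 5 units sold every third day.
rate = croston_forecast([0, 0, 5, 0, 0, 5, 0, 0, 5])
```

Because updates happen only on periods with sales, long runs of zeros do not drag the size estimate toward zero the way ordinary exponential smoothing would.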

306
Multi-Selectmedium

Which TWO approaches are valid for handling missing categorical values in a dataset before training a machine learning model?

Select 2 answers
A.Remove all rows with missing values
B.Impute missing values with the mode of the column
C.Impute missing values with the median of the column
D.Impute missing values with the mean of the column
E.Treat missing values as a separate category
AnswersB, E

Mode is appropriate for categorical data.

Why this answer

Option B is correct because the mode (most frequent value) is the only measure of central tendency defined for nominal categorical data, as it identifies the most common category. Imputing with the mode keeps every imputed value within the existing set of categories, although it inflates the share of the most frequent one, and is a standard technique for handling missing categorical values in preprocessing pipelines like scikit-learn's SimpleImputer with strategy='most_frequent'.

Exam trap

AWS often tests the distinction between numerical and categorical imputation methods, trapping candidates who apply mean or median imputation to categorical features without recognizing that these statistics are invalid for non-numeric data.
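The two accepted approaches (mode imputation and a separate "missing" category) can be sketched in plain Python as follows. Missing values are represented as `None`, and the function names and the `"Missing"` label are illustrative choices, not part of the question.

```python
from collections import Counter


def impute_with_mode(values):
    """Replace None with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_as_category(values, label="Missing"):
    """Replace None with an explicit category of its own."""
    return [label if v is None else v for v in values]


# Hypothetical column with two missing entries.
colors = ["red", "blue", None, "red", None, "green"]

mode_filled = impute_with_mode(colors)
flagged = impute_as_category(colors)
```

The second variant keeps the information that a value was missing, which can matter when missingness itself is predictive.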

307
MCQmedium

A company uses Amazon SageMaker to train a model. The training job runs successfully but the model artifacts are not saved to the specified S3 output path. What is a likely cause?

A.The training script does not save the model to /opt/ml/model.
B.The model size exceeds the S3 bucket limit.
C.The training job used spot instances.
D.The S3 bucket is in a different AWS Region.
AnswerA

SageMaker uploads contents of /opt/ml/model to S3; saving elsewhere means artifacts are lost.

Why this answer

Option A is correct because Amazon SageMaker expects the training script to save the model artifacts to the `/opt/ml/model` directory. After the training job completes, SageMaker automatically copies the contents of this directory to the specified S3 output path. If the script saves the model elsewhere (e.g., `/tmp` or a custom path), no artifacts will be uploaded, resulting in an empty or missing S3 output.

Exam trap

The trap here is that candidates assume any successful training job automatically saves artifacts, but SageMaker only uploads what is explicitly placed in `/opt/ml/model`, and the exam tests this specific SageMaker convention.

How to eliminate wrong answers

Option B is wrong because S3 bucket limits are based on total bucket size (unlimited) and object size (up to 5 TB per object), not model size; a model exceeding these limits would cause a different error (e.g., upload failure), not a silent missing artifact. Option C is wrong because using spot instances does not affect where the model is saved; spot instances can be preempted, but if the training completes successfully, artifacts are still saved to `/opt/ml/model` and uploaded. Option D is wrong because SageMaker can write to S3 buckets in any region as long as the bucket policy and IAM role grant cross-region access; a region mismatch would cause a permission or access error, not a silent failure to save artifacts.
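A training script can follow this convention along the lines of the sketch below. It assumes the script runs in a container where the `SM_MODEL_DIR` environment variable is set (as SageMaker framework containers do) and falls back to `/opt/ml/model` otherwise. The helper name `save_model`, the file name `model.json`, and the JSON serialization are hypothetical choices for illustration.

```python
import json
import os


def save_model(params, model_dir=None):
    """Write model parameters to the directory SageMaker uploads to S3."""
    # Prefer an explicit directory, then SM_MODEL_DIR, then the default path.
    model_dir = model_dir or os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump(params, f)
    return path


# At the end of training one would call, for example:
# save_model({"weights": [0.4, -1.2], "bias": 0.1})
```

Passing `model_dir` explicitly makes the function easy to exercise locally against a temporary directory before running a real training job.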

308
MCQmedium

A company uses SageMaker to train a time-series forecasting model using Amazon Forecast. The dataset contains historical sales data for 10,000 products over 2 years. Which data format is required for the target time series?

A.A single JSON file with nested arrays
B.A CSV file with columns: timestamp, target_value, item_id
C.A text file with one value per line
D.A Parquet file partitioned by date
AnswerB

This is the required format for target time series in Forecast.

Why this answer

Amazon Forecast requires the target time series data to be in a CSV format with specific columns: timestamp, target_value, and item_id. This structured format allows the service to correctly identify the time series for each product and the target metric to forecast. The CSV format is the standard input for Forecast's built-in algorithms and ensures compatibility with the dataset import process.

Exam trap

The trap here is that candidates may assume Amazon Forecast supports flexible data formats like JSON or Parquet for all dataset types, but the target time series is strictly restricted to CSV to ensure consistent parsing and algorithm compatibility.

How to eliminate wrong answers

Option A is wrong because Amazon Forecast does not accept JSON files for target time series data; it requires CSV format for dataset import. Option C is wrong because a text file with one value per line lacks the necessary metadata (timestamp and item_id) to define multiple time series and their temporal alignment. Option D is wrong because, while Parquet is a supported format for related time series (RTS) or item metadata, the target time series dataset must be in CSV format as per Forecast's documentation.

309
Multi-Selecteasy

A data scientist is evaluating a binary classification model that predicts whether a customer will churn. The model achieves an AUC of 0.85 on the test set. Which TWO statements about AUC are correct? (Choose two.)

Select 2 answers
A.AUC represents the probability that a randomly chosen positive instance is ranked higher than a randomly chosen negative instance.
B.An AUC of 0.85 indicates the model is no better than random guessing.
C.AUC is the average precision across all thresholds.
D.AUC is equivalent to the accuracy of the model at the default threshold of 0.5.
E.AUC is threshold-independent, meaning it evaluates the model's ranking performance across all thresholds.
AnswersA, E

This is the statistical interpretation of AUC.

Why this answer

Options A and E are correct. AUC is the probability that the model ranks a randomly chosen positive instance above a randomly chosen negative one (A). AUC is threshold-independent (E).

Option B is false because an AUC of 0.85 is better than random (0.5). Option C is false because AUC is not average precision. Option D is false because AUC is not accuracy at any single threshold.
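The ranking interpretation of AUC can be checked directly by comparing every positive score against every negative score. The sketch below does exactly that in plain Python; the labels and scores are made up, and ties are counted as half a win, which is the usual convention.

```python
def auc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical churn scores: 3 churners, 4 non-churners.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]

auc = auc_by_ranking(labels, scores)

# Any order-preserving rescaling of the scores leaves AUC unchanged,
# which is what "threshold-independent" means in practice.
auc_rescaled = auc_by_ranking(labels, [s ** 2 for s in scores])
```

Here 11 of the 12 positive-negative pairs are ordered correctly, so the AUC is 11/12.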

310
MCQeasy

A data scientist is training a linear regression model using Amazon SageMaker's built-in Linear Learner algorithm. The dataset has 500 features and 1 million rows. After training, the model's training RMSE is 2.5 and validation RMSE is 2.6, which is acceptable. However, the scientist notices that many feature coefficients are very small but non-zero, and the model takes a long time to train. The scientist wants to reduce training time while maintaining similar accuracy. Which action should the scientist take?

A.Increase the mini-batch size
B.Increase the L1 regularization strength
C.Switch to a neural network model
D.Increase the L2 regularization strength
AnswerB

L1 induces sparsity, removing features and speeding training.

Why this answer

Option B (increase L1 regularization) will drive many coefficients to zero, reducing effective features and thus training time. Option D (increase L2) shrinks coefficients but doesn't zero them out, so it has less impact on speed. Option A (increase the mini-batch size) may speed training but could affect convergence.

Option C (switch to a neural network) is unnecessary.
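The difference between the two penalties shows up clearly in a simplified setting. The sketch below assumes each coefficient can be penalized independently (as with orthonormal features), where the L1 solution is soft-thresholding and the L2 solution is a uniform rescaling. It is not how Linear Learner is implemented; the function names and weights are illustrative.

```python
def soft_threshold(w, l1):
    """L1 solution per coefficient: shrink by l1, clamp small values to zero."""
    if w > l1:
        return w - l1
    if w < -l1:
        return w + l1
    return 0.0


def l2_shrink(w, l2):
    """L2 solution per coefficient: rescale toward zero, never exactly zero."""
    return w / (1.0 + l2)


# Hypothetical unpenalized coefficients, several of them tiny.
weights = [2.0, -0.8, 0.05, -0.02, 0.3]

l1_weights = [soft_threshold(w, 0.1) for w in weights]
l2_weights = [l2_shrink(w, 0.1) for w in weights]
```

With the L1 penalty the two tiny coefficients become exactly zero, so the corresponding features can be dropped; with L2 they stay small but non-zero.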

311
MCQhard

A data scientist is using Amazon SageMaker to train a deep learning model for image classification. The training job is using a single GPU instance and is taking too long. The scientist wants to reduce training time without sacrificing model accuracy. The dataset contains 100,000 images of size 256x256. Which change would most effectively reduce training time?

A.Reduce the batch size
B.Use a smaller image size (e.g., 128x128)
C.Increase the learning rate
D.Switch to a distributed training setup with multiple GPUs
AnswerB

Fewer pixels mean faster forward/backward passes, significantly reducing training time.

Why this answer

Reducing image resolution (e.g., to 128x128) significantly reduces the number of pixels and thus the computational cost per epoch, often with minimal impact on accuracy for many tasks. Using a smaller batch size increases the number of iterations but can actually slow down training. Distributed training with multiple GPUs would reduce time but the question asks for a change that does not sacrifice accuracy; distributed training can sometimes affect convergence but is generally safe.

However, reducing resolution is a direct and effective method.

312
MCQeasy

A data scientist is training a binary classification model on a highly imbalanced dataset where the positive class represents only 1% of the data. The model achieves 99% accuracy but only identifies 5% of the actual positives. Which metric should the data scientist use to evaluate model performance?

A.Mean squared error
B.Accuracy
C.Recall
D.Precision
AnswerC

Recall measures the proportion of actual positives correctly identified.

Why this answer

Recall (sensitivity) measures the proportion of actual positives correctly identified by the model. With only 5% of positives detected, recall is 0.05, which directly reveals the model's failure to capture the minority class despite high accuracy. In imbalanced datasets, accuracy is misleading because the model can achieve 99% accuracy by simply predicting the majority class (negative) for all instances.

Exam trap

The exam often tests the trap that high accuracy implies good performance on imbalanced datasets, leading candidates to choose accuracy without considering class distribution or the specific failure mode (low recall).

How to eliminate wrong answers

Option A is wrong because mean squared error (MSE) is a regression metric that measures average squared differences between predicted and actual values, not suitable for binary classification evaluation. Option B is wrong because accuracy is misleading in imbalanced datasets; a model predicting all negatives achieves 99% accuracy but fails to identify positives, as seen here. Option D is wrong because precision measures the proportion of positive predictions that are correct, which could be high if the model makes very few positive predictions, but it does not capture the low detection rate of actual positives (recall).
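The numbers in this scenario can be reproduced with a small confusion-matrix calculation. The counts below are hypothetical, chosen so that 1% of 10,000 samples are positive, accuracy is 99%, and only 5% of positives are found.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, and precision from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# 100 actual positives out of 10,000; the model finds only 5 of them.
metrics = classification_metrics(tp=5, fp=5, fn=95, tn=9895)
```

Accuracy comes out at 0.99 and precision at 0.50, while recall is 0.05, which is the only one of the three that exposes the missed positives.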

313
Multi-Selecthard

A company uses Amazon SageMaker to build a text classification model using a pre-trained BERT model. The dataset contains 10,000 labeled documents. The model is overfitting: training accuracy is 99%, validation accuracy is 85%. Which TWO of the following are most likely to help reduce overfitting? (Choose TWO.)

Select 2 answers
A.Add more transformer layers to the model
B.Increase the dropout rate during fine-tuning
C.Increase the batch size
D.Use a larger pre-trained BERT model
E.Decrease the learning rate
AnswersB, E

Dropout is a regularization technique that randomly drops units, reducing overfitting.

Why this answer

Increasing dropout during fine-tuning adds regularization. Decreasing the learning rate can help the model converge to a better solution and prevent overfitting to the training set. Increasing batch size can sometimes regularize but is not as effective as dropout.

Adding more layers increases model capacity and overfitting. Using a larger pre-trained model also increases capacity.

314
MCQeasy

A data scientist is training a binary classification model on imbalanced data (95% negative, 5% positive). Which metric is most appropriate for evaluating model performance?

A.R-squared
B.Mean Squared Error (MSE)
C.Area Under the ROC Curve (AUC-ROC)
D.Accuracy
AnswerC

AUC-ROC measures the model's ability to distinguish between classes regardless of threshold, suitable for imbalanced data.

Why this answer

AUC-ROC is the most appropriate metric for imbalanced binary classification because it evaluates the model's ability to distinguish between positive and negative classes across all classification thresholds, without being biased by the 95% negative majority. It measures the trade-off between true positive rate and false positive rate, making it robust to class imbalance.

Exam trap

The trap here is that candidates often default to accuracy as the primary metric, not realizing that with severe class imbalance, accuracy can be artificially high and completely mask poor performance on the minority class.

How to eliminate wrong answers

Option A is wrong because R-squared is a regression metric that measures the proportion of variance explained by the model, and it is not applicable to binary classification problems. Option B is wrong because Mean Squared Error (MSE) is a regression loss function that penalizes large errors quadratically and does not provide meaningful evaluation for classification tasks, especially with imbalanced data. Option D is wrong because accuracy would be misleadingly high (95%) by simply predicting the majority class for all instances, failing to capture the model's performance on the rare positive class.

315
Multi-Selecthard

Which THREE of the following are valid approaches for deploying a machine learning model to an Amazon SageMaker endpoint for real-time inference?

Select 3 answers
A.Use a SageMaker Inference Pipeline with multiple containers
B.Use a pre-built SageMaker container with built-in algorithms
C.Use Amazon EMR to host the model
D.Deploy the model as an AWS Lambda function
E.Bring your own Docker container
AnswersA, B, E

Inference pipelines allow chaining of preprocessing and prediction containers.

Why this answer

Option A is correct because SageMaker Inference Pipelines allow you to chain multiple containers (e.g., preprocessing, prediction, postprocessing) into a single endpoint, enabling complex workflows for real-time inference. This is achieved by defining a sequence of Docker containers in the model definition, where each container's output is passed as input to the next, all within the same SageMaker endpoint.

Exam trap

The trap here is that candidates might confuse Amazon EMR's model serving capabilities (e.g., using Spark MLlib) with SageMaker's managed inference, or assume Lambda can handle large model artifacts despite its payload and timeout constraints.

316
MCQhard

A data scientist is training a binary classifier to detect network intrusions. The dataset has 1,000 features and 10 million samples, but only 0.1% are positive (intrusions). The scientist uses XGBoost with scale_pos_weight set to 100. The model achieves a recall of 0.90 and precision of 0.05 on the test set. The business requires precision of at least 0.50 while maintaining recall above 0.80. Which technique should the scientist apply?

A.Switch to a random forest classifier with class weights
B.Randomly undersample the majority class to achieve 1:1 ratio
C.Tune the decision threshold on validation data to maximize F1 score
D.Increase scale_pos_weight to 500
AnswerC

Threshold tuning directly controls precision-recall trade-off.

Why this answer

Option C (post-training threshold tuning) adjusts the decision threshold to trade off precision and recall. Option D (increase scale_pos_weight) will further increase recall but decrease precision. Option B (undersample the majority class) can help but may reduce recall.

Option A (switch to a random forest) may not achieve the required precision.
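Threshold tuning can be sketched as a scan over candidate thresholds on validation data. The version below is one possible reading of the approach: it keeps only thresholds that satisfy the business constraints and picks the one with the best F1 among them. The function names and the tiny validation set are assumptions for illustration.

```python
def precision_recall_at(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def pick_threshold(labels, scores, min_precision, min_recall):
    """Best-F1 threshold among those meeting both constraints, or None."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(labels, scores, t)
        if p >= min_precision and r >= min_recall:
            f1 = 2 * p * r / (p + r)
            if best is None or f1 > best[1]:
                best = (t, f1, p, r)
    return best


# Hypothetical validation set: 5 intrusions, 10 normal samples.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.40,
          0.88, 0.60, 0.55, 0.50, 0.45, 0.30, 0.20, 0.15, 0.10, 0.05]

best = pick_threshold(labels, scores, min_precision=0.50, min_recall=0.80)
```

No retraining is involved: the model's scores stay fixed and only the cut-off changes, which is why this is the cheapest lever for trading recall for precision.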

317
MCQhard

A company is deploying a real-time inference endpoint using SageMaker. The model is a large deep learning model (5 GB) with strict latency requirements (< 100 ms per request). The team expects bursty traffic with up to 1000 requests per second. Which configuration best meets the latency and throughput requirements?

A.Deploy an ml.p3.2xlarge instance with automatic scaling based on a custom metric like 'InvocationsPerInstance'
B.Use a multi-model endpoint with ml.c5.4xlarge instances
C.Use SageMaker Serverless Inference with a memory size of 6 GB
D.Deploy a single ml.p3.16xlarge instance with a production variant
AnswerA

GPU instances handle large models; automatic scaling with custom metrics provides elasticity.

Why this answer

Option A is correct because deploying on an ml.p3.2xlarge instance with automatic scaling based on 'InvocationsPerInstance' allows the endpoint to handle bursty traffic up to 1000 requests per second while maintaining sub-100 ms latency. The GPU-accelerated p3 instance provides the necessary compute for a 5 GB deep learning model, and custom scaling on invocations per instance ensures that additional instances are provisioned quickly during traffic spikes without over-provisioning.

Exam trap

The trap here is that candidates often assume a single large instance (like ml.p3.16xlarge) can handle high throughput, but they overlook the need for horizontal scaling to manage bursty traffic without latency degradation.

How to eliminate wrong answers

Option B is wrong because multi-model endpoints share a single instance across multiple models, which can lead to contention and increased latency for a large 5 GB model under bursty traffic, and the ml.c5.4xlarge instances lack GPU acceleration, making them unsuitable for deep learning inference at high throughput. Option C is wrong because SageMaker Serverless Inference has a cold start latency that can exceed 100 ms, especially for a 5 GB model, and its maximum concurrency is limited, making it unable to handle 1000 requests per second with strict latency requirements. Option D is wrong because a single ml.p3.16xlarge instance, while powerful, cannot handle bursty traffic of 1000 requests per second without scaling; a single instance will be overwhelmed, causing latency to spike above 100 ms, and it lacks the elasticity needed for bursty workloads.

318
MCQhard

A data scientist is training a model using SageMaker's built-in XGBoost algorithm. The dataset has 500 features and 1 million rows. The training job is taking too long. The scientist wants to reduce training time without sacrificing accuracy. Which action is LIKELY to be most effective?

A.Use a smaller instance type to reduce time
B.Reduce the number of trees in XGBoost
C.Use a larger instance type with more vCPUs
D.Apply Principal Component Analysis (PCA) to reduce the number of features
AnswerD

PCA reduces dimensionality, speeding up training while retaining most information.

Why this answer

Option D (apply PCA to reduce the number of features) is correct because it reduces dimensionality, speeding up training. Option C (use a larger instance type) may not be cost-effective. Option A (use a smaller instance type) provides less compute, so it is unlikely to reduce training time.

Option B (reduce the number of trees) may reduce accuracy.

319
MCQeasy

A data scientist is training a decision tree classifier and notices that the model performs well on training data but poorly on test data. Which technique should the data scientist use to address this issue?

A.Use a different split criterion
B.Prune the tree
C.Apply L1 regularization
D.Increase tree depth
AnswerB

Pruning reduces overfitting.

Why this answer

Pruning the tree reduces overfitting by removing branches that have little statistical significance or that capture noise in the training data. This technique improves generalization to unseen test data, which directly addresses the symptom of high training accuracy and low test accuracy.

Exam trap

The exam often tests the misconception that regularization techniques like L1/L2 apply universally, when in fact they are specific to models with learnable weights (e.g., linear regression, neural networks) and not to tree-based models.

How to eliminate wrong answers

Option A is wrong because using a different split criterion (e.g., Gini impurity vs. entropy) changes how splits are selected but does not inherently reduce model complexity or overfitting; it may still produce a deep, overfitted tree. Option C is wrong because L1 regularization is a technique for linear models (e.g., Lasso regression) and is not directly applicable to decision trees, which do not have coefficients to penalize. Option D is wrong because increasing tree depth makes the model more complex, exacerbating overfitting rather than fixing it.

320
MCQhard

A data scientist is training a deep learning model for image segmentation using a U-Net architecture. The model overfits severely. The scientist tries L2 regularization, dropout, and data augmentation, but validation loss remains high while training loss approaches zero. Which additional strategy is most likely to reduce overfitting?

A.Implement early stopping based on validation loss
B.Increase the batch size
C.Use a larger learning rate
D.Add more convolutional layers to increase model capacity
AnswerA

Early stopping prevents overfitting by stopping training before the model starts to memorize the training data.

Why this answer

Early stopping monitors validation loss and halts training when it stops improving, directly addressing overfitting by preventing the model from memorizing noise after it has learned generalizable features. Since the training loss is near zero but validation loss remains high, the model has already started overfitting, and early stopping can cut training at the point just before overfitting worsens.

Exam trap

AWS often tests the misconception that increasing regularization (L2, dropout, augmentation) is always sufficient, but the trap here is that when those techniques fail, early stopping is the next logical step because it directly stops the overfitting process at the optimal point, whereas the other options either increase capacity or destabilize training.

How to eliminate wrong answers

Option B is wrong because increasing batch size typically reduces gradient noise and can lead to sharper minima, which often worsens generalization and overfitting, not reduces it. Option C is wrong because using a larger learning rate can cause the optimizer to overshoot minima, leading to unstable training and potentially higher validation loss, but it does not specifically target the overfitting problem when training loss is already near zero. Option D is wrong because adding more convolutional layers increases model capacity, which exacerbates overfitting when the model already has enough capacity to memorize the training data.

321
MCQmedium

Refer to the exhibit. A data scientist is trying to create a SageMaker training job but receives an access denied error. The IAM policy shown is attached to the user. What is the likely issue?

A.The policy does not allow sagemaker:CreateTrainingJob
B.The policy is missing s3:ListBucket permission
C.The policy does not allow sagemaker:CreateModel
D.The policy does not allow s3:PutObject
AnswerB

SageMaker needs to list objects in the bucket.

Why this answer

Option B is correct because the policy allows s3:GetObject and s3:PutObject on the bucket, but the training job also needs s3:ListBucket to list the objects it reads. Option A is wrong because the policy grants sagemaker:CreateTrainingJob on all resources. Option C is wrong because the policy allows the SageMaker actions, and sagemaker:CreateModel is not needed to create a training job.

Option D is wrong because the policy already allows s3:PutObject.
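Since the exhibit is not reproduced here, the following is only a guess at what a corrected policy might look like, built as a Python dictionary and serialized to JSON. The bucket name is hypothetical. The detail worth noting is that `s3:ListBucket` applies to the bucket ARN itself, while `s3:GetObject` and `s3:PutObject` apply to object ARNs under it.

```python
import json

BUCKET = "example-training-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:CreateTrainingJob"],
            "Resource": "*",
        },
        {
            # Object-level actions use the "bucket/*" resource form.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            # Bucket-level action: the resource is the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
```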

322
MCQhard

A data scientist is training a binary classification model using SageMaker XGBoost and notices that training loss decreases but validation loss increases after a few epochs. Which action should the data scientist take to address this issue?

A.Increase the number of rounds
B.Set early stopping based on validation loss
C.Increase the learning rate
D.Increase the maximum tree depth
AnswerB

Stops training when validation loss stops improving.

Why this answer

The increasing validation loss while training loss decreases is a classic sign of overfitting. Setting early stopping based on validation loss halts training when the validation loss stops improving, preventing the model from memorizing noise in the training data. SageMaker XGBoost's `early_stopping_rounds` parameter monitors the evaluation metric on the validation set and stops training if no improvement is seen for a specified number of rounds.

Exam trap

AWS often tests the misconception that increasing model complexity (more rounds, deeper trees, higher learning rate) always improves performance, when in fact these actions worsen overfitting when validation loss diverges from training loss.

How to eliminate wrong answers

Option A is wrong because increasing the number of rounds would continue training further, exacerbating overfitting and making validation loss worse. Option C is wrong because increasing the learning rate makes the model converge faster but does not address overfitting; it can actually cause the model to overshoot optimal minima and worsen validation loss. Option D is wrong because increasing the maximum tree depth allows trees to grow deeper, capturing more complex patterns and increasing the risk of overfitting, which is the opposite of what is needed.
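The patience-based logic behind early stopping can be sketched independently of any library. The function below is a simplified stand-in, not XGBoost's implementation: it walks a list of per-round validation losses and stops once `patience` rounds pass without improvement. The loss values are invented.

```python
def early_stopping_round(validation_losses, patience):
    """Return (best_round, best_loss) using patience-based early stopping."""
    best_loss = float("inf")
    best_round = -1
    for round_idx, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_round, best_loss


# Hypothetical curve: validation loss bottoms out at round 3, then rises.
val_losses = [0.70, 0.55, 0.48, 0.45, 0.46, 0.47, 0.49, 0.52, 0.56]

best_round, best_loss = early_stopping_round(val_losses, patience=3)
```

Training halts at round 6 in this example, and the model from round 3 is the one to keep.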

323
MCQeasy

A data scientist is building a regression model to predict house prices. The dataset contains many features, some of which are highly correlated. The model is overfitting. Which regularization technique should the scientist use to penalize large coefficients and perform feature selection?

A.L2 regularization (Ridge)
B.L1 regularization (Lasso)
C.Elastic Net regularization
D.Dropout
AnswerB

L1 regularization can zero out coefficients, performing feature selection.

Why this answer

L1 regularization (Lasso) adds a penalty proportional to the sum of the absolute values of the coefficients, which can shrink some coefficients to exactly zero, performing feature selection. L2 regularization (Ridge) penalizes squared coefficients but does not zero them out.

Option B: L1 regularization is correct. Option A: L2 regularization does not perform feature selection.

Option C: Elastic Net combines both penalties, but L1 alone is simpler for feature selection. Option D: Dropout is a neural-network technique and is not applicable to linear regression.

324
MCQhard

A data scientist is training a neural network on Amazon SageMaker. The network has many layers and the training is very slow. The scientist suspects that the gradients are vanishing. Which technique is most specifically designed to mitigate the vanishing gradient problem?

A.Use gradient clipping.
B.Use batch normalization.
C.Use data augmentation.
D.Use dropout layers.
AnswerB

Batch normalization reduces internal covariate shift and helps mitigate vanishing gradients.

Why this answer

Batch normalization helps by normalizing the activations, which reduces the problem of vanishing/exploding gradients. Dropout is for regularization. Data augmentation increases data.

Gradient clipping deals with exploding gradients, not vanishing.
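To see why gradients vanish in deep networks, consider a deliberately simplified chain of sigmoid layers with all weights equal to 1, so the backpropagated gradient is just a product of sigmoid derivatives. This is an illustration of the problem, not an implementation of batch normalization; the function names are made up.

```python
import math


def sigmoid_derivative(z):
    """Derivative of the logistic sigmoid at pre-activation z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def gradient_scale(num_layers, pre_activation=0.0):
    """Product of per-layer sigmoid derivatives (weights assumed to be 1)."""
    return sigmoid_derivative(pre_activation) ** num_layers


# Best case: pre-activations centered at 0, derivative is 0.25 per layer.
centered = gradient_scale(10, pre_activation=0.0)

# Saturated case: pre-activations drifted to 4.0, derivative is far smaller.
saturated = gradient_scale(10, pre_activation=4.0)
```

Even in the centered case the gradient shrinks by a factor of 0.25 per layer, and it collapses much faster once activations saturate. Keeping pre-activations near zero, which normalization layers tend to do, avoids the saturated regime.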

325
MCQhard

Refer to the exhibit. A data scientist wants to use SageMaker to train a model using data stored in 'my-bucket'. The training job fails with an access denied error. What is the MOST likely cause?

A.The bucket uses AWS KMS key encryption instead of AES256
B.The bucket name in the policy does not match the actual bucket name
C.The training job is not requesting server-side encryption with AES256
D.The bucket is publicly accessible but the IAM role lacks permissions
AnswerC

The policy denies PutObject if encryption is not AES256, so the job must include the encryption header.

Why this answer

Option C is correct because the bucket policy denies PutObject requests that do not specify server-side encryption with AES256. If the training job does not explicitly request SSE-S3 (AES256) when it writes to the bucket, S3 denies the request, resulting in an 'access denied' error even if the IAM role has full S3 permissions.

Exam trap

The trap here is that candidates often assume 'access denied' always means an IAM permissions issue, but Amazon S3 also returns 'Access Denied' when a bucket policy denies requests that do not carry the required encryption header.

How to eliminate wrong answers

Option A is wrong because AWS KMS key encryption would cause a different error (e.g., KMS access denied) rather than a generic 'access denied' error, and the question does not mention KMS key permissions. Option B is wrong because a bucket name mismatch would produce a 'NoSuchBucket' error, not an 'access denied' error. Option D is wrong because public accessibility would loosen access rather than restrict it, and nothing in the scenario indicates that the IAM role lacks S3 permissions; the scenario points to an encryption mismatch.

326
MCQmedium

A data scientist is using Amazon SageMaker to train a neural network. The training job fails with the error 'ResourceLimitExceeded: The account-level service limit for ml.p3.8xlarge for training job usage is 0.' What is the most likely cause and solution?

A.The training job is using spot instances; switch to on-demand instances.
B.The instance type is not available in the current region; switch to a different region.
C.The account has not requested a limit increase for ml.p3.8xlarge; submit a limit increase request via AWS Support.
D.The instance type is too large; use a smaller instance type like ml.m5.large.
AnswerC

ResourceLimitExceeded indicates the current limit is zero; a limit increase is needed.

Why this answer

The error message explicitly states that the account-level service limit for ml.p3.8xlarge for training job usage is 0, which means the account has not been granted any capacity for that instance type. AWS enforces service quotas (limits) per account per region, and for GPU-intensive instances like ml.p3.8xlarge, the default limit is often 0 unless a limit increase request has been submitted and approved. Therefore, the correct solution is to request a limit increase via AWS Support.

Exam trap

The trap here is that candidates may confuse a service limit error with instance availability or spot instance issues, but the specific phrase 'limit is 0' directly points to an unrequested quota increase, not a regional or pricing model problem.

How to eliminate wrong answers

Option A is wrong because the error is about a service limit of 0, not about spot instance availability; switching to on-demand instances would still fail because the limit applies to both spot and on-demand training jobs. Option B is wrong because the error message does not indicate regional unavailability; it specifically cites a limit of 0, meaning the instance type exists in the region but the account has no quota. Option D is wrong because the error is not about instance size or resource exhaustion; using a smaller instance type like ml.m5.large would avoid the GPU limit but does not address the root cause of the ml.p3.8xlarge limit being 0.

327
MCQmedium

A data scientist is tuning a neural network on a small dataset and observes that the training loss decreases but validation loss increases after a few epochs. Which technique should be applied to mitigate this issue?

A.Add dropout layers to the model.
B.Increase the learning rate.
C.Remove regularization terms from the loss function.
D.Increase the number of epochs.
AnswerA

Dropout randomly drops neurons, reducing overfitting.

Why this answer

Training loss that keeps decreasing while validation loss increases indicates overfitting. Dropout is a regularization technique that reduces overfitting. Increasing the learning rate affects convergence and does not address overfitting.

Removing regularization terms would make overfitting worse, and increasing the number of epochs would let the model fit the training set even more closely.

328
MCQmedium

A data scientist is training a binary classification model on imbalanced data (95% negative, 5% positive). The model achieves 95% accuracy on the test set but fails to detect any positive cases. Which metric should the scientist focus on to evaluate model performance?

A.Accuracy
B.Recall
C.RMSE
D.Precision
AnswerB

Recall measures the proportion of actual positives correctly identified.

Why this answer

Option B is correct because recall (true positive rate) measures the ability to find all positive samples, which is critical for imbalanced datasets where accuracy can be misleading. Option A is wrong because accuracy is high but misleading. Option D is wrong because precision alone doesn't capture the missed positives.

Option C is wrong because RMSE is for regression.

329
MCQeasy

A data scientist is training a deep learning model for image classification using Amazon SageMaker. The training job is taking too long. The data scientist wants to use distributed training across multiple GPUs to speed up the process. Which SageMaker feature should the data scientist use?

A.SageMaker Distributed Training Libraries
B.SageMaker Managed Spot Training
C.SageMaker Hyperparameter Tuning
D.SageMaker Automatic Model Tuning
AnswerA

Distributed training libraries enable training across multiple GPUs, reducing wall-clock time.

Why this answer

SageMaker Distributed Training Libraries provide optimized implementations of data parallelism and model parallelism that automatically partition the model and data across multiple GPUs, reducing training time for deep learning models. This is the correct choice because the question specifically asks for a feature to enable distributed training across multiple GPUs, which is exactly what these libraries are designed for.

Exam trap

The trap here is that candidates often confuse cost-saving features (like Spot Training) with performance-optimization features (like distributed training), or they mistakenly think hyperparameter tuning can parallelize a single training job across GPUs.

How to eliminate wrong answers

Option B is wrong because SageMaker Managed Spot Training reduces cost by using spare EC2 capacity, not by distributing training across multiple GPUs; it does not inherently speed up training. Option C is wrong because SageMaker Hyperparameter Tuning automates the search for optimal hyperparameters, but it does not distribute a single training job across multiple GPUs. Option D is wrong because SageMaker Automatic Model Tuning is another name for hyperparameter tuning (same as option C) and does not provide distributed training capabilities.

330
MCQeasy

A data scientist is training a binary classification model for fraud detection. The dataset is highly imbalanced with only 1% fraudulent transactions. The model currently achieves 99% accuracy but only catches 5% of actual fraud cases. Which metric should the data scientist focus on to better evaluate model performance?

A.Precision
B.Accuracy
C.Root Mean Squared Error (RMSE)
D.Recall
AnswerD

Recall measures the ability to find all positive samples, which is crucial for fraud detection.

Why this answer

In fraud detection with highly imbalanced data (1% fraud), accuracy is misleading because a model can achieve 99% accuracy by simply predicting 'not fraud' for all transactions. Recall (true positive rate) measures the proportion of actual fraud cases correctly identified, which is critical when the cost of missing fraud is high. The model currently catches only 5% of fraud, so improving recall is the primary goal to reduce false negatives.

Exam trap

AWS often tests the misconception that accuracy is always the best metric, but in imbalanced classification, recall or precision-recall curves are more informative, and candidates must recognize that high accuracy can mask poor minority class performance.

How to eliminate wrong answers

Option A is wrong because precision measures the proportion of predicted fraud cases that are actually fraud, which is not the primary concern when the model misses 95% of actual fraud; precision focuses on false positives, not false negatives. Option B is wrong because accuracy is dominated by the majority class (99% non-fraud) and does not reflect the model's poor performance on the minority fraud class; a model can have high accuracy while failing to detect fraud. Option C is wrong because RMSE is a regression metric that measures the average magnitude of errors in continuous predictions, not suitable for evaluating binary classification performance, especially with imbalanced classes.
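The gap between accuracy and recall in this scenario can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a hypothetical test set of 10,000 transactions and no false positives; only the 1% fraud rate and the 5% catch rate come from the question.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical test set: 10,000 transactions, 1% fraudulent.
total = 10_000
fraud = total // 100          # 100 fraud cases
tp = 5                        # the model catches 5% of fraud
fn = fraud - tp               # 95 fraud cases are missed
fp = 0                        # assumption: no false alarms
tn = total - fraud - fp       # 9,900 legitimate transactions

model_accuracy = accuracy(tp, fp, tn, fn)
model_recall = recall(tp, fn)

print(f"accuracy: {model_accuracy:.4f}")
print(f"recall:   {model_recall:.4f}")
```

Accuracy stays near 99% because the 9,900 legitimate transactions dominate the count, while recall exposes the 95 missed fraud cases.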

331
MCQmedium

A company is deploying a real-time fraud detection model using Amazon SageMaker. The model must make predictions in under 100 milliseconds. The data scientist uses a pre-trained XGBoost model and deploys it to a SageMaker endpoint with an ml.c5.xlarge instance. After load testing, the average latency is 150 ms. Which action should the data scientist take to reduce latency?

A.Reduce the number of trees in the XGBoost model
B.Deploy multiple instances behind a load balancer
C.Enable SageMaker Neo to compile the model for the target instance
D.Use a larger instance type to increase compute capacity
AnswerC

Neo optimization can reduce inference latency by optimizing the model for the hardware.

Why this answer

Option C is correct because SageMaker Neo optimizes trained models for the target hardware platform by compiling them into an efficient runtime. This reduces inference latency without changing the model architecture, making it ideal for meeting the sub-100ms requirement when the current latency is 150ms on an ml.c5.xlarge instance.

Exam trap

The trap here is that candidates often confuse scaling out (Option B) or scaling up (Option D) with latency reduction, but these primarily address throughput or resource contention, not the per-request inference time on a single instance.

How to eliminate wrong answers

Option A is wrong because reducing the number of trees in the XGBoost model would degrade model accuracy and is not a targeted latency optimization technique; it may also not achieve the required latency reduction without significant accuracy loss. Option B is wrong because deploying multiple instances behind a load balancer improves throughput and availability but does not reduce per-request latency; it may even add network overhead. Option D is wrong because using a larger instance type increases compute capacity but does not guarantee lower latency for a single inference request; it may also increase cost without addressing the root cause of model execution inefficiency.

332
Multi-Selecteasy

Which TWO of the following are valid methods for handling missing values in a dataset before training a machine learning model?

Select 2 answers
A.Remove rows that contain missing values
B.Use a decision tree algorithm that handles missing values internally
C.Increase the number of trees in a random forest
D.Replace missing values with zero
E.Impute missing values with the mean of the column
AnswersA, E

If the proportion of missing data is small, dropping rows is a valid option.

Why this answer

Option A is correct because removing rows with missing values (listwise deletion) is a straightforward and valid method when the missing data is random and the dataset is large enough that the loss of rows does not significantly reduce statistical power or introduce bias. This approach ensures that only complete cases are used for training, avoiding the need to estimate missing values.

Exam trap

The exam often tests the misconception that decision tree algorithms inherently handle missing values without any preprocessing, but in practice, they require explicit handling (e.g., surrogate splits) and do not automatically resolve missing data for all model training scenarios.

333
Multi-Selecthard

A company is training a deep learning model for object detection using Amazon SageMaker. The training job is taking too long. Which THREE actions can reduce training time?

Select 3 answers
A.Use distributed training with multiple GPUs
B.Use a larger instance type with more vCPUs
C.Use SageMaker managed spot training
D.Use a smaller batch size initially and increase gradually (warm-up)
E.Increase the number of epochs
AnswersA, C, D

Distributed training parallelizes the workload, reducing training time.

Why this answer

Option A is correct because distributed training with multiple GPUs (e.g., using SageMaker's distributed data parallelism or model parallelism) splits the workload across multiple devices, reducing wall-clock time per epoch. This leverages Horovod or SageMaker's own distributed training libraries to synchronize gradients efficiently, directly addressing the long training time.

Exam trap

AWS often tests the misconception that increasing CPU cores (Option B) or epochs (Option E) will speed up training, when in reality deep learning is GPU-bound and more epochs increase time.

334
Multi-Selecthard

Which TWO techniques are used to handle missing values in a dataset before training? (Choose 2.)

Select 2 answers
A.Mean or median imputation.
B.Min-max scaling.
C.Removing rows or columns with missing values.
D.One-hot encoding.
E.Principal component analysis (PCA).
AnswersA, C

Imputation replaces missing values with central tendency.

Why this answer

Option A is correct because imputation fills missing values. Option C is correct because removing rows/columns with missing data is a valid approach. Option B is wrong because scaling is for numerical features, not missing values.

Option D is wrong because one-hot encoding is for categorical variables. Option E is wrong because PCA is for dimensionality reduction.
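Both valid techniques can be sketched in plain Python. This is a minimal illustration on a hypothetical four-row dataset where missing values are represented as None; the column names and helper functions are made up for the example.

```python
from statistics import mean

# Hypothetical dataset; None marks a missing value.
rows = [
    {"sqft": 1200.0, "age": 10.0},
    {"sqft": None, "age": 5.0},
    {"sqft": 1800.0, "age": None},
    {"sqft": 1500.0, "age": 20.0},
]


def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows):
    """Replace each missing value with the mean of its column."""
    columns = list(rows[0].keys())
    means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]


complete_rows = drop_incomplete(rows)
imputed_rows = impute_mean(rows)

print(len(complete_rows), "complete rows kept")
print(imputed_rows[1])
```

Deletion here discards half the rows, which is why it is usually reserved for cases where few rows are affected; imputation keeps every row at the cost of estimating the gaps.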

335
MCQeasy

A company uses Amazon SageMaker to train a linear regression model on a dataset with 10 million rows and 50 features. The training job takes 8 hours to complete. A data scientist wants to reduce the training time to under 2 hours without changing the dataset size or the model algorithm. The SageMaker instance type currently used is ml.m5.2xlarge. Which action should the data scientist take to achieve the desired training time?

A.Change the instance type to ml.p3.2xlarge (GPU instance).
B.Change the instance type to ml.m5.4xlarge (double the vCPUs and memory).
C.Reduce the number of features from 50 to 25.
D.Use SageMaker's distributed training with 4 ml.m5.2xlarge instances.
AnswerD

Distributed training parallelizes computation across instances, significantly reducing training time.

Why this answer

Option D is correct because using a distributed training approach with multiple ml.m5.2xlarge instances will parallelize the computation, reducing wall-clock time. Option B (increasing to ml.m5.4xlarge) provides more compute but not enough to reduce time from 8 to 2 hours (only 2x improvement). Option A (changing to ml.p3.2xlarge with GPU) is not optimal for linear regression, which is CPU-bound.

Option C (reducing features) changes the dataset and is not allowed.

336
MCQmedium

A data scientist is training a neural network on a dataset with 1 million images. The training loss decreases steadily but the validation loss starts to increase after 10 epochs. Which action should the scientist take to improve generalization?

A.Implement early stopping
B.Add more layers to the network
C.Reduce the learning rate
D.Increase the number of epochs
AnswerA

Early stopping prevents overfitting by halting training when validation loss increases.

Why this answer

Increasing validation loss indicates overfitting. Early stopping halts training when validation loss stops improving, preventing overfitting.

Option A: Early stopping is correct. Option B: Adding more layers could increase overfitting.

Option C: Reducing learning rate might help but not as directly. Option D: Increasing epochs would worsen overfitting.
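The early-stopping rule itself is simple enough to sketch. The function below is a minimal illustration, assuming a hypothetical `patience` parameter (the number of consecutive non-improving epochs tolerated) and made-up validation losses; deep learning frameworks ship their own callbacks for this.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improving, then rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)

print("stop at epoch", stop_epoch, "and restore weights from epoch", best_epoch)
```

In practice the weights from the best epoch are restored, so the model that is kept is the one from before validation loss began to rise.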

337
MCQeasy

A data scientist wants to use a linear regression model to predict house prices. After training, the model shows high bias and low variance. Which action would most likely improve the model's performance?

A.Add polynomial features to capture non-linear relationships.
B.Increase L2 regularization strength.
C.Use a simpler model, such as linear regression without interaction terms.
D.Reduce the amount of training data.
AnswerA

Increasing model complexity reduces bias by better fitting the data.

Why this answer

Option A is correct because high bias indicates underfitting, and increasing model complexity (e.g., adding polynomial features or using a more complex algorithm) can reduce bias. Option B is wrong because increasing L2 regularization increases bias. Option D is wrong because reducing training data can increase variance but does not reduce bias.

Option C is wrong because using a simpler model would increase bias.
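How a polynomial feature removes bias can be shown on a toy problem. This is a minimal sketch, assuming hypothetical data generated as y = x² and a one-variable least-squares fit; to keep the code short, the second fit uses the squared feature alone rather than x and x² together.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical non-linear data: y = x^2.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

# Plain linear model on x: underfits (high bias).
linear_mse = mse(xs, ys, *fit_line(xs, ys))

# Same algorithm on the engineered feature x^2.
squared = [x ** 2 for x in xs]
poly_mse = mse(squared, ys, *fit_line(squared, ys))

print(f"linear feature MSE:  {linear_mse:.2f}")
print(f"squared feature MSE: {poly_mse:.2f}")
```

The learning algorithm is unchanged; only the feature set grows, which is what lets the model represent the curvature it previously could not.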

338
MCQmedium

A data scientist is training a binary classification model on an imbalanced dataset where the positive class represents only 5% of the data. The model currently achieves 95% accuracy but only 10% recall on the positive class. Which metric should the scientist focus on to improve the model's ability to detect the positive class?

A.Recall
B.Accuracy
C.Precision
D.AUC-ROC
AnswerA

Recall measures the proportion of actual positives correctly identified.

Why this answer

Option A is correct because recall measures the ability to find all positive samples, which is the key issue in this imbalanced dataset. Option B is wrong because accuracy is misleading when classes are imbalanced. Option C is wrong because precision does not directly address the low recall.

Option D is wrong because AUC-ROC is a global metric that may not reflect the improvement in recall.

339
Multi-Selecthard

A machine learning team is using Amazon SageMaker to train a model with a large dataset stored in S3. The training job is taking too long. Which THREE of the following actions can reduce training time? (Choose three.)

Select 3 answers
A.Decrease the batch size.
B.Use a GPU instance with more powerful GPUs.
C.Use distributed training with multiple instances.
D.Use Pipe input mode instead of File mode for the training data.
E.Increase the batch size.
AnswersB, C, D

Faster GPUs reduce computation time.

Why this answer

Using a GPU instance with more powerful GPUs (Option B) reduces training time because it increases the parallel compute capacity for matrix operations, which are the core of deep learning. Amazon SageMaker allows you to select instances like ml.p3.16xlarge with NVIDIA V100 GPUs, which offer significantly higher FLOPS compared to smaller GPU instances, directly accelerating model training.

Exam trap

The trap here is that candidates often confuse batch size adjustments as a primary performance lever, but the exam tests understanding that hardware upgrades (GPU power), parallelism (distributed training), and data streaming (Pipe mode) are the most direct and reliable methods to reduce training time in SageMaker.

340
MCQhard

A machine learning engineer is deploying a model to an Amazon SageMaker endpoint for real-time inference. The model is a large ensemble that requires 4 GB of memory. The engineer wants to minimize cost while ensuring the endpoint can handle up to 100 concurrent requests with a latency under 200 ms. Which instance configuration is most appropriate?

A.Two ml.t3.medium instances behind a load balancer.
B.One ml.c5.xlarge instance with auto-scaling up to 2 instances.
C.One ml.m5.2xlarge instance.
D.One ml.p3.2xlarge instance.
AnswerB

ml.c5.xlarge has 8 GB of memory, enough for the 4 GB model; it is cost-effective, and auto-scaling handles load.

Why this answer

Option B is correct because the ml.c5.xlarge instance provides sufficient compute (4 vCPUs, 8 GB memory) for the 4 GB model, and auto-scaling up to 2 instances allows handling 100 concurrent requests with low latency while minimizing cost during low traffic. The ml.c5 family is optimized for compute-intensive inference, and auto-scaling ensures the endpoint scales out only when needed, avoiding over-provisioning.

Exam trap

The trap here is that candidates often choose a single large instance (like ml.m5.2xlarge) thinking it simplifies management, but auto-scaling with a smaller instance type is more cost-effective and still meets latency requirements under variable load.

How to eliminate wrong answers

Option A is wrong because two ml.t3.medium instances (2 vCPUs, 4 GB memory each) are burstable and may not sustain the required 200 ms latency under load, as t3 instances use CPU credits and can throttle under sustained high concurrency. Option C is wrong because one ml.m5.2xlarge instance (8 vCPUs, 32 GB memory) is over-provisioned for the 4 GB model and 100 concurrent requests, leading to higher cost without benefit. Option D is wrong because one ml.p3.2xlarge instance (8 vCPUs, 61 GB memory, GPU) is designed for GPU-accelerated workloads like deep learning, not for a large ensemble model that only needs 4 GB memory, making it unnecessarily expensive.

341
MCQhard

A data scientist is training a recurrent neural network (RNN) for time series forecasting. The model's training loss is not decreasing, and the gradients are vanishing. Which technique should the scientist apply to address vanishing gradients?

A.Apply gradient clipping.
B.Replace the RNN cells with LSTM or GRU units.
C.Add batch normalization layers.
D.Increase the learning rate.
AnswerB

LSTM/GRU have gating mechanisms that help preserve gradients over long sequences.

Why this answer

Option B is correct because LSTM and GRU are designed to mitigate vanishing gradients via gating mechanisms. Option A is wrong because gradient clipping addresses exploding gradients, not vanishing. Option D is wrong because increasing learning rate may cause instability.

Option C is wrong because batch normalization helps with internal covariate shift but not specifically vanishing gradients.

342
Multi-Selectmedium

Which THREE techniques can help reduce overfitting in a neural network? (Choose 3)

Select 3 answers
A.Dropout
B.Increasing the number of layers
C.Using a larger learning rate
D.Early stopping
E.L2 regularization
AnswersA, D, E

Dropout randomly drops neurons, reducing overfitting.

Why this answer

Dropout is a regularization technique that randomly drops a fraction of neurons during training, which prevents the network from relying too heavily on any single neuron and forces it to learn more robust features. This reduces overfitting by introducing noise that improves generalization.

Exam trap

The exam often tests the misconception that increasing model capacity (e.g., more layers) or adjusting the learning rate can reduce overfitting, when in fact these techniques either exacerbate overfitting or address convergence issues rather than regularization.

343
MCQmedium

A team has trained a deep learning model on Amazon SageMaker using a custom Docker container. They want to deploy the model to a SageMaker endpoint for real-time inference. Which format should the model artifacts be in?

A.A single .tar.gz file containing the model files.
B.A folder on S3 with the model files.
C.No format requirement; any file works.
D.A .zip file containing the model files.
AnswerA

SageMaker requires model artifacts as a tarball.

Why this answer

Amazon SageMaker requires model artifacts to be packaged as a single .tar.gz file when using a custom Docker container for real-time inference. This compressed archive must contain the model files (e.g., model.pth, model.h5) and any necessary inference code, as SageMaker extracts the archive to the /opt/ml/model directory during deployment. The .tar.gz format ensures consistent extraction and compatibility with SageMaker's inference pipeline.

Exam trap

The trap here is that candidates may assume SageMaker accepts common archive formats like .zip or any file structure, but the exam specifically tests the requirement for a single .tar.gz file as the only supported format for model artifacts in custom container deployments.

How to eliminate wrong answers

Option B is wrong because a folder on S3 with model files is not a valid format; SageMaker expects a single compressed archive, not a directory structure, to ensure atomic deployment and consistent extraction. Option C is wrong because SageMaker does impose a format requirement: the model artifacts must be a .tar.gz file; arbitrary files would break the deployment process. Option D is wrong because a .zip file is not supported by SageMaker for model artifacts; only .tar.gz is accepted, as SageMaker's extraction logic is built around tar-based archives.
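Packaging a model directory as a single .tar.gz can be sketched with the Python standard library. This is a minimal illustration, assuming a hypothetical `package_model` helper and a placeholder `model.pth` file; files are added at the archive root so that they land directly in the extraction directory.

```python
import os
import tarfile
import tempfile


def package_model(model_dir: str, output_path: str) -> str:
    """Bundle every file in model_dir into one gzip-compressed tarball."""
    with tarfile.open(output_path, "w:gz") as archive:
        for name in sorted(os.listdir(model_dir)):
            # arcname keeps files at the archive root, without parent folders.
            archive.add(os.path.join(model_dir, name), arcname=name)
    return output_path


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical model directory with a placeholder weights file.
    model_dir = os.path.join(workdir, "model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "model.pth"), "wb") as f:
        f.write(b"placeholder weights")

    archive_path = package_model(model_dir, os.path.join(workdir, "model.tar.gz"))

    # Verify the archive contents before it would be uploaded to S3.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive_members = archive.getnames()

print(archive_members)
```

The resulting model.tar.gz is the single object that would be uploaded to S3 and referenced when creating the model.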

344
MCQeasy

A machine learning team is using Amazon SageMaker to tune hyperparameters for a neural network. They have defined a hyperparameter tuning job with a random search strategy. The training time per job is very long. Which strategy can reduce the total tuning time?

A.Enable early stopping to terminate poorly performing jobs.
B.Use a larger instance type for each training job.
C.Switch to Bayesian optimization.
D.Increase the number of training jobs.
AnswerA

Early stopping terminates poorly performing trials early, saving compute time.

Why this answer

Enabling early stopping allows SageMaker to terminate training jobs that are unlikely to produce better results based on the objective metric, which directly reduces total tuning time by freeing up compute resources for more promising hyperparameter combinations. This is especially effective with random search, where many trials may converge slowly or plateau.

Exam trap

The trap here is that candidates often confuse early stopping with reducing training time per job (Option B) or assume Bayesian optimization always converges faster, but in practice, early stopping directly cuts wasted time on poor trials, which is the most effective strategy when individual training jobs are very long.

How to eliminate wrong answers

Option B is wrong because using a larger instance type speeds up individual training jobs but does not reduce the number of jobs or the time wasted on poor performers, and may increase cost without proportional benefit. Option C is wrong because switching to Bayesian optimization typically requires more initial jobs to build a surrogate model and can be less effective with very long training times per job, as it still waits for each job to complete before suggesting the next. Option D is wrong because increasing the number of training jobs would increase total tuning time, not reduce it, since each job still takes a long time to run.

345
MCQhard

A company uses SageMaker to train a model that processes sensitive customer data. Due to compliance, the training data must be encrypted at rest and in transit, and the model artifacts must be stored in a secured S3 bucket with encryption. Which combination of actions is REQUIRED?

A.Store data in an S3 bucket with AWS CloudHSM integration
B.Use an S3 bucket with SSE-S3 and enable SageMaker Internet-facing mode
C.Use an S3 bucket with default encryption (SSE-S3) and enable SSL for all connections
D.Enable AWS KMS encryption for the SageMaker notebook and training job, and use an S3 bucket with default encryption using AWS KMS
AnswerD

KMS encryption ensures encryption at rest and in transit for SageMaker and S3.

Why this answer

Option D is correct because it ensures end-to-end encryption: AWS KMS encryption for the SageMaker notebook and training job encrypts data in transit and at rest within the SageMaker environment, while an S3 bucket with default encryption using AWS KMS encrypts the training data and model artifacts at rest. This combination meets compliance requirements for encryption at rest and in transit, as KMS provides envelope encryption with customer-managed keys, and SageMaker automatically uses TLS for data in transit when KMS is enabled.

Exam trap

The trap here is that candidates often assume SSE-S3 alone is sufficient for compliance, but it does not cover encryption in transit or SageMaker-specific encryption, and they overlook the requirement for KMS to encrypt the SageMaker environment itself.

How to eliminate wrong answers

Option A is wrong because AWS CloudHSM integration is not a direct encryption method for S3 buckets; it provides hardware security modules for key storage but does not inherently encrypt data at rest in S3 or in transit, and it is not a required action for SageMaker encryption. Option B is wrong because SSE-S3 encrypts data at rest but does not address encryption in transit, and enabling SageMaker Internet-facing mode exposes the endpoint to the internet without ensuring SSL/TLS for all connections, violating compliance. Option C is wrong because SSE-S3 encrypts data at rest but does not provide encryption for SageMaker notebook instances or training jobs; enabling SSL for all connections is a best practice but not a specific SageMaker configuration, and it does not cover encryption of model artifacts in transit between SageMaker and S3 without KMS integration.

346
Multi-Selecteasy

A data scientist is evaluating a binary classification model. The model's confusion matrix shows: True Positives=80, False Positives=20, True Negatives=900, False Negatives=0. Which THREE metrics can be calculated from this confusion matrix? (Choose three.)

Select 3 answers
A.Precision
B.Recall
C.AUC-ROC
D.Accuracy
E.Root Mean Squared Error (RMSE)
AnswersA, B, D

Precision = TP/(TP+FP).

Why this answer

Precision is calculated as TP/(TP+FP) = 80/(80+20) = 0.80. This metric measures the proportion of positive identifications that were actually correct, which is directly derivable from the confusion matrix values.

Exam trap

The trap here is that candidates often assume AUC-ROC can be derived from a single confusion matrix, but it actually requires the full distribution of predicted probabilities to plot the ROC curve and calculate the area under it.
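The three derivable metrics can be computed directly from the four counts in the question. This is a minimal sketch; the variable names are chosen for the example, and AUC-ROC is omitted because it needs per-sample scores rather than counts at a single threshold.

```python
# Confusion matrix values from the question.
tp, fp, tn, fn = 80, 20, 900, 0

precision = tp / (tp + fp)                   # correct share of positive predictions
recall = tp / (tp + fn)                      # share of actual positives found
accuracy = (tp + tn) / (tp + fp + tn + fn)   # share of all predictions that are correct

print(f"precision: {precision:.2f}")
print(f"recall:    {recall:.2f}")
print(f"accuracy:  {accuracy:.2f}")
```

With zero false negatives, recall is perfect, while the 20 false positives hold precision at 0.80.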

347
MCQhard

A company uses Amazon SageMaker to train a model for fraud detection. The training data is highly imbalanced. The data scientist uses SMOTE to oversample the minority class. However, the model still has poor recall on the minority class. Which additional technique should the data scientist consider?

A.One-vs-rest encoding
B.Use class weights in the loss function
C.L1 regularization
D.Principal component analysis (PCA)
AnswerB

Class weights penalize minority errors more.

Why this answer

Cost-sensitive learning assigns a higher penalty to misclassifications of the minority class, addressing imbalance. Option A is a multi-class strategy, Option C is regularization, and Option D is dimensionality reduction.
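Class weighting can be sketched as a weighted log loss. This is a minimal illustration, assuming the common "balanced" heuristic (weight = n_samples / (n_classes × class_count)) and a hypothetical 95/5 label split; the function names are made up for the example.

```python
import math


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (binary labels 0/1)."""
    n = len(labels)
    positives = sum(labels)
    negatives = n - positives
    return {0: n / (2 * negatives), 1: n / (2 * positives)}


def weighted_log_loss(labels, probs, weights):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# A model that predicts a low fraud probability for every sample.
probs = [0.05] * len(labels)
unweighted = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)

print("class weights:", weights)
print(f"unweighted loss: {unweighted:.3f}, weighted loss: {weighted:.3f}")
```

The weighted loss is much larger for the same predictions, so an optimizer is pushed harder to fix errors on the minority class.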

348
MCQeasy

A data scientist is building a text classification model. The dataset contains 10,000 documents, each labeled with one of 5 categories. Which algorithm is most suitable for this task?

A.Principal Component Analysis (PCA)
B.Naive Bayes
C.Linear regression
D.k-means clustering
AnswerB

Naive Bayes is effective for text classification and small datasets.

Why this answer

Naive Bayes is highly suitable for text classification because it models the probability of each category given the document's word features using Bayes' theorem with a strong independence assumption. It performs well on high-dimensional sparse data like bag-of-words or TF-IDF representations, and it is particularly effective when the number of documents (10,000) is moderate relative to the vocabulary size, as it requires relatively little training data to estimate parameters.

Exam trap

The exam often tests the distinction between supervised and unsupervised learning, leading candidates to mistakenly choose k-means clustering (an unsupervised method) for a labeled classification task, or to confuse PCA with a classification algorithm because it is used for feature reduction before modeling.

How to eliminate wrong answers

Option A is wrong because Principal Component Analysis (PCA) is an unsupervised dimensionality reduction technique that finds orthogonal components maximizing variance; it does not perform classification and ignores the category labels entirely. Option C is wrong because linear regression predicts a continuous numeric output, not a discrete categorical label; applying it to multiclass classification would require inappropriate thresholding and violates the assumption of normally distributed errors. Option D is wrong because k-means clustering is an unsupervised algorithm that partitions data into clusters based on distance, without using label information; it cannot assign documents to predefined categories and requires post-hoc mapping of clusters to labels.

349
MCQhard

A data scientist is using Amazon SageMaker Autopilot to automatically build a model. The dataset contains a mix of numerical and categorical features. After the experiment completes, Autopilot provides several candidate pipelines. Which pipeline is MOST likely to be ranked highest by Autopilot?

A.The pipeline with the lowest validation loss
B.The pipeline with the simplest model (e.g., linear classifier)
C.The pipeline with the fastest training time
D.The pipeline with the lowest training loss
AnswerA

Autopilot ranks candidates by validation performance.

Why this answer

Amazon SageMaker Autopilot ranks candidate pipelines by their objective metric on the validation dataset, which is typically the validation loss (e.g., cross-entropy for classification or mean squared error for regression). The pipeline with the lowest validation loss generalizes best to unseen data, making it the highest-ranked candidate. Autopilot uses hold-out validation or cross-validation to compute this metric, ensuring the ranking reflects out-of-sample performance rather than overfitting to the training set.

Exam trap

The trap here is that candidates often confuse training loss with validation loss, mistakenly thinking that a lower training loss indicates a better model, but Autopilot explicitly ranks by validation performance to prevent overfitting.

How to eliminate wrong answers

Option B is wrong because Autopilot does not prioritize model simplicity; it optimizes for predictive performance, and a more complex model (e.g., ensemble or gradient-boosted tree) often achieves lower validation loss than a linear classifier. Option C is wrong because training time is not a ranking criterion; Autopilot focuses on accuracy, not computational speed, and a faster pipeline may sacrifice performance. Option D is wrong because training loss is an in-sample metric that can be misleadingly low due to overfitting; Autopilot uses validation loss to avoid this bias and ensure generalization.

350
MCQhard

A machine learning engineer is tuning hyperparameters for a gradient boosting model using Amazon SageMaker Automatic Model Tuning. The objective metric is validation accuracy. After several tuning jobs, the best accuracy achieved is 0.85, but the engineer suspects the model is overfitting. Which hyperparameter adjustment is most likely to reduce overfitting?

A.Increase the regularization parameter (e.g., lambda or alpha)
B.Increase the maximum depth of trees
C.Increase the subsample ratio
D.Increase the learning rate
AnswerA

Regularization penalizes large weights, reducing overfitting.

Why this answer

Option A is correct because increasing the regularization parameter (e.g., lambda or alpha in XGBoost) penalizes model complexity and reduces overfitting. Option B is wrong because increasing max depth increases model complexity, leading to overfitting. Option D is wrong because a higher learning rate makes each tree's contribution larger, which tends to increase overfitting.

Option C is wrong because increasing the subsample ratio lets each tree see more of the training rows, which removes randomness and can increase overfitting; decreasing it is the adjustment that might help.

351
Multi-Selectmedium

Which TWO of the following are valid techniques for handling missing values in a dataset for machine learning?

Select 2 answers
A.Replace missing values with the maximum value of the feature
B.Remove rows with missing values
C.Replace missing values with random noise
D.Convert missing values to the string 'missing'
E.Replace missing values with the mean of the feature
AnswersB, E

Dropping rows is acceptable if the data is missing at random and only a small share of rows is affected.

Why this answer

Option E is correct because mean imputation is a common technique. Option B is correct because dropping rows with missing values is valid. Option A is wrong because using the maximum value introduces bias.

Option C is wrong because adding random noise is not standard. Option D is wrong because converting to a string is not appropriate for numerical features.
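Both valid techniques, dropping incomplete rows and mean imputation, can be sketched in plain Python. The tiny two-column dataset and the helper names are hypothetical, and `None` stands in for a missing value.

```python
from statistics import mean


def drop_missing(rows):
    """Keep only rows in which every value is present."""
    return [row for row in rows if all(v is not None for v in row)]


def impute_mean(rows):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


# Hypothetical data with two gaps.
rows = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 30.0]]
print(drop_missing(rows))  # [[1.0, 10.0], [5.0, 30.0]]
print(impute_mean(rows))   # [[1.0, 10.0], [3.0, 20.0], [3.0, 20.0], [5.0, 30.0]]
```

Dropping rows halves this toy dataset, which shows why it is usually reserved for cases where few rows are affected.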

352
MCQeasy

Refer to the exhibit. A data scientist creates a SageMaker notebook instance using this Terraform configuration. The notebook fails to start. The logs indicate 'The IAM role does not have the necessary permissions'. Which addition to the IAM role policy is MOST likely needed?

A.cloudwatch:PutMetricData
B.s3:GetObject on the notebook bucket
C.sagemaker:CreatePresignedNotebookInstanceUrl
D.sagemaker:CreateTrainingJob
AnswerC

Required for notebook access.

Why this answer

The SageMaker notebook instance requires the `sagemaker:CreatePresignedNotebookInstanceUrl` permission to generate a presigned URL, which is used to access the notebook's Jupyter interface. Without this permission, the notebook fails to start because the IAM role cannot create the necessary URL for the user to connect, as indicated by the 'The IAM role does not have the necessary permissions' log error.

Exam trap

AWS often tests the specific permission required for notebook instance access, and the trap here is that candidates confuse general SageMaker permissions (like training or S3 access) with the precise `CreatePresignedNotebookInstanceUrl` action needed for the notebook to start.

How to eliminate wrong answers

Option A is wrong because `cloudwatch:PutMetricData` is used for publishing custom metrics to CloudWatch, which is not required for starting a SageMaker notebook instance; the notebook startup process does not depend on CloudWatch permissions. Option B is wrong because `s3:GetObject` on the notebook bucket is typically needed for accessing data or artifacts, but the notebook instance itself does not require S3 read access to start; the startup failure is due to missing permissions for generating the presigned URL, not S3 access. Option D is wrong because `sagemaker:CreateTrainingJob` is a permission for launching training jobs, which is unrelated to the notebook instance lifecycle; the notebook startup does not involve creating training jobs.

353
MCQmedium

An e-commerce company wants to build a recommendation system. They have user-item interaction data (clicks, purchases) and user demographic data. The goal is to recommend items that a user is likely to purchase. Which approach should be used?

A.Linear regression on user and item features.
B.Collaborative filtering using matrix factorization.
C.Factorization Machines using user-item interactions and user features.
D.Content-based filtering using item features.
AnswerC

Handles sparse data and side features effectively.

Why this answer

Option C is correct because Factorization Machines are designed for high-dimensional sparse data and can handle both user-item interactions and side features. Option B is wrong because collaborative filtering with matrix factorization does not naturally incorporate user demographic features. Option D is wrong because content-based filtering relies on item features and ignores the interaction and demographic data that are available.

Option A is wrong because linear regression is not suitable for implicit feedback.

354
Multi-Selecteasy

A data scientist is performing feature engineering for a machine learning model. The dataset contains categorical features with high cardinality. Which THREE techniques are appropriate for encoding high-cardinality categorical features?

Select 3 answers
A.Target encoding
B.Binary encoding
C.Label encoding
D.Count encoding
E.One-hot encoding with pruning of rare categories
AnswersA, D, E

Replaces category with target mean.

Why this answer

Option A is correct because target encoding replaces categories with the mean target value. Option D is correct because count encoding replaces categories with frequency counts. Option E is correct because one-hot encoding can be used if the number of categories is manageable after pruning.

Option C is wrong because label encoding implies an ordinal relationship, which is not ideal for nominal high-cardinality features. Option B is not selected because, although binary encoding is another workable option, the question asks for THREE, and target, count, and pruned one-hot encoding are the more commonly cited choices.
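Count encoding and target encoding are simple enough to sketch without a library. The zip codes and prices below are invented for illustration. Note that this naive target encoding uses every row; in practice the category means are usually computed on training folds only, to avoid leaking the target into the features.

```python
from collections import Counter, defaultdict


def count_encode(categories):
    """Replace each category with how often it occurs."""
    counts = Counter(categories)
    return [counts[c] for c in categories]


def target_encode(categories, targets):
    """Replace each category with the mean target value observed for it."""
    sums = defaultdict(float)
    counts = Counter(categories)
    for c, t in zip(categories, targets):
        sums[c] += t
    return [sums[c] / counts[c] for c in categories]


# Hypothetical high-cardinality feature and numeric target.
zip_codes = ["98101", "98101", "10001", "60601", "10001", "98101"]
prices = [500.0, 700.0, 900.0, 300.0, 1100.0, 600.0]

print(count_encode(zip_codes))           # [3, 3, 2, 1, 2, 3]
print(target_encode(zip_codes, prices))  # [600.0, 600.0, 1000.0, 300.0, 1000.0, 600.0]
```

Either way, a column with thousands of distinct values becomes a single numeric column instead of thousands of one-hot columns.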

355
Multi-Selectmedium

A team is building a regression model to predict house prices. They observe that the model performs well on training data but poorly on validation data. Which THREE actions can help reduce overfitting? (Choose THREE.)

Select 3 answers
A.Reduce model complexity by selecting fewer features
B.Increase regularization strength (e.g., L1, L2)
C.Collect more training data if possible
D.Increase the maximum depth of decision trees
E.Add more interaction features
AnswersA, B, C

Simpler models generalize better.

Why this answer

Option B (increase regularization) penalizes large coefficients. Option A (reduce model complexity) means using fewer features or a simpler algorithm. Option C (add more training data) helps generalization.

Option D (increase tree depth) increases overfitting. Option E (add interaction features) increases model complexity and does not reduce overfitting directly.

356
MCQeasy

Refer to the exhibit. A data scientist checks the status of a SageMaker endpoint and sees the output above. What does this indicate?

A.The endpoint has failed
B.The endpoint is running at full capacity
C.The endpoint is out of service
D.The endpoint is scaling up to meet desired capacity
AnswerD

Current is less than desired, so scaling up.

Why this answer

Option D is correct because the endpoint is InService but the current instance count (2) is less than the desired count (5), indicating scaling is in progress. Option C is wrong because the status is InService, not OutOfService. Option B is wrong because the endpoint is running but at lower capacity than desired.

Option A is wrong because the endpoint has not failed.

357
MCQeasy

A data scientist trains a convolutional neural network (CNN) for image classification. The training loss decreases steadily, but the validation loss starts increasing after 10 epochs. Which technique should the data scientist use to address this problem?

A.Add more data augmentation to the training set.
B.Use early stopping to halt training when validation loss stops decreasing.
C.Increase the number of training epochs.
D.Add more convolutional layers to increase model capacity.
E.Increase the learning rate.
AnswerB

Early stopping prevents overfitting by stopping at the optimal point.

Why this answer

Option B is correct because early stopping halts training when validation loss stops improving, preventing overfitting. Option A (more data augmentation) can help generalization but does not directly address the point at which validation loss starts to rise. Option C (more epochs) extends training further past that point, which worsens overfitting.

Option D (more convolutional layers) increases model capacity, likely worsening overfitting. Option E (a higher learning rate) may cause divergence.
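The early-stopping rule can be sketched as a small function over a recorded validation-loss curve. The `patience` parameter, the function name, and the loss values are assumptions for the example; deep learning frameworks ship their own callbacks for this.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the best validation loss.

    Scanning stops once `patience` consecutive epochs fail to improve
    on the best loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch


# Hypothetical curve: improves until epoch 4, then rises.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(val_losses))  # 4
```

In a real training loop the model weights from the best epoch would be checkpointed and restored after stopping.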

358
MCQhard

A company is building a recommendation system using collaborative filtering on Amazon SageMaker. The dataset contains user-item interactions with a long-tail distribution: a few items have millions of interactions, while most items have very few. The model currently uses matrix factorization with ALS. The recall@20 metric is low for niche items. Which modification would most likely improve recall for long-tail items?

A.Increase the regularization parameter to prevent overfitting
B.Add explicit features like item category and user demographics
C.Increase the number of latent factors in the matrix
D.Use implicit feedback with confidence weighting to downweight popular items
AnswerD

Confidence weighting reduces the influence of overly popular items, allowing the model to learn patterns for niche items.

Why this answer

Implicit feedback models can incorporate confidence weights that downweight popular items, helping the model focus on less frequent items. Adding explicit features would not directly address the long-tail. Increasing the number of factors might help but could also overfit.

Regularization is already present; adjusting it might not target the issue specifically.

359
Multi-Selecthard

Which THREE techniques help reduce overfitting in a neural network? (Select THREE.)

Select 3 answers
A.Dropout
B.L2 Regularization
C.Increasing the number of layers
D.Using a larger batch size
E.Early Stopping
AnswersA, B, E

Dropout is a regularization technique that reduces overfitting.

Why this answer

Dropout randomly drops units during training, L2 regularization penalizes large weights, and early stopping halts training when validation error increases. Data augmentation can also help but is not listed. Batch normalization may help but primarily for training stability.

360
Multi-Selectmedium

A data scientist is training a binary classification model to predict customer churn. The dataset has 10,000 samples with 500 churners (5% positive class). Which TWO techniques should the scientist use to address the class imbalance? (Choose TWO.)

Select 2 answers
A.Use SMOTE to oversample the minority class
B.Tune the decision threshold after training
C.Randomly undersample the majority class to match minority size
D.Oversample the minority class by duplicating existing samples
E.Set class_weight='balanced' in the classifier
AnswersA, E

SMOTE creates synthetic samples to balance classes.

Why this answer

Option A (SMOTE) generates synthetic samples for the minority class. Option E (class_weight='balanced') adjusts loss function weights. Option C (undersampling the majority class) can be used but discards data and is not always preferred; Option D (oversampling by duplication) may cause overfitting; Option B (threshold tuning) is a post-training adjustment rather than a way to address the imbalance during training.

361
MCQhard

A machine learning engineer is tuning a gradient boosting model using SageMaker Hyperparameter Tuning. The objective is to minimize MAE. The tuning job uses 20 training jobs. After 10 jobs, the best objective value is 5.2. Which action should the engineer take to potentially improve the result?

A.Set early stopping to avoid overfitting.
B.Change the objective metric to RMSE.
C.Increase the total number of training jobs to 50.
D.Switch the tuning strategy from Bayesian to Random search.
AnswerC

More jobs allow broader exploration and may find a better configuration.

Why this answer

Option C is correct because increasing the total number of training jobs from 20 to 50 gives the Bayesian optimization algorithm more opportunities to explore the hyperparameter space and exploit promising regions. With only 10 jobs completed, the tuning job may not have converged to the global minimum of MAE, and additional jobs can refine the search, especially since Bayesian search builds a probabilistic model that improves with more observations.

Exam trap

The trap here is that candidates mistakenly think early stopping (Option A) applies to the tuning job itself rather than to individual training jobs, or they assume changing the metric (Option B) will indirectly improve MAE, when in fact the tuning job's objective must directly match the business metric.

How to eliminate wrong answers

Option A is wrong because early stopping is a technique to halt training of a single model when validation performance stops improving, not a mechanism to improve the tuning job's best objective value; it prevents overfitting per job but does not help the hyperparameter search find a better configuration. Option B is wrong because changing the objective metric to RMSE would optimize for a different loss function, which contradicts the stated goal of minimizing MAE and could lead to a model that performs worse on the actual target metric. Option D is wrong because switching from Bayesian to Random search would discard the information already gathered from the first 10 jobs, likely reducing sample efficiency and making it harder to find a better result within the remaining budget.

362
Multi-Selectmedium

A data scientist is training a random forest model for a binary classification task. The dataset has 100,000 samples and 500 features. The model is overfitting. Which TWO actions are MOST likely to reduce overfitting?

Select 2 answers
A.Increase the number of trees in the forest
B.Reduce the maximum depth of each tree
C.Increase the number of features considered at each split
D.Use all features for each tree
E.Increase the minimum number of samples required to split an internal node
AnswersB, E

Shorter trees are simpler and less likely to overfit.

Why this answer

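Reducing the maximum depth of each tree limits the complexity of individual trees, preventing them from memorizing noise in the training data. Increasing the minimum number of samples required to split an internal node has a similar effect, because it stops trees from creating splits that are supported by only a handful of samples. Both are standard regularization techniques for random forests that combat overfitting by controlling the variance of the model.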

Exam trap

AWS often tests the misconception that adding more trees always reduces overfitting, but the trap here is that without controlling tree complexity (depth or split criteria), more trees can still produce an overfit ensemble, especially when individual trees are allowed to grow unchecked.

363
MCQmedium

A data scientist is using Amazon SageMaker to train a linear regression model. The training job fails with the error: 'AlgorithmError: Input data has NaN values'. Which step should the data scientist take to resolve this issue?

A.Convert the data to a sparse format
B.Switch to a different algorithm that handles missing values
C.Impute missing values or remove rows with NaN values
D.Increase the number of training instances
AnswerC

Handling missing values by imputation or removal resolves the NaN error.

Why this answer

The error indicates NaN values in the input data. The correct action is to handle missing values before training. Option D is wrong because increasing the instance count does not fix data issues.

Option B is wrong because the error is data-related, not algorithm-related. Option A is wrong because the issue is NaN values, not data format.

364
MCQhard

A data scientist is training a deep learning model on a large dataset using Amazon SageMaker. The training job is taking too long. The scientist notices that GPU utilization is low and data loading is the bottleneck. Which action should the scientist take to improve training performance?

A.Increase the number of training instances
B.Use Pipe mode for the training data channel
C.Change the instance type to a CPU instance
D.Reduce the batch size
AnswerB

Pipe mode streams data, reducing I/O bottleneck.

Why this answer

Low GPU utilization with a data loading bottleneck indicates that the input pipeline cannot feed data to the GPU fast enough. Option B is correct because Pipe mode streams data directly from S3 without downloading it first, reducing I/O overhead.

Option A is wrong because more instances do not address the data loading bottleneck if each GPU is already underutilized. Option C is wrong because a CPU instance would train more slowly. Option D is wrong because a smaller batch size would reduce GPU utilization further.

365
MCQmedium

A data scientist is training a deep learning model using TensorFlow on Amazon SageMaker. The training job uses a single GPU instance but the GPU utilization is low. Which action is MOST likely to improve GPU utilization?

A.Increase the batch size
B.Use a smaller instance type
C.Add more features
D.Decrease the number of epochs
AnswerA

Larger batch size better utilizes GPU.

Why this answer

Increasing the batch size allows the GPU to process more data in parallel per training step, which keeps the GPU compute units busier and reduces idle time. In TensorFlow on SageMaker, a small batch size can cause the GPU to finish computation quickly and then wait for the next batch to be loaded, leading to low utilization. This is the most direct way to improve GPU throughput without changing the instance or model architecture.

Exam trap

The trap here is that candidates confuse low GPU utilization with overfitting or model complexity, leading them to choose options like adding features or reducing epochs, when the real issue is underutilization of parallel compute resources due to insufficient batch size.

How to eliminate wrong answers

Option B is wrong because using a smaller instance type would reduce GPU compute capacity, likely worsening utilization and increasing training time. Option C is wrong because adding more features increases the input dimensionality, which may increase computation per sample but does not address the root cause of low GPU utilization (insufficient parallelism). Option D is wrong because decreasing the number of epochs reduces total training time but does not affect how efficiently the GPU is used during each step; utilization per step remains unchanged.

366
MCQhard

A data scientist is training a deep learning model on Amazon SageMaker using a large dataset stored in S3. The training job is taking too long due to high I/O latency waiting for data to be downloaded from S3. Which action would MOST effectively reduce the I/O latency?

A.Use File mode for the training channel
B.Increase the number of training instances
C.Use Pipe mode for the training channel
D.Use Amazon SageMaker Elastic Inference
AnswerC

Pipe mode streams data directly from S3, reducing disk I/O and latency.

Why this answer

Pipe mode streams data directly from S3 into the training algorithm without writing to disk, eliminating the wait caused by downloading files to local storage before training starts. This is the most effective solution because the bottleneck is data transfer from S3, and Pipe mode lets training begin as soon as the first records arrive and feeds the rest on the fly.

Exam trap

The trap here is that candidates confuse File mode (which downloads fully) with Pipe mode (which streams), or mistakenly think adding more instances (Option B) solves a per-instance I/O bottleneck, when in fact it does not address the root cause of S3 download latency.

How to eliminate wrong answers

Option A is wrong because File mode downloads the entire dataset to the training instance's local disk before training starts, which actually increases I/O latency due to the full download overhead. Option B is wrong because increasing the number of training instances does not reduce per-instance I/O latency; it distributes the workload but each instance still suffers from the same S3 download bottleneck. Option D is wrong because Amazon SageMaker Elastic Inference accelerates model inference, not training data loading, so it has no effect on I/O latency during training.

367
MCQmedium

A company is building a recommendation system for an e-commerce platform. The system needs to suggest products to users based on past purchases and browsing history. Which approach would be most appropriate for this use case?

A.Content-based filtering using product descriptions
B.K-means clustering of users based on demographics
C.Collaborative filtering using past user-item interactions
D.Matrix factorization on user-item ratings
AnswerC

Collaborative filtering leverages user behavior patterns to make recommendations.

Why this answer

Collaborative filtering is the most appropriate approach because it leverages past user-item interactions (e.g., purchases, clicks) to identify patterns and recommend items that similar users have liked. This method directly captures user behavior and preferences without requiring explicit product metadata, making it ideal for e-commerce recommendation systems where implicit feedback is abundant.

Exam trap

AWS often tests the distinction between collaborative filtering and matrix factorization, where candidates mistakenly choose matrix factorization (Option D) because it is a popular technique, but the question's emphasis on 'past purchases and browsing history' (implicit feedback) makes collaborative filtering the more direct and practical choice, as matrix factorization typically requires explicit ratings or careful adaptation for implicit data.

How to eliminate wrong answers

Option A is wrong because content-based filtering relies solely on product descriptions or features, which ignores the collaborative signal from other users' behavior and fails to capture serendipitous recommendations or cross-category preferences. Option B is wrong because K-means clustering based on demographics groups users by static attributes (e.g., age, location), which does not model dynamic purchase behavior or item preferences, leading to poor recommendation accuracy. Option D is wrong because matrix factorization on user-item ratings assumes explicit numerical ratings (e.g., 1-5 stars), which are often sparse or unavailable in e-commerce; it also requires a dense rating matrix and cannot directly handle implicit feedback like browsing history without additional preprocessing.

368
Multi-Selectmedium

A data scientist is training a binary classifier using a large dataset with class imbalance (90% negative, 10% positive). After training a logistic regression model, the F1 score is low but accuracy is high. Which TWO actions should the data scientist take to improve model performance? (Choose 2.)

Select 2 answers
A.Switch to evaluation metrics such as F1 score or AUC-ROC instead of accuracy.
B.Apply feature scaling to ensure all features contribute equally.
C.Add more features to the model to improve its capacity.
D.Resample the training data using techniques like SMOTE to balance the classes.
E.Increase the regularization parameter to reduce overfitting.
AnswersA, D

Correct: Metrics like F1 are robust to class imbalance.

Why this answer

Option D (resample the training data) and Option A (use a different evaluation metric) are correct because class imbalance causes the model to be biased toward the majority class, leading to high accuracy but poor F1. Resampling (e.g., SMOTE) balances classes, and using F1 or AUC-ROC focuses on minority class performance. Option B (feature scaling) is a general preprocessing step but doesn't directly address imbalance.

Option E (increase regularization) might reduce overfitting but doesn't target imbalance. Option C (add more features) may not help if the model is already biased.

369
Multi-Selectmedium

A data scientist is building a recommender system using Amazon SageMaker. The dataset contains user-item interactions with implicit feedback (clicks). Which THREE evaluation metrics are appropriate for this use case?

Select 3 answers
A.Root Mean Squared Error (RMSE)
B.Precision@k
C.Mean Average Precision (MAP)
D.Recall@k
E.Area Under the ROC Curve (AUC-ROC)
AnswersB, C, D

Precision@k measures relevance of top-k recommendations.

Why this answer

Precision@k is appropriate for implicit feedback (clicks) because it measures the proportion of relevant items among the top-k recommendations, focusing on the accuracy of the ranked list. Recall@k measures the proportion of a user's relevant items that appear in the top k, and Mean Average Precision (MAP) additionally rewards placing relevant items near the top of the list. In recommender systems with implicit feedback, where only positive interactions are observed, ranking metrics like these are standard because they evaluate the quality of the top recommendations without requiring explicit ratings.
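As a quick illustration of the ranking metrics discussed here, the sketch below computes Precision@k and Recall@k for one user. The item IDs, the ranked list, and the set of clicked ("relevant") items are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that the user actually interacted with."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Share of the user's relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


# Hypothetical ranked recommendations and clicked items for one user.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "f"}

print(precision_at_k(recommended, relevant, k=5))  # 0.4
print(recall_at_k(recommended, relevant, k=5))     # 2 of 3 relevant items retrieved
```

In an evaluation these per-user values would typically be averaged over all users.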

Exam trap

The trap here is that candidates often confuse regression metrics (RMSE) or binary classification metrics (AUC-ROC) as applicable to implicit feedback, not realizing that recommender systems with implicit feedback require ranking-based metrics that handle only positive observations and no explicit negative labels.

370
MCQmedium

A data scientist is training a classification model on an imbalanced dataset where the positive class represents only 5% of the data. Which technique would BEST address the class imbalance without discarding data?

A.Use SMOTE to generate synthetic samples for the minority class
B.Randomly undersample the majority class
C.Adjust the decision threshold to 0.95
D.Randomly oversample the minority class with replacement
AnswerA

SMOTE creates synthetic samples, balancing the dataset without data loss.

Why this answer

SMOTE (Synthetic Minority Oversampling Technique) is the best choice because it generates synthetic samples for the minority class by interpolating between existing minority instances, effectively balancing the dataset without discarding any data. This avoids the information loss of undersampling and the overfitting risk of simple random oversampling, making it ideal for a 5% positive class scenario.
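The interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration, not the reference algorithm or the imbalanced-learn implementation: it picks a random minority point, picks one of its `k` nearest minority neighbours, and places a new point somewhere on the line segment between them. The sample points and the name `smote_sketch` are made up.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority points by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (excluding itself).
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical 2-D minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_synthetic=6)
print(len(new_points))  # 6
```

Because each synthetic point lies between two real minority points, the new samples add variety rather than exact duplicates, which is the contrast with random oversampling drawn above.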

Exam trap

The exam often tests the distinction between data-level techniques (like SMOTE) and post-hoc adjustments (like threshold tuning), trapping candidates who think changing the threshold alone solves the imbalance without addressing the underlying data distribution.

How to eliminate wrong answers

Option B is wrong because randomly undersampling the majority class discards data, which can lead to loss of valuable information and reduced model performance, especially when the majority class contains important patterns. Option C is wrong because adjusting the decision threshold to 0.95 does not address class imbalance at the data level; it only changes the classification cutoff, which may improve recall but does not fix the underlying skewed distribution and can harm precision. Option D is wrong because randomly oversampling the minority class with replacement duplicates existing samples, which can cause overfitting to the minority class and does not introduce new, diverse examples like SMOTE does.

371
MCQmedium

A company is building a recommendation system using matrix factorization. The training data contains user-item interactions. The model performs well on the training set but poorly on the test set. Which regularization technique should be applied to improve generalization?

A.Add L1 regularization to the user and item latent factors
B.Add L2 regularization to the user and item latent factors
C.Apply dropout to the latent factors during training
D.Use batch normalization on the factors
AnswerB

L2 regularization penalizes large factor values, reducing overfitting.

Why this answer

L2 regularization (weight decay) penalizes large values in the user and item latent factor matrices, which helps prevent overfitting by encouraging the model to learn smoother, more generalizable representations. This is the standard regularization technique used in matrix factorization for collaborative filtering, as it directly controls the magnitude of the latent vectors without inducing sparsity.
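One common way to write the L2-regularized objective is shown below. The notation is an assumption for illustration: $r_{ui}$ is the observed interaction between user $u$ and item $i$, $p_u$ and $q_i$ are their latent factor vectors, $\mathcal{K}$ is the set of observed user-item pairs, and $\lambda$ is the regularization strength.

```latex
\min_{P,\,Q} \; \sum_{(u,i) \in \mathcal{K}}
  \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \lVert p_u \rVert_2^2 + \lVert q_i \rVert_2^2 \right)
```

A larger $\lambda$ shrinks the latent vectors toward zero, trading some training-set fit for better generalization.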

Exam trap

The exam often tests the distinction between L1 and L2 regularization in the context of matrix factorization, where candidates mistakenly choose L1 because they associate it with feature selection, but the correct choice for controlling latent factor magnitude and preventing overfitting is L2 regularization.

How to eliminate wrong answers

Option A is wrong because L1 regularization induces sparsity in the latent factors, which is not typically desired in matrix factorization—sparse factors can lose the dense, low-rank structure needed for capturing collaborative signals. Option C is wrong because dropout is a regularization technique designed for neural networks, not for standard matrix factorization models, and applying it to latent factors would disrupt the multiplicative interaction that defines the prediction. Option D is wrong because batch normalization normalizes activations within mini-batches to stabilize training in deep networks, but matrix factorization has no notion of mini-batch activations and batch normalization does not address overfitting from latent factor magnitudes.

372
MCQmedium

A company uses SageMaker to train a large language model. The training job is taking too long. The data scientist wants to use distributed training with data parallelism. Which SageMaker feature should be used?

A.SageMaker distributed training libraries
B.SageMaker Neo
C.SageMaker Processing
D.SageMaker Debugger
AnswerA

These libraries provide data and model parallelism.

Why this answer

Option A is correct because SageMaker's distributed training libraries support data parallelism. Option B is wrong because SageMaker Neo optimizes trained models for inference on target hardware; it does not distribute training. Option C is wrong because SageMaker Processing is for data processing.

Option D is wrong because SageMaker Debugger is for monitoring and debugging, not distribution.
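As a hedged illustration, the sketch below builds the keyword arguments that would enable the data parallel library on a SageMaker Python SDK framework estimator. The `distribution` dictionary follows the shape documented for that SDK; the helper name `data_parallel_kwargs`, the instance type, and the instance count are hypothetical, and the dictionary is only constructed here, not passed to a real estimator.

```python
def data_parallel_kwargs(instance_type, instance_count):
    """Return estimator keyword arguments that turn on data parallelism.

    Assumption: these would be passed to a framework estimator (for example
    a PyTorch estimator) along with the usual entry point, role, and
    framework version, none of which are shown here.
    """
    return {
        "instance_type": instance_type,
        "instance_count": instance_count,
        # Enables the SageMaker distributed data parallel library.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


# Hypothetical cluster: two multi-GPU instances.
kwargs = data_parallel_kwargs("ml.p4d.24xlarge", 2)
print(kwargs["distribution"])
```

With data parallelism, each GPU holds a full copy of the model and processes a different shard of every batch, so the library only applies to instance types and counts that it supports.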

373
MCQmedium

A data scientist is training a deep learning model for image classification using Amazon SageMaker. The training job is taking too long. The data scientist notices that GPU utilization is low (around 30%). Which action is most likely to improve GPU utilization and reduce training time?

A.Increase the batch size
B.Use a smaller instance type
C.Increase the learning rate
D.Reduce the batch size
AnswerA

Larger batch size keeps GPU busy, improving utilization and reducing total training time if the data pipeline can keep up.

Why this answer

Low GPU utilization (around 30%) means the GPU sits idle for much of each training step, typically because each batch gives it too little work relative to the per-step overhead. Increasing the batch size lets each forward/backward pass process more samples, which raises the computational load on the GPU and improves hardware utilization. It also reduces the number of steps needed per epoch, which decreases overall training time as long as the data pipeline and GPU memory can keep up.
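The effect on step count can be checked with simple arithmetic. In this sketch the dataset size of 50,000 images and the two batch sizes are assumed values chosen only for illustration.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


NUM_SAMPLES = 50_000  # hypothetical dataset size

small_batch_steps = steps_per_epoch(NUM_SAMPLES, 32)
large_batch_steps = steps_per_epoch(NUM_SAMPLES, 256)

print(small_batch_steps, large_batch_steps)
```

Fewer, larger steps mean less per-step overhead, but the wall-clock gain only materializes if each larger step still fits in GPU memory and the input pipeline can supply batches fast enough.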

Exam trap

The trap here is that candidates often confuse low GPU utilization with a need to reduce batch size (thinking smaller batches speed up training), when in fact increasing batch size is the standard remedy to saturate GPU compute and reduce wall-clock time.

How to eliminate wrong answers

Option B is wrong because using a smaller instance type would reduce available GPU compute resources, likely further lowering utilization and increasing training time. Option C is wrong because increasing the learning rate does not directly affect GPU utilization; it changes the optimization dynamics and may cause divergence or instability without addressing the underutilization bottleneck. Option D is wrong because reducing the batch size would decrease the amount of work per GPU step, further lowering utilization and increasing the number of steps, which would worsen training time.

374
MCQhard

Refer to the exhibit. A SageMaker training job using the built-in Linear Learner algorithm fails with 'Loss function returned NaN'. Which hyperparameter change is MOST likely to resolve this issue?

A.Increase the learning rate to 0.5
B.Increase mini_batch_size to 2000
C.Decrease epochs to 5
D.Reduce learning_rate to 0.01
AnswerD

Lower learning rate helps convergence.

Why this answer

The 'Loss function returned NaN' error in SageMaker's built-in Linear Learner algorithm typically occurs when the learning rate is too high, causing gradient updates to overshoot optimal parameters and diverge. Reducing the learning rate to 0.01 stabilizes training by ensuring smaller, more controlled weight updates, preventing numerical instability that leads to NaN loss.
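The divergence mechanism can be reproduced on a toy problem. This sketch runs plain gradient descent on a one-dimensional quadratic loss; it is not the Linear Learner implementation, and the curvature, step count, and function name `final_loss` are assumptions made for the demonstration.

```python
import math


def final_loss(learning_rate, steps=600, curvature=10.0):
    """Run gradient descent on loss(w) = 0.5 * curvature * w**2 from w = 1."""
    w = 1.0
    for _ in range(steps):
        grad = curvature * w            # derivative of the quadratic loss
        w = w - learning_rate * grad    # standard gradient descent update
    return 0.5 * curvature * w * w


# Step size too large: each update overshoots, |w| grows until it overflows,
# and the loss ends up as NaN.
high_lr_loss = final_loss(0.5)

# Smaller step size: each update shrinks w, and the loss approaches zero.
low_lr_loss = final_loss(0.01)

print(math.isnan(high_lr_loss), low_lr_loss)
```

In this toy setting the update multiplies `w` by `1 - learning_rate * curvature`, so training is stable only when that factor has magnitude below 1.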

Exam trap

AWS often tests the misconception that increasing the learning rate speeds up convergence, but the trap here is that a high learning rate causes divergence and NaN loss, so the correct fix is to reduce it, not increase it.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate to 0.5 would exacerbate the instability, making NaN loss more likely rather than resolving it. Option B is wrong because increasing mini_batch_size to 2000 does not directly address the learning rate-induced divergence; while larger batches can reduce gradient variance, they do not fix the fundamental issue of an overly aggressive step size. Option C is wrong because decreasing epochs to 5 would simply truncate training without addressing the root cause—the loss would still be NaN from the first few iterations if the learning rate is too high.

375
MCQmedium

A data scientist is building a recommendation system using collaborative filtering. The dataset contains user-item interactions in a sparse matrix. The model will be trained on Amazon SageMaker using the built-in Factorization Machines algorithm. Which data format should the scientist use for the training data?

A.CSV format with all features as columns
B.JSON format with nested arrays
C.RecordIO-protobuf format with sparse features
D.Parquet format
AnswerC

RecordIO-protobuf is the recommended format for sparse data for Factorization Machines.

Why this answer

For training, Amazon SageMaker's Factorization Machines algorithm accepts input in the 'application/x-recordio-protobuf' format with Float32 tensors. CSV is a poor fit because the algorithm's typical use case is sparse data, such as one-hot encoded user and item IDs, which the protobuf format represents efficiently.
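To show why the data is sparse, the sketch below builds the usual one-hot feature layout for user-item interactions: each row has exactly two non-zero columns, one for the user and one for the item. The function name `encode_interactions` and the sample interactions are hypothetical; serializing the result to RecordIO-protobuf (for example with a helper from the SageMaker Python SDK) is a separate step not shown here.

```python
def encode_interactions(interactions, n_users):
    """Map (user_idx, item_idx) pairs to the non-zero column indices per row.

    Columns 0 .. n_users-1 are user one-hot slots; item slots follow them.
    """
    rows = []
    for user_idx, item_idx in interactions:
        rows.append([user_idx, n_users + item_idx])
    return rows


N_USERS, N_ITEMS = 3, 4                      # hypothetical catalog sizes
FEATURE_DIM = N_USERS + N_ITEMS              # width of the sparse matrix

# Hypothetical interactions as (user index, item index) pairs.
sample = [(0, 2), (1, 0), (2, 3)]
sparse_rows = encode_interactions(sample, N_USERS)

# Fraction of entries that are non-zero in the full matrix.
density = (2 * len(sample)) / (len(sample) * FEATURE_DIM)

print(sparse_rows, density)
```

With realistic catalog sizes the feature dimension reaches millions while each row still holds two non-zero values, which is why a dense column-per-feature format is impractical.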
