
CCNA Modeling Questions

75 of 624 questions · Page 1/9 · Modeling · Answers revealed


1

Multi-Select · hard

Which THREE factors should be considered when choosing between a parametric and a non-parametric machine learning model?

Select 3 answers

A.Non-parametric models are generally more flexible

B.Parametric models have lower bias than non-parametric models

C.Parametric models train faster than non-parametric models

D.Non-parametric models are less prone to overfitting

E.Parametric models typically require less training data

Answers: A, C, E

Non-parametric models can fit complex patterns.

Why this answer

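Option A is correct because non-parametric models, such as k-nearest neighbors or decision trees, do not assume a fixed functional form for the data, so they can capture complex, non-linear relationships. This flexibility suits datasets whose underlying distribution is unknown or highly irregular, but it also raises the risk of overfitting unless the model is properly regularized. Option C is correct because a parametric model only has to estimate a fixed, usually small, set of parameters, so training is typically faster. Option E is correct because those fixed parameters can usually be estimated reliably from less data. Option B is wrong because the fixed functional form gives parametric models higher bias, not lower. Option D is wrong because the extra flexibility makes non-parametric models more prone to overfitting, not less.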

Exam trap

AWS often tests the misconception that non-parametric models are less prone to overfitting because they are 'simpler,' when in fact their flexibility makes them more susceptible to overfitting without careful tuning or large datasets.
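To make the parametric versus non-parametric contrast concrete, here is a minimal NumPy sketch on toy data (a noisy sine curve, chosen only for illustration). A least-squares line stands in for a parametric model and a k-nearest-neighbors regressor for a non-parametric one; the function names are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    # Parametric: y ~ w*x + b, so the fitted model is always exactly two numbers.
    A = np.column_stack([x, np.ones_like(x)])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_line(params, x):
    return params[0] * x + params[1]

def fit_knn(x, y):
    # Non-parametric: "training" just stores the data, so model size grows with n.
    return np.column_stack([x, y])

def predict_knn(model, x_query, k=5):
    # Average the targets of the k stored points closest to each query point.
    dist = np.abs(model[None, :, 0] - np.asarray(x_query)[:, None])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return model[nearest, 1].mean(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y = np.sin(x) + 0.1 * rng.normal(size=500)

line, knn = fit_line(x, y), fit_knn(x, y)
mse_line = np.mean((predict_line(line, x) - y) ** 2)
mse_knn = np.mean((predict_knn(knn, x) - y) ** 2)
print(line.size, knn.size, mse_line, mse_knn)
```

The line trains in a single least-squares solve and its size never changes, but its fixed form cannot follow the curve (higher bias). k-NN fits the curve closely, but it has to keep and search all 500 points.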


2

MCQ · easy

A data scientist needs to perform feature scaling for a dataset containing numerical features with different units (e.g., age in years and income in dollars). Which scaling method is most appropriate when the algorithm assumes data is normally distributed?

A.Standardization (Z-score normalization)

B.Log transformation

C.Min-Max scaling

D.Robust scaling (using median and IQR)

Answer: A

Standardization centers data around zero with unit variance, suitable for normality assumptions.

Why this answer

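Option A is correct because standardization (Z-score normalization) transforms each feature to mean 0 and standard deviation 1, which suits algorithms that assume normally distributed inputs. Option C is wrong because Min-Max scaling rescales to a fixed range such as [0, 1] without centering the data, and a single extreme value can compress all the others into a narrow band. Option D is wrong because robust scaling, which uses the median and IQR, is meant for data with outliers.

Option B is wrong because a log transformation reshapes skewed data; it does not put features with different units on a common scale.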
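A minimal NumPy sketch contrasting the two linear rescalings. The ages and incomes are made-up values. Both transforms are linear, so neither one changes the shape of a feature's distribution. Standardization centers and rescales each feature, which is what algorithms that expect zero-mean, unit-variance inputs need, but it does not make skewed data normal on its own.

```python
import numpy as np

def standardize(X):
    # Z-score: subtract each column's mean, divide by its (population) standard deviation.
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X):
    # Rescale each column to the range [0, 1].
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

# Hypothetical rows: [age in years, income in dollars]
X = np.array([[25, 40_000], [35, 60_000], [45, 80_000], [55, 250_000]], dtype=float)
Z = standardize(X)
M = min_max(X)
print(Z.mean(axis=0), Z.std(axis=0))
print(M.min(axis=0), M.max(axis=0))
```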


3

Multi-Select · easy

Which TWO services can be used to serve machine learning models for real-time inference? (Select TWO.)

Select 2 answers

A.Amazon Rekognition

B.Amazon Athena

C.Amazon SageMaker

D.Amazon Comprehend

E.AWS Lambda

Answers: C, E

SageMaker endpoints are designed for real-time inference.

Why this answer

Amazon SageMaker provides real-time endpoints. AWS Lambda can be used with a custom container for inference. Amazon Rekognition is a specialized service for image analysis, not general model serving.

Amazon Comprehend is a managed NLP service, and Amazon Athena is a serverless SQL query service. Neither one is a general-purpose host for serving custom models.


4

MCQ · medium

A data scientist is working on a regression problem with a dataset that contains outliers. The data scientist is choosing between mean squared error (MSE) and mean absolute error (MAE) as the loss function. Which loss function is more robust to outliers?

A.Both are equally robust to outliers.

B.MSE, because it penalizes large errors more heavily.

C.MAE, because it treats all errors equally.

D.Neither is robust; use Huber loss instead.

Answer: C

MAE is linear in errors, reducing the impact of outliers.

Why this answer

MAE is more robust to outliers because it uses the absolute difference between predicted and actual values, which does not disproportionately penalize large errors. In contrast, MSE squares the errors, causing outliers to have a much larger influence on the loss and model updates. This makes MAE less sensitive to extreme values in regression tasks.

Exam trap

AWS often tests the misconception that a loss function that penalizes errors more heavily is better for robustness, when in fact the opposite is true for outliers.

How to eliminate wrong answers

Option A is wrong because MSE and MAE handle outliers very differently due to their mathematical formulations, so they are not equally robust. Option B is wrong because MSE's heavier penalization of large errors makes it less robust to outliers, not more. Option D is wrong because Huber loss, which is quadratic for small errors and linear for large ones, is a compromise between MSE and MAE rather than a replacement for both. The question asks which of the two given loss functions is more robust, and that is MAE.
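A minimal NumPy sketch showing how the two losses react to a single outlier. The five predictions and the injected outlier are made-up values.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

y_pred = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_true = np.array([11.0, 12.0, 10.0, 13.0, 12.0])

y_outlier = y_true.copy()
y_outlier[-1] = 112.0  # one outlier: the last error becomes 100 instead of 0

print(mse(y_true, y_pred), mse(y_outlier, y_pred))
print(mae(y_true, y_pred), mae(y_outlier, y_pred))
```

MSE jumps from 0.4 to 2000.4 because the squared error of 100 adds 10,000 to the sum. MAE rises only from 0.4 to 20.4.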


5

MCQ · easy

A data scientist needs to evaluate a binary classification model's performance. Which metric is most appropriate when the cost of false positives is very high?

A.Accuracy

B.F1 score

C.Recall

D.Precision

Answer: D

Precision directly penalizes false positives.

Why this answer

Precision is the most appropriate metric when the cost of false positives is very high because it measures the proportion of positive identifications that were actually correct. In binary classification, precision = TP / (TP + FP), so a high precision means very few false positives occur, directly minimizing the costly error type.

Exam trap

The trap here is that candidates often confuse precision with recall or F1 score, mistakenly thinking that minimizing false positives is best achieved by maximizing recall or a balanced metric, rather than directly optimizing precision.

How to eliminate wrong answers

Option A is wrong because accuracy considers both false positives and false negatives equally, and can be misleading when classes are imbalanced; it does not specifically penalize false positives. Option B is wrong because the F1 score is the harmonic mean of precision and recall, balancing both false positives and false negatives, so it does not prioritize minimizing false positives over false negatives. Option C is wrong because recall measures the proportion of actual positives correctly identified (TP / (TP + FN)), which focuses on avoiding false negatives, not false positives.
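The formulas above translate directly into code. The minimal Python sketch below compares two hypothetical models that have the same recall but very different false-positive counts. Returning 0 when a denominator is 0 is a common convention (scikit-learn's `zero_division` setting controls this behavior), not a mathematical necessity.

```python
def classification_metrics(tp, fp, fn):
    # Precision = TP / (TP + FP): falls as false positives grow.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall = TP / (TP + FN): falls as false negatives grow.
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Same TP and FN, so the same recall; only the false-positive count differs.
print(classification_metrics(tp=80, fp=5, fn=20))
print(classification_metrics(tp=80, fp=80, fn=20))
```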


6

MCQ · hard

A machine learning engineer is training a neural network on Amazon SageMaker using a custom Docker container. The training job fails with the error 'CUDA out of memory.' The training instance is an ml.p3.2xlarge with 16 GB of GPU memory. The model and data fit into GPU memory at batch size 32, but the engineer wants a larger batch to maximize GPU utilization. Which approach should the engineer use to fix the out-of-memory error while maintaining efficient training?

A.Enable mixed precision training

B.Reduce batch size to 1

C.Use a CPU-only instance

D.Implement gradient accumulation with a larger effective batch size

Answer: D

Accumulates gradients over smaller batches to simulate larger batches.

Why this answer

Gradient accumulation simulates a larger effective batch by summing gradients over several forward/backward passes on smaller micro-batches before each optimizer step. Peak memory is determined by the micro-batch, which avoids the CUDA out-of-memory error, while each optimizer step uses gradients equivalent to those of the larger batch. The engineer can therefore keep each micro-batch as large as GPU memory allows.

Exam trap

The trap here is that candidates may see mixed precision training (Option A) as the direct fix for a CUDA out-of-memory error. Mixed precision does shrink the memory used by activations, but the saving is bounded, so it cannot on its own support an arbitrarily large batch. Gradient accumulation decouples the effective batch size from per-step memory, which is what this scenario calls for.

How to eliminate wrong answers

Option A is wrong because mixed precision training runs most operations in float16, which lowers memory use and can help, but it does not decouple memory from batch size. It is therefore not the primary fix when the goal is efficient training with a larger effective batch. Option B is wrong because reducing batch size to 1 would drastically reduce GPU utilization and slow convergence, contradicting the goal of maximizing GPU utilization. Option C is wrong because switching to a CPU-only instance would eliminate the GPU entirely, making training extremely slow and defeating the purpose of a GPU-accelerated instance like ml.p3.2xlarge.
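Gradient accumulation works because, for a loss averaged over examples, the full-batch gradient equals the size-weighted average of the micro-batch gradients. The minimal NumPy sketch below checks this for a linear model with MSE loss on random toy data; it is not the engineer's network. In a framework such as PyTorch, the same idea looks like this: call backward on each micro-batch loss divided by the number of accumulation steps, and step the optimizer only every N micro-batches. Layers such as batch normalization still see only the micro-batch, so the equivalence is not always exact.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of mean squared error for a linear model y ~ X @ w.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def accumulated_grad(w, X, y, micro_batch):
    # Accumulate size-weighted micro-batch gradients, then normalize once,
    # as if a single optimizer step were taken after all micro-batches.
    total = np.zeros_like(w)
    for start in range(0, len(y), micro_batch):
        Xb, yb = X[start:start + micro_batch], y[start:start + micro_batch]
        total += mse_grad(w, Xb, yb) * len(yb)
    return total / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 4))
y = rng.normal(size=128)
w = rng.normal(size=4)

full = mse_grad(w, X, y)               # effective batch of 128 in one pass
acc = accumulated_grad(w, X, y, 32)    # four micro-batches of 32
print(np.allclose(full, acc))
```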


7

MCQ · easy

A company uses Amazon SageMaker to train a linear regression model. The training data includes a feature 'age' with values ranging from 0 to 100. The model's loss is not converging. What is the MOST likely cause?

A.The features are not normalized.

B.There are outliers in the target variable.

C.The instance type is too small.

D.The learning rate is too high.

Answer: A

Unscaled features cause gradient descent to oscillate.

Why this answer

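Feature scaling is crucial for linear models trained with gradient descent. If 'age' (0 to 100) sits alongside features on much smaller scales, the loss surface becomes elongated and gradient descent oscillates instead of converging. Option C (instance type too small) is unlikely, because instance size affects training speed, not convergence. Option B (outliers in the target) could cause issues, but scaling is more fundamental.

Option D (learning rate too high) is possible, but scaling the features is the common first step.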
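You can check numerically why unscaled features stall gradient descent. For MSE loss, the curvature is governed by X^T X, and a large condition number forces a small learning rate and many steps. The minimal NumPy sketch below assumes 'age' plus a hypothetical second feature in [0, 1], because the question does not list the other features.

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(0, 100, 500)      # 'age' as described in the question
ratio = rng.uniform(0, 1, 500)      # hypothetical second feature on a [0, 1] scale
X_raw = np.column_stack([age, ratio])
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

def hessian_condition(X):
    # The MSE Hessian is proportional to X^T X; its condition number bounds
    # how fast plain gradient descent can converge with a stable step size.
    return np.linalg.cond(X.T @ X / len(X))

print(hessian_condition(X_raw), hessian_condition(X_std))
```

The raw features give a condition number in the thousands or more, while the standardized ones stay close to 1, so one learning rate works well in every direction.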


8

MCQ · easy

A company wants to use Amazon SageMaker to train a deep learning model using a custom TensorFlow script. The data is stored in an S3 bucket. Which SageMaker API operation should be used to launch the training job?

A.CreateHyperParameterTuningJob

B.CreateEndpoint

C.CreateTransformJob

D.CreateTrainingJob

Answer: D

CreateTrainingJob starts a training job.

Why this answer

The correct API operation to launch a training job in Amazon SageMaker is CreateTrainingJob. This operation specifies the training algorithm (or custom script), resource configuration (instance type and count), input data configuration (pointing to S3), and output location for the model artifacts. It directly initiates the training process on SageMaker-managed infrastructure.

Exam trap

AWS often tests the distinction between training, tuning, inference, and batch transform operations. Candidates often choose CreateHyperParameterTuningJob when the question asks only for a single training job, or choose CreateEndpoint because they confuse training with deployment.

How to eliminate wrong answers

Option A is wrong because CreateHyperParameterTuningJob is used to launch a hyperparameter tuning job, which runs multiple training jobs with different hyperparameter combinations to find the best model; it is not the direct API for a single training job. Option B is wrong because CreateEndpoint is used to deploy a trained model as a real-time inference endpoint, not to start training. Option C is wrong because CreateTransformJob is used for batch inference on an existing trained model, not for training a new model.
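For reference, the CreateTrainingJob request can be sketched in Python as the keyword arguments to boto3's `create_training_job`. The job name, image URI, role ARN, S3 paths, and instance settings below are hypothetical placeholders. In practice, a custom TensorFlow script is usually launched through the SageMaker Python SDK's TensorFlow estimator, which issues this same API call.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri):
    # Keyword arguments for the SageMaker CreateTrainingJob API.
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,   # training data in S3
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # model artifacts land here
        "ResourceConfig": {"InstanceType": "ml.p3.2xlarge", "InstanceCount": 1, "VolumeSizeInGB": 50},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

def launch(request):
    # Requires boto3 and AWS credentials; not called in this sketch.
    import boto3
    return boto3.client("sagemaker").create_training_job(**request)

request = build_training_job_request(
    job_name="example-tf-training-job",                                   # hypothetical
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>",   # placeholder
    role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",              # placeholder
    train_s3_uri="s3://example-bucket/train/",                            # hypothetical
    output_s3_uri="s3://example-bucket/output/",                          # hypothetical
)
```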


9

MCQ · easy

A data scientist is using Amazon SageMaker to train a model. The training job fails with an error 'Insufficient instance capacity'. Which action should the scientist take to resolve this?

A.Check IAM permissions

B.Retry with a different instance type or Availability Zone

C.Increase the number of instances

D.Reduce the training data size

Answer: B

Different instances/AZs may have available capacity.

Why this answer

Option B is correct because retrying with a different instance type or in a different Availability Zone often resolves capacity issues. Option A is wrong because the error is about capacity, not permissions. Option D is wrong because reducing the data size does not affect instance capacity.

Option C is wrong because increasing the instance count may exacerbate the problem.


10

MCQ · medium

A data scientist ran an XGBoost training job on Amazon SageMaker using a CSV dataset. The training job failed with the error shown. What is the most likely cause of this failure?

A.The dataset includes a header row

B.One of the rows in the CSV has an extra column

C.The delimiter used in the CSV is not a comma

D.The dataset contains missing values

Answer: B

The error message indicates that line 1 has 11 fields instead of the expected 10, meaning an extra column in that row.

Why this answer

XGBoost on SageMaker expects the CSV input to have a consistent number of columns across all rows. If one row contains an extra column, the parser fails because it cannot map the additional field to the feature layout implied by the other rows, and the job stops with a column-count or parsing error.

Exam trap

The trap here is that candidates often confuse a column count mismatch with missing values or delimiter issues, but the error message specifically points to inconsistent row lengths, not data quality or formatting problems.

How to eliminate wrong answers

Option A is wrong because SageMaker's built-in XGBoost expects CSV input without a header row, and a header would show up as a failure to parse non-numeric values, not as a column-count mismatch. Option C is wrong because SageMaker's XGBoost implementation expects a comma delimiter, and a different delimiter would affect every row the same way instead of giving one row an extra column. Option D is wrong because XGBoost natively handles missing values (e.g., by treating them as NaN or through the `missing` parameter), and missing values do not cause a column-count error.
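A quick pre-flight check can catch ragged rows before you launch a job. The sketch below uses Python's standard `csv` module to report rows whose field count differs from the expected width; the function name and sample data are hypothetical. Set `expected_fields` to the real column count (label plus features), because the first row may itself be the bad one, as in this question.

```python
import csv
import io

def find_ragged_rows(text, expected_fields=None):
    """Return (line_number, field_count) for rows whose width differs from the expected one."""
    rows = list(csv.reader(io.StringIO(text)))
    if expected_fields is None:
        expected_fields = len(rows[0])  # fallback: assume the first row has the intended width
    return [(i, len(row)) for i, row in enumerate(rows, start=1) if len(row) != expected_fields]

sample = "1,0.5,3.2\n0,0.1,2.2,9.9\n1,0.7,1.1\n"  # row 2 has an extra column
print(find_ragged_rows(sample))
```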


11

MCQ · medium

A company uses Amazon SageMaker to deploy a model for real-time inference. The endpoint uses an ml.m5.large instance with automatic scaling based on CPU utilization. The team notices that during traffic spikes, the endpoint returns 5xx errors. What should the team do to improve the endpoint's availability?

A.Increase the instance type to ml.c5.2xlarge.

B.Reduce the scaling cooldown period.

C.Place an Application Load Balancer in front of the endpoint.

D.Use Amazon API Gateway to throttle requests.

Answer: A

Larger instance type provides more capacity.

Why this answer

The correct answer is A because upgrading the instance type from ml.m5.large to ml.c5.2xlarge provides more CPU and memory resources, which directly addresses the root cause of 5xx errors during traffic spikes — insufficient compute capacity to handle the request load. Automatic scaling based on CPU utilization may not react quickly enough to sudden spikes, leading to request queuing and timeouts that manifest as 5xx errors. A larger instance type increases the baseline throughput, reducing the likelihood of resource exhaustion before scaling can take effect.

Exam trap

The trap here is that candidates often confuse auto-scaling configuration (cooldown periods, thresholds) with raw capacity planning, assuming that tuning scaling parameters alone can handle sudden spikes, when in fact the instance must have enough headroom to survive the scaling latency.

How to eliminate wrong answers

Option B is wrong because reducing the scaling cooldown period would cause the auto-scaling policy to react more aggressively to short-lived CPU spikes, potentially leading to thrashing and unnecessary instance provisioning, but it does not address the immediate capacity shortfall during the spike itself — the endpoint still needs enough raw compute to handle the burst. Option C is wrong because placing an Application Load Balancer (ALB) in front of the SageMaker endpoint is not supported — SageMaker endpoints use a built-in load balancer (AWS-managed) and cannot have an external ALB inserted; ALBs distribute traffic to targets like EC2 instances, not SageMaker hosted endpoints. Option D is wrong because using Amazon API Gateway to throttle requests would intentionally drop or delay requests during spikes, which would not improve availability but instead cause client-side errors (429 Too Many Requests) and degrade user experience, whereas the goal is to serve all requests successfully.


12

MCQ · hard

A company uses a linear regression model to predict house prices. The model's R-squared is 0.95 on the training set but 0.60 on the test set. Which of the following is the most likely cause?

A.Overfitting

B.Underfitting

C.Multicollinearity among features

D.Data leakage

Answer: A

High training R-squared with much lower test R-squared is a classic sign of overfitting.

Why this answer

Option A is correct because such a large gap between training and test R-squared indicates overfitting. Option B is wrong because an underfitting model would score low on both sets. Option D is wrong because data leakage typically inflates test R-squared rather than lowering it.

Option C is wrong because multicollinearity affects both sets similarly.
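The train/test gap is easy to reproduce on toy data. The minimal NumPy sketch below assumes a noisy linear relationship and uses a degree-9 polynomial fit to 10 training points as a stand-in for an over-flexible model, since the question's model and data are not available.

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
x_train = rng.uniform(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.2, 10)
x_test = rng.uniform(0, 1, 50)
y_test = 2 * x_test + rng.normal(0, 0.2, 50)

results = {}
for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (r_squared(y_train, np.polyval(coefs, x_train)),   # training R^2
                       r_squared(y_test, np.polyval(coefs, x_test)))     # test R^2
print(results)
```

The degree-9 fit passes through almost every training point (training R-squared near 1) but generalizes much worse, which is the same pattern as 0.95 versus 0.60 in the question.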


13

MCQ · hard

A team is training a deep learning model on Amazon SageMaker using a large dataset stored in S3. The training job is taking a long time, and the team suspects that data loading is the bottleneck. The dataset consists of many small files (average size 10KB). Which change would most effectively reduce the I/O bottleneck?

A.Combine the small files into larger files (e.g., TFRecord format)

B.Use SageMaker Pipe mode instead of File mode

C.Increase the number of training instances

D.Use a P3 instance type for better GPU performance

Answer: A

Larger files reduce the number of S3 API calls and improve throughput.

Why this answer

Combining small files into larger ones (e.g., TFRecord or Parquet) reduces the number of S3 GET requests and the per-object overhead, which raises throughput. Pipe mode streams data instead of downloading it before training starts, but each 10 KB object still has to be fetched individually, so the per-file overhead remains. Increasing the instance count or using P3 instances adds compute, not I/O throughput.

Moving the data to Amazon EFS, which is not one of the options, would not remove the per-file overhead either.
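A minimal Python sketch of the consolidation idea. It uses a hypothetical length-prefixed shard format built with the standard `struct` module. Real TFRecord files also store a CRC checksum for each record and are written with TensorFlow's own writer.

```python
import struct

def pack_shards(records, shard_bytes):
    """Group many small byte records into larger length-prefixed shards."""
    shards, current = [], bytearray()
    for rec in records:
        current += struct.pack("<I", len(rec)) + rec  # 4-byte little-endian length, then payload
        if len(current) >= shard_bytes:
            shards.append(bytes(current))
            current = bytearray()
    if current:
        shards.append(bytes(current))
    return shards

def unpack_shard(shard):
    """Recover the original records from one shard."""
    out, pos = [], 0
    while pos < len(shard):
        (n,) = struct.unpack_from("<I", shard, pos)
        out.append(shard[pos + 4:pos + 4 + n])
        pos += 4 + n
    return out

files = [bytes([i % 256]) * 10_240 for i in range(1000)]  # 1,000 fake 10 KB "files"
shards = pack_shards(files, shard_bytes=1_000_000)        # roughly 1 MB per shard
print(len(files), len(shards))
```

With these numbers the 1,000 small objects become 11 shards, so a reader issues 11 GET requests instead of 1,000.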


14

MCQ · medium

A financial services company is building a model to detect fraudulent credit card transactions. The dataset contains 1 million transactions, with only 0.1% labeled as fraud. The data scientist trains a logistic regression model on the raw dataset and obtains the following results on a held-out test set: accuracy = 99.8%, precision = 50%, recall = 60%, F1 = 0.545. The business requirement is to maximize recall while keeping precision above 80%. Which course of action should the data scientist take to improve the model?

A.Use random undersampling of the majority class to balance the dataset

B.Collect more historical transaction data and retrain the model

C.Train the model with class weights inversely proportional to class frequencies

D.Apply L2 regularization with a higher penalty to reduce overfitting

Answer: C

Class weights make misclassified fraud cases costlier during training, which typically raises recall. The decision threshold can then be tuned to keep precision above the required level.

Why this answer

The correct answer is C because assigning class weights inversely proportional to class frequencies penalizes misclassifications of the minority class (fraud) more heavily during training. This directly addresses the severe class imbalance (0.1% fraud) by forcing the logistic regression model to learn decision boundaries that improve recall, while the weight ratio can be tuned to maintain precision above 80%. Unlike naive resampling, this approach preserves the original data distribution and avoids information loss.

Exam trap

AWS often tests the misconception that resampling (undersampling or oversampling) is always the best first step for imbalance, when in fact cost-sensitive learning via class weights is often more effective and stable for linear models like logistic regression.

How to eliminate wrong answers

Option A is wrong because random undersampling of the majority class discards a large number of legitimate transactions, which can lead to loss of valuable patterns and increased variance, often degrading precision rather than maintaining it above 80%. Option B is wrong because simply collecting more historical data does not change the underlying class imbalance ratio; without addressing the imbalance, the model will still be biased toward the majority class and fail to improve recall. Option D is wrong because L2 regularization with a higher penalty reduces overfitting by shrinking coefficients, but it does not target class imbalance; it may actually worsen recall by further suppressing the already weak signal from the minority class.
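Class weights are often set with the "balanced" heuristic, weight_c = n_samples / (n_classes × count_c), which is the formula scikit-learn uses for `class_weight="balanced"`. The minimal NumPy sketch below assumes labels coded 1 = fraud and 0 = legitimate. It computes those weights for the question's 0.1% fraud rate and shows where they enter a weighted log loss; the function names are hypothetical.

```python
import numpy as np

def balanced_class_weights(y):
    # weight_c = n_samples / (n_classes * count_c): rarer classes get larger weights.
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)}

def weighted_log_loss(y, p, weights, eps=1e-12):
    # Per-example binary cross-entropy, scaled by the weight of the true class.
    p = np.clip(p, eps, 1 - eps)
    w = np.where(y == 1, weights[1], weights[0])
    return float(np.mean(-w * (y * np.log(p) + (1 - y) * np.log(1 - p))))

y = np.zeros(1_000_000, dtype=int)
y[:1_000] = 1  # 0.1% fraud, as in the question
weights = balanced_class_weights(y)
print(weights)
```

Fraud examples get a weight of 500 and legitimate ones about 0.5, so a missed fraud case costs roughly 999 times as much as a misclassified legitimate transaction. Because this pushes the model toward higher recall, you still need to tune the decision threshold on validation data to keep precision above 80%.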


15

MCQ · medium

A company uses SageMaker built-in BlazingText algorithm for text classification. The model performance is poor on the validation set. The data consists of short documents (average 50 words). Which hyperparameter tuning strategy is most likely to improve performance?

A.Increase bucket size from 0 to 1000000

B.Increase vector dimension from 100 to 300

C.Increase minCount from 1 to 5

D.Decrease window size from 5 to 2

Answer: D

Smaller window size captures local context better for short documents.

Why this answer

BlazingText's default window size of 5 may be too large for short documents (average 50 words), causing the model to learn overly broad context that dilutes local semantic patterns. Decreasing the window size to 2 forces the model to focus on tighter word co-occurrences, which is more effective for short text classification where local n-gram signals are critical.

Exam trap

The trap here is that candidates often assume increasing model capacity (e.g., vector dimension) or filtering rare words (minCount) always helps, but for short documents, the hyperparameter controlling context granularity (window size) is the most impactful lever.

How to eliminate wrong answers

Option A is wrong because bucket size controls subword n-gram hashing for out-of-vocabulary words, not classification performance on short documents; increasing it from 0 to 1,000,000 would add computational overhead without addressing the core issue of context window. Option B is wrong because increasing vector dimension from 100 to 300 risks overfitting on small short-text datasets and does not fix the problem of overly broad context capture. Option C is wrong because increasing minCount from 1 to 5 discards rare but potentially discriminative words in short documents, further reducing signal in an already sparse dataset.


16

MCQ · easy

A data scientist is training a binary classification model on a dataset with 10,000 features. The model overfits severely. Which technique is MOST appropriate to reduce overfitting?

A.Apply L1 regularization (Lasso)

B.Use early stopping during training

C.Use PCA to reduce dimensionality

D.Increase the max depth of the model

Answer: A

L1 regularization penalizes the absolute size of coefficients, driving some to zero and reducing overfitting.

Why this answer

L1 regularization (Lasso) can shrink some feature coefficients to exactly zero, which performs feature selection across the 10,000 features and reduces overfitting. Early stopping applies only to iteratively trained models and does not remove uninformative features. PCA reduces dimensionality but may lose interpretability.

Increasing max depth would worsen overfitting.
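A minimal NumPy sketch of how L1 regularization drives coefficients to zero. It assumes synthetic data with 50 features, of which only 5 are informative, as a scaled-down stand-in for the 10,000-feature case. The sketch solves the Lasso objective with ISTA (proximal gradient descent), one of several standard solvers; the data and the `lam` value are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||w||_1: shrinks toward zero, small entries become exactly 0.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Minimizes (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 by proximal gradient descent.
    n, p = X.shape
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()  # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_w = np.zeros(50)
true_w[:5] = [3.0, -2.0, 1.5, 2.5, -1.0]  # only the first 5 features matter
y = X @ true_w + 0.1 * rng.normal(size=200)

w = lasso_ista(X, y, lam=0.1)
print(np.count_nonzero(w))
```

With these settings, the 45 uninformative coefficients should end at exactly zero, which is the feature-selection effect described above. An L2 penalty would shrink those coefficients but generally leave them nonzero.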


17

MCQ · easy

A data scientist is training a binary classification model using Amazon SageMaker's XGBoost. The dataset is highly imbalanced (99% negative class, 1% positive class). The data scientist wants to maximize the F1-score. Which parameter adjustment is most appropriate?

A.Set max_depth to 10

B.Set eta to 0.01

C.Set subsample to 0.5

D.Set scale_pos_weight to 99

Answer: D

scale_pos_weight adjusts the balance of positive and negative weights; a value of 99 (ratio of negatives to positives) helps the model focus on the minority class.

Why this answer

Setting scale_pos_weight to 99 is the most appropriate adjustment because it directly addresses class imbalance by assigning a higher weight to the minority (positive) class during training. In XGBoost, scale_pos_weight controls the balance of positive and negative weights, typically set as sum(negative instances) / sum(positive instances), which here is 99/1 = 99. This forces the model to penalize misclassifications of the positive class more heavily, thereby improving recall and F1-score.

Exam trap

The trap here is that candidates may confuse hyperparameters that control model complexity (max_depth, subsample) or learning rate (eta) with those that directly handle class imbalance, missing that scale_pos_weight is the specific XGBoost parameter designed for this purpose.

How to eliminate wrong answers

Option A is wrong because increasing max_depth to 10 can lead to overfitting, especially on imbalanced data, and does not directly address class imbalance or optimize F1-score. Option B is wrong because setting eta (learning rate) to 0.01 reduces the step size for updates, which can slow convergence but does not specifically handle class imbalance or improve F1-score. Option C is wrong because setting subsample to 0.5 randomly samples 50% of the training data per tree, which may reduce overfitting but does not target the imbalance between positive and negative classes.
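You can compute the ratio directly from the training labels. The minimal Python sketch below assumes binary labels with 1 as the positive class. The `params` dict uses open-source XGBoost parameter names, which SageMaker's built-in XGBoost also accepts as hyperparameters.

```python
import numpy as np

def scale_pos_weight(y):
    # Suggested starting value: sum(negative instances) / sum(positive instances).
    y = np.asarray(y)
    return float(np.sum(y == 0) / np.sum(y == 1))

y = np.array([0] * 990 + [1] * 10)  # 99% negative, 1% positive
params = {
    "objective": "binary:logistic",
    "scale_pos_weight": scale_pos_weight(y),
}
print(params["scale_pos_weight"])  # 99.0
```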


18

MCQ · easy

Refer to the exhibit. A data scientist is evaluating a binary classification model for spam detection. The exhibit shows a single prediction instance. What is the model's prediction for this instance?

A.Ham

B.0.95

C.The model is unsure because probability is not 1.0

D.Spam

Answer: D

The 'predicted_label' is 'spam'.

Why this answer

The model's prediction is 'Spam' because the prediction instance shows a probability of 0.95 for the 'Spam' class, which exceeds the typical decision threshold of 0.5 used in binary classification. Since the probability for 'Spam' is higher than for 'Ham' (0.05), the model assigns the instance to the class with the highest probability, which is Spam.

Exam trap

AWS often tests the distinction between a model's probability output and its final class prediction, leading candidates to mistakenly select the probability value (0.95) as the prediction instead of the class label (Spam).

How to eliminate wrong answers

Option A is wrong because 'Ham' is the class with the lower probability (0.05), and the model predicts the class with the highest probability, not the lowest. Option B is wrong because 0.95 is the probability score for the Spam class, not the final class prediction; the model outputs a class label (Spam or Ham), not a probability value as the prediction. Option C is wrong because the model is not 'unsure' — in binary classification, a probability of 0.95 indicates high confidence for the Spam class, and the model always makes a deterministic prediction based on the decision threshold (typically 0.5), regardless of whether the probability is 1.0.
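The difference between a score and a label takes only a few lines to show. The minimal Python sketch below assumes a single spam probability and the typical 0.5 threshold mentioned above; the function name is hypothetical.

```python
def predict_label(p_spam, threshold=0.5):
    # The label is derived from the probability; the probability itself is not the prediction.
    return "spam" if p_spam >= threshold else "ham"

print(predict_label(0.95))  # spam
```

Raising or lowering `threshold` changes which label a given probability maps to, which is how you trade precision against recall without retraining.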


19

MCQ · medium

A machine learning engineer is deploying a sentiment analysis model using Amazon SageMaker. The model is a BERT-based transformer that takes up to 512 tokens. The engineer notices that inference latency is high (over 500 ms per request) on a single ml.c5.xlarge instance. The application requires latency under 100 ms. The model has already been optimized using half-precision (FP16). Which action should the engineer take to reduce latency?

A.Use a GPU instance such as ml.g4dn.xlarge

B.Reduce the maximum sequence length to 128

C.Increase the batch size for inference requests

D.Use SageMaker Neo to compile the model for the target instance

Answer: A

GPUs accelerate transformer inference significantly.

Why this answer

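Option A (use a GPU instance) is correct because GPUs greatly accelerate the matrix operations that dominate transformer inference. Option C (increase batch size) can raise throughput but does not lower latency for a single request. Option B (reduce max sequence length) may hurt accuracy by truncating inputs.

Option D (use SageMaker Neo) compiles the model for the target instance, but on a CPU instance it may not reach sub-100 ms latency.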


20

MCQ · medium

A company is building a text classification model to categorize customer support tickets. The dataset is highly imbalanced with 95% of tickets belonging to 'General Inquiry' and 5% to 'Complaint'. The data scientist is using a random forest classifier. Which metric is most appropriate for evaluating model performance on the minority class?

A.Accuracy

B.F1-score for 'Complaint'

C.ROC AUC

D.Precision for 'Complaint'

Answer: B

F1-score balances precision and recall, making it suitable for imbalanced classification.

Why this answer

In a highly imbalanced dataset (95% General Inquiry, 5% Complaint), accuracy is misleading because a model that predicts 'General Inquiry' for every ticket would achieve 95% accuracy but completely fail on the minority class. The F1-score for 'Complaint' is the harmonic mean of precision and recall, providing a balanced evaluation of the model's ability to correctly identify complaints without being skewed by the majority class. For a random forest classifier, this metric directly addresses the minority class performance, which is the primary concern.

Exam trap

The trap here is that candidates often choose accuracy due to its simplicity, failing to recognize that on imbalanced datasets it is a deceptive metric, or they select ROC AUC because it is commonly used for binary classification, but it does not isolate minority class performance as effectively as the F1-score.

How to eliminate wrong answers

Option A is wrong because accuracy is inappropriate for imbalanced datasets; a naive model predicting only the majority class would achieve 95% accuracy, masking poor minority class performance. Option C is wrong because ROC AUC measures the trade-off between true positive rate and false positive rate across all thresholds, which can be overly optimistic on imbalanced data and does not directly reflect precision or recall for the minority class. Option D is wrong because precision for 'Complaint' alone ignores recall; a model could achieve high precision by making very few positive predictions (e.g., only when extremely confident), but miss most actual complaints, which is unacceptable for detecting customer complaints.
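A minimal NumPy sketch of the accuracy paradox. It assumes 100 hypothetical tickets split 95/5 and a naive model that always predicts the majority class.

```python
import numpy as np

def per_class_f1(y_true, y_pred, positive):
    # F1 for one class, treating that class as "positive" and everything else as "negative".
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

y_true = np.array(["General Inquiry"] * 95 + ["Complaint"] * 5)
y_pred = np.array(["General Inquiry"] * 100)  # always predicts the majority class

accuracy = float(np.mean(y_true == y_pred))
print(accuracy, per_class_f1(y_true, y_pred, "Complaint"))
```

Accuracy comes out at 0.95, while the F1-score for 'Complaint' is 0.0 because the model never produces a true positive for that class.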


21

MCQ · medium

A data scientist is training a binary classifier using logistic regression on a dataset that is highly imbalanced (95% negative class, 5% positive class). The model achieves 95% accuracy but only predicts the negative class. Which metric should the scientist use to evaluate the model's performance on the positive class?

A.Recall

B.Precision

C.F1 Score

D.Accuracy

Answer: B

Precision measures the proportion of positive predictions that are correct. Because this model makes no positive predictions, precision is 0/0, which is conventionally reported as 0 and so correctly flags poor performance.

Why this answer

Precision measures the proportion of positive identifications that were actually correct, which makes it informative on imbalanced datasets. Accuracy is misleading because the model predicts only the majority class. Recall would be 1.0 if the model labeled every instance positive; here it labels none positive, so recall is also 0.

F1 score, the harmonic mean of precision and recall, is likewise 0 here. The answer key favors precision as the most direct measure of whether the model's positive predictions can be trusted.


22

MCQ · medium

A company has a time series forecasting problem with daily sales data. The data shows both trend and seasonality. Which Amazon SageMaker built-in algorithm is most suitable?

A.K-Means

B.Linear Learner

C.DeepAR

D.XGBoost

Answer: C

DeepAR is a built-in algorithm for time series forecasting that handles trend and seasonality.

Why this answer

DeepAR is a supervised learning algorithm for time series forecasting that learns trend and seasonal patterns with a recurrent neural network (RNN), using lagged values and calendar features derived from the series frequency. It is designed to train on many related time series, can incorporate additional features such as holidays or promotions, and produces probabilistic forecasts. This makes it the most suitable choice for daily sales data with trend and seasonal patterns.

Exam trap

The trap here is that candidates often choose XGBoost (Option D) because it is a powerful general-purpose algorithm, but they overlook that DeepAR is specifically designed for time series forecasting with trend and seasonality, whereas XGBoost requires manual feature engineering (e.g., lag variables, rolling statistics) to capture temporal patterns and does not natively produce probabilistic forecasts.

How to eliminate wrong answers

Option A is wrong because K-Means is an unsupervised clustering algorithm used for grouping data points based on similarity, not for forecasting time series with trend and seasonality. Option B is wrong because Linear Learner is a linear regression or classification algorithm that assumes independence of observations and cannot inherently capture temporal dependencies like seasonality or trend without extensive feature engineering. Option D is wrong because XGBoost is a gradient boosting algorithm primarily for tabular data and classification/regression tasks; while it can be used for time series with lag features, it is not purpose-built for sequential forecasting and lacks native support for probabilistic outputs and seasonal patterns.

23

MCQhard

A data scientist runs a SageMaker training job that fails with the above error. The S3 bucket and object exist, and the IAM role has s3:GetObject permission. What is the MOST likely cause?

A.The S3 object was uploaded with incorrect checksum or is corrupted

B.The training instance does not have internet access

C.The S3 bucket has versioning enabled and the object version is not specified

D.The IAM role lacks kms:Decrypt permission for an encrypted S3 object

AnswerA

A corrupted file can cause size mismatch and zero-byte download.

Why this answer

The error indicates that SageMaker cannot read the S3 object, even though the bucket and object exist and the IAM role has s3:GetObject permission. With existence and permissions ruled out, the most likely cause is that the object was uploaded incompletely, with an incorrect checksum, or is otherwise corrupted, so the data SageMaker retrieves does not match the expected size or content when the training job initializes. Verifying the object's integrity and re-uploading it typically resolves the issue.
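
One way to check for corruption is to compare a checksum of the original file with what S3 stored. A minimal sketch, assuming the object was uploaded in a single part and is not encrypted with SSE-KMS or SSE-C (only then is the ETag the hex MD5 of the content); the ETag string would come from, for example, the `ETag` field of boto3's `head_object` response. The function name is hypothetical.

```python
import hashlib
from typing import Optional

def md5_matches_etag(data: bytes, etag: str) -> Optional[bool]:
    """Compare the MD5 of a local copy with an S3 ETag.

    Returns None when the ETag cannot be compared this way: multipart uploads
    produce ETags with a '-<part count>' suffix that are not the MD5 of the
    whole object.
    """
    etag = etag.strip('"')
    if "-" in etag:
        return None
    return hashlib.md5(data).hexdigest() == etag

print(md5_matches_etag(b"hello", '"5d41402abc4b2a76b9719d911017c592"'))
```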

Exam trap

The trap here is that candidates often assume permission or network issues are the root cause, but the error specifically points to a data integrity problem, which is a subtle but critical detail in SageMaker's S3 interaction.

How to eliminate wrong answers

Option B is wrong because reading training data from S3 does not require general internet access; even when a training job runs in a VPC without internet access, an S3 VPC endpoint is enough for SageMaker to reach the data. Option C is wrong because, on a versioned bucket, a request that does not specify a version ID returns the current version of the object, so SageMaker does not need a version to be specified. Option D is wrong because the error message does not mention encryption or KMS; if the object were encrypted with SSE-KMS and the role lacked kms:Decrypt, the error would indicate a KMS access failure rather than a generic read failure.

24

MCQhard

A data scientist is building a recommendation system for an e-commerce platform using Amazon SageMaker. The system needs to provide personalized product recommendations based on user purchase history and product metadata. The dataset contains 10 million users and 1 million products. Which algorithm should the data scientist use as the core of the recommendation engine?

A.Linear Learner

B.XGBoost

C.K-Means

D.Factorization Machines

AnswerD

Factorization Machines handle sparse data well and are designed for recommendation tasks.

Why this answer

Factorization Machines (FM) are specifically designed for high-dimensional sparse data like user-item interactions, making them ideal for recommendation systems with 10 million users and 1 million products. FM can capture pairwise feature interactions (e.g., user-product affinities) efficiently using factorized parameters, which scales well to large datasets and supports personalized recommendations from purchase history and metadata.
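
The pairwise term is what lets FM model user-product affinities. A minimal pure-Python sketch of the FM scoring function, ŷ(x) = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, using the standard reformulation that computes all pairwise interactions in O(k·n) time for n nonzero features and k latent factors. The weights below are illustrative rather than trained, and this is not SageMaker's implementation.

```python
def fm_predict(x, w0, w, V):
    """Factorization Machine score for a sparse input.

    x:  dict mapping feature index -> value (nonzero entries only)
    w0: global bias
    w:  linear weights, one per feature
    V:  latent vectors (each of length k), one per feature
    """
    k = len(V[0])
    linear = w0 + sum(w[i] * xi for i, xi in x.items())
    pairwise = 0.0
    for f in range(k):
        # sum_{i<j} v_if v_jf x_i x_j == 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * xi for i, xi in x.items())
        s_sq = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += s * s - s_sq
    return linear + 0.5 * pairwise

# One-hot "user 0" and "product 3" in a 5-feature space (hypothetical values).
x = {0: 1.0, 3: 1.0}
w0 = 0.1
w = [0.2, 0.0, 0.0, 0.5, 0.0]
V = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0], [3.0, -1.0], [0.0, 0.0]]
score = fm_predict(x, w0, w, V)
print(score)  # bias + two linear terms + <v_0, v_3> = 0.1 + 0.2 + 0.5 + 1.0
```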

Exam trap

The trap here is that candidates often pick XGBoost (B) because it is a powerful general-purpose algorithm, but they overlook that trees do not learn a low-dimensional representation of the pairwise interactions between millions of user and product IDs, which is exactly what Factorization Machines are purpose-built to do at this scale.

How to eliminate wrong answers

Option A is wrong because Linear Learner is a supervised learning algorithm for regression or classification that assumes linear relationships and cannot model complex feature interactions (e.g., user-product pairs) inherent in recommendation systems. Option B is wrong because XGBoost is a tree-based ensemble method that struggles with high-dimensional sparse categorical data (e.g., user and product IDs) and does not naturally handle pairwise interaction learning without extensive feature engineering. Option C is wrong because K-Means is an unsupervised clustering algorithm that groups similar users or products but cannot produce personalized recommendations based on user-item interactions or predict ratings for unseen pairs.

25

MCQeasy

A company is building a binary classification model to predict customer churn. The dataset has 10,000 samples with 500 churners (positive class). The data scientist trains a logistic regression model and obtains an accuracy of 95%. However, the model predicts all customers as non-churn. Which metric should the data scientist use to evaluate the model's performance?

A.AUC-ROC

B.F1-score

C.Accuracy

D.Confusion matrix

AnswerB

F1-score balances precision and recall; with all negatives predicted, recall is 0, so F1 is 0, clearly showing poor performance on churners.

Why this answer

The F1-score is the harmonic mean of precision and recall, so unlike accuracy it is not inflated by the majority class. Since the model predicts all customers as non-churn (its 95% accuracy comes entirely from the 9,500 non-churners), it makes no positive predictions: precision for the churn class is undefined (0/0), recall is 0, and the F1-score is therefore 0, correctly revealing that the model identifies no churners.

Exam trap

AWS often tests the trap that candidates choose accuracy because it is high (95%), failing to recognize that accuracy is meaningless in imbalanced datasets when the model predicts only the majority class.

How to eliminate wrong answers

Option A is wrong because AUC-ROC is computed from the model's scores across all possible thresholds, so it can be high even when, at the chosen threshold, the model predicts no positives at all; it can also look overly optimistic under severe class imbalance. Option C is wrong because accuracy is dominated by the majority class (95% non-churn) and gives a false sense of performance when the model never predicts the positive class. Option D is wrong because a confusion matrix is a table of counts, not a single metric; while it would show zero true positives, the question asks for a metric to evaluate performance, and the confusion matrix itself is not a scalar metric.
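
To make the Option A point concrete: AUC-ROC depends only on how the scores rank positives against negatives, not on the decision threshold. A minimal sketch with hypothetical scores, computing AUC from its pairwise-ranking definition (fine for illustration, too slow for large datasets):

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half), by brute force over all pairs."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def recall_at(y_true, scores, threshold=0.5):
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    return tp / sum(y_true)

# Every churner ranks above every non-churner, but all scores are below 0.5.
y = [1, 1, 0, 0, 0, 0]
scores = [0.40, 0.35, 0.10, 0.08, 0.05, 0.02]
print(roc_auc(y, scores))    # 1.0
print(recall_at(y, scores))  # 0.0
```

A perfect AUC and zero recall at the default threshold can coexist, which is why AUC alone does not expose a model that never predicts churn.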

26

Multi-Selectmedium

Which TWO actions can help reduce overfitting in a decision tree model? (Choose 2.)

Select 2 answers

A.Prune the tree after training

B.Increase the maximum depth of the tree

C.Set a minimum number of samples per leaf

D.Increase the number of features considered at each split

E.Use all training data without validation

AnswersA, C

Pruning removes branches that have little predictive power, reducing overfitting.

Why this answer

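Pruning the tree after training removes branches that have little predictive power, reducing overfitting by simplifying the model. Setting a minimum number of samples per leaf (Option C) works in the same direction while the tree is being grown: it prevents leaves that fit only a handful of training points. Both techniques reduce the variance component of the bias-variance tradeoff, helping the model generalize better to unseen data.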
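
A minimal scikit-learn sketch contrasting an unconstrained tree with one using `min_samples_leaf` (Option C) and cost-complexity pruning via `ccp_alpha` (scikit-learn's form of post-pruning, applied to the fully grown tree inside `fit`). The dataset is synthetic with 10% label noise, and the parameter values are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% flipped labels, so an unconstrained tree has noise to memorize.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# min_samples_leaf limits growth; ccp_alpha prunes weak branches after growth.
regularized = DecisionTreeClassifier(min_samples_leaf=20, ccp_alpha=0.002,
                                     random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("regularized", regularized)]:
    print(name, "leaves:", model.get_n_leaves(),
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3))
```

Expect the unconstrained tree to fit the training set almost perfectly while using far more leaves; compare the test accuracies to see how much of that fit was noise.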

Exam trap

A common misconception is that increasing model complexity (e.g., deeper trees or more features considered at each split) always improves accuracy, when in fact it tends to increase overfitting; candidates may incorrectly select options that add complexity instead of regularization.

27

MCQeasy

A company is training a large language model on Amazon SageMaker using a single GPU instance. The training is taking too long. Which change would most likely reduce training time?

A.Use a larger instance with multiple GPUs and enable distributed training

B.Increase the instance memory

C.Decrease the batch size

D.Store the training data in S3 Glacier for faster access

AnswerA

Distributed training across multiple GPUs reduces training time.

Why this answer

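Using multiple GPUs in a distributed training job parallelizes the work (for example, each GPU processes a different shard of every batch) and reduces wall-clock training time. Option B is wrong because increasing instance memory may not help if the bottleneck is compute. Option C is wrong because decreasing the batch size usually increases training time: each epoch requires more steps, and each step uses the GPU less efficiently. Option D is wrong because S3 Glacier storage classes are designed for archival; most require objects to be restored before they can be read, and none offers faster access than S3 Standard.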
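
Why data-parallel training speeds things up without changing the math: if a batch is split into equal shards and the per-shard gradients are averaged, the result equals the full-batch gradient, so N workers can each do 1/N of the work per step. A minimal pure-Python sketch of that idea for a one-feature linear model; the function names are hypothetical, and frameworks such as PyTorch DistributedDataParallel perform the averaging with an all-reduce across GPUs.

```python
def mse_grad(w, b, xs, ys):
    """Gradient of mean squared error for the linear model y = w*x + b."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def data_parallel_grad(w, b, xs, ys, num_workers):
    """Split the batch into equal shards (one per simulated worker), compute
    each shard's gradient independently, then average (the all-reduce step)."""
    if len(xs) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    shard = len(xs) // num_workers
    grads = [mse_grad(w, b, xs[i * shard:(i + 1) * shard],
                      ys[i * shard:(i + 1) * shard])
             for i in range(num_workers)]
    return (sum(g[0] for g in grads) / num_workers,
            sum(g[1] for g in grads) / num_workers)

xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]
full = mse_grad(0.5, 0.0, xs, ys)
parallel = data_parallel_grad(0.5, 0.0, xs, ys, num_workers=4)
print(full, parallel)  # equal up to floating-point rounding
```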

28

MCQeasy

A company is building a model to classify customer reviews as positive or negative. The dataset has 10,000 positive and 100 negative reviews. Which metric is most appropriate for evaluating model performance?

A.F1 score.

B.Accuracy.

C.Mean squared error.

D.AUC-ROC.

AnswerA

F1 score considers both precision and recall, good for imbalance.

Why this answer

Option A is correct because the F1 score balances precision and recall, which matters for imbalanced datasets; here it should be computed for the minority (negative-review) class. Option B is wrong because accuracy can be misleading (e.g., about 99% by predicting every review as positive). Option C is wrong because mean squared error is a regression metric. Option D is wrong because AUC-ROC can be overly optimistic on highly imbalanced data.

29

MCQeasy

A machine learning team is developing a model to predict housing prices. They have a dataset with numerical features like square footage and number of bedrooms, and categorical features like neighborhood. Which preprocessing step is essential before training a linear regression model?

A.Normalize all numerical features to have zero mean and unit variance

B.Remove highly correlated features

C.One-hot encode categorical features

D.Apply Principal Component Analysis (PCA) to reduce dimensionality

AnswerC

Linear regression requires numerical input; one-hot encoding is needed for categorical variables.

Why this answer

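One-hot encoding converts each categorical feature into binary indicator columns, which linear regression requires because it can only use numeric inputs. Option A is wrong because scaling is helpful (especially for regularized or gradient-based fitting) but not essential for ordinary least squares, whereas categorical features cannot be used at all until they are encoded. Option B is wrong because removing correlated features can stabilize coefficients but is not required to train the model. Option D is wrong because PCA reduces dimensionality but is optional.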
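
A minimal pandas sketch of the encoding step using `pd.get_dummies`; the rows and column names are made up for illustration. In a production pipeline, scikit-learn's `OneHotEncoder` is often preferred because it remembers the training categories for use at inference time.

```python
import pandas as pd

# Hypothetical housing rows.
df = pd.DataFrame({
    "sqft": [1400, 2100, 950, 1800],
    "bedrooms": [3, 4, 2, 3],
    "neighborhood": ["Downtown", "Suburb", "Rural", "Suburb"],
})

# drop_first=True keeps k-1 indicator columns for k categories, avoiding perfect
# collinearity with the regression intercept (the "dummy variable trap").
encoded = pd.get_dummies(df, columns=["neighborhood"], drop_first=True, dtype=int)
print(encoded.columns.tolist())
# ['sqft', 'bedrooms', 'neighborhood_Rural', 'neighborhood_Suburb']
```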

30

MCQhard

A financial services company is building a fraud detection model using Amazon SageMaker. The dataset has 10 million transactions, with 0.1% fraudulent. They train an XGBoost model with default hyperparameters. The model achieves 99.9% accuracy on the test set, but only catches 10% of actual fraud cases. The company wants to maximize the number of fraud cases caught while keeping the false positive rate below 5%. The data scientist has already tried adjusting the class weights and threshold, but the recall is still low. What should the data scientist do next?

A.Collect more data, especially fraudulent transactions, to balance the dataset

B.Use a different algorithm such as a balanced random forest or SMOTE with XGBoost

C.Apply PCA to reduce the number of features and prevent overfitting

D.Use a larger instance type to train for more epochs

AnswerB

Balanced random forest or SMOTE are designed to handle imbalanced datasets.

Why this answer

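Option B is correct because the model is underfitting the minority class; XGBoost with default settings may not handle extreme imbalance well, and class weights and threshold tuning have already been tried. An approach designed for imbalance, such as a balanced random forest (which under-samples the majority class in each tree's bootstrap sample) or SMOTE oversampling of the minority class before training XGBoost, can improve recall. Option A (more data) may not help if the new data is also imbalanced. Option C (PCA) reduces dimensionality but does nothing about the class imbalance. Option D (a larger instance) speeds up training but does not improve model performance.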
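
SMOTE creates synthetic minority examples by interpolating between a minority sample and one of its nearest minority-class neighbors. A minimal pure-Python sketch of that core idea; the function is hypothetical, in practice the imbalanced-learn library's `SMOTE` class is the usual choice, and resampling should be applied only to the training split, never to the test set.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbors (squared Euclidean distance), and place the
    new point a random fraction of the way between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # in [0, 1)
        synthetic.append([bi + gap * (ni - bi) for bi, ni in zip(base, nb)])
    return synthetic

# Hypothetical 2-feature fraud samples at the corners of the unit square.
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(fraud, n_new=10)
print(len(new_points))  # 10
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies.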

31

MCQeasy

A data scientist is using Amazon SageMaker to train a model. The training job is using a large dataset stored in S3. Which data input mode provides the FASTEST data loading for training?

A.FastFile mode

B.Augmented manifest file

C.File mode

D.Pipe mode

AnswerD

Pipe mode streams data directly from S3.

Why this answer

Pipe mode is the fastest data loading mode for SageMaker training because it streams data directly from S3 to the training algorithm via a FIFO pipe, bypassing disk writes. This eliminates the overhead of first downloading files to local storage, so training can start quickly even for very large datasets.
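
Input mode is chosen per channel. A minimal sketch of one `InputDataConfig` entry in the shape expected by the `CreateTrainingJob` API (for example via boto3's `create_training_job`); the helper name, bucket, and prefix are hypothetical, and the chosen algorithm or container must support Pipe mode.

```python
def pipe_mode_channel(channel_name, s3_uri):
    """Build one InputDataConfig entry that streams this channel in Pipe mode."""
    return {
        "ChannelName": channel_name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "InputMode": "Pipe",  # "File" and "FastFile" are the other input modes
    }

train_channel = pipe_mode_channel("train", "s3://example-bucket/sales/train/")
print(train_channel["InputMode"])  # Pipe
```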

Exam trap

The trap here is that candidates often confuse 'FastFile mode' (which also streams from S3, but through a file-system interface) with 'Pipe mode' (which feeds a continuous stream directly to the algorithm), or they mistakenly think 'Augmented manifest file' is a data loading mode rather than a data format.

How to eliminate wrong answers

Option A is wrong because FastFile mode, although it avoids File mode's full upfront download, exposes S3 objects through a POSIX-compatible file system and fetches data on demand as the training script reads files, rather than delivering a continuous stream directly to the algorithm as Pipe mode does. Option B is wrong because an augmented manifest file is a JSON Lines input format (commonly produced by SageMaker Ground Truth labeling jobs) that lists data objects and their labels; it is not a data loading mode and does not affect the speed of data transfer during training. Option C is wrong because File mode downloads the entire dataset from S3 to the instance's storage volume before training starts, incurring significant download time and storage overhead, making it slower than streaming approaches.

32

MCQhard

A company is building a machine learning model to detect anomalies in industrial sensor data. The data is time-series with seasonal patterns. The data scientist wants to use Amazon SageMaker to train a model. Which algorithm is most suitable for this task?

A.Random Cut Forest (RCF)

B.K-Means

C.DeepAR

D.XGBoost

AnswerA

RCF is a SageMaker built-in algorithm for anomaly detection, suitable for time-series data.

Why this answer

Random Cut Forest (RCF) is the most suitable algorithm because it is designed for unsupervised anomaly detection on streaming and time-series data. It works by constructing an ensemble of random trees that isolate anomalies based on how quickly a data point can be separated from the rest, making it effective for detecting outliers in sensor data with seasonal patterns without requiring labeled training data.
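
For time-series input, a common preprocessing step with RCF is shingling: feeding windows of consecutive readings instead of single values, so each point is scored in the context of its neighbors and a seasonal cycle can be captured if the window spans it. A minimal sketch; the function name is hypothetical and the window size is an assumption to tune (for example, one seasonal period).

```python
def shingle(series, size):
    """Turn a 1-D series into overlapping windows ("shingles") of `size`
    consecutive readings; each window becomes one RCF input point."""
    if size < 1 or size > len(series):
        raise ValueError("size must be between 1 and len(series)")
    return [series[i:i + size] for i in range(len(series) - size + 1)]

readings = [10, 11, 10, 50, 10, 11]
print(shingle(readings, 3))
# [[10, 11, 10], [11, 10, 50], [10, 50, 10], [50, 10, 11]]
```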

Exam trap

The trap here is that candidates often confuse unsupervised anomaly detection with supervised forecasting or classification, leading them to pick DeepAR or XGBoost, but RCF is the only algorithm among the options specifically built for unsupervised anomaly detection on streaming data without requiring labels.

How to eliminate wrong answers

Option B (K-Means) is wrong because it is a clustering algorithm that groups data into clusters based on distance, not specifically designed for anomaly detection; it requires specifying the number of clusters (k) and does not inherently handle time-series seasonality or detect anomalies as outliers. Option C (DeepAR) is wrong because it is a supervised forecasting algorithm for time-series that predicts future values based on historical patterns, not for detecting anomalies in existing data; it requires a target variable and is not suited for unsupervised anomaly detection. Option D (XGBoost) is wrong because it is a supervised gradient boosting algorithm used for regression and classification tasks, requiring labeled data and feature engineering; it does not natively handle unsupervised anomaly detection or time-series seasonality without extensive preprocessing and labeling.

33

MCQhard

A financial services company is developing a fraud detection model using a highly imbalanced dataset where fraudulent transactions are only 0.1% of the data. The data scientist has trained a gradient boosting model that achieves 99.9% accuracy but only detects 20% of actual fraud cases. The business requirement is to detect at least 80% of fraud while minimizing false positives. The data scientist has access to SageMaker and can use any built-in algorithm or custom script. Which approach should the data scientist take to meet the business requirement?

A.Keep the model but adjust the classification threshold to increase recall.

B.Use random under-sampling of the majority class to balance the dataset and retrain the model.

C.Use Amazon SageMaker Random Cut Forest (RCF) algorithm for anomaly detection.

D.Use random oversampling of the minority class to balance the dataset and retrain the model.

AnswerC

RCF is designed for anomaly detection on highly imbalanced data and can detect fraud effectively.

Why this answer

Amazon SageMaker Random Cut Forest (RCF) is an unsupervised anomaly detection algorithm that is well-suited for highly imbalanced datasets like this one (0.1% fraud). Unlike supervised methods that struggle with extreme class imbalance, RCF assigns each point an anomaly score based on how easily random cuts separate it from the rest of the data, so it can flag rare fraud cases without requiring balanced training data. By tuning the threshold on the anomaly score, this approach may reach the 80% fraud detection target while keeping false positives low.
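
Tuning the anomaly-score threshold needs at least a small labeled validation set to measure recall. A minimal sketch, assuming anomaly scores have already been produced for that set; the function names and data are hypothetical. It picks the highest threshold that still meets the recall target, then reports the false-positive rate at that threshold so the business can judge the trade-off.

```python
import math

def threshold_for_recall(scores, labels, target_recall):
    """Highest anomaly-score threshold that still flags at least
    target_recall of the labeled frauds (score >= threshold means "flag")."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    fraud_scores = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not fraud_scores:
        raise ValueError("need at least one labeled fraud")
    # Number of frauds that must be caught; the small epsilon guards against
    # floating-point products landing just above an integer.
    needed = max(1, math.ceil(target_recall * len(fraud_scores) - 1e-9))
    return fraud_scores[needed - 1]

def false_positive_rate(scores, labels, threshold):
    flagged = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return flagged / sum(1 for y in labels if y == 0)

# Hypothetical validation scores (higher = more anomalous) and labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.5, 0.4, 0.3, 0.1, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
t = threshold_for_recall(scores, labels, 0.8)
print(t, false_positive_rate(scores, labels, t))  # 0.6 0.0
```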

Exam trap

The trap here is that candidates assume a supervised model with threshold tuning (Option A) can solve the imbalance, but they overlook that the model's learned decision boundary is fundamentally biased, and unsupervised anomaly detection like RCF is specifically designed for such extreme imbalance scenarios.

How to eliminate wrong answers

Option A is wrong because simply adjusting the classification threshold on the existing gradient boosting model will increase recall but will also dramatically increase false positives, as the model was trained on imbalanced data and its decision boundary is already skewed toward the majority class. Option B is wrong because random under-sampling of the majority class discards a large amount of legitimate transaction data, which can lead to loss of valuable patterns and increase false positive rates, and it does not guarantee achieving 80% recall with minimal false positives. Option D is wrong because random oversampling of the minority class duplicates existing fraud examples, which can cause overfitting to those specific instances and reduce generalization, and it still relies on a supervised model that may not effectively learn the rare fraud patterns.

34

MCQmedium

A data scientist is assigned an IAM policy as shown. The data scientist attempts to create a SageMaker endpoint to deploy a model, but the request fails. What is the most likely reason?

A.The data scientist does not have permission to upload the model to S3.

B.The data scientist does not have permission to create a training job.

C.The data scientist does not have permission to create an endpoint.

D.The data scientist does not have permission to pass roles.

AnswerC

The policy has a Deny for sagemaker:CreateEndpoint.

Why this answer

The IAM policy shown does not allow the `sagemaker:CreateEndpoint` action, which is required to create a SageMaker endpoint. Whether the action is explicitly denied or simply absent from the Allow statements, the result is the same: IAM denies any action that is not explicitly allowed, and an explicit Deny overrides any Allow. Even if the data scientist has permissions for other SageMaker actions like `CreateModel` or `CreateEndpointConfig`, the request fails with an access denied error. IAM policies must allow each action needed for the operation, either by name or through a wildcard.
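
A deliberately simplified sketch of the evaluation order (explicit Deny, then explicit Allow, otherwise implicit deny), using hypothetical policies since the exhibit is not reproduced here. Real IAM evaluation also considers resources, conditions, permission boundaries, and service control policies; this helper only matches action names, case-insensitively, with `*` wildcards.

```python
from fnmatch import fnmatchcase

def is_allowed(action, statements):
    """Simplified IAM decision for one action across policy statements."""
    def matches(stmt):
        patterns = stmt["Action"]
        if isinstance(patterns, str):
            patterns = [patterns]
        return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

    if any(s["Effect"] == "Deny" and matches(s) for s in statements):
        return False  # an explicit Deny always wins
    return any(s["Effect"] == "Allow" and matches(s) for s in statements)

# Case 1: CreateEndpoint is simply not listed (implicit deny).
omitted = [{"Effect": "Allow", "Action": ["sagemaker:CreateModel",
                                          "sagemaker:CreateEndpointConfig",
                                          "sagemaker:CreateTrainingJob"]}]
# Case 2: a wildcard Allow overridden by an explicit Deny.
denied = [{"Effect": "Allow", "Action": "sagemaker:*"},
          {"Effect": "Deny", "Action": "sagemaker:CreateEndpoint"}]

print(is_allowed("sagemaker:CreateEndpoint", omitted))  # False
print(is_allowed("sagemaker:CreateEndpoint", denied))   # False
```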

Exam trap

The trap here is that candidates assume that having permissions for model creation and configuration automatically implies permission for endpoint creation, but each SageMaker API action must be allowed by the IAM policy on its own (by name or through a wildcard).

How to eliminate wrong answers

Option A is wrong because the policy includes `s3:PutObject` and `s3:GetObject` actions on the specified S3 bucket, so the data scientist has permission to upload the model to S3. Option B is wrong because the policy includes `sagemaker:CreateTrainingJob`, so the data scientist has permission to create a training job. Option D is wrong because the policy includes `iam:PassRole` on the specified role ARN, so the data scientist has permission to pass roles.

35

MCQmedium

A company is building a recommendation system for an e-commerce platform. They have user-item interaction data and want to use matrix factorization. However, the dataset is sparse (99% missing interactions). Which approach should the data scientist take to train the model effectively?

A.Impute missing values with zeros and use singular value decomposition (SVD)

B.Use alternating least squares (ALS) with implicit feedback and assign lower confidence to unobserved interactions

C.Remove all users and items with fewer than 10 interactions to reduce sparsity

D.Use item-based collaborative filtering with cosine similarity

AnswerB

ALS with implicit feedback naturally handles sparsity by weighting unobserved interactions.

Why this answer

Option B is correct because Alternating Least Squares (ALS) with implicit feedback is specifically designed for sparse implicit feedback datasets: it assigns lower confidence to unobserved interactions (e.g., confidence = 1 + alpha * r_ui, where r_ui is 0 for unobserved). This avoids the pitfalls of treating missing values as true zeros (which distorts the factorization) and remains efficient at 99% sparsity because each alternating update can be computed in time proportional to the number of observed interactions.
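
A minimal pure-Python sketch of how the implicit-feedback formulation turns raw interaction counts into binary preferences p_ui and confidences c_ui = 1 + alpha * r_ui, plus the weighted objective that ALS minimizes. It shows the objective only, not the alternating solver; the naive loop visits every cell, whereas real implementations exploit sparsity. The function names are hypothetical, and alpha is a tunable hyperparameter.

```python
def implicit_targets(r, alpha=40.0):
    """From a user x item matrix of raw counts r (list of lists), return
    preferences p_ui = 1 if r_ui > 0 else 0 and confidences c_ui = 1 + alpha * r_ui."""
    p = [[1 if x > 0 else 0 for x in row] for row in r]
    c = [[1 + alpha * x for x in row] for row in r]
    return p, c

def weighted_loss(p, c, user_f, item_f, reg):
    """sum_ui c_ui * (p_ui - x_u . y_i)^2 + reg * (sum ||x_u||^2 + sum ||y_i||^2)."""
    loss = 0.0
    for u, row in enumerate(p):
        for i, pref in enumerate(row):
            pred = sum(a * b for a, b in zip(user_f[u], item_f[i]))
            loss += c[u][i] * (pref - pred) ** 2
    loss += reg * (sum(v * v for x in user_f for v in x) +
                   sum(v * v for y in item_f for v in y))
    return loss

# Two users, two items: user 0 interacted 3 times with item 1, user 1 once with item 0.
counts = [[0, 3], [1, 0]]
p, c = implicit_targets(counts, alpha=2.0)
print(p, c)  # [[0, 1], [1, 0]] [[1.0, 7.0], [3.0, 1.0]]
```

Unobserved cells still enter the loss with preference 0, but only with the minimum confidence of 1, so they pull the factors far less than repeated observed interactions do.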

Exam trap

The trap here is that candidates assume missing values must be imputed (e.g., with zeros) or that reducing sparsity by filtering is necessary, but ALS with implicit feedback is purpose-built for sparse implicit data without imputation.

How to eliminate wrong answers

Option A is wrong because imputing missing values with zeros and applying SVD treats every unobserved interaction as a confirmed negative signal, which biases the latent factors; classical SVD is defined only for a fully specified matrix, which is why it forces this imputation in the first place. Option C is wrong because removing users and items with fewer than 10 interactions shrinks the dataset but does not address the fundamental sparsity problem: it discards data for new and infrequent users and items (exactly the cold-start cases) and can still leave a sparse matrix, while matrix factorization methods like ALS handle sparsity natively. Option D is wrong because item-based collaborative filtering with cosine similarity relies on pairwise item similarity computed from co-occurrence patterns, which becomes unreliable when 99% of interactions are missing (cosine similarity between sparse vectors is often zero or based on very few shared users).

36

MCQmedium

A company is training a deep learning model on Amazon SageMaker. The training job is taking a long time and the data scientist suspects that the model is overfitting. Which of the following actions can help reduce overfitting and improve generalization?

A.Increase the batch size used during training.

B.Add dropout layers to the model architecture.

C.Increase the number of training epochs.

D.Remove regularization terms from the loss function.

AnswerB

Dropout is a regularization technique that helps prevent overfitting by randomly dropping neurons during training.

Why this answer

Adding dropout layers is a regularization technique that randomly drops neurons during training to prevent overfitting. Increasing the number of epochs (Option C) would likely worsen overfitting. Increasing the batch size (Option A) mainly changes training throughput and gradient noise; it is not a regularization technique, and larger batches, if anything, are often reported to generalize slightly worse. Removing regularization (Option D) would increase overfitting.
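
A minimal pure-Python sketch of inverted dropout, the scaling convention used, for example, by PyTorch's `nn.Dropout`: activations are zeroed with probability p during training and survivors are scaled by 1/(1-p), so nothing needs rescaling at inference. The function name is hypothetical.

```python
import random

def dropout(activations, p, training, rng=random):
    """Inverted dropout over a list of activations.

    Training: zero each value with probability p, scale the rest by 1/(1-p)
    so the expected activation is unchanged. Inference: return values as-is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]

rng = random.Random(0)
train_out = dropout([1.0] * 1000, p=0.5, training=True, rng=rng)
print(sum(train_out) / len(train_out))  # close to 1.0 on average
print(dropout([1.0, 2.0], p=0.5, training=False))  # [1.0, 2.0]
```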

37

MCQeasy

A machine learning team is building a model to predict customer churn. They have a dataset with 10,000 samples and 50 features, including categorical variables with high cardinality (e.g., ZIP code). Which feature engineering technique is most appropriate to reduce dimensionality while preserving predictive information?

A.Principal Component Analysis (PCA)

B.One-hot encoding

C.Target encoding

D.Label encoding

AnswerC

Target encoding reduces dimensionality by replacing categories with target mean, preserving predictive information.

Why this answer

Target encoding replaces each high-cardinality category with the mean target value for that category, reducing dimensionality while capturing predictive signal. Option A is wrong because PCA is designed for numerical features, not raw categorical ones. Option B is wrong because one-hot encoding creates many sparse features, increasing dimensionality. Option D is wrong because label encoding imposes an ordering that may not exist.
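
A minimal pure-Python sketch of smoothed target encoding fitted on training data only. The function names and smoothing value are illustrative, not a specific library's API. Smoothing shrinks each category's mean toward the global mean so rare ZIP codes do not receive extreme values, and fitting only on training rows (ideally out-of-fold) keeps the label from leaking into the features.

```python
from collections import defaultdict

def fit_target_encoder(categories, targets, smoothing=10.0):
    """Learn a smoothed mean-target value per category from training rows.

    `smoothing` acts like a count of pseudo-observations at the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
                for cat in counts}
    return encoding, global_mean

def transform(categories, encoding, global_mean):
    # Categories never seen in training fall back to the global mean.
    return [encoding.get(cat, global_mean) for cat in categories]

# Hypothetical ZIP-like codes and churn labels (1 = churned).
train_cats = ["A", "A", "B", "B", "B", "C"]
train_y = [1, 1, 0, 0, 1, 0]
enc, prior = fit_target_encoder(train_cats, train_y, smoothing=2.0)
print(transform(["A", "B", "Z"], enc, prior))  # [0.75, 0.4, 0.5]
```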

38

Multi-Selecteasy

A data scientist is building a binary classifier and wants to evaluate model performance. Which THREE metrics are most commonly used?

Select 3 answers

A.Mean Absolute Error

B.RMSE

C.Precision

D.Recall

E.Accuracy

AnswersC, D, E

Precision, recall, and accuracy are standard classification metrics.

Why this answer

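Precision is a core metric for binary classifiers, measuring the proportion of true positive predictions among all positive predictions; it is especially important when false positives are costly, such as in spam detection or fraud alert systems. Recall measures the proportion of actual positives the model finds, and accuracy measures the proportion of all predictions that are correct, although it can be misleading on imbalanced data.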

Exam trap

AWS often tests the distinction between regression and classification metrics, and the trap here is that candidates mistakenly apply regression metrics like MAE or RMSE to binary classification problems.

39

MCQeasy

A data scientist is training a linear regression model to predict house prices. The dataset includes features such as square footage, number of bedrooms, and location. After training, the model achieves an R² of 0.85 on the training set but only 0.60 on the test set. Which of the following is the MOST likely cause of this discrepancy?

A.The model is overfitting the training data

B.There is multicollinearity among the features

C.The model is underfitting the training data

D.There is data leakage between the training and test sets

AnswerA

Overfitting causes high training performance but poor generalization to test data.

Why this answer

A high R² on the training set (0.85) paired with a significantly lower R² on the test set (0.60) is a classic symptom of overfitting. The model has learned noise and specific patterns in the training data that do not generalize to unseen data, causing poor test performance. Regularization techniques like Lasso or Ridge, or reducing model complexity, would typically address this issue.
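
R² compares the model's squared error with that of always predicting the mean (R² = 1 - SS_res / SS_tot), so it can be computed separately on the training and test sets to expose the gap described above; on a test set it can even be negative. A minimal sketch with a hypothetical function name:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y, y))           # 1.0: perfect predictions
print(r_squared(y, [2.5] * 4))   # 0.0: no better than predicting the mean
```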

Exam trap

This question tests the distinction between overfitting and multicollinearity: candidates sometimes attribute a training-test R² gap to multicollinearity instead of recognizing it as a generalization failure.

How to eliminate wrong answers

Option B is wrong because multicollinearity inflates the variance of coefficient estimates but does not inherently cause a large gap between training and test R²; it affects interpretability and stability, not generalization performance directly. Option C is wrong because underfitting would result in low R² on both training and test sets (e.g., both below 0.60), not a high training R² with a much lower test R². Option D is wrong because data leakage would typically inflate both training and test R² artificially, making them both appear deceptively high, not creating a large discrepancy between them.

40

MCQhard

A data scientist is building a model to predict customer churn. The dataset contains categorical features with high cardinality (e.g., ZIP code, customer ID). Which encoding method is MOST suitable?

A.One-hot encoding

B.Label encoding

C.Hashing encoding

D.Target encoding

AnswerD

Target encoding captures information without expanding dimensionality.

Why this answer

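Target encoding is most suitable for high-cardinality categorical features because it replaces each category with the mean of the target variable for that category, capturing the predictive signal in a single dense column. This avoids the dimensionality explosion of one-hot encoding and the arbitrary ordering that label encoding imposes on unordered categories. Because it uses the label, target encoding must be fitted on training data only (ideally out-of-fold and with smoothing for rare categories) to avoid target leakage. Identifiers that are unique to each row, such as a customer ID, carry no generalizable signal and are usually dropped rather than encoded.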

Exam trap

The trap here is that candidates often choose one-hot encoding as the default for categorical data, failing to recognize that high cardinality makes it impractical, or they pick label encoding assuming it is safe for tree models, but it introduces false ordinality that can degrade performance.

How to eliminate wrong answers

Option A is wrong because one-hot encoding creates a binary column for each unique category, which with high cardinality (e.g., thousands of ZIP codes) leads to an extremely sparse feature matrix, causing memory issues and model overfitting. Option B is wrong because label encoding assigns arbitrary integer labels to categories, implying an ordinal relationship that does not exist, which can distort distance-based and tree-based models. Option C is wrong because hashing encoding maps categories to a fixed number of buckets via a hash function, which can cause collisions (different categories mapping to the same bucket) and loss of information, making it less reliable for churn prediction where each category's signal matters.
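
To illustrate the collision problem with Option C, a minimal sketch of the hashing trick. It uses a stable MD5-based hash for reproducibility (an assumption; Python's built-in `hash()` is randomized per process for strings, and scikit-learn's `FeatureHasher` uses MurmurHash3). The ZIP codes and bucket counts are hypothetical.

```python
import hashlib

def hash_bucket(category: str, n_buckets: int) -> int:
    """Map a category string to one of n_buckets with a stable hash."""
    digest = hashlib.md5(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

def count_collisions(categories, n_buckets):
    """Number of distinct categories that share a bucket with another category."""
    buckets = {}
    for c in set(categories):
        buckets.setdefault(hash_bucket(c, n_buckets), []).append(c)
    return sum(len(v) for v in buckets.values() if len(v) > 1)

zips = [f"{z:05d}" for z in range(10000, 10500)]  # 500 hypothetical ZIP codes
print(count_collisions(zips, 64))    # most codes share a bucket
print(count_collisions(zips, 4096))  # far fewer, but usually still some
```

More buckets reduce collisions at the cost of more columns, which is the trade-off hashing makes against one-hot encoding.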

41

Multi-Selectmedium

A machine learning engineer is deploying a model on Amazon SageMaker. Which TWO steps are required to create a SageMaker endpoint?

Select 2 answers

A.Create a SageMaker model

B.Submit a training job

C.Create a SageMaker pipeline

D.Create an endpoint configuration

E.Create a SageMaker notebook instance

AnswersA, D

A SageMaker model must be created first.

Why this answer

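A is correct because creating a SageMaker model is the first required step to define the model artifacts, inference code, and container image that will be used for predictions. Without a model object, SageMaker has no executable artifact to deploy behind the endpoint. D is correct because the endpoint itself is created from an endpoint configuration, which specifies the model, instance type, and initial instance count for each production variant.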
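
The two required steps, followed by the endpoint creation itself, map onto three SageMaker API calls. A minimal sketch using boto3-style calls (`create_model`, `create_endpoint_config`, `create_endpoint`); the helper name, image URI, S3 path, role ARN, and instance type are placeholders, and a small recording stub stands in for `boto3.client("sagemaker")` so the call order can be checked offline.

```python
def deploy_endpoint(sm, name, image_uri, model_data_url, role_arn,
                    instance_type="ml.m5.large"):
    """Create a model, an endpoint configuration, and an endpoint, in order.
    `sm` should behave like boto3.client("sagemaker")."""
    sm.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    sm.create_endpoint_config(
        EndpointConfigName=f"{name}-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )
    return sm.create_endpoint(EndpointName=name,
                              EndpointConfigName=f"{name}-config")

class RecordingClient:
    """Offline stand-in that records which API methods were called."""
    def __init__(self):
        self.calls = []

    def create_model(self, **kwargs):
        self.calls.append("create_model")

    def create_endpoint_config(self, **kwargs):
        self.calls.append("create_endpoint_config")

    def create_endpoint(self, **kwargs):
        self.calls.append("create_endpoint")
        return {"EndpointArn": "arn:placeholder"}

client = RecordingClient()
deploy_endpoint(client, "churn-model", "<inference-image-uri>",
                "s3://example-bucket/model.tar.gz",
                "arn:aws:iam::111122223333:role/ExampleSageMakerRole")
print(client.calls)
# ['create_model', 'create_endpoint_config', 'create_endpoint']
```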

Exam trap

The trap here is that candidates confuse the training job (Option B) as a prerequisite for deployment, but SageMaker allows deploying a pre-trained model without ever running a training job, so only the model creation and endpoint configuration are mandatory.

42

MCQhard

Refer to the exhibit. A training job failed with the error shown. What is the most likely cause?

A.The model architecture is incorrect

B.The training data contains missing values or outliers that cause numerical instability

C.The instance type does not have enough memory

D.The training job exceeded the maximum runtime

AnswerB

Error indicates NaN or infinity in input.

Why this answer

Option B is correct because the error explicitly states that the input contains NaN or infinity, indicating missing or invalid values in the training data. Option C is wrong because the error is about input values, not memory. Option D is wrong because the error is raised by the training code while processing the input, not by SageMaker stopping the job at its maximum runtime. Option A is wrong because the error is about input data, not model architecture.
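
A cheap safeguard is to validate the data before launching a job. A minimal pure-Python sketch with a hypothetical function name; on large datasets the same check is usually done with pandas or NumPy (`np.isfinite`).

```python
import math

def find_non_finite(rows):
    """Return (row_index, column_index, value) for every NaN or +/-inf cell,
    so bad records can be fixed or imputed before training starts."""
    bad = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((r, c, value))
    return bad

data = [[1.0, 2.0], [float("nan"), 3.0], [4.0, float("inf")]]
problems = find_non_finite(data)
print([(r, c) for r, c, _ in problems])  # [(1, 0), (2, 1)]
```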

43

MCQeasy

A data scientist is using Amazon SageMaker to train a model with the built-in XGBoost algorithm. The dataset contains missing values. What is the default behavior of SageMaker XGBoost regarding missing values?

A.It raises an error and stops training

B.It imputes missing values with the column mean

C.It removes rows with missing values

D.It automatically learns the best direction (left or right) for missing values during training

AnswerD

XGBoost's sparsity-aware algorithm learns the optimal branch for missing values.

Why this answer

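Option D is correct because XGBoost's sparsity-aware split finding learns, for each split, a default direction (left or right) for missing values, choosing the side that yields the larger gain; it does not impute or drop them. Option B is wrong because mean imputation is not the default; XGBoost handles missingness internally. Option C is wrong because removing rows is not the default. Option A is wrong because missing values do not cause training to fail by default.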
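
A minimal sketch of the idea behind this default-direction learning for one candidate split: rows with a missing value are sent left and then right, and the side with the larger XGBoost-style split gain, ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ)], becomes the default. It is a simplified illustration (a single threshold, no γ penalty) with hypothetical names, not XGBoost's actual implementation.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Structure-score gain for splitting a node (gamma penalty omitted)."""
    def score(g, h):
        return g * g / (h + lam)
    g, h = g_left + g_right, h_left + h_right
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - score(g, h))

def best_default_direction(rows, threshold, lam=1.0):
    """rows: (feature value or None, gradient, hessian). Non-missing values
    below `threshold` go left, the rest go right; the missing rows are tried
    on each side and the higher-gain side is returned with its gain."""
    gl = hl = gr = hr = gm = hm = 0.0
    for x, g, h in rows:
        if x is None:
            gm += g
            hm += h
        elif x < threshold:
            gl += g
            hl += h
        else:
            gr += g
            hr += h
    gain_left = split_gain(gl + gm, hl + hm, gr, hr, lam)
    gain_right = split_gain(gl, hl, gr + gm, hr + hm, lam)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right

# Missing rows have the same gradient sign as the high-value rows.
rows = [(1.0, -1.0, 1.0), (2.0, -1.0, 1.0), (8.0, 1.0, 1.0), (9.0, 1.0, 1.0),
        (None, 1.0, 1.0), (None, 1.0, 1.0)]
direction, gain = best_default_direction(rows, threshold=5.0)
print(direction)  # right
```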

44

MCQhard

A machine learning team is using Amazon SageMaker to train a model. The training data is stored in an S3 bucket. The team wants to ensure that the training job can access the data securely without using long-lived AWS credentials. Which approach should the team use?

A.Store AWS access keys in the training script

B.Use an S3 bucket policy that allows public access

C.Specify an IAM role in the SageMaker training job configuration

D.Create a new IAM user for each training job

AnswerC

SageMaker assumes the IAM role to access S3, providing temporary credentials and secure access.

Why this answer

Option C is correct because SageMaker training jobs can assume an IAM role specified in the job configuration to obtain temporary security credentials via AWS Security Token Service (STS). This allows the training job to access the S3 bucket securely without embedding long-lived AWS access keys in code or configuration files.

Exam trap

The trap here is that candidates may think embedding credentials in code (Option A) is acceptable for automation, but AWS services like SageMaker are designed to use IAM roles for temporary, scoped access, making long-lived credentials unnecessary and insecure.

How to eliminate wrong answers

Option A is wrong because storing AWS access keys in the training script violates security best practices by exposing long-lived credentials that could be compromised; SageMaker provides IAM roles to avoid this. Option B is wrong because making the S3 bucket publicly accessible would expose the training data to anyone on the internet, creating a severe security risk and violating data privacy requirements. Option D is wrong because creating a new IAM user for each training job is impractical and insecure, as it would require managing many long-lived credentials and does not leverage SageMaker's built-in IAM role-based access control.
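For reference, the request body of the low-level CreateTrainingJob API carries the execution role in its `RoleArn` field, and nothing in it holds access keys. The function below only builds the dictionary, so it runs without AWS access; the job name, role ARN, image URI, and bucket paths are placeholders, and the instance settings are arbitrary assumptions. With boto3 and caller credentials configured, the dictionary would be passed as `boto3.client("sagemaker").create_training_job(**request)`.

```python
def build_training_job_request(job_name, role_arn, image_uri, train_s3_uri,
                               output_s3_uri, instance_type="ml.m5.xlarge"):
    """Build a CreateTrainingJob request that relies on an IAM execution role."""
    return {
        "TrainingJobName": job_name,
        # SageMaker assumes this role and receives temporary STS credentials;
        # no access keys appear in the request or in the training script.
        "RoleArn": role_arn,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {"InstanceType": instance_type, "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Placeholder values for illustration only.
request = build_training_job_request(
    "demo-job",
    "arn:aws:iam::111122223333:role/DemoSageMakerRole",
    "example-image-uri",
    "s3://example-bucket/train/",
    "s3://example-bucket/output/",
)
print(request["RoleArn"])
```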

45

Multi-Selecthard

A data scientist is building a binary classification model to predict loan default. The dataset is highly imbalanced (5% default, 95% non-default). Which TWO techniques should the data scientist use to address the class imbalance?

Select 2 answers

A.Undersample the majority class

B.Use RMSE as the evaluation metric

C.Oversample the minority class using SMOTE

D.Use accuracy as the evaluation metric

E.Use class weights in the loss function

AnswersC, E

SMOTE generates synthetic samples for the minority class, balancing the dataset.

Why this answer

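Oversampling the minority class using SMOTE (Synthetic Minority Oversampling Technique) is correct because it generates synthetic minority samples by interpolating between existing minority instances and their nearest minority neighbors, rather than duplicating them. This balances the dataset without adding exact copies, which reduces the overfitting risk of plain duplication and helps the model generalize to the minority class. Using class weights in the loss function (Option E) is also correct: it leaves the data unchanged but penalizes errors on the minority class more heavily, so the model cannot minimize the loss by ignoring that class.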
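The interpolation step of SMOTE can be sketched in a few lines of NumPy: pick a minority sample, pick one of its k nearest minority neighbors, and place a new point at a random position on the segment between them. This is a simplified illustration (brute-force distances, no handling of categorical features); in practice a library implementation such as imbalanced-learn's `SMOTE` would normally be used, applied to the training split only.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=None):
    """Generate n_new synthetic rows by interpolating between minority neighbors."""
    X_min = np.asarray(X_min, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Brute-force pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                    # random minority sample
        b = neighbors[a, rng.integers(k)]      # one of its k nearest neighbors
        gap = rng.random()                     # position along the segment, in [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic


X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7]])
new_rows = smote(X_minority, n_new=10, k=3, seed=0)
print(new_rows.shape)   # (10, 2)
```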

Exam trap

AWS often tests the misconception that accuracy is a valid metric for imbalanced datasets, or that undersampling is always preferable to oversampling, when in fact accuracy can be highly misleading and undersampling can discard critical data.

46

MCQmedium

A data scientist is using SageMaker to train a model that requires access to a private S3 bucket in another account. The scientist has set up the correct IAM roles and bucket policies. However, the training job fails with an access denied error. What is the most likely cause?

A.The SageMaker execution role does not have s3:GetObject permission

B.The S3 objects are encrypted with SSE-KMS and the KMS key is not accessible

C.The training instance type does not support S3 access

D.The VPC used for training does not have a route to S3 (e.g., missing VPC endpoint or NAT)

AnswerD

If training is in a VPC without S3 access, the job cannot reach S3 despite correct IAM policies.

Why this answer

When a SageMaker training job is configured to run inside a VPC, the VPC must have a route to S3, either through a VPC endpoint (gateway or interface) or through a NAT gateway. Without such a path, the training instances cannot reach the bucket in the other account, even if the IAM roles and bucket policies are correct. A missing route usually surfaces as a connection failure or timeout, while an S3 VPC endpoint whose endpoint policy does not allow the other account's bucket returns an access denied error; both are network-side causes that IAM changes will not fix.
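One way to check the network side is to inspect the route table of the training subnets. The sketch below works on a dictionary shaped like one entry of the EC2 `describe_route_tables` response; the IDs are made up, and the helper only detects gateway endpoints and NAT default routes. Interface endpoints for S3 do not appear in route tables (they use network interfaces and DNS), so a `False` result is not conclusive if one is configured.

```python
def has_s3_route(route_table):
    """Check a route table (one entry of describe_route_tables) for a path to S3."""
    for route in route_table.get("Routes", []):
        if route.get("State") == "blackhole":
            continue                                   # target no longer exists
        # S3 gateway endpoint: prefix-list destination whose target is a vpce- ID.
        if route.get("DestinationPrefixListId") and route.get("GatewayId", "").startswith("vpce-"):
            return True
        # Default route through a NAT gateway reaches S3's public endpoints.
        if route.get("DestinationCidrBlock") == "0.0.0.0/0" and route.get("NatGatewayId"):
            return True
    return False


# Hypothetical route tables.
local_route = {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"}
private_only = {"Routes": [local_route]}
with_endpoint = {"Routes": [local_route, {
    "DestinationPrefixListId": "pl-0example", "GatewayId": "vpce-0example", "State": "active"}]}
with_nat = {"Routes": [local_route, {
    "DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example", "State": "active"}]}
print(has_s3_route(private_only), has_s3_route(with_endpoint), has_s3_route(with_nat))
# False True True
```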

Exam trap

The trap here is that candidates assume an access denied error always means an IAM or bucket policy problem, but when training runs in a VPC, network configuration (such as a missing S3 VPC endpoint or a restrictive endpoint policy) is a common hidden cause.

How to eliminate wrong answers

Option A is wrong because the question states that the correct IAM roles and bucket policies have been set up, so the execution role likely already has s3:GetObject permission; the error is not due to missing IAM permissions. Option B is wrong because, while SSE-KMS can cause access issues, the question says the IAM roles and bucket policies are correct and does not mention KMS key configuration; in a cross-account scenario where training runs in a VPC, network connectivity is the more likely cause. Option C is wrong because S3 access does not depend on the training instance type; any instance can reach S3 when the network configuration provides a path.

47

Multi-Selecthard

A data scientist is training a binary classification model using Amazon SageMaker's built-in XGBoost algorithm. The dataset is highly imbalanced (95% negative class, 5% positive class). The model achieves high accuracy but poor recall on the positive class. Which TWO actions should the data scientist take to improve recall without significantly sacrificing precision?

Select 2 answers

A.Perform random undersampling of the majority class.

B.Set scale_pos_weight to the ratio of negative to positive samples.

C.Increase the max_depth hyperparameter.

D.Reduce the learning rate (eta) and increase num_round.

E.Use SMOTE to generate synthetic samples of the minority class.

AnswersB, E

Setting scale_pos_weight assigns a higher weight to the positive (minority) class, so its misclassifications are penalized more.

Why this answer

Options B and E are correct. Using scale_pos_weight adjusts the weight of the positive class, directly addressing imbalance. SMOTE oversamples the minority class to balance the dataset.
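The ratio mentioned above is straightforward to compute from the training labels. The sketch below computes it and places it in a hyperparameter dictionary; SageMaker's built-in XGBoost takes hyperparameters as strings, and the other values shown (`num_round`, `eval_metric`) are illustrative assumptions rather than tuned settings.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples, the usual starting value for scale_pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples")
    return negatives / positives


labels = [0] * 95 + [1] * 5          # 95% negative, 5% positive
hyperparameters = {
    "objective": "binary:logistic",
    "eval_metric": "auc",            # illustrative choice
    "num_round": "200",              # illustrative choice
    "scale_pos_weight": str(scale_pos_weight(labels)),
}
print(hyperparameters["scale_pos_weight"])   # 19.0
```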

Option A is wrong because undersampling the majority class discards data and may lose information. Option C is wrong because increasing max_depth may cause overfitting. Option D is wrong because reducing eta and increasing num_round change how the model converges but do not directly address the imbalance.

48

MCQhard

A data scientist is training a binary classifier on a dataset with 1 million rows and 500 features. The model uses XGBoost and achieves an AUC of 0.95 on the training set but only 0.72 on the test set. The scientist suspects overfitting. Which combination of hyperparameter adjustments is most likely to improve generalization?

A.Increase 'max_depth' and decrease 'learning_rate'

B.Increase 'subsample' and decrease 'colsample_bytree'

C.Decrease 'max_depth' and increase 'min_child_weight'

D.Decrease 'gamma' and increase 'learning_rate'

AnswerC

Decreasing max_depth reduces tree complexity; increasing min_child_weight prevents overfitting by requiring more samples per leaf.

Why this answer

Option C is correct because decreasing 'max_depth' reduces the complexity of individual trees, preventing the model from learning overly specific patterns in the training data. Increasing 'min_child_weight' forces the algorithm to require a higher sum of instance weights (hessian) in each child before further partitioning, which acts as a regularization mechanism that discourages splits on noisy or sparse data. Together, these adjustments directly combat overfitting in XGBoost by limiting tree depth and requiring more evidence for each split, which should narrow the gap between the training AUC (0.95) and the test AUC (0.72).
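To see why `min_child_weight` acts as a regularizer, note that for binary logistic loss each instance contributes a hessian of p(1 - p) to a node, where p is its current predicted probability. The sketch below, a simplified illustration rather than XGBoost's code, checks whether both children of a candidate split meet the threshold. With all predictions at p = 0.5, each row contributes 0.25, so `min_child_weight = 1` requires at least four rows per child; rows the model is already confident about contribute even less.

```python
import math


def hessian_sum(margins):
    """Sum of logistic-loss hessians p * (1 - p) for rows with the given raw margins."""
    total = 0.0
    for m in margins:
        p = 1.0 / (1.0 + math.exp(-m))
        total += p * (1.0 - p)
    return total


def split_allowed(left_margins, right_margins, min_child_weight):
    """A split is kept only if both children reach min_child_weight."""
    return (hessian_sum(left_margins) >= min_child_weight
            and hessian_sum(right_margins) >= min_child_weight)


# With every prediction at p = 0.5 (margin 0), each row contributes 0.25.
print(split_allowed([0.0] * 3, [0.0] * 100, min_child_weight=1.0))  # False (0.75 < 1)
print(split_allowed([0.0] * 4, [0.0] * 100, min_child_weight=1.0))  # True  (1.0 >= 1)
```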

Exam trap

AWS often tests the misconception that decreasing 'learning_rate' alone will fix overfitting, or that 'max_depth' is a regularization parameter that should be increased, when in fact the correct approach is to reduce model complexity (decrease 'max_depth') and increase split regularization (increase 'min_child_weight').

How to eliminate wrong answers

Option A is wrong because increasing 'max_depth' makes trees deeper and more complex, which exacerbates overfitting, and decreasing 'learning_rate' alone does not compensate for the added depth; this combination would likely worsen the generalization gap. Option B is wrong because increasing 'subsample' (the fraction of rows sampled per tree) actually reduces randomness and can increase overfitting, while decreasing 'colsample_bytree' (the fraction of features sampled) adds some regularization but is insufficient to counterbalance the increased subsample; the net effect is ambiguous and not the most direct fix for overfitting. Option D is wrong because decreasing 'gamma' (the minimum loss reduction required for a split) allows more splits, increasing model complexity and overfitting, and increasing 'learning_rate' makes the model converge faster but with larger steps, which can also lead to overfitting; this combination moves in the wrong direction for regularization.

49

MCQmedium

A data scientist is training a binary classifier on an imbalanced dataset where the positive class represents 1% of the data. The model is evaluated using accuracy, but the accuracy is 99% even though the model predicts all instances as negative. Which metric should the data scientist use to properly evaluate the model?

A.Root mean squared error (RMSE)

B.Mean squared error (MSE)

C.F1 score

D.Accuracy

AnswerC

F1 score combines precision and recall, providing a better measure for imbalanced classification.

Why this answer

The F1 score is the harmonic mean of precision and recall. Unlike accuracy, it ignores true negatives, so it is not inflated by a large majority class. With 99% negative instances, accuracy is misleadingly high even if the model never predicts the positive class, whereas F1 drops to 0. F1 captures both false positives and false negatives, giving a balanced view of performance on the minority class.
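The arithmetic behind the 99% accuracy is easy to reproduce. The sketch below assumes 10 positives, 990 negatives, and a model that predicts negative for every row; precision is reported as 0 when there are no positive predictions (it is formally 0/0).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0   # 0/0 reported as 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1] * 10 + [0] * 990      # 1% positive
y_pred = [0] * 1000                # model predicts "negative" for everything
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.99, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```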

Exam trap

The trap here is that candidates see 99% accuracy and assume the model is performing well, failing to recognize that accuracy is unreliable for imbalanced datasets, and they may incorrectly choose accuracy or a regression metric without considering the need for a precision-recall based metric like F1.

How to eliminate wrong answers

Option A is wrong because RMSE is a regression metric that measures the square root of the average squared differences between predicted and actual values, not suitable for binary classification evaluation. Option B is wrong because MSE is also a regression metric that penalizes larger errors quadratically, and it does not account for class imbalance or the confusion matrix structure. Option D is wrong because accuracy is dominated by the majority class in imbalanced datasets; a model predicting all negatives achieves 99% accuracy but fails to identify any positive instances, making it a misleading metric.

50

MCQhard

A machine learning engineer is deploying a model that predicts customer churn. The model outputs probabilities between 0 and 1. The business requires that at least 90% of customers flagged for churn actually churn (precision >= 0.9). Currently, the model's precision is 0.85 at the default threshold of 0.5. Which threshold adjustment should the engineer consider?

A.Decrease the threshold to 0.4

B.Decrease the threshold to 0.3

C.Increase the threshold to 0.7

D.Keep the threshold at 0.5

AnswerC

Higher threshold increases precision by requiring higher confidence for positive predictions.

Why this answer

Increasing the threshold to 0.7 raises the probability cutoff for classifying a customer as churning. This means only customers with a high predicted probability (strong model confidence) are flagged, which reduces false positives and increases precision. Since the current precision at 0.5 is 0.85 and the goal is ≥0.9, moving the threshold higher is the correct direction; the exact cutoff that reaches 0.9 should be confirmed on a validation set.
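Choosing the threshold can be done empirically: sweep candidate thresholds on a validation set and keep the smallest one whose precision meets the target, which preserves as much recall as possible. The scores and labels below are toy values chosen for illustration, not results from a real model, and a prediction counts as positive when its score is at or above the threshold.

```python
def precision_at(scores, labels, threshold):
    """Precision when a score >= threshold is predicted positive."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def lowest_threshold_meeting(scores, labels, candidates, target):
    """Smallest candidate threshold whose precision reaches the target, or None."""
    for t in sorted(candidates):
        if precision_at(scores, labels, t) >= target:
            return t
    return None


# Toy validation scores and labels (illustrative only).
scores = [0.92, 0.88, 0.81, 0.76, 0.73, 0.66, 0.61, 0.57, 0.52, 0.40, 0.30, 0.15]
labels = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
print(round(precision_at(scores, labels, 0.5), 3))                               # 0.667
print(lowest_threshold_meeting(scores, labels, [0.4, 0.5, 0.6, 0.7, 0.8], 0.9))  # 0.7
```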

Exam trap

The trap here is that candidates know a higher threshold lowers recall and wrongly assume precision falls as well. In practice, raising the threshold filters out low-confidence positive predictions, which reduces false positives and usually increases precision.

How to eliminate wrong answers

Option A is wrong because decreasing the threshold to 0.4 would classify more customers as churn, including those with lower probabilities, which typically increases false positives and lowers precision further below 0.9. Option B is wrong because decreasing the threshold to 0.3 would have an even more extreme effect, flooding the flagged set with low-confidence predictions and worsening precision. Option D is wrong because keeping the threshold at 0.5 maintains the current precision of 0.85, which does not meet the business requirement of at least 0.9.

51

MCQeasy

A data scientist is training a binary classification model on an imbalanced dataset where the positive class is rare. The model currently achieves 95% accuracy but only 10% recall on the positive class. Which metric should the data scientist prioritize to improve model performance?

A.F1 score

B.AUC-ROC

C.Precision

D.Accuracy

AnswerA

F1 score combines precision and recall, making it suitable for imbalanced datasets where both false positives and false negatives are important.

Why this answer

The F1 score is the harmonic mean of precision and recall, making it the best single metric to optimize when the positive class is rare and both false positives and false negatives are costly. With 95% accuracy but only 10% recall, the model is likely predicting the majority class almost exclusively, so improving recall without sacrificing precision is critical — the F1 score directly balances this trade-off.

Exam trap

AWS often tests the misconception that accuracy is always the best metric. On imbalanced datasets, accuracy is misleadingly high even when the model fails to detect the rare positive class, so candidates must recognize that the F1 score (or precision-recall AUC) is the appropriate choice.

How to eliminate wrong answers

Option B (AUC-ROC) is wrong because AUC-ROC measures how well the model ranks positive instances above negative ones across all thresholds, so it does not reflect recall at the threshold the model actually uses: a model can rank every positive above every negative (AUC-ROC of 1.0) and still have poor recall if all those scores fall below the decision threshold. On highly imbalanced data, the many true negatives also keep the false positive rate low, which can make ROC results look optimistic. Option C (Precision) is wrong because optimizing precision alone would further reduce recall, making the model even less useful for detecting the rare positive class; precision focuses on minimizing false positives, not on capturing true positives. Option D (Accuracy) is wrong because accuracy is dominated by the majority class in imbalanced settings; a model that predicts the negative class for every instance can achieve 95% accuracy while having 0% recall, which is the exact problem described.
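The AUC-ROC point can be checked on invented toy data: if every positive scores between 0.35 and 0.45 and every negative scores below 0.3, the ranking is perfect, yet a 0.5 threshold flags nothing. The sketch below computes AUC as the fraction of positive-negative pairs ranked correctly (ties count half) and recall at a fixed threshold.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def recall_at(scores, labels, threshold=0.5):
    """Share of actual positives whose score is at or above the threshold."""
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return hits / sum(labels)


pos_scores = [0.45, 0.42, 0.40, 0.38, 0.35]     # 5 rare positives
neg_scores = [0.003 * i for i in range(95)]     # 95 negatives, all below 0.29
scores = pos_scores + neg_scores
labels = [1] * 5 + [0] * 95
print(roc_auc(scores, labels), recall_at(scores, labels))   # 1.0 0.0
```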

52

MCQmedium

A company is using SageMaker to deploy a real-time inference endpoint for a natural language processing model. The model receives input text and returns predictions. The data scientist notices that the endpoint latency increases significantly under load. Which design change would MOST effectively reduce latency?

A.Enable data capture for monitoring

B.Switch to batch transform for real-time predictions

C.Increase the number of instances behind the endpoint

D.Use an inference pipeline to combine preprocessing and model inference

AnswerD

Inference pipelines reduce network overhead between preprocessing and prediction.

Why this answer

Option D is correct because a SageMaker inference pipeline chains the preprocessing container and the model container behind a single endpoint. The containers run on the same instances, so data passes between them locally instead of through a separate Lambda function or a client-side preprocessing step, which removes network round-trips and serialization overhead and lowers latency under load.
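In the CreateModel API, an inference pipeline is a model whose `Containers` list holds the containers in execution order. The sketch below only builds the request dictionary; the model name, role ARN, image URIs, and S3 paths are placeholders, and `InferenceExecutionConfig` with `Mode` set to `Serial` requests the chained behavior in which each container's output becomes the next container's input.

```python
def build_pipeline_model_request(model_name, role_arn, containers):
    """CreateModel request for an inference pipeline; containers run in list order."""
    container_defs = []
    for c in containers:
        definition = {"Image": c["image"]}
        if c.get("model_data"):
            definition["ModelDataUrl"] = c["model_data"]   # artifacts for this container
        container_defs.append(definition)
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,
        "Containers": container_defs,
        "InferenceExecutionConfig": {"Mode": "Serial"},
    }


# Placeholder values for illustration only.
request = build_pipeline_model_request(
    "text-pipeline-demo",
    "arn:aws:iam::111122223333:role/DemoSageMakerRole",
    [
        {"image": "example-preprocessing-image-uri",
         "model_data": "s3://example-bucket/preprocess/model.tar.gz"},
        {"image": "example-model-image-uri",
         "model_data": "s3://example-bucket/model/model.tar.gz"},
    ],
)
# With AWS access, this would be passed as boto3.client("sagemaker").create_model(**request).
print([c["Image"] for c in request["Containers"]])
```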

Exam trap

The trap here is that candidates often assume scaling out (Option C) is the universal fix for latency, but the question specifically targets latency under load caused by preprocessing overhead, not throughput limits.

How to eliminate wrong answers

Option A is wrong because enabling data capture for monitoring adds I/O overhead and storage writes, which can increase latency rather than reduce it. Option B is wrong because batch transform is designed for offline, asynchronous predictions on large datasets, not for real-time inference; switching to it would break the real-time requirement and introduce significant latency due to job queuing. Option C is wrong because adding instances increases throughput and reduces queuing under load, but it does not remove the per-request preprocessing overhead that this question targets.

53

MCQhard

A data scientist ran a hyperparameter tuning job for an XGBoost model. The tuning job completed, but the best validation RMSE is 2.34. The data scientist believes the model can perform better. Based on the exhibit, which change to the tuning strategy is most likely to improve the model's performance?

A.Use random search instead of Bayesian optimization

B.Change the objective to binary:logistic

C.Increase the maximum value of eta to 1.0

D.Increase the static num_round hyperparameter to 500

AnswerD

The tuning job fixed num_round to 100; increasing it allows more boosting rounds, which can improve model performance.

Why this answer

Option D is correct because increasing the static `num_round` hyperparameter to 500 lets the model train for more boosting rounds, which can reduce underfitting and lower the RMSE. Because `num_round` was fixed at 100 in every trial, the tuner could never explore longer training, so the model may not have converged; additional rounds let XGBoost learn more complex patterns, provided overfitting is monitored with early stopping.
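Once `num_round` is raised, early stopping keeps the extra rounds from turning into overfitting: training halts when the validation metric has not improved for a set number of rounds, and the best round is kept. The sketch below is a plain-Python illustration of that rule on a made-up sequence of validation RMSE values; in the built-in XGBoost algorithm, the corresponding hyperparameter is `early_stopping_rounds`.

```python
def early_stopping(val_scores, patience):
    """Return (best_round, stop_round) for a lower-is-better validation metric."""
    best_round, best = 0, val_scores[0]
    for i, score in enumerate(val_scores):
        if score < best:
            best_round, best = i, score
        elif i - best_round >= patience:
            return best_round, i          # no improvement for `patience` rounds
    return best_round, len(val_scores) - 1


rmse = [3.10, 2.71, 2.48, 2.35, 2.36, 2.37, 2.38, 2.40]   # made-up per-round values
print(early_stopping(rmse, patience=3))   # (3, 6)
```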

Exam trap

The trap here is that candidates may think increasing `eta` to 1.0 accelerates learning, but they overlook that a high learning rate without sufficient boosting rounds or regularization often causes the model to overshoot the optimal solution, degrading RMSE.

How to eliminate wrong answers

Option A is wrong because random search is less efficient than Bayesian optimization for hyperparameter tuning, as it does not learn from previous trials to focus on promising regions, so switching to random search would likely degrade performance. Option B is wrong because changing the objective to binary:logistic is for binary classification tasks, but the RMSE metric indicates a regression problem, so this would be a fundamental mismatch. Option C is wrong because increasing the maximum value of `eta` (learning rate) to 1.0 would make the model take overly large steps during training, likely causing divergence or poor convergence, which would worsen RMSE rather than improve it.

54

MCQhard

A data scientist is training an LSTM model for time series forecasting using Amazon SageMaker. The model is overfitting. Which action is LEAST likely to reduce overfitting?

A.Add dropout layers

B.Increase the number of LSTM layers

C.Reduce the number of hidden units

D.Use early stopping

AnswerB

Adding LSTM layers increases model capacity, which is likely to worsen overfitting.

Why this answer

Option B is correct because increasing the number of LSTM layers typically increases model complexity, which can worsen overfitting rather than reduce it. Option A is wrong because dropout reduces overfitting. Option C is wrong because reducing hidden units reduces complexity.

Option D is wrong because early stopping prevents overfitting.
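The capacity argument can be made concrete by counting parameters. Assuming the common formulation with four gates and one bias vector per gate (as in Keras; PyTorch keeps two bias vectors, which adds 4 * hidden parameters per layer), each LSTM layer has 4 * (hidden * (input + hidden) + hidden) parameters, and every stacked layer after the first takes the previous layer's hidden state as its input.

```python
def lstm_layer_params(input_size, hidden_size):
    """Parameters in one LSTM layer: 4 gates, each with input weights, recurrent weights, one bias."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def stacked_lstm_params(input_size, hidden_size, num_layers):
    """Total parameters when layers are stacked; later layers consume the previous hidden state."""
    total = lstm_layer_params(input_size, hidden_size)
    for _ in range(num_layers - 1):
        total += lstm_layer_params(hidden_size, hidden_size)
    return total


for layers in (1, 2, 3):
    print(layers, stacked_lstm_params(input_size=10, hidden_size=32, num_layers=layers))
# 1 5504
# 2 13824
# 3 22144
```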

55

MCQeasy

A data scientist is training a regression model on a dataset with 50 features. After training a linear regression model, the model achieves an R-squared of 0.85 on the training set but only 0.55 on the test set. Which technique is most likely to reduce the generalization error?

A.Add more features

B.Remove highly correlated features

C.Increase the polynomial degree of the model

D.Apply L2 regularization (Ridge regression)

AnswerD

L2 regularization shrinks coefficients, reducing variance and improving test performance.

Why this answer

The model exhibits high variance (overfitting): high training R² (0.85) but much lower test R² (0.55). L2 regularization (Ridge regression) shrinks coefficients toward zero, reducing model complexity and penalizing large weights, which directly combats overfitting and improves generalization to unseen data.
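Ridge regression has the closed form w = (XᵀX + λI)⁻¹Xᵀy, and the coefficient norm shrinks as λ grows. The sketch below fits it on synthetic, centered data (so no intercept is needed) for several values of λ; the data-generating setup is an arbitrary assumption for illustration, and in practice λ would be chosen by cross-validation.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; X and y are assumed centered)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=2.0, size=40)    # only feature 0 matters
X = X - X.mean(axis=0)
y = y - y.mean()

norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print([round(float(n), 3) for n in norms])   # norms shrink as lambda grows
```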

Exam trap

AWS often tests the distinction between overfitting (high variance) and underfitting (high bias), and candidates mistakenly choose feature removal or polynomial adjustment when regularization is the direct fix for variance-dominated error.

How to eliminate wrong answers

Option A is wrong because adding more features would increase model complexity and likely worsen overfitting, not reduce generalization error. Option B is wrong because removing highly correlated features addresses multicollinearity, which inflates coefficient variance but is not the primary cause of the large train-test gap (overfitting) seen here. Option C is wrong because increasing the polynomial degree would further increase model flexibility and exacerbate overfitting, leading to even lower test performance.

56

MCQmedium

A data scientist is training a multiclass classification model to categorize support tickets into 50 categories. The dataset has 100,000 labeled tickets. The scientist uses a random forest classifier with 100 trees. The model achieves 90% accuracy on the test set, but the F1-score for some rare categories is below 0.1. The scientist wants to improve performance on rare categories without significantly reducing overall accuracy. Which approach should the scientist try?

A.Increase the maximum depth of trees

B.Reduce the number of trees to 50 to prevent overfitting

C.Switch to a one-vs-rest logistic regression model

D.Use class_weight='balanced' or compute custom class weights

AnswerD

Class weights penalize misclassifications of rare classes more heavily.

Why this answer

Option D (class weights) is correct because it makes the model penalize errors on rare classes more heavily, focusing it on those classes. Option B (reducing the number of trees) may hurt overall performance. Option C (one-vs-rest logistic regression) may not handle rare classes well.

Option A (increasing max_depth) could overfit.
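The `'balanced'` heuristic (as used by scikit-learn's `class_weight='balanced'`) sets each class weight to n_samples / (n_classes * n_samples_in_class). The sketch below computes it for a toy label list; the category names and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight(c) = n_samples / (n_classes * count(c)), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


tickets = ["billing"] * 90 + ["outage"] * 10      # hypothetical categories
print(balanced_class_weights(tickets))
```

The rare class ends up with a weight nine times larger than the common one, matching the 90:10 ratio.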

57

MCQhard

A team is deploying a model for fraud detection. The dataset is highly imbalanced (99% legitimate, 1% fraudulent). They trained a logistic regression model and achieved 99% accuracy on the test set. However, the model fails to detect most fraud cases. Which metric should the team focus on to evaluate the model?

A.Mean squared error

B.Precision

C.Recall

D.Accuracy

AnswerC

Recall measures the proportion of actual fraud cases correctly identified.

Why this answer

Accuracy is misleading for imbalanced datasets. Recall (true positive rate) measures how many actual fraud cases the model detects. Option D is wrong because accuracy is already high but misleading.

Option B is wrong because precision can be high while recall stays low. Option A is wrong because mean squared error is a regression metric.

58

MCQmedium

A company's machine learning model is overfitting to the training data. The data scientist has already tried reducing the model complexity and adding regularization, but the model still overfits. Which technique could the data scientist use to further reduce overfitting?

A.Use data augmentation to increase the training dataset size

B.Decrease the batch size

C.Increase the number of training epochs

D.Increase the learning rate

AnswerA

Data augmentation creates more training examples, which helps the model generalize better and reduces overfitting.

Why this answer

Data augmentation artificially increases the size and diversity of the training dataset by applying transformations (e.g., rotations, flips, noise injection) to existing samples. This exposes the model to more varied examples, reducing its tendency to memorize noise and improving generalization — directly countering overfitting when other methods have failed.
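A minimal augmentation sketch for image arrays: each batch is extended with horizontally flipped copies and copies with small Gaussian noise, and the labels are repeated to match. The noise level and the choice of transformations are assumptions; suitable augmentations depend on the data (for example, horizontal flips do not preserve the label in every image task).

```python
import numpy as np


def augment(images, labels, noise_std=0.05, seed=None):
    """Return the originals plus flipped and noisy copies; labels repeated to match.

    images: array of shape (N, H, W) with pixel values scaled to [0, 1].
    """
    rng = np.random.default_rng(seed)
    flipped = images[:, :, ::-1]                                   # horizontal flip
    noisy = np.clip(images + rng.normal(0.0, noise_std, images.shape), 0.0, 1.0)
    X = np.concatenate([images, flipped, noisy], axis=0)
    y = np.concatenate([labels, labels, labels], axis=0)
    return X, y


images = np.random.default_rng(1).random((4, 8, 8))
labels = np.array([0, 1, 0, 1])
X_aug, y_aug = augment(images, labels, seed=2)
print(X_aug.shape, y_aug.shape)   # (12, 8, 8) (12,)
```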

Exam trap

AWS often tests the misconception that hyperparameter tuning (e.g., batch size, learning rate) is a primary cure for overfitting, when in fact these parameters mainly affect optimization dynamics, not the underlying data scarcity or memorization problem that data augmentation directly addresses.

How to eliminate wrong answers

Option B is wrong because decreasing the batch size introduces noisier gradient estimates, which can sometimes act as a mild regularizer, but it does not fundamentally address overfitting caused by insufficient or repetitive training data. Option C is wrong because increasing the number of training epochs typically worsens overfitting by allowing the model more iterations to memorize the training set. Option D is wrong because increasing the learning rate can destabilize training (e.g., divergence) and does not reduce overfitting; it may even cause the model to skip over generalizable minima.

59

MCQmedium

A machine learning engineer is deploying a model to an Amazon SageMaker endpoint for real-time inference. The model requires a preprocessing step that involves tokenizing text and converting it to a numerical format. To minimize latency, where should the preprocessing logic be implemented?

A.Inside the SageMaker inference container using the inference.py script

B.Using Amazon SageMaker batch transform

C.As a separate AWS Lambda function called before the endpoint

D.On the client side before sending the request

AnswerA

Including preprocessing in the container reduces latency by processing data locally.

Why this answer

To minimize latency, it's best to include the preprocessing logic inside the inference container that serves the model. This avoids additional network calls to separate preprocessing services.
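The SageMaker framework serving containers (for example, PyTorch and scikit-learn) look for handler functions named `model_fn`, `input_fn`, `predict_fn`, and `output_fn` in `inference.py`. The sketch below places tokenization in `input_fn`; the vocabulary, the whitespace tokenizer, and the stub model returned by `model_fn` are hypothetical placeholders for real artifacts loaded from `model_dir`.

```python
import json

# Hypothetical vocabulary; a real container would load the tokenizer from model_dir.
VOCAB = {"<pad>": 0, "<unk>": 1, "refund": 2, "late": 3, "great": 4, "service": 5}
MAX_LEN = 8


def tokenize(text):
    """Whitespace tokenizer mapping words to ids, padded/truncated to MAX_LEN."""
    ids = [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]
    return (ids + [VOCAB["<pad>"]] * MAX_LEN)[:MAX_LEN]


def model_fn(model_dir):
    """Placeholder model: flags any request that mentions 'refund'."""
    return lambda batch: [int(VOCAB["refund"] in ids) for ids in batch]


def input_fn(request_body, request_content_type):
    """Deserialize and tokenize inside the container, so no extra network hop is needed."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return [tokenize(text) for text in json.loads(request_body)["instances"]]


def predict_fn(input_data, model):
    return model(input_data)


def output_fn(prediction, response_content_type):
    return json.dumps({"predictions": prediction})


# Local smoke test of the handler chain.
model = model_fn("/opt/ml/model")
body = json.dumps({"instances": ["Refund please", "great service"]})
print(output_fn(predict_fn(input_fn(body, "application/json"), model), "application/json"))
# {"predictions": [1, 0]}
```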

60

MCQmedium

A data scientist is using SageMaker to train a deep learning model for image classification. The training job is taking too long. Which approach can reduce training time?

A.Use SageMaker's distributed data parallelism

B.Use SageMaker Neo to compile the model

C.Increase the number of epochs

D.Use a smaller image size

AnswerA

Distributed training speeds up training by parallelizing across GPUs.

Why this answer

SageMaker's distributed data parallelism splits the training data across multiple GPUs or instances, allowing each worker to process a different subset of the data simultaneously. This reduces the wall-clock time per epoch by parallelizing the computation, which directly addresses the 'taking too long' issue for deep learning image classification models.
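Data parallelism can be illustrated with plain NumPy: the batch is split into shards, each worker computes the gradient on its shard, and the gradients are averaged (the all-reduce step). For a mean-squared-error linear model with equal-size shards, the averaged gradient equals the full-batch gradient, so the update is unchanged while the work is divided. This sketch runs the workers sequentially and only stands in for what a library such as SageMaker's distributed data parallel library does across GPUs.

```python
import numpy as np


def mse_gradient(X, y, w):
    """Gradient of mean squared error for a linear model X @ w."""
    return 2.0 / len(y) * X.T @ (X @ w - y)


def data_parallel_gradient(X, y, w, n_workers):
    """Shard the batch, compute per-worker gradients, then average them (all-reduce)."""
    X_shards = np.array_split(X, n_workers)
    y_shards = np.array_split(y, n_workers)
    # In a real job each shard lives on a different GPU and these run concurrently.
    grads = [mse_gradient(Xs, ys, w) for Xs, ys in zip(X_shards, y_shards)]
    return np.mean(grads, axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = rng.normal(size=64)
w = np.zeros(5)
print(np.allclose(data_parallel_gradient(X, y, w, n_workers=4), mse_gradient(X, y, w)))  # True
```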

Exam trap

AWS often tests the distinction between training acceleration (distributed data parallelism) and inference optimization (Neo), leading candidates to mistakenly choose Neo for training speed improvements.

How to eliminate wrong answers

Option B is wrong because SageMaker Neo compiles trained models for optimized inference on target hardware, not for speeding up training. Option C is wrong because increasing the number of epochs increases training time, the opposite of what is needed. Option D is wrong because shrinking the images changes the input data rather than accelerating training itself; it can reduce compute per step, but it risks lowering model accuracy.

61

MCQhard

A data scientist uses SageMaker Autopilot to automatically build a binary classification model. The dataset has 50 features and 100,000 rows. After the experiment, Autopilot provides multiple candidate models. Which candidate should the data scientist select to minimize inference latency for real-time predictions?

A.The model with the smallest memory footprint

B.The model with the highest validation accuracy

C.The model with the lowest validation loss

D.The model that is a linear learner

AnswerD

Linear models are fast for inference due to simple computations.

Why this answer

SageMaker Autopilot explores various algorithms including linear models, tree-based ensembles, and neural networks. For real-time inference with low latency, simpler models like linear or logistic regression or shallow decision trees are preferred. XGBoost with many trees or deep neural networks increase latency.

62

MCQhard

A team is training a deep learning model using TensorFlow on a single GPU instance in SageMaker. The GPU utilization is below 30%. Which change will MOST improve GPU utilization?

A.Reduce the number of epochs

B.Increase the batch size

C.Use SageMaker Distributed Training with multiple GPUs

D.Switch to a CPU instance

AnswerB

Larger batches keep the GPU busy with more data per iteration.

Why this answer

Increasing the batch size makes more efficient use of GPU memory and parallel processing, improving utilization.

63

MCQmedium

A data scientist is using Amazon SageMaker Autopilot to automatically build a binary classification model. The dataset has 50 features and 100,000 rows. After the experiment completes, the best candidate model achieves an F1 score of 0.85 on the validation set. However, when deployed to a real-time endpoint, the model's F1 score drops to 0.72 on production data. The data distributions between training and production are similar. What is the MOST likely cause of the performance drop?

A.Concept drift occurred between training and production.

B.The production data contains missing values that were not present in training.

C.The inference endpoint uses a different instance type than training.

D.The Autopilot pipeline used features that are not available at inference time (data leakage).

AnswerD

If Autopilot used future information or features derived from the target, the validation score would be inflated.

Why this answer

Option D is correct. Data leakage, such as features that are derived from the target or are not available at inference time, can produce overly optimistic validation scores. Option A is wrong because similar distributions suggest no drift.

Option B is wrong because Autopilot handles missing values. Option C is wrong because the inference instance type typically does not affect prediction quality.

64

MCQmedium

A company is building a binary classifier to predict equipment failure. The dataset has 99% negative (no failure) and 1% positive (failure) examples. The data scientist uses a random forest model with default settings. The model achieves 99% accuracy on the test set but fails to identify any actual failures. Which metric should the data scientist use to evaluate the model?

A.RMSE

B.R-squared

C.Recall

D.Precision

AnswerC

Recall measures the proportion of actual positives correctly identified, which is critical for imbalanced data.

Why this answer

Recall (sensitivity) measures the proportion of actual positive cases correctly identified. With 99% negative examples, a model can achieve 99% accuracy by simply predicting 'no failure' for all instances, but this yields 0% recall for the failure class. Since the goal is to detect rare failures, recall is the appropriate metric to evaluate the model's ability to find positive cases.

Exam trap

The trap here is that candidates see 99% accuracy and assume the model is performing well, failing to recognize that accuracy is a poor metric for imbalanced datasets, and they overlook recall as the metric that reveals the model's inability to detect the minority class.

How to eliminate wrong answers

Option A is wrong because RMSE (Root Mean Squared Error) is a regression metric used for continuous target variables, not for binary classification problems. Option B is wrong because R-squared measures the proportion of variance explained in a regression model, which is meaningless for evaluating a binary classifier's ability to detect failures. Option D is wrong because precision measures the proportion of predicted positives that are actually positive; while useful, it does not capture the model's failure to identify any actual failures. If the model predicts no positives at all, precision is 0/0 and therefore undefined, whereas recall is exactly 0% and exposes the problem directly.


65

Multi-Selecthard

A company is using Amazon SageMaker to deploy a model for real-time inference. The model takes 200 ms to respond, but the requirement is 100 ms. Which THREE actions could reduce latency? (Choose THREE.)

Select 3 answers

A.Use a larger instance with more compute capacity

B.Prune the model to remove unnecessary weights

C.Switch to a CPU-based instance

D.Use SageMaker Neo to compile the model for the target instance

E.Increase the batch size for inference

AnswersA, B, D

More powerful instances reduce inference time.

Why this answer

Using a more powerful instance reduces compute time. Model pruning reduces model size and computation. SageMaker Neo optimizes models for target hardware.

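Option C is wrong because CPU instances are typically slower than GPU instances for deep learning inference. Option E is wrong because a larger batch size can improve throughput but increases the latency of individual real-time requests, which must wait for the batch to be assembled and processed.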
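As an illustration of option B, a minimal sketch of unstructured magnitude pruning, one common pruning technique (the question does not name a specific method, so this choice is an assumption). It zeroes the fraction of weights with the smallest absolute values. Zeroed weights only reduce latency if the runtime or compiler exploits the sparsity, or if pruned structures (whole neurons, channels) are physically removed.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to 0.0.

    `sparsity` is the fraction of weights to remove, in [0, 1].
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Hypothetical weights of one small layer, flattened.
layer = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(magnitude_prune(layer, 0.5))  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```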


66

MCQmedium

A company is training a deep learning model on a large dataset using Amazon SageMaker. The training script uses TensorFlow and requires GPUs. The training job is failing with an out-of-memory error. Which configuration change should be made to resolve this issue?

A.Use a larger instance type with more GPU memory.

B.Increase the number of instances in the training job.

C.Switch to using spot instances to reduce cost.

D.Enable distributed training across multiple instances.

AnswerA

Larger instance types have more GPU memory, resolving the OOM error.

Why this answer

The training job is failing with an out-of-memory error, which indicates that the model or batch size exceeds the GPU memory capacity of the current instance. Using a larger instance type with more GPU memory directly addresses this by providing additional VRAM, allowing the model to fit in memory and the training to proceed without failure.
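To see why a model may not fit, a rough back-of-the-envelope sketch, assuming full-precision (FP32) training with the Adam optimizer: each parameter needs about 4 bytes for the weight, 4 for its gradient, and 8 for Adam's two moment estimates, roughly 16 bytes in total. Activations, which grow with batch size, come on top of this and are only passed in as a hypothetical estimate; the model size and GPU sizes below are illustrative, not taken from the question.

```python
# Weight + gradient + Adam first and second moments, all stored in FP32.
BYTES_PER_PARAM_FP32_ADAM = 4 + 4 + 8


def static_training_memory_gib(num_params, bytes_per_param=BYTES_PER_PARAM_FP32_ADAM):
    """Approximate GiB for weights, gradients and optimizer state (no activations)."""
    return num_params * bytes_per_param / 2**30


def fits_on_gpu(num_params, gpu_memory_gib, activation_gib=0.0):
    """True if the static footprint plus an activation estimate fits in GPU memory."""
    return static_training_memory_gib(num_params) + activation_gib <= gpu_memory_gib


# Hypothetical 1-billion-parameter model with ~4 GiB of activations.
need = static_training_memory_gib(1_000_000_000)
print(f"{need:.1f} GiB")                                   # 14.9 GiB
print(fits_on_gpu(1_000_000_000, 16, activation_gib=4.0))  # False
print(fits_on_gpu(1_000_000_000, 32, activation_gib=4.0))  # True
```

Only more memory per GPU (or a smaller per-GPU footprint) changes the outcome; adding identical instances does not.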

Exam trap

The trap here is that candidates confuse horizontal scaling (adding instances) with vertical scaling (increasing instance size), assuming that more instances will fix a per-GPU memory limit, when in fact data-parallel distributed training requires the full model to fit on each GPU unless model parallelism is explicitly implemented.

How to eliminate wrong answers

Option B is wrong because increasing the number of instances does not increase the GPU memory available to a single training process; it only distributes the workload across multiple machines, which does not resolve a local out-of-memory error on each GPU. Option C is wrong because switching to spot instances reduces cost but does not change the instance's hardware specifications, so the GPU memory remains the same and the out-of-memory error persists. Option D is wrong because standard (data-parallel) distributed training places a full copy of the model on every GPU and splits only the data, so it does not increase per-GPU memory. If the model itself does not fit on a single GPU, you need model parallelism or a larger instance; otherwise the out-of-memory error can recur on each GPU.


67

MCQmedium

A data scientist needs to choose an algorithm for a regression problem with 50 features and 1 million training examples. The model must be interpretable and the training data fits in memory. Which algorithm is most appropriate?

A.Principal Component Analysis (PCA)

B.Linear regression

C.XGBoost

D.k-Nearest Neighbors

AnswerB

Linear regression is interpretable and efficient for large datasets.

Why this answer

Option B is correct because linear regression is interpretable (each coefficient is the change in the prediction per one-unit change in a feature, holding the others constant), scales well to large datasets, and is suitable for regression. Option C is wrong because XGBoost, an ensemble of boosted trees, is less interpretable. Option D is wrong because k-NN is computationally expensive at inference and not interpretable.

Option A is wrong because PCA is dimensionality reduction, not a regression algorithm.
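To illustrate the interpretability argument, a minimal sketch of ordinary least squares for a single feature using the closed-form slope and intercept; the fitted slope reads directly as "change in the prediction per one-unit change in the feature". The data is hypothetical, and with 50 features one would use a library solver (for example scikit-learn's LinearRegression) rather than this hand-written formula.

```python
def fit_simple_ols(x, y):
    """Closed-form least squares for y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x (no noise).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)  # 1.0 2.0
```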


68

Multi-Selecteasy

A data scientist is training a linear regression model and wants to check for multicollinearity among the features. Which TWO methods can be used to detect multicollinearity? (Choose TWO.)

Select 2 answers

A.Examine the R-squared value of the model

B.Compute the correlation matrix between features

C.Check the p-values of the coefficients

D.Calculate Variance Inflation Factor (VIF) for each feature

E.Plot the residuals vs. fitted values

AnswersB, D

High pairwise correlations between features (e.g., >0.8) suggest multicollinearity.

Why this answer

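Option B is correct because computing the correlation matrix between features directly reveals pairwise linear relationships: high correlation coefficients (e.g., >0.8 or <-0.8) between two predictors indicate potential multicollinearity, which can destabilize coefficient estimates in linear regression. Option D is correct because the Variance Inflation Factor for feature j is 1 / (1 - R_j²), where R_j² comes from regressing feature j on all other features. VIF also catches multicollinearity involving more than two features, and values above 5 or 10 are commonly treated as a warning sign.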
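A minimal sketch of both checks for the special case of exactly two predictors, where regressing one feature on the other gives R_j² = r², so VIF = 1 / (1 - r²). With more predictors, R_j² must come from regressing feature j on all the others (statsmodels provides `variance_inflation_factor` for that general case). The feature values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of feature columns."""
    return [[pearson(c1, c2) for c2 in columns] for c1 in columns]


def vif_two_predictors(a, b):
    """VIF of either predictor when the model has exactly these two features."""
    r2 = pearson(a, b) ** 2
    return math.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Hypothetical features: number of rooms rises with square footage.
sqft = [1000, 1500, 2000, 2500, 3000]
rooms = [2, 3, 3, 4, 5]
print(round(pearson(sqft, rooms), 3))             # 0.971
print(round(vif_two_predictors(sqft, rooms), 1))  # 17.3
```

Both signals agree here: a correlation near 0.97 and a VIF well above 10.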

Exam trap

AWS often tests the distinction between diagnosing model fit (R-squared, residual plots) and diagnosing predictor multicollinearity, leading candidates to mistakenly choose methods that evaluate model performance rather than feature interdependence.


69

Multi-Selecthard

A machine learning engineer is using SageMaker's built-in XGBoost algorithm for a multi-class classification problem. The training job completes but the model accuracy is low. Which THREE hyperparameters should the engineer tune to improve performance?

Select 3 answers

A.eta (learning rate)

B.num_round

C.subsample

D.max_depth

E.colsample_bytree

AnswersA, B, D

Learning rate controls contribution of each tree; tuning helps convergence.

Why this answer

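XGBoost hyperparameters 'num_round' (number of boosting rounds), 'eta' (learning rate), and 'max_depth' (maximum tree depth) are the most direct levers for improving accuracy; eta and num_round should be tuned together, because a smaller learning rate usually needs more rounds. 'subsample' (row sampling) and 'colsample_bytree' (feature sampling per tree) mainly act as regularizers against overfitting, so they can help but are less direct. Other parameters such as 'min_child_weight' also matter, but the three above are the ones most commonly tuned first.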
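A minimal sketch of a manual grid over the three hyperparameters, assuming the SageMaker built-in XGBoost hyperparameter names (`eta`, `max_depth`, `num_round`, `objective`, `num_class`). The candidate values and `num_class=5` are hypothetical; in practice SageMaker Automatic Model Tuning would usually search such ranges instead of a hand-built grid.

```python
from itertools import product

# Fixed settings for a multi-class problem (num_class=5 is a hypothetical value).
base = {"objective": "multi:softmax", "num_class": 5}

# Hypothetical candidate values. A smaller eta usually needs more rounds.
grid = {
    "eta": [0.05, 0.1, 0.3],
    "max_depth": [4, 6, 8],
    "num_round": [100, 300],
}

# One hyperparameter dict per combination, in the grid's key order.
candidates = [
    {**base, **dict(zip(grid, values))}
    for values in product(*grid.values())
]

print(len(candidates))  # 18
print(candidates[0])
# {'objective': 'multi:softmax', 'num_class': 5, 'eta': 0.05, 'max_depth': 4, 'num_round': 100}
```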


70

MCQmedium

A company is building a multiclass classification model using Amazon SageMaker. The dataset has 100 classes and is highly imbalanced. The model currently achieves high accuracy on the majority classes but poor performance on minority classes. Which technique should the data scientist use to improve minority class performance?

A.Apply random oversampling with replacement

B.Apply principal component analysis (PCA)

C.Use class weights to penalize misclassifications of minority classes

D.Remove samples from majority classes

AnswerC

Class weights rebalance the loss function so that errors on minority classes cost more.

Why this answer

Option C is correct because class weights penalize errors on minority classes more heavily, improving their recall. Option A is wrong because random oversampling with replacement duplicates minority examples, which may cause overfitting to those examples. Option B is wrong because principal component analysis (PCA) is dimensionality reduction and does not address class imbalance.

Option D is wrong because removing samples from majority classes discards useful data.
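A minimal sketch of one common weighting scheme, inverse class frequency, where weight_c = n_samples / (n_classes * count_c); this is the formula scikit-learn uses for `class_weight='balanced'`. The labels are hypothetical, and how the weights reach training depends on the framework (for example, per-sample weights or a weighted loss).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical, heavily imbalanced labels.
labels = ["normal"] * 90 + ["rare_a"] * 8 + ["rare_b"] * 2
weights = balanced_class_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})
# {'normal': 0.37, 'rare_a': 4.167, 'rare_b': 16.667}
```

With these weights, every class contributes the same total weight (count times weight equals n_samples / n_classes), so the rarest class is no longer drowned out in the loss.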


71

Multi-Selectmedium

Which THREE of the following are valid metrics for evaluating a regression model?

Select 3 answers

A.R-squared (R²)

B.F1 score

C.Mean Absolute Error (MAE)

D.Root Mean Squared Error (RMSE)

E.Accuracy

AnswersA, C, D

R-squared is a common regression metric.

Why this answer

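R-squared (R²) is a valid regression metric that measures the proportion of variance in the dependent variable explained by the model. For a least-squares fit with an intercept it lies between 0 and 1 on the training data, with higher values indicating a better fit; on held-out data it can be negative when the model does worse than simply predicting the mean. Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are also regression metrics: both express error in the target's units, and RMSE penalizes large errors more heavily.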
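A minimal sketch of the three regression metrics from their standard definitions, on hypothetical values; the last line shows R² going negative for predictions worse than the mean.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """1 - residual sum of squares / total sum of squares around the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(round(mae(y_true, y_pred), 3))        # 0.5
print(round(rmse(y_true, y_pred), 3))       # 0.612
print(round(r_squared(y_true, y_pred), 3))  # 0.949
print(r_squared(y_true, [10.0] * 4) < 0)    # True
```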

Exam trap

The trap here is that candidates confuse classification metrics (F1 score, Accuracy) with regression metrics, especially when the question asks for 'valid metrics' without specifying the model type, leading them to select metrics they are more familiar with from classification tasks.


72

Multi-Selecthard

A data scientist is using SageMaker to train a deep learning model. The training job runs on a single GPU instance and is taking too long. Which THREE actions can the data scientist take to reduce training time? (Choose three.)

Select 3 answers

A.Increase the size of the EBS volume attached to the instance.

B.Increase the number of instances but keep the same total data.

C.Switch to Pipe input mode to reduce I/O waiting time.

D.Use a larger GPU instance type, such as p3.16xlarge.

E.Use distributed training across multiple GPU instances.

AnswersC, D, E

Streams data directly, reducing I/O bottleneck.

Why this answer

Option C is correct because Pipe input mode streams training data from Amazon S3 into the training container while the job runs, whereas the default File mode downloads the entire dataset to the instance's EBS volume before training starts. This shortens start-up time and reduces the time the GPU spends idle waiting for data, especially for large datasets. Options D and E are correct because a larger GPU instance and distributed training across multiple GPU instances both add compute capacity.
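A configuration sketch combining options C, D and E, written as keyword arguments for the SageMaker Python SDK's `sagemaker.estimator.Estimator` class, which accepts `image_uri`, `role`, `instance_count`, `instance_type` and `input_mode`. The image URI, role ARN and S3 path are hypothetical placeholders, and the SDK call is left in comments so the snippet runs without AWS credentials. Note that Pipe mode also needs a training script or container that reads from the pipe rather than from files, and `instance_count > 1` only helps if the script is written for distributed training.

```python
# Placeholder values (hypothetical): replace with a real training image and IAM role.
estimator_kwargs = {
    "image_uri": "<training-image-uri>",
    "role": "<sagemaker-execution-role-arn>",
    "instance_type": "ml.p3.16xlarge",  # option D: larger GPU instance (8 GPUs)
    "instance_count": 2,                # option E: requires distributed-training code
    "input_mode": "Pipe",               # option C: stream from S3 instead of downloading first
}

# With the SageMaker Python SDK installed and AWS credentials configured:
# from sagemaker.estimator import Estimator
# estimator = Estimator(**estimator_kwargs)
# estimator.fit({"train": "s3://<bucket>/<prefix>/train"})

print(estimator_kwargs["input_mode"], estimator_kwargs["instance_count"])  # Pipe 2
```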

Exam trap

The trap here is that candidates may confuse increasing instance count (Option B) with distributed training (Option E), not realizing that simply adding instances without distributed training code and configuration does not parallelize the workload and can even increase overhead.


73

Multi-Selectmedium

A data scientist is building a regression model to predict housing prices. The dataset includes numerical features such as square footage, number of bedrooms, and year built, as well as categorical features such as neighborhood and roof type. Which TWO preprocessing steps are most important to apply before training a linear regression model?

Select 2 answers

A.Apply principal component analysis (PCA) for dimensionality reduction

B.One-hot encode categorical features

C.Remove outliers using IQR

D.Add interaction terms between all features

E.Normalize or standardize numerical features

AnswersB, E

One-hot encoding converts categorical variables into numerical form suitable for linear regression.

Why this answer

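Linear regression requires numerical inputs, so categorical features such as neighborhood and roof type must be encoded, for example with one-hot encoding. Scaling numerical features does not change the predictions of ordinary least squares, but it puts features such as square footage and number of bedrooms on comparable scales, speeds up gradient-based solvers, and matters for regularized variants (ridge, lasso), where the penalty would otherwise treat features unevenly.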
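A minimal sketch of both steps in plain Python on hypothetical rows: one-hot encoding for neighborhood and z-score standardization for square footage. In a real pipeline, the categories, mean and standard deviation should be fit on the training split only and reused on validation and test data.

```python
import statistics


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector (sorted category order)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def standardize(values):
    """Z-score: (x - mean) / population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical rows.
neighborhood = ["north", "south", "north", "east"]
sqft = [1200.0, 1800.0, 1500.0, 2100.0]

cats, encoded = one_hot(neighborhood)
print(cats)     # ['east', 'north', 'south']
print(encoded)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print([round(z, 3) for z in standardize(sqft)])  # [-1.342, 0.447, -0.447, 1.342]
```

When the model has an intercept, dropping one indicator column per categorical feature avoids perfect collinearity among the dummies (the "dummy variable trap").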


74

MCQhard

A data scientist is building a recommendation system for an e-commerce platform. The dataset contains user interactions (clicks, purchases) and item metadata. The scientist wants to use matrix factorization. Which algorithm should be used?

A.SageMaker Image Classification

B.SageMaker BlazingText

C.SageMaker XGBoost

D.SageMaker Factorization Machines

AnswerD

Factorization Machines are designed for recommendation and matrix factorization.

Why this answer

Option D is correct because SageMaker provides the Factorization Machines algorithm, which models pairwise feature interactions through low-rank factors and is commonly used for recommendation on sparse user-item data. Option C is wrong because XGBoost is a gradient-boosted tree algorithm, not matrix factorization. Option B is wrong because BlazingText is for text (word embeddings and text classification).

Option A is wrong because Image Classification is for images.
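To show how factorization machines relate to matrix factorization, a minimal sketch of the degree-2 FM prediction, y = w0 + Σ w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j, computed both naively and with the standard O(k·n) reformulation. With a one-hot user and a one-hot item as the only inputs, the pairwise term reduces to the dot product of one user factor and one item factor, which is matrix factorization plus bias terms. All parameter values are hypothetical, and this is not the SageMaker implementation.

```python
def fm_predict_naive(x, w0, w, V):
    """Degree-2 FM: bias + linear terms + factorized pairwise interactions."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same value in O(k*n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical input: one-hot user (2 users) followed by one-hot item (2 items).
x = [1.0, 0.0, 0.0, 1.0]  # user 0 interacted with item 1
w0, w = 0.1, [0.2, -0.1, 0.05, 0.3]
V = [[0.5, 0.1], [0.2, 0.4], [0.3, -0.2], [0.6, 0.7]]  # k = 2 factors per feature

print(round(fm_predict_naive(x, w0, w, V), 4))  # 0.97
print(round(fm_predict_fast(x, w0, w, V), 4))   # 0.97
```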


75

MCQmedium

A company is using Amazon SageMaker to train a deep learning model on a large dataset. The training job is taking too long. The team wants to reduce training time without changing the model architecture. Which action should they take?

A.Increase the learning rate by a factor of 10

B.Use SageMaker's distributed training with multiple instances

C.Reduce the number of epochs

D.Reduce the batch size

AnswerB

Distributed training parallelizes the workload.

Why this answer

SageMaker's distributed training with multiple instances splits the dataset and model computations across several machines, enabling parallel processing that significantly reduces wall-clock training time. This approach leverages data parallelism or model parallelism without altering the model architecture, directly addressing the need for faster training.
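A minimal sketch of the data-parallel idea in plain Python, assuming a linear model with mean-squared-error loss and hypothetical data: each simulated worker computes the gradient on its own equal-sized shard of the batch, and averaging those gradients (the role an all-reduce step plays in real frameworks) gives exactly the full-batch gradient. The update is unchanged while the work is split. Real SageMaker jobs would use a distributed-training library such as SageMaker's data parallelism library or Horovod, not this code.

```python
def mse_gradient(w, b, xs, ys):
    """Gradient of mean((w*x + b - y)^2) with respect to (w, b)."""
    n = len(xs)
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    return grad_w, grad_b


def data_parallel_gradient(w, b, xs, ys, num_workers):
    """Split the batch into equal shards, compute per-shard gradients, then average."""
    shard = len(xs) // num_workers
    if shard * num_workers != len(xs):
        raise ValueError("this sketch assumes equal-sized shards")
    grads = [
        mse_gradient(w, b, xs[k * shard:(k + 1) * shard], ys[k * shard:(k + 1) * shard])
        for k in range(num_workers)
    ]
    # Averaging stands in for the all-reduce step across instances.
    return (
        sum(g[0] for g in grads) / num_workers,
        sum(g[1] for g in grads) / num_workers,
    )


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 15.2, 16.9]
full = mse_gradient(0.5, 0.0, xs, ys)
split = data_parallel_gradient(0.5, 0.0, xs, ys, num_workers=4)
print(all(abs(a - c) < 1e-9 for a, c in zip(full, split)))  # True
```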

Exam trap

AWS often tests the misconception that simply adjusting hyperparameters like learning rate or batch size can solve performance issues, when the correct answer is to leverage distributed computing resources that SageMaker provides natively.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate by a factor of 10 can cause the optimizer to overshoot minima, leading to divergence or unstable training, and does not guarantee reduced training time without risking model quality. Option C is wrong because reducing the number of epochs directly reduces the amount of training iterations, which may lower model accuracy or prevent convergence, and is not a valid method to reduce training time while preserving model performance. Option D is wrong because reducing the batch size typically increases the number of weight updates per epoch and can slow down training due to less efficient hardware utilization and increased communication overhead, especially on GPUs.

