CCNA Modeling Questions

76
MCQ · easy

A data scientist is training a binary classification model on a dataset with a severe class imbalance (95% negative, 5% positive). The model achieves 95% accuracy but only correctly identifies 10% of the positive class. Which metric should the data scientist use to evaluate model performance?

A. Log loss
B. F1 score
C. Accuracy
D. Area under the ROC curve (AUC)
Answer: B

F1 score balances precision and recall, making it suitable for imbalanced datasets where the minority class is important.

Why this answer

The F1 score is the harmonic mean of precision and recall, making it robust to class imbalance. With 95% accuracy but only 10% recall on the positive class, the model is essentially a trivial classifier that predicts the majority class. F1 score captures both false positives and false negatives, providing a balanced view of performance on the minority class.

Exam trap

The trap here is that candidates see high accuracy and assume the model is good, but AWS tests the understanding that accuracy is meaningless for imbalanced datasets, and that AUC can be misleadingly high even when minority class recall is poor.

How to eliminate wrong answers

Option A is wrong because log loss measures the probabilistic confidence of predictions and can be misleading when class imbalance is severe, as it is dominated by the majority class. Option C is wrong because accuracy is misleading in imbalanced datasets; a model predicting all negatives achieves 95% accuracy without learning anything about the positive class. Option D is wrong because AUC measures the model's ability to rank positive instances higher than negative ones, but it can still be high even when recall on the positive class is low, as it aggregates performance across all thresholds.
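The numbers in this question can be checked by hand. The sketch below assumes a hypothetical test set of 1,000 rows (the stem gives only percentages); with 95% accuracy and 10% recall, that fixes the confusion matrix at TP=5, FN=45, FP=5, TN=945.

```python
# Hypothetical test set of 1,000 rows matching the percentages in the question.
tp, fn = 5, 45      # 50 positives, only 10% of them are found
tn, fp = 945, 5     # 950 negatives; chosen so that accuracy is exactly 95%

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Under these assumed counts, accuracy stays at 0.95 while F1 falls to roughly 0.17, which is why F1 exposes the weak minority-class performance that accuracy hides.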

77
MCQ · hard

A healthcare company is building a model to predict patient readmission within 30 days. They have structured electronic health records (EHR) data with 200 features. The data includes missing values, categorical variables with high cardinality (e.g., diagnosis codes), and a severe class imbalance (5% readmission). They need to deploy a model on SageMaker that is interpretable and achieves high recall for the positive class. Which combination of techniques should they use?

A. Use XGBoost with SMOTE, feature selection via SHAP, and deploy as a SageMaker endpoint
B. Use logistic regression with one-hot encoding and random undersampling
C. Use PCA for dimensionality reduction, then train a linear SVM with class weights
D. Use a deep neural network with embeddings for categorical variables and oversample the minority class
Answer: A

XGBoost handles missing values, SMOTE addresses imbalance, SHAP provides interpretability.

Why this answer

XGBoost with SMOTE and SHAP balances interpretability and performance.

78
MCQ · easy

A machine learning engineer needs to deploy a model that requires low latency (under 10 ms) for real-time inference. The model is a small ensemble of decision trees. Which Amazon SageMaker endpoint configuration is MOST appropriate?

A. Batch transform
B. Training job
C. Real-time endpoint
D. Multi-model endpoint
Answer: C

Real-time endpoints provide low latency.

Why this answer

Real-time endpoints in Amazon SageMaker are designed for low-latency, synchronous inference and are the correct choice for deploying a small ensemble of decision trees that needs to respond to individual prediction requests in real time. They keep the model loaded and ready behind a persistent HTTPS endpoint, so predictions are served with minimal overhead.

Exam trap

The trap here is that candidates often confuse Multi-model endpoints with real-time endpoints, assuming they offer the same low-latency guarantees, but Multi-model endpoints trade off latency for cost efficiency by loading models on demand, which can introduce delays that violate strict latency requirements.

How to eliminate wrong answers

Option A is wrong because Batch Transform is an asynchronous, offline inference service that processes large datasets in batches and does not provide a real-time endpoint; it can take minutes to hours to complete and is unsuitable for sub-10 ms latency. Option B is wrong because a Training job is used to train a model, not to host it for inference; it runs a training algorithm on input data and produces model artifacts, but does not expose an endpoint for serving predictions. Option D is wrong because a Multi-model endpoint is designed to host multiple models on a single endpoint to reduce costs, but it introduces additional latency due to model loading and unloading on demand, making it less suitable for the strict under-10 ms requirement compared to a dedicated real-time endpoint.

79
MCQ · easy

A data scientist is using Amazon SageMaker to train a model, but the training job fails with an 'Out of memory' error. The instance type is ml.p3.2xlarge. Which action should the data scientist take to resolve the issue?

A. Use Pipe input mode.
B. Increase the number of instances.
C. Reduce the mini-batch size in the training script.
D. Use a Spot instance.
Answer: C

Reducing batch size reduces memory consumption.

Why this answer

The 'Out of memory' error on a single ml.p3.2xlarge instance indicates that the GPU memory is insufficient for the current workload. Reducing the mini-batch size directly decreases the memory footprint per training step, allowing the model to fit within the available GPU memory without changing the instance type or incurring additional costs.

Exam trap

The trap here is that candidates confuse storage-related issues (disk space, data loading) with compute memory (GPU RAM), leading them to select Pipe input mode or Spot instances, which do not address the fundamental memory constraint.

How to eliminate wrong answers

Option A is wrong because Pipe input mode streams data directly from Amazon S3 without downloading it to the local disk, which reduces disk storage requirements but does not affect GPU memory consumption during training. Option B is wrong because increasing the number of instances distributes the workload across multiple machines but does not reduce the per-instance memory usage; each instance still faces the same GPU memory constraint. Option D is wrong because using a Spot instance provides cost savings but does not change the hardware specifications or memory capacity of the ml.p3.2xlarge instance, so the out-of-memory error would persist.

80
MCQ · easy

A company is using Amazon SageMaker to deploy a machine learning model that predicts equipment failure. The model is a binary classifier that outputs a probability. The company wants to set a threshold such that the model correctly identifies 95% of actual failures (recall >= 0.95). The model's precision at the current threshold of 0.5 is 0.7. The data scientist evaluates the model on a test set and obtains the following confusion matrix at threshold 0.5: TP=95, FN=5, FP=40, TN=860. The total actual positives are 100. Which threshold adjustment should the data scientist make to achieve the recall goal?

A. Decrease the threshold to 0.1
B. Increase the threshold to 0.7
C. Keep the threshold at 0.5
D. Decrease the threshold to 0.3
Answer: C

Recall is already 95%, meeting the requirement.

Why this answer

Option C is correct. Recall is TP / (TP + FN) = 95 / (95 + 5) = 0.95, so the model already meets the recall >= 0.95 requirement at the current threshold of 0.5 and no adjustment is needed.

Lowering the threshold (Options A and D) would classify more instances as positive and could raise recall further, but only by adding false positives and lowering precision, which the requirement does not call for. Raising the threshold to 0.7 (Option B) would tend to reduce recall and risks dropping it below 0.95.
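The recall and precision implied by the confusion matrix in the question can be verified directly. This is a minimal check that uses only the four counts given in the stem.

```python
# Confusion matrix from the question stem, measured at threshold 0.5.
tp, fn, fp, tn = 95, 5, 40, 860

recall = tp / (tp + fn)        # share of actual failures that are caught
precision = tp / (tp + fp)     # share of predicted failures that are real
meets_recall_goal = recall >= 0.95

print(f"recall={recall:.2f} precision={precision:.2f} "
      f"meets_goal={meets_recall_goal}")
```

Recall is exactly 0.95 and precision is about 0.70, matching the figures in the stem, so the requirement is satisfied without moving the threshold.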

81
MCQ · medium

A data scientist is using Amazon SageMaker to train an XGBoost model for a regression problem. The training data contains missing values in some features. Which approach should the data scientist use to handle missing values in XGBoost?

A. Use K-nearest neighbors imputation
B. Leave missing values as-is; XGBoost handles them natively
C. Remove all rows with missing values
D. Impute missing values with the mean of the column
Answer: B

XGBoost can handle missing values by learning the optimal direction to split.

Why this answer

XGBoost has a built-in mechanism to handle missing values natively by learning the best direction to split on missing values during training. For each split, XGBoost assigns missing values to the left or right child node based on which direction minimizes the loss function, making explicit imputation unnecessary for this algorithm.

Exam trap

The trap here is that candidates often default to common imputation techniques (like mean imputation or row removal) without recognizing that XGBoost has a built-in, algorithm-specific method for handling missing values, which is a key differentiator tested in the MLS-C01 exam.

How to eliminate wrong answers

Option A is wrong because K-nearest neighbors imputation is a data preprocessing technique that introduces computational overhead and potential bias, and it is not needed since XGBoost handles missing values internally. Option C is wrong because removing all rows with missing values can lead to significant data loss and reduced model performance, especially when missingness is not completely at random. Option D is wrong because imputing missing values with the mean of the column can distort the underlying distribution and reduce variance, which may degrade model accuracy, and it is unnecessary given XGBoost's native missing value handling.
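The idea of learning a default direction for missing values can be illustrated with a toy example. The sketch below is not XGBoost's implementation: it assumes plain squared error around each child's mean as a stand-in for XGBoost's gradient-based gain, and the function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_missing_direction(xs, ys, threshold):
    """Return 'left' or 'right': the child that rows with a missing feature join."""
    left = [y for x, y in zip(xs, ys) if x is not None and x < threshold]
    right = [y for x, y in zip(xs, ys) if x is not None and x >= threshold]
    missing = [y for x, y in zip(xs, ys) if x is None]

    # Try both placements of the missing rows and keep the lower total loss.
    loss_if_left = sse(left + missing) + sse(right)
    loss_if_right = sse(left) + sse(right + missing)
    return "left" if loss_if_left <= loss_if_right else "right"


# Toy data: the rows with a missing feature have targets close to the right child.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [10.0, 11.0, 30.0, 29.0, 31.0, 32.0]
direction = best_missing_direction(xs, ys, threshold=5.0)
print(direction)
```

Here the missing rows are routed to the right child because that placement gives the lower loss, which is the behavior the explanation describes.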

82
MCQ · hard

A data scientist is setting up an IAM role for an Amazon SageMaker training job. The policy shown is attached to the role. The training job fails with an access denied error when trying to read the training data from s3://my-bucket/training/data.csv. What is the most likely reason?

A. The bucket policy on my-bucket denies access to the IAM role
B. The training job is using a different IAM role
C. The IAM policy is missing the s3:ListBucket permission
D. The IAM role does not have permission to access the bucket location
Answer: A

The IAM policy allows GetObject, but the bucket policy may have a deny rule that overrides the allow.

Why this answer

Option A is correct because the IAM policy shown grants s3:GetObject access to the bucket and object, but the training job still fails with an access denied error. The most likely cause is that the bucket policy on my-bucket explicitly denies access to the IAM role, overriding the IAM policy's allow. In AWS, an explicit deny in a resource-based policy (bucket policy) takes precedence over any allow in an identity-based policy (IAM role policy), causing the access denied error despite the IAM policy appearing sufficient.

Exam trap

The trap here is that candidates often assume the IAM policy alone determines access, overlooking that resource-based policies (like S3 bucket policies) can override IAM permissions with explicit denies, especially when the bucket policy is not shown in the question.

How to eliminate wrong answers

Option B is wrong because the question states the IAM role is set up for the training job, and the policy shown is attached to that role; nothing in the stem suggests a different role is in use. Option C is wrong because s3:ListBucket is not required to read a specific object when the full object key is known; the s3:GetObject permission on the object ARN is sufficient for reading the data.csv file. Option D is wrong because the IAM policy explicitly grants s3:GetObject on the bucket and object ARN, so the role does have permission to access the bucket location; the failure is due to an external deny from the bucket policy, not a lack of permission in the IAM policy.
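The "explicit deny wins" rule can be sketched as a small evaluation function. This is a deliberately simplified model, assuming exact string matching with no wildcards, conditions, or cross-account logic; the statement dictionaries and the bucket-policy deny are hypothetical, since the question does not show the bucket policy.

```python
def is_allowed(statements, action, resource):
    """Evaluate simplified policy statements: an explicit Deny beats any Allow."""
    matching = [
        s for s in statements
        if action in s["actions"] and resource in s["resources"]
    ]
    if any(s["effect"] == "Deny" for s in matching):
        return False
    # With no matching Allow, the request is implicitly denied.
    return any(s["effect"] == "Allow" for s in matching)


OBJECT_ARN = "arn:aws:s3:::my-bucket/training/data.csv"

# Identity-based policy on the role, as described in the question.
identity_policy = [
    {"effect": "Allow", "actions": ["s3:GetObject"], "resources": [OBJECT_ARN]},
]
# Hypothetical bucket policy statement that denies the same action.
bucket_policy = [
    {"effect": "Deny", "actions": ["s3:GetObject"], "resources": [OBJECT_ARN]},
]

print(is_allowed(identity_policy, "s3:GetObject", OBJECT_ARN))
print(is_allowed(identity_policy + bucket_policy, "s3:GetObject", OBJECT_ARN))
```

The first call is allowed and the second is denied, mirroring the situation in which the IAM policy looks sufficient but the request still fails.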

83
MCQ · hard

A data scientist is building a time series forecasting model for daily sales data. The data exhibits strong seasonality with a weekly pattern and a yearly trend. The scientist wants to use Amazon SageMaker's built-in algorithm. Which algorithm is most appropriate?

A. Amazon SageMaker DeepAR
B. Linear Learner
C. K-Means
D. XGBoost
Answer: A

DeepAR is a built-in algorithm for time series forecasting that handles seasonality and trends.

Why this answer

DeepAR is designed for time series forecasting with seasonality and trend. Option B is wrong because Linear Learner can model trends but not seasonality natively. Option C is wrong because K-Means is a clustering algorithm.

Option D is wrong because XGBoost is a tree-based model for tabular data, not specialized for time series.

84
Multi-Select · medium

Which TWO of the following are best practices for hyperparameter tuning using Amazon SageMaker? (Choose 2)

Select 2 answers
A. Use grid search to exhaustively explore all combinations.
B. Use early stopping to terminate poorly performing training jobs.
C. Include all algorithm hyperparameters in the tuning job.
D. Use a larger training dataset to improve tuning results.
E. Use automatic model tuning with Bayesian optimization.
Answers: B, E

Early stopping avoids wasted resources.

Why this answer

Option B is correct because early stopping terminates unpromising training jobs, saving time and cost. Option E is correct because automatic model tuning with Bayesian optimization searches the hyperparameter space efficiently. Option A is wrong because exhaustive grid search is less efficient.

Option C is wrong because tuning every hyperparameter may be unnecessary and enlarges the search space. Option D is wrong because more training data does not improve tuning efficiency.

85
MCQ · hard

A data scientist is training a deep learning model for image classification using TensorFlow on Amazon SageMaker. The model trains slowly, and the GPU utilization is below 20%. Which action will MOST effectively increase GPU utilization and reduce training time?

A. Reduce the training dataset size.
B. Increase the batch size to better saturate the GPU.
C. Switch to a CPU-only instance.
D. Decrease the batch size to reduce memory pressure.
Answer: B

Larger batches improve GPU utilization and throughput.

Why this answer

Option B is correct because increasing the batch size provides more work per GPU step, improving utilization. Option A is wrong because reducing the dataset does not raise utilization and increases the risk of underfitting. Option C is wrong because switching to a CPU-only instance would be slower.

Option D is wrong because decreasing the batch size reduces utilization further.

86
MCQ · medium

A company uses Amazon SageMaker to train a deep learning model on a GPU instance. The training job is taking too long. Which action would MOST likely reduce training time?

A. Reduce the mini-batch size
B. Use distributed data parallelism across multiple smaller instances
C. Use a larger GPU instance type, such as p3.16xlarge
D. Reduce the number of epochs
Answer: C

More powerful GPU accelerates training.

Why this answer

Option C is correct because using a larger GPU instance like p3.16xlarge provides significantly more GPU memory, CUDA cores, and memory bandwidth, which allows for larger batch sizes and more efficient parallel processing of matrix operations. This directly reduces training time for deep learning models by enabling faster forward and backward passes through the network, especially when the model is large enough to fully utilize the additional GPU resources.

Exam trap

The trap here is that candidates often confuse reducing mini-batch size (Option A) with improving training speed, but in GPU-accelerated deep learning, larger batch sizes better utilize GPU parallelism and reduce the number of iterations, making a larger instance the more effective solution.

How to eliminate wrong answers

Option A is wrong because reducing the mini-batch size typically increases the number of weight updates per epoch and can lead to noisier gradients, which often increases training time due to more frequent synchronization and less efficient GPU utilization. Option B is wrong because distributed data parallelism across multiple smaller instances introduces communication overhead (e.g., gradient synchronization via AllReduce) that can outweigh the benefits for a single GPU-bound training job, especially if the model does not fit in the smaller instances' memory. Option D is wrong because reducing the number of epochs directly reduces the amount of training performed, but it does not address the underlying performance bottleneck of the training process and may result in underfitting or incomplete convergence.

87
Multi-Select · medium

A data scientist is training a model using SageMaker and wants to automatically stop training when the model stops improving. Which TWO options can be used?

Select 2 answers
A. AWS Step Functions
B. Built-in early stopping in XGBoost
C. SageMaker Debugger
D. CloudWatch Alarms
E. SageMaker Model Monitor
Answers: B, C

Native early stopping support.

Why this answer

Built-in early stopping in XGBoost (Option B) is correct because XGBoost natively supports an `early_stopping_rounds` parameter that halts training when the validation metric stops improving for a specified number of rounds. SageMaker Debugger (Option C) is correct because it can monitor training metrics in real time and, when a built-in or custom rule (e.g., `VanishingGradient` or `LossNotDecreasing`) fires, stop the training job through its built-in `StopTraining` action.

Exam trap

The trap here is that candidates may confuse SageMaker Model Monitor (post-deployment monitoring) with SageMaker Debugger (training-time monitoring), or assume CloudWatch Alarms can directly implement early stopping logic when they are only for threshold-based alerts on emitted metrics.
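The stopping logic behind a parameter such as `early_stopping_rounds` can be sketched as a patience counter. This is a simplified illustration, not XGBoost's or Debugger's actual code; the function name and the validation losses are hypothetical.

```python
def early_stopping_round(val_losses, patience):
    """Return the index of the round at which training would stop, or None."""
    best = float("inf")
    rounds_without_improvement = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                return i
    return None  # the validation loss kept improving often enough


# Hypothetical validation losses: the best value so far is reached at round 2.
losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.48, 0.44]
stop_round = early_stopping_round(losses, patience=3)
print(stop_round)
```

With a patience of 3, training stops at round 5, after three consecutive rounds without improvement over the best loss of 0.45.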

88
MCQ · medium

A company is building a fraud detection model that must achieve low false positive rates. The dataset is highly imbalanced (0.1% positive class). Which metric is most appropriate for model evaluation?

A. RMSE
B. Accuracy
C. Area under the Precision-Recall curve
D. R-squared
Answer: C

Best for imbalanced datasets.

Why this answer

In highly imbalanced datasets (0.1% positive class), the Precision-Recall curve focuses on the performance of the positive class, which is the minority class of interest. Area under the Precision-Recall curve (AUPRC) is insensitive to the large number of true negatives, making it a robust metric for evaluating models where false positives must be minimized. Unlike ROC-AUC, which can be overly optimistic in severe imbalance, AUPRC directly reflects the trade-off between precision and recall for the rare positive class.

Exam trap

AWS often tests the misconception that ROC-AUC is always the best metric for imbalanced classification, but the trap here is that ROC-AUC can be overly optimistic because it considers true negatives, whereas Precision-Recall AUC focuses solely on the positive class and is the correct choice when false positives must be minimized.

How to eliminate wrong answers

Option A is wrong because RMSE (Root Mean Square Error) is a regression metric that measures the average magnitude of errors between continuous values, and is not suitable for binary classification or imbalanced fraud detection. Option B is wrong because Accuracy is misleading in highly imbalanced datasets; a model that predicts the majority class for all instances would achieve 99.9% accuracy but fail to detect any fraud. Option D is wrong because R-squared is a regression metric that measures the proportion of variance explained by the model, and has no relevance to binary classification or precision-recall evaluation.
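A short calculation shows why ROC-style rates can look strong under severe imbalance while precision stays low. The dataset size, recall, and false positive rate below are assumed values chosen for illustration; only the 0.1% positive rate comes from the question.

```python
n_total = 1_000_000
n_pos = n_total // 1000        # 0.1% positive class -> 1,000 fraud cases
n_neg = n_total - n_pos

tpr = 0.90   # assumed recall (true positive rate)
fpr = 0.01   # assumed false positive rate, which looks small on a ROC curve

tp = tpr * n_pos
fp = fpr * n_neg
precision = tp / (tp + fp)

print(f"true positives={tp:.0f} false positives={fp:.0f} "
      f"precision={precision:.3f}")
```

Even with 90% recall and a 1% false positive rate, false positives outnumber true positives by roughly eleven to one under these assumptions, so precision is only about 0.08. The Precision-Recall curve makes this visible; the ROC curve does not.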

89
MCQ · hard

A company is deploying a model for real-time inference with SageMaker. The endpoint receives spiky traffic, with occasional bursts of 10x normal load. Which scaling policy is MOST cost-effective while maintaining availability?

A. Provision a large instance type that can handle the peak load at all times.
B. Manually scale the endpoint based on historical traffic patterns.
C. Use a combination of scheduled scaling for predictable peaks and simple scaling for additional bursts.
D. Use a target tracking scaling policy based on average latency.
Answer: C

Scheduled scaling handles known patterns, while simple scaling provides reactive capacity for bursts.

Why this answer

Option C is correct because it combines scheduled scaling for predictable traffic patterns (e.g., known peak hours) with simple scaling to handle unexpected bursts, ensuring availability during 10x load spikes without over-provisioning. This hybrid approach is more cost-effective than always-on large instances, as it dynamically adjusts capacity only when needed, aligning with SageMaker's automatic scaling capabilities.

Exam trap

The trap here is that candidates often assume target tracking (Option D) is always optimal for cost, but it fails for spiky traffic because it reacts to post-burst metrics like latency, not preemptively scaling for sudden load changes.

How to eliminate wrong answers

Option A is wrong because provisioning a large instance type to handle peak load at all times leads to significant cost waste during low-traffic periods, as you pay for unused capacity continuously. Option B is wrong because manual scaling based on historical patterns cannot react quickly enough to sudden 10x bursts, risking latency or downtime during unpredictable spikes. Option D is wrong because target tracking based on average latency is reactive and may cause slow scaling, as latency increases only after the burst has already impacted performance, potentially leading to dropped requests or throttling.

90
MCQ · hard

A data scientist is building a recommendation system using matrix factorization. The dataset has 1 million users and 100,000 items, with a sparse user-item interaction matrix. The scientist wants to minimize training time on Amazon SageMaker. Which algorithm would be most appropriate?

A. Linear Learner
B. Factorization Machines
C. K-Means
D. XGBoost
Answer: B

Built for recommendation systems with sparse data.

Why this answer

Factorization Machines (B) are specifically designed for sparse, high-dimensional datasets like the user-item interaction matrix in recommendation systems. They extend matrix factorization by modeling pairwise feature interactions, which is ideal for collaborative filtering tasks. On Amazon SageMaker, the built-in Factorization Machines algorithm is optimized for sparse data and can train efficiently on 1 million users and 100,000 items, minimizing training time compared to general-purpose algorithms.

Exam trap

The trap here is that candidates often choose XGBoost (D) because of its popularity and strong performance on tabular data, but they overlook that it requires dense feature engineering and is not optimized for the sparse, high-cardinality interaction matrices typical in recommendation systems.

How to eliminate wrong answers

Option A (Linear Learner) is wrong because it models only linear relationships and cannot capture the complex pairwise interactions between users and items that are essential for recommendation systems; it also does not handle sparse categorical features efficiently. Option C (K-Means) is wrong because it is an unsupervised clustering algorithm that groups similar data points, not a supervised or matrix factorization method for predicting user-item interactions; it cannot generate personalized recommendations from a sparse interaction matrix. Option D (XGBoost) is wrong because it is a tree-based ensemble method that requires dense feature engineering and is not designed for sparse matrix factorization; it would be computationally expensive and less effective on high-cardinality categorical features like user and item IDs.
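The pairwise-interaction term of a second-order factorization machine can be written out in a few lines. The sketch below follows the standard formulation (bias, linear weights, and a latent vector per feature) and is not SageMaker's implementation; all weights and the one-hot input are hypothetical.

```python
def fm_predict_naive(x, w0, w, v):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, v):
    """Same prediction, computed in O(k * n) with the usual algebraic identity."""
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Hypothetical one-hot input: 2 users followed by 2 items; user 0 rated item 1.
x = [1.0, 0.0, 0.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.05]
v = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [0.5, 0.5]]  # one latent vector per feature

naive = fm_predict_naive(x, w0, w, v)
fast = fm_predict_fast(x, w0, w, v)
print(round(naive, 6), round(fast, 6))
```

Because the input is one-hot, only the user-item pair contributes to the interaction term, which is what makes the model behave like matrix factorization on sparse data.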

91
MCQ · easy

A data scientist is building a time series forecasting model for monthly sales data. The scientist has observed that the sales data shows a clear upward trend and a seasonal pattern that repeats every 12 months. Which algorithm would be most appropriate for this task?

A. ARIMA
B. Random Forest
C. k-means clustering
D. Linear regression with time-based features
Answer: A

ARIMA (or SARIMA) directly models trend and seasonality in time series data.

Why this answer

ARIMA (Autoregressive Integrated Moving Average) is specifically designed for time series forecasting and can handle both trend and seasonality through its parameters: the 'I' (differencing) removes trend, and seasonal ARIMA (SARIMA) extends it with seasonal differencing and seasonal AR/MA terms to capture the 12-month repeating pattern. This makes it the most appropriate choice for monthly sales data with a clear upward trend and annual seasonality.

Exam trap

The trap here is that candidates often choose Linear regression with time-based features (Option D) because they think adding a time index and month dummies is sufficient, but they overlook that ARIMA is purpose-built for time series with autocorrelation and seasonality, while linear regression violates the independence assumption and cannot model the stochastic seasonal patterns without extensive feature engineering.

How to eliminate wrong answers

Option B (Random Forest) is wrong because it is a tree-based ensemble method for regression or classification that does not inherently model temporal dependencies or seasonality; it treats each time point as an independent feature, ignoring the sequential nature and autocorrelation of time series data. Option C (k-means clustering) is wrong because it is an unsupervised clustering algorithm used to partition data into groups based on similarity, not for forecasting future values in a time series. Option D (Linear regression with time-based features) is wrong because while it can model a linear trend by including a time index feature, it cannot capture the complex autocorrelation structure and seasonal patterns without manually engineering lagged variables and seasonal dummies, and it assumes independent errors, which is violated in time series data.
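The effect of differencing on trend and seasonality can be seen on a synthetic series. The sketch below assumes a purely deterministic series (linear trend plus a fixed 12-month pattern, no noise), so it illustrates the differencing step only, not a full SARIMA fit; the data and names are hypothetical.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic monthly sales: upward trend of 2 per month plus a 12-month pattern.
season = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3]
sales = [100 + 2 * t + season[t % 12] for t in range(36)]

# Seasonal differencing (lag 12) cancels the repeating pattern and leaves
# the year-over-year growth: 2 per month * 12 months.
seasonal_diff = difference(sales, lag=12)

# A further first difference removes what remains of the trend.
stationary = difference(seasonal_diff, lag=1)

print(seasonal_diff[:3], stationary[:3])
```

After seasonal differencing every value equals 24, and after one more ordinary difference every value is 0, so both the seasonal pattern and the trend have been removed.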

92
Drag & Drop · medium

Drag and drop the steps to deploy a model as a SageMaker endpoint for real-time inference in the correct order.

Why this order

Deployment requires model creation, endpoint configuration, endpoint creation, and testing.
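The order above maps onto four boto3 calls. The sketch below is a hedged outline: the function name, resource names, instance type, and payload are hypothetical, and a small recording stand-in replaces the real clients so the snippet runs without AWS credentials. In real use, `sm` would be `boto3.client("sagemaker")` and `runtime` would be `boto3.client("sagemaker-runtime")`.

```python
class RecordingClient:
    """Stand-in for a boto3 client: records method names instead of calling AWS."""

    def __init__(self, calls):
        self.calls = calls

    def __getattr__(self, method_name):
        def record(*args, **kwargs):
            self.calls.append(method_name)
            return self  # lets get_waiter(...).wait(...) chain
        return record


def deploy_realtime_endpoint(sm, runtime, name, image_uri, model_data_url,
                             role_arn, payload):
    # 1. Create the model (container image + model artifacts + execution role).
    sm.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # 2. Create the endpoint configuration (instance type and count).
    sm.create_endpoint_config(
        EndpointConfigName=f"{name}-config",
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": "ml.m5.large",   # assumed instance type
            "InitialInstanceCount": 1,
        }],
    )
    # 3. Create the endpoint and wait until it is in service.
    sm.create_endpoint(EndpointName=name, EndpointConfigName=f"{name}-config")
    sm.get_waiter("endpoint_in_service").wait(EndpointName=name)
    # 4. Test the endpoint with a sample request.
    return runtime.invoke_endpoint(
        EndpointName=name, ContentType="text/csv", Body=payload
    )


calls = []
deploy_realtime_endpoint(
    RecordingClient(calls), RecordingClient(calls),
    name="demo-model",
    image_uri="<ecr-image-uri>",
    model_data_url="s3://<bucket>/model.tar.gz",
    role_arn="<execution-role-arn>",
    payload="1.0,2.0,3.0",
)
print(calls)
```

The recorded call sequence follows the same order as the explanation: model, endpoint configuration, endpoint, then a test invocation.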

93
MCQ · medium

A machine learning engineer is monitoring a deployed model on SageMaker and notices that the prediction latency is increasing over time. The model is a linear regression with a small number of features. Which is the MOST likely cause?

A. The number of features is too large
B. The CPU utilization is too low
C. The model is overfitting to recent data
D. The inference code has a memory leak
Answer: D

Memory leaks cause gradual performance degradation and increased latency.

Why this answer

Memory leak or accumulation of model artifacts in inference code can cause latency growth over time.

94
MCQ · hard

A team is training a deep learning model on SageMaker using a custom Docker container. The training job fails with 'OutOfMemoryError'. The instance type is ml.p3.2xlarge with 61 GB memory. Which change should increase available memory?

A. Reduce the batch size to use less memory.
B. Use SageMaker distributed data parallelism to distribute the model across multiple instances.
C. Set the 'shm-size' parameter in the SageMaker training container to a larger value.
D. Mount an Amazon FSx for Lustre file system to offload data.
Answer: C

Increasing shared memory (/dev/shm) can resolve OutOfMemory errors in deep learning frameworks.

Why this answer

The 'OutOfMemoryError' in a SageMaker training container often stems from insufficient shared memory (/dev/shm) for data-loading workers, especially with PyTorch or TensorFlow dataloaders that use multiprocessing. Increasing the 'shm-size' parameter allocates more shared memory to the container, resolving the error without altering the model or instance type.

Exam trap

AWS often tests the misconception that 'OutOfMemoryError' always refers to GPU memory, leading candidates to choose batch size reduction, when in SageMaker containers it frequently indicates insufficient shared memory for data-loading processes.

How to eliminate wrong answers

Option A is wrong because reducing the batch size decreases GPU memory usage, not the shared memory (/dev/shm) that causes the 'OutOfMemoryError' in this context. Option B is wrong because SageMaker distributed data parallelism replicates the model and splits the data across multiple instances, which does not increase the memory available to a single container; it adds complexity without addressing the shared memory limit. Option D is wrong because mounting an Amazon FSx for Lustre file system offloads storage, not memory; it does not increase the container's shared memory or RAM capacity.

95
MCQ · hard

A financial services company uses Amazon SageMaker to train a model for credit risk prediction. The dataset contains 500 features and 1 million records. The target variable is binary with 20% default rate. The data scientist uses a gradient boosting algorithm (XGBoost) with default hyperparameters. After training, the model achieves 95% accuracy, but the precision for the default class is only 30%, and recall is 15%. The business requires at least 50% recall and 40% precision for the default class. The data scientist tries to adjust the decision threshold, but this does not simultaneously meet both targets. The scientist suspects that the model is not learning the default patterns well. The company also has a large dataset of unlabeled transactions that could be used. Which action should the data scientist take to improve the model?

A. Apply PCA to reduce dimensionality and noise.
B. Use the unlabeled data for semi-supervised learning with pseudo-labeling.
C. Increase the learning rate to accelerate convergence.
D. Reduce the number of features using feature selection to simplify the model.
Answer: B

Pseudo-labeling leverages unlabeled data to improve minority class detection.

Why this answer

Option B is correct because using the unlabeled data for pseudo-labeling can improve the model when the labeled dataset is imbalanced and the model is struggling to learn the minority class. Option A is wrong because PCA may discard valuable information. Option C is wrong because increasing the learning rate may cause overfitting or divergence.

Option D is wrong because reducing the number of features may not help if the features are relevant.

96
Multi-Select · easy

Which TWO actions can help reduce overfitting when training a model on SageMaker? (Choose TWO.)

Select 2 answers
A. Reduce the amount of training data
B. Use early stopping based on validation error
C. Increase the maximum depth of the trees in XGBoost
D. Increase the number of training epochs
E. Add L1 regularization to the loss function
Answers: B, E

Early stopping halts training when validation error stops improving, preventing overfitting.

Why this answer

Early stopping monitors the validation error during training and halts the process when the error stops improving, preventing the model from learning noise in the training data. This directly reduces overfitting by ensuring the model does not continue to fit to spurious patterns after generalization has peaked.

Exam trap

AWS often tests the misconception that adding more data or increasing model complexity (like tree depth or epochs) always improves performance, when in fact these actions typically worsen overfitting without proper validation or regularization.

97
MCQ · hard

A company uses SageMaker to deploy a model for real-time inference. The model is a large ensemble that requires 8 GB of memory and has high latency. The team wants to reduce latency without increasing cost. Which strategy is most effective?

A.Use a larger instance type with more memory.
B.Deploy the model on multiple instances behind a load balancer.
C.Use SageMaker Neo to compile the model for the target instance.
D.Switch from real-time inference to batch transform.
AnswerC

Neo optimizes the model for faster inference without additional cost.

Why this answer

Option C is correct because SageMaker Neo optimizes trained models for target hardware, reducing latency and memory footprint. Option A is wrong because using a larger instance increases cost. Option D is wrong because batch transform is for offline, not real-time, inference.

Option B is wrong because multiple instances increase cost.

98
MCQmedium

A data scientist is using Amazon SageMaker to train a model. The training data is stored in an S3 bucket encrypted with AWS KMS. During training, the job fails with an access denied error. What is the MOST likely cause?

A.The training instance type does not support encryption
B.The training data is not in the same region as the SageMaker notebook
C.The S3 bucket policy does not allow SageMaker to list objects
D.The SageMaker execution role lacks kms:Decrypt permission for the KMS key
AnswerD

SageMaker needs KMS decrypt permissions to read encrypted data from S3.

Why this answer

SageMaker needs permission to use the KMS key to decrypt the data; the execution role must have kms:Decrypt permissions.
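
As a rough sketch, the missing permission could be expressed as an IAM policy statement like the one below, built here as a Python dictionary and printed as JSON. The key ARN is a placeholder (REGION, ACCOUNT_ID, and KEY_ID are not real values), and depending on how the key is configured, the KMS key policy may also need to allow the execution role.

```python
import json

# Placeholder ARN: substitute your own region, account ID, and key ID.
KMS_KEY_ARN = "arn:aws:kms:REGION:ACCOUNT_ID:key/KEY_ID"

# Statement granting the execution role permission to decrypt with this key.
kms_statement = {
    "Effect": "Allow",
    "Action": ["kms:Decrypt"],
    "Resource": KMS_KEY_ARN,
}

policy_document = {
    "Version": "2012-10-17",
    "Statement": [kms_statement],
}

print(json.dumps(policy_document, indent=2))
```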

99
MCQmedium

A machine learning engineer is trying to deploy a model using a SageMaker endpoint but receives an access denied error. The IAM policy attached to the role is shown in the exhibit. What is the MOST likely cause of the error?

A.The policy does not include sagemaker:CreateEndpoint.
B.The policy does not specify resource ARNs.
C.The policy does not include sagemaker:InvokeEndpoint.
D.The policy does not include iam:PassRole.
AnswerD

SageMaker requires iam:PassRole to use the execution role.

Why this answer

The error occurs because the IAM identity performing the deployment lacks the iam:PassRole permission for the SageMaker execution role. SageMaker assumes that role to access the resources it needs (e.g., S3 buckets) during endpoint deployment, and the caller must be allowed to pass the role to the service. Without this permission, the request fails with an access denied error even if the SageMaker actions themselves are allowed.

Exam trap

The trap here is that candidates often focus on missing SageMaker-specific actions (like CreateEndpoint or InvokeEndpoint) rather than recognizing that the fundamental issue is the missing iam:PassRole permission, which is a common prerequisite whenever a role is handed to an AWS service to assume.

How to eliminate wrong answers

Option A is wrong because sagemaker:CreateEndpoint is not required for deploying a model to an existing endpoint; the error occurs during the deployment step where the role is passed, not during endpoint creation. Option B is wrong because the policy does specify resource ARNs (e.g., 'Resource': '*'), so the absence of ARNs is not the issue. Option C is wrong because sagemaker:InvokeEndpoint is used for invoking the endpoint after deployment, not for the deployment itself, and the error occurs before invocation.

100
MCQhard

A data scientist trains a neural network using TensorFlow on SageMaker. The training job fails with a 'CUDA out of memory' error. What is the most likely cause and solution?

A.The dataset is too large. Use SageMaker Pipe mode.
B.The model is too large for the GPU. Use a smaller batch size.
C.The training script has a bug. Use SageMaker Debugger.
D.The instance type is insufficient. Use distributed training across multiple instances.
AnswerB

Reducing batch size decreases memory usage.

Why this answer

CUDA out of memory indicates that the GPU memory is insufficient for the batch size or model size. Reducing the batch size is a common fix. Switching to CPU is not ideal for deep learning.

Increasing the number of instances may help but requires distributed training setup. Upgrading to a larger instance type is another option, but reducing batch size is simpler.
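
To see why a smaller batch size helps, consider a toy memory model in which GPU usage is a fixed cost (weights, optimizer state) plus a cost per sample in the batch. The linear model and every number below are illustrative assumptions, not measurements of any real network or instance type.

```python
def largest_batch_that_fits(start_batch, gpu_memory_mb, fixed_mb, per_sample_mb):
    """Halve the batch size until the estimated footprint fits in GPU memory.

    The linear memory model (fixed cost + per-sample cost) is an
    illustrative assumption, not a profile of a real training job.
    """
    batch = start_batch
    while batch >= 1:
        estimated_mb = fixed_mb + per_sample_mb * batch
        if estimated_mb <= gpu_memory_mb:
            return batch
        batch //= 2
    # Even a single sample does not fit: the model itself is too large.
    return 0


batch = largest_batch_that_fits(
    start_batch=256, gpu_memory_mb=16000, fixed_mb=4000, per_sample_mb=100
)
print(batch)
```

When the function returns 0, shrinking the batch cannot help, which is the case where a larger GPU or a distributed setup becomes necessary.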

101
Multi-Selectmedium

A data scientist is training a deep neural network on Amazon SageMaker. The training is taking a long time and the data scientist wants to speed it up. Which THREE actions can help reduce training time?

Select 3 answers
A.Use GPU instances instead of CPU instances
B.Use distributed training across multiple instances
C.Use Pipe mode to stream data from S3
D.Increase the batch size
E.Use a smaller instance type
AnswersA, B, C

GPUs accelerate deep learning computations.

Why this answer

GPU instances (e.g., P3, P4d) are optimized for the massively parallel matrix operations required by deep neural networks, providing orders-of-magnitude faster computation than CPU instances for training tasks. By offloading tensor operations to GPU cores, the training time is significantly reduced, especially for large models and datasets.

Exam trap

AWS often tests the misconception that increasing batch size always speeds up training, but candidates overlook the memory constraints and potential negative impact on model accuracy, while also confusing smaller instance types as a cost-saving measure that inadvertently slows training.

102
MCQeasy

A startup is deploying a machine learning model for real-time recommendation on Amazon SageMaker. The model is a TensorFlow model (1 GB) and the endpoint uses a single ml.c5.2xlarge instance. The inference latency is currently 500 ms per request. The startup expects traffic to increase 10x in the next month. They want to maintain latency under 500 ms. What is the most cost-effective solution?

A.Use SageMaker Batch Transform to process requests in batches
B.Switch to a GPU instance type for faster inference
C.Set up auto-scaling for the endpoint based on average latency or request count
D.Upgrade to a larger CPU instance type, such as ml.c5.4xlarge
AnswerC

Auto-scaling adds capacity dynamically, handling traffic spikes cost-effectively.

Why this answer

Option C is correct because auto-scaling adds instances only when needed, handling increased traffic while keeping latency low. Option D (larger instance) is more expensive and may not be needed. Option B (GPU) is overkill and costly.

Option A (batch) is not real-time.

103
MCQeasy

A data scientist is training a binary classification model on an imbalanced dataset where the positive class accounts for 5% of the data. The model achieves 95% accuracy but has a recall of only 10% for the positive class. Which metric should the data scientist primarily use to evaluate model performance?

A.RMSE
B.F1 Score
C.Accuracy
D.AUC-ROC
AnswerB

F1 score considers both precision and recall.

Why this answer

The F1 Score is the harmonic mean of precision and recall, making it ideal for imbalanced datasets where accuracy is misleading. With 95% accuracy but only 10% recall, the model is simply predicting the majority class (negative) almost always, so F1 Score captures the trade-off between false positives and false negatives better than accuracy or AUC-ROC.

Exam trap

AWS often tests the misconception that high accuracy always indicates good model performance, especially on imbalanced datasets, leading candidates to overlook metrics like F1 Score that account for class distribution.

How to eliminate wrong answers

Option A is wrong because RMSE (Root Mean Squared Error) is a regression metric that measures the square root of the average squared differences between predicted and actual values, not applicable to binary classification. Option C is wrong because accuracy is misleading on imbalanced datasets; a model that always predicts the negative class achieves 95% accuracy but fails to identify the positive class (5% prevalence), as seen with 10% recall. Option D is wrong because AUC-ROC can be overly optimistic on highly imbalanced data; it measures the area under the ROC curve (TPR vs FPR), but with only 5% positives, the FPR remains low even if the model rarely predicts positive, giving a falsely high score.
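
The gap between accuracy and F1 in this scenario can be reproduced with a small sketch. The dataset below is synthetic (1,000 examples, 5% positive) and the "weak model" predictions are made up so that accuracy is 95% and recall is 10%, matching the question; binary_metrics is a hypothetical helper, not a library function.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Synthetic data: 50 positives followed by 950 negatives.
y_true = [1] * 50 + [0] * 950

# Baseline that always predicts the negative class.
always_negative = [0] * 1000

# Made-up model: finds 5 of the 50 positives and raises 5 false alarms.
weak_model = [1] * 5 + [0] * 45 + [1] * 5 + [0] * 945

baseline = binary_metrics(y_true, always_negative)
weak = binary_metrics(y_true, weak_model)
print(baseline)
print(weak)
```

Both prediction sets reach the same accuracy, while F1 stays near zero for each, which is the behavior the question is testing.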

104
MCQeasy

A data scientist is using SageMaker to train a model. The training data is stored in an S3 bucket in a different AWS account. What is required to allow SageMaker to access the data?

A.Configure the SageMaker execution role with a policy that grants cross-account access to the S3 bucket.
B.Set up VPC peering between the two accounts.
C.Create a SageMaker notebook instance in the same account as the S3 bucket.
D.Launch the training job from a SageMaker notebook in the account containing the S3 bucket.
AnswerA

The IAM role used by SageMaker must have permissions to access the S3 bucket in the other account.

Why this answer

Option A is correct because SageMaker uses an IAM execution role to access resources. To allow cross-account access to an S3 bucket, the SageMaker execution role must have an IAM policy that grants s3:GetObject and s3:ListBucket permissions for the bucket, and the S3 bucket policy must also grant cross-account access to that role. This is the standard AWS mechanism for cross-account resource access.

Exam trap

The trap here is that candidates often confuse network-level solutions (VPC peering) with IAM-based access control, or assume that running the job from the same account as the data automatically grants access, ignoring that the SageMaker execution role is the key security boundary.

How to eliminate wrong answers

Option B is wrong because VPC peering is used for network connectivity between VPCs, not for granting IAM-based data access permissions; SageMaker accesses S3 via AWS APIs, not through VPC peering. Option C is wrong because creating a SageMaker notebook instance in the same account as the S3 bucket does not resolve the cross-account access issue; the training job still runs in the original account and requires proper IAM permissions. Option D is wrong because launching the training job from a notebook in the account containing the S3 bucket does not change the fact that the training job runs under the execution role of the original account; cross-account access must be explicitly configured via IAM policies.

105
MCQeasy

A team is training a binary classifier and obtains a confusion matrix with 100 true positives, 10 false positives, 20 false negatives, and 200 true negatives. What is the precision of the model?

A.0.91
B.0.87
C.0.94
D.0.83
AnswerA

Precision = 100/(100+10)=0.91.

Why this answer

Precision is calculated as TP / (TP + FP). With 100 true positives and 10 false positives, precision = 100 / (100 + 10) = 100 / 110 ≈ 0.909, which rounds to 0.91. This metric measures how many of the positive predictions were actually correct.

Exam trap

The trap here is that candidates often confuse precision with recall or accuracy, especially when the numbers are close, leading them to pick 0.83 (recall) or miscalculate the denominator.

How to eliminate wrong answers

Option B (0.87) is wrong because it is the F1 score (2·TP / (2·TP + FP + FN) = 200/230 ≈ 0.87), not precision. Option C (0.94) is wrong because it does not correspond to precision; note that accuracy, (TP + TN) / total = 300/330 ≈ 0.909, happens to round to the same value as precision here but is a different metric. Option D (0.83) is wrong because it represents recall (TP / (TP + FN) = 100/120 ≈ 0.833), not precision.
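
The arithmetic for this confusion matrix can be checked directly. The short sketch below uses only the four counts given in the question and the standard metric definitions.

```python
# Counts from the question's confusion matrix.
tp, fp, fn, tn = 100, 10, 20, 200

precision = tp / (tp + fp)                   # correct positive predictions
recall = tp / (tp + fn)                      # positives that were found
f1 = 2 * tp / (2 * tp + fp + fn)             # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)   # all correct predictions

for name, value in [("precision", precision), ("recall", recall),
                    ("f1", f1), ("accuracy", accuracy)]:
    print(f"{name}: {value:.3f}")
```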

106
MCQhard

A data scientist is training a deep learning model for object detection. The training loss decreases rapidly in the first few epochs but then plateaus at a high value. The validation loss starts increasing after a few epochs. Which adjustment is MOST likely to improve generalization?

A.Add more convolutional layers
B.Use more aggressive data augmentation
C.Increase the learning rate
D.Implement early stopping with a patience parameter
AnswerD

Early stopping prevents overfitting by terminating training when validation loss degrades.

Why this answer

The described behavior—training loss plateauing at a high value while validation loss increases—is a classic sign of overfitting. Early stopping with a patience parameter halts training when validation performance stops improving, preventing the model from memorizing noise and thus improving generalization. This directly addresses the overfitting without altering the model architecture or data distribution.

Exam trap

AWS often tests the distinction between underfitting and overfitting symptoms, and candidates may mistakenly choose data augmentation (Option B) as a universal fix, but the plateauing training loss and rising validation loss specifically indicate overfitting, where early stopping is the most direct remedy.

How to eliminate wrong answers

Option A is wrong because adding more convolutional layers increases model capacity, which would exacerbate overfitting and likely worsen the validation loss increase. Option B is wrong because more aggressive data augmentation could help reduce overfitting, but the question asks for the adjustment most likely to improve generalization given the specific symptoms; early stopping is a more direct and immediate fix for the observed plateau and divergence, whereas augmentation might not address the core issue of training too long. Option C is wrong because increasing the learning rate would cause the loss to oscillate or diverge, not improve generalization, and the training loss is already plateauing, indicating the optimizer is near a minimum.

107
MCQeasy

A data scientist is building a regression model to predict house prices. The dataset has 10 features, and the model shows high variance with a low bias. Which technique should the data scientist use to reduce variance?

A.Apply L2 regularization to the model.
B.Increase the depth of decision trees in the ensemble.
C.Add more features to the model.
D.Reduce the amount of training data.
AnswerA

L2 regularization reduces variance by penalizing large coefficients.

Why this answer

L2 regularization (Ridge regression) penalizes large coefficients by adding a squared magnitude term to the loss function, which shrinks the model's weights and reduces variance without substantially increasing bias. This directly addresses the high-variance, low-bias symptom, making the model less sensitive to fluctuations in the training data.

Exam trap

AWS often tests the misconception that adding more data or features always reduces variance, but the trap here is that high variance is best addressed by regularization or simplifying the model, not by increasing complexity or reducing data.

How to eliminate wrong answers

Option B is wrong because increasing the depth of decision trees in an ensemble (e.g., random forest or gradient boosting) increases model complexity, which typically raises variance and worsens overfitting, not reduces it. Option C is wrong because adding more features increases the dimensionality and capacity of the model, which tends to increase variance further, especially when the current model already shows high variance. Option D is wrong because reducing the amount of training data generally increases variance (the model becomes more sensitive to the specific sample) and can also increase bias due to insufficient learning, which is the opposite of the desired effect.
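
The shrinking effect of L2 regularization can be seen in the simplest possible case: a one-feature linear model without an intercept, where minimizing the squared error plus l2_penalty * w**2 has the closed form w = Σxy / (Σx² + l2_penalty). The data points and the function name ridge_slope are illustrative assumptions.

```python
def ridge_slope(xs, ys, l2_penalty):
    """Closed-form slope for y ≈ w * x (no intercept) with an L2 penalty on w."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + l2_penalty)


# Small made-up dataset that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# A larger penalty pulls the fitted slope toward zero.
for penalty in (0.0, 1.0, 10.0):
    print(penalty, round(ridge_slope(xs, ys, penalty), 3))
```

A penalty of zero recovers ordinary least squares; increasing it trades a little bias for lower sensitivity to the particular training sample.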

108
MCQmedium

A company is using Amazon SageMaker to deploy a model that predicts customer churn. The model was trained using a linear learner algorithm. During inference, the endpoint returns predictions that are always 0.5 (the probability of churn). What is the most likely cause?

A.The dataset is highly imbalanced, and the model is predicting the majority class
B.The model was trained with too few epochs
C.The input features are not normalized
D.The learning rate is set too high, causing the model to converge to the mean prediction
AnswerD

A high learning rate can cause the model to overshoot and settle at the mean of the target variable.

Why this answer

If the model always outputs 0.5, it suggests that the model is not learning and is stuck at the prior probability. This often happens when the learning rate is too high (causing divergence) or too low (causing slow convergence) so that the model does not update weights. The other options would cause different symptoms: data imbalance might bias towards 0 or 1, not exactly 0.5; feature scaling issues typically cause NaN or poor convergence; insufficient epochs might not converge but not necessarily give exactly 0.5.

109
Multi-Selectmedium

A data scientist is using Amazon SageMaker to train a linear regression model. The training data contains outliers. Which THREE techniques can mitigate the impact of outliers?

Select 3 answers
A.Remove observations with outlier values from the dataset.
B.Increase the number of layers in the model.
C.Standardize the features to have mean zero and unit variance.
D.Apply winsorization to the feature values.
E.Use a loss function that is robust to outliers, such as Huber loss.
AnswersA, D, E

Direct removal eliminates outlier impact.

Why this answer

Option A is correct because removing observations with outlier values directly eliminates data points that can disproportionately influence the linear regression coefficients, leading to a more stable and representative model. In Amazon SageMaker, this can be done during data preprocessing using built-in algorithms or custom scripts in a SageMaker Processing job.

Exam trap

AWS often tests the misconception that feature scaling (standardization) alone can handle outliers, but scaling does not reduce the leverage of extreme values; it only changes their numeric range.

110
MCQeasy

A data scientist is training a linear regression model and notices that the model performs well on training data but poorly on validation data. Which technique should be applied to reduce overfitting?

A.Apply L2 regularization (Ridge)
B.Increase the number of epochs
C.Add more features
D.Remove training examples
AnswerA

Regularization reduces overfitting by penalizing large coefficients.

Why this answer

L2 regularization (Ridge) adds a penalty term proportional to the square of the magnitude of the coefficients to the loss function. This shrinks the weights toward zero, reducing the model's sensitivity to individual features and preventing it from fitting noise in the training data, which directly addresses overfitting.

Exam trap

The trap here is that candidates often confuse regularization with techniques that increase model capacity (like adding features or more training iterations), not realizing that overfitting requires reducing complexity, not increasing it.

How to eliminate wrong answers

Option B is wrong because increasing the number of epochs (training iterations) typically allows the model to fit the training data even more closely, worsening overfitting rather than reducing it. Option C is wrong because adding more features increases model complexity and the risk of capturing noise, which exacerbates overfitting. Option D is wrong because removing training examples reduces the amount of data available for learning, which can increase variance and make overfitting more likely, not less.

111
MCQeasy

A data scientist is evaluating a regression model. The RMSE on the training set is 2.5, and on the test set is 2.7. The R² on the test set is 0.98. What does this indicate?

A.The model has high bias
B.The model generalizes well with no severe overfitting
C.The model is underfitting because R² is too high
D.The model is overfitting because RMSE is lower on training data
AnswerB

Small difference in RMSE and high test R² indicate good generalization.

Why this answer

The model has low error and high R² on both sets, indicating good generalization without significant overfitting. The small difference between training and test RMSE suggests no severe overfitting.
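
For reference, RMSE and R² can be computed from predictions as in the sketch below. The four target values and predictions are made up for illustration and are unrelated to the 2.5, 2.7, and 0.98 figures in the question.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up targets and predictions.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

print(round(rmse(y_true, y_pred), 3))
print(round(r_squared(y_true, y_pred), 3))
```

Running the same two functions on the training and test splits and comparing the results is the check the question describes: similar RMSE on both, with a high test R², suggests the model generalizes.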

112
Multi-Selectmedium

A data scientist is training a deep learning model on Amazon SageMaker and wants to reduce the training time. Which TWO actions would help achieve this?

Select 2 answers
A.Enable data augmentation.
B.Use distributed training across multiple instances.
C.Use SageMaker Automatic Model Tuning.
D.Use a GPU-based instance type.
E.Use SageMaker Managed Spot Training.
AnswersB, D

Distributed training parallelizes computation, reducing wall-clock time.

Why this answer

Distributed training across multiple instances (Option B) reduces training time by parallelizing the workload across multiple compute nodes, leveraging data parallelism or model parallelism to process larger batches or model partitions simultaneously. This is particularly effective for deep learning models where the dataset or model size exceeds the capacity of a single instance, as it scales throughput linearly with the number of instances under ideal conditions.

Exam trap

AWS often tests the distinction between cost-saving techniques (like Spot Training) and performance-enhancing techniques (like distributed training or GPU instances), leading candidates to mistakenly select Spot Training as a way to reduce training time when it only reduces cost.

113
MCQeasy

A data scientist is training a linear regression model. After training, the model has a high bias and low variance. Which technique should the data scientist use to reduce bias?

A.Decrease the model complexity
B.Add more relevant features
C.Apply L2 regularization (Ridge)
D.Reduce the amount of training data
AnswerB

Adding features increases model complexity and can reduce bias.

Why this answer

High bias indicates the model is underfitting the data, meaning it is too simple to capture the underlying patterns. Adding more relevant features increases model complexity, allowing it to learn more from the data and reduce bias. This directly addresses the underfitting issue without increasing variance excessively, provided the features are meaningful.

Exam trap

AWS often tests the bias-variance tradeoff by presenting regularization as a solution for high bias, but candidates must remember that regularization (L1/L2) primarily reduces variance, not bias, and can actually increase bias if applied too strongly.

How to eliminate wrong answers

Option A is wrong because decreasing model complexity (e.g., using fewer features or a simpler algorithm) would further increase bias, worsening the underfitting problem. Option C is wrong because L2 regularization (Ridge) adds a penalty on large coefficients, which reduces variance but can increase bias by shrinking coefficients toward zero, making the model simpler. Option D is wrong because reducing the amount of training data typically increases variance and can also increase bias if the remaining data is not representative, but it does not directly reduce bias and may harm generalization.

114
MCQhard

A team is deploying a real-time inference endpoint using Amazon SageMaker. The model is a large ensemble of 10 deep learning models, each 500 MB. The inference latency requirement is under 200 ms. Currently, the endpoint using a single ml.p3.2xlarge instance takes 1.5 seconds per request. Which approach is MOST likely to meet the latency requirement?

A.Increase the batch size to process more requests per invocation.
B.Switch to a compute-optimized instance like c5.18xlarge.
C.Use SageMaker Neo to compile the model for the target instance.
D.Use model parallelism to split the ensemble across multiple GPUs on a single instance.
AnswerD

Model parallelism reduces per-device load and can achieve latency target.

Why this answer

Option D is correct because model parallelism distributes the ensemble across multiple GPUs, reducing per-device memory and computation time. Option A is wrong because increasing batch size increases per-request latency. Option B is wrong because changing instance type alone may not reduce latency enough.

Option C is wrong because SageMaker Neo does not provide model parallelism.

115
MCQmedium

A financial services company is building a fraud detection model using a large dataset of credit card transactions. The dataset contains 10 million rows with 50 features, including transaction amount, merchant category, time of day, and customer historical features. The label is binary: fraudulent (1% of data) or legitimate. The company wants to deploy a real-time inference endpoint using Amazon SageMaker that can score transactions with sub-100ms latency. The current model is a gradient boosting model (XGBoost) trained on a sample of 1 million rows due to memory constraints. The model achieves 0.95 AUC on a held-out test set but the fraud recall (sensitivity) is only 0.4, which is unacceptable because the cost of missing a fraud is high. The data science team has access to a larger compute instance (ml.m5.24xlarge) for training. Which course of action is most likely to improve fraud recall while maintaining latency requirements?

A.Train the XGBoost model on the full 10 million rows using an ml.p3.2xlarge instance with GPU support, and apply SMOTE oversampling to the minority class before training.
B.Engineer additional features from transaction time and merchant category, then retrain the XGBoost model on the same 1 million row sample.
C.Downsample the majority class to 1% of the original size to create a balanced dataset of 200,000 rows, then retrain the XGBoost model on this balanced sample.
D.Replace XGBoost with a logistic regression model trained on the full dataset, as linear models are faster to train and may generalize better on large data.
AnswerA

Using a GPU instance allows training on the full dataset efficiently, and SMOTE oversampling balances the classes, directly improving recall.

Why this answer

Option A is correct because training on the full 10 million rows with a GPU-accelerated instance (ml.p3.2xlarge) allows the XGBoost model to learn from the complete data distribution, addressing the bias introduced by the 1 million row sample. Applying SMOTE oversampling to the minority class (fraud) directly tackles the class imbalance (1% fraud), which is the root cause of the low recall (0.4). SMOTE generates synthetic fraudulent examples, improving the model's ability to detect fraud without significantly increasing inference latency, as the model architecture and deployment remain unchanged.

Exam trap

The trap here is that candidates may choose downsampling (Option C) as a quick fix for class imbalance, overlooking that it discards valuable majority class data and can harm model generalization, while SMOTE (Option A) preserves data and synthetically balances the classes to improve recall without sacrificing latency.

How to eliminate wrong answers

Option B is wrong because engineering additional features and retraining on the same 1 million row sample does not address the fundamental issue of insufficient fraudulent examples in the training data; the model will still suffer from low recall due to class imbalance. Option C is wrong because downsampling the majority class to 1% reduces the dataset to only 200,000 rows, discarding 99% of legitimate transactions, which can lead to loss of valuable patterns and degrade model generalization, while not guaranteeing improved fraud recall. Option D is wrong because replacing XGBoost with logistic regression, a linear model, is unlikely to capture complex non-linear interactions in transaction data, and while it may train faster, it will not improve recall to the required level and may even worsen performance.

116
MCQeasy

A data scientist is training a binary classification model on an imbalanced dataset where the positive class represents 5% of the data. Which metric is most appropriate for evaluating model performance?

A.Accuracy
B.AUC-ROC
C.Root Mean Squared Error (RMSE)
D.R-squared
AnswerB

AUC-ROC evaluates the model's ability to distinguish between classes regardless of threshold and is robust to imbalance.

Why this answer

Option B is correct because AUC-ROC is robust to class imbalance and measures the trade-off between true positive rate and false positive rate. Option A is wrong because accuracy can be misleading with imbalanced data. Option C is wrong because RMSE is for regression.

Option D is wrong because R-squared is for regression.

117
MCQmedium

A company is using Amazon SageMaker to train a model on a dataset with many categorical features. They want to use SageMaker's built-in Linear Learner algorithm. What preprocessing step is required for the categorical features?

A.Apply one-hot encoding to convert them to numerical vectors.
B.Use label encoding to assign integers to categories.
C.Normalize the categorical features using min-max scaling.
D.Remove categorical features with high cardinality.
AnswerA

Linear models need numerical features; one-hot encoding is standard.

Why this answer

The SageMaker Linear Learner algorithm requires numerical input features. Categorical features must be converted to numerical vectors, typically via one-hot encoding, because the algorithm performs linear regression or classification on numerical data. Without this preprocessing, the algorithm cannot interpret categorical values directly.

Exam trap

The trap here is that candidates confuse label encoding (assigning integers) with one-hot encoding, assuming any numerical conversion suffices, but label encoding introduces false ordinality that degrades linear model performance.

How to eliminate wrong answers

Option B is wrong because label encoding assigns arbitrary integers to categories, which implies an ordinal relationship that can mislead the linear model into treating categories as ordered numerical values. Option C is wrong because normalization (min-max scaling) is a scaling technique for numerical features, not a method to convert categorical features to numerical form. Option D is wrong because removing high-cardinality categorical features is a data reduction strategy, not a required preprocessing step for the Linear Learner algorithm; the algorithm can handle one-hot encoded features regardless of cardinality.
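
One-hot encoding is easy to sketch by hand. The helper below is a hypothetical, dependency-free illustration (in practice a library encoder would usually be used); the color values are made up.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


# Made-up categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for value, row in zip(colors, encoded):
    print(value, row)
```

Each category gets its own column, so no ordering is implied between, say, "blue" and "red", which is the property label encoding lacks.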

118
Multi-Selecthard

Which THREE factors should be considered when choosing between SageMaker built-in algorithms and custom algorithms? (Choose THREE.)

Select 3 answers
A.Custom algorithms allow you to implement any architecture, including proprietary ones
B.Built-in algorithms are optimized for distributed training
C.Built-in algorithms can only be used with CSV and JSON formats
D.Custom algorithms require you to bring your own Docker container, but SageMaker built-in algorithms do not support frameworks like PyTorch
E.Built-in algorithms have predefined hyperparameters that may not fit all use cases
AnswersA, B, E

Custom algorithms offer full flexibility.

Why this answer

Option A is correct because custom algorithms in SageMaker allow you to implement any architecture, including proprietary or novel models that are not available as built-in algorithms. This flexibility is essential when you need to use a custom neural network, a unique loss function, or a model from a research paper that SageMaker does not natively support.

Exam trap

The trap here is that candidates often assume built-in algorithms are limited to CSV/JSON formats and do not support popular frameworks like PyTorch, when in fact SageMaker provides optimized built-in framework containers for PyTorch, TensorFlow, and others, and built-in algorithms support a wide variety of data formats.

119
Multi-Selecthard

A machine learning engineer is evaluating a multi-class classification model that predicts product categories. The model outputs probabilities for 10 classes. The engineer wants to improve the model's calibration so that the predicted probabilities reflect the true likelihood of each class. Which THREE techniques can help?

Select 3 answers
A.Use temperature scaling
B.Apply isotonic regression
C.Increase model complexity
D.Apply Platt scaling
E.Use focal loss
AnswersA, B, D

Temperature scaling adjusts the softmax temperature to improve calibration for neural networks.

Why this answer

Platt scaling and isotonic regression are common calibration methods for classification models. Temperature scaling is a variant of Platt scaling for neural networks. Using a different loss function like cross-entropy helps but is not a calibration technique per se.
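
Temperature scaling amounts to dividing the logits by a single scalar before the softmax. The sketch below only shows the effect of a chosen temperature; the logits are made up, and in practice the temperature would be fitted on a held-out validation set rather than picked by hand.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (must be positive)."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for a three-class example.
logits = [4.0, 1.0, 0.5]

sharp = softmax_with_temperature(logits, temperature=1.0)
softened = softmax_with_temperature(logits, temperature=2.0)

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in softened])
```

A temperature above 1 lowers the top probability without changing which class is predicted, so accuracy is unaffected while overconfident probabilities are softened.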

120
Multi-Selecteasy

Which TWO techniques can help reduce overfitting in a decision tree model?

Select 2 answers
A.Increase the number of trees in the forest
B.Increase the number of features considered per split
C.Limit the maximum depth of the tree
D.Prune the tree after training
E.Increase the maximum depth of the tree
AnswersC, D

Shallower trees generalize better.

Why this answer

Limiting the maximum depth of the tree (Option C) directly restricts the number of splits, preventing the model from learning overly specific patterns in the training data. Pruning the tree after training (Option D) removes branches that have little predictive power, reducing variance and improving generalization. Both techniques combat overfitting by controlling the complexity of the decision tree.

Exam trap

AWS often tests the distinction between techniques that reduce overfitting in a single decision tree versus ensemble methods, so candidates mistakenly apply Random Forest concepts (like increasing trees or features) to a standalone tree.

121
MCQhard

A data science team is deploying a machine learning model to production using SageMaker. The model is a PyTorch model that requires custom inference logic including image preprocessing. The team needs to ensure that the endpoint can handle variable batch sizes and has low latency. Which deployment approach should the team use?

A.Use SageMaker Inference Pipelines with a preprocessing container followed by the PyTorch model container.
B.Deploy the model as an AWS Lambda function and use API Gateway.
C.Use the SageMaker Python SDK's Predictor class with the model artifact.
D.Use a SageMaker multi-model endpoint to host the model with a custom container.
E.Use SageMaker Batch Transform for real-time inference.
AnswerA

Inference Pipelines allow custom preprocessing and model serving with low latency.

Why this answer

Option A is correct because SageMaker Inference Pipelines allow chaining of preprocessing and prediction containers, enabling custom logic and efficient batching. Option B (Lambda) is not designed for real-time inference with large models. Option E (SageMaker Batch Transform) is for offline batch inference, not real-time.

Option D (multi-model endpoints) is for hosting multiple models, not custom inference logic. Option C (the SageMaker Python SDK's Predictor class) is a client for calling an endpoint, not a deployment approach.

122
Multi-Selecteasy

A data scientist is using Amazon SageMaker to train a linear regression model. The training data contains missing values. Which TWO techniques are appropriate for handling missing values in the dataset?

Select 2 answers
A.Use a decision tree model that can handle missing values internally
B.Set missing values to zero
C.Remove rows with missing values if the proportion is small
D.Impute missing values with the mean of the column
E.Create a separate category for missing values
AnswersC, D

If a small fraction of rows have missing values, removing them is acceptable.

Why this answer

Options C (Remove rows with missing values if the proportion is small) and D (Impute missing values with the mean of the column) are correct. Imputation is a common technique; removing rows is acceptable if only a few rows have missing values. Option B (Set missing values to zero) can bias the model.

Option A (Use a decision tree that handles missing values internally) is not applicable because the model is a linear regression. Option E (Create a separate category for missing values) is not suitable for the numeric features of a linear regression.

123
Multi-Selecthard

An ML team trains a deep learning model using Amazon SageMaker with a custom Docker container. Training completes successfully, but the model's accuracy on the test set is significantly lower than expected. The team suspects overfitting. Which two actions should they take to mitigate overfitting? (Choose TWO.)

Select 2 answers
A.Use data augmentation
B.Add dropout layers
C.Reduce the batch size
D.Increase the number of layers
E.Increase the number of training epochs
AnswersA, B

Data augmentation increases effective training data size, reducing overfitting.

Why this answer

Dropout and data augmentation are effective regularization techniques to reduce overfitting. Option E (increasing epochs) would worsen overfitting. Option C (reducing batch size) can introduce noise but is not a primary regularization method.

Option D (adding more layers) increases model capacity, likely worsening overfitting.

124
MCQmedium

A company is using Amazon SageMaker to train a deep learning model for image segmentation. The training job uses a single ml.p3.2xlarge instance and takes 48 hours to complete. The team needs to reduce training time to under 12 hours to meet a deadline. The dataset is 50 GB of images stored in S3. The team currently uses File mode to download the data to the training instance. The model architecture is a convolutional neural network (CNN) with 50 layers. The team has access to multiple instances of the same type. Which approach will most effectively reduce training time?

A.Reduce the number of layers in the CNN to speed up training.
B.Increase the batch size on the single instance to process more data per iteration.
C.Use SageMaker's distributed data parallelism with multiple instances.
D.Switch to Pipe mode to stream data from S3, reducing data loading time.
AnswerC

Distributed training across instances parallelizes computation and can achieve near-linear speedup.

Why this answer

Option C is correct because SageMaker's distributed data parallelism splits the 50 GB dataset across multiple ml.p3.2xlarge instances, allowing each instance to process a subset of the data in parallel. This can reduce training time from 48 hours toward the 12-hour target, assuming near-linear scaling with the number of instances (e.g., 4 instances give at most a 4x speedup, or about 12 hours, so getting strictly under 12 hours needs slightly more than 4 instances). The approach directly addresses the need to reduce wall-clock time without altering the model architecture or data loading method.
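The scaling arithmetic can be checked with a short sketch. The `scaling_efficiency` parameter is an assumption introduced here to model communication overhead; it is not a SageMaker setting, and real speedups depend on the model and network.

```python
def estimated_training_hours(baseline_hours, instances, scaling_efficiency=1.0):
    """Estimate wall-clock hours when each extra instance adds
    `scaling_efficiency` of one instance's throughput (1.0 = ideal scaling)."""
    speedup = 1 + (instances - 1) * scaling_efficiency
    return baseline_hours / speedup

def min_instances_for_deadline(baseline_hours, deadline_hours,
                               scaling_efficiency=1.0, max_instances=64):
    """Smallest instance count whose estimate is strictly under the deadline."""
    for n in range(1, max_instances + 1):
        if estimated_training_hours(baseline_hours, n, scaling_efficiency) < deadline_hours:
            return n
    return None  # deadline not reachable within max_instances

print(estimated_training_hours(48, 4))        # ideal scaling on 4 instances
print(min_instances_for_deadline(48, 12))     # instances needed to go under 12 h
print(min_instances_for_deadline(48, 12, scaling_efficiency=0.9))
```

Under ideal scaling, four instances land exactly on 12 hours, which is why a fifth instance (or better per-instance throughput) is needed to finish strictly under the deadline.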

Exam trap

The trap here is that candidates may confuse data loading optimization (Pipe mode) with compute parallelism, overlooking that the 48-hour bottleneck is GPU compute time, not I/O, and that distributed training is the only viable method to achieve a 4x speedup without altering the model.

How to eliminate wrong answers

Option A is wrong because reducing the number of layers in the CNN would degrade model accuracy for image segmentation, and the goal is to reduce training time without compromising model quality; it also does not leverage the available multiple instances. Option B is wrong because increasing the batch size on a single instance may improve GPU utilization but is unlikely to reduce training time from 48 hours to under 12 hours, as it does not address the fundamental bottleneck of sequential processing on one GPU; it can also cause out-of-memory errors or convergence issues. Option D is wrong because switching to Pipe mode streams data directly from S3 without downloading, reducing I/O overhead, but the primary bottleneck is compute (GPU processing), not data loading; the training time is dominated by forward/backward passes through 50 layers, not by data transfer.

125
MCQhard

A data scientist is using SageMaker to train a custom TensorFlow model. The training script reads data from S3 using TensorFlow's tf.data API. The training is bottlenecked by I/O. Which strategy would MOST effectively improve data throughput?

A.Compress the data files in S3
B.Use Amazon FSx for Lustre as a mounted filesystem
C.Increase the number of parallel workers in tf.data
D.Use SageMaker Pipe mode and shard the S3 dataset
AnswerD

Pipe mode streams data directly, and sharding distributes data across instances, improving throughput.

Why this answer

Using SageMaker Pipe mode with a sharded S3 dataset allows the training instances to stream data in parallel, reducing I/O bottlenecks. Increasing workers in tf.data may help but not as effectively as optimizing data ingestion. Using FSx for Lustre provides high throughput but adds cost and complexity.

126
Multi-Selectmedium

Which TWO metrics are MOST appropriate for evaluating a regression model that predicts house prices, where the business is most sensitive to large errors?

Select 2 answers
A.Root Mean Squared Error (RMSE)
B.Mean Absolute Percentage Error (MAPE)
C.Accuracy
D.Mean Absolute Error (MAE)
E.R-squared
AnswersA, B

RMSE squares errors, so large errors are penalized heavily.

Why this answer

RMSE is most appropriate because it squares the errors before averaging, which heavily penalizes large errors. Since the business is most sensitive to large errors in house price predictions, RMSE directly aligns with this requirement by amplifying the impact of outliers, making it a suitable metric for evaluating model performance in this context.
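As an illustration, the following sketch compares MAE and RMSE on two hypothetical sets of house-price predictions that have the same total absolute error. The prices are made up for the example.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

actual = [200_000, 250_000, 300_000, 350_000]
small_misses = [210_000, 240_000, 310_000, 340_000]  # every prediction off by 10k
one_big_miss = [200_000, 250_000, 300_000, 310_000]  # a single 40k error

print(mae(actual, small_misses), mae(actual, one_big_miss))    # identical MAE
print(rmse(actual, small_misses), rmse(actual, one_big_miss))  # RMSE doubles
```

MAE cannot tell the two models apart, while RMSE is twice as large for the model with the single large error.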

Exam trap

The trap here is that candidates often choose MAE (Option D) because it is a common regression metric, but they fail to recognize that MAE does not penalize large errors more heavily, which is the key business requirement in this scenario.

127
MCQmedium

A data scientist trains a model using SageMaker and notices that the training loss decreases but validation loss increases after a few epochs. What is the MOST likely issue?

A.The learning rate is too low.
B.There is data leakage from validation to training.
C.The model is underfitting the training data.
D.The model is overfitting the training data.
AnswerD

Classic sign of overfitting: training loss decreases, validation loss increases.

Why this answer

Overfitting occurs when the model performs well on training data but poorly on validation data. Option C (underfitting) would show high training loss. Option B (data leakage) would show good performance on both.

Option A (learning rate too low) would slow convergence.

128
MCQhard

A research team is developing a deep learning model to classify medical images into 10 disease categories. They have a dataset of 50,000 labeled images, but the class distribution is highly imbalanced: the most common class has 20,000 images, while the rarest class has only 200 images. To address this, they apply data augmentation (random rotations, flips, and brightness adjustments) to the minority classes until each class has 20,000 images. They then train a convolutional neural network (CNN) from scratch using cross-entropy loss. The model achieves 95% overall accuracy but only 30% recall on the rarest class. Which change is MOST likely to improve recall on the rarest class without significantly reducing overall accuracy?

A.Increase dropout rate from 0.2 to 0.5 to reduce overfitting
B.Replace cross-entropy loss with focal loss
C.Switch from Adam optimizer to SGD with momentum
D.Reduce the batch size from 64 to 16 to increase stochasticity
AnswerB

Focal loss reduces the loss contribution from easy examples and focuses on hard, minority examples, improving recall.

Why this answer

Focal loss is specifically designed to address class imbalance by down-weighting the loss contribution from well-classified examples (majority classes) and focusing training on hard, misclassified examples (minority classes). This directly improves recall on the rarest class, while cross-entropy loss treats all classes equally, causing the model to be biased toward the majority classes.
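A minimal sketch of the per-example focal loss next to cross-entropy, where `p_true` is the predicted probability of the correct class. The default `gamma=2.0` is a commonly used value and the two probabilities are hypothetical.

```python
import math

def cross_entropy(p_true):
    """Cross-entropy for one example given the probability of its true class."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled by (1 - p_true) ** gamma.
    gamma = 0 and alpha = 1 recover plain cross-entropy."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy, hard = 0.95, 0.10  # well-classified vs. badly misclassified example

for p in (easy, hard):
    ratio = focal_loss(p) / cross_entropy(p)
    print(f"p_true={p:.2f}  CE={cross_entropy(p):.4f}  "
          f"focal={focal_loss(p):.4f}  ratio={ratio:.4f}")
```

The easy example keeps only a small fraction of its cross-entropy loss, while the hard example keeps most of it, so gradient updates are dominated by the examples the model still gets wrong.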

Exam trap

The exam often tests the distinction between regularization techniques (dropout, batch size) and loss function modifications (focal loss) for class imbalance, trapping candidates who think overfitting is the primary issue when the real problem is the model's bias toward majority classes.

How to eliminate wrong answers

Option A is wrong because increasing dropout from 0.2 to 0.5 is a regularization technique that reduces overfitting, but the model already achieves 95% overall accuracy, indicating it is not overfitting; this change would likely reduce capacity and hurt recall on the rare class without addressing the imbalance. Option C is wrong because switching from Adam to SGD with momentum changes the optimization dynamics (e.g., learning rate scheduling, convergence speed) but does not directly address the class imbalance problem; it may even slow convergence and fail to improve minority class recall. Option D is wrong because reducing batch size from 64 to 16 increases gradient stochasticity, which can help escape local minima but does not specifically target the imbalance; it may cause training instability and does not re-weight the loss to focus on minority classes.

129
Multi-Selecteasy

A data scientist is training a k-means clustering model on a dataset with 1,000 points. The scientist uses the elbow method to choose the number of clusters. The elbow plot shows a clear bend at k=4. After running k-means with k=4, the scientist wants to evaluate the quality of the clustering. Which THREE of the following are suitable internal clustering validation metrics? (Choose THREE.)

Select 3 answers
A.Adjusted Rand index
B.Rand index
C.Calinski-Harabasz index
D.Silhouette score
E.Davies-Bouldin index
AnswersC, D, E

The Calinski-Harabasz index is the ratio of between-cluster variance to within-cluster variance; higher is better.

Why this answer

Silhouette score, Davies-Bouldin index, and Calinski-Harabasz index are all internal validation metrics that do not require ground truth labels. They measure compactness and separation. Rand index and adjusted Rand index require ground truth labels (external validation).
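To show that an internal metric needs only the points and their cluster assignments, here is a small pure-Python sketch of the silhouette score. The four points and both labelings are hypothetical, and the function assumes at least two clusters and that not all points coincide.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient; uses only points and cluster labels."""
    scores = []
    for i, (p, label_i) in enumerate(zip(points, labels)):
        # Group distances from point i to every other point by cluster label.
        dists = {}
        for j, (q, label_j) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(label_j, []).append(math.dist(p, q))
        if label_i not in dists:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(dists[label_i]) / len(dists[label_i])  # mean intra-cluster distance
        b = min(sum(d) / len(d)                        # nearest other cluster
                for label, d in dists.items() if label != label_i)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # clusters that mix the two groups
print(good, bad)
```

No ground-truth labels appear anywhere in the computation, which is what distinguishes this metric from the Rand index and adjusted Rand index.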

130
MCQeasy

A data scientist is training a linear regression model on a dataset with 50 features. After training, they notice that the model performs well on training data but poorly on test data. They suspect overfitting. Which action should they take to reduce overfitting?

A.Use a larger learning rate
B.Add L2 regularization (Ridge regression)
C.Add more features to the model
D.Increase the number of training epochs
AnswerB

L2 regularization penalizes large coefficients, reducing overfitting.

Why this answer

Regularization is a standard technique to combat overfitting in linear models.
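A minimal sketch of the shrinkage effect, assuming a single feature and no intercept so that the ridge solution has the closed form w = Σxy / (Σx² + λ). The data points and λ values are hypothetical.

```python
def ridge_weight(xs, ys, l2_penalty):
    """Closed-form ridge coefficient for one feature without an intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_penalty)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, round(ridge_weight(xs, ys, lam), 4))
```

With a penalty of 0 the result is the ordinary least-squares coefficient; larger penalties pull the coefficient toward zero, which is the mechanism that limits overfitting when there are many features.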

131
MCQeasy

A data scientist is training a linear regression model and observes that the training loss is low but validation loss is high. Which step should the data scientist take to address this issue?

A.Apply L2 regularization to the model
B.Increase the number of training epochs
C.Reduce the size of the training dataset
D.Add more features to the model
AnswerA

Regularization penalizes large weights, reducing overfitting.

Why this answer

Option A is correct because low training loss combined with high validation loss indicates overfitting; regularization reduces overfitting. Option D is wrong because adding more features may increase overfitting. Option B is wrong because increasing training time typically increases overfitting.

Option C is wrong because reducing training data may worsen overfitting.

132
Matchingmedium

Match each hyperparameter tuning strategy to its description.

Concepts
Matches

Exhaustive search over specified hyperparameter values

Random sampling of hyperparameter combinations

Probabilistic model to guide search

Early stopping and resource allocation

SageMaker automatic tuning

Why these pairings

These strategies are used to optimize hyperparameters.

133
MCQmedium

A company is deploying a machine learning model for real-time fraud detection. The model must respond within 100ms. Which SageMaker endpoint deployment strategy should be used?

A.Deploy the model to a SageMaker Serverless Inference endpoint.
B.Deploy the model to a SageMaker Real-Time Inference endpoint with a Multi-Model Endpoint configuration.
C.Deploy the model as an AWS Lambda function with an API Gateway trigger.
D.Use SageMaker Batch Transform to process requests in batches.
AnswerB

Multi-Model Endpoints provide low latency and cost efficiency for real-time serving.

Why this answer

Option B is correct because a SageMaker Real-Time Inference endpoint with a Multi-Model Endpoint configuration provides low-latency (sub-100ms) responses by keeping models loaded in memory and routing requests efficiently. This architecture is ideal for real-time fraud detection where multiple models may be needed, and it meets the strict latency requirement without the cold-start overhead of serverless options.

Exam trap

The trap here is that candidates may confuse serverless or Lambda-based solutions as inherently low-latency, overlooking the cold-start penalty and network overhead that make them unsuitable for sub-100ms real-time inference in SageMaker.

How to eliminate wrong answers

Option A is wrong because SageMaker Serverless Inference endpoints have a cold-start latency that can exceed 100ms, especially for infrequent or bursty traffic, making them unsuitable for real-time fraud detection with strict latency constraints. Option C is wrong because deploying as an AWS Lambda function with API Gateway introduces additional network hops and cold-start delays, and Lambda has a maximum execution timeout of 15 minutes but is not optimized for sub-100ms ML inference with large models or frameworks. Option D is wrong because SageMaker Batch Transform is designed for asynchronous, offline processing of large datasets in batches, not for real-time, low-latency inference required by fraud detection.

134
MCQmedium

A data scientist is working on a binary classification problem to predict loan default. The dataset has 200,000 samples and 50 features. The target variable is imbalanced: 5% default, 95% non-default. The scientist trains a logistic regression model and achieves 95% accuracy, but the recall for the default class is only 20%. The business requires that at least 70% of actual defaults be identified (recall >= 0.7). Which approach should the scientist take to improve recall without significantly sacrificing precision?

A.Use random undersampling of the majority class to balance the dataset
B.Use oversampling techniques like SMOTE to create synthetic samples of the minority class
C.Change the decision threshold to 0.3
D.Increase the regularization strength (C) in logistic regression
AnswerB

SMOTE generates synthetic minority samples, helping the model learn better decision boundaries for the minority class, improving recall with less precision loss.

Why this answer

SMOTE (Synthetic Minority Oversampling Technique) creates synthetic samples for the minority class by interpolating between existing minority instances, which increases the representation of the default class in the training data. This directly addresses the low recall (20%) by providing the logistic regression model with more balanced class distributions, enabling it to learn decision boundaries that capture more true positives without discarding majority class information. Unlike simple oversampling, SMOTE reduces overfitting risk by generating novel samples rather than duplicating existing ones, which helps maintain precision while improving recall.
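The interpolation step can be sketched in a few lines. This is a simplified illustration of the idea, not the reference SMOTE algorithm; the function name, the sample points, and the parameters are hypothetical, and it assumes at least two distinct minority samples.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class samples with two features each.
defaults = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote_like_oversample(defaults, n_new=6)
print(new_points)
```

Oversampling should be applied to the training split only; validation and test data keep the original class distribution so that reported precision and recall stay realistic.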

Exam trap

The trap here is that candidates often choose threshold adjustment (Option C) as a quick fix for recall, failing to recognize that it is a superficial change that does not improve the model's learned decision boundary and typically sacrifices precision disproportionately, whereas SMOTE addresses the root cause of imbalance in the training data.

How to eliminate wrong answers

Option A is wrong because random undersampling discards most of the majority class (roughly 180,000 of the 190,000 non-default samples to reach a 1:1 balance with the 10,000 defaults), which can lead to significant information loss and reduced model precision due to a smaller, less representative training set. Option C is wrong because changing the decision threshold to 0.3 is a post-hoc adjustment that does not address the underlying class imbalance; while it may increase recall by classifying more instances as positive, it typically causes a sharp drop in precision as many false positives are introduced. Option D is wrong because increasing regularization strength (lower C value) penalizes model complexity more heavily, which can cause underfitting and further reduce recall by making the decision boundary too simplistic to capture the minority class patterns.

135
MCQeasy

Refer to the exhibit. A data scientist checks the status of a SageMaker endpoint and sees the output above. The endpoint is receiving traffic, but the data scientist notices that the number of instances has not increased to the desired count. What is the most likely reason?

A.The endpoint is performing a rolling update
B.The endpoint is currently being updated
C.The account has reached its instance limit
D.Automatic scaling is not configured for the endpoint
AnswerD

The desired instance count will not be applied automatically without a scaling policy; it's just a target.

Why this answer

Option D is correct because the endpoint is receiving traffic but not scaling out, which indicates that automatic scaling (Application Auto Scaling) has not been configured for the SageMaker endpoint. Without a scaling policy, the endpoint will only use the initial instance count, regardless of traffic load. The status shown does not indicate any update or quota issue, so the lack of scaling is the most likely cause.
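A sketch of what configuring automatic scaling could look like with the Application Auto Scaling API. The endpoint name, variant name, capacities, and target value are hypothetical; `client` is assumed to be `boto3.client("application-autoscaling")`, and the request dictionaries are built separately so they can be inspected without AWS credentials.

```python
def build_autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                               min_capacity=1, max_capacity=4,
                               invocations_per_instance=100.0):
    """Build the request parameters for registering a SageMaker variant as a
    scalable target and attaching a target-tracking scaling policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return target, policy

def apply_autoscaling(client, target, policy):
    """Send both requests; `client` is an Application Auto Scaling client."""
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)

target, policy = build_autoscaling_requests("fraud-endpoint")  # hypothetical endpoint
print(target["ResourceId"])
```

Until both calls have been made for a production variant, the endpoint keeps running on its initial instance count regardless of traffic.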

Exam trap

AWS often tests the distinction between endpoint status (e.g., 'InService' vs. 'Updating') and scaling configuration, trapping candidates who assume that any traffic increase automatically triggers scaling without an explicit scaling policy.

How to eliminate wrong answers

Option A is wrong because a rolling update would show a status of 'Updating', not the current steady state, and would not prevent scaling beyond the desired count. Option B is wrong because if the endpoint were being updated, the status would show 'Updating' rather than a steady state, and the instance count would not remain static at the initial value. Option C is wrong because an instance limit would cause a scaling failure or error message, not simply a failure to increase instances while the endpoint remains healthy and receiving traffic.

136
MCQhard

Refer to the exhibit. A data scientist ran a SageMaker training job using a built-in XGBoost algorithm. The job failed with the error shown. Which step should the data scientist take to fix the issue?

A.Write a custom algorithm that calculates accuracy
B.Remove the metric definition from the training job configuration
C.Change the metric to 'validation:rmse'
D.Use a different built-in algorithm that supports accuracy
AnswerB

XGBoost will use its default metrics (rmse for regression) if not specified, avoiding the error.

Why this answer

The error indicates that SageMaker's built-in XGBoost algorithm does not support a custom metric named 'accuracy' because XGBoost's built-in objective functions (e.g., 'binary:logistic', 'reg:squarederror') do not compute accuracy natively. Removing the metric definition from the training job configuration resolves the issue by allowing SageMaker to use the default metrics that XGBoost does support, such as 'validation:rmse' or 'validation:error'.

Exam trap

AWS often tests the misconception that any metric name can be used with built-in algorithms, when in fact the metric must be one of the predefined strings supported by the algorithm's container (e.g., 'validation:error' for XGBoost classification).

How to eliminate wrong answers

Option A is wrong because writing a custom algorithm to calculate accuracy is unnecessary and over-engineered; the built-in XGBoost already supports accuracy-like metrics (e.g., 'validation:error') if configured correctly, and the error is about an unsupported metric name, not a missing metric calculation. Option C is wrong because 'validation:rmse' is a valid metric for regression tasks, but the error is about an unsupported metric name 'accuracy', and simply changing to 'validation:rmse' does not address the root cause—the metric definition should be removed or corrected to a supported metric like 'validation:error' for classification. Option D is wrong because using a different built-in algorithm is an overreaction; the XGBoost algorithm fully supports classification and can compute error rate (which is 1 - accuracy) via the 'validation:error' metric, so the issue is purely a misconfiguration of the metric name.

137
MCQmedium

Refer to the exhibit. An IAM policy is attached to a SageMaker notebook instance. The data scientist runs a training job that reads from s3://my-bucket/training-data/ and writes to s3://my-bucket/output/. The training job fails with an access denied error. What is the most likely cause?

A.The policy does not allow sagemaker:CreateTrainingJob
B.The policy does not allow s3:PutObject on the output location
C.The policy is missing the sagemaker:InvokeEndpoint action
D.The policy does not allow s3:GetObject on the training data
AnswerB

The s3:PutObject action is restricted to the training-data prefix only.

Why this answer

The policy allows s3:PutObject only for the training-data prefix, not the output prefix. The training job needs write access to the output location. Option D is wrong because the policy does include s3:GetObject.

Option A is wrong because SageMaker actions are allowed. Option C is wrong because sagemaker:InvokeEndpoint is used to call a deployed endpoint and is not needed to run a training job.
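The exhibit is not reproduced here, so the following sketch uses a hypothetical policy that matches the description above, together with a deliberately simplified allow check. Real IAM evaluation also considers explicit Deny statements, conditions, and other policy types.

```python
from fnmatch import fnmatchcase

# Hypothetical policy consistent with the explanation; not the actual exhibit.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["sagemaker:CreateTrainingJob"], "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/training-data/*",
        },
    ],
}

def is_allowed(policy, action, resource_arn):
    """Simplified check: True if any Allow statement lists the action and
    has a resource pattern that matches the ARN."""
    for stmt in policy["Statement"]:
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        if (stmt["Effect"] == "Allow" and action in stmt["Action"]
                and any(fnmatchcase(resource_arn, r) for r in resources)):
            return True
    return False

output_object = "arn:aws:s3:::my-bucket/output/model.tar.gz"
denied_before = not is_allowed(policy, "s3:PutObject", output_object)

# One possible fix: also grant write access on the output prefix.
policy["Statement"].append(
    {"Effect": "Allow", "Action": ["s3:PutObject"],
     "Resource": "arn:aws:s3:::my-bucket/output/*"}
)
allowed_after = is_allowed(policy, "s3:PutObject", output_object)
print(denied_before, allowed_after)
```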

138
MCQmedium

A financial services company is developing a fraud detection model using gradient boosting. The dataset contains 10 million transactions with 0.1% fraudulent. The model is trained on a SageMaker ml.m5.2xlarge instance and takes 8 hours. The team needs to reduce training time without sacrificing model performance. They have permission to use up to 4 instances. What should they do?

A.Switch to a built-in XGBoost with GPU support and use a p3.2xlarge instance
B.Use SageMaker hyperparameter tuning to find faster hyperparameters
C.Use SageMaker's distributed training with data parallelism across 4 ml.m5.2xlarge instances
D.Use SageMaker managed spot training with checkpointing
AnswerA

GPU acceleration can significantly reduce training time for gradient boosting.

Why this answer

GPU instances like p3.2xlarge accelerate XGBoost training substantially.

139
Multi-Selectmedium

Which TWO actions can help reduce overfitting in a neural network? (Choose 2.)

Select 2 answers
A.Increase the number of layers.
B.Decrease the learning rate.
C.Apply L1 or L2 regularization.
D.Increase the training dataset size.
E.Add dropout layers.
AnswersC, E

Regularization penalizes large weights, reducing overfitting.

Why this answer

Option E is correct because dropout randomly drops units, preventing co-adaptation. Option C is correct because L1/L2 regularization penalizes large weights. Option A is wrong because adding more layers increases model complexity.

Option D is not among the keyed answers, although more training data generally reduces overfitting rather than underfitting; it requires collecting data rather than changing the network itself. Option B is wrong because reducing the learning rate may not prevent overfitting.

140
MCQmedium

A data scientist is building a model to predict insurance claim amounts. The target variable is right-skewed with many small claims and a few very large claims. The scientist wants to minimize the impact of outliers. Which loss function or transformation is MOST appropriate?

A.Use mean squared error loss without any transformation
B.Use quantile loss to predict the median
C.Use Poisson loss assuming the target follows a Poisson distribution
D.Apply a log transformation to the target variable
AnswerD

Log transformation reduces skewness and makes the distribution more symmetric, reducing outlier impact.

Why this answer

Using a log transformation or modeling with a log-link function can reduce skewness and the impact of outliers. Option A (Mean squared error) is sensitive to outliers. Option B (Quantile loss) is robust but less common for mean prediction.

Option C (Poisson loss) is for count data. Option D (Log transformation of target) is standard for skewed continuous targets.
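A short sketch of the effect of a log transformation on a right-skewed target. The claim amounts are hypothetical, and `log1p`/`expm1` are used so that zero-valued claims would also be handled.

```python
import math
import statistics

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical claim amounts: many small claims, a few very large ones.
claims = [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600]

log_claims = [math.log1p(c) for c in claims]   # train the model on this target
restored = [math.expm1(v) for v in log_claims] # invert predictions back to amounts

print(round(skewness(claims), 3), round(skewness(log_claims), 3))
```

A model trained on the transformed target predicts in log space, so its outputs need the inverse transform before being reported as claim amounts.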

141
MCQmedium

A data scientist is using principal component analysis (PCA) for dimensionality reduction before training a classifier. The classifier's performance on the test set is poor. What is the most likely cause?

A.The classifier is overfitting
B.The data was not scaled before applying PCA
C.Too few principal components were retained, losing important information
D.Too many principal components were retained, including noise
AnswerC

Discards discriminative features.

Why this answer

C is correct because PCA is an unsupervised dimensionality reduction technique that projects data onto principal components capturing the maximum variance. If too few components are retained, the reduced representation may discard features that are critical for the classifier to distinguish between classes, leading to poor test performance due to underfitting.

Exam trap

AWS often tests the misconception that PCA always improves classifier performance by removing noise, but the trap here is that candidates may overlook the risk of underfitting when too few components are retained, especially when the discarded variance contains critical discriminative features.

How to eliminate wrong answers

Option A is wrong because overfitting would cause high training accuracy but poor test accuracy, whereas the question states the classifier's performance on the test set is poor without mentioning training performance, making underfitting from information loss more likely. Option B is wrong because while scaling is a best practice for PCA (since PCA is sensitive to variances), unscaled data would typically distort component directions and degrade performance, but the most likely cause given poor test performance is retaining too few components, not scaling alone. Option D is wrong because retaining too many components, including noise, would typically lead to overfitting (high variance, poor generalization), but the question's scenario of poor test performance without context of training performance points more directly to underfitting from insufficient components.

142
MCQeasy

A data scientist is using Amazon SageMaker to train a classification model. The dataset contains categorical features with high cardinality. Which encoding method is most appropriate for handling high-cardinality categorical features in a linear model?

A.Target encoding
B.Label encoding
C.One-hot encoding
D.Ordinal encoding
AnswerA

Target encoding replaces categories with the mean of the target variable, reducing dimensionality and capturing predictive power.

Why this answer

One-hot encoding creates many binary columns, which can cause the curse of dimensionality for high-cardinality features. Label encoding assigns arbitrary integers, which linear models may interpret as ordinal. Target encoding (mean encoding) replaces categories with the mean of the target variable, which captures information without expanding dimensionality.

This is often used for high-cardinality features. Ordinal encoding is similar to label encoding.
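A minimal sketch of target (mean) encoding with additive smoothing. The smoothing scheme, the parameter value, and the diagnosis-code data are assumptions made for illustration; smoothing pulls rare categories toward the global mean.

```python
from collections import defaultdict

def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.
    smoothing = 0 gives the plain per-category mean."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    encoding = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return encoding, global_mean

# Hypothetical high-cardinality feature (diagnosis codes) and binary target.
codes = ["A10", "A10", "A10", "A10", "B20", "B20", "C30"]
target = [1, 1, 0, 1, 0, 0, 1]

encoding, global_mean = target_encode(codes, target, smoothing=2.0)
print(encoding)
# Categories unseen at training time can fall back to the global mean.
print(encoding.get("Z99", global_mean))
```

Because the encoding uses the target, it should be fitted on training data only (ideally out-of-fold) to avoid leaking label information into the features.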

143
Multi-Selectmedium

Which TWO metrics are appropriate for evaluating a binary classification model trained on imbalanced data? (Select TWO.)

Select 2 answers
A.Log loss
B.F1 score
C.Accuracy
D.Precision-recall curve
E.ROC-AUC
AnswersB, D

F1 balances precision and recall.

Why this answer

The F1 score is appropriate for imbalanced binary classification because it balances precision and recall, making it robust when the positive class is rare. Unlike accuracy, it does not get inflated by a majority negative class, and it directly penalizes models that predict the majority class for all instances.
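To make the contrast with accuracy concrete, here is a sketch using a hypothetical confusion matrix: 1,000 samples with 50 positives, where the model finds only 5 of them and raises 5 false alarms.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts for an imbalanced dataset (5% positive).
metrics = classification_metrics(tp=5, fp=5, fn=45, tn=945)
print(metrics)

# A model that predicts the negative class for every sample.
all_negative = classification_metrics(tp=0, fp=0, fn=50, tn=950)
print(all_negative)
```

Both models reach 95% accuracy, but their F1 scores stay low, which reflects how few positives they actually recover.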

Exam trap

The exam often tests the misconception that ROC-AUC is always the best metric for imbalanced data, but the trap here is that ROC-AUC can be misleadingly high when the positive class is rare, whereas precision-recall curve and F1 score better reflect model performance on the minority class.

144
Multi-Selecthard

Which THREE techniques are effective for reducing overfitting in a deep neural network?

Select 3 answers
A.Increasing model complexity
B.Early stopping
C.Dropout
D.Reducing the amount of training data
E.L2 regularization
AnswersB, C, E

Early stopping, dropout, and L2 regularization each limit how closely the network can fit noise in the training data.

Why this answer

Dropout randomly drops neurons during training, L2 regularization penalizes large weights, and early stopping halts training once validation performance stops improving. Data augmentation is also effective but is not among the options.

The correct answers are therefore B, C, and E. Option A is wrong because increasing model complexity (for example, adding layers) makes overfitting more likely. Option D is wrong because reducing the amount of training data would worsen overfitting.
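Two of these techniques are simple enough to sketch without a deep learning framework. The helper names `early_stopping_epoch` and `l2_penalized_loss`, the `patience` value, and the validation losses below are illustrative assumptions, not taken from any specific library.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch index, stopping once validation loss has
    failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; later epochs are never run
    return best_epoch


def l2_penalized_loss(data_loss, weights, lam):
    """Add an L2 penalty (lam * sum of squared weights) to the data loss."""
    return data_loss + lam * sum(w * w for w in weights)


# Made-up validation curve: it improves, then rises as the model overfits.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.61, 0.65, 0.70, 0.40]
best = early_stopping_epoch(val_losses, patience=3)
penalized = l2_penalized_loss(1.0, [1.0, 2.0], lam=0.1)
```

With `patience=3` the loop stops after three epochs without improvement, so the weights from the best epoch are the ones that would be kept.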

145
MCQeasy

A company wants to deploy a machine learning model that requires very low latency predictions (under 10ms). The model is a small ensemble of decision trees. Which SageMaker deployment option is most suitable?

A.SageMaker Notebook instance
B.AWS Lambda function with the model packaged
C.SageMaker endpoint with a single instance
D.SageMaker Batch Transform
AnswerC

Provides real-time low-latency inference.

Why this answer

C is correct because a SageMaker endpoint with a single instance provides a persistent, real-time inference API that can achieve sub-10ms latency for a small ensemble of decision trees. The endpoint keeps the model loaded in memory and uses synchronous HTTP requests, minimizing cold start and network overhead, which is essential for low-latency predictions.

Exam trap

The trap here is that candidates often confuse batch processing (Batch Transform) with real-time inference, or assume that serverless options like Lambda are always the fastest, ignoring cold start and timeout constraints.

How to eliminate wrong answers

Option A is wrong because a SageMaker Notebook instance is an interactive development environment, not a deployment target; it cannot serve real-time predictions with a stable endpoint. Option B is wrong because AWS Lambda has a maximum execution timeout of 15 minutes and a cold start latency that often exceeds 10ms, especially when loading a model package; it is designed for short, stateless functions, not persistent low-latency inference. Option D is wrong because SageMaker Batch Transform is an asynchronous, batch processing service that processes large datasets offline; it does not provide real-time endpoints and has no latency guarantee under 10ms.

146
MCQeasy

A data scientist is training a binary classifier on an imbalanced dataset (95% negative, 5% positive). The model achieves about 95% accuracy but only correctly identifies 2% of the positive samples. Which metric should the data scientist focus on to improve the model's performance?

A.Precision
B.RMSE
C.Recall
D.Accuracy
AnswerC

Recall measures the proportion of actual positives correctly identified.

Why this answer

Option C is correct because recall measures the proportion of actual positives correctly identified, which is exactly what this model gets wrong. Option A is wrong because precision measures how many predicted positives are correct; it does not directly address the low identification of positives.

Option B is wrong because RMSE is a regression metric. Option D is wrong because accuracy is misleading when classes are imbalanced.
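A quick way to see why recall is the metric to watch is to score a trivial baseline that always predicts the negative class. The labels below are synthetic and the helper name is made up for this sketch.

```python
def recall_and_accuracy(y_true, y_pred):
    """Compute recall on the positive class (label 1) and overall accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return recall, correct / len(y_true)


# Synthetic 95/5 split and a "model" that never predicts the positive class.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

recall, accuracy = recall_and_accuracy(y_true, always_negative)
print(f"recall={recall:.2f} accuracy={accuracy:.2f}")
```

The baseline looks strong on accuracy while finding none of the positives, which is the failure that recall exposes.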

147
MCQmedium

A data scientist is training a binary classification model on a highly imbalanced dataset (0.1% positive class). To improve recall, the team decides to use SageMaker's built-in XGBoost algorithm. Which parameter adjustment is most likely to increase recall without significantly sacrificing precision?

A.Increase max_depth from 5 to 10
B.Reduce num_round from 100 to 50
C.Increase subsample from 0.8 to 1.0
D.Set scale_pos_weight to the ratio of negative to positive samples
AnswerD

scale_pos_weight adjusts class weights to focus on the minority class, improving recall.

Why this answer

Setting scale_pos_weight to the ratio of negative to positive samples (approximately 999:1) tells XGBoost to assign a higher penalty to misclassifications of the minority positive class. This increases the gradient contribution from positive samples during training, which shifts the decision boundary toward predicting positives and improves recall. Precision usually drops somewhat as recall rises, but of the listed options this is the only one that targets the imbalance directly.

Exam trap

The exam often tests the misconception that simply increasing model complexity (max_depth) or data usage (subsample) will fix imbalance, when the correct approach is to use a class-weighting parameter like scale_pos_weight that directly addresses the skewed gradient contributions.

How to eliminate wrong answers

Option A is wrong because increasing max_depth from 5 to 10 makes the model more complex and prone to overfitting, which can actually hurt generalization and may not specifically target recall improvement for the minority class. Option B is wrong because reducing num_round from 100 to 50 decreases the number of boosting iterations, which typically reduces model capacity and can lower recall by underfitting the minority class patterns. Option C is wrong because increasing subsample from 0.8 to 1.0 uses all training data for each tree, which reduces randomness and can increase overfitting without addressing class imbalance; it does not directly influence recall for the positive class.
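The ratio itself is easy to derive from the training labels. The sketch below builds a hyperparameter dictionary whose keys mirror the names used in the question. The helper `xgboost_imbalance_params` and the label counts are hypothetical, and the non-weight values are simply the "before" settings from the options.

```python
def xgboost_imbalance_params(labels):
    """Build XGBoost-style hyperparameters with scale_pos_weight = negatives / positives."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples; the ratio is undefined")
    return {
        "objective": "binary:logistic",
        "scale_pos_weight": n_neg / n_pos,  # upweights the rare positive class
        "max_depth": 5,
        "num_round": 100,
        "subsample": 0.8,
    }


# Synthetic labels with a 0.1% positive rate: 10 positives in 10,000 samples.
labels = [1] * 10 + [0] * 9990
params = xgboost_imbalance_params(labels)
print(params["scale_pos_weight"])
```

Computing the ratio from the training split, rather than hard-coding it, keeps the weight correct if the data is resampled.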

148
MCQeasy

A data scientist is building a model to predict customer churn. The dataset includes both numerical features (e.g., account age, usage minutes) and categorical features (e.g., region, plan type). The data scientist wants to use a linear classifier. Which feature engineering step is required before training?

A.Normalize numerical features
B.Impute missing values
C.Remove outliers
D.One-hot encode categorical features
AnswerD

Linear models require numerical input; one-hot encoding converts categories to binary vectors.

Why this answer

Linear classifiers (e.g., logistic regression, linear SVM) require numerical input and cannot directly process categorical text labels. One-hot encoding converts each categorical feature into binary indicator columns, allowing the linear model to learn separate weights for each category. Without this step, the model would either fail to train or treat categorical strings as ordinal values, which is mathematically invalid for linear decision boundaries.

Exam trap

The trap here is that candidates may assume normalization (A) is the most critical step for linear models, overlooking that categorical features must be converted to numerical form before any linear classifier can process them.

How to eliminate wrong answers

Option A is wrong because normalizing numerical features is beneficial for convergence speed and weight interpretation but is not strictly required before training a linear classifier; many implementations handle unscaled data. Option B is wrong because imputing missing values is a data cleaning step that may be necessary but is not specific to the requirement of using a linear classifier with categorical features. Option C is wrong because removing outliers is a data preprocessing technique that can improve model robustness but is not a mandatory step for linear classifiers to function with categorical data.
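One-hot encoding can be sketched in a few lines of plain Python. The function name and the plan-type values are invented for this example; a real pipeline would more likely use a library encoder fitted on the training data.

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary indicator row per value."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one indicator is set per row
        rows.append(row)
    return categories, rows


# Made-up categorical feature from a churn dataset.
plan_types = ["basic", "premium", "basic", "family"]
columns, encoded = one_hot_encode(plan_types)

for plan, row in zip(plan_types, encoded):
    print(plan, row)
```

Each category gets its own column, so a linear model can learn a separate weight per category without any implied ordering.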

149
MCQhard

A bank is building a credit risk model using a large dataset with 500 features and 2 million samples. The dataset contains many categorical features with high cardinality (e.g., zip code, occupation). The model must be deployed on SageMaker and provide real-time predictions with low latency. They also need to explain individual predictions for regulatory compliance. Which approach is most appropriate?

A.Use a linear model with target encoding for categorical features and deploy with SageMaker's built-in linear learner algorithm
B.Use a deep neural network with embedding layers for categorical features and use SageMaker's built-in Debugger for explanations
C.Use XGBoost with one-hot encoding for categorical features and deploy with SageMaker's built-in SHAP explainer
D.Use a gradient boosting model with ordinal encoding for categorical features and use SageMaker's built-in XGBoost with SHAP
AnswerD

Ordinal encoding handles high cardinality without explosion; XGBoost captures interactions; SHAP provides explanations.

Why this answer

XGBoost with ordinal encoding and SHAP balances performance, latency, and explainability.

150
MCQeasy

A data scientist is using Amazon SageMaker to train a model. The training job is taking longer than expected. The scientist wants to reduce training time without changing the algorithm or the hardware. Which action is most likely to help?

A.Increase the batch size used during training.
B.Add regularization to the loss function.
C.Use data augmentation to increase the dataset size.
D.Reduce the number of training epochs.
AnswerA

Increasing batch size reduces the number of iterations per epoch, speeding up training. It may require tuning the learning rate, but it is a common technique to reduce training time.

Why this answer

Increasing the batch size processes more samples per step, so each epoch needs fewer iterations and the existing hardware is used more efficiently. The learning rate may need retuning to keep convergence stable. Option B is wrong because adding regularization changes the loss but does not reduce training time. Option C is wrong because data augmentation enlarges the dataset, which lengthens training.

Option D also shortens training, since fewer epochs means fewer passes over the data, but it does so by cutting training short and risks an underfit model. That is why increasing the batch size is the preferred answer here.
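The effect of batch size on the number of iterations per epoch is simple arithmetic. The dataset size and batch sizes below are assumed values for illustration only.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of gradient updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


n_samples = 1_000_000  # assumed dataset size
small = steps_per_epoch(n_samples, 32)
large = steps_per_epoch(n_samples, 256)

print(f"batch 32: {small} steps, batch 256: {large} steps")
```

Fewer steps only translates into less wall-clock time if the hardware can process the larger batch in roughly the same time per step, which is why this is a likely rather than a guaranteed speedup.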
