Knowledge + Practice

CCNA Modeling Questions

75 of 624 questions · Page 3/9 · Modeling · Answers revealed

151

MCQhard

A machine learning engineer is using Amazon SageMaker to train a deep learning model. The training job is taking longer than expected. The engineer notices that the GPU utilization is low (around 30%) while CPU utilization is high. Which action is most likely to improve training speed?

A.Increase the number of data loading workers

B.Use a smaller instance type with fewer GPUs

C.Decrease the number of data loading workers

D.Increase the batch size

AnswerA

More workers can parallelize data loading and reduce I/O bottleneck, improving GPU utilization.

Why this answer

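Low GPU utilization with high CPU utilization suggests a data loading bottleneck: the CPU cannot prepare batches fast enough to keep the GPU busy. Increasing the number of data loading workers parallelizes that work and keeps the GPU fed. Decreasing the number of workers would make the bottleneck worse, and a smaller instance or a larger batch size would not address it.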

Using Pipe mode (streaming) might help but not as directly as increasing workers.

152

MCQhard

A data scientist trains a gradient boosting model on a large dataset using SageMaker. The training completes successfully, but when deploying the model to a real-time endpoint, inference latency is too high. Which change is MOST likely to reduce latency without significant accuracy loss?

A.Use a larger instance type for the endpoint

B.Prune the trees by removing nodes with low importance

C.Increase the number of trees in the ensemble

D.Use SageMaker Batch Transform instead of real-time

AnswerB

Pruning reduces model size and inference time.

Why this answer

Pruning trees by removing nodes with low importance reduces the model's complexity, which directly decreases inference latency because fewer decision paths need to be evaluated. In gradient boosting, this can be done with minimal accuracy loss if the removed nodes correspond to splits that contribute little to the overall prediction, as measured by feature importance or gain.

Exam trap

The trap here is that candidates often confuse scaling the endpoint (Option A) as the primary fix for latency, when the real issue is model complexity that can be reduced through pruning without significant accuracy loss.

How to eliminate wrong answers

Option A is wrong because using a larger instance type may reduce latency through more CPU/memory, but it does not address the root cause of high latency from model complexity and increases cost; it is a scaling workaround, not a model optimization. Option C is wrong because increasing the number of trees in the ensemble would increase model size and inference computation, making latency worse, not better. Option D is wrong because SageMaker Batch Transform is designed for offline, asynchronous inference on large datasets and does not provide real-time endpoints; switching to batch transform would not meet the requirement for a real-time endpoint and introduces significant latency for individual predictions.

153

MCQhard

A data scientist notices that a linear regression model trained on a dataset has high variance. The model performs well on the training data but poorly on the test data. Which action is most likely to reduce the variance?

A.Decrease the amount of training data

B.Apply L2 regularization to the model

C.Increase the number of gradient descent iterations

D.Add more features to the model

AnswerB

L2 regularization shrinks coefficients and reduces model complexity, thereby reducing variance.

Why this answer

High variance indicates the model is overfitting to the training data. L2 regularization (ridge regression) adds a penalty proportional to the square of the magnitude of the coefficients, which shrinks them toward zero. This reduces the model's sensitivity to noise in the training data, thereby lowering variance and improving generalization to the test set.

Exam trap

AWS often tests the bias-variance tradeoff by making candidates confuse regularization with optimization steps or feature engineering, so the trap here is assuming that more training data or more iterations always improve model performance without considering their effect on variance.

How to eliminate wrong answers

Option A is wrong because decreasing the amount of training data typically increases variance, as the model has fewer examples to learn from and is more likely to overfit. Option C is wrong because increasing gradient descent iterations does not reduce variance; it only ensures the optimization converges to a minimum, which may even worsen overfitting if the model is already complex. Option D is wrong because adding more features increases model complexity, which generally raises variance and exacerbates overfitting, not reduces it.
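
The shrinkage effect of L2 regularization can be seen in a minimal sketch. It assumes a one-feature model with no intercept, made-up data, and a hypothetical helper `fit_slope`; it is only meant to show that a larger penalty pulls the coefficient toward zero.

```python
def fit_slope(xs, ys, l2=0.0):
    """Least-squares slope for y ~ w * x (no intercept) with an L2 penalty.

    Minimizing sum((y - w*x)^2) + l2 * w^2 gives
    w = sum(x*y) / (sum(x*x) + l2).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.8]

w_plain = fit_slope(xs, ys)            # no regularization
w_ridge = fit_slope(xs, ys, l2=10.0)   # penalty shrinks the slope
print(w_plain, w_ridge)
```

With the penalty applied, the fitted slope is smaller in magnitude, which is the mechanism that makes the model less sensitive to noise in the training data.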

154

MCQeasy

A data scientist is training a linear regression model and notices high bias in the training set. What action is most likely to reduce bias?

A.Apply L1 regularization.

B.Increase the learning rate.

C.Increase the amount of training data.

D.Add more relevant features to the model.

AnswerD

Adding features increases model capacity, which can reduce high bias.

Why this answer

High bias indicates that the model is underfitting the training data, meaning it is too simple to capture the underlying patterns. Adding more relevant features increases the model's capacity to learn complex relationships, directly reducing bias. This is a standard approach in linear regression to address underfitting.

Exam trap

The trap here is that candidates confuse high bias with high variance and incorrectly choose increasing training data (Option C) or regularization (Option A), which are solutions for overfitting, not underfitting.

How to eliminate wrong answers

Option A is wrong because L1 regularization (Lasso) reduces overfitting by shrinking coefficients to zero, which increases bias rather than reducing it. Option B is wrong because increasing the learning rate affects the convergence speed of gradient descent, not the model's bias; it may cause divergence or oscillation. Option C is wrong because increasing the amount of training data helps reduce variance (overfitting) but does not address high bias; with high bias, the model is already too simple to fit the data well.

155

Matchingmedium

Match each AWS AI service to its capability.

Concepts

Matches

Natural language processing

Language translation

Text-to-speech

Speech-to-text

Conversational chatbots

Why these pairings

These are AWS AI services for various NLP tasks.

156

MCQmedium

Refer to the exhibit. An IAM policy is attached to a SageMaker notebook instance role. When the data scientist tries to run a training job that writes model artifacts to 's3://my-bucket/models/', the job fails with an access denied error. What is the MOST likely cause?

A.The IAM role does not have a trust policy

B.Missing s3:PutObject permission for the output S3 bucket

C.The policy does not include any S3 actions

D.The sagemaker:CreateTrainingJob action is not allowed on the specific resource

AnswerB

Write access is needed for model artifacts.

Why this answer

The error occurs because the IAM policy attached to the SageMaker notebook instance role does not grant the s3:PutObject permission on the 's3://my-bucket/models/' path. SageMaker training jobs require this permission to write model artifacts to the specified S3 output bucket. Without it, the API call to upload the model fails with an access denied error, even if other S3 actions are allowed.

Exam trap

The trap here is that candidates often assume the error is due to a missing trust policy or a missing sagemaker:CreateTrainingJob permission, but the actual failure is at the S3 write step, which requires explicit s3:PutObject on the output bucket.

How to eliminate wrong answers

Option A is wrong because a missing trust policy would prevent SageMaker from assuming the role at all, so the failure would occur before any S3 write was attempted rather than at the artifact upload. Option C is wrong because the statement says the policy is attached to the role, and while it may include S3 actions, the specific s3:PutObject action is missing for the output bucket; the problem is not the absence of all S3 actions but the missing write permission. Option D is wrong because the sagemaker:CreateTrainingJob action is allowed on the SageMaker resource (the notebook role has permissions to create training jobs), but the failure occurs at the S3 write step, not at the training job creation step.
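
As a rough illustration of the missing permission, the statement below grants write access to the output prefix. The exhibit is not reproduced here, so this is an assumed policy fragment built from the bucket and prefix named in the question, not the policy from the exhibit.

```python
import json

# Assumed policy fragment: only the bucket name and prefix come from the
# question text; everything else is a generic IAM policy skeleton.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject"],
            "Resource": "arn:aws:s3:::my-bucket/models/*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Scoping the resource to the `models/` prefix keeps the grant narrow: the role can write artifacts there without gaining write access to the rest of the bucket.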

157

Multi-Selectmedium

A company is using Amazon SageMaker to train an XGBoost model. The training data contains missing values. Which TWO methods can XGBoost handle missing values internally?

Select 2 answers

A.Use surrogate splits to handle missing values.

B.Drop rows with missing values.

C.Learn the best direction to go when a value is missing.

D.Treat missing values as a separate category.

E.Impute missing values with the mean of the feature.

AnswersC, D

XGBoost uses a sparsity-aware algorithm that learns the optimal split direction for missing values.

Why this answer

XGBoost handles missing values with its sparsity-aware split finding: at each split it learns a default direction for rows where the value is missing (Option C), so you do not need to impute or drop rows beforehand. In effect, missing values are treated as their own group at each split (Option D).

Option A is wrong because surrogate splits are not XGBoost's default behavior. Option B (dropping rows) and Option E (imputing with the mean) are preprocessing steps performed outside the algorithm, not internal handling. So C and D.

158

Multi-Selecthard

A company uses SageMaker to train a model. The training job fails with 'ResourceLimitExceeded' error. Which TWO actions should the company take to resolve this?

Select 2 answers

A.Launch the training job in a different AWS region.

B.Use a different instance type that is not at its limit.

C.Use SageMaker Managed Spot Training to reduce cost.

D.Compress the training data to reduce storage requirements.

E.Request a service limit increase for SageMaker training job resources.

AnswersB, E

Different instance types may have separate limits.

Why this answer

Option B is correct because the 'ResourceLimitExceeded' error indicates that the requested instance type has reached its concurrent usage limit in the current AWS region. Switching to a different instance type that is not at its limit allows the training job to proceed without exceeding the service quota. Option E is correct because requesting a service limit increase for SageMaker training job resources directly raises the cap on the number of concurrent instances or total instance count, resolving the underlying quota issue.

Exam trap

The trap here is that candidates confuse 'ResourceLimitExceeded' with cost or storage issues, leading them to select Managed Spot Training or data compression, which do not address the underlying AWS service quota limit.

159

MCQeasy

A data scientist is using Amazon SageMaker to train a deep learning model with a large dataset. The training job fails with a 'CUDA out of memory' error. What is the MOST efficient way to resolve this issue?

A.Switch to a CPU-only instance

B.Use a larger instance type with more GPUs

C.Increase the batch size

D.Reduce the batch size

AnswerD

Smaller batch size reduces memory consumption per GPU.

Why this answer

The 'CUDA out of memory' error occurs when the GPU's memory is insufficient to hold the model parameters, gradients, optimizer states, and the current batch of data. Reducing the batch size decreases the memory footprint per training step, allowing the model to fit within the available GPU memory without requiring a more expensive instance or sacrificing GPU acceleration.

Exam trap

AWS often tests the misconception that 'more resources' (larger instance or more GPUs) is always the best fix, when in fact adjusting hyperparameters like batch size is the most efficient and cost-effective first step.

How to eliminate wrong answers

Option A is wrong because switching to a CPU-only instance would eliminate GPU acceleration entirely, drastically slowing training for deep learning workloads, and does not address the root cause of memory pressure. Option B is wrong because using a larger instance with more GPUs is an expensive overprovisioning solution that does not optimize resource usage; it may also introduce additional complexity with multi-GPU data parallelism. Option C is wrong because increasing the batch size would increase GPU memory consumption, exacerbating the out-of-memory error rather than resolving it.
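
The arithmetic behind "reduce the batch size" can be sketched as follows. This assumes a simplified linear memory model (a fixed cost for weights, gradients, and optimizer state plus a per-sample activation cost); the helper name and all numbers are hypothetical, and real GPU memory use is not exactly linear.

```python
def largest_batch_that_fits(gpu_mem_gb, fixed_gb, per_sample_gb, start_batch):
    """Halve the batch size until the estimated footprint fits in GPU memory.

    Assumes memory = fixed_gb + per_sample_gb * batch, which is only an
    approximation of real training memory use.
    """
    batch = start_batch
    while batch >= 1:
        if fixed_gb + per_sample_gb * batch <= gpu_mem_gb:
            return batch
        batch //= 2
    return 0  # even a single sample does not fit


# Illustrative numbers only: 16 GB GPU, 6 GB fixed cost, 0.05 GB per sample.
batch = largest_batch_that_fits(16.0, 6.0, 0.05, start_batch=512)
print(batch)
```

A return value of 0 signals the case where the fixed cost alone exceeds GPU memory, which is when a larger instance or a smaller model becomes the relevant fix.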

160

Multi-Selectmedium

A data scientist is building a regression model to predict house prices. The dataset contains 10 features, including 'number_of_bedrooms' and 'square_footage'. The scientist observes that the model has high variance. Which TWO actions are most appropriate to reduce overfitting? (Choose TWO.)

Select 2 answers

A.Reduce model complexity by using a simpler model

B.Add L2 regularization to the model

C.Increase the number of training epochs

D.Decrease the amount of training data

E.Add more polynomial features

AnswersA, B

Simpler models have lower variance.

Why this answer

Options A and B are correct. Adding L2 regularization penalizes large weights, reducing variance. Reducing model complexity (e.g., using a simpler model) also reduces overfitting.

Option C is wrong because increasing training epochs may lead to more overfitting. Option D is wrong because decreasing training data increases variance. Option E is wrong because adding more polynomial features increases complexity.

161

MCQmedium

A data science team is training a binary classification model using Amazon SageMaker. The dataset is highly imbalanced (95% negative class, 5% positive class). The team wants to maximize the F1 score. Which built-in SageMaker algorithm is most appropriate?

A.Linear Learner

B.XGBoost

C.PCA

D.K-Means

AnswerB

XGBoost has a scale_pos_weight parameter to handle imbalance, which helps improve F1 on the minority class.

Why this answer

XGBoost supports scale_pos_weight to handle class imbalance by weighting the positive class more heavily, which tends to improve F1 on the minority class. Linear Learner with balanced class weights can also help but typically optimizes log loss. K-Means is unsupervised.

PCA is for dimensionality reduction.
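
A common starting point for `scale_pos_weight` is the ratio of negative to positive examples. The sketch below applies that heuristic to the 95/5 split from the question; the helper function is hypothetical and the resulting value would normally still be tuned.

```python
def scale_pos_weight(num_negative, num_positive):
    """Common heuristic for XGBoost's scale_pos_weight: negatives / positives."""
    return num_negative / num_positive


# 95% negative, 5% positive, as in the question (counts per 100 rows).
weight = scale_pos_weight(95, 5)

# Hyperparameters as they might be passed to an XGBoost training job.
hyperparameters = {
    "objective": "binary:logistic",
    "scale_pos_weight": weight,
}
print(hyperparameters)
```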

162

Multi-Selectmedium

A data scientist is building a binary classifier using logistic regression. The dataset has 10 features and 100,000 observations. The model achieves 99% accuracy on the test set, but the precision is 50% and recall is 90%. Which TWO actions should the data scientist take to improve model performance? (Choose 2.)

Select 2 answers

A.Increase the regularization strength (C) in logistic regression.

B.Adjust the decision threshold to increase precision at the cost of recall.

C.Use a random forest classifier instead of logistic regression.

D.Collect more training data.

E.Remove features that have low correlation with the target.

AnswersB, C

Lowering threshold increases recall; raising threshold increases precision.

Why this answer

Precision is low while recall is high, so the model is producing many false positives. To improve precision, the data scientist can raise the decision threshold (Option B) or use a different algorithm such as a random forest (Option C). Option A (increasing regularization) may help but is not direct.

Option D (collecting more data) may not help.

Option E (removing features) might hurt.
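
The threshold trade-off can be checked on a toy example. The labels and scores below are made up, and `precision_recall` is a hypothetical helper; the point is only that raising the threshold removes false positives at the cost of some true positives.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall for the positive class at a given threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and model scores for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.55, 0.7, 0.6, 0.4, 0.3, 0.2]

low = precision_recall(y_true, scores, 0.5)    # more positives predicted
high = precision_recall(y_true, scores, 0.75)  # fewer, more confident positives
print(low, high)
```

On this toy data the higher threshold yields higher precision and lower recall, matching the direction described in the answer.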

163

MCQmedium

A data scientist is using Amazon SageMaker to perform hyperparameter tuning for a neural network. The tuning job uses the 'Random' search strategy. After 10 training jobs, the best objective metric has plateaued. The scientist wants to improve the results without increasing the total number of training jobs. Which approach should they take?

A.Use a different objective metric that is easier to optimize

B.Normalize the input features to have zero mean and unit variance

C.Increase the maximum number of training jobs

D.Switch the hyperparameter tuning strategy to 'Bayesian'

AnswerD

Bayesian optimization uses past trials to inform future hyperparameter choices, often converging faster.

Why this answer

Switching to Bayesian search (e.g., 'Bayesian' strategy) is more efficient because it uses past results to choose the next hyperparameters, potentially finding better values in fewer jobs. Increasing the number of jobs would increase cost. Random search might get lucky but is less efficient.

Changing the objective metric or scaling features would not directly improve the tuning process.
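
For orientation, the strategy is a single field in the tuning job configuration. The fragment below sketches part of a CreateHyperParameterTuningJob request as a Python dict; the metric name and limits are illustrative assumptions, and no API call is made.

```python
# Fragment of a hyperparameter tuning job configuration.
# The metric name and resource limits are illustrative assumptions.
tuning_job_config = {
    "Strategy": "Bayesian",  # previously "Random"
    "HyperParameterTuningJobObjective": {
        "Type": "Minimize",
        "MetricName": "validation:loss",  # hypothetical metric name
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 10,   # total job budget stays the same
        "MaxParallelTrainingJobs": 2,    # low parallelism lets later jobs learn from earlier ones
    },
}
print(tuning_job_config["Strategy"])
```

Keeping parallelism low is a design choice worth noting: Bayesian search can only use results from jobs that have already finished, so running everything at once would make it behave much like random search.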

164

Multi-Selecthard

A company is using SageMaker to train a TensorFlow model for image classification. The training is slow on a single GPU instance. Which TWO strategies can reduce training time? (Choose TWO.)

Select 2 answers

A.Increase the image size

B.Use SageMaker Pipe Mode for data ingestion

C.Increase the number of training epochs

D.Use distributed training with multiple GPUs

E.Decrease the batch size

AnswersB, D

Pipe Mode streams data directly to the training container, reducing I/O time.

Why this answer

Options B and D are correct. Using SageMaker Pipe Mode streams data from S3, reducing download time. Using multiple GPUs (distributed training) parallelizes computation.

Option A (larger images) increases computation. Option C (more epochs) increases training time. Option E (batch size decrease) may slow training.

165

MCQhard

A machine learning team trains a deep learning model on SageMaker. The training job uses a single ml.p3.2xlarge instance and takes 12 hours. The team needs to reduce training time without changing the algorithm. Which approach is most effective?

A.Increase the learning rate

B.Switch to a larger instance type, such as ml.p3.16xlarge

C.Use managed Spot Training

D.Use SageMaker's distributed data parallelism across multiple instances

AnswerD

Distributed data parallelism scales training across GPUs, reducing wall-clock time.

Why this answer

Using SageMaker's distributed data parallelism (e.g., with SageMaker distributed training libraries) across multiple GPUs can significantly reduce training time by splitting the mini-batches across GPUs. Switching to a single larger instance (e.g., ml.p3.16xlarge) helps but caps the speedup at that one instance's GPUs, whereas distributing across multiple instances can scale further. Increasing the learning rate does not reliably reduce training time and can destabilize training.

Managed Spot Training reduces cost rather than training time, and Spot instances may be interrupted.

166

MCQhard

A data scientist ran an XGBoost training job in SageMaker and it failed with the error shown in the exhibit. Which hyperparameter change is most likely to resolve the numeric overflow?

A.Reduce max_depth to a lower value

B.Increase subsample

C.Increase eta (learning rate)

D.Increase num_round

AnswerA

Numeric overflow often occurs when trees are too deep, leading to large leaf weights. Reducing depth prevents this.

Why this answer

Option A is correct because reducing max_depth prevents trees from growing too deep, which can cause numeric overflow. Option B is wrong because subsample doesn't affect depth. Option C is wrong because increasing eta helps convergence but not overflow.

Option D is wrong because increasing num_round may worsen overflow.

167

Multi-Selectmedium

Which THREE of the following are best practices for training deep learning models on Amazon SageMaker?

Select 3 answers

A.Disable automatic scaling to avoid interruptions

B.Use SageMaker Debugger to profile system bottlenecks

C.Use Pipe mode for training data stored in S3 to reduce startup time

D.Always use the largest instance type available for faster training

E.Use managed spot training to reduce cost

AnswersB, C, E

Debugger provides insights into GPU utilization and I/O bottlenecks.

Why this answer

SageMaker Debugger is a best practice because it provides real-time profiling of system bottlenecks such as CPU/GPU utilization, memory I/O, and network throughput during training. This allows you to identify and resolve performance issues early, optimizing training efficiency and cost. It integrates directly with SageMaker's training jobs without requiring code changes.

Exam trap

The trap here is that candidates may confuse 'avoiding interruptions' with disabling automatic scaling, when in fact automatic scaling is designed to prevent interruptions by dynamically adjusting capacity, and disabling it increases the risk of failures.

168

MCQmedium

A team is using SageMaker to train a model with hyperparameter tuning. The training jobs are taking too long. The team wants to reduce time without sacrificing model quality. Which approach should they take?

A.Use random search instead of Bayesian optimization.

B.Enable early stopping in the hyperparameter tuning job.

C.Increase the maximum number of training jobs.

D.Reduce the maximum runtime per training job.

AnswerB

Early stopping ends poorly performing training jobs before they finish, saving time.

Why this answer

Option B is correct because early stopping terminates poorly performing jobs early, saving time. Option C is wrong because increasing max jobs would increase time. Option D is wrong because reducing max runtime may not allow convergence.

Option A is wrong because random search is faster but may miss optimal hyperparameters; early stopping, by contrast, directly reduces time on bad trials.

169

MCQhard

A machine learning engineer is training a model using the SageMaker built-in XGBoost algorithm. The training job is taking longer than expected. The engineer notices that the training data is stored in S3 in CSV format and is 500 GB in size. The instance type is ml.c4.8xlarge with 10 instances. Which change would most likely reduce training time?

A.Convert the data to Parquet format.

B.Increase the number of instances to 20.

C.Use Pipe input mode instead of File input mode.

D.Increase the size of the EBS volume attached to each instance.

AnswerC

Pipe mode streams data directly, reducing I/O bottleneck.

Why this answer

Pipe input mode streams data directly from S3 to the training instances without first downloading it to the local EBS volume, eliminating the I/O bottleneck of reading a 500 GB CSV file. This reduces the time spent on data loading and allows the XGBoost algorithm to begin training sooner, which is especially beneficial for large datasets.

Exam trap

AWS often tests the distinction between data format optimization (Parquet) and data ingestion mode (Pipe vs. File), where candidates mistakenly choose a format change without recognizing that the primary bottleneck is the data transfer mechanism, not the storage format.

How to eliminate wrong answers

Option A is wrong because converting to Parquet format would reduce storage size and improve read efficiency, but the primary bottleneck here is the data transfer time from S3 to the instances, not the format overhead; Pipe mode addresses the transfer bottleneck more directly. Option B is wrong because increasing the number of instances to 20 would add more parallelism but also increase the overhead of data distribution and coordination, and the training job is already bottlenecked by data ingestion, not compute capacity. Option D is wrong because increasing the EBS volume size does not improve I/O throughput for reading from S3; the data must still be downloaded from S3 to the EBS volume, so the bottleneck remains.

170

MCQmedium

A data scientist builds a Random Forest model using SageMaker. The model performs well on training data but poorly on test data. Which step is most likely to reduce overfitting?

A.Reduce the maximum depth of each tree

B.Increase the number of trees

C.Switch to a linear model

D.Increase the number of features considered at each split

AnswerA

Shallower trees reduce model complexity and help prevent overfitting.

Why this answer

Reducing the maximum depth of each tree limits the complexity of individual decision trees, preventing them from memorizing noise and specific patterns in the training data. This directly addresses overfitting by enforcing simpler, more generalized splits, which improves performance on unseen test data.

Exam trap

The trap here is that candidates often assume adding more trees (Option B) always improves generalization, but they miss that overfitting in Random Forest is primarily caused by individual trees being too deep, not by the ensemble size.

How to eliminate wrong answers

Option B is wrong because increasing the number of trees in a Random Forest does not reduce overfitting; it typically reduces variance and improves generalization, but if trees are already deep and overfit, more trees will still produce overfit predictions. Option C is wrong because switching to a linear model is an extreme and unnecessary step; Random Forest can be regularized effectively by tuning hyperparameters like max_depth, and a linear model may underfit if the data has non-linear relationships. Option D is wrong because increasing the number of features considered at each split increases tree diversity but also allows each tree to potentially overfit to more features, especially if the features are noisy or irrelevant, thus not reducing overfitting.

171

MCQmedium

A team is training a large language model using SageMaker's distributed training. They notice that the training loss is not decreasing after the first few epochs. Which action is MOST likely to resolve this issue?

A.Increase the batch size

B.Add L2 regularization

C.Reduce the learning rate

D.Switch from Adam to SGD optimizer

AnswerC

A high learning rate can cause the loss to stall; reducing it allows finer updates.

Why this answer

A learning rate that is too high can cause the loss to plateau or diverge. Reducing the learning rate often helps. Increasing batch size may stabilize training but not directly address plateau.

Switching optimizer or adding regularization may help but are less direct.

172

MCQmedium

A team trained a multiclass classification model using SageMaker built-in XGBoost. The model's accuracy is high, but for a specific class, recall is very low. The team wants to improve recall for that class without significant accuracy drop. Which approach is MOST effective?

A.Add more training data from all classes

B.Resample the training data to balance the class representation

C.Increase the max_depth hyperparameter of XGBoost

D.Switch from XGBoost to a linear learner

AnswerB

Resampling addresses class imbalance, improving recall for minority class.

Why this answer

Option B is correct because resampling the training data to balance class representation directly addresses the root cause of low recall for a specific class in a multiclass XGBoost model. XGBoost's built-in objective functions (e.g., 'multi:softmax') optimize for overall accuracy, which can bias the model toward majority classes; resampling (e.g., oversampling the minority class or undersampling the majority) forces the model to learn decision boundaries that better capture the minority class, improving recall without drastically reducing overall accuracy.

Exam trap

The trap here is that candidates often assume increasing model complexity (max_depth) or switching algorithms will fix class imbalance, when in fact the most effective and direct approach is to rebalance the training data through resampling.

How to eliminate wrong answers

Option A is wrong because adding more data from all classes does not specifically target the underrepresented class; it may even worsen the imbalance if the new data is also skewed, and it does not guarantee improved recall for the minority class. Option C is wrong because increasing max_depth can lead to overfitting, which might temporarily boost recall on training data but often degrades generalization and overall accuracy, and it does not systematically address class imbalance. Option D is wrong because switching from XGBoost to a linear learner (e.g., LinearLearner in SageMaker) assumes linear separability, which is rarely true for complex multiclass problems; linear models typically have lower capacity to model minority class patterns and often yield worse recall than tree-based methods like XGBoost.
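
One simple form of resampling is random oversampling of the smaller classes. The sketch below is a minimal, standard-library version with a hypothetical `oversample` helper and made-up data; in practice it would be applied to the training split only, never to validation or test data.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Made-up imbalanced dataset: 6 / 3 / 1 rows per class.
rows = [[i] for i in range(10)]
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1

new_rows, new_labels = oversample(rows, labels)
counts = Counter(new_labels)
print(counts)
```

Because the duplicated rows carry no new information, oversampling can encourage overfitting to the minority examples, so it is usually evaluated against a held-out set that keeps the original class distribution.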

173

MCQeasy

A data scientist is training a model and wants to monitor training progress. Which AWS service can be used to track metrics like loss and accuracy in real time?

A.Amazon SageMaker Ground Truth

B.Amazon SageMaker Automatic Model Tuning

C.AWS Glue

D.Amazon CloudWatch

AnswerD

CloudWatch can monitor custom metrics.

Why this answer

Amazon CloudWatch is the correct service because it provides near real-time monitoring of metrics such as loss and accuracy during model training. When using SageMaker, training job logs are sent to CloudWatch Logs, and metrics parsed from those logs are published to CloudWatch, allowing you to view them and set alarms on metric thresholds.

Exam trap

The trap here is that candidates may confuse Amazon SageMaker Automatic Model Tuning (which orchestrates hyperparameter searches) with a monitoring service, but it does not provide real-time metric tracking itself—only CloudWatch does.

How to eliminate wrong answers

Option A is wrong because Amazon SageMaker Ground Truth is a data labeling service, not a monitoring tool for training metrics. Option B is wrong because Amazon SageMaker Automatic Model Tuning (hyperparameter tuning) launches training jobs with different hyperparameters but does not itself track real-time metrics; it relies on CloudWatch for that. Option C is wrong because AWS Glue is a serverless data integration and ETL service, not designed for real-time metric tracking during model training.
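
For custom training code, metrics are typically described as name-and-regex pairs that are matched against the job's log output. The sketch below mimics that matching locally; the metric names, patterns, and log line are illustrative assumptions, and `extract_metrics` is a hypothetical helper rather than a SageMaker function.

```python
import re

# Shape of name/regex metric definitions; names and patterns are illustrative.
metric_definitions = [
    {"Name": "train:loss", "Regex": r"loss=([0-9.]+)"},
    {"Name": "train:accuracy", "Regex": r"accuracy=([0-9.]+)"},
]


def extract_metrics(log_line):
    """Apply each regex to one log line and collect the captured values."""
    found = {}
    for definition in metric_definitions:
        match = re.search(definition["Regex"], log_line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


# Hypothetical line printed by a training script.
metrics = extract_metrics("epoch=3 loss=0.412 accuracy=0.871")
print(metrics)
```

Testing the patterns locally like this is a cheap way to confirm that the training script's log format and the metric regexes agree before launching a job.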

174

MCQhard

A data scientist is tuning a gradient boosting model using SageMaker automatic model tuning. The hyperparameter 'num_round' ranges from 50 to 500. The tuning job uses 'ObjectiveMetric' = 'validation:auc'. After 50 training jobs, the best objective value is 0.95. The data scientist suspects overfitting. What should the data scientist do?

A.Increase 'max_depth' to capture more complex patterns.

B.Add an early stopping round and increase the range for regularization hyperparameters like 'gamma' and 'lambda'.

C.Increase 'num_round' to 1000 and keep other hyperparameters unchanged.

D.Decrease the range of 'num_round' to 10-100.

AnswerB

Early stopping prevents overfitting; regularization penalizes complexity.

Why this answer

Adding early stopping and regularization (like gamma or lambda) helps reduce overfitting. Lowering the learning rate with more rounds can also help. Option A (increasing max_depth) worsens overfitting.

Option C (increasing num_round with no regularization) may overfit more. Option D (decreasing the num_round range) might underfit.

175

Multi-Selecteasy

Which TWO techniques are used for feature scaling? (Choose 2.)

Select 2 answers

A.One-hot encoding

B.Standardization (Z-score normalization)

C.Min-Max scaling

D.Principal Component Analysis (PCA)

E.Label encoding

AnswersB, C

Standardization scales features to have mean 0 and variance 1.

Why this answer

Standardization (Z-score normalization) is a feature scaling technique that transforms data to have a mean of 0 and a standard deviation of 1, using the formula z = (x - μ) / σ. Min-Max scaling instead rescales each feature linearly to a fixed range, usually 0 to 1. Scaling matters for algorithms like SVM, k-means, and PCA, which are sensitive to feature magnitudes.

Exam trap

The exam often tests the distinction between feature scaling techniques (which transform numerical feature values) and encoding or dimensionality reduction techniques, leading candidates to mistakenly select one-hot encoding or PCA as scaling methods.

176

MCQeasy

A company is using Amazon SageMaker to deploy a model for real-time inference. The model receives requests with varying payload sizes. The company observes occasional latency spikes. Which feature can help mitigate this?

A.Multi-model endpoints

B.Amazon Elastic Inference

C.Automatic scaling

D.Amazon SageMaker Inference Recommender

AnswerD

Inference Recommender runs benchmarks to recommend optimal instance and endpoint configuration.

Why this answer

SageMaker Inference Recommender provides load testing and recommendations for instance type and endpoint configuration. It can help identify optimal settings to reduce latency spikes. Multi-model endpoints are for hosting multiple models, not directly for latency spikes.

Elastic Inference is for accelerating deep learning inference, not general latency. Automatic scaling adjusts capacity but not per-request latency.

177

MCQmedium

A data scientist is training a regression model. The training loss is decreasing but the validation loss starts to increase after a few epochs. Which technique should the scientist use to address this issue?

A.Decrease the batch size.

B.Implement early stopping based on validation loss.

C.Add more layers to the model.

D.Increase the learning rate.

AnswerB

Stops training before overfitting.

Why this answer

Option B is correct because early stopping halts training when validation loss stops improving, preventing overfitting. Option C is wrong because adding more layers may worsen overfitting. Option D is wrong because increasing the learning rate can cause divergence.

Option A is wrong because decreasing the batch size may increase noise but does not directly address overfitting.
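
The early stopping rule can be sketched independently of any framework. The function below is a hypothetical helper, and the `patience` parameter and the loss values are assumptions chosen for illustration; real frameworks expose their own early stopping callbacks.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Scanning stops once validation loss has failed to improve for
    `patience` consecutive epochs.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stalled; stop training
    return best_epoch

# Made-up curve: validation loss falls, then rises as the model overfits.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.71]
best = early_stopping_epoch(val_losses)
print(best)
```

With this curve the loop stops after two non-improving epochs and keeps the weights from epoch index 2, the point of lowest validation loss.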

178

MCQhard

A research lab is training a large language model (LLM) on SageMaker using PyTorch. The model has 1 billion parameters and does not fit on a single GPU. They have access to a cluster of 16 p4d.24xlarge instances (each with 8 A100 GPUs). They need to train the model with minimal changes to the training script. Which SageMaker feature should they use?

A.SageMaker's model parallelism with automatic partitioning

B.SageMaker's distributed data parallelism with Horovod

C.Use SageMaker's built-in BlazingText algorithm

D.SageMaker's managed spot training with checkpointing

AnswerA

Model parallelism splits the model across GPUs, and SageMaker's library automates this.

Why this answer

SageMaker's model parallelism is designed for large models that don't fit on a single device.

179

MCQeasy

A company uses Amazon SageMaker to host a model for real-time predictions. The model endpoint is experiencing high latency during peak hours. The data scientist wants to reduce latency without increasing cost. Which action should they take?

A.Enable data capture for the endpoint to log requests

B.Switch to a larger instance type

C.Reduce the number of instances behind the endpoint

D.Enable auto-scaling for the endpoint based on latency metrics

AnswerD

Auto-scaling adjusts capacity to demand, maintaining low latency without over-provisioning.

Why this answer

Using SageMaker's production variants with auto-scaling can help handle traffic spikes without over-provisioning, thus managing latency and cost. Switching to a larger instance would increase cost. Reducing the number of instances would increase latency.

Enabling data capture adds overhead and increases latency.

180

Multi-Selecteasy

Which TWO of the following are common techniques to handle missing values in a dataset?

Select 2 answers

A.Standardization

B.Principal Component Analysis (PCA)

C.One-hot encoding

D.Remove rows with missing values

E.Imputation with mean or median

AnswersD, E

Removing rows is a simple approach.

Why this answer

Options D and E are correct. E is correct because imputation with mean/median fills missing values. D is correct because removing rows with missing values is a valid approach.

C is wrong because one-hot encoding is for categorical data, not missing values. B is wrong because PCA is for dimensionality reduction, not missing value handling. A is wrong because standardization is for scaling, not missing values.
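
Both approaches can be sketched on a tiny in-memory table. The rows, column names, and the helpers `drop_missing` and `impute` are all made up for illustration; in practice a library such as pandas would normally handle this.

```python
from statistics import mean, median

# Made-up records; None marks a missing value.
rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
    {"age": 40, "income": 61000},
]

def drop_missing(rows):
    """Keep only rows where every column has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute(rows, column, strategy=mean):
    """Fill missing entries in `column` with a statistic of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

complete = drop_missing(rows)
imputed = impute(impute(rows, "age", mean), "income", median)
print(len(complete), imputed)
```

Dropping rows is simple but discards half of this toy table, while imputation keeps every row at the cost of introducing estimated values.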

181

MCQhard

A team is using SageMaker to train a custom PyTorch model on a large dataset (10 TB) stored in S3. The training job is repeatedly failing due to 'OutOfMemory' errors on the GPU. The team is using a single ml.p3.8xlarge instance. Which change is most likely to resolve the issue?

A.Change the instance type to ml.p3.16xlarge (more GPUs)

B.Use managed spot training to reduce cost

C.Reduce the batch size in the training script

D.Switch the input mode from Pipe to File

AnswerC

Reducing batch size decreases GPU memory usage per step, resolving OOM errors.

Why this answer

The 'OutOfMemory' error on the GPU indicates that the model and its associated data exceed the available GPU memory. Reducing the batch size directly decreases the memory footprint per training step, allowing the model to fit within the GPU's memory limits. This is the most direct and effective fix for GPU OOM errors, as it reduces the amount of data processed simultaneously without changing the instance type or input mode.

Exam trap

The exam often tests the misconception that adding more GPUs (Option A) solves per-GPU memory issues, but the OOM error is per-device and requires reducing per-device memory usage, not increasing the number of devices.

How to eliminate wrong answers

Option A is wrong because switching to ml.p3.16xlarge adds more GPUs but does not increase the memory per GPU (each GPU still has 16 GB); the OOM error occurs on a single GPU, so more GPUs won't resolve the per-GPU memory exhaustion. Option B is wrong because managed spot training reduces cost but does not affect GPU memory usage; it could even cause interruptions that complicate debugging. Option D is wrong because switching from Pipe to File input mode changes how data is streamed (Pipe streams directly from S3, File downloads to local storage) but does not reduce the memory consumed by batches during training; in fact, File mode may increase local disk usage but not GPU memory.

182

MCQmedium

A data scientist is training a binary classification model on a dataset with 100 features and 10,000 rows. The model overfits significantly: training accuracy is 99%, but validation accuracy is 80%. The data scientist has tried L1 and L2 regularization without improvement. The dataset is clean and representative. The data scientist needs to maintain a validation accuracy above 85%, but the current model is too complex. The company has limited budget for data labeling. Which approach is MOST likely to reduce overfitting?

A.Use a simpler model like logistic regression

B.Add more training data by generating synthetic samples using SMOTE

C.Reduce the number of features using PCA

D.Increase the number of training epochs

AnswerA

Simpler model reduces capacity and overfitting.

Why this answer

Option A is correct because the current model (likely a decision tree ensemble like Random Forest or XGBoost) is too complex for the dataset, causing overfitting. Switching to a simpler model like logistic regression reduces variance by limiting the hypothesis space, which directly addresses overfitting without requiring additional data or feature engineering. Given the limited labeling budget, this approach is cost-effective and can improve generalization, potentially achieving the required >85% validation accuracy.

Exam trap

The trap here is that candidates often assume more data (SMOTE) or dimensionality reduction (PCA) will always reduce overfitting, but in this scenario the core issue is model complexity, not data quantity or feature noise.

How to eliminate wrong answers

Option B is wrong because SMOTE generates synthetic samples by interpolating between existing minority class instances, which does not add new independent information; it can exacerbate overfitting by creating artificial patterns that the model already memorizes. Option C is wrong because PCA reduces dimensionality by projecting features onto principal components, but it is unsupervised and may discard features that are discriminative for the binary classification task, potentially harming validation accuracy. Option D is wrong because increasing the number of training epochs allows the model to further minimize training loss, which worsens overfitting by making the model memorize noise rather than generalize.

183

Multi-Selecthard

A company is deploying a real-time inference endpoint with SageMaker. The model is a large neural network that requires GPU acceleration. Which TWO configurations must be set?

Select 2 answers

A.Instance type with GPU

B.Create a SageMaker model with the inference code and model artifacts

C.Batch transform job

D.Production variant

E.Training container image

AnswersA, B

Required for GPU inference.

Why this answer

Option A is correct because deploying a real-time inference endpoint with a large neural network that requires GPU acceleration necessitates selecting an instance type with a GPU, such as the ml.p3 or ml.g4dn series, to provide the parallel processing power needed for low-latency inference. Without a GPU instance, the model would fall back to CPU, leading to unacceptable inference times for large neural networks. Option B is correct because an endpoint serves a SageMaker model, which must first be created from the inference container image and the model artifacts.

Exam trap

The trap here is that candidates often confuse the required configurations for deploying a real-time endpoint with those for training or batch processing, mistakenly selecting Batch Transform or Training Container Image instead of recognizing that the instance type with GPU and the SageMaker model definition are the two essential components.

184

MCQhard

A data scientist is using Amazon SageMaker to train a model with a large dataset that does not fit into memory on a single instance. The training algorithm supports distributed training. Which approach should the scientist use to train the model efficiently?

A.Use SageMaker File mode and increase the instance volume size

B.Use Amazon EMR to preprocess data and then train on a smaller sample

C.Split the data into smaller files and use multiple training jobs sequentially

D.Use SageMaker Pipe mode to stream data directly from S3

AnswerD

Pipe mode allows the algorithm to read data on the fly, handling large datasets.

Why this answer

SageMaker Pipe mode streams data from S3 directly to the training algorithm without writing to disk, enabling processing of large datasets beyond memory.

185

MCQhard

A company is deploying a real-time fraud detection system using a gradient boosting model on AWS SageMaker. The model uses 200 features and is trained on 50 GB of data. The inference latency requirement is under 10 ms per request. During load testing, the endpoint shows average latency of 15 ms. Which change is MOST likely to reduce latency below 10 ms?

A.Switch to a GPU-based instance type

B.Reduce the number of features to the top 50 based on feature importance

C.Increase the number of trees in the model

D.Use a larger batch size for inference

AnswerB

Fewer features reduce inference computation time, directly lowering latency.

Why this answer

Reducing the number of features from 200 to the top 50 directly decreases the amount of data each inference request must process, which lowers both feature engineering overhead and model evaluation time. For gradient boosting models on SageMaker, fewer features mean fewer decision tree splits to traverse per prediction, which can significantly reduce latency without requiring hardware changes. This is the most direct and cost-effective way to meet the 10 ms requirement.

Exam trap

The trap here is that candidates often assume GPU instances universally speed up inference, but for tree-based models like gradient boosting, the bottleneck is sequential tree traversal, not parallel computation, so feature reduction is the correct optimization.

How to eliminate wrong answers

Option A is wrong because switching to a GPU-based instance type does not inherently reduce latency for gradient boosting models; GPUs excel at parallel matrix operations (e.g., deep learning) but offer minimal benefit for tree-based models where inference is sequential and CPU-bound. Option C is wrong because increasing the number of trees in the model increases the ensemble size, requiring more sequential evaluations per prediction, which would increase latency, not reduce it. Option D is wrong because using a larger batch size for inference increases throughput (requests per second) but does not reduce per-request latency; in fact, it can increase latency for individual requests due to queuing and processing delays.

186

MCQeasy

A data scientist is building a binary classifier and obtains the following confusion matrix on the test set: TP=80, FP=20, TN=70, FN=30. What is the precision?

A.0.727

B.0.8

C.0.75

D.0.762

AnswerB

Precision = TP/(TP+FP) = 80/100 = 0.8.

Why this answer

Precision = TP / (TP+FP) = 80/(80+20)=0.8. Recall = TP/(TP+FN)=80/110≈0.727. Accuracy = (80+70)/200=0.75.

F1 = 2*(0.8*0.727)/(0.8+0.727)≈0.762.
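
The arithmetic above can be checked directly from the four confusion-matrix counts given in the question. This is a small sketch using only built-in Python; the variable names are chosen here for illustration.

```python
# Confusion-matrix counts from the question.
tp, fp, tn, fn = 80, 20, 70, 30

precision = tp / (tp + fp)                     # of predicted positives, how many are right
recall = tp / (tp + fn)                        # of actual positives, how many are found
accuracy = (tp + tn) / (tp + fp + tn + fn)     # overall fraction correct
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(accuracy, 3), round(f1, 3))
# prints: 0.8 0.727 0.75 0.762
```

Each of the four answer options corresponds to one of these metrics, which is why mixing up the denominators leads to a wrong choice.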

187

MCQmedium

A data scientist is using Amazon SageMaker to train a model with a custom Docker container. The training job fails with an error: 'Container exited with code 137'. What is the most likely cause?

A.The training data was corrupted.

B.The training job exceeded the maximum runtime.

C.The Docker entrypoint script was not found.

D.The training instance ran out of memory.

AnswerD

Exit code 137 indicates OOM kill.

Why this answer

Exit code 137 (128+9) indicates the container was killed by the SIGKILL signal, which typically occurs when the Linux Out-Of-Memory (OOM) killer terminates a process that has exceeded its memory allocation. In Amazon SageMaker, training instances have finite memory, and if the training algorithm or data loading exceeds that limit, the OOM killer forcibly stops the container, resulting in exit code 137.

Exam trap

The trap here is that candidates often confuse exit code 137 with a generic 'container error' or 'runtime timeout' (option B), not realizing that 137 specifically signals a SIGKILL from the OOM killer due to memory exhaustion.

How to eliminate wrong answers

Option A is wrong because corrupted training data would typically cause a non-zero exit code like 1 or a Python traceback, not a SIGKILL (137). Option B is wrong because exceeding the maximum runtime results in exit code 143 (SIGTERM) or a timeout error, not 137. Option C is wrong because a missing entrypoint script would cause an immediate container startup failure with exit code 127 (command not found) or 126 (permission denied), not a memory-related kill signal.
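
The 128 + signal-number convention behind these exit codes can be decoded with Python's standard `signal` module. This sketch assumes a Linux or other POSIX host, where signal 9 is SIGKILL and signal 15 is SIGTERM; `describe_exit_code` is a hypothetical helper, not part of any SageMaker API.

```python
import signal

def describe_exit_code(code):
    """Explain a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {code - 128}"
        return f"killed by {name}"
    return f"exited with status {code}"

for code in (137, 143, 1):
    print(code, "->", describe_exit_code(code))
```

A SIGKILL only says the process was killed from outside; confirming an out-of-memory kill still requires checking the instance's memory metrics or logs.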

188

MCQmedium

A company is using SageMaker's built-in image classification algorithm to classify product images into 100 categories. The training takes 3 hours on a single p3.2xlarge instance. They need to reduce training time to under 1 hour. They have access to a cluster of 4 p3.2xlarge instances. Which approach should they take?

A.Use SageMaker's hyperparameter tuning to find faster convergence

B.Use a smaller batch size on each instance

C.Use SageMaker's managed spot training with checkpointing

D.Use SageMaker's distributed training with data parallelism using Horovod

AnswerD

Data parallelism across 4 instances can reduce training time nearly linearly.

Why this answer

Distributed training with data parallelism effectively reduces training time.

189

MCQeasy

A data scientist is training a binary classification model and wants to evaluate its performance using a metric that is robust to class imbalance. Which metric should be used?

A.Mean squared error

B.Area under the ROC curve (AUC)

C.F1 score

D.Accuracy

AnswerC

F1 score balances precision and recall and is robust to class imbalance.

Why this answer

The F1 score is the harmonic mean of precision and recall and is robust to class imbalance because it considers both false positives and false negatives. Accuracy can be misleading with imbalanced classes.

190

MCQhard

A company uses Amazon SageMaker to train a deep learning model for image classification. The training dataset consists of 500,000 images, each 256x256 pixels, stored in S3. The team uses a single ml.p3.2xlarge instance for training. The training time is unacceptably long (over 48 hours). The team wants to reduce training time without sacrificing model accuracy. They have already optimized the data pipeline by using SageMaker Pipe mode and sharding the S3 dataset. The model is a ResNet-50 implemented in TensorFlow. The team is considering four options: switching to an ml.p3.16xlarge instance, which has 8 GPUs and more memory; implementing distributed data parallelism using Horovod across multiple instances; using SageMaker's built-in Hyperparameter Tuning to find optimal hyperparameters; or reducing the image resolution to 128x128 to speed up training. Which option will MOST effectively reduce training time while maintaining accuracy?

A.Switch to a ml.p3.16xlarge instance

B.Reduce the image resolution to 128x128

C.Implement distributed data parallelism using Horovod across multiple instances

D.Use SageMaker's built-in Hyperparameter Tuning

AnswerC

Horovod enables efficient multi-GPU, multi-instance training, scaling training time linearly.

Why this answer

Using multiple instances with Horovod for distributed data parallelism can scale training throughput close to linearly with the number of GPUs, significantly reducing time. A larger single instance (ml.p3.16xlarge) provides 8 GPUs but is still limited to the capacity of one instance. Hyperparameter tuning does not directly reduce training time.

Reducing resolution may lose accuracy.

191

MCQmedium

A company is using Amazon SageMaker to deploy a model for real-time inference. The model requires 500 MB of memory and has a latency requirement of 100 ms. The endpoint is receiving 10 requests per second. Which instance type should be chosen for cost-effectiveness?

A.ml.c5.xlarge

B.ml.t2.medium

C.ml.m5.large

D.ml.p3.2xlarge

AnswerC

Adequate memory and cost-effective.

Why this answer

Option C is correct because ml.m5.large (2 vCPU, 8 GB) is more than sufficient for memory and throughput, and is cost-effective. Option A is wrong because ml.c5.xlarge (4 vCPU, 8 GB) is more expensive than needed. Option B is wrong because ml.t2.medium (2 vCPU, 4 GB) has enough memory but may have burstable CPU limitations.

Option D is wrong because ml.p3.2xlarge is GPU-optimized and overkill.

192

MCQeasy

A data scientist is using Amazon SageMaker to train a linear regression model. The target variable is right-skewed. Which transformation should the data scientist apply to the target variable to improve model performance?

A.Min-max scaling

B.One-hot encoding

C.Log transformation

D.Principal Component Analysis (PCA)

AnswerC

Log transformation reduces right skewness.

Why this answer

Option C is correct because log transformation is commonly used to reduce skewness. Option A is wrong because min-max scaling does not address skewness. Option B is wrong because one-hot encoding is for categorical variables.

Option D is wrong because PCA is for dimensionality reduction.
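
The effect of a log transformation on a right-skewed target can be sketched with standard-library Python. The `targets` values are made up, `skewness` is a hypothetical helper computing the standardized third moment, and `log1p` is used here (an assumption, since the question does not say which log variant) so that zero-valued targets would also be handled.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Standardized third moment: positive values indicate a right skew."""
    mu, sigma = mean(values), pstdev(values)
    return mean([((x - mu) / sigma) ** 3 for x in values])

# Made-up right-skewed target: most values are small, a few are very large.
targets = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log1p(y) for y in targets]

print(round(skewness(targets), 2), round(skewness(logged), 2))

# Predictions made on the log scale are mapped back with the inverse transform.
restored = [math.expm1(v) for v in logged]
```

The transformed target is still somewhat skewed in this toy example, but much less so, and `expm1` recovers the original scale for reporting predictions.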

193

MCQhard

A data scientist is training a time series forecasting model using Amazon SageMaker's DeepAR algorithm. The dataset contains daily sales data for 10,000 products over 2 years. The scientist splits the data chronologically: training on the first 18 months, validation on the next 3 months, and test on the last 3 months. The model performs well on validation but poorly on test. The data scientist suspects the model is overfitting to the validation period. Which action should the scientist take to improve test performance?

A.Use time series cross-validation with an expanding window

B.Reduce the context length to 30 days

C.Add more exogenous features like holidays and promotions

D.Use the entire dataset for training and ignore validation

AnswerA

Proper cross-validation reduces overfitting to a specific validation period.

Why this answer

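Option A is correct because time series cross-validation with an expanding window respects time order and evaluates the model on several validation periods, so tuning is less likely to overfit to any single one. Option B (reducing the context length) may lose long-term patterns. Option C (adding more exogenous features) may worsen overfitting.

Option D (training on the entire dataset and ignoring validation) provides more training data but does not specifically address overfitting to the validation period.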
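
An expanding-window split can be sketched as a small generator. This is an illustration under assumptions: time is indexed in months rather than days, and the function name and its parameters (`initial_train`, `horizon`) are hypothetical, not part of DeepAR or SageMaker.

```python
def expanding_window_splits(n_periods, initial_train, horizon):
    """Yield (train_indices, validation_indices) pairs in time order.

    Each fold trains on everything before the validation window, and the
    training window grows by `horizon` periods from one fold to the next.
    """
    end = initial_train
    while end + horizon <= n_periods:
        yield list(range(end)), list(range(end, end + horizon))
        end += horizon

# 24 months of data: start with 12 training months, validate 3 months at a time.
splits = list(expanding_window_splits(n_periods=24, initial_train=12, horizon=3))
for train_idx, val_idx in splits:
    print(f"train 0..{train_idx[-1]}  validate {val_idx[0]}..{val_idx[-1]}")
```

Because every validation window lies strictly after its training window, no fold leaks future information, and averaging over the folds avoids tuning to one particular quarter.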

194

MCQeasy

A data scientist wants to automate the selection of optimal hyperparameters for a model. Which SageMaker feature should be used?

A.SageMaker Debugger

B.SageMaker Model Monitor

C.SageMaker Automatic Model Tuning

D.SageMaker Experiments

AnswerC

Automatic Model Tuning optimizes hyperparameters.

Why this answer

SageMaker Automatic Model Tuning (AMT) is the correct feature because it automates hyperparameter optimization by running multiple training jobs with different hyperparameter combinations, using algorithms like Bayesian optimization or random search to find the best set. This directly addresses the requirement to automate selection of optimal hyperparameters.

Exam trap

The trap here is that candidates confuse SageMaker Experiments (which tracks and compares runs) with Automatic Model Tuning (which actively searches for optimal hyperparameters), leading them to pick D instead of C.

How to eliminate wrong answers

Option A is wrong because SageMaker Debugger monitors and debugs training jobs in real-time (e.g., detecting vanishing gradients or overfitting), but it does not perform hyperparameter optimization. Option B is wrong because SageMaker Model Monitor detects data drift and quality issues in deployed endpoints, not hyperparameter tuning during training. Option D is wrong because SageMaker Experiments tracks and organizes training runs, metrics, and parameters for comparison, but it does not automatically select optimal hyperparameters.

195

MCQhard

A company wants to build a machine learning model to predict customer churn. The dataset includes customer demographics, usage patterns, and support interactions. The data is stored in Amazon S3. The data scientist needs to perform feature engineering, including creating aggregate features from support interactions and encoding categorical variables. Which AWS service is most suitable for building the feature engineering pipeline?

A.AWS Glue

B.Amazon EMR

C.AWS Batch

D.Amazon SageMaker Processing

AnswerD

SageMaker Processing is purpose-built for data preprocessing and feature engineering with SageMaker.

Why this answer

Amazon SageMaker Processing is the most suitable service because it is purpose-built for data preprocessing and feature engineering within the SageMaker ecosystem. It allows you to run custom Python scripts (e.g., using pandas or PySpark) on managed infrastructure to create aggregate features from support interactions and encode categorical variables, and it integrates seamlessly with SageMaker for model training and deployment.

Exam trap

The trap here is that candidates often confuse AWS Glue (a general ETL tool) with SageMaker Processing, but the question specifically asks for a service that integrates with the SageMaker model building pipeline, making SageMaker Processing the correct choice.

How to eliminate wrong answers

Option A is wrong because AWS Glue is primarily a serverless ETL service for data cataloging and schema discovery, not optimized for running custom feature engineering scripts with tight integration to SageMaker training jobs. Option B is wrong because Amazon EMR is a big data platform for running distributed frameworks like Spark and Hadoop, which is overkill and less integrated for simple feature engineering tasks that SageMaker Processing can handle more directly. Option C is wrong because AWS Batch is a general-purpose batch computing service for running any containerized workload, but it lacks native integration with SageMaker’s model building pipeline and does not provide the same level of convenience for feature engineering steps.

196

Multi-Selecteasy

A data scientist is performing feature selection for a linear regression model. Which TWO methods are appropriate? (Choose TWO.)

Select 2 answers

A.Lasso (L1) regularization

B.Ridge (L2) regularization

C.t-distributed stochastic neighbor embedding (t-SNE)

D.Forward selection

E.Principal component analysis (PCA)

AnswersA, D

Lasso can zero out feature coefficients, effectively selecting features.

Why this answer

Forward selection and Lasso regularization are both feature selection methods. Lasso adds an L1 penalty that can shrink coefficients exactly to zero. Option E is wrong because PCA reduces dimensions but does not select original features.

Option B is wrong because L2 regularization (Ridge) shrinks coefficients but does not set them to zero. Option C is wrong because t-SNE is for visualization.
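
The difference between the two penalties is easiest to see in the special case of an orthonormal design, where both solutions have a closed form in terms of the ordinary least squares coefficient. The sketch below assumes that special case and the objectives (1/2)·||y - Xb||² + λ·||b||₁ for Lasso and (1/2)·||y - Xb||² + (λ/2)·||b||² for Ridge; the coefficient values are made up.

```python
def lasso_shrink(beta, lam):
    """Soft-thresholding: the L1 solution for an orthonormal design."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(beta, lam):
    """The L2 solution for an orthonormal design: proportional shrinkage."""
    return beta / (1.0 + lam)

ols = [3.0, -0.4, 0.2, -2.5]  # made-up least squares coefficients
lam = 0.5

lasso = [lasso_shrink(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]
print(lasso)  # prints: [2.5, 0.0, 0.0, -2.0]
print(ridge)
```

Lasso drops the two weak features entirely, which is what makes it a selection method, while Ridge keeps all four with smaller magnitudes.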

197

MCQeasy

A data scientist is training a binary classification model on an imbalanced dataset (95% negative class, 5% positive class). The model achieves 95% accuracy but only predicts the negative class for all examples. Which metric should the scientist use to evaluate model performance more appropriately?

A.F1 score

B.Mean squared error

C.Accuracy

D.AUC-ROC

AnswerD

AUC-ROC evaluates the model's ability to distinguish between classes regardless of threshold and is robust to imbalance.

Why this answer

AUC-ROC is robust to class imbalance because it evaluates the model's ability to discriminate between positive and negative classes across all classification thresholds, rather than relying on a single threshold. In this scenario, the model predicts only the negative class, so its true positive rate is 0 and false positive rate is 0, yielding an AUC-ROC of 0.5 (random performance), which correctly reflects the model's lack of predictive power.

Exam trap

The trap here is that candidates often choose F1 score (Option A) thinking it handles imbalance well, but they forget that F1 score requires at least some true positives to be meaningful, and in this extreme case where the model predicts only negatives, F1 score collapses to 0 or undefined, whereas AUC-ROC correctly identifies random performance.

How to eliminate wrong answers

Option A is wrong because F1 score is a harmonic mean of precision and recall, but when the model predicts only the negative class, recall is 0 (no true positives), making the F1 score undefined or 0, which does not provide a meaningful evaluation of the model's overall discriminative ability. Option B is wrong because mean squared error (MSE) is a regression metric that measures average squared differences between predicted and actual values; it is not designed for binary classification and does not account for class imbalance or threshold behavior. Option C is wrong because accuracy is misleading on imbalanced datasets; a model that always predicts the majority class achieves high accuracy (95%) but fails to identify any positive instances, so accuracy does not reflect the model's true performance on the minority class.
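
The scenario can be reproduced with a tiny simulation. This sketch assumes the always-negative model outputs the same score for every example (AUC needs scores, not just labels), and `auc` is a hypothetical helper that counts correctly ordered positive/negative pairs, with ties counted as half.

```python
# 5% positives, 95% negatives, and a model that always predicts negative.
labels = [1] * 5 + [0] * 95
preds = [0] * 100
scores = [0.0] * 100  # assumed constant score behind the constant prediction

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

accuracy = (tp + tn) / len(labels)
f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(accuracy, f1, auc(labels, scores))
# prints: 0.95 0.0 0.5
```

Accuracy looks strong, while F1 and AUC both expose that the model has no ability to find positives.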

198

MCQmedium

A data scientist is working with a dataset containing categorical features with high cardinality. The scientist wants to use a tree-based model. Which encoding method should be used?

A.Ordinal encoding

B.Target encoding

C.Label encoding

D.One-hot encoding

AnswerA

Ordinal encoding assigns integers without implying order, suitable for trees.

Why this answer

Option A is correct because tree-based models can handle ordinal encoding naturally. Option D is wrong because one-hot encoding creates many dimensions, which is not ideal for high cardinality. Option C is wrong because label encoding may impose an ordinal relationship.

Option B is wrong because target encoding may cause overfitting.

199

MCQmedium

A data scientist is training a binary classification model on an imbalanced dataset where the positive class represents 1% of the data. The model needs to maximize recall while keeping precision above 0.7. Which sampling strategy should the data scientist use?

A.NearMiss from imbalanced-learn to undersample the majority class based on distance to minority samples.

B.SMOTE from imbalanced-learn to generate synthetic samples for the minority class.

C.RandomUnderSampler from imbalanced-learn to undersample the majority class.

D.TomekLinks from imbalanced-learn to remove overlapping samples.

E.RandomOverSampler from imbalanced-learn to oversample the minority class.

AnswerB

SMOTE creates synthetic samples, balancing the dataset and improving recall while preserving precision.

Why this answer

Option B is correct because SMOTE generates synthetic samples for the minority class, which can improve recall without discarding data. Option C (RandomUnderSampler) may discard too many majority samples, reducing precision. Option E (RandomOverSampler) can cause overfitting.

Option A (NearMiss) focuses on hard samples and may reduce recall. Option D (TomekLinks) only removes noisy instances and does not address the imbalance effectively.

200

MCQmedium

A company is using Amazon SageMaker to build a binary classification model for customer churn. The dataset is highly imbalanced (90% no churn, 10% churn). Which technique is MOST effective for handling class imbalance?

A.Use accuracy as the evaluation metric.

B.Undersample the majority class.

C.Use SMOTE to generate synthetic samples for the minority class.

D.Train a random forest model instead of logistic regression.

AnswerC

SMOTE is a standard oversampling technique.

Why this answer

SMOTE (Synthetic Minority Oversampling Technique) is the most effective option because it generates synthetic samples for the minority class by interpolating between existing minority instances, thereby balancing the dataset without discarding valuable majority-class data. This approach directly addresses the class imbalance in a binary classification task on SageMaker, improving model recall for the churn class without the information loss caused by undersampling.
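
The interpolation step can be sketched in plain Python. This is a simplified illustration of the idea, not the imbalanced-learn implementation: `smote_like` is a hypothetical function, the minority points are made up, and the neighbour search is a brute-force sort that would be too slow for real datasets.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create synthetic points on segments between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, by squared distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Made-up minority-class samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=6)
print(new_points)
```

Every synthetic point lies between two real minority samples, which is why the technique adds density to the minority region without adding independent information.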

Exam trap

The trap here is that candidates often assume switching to a tree-based model (like random forest) inherently solves class imbalance, but the exam tests that explicit resampling or cost-sensitive techniques are required for effective handling.

How to eliminate wrong answers

Option A is wrong because accuracy is a misleading metric for imbalanced datasets; a model that predicts 'no churn' for all instances would achieve 90% accuracy but fail to identify any churn cases. Option B is wrong because undersampling the majority class discards potentially useful data, which can lead to loss of information and reduced model performance, especially when the dataset is not extremely large. Option D is wrong because simply switching to a random forest model does not inherently address class imbalance; while tree-based models can handle imbalance better than logistic regression, they still require explicit imbalance-handling techniques like SMOTE or class weighting to be effective.

Practice this question →

201

Multi-Selecteasy

Which TWO actions are best practices for tuning hyperparameters using Amazon SageMaker Automatic Model Tuning?

Select 2 answers

A.Set the number of training jobs to a very large value

B.Use the same hyperparameters as the baseline model

C.Use Bayesian optimization strategy

D.Use grid search strategy

E.Use random search strategy

AnswersC, E

Bayesian optimization is effective and efficient.

Why this answer

Random search and Bayesian optimization are both supported tuning strategies. Grid search is also possible but is not efficient when there are many hyperparameters. Setting a large number of training jobs can be costly.

Using the same hyperparameters as the baseline does not tune anything. So the correct answers are C and E. D: Grid search is less efficient.

A: A very large number of jobs is not a best practice due to cost. B: Not tuning is not a best practice.

Practice this question →

202

MCQhard

A data scientist is training a deep learning model for image classification. The model is overfitting on the training data. Which combination of techniques will most effectively reduce overfitting?

A.Add dropout layers and use data augmentation

B.Reduce the batch size

C.Train for more epochs without early stopping

D.Increase the number of layers and neurons

AnswerA

Dropout randomly drops units to prevent co-adaptation; data augmentation increases effective training set size, both reduce overfitting.

Why this answer

Dropout layers randomly deactivate a fraction of neurons during training, which forces the network to learn more robust features and prevents co-adaptation. Data augmentation artificially expands the training dataset by applying transformations (e.g., rotation, flipping, cropping), which reduces the model's ability to memorize spurious patterns and improves generalization. Together, these techniques directly counteract overfitting by increasing regularization and effective training diversity.

Exam trap

The exam often tests the misconception that increasing model complexity (more layers/neurons) or training longer will fix overfitting, when in reality these actions worsen it, and that simple hyperparameter changes like batch size reduction are not primary regularization techniques.

How to eliminate wrong answers

Option B is wrong because reducing the batch size introduces noisier gradient estimates, which can sometimes act as a mild regularizer but is not a primary or reliable technique to combat overfitting; it may even destabilize training. Option C is wrong because training for more epochs without early stopping will exacerbate overfitting, as the model will continue to memorize noise in the training data. Option D is wrong because increasing the number of layers and neurons increases model capacity, which makes overfitting worse by allowing the network to fit training data more precisely.
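As a rough illustration of the two techniques in the correct answer, the sketch below implements inverted dropout and a horizontal-flip augmentation in plain Python. The function names and sample values are hypothetical, and real frameworks provide their own layers and transforms for this.

```python
import random

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during
    training and rescale survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def horizontal_flip(image):
    """Simple augmentation: mirror each row of a 2-D image (list of rows)."""
    return [row[::-1] for row in image]

rng = random.Random(0)
hidden = [0.5, 1.0, 1.5, 2.0]
train_out = dropout(hidden, rate=0.5, training=True, rng=rng)   # some units zeroed
eval_out = dropout(hidden, rate=0.5, training=False, rng=rng)   # unchanged at inference
flipped = horizontal_flip([[1, 2, 3], [4, 5, 6]])
```

Dropout is active only during training, while the flipped image would be added to the training set as an extra example with the same label.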

Practice this question →

203

MCQhard

A data scientist is training a neural network for image classification. The dataset has 50,000 images across 100 classes. The model uses a ResNet-50 architecture pre-trained on ImageNet. The training loss decreases rapidly, but validation loss starts to increase after 5 epochs. Which of the following is the most effective technique to address this?

A.Increase the learning rate

B.Add more layers to the network

C.Use data augmentation to increase the diversity of the training set

D.Use a smaller batch size

AnswerC

Data augmentation artificially expands the training set, reducing overfitting and improving generalization.

Why this answer

The rapid decrease in training loss followed by an increase in validation loss after only 5 epochs is a classic sign of overfitting. Data augmentation artificially expands the training set by applying random transformations (e.g., rotations, flips, crops) to existing images, which improves the model's generalization and reduces overfitting. This is the most effective technique among the options because it directly addresses the lack of diverse training examples without changing the model architecture or training hyperparameters in a way that could destabilize learning.

Exam trap

The trap here is that candidates often confuse overfitting with underfitting or training instability, and incorrectly choose to increase learning rate or add layers, not recognizing that the validation loss rising while training loss falls is the textbook symptom of overfitting that requires regularization or more data.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate would likely cause the optimizer to overshoot minima, making both training and validation loss unstable or diverge, which does not fix overfitting. Option B is wrong because adding more layers to an already deep ResNet-50 would increase model capacity and exacerbate overfitting, especially with a fixed dataset size. Option D is wrong because using a smaller batch size introduces more noise into gradient estimates, which can sometimes act as a regularizer but is less reliable and effective than data augmentation for addressing overfitting in image classification; it may also slow convergence.

Practice this question →

204

MCQhard

A data scientist is working on a multi-class classification problem with 10 classes. The model outputs probabilities and the scientist wants to evaluate the model's ability to rank classes correctly. Which metric is most appropriate?

A.F1 score

B.Accuracy

C.Area Under the ROC Curve (AUC-ROC)

D.Log loss

AnswerC

AUC-ROC measures ranking ability for multi-class via one-vs-rest.

Why this answer

The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) measures how well the model's scores separate the classes, which is a ranking property. For multi-class problems, one-vs-rest AUC can be used. Log loss measures calibration, not ranking.

F1 score is computed from hard predictions, for binary classification or per class. Accuracy does not consider ranking. Option C: AUC-ROC is correct.

Option A: F1 score is not a ranking metric. Option B: Accuracy is not a ranking metric. Option D: Log loss measures probability calibration.
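The ranking interpretation of AUC can be made concrete with a small sketch. It assumes the macro-averaged one-vs-rest definition and uses a hypothetical 3-class example instead of 10 classes to stay short; the function names are illustrative.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a random positive is scored above a
    random negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(prob_rows, true_classes, n_classes):
    """Macro-average of per-class AUC, treating each class as 'positive' in turn."""
    aucs = []
    for c in range(n_classes):
        scores = [row[c] for row in prob_rows]
        labels = [1 if t == c else 0 for t in true_classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes

# Hypothetical predicted probabilities and true classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5], [0.4, 0.5, 0.1]]
truth = [0, 1, 2, 0]
macro_auc = one_vs_rest_auc(probs, truth, n_classes=3)
```

Note that the last sample is misclassified by argmax (class 1 scores highest) yet every class is still ranked perfectly, which is exactly the distinction between accuracy and a ranking metric.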

Practice this question →

205

MCQmedium

A data scientist is building a regression model to predict house prices. The dataset includes features such as square footage, number of bedrooms, and location. After training a linear regression model, the scientist notices that the residuals have a pattern: they increase as the predicted value increases. Which action is most appropriate?

A.Remove outliers from the dataset

B.Use Ridge regression instead of linear regression

C.Add polynomial features to the model

D.Apply a log transformation to the target variable

AnswerD

Log transformation can stabilize variance and reduce heteroscedasticity.

Why this answer

Residuals that grow with the predicted value indicate heteroscedasticity, which violates the constant-variance assumption of linear regression. Log-transforming the target variable can stabilize the variance. Adding polynomial features or interactions may help with non-linearity but does not specifically address heteroscedasticity.

Ridge regression is for multicollinearity, not for patterned residuals.

Practice this question →

206

MCQhard

A company uses Amazon SageMaker to train a model for fraud detection. The dataset has 1 million transactions, with 0.1% fraud. The data scientist trains a random forest model and achieves 99.9% accuracy but 0% recall on the fraud class. Which technique is most likely to improve recall without significantly reducing precision?

A.Tune the classification threshold

B.Use cost-sensitive learning with a high cost for fraud misclassification

C.Apply SMOTE to generate synthetic fraud samples

D.Undersample the majority class

AnswerC

SMOTE creates synthetic instances of the minority class, balancing the dataset and improving recall while maintaining precision.

Why this answer

Option C is correct because SMOTE (Synthetic Minority Oversampling Technique) generates synthetic fraud samples by interpolating between existing minority class instances, which directly addresses the extreme class imbalance (0.1% fraud). This increases the representation of the fraud class in the training data, allowing the random forest model to learn decision boundaries that capture fraud patterns, thereby improving recall without introducing the noise or information loss associated with other methods.

Exam trap

The trap here is that candidates often assume cost-sensitive learning (Option B) is the best approach for imbalanced data, but in extreme imbalance with 0% recall, oversampling techniques like SMOTE are more effective because they directly increase the minority class representation rather than just adjusting penalties.

How to eliminate wrong answers

Option A is wrong because tuning the classification threshold can trade off precision and recall, but with 0% recall, the model is already predicting all instances as non-fraud; lowering the threshold may increase recall but will likely cause a drastic drop in precision due to the overwhelming majority class. Option B is wrong because cost-sensitive learning assigns a higher penalty to fraud misclassification during training, which can improve recall, but it does not directly address the lack of fraud examples in the dataset and may still result in poor precision if the model cannot learn from sparse data. Option D is wrong because undersampling the majority class reduces the dataset size and discards potentially useful information, which can lead to loss of decision boundary details and decreased model performance, often harming precision more than helping recall.
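The numbers in the question show why accuracy is uninformative here. The short calculation below assumes a model that predicts "not fraud" for every transaction, which is consistent with the 99.9% accuracy and 0% recall described.

```python
total = 1_000_000
fraud = total // 1000          # 0.1% of transactions -> 1,000 fraud cases
legit = total - fraud

# A model that predicts "not fraud" for everything:
tp, fn = 0, fraud
tn, fp = legit, 0

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
print(accuracy, recall)        # 0.999 0.0
```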

Practice this question →

207

MCQhard

A team is using SageMaker to train a deep learning model for image classification. The training job is failing with a 'CUDA out of memory' error. The team is using a p3.2xlarge instance (1 GPU, 16 GB GPU memory). The dataset consists of 256x256 RGB images. Which action is MOST likely to resolve the error without changing the instance type?

A.Increase the batch size to utilize GPU more efficiently

B.Enable automatic model tuning to optimize hyperparameters

C.Use Spot Instances to reduce cost

D.Reduce the batch size

AnswerD

Smaller batch size reduces memory footprint per iteration, resolving OOM errors.

Why this answer

The 'CUDA out of memory' error indicates that the GPU memory is exhausted. Reducing the batch size directly decreases the memory footprint per training step, allowing the model to fit within the 16 GB GPU memory of the p3.2xlarge instance. This is the most direct and effective fix without changing the instance type.

Exam trap

The trap here is that candidates may confuse 'CUDA out of memory' with a performance issue and incorrectly choose to increase batch size for efficiency, when in fact the error is a hard memory limit that requires reducing memory usage.

How to eliminate wrong answers

Option A is wrong because increasing the batch size would increase GPU memory consumption, worsening the out-of-memory error. Option B is wrong because automatic model tuning (hyperparameter optimization) does not directly address GPU memory limits; it may even suggest larger batch sizes that exacerbate the issue. Option C is wrong because Spot Instances reduce cost but do not affect GPU memory capacity; the error would persist regardless of instance pricing model.

Practice this question →

208

MCQmedium

A company uses Amazon SageMaker to train an XGBoost model on a large dataset. Training takes a long time. Which action can reduce training time without significantly affecting model accuracy?

A.Use a deep neural network instead

B.Increase the learning rate

C.Use a larger instance type

D.Enable early stopping

AnswerD

Early stopping ends training once the validation metric stops improving.

Why this answer

Early stopping halts training when the model's performance on a validation set stops improving for a specified number of rounds. This prevents overfitting and reduces training time by eliminating unnecessary iterations, while typically preserving accuracy because the optimal model is already found.

Exam trap

AWS often tests the misconception that increasing learning rate or using more powerful hardware always speeds up training without side effects, but the correct answer focuses on algorithmic efficiency rather than resource scaling.

How to eliminate wrong answers

Option A is wrong because replacing XGBoost with a deep neural network generally increases training time and requires more data and tuning, not reducing time. Option B is wrong because increasing the learning rate can cause the model to converge to a suboptimal solution or diverge, significantly reducing accuracy. Option C is wrong because using a larger instance type increases computational resources and may reduce wall-clock time, but it does not reduce the total compute time or algorithmic iterations; it also incurs higher cost and does not address the root cause of long training.
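The stopping rule can be sketched independently of any library. The function below is a hypothetical patience-based loop over per-round validation errors, not XGBoost's own implementation, and the error values are made up for illustration.

```python
def rounds_with_early_stopping(val_errors, patience):
    """Return (rounds_run, best_round) for a sequence of per-round validation
    errors (lower is better), stopping after `patience` rounds without improvement."""
    best_error, best_round = float("inf"), -1
    for rnd, error in enumerate(val_errors):
        if error < best_error:
            best_error, best_round = error, rnd
        elif rnd - best_round >= patience:
            return rnd + 1, best_round
    return len(val_errors), best_round

# Hypothetical validation errors for 10 boosting rounds.
errors = [0.40, 0.31, 0.27, 0.25, 0.26, 0.26, 0.27, 0.28, 0.29, 0.30]
rounds_run, best_round = rounds_with_early_stopping(errors, patience=3)
print(rounds_run, best_round)  # 7 3
```

Training stops after 7 of the 10 rounds, and the best model is the one from round index 3, so the remaining rounds would only have added time.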

Practice this question →

209

MCQmedium

A data scientist is using Amazon SageMaker to train a model on a dataset that contains both numerical and categorical features. The categorical features have high cardinality (e.g., postal codes, product IDs). Which feature engineering approach is most suitable for handling these high-cardinality categorical features in a tree-based model?

A.One-hot encode the categorical features

B.Use label encoding

C.Apply target encoding

D.Apply binary encoding

AnswerB

Tree-based models like XGBoost can effectively use label-encoded features because they make splits based on ordering.

Why this answer

Label encoding is suitable for tree-based models because these models split on feature values and can handle ordinal relationships implicitly. Unlike linear models, tree-based models do not assume any distance metric between categories, so label encoding avoids the dimensionality explosion of one-hot encoding while preserving the ability to capture splits based on high-cardinality features.

Exam trap

The trap here is that candidates often default to one-hot encoding for categorical features without considering the model type, failing to recognize that tree-based models can effectively use label encoding for high-cardinality features without the drawbacks of dimensionality explosion.

How to eliminate wrong answers

Option A is wrong because one-hot encoding high-cardinality features (e.g., thousands of unique postal codes) creates an extremely sparse feature matrix with many columns, leading to increased memory usage, slower training, and potential overfitting in tree-based models. Option C is wrong because target encoding, while effective for high-cardinality features, introduces target leakage and can cause overfitting if not carefully regularized, making it less robust than label encoding for tree-based models in a straightforward SageMaker training pipeline. Option D is wrong because binary encoding, though more compact than one-hot, still creates multiple binary columns per feature and can complicate interpretability; tree-based models can handle label encoding directly without needing this transformation.
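For illustration, label encoding maps each distinct category to one integer, so a high-cardinality feature stays a single column. The helper name and the postal codes below are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

postal_codes = ["98101", "10001", "98101", "60601"]
codes, mapping = label_encode(postal_codes)
print(codes)  # [0, 1, 0, 2]
```

One-hot encoding the same column would need one new column per distinct postal code instead of a single integer column.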

Practice this question →

210

MCQeasy

A team is training a linear regression model to predict house prices. After training, they observe that the model has high bias (underfitting). Which action is most likely to reduce bias?

A.Increase the regularization strength.

B.Reduce the amount of training data.

C.Decrease the number of model parameters.

D.Add more relevant features and increase model complexity.

AnswerD

Adding features reduces bias.

Why this answer

High bias (underfitting) means the model is too simple to capture the underlying patterns in the data. Adding more relevant features and increasing model complexity (e.g., using polynomial features or more interaction terms) gives the linear regression model greater capacity to fit the training data, directly reducing bias. This aligns with the bias-variance tradeoff, where increasing complexity lowers bias at the cost of potentially increasing variance.

Exam trap

The trap here is that candidates often confuse regularization (which controls overfitting) with bias reduction, mistakenly thinking increasing regularization or reducing parameters will fix underfitting, when in fact those actions increase bias.

How to eliminate wrong answers

Option A is wrong because increasing regularization strength (e.g., L1 or L2 penalty) forces the model to shrink coefficients toward zero, which increases bias and worsens underfitting. Option B is wrong because reducing the amount of training data does not address model simplicity; it typically increases variance and can exacerbate bias if the model cannot learn the true distribution. Option C is wrong because decreasing the number of model parameters (e.g., removing features or using a simpler model) reduces complexity, which directly increases bias and makes underfitting worse.

Practice this question →

211

MCQmedium

A data scientist is using Amazon SageMaker to train a natural language processing model using a custom Docker container. The training script reads data from an S3 bucket and writes checkpoints to an S3 bucket. The training job is failing with the error 'Unable to write to checkpoint path: s3://my-bucket/checkpoints/'. The IAM role associated with the training job has the following policy: {'Effect': 'Allow', 'Action': 's3:PutObject', 'Resource': 'arn:aws:s3:::my-bucket/checkpoints/*'}. The bucket 'my-bucket' exists and the prefix 'checkpoints/' is empty. What is the most likely cause of the failure?

A.The IAM role is missing the s3:ListBucket permission

B.The IAM role does not have s3:PutObject permission

C.The S3 bucket does not exist

D.The checkpoint prefix already contains objects

AnswerA

SageMaker needs ListBucket to access the bucket.

Why this answer

The error 'Unable to write to checkpoint path' occurs because the SageMaker training job's IAM role lacks the `s3:ListBucket` permission. Even though the role has `s3:PutObject` on the checkpoint prefix, SageMaker's S3 client first performs a `ListObjects` (or `HeadObject`) call to verify the bucket exists and to check the prefix state before writing. Without `s3:ListBucket` on the bucket itself, the API call fails, causing the write operation to abort.

Exam trap

The trap here is that candidates assume `s3:PutObject` alone is sufficient for writing to S3, but the training job's S3 client also needs `s3:ListBucket` on the bucket to verify the path before writing, a nuance frequently tested in MLS-C01 and SAA exams.

How to eliminate wrong answers

Option B is wrong because the policy explicitly includes `s3:PutObject` on the checkpoint path, so the permission is present. Option C is wrong because the question states the bucket 'my-bucket' exists, so the bucket is not missing. Option D is wrong because the prefix is explicitly described as empty, and even if it contained objects, `s3:PutObject` would still succeed; the error is about the inability to write, not about overwriting existing objects.
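A policy that covers both permissions might look like the sketch below, written as a Python dictionary so it can be serialized to JSON. The bucket and prefix come from the question; the overall policy layout is an assumption about how the role could be fixed, not a policy taken from the source.

```python
import json

bucket = "my-bucket"          # bucket name from the question
prefix = "checkpoints/"       # checkpoint prefix from the question

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # bucket-level permission: the resource is the bucket ARN itself
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {   # object-level permission: the resource is the object ARN pattern
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        },
    ],
}
policy_json = json.dumps(policy, indent=2)
```

The key detail is that `s3:ListBucket` is granted on the bucket ARN, while `s3:PutObject` is granted on object ARNs under the prefix.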

Practice this question →

212

MCQmedium

A company uses SageMaker to host a real-time inference endpoint for a classification model. The endpoint receives traffic spikes that cause high latency. The team wants a solution that automatically scales based on demand while keeping costs low. Which approach is BEST?

A.Use provisioned concurrency for the endpoint

B.Use a multi-model endpoint to serve multiple models

C.Deploy the endpoint on Spot Instances

D.Enable automatic scaling on the endpoint using Application Auto Scaling

AnswerD

Automatic scaling adjusts instance count based on demand, balancing cost and latency.

Why this answer

SageMaker endpoints support automatic scaling through Application Auto Scaling, typically with a target-tracking policy on the predefined 'SageMakerVariantInvocationsPerInstance' metric (the CloudWatch 'InvocationsPerInstance' metric). Provisioned concurrency is not an option for instance-backed real-time endpoints. Spot Instances are not suitable for real-time endpoints because of interruptions.

Multi-model endpoints reduce hosting cost when serving many models, but they do not handle traffic spikes by themselves; scaling is still needed.
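A target-tracking setup can be sketched as the two request payloads that Application Auto Scaling expects. The endpoint name, variant name, capacity limits, target value, and cooldowns below are assumptions chosen for illustration; the payloads are only built here, not sent.

```python
endpoint_name = "churn-endpoint"   # hypothetical endpoint name
variant_name = "AllTraffic"        # hypothetical variant name

scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

scaling_policy = {
    "PolicyName": "invocations-target-tracking",   # hypothetical policy name
    "ServiceNamespace": "sagemaker",
    "ResourceId": scalable_target["ResourceId"],
    "ScalableDimension": scalable_target["ScalableDimension"],
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 100.0,  # assumed invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,   # seconds, assumed
        "ScaleOutCooldown": 60,   # seconds, assumed
    },
}

# With boto3 installed and AWS credentials configured, these would be passed as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**scalable_target)
#   client.put_scaling_policy(**scaling_policy)
```

A short scale-out cooldown and a longer scale-in cooldown is one common way to react quickly to spikes while avoiding repeated scale-in and scale-out cycles.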

Practice this question →

213

Multi-Selecthard

A machine learning engineer is tuning a Gradient Boosting model for a regression task. The dataset contains 50 features and 100,000 samples. The engineer wants to speed up training without sacrificing predictive performance significantly. Which THREE hyperparameters should the engineer consider adjusting? (Choose THREE.)

Select 3 answers

A.Reduce the subsample ratio (e.g., from 1.0 to 0.5)

B.Increase learning_rate and decrease n_estimators proportionally

C.Increase the number of estimators

D.Decrease max_depth of trees

E.Reduce max_features (e.g., from 'auto' to 0.5)

AnswersA, D, E

Using fewer samples per tree speeds training.

Why this answer

Option A (subsample) fits each tree on a fraction of the samples, reducing overfitting and training time. Option E (max_features) limits the features considered for splits, reducing computation. Option B (learning_rate versus n_estimators) is a trade-off rather than a clear speed-up: the two must be balanced against each other, and a lower learning rate requires more estimators, not fewer.

Option D (max_depth): shallower trees speed up training. Option C (n_estimators): increasing the number of estimators slows training.

Practice this question →

214

MCQhard

A research team is training a deep learning model for object detection using SageMaker's built-in SSD algorithm. The dataset contains 50,000 images with bounding box annotations. The team uses a single ml.p3.2xlarge instance. After 24 hours of training, the model's loss has plateaued, but the mean average precision (mAP) on validation is only 0.45. The team wants to improve mAP without increasing training time. Which action should they take?

A.Increase the learning rate by a factor of 2

B.Use a pre-trained model as the backbone (e.g., ResNet-50 pre-trained on ImageNet)

C.Increase the batch size to 64

D.Add more convolutional layers to the backbone

AnswerB

Transfer learning boosts accuracy with no additional training time.

Why this answer

Option B (use a pre-trained backbone) transfers learned features, often improving accuracy. Option C (increase batch size) may not improve mAP and could slow convergence. Option D (add more layers) increases training time.

Option A (increase learning rate) may destabilize training.

Practice this question →

215

MCQmedium

A machine learning team is using SageMaker to train a deep learning model. The training job is failing due to insufficient GPU memory. Which approach should the team take to resolve this issue without changing the model architecture?

A.Increase the batch size.

B.Use gradient accumulation with a smaller per-step batch size.

C.Add more GPUs to the training instance.

D.Decrease the learning rate.

AnswerB

Gradient accumulation allows training with larger effective batches while keeping per-step memory low.

Why this answer

Option B is correct because gradient accumulation simulates a larger batch size without increasing memory per step. Option A is wrong because increasing the batch size increases memory usage. Option D is wrong because decreasing the learning rate does not affect memory.

Option C is wrong because adding more GPUs to a single instance may not help if memory is already exhausted on each GPU; the key is to reduce per-GPU memory, which gradient accumulation achieves by processing a smaller micro-batch per step and accumulating gradients before each weight update.
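The equivalence that gradient accumulation relies on can be checked on a toy problem. The sketch below uses a one-parameter linear model with a squared-error loss and made-up data; in a deep learning framework the same pattern applies to tensors, but the names here are purely illustrative.

```python
def grad_mse(w, batch):
    """Sum of per-sample gradients of (w*x - y)^2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # hypothetical (x, y) pairs
w = 0.5

# One step on the full batch of 4 (all 4 samples must fit in memory at once).
full_grad = grad_mse(w, data) / len(data)

# Same update from micro-batches of 2: accumulate gradients, then step once.
micro_batch_size = 2
accumulated = 0.0
for start in range(0, len(data), micro_batch_size):
    accumulated += grad_mse(w, data[start:start + micro_batch_size])
accumulated_grad = accumulated / len(data)

learning_rate = 0.01
w_new = w - learning_rate * accumulated_grad  # single weight update
```

Only two samples are processed at a time, yet the weight update matches the one computed from the full batch of four.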

Practice this question →

216

MCQmedium

A company wants to deploy a machine learning model that predicts customer churn. The model must provide interpretable predictions to explain why a customer is likely to churn. Which algorithm is most appropriate?

A.Gradient boosting machine

B.Support vector machine (SVM)

C.Decision tree

D.Deep neural network

AnswerC

Decision trees are highly interpretable.

Why this answer

Decision trees are inherently interpretable because they produce a clear, rule-based structure that shows exactly which features and thresholds lead to a churn prediction. This white-box nature allows stakeholders to trace the reasoning for each prediction, meeting the requirement for interpretability without needing post-hoc explanation methods.

Exam trap

The exam often tests the trade-off between model accuracy and interpretability, where candidates mistakenly choose a high-performance black-box model (like gradient boosting or neural networks) without recognizing that the question explicitly prioritizes interpretability over raw predictive power.

How to eliminate wrong answers

Option A is wrong because gradient boosting machines are ensemble models that combine many weak learners, making them highly accurate but difficult to interpret directly; they require techniques like SHAP or LIME for explanation, which adds complexity. Option B is wrong because support vector machines operate in high-dimensional feature spaces using kernel functions, producing decision boundaries that are not easily interpretable without additional tools. Option D is wrong because deep neural networks are black-box models with multiple hidden layers and non-linear transformations, making their predictions opaque and requiring external interpretability methods.

Practice this question →

217

Multi-Selecthard

A data scientist is training a random forest model for regression. The model shows high variance on the validation set. Which TWO actions are most likely to reduce variance? (Choose 2.)

Select 2 answers

A.Use bootstrap sampling with replacement

B.Decrease the maximum depth of trees

C.Increase the minimum samples per leaf

D.Increase the number of trees in the forest

E.Increase the number of features considered at each split

AnswersB, C

Shallow trees reduce overfitting, lowering variance.

Why this answer

Decreasing depth and increasing min samples per leaf both reduce complexity, lowering variance.

Practice this question →

218

MCQeasy

A machine learning team is training a deep learning model on Amazon SageMaker and notices that the training loss is decreasing but the validation loss is increasing. What is the most likely cause?

A.Vanishing gradients

B.Overfitting the training data

C.Learning rate is too high

D.Underfitting the training data

AnswerB

Overfitting occurs when the model learns noise in the training data, causing validation loss to increase after a point.

Why this answer

When training loss continues to decrease while validation loss increases, the model is memorizing the training data rather than learning generalizable patterns. This is the classic symptom of overfitting, where the model's capacity exceeds what is needed for the underlying data distribution, causing it to fit noise in the training set. In Amazon SageMaker, this can be observed by monitoring the validation loss metric during training jobs.

Exam trap

The exam often tests the distinction between overfitting and high learning rate by presenting a scenario where training loss decreases but validation loss increases, and candidates mistakenly attribute it to a learning rate that is too high, not recognizing that a high learning rate would cause both losses to diverge or oscillate.

How to eliminate wrong answers

Option A is wrong because vanishing gradients cause the model to stop learning entirely, resulting in both training and validation loss stagnating or decreasing very slowly, not a divergence between the two. Option C is wrong because a learning rate that is too high typically causes the loss to oscillate or diverge on both training and validation sets, not a monotonic decrease in training loss with an increase in validation loss. Option D is wrong because underfitting means the model is too simple to capture patterns, leading to high loss on both training and validation sets, not a decreasing training loss.

Practice this question →

219

MCQhard

A data scientist is troubleshooting a failed SageMaker training job that uses a custom Docker image. The failure reason shows 'unrecognized arguments: --sagemaker_program'. What is the most likely cause?

A.The Docker image is tagged incorrectly and cannot be pulled

B.The training job is in a different region than the ECR repository

C.The input mode is File mode, but the container expects Pipe mode

D.The custom Docker image does not use the SageMaker training toolkit and thus does not accept SageMaker hyperparameters

AnswerD

Custom containers that are not toolkit-based ignore SageMaker hyperparameters, causing unrecognized argument errors if the entry point tries to parse them.

Why this answer

The error 'unrecognized arguments: --sagemaker_program' indicates that the custom Docker image does not include the SageMaker Training Toolkit. The SageMaker Training Toolkit is a Python library that provides a default entry point to parse and handle SageMaker-specific hyperparameters (like --sagemaker_program, --sagemaker_submit_directory, etc.). Without this toolkit, the container's entry point does not recognize these arguments, causing the training job to fail.

Exam trap

The trap here is that candidates often confuse container-level errors (like pull failures or region mismatches) with argument parsing errors, failing to recognize that the SageMaker Training Toolkit is required to handle SageMaker-specific CLI arguments.

How to eliminate wrong answers

Option A is wrong because if the Docker image were tagged incorrectly or could not be pulled, the error would be an ECR pull failure (e.g., 'CannotPullContainerError' or 'RepositoryNotFoundException'), not an argument parsing error. Option B is wrong because a region mismatch between the training job and the ECR repository would result in a 'RepositoryNotFoundException' or access denied error, not an unrecognized argument error. Option C is wrong because the input mode (File vs. Pipe) affects how data is delivered to the container (downloaded to local storage or streamed through a pipe), but it does not affect the parsing of command-line hyperparameters like --sagemaker_program.
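The error message has the shape produced by Python's `argparse` when an entry point receives a flag it did not declare. The sketch below reproduces that situation with hypothetical argument values and shows `parse_known_args`, which is one way a script could tolerate extra arguments; it is an illustration of the failure mode, not the SageMaker Training Toolkit's behavior or a recommended fix.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=1)

# Arguments as a container's entry point might receive them (values are hypothetical).
argv = ["--epochs", "3", "--sagemaker_program", "train.py"]

# parser.parse_args(argv) would exit with an
# "unrecognized arguments: --sagemaker_program train.py" error.
# parse_known_args returns the leftover arguments instead of failing.
args, unknown = parser.parse_known_args(argv)
```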

Practice this question →

220

Multi-Selecthard

A company is deploying a machine learning model on SageMaker for real-time inference. The model requires GPU for low latency. Which THREE steps are necessary to set up the endpoint?

Select 3 answers

A.Train the model using a SageMaker training job

B.Create a SageMaker batch transform job

C.Create a SageMaker model object that points to the S3 bucket containing the model artifacts and the inference container image

D.Create an endpoint configuration specifying the instance type (e.g., ml.p3.2xlarge) and initial instance count

E.Create a SageMaker endpoint using the endpoint configuration

AnswersC, D, E

A model object is required to deploy an endpoint.

Why this answer

Correct options: C (create a model object that references the model artifacts and the inference container image), D (create an endpoint configuration with a production variant specifying the instance type and initial instance count), and E (create an endpoint using the endpoint configuration). A is not necessary because the model is already trained. B is not necessary for real-time inference.
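The three required steps map onto three SageMaker API requests. The sketch below only builds the request arguments; every name is hypothetical and the image URI, S3 path, and role ARN are left as placeholders to be filled in.

```python
model_name = "gpu-model"                 # hypothetical names throughout
config_name = "gpu-model-config"
endpoint_name = "gpu-model-endpoint"

# Step 1: model object pointing at the artifacts and the inference image.
create_model_args = {
    "ModelName": model_name,
    "PrimaryContainer": {
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>",  # placeholder
        "ModelDataUrl": "s3://<bucket>/<path>/model.tar.gz",               # placeholder
    },
    "ExecutionRoleArn": "arn:aws:iam::<account>:role/<role-name>",         # placeholder
}

# Step 2: endpoint configuration with a GPU instance type.
create_endpoint_config_args = {
    "EndpointConfigName": config_name,
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": "ml.p3.2xlarge",
        "InitialInstanceCount": 1,
    }],
}

# Step 3: endpoint created from the configuration.
create_endpoint_args = {"EndpointName": endpoint_name, "EndpointConfigName": config_name}

# With boto3 and AWS credentials, these would be passed in order to
# create_model, create_endpoint_config, and create_endpoint on a "sagemaker" client.
```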

Practice this question →

221

MCQhard

Refer to the exhibit. A data scientist is trying to run a SageMaker training job using a script that reads training data from 's3://my-bucket/training/data.csv'. The job fails with an access denied error. What is the MOST likely reason?

A.The S3 bucket policy may deny access, or the IAM role lacks necessary permissions beyond GetObject.

B.The training job is running in a VPC without S3 VPC endpoint.

C.The sagemaker:CreateTrainingJob action is not allowed on the specific resource.

D.The S3 path is incorrectly formatted.

AnswerA

Common reason: the training script may need to list the bucket or access other prefixes, or a bucket policy denies the request.

Why this answer

Option A is correct. The IAM policy allows s3:GetObject only on objects under 'training/', but the script may also need to read other objects or list the bucket itself. Option C is wrong because the action is allowed.

Option D is wrong because the S3 URI is valid. Option B is wrong because no VPC is mentioned.

222

Multi-Selecthard

Which THREE of the following are appropriate methods to reduce overfitting in a decision tree model?

Select 3 answers

A.Increase the number of features considered for each split

B.Increase the maximum depth of the tree

C.Prune the tree after training

D.Set a minimum number of samples required to split an internal node

E.Limit the maximum depth of the tree

AnswersC, D, E

Pruning removes branches that have little predictive power, reducing overfitting.

Why this answer

Option C is correct because pruning reduces tree size. Option E is correct because limiting depth reduces complexity. Option D is correct because requiring a minimum number of samples to split a node stops the tree from fitting very small groups of samples.

Option B is wrong because increasing depth increases overfitting. Option A is wrong because increasing the number of features considered per split may increase overfitting.
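
The two pre-pruning controls (maximum depth and minimum samples to split) amount to a stopping rule checked at every node. The sketch below is illustrative only: `may_split` is a hypothetical helper, and the default values are arbitrary. The parameter names mirror scikit-learn's `max_depth` and `min_samples_split`.

```python
def may_split(depth, n_samples, max_depth=5, min_samples_split=20):
    """Return True if a node is still allowed to split under the pre-pruning limits.

    depth      -- depth of the node (the root is at depth 0)
    n_samples  -- number of training samples that reach the node
    """
    return depth < max_depth and n_samples >= min_samples_split

# A shallow node with plenty of data may split further.
print(may_split(depth=2, n_samples=150))   # True
# A node at the depth limit becomes a leaf regardless of its size.
print(may_split(depth=5, n_samples=150))   # False
# A node with too few samples becomes a leaf regardless of its depth.
print(may_split(depth=2, n_samples=8))     # False
```

Post-training pruning (option C) works differently: the tree is grown first and weak branches are removed afterwards.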

223

MCQeasy

Refer to the exhibit. What is the recall of the model?

A.0.85

B.0.80

C.0.89

D.0.90

AnswerB

Recall = 80/(80+20) = 0.80.

Why this answer

Recall is calculated as True Positives divided by the sum of True Positives and False Negatives. From the confusion matrix, True Positives = 80 and False Negatives = 20, so recall = 80 / (80 + 20) = 0.80. Option B is correct.

Exam trap

Exams often test the distinction between recall and precision, where candidates mistakenly compute precision (TP/(TP+FP)) instead of recall, leading to option A (0.85).

How to eliminate wrong answers

Option A (0.85) is wrong because it divides True Positives by the sum of True Positives and False Positives (80/94 ≈ 0.85), which is precision, not recall. Option C (0.89) is wrong because it uses the wrong denominator (80/90 ≈ 0.89); recall divides by True Positives plus False Negatives, which is 100 here. Option D (0.90) is wrong because it does not follow from the recall formula: 80/(80+20) = 0.80, not 0.90, so it likely comes from a miscalculation such as using True Negatives incorrectly.
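
The arithmetic can be checked in a few lines. TP = 80 and FN = 20 come from the explanation above; FP = 14 is an assumption inferred from the 80/94 precision figure, since the exhibit itself is not shown.

```python
def recall(tp, fn):
    """Share of actual positives that the model found: TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of predicted positives that were right: TP / (TP + FP)."""
    return tp / (tp + fp)

tp, fn = 80, 20
fp = 14  # assumed; implied by 80/94 in the explanation, not read from the exhibit

print(recall(tp, fn))               # 0.8  -> option B
print(round(precision(tp, fp), 2))  # 0.85 -> option A, the precision trap
```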

224

MCQmedium

A data scientist is building a recommendation system for an e-commerce platform. The dataset includes user-item interactions (clicks, purchases, ratings). The scientist wants to use matrix factorization. Which approach is most appropriate for handling implicit feedback (e.g., clicks) rather than explicit ratings?

A.Use k-means clustering to segment users and then use item popularity within clusters

B.Use singular value decomposition (SVD) on the interaction matrix with missing values filled with 0

C.Use a deep neural network with a softmax output to predict item probabilities

D.Use weighted alternating least squares (WALS) with confidence weights

AnswerD

WALS is specifically designed for implicit feedback by assigning confidence to observed and unobserved interactions.

Why this answer

Weighted Alternating Least Squares (WALS) is specifically designed for implicit feedback scenarios because it treats unobserved interactions as negative signals with low confidence, rather than missing values. By assigning confidence weights (e.g., based on click frequency or dwell time), WALS can factorize the implicit feedback matrix effectively, avoiding the bias introduced by treating all zeros as true negatives.
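
For reference, the weighted objective usually associated with implicit-feedback matrix factorization can be written as follows. The symbols are assumptions introduced here, not taken from the question: r_ui is the raw interaction count, p_ui the binary preference, c_ui the confidence weight, x_u and y_i the user and item factor vectors, and α and λ tunable constants. The linear confidence form is one common choice among several.

```latex
\min_{X,\,Y} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  \;+\; \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad
p_{ui} =
\begin{cases}
  1 & \text{if } r_{ui} > 0 \\
  0 & \text{if } r_{ui} = 0
\end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

Every user-item pair contributes to the sum, but unobserved pairs (r_ui = 0) carry only the baseline weight of 1, so they pull the factors toward zero far less than repeated clicks pull them toward one. Alternating least squares then fixes Y and solves for X in closed form, and vice versa.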

Exam trap

The trap here is that candidates often assume SVD (Option B) is the standard matrix factorization method, but they overlook that SVD requires a complete matrix, so filling missing entries with 0 treats every unobserved interaction as a confident negative. That is invalid for implicit feedback, where a zero may simply mean the user never saw the item.

How to eliminate wrong answers

Option A is wrong because k-means clustering followed by item popularity ignores the collaborative signal between users and items and does not learn latent factors that capture nuanced preferences. Option B is wrong because plain SVD needs a dense matrix, and filling missing values with 0 gives every unobserved interaction the same weight as an observed one, even though a zero can mean either no exposure or a negative preference; this leads to poor factorization. Option C is wrong because a deep neural network with a softmax output can predict item probabilities but is not a matrix factorization method, which is what the scientist wants to use; WALS is a simpler, established method that handles the implicit feedback structure directly.

225

MCQhard

A data scientist is using Amazon SageMaker Autopilot to automatically build a model for a regression problem. The dataset has 100 features and 50,000 rows. Autopilot recommends a model with an R² of 0.85 on the validation set. However, when deployed to production, the model performs poorly (R² of 0.2). What is the most likely cause?

A.The model is overfitting to the training data

B.The production data distribution has shifted from the training data distribution

C.The model is underfitting due to insufficient training

D.Autopilot selected the wrong features

AnswerB

Data drift causes model performance to degrade in production.

Why this answer

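Option B is correct because a large discrepancy between validation and production performance often indicates data drift. Option A (overfitting) is possible but less likely given the strong validation performance. Option C (underfitting) does not fit, because an underfit model would also score poorly on the validation set.

Option D (Autopilot selected the wrong features) is unlikely for the same reason: poor feature selection would have lowered the validation R² as well.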
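
One simple way to check a single feature for this kind of shift is the Population Stability Index (PSI), which compares the binned distribution of the training sample with that of the production sample. The sketch below is a minimal pure-Python version under stated assumptions: equal-width bins taken from the training range, a small `eps` to avoid division by zero, and a hypothetical function name. It is not what SageMaker computes internally.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range are clamped into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [i / 100 for i in range(1000)]   # synthetic feature values, 0.00 to 9.99
prod_shifted = [v + 5 for v in train]    # same shape, shifted upward

print(psi(train, train))          # 0.0, identical distributions
print(psi(train, prod_shifted))   # large value, clear shift
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, though the thresholds are conventions rather than guarantees.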
