CCNA ML Modeling Questions

75 of 624 questions · Page 6/9 · ML Modeling topic · Answers revealed

376
MCQ · hard

A company is building a binary classification model to predict customer churn. The dataset is highly imbalanced (95% non-churn, 5% churn). The data scientist uses SMOTE to oversample the minority class. After training a logistic regression model, the recall for the churn class is 0.80, but the precision is only 0.10. Which action would MOST likely improve precision without significantly harming recall?

A. Use random oversampling instead of SMOTE
B. Reduce the number of features in the model
C. Increase the classification threshold for the positive class
D. Decrease the classification threshold for the positive class
Answer: C

A higher threshold reduces false positives, improving precision, while likely still capturing many true positives.

Why this answer

Option C is correct because increasing the classification threshold reduces false positives, improving precision, while still retaining many true positives. Option D is wrong because decreasing the threshold would further reduce precision. Option A is wrong because switching to a different oversampling technique does not address the threshold issue.

Option B is wrong because reducing the number of features lowers the model's complexity, which could reduce recall.
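The threshold effect can be sketched with a handful of scored examples. The scores and labels below are hypothetical, chosen only to show the mechanism; `precision_recall` is an illustrative helper, not part of any library.

```python
# Hypothetical (score, true_label) pairs; 1 = churn, 0 = non-churn.
scored = [
    (0.95, 1), (0.90, 1), (0.85, 0), (0.80, 1), (0.70, 1),
    (0.65, 0), (0.60, 0), (0.55, 0), (0.52, 0), (0.40, 1),
]


def precision_recall(pairs, threshold):
    """Return (precision, recall) when scores >= threshold are predicted positive."""
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


low = precision_recall(scored, 0.50)   # many false positives pass the cut
high = precision_recall(scored, 0.68)  # most false positives are filtered out
```

In this toy data the higher threshold removes four false positives while keeping the same true positives, so precision rises and recall is unchanged. On real data recall usually drops somewhat as the threshold rises.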

377
MCQ · hard

A data scientist is training a deep learning model on Amazon SageMaker using a PyTorch estimator. The training job runs on a single ml.p3.2xlarge instance but is taking too long. The scientist wants to reduce training time by using distributed data parallelism across multiple GPUs. Which change to the training script and SageMaker estimator is required?

A. Add the SageMaker distributed data parallelism configuration in the estimator and modify the script to use the SageMaker distributed library.
B. Change the framework to TensorFlow and use tf.distribute.MirroredStrategy with instance_count=2.
C. Modify the script to use torch.nn.parallel.DistributedDataParallel and set instance_count to 2 in the estimator.
D. Modify the script to use torch.nn.DataParallel and keep instance_count as 1.
Answer: C

DDP is efficient for multi-node training.

Why this answer

Option C is correct because to achieve distributed data parallelism across multiple GPUs on multiple instances with PyTorch, you must modify the training script to use `torch.nn.parallel.DistributedDataParallel` (DDP), which handles gradient synchronization across nodes. Additionally, you must set `instance_count` to 2 (or more) in the SageMaker PyTorch estimator to launch multiple instances, each with its own GPU, enabling true multi-node distributed training.

Exam trap

AWS often tests the distinction between `DataParallel` (single-node, multi-GPU) and `DistributedDataParallel` (multi-node, multi-GPU), leading candidates to incorrectly choose `DataParallel` because they overlook the requirement for multiple instances.

How to eliminate wrong answers

Option A is wrong because the SageMaker distributed data parallelism library is a separate framework (SMDDP) that is not required for PyTorch DDP; using it would add unnecessary complexity and is not the standard approach for PyTorch users. Option B is wrong because switching to TensorFlow is unnecessary and introduces a framework change; the question specifies PyTorch, and `tf.distribute.MirroredStrategy` is for TensorFlow, not PyTorch. Option D is wrong because `torch.nn.DataParallel` only parallelizes within a single node (single instance) and does not support multi-instance distributed training; it also does not scale across multiple GPUs on different instances, so it would not reduce training time when using multiple instances.
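The gradient synchronization that data parallelism relies on can be illustrated without any framework. This is a toy sketch in plain Python, not the PyTorch API: the model `y_hat = w * x`, the data, and the helper `grad_mse` are all assumptions made for illustration.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
w = 1.0

# Each "worker" holds an equal-sized shard of the batch.
shards = [data[0::2], data[1::2]]
local_grads = [grad_mse(w, shard) for shard in shards]

# The all-reduce step: average the per-worker gradients.
synced_grad = sum(local_grads) / len(local_grads)

# Reference: gradient computed on the whole batch by a single worker.
full_grad = grad_mse(w, data)
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, which is why every worker can apply the same update and keep its model replica in sync.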

378
Multi-Select · medium

A data scientist is training a gradient boosting model using SageMaker. The model is overfitting to the training data. Which TWO actions can help reduce overfitting? (Choose 2)

Select 2 answers
A. Increase the number of boosting rounds
B. Increase the learning rate
C. Increase the minimum child weight
D. Reduce the maximum depth of trees
E. Use a larger training dataset
Answers: C, D

A higher min_child_weight requires more data in each child node before a split is made, reducing overfitting.

Why this answer

Increasing the learning rate actually worsens overfitting; increasing max_depth increases model complexity. Reducing max_depth and increasing min_child_weight both regularize the model.
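As a rough sketch, the two correct actions correspond to changing two hyperparameters. The parameter names below (`max_depth`, `min_child_weight`, `eta`, `num_round`) are the ones used by the SageMaker XGBoost algorithm; the specific values are hypothetical and only the direction of each change matters.

```python
# Hypothetical starting point that tends to overfit: deep trees, permissive splits.
baseline = {"max_depth": 10, "min_child_weight": 1, "eta": 0.3, "num_round": 500}

# Regularized variant: shallower trees and a higher bar for creating a split.
regularized = dict(baseline, max_depth=4, min_child_weight=10)
```

The learning rate and the number of rounds are left unchanged here so that the effect of the two regularizing parameters is isolated.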

379
MCQ · hard

A data scientist is building a regression model to predict house prices. The dataset contains features like number of bedrooms, square footage, and location. After training, the model has high variance. Which technique should the data scientist use to reduce variance without significantly increasing bias?

A. Use bagging
B. Increase the number of features
C. Apply L2 regularization
D. Use fewer training examples
Answer: C

L2 regularization penalizes large coefficients, reducing variance.

Why this answer

Option C is correct because L2 regularization (Ridge) penalizes large coefficients, reducing variance while keeping bias low. Option B is wrong because increasing the number of features increases model complexity and therefore variance. Option D is wrong because training on fewer examples typically increases variance rather than reducing it.

Option A is wrong because bagging reduces variance but may not be appropriate for all cases; regularization is more direct.
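The shrinkage effect of the L2 penalty can be seen in the simplest case: one feature, no intercept. Minimizing `sum((y - w*x)**2) + lam * w**2` gives the closed form `w = sum(x*y) / (sum(x*x) + lam)`. The data and the helper name `ridge_slope` below are illustrative assumptions.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge solution for y ~ w * x (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = ridge_slope(xs, ys, lam=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_slope(xs, ys, lam=14.0)  # penalty pulls the coefficient toward zero
```

A larger `lam` always yields a smaller coefficient in this setting, which is the mechanism by which ridge regression trades a little bias for lower variance.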

380
MCQ · medium

A company is building a recommendation system using matrix factorization. The dataset has 1 million users and 100,000 items. The data scientist trains a model using SageMaker's Factorization Machines algorithm. The model achieves a root mean squared error (RMSE) of 0.95 on the test set. However, the business requires RMSE below 0.90. The data scientist has already tuned hyperparameters like number of factors and learning rate. Which additional step should the data scientist take to improve RMSE?

A. Add side features such as user demographics and item categories
B. Increase the number of training iterations
C. Increase the number of factors to 1000
D. Use a linear regression model instead
Answer: A

Side features enrich the model and can improve accuracy.

Why this answer

Option A (add side features) provides more information to the model. Option C (more factors) may overfit. Option B (more iterations) may not help if training has already converged.

Option D (use linear regression) is less powerful.

381
MCQ · hard

A data scientist is training a deep learning model for image classification on Amazon SageMaker. The dataset consists of 10,000 images of size 224x224 pixels. The training job uses a single ml.p3.2xlarge instance. The data scientist notices that the GPU utilization is very low (~20%) and the training is slow. Which change would most likely improve GPU utilization?

A. Use gradient accumulation
B. Use a larger instance type with more GPUs
C. Increase the batch size
D. Increase the number of data loader workers to load data in parallel
Answer: D

More workers can load data faster, reducing idle GPU time.

Why this answer

Low GPU utilization often indicates that the data loading pipeline is bottlenecked. Increasing the number of data loader workers can improve data throughput to the GPU, keeping it busy.

382
MCQ · hard

A company runs an e-commerce platform on AWS. They have a SageMaker endpoint serving a product recommendation model. The model uses a custom container with a TensorFlow model. Recently, the endpoint has been returning high latency and occasional 504 errors during peak traffic. The data scientist observes that the model inference time is around 200 ms per request, but the endpoint is configured with a single ml.c5.large instance. The traffic spikes can reach 100 requests per second. The data scientist needs to reduce latency and eliminate 504 errors. Which course of action is most appropriate?

A. Use Amazon Elastic Inference to attach an EI accelerator to the endpoint instance
B. Configure the SageMaker endpoint with Application Auto Scaling to scale out based on the 'InvocationsPerInstance' metric, and use a larger instance type such as ml.c5.xlarge
C. Switch to a multi-model endpoint to serve multiple models on the same instance
D. Replace the SageMaker endpoint with an AWS Lambda function that loads the model from S3 and returns predictions
Answer: B

Auto scaling adds instances to handle load; a larger instance reduces per-request latency.

Why this answer

Option B is correct because the endpoint is bottlenecked by both instance size and concurrency. With a single ml.c5.large instance facing 100 requests per second and a 200 ms inference time, the instance can process only about 5 requests per second if it handles one request at a time (1000 ms / 200 ms = 5 requests per second per instance). Application Auto Scaling based on the 'InvocationsPerInstance' metric will add instances during traffic spikes, while upgrading to ml.c5.xlarge doubles compute capacity per instance, reducing latency and eliminating 504 errors caused by request queue overflow.
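The capacity arithmetic can be checked in a few lines. The inference time and peak load come from the question; `workers_per_instance = 1` is an assumption (one request processed at a time), and real containers may serve several requests concurrently.

```python
import math

inference_ms = 200        # per-request inference time, from the question
peak_rps = 100            # peak requests per second, from the question
workers_per_instance = 1  # assumption: one request at a time per instance

# Throughput one instance can sustain, in requests per second.
per_instance_rps = workers_per_instance * 1000 / inference_ms

# Instances needed to absorb the peak without queueing.
instances_needed = math.ceil(peak_rps / per_instance_rps)
```

Under these assumptions a single instance sustains 5 requests per second and roughly 20 instances would be needed at peak, which is why scaling out matters more here than any single-instance tweak.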

Exam trap

The trap here is that candidates often confuse performance bottlenecks with model optimization or cost-saving strategies, and incorrectly choose Elastic Inference or multi-model endpoints, which address different problems (GPU acceleration or multi-model hosting) rather than the core issue of insufficient compute capacity and lack of auto scaling.

How to eliminate wrong answers

Option A is wrong because Amazon Elastic Inference attaches a GPU accelerator for deep learning inference, but the ml.c5.large instance is CPU-based and the bottleneck is compute capacity per instance, not GPU acceleration; EI does not address the concurrency or scaling issue. Option C is wrong because a multi-model endpoint is designed to host multiple models on a single instance to reduce hosting costs, not to reduce latency or handle high traffic spikes; it does not increase the compute capacity or scaling of the endpoint. Option D is wrong because AWS Lambda has a maximum execution timeout of 15 minutes and limited memory (up to 10 GB), but loading a TensorFlow model from S3 on each invocation would add cold start latency and cannot sustain 100 requests per second without heavy concurrency management, making it unsuitable for real-time inference at this scale.

383
MCQ · easy

A data scientist is training a random forest model. During hyperparameter tuning, which parameter is MOST effective at reducing overfitting?

A. Increase the number of trees
B. Increase the number of features considered per split
C. Decrease the maximum depth of each tree
D. Increase the maximum depth of each tree
Answer: C

Shallow trees generalize better.

Why this answer

Decreasing the maximum depth of each tree limits the complexity of individual trees, preventing them from memorizing noise and outliers in the training data. This directly reduces overfitting by enforcing simpler decision boundaries, which is a core regularization technique for ensemble methods like Random Forest.

Exam trap

AWS often tests the misconception that adding more trees always reduces overfitting, but the trap is that while more trees stabilize predictions, they do not address the root cause of overfitting from overly complex individual trees.

How to eliminate wrong answers

Option A is wrong because increasing the number of trees improves model stability and reduces variance, but it does not address overfitting caused by overly complex individual trees. Option B is wrong because increasing the number of features considered per split makes the trees more correlated, which reduces ensemble diversity and can increase overfitting. Option D is wrong because increasing the maximum depth of each tree allows trees to grow deeper, capturing more specific patterns and noise, which exacerbates overfitting.

384
Multi-Select · medium

A data scientist is training a random forest classifier on Amazon SageMaker and wants to reduce overfitting. Which TWO actions should the scientist take? (Choose TWO.)

Select 2 answers
A. Increase the number of features considered per split
B. Increase the maximum depth of trees
C. Decrease the number of trees
D. Limit the maximum depth of trees
E. Increase the number of trees
Answers: D, E

Shallow trees reduce overfitting.

Why this answer

Increasing the number of trees reduces variance, and limiting the maximum depth prevents overfitting. Option B is wrong because increasing the maximum depth increases overfitting. Option C is wrong because reducing the number of trees increases variance.

Option A is wrong because increasing the number of features considered per split increases tree correlation and may increase overfitting.

385
MCQ · easy

A machine learning engineer needs to deploy a real-time inference endpoint for a model that requires GPU acceleration for low latency. Which AWS service should be used?

A. Amazon SageMaker real-time endpoint
B. Amazon SageMaker batch transform
C. Amazon EC2 with auto scaling
D. AWS Lambda with GPU
Answer: A

SageMaker real-time endpoints support GPU instances and provide low-latency inference.

Why this answer

Amazon SageMaker provides real-time endpoints that support GPU instances for low-latency inference. AWS Lambda does not support GPUs, and batch transform is for offline, asynchronous processing. EC2 would require manual management.

386
MCQ · hard

Refer to the exhibit. A SageMaker training job failed with the error shown. What is the most likely cause of this error?

A. The input data contains missing or invalid values
B. The training algorithm is not compatible with the data type
C. The training instance type is not powerful enough
D. The training script has a syntax error
Answer: A

NaN or infinite values in the data cause this error.

Why this answer

The error indicates that the input data contains NaN or infinite values. This is a data quality issue. The algorithm expects clean numeric values.

The algorithm itself is fine; the training script may have a bug but the error specifically points to input data.

387
MCQ · medium

A data scientist is trying to launch a SageMaker training job using an IAM role with the above policy. The training job fails with an access denied error. What is the MOST likely reason?

A. The policy does not include s3:ListBucket permission
B. The sagemaker:StopTrainingJob action is not required
C. The sagemaker:CreateTrainingJob action is not allowed for the specific instance type
D. The S3 bucket ARN should not include the /* suffix
Answer: A

SageMaker needs ListBucket to access objects in the bucket.

Why this answer

The policy grants s3:GetObject and s3:PutObject permissions on the S3 bucket ARN with a /* suffix, but SageMaker training jobs also require s3:ListBucket permission at the bucket level (without the /*) to enumerate objects and validate paths during job creation. Without s3:ListBucket, the training job fails with an access denied error even though read/write permissions are present.

Exam trap

AWS often tests the subtle distinction between bucket-level actions (s3:ListBucket) and object-level actions (s3:GetObject, s3:PutObject), where candidates mistakenly assume object permissions are sufficient for SageMaker training jobs.

How to eliminate wrong answers

Option B is wrong because sagemaker:StopTrainingJob is an unrelated action that is not required for launching a training job; the error occurs during job creation, not stopping. Option C is wrong because the policy does not specify any instance type restrictions, and SageMaker IAM policies do not typically deny CreateTrainingJob based on instance type unless explicitly scoped with a condition key like sagemaker:InstanceTypes. Option D is wrong because the /* suffix is correctly used to grant object-level permissions on all objects within the bucket; the issue is the missing bucket-level s3:ListBucket permission, not the suffix itself.
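The bucket-level versus object-level distinction can be sketched as a policy document built in Python. The exhibit policy is not reproduced here, so this is an illustrative reconstruction: the bucket name `example-training-bucket` is hypothetical, and only the S3 statements are shown.

```python
import json

bucket = "example-training-bucket"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Bucket-level action: the resource is the bucket ARN itself, no /* suffix.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {
            # Object-level actions: the resource covers the objects inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        },
    ],
}

policy_json = json.dumps(policy, indent=2)
```

Keeping the two statements separate makes it easy to see which ARN form each action expects.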

388
MCQ · medium

A machine learning team is deploying a real-time inference endpoint for a fraud detection model using Amazon SageMaker. The model requires low latency (<100 ms) and the team expects a steady stream of requests with occasional spikes. Which instance type and deployment strategy should they use to minimize cost while meeting latency requirements?

A. Use ml.p3 instances with a multi-model endpoint.
B. Use AWS Lambda with a container image for serverless inference.
C. Use ml.m5 instances with a production variant and auto-scaling.
D. Use ml.c5 instances with a single endpoint and provisioned concurrency.
Answer: D

Compute-optimized instances with provisioned concurrency meet latency and cost goals.

Why this answer

Option D is correct because ml.c5 instances are compute-optimized for low-latency inference, and provisioned concurrency pre-warms the endpoint to handle steady traffic with spikes without cold starts, meeting the <100 ms requirement cost-effectively. This combination avoids over-provisioning while ensuring consistent performance.

Exam trap

The trap here is that candidates often choose auto-scaling (Option C) thinking it handles spikes cost-effectively, but they overlook the latency penalty of scaling up during a spike, which can exceed 100 ms, whereas provisioned concurrency (Option D) pre-warms capacity to meet the latency requirement.

How to eliminate wrong answers

Option A is wrong because ml.p3 instances are GPU-based and designed for deep learning training, not cost-effective for real-time inference on a fraud detection model that likely uses tree-based or linear models; multi-model endpoints add overhead that can increase latency. Option B is wrong because AWS Lambda has a maximum execution timeout of 15 minutes and cold starts can exceed 100 ms, making it unsuitable for sub-100 ms real-time inference with occasional spikes. Option C is wrong because ml.m5 instances are general-purpose and may not provide the compute-optimized performance needed for low latency; using a production variant with auto-scaling can introduce scaling latency during spikes, potentially violating the 100 ms requirement.

389
MCQ · easy

A machine learning engineer is training a linear regression model on a dataset with 50 features. After training, the model achieves high accuracy on the training set but poor accuracy on the test set. Which technique should the engineer use to address this issue?

A. Train a deeper neural network with more layers
B. Add more features through feature engineering
C. Apply L1 or L2 regularization
D. Increase the size of the training dataset
Answer: C

Regularization penalizes large coefficients and reduces overfitting.

Why this answer

The model exhibits overfitting: high training accuracy but poor test accuracy. L1 (Lasso) or L2 (Ridge) regularization penalizes large coefficients, reducing model complexity and improving generalization. This directly addresses the variance problem without requiring more data or features.

Exam trap

AWS often tests the distinction between overfitting and underfitting, and the trap here is that candidates may think adding more data (Option D) is the universal fix for overfitting, when in fact regularization is the most direct and efficient solution for a model with high variance.

How to eliminate wrong answers

Option A is wrong because training a deeper neural network would increase model capacity and likely worsen overfitting, not fix it. Option B is wrong because adding more features through feature engineering would increase dimensionality and exacerbate overfitting, not reduce it. Option D is wrong because increasing the training dataset size can help reduce overfitting, but it is not the most direct or practical fix; regularization is a more immediate and targeted technique for this specific symptom.

390
MCQ · hard

A data scientist is tuning a linear regression model and observes that the model has high bias and low variance. Which action is most likely to improve model performance?

A. Reduce the number of features
B. Increase regularization
C. Add more features
D. Reduce the amount of training data
Answer: C

Increases complexity, reducing bias.

Why this answer

High bias and low variance indicate underfitting, meaning the model is too simple to capture the underlying patterns in the data. Adding more features increases model complexity, allowing it to learn more relevant relationships and reduce bias. This directly addresses the core issue of underfitting in linear regression.

Exam trap

AWS often tests the bias-variance tradeoff by presenting high bias (underfitting) and high variance (overfitting) scenarios, and the trap here is that candidates mistakenly choose to increase regularization or reduce features, which are remedies for overfitting, not underfitting.

How to eliminate wrong answers

Option A is wrong because reducing the number of features further simplifies the model, which would increase bias and worsen underfitting. Option B is wrong because increasing regularization penalizes model coefficients more heavily, reducing complexity and increasing bias, which is the opposite of what is needed. Option D is wrong because reducing the amount of training data typically increases variance (overfitting risk) and does not address the high bias problem; it may also degrade the model's ability to learn generalizable patterns.

391
MCQ · easy

A machine learning engineer needs to deploy a model that makes real-time predictions with latency under 100ms. The model is a small ensemble of decision trees. Which AWS service is MOST suitable?

A. Amazon EMR with Spark Streaming
B. AWS Glue
C. Amazon SageMaker endpoint
D. AWS Lambda with custom container
Answer: C

SageMaker endpoints are designed for real-time inference with low latency.

Why this answer

Amazon SageMaker provides real-time endpoints with low latency for model inference, and can host the ensemble as a single endpoint.

392
MCQ · hard

A data scientist is tuning hyperparameters for an XGBoost model on a large dataset using Amazon SageMaker. The training job is taking too long, and they want to speed up the tuning process. Which strategy is most effective?

A. Use Bayesian optimization
B. Use grid search with a fine-grained grid
C. Use random search with more iterations
D. Reduce the max depth of trees
Answer: A

Bayesian optimization is more efficient.

Why this answer

Option A is correct because Bayesian optimization is more efficient than grid search, especially for large datasets. Option B is wrong because grid search is exhaustive and slow. Option C is wrong because random search is faster than grid search but uses its trials less efficiently than Bayesian optimization.

Option D is wrong because reducing max depth may reduce accuracy.

393
Multi-Select · medium

A data scientist is training a classification model on a dataset with missing values in several features. The data scientist wants to use SageMaker to train the model. Which TWO approaches can the data scientist use to handle missing data within the SageMaker training pipeline? (Choose two.)

Select 2 answers
A. Use the SageMaker built-in XGBoost algorithm, which can handle missing values by default.
B. Use the SageMaker BlazingText algorithm, which automatically imputes missing values.
C. Use SageMaker Inference Pipeline to handle missing values at inference time.
D. Use SageMaker Processing to run a custom Python script that imputes missing values before training.
E. Use SageMaker PCA algorithm, which automatically handles missing values.
Answers: A, D

XGBoost has built-in support for missing values.

Why this answer

Option A is correct because the SageMaker built-in XGBoost algorithm has a built-in mechanism to handle missing values by default. It learns the best direction (left or right branch) to route missing values during training, so no explicit imputation is needed. This makes it a seamless choice for datasets with missing data within the SageMaker training pipeline.

Exam trap

The trap here is that candidates often assume all SageMaker built-in algorithms automatically handle missing values, but only XGBoost does; BlazingText and PCA require complete data, and Inference Pipeline is for serving, not training.

394
MCQ · hard

An e-commerce company uses a linear regression model to predict customer lifetime value (LTV). The model shows high variance on the test set, with training RMSE much lower than test RMSE. Which of the following is the MOST effective approach to reduce overfitting?

A. Apply L2 regularization (Ridge regression)
B. Use a polynomial kernel in a support vector regressor
C. Add more features, including interaction terms
D. Increase training data size by duplicating existing samples
Answer: A

L2 regularization shrinks coefficients and reduces variance.

Why this answer

High variance (low training RMSE, high test RMSE) indicates overfitting. L2 regularization (Ridge regression) adds a penalty proportional to the square of the coefficients, shrinking them toward zero without eliminating them, which reduces model complexity and improves generalization. This directly addresses overfitting by constraining the model's sensitivity to noise in the training data.

Exam trap

AWS often tests the misconception that adding more data always reduces overfitting, but the trap here is that duplicating existing samples (Option D) does not provide new, diverse examples and therefore fails to address the root cause of high variance.

How to eliminate wrong answers

Option B is wrong because using a polynomial kernel in a support vector regressor increases model complexity by mapping data into a higher-dimensional space, which would exacerbate overfitting rather than reduce it. Option C is wrong because adding more features, including interaction terms, further increases model complexity and variance, making overfitting worse. Option D is wrong because duplicating existing samples does not introduce new information; it artificially inflates the weight of existing patterns, which can actually increase overfitting by reinforcing noise in the training data.

395
Multi-Select · hard

A data scientist is training a deep learning model using Amazon SageMaker. The training loss is decreasing, but the validation loss starts increasing after 10 epochs. The model is overfitting. Which TWO actions should the data scientist take to reduce overfitting? (Choose 2.)

Select 2 answers
A. Increase the number of layers
B. Remove L2 regularization
C. Increase the number of training steps
D. Add dropout layers
E. Add early stopping based on validation loss
Answers: D, E

Dropout regularizes by randomly dropping neurons.

Why this answer

Option D is correct because dropout layers randomly deactivate a fraction of neurons during training, which forces the network to learn more robust features and reduces co-adaptation, a common cause of overfitting. This technique is particularly effective in deep learning models trained on SageMaker, where large architectures can quickly memorize training data.

Exam trap

The trap here is that candidates often confuse regularization techniques that reduce overfitting (dropout, L2, early stopping) with actions that increase model capacity (more layers, more steps), leading them to select options that would worsen the problem.
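Early stopping on validation loss (Option E) reduces to a small patience loop. The sketch below is framework-independent; the function name `early_stop_epoch`, the loss values, and the patience of 2 are hypothetical.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never stops."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            # No improvement: stop once this has happened `patience` times in a row.
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Hypothetical validation losses: improving until epoch 4, then rising.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.57, 0.60, 0.66]
stopped_at = early_stop_epoch(val_losses, patience=2)
```

In practice the weights from the best epoch (epoch 4 in this toy series) are usually restored after stopping.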

396
MCQ · medium

A company uses Amazon SageMaker to train a model for detecting fraudulent transactions. The dataset is highly imbalanced (99.9% legitimate, 0.1% fraudulent). Which approach is most effective to address this imbalance?

A. Use class weights in the loss function
B. Apply SMOTE to generate synthetic samples
C. Random oversampling of the minority class
D. Collect more data for the minority class
Answer: B

SMOTE generates synthetic samples to balance the dataset.

Why this answer

Option B is correct because SMOTE generates synthetic samples for the minority class, effectively balancing the dataset. Option C is wrong because random oversampling may cause overfitting. Option A is wrong because, although using class weights directly in the loss function is also valid, SMOTE is often more effective.

Option D is wrong because collecting more data is good but may not be feasible.
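For comparison, the class-weight alternative mentioned for Option A can be sketched as follows. It uses the common "balanced" heuristic, `n_samples / (n_classes * class_count)`; the counts are hypothetical and simply mirror the 99.9% / 0.1% split in the question.

```python
# Hypothetical label counts matching a 99.9% / 0.1% split.
class_counts = {0: 999, 1: 1}  # 0 = legitimate, 1 = fraudulent

n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
class_weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}
```

Each class then contributes the same total weight to the loss, so errors on the rare fraudulent class are no longer drowned out by the majority class.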

397
MCQ · medium

A team deployed a SageMaker endpoint with the configuration shown in the exhibit. During a traffic spike, the endpoint becomes unresponsive. Which change to the endpoint configuration would best improve availability?

A. Reduce the initial instance count to 0 and use on-demand invocation
B. Add a second production variant with the same model
C. Configure auto-scaling for the endpoint
D. Change the instance type to ml.m5.xlarge
Answer: C

Auto-scaling dynamically adds instances during traffic spikes, improving availability.

Why this answer

Option C is correct because adding auto-scaling allows the endpoint to adjust instance count based on load. Option D is wrong because increasing instance size may not handle spikes if there is only one instance. Option B is wrong because multiple variants are intended for A/B testing and do not improve availability.

Option A is wrong because reducing the instance count worsens availability.

398
Multi-Select · medium

A data scientist is training a gradient boosting model using SageMaker's built-in XGBoost algorithm. The dataset has missing values in several features. Which TWO actions should the data scientist take to handle missing values effectively? (Choose two.)

Select 2 answers
A. Impute missing values with the median of each feature using a preprocessing step.
B. Use one-hot encoding to create binary columns indicating missingness.
C. Remove all rows with missing values from the training dataset.
D. Apply PCA to reduce dimensionality and ignore missing values.
E. Set the 'missing' parameter in XGBoost to a specific value (e.g., 0) and let the algorithm learn the best imputation.
Answers: A, E

Median imputation is a robust method that preserves data.

Why this answer

Option A and Option E are correct. XGBoost can learn the best direction to handle missing values by using the missing parameter. Alternatively, imputing a constant per feature (such as the median) is a standard approach.

Option C (remove rows) loses data. Option B (one-hot encode) is for categorical features. Option D (PCA) is for dimensionality reduction.
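Median imputation for a single feature can be sketched with the standard library alone. The column values and the helper name `impute_median` are hypothetical, and `None` stands in for a missing value.

```python
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]


# Hypothetical feature column with two missing entries and one outlier.
column = [3.0, None, 10.0, 4.0, None, 100.0]
imputed = impute_median(column)
```

The outlier (100.0) barely moves the fill value, which is the usual argument for the median over the mean. In a real pipeline the median should be computed on the training split only and reused at inference time.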

399
Multi-Select · medium

Which TWO metrics are suitable for evaluating a regression model? (Select TWO.)

Select 2 answers
A. Accuracy
B. Root Mean Squared Error (RMSE)
C. R-squared
D. F1-score
E. Precision
Answers: B, C

RMSE measures average prediction error in regression.

Why this answer

Root Mean Squared Error (RMSE) is a standard metric for regression models because it measures the average magnitude of prediction errors in the same units as the target variable. It penalizes larger errors more heavily due to squaring, making it sensitive to outliers and useful for comparing model performance.

Exam trap

AWS often tests the distinction between classification and regression metrics, and the trap here is that candidates mistakenly apply classification metrics like Accuracy, F1-score, or Precision to regression problems because they are familiar with them from other contexts.
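Both regression metrics are short enough to compute by hand. The sketch below uses hypothetical targets and predictions; `rmse` and `r_squared` are illustrative helpers rather than library functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

error = rmse(y_true, y_pred)
fit = r_squared(y_true, y_pred)
```

Here the squared errors sum to 2 and the total sum of squares is 20, so RMSE is the square root of 0.5 and R-squared is 0.9.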

400
MCQ · medium

A company is using SageMaker to train a model, but the training job fails with an out-of-memory error. Which action should the data scientist take to resolve this issue?

A. Use a larger instance type for training
B. Decrease the batch size
C. Increase the learning rate
D. Increase the number of layers
Answer: B

Smaller batches use less memory.

Why this answer

Decreasing the batch size reduces the memory footprint per training step, directly addressing the out-of-memory (OOM) error. In SageMaker, the training instance's GPU or CPU memory is shared between model parameters, activations, and the batch data; a smaller batch size lowers the peak memory usage, allowing the training job to complete without exceeding the instance's memory limit.

Exam trap

The trap here is that candidates often default to scaling up infrastructure (larger instance) instead of optimizing hyperparameters like batch size, which is a more immediate and cost-effective fix for OOM errors in SageMaker.

How to eliminate wrong answers

Option A is wrong because using a larger instance type may resolve the OOM error but is not the most efficient or cost-effective first step; it increases costs and does not address the root cause of memory bloat. Option C is wrong because increasing the learning rate does not affect memory usage; it changes the step size in gradient descent and can lead to divergence or instability. Option D is wrong because increasing the number of layers adds more parameters and activations, which increases memory consumption and would worsen the OOM error.

401
MCQhard

A company is using SageMaker to train a deep learning model with TensorFlow. The training job is running on an ml.p3.16xlarge instance. The data scientist wants to maximize GPU utilization. Which configuration should be used?

A.Use a single GPU and increase the number of epochs.
B.Use a CPU-only instance for training and then deploy on GPU.
C.Use File mode input and a small batch size.
D.Use Pipe mode or Fast File mode with a large batch size that fits in GPU memory.
AnswerD

Pipe mode streams data efficiently; large batch size maximizes GPU compute.

Why this answer

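SageMaker's Pipe mode and Fast File mode reduce I/O bottlenecks, and a large batch size that still fits in GPU memory keeps the GPUs busy. For GPU utilization, the data pipeline must supply data as fast as the GPUs can consume it. Option C (File mode with a small batch size) may underutilize the GPUs.

Option A (a single GPU) wastes the remaining GPUs on the instance. Option B (a CPU-only instance) is counterproductive.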

402
Drag & Dropmedium

Drag and drop the steps to use Amazon SageMaker Debugger to debug a training job in the correct order.

Why this order

Debugger requires hook configuration, job setup with rules, execution, and analysis.

403
MCQmedium

A team is using Amazon SageMaker to train a deep learning model. The training job is taking too long, and they want to reduce training time without significant accuracy loss. They have already tried increasing the number of instances. Which technique should they consider next?

A.Increase L2 regularization
B.Reduce model complexity
C.Gradient accumulation
D.Early stopping
AnswerC

Gradient accumulation simulates larger batch sizes, improving convergence speed.

Why this answer

Option C is correct because gradient accumulation simulates a larger batch size without increasing memory, which often speeds up convergence. Option D is wrong because early stopping may halt training too early. Option B is wrong because reducing model complexity may cause underfitting.

Option A is wrong because L2 regularization does not reduce training time.
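
The idea behind gradient accumulation can be sketched on a toy problem. The example below is not SageMaker or framework code: it assumes a hypothetical one-parameter model y ≈ w * x with mean squared error, and checks that summing weighted micro-batch gradients reproduces the gradient of the full batch, so only one micro-batch has to be held in memory at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients into one effective large-batch gradient."""
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        # Weight each micro-batch mean gradient by its share of the samples.
        total += grad_mse(w, mx, my) * len(mx) / len(xs)
    return total

# Hypothetical data and starting weight.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                 # one batch of 4 samples
accum = accumulated_grad(w, xs, ys, 2)     # two micro-batches of 2 samples
print(full, accum)
```

In a real training loop the optimizer step would be taken once after the accumulation loop, rather than after every micro-batch.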

404
MCQmedium

A data scientist is training a deep learning model for image classification using Amazon SageMaker. The training job is taking too long. The data scientist wants to speed up training by using distributed training across multiple GPUs. Which SageMaker feature or configuration should the data scientist use?

A.SageMaker Debugger
B.Model parallelism in SageMaker
C.SageMaker hyperparameter tuning
D.SageMaker Data Parallelism library
AnswerD

The SageMaker Data Parallelism library distributes data across multiple GPUs, reducing training time for large datasets.

Why this answer

Option D is correct because the SageMaker Data Parallelism library is specifically designed to distribute training across multiple GPUs by splitting the input data across workers, which reduces per-GPU computation time and accelerates training for deep learning models. This library uses optimized all-reduce algorithms (e.g., Ring AllReduce) to synchronize gradients efficiently, making it ideal for speeding up image classification tasks that are data-intensive.

Exam trap

The trap here is that candidates often confuse model parallelism (splitting the model) with data parallelism (splitting the data), and incorrectly choose model parallelism when the scenario clearly describes a training speed issue solvable by distributing data across GPUs.

How to eliminate wrong answers

Option A is wrong because SageMaker Debugger is a tool for monitoring and debugging training jobs (e.g., capturing tensors, detecting anomalies), not for distributing training across GPUs. Option B is wrong because model parallelism in SageMaker splits the model itself across devices, which is useful for models too large to fit on a single GPU, but the question asks to speed up training for a model that already fits on a single GPU, where data parallelism is the appropriate approach. Option C is wrong because SageMaker hyperparameter tuning automates the search for optimal hyperparameters (e.g., learning rate, batch size) but does not directly enable distributed training across multiple GPUs.

405
MCQeasy

A company is using Amazon SageMaker to deploy a machine learning model for real-time inference. The model was trained using XGBoost and achieves high accuracy. However, during deployment, the endpoint returns a 'ModelError' when receiving input data. The input is a CSV string. What is the most likely cause?

A.The input data format does not match the model's expected format (e.g., CSV vs JSON)
B.The inference instance type is too small
C.The model is not properly loaded into memory
D.The model weights are corrupted during deployment
AnswerA

SageMaker inference endpoints require the input to be in the format expected by the model, e.g., CSV for XGBoost.

Why this answer

The most common cause of ModelError during inference is that the input format does not match what the model expects. XGBoost models typically expect CSV without headers. The serializer setting in SageMaker must be configured correctly.

If the model expects text/csv but the endpoint is configured as JSON, the error occurs. The other options are less likely: model weights are loaded correctly if the model deployed, and the instance type affects latency not errors.

406
MCQhard

A data scientist is training a model using SageMaker's built-in XGBoost algorithm with a large dataset stored in CSV format. The training job is using File mode. The data scientist wants to reduce the time it takes to start training. Which approach would be most effective?

A.Increase the size of the EBS volume.
B.Convert the data to Parquet format.
C.Use Pipe mode for the input data channel.
D.Increase the number of training instances.
AnswerC

Pipe mode starts training immediately by streaming data.

Why this answer

Pipe mode streams data directly from Amazon S3 into the training container, eliminating the need to first download the entire dataset to the EBS volume. This reduces the startup time significantly because training can begin as soon as the first records arrive, rather than waiting for the full download to complete.

Exam trap

The trap here is that candidates often assume converting to a more efficient format like Parquet will speed up training startup, but in File mode the bottleneck is the download step, not the read efficiency, so Pipe mode directly addresses the root cause.

How to eliminate wrong answers

Option A is wrong because increasing the EBS volume size does not reduce the time to start training; it only provides more storage space, and the download time from S3 remains the same. Option B is wrong because converting to Parquet format improves read performance and reduces storage size, but the training job still uses File mode, which requires the full dataset to be downloaded to the EBS volume before training starts. Option D is wrong because increasing the number of training instances does not reduce the startup time; it distributes the training workload across more machines but still requires each instance to download the full dataset in File mode before training begins.
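
As a sketch of where the input mode is set, the fragment below builds one channel of `InputDataConfig` for the CreateTrainingJob API as a Python dictionary. The bucket name and prefix are hypothetical placeholders, and whether a given algorithm version accepts Pipe mode for a given content type should be checked against its documentation.

```python
# One channel of InputDataConfig for a CreateTrainingJob request.
# The S3 URI is a hypothetical placeholder, not a real location.
train_channel = {
    "ChannelName": "train",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",
            "S3DataDistributionType": "FullyReplicated",
        }
    },
    "ContentType": "text/csv",
    # "Pipe" streams records from S3; "File" downloads the dataset first.
    "InputMode": "Pipe",
}

print(train_channel["InputMode"])
```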

407
MCQmedium

A data scientist is trying to create a SageMaker training job but receives an access denied error. The IAM policy attached to the role is shown in the exhibit. What is the most likely cause of the error?

A.The policy does not allow s3:PutObject for the output location
B.The policy does not allow sagemaker:CreateTrainingJob
C.The policy has an explicit deny on s3:PutObject
D.The policy does not allow s3:GetObject on the output bucket
AnswerA

SageMaker needs to write model artifacts and output to S3, requiring s3:PutObject.

Why this answer

Option A is correct because the policy lacks permission to write output to S3: the s3:PutObject action is missing. Option D is wrong because the policy allows s3:GetObject.

Option B is wrong because sagemaker:CreateTrainingJob is allowed. Option C is wrong because there is no explicit deny on writes.

408
MCQmedium

A team is using Amazon SageMaker to train a model and wants to automatically stop training when the model stops improving to save costs. Which SageMaker feature should they use?

A.SageMaker Experiments
B.SageMaker Debugger
C.SageMaker Managed Spot Training with early stopping
D.SageMaker Automatic Model Tuning
AnswerC

Spot training can use a stopping condition to halt when improvements cease.

Why this answer

The keyed answer pairs Managed Spot Training with early stopping, but the early-stopping feature itself belongs to Automatic Model Tuning: a hyperparameter tuning job can stop training jobs whose objective metric is no longer improving. For a single training job, use a custom callback in the training script, a custom stopping condition driven by CloudWatch metrics, or SageMaker Debugger.

409
Multi-Selecthard

A data scientist is using Amazon SageMaker to train a random forest model for a binary classification task. The dataset has 50 features and 10,000 samples. The model achieves high training accuracy but poor test accuracy. Which TWO actions should the scientist take to improve generalization?

Select 2 answers
A.Increase the max_samples parameter.
B.Reduce the max_depth of the trees.
C.Increase the max_features parameter.
D.Increase the number of trees (n_estimators).
E.Increase the min_samples_leaf parameter.
AnswersB, E

Reducing tree depth limits model complexity and helps prevent overfitting.

Why this answer

The model is overfitting: high training accuracy with poor test accuracy means the trees are memorizing the training data. Reducing max_depth (Option B) limits the complexity of each tree, and increasing min_samples_leaf (Option E) forces each leaf to cover more samples, so both directly reduce overfitting.

Increasing the number of trees (Option D) reduces variance and tends to help generalization, but it does not constrain the complexity of the individual trees and it increases training time, so it is not a direct fix. Options A and C are wrong because increasing max_samples or max_features gives each tree more data or more candidate features per split, which makes the trees more similar to one another rather than simpler.

410
MCQeasy

A data scientist has this IAM policy attached to an IAM role used by SageMaker. When trying to create a training job, the scientist gets an access denied error. The training data is in 's3://my-bucket/training-data/'. What is the most likely cause?

A.The bucket name is misspelled
B.The S3 resource ARN is incorrect
C.Missing s3:ListBucket permission
D.The sagemaker:CreateTrainingJob action is not allowed
AnswerC

SageMaker needs ListBucket permission to access objects.

Why this answer

Option C is correct. The policy allows 's3:GetObject' but not 's3:ListBucket', which SageMaker requires to list the objects under the training data prefix. Option D is wrong because the sagemaker:CreateTrainingJob action is allowed.

Option B is wrong because the resource is specified correctly. Option A is wrong because the bucket name and prefix are correct.
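
The exhibit is not reproduced here, so the following is only a hypothetical sketch of what the S3 portion of a working policy could look like, written as a Python dictionary. The point it illustrates is that `s3:GetObject` is granted on object ARNs while `s3:ListBucket` is granted on the bucket ARN itself.

```python
import json

# Hypothetical policy for illustration; not the policy from the exhibit.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level read: the resource is the objects under the prefix.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-bucket/training-data/*",
        },
        {
            # Bucket-level listing: the resource is the bucket, not its objects.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-bucket",
        },
    ],
}

print(json.dumps(policy, indent=2))
```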

411
MCQmedium

A data scientist is training a binary classification model on an imbalanced dataset where the positive class represents 5% of the data. The model achieves 99% accuracy but only identifies 10% of the actual positive cases. Which metric should the data scientist focus on to evaluate the model's performance on the positive class?

A.Precision
B.Recall
C.AUC-ROC
D.F1 score
AnswerB

Recall measures the proportion of actual positives correctly identified, which is the key issue.

Why this answer

Recall measures the proportion of actual positives correctly identified, which is critical for imbalanced datasets where accuracy is misleading. Option A is wrong because precision focuses on the correctness of positive predictions, not coverage. Option D is wrong because the F1 score balances precision and recall but doesn't directly address the low recall.

Option C is wrong because AUC-ROC considers overall separability, not specifically recall of the positive class.

412
Multi-Selecteasy

A data scientist is training a binary classifier using imbalanced data. Which TWO techniques can help improve model performance on the minority class? (Choose two.)

Select 2 answers
A.Undersample the majority class randomly.
B.Use accuracy as the evaluation metric.
C.Use the F1 score as the evaluation metric.
D.Oversample the minority class using SMOTE.
E.Apply L1 regularization to the model.
AnswersC, D

F1 score balances precision and recall.

Why this answer

The F1 score is the harmonic mean of precision and recall, making it a robust evaluation metric for imbalanced datasets because it captures both false positives and false negatives. Unlike accuracy, which can be misleadingly high when the majority class dominates, the F1 score provides a balanced measure of model performance on the minority class.

Exam trap

The exam often tests the misconception that random undersampling is always beneficial for imbalanced data, but candidates must recognize that it can discard useful majority class patterns and that SMOTE or other synthetic oversampling methods are preferred.

413
MCQmedium

Refer to the exhibit. A data scientist ran a SageMaker training job and reviewed the logs. The training completed quickly, but the model performance is very poor. What is the most likely cause?

A.The model is overfitting to the training data.
B.There is data leakage from the test set into the training set.
C.The learning rate is too low, causing slow convergence.
D.The training dataset is too small for the model complexity.
AnswerD

A small dataset can be trained quickly but leads to poor generalization.

Why this answer

The logs show that training ran for only about 1 minute and ended with a normal 'Training completed' message, with no errors or early termination. A job that finishes all of its epochs this quickly is most likely working with a very small dataset, which leaves the model without enough examples to generalize, so Option D is the best fit.

Option A is wrong because overfitting would typically show longer training and high training accuracy. Option B is wrong because data leakage would produce good performance, not poor performance. Option C is wrong because a learning rate that is too low slows convergence, but the job would still run for the configured number of epochs, so it does not explain the quick completion.

Too few epochs or an aggressive early-stopping criterion could also explain the short run, but neither appears among the options.

414
Multi-Selectmedium

A data scientist is performing feature selection for a classification problem with 100 features. The data scientist wants to reduce overfitting and improve model interpretability. Which THREE methods are appropriate for feature selection? (Choose THREE.)

Select 3 answers
A.Principal Component Analysis (PCA)
B.Recursive Feature Elimination (RFE)
C.L1 regularization (Lasso)
D.Adding random noise to the features
E.Feature importance from a random forest model
AnswersB, C, E

RFE recursively removes the least important features based on model coefficients or feature importance.

Why this answer

Recursive Feature Elimination (RFE) is a wrapper method that recursively removes the least important features based on a model's feature weights or coefficients, training the model multiple times to identify the optimal subset. This directly reduces overfitting by eliminating irrelevant or redundant features and improves interpretability by keeping only the most predictive features.

Exam trap

AWS often tests the distinction between feature selection (keeping original features) and dimensionality reduction (creating new features), so candidates mistakenly choose PCA as a feature selection method when it is actually a feature extraction technique.

415
Multi-Selectmedium

A data scientist is training a linear regression model and wants to handle multicollinearity among features. Which TWO actions are appropriate?

Select 2 answers
A.Add interaction terms between features
B.Use Ridge regression (L2 regularization)
C.Use Lasso regression (L1 regularization)
D.Remove one of the highly correlated features
E.Scale all features to have zero mean and unit variance
AnswersB, D

Ridge regression shrinks coefficients of correlated features, reducing their impact.

Why this answer

Ridge regression (L2) adds a penalty that can reduce the impact of correlated features. Removing one of the correlated features directly addresses multicollinearity. Lasso (L1) may also help but is less effective for groups of correlated features.

Scaling features does not remove collinearity. Adding interaction terms increases multicollinearity.

416
MCQmedium

A company uses Amazon SageMaker to train a classification model. The training job fails with an error indicating that the algorithm requires a GPU but the instance type does not have one. The scientist used the built-in XGBoost algorithm. What should the scientist do to resolve the issue?

A.Choose a CPU instance type for the training job
B.Install a GPU-enabled version of XGBoost in the training container
C.Change the algorithm to a deep learning algorithm
D.Use a larger GPU instance type
AnswerA

XGBoost can run on CPU; use CPU instance.

Why this answer

XGBoost does not require a GPU; it can run on CPU. The error may be due to a GPU-only algorithm configuration or another misconfiguration. The simplest solution is to choose a CPU instance type, so Option A is correct.

Option B is wrong because installing a GPU-enabled version of XGBoost is unnecessary. Option C is wrong because there is no need to change the algorithm.

Option D is wrong because a larger GPU instance type adds cost and is not required when the algorithm runs on CPU.

417
MCQmedium

A machine learning team is deploying a model that performs real-time inference on streaming data from Amazon Kinesis Data Streams. The model requires sub-100ms latency. Which deployment option should the team choose?

A.Use Amazon SageMaker batch transform
B.Use Amazon SageMaker asynchronous inference
C.Deploy the model on an Amazon SageMaker real-time endpoint
D.Deploy a custom inference container on AWS Lambda
AnswerC

Real-time endpoints provide low-latency inference.

Why this answer

Amazon SageMaker real-time endpoints provide low-latency inference suitable for sub-100ms requirements, so Option C is correct. Option A is wrong because SageMaker batch transform is for offline batch predictions, not real-time inference.

Option B is wrong because SageMaker asynchronous inference is intended for near-real-time workloads with longer latencies. Option D is wrong because a custom inference container on AWS Lambda is possible but may not meet sub-100ms latency consistently due to cold starts.

418
Multi-Selecteasy

A data scientist is evaluating a classification model. The confusion matrix shows that the model has 50 true positives, 100 true negatives, 20 false positives, and 30 false negatives. Which TWO metrics can be calculated from this confusion matrix? (Choose two.)

Select 2 answers
A.R-squared
B.F1 score
C.Recall
D.Root mean squared error
E.Precision
AnswersC, E

Recall = TP/(TP+FN) can be directly calculated.

Why this answer

Options C and E are correct because recall and precision are computed directly from TP, FP, and FN. Option A is wrong because R-squared is a regression metric. Option D is wrong because RMSE is a regression metric.

Option B is not selected even though the F1 score can be derived from the same counts: it requires first computing precision and recall, so it takes an extra step. The question asks for TWO metrics, and precision and recall are the most direct.
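
The arithmetic for this confusion matrix can be checked with a few lines of plain Python. The counts are the ones given in the question; the variable names are just for illustration.

```python
# Counts from the question's confusion matrix.
tp, tn, fp, fn = 50, 100, 20, 30

precision = tp / (tp + fp)                  # 50 / 70
recall = tp / (tp + fn)                     # 50 / 80
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 150 / 200
# F1 is derived from precision and recall, which is the "extra step".
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

R-squared and RMSE cannot be computed here because the matrix contains counts of class outcomes, not numeric prediction errors.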

419
MCQeasy

A company is building a recommendation system for an e-commerce platform. The data includes user IDs and item IDs. Which SageMaker built-in algorithm is most appropriate?

A.BlazingText
B.XGBoost
C.Factorization Machines
D.Image Classification
AnswerC

Designed for recommendation.

Why this answer

Factorization Machines (FM) are specifically designed for recommendation tasks with sparse, high-dimensional categorical data like user IDs and item IDs. They model pairwise interactions between features (e.g., user-item interactions) using factorized parameters, making them highly effective for collaborative filtering and implicit feedback scenarios in e-commerce.

Exam trap

The trap here is that candidates often choose XGBoost (B) because it is a versatile algorithm, but they overlook that FM is purpose-built for sparse, high-dimensional interaction data and directly models pairwise feature interactions without manual feature engineering.

How to eliminate wrong answers

Option A is wrong because BlazingText is optimized for word embeddings and text classification, not for collaborative filtering or sparse user-item interaction matrices. Option B is wrong because XGBoost is a gradient boosting tree-based algorithm that struggles with extremely sparse, high-cardinality categorical features without extensive feature engineering, and it does not inherently model pairwise interactions like FM. Option D is wrong because Image Classification is designed for convolutional neural network tasks on pixel data, not for tabular or recommendation data with user and item IDs.

420
MCQmedium

A company is building a binary classifier to detect fraudulent transactions. The dataset is highly imbalanced with only 0.1% positive cases. The data scientist uses logistic regression and obtains 99.9% accuracy on the test set. Which metric should the data scientist use to evaluate the model's performance?

A.ROC AUC
B.Precision-recall curve
C.Precision
D.F1 score
AnswerB

Precision-recall curves focus on the positive class and handle imbalance well.

Why this answer

With only 0.1% positive cases, accuracy is misleading because a model that always predicts 'not fraudulent' achieves 99.9% accuracy. The precision-recall curve focuses on the positive class and is robust to extreme class imbalance, showing the trade-off between precision and recall across thresholds. This makes it the best choice for evaluating a binary classifier on highly imbalanced fraud detection data.

Exam trap

The trap here is that candidates see 'ROC AUC' as a standard metric and forget that it can be inflated by a large number of true negatives in imbalanced datasets, making precision-recall the correct choice for evaluating rare event classifiers.

How to eliminate wrong answers

Option A is wrong because ROC AUC can be overly optimistic on highly imbalanced datasets; the area under the ROC curve is dominated by the large number of true negatives, masking poor performance on the rare positive class. Option C is wrong because precision alone is a single-point metric that does not capture the trade-off with recall, so it cannot fully evaluate model performance across different decision thresholds. Option D is wrong because the F1 score is a harmonic mean of precision and recall at a single threshold, which may not reflect the model's overall ability to rank positive cases; it is less informative than the full precision-recall curve for threshold selection in imbalanced settings.
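
The claim that a trivial model reaches 99.9% accuracy at a 0.1% positive rate can be verified directly. This is a minimal sketch with a hypothetical dataset of 1,000 transactions, one of which is fraudulent, and a "model" that always predicts the negative class.

```python
# Hypothetical dataset: 1,000 transactions, 0.1% fraudulent.
n_total = 1000
n_positive = 1
y_true = [1] * n_positive + [0] * (n_total - n_positive)

# Trivial baseline that never flags fraud.
y_pred = [0] * n_total

correct = sum(t == p for t, p in zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = correct / n_total
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

The baseline scores high on accuracy while catching no fraud at all, which is why metrics centered on the positive class are needed.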

421
MCQmedium

A company wants to deploy a machine learning model that requires GPU acceleration for inference. The model is small and can fit on a single GPU. Which SageMaker endpoint configuration is MOST cost-effective?

A.Use a ml.p3.16xlarge instance with 8 GPUs.
B.Use a SageMaker Serverless Inference endpoint.
C.Use a Multi-Model Endpoint on a ml.g4dn.xlarge instance.
D.Use a ml.p3.2xlarge instance with 1 GPU and enable automatic scaling.
AnswerD

A single GPU instance with scaling provides cost-effective real-time inference.

Why this answer

Option D is the most cost-effective because it uses a single-GPU ml.p3.2xlarge instance, which matches the requirement that the model fits on one GPU, and enables automatic scaling to handle variable traffic without over-provisioning. This avoids paying for unused GPU capacity while still providing the necessary GPU acceleration for inference.

Exam trap

The trap here is that candidates often choose a larger GPU instance (like A) thinking it provides better performance, or select Serverless Inference (B) assuming it supports all instance types, but the exam tests the specific constraint that GPU acceleration is required and that Serverless Inference is CPU-only.

How to eliminate wrong answers

Option A is wrong because a ml.p3.16xlarge instance with 8 GPUs is massively over-provisioned for a model that fits on a single GPU, leading to unnecessary cost. Option B is wrong because SageMaker Serverless Inference does not support GPU acceleration; it uses CPU-based compute, which would not meet the GPU requirement. Option C is wrong because a Multi-Model Endpoint on a ml.g4dn.xlarge instance, while cost-effective for hosting multiple models, uses a single GPU that must be shared among all loaded models, potentially causing contention and not being the most cost-effective for a single model that fits on one GPU.

422
MCQhard

A data scientist is building a binary classifier for loan default prediction. The cost of a false negative (missing a default) is 10 times higher than the cost of a false positive. Which evaluation metric is MOST appropriate?

A.Precision
B.F-beta score with beta=2
C.Accuracy
D.Area under the ROC curve
AnswerB

F-beta with beta>1 gives more weight to recall, minimizing costly false negatives.

Why this answer

The F-beta score with beta=2 is the most appropriate metric because it weights recall (sensitivity) higher than precision, which is critical when false negatives are 10 times more costly than false positives. Beta=2 treats recall as twice as important as precision (beta^2 = 4 is the weight that appears in the formula), and it is the only option that reflects the asymmetric cost structure. This allows the model to be tuned to minimize missed defaults, even at the expense of more false alarms.

Exam trap

The trap here is that candidates often default to AUC-ROC as a 'balanced' metric without realizing it does not incorporate asymmetric error costs, leading them to overlook the F-beta score which is explicitly designed for such scenarios.

How to eliminate wrong answers

Option A is wrong because precision focuses only on the proportion of true positives among positive predictions, ignoring false negatives entirely, so it cannot account for the higher cost of missing defaults. Option C is wrong because accuracy treats all correct predictions equally and is misleading when classes are imbalanced, which is common in loan default prediction, and it does not incorporate the differential cost of errors. Option D is wrong because the area under the ROC curve (AUC-ROC) measures the model's ability to discriminate between classes across all thresholds but does not directly optimize for a specific cost ratio; it is a rank-based metric that does not penalize false negatives more heavily.
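
A short sketch of the F-beta formula shows how beta shifts the ranking of models. The two models and their precision and recall values are hypothetical: they tie on F1, but F2 prefers the one with higher recall.

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical models with mirrored precision and recall.
high_precision = {"precision": 0.9, "recall": 0.5}
high_recall = {"precision": 0.5, "recall": 0.9}

for name, m in (("high_precision", high_precision), ("high_recall", high_recall)):
    f1 = f_beta(m["precision"], m["recall"], beta=1)
    f2 = f_beta(m["precision"], m["recall"], beta=2)
    print(f"{name}: F1={f1:.3f} F2={f2:.3f}")
```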

423
MCQeasy

A data scientist is building a time series forecasting model for monthly sales. The data shows strong seasonality with a yearly pattern. They plan to use Amazon Forecast. Which algorithm should they choose?

A.XGBoost
B.K-means clustering
C.DeepAR+
D.Linear regression
AnswerC

DeepAR+ is designed for time series with seasonality and trends.

Why this answer

Amazon Forecast's DeepAR+ is built for time series with seasonality.

424
Multi-Selecthard

Which THREE of the following are common causes of overfitting in machine learning models?

Select 3 answers
A.Using a complex model like a deep neural network on a small dataset
B.Model has too many parameters relative to the number of training samples
C.Having a large dataset with many samples
D.Training for too many epochs
E.Using regularization techniques
AnswersA, B, D

Complex models on small data overfit.

Why this answer

Option A is correct because a complex model like a deep neural network has high capacity and can easily memorize noise and patterns specific to a small dataset, rather than learning generalizable features. With limited training samples, the model fails to capture the underlying data distribution, leading to poor performance on unseen data.

Exam trap

The exam often tests the misconception that more data or regularization causes overfitting, when in fact both are standard countermeasures; the trap is confusing correlation with causation in model training dynamics.

425
MCQhard

A data scientist is tuning a gradient boosting model using Amazon SageMaker's Automatic Model Tuning (hyperparameter optimization). The objective metric is validation:auc. After 50 training jobs, the best model still has a validation AUC of only 0.65. The scientist suspects overfitting because the training AUC is 0.99. Which hyperparameter configuration is MOST likely to reduce overfitting?

A.Increase lambda from 1 to 10
B.Increase num_round from 100 to 500
C.Increase max_depth from 6 to 12
D.Increase subsample from 0.5 to 1.0
AnswerA

Higher L2 regularization reduces overfitting by penalizing large weights.

Why this answer

Increasing lambda (L2 regularization) from 1 to 10 adds a stronger penalty on the magnitude of leaf weights in the gradient boosting model. This directly reduces overfitting by discouraging the model from fitting noise in the training data, which is consistent with the observed gap between training AUC (0.99) and validation AUC (0.65). In XGBoost, lambda controls the L2 regularization term on weights, and a higher value forces the model to be simpler and more generalizable.

Exam trap

The trap here is that candidates often assume increasing model complexity (e.g., more rounds, deeper trees) will improve performance, but the question explicitly describes overfitting, so the correct answer must reduce complexity or increase regularization, which is lambda.

How to eliminate wrong answers

Option B is wrong because increasing num_round (number of boosting rounds) from 100 to 500 would increase model complexity and training time, likely worsening overfitting by allowing the model to further memorize the training data. Option C is wrong because increasing max_depth from 6 to 12 allows trees to grow deeper, capturing more specific interactions and noise, which exacerbates overfitting rather than reducing it. Option D is wrong because increasing subsample from 0.5 to 1.0 means using all training data for each tree, removing the stochastic regularization effect that subsampling provides, which would reduce generalization and increase overfitting risk.

426
MCQeasy

A company is using SageMaker Autopilot to automatically build a binary classification model. After the AutoML job completes, the data scientist wants to understand which features are most important for the best candidate model. How can the scientist get feature importance?

A.Open the SageMaker Autopilot job details and view the 'Explainability' tab
B.Re-run the best model using SageMaker built-in XGBoost with the 'feature_importance' hyperparameter
C.Check the CloudWatch Logs for the training job
D.Use SageMaker Ground Truth to label a new dataset
AnswerA

Autopilot provides feature importance in the explainability tab for the best candidate.

Why this answer

SageMaker Autopilot automatically generates an 'Explainability' tab within the job details for the best candidate model. This tab uses SHAP (SHapley Additive exPlanations) values to provide feature importance, showing which features most influence the model's predictions. The data scientist can directly access this information without any additional configuration or re-running the model.

Exam trap

AWS often tests the misconception that feature importance must be manually extracted via code or logs, when in fact SageMaker Autopilot provides it directly in the UI under the 'Explainability' tab for the best candidate model.

How to eliminate wrong answers

Option B is wrong because SageMaker built-in XGBoost does not have a 'feature_importance' hyperparameter; feature importance is a property of the trained model object (e.g., via `get_fscore()` or `plot_importance()`), not a hyperparameter set before training. Option C is wrong because CloudWatch Logs for the training job contain training metrics, loss values, and algorithm logs, but not structured feature importance data; feature importance is not emitted to logs by default. Option D is wrong because SageMaker Ground Truth is a data labeling service for creating labeled datasets, not for extracting feature importance from a trained model; it is unrelated to model interpretability.

427
Multi-Selecthard

A company is using Amazon SageMaker to deploy a model for real-time inference. The model is a deep neural network that requires GPU for low latency. The endpoint currently uses a single ml.p3.2xlarge instance. Traffic is expected to increase by 5x. Which TWO actions should the company take to handle the increased traffic?

Select 2 answers
A.Use a larger instance type with more GPUs
B.Switch to a CPU-based instance
C.Enable auto-scaling on the endpoint
D.Use a multi-model endpoint
E.Decrease the batch size
AnswersA, C

Larger instance provides more GPU compute.

Why this answer

Option C is correct because enabling auto-scaling allows the endpoint to add instances as traffic grows. Option A is correct because using a larger instance with more GPUs (e.g., ml.p3.8xlarge) can increase throughput. Option B is wrong because switching to CPU would increase latency.

Option D is wrong because a multi-model endpoint hosts several models on shared instances and does not add capacity for a single model. Option E is wrong because reducing batch size would decrease throughput.

428
MCQhard

A machine learning team is using Amazon SageMaker to train a model using a custom Docker container. The training job fails with an error: 'Unable to write to /opt/ml/model'. The container does not have root access. What is the most likely cause?

A.The /opt/ml/model directory does not exist in the container
B.The container is using an unsupported operating system
C.The container does not have internet access
D.The container process does not have write permission to /opt/ml/model
AnswerD

The process user lacks write permissions to the directory.

Why this answer

SageMaker expects training containers to write the model artifact to /opt/ml/model, so the container process must have write permission to that directory. Root access is not required, but whichever user the container process runs as must be able to write there.

A missing directory is also possible, but the error reports a failed write, which points to permissions.

429
MCQhard

A data scientist is building a model to predict customer churn. The dataset has 20 features, including categorical variables with high cardinality (e.g., ZIP code). The data scientist wants to use a linear model. Which feature engineering technique is MOST appropriate for the high-cardinality categorical features?

A.One-hot encoding
B.Target encoding
C.TF-IDF
D.Standard scaling
AnswerB

Target encoding handles high cardinality well.

Why this answer

Target encoding is the most appropriate technique for high-cardinality categorical features when using a linear model because it replaces each category with the mean of the target variable for that category, creating a numeric feature that captures the relationship between the category and the target without exploding the feature space. One-hot encoding would create an unmanageable number of binary columns (e.g., thousands of ZIP codes), leading to the curse of dimensionality and making the linear model unstable or computationally infeasible.
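A minimal sketch of target encoding, assuming a binary churn label and a simple additive smoothing scheme. The function names, the `smoothing` parameter, and the toy ZIP codes are all hypothetical and chosen for illustration. In practice the encoding should be fit on training folds only, since computing it on the rows it is later applied to leaks the target.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    `smoothing` acts like a count of pseudo-observations at the global mean,
    so rare categories are pulled toward the overall average.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def transform(categories, encoding, global_mean):
    """Replace categories with encoded values; unseen categories get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


# Hypothetical toy data: ZIP code and churn label (1 = churned).
zips = ["10001", "10001", "94105", "94105", "94105", "60601"]
churn = [1, 0, 1, 1, 1, 0]

encoding, global_mean = fit_target_encoding(zips, churn, smoothing=2.0)
encoded = transform(zips + ["99999"], encoding, global_mean)
print(encoded)
```

Each ZIP code becomes a single numeric column regardless of how many distinct values exist, which is the property that makes this practical for a linear model.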

Exam trap

AWS often tests the trap that candidates default to one-hot encoding for all categorical variables, failing to recognize that high cardinality makes it impractical for linear models, whereas target encoding is a more efficient alternative that preserves feature information without dimensionality explosion.

How to eliminate wrong answers

Option A is wrong because one-hot encoding for high-cardinality features like ZIP code would generate thousands of dummy variables, causing the linear model to suffer from the curse of dimensionality, multicollinearity, and overfitting. Option C is wrong because TF-IDF is designed for text data to weigh term frequency against inverse document frequency, not for encoding categorical variables like ZIP codes in a churn prediction task. Option D is wrong because standard scaling is a normalization technique for numerical features, not a method for encoding categorical variables, and applying it to raw categorical labels would be meaningless.

430
MCQhard

A machine learning team is deploying a model using Amazon SageMaker. The model receives requests with sparse high-dimensional features. The team wants to minimize inference latency. Which SageMaker endpoint configuration is MOST suitable?

A.Use a multi-variant endpoint with two variants
B.Use a serverless endpoint with provisioned concurrency
C.Use a single model endpoint with a large instance type
D.Use a multi-model endpoint on a GPU instance
AnswerD

Multi-model endpoints reduce latency by loading models on demand.

Why this answer

Option D is correct because multi-model endpoints on GPU instances allow multiple models to be loaded into memory on a single GPU-backed instance, reducing cold-start latency for sparse high-dimensional features by keeping models warm and leveraging GPU parallelism for inference. This minimizes inference latency compared to other configurations by avoiding the overhead of separate endpoint invocations and optimizing resource utilization for high-dimensional data.

Exam trap

The trap here is that candidates often assume a large single-instance endpoint (Option C) is sufficient for low latency, but they overlook the GPU acceleration and memory efficiency of multi-model endpoints for sparse high-dimensional data, which is a key optimization tested in the MLS-C01 exam.

How to eliminate wrong answers

Option A is wrong because a multi-variant endpoint with two variants distributes traffic across multiple model versions or instance types, which does not inherently reduce inference latency for sparse high-dimensional features and can introduce additional routing overhead. Option B is wrong because a serverless endpoint with provisioned concurrency is designed for infrequent or variable traffic patterns, but it incurs cold-start latency for each new invocation and is not optimized for high-dimensional sparse data that benefits from GPU acceleration. Option C is wrong because a single model endpoint with a large instance type may provide sufficient compute but does not leverage GPU parallelism for sparse high-dimensional features, leading to suboptimal inference latency compared to GPU-based multi-model endpoints.

431
MCQmedium

A data scientist is using Amazon SageMaker to train a model using the built-in XGBoost algorithm. The training job is taking a long time. The data scientist notices that the input data is in CSV format and the training job is using File mode. The data size is 50 GB. What is the BEST way to reduce training time?

A.Use a larger instance type with more vCPUs.
B.Convert the data to Parquet format.
C.Reduce the number of features in the dataset.
D.Switch the input mode to Pipe.
AnswerD

Pipe mode reduces I/O wait time by streaming data.

Why this answer

Option D is correct. Pipe mode streams data directly from S3, reducing I/O overhead. Option A is wrong because a larger instance type may not help if I/O is the bottleneck.

Option B is wrong because converting to Parquet changes the file format but does not address the File-mode download of the full dataset before training starts. Option C is wrong because reducing the number of features can harm accuracy.

432
MCQmedium

A company is training a deep learning model on SageMaker using a large dataset stored in S3. The training job is taking a long time due to I/O bottlenecks. Which action would MOST effectively reduce the I/O bottleneck?

A.Use Amazon EFS as the data source.
B.Use Amazon FSx for Lustre as the data source.
C.Increase the number of training instances.
D.Use Pipe input mode in the SageMaker estimator.
AnswerD

Pipe mode streams data, reducing disk I/O.

Why this answer

Using Pipe mode streams data directly from S3 to the algorithm without writing to disk, reducing I/O wait. Option C (increasing the number of training instances) may help but does not directly address I/O. Option A (using EFS) adds latency.

Option B (using FSx for Lustre) is more complex.

433
MCQmedium

A data scientist is using Amazon SageMaker to build a text classification model. The dataset has 100,000 labeled samples and 20 classes. The scientist wants to use a pre-trained BERT model and fine-tune it. Which approach is MOST cost-effective?

A.Train a BERT model from scratch using a larger instance.
B.Fine-tune a pre-trained BERT-base model using a GPU instance.
C.Use a pre-trained BERT-large model with a larger instance.
D.Train a CNN model from scratch using CPU instances.
AnswerB

BERT-base is cost-effective and fine-tuning is efficient.

Why this answer

Option B is correct because fine-tuning a smaller pre-trained variant such as BERT-base reduces computational cost while still being effective. Option A is wrong because training from scratch is expensive. Option C is wrong because BERT-large is overkill for this task.

Option D is wrong because training a CNN from scratch would require more data and training.

434
MCQmedium

A team is using Amazon SageMaker to train a deep learning model for image classification. The training job is taking too long, and they want to reduce training time without sacrificing model accuracy. Which approach is most effective?

A.Reduce the batch size
B.Reduce the number of training epochs
C.Reduce the image resolution
D.Use transfer learning with a pre-trained model and fine-tune on the target dataset
AnswerD

Transfer learning uses features learned from a large dataset, allowing faster convergence and similar accuracy.

Why this answer

Option D is correct because using a pre-trained model (transfer learning) leverages existing feature representations, reducing training time while maintaining accuracy. Option B is wrong because reducing epochs may harm accuracy. Option A is wrong because reducing batch size can increase training time.

Option C is wrong because reducing image resolution may lose information.

435
MCQmedium

A company uses Amazon SageMaker to deploy a real-time inference endpoint for a regression model. The endpoint is experiencing high latency during spikes in traffic. The data scientist needs to reduce latency while maintaining cost efficiency. Which action should the data scientist take?

A.Use batch transform instead of real-time inference
B.Use a larger instance type for the endpoint
C.Deploy the model on a multi-model endpoint
D.Enable automatic scaling for the endpoint
AnswerD

Automatic scaling adds instances during traffic spikes, reducing latency.

Why this answer

Option D is correct because enabling automatic scaling for the SageMaker endpoint allows the number of instances to dynamically adjust based on traffic patterns, reducing latency during spikes by adding capacity when needed and removing it during low traffic to maintain cost efficiency. Automatic scaling uses CloudWatch metrics (e.g., InvocationsPerInstance or CPUUtilization) to trigger scale-out and scale-in policies, ensuring the endpoint can handle bursts without over-provisioning.
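A sketch of what a target-tracking configuration for an endpoint variant might look like. It only builds the request dictionaries and makes no AWS call. The endpoint name, variant name, policy name, capacity limits, target value, and cooldowns are all illustrative assumptions.

```python
def build_autoscaling_config(endpoint_name, variant_name, min_instances,
                             max_instances, target_invocations_per_instance):
    """Build keyword arguments for Application Auto Scaling (no AWS call is made)."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    scalable_target = {
        **common,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    scaling_policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds; illustrative value
            "ScaleInCooldown": 300,   # seconds; illustrative value
        },
    }
    return scalable_target, scaling_policy


# Hypothetical endpoint and variant names.
target_kwargs, policy_kwargs = build_autoscaling_config(
    "my-endpoint", "AllTraffic", min_instances=1, max_instances=4,
    target_invocations_per_instance=100,
)
print(target_kwargs["ResourceId"])
```

With boto3, these dictionaries would typically be passed to an `application-autoscaling` client as `register_scalable_target(**target_kwargs)` followed by `put_scaling_policy(**policy_kwargs)`.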

Exam trap

The trap here is that candidates confuse automatic scaling with simply adding more resources (Option B) or assume multi-model endpoints (Option C) are a latency solution, when in fact automatic scaling is the only option that directly addresses both latency spikes and cost efficiency through dynamic instance management.

How to eliminate wrong answers

Option A is wrong because batch transform is designed for offline, asynchronous inference on large datasets and does not support real-time inference, so it cannot reduce latency for a real-time endpoint. Option B is wrong because using a larger instance type may reduce latency for individual requests but increases cost significantly and does not dynamically adapt to traffic spikes, leading to either over-provisioning or continued high latency during bursts. Option C is wrong because a multi-model endpoint hosts multiple models on the same instance to improve resource utilization, but it does not inherently reduce latency during traffic spikes; in fact, it can increase latency due to model loading/unloading overhead and contention for shared resources.

436
MCQeasy

A data scientist is using Amazon SageMaker to train a linear learner model for regression. After reviewing the training logs, the data scientist notices that the loss is not decreasing and remains high. The learning rate is set to 0.01. The data is normalized. What should the data scientist do to improve convergence?

A.Normalize the data again.
B.Reduce the mini-batch size.
C.Try different learning rates, such as 0.001 or 0.1.
D.Increase the number of epochs.
AnswerC

Tuning the learning rate is a common first step to improve convergence.

Why this answer

Option C is correct. The learning rate may be too high, causing oscillation, or too low, causing slow convergence. Adjusting it can help.

Option D is wrong because more epochs may not help if the learning rate is inappropriate. Option A is wrong because the data is already normalized. Option B is wrong because reducing the mini-batch size increases gradient noise but may not resolve convergence issues.
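The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2, which is an assumption made for illustration and not the Linear Learner objective, and compares a rate that is too small, one that works, and one that is too large for this function.

```python
def gradient_descent(learning_rate: float, steps: int = 50, start: float = 1.0) -> float:
    """Minimize f(w) = w**2 with plain gradient descent and return the final |w|."""
    w = start
    for _ in range(steps):
        grad = 2.0 * w               # derivative of w**2
        w -= learning_rate * grad    # standard gradient step
    return abs(w)


for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: |w| after 50 steps = {gradient_descent(lr):.4g}")
```

On this function the smallest rate barely moves from the starting point in 50 steps, the middle rate converges close to zero, and the largest rate overshoots on every step and diverges.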

437
Multi-Selecteasy

A data scientist is training a text classification model using Amazon SageMaker's built-in BlazingText algorithm. The dataset contains 1 million documents. Which TWO hyperparameters are most important to tune for improving model accuracy?

Select 2 answers
A.Learning rate
B.Batch size
C.Loss function
D.Type of optimizer
E.Number of epochs
AnswersA, E

Learning rate controls the step size during optimization and is crucial for convergence.

Why this answer

Learning rate and number of epochs are critical hyperparameters for training neural networks like BlazingText. They control how quickly the model learns and how long it trains.

438
MCQmedium

A company is building a fraud detection model using a random forest classifier. The dataset is highly imbalanced with 99% legitimate transactions and 1% fraudulent. The model currently achieves 99% accuracy on the test set, but the fraud recall is only 10%. The business requires at least 80% recall for fraud. The data scientist has tried oversampling the minority class and adjusting class weights, but recall remains below 40%. The dataset contains millions of transactions with hundreds of features. Which approach should the data scientist try next to improve fraud recall?

A.Randomly undersample the majority class to a 50:50 ratio
B.Use a gradient boosting machine (e.g., XGBoost) with scale_pos_weight parameter
C.Apply PCA to reduce dimensionality before training
D.Use a logistic regression model with L2 regularization
AnswerB

Gradient boosting often outperforms random forest on imbalance with proper weighting.

Why this answer

Option B (gradient boosting with scale_pos_weight) is effective for imbalance. Option D (logistic regression) likely underperforms. Option A (undersampling to a 50:50 ratio) loses too much data.

Option C (PCA) reduces features but may lose signal.
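A common starting value for XGBoost's scale_pos_weight is the ratio of negative to positive examples. The helper below is a hypothetical illustration that computes that ratio for a synthetic label vector with a 99:1 imbalance; the resulting value would still need tuning against the recall target.

```python
def suggested_scale_pos_weight(labels) -> float:
    """Ratio of negative to positive examples, a usual starting point for scale_pos_weight."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Hypothetical label vector: 9,900 legitimate (0) and 100 fraudulent (1) transactions.
labels = [0] * 9900 + [1] * 100
weight = suggested_scale_pos_weight(labels)
print(weight)
```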

439
MCQeasy

A company uses Amazon SageMaker to train a model and wants to track metrics like loss and accuracy in real-time. Which SageMaker feature should be used?

A.SageMaker Model Monitor
B.SageMaker metrics and CloudWatch dashboards
C.SageMaker Experiments
D.SageMaker Debugger
AnswerB

Provides real-time training metrics.

Why this answer

Option B is correct because SageMaker's built-in metrics and CloudWatch integration allow real-time tracking. Option D is wrong because SageMaker Debugger is for model debugging, not real-time metrics. Option C is wrong because SageMaker Experiments is for managing experiments, not real-time tracking.

Option A is wrong because SageMaker Model Monitor is for monitoring inference.

440
Multi-Selecthard

Which THREE techniques can help reduce overfitting in a neural network trained on a small dataset?

Select 3 answers
A.Apply L2 weight regularization
B.Increase the number of hidden layers
C.Train for more epochs
D.Use data augmentation
E.Add dropout layers
AnswersA, D, E

L2 regularization penalizes large weights.

Why this answer

L2 weight regularization (also known as weight decay) penalizes large weights by adding a term to the loss function proportional to the sum of squared weights. This forces the network to learn simpler patterns and reduces sensitivity to noise in the training data, which is especially helpful when the dataset is small and prone to overfitting.

Exam trap

The exam often tests the misconception that increasing model complexity (more layers or epochs) always improves performance, when in fact on small datasets it tends to worsen overfitting.

441
MCQhard

A machine learning engineer is using Amazon SageMaker to deploy a model for real-time inference. The model is a large ensemble that requires 4 GB of memory and has a latency requirement of 100 ms. Which instance type and deployment configuration should the engineer choose to optimize cost while meeting requirements?

A.ml.m5.large (2 vCPU, 8 GB memory)
B.SageMaker Serverless Inference
C.ml.c5.large (2 vCPU, 4 GB memory)
D.ml.p3.2xlarge (8 vCPU, 61 GB memory, 1 GPU)
AnswerA

8 GB memory provides headroom, and cost is moderate.

Why this answer

Option A is correct because ml.m5.large provides 8 GB of memory and is cost-effective for real-time inference with moderate latency requirements. Option C is wrong because ml.c5.large has only 4 GB of memory, which is insufficient for a 4 GB model plus overhead. Option D is wrong because ml.p3.2xlarge is GPU-accelerated and expensive, which is overkill for this model.

Option B is wrong because Serverless Inference has cold-start latency that may exceed 100 ms.

442
MCQhard

A company is building a real-time fraud detection system using Amazon SageMaker. The model must have low latency (under 10ms) and high throughput (thousands of predictions per second). The team has trained a gradient boosting model using XGBoost. Which SageMaker inference option is MOST suitable?

A.Use SageMaker asynchronous inference.
B.Deploy the model on a SageMaker real-time endpoint with a multi-model endpoint.
C.Deploy the model on a SageMaker serverless endpoint.
D.Use a batch transform job.
AnswerB

Multi-model endpoints optimize cost and latency for high throughput.

Why this answer

A multi-model endpoint (MME) on SageMaker is the most suitable option because it allows you to host multiple XGBoost models on a single endpoint, sharing the underlying instance to maximize throughput and minimize latency. MMEs keep models loaded in memory and route requests to the correct model with sub-10ms overhead, meeting the low-latency and high-throughput requirements for real-time fraud detection.

Exam trap

The trap here is that candidates often confuse 'real-time' with 'serverless' or 'asynchronous', failing to recognize that serverless endpoints introduce cold-start latency and throughput limits that break the sub-10ms and high-throughput requirements.

How to eliminate wrong answers

Option A is wrong because asynchronous inference is designed for large payloads or long processing times (e.g., batch processing with minutes of latency), not for real-time sub-10ms predictions. Option C is wrong because serverless endpoints have a cold-start latency that can exceed 10ms and are throttled at lower concurrency, making them unsuitable for thousands of predictions per second. Option D is wrong because batch transform jobs are offline, not real-time, and cannot provide sub-10ms latency or handle streaming prediction requests.

443
MCQeasy

A data scientist is building a regression model to predict energy consumption. The dataset includes features like temperature, humidity, day of week, and holiday flags. The scientist uses a linear regression model and obtains an R-squared of 0.85 on training and 0.40 on test. The scientist suspects the model is not capturing non-linear relationships. Which approach should the scientist use to capture non-linearity?

A.Apply PCA to the feature set
B.Increase L1 regularization using Lasso
C.Remove features with low correlation to the target
D.Add polynomial features (e.g., squared terms and interactions)
AnswerD

Polynomial features allow linear model to fit non-linear patterns.

Why this answer

Option D (add polynomial features) captures non-linear relationships. Option B (increase L1 regularization) reduces overfitting but doesn't add non-linearity. Option A (apply PCA) reduces dimensionality.

Option C (remove features) may lose information.
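A minimal sketch of a degree-2 polynomial expansion, assuming a row of purely numeric features. The function name and the sample values are hypothetical. A linear model fit on the expanded row is still linear in its coefficients but can represent curvature and interactions in the original features.

```python
from itertools import combinations_with_replacement


def degree2_features(row):
    """Return the original features plus all squared terms and pairwise interactions."""
    expanded = list(row)
    for i, j in combinations_with_replacement(range(len(row)), 2):
        expanded.append(row[i] * row[j])
    return expanded


# Hypothetical sample: [temperature, humidity]
sample = [30.0, 0.5]
expanded_sample = degree2_features(sample)
print(expanded_sample)
```

For two input features this appends temperature squared, temperature times humidity, and humidity squared. The number of added terms grows quadratically with the feature count, so the expansion is usually paired with regularization.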

444
MCQeasy

A data scientist is building a classification model to predict customer churn. The dataset has 10,000 samples with 100 features. After training a logistic regression model, the scientist observes that the model has high variance (overfitting). Which technique can reduce overfitting?

A.Remove the regularization term
B.Use L2 regularization (Ridge)
C.Add polynomial features
D.Use a smaller learning rate
AnswerB

L2 regularization penalizes large weights, reducing overfitting.

Why this answer

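L2 regularization (Ridge) adds a penalty on large coefficients, reducing overfitting. Removing the regularization term (Option A) would do the opposite. Increasing model complexity with polynomial features (Option C) would worsen overfitting. A smaller learning rate (Option D) changes how the optimizer converges, not the model's capacity.

Removing features or increasing the training data can also help, but neither is listed.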

445
MCQmedium

A data scientist is using Amazon SageMaker to train a model. The training dataset is stored in S3 as CSV files. The scientist wants to use the SageMaker built-in Linear Learner algorithm. Which input mode should be used for optimal performance?

A.Augmented manifest file mode
B.File mode
C.Pipe mode
D.Fast file mode
AnswerC

Pipe mode streams data, reducing I/O overhead and improving performance.

Why this answer

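Pipe mode streams data directly from S3 to the algorithm without writing to disk, reducing I/O overhead. File mode downloads the entire dataset to disk before training starts, which is slower. Fast file mode is a SageMaker input mode as well, but the Linear Learner documentation lists File and Pipe as its training input modes.

An augmented manifest file supplies additional metadata alongside the data; it is not a performance feature.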

446
MCQhard

A company runs a real-time recommendation system on SageMaker with a model that uses a deep neural network. The endpoint uses a single ml.p3.2xlarge instance. Recently, the number of users has grown, and the endpoint's latency has increased from 50ms to 200ms, exceeding the SLA of 100ms. The model inference code is optimized and cannot be improved further. The company wants to reduce latency while minimizing cost. Which option is the MOST cost-effective and meets the latency requirement?

A.Convert the model to use TensorFlow Lite and deploy on a CPU-based instance
B.Use SageMaker's Elastic Inference to attach an EI accelerator to the existing instance
C.Switch to a larger instance type with more GPU memory, such as ml.p3.8xlarge
D.Deploy the model on multiple smaller instances (e.g., ml.p3.2xlarge) behind a load balancer and distribute traffic
AnswerB

Elastic Inference provides cost-effective GPU acceleration.

Why this answer

Option B is the most cost-effective because Elastic Inference provides dedicated GPU acceleration at a fraction of the cost of a full GPU instance. Option C increases cost significantly. Option D may reduce latency but increases cost and complexity.

Option A may not maintain accuracy and may not meet latency requirements on CPU.

447
MCQmedium

Refer to the exhibit. An IAM policy is attached to a SageMaker notebook instance. A data scientist is trying to invoke the endpoint 'my-endpoint' from the notebook but receives an AccessDenied error. What is the likely cause?

A.The policy allows InvokeEndpoint only for endpoints with the exact ARN, but the endpoint ARN is different.
B.The policy uses a wildcard for CreateEndpoint, which is too permissive.
C.The policy does not allow sagemaker:CreateEndpoint for the specific endpoint.
D.The policy is not attached to the IAM role used by the notebook instance.
AnswerD

Without the policy, InvokeEndpoint is denied.

Why this answer

Option D is correct because the error 'AccessDenied' when invoking a SageMaker endpoint from a notebook instance typically indicates that the IAM role attached to the notebook does not have the required permissions. The policy shown in the exhibit grants sagemaker:InvokeEndpoint for the specific endpoint ARN, but if the policy is not attached to the IAM role that the notebook instance is using, the role lacks the permission, resulting in the AccessDenied error. Attaching the policy to the correct IAM role resolves the issue.

Exam trap

AWS often tests the distinction between having a policy defined versus having it attached to the correct IAM role; candidates mistakenly assume that if a policy exists in the account, it automatically applies to all resources, but IAM policies must be explicitly attached to the role or user making the request.

How to eliminate wrong answers

Option A is wrong because the policy explicitly allows InvokeEndpoint for the endpoint ARN 'arn:aws:sagemaker:us-east-1:123456789012:endpoint/my-endpoint', so if the endpoint ARN matches, this is not the cause. Option B is wrong because the wildcard for CreateEndpoint is irrelevant to the InvokeEndpoint action; the error is about invoking, not creating, and a permissive CreateEndpoint policy does not cause an AccessDenied on InvokeEndpoint. Option C is wrong because the policy does not need to allow sagemaker:CreateEndpoint for invoking an endpoint; the required action is sagemaker:InvokeEndpoint, which is already allowed in the policy.

448
Multi-Selecteasy

Which TWO of the following are appropriate use cases for Amazon SageMaker built-in algorithms?

Select 2 answers
A.Classifying customer churn using tabular data
B.Reinforcement learning using Q-learning
C.Classifying text documents using word embeddings
D.Image classification using a custom CNN architecture
E.Time series forecasting using ARIMA
AnswersA, C

XGBoost or Linear Learner can be used.

Why this answer

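Option A fits because XGBoost is suitable for tabular classification. Option C fits because BlazingText handles text classification using word embeddings. Option D is wrong because the built-in image classification algorithm does not accept a custom CNN architecture; that requires a custom training script or container.

Option E is wrong because ARIMA is not a SageMaker built-in algorithm (the built-in forecasting algorithm is DeepAR). Option B is wrong because Q-learning is not offered as a built-in algorithm.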

449
MCQeasy

A machine learning engineer is using Amazon SageMaker to train a model. The training job fails with an out-of-memory error. The training data size is 10 GB and the instance is ml.m5.xlarge (16 GB memory). Which change is MOST likely to resolve the issue without increasing cost?

A.Reduce the batch size in the training script.
B.Switch to a GPU instance like p3.2xlarge.
C.Use a larger instance type like ml.m5.4xlarge.
D.Decrease the training dataset size.
AnswerA

Smaller batch size reduces memory footprint per iteration.

Why this answer

Option A is correct because many algorithms allow you to set a batch size, and reducing it lowers memory usage. Option B is wrong because changing to GPU may not help and could increase cost. Option C is wrong because increasing instance type increases cost.

Option D is wrong because decreasing the dataset size may lose information.

450
Multi-Selecteasy

A data scientist is evaluating a linear regression model. Which TWO metrics are appropriate for evaluating the model's performance?

Select 2 answers
A.R-squared
B.Root Mean Squared Error (RMSE)
C.Precision
D.Area Under the ROC Curve (AUC-ROC)
E.F1 score
AnswersA, B

R-squared measures the proportion of variance explained by the model.

Why this answer

R-squared is a standard metric for linear regression that measures the proportion of variance in the dependent variable explained by the independent variables. It typically ranges from 0 to 1, with higher values indicating better fit, making it directly appropriate for evaluating regression model performance. RMSE measures the typical size of the prediction error in the units of the target, with lower values indicating better fit.
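Both metrics can be computed directly from their definitions. The sketch below uses small made-up target and prediction vectors; the function names are illustrative and not taken from any particular library.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical targets and predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

model_rmse = rmse(y_true, y_pred)
model_r2 = r_squared(y_true, y_pred)
print(model_rmse, model_r2)
```

Both functions operate on continuous predictions, which is why they apply to regression while precision, F1, and AUC-ROC do not.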

Exam trap

AWS often tests the distinction between regression and classification metrics, and the trap here is that candidates mistakenly apply classification metrics like Precision, AUC-ROC, or F1 score to a regression problem, not recognizing they are fundamentally incompatible with continuous outputs.
