Knowledge + Practice

CCNA ML Modeling Questions


226

MCQeasy

A company uses SageMaker to deploy a model for predicting customer churn. The model was trained on historical data and achieves 85% accuracy on the test set. After deployment, the model's predictions are significantly worse on new data due to changes in customer behavior. What is the MOST likely cause?

A.Data leakage during training

B.The training dataset was too small

C.Concept drift in the underlying data distribution

D.The model is overfitting to the training data

AnswerC

Changes in customer behavior cause concept drift, reducing model accuracy over time.

Why this answer

The model's performance degradation on new data, despite high accuracy on the test set, is a classic symptom of concept drift. Concept drift occurs when the statistical properties of the target variable (customer churn) change over time due to shifts in customer behavior, making the trained model's decision boundary obsolete. SageMaker deployed the model as a persistent endpoint, but the underlying data distribution has evolved, so the model no longer generalizes to the current environment.

Exam trap

The trap here is that candidates confuse concept drift with overfitting, assuming any performance drop after deployment must be due to the model memorizing noise, but the key differentiator is the temporal nature of the degradation tied to changing customer behavior, not a static training-data issue.

How to eliminate wrong answers

Option A is wrong because data leakage would inflate test set accuracy artificially, but the model would fail immediately on new data—not after a period of deployment—and the scenario describes a gradual change in customer behavior, not a training flaw. Option B is wrong because a small training dataset typically causes high bias or variance, leading to poor accuracy on both test and new data, whereas here the model initially achieved 85% accuracy on the test set. Option D is wrong because overfitting would cause poor performance on the test set (not 85% accuracy) and would not explain a delayed degradation tied to changing customer behavior; overfitting is a static issue, not a temporal one.


227

MCQhard

A machine learning team is building a recommendation system for an e-commerce platform. They have user-item interaction data (clicks, purchases). They need to choose an algorithm that can capture both user and item latent factors and handle missing data. Which algorithm should they use?

A.Linear regression

B.Principal component analysis (PCA)

C.Matrix factorization

D.Convolutional neural network (CNN)

AnswerC

Matrix factorization learns latent factors and handles missing data.

Why this answer

Matrix factorization is the correct choice because it decomposes the user-item interaction matrix into lower-dimensional latent factors for users and items, capturing underlying patterns in preferences. It naturally handles missing data by learning from observed interactions only, making it ideal for recommendation systems with sparse data.

Exam trap

The trap here is that candidates may choose PCA because it also performs dimensionality reduction, but PCA cannot handle missing data or model user-item interactions for collaborative filtering, which is the core requirement of the question.

How to eliminate wrong answers

Option A is wrong because linear regression models a continuous target variable from features but cannot capture latent factors or handle missing data in a user-item matrix. Option B is wrong because PCA is an unsupervised dimensionality reduction technique that does not model user-item interactions or handle missing data; it requires a complete matrix and ignores the collaborative filtering structure. Option D is wrong because CNNs are designed for spatial data like images and are not suited for collaborative filtering or latent factor extraction from sparse interaction matrices.
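
As a rough illustration of how matrix factorization learns from observed interactions only, here is a minimal sketch in plain Python. The toy ratings, the function names `factorize` and `predict`, and the hyperparameters are assumptions made up for this example; a production recommender would use a library or a SageMaker built-in algorithm instead.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn user and item latent factors with SGD over observed triples only."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Missing (user, item) pairs never appear in `ratings`, so they are skipped.
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted score is the dot product of the user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Hypothetical sparse interactions: (user, item, rating); 3 of 9 cells are missing.
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
            (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
P, Q = factorize(observed, n_users=3, n_items=3)

# Score a pair that was never observed (user 0, item 2).
unseen_score = predict(P, Q, 0, 2)
```

The loop never touches the unobserved cells, which is the sense in which the method "handles" missing data: it fits the known entries and uses the learned factors to fill in the rest.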


228

MCQeasy

A data scientist is training a binary classification model on imbalanced data (95% negative, 5% positive). The model achieves 95% accuracy but only 10% recall on the positive class. Which metric should be used to evaluate model performance?

A.F1 score

B.Accuracy

C.Recall

D.Precision

AnswerA

F1 score balances precision and recall, suitable for imbalanced data.

Why this answer

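Option A is correct because accuracy is misleading on imbalanced data: a model that predicts negative for every sample already scores 95%. The F1 score is the harmonic mean of precision and recall, so the 10% recall on the positive class pulls it down sharply. Option B is wrong because accuracy is high but not informative here.

Option C is wrong because recall alone ignores precision. Option D is wrong because precision alone ignores recall.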
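
To make the gap between accuracy and F1 concrete, the sketch below computes the metrics from confusion-matrix counts. The counts are an assumption chosen to match the scenario (1,000 samples, 5% positive, 10% recall, 95% accuracy); the question does not state the number of false positives, so `fp=5` is purely illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Assumed counts: 50 positives (5 found, 45 missed), 950 negatives (5 flagged by mistake).
metrics = classification_metrics(tp=5, fp=5, fn=45, tn=945)
```

With these assumed counts the accuracy stays at 0.95 while F1 is far lower, which is why F1 is the more honest summary for this model.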


229

MCQmedium

A company is using Amazon SageMaker to deploy a model for real-time inference. The model receives requests that are small but arrive in bursts. The data scientist wants to minimize latency and cost. Which deployment option is MOST appropriate?

A.Use a real-time endpoint with a single instance

B.Use a multi-model endpoint with auto-scaling

C.Use Amazon SageMaker Serverless Inference

D.Use a batch transform job triggered by a schedule

AnswerC

Serverless scales automatically and you pay only for inference duration.

Why this answer

Amazon SageMaker Serverless Inference is the most appropriate option because it automatically scales compute resources based on request volume, charges only for the compute time used during inference (per-millisecond billing), and has no idle costs. This matches the bursty, small-request pattern perfectly, minimizing both latency and cost without requiring manual instance management.

Exam trap

The trap here is that candidates often confuse 'multi-model endpoints' with 'serverless' and assume auto-scaling eliminates idle costs, but multi-model endpoints still require a minimum number of running instances, incurring continuous charges.

How to eliminate wrong answers

Option A is wrong because a single-instance real-time endpoint incurs continuous hourly costs even when idle, and cannot handle burst traffic without significant latency or throttling. Option B is wrong because a multi-model endpoint with auto-scaling still requires at least one running instance at all times, leading to idle costs and slower scaling compared to serverless. Option D is wrong because batch transform jobs are designed for offline, asynchronous processing of large datasets, not real-time inference, and cannot meet low-latency requirements.


230

MCQmedium

A team is training a deep learning model on Amazon SageMaker. The training job is slow because the data is stored in S3 as many small files. Which approach is MOST effective to improve training throughput?

A.Use SageMaker Pipe mode for training input

B.Increase the number of ml.c5.xlarge instances

C.Shuffle the S3 objects to randomize order

D.Use Amazon EFS instead of S3

AnswerA

Pipe mode streams data directly, avoiding the need to download all files first, improving throughput.

Why this answer

Using SageMaker Pipe mode streams data directly from S3, reducing startup time. Shuffling files or increasing instance count does not address the small file overhead. Using EFS would introduce latency.


231

Multi-Selecthard

A data scientist is developing a deep learning model for object detection using Amazon SageMaker. The training dataset has 50,000 labeled images. The data scientist wants to improve model generalization without collecting more data. Which TWO techniques can be applied? (Choose two.)

Select 2 answers

A.Increase the learning rate to speed up convergence.

B.Increase the number of training epochs to ensure convergence.

C.Apply data augmentation techniques such as random cropping and horizontal flipping.

D.Use transfer learning from a pre-trained model on ImageNet.

E.Increase the batch size to reduce variance.

AnswersC, D

Data augmentation increases data diversity without new data.

Why this answer

Options C and D are correct. Option C: Data augmentation (e.g., random crops, horizontal flips) increases the diversity of the training set without new images. Option D: Transfer learning from a pre-trained model starts from features learned on a larger dataset, which improves generalization.

Option A (increasing the learning rate) can cause divergence. Option B (increasing the number of epochs) may lead to overfitting. Option E (increasing the batch size) may hurt generalization.
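
For object detection, augmentation has to transform the labels along with the pixels. The following is a minimal sketch of a horizontal flip, assuming an image stored as a list of rows and boxes given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with `x_max` exclusive; both conventions and the function name `hflip` are assumptions for illustration, not part of any SageMaker API.

```python
def hflip(image, boxes):
    """Flip an image left-to-right and mirror its bounding boxes to match."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x_min, x_max) becomes [width - x_max, width - x_min).
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped_image, flipped_boxes

# Toy 2x4 "image" with one box covering the two left-most columns.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
boxes = [(0, 0, 2, 2)]

aug_image, aug_boxes = hflip(image, boxes)
```

Flipping twice should return the original image and boxes, which is a quick sanity check for any geometric augmentation.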


232

Multi-Selectmedium

A data scientist is building a text classification model using a bag-of-words approach with logistic regression. The dataset has 10,000 documents and 50,000 unique tokens. The model overfits. Which TWO techniques can help reduce overfitting?

Select 2 answers

A.Increase the number of n-grams features

B.Use one-hot encoding instead of bag-of-words

C.Use a more complex model such as a neural network

D.Reduce the vocabulary size by removing rare and very frequent terms

E.Apply L2 regularization to the logistic regression model

AnswersD, E

Reducing the number of features reduces model complexity and overfitting.

Why this answer

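Option D is correct because removing rare and very frequent terms reduces the feature space and eliminates noise, which helps the logistic regression model generalize better. Rare terms often act as noise that the model can latch onto for spurious correlations, while very frequent terms (like stopwords) provide little discriminative power. This dimensionality reduction directly combats overfitting by simplifying the model. Option E is correct because L2 regularization penalizes large coefficients, which keeps the model from assigning large weights to individual tokens when the features (50,000) far outnumber the documents (10,000).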

Exam trap

AWS often tests the misconception that adding more features or using a more complex model always improves performance, when in fact these actions increase overfitting risk in high-dimensional sparse datasets.
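
The vocabulary pruning described for Option D can be sketched with document-frequency thresholds. The tiny corpus, the function name `prune_vocabulary`, and the cut-offs `min_df=2` and `max_df_ratio=0.8` are assumptions picked for this example; real thresholds would be tuned on validation data.

```python
from collections import Counter

def prune_vocabulary(docs, min_df=2, max_df_ratio=0.8):
    """Keep tokens that appear in at least min_df documents
    and in at most max_df_ratio of all documents."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    n_docs = len(docs)
    return {
        token for token, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }

# Hypothetical tokenized documents.
docs = [
    ["the", "plan", "is", "great"],
    ["the", "plan", "is", "bad"],
    ["the", "support", "is", "great"],
    ["the", "refund", "took", "weeks"],
    ["the", "support", "was", "bad"],
]
vocab = prune_vocabulary(docs)
```

Here "the" is dropped for appearing in every document, and tokens seen in only one document are dropped as too rare, leaving a smaller feature space for the classifier.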


233

MCQhard

A machine learning team is deploying a time-series forecasting model using Amazon SageMaker. The model is trained on historical data and needs to be updated daily with new data. The team wants to automate the retraining pipeline and avoid manual intervention. Which approach is the most efficient?

A.Use AWS Step Functions to orchestrate retraining, but require a manual approval step.

B.Use SageMaker training jobs manually triggered by the team each day.

C.Use a cron job on an EC2 instance to run a training script.

D.Use Amazon SageMaker Pipelines with a scheduled Lambda function to trigger retraining daily.

AnswerD

Combines SageMaker Pipelines for automated ML workflows with Lambda for scheduling, providing a fully automated solution.

Why this answer

Option D is correct because Amazon SageMaker Pipelines provides a fully managed, end-to-end orchestration service for building, training, and deploying machine learning models. By combining it with a scheduled AWS Lambda function, the team can automate daily retraining without manual intervention, leveraging SageMaker's native integration for step sequencing, artifact tracking, and model registry updates.

Exam trap

The trap here is that candidates might choose Option C (cron job on EC2) because it seems simpler, but they overlook the operational burden of managing EC2 and the lack of native SageMaker integration for model lineage and automated deployment.

How to eliminate wrong answers

Option A is wrong because requiring a manual approval step contradicts the requirement to avoid manual intervention, making the pipeline not fully automated. Option B is wrong because manually triggering training jobs each day is the opposite of automation and introduces human error and operational overhead. Option C is wrong because using a cron job on an EC2 instance requires managing the instance (patching, scaling, security), and the training script would lack native integration with SageMaker's managed infrastructure, model registry, and pipeline lineage tracking.


234

Multi-Selectmedium

Which TWO of the following are best practices for training deep learning models on Amazon SageMaker? (Select TWO.)

Select 2 answers

A.Use SageMaker Processing to perform data augmentation before training.

B.Use Pipe input mode to stream data directly from S3 to the algorithm.

C.Store training data on Amazon EBS volumes attached to the training instance.

D.Use managed spot training to reduce costs.

E.Disable checkpointing to improve training speed.

AnswersB, D

Pipe mode reduces startup time and storage.

Why this answer

Option B is correct because SageMaker's Pipe input mode streams training data directly from Amazon S3 to the algorithm without writing it to disk, reducing I/O latency and eliminating the need for large local storage. This is especially beneficial for deep learning models that iterate over large datasets, as it allows training to start faster and avoids the overhead of downloading data to EBS volumes.

Exam trap

The trap here is that candidates often confuse SageMaker Processing with a general-purpose compute environment for any training task, when in fact it is specifically for data processing jobs, not for augmenting data during model training.


235

MCQhard

A company is building a recommendation system using Amazon SageMaker's Factorization Machines algorithm. The dataset includes user IDs, item IDs, and ratings. The data is sparse. Which data format should be used for training?

A.CSV format with one row per rating.

B.JSON lines format with nested structures.

C.RecordIO-protobuf format with sparse feature vectors.

D.Parquet format with columns for each feature.

AnswerC

Protobuf with sparse encoding is efficient and recommended.

Why this answer

Option C is correct because Factorization Machines (FM) in SageMaker are optimized for sparse, high-dimensional data. The RecordIO-protobuf format allows you to directly specify sparse feature vectors using integer keys and float values, which avoids the memory overhead of dense representations and enables efficient distributed training. This format is the recommended input for SageMaker's built-in FM algorithm.

Exam trap

The trap here is that candidates assume CSV is always the simplest and most compatible format, overlooking the fact that SageMaker's Factorization Machines specifically require sparse data representation for performance and correctness, making RecordIO-protobuf the only optimal choice among the options.

How to eliminate wrong answers

Option A is wrong because CSV format with one row per rating forces dense representation, which is inefficient for sparse data and does not leverage FM's native support for sparse feature vectors. Option B is wrong because JSON lines format with nested structures is not natively supported by SageMaker's Factorization Machines; the algorithm expects RecordIO-protobuf or CSV with a specific schema, not arbitrary nested JSON. Option D is wrong because Parquet format, while efficient for columnar storage, is not directly supported by SageMaker's FM algorithm and would require conversion to RecordIO-protobuf or CSV for training.


236

MCQhard

A company uses Amazon SageMaker to train a model for fraud detection. The dataset has 1 million samples with 200 features. The data is highly imbalanced (0.1% fraud). The team wants to use a random forest model. Which technique should they use to handle the class imbalance during training?

A.Synthetic Minority Over-sampling Technique (SMOTE)

B.Use class weights inversely proportional to class frequencies

C.Random undersampling of the majority class

D.Adjust the decision threshold after training

AnswerA

SMOTE generates synthetic samples, effectively balancing the dataset.

Why this answer

SMOTE generates synthetic minority-class samples by interpolating between existing minority samples and their nearest neighbors, which balances the training set. Option B is wrong because class weights adjust the loss function but are not specific to random forest. Option C is wrong because undersampling discards majority-class data, losing information.

Option D is wrong because threshold tuning happens after training, not during it.
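
The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines. This is a simplified illustration, not the reference implementation: the function name `smote_sketch`, the toy points, and the brute-force neighbor search are assumptions, and a real project would more likely use a maintained library.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on line segments between minority
    samples and their k nearest minority neighbours (brute-force search)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority points by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class (fraud) samples with two features.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(fraud, n_new=6)
```

Because each synthetic point lies between two real minority samples, it stays inside the region the minority class already occupies. Oversampling should be applied to the training split only, so that validation and test sets keep the true class ratio.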


237

Multi-Selecthard

A data scientist is using Amazon SageMaker to train a deep learning model for natural language processing. The training job is taking too long to converge. The data scientist wants to speed up training without significantly sacrificing model accuracy. Which THREE strategies should the data scientist consider? (Choose three.)

Select 3 answers

A.Reduce the model size by using fewer layers or smaller hidden dimensions.

B.Increase the learning rate by a factor of 10 to accelerate convergence.

C.Increase the batch size to its maximum possible value to utilize GPU memory fully.

D.Use mixed precision training (FP16) to reduce memory and speed up matrix operations.

E.Use SageMaker's distributed data parallelism across multiple instances.

AnswersA, D, E

Smaller models train faster but may lose some accuracy.

Why this answer

Options A, D, and E are correct. Reducing model size (A) reduces the computation per step. Mixed precision training (D) speeds up computation on GPUs.

Using distributed data parallelism (E) leverages multiple GPUs. Option B (increasing the learning rate by a factor of 10) can destabilize training. Option C (increasing the batch size to its maximum) may cause convergence issues.


238

MCQmedium

A company is fine-tuning a BERT model on Amazon SageMaker for a text classification task. The training script uses PyTorch and Hugging Face Transformers. The training job completes successfully, but the final model accuracy is low. The dataset has 10,000 labeled samples. What is the most likely cause and solution?

A.The instance type is insufficient; use a larger instance

B.The model is overfitting due to small dataset; use a pre-trained checkpoint and fine-tune only top layers

C.The learning rate is too high; reduce it

D.The training script has a bug in the data loader

AnswerB

Fine-tuning a pre-trained BERT on a small dataset may overfit; using a pre-trained checkpoint and freezing lower layers helps.

Why this answer

Option B is correct because BERT is large and 10,000 samples may not be enough to update every layer without overfitting; starting from a pre-trained checkpoint and fine-tuning only the top layers is standard transfer learning. Option C (learning rate) is possible but not most likely. Option D (data loader bug) is unlikely.

Option A (instance type) doesn't affect accuracy directly.


239

MCQeasy

A data scientist is using Amazon SageMaker to deploy a model for real-time inference. The model is a TensorFlow neural network. The scientist wants to use automatic scaling based on the number of incoming requests. Which service integration is required?

A.Amazon ECS with service auto scaling

B.Amazon SageMaker endpoint configured with Application Auto Scaling

C.AWS Lambda with provisioned concurrency

D.AWS Auto Scaling plans

AnswerB

SageMaker integrates with Application Auto Scaling to scale endpoints based on demand.

Why this answer

Amazon SageMaker endpoints natively integrate with Application Auto Scaling to adjust the number of instances based on a target metric, such as the number of incoming requests per instance. This allows the TensorFlow model to scale automatically in response to traffic, without needing additional orchestration services.

Exam trap

The trap here is that candidates may confuse SageMaker's built-in auto scaling with external services like ECS or Lambda, not realizing that SageMaker endpoints directly integrate with Application Auto Scaling for request-based scaling.

How to eliminate wrong answers

Option A is wrong because Amazon ECS with service auto scaling is used for container orchestration, not for scaling SageMaker endpoints; SageMaker manages its own infrastructure. Option C is wrong because AWS Lambda with provisioned concurrency is for serverless functions, not for deploying a TensorFlow neural network model for real-time inference via SageMaker. Option D is wrong because AWS Auto Scaling plans are a higher-level service for scaling multiple resources, but SageMaker endpoints require direct integration with Application Auto Scaling via a scaling policy, not a generic plan.
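
The integration described above comes down to two Application Auto Scaling requests: registering the endpoint variant as a scalable target and attaching a target-tracking policy. The sketch below only builds the request parameters as plain dictionaries so it runs without AWS credentials; the endpoint name, capacity limits, target value, and the helper name `scaling_requests` are assumptions for illustration.

```python
def scaling_requests(endpoint_name, variant_name="AllTraffic",
                     min_instances=1, max_instances=4,
                     invocations_per_instance=100.0):
    """Build parameters for register_scalable_target and put_scaling_policy."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,
        "MaxCapacity": max_instances,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return target, policy

# "churn-endpoint" is a made-up endpoint name.
target, policy = scaling_requests("churn-endpoint")
```

With boto3 installed and credentials configured, these dictionaries would typically be passed as `client.register_scalable_target(**target)` and `client.put_scaling_policy(**policy)` on a `boto3.client("application-autoscaling")` client.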


240

MCQmedium

An IAM policy attached to a SageMaker execution role is shown. A training job executed with this role fails with an error that the role cannot access the S3 bucket. The training job uses input data from s3://my-bucket/train/data.csv and output to s3://my-bucket/output/. What is the most likely cause?

A.The training job does not have s3:GetObject permission for the input data

B.The training data is encrypted with SSE-KMS and the role lacks KMS permissions

C.The training job does not have s3:PutObject permission for the output location

D.The S3 bucket is in a different region than the training job

AnswerC

The output path 'output/' is not covered by the resource 'train/*', so PutObject fails.

Why this answer

Option C is correct because the error message indicates the role cannot access the S3 bucket, which typically occurs when the role lacks write permissions to the output location. The training job needs s3:PutObject permission to write the output artifacts (model, logs, etc.) to s3://my-bucket/output/. Without this permission, SageMaker fails to save the training results, resulting in an access error.

Exam trap

AWS often tests the distinction between read and write permissions in SageMaker S3 access, and the trap here is that candidates assume the error is about reading input data (Option A) when the actual failure is due to missing write permissions for the output location (Option C).

How to eliminate wrong answers

Option A is wrong because the error is about accessing the bucket, not specifically the input data; if s3:GetObject were missing, the error would likely be more specific to reading the input file, and the job would fail at the data loading stage, not with a general bucket access error. Option B is wrong because there is no mention of SSE-KMS encryption in the scenario; if the data were encrypted with KMS, the error would reference KMS permissions, not a generic S3 bucket access error. Option D is wrong because nothing in the scenario points to a region mismatch; the failure is explained by the policy's resource scope, which covers train/* but not output/.


241

Multi-Selecteasy

Which TWO of the following are valid Amazon SageMaker built-in algorithms for regression tasks? (Select TWO.)

Select 2 answers

A.BlazingText

B.XGBoost

C.Image Classification

D.Object Detection

E.Linear Learner

AnswersB, E

XGBoost supports regression.

Why this answer

XGBoost is a valid Amazon SageMaker built-in algorithm for regression tasks because it supports regression objectives such as 'reg:squarederror' and 'reg:logistic'. It is a gradient boosting framework that builds an ensemble of decision trees, making it suitable for both regression and classification problems.

Exam trap

The trap here is that candidates often confuse algorithms that can be used for regression (like XGBoost and Linear Learner) with those that are exclusively for classification or computer vision tasks, leading them to select BlazingText or Image Classification incorrectly.


242

MCQhard

A company is using a custom Docker container in SageMaker for training. The training job fails with 'ResourceLimitExceeded' error. Which action should the data scientist take?

A.Use a smaller instance type

B.Reduce the number of epochs

C.Request a limit increase for the instance type

D.Use a pre-built SageMaker container instead

AnswerC

Directly addresses the error.

Why this answer

The 'ResourceLimitExceeded' error in SageMaker indicates that the AWS account has reached a service quota for the specified instance type (e.g., ml.p3.2xlarge). This is a quota limit, not a performance or resource exhaustion issue within the training job itself. The correct action is to request a limit increase via the AWS Service Quotas console or AWS Support, which raises the maximum number of concurrent instances or total vCPUs allowed for that instance family.

Exam trap

AWS often tests the misconception that 'ResourceLimitExceeded' is a performance or memory error, leading candidates to choose instance downsizing or epoch reduction, when in fact it is a strict AWS account quota that must be raised through a formal request.

How to eliminate wrong answers

Option A is wrong because using a smaller instance type does not resolve a quota limit error; it only changes which quota is checked, and the smaller instance may still be subject to its own quota or may not meet the training job's memory/compute requirements. Option B is wrong because reducing the number of epochs addresses model convergence or training time, not the AWS service quota that limits the number or type of instances you can launch concurrently. Option D is wrong because switching to a pre-built SageMaker container does not affect instance quotas; the error is about resource limits at the AWS account level, not about container compatibility or image configuration.


243

MCQhard

A data scientist is using Amazon SageMaker to deploy a custom model container. The model is a large transformer that requires 16 GB of memory. The scientist wants to minimize inference latency. Which SageMaker hosting option should they choose?

A.Use a real-time endpoint with an instance that has sufficient memory.

B.Use an asynchronous inference endpoint.

C.Use SageMaker Serverless Inference.

D.Use a batch transform job.

AnswerA

Real-time endpoints provide low latency and can accommodate large models.

Why this answer

Option A is correct because a single large model is best served by a real-time endpoint on an instance with enough memory, which keeps the model loaded and responds synchronously. Option B is wrong because asynchronous inference queues requests, which adds latency. Option C is wrong because Serverless Inference has memory limits (up to 6 GB) and may incur cold starts.

Option D is wrong because batch transform is for offline processing.


244

Multi-Selecteasy

Which TWO metrics are appropriate for evaluating a binary classification model when the cost of false negatives is high? (Choose 2)

Select 2 answers

A.Recall

B.Specificity

C.Accuracy

D.F1 score

E.Precision

AnswersA, B

Recall = TP/(TP+FN), high recall minimizes false negatives.

Why this answer

Option A (Recall) and Option B (Specificity) are correct. Recall measures the share of actual positives that the model finds, and specificity measures the share of actual negatives that it correctly rejects, both relevant when false negatives are costly. F1 (D) and Precision (E) are not directly focused on false negatives.

Accuracy (C) is misleading if the data is imbalanced.


245

MCQeasy

A machine learning engineer is evaluating a binary classification model. The model has a high recall but low precision. Which of the following is the most likely consequence?

A.The model has many false positives.

B.The model has few false negatives.

C.The model misses many positive cases.

D.The model has few false positives.

AnswerA

Low precision means a high rate of false positives.

Why this answer

High recall means the model correctly identifies most positive cases (few false negatives), but low precision indicates that among the cases predicted as positive, many are actually negative. This directly implies a high number of false positives, as precision = TP/(TP+FP) and a low precision with high recall forces FP to be large relative to TP.

Exam trap

This question tests the precision-recall trade-off: candidates who confuse the definitions of false positives and false negatives incorrectly associate high recall with many false positives instead of few false negatives.

How to eliminate wrong answers

Option B is wrong because high recall implies few false negatives (FN is low), so this is a characteristic of the model, not a consequence of low precision. Option C is wrong because high recall means the model does NOT miss many positive cases; it captures most of them. Option D is wrong because low precision is defined by having many false positives, not few; few false positives would yield high precision.


246

Multi-Selecteasy

A data scientist is building a classification model and wants to evaluate its performance. Which TWO metrics are appropriate for a multi-class classification problem? (Choose 2)

Select 2 answers

A.Mean Absolute Error (MAE)

B.Recall

C.Precision

D.R-squared

E.Root Mean Square Error (RMSE)

AnswersB, C

Recall can be averaged across classes.

Why this answer

Both precision and recall can be extended to multi-class via micro/macro averaging. R-squared is for regression; RMSE is for regression; Mean Absolute Error is for regression.
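
Macro averaging treats each class as its own one-vs-rest problem and then averages the per-class scores. The sketch below shows that calculation in plain Python; the labels and predictions are made-up toy data and the function name `macro_precision_recall` is an assumption for this example.

```python
def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall with equal weight per class."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precisions.append(tp / (tp + fp) if (tp + fp) else 0.0)
        recalls.append(tp / (tp + fn) if (tp + fn) else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)

# Toy three-class example.
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]
macro_p, macro_r = macro_precision_recall(y_true, y_pred)
```

Micro averaging would instead pool the true positives, false positives, and false negatives across classes before dividing, which gives more weight to frequent classes.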


247

MCQeasy

A machine learning engineer is training a regression model to predict house prices using Amazon SageMaker. The dataset contains 10,000 samples and 50 numerical features. After training a linear regression model, the engineer notices that the training loss is low, but the validation loss is high. The engineer suspects overfitting. The dataset is already normalized. Which action should the engineer take to reduce overfitting?

A.Increase the learning rate to speed up convergence.

B.Reduce the number of features using PCA.

C.Add L2 regularization (weight decay) to the loss function.

D.Decrease the mini-batch size during training.

AnswerC

Correct: L2 regularization penalizes large weights and reduces overfitting.

Why this answer

Option C (add L2 regularization) is correct because it penalizes large weights and reduces overfitting in linear models. Option A (increase learning rate) may cause divergence. Option D (decrease mini-batch size) can introduce noise but doesn't directly regularize.

Option B (reduce feature count by PCA) could help but may lose information; regularization is more direct.
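
The effect of the L2 penalty on the weights can be seen in a small gradient-descent sketch. It minimizes mean squared error plus `l2 * sum(w_j ** 2)` for a linear model without a bias term; the toy data, the function name `fit_linear`, and the hyperparameters are assumptions for illustration only.

```python
def fit_linear(X, y, l2=0.0, lr=0.01, epochs=2000):
    """Batch gradient descent on mean squared error + l2 * ||w||^2 (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2.0 * err * xi[j] / n
        # The penalty adds 2 * l2 * w_j to each gradient component (weight decay).
        w = [wj - lr * (gj + 2.0 * l2 * wj) for wj, gj in zip(w, grad)]
    return w

def squared_norm(w):
    return sum(wj * wj for wj in w)

# Tiny made-up dataset with two features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.2]

w_plain = fit_linear(X, y, l2=0.0)
w_ridge = fit_linear(X, y, l2=1.0)
```

Comparing `squared_norm(w_ridge)` with `squared_norm(w_plain)` shows the penalized weights are pulled toward zero, which is the mechanism that limits overfitting.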


248

MCQeasy

A company is building a sentiment analysis model for customer reviews. The dataset includes 10,000 positive and 10,000 negative reviews. The data scientist splits the data into 70% training, 15% validation, and 15% test sets. After training, the model achieves 99% accuracy on training set but only 82% on validation set. What is the most likely issue?

A.There is data leakage from validation to training

B.The dataset is imbalanced

C.The model is underfitting

D.The model is overfitting

AnswerD

High training accuracy with significantly lower validation accuracy is a classic sign of overfitting.

Why this answer

Option D is correct because a large gap between training and validation accuracy indicates overfitting. Option B is wrong because the dataset is balanced. Option C is wrong because underfitting would show low training accuracy.

Option A is wrong because leakage from validation into training would inflate validation accuracy, not widen the gap.


249

MCQeasy

During training, a binary classification model has an AUC of 0.99 on the training set but only 0.72 on the validation set. Which of the following is the most likely cause?

A.Class imbalance in the training set.

B.Underfitting.

C.Overfitting.

D.Data leakage from validation to training.

AnswerC

Overfitting results in high training but lower validation AUC.

Why this answer

Option C is correct because a large gap between training and validation performance indicates overfitting. Option B is wrong because underfitting would show poor performance on both. Option D is wrong because data leakage would inflate both metrics.

Option A is wrong because class imbalance would affect both sets similarly.


250

MCQeasy

A data scientist is building a regression model to predict house prices. The dataset contains features like 'number_of_rooms' (integer), 'sqft' (float), 'location' (categorical with 1000 unique values). Which feature engineering approach is BEST for the 'location' feature?

A.Remove the feature

B.Target encoding

C.One-hot encoding

D.Label encoding

AnswerB

Target encoding uses mean target per category, good for high cardinality.

Why this answer

Target encoding is the best approach for the 'location' feature because it has 1,000 unique categories, making one-hot encoding infeasible (would create 1,000 dummy columns) and label encoding inappropriate (imposes arbitrary ordinal relationships). Target encoding replaces each category with the mean of the target variable (house price) for that category, capturing the predictive signal of location while keeping the feature as a single numeric column. This balances model performance with dimensionality and avoids overfitting when regularized (e.g., with smoothing or cross-validation).

Exam trap

AWS often tests the trade-off between cardinality and encoding methods, and the trap here is that candidates default to one-hot encoding as the 'standard' categorical encoding without considering the practical infeasibility of high cardinality, or they choose label encoding thinking it is a simple numeric mapping, ignoring the ordinal assumption violation.

How to eliminate wrong answers

Option A is wrong because removing the 'location' feature discards a highly predictive signal — house prices are strongly influenced by location, and a model without it would likely underfit. Option C is wrong because one-hot encoding with 1,000 unique categories would create 999 dummy variables, drastically increasing dimensionality, memory usage, and risk of the curse of dimensionality, especially in regression models. Option D is wrong because label encoding assigns arbitrary integer labels (e.g., 1, 2, 3) to categories, implying an ordinal relationship that does not exist for location, which can mislead linear regression models into treating distant locations as numerically similar.
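
Smoothed target encoding, as described above, can be sketched in a few lines. This is a minimal illustration under assumptions: the function name `smoothed_target_encoding`, the `smoothing` parameter, and the toy location and price data are all hypothetical, and a real pipeline would compute the encoding on training folds only to avoid leaking the target.

```python
from collections import defaultdict

def smoothed_target_encoding(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    encoded = (category_sum + smoothing * global_mean) / (count + smoothing)
    Rare categories are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }

# Hypothetical toy data: location label and sale price (in thousands).
locations = ["downtown", "downtown", "suburb", "suburb", "suburb", "rural"]
prices = [500.0, 540.0, 300.0, 320.0, 310.0, 150.0]

encoding = smoothed_target_encoding(locations, prices, smoothing=2.0)
encoded_column = [encoding[loc] for loc in locations]  # one numeric column
```

The single-sample "rural" category ends up well above its raw mean of 150 because smoothing shrinks it toward the global mean, which is the regularizing effect mentioned in the explanation.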

251

MCQeasy

A data scientist is training a random forest model on a dataset with 50 features. After training, the model achieves 98% accuracy on the training set but only 85% on the test set. Which technique is most appropriate to reduce the generalization error?

A.Apply Principal Component Analysis (PCA) to reduce dimensionality

B.Add more training data

C.Increase the number of trees in the forest

D.Reduce the maximum depth of each tree

AnswerD

Shallow trees are simpler and less likely to overfit, thus improving test accuracy.

Why this answer

The gap between training accuracy (98%) and test accuracy (85%) indicates overfitting. A random forest can overfit when its individual trees are too deep. Reducing the maximum depth of each tree limits model complexity and helps generalization.

Increasing the number of trees typically reduces variance rather than causing overfitting, but it adds computational cost and does not limit the complexity of each tree, so reducing depth is more direct. Feature selection or PCA might help but are less direct than controlling tree complexity.

252

MCQeasy

Refer to the exhibit. The log shows the end of a successful SageMaker training job. However, the ML engineer cannot find the model artifacts in the specified S3 bucket. What is the most likely cause?

A.The IAM role used by the training job does not have permission to write to the S3 bucket.

B.The S3 bucket does not exist.

C.The model artifacts were uploaded to a different S3 path.

D.The training job did not have network access to S3.

AnswerA

Without s3:PutObject, the upload fails.

Why this answer

The training job completed successfully, meaning the SageMaker training container executed without errors. However, if the model artifacts are not found in the specified S3 bucket, the most likely cause is that the IAM role associated with the training job lacks the necessary s3:PutObject permission for that bucket. SageMaker uses the role's credentials to write the output; without write access, the artifacts are silently dropped or fail to upload, even though the training code itself may have run to completion.

Exam trap

AWS often tests the misconception that a successful training job log implies the model artifacts were successfully uploaded, when in fact the IAM role permissions are the gatekeeper for S3 write operations, and a missing permission can cause silent failures.

How to eliminate wrong answers

Option B is wrong because if the S3 bucket did not exist, SageMaker would raise a bucket-not-found error during the training job initialization, and the job would fail, not complete successfully. Option C is wrong because the model artifacts are written to the exact S3 path specified in the OutputDataConfig parameter of the training job; SageMaker does not randomly choose a different path. Option D is wrong because if the training job lacked network access to S3, it would fail to download training data or upload output, resulting in a job failure, not a successful completion with missing artifacts.

253

MCQhard

A company is deploying a machine learning model for real-time fraud detection. The model must have low latency (under 100 ms) and high throughput. The model is an ensemble of 5 gradient boosted trees (XGBoost), each 200 MB. Which deployment strategy is MOST suitable?

A.Use AWS Lambda to invoke each model sequentially.

B.Deploy each model as a separate SageMaker endpoint and use a load balancer.

C.Deploy the ensemble on a single GPU instance with large batch processing.

D.Use SageMaker multi-model endpoint on a compute-optimized instance.

AnswerD

Multi-model endpoints reduce overhead and scale well.

Why this answer

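Option D is correct because a multi-model endpoint hosts all five models behind a single endpoint on a compute-optimized instance, which reduces hosting overhead and scales well. Option A is wrong because Lambda has cold starts and execution time limits, and invoking the models sequentially adds their latencies together. Option B is wrong because a separate endpoint per model multiplies cost and operational overhead.

Option C is wrong because large-batch processing trades latency for throughput, which conflicts with the sub-100 ms requirement.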

254

MCQeasy

A data scientist is reviewing the training logs from a SageMaker training job. The logs show training and validation loss per epoch. Based on the exhibited logs, which statement is correct?

A.The model is not learning because the loss is not decreasing

B.The model is underfitting because both losses are high

C.The model is performing well because validation loss is stable

D.The model is overfitting because training loss decreases while validation loss does not

AnswerD

Classic overfitting: training loss improves, validation loss stagnates.

Why this answer

Option D is correct because the training logs show a classic overfitting pattern: training loss consistently decreases across epochs, indicating the model is memorizing the training data, while validation loss does not decrease (or may even increase), indicating poor generalization to unseen data. In SageMaker, monitoring both losses during training is critical to detect overfitting early, often prompting regularization or early stopping.

Exam trap

The trap here is that candidates see decreasing training loss and assume the model is learning well, ignoring the validation loss plateau or increase, which is the hallmark of overfitting.

How to eliminate wrong answers

Option A is wrong because the loss is decreasing (training loss goes down), so the model is learning; the issue is not lack of learning but divergence between training and validation performance. Option B is wrong because underfitting would show both training and validation losses remaining high and not decreasing, whereas here training loss decreases significantly. Option C is wrong because a stable validation loss alone does not indicate good performance if training loss is decreasing while validation loss is not improving—this divergence signals overfitting, not good generalization.

255

MCQeasy

A machine learning team is using Amazon SageMaker to build a regression model. The target variable is heavily right-skewed with a long tail. Which data transformation should the team apply to the target variable before training?

A.One-hot encoding

B.Min-max scaling

C.Log transformation

D.Standardization (z-score)

AnswerC

Log transform reduces right skew and makes distribution more normal.

Why this answer

A log transformation compresses the range of the target and makes the distribution more symmetric, improving model performance.
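
The effect of a log transformation on skew can be checked numerically. The sketch below uses only the standard library; the `skewness` helper and the toy target values are hypothetical, and `log1p`/`expm1` are chosen here (an assumption, not something the question specifies) so that zero-valued targets are handled and predictions can be mapped back to the original scale.

```python
import math

def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / std) ** 3 for v in values) / n

# Hypothetical right-skewed target with a long tail.
target = [10, 12, 15, 18, 20, 25, 30, 45, 90, 400]

log_target = [math.log1p(y) for y in target]    # transform before training
restored = [math.expm1(z) for z in log_target]  # invert predictions afterwards

raw_skew = skewness(target)
log_skew = skewness(log_target)  # noticeably smaller than raw_skew
```

Because the model is trained on the transformed target, its predictions must be passed through the inverse transform before they are reported in the original units.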

256

MCQmedium

A healthcare company is building a model to predict patient readmission within 30 days of discharge. The dataset includes 10,000 patient records with 200 features, including lab results, demographics, and historical admissions. The target variable is highly imbalanced: only 8% of patients are readmitted. The data scientist splits the data into 80% training and 20% test sets, ensuring the same proportion of readmissions in each. The scientist trains a logistic regression model and a random forest model. The logistic regression achieves 92% accuracy but recall of 10% for the readmitted class. The random forest achieves 90% accuracy but recall of 25%. The business requirement is to achieve at least 60% recall for readmissions while maintaining reasonable precision. The scientist also has access to a large collection of unlabeled patient records from other hospitals. Which strategy should the data scientist use to meet the business requirement?

A.Collect more labeled data from other hospitals.

B.Use SMOTE to oversample the minority class in the training set.

C.Use random undersampling of the majority class in the training set.

D.Switch to a deep neural network with more layers.

AnswerB

SMOTE creates synthetic samples to balance classes.

Why this answer

Option B is correct because using SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic samples for the minority class, which can improve recall. Option A is wrong because collecting more data may not be feasible and may not help if imbalance persists. Option C is wrong because undersampling reduces data and may lose information.

Option D is wrong because changing to a deep learning model may not help with limited data.
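
The core idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration, not the reference implementation: the function name `smote_like_oversample`, its parameters, and the 2-D toy points are hypothetical. Oversampling of this kind should be applied to the training split only, never to the test set.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical 2-D minority-class samples (e.g., readmitted patients).
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority_points, n_new=6)
```

Every synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies.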

257

MCQmedium

A data scientist is training a deep learning model on a large dataset using Amazon SageMaker. The training job is taking too long. Which action would MOST likely reduce training time without sacrificing model accuracy?

A.Use a smaller instance type for training

B.Implement early stopping with a low patience value

C.Enable data parallelism across multiple GPUs

D.Reduce the number of epochs by half

AnswerC

Data parallelism distributes data across GPUs, reducing training time while preserving accuracy.

Why this answer

Option C is correct because enabling data parallelism across multiple GPUs distributes the training workload across several devices, allowing larger batch sizes and faster gradient computation per epoch. Amazon SageMaker's distributed training libraries (e.g., SageMaker Data Parallelism) use all-reduce algorithms to synchronize gradients efficiently, which reduces wall-clock training time without altering the model architecture or loss function, thus preserving accuracy.

Exam trap

The trap here is that candidates may confuse reducing training time with reducing computational load (e.g., smaller instance or fewer epochs), but the question specifically requires maintaining accuracy, which distributed parallelism achieves by leveraging more hardware rather than cutting corners in the training process.

How to eliminate wrong answers

Option A is wrong because using a smaller instance type reduces computational resources (CPU/GPU memory and throughput), which increases per-iteration time and may force smaller batch sizes, potentially slowing convergence or even degrading accuracy due to underfitting. Option B is wrong because implementing early stopping with a low patience value risks halting training prematurely before the model has converged, which can sacrifice accuracy by underfitting the data. Option D is wrong because reducing the number of epochs by half arbitrarily cuts training short without regard to convergence criteria, likely resulting in an underfit model with lower accuracy.
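
Why data parallelism preserves accuracy can be seen in a toy setting: averaging per-worker gradients over equal shards gives the same gradient as a single device processing the whole batch. The sketch below simulates this in plain Python; the one-parameter model, the function names, and the data are hypothetical, and the averaging step stands in for the all-reduce performed by a real distributed training library.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_gradient(w, batch, n_workers):
    """Split the batch into equal shards, compute one gradient per worker,
    then average them (the role all-reduce plays in real systems)."""
    shard_size = len(batch) // n_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(n_workers)]
    local_grads = [gradient(w, shard) for shard in shards]
    return sum(local_grads) / n_workers

# Hypothetical toy data following y = 3x.
data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples
w = 0.5

single_device = gradient(w, data)
four_workers = data_parallel_gradient(w, data, n_workers=4)
```

Both values match, so the parameter update is unchanged; only the wall-clock time per step differs when the shards are processed concurrently.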

258

MCQmedium

Refer to the exhibit. A data scientist runs the AWS CLI command to create a SageMaker training job. The job fails immediately with 'ValidationException: Invalid instance type'. What is the most likely issue?

A.The IAM role ARN is invalid

B.The S3 bucket 'my-bucket' does not exist or the role lacks permissions

C.The instance type ml.m5.large does not support the XGBoost image

D.The training image URI is for a different AWS region

AnswerD

The account ID corresponds to us-east-1, but the CLI command is running in us-west-2.

Why this answer

The error 'ValidationException: Invalid instance type' occurs because the training image URI specified in the command points to an Amazon ECR repository in a different AWS region than where the SageMaker training job is being created. SageMaker validates that the image URI is accessible from the current region; if the URI references a region that does not contain the XGBoost image or the instance type is not supported in that region's ECR, the validation fails. The instance type itself (ml.m5.large) is valid for XGBoost, but the mismatch between the image's region and the job's region triggers the exception.

Exam trap

The trap here is that candidates assume 'Invalid instance type' always means the instance is unsupported by the algorithm, when in reality SageMaker uses this generic error for any validation failure related to the training job's resource configuration, including regional mismatches in the image URI.

How to eliminate wrong answers

Option A is wrong because an invalid IAM role ARN would cause an 'AccessDeniedException' or 'InvalidParameterValue' error, not a 'ValidationException' specifically about the instance type. Option B is wrong because a missing S3 bucket or insufficient permissions would result in a 'ClientError: 404 Not Found' or 'AccessDenied' during data download, not a validation error at job creation. Option C is wrong because ml.m5.large is a supported instance type for the XGBoost algorithm in SageMaker; the error is not about instance type compatibility with the image but about the image URI's regional mismatch.

259

MCQeasy

A data scientist is training a binary classifier using logistic regression. The dataset has 100 features and 1 million samples. After training, the model achieves AUC of 0.85 on the test set. The business wants to understand which features contribute most to predictions. Which technique should the data scientist use?

A.Use t-SNE to visualize feature importance

B.Use the coefficients of the logistic regression model as feature importance

C.Use a random forest model and its feature importance attribute

D.Use Principal Component Analysis (PCA) to find important components

AnswerB

Logistic regression coefficients indicate direction and magnitude of feature impact.

Why this answer

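Logistic regression coefficients are a natural measure of feature importance: the sign gives the direction of a feature's effect and the magnitude gives its strength, provided the features are on comparable scales (for example, standardized).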

260

MCQmedium

A company is building a recommendation system for an e-commerce platform. They have user-item interaction data (clicks, purchases) and want to use matrix factorization. They plan to use Amazon SageMaker to train the model. Which dataset format is MOST appropriate for the built-in Factorization Machines algorithm?

A.Libsvm format with user_id and item_id as features

B.CSV file with user_id, item_id, and label columns

C.RecordIO-protobuf with user_id, item_id, and label fields

D.JSON lines file with user_id, item_id, and label fields

AnswerC

RecordIO-protobuf is the required format for SageMaker's built-in Factorization Machines.

Why this answer

The built-in Factorization Machines algorithm in Amazon SageMaker requires the RecordIO-protobuf format for optimal performance, as it allows efficient binary serialization and direct integration with SageMaker's distributed training infrastructure. This format supports sparse data representation, which is critical for high-dimensional user-item interaction data, and enables faster I/O and reduced memory overhead compared to text-based formats.

Exam trap

The trap here is that candidates often assume libsvm or CSV are universally optimal for sparse data, but SageMaker's built-in Factorization Machines specifically requires RecordIO-protobuf for native sparse tensor support and maximum performance, not just any text-based sparse format.

How to eliminate wrong answers

Option A is wrong because libsvm format, while common for linear models and SVMs, is not supported for training by SageMaker's built-in Factorization Machines algorithm, which expects RecordIO-protobuf. Option B is wrong because CSV is a dense text format that is impractical for large, sparse user-item data with millions of pairs, and the built-in algorithm does not accept it for training. Option D is wrong because JSON lines is not a supported training format for the built-in Factorization Machines algorithm; using it would require custom preprocessing or a custom container.

261

MCQhard

A team is training a large deep learning model on Amazon SageMaker. The training job is taking too long and they want to reduce training time without changing the model architecture. Which action is MOST effective?

A.Switch to a compute-optimized instance like c5.4xlarge

B.Use a GPU instance (e.g., p3.2xlarge) for training

C.Increase the batch size and learning rate proportionally

D.Use SageMaker Automatic Model Tuning with hyperparameter optimization

AnswerB

GPUs dramatically speed up matrix operations common in deep learning.

Why this answer

Using a SageMaker managed training instance with GPU (e.g., p3.2xlarge) provides significant acceleration for deep learning models due to parallel processing.

262

Multi-Selectmedium

A data scientist is training a neural network for image classification. The training loss decreases but validation loss increases after a few epochs. Which TWO actions should be taken to address this?

Select 2 answers

A.Implement early stopping based on validation loss.

B.Increase the learning rate.

C.Increase the dropout rate.

D.Increase the number of epochs.

E.Add more convolutional layers.

AnswersA, C

Early stopping prevents overfitting by halting training when validation loss stops improving.

Why this answer

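Option A is correct because early stopping monitors validation loss and halts training when it stops improving, preventing overfitting. This directly addresses the symptom of decreasing training loss with increasing validation loss, which is a classic sign of overfitting. Option C is also correct because a higher dropout rate adds regularization, which reduces the model's tendency to memorize the training data.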

Exam trap

AWS often tests the distinction between underfitting and overfitting solutions, where candidates mistakenly choose capacity-increasing options (like more layers or epochs) when the problem is overfitting, not underfitting.
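
Early stopping on validation loss can be sketched with a patience counter. This is a minimal illustration: the function name `early_stopping_epoch`, the `patience` value, and the loss values are hypothetical, and a real training loop would also save a checkpoint at each new best epoch so those weights can be restored.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch's weights are kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

# Hypothetical validation losses: improving, then rising (overfitting).
val_losses = [0.90, 0.70, 0.58, 0.52, 0.55, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
```

With these values the best validation loss occurs at epoch 4 and training halts at epoch 6, after two epochs without improvement.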

263

MCQhard

A machine learning team is using Amazon SageMaker to train a large language model. The training script uses PyTorch and the model requires significant memory. The team wants to use model parallelism across multiple GPUs. Which SageMaker feature should they use?

A.SageMaker Distributed Training

B.SageMaker model parallelism library

C.SageMaker Horovod

D.SageMaker Debugger

AnswerB

SMP is specifically designed for model parallelism.

Why this answer

SageMaker's model parallelism library (SMP) is designed for distributed training of large models across GPUs. Horovod is for data parallelism, not model parallelism. SageMaker Debugger is for monitoring training.

Distributed Training is a generic term; the specific library is SMP.

264

MCQeasy

A data scientist is training a neural network for image classification. The training loss is decreasing steadily, but the validation loss starts increasing after a few epochs. What is the MOST likely cause?

A.The learning rate is too high

B.The gradients are vanishing

C.The model is underfitting

D.The model is overfitting to the training data

AnswerD

Overfitting causes validation loss to increase.

Why this answer

The correct answer is D because the validation loss increasing while the training loss continues to decrease is the classic signature of overfitting. The model is memorizing the training data (including noise) rather than learning generalizable patterns, causing it to perform poorly on unseen validation data.

Exam trap

AWS often tests the distinction between overfitting and underfitting by describing a scenario where training loss decreases but validation loss increases, and the trap is that candidates may mistakenly attribute this to a high learning rate or vanishing gradients instead of recognizing it as the hallmark of overfitting.

How to eliminate wrong answers

Option A is wrong because a learning rate that is too high typically causes the training loss to oscillate or diverge, not steadily decrease while validation loss increases. Option B is wrong because vanishing gradients prevent the model from learning at all, resulting in stagnant training loss, not a decreasing training loss with increasing validation loss. Option C is wrong because underfitting means the model fails to capture patterns in the training data, leading to high training loss that does not decrease adequately, which contradicts the described steady decrease in training loss.

265

MCQmedium

Refer to the exhibit. An IAM policy is attached to a SageMaker execution role. A data scientist tries to create a training job that reads training data from s3://my-bucket/confidential/data.csv. What will happen?

A.The training job will succeed because there is an Allow on my-bucket/*

B.The training job will succeed because the Deny statement is invalid

C.The training job will fail because the role lacks sagemaker:CreateTrainingJob

D.The training job will fail with an access denied error

AnswerD

The Deny statement blocks access to the confidential prefix.

Why this answer

The policy allows s3:GetObject on my-bucket/* but explicitly denies s3:GetObject on my-bucket/confidential/*. Since explicit Deny overrides any Allow, the training job will fail with an access denied error.
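
The "explicit Deny overrides Allow" rule can be illustrated with a heavily simplified evaluator. This sketch is not how IAM is implemented: it ignores principals, conditions, and action wildcards, and the `is_allowed` helper and the policy statements are hypothetical reconstructions of the policy described above.

```python
from fnmatch import fnmatchcase

def is_allowed(statements, action, resource):
    """Simplified evaluation: an explicit Deny wins, then any Allow,
    otherwise the request is implicitly denied."""
    matching = [
        s for s in statements
        if action in s["Action"] and fnmatchcase(resource, s["Resource"])
    ]
    if any(s["Effect"] == "Deny" for s in matching):
        return False
    return any(s["Effect"] == "Allow" for s in matching)

# Hypothetical statements mirroring the policy described above.
policy = [
    {"Effect": "Allow", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::my-bucket/*"},
    {"Effect": "Deny", "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::my-bucket/confidential/*"},
]

confidential = is_allowed(policy, "s3:GetObject",
                          "arn:aws:s3:::my-bucket/confidential/data.csv")
public = is_allowed(policy, "s3:GetObject",
                    "arn:aws:s3:::my-bucket/public/data.csv")
```

The confidential object matches both statements, so the Deny takes precedence; the other object matches only the Allow and is readable.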

266

MCQeasy

A data scientist is using Amazon SageMaker to train a linear regression model. The dataset has 500 features and 50,000 observations. The model converges but has high bias. Which technique should the data scientist use to reduce bias?

A.Apply L2 regularization (Ridge) to penalize large coefficients.

B.Add polynomial features or interaction terms to the feature set.

C.Decrease the learning rate.

D.Use feature selection to remove irrelevant features.

E.Increase the number of training epochs.

AnswerB

Increasing model complexity reduces bias.

Why this answer

Option B is correct because adding interaction or polynomial features allows the linear model to capture non-linear relationships, reducing bias. Option A (regularization) reduces variance, not bias. Option C (decreasing the learning rate) affects convergence speed, not bias.

Option D (feature selection) reduces complexity and may increase bias. Option E (more epochs) does not help because the model has already converged.

267

MCQmedium

A data scientist is training a binary classifier on an imbalanced dataset where the positive class represents only 2% of the data. The model achieves 98% accuracy but only identifies 5% of actual positives. Which metric should the scientist use to evaluate the model's ability to detect the positive class?

A.Accuracy

B.F1-score

C.Precision

D.Recall

AnswerD

Recall directly measures the fraction of actual positives captured.

Why this answer

Recall (sensitivity) measures the proportion of actual positives correctly identified, which is the key concern here. Accuracy is misleading due to class imbalance.
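
A short calculation shows how accuracy can look excellent while recall is poor. The confusion-matrix counts below are hypothetical, chosen for a dataset with 2% positives of which only 5% are detected; the helper name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical counts: 10,000 samples, 200 positives (2%), 10 of them detected.
metrics = classification_metrics(tp=10, fp=10, fn=190, tn=9790)
```

Here accuracy is 0.98 because the 9,790 true negatives dominate, while recall is only 0.05, which is why recall is the metric that exposes the problem.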

268

MCQeasy

Refer to the exhibit. A data scientist wants to update the endpoint to use a new model image. The scientist updates the endpoint configuration with the new image and calls UpdateEndpoint. After the update, the endpoint status is 'Updating' but remains in that state for a long time. What is the most likely cause?

A.The new model image is failing health checks

B.The endpoint is already InService, so it cannot be updated

C.The old model image is no longer available

D.The instance count is too low to deploy both variants

AnswerA

SageMaker waits for the new variant to pass health checks; failure can cause indefinite updating.

Why this answer

A blue/green deployment requires the new model to be healthy before traffic is shifted. If the new model fails health checks, the update may hang. Option B is wrong because an endpoint that is InService can be updated with UpdateEndpoint.

Option C is wrong because the old model is still running. Option D is wrong because there is no indication of insufficient capacity.

269

Multi-Selecthard

A machine learning engineer is evaluating a classification model that predicts whether a transaction is fraudulent. The model outputs a probability score. The cost of a false negative (missed fraud) is 10 times higher than the cost of a false positive (false alarm). Which TWO evaluation metrics should the engineer use to tune the model? (Choose TWO.)

Select 2 answers

A.F-beta score with beta = 2

B.Accuracy

C.Log loss

D.ROC-AUC

E.Precision-Recall curve

AnswersA, E

F-beta with beta > 1 weights recall higher than precision, matching the cost structure.

Why this answer

The Precision-Recall curve and the F-beta score (with beta > 1) emphasize recall, which is important when false negatives are costly. Option D (ROC-AUC) is less sensitive to class imbalance. Option B (accuracy) is misleading for imbalanced data.

Option C (log loss) is not directly tied to cost.
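
The way beta shifts the balance toward recall can be seen by scoring two operating points from a precision-recall curve. The function below implements the standard F-beta formula; the two operating points are hypothetical values chosen for illustration.

```python
def f_beta(precision, recall, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Two hypothetical operating points on a precision-recall curve.
high_precision = {"precision": 0.90, "recall": 0.50}
high_recall = {"precision": 0.50, "recall": 0.90}

f1_a = f_beta(**high_precision, beta=1)  # F1 treats both points equally
f1_b = f_beta(**high_recall, beta=1)
f2_a = f_beta(**high_precision, beta=2)  # F2 favours the high-recall point
f2_b = f_beta(**high_recall, beta=2)
```

F1 cannot distinguish the two thresholds, whereas F2 ranks the high-recall point higher, which matches a cost structure where missed fraud is more expensive than a false alarm.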

270

Multi-Selecthard

A company is using Amazon SageMaker to train a large language model. The training job is taking too long. The data scientist wants to reduce training time without sacrificing model accuracy. Which THREE strategies are MOST appropriate?

Select 3 answers

A.Use mixed precision training (float16)

B.Increase the batch size to utilize GPU memory more efficiently

C.Switch from GPU instance to CPU instance

D.Increase the maximum sequence length

E.Use gradient accumulation to increase effective batch size

AnswersA, B, E

Mixed precision reduces memory and speeds up training on GPUs.

Why this answer

Mixed precision training (float16) reduces memory usage and accelerates computation by using half-precision floating-point numbers for most operations, while maintaining a single-precision copy of critical parameters to preserve accuracy. This directly reduces training time on compatible GPUs (e.g., NVIDIA V100, A100) without sacrificing model quality, as the loss scaling technique prevents underflow in gradients.

Exam trap

AWS often tests the misconception that increasing batch size always speeds up training, but without gradient accumulation, a larger batch size may exceed GPU memory limits and cause out-of-memory errors, while gradient accumulation safely simulates a larger batch size without increasing memory usage.

271

MCQhard

A data scientist is using Amazon SageMaker to train a gradient boosting model on a dataset with categorical features. The dataset contains a column 'UserID' with over 1 million unique values. The training is taking very long and the model size is large. Which technique would MOST effectively reduce training time and model size while maintaining accuracy?

A.Use one-hot encoding on UserID.

B.Apply feature hashing to UserID.

C.Use label encoding for UserID.

D.Remove UserID from the dataset.

AnswerB

Feature hashing maps user IDs to a fixed number of buckets (e.g., 2^14), reducing dimensionality and preserving some signal.

Why this answer

Option B is correct because hashing reduces the number of distinct categories to a fixed number of buckets, controlling dimensionality. Option A is wrong because one-hot encoding would explode the feature space. Option D is wrong because removing UserID likely loses important signal.

Option C is wrong because label encoding creates ordinal relationships that may mislead the model.
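
Feature hashing can be sketched with a stable hash function and a modulo. This is a minimal illustration under assumptions: the helper name `hash_bucket`, the choice of SHA-256, the bucket count of 2^14, and the generated user IDs are all hypothetical. A cryptographic hash is used here only because Python's built-in `hash()` for strings is salted per process and would not give reproducible buckets.

```python
import hashlib

def hash_bucket(value, n_buckets=2 ** 14):
    """Map a categorical value to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

# Hypothetical user IDs; any string identifier would work the same way.
user_ids = [f"user_{i}" for i in range(50_000)]
buckets = [hash_bucket(uid) for uid in user_ids]
distinct_buckets = len(set(buckets))  # bounded by n_buckets, not by ID count
```

However many distinct IDs arrive, the feature never takes more than 16,384 values; the trade-off is that unrelated IDs can collide in the same bucket.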

272

MCQhard

A data scientist is trying to create a SageMaker endpoint using an IAM role with the attached policy. The operation fails with 'AccessDenied'. What is the MOST likely cause?

A.The policy does not allow sagemaker:InvokeEndpoint.

B.The policy does not allow access to the S3 bucket for model artifacts.

C.The policy does not allow sagemaker:CreateEndpoint.

D.The policy does not allow sagemaker:CreateModel.

AnswerC

The failing call is endpoint creation, so the missing permission is sagemaker:CreateEndpoint.

Why this answer

The error 'AccessDenied' when creating a SageMaker endpoint indicates that the IAM role lacks the necessary permission for the specific API call being made. Since the operation is to create an endpoint, the required action is sagemaker:CreateEndpoint, so the most likely cause is that the policy does not include it.

Exam trap

The trap here is that candidates confuse the permission needed to create an endpoint (sagemaker:CreateEndpoint) with the one needed to invoke it (sagemaker:InvokeEndpoint), leading them to incorrectly select option A.

How to eliminate wrong answers

Option A is wrong because sagemaker:InvokeEndpoint is for invoking an existing endpoint for predictions, not for creating one; the error occurs during creation, so this permission is irrelevant. Option B is wrong because S3 access to model artifacts matters when the model is created and loaded, whereas the error here is 'AccessDenied' on the CreateEndpoint API call itself; a missing S3 permission would surface as an S3 access error rather than a denial of the SageMaker API call.

Option D is wrong because sagemaker:CreateModel is needed to create a model, but the operation failing is specifically the endpoint creation step, not the model creation step; a missing CreateModel permission would cause failure earlier in the pipeline.

273

MCQeasy

A company is using Amazon SageMaker to deploy a model for real-time inference. The model is updated frequently. Which deployment strategy allows for zero-downtime updates and easy rollback?

A.Canary deployment

B.A/B testing with production variants

C.Blue/green deployment using endpoint updates

D.Multi-model endpoint

AnswerC

Blue/green deployment provides zero-downtime updates and rollback.

Why this answer

SageMaker's blue/green deployment (using endpoint updates with production variants) allows traffic shifting and rollback. A/B testing is for testing variants, not zero-downtime updates by itself. Canary deployment is a type of blue/green but not a separate AWS feature.

Multi-model endpoints are for hosting multiple models.

274

MCQmedium

A data scientist is training a recurrent neural network (RNN) for time series forecasting. The training loss decreases steadily for the first 10 epochs, then plateaus. The validation loss starts increasing after epoch 10. What is the most appropriate action?

A.Stop training early and use the model from epoch 10

B.Continue training for more epochs

C.Add more layers to the network

D.Increase the batch size

AnswerA

Early stopping prevents overfitting; the model at epoch 10 generalizes better.

Why this answer

Option A is correct because validation loss increasing while training loss no longer improves indicates overfitting. Early stopping halts training before overfitting worsens and keeps the epoch-10 model. Option B (more epochs) would increase overfitting.

Option D (larger batch size) does not directly address overfitting. Option C (more layers) may exacerbate overfitting.
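A minimal sketch of the early-stopping rule, assuming a patience of two epochs and made-up validation losses that improve until epoch 10 and then rise. The function name and the numbers are illustrative only.

```python
def best_epoch_with_early_stopping(val_losses, patience=2):
    """Return the 1-based epoch whose checkpoint should be kept."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch


# Hypothetical validation losses: improving until epoch 10, then rising.
val_losses = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.38, 0.36, 0.40, 0.45, 0.50]
print(best_epoch_with_early_stopping(val_losses))  # 10
```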

275

MCQeasy

A company wants to use Amazon SageMaker to train a model, but the training data contains personally identifiable information (PII). The data scientist needs to ensure that the PII is not accessible during training. The data is stored in S3. What is the MOST secure approach?

A.Configure a VPC for the SageMaker notebook and training job.

B.Load the data into Amazon Redshift and use Redshift ML.

C.Use server-side encryption with S3-managed keys (SSE-S3).

D.Use SageMaker's built-in mechanisms to encrypt data at rest and in transit, and ensure the training container does not have direct S3 access.

AnswerD

SageMaker can use a KMS key to encrypt data, and by not granting S3 access to the container, PII is protected.

Why this answer

Option D is correct. Using SageMaker with a SageMaker-managed KMS key encrypts data at rest and in transit, and the training container cannot access S3 directly. Option C is wrong because encryption with SSE-S3 alone does not prevent access.

Option B is wrong because Redshift adds complexity. Option A is wrong because a VPC alone does not encrypt data.

276

MCQhard

A data scientist is training a convolutional neural network (CNN) for image classification using Amazon SageMaker. The training loss decreases steadily but validation loss starts increasing after a few epochs. Which action should the data scientist take to address this issue?

A.Increase the learning rate

B.Add more convolutional layers

C.Increase the batch size

D.Implement early stopping based on validation loss

AnswerD

Early stopping prevents overfitting by stopping training when validation loss plateaus or increases.

Why this answer

The described behavior, training loss decreasing while validation loss increases, is a classic sign of overfitting. Early stopping monitors the validation loss and halts training when it stops improving (or starts to increase), preventing the model from memorizing noise in the training data. In SageMaker, this can be implemented with a framework callback in the training script (for example, Keras `EarlyStopping`) or, for the built-in image classification algorithm, by setting its `early_stopping` hyperparameter to True.

Exam trap

The trap here is that candidates confuse overfitting with underfitting and choose to increase model complexity (Option B) or learning rate (Option A), not recognizing that rising validation loss signals the need to stop training rather than continue with more capacity.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate would make the optimizer take larger steps, which can cause the loss to diverge or oscillate; this destabilizes training and does nothing to curb overfitting. Option B is wrong because adding more convolutional layers increases model capacity, which typically exacerbates overfitting when validation loss is already rising. Option C is wrong because increasing the batch size provides a more accurate gradient estimate but does not directly address overfitting; it may even lead to sharper minima and poorer generalization.

277

Multi-Selecteasy

A data scientist is using Amazon SageMaker to train a linear regression model. The dataset has outliers. Which TWO techniques can help reduce the impact of outliers? (Choose TWO.)

Select 2 answers

A.Trim the dataset to remove extreme values

B.Add more features

C.Apply L1 regularization

D.Use Huber loss instead of squared error

E.Standardize the features

AnswersA, D

Removing outliers reduces their influence on the model.

Why this answer

Options A and D are correct. Huber loss is robust to outliers, and trimming the dataset removes extreme values. Option E (standardization) rescales features but does not handle outliers.

Option C (L1 regularization) reduces overfitting but not outlier impact. Option B (more features) is not relevant to outliers.
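To make the robustness of Huber loss concrete, the sketch below compares it with squared error on one small and one large residual. The threshold `delta=1.0` is an assumed example value.

```python
def squared_loss(residual):
    """Half squared error: grows quadratically with the residual."""
    return 0.5 * residual ** 2


def huber_loss(residual, delta=1.0):
    """Quadratic for small residuals, linear once |residual| exceeds delta."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)


for r in (0.5, 10.0):
    print(r, squared_loss(r), huber_loss(r))
# residual 0.5  -> both losses are 0.125
# residual 10.0 -> squared loss 50.0, Huber loss 9.5
```

For the outlier-sized residual the Huber penalty grows only linearly, so a single extreme point pulls the fitted line far less than it does under squared error.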

278

MCQhard

A team is using Amazon SageMaker to train a model. The training job repeatedly fails with a 'ResourceLimitExceeded' error. Which action should the team take to resolve this issue?

A.Request a service limit increase for SageMaker resources.

B.Reduce the size of the training dataset.

C.Switch to using Spot Instances.

D.Use a different instance type with less memory.

AnswerA

ResourceLimitExceeded indicates the account limit has been reached; requesting an increase is the standard resolution.

Why this answer

The 'ResourceLimitExceeded' error in Amazon SageMaker indicates that the AWS account has reached a service quota for SageMaker resources, such as the number of concurrent training jobs, total instance count, or specific instance types. The correct action is to request a service limit increase via the AWS Service Quotas console or by contacting AWS Support, as this directly addresses the quota cap causing the failure.

Exam trap

The trap here is that candidates confuse resource limits with performance or cost issues, leading them to choose dataset reduction, Spot Instances, or smaller instance types, when the root cause is a hard AWS service quota that must be increased.

How to eliminate wrong answers

Option B is wrong because reducing the size of the training dataset does not affect the service quota limits; it might reduce training time or cost but will not resolve a 'ResourceLimitExceeded' error, which is a quota-based issue. Option C is wrong because switching to Spot Instances can reduce cost but does not increase the account's resource limits; Spot Instances are still subject to the same service quotas for instance count and concurrent jobs. Option D is wrong because using a different instance type with less memory does not change the fact that the account has hit a resource limit; the error is about exceeding a quota, not about memory capacity.

279

MCQhard

A research team is training a deep learning model for image classification using Amazon SageMaker. The model is a convolutional neural network (CNN) with 50 layers. The team uses a single ml.p3.2xlarge instance. After 10 hours of training, the model has not converged and the loss is decreasing very slowly. The team suspects vanishing gradients. They want to diagnose and fix the issue without significant code changes. Which action should they take?

A.Add more convolutional layers to increase model capacity

B.Modify the architecture to include residual connections (skip connections)

C.Use batch normalization after each convolutional layer

D.Increase the learning rate by a factor of 10

AnswerB

Residual connections allow gradients to flow directly through the network.

Why this answer

Option B (use residual connections) directly addresses vanishing gradients. Option D (increase learning rate) may cause divergence. Option A (add more layers) worsens the problem.

Option C (use batch normalization) helps, but residual connections are more targeted at vanishing gradients.

280

MCQeasy

A data scientist is training a text classification model using a bag-of-words approach. The dataset contains 1 million documents and 100,000 unique words. The resulting feature matrix is very sparse. Which technique should the data scientist use to reduce the dimensionality of the feature space?

A.Apply TF-IDF transformation

B.Use word embeddings to represent documents

C.Remove stop words from the vocabulary

D.Apply Principal Component Analysis (PCA) to the term-document matrix

AnswerB

Word embeddings create dense low-dimensional vectors, reducing sparsity and dimensionality.

Why this answer

Word embeddings (e.g., Word2Vec, GloVe) map words to dense, low-dimensional vectors that capture semantic relationships, effectively reducing the 100,000-dimensional sparse bag-of-words feature space to a much smaller dense representation (e.g., 100–300 dimensions). This directly addresses the sparsity and high dimensionality of the term-document matrix while preserving meaningful word context.

Exam trap

The trap here is that candidates confuse TF-IDF (a reweighting technique) with dimensionality reduction, or assume PCA can be directly applied to sparse text matrices without considering computational cost and loss of interpretability.

How to eliminate wrong answers

Option A is wrong because TF-IDF is a weighting scheme that reweights term frequencies based on inverse document frequency, but it does not reduce the number of features; the feature space remains 100,000 dimensions and still sparse. Option C is wrong because removing stop words reduces the vocabulary size only marginally (typically a few hundred words) and does not significantly reduce the 100,000 unique words or address the sparsity of the feature matrix. Option D is wrong because PCA is a linear dimensionality reduction technique that is computationally infeasible on a 1 million × 100,000 sparse matrix (dense covariance matrix would be 100k × 100k) and destroys the sparse structure without capturing semantic relationships.

281

Multi-Selecthard

A data scientist is tuning a gradient boosting model using Amazon SageMaker Automatic Model Tuning (AMT). Which THREE hyperparameters should the scientist consider tuning to reduce overfitting? (Select THREE.)

Select 3 answers

A.Subsample ratio

B.Learning rate (eta)

C.Minimum child weight (min_child_weight)

D.Gamma (minimum loss reduction)

E.Maximum depth (max_depth)

AnswersB, C, D

Lower learning rate reduces overfitting.

Why this answer

Learning rate (eta) controls the contribution of each tree to the ensemble. A lower learning rate forces the model to learn more slowly, requiring more trees but reducing the risk of overfitting by preventing any single tree from having too much influence on the final prediction.

Exam trap

The trap here is that candidates often assume all listed hyperparameters are equally effective for reducing overfitting, but the exam expects knowledge that subsample ratio and maximum depth are also valid regularization parameters, yet the question specifically selects min_child_weight, gamma, and learning rate as the three to focus on.

282

Multi-Selectmedium

A company is building a sentiment analysis model for customer reviews. The dataset is balanced with 10,000 positive and 10,000 negative reviews. The model achieves 95% accuracy on the test set but fails to generalize to new reviews from a different product category. Which TWO techniques can improve generalization?

Select 2 answers

A.Increase the training dataset size by collecting more reviews

B.Use stratified k-fold cross-validation during training

C.Apply L2 regularization to the model

D.Add more features like review length and word count

E.Use a more complex model with more layers

AnswersB, C

Cross-validation provides a more reliable estimate of generalization and helps tune hyperparameters.

Why this answer

Option B is correct because stratified k-fold cross-validation ensures that each fold maintains the same class distribution as the original dataset, which helps the model learn more robust patterns across different subsets of data. This technique reduces variance in the evaluation and improves generalization to unseen data from different product categories by preventing overfitting to idiosyncrasies of a single train-test split.

Exam trap

The trap here is that candidates often assume increasing data size or model complexity always improves generalization, but the question specifically tests the understanding that cross-validation techniques like stratified k-fold directly address overfitting and domain shift by providing a more reliable estimate of model performance across diverse data splits.

283

Multi-Selectmedium

A data scientist is tuning a random forest model using SageMaker Hyperparameter Tuning. The objective metric is validation:accuracy. Which THREE hyperparameters are most commonly tuned for random forest? (Choose THREE.)

Select 3 answers

A.Learning rate

B.Minimum samples per leaf (min_samples_leaf)

C.Maximum depth (max_depth)

D.Number of trees (n_estimators)

E.Batch size

AnswersB, C, D

This parameter helps prevent overfitting.

Why this answer

Options B, C, and D are correct. Common tunable hyperparameters for random forest include number of trees (n_estimators), maximum depth (max_depth), and minimum samples per leaf (min_samples_leaf). Option A (learning rate) is for gradient boosting.

Option E (batch size) is for neural networks.

284

MCQeasy

A data scientist is training a linear regression model on a dataset with 10 features. After training, the model has high variance on the test set. Which technique should the data scientist use to reduce variance without significantly increasing bias?

A.Use L2 regularization

B.Add more features

C.Use a simpler model

D.Use a deeper decision tree

AnswerA

L2 regularization penalizes large coefficients, reducing variance.

Why this answer

L2 regularization (Ridge regression) adds a penalty term proportional to the square of the magnitude of the coefficients, which shrinks them toward zero. This reduces model complexity and variance by preventing any single feature from having an overly large influence, without eliminating features entirely, thus keeping bias relatively low.
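The shrinkage effect can be seen in a toy one-feature case. The sketch below assumes a no-intercept model, for which the ridge solution has the closed form w = Σxy / (Σx² + λ); the data and λ values are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge solution for a single feature with no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated with a true slope of 2

print(ridge_slope(xs, ys, 0.0))   # 2.0 (ordinary least squares)
print(ridge_slope(xs, ys, 14.0))  # 1.0 (coefficient shrunk toward zero)
```

A larger λ shrinks the coefficient further without ever setting it exactly to zero, which is the behavior the explanation attributes to L2 regularization.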

Exam trap

AWS often tests the distinction between L1 (Lasso) and L2 (Ridge) regularization, and the trap here is that candidates might think adding more features or using a simpler model is the only way to reduce variance, overlooking that L2 regularization can reduce variance without the drastic bias increase of feature elimination.

How to eliminate wrong answers

Option B is wrong because adding more features increases model complexity, which typically increases variance further, not reduces it. Option C is wrong because using a simpler model (e.g., reducing the number of features or using a less flexible algorithm) would reduce variance but at the cost of a significant increase in bias, violating the requirement to not significantly increase bias. Option D is wrong because a deeper decision tree increases model complexity and variance, which is the opposite of what is needed to address high variance.

285

Multi-Selecthard

Which THREE of the following are best practices for training a deep learning model on Amazon SageMaker?

Select 3 answers

A.Use Pipe mode for large datasets to reduce I/O overhead

B.Use SageMaker Debugger to automatically fix training errors

C.Set up automatic model tuning (hyperparameter optimization)

D.Use SageMaker Debugger to profile GPU utilization

E.Train on a single instance to avoid distributed training overhead

AnswersA, C, D

Pipe mode streams data directly, reducing disk I/O.

Why this answer

Profiling GPU utilization with SageMaker Debugger (option D) helps identify bottlenecks. Using Pipe mode for large datasets (option A) reduces I/O. Setting up automatic model tuning (option C) is a best practice for hyperparameter optimization.

Option E is wrong because training on a single instance is not a best practice for large models. Option B is wrong because Debugger monitors and profiles training jobs; it does not automatically fix training errors.

286

MCQeasy

A data scientist is trying to run a SageMaker training job that writes output to an S3 bucket 'my-bucket'. The IAM policy is shown. The training job fails with an AccessDenied error when trying to write to S3. What is the reason?

A.The S3 bucket is encrypted with AWS KMS and the policy does not include kms:GenerateDataKey

B.The policy does not allow s3:ListBucket

C.The policy does not allow s3:PutObject

D.The policy does not allow s3:PutObjectAcl

AnswerA

When KMS encryption is used, SageMaker needs kms:GenerateDataKey permission to write.

Why this answer

The correct answer is A because the training job fails with an AccessDenied error when writing to an S3 bucket that is encrypted with AWS KMS. The IAM policy shown must include the `kms:GenerateDataKey` permission to allow the SageMaker training job to generate a data key for encrypting the output objects. Without this KMS permission, the S3 PutObject operation is denied even if the policy allows `s3:PutObject`, as KMS encryption requires explicit authorization to use the customer master key (CMK).
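Since the policy itself is not reproduced here, the following is only a sketch of what a policy granting both permissions might look like. The KMS key ARN and account ID are placeholders, and a real role may need further actions (for example `kms:Decrypt` to read encrypted inputs).

```python
import json


def training_output_policy(bucket, kms_key_arn):
    """Build an IAM policy document allowing writes to an SSE-KMS bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Permission to write objects into the output bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # Permission to obtain a data key for encrypting those objects.
                "Effect": "Allow",
                "Action": ["kms:GenerateDataKey"],
                "Resource": kms_key_arn,
            },
        ],
    }


# The key ARN below is a placeholder, not a real key.
policy = training_output_policy(
    "my-bucket", "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
)
print(json.dumps(policy, indent=2))
```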

Exam trap

The trap here is that candidates often focus only on S3 permissions (like `s3:PutObject`) and overlook the need for KMS permissions when the bucket uses SSE-KMS, leading them to incorrectly select Option C or D.

How to eliminate wrong answers

Option B is wrong because `s3:ListBucket` is not required for writing objects to S3; it is needed for listing bucket contents, not for PutObject operations. Option C is wrong because the policy likely includes `s3:PutObject` (as the question implies the policy allows writing), but the AccessDenied error stems from missing KMS permissions, not from a missing PutObject action. Option D is wrong because `s3:PutObjectAcl` is only required when explicitly setting object ACLs during upload, which is not a default behavior for SageMaker training jobs; the error is not related to ACL management.

287

MCQhard

A data scientist is using SageMaker to train a TensorFlow model. The training script uses tf.data.Dataset to load data from S3. Training is slow because of I/O bottleneck. Which change should the data scientist make to improve I/O performance?

A.Enable EBS optimization on the training instance.

B.Use Pipe input mode for the training channel.

C.Use SageMaker local mode for training.

D.Convert the dataset to RecordIO format.

AnswerB

Pipe mode streams data directly from S3, reducing I/O overhead.

Why this answer

Option B is correct because Pipe input mode streams data directly from S3 into the training algorithm without writing to disk, eliminating the I/O bottleneck caused by downloading entire files. This is particularly effective with tf.data.Dataset, as the pipeline can consume data incrementally, reducing latency and improving throughput for large datasets.

Exam trap

The trap here is that candidates often confuse EBS optimization (which improves local disk performance) with S3 data access optimization, or assume that RecordIO is a universal performance fix, ignoring that TensorFlow's native pipeline benefits more from streaming input modes.

How to eliminate wrong answers

Option A is wrong because EBS optimization improves network throughput for EBS volumes, but the training script loads data from S3, not from an EBS volume; the bottleneck is S3 I/O, not EBS. Option C is wrong because SageMaker local mode runs training on the local instance's file system, which does not address S3 I/O bottlenecks and may even exacerbate them if data must be downloaded first. Option D is wrong because converting to RecordIO format is beneficial for SageMaker's built-in algorithms (e.g., XGBoost) that natively support it, but TensorFlow's tf.data.Dataset works optimally with native formats like TFRecord; RecordIO does not improve S3 streaming performance and adds unnecessary conversion overhead.

288

Multi-Selecteasy

A data scientist is performing hyperparameter optimization for a gradient boosting model using Amazon SageMaker Automatic Model Tuning. The objective metric is 'validation:logloss'. Which TWO strategies can help the tuning job converge faster? (Choose TWO.)

Select 2 answers

A.Use Bayesian optimization strategy

B.Increase the number of tuning jobs

C.Increase the resource limits for each training job

D.Use random search strategy

E.Use early stopping based on the objective metric

AnswersA, E

Bayesian optimization intelligently selects hyperparameters to converge faster.

Why this answer

Options A and E are correct. Early stopping terminates poorly performing jobs early, saving resources. Bayesian optimization is more efficient than random search.

Option D is wrong because random search is less efficient. Option B is wrong because more tuning jobs increase time to convergence. Option C is wrong because increasing resource limits does not speed convergence.
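As a sketch of how the two correct strategies are expressed, the dictionary below follows the shape of the `HyperParameterTuningJobConfig` argument of the CreateHyperParameterTuningJob API. The parameter ranges and job limits are arbitrary example values, not recommendations.

```python
def tuning_job_config(max_jobs=20, max_parallel=2):
    """Build a tuning config using Bayesian search and automatic early stopping."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "Type": "Minimize",
            "MetricName": "validation:logloss",
        },
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # The API expects range bounds as strings.
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.01", "MaxValue": "0.3",
                 "ScalingType": "Logarithmic"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "3", "MaxValue": "8",
                 "ScalingType": "Linear"},
            ],
        },
        # Stop training jobs that are unlikely to beat the best one so far.
        "TrainingJobEarlyStoppingType": "Auto",
    }


config = tuning_job_config()
print(config["Strategy"], config["TrainingJobEarlyStoppingType"])  # Bayesian Auto
```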

289

Multi-Selectmedium

A data scientist is training a binary classification model on an imbalanced dataset (95% negative class, 5% positive class). The model currently achieves 94% accuracy but a recall of only 0.10 on the positive class. Which TWO strategies should the data scientist consider to improve recall without significantly sacrificing precision? (Choose 2.)

Select 2 answers

A.Undersample the majority class to match the minority class size.

B.Increase the regularization strength to reduce overfitting.

C.Assign higher class weights to the positive class in the loss function.

D.Use a deeper neural network with more layers.

E.Oversample the minority class using SMOTE.

AnswersC, E

Higher weight for positive class penalizes false negatives, improving recall.

Why this answer

Oversampling the minority class with SMOTE (option E) increases the number of positive examples, which helps the model learn better decision boundaries for the positive class. Using class weights (option C) penalizes misclassifications of the minority class more heavily, encouraging the model to focus on positive examples. Both techniques directly address class imbalance.

Option A (undersampling) may discard useful negative samples and harm performance. Option B (increasing regularization) typically reduces overfitting but does not specifically improve recall. Option D (using a deeper network) may increase overfitting and does not target recall directly.
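One common way to choose class weights is the inverse-frequency heuristic, weight = total / (number of classes × class count). The sketch below applies it to hypothetical counts matching a 95%/5% split; the heuristic is an assumption, since the question does not specify how the weights are set.

```python
def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Hypothetical counts for a 95% / 5% split.
weights = balanced_class_weights({"negative": 9500, "positive": 500})
print(weights["positive"])  # 10.0
print(round(weights["negative"], 3))  # 0.526
```

With these weights a misclassified positive example costs 19 times as much as a misclassified negative one, which pushes the model toward higher recall on the positive class.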

290

MCQeasy

A company wants to build a real-time anomaly detection system for IoT sensor data. The data arrives as a stream of numerical values. The model should adapt to concept drift over time. Which approach is most suitable?

A.Train an online learning model, such as stochastic gradient descent (SGD) with a sliding window

B.Use a static deep learning model trained once on historical data

C.Use a stateful LSTM with fixed weights

D.Batch train a random forest model monthly

AnswerA

Online learning updates the model incrementally, allowing adaptation to concept drift.

Why this answer

Option A is correct because online learning with stochastic gradient descent (SGD) using a sliding window allows the model to continuously update its parameters as new IoT sensor data arrives, adapting to concept drift without retraining from scratch. The sliding window ensures that the model focuses on the most recent data distribution, discarding outdated patterns, which is essential for real-time anomaly detection in streaming environments.
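The sketch below illustrates the incremental-update idea on a toy stream whose underlying slope changes halfway through. It uses plain per-sample SGD on a one-parameter linear model and omits the sliding window for brevity; the stream, learning rate, and function names are all assumptions for illustration.

```python
def online_sgd(stream, learning_rate=0.1):
    """Update a one-parameter linear model y = w * x one sample at a time."""
    w = 0.0
    for x, y in stream:
        error = y - w * x
        w += learning_rate * error * x  # gradient step on squared error
    return w


def make_stream(n_before=200, n_after=200):
    """Synthetic sensor stream: slope 2.0 at first, then drifting to -1.0."""
    for i in range(n_before + n_after):
        x = (i % 10 + 1) / 10.0
        slope = 2.0 if i < n_before else -1.0
        yield x, slope * x


print(round(online_sgd(make_stream(n_after=0)), 2))  # 2.0 before the drift
print(round(online_sgd(make_stream()), 2))           # -1.0 after adapting
```

Because every new sample nudges the weight, the model tracks the new relationship without retraining from scratch, whereas a model with fixed weights would keep predicting with the old slope.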

Exam trap

AWS often tests the misconception that stateful recurrent models (like LSTMs) inherently adapt to concept drift, but without weight updates they remain static; the trap here is confusing 'statefulness' (which preserves temporal context across batches) with 'online learning' (which updates model parameters).

How to eliminate wrong answers

Option B is wrong because a static deep learning model trained once on historical data cannot adapt to concept drift; it will become stale as the data distribution changes over time, leading to degraded anomaly detection performance. Option C is wrong because a stateful LSTM with fixed weights does not update its parameters after deployment, so it cannot adapt to evolving patterns in the streaming data, and its statefulness alone does not enable learning from new data. Option D is wrong because batch training a random forest model monthly introduces a significant delay between data arrival and model update, which is unsuitable for real-time anomaly detection and cannot handle gradual or sudden concept drift between retraining intervals.

291

MCQmedium

A company is training a deep learning model on Amazon SageMaker using a large dataset stored in S3. The training job is failing with an error indicating insufficient memory. The model architecture and hyperparameters are fixed. Which change is MOST likely to resolve the issue without modifying the model code?

A.Enable SageMaker's distributed data parallelism.

B.Use managed Spot training to get cheaper compute.

C.Use a larger instance type with more memory.

D.Use Pipe mode for input data instead of File mode.

AnswerA

Distributed data parallelism splits the minibatch across multiple GPUs/instances, reducing per-device memory footprint.

Why this answer

Option A is correct because enabling data parallelism with SageMaker distributed training splits the data across multiple instances, reducing per-instance memory usage. Option C is wrong because increasing instance memory does not address the root cause if the training script uses memory inefficiently. Option D is wrong because using Pipe mode reduces disk usage but not memory.

Option B is wrong because Spot instances affect cost, not memory.

292

MCQeasy

A data scientist trains a linear regression model to predict house prices. The model has high bias (underfitting). Which action is most likely to reduce bias?

A.Reduce the number of features

B.Decrease the maximum depth of the tree

C.Increase model complexity

D.Add L1 regularization

AnswerC

More complex models can capture underlying patterns better, reducing bias.

Why this answer

Increasing model complexity (e.g., adding polynomial features or using a more flexible algorithm) can reduce bias. Adding L1 regularization increases bias, reducing features reduces complexity, and lowering max_depth for a tree also increases bias.

293

MCQmedium

A data scientist is using Amazon SageMaker built-in XGBoost algorithm to train a regression model. The training job completes successfully but the model performance on the test set is poor, with high bias. Which hyperparameter adjustment is most likely to help reduce bias?

A.Increase the max_depth parameter.

B.Reduce the num_round parameter.

C.Increase the gamma parameter.

D.Decrease the max_depth parameter.

AnswerA

Increasing max_depth allows trees to learn more complex patterns, reducing bias.

Why this answer

High bias (underfitting) can be reduced by increasing the model complexity. Increasing max_depth allows more complex trees. Decreasing max_depth would increase bias.

Increasing gamma increases regularization and bias. Reducing num_round (number of trees) reduces complexity.

294

MCQeasy

A data scientist is training a binary classification model on a highly imbalanced dataset where the positive class represents only 1% of the data. Which metric should be used to evaluate model performance during training to ensure the model is learning to detect the positive class?

A.F1 score

B.Accuracy

C.Precision

D.Recall

AnswerA

F1 score balances precision and recall, suitable for imbalanced classification.

Why this answer

Accuracy is misleading for imbalanced datasets because a model that predicts the majority class all the time can achieve 99% accuracy. F1 score balances precision and recall, making it suitable for imbalanced classification. Precision, recall, and AUC are also useful, but F1 is a common single metric for imbalanced binary classification.

Option A: F1 score is correct. Option B: Accuracy is not suitable. Option C: Precision alone ignores recall.

Option D: Recall alone ignores precision.
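To see why accuracy misleads here, the sketch below scores a trivial classifier that always predicts the negative class on a made-up dataset with 1% positives. The helper function is illustrative, not a library API.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 for the positive class labelled 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical data: 10 positives out of 1,000 samples (1%).
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # always predict the majority class

print(accuracy_and_f1(y_true, y_pred))  # (0.99, 0.0)
```

The classifier never finds a positive sample, yet accuracy reports 99%; the F1 score of 0 exposes the failure.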

295

MCQmedium

A data scientist is training a binary classification model on a dataset with 100,000 positive samples and 1,000 negative samples. The model achieves 99% accuracy on the test set but a very low F1 score. What is the most likely cause?

A.The test set contains only positive samples

B.The model is overfitting due to too many features

C.The model is underfitting due to insufficient training

D.The model predicts the majority class most of the time due to class imbalance

AnswerD

Class imbalance causes the model to be biased toward the majority class, leading to high accuracy but low F1.

Why this answer

The accuracy is high because the model predicts the majority class (positive) most of the time, but the F1 score, when computed for the minority (negative) class or macro-averaged across both classes, is low because the model fails to identify the minority class correctly. This is a classic symptom of class imbalance where the model is biased toward the majority class.

296

MCQhard

A data scientist is training a deep learning model on Amazon SageMaker and notices that training is taking much longer than expected. The training job uses a single GPU instance. The model is a large transformer with millions of parameters. Which change would most likely reduce training time?

A.Reduce the batch size to fit in memory

B.Use a smaller instance type

C.Switch to a CPU instance

D.Use SageMaker's distributed data parallelism with multiple GPU instances

AnswerD

Data parallelism splits the mini-batch across GPUs, reducing training time.

Why this answer

Using data parallelism with multiple GPU instances can significantly reduce training time for large models by distributing the workload across multiple GPUs. Model parallelism is also possible but data parallelism is more common and easier to implement.

297

Multi-Selecthard

A machine learning team is building a multi-class image classifier using a pre-trained ResNet-50 model in Amazon SageMaker. The dataset has 10 classes but is highly imbalanced, with one class representing 80% of the samples. The team wants to improve model performance on the minority classes. Which TWO of the following approaches are most likely to help? (Select TWO.)

Select 2 answers

A.Oversample the minority classes in the training data.

B.Reduce the batch size to increase the frequency of weight updates.

C.Increase the number of layers in the model.

D.Switch to a focal loss function.

E.Use class weighting in the loss function.

AnswersA, E

Oversampling increases representation of minority classes, balancing the training set.

Why this answer

Oversampling the minority classes (Option A) directly addresses class imbalance by replicating samples from underrepresented classes, giving the model more exposure to them during training. This is a standard data-level technique that helps the ResNet-50 model learn discriminative features for minority classes without altering the loss function or model architecture.

Exam trap

The trap here is that candidates may incorrectly select focal loss (Option D) as a standalone answer, but the question requires exactly two correct options, and class weighting (Option E) is a more straightforward loss-modification technique that is explicitly tested in the MLS-C01 exam as a standard approach for imbalanced classification.

298

MCQeasy

A startup is building a recommendation system for an e-commerce platform using collaborative filtering. They have a dataset of user-item interactions (ratings) with 1 million users and 100,000 items. The data is sparse (99% missing ratings). They need to train a model on Amazon SageMaker that can handle large-scale sparse data efficiently. Which approach should they use?

A.Use PCA to reduce dimensionality and then apply k-nearest neighbors

B.Use the built-in Factorization Machines algorithm in SageMaker

C.Use the built-in XGBoost algorithm with one-hot encoding for user and item IDs

D.Implement a neural network with dense layers using the built-in MXNet framework

AnswerB

Factorization Machines are designed for sparse data and scale well.

Why this answer

SageMaker's Factorization Machines handle sparse data efficiently and are designed for recommendation tasks.
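
To show why the model suits sparse data, here is a minimal sketch of a second-order factorization machine score in plain Python. It is not the SageMaker implementation; the parameter values and the one-hot feature layout (two users, three items) are hypothetical. The score is a bias plus linear terms plus pairwise interactions weighted by dot products of latent vectors, and the pairwise part is computed with the standard linear-time identity so that only non-zero features are touched.

```python
def fm_predict(x, w0, w, v):
    """Second-order factorization machine score for a sparse input.

    x:  dict {feature_index: value} holding only the non-zero features
    w0: global bias
    w:  list of linear weights, one per feature
    v:  list of latent vectors, one per feature
    """
    linear = sum(w[i] * xi for i, xi in x.items())
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * xi for i, xi in x.items())
        s_sq = sum((v[i][f] * xi) ** 2 for i, xi in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, v):
    """Same score via the explicit sum over feature pairs (for checking only)."""
    idx = sorted(x)
    linear = sum(w[i] * x[i] for i in idx)
    pairwise = sum(
        sum(a * b for a, b in zip(v[i], v[j])) * x[i] * x[j]
        for p, i in enumerate(idx) for j in idx[p + 1:]
    )
    return w0 + linear + pairwise


# Hypothetical one-hot input: user 1 (features 0-1) rated item 1 (features 2-4).
x = {1: 1.0, 3: 1.0}
w0 = 0.1
w = [0.0, 0.2, 0.0, -0.1, 0.3]
v = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [0.4, 0.2], [-0.2, 0.1]]
score = fm_predict(x, w0, w, v)
```

Only the two active features contribute to the sums, so the cost per prediction depends on the number of non-zero entries rather than on the full 1 million users plus 100,000 items.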

299

MCQmedium

A data scientist is training a binary classifier to predict customer churn. The dataset has 10,000 samples, with 500 churners (positive class). The scientist trains a logistic regression model and obtains an F1-score of 0.6. To improve the F1-score, which approach is MOST likely to be effective?

A.Increase the regularization strength (C)

B.Apply PCA to reduce feature dimensionality

C.Apply SMOTE to oversample the minority class

D.Use the original dataset without any modification

AnswerC

SMOTE generates synthetic samples for the minority class, balancing the dataset and often improving F1-score.

Why this answer

The dataset is highly imbalanced (500 churners out of 10,000 samples, a 5% positive rate). Logistic regression trained on such imbalance tends to bias toward the majority class, resulting in low recall for the minority class and a poor F1-score. SMOTE (Synthetic Minority Oversampling Technique) generates synthetic samples for the minority class by interpolating between existing minority instances, which balances the class distribution and allows the model to learn a better decision boundary, directly improving recall and F1-score.
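
The interpolation step at the heart of SMOTE can be sketched as follows. This is a simplified illustration in plain Python, not a full implementation: the function name, the four minority points, and the choice of k are hypothetical, and in practice a library such as imbalanced-learn would normally be used.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical minority-class (churner) samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, which is what distinguishes SMOTE from plain duplication. Oversampling should be applied to the training split only, so that evaluation still reflects the real class distribution.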

Exam trap

The exam often tests the misconception that regularization (Option A) or dimensionality reduction (Option B) can fix class imbalance, when in fact they address overfitting and noise, not skewed class priors.

How to eliminate wrong answers

Option A is wrong because increasing regularization strength (in scikit-learn this means lowering C, which is the inverse of regularization strength) reduces model complexity and can lead to underfitting, which typically worsens performance on imbalanced data by pushing the decision boundary further toward the majority class. Option B is wrong because PCA reduces dimensionality by projecting data onto principal components that maximize variance, but it does not address class imbalance; it may even discard discriminative information for the minority class. Option D is wrong because using the original dataset without modification ignores the severe class imbalance, and the logistic regression model will continue to predict the majority class for most samples, yielding a low F1-score.

300

MCQeasy

A company wants to use Amazon SageMaker to automatically tune hyperparameters for an XGBoost model. Which built-in SageMaker feature should be used?

A.SageMaker Debugger

B.SageMaker Model Monitor

C.SageMaker Experiments

D.SageMaker Automatic Model Tuning

AnswerD

SageMaker Automatic Model Tuning is the built-in feature for hyperparameter tuning.

Why this answer

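SageMaker Automatic Model Tuning performs hyperparameter optimization by running multiple training jobs over the hyperparameter ranges you specify and selecting the best-performing combination. Option A (SageMaker Debugger) monitors training jobs. Option B (SageMaker Model Monitor) detects drift in deployed models. Option C (SageMaker Experiments) tracks trials.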
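
Conceptually, a tuning job trains many candidate models with different hyperparameter values and keeps the one with the best objective metric. The sketch below illustrates that loop with plain random search. It is not the SageMaker API: the `validation_error` function and the parameter ranges are hypothetical stand-ins for a real XGBoost training job and its tuning configuration.

```python
import random


def validation_error(eta, max_depth):
    """Hypothetical stand-in for 'train a model and return its validation error'."""
    return (eta - 0.1) ** 2 + 0.01 * (max_depth - 6) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and return the best trial plus all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {
            "eta": rng.uniform(0.01, 0.5),    # assumed learning-rate range
            "max_depth": rng.randint(3, 10),  # assumed tree-depth range
        }
        trials.append((validation_error(**params), params))
    best = min(trials, key=lambda t: t[0])
    return best, trials


(best_error, best_params), trials = random_search(n_trials=20)
```

A managed tuning service adds the parts this sketch leaves out, such as launching the training jobs, running them in parallel, and using search strategies that learn from earlier trials.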
