
CCNA Modeling Questions

75 of 624 questions · Page 8/9 · Modeling · Answers revealed


526

Multi-Select · medium

A data scientist is training a neural network for a multi-class classification problem. The model is overfitting. Which TWO of the following techniques can help reduce overfitting? (Choose two.)

Select 2 answers

A.Increase the number of hidden layers.

B.Add dropout layers after the hidden layers.

C.Decrease the learning rate.

D.Add L2 regularization to the loss function.

E.Reduce the batch size.

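Answers: B, D

Dropout randomly drops units during training, reducing co-adaptation.

Why this answer

Option B is correct because dropout layers randomly deactivate a fraction of neurons during training. This prevents the network from relying too heavily on any single neuron and forces it to learn more robust features. The result is less co-adaptation among neurons, which makes dropout a standard regularization technique against overfitting in neural networks. Option D is correct because L2 regularization adds a penalty proportional to the sum of squared weights to the loss. That penalty discourages large weights and yields a smoother, less complex model. Option A is wrong because adding hidden layers increases model capacity and usually worsens overfitting. Options C and E change training dynamics but are not primary remedies for overfitting.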
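Here is a minimal pure-Python sketch of the two correct techniques. It assumes a layer's activations and weights are plain lists of floats, and the function names are illustrative, not taken from any library. It uses inverted dropout, which scales surviving units by 1/(1 - p) so that no rescaling is needed at inference time. It also adds an L2 penalty of `lam` times the sum of squared weights to the data loss.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p during training
    and scale survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # dropout is disabled at inference time
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def l2_penalized_loss(data_loss, weights, lam):
    """Add an L2 (weight decay) penalty: data_loss + lam * sum(w^2)."""
    return data_loss + lam * sum(w * w for w in weights)


# Usage: during training, roughly half the units are dropped and the rest doubled.
hidden = [0.5, 1.0, -0.25, 2.0]
print(dropout(hidden, p=0.5, rng=random.Random(42)))
print(l2_penalized_loss(0.8, weights=[0.3, -1.2, 0.5], lam=0.01))
```

In frameworks, both techniques are usually a single setting: a dropout layer in the model and a weight-decay or L2 term in the optimizer or loss.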

Exam trap

AWS often tests the distinction between techniques that reduce overfitting (regularization) versus those that improve training dynamics (learning rate, batch size), leading candidates to mistakenly select options like decreasing the learning rate or reducing batch size as primary overfitting solutions.


527

Multi-Select · hard

A company is deploying a machine learning model using Amazon SageMaker. The model requires GPUs for inference. Which THREE configurations can the company use to meet this requirement? (Choose THREE.)

Select 3 answers

A.SageMaker Serverless Inference

B.Real-time endpoints with ml.p3 instance types

C.SageMaker Batch Transform with ml.p3 instances

D.SageMaker Studio

E.SageMaker Elastic Inference (EI)

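Answers: B, C, E

Real-time endpoints support GPU instances such as ml.p3.

Why this answer

Real-time endpoints (option B) support GPU instance types such as ml.p3. Batch Transform (option C) also runs on GPU instances. Elastic Inference (option E) attaches GPU-powered acceleration to an instance without provisioning a full GPU instance. Note that AWS has stopped onboarding new customers to Elastic Inference, so new deployments typically use GPU instances directly.

Option A (Serverless Inference) is wrong because it does not support GPUs. Option D (SageMaker Studio) is an IDE, not an inference option.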
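The sketch below covers the two GPU-backed options that remain current (B and C). The helpers build the request parameters that boto3's SageMaker client accepts for `create_endpoint_config` and `create_transform_job`. Model, job, and bucket names are hypothetical placeholders, ml.p3.2xlarge is just one example GPU instance type, and the CSV content type is an assumption. No AWS call is made here.

```python
def gpu_endpoint_config(config_name, model_name, instance_type="ml.p3.2xlarge", instance_count=1):
    """Parameters for sagemaker_client.create_endpoint_config on a GPU instance."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InstanceType": instance_type,  # GPU-backed real-time hosting
                "InitialInstanceCount": instance_count,
                "InitialVariantWeight": 1.0,
            }
        ],
    }


def gpu_transform_job(job_name, model_name, input_s3, output_s3,
                      instance_type="ml.p3.2xlarge", instance_count=1):
    """Parameters for sagemaker_client.create_transform_job on GPU instances."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "ContentType": "text/csv",  # assumption: CSV input
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": instance_count},
    }
```

With credentials configured, these would be passed as, for example, `boto3.client("sagemaker").create_endpoint_config(**gpu_endpoint_config("demo-config", "demo-model"))`.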


528

MCQ · hard

Refer to the exhibit. The training job 'my-job' failed with the error 'Unable to pull image from ECR'. What is the most likely cause?

A.The IAM role does not have permission to pull images from the ECR repository.

B.The instance type ml.m5.large does not support custom images.

C.The S3 bucket for training data is in a different account.

D.The role ARN is incorrect.

Answer: A

Without `ecr:GetDownloadUrlForLayer` and `ecr:BatchGetImage`, the pull fails.

Why this answer

The error 'Unable to pull image from ECR' indicates that the SageMaker training job could not retrieve the custom Docker image stored in Amazon ECR. The most likely cause is that the IAM role associated with the training job lacks the ECR read permissions needed to pull images. These include `ecr:GetAuthorizationToken` (to authenticate to the registry) and `ecr:BatchCheckLayerAvailability`, `ecr:GetDownloadUrlForLayer`, and `ecr:BatchGetImage` (to download the image). Without them, SageMaker cannot authenticate and download the container image, even if the repository and image exist.
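Here is a minimal sketch of an identity policy that grants the execution role the ECR read permissions SageMaker needs to pull an image, built as a Python dict and serialized to JSON. The repository ARN is a hypothetical placeholder. `ecr:GetAuthorizationToken` does not support resource-level scoping, so it uses `"*"`. If the repository lives in another account, that repository's own policy must also allow the role.

```python
import json


def ecr_pull_policy(repository_arn):
    """IAM policy allowing a SageMaker execution role to pull images from one ECR repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Registry authentication; this action is not scoped to a repository.
                "Effect": "Allow",
                "Action": ["ecr:GetAuthorizationToken"],
                "Resource": "*",
            },
            {
                # Layer and manifest downloads for the specific repository.
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage",
                ],
                "Resource": repository_arn,
            },
        ],
    }


print(json.dumps(ecr_pull_policy("arn:aws:ecr:us-east-1:111122223333:repository/my-training-image"), indent=2))
```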

Exam trap

This question tests the misconception that any IAM role with basic SageMaker permissions can pull images from ECR. In fact, the role must have explicit ECR permissions (not just SageMaker permissions) to download the container image, and candidates may incorrectly blame the instance type or S3 bucket location instead.

How to eliminate wrong answers

Option B is wrong because the instance type ml.m5.large fully supports custom images; SageMaker allows custom Docker images on any supported instance type, including ml.m5.large, as long as the image is compatible with the instance architecture. Option C is wrong because the S3 bucket being in a different account would cause a different error (e.g., 'Access Denied' or 'Bucket not found') and would not affect the ability to pull an image from ECR, which is a separate service. Option D is wrong because an incorrect role ARN would result in a validation error when submitting the job (e.g., 'Invalid IAM Role ARN'), not a runtime error during image pull; the job would fail to start, not fail mid-execution with an ECR pull error.


529

Multi-Select · hard

A company uses a SageMaker endpoint for real-time inference. They need to ensure high availability during deployment updates. Which THREE steps achieve this? (Choose 3)

Select 3 answers

A.Use a single instance to save costs

B.Use blue/green deployment with a new endpoint configuration

C.Configure multiple instances behind the endpoint

D.Delete the old endpoint before creating the new one

E.Use Canary or Linear traffic shifting in SageMaker

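Answers: B, C, E

Blue/green allows traffic switch after new version is healthy.

Why this answer

Blue/green deployment (option B) provisions the new fleet from a new endpoint configuration and shifts traffic only after that fleet is healthy, so the old fleet keeps serving until the switch. Multiple instances behind the endpoint (option C) keep it available if one instance fails or is being replaced. Canary or linear traffic shifting (option E) moves a small portion of traffic first and can roll back automatically when CloudWatch alarms fire. Option A is wrong because a single instance is a single point of failure. Option D is wrong because deleting the old endpoint before the new one exists guarantees downtime.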
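The sketch below builds the `DeploymentConfig` that boto3's `update_endpoint` accepts for a blue/green update with canary traffic shifting and alarm-based rollback. The alarm names, endpoint names, and timing values are hypothetical. The new endpoint configuration should itself request two or more instances so the fleet also meets the multiple-instance requirement.

```python
def canary_deployment_config(alarm_names, canary_percent=10, bake_seconds=600):
    """Blue/green update: shift canary_percent of capacity first, wait, then shift the rest.
    Any listed CloudWatch alarm firing during the update triggers an automatic rollback."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                "WaitIntervalInSeconds": bake_seconds,
            },
            # Keep the old (blue) fleet briefly after the full shift for fast rollback.
            "TerminationWaitInSeconds": 300,
            "MaximumExecutionTimeoutInSeconds": 3600,
        },
        "AutoRollbackConfiguration": {"Alarms": [{"AlarmName": name} for name in alarm_names]},
    }


# Usage (requires AWS credentials; names are placeholders):
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="churn-endpoint",
#     EndpointConfigName="churn-config-v2",
#     DeploymentConfig=canary_deployment_config(["churn-endpoint-5xx"]),
# )
```

Setting `"Type": "LINEAR"` with a `LinearStepSize` instead of `CanarySize` shifts traffic in equal steps rather than one canary step.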


530

Multi-Select · medium

Which THREE evaluation metrics are appropriate for a multi-class classification problem? (Choose 3.)

Select 3 answers

A.Confusion matrix.

B.Accuracy.

C.Mean squared error.

D.Precision-recall curve.

E.F1 score (macro/micro).

Answers: A, B, E

Confusion matrix provides per-class performance.

Why this answer

Option A is correct because a confusion matrix shows detailed per-class performance. Option B is correct because accuracy is a common summary metric for multi-class problems. Option E is correct because F1 score with macro or micro averaging extends precision and recall to multiple classes.

Option C is wrong because mean squared error is a regression metric. Option D is wrong because a precision-recall curve is typically used for binary classification; for multi-class problems it applies only one class at a time (one-vs-rest).
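To make the three correct metrics concrete, here is a pure-Python sketch (no ML library assumed). It builds a confusion matrix and derives accuracy and per-class, macro, and micro F1 from it. Class labels are assumed to be integers 0..n_classes-1, and F1 is computed as 2·TP / (2·TP + FP + FN).

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def f1_scores(cm):
    """Return (per_class_f1, macro_f1, micro_f1)."""
    n = len(cm)
    per_class = []
    tp_sum = fp_sum = fn_sum = 0
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(cm[k]) - tp                       # actually k, predicted another class
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / n  # unweighted mean: every class counts equally
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return per_class, macro, micro


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)            # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(accuracy(cm))  # 4 of 6 correct
print(f1_scores(cm))
```

For single-label multi-class data, micro-averaged F1 equals accuracy. That is why macro F1 is usually the more informative average when classes are imbalanced.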


531

MCQ · medium

A data scientist is deploying a SageMaker model using CloudFormation. The stack creation fails with the above error. What is the MOST likely cause?

A.The Docker image has not been pushed to the ECR repository

B.The IAM role does not have permissions to access ECR

C.The model name is incorrect

D.The instance type specified in the endpoint configuration is not available

Answer: A

The error clearly states the image does not exist in ECR.

Why this answer

The error indicates that SageMaker cannot find the Docker image specified in the `PrimaryContainer` of the model definition. CloudFormation creates the SageMaker model by referencing an ECR image URI; if that image has not been pushed to the specified ECR repository, the model creation fails immediately. This is the most common cause when the stack creation fails with an error about a missing or inaccessible image.
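A pre-flight check can catch this before the stack is created. The sketch below parses an ECR image URI and asks ECR whether that tag or digest exists. It uses the `describe_images` call on a boto3 ECR client, which raises `ImageNotFoundException` (or `RepositoryNotFoundException`) when the image or repository is missing. The client is passed in so the function can be tried without AWS. The URI pattern covers only the standard `<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>` form and its `@sha256:` digest variant.

```python
import re

# <account>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag> | @sha256:<digest>]
ECR_IMAGE_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[^:@]+)(?::(?P<tag>[^@]+)|@(?P<digest>sha256:[0-9a-f]+))?$"
)


def parse_ecr_uri(uri):
    match = ECR_IMAGE_URI.match(uri)
    if not match:
        raise ValueError(f"not a standard ECR image URI: {uri}")
    parts = match.groupdict()
    if not parts["tag"] and not parts["digest"]:
        parts["tag"] = "latest"  # Docker's default when no tag is given
    return parts


def image_exists(ecr_client, uri):
    """True if the tag or digest in `uri` exists in ECR (uses ecr_client.describe_images)."""
    parts = parse_ecr_uri(uri)
    image_id = {"imageDigest": parts["digest"]} if parts["digest"] else {"imageTag": parts["tag"]}
    try:
        ecr_client.describe_images(
            registryId=parts["account"],
            repositoryName=parts["repo"],
            imageIds=[image_id],
        )
        return True
    except (ecr_client.exceptions.ImageNotFoundException,
            ecr_client.exceptions.RepositoryNotFoundException):
        return False


# Usage (requires AWS credentials; the URI is a placeholder):
# image_exists(boto3.client("ecr", region_name="us-east-1"),
#              "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:v1")
```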

Exam trap

The trap here is that candidates confuse a missing image (resource not found) with an IAM permissions error, but the error message for a missing image is distinct and occurs at a different stage of the API call.

How to eliminate wrong answers

Option B is wrong because an IAM role lacking ECR permissions would produce an access denied or authorization error, not a 'not found' error for the image. Option C is wrong because the model name is set when the model is created; a wrong name causes a 'not found' error only when another resource references it, not an image error during model creation. Option D is wrong because an unavailable instance type would cause a resource allocation failure at the endpoint creation step, not during model creation.


532

MCQ · easy

A CloudFormation stack creation failed. The SageMaker endpoint resource shows CREATE_FAILED. What is the most likely issue?

A.The IAM role used by CloudFormation lacks permissions to create endpoints.

B.The S3 bucket 'my-bucket' does not contain the object 'model.tar.gz'.

C.The SageMaker endpoint configuration is invalid.

D.The instance type specified for the endpoint is not available in the region.

Answer: B

The error states the model data is not accessible, likely because the object does not exist.

Why this answer

The correct answer is B. When a SageMaker endpoint resource shows CREATE_FAILED during CloudFormation stack creation, the most common cause is that the model artifact referenced in the Model definition cannot be located. SageMaker needs the S3 object (e.g., 's3://my-bucket/model.tar.gz') to exist and be accessible so it can load the model onto the endpoint's instances. If the object is missing, the endpoint cannot be provisioned, and CloudFormation marks the endpoint resource CREATE_FAILED.
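Here is a pre-flight check for this failure. The sketch splits an `s3://` URI and confirms that the exact artifact key exists, using boto3's `list_objects_v2`. S3 lists keys in lexicographic order, and an exact key sorts before every longer key that shares it as a prefix, so one result (`MaxKeys=1`) is enough. The client is injected, and the bucket and key are hypothetical.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def object_exists(s3_client, uri):
    """True if the exact object key exists; an exact key sorts first among keys with that prefix."""
    bucket, key = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in response.get("Contents", []))


# Usage (requires AWS credentials):
# if not object_exists(boto3.client("s3"), "s3://my-bucket/model.tar.gz"):
#     raise SystemExit("model artifact missing; upload it before creating the stack")
```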

Exam trap

The trap here is that candidates often assume endpoint failures are always due to configuration or permissions, but the most common root cause in CloudFormation deployments is a missing S3 artifact, which is a prerequisite that is easy to overlook.

How to eliminate wrong answers

Option A is wrong because a CloudFormation role without permission to create endpoints would fail with an access denied or not-authorized message in the status reason, not a model-data error. Option C is wrong because an invalid endpoint configuration would make the endpoint configuration resource fail first, before the endpoint itself is created. Option D is wrong because an unavailable instance type would produce a status reason about the instance type or insufficient capacity, not about inaccessible model data.


533

MCQ · easy

A machine learning team is using Amazon SageMaker to train a linear regression model. The team notices that the training loss decreases rapidly initially but then plateaus at a high value. What is the MOST likely cause?

A.The model uses batch normalization

B.The learning rate is set too low

C.The model is over-regularized with L2 regularization

D.The learning rate is set too high

Answer: D

A high learning rate can cause the loss to fluctuate or plateau after an initial drop.

Why this answer

A learning rate set too high causes the optimizer to take excessively large steps, overshooting the minimum of the loss function. This results in rapid initial decrease as the model makes large corrections, but then the loss plateaus at a high value because the parameters oscillate around the optimum without converging. In SageMaker's linear regression (typically using stochastic gradient descent), a high learning rate prevents fine-grained convergence, leading to a high plateau.
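Here is a toy illustration using full-batch gradient descent rather than SGD. It assumes a noise-free squared-error loss with two directions of curvature 10 and 1, which is the shape linear regression's MSE takes when one feature direction is steep and the other shallow. A learning rate of 0.2 sits exactly at the stability limit (2 / 10) of the steep direction. That coordinate therefore bounces between +1 and -1 forever while the shallow coordinate converges: the loss falls quickly from 55 and then stalls at 5. A smaller rate of 0.05 converges toward the true minimum of 0.

```python
def gd_losses(lr, steps, curvatures=(10.0, 1.0), start=(1.0, 10.0)):
    """Gradient descent on L(w) = 0.5 * (a * w1**2 + b * w2**2); returns the loss after each step."""
    a, b = curvatures
    w1, w2 = start

    def loss():
        return 0.5 * (a * w1 * w1 + b * w2 * w2)

    losses = [loss()]
    for _ in range(steps):
        # The gradient is (a * w1, b * w2); each step moves against it.
        w1 -= lr * a * w1
        w2 -= lr * b * w2
        losses.append(loss())
    return losses


high = gd_losses(lr=0.2, steps=50)
low = gd_losses(lr=0.05, steps=200)
print(round(high[0], 2), round(high[5], 2), round(high[-1], 2))  # 55.0 10.37 5.0
print(low[-1] < 1e-6)                                            # True
```

Slightly above 0.2, the steep coordinate diverges instead of oscillating, so the loss would eventually grow.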

Exam trap

The trap here is that candidates often associate a plateau in loss with a learning rate that is too low (underfitting), but the rapid initial decrease followed by a high plateau is a classic sign of a learning rate that is too high, causing divergence or oscillation.

How to eliminate wrong answers

Option A is wrong because batch normalization is not typically used in linear regression models; it is a technique for deep neural networks to stabilize training by normalizing layer inputs, and it would not cause a high plateau. Option B is wrong because a learning rate set too low would cause the loss to decrease very slowly from the start, not rapidly initially and then plateau at a high value. Option C is wrong because over-regularization with L2 regularization would cause the loss to be high from the beginning due to large penalty terms, and the loss would not decrease rapidly initially; it would remain high throughout training.


534

MCQ · hard

A company is deploying a model that predicts customer churn. The model's recall for the churn class is 0.9, but precision is 0.4. The business cost of false positives is high. Which strategy would MOST likely improve precision without significantly harming recall?

A.Collect more data for the churn class

B.Use a different algorithm such as Random Forest

C.Decrease the decision threshold for the churn class

D.Increase the decision threshold for the churn class

Answer: D

Higher threshold reduces false positives, improving precision, though recall may drop slightly.

Why this answer

Raising the decision threshold, so the model must be more confident before predicting churn, reduces false positives and increases precision, but it may lower recall. The goal is to find a threshold that balances the two given the high cost of false positives. Option C is wrong because lowering the threshold predicts churn more often, adding false positives and reducing precision further. Options A and B may change the model, but neither directly controls the precision/recall trade-off.
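The sketch below sweeps the threshold on a model's churn scores, using small made-up scores and labels for illustration. It picks the lowest threshold that meets a precision target. Recall can only fall as the threshold rises, so the lowest qualifying threshold keeps the most recall.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting churn for scores >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0.0 with no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision):
    """Lowest candidate threshold whose precision meets min_precision, or None."""
    for t in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, t)
        if precision >= min_precision:
            return t, precision, recall
    return None


labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.55, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(precision_recall_at(scores, labels, 0.5))  # precision 4/7, recall 1.0
print(pick_threshold(scores, labels, 0.9))       # (0.8, 1.0, 0.75)
```

In practice, the sweep should run on a validation set, not the training data.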


535

MCQ · hard

A data scientist is building a multi-class classification model with 10 classes. The dataset has 100,000 samples. After training a random forest with 100 trees, the model achieves 85% accuracy on the test set. However, the data scientist notices that for one rare class (1% of data), recall is only 5%. Which technique is MOST likely to improve recall for the rare class without significantly reducing overall accuracy?

A.Increase the number of trees to 500

B.Apply SMOTE to oversample the rare class in the training data

C.Use stratified sampling only for the test set

D.Reduce the decision threshold for the rare class to 0.1

Answer: B

SMOTE creates synthetic samples for the minority class.

Why this answer

SMOTE (Synthetic Minority Oversampling Technique) generates synthetic samples for the rare class by interpolating between existing minority instances, which directly addresses the class imbalance. This gives the model more exposure to the rare class during training. It improves recall without discarding any majority-class data, which helps preserve overall accuracy.
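Here is a minimal pure-Python sketch of the SMOTE idea. It picks a minority sample, picks one of its k nearest minority neighbors (by Euclidean distance), and creates a new point at a random position on the segment between them. Production code would typically use the `imbalanced-learn` package (`imblearn.over_sampling.SMOTE`). Oversampling should be applied only to the training split, so synthetic points never leak into the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbors among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


rare = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
print(len(smote(rare, n_synthetic=6, k=2)))  # 6
```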

Exam trap

This question tests the misconception that increasing model complexity (more trees) or adjusting thresholds after training fixes class imbalance. In fact, the root cause is the skewed training data distribution, which calls for a data-level technique like SMOTE.

How to eliminate wrong answers

Option A is wrong because increasing the number of trees in a random forest primarily reduces variance and improves generalization, but it does not address class imbalance; recall for a rare class will remain low if the training data is skewed. Option C is wrong because stratified sampling on the test set only ensures the test set reflects the original class distribution, which does nothing to improve the model's ability to learn the rare class during training. Option D is wrong because reducing the decision threshold for the rare class to 0.1 would increase recall but at the cost of dramatically increasing false positives, which would significantly reduce overall accuracy, especially since the rare class is only 1% of the data.


536

MCQ · easy

A data scientist is building a binary classification model to predict whether a customer will subscribe to a service. The dataset contains 20 features, including categorical variables with high cardinality (e.g., zip code with 10,000 unique values). The scientist uses a logistic regression model and obtains a training AUC of 0.85 and a test AUC of 0.60. The scientist suspects overfitting due to high cardinality features. Which approach should the scientist use to address this issue?

A.Apply label encoding to the zip code feature

B.Remove the zip code feature entirely

C.Apply target encoding with smoothing to the zip code feature

D.Apply one-hot encoding to the zip code feature

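Answer: C

Target encoding reduces cardinality and can improve generalization.

Why this answer

Option C (target encoding with smoothing) is correct because it replaces each zip code with a smoothed estimate of the target mean for that category. This collapses 10,000 categories into one numeric feature while preserving predictive power. Smoothing shrinks rare categories toward the global mean, which limits overfitting. Option D (one-hot encoding) is wrong because it adds roughly 10,000 sparse columns, increasing dimensionality and the risk of overfitting. Option A (label encoding) is wrong because it imposes an arbitrary ordering on the zip codes.

Option B (removing zip code) is wrong because it may discard important information.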
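Here is a minimal sketch of smoothed target encoding, assuming a binary 0/1 target. Each category is encoded as (sum of targets + m × global mean) / (count + m), so a category seen only a few times stays close to the global mean. The smoothing strength `m` is a tunable assumption. To avoid target leakage, the mapping should be fit on training data only (or out-of-fold) and then applied to validation and test data.

```python
from collections import defaultdict


def fit_target_encoder(categories, targets, m=10.0):
    """Smoothed target (mean) encoding:
    enc(c) = (sum_y(c) + m * global_mean) / (n(c) + m).
    Rare categories are pulled toward the global mean; m sets the strength."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    mapping = {c: (sums[c] + m * global_mean) / (counts[c] + m) for c in counts}
    return mapping, global_mean


def transform(categories, mapping, global_mean):
    """Encode categories; unseen values fall back to the global mean."""
    return [mapping.get(c, global_mean) for c in categories]


zips = ["98101", "98101", "10001", "60601", "60601", "60601"]
subscribed = [1, 0, 1, 0, 0, 1]
mapping, prior = fit_target_encoder(zips, subscribed, m=2.0)
print(transform(["98101", "10001", "60601", "99999"], mapping, prior))
# [0.5, 0.6666666666666666, 0.4, 0.5]  (unseen "99999" gets the global mean)
```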


537

MCQ · easy

A data scientist is using a decision tree algorithm for a classification task. The tree is very deep and achieves 100% accuracy on the training set but performs poorly on the test set. Which technique should the data scientist use to improve generalization?

A.Add more features to the dataset.

B.Reduce the number of training samples.

C.Prune the decision tree.

D.Increase the maximum depth of the tree.

Answer: C

Pruning reduces tree complexity and improves generalization.

Why this answer

A deep decision tree that achieves 100% training accuracy but poor test accuracy is overfitting the training data. Pruning the tree removes branches that have little statistical power, reducing complexity and improving generalization to unseen data.
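One standard way to prune is minimal cost-complexity pruning, which minimizes R(T) + α·|leaves(T)|, where R(T) is the training error. The pure-Python sketch below finds the best pruned subtree for a fixed α bottom-up. It compares each node's cost as a leaf with the cost of its best pruned subtree and collapses the node when the leaf is no worse. The dict-based tree format is an assumption: each node's `"error"` is the training error (as a fraction of all samples) that node would have as a leaf. scikit-learn exposes the same technique through the `ccp_alpha` parameter of `DecisionTreeClassifier`.

```python
def prune(node, alpha):
    """Minimal cost-complexity pruning for a fixed alpha.

    Each node is {"error": float, "children": [...]}. Returns (cost, pruned_tree),
    minimizing R(T) + alpha * leaves(T) bottom-up; ties favor the smaller tree.
    """
    leaf_cost = node["error"] + alpha
    children = node.get("children") or []
    if not children:
        return leaf_cost, {"error": node["error"], "children": []}
    results = [prune(child, alpha) for child in children]
    subtree_cost = sum(cost for cost, _ in results)
    if leaf_cost <= subtree_cost:
        return leaf_cost, {"error": node["error"], "children": []}  # collapse to a leaf
    return subtree_cost, {"error": node["error"], "children": [tree for _, tree in results]}


def count_leaves(node):
    return 1 if not node["children"] else sum(count_leaves(c) for c in node["children"])


# Hypothetical overgrown tree whose leaves have zero training error (memorized).
tree = {"error": 0.10, "children": [
    {"error": 0.05, "children": [{"error": 0.0}, {"error": 0.0}]},
    {"error": 0.02},
]}
for alpha in (0.0, 0.03, 0.06):
    print(alpha, count_leaves(prune(tree, alpha)[1]))  # prints 0.0 3, 0.03 3, 0.06 1
```

Larger α values prune more aggressively. In practice, α is chosen by cross-validation on held-out data.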

Exam trap

The trap here is that candidates may confuse overfitting with underfitting and choose to increase model complexity (Option D) or add features (Option A), when the correct remedy for overfitting is to reduce complexity through pruning.

How to eliminate wrong answers

Option A is wrong because adding more features typically increases the risk of overfitting by giving the tree more opportunities to memorize noise. Option B is wrong because reducing the number of training samples exacerbates overfitting by providing less data for the tree to learn generalizable patterns. Option D is wrong because increasing the maximum depth would make the tree even deeper and more complex, worsening overfitting rather than improving generalization.


538

MCQ · easy

A data scientist is using Amazon SageMaker to train a linear regression model. The training data contains missing values. Which preprocessing step should be applied before training?

A.Ignore missing values; linear regression can handle them.

B.Impute missing values with the mean of the column.

C.Replace missing values with zeros.

D.Remove all rows containing missing values.

Answer: B

Imputation is a common technique to handle missing data.

Why this answer

Option B is correct because linear regression models in Amazon SageMaker cannot handle missing values natively; they require complete numerical input. Imputing missing values with the column mean is a standard preprocessing technique. It keeps every row and preserves each column's mean (although it shrinks the column's variance), so the SageMaker built-in Linear Learner algorithm can train without errors.
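Here is a minimal pure-Python sketch of mean imputation. Rows are assumed to be lists of floats, with `None` or NaN marking a missing value. The means are learned from the training rows only and then reused for any later data, so test-set statistics never leak into preprocessing. scikit-learn's `SimpleImputer(strategy="mean")` is an equivalent library option.

```python
import math


def _missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def fit_column_means(rows):
    """Per-column means computed from the training rows only, ignoring missing values."""
    means = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if not _missing(r[j])]
        # Assumption: an all-missing column falls back to 0.0.
        means.append(sum(present) / len(present) if present else 0.0)
    return means


def impute(rows, means):
    """Replace missing values with the corresponding training-set mean."""
    return [[means[j] if _missing(v) else v for j, v in enumerate(r)] for r in rows]


train = [[1.0, None], [3.0, 4.0], [float("nan"), 6.0]]
means = fit_column_means(train)      # [2.0, 5.0]
print(impute(train, means))          # [[1.0, 5.0], [3.0, 4.0], [2.0, 6.0]]
print(impute([[None, 7.0]], means))  # [[2.0, 7.0]]
```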

Exam trap

The trap here is that candidates may assume linear regression can inherently handle missing values (Option A). SageMaker's Linear Learner requires complete data, however, and ignoring missing values will cause runtime errors or silent model degradation.

How to eliminate wrong answers

Option A is wrong because linear regression algorithms, including SageMaker's Linear Learner, do not accept missing values in the training data; they will either fail or produce incorrect results if missing values are present. Option C is wrong because replacing missing values with zeros can significantly distort the data distribution and model coefficients, especially if the missingness is not random, leading to biased estimates. Option D is wrong because removing all rows with missing values can drastically reduce the dataset size, potentially discarding valuable information and introducing selection bias, which is particularly problematic in small or imbalanced datasets.


539

MCQ · hard

A data scientist is training a gradient boosting model on a large dataset (100 GB) stored in Amazon S3. The training job uses a SageMaker built-in XGBoost algorithm with a single ml.p3.2xlarge instance. The job fails with a memory error. Which solution should the data scientist adopt to resolve the memory issue?

A.Reduce the number of features by using PCA before training.

B.Use SageMaker Pipe mode to stream data directly from S3 instead of downloading it.

C.Increase the number of training instances to use distributed training with XGBoost.

D.Use SageMaker BlazingText algorithm with negative sampling.

E.Switch to SageMaker Linear Learner algorithm, which requires less memory.

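Answer: C

Distributed training splits data across instances, reducing memory per instance.

Why this answer

Option C is correct because distributed training across multiple instances splits the data, reducing memory pressure on each instance. For the built-in XGBoost algorithm, this works when the training data is split into multiple files and each instance receives only its share. Option B is wrong because Pipe mode streams data instead of copying it to disk, but XGBoost still holds the full training set in memory, so streaming alone does not fix an out-of-memory error.

Option A is wrong because reducing features with PCA changes the inputs and may degrade model quality. Option D is wrong because BlazingText is a text-embedding and text-classification algorithm, not a replacement for gradient boosting. Option E is wrong because switching to Linear Learner changes the model family instead of fixing the memory problem.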
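The sketch below builds the request for option C in the form boto3's `create_training_job` accepts. With the default `FullyReplicated` distribution, every instance would still download the full 100 GB. The sketch therefore shards the files with `ShardedByS3Key`, which assumes the data is already split into many part files under the prefix. The job name, role, image URI, instance type, volume size, and `num_round` value are placeholders. The built-in XGBoost image URI for a region can be looked up with the SageMaker Python SDK's `sagemaker.image_uris.retrieve("xgboost", region, version)`.

```python
def distributed_xgboost_request(job_name, image_uri, role_arn, train_s3, output_s3,
                                instance_type, instance_count=4):
    """Parameters for sagemaker_client.create_training_job that shard the data across instances."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3,  # a prefix holding many part files
                        # Each instance receives a different subset of the files.
                        "S3DataDistributionType": "ShardedByS3Key",
                    }
                },
                "ContentType": "text/csv",  # assumption: CSV training data
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 100,  # placeholder; size to the per-instance shard
        },
        "HyperParameters": {"num_round": "100"},  # placeholder; the API expects string values
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
    }
```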


540

MCQ · medium

A data scientist ran a SageMaker training job that failed with the error shown. The training script expects the data in '/opt/ml/input/data/training/train.csv'. What is the most likely issue?

A.The hyperparameter 'sagemaker_program' is misspelled

B.The training script has a bug in reading the file

C.The channel name should be 'train' instead of 'training'

D.The S3 data path should point to the exact file, not the folder

Answer: D

SageMaker copies the prefix content into the channel directory; if train.csv is not at the root of that prefix, the path is wrong.

Why this answer

Option D is correct. The S3Uri points to a prefix (folder), but the script expects a specific file directly under the channel directory. SageMaker copies the contents of the S3 prefix into the channel directory; if the file is not directly under that prefix, it will not be at '/opt/ml/input/data/training/train.csv'.

Option A is wrong because the error is about the data file, not hyperparameters. Option B is wrong because the script reads the path it expects; the file simply is not there. Option C is wrong because the channel name 'training' already matches the directory in the expected path.
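A defensive sketch for the training script side. It looks for the file in the channel directory and, if the file is absent, fails with a listing of what actually arrived, so a file nested under a sub-prefix is easy to spot. The function name is hypothetical. The sketch relies on SageMaker's script-mode convention of exposing a channel's local directory through an `SM_CHANNEL_<NAME>` environment variable, with the literal path as the fallback.

```python
import os


def find_training_file(filename="train.csv", channel_dir=None):
    """Locate a file in the 'training' channel and fail with a helpful listing if absent."""
    channel_dir = channel_dir or os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
    expected = os.path.join(channel_dir, filename)
    if os.path.isfile(expected):
        return expected
    # List what actually arrived, e.g. files nested under a sub-prefix.
    found = sorted(
        os.path.relpath(os.path.join(root, name), channel_dir)
        for root, _dirs, names in os.walk(channel_dir)
        for name in names
    )
    raise FileNotFoundError(f"{expected} not found; files under {channel_dir}: {found}")
```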


541

MCQ · easy

A machine learning engineer is deploying a model to SageMaker for real-time inference. The model is a TensorFlow SavedModel. Which SageMaker capability should be used to create an endpoint?

A.SageMaker hosting with TensorFlow Serving container

B.SageMaker Pipelines

C.SageMaker Model Monitor

D.SageMaker Ground Truth

Answer: A

SageMaker supports TensorFlow Serving for model deployment.

Why this answer

Option A is correct because SageMaker provides managed TensorFlow Serving containers for hosting SavedModels on real-time endpoints. Option B is wrong because SageMaker Pipelines orchestrates ML workflows. Option C is wrong because SageMaker Model Monitor monitors deployed models.

Option D is wrong because SageMaker Ground Truth is for labeling data.
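Once the endpoint is up, clients call it through the SageMaker runtime. The sketch below assumes the container accepts TensorFlow Serving's REST row format, where a JSON body `{"instances": [...]}` returns `{"predictions": [...]}`. It uses `invoke_endpoint` from a boto3 `sagemaker-runtime` client, which is injected so the function can be tried without AWS. The endpoint name and input rows are placeholders.

```python
import json


def build_tfs_request(rows):
    """JSON body in TensorFlow Serving's REST 'instances' (row) format."""
    return json.dumps({"instances": rows})


def predict(runtime_client, endpoint_name, rows):
    """Invoke a SageMaker endpoint hosting a TensorFlow Serving container."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_tfs_request(rows),
    )
    # TensorFlow Serving answers row-format requests with {"predictions": [...]}.
    return json.loads(response["Body"].read())["predictions"]


# Usage (requires AWS credentials and a deployed endpoint; the name is a placeholder):
# predict(boto3.client("sagemaker-runtime"), "tf-savedmodel-endpoint", [[0.1, 0.2, 0.3]])
```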


542

Multi-Select · hard

A data scientist is using Amazon SageMaker Debugger to monitor training. Which THREE types of issues can Debugger monitor?

Select 3 answers

A.Hardware failures

B.Poor weight initialization

C.Data drift

D.Overfitting

E.Vanishing gradients

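Answers: B, D, E

Debugger can detect issues from poor initialization.

Why this answer

Amazon SageMaker Debugger monitors training by capturing tensors such as weights, gradients, and losses and evaluating them with built-in rules. The PoorWeightInitialization rule (option B) detects weights initialized with values that are too large or too small, which can cause slow convergence or failure to learn. The Overfit rule (option D) compares training and validation loss. The VanishingGradient rule (option E) flags gradients that shrink toward zero. Option A is wrong because Debugger's profiler reports resource utilization and bottlenecks, not hardware failures. Option C is wrong because data drift on inference traffic is monitored by SageMaker Model Monitor.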
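To show the kind of signal these rules look at, here is a simplified pure-Python version of two of the checks. It is not Debugger's implementation: the 1e-7 threshold and the patience logic are assumptions for illustration. In the SageMaker Python SDK, the built-in rules are attached to an estimator with calls such as `Rule.sagemaker(rule_configs.vanishing_gradient())`.

```python
def vanishing_layers(grads_by_layer, threshold=1e-7):
    """Layers whose mean absolute gradient falls below threshold (assumed cutoff)."""
    return [
        name
        for name, grads in grads_by_layer.items()
        if sum(abs(g) for g in grads) / len(grads) < threshold
    ]


def looks_overfit(train_losses, val_losses, patience=3):
    """True if validation loss rose for `patience` consecutive steps while training loss kept falling."""
    if len(val_losses) <= patience or len(train_losses) <= patience:
        return False
    val_tail = val_losses[-(patience + 1):]
    train_tail = train_losses[-(patience + 1):]
    val_rising = all(later > earlier for earlier, later in zip(val_tail, val_tail[1:]))
    train_falling = all(later < earlier for earlier, later in zip(train_tail, train_tail[1:]))
    return val_rising and train_falling


print(vanishing_layers({"dense_1": [1e-9, -2e-9], "dense_2": [0.01, -0.02]}))  # ['dense_1']
print(looks_overfit([1.0, 0.8, 0.6, 0.5, 0.4], [1.0, 0.9, 0.95, 1.0, 1.1]))   # True
```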

Exam trap

The trap here is that candidates confuse SageMaker Debugger (which monitors training metrics like gradients and weights) with SageMaker Model Monitor (which monitors inference data for drift and bias), leading them to incorrectly select data drift as a Debugger capability.


543

Multi-Select · hard

A company is using Amazon SageMaker to tune hyperparameters for a gradient boosting model. The objective is to minimize root mean squared error (RMSE). The data scientist wants to explore the hyperparameter space efficiently. Which THREE hyperparameter tuning strategies should the data scientist consider? (Choose 3.)

Select 3 answers

A.Bayesian optimization

B.Random search

C.Grid search

D.Manual search

E.Hyperband

Answers: A, B, E

Bayesian optimization uses a probabilistic model to guide the search.

Why this answer

Bayesian optimization (option A) builds a probabilistic model of the objective function (RMSE) and uses an acquisition function to choose the next hyperparameter combination, balancing exploration and exploitation. That sample efficiency suits expensive-to-train models like gradient boosting. Random search (option B) samples combinations independently, is simple, parallelizes well, and often beats grid search when only a few hyperparameters matter. Hyperband (option E) stops poorly performing trials early and gives more resources to promising ones, reducing total tuning time.
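Random search is the simplest of the three strategies to write down. The sketch below is a generic pure-Python version, not SageMaker's implementation. In the SageMaker Python SDK, the strategy is chosen with the `strategy` argument of `HyperparameterTuner` (for example `"Bayesian"`, `"Random"`, or `"Hyperband"`). The toy objective stands in for a validation RMSE and is hypothetical.

```python
import math
import random


def random_search(objective, space, n_trials, seed=0):
    """Minimize objective(params) by sampling each hyperparameter independently.

    space maps name -> (kind, low, high) with kind in {"log", "int", "uniform"}.
    Returns (best_score, best_params).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {}
        for name, (kind, low, high) in space.items():
            if kind == "log":  # log-uniform sampling for scale-like parameters
                value = math.exp(rng.uniform(math.log(low), math.log(high)))
                params[name] = min(max(value, low), high)  # guard against float round-off
            elif kind == "int":
                params[name] = rng.randint(low, high)
            else:
                params[name] = rng.uniform(low, high)
        score = objective(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


def toy_rmse(params):
    # Hypothetical stand-in for validation RMSE; lowest near eta=0.1, max_depth=6.
    return (math.log10(params["eta"]) + 1.0) ** 2 + 0.01 * (params["max_depth"] - 6) ** 2


space = {"eta": ("log", 0.001, 1.0), "max_depth": ("int", 2, 12)}
print(random_search(toy_rmse, space, n_trials=50))
```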

Exam trap

The trap here is that candidates often assume grid search is the most thorough strategy, but it scales poorly as the number of hyperparameters grows. SageMaker automatic model tuning does offer a grid strategy, but only for categorical hyperparameters. For efficient exploration of a space like this one, Bayesian optimization, random search, and Hyperband are the better choices.


544

Multi-Select · medium

A data scientist is training a neural network for image classification. The training loss is not decreasing significantly, and the validation loss is high. Which TWO actions should the scientist take to address potential vanishing gradients?

Select 2 answers

A.Increase the learning rate

B.Use ReLU activation functions in hidden layers

C.Switch activation functions from ReLU to sigmoid

D.Add batch normalization layers

E.Remove dropout layers

Answers: B, D

ReLU does not saturate for positive inputs, reducing vanishing gradient risk.

Why this answer

ReLU activation functions (option B) help mitigate vanishing gradients because their gradient is a constant 1 for positive inputs, so it does not shrink as it propagates backward through many layers. This avoids the exponential decay of gradients that occurs with saturating activations like sigmoid or tanh. Batch normalization (option D) normalizes each layer's inputs, keeping pre-activations in a range where activations do not saturate and stabilizing gradient flow through deep networks. Option C is wrong because switching to sigmoid makes vanishing gradients worse.
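A small numeric check of the argument above. During backpropagation, the gradient along a path is multiplied by each layer's activation derivative. The sketch assumes unit weights so that only the activations matter. The sigmoid derivative is at most 0.25, so ten sigmoid layers shrink the gradient by at least a factor of 0.25^10 (about 1e-6), while ReLU passes it through unchanged for positive inputs.

```python
import math


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # at most 0.25, reached at z = 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # exactly 1 for positive inputs


def backprop_factor(activation_grad, pre_activations):
    """Product of activation derivatives along one path through the layers
    (weights are assumed to be 1 so only the activations matter)."""
    factor = 1.0
    for z in pre_activations:
        factor *= activation_grad(z)
    return factor


depth = 10
print(backprop_factor(sigmoid_grad, [0.0] * depth))  # 0.25 ** 10, about 9.5e-07
print(backprop_factor(relu_grad, [0.5] * depth))     # 1.0
```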

Exam trap

The trap here is that candidates may confuse vanishing gradients with overfitting or learning rate issues, leading them to choose options like increasing the learning rate or removing dropout, which do not address the fundamental gradient propagation problem.


545

MCQ · hard

A data scientist is using Amazon SageMaker's built-in BlazingText algorithm for word2vec embeddings. The dataset is a corpus of 10 million documents. After training, the data scientist observes that the learned embeddings do not capture semantic similarity well (e.g., 'king' and 'queen' are not close). Which hyperparameter adjustment is most likely to improve the quality of embeddings?

A.Increase the vector dimensionality

B.Decrease the window size

C.Decrease the number of negative samples

D.Increase the learning rate

Answer: A

Higher dimensionality allows embeddings to capture more fine-grained semantic relationships.

Why this answer

Increasing the vector dimensionality allows the model to capture more nuanced semantic relationships and co-occurrence patterns in the data. BlazingText's default `vector_dim` is 100, which may be too small to encode the rich contextual information in 10 million documents. Raising it (e.g., to 300) gives the model more capacity to learn high-quality embeddings in which words like 'king' and 'queen' end up closer in vector space.
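Embedding quality of this kind is usually checked with cosine similarity between word vectors. The sketch below computes it in pure Python and shows a BlazingText word2vec hyperparameter set with `vector_dim` raised to 300. Apart from that change, the values are illustrative. They are written as strings, the form the CreateTrainingJob API expects.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# vector_dim raised to 300; the other values are illustrative.
blazingtext_hyperparameters = {
    "mode": "skipgram",
    "vector_dim": "300",
    "window_size": "5",
    "negative_samples": "5",
}

print(cosine_similarity([1.0, 1.0], [1.0, 0.0]))  # about 0.7071
```

After retraining, `cosine_similarity(vec["king"], vec["queen"])` should rise relative to unrelated word pairs, where `vec` is a hypothetical dict loaded from the trained vectors.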

Exam trap

The trap here is that candidates often assume a smaller window sharpens semantic similarity. In reality, a larger window captures broader topical relationships, while a smaller window captures mostly syntactic patterns; for semantic similarity, a moderate to large window is needed.

How to eliminate wrong answers

Option B is wrong because decreasing the window size shrinks the context window, making the model focus on very local word co-occurrences, which harms the capture of broader semantic similarity like 'king' and 'queen'. Option C is wrong because decreasing the number of negative samples weakens the discriminative training signal, making it harder for the model to separate similar from dissimilar words and thus degrading embedding quality. Option D is wrong because increasing the learning rate can cause the optimization to overshoot minima or diverge, leading to unstable training and poor embeddings; BlazingText's default learning rate (0.05) is usually a sensible starting point.


546

MCQ · medium

A data scientist is training a neural network for time series forecasting. The training loss decreases initially but then starts to increase after 20 epochs. Which action should the scientist take to address this?

A.Increase the dropout rate

B.Increase the learning rate

C.Implement early stopping based on validation loss

D.Add more layers to the network

Answer: C

Early stopping halts training when validation loss stops improving, preventing overfitting.

Why this answer

Option C is correct because early stopping monitors validation loss and stops training when it starts increasing, preventing overfitting. Option B is wrong because increasing the learning rate may cause divergence. Option D is wrong because adding more layers could worsen overfitting.

Option A is wrong because, although dropout helps with overfitting, early stopping directly addresses a loss that starts rising after a number of epochs.
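Here is a minimal pure-Python sketch of patience-based early stopping on a list of per-epoch validation losses. The patience and `min_delta` values are illustrative. In Keras, the same behavior is available as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)`, which also restores the weights from the best epoch.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch): stop once validation loss has failed to
    improve by more than min_delta for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


print(early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.75, 0.8, 0.9]))  # (5, 2)
```

Training would stop at epoch 5, and the model saved at epoch 2 (the lowest validation loss) is the one to keep.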


547

Multi-Select · medium

A data scientist is training a model using Amazon SageMaker. The training job is running on GPU instances, but the GPU utilization is low. Which TWO actions could improve GPU utilization?

Select 2 answers

A.Increase the number of epochs

B.Use a larger instance with multiple GPUs

C.Increase the batch size

D.Switch to CPU instances

E.Decrease the batch size

Answers: B, C

More GPUs increase parallelism.

Why this answer

Option C is correct because increasing the batch size gives each GPU more work per step and makes better use of its parallelism. Option B is correct because a larger instance with multiple GPUs can spread the work across devices. Option E is wrong because decreasing the batch size would reduce utilization.

Option D is wrong because CPU instances do not use a GPU at all. Option A is wrong because adding more epochs does not change utilization per step.


548

MCQ · medium

A data scientist is training a text classification model using Amazon SageMaker. The dataset consists of 100,000 labeled documents. The data scientist notices that the model performs well on the training set but poorly on the validation set. Which regularization technique should the data scientist apply to reduce overfitting?

A.Dropout

B.Data augmentation

C.Batch normalization

D.Early stopping

Answer: A

Dropout randomly drops units during training, preventing co-adaptation and reducing overfitting.

Why this answer

Dropout is a regularization technique that randomly drops a fraction of neurons during training, which prevents the model from relying too heavily on any single feature and forces it to learn more robust representations. This directly addresses the overfitting symptom of high training accuracy and low validation accuracy by reducing the model's capacity to memorize noise in the training data.

Exam trap

The trap here is that candidates often confuse batch normalization with regularization, but batch normalization primarily addresses internal covariate shift and training stability, not overfitting, while dropout is the explicit regularization technique for neural networks.

How to eliminate wrong answers

Option B (Data augmentation) is wrong because it is mainly used for image or audio data, where transformations artificially expand the dataset. For text classification, simple augmentation (e.g., synonym replacement) tends to be less effective and is not the standard remedy for overfitting in this context. Option C (Batch normalization) is wrong because it normalizes layer inputs to stabilize and accelerate training; it may have a slight regularizing effect, but it is not the primary technique for combating overfitting. Option D (Early stopping) is wrong because, although it can prevent overfitting by halting training when validation performance plateaus, it limits training time rather than constraining the network itself, and the question asks for a regularization technique applied to the model, such as dropout.

549

MCQmedium

A data scientist is using Amazon SageMaker to train a model using a built-in algorithm. The training job fails with an error indicating that the algorithm expects the data to be in recordIO-protobuf format, but the input is CSV. What is the most efficient way to resolve this?

A.Change the inference data to recordIO-protobuf format.

B.Use a boto3 script to convert the CSV files locally and upload.

C.Use a SageMaker processing job to convert the CSV data to recordIO-protobuf format.

D.Switch to a different algorithm that accepts CSV format.

AnswerC

Processing jobs can efficiently transform data into the required format.

Why this answer

Option C is correct because a SageMaker processing job can convert the CSV data to recordIO-protobuf format in a managed, scalable way before training. Option A is wrong because the failure is in the training input; changing the inference data format does not fix it. Option B is wrong because converting the files locally with a boto3 script is more manual and scales less well than a processing job.

Option D is wrong because switching to a different algorithm may not be desirable just to accommodate an input format.

550

MCQhard

A data scientist runs a training job that fails. The CLI output is shown in the exhibit. What is the MOST likely cause of the failure?

A.The S3 bucket or prefix does not exist.

B.The channel name is misspelled.

C.The instance type ml.m5.large is too small.

D.The IAM role does not have s3:GetObject permission.

AnswerA

The error message explicitly says the S3 URI is not found.

Why this answer

The CLI output shows an error indicating that the S3 bucket or prefix does not exist. This is a common failure when the training job's input data path is incorrect, as SageMaker attempts to read from the specified S3 location and fails if the bucket or prefix is missing. The error message directly points to this issue, making it the most likely cause.

Exam trap

The trap here is that candidates may confuse S3 permission errors (403) with bucket-not-found errors (404), leading them to incorrectly select the IAM role permission option when the actual issue is a missing S3 path.

How to eliminate wrong answers

Option B is wrong because a misspelled channel name would typically result in a different error, such as 'Invalid channel name' or 'Channel not found', not an S3 access error. Option C is wrong because the instance type ml.m5.large is a valid and commonly used instance for training; if it were too small, the job would likely start but fail due to resource exhaustion, not an immediate S3-related error. Option D is wrong because an IAM role lacking s3:GetObject permission would produce an 'Access Denied' or '403 Forbidden' error, not a 'bucket or prefix does not exist' error.

551

MCQeasy

A data scientist is training a neural network on Amazon SageMaker and wants to automatically stop training if the validation loss does not improve for 5 consecutive epochs. Which feature should they use?

A.Implement early stopping in the training script

B.SageMaker Debugger

C.SageMaker Checkpointing

D.SageMaker Hyperparameter Tuning

AnswerA

Early stopping is implemented in the training code (e.g., Keras EarlyStopping callback).

Why this answer

Early stopping with a patience parameter stops training when a monitored metric stops improving; here it is validation loss with a patience of 5 epochs, implemented in the training script. Option C is wrong because checkpointing saves model state; it does not stop training. Option D is wrong because hyperparameter tuning searches for the best hyperparameters.

Option B is wrong because Debugger is primarily for monitoring and analyzing training, whereas patience-based early stopping is a built-in feature of the training framework.
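
The patience logic is small enough to sketch directly. In Keras it is available as `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)`; the framework-free class below is an illustrative assumption showing the same rule, and the loss values are made up.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hypothetical validation losses: the last improvement happens at epoch 2 (0-based).
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.65, 0.64]
stopper = EarlyStopping(patience=5)
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}, best val_loss={stopper.best}")
        break
```

Here the loop stops at epoch 7, five epochs after the best value. A real script would also restore or keep the checkpoint from the best epoch.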

552

MCQmedium

Refer to the exhibit. A data scientist is assigned an IAM policy to deploy a SageMaker model. When the scientist tries to create an endpoint, the action fails with an authorization error. What is the missing permission?

A.iam:PassRole

B.sagemaker:ListEndpoints

C.sagemaker:InvokeEndpoint

D.sagemaker:UpdateEndpoint

AnswerA

The caller needs iam:PassRole to pass the execution role to SageMaker when deploying.

Why this answer

The policy in the exhibit allows creating training jobs, models, endpoint configurations, and endpoints, so `sagemaker:CreateEndpoint` itself is present. What is missing is `iam:PassRole`: SageMaker operations that use an execution role require the caller to be allowed to pass that role to the service, and without this permission the request fails with an authorization error. The missing permission is therefore `iam:PassRole`, and Option A is correct.

Option B: `sagemaker:ListEndpoints` is for listing endpoints. Option C: `sagemaker:InvokeEndpoint` is for inference, not creation. Option D: `sagemaker:UpdateEndpoint` is for updating an existing endpoint.
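
The missing statement could look roughly like the following, built as a Python dictionary and printed as JSON. The account ID, role name, and `Sid` are hypothetical placeholders; `iam:PassRole` and the `iam:PassedToService` condition key are standard IAM, and the condition limits passing the role to SageMaker only.

```python
import json

# Hypothetical account ID and role name: substitute the real execution role ARN.
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPassExecutionRoleToSageMaker",
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": EXECUTION_ROLE_ARN,
            # Only allow this role to be passed to the SageMaker service.
            "Condition": {
                "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Scoping `Resource` to one role ARN, rather than `*`, keeps the data scientist from passing more privileged roles to SageMaker.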

553

MCQmedium

A data scientist is using SageMaker to train a deep learning model with a large dataset stored in S3. The training is taking a long time. Which action would most likely reduce training time without sacrificing accuracy?

A.Increase the batch size

B.Use SageMaker Pipe Input mode

C.Use a smaller instance type

D.Reduce the number of epochs

AnswerB

Streams data from S3 directly to the algorithm, reducing I/O time.

Why this answer

SageMaker Pipe Input mode streams training data directly from S3 into the algorithm without first downloading it to the local EBS volume. This eliminates the I/O bottleneck caused by large dataset downloads, significantly reducing training time while preserving accuracy because the model sees the same data.

Exam trap

The trap here is that candidates confuse batch size adjustments (which affect convergence stability) with I/O optimization techniques, overlooking that SageMaker Pipe mode directly addresses the data loading bottleneck without altering the training algorithm.

How to eliminate wrong answers

Option A is wrong because increasing the batch size can reduce training time per epoch but may degrade model accuracy due to convergence to sharper minima or increased generalization error, especially in deep learning. Option C is wrong because using a smaller instance type reduces computational capacity, increasing training time rather than decreasing it. Option D is wrong because reducing the number of epochs directly reduces training time but sacrifices accuracy by underfitting the model.
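
With Pipe mode (set, for example, via `input_mode="Pipe"` on a SageMaker Python SDK estimator), the training code reads each channel as a FIFO instead of a downloaded file. The sketch below assumes the FIFO path convention `/opt/ml/input/data/<channel>_<epoch>`; the helper names are hypothetical, and an in-memory stream stands in for the FIFO so it runs anywhere.

```python
import io

def pipe_path(channel, epoch, base="/opt/ml/input/data"):
    """FIFO path that Pipe mode is assumed to expose for a channel and epoch."""
    return f"{base}/{channel}_{epoch}"

def iter_chunks(stream, chunk_size=1 << 20):
    """Yield successive byte chunks without loading the whole dataset into memory."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

def consume(stream, chunk_size=1 << 20):
    """Stand-in for a training loop: process the stream one chunk at a time."""
    return sum(len(chunk) for chunk in iter_chunks(stream, chunk_size))

# Inside a training container the FIFO would be opened like a file, e.g.:
#   with open(pipe_path("training", 0), "rb") as fifo:
#       total = consume(fifo)
print(consume(io.BytesIO(b"x" * 2_500_000), chunk_size=1_000_000))
```

The key design point is sequential, bounded-memory reads: training starts as soon as the first bytes arrive instead of waiting for the full download.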

554

MCQhard

A data scientist is using Amazon SageMaker to train a large language model with PyTorch. The training job is taking too long. The dataset is stored in S3 and the training script uses the SageMaker PyTorch container. Which change is MOST likely to reduce training time?

A.Use Pipe mode to stream data from S3 instead of downloading.

B.Increase the number of instances in the training job.

C.Change the optimizer to AdamW.

D.Switch to spot instances to reduce cost.

AnswerA

Pipe mode reduces data loading time.

Why this answer

Option A is correct because SageMaker Pipe mode streams data directly from S3 to the training algorithm via a Unix FIFO (named pipe), eliminating the need to first download the entire dataset to the training instance's local storage. This reduces I/O wait time and disk usage, which is especially beneficial for large language models where dataset sizes can be in terabytes, thereby significantly cutting total training time.

Exam trap

The trap here is that candidates often confuse cost-saving measures (spot instances) or model-tuning changes (AdamW) with performance improvements, while the actual bottleneck in large-scale training is frequently data I/O, not compute or optimizer choice.

How to eliminate wrong answers

Option B is wrong because simply increasing the number of instances does not address the root cause of slow data loading; it may even introduce communication overhead and increase costs without proportional speedup if the bottleneck is I/O. Option C is wrong because changing the optimizer to AdamW affects convergence behavior and model accuracy, not the data ingestion speed or training job duration directly. Option D is wrong because switching to spot instances reduces cost but does not reduce training time; spot instances can actually increase training time if they are interrupted and require checkpointing and resumption.

555

MCQhard

A company is using Amazon SageMaker to train a large language model with billions of parameters. The training job uses multiple GPU instances in a distributed fashion. The training is converging but the loss is not decreasing as expected. The data scientist suspects that the learning rate is too high. Which technique should the data scientist use to automatically adjust the learning rate during training?

A.Use a fixed learning rate and train for more epochs

B.Increase the batch size to reduce variance

C.Implement learning rate scheduling with a cosine annealing schedule

D.Use gradient clipping

AnswerC

Cosine annealing reduces the learning rate smoothly, helping convergence.

Why this answer

Learning rate scheduling, such as a cosine annealing schedule, automatically reduces the learning rate over the course of training, so the optimizer takes large steps early and progressively smaller ones later instead of overshooting the minimum. Some SageMaker built-in algorithms expose learning-rate scheduling hyperparameters, and custom training scripts can use the framework's own schedulers.
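
The cosine annealing formula (without restarts) is short enough to write out. PyTorch provides it as `torch.optim.lr_scheduler.CosineAnnealingLR`; the standalone function and the example learning-rate values below are illustrative assumptions.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate, no restarts.

    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T))
    falls smoothly from lr_max at t = 0 to lr_min at t = T.
    """
    t = min(max(step, 0), total_steps)  # clamp to [0, T]
    cos_term = 0.5 * (1.0 + math.cos(math.pi * t / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term

for step in (0, 250, 500, 750, 1000):
    print(step, cosine_annealing_lr(step, 1000, lr_max=3e-4, lr_min=1e-5))
```

At the midpoint the rate is exactly halfway between `lr_max` and `lr_min` (1.55e-4 here); the curve is flat near both ends, which avoids an abrupt drop like a step schedule.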

556

MCQeasy

A company is building a recommendation system using collaborative filtering. The dataset contains implicit feedback (clicks) from users on items. Which algorithm is best suited for this scenario?

A.Linear Regression

B.Alternating Least Squares (ALS)

C.K-means clustering

D.Singular Value Decomposition (SVD)

AnswerB

ALS is designed for implicit feedback and scales well.

Why this answer

Alternating Least Squares (ALS) is well suited to implicit feedback datasets in collaborative filtering: its implicit variant treats clicks as confidence-weighted signals of preference. Option A is wrong because linear regression is for supervised regression, not recommendation. Option C is wrong because K-means is clustering, not recommendation.

Option D is wrong because classic SVD-based matrix factorization is geared toward explicit ratings.
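
A compact pure-Python sketch of implicit-feedback ALS follows, using the common confidence-weighted formulation: confidence `c = 1 + alpha * clicks` and preference `p = 1` for any interaction. The function names, hyperparameter values, and click matrix are illustrative assumptions; at scale you would use a library implementation such as Spark MLlib's `ALS` with `implicitPrefs=True`.

```python
import random

def solve(a, b):
    """Solve a @ x = b for a small nonsingular matrix (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def update_factors(fixed, conf_pref, k, reg):
    """Exact least-squares update of one side's factors with the other side fixed.

    For each row u: (sum_i c_ui y_i y_i^T + reg I) x_u = sum_i c_ui p_ui y_i.
    """
    new = []
    for row in conf_pref:
        a = [[reg if r == c else 0.0 for c in range(k)] for r in range(k)]
        b = [0.0] * k
        for (c_ui, p_ui), y in zip(row, fixed):
            for r in range(k):
                b[r] += c_ui * p_ui * y[r]
                for c in range(k):
                    a[r][c] += c_ui * y[r] * y[c]
        new.append(solve(a, b))
    return new

def objective(conf_pref, users, items, reg):
    """Confidence-weighted squared error plus an L2 penalty on all factors."""
    loss = 0.0
    for u, row in enumerate(conf_pref):
        for i, (c_ui, p_ui) in enumerate(row):
            pred = sum(x * y for x, y in zip(users[u], items[i]))
            loss += c_ui * (p_ui - pred) ** 2
    return loss + reg * sum(x * x for vec in users + items for x in vec)

def implicit_als(clicks, k=2, alpha=10.0, reg=0.1, iters=10, seed=0):
    """clicks[u][i] = click count of user u on item i (0 = no interaction)."""
    rng = random.Random(seed)
    n_users, n_items = len(clicks), len(clicks[0])
    # Confidence grows with click count; preference is 1 for any interaction.
    cp = [[(1.0 + alpha * r, 1.0 if r > 0 else 0.0) for r in row] for row in clicks]
    cp_items = [[cp[u][i] for u in range(n_users)] for i in range(n_items)]
    users = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    history = [objective(cp, users, items, reg)]
    for _ in range(iters):
        users = update_factors(items, cp, k, reg)        # fix items, solve users
        items = update_factors(users, cp_items, k, reg)  # fix users, solve items
        history.append(objective(cp, users, items, reg))
    return users, items, history

clicks = [
    [3, 1, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 2],
]
users, items, history = implicit_als(clicks)
score = lambda u, i: sum(x * y for x, y in zip(users[u], items[i]))
# Rank the items user 1 has not clicked yet.
print(sorted((i for i in range(4) if clicks[1][i] == 0), key=lambda i: -score(1, i)))
```

Each half-step solves its subproblem exactly, so the objective never increases from one iteration to the next; that is what makes the alternating scheme simple and easy to parallelize per user or per item.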

557

MCQhard

A data scientist is tuning a gradient boosting model using Amazon SageMaker Automatic Model Tuning. The objective metric is AUC. The training job converges quickly but the final model has low AUC on the validation set. Which hyperparameter should the data scientist adjust to improve validation AUC?

A.Increase the subsample ratio of training data

B.Decrease the learning rate and increase the number of rounds

C.Increase the learning rate

D.Increase the maximum depth of trees

AnswerB

Lower learning rate with more rounds typically improves generalization and AUC.

Why this answer

Decreasing the learning rate and increasing the number of rounds is the correct approach. A lower learning rate (shrinkage) scales down each tree's contribution, so the ensemble takes smaller steps toward the optimum and is less likely to fit noise, while the extra rounds let enough trees contribute to the final model. A training job that converges quickly but yields low validation AUC is learning too aggressively, and this combination usually improves generalization and validation AUC.

Exam trap

The trap here is that candidates think increasing the learning rate will speed up convergence and improve AUC, but when the model already converges quickly, a higher learning rate makes it fit the training data even more aggressively. Decreasing the learning rate and adding rounds is the standard remedy in gradient boosting.

How to eliminate wrong answers

Option A is wrong because increasing the subsample ratio (e.g., from 0.8 to 1.0) actually uses more training data per iteration, which can increase variance and overfitting, not improve validation AUC when the model already converges quickly. Option C is wrong because increasing the learning rate makes the model converge even faster, exacerbating overfitting and further reducing validation AUC. Option D is wrong because increasing the maximum depth of trees makes each tree more complex and prone to overfitting, which typically degrades validation AUC when the model already converges quickly.
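
The learning rate in gradient boosting is a shrinkage factor on each tree's output. The toy sketch below (squared loss, depth-1 trees, one feature) illustrates that trade-off: a small `eta` needs many more rounds to reach a similar training fit. It shows training error only and is an assumption-laden illustration, not XGBoost's algorithm; in SageMaker's XGBoost the corresponding hyperparameters are `eta` and `num_round`.

```python
def fit_stump(xs, residuals):
    """Least-squares decision stump (depth-1 regression tree) on a 1-D feature."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for split in range(1, len(pairs)):
        if pairs[split - 1][0] == pairs[split][0]:
            continue  # identical x values cannot be separated
        left = [r for _, r in pairs[:split]]
        right = [r for _, r in pairs[split:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[split - 1][0] + pairs[split][0]) / 2, lm, rm)
    if best is None:  # all x identical: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def gradient_boost(xs, ys, eta, n_rounds):
    """Squared-loss gradient boosting with stumps; eta is the shrinkage (learning rate)."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees, train_mse = [], []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the loss (1/2) * (y - p)^2.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + eta * tree(x) for p, x in zip(preds, xs)]
        train_mse.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    model = lambda x: base + eta * sum(t(x) for t in trees)
    return model, train_mse

xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
for eta, rounds in ((0.5, 10), (0.05, 100)):
    _, mse = gradient_boost(xs, ys, eta, rounds)
    print(f"eta={eta}, rounds={rounds}, final train MSE={mse[-1]:.4f}")
```

Whether the smaller steps actually raise validation AUC depends on the data, which is why the pair is usually tuned together, for example with Automatic Model Tuning.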

558

MCQhard

A company uses Amazon SageMaker to train a model using the built-in Linear Learner algorithm. The training data contains missing values in some features. What is the best practice for handling missing values with this algorithm?

A.Remove rows with missing values

B.Impute missing values using mean or median imputation

C.Set missing values to zero

D.Use the `handle_missing` parameter in the algorithm

AnswerB

Imputation is a standard preprocessing step for handling missing values.

Why this answer

Linear Learner has no built-in mechanism for handling missing values, so they must be handled before training. The best practice is to impute them, for example with mean or median imputation.

Option B (mean or median imputation) is correct. Option A is wrong because removing rows with missing values can discard a large share of the data. Option C is wrong because setting missing values to zero can bias the model when zero is not a neutral value for a feature.

Option D is wrong because Linear Learner has no `handle_missing` parameter.
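
Median imputation should be fitted on the training split only and then reused for validation, test, and inference data, to avoid leaking information. The plain-Python sketch below uses `None` to mark missing values, which is an assumption about the input; scikit-learn's `SimpleImputer(strategy="median")` provides the same behavior, for instance inside a SageMaker processing job.

```python
import statistics

def fit_medians(rows):
    """Per-column median of the observed (non-None) training values.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    n_cols = len(rows[0])
    return [statistics.median([row[c] for row in rows if row[c] is not None])
            for c in range(n_cols)]

def impute(rows, medians):
    """Replace None with the training-set median of that column."""
    return [[medians[c] if v is None else v for c, v in enumerate(row)] for row in rows]

train = [[1.0, 10.0], [None, 12.0], [3.0, None], [5.0, 30.0]]
medians = fit_medians(train)             # [3.0, 12.0]
print(impute(train, medians))
print(impute([[None, None]], medians))   # later rows reuse the training medians
```

The median is preferred over the mean when a feature is skewed, since a few extreme values (such as the 30.0 above) do not pull it away from typical values.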

559

MCQhard

A team is training a neural network for image classification using Amazon SageMaker. The training loss decreases rapidly but the validation loss starts increasing after a few epochs. Which action should the team take?

A.Reduce the batch size

B.Add more convolutional layers

C.Increase the learning rate

D.Implement early stopping based on validation loss

AnswerD

Early stopping prevents overfitting.

Why this answer

Option D is correct because early stopping ends training once validation loss stops improving, before the model overfits further. Option C is wrong because increasing the learning rate does not address overfitting and can destabilize training. Option B is wrong because adding more layers increases model complexity, which makes overfitting worse.

Option A is wrong because reducing the batch size may slow training but does not prevent overfitting.

560

Multi-Selectmedium

A company is using SageMaker to deploy a model for real-time inference. The model requires GPU for low latency. Which THREE configurations should the company consider for high availability and cost optimization? (Choose three.)

Select 3 answers

A.Use Spot instances for the endpoint.

B.Use a multi-model endpoint to share GPU instances among multiple models.

C.Use SageMaker Batch Transform for inference.

D.Use multiple production variants with different instance types.

E.Enable automatic scaling based on invocation count.

AnswersB, D, E

Increases GPU utilization and reduces cost.

Why this answer

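Option B is correct because a multi-model endpoint allows multiple models to be hosted on the same GPU-backed instance, sharing the GPU resources and reducing idle time. This improves cost efficiency by maximizing GPU utilization while still providing low-latency inference for each model, without provisioning a separate endpoint per model. Option D is correct because production variants behind one endpoint can run on different instance types with weighted traffic, so the company can mix or compare instance types without taking the endpoint down. Option E is correct because automatic scaling on invocation count adds instances under load and removes them when traffic drops, supporting both availability and cost. Option C is wrong because Batch Transform is an offline job, not real-time inference.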

Exam trap

The trap here is that candidates often confuse high availability with cost optimization, incorrectly assuming Spot instances (Option A) are suitable for real-time inference despite their interruption risk, or they overlook multi-model endpoints as a GPU-sharing strategy.
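
Option E is typically configured through Application Auto Scaling. The sketch below only builds the request parameters; they are intended to be passed to boto3's `application-autoscaling` client as `register_scalable_target(**register)` and `put_scaling_policy(**policy)`. The endpoint name, variant name, capacity bounds, and target value are hypothetical; a minimum of two instances is an assumption chosen for availability.

```python
def scaling_requests(endpoint_name, variant_name, min_instances, max_instances,
                     target_invocations_per_instance):
    """Build Application Auto Scaling request parameters for a SageMaker variant."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_instances,   # keep at least this many instances running
        "MaxCapacity": max_instances,   # cap cost during traffic spikes
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so each instance handles roughly this many invocations per minute.
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
    }
    return register, policy

# Hypothetical endpoint and variant names.
register, policy = scaling_requests("fraud-gpu-endpoint", "AllTraffic", 2, 6, 70)
print(register["ResourceId"])
```

Target tracking keeps the metric near the target automatically, so the company does not have to define separate scale-out and scale-in alarms.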

561

MCQeasy

A data scientist is training a Random Forest model on Amazon SageMaker. The model performs well on the training set but poorly on the test set. Which technique should the data scientist use to address this issue?

A.Increase the number of trees in the forest

B.Decrease the maximum depth of each tree

C.Increase the learning rate

D.Increase the maximum depth of each tree

AnswerB

Shallow trees generalize better, reducing overfitting.

Why this answer

The model is overfitting, as indicated by high training performance and poor test performance. Decreasing the maximum depth of each tree limits the complexity of individual trees, reducing overfitting by preventing them from memorizing noise in the training data. This is a standard way to regularize Random Forest models, whichever framework is used to train them on SageMaker.

Exam trap

AWS often tests the misconception that increasing model complexity (e.g., more trees or deeper trees) always improves performance, when in fact overfitting requires reducing complexity or applying regularization.

How to eliminate wrong answers

Option A is wrong because increasing the number of trees in the forest generally improves model stability and reduces variance without significantly increasing overfitting, but it does not address the root cause of overfitting from overly deep trees. Option C is wrong because learning rate is a hyperparameter for gradient boosting models, not for Random Forest; Random Forest does not use a learning rate. Option D is wrong because increasing the maximum depth of each tree would exacerbate overfitting by allowing trees to capture more noise and specific patterns in the training data, worsening test performance.

562

MCQhard

A company is building a real-time fraud detection system using Amazon SageMaker. The model is a gradient boosting classifier trained on 500 GB of transactional data. The inference endpoint is deployed as a SageMaker real-time endpoint using an ml.c5.9xlarge instance. The model is serialized using the native format of the framework (XGBoost). The endpoint receives about 100 requests per second with an average payload size of 10 KB. The company observes that the endpoint's latency is around 200 ms, but they need under 100 ms. The data scientist profiles the endpoint and finds that the model inference time is 50 ms, but the remaining time is spent on data preprocessing and serialization/deserialization. The preprocessing involves converting JSON input to a NumPy array and then to a DMatrix. Which action is most likely to reduce latency to meet the requirement?

A.Use a more efficient serialization format such as Apache Arrow or Protocol Buffers for the input data

B.Switch to SageMaker Batch Transform to process requests in batches

C.Use a larger instance type such as ml.c5.18xlarge

D.Reduce the number of trees in the model

AnswerA

Reducing serialization/deserialization overhead directly addresses the bottleneck.

Why this answer

Option A is correct. The profile shows that only about 50 ms of the 200 ms latency is model inference; the remaining ~150 ms goes to converting JSON to a NumPy array and then to a DMatrix, plus serialization and deserialization. Because that overhead alone exceeds the 100 ms target, it is the part that must shrink, and a more efficient serialization format (for example, Protocol Buffers instead of JSON) reduces parsing and conversion time.

Option B is wrong because Batch Transform is for offline, asynchronous processing, not real-time inference. Option C is wrong because a larger instance does little to reduce per-request serialization and preprocessing overhead. Option D is wrong because reducing the number of trees only shortens the 50 ms inference step, which cannot bring latency under 100 ms on its own, and it may hurt accuracy.
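
The latency budget makes the choice clear: 200 ms total minus 50 ms of inference leaves about 150 ms of overhead, which already exceeds the 100 ms target. The sketch below contrasts a JSON payload with a fixed-layout binary payload packed with Python's `struct` module. The binary layout is a hypothetical stand-in for a schema-based format such as Protocol Buffers or Arrow, not either of those formats, and the feature values are made up.

```python
import json
import struct

def encode_json(features):
    return json.dumps({"features": features}).encode("utf-8")

def decode_json(payload):
    return json.loads(payload)["features"]

def encode_binary(features):
    # Little-endian uint32 count followed by float32 values (hypothetical layout).
    return struct.pack(f"<I{len(features)}f", len(features), *features)

def decode_binary(payload):
    (n,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{n}f", payload, 4))

features = [0.123456789 * i for i in range(1, 201)]
print("json bytes:", len(encode_json(features)))
print("binary bytes:", len(encode_binary(features)))
```

The binary payload is smaller and is decoded by a fixed-width unpack instead of text parsing; measure both with `timeit` on real payloads before committing, and note that float32 trades a little precision for size.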

563

MCQeasy

A data scientist is building a regression model to predict house prices. The dataset includes features such as square footage, number of bedrooms, year built, and location. After training a linear regression model, the data scientist notices that the residuals have a clear pattern when plotted against predicted values: they increase with predicted values. The model also has high RMSE. Which action should the data scientist take to improve the model?

A.Remove outliers from the dataset.

B.Use L1 regularization (Lasso) to reduce overfitting.

C.Apply a log transformation to the target variable.

D.Add interaction terms between features.

AnswerC

Log transformation can stabilize variance and linearize the relationship, reducing the residual pattern.

Why this answer

Option C is correct because residuals that grow with the predicted value indicate heteroscedasticity (non-constant variance), which often comes from a multiplicative relationship; a log transformation of the target stabilizes the variance and can linearize the relationship. Option D is wrong because adding interaction terms does not fix heteroscedasticity. Option A is wrong because outliers are not the root cause of a systematic residual pattern.

Option B is wrong because L1 regularization reduces overfitting but does not address non-constant variance or non-linearity.
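
A minimal sketch of the log-target workflow: fit on `log(price)`, then exponentiate predictions back to the original scale. The data is hypothetical and noiseless, generated from an exponential relationship so the recovered coefficients can be checked; on real data the raw-scale fit would show the widening residuals described in the question.

```python
import math

def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b * x with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: price grows multiplicatively with size (thousands of sq ft).
sqft = [0.8, 1.0, 1.2, 1.5, 1.8, 2.2, 2.6, 3.0]
price = [math.exp(11.5 + 0.6 * s) for s in sqft]

a, b = fit_ols(sqft, [math.log(p) for p in price])  # fit on the log scale
predict = lambda s: math.exp(a + b * s)              # back-transform to price
print(round(a, 3), round(b, 3), round(predict(2.0)))
```

On noisy data, exponentiating a log-scale prediction estimates the median rather than the mean price, which is usually acceptable for house prices but worth knowing.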

564

MCQmedium

A company has a time series dataset of daily sales for the past 5 years. They want to forecast sales for the next 30 days. The data shows weekly seasonality and a slight upward trend. Which Amazon SageMaker algorithm is most appropriate for this task?

A.DeepAR

B.Linear Learner

C.XGBoost

D.K-Means

AnswerA

DeepAR is a built-in SageMaker algorithm for time series forecasting that handles seasonality and trends.

Why this answer

DeepAR is purpose-built for time series forecasting with seasonal patterns and trends. It uses a recurrent neural network (RNN) to model the conditional distribution of future values given past observations, and it natively handles multiple time series, missing data, and known seasonal periods (e.g., weekly). The weekly seasonality and upward trend in the daily sales data are exactly the kind of patterns DeepAR is designed to capture.

Exam trap

The trap here is that candidates often pick XGBoost (Option C) because it is a powerful tree-based model, but they overlook that it lacks native time series capabilities and requires manual feature engineering to capture seasonality and trend, whereas DeepAR is the only option specifically designed for this forecasting task.

How to eliminate wrong answers

Option B (Linear Learner) is wrong because it is a general-purpose linear regression or classification algorithm that cannot model seasonality or temporal dependencies without extensive manual feature engineering (e.g., lag variables, Fourier terms). Option C (XGBoost) is wrong because while it can be used for time series via feature engineering, it is not a dedicated forecasting algorithm and does not natively handle temporal order, autocorrelation, or seasonality; it treats each prediction as an independent regression task. Option D (K-Means) is wrong because it is an unsupervised clustering algorithm that groups data points by similarity and has no mechanism for forecasting future values in a time series.
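
DeepAR reads JSON Lines where each record has a `start` timestamp and a `target` array, and it requires the `time_freq`, `prediction_length`, `context_length`, and `epochs` hyperparameters. The sketch below builds one record for five years of daily sales with a weekly pattern and a mild trend; the sales numbers, context length, and epoch count are made-up illustrations to tune for real data.

```python
import json
from datetime import date

def deepar_record(start, values):
    """One JSON Lines record in DeepAR's input format."""
    return json.dumps({"start": f"{start.isoformat()} 00:00:00", "target": values})

# Hypothetical daily sales: weekly seasonality times a slight upward trend.
start = date(2019, 1, 1)
weekly = [1.0, 0.9, 0.95, 1.0, 1.2, 1.5, 1.4]
sales = [round(100 * weekly[d % 7] * (1 + 0.0005 * d), 2) for d in range(5 * 365)]

line = deepar_record(start, sales)

hyperparameters = {
    "time_freq": "D",           # daily observations
    "prediction_length": "30",  # forecast horizon: the next 30 days
    "context_length": "30",     # assumption: history window the model conditions on
    "epochs": "100",            # assumption: tune for your data
}
print(line[:60], "...")
```

With several stores or products, each series becomes its own line in the same file, and DeepAR learns one global model across them.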

565

MCQmedium

A machine learning team is deploying a model for real-time fraud detection. The model must make predictions with less than 100ms latency. The team uses SageMaker and the model is a large ensemble of decision trees. Which SageMaker hosting option is MOST suitable?

A.SageMaker Multi-model endpoint

B.SageMaker Serverless Inference

C.SageMaker Elastic Inference

D.SageMaker Batch Transform

AnswerA

Supports multiple models on one instance with low latency.

Why this answer

Of the options listed, a SageMaker Multi-model endpoint is the most suitable: it runs on provisioned instances that serve real-time requests, and it can host multiple models (including large ensembles) on a single endpoint while sharing the underlying compute, which reduces cost. Models are loaded on their first invocation and then cached in memory, so frequently used models can serve sub-100ms predictions; only a request for a model that is not yet loaded pays an extra loading delay.

Exam trap

The trap here is that candidates might choose SageMaker Serverless Inference (Option B) thinking it automatically handles scaling for real-time workloads, but they overlook the cold start latency penalty that makes it unsuitable for sub-100ms latency requirements.

How to eliminate wrong answers

Option B (SageMaker Serverless Inference) is wrong because it has a cold start latency that can exceed 100ms, making it unsuitable for real-time fraud detection requiring consistent sub-100ms responses. Option C (SageMaker Elastic Inference) is wrong because it accelerates deep learning models by attaching GPU accelerators, but it does not benefit decision tree ensembles which are CPU-bound and do not leverage GPU acceleration. Option D (SageMaker Batch Transform) is wrong because it is designed for offline, asynchronous batch predictions on large datasets, not for real-time inference with low latency requirements.

566

MCQmedium

A company uses Amazon SageMaker to train a linear regression model. After training, the model shows high bias on the training set. Which action is MOST likely to reduce bias?

A.Add more features

B.Collect more training data

C.Apply L2 regularization

D.Deploy the model to a larger instance

AnswerA

More features can capture patterns better.

Why this answer

High bias indicates that the model is underfitting the training data, meaning it is too simple to capture the underlying patterns. Adding more features increases the model's capacity to learn complex relationships, directly addressing underfitting by reducing bias. In SageMaker, this can be done by engineering additional input columns or using feature transformations before training.

Exam trap

The trap here is that candidates confuse high bias with high variance and incorrectly choose regularization or more data, which are solutions for overfitting, not underfitting.

How to eliminate wrong answers

Option B is wrong because collecting more training data does not reduce bias; it primarily helps with high variance (overfitting) by providing more examples to generalize from. Option C is wrong because L2 regularization (ridge regression) penalizes large coefficients, which increases bias to reduce variance, making bias worse in an already underfit model. Option D is wrong because deploying the model to a larger instance affects inference performance (latency/throughput) but does not change the model's learned parameters or its bias-variance tradeoff.

567

Multi-Selecthard

Which THREE techniques can help reduce overfitting in a neural network? (Select THREE.)

Select 3 answers

A.Increase training epochs

B.Dropout

C.Early stopping

D.Increase the number of layers

E.L2 regularization

AnswersB, C, E

Dropout randomly drops units.

Why this answer

Dropout is correct because it randomly deactivates a fraction of neurons during training, forcing the network to learn redundant representations and preventing co-adaptation of features. This reduces overfitting by acting as an ensemble method without increasing computational cost at inference time.

Exam trap

AWS often tests the misconception that adding more capacity (layers/epochs) always improves performance, when in fact it increases overfitting without proper regularization.

568

MCQeasy

A company uses Amazon SageMaker to deploy a model that predicts customer churn. The model is retrained weekly. The data scientist notices that the model's accuracy remains high, but the business reports that the model is not capturing new churn patterns. What is the most likely cause?

A.The model is underfitting the data

B.The model has data leakage from future data

C.The model is overfitting to the training data

D.The model is experiencing concept drift

AnswerD

Concept drift means the relationship between the inputs and the target changes over time, so the model still scores well on old patterns but misses new ones.

Why this answer

Concept drift occurs when the relationship between the input features and the target variable changes over time, causing the model's predictions to become less relevant even if accuracy measured on older data remains high. In this scenario, the model is retrained weekly but still fails to capture new churn patterns because the underlying customer behavior has shifted, a classic sign of concept drift rather than a data leakage or model fit issue. Amazon SageMaker Model Monitor can help surface such drift, for example by comparing inference data distributions against a baseline or, once ground-truth labels are available, by tracking model quality metrics over time.

Exam trap

The trap here is that candidates see 'accuracy remains high' and assume the model is overfitting or underfitting, but the key clue is 'not capturing new churn patterns'—which points to a shift in the underlying data distribution (concept drift), not a static model fit issue.

How to eliminate wrong answers

Option A is wrong because underfitting would manifest as consistently low accuracy on both training and test data, not as high accuracy with missed new patterns. Option B is wrong because data leakage from future data would cause unrealistically high performance during training and evaluation, not a failure to capture new churn patterns after deployment. Option C is wrong because overfitting would show high training accuracy but poor generalization on unseen data from the same distribution, whereas the problem here is that the data distribution itself has changed over time.
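
Because concept drift changes what the correct label is, catching it needs recent ground truth. A simple check compares accuracy on a sliding window of recently labeled predictions with the accuracy measured at deployment. The class name, window size, and threshold below are hypothetical choices, and the outcomes are made up.

```python
from collections import deque

class RollingAccuracyMonitor:
    """Flag possible concept drift when accuracy on the most recent labeled
    predictions falls well below the baseline accuracy."""

    def __init__(self, baseline_accuracy, window=500, max_drop=0.05):
        self.baseline = baseline_accuracy
        self.max_drop = max_drop
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def recent_accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def drift_suspected(self):
        # Only judge once the window is full, to avoid noisy early alarms.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.recent_accuracy() < self.baseline - self.max_drop

monitor = RollingAccuracyMonitor(baseline_accuracy=0.92, window=100, max_drop=0.05)
# Hypothetical recent outcomes: 80 of the last 100 churn predictions were correct.
for k in range(100):
    monitor.record(prediction=1, actual=1 if k < 80 else 0)
print(monitor.recent_accuracy(), monitor.drift_suspected())
```

An aggregate accuracy computed over all history can stay high while this recent-window figure drops, which matches the symptom in the question.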

569

MCQhard

A data scientist is using SageMaker to train a random forest model. The dataset has 100 features and 1 million rows. The training job fails with a 'ResourceLimitExceeded' error. What is the MOST likely cause?

A.The S3 bucket containing the training data is not in the same region.

B.The instance type selected does not have enough GPU memory.

C.The wrong algorithm was specified for the training job.

D.The account has reached its limit on the number of SageMaker training instances.

AnswerD

ResourceLimitExceeded indicates a service quota limit.

Why this answer

Random forest usually does not require a GPU; CPU instances are sufficient. A 'ResourceLimitExceeded' error indicates a service quota, such as the allowed number of training instances of a given type or a vCPU limit, has been exceeded. Option B (GPU memory) is unlikely because random forest runs on CPU.

Option A (S3 bucket in a different region) does not cause a resource limit error. Option C (wrong algorithm) would produce a different error.

570

MCQmedium

A data scientist is training a deep learning model on Amazon SageMaker for image classification. The training is taking a long time and the GPU utilization is consistently below 30%. What should the data scientist do to improve GPU utilization and reduce training time?

A.Use early stopping to stop training earlier.

B.Increase the batch size.

C.Switch to a CPU-only instance.

D.Reduce the number of layers in the model.

AnswerB

Larger batches use GPU memory more efficiently and increase utilization.

Why this answer

Low GPU utilization (below 30%) means the GPU sits idle for much of each training step. A common cause is a batch size too small to keep the GPU's parallel compute units busy. Increasing the batch size lets the GPU process more samples per forward/backward pass, improving arithmetic intensity and hardware utilization, which reduces total training time on SageMaker.

Exam trap

The trap here is that candidates confuse 'low GPU utilization' with 'overfitting' or 'model complexity,' leading them to choose early stopping or reducing layers, when the real issue is insufficient data parallelism per batch.

How to eliminate wrong answers

Option A is wrong because early stopping halts training based on validation performance, but it does not address the root cause of low GPU utilization or improve hardware efficiency during each training step. Option C is wrong because switching to a CPU-only instance would drastically reduce computational throughput, making training even slower and further underutilizing resources. Option D is wrong because reducing the number of layers decreases model capacity and may harm accuracy, but it does not directly improve GPU utilization; the bottleneck is data throughput, not model depth.

571

MCQmedium

A data scientist is training a binary classification model on a dataset with 100 features and 10,000 samples. The model achieves 99% accuracy on the training set but only 65% on the test set. Which technique should be applied first to address this issue?

A.Reduce the size of the training dataset

B.Increase the number of trees in a random forest

C.Apply L2 regularization to the model

D.Add more features to the model

AnswerC

L2 regularization penalizes large weights, reducing overfitting.

Why this answer

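The symptoms (99% training accuracy, 65% test accuracy) indicate overfitting. Regularization (L1/L2) is a direct way to reduce overfitting because it penalizes large coefficients. Option A is wrong because reducing the training data would make overfitting worse.

Option B is wrong for two reasons. Nothing in the scenario says the model is a random forest. Even if it were, adding trees to a random forest does not fix the overfitting of its individual trees. Option D is wrong because adding more features increases model complexity and would worsen overfitting.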
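The one-feature case shows why an L2 penalty curbs overfitting. The helper below is hypothetical and assumes a single feature with no intercept. It minimizes the sum of squared errors plus λw², whose closed-form solution is w = Σxy / (Σx² + λ). As λ grows, the weight shrinks toward zero.

```python
def ridge_weight_1d(xs, ys, lam):
    """Closed-form L2-regularized (ridge) weight for y ≈ w * x.

    Minimizes sum((y - w*x)^2) + lam * w^2. Setting the derivative to zero
    gives w = sum(x*y) / (sum(x^2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1, 2, 3], [2, 4, 6]
print(ridge_weight_1d(xs, ys, 0.0))   # unregularized least squares
print(ridge_weight_1d(xs, ys, 14.0))  # same data, stronger penalty
```

Here Σxy = 28 and Σx² = 14, so λ = 0 gives w = 2.0 and λ = 14 gives w = 1.0. The penalty trades a little training fit for smaller, less data-specific weights.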

572

Multi-Selectmedium

A data scientist is building a text classification model using a bag-of-words approach. The dataset contains 100,000 documents with a vocabulary of 50,000 unique words. The model is overfitting. Which THREE techniques can help reduce overfitting? (Choose THREE.)

Select 3 answers

A.Increase max_features to include more words

B.Apply L1 or L2 regularization

C.Reduce the n-gram range to unigrams only

D.Use feature selection to remove rare words

E.Use TF-IDF instead of raw counts

AnswersB, C, D

Regularization penalizes large coefficients, reducing overfitting.

Why this answer

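Feature selection (removing rare words), regularization (L1/L2) and restricting the n-gram range to unigrams all reduce model complexity and therefore overfitting. Option A is wrong because increasing max_features adds more, often rarer, terms, which can increase overfitting. Option E is wrong because TF-IDF is a term-weighting scheme, not a regularization technique; it does not shrink the feature space.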
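The sketch below shows how two of the correct options shrink the feature space. `build_vocab` is a hypothetical helper that loosely mirrors the `ngram_range` and `min_df` parameters of scikit-learn's `CountVectorizer`. Restricting to unigrams drops all bigram features, and a minimum document frequency drops rare words.

```python
from collections import Counter


def build_vocab(docs, ngram_range=(1, 1), min_df=1):
    """Return the n-gram terms that appear in at least min_df documents."""
    lo, hi = ngram_range
    doc_freq = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        terms = set()
        for n in range(lo, hi + 1):
            for i in range(len(tokens) - n + 1):
                terms.add(" ".join(tokens[i:i + n]))
        doc_freq.update(terms)  # count each term at most once per document
    return {term for term, df in doc_freq.items() if df >= min_df}


docs = ["the cat sat", "the dog sat", "a cat ran"]
print(len(build_vocab(docs, ngram_range=(1, 2))))  # unigrams + bigrams
print(len(build_vocab(docs)))                      # unigrams only
print(sorted(build_vocab(docs, min_df=2)))         # rare words removed
```

On this toy corpus the vocabulary shrinks from 12 terms (unigrams and bigrams) to 6 unigrams, and then to 3 words that appear in at least two documents.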

573

Multi-Selectmedium

Which TWO of the following are valid techniques to handle missing data in a dataset?

Select 2 answers

A.Normalizing the data

B.Adding a constant value of 0

C.Mean imputation

D.Synthetic Minority Over-sampling (SMOTE)

E.Deleting rows with missing values

AnswersC, E

Replacing missing values with the mean is a standard technique.

Why this answer

Mean imputation (Option C) is a valid technique for handling missing data. It replaces each missing value with the mean of the observed values for that feature, so the feature's mean is preserved. It is simple and works reasonably for numerical data that is missing completely at random (MCAR). In that case it does not bias the mean estimate, although it does shrink the feature's variance. Deleting rows with missing values (Option E, listwise deletion) is also valid, at the cost of discarding data.
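Both correct options fit in a few lines of plain Python. This is a minimal sketch that assumes missing values are represented as `None`. In pandas, the same operations are usually done with `fillna` and `dropna`.

```python
from statistics import mean


def mean_impute(column):
    """Replace None entries with the mean of the observed values (Option C)."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def drop_incomplete_rows(rows):
    """Listwise deletion: keep only rows with no missing values (Option E)."""
    return [row for row in rows if all(v is not None for v in row)]


print(mean_impute([1.0, None, 3.0]))
print(drop_incomplete_rows([(1, 2), (None, 3), (4, 5)]))
```

Imputation keeps every row but pulls the filled values toward the center of the distribution. Deletion keeps the observed distribution intact but loses whole samples.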

Exam trap

AWS often tests the distinction between techniques for handling missing data (like imputation) and unrelated preprocessing steps (like normalization or SMOTE). The trap is that candidates may pick SMOTE or normalization because they are common preprocessing steps. However, normalization rescales features and SMOTE synthesizes minority-class samples; neither fills in missing values.

574

Multi-Selectmedium

A company uses Amazon SageMaker to train a linear regression model. During evaluation, they observe that the model has high bias (underfitting). Which THREE actions can reduce bias?

Select 3 answers

A.Increase L2 regularization.

B.Add polynomial features.

C.Reduce the regularization strength.

D.Use a smaller training dataset.

E.Use a random forest model instead of linear regression.

AnswersB, C, E

Polynomial features increase model capacity, reducing bias.

Why this answer

Options B, C and E are correct. Adding polynomial features increases model complexity, and switching to a more flexible algorithm (random forest) can capture non-linear patterns.

Reducing regularization strength lets the model fit the data more closely. Option A is wrong because increasing L2 regularization increases bias. Option D is wrong because a smaller training set does not make the model more expressive; it mainly increases variance.
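A small NumPy sketch shows polynomial features removing bias. It assumes a purely quadratic, noise-free toy target, which is not data from the question. A straight line cannot represent the curve no matter how much data it sees. Adding an x² term (a degree-2 fit) removes that error.

```python
import numpy as np


def fit_mse(x, y, degree):
    """Least-squares polynomial fit of the given degree; returns training MSE."""
    coeffs = np.polyfit(x, y, degree)
    preds = np.polyval(coeffs, x)
    return float(np.mean((y - preds) ** 2))


x = np.linspace(-3, 3, 50)
y = x ** 2  # toy quadratic target, noise-free for clarity

linear_mse = fit_mse(x, y, 1)     # straight line: high bias, large error
quadratic_mse = fit_mse(x, y, 2)  # adds an x^2 feature: error drops to ~0
print(linear_mse, quadratic_mse)
```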

575

MCQmedium

A data scientist is training a text classification model using Amazon SageMaker's BlazingText algorithm. The dataset consists of 1 million documents, each labeled with one of 10 categories. The model achieves 92% accuracy on a held-out test set. However, when deployed, the model performs poorly on documents containing slang and typos. What should the data scientist do to improve model robustness?

A.Remove all documents with slang or typos from the training set.

B.Augment the training data by introducing common slang replacements and typos.

C.Increase the embedding dimension from 100 to 300.

D.Increase the number of training epochs.

AnswerB

Data augmentation exposes the model to realistic noise, improving robustness.

Why this answer

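Option B is correct. Augmenting the training data with realistic noise (slang substitutions and typos) helps the model generalize to the inputs it actually sees in production. Option A is wrong because removing such documents shrinks the training set and removes exactly the kind of text the model struggles with.

Option C is wrong because a larger embedding dimension does not expose the model to slang or typos it has never seen. Option D is wrong because training for more epochs on the same clean data adds no robustness and may lead to overfitting.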
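The sketch below shows one way to generate noisy copies of training text. Both the slang map and the typo model (swapping two adjacent characters) are hypothetical choices; a real pipeline would derive them from production text. In BlazingText's supervised mode, each augmented line would keep its original `__label__` prefix.

```python
import random

# Hypothetical slang map; a real one would come from the target domain's text.
SLANG = {"you": "u", "are": "r", "please": "pls"}


def augment(text, rng, slang=SLANG, typo_prob=0.1):
    """Return a noisy copy of text: slang substitutions plus adjacent-character swaps."""
    out = []
    for word in text.split():
        word = slang.get(word.lower(), word)
        if len(word) > 3 and rng.random() < typo_prob:
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]  # swap chars i, i+1
        out.append(word)
    return " ".join(out)


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
print(augment("are you coming", rng, typo_prob=0.0))  # slang only
print(augment("are you coming", rng, typo_prob=1.0))  # slang plus a typo
```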

576

MCQeasy

A team is building a product recommendation system using matrix factorization in Amazon SageMaker. They notice that the model's training loss decreases steadily but validation loss starts increasing after 5 epochs. What is the most likely cause?

A.Underfitting

B.Not enough training data

C.Learning rate too high

D.Overfitting

AnswerD

The model is memorizing the training data.

Why this answer

In matrix factorization for recommendation systems, a decreasing training loss with an increasing validation loss after several epochs is a classic sign of overfitting. The model is memorizing the training data (including noise) rather than learning generalizable patterns, which degrades its performance on unseen validation data.
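Early stopping is the usual guard against the divergence described above. The sketch below is a hypothetical helper that uses made-up validation losses shaped like the scenario, with validation loss rising after epoch 5. It stops once validation loss has failed to improve for `patience` consecutive epochs and reports the best epoch to keep.

```python
def best_epoch_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for per-epoch validation losses (1-indexed)."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, len(val_losses)


# Hypothetical validation losses: improving until epoch 5, then rising.
val = [0.9, 0.7, 0.6, 0.55, 0.5, 0.52, 0.56, 0.61]
print(best_epoch_with_early_stopping(val))
```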

Exam trap

The trap here is that candidates may confuse the symptom of overfitting (training loss decreasing, validation loss increasing) with underfitting or a learning rate issue, but the key is the divergence between the two loss curves after a period of convergence.

How to eliminate wrong answers

Option A is wrong because underfitting would show high training loss that does not decrease sufficiently, not a diverging gap between training and validation loss. Option B is wrong because insufficient training data can contribute to overfitting, but the direct symptom described—training loss decreasing while validation loss increases—is the hallmark of overfitting, not a data quantity issue alone. Option C is wrong because a learning rate that is too high typically causes the loss to oscillate or diverge entirely, not a steady decrease in training loss with a later increase in validation loss.

577

Multi-Selecthard

A company is deploying a machine learning model for real-time fraud detection. The model must have extremely low latency (<10 ms) and high throughput. Which THREE design choices meet these requirements? (Choose 3.)

Select 3 answers

A.Use GPU instances (e.g., ml.p3) for the endpoint.

B.Use one endpoint per model to avoid interference.

C.Use SageMaker Batch Transform for real-time predictions.

D.Use SageMaker multi-model endpoints to host multiple models on the same instance.

E.Use SageMaker Elastic Inference to attach GPU acceleration to a CPU instance.

AnswersA, D, E

GPU accelerates inference, reducing latency.

Why this answer

Option A is correct because GPU instances such as ml.p3 provide massively parallel compute. This accelerates the matrix operations common in deep learning models, which lowers per-request inference latency and raises throughput. For a neural-network-based fraud model, this can help keep latency under 10 ms while handling a high transaction rate.

Exam trap

AWS often tests the misconception that batch processing services like Batch Transform can be used for real-time inference. The key distinction is that Batch Transform is designed for offline, asynchronous workloads and cannot meet low-latency requirements.

578

Multi-Selecthard

Which TWO of the following are techniques used to reduce overfitting in a neural network?

Select 2 answers

A.Increase the number of layers

B.Batch normalization

C.L2 regularization

D.Dropout

E.Increase the learning rate

AnswersC, D

L2 regularization penalizes large weights.

Why this answer

Options C and D are correct. C is correct because L2 regularization penalizes large weights. D is correct because dropout randomly drops units, preventing co-adaptation.

A is wrong because increasing the number of layers increases model complexity, which can worsen overfitting. B is wrong because batch normalization mainly stabilizes and speeds up training; any regularizing effect is a side effect, not its primary purpose. E is wrong because increasing the learning rate may cause divergence and does not reduce overfitting.
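The sketch below shows dropout on a plain list of activations, using the common "inverted dropout" variant. Deep learning frameworks implement this internally; the function is a hypothetical illustration. During training, each unit is zeroed with probability p and the survivors are scaled by 1/(1-p). The expected activation is therefore unchanged, and inference can skip dropout entirely.

```python
import random


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # inference: pass activations through unchanged
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
out = dropout([1.0] * 10_000, 0.5, rng)
print(sum(out) / len(out))  # close to 1.0: expected activation is preserved
```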

579

MCQeasy

A data scientist is training a binary classification model on an imbalanced dataset where the positive class represents only 1% of the data. The model achieves 99% accuracy but fails to identify most positive cases. Which metric should the data scientist use to evaluate model performance?

A.R-squared

B.F1 score

C.Accuracy

D.RMSE

AnswerB

F1 score balances precision and recall, suitable for imbalanced data.

Why this answer

The F1 score is the harmonic mean of precision and recall, making it ideal for imbalanced datasets where accuracy is misleading. Since the model achieves 99% accuracy by simply predicting the majority class (negative), it fails to capture positive cases; F1 score penalizes this by balancing false positives and false negatives, providing a more truthful performance measure.
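A quick calculation reproduces the trap. The sketch uses a hypothetical helper and a toy dataset with 1% positives. A model that always predicts the negative class scores 99% accuracy but an F1 of 0. Following a common convention (the same default scikit-learn uses via `zero_division`), precision is treated as 0 when there are no positive predictions.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


y_true = [1] * 10 + [0] * 990   # 1% positive class
y_pred = [0] * 1000             # "always negative" model
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)
```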

Exam trap

The trap here is that candidates often default to accuracy as the primary metric, overlooking how imbalanced data can inflate accuracy while hiding poor positive class detection, which the F1 score directly addresses.

How to eliminate wrong answers

Option A is wrong because R-squared is a regression metric that measures the proportion of variance explained by the model, not applicable to binary classification. Option C is wrong because accuracy is misleading on imbalanced datasets; a model predicting all negatives achieves 99% accuracy but fails to identify any positives, so it does not reflect true performance. Option D is wrong because RMSE is a regression metric that measures the square root of the average squared differences between predicted and actual values, not suitable for binary classification outcomes.

580

MCQeasy

A company is building a binary classifier to predict customer churn. The dataset has 10,000 samples with 500 churners (5% positive class). After training a logistic regression model, the precision is 0.8 and recall is 0.2. Which metric should the data scientist focus on to improve the model's ability to identify churners while minimizing false positives?

A.Increase accuracy

B.Increase precision

C.Increase recall

D.Increase F1 score

AnswerC

Recall is low (0.2), so improving it will capture more churners.

Why this answer

Option C is correct because recall is very low (0.2), meaning the model misses most churners; improving recall will capture more of them. Option B is wrong because precision is already high (0.8); recall is the bottleneck.

Option A is wrong because accuracy is misleading under class imbalance (5% positives). Option D is wrong because, although F1 balances precision and recall, the primary problem here is low recall.
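Turning the stated rates into counts shows why recall is the problem. With 500 churners, recall 0.2 and precision 0.8, the model flags 125 customers, of whom 100 are real churners, and it misses 400 of the 500. The helper names below are hypothetical.

```python
def counts_from_rates(positives, recall, precision):
    """Recover (TP, FP, FN) from the number of positives, recall, and precision."""
    tp = round(recall * positives)                  # churners correctly flagged
    fp = round(tp * (1 - precision) / precision)    # from precision = tp / (tp + fp)
    fn = positives - tp                             # churners missed
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)


tp, fp, fn = counts_from_rates(500, recall=0.2, precision=0.8)
print(tp, fp, fn, f1_from_counts(tp, fp, fn))
```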

581

MCQmedium

A machine learning engineer is deploying a PyTorch model on SageMaker for real-time inference. The model requires GPU for low latency. Which instance type and configuration should the engineer choose?

A.Deploy to an ml.c5.4xlarge instance with SageMaker batch transform.

B.Deploy to an ml.m5.large instance with a SageMaker model endpoint.

C.Deploy to an ml.p3.2xlarge instance with a SageMaker endpoint.

D.Deploy to an ml.p3.2xlarge instance with SageMaker batch transform.

AnswerC

p3 provides GPU; endpoint enables real-time inference.

Why this answer

SageMaker real-time endpoints support GPU instances such as ml.p3.2xlarge. Option A is wrong because ml.c5.4xlarge is CPU-only and batch transform is not real-time. Option B is wrong because ml.m5.large is CPU-only.

Option D is wrong because, although ml.p3.2xlarge provides a GPU, batch transform performs offline inference; real-time inference requires an endpoint.
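The sketch below builds the request body for the boto3 SageMaker client's `create_endpoint_config` call, choosing the GPU instance type for option C. The config name, model name and variant name are hypothetical placeholders. The function only builds a dictionary, so it runs without AWS credentials.

```python
def endpoint_config_request(config_name, model_name, instance_type="ml.p3.2xlarge"):
    """Request body for boto3's SageMaker create_endpoint_config (names are placeholders)."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,  # GPU instance for low-latency inference
                "InitialVariantWeight": 1.0,    # share of traffic, not a speed setting
            }
        ],
    }


req = endpoint_config_request("pytorch-gpu-config", "my-pytorch-model")
print(req["ProductionVariants"][0]["InstanceType"])
```

With credentials configured, you would pass this to `boto3.client("sagemaker").create_endpoint_config(**req)` and then call `create_endpoint` with the same `EndpointConfigName`.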

582

MCQhard

A company uses Amazon SageMaker to train a custom TensorFlow model for image classification. The training job runs on a single ml.p3.2xlarge instance. The dataset contains 500,000 images stored in S3. The training time is too long (over 24 hours). The data scientist wants to reduce training time without changing the model architecture. The dataset is already in TFRecord format. The training script uses the default TensorFlow data pipeline. Which change will MOST significantly reduce training time?

A.Use SageMaker Pipe mode and increase the number of data files.

B.Use SageMaker's distributed data parallelism with multiple instances.

C.Switch the input mode from File to Pipe.

D.Optimize the data pipeline using tf.data.Dataset.prefetch and cache.

AnswerB

Distributed training across multiple GPUs significantly reduces wall-clock training time.

Why this answer

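Option B is correct. SageMaker's distributed data parallelism splits each global batch across GPUs on multiple instances. This reduces wall-clock training time roughly in proportion to the number of GPUs, minus gradient-synchronization overhead. Option A is wrong because Pipe mode with more data files only speeds up data delivery; it adds no compute.

Option C is wrong because switching from File mode to Pipe mode streams data but does not reduce computation. Option D is wrong because optimizing the data pipeline with prefetch and cache helps, but less than adding more compute.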

583

MCQhard

A media company uses SageMaker to train a neural network for content recommendation. The model uses embeddings for users and items. Training is slow and they want to reduce time. The dataset has 10 million users and 1 million items. They have a cluster of 8 p3.16xlarge instances. Which strategy is most likely to reduce training time?

A.Use data parallelism to replicate the model on each GPU and synchronize gradients

B.Reduce the embedding dimension from 256 to 64

C.Use SageMaker's model parallelism to split the embedding layers across GPUs

D.Use a smaller batch size to fit on each GPU

AnswerC

Model parallelism distributes large embedding tables across devices, reducing memory and enabling larger batches.

Why this answer

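Model parallelism is designed for large models with memory-intensive layers such as embedding tables. With 10 million users and 1 million items, data parallelism (Option A) would replicate the full embedding tables on every GPU. That consumes much of each GPU's memory and can require synchronizing very large gradient tensors. Splitting the embedding layers across GPUs avoids both problems and frees memory for larger batches. Option B would shrink the model but may reduce recommendation quality. Option D (smaller batches) generally makes training slower.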
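A back-of-the-envelope calculation shows why the embedding tables drive this choice. The sketch makes three assumptions:

- Embeddings are stored in float32.
- An Adam-style optimizer keeps weights, gradients and two moment estimates, roughly 4× the parameter memory if gradients are dense.
- Each p3.16xlarge has 8 V100 GPUs with 16 GB each.

Activations are ignored.

```python
def table_gib(rows: int, dim: int, bytes_per_value: int = 4) -> float:
    """Size of a dense embedding table in GiB (float32 by default)."""
    return rows * dim * bytes_per_value / 2**30


user_gib = table_gib(10_000_000, 256)  # user embeddings, ~9.5 GiB
item_gib = table_gib(1_000_000, 256)   # item embeddings, ~0.95 GiB
params_gib = user_gib + item_gib

# Assumption: weights + gradients + two Adam moments, about 4x parameter memory.
training_gib = 4 * params_gib
per_gpu_if_sharded = training_gib / 8  # split across the 8 GPUs of one instance

print(round(params_gib, 2), round(training_gib, 2), round(per_gpu_if_sharded, 2))
```

Under these assumptions the full training state is about 42 GiB, far more than one 16 GB GPU holds. Sharded across 8 GPUs, it is roughly 5.2 GiB each.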

584

MCQeasy

A company is using Amazon SageMaker to train a deep learning model. The training job is taking a long time, and the data scientist wants to reduce training time without sacrificing accuracy. Which technique should they use?

A.Use a larger instance type

B.Use a smaller instance type

C.Reduce the number of epochs

D.Use managed spot training with checkpointing

AnswerD

Spot instances are much cheaper, and checkpointing lets an interrupted job resume instead of restarting from scratch.

Why this answer

Option D is correct because managed spot training runs on spare EC2 capacity at a large discount. Checkpointing lets the job resume after an interruption rather than restart, so accuracy is unaffected and the savings can fund more compute. Option A is wrong because a larger instance type increases cost and may not reduce training time linearly. Option C is wrong because reducing epochs may sacrifice accuracy.

Option B is wrong because a smaller instance type is likely to increase training time.
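The checkpoint-and-resume pattern that makes Spot interruptions cheap looks roughly like this. It is a minimal sketch: the checkpoint is a hypothetical JSON file holding only the next epoch index, whereas a real script would also save model and optimizer state. `/opt/ml/checkpoints` is SageMaker's default local checkpoint path, which is synced to S3 when the estimator sets `checkpoint_s3_uri`. The demo uses a temporary directory instead.

```python
import json
import os
import tempfile

CHECKPOINT_DIR = "/opt/ml/checkpoints"  # SageMaker's default local checkpoint path


def load_start_epoch(ckpt_dir):
    path = os.path.join(ckpt_dir, "checkpoint.json")
    if not os.path.exists(path):
        return 0  # no checkpoint yet: start from scratch
    with open(path) as f:
        return json.load(f)["next_epoch"]


def save_checkpoint(ckpt_dir, next_epoch):
    os.makedirs(ckpt_dir, exist_ok=True)
    with open(os.path.join(ckpt_dir, "checkpoint.json"), "w") as f:
        json.dump({"next_epoch": next_epoch}, f)


def train(ckpt_dir=CHECKPOINT_DIR, total_epochs=5, stop_after=None):
    """Placeholder training loop that resumes from the last checkpoint."""
    ran = []
    for epoch in range(load_start_epoch(ckpt_dir), total_epochs):
        ran.append(epoch)  # a real script would train one epoch here
        save_checkpoint(ckpt_dir, epoch + 1)
        if stop_after is not None and len(ran) >= stop_after:
            break  # simulate a Spot interruption
    return ran


demo_dir = tempfile.mkdtemp()
first_run = train(demo_dir, total_epochs=5, stop_after=3)  # interrupted after 3 epochs
second_run = train(demo_dir, total_epochs=5)               # resumes where it left off
print(first_run, second_run)
```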

585

Multi-Selecthard

A machine learning team is deploying a real-time inference endpoint for a fraud detection model using Amazon SageMaker. The model is a LightGBM classifier trained on 1 GB of tabular data. The endpoint must respond within 100 ms for 99% of requests, with a throughput of 10 requests per second. During load testing, the team observes that the 99th percentile latency is 250 ms and the endpoint CPU utilization is consistently above 90%. The team has already selected an ml.c5.xlarge instance with auto scaling enabled. Which combination of actions should the team take to meet the latency requirement? (Choose 3.)

Select 3 answers

A.Upgrade the instance type to ml.c5.2xlarge to increase CPU resources per instance.

B.Reduce the number of trees in the LightGBM model to decrease inference time.

C.Enable SageMaker's data compression for endpoint input payloads.

D.Switch to using SageMaker Batch Transform instead of a real-time endpoint.

AnswersA, B, C

More CPU reduces per-request processing time, lowering latency.

Why this answer

Option A (switching to ml.c5.2xlarge) provides more CPU capacity per instance, reducing per-request latency. Option C (enabling SageMaker's data compression for input payloads) reduces network transfer time and I/O overhead. Option D (using batch transform instead of a real-time endpoint) is a fundamental change in architecture that would not meet the real-time requirement.

Option B (reducing the number of trees in LightGBM) directly reduces inference computation time. Adding more instances is already handled by auto scaling, but that alone may not reduce per-request latency if each instance is saturated.
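The requirement is stated as a 99th-percentile latency, which is a tail metric. The sketch below uses the nearest-rank definition, one of several common percentile definitions, with hypothetical latencies. If just 2% of requests are slow, the p99 is set by those slow requests even though the median looks healthy.

```python
def percentile_nearest_rank(latencies_ms, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(latencies_ms)
    rank = -(-p * len(ordered) // 100)  # ceil(p * n / 100) using integer math
    return ordered[max(rank, 1) - 1]


# Hypothetical load-test sample: 98 fast requests and 2 slow ones.
latencies = [40] * 98 + [250] * 2
print(percentile_nearest_rank(latencies, 50), percentile_nearest_rank(latencies, 99))
```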

586

MCQeasy

A data scientist is using Amazon SageMaker to train a linear regression model. The training data contains 100 features and 1 million rows. The scientist notices that the model is overfitting, with training R² of 0.99 and validation R² of 0.65. The scientist has already tried adding L2 regularization and reducing the number of features. Which additional technique should the scientist try to reduce overfitting?

A.Increase the amount of training data

B.Increase the batch size

C.Increase the learning rate

D.Add more features

AnswerA

More data helps the model generalize better.

Why this answer

Option A is correct. Adding more training data can reduce overfitting by giving the model a more representative sample. Option D is wrong because adding features increases model complexity and would worsen overfitting.

Option C is wrong because increasing the learning rate may cause instability without addressing overfitting. Option B is wrong because increasing the batch size does not address the gap between training and validation R².

587

MCQeasy

A data scientist is using SageMaker to train a linear learner algorithm. After training, the evaluation shows that the model has high bias. Which action is most likely to reduce bias?

A.Increase the L2 regularization strength

B.Reduce the amount of training data

C.Add feature crosses for categorical variables

D.Remove some features that have low variance

AnswerC

Adding feature crosses increases model capacity to capture interactions, reducing bias.

Why this answer

High bias indicates that the model is underfitting the data, meaning it is too simple to capture the underlying patterns. Adding feature crosses for categorical variables creates interaction features that allow the linear learner to model non-linear relationships, increasing model complexity and reducing bias. This is a standard technique in linear models to address underfitting without switching to a non-linear algorithm.
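A feature cross is a new categorical column whose values are combinations of two existing columns. Once it is one-hot encoded, the linear learner gets a separate weight for each combination, which lets it model interactions. The sketch below is a hypothetical preprocessing step you would run before training. The column names and the `|` separator are illustrative choices.

```python
def add_cross(rows, a, b):
    """Return copies of rows with an extra crossed categorical feature 'a_x_b'."""
    crossed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[f"{a}_x_{b}"] = f"{row[a]}|{row[b]}"
        crossed.append(new_row)
    return crossed


rows = [{"device": "mobile", "region": "EU"}, {"device": "desktop", "region": "US"}]
crossed = add_cross(rows, "device", "region")
print(crossed[0])
```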

Exam trap

The trap here is that candidates often confuse bias with variance and incorrectly choose regularization (Option A) to fix underfitting, when regularization actually increases bias and is used to combat overfitting (high variance).

How to eliminate wrong answers

Option A is wrong because increasing L2 regularization strength penalizes large weights, which simplifies the model further and increases bias, not reduces it. Option B is wrong because reducing the amount of training data typically worsens underfitting by providing fewer examples for the model to learn from, increasing bias. Option D is wrong because removing low-variance features reduces the information available to the model, which can increase bias by discarding potentially useful signals.

588

MCQmedium

A data science team is using Amazon SageMaker to train a deep learning model for object detection using the built-in SSD algorithm. The dataset consists of 100,000 labeled images stored in a SageMaker Pipe Mode input. The training job uses a single ml.p3.2xlarge instance. After 2 hours, the training job fails with the error 'ResourceLimitExceeded: The account-level service limit for ml.p3.2xlarge for training job usage is 1. Contact AWS Support to request a limit increase'. However, the team has already submitted a limit increase request and it was approved for 5 instances. What is the most likely cause of the error?

A.The instance is running out of GPU memory

B.The built-in SSD algorithm requires a GPU instance type with at least 16 GB of GPU memory

C.The service limit increase has not yet been applied to the account in the current region

D.The IAM role does not have permission to access the S3 bucket for model artifacts

AnswerC

Limit increases are region-specific and may take some time to become effective.

Why this answer

Option C is correct because service limit increases apply per region and can take time to become effective. The approved increase may therefore not yet apply in the region where the job runs. Option A (GPU memory) would produce a different error, typically out of memory. Option D (S3 access) would produce an AccessDenied error.

Option B (an SSD instance requirement) is unrelated to an account-level service limit error.

589

MCQmedium

A data scientist is using Amazon SageMaker to train a deep learning model on a large dataset stored in S3. The training job is taking too long. The data scientist wants to reduce training time without changing the model architecture. Which action should they take?

A.Use Pipe mode for data input

B.Use a smaller instance type

C.Increase the number of epochs

D.Decrease the batch size

AnswerA

Pipe mode streams data, reducing download time.

Why this answer

Pipe mode streams data directly from S3 instead of first downloading the whole dataset to the instance's local storage, reducing I/O time. Option B is wrong because a smaller instance may increase training time. Option D is wrong because reducing the batch size increases the number of training steps.

Option C is wrong because increasing epochs increases training time.

590

MCQmedium

A training job fails with the error shown. The training script expects a file named 'train.csv' in the 'training' channel. What is the most likely cause?

A.The 'train.csv' file is located inside a subfolder within 's3://my-bucket/data/', and the script expects it directly in the channel path.

B.The S3 bucket policy denies access to the 'train.csv' file.

C.The channel name in the input data configuration does not match the script's expected channel name.

D.The training script has a bug that prevents it from reading the file.

AnswerA

SageMaker downloads the entire S3 prefix; if the file is nested, it may not be at the expected location.

Why this answer

The error indicates that the training script cannot find 'train.csv' in the expected location. When SageMaker copies data from an S3 channel path (e.g., 's3://my-bucket/data/') to the training instance, it places the contents of that S3 prefix directly into the channel directory (e.g., '/opt/ml/input/data/training/'). If the CSV file is inside a subfolder (e.g., 's3://my-bucket/data/subfolder/train.csv'), the script will not find it at the top level of the channel path, causing a 'file not found' error.
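The sketch below shows how a training script can locate its channel directory and check for a nested file. In SageMaker framework containers, the training toolkit exposes each channel's local path as `SM_CHANNEL_<NAME>`. The helper names are hypothetical. The cleaner fix is to place `train.csv` directly under the channel's S3 prefix, or to point the channel at the subfolder prefix; the recursive search is only a defensive fallback.

```python
import os
import tempfile


def channel_dir(channel: str = "training") -> str:
    """Local directory for a SageMaker input channel, with the standard fallback path."""
    return os.environ.get(f"SM_CHANNEL_{channel.upper()}", f"/opt/ml/input/data/{channel}")


def find_channel_file(directory: str, filename: str) -> str:
    """Return the path to filename, checking the channel root first, then subfolders."""
    direct = os.path.join(directory, filename)
    if os.path.isfile(direct):
        return direct
    for root, _dirs, files in os.walk(directory):
        if filename in files:
            return os.path.join(root, filename)
    raise FileNotFoundError(f"{filename} not found under {directory}")


# Demo: recreate the failing layout locally (train.csv nested in a subfolder).
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "subfolder"))
open(os.path.join(demo, "subfolder", "train.csv"), "w").close()
found = find_channel_file(demo, "train.csv")
print(found)
```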

Exam trap

AWS often tests the distinction between S3 prefix behavior and file location expectations. The trap catches candidates who assume SageMaker automatically searches subdirectories or flattens the S3 structure.

How to eliminate wrong answers

Option B is wrong because an S3 bucket policy denying access would produce a different error (e.g., 'AccessDenied' or '403 Forbidden'), not a 'file not found' error from the training script. Option C is wrong because the error message does not mention a channel name mismatch; such a mismatch would cause SageMaker to fail to mount the channel, resulting in a different error during the job setup phase. Option D is wrong because the error is specifically about a missing file, not a runtime bug in the script's reading logic; a bug would typically produce a Python traceback or parsing error, not a 'file not found' error.

591

MCQmedium

A company is using Amazon SageMaker to deploy a model for real-time inference. The model has a latency requirement of less than 100 milliseconds. During testing, the latency is around 150 milliseconds. Which action can most likely reduce the latency to meet the requirement?

A.Reduce the batch size for inference.

B.Enable data capture for the endpoint.

C.Increase the initial variant weight for the production variant.

D.Use a larger instance type for the endpoint.

AnswerD

A larger instance type provides more compute resources, reducing inference latency.

Why this answer

Option D is correct: a larger instance type provides more compute resources, which reduces per-request inference latency, at the cost of a higher hourly price.

Option A is wrong because a real-time endpoint typically processes one request at a time, so there is usually no inference batch to shrink. Reducing batch size helps only if the endpoint batches multiple records per request. Option B is wrong because enabling data capture adds overhead and can increase latency. Option C is wrong because the production variant weight controls how traffic is routed between variants, not how fast each request is served.

592

MCQhard

A company is deploying a machine learning model for real-time fraud detection. The model must have low latency (under 100 ms) and high throughput. The data scientist trains a gradient boosting model and deploys it to a SageMaker endpoint with a single ml.c5.xlarge instance. During load testing, the endpoint exceeds the latency threshold. Which change is MOST likely to reduce latency?

A.Replace the model with a simpler model, such as logistic regression

B.Use a larger instance type, such as ml.c5.4xlarge

C.Switch to batch transform for inference

D.Enable automatic scaling on the endpoint

AnswerA

A simpler model has lower inference latency, meeting the 100 ms requirement.

Why this answer

Option A is correct because replacing the gradient boosting model with a simpler model like logistic regression reduces the computational complexity per inference. Gradient boosting involves traversing many decision trees, each requiring multiple conditional checks and arithmetic operations, while logistic regression is a single linear transformation. This directly lowers CPU utilization per request, reducing latency under the same instance resources.

Exam trap

The trap here is that candidates often assume scaling up instance size or adding automatic scaling will fix latency, but latency is a per-request metric that depends on model complexity, not just infrastructure parallelism or throughput.

How to eliminate wrong answers

Option B is wrong because using a larger instance type (ml.c5.4xlarge) increases available vCPUs and memory, but the bottleneck is likely per-request computation time, not parallelism. A larger instance may improve throughput but does not guarantee that per-request latency drops below 100 ms if the model itself is computationally heavy. Option C is wrong because batch transform is designed for offline, asynchronous inference on large datasets, not real-time low-latency serving. Switching to batch transform would increase latency dramatically (minutes vs milliseconds) and violate the real-time requirement. Option D is wrong because automatic scaling adjusts the number of instances based on traffic. That helps throughput under varying load but does not reduce the latency of a single inference: scaling adds more instances behind the endpoint, but each individual request still faces the same model computation time.

593

MCQeasy

A data scientist needs to evaluate a binary classification model. The dataset is balanced. Which metric is most appropriate to compare model performance?

A.Recall

B.F1 score

C.Precision

D.Accuracy

AnswerD

For balanced classes, accuracy is a straightforward metric.

Why this answer

For a balanced binary classification dataset, accuracy is the most appropriate metric because it directly measures the proportion of correct predictions (true positives and true negatives) out of all predictions. Since the class distribution is equal, accuracy is not misleadingly high due to class imbalance, making it a reliable and straightforward measure of overall model performance.

Exam trap

AWS often tests the misconception that F1 score or precision-recall metrics are always superior, but for balanced datasets, accuracy is the simplest and most appropriate metric, and candidates may overlook this by defaulting to imbalance-focused metrics.

How to eliminate wrong answers

Option A is wrong because recall focuses only on true positives relative to actual positives, ignoring true negatives and thus not capturing overall performance on a balanced dataset. Option B is wrong because the F1 score is the harmonic mean of precision and recall, which is more useful when there is class imbalance; for a balanced dataset, accuracy is simpler and equally informative. Option C is wrong because precision only considers true positives relative to predicted positives, neglecting true negatives and overall correctness, which is insufficient for balanced data.

594

Multi-Selectmedium

Which TWO options are best practices for training machine learning models using SageMaker? (Choose TWO.)

Select 2 answers

A.Train the final model on the combined training and test sets to maximize data usage

B.Use incremental training when you have new data that is similar to the original training data

C.Use SageMaker Managed Spot Training to reduce training costs

D.Always use the largest possible instance type to minimize training time

E.Always enable checkpointing to save the model after every epoch

AnswersB, C

Incremental training saves time by starting from an existing model.

Why this answer

Option B is correct because SageMaker's incremental training allows you to continue training an existing model with new data that shares the same schema and feature space, without retraining from scratch. This is a best practice when you have a steady stream of similar data, as it saves time and compute resources while preserving previously learned patterns.

Exam trap

AWS often tests the misconception that 'more data is always better' (Option A) or that 'bigger instances are always faster' (Option D). In reality, best practices prioritize data integrity, cost efficiency and appropriate resource scaling.

595

MCQmedium

A company is using Amazon SageMaker to train a model. The training job is using a large dataset stored in S3. The data scientist notices that the training job is spending a significant amount of time reading data from S3. Which approach would BEST reduce data loading time?

A.Use the Pipe mode input for the training data

B.Use the File mode input with a larger instance

C.Use a larger training instance with more CPU

D.Increase the batch size to reduce the number of batches

AnswerA

Pipe mode streams data directly, reducing I/O.

Why this answer

Pipe mode streams data directly from S3 into the training algorithm without first downloading it to the training instance's local storage. This eliminates the I/O bottleneck of writing large datasets to disk, significantly reducing data loading time compared to File mode, which downloads the entire dataset before training begins.

Exam trap

The trap here is that candidates often confuse 'batch size' with data loading performance, or assume that more CPU/instance size will speed up S3 reads, when in fact the bottleneck is the network and disk I/O, not compute.

How to eliminate wrong answers

Option B is wrong because File mode requires the entire dataset to be downloaded to the instance's local disk before training starts, which adds significant latency and does not address the root cause of slow S3 reads. Option C is wrong because a larger instance with more CPU does not reduce the time spent reading data from S3; the bottleneck is network I/O and S3 request latency, not compute capacity. Option D is wrong because increasing batch size only affects the number of forward/backward passes per epoch, not the time spent loading data from S3; the data must still be read in its entirety.

596

MCQ · hard

A data scientist is using SageMaker to train an XGBoost model for regression. The training data contains categorical features with high cardinality (e.g., zip code with over 10,000 unique values). Which feature engineering approach is MOST appropriate to avoid overfitting while preserving predictive power?

A. Use target encoding with smoothing

B. One-hot encode the categorical features

C. Apply frequency encoding based on category occurrence

D. Label encode the categorical features

Answer: A

Target encoding captures the category-target relationship, and smoothing regularizes it to avoid overfitting.

Why this answer

Target encoding with smoothing is the most appropriate approach because it replaces each high-cardinality category with the mean of the target variable for that category, regularized by a smoothing factor that pulls estimates toward the global mean. This preserves predictive power by capturing the relationship between the category and the target while preventing overfitting on rare categories that have few samples. In SageMaker XGBoost, this avoids the curse of dimensionality from one-hot encoding and the arbitrary ordering from label encoding.
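A minimal pure-Python sketch of target encoding with smoothing, using the m-estimate form (one common formulation; libraries differ in their smoothing schedules). The function names and the zip-code data are hypothetical. In practice the encoder should be fit on training folds only, so that a row's own target does not leak into its encoding.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def fit_target_encoder(categories: Sequence[Hashable], targets: Sequence[float],
                       smoothing: float = 10.0) -> Tuple[Dict[Hashable, float], float]:
    """Learn smoothed per-category target means.

    m-estimate smoothing:
        enc(c) = (n_c * mean_c + m * global_mean) / (n_c + m)
    where n_c is the category count and m = `smoothing`. Rare categories
    (small n_c) are pulled toward the global mean.
    """
    if not targets or len(categories) != len(targets):
        raise ValueError("need equally sized, non-empty inputs")
    global_mean = sum(targets) / len(targets)
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y        # running sum equals n_c * mean_c
        counts[cat] += 1
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def transform(categories: Sequence[Hashable], encoding: Dict[Hashable, float],
              global_mean: float) -> List[float]:
    """Map categories to encoded values; unseen categories get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


zips = ["10001", "10001", "10001", "10001", "94105"]
prices = [100.0, 100.0, 100.0, 100.0, 500.0]
enc, prior = fit_target_encoder(zips, prices, smoothing=1.0)
print(transform(["10001", "94105", "60601"], enc, prior))  # [116.0, 340.0, 180.0]
```

With `smoothing=1.0`, the zip code seen only once (raw mean 500) is pulled to 340, halfway toward the global mean of 180, while the zip code seen four times stays close to its raw mean of 100. The whole feature remains a single numeric column regardless of cardinality.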

Exam trap

The trap here is that candidates often default to one-hot encoding for categorical features, not realizing that high cardinality makes it computationally expensive and prone to overfitting, while target encoding with smoothing offers a compact and powerful alternative.

How to eliminate wrong answers

Option B is wrong because one-hot encoding a feature with over 10,000 unique values would create over 10,000 binary columns, drastically increasing dimensionality and memory usage, which leads to overfitting and poor generalization in tree-based models like XGBoost. Option C is wrong because frequency encoding replaces categories with their occurrence counts, which loses the relationship between the category and the target variable, often reducing predictive power and introducing bias toward frequent categories. Option D is wrong because label encoding assigns arbitrary integer labels to categories, which implies an ordinal relationship that does not exist, misleading the XGBoost model into treating the feature as ordered and potentially causing poor splits.

597

MCQ · hard

A machine learning engineer is deploying a model for real-time inference using Amazon SageMaker. The model is a large ensemble that requires 8 GB of memory and 4 vCPUs. The expected traffic is 100 requests per second with a 200 ms latency requirement. Which instance configuration should they choose?

A. ml.t2.medium (2 vCPU, 4 GB)

B. ml.c5.2xlarge (8 vCPU, 16 GB)

C. ml.p3.2xlarge (8 vCPU, 61 GB, 1 GPU)

D. ml.m5.large (2 vCPU, 8 GB)

Answer: B

ml.c5.2xlarge provides enough memory and vCPUs for the workload without paying for an unneeded GPU.

Why this answer

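Option B is correct because ml.c5.2xlarge has 8 vCPUs and 16 GB of memory, which covers the model's 4 vCPU and 8 GB requirement with headroom and is cost-effective for the workload. Option D is wrong because ml.m5.large has only 2 vCPUs (the model needs 4) and 8 GB of memory, leaving no headroom above the model's 8 GB footprint, so it may not handle the throughput. Option C is wrong because ml.p3.2xlarge is a GPU-optimized instance; the stated requirements are memory and vCPUs, so the GPU is overkill and adds cost.

Option A is wrong because ml.t2.medium has only 4 GB of memory, half of what the model requires, and only 2 vCPUs.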

598

MCQ · hard

A machine learning engineer is training a model using Amazon SageMaker. The 10 TB training dataset is stored in S3. The engineer wants to use Pipe input mode to stream data from S3. Which algorithm supports Pipe mode?

A. Amazon SageMaker Linear Learner

B. Amazon SageMaker K-Means

C. Amazon SageMaker XGBoost

D. Amazon SageMaker PCA

Answer: C

SageMaker XGBoost supports Pipe input mode.

Why this answer

Option C is correct because Amazon SageMaker XGBoost supports Pipe mode for streaming. Option A is wrong because Linear Learner does not support Pipe mode. Option B is wrong because K-Means does not support Pipe mode.

Option D is wrong because PCA does not support Pipe mode.

599

Multi-Select · hard

A company uses Amazon SageMaker to train a deep learning model using TensorFlow. The training job is failing with an 'OutOfMemory' error. The instance type is ml.p3.2xlarge with 16 GB GPU memory. The model has 10 million parameters. Which THREE actions should be taken to resolve the memory issue? (Choose THREE.)

Select 3 answers

A. Reduce the batch size

B. Increase the number of epochs

C. Enable mixed precision training

D. Increase the batch size

E. Use gradient accumulation

Answers: A, C, E

Smaller batch size directly reduces memory usage.

Why this answer

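Options A, C, and E are correct. Reducing the batch size directly decreases the activation memory needed per step. Mixed precision training runs most operations and stores activations in 16-bit floating point, reducing the memory footprint. Gradient accumulation sums gradients over several small micro-batches before each weight update, simulating a larger batch size without the memory cost of processing it all at once.

Option D is wrong because increasing the batch size makes the memory issue worse. Option B is wrong because increasing the number of epochs does not change memory use per step.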
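Gradient accumulation is framework-independent arithmetic: split a batch into micro-batches, sum their gradients, and apply one update. A toy pure-Python sketch with a one-weight linear model and mean squared error (hypothetical, not TensorFlow code) shows that averaging equal-sized micro-batch gradients reproduces the full-batch gradient.

```python
from typing import Sequence


def mse_grad(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of mean squared error for the toy model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                     micro_batch: int) -> float:
    """Average gradients over equal-sized micro-batches.

    Only one micro-batch is processed at a time (in a real framework, only its
    activations would be held in memory), yet the result equals the full-batch
    gradient because every micro-batch has the same size.
    """
    if (micro_batch <= 0 or not xs or len(xs) != len(ys)
            or len(xs) % micro_batch != 0):
        raise ValueError("this sketch assumes a non-empty batch that splits evenly")
    n_steps = len(xs) // micro_batch
    total = 0.0
    for i in range(n_steps):
        lo, hi = i * micro_batch, (i + 1) * micro_batch
        total += mse_grad(w, xs[lo:hi], ys[lo:hi])  # accumulate, do not update yet
    return total / n_steps                          # one update per full batch


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5
full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch=2)
print(abs(full - accum) < 1e-9)  # True
w_next = w - 0.01 * accum        # single SGD step using the accumulated gradient
```

Combined with a smaller per-step batch (Option A), this keeps the effective batch size, and therefore the optimization behavior, close to the original while lowering peak memory.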

600

Multi-Select · medium

A machine learning engineer is training a deep learning model on Amazon SageMaker. The training job is taking a long time. Which THREE actions can reduce training time? (Choose THREE.)

Select 3 answers

A. Use SageMaker managed spot training

B. Use SageMaker managed warm pools to reuse the training environment

C. Use SageMaker distributed training (data parallelism)

D. Use a smaller batch size

E. Use SageMaker hyperparameter tuning jobs

Answers: A, B, C

Spot instances can reduce cost and training time if interruptions are tolerated.

Why this answer

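Managed spot training, warm pools, and distributed training can reduce training time. Warm pools keep provisioned infrastructure alive between jobs, so successive jobs skip most of the instance startup wait. Data-parallel distributed training splits each training batch across multiple GPUs or instances so they process data in parallel. Option D is wrong because a smaller batch size means more optimizer steps per epoch and typically lower hardware utilization. Option E is wrong because hyperparameter tuning jobs launch many training jobs to search for better hyperparameters; they do not speed up an individual job.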
