CCNA Machine Learning Deep Learning Questions

31 of 106 questions · Page 2/2 · Machine Learning Deep Learning topic · Answers revealed

76
MCQhard

A data scientist is using a gradient boosting model (XGBoost) for a regression task and observes that the model's performance on the training set is much better than on the test set. Which hyperparameter tuning strategy would most effectively reduce overfitting?

A.Increase the number of boosting rounds
B.Increase the learning rate
C.Reduce the maximum depth of trees
D.Subsample less than 1.0
AnswerC

Shallow trees are less complex and generalize better, reducing overfitting.

Why this answer

Option A (Increase the number of boosting rounds) increases model complexity. Option B (Increase the learning rate) makes each tree more influential, increasing overfitting. Option D (Subsample less than 1.0) introduces randomness but is less direct than tree depth.

Option C (Reduce the maximum depth of trees) limits tree complexity, reducing overfitting.

77
MCQmedium

An organization needs to classify customer emails into categories. They have labeled data for some categories but not all. Which approach should they use?

A.Unsupervised clustering then labeling
B.Supervised learning for all categories
C.Reinforcement learning
D.Semi-supervised learning
AnswerD

Correct: Semi-supervised learning leverages both labeled and unlabeled data.

Why this answer

Option D is correct because semi-supervised learning uses a small amount of labeled data along with a large amount of unlabeled data. Options A, B, and C are incorrect: unsupervised clustering would group emails without category labels, supervised learning requires labels for all categories, and reinforcement learning is for sequential decision making.

78
MCQmedium

A machine learning team is deploying a model that predicts customer churn. They notice that the model's predictions are highly sensitive to small changes in input features, leading to inconsistent outputs. Which technique should the team apply to improve model stability?

A.Increase learning rate
B.Feature scaling
C.Regularization
D.Cross-validation
AnswerC

Regularization adds a penalty for large weights, reducing overfitting and sensitivity to input variations.

Why this answer

Regularization (Option C) is the correct technique because it adds a penalty term to the loss function (e.g., L1 or L2 regularization), which constrains the model's weights. This reduces variance and prevents overfitting to noise in the training data, directly addressing the high sensitivity to small input changes (brittleness). By shrinking coefficients, regularization forces the model to learn more general patterns, improving stability and consistency in predictions.

Exam trap

CompTIA often tests the misconception that feature scaling alone can fix model instability, but scaling only normalizes inputs and does not penalize large weights, which is the root cause of sensitivity to small input changes.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate makes gradient descent steps larger, which can cause the model to overshoot minima and increase instability, not reduce sensitivity to input changes. Option B is wrong because feature scaling normalizes input ranges (e.g., via standardization or min-max scaling) to help gradient descent converge faster, but it does not address model variance or overfitting that causes prediction instability. Option D is wrong because cross-validation is a technique for evaluating model performance and tuning hyperparameters, not a method to directly improve model stability or reduce sensitivity to input perturbations.
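
As a rough illustration of the penalty term described above, the sketch below adds an L2 penalty to a mean-squared-error loss for a linear model. The function names, the `lam` strength parameter, and the toy data are assumptions made for this example; they do not come from the question.

```python
def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def l2_penalized_loss(weights, rows, targets, lam):
    """Mean squared error plus lam * (sum of squared weights)."""
    mse = sum((predict(weights, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty


# Hypothetical toy data: one row, so many weight vectors fit it exactly.
rows = [[1.0, 1.0]]
targets = [2.0]

large_weights = [10.0, -8.0]   # fits the row, but with large coefficients
small_weights = [1.0, 1.0]     # fits the row with small coefficients

loss_large = l2_penalized_loss(large_weights, rows, targets, lam=0.01)
loss_small = l2_penalized_loss(small_weights, rows, targets, lam=0.01)

# For a linear model, nudging one input by `nudge` moves the output by
# weight * nudge, so smaller weights mean smaller output swings.
nudge = 0.1
swing_large = abs(large_weights[0] * nudge)
swing_small = abs(small_weights[0] * nudge)
```

Both weight vectors fit the single training row, but the penalized loss prefers the smaller one, which is also the one whose output moves least when an input is perturbed.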

79
MCQeasy

A data scientist is training a binary classification model to detect fraudulent transactions. The dataset is highly imbalanced with only 1% fraud cases. Which technique is most appropriate to address the class imbalance?

A.Use a linear regression model
B.Oversample the minority class
C.Undersample the majority class
D.Increase the learning rate
AnswerB

Oversampling adds minority-class instances, either by duplicating existing ones or by synthesizing new ones, helping the model learn better decision boundaries.

Why this answer

Oversampling the minority class (e.g., SMOTE) creates synthetic samples to balance the dataset, which is a common and effective approach for imbalanced classification.
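
A minimal sketch of the simplest form of oversampling follows. Note that this is plain random oversampling (duplicating minority rows with replacement), not SMOTE, which interpolates new synthetic points. The function name, the seed, and the toy data are assumptions for illustration, and the sketch assumes a binary problem.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate minority rows (sampled with replacement) until both classes match."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    extra_needed = max(majority_count - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(extra_needed)]
    return rows + extra, labels + [minority_label] * len(extra)


# Hypothetical data: 99 legitimate rows (label 0) and 1 fraud row (label 1).
rows = [[float(i)] for i in range(100)]
labels = [0] * 99 + [1]

balanced_rows, balanced_labels = random_oversample(rows, labels, minority_label=1)
fraud_count = balanced_labels.count(1)
legit_count = balanced_labels.count(0)
```

In practice, resampling is usually applied only to the training split so that the evaluation data keeps its original class distribution.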

80
MCQmedium

Based on the exhibit, what is the most likely issue with the trained model?

A.Overfitting because training accuracy is much higher than validation accuracy
B.Data leakage artificially inflating training accuracy
C.Vanishing gradients causing no learning
D.Underfitting due to insufficient epochs
AnswerA

Training accuracy (99.32%) is significantly higher than validation accuracy (78.9%), a classic sign of overfitting.

Why this answer

The training accuracy reaches 99.32% while the validation accuracy plateaus around 78.9%, indicating overfitting. The model has memorized the training data but fails to generalize. Underfitting would show poor performance on both.

Vanishing gradients would cause loss not to decrease significantly. Data leakage would cause unusually high performance on both sets, but here validation accuracy is far lower.

81
MCQmedium

Refer to the exhibit. A developer is using the above configuration for a multi-class classification task. The model performs well on training data but poorly on validation data. Which modification could help?

A.Remove dropout
B.Increase the dropout rate
C.Add L2 regularization to the dense layers
D.Increase the learning rate
AnswerC

L2 regularization adds a penalty on weights, which can reduce overfitting.

Why this answer

Option C is correct because adding L2 regularization to the dense layers penalizes large weights, reducing overfitting. Option B is incorrect because increasing dropout from 0.5 may hurt performance; it's already a reasonable value. Option D is incorrect because increasing the learning rate could destabilize training.

Option A is incorrect because removing dropout would likely increase overfitting.

82
MCQmedium

A company is deploying a machine learning model to predict customer churn. The dataset is highly imbalanced (95% non-churn, 5% churn). The model achieves 96% accuracy, but the F1-score for the churn class is only 0.2. Which metric should the team prioritize to evaluate model performance for this business problem?

A.F1-score
B.Accuracy
C.Log loss
D.AUC-ROC
AnswerA

F1-score balances precision and recall, suitable for imbalanced data.

Why this answer

In a highly imbalanced dataset (95% non-churn, 5% churn), accuracy is misleading because a model can achieve about 95% accuracy by simply predicting the majority class for all instances, so 96% is barely better than that trivial baseline. The F1-score, which is the harmonic mean of precision and recall, specifically measures the model's performance on the minority (churn) class. A low F1-score of 0.2 indicates the model fails to correctly identify churners, which is the critical business outcome, making F1-score the correct metric to prioritize.

Exam trap

CompTIA often tests the misconception that high accuracy is always good, especially in imbalanced datasets, leading candidates to overlook the F1-score as the appropriate metric for minority class performance.

How to eliminate wrong answers

Option B is wrong because accuracy is a poor metric for imbalanced datasets; a model can achieve high accuracy by always predicting the majority class, which does not reflect its ability to detect the minority churn class. Option C is wrong because log loss measures the confidence of probability predictions across all classes, but it does not directly address the imbalance or provide a clear threshold-based evaluation of the minority class performance like F1-score does. Option D is wrong because AUC-ROC evaluates the model's ability to rank positive and negative instances, but it can be overly optimistic in highly imbalanced scenarios and does not directly reflect precision and recall for the minority class, which are critical for churn prediction.
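
The gap between accuracy and F1 can be checked by hand. The sketch below scores a hypothetical classifier that always predicts "non-churn" on 100 customers with the 95/5 split from the question; the helper functions are written out for illustration rather than taken from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 95 non-churners (0) and 5 churners (1); the model never predicts churn.
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100

acc = accuracy(y_true, always_majority)   # 0.95
f1 = f1_score(y_true, always_majority)    # 0.0
```

The trivial model looks strong on accuracy while finding no churners at all, which is exactly the failure the F1-score exposes.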

83
MCQhard

A financial institution uses a deep learning model for fraud detection. The model is a feedforward neural network with three hidden layers. It was trained on a balanced dataset of 100,000 transactions. During deployment, the model achieves high accuracy on the test set but the fraud detection rate (true positive rate) is only 40% while the false positive rate is 0.1%. The business requires a true positive rate of at least 80%. Which of the following actions is most likely to achieve the required true positive rate while minimizing the increase in false positives?

A.Increase the number of hidden layers to five to capture more complex patterns
B.Use synthetic minority oversampling (SMOTE) to rebalance the training set
C.Change the threshold for classifying a transaction as fraud from the default 0.5 to a lower value
D.Add L2 regularization to reduce overfitting
AnswerC

Lowering threshold increases TPR; the optimal threshold can be chosen based on the precision-recall curve.

Why this answer

Option A (more hidden layers) may not improve recall and could overfit. Option D (L2 regularization) would increase bias, likely lowering TPR. Option B (SMOTE) rebalances training data, but the model was already trained on balanced data; threshold adjustment is more direct.

Option C (lower decision threshold) directly increases TPR at the cost of FPR; the threshold can be tuned to achieve 80% TPR with minimal FPR increase.
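
To make the threshold trade-off concrete, the sketch below computes TPR and FPR at two cut-offs. The scores and labels are invented for illustration, and `rates_at_threshold` is a hypothetical helper, not a library function.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at a score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Hypothetical model scores: first five are fraud (1), last five legitimate (0).
scores = [0.9, 0.6, 0.4, 0.35, 0.2, 0.45, 0.1, 0.05, 0.3, 0.15]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

tpr_default, fpr_default = rates_at_threshold(scores, labels, 0.5)   # (0.4, 0.0)
tpr_lower, fpr_lower = rates_at_threshold(scores, labels, 0.33)      # (0.8, 0.2)
```

Lowering the cut-off doubles the detection rate on this toy data but also lets one legitimate transaction through as a false positive, so the threshold is normally chosen on a validation set against the business target.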

84
MCQmedium

Based on the exhibit, what is the likely problem with the model?

A.Batch size too small
B.Overfitting
C.Learning rate too high
D.Underfitting
AnswerB

Correct: Training loss decreases but validation loss increases, classic overfitting.

Why this answer

Option B is correct because the training loss keeps decreasing while validation loss increases after a point, indicating overfitting. Options A, C, and D are incorrect: a batch size that is too small might affect convergence but would not produce this pattern, a learning rate that is too high would cause the loss to oscillate, and underfitting would show high training loss.
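
The exhibit is not reproduced here, so the loss values below are invented purely to show the pattern. The sketch finds the epoch where validation loss bottoms out and checks that the two curves diverge afterwards; `best_epoch` is a hypothetical helper.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# Hypothetical per-epoch losses.
train_losses = [0.90, 0.60, 0.40, 0.30, 0.20, 0.12]
val_losses = [0.95, 0.70, 0.55, 0.58, 0.66, 0.80]

turn = best_epoch(val_losses)

# After the turning point, validation loss rises while training loss keeps falling.
val_rising = all(val_losses[i] < val_losses[i + 1] for i in range(turn, len(val_losses) - 1))
train_falling = all(train_losses[i] > train_losses[i + 1] for i in range(len(train_losses) - 1))
overfitting_pattern = val_rising and train_falling
```

The turning point is also where an early-stopping rule would typically keep the model checkpoint.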

85
MCQmedium

A deep learning model for sentiment analysis has millions of parameters and is trained on a small dataset. Which technique can help prevent overfitting?

A.Learning rate scheduling
B.Batch normalization
C.Dropout
D.Early stopping
AnswerC

Correct: Dropout is specifically designed to reduce overfitting in large neural networks.

Why this answer

Option C is correct because dropout is a regularization technique that randomly drops neurons during training, reducing overfitting. Options A, B, and D are incorrect: learning rate scheduling helps convergence, batch normalization helps with internal covariate shift, and early stopping can prevent overfitting but is not as specific as dropout for parameter-heavy models.
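
A minimal sketch of how dropout behaves, assuming the common "inverted dropout" formulation in which surviving activations are rescaled during training. The function signature and values are assumptions for illustration; deep learning frameworks provide their own dropout layers.

```python
import random


def dropout(activations, rate, training, rng):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and scale survivors by 1 / (1 - rate); at inference, pass values through.
    Assumes 0 <= rate < 1."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
hidden = [1.0] * 10

train_out = dropout(hidden, rate=0.5, training=True, rng=rng)   # mix of 0.0 and 2.0
eval_out = dropout(hidden, rate=0.5, training=False, rng=rng)   # unchanged
```

Because a different random subset of units is silenced on each training step, no single unit can be relied on, which discourages the network from memorizing a small dataset.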

86
MCQmedium

A self-driving car company uses a reinforcement learning agent to navigate. The agent was trained in a simulated environment and achieved high rewards. When deployed in the real world, the agent fails to avoid obstacles. The team collects real-world driving data and uses it to fine-tune the model. However, fine-tuning leads to catastrophic forgetting of the simulated knowledge. Which technique should the team use to mitigate this?

A.Increase the number of layers in the network.
B.Use elastic weight consolidation (EWC) to regularize important weights.
C.Train the model from scratch using only real-world data.
D.Increase the learning rate during fine-tuning.
AnswerB

EWC selectively slows down learning on important weights for previous tasks, preserving simulated knowledge.

Why this answer

Option B is correct. Elastic weight consolidation (EWC) is a regularization technique that penalizes changes to weights that are important for previous tasks (simulation), thereby preventing catastrophic forgetting. Option D (increasing the learning rate) would make forgetting worse.

Option C (training from scratch) discards the valuable simulation knowledge. Option A (adding layers) may increase capacity but does not address forgetting.
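
The EWC penalty is usually written as (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2, where theta* are the weights learned on the earlier task and F_i estimates how important each weight was to it. The sketch below evaluates that term for hand-picked numbers; the importance values stand in for a Fisher information estimate and are assumptions, as are all names.

```python
def ewc_penalty(params, anchor_params, importance, lam):
    """(lam / 2) * sum_i importance_i * (param_i - anchor_i) ** 2."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, importance)
    )


def total_loss(new_task_loss, params, anchor_params, importance, lam):
    """Fine-tuning loss on the new data plus the EWC regularizer."""
    return new_task_loss + ewc_penalty(params, anchor_params, importance, lam)


sim_params = [0.5, -1.0]    # hypothetical weights after simulation training
importance = [10.0, 0.1]    # first weight mattered far more in simulation

# Move each weight by the same amount (1.0) and compare the penalty.
move_important = ewc_penalty([1.5, -1.0], sim_params, importance, lam=1.0)
move_unimportant = ewc_penalty([0.5, 0.0], sim_params, importance, lam=1.0)
```

Moving the important weight costs far more than moving the unimportant one, so fine-tuning is steered toward changes that leave the simulated knowledge largely intact.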

87
MCQhard

Refer to the exhibit. A compliance audit requires that model predictions be explainable for regulatory reasons. Which setting in the deployment configuration supports this requirement?

A.target_latency: 100
B.data_retention: "90 days"
C.drift_detection: true
D.explainability: "required"
AnswerD

This setting directly mandates explainability.

Why this answer

The 'explainability': 'required' under compliance indicates that the model must provide explanations, meeting the audit requirement.

88
Multi-Selecthard

A data scientist is using an ensemble method to combine multiple models. Which three statements about bagging (Bootstrap Aggregating) are true? (Select THREE.)

Select 3 answers
A.It requires the base models to be of different types
B.It reduces variance without increasing bias
C.It can be used with decision trees to create random forests
D.It reduces the error by combining weak learners
E.It trains models independently on bootstrap samples
AnswersB, C, E

Bagging averages predictions from models trained on bootstrap samples, reducing variance while bias remains similar.

Why this answer

Options B, C, and E are correct. Bagging reduces the variance of unstable models (like trees) without increasing bias (B). It trains models independently on bootstrap samples (E).

Random forests use bagging along with random feature selection (C). Option D is false because combining weak learners to reduce error is characteristic of boosting, which reduces bias, not bagging. Option A is false because bagging typically uses the same type of base model.
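
The mechanics of bagging can be sketched with a deliberately trivial base learner. Here each "model" just predicts the mean of its own bootstrap sample, which is an assumption chosen to keep the example short; a real ensemble would fit decision trees or similar learners instead.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement."""
    return [rng.choice(data) for _ in data]


def fit_mean_model(sample):
    """Trivial base learner: always predicts the mean of its training sample."""
    m = mean(sample)
    return lambda: m


def bagged_prediction(data, n_models, seed=0):
    rng = random.Random(seed)
    # Each model is fit on its own bootstrap sample, independently of the others.
    models = [fit_mean_model(bootstrap_sample(data, rng)) for _ in range(n_models)]
    # Aggregate by averaging the individual predictions.
    return mean(model() for model in models)


targets = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 4.0, 3.0]
prediction = bagged_prediction(targets, n_models=25)
```

All base learners share the same type and none depends on another, which is the structural difference from boosting, where each learner is fit to the errors of its predecessors.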

89
MCQmedium

A financial institution uses a random forest model to approve loan applications. Recently, the model's false positive rate has increased, leading to more defaults. The data science team reviews the feature importance and finds that the model heavily relies on a feature 'zip code' which correlates with income. The company is concerned about fairness. The regulatory team requires that the model's predictions are not biased against protected groups. Which action BEST addresses the fairness concern while maintaining predictive performance?

A.Apply a post-processing technique that adjusts thresholds for different groups.
B.Remove the 'zip code' feature and retrain the model.
C.Add more training data from underrepresented zip codes.
D.Use adversarial debiasing to train a model that is invariant to protected attributes.
AnswerD

Adversarial debiasing explicitly reduces the model's ability to predict protected attributes, mitigating bias while retaining predictive power.

Why this answer

Option D is correct. Adversarial debiasing directly forces the model to learn representations that are not predictive of protected attributes, thereby reducing bias while maintaining performance as much as possible. Option B (removing zip code) might lose important information, as zip code could be a proxy for other legitimate factors; also, other features may still correlate with protected attributes.

Option C (adding data) does not directly address bias and may not remove the correlation. Option A (post-processing) can adjust thresholds but may not address the underlying model bias; it is a less robust solution.

90
MCQhard

A machine learning engineer notices that the gradient values in a deep network are becoming extremely small during backpropagation. What is this problem?

A.Dead ReLU
B.Exploding gradient
C.Covariate shift
D.Vanishing gradient
AnswerD

Correct: Vanishing gradient makes weights stop updating effectively.

Why this answer

Option D is correct because vanishing gradient occurs when gradients become very small, preventing weight updates. Option B is incorrect: exploding gradient would mean very large values. Option A is incorrect: dead ReLU refers to neurons that output zero.

Option C is incorrect: covariate shift is a change in input distribution.
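
One way to see why gradients shrink is to multiply activation derivatives along a chain of layers. The sketch below assumes sigmoid activations and ignores the weight matrices entirely, which is a simplification for illustration; the sigmoid derivative never exceeds 0.25, so the product decays quickly with depth.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(pre_activations):
    """Product of sigmoid derivatives along a chain of layers (weights ignored)."""
    scale = 1.0
    for z in pre_activations:
        scale *= sigmoid_derivative(z)
    return scale


best_case_per_layer = sigmoid_derivative(0.0)   # 0.25, the maximum of the derivative
ten_layers = gradient_scale([0.0] * 10)         # 0.25 ** 10, under one millionth
```

Even in this best case the gradient signal reaching the earliest layer is scaled by less than one millionth after ten layers.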

91
MCQeasy

A data scientist wants to reduce the dimensionality of a dataset with 200 features before training a regression model. Which technique should they use?

A.LDA
B.t-SNE
C.Autoencoder
D.PCA
AnswerD

Correct: PCA is widely used for dimensionality reduction in regression tasks.

Why this answer

Option D is correct because PCA is a linear dimensionality reduction technique that is commonly used for feature reduction. Options A, B, and C are incorrect: LDA is supervised and relies on class labels, so it suits classification rather than regression; t-SNE is used mainly for visualization; and an autoencoder is a neural network approach that is more complex than needed.

92
MCQeasy

A machine learning engineer notices that a linear regression model has high bias. Which action is most likely to reduce bias?

A.Use a more complex model, such as polynomial regression
B.Reduce the number of training samples
C.Add L2 regularization
D.Apply feature scaling
AnswerA

More complex models have lower bias as they can fit more patterns.

Why this answer

Option C (Add L2 regularization) increases bias to reduce variance. Option B (Reduce the number of training samples) does not reduce bias and tends to increase variance. Option D (Apply feature scaling) does not directly affect bias.

Option A (Use a more complex model, such as polynomial regression) increases model flexibility, reducing bias.

93
MCQmedium

A data scientist is training a binary classification model to detect fraudulent transactions. The dataset contains 99.9% legitimate transactions and 0.1% fraudulent transactions. After training a logistic regression model, the accuracy is 99.9%, but the recall for the fraud class is 0%. Which of the following is the MOST likely cause?

A.The regularization parameter is too large, causing underfitting.
B.The model is overfitting due to too many features.
C.The learning rate was too high.
D.The dataset is highly imbalanced, and the model predicts the majority class for all instances.
AnswerD

With severe class imbalance, a model can achieve high accuracy by always predicting the majority class, leading to zero recall for the minority class.

Why this answer

Option D is correct because a highly imbalanced dataset often leads the model to predict the majority class for all instances, resulting in high accuracy but zero recall for the minority class. Option C (learning rate) would not cause this behavior; it affects convergence speed. Option B (overfitting) typically reduces generalization but not in this specific pattern.

Option A (too large regularization) might cause underfitting but would not necessarily yield zero recall for one class.

94
MCQmedium

A model trained on a dataset has high bias and low variance. What does this indicate?

A.Good fit
B.Data leakage
C.Overfitting
D.Underfitting
AnswerD

Correct: High bias leads to underfitting.

Why this answer

Option D is correct because high bias and low variance indicate underfitting, where the model cannot capture the underlying patterns. Options A, B, and C are incorrect: a good fit has low bias and low variance, data leakage is not a bias-variance concept, and overfitting has low bias and high variance.

95
MCQeasy

A dataset contains features on vastly different scales (e.g., age 0-100 vs. income 0-1,000,000). Which preprocessing step is essential before training a neural network?

A.Data augmentation
B.Dimensionality reduction
C.Feature scaling (standardization or normalization)
D.One-hot encoding
AnswerC

Scaling brings features to a similar range, improving gradient descent.

Why this answer

Neural networks are sensitive to feature scale; standardization or normalization ensures stable convergence.
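
The two scaling methods named in the answer can be sketched in a few lines. The helper names and sample values are assumptions for illustration; the formulas are the standard z-score and min-max definitions.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical columns on very different scales.
ages = [20, 40, 60, 80]
incomes = [20_000, 50_000, 250_000, 1_000_000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
z_ages = standardize(ages)
```

After scaling, both columns occupy a comparable range, so neither dominates the gradient updates. The scaling statistics are normally computed on the training split only and then reused for validation and test data.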

96
MCQhard

An autonomous vehicle system uses a deep reinforcement learning agent to navigate. The agent's reward function gives +1 for reaching the destination and -0.1 for each time step. After training, the agent learns to circle the block repeatedly without reaching the destination. Which modification is most likely to fix this behavior?

A.Increase the time penalty to -1 per step
B.Increase the reward for reaching the destination to +10
C.Use a discount factor closer to 0
D.Add a penalty for each turn the vehicle makes
AnswerA

A higher penalty per step makes circling less rewarding and encourages reaching the destination quickly.

Why this answer

The agent learns to circle the block because the cumulative penalty for each time step (-0.1) is too small relative to the reward for reaching the destination (+1). By increasing the time penalty to -1 per step, the agent will incur a much larger cost for delaying, making it optimal to reach the destination quickly rather than looping indefinitely. This directly addresses the reward structure imbalance that causes the undesirable behavior.

Exam trap

CompTIA often tests the misconception that increasing the terminal reward alone will fix reward hacking, when in fact the per-step penalty must be large enough to make delay costly relative to the goal reward.

How to eliminate wrong answers

Option B is wrong because simply increasing the destination reward to +10 does not change the per-step penalty; the agent can still accumulate a small penalty while circling, and the total reward from looping may still outweigh the delayed +10 reward if the discount factor is high. Option C is wrong because using a discount factor closer to 0 makes the agent myopic, focusing only on immediate rewards; this would actually encourage short-term circling behavior rather than long-term goal achievement. Option D is wrong because adding a penalty for each turn does not address the core issue of the agent preferring to delay reaching the destination; the agent could still circle without turning (e.g., driving in a straight loop) or the penalty might not be large enough to overcome the reward structure.

97
MCQmedium

A team trained a deep neural network on a limited dataset. The training loss decreases consistently, but the validation loss starts increasing after 20 epochs. What is the most likely issue and the best corrective action?

A.Vanishing gradient; use ReLU activation
B.Overfitting; apply regularization like dropout
C.Underfitting; increase model complexity
D.Data leakage; reshuffle split
AnswerB

Dropout randomly drops neurons to prevent co-adaptation, reducing overfitting.

Why this answer

The divergence between training and validation loss indicates overfitting. Regularization techniques like dropout help reduce overfitting.

98
Multi-Selectmedium

A deep learning engineer is training a convolutional neural network for image classification. The model is overfitting the training data. Which three techniques can help reduce overfitting? (Choose three.)

Select 3 answers
A.Add dropout layers
B.Apply L2 regularization
C.Use data augmentation
D.Use a smaller learning rate
E.Increase the number of convolutional layers
AnswersA, B, C

Dropout randomly drops units during training, reducing co-adaptation.

Why this answer

Dropout, data augmentation, and L2 regularization are standard techniques to reduce overfitting by adding regularization or increasing data diversity.

99
Multi-Selecteasy

A company is preparing a dataset for training a supervised machine learning model. The dataset contains missing values, outliers, and categorical features. Which two preprocessing steps are typically performed to prepare the data? (Choose two.)

Select 2 answers
A.Normalize numerical features to a standard range
B.Impute missing values with the mean
C.Encode categorical variables using one-hot encoding
D.Remove all features with low variance
E.Increase the number of features using PCA
AnswersB, C

Imputation handles missing data and is commonly done.

Why this answer

Imputing missing values and encoding categorical features are standard preprocessing steps for most machine learning pipelines.
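
A minimal sketch of the two selected steps, using plain lists. The column names and values are invented, and `None` is assumed to mark a missing entry.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Map each category to a 0/1 vector; categories are sorted for a stable order."""
    categories = sorted(set(column))
    encoded = [[1 if value == c else 0 for c in categories] for value in column]
    return encoded, categories


# Hypothetical columns.
ages = [30, None, 50, 40]
plans = ["basic", "premium", "basic", "trial"]

filled_ages = impute_mean(ages)
encoded_plans, plan_categories = one_hot(plans)
```

Mean imputation is sensitive to the outliers mentioned in the question, so the median is a common alternative when extreme values are present.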

100
MCQhard

A team is building a model to predict stock prices based on time series data. They need to capture long-term dependencies and avoid vanishing gradients. Which architecture is best suited?

A.Standard RNN
B.LSTM
C.Autoencoder
D.CNN
AnswerB

LSTM excels at learning long-term dependencies.

Why this answer

LSTM networks are designed with gating mechanisms to capture long-range dependencies and mitigate vanishing gradient problems in standard RNNs.

101
MCQeasy

A team wants to predict monthly sales using historical data. Which algorithm is most appropriate?

A.Linear regression
B.K-means
C.Decision tree
D.Logistic regression
AnswerA

Correct: Linear regression models the relationship between dependent and independent variables for continuous output.

Why this answer

Option A is correct because linear regression is used for predicting continuous values. Options B, C, and D are incorrect: K-means is for clustering, a decision tree can be used for regression but linear regression is simpler for trend prediction, and logistic regression is for binary classification.

102
Multi-Selecthard

Which THREE of the following are best practices for preventing overfitting in deep learning models?

Select 3 answers
A.L2 regularization
B.Increasing the number of layers
C.Dropout
D.Using a larger batch size
E.Data augmentation
AnswersA, C, E

L2 adds penalty on weights, keeping them small and reducing overfitting.

Why this answer

Dropout and L2 regularization directly penalize complexity. Data augmentation increases effective training set size. Increasing layers adds capacity, worsening overfitting.

Using a larger batch size is not a prevention technique; large batches are often associated with sharp minima and poorer generalization.

103
Multi-Selecteasy

A data scientist is tuning hyperparameters for a support vector machine (SVM) with an RBF kernel. Which two hyperparameters most significantly affect model performance? (Select TWO.)

Select 2 answers
A.gamma (kernel coefficient)
B.learning rate
C.epsilon (for epsilon-SVR)
D.degree (for polynomial kernel)
E.C (regularization parameter)
AnswersA, E

gamma determines the radius of influence of support vectors.

Why this answer

Options A and E are correct. The C parameter (Option E) controls the trade-off between margin width and misclassification, and gamma (Option A) defines the influence of a single training example. Option D (degree) applies to the polynomial kernel, not RBF.

Option C (epsilon) applies to regression SVMs (epsilon-SVR). Option B (learning rate) is not an SVM hyperparameter.
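
The effect of gamma is easiest to see in the kernel itself, k(x, y) = exp(-gamma * ||x - y||^2). The sketch below evaluates it for one pair of points at two gamma values; the points and values are chosen arbitrarily for illustration.

```python
import math


def rbf_kernel(x, y, gamma):
    """k(x, y) = exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


a, b = [0.0, 0.0], [1.0, 1.0]   # squared distance is 2.0

similarity_low_gamma = rbf_kernel(a, b, gamma=0.1)     # wide radius of influence
similarity_high_gamma = rbf_kernel(a, b, gamma=10.0)   # narrow radius of influence
```

With a small gamma the two points still look similar to the model, giving smoother boundaries; with a large gamma each training point influences only its immediate neighborhood, which tends toward overfitting.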

104
MCQhard

A deep learning model for autonomous vehicle perception uses a large convolutional neural network. During deployment, the model misclassifies a stop sign that has a small sticker on it. This is likely an example of what type of vulnerability, and which defense is most appropriate?

A.Adversarial attack; implement adversarial training
B.Model inversion; add differential privacy
C.Data poisoning; use robust aggregation
D.Transfer learning; use domain adaptation
AnswerA

Small perturbations like stickers can cause adversarial misclassification; adversarial training improves robustness.

Why this answer

Option C (Data poisoning) involves corrupting training data. Option B (Model inversion) extracts training data. Option D (Transfer learning) is a technique, not a vulnerability.

Option A correctly identifies an adversarial attack (small perturbation causing misclassification) and suggests adversarial training as a defense.

105
MCQmedium

A retail company uses a gradient boosting model to predict customer lifetime value (CLV). The model currently uses 50 features including purchase history, demographics, and web behavior. The model's RMSE on the test set is 120. The data science team wants to improve the model's accuracy without increasing training time significantly. They have access to additional data: customer support interaction logs (text), social media sentiment (text), and third-party credit scores (numeric). They also have the ability to perform feature engineering, hyperparameter tuning, and ensemble methods. Which approach is most likely to yield the best improvement in predictive performance with minimal increase in training time?

A.Add the customer support text as a feature using TF-IDF vectors
B.Use an ensemble of gradient boosting and random forest models
C.Perform hyperparameter tuning using grid search
D.Engineer new features such as average purchase value and recency
AnswerD

Feature engineering can capture patterns without adding new data sources or significant time.

Why this answer

Option D is correct because engineering domain-relevant features like average purchase value and recency directly captures the underlying behavioral patterns that drive customer lifetime value, often providing a higher signal-to-noise ratio than adding raw text or third-party data. This approach leverages existing data without significantly increasing the feature dimensionality or training time, unlike adding TF-IDF vectors which would dramatically expand the feature space and slow training.

Exam trap

CompTIA often tests the misconception that adding more data (especially text) or complex ensemble methods always improves model accuracy, while the correct approach is to engineer features that capture domain-specific patterns with minimal computational overhead.

How to eliminate wrong answers

Option A is wrong because adding customer support text as TF-IDF vectors would introduce thousands of sparse features, significantly increasing training time and risking overfitting without guaranteed improvement in RMSE. Option B is wrong because ensembling gradient boosting with random forest typically increases training time substantially (both models must be trained) and may not outperform a well-tuned single gradient boosting model on structured data. Option C is wrong because hyperparameter tuning using grid search is computationally expensive, often requiring many model fits, and would increase training time more than feature engineering without leveraging the new data sources.

106
MCQmedium

A team trains a random forest model on a dataset with 50 features. The model's performance on the test set is significantly worse than on the training set. Which technique is most appropriate to address this issue?

A.Apply cross-validation to tune hyperparameters and reduce overfitting
B.Increase the number of trees in the forest
C.Use feature scaling
D.Perform PCA to reduce dimensions
AnswerA

Cross-validation finds optimal max depth, min samples split, etc., to combat overfitting.

Why this answer

Option A is correct because cross-validation helps tune hyperparameters to reduce overfitting. Option B is incorrect because increasing the number of trees reduces variance but may not be sufficient. Option C is incorrect because tree-based models are scale-invariant.

Option D is incorrect because PCA can reduce dimensionality but may lose information; hyperparameter tuning is a better first step.
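
As a rough sketch of what cross-validation does under the hood, the generator below splits sample indices into k contiguous folds, each used once for validation. The function name is hypothetical, and real pipelines usually shuffle (or stratify) before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
validation_sizes = [len(validation) for _, validation in folds]   # [4, 3, 3]
```

A candidate setting such as a maximum tree depth would be scored by training on each `train` split, evaluating on the matching `validation` split, and averaging the results, so the choice is not tied to a single lucky or unlucky split.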
