CCNA Machine Learning Deep Learning Questions

75 of 106 questions · Page 1/2 · Machine Learning Deep Learning topic · Answers revealed

1
Multi-Selectmedium

A team is designing a deep learning pipeline for a computer vision task. They want to reduce overfitting. Which two techniques are specifically effective for this purpose? (Select TWO.)

Select 2 answers
A.Dropout
B.Using a smaller batch size
C.Adding more layers
D.L2 weight regularization
E.Increasing the learning rate
AnswersA, D

Dropout randomly deactivates neurons, reducing overfitting by preventing reliance on specific features.

Why this answer

Options A and D are correct. Dropout randomly drops neurons during training, preventing co-adaptation. L2 regularization adds a penalty on weights, discouraging complexity.

Option E, increasing the learning rate, can hinder convergence. Option C, adding more layers, typically increases overfitting. Option B, a smaller batch size, can have a regularizing effect but is not as direct or commonly cited as the primary techniques.
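As a rough illustration of the two regularizers named in the answer, the sketch below shows inverted dropout and an L2 penalty in plain Python. The function names, the `rate` and `lam` parameters, and the sample values are assumptions made for this example; they are not taken from any particular framework.

```python
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    and scale the survivors by 1 / (1 - rate) to keep the expected sum."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def l2_penalty(weights, lam):
    """L2 regularization term: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

rng = random.Random(0)                      # fixed seed so the run is repeatable
hidden = [0.5, 1.2, -0.3, 0.8]              # hypothetical layer activations
dropped = dropout(hidden, rate=0.5, rng=rng)
penalty = l2_penalty([1.0, -2.0], lam=0.01)  # 0.01 * (1 + 4)
print(dropped, penalty)
```

Dropout is applied only during training; at inference the activations pass through unchanged.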

2
Multi-Selecteasy

A data scientist is preparing a dataset for a binary classification neural network. The dataset contains both numerical and categorical features, and some rows have identical entries. Which TWO preprocessing steps are most essential to improve model performance and avoid overfitting?

Select 2 answers
A.Removing duplicate records
B.Scaling numerical features to have zero mean and unit variance
C.Increasing the batch size
D.Applying PCA for dimensionality reduction
E.Using dropout regularization in the model
AnswersA, B

Duplicate records can cause the model to overfit to repeated patterns.

Why this answer

Removing duplicate records prevents the model from being biased toward repeated instances. Scaling numerical features to zero mean and unit variance ensures that features with larger ranges do not dominate the gradient updates, which is especially important for neural networks. Increasing batch size and dropout regularization are hyperparameter choices, not preprocessing steps, and PCA is not always essential.
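The two preprocessing steps can be sketched with the standard library alone. The rows below (age, income) are hypothetical values invented for illustration, and `standardize` is an illustrative helper name.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical (age, income) rows; the first two are exact duplicates.
rows = [(25, 40000), (25, 40000), (40, 85000), (58, 120000)]

# dict.fromkeys keeps the first occurrence of each row and preserves order.
unique_rows = list(dict.fromkeys(rows))

ages = standardize([r[0] for r in unique_rows])
incomes = standardize([r[1] for r in unique_rows])
print(len(unique_rows), ages, incomes)
```

In a real pipeline the mean and standard deviation would be computed on the training split only and then reused for validation and test data.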

3
MCQeasy

Refer to the exhibit. What is the recall of the model?

A.0.44
B.0.80
C.0.90
D.0.99
AnswerA

Recall = 400/(400+500) = 0.4444, so 0.44.

Why this answer

Recall = TP / (TP + FN) = 400 / (400 + 500) = 400 / 900 ≈ 0.444, which rounds to 0.44.
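The arithmetic can be checked with a short sketch. The exhibit is not reproduced here, so the counts (400 true positives, 500 false negatives) are taken from the explanation above, and the function name `recall` is illustrative.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): the share of actual positives the model found."""
    return true_positives / (true_positives + false_negatives)

# Counts as stated in the explanation (the exhibit itself is not shown).
value = recall(true_positives=400, false_negatives=500)
print(round(value, 2))  # 0.44
```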

4
Multi-Selectmedium

Which TWO techniques are commonly used to handle missing data in a machine learning dataset? (Choose TWO.)

Select 2 answers
A.Normalization
B.Imputation with mean or median
C.Deletion of rows with missing values
D.One-hot encoding
E.Dimensionality reduction
AnswersB, C

Replacing missing values with mean/median is a common imputation method.

Why this answer

Imputation with mean or median is a standard technique for handling missing numerical data because it preserves the dataset size and avoids the bias that discarding rows can introduce. Replacing missing values with the central tendency of the observed data lets the model learn from every row, though it shrinks the variance of the imputed feature. Deleting rows with missing values is the other common approach; it is simplest when only a small share of rows is affected, because it reduces the dataset size.
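A minimal sketch of both options from the answer, median imputation and row deletion, using only the standard library. The `ages` column is hypothetical data, with `None` standing in for a missing value.

```python
from statistics import median

# Hypothetical numeric column; None marks a missing entry.
ages = [34, None, 29, 41, None, 52]

observed = [a for a in ages if a is not None]

# Option B: fill gaps with the median of the observed values.
fill_value = median(observed)
imputed = [fill_value if a is None else a for a in ages]

# Option C: drop the entries that have a missing value.
dropped = observed

print(imputed)   # [34, 37.5, 29, 41, 37.5, 52]
print(dropped)   # [34, 29, 41, 52]
```

Imputation keeps all six rows, while deletion keeps four; which trade-off is acceptable depends on how much data is missing.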

Exam trap

CompTIA often tests the distinction between data preprocessing techniques (like normalization and encoding) and actual missing data handling methods, so candidates mistakenly select normalization or one-hot encoding as solutions for missing values.

5
MCQeasy

A data scientist is building a classification model to detect fraudulent transactions. The dataset is highly imbalanced with only 1% fraudulent cases. Which approach should the scientist use to evaluate model performance most effectively?

A.F1 score
B.Accuracy
C.Recall
D.Precision
AnswerA

F1 score is the harmonic mean of precision and recall, providing a balanced measure for imbalanced datasets.

Why this answer

In highly imbalanced datasets like fraud detection (1% positive class), accuracy is misleading because a model that predicts all transactions as legitimate would achieve 99% accuracy yet fail to detect any fraud. The F1 score (harmonic mean of precision and recall) is the most effective metric because it balances both false positives and false negatives, providing a single score that reflects the model's ability to correctly identify the minority class without being skewed by class imbalance.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric for classification, but in imbalanced datasets, accuracy is a trap because it does not reflect performance on the minority class, leading candidates to overlook metrics like F1 score that directly address class imbalance.

How to eliminate wrong answers

Option B (Accuracy) is wrong because it is dominated by the majority class (99% legitimate transactions), so a trivial model that never predicts fraud can still achieve 99% accuracy, masking poor fraud detection performance. Option C (Recall) is wrong because it only measures the proportion of actual fraud cases correctly identified (true positives / (true positives + false negatives)), ignoring false positives; a model that flags every transaction as fraud would have perfect recall but be unusable in practice. Option D (Precision) is wrong because it only measures the proportion of predicted fraud cases that are actually fraud (true positives / (true positives + false positives)), ignoring false negatives; a model that makes very few fraud predictions but with high precision would miss many actual frauds, which is unacceptable in fraud detection.
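The trivial-model argument can be reproduced in a few lines. The sketch below assumes a hypothetical set of 1,000 transactions with 1% fraud and a classifier that always predicts "legitimate"; the helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: 10 frauds among 1,000 transactions (1%).
y_true = [1] * 10 + [0] * 990
always_legit = [0] * 1000

print(accuracy(y_true, always_legit))  # 0.99
print(f1_score(y_true, always_legit))  # 0.0
```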

6
MCQeasy

Refer to the exhibit. An AI developer implements the above neural network architecture for handwritten digit recognition. The model achieves 85% training accuracy and 83% test accuracy. Which modification is most likely to improve training accuracy?

A.Increase the dropout rate to 0.7
B.Increase the number of filters in the first Conv2D layer
C.Add another dense layer before the output
D.Remove the dropout layer
AnswerD

Dropout adds regularization; removing it can increase training accuracy, especially if the model is underfitting.

Why this answer

Removing the dropout layer reduces regularization, allowing the model to fit the training data better and increase training accuracy.

7
MCQmedium

An e-commerce company uses a gradient boosting model to forecast daily sales. Recently, the model's predictions have become less accurate, showing a significant drop in R-squared on validation data. The data scientist checks for data drift but finds no significant changes in feature distributions. The model was trained on data from the past 24 months and is retrained monthly. Upon inspecting the feature importance, the data scientist notices that the top feature 'promotion_flag' has decreased in importance over time. What is the most likely cause of the performance degradation, and what should be done?

A.The model is overfitting to historical promotions; apply more regularization
B.Concept drift has occurred; retrain the model more frequently with recent data only, or use an online learning approach
C.The model's hyperparameters need tuning; perform a grid search
D.The promotion_flag feature is leaking future information; remove it
AnswerB

Concept drift changes the relationship between features and target; frequent retraining adapts to new patterns.

Why this answer

Option A (overfitting to promotions) does not explain the drop over time. Option C (hyperparameter tuning) is unlikely to fix the temporal change. Option D (leakage) would have caused issues from the start.

Option B correctly identifies concept drift (changing relationship) and suggests retraining more frequently or using online learning to adapt.

8
Multi-Selecthard

A data scientist is evaluating a trained binary classification model. The model has high accuracy but the precision is low and recall is high. Which three actions are most appropriate to improve precision? (Choose three.)

Select 3 answers
A.Collect more training data for the minority class
B.Apply oversampling to the majority class
C.Increase the classification threshold
D.Use a different algorithm that penalizes false positives more
E.Decrease the classification threshold
AnswersA, C, D

More minority data helps the model learn better boundaries, often improving precision.

Why this answer

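All three selected actions can improve precision: increasing the classification threshold reduces false positives, switching to an algorithm that penalizes false positives more heavily discourages them during training, and collecting more minority-class data helps the model learn a cleaner decision boundary. Decreasing the threshold (Option E) has the opposite effect, adding false positives.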
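The effect of the classification threshold can be seen on a toy example. The labels and scores below are hypothetical values chosen for illustration, and `precision_recall` is an illustrative helper.

```python
def precision_recall(y_true, scores, threshold):
    """Classify as positive when score >= threshold, then compute both metrics."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical model outputs: three positives, five negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.55, 0.4, 0.3, 0.2]

low = precision_recall(y_true, scores, threshold=0.5)    # precision 0.6, recall 1.0
high = precision_recall(y_true, scores, threshold=0.75)  # precision 1.0, recall 2/3
print(low, high)
```

Raising the threshold trades recall for precision, which matches the scenario in the question (high recall, low precision).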

9
MCQmedium

A deep learning model for sentiment analysis uses a softmax output layer. The hidden layers currently use tanh activation. Which activation function should replace tanh to mitigate vanishing gradients in deeper networks?

A.Sigmoid
B.Softmax
C.ReLU
D.Linear
AnswerC

ReLU is non-saturating and helps mitigate vanishing gradients.

Why this answer

ReLU does not saturate for positive values, helping avoid vanishing gradient issues common with tanh and sigmoid.
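A simplified way to see the saturation argument is to multiply the activation's local derivative across several layers. The sketch below ignores the weight matrices entirely and assumes every layer sees the same pre-activation value of 2.0 and a depth of 10; both are illustrative assumptions, not properties of any real network.

```python
import math

def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2, which shrinks toward 0 as |z| grows."""
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(local_grad, z, depth):
    """Product of identical local derivatives across `depth` layers."""
    return local_grad(z) ** depth

tanh_chain = chained_gradient(tanh_grad, z=2.0, depth=10)
relu_chain = chained_gradient(relu_grad, z=2.0, depth=10)
print(tanh_chain, relu_chain)  # tanh product is tiny; ReLU product stays 1.0
```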

10
MCQhard

A data scientist is training a random forest model on a large dataset and notices that the model is overfitting. Which hyperparameter adjustment is most likely to reduce overfitting?

A.Increase the maximum features
B.Decrease the maximum depth of trees
C.Decrease the minimum samples split
D.Increase the number of trees
AnswerB

Shorter trees are less complex and generalize better.

Why this answer

Decreasing the maximum depth of trees reduces model complexity, preventing overfitting.

11
MCQhard

A company is building a computer vision system to detect defects in manufactured parts. They have 10,000 labeled images per class (defective and non-defective). They want to achieve high accuracy with limited computational resources. Which deep learning architecture and approach is most appropriate?

A.Train a custom CNN from scratch with many layers
B.Use a decision tree ensemble
C.Use a pre-trained VGG16 and fine-tune the last few layers
D.Use an RNN to process image sequences
AnswerC

Transfer learning with fine-tuning is efficient and effective for moderate datasets.

Why this answer

Transfer learning using a pre-trained CNN like VGG16, fine-tuning only the last few layers, leverages existing features and reduces training time and resource requirements.

12
MCQmedium

A team is developing a recommendation system for an e-commerce platform. They want to use collaborative filtering but are concerned about cold-start problems for new users. Which approach would best mitigate the cold-start problem?

A.Incorporate user demographic features as side information
B.Increase the number of latent factors in matrix factorization
C.Use a popularity-based baseline for all recommendations
D.Use only item-based collaborative filtering
AnswerA

Demographic features enable recommendations for cold-start users by using profile information.

Why this answer

Option D (use only item-based collaborative filtering) still suffers cold-start for new items. Option B (increase latent factors) does not address cold-start. Option C (use a popularity baseline) lacks personalization.

Option A (incorporate user demographic features as side information) allows recommendations even for new users by leveraging profile data.

13
MCQmedium

A team is reviewing a neural network model summary. The input layer expects 784 features (e.g., 28x28 images). How many parameters does the first dense layer have?

A.100,224
B.109,258
C.100,352
D.8,256
AnswerC

Calculated as 784 * 128 = 100,352 (weights only, without the 128 biases), matching the exhibit.

Why this answer

The first dense layer has 784 input features and 128 output units (a common default). Each of the 784 inputs connects to each of the 128 neurons, giving 784 * 128 = 100,352 weight parameters, plus 128 bias parameters (one per neuron), for a total of 100,480 parameters. However, the question asks for the number of parameters in the dense layer itself, and the correct answer is 100,352, which corresponds to the weight parameters only, as biases are often listed separately or the layer uses no bias.

In typical Keras summaries, the parameter count for a Dense layer with bias is (input_dim * units) + units, but here the provided correct answer matches the weight count alone, indicating the model summary excludes biases or uses a bias-less configuration.
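The two counts discussed above can be verified directly. This is a plain-Python sketch; `dense_params` and its `use_bias` flag are illustrative names, not a framework call, and the 128-unit layer size is the one assumed in the explanation.

```python
def dense_params(input_dim: int, units: int, use_bias: bool = True) -> int:
    """Parameter count of a fully connected layer: weights plus optional biases."""
    weights = input_dim * units
    biases = units if use_bias else 0
    return weights + biases

print(dense_params(784, 128, use_bias=False))  # 100352 (weights only)
print(dense_params(784, 128))                  # 100480 (weights + biases)
```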

Exam trap

CompTIA often tests whether candidates remember to include bias parameters in the total count, but here the trap is that the correct answer matches the weight-only count, leading candidates to overcount by adding biases and selecting a wrong option like 100,480 (not listed) or miscalculating the product.

How to eliminate wrong answers

Option A (100,224) is wrong because it is 128 less than the weight count (100,352 - 128), which can result from miscalculating the product of 784 and 128 or from using an incorrect input dimension. Option B (109,258) is wrong because it does not correspond to a single dense layer with 784 inputs and 128 outputs; it equals 100,352 + 8,256 + 650, which would be consistent with a whole-model total rather than the first layer alone. Option D (8,256) is wrong because it is far too small for this layer; it matches a 64-unit dense layer fed by 128 inputs (128 * 64 + 64 = 8,256), not a layer with 784 inputs.

14
MCQmedium

A data engineer is designing a pipeline to train a linear regression model on a dataset with 10 million rows and 50 features. The dataset fits in memory. Which approach should the engineer use to train the model efficiently?

A.Normal equation
B.Batch gradient descent
C.Principal component analysis
D.Stochastic gradient descent
AnswerD

SGD updates weights per sample, making it efficient for large datasets.

Why this answer

Stochastic gradient descent (SGD) is the most efficient approach for training a linear regression model on a dataset with 10 million rows and 50 features because it updates the model parameters using only one training example per iteration, leading to much faster convergence per epoch compared to batch methods. Since the dataset fits in memory, SGD can still be implemented efficiently without the overhead of loading data in batches from disk, and it scales well to large datasets where the normal equation or batch gradient descent would be computationally prohibitive.

Exam trap

CompTIA often tests the misconception that the normal equation is always the best for small feature sets, but the trap here is that candidates overlook the massive computational cost of the O(n * f^2) matrix multiplication when n is large (10 million rows), even though f is small (50 features).

How to eliminate wrong answers

Option A is wrong because the normal equation requires computing (X^T X)^{-1} X^T y, which involves inverting a 50x50 matrix (feasible) but also computing X^T X, which is O(n * f^2) = 10 million * 2500 = 25 billion operations, making it extremely slow and memory-intensive for 10 million rows. Option B is wrong because batch gradient descent processes the entire 10-million-row dataset in each iteration, requiring O(n * f) = 500 million operations per epoch, which is computationally expensive and converges slowly compared to SGD. Option C is wrong because principal component analysis (PCA) is a dimensionality reduction technique used for feature reduction or visualization, not a method for training a linear regression model; it does not perform parameter optimization.
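A minimal sketch of the per-sample update that defines SGD, written in plain Python for a one-feature linear model. The synthetic data (y = 3x + 2 with no noise), the learning rate, and the epoch count are assumptions chosen so the example converges quickly; they are not tuned for the 10-million-row scenario in the question.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=50, seed=0):
    """Fit y = w * x + b by updating on one training example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)             # visit samples in a new random order
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error^2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b

# Synthetic, noise-free data: 21 points on the line y = 3x + 2.
xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x + 2 for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update touches a single row, so the cost per step is proportional to the number of features rather than the number of rows.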

15
MCQhard

A media company uses a natural language processing (NLP) model to classify news articles into topics. The model was trained on articles from 2015-2018. In 2023, the model's F1 score drops significantly. The data scientists find that the word embeddings no longer capture the meaning of some terms (e.g., 'covid', 'metaverse'). The model uses static word embeddings (Word2Vec) trained on the original corpus. Which solution BEST addresses the observed degradation?

A.Retrain the static Word2Vec embeddings on a larger corpus from 2023.
B.Increase the dimensionality of the static embeddings.
C.Replace static embeddings with contextual embeddings from a transformer model like BERT, then fine-tune the classifier.
D.Apply data augmentation to the original training data by replacing words with synonyms.
AnswerC

Contextual embeddings dynamically represent words based on context, handling semantic shift effectively.

Why this answer

Option C is correct. Contextual embeddings (e.g., BERT) capture meaning based on context, adapting to new uses of words such as 'covid' in its pandemic sense. Fine-tuning the classifier on new data would update the model.

Option A (retraining static embeddings) might capture new word senses but still assigns a single vector per word, missing context. Option D (data augmentation) does not introduce new word meanings. Option B (increasing dimensionality) does not address the semantic shift.

16
Multi-Selecteasy

A machine learning engineer is preparing to train a deep neural network for image classification. To avoid overfitting, which TWO techniques should the engineer apply? (Select TWO.)

Select 2 answers
A.Use dropout regularization.
B.Use data augmentation.
C.Increase the number of layers.
D.Remove all non-linear activation functions.
E.Reduce the training dataset size.
AnswersA, B

Dropout is a regularization technique that helps prevent overfitting by randomly dropping units.

Why this answer

Options A and B are correct. Dropout regularization randomly drops neurons during training, preventing co-adaptation. Data augmentation increases the effective size of the training set by applying transformations, reducing overfitting.

Option C (increasing layers) increases model capacity and may worsen overfitting. Option D (removing non-linear activation) reduces model expressiveness, leading to underfitting. Option E (reducing dataset size) would increase overfitting risk.

17
MCQhard

A machine learning engineer is troubleshooting a recurrent neural network that fails to learn long-range dependencies in sequential data. The gradients are computed using backpropagation through time. Which phenomenon is most likely occurring, and what architectural change would best address it?

A.Underfitting; increase the number of time steps
B.Vanishing gradients; use LSTM or GRU units
C.Exploding gradients; apply gradient clipping
D.Overfitting; reduce the number of layers
AnswerB

Vanishing gradients prevent learning long-range patterns; LSTMs and GRUs have gating mechanisms to preserve gradients.

Why this answer

Option C (exploding gradients) causes unstable training; gradient clipping helps with that instability but does not fix the failure to learn long-range dependencies. Option D (overfitting) would not specifically affect long-range learning. Option A (underfitting) is too general.

Option B correctly identifies vanishing gradients and suggests LSTMs or GRUs, which maintain long-term memory.

18
MCQeasy

A data scientist trains a linear regression model on housing prices. The training error is low, but test error is high. What is the most likely issue?

A.Overfitting
B.Multicollinearity
C.Data leakage
D.Underfitting
AnswerA

Correct: low training error and high test error is classic overfitting.

Why this answer

Option A is correct because overfitting occurs when the model fits the training data too well but performs poorly on unseen data. Options B, C, and D are incorrect because underfitting would show high training error, data leakage would cause artificially high performance, and multicollinearity affects coefficient interpretation but not necessarily test error.

19
MCQhard

A data scientist is training a convolutional neural network (CNN) for object detection. The training loss decreases rapidly but then plateaus at a high value, and the validation loss starts increasing. Which action should the scientist take to improve the model?

A.Increase the learning rate
B.Increase the number of epochs
C.Reduce the model complexity
D.Add more convolutional layers
AnswerC

Reducing complexity (e.g., fewer layers) can reduce overfitting and improve validation performance.

Why this answer

The training loss decreasing rapidly then plateauing at a high value while validation loss increases is classic overfitting. Reducing model complexity (Option C) directly addresses overfitting by decreasing the number of parameters or applying regularization (e.g., dropout, L2), which forces the network to learn more generalizable features rather than memorizing noise in the training data.

Exam trap

CompTIA often tests the misconception that high training loss plateau means underfitting or insufficient learning, leading candidates to increase model complexity or epochs, when the real issue is overfitting indicated by the validation loss increase.

How to eliminate wrong answers

Option A is wrong because increasing the learning rate would likely cause the loss to oscillate or diverge, not fix the plateau or overfitting; it addresses convergence speed, not generalization. Option B is wrong because increasing epochs would continue training on an already overfitting model, worsening the validation loss divergence. Option D is wrong because adding more convolutional layers increases model complexity, which would exacerbate overfitting by adding more parameters to memorize training data.

20
MCQeasy

A healthcare startup is developing a diagnostic system using medical images. The team has collected 10,000 labeled images of skin lesions. They plan to train a convolutional neural network (CNN) from scratch. However, training converges slowly, and the validation accuracy plateaus at 70%. The data scientist suspects overfitting. The dataset contains 8,000 images of benign lesions and 2,000 of malignant. The team has limited GPU resources. Which of the following is the MOST effective course of action to improve validation accuracy?

A.Increase the learning rate by a factor of 10.
B.Reduce the number of convolutional layers.
C.Add more dropout after every convolutional layer.
D.Apply transfer learning using a pre-trained model on ImageNet.
AnswerD

Transfer learning provides a strong feature extractor learned from a large dataset, which can significantly improve performance with limited data.

Why this answer

Option D is correct. Transfer learning leverages a model pre-trained on a large dataset (e.g., ImageNet), which provides useful features for medical images and reduces the need for large amounts of data and computational resources. It is particularly effective when the dataset is small and imbalanced.

Option B (reducing layers) may reduce capacity and underfit. Option A (increasing learning rate) might cause divergence or overshoot minima. Option C (adding dropout) can help with overfitting but is unlikely to lift accuracy much beyond 70% given limited data; transfer learning provides a stronger boost.

21
MCQeasy

A company uses linear regression to predict sales based on advertising spend. The model's residuals show a pattern of increasing variance as spend increases. Which assumption of linear regression is violated?

A.Normality
B.Homoscedasticity
C.Linearity
D.Independence
AnswerB

Homoscedasticity requires constant variance of residuals; increasing variance violates it.

Why this answer

Option B is correct because homoscedasticity assumes constant variance of residuals; increasing variance indicates heteroscedasticity. Option C is incorrect because linearity is about the relationship, not residual variance. Option D is incorrect because independence refers to errors being independent.

Option A is incorrect because normality is about the distribution of residuals, not variance.

22
MCQhard

A company uses a neural network for fraud detection. The dataset has 99% legitimate, 1% fraudulent. The model achieves 99% accuracy but fails to detect most frauds. Which metric should they focus on?

A.Precision
B.F1-score
C.Recall
D.AUC-ROC
AnswerC

Correct: Recall measures the proportion of actual frauds that are correctly identified.

Why this answer

Option C is correct because recall measures the ability to find all positive samples (frauds), which is critical in fraud detection. Options A, B, and D are incorrect: precision is important but not as crucial as recall in this imbalanced scenario, F1-score balances precision and recall but recall directly addresses the issue, and AUC-ROC is not as intuitive for this specific problem.

23
MCQmedium

While training a deep neural network, the loss function fails to converge and oscillates wildly. Which adjustment is most likely to stabilize training?

A.Increase the number of hidden layers
B.Decrease the batch size
C.Reduce the learning rate
D.Use a test set
AnswerC

Lower learning rate reduces step size, stabilizing training.

Why this answer

When the loss function oscillates wildly and fails to converge, it typically indicates that the learning rate is too high, causing the optimizer to overshoot the minima. Reducing the learning rate allows the gradient descent updates to take smaller, more stable steps, which helps the loss converge smoothly. This is a fundamental hyperparameter tuning step in deep learning training.

Exam trap

CompTIA often tests the misconception that increasing model complexity (more layers) or using more data (test set) directly fixes training instability, when in fact the learning rate is the primary culprit for oscillation and non-convergence.

How to eliminate wrong answers

Option A is wrong because increasing the number of hidden layers adds more parameters and non-linearity, which can exacerbate instability and overfitting, not stabilize training. Option B is wrong because decreasing the batch size increases the variance in gradient estimates, which often leads to noisier updates and can worsen oscillation, not reduce it. Option D is wrong because using a test set is for evaluating generalization performance after training, not for stabilizing the training process itself.
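The overshoot behaviour is easy to reproduce on a one-dimensional toy problem. The sketch below runs plain gradient descent on f(x) = x^2 (gradient 2x); the two learning rates, the step count, and the starting point are illustrative assumptions.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x^2 and record x after every update."""
    x = x0
    history = [x]
    for _ in range(steps):
        x -= lr * 2 * x   # gradient of x^2 is 2x
        history.append(x)
    return history

high = gradient_descent(lr=1.1)  # each step multiplies x by -1.2: oscillates and grows
low = gradient_descent(lr=0.1)   # each step multiplies x by 0.8: shrinks toward 0

print(high[:4])
print(low[-1])
```

With the larger rate the iterate flips sign on every step and moves further from the minimum, which is the oscillation described in the question; the smaller rate converges.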

24
MCQhard

Refer to the exhibit. The training pod is using 2 GPUs. During training, the GPU utilization is only 30% each. What is the most likely cause?

A.The learning rate is too high
B.The image is missing CUDA libraries
C.The number of epochs is too high
D.The batch size is too small to fully utilize GPUs
AnswerD

Small batch size leads to low compute-to-overhead ratio, underutilizing GPU resources.

Why this answer

Option D is correct because a batch size of 32 is small for two GPUs, leading to underutilization as GPUs spend time on kernel launches and synchronization. Option A is incorrect because learning rate does not directly impact GPU utilization. Option C is incorrect because number of epochs does not affect utilization per step.

Option B is incorrect because missing CUDA would cause errors, not low utilization.

25
MCQhard

An organization has a dataset with categorical features having high cardinality (e.g., ZIP codes). They plan to use a tree-based model. Which encoding method is most appropriate?

A.Label encoding
B.One-hot encoding
C.Target encoding (mean encoding)
D.Frequency encoding
AnswerC

Target encoding maps categories to the mean target, preserving predictive information compactly.

Why this answer

Option C is correct because target encoding replaces categories with the mean target value, capturing predictive power without an explosion of features. Option B is incorrect because one-hot encoding creates many sparse features, which is inefficient for trees. Option A is incorrect because label encoding implies ordinality, which is misleading.

Option D is incorrect because frequency encoding may lose the relationship to the target.
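Target (mean) encoding can be sketched in a few lines. The ZIP codes and the binary target below are hypothetical, and `target_encode` is an illustrative helper rather than a library function.

```python
from collections import defaultdict

def target_encode(categories, targets):
    """Map each category to the mean of the target over rows in that category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}

# Hypothetical training rows: ZIP code and a binary target.
zip_codes = ["10001", "10001", "94105", "94105", "94105", "60601"]
purchased = [1, 0, 1, 1, 0, 1]

encoding = target_encode(zip_codes, purchased)
encoded_column = [encoding[z] for z in zip_codes]
print(encoding)
```

The encoding should be computed on training data only (often within cross-validation folds) so that the target does not leak into validation rows; rare categories such as "60601" here are usually smoothed toward the global mean.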

26
MCQeasy

A data scientist is training a binary classification model to detect fraudulent transactions. The dataset is highly imbalanced with 99% legitimate and 1% fraudulent. Which evaluation metric should be prioritized to assess model performance?

A.Accuracy
B.F1-score
C.Mean Squared Error
D.Log Loss
AnswerB

F1-score balances precision and recall, making it ideal for imbalanced classification.

Why this answer

Option A (Accuracy) is misleading because a model that always predicts 'legitimate' would achieve 99% accuracy but fail to detect fraud. Option C (Mean Squared Error) is for regression, not classification. Option D (Log Loss) can be used but is less interpretable for imbalanced data.

Option B (F1-score) balances precision and recall, making it ideal for imbalanced datasets.

27
MCQeasy

A team is implementing a machine learning pipeline to classify images for a defect detection system. They are considering using a pre-trained convolutional neural network (CNN) and fine-tuning it on their small dataset. What is the primary advantage of transfer learning in this scenario?

A.It ensures the model is not biased toward the original dataset
B.It eliminates the need for data preprocessing
C.It allows the model to leverage learned features from a large dataset, reducing training time and required data
D.It reduces the risk of overfitting by using a larger model
AnswerC

Transfer learning uses features from a large dataset, so fine-tuning requires less data and time.

Why this answer

Option C is correct because transfer learning leverages features learned from a large dataset, enabling effective training with a small dataset and reducing training time. Option D is incorrect because a larger model does not by itself reduce overfitting; any benefit comes from the pre-trained features, not from model size. Option B is incorrect because preprocessing is still needed.

Option A is incorrect because the model may retain biases from the original dataset.

28
MCQmedium

An organization wants to automate the detection of defective products on an assembly line using computer vision. They have a limited number of labeled images for defective items. Which approach would be most effective?

A.Use a support vector machine with handcrafted features
B.Train a convolutional neural network from scratch on the limited data
C.Synthesize additional defective images using GANs
D.Use transfer learning with a pre-trained model like ResNet and fine-tune on the defect data
AnswerD

Transfer learning leverages knowledge from large datasets and fine-tunes on small data.

Why this answer

Option B (train a CNN from scratch) requires large datasets. Option A (SVM with handcrafted features) is less effective for image data. Option C (GAN synthesis) is complex and may not guarantee improvement.

Option D (transfer learning) leverages pre-trained models and fine-tuning, ideal for small datasets.

29
MCQhard

Refer to the exhibit. A data scientist is training a binary classifier. Based on the training log, which problem is the model experiencing?

A.Underfitting
B.Data leakage
C.Overfitting
D.Vanishing gradient
AnswerC

Training loss decreases while validation loss increases, a classic sign of overfitting.

Why this answer

The training log shows that the model's training accuracy continues to improve while the validation accuracy plateaus or degrades after a certain number of epochs. This divergence between training and validation performance is the hallmark of overfitting, where the model memorizes the training data noise rather than learning generalizable patterns.

Exam trap

CompTIA often tests the distinction between overfitting and underfitting by showing a training log where training accuracy is high but validation accuracy is low, leading candidates to mistakenly think the model is underfitting because validation performance is poor.

How to eliminate wrong answers

Option A is wrong because underfitting would show poor performance on both training and validation sets, not the divergence seen here. Option B is wrong because data leakage typically causes unrealistically high performance on both sets or sudden jumps in metrics, not a gradual divergence after convergence. Option D is wrong because vanishing gradient affects deep networks by causing gradients to approach zero, preventing weight updates and stalling training, which would manifest as flat loss curves, not the overfitting pattern observed.

30
MCQeasy

A hospital wants to deploy a machine learning model to predict patient readmission risk within 30 days. They have a dataset with 10,000 records, 70 features including demographics, lab results, and past admissions. The target variable is binary (readmitted or not). The data scientist trains a logistic regression model and achieves an AUC of 0.85 on the test set. However, the hospital's clinicians require interpretability of predictions to trust the model. Which action should the data scientist take to ensure the model meets the interpretability requirement while maintaining performance?

A.Reduce the number of features to 10 using PCA and retrain the logistic regression
B.Replace logistic regression with a random forest model and use feature importance plots
C.Train a deep neural network and apply LIME or SHAP for explanations
D.Use the logistic regression model as is, since it is inherently interpretable with coefficients
AnswerD

Logistic regression coefficients provide direct interpretability for each feature.

Why this answer

Option B (random forest) offers feature importance plots but is less interpretable than a linear model. Option C (deep neural network with LIME/SHAP) adds complexity and relies on post-hoc explanations, which may reduce transparency. Option A (PCA and retrain) replaces the original features with components that are hard to interpret and may degrade performance.

Option D (keep logistic regression) provides inherent interpretability through coefficients, meeting the requirement without sacrificing performance.
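
To make "interpretable through coefficients" concrete, here is a minimal sketch that converts logistic regression coefficients (log-odds) into odds ratios. The feature names and coefficient values are hypothetical and chosen only for illustration; they do not come from the hospital scenario.

```python
import math

# Hypothetical coefficients (log-odds per unit of each feature).
coefficients = {
    "prior_admissions": 0.40,
    "age_decades": 0.10,
    "on_discharge_plan": -0.70,
}


def odds_ratios(coefs):
    """Map each log-odds coefficient to its odds ratio, exp(coef)."""
    return {name: math.exp(value) for name, value in coefs.items()}


for name, ratio in odds_ratios(coefficients).items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"{name}: odds ratio {ratio:.2f} ({direction} the odds of readmission)")
```

An odds ratio above 1 means the feature increases the predicted odds, and one below 1 means it decreases them, which is the kind of statement clinicians can check against their own experience.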

31
Multi-Selectmedium

Which THREE are common activation functions used in neural networks? (Choose three.)

Select 3 answers
A.Sigmoid
B.K-means
C.Tanh
D.ReLU
E.Softmax
AnswersA, C, D

Correct: Sigmoid is a classic activation function.

Why this answer

Options A, C, and D are correct because Sigmoid, Tanh, and ReLU are widely used activation functions. Options B and E are incorrect: K-means is a clustering algorithm, and Softmax, although used in the output layer for multi-class classification, is not typically considered a 'common' activation function in hidden layers.
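
For reference, the functions named in this question can be written in a few lines of plain Python. This is a minimal scalar sketch rather than how a deep learning framework implements them.

```python
import math


def sigmoid(x):
    """Squash x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squash x into the range (-1, 1)."""
    return math.tanh(x)


def relu(x):
    """Pass positive inputs through unchanged and clamp negatives to zero."""
    return max(0.0, x)


def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(values)  # subtracting the max improves numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), tanh(0.0), relu(-3.0), softmax([1.0, 2.0, 3.0]))
```

Sigmoid, tanh, and ReLU act on one value at a time, whereas softmax normalizes a whole vector of scores, which is why it is normally reserved for the output layer.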

32
MCQeasy

A data scientist needs to predict whether a customer will churn based on historical data containing features like account age, monthly charges, and support tickets. The target variable is binary (churn or not). Which type of machine learning algorithm should be used?

A.Linear regression
B.Logistic regression
C.K-means clustering
D.Principal component analysis
AnswerB

Logistic regression outputs probabilities for binary classification.

Why this answer

Logistic regression is a classification algorithm well-suited for binary outcomes. Linear regression is for continuous outputs, K-means is unsupervised, and PCA is dimensionality reduction.

33
MCQhard

A deep learning model for natural language processing uses a recurrent neural network (RNN) to process long sequences. The gradients vanish after many time steps. Which architectural change is most effective to mitigate this problem?

A.Add dropout regularization
B.Use a larger learning rate
C.Replace the RNN cells with Long Short-Term Memory (LSTM) units
D.Increase the number of hidden layers
AnswerC

LSTM's gating structure preserves gradients over long sequences.

Why this answer

Option C is correct because LSTMs have gating mechanisms that let gradients flow across more time steps, mitigating vanishing gradients. Option D is incorrect because more layers can exacerbate vanishing gradients. Option B is incorrect because a larger learning rate may cause instability.

Option A is incorrect because dropout addresses overfitting, not vanishing gradients.
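
The effect can be sketched with a deliberately simplified scalar model: backpropagation through time multiplies the gradient by one factor per time step, so a factor well below 1 shrinks it exponentially. The factors 0.5 and 0.99 below are assumptions chosen for illustration, standing in for a saturating plain-RNN step and for an LSTM cell-state path whose forget gate stays close to 1.

```python
def gradient_after(steps, per_step_factor):
    """Magnitude of a unit gradient after being scaled by the same
    factor at every time step (a simplified scalar model)."""
    grad = 1.0
    for _ in range(steps):
        grad *= per_step_factor
    return grad


# Assumed, illustrative per-step factors.
rnn_grad = gradient_after(50, 0.5)
lstm_grad = gradient_after(50, 0.99)

print(f"plain RNN path: {rnn_grad:.2e}")
print(f"LSTM-like path: {lstm_grad:.2f}")
```

Real networks use matrices and gates that change at every step, so this shows only the trend, not the exact behavior of either architecture.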

34
Multi-Selecthard

A company is deploying a machine learning model that predicts customer churn. The model currently has high variance. Which THREE actions should the data scientist take to reduce variance? (Select THREE.)

Select 3 answers
A.Reduce model complexity (e.g., fewer features, simpler model).
B.Use regularization.
C.Add more training data.
D.Remove outliers from the training data.
E.Increase model complexity.
AnswersA, B, C

Simpler models have lower variance.

Why this answer

Options A, B, and C are correct. Reducing model complexity (A) (e.g., fewer features, simpler model) directly limits variance. Adding more training data (C) helps the model learn a more general pattern, reducing variance.

Regularization (B) penalizes large weights, controlling model complexity. Option E (increasing complexity) would increase variance. Option D (removing outliers) can sometimes reduce variance but is not a standard or primary technique; it may also reduce bias but is less reliable.

35
MCQmedium

A data scientist is training a deep neural network for sentiment analysis. The training loss decreases steadily but the validation loss starts to increase after 10 epochs. What is the most likely cause and best corrective action?

A.Underfitting; increase model complexity
B.Vanishing gradients; use ReLU activation
C.Data leakage; shuffle data before splitting
D.Overfitting; apply dropout and early stopping
AnswerD

Validation loss increasing while training loss decreases is classic overfitting; dropout regularizes and early stopping halts training.

Why this answer

Option A (Underfitting) would show high training loss, not decreasing training loss. Option B (Vanishing gradients) would cause training loss to stall or plateau. Option C (Data leakage) often shows suspiciously high performance.

Option D correctly identifies overfitting and suggests dropout and early stopping.
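
Early stopping is commonly implemented with a patience counter. The sketch below assumes a hypothetical list of per-epoch validation losses and a patience of 2; both are illustrative choices, not values from the question.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-based. Training stops once
    validation loss has failed to improve for `patience` epochs in a row."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical validation losses, invented for illustration.
val_losses = [0.80, 0.62, 0.55, 0.51, 0.53, 0.57, 0.64]
stop_epoch, best_epoch = early_stopping_epoch(val_losses)
print(f"stop at epoch {stop_epoch}, restore weights from epoch {best_epoch}")
```

In practice the weights from the best epoch are saved and restored, so the extra `patience` epochs cost training time but not model quality.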

36
Multi-Selectmedium

Which TWO techniques are commonly used to prevent overfitting in deep neural networks?

Select 2 answers
A.Using a larger learning rate
B.Dropout
C.L1 regularization
D.Early stopping
E.Increasing the number of layers
AnswersB, D

Dropout randomly drops neurons during training, reducing overfitting.

Why this answer

Dropout is a regularization technique that randomly drops a fraction of neurons during training, which prevents the network from relying too heavily on any single neuron and forces it to learn more robust features. This reduces overfitting by introducing noise and effectively training an ensemble of sub-networks. Early stopping halts training once validation performance stops improving, before the network begins to memorize the training data.

Exam trap

CompTIA often tests the distinction between regularization techniques that reduce overfitting (like dropout and early stopping) and changes that do not, such as optimizer settings (a larger learning rate) or added model capacity (more layers), which candidates mistakenly think help with overfitting.

37
MCQeasy

Refer to the exhibit. The training log shows losses and accuracies over 5 epochs. What is the most likely problem?

A.Data leakage
B.Overfitting
C.Underfitting
D.Vanishing gradient
AnswerB

Overfitting is indicated by decreasing training loss and increasing validation loss.

Why this answer

Option B is correct because training loss decreases while validation loss increases, a classic sign of overfitting. Option C is incorrect because underfitting would show high losses on both sets. Option D is incorrect because vanishing gradient affects training loss progression, not divergence.

Option A is incorrect because data leakage typically causes both sets to perform well.

38
MCQhard

An e-commerce company deploys a deep learning model for product recommendation. After a new data pipeline is implemented, the model's online performance degrades: recall drops by 20% and the click-through rate decreases. The data scientists suspect data drift. They compare the distribution of the input features between the training data and recent production data. The Kolmogorov-Smirnov test shows significant differences for two numerical features (price and rating). The team also notices that the frequency of categorical feature 'category' has changed. Which of the following is the MOST appropriate first step?

A.Implement a monitoring dashboard to track drift over time and set up alerts.
B.Roll back to the previous data pipeline and investigate the root cause of drift.
C.Use feature selection to remove the drifting features and retrain.
D.Immediately retrain the model on all available data including new production data.
AnswerB

Rolling back restores the previous stable distribution; investigating the root cause prevents recurrence.

Why this answer

Option B is correct. Since the drift occurred after a pipeline change, rolling back and investigating the root cause is the most prudent first step before making model changes. Retraining on drifted data (D) might incorporate a faulty distribution.

Removing drifting features (C) could lose important information and may not fully address the issue. Implementing monitoring (A) is useful in the long term but does not address the immediate degradation.
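
The drift check mentioned in the scenario can be sketched as follows. This minimal version computes only the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs), not a p-value, and the price samples are hypothetical. A library routine such as `scipy.stats.ks_2samp` would normally be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical price samples, invented for illustration.
training_prices = [10, 12, 15, 18, 20, 22, 25]
production_prices = [30, 32, 35, 38, 40, 42, 45]

print("KS statistic:", ks_statistic(training_prices, production_prices))
```

A statistic near 0 means the two samples look alike, while a value near 1 means they barely overlap, as would happen if a pipeline change altered the units or encoding of a feature.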

39
MCQmedium

Refer to the exhibit. What is the most likely issue and what action should be taken?

A.Learning rate is too low; increase it
B.Underfitting; increase model complexity
C.Overfitting; apply early stopping around epoch 15
D.Data imbalance; use class weights
AnswerC

Validation loss starts rising after epoch 15; early stopping halts training at that point.

Why this answer

The training loss continues to decrease while validation loss increases after epoch 15, indicating overfitting. Early stopping around epoch 15 would prevent this.

40
Multi-Selecthard

Which TWO are key differences between Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)?

Select 2 answers
A.CNNs are designed for sequential data; RNNs for spatial data
B.RNNs have internal memory; CNNs do not
C.CNNs can handle variable-length inputs; RNNs require fixed-size inputs
D.CNNs use backpropagation; RNNs do not
E.CNNs use weight sharing across spatial dimensions; RNNs share weights across time steps
AnswersB, E

RNNs maintain a hidden state for temporal memory; CNNs are feedforward.

Why this answer

CNNs share weights across spatial dimensions via convolution filters, while RNNs share weights across time steps. RNNs have internal memory (hidden state) that captures temporal dependencies; CNNs lack inherent memory.

41
MCQeasy

A data scientist is training a neural network to classify images of handwritten digits. The model achieves 99% accuracy on training data but only 85% on validation data. Which technique should the scientist apply first to address this issue?

A.Remove one or more hidden layers from the network
B.Increase the number of training epochs
C.Apply L2 regularization to the network weights
D.Add more features to the input data
AnswerC

L2 regularization penalizes large weights and reduces overfitting.

Why this answer

The model shows high training accuracy (99%) but lower validation accuracy (85%), which is a classic sign of overfitting. L2 regularization (option C) adds a penalty term to the loss function proportional to the squared magnitude of the weights, discouraging the network from learning overly complex patterns that do not generalize. This directly addresses overfitting without reducing the model's capacity too aggressively.

Exam trap

CompTIA often tests the distinction between overfitting and underfitting, and the trap here is that candidates may confuse increasing epochs (option B) as a solution to low validation accuracy, when in fact it exacerbates overfitting in this scenario.

How to eliminate wrong answers

Option A is wrong because removing hidden layers reduces the model's capacity, which may underfit and does not specifically target the overfitting problem; the network already has sufficient capacity to memorize the training data. Option B is wrong because increasing the number of training epochs would likely worsen overfitting by allowing the model to further memorize noise in the training data, not improve validation performance. Option D is wrong because adding more features to the input data (e.g., additional pixel-level transformations) would increase the dimensionality and risk of overfitting, not reduce it, and is not a standard technique for addressing overfitting in neural networks.
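
The penalty term described above can be written out directly. In this minimal sketch, `lam` (the regularization strength) and the example weights are assumed values chosen for illustration.

```python
def l2_penalized_loss(data_loss, weights, lam):
    """Data loss plus lam times the sum of squared weights."""
    return data_loss + lam * sum(w * w for w in weights)


def l2_gradient_term(weights, lam):
    """Extra gradient contributed by the penalty: 2 * lam * w per weight."""
    return [2.0 * lam * w for w in weights]


# Assumed example values.
weights = [3.0, -4.0]
print(l2_penalized_loss(0.30, weights, lam=0.01))
print(l2_gradient_term(weights, lam=0.01))
```

Because the extra gradient is proportional to each weight, large weights are pulled toward zero more strongly than small ones, which is why L2 regularization is also called weight decay.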

42
MCQhard

A company deploys a deep learning model for real-time object detection in autonomous vehicles. The model was trained on high-end GPUs but needs to run on edge devices with limited computational resources. Which technique is most effective for reducing model size and inference latency while maintaining acceptable accuracy?

A.Hyperparameter tuning
B.Batch normalization
C.Dropout
D.Quantization
AnswerD

Quantization reduces numerical precision, shrinking model size and improving inference speed.

Why this answer

Quantization reduces the precision of model weights (e.g., from 32-bit to 8-bit), significantly decreasing model size and speeding up inference with minimal accuracy loss.
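
One simple scheme, sketched below, is symmetric linear quantization: every weight is divided by a shared scale and rounded to an integer in [-127, 127]. The function names and the example weights are hypothetical. Production toolchains use more elaborate variants such as per-channel scales and calibration data.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127].
    Returns the integer codes and the scale needed to decode them."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, invented for illustration.
weights = [0.82, -1.27, 0.05, 0.40]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes, scale)
print(restored)
```

Each weight now needs 8 bits instead of 32, and the rounding error per weight is at most half the scale, which is the source of the small accuracy loss.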

43
MCQmedium

Refer to the exhibit. A data scientist is training a neural network and observes the training log above. What is the most likely cause?

A.The model is overfitting
B.The model is underfitting
C.The batch size is too large
D.The learning rate is too high
AnswerD

High learning rate causes the optimizer to overshoot minima, leading to divergence.

Why this answer

The loss is increasing and accuracy decreasing, indicating divergence, which is typically caused by a learning rate that is too high.
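
Divergence from an oversized learning rate is easy to reproduce on a toy problem. The sketch below runs gradient descent on f(x) = x^2 with two assumed learning rates; the function and the values 0.1 and 1.1 are illustrative choices, not taken from the exhibit.

```python
def gradient_descent(lr, steps=10, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2x) and return the loss after each step."""
    x = x0
    losses = []
    for _ in range(steps):
        x -= lr * 2.0 * x
        losses.append(x * x)
    return losses


stable = gradient_descent(lr=0.1)      # each step multiplies x by 0.8
diverging = gradient_descent(lr=1.1)   # each step multiplies x by -1.2

print("lr=0.1 final loss:", stable[-1])
print("lr=1.1 final loss:", diverging[-1])
```

With the larger rate every update overshoots the minimum by more than the previous distance to it, so the loss grows at each step instead of shrinking.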

44
Multi-Selectmedium

Which THREE are common activation functions used in neural networks? (Choose THREE.)

Select 3 answers
A.ReLU
B.Softmax
C.Sigmoid
D.Linear
E.Tanh
AnswersA, C, E

Rectified Linear Unit is widely used in hidden layers.

Why this answer

ReLU (Rectified Linear Unit) is a common activation function in neural networks because it introduces non-linearity while being computationally efficient. It outputs the input directly if positive, otherwise zero, which helps mitigate the vanishing gradient problem compared to sigmoid or tanh. This makes it a default choice for hidden layers in many deep learning architectures.

Exam trap

CompTIA often tests the distinction between activation functions used in hidden layers versus output layers, so candidates mistakenly select Softmax as a general activation function when it is only appropriate for the final layer in classification tasks.

45
Multi-Selecteasy

Which TWO are evaluation metrics for classification problems? (Choose two.)

Select 2 answers
A.Precision
B.Mean Absolute Error
C.R-squared
D.Mean Squared Error
E.Recall
AnswersA, E

Correct: Precision is a classification metric.

Why this answer

Options A and E are correct because Precision and Recall are classification metrics. Options B, C, and D are incorrect: Mean Absolute Error and Mean Squared Error are regression metrics, and R-squared is also for regression.
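
Both classification metrics are simple ratios of confusion-matrix counts. The counts in this minimal sketch (40 true positives, 10 false positives, 20 false negatives) are hypothetical.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical confusion-matrix counts.
tp, fp, fn = 40, 10, 20
print(f"precision={precision(tp, fp):.2f} recall={recall(tp, fn):.2f}")
```

The regression metrics in the other options measure numeric distance between predictions and targets, so they have no notion of true or false positives.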

46
MCQeasy

A machine learning engineer needs to choose an algorithm for grouping customers into segments based on purchasing behavior without any labels. Which algorithm should the engineer use?

A.K-means clustering
B.Random forest classifier
C.Linear regression
D.Support vector machine
AnswerA

K-means is unsupervised and groups data based on feature similarity.

Why this answer

K-means clustering is an unsupervised algorithm that partitions data into K clusters based on similarity.

47
MCQhard

A financial institution is developing a fraud detection model using historical transaction data. The dataset contains over 10 million records, but only 0.01% of transactions are fraudulent. The current model uses a neural network trained with standard cross-entropy loss, and the team applies random undersampling of the majority class to create a balanced training set. However, the model still produces a high number of false positives (legitimate transactions flagged as fraud) and misses approximately 30% of actual fraud cases. The business requires that at least 95% of frauds be caught, and the false positive rate must be below 1% to avoid overwhelming fraud analysts. The team has limited resources to collect additional data and cannot change the model architecture significantly. Which approach should the team take to best meet the business requirements?

A.Use cost-sensitive learning by assigning a higher misclassification cost to the fraud class.
B.Apply feature selection to remove noisy predictors and then retrain the current model.
C.Switch to an anomaly detection algorithm such as Isolation Forest or One-Class SVM.
D.Collect more transaction data, especially fraudulent examples, to naturally balance the classes.
AnswerA

This directly penalizes false negatives more, encouraging the model to catch more frauds while maintaining a low false positive rate through tuning.

Why this answer

Cost-sensitive learning adjusts the loss function to penalize false negatives more heavily, directly addressing the need to catch more frauds while controlling false positives. Collecting more data is impractical and may not resolve the imbalance. Anomaly detection models treat fraud as outliers but often have high false positive rates in this context.

Feature selection does not inherently solve the imbalance or performance metric trade-off.

48
MCQeasy

A team is deploying a deep learning model for real-time image classification on edge devices with limited computational resources. Which technique would best help reduce model size and inference time without significant accuracy loss?

A.Data augmentation
B.Model pruning and quantization
C.Transfer learning
D.Ensemble learning
AnswerB

Pruning removes redundant weights and quantization reduces precision, decreasing model size and speeding up inference.

Why this answer

Option A (Data augmentation) improves generalization but does not reduce model size. Option C (Transfer learning) can reduce training time but not necessarily inference time or model size. Option D (Ensemble learning) increases both size and inference time.

Option B (Model pruning and quantization) directly reduces model size and speeds up inference.
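
Pruning can take several forms. The sketch below shows one simple variant, unstructured magnitude pruning, which zeroes the smallest-magnitude weights. The function name, the 50% sparsity target, and the weights are assumptions for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    pruned, removed = [], 0
    for w in weights:
        # The counter keeps ties at the threshold from pruning too many weights.
        if abs(w) <= threshold and removed < n_prune:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


# Hypothetical weights, invented for illustration.
print(magnitude_prune([0.5, -0.01, 0.2, 0.03], sparsity=0.5))
```

Zeroed weights only save memory and time when the storage format or hardware exploits sparsity, which is one reason pruning is usually paired with quantization on edge devices.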

49
Multi-Selecteasy

Which TWO techniques are commonly used for feature scaling? (Choose two.)

Select 2 answers
A.Standardization
B.One-hot encoding
C.Min-Max scaling
D.Normalization
E.PCA
AnswersA, C

Correct: Centers features to mean 0 and standard deviation 1.

Why this answer

Options A and C are correct because standardization and Min-Max scaling are standard feature scaling methods. Options B, D, and E are incorrect: one-hot encoding is for categorical variables, PCA is dimensionality reduction, and normalization is often used as a synonym for scaling in general, so here it is less specific than the named methods.
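
The two scaling methods differ only in which statistics they use. This minimal sketch assumes the input values are not all identical (otherwise both formulas divide by zero); the sample values are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature values.
values = [10.0, 20.0, 30.0]
print(standardize(values))
print(min_max_scale(values))
```

Standardization is less sensitive to a single extreme value than Min-Max scaling, because one outlier sets the whole range in the latter.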

50
MCQmedium

A team is training a convolutional neural network (CNN) for medical image diagnosis. They have a limited dataset of 500 labeled images. Which strategy is most effective to improve model generalization?

A.Increasing network depth
B.Data augmentation
C.Using a larger batch size
D.Reducing the number of filters
AnswerB

Augmentation (e.g., rotation, flip) generates more training examples, improving generalization.

Why this answer

Data augmentation artificially increases the size and diversity of the training set by applying transformations, reducing overfitting.
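
Two of the simplest transformations can be sketched on an image stored as a nested list of pixel values. The tiny 2x2 "image" is hypothetical, and whether a given flip or rotation preserves the diagnostic label is an assumption that has to be checked for each medical imaging task.

```python
def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image plus two transformed copies."""
    return [image, horizontal_flip(image), rotate_90_clockwise(image)]


# Hypothetical 2x2 grayscale image.
image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
```

Each labeled image now yields three training examples, and random combinations of such transformations multiply the effective dataset size further.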

51
MCQmedium

A machine learning engineer is building a spam filter. The dataset contains 10,000 emails, of which 1,000 are spam. The engineer decides to use a Random Forest classifier. Which preprocessing step is most critical to ensure the model generalizes well to new, unseen emails?

A.Apply Principal Component Analysis (PCA) to reduce dimensionality
B.Normalize the numerical features to have zero mean and unit variance
C.Split the data into training and testing sets before any other preprocessing
D.Encode all features using one-hot encoding
AnswerC

Splitting first prevents data leakage and ensures realistic evaluation.

Why this answer

Option C is correct because splitting the data into training and testing sets before any other preprocessing prevents data leakage. If preprocessing like normalization or PCA is applied to the entire dataset first, the test set information influences the training process, leading to overly optimistic performance estimates and poor generalization to new, unseen emails.

Exam trap

CompTIA often tests the concept of data leakage by presenting preprocessing steps that seem harmless but actually incorporate test set information, tricking candidates into thinking scaling or dimensionality reduction is always necessary for tree-based models.

How to eliminate wrong answers

Option A is wrong because PCA is an unsupervised dimensionality reduction technique that, if applied before splitting, would leak information from the test set into the training set, and Random Forest is robust to high-dimensional sparse data, making PCA unnecessary for generalization. Option B is wrong because Random Forest is a tree-based ensemble method that is invariant to monotonic transformations and does not require feature scaling; normalizing before splitting would also risk data leakage if done on the full dataset. Option D is wrong because one-hot encoding is only relevant for categorical features, and applying it before splitting could introduce data leakage if the encoding uses levels present only in the test set; moreover, not all features in an email dataset are categorical, and Random Forest can handle label encoding without one-hot encoding.
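
The leakage-free ordering can be sketched as: split first, fit any preprocessing statistics on the training portion only, then apply them to both portions. Scaling is used here purely as an example of a fitted preprocessing step (a Random Forest would not need it), and the helper names and data are hypothetical.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off the last part as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_scaler(train_values):
    """Learn the mean and standard deviation from training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def apply_scaler(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]


# Hypothetical single-feature dataset.
values = [float(v) for v in range(1, 11)]
train, test = train_test_split(values)
mean, std = fit_scaler(train)            # the test set is never seen here
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)
print(len(train_scaled), len(test_scaled))
```

Fitting the scaler on all ten values instead would let the test rows influence the mean and standard deviation, which is exactly the leakage the question warns about.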

52
Multi-Selecthard

Which TWO are valid techniques to reduce overfitting in a deep neural network? (Choose TWO.)

Select 2 answers
A.Increase batch size
B.Increase learning rate
C.L2 regularization
D.Gradient clipping
E.Dropout
AnswersC, E

L2 regularization adds a penalty for large weights, discouraging complex models.

Why this answer

L2 regularization (option C) is a valid technique to reduce overfitting by adding a penalty term proportional to the square of the weight magnitudes to the loss function. This discourages the network from learning overly complex patterns, effectively shrinking weights and improving generalization. Dropout (option E) randomly drops a fraction of neurons during training, which prevents co-adaptation of features and forces the network to learn more robust representations, also reducing overfitting.

Exam trap

CompTIA often tests the distinction between techniques that improve training stability (like gradient clipping or adjusting batch size/learning rate) versus those that directly regularize the model to reduce overfitting (like L2 regularization and dropout), leading candidates to confuse optimization tricks with regularization methods.
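
Dropout itself is only a few lines. The sketch below uses the common "inverted dropout" formulation, in which surviving activations are scaled up during training so that nothing needs to change at inference time. The rate of 0.5 and the input activations are assumed values.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`
    during training and scale the survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]  # hypothetical activations
print(dropout(hidden, rate=0.5, rng=rng))
print(dropout(hidden, rate=0.5, rng=rng, training=False))
```

Gradient clipping, by contrast, only rescales gradients that are too large and never removes units, which is why it stabilizes training without regularizing the model.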

53
MCQeasy

A team is building a recommendation system using collaborative filtering. They have a sparse user-item matrix. Which technique should they use to handle the sparsity and improve recommendations?

A.Association rule mining
B.Matrix factorization
C.k-nearest neighbors
D.Content-based filtering
AnswerB

Matrix factorization reduces dimensionality and captures latent features, effectively handling sparsity.

Why this answer

Matrix factorization (B) is the correct technique because it decomposes the sparse user-item matrix into lower-dimensional latent factor matrices, effectively capturing underlying patterns and filling in missing entries. This directly addresses sparsity by learning dense representations that generalize beyond observed interactions, which is a core strength in collaborative filtering for recommendation systems.

Exam trap

CompTIA often tests the misconception that k-nearest neighbors (k-NN) is the go-to for collaborative filtering, but candidates fail to recognize that k-NN's performance collapses under high sparsity, whereas matrix factorization explicitly models latent factors to overcome this.

How to eliminate wrong answers

Option A is wrong because association rule mining (e.g., Apriori algorithm) is designed for market basket analysis to find frequent itemsets and rules, not for handling sparse user-item matrices in collaborative filtering; it fails to generalize from sparse data and does not model latent factors. Option C is wrong because k-nearest neighbors (k-NN) is a memory-based collaborative filtering method that relies on direct similarity computations between users or items, which degrades severely with high sparsity due to lack of overlapping ratings, leading to poor recommendations. Option D is wrong because content-based filtering uses item features (e.g., genre, keywords) to recommend similar items, not the user-item interaction matrix; it does not address sparsity in collaborative filtering and ignores collaborative signals from other users.
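
A minimal sketch of matrix factorization trained with stochastic gradient descent is shown below. It learns only from the observed (user, item, rating) triples, which is how it copes with sparsity. The ratings, the latent dimension `k`, and the hyperparameters are all assumptions chosen for illustration.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02,
              epochs=500, seed=0):
    """Learn user factors P and item factors Q from observed ratings only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def mse(ratings, P, Q):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


# Hypothetical sparse data: 6 of 9 possible (user, item) pairs are observed.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P0, Q0 = factorize(ratings, 3, 3, epochs=0)   # untrained baseline
P, Q = factorize(ratings, 3, 3)
print("MSE before:", mse(ratings, P0, Q0), "after:", mse(ratings, P, Q))
print("prediction for unseen pair (0, 2):", predict(P, Q, 0, 2))
```

The last line fills in a missing entry of the matrix, which a neighborhood method could only do if users had enough overlapping ratings.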

54
Multi-Selecteasy

Which TWO of the following are common activation functions used in deep neural networks?

Select 2 answers
A.Linear Regression
B.Support Vector Machine
C.K-means
D.ReLU
E.Sigmoid
AnswersD, E

ReLU is the most common activation for hidden layers.

Why this answer

Sigmoid and ReLU are widely used activation functions. Support Vector Machine is a classifier, not an activation. K-means is a clustering algorithm.

Linear regression is a model, not an activation function.

55
MCQeasy

A machine learning engineer has a dataset of 100,000 records. She splits it into 70% training, 15% validation, and 15% test sets. After training, the model achieves 95% accuracy on training and 85% on validation. What does the accuracy difference most likely indicate?

A.The validation set is too small
B.The model generalizes well
C.The model is overfitting
D.The test set should be larger
AnswerC

Overfitting explains high training accuracy and lower validation accuracy.

Why this answer

A wide gap between training and validation accuracy is a classic sign of overfitting, where the model memorizes training data but fails to generalize.

56
MCQeasy

A data analyst wants to predict housing prices based on square footage, number of bedrooms, and location. Which machine learning approach is most suitable?

A.K-means clustering
B.Decision tree regression
C.Association rule mining
D.Linear regression
AnswerD

Linear regression models the linear relationship between input features and a continuous output.

Why this answer

Linear regression is a simple and interpretable model for predicting a continuous target variable like housing price.

57
MCQhard

A deep learning model for image classification is overfitting the training data. The team has already tried data augmentation and dropout. Which additional technique should they implement to reduce overfitting?

A.Batch normalization
B.Increase number of epochs
C.Gradient clipping
D.Early stopping
AnswerD

Early stopping monitors validation loss and stops training when it starts to increase, reducing overfitting.

Why this answer

Early stopping (Option D) is the correct additional technique because it halts training when validation performance stops improving, directly preventing the model from memorizing noise in the training data. Since data augmentation and dropout are already in use, early stopping provides a complementary regularization effect by limiting the number of training iterations before overfitting occurs.

Exam trap

CompTIA often tests the distinction between techniques that address overfitting versus those that solve optimization issues, leading candidates to confuse batch normalization or gradient clipping as overfitting solutions when they are not.

How to eliminate wrong answers

Option A is wrong because batch normalization primarily accelerates training and stabilizes learning by normalizing layer inputs; it can have a slight regularizing effect, but it is not a primary overfitting countermeasure. Option B is wrong because increasing the number of epochs would exacerbate overfitting by giving the model more opportunities to memorize training data, making the problem worse. Option C is wrong because gradient clipping is used to prevent exploding gradients in deep networks, especially in RNNs, and does not address overfitting from excessive model capacity or insufficient regularization.

58
MCQhard

A healthcare startup is developing a deep learning model to detect diabetic retinopathy from retinal fundus images. The dataset contains 50,000 images, but only 5% are labeled as positive for the disease. The team uses a convolutional neural network (CNN) with a final sigmoid layer and binary cross-entropy loss. After training for 20 epochs, the model achieves 95% accuracy on the test set, but the recall for the positive class is only 10%. The team suspects the model is biased toward the negative class due to class imbalance. The data is stored in a secure environment, and no additional labeled data can be obtained. The team has access to the following techniques: oversampling the minority class, undersampling the majority class, using class weights in the loss function, applying data augmentation, and using a different architecture. Which course of action is most likely to improve recall for the positive class while maintaining reasonable overall performance?

A.Undersample the majority class to balance the dataset
B.Oversample the minority class using synthetic image generation
C.Assign higher class weights to the positive class in the loss function
D.Replace the CNN with a transformer-based architecture
AnswerC

Class weights force the model to focus on the minority class, improving recall.

Why this answer

Assigning higher class weights to the positive class in the loss function directly penalizes misclassifications of the minority class during training. This forces the model to pay more attention to positive samples without altering the dataset distribution, which is critical when no additional labeled data can be obtained and the data is in a secure environment. It improves recall by increasing the gradient contribution from positive samples, while maintaining overall performance because the model still sees the original data distribution.

Exam trap

The trap here is that candidates often choose oversampling (Option B) as the default solution for class imbalance, but fail to recognize that synthetic image generation for medical images can introduce unrealistic patterns and is not a standard or safe technique, whereas class weights are a lightweight, data-preserving approach that directly addresses the loss function.

How to eliminate wrong answers

Option A is wrong because undersampling the majority class discards a large number of negative samples, which can lead to loss of valuable information and degrade overall accuracy, especially with a 95% negative class. Option B is wrong because oversampling the minority class using synthetic image generation (e.g., SMOTE) is not directly applicable to high-dimensional image data without careful adaptation, and it may introduce unrealistic artifacts that harm generalization; the question specifies 'synthetic image generation' which is not a standard or safe approach for retinal fundus images. Option D is wrong because replacing the CNN with a transformer-based architecture does not address the class imbalance problem; transformers are not inherently better at handling imbalanced data and would require more data and computational resources, which are not available here.
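
Class weighting can be sketched with the scenario's own numbers: 50,000 images at 5% positive means 2,500 positive and 47,500 negative examples. The weighting formula below (total / (2 * class count)) is one common "balanced" heuristic and is an assumption here, as are the function names.

```python
import math


def balanced_class_weights(n_negative, n_positive):
    """Weight each class inversely to its frequency."""
    total = n_negative + n_positive
    return {0: total / (2 * n_negative), 1: total / (2 * n_positive)}


def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(y_true)


weights = balanced_class_weights(n_negative=47_500, n_positive=2_500)
print(weights)

# The same size of error costs far more on a positive than on a negative.
print(weighted_bce([1], [0.1], weights), weighted_bce([0], [0.9], weights))
```

Every original image is still used for training; only the size of each sample's contribution to the gradient changes.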

59
MCQeasy

A company wants to deploy a machine learning model that requires continuous learning as new data arrives. The model must be able to adapt to changing patterns without retraining from scratch. Which approach should be used?

A.Transfer learning
B.Online learning
C.Batch learning
D.Unsupervised learning
AnswerB

Online learning updates the model incrementally, allowing adaptation to new data without full retraining.

Why this answer

Online learning (also called incremental learning) updates the model incrementally as each new data point arrives, without requiring full retraining. This makes it ideal for scenarios where data arrives continuously and patterns shift over time, as the model can adapt its parameters on the fly.

Exam trap

CompTIA often tests the distinction between training paradigms (online vs. batch) and other ML concepts like transfer learning or unsupervised learning, so candidates may confuse 'continuous learning' with 'transfer learning' or incorrectly assume that any learning method can handle streaming data.

How to eliminate wrong answers

Option A is wrong because transfer learning reuses a pre-trained model on a new but related task, but it does not inherently support continuous adaptation to streaming data—it typically requires a separate fine-tuning phase. Option C is wrong because batch learning trains the model on the entire dataset at once and requires retraining from scratch when new data arrives, making it unsuitable for continuous learning. Option D is wrong because unsupervised learning is a paradigm for finding patterns in unlabeled data, not a deployment strategy for handling streaming data or model updates.

60
MCQeasy

Based on the exhibit, what does this indicate about the model?

A.The model has balanced performance
B.The model is underfitting
C.The model is overfitting
D.The model has high precision but low recall, missing many positives
AnswerD

Correct: Precision is high, recall is low, indicating the model is conservative in labeling positives.

Why this answer

Option D is correct because high precision (0.95) means few false positives, but low recall (0.60) means many false negatives, so the model misses many positive instances. Options A, B, and C are incorrect: balanced performance would show similar precision and recall values, underfitting would likely show both low precision and low recall, and overfitting cannot be determined from these metrics alone.
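To make the two metrics concrete, the sketch below computes them from confusion-matrix counts. The counts are hypothetical, chosen only so that they reproduce the 0.95 precision and 0.60 recall quoted in the explanation.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts: 57 positives found, 3 false alarms, 38 positives missed.
precision, recall = precision_recall(
    true_positives=57, false_positives=3, false_negatives=38
)
```

Here 57 of the 60 predicted positives are right, but 38 of the 95 actual positives are missed.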

61
MCQmedium

A healthcare organization wants to use patient data to predict disease risk. They are concerned about bias in the model. Which step is most critical during the data preparation phase to mitigate bias?

A.Applying SMOTE to oversample minority classes
B.Using a more complex algorithm
C.Removing all demographic features
D.Ensuring the training data is representative of the target population
AnswerD

Representative data prevents bias from skewed sampling.

Why this answer

Option D is correct because ensuring the training data is representative of the target population is fundamental to avoiding bias. Option A is incorrect because SMOTE addresses class imbalance, not bias from non-representative sampling. Option B is incorrect because model complexity does not directly address bias.

Option C is incorrect because removing demographic features may not eliminate bias if proxy variables remain.

62
MCQmedium

A machine learning engineer is tuning a neural network for image classification. The training loss decreases steadily, but the validation loss starts increasing after 50 epochs. Which action best addresses this issue?

A.Increase the number of hidden layers
B.Add more training data
C.Apply early stopping with a patience of 10 epochs
D.Increase the batch size
AnswerC

Early stopping monitors validation loss and stops training when it starts increasing, directly addressing overfitting.

Why this answer

Early stopping halts training when validation performance degrades, preventing overfitting.
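The patience rule can be sketched as a small function over a list of per-epoch validation losses. The loss curve below is made up to mimic the scenario (improving for 50 epochs, then rising), and the function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which training stops, or None if never."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical curve: validation loss falls for 50 epochs, then climbs.
val_losses = [(100 - e) / 100 for e in range(50)] + [(52 + e) / 100 for e in range(30)]
stop_epoch = early_stopping_epoch(val_losses, patience=10)
```

In practice the weights from the best epoch (epoch 50 in this toy curve) are usually restored after stopping.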

63
Multi-Selecteasy

Which TWO are characteristics of supervised learning?

Select 2 answers
A.Does not require target variable
B.Requires labeled data
C.Uses reinforcement signals
D.Learns to cluster data
E.Predicts continuous or categorical output
AnswersB, E

Supervised learning uses input-output pairs for training.

Why this answer

Supervised learning requires labeled data and predicts either continuous (regression) or categorical (classification) outputs.

64
MCQhard

The exhibit shows a model configuration for a classification task with 10 classes. What is wrong with this setup?

A.The loss function should be categorical crossentropy, not mean squared error
B.The metric should be precision, not accuracy
C.The activation should be sigmoid in hidden layers
D.The optimizer should be SGD, not Adam
AnswerA

Correct: MSE is for regression; classification requires crossentropy loss.

Why this answer

Option A is correct because the loss function should be categorical crossentropy for multi-class classification with softmax output. Options B, C, and D are incorrect: accuracy is an appropriate metric, ReLU activations are fine for hidden layers, and the Adam optimizer is appropriate.
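The difference between the two losses can be seen on a single 10-class prediction. This is a sketch under assumed logits, not the configuration from the exhibit: it compares categorical crossentropy with mean squared error for a confidently wrong softmax output.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


def mean_squared_error(one_hot, probs):
    return sum((t - p) ** 2 for t, p in zip(one_hot, probs)) / len(probs)


# Hypothetical example: true class is 0, but the model is confident in class 3.
target = [1.0] + [0.0] * 9
wrong_logits = [0.0] * 10
wrong_logits[3] = 10.0
wrong_probs = softmax(wrong_logits)

ce_wrong = categorical_crossentropy(target, wrong_probs)
mse_wrong = mean_squared_error(target, wrong_probs)
ce_uniform = categorical_crossentropy(target, softmax([0.0] * 10))
```

Crossentropy grows without bound as the probability of the true class approaches zero, while MSE against a one-hot target over 10 classes can never exceed 0.2, so it penalizes confident mistakes much more weakly.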

65
MCQmedium

An AI engineer is training a deep neural network for image recognition. The training loss decreases steadily for the first few epochs but then plateaus and starts to oscillate. Which adjustment is most likely to improve convergence?

A.Add more layers
B.Increase the learning rate
C.Increase the batch size
D.Reduce the learning rate
AnswerD

A lower learning rate can smooth convergence and reduce oscillation.

Why this answer

Option D is correct because oscillating loss often indicates a learning rate that is too high; reducing it stabilizes training. Option B is incorrect because increasing the learning rate would worsen oscillation. Option C is incorrect because increasing the batch size can help but does not primarily address oscillation.

Option A is incorrect because adding more layers could increase complexity and overfitting.

66
MCQhard

An AI developer observes that the training accuracy of a neural network is high, but the test accuracy is low. The model uses a ReLU activation function and Adam optimizer. Which approach is most likely to improve test accuracy?

A.Increase the learning rate
B.Add L2 regularization to the loss function
C.Switch to a stochastic gradient descent optimizer
D.Increase the number of epochs
AnswerB

L2 regularization penalizes large weights, preventing overfitting.

Why this answer

L2 regularization adds a penalty on large weights, reducing overfitting and improving test accuracy.
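A minimal sketch of the penalty and its effect on a gradient step, assuming the penalty is written as lambda times the sum of squared weights (some libraries use a factor of one half instead). The function names and numbers are hypothetical.

```python
def l2_penalty(weights, lam):
    """Penalty term added to the data loss: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


def sgd_step_with_l2(weights, data_grads, learning_rate, lam):
    """One gradient step on data_loss + l2_penalty.

    The penalty contributes 2 * lam * w to each weight's gradient,
    which pulls every weight toward zero.
    """
    return [
        w - learning_rate * (g + 2.0 * lam * w)
        for w, g in zip(weights, data_grads)
    ]


# Hypothetical weights; data gradients set to zero to isolate the shrinkage.
weights = [3.0, -4.0]
penalty = l2_penalty(weights, lam=0.01)
shrunk = sgd_step_with_l2(weights, data_grads=[0.0, 0.0], learning_rate=0.1, lam=0.01)
```

Even with no gradient from the data, each weight shrinks by a constant factor per step, which is why the technique is also described as weight decay.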

67
Multi-Selectmedium

Which THREE techniques can help reduce overfitting in neural networks?

Select 3 answers
A.Increasing training data size
B.L2 regularization
C.Using a larger learning rate
D.Dropout
E.Increasing number of layers
AnswersA, B, D

More data helps the model generalize better.

Why this answer

Dropout randomly drops neurons, L2 regularization penalizes large weights, and increasing data size reduces overfitting by providing more examples.

68
MCQhard

Refer to the exhibit. An AI specialist reviews the model evaluation report for a binary classifier. The specialist wants to improve recall. Which action is most likely effective?

A.Decrease the classification threshold
B.Collect more training data for the minority class
C.Increase the classification threshold
D.Add more features
AnswerB

More minority data provides the model with more patterns, often improving recall.

Why this answer

Collecting more data for the minority class (the class with lower recall) helps the model learn better representations, often improving recall.

69
MCQmedium

A team trains a deep learning model for image classification with 1000 classes. The training loss decreases but validation loss starts increasing after 10 epochs. What should they do first?

A.Use data augmentation
B.Increase batch size
C.Reduce learning rate
D.Add dropout layers
AnswerD

Correct: Dropout is a regularization technique that helps prevent overfitting.

Why this answer

Option D is correct because adding dropout layers can help reduce overfitting by randomly dropping neurons during training. Options A, B, and C are incorrect: data augmentation mainly helps when the dataset is small, increasing the batch size may improve stability but does not necessarily reduce overfitting, and reducing the learning rate may help but does not address overfitting as directly.
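The random dropping can be sketched as "inverted" dropout, the variant that rescales surviving activations during training so that nothing changes at inference time. This is an illustrative assumption about the variant; the function name and inputs are hypothetical.

```python
import random


def dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` during training.

    Survivors are scaled by 1 / (1 - rate) so the expected value is unchanged.
    At inference time the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the sketch repeatable
activations = [1.0] * 1000
train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)
```

Because a different random subset is dropped on each training step, no single neuron can be relied on, which is the co-adaptation effect the explanation refers to.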

70
MCQmedium

During training of a neural network, the loss oscillates and does not converge smoothly. The learning rate is set to 0.1. What is the most likely cause and what adjustment should be made?

A.Learning rate too low; increase it
B.Batch size too small; increase it
C.Learning rate too high; decrease it
D.Too many epochs; stop early
AnswerC

High learning rate causes divergence and oscillations.

Why this answer

A learning rate that is too high causes the optimizer to overshoot minima, leading to oscillations. Reducing the learning rate stabilizes training.
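The overshoot can be reproduced on a one-dimensional toy loss. The sketch assumes a quadratic loss 0.5 * c * w^2 with a hypothetical curvature c = 19, chosen so that a learning rate of 0.1 overshoots the minimum on every step; real networks have no single curvature, so this is only an illustration.

```python
def gradient_descent_path(learning_rate, curvature=19.0, start=1.0, steps=10):
    """Run gradient descent on loss(w) = 0.5 * curvature * w**2."""
    w = start
    path = [w]
    for _ in range(steps):
        grad = curvature * w  # derivative of the toy loss
        w = w - learning_rate * grad
        path.append(w)
    return path


high_lr_path = gradient_descent_path(learning_rate=0.1)   # overshoots each step
low_lr_path = gradient_descent_path(learning_rate=0.01)   # approaches smoothly
```

With the higher rate the parameter jumps from one side of the minimum to the other on every update; with the lower rate it moves toward the minimum without changing sign.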

71
MCQhard

A data scientist is training a multi-class classifier with 10 classes. The training log shows the above output for the first two epochs. What is the most likely cause?

A.Batch normalization is disabled
B.The learning rate is set to zero
C.The dataset is imbalanced
D.The model is overfitting
AnswerB

A zero learning rate prevents any weight updates, so the model outputs remain at initial random values.

Why this answer

When the learning rate is set to zero, the optimizer makes no updates to the model weights regardless of the computed gradients. The training loss remains constant across epochs because the parameters never change, which matches the log showing identical loss values for both epochs. This is a common debugging scenario where a misconfigured learning rate prevents any learning from occurring.
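The symptom is easy to reproduce with a one-parameter linear model. The data, model, and function name below are hypothetical stand-ins for the training log in the exhibit; the sketch only shows that a zero learning rate leaves the logged loss unchanged.

```python
def train_and_log(learning_rate, epochs=2):
    """Fit y = w * x by full-batch gradient descent and log the loss per epoch."""
    xs = [1.0, 2.0, 3.0]
    ys = [2.0, 4.0, 6.0]
    w = 0.0
    losses = []
    for _ in range(epochs):
        errors = [w * x - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / len(xs))
        grad = sum(2.0 * e * x for e, x in zip(errors, xs)) / len(xs)
        w -= learning_rate * grad  # no change at all when learning_rate is 0
    return losses


losses_zero_lr = train_and_log(learning_rate=0.0)
losses_small_lr = train_and_log(learning_rate=0.05)
```

The gradient is non-zero in both runs; only the run with a non-zero learning rate turns it into a weight update and a lower loss in the second epoch.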

Exam trap

CompTIA often tests the misconception that a flat loss curve is always due to data issues or model capacity, when in fact it is a classic symptom of a zero or extremely small learning rate that prevents any weight updates.

How to eliminate wrong answers

Option A is wrong because disabling batch normalization would cause training instability and fluctuating loss values, not a perfectly flat loss across epochs. Option C is wrong because an imbalanced dataset affects final accuracy and per-class performance, but the loss would still decrease (or oscillate) as the model learns the majority classes. Option D is wrong because overfitting is characterized by decreasing training loss with increasing validation loss, not a completely static training loss.

72
MCQhard

A fraud detection model is trained on a dataset where only 0.1% of transactions are fraudulent. The model achieves 99.9% accuracy but fails to catch most frauds. Which metric should the team prioritize, and which technique could help?

A.Mean Squared Error; use L2 regularization
B.F1 score; use principal component analysis
C.Accuracy; collect more data
D.Precision-Recall AUC; use oversampling like SMOTE
AnswerD

Precision-Recall AUC evaluates minority class well; SMOTE generates synthetic samples.

Why this answer

With severe class imbalance, accuracy is misleading. Precision-Recall AUC focuses on minority class, and SMOTE oversamples it.
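Why accuracy misleads here can be checked with a trivial baseline that never predicts fraud. The dataset below is synthetic and only mirrors the 0.1% fraud rate from the question.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(1 for t in y_true if t == 1)
    return true_positives / actual_positives


# Synthetic labels: 10 fraudulent transactions out of 10,000 (0.1%).
y_true = [1] * 10 + [0] * 9990
y_pred = [0] * len(y_true)  # baseline that flags nothing as fraud

baseline_accuracy = accuracy(y_true, y_pred)
baseline_recall = recall(y_true, y_pred)
```

The baseline matches the 99.9% accuracy in the question while catching no fraud at all, which is why a metric focused on the minority class is needed.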

73
Multi-Selectmedium

Which THREE of the following are techniques for handling missing data in machine learning?

Select 3 answers
A.Deletion of rows with missing values
B.Autoencoder reconstruction
C.Mean imputation
D.Principal Component Analysis
E.Using a separate category for missing values
AnswersA, C, E

Listwise deletion removes incomplete records; a basic approach.

Why this answer

Mean imputation replaces missing values with the column mean, deletion removes rows that contain missing values, and a separate category marks missing values explicitly in categorical features. Autoencoder reconstruction can impute values but is a more advanced approach, not a standard technique. PCA is for dimensionality reduction, not missing data handling.
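The three standard techniques can be sketched on a small list of records, using `None` to mark a missing value. The column names, rows, and helper functions are hypothetical.

```python
def drop_incomplete_rows(rows):
    """Listwise deletion: keep only rows with no missing values."""
    return [row for row in rows if None not in row.values()]


def impute_mean(rows, column):
    """Replace missing numeric values with the mean of the observed ones."""
    observed = [row[column] for row in rows if row[column] is not None]
    mean = sum(observed) / len(observed)
    return [{**row, column: mean if row[column] is None else row[column]} for row in rows]


def fill_missing_category(rows, column, placeholder="missing"):
    """Treat missing categorical values as their own category."""
    return [
        {**row, column: placeholder if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical records with one missing numeric and one missing categorical value.
rows = [
    {"age": 30, "plan": "basic"},
    {"age": None, "plan": "pro"},
    {"age": 50, "plan": None},
]
complete_rows = drop_incomplete_rows(rows)
age_imputed = impute_mean(rows, "age")
plan_filled = fill_missing_category(rows, "plan")
```

Deletion leaves one of the three rows here, which shows how quickly it discards data, while the other two approaches keep every row.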

74
MCQhard

A company deploys a machine learning model that makes predictions on streaming data. Over time, the data distribution shifts, causing model performance to degrade. Which monitoring strategy is most appropriate to detect this drift?

A.Compare the distribution of predictions to the training set
B.Monitor the model's training loss
C.Retrain the model daily on new data
D.Track the model's accuracy on a fixed validation set over time
AnswerD

Accuracy drop on a static validation set indicates concept drift.

Why this answer

Option D is correct because tracking accuracy on a fixed validation set over time directly reveals performance degradation due to distribution shift. Option B is incorrect because training loss may remain low even with drift. Option C is incorrect because retraining daily is a response, not a detection method.

Option A is incorrect because comparing prediction distributions is less direct than performance metrics.

75
MCQeasy

A data scientist is building a binary classification model to predict customer churn. The dataset has 10,000 samples with 80% non-churn and 20% churn. The model achieves 95% accuracy but fails to identify churners correctly. Which metric should the scientist focus on to evaluate model performance properly?

A.Precision
B.F1-score
C.Recall (TPR)
D.Specificity
AnswerC

Recall focuses on identifying positive cases, which is the main objective.

Why this answer

Option C is correct because recall (true positive rate) measures the ability to find positive (churn) cases, which is the goal in an imbalanced dataset. Option A, precision, is important but less critical when the cost of missing churners is high. Option B, F1-score, balances precision and recall, but recall is more directly needed.

Option D, specificity, measures true negative rate, not relevant for catching churners.
