Knowledge + Practice

CCNA AI Concepts and Foundations Questions

28 of 103 questions · Page 2/2 · AI Concepts and Foundations · Answers revealed


76

MCQ · easy

A chatbot developer uses a transformer-based model for customer service. Users complain that the chatbot sometimes gives offensive responses. Which technique should be applied first to mitigate this issue?

A. Increase the model size to improve its understanding of context.

B. Decrease the temperature parameter to make outputs more deterministic.

C. Train a separate classifier to detect offensive outputs in real time.

D. Review and filter the training dataset for offensive or biased language, then fine-tune the model.

Answer: D

Cleaning training data addresses the root cause.

Why this answer

Option D is correct because the root cause of offensive responses in transformer-based models is typically biased or toxic language present in the training data. Reviewing and filtering the dataset to remove such content, followed by fine-tuning the model, directly addresses the source of the problem. This approach aligns with the principle of data-centric AI, where improving data quality is the first step before modifying model architecture or inference parameters.

Exam trap

CompTIA often tests the misconception that modifying inference parameters (like temperature) or adding post-processing classifiers can fix fundamental data quality issues, when in fact the first and most effective mitigation is to address the training data itself.

How to eliminate wrong answers

Option A is wrong because increasing model size does not inherently fix biased or offensive outputs; larger models can actually amplify existing biases in the training data due to increased capacity to memorize patterns. Option B is wrong because decreasing the temperature parameter makes outputs more deterministic (lower randomness) but does not prevent the model from generating offensive content that it has learned from the data; it only reduces creative variation, not toxicity. Option C is wrong because training a separate classifier to detect offensive outputs in real time is a reactive measure that adds latency and complexity, whereas the proactive first step should be to clean the training data; a classifier also cannot prevent the model from generating offensive content in the first place.
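The point about option B can be sketched in a few lines. This is a minimal illustration, assuming three made-up reply scores (the third standing in for an offensive reply); the function name and numbers are hypothetical and not taken from any particular model. It shows that lowering the temperature sharpens the distribution but leaves every learned reply with a non-zero probability.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate replies; index 2 stands for an offensive one.
logits = [2.0, 1.0, 0.5]

warm = softmax_with_temperature(logits, 1.0)  # default sampling
cold = softmax_with_temperature(logits, 0.2)  # low temperature, near-greedy

# The top reply gains probability mass, but the offensive reply is never removed.
print(warm, cold)
```

If the offensive reply happened to carry the highest score, a lower temperature would make it more likely, which is why temperature is not a toxicity control.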


77

MCQ · hard

An AI team notices that their model's performance degrades over time because the statistical relationship between input features and the target variable changes. This issue is called:

A. Data drift

B. Overfitting

C. Concept drift

D. Model drift

Answer: C

Correct; concept drift describes changes in the mapping from inputs to outputs.

Why this answer

Concept drift occurs when the statistical relationship between input features and the target variable changes over time, causing model performance to degrade. This is distinct from data drift, which involves changes in the input data distribution alone. In the AI0-001 context, concept drift directly addresses the shift in the underlying mapping from features to labels.

Exam trap

CompTIA often tests the distinction between data drift and concept drift, where candidates mistakenly choose data drift because they focus on the input features changing, rather than the relationship between features and the target.

How to eliminate wrong answers

Option A is wrong because data drift refers to changes in the distribution of input features, not the relationship between features and the target. Option B is wrong because overfitting describes a model that memorizes training data noise and fails to generalize, not a temporal degradation due to shifting relationships. Option D is wrong because 'model drift' is a loose umbrella term for any degradation in model performance over time; it does not specifically name a change in the feature-target relationship, which is concept drift.
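The difference between the two kinds of drift can be made concrete with a toy example. The sketch below assumes a one-feature problem and a "model" that is just a fixed threshold; all thresholds, ranges, and names are invented for illustration. Under data drift only the inputs move; under concept drift the inputs stay put but the true labeling rule changes.

```python
import random

def label(x, threshold):
    """Binary rule: 1 when x is above the threshold, else 0."""
    return 1 if x > threshold else 0

def accuracy(model_threshold, xs, ys):
    hits = sum(1 for x, y in zip(xs, ys) if label(x, model_threshold) == y)
    return hits / len(xs)

rng = random.Random(0)
model_threshold = 0.5  # the rule the model learned at training time

# Baseline: inputs in [0, 1], true rule matches the model.
xs_base = [rng.uniform(0.0, 1.0) for _ in range(1000)]
ys_base = [label(x, 0.5) for x in xs_base]

# Data drift: the input distribution shifts, the true rule is unchanged.
xs_data = [rng.uniform(0.3, 1.3) for _ in range(1000)]
ys_data = [label(x, 0.5) for x in xs_data]

# Concept drift: same input distribution, but the true rule has moved to 0.7.
xs_concept = [rng.uniform(0.0, 1.0) for _ in range(1000)]
ys_concept = [label(x, 0.7) for x in xs_concept]

acc_base = accuracy(model_threshold, xs_base, ys_base)
acc_data = accuracy(model_threshold, xs_data, ys_data)
acc_concept = accuracy(model_threshold, xs_concept, ys_concept)
print(acc_base, acc_data, acc_concept)
```

The model survives data drift here only because it matches the true rule exactly; a real model that approximates the rule can also lose accuracy when inputs move into poorly covered regions.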


78

Multi-Select · easy

Which TWO of the following are key stages in the AI lifecycle?

Select 2 answers

A. Human annotation of all data

B. Model retraining

C. Model deployment

D. Data collection

E. Manual feature extraction

Answers: C, D

Deploying the model into a production environment is a critical phase.

Why this answer

Data collection and model deployment are essential stages. Data collection provides the raw material for training, and model deployment puts the trained model into production. The remaining options are not universal stages: manual feature extraction is increasingly automated, human annotation of all data is not always required, and model retraining is a recurring maintenance activity rather than a fixed stage.


79

MCQ · medium

A financial services company is developing an AI model to detect fraudulent transactions. The dataset contains 99.9% legitimate transactions and 0.1% fraudulent ones. Which technique should the data scientist use to address the class imbalance problem?

A. Apply Synthetic Minority Oversampling Technique (SMOTE)

B. Use a bagging ensemble method

C. Undersample the legitimate transactions

D. Use cost-sensitive learning with higher weight on fraudulent class

Answer: A

SMOTE creates synthetic examples of the minority class, balancing the dataset without losing information.

Why this answer

SMOTE (Synthetic Minority Oversampling Technique) is the correct choice because it generates synthetic examples of the minority class (fraudulent transactions) by interpolating between existing minority instances, rather than duplicating them. This rebalances the extreme 0.1% fraud rate without discarding any data and with less risk of overfitting than plain duplication, making it a standard technique for imbalanced classification problems such as financial fraud detection.

Exam trap

CompTIA often tests the distinction between resampling techniques (SMOTE, undersampling) and algorithmic adjustments (cost-sensitive learning, ensemble methods), so candidates may incorrectly choose cost-sensitive learning because it 'handles imbalance' without recognizing that SMOTE is the specific data-level technique asked for.

How to eliminate wrong answers

Option B is wrong because bagging (bootstrap aggregating) is an ensemble method that reduces variance but does not directly address class imbalance; it would still train on the skewed distribution unless combined with resampling. Option C is wrong because undersampling the legitimate (majority) class far enough to balance the dataset would discard nearly all of the legitimate transactions, causing severe information loss and potentially degrading model performance on legitimate transactions. Option D is wrong because cost-sensitive learning assigns higher misclassification costs to the minority class, which can help but is not a resampling technique; the question is read as asking for a technique to 'address the class imbalance problem' via data manipulation, and SMOTE is the direct resampling approach.
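The interpolation idea behind SMOTE can be sketched without any library. The following is a simplified, SMOTE-like illustration and not the reference implementation: the function name, the four hypothetical fraud points, and the choice of k are all assumptions made for the example.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature fraud samples (the minority class).
fraud = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_points = smote_like(fraud, n_new=6)
print(new_points)
```

Every synthetic point lies on a line segment between two real fraud samples, so the new data stays inside the region the minority class already occupies instead of repeating existing rows.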


80

MCQ · hard

Refer to the exhibit. A system administrator reviews the deployment. Which action should be taken to meet the SLA?

A. Retrain the model

B. Implement caching

C. Reduce model input size

D. Scale up the compute resources

Answer: D

Correct; more compute power can speed up inference.

Why this answer

The exhibit shows a deployment where inference latency exceeds the SLA requirement. Scaling up compute resources (e.g., adding more CPU cores, GPU memory, or increasing instance size) directly reduces per-request processing time by providing more parallel processing capacity, which is the most straightforward way to meet latency SLAs when the model is already optimized.

Exam trap

CompTIA often tests the misconception that retraining or caching are universal performance fixes, when in fact they address accuracy and request repetition respectively, not raw compute throughput.

How to eliminate wrong answers

Option A is wrong because retraining the model improves accuracy or adapts to new data, but does not inherently reduce inference latency unless the model architecture is changed to a smaller or more efficient one, which is not indicated. Option B is wrong because caching can reduce latency for repeated identical requests, but the exhibit does not suggest that requests are repetitive; caching does not help with unique or dynamic inputs. Option C is wrong because reducing model input size (e.g., downsampling images or truncating text) may lower latency but at the cost of accuracy or completeness, and the SLA likely requires maintaining output quality; scaling compute resources preserves model fidelity.


81

Multi-Select · easy

Which TWO of the following are common techniques to reduce overfitting in a neural network?

Select 2 answers

A. Increasing the number of hidden layers

B. Using a larger learning rate

C. L2 regularization

D. Training for more epochs

E. Dropout

Answers: C, E

Correct; L2 regularization adds a penalty on squared weights.

Why this answer

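L2 regularization (option C) reduces overfitting by adding a penalty term proportional to the squared magnitude of the weights to the loss function. This forces the network to keep weights small, preventing it from fitting noise in the training data and improving generalization. Dropout (option E) randomly deactivates a fraction of units during training, so the network cannot rely on any single unit and is pushed toward more robust, redundant features.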

Exam trap

CompTIA often tests the misconception that adding more layers or training longer always improves accuracy, when in fact these actions typically increase overfitting without proper regularization or validation monitoring.
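Both correct techniques are small enough to sketch directly. This is a minimal illustration, assuming plain Python lists for weights and activations; the function names, the regularization strength `lam`, and the dropout rate are hypothetical values chosen for the example.

```python
import random

def l2_penalty(weights, lam):
    """Penalty added to the training loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    the survivors so the expected activation stays the same."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Larger weights cost more, which nudges training toward small weights.
penalty = l2_penalty([3.0, -4.0], lam=0.01)  # 0.01 * (9 + 16)

# During training, a random subset of units is silenced on each pass.
dropped = dropout([1.0] * 8, rate=0.5, rng=random.Random(1))
print(penalty, dropped)
```

Dropout is applied only during training; at inference time all units are kept, which is why the surviving activations are rescaled here.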


82

Multi-Select · hard

Which THREE of the following are key considerations when deploying an AI model in a production environment?

Select 3 answers

A. Basing acceptance solely on training accuracy

B. Maximizing model complexity to achieve the best accuracy

C. Monitoring model performance for data drift

D. Ensuring inference latency meets service-level agreements

E. Providing explainability for model decisions

Answers: C, D, E

Correct; models degrade over time if data changes.

Why this answer

Option C is correct because data drift refers to the change in the statistical properties of the input data over time, which can degrade model accuracy. Continuous monitoring for data drift is essential in production to detect when the model's assumptions about the data distribution are no longer valid, triggering retraining or alerts.

Exam trap

CompTIA often tests the misconception that high training accuracy is the primary goal for production deployment, when in reality operational concerns like latency, explainability, and drift monitoring are prioritized over raw accuracy.


83

MCQ · medium

A hospital uses an AI system to predict patient deterioration from vital signs. The system currently uses a logistic regression model trained on data from the past year. Recently, the hospital adopted a new patient monitoring device that provides more accurate readings. The model's performance has dropped significantly. The data science team has access to the new device's data for the past month and wants to improve the model with minimal disruption. The team also wants to ensure the model remains interpretable for regulatory compliance. Which approach should they take?

A. Retrain the logistic regression model on a combined dataset of old and new device data

B. Continue using the current model and manually adjust predictions based on device differences

C. Build an ensemble of logistic regression and a neural network using new data only

D. Replace the logistic regression model with a gradient boosting model using only new device data

Answer: A

This incorporates the new device's accuracy while maintaining interpretability and using all available data.

Why this answer

Retraining the logistic regression model on a combined dataset of old and new device data is the best approach because it leverages all available data to adapt the model to the new device's measurement distribution while preserving the model's inherent interpretability. Logistic regression is a linear model that remains fully transparent for regulatory compliance, and combining both datasets helps the model learn the systematic shift in vital sign readings without discarding valuable historical patterns. This minimizes disruption by avoiding a complete overhaul and directly addresses the performance drop caused by the change in input data distribution.

Exam trap

CompTIA often tests the trade-off between model performance and interpretability, and the trap here is that candidates may prioritize performance gains from complex models (like gradient boosting or neural networks) without recognizing that regulatory compliance mandates interpretability, making logistic regression the only viable choice despite its simplicity.

How to eliminate wrong answers

Option B is wrong because manually adjusting predictions based on device differences is ad-hoc, non-scalable, and introduces subjective bias, which undermines both reliability and regulatory compliance. Option C is wrong because building an ensemble with a neural network reduces interpretability, violating the regulatory requirement, and using only new data ignores valuable historical patterns, leading to overfitting and poor generalization. Option D is wrong because replacing logistic regression with a gradient boosting model sacrifices interpretability, as gradient boosting is a black-box model, and training only on one month of new data risks overfitting and fails to capture long-term trends.


84

MCQ · medium

A team deploying an AI model for real-time fraud detection notices that inference latency is too high. The model is a deep neural network with 50 layers, deployed on a cloud GPU. Which of the following is the BEST approach to reduce latency while maintaining acceptable accuracy?

A. Deploy the model on a more powerful GPU.

B. Reduce the batch size for inference.

C. Replace the DNN with a logistic regression model.

D. Apply knowledge distillation to create a smaller model.

Answer: D

Correct; distillation compresses the model while preserving performance.

Why this answer

Knowledge distillation trains a smaller 'student' model to mimic the behavior of a larger 'teacher' model, significantly reducing the number of parameters and layers while preserving most of the original accuracy. This directly addresses the high inference latency caused by the 50-layer DNN by producing a compact model that runs faster on the same GPU hardware.

Exam trap

CompTIA often tests the misconception that simply upgrading hardware or reducing batch size is the best latency fix, when in fact architectural compression techniques like knowledge distillation are the most effective for deep models with strict latency budgets.

How to eliminate wrong answers

Option A is wrong because upgrading to a more powerful GPU can shorten inference time but raises cost, does not guarantee the latency target, and leaves the 50-layer architecture, the underlying source of the latency, unchanged. Option B is wrong because reducing the inference batch size does not shrink the model: every request still passes through all 50 layers, and very small batches can underutilize GPU parallelism and add overhead from frequent kernel launches. Option C is wrong because replacing the DNN with logistic regression would likely cause a severe drop in accuracy for complex fraud patterns, as logistic regression cannot model the non-linear feature interactions that the DNN captures.
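The teacher-student idea can be sketched as a loss function. This is a minimal illustration of the soft-target part of knowledge distillation, assuming raw logits as plain lists and a hypothetical temperature of 2.0; real training setups usually combine this term with the ordinary hard-label loss.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the
    softened student distribution."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student that reproduces the teacher's logits incurs no loss.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])

# A student that ranks the classes the other way round is penalized.
different = distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1])
print(same, different)
```

The temperature softens both distributions so the student also learns how the teacher ranks the less likely classes, not just its top prediction.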


85

MCQ · hard

A self-driving car company is developing an object detection system using a convolutional neural network (CNN). The system needs to detect pedestrians and vehicles in real-time with high accuracy. Which technique can reduce inference time while maintaining accuracy?

A. Apply model pruning and quantization

B. Use a pre-trained model and fine-tune it

C. Add more convolutional layers

D. Increase number of filters in each layer

Answer: A

Pruning removes unimportant weights, and quantization reduces precision of weights, both speeding up inference while preserving accuracy.

Why this answer

Model pruning removes redundant or less important weights from the CNN, reducing computational load, while quantization converts floating-point weights to lower-precision integers (e.g., INT8). Together, they shrink model size and speed up inference without significantly degrading accuracy, making them ideal for real-time object detection in resource-constrained environments like autonomous vehicles.

Exam trap

CompTIA often tests the misconception that adding more layers or filters always improves performance, when in fact it increases latency and resource usage, while pruning and quantization are the standard techniques for reducing inference time without sacrificing accuracy.

How to eliminate wrong answers

Option B is wrong because fine-tuning a pre-trained model improves accuracy for a specific task but does not inherently reduce inference time; it may even increase it if the model remains large. Option C is wrong because adding more convolutional layers increases the network depth and computational cost, which slows inference and can cause overfitting without careful regularization. Option D is wrong because increasing the number of filters in each layer expands the feature map channels, raising the number of parameters and FLOPs, which directly increases inference time.
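Pruning and quantization can be illustrated on a single weight vector. The sketch below assumes magnitude-based pruning and symmetric INT8 quantization, which are common variants but only two of several; the weights and the 50% pruning fraction are made-up values.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [v * scale for v in quantized]

# Hypothetical weights from one small layer.
weights = [0.82, -0.03, 0.40, 0.01, -0.65, 0.07]

pruned = prune_by_magnitude(weights, 0.5)  # the three smallest become 0.0
quantized, scale = quantize_int8(pruned)   # small integers plus one float scale
restored = dequantize(quantized, scale)    # approximate original values
print(pruned, quantized, restored)
```

The restored weights differ from the pruned ones by at most half a quantization step, which is the sense in which accuracy is largely preserved while storage and arithmetic get cheaper.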


86

MCQ · hard

A financial institution is deploying a reinforcement learning agent to optimize stock trading decisions. The agent is trained in a simulated environment that mimics historical market data. After deployment, the agent performs well initially but then suffers large losses during a period of high volatility that was underrepresented in the training data. The team wants to make the agent more robust to such market conditions without retraining from scratch. They have a budget for additional simulation compute and access to a broader historical dataset including past crises. The agent uses a deep Q-network (DQN) architecture. Which strategy should they adopt?

A. Increase the replay buffer size and continue training on the original dataset

B. Keep the DQN but perform extensive hyperparameter tuning on the original data

C. Modify the DQN to use a recurrent neural network (e.g., DRQN) and train on the expanded dataset

D. Switch to a policy gradient method with a random exploration strategy

Answer: C

Recurrent networks capture temporal dynamics better, and training on a more diverse dataset improves robustness.

Why this answer

Option C is correct because a Deep Recurrent Q-Network (DRQN) can capture temporal dependencies in market data, which is crucial for handling volatile periods that were underrepresented in training. By training on the expanded dataset that includes past crises, the agent can learn from sequential patterns of volatility, making it more robust without requiring a complete retraining from scratch. This approach leverages the existing DQN architecture while adding recurrent layers to better model the dynamic market conditions.

Exam trap

CompTIA often tests the misconception that simply tuning hyperparameters or expanding the replay buffer can fix a model's inability to generalize to unseen distributions, when the real solution requires a change in architecture to handle temporal dependencies.

How to eliminate wrong answers

Option A is wrong because simply increasing the replay buffer size and continuing training on the original dataset does not address the core issue of underrepresented high-volatility data; the agent would still lack exposure to those critical patterns. Option B is wrong because hyperparameter tuning on the original data cannot compensate for missing training examples of volatile market conditions; it only optimizes performance on the existing distribution. Option D is wrong because switching to a policy gradient method with random exploration does not inherently improve robustness to rare events; it may even increase variance and instability without addressing the data deficiency.


87

MCQ · hard

A data scientist splits a dataset into training (80%) and test (20%). After training, the model achieves 95% accuracy on training and 60% on test. Which step should the data scientist take first?

A. Collect more data

B. Use cross-validation

C. Apply regularization

D. Increase model complexity

Answer: C

Regularization penalizes large weights, reducing overfitting.

Why this answer

The model shows high training accuracy (95%) but significantly lower test accuracy (60%), which is a classic sign of overfitting. Regularization (Option C) directly addresses overfitting by adding a penalty term to the loss function (e.g., L1 or L2 regularization), discouraging the model from learning overly complex patterns that do not generalize. This is the first step because it targets the core issue without requiring additional data or increasing complexity.

Exam trap

CompTIA often tests the misconception that overfitting is always solved by more data or cross-validation, but the immediate corrective action is to apply regularization to penalize model complexity.

How to eliminate wrong answers

Option A is wrong because collecting more data can help reduce overfitting, but it is not the first step; regularization is a simpler, more immediate fix that does not depend on data availability. Option B is wrong because cross-validation is a technique for model evaluation and hyperparameter tuning, not a direct remedy for overfitting; it would help assess the severity but does not solve the underlying problem. Option D is wrong because increasing model complexity would worsen overfitting, as it allows the model to fit noise even more closely, further reducing test accuracy.
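The effect of the penalty term on a fitted model can be shown with the simplest possible case. This sketch assumes a one-feature linear model with no intercept and uses the closed-form ridge (L2) solution; the data points and the penalty strength are invented for illustration.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize sum((y - w*x)^2) + lam * w^2 for a single weight w.
    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Hypothetical training points, roughly y = 2x with some noise.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_plain = fit_ridge_1d(xs, ys, lam=0.0)  # ordinary least squares
w_ridge = fit_ridge_1d(xs, ys, lam=5.0)  # penalized fit, pulled toward zero
print(w_plain, w_ridge)
```

With many features the same shrinkage acts on every weight, which limits how closely the model can chase noise in the training set and narrows the gap between training and test accuracy.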


88

MCQ · hard

A team is training a deep learning model for natural language processing using a large corpus. They notice the model has a very high number of parameters and training is slow. Which technique can reduce the number of parameters without significant performance loss?

A. Apply embedding compression

B. Add more dropout layers

C. Use a larger batch size

D. Increase learning rate

Answer: A

Embedding compression reduces the dimensionality of embedding layers, directly reducing parameters with minimal impact on performance.

Why this answer

Embedding compression reduces the dimensionality of the embedding layer, which often contains the majority of the model's parameters in NLP tasks. By using techniques like low-rank factorization or pruning, the model retains most of its representational power while significantly decreasing the parameter count and training time.

Exam trap

The trap here is that candidates confuse regularization techniques (like dropout) or training speed optimizations (batch size, learning rate) with actual parameter reduction, which only embedding compression directly achieves.

How to eliminate wrong answers

Option B is wrong because adding more dropout layers does not reduce the number of parameters; it only randomly drops neurons during training to prevent overfitting, leaving the parameter count unchanged. Option C is wrong because using a larger batch size improves training speed through better hardware utilization but does not reduce the number of parameters. Option D is wrong because increasing the learning rate can speed up convergence but does not affect the parameter count and may cause training instability or divergence.
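The parameter savings from low-rank factorization are simple arithmetic. The sketch below assumes a hypothetical vocabulary of 50,000 tokens, an embedding width of 768, and a rank of 128; none of these figures come from the question.

```python
def embedding_params(vocab_size, dim):
    """Parameters in a full vocab_size x dim embedding table."""
    return vocab_size * dim

def factorized_embedding_params(vocab_size, dim, rank):
    """Parameters when the table is replaced by a vocab_size x rank lookup
    followed by a rank x dim projection."""
    return vocab_size * rank + rank * dim

full = embedding_params(50_000, 768)                     # 38,400,000
compact = factorized_embedding_params(50_000, 768, 128)  # 6,498,304
print(full, compact, round(full / compact, 1))
```

Dropout, batch size, and learning rate leave both of these counts unchanged, which is why they do not answer a question about parameter reduction.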


89

MCQ · easy

A company wants to use AI to analyze customer reviews and determine sentiment (positive, negative, neutral). Which AI subfield is most directly applicable?

A. Reinforcement learning

B. Computer vision

C. Natural language processing

D. Robotics

Answer: C

Correct; NLP is used for text analysis and sentiment.

Why this answer

Natural language processing (NLP) is the AI subfield that enables machines to understand, interpret, and generate human language. Analyzing customer reviews for sentiment requires processing text, extracting meaning, and classifying it as positive, negative, or neutral, which is a core NLP task called sentiment analysis.

Exam trap

The trap here is that candidates often confuse natural language processing with computer vision or reinforcement learning because they see 'AI' broadly, but the specific task of analyzing text directly maps to NLP, not the other subfields.

How to eliminate wrong answers

Option A is wrong because reinforcement learning is a training paradigm where an agent learns by interacting with an environment and receiving rewards or penalties; it is not designed for text classification or sentiment analysis. Option B is wrong because computer vision focuses on interpreting visual data such as images and videos, not textual content like customer reviews. Option D is wrong because robotics deals with the design and control of physical machines to perform tasks in the real world, which is unrelated to analyzing text-based sentiment.


90

MCQ · easy

A marketing team uses a recommendation system to suggest products to customers. The system currently uses collaborative filtering. Which scenario would most likely cause the cold-start problem?

A. A new product is added to the catalog with no purchase history.

B. The system switches from collaborative filtering to content-based filtering.

C. The website interface is redesigned, affecting user navigation.

D. A seasonal product experiences a sudden spike in sales.

Answer: A

No interaction data exists for the new product, so collaborative filtering fails.

Why this answer

The cold-start problem occurs when a recommendation system lacks sufficient data to make accurate predictions. In collaborative filtering, recommendations rely on historical user-item interactions (e.g., purchase history). A new product with no purchase history has no interaction data, so the system cannot find similar users or items to generate recommendations, directly causing the cold-start problem.

Exam trap

CompTIA often tests the cold-start problem by making candidates confuse it with performance issues or UI changes, but the trap here is that the cold-start problem is specifically about insufficient interaction data for new users or items, not about algorithm switches or interface redesigns.

How to eliminate wrong answers

Option B is wrong because switching from collaborative filtering to content-based filtering does not inherently cause the cold-start problem; content-based filtering uses item features (e.g., product attributes) to make recommendations, which can still work for new items if features are available. Option C is wrong because a website interface redesign affects user navigation but does not impact the underlying recommendation algorithm's data availability or the cold-start problem. Option D is wrong because a sudden spike in sales for a seasonal product provides abundant interaction data, which actually helps collaborative filtering make better recommendations, not cause a cold-start.
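The cold-start effect shows up directly in the similarity computation. This is a minimal sketch of item-based collaborative filtering, assuming a tiny made-up purchase matrix with hypothetical product names; a production system would use a far larger, sparse matrix.

```python
import math

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Rows are items, columns are four users; 1 means the user bought the item.
purchases = {
    "kettle":      [1, 1, 0, 1],
    "toaster":     [1, 1, 0, 0],
    "new_blender": [0, 0, 0, 0],  # just added to the catalog
}

sim_known = cosine(purchases["kettle"], purchases["toaster"])
sim_new = cosine(purchases["kettle"], purchases["new_blender"])
print(sim_known, sim_new)
```

With no purchases, the new item has zero similarity to everything, so it can never be surfaced by interaction data alone; content-based features are one common fallback.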


91

MCQ · easy

A machine learning engineer wants to evaluate a binary classifier. Which metric is MOST appropriate when the positive class is rare (e.g., 1% of total data)?

A. True negative rate

B. F1-score

C. Mean squared error

D. Accuracy

Answer: B

Correct; F1 considers both precision and recall.

Why this answer

When the positive class is rare (e.g., 1% of total data), accuracy is misleading because a classifier that always predicts the negative class would achieve 99% accuracy. The F1-score is the harmonic mean of precision and recall, making it robust to class imbalance by focusing on the positive class performance. It is the most appropriate metric for evaluating binary classifiers on imbalanced datasets.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric, but in imbalanced datasets it is misleading, and candidates must recognize that F1-score (or precision-recall curves) is the correct choice for rare positive classes.

How to eliminate wrong answers

Option A is wrong because the true negative rate (specificity) measures the proportion of actual negatives correctly identified, which is not sensitive to the rare positive class and can be high even if the classifier misses all positives. Option C is wrong because mean squared error (MSE) is a regression metric that measures average squared differences between predicted and actual values, not suitable for binary classification outcomes. Option D is wrong because accuracy, (TP + TN) / (TP + TN + FP + FN), is dominated by the majority class in imbalanced datasets, giving a falsely high score even when the classifier fails to detect the rare positive class.
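The 99% accuracy claim is easy to check numerically. The sketch below assumes 1,000 hypothetical samples with a 1% positive rate and a classifier that always predicts the negative class; F1 is taken to be 0 when there are no true positives.

```python
def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

y_true = [1] * 10 + [0] * 990   # 1% positives
always_negative = [0] * 1000    # never flags a positive

acc = accuracy(y_true, always_negative)
f1 = f1_score(y_true, always_negative)
print(acc, f1)
```

The same useless classifier looks excellent by accuracy and worthless by F1, which is the contrast the question is testing.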


92

Multi-Select · medium

Which TWO techniques are commonly used to handle missing data in a dataset?

Select 2 answers

A. Feature scaling

B. One-hot encoding

C. Remove rows with missing values

D. Impute with mean or median

E. Principal component analysis (PCA)

Answers: C, D

Simple deletion if missing data is minimal.

Why this answer

Option C is correct because removing rows with missing values is a straightforward technique to handle missing data, especially when the missingness is random and the dataset is large enough that dropping a few rows does not significantly reduce the sample size or introduce bias. Option D is correct because imputing missing values with the mean or median is a common statistical method that preserves the dataset size and is simple to implement, though it can reduce variance and may distort relationships if the data is not missing completely at random.

Exam trap

CompTIA often tests the distinction between data preprocessing techniques that handle missing values versus those that transform or reduce features, so candidates may confuse feature scaling or PCA with missing data handling because they are all part of data preparation.
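Both correct options can be shown on a four-row table. This is a minimal sketch using only the standard library; the column names and values are hypothetical, and `None` stands in for a missing entry.

```python
from statistics import median

# Hypothetical records; the second row is missing its income value.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": 29, "income": None},
    {"age": 41, "income": 61000.0},
    {"age": 38, "income": 58000.0},
]

# Option C: drop every row that has a missing value.
complete_rows = [r for r in rows if r["income"] is not None]

# Option D: keep all rows and fill the gap with the median of observed values.
fill = median(r["income"] for r in complete_rows)
imputed_rows = [
    dict(r, income=fill if r["income"] is None else r["income"]) for r in rows
]
print(len(complete_rows), imputed_rows[1])
```

Deletion shrinks the dataset, while imputation keeps its size at the cost of reducing variance in the filled column.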


93

Multi-Select · hard

Which THREE factors are common causes of bias in AI systems?

Select 3 answers

A. Cross-validation

B. Lack of diversity in the development team

C. Unrepresentative training sample

D. Biased historical data used for training

E. High regularization

Answers: B, C, D

Homogeneous teams may overlook biased assumptions.

Why this answer

Option B is correct because a lack of diversity in the development team leads to homogeneity of thought, which can cause blind spots in identifying potential biases in data, features, or model behavior. When the team does not represent the full spectrum of end users, the AI system may inadvertently encode assumptions that disadvantage underrepresented groups, resulting in biased outcomes.

Exam trap

CompTIA often tests the distinction between statistical bias (e.g., from regularization or validation techniques) and harmful societal bias that leads to unfair outcomes, so candidates mistakenly select options like cross-validation or high regularization as causes of bias.


94

Multi-Select · medium

A company is implementing an AI solution for fraud detection. The dataset is highly imbalanced (only 1% fraudulent transactions). Which THREE techniques are most appropriate to address class imbalance? (Select three.)

Select 3 answers

A. Apply cost-sensitive learning by assigning a higher misclassification cost to the minority class.

B. Reduce the number of features using principal component analysis (PCA).

C. Use accuracy as the primary evaluation metric.

D. Evaluate model performance using precision-recall curves and F1 score.

E. Use synthetic oversampling (SMOTE) to create additional minority class samples.

Answers: A, D, E

Cost-sensitive methods penalize minority class errors more heavily.

Why this answer

Option A is correct because cost-sensitive learning directly addresses class imbalance by assigning a higher misclassification cost to the minority class (fraudulent transactions). This forces the model to penalize false negatives more heavily, thereby improving recall for the minority class without altering the dataset distribution.

Exam trap

CompTIA often tests the misconception that accuracy is a valid metric for imbalanced datasets. High accuracy can mask poor minority-class performance, so candidates who select option C overlook what precision-recall curves and F1 score would reveal.
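
As a rough illustration of cost-sensitive learning, the sketch below derives per-class weights that are inversely proportional to class frequency, using the formula n_samples / (n_classes * class_count). The function name and the toy 1% fraud label list are assumptions made for this example, not part of the question.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy data (assumed): 10 fraudulent (1) and 990 legitimate (0) transactions.
labels = [1] * 10 + [0] * 990
weights = balanced_class_weights(labels)

# The fraud class gets weight 50.0; the legitimate class gets about 0.505.
print(weights[1], round(weights[0], 3))
```

With these weights, a missed fraud case costs roughly 99 times as much as a misclassified legitimate transaction during training, which is the effect option A describes.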

95

MCQmedium

Refer to the exhibit. The training log shows loss and accuracy for a binary classification model. What is the most likely issue with this model?

A.Overfitting

B.Insufficient epochs

C.Underfitting

D.Data leakage

AnswerA

The divergence between decreasing training loss and increasing validation loss indicates overfitting.

Why this answer

The training loss decreases and training accuracy increases, but validation loss increases and validation accuracy decreases. This is a classic sign of overfitting, where the model learns training data noise but fails to generalize. Underfitting would show both training and validation loss high.

Data leakage would show unusually high accuracy early. Insufficient epochs would show both losses still decreasing.

96

MCQmedium

A team is training a neural network for image classification. They observe that training loss decreases steadily but validation loss starts increasing after 20 epochs. What is the most likely issue?

A.Underfitting

B.Vanishing gradients

C.Data leakage

D.Overfitting

AnswerD

Correct; the model is fitting noise in the training data.

Why this answer

Option D is correct because validation loss rising while training loss continues to fall is a classic sign of overfitting. Option A (underfitting) would show training loss that stays high. Option B (vanishing gradients) would cause slow or stalled convergence on the training set as well.

Option C (data leakage) would typically inflate validation performance rather than degrade it, so it does not match this pattern.
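
The divergence described above can be checked mechanically from a training log. The following is a minimal sketch, assuming per-epoch loss lists are available; the function name and the toy loss values are made up for illustration.

```python
def overfitting_onset(train_loss, val_loss):
    """Return the 1-based epoch with the lowest validation loss if validation
    loss is higher at the end of training while training loss is lower;
    return None when that divergence is not present."""
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    last = len(val_loss) - 1
    diverged = (
        best < last
        and val_loss[last] > val_loss[best]
        and train_loss[last] < train_loss[best]
    )
    return best + 1 if diverged else None


# Toy logs (assumed values): training loss keeps falling, validation loss turns upward.
train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
val = [1.1, 0.9, 0.8, 0.85, 0.9, 1.0]
print(overfitting_onset(train, val))  # 3

# Both losses still falling: no overfitting signal.
print(overfitting_onset([1.0, 0.9, 0.8], [1.1, 1.0, 0.9]))  # None
```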

97

Multi-Selecteasy

Which THREE are common machine learning algorithms used for regression?

Select 3 answers

A.Logistic regression

B.K-means

C.Linear regression

D.Decision tree

E.K-nearest neighbors

AnswersC, D, E

Correct; linear regression predicts a continuous target.

Why this answer

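Linear regression is a fundamental supervised learning algorithm used for regression tasks, where the goal is to predict a continuous numeric output based on one or more input features. It models the relationship between the dependent and independent variables by fitting a linear equation to the observed data, making it a core algorithm for regression problems. Decision trees and k-nearest neighbors also have regression variants: a regression tree predicts the mean target value of the training samples in a leaf, and k-NN regression averages the targets of the nearest neighbors.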

Exam trap

CompTIA often tests the distinction between regression and classification algorithms, and the trap here is that candidates mistakenly associate 'logistic regression' with regression tasks due to its name, when it is actually a classification algorithm.
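
To make the regression use of k-nearest neighbors concrete, here is a small one-feature sketch that predicts a continuous value by averaging the targets of the k closest training points. The function name and the toy data are assumptions for illustration.

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict a continuous target as the mean of the k nearest neighbors' targets."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy one-dimensional data (assumed values).
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.5, 2.5, 3.5, 4.5, 5.5]

# Neighbors of 3.1 are x = 3.0, 4.0 and 2.0, so the prediction is the mean of 3.5, 4.5 and 2.5.
prediction = knn_regress(train_x, train_y, query=3.1, k=3)
print(prediction)  # 3.5
```

The output is a number on a continuous scale rather than a class label, which is what separates this use from k-NN classification.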

98

MCQhard

A research team is training a deep neural network for image classification. The training loss decreases rapidly for the first few epochs but then plateaus, while validation loss starts to increase after epoch 10. Which action would best address this issue?

A.Reduce the batch size to introduce more noise during training.

B.Increase the learning rate to help the model escape the plateau.

C.Implement early stopping based on validation loss to prevent further overfitting.

D.Add more convolutional layers to increase model capacity.

AnswerC

Early stopping stops training before overfitting worsens.

Why this answer

The training loss decreasing rapidly then plateauing while validation loss increases after epoch 10 is a classic sign of overfitting. Early stopping monitors validation loss and halts training when it begins to rise, preventing the model from memorizing noise in the training data. This directly addresses the overfitting issue without requiring architectural or hyperparameter changes that could destabilize training.

Exam trap

CompTIA often tests the misconception that plateauing training loss always requires adjusting learning rate or batch size, when in fact the simultaneous rise in validation loss is the definitive indicator of overfitting that early stopping is designed to solve.

How to eliminate wrong answers

Option A is wrong because reducing batch size only adds gradient noise; any regularizing effect is mild and unreliable, and it does nothing to stop training once validation loss has begun to rise. Option B is wrong because increasing the learning rate when validation loss is already rising risks overshooting the optimal weights, causing divergence or even higher validation loss. Option D is wrong because adding more convolutional layers increases model capacity, which exacerbates overfitting by giving the model more parameters to memorize training data rather than generalizing.
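
Early stopping is usually implemented with a patience counter. The sketch below is one minimal version, assuming a list of per-epoch validation losses; the function name, the `patience` and `min_delta` parameters, and the loss values are illustrative assumptions rather than any specific framework's API.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Toy validation losses (assumed): improving until epoch 10, rising afterwards.
val_losses = [0.90, 0.70, 0.60, 0.55, 0.52, 0.50, 0.49, 0.48, 0.47, 0.46,
              0.48, 0.51, 0.55, 0.60]
print(early_stopping_epoch(val_losses, patience=3))  # (13, 10)
```

In practice the weights from the best epoch (epoch 10 here) would be restored, so the epochs spent waiting out the patience window do not degrade the final model.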

99

MCQhard

A financial institution uses a machine learning model to approve loan applications. The model was trained on historical data that inadvertently encoded a bias against applicants from certain zip codes, leading to discriminatory lending practices. A recent audit reveals that the model's decisions are unfair, and regulators require the bank to remediate the bias without significantly reducing overall approval accuracy. The data science team has access to the training data, the model, and a set of fairness metrics. They also have a small, unbiased validation set. Which course of action should the team take to satisfy regulatory requirements?

A.Remove the zip code feature from the model and retrain

B.Implement adversarial debiasing using the unbiased validation set to enforce fairness constraints

C.Increase the weight of samples from disadvantaged zip codes in the training data

D.Retrain the model using only the unbiased validation set

AnswerB

Adversarial debiasing directly optimizes for fairness and accuracy.

Why this answer

Adversarial debiasing directly addresses the bias encoded in the model by training a predictor and an adversary simultaneously. The adversary tries to predict the protected attribute (e.g., zip code) from the model's predictions, while the predictor is penalized for allowing such inference, enforcing fairness constraints. Using the unbiased validation set ensures the debiasing process is guided by ground truth labels that are free from historical bias, allowing the model to retain high accuracy while reducing discrimination.

Exam trap

CompTIA often tests the misconception that removing a sensitive feature (like zip code) is sufficient to eliminate bias, but the trap is that models can learn proxy features, so a more sophisticated debiasing technique like adversarial debiasing is required.

How to eliminate wrong answers

Option A is wrong because simply removing the zip code feature does not eliminate bias; the model can still learn proxy features (e.g., income, loan amount) that correlate with zip code, leading to continued discriminatory outcomes. Option C is wrong because increasing sample weights for disadvantaged zip codes may overcorrect and reduce overall accuracy, and it does not directly enforce a fairness constraint; it can also introduce new biases if the weighting is not carefully tuned. Option D is wrong because retraining on only the small unbiased validation set would likely lead to severe overfitting and poor generalization, as the dataset is too small to capture the full distribution of loan applications, significantly reducing approval accuracy.
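
The scenario mentions a set of fairness metrics without naming them. As one common example (an assumption here, not stated in the question), the sketch below computes the gap in approval rates between groups, often called the demographic parity difference. The function name, group labels, and decisions are hypothetical.

```python
def approval_rate_gap(approved, group):
    """Return (gap, rates): the largest difference in approval rate between
    any two groups, plus the per-group approval rates."""
    totals, approvals = {}, {}
    for decision, g in zip(approved, group):
        totals[g] = totals.get(g, 0) + 1
        approvals[g] = approvals.get(g, 0) + decision
    rates = {g: approvals[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical decisions: 1 = approved, 0 = denied, for two zip-code groups.
group = ["A"] * 10 + ["B"] * 10
approved = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6

gap, rates = approval_rate_gap(approved, group)
print(rates, round(gap, 2))  # group A is approved 80% of the time, group B 40%
```

A debiasing method such as the one in option B would aim to shrink this gap while tracking accuracy on the validation set, so both numbers need to be reported together.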

100

MCQmedium

A retail company wants to implement a recommendation system using collaborative filtering. The dataset contains user-item interactions (ratings) for 10,000 users and 5,000 products. The matrix is very sparse (99% missing values). The team plans to use matrix factorization to predict missing ratings. However, the training time is excessively long, and the model is not converging. The data engineer suggests using a smaller learning rate and more iterations. Which additional technique should the team apply to speed up training and improve convergence?

A.Add L2 regularization to the loss function

B.Increase the minibatch size

C.Reduce the number of latent factors

D.Switch to the Adam optimizer

AnswerA

Regularization prevents overfitting and improves convergence by penalizing large weights.

Why this answer

The correct answer is A because adding L2 regularization to the loss function helps prevent overfitting and improves convergence in matrix factorization, especially with extremely sparse data (99% missing). Regularization penalizes large latent factor weights, which stabilizes the optimization process and allows the model to generalize better, reducing the risk of divergence during training.

Exam trap

CompTIA often tests the misconception that adaptive optimizers like Adam are a universal fix for convergence issues, but in sparse matrix factorization, L2 regularization is a more direct solution to the overfitting and instability that cause non-convergence.

How to eliminate wrong answers

Option B is wrong because increasing minibatch size typically speeds up training per iteration but can lead to slower convergence and may not address the core issue of non-convergence due to overfitting or ill-conditioned gradients. Option C is wrong because reducing the number of latent factors reduces model capacity and can cause underfitting, but it does not directly fix convergence problems; in fact, it may worsen the model's ability to capture patterns in sparse data. Option D is wrong because switching to the Adam optimizer can help with convergence in many cases, but the question asks for an additional technique beyond the suggested smaller learning rate and more iterations; Adam adapts learning rates per parameter but does not inherently address the overfitting and stability issues caused by extreme sparsity, whereas L2 regularization directly mitigates those.
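
To show where the L2 term enters matrix factorization, here is a minimal SGD sketch in plain Python. It trains only on observed ratings and shrinks each latent factor toward zero by `reg` on every update. All names, hyperparameters, and the six toy ratings are assumptions for illustration; a real system at the scale in the question would use a numerical library.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=300, seed=0):
    """Factorize observed (user, item, rating) triples with L2-regularized SGD.

    Returns user factors P, item factors Q, and the per-epoch mean squared error."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            sq_err += err * err
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus the L2 penalty (reg * factor).
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        history.append(sq_err / len(ratings))
    return P, Q, history


# Toy sparse ratings (assumed): 3 users, 3 items, 6 of 9 entries observed.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q, history = train_mf(ratings, n_users=3, n_items=3)
print(history[0] > history[-1])  # training error falls over the run
```

The `reg` term keeps factors for rarely rated users and items from growing without bound, which is the stabilizing effect the explanation attributes to L2 regularization.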

101

MCQhard

A team is building a natural language processing (NLP) model to analyze customer feedback. They have a large corpus of unlabeled text data and want to generate word embeddings that capture semantic meaning. Which approach should they use?

A.One-hot encoding

B.TF-IDF vectorization

C.Word2Vec

D.Bag-of-words model

AnswerC

Word2Vec learns dense embeddings from unlabeled text, capturing semantic relationships.

Why this answer

Word2Vec is the correct approach because it learns dense, distributed word embeddings from large unlabeled corpora by training a shallow neural network to predict words in context (CBOW) or context from words (Skip-gram). This captures semantic relationships such as analogy and similarity, which is essential for analyzing customer feedback without labeled data.

Exam trap

CompTIA often tests the distinction between frequency-based vectorization (TF-IDF, bag-of-words) and prediction-based embedding methods (Word2Vec, GloVe), trapping candidates who think TF-IDF captures semantic meaning when it only captures term importance in a document.

How to eliminate wrong answers

Option A is wrong because one-hot encoding produces sparse, high-dimensional vectors with no semantic meaning—each word is represented as a binary vector with a single 1, and all vectors are orthogonal, so no similarity or relationship between words is captured. Option B is wrong because TF-IDF vectorization relies on term frequency and inverse document frequency to produce weighted sparse vectors, which reflect word importance in a document but do not capture semantic meaning or word relationships; it is a bag-of-words variant that ignores word order and context. Option D is wrong because the bag-of-words model creates sparse vectors based on word counts, losing all word order and context, and cannot generate embeddings that capture semantic similarity or analogy.
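
The orthogonality point about one-hot vectors can be verified with cosine similarity. In the sketch below, the vocabulary is invented and the dense vectors are hand-picked values for illustration only, not trained Word2Vec embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


vocab = ["refund", "reimbursement", "delivery"]

# One-hot: each word gets a vector with a single 1.
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# Hypothetical dense vectors, chosen by hand to mimic what an embedding model might learn.
dense = {"refund": [0.9, 0.1], "reimbursement": [0.8, 0.2], "delivery": [0.1, 0.9]}

# Distinct one-hot vectors are always orthogonal, so similarity is 0.
print(cosine(one_hot["refund"], one_hot["reimbursement"]))  # 0.0

# Dense vectors can place related words closer together than unrelated ones.
print(cosine(dense["refund"], dense["reimbursement"]) >
      cosine(dense["refund"], dense["delivery"]))  # True
```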

102

MCQmedium

An AI model is trained to predict loan default. The training data contains 95% non-default and 5% default. Which metric is most appropriate to evaluate model performance given the imbalanced dataset?

A.Mean squared error

B.F1-score

C.Accuracy

D.R-squared

AnswerB

F1-score considers both false positives and false negatives, providing a balanced measure for minority class performance.

Why this answer

The F1-score is the harmonic mean of precision and recall, making it robust to class imbalance. In this dataset with 95% non-default and 5% default, accuracy would be misleadingly high (95%) even if the model never predicts default, while F1-score penalizes poor recall of the minority class.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric, leading candidates to overlook its failure in imbalanced scenarios where a trivial classifier can achieve high accuracy.

How to eliminate wrong answers

Option A is wrong because Mean Squared Error (MSE) is a regression metric that measures average squared differences between predicted and actual values, not suitable for binary classification tasks like loan default prediction. Option C is wrong because accuracy is misleading on imbalanced datasets; a model predicting all non-default would achieve 95% accuracy but fail to identify any actual defaults. Option D is wrong because R-squared is a regression metric that indicates the proportion of variance explained by the model, inappropriate for evaluating classification performance on imbalanced data.
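
The 95% figure can be reproduced directly. This sketch scores a trivial classifier that always predicts non-default on a 95/5 split; the helper name and the label encoding (1 = default) are assumptions for the example.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# 5 defaults (1) and 95 non-defaults (0); the model never predicts default.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)  # 0.95 0.0
```

Accuracy rewards the classifier for the majority class alone, while F1 drops to zero because no default is ever caught.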

103

MCQhard

A data scientist notices the model overfits. Which change to the exhibit's configuration would most likely reduce overfitting?

A.Remove dropout layers

B.Increase learning rate to 0.01

C.Add L2 regularization to dense layers

D.Increase units in the first dense layer to 512

AnswerC

L2 regularization adds a penalty on large weights, discouraging complex models and reducing overfitting.

Why this answer

Adding L2 regularization to dense layers penalizes large weights by adding a squared magnitude term to the loss function, which forces the model to learn simpler patterns and reduces overfitting. This directly addresses the core issue of the model memorizing noise in the training data.

Exam trap

CompTIA often tests the misconception that increasing model capacity (more units or layers) or removing regularization always improves performance, when in fact these changes exacerbate overfitting; candidates must recognize that regularization techniques like L2 are specifically designed to penalize complexity and reduce overfitting.

How to eliminate wrong answers

Option A is wrong because removing dropout layers would actually increase overfitting, as dropout is a regularization technique that randomly drops neurons during training to prevent co-adaptation. Option B is wrong because increasing the learning rate to 0.01 (a relatively high value) can cause the optimizer to overshoot minima and lead to unstable training, but it does not directly reduce overfitting; in fact, a too-high learning rate may prevent convergence altogether. Option D is wrong because increasing units in the first dense layer to 512 adds more parameters to the model, which increases capacity and typically worsens overfitting rather than reducing it.
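
The exhibit's configuration is not reproduced here, so the following is a framework-independent sketch of what an L2 penalty does: it adds lam * sum(w^2) to the loss, which contributes 2 * lam * w to each weight's gradient. The function names, weights, and the value of `lam` are assumptions for illustration.

```python
def l2_penalty(weights, lam):
    """L2 term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def sgd_step(weights, grads, lr, lam):
    """One SGD step on data loss plus L2 penalty; the penalty adds 2 * lam * w to each gradient."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


weights = [3.0, -2.0, 0.5]
lam = 0.01

# Penalty is 0.01 * (9 + 4 + 0.25).
print(round(l2_penalty(weights, lam), 4))  # 0.1325

# With zero data gradient, the penalty alone shrinks every weight toward zero.
shrunk = sgd_step(weights, grads=[0.0, 0.0, 0.0], lr=0.1, lam=lam)
print(all(abs(new) < abs(old) for new, old in zip(shrunk, weights)))  # True
```

Larger weights are pulled back harder than small ones, which is why the penalty discourages the sharp, high-magnitude fits associated with memorizing training noise.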
