CCNA AI Concepts Foundations Questions

75 of 103 questions · Page 1/2 · AI Concepts Foundations topic · Answers revealed

1
MCQ · easy

A data scientist is training a binary classification model to detect fraudulent transactions. The dataset has 99% legitimate transactions and 1% fraudulent. The model achieves 99% accuracy but fails to catch most fraud. Which metric should the team prioritize to evaluate model performance?

A. F1 score
B. Precision
C. Accuracy
D. Recall
Answer: D

Recall measures the ability to catch fraudulent transactions, which is the primary goal.

Why this answer

Recall (sensitivity) measures the proportion of actual positive cases (fraud) correctly identified. With 99% accuracy but failing to catch most fraud, the model is biased toward the majority class (legitimate transactions), so recall is the critical metric to ensure fraud detection improves.

Exam trap

CompTIA often tests the misconception that high accuracy implies good model performance, especially in imbalanced datasets, leading candidates to overlook recall as the appropriate metric for minority class detection.

How to eliminate wrong answers

Option A is wrong because the F1 score is the harmonic mean of precision and recall; it is a useful summary, but it blends two quantities and does not isolate the model's ability to catch fraud, which is the specific failure described. Option B is wrong because precision measures how many predicted frauds are actually fraud; the model's problem is missed fraud (false negatives), not false alarms, so recall is the primary concern. Option C is wrong because accuracy is misleading on imbalanced datasets: predicting 'legitimate' for every transaction already yields 99% accuracy, which explains how the model can score well while detecting almost no fraud.
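The gap between accuracy and recall can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 990/10 split and the "always legitimate" model are hypothetical stand-ins for the scenario, and the helper names are not from any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Guard against division by zero when there are no positives at all.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical data: 990 legitimate (0) and 10 fraudulent (1) transactions.
y_true = [0] * 990 + [1] * 10
# A degenerate model that labels every transaction as legitimate.
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))  # 0.99
print(recall(y_true, y_pred))    # 0.0
```

The model is right 99% of the time and still catches none of the fraud, which is why recall is the metric to watch here.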

2
Multi-Select · medium

When evaluating a binary classification model, which two metrics are most appropriate for imbalanced datasets? (Choose two.)

Select 2 answers
A. Accuracy
B. Mean absolute error
C. Recall
D. R-squared
E. Precision
Answers: C, E

Recall measures the proportion of actual positives correctly identified, essential for capturing minority class.

Why this answer

Recall (Option C) is correct because it measures the proportion of actual positive cases correctly identified, which is critical in imbalanced datasets where the minority class is of primary interest. Precision (Option E) is correct because it measures the accuracy of positive predictions, helping to avoid false positives when the positive class is rare. Together, recall and precision provide a balanced view of model performance on the minority class, unlike accuracy which can be misleadingly high by simply predicting the majority class.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric, but the trap here is that accuracy fails on imbalanced datasets, and candidates must recognize that recall and precision are the appropriate pair for evaluating minority class performance.

3
MCQ · hard

Refer to the exhibit. A deep learning model is being trained. Based on the training log, which problem is most evident?

A. Vanishing gradients
B. Overfitting
C. Underfitting
D. Data leakage
Answer: B

Training loss decreases, validation loss increases.

Why this answer

The training log shows that the training loss continues to decrease while the validation loss increases after a certain epoch, which is a classic sign of overfitting. The model is memorizing the training data rather than learning generalizable patterns, leading to poor performance on unseen data.

Exam trap

CompTIA often tests the distinction between overfitting and underfitting by showing loss curves where training loss decreases but validation loss increases, which candidates may misinterpret as a normal training progression or as vanishing gradients.

How to eliminate wrong answers

Option A is wrong because vanishing gradients typically manifest as stagnant or very slow learning across both training and validation metrics, not as diverging loss curves. Option C is wrong because underfitting would show high training loss and high validation loss without improvement, not a decreasing training loss with an increasing validation loss. Option D is wrong because data leakage usually causes unusually high performance on both training and validation sets from the start, not a divergence after initial improvement.

4
MCQ · medium

Refer to the exhibit. The model is a neural network for 10-class classification. The training log shows no improvement over 5 epochs. Which of the following is the most likely root cause?

A. The batch size is too large, making gradient updates insignificant.
B. The output layer uses sigmoid activation instead of softmax.
C. The learning rate is too high, causing the loss to oscillate.
D. The model is suffering from vanishing gradients, preventing weight updates.
Answer: D

Vanishing gradients can cause no learning, leading to constant loss and random accuracy.

Why this answer

The training log shows no improvement over 5 epochs, which is a classic symptom of vanishing gradients in deep neural networks. When gradients become extremely small during backpropagation, weight updates are negligible, causing the loss to stagnate. This is especially common in deep networks with sigmoid or tanh activations, where gradients saturate in the tails of the activation function.

Exam trap

CompTIA often tests the distinction between symptoms of high learning rate (oscillation/divergence) and vanishing gradients (flat loss), so candidates mistakenly choose 'learning rate too high' when they see no improvement, but the key clue is the absence of oscillation or divergence in the loss curve.

How to eliminate wrong answers

Option A is wrong because a batch size that is too large produces smoother but less frequent gradient updates, which can slow convergence but does not cause complete stagnation; the loss would still decrease, only slowly. Option B is wrong because sigmoid outputs in a 10-class output layer do not sum to 1, making them unsuitable for multi-class probability estimation, but this would not stop the loss from changing: the model would still update its weights, albeit toward a poorly specified objective. Option C is wrong because a learning rate that is too high causes the loss to oscillate or diverge, not to remain flat with no improvement; the loss would show erratic behavior or NaN values, not a steady plateau.
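To see why sigmoid activations make deep networks prone to vanishing gradients, note that the sigmoid derivative never exceeds 0.25, and backpropagation multiplies one such factor per layer. The sketch below is a simplification that assumes all weights equal 1 and every pre-activation sits at 0 (the best case for sigmoid); `backprop_scale` is a hypothetical helper, not a library function.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_scale(pre_activations):
    """Product of sigmoid derivatives along one path through the layers.

    Assumes every weight on the path is 1, so only the activation
    derivatives shrink the gradient.
    """
    scale = 1.0
    for z in pre_activations:
        scale *= sigmoid_grad(z)
    return scale


max_grad = sigmoid_grad(0.0)          # 0.25
shallow = backprop_scale([0.0] * 2)   # 0.25 ** 2 = 0.0625
deep = backprop_scale([0.0] * 20)     # 0.25 ** 20, roughly 9.1e-13

print(max_grad, shallow, deep)
```

Even in this best case, the gradient reaching the first layer of a 20-layer stack is about a trillionth of the gradient at the output, so early-layer weights barely move and the loss stays flat.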

5
MCQ · easy

A marketing team wants to segment customers into groups based on purchasing behavior without predefined categories. Which algorithm should they use?

A. K-means clustering
B. Naive Bayes classifier
C. Logistic regression
D. Support vector machine
Answer: A

K-means is an unsupervised algorithm that groups data into clusters based on similarity, perfect for segmentation.

Why this answer

K-means clustering is an unsupervised learning algorithm that groups data points into clusters based on similarity without requiring predefined labels. Since the marketing team wants to segment customers based on purchasing behavior without predefined categories, K-means is the correct choice as it discovers natural groupings in the data.

Exam trap

CompTIA often tests the distinction between supervised and unsupervised learning, and the trap here is that candidates may confuse clustering (unsupervised) with classification (supervised) algorithms, leading them to pick a classifier like Naive Bayes or logistic regression instead of K-means.

How to eliminate wrong answers

Option B (Naive Bayes classifier) is wrong because it is a supervised learning algorithm that requires labeled training data to classify instances into predefined categories, making it unsuitable for discovering unknown segments. Option C (Logistic regression) is wrong because it is a supervised learning algorithm used for binary classification tasks, not for unsupervised clustering or segmentation without predefined groups. Option D (Support vector machine) is wrong because it is a supervised learning algorithm that separates data into predefined classes using hyperplanes, not for discovering hidden patterns or groupings in unlabeled data.
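For readers who want to see what "grouping without labels" looks like in practice, here is a minimal sketch of the K-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The customer data, the choice of two starting centroids, and the function name `kmeans` are all hypothetical; a real project would more likely use a library implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (orders per month, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (9, 200), (10, 210), (8, 190)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])

print(clusters[0])  # low-spend customers
print(clusters[1])  # high-spend customers
```

No labels are supplied anywhere: the two segments emerge purely from distances between the points.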

6
MCQ · hard

An AI team is deploying a predictive maintenance model for industrial equipment. The model predicts failure within a 30-day window. The cost of a false positive is 10% of the cost of a false negative. Which evaluation metric should the team prioritize?

A. F2 score (beta=2) to prioritize recall over precision.
B. Area under the ROC curve (AUC-ROC) to measure overall discrimination.
C. F1 score to balance precision and recall equally.
D. Precision to minimize false positives.
Answer: A

F2 score puts more weight on recall, aligning with the higher cost of false negatives.

Why this answer

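A false negative (a missed failure) costs ten times as much as a false positive, so the metric should weight recall more heavily than precision. The F-beta score with beta > 1 does exactly this, and F2 (beta = 2) is the option offered. Precision alone ignores missed failures, which are the costlier error. AUC-ROC summarizes discrimination across all thresholds but does not incorporate costs, and F1 weights precision and recall equally, which is unsuitable when false negatives are costlier.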
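The effect of beta is easiest to see by scoring two models with mirrored precision and recall. The sketch below uses the standard F-beta formula, (1 + beta^2) * P * R / (beta^2 * P + R); the two precision/recall pairs are made-up values for illustration.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Two hypothetical models, given as (precision, recall).
high_recall = (0.5, 0.9)
high_precision = (0.9, 0.5)

f1_a = f_beta(*high_recall, beta=1)
f1_b = f_beta(*high_precision, beta=1)
f2_a = f_beta(*high_recall, beta=2)
f2_b = f_beta(*high_precision, beta=2)

print(f"F1: {f1_a:.3f} vs {f1_b:.3f}")  # identical: F1 cannot tell them apart
print(f"F2: {f2_a:.3f} vs {f2_b:.3f}")  # the high-recall model scores higher
```

F1 rates the two models the same, while F2 clearly prefers the one that misses fewer failures, which matches the cost structure in the question.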

7
MCQ · hard

A retail company deploys a machine learning model to predict customer churn. The model outputs a probability between 0 and 1, and churn is predicted if probability > 0.5. After deployment, the model has a high false positive rate (many non-churning customers labeled as churn), which leads to unnecessary retention offers and increased costs. The data science team confirms the model was trained on historical data with a balanced class distribution. The business team wants to reduce false positives while maintaining a reasonable true positive rate. However, they cannot retrain the model because the original training data is no longer available. What is the best course of action to reduce false positives?

A. Retrain the model using only the most recent three months of data.
B. Increase the decision threshold to a higher value, such as 0.7.
C. Collect new labeled data and perform transfer learning from the original model.
D. Decrease the decision threshold to a lower value, such as 0.3.
Answer: B

A higher threshold requires stronger evidence for churn, thus reducing false positives.

Why this answer

Raising the decision threshold (e.g., to 0.7) reduces false positives because only high-confidence predictions are classified as churn, at the cost of possibly missing some true churners. It requires neither retraining nor new data. Lowering the threshold (Option D) would increase false positives. Retraining (Option A) is not possible because the original training data is unavailable, and collecting new labeled data (Option C) would take time and still require further training.
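The threshold trade-off can be illustrated with a handful of scored customers. The probabilities and outcomes below are invented for the example, and `count_errors` is a hypothetical helper; the point is only how the counts move as the threshold changes.

```python
def count_errors(probs, labels, threshold):
    """Return (false_positives, true_positives) when churn is predicted if prob > threshold."""
    fp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 0)
    tp = sum(1 for p, y in zip(probs, labels) if p > threshold and y == 1)
    return fp, tp


# Hypothetical model scores and true outcomes (1 = customer actually churned).
probs = [0.95, 0.85, 0.75, 0.65, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

for t in (0.3, 0.5, 0.7):
    fp, tp = count_errors(probs, labels, t)
    print(f"threshold={t}: false positives={fp}, true positives={tp}")
# threshold=0.3: false positives=3, true positives=4
# threshold=0.5: false positives=2, true positives=4
# threshold=0.7: false positives=0, true positives=3
```

Moving from 0.5 to 0.7 removes both false positives but also drops one true churner, so the business team still has to pick the point on this trade-off that it finds acceptable. The same mechanism run in reverse (a lower threshold) raises recall at the expense of precision.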

8
MCQ · medium

A company deploys a chatbot using a large language model (LLM). After launch, users report that the chatbot sometimes generates plausible but false information. This phenomenon is known as:

A. Gradient explosion
B. Overfitting
C. Concept drift
D. Hallucination
Answer: D

Correct; LLMs often produce false information convincingly.

Why this answer

Option D is correct because hallucination in LLMs refers to the generation of plausible but factually incorrect or nonsensical information. This occurs when the model's probabilistic next-token prediction produces confident-sounding outputs that deviate from training data or real-world facts, often due to insufficient grounding or training data gaps.

Exam trap

The trap here is that candidates may confuse hallucination with overfitting, thinking the model is 'making up' data due to memorization errors, but overfitting is about poor generalization to new inputs, not confident false outputs from a well-generalized model.

How to eliminate wrong answers

Option A is wrong because gradient explosion is a training instability issue in deep neural networks where gradients become excessively large, causing weight updates to diverge; it does not relate to post-deployment output inaccuracies. Option B is wrong because overfitting describes a model that memorizes training data too well, performing poorly on unseen data, not generating false information that seems plausible. Option C is wrong because concept drift refers to a change in the statistical properties of the target variable over time, requiring model retraining, not a static LLM generating false outputs.

9
Multi-Select · easy

A data analyst needs to select two appropriate unsupervised learning techniques for clustering unlabeled data. (Choose two.)

Select 2 answers
A. Linear regression
B. Support vector machine
C. Hierarchical clustering
D. Decision tree
E. K-means
Answers: C, E

Hierarchical clustering is an unsupervised algorithm that builds a hierarchy of clusters.

Why this answer

K-means and hierarchical clustering are both unsupervised learning algorithms for clustering data into groups without labels.

10
Multi-Select · medium

A data scientist is building a natural language processing model to classify customer reviews as positive or negative. Which TWO preprocessing steps are most essential before tokenization? (Select two.)

Select 2 answers
A. Perform stemming or lemmatization.
B. Remove punctuation and special characters.
C. Convert all text to lowercase.
D. Remove stop words from the text.
E. Replace missing values with a placeholder.
Answers: B, C

Removing punctuation helps tokens become clean words.

Why this answer

Removing punctuation and special characters (Option B) is essential because simple tokenizers split on whitespace, so punctuation attached to words (e.g., 'great!', 'bad.') would create noisy tokens like 'great!' and 'bad.' instead of clean tokens 'great' and 'bad'. Converting all text to lowercase (Option C) ensures that words like 'Great', 'great', and 'GREAT' are all mapped to the same token, preventing the model from treating them as distinct features and reducing vocabulary size.

Exam trap

CompTIA often tests the ordering of preprocessing steps, and the trap here is that candidates mistakenly believe stemming, lemmatization, or stop word removal should be done before tokenization, when in fact tokenization must come first to split the text into tokens for those later steps to operate on.
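The two steps from the answer can be sketched with the Python standard library alone. This is a minimal illustration that assumes ASCII punctuation and a plain whitespace tokenizer; the sample review and the `preprocess` name are made up.

```python
import string


def preprocess(text):
    """Lowercase, strip ASCII punctuation, then tokenize on whitespace."""
    lowered = text.lower()
    # str.maketrans with a third argument builds a table that deletes those characters.
    cleaned = lowered.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()


tokens = preprocess("Great! The product is GREAT, not bad.")
print(tokens)  # ['great', 'the', 'product', 'is', 'great', 'not', 'bad']
```

Both spellings of "great" collapse to one token and the trailing punctuation disappears. One caveat of this blunt approach is that apostrophes are removed too, so "don't" becomes "dont"; stemming, lemmatization, and stop-word removal would then run on the resulting token list.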

11
MCQ · hard

A manufacturing company is using a convolutional neural network (CNN) to detect defects on an assembly line. The model was trained on a balanced dataset of defective and non-defective parts. In production, the model shows high precision (95%) but very low recall (50%). The production line manager wants to minimize missed defects (false negatives). The data scientist has access to the original training data and can retrain the model. Which strategy is most effective for increasing recall while maintaining acceptable precision?

A. Apply data augmentation to defective images
B. Lower the classification threshold for the defective class
C. Use a bagging ensemble of CNNs
D. Oversample the defective class in training
Answer: B

Lowering the threshold increases sensitivity (recall) as more instances are classified as defective, directly reducing false negatives.

Why this answer

Lowering the classification threshold for the defective class directly addresses the recall issue by allowing more samples to be classified as defective, which reduces false negatives. This is the most immediate and effective method because it does not require retraining and can be tuned to balance precision and recall based on the manager's priority of minimizing missed defects.

Exam trap

CompTIA often tests the misconception that retraining with data augmentation or oversampling is the only way to fix recall issues, when in fact threshold tuning is a simpler and more direct post-training adjustment that does not require model retraining.

How to eliminate wrong answers

Option A is wrong because data augmentation on defective images primarily helps with generalization and overfitting, not with shifting the decision boundary to increase recall; it may improve model robustness but does not directly increase the number of true positives at inference time. Option C is wrong because a bagging ensemble of CNNs reduces variance and can improve overall accuracy, but it does not specifically target the recall-precision trade-off and may even lower recall if the ensemble's voting threshold remains unchanged. Option D is wrong because oversampling the defective class in training addresses class imbalance but the model was already trained on a balanced dataset; oversampling would not solve the underlying issue of the model's conservative decision boundary, and it could lead to overfitting on defective samples without guaranteeing higher recall.

12
MCQ · medium

A company uses an AI model to screen job applications. The model is trained on historical hiring data that reflects past biases. After deployment, the model disproportionately rejects candidates from certain demographics. Which concept does this best illustrate?

A. Overfitting
B. Model drift
C. Algorithmic bias
D. Underfitting
Answer: C

Correct; this describes the biased outcome due to biased data.

Why this answer

Option C is correct because algorithmic bias refers to systematic and unfair discrimination in AI outputs due to biased training data or model design. Option A (overfitting) describes a model that performs well on training data but poorly on new data due to excessive complexity. Option B (model drift) is performance degradation over time due to changes in data distribution. Option D (underfitting) occurs when a model is too simple to capture patterns.

13
MCQ · medium

A self-driving car uses an AI model that learns by trial and error, receiving rewards for correct actions and penalties for mistakes. This type of learning is:

A. Supervised learning
B. Unsupervised learning
C. Transfer learning
D. Reinforcement learning
Answer: D

Correct; RL uses rewards to learn optimal actions.

Why this answer

Reinforcement learning (RL) is the correct answer because the self-driving car's AI model learns through trial and error, receiving rewards for correct actions and penalties for mistakes. This feedback-driven process, where an agent interacts with an environment to maximize cumulative reward, is the defining characteristic of reinforcement learning, not supervised or unsupervised learning.

Exam trap

CompTIA often tests the distinction between reinforcement learning and supervised learning by describing a scenario with feedback (rewards/penalties) but no labeled dataset, leading candidates to mistakenly choose supervised learning because they associate 'feedback' with 'labels'.

How to eliminate wrong answers

Option A is wrong because supervised learning requires labeled input-output pairs (e.g., images tagged with 'stop sign') to train a model, not trial-and-error feedback. Option B is wrong because unsupervised learning finds hidden patterns in unlabeled data (e.g., clustering sensor readings) without any reward or penalty signals. Option C is wrong because transfer learning applies knowledge from a pre-trained model to a new but related task, not learning from scratch via rewards and punishments.

14
Multi-Select · medium

Which TWO of the following are appropriate uses of unsupervised learning?

Select 2 answers
A. Classifying emails as spam or not spam
B. Predicting the sale price of a house given its features
C. Detecting unusual patterns in network traffic that may indicate a cyberattack
D. Identifying a person from a photo
E. Segmenting customers into groups based on purchasing behavior
Answers: C, E

Anomaly detection often uses unsupervised methods.

Why this answer

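Unsupervised learning discovers hidden patterns or structures in unlabeled data. Detecting unusual patterns in network traffic (option C) is a classic anomaly detection task, often performed using clustering or autoencoders, where the model learns 'normal' behavior and flags deviations without requiring labeled attack data. Segmenting customers by purchasing behavior (option E) is a clustering task: the groups are discovered from the data rather than defined in advance. The remaining options (spam classification, price prediction, identifying a person from a photo) all rely on labeled examples and are supervised tasks.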

Exam trap

CompTIA often tests the distinction between supervised and unsupervised learning by presenting tasks that seem 'automatic' but actually require labeled data, tricking candidates into choosing supervised tasks as unsupervised uses.

15
MCQ · hard

A data scientist trains a deep neural network for image classification. The training loss decreases but validation loss starts increasing after 50 epochs. What should the data scientist do to improve generalization?

A. Decrease batch size
B. Apply dropout and early stopping
C. Add more hidden layers
D. Increase learning rate
Answer: B

Dropout randomly ignores neurons during training to reduce overfitting, and early stopping halts training when validation loss worsens, preventing further overfitting.

Why this answer

The increasing validation loss while training loss decreases is a classic sign of overfitting. Dropout randomly deactivates neurons during training, which prevents co-adaptation and forces the network to learn more robust features. Early stopping halts training when validation performance stops improving, directly addressing the overfitting by selecting the model with the best generalization before it degrades.

Exam trap

CompTIA often tests the misconception that increasing model complexity (more layers) or adjusting batch size/learning rate can fix overfitting, when in reality these changes either exacerbate the problem or address unrelated training dynamics.

How to eliminate wrong answers

Option A is wrong because decreasing batch size introduces more noise into the gradient estimates; this can have a mild regularizing effect, but it may also slow or destabilize convergence and is not a direct or reliable cure for overfitting. Option C is wrong because adding more hidden layers increases model capacity and complexity, which typically worsens overfitting by allowing the network to memorize the training data even more. Option D is wrong because increasing the learning rate can cause the optimizer to overshoot minima, leading to divergence or poor convergence, and does not address the fundamental issue of the model fitting noise in the training data.
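Early stopping is simple enough to sketch without a deep learning framework. The function below is a hypothetical, framework-free version of the usual "patience" rule: stop once validation loss has failed to improve for a set number of consecutive epochs, and keep the epoch with the lowest loss. The loss values are invented to mimic the curve described in the question.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 0-indexed.

    best_epoch is the epoch with the lowest validation loss seen so far;
    stop_epoch is where training halts after `patience` epochs without improvement.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
best, stopped = early_stopping_epoch(val_losses, patience=3)
print(best, stopped)  # 3 6
```

Training halts at epoch 6, and the weights saved at epoch 3 (the validation minimum) are the ones to keep. Deep learning frameworks typically offer this as a built-in callback, which is preferable to hand-rolling it.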

16
Multi-Select · hard

Which TWO of the following are techniques used for reducing overfitting in neural networks? (Choose two.)

Select 2 answers
A. Dropout
B. Boosting
C. L2 regularization
D. Increasing the learning rate
E. Increasing the number of hidden layers
Answers: A, C

Dropout randomly drops neurons to reduce overfitting.

Why this answer

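Dropout (Option A) is a regularization technique that randomly drops a fraction of neurons during training, which prevents the network from relying too heavily on any single neuron and forces it to learn more robust features. This reduces overfitting by introducing noise that improves generalization. L2 regularization (Option C) adds a penalty proportional to the sum of squared weights to the loss, which discourages large weights and so limits how closely the network can fit noise in the training data.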

Exam trap

CompTIA often tests the distinction between regularization techniques and other training strategies, so the trap here is that candidates may confuse boosting (an ensemble method) with regularization, or assume that increasing model complexity (more layers) or learning rate can help reduce overfitting when they actually do the opposite.
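Both techniques reduce to a few lines of arithmetic. The sketch below shows an L2 penalty term and "inverted" dropout (survivors are scaled by 1/(1 - rate) so the expected activation is unchanged). The weights, the regularization strength `lam`, and the function names are illustrative assumptions, not any framework's API.

```python
import random


def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`,
    and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical weights: the penalty is 0.01 * (0.25 + 1.0 + 4.0), about 0.0525.
weights = [0.5, -1.0, 2.0]
penalty = l2_penalty(weights, lam=0.01)

# Seeded generator so the random mask is reproducible between runs.
rng = random.Random(0)
dropped = dropout([1.0] * 8, rate=0.5, rng=rng)

print(penalty)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Dropout is applied only during training; at inference time all neurons are kept, which is why the rescaling during training matters.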

17
MCQ · hard

An organization is developing an AI system to approve loan applications. They want to ensure the model does not discriminate based on race or gender. Which technique BEST addresses this concern?

A. Remove race and gender features from the training data.
B. Use a more complex model to capture nuances.
C. Apply adversarial debiasing during model training.
D. Collect more training data from diverse populations.
Answer: C

Correct; adversarial debiasing learns fair representations.

Why this answer

Adversarial debiasing is a technique that trains the model alongside an adversary that tries to predict sensitive attributes (like race or gender) from the model's outputs or internal representations; the model is penalized whenever the adversary succeeds, which discourages it from learning discriminatory patterns even if correlated features remain. This directly targets fairness by pushing the model's predictions toward independence from protected attributes, which is more robust than simply removing features (which can still allow proxy discrimination).

Exam trap

CompTIA often tests the misconception that removing protected attributes is sufficient to eliminate bias, when in reality proxy features and correlated variables can still cause discrimination, making adversarial debiasing or other fairness-aware algorithms necessary.

How to eliminate wrong answers

Option A is wrong because simply removing race and gender features does not prevent the model from learning proxies for these attributes (e.g., zip code, income bracket) that can still lead to discriminatory outcomes. Option B is wrong because using a more complex model increases the risk of overfitting to spurious correlations and does not inherently address fairness; it may even amplify biases present in the data. Option D is wrong because collecting more diverse data does not guarantee fairness; biased labeling, historical discrimination, or imbalanced representation can persist, and the model may still learn to discriminate unless debiasing techniques are applied.

18
MCQ · hard

A cybersecurity firm is developing an AI system to detect zero-day malware using behavior analysis. The team collects a dataset of 1,000 malware samples and 10,000 benign files from corporate endpoints. The model is a random forest classifier. After deployment, the false positive rate is 5%, which is acceptable, but the detection rate for new malware variants drops to 30%. The security analyst suspects the model is overfitting to the specific malware families in the training set. Which improvement should the team implement first?

A. Use a boosting ensemble instead of bagging
B. Collect more malware samples from the same families
C. Replace the random forest with a deep neural network
D. Engineer features that capture generic behavioral patterns
Answer: D

Generic features (e.g., process creation frequency, registry changes) help the model learn behaviors common to malware, improving detection of new variants.

Why this answer

The core issue is that the model has overfitted to the specific malware families in the training set, causing poor generalization to unseen zero-day variants. Engineering features that capture generic behavioral patterns (e.g., API call sequences, file system interactions, network connection anomalies) reduces reliance on family-specific signatures, improving detection of novel malware. This directly addresses the root cause of the drop in detection rate to 30% without introducing new model complexity or data imbalance issues.

Exam trap

CompTIA often tests the misconception that more complex models (boosting, DNNs) automatically improve performance, when in reality, feature engineering to address the specific failure mode (overfitting to training families) is the most effective first step.

How to eliminate wrong answers

Option A is wrong because boosting ensembles (e.g., AdaBoost, XGBoost) are more prone to overfitting on noisy data than bagging (Random Forest), which would exacerbate the existing overfitting problem. Option B is wrong because collecting more samples from the same families reinforces the model's bias toward those specific patterns, worsening generalization to new variants. Option C is wrong because replacing Random Forest with a deep neural network (DNN) typically requires significantly more data to avoid overfitting, and with only 1,000 malware samples, a DNN would likely perform worse, not better.

19
MCQ · easy

A company is building a recommendation system for an e-commerce platform. They want the system to learn from user purchase history and browsing behavior to suggest products. Which type of machine learning is most appropriate for this task?

A. Supervised learning
B. Semi-supervised learning
C. Unsupervised learning
D. Transfer learning
Answer: C

Unsupervised learning can find patterns in user behavior without labels, suitable for recommendations.

Why this answer

Unsupervised learning is the most appropriate because the system must discover hidden patterns and groupings in user purchase history and browsing behavior without labeled outcomes. Recommendation systems often use clustering or association rule mining (e.g., market basket analysis) to identify product affinities and user segments, which are core unsupervised techniques. This allows the system to suggest products based on learned co-occurrence patterns rather than predefined categories.

Exam trap

CompTIA often tests the misconception that recommendation systems always require labeled data, leading candidates to choose supervised learning, but the key is that unsupervised learning excels at finding hidden structures in unlabeled behavioral data.

How to eliminate wrong answers

Option A is wrong because supervised learning requires labeled training data (e.g., explicit ratings or purchase/no-purchase labels), which the scenario does not provide; the system must learn from unlabeled behavioral data. Option B is wrong because semi-supervised learning still requires a small amount of labeled data to guide the learning, but the problem statement specifies only raw purchase history and browsing behavior with no labels. Option D is wrong because transfer learning involves applying knowledge from a pre-trained model on a different but related task, which is unnecessary here since the system can learn directly from the available data without needing to transfer from another domain.

20
MCQ · hard

An e-commerce company deploys a recommendation system using collaborative filtering. After launch, the system shows high accuracy for popular items but fails to recommend niche products to users who would likely buy them. Which technique should the team implement to improve recommendations for long-tail items?

A. Apply matrix factorization with higher latent factors
B. Switch to a hybrid filtering approach that incorporates item metadata
C. Increase the weight of popular items in the recommendation score
D. Collect more user interaction data over time
Answer: B

Hybrid filtering uses item features to recommend niche items even with sparse interaction data.

Why this answer

Collaborative filtering relies on user-item interactions, which are sparse for niche products (the long tail). A hybrid filtering approach that incorporates item metadata (e.g., category, description, attributes) can bridge the gap by using content-based signals to recommend niche items even when interaction data is limited. This directly addresses the cold-start and sparsity problems for long-tail items.

Exam trap

CompTIA often tests the misconception that more data or higher model complexity (like more latent factors) automatically solves sparsity, when in fact the core issue is the lack of interaction signals for niche items, which requires a hybrid approach to incorporate auxiliary information.

How to eliminate wrong answers

Option A is wrong because increasing latent factors in matrix factorization can lead to overfitting and does not inherently solve the sparsity problem for long-tail items; it may even amplify noise. Option C is wrong because increasing the weight of popular items would further bias recommendations toward the head of the distribution, worsening the neglect of niche products. Option D is wrong because simply collecting more user interaction data over time does not guarantee that long-tail items will receive sufficient interactions; the data will still be skewed toward popular items, and the system needs a mechanism to leverage non-interaction signals like metadata.

21
MCQ · medium

A data scientist is training a neural network to classify images of animals. The training accuracy is 99%, but validation accuracy is only 65%. Which technique should the data scientist use to address this issue?

A. Apply batch normalization
B. Increase the number of training epochs
C. Add dropout layers to the network
D. Increase the learning rate
Answer: C

Dropout randomly deactivates neurons, which reduces overfitting by making the model less sensitive to specific weights.

Why this answer

The high training accuracy (99%) and low validation accuracy (65%) indicate overfitting, where the model memorizes the training data but fails to generalize. Adding dropout layers randomly drops neurons during training, which forces the network to learn more robust features and reduces overfitting. This technique is specifically designed to improve generalization without requiring more data or altering the learning rate.

Exam trap

CompTIA often tests the distinction between techniques that improve training speed (batch normalization, learning rate tuning) versus those that improve generalization (dropout, regularization), and the trap here is that candidates may confuse overfitting with underfitting or assume that more training always helps.

How to eliminate wrong answers

Option A is wrong because batch normalization normalizes layer inputs to stabilize and accelerate training but does not directly address overfitting; although it can have a mild regularizing effect that slightly reduces the need for dropout, it is not the primary solution for this gap. Option B is wrong because increasing the number of training epochs would likely worsen overfitting, as the model would have more opportunities to memorize the training data, further increasing the accuracy gap. Option D is wrong because increasing the learning rate can cause the model to converge too quickly to a suboptimal solution or diverge, and it does not target the root cause of overfitting.
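
To make the mechanism concrete, here is a minimal sketch of dropout on a list of activations, assuming the common "inverted dropout" variant in which kept activations are scaled up during training so that nothing needs to change at inference. The function name and sample values are illustrative only.

```python
import random


def dropout(activations, rate, training, rng=None):
    """Inverted dropout: zero each activation with probability `rate`
    during training and scale the survivors by 1 / (1 - rate).
    Assumes 0 <= rate < 1. At inference the input passes through unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


layer_output = [0.5, 1.2, -0.3, 2.0]
# Training: a random subset of neurons is silenced on each forward pass.
train_out = dropout(layer_output, rate=0.5, training=True, rng=random.Random(0))
# Inference: dropout is disabled.
eval_out = dropout(layer_output, rate=0.5, training=False)
```

Because a different subset of neurons is dropped on every pass, the network cannot rely on any single neuron, which is the source of the regularizing effect.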

22
MCQmedium

A startup is building a chatbot to handle customer inquiries. They want the chatbot to understand context and provide accurate responses without requiring extensive labeled data. Which AI approach is most suitable?

A.Reinforcement learning from human feedback
B.Rule-based natural language processing
C.Convolutional neural networks (CNNs)
D.Transfer learning with a pre-trained transformer model
AnswerD

Transfer learning leverages pre-trained language models and fine-tunes with small data.

Why this answer

Transfer learning with a pre-trained transformer model (e.g., BERT, GPT) is the most suitable approach because it allows the chatbot to understand context and generate accurate responses using knowledge learned from vast general-domain text, requiring only minimal fine-tuning on the startup's specific customer inquiry data. This eliminates the need for extensive labeled datasets, as the model already captures nuanced language patterns and contextual relationships through its self-attention mechanism.

Exam trap

CompTIA often tests the misconception that RLHF alone reduces the need for labeled data, when in fact it requires a pre-trained model and a reward model trained on human preferences, making transfer learning the more direct solution for minimizing labeled data requirements.

How to eliminate wrong answers

Option A is wrong because reinforcement learning from human feedback (RLHF) is a fine-tuning technique that still requires a substantial initial labeled dataset or a reward model, and it is typically applied on top of a pre-trained model rather than being a standalone solution for reducing labeled data needs. Option B is wrong because rule-based NLP relies on handcrafted rules and pattern matching, which cannot handle the variability and contextual ambiguity of natural language in customer inquiries without extensive manual effort and brittle maintenance. Option C is wrong because convolutional neural networks (CNNs) are primarily designed for spatial pattern recognition (e.g., images) and, while they can be applied to text, they lack the sequential context modeling and long-range dependency capture that transformer architectures provide, making them less effective for conversational understanding.

23
MCQeasy

A data scientist is training a model to classify customer support tickets into categories. The dataset has 10,000 labeled examples, but the 'billing' category contains 8,000 examples while the 'technical' category contains 2,000. Which technique is most appropriate to address this imbalance before training?

A.Apply random oversampling on the 'technical' category.
B.Remove all examples except 'billing' and use a one-class classifier.
C.Use accuracy as the only evaluation metric.
D.Train the model as is, then adjust thresholds post-training.
AnswerA

Correct; oversampling balances the classes.

Why this answer

Option A is correct because random oversampling duplicates examples from the minority class ('technical') to balance the class distribution, preventing the model from becoming biased toward the majority class ('billing'). This technique directly addresses the class imbalance before training, which is critical for classification tasks where the minority class is underrepresented.

Exam trap

CompTIA often tests the misconception that adjusting thresholds post-training can compensate for class imbalance. The trap here is that the question asks for a fix applied before training, and a model trained on imbalanced data has already learned a decision boundary skewed toward the majority class, so threshold tuning alone is a weaker remedy than balancing the data first.

How to eliminate wrong answers

Option B is wrong because keeping only the 'billing' examples discards every 'technical' example, leaving a one-class classifier that never sees the second category and cannot learn to distinguish between them, which defeats the purpose of the classification task. Option C is wrong because accuracy is a misleading metric for imbalanced datasets; a model that always predicts 'billing' would achieve 80% accuracy without learning anything about 'technical' tickets. Option D is wrong because training the model as is on imbalanced data will bias the model toward the majority class, and post-training threshold adjustment alone cannot fix the underlying skewed decision boundary learned during training.
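
Random oversampling can be sketched in a few lines of standard-library Python. This is an illustrative version, not a reference implementation: the function name is hypothetical, the sample uses a scaled-down 8-to-2 split that mirrors the 8,000-to-2,000 ratio in the question, and in practice the resampling would be applied to the training split only so that duplicated examples do not leak into evaluation data.

```python
import random
from collections import Counter


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(examples, labels):
        by_label.setdefault(y, []).append(x)
    target = max(len(items) for items in by_label.values())

    out_x, out_y = [], []
    for y, items in by_label.items():
        # Sample with replacement to fill the gap to the majority count.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


labels = ["billing"] * 8 + ["technical"] * 2
tickets = [f"ticket-{i}" for i in range(len(labels))]
balanced_x, balanced_y = random_oversample(tickets, labels)
counts = Counter(balanced_y)
```

After resampling, both categories contribute the same number of training examples, and every original ticket is still present.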

24
MCQeasy

An organization wants to classify support tickets into categories (billing, technical, etc.). Which type of machine learning is most suitable?

A.Unsupervised learning
B.Reinforcement learning
C.Supervised learning
D.Regression
AnswerC

Classification uses labeled data to predict categories.

Why this answer

Supervised learning is the correct choice because the organization has labeled historical support tickets (e.g., 'billing' or 'technical') and wants to train a model to map new tickets to these predefined categories. This is a classic classification task, where the algorithm learns from input-output pairs to predict the correct label for unseen data.

Exam trap

CompTIA often tests the distinction between classification (supervised) and clustering (unsupervised), so the trap here is that candidates mistakenly choose unsupervised learning because they think 'grouping tickets' is clustering, ignoring that the categories are predefined and labeled.

How to eliminate wrong answers

Option A is wrong because unsupervised learning discovers hidden patterns or clusters in unlabeled data, but here the categories are known and labeled, so clustering is unnecessary. Option B is wrong because reinforcement learning involves an agent learning through trial-and-error interactions with an environment to maximize a reward signal, which is not applicable to static ticket classification. Option D is wrong because regression predicts continuous numerical values (e.g., ticket resolution time), not discrete categorical labels like 'billing' or 'technical'.

25
MCQmedium

A hospital uses an AI system to prioritize patient triage based on vital signs and medical history. During a trial, the system consistently assigns lower urgency to elderly patients with chronic conditions, even when their symptoms suggest high risk. Which approach best addresses this bias?

A.Use a different dataset from a similar hospital without checking demographics
B.Manually increase the weight of age-related features in the model
C.Replace the neural network with a decision tree to simplify decision logic
D.Audit the training data for representation of elderly patients and retrain with balanced data
AnswerD

Auditing and retraining with balanced data addresses the root cause of bias.

Why this answer

Option D is correct because the bias originates from the training data underrepresenting elderly patients with chronic conditions, causing the model to learn skewed urgency patterns. Auditing the data for representation and retraining with balanced data directly addresses the root cause by ensuring the model learns from a fair distribution of cases, which is a standard bias mitigation technique in AI systems.

Exam trap

CompTIA often tests the misconception that changing the model architecture (e.g., switching to a decision tree) or manually tweaking feature weights can fix bias, when the real solution lies in auditing and rebalancing the training data.

How to eliminate wrong answers

Option A is wrong because using a different dataset from a similar hospital without checking demographics merely shifts the problem; it does not guarantee balanced representation and may introduce new biases. Option B is wrong because manually increasing the weight of age-related features is a form of ad hoc feature engineering that can overcorrect and introduce new biases, and it does not address the underlying data imbalance. Option C is wrong because replacing the neural network with a decision tree does not inherently fix bias; the decision tree will still learn from the same biased data, and its simpler logic does not prevent it from replicating the skewed patterns.

26
MCQhard

A team is designing an AI system for autonomous driving. They need to decide between an end-to-end deep learning approach versus a modular pipeline (perception, planning, control). Which is a key advantage of the modular approach?

A.It typically has lower inference latency.
B.Each module can be validated separately.
C.It handles novel scenarios better due to joint training.
D.It requires less engineering effort.
AnswerB

Correct; separability improves safety and troubleshooting.

Why this answer

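Option B is correct because a modular pipeline allows each component (perception, planning, control) to be tested, validated, and debugged independently. Option A is wrong because lower inference latency is not inherent to a modular design. Option D is wrong because a modular pipeline typically requires more engineering effort, not less.

Option C is wrong because joint training is a property of end-to-end systems, not modular pipelines; whether end-to-end systems are more robust to novel situations is debated, but the modular approach offers better interpretability.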

27
Multi-Selecteasy

A data scientist is preparing a dataset for supervised learning. Which TWO steps are essential?

Select 2 answers
A.One-hot encoding all features
B.Normalizing features
C.Labeling the data
D.Removing outliers
E.Splitting into training and test sets
AnswersC, E

Correct; supervised learning requires labeled examples.

Why this answer

Labeling the data is essential for supervised learning because the algorithm requires input-output pairs to learn a mapping function. Without labeled data, the model cannot be trained to predict outcomes, as supervised learning relies on ground-truth targets for error correction during training.

Exam trap

CompTIA often tests the distinction between mandatory preprocessing steps and optional optimizations, trapping candidates who confuse best practices (like normalization or outlier removal) with absolute requirements for supervised learning.

28
MCQeasy

A healthcare startup is building an AI system to predict patient readmission risk. The team collects structured data from electronic health records (EHR) including age, diagnosis codes, lab results, and previous admissions. During initial training, the model achieves 95% accuracy on the validation set but only 60% accuracy on a holdout test set from a different hospital. The data scientist suspects overfitting. Which action should the team take first to improve generalization?

A.Apply L2 regularization to the model
B.Switch to a linear regression model
C.Increase the model complexity by adding more layers
D.Collect more data from the same hospital
AnswerA

Regularization penalizes large coefficients, reducing overfitting and improving generalization to new data.

Why this answer

The model's high accuracy on the validation set but poor accuracy on a holdout test set from a different hospital indicates overfitting to the training data's specific patterns, which do not generalize to new data. L2 regularization (the penalty used in ridge regression) adds a term proportional to the sum of the squared weights to the loss, discouraging the model from fitting noise and encouraging simpler, more generalizable decision boundaries. This directly addresses overfitting by reducing variance without requiring more data or reducing model capacity too drastically.

Exam trap

CompTIA often tests the misconception that overfitting is always solved by more data, but the trap here is that collecting more data from the same source does not fix distribution shift—regularization directly penalizes model complexity to improve generalization to unseen distributions.

How to eliminate wrong answers

Option B is wrong because switching to a linear regression model would reduce model capacity, potentially underfitting the complex relationships in EHR data, and does not specifically target the overfitting caused by high variance. Option C is wrong because increasing model complexity by adding more layers would exacerbate overfitting, making the model even more sensitive to training data noise and further reducing generalization. Option D is wrong because collecting more data from the same hospital would reinforce the same distributional biases and does not address the core issue of the model failing to generalize to a different hospital's data distribution.
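
The effect of the L2 penalty can be shown with a short sketch. It assumes the penalty is written as lam times the sum of squared weights (some libraries use lam/2 instead); the function names and the plain gradient-descent update are illustrative choices, not something the question specifies.

```python
def l2_penalty(weights, lam):
    """L2 penalty: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, lam):
    """Total loss = data-fit term + L2 penalty."""
    return data_loss + l2_penalty(weights, lam)


def sgd_step(weights, grads, lr, lam):
    """One gradient step on the regularized loss.

    The gradient of lam * w**2 is 2 * lam * w, so every step also pulls
    each weight toward zero in proportion to its size."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]


penalty = l2_penalty([3.0, 4.0], lam=0.1)
# Even with a zero data gradient, the weight shrinks toward zero.
shrunk = sgd_step([1.0], grads=[0.0], lr=0.1, lam=0.5)
```

Large weights are penalized most, which is why the penalty discourages the model from relying heavily on any single feature.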

29
MCQeasy

A company wants to deploy a chatbot that uses natural language understanding (NLU) to answer customer queries. Which AI technique is most suitable for understanding the intent of user input?

A.K-means clustering
B.Linear regression
C.Sequence-to-sequence model with attention
D.Decision tree
AnswerC

This architecture effectively models sequences and captures important parts of input via attention, ideal for understanding user intent.

Why this answer

Option C is correct because sequence-to-sequence models with attention are specifically designed to handle variable-length input sequences (like user queries) and map them to output sequences (like intent labels or responses). The attention mechanism allows the model to focus on the most relevant parts of the input when determining intent, which is critical for understanding nuanced or long user queries in NLU tasks.

Exam trap

The trap here is that candidates often confuse clustering (K-means) or simple classification (decision trees) with NLU, failing to recognize that understanding intent requires modeling sequential dependencies and context, which, among the options listed, only the sequence-to-sequence model with attention provides.

How to eliminate wrong answers

Option A is wrong because K-means clustering is an unsupervised learning algorithm used for grouping similar data points into clusters, not for understanding the intent of user input, which requires supervised learning or sequence modeling. Option B is wrong because linear regression is a regression technique for predicting continuous numerical values, not for classifying or interpreting the intent of natural language text. Option D is wrong because decision trees are simple rule-based classifiers that lack the ability to capture sequential dependencies and context in natural language, making them unsuitable for intent recognition in chatbot NLU.

30
MCQmedium

Refer to the exhibit. An AI auditor reviews the fairness configuration. What is the purpose of this policy?

A.Ensure equal error rates across groups
B.Ensure equal positive prediction rates across groups
C.Ensure equal accuracy across groups
D.Ensure model interpretability
AnswerB

Correct; demographic parity aims for similar selection rates.

Why this answer

The policy sets a fairness constraint that requires the model's positive prediction rate (the fraction of instances predicted as the positive class) to be equal across all defined groups. This is a standard demographic parity requirement, which is implemented by adjusting the decision threshold or reweighting training data to ensure that each group receives the same proportion of positive predictions, regardless of the actual outcome distribution.

Exam trap

CompTIA often tests the distinction between demographic parity (equal positive prediction rates) and equalized odds (equal error rates), so candidates mistakenly choose 'equal error rates' when they see a fairness policy that actually enforces demographic parity.

How to eliminate wrong answers

Option A is wrong because equal error rates across groups refer to equalized odds (equal false positive and false negative rates), not equal positive prediction rates. Option C is wrong because equal accuracy across groups is a different fairness metric (accuracy parity) that does not guarantee equal positive prediction rates. Option D is wrong because model interpretability is a separate concern about understanding model decisions, not a fairness constraint on prediction rates.
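
Since the exhibit is not reproduced here, the following is only a sketch of how demographic parity is typically checked: compute the positive prediction rate for each group and compare. The function names, group labels, and sample predictions are hypothetical.

```python
def positive_rate_by_group(predictions, groups):
    """Fraction of instances predicted positive (1) within each group."""
    totals, positives = {}, {}
    for pred, g in zip(predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(predictions, groups):
    """Largest difference in positive prediction rate between any two groups.
    A gap of 0 means demographic parity holds exactly."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


preds = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
rates = positive_rate_by_group(preds, groups)
gap = demographic_parity_gap(preds, groups)
```

Note that the check uses predictions only. True labels never appear, which is what separates demographic parity from error-rate criteria such as equalized odds.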

31
MCQhard

An AI system is being developed to diagnose diseases from medical images. The model achieves 99% accuracy on the test set, but when deployed in a different hospital, performance drops significantly. Which of the following is the MOST likely cause?

A.The model is being attacked by adversarial examples.
B.The training data does not represent the new hospital's population or imaging equipment.
C.The model is overfitted to the training data.
D.Data leakage occurred during preprocessing.
AnswerB

Correct; domain shift is a common cause of performance degradation.

Why this answer

The model's high accuracy on the test set but poor performance in a different hospital indicates a distribution shift between the training data and the deployment environment. This is a classic case of dataset shift, where the training data does not represent the new hospital's patient population or imaging equipment, leading to degraded model generalization.

Exam trap

CompTIA often tests the distinction between overfitting and dataset shift, where candidates mistakenly attribute a deployment performance drop to overfitting even when test accuracy is high, missing the real issue of distribution mismatch.

How to eliminate wrong answers

Option A is wrong because adversarial examples are deliberately crafted inputs designed to fool a model, but the scenario describes a general performance drop across all images, not targeted attacks. Option C is wrong because overfitting would cause poor performance on the test set as well, not just on deployment; here the test accuracy is high, ruling out overfitting. Option D is wrong because data leakage would inflate test accuracy artificially, but the drop in deployment is due to distribution mismatch, not leakage during preprocessing.

32
MCQhard

An AI system for autonomous vehicles uses reinforcement learning (RL) to navigate. The reward function encourages reaching the destination quickly but penalizes collisions heavily. The agent learns to drive aggressively, causing minor accidents. Which modification to the reward function would best align the agent's behavior with desired safe driving?

A.Increase the collision penalty to a very large negative value.
B.Remove the time-based reward and only reward reaching the destination.
C.Use a potential-based reward shaping to encourage progress toward destination.
D.Add a penalty term for high acceleration and jerky movements.
AnswerD

Penalizing aggressive actions directly encourages smooth driving.

Why this answer

Option D is correct because adding a penalty for high acceleration and jerky movements directly addresses the root cause of the aggressive driving behavior—smoothness and safety—without undermining the primary goal of reaching the destination. This modification shapes the reward function to penalize unsafe driving patterns, aligning the agent's learned policy with desired safe navigation while preserving the time-based incentive for efficiency.

Exam trap

CompTIA often tests the misconception that simply increasing the penalty for collisions (option A) is sufficient to ensure safe driving, when in reality it can lead to reward hacking or overly conservative policies, and the correct solution requires shaping the reward to penalize the specific unsafe behaviors (e.g., high acceleration) that cause accidents.

How to eliminate wrong answers

Option A is wrong because simply increasing the collision penalty to a very large negative value may cause the agent to become overly cautious, potentially leading to freezing behavior or failure to navigate effectively, and does not address the underlying aggressive driving patterns that cause minor accidents. Option B is wrong because removing the time-based reward eliminates the incentive for efficiency, which could result in the agent taking excessively long routes or failing to prioritize timely arrival, thus not aligning with the desired safe driving behavior. Option C is wrong because potential-based reward shaping encourages progress toward the destination but does not penalize aggressive maneuvers; it may still allow the agent to drive aggressively as long as it makes progress, failing to mitigate the unsafe driving patterns.
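
A per-step reward with a smoothness penalty might look like the sketch below. Every constant (time penalty, goal bonus, collision penalty, weights, acceleration limit) is an assumed value chosen for illustration, and the function name is hypothetical; a real system would tune these.

```python
def step_reward(reached_goal, collided, acceleration, jerk,
                time_penalty=0.1, goal_bonus=100.0, collision_penalty=50.0,
                accel_weight=0.5, jerk_weight=0.5, accel_limit=2.0):
    """Per-step reward that keeps the time incentive and collision penalty
    and adds a penalty for harsh acceleration and jerky movement."""
    reward = -time_penalty                  # encourages reaching the goal quickly
    if reached_goal:
        reward += goal_bonus
    if collided:
        reward -= collision_penalty
    # Only acceleration beyond the comfort limit is penalized.
    excess_accel = max(0.0, abs(acceleration) - accel_limit)
    reward -= accel_weight * excess_accel + jerk_weight * abs(jerk)
    return reward


smooth = step_reward(False, False, acceleration=1.0, jerk=0.0)
aggressive = step_reward(False, False, acceleration=4.0, jerk=2.0)
```

The aggressive step receives a lower reward even though no collision occurred, so the agent gets a learning signal about risky behavior before an accident happens.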

33
Multi-Selectmedium

A team is deploying an AI model for credit approval. Which TWO ethical considerations must be addressed?

Select 2 answers
A.Training speed
B.Model interpretability
C.Model accuracy
D.Model fairness to avoid bias
E.Model size
AnswersB, D

Correct; interpretability helps ensure transparency and accountability.

Why this answer

Model interpretability (B) is essential for credit approval because financial decisions must be explainable to regulators and customers under laws like GDPR or ECOA. A black-box model that cannot justify why a loan was denied violates compliance requirements, making interpretability a core ethical and legal necessity.

Exam trap

CompTIA often tests the distinction between ethical requirements (interpretability, fairness) and technical performance metrics (accuracy, speed, size), leading candidates to mistakenly select accuracy as an ethical consideration.

34
MCQeasy

A retail company wants to build a model to predict customer churn based on purchase history and demographics. The dataset includes categorical features like region and gender, and numerical features like total spend. What is the best initial step before training the model?

A.Train a deep neural network directly on raw data
B.One-hot encode categorical variables and normalize numerical variables
C.Remove all categorical features to simplify the model
D.Perform principal component analysis (PCA) on all features
AnswerB

This is the correct initial step to prepare the data for most machine learning models.

Why this answer

One-hot encoding categorical variables and normalizing numerical variables is standard preprocessing to convert categorical data into numeric format and scale features, which many algorithms require for optimal performance.
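
As a minimal sketch of both steps, the snippet below one-hot encodes a categorical column and rescales a numerical column. It assumes min-max scaling as the normalization method (standardization is an equally common choice), and the function names and sample values are illustrative. In practice the categories and the min/max would be derived from the training split only and then reused on new data.

```python
def one_hot(values):
    """Return one-hot rows plus the category order used for the columns."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


regions = ["north", "south", "north", "west"]
total_spend = [100.0, 300.0, 200.0, 500.0]

encoded, categories = one_hot(regions)
scaled = min_max_scale(total_spend)
```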

35
MCQhard

An AI engineer trains a deep learning model for image classification. After training, the training accuracy is 99% but validation accuracy is 85%. Which technique would best address this discrepancy?

A.Increase data augmentation
B.Decrease the learning rate
C.Increase the number of layers
D.Add dropout layers
AnswerD

Dropout reduces overfitting by preventing co-adaptation of neurons.

Why this answer

The high training accuracy (99%) and lower validation accuracy (85%) indicate overfitting, where the model memorizes training data but fails to generalize. Dropout layers randomly deactivate neurons during training, forcing the network to learn more robust features and reducing overfitting. This technique directly addresses the discrepancy by improving validation performance without sacrificing training capacity.

Exam trap

CompTIA often tests the distinction between techniques that address overfitting (like dropout) versus those that improve convergence (like learning rate adjustment) or model capacity (like adding layers), trapping candidates who confuse regularization with optimization.

How to eliminate wrong answers

Option A is wrong because increasing data augmentation can help reduce overfitting by creating more varied training samples, but it is not the best technique here as it may not sufficiently address the already severe overfitting and could introduce noise; dropout is a more direct regularization method. Option B is wrong because decreasing the learning rate addresses convergence issues (e.g., slow training or oscillation) but does not directly combat overfitting; it may even worsen the gap if the model continues to memorize. Option C is wrong because increasing the number of layers adds more parameters, which typically exacerbates overfitting by increasing model capacity, making the discrepancy worse.

36
MCQhard

A company develops an AI model that recommends job candidates. The model inadvertently discriminates against a protected group. Which approach is most effective for mitigating this bias?

A.Remove the protected attribute from the training data
B.Use a fairness-aware machine learning algorithm
C.Analyze model predictions after deployment
D.Collect more training data from the protected group
AnswerB

Fairness-aware algorithms incorporate constraints to reduce disparate impact.

Why this answer

Option B is correct because fairness-aware machine learning algorithms explicitly incorporate fairness constraints or objectives during model training, directly addressing and mitigating bias against protected groups. Unlike simple removal of protected attributes, these algorithms can detect and correct for proxy discrimination and disparate impact, ensuring the model's recommendations are equitable by design.

Exam trap

CompTIA often tests the misconception that removing a protected attribute from training data is sufficient to eliminate bias, but the trap is that models can still discriminate through correlated proxy features, making fairness-aware algorithms necessary.

How to eliminate wrong answers

Option A is wrong because simply removing the protected attribute from training data does not eliminate bias; the model can still learn proxies for that attribute from correlated features (e.g., zip code correlating with race), leading to indirect discrimination. Option C is wrong because analyzing model predictions after deployment is a detection step, not a mitigation approach; it can identify bias but does not prevent or correct it in the model's behavior. Option D is wrong because collecting more training data from the protected group does not inherently address bias; it may even amplify existing disparities if the data collection process or underlying societal biases remain unchanged, and it does not adjust the model's learning process to ensure fairness.

37
MCQeasy

A data scientist is preparing a dataset for a classification task. The dataset contains 10,000 rows and 50 features, but many features have missing values. Which approach should the scientist take first to address the missing data?

A.Use a deep learning model to predict missing values without preprocessing.
B.Analyze the pattern and proportion of missing values to choose an appropriate imputation strategy.
C.Remove all rows with any missing values to ensure a clean dataset.
D.Replace missing values with the mean of each feature immediately.
AnswerB

Understanding missingness pattern is crucial before deciding on imputation or deletion.

Why this answer

Option B is correct because the first step in handling missing data is to understand the pattern and proportion of missingness (e.g., MCAR, MAR, MNAR) to select an appropriate imputation method. Blindly applying imputation or deletion without analysis can introduce bias or reduce model performance. This diagnostic step ensures the chosen strategy aligns with the data's underlying structure and the classification task's requirements.

Exam trap

CompTIA often tests the misconception that immediate imputation (e.g., mean/median) or row deletion is the safest first step, when in reality, a diagnostic analysis of missingness patterns is required before any data modification.

How to eliminate wrong answers

Option A is wrong because deep learning models typically require complete data or sophisticated handling of missingness; using them to predict missing values without preprocessing ignores the need to first understand the missing data mechanism and can lead to overfitting or biased predictions. Option C is wrong because removing all rows with any missing values can discard a significant portion of the dataset (with 50 features, many rows are likely to have at least one missing value), potentially losing valuable information and reducing statistical power, especially when missingness is not completely random. Option D is wrong because immediately replacing missing values with the mean of each feature assumes the data is missing completely at random (MCAR) and can distort feature distributions, reduce variance, and introduce bias if the missingness is related to the feature values themselves.
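
The diagnostic step can start with something as simple as the sketch below, which reports the share of missing values per feature and the share of rows that listwise deletion would discard. The function name, the use of None to mark missing values, and the four sample rows are assumptions for illustration.

```python
def missing_report(rows):
    """Return (fraction missing per feature, fraction of rows with any
    missing value). Missing values are represented as None."""
    n = len(rows)
    features = sorted({key for row in rows for key in row})
    per_feature = {
        f: sum(1 for row in rows if row.get(f) is None) / n
        for f in features
    }
    rows_with_any = sum(
        1 for row in rows if any(row.get(f) is None for f in features)
    ) / n
    return per_feature, rows_with_any


rows = [
    {"age": 34, "income": None, "region": "north"},
    {"age": None, "income": 52000, "region": "south"},
    {"age": 45, "income": 61000, "region": None},
    {"age": 29, "income": 48000, "region": "west"},
]
per_feature, rows_with_any = missing_report(rows)
```

In this toy data each feature is only 25% missing, yet dropping every incomplete row would discard 75% of the dataset, which is the kind of finding that should inform the choice between deletion and imputation.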

38
MCQmedium

An AI system is being designed to automatically detect fraudulent transactions in real-time. The system must have low latency and high precision to minimize false alarms. Which algorithm is most appropriate?

A.Logistic regression
B.Convolutional neural network
C.Deep reinforcement learning
D.Random forest
AnswerD

Random forest provides high accuracy and precision with low inference latency, making it ideal for real-time fraud detection.

Why this answer

Random forest is the most appropriate algorithm because it handles high-dimensional transaction data, provides feature importance for interpretability, and achieves high precision with low latency through ensemble decision trees. Its parallelizable structure allows real-time scoring, and it naturally balances precision and recall without the computational overhead of deep learning.

Exam trap

CompTIA often tests the misconception that deep learning (CNNs or reinforcement learning) is always superior for complex tasks, but here the key constraints are low latency and high precision on tabular data, where ensemble methods like random forest outperform deep models.

How to eliminate wrong answers

Option A is wrong because logistic regression assumes linear decision boundaries and cannot capture complex non-linear patterns in transaction data, leading to lower precision. Option B is wrong because convolutional neural networks are designed for spatial data like images, not tabular transaction features, and introduce unnecessary latency and computational cost for real-time fraud detection. Option C is wrong because deep reinforcement learning is used for sequential decision-making in dynamic environments (e.g., game playing, robotics), not for static classification tasks like fraud detection, and its training instability and high latency make it unsuitable for real-time scoring.

39
MCQmedium

A hospital deploys an AI system to detect pneumonia from chest X-rays. The model achieves 95% accuracy on the test set but later is found to be less accurate for patients under 18. The development team suspects bias. Which step should be taken first to investigate?

A.Automatically retrain the model with a balanced dataset including more pediatric cases.
B.Expand the test set with more pediatric X-rays and re-evaluate overall accuracy.
C.Compute and compare performance metrics for different age subgroups in the test set.
D.Add more features to the model to capture age-related anatomical differences.
AnswerC

Subgroup analysis is the standard first step in fairness auditing.

Why this answer

Option C is correct because the first step in investigating suspected model bias is to perform a disaggregated analysis of performance metrics across relevant subgroups, such as age brackets. This directly identifies whether the model's accuracy, precision, recall, or other metrics differ significantly for pediatric patients versus adults, confirming the presence and nature of the bias before any remediation is attempted.

Exam trap

CompTIA often tests the principle that aggregate metrics like overall accuracy can be misleading, and the trap here is that candidates jump to a solution (retraining or adding features) before performing the necessary diagnostic step of subgroup performance analysis.

How to eliminate wrong answers

Option A is wrong because automatically retraining the model with a balanced dataset without first understanding the root cause of the bias could introduce new biases or fail to address the specific issue, and it skips the critical diagnostic step of measuring subgroup performance. Option B is wrong because expanding the test set with more pediatric X-rays and re-evaluating overall accuracy would dilute the subgroup signal into a single aggregate metric, masking the disparity rather than revealing it. Option D is wrong because adding more features to the model without first analyzing the existing bias is a premature intervention; it assumes the bias stems from missing features rather than from imbalanced training data or model behavior, and it could increase complexity without solving the underlying problem.

40
Multi-Selectmedium

Which TWO statements correctly describe the difference between supervised and unsupervised learning?

Select 2 answers
A.Supervised learning is only used for classification
B.Unsupervised learning always requires a target variable
C.Supervised learning requires labeled data
D.Supervised learning is a subset of reinforcement learning
E.Unsupervised learning discovers hidden patterns
AnswersC, E

Labels are required for supervised tasks.

Why this answer

Option C is correct because supervised learning relies on labeled datasets where each training example is paired with an output label, enabling the model to learn a mapping from inputs to outputs. Option E is correct because unsupervised learning works with unlabeled data and discovers inherent structures or hidden patterns, such as clusters, without a target variable. Together these two statements capture the fundamental distinction between the two approaches.

Exam trap

CompTIA often tests the misconception that supervised learning is synonymous with classification, ignoring regression, or that unsupervised learning requires a target variable, which is a direct contradiction of its definition.

41
MCQmedium

An AI model for detecting fraudulent transactions has high precision but low recall. Which business impact is most likely?

A.The model has no impact on fraud detection
B.The model detects all fraudulent transactions
C.Many fraudulent transactions go undetected
D.Many legitimate transactions are flagged as fraud
AnswerC

Low recall indicates a high number of false negatives.

Why this answer

High precision means that when the model flags a transaction as fraudulent, it is very likely correct. However, low recall indicates that the model misses a significant proportion of actual fraudulent transactions. Therefore, the most likely business impact is that many fraudulent transactions go undetected, leading to financial losses.

Exam trap

CompTIA often tests the distinction between precision and recall by presenting a scenario where candidates confuse high precision with high recall, leading them to incorrectly select option D (many legitimate transactions flagged) instead of recognizing that low recall causes undetected fraud.

How to eliminate wrong answers

Option A is wrong because a model with high precision and low recall does have a significant impact—it fails to catch many fraud cases, which directly affects business outcomes. Option B is wrong because low recall means the model does not detect all fraudulent transactions; it misses many, contradicting the claim of detecting all fraud. Option D is wrong because high precision implies few false positives, so legitimate transactions are rarely flagged as fraud; that scenario would correspond to low precision, not high precision.
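The relationship between the two metrics can be made concrete with a small worked example. The counts below are hypothetical and chosen only to produce a high-precision, low-recall model; `precision_recall` is an illustrative helper.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 100 actual fraud cases, the model flags only 21 transactions.
tp = 20   # fraud correctly flagged
fp = 1    # legitimate transactions flagged as fraud
fn = 80   # fraud that was missed (false negatives)

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.3f}, recall={recall:.3f}, missed fraud={fn}")
```

Almost every flag is correct, yet 80 of the 100 fraud cases pass through unflagged, which is the business impact described in option C.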

42
MCQeasy

Based on the exhibit, what issue should the team address?

A.Model accuracy below threshold
B.Potential fairness bias across groups
C.High latency
D.Low throughput
AnswerB

The disparity in accuracy between Group B (0.83) and other groups (0.97, 0.96) indicates a fairness issue that needs to be addressed.

Why this answer

Option B is correct because the exhibit reports accuracy per group, and Group B (0.83) trails the other groups (0.97 and 0.96) by a wide margin. A disparity of this size indicates a potential fairness bias, which must be addressed to ensure equitable outcomes, especially in high-stakes AI applications like hiring or lending.

Exam trap

CompTIA often tests the misconception that high overall accuracy or low latency/throughput issues are the primary concerns, when the real problem is hidden bias revealed only by disaggregated performance metrics across subgroups.

How to eliminate wrong answers

Option A is wrong because the exhibit does not show an overall accuracy metric below a threshold; instead, it highlights group-wise performance differences, not a global accuracy issue. Option C is wrong because latency refers to inference time per request, which is not indicated by group-wise performance metrics or confusion matrices. Option D is wrong because throughput measures the number of predictions per second, which is unrelated to the group-level bias patterns shown in the exhibit.

43
MCQmedium

A manufacturing company uses a computer vision AI to inspect products on an assembly line for defects. The AI model was trained on images from a single camera angle under bright, uniform lighting. Recently, the company moved the inspection station to a different part of the factory where lighting is dimmer and varies due to nearby windows. The model now misclassifies many non-defective products as defective, causing false alarms and production delays. The team has limited labeled data from the new environment. Which action should the team take to restore inspection accuracy while minimizing downtime?

A.Apply domain adaptation techniques using a small set of labeled images from the new environment
B.Increase the defect classification threshold to reduce false positives
C.Revert to the previous lighting setup by reinstalling bright, uniform lights
D.Retrain the model from scratch using a large dataset of images from the new environment
AnswerA

Domain adaptation adjusts the model to new conditions with minimal data.

Why this answer

Domain adaptation techniques allow a model trained on a source domain (bright, uniform lighting) to generalize to a target domain (dim, variable lighting) using only a small set of labeled images from the new environment. This approach minimizes downtime because it avoids the need for large-scale data collection or retraining from scratch, and it directly addresses the distribution shift that causes false positives.

Exam trap

CompTIA often tests the misconception that simply adjusting a threshold or reverting to old conditions is a valid fix, when the correct approach is to adapt the model to the new data distribution using domain adaptation.

How to eliminate wrong answers

Option B is wrong because increasing the classification threshold reduces false positives at the cost of increasing false negatives, which would allow defective products to pass inspection — a critical safety and quality risk. Option C is wrong because reverting to the previous lighting setup is a workaround that does not solve the underlying domain shift problem and may be impractical or costly if the new location is fixed. Option D is wrong because retraining from scratch requires a large labeled dataset from the new environment, which the team does not have, and would cause significant downtime for data collection and training.
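The threshold trade-off behind option B can be sketched with a few made-up scores. The scores, labels, and the two thresholds are hypothetical, and `count_errors` is an illustrative helper rather than part of any inspection system.

```python
def count_errors(scores, labels, threshold):
    """Count (false positives, false negatives) at a decision threshold.

    A product is predicted defective when its score is >= threshold.
    """
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

# Hypothetical defect scores; label 1 means the product is truly defective.
scores = [0.55, 0.60, 0.65, 0.70, 0.75, 0.90]
labels = [0, 0, 1, 0, 1, 1]

low_threshold = count_errors(scores, labels, 0.5)
high_threshold = count_errors(scores, labels, 0.8)

print(low_threshold)   # many false alarms, no missed defects
print(high_threshold)  # no false alarms, but defects slip through
```

Raising the threshold removes the false alarms in this toy data only by letting two defective products pass, which is why it does not fix the underlying distribution shift.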

44
MCQhard

An AI model achieves high accuracy on training data but performs poorly on new test data. The data scientist suspects the model has memorized noise. Which technique directly adds a penalty term to the loss function to address this?

A.Batch normalization
B.Data augmentation
C.Dropout
D.L2 regularization
AnswerD

Correct; L2 adds a penalty term proportional to squared weights.

Why this answer

L2 regularization (also known as weight decay) directly adds a penalty term proportional to the squared magnitude of the model's weights to the loss function. This discourages the model from fitting the noise in the training data by keeping weights small, thereby reducing overfitting and improving generalization to new test data.
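To make the penalty term concrete, the sketch below adds lambda times the sum of squared weights to a mean squared error loss. The data, the weights, and the value of `lam` are hypothetical, and mean squared error is only one possible choice of data loss.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def l2_regularized_loss(y_true, y_pred, weights, lam):
    """Data loss plus an L2 penalty: lam * sum of squared weights."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse(y_true, y_pred) + penalty

# Hypothetical targets and predictions (identical data fit in both cases).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 2.0]

small_weights = [0.5, -0.5]
large_weights = [5.0, -5.0]

loss_small = l2_regularized_loss(y_true, y_pred, small_weights, lam=0.01)
loss_large = l2_regularized_loss(y_true, y_pred, large_weights, lam=0.01)

print(loss_small, loss_large)
```

With the same fit to the data, the model with larger weights receives the higher total loss, so training is pushed toward smaller weights.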

Exam trap

CompTIA often tests the distinction between regularization techniques that modify the loss function (L2) versus those that modify the network architecture or data (dropout, batch normalization, data augmentation), so candidates mistakenly choose dropout because it is a well-known regularization method, even though it does not add a penalty term to the loss function.

How to eliminate wrong answers

Option A is wrong because batch normalization normalizes the inputs of each layer to stabilize and accelerate training, but it does not add a penalty term to the loss function; it was designed to address internal covariate shift, not overfitting from memorized noise. Option B is wrong because data augmentation artificially expands the training dataset by applying transformations (e.g., rotations, flips) to reduce overfitting, but it does not modify the loss function with a penalty term. Option C is wrong because dropout, although it is a regularization technique, works by randomly dropping neurons during training to prevent co-adaptation; it temporarily alters the network rather than adding a penalty term to the loss function.

45
MCQmedium

A financial institution uses a regression model to predict credit risk. The model has a high R-squared on training data but low R-squared on test data. Which of the following is the most likely cause?

A.The features were not standardized before training.
B.The model is overfitting the training data.
C.The model is underfitting the training data.
D.There is multicollinearity among the input features.
AnswerB

Overfitting explains high training and low test performance.

Why this answer

A high R-squared on training data combined with a low R-squared on test data is the classic symptom of overfitting. The model has memorized noise and specific patterns in the training set rather than learning generalizable relationships, causing poor performance on unseen data.
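The train/test gap can be illustrated by computing R-squared on each split. The values below are hypothetical: the training predictions reproduce the targets exactly (memorization), while the test predictions are poor.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical targets and model predictions for each split.
train_true = [1.0, 2.0, 3.0, 4.0]
train_pred = [1.0, 2.0, 3.0, 4.0]   # training targets reproduced exactly
test_true = [1.0, 2.0, 3.0, 4.0]
test_pred = [2.0, 1.0, 4.0, 3.0]    # poor predictions on unseen data

r2_train = r_squared(train_true, train_pred)
r2_test = r_squared(test_true, test_pred)

print(r2_train, r2_test)
```

A large drop from the training score to the test score, as in this toy case, is the symptom the question describes.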

Exam trap

CompTIA often tests the distinction between overfitting and underfitting by presenting a high training metric with a low test metric, tempting candidates to think the model is 'too good' or that data preprocessing (like standardization) is the fix.

How to eliminate wrong answers

Option A is wrong because feature standardization (scaling) affects convergence speed for some algorithms but does not inherently cause overfitting or the described train-test R-squared gap. Option C is wrong because underfitting would produce low R-squared on both training and test data, not high on training and low on test. Option D is wrong because multicollinearity inflates coefficient variances and can reduce interpretability, but it does not typically cause a large discrepancy between training and test R-squared; it affects both sets similarly.

46
MCQmedium

Refer to the exhibit. A data scientist defines a model configuration in JSON. Which component is missing from the configuration for a complete machine learning pipeline?

A.Training hyperparameters
B.Data preprocessing steps
C.Model type
D.Evaluation metrics
AnswerB

Preprocessing (scaling, encoding) is missing.

Why this answer

A complete machine learning pipeline must include data preprocessing steps to transform raw data into a format suitable for model training. The JSON configuration defines the model type, evaluation metrics, and training hyperparameters, but omits any specification for data cleaning, normalization, feature encoding, or splitting, which are essential for reproducibility and model performance.

Exam trap

CompTIA often tests the misconception that a model configuration is complete if it includes the model type, hyperparameters, and evaluation metrics, but candidates overlook that data preprocessing is a mandatory pipeline stage for transforming raw data before training.

How to eliminate wrong answers

Option A is wrong because training hyperparameters (e.g., learning rate, batch size) are present in the configuration as part of the model training specification, so they are not missing. Option C is wrong because the model type (e.g., 'neural_network', 'random_forest') is explicitly defined in the JSON under the 'model' key, so it is not missing. Option D is wrong because evaluation metrics (e.g., 'accuracy', 'f1_score') are listed in the configuration under the 'evaluation' section, so they are not missing.

47
Multi-Selecteasy

A data scientist is training a supervised learning model for customer churn prediction. Which TWO types of bias are most likely to affect the model's fairness and accuracy if not addressed?

Select 2 answers
A.Algorithmic bias
B.Selection bias
C.Measurement bias
D.Sampling bias
E.Confirmation bias
AnswersB, C

Selection bias arises when the sample is not representative of the population, leading to skewed predictions.

Why this answer

Selection bias (B) occurs when the training data does not represent the true customer population, e.g., using only data from a specific time period or region, leading to a model that fails to generalize. Measurement bias (C) arises from systematic errors in how features are recorded, such as inconsistent data collection methods across customer segments, which can skew predictions and harm fairness.

Exam trap

CompTIA often tests the distinction between data-level biases (selection, measurement) and human cognitive biases (confirmation bias), so candidates mistakenly pick confirmation bias because it sounds plausible in a data science context.

48
MCQmedium

A company uses a pre-trained language model for a legal document classification task. They have limited labeled data (500 documents). Which strategy is MOST effective for adapting the model to this domain?

A.Use a rule-based keyword matching system instead.
B.Train a new model from scratch on the 500 documents.
C.Apply extensive data augmentation to increase dataset size.
D.Fine-tune the pre-trained model on the 500 labeled documents.
AnswerD

Correct; transfer learning works well with small labeled datasets.

Why this answer

Fine-tuning a pre-trained language model on 500 labeled legal documents is the most effective strategy because it leverages the model's existing knowledge of language structure and general semantics, requiring only a small amount of domain-specific data to adapt to the legal classification task. This approach avoids the high data requirements of training from scratch and outperforms rule-based or augmentation-only methods by directly optimizing the model's weights for the target domain.

Exam trap

CompTIA often tests the misconception that more data is always better (trap of Option C) or that starting from scratch is safer (trap of Option B), when in fact transfer learning via fine-tuning is the standard approach for low-resource NLP tasks.

How to eliminate wrong answers

Option A is wrong because rule-based keyword matching lacks the semantic understanding needed for legal document classification, where context and nuance are critical, and it cannot generalize beyond predefined patterns. Option B is wrong because training a new model from scratch on only 500 documents is insufficient for deep learning models, leading to severe overfitting and poor generalization due to the lack of pre-trained linguistic knowledge. Option C is wrong because extensive data augmentation on only 500 documents may introduce noise and unrealistic variations, and it does not provide the same benefit as leveraging a pre-trained model's learned representations, which already capture rich language patterns.

49
MCQmedium

A data scientist trains a linear regression model to predict house prices. The model has high bias and low variance. Which action would most likely reduce bias?

A.Apply L2 regularization
B.Increase the training dataset size
C.Add polynomial features
D.Remove irrelevant features
AnswerC

Adding complexity reduces bias but may increase variance.

Why this answer

High bias indicates the model is underfitting the data, meaning it is too simple to capture the underlying patterns. Adding polynomial features increases model complexity by introducing non-linear terms, which allows the linear regression model to better fit the training data and thus reduce bias.
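A small sketch of the idea, under simplifying assumptions: the data is a hypothetical noiseless quadratic, and `fit_line` is an illustrative one-feature least-squares helper. Fitting on x² alone stands in for adding the squared term, because for this symmetric data the coefficient on x is zero anyway.

```python
def fit_line(z, y):
    """Ordinary least squares for y ~ a*z + b with a single feature z."""
    n = len(z)
    mean_z, mean_y = sum(z) / n, sum(y) / n
    a = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
         / sum((zi - mean_z) ** 2 for zi in z))
    b = mean_y - a * mean_z
    return a, b

def mse(z, y, a, b):
    """Mean squared error of the fitted line on (z, y)."""
    return sum((yi - (a * zi + b)) ** 2 for zi, yi in zip(z, y)) / len(z)

# Hypothetical data with a purely quadratic relationship.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

# Plain linear model on x: too simple, so it underfits (high bias).
a_lin, b_lin = fit_line(x, y)
mse_linear = mse(x, y, a_lin, b_lin)

# Same linear method, but using the polynomial feature x**2.
x_sq = [xi ** 2 for xi in x]
a_sq, b_sq = fit_line(x_sq, y)
mse_poly = mse(x_sq, y, a_sq, b_sq)

print(mse_linear, mse_poly)
```

The training error falls once the squared feature is available, which is the reduction in bias; on noisy real data the added flexibility can also raise variance.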

Exam trap

CompTIA often tests the bias-variance tradeoff by making candidates confuse regularization (which reduces variance) with methods that reduce bias, or by implying that more data always fixes underfitting.

How to eliminate wrong answers

Option A is wrong because L2 regularization (Ridge regression) reduces overfitting by penalizing large coefficients, which increases bias to lower variance, making bias worse. Option B is wrong because increasing the training dataset size typically reduces variance (helps with overfitting) but does not address underfitting (high bias) — it may even make bias more apparent. Option D is wrong because removing irrelevant features simplifies the model further, which increases bias and is counterproductive when the goal is to reduce bias.

50
Multi-Selecthard

An organization is deploying a deep learning model in production. Which THREE components are essential for maintaining model performance over time?

Select 3 answers
A.Performance monitoring
B.Hyperparameter tuning
C.Model retraining pipeline
D.Feature importance analysis
E.Data drift detection
AnswersA, C, E

Continuous monitoring of key metrics alerts teams to degradation in model performance.

Why this answer

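Performance monitoring (A) is essential because it provides continuous visibility into model metrics such as accuracy, latency, and throughput, enabling early detection of degradation. Without ongoing monitoring, teams cannot identify when a model's predictions deviate from expected behavior, which is critical for maintaining reliability in production. A model retraining pipeline (C) lets the team refresh the model on recent data once degradation is detected, and data drift detection (E) flags changes in the input distribution that often precede a drop in performance.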

Exam trap

CompTIA often tests the distinction between development-phase activities (hyperparameter tuning, feature analysis) and production-phase operational components (monitoring, retraining, drift detection), so candidates mistakenly include tuning or analysis as essential for ongoing maintenance.

51
MCQeasy

A healthcare provider wants to use AI to predict patient readmission risk. They have structured data (age, diagnosis, lab results) and unstructured clinical notes. Which approach is most appropriate?

A.Convolutional neural network (CNN) on clinical notes
B.Recurrent neural network (RNN) on structured data
C.Logistic regression on structured data only
D.Multimodal model combining structured and text embeddings
AnswerD

A multimodal model can process both structured data and text, leveraging all available information.

Why this answer

Option D is correct because the scenario involves both structured data (age, diagnosis, lab results) and unstructured clinical notes. A multimodal model can process both types by combining embeddings from text (e.g., via a transformer or RNN) with structured features, enabling the model to learn cross-modal patterns that improve readmission risk prediction. This approach leverages the complementary strengths of structured and unstructured data, which is essential for capturing the full clinical picture.

Exam trap

The trap here is that candidates may assume a single model type (like CNN or RNN) is sufficient for all data, overlooking the need to combine structured and unstructured data through a multimodal architecture.

How to eliminate wrong answers

Option A is wrong because a convolutional neural network (CNN) on clinical notes alone ignores the structured data (age, diagnosis, lab results), which are critical for readmission prediction; CNNs are also less effective for sequential text than transformers or RNNs. Option B is wrong because a recurrent neural network (RNN) on structured data is suboptimal—structured data is typically tabular and better handled by tree-based models or dense layers, and RNNs are designed for sequential data like time series or text. Option C is wrong because logistic regression on structured data only discards the valuable unstructured clinical notes, missing key risk factors embedded in free text, and logistic regression cannot capture complex nonlinear interactions in the data.

52
MCQhard

An AI system is deployed to detect fraudulent transactions. The system flags 5% of transactions as fraudulent, but the actual fraud rate is 0.1%. The business sees many false positives and wants to reduce them without significantly increasing false negatives. Which metric should be prioritized for optimization?

A.Recall
B.F1 score
C.Accuracy
D.Precision
AnswerB

F1 score balances precision and recall, allowing trade-off to reduce false positives while maintaining reasonable recall.

Why this answer

The F1 score balances precision and recall, making it ideal when false positives are costly but false negatives must not increase significantly. Optimizing precision alone would reduce false positives but could increase false negatives, while recall alone would not address the false positive problem. The F1 score ensures both metrics are jointly optimized, aligning with the business requirement.

Exam trap

CompTIA often tests the misconception that precision is the best metric for reducing false positives, but the trap here is that precision alone ignores the impact on false negatives, which the business explicitly wants to avoid increasing.

How to eliminate wrong answers

Option A is wrong because recall focuses on minimizing false negatives, but does not address the false positive problem; optimizing recall alone would likely increase false positives, worsening the business issue. Option C is wrong because accuracy is misleading in highly imbalanced datasets (0.1% fraud rate); a system that never flags any transaction would achieve 99.9% accuracy but fail to detect fraud. Option D is wrong because precision reduces false positives, but optimizing precision alone could increase false negatives (missed fraud), which the business wants to avoid; the F1 score balances both.
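The figures in the scenario can be turned into a rough worked example. The 0.1% fraud rate and 5% flag rate come from the question; the total of 100,000 transactions and the assumption that every real fraud is among the flagged transactions are illustrative only.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

total = 100_000          # ASSUMPTION: illustrative volume
actual_fraud = 100       # 0.1% fraud rate, from the scenario
flagged = 5_000          # 5% of transactions flagged, from the scenario
tp = 100                 # ASSUMPTION: all real fraud is among the flagged
fp = flagged - tp        # flagged but legitimate
fn = actual_fraud - tp   # fraud that was missed

precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# Accuracy of a system that never flags anything (the option C pitfall).
never_flag_accuracy = (total - actual_fraud) / total

print(precision, recall, f1, never_flag_accuracy)
```

Even under the most favorable recall assumption, precision is very low and drags F1 down with it, while a system that flags nothing still scores 99.9% accuracy.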

53
MCQeasy

Refer to the exhibit. The data scientist notices that the model achieves 98% accuracy on the training set but only 72% on the test set. Which change to the model parameters is most likely to reduce this gap?

A.Increase n_estimators to 500.
B.Set max_depth to None to allow trees to grow fully.
C.Reduce max_depth to 3.
D.Switch from RandomForest to a linear model like LogisticRegression.
AnswerC

Reducing max_depth restricts the tree depth, reducing overfitting.

Why this answer

The model is overfitting: 98% training accuracy vs. 72% test accuracy. Reducing max_depth to 3 limits the depth of each decision tree, preventing them from memorizing noise and forcing them to learn more generalizable patterns. This is a standard regularization technique for tree-based ensembles.

Exam trap

CompTIA often tests the bias-variance tradeoff by presenting overfitting symptoms and expecting candidates to choose a regularization parameter (like reducing max_depth) rather than increasing model complexity or switching model families entirely.

How to eliminate wrong answers

Option A is wrong because increasing n_estimators to 500 adds more trees, which generally stabilizes predictions but does not address overfitting caused by individual trees that are already too deep, so the gap would likely remain. Option B is wrong because setting max_depth to None allows trees to grow fully, which increases the risk of overfitting by capturing every detail in the training data, widening the accuracy gap. Option D is wrong because switching to a linear model like LogisticRegression is a drastic architectural change that may underfit if the data has non-linear relationships; the goal is to regularize the existing RandomForest, not replace it entirely.

54
MCQmedium

Refer to the exhibit. A data scientist observes the training output. Which issue is most likely?

A.Underfitting
B.Data augmentation failure
C.Overfitting
D.Model compression
AnswerC

Correct; high training accuracy with lower validation accuracy suggests overfitting.

Why this answer

The exhibit shows training loss decreasing while validation loss increases after a certain epoch, which is the classic signature of overfitting. The model is memorizing the training data rather than learning generalizable patterns, leading to poor performance on unseen data.
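The divergence pattern can be detected programmatically. The sketch below uses hypothetical loss values (the exhibit's actual numbers are not reproduced here), and `best_epoch_before_divergence` with its `patience` parameter is an illustrative helper, not part of any framework.

```python
def best_epoch_before_divergence(train_loss, val_loss, patience=2):
    """Return the 0-based epoch with the lowest validation loss once
    validation loss has failed to improve for `patience` epochs while
    training loss is still lower than it was at that epoch.
    Return None if no such divergence is seen."""
    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # Validation stalled; confirm training loss kept falling.
            if train_loss[epoch] < train_loss[best_epoch]:
                return best_epoch
    return None

# Hypothetical curves: training keeps improving, validation turns upward.
train_loss = [0.90, 0.60, 0.40, 0.25, 0.15, 0.08]
val_loss = [0.95, 0.70, 0.55, 0.58, 0.66, 0.75]
overfit_from = best_epoch_before_divergence(train_loss, val_loss)

# Hypothetical underfitting curves: both losses stay high, no divergence.
flat_train = [0.90, 0.88, 0.87]
flat_val = [0.92, 0.90, 0.89]
no_divergence = best_epoch_before_divergence(flat_train, flat_val)

print(overfit_from, no_divergence)
```

The same check is the basis of early stopping: training would be halted, and the weights from the returned epoch kept.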

Exam trap

CompTIA often tests the distinction between overfitting and underfitting by showing a loss curve where training loss is low but validation loss rises, tricking candidates who focus only on the low training loss without checking validation performance.

How to eliminate wrong answers

Option A is wrong because underfitting would show both training and validation loss remaining high and not decreasing, not the divergence seen here. Option B is wrong because data augmentation failure would typically cause both losses to be high or erratic, not a clear divergence with low training loss. Option D is wrong because model compression reduces model size and may affect accuracy, but it does not produce the specific loss divergence pattern of overfitting.

55
MCQmedium

Based on the exhibit, what is the most likely issue with the model training?

A.Vanishing gradient
B.Learning rate too high
C.Underfitting
D.Overfitting
AnswerD

The diverging validation loss after initial improvement indicates the model is memorizing the training data and failing to generalize.

Why this answer

The exhibit shows training loss decreasing while validation loss increases after a certain point, which is a classic sign of overfitting. The model is memorizing the training data rather than generalizing, leading to poor performance on unseen validation data.

Exam trap

CompTIA often tests the distinction between overfitting and underfitting by showing a diverging validation loss curve, which candidates may misinterpret as a learning rate issue or vanishing gradient.

How to eliminate wrong answers

Option A is wrong because vanishing gradient typically causes training to stall early with both losses high and flat, not a diverging validation loss. Option B is wrong because a learning rate too high would cause both training and validation losses to oscillate or diverge together, not just validation loss increasing. Option C is wrong because underfitting would show both training and validation losses remaining high and plateauing, not a decreasing training loss.

56
Multi-Selecthard

Which three techniques are commonly used to mitigate overfitting in neural networks? (Choose three.)

Select 3 answers
A.Adding L2 regularization
B.Increasing training data
C.Dropout
D.Reducing number of layers
E.Early stopping
AnswersA, C, E

L2 regularization adds a penalty on large weights, discouraging overfitting by constraining the model complexity.

Why this answer

Adding L2 regularization (A), also known as weight decay, penalizes large weights by adding a term proportional to the squared magnitude of the weights to the loss function. This forces the network to keep weights small, reducing the model's sensitivity to noise in the training data and preventing it from fitting spurious patterns. Dropout (C) randomly deactivates neurons during training so the network cannot rely on any single unit, and early stopping (E) halts training once validation performance stops improving, before the model begins to memorize the training set.

Exam trap

CompTIA often tests the distinction between data-level strategies (like increasing training data) and algorithmic regularization techniques (like L2, dropout, early stopping), leading candidates to mistakenly select 'increasing training data' as a technique when the question specifically asks for techniques commonly used within the neural network training process.

57
MCQmedium

A machine learning team notices that their model's performance degrades when deployed to a new geographic region. The data distribution in the new region differs from the training data. Which concept best describes this issue?

A.Covariate shift
B.Data leakage
C.Underfitting
D.Overfitting
AnswerA

Covariate shift happens when the distribution of input features changes between training and deployment.

Why this answer

Covariate shift occurs when the distribution of the input features (covariates) changes between training and deployment, while the conditional relationship P(Y|X) remains the same. In this scenario, the model's performance degrades because the new geographic region has a different data distribution than the training data, which is the classic definition of covariate shift. This is a common issue in machine learning when models are deployed in environments not represented in the training set.
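One simple way to look for a shift in an input feature is to compare its mean in the deployment data against the training data. This is a minimal sketch: the feature values and the cutoff of 2 are hypothetical, and the check only examines the input distribution of one feature, not the relationship P(Y|X).

```python
from statistics import mean, stdev

def standardized_mean_shift(train_values, deploy_values):
    """Difference in feature means, in units of the training standard deviation."""
    return (mean(deploy_values) - mean(train_values)) / stdev(train_values)

# Hypothetical values of one input feature in each region.
train_feature = [40, 45, 50, 55, 60]
deploy_feature = [70, 75, 80, 85, 90]

shift = standardized_mean_shift(train_feature, deploy_feature)

# ASSUMPTION: a cutoff of 2 standard deviations is an arbitrary illustrative threshold.
shift_detected = abs(shift) > 2

print(round(shift, 2), shift_detected)
```

In practice a team would run a check like this per feature, or use a proper two-sample statistical test, before deciding whether to retrain or adapt the model.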

Exam trap

CompTIA often tests the distinction between covariate shift and overfitting. Candidates mistakenly assume that any performance degradation on new data is due to overfitting, but overfitting means poor generalization to unseen data drawn from the same distribution as the training set, whereas covariate shift means the input distribution itself has changed.

How to eliminate wrong answers

Option B is wrong because data leakage refers to information from outside the training set (e.g., future data or target information) being used to train the model, which artificially inflates performance, not a distribution shift between training and deployment. Option C is wrong because underfitting occurs when a model is too simple to capture patterns in the training data, resulting in poor performance on both training and test sets, not specifically a degradation due to a change in data distribution. Option D is wrong because overfitting happens when a model learns noise or specific patterns in the training data too well, leading to poor generalization on unseen data from the same distribution, not a shift to a different distribution.

58
MCQeasy

A startup is building a chatbot for customer service. They have 500 recorded conversations and want to use a pre-trained language model to generate responses. However, they have limited computational resources and need the chatbot to respond in real-time. They are considering fine-tuning a large model like GPT-3 or using a smaller model like DistilBERT. The conversation data contains industry-specific jargon. Which approach should they take?

A.Use GPT-3 via API without fine-tuning
B.Fine-tune DistilBERT on the conversation data
C.Train a custom RNN from scratch on the conversations
D.Implement a rule-based system with keywords
AnswerB

DistilBERT is smaller, faster, and fine-tuning on domain-specific data will adapt it to jargon while meeting real-time requirements.

Why this answer

Option B is correct because fine-tuning DistilBERT on the 500 recorded conversations allows the model to adapt to industry-specific jargon while maintaining real-time responsiveness due to its smaller size. DistilBERT is a distilled version of BERT that retains 97% of BERT’s language understanding with 40% fewer parameters, making it suitable for limited computational resources. Fine-tuning on domain-specific data is essential here, as pre-trained models like GPT-3 lack exposure to the startup’s specialized terminology, and using a smaller model ensures low-latency inference for real-time chatbot responses.

Exam trap

CompTIA often tests the misconception that larger pre-trained models like GPT-3 are always superior for domain adaptation, ignoring the critical trade-offs of computational cost, latency, and the need for fine-tuning on small, specialized datasets.

How to eliminate wrong answers

Option A is wrong because using GPT-3 via API without fine-tuning would not adapt to the industry-specific jargon in the 500 conversations, leading to generic or incorrect responses, and the API call latency and cost are unsuitable for real-time constraints with limited resources. Option C is wrong because training a custom RNN from scratch on only 500 conversations is insufficient for learning complex language patterns, resulting in poor generalization and high risk of overfitting, while also requiring significant computational resources for training. Option D is wrong because a rule-based system with keywords cannot handle the variability and nuance of natural language in customer service conversations, especially with industry-specific jargon, and would fail to generate coherent, context-aware responses beyond predefined patterns.

59
MCQeasy

Which metric is most appropriate for evaluating a binary classification model where the positive class is rare and false positives are costly?

A.Accuracy
B.F1-score
C.Precision
D.Recall
AnswerC

Correct; precision measures how many predicted positives are actually positive, reducing false positives.

Why this answer

Precision is the most appropriate metric when the positive class is rare and false positives are costly because it measures the proportion of true positive predictions among all positive predictions: TP / (TP + FP). Every false positive lowers precision directly, so optimizing for it pushes the model to label an instance as positive only when the evidence is strong. This aligns with the business need to avoid costly false alarms, such as wrongly flagged transactions in fraud detection or false positives when screening for rare diseases.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric, but the trap here is that candidates overlook how class imbalance and asymmetric costs make precision or recall more relevant, and they fail to distinguish between F1-score and precision when the cost of false positives is explicitly stated.

How to eliminate wrong answers

Option A is wrong because accuracy is misleading for imbalanced datasets; a model that predicts the majority class for all instances can achieve high accuracy while failing to identify any positive cases, which is useless when the positive class is rare. Option B is wrong because F1-score balances precision and recall, but when false positives are costly, precision alone is more appropriate; F1-score would still allow some false positives in favor of recall, which is undesirable here. Option D is wrong because recall focuses on capturing all positive instances, but it does not penalize false positives; in a rare positive class scenario with high cost of false positives, maximizing recall would likely increase false positives, which is counterproductive.
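The difference between these metrics is easier to see with numbers. The sketch below computes precision, recall, and F1 from confusion-matrix counts; the two models and their counts are hypothetical, chosen only to illustrate a rare positive class.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical data set: 1,000 cases, only 10 of them truly positive.
# Cautious model: flags 5 cases, 4 of them correctly.
cautious = precision_recall_f1(tp=4, fp=1, fn=6)
# Aggressive model: flags 50 cases, 9 of them correctly.
aggressive = precision_recall_f1(tp=9, fp=41, fn=1)

print("cautious   (precision, recall, f1):", cautious)
print("aggressive (precision, recall, f1):", aggressive)
```

The aggressive model has the higher recall but produces 41 false alarms, which is exactly what the question treats as costly; precision is the metric that exposes this.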

60
MCQmedium

An AI engineer is tuning a deep learning model and observes that the training loss decreases very slowly. The learning rate is set to 0.001. Which adjustment is most likely to speed up convergence?

A.Increase the learning rate to 0.01
B.Add more hidden layers
C.Decrease the learning rate to 0.0001
D.Increase the batch size
AnswerA

A higher learning rate allows larger weight updates, potentially speeding up convergence.

Why this answer

A learning rate of 0.001 is causing the model to take very small steps toward the minimum of the loss function, resulting in slow convergence. Increasing the learning rate to 0.01 allows larger weight updates per iteration, which typically speeds up training. However, care must be taken not to overshoot the optimum, as an excessively high learning rate can cause divergence.

Exam trap

CompTIA often tests the misconception that decreasing the learning rate always improves training, when in fact a learning rate that is too low is a primary cause of slow convergence, and the correct adjustment is to increase it within a safe range.

How to eliminate wrong answers

Option B is wrong because adding more hidden layers increases model complexity and the number of parameters, which generally slows training and can exacerbate the slow convergence problem rather than solving it. Option C is wrong because decreasing the learning rate to 0.0001 would make the updates even smaller, further slowing convergence. Option D is wrong because increasing the batch size provides a more accurate gradient estimate but reduces the frequency of updates per epoch, which can actually slow convergence in terms of steps needed to reach a given loss.
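The effect of the learning rate can be illustrated on a toy problem. The sketch below runs plain gradient descent on f(w) = w², a stand-in chosen for simplicity rather than a real deep learning loss; the function name, starting point, and tolerance are assumptions made for the example.

```python
def steps_to_converge(lr, start=10.0, tol=1e-3, max_steps=10_000):
    """Count gradient-descent steps on f(w) = w**2 until |w| < tol.

    Returns None if the run does not converge within max_steps.
    """
    w = start
    for step in range(1, max_steps + 1):
        grad = 2 * w      # derivative of w**2
        w -= lr * grad    # standard gradient-descent update
        if abs(w) < tol:
            return step
    return None

slow = steps_to_converge(lr=0.001)    # small steps, many iterations
fast = steps_to_converge(lr=0.01)     # larger steps, fewer iterations
diverged = steps_to_converge(lr=1.5)  # too large: each step overshoots

print("lr=0.001:", slow, "steps")
print("lr=0.01: ", fast, "steps")
print("lr=1.5:  ", diverged)
```

On this toy loss the larger rate needs far fewer steps, while the excessive rate never converges, which mirrors the caveat about overshooting in the explanation above.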

61
MCQhard

A company uses the above policy to control AI model access. A data scientist tries to run inference with model "llama-3-70b" at 150 requests in 30 minutes. What will happen?

A.All requests are allowed because the model is in the allowed list
B.All requests are denied because the rate limit is per minute and 150 exceeds the limit
C.The first 100 requests are allowed; the remaining 50 are denied
D.All requests are denied because the second rule blocks all models
AnswerC

The rate limit allows 100 requests per 30-minute window; requests beyond the limit are denied.

Why this answer

Option C is correct because the policy allows up to 100 requests per 30 minutes for models in the allowed list, and 'llama-3-70b' is in that list. The rate limit is applied per 30-minute window, not per minute, so the first 100 requests are allowed, and the remaining 50 exceed the limit and are denied.

Exam trap

The trap here is that candidates often misinterpret the rate limit as a per-minute value (like 100 per minute) rather than the stated 100 per 30 minutes, leading them to incorrectly select Option B.

How to eliminate wrong answers

Option A is wrong because it ignores the rate limit; being in the allowed list does not bypass the 100 requests per 30-minute cap. Option B is wrong because it misinterprets the rate limit as per minute, but the policy specifies a per-30-minute window, so 150 requests in 30 minutes does not exceed a per-minute limit. Option D is wrong because the second rule does not block all models; it only blocks models not in the allowed list, and 'llama-3-70b' is explicitly allowed.
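The policy exhibit is not reproduced here, so the sketch below only models the behavior described in the explanation: an allow-list check followed by a cap of 100 requests within a single window. The constant names and the `evaluate` function are hypothetical, not part of any real policy engine.

```python
ALLOWED_MODELS = {"llama-3-70b"}  # assumed allow-list entry
LIMIT_PER_WINDOW = 100            # assumed cap, taken from the explanation

def evaluate(requests):
    """Return 'allow'/'deny' decisions for requests made in one window."""
    decisions = []
    allowed_so_far = 0
    for model in requests:
        if model not in ALLOWED_MODELS:
            decisions.append("deny")       # blocked by the allow-list rule
        elif allowed_so_far < LIMIT_PER_WINDOW:
            allowed_so_far += 1
            decisions.append("allow")
        else:
            decisions.append("deny")       # over the rate limit
    return decisions

decisions = evaluate(["llama-3-70b"] * 150)
print(decisions.count("allow"), "allowed,", decisions.count("deny"), "denied")
```

Under these assumptions the first 100 requests pass and the remaining 50 are rejected, matching Option C.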

62
Multi-Selecteasy

Which TWO of the following are common activation functions used in neural networks? (Choose two.)

Select 2 answers
A.Gradient descent
B.LSTM
C.Dropout
D.ReLU
E.Sigmoid
AnswersD, E

ReLU is a widely used activation function.

Why this answer

ReLU (Rectified Linear Unit) is a widely used activation function that outputs the input directly if it is positive, and zero otherwise, introducing non-linearity while mitigating the vanishing gradient problem. Sigmoid is another common activation function that maps any real-valued input to a value between 0 and 1, making it useful for binary classification output layers. Both are fundamental building blocks in neural network architectures.

Exam trap

CompTIA often tests the distinction between activation functions and other neural network components like optimizers (gradient descent), architectures (LSTM), or regularization techniques (dropout), expecting candidates to recognize that only ReLU and Sigmoid directly compute a neuron's output from its input.
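Both activation functions are short enough to write out directly. This is a minimal scalar sketch for illustration; real frameworks apply the same formulas element-wise to tensors.

```python
import math

def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Map a real-valued input to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5.0, 0.0, 5.0):
    print(x, relu(x), sigmoid(x))
```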

63
MCQeasy

A company wants to use AI to automatically categorize customer support tickets into topics like 'billing', 'technical', 'account'. They have 10,000 labeled examples. Which algorithm is most suitable for this task?

A.DBSCAN
B.Apriori
C.Principal component analysis (PCA)
D.Logistic regression
AnswerD

Logistic regression is a supervised learning algorithm for classification, suitable for multi-class problems with moderate data.

Why this answer

Option D is correct because logistic regression is a supervised classification algorithm that can be trained on the 10,000 labeled tickets and extends naturally to multiple classes. Option A is wrong because DBSCAN is a clustering algorithm, not a classifier. Option B is wrong because Apriori mines association rules.

Option C is wrong because PCA is a dimensionality reduction technique.

64
Multi-Selectmedium

Which THREE of the following are types of machine learning paradigms? (Choose three.)

Select 3 answers
A.Gradient boosting
B.Reinforcement learning
C.Unsupervised learning
D.Quantum computing
E.Supervised learning
AnswersB, C, E

Reinforcement learning involves an agent learning from rewards.

Why this answer

Reinforcement learning is a correct machine learning paradigm where an agent learns to make decisions by interacting with an environment, receiving rewards or penalties based on its actions. This trial-and-error approach is distinct from supervised and unsupervised learning, as it focuses on maximizing cumulative reward through exploration and exploitation.

Exam trap

CompTIA often tests candidates by listing specific algorithms (like gradient boosting) or adjacent technologies (like quantum computing) as distractors, hoping you confuse a technique or enabling technology with a fundamental learning paradigm.

65
MCQmedium

An AI model is being developed for medical diagnosis from X-ray images. The dataset contains only frontal chest X-rays. The model achieves high accuracy on test set but fails on lateral views. What is the most likely cause?

A.Dataset bias
B.Underfitting
C.Label noise
D.Overfitting
AnswerA

The training set lacks lateral views, causing bias; the model has not learned to recognize features specific to lateral X-rays.

Why this answer

The model was trained only on frontal views, so it did not learn features from lateral views, resulting in dataset bias and poor generalization to unseen perspectives.

66
MCQeasy

In the AI lifecycle, which phase involves splitting data into training, validation, and test sets?

A.Model training
B.Data preprocessing
C.Data collection
D.Model evaluation
AnswerB

Correct; preprocessing includes cleaning, transforming, and splitting data.

Why this answer

Data preprocessing is the phase where raw data is cleaned, transformed, and prepared for modeling. Splitting the dataset into training, validation, and test sets is a critical step during this phase to ensure unbiased evaluation and prevent data leakage. This split occurs before any model training begins, making it part of preprocessing rather than training or evaluation.

Exam trap

CompTIA often tests the misconception that data splitting belongs to model training or evaluation, when in fact it is a preprocessing step that must occur before any model sees the data.

How to eliminate wrong answers

Option A is wrong because model training is the phase where the algorithm learns patterns from the training data, not where the data is split; splitting must happen beforehand to avoid contaminating the evaluation. Option C is wrong because data collection is the initial gathering of raw data from sources, which occurs before any splitting or preprocessing. Option D is wrong because model evaluation uses the already-split test set to assess performance, but the split itself is established during data preprocessing.
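A three-way split can be sketched in a few lines. The 70/15/15 ratio, the fixed seed, and the `split_dataset` name are assumptions made for illustration; the key point is that the split happens once, before any model sees the data.

```python
import random

def split_dataset(rows, train=0.7, val=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # seeded so the split is reproducible
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    train_set = rows[:n_train]
    val_set = rows[n_train:n_train + n_val]
    test_set = rows[n_train + n_val:]   # whatever remains
    return train_set, val_set, test_set

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Because the three slices come from one shuffled list, no row can appear in more than one set, which is what prevents leakage between training and evaluation.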

67
MCQmedium

A government agency is deploying an AI model to screen loan applications. The model uses features like income, credit score, employment history, and zip code. During fairness auditing, the model is found to deny a disproportionately high number of applicants from a particular demographic group, even when controlling for legitimate financial factors. The agency wants to mitigate this bias without significantly reducing overall accuracy. Which approach should the data scientist prioritize?

A.Adjust the decision threshold for the affected group
B.Remove the zip code feature from the model
C.Apply sample weighting to balance the demographic groups
D.Use adversarial debiasing during model training
AnswerD

Adversarial debiasing forces the model to learn representations that are invariant to sensitive attributes, reducing bias with minimal accuracy loss.

Why this answer

Adversarial debiasing trains the model to remove sensitive information from its internal representations, reducing bias while maintaining accuracy. Option B (removing the zip code feature) is ineffective on its own because other features can still act as correlated proxies. Option C (sample weighting) can help but may distort the training distribution.

Option A (post-hoc threshold adjustment for the affected group) may reduce disparity but often at the cost of overall accuracy; adversarial debiasing is a more principled in-processing method.

68
MCQmedium

A company built a speech-to-text model using a recurrent neural network (RNN). During deployment, the model performs poorly on accented speech. Which action would most effectively improve model robustness?

A.Collect a small sample of accented speech and fine-tune the model on that sample only.
B.Add dropout and reduce the number of RNN layers to prevent overfitting to the current data.
C.Augment the training dataset with various accented audio samples and retrain the model.
D.Replace the RNN with a convolutional neural network (CNN) for feature extraction.
AnswerC

Data augmentation with accents directly addresses the performance gap.

Why this answer

Option C is correct because augmenting the training dataset with diverse accented audio samples directly addresses the root cause of poor performance—distribution shift between training and deployment data. Retraining the model on this enriched dataset allows the RNN to learn invariant features across accents, improving generalization without altering the model architecture or risking catastrophic forgetting from fine-tuning on a tiny sample.

Exam trap

CompTIA often tests the misconception that architectural changes (like switching to CNN or adding regularization) can fix data distribution mismatches, when the real solution is to address the missing data diversity through augmentation or retraining.

How to eliminate wrong answers

Option A is wrong because fine-tuning on a small sample of accented speech can cause catastrophic forgetting of the original training distribution and does not provide enough diversity to learn robust accent-invariant features. Option B is wrong because adding dropout and reducing layers addresses overfitting to the current data, but the core problem is underfitting to accented speech due to missing representative training examples, not overfitting. Option D is wrong because replacing the RNN with a CNN for feature extraction does not inherently solve the accent robustness issue; CNNs are effective for spatial patterns but less suited for sequential temporal dependencies in speech, and the fundamental problem remains the lack of accented training data.

69
MCQmedium

A company wants to create an AI system that can identify objects in images. They have a large dataset of labeled images. Which type of neural network architecture is most suitable?

A.Transformer
B.Convolutional neural network (CNN)
C.Generative adversarial network (GAN)
D.Recurrent neural network (RNN)
AnswerB

Correct; CNNs excel at image recognition due to convolutional layers.

Why this answer

Convolutional neural networks (CNNs) are specifically designed to process grid-like data such as images. They use convolutional layers to automatically learn spatial hierarchies of features (edges, textures, objects) from pixel data, making them the most suitable architecture for image classification tasks with labeled datasets.

Exam trap

CompTIA often tests the misconception that any 'neural network' can handle images equally, but the trap is that RNNs and Transformers are sequence-based and not optimized for spatial feature extraction, while GANs are generative, not discriminative.

How to eliminate wrong answers

Option A is wrong because Transformers are primarily designed for sequential data (e.g., text) using self-attention mechanisms; while they can be adapted for vision (Vision Transformers), they require large datasets and are not the standard choice for traditional image classification. Option C is wrong because Generative adversarial networks (GANs) are used for generating new data (e.g., synthetic images) rather than classifying or identifying objects in existing images. Option D is wrong because Recurrent neural networks (RNNs) are designed for sequential or time-series data (e.g., text, speech) and struggle with spatial relationships in images due to vanishing gradients and lack of translation invariance.

70
Multi-Selecthard

A team is deploying a deep learning model that uses a convolutional neural network (CNN) for image recognition. The model achieves high accuracy but is very slow to infer on edge devices. Which THREE optimization techniques should the team consider to speed up inference without significant accuracy loss? (Select three.)

Select 3 answers
A.Use larger convolutional filters (e.g., 7x7 instead of 3x3) to capture more context.
B.Use weight pruning to remove unnecessary connections in the network.
C.Implement knowledge distillation by training a smaller model to mimic the larger one.
D.Increase the number of convolutional layers to improve feature extraction.
E.Apply model quantization to reduce weight precision.
AnswersB, C, E

Pruning reduces computation and memory footprint.

Why this answer

Weight pruning removes redundant or less important connections (weights) from the neural network, reducing the number of computations required during inference. This directly speeds up inference on edge devices while typically causing only a minor drop in accuracy if done carefully, making it a standard optimization technique for deploying CNNs on resource-constrained hardware.

Exam trap

CompTIA often tests the misconception that increasing model capacity (larger filters or more layers) improves performance without considering the trade-off in inference speed, leading candidates to select options that actually worsen latency on edge devices.
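Two of the three correct techniques can be shown on a plain list of weights. The sketch below uses magnitude-based pruning and symmetric int8 quantization, which are common variants but only one way to do each; the function names and weight values are hypothetical, and knowledge distillation is left out because it requires a full training loop.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * fraction)
    by_size = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in by_size[:k]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric int8 quantization: return (integer codes, scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]

weights = [0.80, -0.02, 0.45, 0.01, -0.60, 0.03]
pruned = prune_by_magnitude(weights, 0.5)   # three smallest become 0.0
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
print(pruned)
print(codes)
```

The restored weights differ from the pruned ones by at most about half a quantization step, which is the small accuracy cost traded for lower-precision storage and arithmetic.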

71
Multi-Selecthard

Which TWO of the following are key characteristics of unsupervised learning?

Select 2 answers
A.It uses data without labeled responses
B.It predicts a target variable based on input features
C.It discovers hidden patterns or groupings in data
D.It requires a reward signal to learn optimal actions
E.It typically requires a separate validation set for tuning
AnswersA, C

Unsupervised learning works with unlabeled data.

Why this answer

Option A is correct because unsupervised learning algorithms, such as k-means clustering or hierarchical clustering, operate exclusively on input data that has no labeled responses. The model must infer the underlying structure directly from the features without any ground-truth outputs to guide it, which is the defining characteristic of unsupervised learning.

Exam trap

CompTIA often tests the distinction between supervised, unsupervised, and reinforcement learning by presenting a characteristic that is true for one paradigm but not the other, and the trap here is that candidates may confuse 'predicting a target variable' (supervised) with 'discovering hidden patterns' (unsupervised) because both involve analyzing input features.

72
MCQeasy

A company implements a chatbot using a rule-based system. Users complain the chatbot cannot handle new queries. Which AI approach should be considered to improve flexibility?

A.Expert system
B.Natural language processing (NLP)
C.Robotic process automation
D.Machine learning
AnswerD

ML enables the system to learn patterns from data.

Why this answer

Machine learning (ML) enables a chatbot to learn from new data and adapt to unseen queries, unlike a static rule-based system. By training on historical conversations, an ML model can generalize patterns and handle novel inputs without requiring explicit rules for every scenario.

Exam trap

CompTIA often tests the misconception that NLP alone is sufficient for adaptive chatbots, but NLP is a component of understanding language, not a learning mechanism—machine learning is required for flexibility.

How to eliminate wrong answers

Option A is wrong because an expert system is also rule-based, relying on a fixed knowledge base and inference engine, which cannot adapt to new queries without manual rule updates. Option B is wrong because natural language processing (NLP) alone provides text understanding (e.g., tokenization, parsing) but does not inherently learn from new data; it must be combined with ML for adaptive behavior. Option C is wrong because robotic process automation (RPA) automates repetitive, rule-based tasks in structured environments and cannot handle the variability of new, unseen queries.

73
MCQeasy

A data scientist wants to group customers into segments based on purchasing behavior without predefined labels. Which type of machine learning is most appropriate?

A.Reinforcement learning
B.Supervised learning
C.Unsupervised learning
D.Semi-supervised learning
AnswerC

Correct; unsupervised learning identifies patterns without labels.

Why this answer

Unsupervised learning is the correct choice because the data scientist has no predefined labels and wants to discover natural groupings in customer purchasing behavior. Clustering algorithms, such as K-means or DBSCAN, are used in unsupervised learning to segment data based on inherent patterns without any target variable.

Exam trap

CompTIA often tests the distinction between supervised and unsupervised learning by presenting a scenario with no labels, and the trap is that candidates may confuse clustering (unsupervised) with classification (supervised) or think semi-supervised applies when no labels exist at all.

How to eliminate wrong answers

Option A is wrong because reinforcement learning involves an agent learning from rewards and penalties by interacting with an environment, not grouping unlabeled data. Option B is wrong because supervised learning requires labeled training data with known outcomes, which is not available in this scenario. Option D is wrong because semi-supervised learning uses a small amount of labeled data alongside a larger unlabeled dataset, but the question explicitly states there are no predefined labels.
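Clustering without labels can be sketched with a one-dimensional k-means (Lloyd's algorithm). The spending figures and starting centroids below are made up for illustration, and a real segmentation would use several behavioral features rather than one.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Run Lloyd's algorithm on 1-D data; return (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

# Hypothetical monthly spend per customer; no segment labels are provided.
monthly_spend = [12, 15, 14, 13, 210, 190, 205, 198]
centroids, labels = kmeans_1d(monthly_spend, centroids=[0.0, 100.0])
print(centroids, labels)
```

The algorithm separates low and high spenders using only the values themselves, with no target variable involved.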

74
MCQeasy

A company deploys an AI model to predict equipment failure. The model performs well on historical data but fails to generalize to new data from a different factory. Which concept best describes this issue?

A.Transfer learning
B.Underfitting
C.Overfitting
D.Bias-variance tradeoff
AnswerC

The model fits training data too closely and fails on new data.

Why this answer

Option C (Overfitting) is correct because the model learned patterns specific to the historical data from the original factory, including noise and factory-specific nuances, rather than generalizable features. When applied to new data from a different factory, those learned patterns do not hold, causing poor performance. This is the classic symptom of overfitting: high accuracy on training data but low accuracy on unseen data.

Exam trap

CompTIA often tests the distinction between overfitting and underfitting by describing a model that performs well on training data but poorly on new data, which candidates may mistakenly attribute to underfitting if they focus only on the poor generalization without noting the strong training performance.

How to eliminate wrong answers

Option A is wrong because transfer learning refers to leveraging knowledge from one task to improve learning on a related task, which is not the issue here—the model fails to generalize, not that it fails to transfer knowledge. Option B is wrong because underfitting occurs when the model is too simple to capture underlying patterns, resulting in poor performance on both training and new data, whereas here the model performs well on historical data. Option D is wrong because bias-variance tradeoff is a broader concept describing the balance between underfitting (high bias) and overfitting (high variance); while overfitting is a manifestation of high variance, the specific issue described is overfitting itself, not the tradeoff.

75
MCQhard

Refer to the exhibit. A team deploys a sentiment analysis model with this policy. After one month, the monitoring system triggers an alert for feature drift. Which action should the team take first?

A.Review the fairness check settings to ensure protected attributes are still relevant.
B.Immediately retrain the model on recent data to adapt to the drift.
C.Compare the current feature distributions with the training set to identify which features drifted.
D.Reduce the classification threshold to 0.5 to increase sensitivity.
AnswerC

Drift analysis should first characterize the drift to decide corrective action.

Why this answer

Option C is correct because when a monitoring system triggers an alert for feature drift, the first step is to diagnose which features have changed. Comparing current feature distributions with the training set identifies the specific features that drifted, enabling targeted remediation such as retraining with recent data or feature engineering. This aligns with the standard MLOps workflow for drift detection and response.

Exam trap

CompTIA often tests the misconception that any model alert should trigger immediate retraining, but the correct first step is always to diagnose the drift type and affected features before taking action.

How to eliminate wrong answers

Option A is wrong because fairness check settings and protected attributes are unrelated to feature drift; they address bias, not distribution shifts in input features. Option B is wrong because immediately retraining the model without first identifying which features drifted is premature and may waste resources or fail to address the root cause. Option D is wrong because reducing the classification threshold to 0.5 adjusts the decision boundary for sensitivity but does not correct feature distribution changes; it could degrade model performance further.
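The diagnostic step in Option C can be approximated with a simple per-feature comparison. The sketch below flags features whose mean has moved by more than a chosen number of training standard deviations; this is a deliberately crude check, and the feature names, values, and 0.5 threshold are all assumptions for illustration rather than anything taken from the exhibit.

```python
from statistics import mean, stdev

def drifted_features(train, current, threshold=0.5):
    """Return features whose mean shifted by more than `threshold`
    training standard deviations between the two data sets."""
    flagged = {}
    for name, train_values in train.items():
        spread = stdev(train_values) or 1.0   # avoid dividing by zero
        shift = abs(mean(current[name]) - mean(train_values)) / spread
        if shift > threshold:
            flagged[name] = round(shift, 2)
    return flagged

# Hypothetical feature samples from training time and from production.
train = {"review_length": [50, 60, 55, 65, 58], "emoji_count": [1, 0, 2, 1, 1]}
current = {"review_length": [52, 61, 57, 63, 59], "emoji_count": [6, 7, 5, 8, 6]}

flagged = drifted_features(train, current)
print(flagged)
```

Only the feature that actually changed is reported, which tells the team where to focus before deciding whether retraining or feature engineering is the right response.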
