Knowledge + Practice

CompTIA AI+ (AI0-001): Questions 451–500

500 questions total · 7 pages · All types, answers revealed

Page 7 of 7

451

MCQmedium

A model trained on a dataset has high bias and low variance. What does this indicate?

A.Good fit

B.Data leakage

C.Overfitting

D.Underfitting

AnswerD

Correct: High bias leads to underfitting.

Why this answer

Option D is correct because high bias and low variance indicate underfitting, where the model is too simple to capture the underlying patterns. Options A, B, and C are incorrect: a good fit has low bias and low variance, data leakage is not a bias-variance concept, and overfitting has low bias and high variance.

452

MCQeasy

A dataset contains features on vastly different scales (e.g., age 0-100 vs. income 0-1,000,000). Which preprocessing step is essential before training a neural network?

A.Data augmentation

B.Dimensionality reduction

C.Feature scaling (standardization or normalization)

D.One-hot encoding

AnswerC

Scaling brings features to a similar range, improving gradient descent.

Why this answer

Neural networks are sensitive to feature scale; standardization or normalization ensures stable convergence.
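As a rough illustration of standardization, the sketch below rescales each column to zero mean and unit variance with NumPy. The `standardize` helper and the age/income rows are hypothetical, made up for this example. In a real pipeline the mean and standard deviation would normally be computed on the training split only and then reused for validation and inference data.

```python
import numpy as np

def standardize(X):
    """Rescale each column to zero mean and unit variance (z-score)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mean) / std

# Hypothetical rows: [age, income]
X = np.array([
    [25.0, 40_000.0],
    [40.0, 85_000.0],
    [60.0, 900_000.0],
])

X_scaled = standardize(X)
print(X_scaled.mean(axis=0))  # close to 0 for both columns
print(X_scaled.std(axis=0))   # close to 1 for both columns
```

After scaling, age and income contribute on comparable numeric ranges, so neither dominates the gradient updates purely because of its units.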

453

Multi-Selectmedium

A company is implementing an AI solution for fraud detection. The dataset is highly imbalanced (only 1% fraudulent transactions). Which THREE techniques are most appropriate to address class imbalance? (Select three.)

Select 3 answers

A.Apply cost-sensitive learning by assigning a higher misclassification cost to the minority class.

B.Reduce the number of features using principal component analysis (PCA).

C.Use accuracy as the primary evaluation metric.

D.Evaluate model performance using precision-recall curves and F1 score.

E.Use synthetic oversampling (SMOTE) to create additional minority class samples.

AnswersA, D, E

Cost-sensitive methods penalize minority class errors more heavily.

Why this answer

Option A is correct because cost-sensitive learning directly addresses class imbalance by assigning a higher misclassification cost to the minority class (fraudulent transactions). This forces the model to penalize false negatives more heavily, thereby improving recall for the minority class without altering the dataset distribution.

Exam trap

CompTIA often tests the misconception that accuracy is a valid metric for imbalanced datasets. High accuracy can mask poor minority-class performance: with 1% fraud, a model that labels every transaction as legitimate is still 99% accurate. Candidates who overlook this select option C instead of focusing on precision-recall curves and F1 score.
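The accuracy trap can be checked with a few lines of arithmetic. The sketch below uses a made-up dataset of 1,000 transactions with 1% fraud and a naive model that never predicts fraud; the helper names are hypothetical and written from scratch rather than taken from any library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical data: 10 fraudulent (1) and 990 legitimate (0) transactions.
y_true = [1] * 10 + [0] * 990
y_pred_naive = [0] * 1000  # a model that never flags fraud

acc = accuracy(y_true, y_pred_naive)
f1 = f1_score(y_true, y_pred_naive)
print(acc, f1)
```

The naive model scores 0.99 on accuracy while its F1 score for the fraud class is 0.0, which is why accuracy alone is a poor primary metric here.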

454

MCQhard

An autonomous vehicle system uses a deep reinforcement learning agent to navigate. The agent's reward function gives +1 for reaching the destination and -0.1 for each time step. After training, the agent learns to circle the block repeatedly without reaching the destination. Which modification is most likely to fix this behavior?

A.Increase the time penalty to -1 per step

B.Increase the reward for reaching the destination to +10

C.Use a discount factor closer to 0

D.Add a penalty for each turn the vehicle makes

AnswerA

A higher penalty per step makes circling less rewarding and encourages reaching the destination quickly.

Why this answer

The agent learns to circle the block because the cumulative penalty for each time step (-0.1) is too small relative to the reward for reaching the destination (+1). By increasing the time penalty to -1 per step, the agent will incur a much larger cost for delaying, making it optimal to reach the destination quickly rather than looping indefinitely. This directly addresses the reward structure imbalance that causes the undesirable behavior.

Exam trap

CompTIA often tests the misconception that increasing the terminal reward alone will fix reward hacking, when in fact the per-step penalty must be large enough to make delay costly relative to the goal reward.

How to eliminate wrong answers

Option B is wrong because simply increasing the destination reward to +10 does not change the per-step penalty; the agent can still accumulate a small penalty while circling, and the total reward from looping may still outweigh the delayed +10 reward if the discount factor is high. Option C is wrong because using a discount factor closer to 0 makes the agent myopic, focusing only on immediate rewards; this would actually encourage short-term circling behavior rather than long-term goal achievement. Option D is wrong because adding a penalty for each turn does not address the core issue of the agent preferring to delay reaching the destination; the agent could still circle without turning (e.g., driving in a straight loop) or the penalty might not be large enough to overcome the reward structure.

455

MCQmedium

A team trained a deep neural network on a limited dataset. The training loss decreases consistently, but the validation loss starts increasing after 20 epochs. What is the most likely issue and the best corrective action?

A.Vanishing gradient; use ReLU activation

B.Overfitting; apply regularization like dropout

C.Underfitting; increase model complexity

D.Data leakage; reshuffle split

AnswerB

Dropout randomly drops neurons to prevent co-adaptation, reducing overfitting.

Why this answer

The divergence between training and validation loss indicates overfitting. Regularization techniques like dropout help reduce overfitting.
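For intuition about what dropout does, here is a minimal NumPy sketch of "inverted" dropout, one common formulation. The `dropout` function is a hypothetical helper written for this example, not a framework API; deep learning libraries ship their own dropout layers.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero random units and rescale the survivors."""
    if not training or drop_prob == 0.0:
        return activations  # dropout is disabled at inference time
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob  # True = keep the unit
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 100))
out_train = dropout(a, drop_prob=0.5, rng=rng, training=True)
out_eval = dropout(a, drop_prob=0.5, rng=rng, training=False)

print(out_train.mean())  # close to 1.0 on average
print(np.array_equal(out_eval, a))
```

Because a different random subset of units is silenced on every training step, no single unit can rely on specific partners, which is the co-adaptation effect the answer refers to.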

456

MCQmedium

A retail company is building a recommendation system to suggest products to customers based on their purchase history. The data engineering team has collected data from point-of-sale systems, online browsing logs, and customer reviews. After cleaning the data, they notice that the feature set has over 500 dimensions, leading to high computational costs and potential overfitting. They need to reduce dimensionality while preserving as much variance as possible for the model. The team is considering various techniques. Which approach should they take to achieve this goal most effectively?

A.Keep all features but apply L1 regularization (Lasso) in the model to automatically reduce coefficients to zero.

B.Apply t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce the feature space to 50 dimensions.

C.Select only features that have a high correlation with the target variable, discarding all others.

D.Use Principal Component Analysis (PCA) to reduce the feature space to the top 50 principal components that explain 95% of the variance.

AnswerD

PCA efficiently reduces dimensionality while retaining most variance, and the components can be used in downstream models.

Why this answer

Principal Component Analysis (PCA) is a linear dimensionality reduction technique that projects data onto a lower-dimensional subspace while maximizing variance. It is well-suited for reducing a large number of correlated features. t-SNE is primarily for visualization and does not produce a transformation that can be applied to new data easily. Feature selection based on correlation may discard useful interactions.

Keeping all features and using regularization would still require full feature set during training and may not reduce dimensionality in the pipeline.
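The idea of keeping just enough principal components to reach a variance target can be sketched with NumPy's SVD. Everything below is illustrative: `pca_fit` and `pca_transform` are hypothetical helpers, and the synthetic data (20 features driven by 3 latent factors) stands in for the 500-dimensional retail feature set in the question.

```python
import numpy as np

def pca_fit(X, variance_to_keep=0.95):
    """Return (mean, components, explained_ratio) for the smallest number
    of components whose cumulative explained variance reaches the target."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA requires centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    k = min(k, len(S))
    return mean, Vt[:k], explained[:k]

def pca_transform(X, mean, components):
    """Project data (including new, unseen rows) onto the fitted components."""
    return (X - mean) @ components.T

# Synthetic data: 20 correlated features generated from 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

mean, components, explained = pca_fit(X, variance_to_keep=0.95)
Z = pca_transform(X, mean, components)
print(components.shape[0], Z.shape)
```

Because the fitted mean and components are stored, the same projection can be applied to new data at inference time, which is the property the explanation contrasts with t-SNE.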

457

MCQmedium

A company uses an AI system to recommend products. The recommendation accuracy is high, but users complain about lack of diversity. Which strategy should the team adopt to improve diversity without significantly sacrificing accuracy?

A.Randomly replace some recommendations with popular items.

B.Use only popularity-based recommendations.

C.Increase the number of recommendations and use collaborative filtering.

D.Modify the loss function to include a term that penalizes overly similar recommendations.

AnswerD

This explicitly encourages diversity while retaining accuracy.

Why this answer

Option D is correct because modifying the loss function to include a diversity penalty directly addresses the lack of recommendation diversity at the algorithmic level. By adding a regularization term that penalizes overly similar recommendations, the model learns to balance accuracy with variety, ensuring that the output set remains diverse without a significant drop in relevance. This approach is a standard technique in recommendation systems, often implemented via Determinantal Point Processes (DPPs) or diversity-aware loss functions.

Exam trap

CompTIA often tests the misconception that simply adding more recommendations or using popularity will solve diversity issues, when in reality, algorithmic constraints like loss function modification are required to maintain accuracy while improving diversity.

How to eliminate wrong answers

Option A is wrong because randomly replacing some recommendations with popular items introduces noise and can significantly degrade accuracy, as popular items may not be relevant to the user's specific preferences. Option B is wrong because using only popularity-based recommendations completely ignores personalization, leading to a severe loss of accuracy and user-specific relevance. Option C is wrong because simply increasing the number of recommendations and using collaborative filtering does not inherently enforce diversity; it may still produce a homogeneous set of similar items, and the increased list size can dilute relevance without a diversity constraint.

458

MCQhard

A real-time recommendation system uses a model retrained daily. The operations team notices that click-through rate drops sharply at 8 AM each day and recovers by noon. The retraining job runs at midnight. What is the most likely cause?

A.The model overfits to late-night user behavior

B.The model suffers from catastrophic forgetting due to daily retraining

C.There is data drift due to morning user patterns not seen in training

D.The retraining pipeline has a bug that only affects morning predictions

AnswerC

Morning patterns differ from the training data, causing a temporary performance drop that eases as traffic returns to patterns the model has seen.

Why this answer

The sharp drop in click-through rate at 8 AM, followed by recovery by noon, strongly indicates data drift caused by a shift in user behavior patterns during morning hours. Since the model is retrained at midnight using data that predominantly captures late-night user behavior, it fails to generalize to the distinct morning user patterns (e.g., different browsing habits, content preferences). This is a classic example of temporal data drift where the training distribution does not match the inference distribution at specific times of day.

Exam trap

CompTIA often tests the distinction between data drift and model degradation issues; the trap here is that candidates might confuse a temporary performance dip due to distribution shift (data drift) with a permanent model flaw like overfitting or catastrophic forgetting, which would not self-correct within the same day.

How to eliminate wrong answers

Option A is wrong because overfitting to late-night user behavior would cause poor performance during morning hours, but the recovery by noon suggests the model adapts as more morning data becomes available, not that it is permanently overfit. Option B is wrong because catastrophic forgetting refers to a model losing previously learned knowledge when trained on new data, which would cause a persistent performance drop, not a temporary one that recovers within hours. Option D is wrong because a pipeline bug that only affects morning predictions would likely cause consistent errors or failures at 8 AM every day, not a gradual recovery by noon, and there is no evidence of a bug in the retraining process itself.

459

Multi-Selectmedium

Which TWO of the following are best practices for monitoring AI models in production?

Select 2 answers

A.Set up alerts for prediction latency and error rates.

B.Monitor model accuracy only at deployment time.

C.Regularly retrain without checking performance.

D.Freeze the model version once deployed to avoid changes.

E.Track input data distribution and compare with training data.

AnswersA, E

Operational metrics like latency and errors are critical for production monitoring.

Why this answer

Tracking input data distribution helps detect drift, and alerts on latency/error rates ensure operational health. Other options are incorrect or incomplete.
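One way to compare a production feature distribution with its training counterpart is the Population Stability Index (PSI). The sketch below is a simplified version under stated assumptions: the function name is hypothetical, bins are taken from training-data quantiles, and the alert thresholds mentioned afterwards (0.1 and 0.25) are common rules of thumb rather than anything the exam or a standard prescribes.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a reference sample (training) and a new sample (production)."""
    # Interior bin edges come from quantiles of the reference sample.
    inner_edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    exp_counts = np.bincount(np.searchsorted(inner_edges, expected), minlength=bins)
    act_counts = np.bincount(np.searchsorted(inner_edges, actual), minlength=bins)
    eps = 1e-6  # guard against log(0) for empty bins
    exp_frac = np.clip(exp_counts / len(expected), eps, None)
    act_frac = np.clip(act_counts / len(actual), eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, size=5000)
prod_same = rng.normal(0.0, 1.0, size=5000)     # no drift
prod_shifted = rng.normal(1.0, 1.0, size=5000)  # mean has shifted

psi_same = population_stability_index(train_feature, prod_same)
psi_shifted = population_stability_index(train_feature, prod_shifted)
print(psi_same, psi_shifted)
```

A monitoring job could compute this per feature on a schedule and raise an alert when the value crosses a chosen threshold, alongside the latency and error-rate alerts from option A.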

460

Multi-Selectmedium

A deep learning engineer is training a convolutional neural network for image classification. The model is overfitting the training data. Which three techniques can help reduce overfitting? (Choose three.)

Select 3 answers

A.Add dropout layers

B.Apply L2 regularization

C.Use data augmentation

D.Use a smaller learning rate

E.Increase the number of convolutional layers

AnswersA, B, C

Dropout randomly drops units during training, reducing co-adaptation.

Why this answer

Dropout, data augmentation, and L2 regularization are standard techniques to reduce overfitting by adding regularization or increasing data diversity.

461

MCQhard

An organization implements AI governance following the NIST AI Risk Management Framework. They need to ensure that all model decisions are logged with sufficient detail for later audit. Which logging requirement is most critical for traceability?

A.Input data and model name only

B.Source code and training dataset hash

C.Model outputs and confidence scores only

D.Timestamp, input data, output, and model version

AnswerD

These four elements enable full reconstruction and audit of each decision.

Why this answer

Option D is correct because timestamp, input data, output, and model version together provide full traceability for audit. Option A is wrong because logging only inputs and model name misses the version and the outputs. Option B is wrong because source code and training dataset hashes are not typically part of inference audit trails.

Option C is wrong because logging only outputs and confidence scores is insufficient without the inputs.
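A per-decision audit record with these four elements might look like the sketch below. The field names, the `make_audit_record` helper, and the example model version string are all assumptions made for illustration; the NIST AI RMF does not prescribe a specific log schema.

```python
import json
from datetime import datetime, timezone

def make_audit_record(model_version, input_data, output):
    """Build one audit entry containing the four traceability elements."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "input": input_data,
        "output": output,
    }

# Hypothetical decision from a hypothetical model version.
record = make_audit_record(
    model_version="fraud-model-1.4.2",
    input_data={"amount": 120.5, "country": "DE"},
    output={"label": "legit", "confidence": 0.97},
)

# One JSON object per line is a common, easily searchable log layout.
line = json.dumps(record, sort_keys=True)
print(line)
```

With the model version recorded next to the input and output, an auditor can later replay the same input against the same model build and confirm the logged decision.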

462

MCQmedium

Refer to the exhibit. The training log shows loss and accuracy for a binary classification model. What is the most likely issue with this model?

A.Overfitting

B.Insufficient epochs

C.Underfitting

D.Data leakage

AnswerA

The divergence between decreasing training loss and increasing validation loss indicates overfitting.

Why this answer

The training loss decreases and training accuracy increases, but validation loss increases and validation accuracy decreases. This is a classic sign of overfitting, where the model learns training data noise but fails to generalize. Underfitting would show both training and validation loss high.

Data leakage would show unusually high accuracy early. Insufficient epochs would show both losses still decreasing.

463

MCQmedium

A team is training a neural network for image classification. They observe that training loss decreases steadily but validation loss starts increasing after 20 epochs. What is the most likely issue?

A.Underfitting

B.Vanishing gradients

C.Data leakage

D.Overfitting

AnswerD

Correct; the model is fitting noise in training data.

Why this answer

Option D is correct because increasing validation loss while training loss continues to decrease is a classic sign of overfitting. Option A (underfitting) would show poor training loss. Option B (vanishing gradients) would cause slow convergence.

Option C (data leakage) would tend to make validation results look better, not worse; the pattern here is overfitting.

464

Multi-Selecteasy

Which THREE are common machine learning algorithms used for regression?

Select 3 answers

A.Logistic regression

B.K-means

C.Linear regression

D.Decision tree

E.K-nearest neighbors

AnswersC, D, E

Correct; linear regression predicts a continuous target.

Why this answer

Linear regression is a fundamental supervised learning algorithm used for regression tasks, where the goal is to predict a continuous numeric output based on one or more input features. It models the relationship between the dependent and independent variables by fitting a linear equation to the observed data, making it a core algorithm for regression problems.

Exam trap

CompTIA often tests the distinction between regression and classification algorithms, and the trap here is that candidates mistakenly associate 'logistic regression' with regression tasks due to its name, when it is actually a classification algorithm.

465

MCQeasy

Refer to the exhibit. A data engineer is training a binary classification neural network. The loss fluctuates and does not converge. Which hyperparameter adjustment is most likely to stabilize training?

A.Change the activation to tanh

B.Add dropout after each layer

C.Decrease the learning rate

D.Increase the number of units in the first dense layer

AnswerC

Lower learning rate makes updates smaller, reducing oscillations and promoting convergence.

Why this answer

A high learning rate can cause the loss to oscillate and prevent convergence. Decreasing the learning rate typically stabilizes training. Increasing units, changing activation, or adding dropout may not address the root cause of fluctuation.
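The oscillation effect is easy to reproduce on a toy problem. The sketch below runs plain gradient descent on f(w) = w², chosen only because its gradient (2w) is simple; the learning rates 1.1 and 0.1 are arbitrary illustrative values, not recommendations.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 and return the sequence of weights visited."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w       # derivative of w**2
        w = w - lr * grad    # standard gradient descent update
        history.append(w)
    return history

# Each step multiplies w by (1 - 2 * lr).
high = gradient_descent(lr=1.1)  # factor -1.2: sign flips and magnitude grows
low = gradient_descent(lr=0.1)   # factor 0.8: steady shrink toward 0

print(high[:4])
print(low[-1])
```

With the high rate the weight overshoots the minimum on every step and moves further away each time, while the lower rate converges smoothly. This is the same mechanism behind a fluctuating loss curve in a real network.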

466

Multi-Selecteasy

A company is preparing a dataset for training a supervised machine learning model. The dataset contains missing values, outliers, and categorical features. Which two preprocessing steps are typically performed to prepare the data? (Choose two.)

Select 2 answers

A.Normalize numerical features to a standard range

B.Impute missing values with the mean

C.Encode categorical variables using one-hot encoding

D.Remove all features with low variance

E.Increase the number of features using PCA

AnswersB, C

Imputation handles missing data and is commonly done.

Why this answer

Imputing missing values and encoding categorical features are standard preprocessing steps for most machine learning pipelines.

467

MCQmedium

A company implements an AI-based chatbot for customer service. After deployment, customers report that the chatbot sometimes uses offensive language. The development team reviews the training data and finds no explicit offensive content. What is the most likely explanation?

A.There is a bug in the deployment pipeline

B.The model is overfitting to rare examples

C.The model learned biased language patterns from the training corpus

D.The training data was poisoned by an attacker

AnswerC

The model may have learned offensive language from context, e.g., associating certain demographics with negative terms.

Why this answer

Large language models can learn unintended associations from training data, including biased or offensive language embedded in context. Even without explicit offensive content, the model may generate such language due to learned patterns.

468

MCQhard

A team is building a model to predict stock prices based on time series data. They need to capture long-term dependencies and avoid vanishing gradients. Which architecture is best suited?

A.Standard RNN

B.LSTM

C.Autoencoder

D.CNN

AnswerB

LSTM excels at learning long-term dependencies.

Why this answer

LSTM networks are designed with gating mechanisms to capture long-range dependencies and mitigate vanishing gradient problems in standard RNNs.

469

MCQeasy

A team wants to predict monthly sales using historical data. Which algorithm is most appropriate?

A.Linear regression

B.K-means

C.Decision tree

D.Logistic regression

AnswerA

Correct: Linear regression models the relationship between dependent and independent variables for continuous output.

Why this answer

Option A is correct because linear regression predicts continuous values. Options B, C, and D are incorrect: K-means is for clustering, a decision tree can be used for regression but linear regression is simpler for trend prediction, and logistic regression is for binary classification.
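A minimal sketch of fitting a sales trend with ordinary least squares is shown below, using `numpy.polyfit` with degree 1. The twelve monthly figures are invented and deliberately lie on an exact line so the fitted values are easy to verify; real sales data would be noisy and often seasonal.

```python
import numpy as np

# Hypothetical history: month index 1..12 and sales that grow by 10 per month.
months = np.arange(1, 13)
sales = 100.0 + 10.0 * months

# Degree-1 polyfit returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(months, sales, deg=1)

# Predict a continuous value for the next month.
forecast_month_13 = slope * 13 + intercept
print(slope, intercept, forecast_month_13)
```

The output is a continuous number, which is what distinguishes this regression task from the classification and clustering options.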

470

MCQhard

A team is implementing an ML pipeline using a feature store. Which benefit does a feature store primarily provide in an AI operations context?

A.Automated scaling of inference endpoints

B.Real-time monitoring of model performance

C.Consistency of feature computation between training and inference

D.Automatic model versioning and rollback

AnswerC

Feature store provides a centralized, consistent feature computation pipeline.

Why this answer

A feature store ensures that feature engineering logic is stored, versioned, and reused consistently across both training and inference pipelines. This eliminates training-serving skew, a common cause of model degradation in production, by guaranteeing that the same transformations are applied to data regardless of when or where it is computed.
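The consistency guarantee can be pictured as both pipelines calling one shared set of feature definitions. The toy registry below is only an assumption-laden sketch: `FEATURE_REGISTRY`, `compute_features`, and the feature names are hypothetical, and a production feature store adds storage, versioning, and low-latency lookup that this example leaves out.

```python
import math

# Single source of truth for feature logic (hypothetical feature names).
FEATURE_REGISTRY = {
    "log_amount": lambda row: math.log1p(row["amount"]),
    "is_weekend": lambda row: 1.0 if row["day_of_week"] in (5, 6) else 0.0,
}

def compute_features(row, names=("log_amount", "is_weekend")):
    """Compute features by name; used by BOTH training and serving code."""
    return [FEATURE_REGISTRY[name](row) for name in names]

# Offline: build the training matrix.
training_rows = [
    {"amount": 20.0, "day_of_week": 1},
    {"amount": 250.0, "day_of_week": 6},
]
X_train = [compute_features(row) for row in training_rows]

# Online: the same function handles a live request.
online_request = {"amount": 250.0, "day_of_week": 6}
x_serve = compute_features(online_request)

print(x_serve == X_train[1])
```

Because the serving path cannot drift away from the training path without changing the shared definition, identical raw inputs always yield identical feature vectors, which is the training-serving skew the explanation describes.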

Exam trap

CompTIA often tests the distinction between infrastructure-level benefits (scaling, monitoring, versioning) and the core data-consistency problem that a feature store solves, leading candidates to confuse feature stores with model registries or serving platforms.

How to eliminate wrong answers

Option A is wrong because automated scaling of inference endpoints is a function of model serving infrastructure (e.g., Kubernetes Horizontal Pod Autoscaler or serverless inference platforms), not a primary benefit of a feature store. Option B is wrong because real-time monitoring of model performance is handled by observability tools (e.g., MLflow, Prometheus, or custom drift detection systems), not by the feature store itself. Option D is wrong because automatic model versioning and rollback is a capability of model registries and CI/CD pipelines (e.g., MLflow Model Registry or DVC), whereas a feature store focuses on feature definitions and values, not model artifacts.

471

Multi-Selecthard

A deployed NLP sentiment analysis model experiences a sharp decline in accuracy on customer reviews. The team has verified the input data format and pipeline are correct. Which THREE actions should be taken to diagnose and remediate? (Choose 3.)

Select 3 answers

A.Analyze recent user input for distribution shifts compared to training data.

B.Immediately retrain the model with all available data.

C.Increase the size of the training dataset by adding synthetic data.

D.Revert to a previous model version that performed well.

E.Conduct a root cause analysis focusing on concept drift.

AnswersA, D, E

Identifies data drift which is a common cause of degradation.

Why this answer

Options A, D, and E are correct. Analyzing recent user input detects distribution shift, reverting to a previous version provides quick recovery, and root cause analysis prevents recurrence. Option B is wrong because immediate retraining without analysis could embed the underlying issue.

Option C is wrong because synthetic data may introduce noise.

472

MCQmedium

A company deployed a chatbot using a pre-trained language model. Users report that the chatbot provides incorrect answers to domain-specific questions. Which approach should the AI team prioritize to improve accuracy without retraining the entire model?

A.Fine-tune the model on a curated dataset of domain-specific conversations.

B.Increase the temperature parameter to reduce randomness.

C.Collect more general training data and retrain the model from scratch.

D.Roll back to a previous version of the model that was more accurate.

AnswerA

Fine-tuning adapts the model to the domain with less data and compute.

Why this answer

Fine-tuning on a curated domain-specific dataset is the most efficient way to improve accuracy for specialized queries without retraining the entire model. It adjusts the model's weights using a smaller, targeted dataset, preserving general language understanding while adapting to domain terminology and context.

Exam trap

CompTIA often tests the misconception that increasing temperature reduces randomness (when it actually increases it) or that rolling back to an older version is a valid fix for new domain-specific issues, leading candidates to choose B or D instead of recognizing fine-tuning as the targeted, efficient solution.

How to eliminate wrong answers

Option B is wrong because increasing the temperature parameter increases randomness in token selection, which would make answers less deterministic and more likely to be incorrect, not more accurate. Option C is wrong because collecting more general training data and retraining from scratch is resource-intensive, time-consuming, and contradicts the requirement to avoid retraining the entire model. Option D is wrong because rolling back to a previous version does not address the domain-specific inaccuracies; the older model likely lacks the specialized knowledge needed and may have its own deficiencies.
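The effect of temperature on token selection can be seen by applying a temperature-scaled softmax to a small set of logits. The function and the three logit values below are illustrative assumptions, not taken from any particular model or API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)

print(cold)  # probability mass concentrates on the top token
print(hot)   # probabilities move closer together
```

At the higher temperature the top token's probability drops and the alternatives gain mass, so sampling becomes more random. That is the opposite of what option B claims.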

473

Multi-Selecteasy

Which TWO of the following are essential components of a responsible AI governance framework?

Select 2 answers

A.Assignment of a responsible owner for each AI system's outcomes

B.Using ensemble methods to reduce overfitting

C.Clear documentation of model development and decision-making processes

D.Automated hyperparameter tuning to improve accuracy

E.Deploying models on dedicated hardware to reduce latency

AnswersA, C

Accountability is a fundamental governance requirement.

Why this answer

Options A and C are correct because accountability for AI outcomes (a responsible owner for each system) and transparency (clear documentation of model development and decision-making) are foundational to responsible AI governance. Options B, D, and E are technical or operational practices rather than core governance components.

474

Multi-Selecthard

Which THREE of the following are best practices for preventing overfitting in deep learning models?

Select 3 answers

A.L2 regularization

B.Increasing the number of layers

C.Dropout

D.Using a larger batch size

E.Data augmentation

AnswersA, C, E

L2 adds penalty on weights, keeping them small and reducing overfitting.

Why this answer

Dropout and L2 regularization directly penalize complexity. Data augmentation increases effective training set size. Increasing layers adds capacity, worsening overfitting.

Large batch sizes often lead to sharp minima and overfitting, not a prevention technique.

475

MCQmedium

A healthcare organization uses an AI model to predict patient readmission risk. To comply with patient privacy regulations, they apply differential privacy during training. What is the primary trade-off of using differential privacy?

A.Increased training time for reduced bias

B.Lower interpretability for higher fairness

C.Faster inference for lower memory usage

D.Reduced model accuracy for increased privacy

AnswerD

Noise injection lowers accuracy but bounds privacy loss.

Why this answer

Option D is correct because differential privacy adds noise, which reduces model accuracy but protects privacy. Option A is wrong because training time may increase, but that is not the primary trade-off, and differential privacy does not target bias. Option B is wrong because interpretability is not directly affected.

Option C is wrong because inference speed and memory usage are not significantly impacted.
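The accuracy-for-privacy trade-off can be illustrated with the Laplace mechanism on a simple count query. Note the substitution: the question concerns differential privacy during model training (for example, noisy gradient updates), whereas this sketch uses the simpler query-level mechanism to show the same relationship. The function name, the count of 120, and the epsilon values are made up for the example.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng, size):
    """Add Laplace noise with scale = sensitivity / epsilon to a query answer."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale, size=size)

rng = np.random.default_rng(42)
true_count = 120.0  # hypothetical number of readmitted patients

# Smaller epsilon = stronger privacy guarantee = more noise.
strong_privacy = laplace_mechanism(true_count, 1.0, epsilon=0.1, rng=rng, size=2000)
weak_privacy = laplace_mechanism(true_count, 1.0, epsilon=5.0, rng=rng, size=2000)

err_strong = float(np.mean(np.abs(strong_privacy - true_count)))
err_weak = float(np.mean(np.abs(weak_privacy - true_count)))
print(err_strong, err_weak)
```

The average error is much larger under the stronger privacy setting, mirroring how a tighter privacy budget during training tends to cost model accuracy.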

476

MCQhard

A research team is training a deep neural network for image classification. The training loss decreases rapidly for the first few epochs but then plateaus, while validation loss starts to increase after epoch 10. Which action would best address this issue?

A.Reduce the batch size to introduce more noise during training.

B.Increase the learning rate to help the model escape the plateau.

C.Implement early stopping based on validation loss to prevent further overfitting.

D.Add more convolutional layers to increase model capacity.

AnswerC

Early stopping stops training before overfitting worsens.

Why this answer

The training loss decreasing rapidly then plateauing while validation loss increases after epoch 10 is a classic sign of overfitting. Early stopping monitors validation loss and halts training when it begins to rise, preventing the model from memorizing noise in the training data. This directly addresses the overfitting issue without requiring architectural or hyperparameter changes that could destabilize training.
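Early stopping with a patience window can be sketched as follows. The function name, the patience of 3, and the list of validation losses are hypothetical; in a real training loop the losses would arrive one epoch at a time and the weights from the best epoch would be restored.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stopped_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs. Epochs are zero-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    stopped_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch, stopped_epoch

# Hypothetical curve: improves until epoch 3, then rises.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stopped_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stopped_epoch)
```

Here the best validation loss occurs at epoch 3 and training halts at epoch 6, so the final epoch in the list is never run.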

Exam trap

CompTIA often tests the misconception that plateauing training loss always requires adjusting learning rate or batch size, when in fact the simultaneous rise in validation loss is the definitive indicator of overfitting that early stopping is designed to solve.

How to eliminate wrong answers

Option A is wrong because reducing batch size only adds gradient noise; any regularizing effect is indirect and does not stop training from continuing past the point where validation loss begins to rise. Option B is wrong because increasing the learning rate when validation loss is already rising risks overshooting the optimal weights, causing divergence or even higher validation loss. Option D is wrong because adding more convolutional layers increases model capacity, which exacerbates overfitting by giving the model more parameters to memorize training data rather than generalizing.

477

MCQhard

A financial institution uses a machine learning model to approve loan applications. The model was trained on historical data that inadvertently encoded a bias against applicants from certain zip codes, leading to discriminatory lending practices. A recent audit reveals that the model's decisions are unfair, and regulators require the bank to remediate the bias without significantly reducing overall approval accuracy. The data science team has access to the training data, the model, and a set of fairness metrics. They also have a small, unbiased validation set. Which course of action should the team take to satisfy regulatory requirements?

A.Remove the zip code feature from the model and retrain

B.Implement adversarial debiasing using the unbiased validation set to enforce fairness constraints

C.Increase the weight of samples from disadvantaged zip codes in the training data

D.Retrain the model using only the unbiased validation set

AnswerB

Adversarial debiasing directly optimizes for fairness and accuracy.

Why this answer

Adversarial debiasing directly addresses the bias encoded in the model by training a predictor and an adversary simultaneously. The adversary tries to predict the protected attribute (e.g., zip code) from the model's predictions, while the predictor is penalized for allowing such inference, enforcing fairness constraints. Using the unbiased validation set ensures the debiasing process is guided by ground truth labels that are free from historical bias, allowing the model to retain high accuracy while reducing discrimination.

Exam trap

CompTIA often tests the misconception that removing a sensitive feature (like zip code) is sufficient to eliminate bias, but the trap is that models can learn proxy features, so a more sophisticated debiasing technique like adversarial debiasing is required.

How to eliminate wrong answers

Option A is wrong because simply removing the zip code feature does not eliminate bias; the model can still learn proxy features (e.g., income, loan amount) that correlate with zip code, leading to continued discriminatory outcomes. Option C is wrong because increasing sample weights for disadvantaged zip codes may overcorrect and reduce overall accuracy, and it does not directly enforce a fairness constraint; it can also introduce new biases if the weighting is not carefully tuned. Option D is wrong because retraining on only the small unbiased validation set would likely lead to severe overfitting and poor generalization, as the dataset is too small to capture the full distribution of loan applications, significantly reducing approval accuracy.

Full explanation →

478

MCQhard

An AI system used for resume screening is found to consistently reject female candidates for technical roles. The data science team retrains the model after removing the 'gender' feature, but the bias persists. What is the most likely cause?

A.The model architecture is too complex

B.The model uses proxy variables that correlate with gender

C.The training data still contains historical hiring bias

D.The evaluation metric does not measure fairness

AnswerB

Features like 'years of experience gaps' or 'extracurricular activities' may correlate with gender and perpetuate bias.

Why this answer

Removing the gender feature alone is insufficient because other features (e.g., years of experience, education, hobbies) can act as proxies for gender. This is a common pitfall in fairness interventions.

Full explanation →

479

MCQmedium

A retail company wants to implement a recommendation system using collaborative filtering. The dataset contains user-item interactions (ratings) for 10,000 users and 5,000 products. The matrix is very sparse (99% missing values). The team plans to use matrix factorization to predict missing ratings. However, the training time is excessively long, and the model is not converging. The data engineer suggests using a smaller learning rate and more iterations. Which additional technique should the team apply to speed up training and improve convergence?

A.Add L2 regularization to the loss function

B.Increase the minibatch size

C.Reduce the number of latent factors

D.Switch to the Adam optimizer

AnswerA

Regularization prevents overfitting and improves convergence by penalizing large weights.

Why this answer

The correct answer is A because adding L2 regularization to the loss function helps prevent overfitting and improves convergence in matrix factorization, especially with extremely sparse data (99% missing). Regularization penalizes large latent factor weights, which stabilizes the optimization process and allows the model to generalize better, reducing the risk of divergence during training.

Exam trap

CompTIA often tests the misconception that adaptive optimizers like Adam are a universal fix for convergence issues, but in sparse matrix factorization, L2 regularization is a more direct solution to the overfitting and instability that cause non-convergence.

How to eliminate wrong answers

Option B is wrong because increasing minibatch size typically speeds up training per iteration but can lead to slower convergence and may not address the core issue of non-convergence due to overfitting or ill-conditioned gradients. Option C is wrong because reducing the number of latent factors reduces model capacity and can cause underfitting, but it does not directly fix convergence problems; in fact, it may worsen the model's ability to capture patterns in sparse data. Option D is wrong because switching to the Adam optimizer can help with convergence in many cases, but the question asks for an additional technique beyond the suggested smaller learning rate and more iterations; Adam adapts learning rates per parameter but does not inherently address the overfitting and stability issues caused by extreme sparsity, whereas L2 regularization directly mitigates those.
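To make the role of the L2 term concrete, here is a minimal sketch of matrix factorization trained by stochastic gradient descent over the observed ratings only. The toy ratings, the function names (train_mf, predict, squared_error) and the values chosen for k, lr and reg are all assumptions made for illustration; they do not come from the question.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=300, seed=0):
    """SGD matrix factorization over observed (user, item, rating) triples only."""
    rng = random.Random(seed)
    # Small random initial latent factors for users (P) and items (Q)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # The `reg * ...` terms are the L2 penalty: they shrink the
                # factors toward zero on every update.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))

def squared_error(P, Q, ratings):
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)

# Hypothetical sparse ratings: most (user, item) cells are simply absent.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0), (2, 2, 5.0)]
P, Q = train_mf(ratings, n_users=3, n_items=3)
print(squared_error(P, Q, ratings))
```

Setting reg to 0 removes the penalty; with very few observed ratings per user, the factors are then free to grow as large as needed to fit those few values.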

Full explanation →

480

MCQhard

A national security agency uses AI to analyze surveillance data for threat detection. The system is deployed in a high-stakes environment where false negatives could lead to missed threats, and false positives waste analyst time. Recently, a known hacker group attempted to evade detection by subtly modifying their communication patterns over time, a form of adversarial evasion. The agency wants to harden the system while maintaining performance. The system uses a deep neural network. Which mitigation strategy is most appropriate?

A.Switch to an unsupervised learning approach to detect anomalies

B.Simplify the model to a logistic regression to reduce the attack surface

C.Perform adversarial training using the hacker group's known evasion patterns

D.Add random noise to all input data to confuse evasion attempts

AnswerC

Adversarial training directly hardens the model against those patterns.

Why this answer

Option C is correct because adversarial training exposes the model to known evasion patterns during training, improving robustness without changing the model type. Option A is wrong because unsupervised learning may not capture the specific adversarial patterns. Option B is wrong because reducing model complexity may decrease accuracy. Option D is wrong because random input perturbations do not represent realistic evasion.

Full explanation →

481

MCQhard

During a red-team exercise on an AI model, testers successfully extracted training data. Which vulnerability is this?

A.Membership inference

B.Model inversion

C.Adversarial example

D.Data poisoning

AnswerB

Model inversion reconstructs training data.

Why this answer

Option B (Model inversion) is correct because model inversion attacks reconstruct training data. Option A (Membership inference) only determines whether a given record was used in training; it does not extract data. Option C (Adversarial example) causes misclassification. Option D (Data poisoning) corrupts training data.

Full explanation →

482

MCQhard

An AI system misclassifies rare but critical events. The team considers using synthetic data. Which consideration is MOST important for ensuring the synthetic data improves performance on real rare events?

A.The synthetic data should include a wide variety of events, even if not realistic.

B.The synthetic data should be generated using an unsupervised generative model.

C.The synthetic data should accurately represent the distribution and features of real rare events.

D.The synthetic data should be as large as possible to cover all possibilities.

AnswerC

Fidelity to real event characteristics is crucial for generalization.

Why this answer

Option C is correct because synthetic data must faithfully replicate the distribution and feature space of real rare events to enable the model to learn meaningful decision boundaries. If the synthetic data does not capture the true underlying patterns—such as specific sensor readings or transaction anomalies—the model will fail to generalize to actual rare events, defeating the purpose of augmentation.

Exam trap

CompTIA often tests the misconception that 'more data is always better' or that 'any synthetic data helps,' when in reality the fidelity of the synthetic data to the real rare event distribution is the paramount factor for improving model performance on those events.

How to eliminate wrong answers

Option A is wrong because including a wide variety of unrealistic events introduces noise and spurious correlations, which can degrade the model's precision and recall on real rare events. Option B is wrong because the choice of generative model (unsupervised vs. supervised) is secondary; the critical factor is that the synthetic data accurately reflects the real rare event distribution, not the training paradigm. Option D is wrong because simply maximizing dataset size without ensuring fidelity to real rare events can lead to overfitting on synthetic artifacts and poor generalization to authentic edge cases.

Full explanation →

483

Multi-Selecteasy

A data scientist is tuning hyperparameters for a support vector machine (SVM) with an RBF kernel. Which two hyperparameters most significantly affect model performance? (Select TWO.)

Select 2 answers

A.gamma (kernel coefficient)

B.learning rate

C.epsilon (for epsilon-SVR)

D.degree (for polynomial kernel)

E.C (regularization parameter)

AnswersA, E

gamma determines the radius of influence of support vectors.

Why this answer

Options A and E are correct. The C parameter (Option E) controls the trade-off between margin width and misclassification, and gamma (Option A) defines how far the influence of a single training example reaches. Option B (learning rate) is not an SVM hyperparameter. Option C (epsilon) applies to regression SVMs (epsilon-SVR). Option D (degree) applies to the polynomial kernel, not RBF.
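The effect of gamma can be seen directly in the RBF kernel formula, K(x, y) = exp(-gamma * ||x - y||^2). The following is a small sketch with two made-up points; the function name rbf_kernel and the gamma values are illustrative assumptions. The C parameter acts in the SVM training objective rather than in the kernel, so it is not shown here.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (0.0, 0.0), (1.0, 1.0)  # squared distance between x and y is 2

# Small gamma: distant points still look similar (wide influence).
# Large gamma: similarity collapses quickly with distance (narrow influence).
for gamma in (0.01, 1.0, 100.0):
    print(gamma, rbf_kernel(x, y, gamma))
```

A very large gamma makes each training example influence only its immediate neighborhood, which tends toward overfitting; a very small gamma makes the decision boundary smoother.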

Full explanation →

484

MCQeasy

Refer to the exhibit. A system administrator sees these logs from an AI inference pipeline. What is the most likely sequence of events?

A.Data poisoning corrupted the model, causing NaN outputs

B.The security filter failed to detect an attack and the model returned an error

C.A non-adversarial input caused a NaN error due to missing data

D.An adversarial input was blocked by the security filter

AnswerD

The security filter flagged the input as adversarial and blocked it.

Why this answer

Option D is correct because the adversarial input triggered the security filter, which then blocked the request. Option A is wrong because there is no evidence of data poisoning. Option B is wrong because the filter blocked the input; it did not fail. Option C is wrong because the filter flagged the input as adversarial.

Full explanation →

485

MCQmedium

Refer to the exhibit. A data engineer runs a validation report on the customers table. The "income" column has 12 null values. Which imputation strategy is most appropriate for this column?

A.Remove rows with null income

B.Replace nulls with the median income per region

C.Replace nulls with 0

D.Replace nulls with the mean income of the entire dataset

AnswerB

Median per region respects regional variation and is robust to outliers.

Why this answer

Income varies by region, so imputing with the median per region accounts for regional differences. The mean of the entire dataset may be skewed by outliers, replacing nulls with 0 treats missing income as zero income, and removing rows reduces the sample size.
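As a rough illustration of group-wise median imputation, the sketch below fills missing incomes with the median of the same region. The exhibit is not reproduced on this page, so the rows, the "region" and "income" keys, and the function name are hypothetical stand-ins.

```python
from collections import defaultdict
from statistics import median

def impute_income_by_region(rows):
    """Fill missing incomes with the median income of the row's region."""
    by_region = defaultdict(list)
    for row in rows:
        if row["income"] is not None:
            by_region[row["region"]].append(row["income"])
    medians = {region: median(vals) for region, vals in by_region.items()}
    # Fallback for a region with no known incomes (assumes at least one
    # income is known somewhere in the data).
    overall = median(v for vals in by_region.values() for v in vals)
    return [
        {**row, "income": medians.get(row["region"], overall)}
        if row["income"] is None else row
        for row in rows
    ]

# Hypothetical data: the 1,000,000 outlier would distort a mean, not a median.
rows = [
    {"region": "north", "income": 40000},
    {"region": "north", "income": 42000},
    {"region": "north", "income": 1000000},
    {"region": "north", "income": None},
    {"region": "south", "income": 30000},
    {"region": "south", "income": 34000},
    {"region": "south", "income": None},
]
filled = impute_income_by_region(rows)
print(filled[3]["income"], filled[6]["income"])
```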

Full explanation →

486

MCQhard

A team is building a natural language processing (NLP) model to analyze customer feedback. They have a large corpus of unlabeled text data and want to generate word embeddings that capture semantic meaning. Which approach should they use?

A.One-hot encoding

B.TF-IDF vectorization

C.Word2Vec

D.Bag-of-words model

AnswerC

Word2Vec learns dense embeddings from unlabeled text, capturing semantic relationships.

Why this answer

Word2Vec is the correct approach because it learns dense, distributed word embeddings from large unlabeled corpora by training a shallow neural network to predict words in context (CBOW) or context from words (Skip-gram). This captures semantic relationships such as analogy and similarity, which is essential for analyzing customer feedback without labeled data.

Exam trap

CompTIA often tests the distinction between frequency-based vectorization (TF-IDF, bag-of-words) and prediction-based embedding methods (Word2Vec, GloVe), trapping candidates who think TF-IDF captures semantic meaning when it only captures term importance in a document.

How to eliminate wrong answers

Option A is wrong because one-hot encoding produces sparse, high-dimensional vectors with no semantic meaning—each word is represented as a binary vector with a single 1, and all vectors are orthogonal, so no similarity or relationship between words is captured. Option B is wrong because TF-IDF vectorization relies on term frequency and inverse document frequency to produce weighted sparse vectors, which reflect word importance in a document but do not capture semantic meaning or word relationships; it is a bag-of-words variant that ignores word order and context. Option D is wrong because the bag-of-words model creates sparse vectors based on word counts, losing all word order and context, and cannot generate embeddings that capture semantic similarity or analogy.
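The orthogonality point can be checked with cosine similarity. In the sketch below the dense vectors are hand-picked toy values standing in for learned embeddings; they are not real Word2Vec output, and the three-word vocabulary is an assumption for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vocab = ["good", "great", "refund"]

# One-hot: a single 1 per word, so every pair of distinct words is orthogonal.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, w in enumerate(vocab)
}

# Toy dense vectors (hypothetical): related words point in similar directions.
dense = {"good": [0.9, 0.1], "great": [0.8, 0.2], "refund": [-0.1, 0.9]}

print(cosine(one_hot["good"], one_hot["great"]))  # no similarity signal
print(cosine(dense["good"], dense["great"]))      # close to 1
print(cosine(dense["good"], dense["refund"]))     # much lower
```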

Full explanation →

487

MCQmedium

A data engineer is building a pipeline to ingest streaming data from IoT sensors. Which data storage solution is best suited for real-time analytics on timestamped sensor readings?

A.Data warehouse

B.Relational database

C.Data lake

D.Time-series database

AnswerD

Time-series databases provide specialized indexing, compression, and query capabilities for timestamped data.

Why this answer

Time-series databases (TSDBs) are optimized for high-ingest rates of timestamped data and provide efficient downsampling, retention policies, and time-based aggregation functions. For IoT sensor streaming, a TSDB like InfluxDB or TimescaleDB delivers sub-second query performance on time-range scans, which is essential for real-time analytics.

Exam trap

CompTIA often tests the misconception that 'any database can handle time-series data if you add a timestamp column,' ignoring the architectural differences in storage engines, indexing, and write optimization that make TSDBs the best fit for real-time streaming analytics.

How to eliminate wrong answers

Option A is wrong because data warehouses (e.g., Snowflake, Redshift) are designed for batch-oriented, structured querying of historical data and cannot sustain the high write throughput or low-latency time-range scans required for streaming sensor data. Option B is wrong because relational databases (e.g., PostgreSQL, MySQL) use row-based storage and B-tree indexes that degrade under continuous time-series inserts, leading to write contention and slow time-range queries. Option C is wrong because data lakes (e.g., S3, ADLS) store raw data in object storage with no indexing or time-ordering, making real-time analytics impossible due to high read latency and lack of native time-series functions.

Full explanation →

488

MCQhard

A deep learning model for autonomous vehicle perception uses a large convolutional neural network. During deployment, the model misclassifies a stop sign that has a small sticker on it. This is likely an example of what type of vulnerability, and which defense is most appropriate?

A.Adversarial attack; implement adversarial training

B.Model inversion; add differential privacy

C.Data poisoning; use robust aggregation

D.Transfer learning; use domain adaptation

AnswerA

Small perturbations like stickers can cause adversarial misclassification; adversarial training improves robustness.

Why this answer

Option A correctly identifies an adversarial attack (a small perturbation causing misclassification) and suggests adversarial training as a defense. Option B (Model inversion) extracts training data. Option C (Data poisoning) involves corrupting training data. Option D (Transfer learning) is a technique, not a vulnerability.

Full explanation →

489

Multi-Selectmedium

A team monitors a production model for bias. They measure the selection rate for two demographic groups and find a significant difference. Which TWO actions should the team take to mitigate bias? (Choose two.)

Select 2 answers

A.Increase the complexity of the model to capture more patterns

B.Add more training data from both groups

C.Retrain the model with a balanced training dataset

D.Remove the protected attribute from the model input

E.Implement a post-processing fairness adjustment

AnswersC, E

Balanced data reduces bias by ensuring the model learns from fair representations.

Why this answer

Retraining with a balanced training dataset (Option C) directly addresses the root cause of bias by ensuring the model learns from equal representation of both demographic groups, which reduces skewed selection rates. This is a standard data-level mitigation technique in AI fairness, as it prevents the model from overfitting to majority patterns.

Exam trap

CompTIA often tests the misconception that removing the protected attribute (Option D) is sufficient to eliminate bias, when in reality proxy features and correlated variables can perpetuate discrimination.
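The selection-rate comparison in this question can be sketched in a few lines. The decision lists below are invented for illustration (1 means selected, 0 means not selected), and the variable names are assumptions rather than any fairness library's API.

```python
def selection_rate(decisions):
    """Fraction of positive (selected) outcomes in a list of 0/1 decisions."""
    return sum(decisions) / len(decisions)

# Hypothetical model decisions for two demographic groups
group_a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]  # 7 of 10 selected
group_b = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]  # 3 of 10 selected

rate_a = selection_rate(group_a)
rate_b = selection_rate(group_b)

# Two common summaries of the gap between groups
parity_difference = rate_a - rate_b  # 0 would mean equal selection rates
impact_ratio = rate_b / rate_a       # 1 would mean equal selection rates

print(rate_a, rate_b, parity_difference, impact_ratio)
```

Recomputing the same numbers after retraining on balanced data or after a post-processing adjustment shows whether the mitigation actually narrowed the gap.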

Full explanation →

490

Multi-Selecteasy

Which TWO of the following are best practices for securing an AI model against adversarial attacks?

Select 2 answers

A.Model pruning to reduce the number of parameters.

B.Adversarial training with perturbed examples.

C.Input sanitization and validation.

D.Increasing model complexity to capture more patterns.

E.Hyperparameter optimization using grid search.

AnswersB, C

Adversarial training exposes the model to adversarial inputs, improving robustness.

Why this answer

Option B is correct because adversarial training explicitly augments the training dataset with perturbed examples (e.g., using FGSM or PGD attacks) to teach the model to recognize and resist malicious inputs. This method directly hardens the model against evasion attacks by improving its decision boundary robustness.

Exam trap

CompTIA often tests the misconception that increasing model complexity or pruning improves security, when in fact these techniques address performance or efficiency, not adversarial robustness.

Full explanation →

491

MCQmedium

An AI model is trained to predict loan default. The training data contains 95% non-default and 5% default. Which metric is most appropriate to evaluate model performance given the imbalanced dataset?

A.Mean squared error

B.F1-score

C.Accuracy

D.R-squared

AnswerB

F1-score considers both false positives and false negatives, providing a balanced measure for minority class performance.

Why this answer

The F1-score is the harmonic mean of precision and recall, making it robust to class imbalance. In this dataset with 95% non-default and 5% default, accuracy would be misleadingly high (95%) even if the model never predicts default, while F1-score penalizes poor recall of the minority class.

Exam trap

CompTIA often tests the misconception that accuracy is always the best metric, leading candidates to overlook its failure in imbalanced scenarios where a trivial classifier can achieve high accuracy.

How to eliminate wrong answers

Option A is wrong because Mean Squared Error (MSE) is a regression metric that measures average squared differences between predicted and actual values, not suitable for binary classification tasks like loan default prediction. Option C is wrong because accuracy is misleading on imbalanced datasets; a model predicting all non-default would achieve 95% accuracy but fail to identify any actual defaults. Option D is wrong because R-squared is a regression metric that indicates the proportion of variance explained by the model, inappropriate for evaluating classification performance on imbalanced data.
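The 95% figure is easy to verify by hand. The sketch below builds a 100-record label set with the same 95/5 split and scores two hypothetical classifiers: one that never predicts default, and one that catches four of the five defaults at the cost of six false alarms. The helper names are assumptions; no external library is used.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision/recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1] * 5 + [0] * 95  # 5% default, 95% non-default

never_default = [0] * 100                               # trivial classifier
useful = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89           # 4 TP, 1 FN, 6 FP

print(accuracy(y_true, never_default), f1_score(y_true, never_default))
print(accuracy(y_true, useful), f1_score(y_true, useful))
```

The trivial classifier scores higher on accuracy than the useful one, while F1 ranks them the other way around.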

Full explanation →

492

MCQhard

An organization wants to implement an AI ethics board. Which composition best ensures independence and expertise?

A.All members from the legal department

B.IT department head and data scientists

C.Mix of internal stakeholders and external ethicists

D.Only senior executives from the company

AnswerC

Ensures independence and diverse expertise.

Why this answer

Option C (Mix of internal stakeholders and external ethicists) is correct because external ethicists provide an unbiased perspective and expertise, while internal stakeholders understand operations. Option A (All members from the legal department) focuses only on compliance. Option B (IT department head and data scientists) lacks ethics expertise. Option D (Only senior executives from the company) may have conflicts of interest.

Full explanation →

493

MCQmedium

A retail company uses a gradient boosting model to predict customer lifetime value (CLV). The model currently uses 50 features including purchase history, demographics, and web behavior. The model's RMSE on the test set is 120. The data science team wants to improve the model's accuracy without increasing training time significantly. They have access to additional data: customer support interaction logs (text), social media sentiment (text), and third-party credit scores (numeric). They also have the ability to perform feature engineering, hyperparameter tuning, and ensemble methods. Which approach is most likely to yield the best improvement in predictive performance with minimal increase in training time?

A.Add the customer support text as a feature using TF-IDF vectors

B.Use an ensemble of gradient boosting and random forest models

C.Perform hyperparameter tuning using grid search

D.Engineer new features such as average purchase value and recency

AnswerD

Feature engineering can capture patterns without adding new data sources or significant time.

Why this answer

Option D is correct because engineering domain-relevant features like average purchase value and recency directly captures the underlying behavioral patterns that drive customer lifetime value, often providing a higher signal-to-noise ratio than adding raw text or third-party data. This approach leverages existing data without significantly increasing the feature dimensionality or training time, unlike adding TF-IDF vectors which would dramatically expand the feature space and slow training.

Exam trap

CompTIA often tests the misconception that adding more data (especially text) or complex ensemble methods always improves model accuracy, while the correct approach is to engineer features that capture domain-specific patterns with minimal computational overhead.

How to eliminate wrong answers

Option A is wrong because adding customer support text as TF-IDF vectors would introduce thousands of sparse features, significantly increasing training time and risking overfitting without guaranteed improvement in RMSE. Option B is wrong because ensembling gradient boosting with random forest typically increases training time substantially (both models must be trained) and may not outperform a well-tuned single gradient boosting model on structured data. Option C is wrong because hyperparameter tuning using grid search is computationally expensive, often requiring many model fits, and would increase training time more than feature engineering without leveraging the new data sources.

Full explanation →

494

MCQhard

A data scientist notices the model overfits. Which change to the exhibit's configuration would most likely reduce overfitting?

A.Remove dropout layers

B.Increase learning rate to 0.01

C.Add L2 regularization to dense layers

D.Increase units in the first dense layer to 512

AnswerC

L2 regularization adds a penalty on large weights, discouraging complex models and reducing overfitting.

Why this answer

Adding L2 regularization to dense layers penalizes large weights by adding a squared magnitude term to the loss function, which forces the model to learn simpler patterns and reduces overfitting. This directly addresses the core issue of the model memorizing noise in the training data.

Exam trap

CompTIA often tests the misconception that increasing model capacity (more units or layers) or removing regularization always improves performance, when in fact these changes exacerbate overfitting; candidates must recognize that regularization techniques like L2 are specifically designed to penalize complexity and reduce overfitting.

How to eliminate wrong answers

Option A is wrong because removing dropout layers would actually increase overfitting, as dropout is a regularization technique that randomly drops neurons during training to prevent co-adaptation. Option B is wrong because increasing the learning rate to 0.01 (a relatively high value) can cause the optimizer to overshoot minima and lead to unstable training, but it does not directly reduce overfitting; in fact, a too-high learning rate may prevent convergence altogether. Option D is wrong because increasing units in the first dense layer to 512 adds more parameters to the model, which increases capacity and typically worsens overfitting rather than reducing it.

Full explanation →

495

Multi-Selecthard

Which TWO deployment strategies allow for testing a new model version before fully rolling it out?

Select 2 answers

A.Shadow deployment

B.Canary deployment

C.Direct cutover

D.A/B testing with traffic splitting

E.Blue-Green deployment

AnswersB, D

Canary releases route a subset of users to the new version for validation.

Why this answer

Canary deployment is correct because it routes a small percentage of live traffic to the new model version while the majority continues using the stable version. This allows real-world validation of the new model's performance and error rates under production load before a full rollout, minimizing blast radius if issues arise.

Exam trap

The trap here is that candidates confuse shadow deployment with canary deployment, mistakenly thinking shadow also tests user-facing behavior. A shadow model receives mirrored traffic, but its predictions are never served to users, so it cannot validate the new version against live user responses.
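One common way to send a small share of traffic to a new model version is deterministic hash-based bucketing. The sketch below is an illustration under assumptions: the route function, the user ID format, and the 10% share are all hypothetical, and real serving platforms usually handle this split at the load balancer or gateway.

```python
import hashlib

def route(user_id: str, canary_percent: int) -> str:
    """Assign a user to 'canary' or 'stable' based on a hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"

# The same user always lands in the same bucket, so their experience is
# consistent across requests while the canary is being evaluated.
assignments = [route(f"user-{n}", 10) for n in range(10000)]
canary_share = assignments.count("canary") / len(assignments)
print(canary_share)  # roughly 0.10
```

Raising canary_percent step by step widens the rollout, and setting it to 0 sends every user back to the stable version.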

Full explanation →

496

MCQhard

An e-commerce company deploys a recommendation model that must serve predictions with sub-100 ms latency for millions of users during peak hours. The model is a large neural network. Which architecture is most suitable?

A.Batch process predictions every hour.

B.Use a distributed system with load balancers and model replicas.

C.Deploy the model on a single powerful GPU server.

D.Use serverless functions with auto-scaling.

AnswerB

This architecture handles high traffic and meets latency requirements efficiently.

Why this answer

Option B is correct because distributing the model across multiple servers with load balancers and replicas allows horizontal scaling to handle millions of concurrent users while maintaining sub-100 ms latency. This architecture provides fault tolerance and can dynamically adjust to peak traffic loads, which is essential for real-time inference with large neural networks.

Exam trap

CompTIA often tests the misconception that a single powerful server or serverless functions can meet strict latency and throughput requirements, but the trap here is that horizontal scaling with load-balanced replicas is the only viable solution for high-concurrency, low-latency inference with large models.

How to eliminate wrong answers

Option A is wrong because batch processing predictions every hour introduces latency of up to 3600 seconds, which fails the sub-100 ms requirement and is unsuitable for real-time recommendation systems. Option C is wrong because a single powerful GPU server creates a single point of failure and cannot scale horizontally to handle millions of concurrent users during peak hours, leading to resource contention and latency spikes. Option D is wrong because serverless functions typically have cold start delays (often 100 ms to several seconds) and may not support large neural network models due to memory and execution time limits (e.g., AWS Lambda max 15 minutes, 10 GB memory), making them unsuitable for low-latency, high-throughput inference.

Full explanation →

497

MCQmedium

A team trains a random forest model on a dataset with 50 features. The model's performance on the test set is significantly worse than on the training set. Which technique is most appropriate to address this issue?

A.Apply cross-validation to tune hyperparameters and reduce overfitting

B.Increase the number of trees in the forest

C.Use feature scaling

D.Perform PCA to reduce dimensions

AnswerA

Cross-validation finds optimal max depth, min samples split, etc., to combat overfitting.

Why this answer

Option A is correct because cross-validation helps tune hyperparameters to reduce overfitting. Option B is incorrect because increasing the number of trees reduces variance but may not be sufficient. Option C is incorrect because tree-based models are scale-invariant. Option D is incorrect because PCA can reduce dimensionality but may lose information; hyperparameter tuning is a better first step.

Full explanation →

498

MCQhard

An MLOps team automates model deployment with a CI/CD pipeline. A performance regression is detected after deploying a new model version. The team needs to automatically roll back to the previous version. Which approach best enables safe automated rollback?

A.Use a blue/green deployment with automated health checks and traffic switching

B.Maintain a manual rollback script that the operations team can run

C.Deploy new models as canary releases and monitor for 24 hours

D.Automatically keep the previous model version in storage for later use

AnswerA

Blue/green allows instant rollback by redirecting traffic.

Why this answer

Blue/green deployment with automated health checks and traffic switching is the best approach because it allows the team to instantly route all traffic back to the previous (green) environment if the new (blue) version fails health checks. This ensures zero-downtime rollback without manual intervention, directly addressing the need for safe automated rollback in a CI/CD pipeline.

Exam trap

CompTIA often tests the distinction between preserving artifacts (storage) and enabling automated traffic switching (deployment strategy), so candidates mistakenly choose Option D thinking storage alone ensures rollback capability.

How to eliminate wrong answers

Option B is wrong because a manual rollback script introduces human delay and error risk, contradicting the requirement for automated rollback. Option C is wrong because canary releases with a 24-hour monitoring window do not provide immediate automated rollback; they rely on manual decision-making after observation, which is not fully automated. Option D is wrong because simply keeping the previous model version in storage does not enable automatic traffic switching or rollback; it only preserves the artifact, not the deployment state.
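The rollback logic can be sketched as a traffic switch guarded by a health check. Everything named here (Router, deploy_with_rollback, the environment labels, the health-check callables) is hypothetical; in practice the switch is performed by a load balancer or service mesh and the check reads real metrics.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    live: str     # environment currently receiving all traffic
    standby: str  # the other environment, kept running

    def switch(self) -> None:
        """Swap which environment receives traffic."""
        self.live, self.standby = self.standby, self.live

def deploy_with_rollback(router: Router, health_check: Callable[[str], bool]) -> bool:
    """Cut over to the standby environment; switch back if it is unhealthy."""
    router.switch()  # the new version now receives traffic
    if not health_check(router.live):
        router.switch()  # automated rollback: previous version is live again
        return False
    return True

# Usage: the new version fails its check, so traffic returns to the old one.
router = Router(live="model-v1", standby="model-v2")
ok = deploy_with_rollback(router, health_check=lambda env: env != "model-v2")
print(ok, router.live)
```

Because the previous environment keeps running throughout, the rollback is a routing change rather than a redeployment.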

Full explanation →

499

MCQeasy

A team is using a pre-trained language model for sentiment analysis. They want to adapt it to a specific domain with limited labeled data. Which approach is most efficient?

A.Fine-tune the pre-trained model on domain data

B.Use the pre-trained model as is

C.Train a new model from scratch

D.Ensemble multiple pre-trained models

AnswerA

Fine-tuning updates the model weights slightly on domain data, achieving good performance with few examples.

Why this answer

Fine-tuning leverages the pre-trained model's knowledge and requires only minimal additional training on domain-specific data, making it efficient. Training from scratch is computationally expensive and requires large datasets. Using the model as-is may perform poorly on domain-specific language. Ensembling multiple models adds complexity without clear benefit.

Full explanation →

500

MCQeasy

A company has developed a deep learning model for image classification. The team wants to deploy the model to production with high availability and scalability. Which approach should they use?

A.Run the model on a laptop during business hours.

B.Deploy the model as a monolithic application on a single server.

C.Embed the model directly into a mobile app.

D.Use a containerized approach with Kubernetes.

AnswerD

Kubernetes provides orchestration, scaling, and high availability for containerized applications.

Why this answer

Option D is correct because containerization with Kubernetes provides the orchestration, auto-scaling, and self-healing capabilities required for high availability and scalability in production. Kubernetes manages container lifecycles, distributes traffic across replicas via Services and Ingress controllers, and can automatically scale pods based on CPU/memory metrics or custom metrics, ensuring the deep learning model handles variable loads without downtime.

Exam trap

CompTIA often tests the misconception that embedding AI models directly into mobile apps or running them on a single server is sufficient for production, when in reality enterprise-grade deployments require container orchestration for resilience and elasticity.

How to eliminate wrong answers

Option A is wrong because running the model on a laptop during business hours lacks any production-grade availability, scalability, or fault tolerance; it is a single point of failure and cannot handle concurrent requests. Option B is wrong because a monolithic application on a single server creates a single point of failure, cannot scale horizontally, and offers no load balancing or automated recovery, making it unsuitable for high availability. Option C is wrong because embedding the model directly into a mobile app offloads inference to client devices, which introduces latency, security risks, and inconsistent performance; it does not provide centralized high availability or scalability for the production service.

Full explanation →
