CompTIA · Free Practice Questions · Last reviewed May 2026
30 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right, and why the others are wrong.
A company deploys an AI model to predict equipment failure. The model performs well on historical data but fails to generalize to new data from a different factory. Which concept best describes this issue?
Transfer learning
Underfitting
Overfitting
The model fits training data too closely and fails on new data.
Bias-variance tradeoff
A data scientist trains a linear regression model to predict house prices. The model has high bias and low variance. Which action would most likely reduce bias?
Apply L2 regularization
Increase the training dataset size
Add polynomial features
Adding complexity reduces bias but may increase variance.
Remove irrelevant features
An AI engineer trains a deep learning model for image classification. After training, the training accuracy is 99% but validation accuracy is 85%. Which technique would best address this discrepancy?
Increase data augmentation
Decrease the learning rate
Increase the number of layers
Add dropout layers
Dropout reduces overfitting by preventing co-adaptation of neurons.
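To make the dropout explanation concrete, here is a minimal sketch of "inverted" dropout in plain Python. The function name `dropout`, the 50% rate, and the sample activations are illustrative assumptions, not part of the question; real frameworks implement this inside their own layer classes.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` and rescale the survivors.

    At inference time (training=False) the activations pass through unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    # Dividing by `keep` keeps the expected activation the same as without dropout.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # seeded only so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, rate=0.5, rng=rng, training=True)
infer_out = dropout(activations, rate=0.5, rng=rng, training=False)
print(train_out, infer_out)
```

Because a different random subset of units is silenced on every training step, no unit can rely on a specific partner always being present, which is the co-adaptation the explanation refers to.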
A company implements a chatbot using a rule-based system. Users complain the chatbot cannot handle new queries. Which AI approach should be considered to improve flexibility?
Expert system
Natural language processing (NLP)
Robotic process automation
Machine learning
ML enables the system to learn patterns from data.
An AI model for detecting fraudulent transactions has high precision but low recall. Which business impact is most likely?
The model has no impact on fraud detection
The model detects all fraudulent transactions
Many fraudulent transactions go undetected
Low recall indicates a high number of false negatives.
Many legitimate transactions are flagged as fraud
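The link between low recall and missed fraud is easier to see with numbers. The sketch below uses made-up confusion-matrix counts (100 real fraud cases, of which the model catches 20 and raises 1 false alarm); the helper name `precision_recall` is an assumption for illustration.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 20 fraud cases caught, 1 false alarm, 80 fraud cases missed.
tp, fp, fn = 20, 1, 80
precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these counts almost every alert is genuine fraud (high precision), yet 80 of the 100 fraud cases are false negatives, which is exactly the "many fraudulent transactions go undetected" outcome.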
A data scientist splits a dataset into training (80%) and test (20%). After training, the model achieves 95% accuracy on training and 60% on test. Which step should the data scientist take first?
Collect more data
Use cross-validation
Apply regularization
Regularization penalizes large weights, reducing overfitting.
Increase model complexity
A data scientist is building a classification model to detect fraudulent transactions. The dataset is highly imbalanced with only 1% fraudulent cases. Which approach should the scientist use to evaluate model performance most effectively?
F1 score
F1 score is the harmonic mean of precision and recall, providing a balanced measure for imbalanced datasets.
Accuracy
Recall
Precision
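A small worked example shows why accuracy misleads at a 1% fraud rate. The labels and the two hypothetical models below are invented for illustration: one never flags fraud, the other catches half of it with a few false alarms.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 1,000 transactions, 10 of them fraudulent (1 = fraud, 0 = legitimate).
y_true = [1] * 10 + [0] * 990
always_legit = [0] * 1000                                  # never flags fraud
some_fraud_caught = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 985  # 5 hits, 5 false alarms

for name, y_pred in [("always_legit", always_legit), ("some_fraud_caught", some_fraud_caught)]:
    print(name, accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Both hypothetical models are 99% accurate, but F1 separates them: the model that never flags fraud scores 0.0, while the one that catches half the fraud scores 0.5.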
A machine learning team is deploying a model that predicts customer churn. They notice that the model's predictions are highly sensitive to small changes in input features, leading to inconsistent outputs. Which technique should the team apply to improve model stability?
Increase learning rate
Feature scaling
Regularization
Regularization adds a penalty for large weights, reducing overfitting and sensitivity to input variations.
Cross-validation
A deep learning model for image classification is overfitting the training data. The team has already tried data augmentation and dropout. Which additional technique should they implement to reduce overfitting?
Batch normalization
Increase number of epochs
Gradient clipping
Early stopping
Early stopping monitors validation loss and stops training when it starts to increase, reducing overfitting.
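The early-stopping rule can be sketched as a loop over per-epoch validation losses. The function name, the `patience` parameter, and the loss values are assumptions for illustration; deep learning frameworks ship their own callbacks for this.

```python
def early_stopping(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving until epoch 2, then rising.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

In practice the weights saved at `best_epoch` are usually the ones restored, so the deployed model is the one from before validation loss began to climb.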
A company wants to deploy a machine learning model that requires continuous learning as new data arrives. The model must be able to adapt to changing patterns without retraining from scratch. Which approach should be used?
Transfer learning
Online learning
Online learning updates the model incrementally, allowing adaptation to new data without full retraining.
Batch learning
Unsupervised learning
A data engineer is designing a pipeline to train a linear regression model on a dataset with 10 million rows and 50 features. The dataset fits in memory. Which approach should the engineer use to train the model efficiently?
Normal equation
Batch gradient descent
Principal component analysis
Stochastic gradient descent
SGD updates weights per sample, making it efficient for large datasets.
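As a rough illustration of the per-sample update, here is a minimal stochastic gradient descent loop for one-feature linear regression in plain Python. The data (a noise-free line y = 2x + 1), the learning rate, and the epoch count are assumptions chosen so the sketch converges quickly; a real 10-million-row job would use a library implementation.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.1, epochs=300, seed=0):
    """Fit y = w * x + b by updating the weights after every single sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b

# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = [i / 10 for i in range(11)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(f"w={w:.3f} b={b:.3f}")
```

Each update touches only one row, so the cost per step does not grow with the dataset size, unlike batch gradient descent, which scans every row before each update.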
A data scientist is training a convolutional neural network (CNN) for object detection. The training loss decreases rapidly but then plateaus at a high value, and the validation loss starts increasing. Which action should the scientist take to improve the model?
Increase the learning rate
Increase the number of epochs
Reduce the model complexity
Reducing complexity (e.g., fewer layers) can reduce overfitting and improve validation performance.
Add more convolutional layers
A data scientist is preparing a dataset for training a classification model. The dataset contains 10,000 records with a binary target variable where 9,500 belong to class A and 500 belong to class B. Which technique should the scientist use to address the class imbalance?
SMOTE (Synthetic Minority Oversampling Technique)
SMOTE creates synthetic minority samples to balance classes.
Random undersampling of class A
Adding Gaussian noise to class B
Principal Component Analysis (PCA)
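The core idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration, not the full algorithm or any library's API; the function name `smote_sketch`, the parameter `k`, and the sample points are assumptions.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points, by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical class B samples with two features each.
class_b = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote_sketch(class_b, n_new=6)
print(new_points)
```

Unlike adding Gaussian noise, every synthetic point lies between two real minority samples, so the new data stays inside the region the minority class already occupies.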
An engineer is building a regression model to predict housing prices. The dataset includes features such as square footage, number of bedrooms, and year built. The engineer notices that the square footage values range from 500 to 10,000, while the number of bedrooms ranges from 1 to 5. Which preprocessing step is most critical before training a gradient descent-based model?
Use k-fold cross-validation
Apply log transformation to all features
Normalize or standardize the features
Scaling improves convergence of gradient descent.
One-hot encode the features
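Standardization (subtract the mean, divide by the standard deviation) is one common way to put these features on a comparable scale. The sketch below uses only the standard library; the sample values are invented to match the ranges in the question.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a list of numbers to zero mean and unit standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant feature carries no signal
    return [(value - mu) / sigma for value in column]

# Hypothetical feature columns with very different ranges.
square_feet = [500, 1200, 2500, 4000, 10000]
bedrooms = [1, 2, 3, 4, 5]

scaled_sqft = standardize(square_feet)
scaled_beds = standardize(bedrooms)
print(scaled_sqft)
print(scaled_beds)
```

After scaling, a gradient step affects both features by a similar amount, so a single learning rate works for both instead of being dominated by the square-footage column.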
A machine learning team is deploying a sentiment analysis model for customer reviews. The model was trained on reviews from an e-commerce site but will be used for a social media platform. The team observes a drop in accuracy. Which concept best explains this issue?
Data drift
The distribution of reviews differs between e-commerce and social media.
Concept drift
Bias-variance tradeoff
Overfitting
A data engineer needs to design a data pipeline for a real-time fraud detection system. The system requires low-latency processing of streaming transactions. Which architecture is most appropriate?
Stream processing with Apache Kafka and Flink
Stream processing provides low-latency real-time analysis.
Data lake with Apache Spark
Batch processing with Apache Hadoop
Microservices architecture with REST APIs
A team is training a deep learning model for image classification. The training loss decreases rapidly but validation loss starts increasing after a few epochs. Which regularization technique should be applied to mitigate this issue?
Data augmentation
L2 regularization
Early stopping
Early stopping prevents overfitting by stopping training when validation loss starts to rise.
Dropout
An organization needs to store sensitive customer data for training a machine learning model. The data must be encrypted at rest and in transit, and access must be audited. Which combination of practices should be implemented?
Use TLS for transfer, AES-256 for storage, and AWS CloudTrail for auditing
TLS encrypts data in transit, AES-256 encrypts it at rest, and CloudTrail records access activity for auditing.
Use FTP for transfer, AES-128 for storage, and manual log review
Use SSH for transfer, store data in a database, and enable access logs
Use MD5 for hashing, store data in plaintext, and enable server logs
A company deployed a chatbot using a pre-trained language model. Users report that the chatbot provides incorrect answers to domain-specific questions. Which approach should the AI team prioritize to improve accuracy without retraining the entire model?
Fine-tune the model on a curated dataset of domain-specific conversations.
Fine-tuning adapts the model to the domain with less data and compute.
Increase the temperature parameter to reduce randomness.
Collect more general training data and retrain the model from scratch.
Roll back to a previous version of the model that was more accurate.
An AI system misclassifies rare but critical events. The team considers using synthetic data. Which consideration is MOST important for ensuring the synthetic data improves performance on real rare events?
The synthetic data should include a wide variety of events, even if not realistic.
The synthetic data should be generated using an unsupervised generative model.
The synthetic data should accurately represent the distribution and features of real rare events.
Fidelity to real event characteristics is crucial for generalization.
The synthetic data should be as large as possible to cover all possibilities.
A data scientist trains a regression model and notices the training loss is low but validation loss is high. Which technique should be applied FIRST to address this issue?
Increase the learning rate.
Add more layers to the neural network.
Increase the size of the training dataset.
Apply L1 or L2 regularization to the model.
Regularization penalizes large weights, reducing overfitting.
A company deploys an AI model for loan approval. The model shows bias against a protected group. The team decides to use adversarial debiasing. What is the PRIMARY advantage of this approach?
It guarantees the model's predictions are private.
It reduces bias while preserving predictive performance by learning representations that are invariant to sensitive attributes.
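An adversary tries to recover the sensitive attribute from the model's output; penalizing its success pushes the model toward attribute-invariant representations while the main loss preserves predictive performance.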
It is simpler to implement than pre-processing techniques.
It ensures equal approval rates across all groups.
An AIOps platform monitors server metrics and triggers alerts. The team notices too many false positives. Which adjustment should be made to the anomaly detection model?
Use a more complex model to better fit the data.
Shorten the observation window to detect anomalies faster.
Increase the training data to include more normal patterns.
Raise the anomaly score threshold for triggering alerts.
A higher threshold means only more extreme deviations trigger alerts.
A team deploys a machine learning model as a REST API. They want to monitor model drift. Which metric is MOST appropriate for detecting drift in the input data distribution?
Model accuracy on a recent holdout set.
Population stability index (PSI) comparing training and recent data.
PSI directly quantifies distribution shift.
F1 score on the training data.
Root mean squared error (RMSE) on test data.
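PSI is computed over matching bins of a feature: for each bin, take the difference between the recent and training fractions and multiply it by the log of their ratio, then sum. The sketch below assumes the binning has already been done; the function name `psi`, the `eps` floor for empty bins, and the bin fractions are illustrative assumptions.

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population stability index between two binned distributions.

    Both inputs are per-bin fractions that each sum to 1.
    """
    total = 0.0
    for expected, actual in zip(expected_fracs, actual_fracs):
        expected = max(expected, eps)  # avoid log(0) and division by zero
        actual = max(actual, eps)
        total += (actual - expected) * math.log(actual / expected)
    return total

# Hypothetical fractions of one input feature falling into four bins.
training_bins = [0.25, 0.25, 0.25, 0.25]
recent_bins = [0.10, 0.20, 0.30, 0.40]

print(psi(training_bins, training_bins))  # identical distributions
print(psi(training_bins, recent_bins))    # shifted distribution
```

PSI needs no labels, only the input values, which is why it suits drift monitoring on a live API where ground truth often arrives late or not at all.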
A healthcare organization deploys an AI system to analyze medical images and detect anomalies. During a routine audit, the security team discovers that the AI model occasionally returns results that include data from patients who have opted out of data sharing. Which security control should be implemented to prevent this violation?
Apply data anonymization techniques to the training dataset.
Anonymization removes personally identifiable information, ensuring that the model cannot output data linked to specific patients.
Implement role-based access control (RBAC) on the AI model's inference API.
Use differential privacy during model training.
Encrypt the training data at rest and in transit.
A financial institution is implementing an AI-based fraud detection system. The compliance officer is concerned about potential bias in the model that could lead to unfair treatment of certain customer groups. Which governance practice should be prioritized to address this concern?
Increase the diversity of the training data by collecting more samples from underrepresented groups.
Schedule regular bias audits using fairness metrics.
Bias audits with metrics like demographic parity can detect unfair treatment and guide mitigation.
Retrain the model every month with the latest transaction data.
Use SHAP values to provide explanations for each prediction.
A company uses a machine learning model to recommend products to customers. The marketing team notices that the model is recommending high-profit items more frequently than low-profit items, even when customers are likely to prefer the latter. This behavior is causing customer dissatisfaction. Which approach would best align the model with customer preferences while maintaining profitability?
Train the model with a loss function that weights profit more heavily than customer satisfaction.
Use a multi-objective optimization framework to balance profit and customer satisfaction.
Multi-objective optimization explicitly considers trade-offs between competing objectives, leading to more balanced recommendations.
Adjust the model's hyperparameters to reduce the influence of profit features.
Remove profit data from the training set and only use customer preference data.
An AI system used for resume screening is found to consistently rank male candidates higher than female candidates with similar qualifications. The HR director wants to remediate this bias without significantly reducing model accuracy. Which technique should be applied?
Apply adversarial debiasing to the model during training.
Adversarial debiasing reduces bias by training the model so that an adversary cannot predict protected attributes from its predictions.
Use a random selection of candidates to avoid bias.
Remove the gender feature from the dataset and retrain.
Collect more training data from underrepresented groups.
A company is developing an AI chatbot for customer service. The legal team is concerned that the chatbot might generate responses that violate privacy regulations. Which governance mechanism should be implemented to mitigate this risk?
Use explainable AI techniques to understand why the chatbot generates certain responses.
Encrypt all chatbot conversations at rest and in transit.
Implement a human-in-the-loop review process for high-risk responses.
Human review can catch and block responses that violate privacy regulations before they are sent to customers.
Anonymize the training data used to train the chatbot.
A self-driving car company is testing an AI model for pedestrian detection. During simulation, the model fails to detect pedestrians in low-light conditions. The safety team wants to improve robustness without retraining the entire model from scratch. Which approach is most appropriate?
Replace the convolutional layers with transformer layers to improve attention.
Apply data augmentation techniques to simulate low-light conditions in the training dataset.
Data augmentation can expand the training data to include low-light scenarios, improving robustness without full retraining.
Use adversarial training to add imperceptible perturbations to training images.
Increase the model's depth by adding more convolutional layers.
The AI0-001 exam has 80 questions and must be completed in 90 minutes. The passing score is 700/1000.
Multiple-choice and performance-based questions covering IT security, networking, and operations. Some questions are performance-based (PBQs), asking you to complete tasks in a simulated environment.
The exam covers 5 domains: AI Concepts and Foundations, Machine Learning and Deep Learning, AI Models and Data Engineering, AI Implementation and Operations, AI Security, Ethics and Governance. Questions are weighted by domain — higher-weight domains appear more on your actual exam.
No. These are original exam-style practice questions written against the official CompTIA AI0-001 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.
Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.