Practice AI0-001 Machine Learning and Deep Learning questions with full explanations on every answer.
1. A data scientist is building a classification model to detect fraudulent transactions. The dataset is highly imbalanced with only 1% fraudulent cases. Which approach should the scientist use to evaluate model performance most effectively?
2. A machine learning team is deploying a model that predicts customer churn. They notice that the model's predictions are highly sensitive to small changes in input features, leading to inconsistent outputs. Which technique should the team apply to improve model stability?
3. A deep learning model for image classification is overfitting the training data. The team has already tried data augmentation and dropout. Which additional technique should they implement to reduce overfitting?
4. A company wants to deploy a machine learning model that requires continuous learning as new data arrives. The model must be able to adapt to changing patterns without retraining from scratch. Which approach should be used?
5. A data engineer is designing a pipeline to train a linear regression model on a dataset with 10 million rows and 50 features. The dataset fits in memory. Which approach should the engineer use to train the model efficiently?
6. A data scientist is training a convolutional neural network (CNN) for object detection. The training loss decreases rapidly but then plateaus at a high value, and the validation loss starts increasing. Which action should the scientist take to improve the model?
7. A team is building a recommendation system using collaborative filtering. They have a sparse user-item matrix. Which technique should they use to handle the sparsity and improve recommendations?
8. Which TWO techniques are commonly used to handle missing data in a machine learning dataset? (Choose TWO.)
9. Which THREE are common activation functions used in neural networks? (Choose THREE.)
10. Which TWO are valid techniques to reduce overfitting in a deep neural network? (Choose TWO.)
11. A data scientist is training a multi-class classifier with 10 classes. The training log shows the above output for the first two epochs. What is the most likely cause?
12. A team is reviewing a neural network model summary. The input layer expects 784 features (e.g., 28x28 images). How many parameters does the first dense layer have?
13. A data scientist is training a neural network to classify images of handwritten digits. The model achieves 99% accuracy on training data but only 85% on validation data. Which technique should the scientist apply first to address this issue?
14. A company is deploying a machine learning model to predict customer churn. The dataset is highly imbalanced (95% non-churn, 5% churn). The model achieves 96% accuracy, but the F1-score for the churn class is only 0.2. Which metric should the team prioritize to evaluate model performance for this business problem?
15. An autonomous vehicle system uses a deep reinforcement learning agent to navigate. The agent's reward function gives +1 for reaching the destination and -0.1 for each time step. After training, the agent learns to circle the block repeatedly without reaching the destination. Which modification is most likely to fix this behavior?
16. A machine learning engineer is building a spam filter. The dataset contains 10,000 emails, of which 1,000 are spam. The engineer decides to use a Random Forest classifier. Which preprocessing step is most critical to ensure the model generalizes well to new, unseen emails?
17. Which TWO techniques are commonly used to prevent overfitting in deep neural networks?
18. Refer to the exhibit. A data scientist is training a binary classifier. Based on the training log, which problem is the model experiencing?
19. A healthcare startup is developing a deep learning model to detect diabetic retinopathy from retinal fundus images. The dataset contains 50,000 images, but only 5% are labeled as positive for the disease. The team uses a convolutional neural network (CNN) with a final sigmoid layer and binary cross-entropy loss. After training for 20 epochs, the model achieves 95% accuracy on the test set, but the recall for the positive class is only 10%. The team suspects the model is biased toward the negative class due to class imbalance. The data is stored in a secure environment, and no additional labeled data can be obtained. The team has access to the following techniques: oversampling the minority class, undersampling the majority class, using class weights in the loss function, applying data augmentation, and using a different architecture. Which course of action is most likely to improve recall for the positive class while maintaining reasonable overall performance?
20. A retail company uses a gradient boosting model to predict customer lifetime value (CLV). The model currently uses 50 features including purchase history, demographics, and web behavior. The model's RMSE on the test set is 120. The data science team wants to improve the model's accuracy without increasing training time significantly. They have access to additional data: customer support interaction logs (text), social media sentiment (text), and third-party credit scores (numeric). They also have the ability to perform feature engineering, hyperparameter tuning, and ensemble methods. Which approach is most likely to yield the best improvement in predictive performance with minimal increase in training time?
21. A data scientist is training a binary classification model to detect fraudulent transactions. The dataset is highly imbalanced with only 1% fraud cases. Which technique is most appropriate to address the class imbalance?
22. A machine learning engineer is tuning a neural network for image classification. The training loss decreases steadily, but the validation loss starts increasing after 50 epochs. Which action best addresses this issue?
23. A company deploys a deep learning model for real-time object detection in autonomous vehicles. The model was trained on high-end GPUs but needs to run on edge devices with limited computational resources. Which technique is most effective for reducing model size and inference latency while maintaining acceptable accuracy?
24. A data analyst wants to predict housing prices based on square footage, number of bedrooms, and location. Which machine learning approach is most suitable?
25. A team is training a convolutional neural network (CNN) for medical image diagnosis. They have a limited dataset of 500 labeled images. Which strategy is most effective to improve model generalization?
26. An AI developer observes that the training accuracy of a neural network is high, but the test accuracy is low. The model uses a ReLU activation function and Adam optimizer. Which approach is most likely to improve test accuracy?
27. A machine learning engineer needs to choose an algorithm for grouping customers into segments based on purchasing behavior without any labels. Which algorithm should the engineer use?
28. While training a deep neural network, the loss function fails to converge and oscillates wildly. Which adjustment is most likely to stabilize training?
29. A data scientist is training a random forest model on a large dataset and notices that the model is overfitting. Which hyperparameter adjustment is most likely to reduce overfitting?
30. A company is preparing a dataset for training a supervised machine learning model. The dataset contains missing values, outliers, and categorical features. Which two preprocessing steps are typically performed to prepare the data? (Choose two.)
31. A deep learning engineer is training a convolutional neural network for image classification. The model is overfitting the training data. Which three techniques can help reduce overfitting? (Choose three.)
32. A data scientist is evaluating a trained binary classification model. The model has high accuracy but the precision is low and recall is high. Which three actions are most appropriate to improve precision? (Choose three.)
33. Refer to the exhibit. A data scientist is training a neural network and observes the training log above. What is the most likely cause?
34. Refer to the exhibit. An AI specialist reviews the model evaluation report for a binary classifier. The specialist wants to improve recall. Which action is most likely effective?
35. Refer to the exhibit. An AI developer implements the above neural network architecture for handwritten digit recognition. The model achieves 85% training accuracy and 83% test accuracy. Which modification is most likely to improve training accuracy?
36. A data scientist is building a binary classification model to predict customer churn. The dataset has 10,000 samples with 80% non-churn and 20% churn. The model achieves 95% accuracy but fails to identify churners correctly. Which metric should the scientist focus on to evaluate model performance properly?
37. A team is implementing a machine learning pipeline to classify images for a defect detection system. They are considering using a pre-trained convolutional neural network (CNN) and fine-tuning it on their small dataset. What is the primary advantage of transfer learning in this scenario?
38. A company uses linear regression to predict sales based on advertising spend. The model's residuals show a pattern of increasing variance as spend increases. Which assumption of linear regression is violated?
39. An AI engineer is training a deep neural network for image recognition. The training loss decreases steadily for the first few epochs but then plateaus and starts to oscillate. Which adjustment is most likely to improve convergence?
40. A healthcare organization wants to use patient data to predict disease risk. They are concerned about bias in the model. Which step is most critical during the data preparation phase to mitigate bias?
41. A team trains a random forest model on a dataset with 50 features. The model's performance on the test set is significantly worse than on the training set. Which technique is most appropriate to address this issue?
42. A deep learning model for natural language processing uses a recurrent neural network (RNN) to process long sequences. The gradients vanish after many time steps. Which architectural change is most effective to mitigate this problem?
43. An organization has a dataset with categorical features having high cardinality (e.g., ZIP codes). They plan to use a tree-based model. Which encoding method is most appropriate?
44. A company deploys a machine learning model that makes predictions on streaming data. Over time, the data distribution shifts, causing model performance to degrade. Which monitoring strategy is most appropriate to detect this drift?
45. A data scientist is tuning hyperparameters for a support vector machine (SVM) with an RBF kernel. Which two hyperparameters most significantly affect model performance? (Select TWO.)
46. A team is designing a deep learning pipeline for a computer vision task. They want to reduce overfitting. Which two techniques are specifically effective for this purpose? (Select TWO.)
47. A data scientist is using an ensemble method to combine multiple models. Which three statements about bagging (Bootstrap Aggregating) are true? (Select THREE.)
48. Refer to the exhibit. The training log shows losses and accuracies over 5 epochs. What is the most likely problem?
49. Refer to the exhibit. A developer is using the above configuration for a multi-class classification task. The model performs well on training data but poorly on validation data. Which modification could help?
50. Refer to the exhibit. The training pod is using 2 GPUs. During training, the GPU utilization is only 30% each. What is the most likely cause?
51. A data scientist needs to predict whether a customer will churn based on historical data containing features like account age, monthly charges, and support tickets. The target variable is binary (churn or not). Which type of machine learning algorithm should be used?
52. A team trained a deep neural network on a limited dataset. The training loss decreases consistently, but the validation loss starts increasing after 20 epochs. What is the most likely issue and the best corrective action?
53. A company is building a computer vision system to detect defects in manufactured parts. They have 10,000 labeled images per class (defective and non-defective). They want to achieve high accuracy with limited computational resources. Which deep learning architecture and approach is most appropriate?
54. A machine learning engineer has a dataset of 100,000 records. She splits it into 70% training, 15% validation, and 15% test sets. After training, the model achieves 95% accuracy on training and 85% on validation. What does the accuracy difference most likely indicate?
55. A deep learning model for sentiment analysis uses a softmax output layer. The hidden layers currently use tanh activation. Which activation function should replace tanh to mitigate vanishing gradients in deeper networks?
56. A fraud detection model is trained on a dataset where only 0.1% of transactions are fraudulent. The model achieves 99.9% accuracy but fails to catch most frauds. Which metric should the team prioritize, and which technique could help?
57. A dataset contains features on vastly different scales (e.g., age 0-100 vs. income 0-1,000,000). Which preprocessing step is essential before training a neural network?
58. During training of a neural network, the loss oscillates and does not converge smoothly. The learning rate is set to 0.1. What is the most likely cause and what adjustment should be made?
59. A team is building a model to predict stock prices based on time series data. They need to capture long-term dependencies and avoid vanishing gradients. Which architecture is best suited?
60. Which TWO are characteristics of supervised learning?
61. Which THREE techniques can help reduce overfitting in neural networks?
62. Which TWO are key differences between Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)?
63. Refer to the exhibit. What is the most likely issue and what action should be taken?
64. Refer to the exhibit. A compliance audit requires that model predictions be explainable for regulatory reasons. Which setting in the deployment configuration supports this requirement?
65. Refer to the exhibit. What is the recall of the model?
66. A data scientist trains a linear regression model on housing prices. The training error is low, but test error is high. What is the most likely issue?
67. A team trains a deep learning model for image classification with 1000 classes. The training loss decreases but validation loss starts increasing after 10 epochs. What should they do first?
68. A company uses a neural network for fraud detection. The dataset has 99% legitimate, 1% fraudulent. The model achieves 99% accuracy but fails to detect most frauds. Which metric should they focus on?
69. A data scientist wants to reduce the dimensionality of a dataset with 200 features before training a regression model. Which technique should they use?
70. A deep learning model for sentiment analysis has millions of parameters and is trained on a small dataset. Which technique can help prevent overfitting?
71. An organization needs to classify customer emails into categories. They have labeled data for some categories but not all. Which approach should they use?
72. A machine learning engineer notices that the gradient values in a deep network are becoming extremely small during backpropagation. What is this problem?
73. A team wants to predict monthly sales using historical data. Which algorithm is most appropriate?
74. A model trained on a dataset has high bias and low variance. What does this indicate?
75. Which TWO techniques are commonly used for feature scaling? (Choose two.)
76. Which THREE are common activation functions used in neural networks? (Choose three.)
77. Which TWO are evaluation metrics for classification problems? (Choose two.)
78. Based on the exhibit, what is the likely problem with the model?
79. The exhibit shows a model configuration for a classification task with 10 classes. What is wrong with this setup?
80. Based on the exhibit, what does this indicate about the model?
81. A data scientist is training a binary classification model to detect fraudulent transactions. The dataset is highly imbalanced with 99% legitimate and 1% fraudulent. Which evaluation metric should be prioritized to assess model performance?
82. A team is deploying a deep learning model for real-time image classification on edge devices with limited computational resources. Which technique would best help reduce model size and inference time without significant accuracy loss?
83. A machine learning engineer notices that a linear regression model has high bias. Which action is most likely to reduce bias?
84. A team is developing a recommendation system for an e-commerce platform. They want to use collaborative filtering but are concerned about cold-start problems for new users. Which approach would best mitigate the cold-start problem?
85. A data scientist is training a deep neural network for sentiment analysis. The training loss decreases steadily but the validation loss starts to increase after 10 epochs. What is the most likely cause and best corrective action?
86. An organization wants to automate the detection of defective products on an assembly line using computer vision. They have a limited number of labeled images for defective items. Which approach would be most effective?
87. A machine learning engineer is troubleshooting a recurrent neural network that fails to learn long-range dependencies in sequential data. The gradients are computed using backpropagation through time. Which phenomenon is most likely occurring, and what architectural change would best address it?
88. A data scientist is using a gradient boosting model (XGBoost) for a regression task and observes that the model's performance on the training set is much better than on the test set. Which hyperparameter tuning strategy would most effectively reduce overfitting?
89. A deep learning model for autonomous vehicle perception uses a large convolutional neural network. During deployment, the model misclassifies a stop sign that has a small sticker on it. This is likely an example of what type of vulnerability, and which defense is most appropriate?
90. Which TWO of the following are common activation functions used in deep neural networks?
91. Which THREE of the following are techniques for handling missing data in machine learning?
92. Which THREE of the following are best practices for preventing overfitting in deep learning models?
93. A hospital wants to deploy a machine learning model to predict patient readmission risk within 30 days. They have a dataset with 10,000 records, 70 features including demographics, lab results, and past admissions. The target variable is binary (readmitted or not). The data scientist trains a logistic regression model and achieves an AUC of 0.85 on the test set. However, the hospital's clinicians require interpretability of predictions to trust the model. Which action should the data scientist take to ensure the model meets the interpretability requirement while maintaining performance?
94. An e-commerce company uses a gradient boosting model to forecast daily sales. Recently, the model's predictions have become less accurate, showing a significant drop in R-squared on validation data. The data scientist checks for data drift but finds no significant changes in feature distributions. The model was trained on data from the past 24 months and is retrained monthly. Upon inspecting the feature importance, the data scientist notices that the top feature 'promotion_flag' has decreased in importance over time. What is the most likely cause of the performance degradation, and what should be done?
95. A financial institution uses a deep learning model for fraud detection. The model is a feedforward neural network with three hidden layers. It was trained on a balanced dataset of 100,000 transactions. During deployment, the model achieves high accuracy on the test set but the fraud detection rate (true positive rate) is only 40% while the false positive rate is 0.1%. The business requires a true positive rate of at least 80%. Which of the following actions is most likely to achieve the required true positive rate while minimizing the increase in false positives?
96. A data scientist is training a binary classification model to detect fraudulent transactions. The dataset contains 99.9% legitimate transactions and 0.1% fraudulent transactions. After training a logistic regression model, the accuracy is 99.9%, but the recall for the fraud class is 0%. Which of the following is the MOST likely cause?
97. A machine learning engineer is preparing to train a deep neural network for image classification. To avoid overfitting, which TWO techniques should the engineer apply? (Select TWO.)
98. A company is deploying a machine learning model that predicts customer churn. The model currently has high variance. Which THREE actions should the data scientist take to reduce variance? (Select THREE.)
99. A healthcare startup is developing a diagnostic system using medical images. The team has collected 10,000 labeled images of skin lesions. They plan to train a convolutional neural network (CNN) from scratch. However, training converges slowly, and the validation accuracy plateaus at 70%. The data scientist suspects overfitting. The dataset contains 8,000 images of benign lesions and 2,000 of malignant. The team has limited GPU resources. Which of the following is the MOST effective course of action to improve validation accuracy?
A. Reduce the number of convolutional layers.
B. Apply transfer learning using a pre-trained model on ImageNet.
C. Increase the learning rate by a factor of 10.
D. Add more dropout after every convolutional layer.
100. A financial institution uses a random forest model to approve loan applications. Recently, the model's false positive rate has increased, leading to more defaults. The data science team reviews the feature importance and finds that the model heavily relies on a feature 'zip code' which correlates with income. The company is concerned about fairness. The regulatory team requires that the model's predictions are not biased against protected groups. Which action BEST addresses the fairness concern while maintaining predictive performance?
A. Remove the 'zip code' feature and retrain the model.
B. Use adversarial debiasing to train a model that is invariant to protected attributes.
C. Add more training data from underrepresented zip codes.
D. Apply a post-processing technique that adjusts thresholds for different groups.
101. An e-commerce company deploys a deep learning model for product recommendation. After a new data pipeline is implemented, the model's online performance degrades: recall drops by 20% and the click-through rate decreases. The data scientists suspect data drift. They compare the distribution of the input features between the training data and recent production data. The Kolmogorov-Smirnov test shows significant differences for two numerical features (price and rating). The team also notices that the frequency of categorical feature 'category' has changed. Which of the following is the MOST appropriate first step?
A. Immediately retrain the model on all available data including new production data.
B. Roll back to the previous data pipeline and investigate the root cause of drift.
C. Use feature selection to remove the drifting features and retrain.
D. Implement a monitoring dashboard to track drift over time and set up alerts.
102. A self-driving car company uses a reinforcement learning agent to navigate. The agent was trained in a simulated environment and achieved high rewards. When deployed in the real world, the agent fails to avoid obstacles. The team collects real-world driving data and uses it to fine-tune the model. However, fine-tuning leads to catastrophic forgetting of the simulated knowledge. Which technique should the team use to mitigate this?
A. Increase the learning rate during fine-tuning.
B. Use elastic weight consolidation (EWC) to regularize important weights.
C. Train the model from scratch using only real-world data.
D. Increase the number of layers in the network.
103. A media company uses a natural language processing (NLP) model to classify news articles into topics. The model was trained on articles from 2015-2018. In 2023, the model's F1 score drops significantly. The data scientists find that the word embeddings no longer capture the meaning of some terms (e.g., 'covid', 'metaverse'). The model uses static word embeddings (Word2Vec) trained on the original corpus. Which solution BEST addresses the observed degradation?
A. Replace static embeddings with contextual embeddings from a transformer model like BERT, then fine-tune the classifier.
B. Retrain the static Word2Vec embeddings on a larger corpus from 2023.
C. Apply data augmentation to the original training data by replacing words with synonyms.
D. Increase the dimensionality of the static embeddings.
104. A data scientist is preparing a dataset for a binary classification neural network. The dataset contains both numerical and categorical features, and some rows have identical entries. Which TWO preprocessing steps are most essential to improve model performance and avoid overfitting?
105. Based on the exhibit, what is the most likely issue with the trained model?
106. A financial institution is developing a fraud detection model using historical transaction data. The dataset contains over 10 million records, but only 0.01% of transactions are fraudulent. The current model uses a neural network trained with standard cross-entropy loss, and the team applies random undersampling of the majority class to create a balanced training set. However, the model still produces a high number of false positives (legitimate transactions flagged as fraud) and misses approximately 30% of actual fraud cases. The business requires that at least 95% of frauds be caught, and the false positive rate must be below 1% to avoid overwhelming fraud analysts. The team has limited resources to collect additional data and cannot change the model architecture significantly. Which approach should the team take to best meet the business requirements?
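Several of the fraud-detection questions above describe a model that reports 99.9% accuracy while its recall on the fraud class is 0%. The arithmetic behind that scenario can be sketched as follows. This is only an illustration under stated assumptions: the 10,000-transaction sample, the always-predict-legitimate classifier, and the helper name `accuracy_and_recall` are hypothetical and do not come from the question bank.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall for the positive class) from two label lists."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    accuracy = correct / len(y_true)
    # Guard against a dataset with no positive examples.
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Hypothetical sample: 10,000 transactions, 0.1% fraud (label 1).
y_true = [1] * 10 + [0] * 9990

# Hypothetical degenerate classifier: labels every transaction as legitimate.
y_pred = [0] * len(y_true)

acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy = {acc:.3f}, fraud recall = {rec:.3f}")
```

In this made-up sample the classifier never flags a single fraud case yet is correct on 9,990 of 10,000 rows, which is why accuracy alone says little about minority-class performance on heavily imbalanced data.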
The Machine Learning and Deep Learning domain covers the key concepts tested in this area of the AI0-001 exam blueprint published by CompTIA. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all AI0-001 domains — no account required.
The Courseiva AI0-001 question bank contains 106 questions in the Machine Learning and Deep Learning domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Machine Learning and Deep Learning domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.