AI0-001 Practice Questions

Question 1

A machine learning engineer is building a spam filter. The dataset contains 10,000 emails, of which 1,000 are spam. The engineer decides to use a Random Forest classifier. Which preprocessing step is most critical to ensure the model generalizes well to new, unseen emails?

Accepted Answer

Split the data into training and testing sets before any other preprocessing. Splitting first prevents data leakage. If a fitted transformation such as normalization or PCA is applied to the entire dataset before the split, test-set information influences training, which produces overly optimistic performance estimates and poor generalization to new, unseen emails.

Answer

Apply Principal Component Analysis (PCA) to reduce dimensionality

Answer

Normalize the numerical features to have zero mean and unit variance

Answer

Encode all features using one-hot encoding
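
Sketch for Question 1: the accepted answer (split first, then preprocess) might look like the following in plain Python. The feature `link_counts`, the 80/20 ratio, and the helper names are assumptions made for illustration, not part of the question. The scaler statistics are learned from the training portion only and then reused on the test portion.

```python
import random
import statistics

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_scaler(values):
    """Learn mean and standard deviation from the given values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # guard against zero variance
    return mean, std

def apply_scaler(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]

# Hypothetical numeric feature: number of links per email.
link_counts = [float(i % 17) for i in range(1000)]

train, test = train_test_split(link_counts)    # 1. split first
mean, std = fit_scaler(train)                  # 2. fit on the training set only
train_scaled = apply_scaler(train, mean, std)  # 3. transform both sets with
test_scaled = apply_scaler(test, mean, std)    #    the training statistics
```

Fitting the scaler on all 1,000 values instead would let the test rows shape the statistics, which is the leakage the accepted answer warns about.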

Question 2

Which THREE are common data preprocessing steps in a machine learning pipeline? (Choose 3)

Accepted Answer

Encoding categorical variables. Most machine learning algorithms require numerical input, so categorical data (e.g., colors, countries) must be converted to numeric form with techniques such as one-hot encoding or label encoding. One-hot encoding is usually preferred for nominal categories, because a model can misread integer label codes as an ordering.

Answer

Hyperparameter tuning

Answer

Model evaluation
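
Sketch for Question 2: one-hot encoding, one of the techniques named in the accepted answer, can be illustrated in a few lines. The `colors` column and the function name are hypothetical.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column of this value's category
        rows.append(row)
    return categories, rows

colors = ["red", "green", "blue", "green"]  # hypothetical categorical column
categories, encoded = one_hot_encode(colors)
# categories -> ['blue', 'green', 'red']
# encoded    -> [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each category gets its own column, so no artificial ordering between "red", "green", and "blue" is implied.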

Question 3

An e-commerce company uses an AI system to set dynamic prices for products. A customer complains that the price they see is higher than the price shown to a friend for the same product at the same time. The company wants to ensure pricing fairness. Which ethical principle should guide the redesign of the pricing algorithm?

Accepted Answer

Transparency and explainability. The core issue is that the customer cannot understand why the AI system set a different price for them than for their friend. Redesigning the algorithm to give clear, understandable reasons for price variations (such as demand, purchase history, or time of day) directly addresses this lack of visibility. Opening the system's decision-making process to scrutiny is essential for building trust and resolving fairness complaints in dynamic pricing models.

Answer

Privacy by design

Answer

Accountability

Answer

Beneficence

Question 4

An AI system used for autonomous driving is found to have a lower accuracy in detecting pedestrians with darker skin tones. The development team wants to address this ethical issue. Which action is most effective?

Accepted Answer

Augment the training dataset with more images of pedestrians with darker skin. This directly addresses the root cause of the bias: underrepresentation in the training data. With a more balanced and diverse dataset, the model can learn more robust features for all skin tones, which reduces the accuracy disparity without altering the algorithm's core logic or introducing arbitrary thresholds.

Answer

Conduct additional testing to measure the disparity

Answer

Replace the object detection algorithm with a different one

Answer

Adjust the model's decision threshold for pedestrian detection

Question 5

In the AI lifecycle, which phase involves splitting data into training, validation, and test sets?

Accepted Answer

Data preprocessing. Data preprocessing is the phase where raw data is cleaned, transformed, and prepared for modeling. Splitting the dataset into training, validation, and test sets is a critical step during this phase to ensure unbiased evaluation and prevent data leakage. This split occurs before any model training begins, making it part of preprocessing rather than training or evaluation.

Answer

Model training

Answer

Data collection

Answer

Model evaluation
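
Sketch for Question 5: a three-way split can be written as below. The 70/15/15 ratio, the seed, and the function name are assumptions chosen for illustration; the question does not prescribe them.

```python
import random

def three_way_split(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle rows once, then cut them into train, validation, and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = three_way_split(range(1000))
```

The validation set is typically used for tuning decisions during development, while the test set is held back for a single final evaluation.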

Question 6

A startup is building a chatbot for customer service. They have 500 recorded conversations and want to use a pre-trained language model to generate responses. However, they have limited computational resources and need the chatbot to respond in real-time. They are considering fine-tuning a large model like GPT-3 or using a smaller model like DistilBERT. The conversation data contains industry-specific jargon. Which approach should they take?

Accepted Answer

Fine-tune DistilBERT on the conversation data. Fine-tuning DistilBERT on the 500 recorded conversations lets the model adapt to industry-specific jargon, and its smaller size keeps inference fast enough for real-time use. DistilBERT is a distilled version of BERT that retains about 97% of BERT’s language understanding with roughly 40% fewer parameters, which suits limited computational resources. Fine-tuning on domain-specific data matters here because a general-purpose pre-trained model such as GPT-3 has had no exposure to the startup’s specialized terminology.

Answer

Use GPT-3 via API without fine-tuning

Answer

Train a custom RNN from scratch on the conversations

Answer

Implement a rule-based system with keywords

Question 7

A data scientist is preparing a dataset for supervised learning. Which TWO steps are essential?

Accepted Answer

Labeling the data. Labeling the data is essential for supervised learning because the algorithm requires input-output pairs to learn a mapping function. Without labeled data, the model cannot be trained to predict outcomes, as supervised learning relies on ground-truth targets for error correction during training.

Answer

One-hot encoding all features

Answer

Normalizing features

Answer

Removing outliers

Question 8

A company wants to create an AI system that can identify objects in images. They have a large dataset of labeled images. Which type of neural network architecture is most suitable?

Accepted Answer

Convolutional neural network (CNN). Convolutional neural networks (CNNs) are specifically designed to process grid-like data such as images. They use convolutional layers to automatically learn spatial hierarchies of features (edges, textures, objects) from pixel data, making them the most suitable architecture for image classification tasks with labeled datasets.

Answer

Transformer

Answer

Generative adversarial network (GAN)

Answer

Recurrent neural network (RNN)
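
Sketch for Question 8: the core operation of a convolutional layer is a small kernel slid across the image. The toy image and the hand-picked kernel below are assumptions for illustration; a real CNN learns its kernel weights during training and stacks many such layers.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Toy 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge kernel (hypothetical; normally learned).
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, kernel)
# feature_map -> [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map responds only in the column where dark meets bright, which is the kind of low-level edge feature the accepted answer refers to.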

Question 9

A financial services company is developing an AI model to detect fraudulent transactions. The dataset contains 99.9% legitimate transactions and 0.1% fraudulent ones. Which technique should the data scientist use to address the class imbalance problem?

Accepted Answer

Apply Synthetic Minority Oversampling Technique (SMOTE). SMOTE generates synthetic examples of the minority class (fraudulent transactions) by interpolating between existing minority instances rather than duplicating them. This raises the effective share of the 0.1% fraud class without discarding legitimate transactions, and it is less prone to overfitting than simple duplication, which makes it a standard technique for imbalanced classification problems such as financial fraud detection.

Answer

Use a bagging ensemble method

Answer

Undersample the legitimate transactions

Answer

Use cost-sensitive learning with higher weight on fraudulent class
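
Sketch for Question 9: the interpolation idea behind SMOTE can be shown with a simplified pure-Python version. This is not a library implementation; the function name, the toy `fraud` points, and the choice of `k` are assumptions. In practice such oversampling is applied to the training split only.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority samples and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two numeric features each.
fraud = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_sketch(fraud, n_new=6)
```

Every synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies.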

Question 10

Based on the exhibit, which action is most likely to resolve the memory issue?

Accepted Answer

Reduce the batch size. The exhibit shows an out-of-memory (OOM) error during training. Reducing the batch size decreases the memory footprint per iteration, allowing the model to fit within available GPU memory. This directly resolves the memory issue without altering the model architecture or data.

Answer

Add more training data.

Answer

Increase the learning rate.

Answer

Switch to a CPU.
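
Sketch for Question 10: a rough back-of-the-envelope estimate shows why a smaller batch helps. The exhibit is not reproduced here, so every number below is hypothetical, and the estimate covers activation memory only (weights, gradients, and optimizer state do not shrink with batch size).

```python
def activation_memory_mib(batch_size, activations_per_sample, bytes_per_value=4):
    """Approximate activation memory in MiB; it grows linearly with batch size."""
    return batch_size * activations_per_sample * bytes_per_value / 2**20

# Hypothetical model storing 25 million float32 activations per sample.
ACTIVATIONS_PER_SAMPLE = 25_000_000

large_batch = activation_memory_mib(64, ACTIVATIONS_PER_SAMPLE)
small_batch = activation_memory_mib(16, ACTIVATIONS_PER_SAMPLE)
savings_factor = large_batch / small_batch  # 4.0: a quarter of the batch, a quarter of the memory
```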

Question 11

A company deploys an AI model via a REST API that handles sensitive customer data. To secure the endpoint, the security team requires that only authenticated and authorized applications can invoke the API. Which mechanism should be implemented?

Accepted Answer

API key or bearer token in the HTTP header. API keys and bearer tokens (e.g., OAuth 2.0 access tokens) are the standard mechanism for authenticating and authorizing client applications that invoke a REST API. Bearer tokens are passed in the HTTP Authorization header, and API keys are often sent in a dedicated header; either way the server can verify the client's identity and permissions before processing requests that contain sensitive customer data.

Answer

TLS encryption for the connection

Answer

Input sanitization to prevent injection

Answer

IP whitelisting
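
Sketch for Question 11: server-side checking of a bearer token might look like this. The in-memory `VALID_TOKENS` store, the token value, and the `predict` scope are hypothetical; a production service would more likely validate signed, expiring tokens from an identity provider, and would do so over TLS.

```python
import hmac

# Hypothetical token store: token -> set of permitted scopes.
VALID_TOKENS = {"s3cr3t-token-abc": {"predict"}}

def authorize(headers, required_scope):
    """Return an HTTP-style status code for the given request headers."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 401  # missing or malformed credentials
    for known, scopes in VALID_TOKENS.items():
        # compare_digest avoids leaking information through timing differences
        if hmac.compare_digest(token.encode(), known.encode()):
            # authenticated; now check authorization for this operation
            return 200 if required_scope in scopes else 403
    return 401  # unknown token

status = authorize({"Authorization": "Bearer s3cr3t-token-abc"}, "predict")
```

Returning 401 for unknown callers and 403 for known callers without the required scope separates authentication from authorization, the two requirements stated in the question.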

Question 12

During an AI model deployment, the operations team notices that inference requests are taking longer than expected. Which component is most likely causing the bottleneck?

Accepted Answer

The machine learning model's size and architecture. The machine learning model's size and architecture directly determine the computational complexity of inference. Larger models with more parameters or deeper architectures require more matrix multiplications and memory bandwidth, which increases latency per request. This is the most common bottleneck in AI deployment because the model itself is the core computation unit, and its inference time scales with its complexity.

Answer

Input data preprocessing pipeline

Answer

API gateway rate limiting

Answer

Database connection pool size

Question 13

During model monitoring, a loan approval model shows disparate impact against a protected group. The model's overall accuracy is high, but the false positive rate for the protected group is 0.12 compared to 0.02 for other groups. Which action should the operations team take first?

Accepted Answer

Retrain the model with reweighted training data to minimize disparity. Retraining with reweighted training data targets a common root cause of disparate impact, biased or unrepresentative historical data, by assigning higher weights to underrepresented groups during training. This technique, often implemented via cost-sensitive learning or sample reweighting, shifts the model's decision boundaries to reduce the gap in false positive rates, ideally with little loss of overall accuracy. The operations team should first attempt to mitigate bias at the data level before considering threshold adjustments or model replacement, because reweighting keeps the existing model and training pipeline in place while promoting fairness.

Answer

Document the disparity and proceed with deployment because accuracy is high

Answer

Replace the model with a simpler model that is less discriminatory

Answer

Adjust the decision threshold for the protected group to equalize false positive rates
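
Sketch for Question 13: the disparity quoted in the question can be measured by computing the false positive rate separately for each group. The data below is synthetic and constructed to reproduce the 0.12 and 0.02 figures; the group labels and function names are assumptions.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = false positives / all actual negatives (label 0)."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    negatives = sum(1 for t in y_true if t == 0)
    return false_pos / negatives if negatives else 0.0

def fpr_by_group(y_true, y_pred, groups):
    """Compute the false positive rate separately for each group label."""
    rates = {}
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        rates[group] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return rates

# Synthetic example: 50 actual negatives per group.
y_true = [0] * 100
y_pred = [1] * 6 + [0] * 44 + [1] * 1 + [0] * 49
groups = ["protected"] * 50 + ["other"] * 50

rates = fpr_by_group(y_true, y_pred, groups)
disparity = rates["protected"] - rates["other"]
```

Tracking this gap before and after retraining gives the team a concrete check on whether the reweighting reduced the disparity.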

Question 14

A healthcare company must deploy a diagnostic AI model that uses protected health information (PHI). To comply with HIPAA, the operations team needs to ensure data privacy during model inference. Which practice should be implemented?

Accepted Answer

Encrypt all PHI at rest and in transit within the inference pipeline. The HIPAA Security Rule lists encryption of electronic PHI, both at rest and in transit, among its technical safeguards (as addressable specifications), and it is the expected way to protect data confidentiality during model inference. Encrypting the entire inference pipeline ensures that even if data is intercepted or accessed without authorization, it remains unreadable. This practice directly addresses the compliance requirement for data privacy without relying on network location or partial obfuscation.

Answer

Run the model on-premises to avoid cloud data transmission

Answer

Mask sensitive fields in the input data before inference

Answer

Apply differential privacy during model training only

Question 15

A model trained on a dataset with imbalanced classes achieves 98% accuracy but only 50% recall for the minority class. Which technique should be applied first to address the imbalance?

Accepted Answer

Apply cost-sensitive learning. Cost-sensitive learning directly modifies the model's loss function to penalize misclassifications of the minority class more heavily than those of the majority class. This approach addresses the root cause of the imbalance—the model's bias toward the majority class—without altering the dataset distribution, making it the most immediate and effective first step.

Answer

Reduce the majority class size

Answer

Use SMOTE to generate synthetic samples

Answer

Collect more data for the minority class
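
Sketch for Question 15: cost-sensitive learning can be illustrated with a class-weighted log loss. The inverse-frequency weighting below is one common heuristic, not the only option; the 98/2 label split and the constant prediction of 0.02 are hypothetical.

```python
import math

def class_weights(labels):
    """Weight each class inversely to its frequency."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}

def weighted_log_loss(labels, probs, weights):
    """Binary cross-entropy in which each sample counts by its class weight."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / sum(weights[y] for y in labels)

labels = [0] * 98 + [1] * 2   # hypothetical imbalanced training labels
probs = [0.02] * 100          # a model that always predicts "almost surely 0"

weights = class_weights(labels)
plain = weighted_log_loss(labels, probs, {0: 1.0, 1: 1.0})
weighted = weighted_log_loss(labels, probs, weights)
```

With the class weights applied, the always-majority prediction is penalized far more heavily than under the unweighted loss, which pushes training toward better minority-class recall.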

CompTIA AI+ AI0-001 practice test

