AI0-001 · topic practice

AI Models and Data Engineering practice questions

Practise CompTIA AI+ (AI0-001) AI Models and Data Engineering questions: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: AI Models and Data Engineering

What the exam tests

What to know about AI Models and Data Engineering

AI Models and Data Engineering questions test whether you can apply modelling and data engineering concepts in context, not just recognise a definition. Focus on:

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common AI Models and Data Engineering exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

AI Models and Data Engineering questions

20 questions · select your answer, then reveal the explanation

A data scientist is preparing a dataset for training a classification model. The dataset contains 10,000 records with a binary target variable where 9,500 belong to class A and 500 belong to class B. Which technique should the scientist use to address the class imbalance?
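One widely used option is to weight each class inversely to its frequency, so that errors on the rare class cost more during training. The formula below matches the one scikit-learn documents for class_weight='balanced'. The other common family is resampling: oversampling the minority class (for example with SMOTE) or undersampling the majority. The sketch uses only the standard library and the scenario's counts:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Labels matching the scenario: 9,500 records of class A and 500 of class B.
labels = ["A"] * 9500 + ["B"] * 500
weights = balanced_class_weights(labels)
print(weights)  # class B ends up weighted 19x more heavily than class A
```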

An engineer is building a regression model to predict housing prices. The dataset includes features such as square footage, number of bedrooms, and year built. The engineer notices that the square footage values range from 500 to 10,000, while the number of bedrooms ranges from 1 to 5. Which preprocessing step is most critical before training a gradient descent-based model?
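Gradient descent uses one learning rate for every weight. A feature spanning 500 to 10,000 can therefore dominate one spanning 1 to 5 and slow or destabilise convergence. The minimal sketch below shows two common rescalings, min-max scaling and z-score standardization, using only the standard library. In a real pipeline, the scaling statistics would be fitted on the training split only and then reused for validation and test data.

```python
import statistics


def min_max_scale(values):
    """Linearly rescale values into [0, 1] using the column's min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale, avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Toy columns spanning the ranges mentioned in the scenario.
square_feet = [500, 5250, 10000]
bedrooms = [1, 3, 5]

print(min_max_scale(square_feet))  # [0.0, 0.5, 1.0]
print(min_max_scale(bedrooms))     # [0.0, 0.5, 1.0]
print(standardize(bedrooms))
```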

A machine learning team is deploying a sentiment analysis model for customer reviews. The model was trained on reviews from an e-commerce site but will be used for a social media platform. The team observes a drop in accuracy. Which concept best explains this issue?
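One rough way to check whether production text resembles the training text is to measure how much of the new platform's vocabulary the training vocabulary already covers. The sample sentences and the whitespace tokenization below are invented for illustration. Real monitoring would compare richer statistics, such as embedding or prediction distributions.

```python
def vocabulary_coverage(source_texts, target_texts):
    """Share of target-domain tokens that already appear in the source vocabulary."""
    source_vocab = {tok for text in source_texts for tok in text.lower().split()}
    target_tokens = [tok for text in target_texts for tok in text.lower().split()]
    if not target_tokens:
        return 1.0
    seen = sum(1 for tok in target_tokens if tok in source_vocab)
    return seen / len(target_tokens)


ecommerce_reviews = ["great product fast shipping", "product arrived broken"]
social_posts = ["lol this product slaps", "shipping took forever fr"]
print(vocabulary_coverage(ecommerce_reviews, social_posts))  # 0.25
```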

A data engineer needs to design a data pipeline for a real-time fraud detection system. The system requires low-latency processing of streaming transactions. Which architecture is most appropriate?
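Low-latency fraud detection usually processes each transaction as it arrives (stream processing) rather than in scheduled batches. The toy sketch below shows the per-event, stateful style of computation involved. The velocity rule, its thresholds and the class name are hypothetical. A production system would run such logic inside a stream processor with durable, partitioned state.

```python
from collections import defaultdict, deque


class VelocityCheck:
    """Hypothetical rule: flag a card with more than max_txns transactions in window_s seconds.

    Assumes events arrive in timestamp order.
    """

    def __init__(self, window_s=60, max_txns=3):
        self.window_s = window_s
        self.max_txns = max_txns
        self.recent = defaultdict(deque)  # card_id -> timestamps still inside the window

    def process(self, card_id, ts):
        """Update state for one event and return True if it should be flagged."""
        window = self.recent[card_id]
        window.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > self.window_s:
            window.popleft()
        return len(window) > self.max_txns


events = [("c1", 0), ("c1", 10), ("c2", 15), ("c1", 20), ("c1", 30), ("c1", 100)]
checker = VelocityCheck(window_s=60, max_txns=3)
flags = [checker.process(card, ts) for card, ts in events]
print(flags)  # [False, False, False, False, True, False]
```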

A team is training a deep learning model for image classification. The training loss decreases rapidly but validation loss starts increasing after a few epochs. Which regularization technique should be applied to mitigate this issue?
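Rising validation loss while training loss keeps falling is the typical overfitting pattern. Common responses include dropout, L2 weight decay, data augmentation and early stopping. As one illustration, the sketch below shows how an L2 penalty adds to the loss and shrinks the weights during each gradient step. The numbers are arbitrary.

```python
def l2_penalized_loss(data_loss, weights, lam):
    """Total loss = data loss + lam * sum of squared weights."""
    return data_loss + lam * sum(w * w for w in weights)


def l2_gradient_step(weights, grads, lr, lam):
    """One SGD step on the penalized loss; the extra 2 * lam * w term pulls each weight towards zero."""
    return [w - lr * (g + 2 * lam * w) for w, g in zip(weights, grads)]


print(l2_penalized_loss(0.5, [1.0, 2.0], lam=0.1))  # 0.5 + 0.1 * (1 + 4) = 1.0
print(l2_gradient_step([1.0, -2.0], [0.0, 0.0], lr=0.1, lam=0.5))  # weights shrink towards zero
```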

An organization needs to store sensitive customer data for training a machine learning model. The data must be encrypted at rest and in transit, and access must be audited. Which combination of practices should be implemented?

A data analyst is cleaning a dataset and finds that 20% of the values for the 'age' column are missing. Which imputation method is most robust if the data is not normally distributed?
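The median is far less sensitive than the mean to skew and outliers. That is why it is often preferred for imputing non-normal numeric columns. Here is a minimal sketch on made-up ages, with None marking missing values:

```python
import statistics


def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


ages = [22, None, 31, 29, 85, None, 40]
filled = median_impute(ages)
print(filled)  # [22, 31, 31, 29, 85, 31, 40]
print(statistics.fmean([a for a in ages if a is not None]))  # 41.4, pulled upward by the 85
```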

Which TWO techniques are commonly used for feature selection in machine learning? (Choose 2)
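Feature selection methods are often grouped into three families:

  • Filter methods, such as variance thresholds and correlation or mutual-information scores.
  • Wrapper methods, such as recursive feature elimination.
  • Embedded methods, such as L1 regularization.

The sketch below applies two simple filters to invented data. It drops constant features, then ranks the rest by absolute Pearson correlation with the target.

```python
import statistics


def variance_filter(features, threshold=0.0):
    """Keep feature names whose population variance exceeds the threshold."""
    return [name for name, col in features.items() if statistics.pvariance(col) > threshold]


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither column is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


features = {
    "constant": [1, 1, 1, 1],
    "sqft": [1, 2, 3, 4],
    "noise": [2, 1, 2, 1],
}
target = [10, 20, 30, 40]

kept = variance_filter(features)  # "constant" is dropped
ranking = sorted(kept, key=lambda n: abs(pearson(features[n], target)), reverse=True)
print(kept, ranking)
```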

Which THREE are common data preprocessing steps in a machine learning pipeline? (Choose 3)
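Typical steps include handling missing values, scaling numeric features and encoding categorical ones. As an example of the last, here is a one-hot encoder with a separate fit and transform step, so categories are learned from training data only. Mapping unseen categories to an all-zero row is an assumed policy, not the only option.

```python
def fit_one_hot(train_values):
    """Learn a sorted list of categories from the training data."""
    return sorted(set(train_values))


def transform_one_hot(values, categories):
    """Encode each value as a 0/1 vector; unseen categories become all zeros."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        if v in index:
            row[index[v]] = 1
        rows.append(row)
    return rows


categories = fit_one_hot(["red", "blue", "red", "green"])
encoded = transform_one_hot(["red", "blue", "purple"], categories)
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 0, 0]]
```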

Which TWO are best practices for versioning machine learning models? (Choose 2)
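Common practices include:

  • Tagging each model with a unique version.
  • Recording the exact training-data version, code and hyperparameters alongside it.
  • Storing artifacts in a model registry.

The record layout below is hypothetical, not any specific tool's schema. It pins a version to SHA-256 hashes of the model file and the training data, so a change to either is detectable.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_model_record(version, model_bytes, training_data_bytes, hyperparams):
    """Hypothetical registry entry linking a model version to its exact inputs."""
    return {
        "version": version,
        "model_sha256": sha256_hex(model_bytes),
        "data_sha256": sha256_hex(training_data_bytes),
        "hyperparams": hyperparams,
    }


record = make_model_record("1.2.0", b"fake-model-weights", b"fake-training-csv", {"lr": 0.01, "epochs": 10})
print(json.dumps(record, indent=2, sort_keys=True))
```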

An engineer is training a neural network and observes the output shown. Which conclusion is most likely correct?

Exhibit

Refer to the exhibit.

```
Epoch 1/10 - loss: 0.6932 - accuracy: 0.5234 - val_loss: 0.6918 - val_accuracy: 0.5312
Epoch 2/10 - loss: 0.4231 - accuracy: 0.8047 - val_loss: 0.5234 - val_accuracy: 0.7422
Epoch 3/10 - loss: 0.3125 - accuracy: 0.8828 - val_loss: 0.6015 - val_accuracy: 0.7344
Epoch 4/10 - loss: 0.2146 - accuracy: 0.9219 - val_loss: 0.7234 - val_accuracy: 0.7188
Epoch 5/10 - loss: 0.1478 - accuracy: 0.9531 - val_loss: 0.8342 - val_accuracy: 0.7031
```
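The trend in the exhibit is easier to see once the numbers are lined up. This short script parses the five log lines shown and finds the epoch with the lowest val_loss. It then prints the gap between validation and training loss for each epoch.

```python
import re

# The five epochs shown in the exhibit.
LOG = """\
Epoch 1/10 - loss: 0.6932 - accuracy: 0.5234 - val_loss: 0.6918 - val_accuracy: 0.5312
Epoch 2/10 - loss: 0.4231 - accuracy: 0.8047 - val_loss: 0.5234 - val_accuracy: 0.7422
Epoch 3/10 - loss: 0.3125 - accuracy: 0.8828 - val_loss: 0.6015 - val_accuracy: 0.7344
Epoch 4/10 - loss: 0.2146 - accuracy: 0.9219 - val_loss: 0.7234 - val_accuracy: 0.7188
Epoch 5/10 - loss: 0.1478 - accuracy: 0.9531 - val_loss: 0.8342 - val_accuracy: 0.7031
"""

PATTERN = re.compile(r"Epoch (\d+)/\d+ - loss: ([\d.]+) - accuracy: [\d.]+ - val_loss: ([\d.]+)")

# (epoch, training loss, validation loss) for each matching line.
rows = [(int(m.group(1)), float(m.group(2)), float(m.group(3))) for m in PATTERN.finditer(LOG)]

best_epoch = min(rows, key=lambda r: r[2])[0]
print(f"lowest val_loss at epoch {best_epoch}")  # epoch 2
for epoch, loss, val_loss in rows:
    print(f"epoch {epoch}: val_loss - loss = {val_loss - loss:+.4f}")
```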

A data engineer is reviewing an S3 bucket policy for a machine learning project. The policy is intended to allow access to training data only from the corporate network (10.0.0.0/16). However, users in the corporate network report access denied. Which issue is most likely causing the problem?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::ml-training-data/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "10.0.0.0/16"
        }
      }
    }
  ]
}
```
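The IpAddress condition compares aws:SourceIp, the source address AWS sees for the request, with the CIDR block. When corporate traffic reaches S3 through a NAT gateway or proxy, that address is normally the public egress IP rather than a 10.0.x.x address. For requests that arrive through a VPC endpoint, AWS documents VPC-specific keys such as aws:VpcSourceIp and aws:SourceVpce. The membership test itself can be sketched with Python's ipaddress module. The addresses below are illustrative; 198.51.100.0/24 is a documentation range.

```python
import ipaddress


def matches_source_ip(request_ip: str, cidr: str) -> bool:
    """True if request_ip lies inside the CIDR block, the check an IpAddress condition performs."""
    return ipaddress.ip_address(request_ip) in ipaddress.ip_network(cidr)


POLICY_CIDR = "10.0.0.0/16"
print(matches_source_ip("10.0.42.7", POLICY_CIDR))      # True: internal address inside the block
print(matches_source_ip("198.51.100.20", POLICY_CIDR))  # False: e.g. a public NAT egress address
```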

A data scientist is training a deep learning model for image classification. The training loss decreases steadily but the validation loss starts increasing after 10 epochs. Which technique should the scientist apply to address this issue?
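Early stopping watches the validation loss and stops once it has failed to improve for a set number of epochs (the patience). It then keeps the weights from the best epoch. The function below only works out which epochs those are. The patience and min_delta names echo the parameters of common framework callbacks such as Keras EarlyStopping, but this is a standalone sketch.

```python
def early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-based, for patience-based early stopping."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0  # improvement: remember it, reset counter
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch  # stop here, restore weights from best_epoch
    return best_epoch, len(val_losses)


print(early_stopping([0.9, 0.7, 0.6, 0.62, 0.65, 0.7], patience=2))  # (3, 5)
```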

A financial institution is building a fraud detection system using a supervised learning model. The dataset is highly imbalanced with 99.9% legitimate transactions and 0.1% fraudulent ones. Which approach would be MOST effective to train the model to detect fraud?
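With a 99.9/0.1 split, plain accuracy hides the problem. A model that never predicts fraud is 99.9% accurate yet catches nothing. This is why imbalance-aware training (resampling, class weights or an anomaly-detection framing) is usually judged with recall, precision or PR-AUC instead. Here is a quick check using the scenario's ratio:

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall for the positive class)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    recall = tp / actual_pos if actual_pos else 0.0
    return correct / len(y_true), recall


# 10,000 transactions, 0.1% fraud (label 1).
y_true = [1] * 10 + [0] * 9990
always_legit = [0] * len(y_true)
acc, rec = accuracy_and_recall(y_true, always_legit)
print(f"accuracy={acc:.3f} recall={rec:.3f}")  # accuracy=0.999 recall=0.000
```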

A company wants to deploy an AI model for real-time inference on edge devices with limited computational resources. Which model architecture would be MOST suitable?
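Compact architectures built for mobile and edge hardware (MobileNet-style networks, for example) are often paired with post-training quantization. Quantization stores weights as 8-bit integers instead of 32-bit floats, cutting weight storage roughly fourfold. The pure-Python sketch below shows symmetric per-tensor int8 quantization. Real toolchains such as TensorFlow Lite add refinements like calibration data and per-channel scales.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: int8 codes plus a single float scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.42, -1.3, 0.07, 0.9]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes)
print(f"{4 * len(weights)} bytes as float32 vs {len(codes)} bytes as int8 (plus one scale)")
print(max(abs(w - r) for w, r in zip(weights, restored)))  # at most about scale / 2
```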

A data engineer is designing a pipeline for a streaming data application that uses a machine learning model to detect anomalies in real time. Which TWO practices should the engineer implement to ensure data quality and model reliability?
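One common data-quality practice in streaming pipelines is to validate every record against an expected schema before it reaches the model. Records that fail are quarantined rather than scored. Monitoring input and prediction distributions for drift is another common practice. The schema and field names below are hypothetical.

```python
import math

# Hypothetical schema for a sensor-style stream: field -> (type, min, max).
SCHEMA = {
    "device_id": (str, None, None),
    "temperature_c": (float, -40.0, 125.0),
    "timestamp": (int, 0, None),
}


def validate_record(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, (ftype, lo, hi) in SCHEMA.items():
        if field not in record:
            problems.append(f"missing {field}")
            continue
        value = record[field]
        if not isinstance(value, ftype):
            problems.append(f"{field} has type {type(value).__name__}")
            continue
        if isinstance(value, float) and math.isnan(value):  # NaN slips past range checks
            problems.append(f"{field} is NaN")
            continue
        if lo is not None and value < lo:
            problems.append(f"{field} below {lo}")
        if hi is not None and value > hi:
            problems.append(f"{field} above {hi}")
    return problems


print(validate_record({"device_id": "d1", "temperature_c": 300.0}))
```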

A team is developing a natural language processing model to classify customer feedback. The dataset contains text in multiple languages. Which THREE preprocessing steps are essential to ensure the model performs well across all languages?
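Steps that commonly matter across languages include consistent Unicode normalization, language identification and a tokenizer that covers every script involved (for example a multilingual subword tokenizer). The standard-library sketch covers only the first: NFKC normalization plus str.casefold. Case folding handles cases such as German ß that str.lower leaves unchanged. Whether case folding suits a given model and language set is a judgment call.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Script-agnostic cleanup: NFKC normalization, case folding, whitespace collapsing."""
    text = unicodedata.normalize("NFKC", text)  # unify composed/decomposed and width variants
    text = text.casefold()                      # more aggressive than lower(), e.g. ß -> ss
    return " ".join(text.split())               # collapse runs of whitespace


samples = ["Cafe\u0301", "Straße", "\uff28\uff45\uff4c\uff4c\uff4f  WORLD"]
print([normalize_text(s) for s in samples])  # ['café', 'strasse', 'hello world']
```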

A large e-commerce company uses a recommendation system based on collaborative filtering. The system uses a matrix factorization model that is trained nightly on the entire user-item interaction history. Recently, the company launched a flash sale with thousands of new products. Users are reporting that the recommendations are not showing the new products, even for users who have purchased them during the sale. The data engineering team notices that the new products have very few interactions in the training data. The model's loss on the validation set has increased, and the recall@10 metric has dropped from 0.45 to 0.32. The team needs to improve the recommendation of new items without retraining the entire model from scratch every hour. Which approach should the team take?
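A widely used way to give new items usable factors without retraining the whole model is fold-in. The trained user factors are held fixed, and a small regularized least-squares problem is solved for each new item from its few interactions. This is the same per-item update that alternating least squares performs. Items with no interactions at all usually need content features as well (a hybrid approach). The sketch below makes two simplifying assumptions: made-up two-dimensional factors, and each purchase treated as a rating of 1.0.

```python
def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fold_in_item(user_factors, interactions, lam=0.1):
    """Item vector minimizing sum (r - u.v)^2 + lam * |v|^2 with user factors held fixed."""
    k = len(next(iter(user_factors.values())))
    a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]  # lam * I
    b = [0.0] * k
    for user, rating in interactions.items():
        u = user_factors[user]
        for i in range(k):
            b[i] += rating * u[i]
            for j in range(k):
                a[i][j] += u[i] * u[j]
    return solve(a, b)


def score(user_vec, item_vec):
    return sum(x * y for x, y in zip(user_vec, item_vec))


user_factors = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [0.8, 0.2]}
# The new flash-sale item was bought by u1 and u3.
item_vec = fold_in_item(user_factors, {"u1": 1.0, "u3": 1.0}, lam=0.1)
for user, vec in user_factors.items():
    print(user, round(score(vec, item_vec), 3))
```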

A healthcare startup is developing a deep learning model to detect diabetic retinopathy from retinal images. The model is trained on a dataset of 10,000 labeled images. During initial testing, the model achieves 99% accuracy on the training set but only 85% on the test set. The startup wants to deploy the model in a clinical setting where false negatives (missing a disease) are critical. The team has access to additional unlabeled retinal images from multiple sources. Which strategy should the team use to improve the model's generalization and reduce false negatives?
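The unlabeled images point towards semi-supervised learning. A simple variant is pseudo-labeling: the current model labels only the unlabeled images it is very confident about, and those are added to the training set. Separately, because missed cases are costly, the decision threshold can be tuned on labeled validation data to reach a target recall. Both functions below are sketches, and the 0.95/0.05 cut-offs are hypothetical.

```python
def select_pseudo_labels(probs, high=0.95, low=0.05):
    """Return (index, pseudo_label) for unlabeled images the model is confident about."""
    selected = []
    for i, p in enumerate(probs):
        if p >= high:
            selected.append((i, 1))  # confidently diseased
        elif p <= low:
            selected.append((i, 0))  # confidently healthy
    return selected


def pick_threshold_for_recall(probs, labels, target_recall):
    """Highest threshold whose recall on labeled validation data reaches target_recall."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("validation set needs at least one positive example")
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        if tp / positives >= target_recall:
            return t
    return min(probs)


print(select_pseudo_labels([0.99, 0.5, 0.02, 0.96]))  # [(0, 1), (2, 0), (3, 1)]
val_probs = [0.9, 0.8, 0.4, 0.3, 0.2]
val_labels = [1, 1, 1, 0, 0]
print(pick_threshold_for_recall(val_probs, val_labels, target_recall=1.0))  # 0.4
```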

A data scientist is preparing a dataset for a classification model. The dataset contains a column "Age" with 10% missing values and a column "Income" with 30% missing values. Which imputation strategy is MOST appropriate to minimize bias?
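With 30% of Income missing, a single fill value can blur real differences. A common complement is a binary indicator recording which values were imputed, so the model can learn whether missingness itself carries signal. Model-based approaches such as KNN or iterative imputation are other options. Here is a minimal sketch on invented incomes; in practice the fill value would come from the training split only.

```python
import statistics


def impute_with_indicator(values):
    """Median-fill missing entries and return a parallel 0/1 'was missing' column."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    filled = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return filled, indicator


# Invented incomes with 3 of 10 values (30%) missing.
income = [42000, None, 58000, None, 61000, 39000, None, 75000, 50000, 47000]
filled, indicator = impute_with_indicator(income)
print(filled)
print(indicator)  # [0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
```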

Focused AI Models and Data Engineering sessions

Start an AI Models and Data Engineering-only practice session

Every question in these sessions is drawn from the AI Models and Data Engineering domain — nothing else.

Related practice questions

Related AI0-001 topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the AI0-001 exam test about AI Models and Data Engineering?
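They test whether you can apply modelling and data engineering concepts, such as class imbalance, feature scaling, overfitting, missing-value imputation and pipeline design, to realistic scenarios rather than just recognise a definition.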
How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong — this active recall approach builds retention far faster than re-reading notes.
Can I practise just AI Models and Data Engineering questions in a focused session?
Yes — the session launcher on this page draws every question from the AI Models and Data Engineering domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other AI0-001 topics?
Use the topic links above to move to related areas, or go back to the AI0-001 question bank to see all topics.
Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the AI0-001 exam covers. They are not copied from any real exam or dump site.