How should I use these AI Models and Data Engineering practice questions?

Read each scenario carefully and choose your answer before revealing the explanation. Then check why your choice was right or wrong. Repeat until the reasoning feels automatic.

Can I practise just AI Models and Data Engineering questions in a focused session?

Yes — use the session launcher on this page to start a 10-, 20-, 30- or 50-question session drawn entirely from the AI Models and Data Engineering domain.

AI0-001 · topic practice

AI Models and Data Engineering practice questions

Practise CompTIA AI+ AI0-001 AI Models and Data Engineering practice questions — original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations — not memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security

20 questions · Domain: AI Models and Data Engineering

What the exam tests

What to know about AI Models and Data Engineering

AI Models and Data Engineering questions test whether you can apply the concept in context, not just recognise a definition.

How the topic appears in realistic exam-style scenarios.

Which detail in the question changes the correct answer.

How to eliminate plausible but wrong options.

How to connect the question back to the wider exam objective.

Watch out for

Common AI Models and Data Engineering exam traps

▸Answering from memory before reading the full scenario.
▸Missing a constraint such as cost, availability, security, scope or command context.
▸Choosing a broad answer when the question asks for the most specific fix.
▸Ignoring why the wrong options are tempting.

Practice set

AI Models and Data Engineering questions

20 questions · select your answer, then reveal the explanation

Question 1 · easy · multiple choice

A data scientist is preparing a dataset for training a classification model. The dataset contains 10,000 records with a binary target variable where 9,500 belong to class A and 500 belong to class B. Which technique should the scientist use to address the class imbalance?

Trap 1: Random undersampling of class A

Undersampling reduces data and may lose important patterns.

Trap 2: Adding Gaussian noise to class B

Adding noise does not create new informative samples.

Trap 3: Principal Component Analysis (PCA)

PCA reduces the number of features; it does not address class imbalance.

A. SMOTE (Synthetic Minority Oversampling Technique)
Correct: SMOTE creates synthetic minority-class samples to balance the classes.
B. Random undersampling of class A
Why wrong: Undersampling discards majority-class records and may lose important patterns.
C. Adding Gaussian noise to class B
Why wrong: Adding noise does not create new informative samples.
D. Principal Component Analysis (PCA)
Why wrong: PCA reduces the number of features; it does not address class imbalance.
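
To make the idea behind option A concrete, here is a minimal sketch of SMOTE-style interpolation using only the Python standard library. It is an illustration under stated assumptions, not a production implementation: the function name `smote_sketch`, the two-feature toy data, and the default `k=5` are all hypothetical, and a real project would more likely rely on a maintained library.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical toy data with the same 19:1 ratio as the question (950 vs 50).
data_rng = random.Random(42)
class_a = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(950)]
class_b = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(50)]

new_b = smote_sketch(class_b, n_new=len(class_a) - len(class_b))
balanced_b = class_b + new_b
print(len(class_a), len(balanced_b))  # 950 950
```

Each synthetic point lies on a line segment between two real minority points, which is why SMOTE adds plausible new samples, whereas adding random noise (option C) does not follow the structure of the minority class.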



A machine learning team is deploying a sentiment analysis model for customer reviews. The model was trained on reviews from an e-commerce site but will be used for a social media platform. The team observes a drop in accuracy. Which concept best explains this issue?

A data engineer needs to design a data pipeline for a real-time fraud detection system. The system requires low-latency processing of streaming transactions. Which architecture is most appropriate?

A team is training a deep learning model for image classification. The training loss decreases rapidly but validation loss starts increasing after a few epochs. Which regularization technique should be applied to mitigate this issue?

An organization needs to store sensitive customer data for training a machine learning model. The data must be encrypted at rest and in transit, and access must be audited. Which combination of practices should be implemented?

A data analyst is cleaning a dataset and finds that 20% of the values for the 'age' column are missing. Which imputation method is most robust if the data is not normally distributed?
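
The page does not show the answer options for this question, so the following is only a sketch of why the shape of the distribution matters when imputing. It compares the mean and the median on a hypothetical right-skewed 'age' column in which 20% of the values are missing; the data values are invented for illustration.

```python
import statistics

# Hypothetical skewed 'age' column; None marks a missing value (2 of 10 = 20%).
ages = [22, 23, 24, 25, 26, 27, 28, 95, None, None]

observed = [a for a in ages if a is not None]
mean_age = statistics.mean(observed)      # pulled upward by the outlier 95
median_age = statistics.median(observed)  # stays near the bulk of the data

# Fill the gaps with the median of the observed values.
imputed_with_median = [median_age if a is None else a for a in ages]

print(mean_age, median_age)  # 33.75 25.5
```

A single extreme value moves the mean well away from typical ages, while the median barely changes, which is the usual argument for median imputation on skewed numeric data.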

Which TWO techniques are commonly used for feature selection in machine learning? (Choose 2)

Which THREE are common data preprocessing steps in a machine learning pipeline? (Choose 3)

Which TWO are best practices for versioning machine learning models? (Choose 2)

An engineer is training a neural network and observes the output shown. Which conclusion is most likely correct?

Exhibit

A data engineer is reviewing an S3 bucket policy for a machine learning project. The policy is intended to allow access to training data only from the corporate network (10.0.0.0/16). However, users in the corporate network report access denied. Which issue is most likely causing the problem?

Exhibit

A data scientist is training a deep learning model for image classification. The training loss decreases steadily but the validation loss starts increasing after 10 epochs. Which technique should the scientist apply to address this issue?
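
The pattern described here (training loss still falling while validation loss rises) is the classic sign of overfitting. The answer options are not shown on this page, so the sketch below only illustrates one common response, early stopping, under assumed values: the function name `early_stopping_epoch`, the `patience` of 3, and the loss history are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the best validation loss once the loss
    has failed to improve for `patience` consecutive epochs, else None."""
    best_loss = float("inf")
    best_epoch = None
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch
    return None


# Hypothetical validation losses: improving until epoch 10, then rising.
val_losses = [0.90, 0.75, 0.64, 0.56, 0.50, 0.46, 0.43, 0.41, 0.40, 0.39,
              0.40, 0.42, 0.45, 0.49]
print(early_stopping_epoch(val_losses))  # 10
```

In practice the weights saved at the returned epoch would be restored; other regularization techniques such as dropout or weight decay target the same symptom.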

A financial institution is building a fraud detection system using a supervised learning model. The dataset is highly imbalanced with 99.9% legitimate transactions and 0.1% fraudulent ones. Which approach would be MOST effective to train the model to detect fraud?
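
With a 99.9% / 0.1% split, a model that always predicts "legitimate" is 99.9% accurate and detects no fraud. The page does not reveal the intended answer, so the sketch below shows just one widely used mitigation, class weighting, using the inverse-frequency heuristic `n_samples / (n_classes * class_count)`. The function name and label strings are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical labels with the 99.9% / 0.1% split from the question.
labels = ["legit"] * 99_900 + ["fraud"] * 100
weights = balanced_class_weights(labels)
print(weights["fraud"], round(weights["legit"], 4))  # 500.0 0.5005
```

Each fraudulent record then counts roughly 999 times as much as a legitimate one in the loss, so misclassifying fraud is no longer nearly free for the model.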

A company wants to deploy an AI model for real-time inference on edge devices with limited computational resources. Which model architecture would be MOST suitable?

A data engineer is designing a pipeline for a streaming data application that uses a machine learning model to detect anomalies in real time. Which TWO practices should the engineer implement to ensure data quality and model reliability?

A team is developing a natural language processing model to classify customer feedback. The dataset contains text in multiple languages. Which THREE preprocessing steps are essential to ensure the model performs well across all languages?

A data scientist is preparing a dataset for a classification model. The dataset contains a column "Age" with 10% missing values and a column "Income" with 30% missing values. Which imputation strategy is MOST appropriate to minimize bias?


Related AI0-001 topic practice pages

AI Concepts and Foundations practice questions

Machine Learning and Deep Learning practice questions

AI Models and Data Engineering practice questions

AI Implementation and Operations practice questions

AI Security, Ethics and Governance practice questions

CompTIA A+ hardware practice questions

CompTIA A+ mobile devices practice questions

CompTIA A+ networking practice questions

CompTIA A+ operating systems practice questions

CompTIA A+ security practice questions

CompTIA A+ software troubleshooting questions

CompTIA A+ operational procedures questions
