CompTIA · Free Practice Questions · Last reviewed May 2026
30 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right and why the others are wrong.
A retail company stores customer purchase history in a relational database. The database contains a table 'transactions' with columns: transaction_id, customer_id, product_id, quantity, price, and transaction_date. A data analyst needs to create a report that shows total revenue per customer for the last quarter. Which data concept best describes the total revenue value calculated for each customer_id?
Foreign key
Composite attribute
Derived attribute
Total revenue is not stored in the table. It is calculated from other attributes (quantity times price, summed per customer_id), which makes it a derived attribute.
Atomic attribute
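Because total revenue is computed rather than stored, it exists only at query time. The sketch below uses Python's built-in sqlite3 module with a few hypothetical rows. For illustration only, it assumes that "the last quarter" means January to March 2026.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        transaction_id   INTEGER PRIMARY KEY,
        customer_id      INTEGER,
        product_id       INTEGER,
        quantity         INTEGER,
        price            REAL,
        transaction_date TEXT  -- ISO 8601 'YYYY-MM-DD', so string comparison orders dates
    )
""")
# Hypothetical sample rows
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, 1, 2, 10.0, "2026-01-15"),
        (2, 101, 2, 1, 25.0, "2026-02-03"),
        (3, 102, 1, 5, 10.0, "2026-03-30"),
        (4, 102, 3, 1, 99.0, "2025-12-31"),  # previous quarter: excluded
    ],
)

# total_revenue is derived: computed from quantity * price and summed
# per customer_id, never stored as its own column.
query = """
    SELECT customer_id, SUM(quantity * price) AS total_revenue
    FROM transactions
    WHERE transaction_date BETWEEN '2026-01-01' AND '2026-03-31'
    GROUP BY customer_id
    ORDER BY customer_id
"""
revenue_per_customer = conn.execute(query).fetchall()
print(revenue_per_customer)  # [(101, 45.0), (102, 50.0)]
```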
A healthcare database stores patient records. Each patient has a unique patient_id, and the database includes a table 'visits' with visit_id, patient_id, visit_date, and diagnosis_code. To ensure data integrity, which constraint should be applied to the patient_id column in the 'visits' table?
Unique constraint
Foreign key
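A foreign key ensures that every patient_id in 'visits' refers to an existing patient, which enforces referential integrity. A unique constraint or primary key would be wrong here because one patient can have many visits.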
Primary key
Check constraint
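Here is a minimal sketch of that constraint in SQLite, run through Python's sqlite3 module. The table layout follows the question; the sample rows and diagnosis codes are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE visits (
        visit_id       INTEGER PRIMARY KEY,
        patient_id     INTEGER NOT NULL REFERENCES patients(patient_id),
        visit_date     TEXT,
        diagnosis_code TEXT
    )
""")
conn.execute("INSERT INTO patients VALUES (1, 'Hypothetical Patient')")

# The same patient_id may appear many times in visits (one patient, many visits).
conn.execute("INSERT INTO visits VALUES (10, 1, '2026-01-05', 'DX1')")
conn.execute("INSERT INTO visits VALUES (11, 1, '2026-02-12', 'DX2')")

def try_insert_orphan_visit():
    """Attempt to insert a visit for a patient_id that does not exist."""
    try:
        conn.execute("INSERT INTO visits VALUES (12, 999, '2026-03-01', 'DX3')")
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert_orphan_visit())  # rejected: FOREIGN KEY constraint failed
```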
A data engineer is designing a data warehouse for a multinational corporation. The company has sales data from different regions with varying currencies and date formats. To ensure consistency, which data concept should be applied to standardize the data before loading into the warehouse?
Data cleansing
Data transformation
Transformation converts the data to common standards before loading, for example a single reporting currency and one date format.
Data profiling
Data masking
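For example, a transformation step might convert each region's date format to ISO 8601 and each amount to one reporting currency. The per-region formats and exchange rates below are hypothetical placeholders. A real pipeline would load dated rates from a reference source.

```python
from datetime import datetime

# Hypothetical source date format per region and rates to the reporting currency (USD).
DATE_FORMAT_BY_REGION = {"US": "%m/%d/%Y", "DE": "%d.%m.%Y", "JP": "%Y/%m/%d"}
RATE_TO_USD = {"USD": 1.0, "EUR": 1.10, "JPY": 0.0068}

def transform(record):
    """Standardize one sales record to an ISO 8601 date and a USD amount."""
    fmt = DATE_FORMAT_BY_REGION[record["region"]]  # format is known per source, not guessed
    iso_date = datetime.strptime(record["date"], fmt).date().isoformat()
    amount_usd = round(record["amount"] * RATE_TO_USD[record["currency"]], 2)
    return {"region": record["region"], "sale_date": iso_date, "amount_usd": amount_usd}

raw = [
    {"region": "US", "date": "03/05/2026", "amount": 100.0, "currency": "USD"},
    {"region": "DE", "date": "05.03.2026", "amount": 100.0, "currency": "EUR"},
    {"region": "JP", "date": "2026/03/05", "amount": 10000, "currency": "JPY"},
]
clean = [transform(r) for r in raw]
print(clean[1])  # {'region': 'DE', 'sale_date': '2026-03-05', 'amount_usd': 110.0}
```

Mapping each source to an explicit format avoids the day/month ambiguity that arises when formats are guessed: "03/05/2026" is March 5 in one convention and May 3 in another.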
An e-commerce company uses a star schema for its data warehouse. The fact table 'sales_fact' contains foreign keys to dimension tables: customer_dim, product_dim, time_dim, and store_dim. A business user wants to know the total sales for each product category in the last month. Which join operation is required to retrieve this data?
Self-join on the fact table
Cross join between fact and dimension tables
Inner join between fact table and dimension tables
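Inner joins from the fact table to product_dim (for the category) and time_dim (to filter to last month) return only fact rows with matching dimension rows. This is the standard pattern for star schema queries.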
Left outer join between fact and dimension tables
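The sketch below shows the query shape in SQLite, run through Python's sqlite3 module. It assumes hypothetical column names such as product_key, category and month, and treats April 2026 as "last month". customer_dim and store_dim are omitted because this question does not need them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE sales_fact  (product_key  INTEGER REFERENCES product_dim,
                              time_key     INTEGER REFERENCES time_dim,
                              sales_amount REAL);

    INSERT INTO product_dim VALUES (1, 'Electronics'), (2, 'Books');
    INSERT INTO time_dim VALUES (1, '2026-04-10', '2026-04'), (2, '2026-03-28', '2026-03');
    INSERT INTO sales_fact VALUES (1, 1, 500.0), (2, 1, 40.0), (2, 1, 20.0), (1, 2, 900.0);
""")

# Fact rows are joined to the dimensions they need: product for the category,
# time for the month filter.
query = """
    SELECT p.category, SUM(f.sales_amount) AS total_sales
    FROM sales_fact AS f
    INNER JOIN product_dim AS p ON f.product_key = p.product_key
    INNER JOIN time_dim    AS t ON f.time_key    = t.time_key
    WHERE t.month = '2026-04'
    GROUP BY p.category
    ORDER BY p.category
"""
sales_by_category = conn.execute(query).fetchall()
print(sales_by_category)  # [('Books', 60.0), ('Electronics', 500.0)]
```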
A data analyst is working with a dataset containing customer information. The dataset includes a column 'full_name' which stores first and last names together. To perform analysis on first names separately, which data concept describes the process of splitting 'full_name' into 'first_name' and 'last_name'?
Data deduplication
Data summarization
Data normalization
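Splitting a composite value such as full_name into atomic columns (first_name, last_name) is part of normalization. First normal form (1NF) requires each column to hold a single, atomic value.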
Data aggregation
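Here is a simple way to perform that split. It assumes names are stored as given names followed by a single family name. Real data also needs rules for suffixes, particles such as "van", and naming orders where the family name comes first.

```python
def split_full_name(full_name):
    """Split into (first_name, last_name), treating the last word as the last name."""
    parts = full_name.split()  # split() also collapses repeated spaces
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""  # single-word name: no last name available
    return " ".join(parts[:-1]), parts[-1]

names = ["Grace Hopper", "Mary  Ann Evans", "Cher"]
split_names = [split_full_name(n) for n in names]
print(split_names)  # [('Grace', 'Hopper'), ('Mary Ann', 'Evans'), ('Cher', '')]
```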
A data scientist is building a machine learning model to predict customer churn. The dataset includes both numerical features (age, income) and categorical features (gender, marital status). Which data concept describes the process of converting categorical features into numerical values that can be used by the algorithm?
Data sampling
Encoding
Encoding converts categories to numbers, e.g., one-hot encoding.
Feature scaling
Dimensionality reduction
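One-hot encoding, the most common scheme, creates one 0/1 column per category. Below is a stdlib sketch on hypothetical rows. Libraries such as pandas.get_dummies or scikit-learn's OneHotEncoder do the same job.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

customers = [
    {"age": 34, "income": 52000, "marital_status": "married"},
    {"age": 27, "income": 41000, "marital_status": "single"},
    {"age": 45, "income": 67000, "marital_status": "divorced"},
]
encoded = one_hot_encode(customers, "marital_status")
print(encoded[0])
# {'age': 34, 'income': 52000, 'marital_status_divorced': 0,
#  'marital_status_married': 1, 'marital_status_single': 0}
```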
A data analyst is pulling data from a production database for a report. The database contains customer orders with a column 'order_date'. The analyst notices that some orders have dates in the future. Which data quality issue does this represent?
Invalid data type
Inconsistent data
Missing data
Violation of business rules
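An order cannot be placed in the future, so these values break a business rule (order_date must not be later than today), even though they are valid dates of the correct data type.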
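A validation check can catch this kind of rule violation. The sketch below assumes dates are stored as ISO 8601 strings. `today` is passed in explicitly so the example is reproducible; production code would use date.today().

```python
from datetime import date

def future_dated(orders, today):
    """Return order_ids that break the rule order_date <= today."""
    return [o["order_id"] for o in orders if date.fromisoformat(o["order_date"]) > today]

orders = [
    {"order_id": 1, "order_date": "2026-04-30"},
    {"order_id": 2, "order_date": "2027-01-15"},  # a valid date, but an impossible order
    {"order_id": 3, "order_date": "2026-05-01"},
]
flagged = future_dated(orders, today=date(2026, 5, 1))
print(flagged)  # [2]
```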
A data engineer is designing a data pipeline to ingest streaming data from IoT sensors. The sensors send data every second, and the pipeline must handle bursts of up to 10,000 messages per second. Which approach is most appropriate for capturing this data before processing?
Directly write each message to a relational database
Load directly into a data warehouse
Use a message queue to buffer the incoming data
A message queue handles high throughput and provides reliable buffering.
Store data in flat files and process in nightly batches
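The buffering idea can be sketched with Python's thread-safe queue.Queue standing in for a real broker. This is an assumption for illustration; production systems would use a durable service such as Apache Kafka or RabbitMQ. The producer can burst faster than the consumer writes, and the bounded queue absorbs the difference.

```python
import queue
import threading

# Bounded in-process buffer standing in for a message broker (assumption).
buffer = queue.Queue(maxsize=50_000)
stored = []  # stands in for the downstream store

def produce_burst(n_messages):
    """Simulate a burst of sensor readings arriving at once."""
    for i in range(n_messages):
        buffer.put({"sensor_id": i % 100, "reading": 20.0 + i % 7})

def consume(batch_size=1_000):
    """Drain the buffer at the consumer's own pace, writing in batches."""
    batch = []
    while True:
        message = buffer.get()
        if message is None:  # sentinel: end of stream
            break
        batch.append(message)
        if len(batch) == batch_size:
            stored.extend(batch)  # e.g., one bulk insert per batch
            batch = []
    stored.extend(batch)  # flush any final partial batch

worker = threading.Thread(target=consume)
worker.start()
produce_burst(10_000)  # one second of traffic at the stated peak rate
buffer.put(None)       # signal that the burst is over
worker.join()
print(len(stored))     # 10000
```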
A data analyst needs to combine two datasets: one contains customer information (customer_id, name, address) and the other contains order information (order_id, customer_id, order_date). The analyst wants to include all customers, even those who have not placed orders. Which type of join should be used?
FULL OUTER JOIN
INNER JOIN
LEFT JOIN
With customers as the left table, LEFT JOIN keeps every customer and fills the order columns with NULL for customers who have not placed orders.
RIGHT JOIN
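Here is a sketch in SQLite, run through Python's sqlite3 module, using hypothetical sample rows. Customer 2 has no orders but still appears, with NULL (None in Python) order columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Avery', '1 Main St'), (2, 'Blake', '2 Oak Ave');
    INSERT INTO orders VALUES (100, 1, '2026-04-02');
""")

# customers is the left table, so every customer is kept.
rows = conn.execute("""
    SELECT c.customer_id, c.name, o.order_id, o.order_date
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(rows)  # [(1, 'Avery', 100, '2026-04-02'), (2, 'Blake', None, None)]
```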
A data analyst is tasked with extracting data from a legacy system that outputs fixed-width text files. The analyst needs to parse these files into a structured format. Which tool or method is most appropriate for this task?
A spreadsheet application
An ETL tool with a graphical interface
A scripting language such as Python
Python handles fixed-width layouts directly with string slicing, or with library functions such as pandas.read_fwf. It can also automate the parsing for large or recurring files.
SQL
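Here is a minimal parser. It assumes a hypothetical record layout: a 6-character ID, a 20-character name and a 10-character balance. The real layout would come from the legacy system's documentation. pandas.read_fwf with a colspecs argument is a library alternative.

```python
# Hypothetical layout: (column, start, end) as 0-based, end-exclusive offsets.
LAYOUT = [("customer_id", 0, 6), ("name", 6, 26), ("balance", 26, 36)]

def parse_fixed_width(line):
    """Slice one fixed-width line into typed fields."""
    raw = {column: line[start:end].strip() for column, start, end in LAYOUT}
    return {
        "customer_id": int(raw["customer_id"]),
        "name": raw["name"],
        "balance": float(raw["balance"]),
    }

# Build sample lines with explicit padding so every field has its exact width.
lines = [
    f"{'000042':<6}{'Jane Smith':<20}{'1250.75':>10}",
    f"{'000107':<6}{'Li Wei':<20}{'80.00':>10}",
]
records = [parse_fixed_width(line) for line in lines]
print(records[0])  # {'customer_id': 42, 'name': 'Jane Smith', 'balance': 1250.75}
```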
A company is merging two databases from different departments. In Database A, customer IDs are integers. In Database B, customer IDs are alphanumeric strings. To merge, the data analyst must reconcile these differences. Which step should be taken first?
Drop the ID column and use a surrogate key
Convert all IDs to integers using CAST
Perform data profiling to understand the ID formats and relationships
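Profiling reveals the actual ID formats, overlaps and duplicates in both databases, so the reconciliation strategy rests on evidence. Casting, dropping or mapping IDs before profiling risks data loss or false matches.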
Create a mapping table based on the first character
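A small profiling pass might summarize each source's ID types, shapes and overlaps before choosing any conversion. The ID values here are hypothetical. Note that a blind CAST to integer would fail on "C1001".

```python
import re
from collections import Counter

def id_shape(value):
    """Pattern of an ID: digits become 9, letters become A ('CX-77' -> 'AA-99')."""
    shape = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", shape)

def profile_ids(ids):
    """Summarize row count, distinct count, Python types, and shapes of one ID column."""
    return {
        "rows": len(ids),
        "distinct": len(set(ids)),
        "types": Counter(type(v).__name__ for v in ids),
        "shapes": Counter(id_shape(v) for v in ids),
    }

db_a_ids = [1001, 1002, 1003]                   # hypothetical integer IDs
db_b_ids = ["C1001", "C1002", "CX-77", "1003"]  # hypothetical string IDs

print(profile_ids(db_a_ids)["shapes"])  # Counter({'9999': 3})
print(profile_ids(db_b_ids)["shapes"])  # Counter({'A9999': 2, 'AA-99': 1, '9999': 1})

# IDs that match once stringified still need a business decision:
# is Database B's "1003" the same customer as Database A's 1003?
overlap = {str(v) for v in db_a_ids} & set(db_b_ids)
print(overlap)  # {'1003'}
```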
A data analyst needs to extract data from an API that returns JSON. The analyst wants to convert the JSON output into a tabular format for analysis. Which function in a scripting language is commonly used for this purpose?
json.loads()
to_csv()
read_json()
json_normalize()
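json_normalize() (in pandas) flattens nested, semi-structured JSON into a flat table of rows and columns. By contrast, json.loads() only parses the text into Python objects, and read_json() works best on JSON that is already table-shaped.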
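json.loads() only parses the text into nested dicts and lists; flattening is a separate step. The stdlib sketch below mimics the core of what pandas.json_normalize does for nested objects (dotted column names), using a hypothetical API payload. It does not handle nested lists, which json_normalize addresses with its record_path parameter.

```python
import json

def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys.
    Lists are kept as single values in this sketch."""
    flat = {}
    for key, value in obj.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat

# Hypothetical API response body
payload = """[
  {"id": 1, "customer": {"name": "Avery", "address": {"city": "Austin"}}, "total": 42.5},
  {"id": 2, "customer": {"name": "Blake", "address": {"city": "Boise"}}, "total": 18.0}
]"""
rows = [flatten(record) for record in json.loads(payload)]
print(rows[0])
# {'id': 1, 'customer.name': 'Avery', 'customer.address.city': 'Austin', 'total': 42.5}
```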
A data analyst needs to identify the most frequently occurring value in a dataset. Which measure of central tendency should they use?
Mode
Mode is the most frequently occurring value.
Standard deviation
Median
Mean
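Python's statistics module shows the difference on hypothetical values. Unlike the mean, the mode also works for categorical data, and multimode reports ties.

```python
from statistics import median, mode, multimode

shirt_sizes = ["M", "L", "M", "S", "M", "L"]  # hypothetical categorical data
print(mode(shirt_sizes))  # M  (a mean is undefined for categories)

order_counts = [2, 3, 3, 5, 12]  # hypothetical numeric data
print(mode(order_counts), median(order_counts))  # 3 3

print(multimode([1, 1, 2, 2, 3]))  # [1, 2]  (two values tie for most frequent)
```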
A retail company wants to predict future sales based on historical data. Which modeling approach is most appropriate if the data shows a clear seasonal pattern?
Linear regression
Time series analysis
Time series analysis explicitly models seasonal patterns.
K-means clustering
Logistic regression
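The simplest time-series baseline for seasonal data is a seasonal naive forecast: each future period repeats the value from the same period one season earlier. The sketch below uses hypothetical quarterly figures. Fuller methods such as Holt-Winters or SARIMA also model the trend.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future period with the value from the same period one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]

# Hypothetical quarterly sales with a Q4 holiday peak, two years of history
quarterly_sales = [120, 130, 125, 210,   # year 1: Q1..Q4
                   128, 137, 131, 225]   # year 2
forecast = seasonal_naive_forecast(quarterly_sales, season_length=4, horizon=4)
print(forecast)  # [128, 137, 131, 225]
```

A linear regression on the time index alone would fit a straight trend line and smooth away the recurring Q4 peak.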
A data analyst is building a model to predict customer churn. The dataset has 10,000 records with 500 churned customers. The model predicts churn with 95% accuracy, but only identifies 10% of actual churners. Which metric best highlights this issue?
Accuracy
F1 score
Recall
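Recall = TP / (TP + FN) = 50 / 500 = 10%, so the model misses 90% of churners. Accuracy hides this because predicting "no churn" for every customer would also score 95%.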
Precision
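The question's numbers fully determine the confusion matrix, which makes the gap between accuracy and recall concrete.

```python
# Counts implied by the question: 10,000 customers, 500 churners,
# 95% accuracy, 10% of churners identified.
total, churners = 10_000, 500
tp = churners * 10 // 100         # 50 churners caught
fn = churners - tp                # 450 churners missed
correct = total * 95 // 100       # 9,500 correct predictions
tn = correct - tp                 # 9,450 non-churners correctly cleared
fp = (total - churners) - tn      # 50 false alarms

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
baseline_accuracy = (total - churners) / total  # always predict "no churn"

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
# accuracy=0.95 recall=0.10 precision=0.50 f1=0.17
print(f"baseline accuracy={baseline_accuracy:.2f}")  # baseline accuracy=0.95
```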
A data analyst needs to combine two datasets that have the same columns but different rows. Which operation should they use?
Concatenate
Append
Appending adds the rows of one dataset beneath another that has the same columns. A merge instead combines datasets side by side on a matching key.
Merge
Aggregate
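Below is a stdlib sketch of appending, with hypothetical monthly rows represented as lists of dicts. Mismatched columns are rejected rather than silently misaligned.

```python
def append_rows(first, second):
    """Return the rows of `first` followed by the rows of `second`.
    Raises ValueError if any row's columns differ from the first row's."""
    combined = first + second
    if combined:
        expected = set(combined[0])
        if any(set(row) != expected for row in combined):
            raise ValueError("datasets do not share the same columns")
    return combined

jan_sales = [{"month": "2026-01", "store": "A", "sales": 100},
             {"month": "2026-01", "store": "B", "sales": 80}]   # hypothetical
feb_sales = [{"month": "2026-02", "store": "A", "sales": 120},
             {"month": "2026-02", "store": "B", "sales": 95}]
all_sales = append_rows(jan_sales, feb_sales)
print(len(all_sales))  # 4: same three columns, twice as many rows
```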
A data analyst is performing a hypothesis test with a significance level of 0.05. The p-value obtained is 0.03. What should the analyst conclude?
Reject the null hypothesis
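Because the p-value (0.03) is below the significance level (0.05), the result is statistically significant and the null hypothesis is rejected. "Accepting" the null is never the correct conclusion, and statistical significance alone does not establish practical significance.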
Fail to reject the null hypothesis
Accept the null hypothesis
The result is practically significant
A data scientist trains a regression model and observes high variance with low bias. Which technique is most appropriate to reduce variance?
Apply Ridge regularization
Ridge (L2) regularization penalizes large coefficients and shrinks them toward zero. This reduces overfitting (high variance) at the cost of a little extra bias.
Increase polynomial features
Use a smaller training set
Remove correlated features
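For a single feature, ridge regression has a closed form that makes the shrinkage visible: the L2 penalty `lam` is added to the denominator of the least-squares slope. This toy example uses hypothetical data and illustrates only the mechanism; it is too small to show overfitting itself. In practice a library estimator such as scikit-learn's Ridge would be used (its alpha parameter plays the role of `lam`).

```python
def ridge_slope(x, y, lam):
    """Closed-form ridge fit for one feature with an unpenalized intercept.

    Minimizes sum((y - b0 - b1*x)^2) + lam * b1^2. Centering x and y removes
    the intercept, giving b1 = Sxy / (Sxx + lam) and b0 = y_bar - b1 * x_bar.
    """
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    intercept = y_bar - slope * x_bar
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical noisy data

# lam = 0 is ordinary least squares; larger lam shrinks the slope toward zero.
slopes = {lam: ridge_slope(x, y, lam)[0] for lam in (0.0, 1.0, 10.0)}
for lam, slope in slopes.items():
    print(f"lam={lam:>4}: slope={slope:.3f}")
```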
A data analyst is creating a dashboard to monitor server CPU utilization over the past 24 hours. Which chart type is most appropriate for showing the trend of CPU usage over time?
Scatter plot
Pie chart
Line chart
Line charts display trends over time effectively.
Bar chart
A sales dashboard shows monthly revenue, but the bars are very tall for some months and very short for others, making comparisons difficult. Which visualization modification would best improve readability?
Change bar colors to gradient
Apply a logarithmic scale on the y-axis
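A logarithmic y-axis compresses values that span orders of magnitude, so small months stay visible next to large ones. Label the axis clearly, because equal distances now represent equal ratios rather than equal differences.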
Add more horizontal gridlines
Use a 3D bar chart for depth
A data analyst creates a bubble chart showing country GDP (x-axis), life expectancy (y-axis), and population (bubble size). However, large bubbles overlap and obscure many data points. Which corrective action should the analyst take?
Increase the chart canvas size
Set bubble opacity to 70%
Partial transparency lets overlapping bubbles show through one another, so hidden points become visible without altering or removing any data.
Reduce all bubble sizes uniformly
Remove outlier countries with large populations
An analyst wants to show the distribution of test scores for 500 students. Which visualization type is best for understanding the shape of the distribution?
Line chart
Pie chart
Scatter plot
Histogram
Histograms display frequency distribution of numerical data.
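A histogram is, at bottom, a count of values per fixed-width bin. Below is a stdlib sketch with hypothetical scores that prints a text histogram. Plotting functions such as matplotlib's hist perform the same binning.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count values per bin [start + k*width, start + (k+1)*width), keeping empty bins."""
    counts = Counter((v - start) // bin_width for v in values)
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): counts[k]
        for k in range(min(counts), max(counts) + 1)
    }

scores = [55, 62, 68, 71, 74, 75, 78, 79, 83, 85, 88, 91, 97]  # hypothetical
bins = histogram_counts(scores, bin_width=10)
for (lower, upper), n in bins.items():
    print(f"{lower:>3}-{upper - 1:<3} {'#' * n}")
```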
A dashboard shows sales by region using a map with color intensity. Users complain that two regions with very different sales appear nearly the same color. What is the most likely cause?
The map projection is distorted
The color scale uses a sequential palette with insufficient contrast
If a sequential palette's shades are too close together, very different values map to nearly identical colors. The palette needs a wider lightness range or more distinct steps.
The monitor resolution is too low
Users are color blind
An analyst creates a stacked bar chart showing quarterly sales by product category. The chart becomes hard to read because some categories have very small contributions. Which redesign is most effective?
Combine small categories into an 'Other' group
Grouping small items simplifies the chart and improves readability.
Change to a pie chart for each quarter
Increase the width of each bar
Switch to a 3D stacked column chart
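Here is a sketch of the grouping step, assuming a hypothetical 5% share threshold and hypothetical sales figures.

```python
def group_small_categories(totals, min_share=0.05, other_label="Other"):
    """Fold categories below `min_share` of the grand total into one 'Other' bucket."""
    grand_total = sum(totals.values())
    grouped = {}
    for category, value in totals.items():
        key = category if value / grand_total >= min_share else other_label
        grouped[key] = grouped.get(key, 0) + value
    return grouped

q1_sales = {"Laptops": 520, "Phones": 340, "Tablets": 90,
            "Cables": 25, "Cases": 15, "Stickers": 10}  # hypothetical
grouped = group_small_categories(q1_sales)
print(grouped)  # {'Laptops': 520, 'Phones': 340, 'Tablets': 90, 'Other': 50}
```

For a stacked chart across quarters, choosing the grouped categories from overall totals keeps the same "Other" definition in every bar.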
A data analyst notices that a line chart showing monthly sales over the past two years has a steep drop in one month. Upon investigation, the analyst discovers that a new sales region was added mid-month and the data was not normalized. Which of the following best practices should the analyst apply to communicate this insight accurately?
Remove the month with the drop from the report.
Use a bar chart instead to show the data.
Normalize the sales data by region and explain the data anomaly in the report.
Normalization corrects the artifact, and explanation provides transparency.
Present the data as-is and let stakeholders interpret the drop.
A data team is preparing a dashboard for executives. The team wants to highlight key performance indicators (KPIs) that are below target. Which of the following visualization techniques would most effectively draw attention to underperforming metrics without causing confusion?
Remove underperforming KPIs from the dashboard to avoid confusion.
Use a scatter plot to show the relationship between KPIs.
Apply conditional formatting to turn KPI values red when below target.
Red highlights call attention to issues immediately.
Use a pie chart showing the proportion of each KPI.
A data analyst needs to present the distribution of customer ages to a non-technical audience. Which type of chart would be most appropriate?
Scatter plot
Histogram
Histograms show distribution of continuous data.
Pie chart
Line chart
A data analyst creates a report showing sales by product category. The analyst notices that one category has a very high sales figure due to a one-time bulk order. Which of the following is the best way to communicate this insight to stakeholders?
Delete the bulk order from the dataset.
Add a note to the chart explaining the bulk order.
Annotation provides context for the anomaly.
Remove the category with the bulk order from the report.
Use a pie chart to show the proportion of each category.
A data analyst is building a dashboard that will be used by both executives and operational managers. The executives need high-level summaries, while managers need granular details. Which dashboard design principle should the analyst apply?
Use a single chart that shows both summary and detail simultaneously.
Display all available data on one page for transparency.
Design the dashboard with drill-down capabilities from summary to detail.
Drill-down allows executives to see overview and managers to access details on demand.
Create two separate dashboards for each audience.
A data analyst wants to compare the sales performance of four different stores over the same time period. Which chart type is most suitable?
Line chart with multiple lines
Grouped bar chart
Grouped bars allow side-by-side comparison of stores.
Stacked bar chart
Pie chart with multiple pies
The DA0-001 exam has a maximum of 90 questions and must be completed in 90 minutes. The passing score is 675 on a scale of 100 to 900.
The exam uses multiple-choice questions and performance-based questions (PBQs), which ask you to complete tasks in a simulated environment.
The exam covers 5 domains: Comparing and Contrasting Data Concepts, Mining and Acquiring Data, Analyzing and Modeling Data, Visualizing Data, and Communicating Data Insights. Questions are weighted by domain, so higher-weight domains appear more often on your actual exam.
These are not questions from the real exam. They are original exam-style practice questions written against the official CompTIA DA0-001 exam objectives. Courseiva focuses on genuine understanding, not memorisation of braindumps.
Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.