CompTIA · Free Practice Questions · Last reviewed May 2026

DA0-001 Exam Questions and Answers

30 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right and why the others are wrong.

90 exam questions
90 min time limit
Pass: 675/1000
5 exam domains
1. Comparing and Contrasting Data Concepts
2. Mining and Acquiring Data
3. Analyzing and Modeling Data
4. Visualizing Data
5. Communicating Data Insights

Domain 1: Comparing and Contrasting Data Concepts

Q1
Difficulty: medium

A retail company stores customer purchase history in a relational database. The database contains a table 'transactions' with columns: transaction_id, customer_id, product_id, quantity, price, and transaction_date. A data analyst needs to create a report that shows total revenue per customer for the last quarter. Which data concept describes the relationship between customer_id and total revenue?

A

Foreign key

B

Composite attribute

C

Derived attribute

Total revenue is calculated from other attributes, making it derived.

D

Atomic attribute

Why: Total revenue is calculated by summing (quantity * price) for each customer, making it a derived attribute because it is computed from existing stored data (quantity and price) rather than stored directly. In the context of the 'transactions' table, customer_id is a stored key, but total_revenue is not stored; it is derived via aggregation, which matches the definition of a derived attribute in database design.
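The derivation described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming made-up sample rows and treating the first quarter of 2024 as "last quarter"; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " transaction_id INTEGER PRIMARY KEY, customer_id INTEGER,"
    " product_id INTEGER, quantity INTEGER, price REAL, transaction_date TEXT)"
)
# Sample rows are invented for illustration.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, 7, 2, 10.0, "2024-01-15"),
        (2, 101, 8, 1, 5.0, "2024-02-03"),
        (3, 102, 7, 3, 10.0, "2024-03-20"),
    ],
)

# total_revenue is not a stored column; it is derived at query time
# from the stored quantity and price values.
rows = conn.execute(
    "SELECT customer_id, SUM(quantity * price) AS total_revenue"
    " FROM transactions"
    " WHERE transaction_date BETWEEN '2024-01-01' AND '2024-03-31'"
    " GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(rows)
```

Because the value is recomputed on each query, it stays consistent with the underlying rows without needing its own column.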
Q2
Difficulty: easy

A healthcare database stores patient records. Each patient has a unique patient_id, and the database includes a table 'visits' with visit_id, patient_id, visit_date, and diagnosis_code. To ensure data integrity, which constraint should be applied to the patient_id column in the 'visits' table?

A

Unique constraint

B

Foreign key

Foreign key enforces referential integrity.

C

Primary key

D

Check constraint

Why: Option B is correct because a foreign key constraint ensures that every patient_id in 'visits' references a valid patient_id in the patient table. Option A is wrong because a unique constraint would stop a patient from having more than one visit. Option C is wrong because a primary key identifies rows uniquely within its own table, a role that visit_id already fills in 'visits'. Option D is wrong because a check constraint validates values against a condition, not against rows in another table.
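A short sketch of referential integrity in action, using Python's built-in `sqlite3` module. The parent table name `patients` and the sample values are assumptions for illustration; note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE visits ("
    " visit_id INTEGER PRIMARY KEY,"
    " patient_id INTEGER NOT NULL REFERENCES patients(patient_id),"
    " visit_date TEXT, diagnosis_code TEXT)"
)
conn.execute("INSERT INTO patients VALUES (1)")
# Valid: patient 1 exists in the parent table.
conn.execute("INSERT INTO visits VALUES (10, 1, '2024-05-01', 'J10')")

try:
    # Invalid: there is no patient 999, so the insert should be rejected.
    conn.execute("INSERT INTO visits VALUES (11, 999, '2024-05-02', 'J11')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```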
Q3
Difficulty: hard

A data engineer is designing a data warehouse for a multinational corporation. The company has sales data from different regions with varying currencies and date formats. To ensure consistency, which data concept should be applied to standardize the data before loading into the warehouse?

A

Data cleansing

B

Data transformation

Transformation includes standardization of formats.

C

Data profiling

D

Data masking

Why: Data transformation is the correct concept because it involves converting data from source formats (e.g., different currencies and date formats) into a consistent, standardized format before loading into the data warehouse. This process includes applying conversion rules, such as using ISO 8601 for dates and a single base currency (e.g., USD) with exchange rate tables, ensuring uniformity across all regional data. Without transformation, the warehouse would hold values in incompatible formats, so aggregations and comparisons across regions would be unreliable.
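As a rough sketch of such a transformation step, the snippet below converts regional date strings to ISO 8601 and amounts to USD. The exchange rates, regional date formats, and record layout are all hypothetical and exist only to make the example runnable.

```python
from datetime import datetime

# Hypothetical exchange rates to USD, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "GBP": 1.25}

# Hypothetical source date formats per region.
DATE_FORMATS = {"US": "%m/%d/%Y", "EU": "%d.%m.%Y", "UK": "%d/%m/%Y"}


def transform(record):
    """Standardize one sales record: ISO 8601 date, amount in USD."""
    parsed = datetime.strptime(record["date"], DATE_FORMATS[record["region"]])
    return {
        "date": parsed.date().isoformat(),
        "amount_usd": round(record["amount"] * RATES_TO_USD[record["currency"]], 2),
    }


raw = [
    {"region": "US", "date": "03/04/2024", "amount": 100.0, "currency": "USD"},
    {"region": "EU", "date": "04.03.2024", "amount": 200.0, "currency": "EUR"},
]
clean = [transform(r) for r in raw]
print(clean)
```

Both source records describe 4 March 2024, which only becomes obvious once the dates share one format.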
Q4
Difficulty: medium

An e-commerce company uses a star schema for its data warehouse. The fact table 'sales_fact' contains foreign keys to dimension tables: customer_dim, product_dim, time_dim, and store_dim. A business user wants to know the total sales for each product category in the last month. Which join operation is required to retrieve this data?

A

Self-join on the fact table

B

Cross join between fact and dimension tables

C

Inner join between fact table and dimension tables

Inner join returns only matching rows, which is typical in star schema queries.

D

Left outer join between fact and dimension tables

Why: To retrieve total sales for each product category, you need to join the fact table with the product dimension table to map product keys to categories, and with the time dimension table to filter on the last month. An inner join is correct because it returns only rows where matching keys exist in both tables, which is the standard approach for star-schema queries where all required dimension attributes are present. This ensures that only valid sales transactions with corresponding product and time entries are included in the aggregation.
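The query described above might look like the following sketch, run against an in-memory SQLite database. The key and measure column names (`product_key`, `time_key`, `sales_amount`, `category`) and the sample rows are assumptions; the question only names the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER, sales_amount REAL);
INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Games');
INSERT INTO time_dim VALUES (202404, 2024, 4), (202405, 2024, 5);
INSERT INTO sales_fact VALUES (1, 202405, 20.0), (1, 202405, 30.0),
                              (2, 202405, 15.0), (2, 202404, 99.0);
""")

# Inner joins map fact rows to a category and a month; the WHERE clause
# keeps only the month of interest (May 2024 in this made-up data).
rows = conn.execute("""
SELECT p.category, SUM(f.sales_amount) AS total_sales
FROM sales_fact AS f
INNER JOIN product_dim AS p ON p.product_key = f.product_key
INNER JOIN time_dim AS t ON t.time_key = f.time_key
WHERE t.year = 2024 AND t.month = 5
GROUP BY p.category
ORDER BY p.category
""").fetchall()
print(rows)
```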
Q5
Difficulty: easy

A data analyst is working with a dataset containing customer information. The dataset includes a column 'full_name' which stores first and last names together. To perform analysis on first names separately, which data concept describes the process of splitting 'full_name' into 'first_name' and 'last_name'?

A

Data deduplication

B

Data summarization

C

Data normalization

Normalization reduces redundancy and breaks down attributes.

D

Data aggregation

Why: Option C is correct because data normalization is the process of organizing data to reduce redundancy and improve integrity, which includes splitting composite attributes like 'full_name' into atomic values ('first_name', 'last_name'). This aligns with the first normal form (1NF) principle in database design, where each column should contain indivisible values. The data analyst is decomposing a single field into multiple, more granular fields to enable separate analysis.
Q6
Difficulty: hard

A data scientist is building a machine learning model to predict customer churn. The dataset includes both numerical features (age, income) and categorical features (gender, marital status). Which data concept describes the process of converting categorical features into numerical values that can be used by the algorithm?

A

Data sampling

B

Encoding

Encoding converts categories to numbers, e.g., one-hot encoding.

C

Feature scaling

D

Dimensionality reduction

Why: Encoding is the correct data concept because it transforms categorical features (like gender and marital status) into numerical representations (e.g., one-hot encoding, label encoding) that machine learning algorithms can process. Unlike feature scaling or dimensionality reduction, encoding directly addresses the incompatibility of non-numeric data with mathematical model operations.
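One-hot encoding can be sketched in a few lines of plain Python. The helper name `one_hot` and the sample values are illustrative; in practice a library encoder would normally be used instead.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 vector per input value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


marital_status = ["single", "married", "single", "divorced"]
categories, encoded = one_hot(marital_status)
print(categories)
print(encoded)
```

Each category gets its own column, so the algorithm does not read a false ordering into the values the way it could with simple integer labels.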


Domain 2: Mining and Acquiring Data

Q1
Difficulty: medium

A data analyst is pulling data from a production database for a report. The database contains customer orders with a column 'order_date'. The analyst notices that some orders have dates in the future. Which data quality issue does this represent?

A

Invalid data type

B

Inconsistent data

C

Missing data

D

Violation of business rules

Future orders are not valid per business rules, indicating a data quality issue.

Why: Option D is correct because future order dates violate a business rule that order_date must be in the past or present. This is a classic data integrity issue where the data does not conform to domain-specific constraints, such as 'order_date <= CURRENT_DATE'. The analyst should flag this as a violation of business rules, not a data type or consistency problem.
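A minimal sketch of checking the rule `order_date <= CURRENT_DATE` in Python. The helper `future_orders` and the sample orders are hypothetical, and the reference date is passed in explicitly so the result is reproducible.

```python
from datetime import date


def future_orders(orders, today):
    """Return orders whose order_date violates order_date <= today."""
    return [o for o in orders if o["order_date"] > today]


orders = [
    {"order_id": 1, "order_date": date(2024, 5, 1)},
    {"order_id": 2, "order_date": date(2031, 1, 1)},
]
violations = future_orders(orders, today=date(2024, 6, 1))
print(violations)
```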
Q2
Difficulty: hard

A data engineer is designing a data pipeline to ingest streaming data from IoT sensors. The sensors send data every second, and the pipeline must handle bursts of up to 10,000 messages per second. Which approach is most appropriate for capturing this data before processing?

A

Directly write each message to a relational database

B

Load directly into a data warehouse

C

Use a message queue to buffer the incoming data

A message queue handles high throughput and provides reliable buffering.

D

Store data in flat files and process in nightly batches

Why: Option C is correct because a message queue (e.g., Apache Kafka, Amazon Kinesis, or RabbitMQ) provides an asynchronous buffer that decouples the high-velocity ingestion (up to 10,000 messages/second) from downstream processing. This allows the pipeline to absorb burst traffic without overwhelming the processing layer, ensures data durability, and supports replayability in case of failures.
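The buffering idea can be illustrated with Python's standard `queue` and `threading` modules. This is only an in-process stand-in for a real broker such as Kafka or Kinesis: it shows the decoupling of producer and consumer, not durability or replay. The message shape and the doubling step are made up.

```python
import queue
import threading

buffer = queue.Queue(maxsize=20_000)  # bounded buffer between ingest and processing
processed = []
SENTINEL = None  # marks the end of the stream


def consumer():
    """Drain the buffer at the consumer's own pace."""
    while True:
        msg = buffer.get()
        if msg is SENTINEL:
            break
        processed.append(msg["reading"] * 2)  # stand-in for real processing


worker = threading.Thread(target=consumer)
worker.start()

# Simulate a burst of 10,000 sensor messages arriving at once.
for i in range(10_000):
    buffer.put({"sensor_id": i % 100, "reading": i})
buffer.put(SENTINEL)
worker.join()
print(len(processed))
```

The producer never waits on the processing step unless the buffer itself fills up, which is the property that lets a pipeline absorb bursts.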
Q3
Difficulty: easy

A data analyst needs to combine two datasets: one contains customer information (customer_id, name, address) and the other contains order information (order_id, customer_id, order_date). The analyst wants to include all customers, even those who have not placed orders. Which type of join should be used?

A

FULL OUTER JOIN

B

INNER JOIN

C

LEFT JOIN

LEFT JOIN includes all customers, with order data where available.

D

RIGHT JOIN

Why: A LEFT JOIN returns all rows from the left table (customers) and the matching rows from the right table (orders). If a customer has no orders, the order columns will contain NULLs. This satisfies the requirement to include all customers, even those without orders.
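The behaviour described above can be checked with a small in-memory SQLite example. The table names `customers` and `orders` and the sample rows are assumptions; the columns follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1, 'Ana', '1 Main St'), (2, 'Ben', '2 Oak Ave');
INSERT INTO orders VALUES (100, 1, '2024-01-05');
""")

# Every customer appears; Ben has no orders, so his order_id is NULL (None).
rows = conn.execute("""
SELECT c.customer_id, c.name, o.order_id
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
ORDER BY c.customer_id
""").fetchall()
print(rows)
```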
Q4
Difficulty: medium

A data analyst is tasked with extracting data from a legacy system that outputs fixed-width text files. The analyst needs to parse these files into a structured format. Which tool or method is most appropriate for this task?

A

A spreadsheet application

B

An ETL tool with a graphical interface

C

A scripting language such as Python

Python provides libraries and string manipulation ideal for parsing fixed-width files.

D

SQL

Why: Python is the most appropriate choice because fixed-width text files require precise column slicing based on character positions, which Python's string slicing and tools such as the `struct` module or `pandas.read_fwf` handle directly. Unlike graphical ETL tools or spreadsheets, Python provides programmatic control to define exact field widths, handle edge cases like padding or truncated lines, and process large files efficiently without manual intervention.
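A minimal sketch of position-based parsing with plain string slicing. The field layout, field names, and sample lines are hypothetical; a real legacy file would come with its own layout specification.

```python
# Hypothetical layout: (field name, start, end) as 0-based character offsets.
LAYOUT = [("customer_id", 0, 5), ("name", 5, 15), ("balance", 15, 23)]


def parse_line(line):
    """Slice one fixed-width line into named fields and trim the padding."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


# Build sample lines the way a legacy system might pad them.
lines = [
    "00042" + "Ana Lopez".ljust(10) + "00120.50",
    "00043" + "Ben Wu".ljust(10) + "00003.00",
]
records = [parse_line(line) for line in lines]
print(records)
```

`pandas.read_fwf` covers the same ground for whole files when pandas is available.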
Q5
Difficulty: hard

A company is merging two databases from different departments. In Database A, customer IDs are integers. In Database B, customer IDs are alphanumeric strings. To merge, the data analyst must reconcile these differences. Which step should be taken first?

A

Drop the ID column and use a surrogate key

B

Convert all IDs to integers using CAST

C

Perform data profiling to understand the ID formats and relationships

Profiling helps determine the best strategy for reconciliation.

D

Create a mapping table based on the first character

Why: Option C is correct because data profiling is the essential first step before any transformation or mapping. It allows the analyst to examine the actual formats, patterns, and relationships in both ID columns (e.g., whether Database B's alphanumeric IDs contain embedded numeric sequences or consistent prefixes). Without profiling, any conversion or mapping would be based on assumptions that could lead to data loss or incorrect merges.
Q6
Difficulty: easy

A data analyst needs to extract data from an API that returns JSON. The analyst wants to convert the JSON output into a tabular format for analysis. Which function in a scripting language is commonly used for this purpose?

A

json.loads()

B

to_csv()

C

read_json()

D

json_normalize()

This function normalizes semi-structured JSON data into a flat table.

Why: Option D is correct because `json_normalize()` is a function in the pandas library specifically designed to flatten semi-structured JSON data (including nested lists and dictionaries) into a tabular DataFrame. This makes it the ideal tool for converting API responses with complex nesting into rows and columns for analysis, unlike simpler JSON parsing functions.
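To show what "flattening" means, here is a hand-rolled approximation of the nested-dictionary case using only the standard library. It is not the pandas implementation, and the `flatten` helper and sample payload are made up; with pandas installed, `pandas.json_normalize` would be the usual choice.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dictionaries into one level with dotted column names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


payload = (
    '[{"id": 1, "user": {"name": "Ana", "address": {"city": "Lima"}}},'
    ' {"id": 2, "user": {"name": "Ben", "address": {"city": "Oslo"}}}]'
)
rows = [flatten(item) for item in json.loads(payload)]
print(rows)
```

Each resulting dictionary has the same flat keys, so the list maps directly onto rows and columns.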


Domain 3: Analyzing and Modeling Data

Q1
Difficulty: easy

A data analyst needs to identify the most frequently occurring value in a dataset. Which measure of central tendency should they use?

A

Mode

Mode is the most frequently occurring value.

B

Standard deviation

C

Median

D

Mean

Why: The mode is the measure of central tendency that identifies the most frequently occurring value in a dataset. Unlike the mean or median, the mode directly counts the frequency of each distinct value and returns the value with the highest count, making it the correct choice for this specific requirement.
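A quick comparison of the three measures using Python's `statistics` module, on made-up values chosen so that one large value pulls the mean away from the mode.

```python
import statistics

values = [2, 3, 3, 3, 10]

mode_value = statistics.mode(values)      # most frequent value
median_value = statistics.median(values)  # middle value when sorted
mean_value = statistics.mean(values)      # arithmetic average

print(mode_value, median_value, mean_value)
```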
Q2
Difficulty: medium

A retail company wants to predict future sales based on historical data. Which modeling approach is most appropriate if the data shows a clear seasonal pattern?

A

Linear regression

B

Time series analysis

Time series analysis explicitly models seasonal patterns.

C

K-means clustering

D

Logistic regression

Why: Time series analysis is specifically designed to model data points indexed in time order, making it ideal for capturing and forecasting seasonal patterns. Unlike regression models, it accounts for autocorrelation, trends, and seasonality components, which are critical for accurate sales prediction from historical data.
Q3
Difficulty: hard

A data analyst is building a model to predict customer churn. The dataset has 10,000 records with 500 churned customers. The model predicts churn with 95% accuracy, but only identifies 10% of actual churners. Which metric best highlights this issue?

A

Accuracy

B

F1 score

C

Recall

Recall is low (10%), showing the model fails to detect churners.

D

Precision

Why: Recall (also known as sensitivity or true positive rate) measures the proportion of actual positives correctly identified. With only 10% of actual churners detected, the model has a recall of 0.1, which directly highlights the failure to capture churners despite high overall accuracy.
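The numbers in the question pin down the whole confusion matrix, which the sketch below works through. With 500 churners and 10% detected, there are 50 true positives and 450 false negatives; 95% accuracy on 10,000 records means 500 errors in total, so the remaining 50 errors must be false positives.

```python
total = 10_000
actual_churners = 500

tp = 50                      # 10% of 500 churners detected
fn = actual_churners - tp    # 450 churners missed
errors = total - 9_500       # 95% accuracy leaves 500 wrong predictions
fp = errors - fn             # 50 non-churners flagged by mistake
tn = total - tp - fn - fp    # 9,450 non-churners correctly ignored

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, recall, precision, round(f1, 3))
```

Accuracy looks strong because non-churners dominate the dataset, while recall exposes how few churners are actually caught.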
Q4
Difficulty: easy

A data analyst needs to combine two datasets that have the same columns but different rows. Which operation should they use?

A

Concatenate

B

Append

Append adds rows from one dataset to another with same columns.

C

Merge

D

Aggregate

Why: Option B (Append) is correct because appending is the standard operation for combining two datasets with identical columns but different rows, stacking the rows from one dataset onto the other. In SQL, this is achieved with the UNION ALL operator (or UNION, which also removes duplicate rows). In Python pandas, it is done with `pd.concat()` along axis=0; the older `DataFrame.append()` method was deprecated in pandas 1.4 and removed in pandas 2.0. This operation preserves the column structure while extending the row count.
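The SQL form of appending can be tried with an in-memory SQLite database. The table names and rows below are invented; one row is deliberately duplicated across the two tables to show how UNION ALL and UNION differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_jan (customer_id INTEGER, amount REAL);
CREATE TABLE sales_feb (customer_id INTEGER, amount REAL);
INSERT INTO sales_jan VALUES (1, 10.0), (2, 20.0);
INSERT INTO sales_feb VALUES (2, 20.0), (3, 30.0);
""")

# UNION ALL stacks every row from both tables.
appended = conn.execute(
    "SELECT customer_id, amount FROM sales_jan"
    " UNION ALL SELECT customer_id, amount FROM sales_feb"
).fetchall()

# UNION stacks the rows and then drops exact duplicates.
deduplicated = conn.execute(
    "SELECT customer_id, amount FROM sales_jan"
    " UNION SELECT customer_id, amount FROM sales_feb"
).fetchall()

print(len(appended), len(deduplicated))
```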
Q5
Difficulty: medium

A data analyst is performing a hypothesis test with a significance level of 0.05. The p-value obtained is 0.03. What should the analyst conclude?

A

Reject the null hypothesis

p < alpha indicates statistically significant result.

B

Fail to reject the null hypothesis

C

Accept the null hypothesis

D

The result is practically significant

Why: Since the p-value (0.03) is less than the significance level (0.05), the result is statistically significant. This means the observed data provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. The analyst should conclude that there is a statistically significant effect or difference.
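The decision rule can be written out directly. In this sketch the p-value comes from a two-sided z-test with a hypothetical test statistic of 2.17, chosen because it gives a p-value close to the 0.03 in the question; the question itself does not say which test was used.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
p_value = two_sided_p(2.17)  # hypothetical z statistic

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 3), decision)
```

Rejecting the null hypothesis says the result is statistically significant at the chosen level; it says nothing on its own about whether the effect is large enough to matter in practice.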
Q6
Difficulty: hard

A data scientist trains a regression model and observes high variance with low bias. Which technique is most appropriate to reduce variance?

A

Apply Ridge regularization

Ridge adds penalty to coefficients, reducing overfitting and variance.

B

Increase polynomial features

C

Use a smaller training set

D

Remove correlated features

Why: Ridge regularization (L2) reduces variance by adding a penalty term proportional to the square of the coefficients, which shrinks them toward zero without eliminating them. This directly addresses high variance (overfitting) by constraining the model's complexity, while low bias indicates the model fits the training data well. The regularization parameter λ controls the trade-off between bias and variance.
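The shrinkage effect is easiest to see in the simplest possible case: one feature and no intercept, where minimizing Σ(y − wx)² + λw² gives the closed form w = Σxy / (Σx² + λ). The sketch below uses that special case with made-up data; real models would use a library implementation.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge solution for a single feature with no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_ols = ridge_slope(xs, ys, lam=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_slope(xs, ys, lam=14.0)  # penalty shrinks the coefficient

print(w_ols, w_ridge)
```

A larger λ pulls the coefficient further toward zero, trading a little bias for lower variance.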


Domain 4: Visualizing Data

Q1
Difficulty: easy

A data analyst is creating a dashboard to monitor server CPU utilization over the past 24 hours. Which chart type is most appropriate for showing the trend of CPU usage over time?

A

Scatter plot

B

Pie chart

C

Line chart

Line charts display trends over time effectively.

D

Bar chart

Why: A line chart is the most appropriate choice for displaying CPU utilization trends over a continuous 24-hour period because it connects data points in chronological order, making it easy to observe peaks, valleys, and overall patterns. The x-axis represents time (hours), and the y-axis represents CPU usage percentage, allowing the analyst to quickly identify when utilization spikes or drops. This aligns with the DA0-001 objective of selecting the correct visualization for time-series data.
Q2
Difficulty: medium

A sales dashboard shows monthly revenue but the bars are very tall for some months and very short for others, making comparisons difficult. Which visualization modification would best improve readability?

A

Change bar colors to gradient

B

Apply a logarithmic scale on the y-axis

Log scale compresses wide ranges so differences are visible.

C

Add more horizontal gridlines

D

Use a 3D bar chart for depth

Why: A logarithmic scale compresses the y-axis so that equal distances represent equal ratios rather than equal differences, making it easier to compare relative changes across months with vastly different revenue figures. This is particularly useful when the data spans several orders of magnitude, as it prevents tall bars from dominating the view and short bars from being barely visible.
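The compression is easy to see numerically. In this sketch the monthly revenue figures are invented and span three orders of magnitude.

```python
import math

revenue = {"Jan": 1_000, "Feb": 10_000, "Mar": 1_000_000}

# Position of each bar top on a log10 axis.
log_heights = {month: math.log10(value) for month, value in revenue.items()}

linear_ratio = revenue["Mar"] / revenue["Jan"]       # tallest vs shortest, linear axis
log_ratio = log_heights["Mar"] / log_heights["Jan"]  # same comparison on a log axis

print(log_heights)
print(linear_ratio, log_ratio)
```

On a linear axis the largest bar is a thousand times taller than the smallest; on the log axis it is only about twice as tall, so all three remain readable.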
Q3
Difficulty: hard

A data analyst creates a bubble chart showing country GDP (x-axis), life expectancy (y-axis), and population (bubble size). However, large bubbles overlap and obscure many data points. Which corrective action should the analyst take?

A

Increase the chart canvas size

B

Set bubble opacity to 70%

Transparency allows seeing through overlapping bubbles.

C

Reduce all bubble sizes uniformly

D

Remove outlier countries with large populations

Why: Setting bubble opacity to 70% allows overlapping bubbles to become semi-transparent, so data points underneath remain visible. This technique preserves the original data representation (GDP, life expectancy, and population) without altering the chart's scale or removing data. It is a standard visualization practice for handling overplotting in dense scatter plots and bubble charts.
Q4
Difficulty: easy

An analyst wants to show the distribution of test scores for 500 students. Which visualization type is best for understanding the shape of the distribution?

A

Line chart

B

Pie chart

C

Scatter plot

D

Histogram

Histograms display frequency distribution of numerical data.

Why: A histogram is the correct choice because it groups continuous test scores into bins and displays the frequency of scores within each bin, allowing the analyst to see the shape of the distribution (e.g., normal, skewed, bimodal). This directly addresses the goal of understanding distribution shape, which is a core use case for histograms in data visualization.
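The binning step behind a histogram can be sketched with `collections.Counter`. The ten scores and the bin width of 10 are arbitrary choices for illustration; a plotting library would normally do both the binning and the drawing.

```python
from collections import Counter

scores = [55, 62, 67, 71, 74, 78, 79, 83, 88, 95]
BIN_WIDTH = 10  # arbitrary bin width for this sketch

# Map each score to the lower edge of its bin, then count per bin.
counts = Counter((score // BIN_WIDTH) * BIN_WIDTH for score in scores)

# Text rendering: one '#' per score in the bin.
for start in sorted(counts):
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * counts[start]}")
```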
Q5
Difficulty: medium

A dashboard shows sales by region using a map with color intensity. Users complain that two regions with very different sales appear nearly the same color. What is the most likely cause?

A

The map projection is distorted

B

The color scale uses a sequential palette with insufficient contrast

Sequential palettes can have low perceptual difference between adjacent values.

C

The monitor resolution is too low

D

Users are color blind

Why: The issue is that the color scale uses a sequential palette with insufficient contrast between adjacent data values. When the color gradient is too narrow or uses similar hues, regions with significantly different sales figures map to nearly identical colors, making the visualization ineffective. This is a common problem when the color mapping does not span the full range of the data, or when most values are squeezed into a narrow part of the palette.
Q6
Difficulty: hard

An analyst creates a stacked bar chart showing quarterly sales by product category. The chart becomes hard to read because some categories have very small contributions. Which redesign is most effective?

A

Combine small categories into an 'Other' group

Grouping small items simplifies the chart and improves readability.

B

Change to a pie chart for each quarter

C

Increase the width of each bar

D

Switch to a 3D stacked column chart

Why: Combining small categories into an 'Other' group reduces visual clutter and improves readability by aggregating negligible contributions into a single bar segment. This technique preserves the overall trend while eliminating the noise from many tiny slices that make the stacked bar chart hard to interpret.
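A minimal sketch of the grouping step, assuming a 5% share threshold and made-up sales figures. Both the threshold and the helper name `group_small` are illustrative choices, not a fixed rule.

```python
def group_small(sales, threshold=0.05):
    """Fold categories below the share threshold into a single 'Other' entry."""
    total = sum(sales.values())
    grouped = {}
    other = 0.0
    for category, amount in sales.items():
        if amount / total < threshold:
            other += amount
        else:
            grouped[category] = amount
    if other:
        grouped["Other"] = other
    return grouped


sales = {"Laptops": 500.0, "Phones": 400.0, "Cables": 30.0, "Cases": 40.0, "Stickers": 30.0}
grouped = group_small(sales)
print(grouped)
```

The total is unchanged, so each bar keeps its full height while the number of segments drops.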


Domain 5: Communicating Data Insights

Q1
Difficulty: medium

A data analyst notices that a line chart showing monthly sales over the past two years has a steep drop in one month. Upon investigation, the analyst discovers that a new sales region was added mid-month and the data was not normalized. Which of the following best practices should the analyst apply to communicate this insight accurately?

A

Remove the month with the drop from the report.

B

Use a bar chart instead to show the data.

C

Normalize the sales data by region and explain the data anomaly in the report.

Normalization corrects the artifact, and explanation provides transparency.

D

Present the data as-is and let stakeholders interpret the drop.

Why: Option C is correct because the core issue is that the sales data is not normalized by region, causing a misleading drop when a new region was added mid-month. By normalizing the data (e.g., calculating per-region averages or percentages) and explicitly noting the anomaly in the report, the analyst ensures accurate communication of insights. This aligns with the DA0-001 domain of Communicating Data Insights, where transparency and data integrity are paramount.
Q2
Difficulty: hard

A data team is preparing a dashboard for executives. The team wants to highlight key performance indicators (KPIs) that are below target. Which of the following visualization techniques would most effectively draw attention to underperforming metrics without causing confusion?

A

Remove underperforming KPIs from the dashboard to avoid confusion.

B

Use a scatter plot to show the relationship between KPIs.

C

Apply conditional formatting to turn KPI values red when below target.

Red highlights call attention to issues immediately.

D

Use a pie chart showing the proportion of each KPI.

Why: Conditional formatting that turns KPI values red when below target is the most effective technique because it leverages pre-attentive visual processing — the human eye naturally notices color changes (especially red) before other visual elements. This allows executives to instantly identify underperforming metrics without needing to interpret complex chart types, reducing cognitive load and confusion.
Q3
Difficulty: easy

A data analyst needs to present the distribution of customer ages to a non-technical audience. Which type of chart would be most appropriate?

A

Scatter plot

B

Histogram

Histograms show distribution of continuous data.

C

Pie chart

D

Line chart

Why: A histogram is the most appropriate chart for displaying the distribution of a single continuous variable, such as customer ages, to a non-technical audience. It groups ages into bins and shows the frequency of customers within each bin, making the shape, center, and spread of the distribution immediately visible. This aligns with the DA0-001 objective of selecting visualizations that clearly communicate data insights to stakeholders.
Q4
Difficulty: medium

A data analyst creates a report showing sales by product category. The analyst notices that one category has a very high sales figure due to a one-time bulk order. Which of the following is the best way to communicate this insight to stakeholders?

A

Delete the bulk order from the dataset.

B

Add a note to the chart explaining the bulk order.

Annotation provides context for the anomaly.

C

Remove the category with the bulk order from the report.

D

Use a pie chart to show the proportion of each category.

Why: Option B is correct because it maintains data integrity while providing necessary context. Adding a note to the chart allows stakeholders to understand the anomaly without distorting the underlying data. This approach aligns with best practices in data communication, where transparency about outliers is critical for accurate interpretation.
Q5
Difficulty: hard

A data analyst is building a dashboard that will be used by both executives and operational managers. The executives need high-level summaries, while managers need granular details. Which dashboard design principle should the analyst apply?

A

Use a single chart that shows both summary and detail simultaneously.

B

Display all available data on one page for transparency.

C

Design the dashboard with drill-down capabilities from summary to detail.

Drill-down allows executives to see overview and managers to access details on demand.

D

Create two separate dashboards for each audience.

Why: Option C is correct because drill-down capabilities allow users to start with a high-level summary (e.g., total revenue by region) and then interactively navigate to granular details (e.g., individual transactions) without overwhelming either audience. This design principle supports both executive and operational manager needs within a single dashboard, maintaining clarity and performance by loading only the required level of detail on demand.
Q6
Difficulty: easy

A data analyst wants to compare the sales performance of four different stores over the same time period. Which chart type is most suitable?

A

Line chart with multiple lines

B

Grouped bar chart

Grouped bars allow side-by-side comparison of stores.

C

Stacked bar chart

D

Pie chart with multiple pies

Why: A grouped bar chart is the most suitable choice because it allows direct comparison of discrete categories (four stores) across a common time period, with each group representing a time interval and individual bars representing each store's sales. This chart type excels at side-by-side comparisons of multiple entities over the same categorical axis, making it ideal for the analyst's goal.


Frequently asked questions

How many questions are on the DA0-001 exam?

The DA0-001 exam has 90 questions and must be completed in 90 minutes. The passing score is 675/1000.

What types of questions appear on the DA0-001 exam?

Multiple-choice and performance-based questions covering data concepts, data mining and acquisition, analysis and modeling, visualization, and communicating data insights. Performance-based questions (PBQs) ask you to complete tasks in a simulated environment.

How are DA0-001 questions organised by domain?

The exam covers 5 domains: Comparing and Contrasting Data Concepts, Mining and Acquiring Data, Analyzing and Modeling Data, Visualizing Data, Communicating Data Insights. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual DA0-001 exam questions?

No. These are original exam-style practice questions written against the official CompTIA DA0-001 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Ready to practice all 90 DA0-001 questions?

Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.
