DA0-001 Exam Questions and Answers

Domain 1: Comparing and Contrasting Data Concepts

A retail company stores customer purchase history in a relational database. The database contains a table 'transactions' with columns: transaction_id, customer_id, product_id, quantity, price, and transaction_date. A data analyst needs to create a report that shows total revenue per customer for the last quarter. Which data concept describes the relationship between customer_id and total revenue?

Foreign key

Composite attribute

Derived attribute

Total revenue is calculated from other attributes, making it derived.

Atomic attribute

Why: Total revenue is calculated by summing (quantity * price) for each customer, making it a derived attribute because it is computed from existing stored data (quantity and price) rather than stored directly. In the context of the 'transactions' table, customer_id is a stored key, but total_revenue is not stored; it is derived via aggregation, which matches the definition of a derived attribute in database design.
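
The aggregation described above can be sketched with Python's built-in sqlite3 module. The sample rows and the quarter boundaries are assumptions made up for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "transaction_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER, "
    "quantity INTEGER, price REAL, transaction_date TEXT)"
)
# Hypothetical sample rows.
rows = [
    (1, 101, 7, 2, 10.0, "2024-01-15"),
    (2, 101, 8, 1, 25.0, "2024-02-03"),
    (3, 102, 7, 3, 10.0, "2024-03-20"),
    (4, 102, 9, 1, 99.0, "2023-11-30"),  # outside the quarter, so it is excluded
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)", rows)

# total_revenue is never stored; it is derived at query time from quantity and price.
query = """
    SELECT customer_id, SUM(quantity * price) AS total_revenue
    FROM transactions
    WHERE transaction_date BETWEEN ? AND ?
    GROUP BY customer_id
    ORDER BY customer_id
"""
revenue_per_customer = conn.execute(query, ("2024-01-01", "2024-03-31")).fetchall()
print(revenue_per_customer)
```

Because the value is computed on demand, it always reflects the current contents of the quantity and price columns.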

A healthcare database stores patient records. Each patient has a unique patient_id, and the database includes a table 'visits' with visit_id, patient_id, visit_date, and diagnosis_code. To ensure data integrity, which constraint should be applied to the patient_id column in the 'visits' table?

Unique constraint

Foreign key

Foreign key enforces referential integrity.

Primary key

Check constraint

Why: Option B is correct because a foreign key constraint ensures that every patient_id in 'visits' references a valid patient_id in the patient table. Option A (unique constraint) is wrong because it would only prevent duplicate values, and a patient can have many visits. Option C (primary key) is wrong because a primary key enforces uniqueness within its own table, and the primary key of 'visits' is visit_id. Option D (check constraint) is wrong because it validates values against a condition rather than against rows in another table.
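
As a rough illustration of referential integrity, the sketch below uses Python's sqlite3 module. The 'patients' table name and the sample values are assumptions; the question only names the 'visits' table. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE visits ("
    "visit_id INTEGER PRIMARY KEY, "
    "patient_id INTEGER NOT NULL REFERENCES patients(patient_id), "
    "visit_date TEXT, diagnosis_code TEXT)"
)
conn.execute("INSERT INTO patients VALUES (1, 'Example Patient')")
conn.execute("INSERT INTO visits VALUES (10, 1, '2024-05-01', 'J10')")  # parent row exists

try:
    # patient_id 999 has no matching row in patients, so this insert should fail.
    conn.execute("INSERT INTO visits VALUES (11, 999, '2024-05-02', 'J11')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

visit_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(orphan_rejected, visit_count)
```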

A data engineer is designing a data warehouse for a multinational corporation. The company has sales data from different regions with varying currencies and date formats. To ensure consistency, which data concept should be applied to standardize the data before loading into the warehouse?

Data cleansing

Data transformation

Transformation includes standardization of formats.

Data profiling

Data masking

Why: Data transformation is the correct concept because it involves converting data from source formats (e.g., different currencies and date formats) into a consistent, standardized format before loading into the data warehouse. This process includes applying conversion rules, such as using ISO 8601 for dates and a single base currency (e.g., USD) with exchange rate tables, ensuring uniformity across all regional data. Without transformation, the warehouse would mix incompatible formats and currencies, so amounts from different regions could not be aggregated or compared reliably in analytical queries.
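
A minimal sketch of such a transformation step follows. The regional date formats, the exchange rates, and the record layout are all hypothetical values chosen for illustration, not real rates or a real schema.

```python
from datetime import datetime

# Hypothetical regional date formats and exchange rates, for illustration only.
DATE_FORMATS = {"US": "%m/%d/%Y", "DE": "%d.%m.%Y", "JP": "%Y/%m/%d"}
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "JPY": 0.007}

def transform(record):
    """Return the record with an ISO 8601 date and an amount in the base currency."""
    parsed = datetime.strptime(record["date"], DATE_FORMATS[record["region"]])
    return {
        "sale_date": parsed.date().isoformat(),
        "amount_usd": round(record["amount"] * RATES_TO_USD[record["currency"]], 2),
    }

raw = [
    {"region": "US", "date": "03/31/2024", "amount": 100.0, "currency": "USD"},
    {"region": "DE", "date": "31.03.2024", "amount": 100.0, "currency": "EUR"},
    {"region": "JP", "date": "2024/03/31", "amount": 10000.0, "currency": "JPY"},
]
standardized = [transform(r) for r in raw]
print(standardized)
```

After the step, all three records carry the same date string and comparable amounts, which is what makes cross-region aggregation possible.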

An e-commerce company uses a star schema for its data warehouse. The fact table 'sales_fact' contains foreign keys to dimension tables: customer_dim, product_dim, time_dim, and store_dim. A business user wants to know the total sales for each product category in the last month. Which join operation is required to retrieve this data?

Self-join on the fact table

Cross join between fact and dimension tables

Inner join between fact table and dimension tables

Inner join returns only matching rows, which is typical in star schema queries.

Left outer join between fact and dimension tables

Why: To retrieve total sales for each product category, you need to join the fact table with the product dimension table to map product keys to categories, and with the time dimension table to filter on the last month. An inner join is correct because it returns only rows where matching keys exist in both tables, which is the standard approach for star-schema queries where all required dimension attributes are present. This ensures that only valid sales transactions with corresponding product and time entries are included in the aggregation.
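
The query shape described above might look like the following sketch, run through Python's sqlite3 module. The key and measure column names (product_key, time_key, sales_amount, category) and the sample rows are assumptions; the question only names the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER, sales_amount REAL);
    INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO time_dim VALUES (202404, 2024, 4), (202405, 2024, 5);
    INSERT INTO sales_fact VALUES (1, 202405, 20.0), (1, 202405, 5.0),
                                  (2, 202405, 12.5), (2, 202404, 99.0);
""")

# Join the fact table to the two dimensions it needs, filter on the month, then aggregate.
query = """
    SELECT p.category, SUM(f.sales_amount) AS total_sales
    FROM sales_fact AS f
    INNER JOIN product_dim AS p ON f.product_key = p.product_key
    INNER JOIN time_dim AS t ON f.time_key = t.time_key
    WHERE t.year = ? AND t.month = ?
    GROUP BY p.category
    ORDER BY p.category
"""
sales_by_category = conn.execute(query, (2024, 5)).fetchall()
print(sales_by_category)
```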

A data analyst is working with a dataset containing customer information. The dataset includes a column 'full_name' which stores first and last names together. To perform analysis on first names separately, which data concept describes the process of splitting 'full_name' into 'first_name' and 'last_name'?

Data deduplication

Data summarization

Data normalization

Normalization reduces redundancy and breaks down attributes.

Data aggregation

Why: Option C is correct because data normalization is the process of organizing data to reduce redundancy and improve integrity, which includes splitting composite attributes like 'full_name' into atomic values ('first_name', 'last_name'). This aligns with the first normal form (1NF) principle in database design, where each column should contain indivisible values. The data analyst is decomposing a single field into multiple, more granular fields to enable separate analysis.
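
As a simple sketch of the split, the helper below (a hypothetical function, not a library call) divides on the first space. Real names are messier than this rule assumes, so treat it as an illustration rather than a robust parser.

```python
def split_full_name(full_name):
    """Split on the first space; everything after it is treated as the last name."""
    first, _, last = full_name.strip().partition(" ")
    return {"first_name": first, "last_name": last.strip()}

customers = ["Ada Lovelace", "Grace Brewster Hopper", "Prince"]
split_rows = [split_full_name(name) for name in customers]
print(split_rows)
```

The single-word name shows one edge case a real pipeline would have to decide on: here the last name is simply left empty.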

A data scientist is building a machine learning model to predict customer churn. The dataset includes both numerical features (age, income) and categorical features (gender, marital status). Which data concept describes the process of converting categorical features into numerical values that can be used by the algorithm?

Data sampling

Encoding

Encoding converts categories to numbers, e.g., one-hot encoding.

Feature scaling

Dimensionality reduction

Why: Encoding is the correct data concept because it transforms categorical features (like gender and marital status) into numerical representations (e.g., one-hot encoding, label encoding) that machine learning algorithms can process. Unlike feature scaling or dimensionality reduction, encoding directly addresses the incompatibility of non-numeric data with mathematical model operations.
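
One-hot encoding can be sketched in plain Python as below. The helper name and the sample customers are assumptions; in practice a library routine would usually do this work.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

customers = [
    {"age": 34, "marital_status": "married"},
    {"age": 27, "marital_status": "single"},
    {"age": 51, "marital_status": "divorced"},
]
encoded = one_hot_encode(customers, "marital_status")
print(encoded[0])
```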

Domain 2: Mining and Acquiring Data

A data analyst is pulling data from a production database for a report. The database contains customer orders with a column 'order_date'. The analyst notices that some orders have dates in the future. Which data quality issue does this represent?

Invalid data type

Inconsistent data

Missing data

Violation of business rules

Future orders are not valid per business rules, indicating a data quality issue.

Why: Option D is correct because future order dates violate a business rule that order_date must be in the past or present. This is a classic data integrity issue where the data does not conform to domain-specific constraints, such as 'order_date <= CURRENT_DATE'. The analyst should flag this as a violation of business rules, not a data type or consistency problem.
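
The rule 'order_date <= CURRENT_DATE' can be checked with a few lines of Python. The orders and the reference date are hypothetical; a fixed "today" is passed in so the check is reproducible.

```python
from datetime import date

def find_future_orders(orders, today):
    """Return the orders that break the rule order_date <= today."""
    return [order for order in orders if order["order_date"] > today]

orders = [
    {"order_id": 1, "order_date": date(2024, 5, 1)},
    {"order_id": 2, "order_date": date(2031, 1, 1)},  # dated in the future
]
violations = find_future_orders(orders, today=date(2024, 6, 1))
print([order["order_id"] for order in violations])
```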

A data engineer is designing a data pipeline to ingest streaming data from IoT sensors. The sensors send data every second, and the pipeline must handle bursts of up to 10,000 messages per second. Which approach is most appropriate for capturing this data before processing?

Directly write each message to a relational database

Load directly into a data warehouse

Use a message queue to buffer the incoming data

A message queue handles high throughput and provides reliable buffering.

Store data in flat files and process in nightly batches

Why: Option C is correct because a message queue (e.g., Apache Kafka, Amazon Kinesis, or RabbitMQ) provides an asynchronous buffer that decouples the high-velocity ingestion (up to 10,000 messages/second) from downstream processing. This allows the pipeline to absorb burst traffic without overwhelming the processing layer, ensures data durability, and supports replayability in case of failures.
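
The decoupling idea can be sketched with Python's standard queue and threading modules. This is only an in-process stand-in: unlike the brokers named above, it offers no durability or replay, and the message layout is an assumption.

```python
import queue
import threading

# In-memory buffer sized above the burst, so the producer never has to wait.
buffer = queue.Queue(maxsize=20_000)
processed = []

def sensor_burst(n_messages):
    """Producer: push a burst of readings into the buffer as fast as possible."""
    for i in range(n_messages):
        buffer.put({"sensor_id": i % 10, "reading": i})
    buffer.put(None)  # sentinel telling the consumer to stop

def consumer():
    """Consumer: drain the buffer at its own pace, independent of the producer."""
    while True:
        message = buffer.get()
        if message is None:
            break
        processed.append(message["reading"])

consumer_thread = threading.Thread(target=consumer)
producer_thread = threading.Thread(target=sensor_burst, args=(10_000,))
consumer_thread.start()
producer_thread.start()
producer_thread.join()
consumer_thread.join()
print(len(processed))
```

The producer finishes its burst without caring how quickly the consumer works, which is the property the explanation attributes to a message queue.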

A data analyst needs to combine two datasets: one contains customer information (customer_id, name, address) and the other contains order information (order_id, customer_id, order_date). The analyst wants to include all customers, even those who have not placed orders. Which type of join should be used?

FULL OUTER JOIN

INNER JOIN

LEFT JOIN

LEFT JOIN includes all customers, with order data where available.

RIGHT JOIN

Why: A LEFT JOIN returns all rows from the left table (customers) and the matching rows from the right table (orders). If a customer has no orders, the order columns will contain NULLs. This satisfies the requirement to include all customers, even those without orders.
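
A small sketch using Python's sqlite3 module shows the NULL behaviour. The sample customers and the single order are made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ana', '1 Main St'), (2, 'Ben', '2 Oak Ave');
    INSERT INTO orders VALUES (100, 1, '2024-05-01');
""")

# customers is the left table, so every customer appears even without an order.
result = conn.execute("""
    SELECT c.customer_id, c.name, o.order_id, o.order_date
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(result)
```

The customer without orders comes back with None (SQL NULL) in both order columns.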

A data analyst is tasked with extracting data from a legacy system that outputs fixed-width text files. The analyst needs to parse these files into a structured format. Which tool or method is most appropriate for this task?

A spreadsheet application

An ETL tool with a graphical interface

A scripting language such as Python

Python provides libraries and string manipulation ideal for parsing fixed-width files.

SQL

Why: Python is the most appropriate choice because fixed-width text files require precise column slicing based on character positions, which Python's string slicing, the standard-library `struct` module, and `pandas.read_fwf` all support. Unlike graphical ETL tools or spreadsheets, Python provides programmatic control to define exact field widths, handle edge cases like padded or truncated fields, and process large files efficiently without manual intervention.
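
A minimal slicing-based parser might look like this. The field layout (5, 10, and 8 characters) and the sample records are assumptions, since the question does not describe the legacy file.

```python
# Hypothetical layout: (field name, start offset, end offset).
FIELDS = [("customer_id", 0, 5), ("name", 5, 15), ("balance", 15, 23)]

def parse_line(line):
    """Slice one fixed-width line into named fields and strip the padding."""
    record = {name: line[start:end].strip() for name, start, end in FIELDS}
    record["balance"] = float(record["balance"])
    return record

# Sample lines are built with ljust/rjust so the column widths are exact.
sample_lines = [
    "00042" + "Ada Smith".ljust(10) + "1250.50".rjust(8),
    "00107" + "Bo Li".ljust(10) + "75.00".rjust(8),
]
parsed = [parse_line(line) for line in sample_lines]
print(parsed)
```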

A company is merging two databases from different departments. In Database A, customer IDs are integers. In Database B, customer IDs are alphanumeric strings. To merge, the data analyst must reconcile these differences. Which step should be taken first?

Drop the ID column and use a surrogate key

Convert all IDs to integers using CAST

Perform data profiling to understand the ID formats and relationships

Profiling helps determine the best strategy for reconciliation.

Create a mapping table based on the first character

Why: Option C is correct because data profiling is the essential first step before any transformation or mapping. It allows the analyst to examine the actual formats, patterns, and relationships in both ID columns (e.g., whether Database B's alphanumeric IDs contain embedded numeric sequences or consistent prefixes). Without profiling, any conversion or mapping would be based on assumptions that could lead to data loss or incorrect merges.
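
One simple profiling technique is to reduce each ID to its "shape" and count the shapes. The sketch below does that with the standard library; the sample IDs are invented for illustration.

```python
import re
from collections import Counter

def profile_pattern(value):
    """Reduce an ID to a shape: every digit becomes 9, every letter becomes A."""
    shape = re.sub(r"[0-9]", "9", str(value))
    return re.sub(r"[A-Za-z]", "A", shape)

db_a_ids = [1001, 1002, 20417]
db_b_ids = ["C1001", "C1002", "X-20417", "c1003"]

profile_a = Counter(profile_pattern(v) for v in db_a_ids)
profile_b = Counter(profile_pattern(v) for v in db_b_ids)
print(profile_a)
print(profile_b)
```

In this made-up data the profile suggests that most Database B IDs are a one-letter prefix plus a numeric part, with one hyphenated exception that a blanket conversion would have mishandled.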

A data analyst needs to extract data from an API that returns JSON. The analyst wants to convert the JSON output into a tabular format for analysis. Which function in a scripting language is commonly used for this purpose?

json.loads()

to_csv()

read_json()

json_normalize()

This function normalizes semi-structured JSON data into a flat table.

Why: Option D is correct because `json_normalize()` is a function in the pandas library specifically designed to flatten semi-structured JSON data (including nested lists and dictionaries) into a tabular DataFrame. This makes it the ideal tool for converting API responses with complex nesting into rows and columns for analysis, unlike simpler JSON parsing functions.
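
A short sketch of the flattening, assuming pandas 1.0 or later is installed. The nested response is a made-up example of what an API might return.

```python
import pandas as pd

# Hypothetical API response with nested dictionaries.
api_response = [
    {"id": 1, "customer": {"name": "Ana", "address": {"city": "Lisbon"}}},
    {"id": 2, "customer": {"name": "Ben", "address": {"city": "Oslo"}}},
]

# Nested keys are flattened into dot-separated column names.
df = pd.json_normalize(api_response)
print(list(df.columns))
print(df)
```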

Domain 3: Analyzing and Modeling Data

A data analyst needs to identify the most frequently occurring value in a dataset. Which measure of central tendency should they use?

Mode

Mode is the most frequently occurring value.

Standard deviation

Median

Mean

Why: The mode is the measure of central tendency that identifies the most frequently occurring value in a dataset. Unlike the mean or median, the mode directly counts the frequency of each distinct value and returns the value with the highest count, making it the correct choice for this specific requirement.
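
Python's standard statistics module provides this directly. The ratings below are sample data.

```python
from statistics import mode, multimode

ratings = [4, 5, 3, 4, 2, 4, 5]
most_common = mode(ratings)  # the single most frequent value

# multimode returns every value tied for the highest count.
tied = multimode([1, 1, 2, 2, 3])
print(most_common, tied)
```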

A retail company wants to predict future sales based on historical data. Which modeling approach is most appropriate if the data shows a clear seasonal pattern?

Linear regression

Time series analysis

Time series analysis explicitly models seasonal patterns.

K-means clustering

Logistic regression

Why: Time series analysis is specifically designed to model data points indexed in time order, making it ideal for capturing and forecasting seasonal patterns. Unlike regression models, it accounts for autocorrelation, trends, and seasonality components, which are critical for accurate sales prediction from historical data.
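
As a rough illustration of using seasonality, the sketch below implements a seasonal naive forecast, one of the simplest time series baselines: each future period repeats the value from the same position in the last observed season. The quarterly figures are invented, and real forecasting would normally use a fuller model.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future period with the value from the same point in the last season."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Two years of hypothetical quarterly sales with a peak in the fourth quarter.
quarterly_sales = [100, 120, 90, 200, 110, 130, 95, 220]
forecast = seasonal_naive_forecast(quarterly_sales, season_length=4, horizon=4)
print(forecast)
```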

A data analyst is building a model to predict customer churn. The dataset has 10,000 records with 500 churned customers. The model predicts churn with 95% accuracy, but only identifies 10% of actual churners. Which metric best highlights this issue?

Accuracy

F1 score

Recall

Recall is low (10%), showing the model fails to detect churners.

Precision

Why: Recall (also known as sensitivity or true positive rate) measures the proportion of actual positives correctly identified. With only 10% of actual churners detected, the model has a recall of 0.1, which directly highlights the failure to capture churners despite high overall accuracy.
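
The figures in the question are enough to reconstruct the full confusion matrix, which makes the gap between accuracy and recall concrete. The sketch below assumes the stated 95% accuracy and 10% recall are exact.

```python
total, churners = 10_000, 500
recall, accuracy = 0.10, 0.95

tp = round(churners * recall)        # churners the model catches
fn = churners - tp                   # churners the model misses
tn = round(total * accuracy) - tp    # correct predictions that are not true positives
fp = (total - churners) - tn         # non-churners wrongly flagged

precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

# A model that always predicts "no churn" reaches the same accuracy.
baseline_accuracy = (total - churners) / total
print(tp, fn, tn, fp, precision, round(f1, 3), baseline_accuracy)
```

Under these assumptions the model misses 450 of the 500 churners, and a model that never predicts churn would match its 95% accuracy, which is why accuracy alone hides the problem.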

A data analyst needs to combine two datasets that have the same columns but different rows. Which operation should they use?

Concatenate

Append

Append adds rows from one dataset to another with same columns.

Merge

Aggregate

Why: Option B (Append) is correct because appending is the standard operation for combining two datasets with identical columns but different rows, stacking the rows from one dataset onto the other. In SQL, this is achieved with the UNION ALL operator (or UNION, which also removes duplicate rows), and in Python pandas it is done with `pd.concat()` along axis=0; the older `DataFrame.append()` method was removed in pandas 2.0. This operation preserves the column structure while extending the row count.
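
The SQL form of an append can be sketched with Python's sqlite3 module. The two monthly tables and their rows are hypothetical; one row is deliberately repeated to show how UNION ALL and UNION differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_jan (order_id INTEGER, amount REAL);
    CREATE TABLE sales_feb (order_id INTEGER, amount REAL);
    INSERT INTO sales_jan VALUES (1, 10.0), (2, 20.0);
    INSERT INTO sales_feb VALUES (3, 30.0), (2, 20.0);
""")

# UNION ALL stacks every row from both tables, keeping duplicates.
appended = conn.execute(
    "SELECT order_id, amount FROM sales_jan "
    "UNION ALL SELECT order_id, amount FROM sales_feb"
).fetchall()

# UNION stacks the rows and then removes exact duplicates.
deduplicated = conn.execute(
    "SELECT order_id, amount FROM sales_jan "
    "UNION SELECT order_id, amount FROM sales_feb ORDER BY order_id"
).fetchall()
print(len(appended), deduplicated)
```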

A data analyst is performing a hypothesis test with a significance level of 0.05. The p-value obtained is 0.03. What should the analyst conclude?

Reject the null hypothesis

p < alpha indicates statistically significant result.

Fail to reject the null hypothesis

Accept the null hypothesis

The result is practically significant

Why: Since the p-value (0.03) is less than the significance level (0.05), the result is statistically significant. This means the observed data provides sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. The analyst should conclude that there is a statistically significant effect or difference.
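
The decision rule can be sketched with a two-sided z-test from the standard library. The test statistic of 2.17 is a hypothetical value chosen because it yields a p-value close to the 0.03 in the question; the question itself does not say which test was used.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
z_statistic = 2.17  # hypothetical test statistic
p_value = two_sided_p_value(z_statistic)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 3), decision)
```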

A data scientist trains a regression model and observes high variance with low bias. Which technique is most appropriate to reduce variance?

Apply Ridge regularization

Ridge adds penalty to coefficients, reducing overfitting and variance.

Increase polynomial features

Use a smaller training set

Remove correlated features

Why: Ridge regularization (L2) reduces variance by adding a penalty term proportional to the square of the coefficients, which shrinks them toward zero without eliminating them. This directly addresses high variance (overfitting) by constraining the model's complexity, while low bias indicates the model fits the training data well. The regularization parameter λ controls the trade-off between bias and variance.
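
The shrinkage effect is easy to see in the simplest possible case: one feature and no intercept, where minimizing the squared error plus λw² gives the closed form w = Σxy / (Σx² + λ). The data and the λ values below are made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y = w * x (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# lam = 0 is ordinary least squares; larger lam pulls the slope toward zero.
slopes = {lam: ridge_slope(xs, ys, lam) for lam in (0.0, 1.0, 14.0)}
print(slopes)
```

The slope shrinks as λ grows but never reaches exactly zero, matching the description of L2 regularization above.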

Domain 4: Visualizing Data

A data analyst is creating a dashboard to monitor server CPU utilization over the past 24 hours. Which chart type is most appropriate for showing the trend of CPU usage over time?

Scatter plot

Pie chart

Line chart

Line charts display trends over time effectively.

Bar chart

Why: A line chart is the most appropriate choice for displaying CPU utilization trends over a continuous 24-hour period because it connects data points in chronological order, making it easy to observe peaks, valleys, and overall patterns. The x-axis represents time (hours), and the y-axis represents CPU usage percentage, allowing the analyst to quickly identify when utilization spikes or drops. This aligns with the DA0-001 objective of selecting the correct visualization for time-series data.

A sales dashboard shows monthly revenue but the bars are very tall for some months and very short for others, making comparisons difficult. Which visualization modification would best improve readability?

Change bar colors to gradient

Apply a logarithmic scale on the y-axis

Log scale compresses wide ranges so differences are visible.

Add more horizontal gridlines

Use a 3D bar chart for depth

Why: A logarithmic scale spaces the y-axis by orders of magnitude, so equal distances represent equal ratios rather than equal differences. This makes it easier to compare relative changes across months with vastly different revenue figures. It is particularly useful when the data spans several orders of magnitude, because tall bars no longer dominate the view and short bars remain visible.
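
The compression can be seen numerically without drawing anything. The revenue figures below are hypothetical and deliberately span three orders of magnitude.

```python
import math

monthly_revenue = {"Jan": 1_000, "Feb": 10_000, "Mar": 1_000_000}

# On a linear axis, bar height is proportional to the value itself.
linear_ratio = monthly_revenue["Mar"] / monthly_revenue["Jan"]

# On a log10 axis, bar height is proportional to the exponent.
log_heights = {month: math.log10(value) for month, value in monthly_revenue.items()}
log_ratio = log_heights["Mar"] / log_heights["Jan"]
print(linear_ratio, log_heights, log_ratio)
```

In this sample, March is 1,000 times taller than January on a linear axis but only about twice as tall on a logarithmic one, so all three bars stay readable.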

A data analyst creates a bubble chart showing country GDP (x-axis), life expectancy (y-axis), and population (bubble size). However, large bubbles overlap and obscure many data points. Which corrective action should the analyst take?

Increase the chart canvas size

Set bubble opacity to 70%

Transparency allows seeing through overlapping bubbles.

Reduce all bubble sizes uniformly

Remove outlier countries with large populations

Why: Setting bubble opacity to 70% allows overlapping bubbles to become semi-transparent, so data points underneath remain visible. This technique preserves the original data representation (GDP, life expectancy, and population) without altering the chart's scale or removing data. It is a standard visualization practice for handling overplotting in dense scatter plots and bubble charts.

An analyst wants to show the distribution of test scores for 500 students. Which visualization type is best for understanding the shape of the distribution?

Line chart

Pie chart

Scatter plot

Histogram

Histograms display frequency distribution of numerical data.

Why: A histogram is the correct choice because it groups continuous test scores into bins and displays the frequency of scores within each bin, allowing the analyst to see the shape of the distribution (e.g., normal, skewed, bimodal). This directly addresses the goal of understanding distribution shape, which is a core use case for histograms in data visualization.
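
The binning step behind a histogram can be sketched in a few lines, with a text rendering standing in for the chart. The scores and the bin width of 10 are assumptions for illustration.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))

scores = [52, 58, 61, 67, 68, 71, 73, 74, 75, 79, 82, 85, 91]
bins = histogram(scores, bin_width=10)

# Render each bin as a row of '#' characters to show the shape.
for start, count in bins.items():
    print(f"{start:3d}-{start + 9:3d} | {'#' * count}")
```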

A dashboard shows sales by region using a map with color intensity. Users complain that two regions with very different sales appear nearly the same color. What is the most likely cause?

The map projection is distorted

The color scale uses a sequential palette with insufficient contrast

Sequential palettes can have low perceptual difference between adjacent values.

The monitor resolution is too low

Users are color blind

Why: The most likely cause is a sequential color palette with insufficient contrast between adjacent data values. When the gradient covers a narrow range of lightness or hue, or when the color mapping does not span the full range of the data, regions with significantly different sales figures map to nearly identical colors, making the visualization ineffective.
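
One way this can happen is sketched below: a linear mapping from sales to a 0-255 color intensity, stretched by a single very large region. The region names, sales figures, and the mapping function are all assumptions made for illustration.

```python
def intensity(value, lo, hi):
    """Map a value linearly onto a 0-255 color intensity."""
    return round(255 * (value - lo) / (hi - lo))

# Hypothetical sales: South sells twice as much as North, Capital dwarfs both.
sales = {"North": 40_000, "South": 80_000, "Capital": 5_000_000}
lo, hi = min(sales.values()), max(sales.values())
shades = {region: intensity(value, lo, hi) for region, value in sales.items()}
print(shades)
```

With these numbers, North and South differ by only 2 intensity steps out of 255 even though one sells double the other, which matches the users' complaint.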

An analyst creates a stacked bar chart showing quarterly sales by product category. The chart becomes hard to read because some categories have very small contributions. Which redesign is most effective?

Combine small categories into an 'Other' group

Grouping small items simplifies the chart and improves readability.

Change to a pie chart for each quarter

Increase the width of each bar

Switch to a 3D stacked column chart

Why: Combining small categories into an 'Other' group reduces visual clutter and improves readability by aggregating negligible contributions into a single bar segment. This technique preserves the overall trend while eliminating the noise from many tiny slices that make the stacked bar chart hard to interpret.
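
The grouping step can be sketched as follows. The 5% threshold, the helper name, and the category figures are assumptions; the right cutoff depends on the chart and the audience.

```python
def group_small_categories(sales, threshold=0.05):
    """Fold categories below the share threshold into a single 'Other' entry."""
    total = sum(sales.values())
    grouped, other = {}, 0.0
    for category, amount in sales.items():
        if amount / total < threshold:
            other += amount
        else:
            grouped[category] = amount
    if other:
        grouped["Other"] = other
    return grouped

q1_sales = {"Laptops": 500.0, "Phones": 400.0, "Tablets": 60.0, "Cables": 15.0, "Cases": 25.0}
grouped_q1 = group_small_categories(q1_sales)
print(grouped_q1)
```

The quarter's total is unchanged; only the number of segments drops.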

Domain 5: Communicating Data Insights

A data analyst notices that a line chart showing monthly sales over the past two years has a steep drop in one month. Upon investigation, the analyst discovers that a new sales region was added mid-month and the data was not normalized. Which of the following best practices should the analyst apply to communicate this insight accurately?

Remove the month with the drop from the report.

Use a bar chart instead to show the data.

Normalize the sales data by region and explain the data anomaly in the report.

Normalization corrects the artifact, and explanation provides transparency.

Present the data as-is and let stakeholders interpret the drop.

Why: Option C is correct because the core issue is that the sales data is not normalized by region, causing a misleading drop when a new region was added mid-month. By normalizing the data (e.g., calculating per-region averages or percentages) and explicitly noting the anomaly in the report, the analyst ensures accurate communication of insights. This aligns with the DA0-001 domain of Communicating Data Insights, where transparency and data integrity are paramount.

A data team is preparing a dashboard for executives. The team wants to highlight key performance indicators (KPIs) that are below target. Which of the following visualization techniques would most effectively draw attention to underperforming metrics without causing confusion?

Remove underperforming KPIs from the dashboard to avoid confusion.

Use a scatter plot to show the relationship between KPIs.

Apply conditional formatting to turn KPI values red when below target.

Red highlights call attention to issues immediately.

Use a pie chart showing the proportion of each KPI.

Why: Conditional formatting that turns KPI values red when below target is the most effective technique because it leverages pre-attentive visual processing — the human eye naturally notices color changes (especially red) before other visual elements. This allows executives to instantly identify underperforming metrics without needing to interpret complex chart types, reducing cognitive load and confusion.

A data analyst needs to present the distribution of customer ages to a non-technical audience. Which type of chart would be most appropriate?

Scatter plot

Histogram

Histograms show distribution of continuous data.

Pie chart

Line chart

Why: A histogram is the most appropriate chart for displaying the distribution of a single continuous variable, such as customer ages, to a non-technical audience. It groups ages into bins and shows the frequency of customers within each bin, making the shape, center, and spread of the distribution immediately visible. This aligns with the DA0-001 objective of selecting visualizations that clearly communicate data insights to stakeholders.

A data analyst creates a report showing sales by product category. The analyst notices that one category has a very high sales figure due to a one-time bulk order. Which of the following is the best way to communicate this insight to stakeholders?

Delete the bulk order from the dataset.

Add a note to the chart explaining the bulk order.

Annotation provides context for the anomaly.

Remove the category with the bulk order from the report.

Use a pie chart to show the proportion of each category.

Why: Option B is correct because it maintains data integrity while providing necessary context. Adding a note to the chart allows stakeholders to understand the anomaly without distorting the underlying data. This approach aligns with best practices in data communication, where transparency about outliers is critical for accurate interpretation.

A data analyst is building a dashboard that will be used by both executives and operational managers. The executives need high-level summaries, while managers need granular details. Which dashboard design principle should the analyst apply?

Use a single chart that shows both summary and detail simultaneously.

Display all available data on one page for transparency.

Design the dashboard with drill-down capabilities from summary to detail.

Drill-down allows executives to see overview and managers to access details on demand.

Create two separate dashboards for each audience.

Why: Option C is correct because drill-down capabilities allow users to start with a high-level summary (e.g., total revenue by region) and then interactively navigate to granular details (e.g., individual transactions) without overwhelming either audience. This design principle supports both executive and operational manager needs within a single dashboard, maintaining clarity and performance by loading only the required level of detail on demand.

A data analyst wants to compare the sales performance of four different stores over the same time period. Which chart type is most suitable?

Line chart with multiple lines

Grouped bar chart

Grouped bars allow side-by-side comparison of stores.

Stacked bar chart

Pie chart with multiple pies

Why: A grouped bar chart is the most suitable choice because it allows direct comparison of discrete categories (four stores) across a common time period, with each group representing a time interval and individual bars representing each store's sales. This chart type excels at side-by-side comparisons of multiple entities over the same categorical axis, making it ideal for the analyst's goal.

Frequently asked questions

How many questions are on the DA0-001 exam?

The DA0-001 exam has a maximum of 90 questions and must be completed in 90 minutes. The passing score is 675 on a scale of 100 to 900.

What types of questions appear on the DA0-001 exam?

Multiple-choice and performance-based questions covering data concepts, data mining and acquisition, analysis and modeling, visualization, and communicating insights. Performance-based questions (PBQs) ask you to complete tasks in a simulated environment.

How are DA0-001 questions organised by domain?

The exam covers 5 domains: Comparing and Contrasting Data Concepts, Mining and Acquiring Data, Analyzing and Modeling Data, Visualizing Data, Communicating Data Insights. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual DA0-001 exam questions?

No. These are original exam-style practice questions written against the official CompTIA DA0-001 exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Ready to practice all 90 DA0-001 questions?

Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.

CompTIA · Free Practice Questions · Last reviewed May 2026

30 real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right, and why the others are wrong.

90 exam questions

90 min time limit

Pass: 675

5 exam domains
