Practice DA0-001 Comparing and Contrasting Data Concepts questions with full explanations on every answer.
Start practicing
Comparing and Contrasting Data Concepts — choose a session length
Free · No account required
Click any question to see the full explanation and answer options, or start a focused practice session above.
A retail company stores customer purchase history in a relational database. The database contains a table 'transactions' with columns: transaction_id, customer_id, product_id, quantity, price, and transaction_date. A data analyst needs to create a report that shows total revenue per customer for the last quarter. Which data concept describes the relationship between customer_id and total revenue?
2A healthcare database stores patient records. Each patient has a unique patient_id, and the database includes a table 'visits' with visit_id, patient_id, visit_date, and diagnosis_code. To ensure data integrity, which constraint should be applied to the patient_id column in the 'visits' table?
3A data engineer is designing a data warehouse for a multinational corporation. The company has sales data from different regions with varying currencies and date formats. To ensure consistency, which data concept should be applied to standardize the data before loading into the warehouse?
4An e-commerce company uses a star schema for its data warehouse. The fact table 'sales_fact' contains foreign keys to dimension tables: customer_dim, product_dim, time_dim, and store_dim. A business user wants to know the total sales for each product category in the last month. Which join operation is required to retrieve this data?
5A data analyst is working with a dataset containing customer information. The dataset includes a column 'full_name' which stores first and last names together. To perform analysis on first names separately, which data concept describes the process of splitting 'full_name' into 'first_name' and 'last_name'?
6A data scientist is building a machine learning model to predict customer churn. The dataset includes both numerical features (age, income) and categorical features (gender, marital status). Which data concept describes the process of converting categorical features into numerical values that can be used by the algorithm?
7A company's database has a table 'orders' with columns: order_id, customer_id, order_date, and total_amount. A data analyst needs to identify customers who have placed more than 5 orders in the past year. Which data concept should be used to group orders by customer and count them?
8A data analyst receives a dataset with a column 'salary' that contains values like '45,000', '55,000', and '65,000'. The analyst notices that the values are stored as text. Which data concept should be applied to convert the salary column from text to numeric format for analysis?
9Which TWO of the following are characteristics of structured data? (Choose TWO.)
10Which THREE of the following are valid data quality dimensions? (Choose THREE.)
11Which TWO of the following are examples of data transformation? (Choose TWO.)
12A financial services company is migrating its customer data from a legacy on-premises relational database to a cloud-based data warehouse. The legacy database uses a denormalized schema with a single table 'customer_master' that contains all customer attributes, including repeated groups for multiple accounts per customer (account1_type, account1_balance, account2_type, account2_balance, etc.). The data warehouse team wants to implement a normalized star schema with separate dimension and fact tables. During the ETL process, the team encounters an error: 'Data truncation: string data right truncation' when loading account_type values into the dim_account table. The account_type column in dim_account is defined as VARCHAR(10), but the source data contains account types like 'SavingsPlus' (11 characters) and 'CheckingPremium' (15 characters). The team must resolve this issue without losing data. Which course of action should the team take?
13A healthcare organization maintains a database of patient records. The database has a table 'patients' with columns: patient_id (primary key), first_name, last_name, date_of_birth, gender, and last_visit_date. A data analyst is tasked with creating a report that lists all patients who have not visited in the last two years. The analyst writes a query: SELECT * FROM patients WHERE last_visit_date < DATEADD(year, -2, GETDATE()); However, the query returns zero rows, even though the analyst knows there are patients who have not visited for over two years. Upon inspection, the analyst discovers that the last_visit_date column contains NULL values for patients who have never visited. Which modification to the query should the analyst make to include patients with NULL last_visit_date?
14A data analyst needs to ensure that a customer's address is stored in a consistent format across multiple databases. Which data quality dimension is the analyst primarily concerned with?
15A data engineer is designing a data warehouse for a retail company. The fact table must record each sale transaction, including product ID, store ID, date, and quantity sold. The product details (name, category, price) are stored in a separate table. This design is an example of which data modeling concept?
16A data analyst is troubleshooting a report that shows unusually high sales for a specific product. Upon investigation, the analyst finds that the product was returned by several customers, but the returns were recorded in a separate system and not reflected in the sales data. Which data integration concept was likely missing?
17When the analyst runs the query, it fails. What is the most likely reason?
18A data analyst is reviewing the error log from a nightly batch load. What is the most likely cause of the error?
19A data analyst is creating a report for a marketing campaign. The campaign data includes customer names, email addresses, and purchase history. Which of the following best describes the 'customer name' data type?
20Which TWO of the following are considered structured data?
21Which TWO of the following are examples of data governance best practices?
22Which THREE of the following are characteristics of a relational database?
23Which THREE of the following are valid methods for handling missing data?
24A mid-sized e-commerce company stores customer data in a relational database. The database has a table named 'Customers' with columns: CustomerID (primary key), FirstName, LastName, Email, Phone, Address, City, State, ZipCode, and SignUpDate. The company is migrating to a new CRM system that requires a denormalized structure for performance reasons. The new system expects a single table 'CustomerDetails' with columns: CustomerID, FullName (concatenation of first and last name), ContactInfo (JSON object containing email, phone, and address), SignUpDate, and Region (derived from state). The data analyst must design an ETL process to transform the data. During a test run, the analyst notices that some records have missing Phone or Address values. Which of the following is the best approach to handle missing data in the ContactInfo JSON object?
25Drag and drop the steps for the ETL (Extract, Transform, Load) process in the correct order.
26Drag and drop the steps to create a data visualization dashboard in the correct order.
27Match each data quality dimension to its description.
28Match each data visualization type to its best use case.
29A company needs to store raw, unprocessed data from IoT sensors for future machine learning experiments. The data is in various formats and schemas are not yet defined. Which storage solution is most appropriate?
30A financial application requires fast query performance for aggregations on large historical datasets. The schema has many lookup tables. Which schema design is most efficient for this workload?
31A database table has columns: OrderID (primary key), ProductID, CustomerID, CustomerName, OrderDate, ProductName. All products are purchased only by the customer who placed the order. Which normal form violation exists if CustomerName depends on CustomerID?
32A data engineer needs to store logs from web servers that have varying fields. The logs are in JSON format. Which data type describes this JSON data?
33A retail company processes daily transactions. The current system transforms data before loading it into the data warehouse. The volume is growing rapidly, and they want to load raw data first to reduce processing time. Which approach should they adopt?
34A table Orders has OrderID (primary key), CustomerID, and CustomerEmail. During analysis, it is found that CustomerID uniquely identifies CustomerEmail. Which normal form is violated if both CustomerID and CustomerEmail are stored in this table?
35A company requires real-time masking of credit card numbers for customer support agents while allowing full access for accountants. Which technique should be implemented?
36A hospital's patient records system must process thousands of small transactions per second. Which type of database system is best suited for this workload?
37A data quality report shows that 95% of records have all required fields completed, but 20% of the completed fields contain values that are outside valid ranges. Which data quality dimension is most affected?
38Which TWO roles are primarily responsible for defining and enforcing data governance policies within an organization?
39Which THREE of the following are NoSQL database types?
40A data team must implement a data retention policy to reduce storage costs while meeting legal requirements. Which TWO actions best achieve this?
41Refer to the exhibit. A data analyst is trying to understand access permissions for the company-data bucket. Which statement accurately describes the effective permissions?
42Refer to the exhibit. A database administrator notices that queries filtering on both CustomerID and OrderDate are slow. Which single change would most likely improve performance for such queries?
43Refer to the exhibit. A data pipeline is failing to parse this log entry. What is the most likely cause of the error?
44A retail analyst needs to determine the most popular product category. The dataset includes columns: ProductID, Category, SalesDate, QuantitySold, UnitPrice. Which column contains qualitative data?
45A data scientist is building a model to predict customer churn. The company's internal CRM system provides customer demographics and transaction history. They also purchase demographic data from a third-party vendor. How should the purchased data be classified?
46A logistics company is analyzing truck delivery times. Which variable is discrete?
47A hospital wants to analyze patient readmission rates. The data contains daily patient visits. What is the level of granularity?
48A data analyst finds that the "Age" column contains values like "N/A", "unknown", and negative numbers. Which data quality dimension is primarily affected?
49An e-commerce company stores customer support emails in a text database, product images in a blob store, and sales transactions in a SQL table. Which data store holds only structured data?
50A market researcher conducts a survey with questions like "What is your favorite brand?" and "How many units do you purchase per year?" Which data types correspond?
51In a customer database, each row represents a customer with columns: CustomerID, Name, Address, Phone. What does the column "Name" represent?
52A data audit reveals that some numbers in the "Revenue" column were manually entered from PDF invoices. This introduces potential errors. Which data concept is being addressed?
53Which TWO data types are considered quantitative? (Select two.)
54Which THREE characteristics describe unstructured data? (Select three.)
55Which TWO are examples of primary data? (Select two.)
56Refer to the exhibit. Which type of data is the field "region"?
57Refer to the exhibit. Based on the data profiling results, what is a likely data quality issue?
58Refer to the exhibit. Which data quality dimension is compromised by the missing value for Charlie's salary?
59A retail company stores customer transaction data in a relational database. They want to analyze purchasing patterns over time. Which type of data structure best supports this analysis?
60A data analyst receives a dataset with inconsistent date formats (e.g., "01/02/2023", "2023-01-02", "Jan 2, 2023"). Which data quality dimension is most directly affected?
61A company is designing a data lake to store raw sensor data from IoT devices. The data arrives as JSON objects with varying schemas. Which storage approach is most appropriate?
62A healthcare provider needs to integrate patient data from multiple clinics into a single data warehouse. Which process is used to extract, transform, and load the data?
63A data analyst is working with a dataset that contains customer names and addresses. Some records have missing state codes. Which data quality issue is this?
64A financial institution wants to analyze transaction networks to detect fraud rings. Which database type is best suited for this analysis?
65A data architect is designing a schema for a product catalog where each product has a variable number of attributes. Which NoSQL database type is most appropriate?
66A data engineer is comparing data warehouses and data lakes. Which statement accurately describes a data warehouse?
67During an ETL process, a data quality check fails due to duplicate customer IDs. Which data quality dimension is violated?
68Which TWO of the following are examples of semi-structured data?
69Which THREE data quality dimensions are commonly assessed in a data profiling task?
70Which TWO of the following are characteristics of a data lake?
71Refer to the exhibit. A data analyst notices that direct S3 access to files outside the "incoming/" prefix is blocked. Which data governance principle does this policy enforce?
72Refer to the exhibit. A data analyst runs this query to identify high-value customers. However, the result does not include customers with exactly 5 orders. Which data concept does the HAVING clause illustrate?
73Refer to the exhibit. An Avro schema is defined as shown. Which data design concept does this represent?
74A marketing team needs to store customer feedback from social media posts, including text, images, and emojis. Which data concept is most appropriate for this storage?
75An organization wants to assign responsibility for data quality and metadata management. Which role is primarily accountable for defining data standards and ensuring data quality across a specific domain?
76A data analyst notices that customer addresses in the database contain invalid ZIP codes. Which data quality dimension is being violated?
77A company is implementing a data lifecycle management policy. Which stage occurs immediately after data is created?
78To consolidate data from multiple operational databases into a central repository for reporting, a company decides to transform data before loading it into the target system. Which data integration approach is being used?
79A business needs to store large volumes of raw data in its native format for future analytics. Which storage architecture is most appropriate?
80A data modeler is designing a dimensional model for a sales analytics system. The fact table contains sales transactions, and the dimension tables include product, customer, and time. To reduce data redundancy, the modeler normalizes the dimension tables into multiple related tables. Which schema is being implemented?
81An organization has multiple systems that store customer information inconsistently. To create a single authoritative view of customer data, they implement a process that identifies and merges duplicate records. This is an example of which data management discipline?
82A data governance team is drafting a policy for handling personally identifiable information (PII). According to data governance best practices, which document should define the classification levels and handling procedures?
83Which TWO of the following are considered internal data sources within an organization?
84Which THREE of the following are common characteristics of unstructured data?
85Which TWO of the following are primary benefits of implementing a data governance program?
86Refer to the exhibit. The data shown is an example of which data concept?
87Refer to the exhibit. Which data concept does this exhibit best represent?
88Refer to the exhibit. Which conclusion can be drawn from this data quality report?
89A market research firm collects survey responses where customers rate satisfaction on a scale of 'Very Unsatisfied', 'Unsatisfied', 'Neutral', 'Satisfied', 'Very Satisfied'. What type of data is being collected?
90A data analyst needs to compare sales data from the company's internal CRM with public demographic data from a government census. Which data concept best describes this scenario?
91A sensor records temperature readings in Celsius and a separate sensor records wind speed in meters per second. A data scientist wants to combine these datasets for analysis. Which statement accurately compares these data types?
92A company stores customer data in a relational database with tables for orders, products, and customers. Which type of data best describes this?
93A researcher wants to study the effect of a new drug. She collects data directly from clinical trial participants. Later, she compares her findings with historical data from medical journals. Which contrast best describes her data sources?
94A data analyst notices that a column labeled 'Income' contains values like '$50,000' and '$75,000', but also 'High' and 'Low'. What data concept issue is occurring?
95Which of the following is an example of qualitative data?
96A data team is building a predictive model. They have data on 'Number of employees' (whole numbers) and 'Revenue' (currency). Which statement correctly compares these data types?
97A dataset contains a column 'Education Level' with values: 'High School', 'Bachelor', 'Master', 'PhD'. An analyst computes the average by assigning numbers 1-4. Which data concept is being violated?
98Which TWO of the following are characteristics of structured data? (Choose TWO.)
99Which TWO of the following are examples of quantitative data? (Choose TWO.)
100Which THREE of the following are properties of ratio data? (Choose THREE.)
101The exhibit shows a JSON schema for a dataset. Which statement correctly describes the data types represented?
102A retail company has merged with another firm and now needs to create a unified customer data warehouse. The existing systems use different data classification methods: System A stores customer income as a categorical range (e.g., '$0-$50k', '$50k-$100k', '$100k+') while System B stores exact income as a decimal number. A data analyst must combine these into a single table. The goal is to perform statistical analysis that includes calculating average income, but the categorical data from System A loses precision. The analyst proposes converting System B's exact values into the same ranges as System A to ensure consistency. However, the data governance team wants to preserve as much detail as possible. Which course of action should the analyst recommend?
103A healthcare analytics team is building a dashboard to monitor patient vitals. They receive data from two sources: Source 1 provides 'heart rate' as an integer (beats per minute), and Source 2 provides 'blood pressure' as a ratio (systolic/diastolic, e.g., 120/80). The team wants to create a combined metric called 'cardiac stress index' that uses both heart rate and systolic blood pressure. However, they notice that heart rate data occasionally contains negative values due to sensor errors. The data governance policy requires that all data be valid and meaningful. Which action best addresses the data quality issue while preserving the data types?
104A retail company analyzes customer purchase data to improve inventory management. They store daily transaction records in a relational database and monthly aggregate reports in a data warehouse. Which difference between these storage methods best explains why the warehouse is more suitable for trend analysis?
105A data analyst is evaluating data quality issues in a customer database. Which TWO actions are best practices for ensuring data consistency?
106An organization is implementing a data lake to store raw data from various sources. Which THREE characteristics are typically associated with a data lake compared to a data warehouse?
107A manufacturing company has two primary data systems: an ERP system that stores production orders with fields like OrderID, ProductID, Quantity, and ProductionDate, and a CRM system that stores customer sales with fields like SaleID, CustomerID, ProductID, SaleDate, and Amount. The data analyst needs to create a unified view of product performance by joining these tables. However, the ProductID field in the ERP uses a 5-character alphanumeric code (e.g., 'P1234'), while the CRM uses a 6-character code (e.g., 'PR1234'). Additionally, some products have multiple entries due to slight variations in naming. The analyst wants to ensure accurate matching without losing data. Which action should the analyst take first to address the data inconsistency?
108A large financial institution is implementing a data governance framework to comply with new regulations requiring strict control over sensitive customer data. The data governance committee has identified several domains, including customer master data, transaction data, and risk assessment data. They need to decide on a master data management (MDM) approach that ensures a single, authoritative source of customer information across all systems. However, the current environment has multiple legacy systems with conflicting customer records. The committee is concerned about downtime and business disruption during the transition. Which MDM approach best balances data consistency with minimal operational impact?
109A data analyst at a marketing agency is working with a dataset containing customer demographics, purchase history, and social media engagement metrics. The agency wants to perform sentiment analysis on unstructured social media comments to identify brand perception. The dataset also includes structured fields like age, income, and purchase amounts. The analyst needs to choose a storage and processing platform that can handle both structured and unstructured data efficiently without requiring extensive schema definition upfront. Which platform should the analyst recommend?
110A telecommunications company is experiencing issues with its customer satisfaction survey data. The data is collected from multiple channels: phone, email, and web forms. Each channel uses a different scale for ratings: phone uses 1-10, email uses 1-5, and web uses 1-7. Additionally, some survey responses contain missing values for demographic fields. The data analyst needs to calculate an overall satisfaction score that is comparable across all channels. The company's leadership wants a single metric that minimizes distortion from the different scales. Which approach should the analyst use to standardize the ratings?
111A healthcare organization is subject to strict data privacy regulations requiring the classification of all data assets. The data governance team has identified three data sensitivity levels: Public, Internal, and Restricted. They have a new data pipeline importing patient health records from multiple clinics. The records include patient names, diagnoses, treatment codes, and insurance information. The team must ensure that the classification is applied correctly and that restricted data (e.g., diagnoses) is not exposed to unauthorized personnel. However, the pipeline uses automated tagging based on metadata rules, and some fields are misclassified. What is the most effective immediate action to improve classification accuracy?
112An e-commerce company wants to provide real-time personalized product recommendations based on customer browsing behavior. Currently, they have a traditional data warehouse that processes batch updates every night. The marketing team complains that recommendations are outdated within hours because customers see yesterday's data. The data engineer needs to modify the architecture to support near-real-time analytics. The budget is limited, and the existing warehouse infrastructure must be reused as much as possible. Which architectural change would best meet the requirement?
113A data analyst is comparing characteristics of structured and unstructured data. Which TWO of the following are characteristics of structured data? (Choose two.)
114Refer to the exhibit. A data architect is designing a data dictionary for a relational database. Based on the exhibit, which data concept is being illustrated?
115A retail company is merging customer data from three separate systems: an e-commerce platform, a point-of-sale (POS) system, and a loyalty program. The e-commerce platform stores customer names in "FirstName LastName" format, the POS system stores names as "LastName, FirstName", and the loyalty program stores names in separate "first_name" and "last_name" fields. The data analyst needs to create a unified customer master table. After initial merging, there are 20% more records than expected, including duplicates with slight name variations (e.g., "John Smith" vs "John A. Smith"). To ensure accurate consolidation, which data concept should the analyst prioritize applying first?
The Comparing and Contrasting Data Concepts domain covers the key concepts tested in this area of the DA0-001 exam blueprint published by CompTIA. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DA0-001 domains — no account required.
The Courseiva DA0-001 question bank contains 115 questions in the Comparing and Contrasting Data Concepts domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
Yes — the session launcher on this page draws questions exclusively from the Comparing and Contrasting Data Concepts domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
Save your results, see per-domain analytics, and get readiness scores — free, for every certification.
Sign Up FreeFree forever · Every certification included