Practice DA0-001 Mining and Acquiring Data questions with full explanations on every answer.
Mining and Acquiring Data — choose a session length
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. A data analyst is pulling data from a production database for a report. The database contains customer orders with a column 'order_date'. The analyst notices that some orders have dates in the future. Which data quality issue does this represent?
2. A data engineer is designing a data pipeline to ingest streaming data from IoT sensors. The sensors send data every second, and the pipeline must handle bursts of up to 10,000 messages per second. Which approach is most appropriate for capturing this data before processing?
3. A data analyst needs to combine two datasets: one contains customer information (customer_id, name, address) and the other contains order information (order_id, customer_id, order_date). The analyst wants to include all customers, even those who have not placed orders. Which type of join should be used?
4. A data analyst is tasked with extracting data from a legacy system that outputs fixed-width text files. The analyst needs to parse these files into a structured format. Which tool or method is most appropriate for this task?
5. A company is merging two databases from different departments. In Database A, customer IDs are integers. In Database B, customer IDs are alphanumeric strings. To merge, the data analyst must reconcile these differences. Which step should be taken first?
6. A data analyst needs to extract data from an API that returns JSON. The analyst wants to convert the JSON output into a tabular format for analysis. Which function in a scripting language is commonly used for this purpose?
7. A data analyst is building a dataset from multiple sources and needs to ensure data quality. During the data acquisition phase, which activity is most important to perform?
8. An organization needs to acquire data from a third-party vendor. The data will be used for regulatory reporting. Which of the following should be the primary consideration before acquiring the data?
9. A data analyst is using SQL to extract data. The analyst wants to retrieve all records from a table named 'sales' where the 'amount' column is greater than 100. Which SQL clause should be used?
10. Which TWO of the following are common methods for acquiring data from external sources?
11. Which THREE of the following are best practices when performing data extraction for a data pipeline?
12. Which TWO of the following are valid SQL clauses used to filter and sort data?
13. What is the primary purpose of the HAVING clause in the query shown?
14. A data analyst sees this error in the ETL logs. What is the most likely cause?
15. A data engineer is configuring access to a data lake in Amazon S3. What does the JSON policy shown allow?
16. A healthcare organization is building a data warehouse to support population health analytics. The data sources include: (1) an electronic health record (EHR) system with a relational database containing patient demographics, diagnoses, and medications; (2) a claims system that generates CSV files daily; (3) patient-generated health data from mobile apps via a REST API returning JSON. The data engineer needs to design a data acquisition process that runs nightly. The EHR system has a change tracking mechanism that logs changes with timestamps. The claims CSV files are appended daily. The API supports filtering by date. The data warehouse uses a star schema with fact and dimension tables. The engineer must ensure data consistency and minimize load times. Which approach should the engineer take?
17. A retail company is migrating its on-premises data warehouse to a cloud data warehouse. The current ETL process extracts data from a transactional database (SQL Server) and a web analytics system (JSON logs). The ETL runs nightly and takes 6 hours. The business requires that the new cloud warehouse support real-time reporting with data latency of less than 15 minutes. The data engineer proposes using change data capture (CDC) from the SQL Server database and streaming the JSON logs via a message queue. However, management is concerned about cost and complexity. The engineer must design a solution that meets the latency requirement while minimizing operational overhead. Which approach should the engineer recommend?
18. A data analyst is merging two datasets from different departments. The analyst notices that the 'CustomerID' field in the first dataset is stored as an integer, while in the second dataset it is stored as a string with leading zeros. Which TWO steps should the analyst take to ensure successful data integration?
19. Based on the exhibit, what is the most likely cause of the import failure?
20. A marketing company is building a customer segmentation model. The data team has access to two sources: a CRM database with customer demographics and purchase history, and a third-party data provider that offers social media activity scores. The CRM data is updated daily, while the third-party data is refreshed weekly on Sundays. The analyst needs to create a unified dataset for the model training scheduled for Wednesday morning. The analyst runs a SQL query to join the two tables on CustomerID, but the resulting dataset has far fewer rows than expected. Upon investigation, the analyst finds that many customers in the CRM do not have matching records in the third-party data. Additionally, some customers in the third-party data have multiple entries due to unresolved duplicates. The analyst must produce the most complete dataset possible while maintaining data quality. Which course of action should the analyst take?
21. Drag and drop the steps to perform a data backup using the 3-2-1 rule in the correct order.
22. Drag and drop the steps to perform a data audit in the correct order.
23. Match each data analysis technique to its primary purpose.
24. Match each database concept to its definition.
25. A data analyst needs to collect customer sentiment data from social media platforms. Which data acquisition method is most appropriate?
26. A company is merging two customer databases from different acquisitions. They need to identify duplicate records. Which data profiling technique is most effective?
27. A data architect is designing an ETL pipeline to ingest streaming data from IoT sensors. The data must be available for real-time analytics. Which acquisition method is best?
28. A marketing team wants to collect data on competitor pricing for similar products. Which data source is most appropriate?
29. During data acquisition, an analyst notices that the data from an external vendor has inconsistent date formats. What is the first step the analyst should take?
30. A data engineer needs to acquire data from a legacy mainframe system that does not support modern APIs or direct database connectivity. Which approach is most feasible?
31. A small business wants to acquire customer feedback through a short questionnaire emailed after purchase. Which data acquisition method does this represent?
32. An organization is integrating data from multiple sources into a data warehouse. They need to handle differences in data granularity (e.g., daily vs. hourly sales data). Which technique is most appropriate?
33. A data analyst is using a public API to collect historical weather data. The API has a rate limit of 100 requests per minute, but the analyst needs to retrieve 10,000 records as quickly as possible. What strategy should be used?
34. Which TWO are common methods for acquiring internal data? (Choose two.)
35. Which THREE are best practices for data profiling during acquisition? (Choose three.)
36. Which THREE are common challenges when acquiring data from external APIs? (Choose three.)
37. Refer to the exhibit. An analyst runs this query before acquiring data from a PostgreSQL database. What is the primary purpose of this query?
38. Refer to the exhibit. A data engineer is setting up data acquisition from an S3 bucket with this policy. What does the policy enforce?
39. Refer to the exhibit. An analyst sees this log during data acquisition. What action should be taken first?
40. A data analyst is tasked with combining customer data from a CRM system and a billing system. The CRM uses a GUID for customer ID, while billing uses an integer. Which approach should the analyst use to ensure a reliable merge?
41. A data team needs to extract data from a legacy system that only supports flat file exports. Which data acquisition method is most appropriate?
42. During a data mining project, an analyst discovers that a significant number of records have a negative value for the age field. What is the most appropriate first step?
43. Refer to the exhibit. What does the query return?
44. Refer to the exhibit. What data quality issue is indicated?
45. Refer to the exhibit. If the date column is stored as a string in 'MM/DD/YYYY' format, what will be the result?
46. A data analyst needs to identify duplicate customer records. Which TWO methods are commonly used? (Select two.)
47. After merging two datasets, an analyst finds that the resulting dataset has many null values in some columns. Which TWO steps should the analyst take to address this? (Select two.)
48. Which THREE data sources are suitable for web scraping? (Select three.)
49. A retail company wants to analyze customer purchase patterns to identify products frequently bought together. Which data mining technique is most appropriate?
50. A data analyst is importing a CSV file that contains a mixture of numeric and text fields. What is the most common issue when importing?
51. During data acquisition, a data engineer uses a tool to extract data from a source system incrementally based on a timestamp column. Which method is being used?
52. A data analyst discovers that a dataset contains multiple records for the same customer with different spellings (e.g., 'Jon' vs 'John'). Which data preparation step should be applied first?
53. A financial institution is merging transaction data from two different systems. System A stores currency amounts as integers in cents, and System B stores them as decimals in dollars. What is the best way to integrate the data?
54. A data team is integrating customer data from three sources. After joining, they find that the count of unique customers is lower than expected. What is the most likely cause?
55. A data analyst needs to merge two customer tables from different sources. One table uses 'CUST_ID' as the primary key, the other uses 'CustomerID'. To ensure accurate merging, the analyst should first:
56. A company receives daily sales data in CSV format. The data includes a 'Date' column in MM/DD/YYYY format. To load this into a database that expects YYYY-MM-DD, the analyst should:
57. A data analyst is tasked with collecting data from a web API that returns JSON. The API requires an API key in the header. Which method should be used to authenticate?
58. An analyst needs to combine two datasets from different sources that share a common key but have different levels of granularity. Dataset A has daily sales per store, Dataset B has hourly foot traffic per store. The analyst wants to analyze correlation. Which approach is appropriate?
59. A data team is designing an ETL process to extract data from an operational database daily. The database experiences heavy write loads during business hours. What is the best practice to minimize impact on operations?
60. A healthcare organization acquires data from multiple hospitals with different patient record systems. The data includes patient IDs but no common identifier across systems. Which technique should be used to link records?
61. A financial analyst is integrating data from multiple stock exchanges. One exchange provides trade timestamps in UTC, another in Eastern Time. The analyst needs accurate time synchronization for time-series analysis. What is the best approach?
62. An e-commerce company is merging customer data from three legacy systems. Two systems use email as the unique identifier, but one system allows multiple customers per email. The third uses phone number. To create a unified customer view, the analyst should first:
63. A data engineer is tasked with acquiring data from a third-party vendor that provides daily file drops via SFTP. The files are large (10 GB each). The pipeline must load data into a data warehouse. Which approach optimizes for speed and reliability?
64. A data analyst is validating a dataset acquired from an external source. Which TWO actions are appropriate for data quality assessment?
65. A company is acquiring social media data via a public API. Which TWO considerations are important for ensuring ethical and legal compliance?
66. A data scientist is merging retail transaction data from online and in-store sources. Which THREE steps are required to ensure data consistency?
67. A data analyst receives the above JSON snippet from a web API. The analyst needs to extract the email addresses for all customers. Which JSONPath expression should be used?
68. An analyst is reviewing the above SQL query used to acquire data. What does this query retrieve?
69. A data pipeline log shows the above error. Which data transformation should be applied during acquisition?
70. A marketing team wants to analyze customer sentiment from social media posts. Which data acquisition method is most appropriate?
71. A data analyst needs to combine sales data from multiple regional databases with different schemas. Which process is best?
72. An organization is acquiring data from an external vendor. The vendor provides a flat file with inconsistent delimiters and missing values. Which step should be performed first in data acquisition?
73. A data analyst is tasked with gathering data from a legacy system that only exports CSV files. The files contain headers but no data types. Which tool would best facilitate initial data exploration?
74. A company wants to collect real-time clickstream data from its website. Which acquisition method is most suitable?
75. A financial institution needs to acquire credit transaction data from multiple sources while ensuring compliance with data privacy regulations. What is the most critical step?
76. A data analyst is extracting data from a relational database using SQL. Which clause is essential for limiting the rows retrieved to only those needed?
77. An e-commerce company is acquiring product data from multiple supplier APIs. The APIs return JSON with inconsistent field naming conventions. Which data acquisition technique should be applied?
78. A data team is using web scraping to collect competitor pricing data. The target website has anti-scraping measures like CAPTCHAs and rate limiting. Which approach is most effective?
79. Which TWO are examples of internal data sources? (Select exactly 2)
80. A data analyst is evaluating data quality issues during acquisition. Which TWO issues are most likely to arise from merging data from different sources? (Select exactly 2)
81. Which THREE are best practices for acquiring data via web scraping? (Select exactly 3)
82. Refer to the exhibit. What is the most likely issue causing the unexpectedly low count?
83. Refer to the exhibit. What is the most likely cause of the extraction failure?
84. A retail company is acquiring sales data from 150 stores worldwide. Each store sends daily CSV files via email to a central email address. The data acquisition process is manual: an intern downloads each attachment and copies it into a shared folder. The shared folder is then accessed by an ETL tool that loads data into a data warehouse. Recently, the data warehouse has been missing records for several stores. The intern reports that some emails are not being received or are delayed. The company needs to improve the reliability and timeliness of data acquisition. Which course of action should be taken first?
85. A marketing analyst needs to combine customer data from a CRM database with social media engagement data from a third-party API. Which data acquisition method is most appropriate?
86. A data analyst is tasked with collecting data from multiple spreadsheets provided by different departments. Each spreadsheet has different column names and formats. What is the best first step?
87. A data engineer is designing an ETL pipeline to extract sales data from a legacy on-premise database and load it into a cloud data warehouse. The database is slow and queries during business hours affect performance. Which extraction strategy minimizes impact?
88. A research firm is acquiring data from public government databases via API. The API rate limits at 100 requests per minute. They need to download 10,000 records, but each request returns a maximum of 100 records. What is the most efficient approach to ensure complete acquisition without being blocked?
89. Which TWO are valid data acquisition methods? (Select two.)
90. Which THREE are challenges in acquiring data from external sources? (Select three.)
91. A retail company's data analytics team needs to acquire point-of-sale (POS) transaction data from 200 stores daily. Each store sends a CSV file via email at the end of the day. The files often arrive late, have inconsistent column names (e.g., "StoreID", "Store_ID", "store_id"), and occasionally contain corrupted rows. The team manually processes these files, leading to frequent errors and delays. The company wants to automate the acquisition process to ensure data is available by 9 AM the next business day with high quality. Which approach best addresses these issues?
92. A healthcare organization collects patient questionnaire data via paper forms at clinics. The forms are scanned and sent to a central office, where staff manually enter data into an electronic system. This process is slow and error-prone. The organization wants to reduce manual entry errors and speed up data availability. Which method should they adopt?
93. A logistics company receives GPS tracking data from fleet vehicles at 1-second intervals via a cellular network. The data is used to optimize routes and monitor driver behavior. Recently, the data acquisition system has been missing updates for some vehicles when they pass through tunnels or remote areas. The data team notices gaps during these periods. The company needs a solution to ensure near-real-time data continuity. What should they do?
94. An e-commerce company wants to integrate product pricing data from competitor websites to adjust its own prices dynamically. They plan to scrape pricing pages every hour. However, the competitors' websites have anti-scraping measures such as IP blocking and CAPTCHAs. The company's legal team also advises caution regarding terms of service. Which data acquisition strategy is both effective and compliant?
95. A financial analytics firm needs to acquire historical stock market tick data (millions of records per day) from a data vendor. The vendor provides data via FTP in binary format. The firm's existing infrastructure uses on-premise servers with limited storage and processing power. They need to stream the data into a cloud data lake for analysis. However, the binary format is proprietary and requires a licensed decoder. The budget is constrained. Which approach best meets the data acquisition requirements?
96. A social media monitoring company collects public tweets using the Twitter API. The API has tiered access: the free tier allows 500,000 tweets per month, and the paid tier allows 2 million tweets per month. The company needs to collect 1.5 million tweets per month for analysis. They are on the free tier but have been exceeding the limit, causing account suspension. They need a sustainable solution without significantly increasing costs. What should they do?
97. A data analyst is performing data acquisition from multiple source files. Which TWO data profiling tasks should the analyst complete before loading the data into the target system?
98. Refer to the exhibit. A data analyst is trying to extract data from a SQL Server database but receives the error. Which configuration change should the analyst recommend to the database administrator?
99. A large retail company is integrating customer data from two separate CRM systems into a new data warehouse. System A stores customer IDs as integers (e.g., 12345), while System B stores them as alphanumeric strings (e.g., 'CUST-12345-X'). Additionally, some customers exist in both systems but with slight name variations (e.g., 'John Smith' vs 'Jon Smith'). The data warehouse requires a unified customer table with a single unique identifier for each customer. The analyst needs to design the data acquisition process. Which of the following is the most appropriate first step?
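Several of the question stems above describe routine acquisition chores: keeping every customer when joining to orders, reconciling an integer key with a zero-padded string key, and converting MM/DD/YYYY dates to YYYY-MM-DD. The following is a minimal Python sketch of those three chores using only the standard library (sqlite3 and datetime). It is an illustration under assumptions, not an answer key: the function names, table layout, and sample values are hypothetical.

```python
import sqlite3
from datetime import datetime


def us_date_to_iso(text: str) -> str:
    """Convert a 'MM/DD/YYYY' string to 'YYYY-MM-DD'."""
    return datetime.strptime(text, "%m/%d/%Y").strftime("%Y-%m-%d")


def normalize_customer_id(raw) -> int:
    """Map an integer ID or a zero-padded digit string (e.g. '0001') to an int.

    Assumption: IDs are purely numeric; prefixed IDs would need their own rule.
    """
    return int(str(raw).strip())


def customers_with_orders(customers, orders):
    """Return every customer, with order columns set to None when no order exists.

    customers: iterable of (customer_id, name)
    orders:    iterable of (order_id, customer_id, 'MM/DD/YYYY' order date)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    # Clean keys and dates on the way in, so the join compares like with like.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(oid, normalize_customer_id(cid), us_date_to_iso(d)) for oid, cid, d in orders],
    )
    # LEFT JOIN keeps customers that have no matching order row.
    rows = conn.execute(
        """
        SELECT c.customer_id, c.name, o.order_id, o.order_date
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY c.customer_id, o.order_id
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling customers_with_orders with two customers and a single order for the first one returns two rows; the second customer's order columns are None, which is the behavior an inner join would not give.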
The Mining and Acquiring Data domain covers the key concepts tested in this area of the DA0-001 exam blueprint published by CompTIA. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DA0-001 domains — no account required.
The Courseiva DA0-001 question bank contains 99 questions in the Mining and Acquiring Data domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Mining and Acquiring Data domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.