DA0-001 · topic practice

Mining and Acquiring Data practice questions

Practise CompTIA Data+ DA0-001 Mining and Acquiring Data questions: original exam-style scenarios with answer choices, explanations, and analysis of common mistakes.

Courseiva uses original exam-style practice questions designed for learning and revision. The goal is to understand the concepts, recognise exam patterns, and improve through explanations, not to memorise copied exam dumps.

Reviewed by Johnson Ajibi · MSc IT Security
20 questions · Domain: Mining and Acquiring Data

What the exam tests

What to know about Mining and Acquiring Data

Mining and Acquiring Data questions test whether you can apply the concept in context, not just recognise a definition.

  • How the topic appears in realistic exam-style scenarios.
  • Which detail in the question changes the correct answer.
  • How to eliminate plausible but wrong options.
  • How to connect the question back to the wider exam objective.

Watch out for

Common Mining and Acquiring Data exam traps

  • Answering from memory before reading the full scenario.
  • Missing a constraint such as cost, availability, security, scope or command context.
  • Choosing a broad answer when the question asks for the most specific fix.
  • Ignoring why the wrong options are tempting.

Practice set

Mining and Acquiring Data questions

20 questions · select your answer, then reveal the explanation

A data analyst is pulling data from a production database for a report. The database contains customer orders with a column 'order_date'. The analyst notices that some orders have dates in the future. Which data quality issue does this represent?
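
A check of this kind can be sketched in a few lines of Python. This is only an illustration: the sample rows and the fixed reference date are assumptions, and only the column name `order_date` comes from the question.

```python
from datetime import date

# Hypothetical sample rows; only the column name order_date is from the question.
orders = [
    {"order_id": 1, "order_date": date(2024, 1, 15)},
    {"order_id": 2, "order_date": date(2999, 12, 31)},  # clearly in the future
    {"order_id": 3, "order_date": date(2023, 6, 1)},
]


def future_dated(rows, today):
    """Return the rows whose order_date falls after the reference date."""
    return [row for row in rows if row["order_date"] > today]


# A fixed reference date (an assumption) keeps the example reproducible.
suspect = future_dated(orders, today=date(2025, 1, 1))
print([row["order_id"] for row in suspect])
```

In a real pipeline the reference date would normally be the extraction date rather than a hard-coded value.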

A data engineer is designing a data pipeline to ingest streaming data from IoT sensors. The sensors send data every second, and the pipeline must handle bursts of up to 10,000 messages per second. Which approach is most appropriate for capturing this data before processing?

A data analyst needs to combine two datasets: one contains customer information (customer_id, name, address) and the other contains order information (order_id, customer_id, order_date). The analyst wants to include all customers, even those who have not placed orders. Which type of join should be used?
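
To see how join types differ on tables shaped like these, the sketch below runs an inner join and a left join in an in-memory SQLite database. The table and column names follow the question; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, name TEXT, address TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
-- Hypothetical rows: customer 2 has never placed an order.
INSERT INTO customers VALUES (1, 'Ana', '1 High St'), (2, 'Ben', '2 Low Rd');
INSERT INTO orders VALUES (100, 1, '2024-03-01');
""")

# Inner join: only customers with at least one matching order.
inner_rows = conn.execute("""
    SELECT c.customer_id, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Left join: every customer, with NULL (None) where no order matches.
left_rows = conn.execute("""
    SELECT c.customer_id, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Comparing the two row counts is a quick way to spot unmatched keys before choosing a join.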

A data analyst is tasked with extracting data from a legacy system that outputs fixed-width text files. The analyst needs to parse these files into a structured format. Which tool or method is most appropriate for this task?
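
One way to picture fixed-width parsing is to slice each line at known column offsets. The sketch below assumes a hypothetical three-field layout; a real legacy file would come with its own record specification.

```python
# Hypothetical layout: (field name, start offset, end offset), end exclusive.
LAYOUT = [("customer_id", 0, 6), ("name", 6, 16), ("balance", 16, 24)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict, trimming the padding."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Build a sample record with the same widths: 6, 10 and 8 characters.
sample = f"{'000042':<6}{'Jane Doe':<10}{'120.50':>8}"
record = parse_fixed_width(sample)
print(record)
```

Every value stays a string here, so identifiers such as `000042` keep their leading zeros. Libraries offer the same idea ready-made, for example `pandas.read_fwf`.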

A company is merging two databases from different departments. In Database A, customer IDs are integers. In Database B, customer IDs are alphanumeric strings. To merge, the data analyst must reconcile these differences. Which step should be taken first?

A data analyst needs to extract data from an API that returns JSON. The analyst wants to convert the JSON output into a tabular format for analysis. Which function in a scripting language is commonly used for this purpose?
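
As an illustration of what "JSON to tabular" means, the sketch below flattens nested objects into rows using only the Python standard library. The payload and its field names are invented. Dedicated helpers such as `pandas.json_normalize` produce similar dotted column names.

```python
import json

# Hypothetical API response body; the field names are made up for illustration.
payload = """
[
  {"id": 1, "user": {"name": "Ana", "city": "Leeds"}},
  {"id": 2, "user": {"name": "Ben", "city": "York"}}
]
"""


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining keys with a dot."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


rows = [flatten(item) for item in json.loads(payload)]
columns = sorted(rows[0])  # assumes every record has the same keys
table = [[row[col] for col in columns] for row in rows]
print(columns)
print(table)
```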

A data analyst is building a dataset from multiple sources and needs to ensure data quality. During the data acquisition phase, which activity is most important to perform?

An organization needs to acquire data from a third-party vendor. The data will be used for regulatory reporting. Which of the following should be the primary consideration before acquiring the data?

A data analyst is using SQL to extract data. The analyst wants to retrieve all records from a table named 'sales' where the 'amount' column is greater than 100. Which SQL clause should be used?

Which TWO of the following are common methods for acquiring data from external sources?

Which THREE of the following are best practices when performing data extraction for a data pipeline?

Which TWO of the following are valid SQL clauses used to filter and sort data?

What is the primary purpose of the HAVING clause in the query shown?

Exhibit

Refer to the exhibit.

```
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(order_id) > 5
ORDER BY order_count DESC;
```
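
To watch the exhibit query at work, it can be run against a small in-memory SQLite table. The order counts below are hypothetical; the query text is the one from the exhibit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)

# Hypothetical data: customer 1 has 6 orders, customer 2 has 3, customer 3 has 7.
counts = {1: 6, 2: 3, 3: 7}
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(customer,) for customer, n in counts.items() for _ in range(n)],
)

# Same query as the exhibit: group, filter the groups, then sort.
result = conn.execute("""
    SELECT customer_id, COUNT(order_id) AS order_count
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(order_id) > 5
    ORDER BY order_count DESC
""").fetchall()
print(result)
```

Customer 2 is missing from the result because its group has only 3 orders. The condition is applied after aggregation, to whole groups rather than to individual rows.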

A data analyst sees this error in the ETL logs. What is the most likely cause?

Exhibit

Refer to the exhibit.

Error log:

```
2025-03-15 10:23:45 ERROR: ORA-12034: materialized view log on "SCHEMA"."SALES" is newer than last refresh
2025-03-15 10:23:45 INFO: Refresh of materialized view "SALES_MV" failed
```

A data engineer is configuring access to a data lake in Amazon S3. What does the JSON policy shown allow?

Exhibit

Refer to the exhibit.

```
{
  "policy": {
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::data-bucket/*"
      }
    ]
  }
}
```

A healthcare organization is building a data warehouse to support population health analytics. The data sources include: (1) an electronic health record (EHR) system with a relational database containing patient demographics, diagnoses, and medications; (2) a claims system that generates CSV files daily; (3) patient-generated health data from mobile apps via a REST API returning JSON. The data engineer needs to design a data acquisition process that runs nightly. The EHR system has a change tracking mechanism that logs changes with timestamps. The claims CSV files are appended daily. The API supports filtering by date. The data warehouse uses a star schema with fact and dimension tables. The engineer must ensure data consistency and minimize load times. Which approach should the engineer take?

A retail company is migrating its on-premises data warehouse to a cloud data warehouse. The current ETL process extracts data from a transactional database (SQL Server) and a web analytics system (JSON logs). The ETL runs nightly and takes 6 hours. The business requires that the new cloud warehouse support real-time reporting with data latency of less than 15 minutes. The data engineer proposes using change data capture (CDC) from the SQL Server database and streaming the JSON logs via a message queue. However, management is concerned about cost and complexity. The engineer must design a solution that meets the latency requirement while minimizing operational overhead. Which approach should the engineer recommend?

A data analyst is merging two datasets from different departments. The analyst notices that the 'CustomerID' field in the first dataset is stored as an integer, while in the second dataset it is stored as a string with leading zeros. Which TWO steps should the analyst take to ensure successful data integration?
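
The mismatch described here can be shown with a minimal Python sketch. The target width of 6 characters and the sample IDs are assumptions; a real integration would take the agreed format from the data dictionary.

```python
def normalise_id(value, width=6):
    """Return a customer ID as a zero-padded string of a fixed width."""
    return str(value).strip().zfill(width)


dept_a = [42, 1007]              # stored as integers
dept_b = ["000042", "001007"]    # stored as strings with leading zeros

# Without normalisation the raw keys never compare equal.
raw_matches = set(dept_a) & set(dept_b)

# After converting both sides to one representation, the keys line up.
keys_a = {normalise_id(v) for v in dept_a}
keys_b = {normalise_id(v) for v in dept_b}
matched = keys_a & keys_b
print(raw_matches, matched)
```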

Based on the exhibit, what is the most likely cause of the import failure?

Exhibit

Refer to the exhibit.

Data Import Log:

```
[2024-03-15 10:22:34] INFO: Starting import from source 'sales_raw.csv'
[2024-03-15 10:22:35] WARN: Row 1502: 'price' field contains non-numeric value '12.5A'. Skipping row.
[2024-03-15 10:22:36] ERROR: Row 3450: 'date' field value '2024-02-30' is invalid. Import halted.
[2024-03-15 10:22:36] INFO: Import process terminated with errors.
```
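
The two values flagged in the log can be checked with small validators like the ones below. This is a sketch of the kind of check an import tool might run, not the tool's actual logic; the function names are hypothetical.

```python
from datetime import datetime


def is_valid_date(text, fmt="%Y-%m-%d"):
    """True if text is a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def is_numeric(text):
    """True if text can be parsed as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print(is_numeric("12.5A"))          # the 'price' value from row 1502
print(is_valid_date("2024-02-30"))  # the 'date' value from row 3450
print(is_valid_date("2024-02-29"))  # 2024 is a leap year
```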

A marketing company is building a customer segmentation model. The data team has access to two sources: a CRM database with customer demographics and purchase history, and a third-party data provider that offers social media activity scores. The CRM data is updated daily, while the third-party data is refreshed weekly on Sundays. The analyst needs to create a unified dataset for the model training scheduled for Wednesday morning. The analyst runs a SQL query to join the two tables on CustomerID, but the resulting dataset has far fewer rows than expected. Upon investigation, the analyst finds that many customers in the CRM do not have matching records in the third-party data. Additionally, some customers in the third-party data have multiple entries due to unresolved duplicates. The analyst must produce the most complete dataset possible while maintaining data quality. Which course of action should the analyst take?
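
The two problems in this scenario, unmatched customers and duplicate third-party rows, can be reproduced on toy data. In the sketch below the `segment`, `score` and `refreshed` fields are hypothetical, and "keep the most recently refreshed row" is only one possible deduplication rule.

```python
crm = [
    {"CustomerID": 1, "segment": "retail"},
    {"CustomerID": 2, "segment": "trade"},
    {"CustomerID": 3, "segment": "retail"},
]
# Hypothetical third-party rows: customer 1 is duplicated, customer 3 is missing.
social = [
    {"CustomerID": 1, "score": 0.4, "refreshed": "2024-03-03"},
    {"CustomerID": 1, "score": 0.6, "refreshed": "2024-03-10"},
    {"CustomerID": 2, "score": 0.9, "refreshed": "2024-03-10"},
]

# Deduplicate first: keep the most recently refreshed row per customer.
latest = {}
for row in social:
    kept = latest.get(row["CustomerID"])
    if kept is None or row["refreshed"] > kept["refreshed"]:
        latest[row["CustomerID"]] = row

# Left join from the CRM side: every customer appears exactly once,
# and a missing score is recorded as None rather than dropping the row.
unified = [
    {**customer, "score": latest.get(customer["CustomerID"], {}).get("score")}
    for customer in crm
]
print(unified)
```

Deduplicating before the join matters: joining first would repeat customer 1 once per duplicate score row.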

Focused Mining and Acquiring Data sessions

Start a Mining and Acquiring Data only practice session

Every question in these sessions is drawn from the Mining and Acquiring Data domain — nothing else.

Related practice questions

Related DA0-001 topic practice pages

Move into related areas when this topic feels solid.

Frequently asked questions

What does the DA0-001 exam test about Mining and Acquiring Data?
Mining and Acquiring Data questions test whether you can apply the concept in context, not just recognise a definition.
How should I use these practice questions?
Select your answer before revealing the explanation. Then read why each option is right or wrong — this active recall approach builds retention far faster than re-reading notes.
Can I practise just Mining and Acquiring Data questions in a focused session?
Yes — the session launcher on this page draws every question from the Mining and Acquiring Data domain. Use a 10-question session first to gauge your baseline, then move to 20 or 30 once the weak spots are clear.
Where can I practise other DA0-001 topics?
Use the topic links above to move to related areas, or go back to the DA0-001 question bank to see all topics.
Are these real exam questions or dumps?
These are original practice questions written to test the same concepts the DA0-001 exam covers. They are not copied from any real exam or dump site.