Practice DA0-001 Mining Data questions with full explanations on every answer.
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. A data analyst needs to retrieve all unique job titles from an employees table. Which SQL keyword should be used in the SELECT clause?
2. A data analyst is profiling a new dataset containing customer information. When assessing data quality, which metric would be most appropriate to determine if the 'email' column contains valid email addresses?
3. A data analyst is working with a sales table that contains columns: sale_id, product_id, sale_date, and amount. They need to calculate a 7-day moving average of sales amount for each product, ordered by sale_date. Which window function syntax should they use?
4. A data analyst needs to extract the year from a column named 'order_date' in a SQL database. The database supports standard SQL functions. Which function should they use?
5. A data analyst wants to retrieve the top 5 highest-paid employees from a table named 'employees' that has columns 'employee_id', 'salary', and 'name'. Which SQL query should they use?
6. A data analyst is performing exploratory data analysis on a dataset containing house prices. They want to identify outliers in the 'price' column using the IQR method. The first quartile (Q1) is $200,000, the third quartile (Q3) is $350,000, and the IQR is $150,000. What is the upper bound for identifying outliers?
7. A data analyst needs to perform stratified sampling on a customer database to ensure proportional representation across three regions: North (40%), South (30%), and West (30%). The total sample size required is 1,000. How many customers should be sampled from the North region?
8. A data analyst uses a CTE to find employees who earn more than the average salary in their department. Which SQL clause is used to define the CTE?
9. A data analyst is using pandas in Python to merge two DataFrames: sales (columns: sale_id, product_id, amount) and products (columns: product_id, product_name). Which pandas function should they use to combine these DataFrames on the 'product_id' column?
10. In SQL, which string function would you use to remove leading and trailing spaces from a column named 'city'?
11. A data analyst is writing a query to rank products by total sales amount within each category. They want ties to have the same rank and no gaps in the ranking sequence. Which window function should they use?
12. A data analyst needs to count the number of orders placed by each customer, but only for customers who have placed more than 5 orders. Which SQL clause should be used to filter the aggregated results?
13. A data analyst is performing data profiling on a customer table. Which TWO of the following are key metrics to assess data quality? (Select TWO.)
14. A data analyst is using a recursive CTE to traverse a hierarchical organizational chart. Which THREE components are required to define a recursive CTE? (Select THREE.)
15. A data analyst wants to sample a large dataset of customer transactions. Which TWO sampling methods are probability-based and ensure every element has a known chance of being selected? (Select TWO.)
16. In a dataset of customer orders, you need to count the number of distinct customers who have placed orders. Which SQL aggregate function should you use?
17. You are using pandas in Python to clean a dataset. You notice several rows with missing values in the 'age' column. Which method would you use to remove those rows?
18. A data analyst wants to retrieve the top 5 highest-paid employees from an 'employees' table, including ties. Which SQL clause should be used?
19. You are analyzing sales data and need to calculate the moving average of monthly sales over the previous 3 months for each month. Which type of function is best suited for this task?
20. A data quality assessment reveals that a column named 'email' contains values like 'user@example' (missing domain extension). Which data profiling technique would best identify such pattern violations?
21. A data analyst needs to sample 10% of customers from each of three regions (North, South, Central) to ensure proportional representation. Which sampling method should be used?
22. In SQL, you want to retrieve all products whose names start with 'Pro'. Which WHERE clause should you use?
23. You have a table 'Orders' with columns order_id, customer_id, order_date, and amount. You need to write a query that returns each customer's most recent order date and the amount for that order. Which approach is correct?
24. During exploratory data analysis, you calculate the IQR for a numeric column and find that several data points fall below Q1 - 1.5*IQR. These points are likely:
25. You have a hierarchical table 'Employees' with columns emp_id, emp_name, manager_id (referencing emp_id). You need to generate a full reporting chain from a given employee up to the CEO. Which SQL construct is most appropriate?
26. A data analyst wants to extract the year from a date column 'order_date' in a SQL database. Which function should be used?
27. In pandas, you have a DataFrame 'df' with columns 'product' and 'sales'. You want to calculate the total sales per product. Which method should you use?
28. A data analyst is performing data profiling on a customer table. Which TWO metrics are most useful for understanding the completeness of the data? (Choose two.)
29. A data analyst needs to identify the top 3 most frequent product categories from a sales table. Which SQL techniques can be used to achieve this? (Choose two.)
30. A data analyst is cleaning text data in a SQL database. Which THREE string functions are commonly used to standardize and clean text? (Choose three.)
31. A data analyst needs to retrieve only unique job titles from the 'employees' table. Which SQL keyword should be used in the SELECT clause?
32. A dataset contains sales transactions with columns 'order_date', 'amount', and 'region'. The analyst wants to calculate the total sales per region for orders placed in 2023, but only include regions where total sales exceed $10,000. Which SQL clause should be used to filter the aggregated results?
33. In a table 'employee_hierarchy' with columns 'employee_id', 'manager_id', and 'employee_name', an analyst needs to generate a list of all employees under a specific manager, including multiple levels of subordinates. Which SQL construct is most appropriate for querying this hierarchical data efficiently?
34. A data analyst runs a query to count the number of customers in each city. The query uses COUNT(*) and GROUP BY city. However, the result includes NULL for some cities. What will COUNT(*) return for a group where the city is NULL?
35. A table 'orders' contains columns 'order_id', 'customer_id', 'order_date', and 'total'. An analyst needs to find orders placed between January 1, 2023 and December 31, 2023. Which WHERE clause is correct?
36. A data analyst is performing data profiling on a customer dataset. Which metric would best reveal the number of distinct values in the 'state' column?
37. An analyst is using Python pandas and has a DataFrame 'sales' with columns 'date', 'product', 'revenue'. They need to create a pivot table showing total revenue per product per month. Which pandas function is most appropriate?
38. A dataset contains a 'salary' column. The analyst wants to identify outliers using the IQR method. If Q1 = 40,000 and Q3 = 70,000, what is the upper threshold for a non-outlier?
39. An analyst is sampling a large customer database to estimate the average purchase amount. To ensure that the sample proportionally represents different customer segments (e.g., age groups), which sampling method should be used?
40. Which SQL aggregate function would an analyst use to calculate the average value of a numeric column?
41. A data analyst is cleaning a dataset and finds that some cells in the 'email' column contain leading spaces. Which string function should be used to remove these spaces?
42. In a table 'sales_team' with columns 'salesperson', 'quarter', and 'revenue', an analyst wants to assign a rank to each salesperson within their quarter based on revenue, with the highest revenue getting rank 1. However, if two salespeople have the same revenue, they should receive the same rank, and the next rank should be the next consecutive integer (no gaps). Which window function should be used?
43. A data analyst is conducting exploratory data analysis (EDA) on a dataset. Which TWO tasks are typically performed during EDA? (Select two.)
44. An analyst is using SQL to analyze employee data. Which THREE of the following are valid uses of the WHERE clause? (Select three.)
45. A data analyst needs to perform a stratified random sample of a customer database. Which TWO steps are essential for this sampling method? (Select two.)
46. A data analyst needs to count the number of distinct product categories in a table named 'products'. Which SQL function should be used in the SELECT clause?
47. A data analyst wants to identify customers whose last name starts with 'Mc' from the 'customers' table. Which WHERE clause condition should be used?
48. A data analyst is reviewing sales data and wants to find orders where the order total is between $100 and $500, inclusive. Which WHERE clause is correct?
49. A data analyst runs the query: SELECT AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 60000. What is the purpose of the HAVING clause?
50. Which SQL function can be used to extract the year from a date column 'order_date'?
51. A data analyst wants to assign a unique sequential integer to each row in a result set, starting at 1, based on the order of the 'sales_amount' column descending. Which window function should be used?
52. A data analyst is using a recursive CTE to traverse an organizational hierarchy. What is the purpose of the anchor member in the recursive CTE?
53. A data analyst is profiling a dataset and finds that the 'email' column contains some NULL values. Which SQL query can be used to count how many rows have a NULL email?
54. Which data sampling method involves selecting every k-th element from a list after a random start?
55. A data analyst is using pandas to read a CSV file named 'sales.csv'. Which line of code correctly reads the file into a DataFrame?
56. A data analyst is using the IQR method to identify outliers in a dataset. The first quartile (Q1) is 25 and the third quartile (Q3) is 45. What is the upper bound for identifying outliers?
57. A data analyst needs to create a new column 'full_name' by concatenating 'first_name' and 'last_name' with a space. Which SQL function should be used in the SELECT clause?
58. A data analyst is performing data profiling on a customer table. Which TWO metrics are commonly used to assess the completeness of a column? (Select TWO.)
59. A data analyst is investigating a correlation between two continuous variables. Which THREE of the following are appropriate steps in this exploratory data analysis? (Select THREE.)
60. A data analyst wants to retrieve the top 5 highest-paid employees from the 'employees' table. Which SQL clauses could be used to achieve this? (Select TWO.)
61. A data analyst needs to count the number of customers who have placed at least one order. Which SQL query should be used?
62. A data analyst wants to find the top 5 products by total sales amount, but only for products that have been sold more than 50 times. Which SQL query accomplishes this?
63. An analyst needs to compute a running total of sales for each department, ordered by date. Which window function is most appropriate?
64. A data analyst wants to generate a report showing employee names and their department names, but some employees are not assigned to any department. The analyst wants to include all employees. Which JOIN type should be used?
65. A dataset contains a column 'birthdate' in 'YYYY-MM-DD' format. The analyst needs to calculate the average age of customers as of today. Which combination of functions is most appropriate?
66. A data analyst is performing data profiling on a customer table. Which metric would best help identify missing values in the 'phone' column?
67. A data analyst wants to randomly select 100 customers from a database for a survey, ensuring that the sample reflects the proportion of male and female customers in the population. Which sampling method is most appropriate?
68. In a table with columns 'employee_id' and 'manager_id', a data analyst needs to retrieve the hierarchy level of each employee, where the top manager has manager_id NULL. Which SQL feature is best suited?
69. A data analyst is using pandas in Python to clean a dataset. Which method is most appropriate to replace missing numerical values with the median of the column?
70. An analyst wants to identify outliers in a dataset using the IQR method. Which values are typically considered outliers?
71. A dataset contains transaction amounts with a few extremely high values. The analyst wants to reduce the impact of these outliers on the average. Which measure of central tendency is most robust?
72. A data analyst uses a CTE to simplify a complex query. Which keyword is used to define a CTE?
73. A data analyst is exploring a sales dataset and wants to identify columns that are likely to be foreign keys. Which TWO characteristics would indicate a foreign key?
74. A data analyst is using Python pandas to perform exploratory data analysis. Which THREE methods are commonly used to assess data quality and distributions?
75. A data analyst needs to sample records from a large dataset for a quick analysis. Which TWO sampling methods are examples of probability sampling?
76. A data analyst runs the following query: SELECT DISTINCT city FROM customers. What is the primary purpose of using the DISTINCT keyword in this query?
77. In a sales database, an analyst needs to retrieve all orders where the order amount is between $100 and $500. Which WHERE clause should be used?
78. A data analyst uses the following query: SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 50000. What is the purpose of the HAVING clause in this query?
79. A data analyst wants to concatenate first_name and last_name columns with a space in between. Which string function combination should be used in SQL?
80. An analyst needs to retrieve the year from an order_date column (datetime type). Which function should be used in SQL?
81. A data analyst is using a window function to assign a unique rank to each employee within their department based on salary, with ties receiving the same rank and leaving gaps. Which function should be used?
82. A data analyst needs to create a recursive CTE to traverse a hierarchical employee-manager table. Which of the following is a key requirement for a recursive CTE?
83. A data analyst is performing data profiling on a customer table. Which metric provides the number of unique values in a column?
84. A data analyst wants to ensure a sample proportionally represents different regions in a population. Which sampling method should be used?
85. During EDA, an analyst calculates the Z-score for each data point in a dataset. A data point with a Z-score of 3.5 is identified. What does this indicate?
86. A data analyst uses Python's pandas library to read a CSV file into a DataFrame. Which function is used to read the file?
87. An analyst writes a SQL query that uses a window function: SELECT employee_id, salary, LAG(salary, 1) OVER (ORDER BY salary DESC) AS prev_salary FROM employees. What does the LAG function return for the row with the highest salary?
88. A data analyst needs to identify duplicate customer records based on email and phone number. Which SQL techniques can be used to find duplicates? (Select TWO).
89. A data analyst is performing EDA on a dataset with numerical features. Which methods are appropriate for identifying outliers? (Select TWO).
90. An analyst needs to aggregate sales data by region and product, then sort the results by total sales in descending order. Which SQL clauses are required? (Select THREE).
91. A data analyst needs to retrieve all unique job titles from the employees table. Which SQL clause should be used with the SELECT statement?
92. A data analyst wants to find customers whose last name starts with 'Mc' and have made purchases in 2023. The purchase table has a purchase_date column. Which SQL query accomplishes this?
93. A data analyst is writing a query to rank products by total sales within each category, showing dense rank and avoiding gaps. Which window function should be used?
94. An analyst needs to count the number of orders per customer but only for customers who have placed more than 5 orders. Which SQL construct allows filtering after aggregation?
95. A data analyst wants to create a temporary result set that can be referenced within a single SQL statement. Which feature should be used?
96. A data analyst is using pandas to clean a DataFrame. They need to replace missing values in the 'age' column with the median age. Which method should they use?
97. During data profiling, an analyst wants to identify the number of distinct values in a column. Which SQL function should be used?
98. A data analyst needs to sample 1000 customers from a database of 100,000 customers for a survey, ensuring every customer has an equal chance of selection. Which sampling method is most appropriate?
99. In a dataset of employee salaries, the analyst notices one value that is significantly higher than the rest. Using the IQR method, which values are typically considered outliers?
100. A data analyst wants to combine first_name and last_name columns into a single full_name column in a SQL query. Which string function should be used?
101. A data analyst needs to calculate the running total of sales for each product over time. Which window function clause is essential for this calculation?
102. An analyst is performing EDA and wants to measure the strength and direction of linear relationship between two continuous variables. Which statistical measure should they compute?
103. A data analyst is validating referential integrity between orders and customers tables. Which TWO of the following checks should the analyst perform?
104. An analyst is using a CTE to compute hierarchical data. Which TWO statements about recursive CTEs are true?
105. A data analyst wants to export a summary report from a DataFrame in pandas. Which THREE methods are commonly used for data export?
106. A data analyst is using SQL to filter a sales table for transactions that occurred in either 'Q1' or 'Q3' of 2023 and have a sale amount greater than $100. Which WHERE clause correctly implements this condition?
107. A data analyst is reviewing a dataset of customer transactions and wants to assess data quality by profiling the 'order_date' column. Which TWO profiling tasks are most appropriate for this date column? (Select TWO).
108. An e-commerce company wants to analyze sales performance across product categories. The dataset includes transaction amounts and a column 'category' with values (Electronics, Clothing, Home). The analyst decides to use stratified sampling to ensure proportional representation. Which THREE steps are required to implement this? (Select THREE).
109. A data analyst is using pandas to clean a DataFrame that contains missing values in the 'age' and 'income' columns. Which THREE pandas methods are appropriate for handling missing data? (Select THREE).
110. An analyst needs to identify outliers in a numeric column 'transaction_amount' using the interquartile range (IQR) method. Which TWO steps are part of this process? (Select TWO).
111. A data analyst uses a Common Table Expression (CTE) to query hierarchical employee data (manager_id references employee_id). Which THREE statements about recursive CTEs are correct? (Select THREE).
112. A data analyst needs to retrieve the top 5 most expensive products from a 'products' table sorted by price descending. Which TWO SQL clauses are required to achieve this? (Select TWO).
113. An analyst wants to use Python (pandas) to compute the average sales amount per region from a DataFrame 'df' with columns 'region' and 'sales'. Which TWO pandas operations are needed? (Select TWO).
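Several of the questions above use the IQR method for outlier detection, so a small worked sketch of the arithmetic may help when reviewing them. It assumes the conventional 1.5 multiplier for the fences; the helper name `iqr_bounds` and its parameter `k` are hypothetical and not part of any library or of the question bank itself.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the IQR rule.

    k=1.5 is the conventional multiplier (an assumption here);
    values outside the returned fences are usually flagged as outliers.
    """
    iqr = q3 - q1                      # interquartile range
    return q1 - k * iqr, q3 + k * iqr  # lower fence, upper fence


# Quartiles taken from the house-price question in the list above.
lower, upper = iqr_bounds(200_000, 350_000)
print(f"IQR fences: lower={lower}, upper={upper}")
```

The same helper applies to the other quartile pairs quoted in the list, for example `iqr_bounds(40_000, 70_000)` or `iqr_bounds(25, 45)`.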
The Mining Data domain covers the key concepts tested in this area of the DA0-001 exam blueprint published by CompTIA. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DA0-001 domains — no account required.
The Courseiva DA0-001 question bank contains 113 questions in the Mining Data domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Mining Data domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.