CCNA Dap Mining Data Questions

38 of 113 questions · Page 2/2 · Dap Mining Data topic · Answers revealed

76
MCQhard

A data analyst is using a recursive CTE to traverse an organizational hierarchy. What is the purpose of the anchor member in the recursive CTE?

A.It provides the initial seed or starting rows for the recursion.
B.It filters the final output of the recursive CTE.
C.It specifies how to join the CTE with itself recursively.
D.It defines the termination condition for the recursion.
AnswerA

The anchor member returns the base result set.

Why this answer

The anchor member initializes the recursion with the base result set.

77
Multi-Selectmedium

An analyst wants to use Python (pandas) to compute the average sales amount per region from a DataFrame 'df' with columns 'region' and 'sales'. Which TWO pandas operations are needed? (Select TWO).

Select 2 answers
A.df.fillna(0)
B.df.pivot_table(index='region', values='sales', aggfunc='mean')
C.df['sales'].apply(np.sqrt)
D.df.merge(df2, on='region')
E.df.groupby('region')['sales'].mean()
AnswersB, E

Pivot table with mean aggregation.

Why this answer

To compute average per group, you can use groupby() followed by mean(), or pivot_table() with aggfunc='mean'. merge() combines DataFrames, apply() can be used but is less direct, and fillna() handles missing values.

78
MCQhard

In a table with columns 'employee_id' and 'manager_id', a data analyst needs to retrieve the hierarchy level of each employee, where the top manager has manager_id NULL. Which SQL feature is best suited?

A.A window function with ROW_NUMBER()
B.A recursive CTE
C.A GROUP BY clause with aggregation
D.A self-join with a LEFT JOIN
AnswerB

Recursive CTE can iterate through levels to assign hierarchy depth.

Why this answer

Recursive CTE can traverse hierarchical data to compute levels.

79
MCQmedium

A data analyst is reviewing sales data and wants to find orders where the order total is between $100 and $500, inclusive. Which WHERE clause is correct?

A.total > 100 AND total < 500
B.total IN (100, 500)
C.total BETWEEN 100 AND 500
D.total >= 100 OR total <= 500
AnswerC

BETWEEN includes both boundary values.

Why this answer

BETWEEN is inclusive of both endpoints.

80
MCQmedium

A dataset contains sales transactions with columns 'order_date', 'amount', and 'region'. The analyst wants to calculate the total sales per region for orders placed in 2023, but only include regions where total sales exceed $10,000. Which SQL clause should be used to filter the aggregated results?

A.HAVING
B.WHERE
C.GROUP BY
D.FILTER
AnswerA

HAVING filters aggregated results after GROUP BY.

Why this answer

The HAVING clause filters groups after aggregation, whereas WHERE filters rows before grouping.

81
MCQeasy

A data analyst wants to retrieve the top 5 highest-paid employees from a table named 'employees' that has columns 'employee_id', 'salary', and 'name'. Which SQL query should they use?

A.SELECT TOP 5 name, salary FROM employees ORDER BY salary DESC;
B.SELECT name, salary FROM employees ORDER BY salary DESC LIMIT 5;
C.SELECT name, salary FROM employees ORDER BY salary ASC LIMIT 5;
D.SELECT name, salary FROM employees WHERE ROWNUM <= 5 ORDER BY salary DESC;
AnswerB

Correct syntax.

Why this answer

ORDER BY salary DESC sorts from highest to lowest, and LIMIT 5 restricts to the first 5 rows.

82
Multi-Selectmedium

A data analyst is reviewing a dataset of customer transactions and wants to assess data quality by profiling the 'order_date' column. Which TWO profiling tasks are most appropriate for this date column? (Select TWO).

Select 2 answers
A.Pattern analysis (e.g., format consistency)
B.Count of null values
C.Variance
D.Cardinality (number of unique values)
E.Data type verification
AnswersB, E

Null count is a standard profiling check for any column.

Why this answer

Profiling a date column typically includes checking for null values and verifying the data type. Cardinality and pattern analysis are more relevant for categorical or string columns; variance is for numeric data.

83
MCQmedium

A data analyst uses a CTE to simplify a complex query. Which keyword is used to define a CTE?

A.DEFINE
B.CTE
C.DECLARE
D.WITH
AnswerD

CTEs are defined using the WITH keyword.

Why this answer

The WITH clause introduces a CTE.

84
MCQhard

An analyst writes a SQL query that uses a window function: SELECT employee_id, salary, LAG(salary, 1) OVER (ORDER BY salary DESC) AS prev_salary FROM employees. What does the LAG function return for the row with the highest salary?

A.The same salary value
B.NULL
C.The next highest salary
D.Zero
AnswerB

LAG returns NULL if there is no preceding row.

Why this answer

LAG returns the previous row's value in the ordered partition. For the first row (highest salary), there is no previous row, so it returns NULL.

85
MCQmedium

An analyst is sampling a large customer database to estimate the average purchase amount. To ensure that the sample proportionally represents different customer segments (e.g., age groups), which sampling method should be used?

A.Systematic sampling
B.Simple random sampling
C.Cluster sampling
D.Stratified sampling
AnswerD

Stratified sampling ensures proportional representation from each stratum.

Why this answer

Stratified sampling divides the population into strata (e.g., age groups) and samples proportionally from each stratum.

86
Multi-Selecthard

An analyst is using a CTE to compute hierarchical data. Which TWO statements about recursive CTEs are true?

Select 2 answers
A.The recursive member must reference the CTE name
B.The anchor member is the first SELECT that does not reference the CTE
C.Recursive CTEs can only be used for numerical sequences
D.Recursive CTEs must include a UNION ALL operator
E.Recursive CTEs cannot be used with GROUP BY
AnswersB, D

Anchor provides initial rows.

Why this answer

Recursive CTEs consist of an anchor member (initial result) and a recursive member that references the CTE itself.

87
Multi-Selecthard

A data analyst is using a recursive CTE to traverse a hierarchical organizational chart. Which THREE components are required to define a recursive CTE? (Select THREE.)

Select 3 answers
A.ORDER BY clause
B.Anchor member
C.WITH clause
D.Recursive member
E.UNION ALL operator
AnswersB, D, E

The anchor member defines the initial result set.

Why this answer

A recursive CTE requires an anchor member (initial query), a recursive member (that references the CTE itself), and the UNION ALL operator to combine them. The WITH clause is the outer syntax, not part of the recursion itself.

88
MCQhard

A data analyst needs to perform stratified sampling on a customer database to ensure proportional representation across three regions: North (40%), South (30%), and West (30%). The total sample size required is 1,000. How many customers should be sampled from the North region?

A.333
B.500
C.300
D.400
AnswerD

Correct: 40% of 1000.

Why this answer

Stratified sampling with proportional allocation: sample size per stratum = (stratum proportion) * total sample size. North = 0.40 * 1000 = 400.

89
MCQhard

A data analyst needs to create a recursive CTE to traverse a hierarchical employee-manager table. Which of the following is a key requirement for a recursive CTE?

A.The CTE must include a WHERE clause in the recursive member
B.The CTE must use the RECURSIVE keyword in the WITH clause
C.The recursive CTE must have at least one anchor member that does not reference the CTE
D.The recursive member must use UNION instead of UNION ALL
AnswerC

The anchor member provides the starting set; the recursive member references the CTE.

Why this answer

A recursive CTE must have an anchor member (non-recursive) and a recursive member that references the CTE name, connected by UNION ALL.

90
MCQmedium

A data analyst wants to assign a unique sequential integer to each row in a result set, starting at 1, based on the order of the 'sales_amount' column descending. Which window function should be used?

A.DENSE_RANK() OVER (ORDER BY sales_amount DESC)
B.RANK() OVER (ORDER BY sales_amount DESC)
C.NTILE(1) OVER (ORDER BY sales_amount DESC)
D.ROW_NUMBER() OVER (ORDER BY sales_amount DESC)
AnswerD

Correct: assigns sequential numbers based on the order.

Why this answer

ROW_NUMBER() assigns a unique sequential integer to each row within a partition, starting at 1.

91
Multi-Selecteasy

A data analyst needs to retrieve the top 5 most expensive products from a 'products' table sorted by price descending. Which TWO SQL clauses are required to achieve this? (Select TWO).

Select 2 answers
A.HAVING COUNT(*) > 1
B.WHERE price > 100
C.ORDER BY price DESC
D.GROUP BY price
E.LIMIT 5
AnswersC, E

Sorts prices from highest to lowest.

Why this answer

ORDER BY DESC sorts prices descending, and LIMIT restricts to 5 rows. GROUP BY and HAVING are for aggregation; WHERE is for filtering but not needed here.

92
MCQhard

You have a table 'Orders' with columns order_id, customer_id, order_date, and amount. You need to write a query that returns each customer's most recent order date and the amount for that order. Which approach is correct?

A.SELECT customer_id, MAX(order_date), amount FROM Orders GROUP BY customer_id
B.SELECT customer_id, order_date, amount FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn FROM Orders) t WHERE rn = 1
C.SELECT customer_id, FIRST_VALUE(order_date) OVER (PARTITION BY customer_id ORDER BY order_date DESC), FIRST_VALUE(amount) OVER (PARTITION BY customer_id ORDER BY order_date DESC) FROM Orders
D.SELECT customer_id, order_date, amount FROM Orders WHERE order_date IN (SELECT MAX(order_date) FROM Orders GROUP BY customer_id)
AnswerB

Correctly identifies the most recent order per customer.

Why this answer

Using a window function with ROW_NUMBER() to rank orders per customer by date descending, then filtering for rank=1, gives the most recent order details. FIRST_VALUE() can also get the amount, but requires careful framing. GROUP BY with MAX(date) alone cannot get the corresponding amount.

93
MCQeasy

A data analyst needs to retrieve only unique job titles from the 'employees' table. Which SQL keyword should be used in the SELECT clause?

A.TOP
B.DISTINCT
C.UNIQUE
D.FILTER
AnswerB

DISTINCT returns unique rows.

Why this answer

The DISTINCT keyword removes duplicate rows from the result set, returning only unique values.

94
Multi-Selecthard

A data analyst is using Python pandas to perform exploratory data analysis. Which THREE methods are commonly used to assess data quality and distributions?

Select 3 answers
A.df.transpose()
B.df.describe()
C.df.info()
D.df.sort_values()
E.df.value_counts()
AnswersB, C, E

Provides summary statistics for numerical columns.

Why this answer

describe() gives summary statistics, info() shows data types and non-null counts, and value_counts() shows frequency distributions.

95
Multi-Selecteasy

A data analyst wants to export a summary report from a DataFrame in pandas. Which THREE methods are commonly used for data export?

Select 3 answers
A.to_csv()
B.to_pickle()
C.to_excel()
D.to_sql()
E.to_json()
AnswersA, C, E

Exports to CSV file.

Why this answer

pandas provides to_csv, to_excel, and to_json for exporting data.

96
Multi-Selectmedium

A data analyst wants to retrieve the top 5 highest-paid employees from the 'employees' table. Which SQL clauses could be used to achieve this? (Select TWO.)

Select 2 answers
A.ORDER BY salary DESC
B.HAVING salary
C.ORDER BY salary ASC
D.LIMIT 5
E.GROUP BY salary
AnswersA, D

Sorts highest to lowest.

Why this answer

ORDER BY salary DESC sorts descending, and LIMIT/TOP/FETCH restricts rows.

97
MCQmedium

A data analyst is profiling a new dataset containing customer information. When assessing data quality, which metric would be most appropriate to determine if the 'email' column contains valid email addresses?

A.Pattern analysis
B.Null count
C.Cardinality
D.Row count
AnswerA

Pattern analysis can verify if values match the typical email format.

Why this answer

Pattern analysis (e.g., using regular expressions) can validate whether strings match the expected format of an email address. Row counts, null counts, and cardinality do not validate format.

98
MCQhard

A data analyst is using the IQR method to identify outliers in a dataset. The first quartile (Q1) is 25 and the third quartile (Q3) is 45. What is the upper bound for identifying outliers?

A.85
B.65
C.75
D.55
AnswerC

Correct: Q3 + 1.5*(Q3-Q1).

Why this answer

Upper bound = Q3 + 1.5 * IQR; IQR = Q3 - Q1 = 20; 1.5*20 = 30; 45+30 = 75.

99
MCQmedium

A data analyst uses the following query: SELECT department, AVG(salary) AS avg_salary FROM employees GROUP BY department HAVING AVG(salary) > 50000. What is the purpose of the HAVING clause in this query?

A.To ensure only departments with more than 50000 employees are shown
B.To filter individual employee records before grouping
C.To sort departments by average salary
D.To filter groups (departments) based on the average salary
AnswerD

HAVING filters groups after GROUP BY.

Why this answer

HAVING filters groups after aggregation, similar to WHERE but for aggregated results.

100
MCQeasy

A data analyst runs the following query: SELECT DISTINCT city FROM customers. What is the primary purpose of using the DISTINCT keyword in this query?

A.To sort the cities alphabetically
B.To count the number of cities
C.To filter cities that start with a specific letter
D.To remove duplicate city names
AnswerD

DISTINCT eliminates duplicate rows, returning unique city values.

Why this answer

DISTINCT removes duplicate rows from the result set. In this query, it returns each unique city name only once.

101
MCQmedium

A data analyst uses a CTE to find employees who earn more than the average salary in their department. Which SQL clause is used to define the CTE?

A.DECLARE
B.WITH
C.DEFINE
D.CTE
AnswerB

Correct. WITH defines a CTE.

Why this answer

Common Table Expressions (CTEs) are defined using the WITH keyword, followed by the CTE name and AS (query).

102
MCQhard

A data analyst needs to calculate the running total of sales for each product over time. Which window function clause is essential for this calculation?

A.ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
B.PARTITION BY product_id ORDER BY sale_date
C.PARTITION BY product_id
D.ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
AnswerD

Correct frame for running total.

Why this answer

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW defines the frame for running totals.

103
MCQhard

A dataset contains transaction amounts with a few extremely high values. The analyst wants to reduce the impact of these outliers on the average. Which measure of central tendency is most robust?

A.Mean
B.Median
C.Mode
D.Standard deviation
AnswerB

Median is robust to outliers.

Why this answer

Median is not affected by extreme values, while mean is sensitive.

104
MCQmedium

During exploratory data analysis, you calculate the IQR for a numeric column and find that several data points fall below Q1 - 1.5*IQR. These points are likely:

A.Normal variations within the distribution
B.The mode of the dataset
C.The median of the dataset
D.Outliers
AnswerD

The IQR method identifies outliers.

Why this answer

The IQR method defines outliers as points below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. Points below the lower fence are considered outliers. They are not necessarily errors, but potential outliers.

They are not the median or mode.

105
MCQeasy

You are using pandas in Python to clean a dataset. You notice several rows with missing values in the 'age' column. Which method would you use to remove those rows?

A.df.drop_duplicates()
B.df.dropna()
C.df.fillna(0)
D.df.isna()
AnswerB

Removes rows with any missing values by default.

Why this answer

df.dropna() removes rows with any missing values by default. df.fillna() fills missing values, df.isna() returns a boolean mask, df.drop_duplicates() removes duplicate rows.

106
MCQhard

An analyst needs to compute a running total of sales for each department, ordered by date. Which window function is most appropriate?

A.ROW_NUMBER() OVER (PARTITION BY department ORDER BY date)
B.SUM(sales) OVER (ORDER BY date)
C.SUM(sales) OVER (PARTITION BY department ORDER BY date)
D.LAG(sales, 1) OVER (PARTITION BY department ORDER BY date)
AnswerC

Partitions by department and orders by date to compute running total per department.

Why this answer

SUM() with OVER(PARTITION BY department ORDER BY date) computes a running total within each department.

107
MCQmedium

A data analyst is performing data profiling on a customer table. Which metric provides the number of unique values in a column?

A.Row count
B.Cardinality
C.Standard deviation
D.Null count
AnswerB

Cardinality is the number of distinct values.

Why this answer

Cardinality refers to the number of distinct values in a column.

108
MCQhard

A data analyst is using a window function to assign a unique rank to each employee within their department based on salary, with ties receiving the same rank and leaving gaps. Which function should be used?

A.DENSE_RANK()
B.RANK()
C.ROW_NUMBER()
D.NTILE()
AnswerB

RANK() assigns same rank to ties and leaves gaps.

Why this answer

RANK() assigns the same rank to ties and leaves gaps (e.g., 1,1,3). DENSE_RANK() does not leave gaps.

109
MCQmedium

A data analyst is performing exploratory data analysis on a dataset containing house prices. They want to identify outliers in the 'price' column using the IQR method. The first quartile (Q1) is $200,000, the third quartile (Q3) is $350,000, and the IQR is $150,000. What is the upper bound for identifying outliers?

A.$500,000
B.$575,000
C.$425,000
D.$650,000
AnswerB

Correct upper bound.

Why this answer

The IQR method defines outliers as values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. Upper bound = 350,000 + (1.5 * 150,000) = 350,000 + 225,000 = $575,000.

110
MCQmedium

A data analyst is using SQL to filter a sales table for transactions that occurred in either 'Q1' or 'Q3' of 2023 and have a sale amount greater than $100. Which WHERE clause correctly implements this condition?

A.WHERE quarter IN ('Q1','Q3') OR amount > 100
B.WHERE quarter = 'Q1' OR quarter = 'Q3' AND amount > 100
C.WHERE quarter = 'Q1' AND amount > 100 OR quarter = 'Q3' AND amount > 100
D.WHERE (quarter = 'Q1' OR quarter = 'Q3') AND amount > 100
AnswerD

Correct: groups the OR and then AND with amount condition.

Why this answer

The correct clause uses parentheses to group the OR conditions and an AND to connect the amount condition. The other options either miss parentheses (changing logic) or mix AND/OR incorrectly.

111
Multi-Selecteasy

A data analyst is using pandas to clean a DataFrame that contains missing values in the 'age' and 'income' columns. Which THREE pandas methods are appropriate for handling missing data? (Select THREE).

Select 3 answers
A.dropna()
B.pivot_table()
C.merge()
D.apply() with a custom function
E.fillna()
AnswersA, D, E

Removes rows with missing values.

Why this answer

Common pandas methods for missing data include dropna (remove rows with NaN), fillna (replace NaN with a value), and apply with a custom function. Merge is for combining DataFrames; pivot_table is for reshaping.

112
Multi-Selecthard

An e-commerce company wants to analyze sales performance across product categories. The dataset includes transaction amounts and a column 'category' with values (Electronics, Clothing, Home). The analyst decides to use stratified sampling to ensure proportional representation. Which THREE steps are required to implement this? (Select THREE).

Select 3 answers
A.Calculate the proportion of each category in the population
B.Take a random sample from each stratum with size proportional to its population proportion
C.Divide the dataset into three strata based on category
D.Select every 10th transaction from the entire dataset
E.Combine all categories into a single group and perform simple random sampling
AnswersA, B, C

Proportions are needed to determine sample sizes per stratum.

Why this answer

Stratified sampling requires dividing the population into strata (categories), then randomly sampling from each stratum in proportion to its size. Combining strata or simple random sampling without stratification would not achieve proportional representation.

113
MCQmedium

A data analyst wants to find customers whose last name starts with 'Mc' and have made purchases in 2023. The purchase table has a purchase_date column. Which SQL query accomplishes this?

A.SELECT * FROM customers WHERE last_name LIKE 'Mc_' AND YEAR(purchase_date) = 2023;
B.SELECT * FROM customers WHERE last_name LIKE '%Mc%' AND purchase_date = 2023;
C.SELECT * FROM customers WHERE last_name = 'Mc%' AND YEAR(purchase_date) = 2023;
D.SELECT * FROM customers c JOIN purchases p ON c.id = p.customer_id WHERE last_name LIKE 'Mc%' AND p.purchase_date BETWEEN '2023-01-01' AND '2023-12-31';
AnswerD

Correct use of LIKE, JOIN, and date range.

Why this answer

The LIKE operator with 'Mc%' matches names starting with 'Mc', and YEAR() extracts the year from a date.

← PreviousPage 2 of 2 · 113 questions total

Ready to test yourself?

Try a timed practice session using only Dap Mining Data questions.

CCNA Dap Mining Data Questions — Page 2 of 2 | Courseiva