Practice DA0-001 Analysing Data questions with full explanations on every answer.
Click any question to see the full explanation and answer options, or start a focused practice session above.
1. An analyst computed the mean, median, and mode of a dataset and found they are all equal. Which of the following best describes the distribution?
2. A data analyst wants to compare the average revenue per customer between two marketing campaigns (A and B). The analyst is unsure if the data follows a normal distribution. Which statistical test is most appropriate for comparing the means of the two groups?
3. A data analyst is performing a multiple linear regression with three predictors. The model output shows an R-squared of 0.85 and an adjusted R-squared of 0.80. Which of the following is the best interpretation of the difference between these two values?
4. A data scientist is preparing data for a K-means clustering algorithm. The dataset contains features measured in different units (e.g., income in dollars and age in years). Which preprocessing step is most critical before running K-means?
5. In a time series analysis, a retail analyst observes consistent peaks in sales every December and troughs every February. This pattern repeats annually. Which component of time series does this represent?
6. A data analyst runs an A/B test on a new website layout. The test yields a p-value of 0.04 with the null hypothesis being no difference in conversion rates. The significance threshold is α=0.05. Which of the following is the correct conclusion?
7. A dataset contains a column 'Age' with values: [22, 25, 25, 30, 35, 40, 45]. What is the interquartile range (IQR)?
8. A data analyst wants to understand the relationship between advertising spend and sales revenue. The analyst calculates a Pearson correlation coefficient of 0.85. Which of the following is the best interpretation?
9. In a logistic regression model predicting customer churn (1 = churn, 0 = not churn), the coefficient for 'contract length' is -0.5. Which of the following is the correct interpretation?
10. A data analyst is cleaning a dataset and finds that 5% of values in the 'income' column are missing. The analyst decides to impute missing values using the mean of the non-missing values. Which potential issue should the analyst be most concerned about?
11. A data analyst wants to use a Z-score to standardize a dataset. The variable has a mean of 50 and a standard deviation of 10. What is the Z-score for a raw value of 70?
12. A data scientist is using K-means clustering with k=3. After the first iteration, the centroids are recalculated. Which step occurs next in the algorithm?
13. A data analyst is evaluating data quality for a customer database. Which TWO dimensions of data quality are most directly affected by duplicate customer records?
14. A data analyst is performing a chi-square test of independence on a contingency table of customer satisfaction (satisfied, neutral, dissatisfied) by region (North, South, East, West). Which THREE of the following are necessary assumptions for the test?
15. A data analyst is preparing to run an A/B test comparing two email subject lines. Which TWO of the following should the analyst define before the test begins?
16. A data analyst calculates the mean, median, and mode of a dataset. Which of the following measures of central tendency is least affected by extreme outliers?
17. A data analyst is conducting an A/B test on a website's landing page. The null hypothesis is that there is no difference in conversion rates between the control and treatment groups. After collecting data, the analyst calculates a p-value of 0.03. Using a significance level of α = 0.05, what is the correct conclusion?
18. A data scientist is analyzing a dataset with multiple features and wants to apply k-means clustering to segment customers. She chooses k = 4 based on the elbow method. During the iteration process, which of the following correctly describes a step in the k-means algorithm?
19. An analyst is performing a linear regression and obtains an R-squared value of 0.85. Which of the following is the best interpretation?
20. A dataset contains the ages of 100 customers. The analyst wants to transform the ages to a 0-1 range for use in a distance-based algorithm. Which technique should be used?
21. A data analyst is testing whether the average sales amount differs between two regions. Which statistical test is most appropriate?
22. In time series decomposition, a pattern that repeats at regular intervals (e.g., weekly, yearly) is called:
23. Which data quality dimension is most concerned with whether data values fall within a defined domain or acceptable range?
24. A data analyst is examining the relationship between advertising spend (in dollars) and revenue (in dollars). The Pearson correlation coefficient r is calculated as +0.92. Which of the following interpretations is correct?
25. A data analyst is cleaning a dataset and finds that a numeric field has several missing values. The variable is normally distributed. Which imputation method is most appropriate?
26. In a multiple regression model with three predictors, the coefficient for one predictor is 5.2 with a p-value of 0.001. Which of the following is the best interpretation?
27. A data analyst wants to compare the means of three different training methods on employee productivity. Which statistical test is most appropriate?
28. A data analyst is preparing a dataset for analysis and needs to handle outliers. Which TWO of the following are common methods for treating outliers?
29. An analyst is conducting an A/B test on a new checkout process. To calculate sample size, which THREE factors must be considered?
30. A data analyst is building a logistic regression model to predict whether a customer will churn (yes/no). Which TWO statements about logistic regression are correct?
31. A data analyst is examining sales data for a retail chain and notices that the mean monthly sales is $50,000 while the median is $35,000. Which of the following best describes the distribution of the sales data?
32. A data analyst is comparing the average test scores of students who attended a tutoring program versus those who did not. Which statistical test is most appropriate for determining if there is a significant difference between the means of these two independent groups?
33. In a linear regression model predicting house prices, the coefficient for the number of bedrooms is $30,000 and the intercept is $50,000. If a house has 3 bedrooms, what is the predicted price?
34. A data analyst wants to segment customers into groups based on their purchasing behavior. The dataset includes numerical features such as annual income and purchase frequency. Which algorithm is most appropriate for this task?
35. A data analyst is preparing data for a k-nearest neighbors algorithm. The features include age (0-100) and income (0-200,000). Which technique should be applied to ensure the distance metric is not dominated by income?
36. A data analyst is cleaning a dataset and finds that the 'age' column has several missing values. Which of the following is a valid method for handling missing numerical data?
37. An analyst is conducting an A/B test to compare two website designs. The null hypothesis is that there is no difference in conversion rates. The p-value obtained is 0.03, and the significance threshold is 0.05. What should the analyst conclude?
38. In time series decomposition, a data analyst separates a retail sales series into trend, seasonal, and residual components. After decomposition, the residual component shows no pattern and is random. Which of the following best describes the seasonal component?
39. A data analyst needs to identify outliers in a dataset. Which of the following is a common method based on the interquartile range (IQR)?
40. A data analyst is evaluating a multiple regression model with three predictors. The R² value is 0.85. Which of the following is the best interpretation of R²?
41. A data analyst is performing a chi-square test of independence on a contingency table of customer satisfaction (satisfied vs. dissatisfied) and product type (A, B, C). The test yields a p-value of 0.04 with α = 0.05. What is the correct conclusion?
42. A data analyst is working with a dataset that includes a column 'income' with values ranging from 20,000 to 150,000. To standardize this variable for a linear regression that assumes normally distributed residuals, which method should be used?
43. A data analyst is preparing a dataset for analysis and needs to ensure data quality. Which TWO of the following are dimensions of data quality?
44. A data analyst is performing K-means clustering on customer data. Which THREE of the following are steps in the K-means algorithm?
45. An analyst is preparing data for an A/B test and wants to ensure valid results. Which TWO of the following should be considered when calculating the required sample size?
46. A data analyst is analyzing customer purchase amounts. The dataset contains several extreme high values due to luxury purchases. Which measure of central tendency is most robust to these outliers?
47. In A/B testing, the null hypothesis typically states that:
48. A data scientist is performing K-means clustering on customer data. She plots the within-cluster sum of squares (WCSS) for different values of k and observes an 'elbow' at k=4. What does this indicate?
49. A data analyst is asked to compare the average sales across three different store locations. The data is normally distributed and variances are approximately equal. Which statistical test is most appropriate?
50. In simple linear regression, the coefficient of determination R² measures:
51. A dataset contains a feature 'Age' with values ranging from 18 to 95. To prepare data for a k-nearest neighbors algorithm, which transformation should be applied to 'Age'?
52. A data analyst is testing whether a new website layout increases conversion rate. The p-value from the test is 0.03. Using a significance level of 0.05, what is the correct conclusion?
53. In time series analysis, which component represents regular patterns that repeat over fixed periods, such as daily or yearly?
54. Which data quality dimension ensures that data represents the real-world object or event correctly?
55. A marketing analyst wants to predict whether a customer will churn (yes/no) based on account age and monthly charges. Which regression technique is most appropriate?
56. A dataset contains a variable 'Income' with many missing values. The analyst decides to impute missing values with the median income of the non-missing values. Which type of imputation is this?
57. A data analyst is comparing the means of two independent groups using a t-test. The sample sizes are small and the data is not normally distributed. Which condition is violated for a valid t-test?
58. A data analyst is preparing a dataset for analysis and needs to address data quality issues. Which TWO of the following are common data cleaning tasks?
59. A data analyst is performing a chi-square test for independence between two categorical variables. Which THREE of the following are necessary conditions for the test to be valid?
60. An analyst is planning an A/B test to compare two website designs. Which TWO factors should be considered when calculating the required sample size?
61. A data analyst is summarizing the central tendency of a dataset with extreme outliers. Which measure is most robust to outliers?
62. A retail company wants to test whether a new website layout increases the conversion rate compared to the current layout. They randomly assign visitors to either the control or treatment group. Which statistical test is most appropriate to compare the conversion rates?
63. A data scientist is building a model to predict customer churn (yes/no). After training a logistic regression model, the coefficient for 'monthly charges' is 0.05 with a p-value of 0.03. Which interpretation is correct at α=0.05?
64. A dataset contains height measurements in centimeters and inches. An analyst wants to apply k-means clustering. Which data transformation should be applied before clustering?
65. In a simple linear regression model y = 2.5 + 1.2x, what is the predicted value of y when x = 10?
66. A data analyst is performing time series analysis on monthly sales data and notices a consistent pattern of higher sales every December. Which component of time series does this represent?
67. An analyst runs an A/B test with 1000 users per group and observes a conversion rate of 5% in the control and 6% in the treatment. The p-value is 0.12. What should the analyst conclude?
68. A dataset has missing values in the 'age' column. The distribution of age is approximately normal with few outliers. Which imputation method is most appropriate?
69. Which measure best describes the spread of the middle 50% of a dataset?
70. An analyst calculates a Pearson correlation coefficient of -0.8 between advertising spend and customer churn rate. Which interpretation is correct?
71. A data analyst uses the elbow method to determine the number of clusters for k-means. The plot shows a sharp bend at k=3 and a small bend at k=5. What is the recommended number of clusters?
72. In a multiple regression model, one predictor has a high p-value (0.45). What should the analyst consider doing?
73. A data analyst is cleaning a customer dataset. Which two actions are appropriate for handling duplicate records? (Choose TWO)
74. A data scientist is conducting an A/B test with a significance level of 0.05. Which three factors should be considered when calculating the required sample size? (Choose THREE)
75. A dataset contains outliers in a feature that will be used for linear regression. Which two outlier treatment methods are appropriate? (Choose TWO)
76. A data analyst calculates the mean, median, and mode of a dataset. Which of the following best describes how these measures are used in descriptive statistics?
77. An analyst is comparing the average sales of two different store locations using a t-test. The p-value obtained is 0.03, and the significance level is 0.05. What should the analyst conclude?
78. A data scientist builds a simple linear regression model to predict house prices based on square footage. The model yields an R-squared value of 0.85. Which statement accurately interprets this result?
79. An analyst is performing a logistic regression to predict customer churn (yes/no). The model outputs a probability of 0.75 for a particular customer. Which of the following best describes the interpretation?
80. A dataset contains features with vastly different scales (e.g., age 0-100 and income 0-1,000,000). Which data transformation should be applied before using a K-nearest neighbors algorithm?
81. A marketing team uses K-means clustering to segment customers based on purchase history. To determine the optimal number of clusters, they plot the within-cluster sum of squares (WCSS) against k and look for an elbow. What is the purpose of this method?
82. A time series of monthly sales data exhibits a clear upward trend over several years, with consistent peaks each December. Which components are present in this series?
83. In an A/B test, the null hypothesis states that there is no difference between the control and treatment groups. After running the test, the p-value is 0.04. Assuming α = 0.05, what is the correct conclusion?
84. A data analyst notices that a dataset of customer ages has several missing values. Which method for handling missing data is most appropriate if the data is missing completely at random and the analyst wants to preserve sample size?
85. A data analyst is cleaning a dataset and finds that some records have duplicate entries based on customer ID. Which data quality dimension is most directly affected by these duplicates?
86. Which statistical test should be used to determine if there is a significant association between two categorical variables, such as gender and product preference?
87. An analyst wants to compare the mean sales revenue across three different store regions. The data is normally distributed and variances are equal. Which statistical test is most appropriate?
88. A data team is preparing data for a clustering analysis. Which THREE of the following steps are commonly part of data cleaning?
89. An analyst is conducting an A/B test on a new website layout. Which TWO of the following must be defined before the test begins?
90. Which TWO of the following are true about Pearson correlation coefficient (r)?
91. A data analyst calculates the mean, median, and mode of a sales dataset and finds they are all equal. Which type of distribution does this indicate?
92. A retailer wants to test if a new website layout increases the average time spent on the site. They split traffic: control group (old layout) and treatment group (new layout). Which statistical test is most appropriate to compare the average time spent between the two groups?
93. A data scientist builds a logistic regression model to predict customer churn (yes/no). The model outputs a probability of 0.75 for a particular customer. Which of the following best describes this output?
94. A dataset contains employee salaries ranging from $30,000 to $200,000. An analyst wants to scale the salaries to a range of 0 to 1 for use in a distance-based clustering algorithm. Which method should they use?
95. A marketing team runs an A/B test on email subject lines. The p-value is 0.03 with α = 0.05. Which of the following is the correct interpretation?
96. An analyst uses K-means clustering on customer purchase data. After plotting the within-cluster sum of squares for different values of k, they observe an elbow at k=4. What is the most appropriate number of clusters?
97. Which data quality dimension is violated if a customer record has a missing phone number?
98. A simple linear regression model predicts sales (y) from advertising spend (x). The equation is y = 2.5x + 10, and R² = 0.81. Which interpretation is correct?
99. A data analyst has a time series of monthly sales data. They observe that sales are consistently higher every December and lower every January. Which component of time series does this pattern represent?
100. Which data cleaning method involves replacing a missing value with the average of the available values in that column?
101. An analyst compares average sales across three different store locations using a statistical test. Which test is most appropriate?
102. In A/B testing, which factor is increased by having a larger sample size?
103. A data analyst is preparing a dataset for a machine learning algorithm that assumes normally distributed features. Which TWO data transformation methods should the analyst consider to achieve this?
104. A retail company wants to segment its customers based on purchase history. Which THREE methods are appropriate for customer segmentation?
105. A data analyst is cleaning a dataset and identifies several outliers. Which TWO methods are appropriate for handling outliers?
106. A data analyst calculates the mean, median, and mode of a dataset. Which measure of central tendency is most affected by extreme outliers?
107. A retail company wants to analyze monthly sales data over the past three years to identify long-term trends. Which component of time series analysis is most relevant for this goal?
108. An analyst runs a simple linear regression with an R² value of 0.85. Which interpretation is correct?
109. A data scientist is performing a hypothesis test with a significance level α=0.05. The p-value obtained is 0.03. What should the scientist conclude?
110. A marketing team runs an A/B test comparing two webpage designs. The null hypothesis states there is no difference in conversion rates. The p-value is 0.08 at α=0.05. Which is the correct interpretation?
111. A dataset contains a feature with values ranging from 10 to 1000. The analyst applies min-max normalization to scale the feature between 0 and 1. What is the normalized value of 520?
112. Which data quality dimension ensures that data represents the real-world scenario correctly and without errors?
113. A financial analyst wants to compare the mean annual returns of three different investment strategies. Which statistical test is most appropriate?
114. In logistic regression, the output is a probability between 0 and 1. If the predicted probability for a customer churning is 0.7 and the decision threshold is 0.5, what is the predicted class?
115. A data analyst is cleaning a dataset and finds that the 'age' column has several missing values. Which method of handling missing values is least likely to introduce bias if the missingness is completely at random?
116. A data scientist applies K-means clustering to a customer dataset. The elbow method suggests using 4 clusters. After running K-means with k=4, the within-cluster sum of squares (WCSS) is plotted against k, and the elbow is at k=4. What does this indicate?
117. A retail company wants to identify customer segments based on purchase history and demographics. Which technique is most appropriate for this task?
118. An analyst is preparing data for a clustering algorithm that uses Euclidean distance. Which TWO data preprocessing techniques should be applied to ensure all features contribute equally?
119. A data analyst is evaluating the quality of a customer database. Which THREE of the following are dimensions of data quality?
120. A researcher is designing an A/B test to compare two website layouts. Which TWO elements are essential for determining the required sample size?
121. A data analyst is examining the distribution of customer ages in a dataset. The ages are: 22, 25, 29, 30, 31, 34, 35, 37, 40, 42, 45, 50, 55, 60, 65. Which measure of central tendency would be least affected if one age is incorrectly recorded as 120?
122. A company wants to determine if there is a significant difference in the average sales revenue between two different store layouts. They collect sales data from 30 stores with Layout A and 30 stores with Layout B. Which statistical test is most appropriate for comparing the means of these two independent groups?
123. In a regression analysis, the coefficient of determination (R²) is 0.85. How should this value be interpreted?
124. A data scientist is building a K-means clustering model for customer segmentation. After plotting the within-cluster sum of squares (WCSS) against the number of clusters (k), she observes that the WCSS decreases sharply until k=5 and then levels off. Which value of k should she choose based on the elbow method?
125. A stock analyst is analyzing monthly sales data for a retail company and observes a consistent pattern of high sales every December. This pattern is most likely an example of which time series component?
126. In an A/B test, the null hypothesis states that there is no difference between the conversion rates of the control and treatment groups. After collecting data, the p-value is 0.03. Using a significance level α = 0.05, what should the analyst conclude?
127. A data analyst is preparing features for a machine learning model that uses distance-based algorithms (e.g., K-means, KNN). The dataset contains numerical features with different scales: age (0-100), income (20,000-200,000), and credit score (300-850). Which data transformation technique is most appropriate to ensure all features contribute equally to the distance calculations?
128. A logistic regression model is used to predict the probability of customer churn. The model's coefficient for the feature 'customer support calls' is 0.8 with a p-value of 0.001. Which interpretation is correct?
129. A dataset contains customer records with a column for 'Phone Number' that should be unique. However, the analyst finds several duplicate phone numbers. Which data quality dimension is primarily affected?
130. A data analyst is examining the relationship between advertising spend (in thousands) and sales (in thousands). The Pearson correlation coefficient is computed as r = -0.85. Which of the following interpretations is correct?
131. A data analyst is cleaning a dataset with missing values in a time series of daily temperatures. The missing values occur sporadically. Which imputation method is most appropriate to maintain the temporal trend?
132. A data analyst wants to test if the proportion of customers who prefer Product A over Product B is different from 50%. She surveys 200 customers and finds that 120 prefer Product A. Which statistical test should she use?
133. A data analyst is performing data cleaning on a dataset and identifies several outliers in the 'age' column. Which TWO methods are appropriate for handling these outliers? (Select two.)
134. A company is planning an A/B test to compare two website designs. Which THREE of the following must be determined before the test begins to ensure valid results? (Select three.)
135. A data analyst wants to segment customers based on purchasing behavior such as frequency, monetary value, and recency. Which TWO clustering evaluation methods can help determine the optimal number of clusters? (Select two.)
136. A data analyst is reviewing a dataset containing house prices. The mean price is $350,000 and the median is $280,000. Which of the following best describes the distribution of house prices?
137. A data scientist runs a linear regression model to predict customer spending based on income. The R-squared value is 0.45 and the p-value for the slope coefficient is 0.03. At a significance level of α=0.05, which of the following conclusions is correct?
138. Which TWO of the following are measures of central tendency?
139. A data analyst is cleaning a dataset with missing values. Which TWO of the following are acceptable methods for handling missing numerical data?
140. An analyst wants to compare the average sales revenue across three different store locations. Which TWO statistical methods are appropriate for this comparison?
141. Which TWO of the following are appropriate uses of min-max normalisation?
142. A data analyst is performing a chi-square test of independence on a 2x2 contingency table. The p-value is 0.04. At α=0.05, which THREE of the following statements are correct?
143. A company runs an A/B test to compare a new website layout (treatment) against the current layout (control). The conversion rate for the control is 5% and for the treatment is 5.5%. The p-value is 0.06 at α=0.05. Which THREE of the following conclusions are valid?
144. An analyst is performing K-means clustering on customer data. The elbow method shows a clear bend at k=4. Which THREE of the following are true about K-means clustering with k=4?
145. Which TWO of the following are components of time series data?
146. A logistic regression model predicts customer churn (0=no churn, 1=churn). The model outputs probabilities. Which THREE of the following statements about logistic regression are correct?
147. Which TWO of the following data quality dimensions are most directly affected by duplicate records?
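Several of the questions above rest on short formulas: the Z-score, min-max normalization, and prediction from a fitted linear regression. The sketch below is one possible way to check that kind of arithmetic in Python. It is not part of the question bank or its official explanations, the function names are hypothetical, and the input values are the ones quoted in the question text.

```python
# Illustrative helpers only; names are hypothetical, not from the question bank.

def z_score(value, mean, std_dev):
    """Standardize a raw value: distance from the mean in standard deviations."""
    return (value - mean) / std_dev


def min_max_scale(value, minimum, maximum):
    """Rescale a value to the 0-1 range using the feature's minimum and maximum."""
    return (value - minimum) / (maximum - minimum)


def predict_linear(intercept, slope, x):
    """Predict y from a simple linear regression y = intercept + slope * x."""
    return intercept + slope * x


if __name__ == "__main__":
    # Raw value 70 with mean 50 and standard deviation 10.
    print(z_score(70, 50, 10))
    # Value 520 in a feature ranging from 10 to 1000.
    print(round(min_max_scale(520, 10, 1000), 4))
    # y = 2.5 + 1.2x at x = 10.
    print(predict_linear(2.5, 1.2, 10))
    # Intercept $50,000 plus $30,000 per bedroom, for 3 bedrooms.
    print(predict_linear(50_000, 30_000, 3))
```

Working a formula by hand first and then comparing against a helper like these is a quick way to catch arithmetic slips before reading the full explanation.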
The Analysing Data domain covers the key concepts tested in this area of the DA0-001 exam blueprint published by CompTIA. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DA0-001 domains — no account required.
The Courseiva DA0-001 question bank contains 147 questions in the Analysing Data domain. Click any question to see the full explanation and answer breakdown.
Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.
The session launcher on this page draws questions exclusively from the Analysing Data domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.