Courseiva
Knowledge + Practice

Free IT certification practice questions with explained answers for CCNA, CompTIA, AWS, Azure, Google Cloud, and more.

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

DA0-001 · Free, no signup

Analysing Data

Practice DA0-001 Analysing Data questions with full explanations on every answer.

147 questions

Start practicing

Analysing Data: choose a session length

10 questions (~10 min) · 20 questions (~20 min) · 30 questions (~30 min) · 50 questions (~50 min)

Free · No account required

DA0-001 Domains

Data Concepts and Environments
Analysing Data
Visualising Data
Reporting Insights
Mining Data
Comparing and Contrasting Data Concepts
Mining and Acquiring Data
Analyzing and Modeling Data
Visualizing Data
Communicating Data Insights

All DA0-001 Analysing Data questions (147)

Click any question to see the full explanation and answer options, or start a focused practice session above.

1

An analyst computed the mean, median, and mode of a dataset and found they are all equal. Which of the following best describes the distribution?

2

A data analyst wants to compare the average revenue per customer between two marketing campaigns (A and B). The analyst is unsure if the data follows a normal distribution. Which statistical test is most appropriate for comparing the means of the two groups?

3

A data analyst is performing a multiple linear regression with three predictors. The model output shows an R-squared of 0.85 and an adjusted R-squared of 0.80. Which of the following is the best interpretation of the difference between these two values?

4

A data scientist is preparing data for a K-means clustering algorithm. The dataset contains features measured in different units (e.g., income in dollars and age in years). Which preprocessing step is most critical before running K-means?

5

In a time series analysis, a retail analyst observes consistent peaks in sales every December and troughs every February. This pattern repeats annually. Which component of time series does this represent?

6

A data analyst runs an A/B test on a new website layout. The test yields a p-value of 0.04 with the null hypothesis being no difference in conversion rates. The significance threshold is α=0.05. Which of the following is the correct conclusion?

7

A dataset contains a column 'Age' with values: [22, 25, 25, 30, 35, 40, 45]. What is the interquartile range (IQR)?

8

A data analyst wants to understand the relationship between advertising spend and sales revenue. The analyst calculates a Pearson correlation coefficient of 0.85. Which of the following is the best interpretation?

9

In a logistic regression model predicting customer churn (1 = churn, 0 = not churn), the coefficient for 'contract length' is -0.5. Which of the following is the correct interpretation?

10

A data analyst is cleaning a dataset and finds that 5% of values in the 'income' column are missing. The analyst decides to impute missing values using the mean of the non-missing values. Which potential issue should the analyst be most concerned about?

11

A data analyst wants to use a Z-score to standardize a dataset. The variable has a mean of 50 and a standard deviation of 10. What is the Z-score for a raw value of 70?

12

A data scientist is using K-means clustering with k=3. After the first iteration, the centroids are recalculated. Which step occurs next in the algorithm?

13

A data analyst is evaluating data quality for a customer database. Which TWO dimensions of data quality are most directly affected by duplicate customer records?

14

A data analyst is performing a chi-square test of independence on a contingency table of customer satisfaction (satisfied, neutral, dissatisfied) by region (North, South, East, West). Which THREE of the following are necessary assumptions for the test?

15

A data analyst is preparing to run an A/B test comparing two email subject lines. Which TWO of the following should the analyst define before the test begins?

16

A data analyst calculates the mean, median, and mode of a dataset. Which of the following measures of central tendency is least affected by extreme outliers?

17

A data analyst is conducting an A/B test on a website's landing page. The null hypothesis is that there is no difference in conversion rates between the control and treatment groups. After collecting data, the analyst calculates a p-value of 0.03. Using a significance level of α = 0.05, what is the correct conclusion?

18

A data scientist is analyzing a dataset with multiple features and wants to apply k-means clustering to segment customers. She chooses k = 4 based on the elbow method. During the iteration process, which of the following correctly describes a step in the k-means algorithm?

19

An analyst is performing a linear regression and obtains an R-squared value of 0.85. Which of the following is the best interpretation?

20

A dataset contains the ages of 100 customers. The analyst wants to transform the ages to a 0-1 range for use in a distance-based algorithm. Which technique should be used?

21

A data analyst is testing whether the average sales amount differs between two regions. Which statistical test is most appropriate?

22

In time series decomposition, a pattern that repeats at regular intervals (e.g., weekly, yearly) is called:

23

Which data quality dimension is most concerned with whether data values fall within a defined domain or acceptable range?

24

A data analyst is examining the relationship between advertising spend (in dollars) and revenue (in dollars). The Pearson correlation coefficient r is calculated as +0.92. Which of the following interpretations is correct?

25

A data analyst is cleaning a dataset and finds that a numeric field has several missing values. The variable is normally distributed. Which imputation method is most appropriate?

26

In a multiple regression model with three predictors, the coefficient for one predictor is 5.2 with a p-value of 0.001. Which of the following is the best interpretation?

27

A data analyst wants to compare the means of three different training methods on employee productivity. Which statistical test is most appropriate?

28

A data analyst is preparing a dataset for analysis and needs to handle outliers. Which TWO of the following are common methods for treating outliers?

29

An analyst is conducting an A/B test on a new checkout process. To calculate sample size, which THREE factors must be considered?

30

A data analyst is building a logistic regression model to predict whether a customer will churn (yes/no). Which TWO statements about logistic regression are correct?

31

A data analyst is examining sales data for a retail chain and notices that the mean monthly sales is $50,000 while the median is $35,000. Which of the following best describes the distribution of the sales data?

32

A data analyst is comparing the average test scores of students who attended a tutoring program versus those who did not. Which statistical test is most appropriate for determining if there is a significant difference between the means of these two independent groups?

33

In a linear regression model predicting house prices, the coefficient for the number of bedrooms is $30,000 and the intercept is $50,000. If a house has 3 bedrooms, what is the predicted price?

34

A data analyst wants to segment customers into groups based on their purchasing behavior. The dataset includes numerical features such as annual income and purchase frequency. Which algorithm is most appropriate for this task?

35

A data analyst is preparing data for a k-nearest neighbors algorithm. The features include age (0-100) and income (0-200,000). Which technique should be applied to ensure the distance metric is not dominated by income?

36

A data analyst is cleaning a dataset and finds that the 'age' column has several missing values. Which of the following is a valid method for handling missing numerical data?

37

An analyst is conducting an A/B test to compare two website designs. The null hypothesis is that there is no difference in conversion rates. The p-value obtained is 0.03, and the significance threshold is 0.05. What should the analyst conclude?

38

In time series decomposition, a data analyst separates a retail sales series into trend, seasonal, and residual components. After decomposition, the residual component shows no pattern and is random. Which of the following best describes the seasonal component?

39

A data analyst needs to identify outliers in a dataset. Which of the following is a common method based on the interquartile range (IQR)?

40

A data analyst is evaluating a multiple regression model with three predictors. The R² value is 0.85. Which of the following is the best interpretation of R²?

41

A data analyst is performing a chi-square test of independence on a contingency table of customer satisfaction (satisfied vs. dissatisfied) and product type (A, B, C). The test yields a p-value of 0.04 with α = 0.05. What is the correct conclusion?

42

A data analyst is working with a dataset that includes a column 'income' with values ranging from 20,000 to 150,000. To standardize this variable for a linear regression that assumes normally distributed residuals, which method should be used?

43

A data analyst is preparing a dataset for analysis and needs to ensure data quality. Which TWO of the following are dimensions of data quality?

44

A data analyst is performing K-means clustering on customer data. Which THREE of the following are steps in the K-means algorithm?

45

An analyst is preparing data for an A/B test and wants to ensure valid results. Which TWO of the following should be considered when calculating the required sample size?

46

A data analyst is analyzing customer purchase amounts. The dataset contains several extreme high values due to luxury purchases. Which measure of central tendency is most robust to these outliers?

47

In A/B testing, the null hypothesis typically states that:

48

A data scientist is performing K-means clustering on customer data. She plots the within-cluster sum of squares (WCSS) for different values of k and observes an 'elbow' at k=4. What does this indicate?

49

A data analyst is asked to compare the average sales across three different store locations. The data is normally distributed and variances are approximately equal. Which statistical test is most appropriate?

50

In simple linear regression, the coefficient of determination R² measures:

51

A dataset contains a feature 'Age' with values ranging from 18 to 95. To prepare data for a k-nearest neighbors algorithm, which transformation should be applied to 'Age'?

52

A data analyst is testing whether a new website layout increases conversion rate. The p-value from the test is 0.03. Using a significance level of 0.05, what is the correct conclusion?

53

In time series analysis, which component represents regular patterns that repeat over fixed periods, such as daily or yearly?

54

Which data quality dimension ensures that data represents the real-world object or event correctly?

55

A marketing analyst wants to predict whether a customer will churn (yes/no) based on account age and monthly charges. Which regression technique is most appropriate?

56

A dataset contains a variable 'Income' with many missing values. The analyst decides to impute missing values with the median income of the non-missing values. Which type of imputation is this?

57

A data analyst is comparing the means of two independent groups using a t-test. The sample sizes are small and the data is not normally distributed. Which condition is violated for a valid t-test?

58

A data analyst is preparing a dataset for analysis and needs to address data quality issues. Which TWO of the following are common data cleaning tasks?

59

A data analyst is performing a chi-square test for independence between two categorical variables. Which THREE of the following are necessary conditions for the test to be valid?

60

An analyst is planning an A/B test to compare two website designs. Which TWO factors should be considered when calculating the required sample size?

61

A data analyst is summarizing the central tendency of a dataset with extreme outliers. Which measure is most robust to outliers?

62

A retail company wants to test whether a new website layout increases the conversion rate compared to the current layout. They randomly assign visitors to either the control or treatment group. Which statistical test is most appropriate to compare the conversion rates?

63

A data scientist is building a model to predict customer churn (yes/no). After training a logistic regression model, the coefficient for 'monthly charges' is 0.05 with a p-value of 0.03. Which interpretation is correct at α=0.05?

64

A dataset contains height measurements in centimeters and inches. An analyst wants to apply k-means clustering. Which data transformation should be applied before clustering?

65

In a simple linear regression model y = 2.5 + 1.2x, what is the predicted value of y when x = 10?

66

A data analyst is performing time series analysis on monthly sales data and notices a consistent pattern of higher sales every December. Which component of time series does this represent?

67

An analyst runs an A/B test with 1000 users per group and observes a conversion rate of 5% in the control and 6% in the treatment. The p-value is 0.12. What should the analyst conclude?

68

A dataset has missing values in the 'age' column. The distribution of age is approximately normal with few outliers. Which imputation method is most appropriate?

69

Which measure best describes the spread of the middle 50% of a dataset?

70

An analyst calculates a Pearson correlation coefficient of -0.8 between advertising spend and customer churn rate. Which interpretation is correct?

71

A data analyst uses the elbow method to determine the number of clusters for k-means. The plot shows a sharp bend at k=3 and a small bend at k=5. What is the recommended number of clusters?

72

In a multiple regression model, one predictor has a high p-value (0.45). What should the analyst consider doing?

73

A data analyst is cleaning a customer dataset. Which two actions are appropriate for handling duplicate records? (Choose TWO)

74

A data scientist is conducting an A/B test with a significance level of 0.05. Which three factors should be considered when calculating the required sample size? (Choose THREE)

75

A dataset contains outliers in a feature that will be used for linear regression. Which two outlier treatment methods are appropriate? (Choose TWO)

76

A data analyst calculates the mean, median, and mode of a dataset. Which of the following best describes how these measures are used in descriptive statistics?

77

An analyst is comparing the average sales of two different store locations using a t-test. The p-value obtained is 0.03, and the significance level is 0.05. What should the analyst conclude?

78

A data scientist builds a simple linear regression model to predict house prices based on square footage. The model yields an R-squared value of 0.85. Which statement accurately interprets this result?

79

An analyst is performing a logistic regression to predict customer churn (yes/no). The model outputs a probability of 0.75 for a particular customer. Which of the following best describes the interpretation?

80

A dataset contains features with vastly different scales (e.g., age 0-100 and income 0-1,000,000). Which data transformation should be applied before using a K-nearest neighbors algorithm?

81

A marketing team uses K-means clustering to segment customers based on purchase history. To determine the optimal number of clusters, they plot the within-cluster sum of squares (WCSS) against k and look for an elbow. What is the purpose of this method?

82

A time series of monthly sales data exhibits a clear upward trend over several years, with consistent peaks each December. Which components are present in this series?

83

In an A/B test, the null hypothesis states that there is no difference between the control and treatment groups. After running the test, the p-value is 0.04. Assuming α = 0.05, what is the correct conclusion?

84

A data analyst notices that a dataset of customer ages has several missing values. Which method for handling missing data is most appropriate if the data is missing completely at random and the analyst wants to preserve sample size?

85

A data analyst is cleaning a dataset and finds that some records have duplicate entries based on customer ID. Which data quality dimension is most directly affected by these duplicates?

86

Which statistical test should be used to determine if there is a significant association between two categorical variables, such as gender and product preference?

87

An analyst wants to compare the mean sales revenue across three different store regions. The data is normally distributed and variances are equal. Which statistical test is most appropriate?

88

A data team is preparing data for a clustering analysis. Which THREE of the following steps are commonly part of data cleaning?

89

An analyst is conducting an A/B test on a new website layout. Which TWO of the following must be defined before the test begins?

90

Which TWO of the following are true about Pearson correlation coefficient (r)?

91

A data analyst calculates the mean, median, and mode of a sales dataset and finds they are all equal. Which type of distribution does this indicate?

92

A retailer wants to test if a new website layout increases the average time spent on the site. They split traffic: control group (old layout) and treatment group (new layout). Which statistical test is most appropriate to compare the average time spent between the two groups?

93

A data scientist builds a logistic regression model to predict customer churn (yes/no). The model outputs a probability of 0.75 for a particular customer. Which of the following best describes this output?

94

A dataset contains employee salaries ranging from $30,000 to $200,000. An analyst wants to scale the salaries to a range of 0 to 1 for use in a distance-based clustering algorithm. Which method should they use?

95

A marketing team runs an A/B test on email subject lines. The p-value is 0.03 with α = 0.05. Which of the following is the correct interpretation?

96

An analyst uses K-means clustering on customer purchase data. After plotting the within-cluster sum of squares for different values of k, they observe an elbow at k=4. What is the most appropriate number of clusters?

97

Which data quality dimension is violated if a customer record has a missing phone number?

98

A simple linear regression model predicts sales (y) from advertising spend (x). The equation is y = 2.5x + 10, and R² = 0.81. Which interpretation is correct?

99

A data analyst has a time series of monthly sales data. They observe that sales are consistently higher every December and lower every January. Which component of time series does this pattern represent?

100

Which data cleaning method involves replacing a missing value with the average of the available values in that column?

101

An analyst compares average sales across three different store locations using a statistical test. Which test is most appropriate?

102

In A/B testing, which factor is increased by having a larger sample size?

103

A data analyst is preparing a dataset for a machine learning algorithm that assumes normally distributed features. Which TWO data transformation methods should the analyst consider to achieve this?

104

A retail company wants to segment its customers based on purchase history. Which THREE methods are appropriate for customer segmentation?

105

A data analyst is cleaning a dataset and identifies several outliers. Which TWO methods are appropriate for handling outliers?

106

A data analyst calculates the mean, median, and mode of a dataset. Which measure of central tendency is most affected by extreme outliers?

107

A retail company wants to analyze monthly sales data over the past three years to identify long-term trends. Which component of time series analysis is most relevant for this goal?

108

An analyst runs a simple linear regression with an R² value of 0.85. Which interpretation is correct?

109

A data scientist is performing a hypothesis test with a significance level α=0.05. The p-value obtained is 0.03. What should the scientist conclude?

110

A marketing team runs an A/B test comparing two webpage designs. The null hypothesis states there is no difference in conversion rates. The p-value is 0.08 at α=0.05. Which is the correct interpretation?

111

A dataset contains a feature with values ranging from 10 to 1000. The analyst applies min-max normalization to scale the feature between 0 and 1. What is the normalized value of 520?

112

Which data quality dimension ensures that data represents the real-world scenario correctly and without errors?

113

A financial analyst wants to compare the mean annual returns of three different investment strategies. Which statistical test is most appropriate?

114

In logistic regression, the output is a probability between 0 and 1. If the predicted probability for a customer churning is 0.7 and the decision threshold is 0.5, what is the predicted class?

115

A data analyst is cleaning a dataset and finds that the 'age' column has several missing values. Which method of handling missing values is least likely to introduce bias if the missingness is completely at random?

116

A data scientist applies K-means clustering to a customer dataset. The within-cluster sum of squares (WCSS) is plotted against k, and the elbow appears at k=4. What does this indicate?

117

A retail company wants to identify customer segments based on purchase history and demographics. Which technique is most appropriate for this task?

118

An analyst is preparing data for a clustering algorithm that uses Euclidean distance. Which TWO data preprocessing techniques should be applied to ensure all features contribute equally?

119

A data analyst is evaluating the quality of a customer database. Which THREE of the following are dimensions of data quality?

120

A researcher is designing an A/B test to compare two website layouts. Which TWO elements are essential for determining the required sample size?

121

A data analyst is examining the distribution of customer ages in a dataset. The ages are: 22, 25, 29, 30, 31, 34, 35, 37, 40, 42, 45, 50, 55, 60, 65. Which measure of central tendency would be least affected if one additional age were incorrectly recorded as 120?

122

A company wants to determine if there is a significant difference in the average sales revenue between two different store layouts. They collect sales data from 30 stores with Layout A and 30 stores with Layout B. Which statistical test is most appropriate for comparing the means of these two independent groups?

123

In a regression analysis, the coefficient of determination (R²) is 0.85. How should this value be interpreted?

124

A data scientist is building a K-means clustering model for customer segmentation. After plotting the within-cluster sum of squares (WCSS) against the number of clusters (k), she observes that the WCSS decreases sharply until k=5 and then levels off. Which value of k should she choose based on the elbow method?

125

A stock analyst is analyzing monthly sales data for a retail company and observes a consistent pattern of high sales every December. This pattern is most likely an example of which time series component?

126

In an A/B test, the null hypothesis states that there is no difference between the conversion rates of the control and treatment groups. After collecting data, the p-value is 0.03. Using a significance level α = 0.05, what should the analyst conclude?

127

A data analyst is preparing features for a machine learning model that uses distance-based algorithms (e.g., K-means, KNN). The dataset contains numerical features with different scales: age (0-100), income (20,000-200,000), and credit score (300-850). Which data transformation technique is most appropriate to ensure all features contribute equally to the distance calculations?

128

A logistic regression model is used to predict the probability of customer churn. The model's coefficient for the feature 'customer support calls' is 0.8 with a p-value of 0.001. Which interpretation is correct?

129

A dataset contains customer records with a column for 'Phone Number' that should be unique. However, the analyst finds several duplicate phone numbers. Which data quality dimension is primarily affected?

130

A data analyst is examining the relationship between advertising spend (in thousands) and sales (in thousands). The Pearson correlation coefficient is computed as r = -0.85. Which of the following interpretations is correct?

131

A data analyst is cleaning a dataset with missing values in a time series of daily temperatures. The missing values occur sporadically. Which imputation method is most appropriate to maintain the temporal trend?

132

A data analyst wants to test if the proportion of customers who prefer Product A over Product B is different from 50%. She surveys 200 customers and finds that 120 prefer Product A. Which statistical test should she use?

133

A data analyst is performing data cleaning on a dataset and identifies several outliers in the 'age' column. Which TWO methods are appropriate for handling these outliers? (Select two.)

134

A company is planning an A/B test to compare two website designs. Which THREE of the following must be determined before the test begins to ensure valid results? (Select three.)

135

A data analyst wants to segment customers based on purchasing behavior such as frequency, monetary value, and recency. Which TWO clustering evaluation methods can help determine the optimal number of clusters? (Select two.)

136

A data analyst is reviewing a dataset containing house prices. The mean price is $350,000 and the median is $280,000. Which of the following best describes the distribution of house prices?

137

A data scientist runs a linear regression model to predict customer spending based on income. The R-squared value is 0.45 and the p-value for the slope coefficient is 0.03. At a significance level of α=0.05, which of the following conclusions is correct?

138

Which TWO of the following are measures of central tendency?

139

A data analyst is cleaning a dataset with missing values. Which TWO of the following are acceptable methods for handling missing numerical data?

140

An analyst wants to compare the average sales revenue across three different store locations. Which TWO statistical methods are appropriate for this comparison?

141

Which TWO of the following are appropriate uses of min-max normalisation?

142

A data analyst is performing a chi-square test of independence on a 2x2 contingency table. The p-value is 0.04. At α=0.05, which THREE of the following statements are correct?

143

A company runs an A/B test to compare a new website layout (treatment) against the current layout (control). The conversion rate for the control is 5% and for the treatment is 5.5%. The p-value is 0.06 at α=0.05. Which THREE of the following conclusions are valid?

144

An analyst is performing K-means clustering on customer data. The elbow method shows a clear bend at k=4. Which THREE of the following are true about K-means clustering with k=4?

145

Which TWO of the following are components of time series data?

146

A logistic regression model predicts customer churn (0=no churn, 1=churn). The model outputs probabilities. Which THREE of the following statements about logistic regression are correct?

147

Which TWO of the following data quality dimensions are most directly affected by duplicate records?
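Several of the questions above rest on short calculations: the Z-score (question 11), min-max normalisation (question 111), a prediction from a simple linear regression (questions 33 and 65), and the p-value decision rule used in the A/B test questions. The Python sketch below is one way to check that arithmetic yourself. The function names are illustrative choices, not part of any exam material or library, and the p-value rule assumes the common convention of rejecting the null hypothesis when p is below α.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


def min_max(value, minimum, maximum):
    """Rescale value linearly so that minimum maps to 0 and maximum to 1."""
    return (value - minimum) / (maximum - minimum)


def predict_linear(intercept, slope, x):
    """Prediction from a simple linear regression: y = intercept + slope * x."""
    return intercept + slope * x


def rejects_null(p_value, alpha=0.05):
    """Assumed convention: reject the null hypothesis when p < alpha."""
    return p_value < alpha


if __name__ == "__main__":
    # Figures taken from the question stems above.
    print(z_score(70, 50, 10))                 # question 11
    print(round(min_max(520, 10, 1000), 3))    # question 111
    print(predict_linear(2.5, 1.2, 10))        # question 65
    print(predict_linear(50_000, 30_000, 3))   # question 33
    print(rejects_null(0.04), rejects_null(0.08))
```

Running the script prints each result so you can compare it with your own working before opening the full explanation for the question.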

Practice all 147 Analysing Data questions

Other DA0-001 exam domains

Data Concepts and Environments
Visualising Data
Reporting Insights
Mining Data
Comparing and Contrasting Data Concepts
Mining and Acquiring Data
Analyzing and Modeling Data
Visualizing Data
Communicating Data Insights

Frequently asked questions

What does the Analysing Data domain cover on the DA0-001 exam?

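The Analysing Data domain covers this area of the DA0-001 exam blueprint published by CompTIA. In this question bank it spans descriptive statistics (mean, median, mode, IQR), hypothesis testing (t-tests, ANOVA, chi-square, A/B tests and p-values), correlation and regression (linear and logistic), K-means clustering, time series components, data cleaning (missing values, outliers, duplicates, scaling), and data quality dimensions. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all DA0-001 domains, with no account required.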

How many Analysing Data questions are in the DA0-001 question bank?

The Courseiva DA0-001 question bank contains 147 questions in the Analysing Data domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Analysing Data for DA0-001?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Analysing Data questions for DA0-001?

Yes — the session launcher on this page draws questions exclusively from the Analysing Data domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.

