DA0-001 Analysing Data — All Questions With Answers

Question 1 (easy, multiple choice)

An analyst computed the mean, median, and mode of a dataset and found they are all equal. Which of the following best describes the distribution?

Question 2 (medium, multiple choice)

A data analyst wants to compare the average revenue per customer between two marketing campaigns (A and B). The analyst is unsure if the data follows a normal distribution. Which statistical test is most appropriate for comparing the means of the two groups?

Question 3 (hard, multiple choice)

A data analyst is performing a multiple linear regression with three predictors. The model output shows an R-squared of 0.85 and an adjusted R-squared of 0.80. Which of the following is the best interpretation of the difference between these two values?

Question 4 (medium, multiple choice)

A data scientist is preparing data for a K-means clustering algorithm. The dataset contains features measured in different units (e.g., income in dollars and age in years). Which preprocessing step is most critical before running K-means?

Question 5 (medium, multiple choice)

In a time series analysis, a retail analyst observes consistent peaks in sales every December and troughs every February. This pattern repeats annually. Which component of time series does this represent?

Question 6 (hard, multiple choice)

A data analyst runs an A/B test on a new website layout. The test yields a p-value of 0.04 with the null hypothesis being no difference in conversion rates. The significance threshold is α=0.05. Which of the following is the correct conclusion?

Question 7 (easy, multiple choice)

A dataset contains a column 'Age' with values: [22, 25, 25, 30, 35, 40, 45]. What is the interquartile range (IQR)?
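
The IQR is Q3 minus Q1, but the quartile values depend on the convention used to compute them. The following is a minimal sketch using Python's standard `statistics` module, offered as an illustration and not as the exam's official method; it applies two common conventions to the values from the question.

```python
import statistics

ages = [22, 25, 25, 30, 35, 40, 45]

# "exclusive" is the module's default quartile convention
q1_ex, _, q3_ex = statistics.quantiles(ages, n=4, method="exclusive")
iqr_exclusive = q3_ex - q1_ex

# "inclusive" interpolates between the observed minimum and maximum
q1_in, _, q3_in = statistics.quantiles(ages, n=4, method="inclusive")
iqr_inclusive = q3_in - q1_in

print(iqr_exclusive, iqr_inclusive)  # 15.0 12.5
```

Under the exclusive convention Q1 = 25 and Q3 = 40, so the IQR is 15. The inclusive convention interpolates Q3 to 37.5 and yields 12.5, so the expected answer depends on which convention the course material follows.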

Question 8 (medium, multiple choice)

A data analyst wants to understand the relationship between advertising spend and sales revenue. The analyst calculates a Pearson correlation coefficient of 0.85. Which of the following is the best interpretation?

Question 9 (medium, multiple choice)

In a logistic regression model predicting customer churn (1 = churn, 0 = not churn), the coefficient for 'contract length' is -0.5. Which of the following is the correct interpretation?
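
A logistic regression coefficient is expressed on the log-odds scale, so exponentiating it gives the multiplicative change in the odds for a one-unit increase in the predictor, holding other predictors constant. A small sketch of that conversion follows; the variable names are illustrative and the unit of 'contract length' is not stated in the question.

```python
import math

coef_contract_length = -0.5  # coefficient from the question (log-odds scale)

# Odds ratio for a one-unit increase in contract length
odds_ratio = math.exp(coef_contract_length)

# Percentage change in the odds of churn (negative means the odds fall)
pct_change_in_odds = (odds_ratio - 1) * 100

print(round(odds_ratio, 4), round(pct_change_in_odds, 1))  # 0.6065 -39.3
```

The negative coefficient means longer contracts are associated with lower odds of churn. It describes a change in odds, not a drop of 0.5 in the churn probability itself.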

Question 10 (hard, multiple choice)

A data analyst is cleaning a dataset and finds that 5% of values in the 'income' column are missing. The analyst decides to impute missing values using the mean of the non-missing values. Which potential issue should the analyst be most concerned about?

Question 11 (easy, multiple choice)

A data analyst wants to use a Z-score to standardize a dataset. The variable has a mean of 50 and a standard deviation of 10. What is the Z-score for a raw value of 70?
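
The Z-score is the distance of a value from the mean, measured in standard deviations: z = (x - mean) / standard deviation. As a quick sketch with the numbers from the question (the function name is just for illustration):

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev

z = z_score(70, mean=50, std_dev=10)
print(z)  # 2.0
```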

Question 12 (medium, multiple choice)

A data scientist is using K-means clustering with k=3. After the first iteration, the centroids are recalculated. Which step occurs next in the algorithm?

Question 13 (medium, multi-select)

A data analyst is evaluating data quality for a customer database. Which TWO dimensions of data quality are most directly affected by duplicate customer records?

Question 14 (hard, multi-select)

A data analyst is performing a chi-square test of independence on a contingency table of customer satisfaction (satisfied, neutral, dissatisfied) by region (North, South, East, West). Which THREE of the following are necessary assumptions for the test?

Question 15 (medium, multi-select)

A data analyst is preparing to run an A/B test comparing two email subject lines. Which TWO of the following should the analyst define before the test begins?

Question 16 (easy, multiple choice)

A data analyst calculates the mean, median, and mode of a dataset. Which of the following measures of central tendency is least affected by extreme outliers?

Question 17 (medium, multiple choice)

A data analyst is conducting an A/B test on a website's landing page. The null hypothesis is that there is no difference in conversion rates between the control and treatment groups. After collecting data, the analyst calculates a p-value of 0.03. Using a significance level of α = 0.05, what is the correct conclusion?

Question 18 (hard, multiple choice)

A data scientist is analyzing a dataset with multiple features and wants to apply k-means clustering to segment customers. She chooses k = 4 based on the elbow method. During the iteration process, which of the following correctly describes a step in the k-means algorithm?

Question 19 (medium, multiple choice)

An analyst is performing a linear regression and obtains an R-squared value of 0.85. Which of the following is the best interpretation?

Question 20 (easy, multiple choice)

A dataset contains the ages of 100 customers. The analyst wants to transform the ages to a 0-1 range for use in a distance-based algorithm. Which technique should be used?

Question 21 (medium, multiple choice)

A data analyst is testing whether the average sales amount differs between two regions. Which statistical test is most appropriate?

Question 22 (hard, multiple choice)

In time series decomposition, a pattern that repeats at regular intervals (e.g., weekly, yearly) is called:

Question 23 (easy, multiple choice)

Which data quality dimension is most concerned with whether data values fall within a defined domain or acceptable range?

Question 24 (medium, multiple choice)

A data analyst is examining the relationship between advertising spend (in dollars) and revenue (in dollars). The Pearson correlation coefficient r is calculated as +0.92. Which of the following interpretations is correct?

Question 25 (medium, multiple choice)

A data analyst is cleaning a dataset and finds that a numeric field has several missing values. The variable is normally distributed. Which imputation method is most appropriate?

Question 26 (hard, multiple choice)

In a multiple regression model with three predictors, the coefficient for one predictor is 5.2 with a p-value of 0.001. Which of the following is the best interpretation?

Question 27 (easy, multiple choice)

A data analyst wants to compare the means of three different training methods on employee productivity. Which statistical test is most appropriate?

Question 28 (medium, multi-select)

A data analyst is preparing a dataset for analysis and needs to handle outliers. Which TWO of the following are common methods for treating outliers?

Question 29 (medium, multi-select)

An analyst is conducting an A/B test on a new checkout process. To calculate sample size, which THREE factors must be considered?

Question 30 (hard, multi-select)

A data analyst is building a logistic regression model to predict whether a customer will churn (yes/no). Which TWO statements about logistic regression are correct?

Question 31 (medium, multiple choice)

A data analyst is examining sales data for a retail chain and notices that the mean monthly sales is $50,000 while the median is $35,000. Which of the following best describes the distribution of the sales data?

Question 32 (easy, multiple choice)

A data analyst is comparing the average test scores of students who attended a tutoring program versus those who did not. Which statistical test is most appropriate for determining if there is a significant difference between the means of these two independent groups?

Question 33 (hard, multiple choice)

In a linear regression model predicting house prices, the coefficient for the number of bedrooms is $30,000 and the intercept is $50,000. If a house has 3 bedrooms, what is the predicted price?
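
A fitted linear model predicts by adding the intercept to the coefficient multiplied by the predictor value. The sketch below plugs in the figures from the question; the function and parameter names are hypothetical, and the model is assumed to have bedrooms as its only predictor.

```python
def predict_price(bedrooms, intercept=50_000, per_bedroom=30_000):
    """Predicted price = intercept + coefficient * number of bedrooms."""
    return intercept + per_bedroom * bedrooms

predicted = predict_price(3)
print(predicted)  # 140000
```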

Question 34 (medium, multiple choice)

A data analyst wants to segment customers into groups based on their purchasing behavior. The dataset includes numerical features such as annual income and purchase frequency. Which algorithm is most appropriate for this task?

Question 35 (medium, multiple choice)

A data analyst is preparing data for a k-nearest neighbors algorithm. The features include age (0-100) and income (0-200,000). Which technique should be applied to ensure the distance metric is not dominated by income?

Question 36 (easy, multiple choice)

A data analyst is cleaning a dataset and finds that the 'age' column has several missing values. Which of the following is a valid method for handling missing numerical data?

Question 37 (medium, multiple choice)

An analyst is conducting an A/B test to compare two website designs. The null hypothesis is that there is no difference in conversion rates. The p-value obtained is 0.03, and the significance threshold is 0.05. What should the analyst conclude?

Question 38 (hard, multiple choice)

In time series decomposition, a data analyst separates a retail sales series into trend, seasonal, and residual components. After decomposition, the residual component shows no pattern and is random. Which of the following best describes the seasonal component?

Question 39 (easy, multiple choice)

A data analyst needs to identify outliers in a dataset. Which of the following is a common method based on the interquartile range (IQR)?

Question 40 (medium, multiple choice)

A data analyst is evaluating a multiple regression model with three predictors. The R² value is 0.85. Which of the following is the best interpretation of R²?

Question 41 (hard, multiple choice)

A data analyst is performing a chi-square test of independence on a contingency table of customer satisfaction (satisfied vs. dissatisfied) and product type (A, B, C). The test yields a p-value of 0.04 with α = 0.05. What is the correct conclusion?

Question 42 (medium, multiple choice)

A data analyst is working with a dataset that includes a column 'income' with values ranging from 20,000 to 150,000. To standardize this variable for a linear regression that assumes normally distributed residuals, which method should be used?

Question 43 (medium, multi-select)

A data analyst is preparing a dataset for analysis and needs to ensure data quality. Which TWO of the following are dimensions of data quality?

Question 44 (hard, multi-select)

A data analyst is performing K-means clustering on customer data. Which THREE of the following are steps in the K-means algorithm?
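
For orientation, the standard K-means loop alternates between assigning points to their nearest centroid and recomputing each centroid as the mean of its members, stopping once assignments no longer change. The following is a minimal pure-Python sketch under simple assumptions: the toy points and starting centroids are invented for illustration, distances are Euclidean, and Python 3.8+ is assumed for `math.dist`.

```python
import math

def assign(points, centroids):
    """Assignment step: label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centroid in place
            continue
        new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Repeat update and assignment until the labels stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Toy data and initial centroids (hypothetical values)
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

Production libraries add refinements such as smarter initialisation and multiple restarts, which this sketch omits.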

Question 45 (medium, multi-select)

An analyst is preparing data for an A/B test and wants to ensure valid results. Which TWO of the following should be considered when calculating the required sample size?

Question 46 (medium, multiple choice)

A data analyst is analyzing customer purchase amounts. The dataset contains several extreme high values due to luxury purchases. Which measure of central tendency is most robust to these outliers?

Question 47 (easy, multiple choice)

In A/B testing, the null hypothesis typically states that:

Question 48 (medium, multiple choice)

A data scientist is performing K-means clustering on customer data. She plots the within-cluster sum of squares (WCSS) for different values of k and observes an 'elbow' at k=4. What does this indicate?

Question 49 (hard, multiple choice)

A data analyst is asked to compare the average sales across three different store locations. The data is normally distributed and variances are approximately equal. Which statistical test is most appropriate?

Question 50 (easy, multiple choice)

In simple linear regression, the coefficient of determination R² measures:

Question 51 (medium, multiple choice)

A dataset contains a feature 'Age' with values ranging from 18 to 95. To prepare data for a k-nearest neighbors algorithm, which transformation should be applied to 'Age'?

Question 52 (hard, multiple choice)

A data analyst is testing whether a new website layout increases conversion rate. The p-value from the test is 0.03. Using a significance level of 0.05, what is the correct conclusion?

Question 53 (medium, multiple choice)

In time series analysis, which component represents regular patterns that repeat over fixed periods, such as daily or yearly?

Question 54 (easy, multiple choice)

Which data quality dimension ensures that data represents the real-world object or event correctly?

Question 55 (medium, multiple choice)

A marketing analyst wants to predict whether a customer will churn (yes/no) based on account age and monthly charges. Which regression technique is most appropriate?

Question 56 (medium, multiple choice)

A dataset contains a variable 'Income' with many missing values. The analyst decides to impute missing values with the median income of the non-missing values. Which type of imputation is this?

Question 57 (hard, multiple choice)

A data analyst is comparing the means of two independent groups using a t-test. The sample sizes are small and the data is not normally distributed. Which condition is violated for a valid t-test?

Question 58 (medium, multi-select)

A data analyst is preparing a dataset for analysis and needs to address data quality issues. Which TWO of the following are common data cleaning tasks?

Question 59 (hard, multi-select)

A data analyst is performing a chi-square test for independence between two categorical variables. Which THREE of the following are necessary conditions for the test to be valid?

Question 60 (medium, multi-select)

An analyst is planning an A/B test to compare two website designs. Which TWO factors should be considered when calculating the required sample size?

Question 61 (easy, multiple choice)

A data analyst is summarizing the central tendency of a dataset with extreme outliers. Which measure is most robust to outliers?

Question 62 (medium, multiple choice)

A retail company wants to test whether a new website layout increases the conversion rate compared to the current layout. They randomly assign visitors to either the control or treatment group. Which statistical test is most appropriate to compare the conversion rates?

Question 63 (hard, multiple choice)

A data scientist is building a model to predict customer churn (yes/no). After training a logistic regression model, the coefficient for 'monthly charges' is 0.05 with a p-value of 0.03. Which interpretation is correct at α=0.05?

Question 64 (medium, multiple choice)

A dataset contains height measurements in centimeters and inches. An analyst wants to apply k-means clustering. Which data transformation should be applied before clustering?

Question 65 (easy, multiple choice)

In a simple linear regression model y = 2.5 + 1.2x, what is the predicted value of y when x = 10?

Question 66 (medium, multiple choice)

A data analyst is performing time series analysis on monthly sales data and notices a consistent pattern of higher sales every December. Which component of time series does this represent?

Question 67 (hard, multiple choice)

An analyst runs an A/B test with 1000 users per group and observes a conversion rate of 5% in the control and 6% in the treatment. The p-value is 0.12. What should the analyst conclude?

Question 68 (medium, multiple choice)

A dataset has missing values in the 'age' column. The distribution of age is approximately normal with few outliers. Which imputation method is most appropriate?

Question 69 (easy, multiple choice)

Which measure best describes the spread of the middle 50% of a dataset?

Question 70 (medium, multiple choice)

An analyst calculates a Pearson correlation coefficient of -0.8 between advertising spend and customer churn rate. Which interpretation is correct?

Question 71 (hard, multiple choice)

A data analyst uses the elbow method to determine the number of clusters for k-means. The plot shows a sharp bend at k=3 and a small bend at k=5. What is the recommended number of clusters?

Question 72 (medium, multiple choice)

In a multiple regression model, one predictor has a high p-value (0.45). What should the analyst consider doing?

Question 73 (medium, multi-select)

A data analyst is cleaning a customer dataset. Which two actions are appropriate for handling duplicate records? (Choose TWO)

Question 74 (hard, multi-select)

A data scientist is conducting an A/B test with a significance level of 0.05. Which three factors should be considered when calculating the required sample size? (Choose THREE)

Question 75 (medium, multi-select)

A dataset contains outliers in a feature that will be used for linear regression. Which two outlier treatment methods are appropriate? (Choose TWO)

Question 76 (easy, multiple choice)

A data analyst calculates the mean, median, and mode of a dataset. Which of the following best describes how these measures are used in descriptive statistics?

Question 77 (medium, multiple choice)

An analyst is comparing the average sales of two different store locations using a t-test. The p-value obtained is 0.03, and the significance level is 0.05. What should the analyst conclude?

Question 78 (medium, multiple choice)

A data scientist builds a simple linear regression model to predict house prices based on square footage. The model yields an R-squared value of 0.85. Which statement accurately interprets this result?

Question 79 (hard, multiple choice)

An analyst is performing a logistic regression to predict customer churn (yes/no). The model outputs a probability of 0.75 for a particular customer. Which of the following best describes the interpretation?

Question 80 (medium, multiple choice)

A dataset contains features with vastly different scales (e.g., age 0-100 and income 0-1,000,000). Which data transformation should be applied before using a K-nearest neighbors algorithm?

Question 81 (medium, multiple choice)

A marketing team uses K-means clustering to segment customers based on purchase history. To determine the optimal number of clusters, they plot the within-cluster sum of squares (WCSS) against k and look for an elbow. What is the purpose of this method?

Question 82 (hard, multiple choice)

A time series of monthly sales data exhibits a clear upward trend over several years, with consistent peaks each December. Which components are present in this series?

Question 83 (easy, multiple choice)

In an A/B test, the null hypothesis states that there is no difference between the control and treatment groups. After running the test, the p-value is 0.04. Assuming α = 0.05, what is the correct conclusion?

Question 84 (medium, multiple choice)

A data analyst notices that a dataset of customer ages has several missing values. Which method for handling missing data is most appropriate if the data is missing completely at random and the analyst wants to preserve sample size?

Question 85 (hard, multiple choice)

A data analyst is cleaning a dataset and finds that some records have duplicate entries based on customer ID. Which data quality dimension is most directly affected by these duplicates?

Question 86 (easy, multiple choice)

Which statistical test should be used to determine if there is a significant association between two categorical variables, such as gender and product preference?

Question 87 (medium, multiple choice)

An analyst wants to compare the mean sales revenue across three different store regions. The data is normally distributed and variances are equal. Which statistical test is most appropriate?

Question 88 (medium, multi-select)

A data team is preparing data for a clustering analysis. Which THREE of the following steps are commonly part of data cleaning?

Question 89 (hard, multi-select)

An analyst is conducting an A/B test on a new website layout. Which TWO of the following must be defined before the test begins?

Question 90 (medium, multi-select)

Which TWO of the following are true about Pearson correlation coefficient (r)?

Question 91 (easy, multiple choice)

A data analyst calculates the mean, median, and mode of a sales dataset and finds they are all equal. Which type of distribution does this indicate?

Question 92 (medium, multiple choice)

A retailer wants to test if a new website layout increases the average time spent on the site. They split traffic: control group (old layout) and treatment group (new layout). Which statistical test is most appropriate to compare the average time spent between the two groups?

Question 93 (hard, multiple choice)

A data scientist builds a logistic regression model to predict customer churn (yes/no). The model outputs a probability of 0.75 for a particular customer. Which of the following best describes this output?

Question 94 (medium, multiple choice)

A dataset contains employee salaries ranging from $30,000 to $200,000. An analyst wants to scale the salaries to a range of 0 to 1 for use in a distance-based clustering algorithm. Which method should they use?

Question 95 (medium, multiple choice)

A marketing team runs an A/B test on email subject lines. The p-value is 0.03 with α = 0.05. Which of the following is the correct interpretation?

Question 96 (medium, multiple choice)

An analyst uses K-means clustering on customer purchase data. After plotting the within-cluster sum of squares for different values of k, they observe an elbow at k=4. What is the most appropriate number of clusters?

Question 97 (easy, multiple choice)

Which data quality dimension is violated if a customer record has a missing phone number?

Question 98 (medium, multiple choice)

A simple linear regression model predicts sales (y) from advertising spend (x). The equation is y = 2.5x + 10, and R² = 0.81. Which interpretation is correct?

Question 99 (hard, multiple choice)

A data analyst has a time series of monthly sales data. They observe that sales are consistently higher every December and lower every January. Which component of time series does this pattern represent?

Question 100 (easy, multiple choice)

Which data cleaning method involves replacing a missing value with the average of the available values in that column?

Question 101 (medium, multiple choice)

An analyst compares average sales across three different store locations using a statistical test. Which test is most appropriate?

Question 102 (hard, multiple choice)

In A/B testing, which factor is increased by having a larger sample size?

Question 103 (medium, multi-select)

A data analyst is preparing a dataset for a machine learning algorithm that assumes normally distributed features. Which TWO data transformation methods should the analyst consider to achieve this?

Question 104 (medium, multi-select)

A retail company wants to segment its customers based on purchase history. Which THREE methods are appropriate for customer segmentation?

Question 105 (hard, multi-select)

A data analyst is cleaning a dataset and identifies several outliers. Which TWO methods are appropriate for handling outliers?

Question 106 (easy, multiple choice)

A data analyst calculates the mean, median, and mode of a dataset. Which measure of central tendency is most affected by extreme outliers?

Question 107 (easy, multiple choice)

A retail company wants to analyze monthly sales data over the past three years to identify long-term trends. Which component of time series analysis is most relevant for this goal?

Question 108 (medium, multiple choice)

An analyst runs a simple linear regression with an R² value of 0.85. Which interpretation is correct?

Question 109 (medium, multiple choice)

A data scientist is performing a hypothesis test with a significance level α=0.05. The p-value obtained is 0.03. What should the scientist conclude?

Question 110 (medium, multiple choice)

A marketing team runs an A/B test comparing two webpage designs. The null hypothesis states there is no difference in conversion rates. The p-value is 0.08 at α=0.05. Which is the correct interpretation?

Question 111 (hard, multiple choice)

A dataset contains a feature with values ranging from 10 to 1000. The analyst applies min-max normalization to scale the feature between 0 and 1. What is the normalized value of 520?
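
Min-max normalization maps a value onto the 0 to 1 range using (x - min) / (max - min). A short sketch with the range given in the question follows; the function name is illustrative.

```python
def min_max_scale(value, lo, hi):
    """Scale `value` linearly so that `lo` maps to 0 and `hi` maps to 1."""
    return (value - lo) / (hi - lo)

scaled = min_max_scale(520, lo=10, hi=1000)
print(round(scaled, 4))  # 0.5152
```

The result is 510 / 990, roughly 0.515, slightly above 0.5 because the minimum is 10 and not 0.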

Question 112 (easy, multiple choice)

Which data quality dimension ensures that data represents the real-world scenario correctly and without errors?

Question 113 (medium, multiple choice)

A financial analyst wants to compare the mean annual returns of three different investment strategies. Which statistical test is most appropriate?

Question 114 (hard, multiple choice)

In logistic regression, the output is a probability between 0 and 1. If the predicted probability for a customer churning is 0.7 and the decision threshold is 0.5, what is the predicted class?
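
Turning a predicted probability into a class label is a simple comparison against the decision threshold. The sketch below assumes the positive class (1) means churn and that a probability exactly equal to the threshold counts as positive; that tie rule is an assumption, since the question does not specify one.

```python
def predict_class(probability, threshold=0.5):
    """Return 1 (churn) when the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

predicted = predict_class(0.7, threshold=0.5)
print(predicted)  # 1
```

Raising the threshold (for example to 0.8) would classify the same customer as 0, which is why the threshold is usually chosen with the cost of false positives and false negatives in mind.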

Question 115 (medium, multiple choice)

A data analyst is cleaning a dataset and finds that the 'age' column has several missing values. Which method of handling missing values is least likely to introduce bias if the missingness is completely at random?

Question 116 (hard, multiple choice)

A data scientist applies K-means clustering to a customer dataset. She plots the within-cluster sum of squares (WCSS) against k, observes an elbow at k=4, and runs K-means with k=4. What does the elbow at k=4 indicate?

Question 117 (medium, multiple choice)

A retail company wants to identify customer segments based on purchase history and demographics. Which technique is most appropriate for this task?

Question 118 (medium, multi-select)

An analyst is preparing data for a clustering algorithm that uses Euclidean distance. Which TWO data preprocessing techniques should be applied to ensure all features contribute equally?

Question 119 (hard, multi-select)

A data analyst is evaluating the quality of a customer database. Which THREE of the following are dimensions of data quality?

Question 120 (medium, multi-select)

A researcher is designing an A/B test to compare two website layouts. Which TWO elements are essential for determining the required sample size?

Question 121 (medium, multiple choice)

A data analyst is examining the distribution of customer ages in a dataset. The ages are: 22, 25, 29, 30, 31, 34, 35, 37, 40, 42, 45, 50, 55, 60, 65. Which measure of central tendency would be least affected if an erroneous age of 120 were recorded in the dataset?
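
The effect of a single extreme value on the mean and the median can be checked directly. This sketch assumes the erroneous 120 is an extra record appended to the list, which is one reading of the question; the variable names are illustrative.

```python
import statistics

ages = [22, 25, 29, 30, 31, 34, 35, 37, 40, 42, 45, 50, 55, 60, 65]
ages_with_error = ages + [120]  # assumption: the bad value is an added record

mean_before = statistics.mean(ages)
mean_after = statistics.mean(ages_with_error)
median_before = statistics.median(ages)
median_after = statistics.median(ages_with_error)

print(mean_before, mean_after)      # 40 45
print(median_before, median_after)  # 37 38.5
```

The mean moves by 5 years while the median moves by 1.5, which illustrates why the median is described as robust to outliers.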

Question 122 (medium, multiple choice)

A company wants to determine if there is a significant difference in the average sales revenue between two different store layouts. They collect sales data from 30 stores with Layout A and 30 stores with Layout B. Which statistical test is most appropriate for comparing the means of these two independent groups?

Question 123 (easy, multiple choice)

In a regression analysis, the coefficient of determination (R²) is 0.85. How should this value be interpreted?

Question 124 (hard, multiple choice)

A data scientist is building a K-means clustering model for customer segmentation. After plotting the within-cluster sum of squares (WCSS) against the number of clusters (k), she observes that the WCSS decreases sharply until k=5 and then levels off. Which value of k should she choose based on the elbow method?

Question 125 (medium, multiple choice)

A stock analyst is analyzing monthly sales data for a retail company and observes a consistent pattern of high sales every December. This pattern is most likely an example of which time series component?

Question 126 (easy, multiple choice)

In an A/B test, the null hypothesis states that there is no difference between the conversion rates of the control and treatment groups. After collecting data, the p-value is 0.03. Using a significance level α = 0.05, what should the analyst conclude?

Question 127 (medium, multiple choice)

A data analyst is preparing features for a machine learning model that uses distance-based algorithms (e.g., K-means, KNN). The dataset contains numerical features with different scales: age (0-100), income (20,000-200,000), and credit score (300-850). Which data transformation technique is most appropriate to ensure all features contribute equally to the distance calculations?
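
Two rescaling techniques commonly discussed for this situation are z-score standardisation and min-max scaling. The sketch below shows both using only the standard library; the sample values are hypothetical and the helper names `standardize` and `min_max` are illustrative, not taken from any particular library.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample values, chosen only to show the difference in raw scale.
incomes = [20_000, 55_000, 90_000, 200_000]
ages = [18, 35, 52, 100]

z_incomes = standardize(incomes)  # centred on 0 with unit spread
scaled_ages = min_max(ages)       # bounded to the interval [0, 1]

print(z_incomes)
print(scaled_ages)
```

After either transformation, income no longer dominates a Euclidean distance simply because its raw numbers are thousands of times larger than age.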

Question 128 (hard, multiple choice)

A logistic regression model is used to predict the probability of customer churn. The model's coefficient for the feature 'customer support calls' is 0.8 with a p-value of 0.001. Which interpretation is correct?
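
Logistic regression coefficients are expressed on the log-odds scale, so exponentiating a coefficient gives an odds ratio. The short sketch below applies that to the coefficient quoted in the question, assuming the usual logit link and that the other features are held constant.

```python
import math

coefficient = 0.8  # log-odds change per additional support call (from the question)

# Exponentiating a log-odds coefficient yields the multiplicative change in odds.
odds_ratio = math.exp(coefficient)

print(f"odds ratio per extra call: {odds_ratio:.3f}")
```

An odds ratio of roughly 2.2 means each additional support call multiplies the odds of churn by about 2.2; it is a statement about odds, not a direct increase in probability.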

Question 129 (easy, multiple choice)

A dataset contains customer records with a column for 'Phone Number' that should be unique. However, the analyst finds several duplicate phone numbers. Which data quality dimension is primarily affected?

Question 130 (medium, multiple choice)

A data analyst is examining the relationship between advertising spend (in thousands) and sales (in thousands). The Pearson correlation coefficient is computed as r = -0.85. Which of the following interpretations is correct?

Question 131 (hard, multiple choice)

A data analyst is cleaning a dataset with missing values in a time series of daily temperatures. The missing values occur sporadically. Which imputation method is most appropriate to maintain the temporal trend?
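
One method often suggested for sporadic gaps in a time series is linear interpolation between the nearest known neighbours. The following is a minimal sketch of that idea, not necessarily the keyed answer; the function name `interpolate_linear` and the temperature readings are hypothetical.

```python
def interpolate_linear(series):
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps at the very start or end have no neighbour on one side and are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant step between two consecutive known readings.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# Hypothetical daily temperatures with sporadic missing readings.
temps = [20.0, None, 22.0, 23.0, None, None, 20.0]
filled_temps = interpolate_linear(temps)

print(filled_temps)
```

Unlike filling with a global mean, each imputed value here follows the local direction of the series, which is why interpolation tends to preserve a temporal trend.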

Question 132 (medium, multiple choice)

A data analyst wants to test if the proportion of customers who prefer Product A over Product B is different from 50%. She surveys 200 customers and finds that 120 prefer Product A. Which statistical test should she use?
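
For a single proportion compared against a fixed value, one common choice is a one-sample z-test for a proportion. The sketch below works through the arithmetic for the survey in the question, assuming a two-sided alternative and the normal approximation (reasonable here because n is 200); it is an illustration of that test, not a statement of the keyed answer.

```python
from math import sqrt
from statistics import NormalDist

n = 200          # customers surveyed
successes = 120  # customers preferring Product A
p0 = 0.5         # hypothesised proportion under the null

p_hat = successes / n
# The standard error uses the null proportion, as is usual for this test.
standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

With these figures z is about 2.83 and the two-sided p-value falls below 0.01, so the observed 60% would be judged significantly different from 50% at the usual 0.05 level.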

Question 133 (medium, multi select)

A data analyst is performing data cleaning on a dataset and identifies several outliers in the 'age' column. Which TWO methods are appropriate for handling these outliers? (Select two.)

Question 134 (hard, multi select)

A company is planning an A/B test to compare two website designs. Which THREE of the following must be determined before the test begins to ensure valid results? (Select three.)

Question 135 (medium, multi select)

A data analyst wants to segment customers based on purchasing behavior such as frequency, monetary value, and recency. Which TWO clustering evaluation methods can help determine the optimal number of clusters? (Select two.)

Question 136 (medium, multiple choice)

A data analyst is reviewing a dataset containing house prices. The mean price is $350,000 and the median is $280,000. Which of the following best describes the distribution of house prices?

Question 137 (hard, multiple choice)

A data scientist runs a linear regression model to predict customer spending based on income. The R-squared value is 0.45 and the p-value for the slope coefficient is 0.03. At a significance level of α=0.05, which of the following conclusions is correct?

Question 138 (easy, multi select)

Which TWO of the following are measures of central tendency?

Question 139 (easy, multi select)

A data analyst is cleaning a dataset with missing values. Which TWO of the following are acceptable methods for handling missing numerical data?

Question 140 (medium, multi select)

An analyst wants to compare the average sales revenue across three different store locations. Which TWO statistical methods are appropriate for this comparison?

Question 141 (medium, multi select)

Which TWO of the following are appropriate uses of min-max normalisation?

Question 142 (medium, multi select)

A data analyst is performing a chi-square test of independence on a 2x2 contingency table. The p-value is 0.04. At α=0.05, which THREE of the following statements are correct?

Question 143 (hard, multi select)

A company runs an A/B test to compare a new website layout (treatment) against the current layout (control). The conversion rate for the control is 5% and for the treatment is 5.5%. The test yields a p-value of 0.06, and the significance level is α=0.05. Which THREE of the following conclusions are valid?

Question 144 (hard, multi select)

An analyst is performing K-means clustering on customer data. The elbow method shows a clear bend at k=4. Which THREE of the following are true about K-means clustering with k=4?

Question 145 (medium, multi select)

Which TWO of the following are components of time series data?

Question 146 (hard, multi select)

A logistic regression model predicts customer churn (0=no churn, 1=churn). The model outputs probabilities. Which THREE of the following statements about logistic regression are correct?

Question 147 (medium, multi select)
