Courseiva
Knowledge + Practice

Free IT certification practice questions with explained answers for CCNA, CompTIA, AWS, Azure, Google Cloud, and more.

Courseiva is a free IT certification practice platform offering original exam-style practice questions, detailed explanations, topic-based practice, mock exams, readiness tracking, and study analytics for Cisco, CompTIA, Microsoft, AWS, and other technology certifications.

© 2026 Courseiva. Courseiva is operated by JTNetSolutions Ltd. All rights reserved.

Courseiva is an independent certification practice platform and is not affiliated with, endorsed by, or sponsored by Cisco, Microsoft, AWS, CompTIA, Google, ISC2, ISACA, or any other certification vendor. Vendor names and certification marks are used only to identify the exams learners are preparing for.

Objective 2.0

Exploratory Data Analysis

MLS-C01 Practice Questions

Use this page to practise Exploratory Data Analysis questions for this certification. Focus on how the exam tests exploratory data analysis in scenario format — understanding the why behind each answer builds more durable knowledge than memorising options.


What this objective tests

MLS-C01 Exploratory Data Analysis — Key Topics

Exploratory Data Analysis questions on this certification test your ability to apply exploratory data analysis techniques in scenario-based situations.

  • Core Exploratory Data Analysis concepts and how they apply in real-world cloud scenarios.
  • How to apply exploratory data analysis techniques correctly and verify the outcome.
  • Troubleshooting exploratory data analysis issues by interpreting error output and system state.
  • Cloud best practices and Exploratory Data Analysis design trade-offs tested by this certification.

Common exam traps

Where candidates lose marks on Exploratory Data Analysis

  • ⚠ Selecting the most expensive service when a simpler managed option meets the requirement.
  • ⚠ Forgetting that cloud resources must be explicitly secured; defaults are rarely secure.
  • ⚠ Choosing a global service fix when the issue is region-specific.
  • ⚠ Overlooking cost implications of cross-region data transfer in architecture questions.

MLS-C01 Exploratory Data Analysis — Practice Questions

30 questions from this objective

Question 2 · medium · multiple choice

A data scientist is exploring a dataset of customer transactions. The dataset has 1 million rows and 50 columns. The target variable is a binary flag indicating whether a customer churned. The data scientist runs a correlation matrix on all numerical features and finds that two features have a correlation coefficient of 0.98. Which action should be taken to improve model performance?
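
The correlation check described in this scenario can be sketched in plain Python. This is a minimal illustration, not the method used by the question's author: the column names, the toy values, and the 0.95 threshold are all hypothetical, and the sketch only shows how near-duplicate features are surfaced, not which answer option is correct.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy data: total_spend is almost a multiple of order_count.
columns = {
    "order_count": [1, 2, 3, 4, 5, 6],
    "total_spend": [10.5, 19.8, 30.2, 40.1, 49.7, 60.3],
    "days_since_signup": [30, 5, 22, 11, 40, 8],
}
flagged = highly_correlated_pairs(columns)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Only the first two columns are flagged here; the third is essentially uncorrelated with both. On a real dataset the same check is usually run on a library-computed correlation matrix rather than a hand-written loop.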

Question 3 · hard · multiple choice

A team is building a regression model to predict house prices. The dataset includes a column 'zip_code' with 100 unique values. The data scientist one-hot encodes this column, resulting in 100 new binary columns. The model shows poor performance on a validation set. What is the most likely cause?
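
To make the encoding step in this scenario concrete, the sketch below one-hot encodes a small categorical column by hand. The zip code values are hypothetical and the helper is an illustration only; it shows that each unique value becomes its own binary column and that rare values leave very few rows per column.

```python
from collections import Counter

def one_hot(values):
    """Return (sorted categories, one binary row per input value)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is set per row
        rows.append(row)
    return categories, rows

# Hypothetical sample: three distinct zip codes in six rows.
zip_codes = ["98101", "98102", "98101", "98109", "98102", "98101"]
categories, encoded = one_hot(zip_codes)
counts = Counter(zip_codes)

print(len(categories), "new columns")
print("rows per category:", dict(counts))
```

With 100 unique values, as in the question, the same procedure yields 100 columns, so checking the rows-per-category counts is a cheap way to see how sparse each new column is.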

Question 4 · easy · multiple choice

During exploratory data analysis, a data scientist plots the distribution of a numerical feature and observes a heavy right skew. The feature has many outliers at the high end. Which transformation is most appropriate to reduce skewness?
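
Skewness can be measured before and after a candidate transformation. The sketch below uses a hypothetical list of amounts with a long right tail and compares the raw values against a log transform, which is one common option among several (square root and Box-Cox are others). It is an illustration under those assumptions, not a statement of the expected answer.

```python
import math

def skewness(xs):
    """Population skewness: third central moment divided by std cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical feature with a few very large values at the high end.
amounts = [12, 15, 18, 20, 22, 25, 30, 45, 120, 900]
logged = [math.log1p(x) for x in amounts]  # log(1 + x) tolerates zeros

raw_skew = skewness(amounts)
log_skew = skewness(logged)
print(f"raw skewness: {raw_skew:.2f}, after log1p: {log_skew:.2f}")
```

The transformed values are still right-skewed in this toy case, but noticeably less so, which is the kind of before-and-after comparison worth making during EDA.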

Question 5 · medium · multiple choice

A data scientist is analyzing a dataset with missing values in 30% of the rows for the 'age' column. The data scientist decides to impute the missing values with the median of the observed 'age' values. What is a potential drawback of this approach?
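
The effect of median imputation on a feature's spread can be checked directly. In this sketch the ages are hypothetical, with 3 of 10 values missing to mirror the 30% in the scenario; it compares the standard deviation of the observed values with that of the imputed column.

```python
import statistics

# Hypothetical 'age' column; None marks a missing value (3 of 10 rows).
ages = [22, 25, None, 31, None, 38, 44, None, 52, 67]

observed = [a for a in ages if a is not None]
median_age = statistics.median(observed)
imputed = [median_age if a is None else a for a in ages]

std_before = statistics.pstdev(observed)
std_after = statistics.pstdev(imputed)
print(f"median used: {median_age}")
print(f"std of observed values: {std_before:.2f}")
print(f"std after imputation:   {std_after:.2f}")
```

Every filled-in row takes the same value, so the imputed column is more tightly clustered around the median than the observed data was.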

Question 6 · hard · multiple choice

A data scientist is exploring a dataset with 500 features and 10,000 samples. The data scientist computes the pairwise correlation matrix and finds that many features have correlations above 0.9. The data scientist wants to reduce the dataset to 50 features while preserving as much variance as possible. Which technique should be used?

Question 7 · medium · multi select

A data scientist is performing exploratory data analysis on a dataset with 10,000 rows and 20 features. The target variable is binary. The data scientist observes that one feature has 15% missing values. Which TWO actions are appropriate to handle this missing data? (Choose TWO.)

Question 8 · hard · multi select

A data scientist is analyzing a dataset of customer reviews. The dataset contains a text column 'review' and a numerical rating from 1 to 5. The data scientist wants to create features for sentiment analysis. Which THREE preprocessing steps should be applied to the text data before feature extraction? (Choose THREE.)

Question 9 · medium · multiple choice

A data scientist is analyzing a dataset with a target variable that is heavily imbalanced (e.g., 99% negative class, 1% positive class). Which exploratory data analysis technique is most appropriate to understand the relationship between features and the target before modeling?

Question 10 · easy · multiple choice

During EDA, a data scientist notices that a feature has a high proportion of missing values (e.g., 70%). The feature is continuous and expected to be important based on domain knowledge. What is the best approach to handle this?

Question 11 · hard · multiple choice

A data scientist is performing EDA on a dataset with 1,000 features and 10,000 rows. The target variable is binary. After checking for multicollinearity, the scientist finds many pairs of features with correlation > 0.95. Which action should be taken to prepare the data for modeling?

Question 12 · medium · multiple choice

A data scientist is analyzing a time-series dataset and wants to check for stationarity. Which EDA technique is most appropriate?
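
One informal way to look for non-stationarity is to compare rolling statistics across the series. The sketch below is an illustration on two synthetic series (both hypothetical) and uses a rolling mean only; formal statistical tests exist for this purpose and the sketch does not replace them.

```python
import statistics

def rolling(values, window, func):
    """Apply func to each full trailing window of the given size."""
    return [func(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]

# Synthetic series: one with an upward trend, one fluctuating around 10.
trend = [float(t) + (1 if t % 2 else -1) for t in range(40)]
flat = [10.0 + (1 if t % 2 else -1) for t in range(40)]

trend_means = rolling(trend, 10, statistics.fmean)
flat_means = rolling(flat, 10, statistics.fmean)

print(f"trend: rolling mean goes from {trend_means[0]:.1f} to {trend_means[-1]:.1f}")
print(f"flat:  rolling mean goes from {flat_means[0]:.1f} to {flat_means[-1]:.1f}")
```

A rolling mean that drifts, as in the first series, suggests the mean depends on time; the second series keeps a constant rolling mean. The same comparison can be made with a rolling variance.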

Question 13 · easy · multiple choice

During EDA, a data scientist creates a scatter matrix of numerical features and notices that some features have a funnel-shaped pattern (variance increases with the mean). What is the appropriate transformation to stabilize variance?

Question 14 · medium · multi select

Which TWO of the following are appropriate techniques for detecting outliers in a univariate continuous feature?
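
Two widely used univariate rules, the interquartile-range fence and the z-score threshold, can be sketched as follows. The answer options for this question are not shown on this page, so this is not a claim about which two are expected; the data, the 1.5 multiplier, and the threshold of 3 are conventional but hypothetical choices.

```python
import statistics

def iqr_outliers(xs, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]

def zscore_outliers(xs, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [x for x in xs if abs(x - mean) / std > threshold]

# Hypothetical feature: tightly grouped values plus one extreme reading.
values = [10, 12, 11, 13, 12, 11, 10, 14, 12, 13, 11, 12, 95]

iqr_flagged = iqr_outliers(values)
z_flagged = zscore_outliers(values)
print("IQR rule:", iqr_flagged)
print("z-score rule:", z_flagged)
```

Both rules flag the same point here. The z-score rule depends on the mean and standard deviation, which are themselves pulled by extreme values, so on small samples the IQR rule tends to be the more robust of the two.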

Question 15 · hard · multi select

Which THREE of the following are best practices when performing exploratory data analysis on a dataset with both numerical and categorical features?

Question 16 · medium · multiple choice

A data scientist is performing exploratory data analysis on a dataset containing customer transactions. The dataset has 1 million rows with 50 features, including numerical and categorical variables. The goal is to identify patterns and potential data quality issues before building a model. Which approach should the data scientist take to efficiently explore the data?

Question 17 · hard · multiple choice

A data scientist is trying to read a CSV file from S3 bucket 'my-bucket' with key 'training/data.csv' using an IAM role with the attached policy shown in the exhibit. The read operation fails with an Access Denied error. What is the most likely cause?

Exhibit

Refer to the exhibit.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/training/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/training/"
    }
  ]
}
```
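
When reading a policy like the one in the exhibit, it helps to check which statements actually cover a given action and resource. The sketch below is a rough approximation only: it uses Python's `fnmatchcase` as a stand-in for IAM wildcard matching (an assumption, since the real evaluation logic differs in detail) and ignores everything outside this one policy, such as bucket policies, explicit denies, and encryption key permissions. It does not settle the exam answer.

```python
from fnmatch import fnmatchcase

BUCKET = "my-bucket"
KEY = "training/data.csv"
object_arn = f"arn:aws:s3:::{BUCKET}/{KEY}"

# The two Allow statements from the exhibit, reduced to actions + resource.
statements = [
    {"actions": {"s3:GetObject", "s3:PutObject", "s3:DeleteObject"},
     "resource": "arn:aws:s3:::my-bucket/training/*"},
    {"actions": {"s3:GetObject"},
     "resource": "arn:aws:s3:::my-bucket/training/"},
]

def matching_statements(action, arn):
    """Indexes of statements whose action list and resource pattern cover the request."""
    return [i for i, s in enumerate(statements)
            if action in s["actions"] and fnmatchcase(arn, s["resource"])]

get_matches = matching_statements("s3:GetObject", object_arn)
list_matches = matching_statements("s3:ListBucket", f"arn:aws:s3:::{BUCKET}")

print("GetObject on the object is covered by statements:", get_matches)
print("ListBucket on the bucket is covered by statements:", list_matches)
```

Under these assumptions the first statement's pattern covers the object key, the second statement's resource matches only the literal `training/` key, and nothing in the exhibit grants a bucket-level action.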

Question 18 · hard · multiple choice

A data scientist is building a fraud detection model using a dataset of 500,000 credit card transactions. The dataset contains 20 features, including transaction amount, merchant category, time since last transaction, and customer age. The target variable 'is_fraud' has 0.1% positive examples. Initial EDA reveals that the transaction amount distribution is highly skewed with a long tail. Also, there are missing values in the 'customer_age' field (5% missing). The data scientist needs to prepare the data for training a binary classifier. Which combination of preprocessing steps should the data scientist apply to address these issues and improve model performance? (Select TWO.)

Question 19 · medium · multiple choice

A machine learning engineer is working on a customer churn prediction project. The dataset contains 100,000 records with 15 features, including customer demographics, account information, and usage patterns. The target variable 'churned' is binary with 15% positive examples. During EDA, the engineer notices that the feature 'tenure' (number of months the customer has been with the company) has a multimodal distribution with peaks at 1, 12, 24, and 36 months. Also, the feature 'monthly_charges' has a strong positive correlation with 'total_charges' (correlation coefficient = 0.95). The engineer wants to build a logistic regression model. Which preprocessing steps should the engineer take to address these issues? (Select TWO.)

Question 20 · medium · multi select

A data scientist is analyzing a dataset with 100 features and 10,000 observations. The target variable is binary (0/1). Initial exploratory data analysis reveals that many features have missing values, high correlation with each other, and non-normal distributions. The data scientist wants to identify the most important features for predicting the target while reducing dimensionality. Which TWO actions should the data scientist take? (Choose two.)

Question 21 · hard · multiple choice

Refer to the exhibit. A data scientist ran an S3 Select query on a large CSV file stored in Amazon S3. The output shows only 2 records returned, but the data scientist expected thousands. The file size is 10 GB. What is the MOST likely reason for the small result set?

Exhibit

Refer to the exhibit.

```
# S3 Select query result on a CSV file
SELECT * FROM s3object s WHERE s."age" > 30 AND s."city" = 'New York'

# Result:
{
  "Payload": [
    {"Records": {"Payload": "name,age,city\nAlice,35,New York\nBob,40,New York\n"}},
    {"Stats": {"Details": {"BytesScanned": 1024, "BytesProcessed": 512, "BytesReturned": 64}}}
  ]
}
```

Question 22 · easy · multiple choice

A machine learning engineer is working on a regression problem to predict house prices. The dataset contains 500,000 rows and 20 features, including 'sqft_living', 'bedrooms', 'bathrooms', 'floors', 'waterfront', 'view', 'condition', 'grade', 'yr_built', 'zipcode', and 'lat'. After performing exploratory data analysis, the engineer notices that the 'sqft_living' feature has a right-skewed distribution with a long tail. The 'zipcode' feature is categorical with 70 unique values. The 'lat' feature is continuous. The engineer wants to prepare the data for a linear regression model. Which action should the engineer take to improve model performance?

Question 23 · medium · drag order

Drag and drop the steps to create a data processing job using Amazon SageMaker Processing in the correct order.

Question 24 · medium · drag order

Drag and drop the steps to use Amazon SageMaker Feature Store for feature engineering in the correct order.

Question 25 · medium · matching

Match each SageMaker feature to its description.

Descriptions to match:

  • Managed compute to train a model
  • Host a model for real-time inference
  • Run inference on a batch of data
  • Jupyter notebook for exploration
  • Run data processing scripts

Question 26 · medium · matching

Match each ML model evaluation concept to its definition.

Descriptions to match:

  • Model performs well on training data but poorly on unseen data
  • Model fails to capture underlying patterns in data
  • Error from wrong assumptions in the learning algorithm
  • Error from sensitivity to small fluctuations in training data
  • Balance between underfitting and overfitting

Question 27 · easy · multiple choice

A data scientist is analyzing a dataset with 500 features and 10,000 samples. After running a correlation matrix, they find that many feature pairs have correlation >0.95. What is the most appropriate next step to improve model performance?

Question 28 · medium · multiple choice

A machine learning engineer is performing exploratory data analysis on a dataset containing customer transactions. They notice that the target variable is highly imbalanced: 99% of samples belong to class 0 and 1% to class 1. Which technique should they use to address this imbalance before training a classification model?
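
One family of techniques for this situation reweights the classes rather than resampling the rows. The sketch below computes inverse-frequency class weights with the formula n_samples / (n_classes * class_count), the same heuristic scikit-learn documents for `class_weight="balanced"`. The label list is hypothetical and mirrors the 99%/1% split in the scenario; resampling methods are an alternative that this sketch does not cover.

```python
from collections import Counter

# Hypothetical labels: 990 samples of class 0 and 10 samples of class 1.
labels = [0] * 990 + [1] * 10

counts = Counter(labels)
n_samples = len(labels)
n_classes = len(counts)

# Inverse-frequency weights: rarer classes get proportionally larger weights.
weights = {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

for cls in sorted(weights):
    print(f"class {cls}: count = {counts[cls]}, weight = {weights[cls]:.3f}")
```

With these weights each class contributes the same total weight to a weighted loss (990 × 0.505 ≈ 10 × 50), so the minority class is no longer drowned out during training.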

Question 29 · hard · multiple choice

A data scientist is analyzing a dataset with missing values. The missing data mechanism is missing at random (MAR). Which imputation method is most appropriate to preserve relationships between variables?

Question 30 · easy · multi select

Which TWO actions are appropriate when dealing with outliers in a dataset during exploratory data analysis? (Select TWO.)

Question 31 · medium · multi select

Which THREE techniques are commonly used for feature engineering in exploratory data analysis? (Select THREE.)

More Exploratory Data Analysis questions available in the full practice test.
