MLS-C01 Exploratory Data Analysis • Set 26
MLS-C01 Exploratory Data Analysis Practice Test 26: 15 questions with explanations.
A data scientist is building a model to predict housing prices from a dataset with 100,000 records and 50 features, including 'sqft_living', 'sqft_lot', 'bedrooms', 'bathrooms', 'floors', 'waterfront', 'view', 'condition', and 'grade'. Using Amazon SageMaker Data Wrangler for EDA, the data scientist finds that 'sqft_living' has a correlation of 0.7 with 'sqft_above' (square footage above ground) and 0.6 with 'sqft_basement'. In addition, 'grade' (the overall grade of the house) is highly correlated with 'condition' (0.8), and the target variable 'price' is right-skewed. The data scientist plans to use a linear regression model. Which set of actions should the data scientist take to improve model performance?
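The two diagnostics the scenario relies on, pairwise feature correlation and target skewness, can be reproduced outside Data Wrangler. The sketch below is only an illustration on synthetic stand-in data, not the dataset from the question. It assumes that 'sqft_living' is the sum of 'sqft_above' and 'sqft_basement' and that prices are log-normally distributed. Neither assumption is stated in the question, and the helper names `pearson` and `skewness` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def skewness(x):
    """Population skewness: the third standardized moment."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((a - m) ** 2 for a in x) / n
    m3 = sum((a - m) ** 3 for a in x) / n
    return m3 / m2 ** 1.5


# Synthetic stand-in data (assumption: NOT the dataset from the question).
rng = random.Random(0)
n = 5000
sqft_above = [rng.uniform(800, 3000) for _ in range(n)]
sqft_basement = [rng.uniform(0, 1500) for _ in range(n)]
# Assumption: living area is the sum of above-ground and basement area.
sqft_living = [a + b for a, b in zip(sqft_above, sqft_basement)]
# Assumption: log-normal prices, which yields a right-skewed target.
price = [math.exp(rng.gauss(13.0, 0.5)) for _ in range(n)]

# Diagnostic 1: correlated predictors hint at multicollinearity.
corr_above = pearson(sqft_living, sqft_above)
corr_basement = pearson(sqft_living, sqft_basement)

# Diagnostic 2: skewness of the target before and after a log transform.
skew_raw = skewness(price)
skew_log = skewness([math.log1p(p) for p in price])

print(f"corr(sqft_living, sqft_above)    = {corr_above:.2f}")
print(f"corr(sqft_living, sqft_basement) = {corr_basement:.2f}")
print(f"skewness(price)                  = {skew_raw:.2f}")
print(f"skewness(log1p(price))           = {skew_log:.2f}")
```

On data of this shape, the skewness of the log-transformed target should be much closer to zero than that of the raw price, and the area features should correlate strongly with their sum. Both observations are properties of the synthetic data above, not measurements from the dataset in the question.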