MLS-C01 • Mock Exam 92
Free MLS-C01 mock exam: 25 questions with explanations. Set 92.
A data scientist is working on a regression problem to predict house prices. The dataset has 80 features, including categorical variables with high cardinality (e.g., zip code with 10,000 unique values). The target variable is log-transformed. The data scientist trains a linear regression model and obtains an R² of 0.45 on the test set. To improve performance, the data scientist considers:

A) Applying one-hot encoding to all categorical features and using Ridge regression.
B) Using target encoding for high-cardinality features and using a tree-based model like XGBoost.
C) Removing all categorical features and using polynomial features for numerical features.
D) Using principal component analysis (PCA) on all features before training a linear model.

Which approach is MOST likely to improve the model's performance?
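
Options A and B differ mainly in how they represent a high-cardinality feature such as zip code. The sketch below is a minimal, pure-Python illustration of smoothed target encoding, compared with the number of columns one-hot encoding would create. The function name `target_encode`, the `smoothing` parameter, and the toy zip codes and log-prices are all assumptions made for illustration; none of them come from the question. In practice the encoding would be fitted on training folds only, to avoid leaking the target into the features.

```python
from collections import defaultdict


def target_encode(categories, targets, smoothing=10.0):
    """Return (mapping, global_mean) for smoothed target encoding.

    Each category maps to a weighted average of its own target mean and
    the global mean. `smoothing` is a hypothetical prior weight: larger
    values pull rare categories harder toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1
    mapping = {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in counts
    }
    return mapping, global_mean


# Toy data (made-up values): zip codes and log-transformed prices.
zips = ["10001", "10001", "10001", "94105", "94105", "73301"]
log_prices = [13.0, 13.2, 13.1, 14.0, 14.2, 12.0]

encoding, fallback = target_encode(zips, log_prices, smoothing=2.0)

# Target encoding yields one numeric column, whatever the cardinality.
# Unseen zip codes fall back to the global mean.
encoded_column = [encoding.get(z, fallback) for z in zips]

# One-hot encoding would create one column per distinct zip code.
one_hot_width = len(set(zips))

print(encoded_column)
print("one-hot columns:", one_hot_width, "vs target-encoded columns: 1")
```

With 10,000 distinct zip codes, the one-hot representation would add 10,000 sparse columns, while the target-encoded representation stays at a single numeric column that a tree-based model can split on directly.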