A machine learning engineer is analyzing a dataset for a regression problem. The target variable has a long-tail distribution with extreme outliers. The engineer wants to reduce the influence of outliers while preserving the relative order of values. Which data transformation should the engineer apply to the target variable?
Rank transformation replaces each value with its rank in the sorted order, which yields an approximately uniform distribution and makes the result insensitive to how extreme the outliers are.
Why this answer
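Option B is correct because the rank transformation maps each value to its position in the sorted order. An extreme outlier simply receives the highest (or lowest) rank, so its magnitude no longer affects the target, while the relative order of all values is preserved. Option A is wrong because Box-Cox requires strictly positive values and, although it reduces skew, extreme values can still remain far from the rest of the data after the transform. Option C is wrong because the log transformation reduces skew but only compresses outliers, so they can remain influential.
Option D is wrong because min-max scaling is a linear rescaling that does not reduce outlier influence: the outliers set the minimum and maximum, so the remaining values are squeezed into a narrow band while the outliers stay just as extreme relative to them.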
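The contrast between options B and D can be illustrated with a minimal sketch in plain Python. The function names `rank_transform` and `min_max_scale` and the sample `target` values are hypothetical, chosen only for illustration. Ties are assumed to share the average of their ranks, which is one common convention and not something the question specifies.

```python
def rank_transform(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    # Indices of the values in ascending order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def min_max_scale(values):
    """Linearly rescale values to [0, 1], shown only for comparison."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical long-tailed target with one extreme outlier.
target = [12.0, 15.0, 15.0, 18.0, 22.0, 9500.0]

ranks = rank_transform(target)
scaled = min_max_scale(target)

print(ranks)   # the outlier is just one rank above its neighbour
print(scaled)  # the outlier maps to 1.0; all other values sit near 0
```

In this sketch the outlier ends up one step above the next-largest value after ranking, whereas min-max scaling leaves it at 1.0 with every other value compressed close to 0. In practice a library routine such as `scipy.stats.rankdata` would likely be used instead of a hand-written function.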