A data engineer is performing exploratory data analysis on a dataset stored in Amazon S3 using AWS Glue DataBrew. The dataset contains a column 'age' with missing values. DataBrew's profile shows that the column has 5% missing values, a mean of 45, and a standard deviation of 15. Which imputation strategy should the engineer recommend to minimize bias if the missing data is Missing at Random (MAR)?
Multiple imputation preserves the natural variability and provides valid statistical inferences under MAR.
Why this answer
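Option C (multiple imputation) is correct: under MAR it yields approximately unbiased estimates, provided the imputation model includes the variables that predict missingness, and it carries the uncertainty about the imputed values through to the standard errors. Option A is wrong because mean imputation shrinks the column's variance and weakens its relationships with other variables. Option B is wrong because median imputation, although robust to outliers, is still a single imputation and has the same drawbacks.
Option D is wrong because dropping rows reduces the sample size and can introduce bias when missingness is related to other variables, which is exactly the situation MAR describes.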
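The difference between mean imputation and multiple imputation can be illustrated with a minimal standard-library Python sketch. Everything here is an assumption for illustration: the data is synthetic, the covariate `x` and the helper names `fit_line` and `impute_once` are hypothetical, and the imputation model is a simple bootstrap-plus-regression approximation rather than a full Bayesian procedure. It is not a DataBrew recipe; it only shows why the explanation above favors multiple imputation.

```python
import random
import statistics

random.seed(0)

# Synthetic data (assumed): age depends on a fully observed covariate x,
# and the chance that age is missing depends only on x, i.e. MAR.
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
age = [45 + 10 * xi + random.gauss(0, 11.2) for xi in x]
# Roughly 5% missing overall: 2% when x < 0, 8% otherwise.
missing = [random.random() < (0.02 if xi < 0 else 0.08) for xi in x]

obs = [(xi, ai) for xi, ai, gone in zip(x, age, missing) if not gone]
obs_mean = statistics.fmean([a for _, a in obs])


def fit_line(pairs):
    """Least-squares fit of age on x: returns slope, intercept, residual sd."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((v - mx) ** 2 for v in xs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in pairs]
    resid_sd = (sum(r * r for r in resid) / (len(pairs) - 2)) ** 0.5
    return slope, intercept, resid_sd


def impute_once(rng):
    """One completed dataset: refit on a bootstrap sample of the observed
    rows (to reflect parameter uncertainty), then fill each gap with the
    prediction plus random residual noise."""
    boot = [rng.choice(obs) for _ in obs]
    slope, intercept, resid_sd = fit_line(boot)
    return [
        intercept + slope * xi + rng.gauss(0, resid_sd) if gone else ai
        for xi, ai, gone in zip(x, age, missing)
    ]


# Single mean imputation: every gap receives the same value.
mean_filled = [obs_mean if gone else ai for ai, gone in zip(age, missing)]
mean_filled_sd = statistics.stdev(mean_filled)

# Multiple imputation: several completed datasets, pooled with Rubin's rules.
m_imputations = 20
rng = random.Random(1)
completed = [impute_once(rng) for _ in range(m_imputations)]
estimates = [statistics.fmean(d) for d in completed]
within = statistics.fmean([statistics.variance(d) / n for d in completed])
between = statistics.variance(estimates)
pooled_mean = statistics.fmean(estimates)
total_var = within + (1 + 1 / m_imputations) * between
mi_sd = statistics.fmean([statistics.stdev(d) for d in completed])

print("sd after mean imputation:    ", round(mean_filled_sd, 2))
print("sd after multiple imputation:", round(mi_sd, 2))
print("pooled mean age:", round(pooled_mean, 2),
      "standard error:", round(total_var ** 0.5, 3))
```

Mean imputation adds no spread around the observed mean, so the completed column's standard deviation is always smaller than in the multiply imputed datasets. The between-imputation term in `total_var` is what carries the uncertainty about the missing values into the final standard error.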