A data scientist is exploring a dataset and wants to check for missing values. Which method is most appropriate to identify the percentage of missing values per column?
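Use pandas .isnull().sum() in a SageMaker notebook. It is a direct and efficient way to count missing values per column, and dividing those counts by the number of rows gives the percentage.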
Why this answer
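Calling pandas .isnull().sum() on a DataFrame in a SageMaker notebook is a standard way to count missing values per column. .isnull().mean() * 100 returns the percentage directly, because the mean of a boolean column is the fraction of True values. Option A is wrong because S3 Select retrieves a subset of a single object's contents using a SQL expression; it filters data rather than analyzing it. Option B is wrong because QuickSight is a visualization and BI service, not a tool for programmatic missing-value analysis.
Option D is wrong because Athena requires writing a SQL aggregate for every column (for example, COUNT(*) - COUNT(col)), which is less direct for exploratory analysis. Option E is wrong because a Glue crawler infers the schema and registers it in the Data Catalog; it does not compute missing-value statistics.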
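A minimal pandas sketch of the approach described above, assuming the data has already been loaded into a DataFrame. The DataFrame here is hypothetical, with invented column names and values standing in for the question's dataset. It computes per-column counts with .isnull().sum(), converts them to percentages with .isnull().mean() * 100, and combines both into one summary sorted from most to least missing.

```python
import numpy as np
import pandas as pd

# Hypothetical example data (not from the question): NaN marks a missing value.
df = pd.DataFrame({
    "age": [34, np.nan, 29, np.nan],
    "income": [52000, 61000, np.nan, 48000],
    "city": ["Austin", "Boston", "Chicago", "Denver"],
})

# Missing-value count per column: each True (missing) counts as 1 when summed.
missing_counts = df.isnull().sum()

# Missing-value percentage per column: the mean of a boolean column
# is the fraction of True values, so multiply by 100.
missing_pct = df.isnull().mean() * 100

# Combined summary, sorted so the most incomplete columns come first.
summary = pd.DataFrame({
    "missing_count": missing_counts,
    "missing_pct": missing_pct.round(2),
}).sort_values("missing_pct", ascending=False)

print(summary)
```

In a real notebook you would replace the hypothetical DataFrame with your own data, for example one read via pd.read_csv. The summary makes it easy to decide which columns to drop or impute.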