A data engineer is optimizing Amazon Athena queries on large datasets stored in S3 for machine learning data preparation. Which THREE practices improve query performance?
Partition pruning limits scanned data.
Why this answer
Partitioning by a frequently filtered column, such as date, allows Athena to use partition pruning. When a query filters on the partition column, Athena reads only the matching partitions and skips the S3 prefixes (directories) of every other partition. Because Athena bills by the amount of data scanned, this reduces both query time and cost.
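The effect of pruning can be illustrated with a small simulation. This is only a sketch, not how Athena is implemented: the object keys, the `dt` partition column, and the byte sizes are hypothetical, and the helper names `partition_value` and `bytes_scanned` are invented for illustration.

```python
# Hypothetical S3 listing: Hive-style keys partitioned by dt, sizes in bytes.
OBJECTS = {
    "events/dt=2024-01-01/part-0.parquet": 400,
    "events/dt=2024-01-02/part-0.parquet": 500,
    "events/dt=2024-01-03/part-0.parquet": 600,
}


def partition_value(key, column="dt"):
    """Return the value of a Hive-style partition column found in an object key."""
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == column:
            return value
    return None


def bytes_scanned(objects, wanted_dt=None):
    """Sum object sizes, skipping partitions that do not match the filter."""
    total = 0
    for key, size in objects.items():
        if wanted_dt is not None and partition_value(key) != wanted_dt:
            continue  # pruned: this partition is never read
        total += size
    return total


# No filter on the partition column: every object is read.
full_scan = bytes_scanned(OBJECTS)
# Filter on dt: only the matching partition is read.
pruned_scan = bytes_scanned(OBJECTS, wanted_dt="2024-01-02")
print(full_scan, pruned_scan)
```

In this toy listing, the filtered query touches one of three partitions, so it reads a third of the objects.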
Exam trap
AWS often tests the misconception that more partitions always improve performance. In reality, over-partitioning adds partition-metadata overhead in the metastore and produces many small files, both of which degrade query performance.
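A rough back-of-the-envelope calculation shows why over-partitioning backfires. The sketch below assumes data is spread evenly and each partition holds a single file; the dataset size, the 30-day range, and the 10,000 user IDs are hypothetical numbers chosen only for illustration.

```python
def partition_stats(total_bytes, cardinalities):
    """Return (partition count, average bytes per partition).

    Assumes an even spread of data across partitions, which real datasets
    rarely have; treat the result as an order-of-magnitude estimate.
    """
    partitions = 1
    for distinct_values in cardinalities:
        partitions *= distinct_values
    return partitions, total_bytes / partitions


TOTAL_BYTES = 30 * 2**30  # hypothetical 30 GiB dataset

# Partition by date only: 30 days.
day_parts, day_avg = partition_stats(TOTAL_BYTES, [30])
# Partition by date and user_id: 30 days x 10,000 users.
fine_parts, fine_avg = partition_stats(TOTAL_BYTES, [30, 10_000])

print(day_parts, day_avg)    # few partitions, large files
print(fine_parts, fine_avg)  # many partitions, tiny files
```

Under these assumptions, the second layout turns 30 partitions of about 1 GiB each into 300,000 partitions of roughly 100 KiB each, so the engine spends far more effort on metadata lookups and file opens than on reading data.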