A retail company uses BigQuery to store sales data. The 'sales' table has 10 billion rows and is partitioned by transaction_date (daily). The BI dashboard runs a query that aggregates sales by product_category for the last 30 days. The query is slow and expensive. Which improvement is most effective?
Clustering on product_category organizes data within each partition so that queries filtering/aggregating on that column scan fewer blocks.
Why this answer
Option A is correct because clustering the table on product_category organizes the data within each daily partition by that column, allowing BigQuery to use block-level pruning to skip irrelevant blocks when filtering or aggregating by product_category. This directly reduces the amount of data scanned for the 30-day aggregation query, improving both performance and cost.
Exam trap
Google Cloud often tests the distinction between partitioning (which limits data by time range) and clustering (which organizes data within partitions for column-based pruning), and candidates mistakenly choose partitioning changes or materialized views without understanding that clustering directly addresses the slow aggregation on a non-time column.
How to eliminate wrong answers
Option B is wrong because changing partitioning from daily to monthly would increase the partition size, forcing the query to scan more data per partition (the entire month) rather than only the last 30 days, which would actually worsen performance and cost. Option C is wrong because denormalizing product_category into the sales table is already the current schema; the issue is not about normalization but about data organization for efficient pruning. Option D is wrong because a materialized view with aggregation on product_category would still require scanning all partitions unless the view is also partitioned and clustered; moreover, materialized views in BigQuery are best for pre-aggregating high-frequency queries but do not inherently reduce scan costs if the underlying table is not properly clustered.