A company uses BigQuery for BI reporting. They have a table 'orders' with columns: order_id, customer_id, order_date, amount, status. The BI team frequently runs queries that filter on order_date and group by customer_id to compute total sales per customer. Which partitioning and clustering strategy optimizes query performance and cost?
Trap 1: Partition by order_date, cluster by status
Clustering on status does not co-locate rows by customer_id, so it does not help the GROUP BY on customer_id.
Trap 2: Do not partition, cluster by customer_id
Without partitioning, the table has no partitions to prune, so a date filter still scans the full table, increasing cost.
Trap 3: Partition by customer_id, cluster by order_date
Partitioning on customer_id is not effective because queries filter on order_date, not customer_id.
- A
Partition by order_date, cluster by status
Why wrong: Clustering on status does not co-locate rows by customer_id, so it does not help the GROUP BY on customer_id.
- B
Do not partition, cluster by customer_id
Why wrong: Without partitioning, the table has no partitions to prune, so a date filter still scans the full table, increasing cost.
- C
Partition by customer_id, cluster by order_date
Why wrong: Partitioning on customer_id is not effective because queries filter on order_date, not customer_id.
- D
Partition by order_date, cluster by customer_id
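Why correct: Partitioning on order_date lets BigQuery skip partitions outside the date filter, which reduces bytes scanned and therefore cost; clustering on customer_id co-locates each customer's rows, which improves GROUP BY performance.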
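
To make option D concrete, here is a minimal sketch in Python that builds the BigQuery DDL and a typical BI query as strings. The dataset name `sales`, the column types, the helper names `build_orders_ddl` and `build_sales_query`, and the date range are all illustrative assumptions; the question specifies only the table name and its columns.

```python
# Sketch only: dataset name, column types, and dates are assumptions
# made for illustration; they are not given in the question.

def build_orders_ddl(dataset: str = "sales") -> str:
    """Return DDL for option D: partition by order_date, cluster by customer_id."""
    return (
        f"CREATE TABLE `{dataset}.orders` (\n"
        "  order_id STRING,\n"
        "  customer_id STRING,\n"
        "  order_date DATE,\n"      # assumed DATE; a TIMESTAMP would use DATE(order_date)
        "  amount NUMERIC,\n"
        "  status STRING\n"
        ")\n"
        "PARTITION BY order_date\n"  # date filters can prune whole partitions
        "CLUSTER BY customer_id"     # rows for each customer are stored together
    )


def build_sales_query(
    dataset: str = "sales",
    start: str = "2024-01-01",
    end: str = "2024-01-31",
) -> str:
    """Return the kind of query the BI team runs: date filter plus per-customer totals."""
    return (
        "SELECT customer_id, SUM(amount) AS total_sales\n"
        f"FROM `{dataset}.orders`\n"
        # Filtering on the partitioning column with constant bounds
        # allows BigQuery to skip partitions outside the range.
        f"WHERE order_date BETWEEN DATE '{start}' AND DATE '{end}'\n"
        "GROUP BY customer_id"
    )


if __name__ == "__main__":
    print(build_orders_ddl())
    print()
    print(build_sales_query())
```

The filter is written directly against order_date with constant bounds, since pruning depends on the partitioning column being filtered in a form BigQuery can evaluate up front. In real code the dates would normally be passed as query parameters rather than interpolated into the string.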