A company uses BigQuery to run reporting queries on a table that is partitioned by date and clustered by customer_id. Queries filtering by customer_id and a date range are performing poorly. What is the most likely cause?
Wide date ranges nullify the benefit of clustering; BigQuery scans many partitions.
Why this answer
Option D is correct. Partitioning by date and clustering by customer_id help a query in two stages. First, the date filter prunes partitions. Then the customer_id filter lets BigQuery skip storage blocks inside each remaining partition. If the date range is wide, few partitions are pruned, so BigQuery must read the matching blocks from every partition in the range. Clustering still trims the work inside each partition, but that saving applies per partition. Across many partitions, the total bytes scanned (and with it the query time) stays large, which erodes much of the benefit the table layout was meant to provide.
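The two-stage effect can be illustrated with a toy cost model. The Python sketch below is a simplified model under hypothetical assumptions, not BigQuery's real planner or storage layout. It assumes uniform daily partitions of 1 GB, 100 storage blocks per partition, and a customer_id filter that leaves only 2 blocks per partition to read. The names `TableModel`, `partitions_in_range`, and `estimated_bytes_scanned` are illustrative only.

```python
# Toy model: partition pruning on a date range, then clustered block pruning on customer_id.
# Hypothetical assumptions: uniform daily partitions, a fixed number of blocks per
# partition, and a fixed number of blocks that survive the customer_id filter.
# This is NOT how BigQuery computes bytes processed internally.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TableModel:
    first_day: date               # earliest daily partition in the table
    num_days: int                 # number of daily partitions
    bytes_per_partition: int      # assumed uniform partition size
    blocks_per_partition: int     # assumed storage blocks per partition
    blocks_hit_per_customer: int  # blocks per partition that may hold one customer_id


def partitions_in_range(t: TableModel, start: date, end: date) -> int:
    """Partition pruning: count daily partitions whose date falls in [start, end]."""
    last_day = t.first_day + timedelta(days=t.num_days - 1)
    lo = max(start, t.first_day)
    hi = min(end, last_day)
    return max(0, (hi - lo).days + 1)


def estimated_bytes_scanned(t: TableModel, start: date, end: date) -> int:
    """Bytes read after partition pruning, then block pruning on customer_id."""
    parts = partitions_in_range(t, start, end)
    bytes_per_block = t.bytes_per_partition // t.blocks_per_partition
    # Clustering only reduces work inside each partition that survived pruning,
    # so the per-partition cost is multiplied by the number of partitions read.
    return parts * t.blocks_hit_per_customer * bytes_per_block


# Two years of daily partitions (2023-01-01 through 2024-12-31 is 731 days).
table = TableModel(date(2023, 1, 1), 731, 1_000_000_000, 100, 2)
print(estimated_bytes_scanned(table, date(2024, 6, 1), date(2024, 6, 7)))   # 140000000
print(estimated_bytes_scanned(table, date(2023, 1, 1), date(2024, 12, 31)))  # 14620000000
```

Under these assumed numbers, clustering prunes the same 98% of blocks in every partition, yet a two-year range reads about 104 times more data than a one-week range, because the number of partitions read grows with the date range.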
Exam trap
The trap is assuming that clustering alone makes any combination of filters fast. Partition pruning happens first, so a wide date range leaves many partitions to read, and clustering can only reduce the work inside each of them.
How to eliminate wrong answers
Option A is wrong because insufficient slot capacity slows or queues queries in general. It is not a problem specific to how a partitioned, clustered table is filtered, and the symptom here points to scanning inefficiency, not resource contention.
Option B is wrong because BigQuery is built for very large (petabyte-scale) tables, so table size alone is not the limitation. What matters is how much of the table a query has to scan, and that depends on query design.
Option C is wrong because the existing layout already matches the query pattern: the date filter is served by partitioning and the customer_id filter by clustering. Clustering by date first would not help customer_id filters. If the partitions are daily, every row in a partition shares the same date, so a leading date clustering column adds no useful ordering. More generally, BigQuery prunes clustered blocks most effectively when the filter includes the first clustering column, so demoting customer_id to a later position would weaken pruning rather than improve it.
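The point about clustering column order can be illustrated with a min/max ("zone map") block-pruning model. This Python sketch is a simplification under hypothetical assumptions, not BigQuery's internal storage format. It assumes rows are sorted by the clustering key, cut into equal-sized blocks, and a block is skipped only when the filter value falls outside that block's min/max range. To let the date key actually vary, it uses a hypothetical partition spanning 30 days (for example, monthly partitioning). The helpers `block_ranges` and `blocks_to_read` are illustrative names.

```python
# Toy min/max block-pruning model of clustering column order.
# Hypothetical assumptions: rows sorted by the clustering key tuple, split into
# equal-sized blocks, and a block pruned only if the filter value lies outside
# its [min, max] for the filtered column. Not BigQuery's real storage layout.
from itertools import product


def block_ranges(rows, sort_key, column, block_size):
    """Sort rows by sort_key, chunk into blocks, return (min, max) of `column` per block."""
    ordered = sorted(rows, key=sort_key)
    ranges = []
    for i in range(0, len(ordered), block_size):
        values = [row[column] for row in ordered[i:i + block_size]]
        ranges.append((min(values), max(values)))
    return ranges


def blocks_to_read(ranges, value):
    """Count blocks whose [min, max] could contain `value`, i.e. blocks that cannot be pruned."""
    return sum(1 for lo, hi in ranges if lo <= value <= hi)


# One hypothetical 30-day partition: 30 days x 100 customers, one row per pair.
rows = [{"day": d, "customer_id": c} for d, c in product(range(30), range(100))]

# Clustering key (customer_id, day) versus (day, customer_id), 100 rows per block.
customer_first = block_ranges(rows, lambda r: (r["customer_id"], r["day"]), "customer_id", 100)
day_first = block_ranges(rows, lambda r: (r["day"], r["customer_id"]), "customer_id", 100)

print(blocks_to_read(customer_first, 50))  # 1
print(blocks_to_read(day_first, 50))       # 30
```

With customer_id leading, a filter on one customer touches 1 of 30 blocks. With the date leading, every block spans the full customer_id range, so none can be skipped.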