A data team uses BigQuery for ad-hoc BI queries. They have a table with 100 columns. Analysts often select many columns. The table is partitioned by event_date. Queries are slow and expensive. What two-step optimization should they implement? (Note: This is a single correct answer among four options that combine two steps.)
Clustering narrows scans within partitions; selecting only needed columns reduces bytes processed.
Why this answer
Clustering on the columns analysts filter by most often sorts the data within each partition, so queries that filter on those columns can skip blocks that cannot match and process fewer bytes. Because BigQuery storage is columnar, selecting only the needed columns reduces the bytes scanned further, since unreferenced columns are never read. Together, these two steps directly address the high cost and slow performance caused by scanning many columns across a large partitioned table.
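The two steps can be sketched as a pair of SQL statements, built here in Python. This is only an illustration under stated assumptions: the table names `analytics.events` and `analytics.events_clustered` and the columns `user_id`, `event_type`, and `revenue` are hypothetical, and `event_date` is assumed to be a DATE column.

```python
from typing import Sequence

MAX_CLUSTER_COLUMNS = 4  # BigQuery allows at most four clustering columns


def clustered_table_ddl(source: str, target: str, cluster_columns: Sequence[str]) -> str:
    """Build DDL that copies `source` into a new table that is still
    partitioned by event_date and is additionally clustered."""
    if not 1 <= len(cluster_columns) <= MAX_CLUSTER_COLUMNS:
        raise ValueError("expected between 1 and 4 clustering columns")
    return (
        f"CREATE TABLE `{target}`\n"
        "PARTITION BY event_date\n"
        f"CLUSTER BY {', '.join(cluster_columns)}\n"
        f"AS SELECT * FROM `{source}`"
    )


def narrow_query(table: str, columns: Sequence[str]) -> str:
    """Build a query that names its columns and filters on the partition
    column and on one clustering column (values are query parameters)."""
    return (
        f"SELECT {', '.join(columns)}\n"
        f"FROM `{table}`\n"
        "WHERE event_date = @day\n"
        "  AND event_type = @event_type"
    )


if __name__ == "__main__":
    # Hypothetical table and column names, chosen only for illustration.
    print(clustered_table_ddl(
        "analytics.events", "analytics.events_clustered", ["user_id", "event_type"]
    ))
    print(narrow_query(
        "analytics.events_clustered", ["user_id", "event_type", "revenue"]
    ))
```

The clustering columns should be the ones analysts actually filter on; the pair used here is a guess, not something the question specifies.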
Exam trap
Google Cloud often tests the misconception that partitioning alone is sufficient for every query optimization. The trap here is that partitioning prunes data only by date range, not by column count, so candidates overlook the need to also limit the selected columns and cluster on non-partition columns.
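A back-of-the-envelope model makes the trap concrete. The sketch below assumes, purely for illustration, that bytes are spread evenly across columns and days; the 10 TB table size, the 365-day history, and the `cluster_fraction` parameter (the share of blocks left after cluster pruning) are hypothetical, and real savings depend on the data.

```python
def estimated_bytes_scanned(
    total_bytes: float,
    total_columns: int,
    total_days: int,
    columns_read: int,
    days_read: int,
    cluster_fraction: float = 1.0,
) -> float:
    """Rough estimate of bytes scanned, assuming data is spread evenly
    across columns and daily partitions (a simplifying assumption)."""
    column_share = columns_read / total_columns    # columnar storage
    partition_share = days_read / total_days       # partition pruning
    return total_bytes * column_share * partition_share * cluster_fraction


if __name__ == "__main__":
    table = dict(total_bytes=10e12, total_columns=100, total_days=365)

    # Partition filter only: one week of data, but every column (SELECT *).
    partition_only = estimated_bytes_scanned(**table, columns_read=100, days_read=7)

    # Same week, but only 5 named columns.
    narrowed = estimated_bytes_scanned(**table, columns_read=5, days_read=7)

    print(f"partition filter only: {partition_only / 1e9:.1f} GB")
    print(f"plus 5 of 100 columns: {narrowed / 1e9:.1f} GB")
```

Under these assumptions the date filter leaves the per-column cost untouched, while naming 5 of 100 columns scans one twentieth of the bytes for the same week.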
How to eliminate wrong answers
Option B is wrong because converting to Avro format does not inherently optimize query performance or cost in BigQuery. Avro is a file format used for loading and exporting data, not a query optimization technique, and partitioning alone does not reduce the column scan overhead.
Option C is wrong because column-level security controls access but does not reduce the amount of data scanned or improve query performance. It adds administrative overhead without addressing the cost or speed issue.
Option D is wrong because clustering by event_date is redundant when the table is already partitioned by event_date, and using SELECT * is the opposite of optimization: it forces a scan of all columns, increasing cost and latency.