Free PCDE Define data structures and implement SQL for Business Intelligence Practice Questions (2026)

Q: What does the Define data structures and implement SQL for Business Intelligence domain cover on the PCDE exam?

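The Define data structures and implement SQL for Business Intelligence domain is part of the PCDE exam blueprint published by Google Cloud. The practice questions on this page focus on BigQuery table design (partitioning and clustering), star schema modeling, materialized views and BI Engine, window functions and other SQL constructs, nested and repeated fields, and query cost and performance tuning, with some questions on Cloud SQL, Cloud Spanner, and Looker.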

Q: How many Define data structures and implement SQL for Business Intelligence questions are on the PCDE exam?

The Define data structures and implement SQL for Business Intelligence domain is one of the weighted domains on the PCDE exam. The Courseiva question bank has 155 practice questions for this domain.

Q: How can I practice Define data structures and implement SQL for Business Intelligence questions for PCDE?

Click any of the 155 questions listed on this page to see the full question and explanation, or use the session launcher to start a focused practice session of 10, 20, 30 or 50 questions drawn only from the Define data structures and implement SQL for Business Intelligence domain.

All PCDE Define data structures and implement SQL for Business Intelligence questions (155)

Click any question to see the full explanation and answer options, or start a focused practice session above.

A company uses BigQuery for BI reporting. They have a table 'orders' with columns: order_id, customer_id, order_date, amount, status. The BI team frequently runs queries that filter on order_date and group by customer_id to compute total sales per customer. Which partitioning and clustering strategy optimizes query performance and cost?

A retail company uses BigQuery to store sales data. The 'sales' table has 10 billion rows and is partitioned by transaction_date (daily). The BI dashboard runs a query that aggregates sales by product_category for the last 30 days. The query is slow and expensive. Which improvement is most effective?

A company is designing a data warehouse for BI. They need to support both detailed transaction analysis and high-level aggregated reports. Which schema design best balances storage and query performance?

A BI team runs a daily query on a BigQuery table 'events' partitioned by event_date. The query filters on event_date = CURRENT_DATE() and counts rows by event_type. The query is slow. Upon review, the table has 500 partitions but clustering is not set. Which action reduces query cost and latency?

A company stores sensor data in BigQuery. They have a table 'sensor_readings' with columns: sensor_id, reading_time, value. The table is partitioned by reading_time (hourly) and clustered by sensor_id. A BI query aggregates average value per sensor for the last week. The query still scans many bytes. What is the most likely cause?

Which TWO actions improve query performance and reduce cost in BigQuery for BI workloads?

Which THREE are valid considerations when designing BigQuery tables for BI reporting?

The exhibit shows query metadata for a query that scans 10 GB. Given the table is 100 GB and partitioned by hire_date, why did the query scan 10 GB and not less?

The exhibit shows IAM policy for a BigQuery dataset. The BI team reports they can query tables but cannot create views. What is the missing role?

A retail company uses BigQuery to store sales transactions. The BI team needs to create a monthly customer lifetime value (CLV) report that aggregates purchase history across multiple tables. Which BigQuery feature should they use to define the data structure for this report?

A data engineer is designing a BigQuery schema for a time-series dataset of IoT sensor readings. The queries will filter primarily on a timestamp column and also on sensor_id. To optimize query performance and cost, which table design is best?

A financial services company uses BigQuery for risk analysis. They have a table `market_data` with columns `symbol`, `date`, `price`, and `volume`. The query pattern involves window functions over the last 30 days for many symbols. The table is partitioned by date and clustered by symbol. However, analysts report that queries are slow and expensive. What is the most likely cause?

A marketing team needs to analyze customer behavior using BigQuery. They want to create a table that stores the first and last purchase date for each customer from the `orders` table. Which SQL approach should they use?

A logistics company uses BigQuery to track shipments. The `shipments` table has columns `id`, `status`, `created_date`, and `delivery_date`. They need a query that returns the number of shipments that were delivered within 5 days of creation for each month of 2024. Which SQL construct is most appropriate?

A multinational corporation uses BigQuery to combine sales data from multiple regions. Each region stores data in separate tables with identical schemas. The BI team needs to create a unified view for a dashboard that queries data by region and product. Which TWO strategies should the data engineer implement to optimize query performance and reduce costs?

A company uses BigQuery to run business intelligence reports. The data engineer needs to implement a star schema for a sales data warehouse. Which THREE are best practices when designing the tables?

A retail company stores sales transactions in BigQuery. They want to create a materialized view that aggregates daily sales by product category, but they need the view to refresh automatically within 5 minutes of new data being inserted. The source table is partitioned by transaction_date and has a streaming buffer. What should they do to ensure the materialized view refreshes quickly enough?

A financial services company uses BigQuery to run complex analytical queries on trading data. They notice that a particular query joining a large fact table (10 TB) with a small dimension table (100 MB) is slow. The fact table is partitioned by date and clustered by symbol. The dimension table is not partitioned. The query filters on a specific date range and a few symbols. Which optimization is MOST likely to improve query performance?

A company is designing a BigQuery data model for a business intelligence dashboard that shows sales by region and product. The data is refreshed daily. Which schema design is MOST cost-effective and performant for this use case?

A data engineer runs a BigQuery query that joins a large fact table with a small lookup table. The query processes 1 TB of data and takes 30 seconds. The engineer wants to reduce the amount of data processed. Which optimization technique is MOST effective?

A company uses Cloud SQL for PostgreSQL to store transactional data and BigQuery for analytics. They need to sync a subset of tables from Cloud SQL to BigQuery daily for BI reporting. The tables are updated incrementally (INSERT, UPDATE, DELETE). Which approach is MOST reliable and cost-effective?

Which TWO of the following are valid ways to improve the performance of a BigQuery query that joins two large tables?

Which THREE of the following are best practices for designing BigQuery tables for business intelligence reporting?

A company uses BigQuery for BI reporting with a star schema. The fact table 'sales' is partitioned by date and clustered by 'product_id'. The dimensions 'product' and 'customer' are updated nightly via merge statements. Recently, a report that joins 'sales' with 'product' on 'product_id' and filters on sale_date for the last 7 days started timing out. The query plan shows a 'SCAN' of the entire 'product' table. Which optimization should be applied to improve performance?

A data engineer is designing a BI solution in BigQuery for a retail chain. They need to support queries that aggregate sales by store, product, and date across millions of transactions. The data is loaded in near real-time from Cloud Pub/Sub. Which table design provides the best balance of query performance and cost?

A company uses BigQuery to generate daily sales reports. The query aggregates sales by product category and region. The table 'sales_raw' is 500 GB and is updated every hour with new transactions. The report runs slowly. What is the most cost-effective method to improve query performance without changing the existing table schema?

A financial institution uses BigQuery for BI reporting. They have a table 'transactions' (10 TB) partitioned by transaction_date and clustered by customer_id. A common report filters on customer_id and last 30 days. The report is slow. Which change would most improve query performance for this specific report?

A retail company uses BigQuery to analyze sales data. They need to create a weekly report showing total sales per product category for the last 4 weeks, but the query is taking too long and exceeding slot resources. The sales table has over 2 billion rows and is partitioned by date. Which design change would most improve query performance and reduce slot consumption?

A financial services company needs to design a BigQuery data model for real-time fraud detection. Data arrives from multiple streaming sources and must be joined with historical customer profiles (10 TB) and transaction lookup tables (500 GB). Which TWO design considerations are most important to minimize query latency and cost?

Refer to the exhibit. Given the table definition and two queries, which statement about query performance is correct?

A large e-commerce platform uses BigQuery for business intelligence. They have a fact table `orders` (10 TB, partitioned by order_date, clustered by customer_id) and a dimension table `customers` (2 TB, not partitioned, not clustered). The BI team runs a daily dashboard query that joins these tables on customer_id and filters on order_date = CURRENT_DATE() and customer_country = 'US'. The query currently scans the full `customers` table and 2 GB of the `orders` table, taking 30 seconds. The business wants to reduce cost and latency. The `customers` table has 500 million rows and is updated incrementally every hour. Which action will most effectively reduce the amount of data scanned and query time?

Order the steps to migrate an on-premises MySQL database to Cloud SQL using Database Migration Service (DMS).

Order the steps to export data from Cloud Bigtable to Cloud Storage using Dataflow.

Order the steps to perform a disaster recovery drill for a Cloud Spanner database using backups.

Match each Cloud SQL high-availability feature to its description.

Match each Cloud SQL tier to its description.

Match each BigQuery DDL statement to its function.

A data analyst needs to create a reporting table that aggregates sales data by month. They want to ensure the table is optimized for querying by month and product category. Which table design best supports this?

A company is using BigQuery for BI and needs to reduce costs for a large historical dataset that is infrequently queried. Which approach should they take?

An analyst writes a SQL query that joins a fact table with multiple dimension tables. The query runs slowly due to shuffling. Which optimization technique should be applied?

A BI team wants to create a report that shows daily active users for the last 7 days. Which SQL construct is most appropriate for fast performance on a large dataset?

A data engineer notices that a scheduled query exporting BigQuery data to Cloud Storage is failing with a timeout error. The dataset contains 500 million rows. What should they do?

A company uses BigQuery BI Engine for sub-second query performance. However, some queries are hitting the BI Engine memory limit. Which action should be taken?

A SQL query with multiple JOINs is returning duplicate rows. What is the most likely cause?

A data analyst needs to create a rolling 30-day average of daily revenue. Which window function clause is required?

A BI dashboard query is taking too long because it reads all columns from a large table. The dashboard only needs a few columns. What is the best practice?

Which TWO strategies reduce query costs for ad-hoc analysis in BigQuery? (Choose two.)

Which THREE components are required to compute a 7-day moving average of daily sales using a window function? (Choose three.)

Which TWO optimizations best address slow join performance caused by excessive broadcasting in BigQuery? (Choose two.)

The query returns results but takes a long time. The orders table has 500M rows with order_date as a timestamp and revenue as float. How can the query be optimized?

A data analyst runs a query that joins two large tables on a high-cardinality column with many NULL values. Which action is most likely to resolve the error?

A BI team queries this table with a WHERE clause that filters on product_id but does not include a sale_date filter. What is the outcome?

A company is designing a star schema for a BI dashboard that tracks sales performance. The dashboard needs to aggregate sales by product, store, and date. Which schema design is most appropriate?

A data analyst is running a BigQuery query that joins multiple tables to generate a BI report. The query is slow and uses many LEFT JOINs. What is the best approach to improve performance without changing the business logic?

A BI team is designing a BigQuery table for a sales dashboard that queries daily sales by product category and region. The dashboard often filters on a specific date range and a specific region. Which combination of partitioning and clustering should be used?

A BI developer needs to display sales data in a dashboard that shows sales in local time zones. The source data stores all timestamps in UTC. Which is the best practice for handling time zone conversions?

A BI report requires a running total of sales over the last 30 days for each product. The data is in a BigQuery table with columns: sale_date, product_id, amount. Which SQL window function is most efficient?

A BI team uses a complex SQL query with multiple Common Table Expressions (CTEs) that are referenced several times within the main query. The query performs poorly. What is the best optimization strategy?

A financial BI application stores monetary values such as revenue and tax amounts. Which BigQuery data type should be used to ensure accuracy in calculations?

A company tracks customer demographics that change over time (e.g., address). They need to maintain historical accuracy in BI reports. Which approach correctly implements a Type 2 slowly changing dimension?

A BI manager needs to restrict access to sensitive sales data so that salespeople can only see their own region's data. Which BigQuery feature should be used to implement row-level security without duplicating tables?

A user runs the query above on a large table and receives an out-of-memory error. What is the most likely cause?

The query above fails with 'Resources exceeded: UDF out of memory' on a large table. What is the best way to fix this?

What should be adjusted to improve performance and resolve the connection error?

Which TWO best practices should be followed when modeling data for a Looker BI dashboard to optimize query performance?

Which TWO statements are true about designing a star schema for BI reporting?

Which THREE methods are effective for improving query performance in BigQuery for BI workloads?

A company uses BigQuery for BI. They need to create a table that stores daily sales data with millions of rows. The query pattern is to aggregate sales by month for specific product categories. Which table design is most cost-effective and performant?

A data analyst runs a query joining several large tables and gets 'Resources exceeded' error. They need to reduce memory usage without changing the query logic. What should they do?

A company has a BigQuery dataset with many views. They need to ensure that only the latest 30 days of data is used in BI reports for performance. The source table is partitioned by ingestion_time. Which approach reduces query cost and improves performance?

A BI analyst wants to create a report that displays total revenue by product category and month, with ability to drill down to individual products. Which schema design supports this in BigQuery?

Which SQL function in BigQuery is best for replacing NULL values in a numeric column with a default value?

A company has a BigQuery table with a TIMESTAMP column and wants to query data for a specific date range efficiently. Which WHERE clause ensures partition pruning if the table is partitioned by that TIMESTAMP column?

A data engineer needs to design a table to store time-series sensor data arriving every second. The data will be queried mainly for the last hour for a specific device. Which table design minimizes query costs?

A company is using BigQuery and needs to implement row-level security so that sales representatives only see their own region's data. Which approach should they use?

A BI dashboard query is slow and costly. The query performs multiple joins on large tables and uses window functions. The data engineer suggests using materialized views. However, the query uses non-deterministic functions. What is the limitation?

Refer to the exhibit. What is the effect of the partition_expiration_days option?

Refer to the exhibit. What is the likely cause of this error?

Refer to the exhibit. The query used DATE_TRUNC(order_date, MONTH) as month. order_date is a TIMESTAMP column. What is the data type of the month column in the result?

A company is designing a BigQuery data warehouse for sales analytics. They want to minimize query costs when aggregating daily sales by region and product. Which two methods are effective? (Select TWO).

A data team uses BigQuery and wants to ensure data freshness for BI reports with low latency. Which three techniques can help achieve near-real-time updates? (Select THREE).

A BigQuery dataset contains a table with a STRUCT column for customer address. The BI team needs to query the city field from the struct. Which two approaches are valid? (Select TWO).

A company runs near-real-time dashboards on BigQuery that query a table partitioned by day and clustered by user_id. The most common query filters on user_id and then aggregates sales over the last 7 days. However, many queries still scan full partitions. What is the most likely cause?

A data engineer creates a clustered table in BigQuery with clustering order: country, city, product_id. The BI team frequently runs a query that filters on city and product_id but rarely on country. What is the most likely performance issue?

A BI developer needs to write a query that calculates total sales by month for the current year. They create a Common Table Expression (CTE) to define monthly aggregates, then reference it in a final SELECT. What is the main benefit of using a CTE over a subquery in this scenario?

A company uses BigQuery materialized views to pre-aggregate sales data for a BI dashboard. The dashboard requires near-real-time data, but the materialized view currently reflects data up to 30 minutes old. What is the most effective way to reduce the refresh interval without significantly increasing costs?

A BI team uses BigQuery BI Engine to accelerate dashboards. They have a 100 GB table and enable BI Engine with a reservation of 10 GB. Some queries on this table are still slow. What is the most likely reason?

A BI developer is designing a BigQuery dataset for a sales dashboard. Which naming convention is considered a best practice for column names in BI reports?

A BI query uses COUNT(column) to count non-null values and COUNT(*) to count all rows. The analyst expects both counts to be equal, but COUNT(column) returns a smaller number. What is the most likely explanation?

A BigQuery table is partitioned by ingestion time (pseudo column _PARTITIONTIME) and uses the default partition expiration of 90 days. A data engineer runs a DELETE statement to remove rows older than 100 days. Why does this query process more bytes than expected?

A startup is building a BI stack on Google Cloud. They have moderate data volumes and need to run ad-hoc analytical queries and real-time dashboards. Which Google Cloud database service is most appropriate for this workload?

Which TWO are best practices for designing a star schema in BigQuery for BI? (Choose two.)

Which THREE techniques can improve query performance in BigQuery for BI workloads? (Choose three.)

Which TWO are effective strategies to control costs when running BI queries on BigQuery? (Choose two.)

Refer to the exhibit. The query joins two large tables and aggregates results. Which optimization would most likely reduce the high shuffle bytes in Stage 3?

Refer to the exhibit. The query scans 500 GB even though it filters on the partitioning column event_date and only needs data from 30 days. What is the most likely reason?

Refer to the exhibit. The BI team creates a view to summarize sales. When they query the view with an additional WHERE clause on region, they notice that the underlying query still processes the same amount of data regardless of the filter. What is the most likely reason?

A company uses BigQuery for BI dashboards. Users report that queries on the sales table take longer than expected. The table contains daily transaction data and is not partitioned. Which action will most improve query performance while minimizing cost?

A data engineering team ingests JSON logs into BigQuery using a streaming pipeline. Queries need to extract specific fields from nested arrays. Which SQL construct should be used to efficiently transform the nested data into a flat table for BI?

A financial company uses Cloud SQL for PostgreSQL to store transaction data. They need to create a materialized view that aggregates daily sales for a BI dashboard. The underlying transaction table is updated continuously. Which approach ensures the materialized view remains up to date without manual intervention?

A retail company uses Cloud Spanner for their OLTP system and wants to run BI queries on the same data without impacting transactional performance. Which solution should they implement?

A gaming company ingests player clickstream data in real time via Cloud Pub/Sub. They need to aggregate events per player session in BigQuery with exactly-once semantics. Which architecture minimizes latency and cost?

A BI analyst needs to calculate a running total of sales by region over time in BigQuery. Which SQL window function should be used?

A company has a BigQuery table partitioned by ingestion time. They want to create a BI report showing month-over-month revenue growth. To minimize query cost, what should they do?

A financial institution uses Cloud SQL for MySQL to handle transaction processing. They need to generate daily BI reports that aggregate millions of transactions per account. The BI queries are CPU-intensive and degrade OLTP performance. What is the most effective solution?

A company stores user events in BigQuery as nested repeated fields. They want to use Looker to build dashboards on individual events. Which SQL pattern should they use in a derived table to flatten the data?

A company uses Cloud SQL for PostgreSQL for its BI database. Queries involving joins on large tables are slow. Which TWO strategies should they implement to improve join performance? (Choose TWO.)

A company wants to reduce BigQuery query costs for their BI workloads. Which THREE actions effectively lower the amount of data processed per query? (Choose THREE.)

Which TWO BigQuery features are specifically designed to accelerate BI dashboard query performance? (Choose TWO.)

The user runs a BigQuery query on a non-partitioned table and receives the error shown. Which optimization should be applied first to resolve the issue?

A Looker developer configured a new connection to BigQuery as shown. The connection test fails with the error above. What is the most likely cause?

A Dataflow streaming pipeline that writes to a BigQuery table fails with the error above. Which change should be made to the table schema to prevent this error?

A company is designing a data warehouse for business intelligence reporting. They want to organize data into fact and dimension tables to support fast aggregations. Which schema design is most appropriate for this purpose?

A data analyst reports that a BI dashboard query on BigQuery is taking over 30 seconds to execute. The table is partitioned by date and clustered by customer_id. The query filters on a specific date range and aggregates sales by customer. What is the most likely cause of the slow performance?

A company uses BigQuery for BI reporting. They have a materialized view that refreshes automatically to provide pre-aggregated sales data. Recently, the materialized view stopped reflecting new data inserted into the base table. The base table has a streaming buffer and uses ingestion-time partitioning. What is the most likely reason?

A database engineer is designing a data model for a BI dashboard that tracks daily sales by product category. The data source is a transactional database with a normalized schema. Which BigQuery feature should they use to update the fact table incrementally each day?

A BI team finds that their BigQuery query that aggregates sales by region runs slower than expected, even with appropriate clustering and partitioning. The query filters on a date range and then groups by region. The table is partitioned by date and clustered by region. What can the team do to improve query performance without increasing cost?

A company has a BigQuery table that stores JSON data in a single column. They want to allow BI analysts to query nested fields using standard SQL. What is the best approach to make the data more query-friendly for BI tools?

A company needs to store raw event logs for future BI analysis. The logs are semistructured with varying fields. Which BigQuery data type should they use to store the event payload?

A BI analyst wrote a query that computes the running total of sales over time for each product. The query uses a window function with an ORDER BY clause. The results are correct, but the query processes a large amount of data and is slow. What is the most efficient way to optimize this query?

A company is migrating their on-premises data warehouse to BigQuery for BI. They have a fact table with billions of rows and many dimension tables. The current queries perform well in the on-prem system but are slow in BigQuery. The queries contain multiple JOINs and subqueries. Which optimization should they implement first?

A company uses BigQuery for BI analytics. They want to improve query performance for a table with 10 TB of data. Which two actions should they take? (Choose two.)

A financial services company uses BigQuery for BI reporting. They need to design a data model that ensures data consistency and avoids duplicate records in the fact table. Which three practices should they follow? (Choose three.)

A company wants to create a BI dashboard that shows daily active users. The data is stored in a BigQuery table with columns: user_id, activity_date, and event_type. Which two optimizations would help reduce query costs? (Choose two.)

Refer to the exhibit. A BI analyst runs a query to get total sales for the last 7 days. The query filters on sale_date BETWEEN '2023-01-01' AND '2023-01-07'. What is the primary benefit of the partitioning defined in the table?

Refer to the exhibit. A BI query is performing slowly. The query plan shows a large shuffle in the aggregate stage. The table is not partitioned or clustered. Which optimization would most directly reduce the shuffle size?

Refer to the exhibit. A data engineer created a materialized view on a table that receives streaming inserts. When they query the materialized view, they get this error. What is the most likely cause?

A company is building a business intelligence dashboard on BigQuery to analyze daily sales data. The table contains a TIMESTAMP column 'order_ts' and a string column 'region'. The BI team frequently filters by month and region. Which table design best optimizes query performance and cost?

A data engineer is writing a SQL query in BigQuery to calculate the running total of sales per product over time. The table 'sales' has columns product_id, sale_date, and amount. The result must include the cumulative sum ordered by sale_date for each product. Which SQL construct should be used?

A BI team uses BigQuery to report on customer orders. The 'customers' dimension table is updated nightly with Type 2 Slowly Changing Dimensions (SCD). However, some reports show incorrect historical aggregates because the fact table references only the current customer key. Which approach resolves this issue?

A startup is building a BI system on Cloud SQL (PostgreSQL) for small-to-medium datasets. The data warehouse includes a fact table 'sales_fact' with millions of rows and dimension tables. The BI team reports that 'sales_fact' queries are slow despite proper indexing. What design change would most likely improve performance?

A company uses BigQuery for BI reporting. They have a large table 'events' with nested and repeated fields (ARRAY<STRUCT>). Analysts often query unnested data, which is slow. What is the best practice to improve query performance without changing the source schema?

A BI team needs to analyze user behavior with sessionization. Each event has a timestamp and session ID. The table 'sessions' contains columns: session_id, user_id, event_time, event_name. The team wants the first event time per session. Which query is most efficient?

In BigQuery, a BI analyst wants to store financial data with high precision and avoid rounding errors. Which data type should be used for currency columns?

A company uses BigQuery with a table 'orders' that has a column 'items' of type ARRAY<STRUCT<product_id STRING, quantity INT64>>. An analyst needs to find orders that contain a specific product, 'ABC'. Which query is most efficient?

A BI team in a large enterprise uses Looker connected to BigQuery. The data model has a primary table 'sales_fact' with billions of rows and multiple dimensions. The team notices that Looker queries often time out. Which approach would most likely resolve this without changing the data model?

Which TWO of the following are best practices when designing data structures for business intelligence in BigQuery?

Which THREE of the following SQL techniques are commonly used to improve BI query performance in BigQuery?

Which TWO of the following are valid approaches when troubleshooting a slow BI query in BigQuery that includes a complex JOIN between a large fact table and multiple dimension tables?

You are a database engineer at a retail company. The company uses BigQuery for BI, with a fact table 'sales_fact' partitioned by order_date and containing 100 million rows. There is a dimension table 'products' with 10,000 rows. The BI team reports that the following query takes over 5 minutes to run: SELECT p.category, SUM(s.amount) FROM sales_fact s JOIN products p ON s.product_id = p.product_id WHERE s.order_date >= '2024-01-01' AND s.order_date < '2024-04-01' GROUP BY p.category. The table 'products' is not partitioned or clustered. 'sales_fact' is partitioned by order_date but not clustered. The query only scans 3 months of data (about 25 million rows). However, the join seems slow. What is the most likely cause and what single action would you take to improve performance?

You are a cloud database engineer for a financial services firm. The firm uses Cloud SQL for PostgreSQL to support a BI reporting tool. The main table 'transactions' has 500 million rows and is growing daily. Reports often run aggregations over date ranges and group by account_id. The 'transactions' table has indexes on date and account_id separately. Despite these indexes, the reporting queries are slow, often taking over 30 minutes. The database is deployed on a high-memory machine with 32 vCPUs and 256 GB RAM. You notice that the queries perform sequential scans instead of using indexes. What is the most likely reason, and what single change would you make to improve performance?

142

You are a database engineer for an e-commerce company. The company uses BigQuery for its BI and analytics. The data pipeline stages raw event data into a table 'raw_events' with columns: event_id, user_id, event_time, event_type, and a JSON string 'event_data'. The BI team wants to query this data for user behavior analysis, but the JSON parsing makes queries slow. They need to perform frequent queries that extract specific fields from the JSON and filter by event_time. The table 'raw_events' is not partitioned and has 2 billion rows. What is the most effective single step to improve query performance and reduce cost?

143

A company is designing a BigQuery data warehouse for BI dashboards. They have a fact table with billions of rows and need to optimize query performance for common filters on date and customer_id. Which table design strategy is most effective?

144

A data engineer is creating a reporting layer in BigQuery for BI tools. Which TWO practices improve query performance?

145

A BI team is troubleshooting a slow BigQuery query. Which TWO actions can help identify the bottleneck?

146

A company is designing a data model for a BI dashboard that requires real-time updates and historical analysis. Which THREE practices should be followed?

147

A company runs a retail BI dashboard on BigQuery. The fact_sales table is partitioned by DAY and clustered by product_id. The table is 10 TB. Recently, analysts complain that queries filtering on a specific product_id and a month of data take over 10 minutes. The query uses a subquery to find top products. What should the engineer do?

148

A healthcare company needs to run BI queries on patient data. The table is in BigQuery and contains 5 billion rows. Queries often filter on patient_id and date, but the table is neither partitioned nor clustered, so analysts' queries scan the entire table. The data is updated daily. What is the most cost-effective way to improve performance?

149

An e-commerce company uses BigQuery for BI. They have a large orders table with columns: order_id, customer_id, order_date, amount, status. Queries frequently aggregate total amount by customer and month. The current table is not partitioned. Users complain about high costs. The table is 2 TB and grows by 50 GB daily. Which action reduces query costs most?

150

A financial company runs BI queries on a BigQuery table that is partitioned by ingestion time. The table is 1 TB and receives streaming inserts every minute. Analysts query the last 24 hours of data. The queries are slow. The table is clustered by transaction_id. What is the likely cause?

151

A marketing team uses a BigQuery BI dashboard to analyze campaign performance. The table campaign_performance is 5 TB, partitioned by date, clustered by campaign_id. Queries filter on date range and campaign_id, and are fast. However, one query that joins this table with a user_dimensions table (10 GB, not partitioned) takes too long. The join is on user_id. What is the best improvement?

152

A company uses BigQuery for real-time BI. They have a table with streaming inserts. Analysts run queries that need to see data within seconds. However, they notice that streaming data appears with a delay of up to 2 minutes. What is the most likely reason?

153

A data engineer is building a BI reporting layer in BigQuery. The source data includes JSON logs with nested fields. Analysts need to query nested arrays efficiently. Which approach is best?

154

A company's BI dashboard queries a BigQuery table that is 20 TB and uses clustering on date and country. The query filters on date and country and also aggregates by category. The query takes 30 seconds. They want to reduce latency to under 5 seconds. What should they do?

155

A data team uses BigQuery for ad-hoc BI queries. They have a table with 100 columns. Analysts often select many columns. The table is partitioned by event_date. Queries are slow and expensive. What two-step optimization should they implement? (Note: This is a single correct answer among four options that combine two steps.)


Other PCDE exam domains

Plan and manage database infrastructure
Design and implement database schemas
Monitor and optimize database performance

Frequently asked questions

What does the Define data structures and implement SQL for Business Intelligence domain cover on the PCDE exam?

The Define data structures and implement SQL for Business Intelligence domain covers the key concepts tested in this area of the PCDE exam blueprint published by Google Cloud. Courseiva provides free domain-focused practice, mock exams, missed-question review, and readiness tracking across all PCDE domains — no account required.

How many Define data structures and implement SQL for Business Intelligence questions are in the PCDE question bank?

The Courseiva PCDE question bank contains 155 questions in the Define data structures and implement SQL for Business Intelligence domain. Click any question to see the full explanation and answer breakdown.

What is the best way to practice Define data structures and implement SQL for Business Intelligence for PCDE?

Start with a 10-question focused session to identify your baseline accuracy in this domain. Read every explanation — even for questions you answer correctly — to understand the reasoning. Once you score consistently above 80%, move to a 20–30 question session to confirm depth before moving to the next domain.

Can I practice only Define data structures and implement SQL for Business Intelligence questions for PCDE?

Yes — the session launcher on this page draws questions exclusively from the Define data structures and implement SQL for Business Intelligence domain. Choose 10, 20, 30, or 50 questions for a focused session, or click individual questions to review them one by one.
