Google Professional Cloud Database Engineer (PCDE) — Questions 76–150

503 questions total · 7 pages · All types, answers revealed

Page 2 of 7

76

MCQ · hard

A BI team uses BigQuery BI Engine to accelerate dashboards. They have a 100 GB table and enable BI Engine with a reservation of 10 GB. Some queries on this table are still slow. What is the most likely reason?

A. The query selects columns that are not fully cached due to the small reservation size.

B. BI Engine only works with SQL views, not direct tables.

C. The table uses clustering, which BI Engine ignores.

D. The table is partitioned, which BI Engine does not support.

Answer: A

BI Engine reserves memory for caching columns; insufficient memory leads to partial caching.

Why this answer

BI Engine accelerates queries by caching columns in memory. With a 100 GB table and only a 10 GB reservation, the cache can hold only a fraction of the table's columns. Queries that reference columns not fully cached will fall back to BigQuery's standard execution, causing slow performance.

Exam trap

Google Cloud often tests the misconception that BI Engine caches entire tables, when in reality it caches only columns up to the reservation limit, and queries referencing uncached columns will be slow.

How to eliminate wrong answers

Option B is wrong because BI Engine works with both tables and SQL views, not exclusively with views. Option C is wrong because BI Engine fully supports clustered tables and can leverage clustering metadata for efficient pruning. Option D is wrong because BI Engine supports partitioned tables and can use partition pruning to reduce the data scanned.

77

MCQ · medium

A company has a parent-child relationship in Cloud Spanner. They want to minimize cross-table join latency. What should they use?

A. Interleaved tables

B. Cloud Functions

C. Stored procedures

D. Secondary indexes

Answer: A

Interleaved tables physically co-locate parent and child rows, minimizing join latency.

78

Multi-Select · hard

Which THREE considerations are important when designing a schema for Cloud Firestore to ensure scalability?

Select 3 answers

A. Design collections to avoid high read/write rates on a single document.

B. Create composite indexes tailored to the application's query patterns.

C. Nest subcollections up to 10 levels deep to model complex hierarchies.

D. Use collection group indexes for all queries to avoid manual index creation.

E. Limit document size to avoid exceeding the 1 MiB limit.

Answers: A, B, E

Hot documents cause contention; distribute writes across documents.

Why this answer

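Options A, B, and E are correct. Option A: spreading writes across many documents (for example, sharding a counter) avoids hotspotting on a single document. Option B: composite indexes matched to the application's query patterns let queries that combine filters and sort orders run at all; Firestore rejects a query whose required composite index does not exist. Option E: keeping documents well under the 1 MiB maximum avoids write failures and oversized reads.

Option C is wrong because deep nesting is not a scalability technique; Firestore allows subcollections up to 100 levels deep, but deep hierarchies complicate queries and security rules without improving throughput. Option D is wrong because collection group indexes only serve collection group queries; they do not replace the composite indexes that ordinary collection queries need.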
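A common way to apply option A is a distributed (sharded) counter: a single Firestore document sustains only roughly one write per second, so instead of incrementing one hot document, each write picks one of N shard documents at random and reads sum the shards. The sketch below is an in-memory stand-in assuming N = 10; it does not call the Firestore API, and the `ShardedCounter` class is hypothetical.

```python
import random


class ShardedCounter:
    """In-memory stand-in for N counter-shard documents (hypothetical helper)."""

    def __init__(self, num_shards: int) -> None:
        # Each list slot plays the role of one shard document's "count" field.
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> None:
        # Pick a random shard so concurrent writers rarely touch the same document.
        self.shards[random.randrange(len(self.shards))] += amount

    def total(self) -> int:
        # A read aggregates every shard to get the logical counter value.
        return sum(self.shards)


counter = ShardedCounter(10)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

More shards raise the sustainable write rate but make each read touch more documents, so N is a trade-off between write throughput and read cost.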

79

MCQ · hard

A data scientist runs a complex SQL query on a large BigQuery dataset and receives the above error. The query joins 10 tables and uses multiple window functions. Which action is most likely to resolve the issue?

A. Apply for a quota increase for concurrent queries.

B. Increase the number of slots allocated to the project.

C. Use the '--maximum_billing_tier' flag to increase the billing tier.

D. Simplify the query by reducing the number of joins or using a temporary table.

Answer: D

Reducing query complexity lowers resource demands and can stay within tier limits.

Why this answer

The error indicates the query exceeded the resource limits for tier 1, meaning it needs more intermediate resources than that tier allows. The best solution is to optimize the query (D) by reducing complexity: cut the number of joins, pre-aggregate in subqueries, or break the work into steps with temporary tables. Option B (increasing slots) does not change billing tiers.

Option A (quota increase) addresses concurrency, not per-query resources. Option C (the billing tier flag) is deprecated.

80

MCQ · medium

A financial institution uses BigQuery for BI reporting. They have a table 'transactions' (10 TB) partitioned by transaction_date and clustered by customer_id. A common report filters on customer_id and last 30 days. The report is slow. Which change would most improve query performance for this specific report?

A. Change partition column to customer_id

B. Remove clustering and rely only on partitioning

C. Add clustering on transaction_date in addition to customer_id

D. Manually recluster the table daily

Answer: C

Clustering on the partition column can further optimize queries that filter on both customer_id and date range.

Why this answer

Option C is correct because adding clustering on transaction_date alongside customer_id improves query performance for the specific report that filters on both customer_id and the last 30 days. BigQuery uses clustering to sort data within partitions, so clustering by transaction_date colocates rows with nearby dates inside each partition, reducing the amount of data scanned. This complements the existing partition pruning by further narrowing the scan to relevant blocks. The benefit depends on partition granularity: with monthly or yearly partitions a partition spans many dates, whereas with daily partitions every row in a partition already shares one date, so the extra clustering column adds little.

Exam trap

Google Cloud often tests the misconception that partitioning alone is sufficient for all filter patterns, but the trap here is that clustering on the filter column (transaction_date) is needed to optimize queries that filter on both partition and clustering columns, especially when the partition column is not the primary filter.

How to eliminate wrong answers

Option A is wrong because changing the partition column to customer_id would prevent partition pruning for the date filter (last 30 days), forcing a full table scan of 10 TB and degrading performance. Option B is wrong because removing clustering entirely would eliminate the benefit of sorted blocks within partitions, increasing the amount of data scanned even with partition pruning. Option D is wrong because manually reclustering the table daily is unnecessary and inefficient; BigQuery automatically manages clustering metadata during write operations, and manual reclustering does not provide additional performance gains for this query pattern.

81

MCQ · easy

A team is designing a schema for a time-series database in Bigtable to store IoT sensor readings. Each sensor sends a reading every minute. The team needs to create a row key that supports efficient queries for a specific sensor's readings over a time range. Which row key design is most appropriate?

A. timestamp#sensor_id

B. hash(sensor_id)#timestamp

C. sensor_id#reverse_timestamp

D. random_UUID

Answer: C

Groups all readings for a sensor together in reverse chronological order.

Why this answer

Option C is correct because Bigtable stores rows sorted lexicographically by row key. By placing the sensor_id first, all readings for a given sensor are co-located in contiguous rows. Using reverse_timestamp (a large constant, such as the maximum 64-bit integer, minus the reading's timestamp, zero-padded to a fixed width) ensures that the most recent readings appear first within that sensor's row range, which optimizes scans for the latest data and allows efficient range queries over a time window.
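A minimal sketch of the `sensor_id#reverse_timestamp` pattern, assuming epoch-millisecond timestamps and using a sorted Python list as a stand-in for Bigtable's lexicographic row order. `MAX_TS_MS`, `row_key`, and `scan_range` are hypothetical helpers, not the Bigtable client API. Zero-padding matters: without a fixed width, string order would not match numeric order.

```python
import bisect

# Assumption: timestamps are epoch milliseconds; 2**63 - 1 mirrors Java's Long.MAX_VALUE.
MAX_TS_MS = 2**63 - 1


def row_key(sensor_id: str, ts_ms: int) -> str:
    """Build 'sensor_id#reverse_ts', zero-padded so string order matches numeric order."""
    reverse_ts = MAX_TS_MS - ts_ms
    return f"{sensor_id}#{reverse_ts:019d}"


def scan_range(sorted_keys, sensor_id, start_ms, end_ms):
    """Return keys for one sensor with start_ms <= ts <= end_ms, newest first.

    Because the timestamp is reversed, the newest reading (end_ms) has the
    smallest key, so the scan starts at row_key(end_ms) and stops at row_key(start_ms).
    """
    lo = bisect.bisect_left(sorted_keys, row_key(sensor_id, end_ms))
    hi = bisect.bisect_right(sorted_keys, row_key(sensor_id, start_ms))
    return sorted_keys[lo:hi]


# Bigtable keeps rows sorted by key; sorted() stands in for that here.
readings = [("sensor-7", 60_000), ("sensor-7", 180_000),
            ("sensor-7", 120_000), ("sensor-3", 60_000)]
keys = sorted(row_key(s, t) for s, t in readings)
for k in keys:
    print(k)
print(scan_range(keys, "sensor-7", 100_000, 200_000))
```

In a real Bigtable read, the same start and end keys would bound a single row-range scan over one sensor's contiguous rows.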

Exam trap

Google Cloud often tests the misconception that putting the timestamp first is always best for time-range queries, but in Bigtable, the row key's prefix determines data locality, so the sensor_id must come first to avoid scattering reads across the entire table.

How to eliminate wrong answers

Option A is wrong because putting the timestamp first scatters readings for the same sensor across the entire table, so a query for one sensor's time range must read every sensor's rows in that window or perform many lookups; it also concentrates all new writes at one end of the key space, creating a hotspot. Option B is weaker because the deterministic hash still groups each sensor's rows together, but it makes keys opaque and adds a hashing step without any benefit for this single-sensor query pattern. Option D is wrong because a random UUID provides no ordering or grouping, forcing full table scans for any sensor-specific time-range query.

82

MCQ · medium

Your Cloud SQL for PostgreSQL instance is experiencing intermittent slowdowns during peak hours. You notice that the CPU utilization spikes to 80% and the number of connections increases. The application team confirms they are not running any new queries. What should you do first to diagnose the issue?

A. Increase the machine type of the Cloud SQL instance to add more CPU.

B. Enable connection pooling to reduce the number of connections.

C. Use Cloud SQL Insights to analyze query performance and wait statistics.

D. Set a maximum connection limit and reduce the connection lifetime.

Answer: C

Query Insights helps identify high CPU queries, wait events, and performance trends.

Why this answer

Cloud SQL Insights provides built-in query performance monitoring and wait statistics that can pinpoint the root cause of intermittent slowdowns without making changes. Since CPU spikes and increased connections are symptoms, not the cause, analyzing wait events (e.g., CPU, IO, lock contention) directly reveals which queries or resources are bottlenecked. This is the first diagnostic step before any scaling or configuration changes.

Exam trap

Google Cloud often tests the principle that you must diagnose before scaling; the trap here is that candidates jump to scaling or connection limits (A, B, D) because they seem like immediate fixes, but the correct first step is always to use monitoring tools like Cloud SQL Insights to identify the actual bottleneck.

How to eliminate wrong answers

Option A is wrong because increasing the machine type adds cost and masks the underlying issue without diagnosing why CPU spikes occur; it treats a symptom, not the cause. Option B is wrong because enabling connection pooling reduces connection overhead but does not address CPU spikes or query performance; it may even hide connection-related issues without solving the root cause. Option D is wrong because setting a maximum connection limit or reducing connection lifetime can cause application errors or dropped connections without identifying why connections are increasing or why CPU is spiking.

83

MCQ · medium

What is the correct way to create a Spanner instance with 2 nodes?

A. Use --num-nodes=2 instead of --nodes

B. Set --processing-units=2000 and remove --nodes

C. Remove the --processing-units flag

D. Set --nodes=2 and --processing-units=0

Answer: C

Nodes and processing units are mutually exclusive; specify only one.

Why this answer

In Cloud Spanner, `gcloud spanner instances create` sizes an instance with either `--nodes` or `--processing-units`, and the two flags are mutually exclusive. One node equals 1,000 processing units; processing units below 1,000 exist for smaller instances and are allocated in multiples of 100. If a command passes both flags, gcloud rejects it, so option C is correct: removing `--processing-units` and keeping `--nodes=2` creates the required 2-node instance.

Exam trap

The trap here is that candidates may reach for `--num-nodes` (a flag from other gcloud command groups, such as GKE clusters, not Spanner) or assume `--processing-units` can be combined with `--nodes`, when in fact the two sizing flags are mutually exclusive and `--nodes` is the correct flag for specifying node count.

How to eliminate wrong answers

Option A is wrong because `gcloud spanner instances create` has no `--num-nodes` flag; the node-count flag is `--nodes`. Option B would provision equivalent capacity (2,000 processing units, the same as 2 nodes), but it changes how the instance is sized instead of making the minimal fix the question asks for, which keeps `--nodes=2`. Option D is wrong because `--processing-units=0` is invalid (the minimum is 100), and `--nodes` cannot be combined with `--processing-units` in any case.
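A small sketch of the rule that nodes and processing units are exclusive: it builds (but does not run) the argument list for `gcloud spanner instances create` and refuses to emit both sizing flags. The instance name, config, and description values are hypothetical, and the processing-unit check assumes Spanner's granularity of at least 100, in steps of 100 below 1,000 and steps of 1,000 above; verify against current documentation before relying on it.

```python
def validate_processing_units(pu: int) -> None:
    """Assumed Spanner sizing rule: >= 100 PU; multiples of 100 below 1,000, of 1,000 above."""
    if pu < 100:
        raise ValueError("processing units must be at least 100")
    step = 100 if pu < 1000 else 1000
    if pu % step:
        raise ValueError(f"processing units must be a multiple of {step}")


def build_create_args(instance, config, description, nodes=None, processing_units=None):
    """Return argv for `gcloud spanner instances create`; exactly one sizing flag is allowed."""
    if (nodes is None) == (processing_units is None):
        raise ValueError("pass exactly one of nodes or processing_units")
    args = ["gcloud", "spanner", "instances", "create", instance,
            f"--config={config}", f"--description={description}"]
    if nodes is not None:
        args.append(f"--nodes={nodes}")
    else:
        validate_processing_units(processing_units)
        args.append(f"--processing-units={processing_units}")
    return args


# Hypothetical values; pass the list to subprocess.run(...) to actually execute it.
print(" ".join(build_create_args("orders-db", "regional-us-central1", "Orders", nodes=2)))
```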

84

MCQ · easy

A small business runs a Cloud SQL instance with 10 GB data. They want to automate daily backups with 7-day retention. They also need to restore quickly if needed. What is the simplest solution?

A. Use Cloud Scheduler to trigger exports.

B. Export to Cloud Storage daily using a cron job.

C. Enable automatic backups in Cloud SQL settings.

D. Use gcloud to create a backup schedule with retention.

Answer: C

Automatic backups are simple and support retention settings.

Why this answer

Option C is correct because Cloud SQL's built-in automated backups are configured directly in the instance settings, run daily within a chosen backup window, and keep 7 backups by default (the retained count is configurable). This eliminates custom scripts and external schedulers, and restoring from a backup is a single operation in the console or gcloud. Enabling point-in-time recovery alongside automated backups also allows restores to a specific moment.

Exam trap

The trap here is that candidates often overcomplicate the solution by choosing export-based methods (A, B) or a scripted gcloud setup (D) when Cloud SQL's built-in automated backups already provide a simpler, fully managed solution with integrated retention and fast restore.

How to eliminate wrong answers

Option A is wrong because Cloud Scheduler-triggered exports to Cloud Storage produce dump files that do not support point-in-time recovery and must be re-imported to restore, adding complexity. Option B is wrong because a cron-driven export is a manual, scripted approach that lacks the automated retention management and integrated restore of Cloud SQL's native backups. Option D is not the simplest choice because, for Cloud SQL, gcloud configures the same automated-backup settings (such as the backup start time and the number of retained backups) rather than a separate scheduling feature, so it adds setup steps without adding capability.

85

MCQ · medium

You need to transfer 10 TB of data from on-premises servers to Cloud Storage for loading into Bigtable. Which method is the most efficient and reliable for this volume?

A. Set up Cloud VPN and use rsync

B. Use Storage Transfer Service

C. Use gsutil cp in parallel

D. Use Dataflow to read and write data

Answer: B

Storage Transfer Service is designed for large-scale data transfers with features like incremental sync and verification.

Why this answer

The Storage Transfer Service is designed for large-scale, online data transfers to Google Cloud from on-premises storage (through transfer agents installed on local machines) or from other cloud providers. It handles 10 TB efficiently by running transfers in parallel, retrying failures automatically, and validating data with checksums, without manual scripting or a persistent VPN connection, making it the most reliable and efficient choice for this volume.

Exam trap

The trap here is that candidates confuse 'efficient and reliable' with 'fastest raw throughput' (gsutil cp in parallel) or 'familiar tool' (rsync), overlooking that managed services like Storage Transfer Service provide built-in fault tolerance and optimization for large-scale transfers.

How to eliminate wrong answers

Option A is wrong because Cloud VPN provides encrypted connectivity but does not optimize bulk data transfer; rsync over VPN lacks built-in retry logic and parallelization for 10 TB, leading to slow and unreliable transfers. Option C is wrong because gsutil cp in parallel can transfer data but requires manual management of concurrency, retries, and consistency checks, and is less reliable than a managed service for 10 TB. Option D is wrong because Dataflow is a stream and batch processing framework, not a data transfer tool; using it to read and write data adds unnecessary complexity and cost compared to a purpose-built transfer service.

86

Multi-Select · hard

A data team uses BigQuery and wants to ensure data freshness for BI reports with low latency. Which three techniques can help achieve near-real-time updates? (Select THREE).

Select 3 answers

A. Create a scheduled query that rewrites the entire table every hour

B. Use a live view that queries the source table directly

C. Use BigQuery's BI Engine for caching

D. Use streaming inserts to load data in real-time

E. Schedule a query every 15 minutes to refresh a materialized view

Answers: B, D, E

A view always returns the latest data from the base table, so it reflects streaming inserts immediately.

Why this answer

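Option B is correct because a live view (a standard logical view) queries the source table directly each time it is accessed, so BI reports always see the most current data without any materialization delay. Option D is correct because streaming inserts make new rows available to queries within seconds rather than waiting for batch load jobs. Option E is correct because refreshing the materialized view on a short, fixed schedule bounds how stale its precomputed results can become.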

Exam trap

The trap here is that candidates often confuse caching mechanisms (like BI Engine) with data freshness techniques, not realizing that caching improves query speed but does not update the underlying data; they may also mistakenly think that periodic full table rewrites (Option A) are acceptable for near-real-time, when in fact they introduce significant latency and cost.

87

MCQ · easy

You are designing a Firestore database for a chat application. Documents will store messages with fields: senderId, messageText, timestamp, conversationId. To efficiently retrieve the most recent 50 messages in a conversation, which index should you create?

A. A composite index on (conversationId, timestamp, __name__) descending

B. A single-field index on timestamp

C. An index on conversationId only

D. A composite index on (senderId, timestamp)

Answer: A

This index covers the query with filtering and ordering, enabling efficient retrieval.

Why this answer

Option A creates a composite index on (conversationId, timestamp, __name__) with timestamp in descending order, which efficiently supports a query that filters by conversationId, orders by timestamp descending, and limits the result to 50 documents. Option B only indexes timestamp, so it cannot serve the filter on conversationId. Option C indexes conversationId only; an equality filter combined with ordering on a different field requires a composite index, so this index alone cannot serve the query.

Option D indexes senderId, which is not used in the query.
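To see why option A makes the query cheap, here is a toy model of a composite index as a sorted list of `(conversationId, -timestamp, name)` tuples, where negating the timestamp mimics descending order. The data and the `latest` helper are hypothetical, and this is not how Firestore stores indexes internally, but it shows that the most recent messages of one conversation form one contiguous slice that can be read from the start and cut off at the limit.

```python
import bisect

# Hypothetical messages: (conversationId, timestamp, document name).
messages = [
    ("conv-a", 100, "m1"), ("conv-b", 150, "m2"), ("conv-a", 300, "m3"),
    ("conv-a", 200, "m4"), ("conv-b", 50, "m5"),
]

# Index entries sorted ascending; -timestamp makes newer messages sort first.
index = sorted((conv, -ts, name) for conv, ts, name in messages)


def latest(index, conversation_id, limit):
    """Return up to `limit` document names for one conversation, newest first."""
    # A 1-tuple sorts before every longer tuple with the same first element,
    # so this finds the first entry for the conversation.
    start = bisect.bisect_left(index, (conversation_id,))
    out = []
    for conv, _neg_ts, name in index[start:]:
        if conv != conversation_id or len(out) == limit:
            break
        out.append(name)
    return out


print(latest(index, "conv-a", 2))
```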

88

MCQ · hard

A BI team uses a complex SQL query with multiple Common Table Expressions (CTEs) that are referenced several times within the main query. The query performs poorly. What is the best optimization strategy?

A. Add indexes on the tables used in the CTEs

B. Use temporary tables or table snapshots to materialize the CTE results

C. Reuse the same CTE names as often as possible in the query

D. Replace CTEs with derived tables in the FROM clause

Answer: B

Materializing the result once and referencing the temporary table avoids repeated computation.

Why this answer

Option B is correct because BigQuery does not materialize non-recursive CTEs: a CTE referenced several times in one query is executed once per reference, repeating the same logic. Writing the intermediate result to a temporary table materializes it once, which avoids redundant scans and significantly improves performance for complex queries with multiple CTE references.
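The materialization pattern can be sketched with SQLite as a stand-in for BigQuery (in BigQuery the equivalent is a `CREATE TEMP TABLE ... AS SELECT` inside a multi-statement query). The `sales` table, the `store_totals` temp table, and the `expensive` UDF are hypothetical; the UDF counts its calls to show that the former CTE body runs once per input row even though its result is referenced twice.

```python
import sqlite3

calls = 0


def expensive(x):
    """Hypothetical costly expression; counts how often it is evaluated."""
    global calls
    calls += 1
    return x * 2


conn = sqlite3.connect(":memory:")
conn.create_function("expensive", 1, expensive)
conn.execute("CREATE TABLE sales (store INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

# Materialize what used to be the CTE body exactly once.
conn.execute("""
    CREATE TEMP TABLE store_totals AS
    SELECT store, SUM(expensive(amount)) AS total FROM sales GROUP BY store
""")

# Reference the materialized result twice without recomputing it.
rows = conn.execute("""
    SELECT a.store, a.total, b.max_total
    FROM store_totals AS a
    CROSS JOIN (SELECT MAX(total) AS max_total FROM store_totals) AS b
    ORDER BY a.store
""").fetchall()
print(rows, calls)
```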

Exam trap

Google Cloud often tests the misconception that CTEs are automatically materialized or cached, leading candidates to overlook the need for explicit temporary tables when performance is critical.

How to eliminate wrong answers

Option A is wrong because BigQuery does not rely on conventional indexes for query execution, and indexing the base tables would not address the core issue of repeated CTE evaluation. Option C is wrong because reusing the same CTE name multiple times does not change execution behavior; each reference still triggers a separate evaluation of the CTE definition. Option D is wrong because derived tables in the FROM clause are not materialized either; a derived table must be repeated wherever its result is needed, so the same logic is still computed multiple times.

89

MCQ · medium

A data engineer is designing a BI solution in BigQuery for a retail chain. They need to support queries that aggregate sales by store, product, and date across millions of transactions. The data is loaded in near real-time from Cloud Pub/Sub. Which table design provides the best balance of query performance and cost?

A. Partition by store_id, cluster by product_id

B. Partition by date, cluster by store_id and product_id

C. Unpartitioned table with clustering on store_id and product_id

D. Use materialized views with aggregation on store_id, product_id, and date

Answer: B

Partitioning by date enables efficient pruning for time-range queries, and clustering on store_id and product_id speeds up common aggregations.

Why this answer

Option B is correct because partitioning by date enables BigQuery to prune entire partitions when querying by date range, which is the most common filter in sales aggregation queries. Clustering on store_id and product_id further reduces the data scanned within each partition by colocating rows with similar store and product values. This design minimizes both query cost (bytes billed) and latency, while supporting near-real-time ingestion from Pub/Sub without requiring table rewrites.

Exam trap

Google Cloud often tests the misconception that any column (like store_id) makes a good partitioning key, or that clustering alone is sufficient for cost control. In fact, BigQuery partitions only on a time-unit column, on ingestion time, or on an integer range, and clustering is a complementary optimization, not a replacement.

How to eliminate wrong answers

Option A is wrong because store_id is not a time column: BigQuery could partition on it only through integer-range partitioning (if it is an INTEGER), and that would not prune the date-range filters that dominate sales aggregation queries. Option C is wrong because an unpartitioned table with clustering only still requires scanning the entire table for queries that filter by date, leading to higher costs and slower performance compared to a partitioned design. Option D is wrong because materialized views add refresh and storage costs; they do not replace the need for an efficient base table design, and they cannot be the ingestion target for near-real-time data from Pub/Sub.

90

MCQ · easy

A retail company is designing a Cloud Spanner schema for an order management system. Orders are identified by a UUID and contain multiple line items. Each line item references a product. Which schema design best supports high read throughput for queries that retrieve all line items for a given order?

A. Store orders and line items in a single table with repeated fields for line items.

B. Create an Orders table and a LineItems table interleaved in Orders with ORDER_ID as the parent key.

C. Create separate Orders and LineItems tables with a foreign key relationship and index on ORDER_ID.

D. Denormalize product information into the LineItems table and store orders separately.

Answer: B

Interleaving colocates line items with their order for fast retrieval.

Why this answer

Option B is correct because Cloud Spanner interleaved tables store child rows (LineItems) physically adjacent to their parent row (Orders), typically on the same split, so all line items for a given order can be retrieved with a single contiguous range read keyed on ORDER_ID, without cross-table joins or distributed queries. This colocation maximizes read throughput by minimizing latency and avoiding scatter-gather operations across nodes.
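A toy model of interleaving, assuming keys are Python tuples: an `Orders` row is keyed `(order_id,)` and each interleaved `LineItems` row is keyed `(order_id, line_no)`. Sorting puts every child right after its parent, so reading one order is a single contiguous range. Spanner's real key encoding differs, and the data and `read_order` helper are hypothetical.

```python
import bisect

# Parent rows use a 1-element key; child rows extend the parent's key.
rows = sorted([
    ("order-2",), ("order-1", 2), ("order-1",), ("order-2", 1), ("order-1", 1),
])


def read_order(rows, order_id):
    """Return the parent row and its line items as one contiguous slice."""
    start = bisect.bisect_left(rows, (order_id,))
    end = start
    # Walk forward while keys still share the parent's prefix.
    while end < len(rows) and rows[end][0] == order_id:
        end += 1
    return rows[start:end]


print(rows)
print(read_order(rows, "order-1"))
```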

Exam trap

Google Cloud often tests the misconception that a foreign key with an index is equivalent to interleaving for performance, but in Cloud Spanner, only interleaved tables guarantee physical colocation and single-split access for parent-child queries, whereas indexed foreign keys still require distributed lookups.

How to eliminate wrong answers

Option A is wrong because storing line items as repeated fields (e.g., an ARRAY<STRUCT> column) in the order row makes that row grow with every line item, risking Cloud Spanner's row size limits for large orders and preventing efficient indexing or atomic updates of individual line items. Option C is wrong because separate tables with a foreign key and index on ORDER_ID require a two-step lookup (index scan then table access) and may involve distributed reads if the index and data are on different splits, increasing latency compared to interleaving. Option D is wrong because denormalizing product information into LineItems does not address the core read pattern (retrieving all line items for an order) and introduces data redundancy and update anomalies without improving colocation; it still requires a separate table or repeated fields, neither of which matches the interleaved design's performance benefit.

91

MCQ · hard

Your Cloud SQL for SQL Server instance has a query that uses a non-clustered index to filter rows but then performs key lookups to retrieve additional columns. The query is slow. Which database tuning option would most likely reduce I/O?

A. Increase the buffer pool size

B. Rebuild the non-clustered index with FILLFACTOR=80

C. Create a covering index that includes all columns referenced in the query

D. Use a FORCESEEK query hint

Answer: C

Covering index avoids the need for lookups, reducing I/O.

Why this answer

The query is slow because key lookups require random I/O to retrieve additional columns not included in the non-clustered index. Creating a covering index that includes all columns referenced in the query eliminates the need for key lookups entirely, converting the operation into a single index seek or scan with minimal I/O. This directly reduces the number of page reads and improves query performance.
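SQLite has no `INCLUDE` clause, but a composite index that contains every referenced column plays the same role, so it can stand in for SQL Server here. The sketch compares SQLite's query plans before and after replacing a single-column index with a covering one; the `orders` schema and index names are hypothetical.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Join the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "status TEXT, total REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, status, total, notes) VALUES (?, ?, ?, ?)",
    [(i % 50, "open", i * 1.5, "x" * 100) for i in range(1000)],
)

query = "SELECT status, total FROM orders WHERE customer_id = ?"

# Non-covering index: seek on customer_id, then look up status/total in the table.
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
before = plan(conn, query, (42,))

# Covering index: every column the query needs lives in the index itself.
conn.execute("DROP INDEX idx_customer")
conn.execute("CREATE INDEX idx_customer_cover ON orders (customer_id, status, total)")
after = plan(conn, query, (42,))

print(before)
print(after)
```

In SQL Server the analogous index would keep `customer_id` as the key and carry the other columns in `INCLUDE`, which keeps the key narrow while still avoiding lookups.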

Exam trap

Google Cloud often tests the misconception that any index tuning or query hint can fix performance, but the trap here is that candidates may choose FORCESEEK or FILLFACTOR without realizing that only a covering index directly addresses the root cause of key lookup I/O.

How to eliminate wrong answers

Option A is wrong because increasing the buffer pool size only caches more data in memory, which may reduce physical I/O but does not eliminate the logical I/O caused by key lookups; the query still performs the same number of page accesses. Option B is wrong because rebuilding the index with FILLFACTOR=80 reduces page splits and fragmentation but does not change the index structure to include additional columns, so key lookups still occur. Option D is wrong because using a FORCESEEK query hint forces the optimizer to use an index seek, but it does not prevent key lookups if the index does not cover all required columns; it may even degrade performance by forcing a suboptimal plan.

92

MCQ · easy

A developer has deployed a new version of an application that uses Cloud SQL. After the deployment, you notice a sharp increase in the number of slow queries. What should you do first to identify the problematic queries?

A. Check the slow query log in Cloud Logging and look for queries with high rows_examined.

B. Use Cloud SQL Query Insights to identify the queries with the highest latency and examine their execution plans.

C. Increase the instance tier to reduce the impact of slow queries.

D. Enable the general query log and parse the log file to find slow queries.

Answer: B

Query Insights provides detailed query performance data with minimal setup and overhead.

Why this answer

Cloud SQL Query Insights is the recommended first step for diagnosing slow queries because, once enabled on the instance, it provides built-in query monitoring, latency breakdowns, and query plans without extra tooling. It directly surfaces the queries with the highest latency, allowing you to examine their execution plans to identify root causes such as missing indexes or inefficient joins.

Exam trap

Google Cloud often tests the distinction between reactive scaling (Option C) and proactive diagnostics (Option B), trapping candidates who think adding resources is the first troubleshooting step instead of identifying the root cause.

How to eliminate wrong answers

Option A is wrong because the slow query log in Cloud Logging requires manual filtering and may not be enabled by default, whereas Query Insights provides immediate, structured visibility into high-latency queries. Option C is wrong because increasing the instance tier only masks the symptom by adding more resources, without identifying or fixing the underlying problematic queries. Option D is wrong because enabling the general query log generates excessive volume and performance overhead, and parsing it manually is inefficient compared to using Query Insights' built-in analysis.

93

MCQ · hard

A company runs a BigQuery data warehouse with many scheduled queries and materialized views. They notice that materialized view refreshes are taking longer than expected, causing delays in downstream reports. What is the most effective optimization?

A. Manually refresh materialized views outside peak hours

B. Increase the refresh interval to reduce frequency

C. Disable automatic refresh and use scheduled queries to rebuild the materialized view

D. Partition and cluster the base table on columns used in the materialized view

Answer: D

Partitioning and clustering reduce the amount of data scanned during refresh, improving speed.

Why this answer

Partitioning and clustering the base table on columns used in the materialized view (D) is the most effective optimization because it allows BigQuery to perform incremental refreshes using only the changed partitions, significantly reducing scan and recomputation overhead. Without proper partitioning, the materialized view refresh must scan the entire base table, which becomes increasingly costly as data grows. Clustering further improves efficiency by co-locating related data, minimizing the data processed during aggregation or join operations in the refresh.

Exam trap

The trap here is that candidates often assume scheduling or manual timing adjustments (A, B, C) will solve performance issues, when in fact the core optimization lies in the physical design of the base table, which lets BigQuery limit how much data each incremental refresh has to process.

How to eliminate wrong answers

Option A is wrong because manually refreshing materialized views outside peak hours does not address the root cause of slow refreshes; it merely shifts the timing, and the underlying full-table scan cost remains unchanged. Option B is wrong because increasing the refresh interval reduces the frequency of refreshes but does not optimize the refresh operation itself; the refresh will still be slow when it runs, and downstream reports may become even more stale. Option C is wrong because disabling automatic refresh and using scheduled queries to rebuild the materialized view replaces an optimized incremental refresh with a full rebuild, which is typically slower and more expensive, and it loses BigQuery's automatic incremental refresh capabilities.

94

Matching · medium

Match each database migration term to its description.

Matches

Fully managed service for migrating to Cloud SQL

Same database engine source and target

Different database engine source and target

Ongoing replication with minimal downtime

Full dump and restore with planned downtime

Why these pairings

These terms describe migration strategies and tools in Google Cloud.

95

MCQ · easy

A Cloud Spanner database is experiencing high latency for point reads. The table has a primary key of (CustomerID, OrderDate). Most reads are by CustomerID only. What should the engineer do?

A. Add a secondary index on CustomerID.

B. Use interleaved tables.

C. Reorder primary key to (OrderDate, CustomerID).

D. Increase the number of nodes.

Answer: A

A secondary index allows direct lookup by CustomerID, significantly reducing read latency.

Why this answer

Point reads by CustomerID only are inefficient on the primary key (CustomerID, OrderDate) because Cloud Spanner requires the full primary key for direct lookup. Adding a secondary index on CustomerID allows Spanner to perform an index scan followed by a point read, drastically reducing latency for these queries.

Exam trap

The trap here is that candidates assume reordering the primary key (Option C) is a valid optimization, but putting OrderDate first means CustomerID is no longer the leading key column, so reads filtered on CustomerID alone get harder to serve, not easier.

How to eliminate wrong answers

Option B is wrong because interleaved tables optimize joins and hierarchical data access across parent and child tables, not lookups by CustomerID within a single table. Option C is wrong because reordering the primary key to (OrderDate, CustomerID) would make OrderDate the leading key column, so reads by CustomerID alone could no longer follow the key order, and a timestamp-leading key also tends to concentrate new writes in a single key range. Option D is wrong because increasing nodes improves throughput and capacity, not the fundamental latency of point reads caused by missing index support.

96

Multi-Selectmedium

A company is migrating their on-premises MySQL database to Cloud SQL. The database is 500 GB and they have a 1 Gbps network connection. They want to minimize downtime. Which THREE steps should they take?

Select 3 answers

A.Perform a test migration to validate the process.

B.Export the database to a SQL file and import after the cutover.

C.Test the application against the new Cloud SQL instance.

D.Use Database Migration Service with continuous replication.

E.Schedule a maintenance window for the cutover.

AnswersA, C, D

Continuous replication keeps downtime short, and testing both the migration and the application reduces cutover risk.

Why this answer

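Option A is correct because performing a test migration validates the entire process, including schema compatibility, data integrity, and application connectivity, before the actual cutover. This reduces the risk of unexpected failures during the production migration, which is critical for minimizing downtime. A test migration also helps estimate the time required for the final cutover and allows for tuning of the Database Migration Service settings. Option C is correct because testing the application against the new Cloud SQL instance catches connection, configuration, and query-behavior problems before users are switched over. Option D is correct because Database Migration Service with continuous replication copies the bulk of the data while the source stays online and keeps the target in sync, so the outage is limited to the final cutover. Option B, by contrast, requires a full export and import while the application is down.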
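A rough calculation shows why a dump-and-restore approach means a long outage for a database this size. The sketch below assumes decimal units (1 GB = 10^9 bytes) and uses a hypothetical `efficiency` factor for how much of the 1 Gbps link a transfer actually achieves; it ignores export and import time, which only add to the outage.

```python
def transfer_time_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate how long it takes to move size_gb over a link_gbps network link.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Gbps = 10**9 bits/s).
    `efficiency` is a hypothetical fraction of line rate actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 1e9 * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    best_case = transfer_time_seconds(500, 1)
    realistic = transfer_time_seconds(500, 1, efficiency=0.5)
    print(f"Best case: {best_case / 60:.0f} min")          # about 67 min at full line rate
    print(f"At 50% efficiency: {realistic / 60:.0f} min")  # about 133 min
```

Even the best case is over an hour of copying before the import can start, whereas Database Migration Service copies the bulk of the data while the source stays online and leaves only the final catch-up for the cutover.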

Exam trap

Google Cloud often tests the misconception that a simple export/import or a maintenance window is sufficient for minimizing downtime, when in fact continuous replication and pre-validation steps are essential for achieving near-zero downtime migrations.

97

MCQhard

A gaming company ingests player clickstream data in real time via Cloud Pub/Sub. They need to aggregate events per player session in BigQuery with exactly-once semantics. Which architecture minimizes latency and cost?

A.Use Cloud Functions to write each message directly to BigQuery

B.Use Cloud Dataflow with exactly-once processing to BigQuery

C.Use Cloud Pub/Sub subscription to write to BigQuery directly

D.Use Cloud Dataproc to run Spark streaming jobs

AnswerB

Dataflow provides exactly-once semantics, low latency, and is cost-effective for this volume.

Why this answer

Cloud Dataflow with exactly-once processing is the correct choice because it provides a unified stream and batch processing model that supports exactly-once semantics when writing to BigQuery through the BigQuery I/O connector. It can group each player's events with session windows, processes events continuously in streaming mode for low latency while avoiding duplicate results, and is cost-effective because Dataflow autoscales with the Pub/Sub throughput.
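Conceptually, the pipeline must do two things the other options lack: ignore redelivered messages and group each player's events into sessions separated by an inactivity gap, which is what Apache Beam session windows provide in Dataflow. The pure-Python sketch below only illustrates that logic on an in-memory list; the event tuple layout, the `gap_seconds` parameter, and deduplication by message ID are assumptions for illustration, not the Beam or Dataflow API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]    # (message_id, player_id, event_time_seconds)
Session = Tuple[int, int, int]  # (first_event_time, last_event_time, event_count)


def sessionize(events: Iterable[Event], gap_seconds: int) -> Dict[str, List[Session]]:
    """Drop duplicate deliveries by message ID, then split each player's events into sessions.

    A new session starts when consecutive events are more than gap_seconds apart.
    """
    seen_ids = set()
    times_by_player: Dict[str, List[int]] = defaultdict(list)
    for message_id, player_id, ts in events:
        if message_id in seen_ids:  # a redelivered message must not be counted twice
            continue
        seen_ids.add(message_id)
        times_by_player[player_id].append(ts)

    sessions: Dict[str, List[Session]] = {}
    for player_id, times in times_by_player.items():
        times.sort()
        player_sessions: List[Session] = []
        start = prev = times[0]
        count = 1
        for ts in times[1:]:
            if ts - prev > gap_seconds:  # inactivity gap closes the current session
                player_sessions.append((start, prev, count))
                start, count = ts, 0
            count += 1
            prev = ts
        player_sessions.append((start, prev, count))
        sessions[player_id] = player_sessions
    return sessions
```

For example, with a 30-second gap, events for player p1 at 0 s, 10 s (delivered twice), and 100 s produce two sessions, (0, 10, 2) and (100, 100, 1); the duplicate delivery is not counted.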

Exam trap

Google Cloud often tests whether candidates know the limits of Pub/Sub's built-in delivery options: a Pub/Sub BigQuery subscription can write messages straight into a table, but it does not aggregate or sessionize events and can write duplicate messages, so a processing layer such as Dataflow is still needed here.

How to eliminate wrong answers

Option A is wrong because Cloud Functions writing directly to BigQuery cannot guarantee exactly-once semantics; a function may be retried on failure, leading to duplicate rows, and it lacks built-in deduplication or checkpointing for streaming data. Option C is wrong because, although a Pub/Sub BigQuery subscription can write messages directly into a BigQuery table, it writes each raw message as it arrives: it cannot aggregate events per player session, and it can write duplicate messages, so it does not meet the exactly-once aggregation requirement. Option D is wrong because Cloud Dataproc running Spark streaming jobs introduces higher operational overhead and latency compared to Dataflow, and while Spark can achieve exactly-once semantics, it requires more manual configuration and does not integrate as seamlessly with BigQuery as Dataflow does.

98

MCQmedium

A financial services company uses Cloud Spanner for transaction processing. They notice increased latency during peak hours. They suspect a hot spot. What is the best way to diagnose the issue?

A.Review Key Visualizer to identify hot keys

B.Add more nodes to the instance

C.Switch to Cloud SQL

D.Check Cloud Spanner's CPU utilization metrics

AnswerA

Key Visualizer shows read and write heatmaps per key range, allowing identification of hot spots.

Why this answer

Key Visualizer is a Cloud Spanner tool that provides a heatmap of access patterns across keys and time ranges, directly revealing hot spots (e.g., monotonically increasing keys or skewed read/write distributions). This allows you to identify the specific keys causing contention without guesswork, making it the most targeted diagnostic approach for hot spot issues.
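Key Visualizer's heatmap is, in effect, a per-key-range, per-time-bucket breakdown of activity. As a toy illustration of why that beats an instance-wide CPU number, the sketch below flags key ranges whose activity in one time bucket is far above the median range. The input format and the 5x threshold are hypothetical; this is not a Key Visualizer export or API.

```python
from statistics import median
from typing import Dict, List


def find_hot_ranges(ops_by_range: Dict[str, int], factor: float = 5.0) -> List[str]:
    """Flag key ranges whose operation count exceeds `factor` times the median range.

    ops_by_range maps a key-range label to the reads/writes it served in one
    time bucket (hypothetical input format).
    """
    typical = median(ops_by_range.values())
    return sorted(r for r, ops in ops_by_range.items() if ops > factor * typical)


if __name__ == "__main__":
    bucket = {"a-f": 100, "f-m": 120, "m-s": 90, "s-z": 2000}
    print(find_hot_ranges(bucket))  # ['s-z']
```

An averaged CPU metric would only show that the instance is busy; the per-range view points at the one range absorbing most of the traffic.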

Exam trap

The trap here is that candidates confuse high-level metrics (CPU utilization) with diagnostic tools, assuming that resource pressure alone identifies the root cause, when in fact only a key-level analysis like Key Visualizer reveals the specific hot key pattern.

How to eliminate wrong answers

Option B is wrong because adding nodes increases throughput and storage capacity but does not diagnose the root cause of a hot spot; it may mask the symptom without fixing the key design issue. Option C is wrong because switching to Cloud SQL is a migration that abandons Spanner's horizontal scaling and global consistency, and does not help diagnose the existing hot spot in Spanner. Option D is wrong because CPU utilization metrics indicate overall resource pressure but cannot pinpoint which specific keys or access patterns are causing contention, so they are insufficient for diagnosing hot spots.

99

MCQeasy

A company wants to ensure point-in-time recovery for their PostgreSQL database on Cloud SQL. What must they enable?

A.Query insights

B.Automatic backups

C.Write-ahead logging (WAL) archiving

D.Binary logging

AnswerB

Automatic backups are the prerequisite for PITR on Cloud SQL for PostgreSQL instances.

Why this answer

Point-in-time recovery (PITR) in Cloud SQL is built on automatic backups: a restore starts from a backup and then replays the archived transaction logs (write-ahead logs for PostgreSQL) up to the requested time within the log retention period. Cloud SQL only allows PITR when automatic backups are enabled; once they are, PITR is turned on as an instance setting and Cloud SQL archives the logs itself. Without PITR, you can only restore to the moment a backup was taken, which does not provide the granularity needed for point-in-time recovery.

Exam trap

Google Cloud often tests the misconception that you must configure WAL archiving in PostgreSQL yourself to get PITR, but in Cloud SQL the log archiving is managed for you and depends on automatic backups being enabled.

How to eliminate wrong answers

Option A is wrong because Query insights is a performance monitoring and troubleshooting feature that provides query-level metrics and execution plans, not a mechanism for backup or recovery. Option C is wrong because, although PITR for PostgreSQL does rely on archived write-ahead logs, Cloud SQL manages that archiving as part of its backup and PITR settings; you do not configure WAL archiving directly. Option D is wrong because binary logging is the MySQL mechanism used for replication and PITR; it does not apply to PostgreSQL, which uses WAL instead.

100

MCQhard

A company uses Cloud Spanner with a multi-region configuration (nam7) to support a global user base. They notice increased read latency for users in Europe, while write latency is acceptable. The database engineer observes that most queries are single-row reads using the primary key. What is the best approach to reduce read latency for European users?

A.Create a secondary index on the primary key column.

B.Increase the number of Spanner nodes to distribute the load.

C.Use read-only transactions with stale reads (timestamp bound) to read from local replicas.

D.Partition the database by region and use directed reads.

AnswerC

Stale reads allow Spanner to serve reads from the nearest replica, reducing latency for far-away users.

Why this answer

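Option C is correct because a read-only transaction with a stale timestamp bound (for example, an exact staleness of 15 seconds) can be served by the nearest replica that is sufficiently up to date, without a round trip to the leader region. This only helps European users if the instance configuration places a replica near them: nam7 is a North American configuration, so in practice this means a configuration that includes a European replica or, where supported, a custom configuration with an added read-only replica in Europe. With such a replica, stale reads cut read latency for European users at the cost of reading data that may be a few seconds old.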
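The mechanism can be sketched with a toy multi-version replica: a replica can answer a read at timestamp T locally once it knows it has applied every write up to T, which is why a slightly stale read timestamp lets a nearby replica answer without contacting the leader. Everything below (`ToyReplica`, `safe_time`, integer timestamps) is a hypothetical model, not the Spanner client API; in the Python client library, a stale read is requested through snapshot options such as `exact_staleness`.

```python
from typing import Dict, List, Optional, Tuple


class ToyReplica:
    """A toy multi-version replica (not the Spanner API).

    Keeps every committed version of each key plus `safe_time`, the timestamp
    up to which the replica knows it has applied all writes.
    """

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, str]]] = {}
        self.safe_time = 0

    def apply(self, key: str, value: str, commit_ts: int) -> None:
        self.versions.setdefault(key, []).append((commit_ts, value))
        self.versions[key].sort()
        self.safe_time = max(self.safe_time, commit_ts)

    def advance_safe_time(self, ts: int) -> None:
        """Record that no writes are pending at or before ts (e.g. learned from the leader)."""
        self.safe_time = max(self.safe_time, ts)

    def read_at(self, key: str, read_ts: int) -> Optional[str]:
        """Serve a read at read_ts locally, if this replica has caught up to read_ts."""
        if read_ts > self.safe_time:
            raise LookupError("replica not caught up to read_ts; the leader would be needed")
        visible = [value for ts, value in self.versions.get(key, []) if ts <= read_ts]
        return visible[-1] if visible else None


def stale_read(replica: ToyReplica, key: str, now: int, staleness: int) -> Optional[str]:
    """Exact-staleness read: return the data as it was `staleness` time units ago."""
    return replica.read_at(key, now - staleness)
```

With writes at 100 and 200 and a safe time of 250, `stale_read(replica, "user:1", now=260, staleness=15)` returns "v2" from the local replica, while a strong read at 260 raises `LookupError` because the replica has only caught up to 250.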

Exam trap

Google Cloud often tests the misconception that increasing nodes or adding indexes solves geographic latency, when the real solution is leveraging replica locality via stale reads or read-only transactions.

How to eliminate wrong answers

Option A is wrong because creating a secondary index on the primary key column is redundant: the table is already organized by its primary key, and an index does not address the cross-region latency issue. Option B is wrong because increasing the number of Spanner nodes distributes load and improves throughput, but does not reduce the physical distance or network round-trip time between European users and the leader region, so read latency remains high. Option D is wrong because directed reads only route read-only requests to particular replica locations or replica types; they do not create a replica in Europe or partition the data, and splitting the database by region would add substantial application complexity for a problem that stale reads already address.

101

MCQmedium

A company has a Cloud SQL for PostgreSQL instance that experiences high CPU usage during peak hours due to read-heavy queries. Which optimization is most effective for reducing CPU load?

A.Use connection pooling

B.Increase memory size

C.Add read replicas

D.Enable automatic storage increase

AnswerC

Read replicas distribute read queries, reducing CPU on the primary instance.

Why this answer

Adding read replicas offloads read-heavy queries from the primary Cloud SQL for PostgreSQL instance, distributing the query load and reducing CPU utilization on the primary. This is the most direct and effective optimization for read-heavy workloads because replicas handle SELECT traffic while the primary focuses on writes and critical operations.
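Cloud SQL read replicas are separate instances with their own connection endpoints, and Cloud SQL does not by itself split traffic between a primary and its replicas, so the application (or a proxy in front of it) has to send reads to the replicas. A minimal sketch of that routing, with hypothetical endpoint names and a deliberately naive statement check:

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the primary and rotate read-only queries across read replicas.

    Endpoint names are hypothetical placeholders for connection targets.
    """

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        statement = sql.strip().upper()
        # Naive classification: plain SELECTs are reads; everything else, including
        # SELECT ... FOR UPDATE (which takes row locks), goes to the primary.
        is_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Replication to Cloud SQL read replicas is asynchronous, so a read that must see a write the same request just committed should still go to the primary.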

Exam trap

Google Cloud often tests the misconception that connection pooling or memory increases are universal performance fixes, but for read-heavy CPU spikes, offloading reads to replicas is the targeted solution.

How to eliminate wrong answers

Option A is wrong because connection pooling reduces the overhead of establishing new database connections, but it does not reduce the CPU cost of executing the read-heavy queries themselves; the same number of queries still run on the same instance. Option B is wrong because increasing memory size can improve cache hit ratios and reduce disk I/O, but it does not directly lower CPU usage from query execution; CPU-bound workloads are not resolved by adding memory. Option D is wrong because automatic storage increase only prevents out-of-disk errors by expanding disk capacity; it has no effect on CPU utilization or query processing load.

102

MCQeasy

A company runs a Spanner instance with a single region configuration. They are experiencing increased latency for writes when there is a network disruption between their application and the Spanner instance. The application is deployed in the same region. What should the database engineer do to minimize write latency during such disruptions?

A.Implement client-side retry logic.

B.Enable multi-region configuration.

C.Use a compute engine instance as a proxy.

D.Increase the number of nodes.

AnswerA

Retry logic handles transient disruptions without architectural change.

Why this answer

Option A is correct because client-side retry logic with appropriate backoff is the standard way to handle transient network disruptions without requiring architectural changes. Option B (multi-region configuration) adds cost and complexity and does not help with a disruption between the application and an instance in the same region. Option D (increasing nodes) improves throughput, not latency during a disruption.

Option C (a Compute Engine proxy) adds an extra network hop, which is likely to make latency worse.
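A minimal sketch of client-side retry with capped exponential backoff and jitter, assuming a hypothetical `TransientError` that stands in for whatever retryable status your client surfaces. Google Cloud client libraries, including the Spanner ones, already include configurable retry behavior, so in practice you would tune that first; this only shows the pattern.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as an UNAVAILABLE status."""


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # random jitter avoids synchronized retries
    raise AssertionError("unreachable")
```

Only retry operations that are safe to repeat: a write whose outcome is unknown after a network error must be idempotent, or be re-run as a whole transaction, so that a retry cannot apply it twice.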

103

MCQmedium

You manage a Cloud SQL for PostgreSQL instance that handles OLTP workloads. Users in a different region report slow query response times. You notice that the database CPU utilization is below 30%, but network latency is high. What is the most cost-effective solution to reduce query latency without migrating the database?

A.Add more memory to the instance to increase cache hit ratio.

B.Increase the instance's vCPUs to handle more concurrent connections.

C.Create cross-region read replicas and route read queries to the nearest replica.

D.Migrate the database to Cloud Spanner using a live migration service.

AnswerC

Read replicas reduce the network distance for read traffic, improving latency without moving the primary database.

Why this answer

The correct answer is C because the issue is high network latency for users in a different region, not local resource contention. Creating cross-region read replicas allows read queries to be served from a replica closer to the users, reducing network round-trip time without migrating the database. This is the most cost-effective solution as it avoids expensive instance upgrades or a full migration to Cloud Spanner.

Exam trap

The trap here is that candidates often focus on scaling the instance (CPU or memory) when the symptom is high latency, but the root cause is geographic distance, not resource exhaustion.

How to eliminate wrong answers

Option A is wrong because adding memory to raise the cache hit ratio addresses local cache misses, not network round-trip time; the low CPU utilization also suggests the instance is not resource-constrained. Option B is wrong because adding vCPUs helps with CPU or connection bottlenecks, but the problem is network latency. Option D is wrong because migrating to Cloud Spanner is costly and complex, the question explicitly rules out migrating the database, and read replicas close to the users solve the latency problem far more cheaply.

104

Multi-Selectmedium

A company is designing a BigQuery data warehouse for sales analytics. They want to minimize query costs when aggregating daily sales by region and product. Which two methods are effective? (Select TWO).

Select 2 answers

A.Creating a materialized view with GROUP BY region, product, day

B.Using a view that queries the raw data with WHERE clause

C.Storing pre-aggregated results in a separate table and updating nightly

D.Creating indexes on the raw table

E.Using a clustered table on (region, product) with partition by day

AnswersA, E

Materialized views store precomputed results and are automatically refreshed, reducing query cost and time.

Why this answer

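Option A is correct because a materialized view in BigQuery pre-computes and stores the results of the GROUP BY query on region, product, and day. When the underlying data changes, the materialized view is incrementally refreshed, so queries that match the view's aggregation are served directly from the stored results, avoiding full table scans and reducing query costs (bytes processed). This is ideal for recurring aggregation patterns like daily sales summaries. Option E is also correct: partitioning by day lets queries restricted to a date range prune whole partitions, and clustering on (region, product) lets BigQuery skip blocks that don't match region or product filters, so fewer bytes are processed.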
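The cost benefit of Option A comes from not re-reading rows that have already been aggregated. The toy sketch below shows that idea for an append-only table: each refresh folds only newly appended rows into running totals keyed by (region, product, day). It is a conceptual model with hypothetical names, not how BigQuery implements materialized views, and it ignores updates and deletes, which BigQuery handles differently.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple[str, str, str, int]  # (region, product, day, amount_in_cents)


class DailySalesRollup:
    """Toy incremental GROUP BY region, product, day over an append-only table.

    Each refresh folds in only rows appended since the previous refresh;
    rows that were already aggregated are never read again.
    """

    def __init__(self) -> None:
        self.totals: Counter = Counter()
        self.rows_read = 0

    def refresh(self, new_rows: Iterable[Row]) -> None:
        for region, product, day, amount in new_rows:
            self.totals[(region, product, day)] += amount
            self.rows_read += 1
```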

Exam trap

Google Cloud often tests the distinction between a view (which is just a saved query) and a materialized view (which stores pre-computed results), leading candidates to incorrectly select Option B as a cost-saving measure.

105

MCQeasy

Based on the exhibit from Cloud Spanner Query Insights, what is the most likely performance issue?

A.High network latency

B.Full table scan

C.Inefficient join

D.No index on customer_id

AnswerD

Missing index causes a full scan of the Orders table.

Why this answer

Option D is correct because the exhibit shows a query with a filter on `customer_id` that is not indexed, forcing Cloud Spanner to perform a full table scan to find matching rows. This is the most likely performance issue, as indicated by high latency and high row scan counts in Query Insights, which directly points to a missing index on the filtered column.

Exam trap

Google Cloud often tests the distinction between a symptom (full table scan) and its root cause (missing index), tricking candidates into selecting the visible effect rather than the underlying configuration issue.

How to eliminate wrong answers

Option A is wrong because high network latency would manifest as increased client-side wait times and not as high row scan counts or CPU usage within the Spanner backend; Query Insights metrics focus on database-side execution, not network round trips. Option B is wrong because a full table scan is a symptom, not the root cause—the underlying reason for the full scan is the missing index on customer_id, making B a description of the effect rather than the most likely performance issue. Option C is wrong because an inefficient join would show high join-related metrics like rows returned from join operations or skewed distribution, but the exhibit does not indicate any join operations; the query appears to be a simple filter on a single table.

106

MCQhard

Your team uses Cloud Bigtable for a time-series data analytics platform. You observe that the write throughput has dropped significantly, and Cloud Monitoring shows that most of the CPU usage is concentrated on a few nodes. The remaining nodes have low CPU usage. The data model uses sequential timestamps as row keys, and the application writes data for many different sensors. Each sensor ID is part of the row key. What is the most effective action to resolve this hot spotting?

A.Reduce the batch size of writes to decrease the load on each node.

B.Use a different Bigtable cluster and migrate data.

C.Increase the number of nodes in the cluster to provide more CPU capacity.

D.Prepend a hash of the sensor ID to the row key to distribute writes evenly.

AnswerD

This breaks the sequential key pattern and distributes writes across all nodes, eliminating hot spotting.

Why this answer

Option D is correct: prepending a hash of the sensor ID to the row key spreads writes for different sensors across the key space, and therefore across nodes, preventing hot spotting. Option C (increasing nodes) may spread some load, but without fixing the row key design the hot spot persists. Option B (using a different cluster) doesn't address the design issue.

Option A (reducing write batch size) may reduce per-request latency but does not change the uneven distribution.
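A minimal sketch of the row key from Option D, assuming a key layout of `<hash prefix>#<sensor_id>#<timestamp>` and a 4-hex-character SHA-256 prefix (both illustrative choices, not a Bigtable requirement). Because the prefix depends only on the sensor ID, all rows for one sensor stay contiguous for per-sensor range scans while different sensors spread across the key space. Bigtable's schema design guidance also describes alternatives such as field promotion, that is, leading the key with the sensor ID itself.

```python
import hashlib


def hashed_row_key(sensor_id: str, timestamp_ms: int, prefix_len: int = 4) -> str:
    """Build a row key of the form '<hash prefix>#<sensor_id>#<timestamp>'.

    The prefix is derived from the sensor ID only, so one sensor's rows share a
    prefix, while different sensors land in different parts of the key space.
    The timestamp is zero-padded so lexicographic order matches time order.
    """
    prefix = hashlib.sha256(sensor_id.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}#{sensor_id}#{timestamp_ms:013d}"
```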

107

MCQmedium

A retail company uses Cloud Spanner to store product inventory data. The table structure is:

CREATE TABLE Inventory (
  ProductId INT64 NOT NULL,
  WarehouseId INT64 NOT NULL,
  StockLevel INT64 NOT NULL,
  LastUpdated TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true)
) PRIMARY KEY (ProductId, WarehouseId);

The application frequently runs the query:

SELECT ProductId, SUM(StockLevel) AS TotalStock FROM Inventory WHERE WarehouseId = 123 GROUP BY ProductId

The query is slow and scans many rows. The index used is:

CREATE INDEX InventoryByWarehouse ON Inventory (WarehouseId);

What is the most effective schema change to improve query performance?

A.Change the primary key to (WarehouseId, ProductId) so rows are interleaved by warehouse.

B.Create a materialized view that pre-aggregates stock by warehouse.

C.Modify the index to INCLUDE StockLevel: CREATE INDEX InventoryByWarehouse ON Inventory (WarehouseId) STORING (StockLevel).

D.Add a STORED GENERATED column for total stock per warehouse.

AnswerC

The STORING clause adds StockLevel to the index, making it a covering index for the query, so Cloud Spanner can return results from the index alone without scanning the base table.

Why this answer

Option C is correct because the query needs to read StockLevel for every row matching WarehouseId, but the existing index only covers WarehouseId, forcing a back-join to the base table. By using STORING (StockLevel), the index becomes a covering index that includes the StockLevel column, eliminating the need for the back-join and reducing the number of rows scanned to only those matching the warehouse filter.

Exam trap

The trap here is that candidates often think changing the primary key order (Option A) is the cleanest fix because it physically groups rows by warehouse, but Spanner cannot change a table's primary key in place, so that option means recreating and reloading the table; a covering index gives this query a contiguous, index-only scan with a far smaller change.

How to eliminate wrong answers

Option A is wrong because interleaving describes parent-child table relationships, not key order, and changing the primary key to (WarehouseId, ProductId) would require creating a new table and migrating the data, a far more disruptive change than adding a stored column to an index. Option B is wrong because Cloud Spanner does not offer materialized views; any pre-aggregated table would have to be maintained by the application, adding write overhead and complexity. Option D is wrong because a STORED generated column is computed from other columns in the same row; it cannot aggregate values across rows, so it cannot hold a total per warehouse.

108

MCQmedium

A retail company uses BigQuery to analyze sales data. They need to create a weekly report showing total sales per product category for the last 4 weeks, but the query is taking too long and exceeding slot resources. The sales table has over 2 billion rows and is partitioned by date. Which design change would most improve query performance and reduce slot consumption?

A.Increase the number of available slots in the reservation.

B.Cluster the table by product_category within the existing date partitions.

C.Create a materialized view that pre-aggregates sales by category and date.

D.Partition the table by product_category instead of date.

AnswerB

Clustering by product_category allows the query to skip irrelevant blocks, reducing data scanned and slot usage.

Why this answer

Option B is correct because clustering the table by product_category within the existing date partitions stores rows of the same category together, so BigQuery can use per-block metadata to skip blocks. Queries that filter on product_category then read only the blocks that can contain matching rows, which reduces the amount of data scanned and the slot consumption without requiring additional resources.
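A toy model of block pruning: if each storage block records the minimum and maximum product_category it contains, a filter on one category needs to read only the blocks whose range could include it. The block metadata format below is hypothetical, not BigQuery's internal layout; the point is that clustering makes those ranges narrow, while unclustered blocks each span nearly every category and cannot be skipped.

```python
from typing import List, Tuple

Block = Tuple[str, str, int]  # (min_category, max_category, row_count) for one block


def blocks_to_scan(blocks: List[Block], category: str) -> List[int]:
    """Return the indexes of blocks whose [min, max] category range could contain `category`."""
    return [i for i, (lo, hi, _) in enumerate(blocks) if lo <= category <= hi]
```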

Exam trap

Google Cloud often tests the misconception that adding more slots (Option A) is the primary solution for slow queries, when in reality data skipping techniques like clustering or partitioning are more cost-effective and fundamental to performance optimization in BigQuery.

How to eliminate wrong answers

Option A is wrong because increasing slots only adds parallel processing capacity; it does not reduce the amount of data scanned, so the query would still read every row in the partitions covering the last four weeks and consume slots accordingly. Option C is wrong because a materialized view pre-aggregates by category and date, but it still requires scanning the base table for updates and does not optimize the existing partitioned table's scan efficiency for the weekly report; it also incurs additional storage and maintenance costs. Option D is wrong because BigQuery can partition only on a time-unit column, ingestion time, or an integer range, so a string column like product_category cannot be a partition column; even with an integer category ID, partitioning by category would give up date pruning for the last-4-weeks filter that this report depends on.

109

Multi-Selecthard

A multinational corporation uses BigQuery to combine sales data from multiple regions. Each region stores data in separate tables with identical schemas. The BI team needs to create a unified view for a dashboard that queries data by region and product. Which TWO strategies should the data engineer implement to optimize query performance and reduce costs?

Select 2 answers

A.Partition the table by date and cluster by region and product

B.Use a wildcard table with a filter on _TABLE_SUFFIX to query only required region tables

C.Create a view with UNION ALL of all region tables

D.Create materialized views for each region

E.Store all data in a single table with region as a column

AnswersA, B

Reduces data scanned for common filter conditions.

Why this answer

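Option A is correct because partitioning the table by date and clustering by region and product allows BigQuery to use partition pruning and clustering block elimination to scan only the relevant data for queries filtered by region and product. This directly reduces the amount of data read, lowering query costs and improving performance. Clustering also sorts data within partitions, enabling efficient filtering without full scans. Option B is also correct: when the regional tables share a common name prefix and are queried through a wildcard table, a filter on _TABLE_SUFFIX restricts the query to the named tables, so only the required regions are read and billed.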

Exam trap

Google Cloud often tests the misconception that a UNION ALL view alone provides performance benefits, when in fact it does not reduce data scanned unless combined with table-level filters like _TABLE_SUFFIX or underlying partitioned/clustered tables.

110

MCQmedium

A healthcare analytics company uses Cloud Bigtable to store time-series data from medical devices. The table has a row key of 'device_id#timestamp' where timestamp is stored in reverse order (max - timestamp) so that recent data is at the top. Queries that fetch data for a specific device over a date range are very fast. However, analysts also need to run queries that aggregate data across all devices for a specific hour (e.g., count of readings between 2023-01-01 10:00 and 11:00). These queries are extremely slow because they require scanning all rows. The team must redesign the schema to support both access patterns without duplicating data unnecessarily. What is the best approach?

A.Use BigQuery to query Bigtable via an external table and run the aggregation there.

B.Increase the number of Bigtable nodes to improve scan throughput.

C.Add a secondary index on the timestamp column.

D.Create a second table with row key 'timestamp#device_id' (with timestamp in natural order) to support time-range queries.

AnswerD

This provides efficient access for the aggregation query by allowing a range scan over the timestamp.

Why this answer

Option D is correct. Creating a separate table with a row key of 'timestamp#device_id' allows efficient range scans for a given time period across all devices. Maintaining a second table keyed for a different access pattern is a common Bigtable technique.

Option C is not possible because Bigtable has no traditional secondary indexes on column values. Option A queries Bigtable from outside through BigQuery and would still scan every row for this aggregation, so it does not fix the schema. Option B (adding nodes) helps throughput but not query efficiency.
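A small sketch of the two key layouts, assuming millisecond timestamps zero-padded to 13 digits and a hypothetical `MAX_TS` used for the reversed timestamp. In the second table, one hour across all devices is a single contiguous key range, which the sorted list and `bisect` stand in for here. One caveat worth weighing: keys that start with a timestamp send all new writes to the same end of the key space, which Bigtable's schema guidance warns can create write hot spots, so a production design might add a small bucket prefix at the cost of one range scan per bucket.

```python
import bisect
from typing import List

MAX_TS = 10**13 - 1  # hypothetical ceiling for reversing 13-digit millisecond timestamps


def device_key(device_id: str, ts_ms: int) -> str:
    """Existing table: device_id#reversed_ts, so a device's newest rows sort first."""
    return f"{device_id}#{MAX_TS - ts_ms:013d}"


def time_key(device_id: str, ts_ms: int) -> str:
    """Second table: ts#device_id in natural order, so a time window is one contiguous range."""
    return f"{ts_ms:013d}#{device_id}"


def count_in_window(sorted_time_keys: List[str], start_ms: int, end_ms: int) -> int:
    """Count rows with start_ms <= ts < end_ms using one contiguous range over sorted keys."""
    lo = bisect.bisect_left(sorted_time_keys, f"{start_ms:013d}")
    hi = bisect.bisect_left(sorted_time_keys, f"{end_ms:013d}")
    return hi - lo
```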

111

MCQhard

You have a Cloud Spanner table 'Orders' with columns: OrderId, CustomerId, OrderDate, Status. You need to support a query that finds all orders for a customer in the last 30 days, sorted by OrderDate descending, with strong consistency. Using only indexes, what is the best approach?

A.Create a secondary index on (OrderDate) only

B.Create a secondary index on (CustomerId, OrderDate)

C.Use a manual table scan with filter

D.Create a secondary index on (CustomerId, OrderDate DESC) with INCLUDE (OrderId, Status)

AnswerD

Index covers the query completely, providing efficient ordered retrieval.

Why this answer

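Option D is correct: a secondary index on (CustomerId, OrderDate DESC) that also stores OrderId and Status lets the query be served entirely from the index, already in newest-first order, without reading the base table. (In Spanner DDL the clause is STORING rather than INCLUDE, and the table's primary key columns are part of every index automatically.) Option B is close, but without the stored columns each match needs a back-join to the base table for the non-key columns. Option A indexes only OrderDate, so it cannot narrow the read to one customer.

Option C is not an index-based solution.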

112

MCQeasy

A BI analyst wants to create a report that displays total revenue by product category and month, with ability to drill down to individual products. Which schema design supports this in BigQuery?

A.Denormalized table with repeated fields

B.Single wide table with all dimensions and measures

C.Star schema with fact table and dimension tables

D.Snowflake schema with normalized dimensions

AnswerC

Star schema is optimized for BI: fact table stores measures, dimensions store attributes, enabling flexible aggregation and drill-down.

Why this answer

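Option C is correct because a star schema with a fact table and dimension tables supports aggregation by category and month and drill-down to individual products through joins to the product dimension. Option D (snowflake schema) over-normalizes the dimensions for BigQuery, adding joins without benefit. Option B (a single wide table) repeats every dimension attribute on every fact row, which makes it harder to maintain when attributes change.

Option A (a denormalized table with repeated fields) is not suitable for this kind of drill-down.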
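A small in-memory sketch of how a star schema serves this report: revenue lives in the fact rows, the product dimension supplies category and product name, and the same join answers both the summary and the drill-down. The tables and values are hypothetical; in BigQuery these would be SQL joins and GROUP BYs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical product dimension: product_id -> (product_name, category)
PRODUCTS: Dict[int, Tuple[str, str]] = {
    1: ("Espresso machine", "Kitchen"),
    2: ("Kettle", "Kitchen"),
    3: ("Desk lamp", "Home office"),
}

# Hypothetical fact rows: (product_id, month, revenue)
SALES: List[Tuple[int, str, int]] = [
    (1, "2024-01", 300),
    (2, "2024-01", 40),
    (3, "2024-01", 60),
    (2, "2024-02", 80),
]


def revenue_by_category_month(sales, products) -> Dict[Tuple[str, str], int]:
    """Summary level: join each fact row to its product and group by (category, month)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for product_id, month, revenue in sales:
        _, category = products[product_id]  # fact -> dimension join
        totals[(category, month)] += revenue
    return dict(totals)


def drill_down(sales, products, category: str, month: str) -> Dict[str, int]:
    """Detail level: revenue per product within one (category, month) cell."""
    totals: Dict[str, int] = defaultdict(int)
    for product_id, m, revenue in sales:
        name, cat = products[product_id]
        if cat == category and m == month:
            totals[name] += revenue
    return dict(totals)
```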

113

MCQmedium

Your application uses Firestore for real-time updates. You notice increasing read latency during peak hours. The database is in Native mode with a single-location (us-central1). After reviewing metrics, you see that the number of document reads has not changed significantly, but the database size has grown. What is the most likely cause and solution?

A.Enable multi-region replication to distribute read traffic.

B.The database needs to be defragmented periodically; run a compaction command.

C.Migrate the database to Datastore mode for better performance.

D.Review and create composite indexes for common query patterns.

AnswerD

As data grows, queries that Firestore answers by merging single-field indexes can slow down; a matching composite index avoids that work.

Why this answer

The correct answer is D. Firestore serves every query from an index and does not fall back to scanning the whole collection: a query that needs a composite index that doesn't exist fails with an error instead. However, some equality queries can be answered by merging several single-field indexes, and that merge can do more work as the collection grows even though the number of documents returned stays the same. That matches the symptom here: unchanged read counts but rising latency as the database grows. Reviewing common query patterns and creating composite indexes for them lets Firestore read matching documents directly from one index.

Exam trap

Google Cloud often tests the misconception that database growth always calls for scaling or replication, when the real lever is often index design: composite indexes tailored to common queries instead of relying only on Firestore's automatic single-field indexes.

How to eliminate wrong answers

Option A is wrong because multi-region replication improves availability and can help geographically distributed readers, but the read count hasn't changed and the problem is query efficiency, not geographic distribution. Option B is wrong because Firestore is a fully managed service with no defragmentation or compaction command to run. Option C is wrong because Datastore mode is aimed at server-side workloads rather than being inherently faster; switching would not resolve latency caused by index design, and a database's mode cannot be switched in place once it contains data.

114

MCQeasy

A startup is building a BI stack on Google Cloud. They have moderate data volumes and need to run ad-hoc analytical queries and real-time dashboards. Which Google Cloud database service is most appropriate for this workload?

A.BigQuery

B.Cloud Spanner

C.Firestore

D.Cloud SQL

AnswerA

BigQuery is purpose-built for analytical queries and BI.

Why this answer

BigQuery is a serverless, highly scalable data warehouse designed for analytical queries and real-time dashboards. It supports ad-hoc SQL queries on large datasets with fast execution via its columnar storage and distributed query engine, making it ideal for BI workloads with moderate data volumes.

Exam trap

The trap here is confusing transactional databases (Cloud Spanner, Cloud SQL) or NoSQL databases (Firestore) with analytical data warehouses, leading candidates to pick a familiar OLTP service instead of recognizing BigQuery's specific suitability for ad-hoc analytics and BI dashboards.

How to eliminate wrong answers

Option B is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database optimized for transactional (OLTP) workloads, not ad-hoc analytical queries or real-time dashboards. Option C is wrong because Firestore is a NoSQL document database designed for mobile and web app real-time synchronization, not for complex analytical SQL queries or BI dashboards. Option D is wrong because Cloud SQL is a managed relational database for traditional OLTP workloads (e.g., MySQL, PostgreSQL) and lacks the columnar storage and massive parallelism needed for efficient ad-hoc analytics on moderate data volumes.


115

MCQeasy

A startup is using Cloud SQL for PostgreSQL and wants to minimize downtime during maintenance. The application can tolerate a few minutes of read-only mode. Which configuration should they use?

A.Use a read replica and promote it during maintenance.

B.Enable automatic storage increase.

C.Configure a high availability (HA) instance with regional failover.

D.Schedule maintenance during off-peak hours only.

AnswerA

This allows reads to continue and writes to be redirected with minimal disruption.

Why this answer

Option A is correct because a read replica keeps serving reads while the primary is unavailable, and promoting it turns it into a standalone read-write instance that the application can switch to. Because the application tolerates a few minutes of read-only mode, the window between the primary going offline and the promoted replica accepting writes is acceptable. Promotion is irreversible: the promoted instance no longer replicates from the old primary, so the replication topology must be rebuilt afterward.

Exam trap

The trap here is that candidates often confuse high availability (HA) failover with read replica promotion, assuming HA provides zero downtime, but HA still incurs a brief failover delay and does not allow the application to remain in read-only mode during maintenance.

How to eliminate wrong answers

Option B is wrong because automatic storage increase only prevents out-of-disk errors, not downtime during maintenance; it does not provide any mechanism to switch traffic away from the instance being maintained. Option C is wrong because configuring a high availability (HA) instance with regional failover still requires a brief period of downtime during the failover process, and the application's tolerance for read-only mode is better served by a read replica that can be promoted independently. Option D is wrong because scheduling maintenance during off-peak hours only reduces the impact of downtime but does not eliminate it; the application still experiences downtime during the maintenance window, which the read replica approach avoids.


116

MCQmedium

A data analyst reports that a BI dashboard query on BigQuery is taking over 30 seconds to execute. The table is partitioned by date and clustered by customer_id. The query filters on a specific date range and aggregates sales by customer. What is the most likely cause of the slow performance?

A.The query does not include a filter on the clustering column, so clustering provides no benefit.

B.The query uses a LEFT JOIN that requires a broadcast join, increasing network overhead.

C.The query filters on a date column that is not the partition column, causing a full table scan.

D.The table does not have a primary key, so BigQuery cannot use index scans.

AnswerC

Partition pruning only works when the filter is on the partition column; otherwise, all partitions are scanned.

Why this answer

Option C is correct. BigQuery prunes partitions only when the query filters on the partitioning column itself. "Partitioned by date" and "filters on a date range" do not guarantee that both refer to the same column: if the table is partitioned on one date column (for example, an order date) but the dashboard filters on another, BigQuery cannot skip any partitions and scans the whole table, which explains a 30-second query. Clustering on customer_id does not help either, because the query does not filter on customer_id, but that alone would not cause a full scan while partition pruning works.

Exam trap

The trap here is that candidates read "partitioned by date" and "filters on a date range" and assume partition pruning must be happening. Pruning applies only when the filter is on the partitioning column itself; a filter on a different date column reads every partition.

How to eliminate wrong answers

Option A is wrong because clustering provides benefits only when the query filters on the clustering column; without a filter on customer_id, BigQuery cannot prune clusters, but the query still benefits from partition pruning on date, so the primary performance issue is not clustering. Option B is wrong because the question does not mention any JOIN operation, and a broadcast join would only occur if a large table is joined with a small table, which is not indicated in the scenario. Option D is wrong because BigQuery does not use indexes or primary keys; it uses columnar storage and partitioning/clustering for performance, so the absence of a primary key is irrelevant.
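
The pruning rule above can be modeled in a few lines. This is a toy Python sketch under stated assumptions, not BigQuery's engine: the table layout, the column names `order_date` (partitioning column) and `ship_date` (ordinary column), and the row counts are all hypothetical. It counts the rows read for the same 7-day range filtered on each column.

```python
from datetime import date, timedelta

def build_table(days=365, rows_per_day=100):
    """Toy table: one partition per order_date value.

    Each row is (order_date, ship_date); order_date is the hypothetical
    partitioning column and ship_date an ordinary column.
    """
    start = date(2024, 1, 1)
    table = {}
    for offset in range(days):
        order_day = start + timedelta(days=offset)
        ship_day = order_day + timedelta(days=2)
        table[order_day] = [(order_day, ship_day) for _ in range(rows_per_day)]
    return table

def rows_scanned(table, column, lo, hi):
    """Rows read to evaluate `lo <= column <= hi`.

    A filter on the partitioning column lets whole partitions be skipped;
    a filter on any other column forces every partition to be read.
    """
    if column == "order_date":
        partitions = [rows for key, rows in table.items() if lo <= key <= hi]
    else:
        partitions = list(table.values())
    return sum(len(rows) for rows in partitions)
```

A 7-day filter on `order_date` touches 7 partitions (700 rows), while the same range on `ship_date` reads all 36,500 rows, which is the full-scan pattern behind option C.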


117

Multi-Selecthard

A Cloud Spanner database has a table with a primary key (UserId, Timestamp). Queries that filter by Timestamp range for a specific UserId are fast, but queries that filter only by Timestamp range across all users are slow. Which TWO improvements would help?

Select 2 answers

A.Use a leading column of the primary key that supports range scans.

B.Use a hash prefix on UserId.

C.Create an interleaved table structure.

D.Add a secondary index on Timestamp.

E.Partition the table by Timestamp.

AnswersA, D

Redesigning the primary key with Timestamp as the first column allows efficient range scans across users.

Why this answer

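Option A is correct because in Cloud Spanner the primary key order determines how rows are sorted and stored. With Timestamp as the leading key column (for example, (Timestamp, UserId)), a Timestamp range becomes one contiguous scan instead of a scan across every user's rows. Option D is correct for the same reason without redesigning the table: a secondary index on Timestamp stores its entries in Timestamp order, so the range can be read from the index. Option B does not help because a hash prefix spreads UserId values but leaves Timestamp ranges scattered, and option C interleaving co-locates parent and child rows without ordering rows across users by time.

Both correct designs share a caveat: a key or index led by a monotonically increasing timestamp sends new writes to the same split and can create a hotspot, which Spanner's schema design guidance recommends avoiding (for example, by prefixing a shard value).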

Exam trap

Google Cloud often tests the misconception that adding a secondary index is always the best solution, but here both a leading key column change and a secondary index are valid; the trap is that candidates might think partitioning (Option E) is supported in Spanner when it is not.
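
Why key order matters can be sketched with sorted Python tuples and binary search. This is an illustrative model, not Spanner's storage layer: the rows are hypothetical (user_id, ts) pairs sorted like a (UserId, Timestamp) primary key, and `ts_index` plays the role of a secondary index on Timestamp.

```python
import bisect

# Hypothetical rows sorted like a (UserId, Timestamp) primary key.
rows = sorted((user, ts) for user in range(100) for ts in range(0, 1000, 10))

def scan_by_time_only(rows, t_lo, t_hi):
    """Key order is user first, so a time-only filter must inspect every row."""
    hits = [(user, ts) for user, ts in rows if t_lo <= ts <= t_hi]
    return hits, len(rows)  # (matches, rows examined)

# A secondary index on Timestamp: (ts, user) entries kept in timestamp order.
ts_index = sorted((ts, user) for user, ts in rows)

def scan_with_index(index, t_lo, t_hi):
    """Binary-search to the first ts >= t_lo, then read one contiguous range."""
    start = bisect.bisect_left(index, (t_lo,))
    stop = bisect.bisect_right(index, (t_hi, float("inf")))
    hits = index[start:stop]
    return hits, len(hits)  # (matches, entries examined)
```

For a window covering 10 timestamps, the key-ordered table examines all 10,000 rows while the index reads only the 1,000 matching entries.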


118

MCQeasy

A data analyst needs to create a reporting table that aggregates sales data by month. They want to ensure the table is optimized for querying by month and product category. Which table design best supports this?

A.Use a table with clustering on product_category only.

B.Use a flat table with no partitioning.

C.Use a view that selects month and product_category.

D.Partition by month and cluster by product_category.

AnswerD

Partitioning prunes months; clustering filters categories.

Why this answer

Option D is correct because partitioning by month physically separates data into monthly segments, allowing query pruning to skip irrelevant partitions when filtering by month. Clustering by product_category within each partition co-locates rows with the same category, reducing the amount of data scanned for queries that filter on both month and category. This design optimizes both I/O and scan efficiency for the described workload.

Exam trap

The trap here is that candidates often confuse a view with a materialized view or assume that any SQL object can improve performance without physical data reorganization, leading them to select Option C despite views having no storage or indexing capabilities.

How to eliminate wrong answers

Option A is wrong because clustering only on product_category without partitioning does not provide the month-level data isolation needed for efficient monthly queries; all months remain in the same storage unit, forcing full scans for any month filter. Option B is wrong because a flat table with no partitioning or clustering offers no data skipping or pruning, leading to full table scans on every query, which is highly inefficient for aggregated reporting. Option C is wrong because a view is just a stored query definition and does not physically reorganize or partition data; it cannot improve query performance on its own and still requires scanning the underlying table.
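
A minimal sketch of how partitioning and clustering combine, assuming a simplified storage model: rows are grouped by month (partitions), sorted by category within each month (clustering), and cut into fixed-size blocks that record their minimum and maximum category. `BLOCK_SIZE`, the row layout, and the data are hypothetical; BigQuery's real block metadata is internal.

```python
from collections import defaultdict

BLOCK_SIZE = 50  # hypothetical rows per storage block

def build(rows):
    """Lay out (month, category, amount) rows as {month: [block, ...]}.

    Each month is one partition; rows inside it are sorted by category
    (clustering) and cut into blocks of (min_category, max_category, rows).
    """
    by_month = defaultdict(list)
    for row in rows:
        by_month[row[0]].append(row)
    layout = {}
    for month, month_rows in by_month.items():
        month_rows.sort(key=lambda r: r[1])
        layout[month] = [
            (chunk[0][1], chunk[-1][1], chunk)
            for chunk in (month_rows[i:i + BLOCK_SIZE]
                          for i in range(0, len(month_rows), BLOCK_SIZE))
        ]
    return layout

def query(layout, month, category):
    """Return (total amount, rows read) for one month and one category."""
    total = 0
    rows_read = 0
    for low, high, chunk in layout.get(month, []):  # partition pruning
        if not low <= category <= high:
            continue  # block skipped using clustering metadata
        rows_read += len(chunk)
        total += sum(amount for _, cat, amount in chunk if cat == category)
    return total, rows_read
```

With 12 months of 1,000 rows each spread over 10 categories, a query for one month and one category reads 100 rows instead of 12,000.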


119

MCQmedium

Refer to the exhibit. A developer creates these tables and notices that queries joining Users and Orders on UserId are slow. What is the most likely cause?

A.The primary key of Orders should include UserId as a prefix for co-location.

B.The foreign key constraint is missing, causing full table scans.

C.Tables are not interleaved, so parent and child rows may be in different splits.

D.The foreign key reference should be on the parent table.

AnswerC

Interleaving is required to guarantee co-location. Without it, joins may be distributed.

Why this answer

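Option C is correct because, without interleaving, Cloud Spanner may store a user's row and that user's orders in different splits, so the join becomes a distributed operation. Option A is wrong because the exhibit's Orders primary key is on OrderId alone; prefixing it with UserId is a prerequisite for interleaving, but a key prefix by itself does not store child rows alongside their parent. Option B is wrong because the exhibit already defines a foreign key, and a foreign key enforces referential integrity rather than physical co-location. Option D is wrong because the foreign key is correctly defined on the child table (Orders), referencing the parent.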
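
Co-location through interleaving can be pictured as key ordering. In this hypothetical Python model, a Users row is keyed `(user_id,)` and an interleaved Orders row is keyed `(user_id, order_id)`; sorting the shared key space places each user's orders right after that user's row. It is a conceptual sketch, not Spanner's actual key encoding.

```python
# Hypothetical keys: a Users row is keyed (user_id,) and an interleaved
# Orders row is keyed (user_id, order_id), extending its parent's key.
users = [(3,), (1,), (2,)]
orders = [(1, 101), (3, 301), (2, 201), (1, 102), (3, 302)]

# Interleaved: parents and children share one sorted key space, and a
# tuple sorts right before any longer tuple that extends it, so each
# user's orders follow that user's row directly.
interleaved = sorted(users + orders)

# Not interleaved: Orders keyed by order_id alone forms a separate key
# space whose order says nothing about where each user's row is stored.
orders_by_id = sorted((order_id, user_id) for user_id, order_id in orders)
```

`interleaved` comes out as user 1, its orders 101 and 102, user 2, order 201, and so on, which is the locality a parent-child join benefits from.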


120

MCQmedium

A user runs the query above on a large table and receives an out-of-memory error. What is the most likely cause?

A.The table is a materialized view that cannot handle ORDER BY

B.The query uses COUNT(*) without a GROUP BY

C.The ORDER BY clause forces sorting of the entire dataset in memory on a single worker

D.The table is not partitioned, so full table scan causes memory overflow

AnswerC

Sorting large datasets requires memory proportional to the data size; if it exceeds available memory, the query fails.

Why this answer

Option C is correct. In a distributed engine such as BigQuery, a final ORDER BY over a large result without a LIMIT has to be completed on a single worker, and that worker can run out of memory when the result is large. Filtering or aggregating first, or adding a LIMIT so each worker forwards only its top rows, keeps the final sort small.

Exam trap

Google Cloud often tests the misconception that any full table scan causes memory errors, but the real trap is that ORDER BY is a blocking operation that centralizes data, making it the primary culprit for out-of-memory errors in distributed systems.

How to eliminate wrong answers

Option A is wrong because querying a materialized view with ORDER BY is allowed; nothing in the scenario points to a materialized-view limitation, and the error comes from the sort itself. Option B is wrong because COUNT(*) without GROUP BY returns a single value that can be computed in parallel, so it does not exhaust memory. Option D is wrong because a full table scan is spread across many workers and is not in itself a memory problem; the overflow comes from the ORDER BY funneling rows to a single worker for the final sort.
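
The effect of adding a LIMIT can be modeled in Python. This simplified sketch, not BigQuery's execution plan, treats each inner list as one worker's share of the rows and compares how many rows end up on the single final worker with and without pushing the top-N step down to every worker.

```python
import heapq

def full_sort_then_limit(workers, n):
    """ORDER BY without a LIMIT pushed down: one worker receives every row,
    sorts it all, and only then keeps the first n."""
    gathered = [row for rows in workers for row in rows]
    return sorted(gathered, reverse=True)[:n], len(gathered)

def top_n_per_worker_then_merge(workers, n):
    """ORDER BY ... LIMIT n: each worker forwards only its own top n rows,
    so the final worker merges at most n * len(workers) rows."""
    gathered = [row for rows in workers for row in heapq.nlargest(n, rows)]
    return heapq.nlargest(n, gathered), len(gathered)
```

Both functions return (top rows, rows held by the final worker). With 8 workers of 10,000 rows, the plain sort gathers 80,000 rows on one worker while the LIMIT version merges only 80, and both return the same top 10.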


121

MCQmedium

A company is designing a Cloud Spanner database for a global financial application. They need to minimize latency for customer queries while handling write-heavy workloads. The current design uses a single-region instance in us-central1. Which approach should they take to reduce latency for users in Europe?

A.Reconfigure the instance to a multi-region configuration with default leader in us-central1.

B.Reconfigure the instance to a multi-region configuration with default leader in europe-west1.

C.Add a read-only regional replica in europe-west1.

D.Increase the number of nodes to improve throughput and automatically reduce latency.

AnswerB

Multi-region configuration places a leader in Europe, reducing write and strong read latency for European users.

Why this answer

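Option B is correct because a multi-region configuration with the default leader in europe-west1 places the leader region, where writes are committed and strong reads are coordinated, in Europe, cutting write and strong-read latency for European users. The trade-off is higher write latency for users far from the leader, so the leader belongs where most writes originate. Option A is wrong because keeping the default leader in us-central1 means European writes still make a transatlantic round trip. Option C is wrong because read-only replicas neither accept writes nor vote on them: every write in this write-heavy workload still goes to the leader in us-central1, and a strong read at a read-only replica must still check with the leader (only stale reads are served locally). Option D is wrong because adding nodes raises throughput but does not shorten cross-continental network latency.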


122

MCQeasy

What should be adjusted to improve performance and resolve the connection error?

A.Disable automatic failover to reduce overhead

B.Change the instance type to a higher memory machine

C.Increase max_connections and implement connection pooling

D.Increase the disk size to handle more I/O

AnswerC

The error indicates that the connection limit is reached; increasing it together with pooling addresses both the limit and performance.

Why this answer

The connection error is likely due to the database reaching its maximum connection limit, which causes new connection attempts to be rejected. Increasing `max_connections` allows more concurrent client connections, while implementing connection pooling (e.g., using PgBouncer or similar) reuses existing connections efficiently, reducing overhead and preventing connection exhaustion. This directly resolves the error without requiring hardware changes.

Exam trap

Google Cloud often tests the misconception that connection errors are hardware-related (memory or disk), when in fact they are typically caused by exceeding the configured connection limit, which is a software configuration parameter.

How to eliminate wrong answers

Option A is wrong because disabling automatic failover does not address connection limits; failover is a high-availability feature that keeps the database available during a failure, not a performance tuning parameter. Option B is wrong because a higher-memory machine may speed up queries, and on Cloud SQL for PostgreSQL it can raise the default `max_connections`, but without pooling the application can exhaust the new limit just as quickly. Option D is wrong because disk size governs storage capacity and I/O throughput, while this error comes from the server-side connection limit.
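
Connection pooling can be sketched with Python's `queue` module. This is a minimal illustration, not PgBouncer or any driver's pool: `connect` is a hypothetical factory standing in for a real driver call, and the pool simply caps how many server connections one client ever opens.

```python
import queue

class ConnectionPool:
    """Fixed-size pool sketch: opens `size` connections up front and hands
    them out, so the server never sees more than `size` from this client.

    `connect` is a hypothetical zero-argument factory standing in for a
    real driver's connect call.
    """

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        """Borrow a connection, waiting up to `timeout` seconds when all are
        busy; raises queue.Empty if none frees up in time."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a borrowed connection so later callers can reuse it."""
        self._idle.put(conn)
```

Callers that arrive while every connection is busy wait instead of opening new ones, so this client never pushes the server past `max_connections`. In production, a maintained pooler such as PgBouncer or the driver's built-in pool is the usual choice.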


123

MCQhard

An organization has a Cloud SQL for MySQL instance that stores sensitive data. They need to encrypt the data at rest using a customer-managed encryption key (CMEK). The database engineer creates a Cloud KMS key ring and key, and configures the Cloud SQL instance to use CMEK. However, after 30 days, the instance becomes inaccessible and the error message indicates the CMEK key is disabled. What is the most likely cause?

A.The CMEK key was disabled by the key administrator for security reasons.

B.The Cloud SQL instance exceeded the number of times it can access the CMEK key.

C.The CMEK key was rotated, and the old key version was disabled.

D.The CMEK key expired, causing automatic disablement.

AnswerA

If the key is disabled, Cloud SQL cannot access it, causing the instance to become unavailable.

Why this answer

Option A is correct because the error message indicates the CMEK key is disabled, and the most common cause in a controlled environment is that the key administrator intentionally disabled the key. Cloud SQL for MySQL requires the CMEK key to be enabled to encrypt and decrypt data at rest; if the key is disabled, the instance cannot access its data and becomes inaccessible. This aligns with the scenario where no other configuration changes are mentioned, making administrative action the likely cause.

Exam trap

The trap here is that candidates may confuse key rotation with key disablement, assuming that rotating a key automatically disables the old version, but in Cloud KMS, old key versions remain enabled unless explicitly disabled.

How to eliminate wrong answers

Option B is wrong because Cloud SQL has no limit on how many times it may use a CMEK key; Cloud KMS request quotas can throttle API calls, but exceeding them never disables a key. Option C is wrong because rotation creates a new primary key version and leaves older versions enabled; disabling an old version is a separate, manual action. Option D is wrong because Cloud KMS keys do not expire: an optional rotation schedule (which could be 30 days, matching the timeline in the question) creates new versions but does not disable existing ones, so the key was disabled only through an explicit action.


124

Multi-Selecteasy

Which THREE metrics from Cloud Monitoring are important for monitoring Cloud Bigtable performance?

Select 3 answers

A.Storage utilization

B.CPU utilization

C.Latency (P99)

D.Request count

E.Disk usage

AnswersB, C, D

High CPU indicates nodes are busy processing requests.

Why this answer

CPU utilization (option B) directly reflects the processing load on the cluster's nodes; when it runs high, the cluster is near its throughput limit and latency rises, which makes it the main signal for adding nodes or fixing access patterns. P99 latency (option C) captures the slowest 1% of requests, which averages hide, and request count (option D) shows the load driving the other two. Read together, the three show whether the cluster is keeping up with its workload.

Exam trap

The trap here is treating storage metrics (disk usage or storage utilization) as performance metrics. Storage utilization still matters for capacity planning, because each Bigtable node supports a limited amount of storage, but CPU, latency, and request count are the direct indicators of request-serving performance.
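
To see why a tail percentile such as P99 matters, here is the nearest-rank percentile computed over a list of hypothetical latency samples in milliseconds. Cloud Monitoring typically derives latency percentiles from distribution buckets rather than raw samples, so treat this as a sketch of the concept only.

```python
def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile for an integer p in 1..100: the smallest
    sample such that at least p percent of all samples are <= it."""
    if not samples or not 1 <= p <= 100:
        raise ValueError("need samples and 1 <= p <= 100")
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # integer ceil(p * n / 100), 1-based
    return ordered[rank - 1]
```

With 985 requests at 5 ms and 15 at 200 ms, the mean stays under 10 ms and the median is 5 ms, but the P99 reports 200 ms.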


125

Multi-Selectmedium

Which THREE of the following SQL techniques are commonly used to improve BI query performance in BigQuery?

Select 3 answers

A.Select all columns using SELECT * to avoid missing data

B.Avoid JOINs by storing all relevant data in a single table

C.Use self-joins to compare rows within the same table

D.Apply filters in the WHERE clause as early as possible

E.Use APPROX_COUNT_DISTINCT instead of COUNT(DISTINCT) when exact counts are not needed

AnswersB, D, E

Denormalization eliminates JOIN overhead.

Why this answer

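Option B is correct because denormalizing data into a single table, ideally with nested and repeated fields, avoids JOINs that make BigQuery shuffle and redistribute data across slots, which shortens execution and lowers slot consumption. Option D is correct because filtering early shrinks the rows carried into joins and aggregations, and filters on partitioning or clustering columns also cut the bytes read. Option E is correct because APPROX_COUNT_DISTINCT estimates the count from a compact HyperLogLog++ sketch instead of tracking every distinct value, trading a small statistical error for speed and memory. Option A is wrong because SELECT * reads every column, and option C is wrong because self-joins can multiply rows; BigQuery's guidance is to prefer window functions.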

Exam trap

Google Cloud often tests the misconception that 'SELECT *' is safe for ad-hoc queries, but in BigQuery it increases bytes billed and query latency because columnar storage reads, and bills for, every column the query references; adding a LIMIT does not reduce the bytes scanned.
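
The trade-off behind APPROX_COUNT_DISTINCT can be shown with a small cardinality sketch. BigQuery uses HyperLogLog++; for brevity this Python sketch swaps in a simpler technique, the k-minimum-values (KMV) estimator, which keeps only the k smallest hash values seen and estimates the distinct count from how small the k-th one is. The function name and the default `k=256` are illustrative choices.

```python
import hashlib
import heapq

def _unit_hash(value):
    """Map a value to a pseudo-uniform float in [0, 1) via SHA-256."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def approx_count_distinct(values, k=256):
    """K-minimum-values estimate of the number of distinct values.

    Keeps only the k smallest distinct hashes; if the k-th smallest is u,
    about (k - 1) / u distinct values are needed to push it that low.
    """
    heap = []     # negated hashes: heap[0] is minus the largest kept hash
    kept = set()  # hashes currently in the heap, used to ignore repeats
    for value in values:
        h = _unit_hash(value)
        if h in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct values: the count is exact
    return round((k - 1) / -heap[0])
```

Memory stays at k hashes no matter how many rows stream past; a larger k lowers the typical relative error, which shrinks roughly as 1/sqrt(k).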


126

MCQhard

A company uses BigQuery for BI reporting. They have a materialized view that refreshes automatically to provide pre-aggregated sales data. Recently, the materialized view stopped reflecting new data inserted into the base table. The base table is a streaming buffer table with ingestion-time partitioning. What is the most likely reason?

A.The materialized view does not support streaming buffer tables.

B.The automatic refresh interval has been exceeded due to high query load.

C.The materialized view has reached the maximum number of partitions allowed.

D.The base table's schema has changed, making the materialized view incompatible.

AnswerA

Materialized views require data to be committed to storage; streaming buffer data is not yet committed.

Why this answer

In this scenario, new rows arrive through streaming inserts and sit in the streaming buffer before being committed to managed storage. Materialized views read only from committed storage, so rows still in the buffer are not reflected until they are flushed, which makes the view appear stale or seem to stop picking up new data. Ingestion-time partitioning is not the cause: it only determines how committed rows are assigned to partitions, while the streaming buffer is what holds the uncommitted rows.

Exam trap

Google Cloud often tests the misconception that materialized views automatically reflect all data in the base table, including uncommitted streaming buffer data, when in fact they only read from committed storage.

How to eliminate wrong answers

Option B is wrong because query load does not govern refresh: by default, BigQuery attempts an automatic refresh within 5 minutes of a base-table change, but no more often than every 30 minutes (adjustable with the refresh_interval_minutes option), so refresh timing could delay the view but would not stop it from reflecting new data. Option C is wrong because materialized views do not have a maximum number of partitions limit that would cause them to stop reflecting new data; partition limits apply to tables, not materialized views. Option D is wrong because schema changes to the base table would cause the materialized view to become invalid or require a manual refresh, but the question states the view stopped reflecting new data, not that it became invalid, and schema changes are not the most likely cause in this streaming buffer scenario.


127

Multi-Selecteasy

Which TWO actions can a team take immediately to resolve a Cloud SQL instance running out of storage?

Select 2 answers

A.Enable automatic storage increase

B.Increase storage capacity via gcloud

C.Delete binary log files

D.Migrate to Cloud Spanner

E.Add read replicas

AnswersA, B

Automatic storage increase adds capacity as the disk fills, preventing the same shortage from recurring.

Why this answer

Increasing storage capacity and enabling automatic storage increase directly address the storage shortage. Deleting binary logs can free space but is not recommended because it may affect point-in-time recovery. Migrating to Cloud Spanner is a long-term project, not an immediate fix. Cloud SQL storage can be increased but not decreased, so increase it deliberately.

Adding read replicas does not increase storage.


128

MCQeasy

A SQL query with multiple JOINs is returning duplicate rows. What is the most likely cause?

A.Using INNER JOIN instead of LEFT JOIN.

B.There is a one-to-many relationship between tables.

C.Missing ORDER BY clause.

D.Using UNION instead of UNION ALL.

AnswerB

One-to-many joins multiply rows from the one side.

Why this answer

When a SQL query with multiple JOINs returns duplicate rows, the most likely cause is a one-to-many relationship between the tables being joined. Each matching row in the 'many' side of the join multiplies the rows from the 'one' side, producing duplicates. This is a fundamental behavior of JOIN operations in SQL, where the result set is the Cartesian product of matching rows across the joined tables.

Exam trap

Google Cloud often tests the misconception that duplicate rows are caused by the type of JOIN (e.g., INNER vs LEFT) or by missing sorting, rather than understanding that duplicates arise from the cardinality of the relationship between the joined tables.

How to eliminate wrong answers

Option A is wrong because using INNER JOIN instead of LEFT JOIN does not inherently cause duplicates; it only filters out non-matching rows, which can actually reduce duplicates. Option C is wrong because the ORDER BY clause only affects the sorting of the result set, not the number of rows returned. Option D is wrong because UNION removes duplicates by default (acting like UNION ALL with a DISTINCT step), while UNION ALL preserves all rows including duplicates; the question is about duplicate rows from JOINs, not from set operations.
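
The row multiplication is easy to reproduce. This sketch uses Python's built-in `sqlite3` so it runs anywhere; the tables and values are hypothetical, and for this inner join SQLite's semantics match BigQuery's and PostgreSQL's. Aggregating the "many" side before joining is one common fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 5), (11, 1, 7), (12, 2, 3);
""")

# One-to-many: Ada has two orders, so her customer row appears twice.
joined = conn.execute(
    "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id ORDER BY c.name"
).fetchall()

# Aggregating orders per customer first keeps one row per customer.
per_customer = conn.execute("""
    SELECT c.name, t.total
    FROM customers c
    JOIN (SELECT customer_id, SUM(amount) AS total
          FROM orders GROUP BY customer_id) t
      ON t.customer_id = c.id
    ORDER BY c.name
""").fetchall()
```

`joined` returns Ada twice and Bo once, while `per_customer` returns one row each with their order totals.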


129

Multi-Selecteasy

Which TWO data types are supported in Cloud Spanner schemas?

Select 2 answers

A.ARRAY

B.GEOMETRY

C.TIMESTAMP

D.TEXT

E.TINYINT

AnswersA, C

ARRAY is supported for storing repeated values of a specific type.

Why this answer

Options A and C are correct: Cloud Spanner supports ARRAY and TIMESTAMP. Option B is wrong because GEOMETRY is not a Spanner data type. Option D is wrong because TEXT is not a type in Spanner's GoogleSQL dialect (use STRING). Option E is wrong because TINYINT is not a Spanner type (use INT64).


130

MCQhard

Refer to the exhibit. You receive the following query output showing bytes processed for a BigQuery query. The table is partitioned by date and clustered on country. What is the most likely reason for the high bytes processed?

A.The GROUP BY country requires sorting all rows

B.The table is not partitioned correctly

C.The date range is too wide

D.The query does not filter on the clustering column, causing full scan of selected partitions

AnswerD

Clustering on country helps only if the WHERE clause filters on country; otherwise, all rows in partitions are scanned.

Why this answer

Option D is correct: the query does not filter on the clustering column (country), so BigQuery must read all rows in the selected partitions. Clustering reduces the data scanned only when the query filters on the clustering column. Option B is incorrect because partitioning is working: only the partitions in the date range are read. Option C is incorrect because the 31-day range is small. Option A is incorrect because GROUP BY country does not force a full scan; the issue is the missing filter on the clustering column.


131

MCQhard

A team is migrating an on-premises PostgreSQL database to Cloud SQL. The current schema uses a composite primary key on columns (customer_id, order_date) in the orders table. The migration team wants to reduce the cost of secondary indexes. Which schema design change should they consider?

A.Partition the table by customer_id to reduce the number of secondary indexes needed.

B.Create a secondary index on the composite key to keep the same query performance.

C.Replace the composite primary key with a surrogate UUID primary key and add unique constraints on the original columns.

D.Use the CLUSTER command to physically reorder the table based on the composite key.

AnswerC

In engines whose secondary indexes store the primary key, a narrower surrogate key shrinks every secondary index, while unique constraints still enforce the original business rule.

Why this answer

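Option C is the intended answer. Its rationale assumes that every secondary index stores the primary key as its row locator, which is how MySQL's InnoDB engine works: there, a wide composite key such as (customer_id, order_date) is copied into every secondary index, and replacing it with a narrower surrogate key shrinks all of them, reducing storage and I/O.

PostgreSQL works differently. Its indexes point to heap tuple locations rather than to the primary key, so the width of the primary key does not change the size of other indexes. The saving also depends on column types: a UUID takes 16 bytes, which may be wider than the original key columns, and the new unique constraint on (customer_id, order_date) keeps an index as wide as the old primary key.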

Exam trap

The trap is assuming that partitioning or physically reordering the table reduces index storage costs; neither changes the width of index entries. Narrowing the primary key helps only in engines whose secondary indexes embed it.

How to eliminate wrong answers

Option A is wrong because partitioning by customer_id does not reduce the number or size of secondary indexes; it only splits the table into smaller physical segments, and each partition still needs its own indexes. Option B is wrong because creating a secondary index on the composite key duplicates the primary key index, increasing storage and write overhead without reducing cost. Option D is wrong because the CLUSTER command physically reorders rows based on an index, which can improve locality but does not reduce secondary index size or cost; it is a one-time maintenance operation, not a schema design change.


132

MCQmedium

A Firestore database is used for a social app. A collection of posts has indexes on fields `author` and `timestamp`. The query `where author == 'user1' order by timestamp desc limit 10` is performing a large number of document reads. What is the likely cause?

A.The limit is too high.

B.The query is scanning all posts.

C.Index on timestamp is not descending.

D.Missing composite index on (author, timestamp).

AnswerD

A composite index covers both the filter and sort, avoiding large scans.

Why this answer

The correct answer is D. The query filters on `author` and orders by `timestamp`, which requires a composite index on `(author, timestamp)` with `timestamp` descending; the single-field indexes on `author` and `timestamp` cannot serve the combined filter and sort. Firestore does not fall back to scanning and sorting in memory: a query that needs a missing composite index fails with an error that links to the index-creation page. A large number of document reads in this situation typically means the application is working around the error, for example by fetching all of user1's posts and sorting them on the client. With the composite index in place, the query reads only the 10 documents it returns.

Exam trap

Google Cloud often tests the misconception that single-field indexes are sufficient for queries that combine a filter with an ordering on a different field, when in fact Firestore requires a composite index for such a query to run at all.

How to eliminate wrong answers

Option A is wrong because a limit of 10 is small; Firestore bills reads per document returned, so the limit keeps reads low once the right index exists. Option B describes a symptom rather than a cause: Firestore does not scan a collection to answer a query, so heavy reads come from the application fetching more posts than it needs, which is what happens when the required composite index is missing. Option C is wrong because Firestore automatically creates single-field indexes in both ascending and descending order, so the direction of the `timestamp` index is not the problem; what is missing is the composite index on `(author, timestamp)`.
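
A composite index keeps entries sorted by (author, timestamp), so the query can seek straight to one author's newest posts. This hypothetical Python model uses a sorted list of `(author, -timestamp, doc_id)` tuples, negating the timestamp to emulate descending order, and `bisect` to seek. It illustrates the idea only; Firestore's index storage is internal.

```python
import bisect

# Hypothetical posts: (author, timestamp, doc_id).
posts = ([("user1", ts, f"p1-{ts}") for ts in range(1000)]
         + [("user2", ts, f"p2-{ts}") for ts in range(1000)])

# Composite index on (author ASC, timestamp DESC): negating the timestamp
# makes newer posts sort first within each author.
index = sorted((author, -ts, doc_id) for author, ts, doc_id in posts)

def latest_posts(index, author, limit):
    """Seek to the author's first entry, then read at most `limit`
    consecutive entries; work is independent of the total post count."""
    start = bisect.bisect_left(index, (author,))
    result = []
    for entry_author, _neg_ts, doc_id in index[start:start + limit]:
        if entry_author != author:
            break
        result.append(doc_id)
    return result
```

However many posts exist, `latest_posts` touches at most `limit` index entries, mirroring why the query reads only the documents it returns once the index exists.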


133

MCQmedium

A data analyst needs to create a rolling 30-day average of daily revenue. Which window function clause is required?

A.UNBOUNDED PRECEDING

B.RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW

C.PARTITION BY month

D.ROWS BETWEEN 29 PRECEDING AND CURRENT ROW

AnswerD

This selects exactly 30 rows (current + 29 preceding) for the rolling average.

Why this answer

Option D is correct because `ROWS BETWEEN 29 PRECEDING AND CURRENT ROW` defines a physical frame of 30 rows: the current row plus the 29 before it. When each row holds one day of revenue and no days are missing, that frame covers exactly 30 days, which is the standard way to write a rolling 30-day average. Because `ROWS` counts rows rather than dates, a missing day stretches the frame over more than 30 calendar days; if gaps are possible, fill them first or use a `RANGE` frame over a numeric day value.

Exam trap

Google Cloud often tests the distinction between `ROWS` and `RANGE` window frames, where candidates mistakenly choose `RANGE` with an interval because it sounds more intuitive for date-based rolling averages, but the exam expects the precise `ROWS` syntax for a fixed row count.

How to eliminate wrong answers

Option A is wrong because `UNBOUNDED PRECEDING` includes every row from the start of the partition, producing a cumulative average rather than a rolling 30-day one. Option B is wrong because interval-based `RANGE` frames are not portable: PostgreSQL accepts `RANGE BETWEEN INTERVAL '30 days' PRECEDING AND CURRENT ROW`, but BigQuery requires a numeric `ORDER BY` expression for `RANGE` offsets; and even where it works, 30 days preceding plus the current day covers 31 calendar days, not 30. Option C is wrong because `PARTITION BY month` restarts the calculation at each month boundary, giving a monthly average instead of a continuous rolling one.
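
Option D's frame can be tried with Python's built-in `sqlite3`, which supports window functions from SQLite 3.25 onward. The `daily_revenue` table and its values are hypothetical: one row per day, with revenue equal to the day number so the averages are easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day INTEGER PRIMARY KEY, revenue REAL)")
# Hypothetical data: days 1..40, one row per day, revenue = day number.
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [(d, float(d)) for d in range(1, 41)],
)

rows = conn.execute("""
    SELECT day,
           AVG(revenue) OVER (
               ORDER BY day
               ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
           ) AS rolling_30
    FROM daily_revenue
    ORDER BY day
""").fetchall()
rolling = dict(rows)  # day -> rolling 30-row average
```

The first 29 days average fewer than 30 rows (day 1 averages only itself). From day 30 on, each value covers exactly 30 rows: 15.5 on day 30 (days 1 to 30) and 25.5 on day 40 (days 11 to 40).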

134

MCQ · medium

Your company is deploying a new application on Google Cloud and needs to choose a database solution. The application requires strong transactional consistency, complex SQL queries, and the ability to scale horizontally for read-heavy workloads. Which database service should you recommend?

A.Cloud Spanner

B.BigQuery

C.Cloud SQL

D.Cloud Firestore

Answer: A

Cloud Spanner offers strong consistency, SQL, and horizontal scaling.

Why this answer

Cloud Spanner is the correct choice because it provides strong transactional consistency (ACID) across globally distributed nodes, supports complex SQL queries with standard SQL syntax, and offers horizontal scaling for read-heavy workloads through automatic sharding and read replicas. Unlike other options, Spanner uniquely combines these three requirements—strong consistency, SQL, and horizontal scaling—in a single managed service.

Exam trap

Google Cloud often tests the misconception that Cloud SQL can scale horizontally for read-heavy workloads, but Cloud SQL's read replicas are limited and do not provide true horizontal scaling or strong consistency across replicas, unlike Spanner's built-in distributed architecture.

How to eliminate wrong answers

Option B (BigQuery) is wrong because it is an analytical data warehouse optimized for large scans, not an OLTP database for transactional workloads. Option C (Cloud SQL) is wrong because, although it supports complex SQL and strong consistency on the primary, writes go to a single primary that scales only vertically, and its read replicas replicate asynchronously, so replica reads can be stale. Option D (Cloud Firestore) is wrong because it is a NoSQL document database without complex SQL features such as joins; it does offer strongly consistent reads and ACID transactions, but it cannot meet the complex-SQL requirement.

135

MCQ · easy

A startup is building a mobile app that requires a highly available, globally distributed, low-latency NoSQL database. The data model is key-value with occasional queries on a secondary field. Which database service should they choose?

A.Cloud SQL (PostgreSQL)

B.Cloud Firestore

C.Cloud Spanner

D.Cloud Bigtable

Answer: B

Firestore is a flexible NoSQL database with automatic multi-region replication, real-time updates, and strong consistency.

Why this answer

Cloud Firestore is a fully managed, serverless NoSQL document database that provides strong consistency, automatic replication (across regions when it is created in a multi-region location), and low-latency lookups by document key. Queries on a secondary field are served by single-field indexes that Firestore creates automatically; composite indexes are needed only for queries that combine fields. It suits mobile apps that need high availability and real-time synchronization, which matches a key-value model with occasional secondary queries.

Exam trap

The trap here is that candidates often confuse Cloud Spanner's global distribution and strong consistency with being the best fit for NoSQL key-value workloads, overlooking that Spanner is a relational database with SQL semantics and higher operational overhead, while Firestore is purpose-built for mobile app key-value and document storage with automatic secondary indexing.

How to eliminate wrong answers

Option A is wrong because Cloud SQL (PostgreSQL) is a relational database that does not natively support global distribution or low-latency key-value access; it is designed for single-region deployments and requires manual replication for multi-region setups. Option C is wrong because Cloud Spanner is a globally distributed relational database that provides strong consistency and horizontal scaling, but it is optimized for SQL workloads and complex transactions, not for simple key-value models with occasional secondary queries, and it incurs higher cost and complexity than necessary. Option D is wrong because Cloud Bigtable is a wide-column NoSQL database designed for high-throughput, low-latency analytical workloads (e.g., time-series, IoT) but does not support secondary indexes or efficient queries on non-key fields; it is not suitable for mobile app use cases requiring ad-hoc queries on secondary attributes.

136

Multi-Select · easy

Which TWO database services are fully managed and support global distribution of data for low-latency reads and writes?

Select 2 answers

A.Memorystore

B.Cloud Spanner

C.Firestore

D.Bigtable

E.Cloud SQL

Answers: B, C

Cloud Spanner provides global distribution with automatic synchronous replication across regions.

Why this answer

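Cloud Spanner is a fully managed, globally distributed relational database that provides strong consistency and horizontal scaling across regions. In multi-region configurations it replicates writes synchronously and uses TrueTime, which is backed by atomic clocks and GPS, to order transactions, enabling ACID transactions at global scale. Firestore is also fully managed; in a multi-region location it replicates data synchronously across several regions with strong consistency, so reads and writes stay available and low-latency for clients in that geography.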

Exam trap

Google Cloud often tests the distinction between 'global distribution for reads and writes' and 'global distribution for reads only'. Candidates may pick Bigtable, whose replicated clusters accept writes but are only eventually consistent, or Cloud SQL, whose cross-region replicas serve reads only; neither offers globally distributed writes with strong consistency.

137

MCQ · medium

A company has a BigQuery table partitioned by ingestion time. They want to create a BI report showing month-over-month revenue growth. To minimize query cost, what should they do?

A.Use a WHERE clause with _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 13 MONTH) and LAG

B.Use DATE_TRUNC on the ingestion timestamp without filtering partitions

C.Use LAG without a partition filter

D.Use a wildcard table with UNION ALL over monthly tables

Answer: A

This filters to only the necessary partitions for the last 13 months (to compute month-over-month) and uses LAG for growth.

Why this answer

Option A is correct because it uses a WHERE clause with _PARTITIONDATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 13 MONTH) to prune partitions, ensuring BigQuery scans only the necessary 13 months of data. The LAG function then computes month-over-month revenue growth efficiently. This minimizes query cost by reducing the amount of data processed, which is critical for ingestion-time partitioned tables.

Exam trap

Google Cloud often tests the misconception that any date function or window function alone reduces cost, but without explicit partition pruning (e.g., _PARTITIONDATE filter), BigQuery still scans all partitions, negating cost benefits.

How to eliminate wrong answers

Option B is wrong because DATE_TRUNC on the ingestion timestamp without a partition filter does not prune partitions; BigQuery would still scan all partitions, leading to higher costs. Option C is wrong because using LAG without a partition filter forces a full table scan, negating any cost savings from partitioning. Option D is wrong because using a wildcard table with UNION ALL over monthly tables is an anti-pattern; it requires manual table management and does not leverage BigQuery's native partitioning, often resulting in higher costs and complexity.
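The cost argument rests on partition pruning. The Python sketch below is a toy model, not the BigQuery engine: it treats each ingestion-date partition as a separate bucket, skips every partition older than the first day of the month 13 months back, and then computes month-over-month growth the way `LAG` does, by comparing each month with the previous row. The month-aligned cutoff (like wrapping the filter in `DATE_TRUNC(..., MONTH)`) is an assumption made here so the oldest month in the window is complete; all names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def months_back_start(today, n):
    """First day of the month n months before `today`; mirrors
    DATE_TRUNC(DATE_SUB(today, INTERVAL n MONTH), MONTH)."""
    year, month0 = divmod(today.year * 12 + (today.month - 1) - n, 12)
    return date(year, month0 + 1, 1)


def mom_growth(partitions, today, months=13):
    """partitions maps each ingestion-date partition to the revenue values in it.

    Returns ([(month, revenue, growth), ...], number_of_partitions_read).
    """
    cutoff = months_back_start(today, months)
    monthly = defaultdict(float)
    scanned = 0
    for part_date, revenues in partitions.items():
        if part_date < cutoff:
            continue  # pruned by the partition filter: never read
        scanned += 1
        monthly[part_date.replace(day=1)] += sum(revenues)

    result, prev = [], None
    for month in sorted(monthly):  # like LAG(revenue) OVER (ORDER BY month)
        revenue = monthly[month]
        growth = (revenue - prev) / prev if prev else None  # no prior month (or zero)
        result.append((month, revenue, growth))
        prev = revenue
    return result, scanned


partitions = {
    date(2023, 1, 10): [999.0],  # before the cutoff, so it is pruned
    date(2023, 5, 1): [100.0],
    date(2023, 6, 1): [60.0, 90.0],
    date(2024, 6, 15): [300.0],
}
report, scanned = mom_growth(partitions, today=date(2024, 6, 15))
print(scanned)  # 3
```

Because `LAG` looks at the previous row, a month with no data would make the next month compare against an older one; in practice every month in the window is expected to have rows.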

138

Multi-Select · hard

A company uses Firestore to power a live sports score app. Scores are updated frequently, and many clients listen to real-time updates on specific games. Which two design decisions will minimize the number of reads and reduce costs? (Choose two.)

Select 2 answers

A.Use a collection group query to listen to all games at once

B.Store an aggregate score summary document per game and listen to it

C.Use a separate document per game and listeners filter by game ID

D.Use a single document for all games with nested fields

E.Use a subcollection of periods (quarters) to spread writes

Answers: B, C

Reduces write operations and read frequency; clients get updates from a single summary document.

Why this answer

Options B and C are correct. C: giving each game its own document, with each client listening only to the game it follows, minimizes reads because a client is billed only for changes to that one document. B: keeping an aggregate score summary document per game means each score change produces a single document write, and each listener receives (and is billed for) one document read per change instead of reads across many detail documents.

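Option A (a collection group query over all games) makes every client receive every game's updates, multiplying reads. Option D (a single document for all games) makes every client re-read the shared document whenever any game changes and runs into Firestore's sustained limit of about one write per second per document. Option E (a subcollection of periods) spreads each game across more documents, so clients must listen to more of them and read counts rise.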
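The read savings can be estimated with a rough count. The Python sketch below assumes a simplified billing rule, that each active listener is charged one document read every time a document in its result set changes, and ignores initial snapshot reads, offline caching, and document size. The layout names are hypothetical labels for options C, D, and A.

```python
def listener_reads(updates_per_game, listeners_per_game, layout):
    """Estimate document reads billed to snapshot listeners over some period.

    updates_per_game[i]   -- score changes written for game i
    listeners_per_game[i] -- clients following game i
    """
    if layout == "doc_per_game":
        # Each client listens to one game document and only sees its changes.
        return sum(u * n for u, n in zip(updates_per_game, listeners_per_game))
    if layout in ("single_doc", "all_games_query"):
        # Every change reaches every client: either the shared document is
        # rewritten, or each client's query matches all game documents.
        return sum(updates_per_game) * sum(listeners_per_game)
    raise ValueError(f"unknown layout: {layout}")


updates = [10, 20]     # hypothetical score changes in two games
listeners = [100, 50]  # hypothetical audience per game
print(listener_reads(updates, listeners, "doc_per_game"))  # 2000
print(listener_reads(updates, listeners, "single_doc"))    # 4500
```

A per-game summary document (option B) fits the `doc_per_game` case: one write per change, one read per listener of that game.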

139

MCQ · medium

A Cloud Spanner application experiences high write latency on a table with a monotonically increasing primary key. Which schema change will most effectively reduce latency?

A.Convert the table to an interleaved table

B.Add a secondary index on the existing key

C.Modify the primary key to include a hash of the original key as a leading column

D.Increase the number of nodes in the instance

Answer: C

Hash prefix distributes writes uniformly across splits.

Why this answer

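Option C is correct: putting a hash of the original key first in the primary key spreads new rows across many key ranges (splits), and therefore across servers, eliminating the hotspot at the end of the key space. Option A (interleaving) co-locates child rows with their parent but leaves the monotonically increasing key in place, so new rows still go to the last split. Option B (a secondary index on the existing key) does not help and adds a second hotspot, because the index is also ordered by the increasing value. Option D adds capacity, but writes concentrated on one split cannot take advantage of the extra nodes.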
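A minimal Python sketch of the hash-prefix idea, assuming a hypothetical `ShardId` column computed as a hash of the original key modulo 16 and filled in by the application. `hashlib.sha256` stands in for whatever stable hash is used; it is an illustration, not a Spanner API.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical number of hash buckets


def sharded_key(order_id: int, shards: int = NUM_SHARDS) -> tuple[int, int]:
    """Return (ShardId, OrderId): a stable hash-bucket prefix plus the original key."""
    digest = hashlib.sha256(str(order_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shards, order_id


# Sequential IDs alone would always append at the end of the key space;
# with the prefix, consecutive inserts land in different key ranges.
keys = [sharded_key(i) for i in range(1, 1001)]
per_shard = Counter(shard for shard, _ in keys)
print(len(per_shard), "distinct prefixes; largest bucket:", max(per_shard.values()))
```

The trade-off is that a range scan over consecutive OrderId values now has to visit every shard prefix.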

140

MCQ · hard

A global gaming company uses Cloud Spanner for player profiles and game state. The schema includes a table 'PlayerStats' with a primary key (PlayerId, GameId, Timestamp). The table stores millions of rows per player. The application frequently runs a query to fetch the most recent stats for a given player across all games, using ORDER BY Timestamp DESC LIMIT 10. This query is slow, taking several seconds. The team adds a secondary index on (PlayerId, Timestamp) but still sees high CPU usage and latency. They need to redesign the schema to optimize this query without changing the application logic significantly. What should they do?

A.Migrate the PlayerStats table to Cloud Bigtable for better time-series performance.

B.Change the primary key to (PlayerId, Timestamp, GameId) and drop the secondary index.

C.Create a stored procedure that aggregates data per player and caches results.

D.Add a materialized view that pre-computes the latest stats per player.

Answer: B

This allows efficient range scans for a player’s stats ordered by time.

Why this answer

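Option B is correct. Reordering the primary key to (PlayerId, Timestamp, GameId) stores each player's rows in timestamp order, so Spanner can answer the query with a short range scan over a single PlayerId instead of reading that player's rows for every game and sorting them. This also makes the secondary index unnecessary, which removes its write and storage overhead. Declaring the key column as `Timestamp DESC` places the newest rows first, which matches ORDER BY Timestamp DESC LIMIT 10.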

Option A moves the table to a different database rather than redesigning the schema. Option C is not a schema change. Option D relies on materialized views, which Spanner does not support natively.
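The effect of key order can be modeled in Python with a sorted list of key tuples standing in for Spanner's key-ordered storage, a simplifying assumption that ignores splits and storage details. Negating the timestamp emulates `Timestamp DESC`. The function names are hypothetical.

```python
import bisect
import heapq


def scan_prefix(sorted_keys, player_id):
    """Yield keys whose first column equals player_id (a key-prefix range scan)."""
    i = bisect.bisect_left(sorted_keys, (player_id,))
    while i < len(sorted_keys) and sorted_keys[i][0] == player_id:
        yield sorted_keys[i]
        i += 1


def latest_game_first(rows, player_id, limit=10):
    """Key (PlayerId, GameId, Timestamp): read all of the player's rows, then sort."""
    keys = sorted((p, g, ts) for p, g, ts in rows)  # stands in for stored key order
    scanned = list(scan_prefix(keys, player_id))
    top = heapq.nlargest(limit, scanned, key=lambda k: k[2])
    return [ts for _, _, ts in top], len(scanned)


def latest_time_first(rows, player_id, limit=10):
    """Key (PlayerId, Timestamp DESC, GameId): newest rows come first; stop early."""
    keys = sorted((p, -ts, g) for p, g, ts in rows)  # stands in for stored key order
    newest = []
    for _, neg_ts, _ in scan_prefix(keys, player_id):
        newest.append(-neg_ts)
        if len(newest) == limit:
            break
    return newest, len(newest)


rows = [(1, g, ts) for g in range(5) for ts in range(100)]  # player 1: 5 games x 100 rows
rows += [(2, 0, ts) for ts in range(50)]                     # a second player
print(latest_game_first(rows, 1)[1])  # 500 rows read
print(latest_time_first(rows, 1)[1])  # 10 rows read
```

Both functions return the same ten timestamps; only the number of rows read differs.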

141

MCQ · medium

Refer to the exhibit. You receive an alert from this policy for a Cloud Spanner instance. Which action should you take first?

A.Identify and remove unused indexes

B.Add more nodes to the instance

C.Review the top queries by CPU usage in the Spanner console

D.Split large tables into smaller ones

Answer: B

Directly reduces per-node CPU utilization.

Why this answer

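Option B is correct because an alert on high CPU utilization means the instance is short of compute capacity, and adding nodes is the most direct way to bring per-node CPU back below the threshold. Option C is wrong as a first step: reviewing the top queries by CPU is a useful follow-up for finding inefficient queries, but the alert signals a capacity issue that needs immediate relief. Option A is wrong because removing unused indexes mainly trims write overhead and might not reduce overall CPU enough if the workload is evenly balanced.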

Option D is wrong because splitting tables is a schema change and doesn't address node capacity.

142

MCQ · hard

Refer to the exhibit. A data engineer created a materialized view on a table that receives streaming inserts. When they query the materialized view, they get this error. What is the most likely cause?

A.The materialized view definition includes a JOIN that is not supported.

B.The materialized view has reached its maximum size limit.

C.The materialized view cannot read data from the streaming buffer.

D.The base table has a schema change that the materialized view cannot adapt to.

Answer: C

Materialized views require data to be committed; streaming buffer data is not yet readable by materialized views.

Why this answer

The error occurs because materialized views in BigQuery cannot directly read data from the streaming buffer. When a base table receives streaming inserts, the data resides in the streaming buffer for up to 90 minutes before being committed to storage. Materialized views only reflect committed data, so querying them during this window returns an error indicating that the view cannot access the streaming buffer.

Exam trap

Google Cloud often tests the misconception that materialized views can access all data in the base table immediately, including uncommitted streaming data, when in reality they only reflect committed data and cannot read from the streaming buffer.

How to eliminate wrong answers

Option A is wrong because BigQuery materialized views support INNER JOINs, and a definition that uses an unsupported construct is rejected when the view is created, not when it is later queried. Option B is wrong because BigQuery materialized views do not have a fixed maximum size limit; their storage grows with the results they hold. Option D is wrong because additive schema changes to the base table, such as adding a column, do not affect a materialized view; only changes that remove or alter a column the view references invalidate it, and nothing in the scenario points to such a change.

143

MCQ · easy

A company runs a BigQuery data warehouse. They notice that query performance has degraded over time. The data is loaded daily from Cloud Storage using batch loads. Which action is most likely to improve query performance?

A.Partition and cluster tables based on common query filters.

B.Increase the number of slots in the reservation.

C.Create materialized views for all frequent queries.

D.Migrate the data to Cloud SQL for better performance.

Answer: A

Partitioning and clustering reduce data scanned, improving performance.

Why this answer

Partitioning and clustering tables on the columns used in common query filters directly reduces the amount of data scanned per query. Partitioning splits a table into segments (for example, one per day), and clustering sorts the data inside them by the clustered columns, so BigQuery can skip entire partitions and storage blocks that cannot match a filter. That lowers I/O and improves performance without adding compute capacity.

Exam trap

Google Cloud often tests the misconception that adding more compute resources (slots) is the default fix for slow queries, when in reality data organization techniques like partitioning and clustering are the first-line optimization for scan-heavy workloads.

How to eliminate wrong answers

Option B is wrong because increasing slot count only addresses concurrency and resource contention, not the root cause of performance degradation from growing data volumes and unoptimized table structures. Option C is wrong because materialized views add storage and maintenance overhead, and while they can speed up specific queries, they do not fix the underlying issue of full table scans on the base tables. Option D is wrong because Cloud SQL is a relational OLTP database not designed for analytical workloads; migrating there would likely worsen performance and increase latency for large-scale aggregation queries.
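Clustering pays off because sorted data lets the engine skip storage blocks whose value ranges cannot match the filter. The Python sketch below is a simplified model of that idea using per-block min/max metadata; it is not BigQuery's actual storage format, and the block size and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    min_key: int
    max_key: int
    rows: int


def build_blocks(keys, rows_per_block):
    """Split keys into fixed-size blocks, recording each block's min/max."""
    blocks = []
    for i in range(0, len(keys), rows_per_block):
        chunk = keys[i:i + rows_per_block]
        blocks.append(Block(min(chunk), max(chunk), len(chunk)))
    return blocks


def rows_scanned(blocks, value):
    """Rows read for WHERE key = value: only blocks whose range could match."""
    return sum(b.rows for b in blocks if b.min_key <= value <= b.max_key)


customer_ids = [i % 10 for i in range(1000)]        # arrival order: customers interleaved
unclustered = build_blocks(customer_ids, 100)
clustered = build_blocks(sorted(customer_ids), 100)  # sorted by the cluster column
print(rows_scanned(unclustered, 3), rows_scanned(clustered, 3))  # 1000 100
```

In the unclustered layout every block spans customers 0 to 9, so nothing can be skipped; after sorting, only one block can contain customer 3.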

144

MCQ · easy

A data warehouse in BigQuery stores event logs with nested and repeated fields (e.g., page views within a session). Which schema type is optimal for storing this data?

A.Use RECORD type columns for each nested level

B.Normalize into separate tables and join

C.Use ARRAY<STRUCT<...>> columns for nested repeated data

D.Store as JSON strings and parse at query time

Answer: C

Arrays of structs are the native way to represent nested repeated data in BigQuery.

Why this answer

Option C is correct: ARRAY<STRUCT<...>> stores nested repeated data natively in BigQuery, so each session row carries its own page views and can be queried without joins. Option B (separate tables) requires costly joins. Option D (JSON strings) gives up schema enforcement and forces parsing at query time.

Option A (RECORD type) uses the legacy name for STRUCT; a RECORD column per nested level holds only one value unless its mode is REPEATED, which is the same thing as ARRAY<STRUCT<...>>.
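A rough Python analogue of the `ARRAY<STRUCT<...>>` layout: each session row carries its own list of page-view structs, and a helper flattens them the way `FROM sessions AS s, UNNEST(s.page_views) AS pv` would. The field names (`session_id`, `page_views`, `url`, `seconds`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageView:  # STRUCT<url STRING, seconds INT64>
    url: str
    seconds: int


@dataclass
class Session:  # one row; page_views plays the role of ARRAY<STRUCT<...>>
    session_id: str
    page_views: list[PageView] = field(default_factory=list)


def unnest_page_views(sessions):
    """Flatten like: SELECT s.session_id, pv.url, pv.seconds
    FROM sessions AS s, UNNEST(s.page_views) AS pv"""
    return [(s.session_id, pv.url, pv.seconds) for s in sessions for pv in s.page_views]


sessions = [
    Session("s1", [PageView("/home", 5), PageView("/cart", 12)]),
    Session("s2", []),  # the comma (CROSS JOIN) form drops rows with an empty array
]
print(unnest_page_views(sessions))
```

Keeping page views inside the session row is what removes the join that a normalized design would need.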

145

Drag & Drop · medium

Arrange the steps to import data from Cloud Storage into Cloud Firestore using a managed import.

Why this order

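The import reads files produced by a Firestore managed export, so the export files must already be in a Cloud Storage bucket. Run the import with `gcloud firestore import`, then monitor the long-running operation (for example, with `gcloud firestore operations list`) and verify the imported data.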
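A hedged sketch of the import step, assuming the files came from a Firestore managed export and that the `gcloud` CLI is installed and authenticated. The Python helper only builds the argument lists for `gcloud firestore import` and `gcloud firestore operations list`; the bucket path is a placeholder and the helper names are hypothetical.

```python
import subprocess


def build_import_cmd(export_prefix, collection_ids=None):
    """Argument list for a managed Firestore import from a Cloud Storage export."""
    if not export_prefix.startswith("gs://"):
        raise ValueError("export_prefix must be a gs:// URI from a managed export")
    cmd = ["gcloud", "firestore", "import", export_prefix]
    if collection_ids:
        # Restrict the import to the listed collection groups.
        cmd.append("--collection-ids=" + ",".join(collection_ids))
    return cmd


MONITOR_CMD = ["gcloud", "firestore", "operations", "list"]


def run(cmd):
    """Run a gcloud command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Placeholder path; substitute the real export folder.
print(build_import_cmd("gs://my-bucket/my-export", ["games"]))
```

The `run` helper is left uncalled so the sketch has no side effects; after the import finishes, spot-check a few documents to verify the data.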

146

MCQ · easy

Your company runs a global e-commerce platform on Google Cloud Spanner. The database schema includes an 'Orders' table with primary key (OrderId, CustomerId) and an 'OrderItems' table with primary key (OrderId, CustomerId, ItemId), interleaved in parent Orders on delete cascade. During peak shopping hours, you notice that queries retrieving all items for a specific order are performing full table scans on the OrderItems table, leading to increased latency and higher CPU utilization. The queries use the OrderId as the filter condition. The database administrators have already checked that the query plans show table scans instead of using the interleaved index. You are tasked with resolving this performance issue. Which of the following actions should you take?

A.Remove CustomerId from the Orders primary key (making it just OrderId) and update OrderItems to have primary key (OrderId, ItemId), maintaining interleaving.

B.Change the primary key of Orders to (OrderId, CustomerId) and update OrderItems accordingly.

C.Create a secondary index on OrderItems(OrderId).

D.Increase the number of Spanner nodes to improve throughput.

Answer: A

This allows efficient lookup using only OrderId and leverages interleaving.

Why this answer

Option A is correct because interleaving requires the parent table's primary key to be a prefix of the child table's primary key, and it stores each order's items physically next to the order row. With OrderId as the entire primary key of Orders and (OrderId, ItemId) as the key of OrderItems, a filter on OrderId alone identifies exactly one parent row, so Spanner can read the order and its interleaved items as one contiguous key range instead of scanning the table. CustomerId can remain an ordinary column.

Exam trap

Google Cloud often tests the misconception that secondary indexes are the default fix for query performance issues, when in fact the schema design—specifically the primary key structure for interleaved tables—is the root cause and must be corrected first.

How to eliminate wrong answers

Option B is wrong because (OrderId, CustomerId) is already the primary key of Orders, so the change leaves the schema as it is and queries filtering only on OrderId behave exactly as before. Option C is wrong because creating a secondary index on OrderItems(OrderId) would add storage and write overhead, and while it could help, it is not the optimal solution; the correct fix is to adjust the primary key so the query benefits from interleaving directly. Option D is wrong because increasing Spanner nodes improves throughput but does not address the root cause of full table scans caused by an inefficient schema design.

147

MCQ · hard

Your company runs a global gaming platform using Cloud Spanner as the backend database. The platform has millions of users who play concurrently. You receive reports that during peak hours (7-10 PM UTC), some users experience 'DEADLINE_EXCEEDED' errors and high latency on write operations. You have already verified that there are no hot keys and that the schema uses primary keys with hash prefixes. Monitoring shows CPU utilization averages 60% but spikes to 80% during the peak. The average commit latency is 50ms during peak, and the transaction rate is 10,000 writes per second. The instance currently has 100 nodes. The application team indicates that writes are primarily player score updates. What should you do to resolve the performance issue?

A.Enable Fine-Grained Latency & Replication (FLLR) to improve write latency.

B.Disable client-side buffering for write operations.

C.Increase the number of Spanner nodes to 150.

D.Reduce the size of write transactions by batching fewer mutations per transaction.

Answer: C

More nodes add capacity, reducing CPU pressure and latency.

Why this answer

Option C is correct because increasing the instance from 100 to 150 nodes adds compute capacity and spreads the 10,000 writes per second over more servers, lowering both CPU utilization and write latency. A peak of 80% is well above Spanner's recommended maximum of 65% high-priority CPU for a regional instance (45% for multi-region), and with hot keys already ruled out, the 'DEADLINE_EXCEEDED' errors and 50 ms commit latency point to the instance running short of capacity at peak. Adding nodes requires no schema or application changes.

Exam trap

Google Cloud often tests the misconception that reducing transaction size or disabling buffering always improves performance, but in Spanner, CPU saturation from high write throughput is best resolved by horizontal scaling (adding nodes), not by reducing batch sizes or tweaking client settings.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner has no feature called 'Fine-Grained Latency & Replication (FLLR)'; the option is a distractor. Spanner can lower read latency with read-only replicas placed near users, but every write still needs acknowledgment from a quorum of voting replicas, so more replicas would not relieve CPU-bound write latency. Option B is wrong because disabling client-side buffering would increase the number of round trips and likely worsen latency, as buffering helps batch writes and reduce overhead; the issue is server-side CPU saturation, not client-side batching. Option D is wrong because batching fewer mutations per transaction would increase the total number of transactions, potentially raising CPU overhead and commit latency further; a 50 ms commit for small score updates already suggests the server, not transaction size, is the bottleneck.
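A back-of-the-envelope check of option C, assuming CPU load spreads evenly and scales linearly with node count. Real workloads rarely scale perfectly, so treat the result as a lower bound. The 65% and 45% targets are Spanner's recommended high-priority CPU maximums for regional and multi-region instances; the function names are hypothetical.

```python
def nodes_needed(current_nodes: int, peak_cpu_pct: int, target_cpu_pct: int) -> int:
    """Smallest node count that brings peak CPU to or below the target (linear model)."""
    load = current_nodes * peak_cpu_pct  # total work in "node-percent"
    return -(-load // target_cpu_pct)    # ceiling division on integers


def projected_cpu(current_nodes: int, peak_cpu_pct: float, new_nodes: int) -> float:
    """Peak CPU percentage after resizing, under the same linear assumption."""
    return current_nodes * peak_cpu_pct / new_nodes


print(nodes_needed(100, 80, 65))              # regional target
print(nodes_needed(100, 80, 45))              # multi-region target
print(round(projected_cpu(100, 80, 150), 1))  # peak CPU at 150 nodes
```

Under this estimate, 150 nodes brings the 80% peak to roughly 53%, below the regional guideline; a multi-region instance would still be above 45%.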

148

MCQ · hard

You are managing a Memorystore for Redis cluster with standard tier (persistence disabled). The application experiences occasional latency spikes while performing SET operations. You observe that the 'evicted_keys' metric spikes during the spikes. What is the most effective solution?

A.Enable AOF persistence with fsync every second

B.Change the maxmemory-policy to 'volatile-lru'

C.Increase the maximum memory size of the instance

D.Configure a read replica to offload read traffic

Answer: C

More memory reduces evictions, stabilizing write latency.

Why this answer

The evicted_keys metric spiking alongside slow SET operations indicates that the Redis instance has reached its maxmemory limit and is evicting keys to make room for new writes. Redis performs that eviction while processing the write, so each eviction adds directly to SET latency. Increasing the maximum memory size addresses the root cause by giving the dataset more headroom, so evictions, and the latency spikes they cause, become rare.

Exam trap

Google Cloud often tests the misconception that changing the eviction policy (Option B) solves memory pressure, when in fact the policy only controls which keys are evicted, not whether eviction occurs at all.

How to eliminate wrong answers

Option A is wrong because enabling AOF persistence with fsync every second adds disk I/O overhead, which can increase latency rather than reduce it, and does not address the memory pressure causing evictions. Option B is wrong because changing the maxmemory-policy to 'volatile-lru' only affects which keys are evicted (those with TTL set), but does not prevent evictions from occurring when memory is full; the problem is insufficient memory, not the eviction policy. Option D is wrong because configuring a read replica offloads read traffic, but the latency spikes occur during SET (write) operations, and replicas do not handle writes; this does not reduce memory pressure on the primary instance.
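The eviction mechanics can be illustrated with a toy cache in Python that, like Redis under an `allkeys-lru` policy, evicts the least recently used key whenever a write would exceed its limit. It counts keys instead of bytes and uses exact LRU, whereas Redis approximates LRU by sampling; it is not Memorystore's implementation, and the class name is hypothetical.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Exact-LRU stand-in for Redis maxmemory eviction, measured in keys, not bytes."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self.data = OrderedDict()
        self.evicted_keys = 0

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        # Eviction happens inside the write itself, which is why it adds latency.
        while len(self.data) > self.max_keys:
            self.data.popitem(last=False)
            self.evicted_keys += 1

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)
        return self.data[key]


def evictions_for(max_keys, n_writes):
    cache = ToyLRUCache(max_keys)
    for i in range(n_writes):
        cache.set(f"session:{i}", i)
    return cache.evicted_keys


print(evictions_for(100, 150), evictions_for(200, 150))  # 50 0
```

Doubling the limit in this model removes evictions entirely for the same write pattern, while changing which keys are eligible would only change what gets evicted.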

149

Multi-Select · easy

A BigQuery dataset contains a table with a STRUCT column for customer address. The BI team needs to query the city field from the struct. Which two approaches are valid? (Select TWO).

Select 2 answers

A.SELECT UNNEST(address) as city FROM table

B.SELECT JSON_EXTRACT(TO_JSON(address), '$.city') FROM table

C.SELECT address.city FROM table

D.SELECT address['city'] FROM table

E.SELECT address.city.standard FROM table

Answers: B, C

Converting the struct to JSON and extracting the city field is a valid but more verbose method.

Why this answer

Option B is correct because `TO_JSON(address)` converts the STRUCT to a JSON value and `JSON_EXTRACT(..., '$.city')` extracts the `city` field using JSONPath syntax; the result is JSON-typed rather than a STRING, so `JSON_VALUE` is the better choice when a plain string is needed. Option C is correct because BigQuery allows direct field access on a STRUCT column using dot notation (`address.city`), which is the standard SQL syntax for nested fields.

Exam trap

Google Cloud often tests the distinction between STRUCT and ARRAY types, and the trap here is that candidates confuse `UNNEST` (for ARRAYs) with dot notation (for STRUCTs), or mistakenly apply bracket syntax from other SQL dialects like PostgreSQL or MySQL.

150

MCQ · medium

A gaming company uses Memorystore for Redis to cache player session data. They need to ensure high availability with automatic failover in case of a zone failure. Which configuration should the Database Engineer choose?

A.Deploy a Standard tier Redis instance with replication across two zones.

B.Deploy a Basic tier Redis instance with multiple read replicas.

C.Deploy a Memcached cluster with multiple nodes.

D.Deploy a Basic tier Redis instance in a single zone.

Answer: A

Standard tier provides replication and automatic failover.

Why this answer

Memorystore for Redis Standard tier provides cross-zone replication with automatic failover, ensuring high availability during a zone failure. The Standard tier uses a primary and replica instance in different zones, and if the primary fails, the replica is automatically promoted. This meets the requirement for automatic failover without manual intervention.

Exam trap

Google Cloud often tests the distinction between Basic and Standard tiers in Memorystore for Redis, where candidates mistakenly assume Basic tier offers replication or failover, but it is a single-node configuration with no high availability.

How to eliminate wrong answers

Option B is wrong because the Basic tier does not support replication, read replicas, or automatic failover; it is a single-node instance with no high availability. Option C is wrong because Memcached is a distributed memory caching system, not a Redis instance, and does not provide the same data persistence or failover mechanisms required for session data. Option D is wrong because a Basic tier Redis instance in a single zone offers no redundancy; any zone failure would cause complete data loss and downtime.

Practice PCDE by domain

Target a specific domain to shore up weak areas.

Plan and manage database infrastructure

Define data structures and implement SQL for Business Intelligence

Design and implement database schemas

Monitor and optimize database performance
