Google Professional Cloud Database Engineer PCDE Questions 151–225 | Page 3/7

151

Multi-Selecthard

A company is migrating a large Oracle database to Cloud Spanner. They need to define the schema for relational tables with foreign keys. Which THREE considerations are important when designing the Spanner schema? (Choose three.)

Select 3 answers

A.Use NULL values in primary key columns to allow optional fields.

B.Use INTERLEAVE tables to model parent-child relationships.

C.Avoid using composite primary keys; use single-column keys instead.

D.Define secondary indexes for querying on non-key columns.

E.Foreign keys are automatically enforced in Cloud Spanner.

AnswersB, D, E

Interleaving allows co-locating parent and child rows, reducing read latency.

Why this answer

Options A, B, and D are correct. Spanner supports interleaved tables (A) for parent-child relationships, foreign key constraints (B) for referential integrity, and secondary indexes (D) for query performance on non-key columns. Option C is false because composite primary keys are common.

Option E is false because primary key columns cannot be NULL.

Full explanation →

152

MCQmedium

A company runs near-real-time dashboards on BigQuery that query a table partitioned by day and clustered by user_id. The most common query filters on user_id and then aggregates sales over the last 7 days. However, many queries still scan full partitions. What is the most likely cause?

A.The dashboard is configured to refresh every 5 minutes, causing too many queries.

B.The table uses a wide-column schema with many repeated fields.

C.The table is partitioned by hour, not by day.

D.The table is not clustered on user_id, or the clustering expression does not match the filter.

AnswerD

Clustering on user_id allows BigQuery to prune blocks within partitions when filtering on that column.

Why this answer

Option D is correct because the most common cause of full partition scans despite partitioning by day and clustering by user_id is that the clustering expression does not match the filter predicate. In BigQuery, clustering only prunes blocks within a partition when the filter column exactly matches the clustering key; if the filter uses a different expression (e.g., a cast or function) or if clustering is not properly defined, BigQuery falls back to scanning the entire partition. This results in the described behavior where queries still scan full partitions even though the table is partitioned and clustered.

Exam trap

Google Cloud often tests the misconception that partitioning alone guarantees query efficiency, but the trap here is that clustering must exactly match the filter predicate to avoid full partition scans, and candidates may overlook the need for precise column matching in the WHERE clause.

How to eliminate wrong answers

Option A is wrong because query frequency (every 5 minutes) does not cause full partition scans; it may increase slot contention or cost but does not affect the pruning behavior of partitioning or clustering. Option B is wrong because wide-column schemas with repeated fields can increase storage and processing overhead but do not prevent partition pruning or clustering from working correctly; the issue is about filter matching, not schema complexity. Option C is wrong because the question explicitly states the table is partitioned by day, so partitioning by hour would be a different configuration; even if it were hourly, the core problem of full partition scans would still point to clustering mismatch, not the partition granularity.

Full explanation →

153

MCQmedium

A company uses Cloud Firestore in Datastore mode for a multi-tenant SaaS application. Each tenant has a separate namespace. The application has grown rapidly, and the database engineer notices that write throughput is degrading. Monitoring shows that the number of writes per second is high but within Firestore limits. However, the latency for writes is increasing linearly with the number of tenants. The engineer suspects that index management is causing the problem. The current schema uses automatic indexes for all properties. What is the best corrective action?

A.Disable automatic indexing and create custom composite indexes only for queries that require them.

B.Implement a data retention policy to delete old tenant data.

C.Request a throughput increase from Google Cloud support.

D.Shard tenants into separate Firestore databases to distribute the write load.

AnswerA

Automatic indexing causes every property to be indexed, resulting in excessive write operations. Custom indexes reduce write amplification.

Why this answer

The correct answer is A because automatic indexing in Firestore in Datastore mode creates an index for every property, which causes write amplification as each write must update all relevant indexes. With many tenants in separate namespaces, the number of indexes grows linearly with tenants, increasing write latency. Disabling automatic indexing and creating custom composite indexes only for needed queries reduces the index write overhead, restoring write throughput.

Exam trap

The trap here is that candidates may assume the issue is throughput capacity (Option C) or data volume (Option B), rather than recognizing that automatic indexing creates a write amplification problem that scales with schema complexity and tenant count.

How to eliminate wrong answers

Option B is wrong because implementing a data retention policy to delete old tenant data reduces storage but does not address the root cause of write latency increasing with tenant count due to index management overhead. Option C is wrong because requesting a throughput increase from Google Cloud support does not resolve the index write amplification issue; Firestore writes are already within limits, and the bottleneck is index-related, not capacity. Option D is wrong because sharding tenants into separate Firestore databases distributes the write load but does not eliminate the per-property automatic indexing overhead within each database; it also adds operational complexity and cost without fixing the core index management problem.

Full explanation →

154

MCQeasy

A startup is building a BI system on Cloud SQL (PostgreSQL) for small-to-medium datasets. The data warehouse includes a fact table 'sales_fact' with millions of rows and dimension tables. The BI team reports that 'sales_fact' queries are slow despite proper indexing. What design change would most likely improve performance?

A.Use a read replica to offload queries

B.Denormalize frequently joined dimension columns into the fact table

C.Switch to Cloud Spanner for better scalability

D.Add more indexes on every column used in WHERE clauses

AnswerB

This reduces the number of joins needed for BI queries.

Why this answer

Denormalizing frequently joined dimension columns into the fact table reduces the number of JOIN operations required for BI queries. In PostgreSQL on Cloud SQL, even with proper indexing, JOINs between a large fact table and multiple dimension tables can cause significant overhead due to tuple reconstruction and buffer pool churn. By storing commonly accessed dimension attributes directly in the fact table, queries become single-table scans or index lookups, dramatically reducing query latency for small-to-medium datasets.

Exam trap

Google Cloud often tests the misconception that more indexes or read replicas universally solve query performance issues, when in fact the root cause is often the JOIN overhead in star-schema designs, which denormalization directly addresses.

How to eliminate wrong answers

Option A is wrong because a read replica offloads read traffic but does not improve the performance of individual queries; the replica runs the same slow query plan on the same schema. Option C is wrong because Cloud Spanner is designed for globally distributed, horizontally scalable workloads with strong consistency, not for optimizing star-schema JOIN performance on small-to-medium datasets; it introduces higher latency and cost without addressing the JOIN overhead. Option D is wrong because adding more indexes on every column used in WHERE clauses can lead to index bloat, increased write overhead, and the query planner may still choose sequential scans or inefficient index joins if the fact table is large and the WHERE clauses are not selective enough.

Full explanation →

155

MCQmedium

The exhibit shows query metadata for a query that scans 10 GB. Given the table is 100 GB and partitioned by hire_date, why did the query scan 10 GB and not less?

A.The filter on hire_date is not selective enough to prune most partitions

B.Clustering on department is not being used because the query has ORDER BY

C.The query uses GROUP BY, which forces a full table scan

D.The table is not clustered properly

AnswerA

If the date range covers many days, many partitions are scanned.

Why this answer

Option A is correct because partition pruning in Databricks (and Spark SQL) depends on the selectivity of the filter predicate. If the filter on `hire_date` matches a large number of partitions (e.g., filtering on a range that covers 10 GB out of 100 GB), the query scans exactly those partitions. The question states the table is 100 GB and partitioned by `hire_date`, so a 10 GB scan implies the filter pruned 90 GB of partitions but was not selective enough to reduce the scan further—e.g., the predicate may be a broad range or lack a precise equality condition.

Exam trap

Google Cloud often tests the misconception that any filter on a partition column automatically prunes to a minimal scan, ignoring that the selectivity of the predicate (e.g., range vs. equality) determines how many partitions are actually skipped.

How to eliminate wrong answers

Option B is wrong because clustering on `department` is unrelated to partition pruning; clustering improves data skipping for non-partition columns, but the query's `ORDER BY` does not disable clustering benefits—it may even leverage them for sorting. Option C is wrong because `GROUP BY` does not force a full table scan in Databricks; partition pruning occurs before aggregation, so if the filter is selective, only relevant partitions are scanned. Option D is wrong because the table is partitioned by `hire_date`, and the scan size (10 GB) is consistent with proper partitioning; improper clustering would affect data skipping, not the partition-level scan size.

Full explanation →

156

MCQhard

You are a cloud database engineer for a financial services firm. The firm uses Cloud SQL for PostgreSQL to support a BI reporting tool. The main table 'transactions' has 500 million rows and is growing daily. Reports often run aggregations over date ranges and group by account_id. The 'transactions' table has indexes on date and account_id separately. Despite these indexes, the reporting queries are slow, often taking over 30 minutes. The database is deployed on a high-memory machine with 32 vCPUs and 256 GB RAM. You notice that the queries perform sequential scans instead of using indexes. What is the most likely reason, and what single change would you make to improve performance?

A.Partition the table by date using PostgreSQL declarative partitioning

B.Create a composite index on (date, account_id)

C.Increase the shared_buffers setting to 128 GB

D.Disable sequential scans by setting enable_seqscan = off

AnswerB

A composite index that matches the query's WHERE and GROUP BY can drastically reduce the data scanned.

Why this answer

The correct answer is B because the reporting queries filter by date ranges and group by account_id, but the existing separate indexes on date and account_id cannot be combined efficiently for both conditions. PostgreSQL's query planner often chooses a sequential scan over using two separate indexes because it estimates that reading the entire table is cheaper than the bitmap scan overhead of combining them. A composite index on (date, account_id) allows the database to directly locate rows matching the date range and then access them in account_id order, eliminating the need for a separate sort or join step.

Exam trap

Google Cloud often tests the misconception that adding separate indexes on each column is sufficient for multi-column queries, but the trap here is that PostgreSQL cannot efficiently combine separate indexes for both filtering and grouping without a composite index that matches the query's access pattern.

How to eliminate wrong answers

Option A is wrong because partitioning by date would only help if queries consistently filter on a single partition boundary, but the slow queries also group by account_id, and partitioning does not directly improve grouping performance without additional indexing. Option C is wrong because increasing shared_buffers beyond a certain point (e.g., 128 GB on a 256 GB machine) can cause PostgreSQL to spend more time managing the buffer pool and may lead to reduced performance due to kernel-level caching overhead; the issue is index usage, not memory size. Option D is wrong because disabling sequential scans with enable_seqscan = off is a dangerous global setting that can force the planner to use inefficient index scans even when a sequential scan would be faster, and it does not address the root cause of missing a suitable composite index.

Full explanation →

157

Drag & Dropmedium

Order the steps to migrate an on-premises MySQL database to Cloud SQL using Database Migration Service (DMS).

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

First prepare source, then create connection profile, create migration job, start migration, and finally promote.

Full explanation →

158

MCQhard

A financial services company uses BigQuery to run complex analytical queries on trading data. They notice that a particular query joining a large fact table (10 TB) with a small dimension table (100 MB) is slow. The fact table is partitioned by date and clustered by symbol. The dimension table is not partitioned. The query filters on a specific date range and a few symbols. Which optimization is MOST likely to improve query performance?

A.Denormalize the dimension table into the fact table.

B.Enable automatic query rewriting to use clustering keys for pruning on the dimension table join.

C.Partition the dimension table by its primary key.

D.Cluster the dimension table on its primary key.

AnswerB

This allows BigQuery to prune clusters in the fact table based on the join condition with the dimension table.

Why this answer

Option B is correct because BigQuery's automatic query rewriting can leverage clustering keys from the fact table to prune the join, even though the dimension table is not clustered. When the query filters on a specific date range and symbols, BigQuery can use the fact table's clustering on symbol to skip irrelevant blocks during the join, reducing data scanned and improving performance. This optimization is automatic and does not require manual denormalization or repartitioning.

Exam trap

The trap here is that candidates assume clustering or partitioning must be applied to both tables in a join, when in fact BigQuery can use clustering from only the large fact table to prune the join, making options C and D unnecessary and option A an over-engineered solution.

How to eliminate wrong answers

Option A is wrong because denormalizing a 100 MB dimension table into a 10 TB fact table would massively increase storage and processing costs, and is unnecessary when clustering and pruning can achieve the same performance gain without data duplication. Option C is wrong because partitioning the dimension table by its primary key would create many small partitions (e.g., one per row), which is inefficient and does not help with join pruning; BigQuery partitions are best for date-based or integer-range pruning, not for high-cardinality keys. Option D is wrong because clustering the dimension table on its primary key would not improve the join performance significantly, as the dimension table is already small (100 MB) and the bottleneck is scanning the large fact table; clustering is most beneficial on large tables to reduce the amount of data read during filtering and joins.

Full explanation →

159

MCQeasy

A BI developer is designing a BigQuery dataset for a sales dashboard. Which column naming convention is considered a best practice for column names in BI reports?

A.Use names with spaces (e.g., Total Revenue).

B.Use descriptive, snake_case names (e.g., total_revenue).

C.Use short, cryptic abbreviations (e.g., tr).

D.Use camelCase names (e.g., totalRevenue).

AnswerB

Snake_case is readable and avoids quoting issues.

Why this answer

BigQuery column names are case-insensitive but must follow standard SQL naming rules. Using descriptive snake_case (e.g., total_revenue) improves readability, avoids ambiguity, and is consistent with BigQuery's own system tables and best practices for BI tools like Looker or Tableau, which often expect clean, underscore-separated identifiers.

Exam trap

Google Cloud often tests the misconception that spaces or camelCase are acceptable for readability, but the trap is that BigQuery requires backtick quoting for spaces and does not enforce a specific case convention, making snake_case the safest and most portable choice for BI reporting.

How to eliminate wrong answers

Option A is wrong because spaces in column names require backtick quoting (e.g., `Total Revenue`) in BigQuery SQL, which adds unnecessary complexity and can break automated queries or BI tool integrations. Option C is wrong because short, cryptic abbreviations (e.g., tr) reduce clarity and maintainability, making it difficult for other developers or business users to understand the data without external documentation. Option D is wrong because camelCase (e.g., totalRevenue) is not a standard convention in BigQuery; while technically allowed, it can cause confusion with case-insensitive comparisons and is less readable in SQL than snake_case.

Full explanation →

160

Multi-Selectmedium

A company uses BigQuery to run business intelligence reports. The data engineer needs to implement a star schema for a sales data warehouse. Which THREE are best practices when designing the tables?

Select 3 answers

A.Use natural keys in dimension tables for simplicity

B.Use a primary key on fact tables to enforce uniqueness

C.Store pre-aggregated data in dimension tables

D.Denormalize dimension tables to include descriptive attributes

E.Partition fact tables by date and cluster by frequently filtered columns

AnswersB, D, E

Ensures each row is unique and allows efficient joins.

Why this answer

Option B is correct because in BigQuery, fact tables should have a primary key to enforce uniqueness of each sales transaction, preventing duplicate rows that would skew aggregations like SUM or COUNT. BigQuery does not enforce primary keys natively, but defining them in the schema (e.g., using PRIMARY KEY constraint in DDL) allows the query engine to optimize joins and deduplication, especially when using MERGE statements. This ensures data integrity in the star schema.

Exam trap

Google Cloud often tests the misconception that dimension tables should be highly normalized or contain pre-aggregated data, but the PCDE exam emphasizes denormalizing dimensions for BI readability and storing aggregates only in fact tables or materialized views.

Full explanation →

161

Matchingmedium

Match each Cloud SQL tier to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Burstable, low-cost for small workloads

Shared-core, moderate performance

Standard machine with 1 vCPU and 3.75 GB RAM

High memory machine with 2 vCPUs and 13 GB RAM

High CPU machine with 4 vCPUs and 3.6 GB RAM

Why these pairings

These tiers reflect different vCPU and memory configurations for Cloud SQL.

Full explanation →

162

MCQhard

You are running a Cloud Bigtable instance for time-series data ingestion. Write throughput has dropped significantly, and you see an increase in 'resource exhausted' errors. The table has one column family and one rowkey format: `#orgId#deviceId#timestamp`. After analyzing cluster metrics, you see that one node is handling most of the traffic. What is the most likely cause?

A.The column family design is creating too many columns.

B.The rowkey design is causing hotspotting on one tablet node due to a prefix (orgId) being written heavily.

C.The cluster does not have enough nodes.

D.The timestamp is too granular, causing many rows with the same timestamp.

AnswerB

If one orgId dominates writes, all writes go to a single tablet.

Why this answer

The rowkey design `#orgId#deviceId#timestamp` causes hotspotting because all writes for a given `orgId` are directed to a single tablet node. Cloud Bigtable partitions data by rowkey range, and sequential or heavily skewed prefixes (like `orgId`) concentrate write traffic on one node, leading to 'resource exhausted' errors and throughput degradation.

Exam trap

Google Cloud often tests the misconception that scaling nodes (Option C) solves performance issues, but the real problem is rowkey design causing uneven load distribution, which cannot be fixed by adding nodes alone.

How to eliminate wrong answers

Option A is wrong because the number of columns in a column family does not cause hotspotting or node-level traffic imbalance; column family design affects storage and read patterns, not write distribution. Option C is wrong because adding more nodes would not fix the root cause—hotspotting—since all writes for the same `orgId` would still target the same node regardless of cluster size. Option D is wrong because timestamp granularity affects row uniqueness, not write distribution; multiple rows with the same timestamp do not cause one node to handle most traffic.

Full explanation →

163

MCQeasy

Refer to the exhibit. A company wants to perform point-in-time recovery (PITR) for their Cloud SQL MySQL instance. Is PITR enabled?

A.No, because PITR requires the backupConfiguration to have 'pointInTimeRecoveryEnabled: true'.

B.Yes, because binaryLogEnabled is true.

C.No, because transactionLogRetentionDays is set to 7.

D.Yes, because enabled is true.

AnswerB

Binary logs are used for MySQL PITR; their presence indicates PITR is enabled.

Why this answer

Option B is correct because Cloud SQL MySQL uses binary logging to enable point-in-time recovery (PITR). When `binaryLogEnabled` is set to `true`, the instance logs all changes, allowing restoration to any specific point within the configured transaction log retention period. The `pointInTimeRecoveryEnabled` field is not a valid Cloud SQL configuration parameter; instead, PITR is implicitly enabled when binary logging is active.

Exam trap

The trap here is that candidates confuse the `enabled` field (for automated backups) with PITR, or assume a separate `pointInTimeRecoveryEnabled` flag exists, when in fact Cloud SQL ties PITR directly to binary logging.

How to eliminate wrong answers

Option A is wrong because `pointInTimeRecoveryEnabled` is not a recognized field in Cloud SQL's backupConfiguration; PITR is controlled by `binaryLogEnabled`. Option C is wrong because `transactionLogRetentionDays` being set to 7 does not disable PITR; it defines how long binary logs are retained, and PITR works within that window. Option D is wrong because `enabled` refers to automated backups, not PITR; automated backups and binary logging are separate settings.

Full explanation →

164

MCQmedium

A team is designing a relational schema for a new application on Cloud SQL. The schema includes a table 'Orders' and a table 'Customers'. Each order belongs to one customer. The team anticipates high write throughput and needs to enforce referential integrity. Which schema design is most appropriate?

A.Use Cloud Spanner interleaved tables with Orders as a child of Customers

B.Implement referential integrity checks in the application code and omit database constraints

C.Store order data as a JSON array in a column of the Customers table

D.Use a foreign key constraint from Orders.customer_id to Customers.customer_id

AnswerD

Enforces integrity efficiently within the database.

Why this answer

Option D is correct because using a foreign key constraint from Orders.customer_id to Customers.customer_id enforces referential integrity at the database level, which is essential for maintaining data consistency in a relational schema. Cloud SQL (e.g., MySQL or PostgreSQL) natively supports foreign key constraints, ensuring that every order references an existing customer without relying on application logic. This approach is efficient for high write throughput as the database handles the check atomically, avoiding race conditions.

Exam trap

Google Cloud often tests the misconception that application-level checks are sufficient for high-throughput systems, but the trap here is that database-level foreign keys are the only way to guarantee referential integrity under concurrent writes, as application code cannot prevent race conditions or orphaned records.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner interleaved tables are designed for hierarchical data and strong consistency in a globally distributed environment, not for standard relational schemas on Cloud SQL; they also introduce complexity and cost that are unnecessary for a simple parent-child relationship. Option B is wrong because implementing referential integrity checks in application code is error-prone and cannot guarantee consistency under high write throughput, as concurrent writes can bypass application logic, leading to orphaned records. Option C is wrong because storing order data as a JSON array in a column of the Customers table violates normalization principles, making it difficult to query individual orders, enforce constraints, and scale write throughput efficiently.

Full explanation →

165

MCQmedium

A BI team runs a daily query on a BigQuery table 'events' partitioned by event_date. The query filters on event_date = CURRENT_DATE() and counts rows by event_type. The query is slow. Upon review, the table has 500 partitions but clustering is not set. Which action reduces query cost and latency?

A.Recreate the table with only the last 30 days of data

B.Use a wildcard table for daily ingestion

C.Increase the partition expiration to 365 days

D.Add clustering on event_type

AnswerD

Clustering on event_type organizes data by that column within each partition, speeding up count and group by.

Why this answer

Adding clustering on `event_type` physically co-locates rows with the same event type within each partition. This allows BigQuery to use block-level pruning when reading data, drastically reducing the number of bytes scanned for the COUNT(*) GROUP BY query. Since the query already filters on a single partition (`event_date = CURRENT_DATE()`), the performance bottleneck is scanning all rows in that partition; clustering eliminates that overhead without changing the table's structure or retention.

Exam trap

Google Cloud often tests the misconception that reducing data volume (e.g., by deleting old partitions or using wildcards) is the primary way to fix query performance, when in fact the correct solution is to optimize data access patterns within the existing partitions using clustering.

How to eliminate wrong answers

Option A is wrong because recreating the table with only 30 days of data does not address the root cause—the query already reads only one partition, so reducing the number of partitions has no effect on the bytes scanned for that single day. Option B is wrong because using a wildcard table for daily ingestion is a pattern for querying multiple tables, not a performance optimization; it would not reduce latency or cost for a query that already targets a single partition. Option C is wrong because increasing partition expiration to 365 days retains more data, which increases storage costs and does nothing to reduce the scan size or improve query performance for a query that already filters on a single partition.

Full explanation →

166

MCQmedium

Your company runs a critical PostgreSQL database on Cloud SQL. You need to minimize downtime during a schema migration that could take up to 30 minutes. What should you do?

A.Use Database Migration Service to migrate to a new instance during the migration window.

B.Create a clone of the instance, perform the migration on the clone, then promote the clone.

C.Configure a high-availability instance and perform the migration during a planned failover.

D.Add a read replica, perform the migration on the replica, then promote it.

AnswerB

Cloning allows offline migration with minimal downtime.

Why this answer

Option B is correct because creating a clone of the Cloud SQL instance allows you to perform the schema migration on an isolated copy without affecting the production database. Once the migration is complete and verified, you can promote the clone to take over as the primary instance, minimizing downtime to just the brief promotion switchover (typically seconds). This approach avoids the long 30-minute migration window on the live database.

Exam trap

Google Cloud often tests the misconception that read replicas can be promoted to become the primary instance in Cloud SQL, but in reality, Cloud SQL read replicas are strictly read-only and cannot be promoted; only clones or HA failover replicas can assume the primary role.

How to eliminate wrong answers

Option A is wrong because Database Migration Service is designed for continuous migrations (e.g., from on-premises or other clouds) and would introduce unnecessary complexity and potential data loss for a schema-only change; it does not provide a zero-downtime schema migration path within Cloud SQL. Option C is wrong because configuring a high-availability instance does not eliminate the need to apply the schema migration to the primary; during a planned failover, the migration would still need to run on the new primary, causing the same 30-minute downtime. Option D is wrong because read replicas in Cloud SQL are read-only and cannot be promoted to a writable primary; promoting a read replica is not supported, and any schema changes on a read replica would be overwritten by replication from the primary.

Full explanation →

167

MCQhard

A financial services company uses Cloud Spanner for transaction processing. They need to ensure zero downtime during a schema change that adds a new column with a default value to a large table. Which approach should the Database Engineer take?

A.Create a new table with the new column, then use a fan-out pattern to write to both tables until the old table is deprecated.

B.Use an ALTER TABLE statement during a maintenance window.

C.Drop the table and recreate it with the new schema.

D.Use ALTER TABLE to add the column; Spanner handles schema changes online.

AnswerD

Spanner schema changes are online and do not cause downtime.

Why this answer

Option D is correct because Cloud Spanner supports online schema changes, including adding columns with default values, without requiring a maintenance window or causing downtime. Spanner applies schema updates asynchronously across all nodes while the database remains fully available for reads and writes, making it the ideal approach for zero-downtime requirements.

Exam trap

The trap here is that candidates assume schema changes on large databases always require a maintenance window or a workaround like dual-write patterns, but Spanner's distributed architecture is specifically designed to handle schema changes online without downtime.

How to eliminate wrong answers

Option A is wrong because it introduces unnecessary complexity and operational overhead; Spanner's native online schema change capability eliminates the need for a fan-out pattern, which would also require dual-write management and eventual deprecation logic. Option B is wrong because it assumes a maintenance window is required, contradicting Spanner's design for online schema changes that do not need planned downtime. Option C is wrong because dropping and recreating a table causes complete data loss and extended downtime, which is entirely unnecessary when Spanner can add columns online without disruption.

Full explanation →

168

MCQhard

A BI team in a large enterprise uses Looker connected to BigQuery. The data model has a primary table 'sales_fact' with billions of rows and multiple dimensions. The team notices that Looker queries often time out. Which approach would most likely resolve this without changing the data model?

A.Request Google Support to increase BigQuery timeout

B.Create a materialized view in BigQuery for the most common aggregations

C.Increase BigQuery slot capacity

D.Switch Looker to use SQL Runner only

AnswerB

Materialized views precompute aggregates and are automatically refreshed, reducing query time without model changes.

Why this answer

Using Looker's persistent derived tables (PDTs) can pre-aggregate data and speed up dashboard queries.

Full explanation →

169

Multi-Selecthard

Which THREE actions can help reduce read latency in Cloud Spanner?

Select 3 answers

A.Use read-only transactions with strong consistency

B.Increase the staleness allowed for read queries

C.Structure tables with interleaved parent-child relationships

D.Use secondary indexes to avoid full table scans

E.Batch multiple write operations into a single mutation

AnswersA, C, D

Read-only transactions can execute faster without locks.

Why this answer

Read-only transactions with strong consistency in Cloud Spanner use lock-free reads that return the most recent data without blocking writes. This reduces read latency because the system can serve the data directly from the current timestamp without waiting for write locks or replication delays, making it ideal for low-latency, strongly consistent reads.

Exam trap

Google Cloud often tests the distinction between read latency and write latency, so candidates may incorrectly choose batching writes (Option E) or increasing staleness (Option B) as read latency reducers, when in fact those affect write performance or trade off consistency for speed.

Full explanation →

170

Drag & Dropmedium

Order the steps to set up a Cloud Spanner instance with a global database.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

First create the instance, then database, then schema, then data, then IAM.

Full explanation →

171

MCQmedium

You need to export data from Cloud Spanner for archival purposes. Which method is most cost-effective?

A.Use a Cloud Function to copy data

B.Use Dataflow to read and write to Cloud Storage

C.Use gcloud spanner databases export

D.Use gcloud spanner databases execute-sql

AnswerC

The export command directly exports to Cloud Storage with no extra compute cost.

Why this answer

Option A is correct because the native gcloud spanner databases export command exports data directly to Cloud Storage with minimal cost and no additional compute resources. Options B and D (Dataflow, Cloud Function) incur additional compute charges. Option C (execute-sql) is not suitable for full database export.

Full explanation →

172

MCQmedium

You are tuning a Cloud SQL for PostgreSQL instance that runs reporting queries. The slowest query performs a full table scan on a 100 GB table. Which action is most likely to improve performance?

A.Create an index on the columns used in the WHERE clause

B.Increase shared_buffers to 50% of instance memory

C.Set statement_timeout to 30 seconds

D.Move the query to a read replica

AnswerA

An index allows the database to locate rows quickly.

Why this answer

A full table scan on a 100 GB table indicates that PostgreSQL has no efficient access path to retrieve the required rows. Creating an index on the columns used in the WHERE clause allows the query planner to use an index scan instead of a sequential scan, drastically reducing the number of disk pages read and improving query performance. This is the most direct and effective optimization for a query that filters rows without an index.

Exam trap

Google Cloud often tests the misconception that simply adding more memory or offloading work to a replica will fix performance issues caused by missing indexes, but the root cause—lack of an access path—must be addressed directly.

How to eliminate wrong answers

Option B is wrong because increasing shared_buffers to 50% of instance memory can cause excessive memory consumption, leading to increased checkpoint I/O and potential out-of-memory errors; PostgreSQL recommends shared_buffers be set to 25% of total memory, not 50%. Option C is wrong because setting statement_timeout to 30 seconds does not improve performance—it only aborts the query if it exceeds that duration, which would cause the query to fail rather than run faster. Option D is wrong because moving the query to a read replica does not eliminate the full table scan; the replica still performs the same sequential scan on the same large table, so performance remains poor.

Full explanation →

173

MCQmedium

A Cloud Firestore database stores documents for a mobile app. The app frequently queries for documents where a specific Boolean field is true. The field is not part of the collection group index. What should the developer do to improve query performance?

A.Add a synthetic field that combines the Boolean with a timestamp for range queries.

B.Create a composite index that includes the Boolean field and the query ordering field.

C.Denormalize the Boolean field into separate subcollections.

D.Rely on the automatic single-field index already created.

AnswerB

A composite index tailored to the query pattern improves performance and avoids full collection scans.

Why this answer

Option B is correct because creating a composite index on the Boolean field and other commonly filtered fields will allow efficient queries. Option A is wrong because single-field indexes are automatically created, but the field is Boolean so index exists but may not be sufficient. Option C is wrong because denormalization may increase complexity.

Option D is wrong because adding a new field doesn't help if it's not indexed.

Full explanation →

174

Multi-Selecteasy

A company is planning to migrate a large on-premises MySQL database (5 TB) to Cloud SQL. They need to choose the appropriate Cloud SQL edition. Which two factors should they consider? (Choose two.)

Select 2 answers

A.Availability of regional high availability.

B.Automated backup capabilities.

C.Maximum storage capacity supported by each edition.

D.Performance characteristics such as maximum IOPS and throughput.

E.Integration with VPC networks.

AnswersC, D

Enterprise Plus supports 30 TB, Enterprise supports 16 TB.

Why this answer

Option C is correct because Cloud SQL editions (Enterprise, Enterprise Plus) have different maximum storage capacities; for example, Enterprise Plus supports up to 30 TB while Enterprise supports up to 10 TB. For a 5 TB database, the edition must support at least that capacity, making this a key factor. Option D is correct because performance characteristics like maximum IOPS and throughput vary by edition and directly impact database performance, especially for a large 5 TB workload.

Exam trap

The trap here is that candidates often confuse features that are available across all editions (like VPC integration, backups, or HA) with edition-specific differentiators, leading them to select options that are not actually factors in edition choice.

Full explanation →

175

Multi-Selectmedium

A manufacturing company is deploying a time-series database for sensor data on Cloud Bigtable. They expect 100 TB of data per year and need low-latency reads on row keys within the last hour. Which TWO design choices should they make?

Select 2 answers

A.Set the maximum QPS to 1000 to avoid overloading the cluster.

B.Partition the data into separate tables for each month.

C.Use a reverse timestamp in the row key so that recent data is at the beginning of the table.

D.Use a salting prefix on the row key to distribute writes across nodes.

E.Design row keys to be as long as possible to avoid collisions.

AnswersC, D

Reversing the timestamp makes the most recent data appear first, speeding up reads for the last hour.

Why this answer

Options A (salting) and D (reverse timestamp row keys) are correct. Salting distributes writes, and reverse timestamps allow recent data to appear first, improving read latency for recent rows. Option B is wrong because storing data across multiple tables adds overhead.

Option C is wrong because row keys should not be too long. Option E is wrong because Bigtable is designed for high throughput; limiting to 1000 QPS is unnecessary.

Full explanation →

176

MCQhard

A Cloud Bigtable instance is experiencing hotspotting on a single node during heavy write traffic. The row keys are based on a timestamp prefix. Which change should they make to the row key design to distribute writes evenly?

A.Use a reverse timestamp (e.g., MAX_TIMESTAMP - timestamp)

B.Increase the number of nodes in the cluster

C.Add a random prefix (salting) to the row key

D.Enable replication across zones

AnswerC

Salting distributes writes across nodes by randomizing the start of the row key.

Why this answer

Adding a random prefix (salting) to the row key distributes writes across multiple tablet servers by ensuring that consecutive timestamps do not all hash to the same node. This prevents hotspotting because Cloud Bigtable partitions rows lexicographically by row key; a monotonically increasing timestamp prefix causes all new writes to land on a single tablet server. Salting spreads the write load uniformly across the cluster.

Exam trap

Google Cloud often tests the misconception that scaling infrastructure (adding nodes or replication) can fix a design-level hotspotting issue, when the correct solution is to modify the row key schema to distribute the load.

How to eliminate wrong answers

Option A is wrong because reversing the timestamp (e.g., MAX_TIMESTAMP - timestamp) still produces a monotonically decreasing sequence, which will still hotspot on a single node as all new writes will be adjacent in the key space. Option B is wrong because increasing the number of nodes does not fix the root cause—the row key design still funnels all writes to one tablet server; additional nodes will remain idle for writes. Option D is wrong because replication across zones is for disaster recovery and read scalability, not for distributing write load within a single cluster; it does not change the row key distribution.

Full explanation →

177

MCQmedium

You are designing a database architecture for a global e-commerce application. The application requires low-latency reads in multiple regions and must handle up to 100,000 writes per second globally. Which Google Cloud database solution should you use?

A.Firestore in multi-region mode.

B.Bigtable with multiple clusters.

C.Cloud SQL with cross-region replication.

D.Cloud Spanner with multi-region configuration.

AnswerD

Spanner is designed for global scale and strong consistency.

Why this answer

Cloud Spanner with a multi-region configuration is the correct choice because it provides globally distributed, strongly consistent relational database service with horizontal scalability, supporting over 100,000 writes per second across multiple regions while maintaining ACID transactions and low-latency reads via regional replicas. This meets the requirements of a global e-commerce application needing high write throughput and low read latency in multiple regions.

Exam trap

The trap here is that candidates often confuse high write throughput with NoSQL solutions like Bigtable or Firestore, overlooking that Cloud Spanner uniquely combines horizontal scalability with strong consistency and SQL support required for transactional e-commerce workloads.

How to eliminate wrong answers

Option A is wrong because Firestore in multi-region mode is a NoSQL document database optimized for mobile and web apps with eventual consistency for multi-region reads, not designed for 100,000 writes per second globally and lacks strong transactional consistency across regions. Option B is wrong because Bigtable with multiple clusters is a wide-column NoSQL database designed for high-throughput analytical workloads, not transactional e-commerce, and does not support SQL queries or strong consistency across clusters. Option C is wrong because Cloud SQL with cross-region replication is a managed relational database with limited scalability (max ~64,000 writes per second for MySQL) and cross-region replication introduces replication lag, failing to meet low-latency reads and high write throughput globally.

Full explanation →

178

Multi-Selecteasy

A data engineer is designing a BigQuery schema for a dataset that will be used for both ad-hoc analysis and scheduled dashboards. They want to optimize costs and performance. Which three strategies should they consider? (Choose three.)

Select 3 answers

A.Use wildcard tables with a suffix filter.

B.Store data in multiple tables per day.

C.Use partitioning on a date column for time-based queries.

D.Use materialized views for pre-aggregated results.

E.Cluster on columns frequently used in filters.

AnswersC, D, E

Partitioning prunes partitions not needed by the query, reducing cost.

Why this answer

Partitioning (A) reduces scanned data by date. Clustering (B) improves filter queries. Materialized views (D) precompute aggregates.

Option C (multiple tables) increases management and query complexity. Option E (wildcard tables) can be costly if not used carefully and still scans all tables.

Full explanation →

179

MCQhard

An e-commerce company uses BigQuery for BI. They have a large orders table with columns: order_id, customer_id, order_date, amount, status. Queries frequently aggregate total amount by customer and month. The current table is not partitioned. Users complain about high costs. The table is 2 TB and grows by 50 GB daily. Which action reduces query costs most?

A.Partition the table by order_date and cluster by customer_id.

B.Use a wildcard table with daily shards.

C.Create a materialized view that aggregates by customer and month.

D.Set a maximum bytes billed limit on the project.

AnswerC

Materialized view stores the aggregation, converting queries to small scans of precomputed data.

Why this answer

Option C is correct because a materialized view pre-aggregates the total amount by customer and month, eliminating the need to scan the full 2 TB table for every query. This drastically reduces the bytes processed per query, directly lowering BigQuery costs. Since the table grows by 50 GB daily, the materialized view incrementally updates, ensuring fresh results without reprocessing historical data.

Exam trap

Google Cloud often tests the misconception that partitioning alone solves all cost issues, but the trap here is that partitioning reduces scan for date-range queries, not for aggregation queries that span many partitions; a materialized view is the correct cost-reduction strategy for pre-aggregated results.

How to eliminate wrong answers

Option A is wrong because partitioning by order_date and clustering by customer_id reduces bytes scanned for date-range filters, but queries aggregating by customer and month still require scanning all partitions that match the month, which can be large. Option B is wrong because wildcard tables with daily shards require manual management and each query must union or scan multiple shards, leading to higher costs and complexity compared to a single partitioned table. Option D is wrong because setting a maximum bytes billed limit only caps costs but does not reduce the bytes processed; queries that exceed the limit will fail, not become cheaper.

Full explanation →

180

Multi-Selectmedium

Your company is migrating a 2 TB SQL Server database to Cloud SQL. You need to choose a migration approach that minimizes downtime and supports ongoing changes. Which TWO options meet these requirements?

Select 2 answers

A.Use Database Migration Service with a source of SQL Server.

B.Export the database as a .bak file and restore to Cloud SQL.

C.Use BCP to export data and import via gcloud sql import.

D.Configure a linked server to Cloud SQL from on-premises and use data synchronization tools.

E.Migrate to Azure SQL Managed Instance first, then to Cloud SQL.

AnswersA, D

DMS supports SQL Server to Cloud SQL for SQL Server with continuous replication.

Why this answer

Database Migration Service (DMS) supports continuous change data capture (CDC) for SQL Server, enabling near-zero-downtime migrations by replicating ongoing changes from the source to Cloud SQL until cutover. This approach minimizes downtime and keeps the database synchronized during the migration window, meeting both requirements.

Exam trap

Google Cloud often tests the misconception that a full export/import (like .bak or BCP) can be performed with minimal downtime, but these methods require the source to be static, failing the 'ongoing changes' requirement.

Full explanation →

181

MCQhard

You are designing a Cloud Spanner schema for a global social media application. The application reads the most recent 100 posts for a user's timeline. The Posts table has a primary key of (UserId, PostTimestamp DESC). You observe that queries for the timeline are hitting high read latency and the transaction abort rate is increasing. What is the most likely cause?

A.The descending timestamp in the primary key is causing hotspotting on the tablet leader for the most recent data.

B.The instance has too few nodes to handle the write volume.

C.The lack of a secondary index on UserId is forcing full table scans.

D.The table should be interleaved under a Users table.

AnswerA

Frequent inserts to the same split (latest timestamp) cause contention.

Why this answer

The descending timestamp in the primary key causes all writes for the most recent posts to be concentrated on the same tablet leader, because Cloud Spanner uses the first primary key column for splitting and distributing data. This hotspotting leads to high read latency and increased transaction abort rates as the single leader becomes overloaded with both writes and reads for the latest data.

Exam trap

Google Cloud often tests the misconception that adding more nodes or secondary indexes solves all performance problems, when in fact the root cause is often a poorly designed primary key that creates hotspotting in Cloud Spanner's distributed architecture.

How to eliminate wrong answers

Option B is wrong because while insufficient nodes can cause performance issues, the specific symptom of high read latency combined with increasing transaction abort rates points to hotspotting from the primary key design, not a general capacity problem. Option C is wrong because the primary key already includes UserId as the first column, so queries filtering by UserId can use the primary key directly without needing a secondary index. Option D is wrong because interleaving the Posts table under a Users table would not solve the hotspotting issue caused by the descending timestamp; it would actually exacerbate the problem by colocating all posts for a user on the same split.

Full explanation →

182

Multi-Selecthard

You are managing a Cloud SQL for MySQL instance that is experiencing high replication lag. The instance uses semi-synchronous replication. Which THREE actions could reduce the replication lag?

Select 3 answers

A.Change replication mode from semi-synchronous to asynchronous.

B.Reduce the binlog retention period on the primary.

C.Configure the replica to use parallel replication (slave_parallel_workers > 0).

D.Increase the machine type of the replica.

E.Enable binary log compression on the primary.

AnswersA, C, D

Async replication removes the acknowledgement wait, reducing lag.

Why this answer

Option A is correct because switching from semi-synchronous to asynchronous replication removes the requirement for the primary to wait for at least one replica to acknowledge receipt of each transaction. In semi-synchronous mode, the primary waits for acknowledgment, which can introduce latency and contribute to replication lag under high write loads. Asynchronous replication allows the primary to commit transactions immediately without waiting, reducing the time the primary spends waiting for replica acknowledgment and thus lowering replication lag.

Exam trap

Google Cloud often tests the misconception that reducing log retention or compressing logs directly reduces replication lag, when in fact these actions affect storage or network transfer, not the apply speed or acknowledgment delay that cause lag.

Full explanation →

183

MCQeasy

A database engineer is reviewing the configuration of a Cloud SQL for MySQL instance. The backup configuration shows binaryLogEnabled and pointInTimeRecoveryEnabled set to true. However, the engineer is unable to perform a point-in-time recovery (PITR) to a specific second within the last 30 days. What is the most likely reason?

A.Binary logging is not enabled for all databases on the instance.

B.The transaction log retention period is not set, defaulting to 7 days, so logs older than that are purged.

C.The instance is not in a runnable state.

D.The database version is MySQL 8.0, which does not support PITR.

AnswerB

PITR requires transaction logs to be retained for the desired recovery window; the default is 7 days.

Why this answer

Option B is correct because, by default, Cloud SQL for MySQL sets the transaction log retention period to 7 days when binary logging and point-in-time recovery are enabled. This means that binary logs older than 7 days are automatically purged, making it impossible to perform a PITR to a specific second beyond that window, even if the backup retention is set to 30 days. To recover to any point within the last 30 days, the transaction log retention period must be explicitly configured to match the backup retention period.

Exam trap

Google Cloud often tests the misconception that enabling binary logging and point-in-time recovery automatically allows recovery to any point within the backup retention period, ignoring the separate transaction log retention default of 7 days.

How to eliminate wrong answers

Option A is wrong because binary logging in Cloud SQL for MySQL is enabled at the instance level, not per database, and when binaryLogEnabled is true, it applies to all databases on the instance. Option C is wrong because the instance being in a runnable state is not a prerequisite for performing a PITR; the recovery operation uses stored backups and transaction logs, and the instance can be restored from a non-runnable state. Option D is wrong because MySQL 8.0 fully supports point-in-time recovery; the limitation is not the database version but the transaction log retention period.

Full explanation →

184

MCQeasy

You are using Cloud Monitoring to track performance of a Cloud Spanner instance. Which metric indicates that a database is approaching its throughput limits and may need a split or additional nodes?

A.Storage utilization (bytes used)

B.Query error rate (errors per second)

C.High CPU utilization (percentage)

D.Average commit latency

AnswerC

CPU utilization above 65% suggests the instance is nearing its throughput limit.

Why this answer

High CPU utilization (percentage) is the correct metric because Cloud Spanner's CPU utilization directly reflects the processing load on the instance's nodes. When CPU utilization consistently exceeds 65-70%, it indicates the database is approaching its throughput limits, often requiring a split (to distribute load across more tablets) or additional nodes to maintain performance and avoid throttling.

Exam trap

Google Cloud often tests the misconception that storage utilization or latency metrics indicate throughput limits, but the correct answer is CPU utilization because it directly measures the processing capacity headroom in Cloud Spanner's node-based architecture.

How to eliminate wrong answers

Option A is wrong because storage utilization measures data volume, not throughput capacity; a database can have low storage but still hit throughput limits due to high read/write operations. Option B is wrong because query error rate indicates failures (e.g., due to deadlocks or timeouts), but it is a lagging indicator of throughput saturation, not a direct measure of approaching limits. Option D is wrong because average commit latency can increase due to many factors (e.g., network latency, contention) and is not a primary metric for throughput capacity; Spanner's key metric for node saturation is CPU utilization.

Full explanation →

185

MCQeasy

A BI analyst needs to calculate a running total of sales by region over time in BigQuery. Which SQL window function should be used?

A.RANK() OVER (PARTITION BY region ORDER BY date)

B.SUM(sales) OVER (PARTITION BY region ORDER BY date)

C.ROW_NUMBER() OVER (PARTITION BY region ORDER BY date)

D.COUNT(sales) OVER (PARTITION BY region ORDER BY date)

AnswerB

This correctly computes a running total per region.

Why this answer

Option B is correct because the SUM() window function with an ORDER BY clause in the OVER() clause computes a running total (cumulative sum) over the specified partition. In BigQuery, when you include ORDER BY inside a window function's OVER() clause, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which produces the running total for each region ordered by date.

Exam trap

Google Cloud often tests the distinction between aggregate functions (SUM, COUNT) and ranking functions (RANK, ROW_NUMBER) in window functions, and the trap here is that candidates confuse RANK() or ROW_NUMBER() with the ability to compute a running total, not realizing that only SUM() with an ORDER BY clause produces a cumulative sum.

How to eliminate wrong answers

Option A is wrong because RANK() assigns a rank to each row based on the ordering, not a running total of sales. Option C is wrong because ROW_NUMBER() assigns a sequential integer to each row, not a cumulative sum. Option D is wrong because COUNT(sales) counts the number of non-null sales values up to the current row, not the sum of sales.

Full explanation →

186

MCQmedium

A retail company stores sales transactions in BigQuery. They want to create a materialized view that aggregates daily sales by product category, but they need the view to refresh automatically within 5 minutes of new data being inserted. The source table is partitioned by transaction_date and has a streaming buffer. What should they do to ensure the materialized view refreshes quickly enough?

A.Set max_staleness on the base table to 5 minutes.

B.Disable streaming inserts and use batch loads only.

C.Increase the streaming buffer size on the base table.

D.Set the materialized view's max_staleness interval to 5 minutes and allow relaxed consistency.

AnswerD

This allows the view to use base table storage for faster refresh, meeting the 5-minute requirement.

Why this answer

Option D is correct because setting the `max_staleness` interval on the materialized view to 5 minutes allows BigQuery to serve query results from the view even if the underlying base table's streaming buffer hasn't fully committed, as long as the data is within the staleness window. This enables the materialized view to reflect near-real-time data without waiting for the streaming buffer to fully materialize, meeting the 5-minute refresh requirement.

Exam trap

Google Cloud often tests the misconception that `max_staleness` is set on the base table or that streaming buffer size can be manually tuned, when in fact `max_staleness` is a materialized view property that relaxes consistency to achieve faster refresh.

How to eliminate wrong answers

Option A is wrong because `max_staleness` is a property of materialized views or tables that controls how stale results can be served, not a property set on the base table to force faster refresh. Option B is wrong because disabling streaming inserts and using batch loads only would eliminate the streaming buffer but would introduce latency from batch job scheduling, making it impossible to achieve sub-5-minute refreshes. Option C is wrong because the streaming buffer size is not configurable by users; BigQuery manages it automatically, and increasing it would not speed up materialized view refresh.

Full explanation →

187

MCQmedium

A Cloud SQL for PostgreSQL database experiences lock contention during heavy concurrent writes on a single table. Which schema design change can most effectively reduce contention?

A.Deploy read replicas to offload reads

B.Use a connection pooler like PgBouncer

C.Create materialized views for read queries

D.Partition the table by a key that spreads write load

AnswerD

Partitioning reduces lock contention by distributing writes.

Why this answer

Option C is correct: table partitioning splits data into smaller physical pieces, reducing lock conflicts because writes target different partitions. Option A (read replicas) does not reduce write contention. Option B (connection pooling) improves connection management but not locking.

Option D (materialized views) does not affect write locking.

Full explanation →

188

MCQhard

You have a BigQuery table with billions of rows partitioned by date and clustered on country. Users frequently query the table to compute total sales by product for a specific month. The product field has high cardinality (millions of distinct values). Which optimization would improve query performance the most?

A.Use a wildcard table pattern to query across date partitions

B.Re-cluster the table with product as the first clustering column

C.Partition by product

D.Keep the current clustering on country

AnswerB

Clustering on product improves aggregation performance by grouping data physically.

Why this answer

Option C is correct: re-clustering the table with product as the first clustering column ensures that the aggregation benefits from clustering, as the query groups by product and filters on date. The current clustering on country is not used in the query, so it provides no benefit. Option A keeps the current clustering, which is ineffective.

Option B is not possible because BigQuery only supports time-unit or integer range partitioning. Option D uses wildcard tables, which doesn't help with performance.

Full explanation →

189

MCQeasy

Your Cloud SQL for MySQL instance is experiencing unusually high disk usage. You need to identify the cause. Which metric should you monitor in Cloud Monitoring?

A.InnoDB row reads

B.CPU utilization

C.Query latency

D.Binary log disk usage

AnswerD

Binary logs can consume significant disk space.

Why this answer

Binary logs in MySQL store all data changes (e.g., INSERT, UPDATE, DELETE) and can consume significant disk space, especially under heavy write workloads or if retention is misconfigured. Monitoring 'Binary log disk usage' directly reveals whether accumulated binary logs are the primary cause of high disk usage, which is a common issue in Cloud SQL for MySQL.

Exam trap

The trap here is that candidates may confuse performance metrics (like CPU or latency) with storage metrics, overlooking the direct disk space impact of binary logs in MySQL replication or backup configurations.

How to eliminate wrong answers

Option A is wrong because InnoDB row reads measure the number of rows read from the InnoDB storage engine, which indicates query activity but does not directly correlate with disk space consumption. Option B is wrong because CPU utilization reflects processing load, not storage usage; high CPU could be a symptom of inefficient queries but does not explain disk usage. Option C is wrong because query latency measures the time taken to execute queries, which can be affected by disk I/O but does not identify the specific cause of high disk usage.

Full explanation →

190

MCQmedium

A company is using BigQuery and needs to implement row-level security so that sales representatives only see their own region's data. Which approach?

A.Use BigQuery column-level security to filter by region

B.Create separate tables for each region and union in views

C.Use authorized views with WHERE clause filtering by session user's region

D.Use IAM conditions at the dataset level

AnswerC

Authorized views can apply row-level filters using SESSION_USER() and a mapping table, ensuring users only see their data.

Why this answer

Option C is correct because BigQuery authorized views allow you to enforce row-level security by embedding a WHERE clause that filters data based on the session user's region (e.g., using SESSION_USER() or a mapping table). This ensures each sales representative sees only their own region's data without exposing the underlying tables directly.

Exam trap

Google Cloud often tests the distinction between column-level security (which restricts columns) and row-level security (which restricts rows), leading candidates to mistakenly choose column-level options when row filtering is required.

How to eliminate wrong answers

Option A is wrong because BigQuery column-level security restricts access to specific columns, not rows; it cannot filter by region values. Option B is wrong because creating separate tables per region and unioning them in views is unscalable, violates data normalization, and does not dynamically filter by the current user. Option D is wrong because IAM conditions at the dataset level control access to entire datasets or tables, not individual rows within a table.

Full explanation →

191

Multi-Selecthard

Which TWO techniques can help avoid hot spotting in a Cloud Spanner table?

Select 2 answers

A.Add a hash of the primary key as the first part of the key

B.Use a monotonically increasing integer as the key

C.Use interleaved tables to distribute writes

D.Create a secondary index on a high-cardinality column

E.Use a random prefix or UUID as the first key column

AnswersA, E

Hash prefix evenly distributes writes.

Why this answer

Option A is correct: hash prefix distributes writes across splits. Option B is correct: using a random prefix also spreads writes, though hash prefix is more common. Option C is incorrect: monotonically increasing keys cause hotspotting.

Option D is incorrect: interleaving does not prevent hotspotting on the parent key. Option E is incorrect: secondary indexes can cause their own hotspotting.

Full explanation →

192

Multi-Selecteasy

Which TWO actions can help reduce connection overhead in a Cloud SQL for MySQL instance? (Choose two.)

Select 2 answers

A.Disable SSL encryption for database connections.

B.Enable automatic connection management in the application driver.

C.Use Cloud SQL Proxy or a connection pooler.

D.Increase the max_connections parameter.

E.Add a read replica to handle connect requests.

AnswersB, C

Automatic pooling reduces connection creation.

Why this answer

Options B and D are correct. Using a connection pooler (B) like Cloud SQL Proxy reduces connection churn. Enabling automatic connection management (D) also helps.

Option A is wrong because increasing max connections does not reduce overhead. Option C is wrong because SSL may increase overhead. Option E is wrong because read replicas do not reduce connection overhead on the primary.

Full explanation →

193

MCQhard

A company has a Cloud SQL instance with a 500 GB database. They need to perform a major version upgrade from MySQL 5.7 to 8.0 with minimal downtime. Which strategy should they use?

A.Create a read replica with new version, then promote

B.Export and import the database

C.Use in-place upgrade via gcloud command

D.Upgrade during maintenance window

AnswerA

A read replica can be created with MySQL 8.0 and promoted after sync, minimizing downtime to a failover moment.

Why this answer

Creating a read replica with the new MySQL version and then promoting it minimizes downtime because replication keeps the replica synchronized with the primary until promotion. This approach avoids the lengthy export/import process and reduces the risk of data loss or extended unavailability.

Exam trap

The trap here is that candidates may think an in-place upgrade (Option C) is possible with a simple gcloud command, but Cloud SQL requires a replica-based approach for major version changes to ensure minimal downtime and data integrity.

How to eliminate wrong answers

Option B is wrong because exporting and importing a 500 GB database would take hours, causing significant downtime and potential data inconsistency. Option C is wrong because Cloud SQL does not support in-place major version upgrades via gcloud; the gcloud command can only trigger a database flag change or minor version upgrade, not a major version jump. Option D is wrong because upgrading during a maintenance window still requires a database restart and may involve a lengthy migration process, not minimizing downtime as effectively as a replica promotion.

Full explanation →

194

MCQmedium

A company collects sensor data from millions of devices globally. Each write is a small record with a device ID and timestamp. They need low write latency and high availability. Which database should they choose?

A.Cloud Firestore

B.Cloud Memorystore

C.Cloud SQL for MySQL

D.Cloud Bigtable

AnswerD

Bigtable excels at high write throughput, low latency, and is ideal for IoT time-series data.

Why this answer

Cloud Bigtable is a fully managed, scalable NoSQL database designed for large analytical and operational workloads with high throughput and low latency. It supports millions of writes per second from globally distributed devices, provides high availability through replication, and is optimized for time-series data like sensor records with device IDs and timestamps.

Exam trap

Google Cloud often tests the misconception that any NoSQL database is suitable for high-throughput writes, but candidates must distinguish between document stores (Firestore) and wide-column stores (Bigtable) designed for massive write scalability.

How to eliminate wrong answers

Option A is wrong because Cloud Firestore is a document-oriented NoSQL database optimized for mobile and web app real-time sync, not for high-throughput sensor ingestion from millions of devices; it has write limits per database that cannot match Bigtable's scale. Option B is wrong because Cloud Memorystore is an in-memory cache (Redis/Memcached) that does not provide durable storage or the ability to store large volumes of historical sensor data. Option C is wrong because Cloud SQL for MySQL is a relational database with limited write throughput and scaling constraints (e.g., read replicas, not horizontal sharding), making it unsuitable for millions of concurrent small writes from global devices.

Full explanation →

195

MCQmedium

A data engineer is writing a SQL query in BigQuery to calculate the running total of sales per product over time. The table 'sales' has columns product_id, sale_date, and amount. The result must include the cumulative sum ordered by sale_date for each product. Which SQL construct should be used?

A.GROUP BY product_id, sale_date with SUM(amount)

B.SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

C.ROW_NUMBER() OVER (ORDER BY sale_date)

D.LAG(amount, 1, 0) OVER (ORDER BY sale_date)

AnswerB

This window function correctly computes a running total per product.

Why this answer

Option B is correct because it uses a window function with a PARTITION BY clause to reset the running total per product and an ORDER BY with a ROWS frame to compute the cumulative sum over time. This is the standard SQL construct in BigQuery for calculating running totals within partitions.

Exam trap

Google Cloud often tests the distinction between aggregate functions with GROUP BY and window functions with OVER, where candidates mistakenly choose GROUP BY thinking it produces a running total, but it only collapses rows.

How to eliminate wrong answers

Option A is wrong because GROUP BY with SUM(amount) aggregates sales into a single total per product and date, not a running cumulative sum over time. Option C is wrong because ROW_NUMBER() assigns sequential row numbers but does not compute any sum or cumulative value. Option D is wrong because LAG() accesses a previous row's value but does not accumulate sums across rows.

Full explanation →

196

Multi-Selectmedium

You are designing a highly available Cloud SQL for MySQL architecture. Which TWO components are essential?

Select 2 answers

A.High availability (HA) configuration with a standby zone

B.Regional persistent disk

C.Point-in-time recovery (PITR)

D.Read replicas in the same region

E.Cross-region replication

AnswersA, B

HA provides automatic failover to a standby instance in another zone.

Why this answer

A is correct because Cloud SQL for MySQL High Availability (HA) configuration provisions a standby instance in a different zone within the same region, enabling automatic failover with minimal downtime. B is correct because regional persistent disks replicate data synchronously across two zones, ensuring data durability and availability even if an entire zone fails, which is a prerequisite for HA.

Exam trap

The trap here is that candidates often confuse high availability features (like automatic failover and zone redundancy) with disaster recovery or backup features (like PITR or cross-region replication), leading them to select options that improve data protection but do not ensure continuous uptime within a single region.

Full explanation →

197

MCQhard

Refer to the exhibit. This DDL is used to create a table in Cloud Spanner. The table will be used for storing user data with high write throughput. What is one performance issue with this table design?

A.The primary key is a monotonically increasing integer

B.The Name column is of type STRING(MAX)

C.There are no secondary indexes

D.The table is not interleaved with another table

AnswerA

Sequential keys cause write hotspots in Spanner, leading to uneven load.

Why this answer

Option A is correct because a monotonically increasing primary key (UserId) causes hot spots as all writes concentrate on the last split. Option B (STRING(MAX)) is acceptable but not a performance issue. Option C (no indexes) is not required for write throughput.

Option D (not interleaved) is irrelevant.

Full explanation →

198

MCQhard

Your Spanner instance is running a workload with high read throughput. You notice that read latency has increased significantly. Upon investigating, you find that the instance is experiencing high CPU utilization on the Spanner nodes. The workload consists of many small point lookups (reads by primary key). Which action is most likely to reduce read latency?

A.Add secondary indexes on the primary key columns.

B.Use stale reads with a timestamp bound to allow reads from replicas.

C.Redesign the schema to use interleaved tables.

D.Reduce the number of Spanner nodes to lower CPU overhead.

AnswerB

Stale reads can be served from any replica, reducing load on the leader and improving latency.

Why this answer

B is correct because stale reads allow Cloud Spanner to serve read requests from read-only replicas, which offloads CPU-intensive processing from the leading replica nodes. By using a timestamp bound (e.g., exact_staleness or max_staleness), the workload can tolerate slightly outdated data, reducing the load on the nodes and lowering read latency for point lookups.

Exam trap

Google Cloud often tests the misconception that reducing nodes or adding indexes always improves performance, but in Spanner, reducing nodes starves CPU capacity and secondary indexes increase write overhead, while stale reads directly offload the leader's CPU.

How to eliminate wrong answers

Option A is wrong because adding secondary indexes on primary key columns does not reduce CPU utilization on Spanner nodes; it adds additional index maintenance overhead and may increase write latency without addressing the high read CPU issue. Option C is wrong because redesigning the schema to use interleaved tables improves locality for parent-child relationships but does not directly reduce CPU load from high-throughput point lookups; it may even increase CPU usage due to more complex splits. Option D is wrong because reducing the number of Spanner nodes decreases total CPU capacity, which would likely increase CPU utilization per node and worsen read latency, not reduce it.

Full explanation →

199

Matchingmedium

Match each Cloud Spanner replication type to its purpose.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Participates in writes and reads

Serves reads but not writes

Participates in voting but not data storage

Replication within a single region for low latency

Global replication for higher availability

Why these pairings

These replica types support different availability and performance needs.

Full explanation →

200

Multi-Selectmedium

Which TWO of the following are valid ways to improve the performance of a BigQuery query that joins two large tables?

Select 2 answers

A.Apply WHERE clauses to filter each table before the join.

B.Create a materialized view that pre-joins the tables.

C.Use the 'JOIN EACH' clause.

D.Denormalize the tables into a single table.

E.Set the query option 'USE_CACHE=TRUE'.

AnswersA, B

Reducing data before joining improves performance.

Why this answer

Option A is correct because applying WHERE clauses before the join (e.g., using subqueries or CTEs to pre-filter each table) reduces the amount of data shuffled and processed during the join phase. BigQuery's query engine can push down filters to the storage layer, minimizing the bytes read and improving performance significantly.

Exam trap

Google Cloud often tests the misconception that 'JOIN EACH' is still required for large joins, when in fact it is a deprecated syntax and modern BigQuery handles large joins automatically without any special clause.

Full explanation →

201

MCQeasy

A Cloud SQL instance is running low on disk space. You want to increase disk size with zero downtime. Which action should you take?

A.Stop the instance, increase disk, and restart

B.Export data, create a new instance with larger disk, and import

C.Create a clone with larger disk and failover

D.Use gcloud sql instances patch to increase the storage size

AnswerD

This operation increases storage online without restarting the instance.

Why this answer

Option D is correct because Cloud SQL supports online storage increases without downtime. The `gcloud sql instances patch` command (or the equivalent Console/API operation) dynamically adds more disk capacity to the running instance while it continues serving traffic. This is possible because Cloud SQL uses persistent disks that can be resized online, and the database engine (e.g., MySQL, PostgreSQL) does not need to restart to recognize the additional space.

Exam trap

Google Cloud often tests the misconception that any storage change requires a restart or rebuild, but Cloud SQL's online disk resize is a key differentiator that candidates must remember to avoid picking downtime-inducing options like stopping the instance or recreating it.

How to eliminate wrong answers

Option A is wrong because stopping the instance causes downtime, which violates the zero-downtime requirement. Option B is wrong because exporting and importing data is a manual, time-consuming process that also requires downtime during the cutover, and it is not the recommended method for simply increasing disk size. Option C is wrong because creating a clone with a larger disk and then failing over still involves a brief interruption during the failover process, and it is an unnecessarily complex approach when a direct online resize is available.

Full explanation →

202

MCQeasy

A BI developer needs to write a query that calculates total sales by month for the current year. They create a Common Table Expression (CTE) to define monthly aggregates, then reference it in a final SELECT. What is the main benefit of using a CTE over a subquery in this scenario?

A.CTEs are always faster than subqueries.

B.CTEs reduce the amount of memory used by the query.

C.CTEs automatically cache results for subsequent queries.

D.CTEs enhance query readability and maintainability.

AnswerD

CTEs allow you to break down complex queries into named steps.

Why this answer

Option D is correct because CTEs improve query readability and maintainability by allowing you to define a named temporary result set once and reference it multiple times in the final SELECT. In this scenario, the CTE clearly separates the monthly aggregation logic from the final output, making the query easier to understand and modify compared to nesting subqueries.

Exam trap

Google Cloud often tests the misconception that CTEs provide performance benefits like caching or reduced memory, when in fact their primary advantage is structural clarity and reusability within a single query.

How to eliminate wrong answers

Option A is wrong because CTEs are not inherently faster than subqueries; performance depends on the query optimizer and execution plan, and in many cases CTEs are not materialized or optimized differently. Option B is wrong because CTEs do not reduce memory usage; in fact, a CTE that is referenced multiple times may be re-evaluated each time unless the database engine materializes it, potentially increasing memory and CPU usage. Option C is wrong because CTEs do not automatically cache results for subsequent queries; they are scoped to a single statement and are not persisted or shared across separate executions.

Full explanation →

203

MCQhard

A company uses Cloud SQL for PostgreSQL to store transactional data and BigQuery for analytics. They need to sync a subset of tables from Cloud SQL to BigQuery daily for BI reporting. The tables are updated incrementally (INSERT, UPDATE, DELETE). Which approach is MOST reliable and cost-effective?

A.Use Datastream to stream changes from Cloud SQL to BigQuery in near real-time.

B.Write a custom cron job on App Engine to extract changes and load them into BigQuery.

C.Create BigQuery federated queries that directly read from Cloud SQL.

D.Export the Cloud SQL tables to Cloud Storage as CSV files daily, then load them into BigQuery.

AnswerA

Datastream is a managed CDC service that handles incremental changes efficiently.

Why this answer

Datastream is purpose-built for exactly this use case: it captures CDC (Change Data Capture) events from Cloud SQL for PostgreSQL (using the PostgreSQL logical replication slot and the pgoutput plugin) and streams them directly into BigQuery via a streaming ingestion pipeline. This approach handles INSERT, UPDATE, and DELETE operations reliably without custom code, and it is cost-effective because it avoids full table exports and leverages BigQuery's streaming buffer for near-real-time updates.

Exam trap

Google Cloud often tests the misconception that batch exports (Option D) are the simplest and most reliable approach, but the trap here is that incremental CDC with Datastream is actually more reliable and cost-effective for tables with frequent updates and deletes, because it avoids full table scans and manual change tracking.

How to eliminate wrong answers

Option B is wrong because a custom cron job on App Engine would require implementing complex change tracking (e.g., using timestamps or triggers) and cannot reliably capture DELETE operations without additional overhead, making it less reliable and more costly to maintain. Option C is wrong because BigQuery federated queries read from Cloud SQL directly at query time, which bypasses BigQuery's storage and performance optimizations, incurs high latency, and is not suitable for daily syncing or handling incremental changes. Option D is wrong because daily full CSV exports are inefficient for incrementally updated tables (they waste storage and compute on unchanged rows), cannot capture DELETEs without additional logic, and the daily batch load introduces a 24-hour delay, which is less reliable and more expensive than streaming CDC.

Full explanation →

204

MCQhard

A Database Engineer is designing a tiered storage strategy for a large BigQuery dataset. The dataset contains data that is accessed frequently for the first 30 days, moderately for the next 6 months, and rarely after that. The engineer wants to minimize overall storage cost while maintaining fast query performance on recent data. Which approach should the engineer take?

A.Use logical storage billing and partition the table by date; older data automatically moves to long-term storage after 90 days.

B.Use Bigtable for recent data and BigQuery for historical data, with a scheduled transfer between them.

C.Export data older than 30 days to Cloud Storage as Avro files and delete them from BigQuery; query them externally when needed.

D.Use physical storage billing, partition the table by ingestion date, cluster on frequently used columns, and set an expiration on partitions older than 6 months to move them to long-term storage.

AnswerD

Physical billing with long-term storage after 90 days of no modifications reduces costs. Partitioning and clustering optimize performance on recent data.

Why this answer

Option D is correct because physical storage billing in BigQuery charges separately for active and long-term storage, and setting a partition expiration on partitions older than 6 months automatically moves that data to long-term storage at a lower cost. Partitioning by ingestion date and clustering on frequently used columns ensures fast query performance on recent data while minimizing overall storage costs.

Exam trap

The trap here is that candidates confuse logical and physical storage billing, assuming that logical billing automatically provides long-term storage discounts, when in fact only physical billing has separate active and long-term storage tiers with different pricing.

How to eliminate wrong answers

Option A is wrong because logical storage billing does not have a separate long-term storage tier; it uses a single blended rate, and the 90-day automatic move to long-term storage only applies with physical billing, not logical. Option B is wrong because Bigtable is optimized for real-time, low-latency access and high write throughput, not for cost-effective storage of historical data; using it for recent data adds unnecessary complexity and cost without leveraging BigQuery's native partitioning and long-term storage features. Option C is wrong because exporting data to Cloud Storage as Avro and querying it externally with BigQuery incurs query costs on external data sources and loses the performance benefits of native BigQuery partitioning, clustering, and automatic long-term storage pricing.

Full explanation →

205

MCQeasy

A company needs to store raw event logs for future BI analysis. The logs are semistructured with varying fields. Which BigQuery data type should they use to store the event payload?

A.ARRAY

B.STRING

C.FLOAT64

D.JSON

AnswerD

JSON type allows storing and querying semistructured data with nested fields.

Why this answer

Option D is correct because BigQuery's JSON data type is designed to store semistructured data with varying fields, such as raw event logs. It allows schema flexibility, efficient querying of nested fields using JSON functions like `JSON_EXTRACT`, and avoids the need to predefine a rigid schema, which is ideal for BI analysis of event payloads.

Exam trap

Google Cloud often tests the misconception that STRING is sufficient for semistructured data, but the trap is that STRING lacks native querying capabilities and incurs higher costs for parsing, whereas JSON provides built-in functions and better performance for BI workloads.

How to eliminate wrong answers

Option A is wrong because ARRAY is used to store ordered lists of elements of the same data type, not for semistructured payloads with varying fields. Option B is wrong because STRING would store the payload as a plain text blob, losing the ability to query individual fields without complex parsing and increasing storage and processing overhead. Option C is wrong because FLOAT64 is a numeric data type for floating-point numbers, completely unsuitable for storing event payloads that contain diverse field types.

Full explanation →

206

MCQmedium

The query returns results but takes a long time. The orders table has 500M rows with order_date as a timestamp and revenue as float. How can the query be optimized?

A.Add a clustering key on order_date.

B.Partition the table by month on order_date.

C.Use a wildcard table over multiple date-sharded tables.

D.Use a materialized view that caches the query result.

AnswerB

Partition pruning limits data scanned to relevant months.

Why this answer

Partitioning the table by month on order_date (Option B) is correct because it physically separates the data into monthly partitions, allowing the query engine to prune partitions that do not match the query's time range. This dramatically reduces the amount of data scanned, which is the primary cause of slow performance on a 500M-row table. In BigQuery, partitioning by a timestamp column like order_date is a native, cost-effective optimization that directly addresses the scan bottleneck.

Exam trap

The trap here is that candidates often confuse clustering with partitioning, assuming that sorting data (clustering) provides the same scan reduction as physically separating data (partitioning), but clustering only improves block pruning within already-scanned data, not the initial scan elimination.

How to eliminate wrong answers

Option A is wrong because adding a clustering key on order_date does not physically separate data into independent storage blocks; it only sorts data within existing partitions or the entire table, so it cannot reduce the amount of data scanned as effectively as partitioning. Option C is wrong because using a wildcard table over multiple date-sharded tables is a legacy approach that requires manual table management and incurs additional overhead for query planning and metadata operations, whereas native partitioning is simpler and more performant. Option D is wrong because a materialized view caches the query result but does not reduce the scan cost for the base table; it is useful for repeated aggregations, not for optimizing a single ad-hoc query that filters by order_date.

Full explanation →

207

Multi-Selecteasy

A company is planning to migrate a self-managed MongoDB database to a fully managed Google Cloud service. They need to maintain high availability and support complex queries with aggregation pipelines. Which TWO services should they consider?

Select 2 answers

A.MongoDB Atlas on Google Cloud

B.Cloud Spanner

C.Cloud Firestore

D.Cloud Bigtable

E.Cloud SQL for PostgreSQL with JSON data type

AnswersA, E

MongoDB Atlas is a fully managed MongoDB service that can run on Google Cloud and provides high availability.

Why this answer

MongoDB Atlas on Google Cloud is a fully managed MongoDB service that provides native support for MongoDB's aggregation pipeline and high availability through replica sets and automated failover. It is the direct migration path for a self-managed MongoDB database to a managed service without changing the underlying database engine.

Exam trap

Google Cloud often tests the misconception that any managed NoSQL service (like Firestore or Bigtable) can replace MongoDB, or that a relational database with JSON support (like Cloud SQL for PostgreSQL) is a drop-in replacement for MongoDB's aggregation pipeline, when in fact the pipeline's native operators and performance characteristics are unique to MongoDB.

Full explanation →

208

MCQhard

Your Cloud Spanner database is experiencing a high volume of read-write conflicts, causing many aborts and high latency. You have already increased the compute capacity. Upon analyzing the schema, you find that most conflicts occur on a single table with frequent updates to the same row. What index or schema change would most effectively reduce contention?

A.Enable batch writes in the client library to combine updates.

B.Use a monotonically increasing timestamp as a prefix to the primary key.

C.Add a secondary index on the most frequently updated column.

D.Change the primary key to include a hash prefix of the original key.

AnswerD

A hash prefix distributes writes evenly across splits, reducing row-level contention.

Why this answer

Option D is correct because adding a hash prefix to the primary key distributes writes across multiple splits, reducing hot-spotting on a single row. Cloud Spanner uses range-based splits; without a hash prefix, monotonically increasing keys cause all writes to target the same split, leading to read-write conflicts and aborts. A hash prefix randomizes the key distribution, spreading load across nodes and minimizing contention.

Exam trap

Google Cloud often tests the misconception that secondary indexes or batching solve hot-spotting, when the root cause is key distribution; candidates must recognize that only changing the primary key structure (e.g., hash prefix) directly addresses split-level contention.

How to eliminate wrong answers

Option A is wrong because batch writes combine multiple mutations into a single request but do not change the underlying key distribution; if all writes target the same row, batching still causes contention on that row. Option B is wrong because using a monotonically increasing timestamp as a prefix to the primary key exacerbates hot-spotting, as all new writes go to the same split, increasing contention rather than reducing it. Option C is wrong because adding a secondary index on the frequently updated column does not affect the primary key distribution; it may even increase write overhead and contention due to index maintenance.

Full explanation →

209

MCQhard

A BI team uses BigQuery to report on customer orders. The 'customers' dimension table is updated nightly with Type 2 Slowly Changing Dimensions (SCD). However, some reports show incorrect historical aggregates because the fact table references only the current customer key. Which approach resolves this issue?

A.Update the fact table nightly to replace old customer keys with the current key

B.Store the surrogate customer key from the dimension table in the fact table at transaction time

C.Denormalize customer attributes into the fact table

D.Use the natural customer ID in the fact table and join with the dimension using a BETWEEN condition on effective dates

AnswerB

This ensures the fact always points to the correct version of the customer.

Why this answer

Option B is correct because with Type 2 SCD, each customer row has a unique surrogate key that represents a specific version of the customer's attributes over time. Storing that surrogate key in the fact table at transaction time ensures that historical facts are permanently linked to the correct customer attributes as they existed at the time of the order. This prevents incorrect aggregates when the dimension table is updated, as the fact table will always join to the precise version of the customer record that was active when the transaction occurred.

Exam trap

Google Cloud often tests the misconception that updating the fact table with current keys (Option A) is acceptable for Type 2 SCD, when in reality it silently converts the design to Type 1 and destroys historical accuracy.

How to eliminate wrong answers

Option A is wrong because updating the fact table nightly to replace old customer keys with the current key destroys historical accuracy, effectively converting the Type 2 SCD into a Type 1 overwrite and breaking the ability to report on past customer attributes. Option C is wrong because denormalizing customer attributes into the fact table duplicates data, increases storage costs, and requires updating all historical fact rows whenever a customer attribute changes, which is impractical and error-prone in BigQuery's append-heavy architecture. Option D is wrong because using the natural customer ID with a BETWEEN condition on effective dates is a valid approach for Type 2 SCDs, but it requires the fact table to store the transaction timestamp; the question states the fact table references only the current customer key, so this option does not resolve the issue without also modifying the fact table schema to include a timestamp.

Full explanation →

210

MCQhard

Your team uses Cloud SQL for PostgreSQL and needs to run a one-time data correction query that will update 10 million rows. The instance has 8 vCPUs and 30 GB memory. The query is currently running for hours and impacting production performance. What should you do?

A.Create a read replica and run the update on the replica.

B.Create a clone of the instance, run the update on the clone, then promote it.

C.Run the query during off-peak hours with reduced concurrency.

D.Increase the instance size to 16 vCPUs and 60 GB memory before running the query.

AnswerB

Cloning provides an isolated environment for the update.

Why this answer

Option B is correct because creating a clone of the Cloud SQL instance provides an isolated environment where the heavy UPDATE can run without impacting production performance. After the update completes on the clone, you can promote it to become the new primary instance, effectively applying the data correction with minimal downtime. This approach avoids the performance degradation caused by running the query on the production instance and leverages Cloud SQL's cloning feature for a one-time data correction.

Exam trap

Google Cloud often tests the misconception that read replicas can handle write operations, but in Cloud SQL for PostgreSQL, replicas are strictly read-only and cannot be used for UPDATE queries.

How to eliminate wrong answers

Option A is wrong because Cloud SQL read replicas are read-only and cannot execute UPDATE, INSERT, or DELETE statements; they are designed for offloading read traffic, not for write operations. Option C is wrong because running the query during off-peak hours with reduced concurrency still executes the update on the production instance, which will continue to consume significant CPU, memory, and I/O resources, degrading performance for any concurrent production queries. Option D is wrong because increasing the instance size to 16 vCPUs and 60 GB memory only provides more resources but does not eliminate the performance impact on the production instance; the query will still contend with production workloads for the same database engine and storage, and scaling up is not a cost-effective or risk-free solution for a one-time operation.

Full explanation →

211

MCQeasy

You are managing a Cloud SQL for MySQL instance that supports a web application. Recently, users have reported that the application is responding slowly during peak hours. You examine the Query Insights dashboard and see that a specific query is running frequently and has a high execution time. The query involves JOINs on three tables, each with tens of thousands of rows. The query plan shows a full table scan on two tables. What should you do first to improve performance?

A.Enable the query cache flag in Cloud SQL database flags.

B.Increase the instance size to provide more memory and CPU.

C.Add indexes on the columns used in JOIN conditions.

D.Rewrite the query to use subqueries instead of JOINs.

AnswerC

Indexes will allow the database to avoid full table scans, significantly reducing query execution time.

Why this answer

Option C is correct: Adding appropriate indexes on the join columns will reduce full table scans, which is the most effective immediate action. Option A (increasing instance size) may help but does not address the root cause. Option B (enabling database flags like query cache) is less effective and query cache is deprecated in newer versions.

Option D (rewriting the query) could help but is more complex and time-consuming; indexing is usually the first step.

Full explanation →

212

MCQeasy

A team is designing a disaster recovery plan for a Cloud SQL for PostgreSQL instance. The RPO is 5 minutes and RTO is 1 hour. Which configuration meets these requirements?

A.Schedule daily exports to Cloud Storage and import in another region

B.Enable automatic failover within the same region using HA configuration

C.Create a read replica in the same region and promote it during disaster

D.Configure a cross-region read replica with point-in-time recovery (PITR) to enable failover

AnswerD

Cross-region replica provides region failover; PITR allows recovery to any point within the backup window.

Why this answer

Option D meets the RPO of 5 minutes and RTO of 1 hour because a cross-region read replica with point-in-time recovery (PITR) allows you to failover to a replica in another region, minimizing data loss to within seconds (PITR can recover to any point in time within the retention window) and promoting the replica typically completes within minutes, well under the 1-hour RTO. This configuration provides both disaster recovery across regions and granular recovery to meet the strict RPO.

Exam trap

Google Cloud often tests the distinction between high availability (HA) within a region and disaster recovery (DR) across regions, and the trap here is that candidates confuse automatic failover in the same region (Option B) with cross-region DR, failing to realize that HA does not protect against a full regional outage.

How to eliminate wrong answers

Option A is wrong because daily exports to Cloud Storage have an RPO of up to 24 hours (the time between exports), far exceeding the 5-minute requirement, and importing to another region would take much longer than the 1-hour RTO. Option B is wrong because automatic failover within the same region using HA configuration protects against zonal failures but does not protect against a regional disaster, so it cannot meet the cross-region DR requirement. Option C is wrong because a read replica in the same region cannot survive a regional outage; promoting it still leaves you in the same failed region, violating the DR need for geographic separation.

Full explanation →

213

MCQhard

A company is using Cloud Spanner to manage financial transactions. The current schema has a single table 'Transactions' with a composite primary key (account_id, transaction_timestamp). The company frequently queries the latest transaction for each account. This query pattern is causing full table scans. Which schema design change would most improve query performance?

A.Add a secondary index on (account_id, transaction_timestamp DESC)

B.Change the primary key to (transaction_timestamp, account_id) and use interleaving

C.Create a separate 'LatestTransaction' table keyed by account_id, and update it whenever a new transaction occurs

D.Add a 'is_latest' boolean column to the Transactions table and index it

AnswerC

Enables direct point reads for the latest transaction.

Why this answer

Option C is correct because it eliminates the need to scan the entire Transactions table to find the latest transaction per account. By maintaining a separate LatestTransaction table keyed by account_id, each account's latest transaction can be retrieved with a single point read. This is a classic denormalization pattern in Cloud Spanner that avoids the overhead of scanning or sorting large datasets for 'latest per group' queries.

Exam trap

Google Cloud often tests the misconception that a secondary index with DESC ordering can efficiently retrieve the latest row per group, but in Cloud Spanner, secondary indexes do not support 'top-N per group' without scanning all index entries for each group.

How to eliminate wrong answers

Option A is wrong because a secondary index on (account_id, transaction_timestamp DESC) would still require a full index scan to find the latest transaction per account, as Cloud Spanner secondary indexes do not support 'latest per group' without scanning all rows for each account. Option B is wrong because changing the primary key to (transaction_timestamp, account_id) would scatter rows for the same account across splits, making per-account queries inefficient and requiring a full table scan to gather all rows for a single account. Option D is wrong because adding an 'is_latest' boolean column and indexing it would require updating all previous rows for an account on every insert to set is_latest=false, which is both expensive and prone to race conditions in a distributed database like Cloud Spanner.

Full explanation →

214

MCQeasy

A Cloud SQL for MySQL instance is experiencing increased replica lag. The write workload is constant. What is the most likely cause?

A.Binary logs are being deleted too frequently on the primary

B.The query cache is enabled on the primary

C.The replica is on a higher machine tier than the primary

D.A long-running query is executing on the replica

AnswerD

Long queries delay the SQL thread from applying changes.

Why this answer

A long-running query on the replica can block the SQL thread from applying relay log events, causing replica lag to increase even if the primary's write workload is constant. This is because replication is single-threaded by default in MySQL, so one slow query stalls all subsequent events until it completes.

Exam trap

Google Cloud often tests the misconception that replica lag is always caused by primary-side issues (like binary log deletion or query cache), when in fact the replica's own processing bottlenecks are a frequent root cause.

How to eliminate wrong answers

Option A is wrong because deleting binary logs too frequently on the primary does not directly cause replica lag; it can cause replication errors if the replica hasn't yet processed the logs, but lag itself is driven by apply delays, not log retention. Option B is wrong because the query cache is deprecated in MySQL 8.0 and, even when enabled, it caches SELECT results on the primary, not affecting replication lag. Option C is wrong because a higher machine tier on the replica would reduce, not increase, replica lag by providing more resources to apply changes faster.

Full explanation →

215

MCQmedium

A company is designing a Cloud Spanner database for a global user base. They need to support strong consistency and low-latency reads across multiple regions. Which schema design practice is most important?

A.Denormalize data into wide tables to reduce the number of joins.

B.Use interleaved tables to co-locate related rows that are queried together.

C.Use a single table with composite primary key to avoid joins.

D.Create secondary indexes on every column to optimize read queries.

AnswerB

Interleaving ensures parent and child rows are stored on the same split, reducing latency for joins.

Why this answer

Option D is correct because interleaving tables that are frequently joined together into a parent-child hierarchy allows Spanner to co-locate the data, reducing cross-node communication and latency. Option A is wrong because using a single monolithic table would not scale. Option B is wrong because denormalization can increase write latency and complexity.

Option C is wrong because indexing all columns leads to unnecessary overhead.

Full explanation →

216

MCQeasy

A company needs a cross-region disaster recovery solution for their Cloud SQL MySQL database. Which feature should they use?

A.Read replicas in the same region.

B.Cloud SQL for MySQL does not support cross-region replication.

C.Cross-region replication using external replicas.

D.Use Database Migration Service to continuously copy data.

AnswerC

An external replica in a different region can serve as a disaster recovery target.

Why this answer

Option C is correct because Cloud SQL for MySQL does not natively support cross-region replication, but you can achieve it by configuring an external replica (a MySQL instance running outside Cloud SQL, such as on Compute Engine) that uses MySQL's native binary log (binlog) replication from the primary Cloud SQL instance. This setup allows you to maintain a standby database in a different region for disaster recovery, with the external replica continuously applying changes from the primary.

Exam trap

The trap here is that candidates assume Cloud SQL for MySQL has a built-in cross-region replica feature like Cloud SQL for PostgreSQL or Spanner, but it does not, leading them to incorrectly select Option B or D.

How to eliminate wrong answers

Option A is wrong because read replicas in the same region do not provide cross-region disaster recovery; they only offload read traffic within the same region and cannot survive a regional outage. Option B is wrong because Cloud SQL for MySQL does support cross-region replication indirectly through external replicas, making the absolute statement 'does not support' incorrect. Option D is wrong because Database Migration Service is designed for one-time migrations, not continuous cross-region replication; it does not maintain an ongoing sync for disaster recovery.

Full explanation →

217

MCQeasy

You run the above command to create a Spanner instance. Later, you need to increase the instance's compute capacity to handle higher traffic. What is the correct approach?

A.Use the console to change the instance configuration to a larger one.

B.Delete the instance and recreate with --nodes=5.

C.Create a new instance with a larger config and migrate data.

D.Run gcloud spanner instances update test-instance --nodes=5

AnswerD

This updates the node count without recreating the instance.

Why this answer

Option D is correct because Cloud Spanner allows you to increase the compute capacity of an existing instance by updating the node count using the `gcloud spanner instances update` command. This operation is performed online without downtime, as Spanner supports live resizing of nodes to handle increased traffic.

Exam trap

The trap here is that candidates confuse changing the instance configuration (which requires migration) with scaling compute capacity by adjusting node count (which is a live, online operation).

How to eliminate wrong answers

Option A is wrong because instance configuration (e.g., regional vs. multi-regional) cannot be changed after creation; you would need to create a new instance with the desired configuration and migrate data. Option B is wrong because deleting and recreating the instance is unnecessary and causes downtime; Spanner supports live node count changes without instance deletion. Option C is wrong because creating a new instance with a larger config and migrating data is only required when changing the instance configuration (e.g., from regional to multi-regional), not when simply increasing node count within the same configuration.

Full explanation →

218

MCQeasy

A developer needs to store JSON documents that are frequently accessed by key but rarely updated. The data size is under 10 GB initially but expected to grow to 500 GB. Which database service is most suitable?

A.Firestore (Datastore mode) with document keys

B.Cloud Bigtable with row keys as document IDs

C.Memorystore for Redis with JSON data type

D.Cloud SQL for PostgreSQL with JSONB column

AnswerA

Firestore provides document store with automatic scaling and low-latency key lookups.

Why this answer

Firestore in Datastore mode is ideal because it provides a fully managed, scalable NoSQL document database with automatic sharding and strong consistency for key-based lookups. It handles growth from 10 GB to 500 GB seamlessly without manual partitioning, and its document keys enable efficient point reads for frequently accessed, rarely updated JSON data.

Exam trap

The trap here is that candidates often choose Cloud Bigtable (B) for large-scale key-value workloads, overlooking that Bigtable is designed for wide-column, high-throughput analytical access patterns, not for storing JSON documents with frequent point reads by key.

How to eliminate wrong answers

Option B is wrong because Cloud Bigtable is a wide-column NoSQL database optimized for high-throughput, low-latency analytical workloads (e.g., time-series or IoT data), not for storing and retrieving JSON documents by key; it lacks native JSON support and is overkill for this use case. Option C is wrong because Memorystore for Redis is an in-memory cache, not a persistent database; while it supports JSON data type via RedisJSON module, it is designed for ephemeral caching and cannot reliably store 500 GB of data without significant cost and data loss risk on restart. Option D is wrong because Cloud SQL for PostgreSQL with JSONB column is a relational database that requires manual sharding or read replicas to scale beyond a single instance, and it does not provide the automatic, seamless scalability to 500 GB that a NoSQL document store like Firestore offers.

Full explanation →

219

MCQmedium

An online retailer uses Cloud SQL for PostgreSQL. They need to scale for a seasonal peak. They expect 2x current traffic. Their current instance is 16 vCPU, 64 GB RAM, 1 TB storage. The peak lasts 4 hours. They want to handle it without downtime. What is the best approach?

A.Use Cloud Spanner to auto-scale.

B.Upgrade to a higher-tier machine type permanently.

C.Add read replicas and rewrite queries to use replicas for reads.

D.Increase vCPU and memory to 32 vCPU/128 GB temporarily for the peak window.

AnswerD

Vertical scaling is straightforward and temporary, minimizing cost.

Why this answer

Option A is best because Cloud SQL allows vertical scaling (increasing vCPU and RAM) with a brief downtime (restart) but usually acceptable; for 4-hour peak, temporary scaling is cost-effective. B read replicas help reads not writes. C is overkill.

D permanent upgrade is unnecessary. So A.

Full explanation →

220

MCQmedium

Your application uses Cloud SQL for PostgreSQL. You notice that the database CPU utilization has been consistently above 90% during peak hours, causing increased query latency. You have already tuned the most expensive queries. What is the most cost-effective next step?

A.Enable high availability with a failover replica.

B.Implement connection pooling via Cloud SQL Proxy or a dedicated pooler like PgBouncer.

C.Increase the machine type to a higher CPU tier.

D.Add a read replica to offload read queries.

AnswerB

Connection pooling reduces the number of active connections, lowering context switching and CPU overhead.

Why this answer

Option D is correct because reducing the number of active connections via connection pooling reduces contention and CPU overhead without requiring a more expensive instance type. Option A is wrong because read replicas help with read traffic but not write-heavy CPU usage. Option B is wrong because increasing machine type costs more and should be a last resort.

Option C is wrong because enabling high availability adds a standby but does not reduce CPU load.

Full explanation →

221

MCQmedium

A BI team finds that their BigQuery query that aggregates sales by region runs slower than expected, even with appropriate clustering and partitioning. The query filters on a date range and then groups by region. The table is partitioned by date and clustered by region. What can the team do to improve query performance without increasing cost?

A.Increase the number of clusters to include more columns.

B.Change the partition type to ingestion-time partitioning.

C.Add an ORDER BY clause to the query.

D.Use a materialized view that pre-aggregates sales by region and date.

AnswerD

Materialized views provide pre-computed results, reducing query time and data processed.

Why this answer

Option D is correct because a materialized view in BigQuery can pre-aggregate sales by region and date, allowing the query to read precomputed results instead of scanning the entire table. This reduces the amount of data processed and speeds up the query without increasing cost, as the materialized view is automatically maintained and only incremental changes are processed.

Exam trap

The trap here is that candidates often think adding more clustering columns or sorting will improve aggregation performance, but they fail to recognize that pre-aggregation via materialized views is the only option that reduces the data scanned without increasing cost.

How to eliminate wrong answers

Option A is wrong because increasing the number of clusters to include more columns does not improve performance for a query that already filters on a partitioned column and groups by a clustered column; additional clustering columns can increase write overhead and may not reduce the data scanned. Option B is wrong because changing to ingestion-time partitioning does not provide any benefit over the existing date-based partitioning; ingestion-time partitioning is typically used when no timestamp column exists, and it would not improve query performance for date-range filters. Option C is wrong because adding an ORDER BY clause does not reduce the amount of data scanned or processed; it only sorts the final result, which adds overhead without addressing the root cause of slow aggregation.

Full explanation →

222

MCQeasy

A company is building a business intelligence dashboard on BigQuery to analyze daily sales data. The table contains a TIMESTAMP column 'order_ts' and a string column 'region'. The BI team frequently filters by month and region. Which table design best optimizes query performance and cost?

A.Use a separate table for each region

B.Clustering by order_ts and region without partitioning

C.Partition the table by date (month) and cluster by region

D.Partition the table by region and cluster by order_ts

AnswerC

Partitioning on the date granularity used in filters and clustering on region minimizes scanned data.

Why this answer

Partitioning by month and clustering by region reduces the data scanned for common filters, improving performance and cost.

Full explanation →

223

MCQeasy

Your Cloud Spanner instance has high latency for point reads. The workload is evenly distributed across all nodes. Which metric should you examine first to identify the bottleneck?

A.CPU utilization per node

B.Storage utilization

C.Rows returned per second

D.Number of committed nodes

AnswerA

High CPU suggests node is overloaded, causing latency.

Why this answer

High latency for point reads in Cloud Spanner, even with evenly distributed workload, often points to CPU saturation on individual nodes. Spanner uses a shared-nothing architecture where each node handles a portion of the data and queries; if CPU utilization per node is high, it indicates that the node is overloaded, causing queuing and increased latency. This metric directly reflects processing capacity and is the first place to look for a bottleneck in point-read performance.

Exam trap

Google Cloud often tests the misconception that evenly distributed workload means no node-level bottleneck, but the trap here is that even distribution does not guarantee low latency if each node is individually under high CPU load, so candidates incorrectly focus on throughput or storage metrics instead of CPU utilization.

How to eliminate wrong answers

Option B is wrong because storage utilization measures disk space usage, not processing capacity; Spanner automatically manages storage distribution and high storage does not directly cause point-read latency. Option C is wrong because rows returned per second is a throughput metric, not a latency metric; high throughput can coexist with high latency if nodes are overloaded, so it does not identify the bottleneck. Option D is wrong because 'number of committed nodes' is not a standard Spanner metric; Spanner uses a fixed number of nodes provisioned, and committed nodes refer to a different concept (e.g., in Google Cloud commitments), not a real-time performance indicator.

Full explanation →

224

MCQhard

Refer to the exhibit. You restored a Spanner database from a backup and are checking the status of the optimize operation. The operation has been running for 15 minutes and is 45% complete. The database is already accessible but queries on it are slower than expected. What should you do?

A.Continue running queries; performance will improve once the optimize operation completes.

B.Wait for the operation to finish before allowing any queries.

C.Drop and restore the database again to start fresh.

D.Cancel the optimize operation to reduce resource usage.

AnswerA

The optimize operation rebuilds indexes and updates statistics, which improves query performance.

Why this answer

Option A is correct because Spanner's optimize operation runs asynchronously and does not block database access. Queries are slower during optimization because the operation reorganizes data and rebuilds indexes, which consumes I/O and CPU resources. Once the optimize operation completes, query performance will improve as the data layout becomes more efficient.

Exam trap

Google Cloud often tests the misconception that database maintenance operations like optimization must complete before the database is usable, but Spanner is designed for continuous availability and allows queries during such operations.

How to eliminate wrong answers

Option B is wrong because Spanner allows queries during an optimize operation; waiting is unnecessary and defeats the purpose of high availability. Option C is wrong because dropping and restoring the database would restart the optimization from scratch, wasting time and resources without any benefit. Option D is wrong because canceling the optimize operation would leave the database in a suboptimal state, and performance would remain degraded until optimization is completed or re-triggered.

Full explanation →

225

MCQeasy

A company needs to store session data for a web application that runs on Google Kubernetes Engine (GKE). The data is temporary and high-availability is required. Which database service is most appropriate?

A.Cloud Spanner

B.Cloud SQL for MySQL

C.Memorystore for Redis with replication

D.Cloud Bigtable

AnswerC

Memorystore provides a highly available in-memory cache, perfect for session data.

Why this answer

Memorystore for Redis with replication is the most appropriate choice because session data is temporary, requires high availability, and benefits from Redis's in-memory, low-latency key-value store. Replication provides failover capability, ensuring session continuity if a primary node fails, while Redis's built-in expiry (TTL) handles temporary data cleanup automatically.

Exam trap

Google Cloud often tests the distinction between 'persistent storage' and 'temporary cache'—candidates mistakenly choose Cloud SQL or Spanner for 'high availability' without recognizing that session data is ephemeral and better served by an in-memory store with replication.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database designed for OLTP workloads with horizontal scaling, not for temporary session data; its high cost and latency overhead are unnecessary for ephemeral key-value storage. Option B is wrong because Cloud SQL for MySQL is a relational database with disk-based storage, which introduces higher latency and operational overhead for session data that is better served by an in-memory cache; it also lacks native TTL-based expiry for temporary data. Option D is wrong because Cloud Bigtable is a wide-column NoSQL database optimized for large-scale analytical and time-series workloads, not for low-latency session storage; its access patterns and cost model are mismatched for transient, high-frequency read/write session data.

Full explanation →

Google Professional Cloud Database Engineer (PCDE) — Questions 151–225