Knowledge + Practice

Google Professional Cloud Database Engineer (PCDE) — Questions 451–503

503 questions total · 7pages · All types, answers revealed

Take a mock exam Exam hub

Page 7 of 7

451

MCQhard

A company uses BigQuery BI Engine for sub-second query performance. However, some queries are hitting the BI Engine memory limit. Which action should be taken?

A.Cluster the tables more granularly.

B.Increase BI Engine capacity allocation.

C.Use a reservation with a higher slot count.

D.Optimize the dimension tables by denormalizing.

AnswerB

Allocating more memory to BI Engine allows caching larger datasets.

Why this answer

BI Engine is an in-memory analysis service that accelerates queries by caching data in memory. When queries exceed the allocated memory, they spill to disk, causing performance degradation. Increasing the BI Engine capacity allocation directly addresses this by providing more memory for caching, enabling sub-second query performance for larger datasets.

Exam trap

Google Cloud often tests the misconception that increasing slot count (compute) solves memory bottlenecks, but BI Engine memory is a separate resource that must be explicitly allocated; candidates confuse slot-based reservations with in-memory caching.

How to eliminate wrong answers

Option A is wrong because clustering tables more granularly improves partition pruning and data skipping but does not increase the memory available to BI Engine; it may even increase memory pressure by creating more fine-grained data segments. Option C is wrong because a reservation with a higher slot count increases query concurrency and compute resources, not the in-memory cache size for BI Engine; slots and BI Engine memory are separate resources. Option D is wrong because denormalizing dimension tables reduces join complexity but does not expand BI Engine's memory limit; it could actually increase the data volume cached, exacerbating the memory issue.

Full explanation →

452

MCQmedium

Refer to the exhibit. The query joins two large tables and aggregates results. Which optimization would most likely reduce the high shuffle bytes in Stage 3?

A.Add a WHERE clause to filter rows before the join.

B.Ensure both tables are clustered on the join key.

C.Use a broadcast join hint to force one table to be broadcast.

D.Add an ORDER BY clause to sort the data before aggregation.

AnswerA

Filtering early reduces the data that needs to be shuffled.

Why this answer

Option A is correct because filtering data before the join reduces the amount of data shuffled. Option B is wrong because clustering on the join key reduces the shuffle but may not eliminate it. Option C is wrong because the join itself causes data movement; using a manual broadcast join might help only if one table is small, but it is not automatic.

Option D is wrong because ORDER BY is not the main cause of the shuffle; the join is.

Full explanation →

453

MCQmedium

A company uses Cloud Spanner with a schema that has a table 'Orders' with primary key (CustomerId, OrderDate, OrderId). They notice hotspots on a specific customer. Which schema change would best distribute load?

A.Use a secondary index on CustomerId.

B.Split the table into multiple tables per region.

C.Add a hash of CustomerId as a prefix to the primary key.

D.Change primary key to OrderId only.

AnswerC

Hash prefix distributes writes evenly across nodes, reducing hotspots.

Why this answer

Hotspots occur due to monotonically increasing or high-traffic keys. Adding a hash prefix to the primary key (Option A) distributes writes across nodes. Option B (OrderId only) loses ordering and may cause hotspots elsewhere.

Option C (secondary index) helps reads but not writes. Option D (split by region) is overly complex.

Full explanation →

454

MCQmedium

A BI analyst wrote a query that computes the running total of sales over time for each product. The query uses a window function with an ORDER BY clause. The results are correct, but the query processes a large amount of data and is slow. What is the most efficient way to optimize this query?

A.Use the LAG function instead of a window function.

B.Materialize the running total in a separate table using a scheduled query.

C.Use the ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame.

D.Add a PARTITION BY clause to the window function.

AnswerD

Partitioning by product limits the window operation to individual product groups, reducing sorting and shuffle.

Why this answer

Option D is correct because adding a PARTITION BY clause to the window function allows the running total to be computed independently for each product, which reduces the data set the window function must sort and aggregate over. Without PARTITION BY, the query computes a single running total across all products, forcing the database engine to process the entire table as one partition, which is inefficient for large datasets. Partitioning by product ensures that the ORDER BY and frame operations are scoped to each product group, significantly reducing memory and CPU usage.

Exam trap

Google Cloud often tests the misconception that explicitly specifying the default frame (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) improves performance, when in fact the key optimization for a running total over multiple groups is to add a PARTITION BY clause to limit the scope of the window function.

How to eliminate wrong answers

Option A is wrong because the LAG function accesses a previous row's value but does not compute a running total; it would require additional logic to accumulate values, which would be even less efficient and more complex. Option B is wrong because materializing the running total in a separate table with a scheduled query does not optimize the existing query; it introduces data staleness and maintenance overhead, and the original query still runs slowly until the materialized table is built. Option C is wrong because ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the default frame for a running total when ORDER BY is used; explicitly specifying it does not change the execution plan or improve performance, as the database already uses that frame by default.

Full explanation →

455

Drag & Dropmedium

Order the steps to troubleshoot a connection timeout from an application to Cloud SQL.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Start with logs, then verify configuration, authorization, test connectivity, and check instance status.

Full explanation →

456

MCQeasy

A company is designing a data warehouse for business intelligence reporting. They want to organize data into fact and dimension tables to support fast aggregations. Which schema design is most appropriate for this purpose?

A.Star schema

B.Third Normal Form (3NF) schema

C.Snowflake schema

D.Entity-relationship schema

AnswerA

Star schema denormalizes dimensions into a single table per dimension, enabling fast aggregation and simple joins.

Why this answer

The star schema is most appropriate for business intelligence reporting because it denormalizes dimension tables around a central fact table, enabling fast aggregations and simple queries. This design minimizes the number of joins required for analytical queries, which is critical for performance in OLAP workloads. In contrast, normalized schemas like 3NF or snowflake increase join complexity and degrade query speed.

Exam trap

Google Cloud often tests the misconception that a snowflake schema is better for BI because it saves storage, but the exam emphasizes that query performance and simplicity for aggregation are the primary goals, making the star schema the correct choice.

How to eliminate wrong answers

Option B is wrong because a Third Normal Form (3NF) schema is highly normalized to eliminate data redundancy, which is optimal for OLTP transaction processing but introduces many joins that slow down BI aggregations. Option C is wrong because a snowflake schema normalizes dimension tables into sub-dimensions, reducing storage but increasing join depth and query complexity, which can hurt performance in high-volume reporting. Option D is wrong because an entity-relationship schema is a generic modeling approach used for database design, not a specific schema optimized for BI fact-dimension aggregation; it lacks the denormalized structure needed for fast star-join queries.

Full explanation →

457

MCQmedium

Your team needs to add a new non-nullable column with a default value to a large Cloud Spanner table. The table has thousands of simultaneous writes per second. Which approach minimizes downtime and resource usage?

A.Use ALTER TABLE ADD COLUMN without a default value and then update rows in batches

B.Use ALTER TABLE ADD COLUMN with a non-null default value

C.Create a new table and use batch operations to copy data

D.Drop and recreate the table with the new column

AnswerB

Cloud Spanner applies the default immediately without scanning or rewriting rows.

Why this answer

Option D is correct: Cloud Spanner's ALTER TABLE ADD COLUMN with a non-null default value is an online operation that applies the default without rewriting rows, avoiding downtime and heavy resource consumption. Option A causes downtime. Option B is slow and resource-intensive.

Option C requires an additional update phase, which can cause inconsistency and contention.

Full explanation →

458

MCQmedium

A company is migrating an Oracle database to Cloud Spanner. The Oracle database has complex stored procedures and triggers. What is the best approach?

A.Use Dataflow to stream data from Oracle to Spanner.

B.Use BigQuery to load data then export to Spanner.

C.Rewrite the stored procedures and triggers to Spanner-compatible SQL and use a heterogeneous migration tool.

D.Use Database Migration Service for homogenous migration.

AnswerC

Spanner uses standard SQL with limited procedural support; rewriting is necessary, and a tool like Dataflow can migrate data.

Why this answer

Option C is correct because Oracle stored procedures and triggers are written in PL/SQL, which is not compatible with Cloud Spanner's SQL dialect. A heterogeneous migration tool (e.g., Striim or Datastream with custom transforms) can handle schema and data conversion, but the application logic must be rewritten to Spanner-compatible SQL (e.g., using Cloud Spanner's stored procedures in GoogleSQL). This ensures the business logic is preserved and optimized for Spanner's distributed architecture.

Exam trap

The trap here is that candidates assume Database Migration Service (DMS) can handle any migration, but DMS only supports homogeneous migrations (same database engine), and Oracle to Spanner is heterogeneous, requiring a rewrite of stored procedures and triggers.

How to eliminate wrong answers

Option A is wrong because Dataflow is a data processing service, not a migration tool for complex stored procedures and triggers; it can stream data but cannot convert PL/SQL logic to Spanner-compatible SQL. Option B is wrong because BigQuery is an analytics data warehouse, not a migration intermediary; loading data into BigQuery and then exporting to Spanner adds unnecessary complexity and does not address the conversion of stored procedures and triggers. Option D is wrong because Database Migration Service (DMS) supports homogeneous migrations (e.g., MySQL to Cloud SQL for MySQL), but Oracle to Spanner is heterogeneous, and DMS cannot convert PL/SQL to Spanner SQL.

Full explanation →

459

MCQmedium

A Cloud SQL for MySQL database has frequent table locks causing contention and slow queries. Which diagnostic approach helps identify the blocking queries?

A.Set up a Cloud Monitoring alert on CPU

B.Use Query Insights

C.Enable slow query log

D.Use INFORMATION_SCHEMA.INNODB_TRX

AnswerD

This table shows transaction details including lock waits and blockers.

Why this answer

Option D is correct because `INFORMATION_SCHEMA.INNODB_TRX` provides real-time data on all currently executing InnoDB transactions, including transaction IDs, state, and the waiting flag. By joining this with `INNODB_LOCK_WAITS` and `INNODB_LOCKS`, you can pinpoint which transaction is blocking others, directly addressing table lock contention in Cloud SQL for MySQL.

Exam trap

The trap here is that candidates confuse performance monitoring tools (Query Insights, slow query log) with transaction-level diagnostics, failing to recognize that only InnoDB metadata tables expose the blocking transaction chain.

How to eliminate wrong answers

Option A is wrong because a CPU alert monitors resource utilization, not locking or blocking queries; high CPU may be a symptom but does not identify the specific blocking transaction. Option B is wrong because Query Insights in Cloud SQL provides query performance metrics and execution plans, but it does not expose InnoDB transaction lock wait information or the blocking transaction ID. Option C is wrong because the slow query log captures queries that exceed a time threshold, but it does not show which queries are currently blocked or blocking; it may miss short-lived blocking queries entirely.

Full explanation →

460

MCQmedium

A Cloud Memorystore for Redis instance used as a session store has a high eviction rate. Which configuration change can reduce evictions while maintaining performance?

A.Enable persistence (RDB)

B.Increase number of replicas

C.Decrease timeout

D.Set maxmemory-policy to allkeys-lru

AnswerD

This evicts least recently used keys, ideal for session data.

Why this answer

Option D is correct because setting `maxmemory-policy` to `allkeys-lru` allows Redis to evict the least recently used keys across all keys when memory is full, which directly reduces eviction rates by ensuring that only the least active session data is removed. This maintains performance by keeping frequently accessed session keys in memory, which is critical for a session store where active sessions are repeatedly read and written.

Exam trap

Google Cloud often tests the misconception that increasing replicas or enabling persistence can solve memory pressure issues, when in fact only adjusting the eviction policy or increasing `maxmemory` directly addresses evictions.

How to eliminate wrong answers

Option A is wrong because enabling persistence (RDB) does not reduce evictions; it creates point-in-time snapshots of data to disk, which consumes CPU and I/O resources without affecting the memory eviction policy. Option B is wrong because increasing the number of replicas does not reduce evictions on the primary instance; replicas are read-only copies that do not increase the primary's memory capacity or change its eviction behavior. Option C is wrong because decreasing the timeout (i.e., reducing the TTL for keys) would cause keys to expire sooner, potentially increasing evictions as more keys are removed by expiration, not reducing them.

Full explanation →

461

MCQhard

Refer to the exhibit. The query scans 500 GB even though it filters on the partitioning column event_date and only needs data from 30 days. What is the most likely reason?

A.COUNT(DISTINCT) often results in full table scan to ensure accuracy, even with partitions.

B.The query lacks a LIMIT clause.

C.The clustering on user_id is causing a full table scan.

D.The table is not actually partitioned by event_date; the filter is on a non-partitioned column.

AnswerA

Distinct aggregations can require scanning all data to ensure correctness.

Why this answer

Option A is correct because COUNT(DISTINCT) in many SQL engines, including those used in data warehousing like Google BigQuery or Snowflake, often requires a full scan of all partitions to ensure global uniqueness. Even with a filter on the partitioning column, the engine cannot guarantee that distinct values are confined to the filtered partitions without scanning all data, especially if the distinct operation spans across partitions or if the engine's optimizer lacks partition pruning for distinct aggregations.

Exam trap

Google Cloud often tests the misconception that partition pruning always applies to aggregation functions, but the trap here is that COUNT(DISTINCT) bypasses partition pruning because it requires global deduplication, leading to a full table scan even with a partition filter.

How to eliminate wrong answers

Option B is wrong because a LIMIT clause does not affect the scan size of an aggregation query; it only limits the number of rows returned after processing, not the data read. Option C is wrong because clustering on user_id does not cause a full table scan; clustering reorganizes data within partitions for better compression and query performance, but it does not override partition pruning or force a full scan. Option D is wrong because the question states the filter is on the partitioning column event_date, so the table is partitioned by event_date; if it were not, the filter would still prune partitions if the column were a partition key, but the scenario explicitly says it is a partitioning column.

Full explanation →

462

MCQhard

You are managing a Cloud Spanner instance that supports a global e-commerce application. Queries that join two large tables (Orders and OrderItems) have high latency. The tables use the CustomerID as the primary key prefix. The join condition is on OrderID, which is the second part of the primary key in both tables. What should you do to improve performance?

A.Increase the number of nodes to handle the load.

B.Create a secondary index on OrderID in both tables.

C.Change the primary key of both tables to start with OrderID, then CustomerID.

D.Recreate the tables as interleaved tables with Orders as parent.

AnswerC

This ensures that related rows for the same order are co-located, allowing local joins.

Why this answer

Option B is correct because by making OrderID the first part of the primary key, the join becomes a local join within the same split, reducing cross-node communication. Option A is wrong because interleaving tables requires parent-child relationship by primary key. Option C is wrong because secondary indexes do not help with joins across tables.

Option D is wrong because increasing nodes may not fix the join strategy and costs more.

Full explanation →

463

MCQhard

What is this alert likely monitoring?

A.High disk I/O

B.High number of database connections

C.High CPU usage

D.High number of slow queries

AnswerD

The alert is on the count of log entries, which likely come from the slow query log.

Why this answer

The alert is likely monitoring a high number of slow queries because slow query logs are a primary metric for identifying database performance bottlenecks. In PostgreSQL, the `log_min_duration_statement` parameter controls which queries are logged, and a sudden spike in slow queries indicates inefficient SQL, missing indexes, or lock contention. This directly impacts database throughput and user experience, making it a critical monitoring target.

Exam trap

Google Cloud often tests the distinction between symptoms (high CPU, high I/O) and root causes (slow queries), tricking candidates into selecting a generic metric rather than the specific performance indicator being monitored.

How to eliminate wrong answers

Option A is wrong because high disk I/O is a symptom of underlying issues like full table scans or insufficient memory, but the alert specifically targets query performance rather than storage subsystem metrics. Option B is wrong because a high number of database connections can cause resource exhaustion, but it is a connection pool management issue, not a direct indicator of slow query performance. Option C is wrong because high CPU usage can result from many factors including slow queries, but the alert is focused on query execution time, not CPU utilization as a primary metric.

Full explanation →

464

Multi-Selectmedium

A company uses Cloud SQL for PostgreSQL for its BI database. Queries involving joins on large tables are slow. Which TWO strategies should they implement to improve join performance? (Choose TWO.)

Select 2 answers

A.Denormalize tables to reduce the number of joins

B.Add indexes on the columns used in JOIN conditions

C.Increase the number of CPU cores on the instance

D.Create read replicas for the join queries

E.Use connection pooling to reduce connection overhead

AnswersA, B

Denormalization physically stores related data together, avoiding joins.

Why this answer

Denormalizing tables reduces the number of joins required in queries by combining related data into fewer tables. This directly minimizes the computational overhead of join operations in Cloud SQL for PostgreSQL, which is especially beneficial for large BI datasets where join performance is critical.

Exam trap

The trap here is that candidates often confuse scaling resources (CPU, replicas) with query optimization techniques, failing to recognize that denormalization and indexing directly address the join performance bottleneck at the data structure level.

Full explanation →

465

MCQeasy

Your organization uses Cloud SQL for MySQL as the backend for a content management system. The Operations team reports that the database performance degrades every weekday morning at 9 AM, coinciding with a batch job that updates thousands of rows. You need to minimize the impact on end users. What is the best approach?

A.Increase the Cloud SQL instance memory and CPU before the job starts.

B.Break the batch job into smaller transactions with a delay between batches.

C.Disable binary logging during the batch window.

D.Move batch reads to a read replica.

AnswerB

Smaller transactions release locks faster and reduce contention.

Why this answer

Option A is correct because batching updates into small transactions reduces lock contention and improves concurrency. Option B is wrong because increasing instance size is costly and may not be needed. Option C is wrong because read replicas are for read traffic, not writes.

Option D is wrong because disabling binary logging affects point-in-time recovery.

Full explanation →

466

MCQhard

A data warehouse in BigQuery stores daily snapshots of customer data. The schema uses a single table with a snapshot_date partition column. Over time, the table has grown to 10 TB and queries often scan entire partitions. Which schema redesign would improve query performance and reduce costs significantly?

A.Create separate tables for each snapshot_date.

B.Use clustering on customer_id and snapshot_date.

C.Use a nested and repeated structure to store all snapshots per customer in a single row.

D.Use a wildcard table with a _TABLE_SUFFIX filter.

AnswerC

Nested fields allow storing an array of snapshots per customer, reducing data scanned per query significantly.

Why this answer

Using nested and repeated fields (Option D) to store all snapshots per customer in a single row drastically reduces data scanned when querying by customer. Option A (clustering) still scans full partitions if querying across dates. Options B and C (separate tables/wildcards) still require scanning multiple partitions.

Full explanation →

467

MCQhard

In Cloud Spanner, a table 'Orders' has a primary key (OrderId INT64) and is frequently updated. The application often queries for orders placed in the last hour. To reduce read latency, you decide to add a column to store the commit timestamp. Which approach should you use?

A.Define the column with the `allow_commit_timestamp` option and set it to 'true'

B.Create an interleaved table with the timestamp

C.Use a generated column with expression to get current_timestamp

D.Add a secondary index on a user-managed timestamp column

AnswerA

Spanner automatically assigns the commit timestamp to such columns, enabling efficient time-based queries.

Why this answer

Option D is correct: Spanner allows defining a column with commit timestamp option, which automatically records the commit time of the last mutation. This eliminates the need to read the column separately. Option A (interleaved tables) does not provide the timestamp.

Option B (secondary index) adds write overhead. Option C (generated column) cannot be automatically populated with commit time.

Full explanation →

468

MCQmedium

A company tracks customer demographics that change over time (e.g., address). They need to maintain historical accuracy in BI reports. Which approach correctly implements a Type 2 slowly changing dimension?

A.Store only the current value and rely on the fact table's timestamp to infer history

B.Add effective start and end date columns for each dimension attribute

C.Store only the current value in the dimension table and use an audit log for changes

D.Overwrite the old value with the new value

AnswerB

This standard SCD Type 2 pattern allows querying the state of the dimension at any point in time.

Why this answer

Option B is correct because Type 2 SCD uses start and end dates to track effective periods, allowing queries to join based on the snapshot date. Option A is wrong because overwriting loses history. Option C is wrong because an append-only log requires complex queries to get current snapshot.

Option D is wrong because a single column storing only current value loses history.

Full explanation →

469

Multi-Selectmedium

A database engineer is designing a Cloud SQL for MySQL schema for a multi-tenant SaaS application. Each tenant's data is isolated. Which TWO strategies are appropriate for tenant isolation?

Select 2 answers

A.Create a separate database for each tenant.

B.Use a single table with a tenant_id column and enforce filtering in application queries.

C.Use column-level security to hide tenant data.

D.Use a separate Cloud SQL instance per tenant.

E.Use row-level security policies to restrict access per tenant.

AnswersA, D

Separate databases provide strong isolation and are easy to manage.

Why this answer

Option A is correct because creating a separate database per tenant provides strong logical isolation at the schema level, preventing accidental cross-tenant data access. Cloud SQL for MySQL supports multiple databases within a single instance, and this approach leverages native MySQL database boundaries without requiring additional filtering logic. It also simplifies backup and restore operations per tenant.

Exam trap

Google Cloud often tests the misconception that MySQL supports advanced security features like row-level or column-level security, which are actually available in other database engines like PostgreSQL or SQL Server, leading candidates to incorrectly select options C or E.

Full explanation →

470

MCQeasy

A BigQuery table stores daily sales data. The team commonly queries data for a specific date range. Which schema optimization will reduce query cost and improve performance?

A.Create a view over the table

B.Create a materialized view with a filter on date

C.Cluster the table by date column

D.Partition the table by date column

AnswerD

Partition pruning reduces data scanned.

Why this answer

Option A is correct: partitioning by date (e.g., ingestion-time or column) allows BigQuery to prune partitions and scan only relevant data. Option B (clustering) helps but partitioning is more impactful for range pruning. Option C (view) does not reduce storage.

Option D (materialized view) is precomputed but still requires scanning partitions.

Full explanation →

471

MCQhard

A Cloud SQL for PostgreSQL instance is experiencing high CPU usage due to many short-lived connections. Which configuration change can help without application changes?

A.Use read replicas

B.Increase the max_connections parameter

C.Increase the tier to more vCPUs

D.Enable connection pooling with pgBouncer

AnswerD

pgBouncer reduces the number of concurrent connections to the database, lowering CPU usage.

Why this answer

Option B is correct because enabling connection pooling with pgBouncer reduces the CPU overhead of creating and tearing down connections. Option A (read replicas) helps with read load, not CPU from connections. Option C (increase max_connections) may worsen the issue.

Option D (increase tier) increases cost but may not address the root cause of connection churn.

Full explanation →

472

MCQmedium

Refer to the exhibit. A BI query is performing slowly. The query plan shows a large shuffle in the aggregate stage. The table is not partitioned or clustered. Which optimization would most directly reduce the shuffle size?

A.Converting the query to use a window function.

B.Using a materialized view.

C.Adding a WHERE clause to filter recent data.

D.Clustering the table on the grouping columns.

AnswerD

Clustering by grouping columns pre-orders data, minimizing shuffle during aggregation.

Why this answer

Clustering the table on the grouping columns physically co-locates rows with the same group key values within the same storage units (e.g., files or partitions). This allows the query engine to perform partial aggregation locally before the shuffle, dramatically reducing the amount of data that must be moved across the network during the aggregate stage. In systems like BigQuery or Spark SQL, clustering on grouping columns directly minimizes shuffle size by enabling pre-aggregation at the storage layer.

Exam trap

Google Cloud often tests the distinction between reducing data scanned (filtering) versus reducing data shuffled (clustering/partitioning), and candidates mistakenly choose a WHERE clause because they think less input data equals less shuffle, but shuffle size depends on the grouping key distribution, not the total data volume.

How to eliminate wrong answers

Option A is wrong because converting to a window function does not reduce shuffle size; window functions still require partitioning and ordering, often causing an even larger shuffle. Option B is wrong because a materialized view pre-computes and stores the query result, but it does not reduce the shuffle of the original query; it avoids the query entirely, which is a different optimization strategy. Option C is wrong because adding a WHERE clause to filter recent data reduces the total data scanned but does not directly reduce the shuffle size for the remaining data; the shuffle still occurs on the filtered dataset, and the grouping columns remain unoptimized.

Full explanation →

473

Multi-Selecthard

A company stores log files in Cloud Storage buckets. The logs are accessed frequently for the first 30 days, then rarely for the next 6 months, after which they must be archived for 7 years. They want to minimize storage costs. Which two actions should they take? (Choose two.)

Select 2 answers

A.Keep objects in Standard storage class for the entire retention period.

B.Set a lifecycle rule to change storage class to Nearline after 30 days.

C.Set a lifecycle rule to delete the objects after 30 days.

D.Enable Autoclass to automatically transition objects to colder storage classes.

E.Set a lifecycle rule to change storage class to Archive after 6 months.

AnswersB, E

Nearline is cost-effective for data accessed less than once a month.

Why this answer

Options B and C are correct: they should transition to Nearline after 30 days, and then to Archive after 6 months. Option A is wrong because autoclass is a single setting that automatically transitions, but it may not precisely match the 30-day/6-month schedule; custom lifecycle rules give more control. Option D is wrong because deleting after 30 days loses data needed for 6 months.

Option E is wrong because Standard class is the most expensive; not cost-effective for rare access. Correct: B and C.

Full explanation →

474

Multi-Selecteasy

A team is designing a disaster recovery plan for a Cloud Spanner instance. They want an RPO of 10 minutes and RTO of 5 minutes. Which two features should they use?

Select 2 answers

A.Enable point-in-time recovery (PITR) with a 10-minute recovery period

B.Deploy a cross-region read replica

C.Configure automated backups

D.Use database versioning to failover

E.Set up a multi-region configuration

AnswersA, E

PITR allows recovery to any point within window, meeting RPO.

Why this answer

Point-in-time recovery (PITR) with a 10-minute recovery period is correct because it allows restoring Cloud Spanner data to any point within the last 10 minutes, meeting the RPO of 10 minutes. PITR provides versioned data retention without requiring manual backups, enabling recovery within seconds to minutes, which satisfies the RTO of 5 minutes when combined with a multi-region configuration.

Exam trap

Google Cloud often tests the misconception that read replicas or automated backups alone can meet strict RPO/RTO requirements, but in Cloud Spanner, only PITR combined with a multi-region configuration provides the necessary recovery granularity and automatic failover.

Full explanation →

475

MCQhard

A company uses Cloud Bigtable with replication across two clusters in us-east1 and us-west1. They have a critical application that requires strong consistency for all reads after writes. What configuration should they implement to meet this requirement?

A.Use cluster-group routing with multi-cluster routing.

B.Use single-cluster routing with single-row transactions.

C.Enable inter-cluster replication with strong consistency.

D.Use multi-cluster routing to automatically route to the nearest cluster.

AnswerB

Single-cluster routing ensures all reads and writes go to the same cluster, providing strong consistency.

Why this answer

Cloud Bigtable does not support strong consistency across clusters in a replicated setup; replication is eventually consistent. To guarantee strong consistency for reads after writes, you must use single-cluster routing with single-row transactions, which ensures that all reads and writes for a given row are processed by the same cluster, providing ACID semantics for that row.

Exam trap

The trap here is that candidates often assume 'replication' implies strong consistency, but Cloud Bigtable's cross-cluster replication is eventually consistent, and the only way to achieve strong consistency is to confine all operations to a single cluster using single-row transactions.

How to eliminate wrong answers

Option A is wrong because cluster-group routing with multi-cluster routing distributes requests across clusters, which can lead to stale reads due to eventual consistency. Option C is wrong because inter-cluster replication in Cloud Bigtable is inherently eventually consistent; there is no configuration to make it strongly consistent across clusters. Option D is wrong because multi-cluster routing routes to the nearest cluster based on latency, but this does not guarantee strong consistency; reads may return stale data if the write has not yet replicated.

Full explanation →

476

MCQmedium

A news website uses Cloud SQL for MySQL for content management. They experience slow reads during breaking news events. They have a single primary instance in us-east1. They need to improve read scalability globally. They also want to ensure data is backed up in another region. What should they do?

A.Use Cloud Spanner with multi-region configuration.

B.Enable automatic failover to a standby instance in another region.

C.Add cross-region read replicas in multiple regions and use replica read for queries.

D.Use Bigtable for content storage.

AnswerC

Read replicas provide read scalability and can be used for backups.

Why this answer

Option C is correct because adding cross-region read replicas in multiple regions allows the website to offload read queries to replicas located closer to global users, reducing latency during traffic spikes. Cloud SQL for MySQL supports cross-region replicas, which also provide a backup copy of data in another region for disaster recovery, meeting both scalability and backup requirements without changing the database engine.

Exam trap

Google Cloud often tests the distinction between read scalability and disaster recovery, and the trap here is that candidates confuse cross-region read replicas (which provide both read offloading and a backup copy) with automatic failover (which Cloud SQL for MySQL does not support across regions).

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, horizontally scalable database that requires significant application changes and does not use MySQL, making it an over-engineered migration for a MySQL-based content management system. Option B is wrong because automatic failover to a standby instance in another region is not supported by Cloud SQL for MySQL; Cloud SQL only supports regional high availability with a zonal standby, not cross-region failover. Option D is wrong because Bigtable is a NoSQL wide-column database optimized for analytical workloads and time-series data, not for content management with complex queries and joins, and it would require a complete schema redesign.

Full explanation →

477

MCQmedium

A company notices that their Cloud SQL for PostgreSQL instance, as shown in the exhibit, frequently runs out of storage, causing downtime. They have set up automated backups with point-in-time recovery. What is the most likely cause of the storage issue?

A.Transaction logs for point-in-time recovery are consuming disk space.

B.The activation policy is set to ALWAYS, causing continuous writes.

C.The instance tier (db-custom-4-15360) is too low for the workload.

D.The data disk type (PD_SSD) is not suitable for PostgreSQL.

AnswerA

Transaction logs are stored on the same disk and can grow large.

Why this answer

The correct answer is A because Cloud SQL for PostgreSQL uses transaction logs (WAL files) to enable point-in-time recovery (PITR). These logs accumulate on the disk until they are automatically removed, but if the rate of log generation exceeds the cleanup rate or if the backup retention period is long, the logs can fill the disk, causing storage exhaustion and downtime.

Exam trap

Google Cloud often tests the misconception that storage issues are caused by instance tier or disk type, when in fact the hidden culprit is transaction log accumulation from point-in-time recovery settings.

How to eliminate wrong answers

Option B is wrong because the activation policy (ALWAYS vs ON_DEMAND) controls whether the instance is billed continuously, not the frequency of writes; it does not directly cause storage to run out. Option C is wrong because the instance tier (db-custom-4-15360) refers to vCPU and memory, not storage capacity; a low tier might cause performance issues but not storage exhaustion. Option D is wrong because PD_SSD is a fully suitable disk type for PostgreSQL; the storage issue is about capacity, not disk type suitability.

Full explanation →

478

MCQmedium

You have a Cloud SQL for MySQL table that stores user logins with columns: user_id, login_time, ip_address. You frequently run queries to count logins by user for a specific date range. Which index would be most efficient?

A.No index; rely on full table scan

B.Separate indexes on user_id and login_time

C.A composite index on (login_time, user_id)

D.A composite index on (user_id, login_time)

AnswerC

Allows efficient range scan on login_time and provides user_id for grouping.

Why this answer

Option B is correct: a composite index on (login_time, user_id) because the query filters by login_time range and then groups by user_id. The index can be used for both the WHERE clause (range scan on login_time) and then user_id is available for grouping without accessing the table. Option A puts user_id first, which is less efficient for range filtering on login_time.

Options C and D are not as efficient as a composite index.

Full explanation →

479

MCQhard

In Cloud Bigtable, a table has a high ratio of garbage collection (GC) that causes performance degradation during compaction. What is the best practice to monitor and optimize this?

A.Use the bigtableadmin API to view table stats

B.Compact the table manually

C.Use Cloud Monitoring to track garbage collection count and adjust GC settings

D.Increase node count

AnswerC

Monitoring GC count and tuning GC settings can reduce compaction overhead.

Why this answer

Option C is correct because Cloud Monitoring provides the metrics (e.g., 'bigtable.googleapis.com/table/garbage_collection_count') needed to track GC activity, and adjusting GC settings (e.g., column family max versions or TTL) directly reduces the compaction overhead caused by excessive stale data. This aligns with best practices for proactive performance optimization in Cloud Bigtable.

Exam trap

The trap here is that candidates confuse operational scaling (adding nodes) with tuning data retention policies, failing to recognize that GC-related compaction degradation is a schema design issue, not a capacity issue.

How to eliminate wrong answers

Option A is wrong because the bigtableadmin API is used for administrative operations like creating or modifying tables, not for real-time monitoring of garbage collection metrics; it does not expose GC counts or compaction performance data. Option B is wrong because manual compaction is a reactive, disruptive operation that does not address the root cause (high GC ratio) and can temporarily degrade performance further. Option D is wrong because increasing node count only scales throughput and storage capacity, not the compaction efficiency; it does not reduce the GC ratio or the compaction workload caused by excessive garbage.

Full explanation →

480

MCQmedium

You need to ensure that read operations on a Cloud Spanner database return the most recent committed data. Which read type should you use?

A.Read-only transaction

B.Stale read

C.Partitioned read

D.Strong read

AnswerD

Strong reads return the most recent committed data at read time.

Why this answer

Strong read is the correct choice because it guarantees that read operations return the most recent committed data from Cloud Spanner. Unlike other read types, strong reads access the current state of the database at the time of the read, ensuring external consistency and linearizability, which is critical for applications requiring up-to-date data.

Exam trap

The trap here is that candidates often confuse read-only transactions with strong reads, assuming that any read-only operation automatically returns the latest data, whereas in Spanner, read-only transactions can be configured for stale reads unless explicitly set to strong.

How to eliminate wrong answers

Option A is wrong because a read-only transaction can use stale reads or strong reads depending on the timestamp bound, but by default it does not guarantee the most recent committed data unless explicitly configured with a strong read. Option B is wrong because a stale read intentionally returns data that is older than the current time, trading consistency for lower latency, which does not meet the requirement for the most recent committed data. Option C is wrong because a partitioned read is designed for large-scale, high-throughput reads across partitions and does not inherently provide strong consistency; it typically uses stale reads for performance.

Full explanation →

481

Multi-Selecteasy

A Bigtable instance is running out of storage and performance is degraded. The schema design is known to be efficient. Which THREE actions can help?

Select 3 answers

A.Delete unused column families.

B.Reduce the number of tablets.

C.Add SSDs.

D.Increase node count.

E.Enable compression.

AnswersA, D, E

Deleting column families triggers garbage collection, freeing up storage used by that data.

Why this answer

Deleting unused column families is correct because each column family in Bigtable stores data in separate SSTable files, and unused families consume storage and memory resources without providing value. Removing them frees up space and reduces the amount of data that must be scanned during reads and compactions, directly improving performance.

Exam trap

Google Cloud often tests the misconception that adding SSDs is a valid optimization for Bigtable, but Bigtable abstracts storage via Colossus and does not allow direct SSD configuration, so candidates must recognize that only node count, compression, and column family management are actionable.

Full explanation →

482

MCQeasy

A data warehouse in BigQuery is running slower due to large full-table scans. Which feature can reduce the amount of data processed for common queries?

A.All of the above

B.Clustering

C.Materialized Views

D.Partitioning

AnswerA

All three features reduce data processed.

Why this answer

Option A is correct because partitioning, clustering, and materialized views all reduce the amount of data scanned in BigQuery. Partitioning limits scans to specific date ranges, clustering sorts data within partitions to skip irrelevant blocks, and materialized views precompute and store query results so subsequent queries read only the pre-aggregated output instead of scanning the base table. Together, these features minimize full-table scans and improve query performance.

Exam trap

Google Cloud often tests the misconception that a single optimization technique (like partitioning or clustering) is sufficient to solve all performance issues, when in fact the correct answer requires combining multiple features to achieve the greatest reduction in data scanned.

How to eliminate wrong answers

Option B is wrong because clustering alone does not reduce the amount of data processed; it only reorganizes data within partitions to improve filter and aggregation efficiency, but without partitioning, a query can still scan the entire table. Option C is wrong because materialized views alone do not reduce data processed for all common queries; they only help for queries that match the view's definition, and they require manual creation and maintenance. Option D is wrong because partitioning alone only limits scans to specific partitions based on the partition key, but if queries do not filter on that key, full-table scans still occur.

Full explanation →

483

Multi-Selectmedium

Which TWO metrics are most important to monitor for a Cloud SQL for PostgreSQL instance to detect performance degradation?

Select 2 answers

A.Memory usage

B.CPU utilization

C.Query latency

D.Disk IOPS

E.Network throughput (bytes sent/received)

AnswersB, D

High CPU indicates query processing load.

Why this answer

CPU utilization (B) is a primary indicator of performance degradation because sustained high CPU usage (e.g., >80%) can lead to query queuing, reduced throughput, and increased latency. Disk IOPS (D) is equally critical because Cloud SQL for PostgreSQL relies on disk I/O for WAL writes, checkpointing, and query execution; hitting the IOPS limit of the underlying persistent disk (e.g., 3,000 IOPS for a pd-standard disk) causes throttling and severe performance drops.

Exam trap

Google Cloud often tests the misconception that query latency is a primary monitoring metric, but the trap here is that latency is a downstream effect—you must monitor the underlying resource metrics (CPU and IOPS) to detect degradation before users experience slow queries.

Full explanation →

484

MCQmedium

You are managing a Cloud SQL for PostgreSQL instance that is experiencing high CPU usage and slow query performance. You notice that the database has a high number of idle-in-transaction connections. Which immediate action should you take to reduce CPU load without disrupting active transactions?

A.Use VPC firewall rules to block new connections until the issue resolves.

B.Kill all idle-in-transaction connections using pg_terminate_backend.

C.Set the cloudsql.enable_idle_in_transaction_session_timeout flag to true and configure idle_in_transaction_session_timeout.

D.Set a statement_timeout at the session level for new connections.

AnswerC

This flag automatically terminates idle-in-transaction sessions after a specified timeout, reducing CPU usage without manual intervention.

Why this answer

Option C is correct because setting the `cloudsql.enable_idle_in_transaction_session_timeout` flag to true and configuring `idle_in_transaction_session_timeout` allows Cloud SQL to automatically terminate idle-in-transaction connections after a specified timeout, reducing CPU load without manually killing connections or disrupting active transactions. This is a built-in, non-disruptive mechanism that targets only connections that are holding resources while idle, freeing up CPU and memory for active queries.

Exam trap

Google Cloud often tests the distinction between `statement_timeout` (which limits query execution time) and `idle_in_transaction_session_timeout` (which limits idle time within a transaction), and candidates mistakenly choose the session-level timeout thinking it will handle idle transactions, but it only applies to individual statements, not the idle period between statements within a transaction.

How to eliminate wrong answers

Option A is wrong because using VPC firewall rules to block new connections would prevent all new traffic, including legitimate active transactions, and does not address the existing idle-in-transaction connections that are already consuming CPU. Option B is wrong because killing all idle-in-transaction connections with `pg_terminate_backend` would abruptly terminate those backends, potentially causing application errors and disrupting any transactions that might be in a brief idle state but still holding locks or resources. Option D is wrong because setting `statement_timeout` at the session level only limits the duration of a single query, not the idle time of a transaction; it would not automatically terminate connections that are idle in a transaction, leaving the CPU load unaddressed.

Full explanation →

485

Multi-Selectmedium

Which THREE components are required to compute a 7-day moving average of daily sales using a window function? (Choose three.)

Select 3 answers

A.PARTITION BY product

B.WINDOW clause

C.AVG() function

D.ROWS BETWEEN 6 PRECEDING AND CURRENT ROW

E.ORDER BY date

AnswersC, D, E

AVG calculates the average.

Why this answer

Option C is correct because the AVG() function is the aggregate function that computes the arithmetic mean of the sales values over the specified window frame. In a moving average calculation, AVG() is applied to the rows defined by the window frame to produce the average for each row.

Exam trap

Google Cloud often tests the misconception that the WINDOW clause is mandatory for window functions, when in fact it is only a convenience for reusing a window specification, and the frame can be defined directly in the OVER clause.

Full explanation →

486

MCQhard

A company uses Cloud Spanner. The backup service account 'sa-backup' needs to create and manage backups of the 'orders' database. However, backup creation fails with a permission error. What is the most likely cause?

A.The service account lacks the spanner.databases.read permission.

B.The service account is assigned the role roles/spanner.databaseBackupAdmin, which is a custom role that does not include the spanner.backups.create permission.

C.The backup role must be granted at the instance level, not on the database.

D.The instance 'orders-db' is in a regional configuration, which does not support backups.

AnswerC

The role roles/spanner.backupAdmin must be granted on the instance, not the database, to create backups.

Why this answer

Option C is correct because Cloud Spanner backup permissions must be granted at the instance level, not on the database itself. The service account 'sa-backup' needs the `spanner.backups.create` permission on the instance resource to create backups, and assigning a role like `roles/spanner.databaseBackupAdmin` at the database level does not propagate the necessary instance-level permissions, causing the backup creation to fail with a permission error.

Exam trap

The trap here is that candidates often assume database-level roles are sufficient for database-specific operations like backups, but Cloud Spanner enforces instance-level scoping for backup permissions, leading to a common mistake of assigning roles at the wrong resource hierarchy level.

How to eliminate wrong answers

Option A is wrong because the `spanner.databases.read` permission is for reading database data, not for creating backups; backup creation requires `spanner.backups.create` and related permissions at the instance level. Option B is wrong because `roles/spanner.databaseBackupAdmin` is a predefined role that includes `spanner.backups.create` and other necessary permissions; it is not a custom role lacking that permission, so the failure is not due to a missing permission in the role itself. Option D is wrong because Cloud Spanner supports backups for both regional and multi-region instances; a regional configuration does not prevent backup creation.

Full explanation →

487

MCQmedium

A global e-commerce platform needs a database that supports strong consistency across multiple continents and can handle high write throughput. Which database service should they choose?

A.Cloud SQL for PostgreSQL

B.Cloud Firestore

C.Cloud Spanner

D.Cloud Bigtable

AnswerC

Cloud Spanner offers global strong consistency and high write throughput, making it ideal for global e-commerce.

Why this answer

Cloud Spanner is the correct choice because it provides globally distributed, strongly consistent relational database service with synchronous replication across regions and continents, supporting external consistency and high write throughput via TrueTime and Paxos-based consensus. This meets the requirement for strong consistency across multiple continents and high write throughput, which is not achievable with traditional single-region or eventually consistent databases.

Exam trap

The trap here is that candidates often confuse 'strong consistency' with 'eventual consistency' and choose Cloud Bigtable or Cloud Firestore for their high throughput, failing to recognize that only Cloud Spanner provides both strong consistency and horizontal scalability across continents.

How to eliminate wrong answers

Option A is wrong because Cloud SQL for PostgreSQL is a single-region, single-writer database that cannot scale horizontally across continents or provide strong consistency across multiple geographic regions. Option B is wrong because Cloud Firestore is a NoSQL document database that offers strong consistency only within a single region, and its multi-region mode provides eventual consistency, not the strong consistency required across continents. Option D is wrong because Cloud Bigtable is a wide-column NoSQL database designed for high throughput but only supports single-row transactions and eventual consistency across regions, lacking the strong consistency and multi-row transactional support needed for a global e-commerce platform.

Full explanation →

488

MCQmedium

A company is migrating an on-premises Oracle database to Cloud SQL for PostgreSQL. The database is 2 TB in size and the network bandwidth to Google Cloud is limited to 500 Mbps. The migration window is 48 hours. Which migration strategy should the Database Engineer recommend?

A.Create a VPN tunnel and use pg_dump/pg_restore over the network.

B.Use Database Migration Service with continuous replication.

C.Export the database to flat files, compress, upload to Cloud Storage, then import to Cloud SQL.

D.Request a dedicated interconnect and then migrate.

AnswerC

File-based migration with compression can work within the bandwidth and time constraints.

Why this answer

Option C is correct because the 2 TB database size and 500 Mbps bandwidth yield a theoretical transfer time of approximately 9.5 hours (2 TB * 1024 GB/TB * 8 bits/byte / 500 Mbps / 3600 seconds/hour), which fits within the 48-hour window. However, pg_dump/pg_restore over a VPN (Option A) would be slower due to TCP overhead and latency, and Database Migration Service with continuous replication (Option B) requires ongoing connectivity and may not complete the initial load within the window. Exporting to flat files, compressing them (e.g., with gzip), uploading to Cloud Storage, and then importing to Cloud SQL leverages high-throughput parallel uploads and avoids network latency issues, making it the most reliable strategy for a one-time migration within the given constraints.

Exam trap

The trap here is that candidates often assume Database Migration Service (Option B) is always the best choice for any migration, but they overlook that continuous replication is unnecessary for a one-time migration and that the initial load still faces the same bandwidth bottleneck as other network-based methods.

How to eliminate wrong answers

Option A is wrong because pg_dump/pg_restore over a VPN tunnel with 500 Mbps bandwidth would be severely impacted by TCP overhead, latency, and potential packet loss, making it unlikely to complete a 2 TB migration within 48 hours. Option B is wrong because Database Migration Service with continuous replication is designed for minimal downtime migrations, but the initial full load still requires transferring the entire 2 TB over the network, which faces the same bandwidth limitation; additionally, continuous replication would be unnecessary and add complexity for a one-time migration. Option D is wrong because requesting a dedicated interconnect is a long-term provisioning process (weeks to months) that cannot be completed within the 48-hour migration window, and it is overkill for a single migration event.

Full explanation →

489

MCQeasy

Your company runs a critical application on Google Kubernetes Engine (GKE) with a StatefulSet using persistent volumes backed by Compute Engine persistent disks. The application performs frequent small random writes to a MySQL database stored on the persistent disks. You notice that the disk write latency has increased significantly, and the application's throughput has dropped. Monitoring shows that the disk queue depth is consistently high. The current disk type is pd-standard. What is the most cost-effective way to reduce write latency and improve throughput?

A.Change the persistent disk type from pd-standard to pd-ssd.

B.Use a regional persistent disk for higher availability and performance.

C.Add more replicas of the StatefulSet to distribute writes across multiple disks.

D.Increase the size of the persistent disks to improve IOPS limits.

AnswerA

SSD provides lower latency and higher IOPS for random write workloads, solving the problem cost-effectively.

Why this answer

The application is experiencing high write latency due to insufficient IOPS from pd-standard disks, which are HDD-based and optimized for sequential reads, not small random writes. Changing to pd-ssd (SSD-based) provides significantly higher IOPS and lower latency for random write workloads, directly addressing the high queue depth and throughput drop. This is the most cost-effective solution because pd-ssd offers the necessary performance improvement without requiring architectural changes or over-provisioning capacity.

Exam trap

The trap here is that candidates may think increasing disk size (Option D) is the cheapest way to improve IOPS, but they overlook that pd-standard's IOPS/GB ratio is so low that the cost to reach equivalent pd-ssd performance would be much higher, making a disk type change the more cost-effective choice.

How to eliminate wrong answers

Option B is wrong because regional persistent disks provide higher availability through synchronous replication across zones, but they do not improve IOPS or latency performance over the base disk type; they would still use pd-standard performance if that type is selected. Option C is wrong because adding more replicas of the StatefulSet does not reduce write latency on the existing disks; writes to the MySQL database are typically concentrated on a single primary instance, and distributing writes across multiple disks would require application-level sharding, which is not described. Option D is wrong because increasing disk size improves IOPS limits for pd-standard disks only marginally (IOPS scale linearly with size but remain far below pd-ssd levels), and it would be less cost-effective than switching to pd-ssd since you would need a much larger pd-standard volume to match pd-ssd IOPS.

Full explanation →

490

Multi-Selecthard

You are managing a Cloud SQL for MySQL instance that is experiencing high latency and connection timeouts during peak hours. The current configuration uses 4 vCPUs, 15 GB memory, and 100 GB SSD storage. The database workload is a mix of transactional queries and batch inserts. Which TWO actions would most effectively reduce latency and improve performance?

Select 2 answers

A.Disable binary logging to reduce write I/O.

B.Increase the storage size to 200 GB to improve IOPS.

C.Increase the instance to 8 vCPUs and 30 GB memory.

D.Decrease the max_connections parameter to reduce overhead.

E.Enable the Cloud SQL proxy and use connection pooling.

AnswersC, E

Provides more resources to handle peak load.

Why this answer

Option C is correct because increasing vCPUs and memory directly addresses the resource bottleneck causing high latency and connection timeouts during peak hours. Cloud SQL for MySQL performance is heavily dependent on CPU for query processing and memory for buffer pool caching; doubling these resources reduces query execution time and improves concurrency handling.

Exam trap

Google Cloud often tests the misconception that increasing storage always improves IOPS, but in Cloud SQL for MySQL, IOPS scaling is tied to storage size only up to a baseline, and the real bottleneck in this scenario is compute and memory, not storage throughput.

Full explanation →

491

MCQeasy

A company is running a MySQL database on Cloud SQL and needs to optimize for high random read/write performance. Which storage type should they choose?

A.SSD persistent disk

B.Local SSD

C.Balanced persistent disk

D.HDD persistent disk

AnswerA

SSD persistent disk offers high IOPS and low latency, ideal for database storage in Cloud SQL.

Why this answer

SSD persistent disk provides consistent low-latency performance for random read/write operations, which is critical for MySQL databases on Cloud SQL. It offers higher IOPS and throughput compared to HDD or balanced persistent disks, making it the optimal choice for high random read/write workloads.

Exam trap

The trap here is that candidates may confuse Local SSD's high performance with persistence, not realizing it is ephemeral and cannot be used for Cloud SQL's managed database service, which requires durable storage.

How to eliminate wrong answers

Option B (Local SSD) is wrong because it is ephemeral and data is lost if the instance stops or fails, making it unsuitable for persistent database storage on Cloud SQL. Option C (Balanced persistent disk) is wrong because it offers lower IOPS and higher latency than SSD persistent disk, which is not ideal for high random read/write performance. Option D (HDD persistent disk) is wrong because it is designed for sequential read/write workloads and has significantly lower IOPS and higher latency, making it unsuitable for random access patterns.

Full explanation →

492

MCQhard

A team is reviewing IAM permissions on a Cloud Storage bucket. The exhibit shows the bucket's IAM policy. A developer is using the service account sa-1 and reports that they cannot delete objects in the bucket. What is the likely reason?

A.The etag value must be updated before any delete operation.

B.The service account sa-1 does not have the storage.objects.delete permission.

C.A condition is attached to the objectViewer role that prevents deletion.

D.The policy only allows deletion by the service account sa-2.

AnswerB

objectViewer only grants read access.

Why this answer

The IAM policy shown in the exhibit grants the `objectViewer` role to service account `sa-1`, which includes the `storage.objects.get` and `storage.objects.list` permissions but does not include `storage.objects.delete`. Without the `storage.objects.delete` permission, the developer cannot delete objects in the bucket, even if they can view them. The correct answer is B because the service account lacks the necessary delete permission.

Exam trap

Google Cloud often tests the distinction between viewing and deleting objects, where candidates mistakenly assume that having read access (objectViewer) also allows deletion, or that a condition or etag is the blocking factor, rather than recognizing the missing delete permission.

How to eliminate wrong answers

Option A is wrong because the `etag` field in an IAM policy is used for optimistic concurrency control during policy updates, not for object deletion operations; object deletion does not require updating the etag. Option C is wrong because the exhibit shows no conditions attached to the `objectViewer` role; even if a condition existed, it would restrict access further, but the core issue is the lack of the delete permission. Option D is wrong because the policy does not explicitly restrict deletion to `sa-2`; it only grants `storage.objectAdmin` to `sa-2`, which includes delete permission, but this does not prevent `sa-1` from deleting if it had the permission.

Full explanation →

493

MCQmedium

A logistics company uses BigQuery to track shipments. The `shipments` table has columns `id`, `status`, `created_date`, and `delivery_date`. They need a query that returns the number of shipments that were delivered within 5 days of creation for each month of 2024. Which SQL construct is most appropriate?

A.SELECT EXTRACT(MONTH FROM created_date) AS month, COUNT(*) FROM shipments WHERE TIMESTAMP_DIFF(delivery_date, created_date, HOUR) <= 120 AND EXTRACT(YEAR FROM created_date) = 2024 GROUP BY month

B.SELECT EXTRACT(MONTH FROM created_date) AS month, COUNTIF(DATETIME_DIFF(delivery_date, created_date, DAY) <= 5) FROM shipments WHERE EXTRACT(YEAR FROM created_date) = 2024 GROUP BY month

C.SELECT EXTRACT(MONTH FROM created_date) AS month, COUNT(*) FROM shipments WHERE DATETIME_DIFF(delivery_date, created_date, DAY) <= 5 AND EXTRACT(YEAR FROM created_date) = 2024 GROUP BY month

D.SELECT EXTRACT(MONTH FROM created_date) AS month, COUNT(*) FROM shipments WHERE DATE_DIFF(delivery_date, created_date, DAY) <= 5 AND EXTRACT(YEAR FROM created_date) = 2024 GROUP BY month

AnswerC

Correct function and clear intent.

Why this answer

Option C is correct because it uses `DATETIME_DIFF` with `DAY` precision to accurately compute the difference between `delivery_date` and `created_date` in days, and filters for shipments delivered within 5 days (i.e., <= 5 days). The `WHERE` clause also restricts to the year 2024, and the `GROUP BY month` with `EXTRACT(MONTH FROM created_date)` correctly aggregates counts per month. This matches the requirement precisely.

Exam trap

Google Cloud often tests the distinction between `DATE_DIFF`, `DATETIME_DIFF`, and `TIMESTAMP_DIFF`, and candidates mistakenly choose `DATE_DIFF` without considering the actual data types of the columns, or they use `TIMESTAMP_DIFF` with hours thinking it is equivalent, but fail to account for timezone and daylight saving effects.

How to eliminate wrong answers

Option A is wrong because it uses `TIMESTAMP_DIFF` with `HOUR` precision and checks `<= 120` hours, which is equivalent to 5 days but introduces potential edge-case errors due to daylight saving time shifts or timezone differences, and it is less readable and less precise for day-level logic. Option B is wrong because it uses `COUNTIF` with `DATETIME_DIFF` inside the SELECT clause, but `COUNTIF` is not a valid aggregate function in standard BigQuery SQL; the correct function is `COUNTIF` only in the context of a `COUNT` with a filter expression, but here it would cause a syntax error. Option D is wrong because it uses `DATE_DIFF` with `DAY` precision, but `DATE_DIFF` expects `DATE` type arguments, and if `delivery_date` or `created_date` are `DATETIME` or `TIMESTAMP` types, this will cause a type mismatch error or implicit conversion issues.

Full explanation →

494

MCQhard

A retail company uses BigQuery to store sales data. The 'sales' table has 10 billion rows and is partitioned by transaction_date (daily). The BI dashboard runs a query that aggregates sales by product_category for the last 30 days. The query is slow and expensive. Which improvement is most effective?

A.Cluster the table on product_category

B.Change partitioning to monthly

C.Denormalize the product_category into the sales table

D.Use a materialized view with aggregation on product_category

AnswerA

Clustering on product_category organizes data within each partition so that queries filtering/aggregating on that column scan fewer blocks.

Why this answer

Option A is correct because clustering the table on product_category organizes the data within each daily partition by that column, allowing BigQuery to use block-level pruning to skip irrelevant blocks when filtering or aggregating by product_category. This directly reduces the amount of data scanned for the 30-day aggregation query, improving both performance and cost.

Exam trap

Google Cloud often tests the distinction between partitioning (which limits data by time range) and clustering (which organizes data within partitions for column-based pruning), and candidates mistakenly choose partitioning changes or materialized views without understanding that clustering directly addresses the slow aggregation on a non-time column.

How to eliminate wrong answers

Option B is wrong because changing partitioning from daily to monthly would increase the partition size, forcing the query to scan more data per partition (the entire month) rather than only the last 30 days, which would actually worsen performance and cost. Option C is wrong because denormalizing product_category into the sales table is already the current schema; the issue is not about normalization but about data organization for efficient pruning. Option D is wrong because a materialized view with aggregation on product_category would still require scanning all partitions unless the view is also partitioned and clustered; moreover, materialized views in BigQuery are best for pre-aggregating high-frequency queries but do not inherently reduce scan costs if the underlying table is not properly clustered.

Full explanation →

495

MCQhard

A data engineer creates a clustered table in BigQuery with clustering order: country, city, product_id. The BI team frequently runs a query that filters on city and product_id but rarely on country. What is the most likely performance issue?

A.BigQuery allows only one clustering column per table.

B.The query does not filter on the first clustering column (country), so block pruning is minimal.

C.The table should be partitioned by country instead of clustered.

D.The query filters on too many clustering columns, causing overhead.

AnswerB

Clustering is optimized when filters include the leftmost clustering column.

Why this answer

BigQuery clustered tables use block pruning to skip reading blocks that don't match the query's filter. Pruning is most effective when the filter includes the first clustering column (country). Without it, BigQuery must scan more blocks, leading to higher query costs and slower performance.

Exam trap

Google Cloud often tests the misconception that any filter on clustering columns is equally effective, but the key is that pruning requires the first column in the clustering order to be filtered for maximum benefit.

How to eliminate wrong answers

Option A is wrong because BigQuery allows up to four clustering columns per table, not just one. Option C is wrong because partitioning by country would not help if the query rarely filters on country; partitioning is most beneficial for queries that filter on the partition column. Option D is wrong because filtering on multiple clustering columns does not cause overhead; it actually improves pruning, but the missing first column is the issue.

Full explanation →

496

MCQhard

A BI dashboard query is slow and high cost. The query does multiple joins on large tables and uses window functions. The data engineer suggests using materialized views. However, the query uses non-deterministic functions. What is the limitation?

A.Materialized views cannot include non-deterministic functions

B.Materialized views cannot be updated automatically

C.Materialized views cannot be created with joins

D.Materialized views only support simple aggregation

AnswerA

Materialized views require deterministic expressions to maintain consistency between base table changes.

Why this answer

Materialized views store the result set of a query physically, like a table. If the query includes non-deterministic functions (e.g., NOW(), RAND(), CURRENT_TIMESTAMP), the stored result would become stale immediately because the function's output changes each time it is evaluated. Most SQL databases (e.g., PostgreSQL, Oracle, Snowflake) explicitly forbid non-deterministic functions in materialized view definitions to prevent this logical inconsistency.

Exam trap

Google Cloud often tests the misconception that materialized views are 'static' and cannot be refreshed, or that they only support simple aggregations, when the real limitation is the prohibition of non-deterministic functions to ensure data consistency.

How to eliminate wrong answers

Option B is wrong because materialized views can be updated automatically via refresh mechanisms (e.g., ON COMMIT, scheduled refreshes), though they are not always updated in real-time. Option C is wrong because materialized views commonly support joins; in fact, they are often used to pre-join large tables for performance. Option D is wrong because materialized views can include complex aggregations, window functions, and multiple joins—not just simple aggregation.

Full explanation →

497

MCQmedium

A BI query uses COUNT(column) to count non-null values and COUNT(*) to count all rows. The analyst expects both counts to be equal, but COUNT(column) returns fewer rows. What is the most likely explanation?

A.The query has a WHERE clause that filters some rows.

B.COUNT(*) is faster, so it's not accurate.

C.COUNT(*) counts duplicate rows, while COUNT(column) does not.

D.The column contains NULL values, which are not counted by COUNT(column).

AnswerD

COUNT(column) only counts non-null values.

Why this answer

COUNT(column) ignores NULL values in the specified column, while COUNT(*) counts every row in the result set regardless of NULLs. If the column contains any NULLs, COUNT(column) will return a lower number. This is a fundamental SQL behavior defined in the ANSI SQL standard and is consistent across all major BI platforms (e.g., Tableau, Power BI, Looker) that generate SQL queries.

Exam trap

Google Cloud often tests the subtle distinction between COUNT(*) and COUNT(column) by embedding NULL values in the column, tempting candidates to incorrectly attribute the difference to duplicates or filtering.

How to eliminate wrong answers

Option A is wrong because a WHERE clause would filter rows before aggregation, affecting both COUNT(column) and COUNT(*) equally, so it cannot cause a discrepancy between the two counts. Option B is wrong because COUNT(*) is not inherently faster or less accurate; both functions return precise counts based on the same data set, and performance differences are irrelevant to accuracy. Option C is wrong because both COUNT(*) and COUNT(column) count duplicate rows; COUNT(column) counts non-null occurrences of the column, including duplicates, so duplicates do not cause a difference.

Full explanation →

498

Multi-Selectmedium

Which three of the following are valid considerations when designing secondary indexes in Cloud Spanner? (Choose three.)

Select 3 answers

A.Secondary indexes maintain strong consistency with the base table

B.Secondary indexes are automatically used for queries that filter on primary key columns

C.Secondary indexes require a unique constraint

D.Secondary indexes can be created on child tables without including the parent key

E.Secondary indexes can be created with a STORING clause to include non-key columns

AnswersA, D, E

All indexes in Spanner are strongly consistent.

Why this answer

Options A, D, and E are correct. A is true: secondary indexes can be created on interleaved tables (but note they are not interleaved themselves). D is true: secondary indexes in Spanner are strongly consistent with the base table.

E is true: secondary indexes can be created on child tables without including the parent key, though they are not interleaved. B is false because secondary indexes are not automatically used for primary key filtering; the primary index is used. C is false because secondary indexes do not require a unique constraint; they can be non-unique.

Full explanation →

499

Multi-Selecthard

A company is designing a data model for a BI dashboard that requires real-time updates and historical analysis. Which THREE practices should be followed?

Select 3 answers

A.Use clustering on frequently filtered columns.

B.Use streaming inserts for real-time data.

C.Create a separate table for each day's data.

D.Use the default BigQuery table expiration setting.

E.Use partitioning by ingestion time for continuous data.

AnswersA, B, E

Clustering orders data within partitions, improving filter performance.

Why this answer

Option A is correct because clustering on frequently filtered columns in BigQuery organizes data into blocks based on the values of those columns, allowing queries with filters on those columns to skip irrelevant blocks entirely. This reduces the amount of data scanned, improving query performance and lowering costs, which is critical for a BI dashboard that needs real-time updates and fast historical analysis.

Exam trap

Google Cloud often tests the misconception that creating separate tables for daily data is a good practice for time-series data, when in fact BigQuery's partitioning and clustering features are designed to handle such data more efficiently and with less administrative overhead.

Full explanation →

500

Multi-Selecthard

Which THREE factors should you consider when configuring Cloud SQL for MySQL query caching to optimize performance?

Select 3 answers

A.High write workloads can lead to frequent cache invalidation, reducing its effectiveness.

B.The query cache can become fragmented and require periodic defragmentation.

C.Query cache uses disk storage for cached results.

D.The query cache is deprecated and removed in MySQL 8.0+, so it should not be relied upon for new deployments.

E.Prepared statements always bypass the query cache.

AnswersA, B, D

Every table modification invalidates cached queries for that table, making cache less useful for write-heavy workloads.

Why this answer

Option A is correct because in Cloud SQL for MySQL, high write workloads cause frequent updates to tables, which invalidates the query cache entries for those tables. This means the cache must be repopulated often, reducing the hit rate and potentially adding overhead from cache maintenance, making it less effective for performance optimization.

Exam trap

Google Cloud often tests the misconception that the query cache uses disk storage, when in fact it uses memory, and that prepared statements always bypass the cache, which is not universally true across all MySQL configurations.

Full explanation →

501

Matchingmedium

Match each BigQuery feature to its purpose.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Column-based ordering to improve query performance

Dividing tables into segments for cost and performance

Guaranteed query capacity

In-memory analysis for sub-second query response

Automated data ingestion from various sources

Why these pairings

These features optimize cost, performance, and management in BigQuery.

Full explanation →

502

MCQhard

Your Bigtable cluster is experiencing high CPU utilization and write latency during a daily batch job that writes 500 GB of data. The job uses bulk mutations with 1000 rows per request. What is the most effective way to reduce CPU usage?

A.Disable write-ahead logs

B.Increase the number of rows per mutation to 10,000

C.Increase the number of nodes temporarily

D.Create a new cluster in a different zone

AnswerB

Fewer RPCs reduce CPU overhead.

Why this answer

Option B is correct because larger mutations reduce overhead per row. Option A is wrong because increasing nodes would help but is not as efficient as optimizing the write pattern. Option C is wrong because increasing cluster size doesn't reduce CPU per request.

Option D is wrong because write-ahead logs are internal; tuning them doesn't reduce CPU for writes.

Full explanation →

503

MCQmedium

A social media application uses Memorystore for Redis to cache user profiles and session data. Recently, the application experienced intermittent errors and high latency. You observe that the Redis CPU utilization is consistently above 90% and the cache hit ratio is 85%. The instance type is a Standard tier M2 (30 GB) with a maxmemory setting of 25 GB. The eviction policy is allkeys-lru. The number of keys is 10 million with an average value size of 2 KB. You suspect memory pressure is causing CPU spikes. What should you do to reduce CPU utilization and improve performance?

A.Reduce the maxmemory setting to force more aggressive eviction and free up memory.

B.Change the eviction policy to volatile-lru to prefer evicting keys with TTL.

C.Upgrade the Memorystore instance to a larger size (e.g., M5 with 60 GB) to provide more CPU and memory resources.

D.Implement Redis monitoring with Cloud Monitoring and set alerts to notify when CPU exceeds 80%.

AnswerC

A larger instance has more CPU cores and memory, reducing pressure.

Why this answer

Upgrading to a larger instance type provides more memory and CPU capacity. Reducing the eviction policy to volatile-lru would only evict keys with TTL, potentially increasing misses. Monitoring and tuning will not reduce current load.

Reducing value size is application-level and cannot be applied quickly.

Full explanation →

Page 7 of 7

All pages

Practice PCDE by domain

Target a specific domain to shore up weak areas.

Plan and manage database infrastructure Define data structures and implement SQL for Business Intelligence Design and implement database schemas Monitor and optimize database performance

See all domains with question counts →