Knowledge + Practice

Google Professional Cloud Database Engineer (PCDE) — Questions 376–450

503 questions total · 7 pages · All types, answers revealed


376

MCQ · medium

A BI report requires a running total of sales over the last 30 days for each product. The data is in a BigQuery table with columns: sale_date, product_id, amount. Which SQL window function is most efficient?

A. Use GROUP BY with SUM(amount)

B. Use SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW)

C. Use SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

D. Use a correlated subquery to sum over previous dates

Answer: C

This window function efficiently computes a running total across all rows up to the current row.

Why this answer

Option C is the answer key's choice because `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` is the standard frame for a running (cumulative) total: each row's value is the sum of all rows up to and including it, computed in a single pass. Note, however, that this frame does not restrict the sum to the last 30 days, and none of the options adds `PARTITION BY product_id`, which the "for each product" requirement needs.

For a true 30-day rolling sum, the frame has to be time-based rather than row-based. In BigQuery, a `RANGE` frame with a numeric offset needs a numeric ordering key, so one common pattern is `SUM(amount) OVER (PARTITION BY product_id ORDER BY UNIX_DATE(sale_date) RANGE BETWEEN 29 PRECEDING AND CURRENT ROW)`.

Exam trap

Google Cloud often tests the distinction between `ROWS` and `RANGE` frame specifications. The trap here is that candidates confuse a fixed row count (`ROWS BETWEEN 30 PRECEDING`) with a time-based window (a `RANGE` frame ordered by the date), leading them to choose Option B even though it does not correctly implement a 30-day rolling sum.

How to eliminate wrong answers

Option A is wrong because GROUP BY with SUM(amount) aggregates sales per day or per product, but it cannot produce a running total across dates; it loses the row-level context needed for cumulative calculations. Option B is wrong because `ROWS BETWEEN 30 PRECEDING AND CURRENT ROW` sums at most 31 rows (up to 30 preceding + current), which is a fixed row count, not a time-based window of 30 days; if dates are missing or irregular, this will not correctly represent sales over the last 30 calendar days. Option D is wrong because a correlated subquery to sum over previous dates is inefficient and scales poorly; it requires a separate subquery execution for each row, leading to O(n²) performance, whereas a window function operates in a single pass over the data.
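To see how a cumulative frame, a row-count frame, and a day-based window differ on sparse dates, here is a minimal Python sketch that mimics the three calculations outside BigQuery. The sample rows and the function names are invented for illustration; it assumes the rows belong to one product and are already sorted by date.

```python
from datetime import date, timedelta

# Hypothetical sales for a single product, sorted by date.
# The dates are deliberately sparse so the frames disagree.
sales = [
    (date(2024, 1, 1), 100),
    (date(2024, 1, 20), 50),
    (date(2024, 2, 25), 30),
    (date(2024, 3, 1), 20),
]


def cumulative_total(rows):
    """Mimics ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW."""
    totals, running = [], 0
    for _, amount in rows:
        running += amount
        totals.append(running)
    return totals


def last_n_rows_total(rows, preceding):
    """Mimics ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW."""
    amounts = [amount for _, amount in rows]
    return [
        sum(amounts[max(0, i - preceding): i + 1])
        for i in range(len(amounts))
    ]


def last_n_days_total(rows, days):
    """Sums rows dated within the `days`-day window ending on each row's date."""
    totals = []
    for current_date, _ in rows:
        window_start = current_date - timedelta(days=days - 1)
        totals.append(
            sum(a for d, a in rows if window_start <= d <= current_date)
        )
    return totals


print(cumulative_total(sales))       # [100, 150, 180, 200]
print(last_n_rows_total(sales, 30))  # [100, 150, 180, 200]
print(last_n_days_total(sales, 30))  # [100, 150, 30, 50]
```

With only four rows, the row-count frame behaves exactly like the cumulative one, while the day-based window drops the January sales once they are more than 30 days old.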


377

MCQ · easy

Refer to the exhibit. The company plans to store 3 TB of data in this instance. What is the minimum number of nodes required? (Assume 2 TB per node for HDD and 4 TB per node for SSD; this instance uses SSD.)

A. 4

B. 1

C. 2

D. 3

Answer: C

2 nodes are needed for 3 TB when each node holds 2 TB.

Why this answer

The answer key sizes the instance at 2 TB per node: 3 TB ÷ 2 TB per node = 1.5, which rounds up to 2 nodes. This conflicts with the question's own assumption that SSD nodes hold 4 TB each; under that figure a single node (4 TB ≥ 3 TB) would be enough, which is Option B. The exhibit is not reproduced here, so the per-node limit the question intended cannot be confirmed.

Exam trap

The trap is sizing from the wrong per-node storage limit (HDD versus SSD) or forgetting to round up to a whole node.

How to eliminate wrong answers

Option A (4) and Option D (3) are wrong because both exceed the minimum: they provide more storage than 3 TB requires under either per-node figure. Option B (1) is correct only if a single node can hold at least 3 TB, which is true under the stated 4 TB SSD assumption but not under the 2 TB figure the answer key applies.
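The node count in this kind of question is a ceiling division of the data size by the per-node storage limit. The sketch below is a hypothetical helper (the function name is not from any Google Cloud library) that evaluates both per-node figures that appear in this question: the 2 TB used in the short answer and the 4 TB SSD figure stated in the question.

```python
import math


def min_nodes(data_tb: float, tb_per_node: float) -> int:
    """Smallest whole number of nodes whose combined storage holds data_tb."""
    if data_tb <= 0 or tb_per_node <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(data_tb / tb_per_node)


print(min_nodes(3, 2))  # 2 nodes at 2 TB per node
print(min_nodes(3, 4))  # 1 node at 4 TB per node
```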


378

MCQ · hard

A healthcare company uses Cloud SQL for MySQL for patient records. They need to export data for a compliance audit. They must ensure the export includes all changes within a specific time window (e.g., last 24 hours). They have binary logging enabled. What is the best method to obtain a consistent snapshot of the data as of the audit time?

A. Use Database Migration Service's continuous export.

B. Use Cloud SQL clone to create a new instance from a point-in-time, then export from clone.

C. Use mysqldump to export at the audit time.

D. Use Cloud SQL's export feature with a specific backup-id.

Answer: B

Cloning uses binary logs to recreate the exact state at a given time, providing a consistent snapshot.

Why this answer

Option B is correct because Cloud SQL's clone feature can create a new instance from a specific point-in-time using binary logs, providing a consistent snapshot of the database as of the audit time. This ensures all changes within the last 24 hours are captured without impacting the production instance, and the export can then be performed from the clone.

Exam trap

The trap here is that candidates may think mysqldump or the export feature can capture a point-in-time snapshot, but they overlook that cloning to a point in time is what gives a consistent, non-disruptive snapshot at an arbitrary time within the binary log retention window.

How to eliminate wrong answers

Option A is wrong because Database Migration Service's continuous export is designed for ongoing replication to external targets, not for creating a point-in-time consistent snapshot from Cloud SQL's binary logs. Option C is wrong because mysqldump runs against the production instance, adding load (and locking tables unless a consistent-snapshot mode such as `--single-transaction` is used), and it captures the state at the moment the dump starts rather than at an exact audit timestamp. Option D is wrong because Cloud SQL's export with a specific backup-id only exports from a full backup, not from a point-in-time that includes all changes within the last 24 hours; backups are typically taken at scheduled intervals, not at the exact audit time.


379

MCQ · hard

A company is designing a Firestore schema for a chat application with millions of messages. They need to support real-time updates and efficient querying of recent messages per conversation. Which schema and indexing strategy is optimal?

A. Store all messages in a single top-level collection. Create an index on (conversationId, timestamp desc).

B. Store messages in a subcollection with a single-field index on timestamp.

C. Store messages as a subcollection under each conversation document. Create a composite index on (conversationId, timestamp desc).

D. Use a parent document with a nested array of recent messages, and a separate collection for older messages.

Answer: C

Subcollections scale well and composite index enables efficient per-conversation queries.

Why this answer

Storing messages as a subcollection under each conversation document (Option C) is scalable and allows efficient queries with a composite index on (conversationId, timestamp desc). Option A (single top-level collection) lacks natural grouping and may hit limits. Option B (subcollection with only a single-field index on timestamp) cannot filter by conversation efficiently.

Option D (nested array) is limited to 1 MiB per document.


380

MCQ · easy

A company wants to migrate a 100 GB MySQL database to Cloud SQL with minimal application changes. Which migration tool should they use?

A. mysqldump

B. Database Migration Service

C. BigQuery Data Transfer Service

D. Storage Transfer Service

Answer: B

DMS supports MySQL to Cloud SQL migrations with minimal changes.

Why this answer

Database Migration Service (DMS) is the correct tool because it is designed specifically for migrating databases to Cloud SQL with minimal downtime and minimal application changes. DMS uses a combination of initial snapshot and continuous change data capture (CDC) to replicate the source MySQL database to Cloud SQL, allowing the application to point to the new database with only a connection string update.

Exam trap

Google Cloud often tests the distinction between database migration tools and general data transfer or backup tools, so the trap here is that candidates might choose mysqldump (Option A) because it is a familiar MySQL tool, overlooking that it causes downtime and is not optimized for live migrations to Cloud SQL.

How to eliminate wrong answers

Option A is wrong because mysqldump is a logical backup tool that exports data as SQL statements, which requires taking the source database offline or locking tables during the dump, and the import process can be slow and error-prone for a 100 GB database, leading to significant application downtime and potential data inconsistency. Option C is wrong because BigQuery Data Transfer Service is designed for loading data into BigQuery, a data warehouse, not for migrating operational databases to Cloud SQL, and it does not support MySQL as a source or Cloud SQL as a target. Option D is wrong because Storage Transfer Service is used for moving objects from on-premises or other cloud storage to Google Cloud Storage (GCS), not for migrating live databases to Cloud SQL, and it cannot handle the transactional consistency or schema requirements of a MySQL database.


381

MCQ · medium

A company is designing a database schema for a global e-commerce platform. Orders are created with high frequency, and order status updates occur frequently. The team needs to choose a primary key strategy for the orders table in Spanner. Which approach minimizes hot-spotting?

A. Use a monotonically increasing integer (e.g., auto-increment)

B. Use a timestamp as the primary key

C. Use a composite key with user_id and order_date

D. Use a universally unique identifier (UUID) as the primary key

Answer: D

Distributes writes uniformly across splits.

Why this answer

In Spanner, monotonically increasing or time-ordered primary keys cause hot-spotting because all new writes land at the end of the key space, in the same split and on the server that hosts it, creating a single point of contention. UUIDs are randomly distributed, ensuring writes are spread evenly across the entire key space, which minimizes hot-spotting and maximizes write throughput.

Exam trap

Google Cloud often tests the misconception that composite keys with a user_id prefix are always sufficient to avoid hot-spotting. The trap is that the time-ordered component (order_date) still makes each user's writes sequential, so very active users can concentrate writes in one split and reduce the distribution benefit.

How to eliminate wrong answers

Option A is wrong because monotonically increasing integers concentrate writes on the last split, causing severe hot-spotting. Option B is wrong because timestamps are inherently monotonically increasing, leading to the same hot-spotting issue as auto-increment keys. Option C is wrong because a composite key with user_id and order_date still has a time-ordered component (order_date), so writes for a given user cluster in the same split, especially for users placing orders in quick succession; it distributes writes less evenly than a random UUID.
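The difference between sequential and random keys can be illustrated with a toy model. The sketch below assumes a key space cut into eight equal-width ranges standing in for splits; real Spanner splits are created dynamically by size and load, so the split count, key spaces, and function name are simplifications for illustration only.

```python
import random
import uuid
from collections import Counter

NUM_SPLITS = 8  # hypothetical number of equal-width key ranges


def split_for(key: int, key_space: int, num_splits: int = NUM_SPLITS) -> int:
    """Index of the equal-width key range that contains `key`."""
    return key * num_splits // key_space


# Sequential order IDs in a 63-bit key space: neighbouring values share a range.
sequential_keys = range(1_000_000, 1_010_000)
sequential_load = Counter(split_for(k, 2**63) for k in sequential_keys)

# Random version-4 style UUIDs in a 128-bit key space (seeded so runs repeat).
rng = random.Random(42)
uuid_keys = [
    uuid.UUID(int=rng.getrandbits(128), version=4).int for _ in range(10_000)
]
uuid_load = Counter(split_for(k, 2**128) for k in uuid_keys)

print("sequential:", dict(sequential_load))  # every write hits one range
print("uuid:", dict(sorted(uuid_load.items())))  # writes spread over all ranges
```

In this model all 10,000 sequential inserts fall into a single range, while the UUID inserts are spread roughly evenly across all eight.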


382

MCQ · hard

A company runs a Cloud Spanner database with a multi-region configuration. They notice that write latency is higher than expected for clients in a region far from the leader region. What action should be taken to reduce write latency?

A. Reduce the number of replicas

B. Use directed reads

C. Change the default leader option to 'NEAREST'

D. Enable follower reads for writes

Answer: C

This places the leader in the nearest region, reducing write latency for that region.

Why this answer

Option C is correct because write latency in a multi-region Cloud Spanner configuration depends on the distance between the client and the leader region: every write must be coordinated by the leader replica before it commits. Moving the leader to the read-write region nearest the write-heavy clients shortens that round trip. In practice this is done with the `default_leader` database option, which names a specific read-write region of the instance configuration.

Exam trap

The trap here is that candidates confuse directed reads (which reduce read latency) with leader placement options (which reduce write latency), or incorrectly assume that reducing replicas or using follower reads can improve write performance.

How to eliminate wrong answers

Option A is wrong because reducing the number of replicas does not reduce write latency; it may actually increase latency by reducing read availability and fault tolerance, and writes still require leader confirmation. Option B is wrong because directed reads are used to route read requests to the nearest replica for lower read latency, but they do not affect write latency, as writes must still go through the leader. Option D is wrong because follower reads are a read-only feature that allows reads from non-leader replicas; writes cannot be performed on followers, so enabling follower reads for writes is technically invalid.


383

MCQ · hard

You are designing a schema for a Cloud SQL for PostgreSQL database that supports full-text search across millions of product descriptions. The application requires fast search results ranked by relevance. Which schema design is most appropriate?

A. Use a tsvector column with a GIN index on that column

B. Use a separate Elasticsearch instance

C. Use a LIKE '%term%' query with a B-tree index

D. Use materialized view with trigram indexes

Answer: A

PostgreSQL full-text search with tsvector/GIN is purpose-built for fast ranked search.

Why this answer

Option A is correct: use a tsvector column with a GIN index, which is PostgreSQL's built-in full-text search feature optimized for ranking and relevance. Option C uses LIKE with wildcards, which is slow and cannot rank; a B-tree index cannot serve a pattern with a leading wildcard. Option B relies on an external service, not a schema design within Cloud SQL.

Option D uses trigram indexes, which support similarity search but not full-text search ranking.


384

MCQ · medium

A Cloud Spanner instance is experiencing high CPU utilization (above 80%) on multiple nodes. The database is used for an e-commerce application with a high volume of read-write transactions. The application uses the GoogleSQL dialect and runs typical OLTP queries. You have already reviewed the query performance and found that most queries are efficient. Which initial step should you take to reduce CPU utilization?

A. Use the INFORMATION_SCHEMA.INDEXES view to identify and drop unused or redundant secondary indexes.

B. Adjust the application to use staleness of 5 seconds for reads to reduce CPU for read-write transactions.

C. Increase the number of nodes in the Spanner instance to spread the CPU load.

D. Create a separate read-only replica pool to offload read traffic.

Answer: A

Unused indexes cause extra write CPU and storage; removing them reduces CPU utilization directly.

Why this answer

Option A is correct because high CPU utilization in Cloud Spanner often stems from excessive secondary index maintenance during write operations. Dropping unused or redundant indexes reduces the write amplification and CPU overhead per transaction, directly lowering CPU usage without compromising query performance, as the queries are already efficient.

Exam trap

Google Cloud often tests the misconception that scaling out (adding nodes) is the first step for performance issues, when in reality, eliminating unnecessary index maintenance is a more cost-effective and direct solution for CPU-bound write-heavy workloads.

How to eliminate wrong answers

Option B is wrong because adjusting staleness to 5 seconds for reads reduces read CPU by allowing stale reads, but the problem states high CPU utilization on multiple nodes with read-write transactions; stale reads do not reduce the CPU cost of write operations or index maintenance, which are the primary drivers. Option C is wrong because increasing nodes spreads the CPU load but does not address the root cause; it may mask the issue and increase costs without resolving the underlying index overhead. Option D is wrong because creating a separate read-only replica pool offloads read traffic, but the high CPU is from read-write transactions, and read-only replicas cannot handle writes; they do not reduce CPU for write-heavy workloads.


385

MCQ · medium

A company uses BigQuery materialized views to pre-aggregate sales data for a BI dashboard. The dashboard requires near-real-time data, but the materialized view currently reflects data up to 30 minutes old. What is the most effective way to reduce the refresh interval without significantly increasing costs?

A. Reduce the max_staleness parameter of the materialized view.

B. Disable automatic refresh and schedule a manual refresh every minute.

C. Use a streaming buffer with the base table to reduce latency.

D. Create additional materialized views with overlapping time windows.

Answer: A

Lower max_staleness forces more frequent refreshes.

Why this answer

The `max_staleness` parameter sets the maximum acceptable age of the data that a query against a BigQuery materialized view may return. Lowering it tightens that freshness bound, so the dashboard sees near-real-time data without the cost of a full manual refresh or additional streaming infrastructure. The parameter is designed to balance freshness against cost, which is why the answer key treats it as the most effective and efficient option. Note that how often the view is refreshed in the background is set separately by `refresh_interval_minutes`, so the two options are usually tuned together.

Exam trap

Google Cloud often tests the misconception that reducing staleness requires manual scheduling or additional streaming, when in fact the `max_staleness` parameter is the built-in, cost-effective mechanism for controlling refresh frequency in BigQuery materialized views.

How to eliminate wrong answers

Option B is wrong because disabling automatic refresh and scheduling a manual refresh every minute would significantly increase costs due to repeated full recomputation of the materialized view, and it also introduces operational complexity without leveraging BigQuery's built-in incremental refresh mechanism. Option C is wrong because using a streaming buffer with the base table reduces latency for new data ingestion but does not affect the refresh interval of the materialized view itself; the view still relies on its own staleness setting. Option D is wrong because creating additional materialized views with overlapping time windows does not reduce the refresh interval for any single view; it increases storage and processing costs without improving freshness, as each view would still have its own staleness constraint.


386

MCQ · easy

Refer to the exhibit. What is the effect of the partition_expiration_days option?

A. The table's storage cost is reduced by 365%

B. Queries that reference data older than 365 days will fail

C. Partitions older than 365 days are automatically deleted

D. The table will be partitioned into 365 partitions

Answer: C

The option enables automatic partition expiration, deleting old partitions to free storage.

Why this answer

The `partition_expiration_days` option in BigQuery automatically drops partitions that are older than the specified number of days, reducing storage costs and simplifying lifecycle management. When set to 365, any partition with a date older than 365 days from the current date is deleted by BigQuery's background maintenance process.

Exam trap

Google Cloud often tests the distinction between automatic deletion (expiration) and query failure—candidates mistakenly think expired partitions cause errors, but BigQuery simply treats them as non-existent, returning empty results for those date ranges.

How to eliminate wrong answers

Option A is wrong because storage cost is reduced by the amount of data in expired partitions, not by a fixed percentage like 365%; the percentage depends on the table's total size. Option B is wrong because queries referencing data older than 365 days will simply return no rows from those expired partitions, but the query itself will not fail—it will succeed with an empty result for the expired range. Option D is wrong because the option does not control the number of partitions; it controls the expiration age of partitions, while the number of partitions is determined by the partitioning column's granularity and the data's date range.
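The expiration rule can be approximated with simple date arithmetic. The sketch below is a day-granularity model, not BigQuery's implementation: the function name and sample partition dates are invented, and the exact moment at which BigQuery removes a partition is not modelled.

```python
from datetime import date, timedelta


def expired_partitions(partition_dates, today, expiration_days):
    """Return the partition dates that are older than the expiration window."""
    cutoff = today - timedelta(days=expiration_days)
    return [d for d in partition_dates if d < cutoff]


# Hypothetical daily partitions of a table with partition_expiration_days = 365.
partitions = [date(2023, 1, 15), date(2024, 1, 15), date(2024, 5, 31)]

print(expired_partitions(partitions, today=date(2024, 6, 1), expiration_days=365))
# [datetime.date(2023, 1, 15)]
```

Only the January 2023 partition is past the window; a query over that date range would succeed but find no rows once the partition has been deleted.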


387

MCQ · hard

A company has a BigQuery table that stores JSON data in a single column. They want to allow BI analysts to query nested fields using standard SQL. What is the best approach to make the data more query-friendly for BI tools?

A. Unnest the JSON into multiple columns using a persistent table with a flattened schema.

B. Use BigQuery's automatic schema detection to infer the structure.

C. Create a view that uses JSON_QUERY and JSON_VALUE functions to expose nested fields as columns.

D. Use the EXTRACT function to parse JSON fields in each query.

Answer: A

A flattened table stores JSON fields as columns once, enabling efficient columnar scanning and BI tool compatibility.

Why this answer

Option A is correct because flattening the JSON into a persistent table with a flattened schema eliminates the need for runtime parsing, allowing BI tools to query the formerly nested fields directly as ordinary columns with standard SQL. This approach improves query performance by avoiding repeated JSON function calls and lets queries benefit from columnar storage, partitioning, and clustering, which is critical for interactive BI workloads.

Exam trap

Google Cloud often tests the misconception that a view or function-based approach is sufficient for performance, when in fact persistent schema flattening is required for BI tools to achieve optimal query performance and schema compatibility.

How to eliminate wrong answers

Option B is wrong because BigQuery's automatic schema detection only works during table creation from external data sources (e.g., Cloud Storage) and cannot retroactively infer or restructure an existing table with a single JSON column. Option C is wrong because a view using JSON_QUERY and JSON_VALUE still requires runtime parsing of the JSON string for every query, which degrades performance and prevents BI tools from leveraging column-level optimizations like partitioning or clustering. Option D is wrong because the EXTRACT function in BigQuery is designed for extracting date/time parts, not for parsing JSON fields; using it would be syntactically incorrect and non-functional.
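The flattening step itself is straightforward. The sketch below shows the idea in plain Python rather than BigQuery SQL: nested objects become columns whose names join the path with underscores. The sample document, the naming scheme, and the `flatten` helper are illustrative assumptions; arrays are left untouched for brevity.

```python
import json


def flatten(obj: dict, prefix: str = "") -> dict:
    """Turn nested JSON objects into a single-level dict of column -> value."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse, extending the column name with the parent path.
            row.update(flatten(value, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


raw = '{"order_id": 7, "customer": {"id": "c1", "address": {"city": "Oslo"}}}'
row = flatten(json.loads(raw))
print(row)
# {'order_id': 7, 'customer_id': 'c1', 'customer_address_city': 'Oslo'}
```

Each key of the resulting dict corresponds to one column of the flattened table, so a BI tool can filter on `customer_address_city` without calling any JSON function.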


388

MCQ · hard

A company runs a critical application on Cloud SQL for PostgreSQL with a primary instance in us-central1 and a cross-region read replica in us-west1 for disaster recovery. The database engineer is responsible for ensuring that in the event of a regional outage in us-central1, the application can continue with minimal data loss and within 15 minutes of downtime. The application writes about 1000 transactions per second. The current setup has automated backups enabled with point-in-time recovery (7-day retention) and the cross-region replica is configured with asynchronous replication. Which action should the database engineer take to meet the recovery objectives?

A. Promote the cross-region read replica to a new primary and redirect application traffic.

B. Change the cross-region replica to synchronous replication and enable automatic failover.

C. Create a new instance from the latest backup in us-west1 and redirect traffic.

D. Increase automated backup frequency to every hour and ensure binary logging is enabled.

Answer: A

Promoting a read replica is fast (minutes) and meets the 15-minute RTO. Data loss is limited to the replication lag (seconds/minutes).

Why this answer

Option A is correct because promoting the cross-region read replica is the fastest failover method: the replica in us-west1 already holds a near-current copy of the data, so promotion completes in minutes and fits the 15-minute downtime target. Because replication is asynchronous, transactions not yet replicated at the time of the outage may be lost. To minimize data loss, the team should set replication lag alerts and automate failover procedures.

Option C is wrong because restoring from a backup takes longer and loses everything written since the backup. Option B is wrong because synchronous replication across regions is not supported in Cloud SQL for PostgreSQL. Option D is wrong because increasing backup frequency doesn't help with fast failover.
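The data-loss exposure of asynchronous replication is roughly the write rate multiplied by the replication lag at the moment of failure. The sketch below works through that arithmetic using the question's 1000 transactions per second; the lag values are hypothetical, and the helper name is invented for illustration.

```python
def transactions_at_risk(tps: int, replication_lag_seconds: float) -> int:
    """Approximate number of committed transactions not yet on the replica."""
    if tps < 0 or replication_lag_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return int(tps * replication_lag_seconds)


# 1000 TPS comes from the question; the lag values are assumed examples.
for lag in (0.5, 5, 60):
    print(f"lag {lag:>4}s -> about {transactions_at_risk(1000, lag)} transactions")
```

This is why lag alerting matters: at this write rate, a lag that drifts from seconds to a minute raises the exposure from a few thousand transactions to tens of thousands.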


389

Multi-Select · medium

Your team is designing a schema for Cloud SQL (MySQL) for a content management system. You need to implement full-text search on article content. Which TWO schema design choices are appropriate? (Choose two.)

Select 2 answers

A. Use the LIKE operator with wildcards for pattern matching.

B. Store article content in a Cloud Storage bucket and query metadata.

C. Normalize content into a separate table and use joins.

D. Use Cloud SQL's built-in full-text search feature.

E. Add a FULLTEXT index on the content column.

Answers: D, E

Cloud SQL for MySQL supports full-text search via FULLTEXT indexes and MATCH AGAINST queries.

Why this answer

Options D and E are correct. Adding a FULLTEXT index on the content column (E) and querying it through MySQL's built-in full-text search with `MATCH ... AGAINST` (D) together enable full-text search. Option C is normalization, not search.

Option A (LIKE) is inefficient and not full-text. Option B (Cloud Storage) is not a schema design within Cloud SQL.


390

MCQ · medium

A database administrator notices that a Cloud SQL for MySQL instance is experiencing high CPU usage during peak hours. The instance has 4 vCPUs and 15 GB of memory. The query patterns are mostly read-intensive with occasional writes. Which action should the DBA take first to address the high CPU usage?

A. Increase the max_connections flag to allow more concurrent connections

B. Enable read pool to offload read queries

C. Increase the machine type to 8 vCPUs

D. Analyze slow query log and optimize queries

Answer: D

Analyzing slow queries helps identify inefficient SQL that consumes CPU; optimizing is the most effective first step.

Why this answer

High CPU usage in a read-intensive Cloud SQL for MySQL instance is most often caused by inefficient queries that consume excessive CPU cycles. Analyzing the slow query log allows the DBA to identify and optimize these queries, addressing the root cause directly. Increasing resources or changing configuration without understanding the workload can mask the problem and lead to unnecessary costs.

Exam trap

Google Cloud often tests the misconception that scaling up resources is the first troubleshooting step, when in reality, analyzing and optimizing query performance is the most effective initial action for CPU-bound issues in Cloud SQL.

How to eliminate wrong answers

Option A is wrong because increasing max_connections can actually worsen CPU usage by allowing more concurrent queries to compete for CPU resources, potentially increasing contention. Option B is wrong as a first step because offloading reads to additional instances adds cost and leaves the inefficient queries in place; it is worth considering only after the queries have been analyzed. Option C is wrong because scaling up to 8 vCPUs is a reactive measure that does not address the underlying query inefficiency; it increases cost without guaranteeing performance improvement if the queries are poorly optimized.


391

MCQ · easy

You need to set up a Cloud Monitoring alert for a Cloud Spanner instance to notify when the CPU utilization exceeds a threshold that could indicate performance degradation. What is the recommended CPU utilization threshold for Cloud Spanner?

A. 90%

B. 40%

C. 80%

D. 65%

Answer: D

65% is the recommended threshold to maintain performance headroom.

Why this answer

The recommended maximum for high-priority CPU utilization in Cloud Spanner is 65% for regional instances (the recommendation for multi-region instances is lower, at 45% per region). This value is based on Google's best practices, as sustained CPU usage above 65% can lead to increased latency and performance degradation due to queuing and contention. Setting the alert at 65% provides a proactive warning before the instance reaches a critical state, allowing time for scaling or optimization.

Exam trap

Google Cloud often tests the misconception that higher thresholds like 80% or 90% are acceptable for alerting, but Cloud Spanner's distributed architecture requires a lower threshold to account for queuing effects and maintain consistent low latency.

How to eliminate wrong answers

Option A is wrong because 90% is too high; at this level, Cloud Spanner nodes experience significant queuing delays and potential throttling, making it a reactive rather than proactive threshold. Option B is wrong because 40% is too conservative; it would trigger false alarms unnecessarily, as Cloud Spanner is designed to handle moderate CPU loads without performance issues. Option C is wrong because 80% is above the recommended threshold; while it may indicate high utilization, it risks performance degradation before the alert fires, as queuing effects become noticeable above 65%.


392

MCQ · easy

Refer to the exhibit. A BI analyst runs a query to get total sales for the last 7 days. The query filters on sale_date BETWEEN '2023-01-01' AND '2023-01-07'. What is the primary benefit of the partitioning defined in the table?

A. It reduces the amount of data scanned by pruning partitions.

B. It automatically creates indexes on sale_date.

C. It allows the query to use clustering.

D. It enables streaming inserts.

Answer: A

Partition pruning scans only relevant partitions, minimizing data processing.

Why this answer

Partitioning in BigQuery (and similar data warehouses) physically divides the table into segments based on the partition column (sale_date). When the query filters on sale_date BETWEEN '2023-01-01' AND '2023-01-07', the query engine can perform partition pruning, scanning only the partitions that match the date range instead of the entire table. This dramatically reduces the amount of data read, lowering query cost and improving performance.

Exam trap

Google Cloud often tests the distinction between partitioning (which prunes data at the storage level) and clustering (which sorts data within partitions), leading candidates to mistakenly choose clustering as the primary benefit when the question explicitly asks about the partitioning definition.

How to eliminate wrong answers

Option B is wrong because partitioning does not automatically create indexes; BigQuery uses a columnar storage format and does not rely on traditional indexes. Option C is wrong because clustering is a separate feature that co-locates data within partitions based on sort order, but the primary benefit described here is partition pruning, not clustering. Option D is wrong because streaming inserts are a method for ingesting real-time data and are unrelated to the query performance benefit of partition pruning.
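Partition pruning can be pictured as skipping whole buckets of rows before any of them are read. The sketch below models a date-partitioned table as a Python dict keyed by partition date; the table contents, row counts, and function name are invented for illustration and do not reflect how BigQuery stores data internally.

```python
from datetime import date

# Hypothetical table: one partition per day in January 2023, 100 rows each,
# every row holding a sale amount of 10.
partitions = {date(2023, 1, day): [10] * 100 for day in range(1, 32)}


def total_sales(table, start, end):
    """Sum sales in [start, end], reading only partitions inside the range."""
    total, rows_scanned = 0, 0
    for partition_date, rows in table.items():
        if start <= partition_date <= end:  # other partitions are never read
            total += sum(rows)
            rows_scanned += len(rows)
    return total, rows_scanned


total, scanned = total_sales(partitions, date(2023, 1, 1), date(2023, 1, 7))
all_rows = sum(len(rows) for rows in partitions.values())
print(total, scanned, all_rows)  # 7000 700 3100
```

The seven-day filter touches 700 of the 3,100 rows in this toy table; the remaining 24 partitions are skipped entirely, which is the saving the question is pointing at.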


393

Multi-Select · hard

A company's Cloud Spanner database currently uses a regional configuration in us-central1. Due to growth, the database must support global reads with low latency and maintain strong consistency. The database engineer is evaluating options. Which THREE considerations should the engineer include in the design? (Choose three.)

Select 3 answers

A. Use interleaved tables to reduce the number of reads required for hierarchical data.

B. Select a configuration that places the leader region close to the majority of write traffic.

C. Ensure the schema uses primary keys that distribute writes evenly across nodes.

D. Add read replicas in remote regions to serve reads with eventual consistency.

E. Use a multi-region instance configuration that includes multiple read-write regions.

Answers: B, C, E

Leader placement reduces write latency and indirectly benefits read latency.

Why this answer

Option B is correct because in a multi-region Cloud Spanner configuration, the leader region coordinates all writes, so placing it close to the majority of write traffic minimizes write latency. Strong reads can be served by replicas in other regions, but a replica may first need to confirm with the leader that its data is up to date, so leader placement also affects how often global strong reads incur a cross-region round trip.

Exam trap

Google Cloud often tests whether candidates notice that Option D trades away the strong consistency the scenario requires. In Cloud Spanner, replicas in remote regions can serve stale reads locally with low latency, whereas a strong read may require the replica to check with the leader region before answering.

394

MCQeasy

When designing a schema for a data warehouse in BigQuery, which table type is most cost-effective for storing raw event data that will be queried by date range filters?

A.A partitioned table partitioned by date column

B.A table with integer range partitioning on an ID column

C.A regular table with no partitioning

D.A regular table clustered on timestamp

AnswerA

Only scans partitions matching the date range, minimizing cost.

Why this answer

Option A is correct: a table partitioned by date limits the data scanned to the partitions that match the date filter, reducing cost. Option C (regular table with no partitioning) scans all data. Option D (regular table clustered on timestamp) has no partitions to prune, so it does not give the same predictable reduction in data scanned.

Option B (integer range partitioning on an ID) is not suitable for date queries and would not limit scans based on date.

395

MCQeasy

A team is deploying a new application on Google Kubernetes Engine (GKE) that uses Cloud Spanner. They want to minimize latency for read operations. Which Spanner configuration should they use?

A.Use a multi-region configuration with default leader preference set to the region where the application runs.

B.Use a regional instance with read replicas in the same region.

C.Use a single-region instance and configure the leader preference to the application's zone.

D.Use a single-region instance and enable read-only replicas in multiple zones.

AnswerB

Regional instances with read replicas in the same region provide low-latency reads with strong consistency.

Why this answer

Option B is correct because a regional instance with read replicas in the same region provides the lowest read latency for applications running in that region. Cloud Spanner's regional configuration keeps all data and replicas within a single Google Cloud region, minimizing network round-trips. Read replicas in the same region can serve strongly consistent reads without cross-region hops, which is optimal for latency-sensitive workloads.

Exam trap

Google Cloud often tests the misconception that multi-region configurations with leader preference reduce read latency, when in fact leader preference only affects write commit latency, not read latency, and multi-region setups inherently add cross-region latency for reads.

How to eliminate wrong answers

Option A is wrong because multi-region configurations introduce cross-region replication and quorum overhead, which increases read latency compared to a regional setup, even with leader preference set to the application's region. Option C is wrong because a single-region instance with leader preference set to a zone does not add read replicas; leader preference only affects write latency and transaction commit, not read latency. Option D is wrong because read-only replicas in multiple zones within a single-region instance do not reduce read latency for the application; they are used for failover and disaster recovery, not for serving reads with lower latency.

396

MCQeasy

A developer needs a local development database that mirrors a Cloud SQL instance. What is the best practice?

A.Use Cloud SQL Proxy to connect locally

B.Use Cloud Functions

C.Export data and import to local MySQL

D.Use Cloud SQL's public IP

AnswerC

Exporting the database provides a dump that can be loaded into a local MySQL instance.

Why this answer

Option C is correct because exporting the Cloud SQL instance data (e.g., using `gcloud sql export sql` or `mysqldump`) and importing it into a local MySQL database creates an exact, offline replica of the production schema and data. This allows the developer to work with a full, consistent dataset without network latency, connection overhead, or dependency on Cloud SQL availability, which is the standard best practice for local development mirrors.

Exam trap

The trap here is that candidates confuse 'connecting to a remote database' (Options A and D) with 'creating a local copy,' failing to recognize that a true development mirror must be an offline, independent replica to avoid latency, security, and availability issues.

How to eliminate wrong answers

Option A is wrong because Cloud SQL Proxy is a secure tunnel for connecting to a live Cloud SQL instance over the internet; it does not create a local copy of the database, so the developer remains dependent on network connectivity and the production instance, which defeats the purpose of a local development mirror. Option B is wrong because Cloud Functions are serverless compute units for event-driven code, not a database service or tool for replicating or mirroring database state; they cannot store or serve a local copy of a Cloud SQL database. Option D is wrong because using Cloud SQL's public IP exposes the instance directly to the internet, which is a security risk and still requires a live connection to the remote database, not a local development mirror.

397

MCQhard

A financial institution uses Cloud SQL for MySQL to handle transaction processing. They need to generate daily BI reports that aggregate millions of transactions per account. The BI queries are CPU-intensive and degrade OLTP performance. What is the most effective solution?

A.Schedule reports during off-peak hours only

B.Create a Cloud SQL read replica and run reports against it

C.Use Cloud SQL's high availability configuration

D.Upgrade the primary instance to a higher machine type

AnswerB

A read replica offloads read queries from the primary, preserving OLTP performance.

Why this answer

Creating a Cloud SQL read replica allows you to offload BI reporting queries to a separate instance that replicates data from the primary using MySQL's asynchronous replication. This isolates the CPU-intensive aggregation queries from the OLTP workload, preventing performance degradation on the primary instance while still providing near-real-time data for reports.

Exam trap

Google Cloud often tests the misconception that high availability (HA) instances can serve read traffic, when in fact the standby in an HA configuration is passive and cannot be used for read offloading.

How to eliminate wrong answers

Option A is wrong because scheduling reports during off-peak hours only reduces contention but does not eliminate the CPU load from the primary instance, which can still impact OLTP performance if reports run concurrently with any other workload. Option C is wrong because Cloud SQL's high availability configuration uses a standby instance in a different zone for failover, not for read scaling; it does not offload query processing and the standby cannot serve read traffic. Option D is wrong because upgrading the primary instance to a higher machine type increases capacity but does not isolate the BI workload, so CPU-intensive queries will still compete with OLTP transactions for resources on the same instance.

398

MCQeasy

You have a Memorystore for Redis instance used as a session store. You notice that the instance is experiencing high eviction rates. What is the best first step to take?

A.Increase the instance size or set a TTL policy on session keys.

B.Monitor memory usage but take no action.

C.Add a read replica to offload read traffic.

D.Enable persistence (AOF or RDB) to reduce memory usage.

AnswerA

More memory or key expiration reduces evictions.

Why this answer

High eviction rates in Memorystore for Redis indicate that the instance is running out of memory and the Redis `maxmemory-policy` is actively removing keys. The best first step is to either increase the instance size to provide more memory or set a TTL (Time-To-Live) policy on session keys so that expired sessions are cleaned up proactively, reducing memory pressure and evictions.

Exam trap

Google Cloud often tests the misconception that persistence (AOF/RDB) frees memory, but persistence only affects durability, not memory usage, and candidates may confuse read replicas as a solution for memory pressure rather than read throughput.

How to eliminate wrong answers

Option B is wrong because monitoring without action does not resolve the high eviction rate, which can degrade session store performance and cause data loss. Option C is wrong because adding a read replica does not increase the primary instance's memory capacity or reduce evictions; replicas are for read scaling and high availability, not for alleviating memory pressure on the primary. Option D is wrong because enabling persistence (AOF or RDB) does not reduce memory usage; it writes data to disk but the dataset still resides in memory, so evictions will continue at the same rate.
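Why a TTL on session keys relieves eviction pressure can be sketched with a toy capacity-limited store. This is a simplified model, not Redis's actual expiry or eviction algorithm: the `SessionStore` class, its entry-count limit (standing in for `maxmemory`), and the traffic pattern are all assumptions made for illustration.

```python
from collections import OrderedDict


class SessionStore:
    """Toy store: evicts the oldest key when full; optional TTL expires keys first."""

    def __init__(self, max_entries, ttl_seconds=None):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.entries = OrderedDict()  # key -> expiry time (None means no TTL)
        self.evictions = 0

    def _purge_expired(self, now):
        expired = [
            key for key, expiry in self.entries.items()
            if expiry is not None and expiry <= now
        ]
        for key in expired:
            del self.entries[key]

    def set(self, key, now):
        self._purge_expired(now)
        if key not in self.entries and len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # forced eviction of the oldest key
            self.evictions += 1
        expiry = None if self.ttl_seconds is None else now + self.ttl_seconds
        self.entries[key] = expiry
        self.entries.move_to_end(key)


def simulate(ttl_seconds):
    """One new session per second for 1000 seconds against a 100-entry store."""
    store = SessionStore(max_entries=100, ttl_seconds=ttl_seconds)
    for second in range(1000):
        store.set(f"session:{second}", now=second)
    return store.evictions


print("evictions without TTL:", simulate(None))
print("evictions with 60s TTL:", simulate(60))
```

Without a TTL, every write past the first 100 forces an eviction; with a 60-second TTL, old sessions expire before the store fills, so nothing is evicted. Increasing `max_entries` (the analogue of a larger instance) would reduce evictions in a similar way.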

399

MCQhard

A BI dashboard query is taking too long because it reads all columns from a large table. The dashboard only needs a few columns. What is the best practice?

A.Create a view that selects specific columns.

B.Create a table with only the needed columns.

C.Use a subquery to filter columns in the FROM clause.

D.Use a LIMIT clause to reduce rows.

AnswerA

Views with column selection allow column pruning.

Why this answer

Creating a view that selects specific columns is the best practice because it allows the BI dashboard to query only the necessary columns without altering the underlying table structure. Views provide a logical abstraction layer, enabling column pruning at the query level while preserving data integrity and access control. This approach reduces I/O and memory consumption by avoiding full table scans on unnecessary columns, directly addressing the performance bottleneck.

Exam trap

Google Cloud often tests the misconception that a subquery or LIMIT can optimize column-level performance, when in fact they only affect row filtering or query structure, not the column scan width.

How to eliminate wrong answers

Option B is wrong because creating a separate table duplicates data, leading to storage overhead, synchronization issues, and potential data staleness; it violates normalization principles and increases maintenance complexity. Option C is wrong because a subquery in the FROM clause does not inherently reduce column reads; the outer query still processes all columns from the subquery unless explicitly pruned, and it may not optimize execution plans as effectively as a view. Option D is wrong because a LIMIT clause restricts rows, not columns; it does not reduce the amount of data read per row, so the query still scans all columns from the large table, failing to address the root cause of slow performance.

400

Multi-Selecthard

Which THREE of the following are best practices for designing BigQuery tables for business intelligence reporting?

Select 3 answers

A.Partition tables by a date or timestamp column used in WHERE clauses.

B.Store data in many small tables to reduce the amount of data scanned per query.

C.Normalize data to reduce data redundancy.

D.Use nested repeated columns to store arrays of related data.

E.Cluster tables on columns that are frequently used in filters or group by clauses.

AnswersA, D, E

Partitioning limits scanned data and reduces costs.

Why this answer

Partitioning tables by a date or timestamp column used in WHERE clauses allows BigQuery to prune partitions, scanning only the relevant data instead of the entire table. This reduces query costs and improves performance, making it a best practice for BI reporting where queries often filter by time ranges.

Exam trap

Google Cloud often tests the misconception that normalization or many small tables are best for BigQuery, when in fact denormalization and larger, partitioned/clustered tables are optimal for BI workloads due to BigQuery's distributed architecture and pricing model.

401

MCQmedium

A Cloud SQL instance is using InnoDB and has a large buffer pool. The query performance is slower after a failover. What is the most likely cause?

A.Read replica lag

B.Buffer pool warm-up time

C.Binary log not enabled

D.Data corruption

AnswerB

The new instance starts with an empty buffer pool, so queries initially incur higher I/O.

Why this answer

After a failover in Cloud SQL, the new primary instance starts with a cold buffer pool. InnoDB relies on the buffer pool to cache data and index pages in memory for fast queries. Since the buffer pool is empty after the failover, queries must read from disk until the cache warms up, causing significantly slower performance.

Exam trap

Google Cloud often tests the misconception that failover performance issues are due to replication lag or binary log settings, when the real cause is the cold buffer pool requiring disk reads until it warms up.

How to eliminate wrong answers

Option A is wrong because read replica lag affects read replicas, not the primary; a Cloud SQL high availability failover switches to the standby instance rather than promoting a read replica. Option C is wrong because the binary log is used for replication and point-in-time recovery, not for query performance; its setting would not cause a post-failover slowdown. Option D is wrong because data corruption would typically surface as errors or crashes, not as a temporary slowdown that fades as the cache warms up.

402

MCQmedium

A company stores user events in BigQuery as nested repeated fields. They want to use Looker to build dashboards on individual events. Which SQL pattern should they use in a derived table to flatten the data?

A.SELECT fields FROM table WHERE events IS NOT NULL

B.SELECT fields FROM table, UNNEST(events) AS event

C.SELECT ARRAY_AGG(events) FROM table

D.SELECT events.* FROM table

AnswerB

CROSS JOIN UNNEST flattens the events array into rows, allowing access to event fields.

Why this answer

Option B is correct because UNNEST(events) in BigQuery SQL flattens the nested repeated field 'events' into individual rows, enabling Looker to treat each event as a separate record for dashboarding. This is the standard pattern for denormalizing arrays in BigQuery derived tables, as it converts each array element into its own row while preserving the parent record's fields.

Exam trap

Google Cloud often tests the misconception that simply selecting the nested field (option D) or filtering it (option A) will flatten the data, when in fact only UNNEST (or explicit CROSS JOIN UNNEST) achieves row-level expansion in BigQuery SQL.

How to eliminate wrong answers

Option A is wrong because WHERE events IS NOT NULL does not flatten nested repeated fields; it only filters rows where the entire 'events' array is non-null, leaving the nested structure intact and unusable for per-event analysis. Option C is wrong because ARRAY_AGG(events) does the opposite of flattening—it aggregates rows into an array, which would further nest the data and break the per-event requirement. Option D is wrong because SELECT events.* from table attempts to select all fields from the 'events' record, but without UNNEST, BigQuery treats 'events' as a single array column, causing a syntax error or returning the array as a whole, not individual event rows.
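What the `FROM table, UNNEST(events) AS event` pattern does to the data can be mimicked in plain Python. This is only a sketch of the row-expansion semantics; the field names (`user_id`, `name`, `ts`) and the sample rows are hypothetical.

```python
# Hypothetical nested rows: one record per user with a repeated `events` field.
rows = [
    {"user_id": 1, "events": [{"name": "login", "ts": 10}, {"name": "click", "ts": 12}]},
    {"user_id": 2, "events": [{"name": "login", "ts": 20}]},
    {"user_id": 3, "events": []},  # no events: produces no output rows
]


def unnest_events(nested_rows):
    """Emit one flat row per (parent row, array element) pair, like a cross join."""
    return [
        {"user_id": row["user_id"], **event}
        for row in nested_rows
        for event in row["events"]
    ]


flat = unnest_events(rows)
for record in flat:
    print(record)
```

Each array element becomes its own row carrying the parent's fields, and a parent with an empty array contributes no rows, which matches the behavior of the comma (cross join) form of UNNEST.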

403

Multi-Selecthard

A company is migrating a large Oracle database to Cloud Spanner. The schema includes several tables with foreign key relationships. The team wants to minimize query latency for join queries that always involve a parent table and its children. Which THREE schema design strategies should the team consider? (Choose THREE.)

Select 3 answers

A.Design child table primary keys to start with the parent key (e.g., CustomerId, OrderId)

B.Denormalize frequently joined lookup tables into the parent table as repeated fields

C.Use parent-child interleaved tables where the child table's primary key includes the parent's primary key

D.Create secondary indexes on foreign key columns

E.Store foreign key relationships as JSON arrays in the parent table

AnswersA, B, C

Enables interleaving and efficient queries.

Why this answer

Option A is correct because in Cloud Spanner, designing child table primary keys to start with the parent key (e.g., CustomerId, OrderId) enables efficient key-range scans and reduces the number of splits needed for join queries. This pattern leverages Spanner's distributed architecture to colocate related rows, minimizing cross-node communication and query latency.

Exam trap

Google Cloud often tests the misconception that secondary indexes alone can optimize join performance in distributed databases, but in Spanner, physical colocation via interleaved tables is the key to minimizing query latency for parent-child joins.
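The colocation idea behind parent-prefixed keys can be illustrated with ordinary tuple sorting. This is a simplified picture of key ordering, not Spanner's storage engine; the key values are made up, and treating a customer row as a one-element key `(CustomerId,)` is an assumption for the example.

```python
# Hypothetical primary keys: customers are (CustomerId,), orders are (CustomerId, OrderId).
customers = [(2,), (1,)]
orders = [(1, 101), (2, 201), (1, 100)]

# Sorting all keys together mimics storing interleaved rows in primary-key order.
storage_order = sorted(customers + orders)


def positions_for_customer(sorted_keys, customer_id):
    """Return the positions of a customer's row and its orders in key order."""
    return [i for i, key in enumerate(sorted_keys) if key[0] == customer_id]


print(storage_order)
print(positions_for_customer(storage_order, 1))
```

Because every child key starts with the parent key, the parent row and its children occupy one contiguous key range, so a parent-child join can be answered by reading a single range instead of looking up rows scattered across the keyspace.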

404

MCQeasy

Refer to the exhibit. What is the most effective optimization for this query?

A.Increase the instance memory to 30 GB

B.Create a composite index on (status, order_date)

C.Remove the WHERE clause and fetch all rows in application

D.Partition the orders table by month

AnswerB

Index allows efficient range scan and filter.

Why this answer

The query filters on `status` and `order_date`, so a composite index on `(status, order_date)` allows the database to perform an index seek on the equality predicate (`status`) and then a range scan on the ordered column (`order_date`), avoiding a full table scan. This is the most effective optimization because it directly supports the WHERE clause with minimal I/O and no sorting overhead.

Exam trap

Google Cloud often tests the misconception that partitioning alone improves query performance, but without a supporting index, partitioning only reduces the scan scope to a subset of partitions and does not eliminate the need for a full scan within those partitions.

How to eliminate wrong answers

Option A is wrong because increasing instance memory to 30 GB does not address the lack of an appropriate index; it may reduce buffer pool misses but cannot eliminate the need for a full table scan on a large table. Option C is wrong because removing the WHERE clause and fetching all rows in the application would transfer massive amounts of data over the network and force client-side filtering, which is far less efficient than letting the database engine use an index. Option D is wrong because partitioning the table by month does not automatically create an index on `status` and `order_date`; while partition pruning might help, without a proper index the query would still scan all rows in the relevant partitions.
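The effect of the composite index can be observed with a small experiment. The sketch below uses Python's built-in `sqlite3` as a stand-in engine, so the plan text differs from what MySQL or PostgreSQL on Cloud SQL would print. The exhibit's query is not reproduced here, so the table definition, the query, and the index name are assumptions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, status TEXT, order_date TEXT, amount REAL)"
)

# Assumed query shape: equality on status plus a range on order_date.
sql = "SELECT order_id, amount FROM orders WHERE status = ? AND order_date >= ?"
params = ("shipped", "2024-01-01")

plan_before = query_plan(conn, sql, params)

# Equality column first, range column second.
conn.execute("CREATE INDEX idx_orders_status_date ON orders (status, order_date)")
plan_after = query_plan(conn, sql, params)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan is a full scan of `orders`; afterwards it becomes an index search that uses both columns. Putting the equality column first is what lets the range condition on `order_date` be applied within the matching index entries.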

405

Multi-Selectmedium

A team is designing a Cloud SQL for PostgreSQL schema for a multi-tenant SaaS application. They need to isolate tenant data while maintaining query performance and manageability. Which two approaches are appropriate? (Choose two.)

Select 2 answers

A.Use separate databases per tenant.

B.Use a single schema with a tenant_id column on every table and row-level security.

C.Use a single table for all tenants with no tenant identifier.

D.Use a separate Cloud SQL instance per tenant.

E.Use separate schemas per tenant.

AnswersB, E

Row-level security enforces tenant isolation while keeping a single schema.

Why this answer

Separate schemas per tenant (E) provide logical isolation and straightforward per-tenant backup and restore. A single schema with a tenant_id column on every table and row-level security (B) is a standard multi-tenancy pattern. Options A and D are too costly to operate as the number of tenants grows.

Option C offers no isolation.

406

MCQmedium

A data analytics team runs ad-hoc queries on BigQuery that often exceed their slot capacity, causing queuing. They want to ensure predictable performance for their critical dashboard while still allowing ad-hoc queries. What is the most cost-effective solution?

A.Create a separate BigQuery reservation for the dashboard with a fixed number of slots, and let ad-hoc queries use on-demand pricing.

B.Switch all queries to on-demand pricing; the dashboard will automatically get priority.

C.Use a single reservation with a baseline of slots for the dashboard (top priority), and allow ad-hoc queries to use idle slots.

D.Move the data to a different BigQuery region with more slot availability.

AnswerC

A baseline guarantees slots for the dashboard, and idle slots are available for ad-hoc queries.

Why this answer

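Option C is correct because a baseline number of slots for the dashboard guarantees its resources, while ad-hoc queries can use whatever slots the dashboard leaves idle. Option A is wrong because a reservation dedicated solely to the dashboard would waste its idle slots. Option B is wrong because on-demand performance is unpredictable and gives the dashboard no priority. Option D is wrong because changing the BigQuery location does not affect slot capacity.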

407

MCQeasy

Refer to the exhibit. Given the table definition and two queries, which statement about query performance is correct?

A.Query 1 will scan less data than Query 2 because it uses both partition pruning and clustering.

B.Query 2 will scan less data than Query 1 because it only needs to read one partition.

C.Query 1 will scan the same amount of data as Query 2 because both use partition pruning.

D.Both queries will perform a full table scan because the table is partitioned.

AnswerA

Query 1 filters on partition column and cluster column, enabling both pruning and block elimination.

Why this answer

Query 1 uses both partition pruning (filtering on the partition key `event_date`) and clustering (filtering on the clustering column `user_id`), allowing it to skip irrelevant partitions and scan only the specific rows within the target partition. Query 2 uses only partition pruning on `event_date` but lacks a clustering filter, so it must scan all rows in the partition. Therefore, Query 1 scans less data than Query 2.

Exam trap

Google Cloud often tests the misconception that partition pruning alone is sufficient for optimal performance, ignoring that clustering further reduces data scanned within a partition when filters on clustering columns are present.

How to eliminate wrong answers

Option B is wrong because Query 2 does not scan less data than Query 1; it scans more data within the same partition because it lacks a clustering filter. Option C is wrong because the two queries do not scan the same amount of data; Query 1 benefits from both partition pruning and clustering, reducing the scan further. Option D is wrong because both queries use partition pruning on `event_date`, so they do not perform a full table scan; they only scan the relevant partition(s).

408

MCQmedium

You are responsible for a Cloud SQL for MySQL instance that supports a content management system (CMS). The application frequently performs SELECT queries with ORDER BY and LIMIT. Recently, the response time for these queries has increased. The database has 4 vCPUs and 15 GB memory. You check the slow query log and find many queries that are taking over 1 second. The 'rows_examined' is much higher than 'rows_sent'. The EXPLAIN plan shows 'Using filesort' and 'Using temporary'. There is currently an index on the column used in the WHERE clause but not on the ORDER BY columns. The table has 5 million rows. What should you do to improve query performance?

A.Increase the buffer pool size to 80% of memory.

B.Disable the query cache to reduce overhead.

C.Remove the ORDER BY clause and sort the results in application code.

D.Add a composite index on the columns used in the WHERE clause and the ORDER BY clause.

AnswerD

A composite index on the filter and sort columns lets MySQL return rows in index order, avoiding the filesort and temporary table.

Why this answer

Option D is correct because adding a composite index on the columns used in the WHERE clause and the ORDER BY clause allows MySQL to avoid the expensive 'Using filesort' and 'Using temporary' operations. With the WHERE columns first and the ORDER BY columns after them, the database can retrieve matching rows in the required order directly from the index, eliminating the need to sort the result set after filtering. Combined with LIMIT, it can stop reading as soon as enough rows are found, which dramatically reduces 'rows_examined' and improves response time.

Exam trap

Google Cloud often tests the misconception that adding more memory or disabling features like the query cache can solve performance issues, when the real problem is a missing or poorly designed index that forces filesort and temporary tables.

How to eliminate wrong answers

Option A is wrong because increasing the buffer pool size (InnoDB buffer pool) does not address the root cause of filesort and temporary table usage; it only caches more data in memory, which may reduce disk I/O but does not eliminate the sorting overhead. Option B is wrong because disabling the query cache (deprecated in MySQL 5.7.20 and removed in MySQL 8.0) does not affect queries that perform sorting; the query cache only helps identical repeated SELECT statements and does nothing for ORDER BY performance. Option C is wrong because removing the ORDER BY clause and sorting in application code shifts the sorting burden to the application server and does not reduce the number of rows examined by the database; it also breaks the semantics of ORDER BY with LIMIT, because the database would no longer return the top rows in the intended order.
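How a composite index removes the sort step can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` as a stand-in, where the analogue of MySQL's 'Using filesort' is the plan step `USE TEMP B-TREE FOR ORDER BY`. The `articles` table, its columns, and the index names are hypothetical, and MySQL's EXPLAIN output will look different.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, category TEXT, published_at TEXT, title TEXT)"
)

sql = (
    "SELECT id, title FROM articles "
    "WHERE category = 'news' ORDER BY published_at DESC LIMIT 10"
)

# Starting point from the scenario: an index on the WHERE column only.
conn.execute("CREATE INDEX idx_category ON articles (category)")
plan_before = query_plan(conn, sql)

# Replace it with a composite index: filter column first, sort column second.
# (Dropping the old index keeps the planner's choice unambiguous in this toy;
# the composite index also covers lookups on category alone.)
conn.execute("DROP INDEX idx_category")
conn.execute(
    "CREATE INDEX idx_category_published ON articles (category, published_at)"
)
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

With only the single-column index, the plan includes a temporary B-tree to sort the matching rows; with the composite index, the rows are read in index order and the sort step disappears.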

409

Multi-Selecthard

Which TWO of the following are valid approaches when troubleshooting a slow BI query in BigQuery that includes a complex JOIN between a large fact table and multiple dimension tables?

Select 2 answers

A.Ensure the fact table is clustered on the join key

B.Split the fact table into multiple smaller tables by region

C.Filter the fact table before the JOIN to reduce the number of rows

D.Move the data to Cloud SQL for faster joins

E.Add indexes on the join columns

AnswersA, C

Clustering improves join efficiency by colocating data.

Why this answer

Option A is correct because clustering on the join key in BigQuery physically co-locates rows with the same key value within the same block, reducing the amount of data scanned during the JOIN. This is especially effective for large fact tables, as it minimizes the need to shuffle data across slots, directly improving query performance.

Exam trap

The trap here is that candidates familiar with traditional databases may assume indexes (Option E) or moving to an OLTP system (Option D) are valid optimizations, but BigQuery's serverless, columnar architecture requires different techniques like clustering and predicate pushdown.

410

Drag & Dropmedium

Arrange the steps to create and connect to a Cloud SQL for PostgreSQL instance using the gcloud command-line tool.

Why this order

After creating the instance, you set the password, configure network access, then connect via psql.

411

MCQmedium

You are running a production workload on Cloud Bigtable and notice that read latency has increased. Upon reviewing the monitoring dashboard, you see that CPU utilization is below 50% but the number of active tablets is high. What is the most likely cause of the increased read latency?

A.Read requests are being throttled due to exceeding IOPS limits.

B.There are too many tablets, causing increased metadata operations and slower reads.

C.A hot node is throttling read requests.

D.The cluster is underprovisioned, causing resource contention.

AnswerB

Excessive tablets increase the overhead of metadata lookups and tablet splitting, leading to higher latency.

Why this answer

In Cloud Bigtable, each tablet is a contiguous range of rows managed by a tablet server. When the number of active tablets is high, the tablet server must perform more metadata operations (e.g., splitting, merging, and serving multiple tablets) which increases per-request overhead and can degrade read latency. This is true even when CPU utilization is below 50%, because the overhead is not purely CPU-bound but involves increased I/O and coordination.

Exam trap

Google Cloud often tests the misconception that high tablet count is always beneficial for parallelism, when in fact it can degrade performance due to metadata overhead, especially when CPU is not the bottleneck.

How to eliminate wrong answers

Option A is wrong because Cloud Bigtable does not enforce a hard IOPS limit; it scales with the number of nodes, and throttling would typically manifest as increased error rates or retries, not simply increased latency with low CPU. Option C is wrong because a hot node would cause high CPU utilization on that node, not low overall CPU, and throttling would be localized to that node's requests. Option D is wrong because underprovisioning would lead to high CPU utilization and resource contention across the cluster, not low CPU with a high tablet count.

412

Multi-Selectmedium

A company plans to migrate an on-premises MySQL database to Cloud SQL. Which THREE steps should they include in their migration plan?

Select 3 answers

A.Test application compatibility with Cloud SQL.

B.Connect to Cloud SQL via Database Migration Service.

C.Convert all stored procedures to PostgreSQL dialect.

D.Determine whether to use private or public IP.

E.Enable point-in-time recovery before migration.

AnswersA, B, D

Ensure the application works with Cloud SQL's MySQL version and configuration to avoid surprises.

Why this answer

Option A is correct because testing application compatibility with Cloud SQL ensures that any MySQL-specific features, configurations, or behaviors used by the application are supported in the Cloud SQL environment. This step is critical to identify potential issues early, such as unsupported storage engines, character set differences, or version-specific SQL syntax, before committing to the full migration.

Exam trap

The trap here is that candidates may confuse the need to convert stored procedures when migrating between different database engines (e.g., MySQL to PostgreSQL) with a homogeneous MySQL-to-Cloud SQL migration, where no dialect conversion is required.

413

MCQmedium

A retail company uses Cloud SQL for PostgreSQL for inventory management. The schema has a table 'inventory' with columns: product_id, warehouse_id, quantity, last_updated. The table contains over 100 million rows. The application frequently runs aggregate queries to compute total quantity of a product across all warehouses (e.g., SELECT SUM(quantity) FROM inventory WHERE product_id = ?). These queries are slow, taking tens of seconds. The team tries a covering index on (product_id, quantity) but sees little improvement because they still need to scan many rows. They need to redesign the schema to improve aggregation performance. What is the best approach?

A.Add a covering index on (product_id, quantity).

B.Migrate the inventory table to Cloud Spanner and use interleaved indexes.

C.Use BigQuery as a read replica and query there.

D.Create a summary table 'product_totals' with columns product_id and total_quantity, and use triggers to keep it updated on INSERT/UPDATE/DELETE in inventory.

AnswerD

Pre-aggregation reduces the amount of work needed at query time.

Why this answer

Option D is correct. Creating a summary table that pre-aggregates totals per product, kept current by triggers on the inventory table, turns each aggregate query into a single-row lookup. Option A (covering index) helps but still requires scanning many rows, as the team already observed.

Option B (Spanner) is a migration to a different database, not a schema redesign. Option C (BigQuery) is an external system and not a schema change.
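The summary-table approach can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for Cloud SQL for PostgreSQL (an assumption made so the example runs anywhere; PostgreSQL triggers are written with trigger functions and use different syntax). The trigger names and helper functions are hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL; only the idea carries over, not the syntax.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (
    product_id   INTEGER NOT NULL,
    warehouse_id INTEGER NOT NULL,
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (product_id, warehouse_id)
);
CREATE TABLE product_totals (
    product_id     INTEGER PRIMARY KEY,
    total_quantity INTEGER NOT NULL
);

-- Keep the summary row in step with every change to inventory.
CREATE TRIGGER inventory_ai AFTER INSERT ON inventory BEGIN
    INSERT OR IGNORE INTO product_totals VALUES (NEW.product_id, 0);
    UPDATE product_totals
       SET total_quantity = total_quantity + NEW.quantity
     WHERE product_id = NEW.product_id;
END;
-- Assumes product_id itself is never updated, only quantity.
CREATE TRIGGER inventory_au AFTER UPDATE OF quantity ON inventory BEGIN
    UPDATE product_totals
       SET total_quantity = total_quantity + NEW.quantity - OLD.quantity
     WHERE product_id = NEW.product_id;
END;
CREATE TRIGGER inventory_ad AFTER DELETE ON inventory BEGIN
    UPDATE product_totals
       SET total_quantity = total_quantity - OLD.quantity
     WHERE product_id = OLD.product_id;
END;
""")

conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",
                 [(1, 10, 5), (1, 11, 7), (2, 10, 3)])
conn.execute("UPDATE inventory SET quantity = 9 "
             "WHERE product_id = 1 AND warehouse_id = 10")
conn.execute("DELETE FROM inventory WHERE product_id = 1 AND warehouse_id = 11")


def total_from_summary(product_id):
    """Single-row lookup against the pre-aggregated table."""
    row = conn.execute(
        "SELECT total_quantity FROM product_totals WHERE product_id = ?",
        (product_id,)).fetchone()
    return row[0]


def total_from_scan(product_id):
    """The original aggregate, which has to read every row for the product."""
    return conn.execute(
        "SELECT SUM(quantity) FROM inventory WHERE product_id = ?",
        (product_id,)).fetchone()[0]
```

Both helpers return the same total; the difference is that the summary lookup reads one row regardless of how many warehouses stock the product. The trade-off is extra work on every write to inventory.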

Full explanation →

414

Multi-Selectmedium

A company is migrating an on-premises PostgreSQL database to Cloud SQL. They need to minimize downtime during migration. Which TWO steps should they take?

Select 2 answers

A.Set up a read replica for switchover

B.Use pg_dump and restore

C.Enable automatic backups

D.Configure a high-availability instance

E.Use Database Migration Service with continuous replication

AnswersA, E

Correct: A read replica can be promoted to primary with minimal downtime, enabling a quick cutover.

Why this answer

Option E (Database Migration Service with continuous replication) minimizes downtime by syncing changes in real time. Option A (Set up a read replica) allows a near-instantaneous cutover by promoting the replica. Option C (automatic backups) does not reduce downtime.

Option B (pg_dump and restore) involves downtime for the duration of the dump and restore. Option D (HA instance) is for post-migration availability, not a migration step.

Full explanation →

415

MCQeasy

A company uses BigQuery for BI dashboards. Users report that queries on the sales table take longer than expected. The table contains daily transaction data and is not partitioned. Which action will most improve query performance while minimizing cost?

A.Increase the BigQuery reservation slot count

B.Partition the table by the transaction date column

C.Cluster the table by the transaction date column

D.Denormalize the table by including dimension attributes

AnswerB

Partitioning limits data scanned to relevant partitions, improving performance and reducing cost.

Why this answer

Partitioning by date reduces the data scanned per query, improving performance and cost. Clustering alone may not reduce scanned bytes as effectively. Denormalization can help but may increase storage costs.

Increasing reservation slots increases cost without optimizing the query.

Full explanation →

416

MCQhard

Refer to the exhibit. The team notices high write latency on the Events table. They are inserting 1,000 events per second. The EventId is generated by a sequence. What is the most likely issue?

A.The sequential primary key creates a hotspot on a single split.

B.The allow_commit_timestamp option on CreatedAt column adds overhead.

C.The BYTES(MAX) data type causes excessive writing.

D.The node count is insufficient for the write throughput.

AnswerA

Sequential keys cause all writes to hit the same split, leading to contention and latency.

Why this answer

Option A is correct because using a sequential integer as the primary key causes hotspotting on the last split, as all new writes go to the same split. Option D is wrong because 2000 processing units (equivalent to 2 nodes) can handle 1,000 writes per second if the writes are distributed. Option C is wrong because BYTES(MAX) may increase row size but is not the primary cause of latency.

Option B is wrong because the allow_commit_timestamp option does not cause hotspotting.

Full explanation →

417

MCQhard

A financial services company uses Cloud Spanner for a global transaction processing system. They notice that certain read queries on a table with frequent writes are returning stale data even though they use strong reads. The table has a primary key of (user_id, transaction_id) and a secondary index on (timestamp). What is the most likely cause of the stale reads?

A.The query is using a stale read timestamp.

B.The query is using a secondary index that has not yet been updated with the latest write.

C.The query is reading from a read-only replica.

D.Cloud Spanner is using eventual consistency for this query.

AnswerB

The answer key attributes the stale results to the secondary index. Note, however, that Cloud Spanner updates a secondary index in the same transaction as the base-table write, so a genuine strong read through an index should not return stale data.

Why this answer

Option B is marked as correct in the answer key, but its premise deserves caution. In Cloud Spanner, a secondary index is stored much like a separate table, yet it is updated atomically in the same transaction as the base table, not asynchronously. A strong read therefore sees every write committed before the read started, whether it goes through the primary key or a secondary index. If stale data really is observed, the more plausible explanation is that the read is not actually strong, for example because the client set an exact-staleness or bounded-staleness timestamp bound (the situation Option A describes).

Exam trap

The trap here concerns what 'strong reads' cover. Cloud Spanner's strong reads are consistent for every access path, including secondary indexes, because index entries commit together with the base rows. The answer key's rationale of an asynchronously updated index does not match this behavior, so treat this question as testing recognition of the keyed answer rather than documented Spanner semantics.

How to eliminate wrong answers

Option A is ruled out by the answer key because the question states that strong reads are used, which means the read is not issued at a deliberately stale timestamp. Option C is wrong because, although multi-region Spanner configurations do include read-only replicas, a replica serving a strong read first confirms with the leader that it is up to date, so replica choice does not produce stale results. Option D is wrong because Cloud Spanner has no eventual-consistency mode; reads are strong by default, and staleness occurs only when the client explicitly requests a stale timestamp bound.

Full explanation →

418

MCQmedium

A Cloud SQL for PostgreSQL instance is experiencing high CPU usage during peak hours. Query Insights shows that a complex reporting query is causing full table scans on a large table. The query filters on a column used in JOINs. Which optimization should be applied first?

A.Increase the instance size.

B.Create a read replica for reporting.

C.Use query rewriting with materialized views.

D.Add a composite index on the filtered column and join columns.

AnswerD

Adding an index on the filtered and join columns allows the query to use an index scan instead of a full table scan, reducing CPU usage.

Why this answer

Option D is correct because adding a composite index on the filtered column and the join columns directly addresses the root cause of the full table scans. Query Insights indicates the query filters on a column used in JOINs; a composite index covering both the filter and join columns allows PostgreSQL to perform an Index Scan instead of a sequential scan, reducing CPU usage without requiring additional infrastructure or data duplication.

Exam trap

Google Cloud often tests the misconception that scaling up or offloading is the first optimization step, when in fact index tuning is the cheapest and most effective initial action for query performance issues caused by full table scans.

How to eliminate wrong answers

Option A is wrong because increasing the instance size (vertical scaling) only masks the symptom of high CPU usage without fixing the inefficient query plan; the full table scans will continue to consume resources, and costs increase without performance guarantee. Option B is wrong because creating a read replica for reporting offloads the query to another instance but does not eliminate the full table scan on the replica; the same inefficient query will still cause high CPU on the replica. Option C is wrong because query rewriting with materialized views pre-computes and stores the result set, which can improve performance for repeated complex queries, but it does not address the immediate full table scan caused by missing indexes; materialized views also require maintenance and may become stale.

Full explanation →

419

MCQeasy

A data engineer needs to design a table to store time-series sensor data arriving every second. The data will be queried mainly for the last hour over a specific device. Which table design minimizes query costs?

A.Partition by ingestion_time, cluster by timestamp

B.Partition by ingestion_time, no clustering

C.No partitioning, cluster by device_id

D.Partition by ingestion_time, cluster by device_id

AnswerD

Partitioning enables time-range pruning; clustering on device_id speeds up per-device lookups.

Why this answer

Option D minimizes query costs because partitioning by ingestion_time allows the query engine to skip partitions outside the last hour, while clustering by device_id further narrows the scan to only the relevant device's data within those partitions. This combination reduces the amount of data read and the number of files scanned, which is critical for high-frequency time-series data.

Exam trap

Google Cloud often tests the misconception that clustering by the same column as partitioning provides extra benefit, but in reality it is redundant and can increase maintenance overhead without improving query performance.

How to eliminate wrong answers

Option A is wrong because clustering by timestamp within a partition by ingestion_time is redundant—since the partition already organizes data by time, clustering by the same column adds no additional pruning benefit and wastes clustering resources. Option B is wrong because without clustering, queries filtering on device_id must scan all rows in the relevant partitions, leading to full partition scans and higher query costs. Option C is wrong because no partitioning means every query must scan the entire table, even when filtering on the last hour, resulting in maximum data read and cost.

Full explanation →

420

MCQhard

A social media platform uses Cloud SQL for PostgreSQL for its user and post data. The schema has a normalized design with separate 'users' and 'posts' tables. Queries that fetch a user's timeline (joining users and posts) are slow due to heavy read volume. The team wants to optimize the schema for this read-heavy workload without changing the application logic significantly. What schema design change is most appropriate?

A.Migrate to a NoSQL database like Firestore for better read performance.

B.Create a materialized view that joins users and posts, refreshed periodically.

C.Add GIN indexes on the posts table for faster full-text search.

D.Denormalize by embedding commonly accessed user fields (e.g., username, avatar URL) into the posts table.

AnswerD

Denormalization reduces joins, improving read performance for read-heavy workloads.

Why this answer

Option D is correct because denormalizing by storing relevant user data (e.g., username, avatar URL) directly in the posts table reduces the need for JOINs, significantly improving read performance. Option B (materialized view) could help but may introduce staleness and refresh overhead; Option A (NoSQL) is a major architectural change; Option C (GIN indexes) targets full-text search, not join performance.

Full explanation →

421

Multi-Selecteasy

A company wants to create a BI dashboard that shows daily active users. The data is stored in a BigQuery table with columns: user_id, activity_date, and event_type. Which two optimizations would help reduce query costs? (Choose two.)

Select 2 answers

A.Cluster the table by event_type.

B.Use SELECT * and filter in the BI tool.

C.Use a materialized view with COUNT(DISTINCT user_id) grouped by activity_date.

D.Avoid using the LIMIT clause.

E.Partition the table by activity_date.

AnswersC, E

A materialized view caches the aggregation, avoiding repeated computation.

Why this answer

Option C is correct because a materialized view precomputes the COUNT(DISTINCT user_id) grouped by activity_date, so queries against it read only the pre-aggregated results rather than scanning the entire base table. This drastically reduces the amount of data processed, lowering query costs in BigQuery's on-demand pricing model where cost is proportional to bytes processed.

Exam trap

Google Cloud often tests the misconception that clustering alone reduces query cost for any aggregation, but clustering only reduces cost when the query filters or groups by the cluster key, not when the aggregation is on a different column like activity_date.

Full explanation →

422

MCQhard

A company's BI dashboard queries a BigQuery table that is 20 TB and uses clustering on date and country. The query filters on date and country and also aggregates by category. The query takes 30 seconds. They want to reduce latency to under 5 seconds. What should they do?

A.Partition the table by date.

B.Add clustering by category.

C.Increase query priority.

D.Create a materialized view that aggregates by date, country, and category.

AnswerD

Materialized view stores the aggregated result, so query scans only the view.

Why this answer

The correct answer is D because a materialized view precomputes and stores the aggregation by date, country, and category, eliminating the need to scan the full 20 TB table on every query. This reduces query latency dramatically by serving pre-aggregated results, directly addressing the filter and aggregation requirements. Partitioning or clustering alone cannot achieve sub-5-second latency on a 20 TB table because they still require scanning all matching partitions or clusters and performing the aggregation at query time.

Exam trap

The trap here is that candidates often assume partitioning or clustering alone can achieve drastic latency reductions, but they overlook that aggregation over a large dataset still requires significant computation, whereas a materialized view precomputes the result, which is the only way to guarantee sub-5-second latency for this workload.

How to eliminate wrong answers

Option A is wrong because partitioning by date only prunes the scan to the relevant date range; the query must still read every matching row and aggregate by category at query time, which is unlikely to bring latency under 5 seconds on a table of this size. Option B is wrong because clustering by category co-locates related rows but does not precompute the aggregation; the query still scans and aggregates all rows that pass the date and country filters. Option C is wrong because increasing query priority does not change the amount of data scanned or the computation required; it affects only scheduling and resource allocation.

Full explanation →

423

MCQmedium

A Firestore application stores user profiles that must be queried by any of multiple attributes (age, city, last_login). What is the best schema design to support these queries efficiently?

A.Store attributes in an array field and query with array-contains

B.Create a composite index on the attributes in a single collection

C.Use subcollections per attribute value

D.Create separate documents for each attribute value

AnswerB

Composite indexes enable efficient multi-attribute queries in Firestore.

Why this answer

Option B is correct: a composite index on the attributes allows Firestore to serve queries without collection scans. Option A (storing attributes in an array and querying with array-contains) is inefficient for filtering. Option C (subcollections) adds complexity and may require more reads.

Option D (separate documents per attribute value) is not practical.

Full explanation →

424

MCQhard

A Cloud Spanner instance is experiencing increased latency during peak hours. Monitoring shows CPU utilization nearing 70%. How should they scale?

A.Add more nodes.

B.Change to a higher-tier machine type.

C.Increase the number of splits.

D.Add more processing units.

AnswerA

Adding nodes increases CPU capacity and reduces latency due to high CPU.

Full explanation →

425

MCQeasy

Refer to the exhibit. The BI team creates a view to summarize sales. When they query the view with an additional WHERE clause on region, they notice that the underlying query still processes the same amount of data regardless of the filter. What is the most likely reason?

A.The view is a materialized view that refreshes every 30 minutes.

B.The view's WHERE clause on date is too restrictive, causing a full scan.

C.The view uses authorized views, which prevent predicate pushdown.

D.The view is a logical view, not a materialized view, so filters on the view do not reduce the scanned data.

AnswerD

Logical views execute the defining query each time; filters are applied after the view query.

Why this answer

Option D is correct because a logical view (standard view) does not materialize data; the defining query runs each time the view is referenced, and the outer filter does not reduce the data the view's query scans. Option A is wrong because the view is not a materialized view. Option C is wrong because the view is a standard view, not an authorized view.

Option B is wrong because the date filter is part of the view definition; it is the outer filter on region that fails to reduce processing.

Full explanation →

426

Multi-Selectmedium

A company uses BigQuery for BI analytics. They want to improve query performance for a table with 10 TB of data. Which two actions should they take? (Choose two.)

Select 2 answers

A.Limit the number of columns queried using SELECT * with EXCEPT.

B.Use a wildcard table to combine multiple tables.

C.Partition by a column with a high granularity.

D.Cluster on columns used in filters and aggregations.

E.Use a clustered column as the partition key.

AnswersA, D

Reducing columns scanned decreases processed bytes and cost.

Why this answer

Option A is correct because using SELECT * with EXCEPT limits the number of columns scanned, reducing I/O and improving query performance in BigQuery. BigQuery charges by the amount of data processed, so reading fewer columns directly lowers both cost and query execution time.

Exam trap

Google Cloud often tests the distinction between partitioning and clustering, where candidates mistakenly think that high-granularity partitioning or using a clustered column as a partition key improves performance, when in fact it introduces overhead and defeats the purpose of each feature.

Full explanation →

427

MCQhard

A developer reports that an application cannot connect to a Cloud SQL SQL Server instance. The error log shows the message in the exhibit. The instance exists and the user credentials are correct. What is the most likely cause?

A.The Cloud SQL instance has reached its maximum number of connections.

B.The database name specified in the connection string is incorrect.

C.The Cloud SQL proxy is not running.

D.The Cloud SQL instance is not in the same VPC network as the application.

AnswerB

This error commonly occurs when the database name is misspelled or does not exist.

Why this answer

The error message in the exhibit indicates that the login failed for the user, which is a common symptom when the database name in the connection string does not match an existing database on the Cloud SQL SQL Server instance. Even though the user credentials are correct, SQL Server requires a valid database context to establish the connection; an incorrect database name causes the server to reject the login attempt. This is a configuration issue, not an authentication or network problem.

Exam trap

Google Cloud often tests the distinction between authentication errors and database context errors, leading candidates to incorrectly blame network or proxy issues when the actual problem is a simple misconfiguration in the connection string's database name.

How to eliminate wrong answers

Option A is wrong because reaching the maximum number of connections would produce a different error, such as 'Cannot open server connection' or 'Connection limit exceeded', not a login failure for a specific database. Option C is wrong because if the Cloud SQL proxy were not running, the application would not be able to reach the Cloud SQL instance at all, resulting in a network timeout or connection refused error, not a SQL Server login error. Option D is wrong because if the instance were not in the same VPC network, the application would experience a network connectivity failure (e.g., timeout or unreachable host), not a SQL Server authentication error that includes a database name reference.

Full explanation →

428

MCQhard

Your company runs a large e-commerce application on Google Cloud using Cloud SQL for MySQL (version 8.0) with 2 TB of data. The database experiences intermittent performance degradation during peak hours (10am-2pm). Cloud Monitoring shows a spike in CPU utilization to 90% and increased query latency. The database has been running for 6 months with default settings. You notice many slow queries like "SELECT * FROM orders WHERE customer_id=12345 ORDER BY order_date DESC LIMIT 10" that take 5-10 seconds. The orders table has 50 million rows, customer_id has a B-tree index, and order_date is not indexed. The query execution plan indicates a full table scan and a filesort. What is the most effective course of action to resolve the performance issue?

A.Add a composite index on (customer_id, order_date)

B.Create multiple read replicas to offload read traffic

C.Partition the orders table by month using range partitioning

D.Increase the memory size of the Cloud SQL instance to 30 GB

AnswerA

A composite index on both columns enables the query to use index for filtering and sorting, eliminating the full table scan and filesort.

Why this answer

The slow query uses a WHERE clause on customer_id (which is indexed) and an ORDER BY on order_date (not indexed). The index on customer_id alone is insufficient because the query still requires sorting, leading to a filesort. Adding a composite index on (customer_id, order_date) allows the database to retrieve rows for a specific customer in sorted order without a full scan or filesort.

Option D (increasing memory) may help but does not address the root cause. Option B (read replicas) offloads read traffic but does not fix the query plan. Option C (partitioning) might help with data management but is not as direct or efficient as adding the appropriate index.
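The effect of the composite index on the sort step can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module rather than Cloud SQL for MySQL (an assumption made for runnability; MySQL reports the extra sort as "Using filesort" in EXPLAIN, while SQLite reports a temporary B-tree). The index names and helper functions are hypothetical.

```python
import sqlite3

QUERY = ("SELECT * FROM orders WHERE customer_id = ? "
         "ORDER BY order_date DESC LIMIT 10")


def plan_details(index_ddl):
    """Return the EXPLAIN QUERY PLAN text lines for QUERY under one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
                 "customer_id INTEGER, order_date TEXT, total REAL)")
    conn.execute(index_ddl)
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (12345,)).fetchall()
    conn.close()
    return [row[-1] for row in rows]  # the last column holds the plan text


def needs_sort(details):
    """True when the plan contains a separate sort step for ORDER BY."""
    return any("TEMP B-TREE FOR ORDER BY" in d for d in details)


single = plan_details(
    "CREATE INDEX idx_customer ON orders (customer_id)")
composite = plan_details(
    "CREATE INDEX idx_customer_date ON orders (customer_id, order_date)")

print("single-column index sorts:", needs_sort(single))
print("composite index sorts:", needs_sort(composite))
```

With only the single-column index, the engine finds the customer's rows and then sorts them; with the composite index, the rows come out of the index already ordered by order_date, so the sort step disappears and LIMIT 10 can stop early.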

Full explanation →

429

MCQhard

An online advertising platform uses Cloud Spanner for ad impression tracking. The table 'ad_impressions' has a primary key (ad_id, timestamp). The table receives millions of writes per minute. A secondary index on (campaign_id, timestamp) was created to support queries that sum impressions per campaign. During high traffic, the team notices increased write latency and hotspotting on the index (the campaign_id has low cardinality, causing all writes to a campaign to hit the same index split). They need to redesign the schema to avoid hotspotting on the index while still supporting the campaign aggregation queries. What is the best solution?

A.Modify the secondary index to include a hash prefix (e.g., use 'hash(campaign_id)' as the first column of the index).

B.Migrate the ad_impressions table to Cloud Bigtable with row key 'campaign_id#timestamp'.

C.Change the primary key of the base table to include campaign_id as the first column.

D.Create a separate table that stores per-campaign aggregations, updated in real time.

AnswerA

A hash prefix distributes index writes evenly across splits, preventing hotspotting.

Why this answer

Option A is correct. Adding a hash prefix to the index key (e.g., using a hash of campaign_id as the leading column) distributes index writes across multiple splits, eliminating the hotspot. Option C (changing the primary key) would affect the base table distribution but not necessarily the index.

Option D (separate table) adds complexity and may still have indexing issues. Option B (Bigtable) is a different database.
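A hash-prefixed index key can be sketched in application code as below. This is an illustration under stated assumptions, not Spanner's API: NUM_SHARDS, shard_id, index_key, and shards_to_query are hypothetical names. One assumption is worth flagging: the shard value here is derived from ad_id rather than campaign_id, since a hash of campaign_id alone yields the same prefix for every row of a given campaign.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count; tune to the write volume


def shard_id(ad_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a high-cardinality value to a stable shard number."""
    digest = hashlib.sha256(ad_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def index_key(campaign_id: str, ad_id: str, timestamp: int) -> tuple:
    """Index key with the shard as the leading column."""
    return (shard_id(ad_id), campaign_id, timestamp)


def shards_to_query(num_shards: int = NUM_SHARDS) -> list:
    """A per-campaign aggregation must read every shard and sum the results."""
    return list(range(num_shards))


# Writes for a single campaign now land under many different key prefixes.
used_shards = {index_key("campaign-1", f"ad-{i}", 0)[0] for i in range(1000)}
```

The cost of this design is on the read side: a campaign total becomes one range read per shard, combined by the caller or by a query that filters on all shard values.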

Full explanation →

430

MCQmedium

Refer to the exhibit. You are reviewing the following Cloud Spanner DDL statement for a table storing customer orders. What potential performance issue will arise with this schema?

A.The primary key includes two columns which reduces insert performance

B.The TotalAmount column should be INTEGER for performance

C.The table lacks a foreign key constraint

D.The OrderId is likely to be sequentially generated, causing write hotspots

AnswerD

Sequential keys lead to hotspotting; consider using a hash prefix or UUID.

Why this answer

Option D is correct: the primary key starts with OrderId, which is likely to be auto-generated and monotonically increasing. In Cloud Spanner, inserting rows with sequential primary keys causes write hotspots because all writes go to a single split, leading to performance degradation. Option A is incorrect because composite primary keys are fine.

Option C (the missing foreign key constraint) is irrelevant to performance. Option B (the data type of TotalAmount) is not a performance issue.

Full explanation →

431

MCQmedium

A retail company uses BigQuery to store sales transactions. The BI team needs to create a monthly customer lifetime value (CLV) report that aggregates purchase history across multiple tables. Which BigQuery feature should they use to define the data structure for this report?

A.Create a materialized view with the aggregation query

B.Create a view that joins and aggregates the tables

C.Create an external table pointing to the raw data files

D.Create a new table to store the aggregated data using INSERT SELECT

AnswerB

A view provides a logical virtual table that hides complexity and ensures the BI team always sees the latest data.

Why this answer

Option B is correct because a view in BigQuery allows the BI team to define a logical data structure that joins and aggregates multiple tables without storing the results. This ensures the monthly CLV report always reflects the latest data, as views are re-evaluated at query time, which is ideal for recurring reports that need up-to-date aggregations.

Exam trap

Google Cloud often tests the distinction between views and materialized views, trapping candidates who assume materialized views are always better for performance without considering the need for real-time data freshness in recurring reports.

How to eliminate wrong answers

Option A is wrong under the answer key's reasoning because a materialized view stores precomputed results that are refreshed periodically, and BigQuery materialized views also restrict the SQL they can contain, which makes them a poor fit for a CLV query spanning multiple tables. Option C is wrong because an external table only exposes raw data files (e.g., in Cloud Storage) for querying in place; it does not define the joined, aggregated structure the report needs. Option D is wrong because creating a new table with INSERT SELECT stores a static snapshot of the data, which would require manual re-execution to update the CLV report, defeating the purpose of a dynamic, recurring report.

Full explanation →

432

MCQmedium

A data analyst runs a query joining several large tables and gets 'Resources exceeded' error. They need to reduce memory usage without changing the query logic. What should they do?

A.Use a subquery to pre-aggregate the largest table before joining

B.Use APPROX_COUNT_DISTINCT for counting distinct values

C.Increase the slot reservation

D.Use SELECT * in the subquery to ensure all columns are available

AnswerA

Pre-aggregation reduces the row count and columns, decreasing shuffle and memory.

Why this answer

Option A is correct because pre-aggregating the largest table in a subquery reduces the amount of data that needs to be shuffled and joined in memory. In BigQuery, this minimizes the bytes processed and the memory footprint of the join operation, directly addressing the 'Resources exceeded' error without altering the overall query logic.
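The rewrite can be illustrated on a toy dataset. The sketch below uses Python's built-in sqlite3 module instead of BigQuery (an assumption for runnability; it shows only that the two query shapes return the same result, not BigQuery's memory behavior). The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5),
                          (4, 3, 7), (5, 3, 8);
""")

# Joins every order row to its customer before aggregating.
DIRECT = """
SELECT c.region, SUM(o.amount)
FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.region ORDER BY c.region
"""

# Collapses orders to one row per customer first, so the join sees fewer rows.
PRE_AGGREGATED = """
SELECT c.region, SUM(t.amount)
FROM (SELECT customer_id, SUM(amount) AS amount
      FROM orders GROUP BY customer_id) AS t
JOIN customers AS c ON c.customer_id = t.customer_id
GROUP BY c.region ORDER BY c.region
"""

direct = conn.execute(DIRECT).fetchall()
pre_aggregated = conn.execute(PRE_AGGREGATED).fetchall()
```

Both queries produce the same totals per region. In the second form the join input shrinks from one row per order to one row per customer, which is the reduction in shuffled data that the explanation describes.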

Exam trap

The trap here is that candidates often confuse increasing resources (slots) with reducing memory usage, or they think that approximate functions like APPROX_COUNT_DISTINCT can fix join memory errors, when in fact they only affect aggregation accuracy.

How to eliminate wrong answers

Option B is wrong because APPROX_COUNT_DISTINCT reduces the accuracy of distinct counts but does not reduce the memory usage of a join operation; it only optimizes a specific aggregation function. Option C is wrong because increasing the slot reservation increases the available compute resources (slots) but does not reduce the memory usage per query; it may delay the error but does not fix the underlying memory bottleneck. Option D is wrong because using SELECT * in a subquery retrieves all columns, which increases the data volume and memory consumption, making the 'Resources exceeded' error worse.

Full explanation →

433

MCQhard

A company has a Spanner instance with 5 nodes serving a global application. They receive alerts that write latency has increased significantly during business hours in the Asia-Pacific region. The team confirms that no application changes have been made. What is the most likely cause and recommended action?

A.Writes are hitting a hot spot due to monotonically increasing keys; consider using a hash prefix or bit-reversed key

B.CPU utilization is above 70%; enable Spanner fine-grained access control

C.Set up interleaved indexes to speed up writes

D.The instance is under-provisioned; increase the number of nodes

AnswerA

Using a hash prefix or bit-reversed key distributes writes across splits, reducing hot spots.

Why this answer

Monotonically increasing keys (e.g., timestamps or auto-increment IDs) cause all new writes to target the same split in Spanner, creating a hot spot. This increases write latency because the server hosting that split becomes a bottleneck, especially during peak business hours in the Asia-Pacific region. Using a hash prefix or bit-reversed key distributes writes evenly across splits, resolving the contention.
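Bit reversal can be sketched in a few lines. This is an illustrative helper, not Spanner's built-in implementation: the name bit_reverse_64 is hypothetical, and it treats the ID as an unsigned 64-bit value for simplicity, whereas Spanner's INT64 is signed.

```python
def bit_reverse_64(n: int) -> int:
    """Reverse the bit order of an unsigned 64-bit integer."""
    if not 0 <= n < 1 << 64:
        raise ValueError("expected an unsigned 64-bit integer")
    # Render as 64 binary digits, reverse the string, parse it back.
    return int(format(n, "064b")[::-1], 2)


# Consecutive sequence values end up far apart in key space.
sequential_ids = [1, 2, 3, 4]
reversed_keys = [bit_reverse_64(i) for i in sequential_ids]
```

Because the fastest-changing low-order bits become the high-order bits, neighbouring sequence values no longer sort next to each other, so inserts spread across the key range. Applying the function twice returns the original value, which means the sequence number stays recoverable from the key.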

Exam trap

Google Cloud often tests the misconception that adding nodes (scaling out) always fixes write latency, but the real issue is often a hot spot from poor key design, which requires schema-level changes rather than infrastructure scaling.

How to eliminate wrong answers

Option B is wrong because CPU utilization above 70% is a symptom, not a root cause, and enabling fine-grained access control does not reduce write latency. Option C is wrong because interleaved indexes optimize read performance by colocating parent and child rows, but they do not speed up writes; in fact, they can add overhead to write operations. Option D is wrong because the instance has 5 nodes and no application changes were made, so under-provisioning is unlikely; the issue is a hot spot from key design, not insufficient capacity.

Full explanation →

434

Drag & Dropmedium

Arrange the steps to configure high availability for a Cloud SQL for MySQL instance.

Why this order

Create instance first, then enable HA and set standby zone, then verify and test.

Full explanation →

435

MCQhard

A BI manager needs to restrict access to sensitive sales data so that salespeople can only see their own region's data. Which BigQuery feature should be used to implement row-level security without duplicating tables?

A.Use column-level security to hide sensitive columns

B.Use BigQuery row-level access policies

C.Create an authorized view that uses SESSION_USER() in a WHERE clause to filter rows

D.Create separate IAM roles for each region

AnswerC

Authorized views can leverage the current user identity to dynamically filter rows, enabling row-level security.

Why this answer

Option C is correct because an authorized view with SESSION_USER() in a WHERE clause dynamically filters rows based on the caller's identity, providing row-level security without duplicating tables. This approach leverages BigQuery's ability to share a single view with different users, each seeing only their authorized subset of data, which aligns with the requirement to restrict salespeople to their own region's data.

Exam trap

The trap here is that candidates confuse 'row-level access policies' (a conceptual term) with a native BigQuery feature, leading them to select Option B, when in fact BigQuery implements row-level security through authorized views with SESSION_USER() or similar dynamic filtering, not a dedicated policy object.

How to eliminate wrong answers

Option A is wrong because column-level security hides entire columns (e.g., salary), not rows, so it cannot restrict which rows a salesperson sees based on region. Option B is not the keyed answer, although BigQuery does have native row-level access policies, which attach a filter expression to the table itself; the key favors the authorized-view pattern in Option C. Option D is wrong because IAM roles control access at the dataset or table level, not at the row level, and creating separate roles per region would require duplicating tables or complex, unscalable management.

436

MCQeasy

Which Cloud Monitoring metric indicates the number of queries waiting for locks in Cloud SQL?

A.Lock waits

B.Active connections

C.CPU utilization

D.Queries

AnswerA

This metric measures the number of queries waiting for locks.

Why this answer

The 'Lock waits' metric in Cloud SQL (for MySQL, PostgreSQL, or SQL Server) directly tracks the number of queries that are blocked because they are waiting for a lock held by another transaction. This is the correct indicator of query contention, as it measures the count of statements currently in a lock-wait state, not the total queries or connections.

Exam trap

The trap here is that candidates confuse 'Queries' (total throughput) with 'Lock waits' (blocked queries), assuming that a high query count implies lock contention, when in fact lock waits are a specific subset of queries that are actively waiting for locks.

How to eliminate wrong answers

Option B is wrong because 'Active connections' shows the total number of open connections to the database, not queries waiting for locks; a high active connection count does not necessarily indicate lock contention. Option C is wrong because 'CPU utilization' measures processor usage, which may be high due to many reasons (e.g., heavy queries, indexing issues) but does not specifically indicate queries waiting for locks. Option D is wrong because 'Queries' typically refers to the total number of queries executed per second, not the subset of queries that are blocked waiting for locks.

437

MCQeasy

A developer runs the command shown in the exhibit and wants to verify that replication is enabled on the Bigtable instance. Where should they look for this information in the output?

A.Examine the 'instanceType' field for 'MULTI_CLUSTER'.

B.Look for a 'replication' field in the JSON.

C.View the 'clusters' list within the instance description.

D.Check the 'state' field for 'REPLICATED'.

AnswerC

Clusters indicate replication if multiple clusters exist.

Why this answer

Option C is correct because the `gcloud bigtable instances describe` command returns a JSON representation of the instance, which includes a `clusters` list. Each cluster object in that list describes one cluster (for example, its `defaultStorageType` and node count), and the presence of more than one cluster in the list indicates that replication is configured. Replication in Cloud Bigtable is enabled by adding more than one cluster to the instance, so examining the `clusters` list directly shows whether replication is active.

Exam trap

The trap here is that candidates confuse the `instanceType` field (which is `PRODUCTION` or `DEVELOPMENT`) with replication status, or expect a dedicated `replication` boolean field, when in fact replication is indicated by the presence of multiple clusters in the `clusters` list.

How to eliminate wrong answers

Option A is wrong because `instanceType` in Cloud Bigtable is either `PRODUCTION` or `DEVELOPMENT`, not `MULTI_CLUSTER`; the term 'MULTI_CLUSTER' is used for routing options, not instance type. Option B is wrong because there is no top-level `replication` field in the JSON output of `gcloud bigtable instances describe`; replication status is derived from the number of clusters in the `clusters` list. Option D is wrong because the `state` field in the instance description indicates the lifecycle state (e.g., `READY`, `CREATING`), not replication status; there is no `REPLICATED` state value.

438

MCQmedium

A financial services company runs a Cloud SQL for PostgreSQL instance for transactional data. They need to conduct regular security audits and compliance checks. The database engineer must ensure that all connections to the database are encrypted and that access is restricted to authorized VMs only. The database is currently accessible from the internet via an authorized network with a public IP. What should the database engineer do to meet these requirements?

A.Create a Cloud SQL proxy instance in the same VPC and force all clients to connect through the proxy.

B.Configure SSL/TLS for all connections and use an authorized network with a specific CIDR range.

C.Enable Cloud SQL private IP and disable public IP. Use VPC Service Controls and Cloud Identity-Aware Proxy for access.

D.Enable Cloud SQL public IP with SSL/TLS and restrict access using Cloud Armor.

AnswerC

Private IP eliminates internet exposure; VPC Service Controls and IAP enforce access control.

Why this answer

Option C is correct because it addresses both requirements: encryption and access restriction. Enabling Cloud SQL private IP ensures that the database is only reachable from within the VPC, eliminating internet exposure. VPC Service Controls provide a security perimeter to prevent data exfiltration, and Cloud Identity-Aware Proxy (IAP) enables fine-grained, identity-based access to the database without requiring a public IP or VPN.

Exam trap

The trap here is that candidates often confuse Cloud SQL proxy with a network-level access control solution, or they assume that SSL/TLS and authorized networks are sufficient for VM-only access, overlooking the fact that authorized networks still expose a public IP and do not enforce VM identity.

How to eliminate wrong answers

Option A is wrong because Cloud SQL proxy is a client-side tool for encrypting connections and simplifying authentication, but it does not restrict access to authorized VMs only; it still requires a public IP or a private IP with appropriate network configuration, and it does not enforce VM-level authorization. Option B is wrong because while SSL/TLS encrypts connections, using an authorized network with a public IP still exposes the database to the internet, violating the requirement to restrict access to authorized VMs only; authorized networks are IP-based and do not enforce VM-level identity. Option D is wrong because Cloud Armor is a web application firewall for HTTP(S) traffic, not for database connections; it cannot restrict access to Cloud SQL PostgreSQL instances, and using a public IP with SSL/TLS still leaves the database internet-facing.

439

MCQmedium

A global e-commerce company is designing a Cloud Spanner schema for order processing. They need strong consistency across regions and high write throughput. Orders are identified by a globally unique order ID (UUID). Currently, they use the UUID as the primary key, but they observe write hotspots during peak hours. What primary key design change should they make to distribute writes more evenly?

A.Use the timestamp of order creation as the primary key.

B.Use a sequential integer primary key with auto-increment.

C.Use a composite primary key starting with a hash of the order ID, followed by the order ID.

D.Keep UUID as primary key but add a secondary index on a hash of the UUID.

AnswerC

A hash prefix ensures writes are distributed across all splits, avoiding hotspots.

Why this answer

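Option C is correct because a composite primary key that starts with a hash of the order ID spreads inserts across many splits instead of concentrating them on one, which removes the write hotspot; keeping the order ID as the second key part preserves uniqueness and lets the application recompute the hash prefix for point lookups. Option A is wrong because a creation timestamp increases monotonically, so every new row lands on the last split. Option B is wrong for the same reason: sequential integers send all inserts to the end of the key space. Option D is wrong because a secondary index does not change how rows in the base table are keyed, so the write distribution of the table stays the same.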
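
The hash-prefix idea can be sketched in a few lines of Python. This is a minimal illustration, not Spanner code: the shard count of 16, the use of SHA-256, and the helper names `shard_for` and `primary_key` are all assumptions made for the example.

```python
import hashlib
import uuid
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count, chosen only for illustration


def shard_for(order_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Derive a stable shard prefix from a hash of the order ID."""
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def primary_key(order_id: str) -> tuple:
    """Composite key: (hash prefix, order ID)."""
    return (shard_for(order_id), order_id)


# Deliberately sequential IDs, the worst case for a plain ordered key.
order_ids = [str(uuid.UUID(int=i)) for i in range(10_000)]
keys = [primary_key(o) for o in order_ids]

# Count how many keys land under each hash prefix.
counts = Counter(shard for shard, _ in keys)
```

Because the prefix is computed from the order ID itself, a reader can rebuild the full key for a point lookup without storing the prefix anywhere else.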

440

Multi-Selectmedium

Which TWO statements are true about designing a star schema for BI reporting?

Select 2 answers

A.Fact tables store descriptive attributes like product names

B.Dimension tables are denormalized to reduce the number of joins

C.Fact tables use natural keys to enforce referential integrity

D.Fact tables contain quantitative measures

E.Dimension tables are normalized to minimize redundancy

AnswersB, D

Denormalized dimensions allow joining directly to the fact table without additional joins.

Why this answer

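Option B is correct because dimension tables in a star schema are intentionally denormalized to reduce the number of joins required for BI queries. This denormalization improves query performance by allowing fact tables to join directly to dimension tables without traversing multiple normalized tables, which is a key design principle for OLAP reporting. Option D is correct because fact tables hold the quantitative measures (such as sales amount or quantity) that BI queries aggregate, along with foreign keys to the dimensions; descriptive attributes like product names belong in dimension tables.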

Exam trap

Google Cloud often tests the misconception that dimension tables should be normalized for data integrity, but in star schemas for BI, denormalization is intentional to optimize query performance over normalization.

441

Drag & Dropmedium

Arrange the steps to perform a point-in-time recovery (PITR) for a Cloud SQL instance.

Why this order

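PITR requires automated backups and transaction logging to be enabled first (binary logging for MySQL, write-ahead log archiving for PostgreSQL); you then create a new instance restored to the specific point in time.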

442

MCQmedium

A company uses BigQuery for real-time BI. They have a table with streaming inserts. Analysts run queries that need to see data within seconds. However, they notice that streaming data appears with a delay of up to 2 minutes. What is the most likely reason?

A.The query uses cached results.

B.The table is partitioned by hour.

C.The streaming buffer's flush interval is set to 2 minutes.

D.The table has a clustering key.

AnswerC

The question attributes the delay to the streaming buffer's flush interval.

Why this answer

Option C is marked as correct in the answer key because it is the only option tied to how streamed rows move from the streaming buffer into the table: under the question's premise, a flush interval of 2 minutes keeps rows in the buffer for that long before they become visible, which matches the symptom described. Note, however, that BigQuery makes streamed rows available for query within a few seconds of insertion, and a user-configurable flush interval is not a documented setting, so treat the 90-second and 2-minute figures as this question's premise rather than as product behavior.

Exam trap

Google Cloud often tests the misconception that partitioning or clustering directly affects data freshness, when in fact they only impact storage organization and query performance, not the latency of streaming data visibility.

How to eliminate wrong answers

Option A is wrong because cached results only affect query performance, not the freshness of streaming data; cached results are served from a temporary cache and do not delay the visibility of newly streamed data. Option B is wrong because partitioning by hour does not inherently introduce a delay; it organizes data into partitions but does not control when streaming data becomes available for queries. Option D is wrong because a clustering key improves query performance by sorting data within partitions, but it has no impact on the latency of streaming data appearing in query results.

443

MCQeasy

A company is designing a schema for time-series sensor data in Cloud Spanner. They need to efficiently query the latest reading for each sensor. Which schema design is most appropriate?

A.Use a single table with columns for each sensor and wide rows

B.Use Cloud SQL with a normalized schema

C.Create a Sensors table and an interleaved Readings table with primary key (SensorId, Timestamp DESC)

D.Use Cloud Bigtable with row keys (SensorId#Timestamp)

AnswerC

Correct: Interleaved hierarchy with descending timestamp allows efficient latest row retrieval per sensor.

Why this answer

Option C is correct because interleaving the Readings table under Sensors colocates each sensor's readings with its parent row, and the descending Timestamp in the primary key places the newest reading first, so the latest reading per sensor is a short key-ordered read. Option A (a single wide table with a column per sensor) leads to large rows and poor performance. Option B (Cloud SQL) is a different product and is not optimized for time-series at scale.

Option D (Bigtable) is better for time-series but the question specifies Spanner.
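
To see why the descending timestamp matters, the key order can be emulated in plain Python. This is only a sketch of the ordering effect, not Spanner code; the sample readings and the helper `latest_per_sensor` are made up for the example.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Hypothetical readings: (sensor_id, timestamp, value).
readings = [
    ("s1", datetime(2024, 1, 1, 10, 0, tzinfo=UTC), 20.5),
    ("s2", datetime(2024, 1, 1, 10, 5, tzinfo=UTC), 18.0),
    ("s1", datetime(2024, 1, 1, 10, 10, tzinfo=UTC), 21.0),
    ("s2", datetime(2024, 1, 1, 9, 55, tzinfo=UTC), 17.5),
]

# Emulate the key order (SensorId ASC, Timestamp DESC).
ordered = sorted(readings, key=lambda r: (r[0], -r[1].timestamp()))


def latest_per_sensor(rows):
    """Return the first row seen per sensor, which is the newest in key order."""
    latest = {}
    for sensor_id, ts, value in rows:
        latest.setdefault(sensor_id, (ts, value))
    return latest


latest = latest_per_sensor(ordered)
```

With the newest row stored first under each sensor, finding the latest reading means reading one row per key prefix rather than scanning and sorting a sensor's full history.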

444

MCQmedium

A company is using BigQuery for BI and needs to reduce costs for a large historical dataset that is infrequently queried. Which approach should they take?

A.Use materialized views for common aggregations.

B.Use clustered tables.

C.Partition by ingestion time and set expiration on partitions older than 90 days.

D.Use a view with a WHERE clause filtering recent data.

AnswerC

Expired partitions are deleted, reducing storage costs.

Why this answer

Option C is correct because partitioning by ingestion time allows BigQuery to automatically manage data lifecycle by setting partition expiration. This reduces storage costs for historical data that is infrequently queried, as partitions older than 90 days are deleted without manual intervention. This approach directly addresses the need to reduce costs for a large historical dataset while maintaining query performance on recent data.

Exam trap

Google Cloud often tests the distinction between cost reduction and performance optimization, leading candidates to choose clustering or materialized views (which improve query speed) instead of the storage lifecycle management solution that directly reduces costs.

How to eliminate wrong answers

Option A is wrong because materialized views improve query performance for common aggregations but do not reduce storage costs for historical data; they actually incur additional storage costs for the precomputed results. Option B is wrong because clustered tables optimize query performance by sorting data within partitions but do not reduce storage costs or automatically expire old data. Option D is wrong because a view with a WHERE clause filtering recent data only limits the data scanned at query time, but the underlying historical data remains in storage and continues to incur costs.

445

MCQhard

A Bigtable cluster has 10 nodes and is experiencing 90% CPU utilization, causing increased latency. The workload is mostly random reads (70%) and writes (30%). The table has 50TB of data, and the row key design is efficient. What is the best way to reduce CPU utilization?

A.Increase the number of nodes to 20.

B.Add SSDs instead of HDDs.

C.Enable replication for read offloading.

D.Compact the table to reduce SSTable count.

AnswerA

Adding nodes increases total throughput and reduces per-node CPU, alleviating the bottleneck.

Why this answer

Increasing the number of nodes distributes the load and reduces CPU per node, directly addressing the high utilization. Adding SSDs or compaction may help marginally but not as effectively as adding nodes.
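
A back-of-the-envelope estimate shows the effect of doubling the node count. The sketch below assumes the load is spread evenly and that CPU scales linearly with the number of nodes, which is a simplification; the function name `estimated_cpu` is hypothetical.

```python
def estimated_cpu(current_cpu: float, current_nodes: int, new_nodes: int) -> float:
    """Estimate per-node CPU after resizing, assuming evenly spread load."""
    if new_nodes <= 0:
        raise ValueError("new_nodes must be positive")
    # Total work stays the same; it is divided across more nodes.
    return current_cpu * current_nodes / new_nodes


# 10 nodes at 90% CPU, resized to 20 nodes: roughly 45% per node.
after_resize = estimated_cpu(0.90, 10, 20)
```

Real clusters will not scale this cleanly, but the estimate explains why adding nodes is the direct lever for CPU-bound latency.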

446

MCQeasy

Based on the exhibit, what is the primary key of the Readings table?

A.(SensorId, Timestamp)

B.(SensorId, SensorType)

C.(SensorId)

D.(ReadingsId)

AnswerA

The DDL explicitly defines this as the primary key.

Why this answer

Option A is correct: the DDL shows PRIMARY KEY (SensorId, Timestamp DESC). Option B is incorrect because SensorType is not a column. Option C is incorrect as the key includes both columns.

Option D is incorrect because there is no ReadingsId column.

447

Multi-Selecthard

A company runs a Bigtable instance for time-series data. They need to reduce storage costs without compromising query performance for the most recent 30 days. Which two strategies should they implement?

Select 2 answers

A.Increase the number of cluster nodes to improve compaction

B.Use Cloud Storage as a cold storage tier for historical data

C.Enable Bigtable replication and delete data from one cluster

D.Set garbage collection to delete data older than 30 days

E.Reduce the number of cluster nodes to save costs

AnswersB, D

Export old data to Cloud Storage and delete from Bigtable.

Why this answer

Option B is correct because moving historical data (older than 30 days) to Cloud Storage as a cold storage tier reduces Bigtable storage costs while keeping the most recent 30 days in Bigtable for fast queries. Option D is correct because setting garbage collection (GC) to delete data older than 30 days automatically removes stale data, reducing storage footprint without impacting query performance for recent data. Both strategies directly address cost reduction while preserving performance for the required time window.

Exam trap

Google Cloud often tests the misconception that reducing cluster nodes or increasing nodes is a direct cost-saving strategy, but candidates must remember that performance requirements (especially for recent data) dictate node count, and cost savings must come from data lifecycle management, not infrastructure scaling.

448

Multi-Selecthard

A company wants to reduce BigQuery query costs for their BI workloads. Which THREE actions effectively lower the amount of data processed per query? (Choose THREE.)

Select 3 answers

A.Use partitioned tables on date column

B.Use LIMIT in subqueries to reduce output

C.Use clustered tables on frequently filtered columns

D.Use SELECT * to avoid missing columns

E.Use materialized views that match common query patterns

AnswersA, C, E

Partitioning limits query scans to relevant partitions, cutting bytes.

Why this answer

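Option A is correct because partitioned tables let a query filter on the partition column (e.g., a date column) in the WHERE clause, so BigQuery can prune entire partitions from the scan. This directly reduces the amount of data read and billed. Option C is correct because clustering sorts data by the clustered columns, letting BigQuery skip blocks that cannot match a filter on those columns. Option E is correct because a materialized view that matches a common query pattern lets BigQuery read the smaller precomputed result instead of the base table.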
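
The effect of partition pruning on bytes scanned can be illustrated with simple arithmetic. The sketch below models a table as a mapping from partition date to size; the 10 GiB per-partition figure and the function `bytes_scanned` are assumptions for the example, not BigQuery behavior measured from a real table.

```python
from typing import Dict, Iterable, Optional


def bytes_scanned(partition_bytes: Dict[str, int],
                  dates: Optional[Iterable[str]] = None) -> int:
    """Bytes read when a query touches only the listed partitions.

    With no date filter, every partition is scanned.
    """
    if dates is None:
        return sum(partition_bytes.values())
    return sum(partition_bytes.get(d, 0) for d in dates)


GIB = 1024 ** 3

# Hypothetical table: 30 daily partitions of 10 GiB each.
partition_bytes = {f"2024-01-{day:02d}": 10 * GIB for day in range(1, 31)}

full = bytes_scanned(partition_bytes)
pruned = bytes_scanned(partition_bytes, ["2024-01-29", "2024-01-30"])
```

In this model a query filtered to two days reads 20 GiB instead of 300 GiB, which is the kind of reduction on-demand pricing rewards.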

Exam trap

Google Cloud often tests the misconception that row-limiting clauses like LIMIT reduce data processing costs, but in BigQuery LIMIT generally does not reduce the bytes billed; pruning partitions, skipping clustered blocks, and selecting fewer columns do.

449

MCQeasy

Your organization requires that all database backups be stored in a different region for disaster recovery. You are using Cloud SQL for MySQL. What backup configuration should you use?

A.Enable automated backups and select the same region as the instance.

B.Enable automated backups and select a different region for the backup location.

C.Use on-demand exports to Cloud Storage in the same region.

D.Configure a multi-region Cloud Storage bucket and point automated backups there.

AnswerB

This meets the cross-region DR requirement.

Why this answer

Cloud SQL for MySQL allows you to specify a different region for automated backup storage, which satisfies the disaster recovery requirement of storing backups in a separate region. By selecting a different region for the backup location, you ensure that if the primary region fails, the backups remain accessible for recovery. This is the only built-in option that directly meets the cross-region backup requirement without additional manual steps.

Exam trap

Google Cloud often tests the misconception that automated backups can be directed to a user-managed Cloud Storage bucket. Cloud SQL stores automated backups in its own managed backup storage; you choose only the backup location, not a bucket.

How to eliminate wrong answers

Option A is wrong because selecting the same region as the instance does not provide disaster recovery isolation; a regional failure would affect both the instance and its backups. Option C is wrong because on-demand exports to Cloud Storage in the same region also lack cross-region redundancy; the backups remain vulnerable to the same regional outage. Option D is wrong because Cloud SQL automated backups cannot be pointed to a user-managed Cloud Storage bucket; they are stored in Cloud SQL's managed backup storage, and their location is chosen through the instance's backup location setting.

450

Multi-Selectmedium

A company is migrating their on-premises Oracle database to Cloud SQL for PostgreSQL. They want to minimize downtime during the cutover. Which two strategies should the database engineer recommend? (Choose 2.)

Select 2 answers

A.Use a Cloud VPN tunnel for data transfer.

B.Use Database Migration Service with continuous replication.

C.Use a third-party tool like pglogical for replication.

D.Use Cloud SQL for PostgreSQL with read replicas and promote.

E.Perform an export using pg_dump and import using psql.

AnswersB, C

DMS allows continuous replication with minimal downtime.

Why this answer

Database Migration Service (DMS) with continuous replication is correct because it supports minimal-downtime migrations from Oracle to Cloud SQL for PostgreSQL by continuously replicating changes from the source to the target until cutover. This allows the source database to remain operational during most of the migration, with only a brief pause to finalize the switch.

Exam trap

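Google Cloud often tests the misconception that any replication tool can be used across heterogeneous databases. pglogical replicates only between PostgreSQL instances, so on its own it cannot replicate from Oracle to PostgreSQL; the answer key nevertheless lists Option C, which holds only if the option is read as third-party replication tooling in general.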
