Knowledge + Practice

Google Professional Cloud Database Engineer (PCDE) — Questions 1–75

503 questions total · 7 pages · All types, answers revealed

1

Multi-Select · hard

Refer to the exhibit. A company has a Cloud Spanner instance with the backup configuration shown. They need to improve disaster recovery. Which THREE strategies should they implement?

Select 3 answers

A. Schedule regular exports to Cloud Storage

B. Enable point-in-time recovery (PITR)

C. Use multi-region instance configuration

D. Configure cross-region backups

E. Use read replicas in another region

Answers: B, C, D

PITR allows restoring to any point in time within the retention period, improving recovery granularity.

Why this answer

Option B is correct because enabling point-in-time recovery (PITR) in Cloud Spanner allows you to recover data to any point within the version retention period (1 hour by default, configurable up to 7 days), which is essential for granular disaster recovery against logical errors or accidental data changes. This complements backup strategies by providing fine-grained restore capabilities beyond full backups.
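
The idea of a recovery window can be illustrated with a minimal sketch. The function name `can_restore_to` and the timestamps are hypothetical and are not part of any Spanner client library; the sketch only checks whether a requested restore time falls inside the retention window.

```python
from datetime import datetime, timedelta, timezone


def can_restore_to(target, now, retention):
    """Return True if `target` lies inside the window [now - retention, now]."""
    return now - retention <= target <= now


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
retention = timedelta(days=7)  # assumption: retention raised to the 7-day maximum

inside = can_restore_to(now - timedelta(days=3), now, retention)
outside = can_restore_to(now - timedelta(days=8), now, retention)
print(inside, outside)  # True False
```

Anything older than the window has to come from a backup or an export instead, which is why PITR complements rather than replaces them.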

Exam trap

Google Cloud often tests the misconception that read replicas or exports are viable disaster recovery mechanisms, when in fact they lack the write availability or point-in-time restore capabilities required for true DR in Cloud Spanner.

2

Drag & Drop · medium

Arrange the steps to set up database encryption with Cloud KMS for Cloud SQL.

Why this order

Create the KMS key first, grant the Cloud SQL service account access to it, create the instance with CMEK, then verify the configuration and test.

3

MCQ · easy

Refer to the exhibit. A developer deployed these Firestore security rules. What is a security concern with this configuration?

A. Only the owner can write to documents

B. The rules do not explicitly allow delete operations

C. All authenticated users can read all documents

D. The wildcard path matches all databases

Answer: C

The read rule allows any authenticated user to read any document, which may leak private data.

Why this answer

Option C is correct because the Firestore security rules shown grant read access to all authenticated users without any document-level or collection-level restrictions. The rule `match /databases/{database}/documents { allow read: if request.auth != null; }` applies to all documents in the database, meaning any authenticated user can read every document, including those they should not have access to. This violates the principle of least privilege and can lead to unauthorized data exposure.

Exam trap

Google Cloud often tests the misconception that 'authenticated users' implies safe access, but the trap here is that without document-level conditions, all authenticated users can read every document, which is a common misconfiguration in Firestore security rules.

How to eliminate wrong answers

Option A is wrong because the rules do not restrict write access to only the owner; they allow any authenticated user to write to any document (allow write: if request.auth != null;), which is a broader security concern. Option B is wrong because Firestore security rules implicitly allow delete operations when write access is granted, as 'write' encompasses create, update, and delete unless explicitly separated. Option D is wrong because the wildcard path `{database}` matches any database name within the project, which is standard for multi-database setups and does not introduce a security concern by itself.

4

MCQ · medium

A company uses Memorystore for Redis as a cache. They need to survive a zone failure. Which configuration should they choose?

A. Cluster tier with multiple shards

B. Basic tier with a single node

C. Standard tier cross-region

D. Standard tier with replication

Answer: D

Standard tier provisions a primary and replica in different zones, providing zone failover.

Why this answer

Standard tier with replication (Option D) provides a primary-replica pair in the same region but across two zones, ensuring automatic failover if one zone fails. This meets the requirement to survive a zone failure while maintaining low-latency access within a single region.

Exam trap

The trap here is that candidates may confuse 'cross-region' with 'cross-zone' replication, or assume that sharding alone (Cluster tier) provides zone redundancy, when in fact Memorystore for Redis does not support cross-region replication and requires explicit replication (Standard tier) for zone-level failover.

How to eliminate wrong answers

Option A is wrong because Cluster tier with multiple shards distributes data across shards for horizontal scaling, but does not inherently provide zone-level redundancy unless each shard has replicas in different zones, which is not guaranteed by this option alone. Option B is wrong because Basic tier with a single node has no replication or failover, so a zone failure would cause complete data loss and downtime. Option C is wrong because Standard tier cross-region is not a valid Memorystore tier; cross-region replication is not supported by Memorystore for Redis, and this option incorrectly implies multi-region failover.

5

MCQ · hard

A multinational corporation uses Cloud Spanner with a multi-region configuration. The schema includes a table that is updated frequently by users in two distant regions. They are experiencing high commit latencies due to distributed transactions. Which schema change would most reduce latency?

A. Reduce the number of replicas in the Spanner configuration.

B. Use a table-level leader placement configuration to keep the table's splits in a single region.

C. Convert the table into an interleaved child of a parent table.

D. Increase the number of splits by using a more granular primary key.

Answer: B

Leader placement allows directing all writes for a table to the nearest region, reducing distributed transaction overhead.

Why this answer

Option B is correct because leader placement keeps the table's writes in a single region, which reduces the number of remote operations per commit. Option A is wrong because reducing replicas may compromise availability. Option C is wrong because interleaving does not help with distribution across regions.

Option D is wrong because horizontal scaling doesn't directly fix cross-region latency.

6

Multi-Select · hard

A company is designing a global application using Cloud Spanner. They need to ensure low latency reads and writes across three continents. Which TWO configurations should they consider?

Select 2 answers

A. Use a multi-region configuration with leader regions in each continent.

B. Use a single-region instance and rely on application caching.

C. Use strongly consistent reads from a single region.

D. Use read replicas in each continent for stale read use cases.

E. Use interleaved tables to optimize query performance.

Answers: A, D

Multi-region with leader regions reduces write latency.

Why this answer

Option A is correct because Cloud Spanner multi-region configurations allow you to place leader regions in multiple continents, which enables low-latency strongly consistent reads and writes by directing traffic to the nearest leader. This is achieved through Spanner's TrueTime and Paxos-based replication, ensuring global consistency without sacrificing performance.

Exam trap

Google Cloud often tests the misconception that read replicas or caching alone can solve global write latency, but Cloud Spanner requires leader regions in each continent for low-latency strongly consistent writes.

7

Multi-Select · hard

You manage a Cloud SQL for PostgreSQL instance that is experiencing high read latency. You have already tuned the buffer cache and queries. Which THREE actions can further reduce read latency? (Choose three.)

Select 3 answers

A. Enable the PostgreSQL slow query log and analyze it.

B. Use connection pooling to reduce the number of open connections.

C. Increase the number of vCPUs on the primary instance.

D. Create read replicas in the same region to distribute read traffic.

E. Add a Memorystore for Redis cache in front of the database for frequently accessed data.

Answers: C, D, E

More vCPUs can process more queries concurrently, reducing queue time.

Why this answer

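Creating read replicas offloads reads from the primary; adding an in-memory cache (e.g., Redis) serves frequently accessed data without touching the database; and adding vCPUs lets the primary process more queries concurrently, reducing queue time. Option A is wrong because the queries have already been tuned, and query logging adds overhead without reducing latency by itself. Option B is wrong because connection pooling lowers connection overhead but does not directly speed up reads.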
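
The caching option can be sketched as a cache-aside reader. This is a minimal illustration, not a Memorystore client: a plain dict stands in for Redis, and the `CacheAside` class and the loader callable are hypothetical names.

```python
class CacheAside:
    """Cache-aside reader: a dict stands in for Redis, a callable for the database."""

    def __init__(self, load_from_db):
        self._load = load_from_db
        self._cache = {}
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:      # cache hit: no database round trip
            return self._cache[key]
        self.db_reads += 1          # cache miss: read the database once
        value = self._load(key)
        self._cache[key] = value
        return value


reader = CacheAside(lambda key: f"row-for-{key}")
for _ in range(1000):
    reader.get("user:42")
print(reader.db_reads)  # 1
```

A real deployment would also need expiry or invalidation so that cached rows do not go stale; the sketch leaves that out.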

8

Multi-Select · easy

A startup is using Firestore in Native mode for a real-time chat application. They want to design the schema for chat rooms and messages. Which TWO design patterns are recommended? (Choose two.)

Select 2 answers

A. Use arrays in the chat room document to store message IDs.

B. Use a composite index on chat room ID and timestamp.

C. Store all messages in a single top-level collection with a field for chat room ID.

D. Use a separate top-level collection for each chat room.

E. Store messages as documents in a subcollection under each chat room document.

Answers: B, E

A composite index is required for querying messages efficiently.

Why this answer

Options B and E are correct. Storing messages in a subcollection under each chat room (E) is scalable and follows Firestore best practices. A composite index on chat room ID and timestamp (B) is needed for efficient queries.

Option C (single collection) is less scalable; Option D (separate collection per chat room) leads to many collections, which is not recommended; Option A (arrays) has size limits and is not scalable for many messages.
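
The subcollection layout can be sketched with an in-memory stand-in. Nothing here uses the Firestore SDK: the `store` dict, `add_message`, and `recent_messages` are hypothetical helpers that only mimic the path shape `rooms/{roomId}/messages` and a newest-first query by timestamp.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for Firestore: collection path -> documents.
store = defaultdict(list)


def add_message(room_id, text, timestamp):
    """Write a message document into the room's own messages subcollection."""
    store[f"rooms/{room_id}/messages"].append({"text": text, "timestamp": timestamp})


def recent_messages(room_id, limit):
    """Return the newest messages of one room, newest first."""
    docs = store[f"rooms/{room_id}/messages"]
    return sorted(docs, key=lambda d: d["timestamp"], reverse=True)[:limit]


add_message("general", "hello", 1)
add_message("general", "world", 2)
add_message("random", "other room", 3)
print([m["text"] for m in recent_messages("general", 10)])  # ['world', 'hello']
```

Each room's messages live under their own path, so reading one room never touches another room's documents, and the room document itself stays small.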

9

MCQ · hard

A company uses Cloud SQL for MySQL with a failover replica. The primary instance is in us-central1 and the replica is in us-east1. During a regional outage in us-central1, the database engineer executes an emergency failover to the replica. After the failover, applications experience high latency when writing to the new primary. What is the most likely cause?

A. The new primary is in a different region, causing higher network round-trip times for applications that are still in us-central1.

B. Cross-region replication triggers a mandatory 1-hour delay before writes are allowed.

C. The failover did not complete successfully; the replica is still in read-only mode.

D. The replica had a replication lag of 5 minutes, causing data inconsistency.

Answer: A

After failover, the primary is in us-east1, while application instances in us-central1 incur cross-region latency for each write.

Why this answer

Option A is correct because after a cross-region failover, the new primary resides in us-east1 while the applications remain in us-central1. This geographic distance increases network round-trip time (RTT) for write operations, as each write must traverse the WAN between regions. Cloud SQL for MySQL does not automatically relocate compute resources, so latency-sensitive applications will experience higher write latency until they are migrated or configured to connect to the new region.
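
A rough calculation shows why the extra round-trip time matters. The figures below (1 ms same-region RTT, 30 ms cross-region RTT, 2 ms of server work per statement, 5 statements per transaction) are illustrative assumptions, not measurements of any Google Cloud region pair.

```python
def transaction_latency_ms(statements, rtt_ms, server_ms_per_statement):
    """Latency of a transaction whose statements are sent one round trip at a time."""
    return statements * (rtt_ms + server_ms_per_statement)


# Assumed values for illustration only.
same_region = transaction_latency_ms(statements=5, rtt_ms=1.0, server_ms_per_statement=2.0)
cross_region = transaction_latency_ms(statements=5, rtt_ms=30.0, server_ms_per_statement=2.0)
print(same_region, cross_region)  # 15.0 160.0
```

Because every statement pays the round trip, chatty transactions suffer the most until the application tier is moved next to the new primary.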

Exam trap

Google Cloud often tests the misconception that failover automatically fixes all performance issues, but the trap here is that candidates overlook the impact of geographic latency on write operations after a cross-region failover, focusing instead on replication lag or read-only mode.

How to eliminate wrong answers

Option B is wrong because Cloud SQL for MySQL does not impose any mandatory delay after a failover; writes are allowed immediately once the replica is promoted to primary. Option C is wrong because a successful failover automatically promotes the replica to read-write mode; if it remained read-only, applications would receive errors, not high latency. Option D is wrong because replication lag does not cause high write latency; it affects read consistency but the promoted replica becomes the new primary with full write capability, and any lag is resolved asynchronously.

10

MCQ · hard

A team is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The existing schema uses a large number of foreign key constraints and triggers for data validation. The team wants to minimize migration effort and maintain data integrity. Which schema design approach is most appropriate for Cloud SQL?

A. Keep the existing foreign keys and triggers as-is in Cloud SQL for PostgreSQL

B. Migrate to Cloud Spanner and use interleaved tables to simulate foreign keys

C. Remove all foreign keys and triggers and implement validation in the application layer

D. Convert the schema to use Firestore in Datastore mode with composite indexes

Answer: A

Cloud SQL supports these features, minimizing migration effort.

Why this answer

Option A is correct because Cloud SQL for PostgreSQL is fully compatible with the PostgreSQL engine, meaning foreign key constraints and triggers operate identically to on-premises PostgreSQL. This approach minimizes migration effort by preserving the existing schema logic and maintaining referential integrity without requiring application changes or data validation rewrites.

Exam trap

The trap here is that candidates assume managed cloud databases require schema simplification or NoSQL conversion, but Cloud SQL for PostgreSQL is a direct lift-and-shift target that preserves all relational features like foreign keys and triggers.

How to eliminate wrong answers

Option B is wrong because Cloud Spanner uses interleaved tables for hierarchical data relationships, not as a direct replacement for foreign keys; it does not support PostgreSQL triggers or the same constraint enforcement, requiring significant schema redesign and application logic changes. Option C is wrong because removing foreign keys and triggers shifts data integrity to the application layer, which increases complexity, risk of data corruption, and violates the goal of minimizing migration effort while maintaining integrity. Option D is wrong because Firestore in Datastore mode is a NoSQL document database that does not support SQL foreign keys, triggers, or relational integrity constraints, requiring a complete schema transformation and loss of existing PostgreSQL functionality.

11

MCQ · hard

A production Cloud SQL for PostgreSQL instance needs to handle increased read traffic and provide automatic failover in case of a zone outage. Which architecture satisfies both requirements?

A. Enable high availability (regional) and create a read replica in the same region.

B. Enable high availability only.

C. Use a CMEK key and enable binary logging.

D. Create multiple read replicas without HA.

Answer: A

HA provides automatic zone failover; read replicas offload read traffic, meeting both needs.

Why this answer

Option A is correct because enabling high availability (regional) for Cloud SQL for PostgreSQL creates a primary and standby instance in different zones within the same region, providing automatic failover during a zone outage. Adding a read replica in the same region offloads read traffic from the primary instance, satisfying the increased read traffic requirement while the HA configuration ensures high availability.

Exam trap

Google Cloud often tests the distinction between high availability (automatic failover) and read replicas (read scaling), leading candidates to assume that read replicas alone can provide failover or that HA alone can handle increased read traffic.

How to eliminate wrong answers

Option B is wrong because enabling high availability only provides automatic failover but does not address the need to handle increased read traffic; read replicas are required for read scaling. Option C is wrong because using a CMEK key (Customer-Managed Encryption Key) and enabling binary logging are related to encryption and point-in-time recovery, not to read scaling or automatic failover. Option D is wrong because creating multiple read replicas without HA handles increased read traffic but does not provide automatic failover in case of a zone outage; HA is necessary for failover.

12

MCQ · medium

A company is migrating an on-premises PostgreSQL database to Cloud SQL. The database is 2 TB and has a high write workload. They need minimal downtime. Which migration approach is best?

A. Use Cloud Spanner

B. Use Cloud SQL for MySQL instead

C. Export using pg_dump and import via psql

D. Use Database Migration Service with continuous replication

Answer: D

DMS with continuous replication minimizes downtime by keeping the target in sync.

Why this answer

Database Migration Service (DMS) with continuous replication is the best approach because it supports minimal-downtime migrations from on-premises PostgreSQL to Cloud SQL. DMS uses change data capture (CDC) to replicate ongoing writes while the initial data load completes, then performs a cutover with only seconds of downtime. This handles the 2 TB size and high write workload without requiring manual export/import or schema changes.
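
The two phases described above (initial load, then replay of captured changes) can be modelled in a few lines. This is a toy model under stated assumptions, not how DMS is implemented: the `migrate` function, the dict-based tables, and the `(op, key, value)` change records are all hypothetical.

```python
def migrate(source, change_log):
    """Copy a snapshot, then replay changes that arrived during the copy."""
    target = dict(source)              # initial full load from a snapshot
    for op, key, value in change_log:  # CDC phase: apply captured writes in order
        if op == "upsert":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target


snapshot = {"a": 1, "b": 2}
# Writes that hit the source while the initial load was running.
changes = [("upsert", "c", 3), ("upsert", "a", 10), ("delete", "b", None)]
print(migrate(snapshot, changes))  # {'a': 10, 'c': 3}
```

Once the change log is drained and the target has caught up, cutover only needs a brief pause in writes rather than a pause for the whole copy.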

Exam trap

Google Cloud often tests the misconception that pg_dump/psql is suitable for large databases with high write workloads. The trap is that candidates overlook the need for minimal downtime: a logical dump captures a single consistent snapshot, so any write committed after the dump starts is missing from the target unless the application stops writing for the whole dump and restore.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner is a globally distributed, horizontally scalable database that requires schema redesign and does not support PostgreSQL wire protocol natively, making it unsuitable for a direct PostgreSQL migration. Option B is wrong because Cloud SQL for MySQL is a different database engine; migrating from PostgreSQL to MySQL would require schema and query conversion, increasing complexity and downtime, and does not leverage the existing PostgreSQL setup. Option C is wrong because pg_dump and psql perform a logical export and import of one consistent snapshot; writes made after the dump begins are not captured, so the application must stop writing until the restore finishes. For a 2 TB database with a high write workload, that means hours of downtime, or data loss if writes continue.

13

MCQ · hard

A data team uses BigQuery for ad-hoc BI queries. They have a table with 100 columns. Analysts often select many columns. The table is partitioned by event_date. Queries are slow and expensive. What two-step optimization should they implement? (Note: This is a single correct answer among four options that combine two steps.)

A. Cluster the table by commonly used columns and limit the selected columns in queries.

B. Convert the table to an Avro format and use partitioned tables.

C. Partition by event_date and use column-level security.

D. Cluster the table by event_date and use SELECT *.

Answer: A

Clustering narrows scans within partitions; selecting only needed columns reduces bytes processed.

Why this answer

Clustering by commonly used columns sorts data within each partition so that queries filtering or aggregating on those columns read fewer blocks, reducing bytes processed. Limiting selected columns in queries further reduces the data scanned by avoiding unnecessary column reads. Together, these two steps directly address the high cost and slow performance caused by scanning many columns across a large partitioned table.
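
The effect of column selection and partition pruning on bytes processed can be estimated with simple arithmetic. The model below assumes every value is 8 bytes wide and every partition holds one million rows; both are simplifying assumptions, and `bytes_scanned` is a hypothetical helper rather than a BigQuery API.

```python
def bytes_scanned(rows_per_partition, partitions_read, bytes_per_value, columns_read):
    """Columnar estimate: only the columns and partitions a query touches are read."""
    return rows_per_partition * partitions_read * bytes_per_value * columns_read


select_star = bytes_scanned(1_000_000, 30, 8, 100)  # all 100 columns, 30 days
ten_columns = bytes_scanned(1_000_000, 30, 8, 10)   # 10 columns, 30 days
one_week = bytes_scanned(1_000_000, 7, 8, 10)       # 10 columns, 7 days
print(select_star // ten_columns)  # 10
```

In this model, dropping from 100 columns to 10 cuts the scan tenfold, and narrowing the date filter reduces it further; clustering adds block-level pruning on top, which the model does not capture.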

Exam trap

Google Cloud often tests the misconception that partitioning alone is sufficient for all query optimizations, but the trap here is that partitioning only reduces scan by date range, not by column count—so candidates overlook the need to also limit columns or cluster on non-partition columns.

How to eliminate wrong answers

Option B is wrong because converting to Avro format does not inherently optimize query performance or cost in BigQuery; Avro is a storage format for import/export, not a query optimization technique, and partitioning alone does not reduce the column scan overhead. Option C is wrong because column-level security controls access but does not reduce the amount of data scanned or improve query performance; it adds administrative overhead without addressing the cost or speed issue. Option D is wrong because clustering by event_date is redundant when the table is already partitioned by event_date, and using SELECT * is the opposite of optimization—it forces scanning all columns, increasing cost and latency.

14

Multi-Select · easy

Which TWO are best practices for designing a Cloud Spanner schema?

Select 2 answers

A. Avoid secondary indexes to keep writes faster

B. Use monotonically increasing primary keys

C. Use commit timestamp columns to track row versions

D. Use interleaved tables for parent-child relationships

E. Store all related data in a single row to avoid joins

Answers: C, D

Commit timestamps provide automatic versioning.

Why this answer

Option A is incorrect: secondary indexes are often needed for non-primary key queries. Option B is incorrect because monotonically increasing keys cause hotspotting. Option C is correct: commit timestamp columns enable versioning without storing explicit timestamps.

Option D is correct: interleaved tables optimize parent-child joins. Option E is incorrect: storing all data in a single row leads to large rows and contention.
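
The hotspotting caused by monotonically increasing keys can be seen in a small simulation. It assumes a 16-bit key space range-sharded into four equal splits, which is far simpler than Spanner's real split management; `split_for` and `bit_reverse` are hypothetical helpers. Bit reversal is used here as one common way to scatter sequential values.

```python
from collections import Counter

KEY_BITS = 16
SPLITS = 4


def split_for(key):
    """Range sharding: each split owns one contiguous quarter of the key space."""
    return (key * SPLITS) >> KEY_BITS


def bit_reverse(key):
    """Reverse the bits so consecutive inputs land far apart in the key space."""
    return int(format(key, f"0{KEY_BITS}b")[::-1], 2)


recent = range(60_000, 61_000)  # 1,000 consecutive keys, e.g. from a sequence
sequential = Counter(split_for(k) for k in recent)
scattered = Counter(split_for(bit_reverse(k)) for k in recent)
print(dict(sequential))            # {3: 1000}
print(sorted(scattered.values()))  # [250, 250, 250, 250]
```

With sequential keys every insert lands on the last split, while the bit-reversed keys spread the same 1,000 writes evenly across all four.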

15

MCQ · medium

A company has a Cloud SQL for MySQL instance with automated backups enabled. They want to ensure they can recover to any point within the last 7 days with minimum storage cost. What should they do?

A. Enable binary logging manually and store transaction logs in Cloud Storage.

B. Increase automated backup retention to 30 days and disable point-in-time recovery.

C. Keep the default automated backup configuration with 7-day retention and enable point-in-time recovery.

D. Disable automated backups and create manual backups daily.

Answer: C

Default retention is 7 days, and enabling PITR allows recovery to any point within that window.

Why this answer

Option C is correct because Cloud SQL for MySQL with automated backups enabled by default retains backups for 7 days. Enabling point-in-time recovery (PITR) uses the existing binary logs to allow recovery to any point within that retention period, without additional storage cost for the logs beyond the backup storage. This meets the requirement of 7-day recoverability with minimum storage cost.

Exam trap

The trap here is that candidates may think enabling point-in-time recovery requires additional storage or manual configuration of binary logs, but in Cloud SQL, PITR is a built-in feature that uses the existing automated backup retention and does not increase storage cost beyond the backup storage itself.

How to eliminate wrong answers

Option A is wrong because binary logging is automatically enabled when you enable point-in-time recovery in Cloud SQL; manually enabling it and storing logs in Cloud Storage would incur additional storage costs and management overhead, not minimum cost. Option B is wrong because increasing backup retention to 30 days exceeds the 7-day requirement and incurs more storage cost, and disabling point-in-time recovery prevents recovery to any point within the retention period, only to backup times. Option D is wrong because disabling automated backups and creating manual backups daily would not allow point-in-time recovery (only to backup snapshots) and manual backups can be more expensive and less reliable than automated backups.

16

MCQ · easy

A company needs to store time-series sensor data with high write throughput (millions of writes per second) and low latency reads. Which database service should they choose?

A. Cloud Bigtable

B. Cloud SQL

C. Firestore

D. Cloud Spanner

Answer: A

Bigtable is optimized for high write throughput and low-latency reads, ideal for time-series data.

Why this answer

Cloud Bigtable is a fully managed, scalable NoSQL database designed for large analytical and operational workloads, handling millions of writes per second with consistent low-latency reads. It uses a distributed, replicated SSTable storage engine and is optimized for time-series data, making it ideal for high-throughput sensor ingestion.

Exam trap

Google Cloud often tests the misconception that Cloud Spanner's global scalability makes it suitable for all high-throughput workloads, but candidates overlook its transactional overhead and cost, which make it inappropriate for simple time-series writes at millions per second.

How to eliminate wrong answers

Option B is wrong because Cloud SQL is a relational database (MySQL, PostgreSQL, SQL Server) with limited write throughput (typically thousands of writes per second) and is not designed for high-velocity time-series data. Option C is wrong because Firestore is a document-oriented NoSQL database optimized for mobile and web apps with moderate throughput (up to 10,000 writes per second per database) and does not support the millions of writes per second required. Option D is wrong because Cloud Spanner is a globally distributed relational database with strong consistency and horizontal scaling, but its write throughput is limited by node count and transaction overhead, making it unsuitable for the extreme write volume of millions per second.

17

Multi-Select · medium

Which THREE are considerations when designing a schema for Cloud Firestore?

Select 3 answers

A. Use subcollections to organize related data

B. Avoid large arrays to prevent document size limits

C. Denormalize data to reduce the need for joins

D. Use nested maps for deeply structured data

E. Always use transactional writes to ensure consistency

Answers: A, B, C

Subcollections enable scalable data modeling.

Why this answer

Option A is correct: subcollections allow scalable data organization. Option B is correct: large arrays cause document bloat and index limits. Option C is correct: denormalization is common in Firestore to avoid expensive reads.

Option D is not a best practice: deeply nested maps are hard to query and can cause contention. Option E is incorrect: Firestore supports transactions but they are not the only way to ensure consistency.

18

MCQ · hard

An analyst writes a SQL query that joins a fact table with multiple dimension tables. The query runs slowly due to shuffling. Which optimization technique should be applied?

A. Cluster the fact table on the dimension join keys.

B. Use a subquery in the FROM clause to pre-aggregate.

C. Use a LIMIT clause to restrict rows.

D. Use a window function to precompute values.

Answer: A

Clustering on join keys minimizes data movement.

Why this answer

Shuffling occurs when data must be redistributed across nodes during joins, often because the join keys are not co-located. Clustering the fact table on the dimension join keys physically co-locates rows with the same join key values, minimizing data movement during the join. This is a direct optimization for shuffle-heavy workloads in distributed SQL engines like Spark SQL or Hive.
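
Data movement during a join can be approximated by counting rows that are not already on the worker that owns their join key. The sketch assumes four workers and a hash-style placement of `key % workers`; `rows_to_shuffle` and the row layout are hypothetical and do not reflect any particular engine's planner.

```python
def rows_to_shuffle(rows, current_node_of, workers):
    """Count rows whose current node differs from the node that owns their join key."""
    return sum(1 for row in rows if current_node_of(row) != row["key"] % workers)


WORKERS = 4
rows = [{"id": i, "key": i % 10} for i in range(1000)]

# Rows placed round-robin by arrival order, ignoring the join key.
by_arrival = rows_to_shuffle(rows, lambda r: r["id"] % WORKERS, WORKERS)
# Rows already co-located on the join key.
by_join_key = rows_to_shuffle(rows, lambda r: r["key"] % WORKERS, WORKERS)
print(by_arrival, by_join_key)
```

When the data is laid out on the join key, no row has to move before the join; with arrival-order placement a large share of the table does.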

Exam trap

Google Cloud often tests the misconception that reducing row count (via aggregation or LIMIT) solves shuffle performance, when the real bottleneck is data movement across nodes during the join itself.

How to eliminate wrong answers

Option B is wrong because pre-aggregating in a subquery reduces row count but does not address the root cause of shuffling during the join; the join still requires redistribution unless the subquery result is small enough to broadcast. Option C is wrong because a LIMIT clause only restricts the final output rows, not the intermediate data shuffled during the join; the full join still executes. Option D is wrong because window functions operate on already partitioned data and do not reduce shuffling; they can even introduce additional shuffles if the PARTITION BY clause differs from the join keys.

19

Multi-Select · medium

Which TWO actions can help reduce the number of read replicas needed for a Cloud SQL for PostgreSQL instance that serves a read-heavy workload?

Select 2 answers

A. Implement connection pooling to reuse database connections.

B. Enable synchronous replication on all read replicas.

C. Use smaller machine types for read replicas.

D. Use application-level caching (e.g., Redis) to cache frequent read results.

E. Increase the max_connections parameter on the primary instance.

Answers: A, D

Reduces connection overhead and improves replica efficiency.

Why this answer

Option A is correct because connection pooling reduces the overhead of establishing new database connections, which can consume significant CPU and memory resources on the primary instance. By reusing existing connections, the primary instance can handle more read requests without needing additional read replicas to offload the connection management load. This directly reduces the number of replicas required for a read-heavy workload.
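
Connection reuse can be sketched with a small pool built on the standard library. This is a minimal illustration, not a production pooler such as PgBouncer: `ConnectionPool` and `fake_connect` are hypothetical, and the "connection" is just a callable standing in for a real database session.

```python
import queue


class ConnectionPool:
    """Fixed-size pool: connections are opened once and handed out repeatedly."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        self.opened = 0
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    def run(self, query):
        conn = self._idle.get()   # blocks until a connection is free
        try:
            return conn(query)
        finally:
            self._idle.put(conn)  # return the connection for reuse


def fake_connect():
    # Hypothetical stand-in for opening a real database connection.
    return lambda query: f"result of {query}"


pool = ConnectionPool(fake_connect, size=2)
results = [pool.run(f"SELECT {i}") for i in range(100)]
print(pool.opened)  # 2
```

One hundred queries are served by two connections, so the database pays the connection setup cost twice instead of one hundred times.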

Exam trap

Google Cloud often tests the misconception that increasing database parameters like max_connections or using synchronous replication directly reduces read replica requirements, when in fact these actions either increase resource consumption or do not address read offloading.

20

Multi-Select · medium

Which TWO metrics should you monitor in Cloud Monitoring to evaluate the performance of a Cloud Spanner instance? (Choose two.)

Select 2 answers

A. Row reads per second

B. Commit latency

C. Connection count

D. Disk IOPS

E. CPU utilization per node

Answers: B, E

Indicates write performance.

Why this answer

Commit latency is a critical metric for Cloud Spanner because it directly measures the time taken to commit a transaction, which reflects the database's ability to handle write operations efficiently. High commit latency can indicate contention, node overload, or suboptimal schema design, making it essential for performance evaluation.

Exam trap

Google Cloud often tests the misconception that throughput metrics like row reads per second or disk-level metrics like IOPS are meaningful for evaluating performance in a fully managed, distributed database like Cloud Spanner, where internal optimizations and abstractions make such metrics irrelevant.

21

MCQ · easy

Your company runs a business intelligence (BI) dashboard on BigQuery that refreshes every hour. The dashboard queries are complex with multiple JOINs and aggregations. Recently, the queries started taking longer than 30 minutes, causing timeouts. You check the BigQuery monitoring and see that the slot utilization consistently reaches 100% during the dashboard refresh. The project uses a flat-rate pricing model with 1000 slots. Other team members run ad-hoc queries during the same period. What is the most effective action to improve the dashboard performance?

A. Create a separate reservation for the dashboard queries with a baseline of 500 slots and use a low priority job queue for ad-hoc queries.

B. Rewrite the dashboard queries to use fewer joins and aggregations.

C. Increase the total number of slots to 2000 to provide more capacity for all queries.

D. Schedule the dashboard refresh to run at a different time when ad-hoc usage is low.

Answer: A

Dedicated slots guarantee resources for the dashboard regardless of other jobs.

Why this answer

Creating a separate reservation for the dashboard queries with a baseline of 500 slots ensures that the critical BI dashboard always has guaranteed compute capacity, preventing starvation by ad-hoc queries. Using a low-priority job queue for ad-hoc queries allows them to use any remaining idle slots without interfering with the dashboard's reserved slots. This directly addresses the 100% slot utilization and timeout issue without requiring query rewrites or schedule changes.
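
The guarantee a baseline provides can be shown with a simplified allocation model. It is a sketch under stated assumptions, not BigQuery's actual scheduler: `allocate_slots` is a hypothetical function that gives the dashboard its baseline first, lets ad-hoc queries take what remains, and returns any still-idle slots to the dashboard.

```python
def allocate_slots(total, baseline, dashboard_demand, adhoc_demand):
    """Give the dashboard its baseline first, then share what is left."""
    dashboard = min(dashboard_demand, baseline)
    adhoc = min(adhoc_demand, total - dashboard)
    # Slots still idle after ad-hoc demand can flow back to the dashboard.
    dashboard += min(dashboard_demand - dashboard, total - dashboard - adhoc)
    return dashboard, adhoc


busy = allocate_slots(1000, 500, dashboard_demand=800, adhoc_demand=2000)
quiet = allocate_slots(1000, 500, dashboard_demand=800, adhoc_demand=100)
print(busy, quiet)  # (500, 500) (800, 100)
```

Even when ad-hoc demand exceeds the whole commitment, the dashboard keeps its 500 slots; when ad-hoc usage is light, the dashboard can use more than its baseline.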

Exam trap

Google Cloud often tests the misconception that simply adding more resources (slots) or rewriting queries is the best solution, when in fact proper resource governance through reservations and priority queues is the most effective and scalable approach for mixed workloads.

How to eliminate wrong answers

Option B is wrong because rewriting queries to use fewer joins and aggregations might reduce complexity but does not guarantee performance improvements if the underlying slot contention is the root cause; it also requires significant development effort and may not fully resolve timeouts under high concurrency. Option C is wrong because simply increasing total slots to 2000 does not prioritize the dashboard queries; ad-hoc queries could still consume all slots, leading to the same contention and timeout issue. Option D is wrong because scheduling the dashboard refresh at a different time only avoids the conflict temporarily and does not solve the fundamental slot contention problem; it also may not be feasible if the dashboard requires hourly updates.

22

Matching · medium

Match each Cloud Spanner concept to its definition.

Concepts

Matches

Automatic data distribution across nodes

Global clock service for external consistency

Parent-child table with co-located rows

Read with guaranteed latest data

Read with bounded staleness for lower latency

Why these pairings

These are key concepts for understanding Spanner's architecture and consistency.

23

Multi-Select · hard

A financial services company is designing a Cloud Spanner schema for a trading system. They have two main entities: 'accounts' and 'transactions'. Each account has many transactions, and queries almost always retrieve transactions for a specific account. Which TWO schema design strategies should they employ?

Select 2 answers

A.Use a secondary index on transactions.account_id.

B.Ensure the primary key of transactions includes the account_id as the first part.

C.Define a foreign key constraint from transactions to accounts.

D.Store transactions as a JSON array of repeating fields within the account record.

E.Use an interleaved table hierarchy with accounts as parent and transactions as child.

Answers: B, E

This is required for interleaved tables: the child's primary key must start with the parent's primary key.

Why this answer

Option B is correct because Cloud Spanner distributes rows across splits based on the primary key prefix. By making `account_id` the first part of the transactions table primary key, all transactions for a given account are co-located, enabling efficient range scans and point lookups without cross-node shuffling.

Exam trap

Google Cloud often tests the misconception that secondary indexes are the default solution for filtering, when in Cloud Spanner the primary key design and interleaving are the preferred strategies for performance and cost efficiency.

24

MCQ · medium

Refer to the exhibit. After creating this Bigtable instance, the administrator noticed high read latency during peak hours. Which configuration change would most likely help?

A.Change cluster-storage-type to HDD

B.Increase cluster-num-nodes to at least 5

C.Add more clusters in different zones

D.Increase cluster-autoscaling-max-nodes to 20

Answer: B

More nodes increase the read throughput capacity, reducing latency during peak times.

Why this answer

Option B is correct because increasing the number of nodes (from 3 to a higher value) provides more serving capacity to handle read load. Option D (increasing max nodes) only allows the cluster to scale up and does not increase baseline capacity. Option A (HDD) would worsen latency.

Option C (adding clusters) adds replication overhead but does not directly improve read latency for the existing cluster.

25

MCQ · easy

Which tool is best for identifying hot spots in a Cloud Spanner database?

A.Query Insights

B.Key Visualizer

C.Cloud Trace

D.Cloud Monitoring

Answer: B

Key Visualizer provides heatmaps of key access patterns, helping identify hot spots.

Why this answer

Key Visualizer is the correct tool because it is specifically designed to visualize access patterns in Cloud Spanner and identify hot spots—keys or ranges that receive a disproportionate share of reads or writes. Unlike generic monitoring tools, Key Visualizer provides a heatmap of key-space activity, enabling you to pinpoint and mitigate performance bottlenecks caused by uneven distribution of workload across splits.

Exam trap

Google Cloud often tests the distinction between performance monitoring tools by presenting Cloud Monitoring or Query Insights as plausible answers, but the trap is that candidates overlook Key Visualizer's unique purpose of visualizing key-space access patterns specifically for Cloud Spanner hot spot detection.

How to eliminate wrong answers

Option A is wrong because Query Insights focuses on analyzing query performance, such as latency and execution plans, not on visualizing key-level access patterns to detect hot spots. Option C is wrong because Cloud Trace is a distributed tracing tool for latency analysis of requests across services, not for identifying hot keys in a database. Option D is wrong because Cloud Monitoring provides metrics and alerting for overall system health and performance, but lacks the key-space heatmap visualization required to pinpoint hot spots in Cloud Spanner.

26

MCQ · easy

You are using Cloud Memorystore for Redis as a caching layer. You notice that cache hit ratio is below 50%. What is the best action to improve it?

A.Flush the cache periodically to remove stale data.

B.Increase the TTL (time-to-live) for cached data.

C.Enable persistence to avoid data loss.

D.Increase the instance memory size.

Answer: B

Longer TTL keeps data in cache for more reads.

Why this answer

A low cache hit ratio indicates that a large proportion of requests are not finding their data in the cache, forcing the application to fetch from the primary database. Increasing the TTL (time-to-live) for cached data keeps valid entries in Redis longer, reducing the frequency of evictions and cache misses. This directly improves the hit ratio by ensuring that more requests can be served from the cache before the data expires.

Exam trap

Google Cloud often tests the misconception that a low cache hit ratio is always a memory capacity problem, leading candidates to choose 'increase memory size' when the real issue is data expiring too quickly due to short TTLs.

How to eliminate wrong answers

Option A is wrong because flushing the cache periodically removes all data, which would drastically reduce the hit ratio and increase load on the database, the opposite of the desired effect. Option C is wrong because enabling persistence (e.g., RDB snapshots or AOF logs) protects against data loss on restart but does not influence how long data remains in the cache or the hit ratio. Option D is wrong because increasing instance memory size only delays evictions under the maxmemory-policy; if the TTL is too short, data still expires quickly and the hit ratio remains low regardless of memory size.
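
The effect of TTL on hit ratio can be illustrated with a small in-memory simulation. This is only a sketch, not Memorystore or Redis client code: the `TTLCache` class, the round-robin access pattern, and the TTL values are hypothetical, and time is modelled as an integer tick.

```python
class TTLCache:
    """Toy cache whose entries expire `ttl` ticks after they are written."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.expires_at = {}  # key -> tick at which the entry expires
        self.hits = 0
        self.misses = 0

    def get(self, key, now):
        """Count a hit if `key` is cached and unexpired; otherwise count a miss and refill."""
        if self.expires_at.get(key, -1) > now:
            self.hits += 1
        else:
            self.misses += 1
            self.expires_at[key] = now + self.ttl  # simulate a reload from the database

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(ttl, keys=5, ticks=100):
    """Read `keys` distinct keys round-robin, one read per tick (hypothetical workload)."""
    cache = TTLCache(ttl)
    for now in range(ticks):
        cache.get(now % keys, now)
    return cache.hit_ratio()


short_ttl_ratio = simulate(ttl=3)   # each entry expires before its key is read again
long_ttl_ratio = simulate(ttl=20)   # each entry survives several re-reads
```

In this toy workload each key is re-read every 5 ticks, so a TTL of 3 never produces a hit, while a TTL of 20 serves most reads from the cache.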

27

MCQ · hard

A financial services company uses BigQuery for risk analysis. They have a table `market_data` with columns `symbol`, `date`, `price`, and `volume`. The query pattern involves window functions over the last 30 days for many symbols. The table is partitioned by date and clustered by symbol. However, analysts report that queries are slow and expensive. What is the most likely cause?

A.Clustering does not create indexes on symbol

B.Clustering on symbol may cause many blocks to be scanned because symbols are not sorted

C.Partitioning causes data skew across partitions

D.Partitioning by date is not granular enough

Answer: B

If data is ingested without sorting by symbol, clustering effectiveness decreases, leading to many blocks being scanned.

Why this answer

Option B is correct because clustering in BigQuery does not guarantee a strict sort order across a partition; it co-locates rows with similar cluster column values into blocks. When a query uses window functions over a rolling 30-day window for many symbols, BigQuery must scan all blocks that contain any of those symbols, even if only a subset of rows is needed. Since symbols are not strictly sorted, many blocks contain multiple symbols, leading to excessive block scans and high query costs.

Exam trap

The trap here is that candidates assume clustering works like an index or a sort order, but BigQuery clustering only co-locates similar values without guaranteeing strict ordering, which leads to inefficient block pruning for range-based queries over high-cardinality columns.

How to eliminate wrong answers

Option A is wrong because BigQuery does not use indexes; clustering is a performance optimization that organizes data into blocks based on cluster column values, not an index. Option C is wrong because partitioning by date does not inherently cause data skew; data skew is more likely from uneven distribution of symbol values, not from date partitioning. Option D is wrong because partitioning by date is already granular enough for a 30-day window; the issue is not the partition granularity but the clustering inefficiency for queries that span many symbols across multiple partitions.

28

MCQ · easy

A BI developer needs to display sales data in a dashboard that shows sales in local time zones. The source data stores all timestamps in UTC. Which is the best practice for handling time zone conversions?

A.Store timestamps in UTC and convert to local time in the BI tool's application layer

B.Store all timestamps in UTC and convert them to the desired time zone in SQL queries

C.Store timestamps as text strings with time zone offset to avoid conversion

D.Store both UTC and local time in separate columns

Answer: B

This ensures a single source of truth and leverages SQL functions for accurate conversion.

Why this answer

Option B is correct because storing timestamps in UTC and converting them in SQL queries ensures that the conversion logic is centralized, auditable, and consistent across all BI reports. This approach leverages the database engine's time zone functions (e.g., AT TIME ZONE in SQL Server or CONVERT_TZ in MySQL) to handle daylight saving time transitions accurately, avoiding the pitfalls of application-layer conversions that may be inconsistent or not applied uniformly.

Exam trap

Google Cloud often tests the misconception that converting time zones in the application layer is simpler and more flexible, but the trap is that this approach introduces inconsistency when multiple BI tools or direct database queries access the same data, and it fails to leverage the database's robust time zone handling for daylight saving time transitions.

How to eliminate wrong answers

Option A is wrong because converting in the BI tool's application layer can lead to inconsistencies if multiple tools access the same data, and it offloads conversion logic to the presentation tier, which may not handle daylight saving time changes correctly without additional configuration. Option C is wrong because storing timestamps as text strings with time zone offsets breaks date arithmetic, indexing, and sorting, and makes it impossible to use native temporal functions for filtering or aggregation. Option D is wrong because storing both UTC and local time in separate columns duplicates data, increases storage overhead, and risks synchronization errors when time zone rules change (e.g., daylight saving time policy updates).

29

Multi-Select · hard

Your organization uses Cloud Spanner for a global financial application. You are designing a backup and disaster recovery strategy. Which THREE considerations are important for meeting RPO of 1 hour and RTO of 2 hours?

Select 3 answers

A.Enable point-in-time recovery (PITR) with a 7-day retention.

B.Schedule incremental backups every hour using Cloud Scheduler.

C.Configure a multi-region Spanner instance to survive a regional outage.

D.Store backups in a multi-regional Cloud Storage bucket.

E.Set up read replicas in a different region for failover.

Answers: A, B, C

PITR allows recovery to any point within the retention period, meeting RPO.

Why this answer

Option A is correct because Cloud Spanner's point-in-time recovery (PITR) allows you to restore the database to any point within the retention window (up to 7 days). With a 1-hour RPO, PITR can recover data to within the last hour, meeting the requirement without needing separate backup jobs. PITR is built into Spanner and does not require external scheduling or storage.

Exam trap

Google Cloud often tests the misconception that Spanner backups are stored in user-managed Cloud Storage buckets or that read replicas provide failover, but Spanner disaster recovery relies on PITR, scheduled backups, and multi-region synchronous replication instead.

30

MCQ · easy

A startup is migrating from MongoDB to Firestore in Datastore mode. Their existing documents contain nested arrays of sub-objects (e.g., tags, comments). They want to design a schema that scales well and supports efficient queries. What is the recommended approach for handling these nested arrays in Firestore?

A.Use maps instead of arrays to store the data.

B.Store the arrays as stringified JSON in a single field.

C.Flatten the arrays into subcollections under each document.

D.Keep the nested arrays as they are; Firestore supports arrays.

Answer: C

Subcollections scale independently and allow efficient queries.

Why this answer

Option C is correct because Firestore recommends using subcollections for arrays of objects to avoid document size limits and enable efficient querying. Option D (keeping nested arrays) can hit size limits and is not scalable; Option A (maps instead of arrays) still has size issues; Option B (stringified JSON) is not queryable.

31

Multi-Select · easy

Which TWO are best practices for optimizing write performance in Cloud Bigtable?

Select 2 answers

A.Use short row keys to reduce storage size

B.Group multiple mutations into a single request

C.Design row keys to distribute writes across tablets

D.Use the Dataflow Bulk Import API for real-time writes

E.Increase replication lag to allow more time for writes

Answers: B, C

Batching reduces overhead.

Why this answer

Option B is correct because grouping multiple mutations into a single RPC request reduces network round trips and improves throughput. Sending individual mutations incurs per-request overhead, so grouping them into a single batch (via `MutateRows` or client-side batching) significantly increases write throughput.

Exam trap

Google Cloud often tests the misconception that short row keys are a primary optimization for write performance, when in reality row key distribution to avoid hotspots is far more critical for throughput.
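
The batching idea can be sketched with a small client-side helper. This is a minimal illustration using only the standard library, not the Bigtable client API: the `batched` helper, the tuple layout of a mutation, and the batch size of 4 are assumptions chosen for the example.

```python
from itertools import islice


def batched(mutations, batch_size):
    """Yield successive lists of at most `batch_size` mutations."""
    it = iter(mutations)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical mutations: (row_key, column, value) tuples.
mutations = [(f"sensor#{i:04d}", "cf:temp", 20 + i) for i in range(10)]

# One request per batch instead of one request per mutation:
# 10 mutations become 3 requests with a batch size of 4.
requests = list(batched(mutations, batch_size=4))
```

Each list in `requests` would then be handed to whatever bulk-write call the client library provides; that call is deliberately left out here.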

32

MCQ · hard

Your Cloud SQL for MySQL instance is experiencing intermittent performance degradation. You suspect that the issue is due to a sudden spike in connections from a specific application. Which metric and monitoring approach would best help you correlate the connection spike with performance degradation?

A.Monitor 'cloudsql.googleapis.com/network/received_bytes_count' and compare with connection count.

B.Monitor 'cloudsql.googleapis.com/database/mysql/replication/seconds_behind_master' and compare with query latency.

C.Monitor 'cloudsql.googleapis.com/instance/uptime' and check for instance restarts during degradation.

D.Monitor 'cloudsql.googleapis.com/database/mysql/threads/threads_connected' and correlate with CPU utilization and query latency.

Answer: D

Threads connected directly indicates active connections, and correlating with CPU and latency helps identify the impact.

Why this answer

Option D is correct because the 'threads_connected' metric directly measures the number of active connections to the MySQL instance. Correlating this with CPU utilization and query latency allows you to pinpoint whether a sudden spike in connections is causing resource contention and degraded query performance, which is the exact scenario described.

Exam trap

The trap here is that candidates may confuse network metrics (like bytes received) or replication metrics with direct indicators of connection-related performance issues, rather than focusing on the thread count and its impact on CPU and query latency.

How to eliminate wrong answers

Option A is wrong because 'received_bytes_count' measures network throughput, not connection count; a spike in bytes could be due to large queries or data transfers, not necessarily a connection spike. Option B is wrong because 'seconds_behind_master' is a replication lag metric relevant only for read replicas, not for correlating connection spikes with performance degradation on the primary instance. Option C is wrong because 'instance/uptime' only indicates restarts, which are not directly caused by connection spikes; performance degradation can occur without any instance restart.
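
One way to make "correlate" concrete is a Pearson correlation over aligned samples of the two series. The sketch below assumes the metrics have already been exported as equal-length lists; the sample values are invented for illustration and the `pearson` helper is not part of any Google Cloud library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-minute samples; a connection spike occurs in minutes 3 to 5.
threads_connected = [40, 42, 45, 300, 320, 310, 50, 44]
query_latency_ms = [12, 13, 12, 95, 110, 102, 15, 13]

r = pearson(threads_connected, query_latency_ms)
```

A coefficient close to 1 suggests the latency rises and falls with the connection count, which supports (but does not prove) the connection-spike hypothesis.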

33

MCQ · medium

A team is designing a BigQuery schema for time-series analytics on IoT sensor data. They expect high write throughput and queries that aggregate data by hour. Which partitioning and clustering strategy is most cost-effective?

A.Partition by ingestion_time and cluster by sensor_id.

B.Use integer range partitioning on sensor_id.

C.Partition by date and cluster by sensor_id with a timestamp column.

D.Partition by sensor_id and cluster by timestamp.

Answer: C

Date-based partitioning efficiently prunes scans; clustering by sensor_id further reduces data read.

Why this answer

Partitioning by date (e.g., ingestion time or event date) is standard for time-series data. Clustering by sensor_id helps queries that filter on specific sensors. Option C (partition by date, cluster by sensor_id) is best.

Option A uses ingestion time, which may not align with event time. Option B (integer range partitioning on sensor_id) does not prune by time, so hourly aggregations cannot skip partitions. Option D partitions by sensor_id, creating many partitions.

34

MCQ · hard

A Cloud Spanner database has a parent table 'Customers' and a child table 'Orders' interleaved on CustomerId. The most common query retrieves the last 10 orders for a given customer. How should the primary key of Orders be defined for optimal performance?

A.(CustomerId, OrderId)

B.Add a commit timestamp column as part of the primary key

C.No change; use a secondary index on OrderDate

D.(CustomerId, OrderDate DESC)

Answer: D

Descending order stores newest first, enabling efficient limit queries.

Why this answer

Option D is correct: using CustomerId (the parent key) followed by OrderDate DESC ensures that the most recent orders are stored first within each customer's interleaved row range, making queries for the last N orders efficient. Option A (OrderId) is monotonically increasing but not sorted by date. Option C (secondary index) adds overhead.

Option B (adding a commit timestamp column) does not by itself define the key order this query needs.
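
The effect of a descending key component can be imitated with an ordinary sort. This is a sketch in plain Python, not Spanner DDL or client code: the sample rows are hypothetical, and the list comprehension stands in for what would be a contiguous range read in a real key-ordered store.

```python
from datetime import date

# Hypothetical (customer_id, order_date, order_id) rows, in arrival order.
orders = [
    (7, date(2024, 1, 5), "o1"),
    (7, date(2024, 3, 9), "o3"),
    (9, date(2024, 2, 1), "o4"),
    (7, date(2024, 2, 14), "o2"),
]


def key_order(row):
    """Sort key imitating PRIMARY KEY (CustomerId, OrderDate DESC)."""
    customer_id, order_date, _ = row
    return (customer_id, -order_date.toordinal())


# Rows as they would be laid out in key order: grouped by customer, newest first.
stored = sorted(orders, key=key_order)


def latest_orders(customer_id, n):
    """Return the first n order ids in the customer's key range (newest first)."""
    rows = [r for r in stored if r[0] == customer_id]
    return [r[2] for r in rows[:n]]
```

Because the newest rows sit at the start of each customer's range, "last N orders" becomes "first N rows of the range", with no extra sort step.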

35

MCQ · easy

A team notices that queries on a Cloud Spanner database are slow. They want to identify which queries are consuming the most resources. What should they use?

A.Query Insights

B.Cloud Logging

C.Performance Dashboard

D.Cloud Monitoring metrics

Answer: A

Query Insights is the dedicated tool for analyzing Cloud Spanner query performance.

Why this answer

Query Insights is the correct tool because it is specifically designed for Cloud Spanner to analyze query performance, providing detailed metrics such as execution latency, CPU usage, and rows scanned per query. It helps identify the most resource-intensive queries by breaking down performance by query fingerprint, allowing the team to pinpoint and optimize slow queries directly.

Exam trap

Google Cloud often tests the distinction between high-level monitoring tools (Cloud Monitoring, Performance Dashboard) and query-specific diagnostic tools (Query Insights), trapping candidates who confuse general performance metrics with per-query resource analysis.

How to eliminate wrong answers

Option B is wrong because Cloud Logging captures raw log entries and events but does not provide aggregated query-level performance metrics or resource consumption analysis for Cloud Spanner. Option C is wrong because the Performance Dashboard in Google Cloud Console offers a high-level overview of database metrics like latency and throughput, but it lacks the per-query breakdown and resource attribution needed to identify specific resource-heavy queries. Option D is wrong because Cloud Monitoring metrics provide system-level metrics (e.g., CPU utilization, storage) but do not offer query-level insights or the ability to sort and analyze individual query performance.

36

MCQ · medium

A team is migrating an Oracle database to Cloud Spanner. They have a large table with an auto-increment primary key. Which key design strategy should they use to avoid hot spots?

A.Use a composite primary key with a hash prefix of the auto-increment value

B.Use the auto-increment value as the primary key

C.Use interleaved tables to colocate related data

D.Use UUID as the primary key

Answer: A

Hash prefix distributes writes uniformly across splits.

Why this answer

Option A is correct because using a composite primary key with a hash prefix of the auto-increment value distributes writes across multiple splits in Cloud Spanner. Cloud Spanner uses range-based sharding; a monotonically increasing primary key (like an auto-increment value) would cause all new writes to land on the same split, creating a hot spot. By hashing the auto-increment value and prepending it to the primary key, you randomize the key distribution, ensuring writes are spread evenly across the table's splits.

Exam trap

The trap here is that candidates often think UUIDs are always the best solution for avoiding hot spots in distributed databases, but in Cloud Spanner, a hash prefix of the auto-increment value is more storage-efficient and avoids the performance overhead of large primary keys, while still achieving even write distribution.

How to eliminate wrong answers

Option B is wrong because using the auto-increment value directly as the primary key creates a monotonically increasing sequence, which Cloud Spanner will route to a single split, causing a hot spot and severely limiting write throughput. Option C is wrong because interleaved tables colocate parent and child rows for efficient joins, but they do not address the distribution of writes for a table with a monotonically increasing primary key; the hot spot would still occur in the parent table. Option D is wrong because while a UUID primary key avoids monotonicity and can distribute writes, it is not the best design for Cloud Spanner; UUIDs are large (128-bit), increase storage and index size, and can still cause hot spots if not carefully designed (e.g., time-based UUIDs), whereas a hash prefix of the auto-increment value is more efficient and purpose-built for this scenario.
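
A hash-prefixed composite key can be sketched as follows. The shard count of 16, the choice of SHA-256, and the function names are assumptions made for illustration; the source does not prescribe a particular hash or number of shards.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(seq_id):
    """Derive a stable shard number from the sequential id."""
    digest = hashlib.sha256(str(seq_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def composite_key(seq_id):
    """Composite key (shard, seq_id): the hash-derived shard comes first."""
    return (shard_for(seq_id), seq_id)


# Consecutive ids no longer sort next to each other once the shard leads the key.
keys = [composite_key(i) for i in range(1000, 1200)]
shards_used = {shard for shard, _ in keys}
```

The shard is computed from the id itself, so a point lookup can recompute the full key from the original auto-increment value without storing any extra mapping.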

37

Multi-Select · hard

A company is migrating a large Oracle Data Warehouse to BigQuery. The source schema includes many partitioned tables and materialized views. Which THREE considerations are important when designing the BigQuery schema?

Select 3 answers

A.Clustering can be used to improve query performance on frequently filtered columns.

B.Partitioning in BigQuery can be based on a DATE, TIMESTAMP, or INTEGER column.

C.BigQuery requires explicit indexes on columns used in WHERE clauses.

D.Materialized views in BigQuery are automatically refreshed based on base table changes.

E.BigQuery supports unique constraints and foreign keys for data integrity.

Answers: A, B, D

Clustering sorts data within partitions for better filter performance.

Why this answer

Option A is correct because BigQuery clustering organizes data based on the values of specified columns, which improves query performance by reducing the amount of data scanned when filtering on those columns. This is particularly useful for large data warehouses migrating from Oracle, as it mimics the performance benefits of indexes without the overhead of explicit index management.

Exam trap

Google Cloud often tests the misconception that BigQuery requires traditional database features like indexes or constraints, leading candidates to select options that apply to OLTP systems but not to BigQuery's distributed, columnar architecture.

38

MCQ · medium

A company is designing a Cloud Firestore schema for a social media application. Users can follow other users, and the application needs to display a feed of posts from followed users ordered by timestamp. Which schema design is most cost-effective and performant for querying the feed?

A.Store all posts in a top-level collection and query for posts where user ID is in the list of followed users, ordered by timestamp.

B.Store a feed subcollection under each user document containing references to posts from followed users.

C.Store all user posts in an array within a single document and use array-contains queries.

D.Store a 'follows' collection with documents containing follower and followed user IDs; then query posts for each followed user.

Answer: B

This allows direct query on the feed subcollection ordered by timestamp.

Why this answer

Option B is correct because it uses a feed subcollection under each user document to store pre-computed references to posts from followed users. This design avoids expensive collection-group queries or multiple individual queries per followed user, ensuring that fetching the feed is a single, indexed read operation ordered by timestamp, which is both cost-effective and performant at scale.

Exam trap

The trap here is that candidates often choose Option A, thinking a single top-level query with an 'in' filter is simpler, but they overlook Firestore's cap on the number of values in an 'in' query (30 at the time of writing, previously 10) and the resulting need for multiple queries, which destroys both performance and cost predictability at scale.

How to eliminate wrong answers

Option A is wrong because querying a top-level posts collection with a list of followed user IDs requires an 'in' query, which accepts only a small, fixed number of values per query (30 at the time of writing, previously 10) and does not scale to hundreds or thousands of followed users, leading to multiple queries and high read costs. Option C is wrong because storing all user posts in an array within a single document violates the 1 MiB document size limit and cannot support ordered queries or pagination, making it impractical for any real-world social media feed. Option D is wrong because querying posts for each followed user individually results in N+1 read operations per feed request, causing high latency and cost proportional to the number of followed users, with no built-in ordering across results.
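
The pre-computed feed in Option B is a fan-out-on-write pattern, which can be sketched with in-memory dictionaries standing in for Firestore collections. All names here (`followers`, `feeds`, `publish`, `read_feed`) are hypothetical and no Firestore API is used.

```python
from collections import defaultdict

followers = defaultdict(set)  # author id -> set of follower ids
feeds = defaultdict(list)     # user id -> list of (timestamp, post_id) references


def follow(follower, author):
    followers[author].add(follower)


def publish(author, post_id, timestamp):
    """Fan out on write: add a post reference to each follower's feed."""
    for user in followers[author]:
        feeds[user].append((timestamp, post_id))


def read_feed(user, limit=10):
    """Read only the user's own feed, newest first."""
    return [post_id for _, post_id in sorted(feeds[user], reverse=True)[:limit]]


follow("alice", "bob")
follow("alice", "carol")
publish("bob", "post-1", 100)
publish("carol", "post-2", 200)
publish("dave", "post-3", 300)  # alice does not follow dave
```

The cost moves from read time to write time: each post is written once per follower, and each feed read touches a single per-user collection regardless of how many accounts the user follows.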

39

MCQ · hard

An e-commerce platform uses Cloud Bigtable for real-time user sessions. Write latency is high. On investigation, they find that rows are being written with monotonically increasing row keys (e.g., user_id + timestamp). What is the likely cause and solution?

A.Too many column families; merge them

B.Inefficient reads; use reverse scan

C.Hotspotting on a single node; use salting or field promotion

D.Tablet splits are misconfigured; pre-split the table

Answer: C

Salting or field promotion spreads writes across tablets.

Why this answer

Monotonically increasing row keys (e.g., user_id + timestamp) cause all writes to target a single tablet server, creating a hotspot. Cloud Bigtable distributes writes across nodes by row key range; sequential keys concentrate load on one node, degrading write latency. Salting (prepending a hash or random prefix) or field promotion (using a high-cardinality field as the first part of the key) spreads writes evenly across the cluster.

Exam trap

Google Cloud often tests the misconception that pre-splitting alone solves hotspotting, but the trap here is that monotonically increasing keys will still cause writes to concentrate on the last tablet regardless of pre-splitting, requiring key design changes like salting.

How to eliminate wrong answers

Option A is wrong because too many column families do not cause write hotspotting; they affect storage and read performance, not write distribution. Option B is wrong because inefficient reads (e.g., full scans) are a read-side issue, not a cause of high write latency; reverse scan is a read optimization, not a write fix. Option D is wrong because misconfigured tablet splits or pre-splitting addresses initial distribution, but the root cause here is the key design pattern, not split configuration; even with pre-splits, monotonically increasing keys will still hotspot writes to the last tablet.
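
Salting a row key can be sketched as below. The bucket count of 8, the CRC32 hash, and the `#` separator are assumptions for illustration, not values from the source or from the Bigtable documentation.

```python
import zlib

SALT_BUCKETS = 8  # hypothetical number of salt prefixes


def salted_row_key(user_id, timestamp):
    """Build '<salt>#<user_id>#<timestamp>' with a salt derived from user_id."""
    salt = zlib.crc32(user_id.encode("utf-8")) % SALT_BUCKETS
    return f"{salt}#{user_id}#{timestamp}"


# Rows for one user share a prefix, so that user's sessions stay contiguous,
# while different users are spread over up to SALT_BUCKETS key ranges.
key_a = salted_row_key("user-123", 1700000000)
key_b = salted_row_key("user-123", 1700000060)
```

Deriving the salt from the user id rather than from a random number keeps reads simple: a reader can recompute the prefix for a given user instead of scanning every bucket.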

40

Multi-Select · medium

You are a Cloud Database Engineer managing a Cloud Spanner instance. You notice that some queries are taking longer than expected. You suspect that the queries are not using secondary indexes efficiently. Which TWO metrics should you monitor in Cloud Monitoring to validate your suspicion? (Choose two.)

Select 2 answers

A.CPU utilization

B.Statement scan rows returned

C.Lock conflicts

D.Row count returned by index

E.Query scan latency (mean)

Answers: B, D

High scan rows vs. rows returned indicates inefficient index usage.

Why this answer

Option B: 'Statement scan rows returned' shows the number of rows scanned per query; a high value compared to rows returned indicates inefficient index usage. Option D: 'Row count returned by index' shows how many rows are returned from index scans; if it's high but the final result is small, the index may be too broad. Option E is about overall latency, not specific to index usage.

Option A is about CPU, not directly about index usage. Option C is about lock conflicts, unrelated.

41

Multi-Select · hard

Which THREE are best practices for designing a Cloud Spanner schema for high performance? (Choose three.)

Select 3 answers

A.Spread data across multiple regions to reduce latency.

B.Avoid using a monotonically increasing primary key as the first part of the key.

C.Denormalize frequently joined tables into a single table.

D.Use interleaved tables for tables that are always accessed together by the parent key.

E.Use secondary indexes when querying by non-key columns.

Answers: B, D, E

This prevents write hotspotting.

Why this answer

Option B is correct because a monotonically increasing primary key (e.g., an auto-increment integer or timestamp) creates a hot spot on the last split, so all new writes are handled by a single server. Cloud Spanner divides data into splits based on primary key ranges; a sequential key forces all new inserts into the same split, leading to write contention and poor throughput. Using a hash prefix or a UUID-like key spreads writes evenly across splits, maximizing parallelism.

Exam trap

Google Cloud often tests the misconception that denormalization is always beneficial for performance, but in Cloud Spanner, interleaved tables provide efficient parent-child joins without the downsides of denormalization.

42

MCQ · medium

A Cloud Spanner database contains the Orders table as defined above. The query `SELECT * FROM Orders WHERE CustomerID=123` takes a long time. What is the most likely reason?

A.The ORDER BY clause is missing.

B.Interleaving causes extra I/O.

C.The primary key is not optimized for this query.

D.The table needs a secondary index on CustomerID.

Answer: D

A secondary index on CustomerID enables direct lookup without scanning the entire table.

Why this answer

The query `SELECT * FROM Orders WHERE CustomerID=123` filters on the `CustomerID` column, but the primary key of the Orders table is likely defined on a different column (e.g., `OrderID`). Without a secondary index on `CustomerID`, Cloud Spanner must perform a full table scan to find matching rows, which is slow for large tables. Creating a secondary index on `CustomerID` allows Cloud Spanner to directly locate the relevant splits and rows, dramatically reducing latency.

Exam trap

The trap here is that candidates often assume a primary key is always the best way to query any column, but Cloud Spanner requires the query predicate to match the primary key order for efficient access; otherwise, a secondary index is necessary.

How to eliminate wrong answers

Option A is wrong because the absence of an ORDER BY clause does not cause a query to take a long time; it merely affects the order of results, not the scan method. Option B is wrong because interleaving (table interleaving in Cloud Spanner) is a design pattern that can improve performance by co-locating parent and child rows; it does not inherently cause extra I/O and is not relevant to a simple filter on a non-key column. Option C is wrong because the primary key is already defined; the issue is that the query predicate does not match the primary key order, so the primary key cannot be used efficiently for this filter.

43

MCQhard

A financial firm uses Cloud Spanner with a single-region configuration. They must meet regulatory requirements for disaster recovery across continents. They need to recover within 1 hour RTO and RPO of 5 minutes. Current workload: 50k writes/sec. What should they do?

A.Use cross-region backups with 5-minute retention.

B.Use Cloud SQL for MySQL with cross-region replicas.

C.Use multi-region configuration with synchronous replication across two continents.

D.Use export to Cloud Storage every 5 minutes.

AnswerC

Synchronous replication ensures near-zero RPO and automatic failover meets RTO.

Why this answer

Option C is correct because Cloud Spanner's multi-region configuration uses synchronous replication across continents, providing strong consistency and automatic failover. This meets the 1-hour RTO and 5-minute RPO requirements for disaster recovery, as synchronous replication ensures data is durable across regions with minimal lag, and Spanner handles failover transparently without manual intervention.

Exam trap

The trap here is that candidates often confuse backup-based recovery (like exports or snapshots) with synchronous replication, failing to realize that only synchronous replication can meet strict RPOs like 5 minutes across continents without data loss.

How to eliminate wrong answers

Option A is wrong because backup retention controls how long a backup is kept, not how often one is taken, so "5-minute retention" does not deliver a 5-minute RPO; restoring a large backup is also unlikely to finish within the 1-hour RTO. Option B is wrong because Cloud SQL for MySQL cross-region replicas use asynchronous replication, which cannot guarantee a 5-minute RPO across continents due to replication lag. Option D is wrong because exports to Cloud Storage are full rather than incremental; running one every 5 minutes is impractical at 50k writes/sec, and recovery requires a manual import that is not designed to meet a 1-hour RTO.

44

MCQmedium

An e-commerce site uses Cloud SQL for MySQL. They need read scalability for product catalog queries. What should they do?

A.Add read replicas

B.Partition tables

C.Use Cloud Spanner

D.Enable automatic storage increase

AnswerA

Read replicas serve read queries, reducing load on the primary.

Why this answer

Adding read replicas in Cloud SQL for MySQL offloads read traffic from the primary instance, providing horizontal read scalability for product catalog queries. Replicas asynchronously replicate data using MySQL's native binary log replication, allowing the primary to focus on writes while replicas handle read-heavy workloads.

Exam trap

The trap here is that candidates confuse vertical scaling (storage increase) or schema-level optimizations (partitioning) with horizontal read scaling, or incorrectly assume a fully distributed database like Spanner is required for simple read offloading.

How to eliminate wrong answers

Option B is wrong because partitioning tables (e.g., range or hash partitioning) improves query performance on large tables by reducing scan scope, but it does not add read capacity or offload traffic from the primary instance. Option C is wrong because Cloud Spanner is a globally distributed, strongly consistent relational database designed for horizontal write scalability and global transactions, which is overkill and cost-prohibitive for a simple read-scaling need on an existing MySQL workload. Option D is wrong because enabling automatic storage increase only prevents out-of-disk errors by expanding storage capacity; it does not improve read throughput or distribute query load.

45

MCQhard

A financial services company uses Cloud Spanner for a ledger application. The ledger table has a primary key of 'transaction_id' which is a monotonically increasing integer. During peak hours, they observe high write latencies due to hot spots on the last tablet. They need to redesign the schema to distribute writes evenly while still allowing efficient point lookups by transaction ID. What is the best approach?

A.Reverse the timestamp and use it as the primary key.

B.Use a UUID as the primary key to ensure randomness.

C.Use a composite primary key with a timestamp and a random number.

D.Use a composite primary key with a hash prefix of the transaction ID as the first component, followed by the transaction ID.

AnswerD

The hash prefix evenly distributes writes, and the transaction ID allows efficient point lookups.

Why this answer

Option D is correct because using a hash prefix (e.g., a hash of the transaction ID) as the first component of the primary key distributes writes across splits, while the transaction ID as the second component still allows efficient point lookups, since the prefix can be recomputed from the ID. Option B (UUID) helps distribution but has a larger key size and may fragment reads; Option A (reversed timestamp) can also help but may cause hotspots if timestamps are sequential; Option C (composite with timestamp and random number) still has potential for hotspots.
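A hash-prefixed key of the kind described above can be sketched as follows. The shard count, the choice of SHA-256, and the function names are illustrative assumptions, not something the question specifies.

```python
import hashlib
from typing import Tuple

NUM_SHARDS = 16  # hypothetical shard count; tune to the workload


def shard_for(transaction_id: int) -> int:
    """Derive a stable shard number from the transaction ID."""
    digest = hashlib.sha256(str(transaction_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def primary_key(transaction_id: int) -> Tuple[int, int]:
    """Composite key: (hash prefix, transaction_id)."""
    return (shard_for(transaction_id), transaction_id)


# Sequential IDs land on different prefixes instead of one key range.
keys = [primary_key(t) for t in range(1000, 1064)]
shards_used = {shard for shard, _ in keys}
print(sorted(shards_used))

# A point lookup recomputes the prefix from the ID, so no scan is needed.
lookup_key = primary_key(1042)
```

Because the prefix is a pure function of the transaction ID, a reader that knows only the ID can rebuild the full key.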

46

MCQhard

A game company uses Cloud Bigtable to store player session data. Access patterns include looking up a player's most recent sessions and scanning sessions by time range. Which row key design is most appropriate?

A.Use only player ID as row key with column qualifiers for timestamps.

B.Use a row key of player ID followed by reversed timestamp.

C.Prefix with timestamp and append player ID.

D.Use a hash of player ID as row key and store timestamps in cell versions.

AnswerB

Player ID distributes writes across tablets; reversed timestamp makes recent data appear at the start of the range for efficient scans.

Why this answer

Option B is correct because leading the row key with the player ID spreads writes across players, and the reversed timestamp sorts each player's newest sessions first, which allows efficient range scans over recent data. Option C is wrong because putting the timestamp first can cause hotspotting. Option A is wrong because keeping every session in a single row per player lets that row grow without bound as sessions accumulate.

Option D is wrong because hashing alone makes range scans impossible.
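A row key of the form player ID plus reversed timestamp can be sketched like this. The `#` separator, the millisecond resolution, and the 13-digit upper bound are assumptions chosen for illustration.

```python
# Hypothetical upper bound: the largest 13-digit millisecond timestamp.
MAX_TS_MILLIS = 10**13 - 1


def session_row_key(player_id: str, event_millis: int) -> str:
    """Build 'player#reversed_ts' so newer sessions sort first."""
    reversed_ts = MAX_TS_MILLIS - event_millis
    # Zero-pad so lexicographic order matches numeric order.
    return f"{player_id}#{reversed_ts:013d}"


older = session_row_key("player42", 1_700_000_000_000)
newer = session_row_key("player42", 1_700_000_060_000)

# Row keys sort lexicographically, so the newer session comes first.
print(sorted([older, newer]))
```

A prefix scan on `player42#` then returns that player's sessions newest first, and a time range maps to a contiguous key range.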

47

MCQhard

A retail company uses Cloud Spanner to handle global transaction processing. The database has a single regional instance in us-central1. The company expects a 10x increase in write traffic from a new mobile app. The database engineer needs to design for low latency writes globally and high availability. What should the Database Engineer do?

A.Shard the database across multiple regional instances based on user geography.

B.Create read replicas in other regions to offload read traffic and keep writes in the primary region.

C.Change the instance configuration to a multi-region configuration like nam3 (us-central1, us-east1, us-west1) and configure a dedicated write region.

D.Increase the number of nodes in the existing regional instance to handle the increased write capacity.

AnswerC

Multi-region configurations provide low-latency writes by placing processing close to users and ensure high availability.

Why this answer

Option C is correct because a multi-region configuration such as nam3 replicates data across regions using Google-managed synchronous replication with automatic failover. Writes are committed through the designated leader (write) region and acknowledged once a quorum of replicas has accepted them, so the design provides high availability across regions while the instance's compute capacity is scaled to absorb the 10x write increase.

Exam trap

The trap here is that candidates often confuse scaling nodes (Option D) with geographic distribution, or assume read replicas (Option B) can handle write scaling, when in fact Cloud Spanner requires a multi-region configuration to achieve both global write low latency and high availability.

How to eliminate wrong answers

Option A is wrong because sharding across multiple regional instances would require manual application-level logic and does not leverage Cloud Spanner's built-in distributed transaction support, leading to increased complexity and potential consistency issues. Option B is wrong because read replicas do not offload write traffic; writes must still go to the primary region, which would become a bottleneck under a 10x write increase, and read replicas do not improve write latency or availability for writes. Option D is wrong because increasing nodes in a single regional instance only scales capacity within that region, failing to provide global low-latency writes or multi-region high availability, and does not address geographic distribution requirements.

48

MCQhard

A BI team needs to analyze user behavior with sessionization. Each event has a timestamp and session ID. The table 'sessions' contains columns: session_id, user_id, event_time, event_name. The team wants the first event time per session. Which query is most efficient?

A.SELECT session_id, ARRAY_AGG(event_time ORDER BY event_time LIMIT 1) FROM sessions GROUP BY session_id

B.SELECT a.session_id, a.event_time FROM sessions a INNER JOIN (SELECT session_id, MIN(event_time) min_ts FROM sessions GROUP BY session_id) b ON a.session_id = b.session_id AND a.event_time = b.min_ts

C.SELECT session_id, MIN(event_time) FROM sessions GROUP BY session_id

D.SELECT session_id, event_time FROM sessions QUALIFY ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_time) = 1

AnswerD

QUALIFY filters to the first row per session, efficient with window functions.

Why this answer

Option D is correct because it uses the QUALIFY clause with ROW_NUMBER() to filter on the window function result directly, avoiding a self-join or wrapping subquery. BigQuery supports QUALIFY, so the window function is evaluated once and the result is filtered to the first event per session within the same query.
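The two common formulations can be compared on a tiny data set. This sketch uses SQLite as a local stand-in and assumes a SQLite build with window-function support; SQLite has no QUALIFY clause, so the ROW_NUMBER() filter is written as a wrapping subquery instead. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions "
    "(session_id TEXT, user_id TEXT, event_time INTEGER, event_name TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?)",
    [
        ("s1", "u1", 30, "click"),
        ("s1", "u1", 10, "open"),
        ("s2", "u2", 50, "open"),
        ("s2", "u2", 70, "close"),
    ],
)

# Aggregate form: returns only the earliest time per session.
by_min = conn.execute(
    "SELECT session_id, MIN(event_time) FROM sessions "
    "GROUP BY session_id ORDER BY session_id"
).fetchall()

# Window form: keeps the whole first row, so other columns come along.
by_row_number = conn.execute(
    """
    SELECT session_id, event_time, event_name FROM (
        SELECT session_id, event_time, event_name,
               ROW_NUMBER() OVER (
                   PARTITION BY session_id ORDER BY event_time
               ) AS rn
        FROM sessions
    ) WHERE rn = 1 ORDER BY session_id
    """
).fetchall()

print(by_min)
print(by_row_number)
```

Both forms agree on the first event time; the window form additionally carries the other columns of that first row.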

Exam trap

Google Cloud often tests the misconception that a simple GROUP BY with MIN is always the most efficient, but the trap here is that the exam expects candidates to recognize QUALIFY with ROW_NUMBER() as a more modern and efficient pattern for sessionization, especially when additional per-session calculations are needed.

How to eliminate wrong answers

Option A is wrong because ARRAY_AGG with LIMIT 1 returns an array containing a single element, not a scalar value, and is less efficient than MIN or ROW_NUMBER. Option B is wrong because it performs a self-join on both session_id and event_time, which is redundant and less efficient than a simple GROUP BY or window function; it also requires an exact match on the timestamp, which can fail if there are duplicate timestamps for the same session. Option C is wrong because although it correctly returns the first event time per session, it is not the most efficient option in the context of the PCDE exam, which often tests window functions and QUALIFY as a more modern and flexible approach.

49

MCQeasy

A user runs the above command and expects a row to be returned because the user exists. Which index is missing?

A.Primary key index on Users(Email)

B.Index on Users(Email)

C.Composite index on Users(Email, UserId)

D.No index needed, query scans full table.

AnswerB

An index on Email allows direct lookup by email, returning the row efficiently.

Why this answer

Option B is correct because the query is filtering on the `Email` column, and without an index on `Users(Email)`, the database must perform a full table scan. Even though the user exists, the query may not return a row if the table is large and the optimizer chooses a scan that misses the row due to data distribution or lack of statistics. An index on `Email` allows an index seek, ensuring the row is found efficiently.

Exam trap

The trap here is that candidates assume a primary key index is always present on the lookup column, but the question tests whether you recognize that a non-primary key column needs its own index for efficient filtering, not that the primary key itself is missing.

How to eliminate wrong answers

Option A is wrong because a primary key index on `Users(Email)` would require `Email` to be the primary key, which is not necessarily the case; the primary key might be `UserId`, and adding a primary key on `Email` could change table structure and is not the missing index for a simple lookup. Option C is wrong because a composite index on `Users(Email, UserId)` is overkill; the query only filters on `Email`, so a single-column index on `Email` suffices, and a composite index may be larger and less efficient for this specific query. Option D is wrong because relying on a full table scan is inefficient and does not guarantee a row is returned if the table is large or the query plan uses a scan that skips the row due to concurrency or statistics; an index is needed for reliable, fast access.

50

Multi-Selecthard

A global retail company uses Cloud Spanner to manage product inventory. They need to apply a schema change to add a new column to a table that has 10 billion rows. Which THREE strategies should they consider to minimize downtime?

Select 3 answers

A.Schedule the schema change during a period of low traffic.

B.Disable the table, add the column, then re-enable.

C.Add the column with a NULL default value to avoid backfilling existing rows.

D.Use ALTER TABLE ADD COLUMN with IF NOT EXISTS to avoid errors if the column already exists.

E.Create a new table with the column, then copy data in batches.

AnswersA, C, D

Even though Spanner DDL is online, performing it during low traffic minimizes any potential performance impact.

Why this answer

Option A is correct because scheduling schema changes during low traffic reduces the risk of contention and performance impact on the live database. Cloud Spanner applies schema changes online without locking the entire table, but heavy write traffic can still cause transaction conflicts or increased latency; performing the change during a quiet period minimizes these risks.

Exam trap

Google Cloud often tests the misconception that large tables require data migration or table recreation for schema changes, but Cloud Spanner's online DDL handles column additions without backfilling, making options like B and E unnecessary and counterproductive.

51

MCQeasy

A financial services company runs a MySQL database on Compute Engine. They want to migrate to Cloud SQL for MySQL to reduce operational overhead. The current schema includes a table 'transactions' with a composite primary key on (transaction_id, account_id) and a secondary index on account_id for account lookups. The database also uses foreign key constraints to ensure referential integrity between 'transactions' and 'accounts'. During migration testing, they observe that INSERT operations on 'transactions' are slower than expected. What schema change should they implement to improve INSERT performance in Cloud SQL?

A.Remove the foreign key constraints and enforce referential integrity in the application logic instead.

B.Remove the secondary index on account_id because it adds write overhead.

C.Change the primary key to (account_id, transaction_id) to avoid secondary index overhead.

D.Convert the table to a temporal table with system-versioning to avoid constraint checking.

AnswerA

Foreign key constraints require a lookup on the parent table for every INSERT, causing latency. Removing them reduces write overhead, though integrity must be ensured by the application.

Why this answer

Foreign key constraints in MySQL (including Cloud SQL) require an internal check on every INSERT to verify that the referenced parent key exists. Each check is an index lookup on the parent table that also takes a shared lock on the parent row, adding latency and lock contention to every inserted row. Removing the constraint and moving referential integrity to the application eliminates this per-row check, directly improving INSERT throughput.

Exam trap

Google Cloud often tests the misconception that secondary indexes are the primary cause of write slowdowns, when in reality foreign key constraint checks are far more expensive per row than index maintenance.

How to eliminate wrong answers

Option B is wrong because removing the secondary index on account_id would degrade SELECT performance for account lookups, and the index's write overhead is negligible compared to the cost of foreign key checks. Option C is wrong because changing the primary key order does not eliminate foreign key validation overhead; it only affects index clustering and does not address the root cause of slow INSERTs. Option D is wrong because temporal tables with system-versioning add additional metadata and version-row writes on every INSERT, which would further degrade performance, not improve it.

52

MCQhard

A Cloud SQL for SQL Server instance has been running for months. Recently, the database size grew significantly and now query performance has degraded. The DBA checks the query execution plan and sees index scans. The current storage is 500GB SSD. What is the most likely cause and solution?

A.Increase storage capacity to 1TB SSD

B.Index fragmentation; rebuild or reorganize indexes

C.Enable query insights to analyze performance

D.Enable read replicas to offload queries

AnswerB

Index fragmentation increases scan cost; rebuilding reorganizes data pages.

Why this answer

The correct answer is B. Index fragmentation occurs over time as data is inserted, updated, or deleted, causing indexes to become inefficient. The query execution plan showing index scans (instead of seeks) is a classic symptom of fragmented indexes.

Rebuilding or reorganizing the indexes will defragment them, restoring query performance without requiring additional storage or infrastructure changes.

Exam trap

Google Cloud often tests the misconception that performance degradation from data growth is always a storage capacity issue, leading candidates to choose a storage increase instead of recognizing index fragmentation as the root cause when execution plans show index scans.

How to eliminate wrong answers

Option A is wrong because increasing storage capacity does not address index fragmentation; it only provides more space, which does not improve query performance if the indexes are fragmented. Option C is wrong because enabling query insights helps analyze performance but does not fix the root cause of index scans; it is a diagnostic tool, not a solution. Option D is wrong because read replicas offload read queries but do not resolve index fragmentation on the primary instance; the degraded performance would persist on the primary instance.

53

MCQeasy

A company is migrating their on-premises PostgreSQL database to Cloud SQL. They want to minimize downtime during the migration. Which approach should they use?

A.Export the database using pg_dump and import using psql in a single connection.

B.Use a VPN tunnel and set up Cloud SQL as a read replica of the on-premises primary.

C.Use Database Migration Service (DMS) with continuous replication from an on-premises replica.

D.Use Database Migration Service (DMS) with a one-time full dump and import.

AnswerC

Continuous replication allows near-zero downtime by keeping the target up-to-date until cutover.

Why this answer

Database Migration Service (DMS) with continuous replication minimizes downtime by performing an initial full load of the database and then continuously replicating changes from the on-premises source to Cloud SQL. This allows the target to stay nearly synchronized with the source, so the final cutover can be completed in seconds or minutes rather than hours.

Exam trap

The trap here is that candidates confuse Cloud SQL's read replica feature (which only works within Cloud SQL) with the ability to replicate from an external primary, leading them to choose Option B despite it being technically impossible.

How to eliminate wrong answers

Option A is wrong because using pg_dump and psql in a single connection is a manual, offline migration method that requires the source database to be locked or read-only during the dump, causing significant downtime. Option B is wrong because Cloud SQL does not support being configured as a read replica of an on-premises PostgreSQL primary; Cloud SQL read replicas can only replicate from a Cloud SQL primary, not from external sources. Option D is wrong because a one-time full dump and import (even via DMS) does not include ongoing change data capture, so the target will be out of sync with the source by the time the import completes, requiring additional downtime for a final sync.

54

Multi-Selectmedium

A database administrator is planning to migrate an on-premises MySQL database to Cloud SQL. Which two steps are required to ensure a secure migration?

Select 2 answers

A.Configure Cloud SQL to use a private IP address

B.Add authorized networks for all client IPs

C.Ensure the database is encrypted at rest using CMEK

D.Enable SSL/TLS for all connections

E.Set up Cloud SQL Proxy for secure authentication and encryption

AnswersA, E

Private IP ensures traffic stays within Google Cloud network.

Why this answer

Option A is correct because using a private IP address for Cloud SQL ensures that the database instance is not exposed to the public internet, reducing the attack surface. This is a fundamental security best practice for database migrations, as it restricts network access to within a Virtual Private Cloud (VPC) and requires traffic to traverse Google's internal network, which is more secure than public IP routing.

Exam trap

Google Cloud often tests the distinction between 'best practice' and 'required step' — candidates may select SSL/TLS (Option D) as a required step, but the exam expects understanding that Cloud SQL Proxy inherently provides encryption and authentication, making separate SSL/TLS configuration redundant for the migration scenario.

55

MCQmedium

Refer to the exhibit. What is the likely cause of this error?

A.The table is a view

B.The query does not include WHERE clause with partition column

C.The table is not partitioned

D.The user does not have permission to query the table

AnswerB

The error states no filter over the partition column, meaning the query tries to scan all partitions, which is blocked by a query optimizer or cost control.

Why this answer

The error occurs because the query attempts to access a partitioned table without filtering on the partition column in the WHERE clause. In BigQuery, a partitioned table can be created with the `require_partition_filter` option, in which case queries that lack a filter on the partition column are rejected rather than being allowed to scan every partition. The correct approach is to include the partition column in the WHERE clause to enable partition pruning.

Exam trap

Google Cloud often tests the misconception that any table can be queried without a WHERE clause, but for partitioned tables, the partition column must be included in the WHERE clause to avoid full partition scans and associated errors.

How to eliminate wrong answers

Option A is wrong because a view would not cause this specific error; views can be queried without a WHERE clause, and the error message would differ (e.g., 'invalid object' or 'view does not exist'). Option C is wrong because if the table were not partitioned, there would be no partition-related error; the error specifically indicates a partition-related issue. Option D is wrong because permission errors typically produce 'insufficient privileges' or 'access denied' messages, not the error shown in the exhibit.

56

Multi-Selecteasy

Which TWO actions would help optimize a Cloud SQL for PostgreSQL database experiencing high read latency?

Select 2 answers

A.Increase the number of read replicas

B.Add indexes on frequently queried columns

C.Increase database tier machine type

D.Configure automatic storage increase

E.Use pgBouncer connection pooling

AnswersB, C

Indexes speed up data retrieval by reducing full table scans.

Why this answer

Adding indexes on frequently queried columns (B) reduces full table scans, and increasing the database tier machine type (C) provides more CPU and memory for query processing. Read replicas (A) distribute load but do not reduce individual query latency; connection pooling with pgBouncer (E) helps connection management, not read latency; automatic storage increase (D) is irrelevant.

57

MCQhard

A Cloud Spanner database needs to add a column 'discount' to the 'Products' table without any downtime. The table is actively used. What is the correct approach?

A.Create a new table with the column and copy data over

B.Execute ALTER TABLE Products ADD COLUMN discount FLOAT64

C.Create a secondary index that includes the new column

D.Define a generated column based on an existing column

AnswerB

Spanner allows DDL changes while the table remains fully available.

Why this answer

Option B is correct: Spanner supports online schema updates via ALTER TABLE ADD COLUMN, which does not block reads or writes. Option A (new table and copy) would require downtime or at least double-write logic. Option C (secondary index) is unrelated.

Option D (generated column) could be used but is unnecessary.

58

Multi-Selectmedium

A company is experiencing slow query performance in Cloud SQL for PostgreSQL. Which TWO tools can help identify the root cause?

Select 2 answers

A.Cloud Monitoring

B.Query Insights

C.Cloud Logging with error reporting

D.Cloud Profiler

E.Cloud Trace

AnswersA, B

Cloud Monitoring shows instance-level resource metrics that can indicate bottlenecks.

Why this answer

Cloud Monitoring provides metrics and dashboards to track database performance indicators like CPU utilization, memory usage, disk I/O, and query latency, helping identify resource bottlenecks. Query Insights offers detailed query-level diagnostics, including execution plans, lock contention, and slow query analysis, directly pinpointing problematic SQL statements in Cloud SQL for PostgreSQL.

Exam trap

The trap here is that candidates often confuse Cloud Logging’s error reporting with performance diagnostics, or assume Cloud Profiler and Cloud Trace can analyze database internals, when in fact they are application-layer tools not designed for PostgreSQL query tuning.

59

MCQmedium

A company uses BigQuery with a table 'orders' that has a column 'items' of type ARRAY<STRUCT<product_id STRING, quantity INT64>>. An analyst needs to find orders that contain a specific product, 'ABC'. Which query is most efficient?

A.SELECT * FROM orders WHERE EXISTS (SELECT 1 FROM UNNEST(items) WHERE product_id = 'ABC')

B.SELECT * FROM orders WHERE ARRAY_LENGTH(items) > 0

C.SELECT * FROM orders WHERE 'ABC' IN UNNEST(items)

D.SELECT o.*, item FROM orders o, UNNEST(items) item WHERE item.product_id = 'ABC'

AnswerA

EXISTS with UNNEST is the standard pattern for array membership.

Why this answer

Option A is correct because it uses a correlated subquery with `UNNEST` and `EXISTS`, which stops scanning as soon as a matching product_id is found within each row's array. This is the most efficient pattern for checking array membership in BigQuery, as it avoids unnecessary row multiplication and leverages short-circuit evaluation.
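The difference between the EXISTS pattern and a cross join over the array can be mimicked in plain Python. This is an analogy rather than BigQuery's execution model, and the sample orders are invented.

```python
orders = [
    {
        "order_id": 1,
        "items": [
            {"product_id": "ABC", "quantity": 1},
            {"product_id": "ABC", "quantity": 2},
        ],
    },
    {"order_id": 2, "items": [{"product_id": "XYZ", "quantity": 5}]},
    {"order_id": 3, "items": []},
]

# EXISTS-style: at most one result per order; any() stops at the first match.
exists_style = [
    o["order_id"]
    for o in orders
    if any(item["product_id"] == "ABC" for item in o["items"])
]

# Cross-join-style: one result per matching array element.
join_style = [
    o["order_id"]
    for o in orders
    for item in o["items"]
    if item["product_id"] == "ABC"
]

print(exists_style)
print(join_style)
```

Order 1 contains 'ABC' twice, so the join-style result lists it twice, while the EXISTS-style result lists it once.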

Exam trap

Google Cloud often tests the misconception that `IN UNNEST` works directly with struct arrays, when in fact it requires a scalar field extraction, and that `CROSS JOIN UNNEST` is always the correct way to filter array contents, ignoring the performance penalty of row multiplication.

How to eliminate wrong answers

Option B is wrong because `ARRAY_LENGTH(items) > 0` only checks if the array is non-empty, not whether it contains the specific product 'ABC'. Option C is wrong because `'ABC' IN UNNEST(items)` is invalid syntax; `IN` with `UNNEST` requires a scalar comparison, but `items` is an array of structs, not scalars, so this will cause a type mismatch error. Option D is wrong because the implicit `CROSS JOIN` with `UNNEST` multiplies rows for each array element, which is inefficient for large tables and requires a `DISTINCT` or `SELECT o.*` with deduplication to avoid duplicate order rows, making it slower and more resource-intensive than the `EXISTS` approach.

60

MCQhard

Refer to the exhibit. The query used DATE_TRUNC(order_date, MONTH) as month. order_date is a TIMESTAMP column. What is the data type of the month column in the result?

A.STRING

B.DATE

C.DATETIME

D.TIMESTAMP

AnswerD

DATE_TRUNC of a TIMESTAMP returns a TIMESTAMP with time set to 00:00:00.

Why this answer

In BigQuery (the SQL engine for the PCDE exam), DATE_TRUNC with a TIMESTAMP input and MONTH granularity returns a TIMESTAMP value, not a DATE or DATETIME. The function truncates the timestamp to the first day of the month at 00:00:00 UTC, preserving the TIMESTAMP data type. Therefore, the month column in the result is of type TIMESTAMP.

Exam trap

The trap here is that candidates often assume DATE_TRUNC returns a DATE because of the word 'DATE' in the function name, but in BigQuery the output type matches the input type, so a TIMESTAMP input yields a TIMESTAMP output.

How to eliminate wrong answers

Option A is wrong because DATE_TRUNC does not return a STRING; it returns a temporal type, not a text representation. Option B is wrong because DATE_TRUNC on a TIMESTAMP column returns a TIMESTAMP, not a DATE; a DATE would lack the time component entirely. Option C is wrong because DATETIME is a different type that does not include timezone context, whereas BigQuery's DATE_TRUNC on a TIMESTAMP preserves the TIMESTAMP type with timezone awareness.

61

MCQhard

Your company runs an e-commerce platform on Google Cloud. The platform uses Cloud SQL for MySQL to store product inventory. The inventory table has the following schema: CREATE TABLE inventory (product_id INT PRIMARY KEY, quantity INT, last_updated TIMESTAMP) ENGINE=InnoDB. The application performs frequent updates on quantity for a subset of popular products. Recently, you have noticed increased deadlock errors during peak hours. The application uses REPEATABLE READ isolation level. You suspect that the schema design is contributing to locking contention. After analyzing the workload, you find that the updates often involve incrementing or decrementing quantity by small amounts and are mostly on the same set of popular products. What would be the best course of action to reduce deadlocks without compromising data integrity?

A.Rewrite the update query to use atomic operations (e.g., UPDATE inventory SET quantity = quantity - ? WHERE product_id = ?) without pre-fetching the current value.

B.Change the engine to MyISAM to avoid row-level locking.

C.Partition the inventory table by product_id range to spread the load.

D.Reduce the isolation level to READ COMMITTED to reduce locking.

AnswerA

Atomic updates avoid the need for SELECT ... FOR UPDATE and significantly reduce locking and deadlock chances.

Why this answer

Option A is correct because a read-before-write pattern (for example, SELECT ... FOR UPDATE followed by a separate UPDATE) holds row locks across several statements, so concurrent transactions that touch the same popular products in different orders can deadlock. A single atomic UPDATE that computes the new quantity in place locks each row once and for a short time, which reduces locking and deadlock chances without giving up correctness. Option B is wrong because changing to MyISAM is not supported and also loses transactional integrity; MyISAM replaces row-level locking with table-level locking. Option C is wrong because partitioning by product_id range does not remove contention on the same hot rows.

Option D is wrong because reducing isolation to READ COMMITTED may reduce locking but could cause non-repeatable reads; it is a viable option, but not the best one. The best solution is to adjust the SQL statement to avoid the read-before-write pattern and rely on atomic operations.
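The atomic-update pattern can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for Cloud SQL for MySQL; the `decrement_stock` helper and the `quantity >= ?` guard against overselling are assumptions added for the example, not part of the question.

```python
import sqlite3

def decrement_stock(conn, product_id, amount):
    """Decrement quantity in one statement; return True if a row was updated."""
    cur = conn.execute(
        # No prior SELECT: the database computes the new value in place.
        "UPDATE inventory SET quantity = quantity - ? "
        "WHERE product_id = ? AND quantity >= ?",
        (amount, product_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "product_id INTEGER PRIMARY KEY, quantity INTEGER, last_updated TEXT)"
)
conn.execute("INSERT INTO inventory VALUES (1, 5, NULL)")
conn.commit()

print(decrement_stock(conn, 1, 3))  # stock is sufficient
print(decrement_stock(conn, 1, 3))  # only 2 units remain, so no row matches
```

Because the check and the write happen in the same statement, the application never holds a lock while it waits on a round trip between a read and a write.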

62

MCQmedium

You are managing a Spanner instance for a global financial application. The database has a table `transactions` with columns `transaction_id` (INT64), `user_id` (INT64), `amount` (FLOAT64), `timestamp` (TIMESTAMP), and `region` (STRING). The table is interleaved with a parent table `users`. Recently, you observed that point-read queries by `transaction_id` are taking over 100ms on average, whereas they used to take under 10ms. The instance CPU utilization is below 40%, and there are no contention issues. The `transactions` table has a primary key `(user_id, transaction_id)`. Queries filter on `transaction_id` only, without specifying `user_id`. Which optimization should you implement to improve point-read latency?

A.Add a secondary index on `user_id` to help narrow down the search.

B.Create a secondary index on `transaction_id` to enable efficient key-based lookups.

C.Use a Spanner query hint to force a specific index scan.

D.Change the primary key to `(transaction_id, user_id)` to enable direct access by transaction_id.

AnswerB

A secondary index on `transaction_id` provides a direct lookup path, reducing latency.

Why this answer

Point-read queries by `transaction_id` are slow because the primary key is `(user_id, transaction_id)`; without `user_id`, Spanner cannot locate the split that holds the row and must fall back to a full table scan. A secondary index on `transaction_id` gives Spanner a key-based lookup path: it finds the index entry first and then reads the matching row, which restores low-latency point reads.
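The effect of the missing index can be observed in any engine that orders rows by a composite primary key. The sketch below uses SQLite through Python's sqlite3 module purely as an analogy; it is not Spanner, and the index name `transactions_by_id` and the `plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " user_id INTEGER, transaction_id INTEGER, amount REAL,"
    " PRIMARY KEY (user_id, transaction_id))"
)

def plan(sql):
    """Return the query plan text that SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT amount FROM transactions WHERE transaction_id = 42"

# The leading key column (user_id) is not constrained, so the table is scanned.
before = plan(query)

# With an index whose leading column is transaction_id, a keyed search is possible.
conn.execute("CREATE INDEX transactions_by_id ON transactions (transaction_id)")
after = plan(query)

print(before)
print(after)
```

The first plan reports a scan and the second reports a search that uses the new index, which mirrors the change the correct option makes in Spanner.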

Exam trap

Google Cloud often tests the misconception that changing the primary key is the only way to optimize queries that don't use the full primary key, but in Spanner, secondary indexes are the correct and efficient solution without disrupting existing interleaved table relationships.

How to eliminate wrong answers

Option A is wrong because adding a secondary index on `user_id` does not help queries that filter only on `transaction_id`; it would only be useful if queries filtered on `user_id` alone or in combination with `transaction_id`. Option C is wrong because a query hint to force a specific index scan is unnecessary and ineffective if no suitable index exists; the hint cannot create an index that doesn't exist, and without an index on `transaction_id`, Spanner would still perform a full scan. Option D is wrong because changing the primary key to `(transaction_id, user_id)` would require a costly schema change and data migration, and it would break the interleaved table structure with the parent `users` table, which expects `user_id` as the first part of the primary key for interleaving.

63

Multi-Selectmedium

You are monitoring a Cloud Spanner instance that is experiencing high CPU utilization (consistently above 70%). You want to identify the root cause. Which TWO metrics should you examine first? (Choose two.)

Select 2 answers

A.Average commit latency

B.Read and write throughput (operations/second)

C.Lock wait time

D.Stale read rate

E.Number of nodes

AnswersA, B

High commit latency can indicate contention, increasing CPU.

Why this answer

Examining read and write throughput helps identify if the workload is pushing the instance. Analyzing commit latency and lock wait time reveals contention. Stale reads show replica lag but are not primary indicators of high CPU.

Node count is configuration, not utilization.

64

MCQeasy

A company uses BigQuery to generate daily sales reports. The query aggregates sales by product category and region. The table 'sales_raw' is 500 GB and is updated every hour with new transactions. The report runs slowly. What is the most cost-effective method to improve query performance without changing the existing table schema?

A.Partition the table by product category

B.Create a separate summary table using scheduled queries

C.Create a materialized view that aggregates sales by product category and region

D.Cluster the table by region

AnswerC

Materialized views automatically maintain pre-computed aggregates, significantly reducing query cost and latency.

Why this answer

Option C is correct because a materialized view in BigQuery pre-computes and stores the aggregated results of the query, allowing subsequent queries to read the pre-aggregated data instead of scanning the entire 500 GB 'sales_raw' table. This reduces both the data scanned and the query execution time, and it is automatically refreshed when the base table is updated (every hour), making it cost-effective as you only pay for the bytes used by the materialized view and the incremental refreshes, not for full table scans.
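The difference between rescanning the base table and maintaining a pre-aggregated result can be sketched in plain Python. This is a conceptual toy, not BigQuery's implementation; the `SalesSummary` class, the `aggregate` function, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def aggregate(rows):
    """Full scan: total sales per (category, region) over every row."""
    totals = defaultdict(int)
    for category, region, amount in rows:
        totals[(category, region)] += amount
    return dict(totals)

class SalesSummary:
    """Toy stand-in for a stored aggregate that is refreshed incrementally."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.rows_processed = 0

    def refresh(self, new_rows):
        """Fold only the newly arrived rows into the stored totals."""
        for category, region, amount in new_rows:
            self.totals[(category, region)] += amount
            self.rows_processed += 1

sales_raw = [("toys", "EU", 10), ("toys", "US", 5), ("books", "EU", 7)]
summary = SalesSummary()
summary.refresh(sales_raw)

# An hourly batch arrives: the summary touches 2 rows, a full scan touches 5.
hourly_batch = [("toys", "EU", 3), ("books", "US", 4)]
sales_raw.extend(hourly_batch)
summary.refresh(hourly_batch)

print(dict(summary.totals) == aggregate(sales_raw))
```

Reports read the small stored totals, and each refresh costs work proportional to the new rows rather than to the whole table.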

Exam trap

Google Cloud often tests the distinction between partitioning/clustering (which optimize data scanning but do not pre-compute results) and materialized views (which store pre-computed results), leading candidates to choose partitioning or clustering as a 'quick fix' without realizing they do not eliminate the need for full aggregation scans.

How to eliminate wrong answers

Option A is wrong because partitioning by product category is not supported in BigQuery (partitioning is based on date, timestamp, or integer range, not on string columns like product category), and even if it were, partitioning alone does not pre-aggregate data, so the query would still need to scan all partitions to compute the aggregation. Option B is wrong because creating a separate summary table using scheduled queries introduces additional complexity and cost for manual refresh scheduling, and it does not provide automatic incremental updates like a materialized view, leading to potential data staleness and extra storage costs for the duplicate table. Option D is wrong because clustering the table by region only improves the performance of queries that filter or sort by region, but it does not pre-compute the aggregation; the query would still scan all rows in the clustered blocks to perform the GROUP BY, so it does not reduce the data scanned for the aggregation itself.

65

MCQhard

A company uses Cloud Bigtable for time-series data from IoT devices. Each device sends a reading every second. The row key is device_id#timestamp (reverse timestamp). The team reports that queries for a specific device's data over the last hour are fast, but queries for all devices' data over the last minute are very slow. What is the most likely cause?

A.The Bigtable cluster does not have enough nodes to handle the scan.

B.The query is scanning multiple column families.

C.The row key design does not allow efficient scanning for all devices because device_id is the prefix.

D.The table has too many tablets, causing high overhead.

AnswerC

Prefix scans on device_id are efficient per device, but scanning all devices requires a full table scan.

Why this answer

Option C is correct because the row key design uses device_id as the prefix, which means all data for a given device is stored in contiguous rows, making per-device scans efficient. However, a query for all devices over the last minute cannot be expressed as a single row range, because the timestamp is the suffix of the key rather than its prefix. Bigtable sorts rows only by row key, so it must either scan the whole table or issue a separate range scan for every device, and both are slow.
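The asymmetry between the two queries can be sketched with a sorted list of row keys. This is a plain-Python model of lexicographically sorted storage, not the Bigtable client API; the key format, `MAX_TS`, and the helper names are assumptions made for the illustration.

```python
from bisect import bisect_left

MAX_TS = 10**10  # hypothetical ceiling used to build the reverse timestamp

def row_key(device, ts):
    """Build a key of the form device_id#reverse_timestamp."""
    return f"device{device:03d}#{MAX_TS - ts:010d}"

# 100 devices, 60 one-second readings each, stored in sorted key order.
keys = sorted(row_key(d, ts) for d in range(100) for ts in range(1000, 1060))

def prefix_scan(keys, prefix):
    """One device: jump to the prefix and read a contiguous run of rows."""
    start = bisect_left(keys, prefix)
    end = bisect_left(keys, prefix + "\xff")
    return keys[start:end]

def recent_scan(keys, min_ts):
    """All devices since min_ts: every key has to be examined."""
    examined, hits = 0, []
    for key in keys:
        examined += 1
        reverse_ts = int(key.split("#")[1])
        if MAX_TS - reverse_ts >= min_ts:
            hits.append(key)
    return hits, examined

one_device = prefix_scan(keys, "device007#")
recent, examined = recent_scan(keys, 1050)
print(len(one_device), len(recent), examined)
```

The per-device query reads only its own contiguous rows, while the time-window query examines every row in the table to find the matches.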

Exam trap

Google Cloud often tests the misconception that adding more nodes or tablets fixes scan performance, but the real issue is row key design that prevents Bigtable from using its sorted storage to limit the scan range.

How to eliminate wrong answers

Option A is wrong because insufficient nodes would cause general performance degradation across all queries, not specifically slow down the all-devices query while keeping the per-device query fast. Option B is wrong because scanning multiple column families adds overhead only if the query retrieves data from many families, but the problem statement does not mention column families, and the slowness is tied to the row key design, not column family access. Option D is wrong because too many tablets can cause high overhead for any scan, but the per-device query would also be affected; the asymmetry between fast per-device and slow all-devices queries points directly to row key ordering, not tablet count.

66

MCQmedium

A company is migrating a large Oracle database to Cloud Spanner. The source database uses sequences for primary key generation. The database engineer needs to design the Cloud Spanner schema to avoid hotspotting. What primary key design should they recommend?

A.Keep the same sequence-based integer keys.

B.Use a composite primary key with a timestamp prefix.

C.Use a hash of the original key as a prefix to the primary key.

D.Use UUIDs as the primary key without modification.

AnswerC

A hash prefix distributes the write load evenly across splits, avoiding hotspots.

Why this answer

Option C is correct because using a hash of the original key as a prefix to the primary key distributes writes evenly across Cloud Spanner's splits, preventing hotspotting. Cloud Spanner stores rows in primary-key order and divides them into splits by key range, so sequential keys (such as values from Oracle sequences) cause all new writes to land on the last split, creating a hotspot. A hash prefix scatters consecutive values across the key space while keeping the original key in the row, so a lookup can recompute the prefix from the key.
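A hash-prefixed key can be sketched as follows. This is a minimal illustration in plain Python; the shard count, the choice of SHA-256, and the `shard_for` and `spanner_key` helpers are assumptions, not a prescribed Spanner API.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical number of key-space buckets

def shard_for(seq_id):
    """Derive a stable shard number from a hash of the original sequence value."""
    digest = hashlib.sha256(str(seq_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def spanner_key(seq_id):
    """Composite primary key: (hash-derived shard, original sequence value)."""
    return (shard_for(seq_id), seq_id)

# Sequential ids, as an Oracle sequence would generate them.
keys = [spanner_key(i) for i in range(1, 10001)]
per_shard = Counter(shard for shard, _ in keys)

print(sorted(per_shard.items()))
```

Because the prefix is a deterministic function of the original id, a reader that knows the id can rebuild the full key, while consecutive inserts land in different parts of the key space.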

Exam trap

Google Cloud often tests the misconception that UUIDs or timestamps inherently solve hotspotting, but the trap here is that only a hash prefix (or similar distribution mechanism) guarantees even write distribution across Cloud Spanner's splits.

How to eliminate wrong answers

Option A is wrong because keeping the same sequence-based integer keys will cause all new inserts to target the same split in Cloud Spanner, leading to severe hotspotting and degraded write performance. Option B is wrong because a timestamp prefix does not distribute writes; if timestamps are monotonically increasing (e.g., insertion time), writes still concentrate on the latest split. Option D is not the best fit here: random (version 4) UUIDs do spread writes, but replacing the keys outright discards the original sequence values that the migrated data references, whereas a hash prefix keeps the original key and still distributes the load.

67

MCQmedium

Your Cloud Spanner instance has several tables with interleaved parent-child relationships. You notice that queries that join parent and child tables are slow. What is the best practice to optimize these joins?

A.Ensure the tables are defined as interleaved with the parent key as the first part of the child primary key

B.Create secondary indexes on the join columns

C.Use batch update operations to reduce round trips

D.Remove interleaving and use a separate JOIN statement

AnswerA

Interleaving enables efficient distributed joins without cross-node communication.

Why this answer

Option A is correct because Cloud Spanner optimizes interleaved table joins by physically co-locating parent and child rows on the same split, based on the parent key as the prefix of the child's primary key. This eliminates the need for distributed cross-split joins, dramatically reducing latency. Queries that join on the interleaved key benefit from local data access, making them fast and efficient.
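The co-location that interleaving provides follows from key ordering, which can be sketched in a few lines of Python. This models only the sort order of keys, not Spanner itself; the sample keys and the `rows_for_parent` helper are hypothetical.

```python
# Hypothetical keys: parent rows are (parent_id,), child rows are (parent_id, child_id).
parents = [(1,), (2,), (3,)]
children = [(1, 10), (1, 11), (2, 20), (3, 30), (3, 31)]

# Interleaved layout: a single key-ordered space, so each parent row is
# immediately followed by the child rows that share its key prefix.
interleaved = sorted(parents + children)

def rows_for_parent(layout, parent_id):
    """Return the positions of every row whose key starts with parent_id."""
    return [i for i, key in enumerate(layout) if key[0] == parent_id]

positions = rows_for_parent(interleaved, 3)
print(interleaved)
print(positions)
```

A join on the parent key therefore reads one contiguous run of rows, which is why the child primary key must begin with the parent key.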

Exam trap

The trap here is that candidates often assume secondary indexes are the universal solution for join performance, but in Cloud Spanner, physical data co-location via interleaving is the critical optimization for parent-child joins, not indexing alone.

How to eliminate wrong answers

Option B is wrong because secondary indexes on join columns do not change the physical co-location of parent and child rows; they only provide an alternative access path, and queries may still require distributed joins across splits, which is the root cause of slowness. Option C is wrong because batch update operations reduce round trips for writes, not for read-heavy join queries; they do not address the physical data layout needed for efficient joins. Option D is wrong because removing interleaving would break the physical co-location guarantee, forcing Spanner to perform distributed cross-split joins, which would make queries even slower, not faster.

68

MCQmedium

A company uses BigQuery for BI reporting. They have a large table 'events' with nested and repeated fields (ARRAY<STRUCT>). Analysts often query unnested data, which is slow. What is the best practice to improve query performance without changing the source schema?

A.Create a view that unnests the data

B.Redesign the table to be flat

C.Use a subquery with UNNEST and cache the results

D.Create a materialized view that flattens the nested data

AnswerD

Materialized views are persisted and automatically refreshed, reducing query time.

Why this answer

Option D is correct because a materialized view in BigQuery can precompute and store the results of an UNNEST operation on nested fields, significantly reducing query time for repeated flattening queries. Unlike a regular view, a materialized view persists the flattened data and is automatically refreshed, so analysts query pre-joined, pre-flattened results without altering the source schema. This directly addresses the performance issue while preserving the original nested structure for other use cases.

Exam trap

Google Cloud often tests the distinction between a view (which is just a saved query) and a materialized view (which physically stores results), leading candidates to mistakenly choose the view option as a quick fix without considering performance implications.

How to eliminate wrong answers

Option A is wrong because a view only stores the SQL query definition, not the results; each query against the view still executes the UNNEST operation at runtime, providing no performance improvement. Option B is wrong because it violates the requirement to not change the source schema, and redesigning the table to be flat would require altering the ingestion pipeline and breaking existing queries that rely on nested fields. Option C is wrong because subqueries with UNNEST and caching are not natively supported in BigQuery; caching applies only to the final query result, not intermediate subquery results, and manual caching via temporary tables is not a best practice for ongoing analyst queries.

69

MCQeasy

A BigQuery job is running slower than expected. Checking the job information, you see that the slot usage is at 100% for the entire duration of the query. You are using on-demand pricing. What is the most effective way to improve query performance?

A.Create materialized views for common aggregations.

B.Purchase a slot reservation and assign the project to it.

C.Cluster the tables on frequently filtered columns.

D.Partition the tables by date.

AnswerB

Reservations provide dedicated slots, allowing queries to use more resources and run faster.

Why this answer

With on-demand pricing, your query is limited to the default per-project slot capacity (typically 2,000 slots in BigQuery). If slot usage is at 100% for the entire duration, the query is resource-constrained and cannot be sped up without additional slots. Purchasing a slot reservation and assigning the project to it provides dedicated slots, eliminating the contention and allowing the query to run faster.

Exam trap

Google Cloud often tests the misconception that performance issues are always solved by data organization techniques (partitioning/clustering) or precomputation (materialized views), when in fact the bottleneck is compute capacity (slots) under on-demand pricing.

How to eliminate wrong answers

Option A is wrong because materialized views reduce the amount of data scanned and recomputation for repeated aggregations, but they do not increase the available slot capacity; if the query is already hitting 100% slot usage, the bottleneck is compute resources, not data volume. Option C is wrong because clustering improves data pruning and scan efficiency for filtered queries, but it does not add more slots; the query will still be throttled by the fixed slot pool. Option D is wrong because partitioning reduces the amount of data read by date range filters, but like clustering, it does not address the root cause of slot exhaustion; the query will still run at the same slot limit.

70

Multi-Selectmedium

A company is migrating its on-premises PostgreSQL database to Cloud SQL for PostgreSQL. They want to minimize downtime during the migration. Which TWO actions should they take?

Select 2 answers

A.Increase the disk size of the Cloud SQL instance before migration to improve performance.

B.Use pg_dump to export the database and pg_restore to import into Cloud SQL.

C.Set up a Cloud SQL read replica and promote it to the primary after migration.

D.Decrease max_connections to reduce load during migration.

E.Use Database Migration Service (DMS) with continuous replication from the source.

AnswersC, E

Using a read replica allows the source to remain online during replication, and promoting the replica minimizes cutover downtime.

Why this answer

Option C is correct because setting up a Cloud SQL read replica from the source PostgreSQL database and then promoting it to primary allows for a controlled cutover with minimal downtime. The replica stays in sync with the source using PostgreSQL's native streaming replication, and promotion is a fast metadata operation that typically takes seconds, not minutes or hours.

Exam trap

Google Cloud often tests the distinction between logical backup tools (pg_dump/pg_restore) which cause downtime, and continuous replication methods (DMS or read replicas) which minimize it, leading candidates to incorrectly choose the familiar dump-and-restore approach.

71

MCQmedium

What is the most likely cause of the high execution time?

A.The WHERE clause uses a string comparison

B.Missing index on status column

C.The query selects two columns causing inefficiency

D.The instance has insufficient nodes

E.The query does not filter on primary key

AnswerB

An index on status would allow efficient row retrieval instead of a full table scan.

Why this answer

The execution plan shows a full table scan. Since the query filters on status, an index on the status column would avoid the scan and reduce execution time. Options A, C, D, and E are not the primary cause.

72

MCQmedium

A data analyst is running a BigQuery query that joins multiple tables to generate a BI report. The query is slow and uses many LEFT JOINs. What is the best approach to improve performance without changing the business logic?

A.Denormalize the data using nested repeated fields to avoid joins

B.Add indexes on the join columns

C.Replace LEFT JOINs with INNER JOINs where possible

D.Increase the number of BigQuery slots

AnswerA

Using nested repeated fields reduces joins and improves query performance by storing related data together.

Why this answer

Denormalizing data using nested repeated fields in BigQuery reduces the number of JOIN operations, which are expensive in a distributed, columnar storage system. By storing related data in a single table with REPEATED fields, the query avoids shuffling large datasets across slots, directly improving performance while preserving the original business logic.

Exam trap

Google Cloud often tests the misconception that traditional database optimization techniques like indexing or increasing resources apply to BigQuery, when in fact the correct approach is to leverage BigQuery's native schema design features like nested and repeated fields.

How to eliminate wrong answers

Option B is wrong because BigQuery does not support traditional indexes; it uses columnar storage and clustering/partitioning for performance, so adding indexes is not applicable. Option C is wrong because replacing LEFT JOINs with INNER JOINs changes the business logic by excluding rows that do not have matching records in the joined table, which may alter the BI report results. Option D is wrong because increasing the number of BigQuery slots only addresses resource contention, not the root cause of slow JOINs; it is a costly workaround that does not optimize the query structure.

73

Multi-Selectmedium

Which THREE actions can reduce read latency for a globally distributed Cloud Spanner database?

Select 3 answers

A.Use read-only replicas (follower reads)

B.Use multi-region configuration

C.Use leader reads

D.Use interleaved tables

E.Use secondary indexes

AnswersA, B, E

Follower reads allow reads from nearby replicas instead of the leader, reducing latency.

Why this answer

Option A is correct because read-only replicas (follower reads) allow Cloud Spanner to serve read requests from non-leader replicas, reducing the distance data must travel and thus lowering read latency for globally distributed users. This is particularly effective when strong consistency is not required, as follower reads can return data that is up to a few seconds stale but much faster to access from a nearby replica.

Exam trap

Google Cloud often tests the misconception that leader reads are always optimal for latency, but the trap here is that leader reads actually increase latency for distant users because they force all reads to the single leader region, while follower reads distribute read traffic to the nearest replica.

74

MCQmedium

An e-commerce platform uses Cloud SQL for PostgreSQL. They need to run complex reporting queries that join several tables. These queries are slowing down the transactional workload. What should they do?

A.Create materialized views for common reports.

B.Change all joins to use subqueries.

C.Increase the number of vCPUs on the primary instance.

D.Use read replicas to offload reporting queries.

AnswerD

Read replicas serve read-only traffic without impacting the primary.

Why this answer

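Option D is correct because read replicas can offload read-only reporting queries, protecting the primary's performance. Option A is wrong because materialized views are still built and refreshed on the primary.

Option B is wrong because rewriting joins as subqueries may not reduce load. Option C is wrong because scaling up the primary is more expensive and does not isolate the reporting workload from the transactional one.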

75

MCQeasy

A company runs a production Cloud SQL for PostgreSQL instance used by a web application. The instance experiences intermittent latency spikes during peak hours. You need to diagnose the cause without downtime. Which tool should you use first?

A.Use Database Migration Service to failover to a read replica.

B.Use Cloud SQL Insights to analyze query performance and identify slow queries.

C.Use gcloud sql instances describe to check instance configuration.

D.Use VPC Flow Logs to analyze network traffic.

AnswerB

Cloud SQL Insights provides query-level performance diagnostics without downtime.

Why this answer

Cloud SQL Insights provides built-in query performance monitoring and diagnostics without requiring any downtime. It surfaces slow queries, lock contention, and resource bottlenecks directly from the PostgreSQL engine, making it the ideal first step to identify the root cause of intermittent latency spikes during peak hours.

Exam trap

The trap here is that candidates may confuse Cloud SQL Insights with a general monitoring tool like VPC Flow Logs or assume that failing over to a read replica is a diagnostic step, when in fact Insights is the only option that provides database-internal performance data without downtime.

How to eliminate wrong answers

Option A is wrong because Database Migration Service is used for migrating databases to Cloud SQL, not for failover; failing over to a read replica would cause downtime during the promotion process and does not diagnose the latency issue. Option C is wrong because gcloud sql instances describe only returns static configuration metadata (e.g., machine type, region, maintenance window) and does not provide real-time query performance or latency diagnostics. Option D is wrong because VPC Flow Logs capture network-level metadata (source/destination IP, ports, packet count) and cannot reveal slow SQL queries, lock waits, or database engine internals.
