PCDE Exam Questions and Answers

A company runs a production Cloud SQL for PostgreSQL instance used by a web application. The instance experiences intermittent latency spikes during peak hours. You need to diagnose the cause without downtime. Which tool should you use first?

Use Database Migration Service to failover to a read replica.

Use Cloud SQL Insights to analyze query performance and identify slow queries.

Cloud SQL Insights provides query-level performance diagnostics without downtime.

Use gcloud sql instances describe to check instance configuration.

Use VPC Flow Logs to analyze network traffic.

Why: Cloud SQL Insights provides built-in query performance monitoring and diagnostics without requiring any downtime. It surfaces slow queries, lock contention, and resource bottlenecks directly from the PostgreSQL engine, making it the ideal first step to identify the root cause of intermittent latency spikes during peak hours.

A company is migrating an on-premises Oracle database to Cloud SQL for PostgreSQL. The database is 2 TB in size and the network bandwidth to Google Cloud is limited to 500 Mbps. The migration window is 48 hours. Which migration strategy should the Database Engineer recommend?

Create a VPN tunnel and use pg_dump/pg_restore over the network.

Use Database Migration Service with continuous replication.

Export the database to flat files, compress, upload to Cloud Storage, then import to Cloud SQL.

File-based migration with compression can work within the bandwidth and time constraints.

Request a dedicated interconnect and then migrate.

Why: Option C is correct because the 2 TB database size and 500 Mbps bandwidth yield a theoretical transfer time of approximately 9.5 hours (2 TB * 1024 GB/TB * 8 bits/byte / 500 Mbps / 3600 seconds/hour), which fits within the 48-hour window. However, pg_dump/pg_restore over a VPN (Option A) would be slower due to TCP overhead and latency, and Database Migration Service with continuous replication (Option B) requires ongoing connectivity and may not complete the initial load within the window. Exporting to flat files, compressing them (e.g., with gzip), uploading to Cloud Storage, and then importing to Cloud SQL leverages high-throughput parallel uploads and avoids network latency issues, making it the most reliable strategy for a one-time migration within the given constraints.

A financial services company uses Cloud Spanner for transaction processing. They need to ensure zero downtime during a schema change that adds a new column with a default value to a large table. Which approach should the Database Engineer take?

Create a new table with the new column, then use a fan-out pattern to write to both tables until the old table is deprecated.

Use an ALTER TABLE statement during a maintenance window.

Drop the table and recreate it with the new schema.

Use ALTER TABLE to add the column; Spanner handles schema changes online.

Spanner schema changes are online and do not cause downtime.

Why: Option D is correct because Cloud Spanner supports online schema changes, including adding columns with default values, without requiring a maintenance window or causing downtime. Spanner applies schema updates asynchronously across all nodes while the database remains fully available for reads and writes, making it the ideal approach for zero-downtime requirements.

A company runs a BigQuery data warehouse. They notice that query performance has degraded over time. The data is loaded daily from Cloud Storage using batch loads. Which action is most likely to improve query performance?

Partition and cluster tables based on common query filters.

Partitioning and clustering reduce data scanned, improving performance.

Increase the number of slots in the reservation.

Create materialized views for all frequent queries.

Migrate the data to Cloud SQL for better performance.

Why: Partitioning and clustering tables based on common query filters directly reduces the amount of data scanned per query by organizing data into physical segments. In BigQuery, this allows the query engine to prune entire partitions and clusters, significantly lowering I/O and improving performance without additional cost or complexity.

A gaming company uses Memorystore for Redis to cache player session data. They need to ensure high availability with automatic failover in case of a zone failure. Which configuration should the Database Engineer choose?

Deploy a Standard tier Redis instance with replication across two zones.

Standard tier provides replication and automatic failover.

Deploy a Basic tier Redis instance with multiple read replicas.

Deploy a Memcached cluster with multiple nodes.

Deploy a Basic tier Redis instance in a single zone.

Why: Memorystore for Redis Standard tier provides cross-zone replication with automatic failover, ensuring high availability during a zone failure. The Standard tier uses a primary and replica instance in different zones, and if the primary fails, the replica is automatically promoted. This meets the requirement for automatic failover without manual intervention.

A company is designing a global application using Cloud Spanner. They need to ensure low latency reads and writes across three continents. Which TWO configurations should they consider?

Use a multi-region configuration with leader regions in each continent.

Multi-region with leader regions reduces write latency.

Use a single-region instance and rely on application caching.

Use strongly consistent reads from a single region.

Use read replicas in each continent for stale read use cases.

Read replicas reduce read latency for remote users.

Use interleaved tables to optimize query performance.

Why: Option A is correct because Cloud Spanner multi-region configurations allow you to place leader regions in multiple continents, which enables low-latency strongly consistent reads and writes by directing traffic to the nearest leader. This is achieved through Spanner's TrueTime and Paxos-based replication, ensuring global consistency without sacrificing performance.

Want more Plan and manage database infrastructure practice?

All Define data structures and implement SQL for Business Intelligence questions

Domain 2: Define data structures and implement SQL for Business Intelligence

A company uses BigQuery for BI reporting. They have a table 'orders' with columns: order_id, customer_id, order_date, amount, status. The BI team frequently runs queries that filter on order_date and group by customer_id to compute total sales per customer. Which partitioning and clustering strategy optimizes query performance and cost?

Partition by order_date, cluster by status

Do not partition, cluster by customer_id

Partition by customer_id, cluster by order_date

Partition by order_date, cluster by customer_id

Partitioning on order_date prunes partitions for date filters; clustering on customer_id improves group by performance.

Why: Option D is correct because partitioning by order_date allows BigQuery to prune partitions for queries filtering on order_date, reducing the amount of data scanned. Clustering by customer_id organizes data within each partition so that GROUP BY customer_id queries can efficiently read only relevant blocks, minimizing shuffle and cost. This combination directly aligns with the BI team's query pattern of filtering by date and aggregating by customer.

A retail company uses BigQuery to store sales data. The 'sales' table has 10 billion rows and is partitioned by transaction_date (daily). The BI dashboard runs a query that aggregates sales by product_category for the last 30 days. The query is slow and expensive. Which improvement is most effective?

Cluster the table on product_category

Clustering on product_category organizes data within each partition so that queries filtering/aggregating on that column scan fewer blocks.

Change partitioning to monthly

Denormalize the product_category into the sales table

Use a materialized view with aggregation on product_category

Why: Option A is correct because clustering the table on product_category organizes the data within each daily partition by that column, allowing BigQuery to use block-level pruning to skip irrelevant blocks when filtering or aggregating by product_category. This directly reduces the amount of data scanned for the 30-day aggregation query, improving both performance and cost.

A company is designing a data warehouse for BI. They need to support both detailed transaction analysis and high-level aggregated reports. Which schema design best balances storage and query performance?

Fully denormalized single table

Wide column store with no schema

Star schema with fact and dimension tables

Star schema is standard for BI, enabling fast aggregations and easy reporting.

Snowflake schema with normalized dimensions

Why: The star schema is the optimal design for balancing storage and query performance in a BI data warehouse because it separates transactional data into fact tables (for detailed analysis) and dimension tables (for context), enabling fast aggregations via star joins while avoiding the storage overhead of full denormalization. This structure directly supports both granular transaction queries and high-level rollups without the complexity or performance penalty of snowflake schemas or the redundancy of fully denormalized tables.

A BI team runs a daily query on a BigQuery table 'events' partitioned by event_date. The query filters on event_date = CURRENT_DATE() and counts rows by event_type. The query is slow. Upon review, the table has 500 partitions but clustering is not set. Which action reduces query cost and latency?

Recreate the table with only the last 30 days of data

Use a wildcard table for daily ingestion

Increase the partition expiration to 365 days

Add clustering on event_type

Clustering on event_type organizes data by that column within each partition, speeding up count and group by.

Why: Adding clustering on `event_type` physically co-locates rows with the same event type within each partition. This allows BigQuery to use block-level pruning when reading data, drastically reducing the number of bytes scanned for the COUNT(*) GROUP BY query. Since the query already filters on a single partition (`event_date = CURRENT_DATE()`), the performance bottleneck is scanning all rows in that partition; clustering eliminates that overhead without changing the table's structure or retention.

A company stores sensor data in BigQuery. They have a table 'sensor_readings' with columns: sensor_id, reading_time, value. The table is partitioned by reading_time (hourly) and clustered by sensor_id. A BI query aggregates average value per sensor for the last week. The query still scans many bytes. What is the most likely cause?

The query uses SELECT * instead of specific columns

Clustering on sensor_id is ineffective

The table is not using columnar storage

Partition granularity is too fine for the query range

Hourly partitions for a week means 168 partitions scanned; coarser partitioning (daily) would scan 7 partitions, reducing bytes.

Why: Option D is correct because the query scans a full week of data (168 hourly partitions), and each partition must be read entirely even though only a subset of sensors may be active. Hourly partitioning over a 7-day range means the query engine must scan all 168 partitions, which can result in a large number of bytes being processed. Clustering on sensor_id helps within each partition but does not reduce the number of partitions scanned; the fine granularity of hourly partitioning is the primary cause of excessive bytes scanned.

Which TWO actions improve query performance and reduce cost in BigQuery for BI workloads?

Cluster tables on columns used in GROUP BY

Clustering improves aggregation performance.

Partition tables on columns frequently used in WHERE clauses

Partition pruning reduces bytes scanned.

Load data using batch loads instead of streaming

Store data in CSV format

Use SELECT * in all queries

Why: Clustering tables on columns used in GROUP BY improves query performance by physically co-locating rows with similar values, reducing the amount of data scanned during aggregation. Partitioning on columns frequently used in WHERE clauses allows BigQuery to prune entire partitions from the scan, directly reducing both cost (bytes billed) and query execution time. These two optimizations are specifically recommended for BI workloads where repeated, selective queries are common.

Want more Define data structures and implement SQL for Business Intelligence practice?

All Design and implement database schemas questions

Domain 3: Design and implement database schemas

A company is designing a database schema for a global e-commerce platform. Orders are created with high frequency, and order status updates occur frequently. The team needs to choose a primary key strategy for the orders table in Spanner. Which approach minimizes hot-spotting?

Use a monotonically increasing integer (e.g., auto-increment)

Use a timestamp as the primary key

Use a composite key with user_id and order_date

Use a universally unique identifier (UUID) as the primary key

Distributes writes uniformly across splits.

Why: In Spanner, monotonically increasing or time-ordered primary keys cause hot-spotting because all new writes are directed to the same tablet server, creating a single point of contention. UUIDs are randomly distributed, ensuring writes are spread evenly across the entire key space, which minimizes hot-spotting and maximizes write throughput.

A team is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The existing schema uses a large number of foreign key constraints and triggers for data validation. The team wants to minimize migration effort and maintain data integrity. Which schema design approach is most appropriate for Cloud SQL?

Keep the existing foreign keys and triggers as-is in Cloud SQL for PostgreSQL

Cloud SQL supports these features, minimizing migration effort.

Migrate to Cloud Spanner and use interleaved tables to simulate foreign keys

Remove all foreign keys and triggers and implement validation in the application layer

Convert the schema to use Firestore in Datastore mode with composite indexes

Why: Option A is correct because Cloud SQL for PostgreSQL is fully compatible with the PostgreSQL engine, meaning foreign key constraints and triggers operate identically to on-premises PostgreSQL. This approach minimizes migration effort by preserving the existing schema logic and maintaining referential integrity without requiring application changes or data validation rewrites.

A team is designing a schema for a time-series database in Bigtable to store IoT sensor readings. Each sensor sends a reading every minute. The team needs to create a row key that supports efficient queries for a specific sensor's readings over a time range. Which row key design is most appropriate?

timestamp#sensor_id

hash(sensor_id)#timestamp

sensor_id#reverse_timestamp

Groups all readings for a sensor together in reverse chronological order.

random_UUID

Why: Option C is correct because Bigtable stores rows sorted lexicographically by row key. By placing the sensor_id first, all readings for a given sensor are co-located in contiguous rows. Using reverse_timestamp (e.g., 9999-12-31 minus actual timestamp) ensures that the most recent readings appear first within that sensor's row range, which optimizes scans for the latest data and allows efficient range queries over a time window.

A company is using Cloud Spanner to manage financial transactions. The current schema has a single table 'Transactions' with a composite primary key (account_id, transaction_timestamp). The company frequently queries the latest transaction for each account. This query pattern is causing full table scans. Which schema design change would most improve query performance?

Add a secondary index on (account_id, transaction_timestamp DESC)

Change the primary key to (transaction_timestamp, account_id) and use interleaving

Create a separate 'LatestTransaction' table keyed by account_id, and update it whenever a new transaction occurs

Enables direct point reads for the latest transaction.

Add a 'is_latest' boolean column to the Transactions table and index it

Why: Option C is correct because it eliminates the need to scan the entire Transactions table to find the latest transaction per account. By maintaining a separate LatestTransaction table keyed by account_id, each account's latest transaction can be retrieved with a single point read. This is a classic denormalization pattern in Cloud Spanner that avoids the overhead of scanning or sorting large datasets for 'latest per group' queries.

A team is designing a relational schema for a new application on Cloud SQL. The schema includes a table 'Orders' and a table 'Customers'. Each order belongs to one customer. The team anticipates high write throughput and needs to enforce referential integrity. Which schema design is most appropriate?

Use Cloud Spanner interleaved tables with Orders as a child of Customers

Implement referential integrity checks in the application code and omit database constraints

Store order data as a JSON array in a column of the Customers table

Use a foreign key constraint from Orders.customer_id to Customers.customer_id

Enforces integrity efficiently within the database.

Why: Option D is correct because using a foreign key constraint from Orders.customer_id to Customers.customer_id enforces referential integrity at the database level, which is essential for maintaining data consistency in a relational schema. Cloud SQL (e.g., MySQL or PostgreSQL) natively supports foreign key constraints, ensuring that every order references an existing customer without relying on application logic. This approach is efficient for high write throughput as the database handles the check atomically, avoiding race conditions.

A team is designing a schema for a user activity logging system using Bigtable. Each log entry includes a user ID, activity type, timestamp, and details. The access pattern requires retrieving all activities for a specific user within a time range. Which TWO row key designs are suitable? (Choose TWO.)

timestamp#user_id

random_uuid

reverse_timestamp

user_id#activity_type#timestamp

Allows filtering by activity type within a user.

user_id#timestamp

Groups all activities for a user together in time order.

Why: Option D (user_id#activity_type#timestamp) is correct because it groups all activities for a user under a single row key prefix, enabling efficient row range scans for a specific user. The activity_type suffix allows filtering by activity type if needed, while the timestamp ensures uniqueness and ordered storage. Option E (user_id#timestamp) is correct because it directly supports the access pattern of retrieving all activities for a user within a time range by scanning rows with the user_id prefix and filtering on the timestamp component.

Want more Design and implement database schemas practice?

All Monitor and optimize database performance questions

Domain 4: Monitor and optimize database performance

You are managing a Cloud SQL for PostgreSQL instance that is experiencing high CPU usage and slow query performance. You notice that the database has a high number of idle-in-transaction connections. Which immediate action should you take to reduce CPU load without disrupting active transactions?

Use VPC firewall rules to block new connections until the issue resolves.

Kill all idle-in-transaction connections using pg_terminate_backend.

Set the cloudsql.enable_idle_in_transaction_session_timeout flag to true and configure idle_in_transaction_session_timeout.

This flag automatically terminates idle-in-transaction sessions after a specified timeout, reducing CPU usage without manual intervention.

Set a statement_timeout at the session level for new connections.

Why: Option C is correct because setting the `cloudsql.enable_idle_in_transaction_session_timeout` flag to true and configuring `idle_in_transaction_session_timeout` allows Cloud SQL to automatically terminate idle-in-transaction connections after a specified timeout, reducing CPU load without manually killing connections or disrupting active transactions. This is a built-in, non-disruptive mechanism that targets only connections that are holding resources while idle, freeing up CPU and memory for active queries.

A team is deploying a new application on Google Kubernetes Engine (GKE) that uses Cloud Spanner. They want to minimize latency for read operations. Which Spanner configuration should they use?

Use a multi-region configuration with default leader preference set to the region where the application runs.

Use a regional instance with read replicas in the same region.

Regional instances with read replicas in the same region provide low-latency reads with strong consistency.

Use a single-region instance and configure the leader preference to the application's zone.

Use a single-region instance and enable read-only replicas in multiple zones.

Why: Option B is correct because a regional instance with read replicas in the same region provides the lowest read latency for applications running in that region. Cloud Spanner's regional configuration keeps all data and replicas within a single Google Cloud region, minimizing network round-trips. Read replicas in the same region can serve strongly consistent reads without cross-region hops, which is optimal for latency-sensitive workloads.

Your Cloud SQL for MySQL instance is experiencing intermittent performance degradation. You suspect that the issue is due to a sudden spike in connections from a specific application. Which metric and monitoring approach would best help you correlate the connection spike with performance degradation?

Monitor 'cloudsql.googleapis.com/network/received_bytes_count' and compare with connection count.

Monitor 'cloudsql.googleapis.com/database/mysql/replication/seconds_behind_master' and compare with query latency.

Monitor 'cloudsql.googleapis.com/instance/uptime' and check for instance restarts during degradation.

Monitor 'cloudsql.googleapis.com/database/mysql/threads/threads_connected' and correlate with CPU utilization and query latency.

Threads connected directly indicates active connections, and correlating with CPU and latency helps identify the impact.

Why: Option D is correct because the 'threads_connected' metric directly measures the number of active connections to the MySQL instance. Correlating this with CPU utilization and query latency allows you to pinpoint whether a sudden spike in connections is causing resource contention and degraded query performance, which is the exact scenario described.

You are running a production workload on Cloud Bigtable and notice that read latency has increased. Upon reviewing the monitoring dashboard, you see that CPU utilization is below 50% but the number of active tablets is high. What is the most likely cause of the increased read latency?

Read requests are being throttled due to exceeding IOPS limits.

There are too many tablets, causing increased metadata operations and slower reads.

Excessive tablets increase the overhead of metadata lookups and tablet splitting, leading to higher latency.

A hot node is throttling read requests.

The cluster is underprovisioned, causing resource contention.

Why: In Cloud Bigtable, each tablet is a contiguous range of rows managed by a tablet server. When the number of active tablets is high, the tablet server must perform more metadata operations (e.g., splitting, merging, and serving multiple tablets) which increases per-request overhead and can degrade read latency. This is true even when CPU utilization is below 50%, because the overhead is not purely CPU-bound but involves increased I/O and coordination.

A developer has deployed a new version of an application that uses Cloud SQL. After the deployment, you notice a sharp increase in the number of slow queries. What should you do first to identify the problematic queries?

Check the slow query log in Cloud Logging and look for queries with high rows_examined.

Use Cloud SQL Query Insights to identify the queries with the highest latency and examine their execution plans.

Query Insights provides detailed query performance data without additional overhead.

Increase the instance tier to reduce the impact of slow queries.

Enable the general query log and parse the log file to find slow queries.

Why: Cloud SQL Query Insights is the recommended first step for diagnosing slow queries because it provides built-in query monitoring, latency breakdowns, and execution plans without additional configuration. It directly surfaces the queries with the highest latency, allowing you to examine their execution plans to identify root causes such as missing indexes or inefficient joins.

Which TWO actions can help reduce the number of read replicas needed for a Cloud SQL for PostgreSQL instance that serves a read-heavy workload?

Implement connection pooling to reuse database connections.

Reduces connection overhead and improves replica efficiency.

Enable synchronous replication on all read replicas.

Use smaller machine types for read replicas.

Use application-level caching (e.g., Redis) to cache frequent read results.

Offloads read requests from the database, reducing replica load.

Increase the max_connections parameter on the primary instance.

Why: Option A is correct because connection pooling reduces the overhead of establishing new database connections, which can consume significant CPU and memory resources on the primary instance. By reusing existing connections, the primary instance can handle more read requests without needing additional read replicas to offload the connection management load. This directly reduces the number of replicas required for a read-heavy workload.

Want more Monitor and optimize database performance practice?

Browse all PCDE questions Take a timed practice test

Frequently asked questions

How many questions are on the PCDE exam?

The PCDE exam has 60 questions and must be completed in 120 minutes. The passing score is 720/1000.

What types of questions appear on the PCDE exam?

Scenario-based questions covering exam objectives with detailed answer explanations.

How are PCDE questions organised by domain?

The exam covers 4 domains: Plan and manage database infrastructure, Define data structures and implement SQL for Business Intelligence, Design and implement database schemas, Monitor and optimize database performance. Questions are weighted by domain — higher-weight domains appear more on your actual exam.

Are these the actual PCDE exam questions?

No. These are original exam-style practice questions written against the official Google Cloud PCDE exam objectives. They are not copied from the real exam. Courseiva focuses on genuine understanding, not memorisation of braindumps.

Ready to practice all 60 PCDE questions?

Courseiva tracks your accuracy per domain and routes you toward weak areas automatically. Free, no account required.

Google Cloud · Free Practice Questions · Last reviewed May 2026

PCDE Exam Questions and Answers

24real exam-style questions organised by domain, each with the correct answer highlighted and a plain-English explanation of why it's right — and why the others are wrong.

60 exam questions

120 min time limit

Pass: 720/1000 / 1000

4 exam domains

Overview Domain Blueprint Study Guide All QuestionsSample by Domain

1. Plan and manage database infrastructure 2. Define data structures and implement SQL for Business Intelligence 3. Design and implement database schemas 4. Monitor and optimize database performance

Domain 1: Plan and manage database infrastructure

All Plan and manage database infrastructure questions

Use Database Migration Service to failover to a read replica.

Use Cloud SQL Insights to analyze query performance and identify slow queries.

Cloud SQL Insights provides query-level performance diagnostics without downtime.

Use gcloud sql instances describe to check instance configuration.

Use VPC Flow Logs to analyze network traffic.

Create a VPN tunnel and use pg_dump/pg_restore over the network.

Use Database Migration Service with continuous replication.

Export the database to flat files, compress, upload to Cloud Storage, then import to Cloud SQL.

File-based migration with compression can work within the bandwidth and time constraints.

Request a dedicated interconnect and then migrate.

Create a new table with the new column, then use a fan-out pattern to write to both tables until the old table is deprecated.

Use an ALTER TABLE statement during a maintenance window.

Drop the table and recreate it with the new schema.

Use ALTER TABLE to add the column; Spanner handles schema changes online.

Spanner schema changes are online and do not cause downtime.

Partition and cluster tables based on common query filters.

Partitioning and clustering reduce data scanned, improving performance.

Increase the number of slots in the reservation.

Create materialized views for all frequent queries.

Migrate the data to Cloud SQL for better performance.

Deploy a Standard tier Redis instance with replication across two zones.

Standard tier provides replication and automatic failover.

Deploy a Basic tier Redis instance with multiple read replicas.

Deploy a Memcached cluster with multiple nodes.

Deploy a Basic tier Redis instance in a single zone.

A company is designing a global application using Cloud Spanner. They need to ensure low latency reads and writes across three continents. Which TWO configurations should they consider?

Use a multi-region configuration with leader regions in each continent.

Multi-region with leader regions reduces write latency.

Use a single-region instance and rely on application caching.

Use strongly consistent reads from a single region.

Use read replicas in each continent for stale read use cases.

Read replicas reduce read latency for remote users.

Use interleaved tables to optimize query performance.

Want more Plan and manage database infrastructure practice?

All Define data structures and implement SQL for Business Intelligence questions

Domain 2: Define data structures and implement SQL for Business Intelligence

Partition by order_date, cluster by status

Do not partition, cluster by customer_id

Partition by customer_id, cluster by order_date

Partition by order_date, cluster by customer_id

Partitioning on order_date prunes partitions for date filters; clustering on customer_id improves group by performance.

Cluster the table on product_category

Clustering on product_category organizes data within each partition so that queries filtering/aggregating on that column scan fewer blocks.

Change partitioning to monthly

Denormalize the product_category into the sales table

Use a materialized view with aggregation on product_category

Fully denormalized single table

Wide column store with no schema

Star schema with fact and dimension tables

Star schema is standard for BI, enabling fast aggregations and easy reporting.

Snowflake schema with normalized dimensions

Recreate the table with only the last 30 days of data

Use a wildcard table for daily ingestion

Increase the partition expiration to 365 days

Add clustering on event_type

Clustering on event_type organizes data by that column within each partition, speeding up count and group by.

The query uses SELECT * instead of specific columns

Clustering on sensor_id is ineffective

The table is not using columnar storage

Partition granularity is too fine for the query range

Hourly partitions for a week means 168 partitions scanned; coarser partitioning (daily) would scan 7 partitions, reducing bytes.

Which TWO actions improve query performance and reduce cost in BigQuery for BI workloads?

Cluster tables on columns used in GROUP BY

Clustering improves aggregation performance.

Partition tables on columns frequently used in WHERE clauses

Partition pruning reduces bytes scanned.

Load data using batch loads instead of streaming

Store data in CSV format

Use SELECT * in all queries

Want more Define data structures and implement SQL for Business Intelligence practice?

All Design and implement database schemas questions

Domain 3: Design and implement database schemas

Use a monotonically increasing integer (e.g., auto-increment)

Use a timestamp as the primary key

Use a composite key with user_id and order_date

Use a universally unique identifier (UUID) as the primary key

Distributes writes uniformly across splits.

Keep the existing foreign keys and triggers as-is in Cloud SQL for PostgreSQL

Cloud SQL supports these features, minimizing migration effort.

Migrate to Cloud Spanner and use interleaved tables to simulate foreign keys

Remove all foreign keys and triggers and implement validation in the application layer

Convert the schema to use Firestore in Datastore mode with composite indexes

timestamp#sensor_id

hash(sensor_id)#timestamp

sensor_id#reverse_timestamp

Groups all readings for a sensor together in reverse chronological order.

random_UUID

Add a secondary index on (account_id, transaction_timestamp DESC)

Change the primary key to (transaction_timestamp, account_id) and use interleaving

Create a separate 'LatestTransaction' table keyed by account_id, and update it whenever a new transaction occurs

Enables direct point reads for the latest transaction.

Add a 'is_latest' boolean column to the Transactions table and index it

Use Cloud Spanner interleaved tables with Orders as a child of Customers

Implement referential integrity checks in the application code and omit database constraints

Store order data as a JSON array in a column of the Customers table

Use a foreign key constraint from Orders.customer_id to Customers.customer_id

Enforces integrity efficiently within the database.

timestamp#user_id

random_uuid

reverse_timestamp

user_id#activity_type#timestamp

Allows filtering by activity type within a user.

user_id#timestamp

Groups all activities for a user together in time order.

Want more Design and implement database schemas practice?

All Monitor and optimize database performance questions

Domain 4: Monitor and optimize database performance

Use VPC firewall rules to block new connections until the issue resolves.

Kill all idle-in-transaction connections using pg_terminate_backend.

Set the cloudsql.enable_idle_in_transaction_session_timeout flag to true and configure idle_in_transaction_session_timeout.

This flag automatically terminates idle-in-transaction sessions after a specified timeout, reducing CPU usage without manual intervention.

Set a statement_timeout at the session level for new connections.

A team is deploying a new application on Google Kubernetes Engine (GKE) that uses Cloud Spanner. They want to minimize latency for read operations. Which Spanner configuration should they use?

Use a multi-region configuration with default leader preference set to the region where the application runs.

Use a regional instance with read replicas in the same region.

Regional instances with read replicas in the same region provide low-latency reads with strong consistency.

Use a single-region instance and configure the leader preference to the application's zone.

Use a single-region instance and enable read-only replicas in multiple zones.

Monitor 'cloudsql.googleapis.com/network/received_bytes_count' and compare with connection count.

Monitor 'cloudsql.googleapis.com/database/mysql/replication/seconds_behind_master' and compare with query latency.

Monitor 'cloudsql.googleapis.com/instance/uptime' and check for instance restarts during degradation.

Monitor 'cloudsql.googleapis.com/database/mysql/threads/threads_connected' and correlate with CPU utilization and query latency.

Threads connected directly indicates active connections, and correlating with CPU and latency helps identify the impact.

Read requests are being throttled due to exceeding IOPS limits.

There are too many tablets, causing increased metadata operations and slower reads.

Excessive tablets increase the overhead of metadata lookups and tablet splitting, leading to higher latency.

A hot node is throttling read requests.

The cluster is underprovisioned, causing resource contention.

Check the slow query log in Cloud Logging and look for queries with high rows_examined.

Use Cloud SQL Query Insights to identify the queries with the highest latency and examine their execution plans.

Query Insights provides detailed query performance data without additional overhead.

Increase the instance tier to reduce the impact of slow queries.

Enable the general query log and parse the log file to find slow queries.

Which TWO actions can help reduce the number of read replicas needed for a Cloud SQL for PostgreSQL instance that serves a read-heavy workload?

Implement connection pooling to reuse database connections.

Reduces connection overhead and improves replica efficiency.

Enable synchronous replication on all read replicas.

Use smaller machine types for read replicas.

Use application-level caching (e.g., Redis) to cache frequent read results.

Offloads read requests from the database, reducing replica load.

Increase the max_connections parameter on the primary instance.

Want more Monitor and optimize database performance practice?