Google Professional Cloud Database Engineer PCDE Questions 226–300 | Page 4/7

226

Matchingmedium

Match each BigQuery DDL statement to its function.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Creates a new table

Modifies table schema or options

Deletes a table

Creates a logical view

Creates a precomputed view for faster queries

Why these pairings

DDL statements are used to define and manage database objects in BigQuery.

Full explanation →

227

MCQeasy

A data engineer needs to grant a service account read-only access to a Cloud Storage bucket containing sensitive data. The service account is used by a Compute Engine instance. What is the most secure way to assign the permissions?

A.Set bucket ACLs to allow read access for the service account.

B.Make the bucket public and rely on network restrictions.

C.Grant the service account the Storage Object Viewer role at the project level.

D.Grant the service account the Storage Object Viewer role on the specific bucket.

AnswerD

Bucket-level IAM grants least privilege.

Why this answer

Option D is correct because granting the Storage Object Viewer role at the bucket level applies the principle of least privilege, restricting the service account's read-only access to only that specific bucket. This avoids granting broader permissions at the project level, which would inadvertently allow access to all buckets in the project. Using IAM roles is more secure and manageable than legacy bucket ACLs.

Exam trap

Google Cloud often tests the principle of least privilege by making candidates choose between project-level and resource-level IAM roles, where the trap is assuming project-level roles are acceptable without considering the broader access they grant.

How to eliminate wrong answers

Option A is wrong because bucket ACLs are a legacy access control mechanism that are less granular and harder to audit than IAM roles; they also do not support service accounts natively in the same way IAM does, and mixing ACLs with IAM can lead to unintended permissions. Option B is wrong because making the bucket public exposes the sensitive data to anyone on the internet, and network restrictions alone are insufficient for authentication and authorization, violating the principle of least privilege. Option C is wrong because granting the Storage Object Viewer role at the project level gives the service account read access to all buckets in the project, not just the one containing sensitive data, which unnecessarily broadens the attack surface.

Full explanation →

228

MCQmedium

A company is setting up access control for a BigQuery dataset using the above IAM policy. An analyst who is a member of the group 'analysts@example.com' also has the user account 'analyst@example.com'. They need to create new tables in the dataset. What will be the outcome?

A.The analyst will get an error because of conflicting roles.

B.The analyst cannot create tables because the group only has dataViewer.

C.The analyst can create tables because they have dataOwner role on their user account.

D.The analyst can create tables if they also have jobUser role.

AnswerC

The dataOwner role includes all dataset permissions, including table creation.

Why this answer

IAM grants are additive. The user has the dataOwner role directly (A), which includes create table permissions. The group membership with dataViewer does not override.

So the analyst can create tables. Option B is wrong because the user has explicit dataOwner. Option C (jobUser) is not needed.

Option D (conflict) does not apply.

Full explanation →

229

MCQhard

A global e-commerce company uses Cloud SQL for MySQL to store inventory data. They have a single primary instance in us-central1 and two read replicas in us-west1 and europe-west1 for local reads. Recently, the primary instance experienced a hardware failure causing an outage. The failover to a Cloud SQL high availability (HA) instance took 2 minutes. However, during that time, inventory updates were lost because the binary log position was not fully synchronized. The company requires zero data loss for inventory updates. What should the database engineer do?

A.Migrate to Cloud Spanner with multi-region configuration.

B.Use Cloud SQL with external replication and a stand-by instance in another region.

C.Implement application-level write-ahead logging and replay on failover.

D.Enable point-in-time recovery with a 7-day retention.

AnswerA

Spanner offers synchronous replication across regions, ensuring zero data loss.

Why this answer

Cloud Spanner with a multi-region configuration provides synchronous replication across regions, ensuring strong consistency and zero data loss during failover. Unlike Cloud SQL's asynchronous replication, Spanner uses the Paxos protocol to commit writes across multiple regions before acknowledging success, which eliminates the risk of lost inventory updates during a primary failure.

Exam trap

The trap here is that candidates assume Cloud SQL's high availability (HA) with regional replicas can guarantee zero data loss, but they overlook that Cloud SQL uses asynchronous replication for read replicas, which inherently risks data loss during a primary failure.

How to eliminate wrong answers

Option B is wrong because Cloud SQL with external replication still relies on asynchronous binary log replication, which cannot guarantee zero data loss during a failover; the stand-by instance would have the same synchronization lag issue. Option C is wrong because implementing application-level write-ahead logging and replay on failover adds complexity and does not address the underlying database replication gap; it still depends on the database's binary log position, which was not fully synchronized. Option D is wrong because point-in-time recovery (PITR) with a 7-day retention only allows restoring to a specific time in the past from backups, not real-time failover; it does not prevent data loss during a hardware failure because the binary log position was not synchronized at the moment of the outage.

Full explanation →

230

MCQhard

A financial institution requires all data stored in Cloud Spanner to be encrypted using customer-managed encryption keys (CMEK) stored in Cloud KMS. The security team mandates that the key be in a separate project from the Spanner instance. How should the database engineer configure this?

A.Grant the Cloud Spanner service account the Editor role in the key project.

B.Grant the Cloud Spanner service account the Cloud KMS CryptoKey Encrypter/Decrypter role on the key.

C.Specify the key in the Spanner instance creation and provide the user's credentials for KMS access.

D.Grant the Spanner instance's service account the Cloud KMS Admin role on the key.

AnswerB

This role allows Spanner to use the key for encryption and decryption.

Why this answer

Option B is correct because Cloud Spanner uses a service account to access Cloud KMS keys. To enable customer-managed encryption keys (CMEK) from a separate project, the Cloud Spanner service account must be granted the Cloud KMS CryptoKey Encrypter/Decrypter role on the specific key. This allows Spanner to encrypt and decrypt data using the key without granting broader permissions.

Exam trap

The trap here is that candidates often confuse the Cloud KMS Admin role (which manages the key lifecycle) with the Encrypter/Decrypter role (which performs cryptographic operations), leading them to select option D instead of B.

How to eliminate wrong answers

Option A is wrong because granting the Editor role in the key project is overly permissive and violates the principle of least privilege; it would allow the Spanner service account to manage all resources in the key project, not just the specific key. Option C is wrong because user credentials cannot be used for KMS access; Spanner requires a service account, not a user account, to authenticate with Cloud KMS. Option D is wrong because the Cloud KMS Admin role grants permissions to manage key policies and rotations, which is unnecessary for encryption/decryption operations and introduces security risks.

Full explanation →

231

MCQeasy

A Firestore database has a collection with a composite index on (status, timestamp desc). The query `where status == 'active' order by timestamp desc limit 50` is returning empty results even though there are active documents. What could be the issue?

A.The status field is not indexed.

B.The composite index exists but the order of timestamp is asc instead of desc.

C.The documents have 'active' status but timestamp field is missing.

D.The query requires an index on timestamp only.

AnswerB

The query requires descending order; using an ascending index can cause the query to miss documents.

Why this answer

If the composite index has timestamp in ascending order, the order by desc cannot use the index efficiently, leading to empty results due to scanning in wrong order.

Full explanation →

232

Multi-Selecteasy

A data engineer is creating a reporting layer in BigQuery for BI tools. Which TWO practices improve query performance?

Select 2 answers

A.Use approximate aggregate functions when exact accuracy is not needed.

B.Use SELECT * in queries.

C.Use ORDER BY in subqueries unnecessarily.

D.Store all data in a single table without partitioning.

E.Denormalize tables to reduce joins.

AnswersA, E

Approximate functions like APPROX_COUNT_DISTINCT use less resources.

Why this answer

Option A is correct because BigQuery's approximate aggregate functions (e.g., APPROX_COUNT_DISTINCT, APPROX_QUANTILES) use HyperLogLog++ and other sketching algorithms to return results with a small, bounded error (typically <1%) while drastically reducing the amount of data scanned and shuffled. This trade-off is ideal for BI dashboards where exact counts are not critical, as it can cut query execution time by orders of magnitude.

Exam trap

Google Cloud often tests the misconception that SELECT * is acceptable in production BI queries, but the trap is that it defeats BigQuery's columnar storage and billing model, leading to unnecessary cost and slower performance.

Full explanation →

233

MCQhard

A Cloud Spanner database is experiencing high CPU utilization on one node. Users report slow queries. The table uses a UUID primary key. What is the most effective action?

A.Add a database index on frequently queried columns.

B.Convert the UUID to a monotonically increasing integer.

C.Use a hash key prefix to distribute writes.

D.Increase the number of nodes in the instance.

AnswerC

A hash prefix spreads writes across multiple nodes, reducing load on a single node.

Why this answer

A hash key prefix can distribute writes more evenly across nodes, preventing hot spots that cause high CPU on a single node. Adding nodes may not resolve a hot spot.

Full explanation →

234

MCQhard

A global gaming company uses Cloud Spanner for user profiles and game state. They have a single-region instance in us-central1. Recently, they launched in Europe and notice high latency for European users. They also need to ensure data locality compliance (GDPR). The database is heavily written with throughput spikes. They want to minimize latency without sacrificing write throughput. They consider adding a secondary index on region column. What is the best course of action?

A.Shard the database by region into separate Spanner instances.

B.Use Cloud CDN to cache read data.

C.Create a read-only replica in Europe using Cloud Spanner's read replica feature.

D.Create a multi-region configuration spanning US and Europe, and modify application to read from closest region.

AnswerD

Multi-region provides synchronous replication for low-latency reads/writes globally and meets compliance.

Why this answer

Option D is correct because a multi-region Spanner configuration with regional endpoints allows the application to read from the closest region, reducing latency for European users while maintaining strong consistency and write throughput. Spanner's multi-region configurations use synchronous replication across regions, ensuring GDPR compliance by keeping European user data within Europe for reads, and the write throughput is not sacrificed because all writes are committed to the primary region and replicated asynchronously to other regions.

Exam trap

The trap here is that candidates confuse Cloud Spanner's read replica feature with a read-only replica, but Spanner does not offer standalone read replicas; instead, it uses multi-region configurations with regional endpoints for read affinity, and the term 'read replica' is a common misconception from other databases like Cloud SQL.

How to eliminate wrong answers

Option A is wrong because sharding into separate Spanner instances would break global consistency and require complex application-level routing, and it would not provide a single global database for user profiles and game state. Option B is wrong because Cloud CDN caches static content, not dynamic database writes or transactional data, and it cannot reduce latency for write-heavy workloads or ensure data locality for GDPR. Option C is wrong because Cloud Spanner does not support read-only replicas; it uses multi-region configurations with regional endpoints for read affinity, and a read replica would not handle write throughput spikes or maintain strong consistency.

Full explanation →

235

MCQeasy

Your team is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The current schema uses table inheritance, which is not fully supported in Cloud SQL. What should you do to minimize application changes?

A.Continue using inheritance as Cloud SQL supports it fully

B.Use PostgreSQL foreign data wrappers to emulate inheritance

C.Use materialized views to combine data

D.Redesign the schema using separate tables with joins

AnswerD

Standard approach; can use views to simulate inheritance for read operations.

Why this answer

Option B is correct: redesign the schema using separate tables with joins, as this is the standard approach to replace inheritance and can be done with minimal application changes by creating views that emulate the inheritance hierarchy. Option A uses foreign data wrappers, which are not a direct replacement and add complexity. Option C uses materialized views, which don't support write operations.

Option D is incorrect because Cloud SQL does not fully support table inheritance.

Full explanation →

236

MCQeasy

A marketing team uses a BigQuery BI dashboard to analyze campaign performance. The table campaign_performance is 5 TB, partitioned by date, clustered by campaign_id. Queries filter on date range and campaign_id, and are fast. However, one query that joins this table with a user_dimensions table (10 GB, not partitioned) takes too long. The join is on user_id. What is the best improvement?

A.Denormalize user_dimensions into campaign_performance.

B.Cluster user_dimensions by user_id.

C.Partition user_dimensions by date.

D.Use a broadcast join hint.

AnswerA

Denormalizing adds user_dimension columns to the large table, avoiding the expensive join.

Why this answer

Option C is correct because user_dimensions is small (10 GB) relative to campaign_performance, denormalizing eliminates the join entirely. Option A (partition user_dimensions) helps but the join still occurs. Option B (cluster by user_id) reduces shuffle but not elimination.

Option D (broadcast join hint) forces a broadcast but join still occurs.

Full explanation →

237

MCQeasy

A data engineer is building a BI reporting layer in BigQuery. The source data includes JSON logs with nested fields. Analysts need to query nested arrays efficiently. Which approach is best?

A.Use SQL and UNNEST to directly query nested arrays.

B.Load the data into separate tables for each array.

C.Flatten all nested fields into separate tables.

D.Create a view that flattens the data.

AnswerA

UNNEST expands arrays efficiently without physically flattening storage.

Why this answer

Option A is correct because BigQuery natively supports nested and repeated fields via the UNNEST operator, which flattens arrays into rows for SQL-based querying. This approach leverages BigQuery's columnar storage and efficient array handling, allowing analysts to query nested arrays directly without data duplication or additional ETL, which is optimal for BI reporting performance.

Exam trap

The trap here is that candidates assume flattening data into separate tables or views is always necessary for SQL compatibility, but BigQuery's UNNEST provides native, efficient array querying without data restructuring.

How to eliminate wrong answers

Option B is wrong because loading nested arrays into separate tables introduces data redundancy and requires complex JOIN operations, increasing query latency and maintenance overhead compared to BigQuery's native nested structure. Option C is wrong because flattening all nested fields into separate tables discards the relational context of nested data, leading to data duplication and loss of query efficiency that UNNEST provides. Option D is wrong because creating a view that flattens data does not change the underlying storage; it still requires UNNEST at query time and adds no performance benefit, while a view can obscure the schema and complicate debugging.

Full explanation →

238

MCQeasy

A company runs a production Cloud SQL for PostgreSQL instance. They need to ensure high availability with automatic failover in case of a zone failure. Which configuration should they use?

A.Create a cross-region read replica and set it as failover target.

B.Use a zonal HA configuration by selecting a regional location.

C.Deploy a single Cloud SQL instance with multiple CPUs and high memory.

D.Enable high availability (HA) configuration during instance creation.

AnswerD

Cloud SQL HA provisions a standby in a different zone, with synchronous replication for automatic failover.

Why this answer

Option D is correct because enabling the high availability (HA) configuration during Cloud SQL for PostgreSQL instance creation automatically provisions a standby instance in a different zone within the same region. This synchronous replication setup ensures automatic failover with minimal data loss (typically under 1 second RPO) in the event of a zone failure, meeting the requirement for high availability.

Exam trap

Google Cloud often tests the misconception that a read replica can serve as a failover target for high availability, but in Cloud SQL, read replicas are asynchronous and require manual promotion, making them unsuitable for automatic zone-level failover.

How to eliminate wrong answers

Option A is wrong because a cross-region read replica is designed for read scaling and disaster recovery, not for automatic failover within the same region; failover to a cross-region replica would require manual promotion and introduces significant latency and potential data loss. Option B is wrong because 'zonal HA configuration' is not a valid Cloud SQL term; the correct approach is to use a regional HA configuration, which places the primary and standby in different zones automatically. Option C is wrong because scaling up CPUs and memory improves performance but does not provide any redundancy or automatic failover; a single instance remains a single point of failure.

Full explanation →

239

Multi-Selecteasy

Which TWO strategies reduce query costs for ad-hoc analysis in BigQuery? (Choose two.)

Select 2 answers

A.Use LIMIT 10 to preview data.

B.Use clustered tables on frequently filtered columns.

C.Use a flat table without partitioning.

D.Use SELECT * in all queries.

E.Use materialized views for common aggregations.

AnswersB, E

Clustering allows pruning of blocks.

Why this answer

Option B is correct because clustered tables in BigQuery physically sort data based on the specified columns, which allows the query engine to skip entire blocks of data that don't match filter predicates. This reduces the amount of data scanned and thus lowers query costs for ad-hoc analysis. Option E is correct because materialized views precompute and store the results of common aggregations, so queries against them only read the precomputed results rather than scanning the base table, significantly reducing bytes processed.

Exam trap

Google Cloud often tests the misconception that LIMIT reduces cost (it does not in BigQuery's serverless architecture) and that denormalized or flat tables are cheaper (they are not because they increase scan size).

Full explanation →

240

MCQeasy

A data engineer is designing a BigQuery schema for a time-series dataset of IoT sensor readings. The queries will filter primarily on a timestamp column and also on sensor_id. To optimize query performance and cost, which table design is best?

A.Partition by timestamp, cluster by sensor_id

B.Partition by sensor_id, cluster by timestamp

C.Partition by timestamp, cluster by timestamp

D.No partitioning, cluster by timestamp

AnswerA

Reduces scan to relevant partitions and optimizes filtering on sensor_id.

Why this answer

Partitioning by timestamp allows BigQuery to prune entire partitions when queries filter on the timestamp column, reducing the amount of data scanned and thus lowering cost and improving performance. Clustering by sensor_id further organizes data within each partition, enabling block-level pruning for queries that filter on sensor_id. This combination optimizes for the primary filter (timestamp) and secondary filter (sensor_id) without the overhead of excessive partitions.

Exam trap

Google Cloud often tests the misconception that clustering can replace partitioning for time-based filtering, but in reality, partitioning is essential for pruning entire storage blocks, while clustering only optimizes within partitions.

How to eliminate wrong answers

Option B is wrong because partitioning by sensor_id would create a partition for each unique sensor_id, which can lead to a very large number of small partitions (exceeding BigQuery's partition limit of 4,000 per table) and does not optimize for the primary timestamp filter. Option C is wrong because clustering by timestamp when already partitioned by timestamp provides no additional benefit—clustering is redundant and wastes resources since partitioning already prunes by timestamp. Option D is wrong because no partitioning means every query must scan the entire table, even when filtering on timestamp, leading to higher costs and slower performance; clustering alone cannot prune entire partitions.

Full explanation →

241

MCQmedium

A company is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The database uses several custom PL/pgSQL functions that perform complex calculations. The migration must minimize application changes and support high availability. Which strategy should the database engineer use for the schema migration?

A.Convert the functions to stored procedures in Cloud Spanner and migrate data separately.

B.Export the functions as SQL scripts and convert them to pgSQL syntax for Cloud SQL.

C.Export the functions as SQL scripts and rewrite them in JavaScript using Cloud Functions.

D.Use pg_dump to export the schema including functions and restore directly to Cloud SQL.

AnswerD

pg_dump preserves PL/pgSQL functions; restore works in Cloud SQL.

Why this answer

Option D is correct because pg_dump can export the entire PostgreSQL schema, including custom PL/pgSQL functions, in a format that Cloud SQL for PostgreSQL natively understands. Restoring directly with pg_restore or psql preserves the functions without requiring syntax conversion, minimizing application changes. Cloud SQL for PostgreSQL supports high availability through regional persistent disks and automatic failover replicas, meeting the HA requirement without altering the schema.

Exam trap

Google Cloud often tests the misconception that PL/pgSQL functions need to be converted or rewritten for Cloud SQL, when in fact Cloud SQL for PostgreSQL is a fully managed PostgreSQL service that supports the same procedural language natively.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner does not support PL/pgSQL functions or stored procedures with the same syntax; migrating to Spanner would require rewriting all functions and changing application queries, violating the 'minimize application changes' requirement. Option B is wrong because PL/pgSQL is already the native procedural language for PostgreSQL; exporting as SQL scripts and 'converting to pgSQL syntax' is unnecessary and implies a false need for syntax conversion, as Cloud SQL for PostgreSQL uses the same PostgreSQL engine. Option C is wrong because rewriting PL/pgSQL functions in JavaScript using Cloud Functions would require significant application refactoring to call external HTTP-triggered functions instead of inline database functions, breaking the 'minimize application changes' constraint.

Full explanation →

242

Multi-Selecteasy

Which TWO factors are important when selecting the location for a Cloud Spanner instance?

Select 2 answers

A.Number of other Spanner instances in the region

B.Data residency compliance

C.Proximity to users

D.Cost of Cloud Spanner nodes in the region

E.Availability of Cloud VPN

AnswersB, C

Some regulations require data to remain within specific geographical boundaries.

Why this answer

Data residency compliance (B) is critical because Cloud Spanner instances are regional resources; data is stored and processed within the chosen Google Cloud region, and many regulations (e.g., GDPR, HIPAA) mandate that data remain within specific geographic boundaries. Proximity to users (C) is important because Spanner's read and write latencies are directly affected by network distance; choosing a region close to your user base minimizes latency and improves application performance.

Exam trap

Google Cloud often tests the misconception that cost (Option D) or the number of existing instances (Option A) are primary location factors, when in reality the official Google Cloud documentation emphasizes data residency and user proximity as the two key considerations for single-region Spanner instance placement.

Full explanation →

243

Multi-Selecthard

A Cloud Spanner database is experiencing high CPU utilization and latency. The workload is read-heavy with occasional writes. Which TWO actions would most effectively improve performance?

Select 2 answers

A.Create a secondary index on frequently queried columns.

B.Use a smaller instance configuration to reduce cost.

C.Add more nodes to the instance.

D.Use a split point to distribute hot rows.

E.Enable interleaved tables to reduce joins.

AnswersA, C

Indexes reduce the need for full table scans, lowering CPU usage and query latency.

Why this answer

Creating a secondary index on frequently queried columns allows Cloud Spanner to serve read queries directly from the index without scanning the full base table, reducing CPU usage and latency. This is especially effective in read-heavy workloads because it minimizes the number of rows that must be processed per query.

Exam trap

Google Cloud often tests the misconception that manual split points or interleaved tables are primary tools for read performance tuning, when in fact Spanner handles splits automatically and interleaving is mainly for write locality and join optimization.

Full explanation →

244

MCQeasy

A BI team queries this table with a WHERE clause that filters on product_id but does not include a sale_date filter. What is the outcome?

A.The query fails with an error.

B.The query runs successfully and only scans partitions containing product_id values.

C.The query runs successfully and scans only the latest partition.

D.The query runs successfully but scans all partitions.

AnswerA

require_partition_filter=true causes query to fail without a partition filter.

Why this answer

In a partitioned table (e.g., using Hive-style partitioning or a similar system like BigQuery or Snowflake), a WHERE clause that filters only on `product_id` without including the partition key `sale_date` forces a full scan of all partitions. However, if the table is defined with a strict partition pruning requirement (e.g., in Databricks or Spark SQL with dynamic partition pruning disabled, or in a system that requires the partition column in the filter), the query may fail with an error because the engine cannot determine which partitions to read without the partition key. The correct answer is A because the scenario implies a system (like certain SQL-on-Hadoop engines or strict partitioning rules) where omitting the partition column in the WHERE clause results in a query error, not a successful scan.

Exam trap

Google Cloud often tests the misconception that partition pruning automatically applies to any column in the WHERE clause, leading candidates to choose Option B, when in reality partition pruning only works on the partition key column, and the absence of that key can cause an error in strict environments.

How to eliminate wrong answers

Option B is wrong because scanning only partitions containing specific `product_id` values would require partition pruning on `product_id`, which is not a partition key; partition pruning only works on the partition column (`sale_date`). Option C is wrong because scanning only the latest partition assumes an implicit default or a system behavior that does not exist; without a `sale_date` filter, the engine has no basis to select a single partition. Option D is wrong because while a full partition scan is a common outcome in many systems, the question explicitly states the query fails with an error, indicating a stricter environment (e.g., a system that enforces partition key inclusion in WHERE clauses) where the query is rejected rather than executed.

Full explanation →

245

MCQeasy

A mobile app backend uses Firestore for user profiles. The schema has a single collection 'users' where each document contains: user_id (used as document ID), name, email, and friends (an array of user IDs). The friends array can grow large (thousands of IDs). When a user adds a friend, the application updates the array, causing the document to grow and leading to write contention and size limit warnings. The team needs to redesign the schema to scale better. What is the best approach?

A.Move the friends list to a subcollection under each user document.

B.Migrate user profiles and friendships to Cloud SQL for relational capabilities.

C.Limit the maximum size of the friends array to 1000 at the application level.

D.Create a new 'friendships' collection with documents containing user_id_1 and user_id_2 fields.

AnswerD

A separate collection for relationships scales well and avoids large documents.

Why this answer

Option C is correct. Using a separate top-level collection 'friendships' with documents representing pairs of users (or edges) scales well and avoids large documents. Option A (subcollection) is possible but is more suited for one-to-many relationships; for many-to-many, a separate collection is standard.

Option B (array with limit) is not a solution for growth. Option D is a different database, not a schema redesign.

Full explanation →

246

MCQeasy

A company is designing a star schema for a BI dashboard that tracks sales performance. The dashboard needs to aggregate sales by product, store, and date. Which schema design is most appropriate?

A.Store all data in a single table using nested JSON arrays for product and store details

B.Create a single wide table with all attributes (product, store, date, sales)

C.Create a fact table with foreign keys to dimension tables for product, store, and date

D.Use a fully normalized snowflake schema with separate tables for each level of hierarchy

AnswerC

A star schema with fact and dimension tables is the standard for BI reporting, enabling fast aggregations.

Why this answer

Option C is correct because a star schema uses a central fact table with foreign keys to dimension tables, which is optimal for BI aggregation queries. Option A is wrong because a single wide table with all attributes leads to data redundancy and slower queries. Option B is wrong because a fully normalized schema (e.g., snowflake) introduces extra joins that can slow BI queries.

Option D is wrong because storing data as JSON arrays in a single table is not suitable for efficient SQL aggregation.

Full explanation →

247

MCQeasy

You are monitoring a Cloud Spanner instance and see that the average commit latency is high. The application performs many single-row inserts. Which metric would you check first to understand the root cause?

A.Read latency.

B.Lock conflicts (e.g., Spanner/API/Lock_wait).

C.Storage utilization.

D.CPU utilization per node.

AnswerB

Lock conflicts directly indicate contention causing commit delays.

Why this answer

High commit latency for single-row inserts in Cloud Spanner is often caused by lock conflicts, as concurrent transactions may contend for the same row or index. The metric Spanner/API/Lock_wait directly measures time spent waiting for locks, making it the first metric to check. High commit latency without high lock wait suggests other issues, but lock conflicts are the most common root cause for write-heavy single-row workloads.

Exam trap

Google Cloud often tests the misconception that CPU or storage metrics are the first indicators of write performance issues, when in fact lock contention is the primary driver for high commit latency in transactional workloads.

How to eliminate wrong answers

Option A is wrong because read latency measures time for read operations, not write commit latency, and single-row inserts are writes. Option C is wrong because storage utilization affects capacity and cost, not commit latency directly; Spanner handles storage scaling automatically. Option D is wrong because CPU utilization per node indicates compute load but does not directly measure lock contention, which is the primary cause of high commit latency for single-row inserts.

Full explanation →

248

MCQmedium

A company is migrating an application from Datastore to Firestore in Datastore mode. They need to ensure zero downtime during the migration. What is the recommended approach?

A.Export all data from Datastore to Cloud Storage, then import into a new Firestore database.

B.Run both Datastore and Firestore in parallel, writing to both until migration is complete.

C.Create a new Firestore database and redirect traffic gradually.

D.Upgrade the existing Datastore project to Firestore in Datastore mode using the Cloud Console.

AnswerD

Firestore in Datastore mode is a seamless upgrade with full compatibility.

Why this answer

Option D is correct because upgrading an existing Datastore project to Firestore in Datastore mode via the Cloud Console is a built-in, one-way migration that preserves the existing database name, indexes, and data without requiring any export/import or application code changes. This process is designed to be a live upgrade with no downtime, as Firestore in Datastore mode is fully backward-compatible with the Datastore API, allowing existing queries and transactions to continue uninterrupted during the transition.

Exam trap

The trap here is that candidates often assume a migration requires an export/import or dual-write strategy, but Cisco tests the specific knowledge that Firestore in Datastore mode is a direct upgrade path from Datastore with zero downtime, not a separate service that needs data copied over.

How to eliminate wrong answers

Option A is wrong because exporting all data to Cloud Storage and then importing into a new Firestore database introduces significant downtime during the export and import operations, and it does not preserve the original database name or existing indexes without manual reconfiguration. Option B is wrong because running both Datastore and Firestore in parallel and writing to both is not a supported approach; there is no built-in mechanism to dual-write to Datastore and Firestore simultaneously, and this would require complex application-level changes and risk data inconsistency. Option C is wrong because creating a new Firestore database and redirecting traffic gradually still requires a cutover period where the application must be modified to point to the new database, and the existing Datastore data must be migrated separately, causing downtime or data staleness during the transition.

Full explanation →

249

MCQmedium

A company uses BigQuery for BI reporting. They have a table 'orders' with columns: order_id, customer_id, order_date, amount, status. The BI team frequently runs queries that filter on order_date and group by customer_id to compute total sales per customer. Which partitioning and clustering strategy optimizes query performance and cost?

A.Partition by order_date, cluster by status

B.Do not partition, cluster by customer_id

C.Partition by customer_id, cluster by order_date

D.Partition by order_date, cluster by customer_id

AnswerD

Partitioning on order_date prunes partitions for date filters; clustering on customer_id improves group by performance.

Why this answer

Option D is correct because partitioning by order_date allows BigQuery to prune partitions for queries filtering on order_date, reducing the amount of data scanned. Clustering by customer_id organizes data within each partition so that GROUP BY customer_id queries can efficiently read only relevant blocks, minimizing shuffle and cost. This combination directly aligns with the BI team's query pattern of filtering by date and aggregating by customer.

Exam trap

Google Cloud often tests the misconception that clustering alone is sufficient for performance, ignoring that partitioning is essential for date-range filters to avoid full table scans, or that clustering on a high-cardinality column like customer_id is ideal for GROUP BY but must be paired with a partition key that matches the filter pattern.

How to eliminate wrong answers

Option A is wrong because clustering by status does not optimize the GROUP BY on customer_id, and status is not used in filtering or grouping, so it provides no benefit for the described workload. Option B is wrong because without partitioning, queries filtering on order_date must scan the entire table, increasing cost and latency, even if clustering by customer_id helps the GROUP BY. Option C is wrong because partitioning by customer_id is not practical (high cardinality, many small partitions) and does not help date-range filtering, while clustering by order_date does not optimize the GROUP BY on customer_id.

Full explanation →

250

MCQeasy

You are a database engineer for an e-commerce company. The company uses BigQuery for its BI and analytics. The data pipeline stages raw event data into a table 'raw_events' with columns: event_id, user_id, event_time, event_type, and a JSON string 'event_data'. The BI team wants to query this data for user behavior analysis, but the JSON parsing makes queries slow. They need to perform frequent queries that extract specific fields from the JSON and filter by event_time. The table 'raw_events' is not partitioned and has 2 billion rows. What is the most effective single step to improve query performance and reduce cost?

A.Create a view that extracts JSON fields into columns

B.Partition the table on event_time and cluster on event_type

C.Increase BigQuery slots to maximum

D.Use a materialized view to precompute common queries

AnswerB

Partitioning reduces scanned data; clustering helps with event_type filters.

Why this answer

Partitioning the table on event_time allows BigQuery to prune entire partitions when queries filter by event_time, drastically reducing the amount of data scanned. Clustering on event_type further organizes data within each partition, enabling block-level pruning for queries that filter or aggregate by event_type. This combination directly addresses the slow JSON parsing and high cost by minimizing scanned bytes, which is the most effective single step for a 2-billion-row table.

Exam trap

Google Cloud often tests the misconception that a view or materialized view alone can solve performance issues, but the trap here is that without physical data reorganization (partitioning and clustering), the underlying full table scan and JSON parsing remain the bottleneck.

How to eliminate wrong answers

Option A is wrong because a view does not physically reorganize data; it only stores a query definition, so the underlying table still requires full scans and JSON parsing on every query, providing no performance or cost benefit. Option C is wrong because increasing BigQuery slots only improves concurrency and execution speed for compute-bound queries, but does not reduce the amount of data scanned; the bottleneck here is I/O from scanning billions of rows, not CPU. Option D is wrong because a materialized view would precompute results, but it still requires the base table to be partitioned and clustered to be efficient; without partitioning, the materialized view would need to scan the entire table on refresh, and it cannot dynamically prune partitions for ad-hoc filters on event_time.

Full explanation →

251

Multi-Selecteasy

A data engineering team wants to monitor BigQuery query performance and slot utilization. Which TWO tools or features should they use? (Choose two.)

Select 2 answers

A.INFORMATION_SCHEMA tables (e.g., JOBS_BY_PROJECT)

B.Cloud Monitoring (Metrics like 'bigquery.googleapis.com/scheduler/slot_allocation')

C.Cloud Logging (audit logs)

D.bq command-line tool with the '--format=prettyjson' flag

E.Cloud Profiler

AnswersA, B

INFORMATION_SCHEMA provides historical query performance and slot usage.

Why this answer

BigQuery's INFORMATION_SCHEMA provides query statistics and slot usage. Cloud Monitoring can alert on slot utilization. Cloud Logging stores audit logs but not detailed performance; the BigQuery CLI is for management; Cloud Profiler is for code profiling, not BigQuery slots.

Full explanation →

252

MCQhard

A company is migrating a legacy on-premises MySQL database to Cloud SQL for PostgreSQL. The database uses composite primary keys on multiple tables and heavily relies on cross-table joins with foreign keys. The team wants to minimize application code changes during migration. Which schema design strategy should the Cloud Database Engineer recommend to ensure compatibility and performance?

A.Maintain the same schema and rewrite joins as materialized views in PostgreSQL to optimize queries.

B.Use the same composite primary keys and foreign key constraints in Cloud SQL for PostgreSQL, leveraging its full support for these features.

C.Migrate to Cloud Spanner instead, using interleaved tables to replace join-heavy operations.

D.Remove composite primary keys and replace them with surrogate keys; use look-up tables for foreign key relationships.

AnswerB

Cloud SQL for PostgreSQL fully supports composite primary keys and foreign keys, minimizing application changes.

Why this answer

Option B is correct because Cloud SQL for PostgreSQL fully supports composite primary keys and foreign key constraints, which are standard SQL features. By maintaining the same schema, the team minimizes application code changes while preserving referential integrity and join performance, as PostgreSQL's query planner handles these constructs efficiently.

Exam trap

The trap here is that candidates assume cloud-native databases require schema redesign (e.g., denormalization or surrogate keys) for performance, but PostgreSQL's full SQL compliance often allows a direct lift-and-shift of composite keys and foreign keys without changes.

How to eliminate wrong answers

Option A is wrong because materialized views are not a direct replacement for joins; they store precomputed results and require manual refresh, which adds complexity and does not eliminate the need for application code changes to query the views instead of the original tables. Option C is wrong because migrating to Cloud Spanner would require significant schema redesign (e.g., denormalization into interleaved tables) and application code changes, contradicting the goal of minimizing changes. Option D is wrong because removing composite primary keys and replacing them with surrogate keys would break existing application logic that relies on composite keys for joins and lookups, requiring extensive code modifications.

Full explanation →

253

Multi-Selecteasy

Which two of the following are best practices when designing BigQuery schemas? (Choose two.)

Select 2 answers

A.Use column-level security to restrict access

B.Use denormalization to reduce the number of joins

C.Use the type RECORD for structured data

D.Use repeated fields to avoid joins when querying parent-child data

E.Use a single table for all data to simplify queries

AnswersB, D

Denormalization improves query performance by reducing joins.

Why this answer

Options A and C are correct best practices: using repeated fields to avoid joins (common for nested data) and using denormalization to reduce joins. Option B (column-level security) is a security feature, not a schema design best practice. Option D is incorrect because BigQuery encourages logical data models, not a single table.

Option E using RECORD is a way to implement nested structures but is not a general best practice by itself; repeated fields are more specific.

Full explanation →

254

MCQeasy

A company wants to store and analyze data with BigQuery. They have customers in Europe and need to comply with GDPR data residency requirements. What should they do to ensure data stays within the European Union?

A.Use the default dataset location and rely on BigQuery's automatic compliance.

B.Create the dataset with the location set to 'EU'.

C.Set the query job location to 'EU' for all queries.

D.Choose a dataset location of 'us-central1' for performance.

AnswerB

Dataset location determines where data is stored.

Why this answer

Option B is correct because BigQuery enforces data residency at the dataset level. When you create a dataset with the location set to 'EU', all tables, views, and data within that dataset are physically stored in a Google Cloud region within the European Union. This ensures compliance with GDPR data residency requirements, as data will not be moved or replicated outside the EU without explicit user action.

Exam trap

Google Cloud often tests the misconception that query job location controls data residency, when in fact it only controls query processing location, while data storage is determined by the dataset location.

How to eliminate wrong answers

Option A is wrong because the default dataset location is 'US', which stores data in the United States and does not automatically comply with GDPR data residency requirements; BigQuery does not have an 'automatic compliance' feature. Option C is wrong because setting the query job location to 'EU' only affects where the query is processed, not where the underlying data is stored; data could still reside in a non-EU dataset. Option D is wrong because 'us-central1' is a location in the United States, which would violate GDPR data residency requirements by storing data outside the European Union.

Full explanation →

255

Multi-Selecthard

Which TWO optimizations best address slow join performance caused by excessive broadcasting in BigQuery? (Choose two.)

Select 2 answers

A.Use a large query timeout.

B.Set the dimension table to be very large to prevent broadcast.

C.Increase the number of slots.

D.Use a materialized view that pre-joins the tables.

E.Cluster the fact table on the join key.

AnswersD, E

Materialized views avoid runtime joins.

Why this answer

Option D is correct because a materialized view can pre-compute and store the join result, eliminating the need to re-execute the join at query time. This avoids the overhead of broadcasting the dimension table repeatedly, as the materialized view is incrementally refreshed and queried directly, reducing both shuffle and broadcast costs.

Exam trap

Google Cloud often tests the misconception that increasing resources (slots or timeout) or making a table larger can fix join performance issues, when the correct approach is to restructure the data or use pre-computed results like materialized views.

Full explanation →

256

MCQhard

A company's application using Cloud Bigtable is experiencing high read latency. The row keys are based on a timestamp prefix. Which design change is most likely to improve performance?

A.Increase the number of column families.

B.Use a single node cluster.

C.Use monotonically increasing row keys.

D.Reverse the timestamp in row keys to distribute writes.

AnswerD

Reversing the timestamp spreads writes across tablets, reducing hot spots and improving read latency.

Why this answer

In Cloud Bigtable, monotonically increasing row keys (like timestamps) cause all writes to hit a single tablet server, creating a hotspot that degrades read and write performance. Reversing the timestamp (e.g., using `Long.MAX_VALUE - timestamp`) distributes writes across the key space, preventing hotspots and reducing read latency by balancing load across nodes.

Exam trap

Google Cloud often tests the misconception that adding more nodes or column families solves performance issues, when the real fix is designing row keys to avoid hotspots by distributing writes evenly across the key space.

How to eliminate wrong answers

Option A is wrong because increasing column families does not address the root cause of hotspotting from sequential row keys; column families affect storage and schema design, not write distribution. Option B is wrong because a single node cluster would exacerbate the problem by concentrating all traffic on one node, increasing latency further. Option C is wrong because monotonically increasing row keys are exactly the pattern that causes hotspotting in Bigtable; this option describes the problematic behavior, not a fix.

Full explanation →

257

MCQeasy

A Dataflow streaming pipeline that writes to a BigQuery table fails with the error above. Which change should be made to the table schema to prevent this error?

A.Add a clustering column

B.Partition the table by ingestion time

C.Increase the streaming buffer size in the table definition

D.Change the table to use a wildcard table pattern

AnswerB

Partitioning spreads writes across multiple partition buffers, preventing overflow.

Why this answer

Partitioning the table by ingestion time (e.g., _PARTITIONTIME) distributes the streaming buffer across multiple partitions, avoiding the per-partition buffer limit. Increasing the buffer size is a workaround but not a schema change. Clustering does not affect the streaming buffer.

Using a wildcard table is unrelated.

Full explanation →

258

MCQhard

A Cloud SQL for MySQL instance is configured with 8 vCPUs and 30GB RAM. The Database Advisor suggests adding an index on a table, but after adding, write performance degrades. The table has 10 million rows and receives 500 writes per second. What is the most likely reason?

A.The instance needs more CPU.

B.The index is too large for memory.

C.The index is not used for reads.

D.The index is on a highly selective column causing write amplification.

AnswerD

High selectivity means many unique values, requiring many index page splits and updates, increasing write latency.

Why this answer

Option D is correct because adding an index on a highly selective column (e.g., a column with many unique values) forces MySQL to update the B-tree index structure on every write operation. With 500 writes per second and 10 million rows, each INSERT or UPDATE must maintain the index, causing write amplification that degrades write throughput. The 8 vCPUs and 30GB RAM are sufficient for the workload, but the index maintenance overhead becomes the bottleneck.

Exam trap

Google Cloud often tests the misconception that write degradation after adding an index is caused by insufficient resources (CPU or memory), when the real issue is the overhead of index maintenance on a highly selective column, known as write amplification.

How to eliminate wrong answers

Option A is wrong because the instance has 8 vCPUs, which is typically adequate for 500 writes per second; the degradation is due to index maintenance, not CPU starvation. Option B is wrong because the index size depends on the column data type and row count, but 30GB RAM is ample for a single index on a 10-million-row table; the issue is write amplification, not memory pressure. Option C is wrong because whether the index is used for reads is irrelevant to write performance degradation; the problem is the overhead of maintaining the index during writes, not its read usage.

Full explanation →

259

MCQmedium

You are designing a BigQuery schema for IoT sensor data. The sensor readings have varying fields depending on the sensor type. You want to minimize storage costs and avoid schema maintenance when new sensor types are added. What is the best schema design?

A.Use a separate table per sensor type

B.Store the sensor data in a JSON column

C.Use a schema with a STRUCT containing all possible fields as optional

D.Use a wide table with many nullable columns

AnswerB

JSON provides schema flexibility and cost-effective storage for varying fields.

Why this answer

Option A is correct: using a JSON column allows flexible schema without requiring ALTER TABLE when new fields appear; BigQuery efficiently stores JSON and can query it with standard SQL. Option B requires creating new tables for each sensor type, increasing maintenance. Option C and D are wasteful because many columns will be NULL for most rows.

Full explanation →

260

Multi-Selecthard

In Cloud Spanner, which TWO strategies can help reduce read latency for globally distributed applications?

Select 2 answers

A.Enable follower reads

B.Use stale reads

C.Use a multi-region configuration with a default leader

D.Use database partitioning

E.Increase the number of nodes

AnswersA, B

Follower reads can serve reads from the nearest replica.

Why this answer

Option A is correct because follower reads allow read requests to be served by any replica in a Cloud Spanner multi-region configuration, not just the leader replica. This reduces read latency by enabling reads to be processed from a replica that is geographically closer to the client, without incurring the round-trip time to the leader. Follower reads are particularly effective for globally distributed applications where read consistency can be slightly relaxed.

Exam trap

Google Cloud often tests the misconception that any multi-region configuration automatically reduces read latency, when in fact you must explicitly enable follower reads or stale reads to avoid routing reads to the leader replica.

Full explanation →

261

MCQhard

A financial company uses Cloud SQL for PostgreSQL to store transaction data. They need to create a materialized view that aggregates daily sales for a BI dashboard. The underlying transaction table is updated continuously. Which approach ensures the materialized view remains up to date without manual intervention?

A.Use BigQuery federated query to directly query the Cloud SQL table

B.Use a Cloud SQL read replica and create the materialized view on the replica

C.Schedule a Cloud Function via Cloud Scheduler to run REFRESH MATERIALIZED VIEW periodically

D.Add a trigger on the base table to refresh the materialized view on each update

AnswerC

This provides automated periodic refreshes without manual effort.

Why this answer

Option C is correct because Cloud SQL for PostgreSQL does not support automatic materialized view refresh. The only way to keep a materialized view up to date without manual intervention is to schedule a periodic refresh using Cloud Scheduler to invoke a Cloud Function that executes the REFRESH MATERIALIZED VIEW command. This approach balances freshness with resource cost, as refreshing on every transaction would be too expensive.

Exam trap

Google Cloud often tests the misconception that materialized views in PostgreSQL can be automatically refreshed via triggers or that read replicas support materialized view creation, leading candidates to pick options that ignore the fundamental write-lock and replication limitations of Cloud SQL for PostgreSQL.

How to eliminate wrong answers

Option A is wrong because BigQuery federated queries read the Cloud SQL table directly without creating a materialized view, so they do not provide the pre-aggregated, fast-query performance that a materialized view offers, and they still incur query-time overhead. Option B is wrong because a Cloud SQL read replica is a read-only copy of the database; you cannot create a materialized view on a replica because PostgreSQL does not support materialized views on replicas (they require write access to store the view data). Option D is wrong because adding a trigger to refresh the materialized view on each update would cause severe performance degradation and is not supported in Cloud SQL for PostgreSQL—triggers cannot execute REFRESH MATERIALIZED VIEW directly, and even if they could, the overhead of refreshing on every row change would be prohibitive.

Full explanation →

262

MCQmedium

You are running a Memorystore for Redis instance with a high write volume. You notice that the eviction rate is high and the cache hit ratio has dropped significantly. Which configuration change would most directly reduce the eviction rate?

A.Use many small keys instead of a few large keys.

B.Decrease the maxmemory setting to force earlier eviction.

C.Set maxmemory-policy to noeviction.

D.Change maxmemory-policy to allkeys-lru.

AnswerD

LRU eviction removes least recently used keys, which can improve cache efficiency.

Why this answer

Option D is correct because setting `maxmemory-policy` to `allkeys-lru` tells Redis to evict the least recently used keys from the entire keyspace when memory is full. This directly reduces the eviction rate by ensuring that only the least accessed data is removed, preserving the most frequently accessed data and improving the cache hit ratio. In a high-write-volume scenario, this policy adapts to access patterns and minimizes unnecessary evictions of hot data.

Exam trap

The trap here is that candidates often confuse 'reducing eviction rate' with 'preventing eviction entirely' and choose `noeviction`, not realizing that `noeviction` causes write failures instead of reducing evictions, while `allkeys-lru` actively manages memory to keep evictions low by targeting only cold keys.

How to eliminate wrong answers

Option A is wrong because using many small keys instead of a few large keys does not reduce the eviction rate; it may actually increase memory overhead due to per-key metadata and does not address the root cause of memory pressure. Option B is wrong because decreasing the `maxmemory` setting forces earlier eviction, which would increase the eviction rate, not reduce it. Option C is wrong because setting `maxmemory-policy` to `noeviction` prevents any eviction, causing write operations to fail with an OOM error when memory is full, which does not reduce the eviction rate but instead breaks the application.

Full explanation →

263

MCQeasy

A financial BI application stores monetary values such as revenue and tax amounts. Which BigQuery data type should be used to ensure accuracy in calculations?

A.Use STRING and parse numbers as needed

B.Use INT64 and store amounts in cents

C.Use FLOAT64

D.Use NUMERIC or BIGNUMERIC

AnswerD

Exact numeric types guarantee precision for decimals, essential for financial data.

Why this answer

Option D is correct because NUMERIC and BIGNUMERIC are exact numeric types with fixed precision and scale, designed to avoid floating-point rounding errors. In BigQuery, monetary calculations require exact decimal arithmetic, and these types provide up to 38 (NUMERIC) or 76 (BIGNUMERIC) digits of precision, ensuring accuracy for revenue and tax computations.

Exam trap

Google Cloud often tests the misconception that FLOAT64 is acceptable for financial data because it handles decimals, but the trap is that floating-point arithmetic is inherently imprecise for exact monetary calculations, leading to subtle rounding errors that fail audit requirements.

How to eliminate wrong answers

Option A is wrong because storing monetary values as STRING forces parsing on every query, introduces conversion overhead, and loses the ability to perform direct arithmetic operations without explicit casting, which is inefficient and error-prone. Option B is wrong because storing amounts in cents as INT64, while avoiding floating-point issues, requires manual scaling and can overflow for large values (e.g., billions of dollars in cents exceed INT64 max of ~9.2e18) and complicates tax calculations involving fractions of a cent. Option C is wrong because FLOAT64 is a floating-point type that introduces binary rounding errors (e.g., 0.1 + 0.2 != 0.3), which can cause cumulative inaccuracies in financial calculations and violate accounting standards.

Full explanation →

264

Drag & Dropmedium

Order the steps to perform a disaster recovery drill for a Cloud Spanner database using backups.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

Backup first, then restore to another region, verify, update apps, test.

Full explanation →

265

Multi-Selecthard

Which THREE methods are effective for improving query performance in BigQuery for BI workloads?

Select 3 answers

A.Clustering on frequently filtered columns

B.Replacing joins with subqueries

C.Partitioning on a date column

D.Using SELECT * in queries

E.Using pre-aggregated summary tables

AnswersA, C, E

Clustering allows BigQuery to skip reading blocks that don't match filter conditions.

Why this answer

Option A is correct because clustering on frequently filtered columns physically co-locates related data within blocks, significantly reducing the amount of data scanned for queries with filter predicates. This is especially effective for BI workloads that often filter on high-cardinality columns like customer ID or transaction type, as it avoids full table scans and improves query performance without additional storage costs.

Exam trap

Google Cloud often tests the misconception that subqueries are always more efficient than joins, but in BigQuery, joins are optimized for distributed processing while subqueries can cause performance degradation due to lack of parallelism.

Full explanation →

266

Multi-Selectmedium

A Cloud SQL for PostgreSQL instance is experiencing high replication lag between primary and read replica. Which TWO actions would reduce the lag?

Select 2 answers

A.Reduce the value of `max_wal_size`.

B.Increase the replica's machine size.

C.Place the replica in the same zone as the primary.

D.Use a higher number of read replicas.

E.Enable synchronous replication.

AnswersB, C

More CPU and memory on the replica help it apply WAL more quickly, reducing lag.

Why this answer

Increasing the replica's machine size allows it to apply WAL faster, and placing the replica in the same zone reduces network latency. Other options either increase lag or are ineffective.

Full explanation →

267

Multi-Selecthard

Which THREE techniques can improve query performance in BigQuery for BI workloads? (Choose three.)

Select 3 answers

A.Use approximate aggregation functions when exact results are not required.

B.Avoid SELECT * in production queries; select only needed columns.

C.Use SELECT * with LIMIT to preview data.

D.Use ORDER BY on large result sets without LIMIT.

E.Cluster the table on columns frequently used in WHERE clauses.

AnswersA, B, E

Approximate functions use less memory and are faster.

Why this answer

Option A is correct because approximate aggregation functions (e.g., APPROX_COUNT_DISTINCT, APPROX_QUANTILES) in BigQuery use HyperLogLog++ algorithms to return near-exact results with significantly lower resource consumption and faster execution. For BI workloads where exact precision is not critical (e.g., dashboard approximations), this reduces query cost and latency.

Exam trap

Google Cloud often tests the misconception that SELECT * with LIMIT is a performance optimization, when in fact it still incurs full column scan costs, and that ORDER BY without LIMIT is acceptable for large datasets, ignoring BigQuery's requirement for a LIMIT clause to enable distributed sorting.

Full explanation →

268

Multi-Selecthard

A financial services company needs to design a BigQuery data model for real-time fraud detection. Data arrives from multiple streaming sources and must be joined with historical customer profiles (10 TB) and transaction lookup tables (500 GB). Which TWO design considerations are most important to minimize query latency and cost?

Select 2 answers

A.Use time-based partitioning on the historical customer table and cluster on customer_id.

B.Partition streaming data by ingestion time and cluster by customer_id and transaction_type.

C.Schedule a nightly script to recluster tables based on query patterns.

D.Use a single table for all streaming data without partitioning to avoid partition management overhead.

E.Denormalize all historical and lookup data into a single wide table.

AnswersA, B

Time-based partitioning reduces scan for recent customers, and clustering on join key speeds up the join.

Why this answer

Option A is correct because time-based partitioning on the historical customer table (10 TB) allows BigQuery to prune irrelevant partitions during queries, reducing the amount of data scanned and thus lowering cost and latency. Clustering on customer_id further optimizes joins with streaming data by colocating related rows, minimizing shuffle overhead.

Exam trap

Google Cloud often tests the misconception that manual reclustering is required for performance, when in fact BigQuery's automatic reclustering handles it transparently, and that denormalization is always beneficial for joins, ignoring the storage and maintenance costs in large-scale systems.

Full explanation →

269

MCQmedium

A DBA notices high query cache lock contention on this Cloud SQL for MySQL instance. Which configuration change should be recommended?

A.Set query_cache_type=0

B.Decrease max_connections

C.Increase innodb_buffer_pool_size

D.Increase query_cache_size

AnswerA

Disabling query cache removes lock contention entirely.

Why this answer

Query cache lock contention occurs when multiple threads try to access the query cache simultaneously, causing serialization. Setting `query_cache_type=0` disables the query cache entirely, eliminating the lock contention. This is the recommended fix because MySQL 8.0 deprecated the query cache due to scalability issues, and Cloud SQL for MySQL instances benefit from removing this bottleneck.

Exam trap

Google Cloud often tests the misconception that increasing the size of a cache or buffer always improves performance, but here increasing `query_cache_size` exacerbates lock contention, making the problem worse.

How to eliminate wrong answers

Option B is wrong because decreasing `max_connections` reduces the number of concurrent connections but does not address the internal locking mechanism of the query cache; contention is a lock-level issue, not a connection-level one. Option C is wrong because increasing `innodb_buffer_pool_size` improves InnoDB data caching and reduces disk I/O, but it does not affect query cache locks, which are a separate memory structure. Option D is wrong because increasing `query_cache_size` only allocates more memory to the query cache, which can actually worsen contention by increasing the time spent scanning or invalidating cache entries under high concurrency.

Full explanation →

270

MCQmedium

Your Bigtable cluster is showing high read latency for row key lookups. The application accesses rows with keys in the format 'user_id#timestamp'. You notice that most reads are for recent timestamps. Which optimization should you implement?

A.Increase the number of nodes in the cluster

B.Reverse the row key order to start with the timestamp

C.Configure single cluster routing to reduce cross-cluster latency

D.Use a scan with a prefix filter instead of point reads

AnswerB

This distributes recent writes/reads across tablet servers.

Why this answer

High read latency for recent timestamps occurs because Bigtable stores rows in lexicographic order by row key. With the format 'user_id#timestamp', older timestamps appear first, causing recent data to be scattered across tablets and requiring more seeks. Reversing the row key to start with the timestamp (e.g., 'timestamp#user_id') groups recent data together in contiguous tablets, enabling faster point lookups and reducing latency.

Exam trap

Google Cloud often tests the misconception that scaling nodes (Option A) solves all performance issues, but the trap here is that row key design directly impacts data locality and latency, making key ordering the primary optimization for time-based access patterns.

How to eliminate wrong answers

Option A is wrong because increasing nodes improves throughput and load distribution but does not address the root cause of scattered recent data due to row key ordering; it may even increase latency due to additional network hops. Option C is wrong because single cluster routing reduces cross-cluster latency in multi-cluster setups, but the question does not mention multiple clusters; the latency issue is within a single cluster due to row key design. Option D is wrong because using a scan with a prefix filter is less efficient than point reads for known row keys; it scans unnecessary rows and increases latency, whereas the goal is to optimize point lookups for recent timestamps.

Full explanation →

271

Matchingmedium

Match each Cloud SQL high-availability feature to its description.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Synchronous replication across two zones

Standby instance in a different zone for automatic failover

Asynchronous replica for read offloading

Promotion of standby on primary failure

Point-in-time recovery and disaster recovery

Why these pairings

These features ensure availability and durability for Cloud SQL instances.

Full explanation →

272

MCQhard

The query above fails with 'Resources exceeded: UDF out of memory' on a large table. What is the best way to fix this?

A.Rewrite the function as a SQL UDF to avoid JavaScript overhead

B.Add a GROUP BY clause to reduce the number of rows processed

C.Convert the temporary UDF to a persistent UDF

D.Increase the memory allocation for JavaScript UDFs

AnswerA

SQL UDFs run natively in BigQuery's execution engine and do not have the same memory constraints.

Why this answer

Option A is correct because JavaScript UDFs in BigQuery run in a sandbox with limited memory (typically 6 MB per UDF instance). When processing a large table, the UDF may exceed this memory due to per-row overhead or large intermediate results. Rewriting the function as a SQL UDF eliminates JavaScript overhead and runs natively within BigQuery's distributed execution engine, which can handle larger datasets without memory constraints.

Exam trap

Google Cloud often tests the misconception that memory errors in UDFs can be fixed by increasing resources or changing UDF persistence, when the real limitation is the fixed JavaScript sandbox memory that can only be avoided by using SQL UDFs.

How to eliminate wrong answers

Option B is wrong because adding a GROUP BY clause does not reduce the number of rows processed by the UDF; it only aggregates results after the UDF runs, so the memory issue persists. Option C is wrong because converting a temporary UDF to a persistent UDF does not change the execution environment or memory limits; both types of JavaScript UDFs share the same sandbox memory constraints. Option D is wrong because BigQuery does not allow users to increase memory allocation for JavaScript UDFs; the sandbox memory is fixed and cannot be adjusted.

Full explanation →

273

Multi-Selecteasy

A company is building a microservices architecture and needs to choose a database per service. Which THREE factors should they consider?

Select 3 answers

A.Consistency requirements

B.Preferred programming language

C.Expected query patterns

D.Data size and growth rate

E.Number of developers

AnswersA, C, D

Different services may need strong or eventual consistency, affecting database choice (e.g., Spanner vs. Bigtable).

Why this answer

Consistency requirements are critical because different databases offer varying consistency models (e.g., ACID vs. BASE). For example, a financial transaction service requiring strong consistency would need a relational database like PostgreSQL, while a social media feed could tolerate eventual consistency with a NoSQL database like Cassandra.

Choosing based on consistency ensures data integrity aligns with business needs.

Exam trap

Google Cloud often tests the misconception that programming language or team size should drive database choice, but the correct focus is on data-specific factors like consistency, query patterns, and scalability.

Full explanation →

274

MCQhard

A BI team is designing a BigQuery table for a sales dashboard that queries daily sales by product category and region. The dashboard often filters on a specific date range and a specific region. Which combination of partitioning and clustering should be used?

A.Partition by region, cluster by date

B.Use only clustering on date and region without partitioning

C.Partition by date, cluster by region

D.Partition by month, cluster by date

AnswerC

Partitioning by date fine-tunes the scan to the date range; clustering by region organizes data to skip irrelevant blocks.

Why this answer

Partitioning by date (e.g., on a DATE or TIMESTAMP column) allows BigQuery to prune entire partitions when the dashboard filters on a specific date range, reducing the amount of data scanned. Clustering by region then sorts the data within each partition by region, enabling efficient block-level pruning when the dashboard filters on a specific region. This combination optimizes both the date range and region filters, which are the most common query patterns for this sales dashboard.

Exam trap

Google Cloud often tests the misconception that partitioning can be applied to any column type (like region) or that clustering alone is sufficient for date range filtering, leading candidates to overlook the mandatory requirement that partitioning must be on a DATE, TIMESTAMP, or integer column and that clustering complements but does not replace partitioning for range-based pruning.

How to eliminate wrong answers

Option A is wrong because partitioning by region is not supported in BigQuery (partitioning only supports DATE, TIMESTAMP, or integer columns, not string columns like region), and clustering by date would not provide the same pruning benefit for date range filters as partitioning by date does. Option B is wrong because using only clustering without partitioning means every query must scan all partitions (i.e., the entire table), even when filtering on a date range, leading to higher costs and slower performance compared to a partitioned table. Option D is wrong because partitioning by month is too coarse for a dashboard that often filters on a specific date range (e.g., a few days or weeks), resulting in scanning entire monthly partitions even when only a few days are needed, and clustering by date within a monthly partition is redundant since date is already the partition key.

Full explanation →

275

MCQmedium

A data engineer runs a BigQuery query that joins a large fact table with a small lookup table. The query processes 1 TB of data and takes 30 seconds. The engineer wants to reduce the amount of data processed. Which optimization technique is MOST effective?

A.Increase the number of slots available for the query.

B.Use a WITH clause to pre-filter the fact table before joining.

C.Cluster the lookup table on the join key.

D.Materialize the lookup table as a separate table with the same data.

AnswerB

Pre-filtering reduces the amount of data from the fact table that needs to be joined.

Why this answer

Option B is correct because pre-filtering the fact table with a WITH clause (CTE) reduces the amount of data scanned and processed before the join occurs. Since the fact table is large (1 TB), applying filters early minimizes the data shuffled and joined, directly reducing the bytes billed in BigQuery. This is a form of predicate pushdown that leverages BigQuery's columnar storage and dynamic query optimization.

Exam trap

The trap here is that candidates confuse query performance (speed) with data processed (cost), often choosing to increase slots (Option A) which only reduces elapsed time but does not lower the bytes billed.

How to eliminate wrong answers

Option A is wrong because increasing slots only speeds up query execution (reduces elapsed time) but does not reduce the amount of data processed; the query still scans 1 TB. Option C is wrong because clustering the lookup table on the join key improves join performance by reducing shuffle, but the lookup table is already small, so the impact on data processed is negligible; the bottleneck is the large fact table. Option D is wrong because materializing the lookup table as a separate table with the same data does not change the amount of data processed; it only duplicates storage without reducing the fact table scan.

Full explanation →

276

MCQmedium

A Cloud SQL for MySQL instance requires a patch update. How can you minimize downtime during the update?

A.Apply the patch without restart

B.Use database migration to a new instance

C.Failover to a read replica during the update

D.Schedule the maintenance during a maintenance window

AnswerD

Scheduling maintenance allows Cloud SQL to perform rolling updates with minimal downtime, especially with HA enabled.

Why this answer

Option D is correct because Cloud SQL for MySQL uses a maintenance window to schedule patching during a period of low traffic, minimizing user impact. While a restart is typically required for patch application, the maintenance window allows you to control when that restart occurs, reducing downtime exposure. This is the standard recommended approach for managed database patching in Google Cloud.

Exam trap

Google Cloud often tests the misconception that read replicas can be used for failover during patching, but in Cloud SQL for MySQL, read replicas do not support automatic failover for patching—only regional failover replicas in a high-availability configuration can, and even then, the maintenance window is the key to minimizing downtime.

How to eliminate wrong answers

Option A is wrong because Cloud SQL for MySQL patch updates typically require a restart of the instance to apply system-level changes, and there is no 'apply without restart' feature for such updates. Option B is wrong because database migration to a new instance involves creating and copying data to a fresh instance, which incurs significant downtime during the migration process and is not a minimal-downtime approach for a simple patch. Option C is wrong because Cloud SQL for MySQL read replicas are not used for failover during patching; they are for read scaling and high availability, but the primary instance must still be patched, and failover to a replica does not eliminate the need for a restart on the primary.

Full explanation →

277

Multi-Selectmedium

A Cloud Database Engineer is designing a schema for an e-commerce application on Cloud Spanner. The application requires high read throughput for product queries by category and price range, and must support global scale with strong consistency. The team is considering primary key design and interleaved tables. Which TWO design considerations should the engineer apply? (Choose TWO.)

Select 2 answers

A.Define secondary indexes on price and category columns to support range queries without considering the primary key design.

B.Use a timestamp as the first part of the primary key to enable time-based partitioning and efficient range scans.

C.Define interleaved tables for all related entities, even if they are not always accessed together, to reduce joins.

D.Use a primary key that starts with the category column to colocate product data for efficient queries by category.

E.Create an interleaved table for product variants under the product table, since variants are always queried with the parent product.

AnswersD, E

Leading with category allows Spanner to distribute rows by category, improving locality for queries filtering by category.

Why this answer

Option D is correct because colocating product data by category in the primary key enables efficient range scans on category and price, as Cloud Spanner stores rows in sorted order by primary key. This design minimizes cross-node fan-out for queries filtering by category, directly supporting high read throughput at global scale with strong consistency.

Exam trap

Google Cloud often tests the misconception that secondary indexes are a universal solution for query performance, ignoring that primary key design and interleaved tables are critical for colocation and avoiding cross-node fan-out in globally distributed databases like Cloud Spanner.

Full explanation →

278

MCQeasy

In BigQuery, a BI analyst wants to store financial data with high precision and avoid rounding errors. Which data type should be used for currency columns?

A.NUMERIC

B.FLOAT64

C.INT64

D.STRING

AnswerA

NUMERIC is a fixed-point decimal type designed for financial precision.

Why this answer

NUMERIC (also known as DECIMAL) is the correct choice because it stores exact numeric values with up to 38 digits of precision and a user-defined scale, making it ideal for financial data where rounding errors from binary floating-point representation are unacceptable. In BigQuery, NUMERIC uses fixed-point arithmetic, ensuring that calculations like tax or interest accruals remain exact to the specified decimal places.

Exam trap

Google Cloud often tests the misconception that FLOAT64 is acceptable for currency because it 'has enough precision,' but the trap is that binary floating-point types inherently cannot represent many decimal fractions exactly, causing cumulative rounding errors in financial data.

How to eliminate wrong answers

Option B is wrong because FLOAT64 is a binary floating-point type that approximates values, leading to rounding errors in financial calculations due to its base-2 representation (e.g., 0.1 cannot be represented exactly). Option C is wrong because INT64 stores only whole integers, losing the fractional cents required for currency columns. Option D is wrong because STRING stores text, not numeric values, and would require costly and error-prone conversions for any arithmetic operations.

Full explanation →

279

Multi-Selecteasy

A company is planning to migrate their on-premises Oracle database to Cloud SQL. Which THREE prerequisites must be satisfied?

Select 3 answers

A.Convert Oracle-specific syntax to PostgreSQL or MySQL

B.Purchase Cloud SQL Enterprise Plus edition for the target instance

C.Ensure the source database is compatible with Database Migration Service (DMS)

D.Set up VPC peering or VPN to connect on-premises to Google Cloud

E.Create a Cloud Storage bucket for staging migration data

AnswersA, C, D

Database Migration Service handles schema conversion, but manual tuning may be needed.

Why this answer

Option A is correct because Cloud SQL does not support Oracle's proprietary PL/SQL syntax. When migrating from Oracle to Cloud SQL, you must convert Oracle-specific syntax (e.g., sequences, packages, hierarchical queries) to the target dialect—either PostgreSQL or MySQL—since Cloud SQL offers only these two engines. This conversion is a prerequisite to ensure the migrated database functions correctly after the move.

Exam trap

Google Cloud often tests the misconception that Cloud SQL supports Oracle as a native engine, leading candidates to overlook the mandatory syntax conversion, or that a staging bucket is always required when DMS can perform direct migration without intermediate storage.

Full explanation →

280

MCQeasy

A developer is designing a schema for Firestore to store user profiles. Each user has a unique ID and multiple addresses. Which data modeling approach is recommended for Firestore?

A.Store addresses as a string array in the user document.

B.Use a relational join between users and addresses collection.

C.Create a separate collection for addresses with a reference to user ID.

D.Store addresses as a nested map within the user document.

AnswerD

Nested maps are ideal for one-to-few relationships and minimize reads.

Why this answer

Firestore encourages denormalization for one-to-few relationships. Storing addresses as a nested map within the user document (Option A) is efficient for small, fixed sets of addresses. Option B (separate collection) is for large or dynamic lists.

Option C is not possible in Firestore. Option D (string array) loses structure.

Full explanation →

281

MCQeasy

A team is migrating an on-premises MySQL database to Cloud SQL. The current schema usesMyISAM tables. What is the recommended approach?

A.Keep the schema as is; Cloud SQL supports MyISAM.

B.Convert MyISAM tables to InnoDB before migration.

C.Replicate the on-premises MySQL to Cloud SQL using Database Migration Service.

D.Export the database using mysqldump and import directly into Cloud SQL.

AnswerB

InnoDB is the default and recommended engine; conversion ensures compatibility and transactional support.

Why this answer

Option B is correct because Cloud SQL supports InnoDB which is the recommended engine for transactional workloads. MyISAM is not supported on Cloud SQL. Option A is wrong because MyISAM is not supported.

Option C is wrong because simply moving the dump would fail. Option D is wrong because that would lose data integrity.

Full explanation →

282

MCQeasy

You are monitoring a Cloud SQL for PostgreSQL instance and notice that the CPU utilization is consistently above 90% during peak hours. What is the most cost-effective initial action to reduce CPU load?

A.Remove unnecessary indexes to reduce write overhead

B.Reduce the max_connections parameter to limit concurrent users

C.Add a read replica to offload read traffic

D.Increase the machine type (e.g., go to a higher vCPU count)

AnswerC

Read replicas handle read-only queries, reducing primary CPU load.

Why this answer

Adding a read replica is the most cost-effective initial action because it offloads read queries from the primary instance, reducing CPU contention without incurring the cost of upgrading the entire instance. Cloud SQL for PostgreSQL replicas use native streaming replication, so they handle read traffic while the primary focuses on writes, directly addressing high CPU utilization from read-heavy workloads.

Exam trap

Google Cloud often tests the misconception that vertical scaling (increasing machine type) is the default fix for high CPU, but the trap here is that adding a read replica is more cost-effective for read-heavy workloads, as it scales horizontally without upgrading the primary instance.

How to eliminate wrong answers

Option A is wrong because removing indexes typically reduces write overhead but does not significantly lower CPU utilization from read-heavy workloads; indexes often speed up reads, and removing them could increase CPU usage due to full table scans. Option B is wrong because reducing max_connections limits concurrent users but does not address the root cause of high CPU from read queries; it may cause application errors without reducing CPU load if queries are already queued. Option D is wrong because increasing the machine type (higher vCPU count) is a vertical scaling approach that is more expensive than adding a read replica and does not leverage the cost efficiency of distributing read traffic across multiple instances.

Full explanation →

283

MCQmedium

A data engineering team ingests JSON logs into BigQuery using a streaming pipeline. Queries need to extract specific fields from nested arrays. Which SQL construct should be used to efficiently transform the nested data into a flat table for BI?

A.ARRAY_AGG with STRUCT

B.STRUCT with nested field access

C.SELECT * EXCEPT with UNNEST

D.UNNEST with CROSS JOIN

AnswerD

UNNEST flattens arrays into rows, allowing access to nested fields.

Why this answer

Option D is correct because `UNNEST` with `CROSS JOIN` is the standard SQL construct in BigQuery to flatten nested arrays (repeated fields) into a flat table. When JSON logs contain arrays of structs, `CROSS JOIN UNNEST(array_column)` expands each array element into its own row, allowing BI tools to access individual fields directly. This is the most efficient and idiomatic way to transform nested data into a relational format for querying.

Exam trap

Google Cloud often tests the confusion between aggregation (`ARRAY_AGG`) and unnesting (`UNNEST`), where candidates mistakenly think `ARRAY_AGG` can flatten data because it deals with arrays, but it actually does the reverse operation.

How to eliminate wrong answers

Option A is wrong because `ARRAY_AGG` with `STRUCT` does the opposite—it aggregates rows into nested arrays, not flattens them. Option B is wrong because `STRUCT` with nested field access only retrieves scalar values from a single struct, not from array elements, and cannot unnest multiple rows. Option C is wrong because `SELECT * EXCEPT` is used to exclude columns from a SELECT *, not to flatten arrays; it does not involve `UNNEST` in a meaningful way for array expansion.

Full explanation →

284

MCQmedium

A company is migrating an on-premises PostgreSQL database to Cloud SQL. The database is 2 TB and must have minimal downtime during migration. Which migration strategy should the Database Engineer recommend?

A.Export the database using pg_dump and import into Cloud SQL using pg_restore during a maintenance window.

B.Set up PostgreSQL on Compute Engine and replicate data, then switch over to Cloud SQL.

C.Use Database Migration Service to set up continuous replication from the on-premises database to Cloud SQL, then perform a minimal-downtime cutover.

D.Use Cloud SQL's built-in replication feature to connect directly to the on-premises database.

AnswerC

Database Migration Service supports minimal-downtime migrations using continuous replication.

Why this answer

Option C is correct because the Database Migration Service (DMS) supports continuous replication from an on-premises PostgreSQL database to Cloud SQL using logical replication (pglogical or native PostgreSQL logical replication). This allows the source database to remain operational during the migration, and the cutover to Cloud SQL can be performed with minimal downtime, typically seconds to minutes, meeting the 2 TB size and low-downtime requirements.

Exam trap

Google Cloud often tests the distinction between tools that require downtime (like pg_dump/pg_restore) and services that support continuous replication (like DMS), and the trap here is that candidates may assume Cloud SQL's built-in replication can connect to any external database, when in fact it is limited to inter-instance replication within Google Cloud.

How to eliminate wrong answers

Option A is wrong because using pg_dump and pg_restore for a 2 TB database would require a long maintenance window (potentially hours or days) to export, transfer, and import the data, which contradicts the minimal downtime requirement. Option B is wrong because setting up PostgreSQL on Compute Engine and replicating data adds an unnecessary intermediate step and does not directly migrate to Cloud SQL; it also introduces additional operational overhead and potential latency without leveraging Cloud SQL's managed service benefits. Option D is wrong because Cloud SQL's built-in replication feature (e.g., cross-region replication or read replicas) cannot connect directly to an on-premises database; it is designed for replication between Cloud SQL instances, not for external sources.

Full explanation →

285

MCQeasy

The exhibit shows IAM policy for a BigQuery dataset. The BI team reports they can query tables but cannot create views. What is the missing role?

A.roles/bigquery.admin

B.roles/bigquery.metadataViewer

C.roles/bigquery.dataEditor

D.roles/bigquery.user

AnswerC

DataEditor includes permissions to create tables and views.

Why this answer

The BI team can query tables but cannot create views, which requires write access to the dataset. The `roles/bigquery.dataEditor` role grants permissions to read, create, update, and delete datasets, tables, and views, including the `bigquery.tables.create` and `bigquery.tables.update` permissions necessary for view creation. The existing query capability indicates they have at least `roles/bigquery.dataViewer`, but view creation demands the additional write permissions provided by `dataEditor`.

Exam trap

The trap here is that candidates confuse the ability to query tables (which only requires `dataViewer` or `user`) with the write permissions needed to create views, leading them to incorrectly select `roles/bigquery.user` or `roles/bigquery.metadataViewer`.

How to eliminate wrong answers

Option A is wrong because `roles/bigquery.admin` grants full control over BigQuery resources, including dataset deletion and IAM policy management, which is excessive and not the minimal missing role for view creation. Option B is wrong because `roles/bigquery.metadataViewer` only allows viewing dataset and table metadata (e.g., table names, schemas) but does not include the `bigquery.tables.create` permission needed to create views. Option D is wrong because `roles/bigquery.user` enables running queries and listing datasets but does not grant write permissions such as `bigquery.tables.create` or `bigquery.tables.update`, which are required for creating views.

Full explanation →

286

MCQmedium

You are a database engineer for an e-commerce platform running on Cloud SQL for PostgreSQL. The application team reports that a critical report query taking 5 seconds last week now takes over 30 seconds. The database CPU usage has increased from 40% to 85%. The query plan shows a sequential scan on the orders table. Which action should you take first to diagnose the problem?

A.Migrate the database to Cloud Spanner for better scaling.

B.Add a composite index on the orders table based on the query's WHERE clause.

C.Increase the machine type to add more vCPUs.

D.Enable pg_stat_statements and check the query's execution statistics.

AnswerD

Provides necessary metrics to diagnose the query performance degradation.

Why this answer

Option D is correct because enabling pg_stat_statements provides detailed execution statistics (e.g., total time, calls, rows, I/O) for each query, allowing you to identify the root cause of the performance regression without making speculative changes. Since the query plan shows a sequential scan and CPU is high, pg_stat_statements can reveal if the query is now reading more rows due to data growth or plan changes, guiding the next action (e.g., adding an index or tuning parameters).

Exam trap

Google Cloud often tests the principle of 'diagnose before you treat' — the trap here is that candidates jump to adding an index (Option B) or scaling resources (Option C) without first gathering evidence, which is a common mistake in performance troubleshooting.

How to eliminate wrong answers

Option A is wrong because migrating to Cloud Spanner is a drastic architectural change that does not diagnose the problem; it introduces new complexity and cost without addressing the immediate cause (e.g., missing index or stale statistics). Option B is wrong because adding a composite index is a potential solution, not a diagnostic step; you must first confirm the query's WHERE clause and execution pattern via pg_stat_statements to avoid creating an ineffective index. Option C is wrong because increasing vCPUs treats the symptom (high CPU) rather than the cause (sequential scan); it may temporarily reduce CPU but does not fix the underlying query performance issue and could be wasteful.

Full explanation →

287

MCQeasy

A Cloud SQL for PostgreSQL instance is used for an OLTP application. The database schema has many foreign key constraints. Which action improves write performance?

A.Create indexes on foreign key columns.

B.Drop all foreign key constraints.

C.Add more triggers to enforce integrity.

D.Increase the instance storage size.

AnswerA

Indexes on foreign key columns speed up lookups during INSERT/UPDATE/DELETE operations.

Why this answer

Option A is correct because creating indexes on foreign key columns avoids full table scans during referential integrity checks. Option B is wrong as more triggers slow writes. Option C is wrong because dropping constraints risks data integrity.

Option D is wrong because increasing disk size does not directly address the performance bottleneck.

Full explanation →

288

Matchingmedium

Match each Google Cloud database service to its primary use case.

Drag a concept onto its matching description — or click a concept then click the description.

Concepts

Matches

Relational databases (MySQL, PostgreSQL, SQL Server)

Globally distributed, strongly consistent relational database

Serverless NoSQL document database for mobile/web apps

High-throughput, low-latency NoSQL for large analytical workloads

In-memory data store for Redis or Memcached

Why these pairings

These services cover the main database offerings in Google Cloud.

Full explanation →

289

MCQmedium

You are a database engineer at a retail company. The company uses BigQuery for BI, with a fact table 'sales_fact' partitioned by order_date and containing 100 million rows. There is a dimension table 'products' with 10,000 rows. The BI team reports that the following query takes over 5 minutes to run: SELECT p.category, SUM(s.amount) FROM sales_fact s JOIN products p ON s.product_id = p.product_id WHERE s.order_date >= '2024-01-01' AND s.order_date < '2024-04-01' GROUP BY p.category. The table 'products' is not partitioned or clustered. 'sales_fact' is partitioned by order_date but not clustered. The query only scans 3 months of data (about 25 million rows). However, the join seems slow. What is the most likely cause and what single action would you take to improve performance?

A.Cluster the 'sales_fact' table on product_id

B.Use a cross-join to avoid the join

C.Add an index on 'products.product_id'

D.Partition the 'products' table

AnswerA

Clustering on the join key reduces shuffle and speeds up join.

Why this answer

The query is slow because the join on `product_id` requires shuffling 25 million rows from `sales_fact` across nodes to match with `products`. Clustering `sales_fact` on `product_id` co-locates rows with the same `product_id` within each partition, reducing shuffle overhead and enabling more efficient broadcast or hash joins in BigQuery. This is the most impactful single action because it directly addresses the join performance bottleneck without changing the query logic.

Exam trap

Google Cloud often tests the misconception that indexes or partitioning small tables solve join performance issues, when the real solution in BigQuery is clustering the large fact table on the join key to minimize data shuffling.

How to eliminate wrong answers

Option B is wrong because a cross-join would produce a Cartesian product of 25 million × 10,000 rows, which is computationally prohibitive and would make the query far slower, not faster. Option C is wrong because BigQuery does not support traditional indexes; it uses columnar storage and clustering for data organization, so adding an index is not a valid action. Option D is wrong because partitioning the `products` table (only 10,000 rows) provides no benefit for a small dimension table; the bottleneck is the large fact table join, not the products table scan.

Full explanation →

290

MCQmedium

Which SQL function in BigQuery is best for replacing NULL values in a numeric column with a default value?

A.NULLIF

B.NVL

C.IFNULL

D.COALESCE

AnswerD

COALESCE is standard, flexible, and preferred for portability. It can handle multiple columns.

Why this answer

Option D, COALESCE, is correct because it returns the first non-NULL value from a list of expressions, making it ideal for replacing NULLs in a numeric column with a default value. In BigQuery, COALESCE is the standard, flexible function that can handle multiple arguments, unlike IFNULL which only accepts two. This aligns with SQL ANSI standards and is the recommended approach for NULL handling in numeric columns.

Exam trap

Google Cloud often tests the distinction between IFNULL and COALESCE, trapping candidates who think IFNULL is always the best choice because it's simpler, when COALESCE is more versatile and ANSI-compliant for multiple fallback values.

How to eliminate wrong answers

Option A is wrong because NULLIF returns NULL if two expressions are equal, not a default value for NULLs; it's used for conditional NULL creation, not replacement. Option B is wrong because NVL is not a valid function in BigQuery; it exists in Oracle and other databases but BigQuery does not support it, making it a distractor for candidates familiar with other SQL dialects. Option C is wrong because IFNULL, while valid in BigQuery and capable of replacing a single NULL with a default, is less flexible than COALESCE as it only accepts two arguments; the question asks for the 'best' function, and COALESCE is preferred for its ability to handle multiple fallback values and its ANSI compliance.

Full explanation →

291

MCQmedium

A database engineer is configuring a Memorystore for Redis instance for a session store application. The application requires persistent storage to survive node failures. Which tier and configuration should be used?

A.Standard tier with AOF persistence enabled

B.Standard tier with no persistence

C.Basic tier with RDB persistence enabled

D.Basic tier with AOF persistence enabled

AnswerA

Standard tier provides replication and failover; AOF persistence saves data to disk.

Why this answer

The Standard tier in Memorystore for Redis provides a replicated architecture with a primary and read replica, ensuring high availability and automatic failover. Enabling AOF (Append-Only File) persistence writes every write operation to an AOF file, which is stored on persistent disk and can be replayed to restore data after a node failure. This combination meets the session store requirement for data durability across failures, as the Basic tier lacks replication and cannot guarantee data survival.

Exam trap

Google Cloud often tests the misconception that enabling persistence on any tier is sufficient, but the trap here is that the Basic tier lacks replication and automatic failover, so even with AOF persistence, a node failure causes downtime and potential data loss, which is unacceptable for a session store requiring high availability.

How to eliminate wrong answers

Option B is wrong because Standard tier with no persistence does not save data to disk, so any node failure or restart results in complete data loss, failing the persistent storage requirement. Option C is wrong because Basic tier with RDB persistence uses point-in-time snapshots that can lose data between snapshots, and the Basic tier has no replication, so a node failure causes downtime and potential data loss. Option D is wrong because Basic tier with AOF persistence still lacks replication, meaning if the single node fails, the AOF file may be unrecoverable or the instance is unavailable, violating the need for session store durability.

Full explanation →

292

Drag & Dropmedium

Order the steps to export data from Cloud Bigtable to Cloud Storage using Dataflow.

Drag steps to the numbered slots on the right, or tap a step then tap a slot.

Steps

Order

Why this order

First create storage, then set up Dataflow job with template, configure, run, verify.

Full explanation →

293

MCQhard

Refer to the exhibit. What is the most likely performance issue with this schema?

A.No performance issue; the schema is optimal

B.Hotspotting on UserId due to frequent queries

C.Hotspotting on TransactionId due to monotonically increasing values

D.Too many secondary indexes causing write amplification

AnswerC

Monotonically increasing keys cause all writes to target a single split.

Why this answer

Option A is correct: TransactionId is monotonically increasing (likely auto-generated), causing write hotspotting on the last split. Option B is incorrect because UserId is not the primary key. Option C is incorrect: there are no secondary indexes shown.

Option D is incorrect because hotspotting is a likely issue.

Full explanation →

294

MCQhard

Your team is migrating a 5 TB MySQL database from on-premises to Cloud SQL. The database receives 2,000 writes per second and the network link has 500 Mbps bandwidth. You need to minimize migration time with zero data loss. What should you do?

A.Configure Database Migration Service with a connectivity test and start continuous replication.

B.Export the database using mysqldump and import into Cloud SQL using the console.

C.Use gcloud sql import with a compressed CSV export from on-premises.

D.Set up a VPN, then use mysqldump with --master-data and pipe to mysql on Cloud SQL.

AnswerA

DMS supports continuous migration with minimal downtime.

Why this answer

Database Migration Service (DMS) with continuous replication is the correct choice because it supports live migration with minimal downtime and zero data loss for large databases like 5 TB. DMS uses MySQL binlog-based replication to synchronize changes from the on-premises source to Cloud SQL while the source remains operational, then performs a cutover with only seconds of downtime. This approach handles the 2,000 writes/second load and 500 Mbps bandwidth efficiently by streaming incremental changes rather than transferring the entire 5 TB in one shot.

Exam trap

Google Cloud often tests the misconception that a one-time dump and import (mysqldump or CSV) is sufficient for large databases with continuous writes, ignoring the requirement for zero data loss and minimal downtime that only continuous replication can satisfy.

How to eliminate wrong answers

Option B is wrong because mysqldump export and console import is a one-time dump that requires the source database to be read-locked or stopped to ensure consistency, causing significant downtime, and transferring 5 TB over 500 Mbps would take over 22 hours, during which writes would be lost. Option C is wrong because gcloud sql import with a compressed CSV export is designed for smaller, structured data imports and does not support continuous replication; it also requires converting the MySQL database to CSV format, which is impractical for a 5 TB database with schema complexity and risks data loss during conversion. Option D is wrong because piping mysqldump with --master-data directly to Cloud SQL via VPN still performs a one-time dump and import, which requires stopping writes on the source to capture a consistent snapshot, leading to downtime and potential data loss if writes continue during the transfer.

Full explanation →

295

MCQeasy

A company is migrating an on-premises MySQL database to Cloud SQL for MySQL. The current schema uses InnoDB with foreign keys. What is a key consideration for maintaining referential integrity in Cloud SQL?

A.Enable the foreign_key_checks flag during migration.

B.Convert foreign keys to application-level checks.

C.Use Cloud SQL's built-in foreign key enforcement which is identical to on-premises.

D.Foreign keys are not supported in Cloud SQL MySQL.

AnswerC

Cloud SQL for MySQL behaves exactly like standard MySQL for foreign keys.

Why this answer

Cloud SQL for MySQL fully supports InnoDB and foreign keys, identical to on-premises. Option D is correct because the same foreign key enforcement applies. Option A is false.

Option B is about a migration flag, not ongoing integrity. Option C is unnecessary.

Full explanation →

296

MCQhard

You administer a Cloud SQL for PostgreSQL instance running with 4 vCPUs and 15 GB of memory. The application frequently runs complex read-only reporting queries during business hours. Recently, the database started throwing 'out of memory' errors, and the instance's memory usage is consistently above 90%. You have enabled the pgtune recommendations and applied them. The workload is read-heavy and does not require a high write throughput. Which change would most effectively reduce memory pressure while maintaining query performance?

A.Enable the pgzstd extension to compress data in memory.

B.Set up connection pooling using PgBouncer to limit the number of concurrent connections.

C.Scale up the instance to a higher memory tier, such as 8 vCPUs and 30 GB of memory.

D.Decrease the shared_buffers configuration parameter to free memory for other uses.

AnswerC

More memory will accommodate the reporting queries' work_mem and cache needs, reducing out-of-memory errors.

Why this answer

Option B is correct: Increasing the instance memory directly addresses the memory shortage. Option A (reducing shared_buffers) may free memory but hurt performance. Option C (using connection pooling) reduces memory per connection but not overall memory needs.

Option D (enabling compression) might help with I/O but not memory very much; also, pgzstd is a tricky flag.

Full explanation →

297

MCQmedium

A Cloud Bigtable instance stores time-series data with a row key format: [metric_id]#[timestamp]. The team notices read throughput is low when scanning a metric over a time range. What is the likely cause?

A.All rows for a given metric are stored in a single tablet causing a hotspot.

B.Too many column families in the schema.

C.The number of nodes is insufficient.

D.Replication factor is set too low.

AnswerA

With metric_id prefix, all rows for that metric are on one tablet, limiting read throughput.

Why this answer

Option C is correct because bigtable rows are sorted lexicographically; with metric_id first, all rows for a metric are colocated in one tablet, causing hotspotting on reads. Option A is wrong because column families don't affect read distribution. Option B is wrong because replication factor doesn't affect per-tablet throughput.

Option D is wrong because node count might be sufficient; the issue is row key design.

Full explanation →

298

MCQhard

A financial services company uses Cloud Spanner for transaction processing. They need to run analytical queries that scan large portions of the database without impacting OLTP performance. What schema design technique should they use?

A.Export data periodically to BigQuery and run queries there.

B.Create multiple secondary indexes on frequently scanned columns.

C.Design the primary key so that analytical queries scan a small number of tablets by using interleaved tables.

D.Use a read replica instance to offload analytical queries.

AnswerC

Interleaving related rows keeps them co-located, allowing efficient scans on parent-child relationships without distributed reads.

Why this answer

Option D is correct because Cloud Spanner supports interleaved tables which allow efficient scans on parent rows without cross-node joins. Option A is wrong because secondary indexes may not be suitable for large scans. Option B is wrong because read replicas are not available in Spanner for analytical workloads.

Option C is wrong because exporting to BigQuery adds latency and complexity.

Full explanation →

299

MCQeasy

A marketing team needs to analyze customer behavior using BigQuery. They want to create a table that stores the first and last purchase date for each customer from the `orders` table. Which SQL approach should they use?

A.SELECT customer_id, (SELECT order_date FROM orders ORDER BY order_date LIMIT 1) AS first_purchase, ...

B.SELECT o1.customer_id, o1.order_date AS first_purchase, o2.order_date AS last_purchase FROM orders o1 JOIN orders o2 ON o1.customer_id = o2.customer_id

C.SELECT customer_id, order_date AS first_purchase, ... FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS rn) WHERE rn = 1

D.SELECT customer_id, MIN(order_date) AS first_purchase, MAX(order_date) AS last_purchase FROM orders GROUP BY customer_id

AnswerD

Simple and efficient aggregation.

Why this answer

Option D is correct because it uses aggregate functions MIN() and MAX() with GROUP BY customer_id to directly compute the first and last purchase dates from the orders table. This is the most efficient and idiomatic SQL approach in BigQuery, leveraging the database engine's built-in aggregation to avoid self-joins or subqueries.

Exam trap

Google Cloud often tests the misconception that window functions or self-joins are necessary for per-group min/max calculations, when in fact simple aggregation with GROUP BY is the correct and efficient solution.

How to eliminate wrong answers

Option A is wrong because the subquery lacks a correlation to the outer customer_id, returning the same global first order date for all customers instead of per-customer values. Option B is wrong because the self-join without aggregation or date filtering produces a Cartesian product of all order pairs per customer, not the first and last dates. Option C is wrong because it only captures the first purchase (ROW_NUMBER() = 1) and omits the last purchase date entirely, failing to meet the requirement for both dates.

Full explanation →

300

MCQmedium

A data engineer notices that a scheduled query exporting BigQuery data to Cloud Storage is failing with a timeout error. The dataset contains 500 million rows. What should they do?

A.Use SELECT * without filters.

B.Change the export format from CSV to Avro.

C.Increase the query timeout setting.

D.Export each partition separately.

AnswerD

Smaller exports avoid timeout limits.

Why this answer

Option D is correct because exporting a 500-million-row table as a single operation can exceed BigQuery's 6-hour timeout limit. By exporting each partition separately, you reduce the data volume per export job, allowing each to complete within the timeout window. This approach leverages BigQuery's partitioned table structure to parallelize the export and avoid hitting the timeout threshold.

Exam trap

Google Cloud often tests the misconception that timeout errors can be resolved by increasing a timeout setting, but in BigQuery, export job timeouts are fixed and cannot be changed, so the correct approach is to reduce the data per export job.

How to eliminate wrong answers

Option A is wrong because using SELECT * without filters does not reduce the data volume; it exports all 500 million rows, which is the root cause of the timeout. Option B is wrong because changing the export format from CSV to Avro does not affect the timeout; the timeout is based on data volume and complexity, not the output format. Option C is wrong because increasing the query timeout setting does not apply to export jobs; BigQuery export operations have a fixed 6-hour timeout that cannot be modified by the user.

Full explanation →

Google Professional Cloud Database Engineer (PCDE) — Questions 226–300