CCNA Design and implement database schemas Questions

75 of 100 questions · Page 1/2 · Design and implement database schemas · Answers revealed

1
MCQhard

A multinational corporation uses Cloud Spanner with a multi-region configuration. The schema includes a table that is updated frequently by users in two distant regions. They are experiencing high commit latencies due to distributed transactions. Which schema change would most reduce latency?

A.Reduce the number of replicas in the Spanner configuration.
B.Use a table-level leader placement configuration to keep the table's splits in a single region.
C.Convert the table into an interleaved child of a parent table.
D.Increase the number of splits by using a more granular primary key.
AnswerB

Leader placement allows directing all writes for a table to the nearest region, reducing distributed transaction overhead.

Why this answer

Option B is correct because leader placement keeps the table's write leaders in one region, so a commit no longer has to coordinate leaders in distant regions. Option C is wrong because interleaving co-locates child rows with their parent but does nothing about where leaders sit across regions. Option A is wrong because reducing replicas may compromise availability.

Option D is wrong because horizontal scaling doesn't directly fix cross-region latency.

2
Multi-Selecteasy

A startup is using Firestore in Native mode for a real-time chat application. They want to design the schema for chat rooms and messages. Which TWO design patterns are recommended? (Choose two.)

Select 2 answers
A.Use arrays in the chat room document to store message IDs.
B.Use a composite index on chat room ID and timestamp.
C.Store all messages in a single top-level collection with a field for chat room ID.
D.Use a separate top-level collection for each chat room.
E.Store messages as documents in a subcollection under each chat room document.
AnswersB, E

A composite index is required for querying messages efficiently.

Why this answer

Options B and E are correct. Storing messages in a subcollection under each chat room (E) is scalable and follows Firestore best practices. A composite index on chat room ID and timestamp (B) is needed for efficient queries.

Option C (single collection) is less scalable; Option D (separate collection per chat room) leads to many collections, which is not recommended; Option A (arrays) has size limits and is not scalable for many messages.

3
MCQhard

A team is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The existing schema uses a large number of foreign key constraints and triggers for data validation. The team wants to minimize migration effort and maintain data integrity. Which schema design approach is most appropriate for Cloud SQL?

A.Keep the existing foreign keys and triggers as-is in Cloud SQL for PostgreSQL
B.Migrate to Cloud Spanner and use interleaved tables to simulate foreign keys
C.Remove all foreign keys and triggers and implement validation in the application layer
D.Convert the schema to use Firestore in Datastore mode with composite indexes
AnswerA

Cloud SQL supports these features, minimizing migration effort.

Why this answer

Option A is correct because Cloud SQL for PostgreSQL is fully compatible with the PostgreSQL engine, meaning foreign key constraints and triggers operate identically to on-premises PostgreSQL. This approach minimizes migration effort by preserving the existing schema logic and maintaining referential integrity without requiring application changes or data validation rewrites.

Exam trap

The trap here is that candidates assume managed cloud databases require schema simplification or NoSQL conversion, but Cloud SQL for PostgreSQL is a direct lift-and-shift target that preserves all relational features like foreign keys and triggers.

How to eliminate wrong answers

Option B is wrong because Cloud Spanner uses interleaved tables for hierarchical data relationships, not as a direct replacement for foreign keys; it does not support PostgreSQL triggers or the same constraint enforcement, requiring significant schema redesign and application logic changes. Option C is wrong because removing foreign keys and triggers shifts data integrity to the application layer, which increases complexity, risk of data corruption, and violates the goal of minimizing migration effort while maintaining integrity. Option D is wrong because Firestore in Datastore mode is a NoSQL document database that does not support SQL foreign keys, triggers, or relational integrity constraints, requiring a complete schema transformation and loss of existing PostgreSQL functionality.

4
Multi-Selecteasy

Which TWO are best practices for designing a Cloud Spanner schema?

Select 2 answers
A.Avoid secondary indexes to keep writes faster
B.Use monotonically increasing primary keys
C.Use commit timestamp columns to track row versions
D.Use interleaved tables for parent-child relationships
E.Store all related data in a single row to avoid joins
AnswersC, D

Commit timestamps provide automatic versioning.

Why this answer

Option B is incorrect because monotonically increasing keys cause hotspotting. Option D is correct: interleaved tables optimize parent-child joins. Option A is incorrect: secondary indexes are often needed for non-primary key queries.

Option C is correct: commit timestamp columns enable versioning without storing explicit timestamps. Option E is incorrect: storing all data in a single row leads to large rows and contention.

5
Multi-Selectmedium

Which THREE are considerations when designing a schema for Cloud Firestore?

Select 3 answers
A.Use subcollections to organize related data
B.Avoid large arrays to prevent document size limits
C.Denormalize data to reduce the need for joins
D.Use nested maps for deeply structured data
E.Always use transactional writes to ensure consistency
AnswersA, B, C

Subcollections enable scalable data modeling.

Why this answer

Option C is correct: denormalization is common in Firestore to avoid expensive reads. Option D is not a best practice: deeply nested maps are hard to query and can cause contention. Option B is correct: large arrays cause document bloat and index limits.

Option A is correct: subcollections allow scalable data organization. Option E is incorrect: Firestore supports transactions but they are not the only way to ensure consistency.

6
Matchingmedium

Match each Cloud Spanner concept to its definition.

Concepts
Matches

Automatic data distribution across nodes

Global clock service for external consistency

Parent-child table with co-located rows

Read with guaranteed latest data

Read with bounded staleness for lower latency

Why these pairings

These are key concepts for understanding Spanner's architecture and consistency.

7
Multi-Selecthard

A financial services company is designing a Cloud Spanner schema for a trading system. They have two main entities: 'accounts' and 'transactions'. Each account has many transactions, and queries almost always retrieve transactions for a specific account. Which TWO schema design strategies should they employ?

Select 2 answers
A.Use a secondary index on transactions.account_id.
B.Ensure the primary key of transactions includes the account_id as the first part.
C.Define a foreign key constraint from transactions to accounts.
D.Store transactions as a JSON array of repeating fields within the account record.
E.Use an interleaved table hierarchy with accounts as parent and transactions as child.
AnswersB, E

This is required for interleaved tables: the child's primary key must start with the parent's primary key.

Why this answer

Option B is correct because Cloud Spanner distributes rows across splits based on the primary key prefix. By making `account_id` the first part of the transactions table primary key, all transactions for a given account are co-located, enabling efficient range scans and point lookups without cross-node shuffling.

Exam trap

Google Cloud often tests the misconception that secondary indexes are the default solution for filtering, when in Cloud Spanner the primary key design and interleaving are the preferred strategies for performance and cost efficiency.

8
MCQeasy

A startup is migrating from MongoDB to Firestore in Datastore mode. Their existing documents contain nested arrays of sub-objects (e.g., tags, comments). They want to design a schema that scales well and supports efficient queries. What is the recommended approach for handling these nested arrays in Firestore?

A.Use maps instead of arrays to store the data.
B.Store the arrays as stringified JSON in a single field.
C.Flatten the arrays into subcollections under each document.
D.Keep the nested arrays as they are; Firestore supports arrays.
AnswerC

Subcollections scale independently and allow efficient queries.

Why this answer

Option C is correct because Firestore recommends using subcollections for arrays of objects to avoid document size limits and enable efficient querying. Option D (keeping nested arrays) can hit size limits and is not scalable; Option A (maps instead of arrays) still has size issues; Option B (stringified JSON) is not queryable.

9
MCQmedium

A team is designing a BigQuery schema for time-series analytics on IoT sensor data. They expect high write throughput and queries that aggregate data by hour. Which partitioning and clustering strategy is most cost-effective?

A.Partition by ingestion_time and cluster by sensor_id.
B.Use integer range partitioning on sensor_id.
C.Partition by date and cluster by sensor_id with a timestamp column.
D.Partition by sensor_id and cluster by timestamp.
AnswerC

Date-based partitioning efficiently prunes scans; clustering by sensor_id further reduces data read.

Why this answer

Partitioning by date (e.g., ingestion time or event date) is standard for time-series. Clustering by sensor_id helps queries that filter on specific sensors. Option C (partition by date, cluster by sensor_id) is best.

Option A uses ingestion time, which may not align with event time. Option D partitions by sensor_id, creating many partitions. Option B (integer range) is not suitable for dates.
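The pruning effect of date partitioning can be illustrated with a small in-memory sketch. It is only an analogy, not BigQuery code: the `partitions` dictionary, `insert`, and `hourly_avg` are hypothetical names, and clustering by sensor_id (which would further narrow the rows read inside a partition) is not modelled here.

```python
from collections import defaultdict
from datetime import datetime, date

# date -> list of (sensor_id, timestamp, value); one list per daily partition
partitions = defaultdict(list)

def insert(sensor_id, ts, value):
    """Route each reading to the partition for its event date."""
    partitions[ts.date()].append((sensor_id, ts, value))

def hourly_avg(day, sensor_id):
    """Average per hour for one sensor, reading only that day's partition."""
    rows = partitions.get(day, [])
    sums = defaultdict(lambda: [0.0, 0])
    for sid, ts, value in rows:
        if sid == sensor_id:
            sums[ts.hour][0] += value
            sums[ts.hour][1] += 1
    return {h: s / n for h, (s, n) in sums.items()}, len(rows)

insert("s1", datetime(2024, 1, 1, 10, 5), 2.0)
insert("s1", datetime(2024, 1, 1, 10, 50), 4.0)
insert("s2", datetime(2024, 1, 1, 11, 0), 9.0)
insert("s1", datetime(2024, 1, 2, 10, 0), 7.0)

result, rows_read = hourly_avg(date(2024, 1, 1), "s1")
print(result, rows_read)
```

The query for 1 January never touches the row stored under 2 January, which is the cost saving that partition pruning provides.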

10
MCQhard

A Cloud Spanner database has a parent table 'Customers' and a child table 'Orders' interleaved on CustomerId. The most common query retrieves the last 10 orders for a given customer. How should the primary key of Orders be defined for optimal performance?

A.(CustomerId, OrderId)
B.Add a commit timestamp column as part of the primary key
C.No change; use a secondary index on OrderDate
D.(CustomerId, OrderDate DESC)
AnswerD

Descending order stores newest first, enabling efficient limit queries.

Why this answer

Option D is correct: Using CustomerId (parent key) and OrderDate DESC ensures that the most recent orders are stored first within each interleaved row range, making queries for last N orders efficient. Option A (OrderId) is monotonically increasing but not sorted by date. Option C (secondary index) adds overhead.

Option B (adding a commit timestamp) does not by itself define a complete key or its sort direction.
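A minimal Python sketch of why a descending date in the key helps, assuming rows are kept sorted by primary key. The sorted list stands in for the table, order dates are plain integers, and `build_table` and `last_n_orders` are illustrative names rather than Spanner API.

```python
from bisect import bisect_left

def sort_key(customer_id, order_day):
    # Negating the day emulates DESC ordering inside an ascending sort.
    return (customer_id, -order_day)

def build_table(rows):
    """rows: iterable of (customer_id, order_day) with order_day as an int."""
    return sorted(sort_key(c, d) for c, d in rows)

def last_n_orders(table, customer_id, n):
    """Seek to the customer's first key, then read up to n adjacent rows."""
    start = bisect_left(table, (customer_id, float("-inf")))
    out = []
    for cust, neg_day in table[start:start + n]:
        if cust != customer_id:
            break
        out.append(-neg_day)
    return out

table = build_table([(1, 5), (2, 9), (1, 7), (1, 6), (2, 3)])
print(last_n_orders(table, 1, 2))
```

The lookup is one seek plus a short forward read; with an ascending date the newest rows would sit at the end of the customer's range instead.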

11
Multi-Selecthard

A company is migrating a large Oracle Data Warehouse to BigQuery. The source schema includes many partitioned tables and materialized views. Which THREE considerations are important when designing the BigQuery schema?

Select 3 answers
A.Clustering can be used to improve query performance on frequently filtered columns.
B.Partitioning in BigQuery can be based on a DATE, TIMESTAMP, or INTEGER column.
C.BigQuery requires explicit indexes on columns used in WHERE clauses.
D.Materialized views in BigQuery are automatically refreshed based on base table changes.
E.BigQuery supports unique constraints and foreign keys for data integrity.
AnswersA, B, D

Clustering sorts data within partitions for better filter performance.

Why this answer

Option A is correct because BigQuery clustering organizes data based on the values of specified columns, which improves query performance by reducing the amount of data scanned when filtering on those columns. This is particularly useful for large data warehouses migrating from Oracle, as it mimics the performance benefits of indexes without the overhead of explicit index management.

Exam trap

Google Cloud often tests the misconception that BigQuery requires traditional database features like indexes or constraints, leading candidates to select options that apply to OLTP systems but not to BigQuery's distributed, columnar architecture.

12
MCQmedium

A company is designing a Cloud Firestore schema for a social media application. Users can follow other users, and the application needs to display a feed of posts from followed users ordered by timestamp. Which schema design is most cost-effective and performant for querying the feed?

A.Store all posts in a top-level collection and query for posts where user ID is in the list of followed users, ordered by timestamp.
B.Store a feed subcollection under each user document containing references to posts from followed users.
C.Store all user posts in an array within a single document and use array-contains queries.
D.Store a 'follows' collection with documents containing follower and followed user IDs; then query posts for each followed user.
AnswerB

This allows direct query on the feed subcollection ordered by timestamp.

Why this answer

Option B is correct because it uses a feed subcollection under each user document to store pre-computed references to posts from followed users. This design avoids expensive collection-group queries or multiple individual queries per followed user, ensuring that fetching the feed is a single, indexed read operation ordered by timestamp, which is both cost-effective and performant at scale.

Exam trap

The trap here is that candidates often choose Option A, thinking a single top-level query with an 'in' filter is simpler, but they overlook Firestore's 10-value limit on 'in' queries and the resulting need for multiple queries, which destroys both performance and cost predictability at scale.

How to eliminate wrong answers

Option A is wrong because querying a top-level posts collection with a list of followed user IDs requires an 'in' query, which is limited to 10 values per query and does not scale to hundreds or thousands of followed users, leading to multiple queries and high read costs. Option C is wrong because storing all user posts in an array within a single document violates the 1 MiB document size limit and cannot support ordered queries or pagination, making it impractical for any real-world social media feed. Option D is wrong because querying posts for each followed user individually results in N+1 read operations per feed request, causing high latency and cost proportional to the number of followed users, with no built-in ordering across results.
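The pre-computed feed in option B is often called fan-out on write. The following in-memory sketch shows the idea under that assumption; the dictionaries and the `follow`, `publish`, and `read_feed` helpers are hypothetical stand-ins for Firestore collections and writes, not Firestore API.

```python
from collections import defaultdict

followers = defaultdict(set)   # author -> set of follower ids
feeds = defaultdict(list)      # user -> list of (timestamp, post_id) references

def follow(follower, author):
    followers[author].add(follower)

def publish(author, post_id, timestamp):
    """Write one feed reference per follower at post time."""
    for user in followers[author]:
        feeds[user].append((timestamp, post_id))

def read_feed(user, limit):
    """A single read against the user's own feed, newest first."""
    return [pid for _, pid in sorted(feeds[user], reverse=True)[:limit]]

follow("ann", "bob")
follow("ann", "cy")
publish("bob", "p1", 100)
publish("cy", "p2", 200)
publish("bob", "p3", 300)
print(read_feed("ann", 2))
```

The trade-off is extra writes when a post is published in exchange for a feed read that does not depend on how many users are followed.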

13
MCQhard

A financial services company uses Cloud Spanner for a ledger application. The ledger table has a primary key of 'transaction_id' which is a monotonically increasing integer. During peak hours, they observe high write latencies due to hot spots on the last tablet. They need to redesign the schema to distribute writes evenly while still allowing efficient point lookups by transaction ID. What is the best approach?

A.Reverse the timestamp and use it as the primary key.
B.Use a UUID as the primary key to ensure randomness.
C.Use a composite primary key with a timestamp and a random number.
D.Use a composite primary key with a hash prefix of the transaction ID as the first component, followed by the transaction ID.
AnswerD

The hash prefix evenly distributes writes, and the transaction ID allows efficient point lookups.

Why this answer

Option D is correct because using a hash prefix (e.g., a hash of the transaction ID) as the first component of the primary key distributes writes across tablets, while the transaction ID as the second component still allows efficient lookups. Option B (UUID) helps distribution but has larger key size and may fragment reads; Option A (reverse timestamp) can also help but may cause hotspots if timestamps are sequential; Option C (composite with timestamp) still has potential for hotspots.
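A hedged sketch of the hash-prefix key in Python. The shard count, the choice of SHA-256, and the names `shard_of` and `primary_key` are assumptions for illustration; the point is only that the prefix is derived from the transaction ID itself, so a reader can recompute the full key for a point lookup.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # assumed shard count; tune to the workload

def shard_of(transaction_id: int) -> int:
    """Deterministic shard derived from the ID itself."""
    digest = hashlib.sha256(str(transaction_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def primary_key(transaction_id: int) -> tuple:
    """Composite key: (hash prefix, transaction_id)."""
    return (shard_of(transaction_id), transaction_id)

# Sequential IDs land on many prefixes instead of one growing key range.
spread = Counter(shard_of(i) for i in range(10_000))
print(len(spread), primary_key(42) == primary_key(42))
```

Because the prefix is a pure function of the ID, no extra lookup table is needed to find a row by transaction ID.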

14
MCQhard

A game company uses Cloud Bigtable to store player session data. Access patterns include looking up a player's most recent sessions and scanning sessions by time range. Which row key design is most appropriate?

A.Use only player ID as row key with column qualifiers for timestamps.
B.Use a row key of player ID followed by reversed timestamp.
C.Prefix with timestamp and append player ID.
D.Use a hash of player ID as row key and store timestamps in cell versions.
AnswerB

Player ID distributes writes across tablets; reversed timestamp makes recent data appear at the start of the range for efficient scans.

Why this answer

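Option B is correct because leading with the player ID spreads writes across players and keeps each player's sessions contiguous, while the reversed timestamp puts the newest sessions first for efficient range scans. Option C is wrong because a timestamp-first key sends all new writes to the same key range and causes hotspotting. Option A is wrong because keeping every session in one row per player makes the row grow without bound and rules out row-range scans by time.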

Option D is wrong because hashing alone makes range scans impossible.

15
MCQeasy

A financial services company runs a MySQL database on Compute Engine. They want to migrate to Cloud SQL for MySQL to reduce operational overhead. The current schema includes a table 'transactions' with a composite primary key on (transaction_id, account_id) and a secondary index on account_id for account lookups. The database also uses foreign key constraints to ensure referential integrity between 'transactions' and 'accounts'. During migration testing, they observe that INSERT operations on 'transactions' are slower than expected. What schema change should they implement to improve INSERT performance in Cloud SQL?

A.Remove the foreign key constraints and enforce referential integrity in the application logic instead.
B.Remove the secondary index on account_id because it adds write overhead.
C.Change the primary key to (account_id, transaction_id) to avoid secondary index overhead.
D.Convert the table to a temporal table with system-versioning to avoid constraint checking.
AnswerA

Foreign key constraints require a lookup on the parent table for every INSERT, causing latency. Removing them reduces write overhead, though integrity must be ensured by the application.

Why this answer

Foreign key constraints in MySQL (including Cloud SQL) require an internal check on every INSERT to verify that the referenced parent key exists. This adds a latency penalty proportional to the size of the parent table. Removing the constraint and moving referential integrity to the application eliminates this per-row check, directly improving INSERT throughput.

Exam trap

Google Cloud often tests the misconception that secondary indexes are the primary cause of write slowdowns, when in reality foreign key constraint checks are far more expensive per row than index maintenance.

How to eliminate wrong answers

Option B is wrong because removing the secondary index on account_id would degrade SELECT performance for account lookups, and the index's write overhead is negligible compared to the cost of foreign key checks. Option C is wrong because changing the primary key order does not eliminate foreign key validation overhead; it only affects index clustering and does not address the root cause of slow INSERTs. Option D is wrong because temporal tables with system-versioning add additional metadata and version-row writes on every INSERT, which would further degrade performance, not improve it.

16
MCQhard

A Cloud Spanner database needs to add a column 'discount' to the 'Products' table without any downtime. The table is actively used. What is the correct approach?

A.Create a new table with the column and copy data over
B.Execute ALTER TABLE Products ADD COLUMN discount FLOAT64
C.Create a secondary index that includes the new column
D.Define a generated column based on an existing column
AnswerB

Spanner allows DDL changes while the table remains fully available.

Why this answer

Option B is correct: Spanner supports online schema updates via ALTER TABLE ADD COLUMN, which does not block reads or writes. Option A (new table and copy) would require downtime or at least double-write logic. Option C (secondary index) is unrelated.

Option D (generated column) could be used but is unnecessary.

17
MCQhard

Your company runs an e-commerce platform on Google Cloud. The platform uses Cloud SQL for MySQL to store product inventory. The inventory table has the following schema: CREATE TABLE inventory (product_id INT PRIMARY KEY, quantity INT, last_updated TIMESTAMP) ENGINE=InnoDB. The application performs frequent updates on quantity for a subset of popular products. Recently, you have noticed increased deadlock errors during peak hours. The application uses REPEATABLE READ isolation level. You suspect that the schema design is contributing to locking contention. After analyzing the workload, you find that the updates often involve incrementing or decrementing quantity by small amounts and are mostly on the same set of popular products. What would be the best course of action to reduce deadlocks without compromising data integrity?

A.Rewrite the update query to use atomic operations (e.g., UPDATE inventory SET quantity = quantity - ? WHERE product_id = ?) without pre-fetching the current value.
B.Change the engine to MyISAM to avoid row-level locking.
C.Partition the inventory table by product_id range to spread the load.
D.Reduce the isolation level to READ COMMITTED to reduce locking.
AnswerA

Atomic updates avoid the need for SELECT ... FOR UPDATE and significantly reduce locking and deadlock chances.

Why this answer

Option A is correct because a read-then-write pattern (SELECT the current quantity, then UPDATE) holds locks longer and lets concurrent transactions acquire them in conflicting order; a single atomic UPDATE that computes the new value in place takes the row lock once and releases it at commit. Option C is wrong because partitioning does not help: InnoDB already locks at the row level, and the contention is on the same popular rows whichever partition they live in. Option D is wrong as the best answer because reducing isolation to READ COMMITTED may reduce locking but allows non-repeatable reads; it is viable, but it does not remove the read-before-write pattern.

Option B is wrong because MyISAM uses table-level locks and gives up transactional integrity. The best solution is to adjust the SQL statement to avoid the read-before-write pattern and rely on atomic operations.
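The atomic update from option A can be tried out with Python's built-in sqlite3 module. SQLite stands in for Cloud SQL for MySQL here, so locking behaviour is not reproduced; the sketch only shows the single-statement pattern. The `decrement` helper and the extra `quantity >= ?` guard against negative stock are assumptions added for illustration and are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, "
    "quantity INTEGER, last_updated TEXT)"
)
conn.execute("INSERT INTO inventory VALUES (1, 10, NULL)")

def decrement(conn, product_id, amount):
    """Single-statement decrement; returns True if stock was sufficient."""
    cur = conn.execute(
        "UPDATE inventory "
        "SET quantity = quantity - ?, last_updated = CURRENT_TIMESTAMP "
        "WHERE product_id = ? AND quantity >= ?",
        (amount, product_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1

print(decrement(conn, 1, 3), decrement(conn, 1, 50))
```

The application never reads the current quantity first; it learns whether the update applied from the affected-row count.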

18
MCQhard

A company uses Cloud Bigtable for time-series data from IoT devices. Each device sends a reading every second. The row key is device_id#timestamp (reverse timestamp). The team reports that queries for a specific device's data over the last hour are fast, but queries for all devices' data over the last minute are very slow. What is the most likely cause?

A.The Bigtable cluster does not have enough nodes to handle the scan.
B.The query is scanning multiple column families.
C.The row key design does not allow efficient scanning for all devices because device_id is the prefix.
D.The table has too many tablets, causing high overhead.
AnswerC

Prefix scans on device_id are efficient per device, but scanning all devices requires a full table scan.

Why this answer

Option C is correct because the row key design uses device_id as the prefix, which means all data for a given device is co-located in contiguous rows, making per-device scans efficient. However, a query for all devices over the last minute requires scanning every row in the table because the timestamp suffix is reversed and not a prefix; Bigtable cannot perform a range scan across all devices for a recent time window without a full table scan, which is extremely slow.

Exam trap

Google Cloud often tests the misconception that adding more nodes or tablets fixes scan performance, but the real issue is row key design that prevents Bigtable from using its sorted storage to limit the scan range.

How to eliminate wrong answers

Option A is wrong because insufficient nodes would cause general performance degradation across all queries, not specifically slow down the all-devices query while keeping the per-device query fast. Option B is wrong because scanning multiple column families adds overhead only if the query retrieves data from many families, but the problem statement does not mention column families, and the slowness is tied to the row key design, not column family access. Option D is wrong because too many tablets can cause high overhead for any scan, but the per-device query would also be affected; the asymmetry between fast per-device and slow all-devices queries points directly to row key ordering, not tablet count.
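The asymmetry between the two queries can be reproduced with a sorted list of row keys. This is a sketch under stated assumptions: `MAX_TS`, the zero-padded key format, and the helper names are illustrative, and a sorted Python list stands in for Bigtable's lexicographically ordered storage.

```python
from bisect import bisect_left

MAX_TS = 10**10  # assumed upper bound used to reverse timestamps

def row_key(device_id: str, ts: int) -> str:
    """device_id#reverse_timestamp, zero-padded so strings sort numerically."""
    return f"{device_id}#{MAX_TS - ts:011d}"

# 3 devices, one reading per second for 100 seconds.
devices = ("dev-a", "dev-b", "dev-c")
keys = sorted(row_key(d, t) for d in devices for t in range(100))

def scan_device(device_id, since_ts):
    """Contiguous range scan: newest reading back to since_ts."""
    lo = bisect_left(keys, f"{device_id}#")
    hi = bisect_left(keys, row_key(device_id, since_ts - 1))
    return keys[lo:hi]

def scan_all_devices(since_ts):
    """No shared key prefix for 'recent', so every row is examined."""
    cutoff = MAX_TS - since_ts
    hits = [k for k in keys if int(k.split("#")[1]) <= cutoff]
    return hits, len(keys)

print(len(scan_device("dev-a", 90)))
hits, examined = scan_all_devices(90)
print(len(hits), examined)
```

The per-device scan reads only the rows it returns, while the all-devices query examines every key to find the recent ones.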

19
MCQmedium

An e-commerce platform uses Cloud SQL for PostgreSQL. They need to run complex reporting queries that join several tables. These queries are slowing down the transactional workload. What should they do?

A.Create materialized views for common reports.
B.Change all joins to use subqueries.
C.Increase the number of vCPUs on the primary instance.
D.Use read replicas to offload reporting queries.
AnswerD

Read replicas serve read-only traffic without impacting the primary.

Why this answer

Read replicas can offload read-only reporting queries, protecting the primary's performance. Option D is correct. Option A (materialized views) still runs on the primary.

Option B (subqueries) may not reduce load. Option C (scaling up) is more expensive and doesn't isolate workloads.

20
Multi-Selecthard

Which THREE considerations are important when designing a schema for Cloud Firestore to ensure scalability?

Select 3 answers
A.Design collections to avoid high read/write rates on a single document.
B.Create composite indexes tailored to the application's query patterns.
C.Nest subcollections up to 10 levels deep to model complex hierarchies.
D.Use collection group indexes for all queries to avoid manual index creation.
E.Limit document size to avoid exceeding the 1 MiB limit.
AnswersA, B, E

Hot documents cause contention; distribute writes across documents.

Why this answer

Options A, B, and E are correct. Option E: Keeping documents well below the 1 MiB limit prevents performance issues. Option B: Composite indexes built for the application's common queries let Firestore serve them directly from an index.

Option A: Spreading (sharding) writes so that no single document takes a high write rate avoids hotspotting. Option C is wrong because deep subcollection nesting models hierarchy but does nothing for scalability. Option D is wrong because collection group indexes serve only collection group queries and still have to be defined; they do not replace indexes designed for each query pattern.

21
MCQhard

A data scientist runs a complex SQL query on a large BigQuery dataset and receives the above error. The query joins 10 tables and uses multiple window functions. Which action is most likely to resolve the issue?

A.Apply for a quota increase for concurrent queries.
B.Increase the number of slots allocated to the project.
C.Use the '--maximum_billing_tier' flag to increase the billing tier.
D.Simplify the query by reducing the number of joins or using a temporary table.
AnswerD

Reducing query complexity lowers resource demands and can stay within tier limits.

Why this answer

The error indicates the query exceeded resource limits for tier 1, meaning it requires more intermediate resources. The best solution is to optimize the query (D) by reducing complexity, using subqueries, or breaking it into steps. Option B (increasing slots) does not affect tiers.

Option A (quota increase) is for concurrency. Option C (billing tier flag) is deprecated.

22
MCQeasy

A team is designing a schema for a time-series database in Bigtable to store IoT sensor readings. Each sensor sends a reading every minute. The team needs to create a row key that supports efficient queries for a specific sensor's readings over a time range. Which row key design is most appropriate?

A.timestamp#sensor_id
B.hash(sensor_id)#timestamp
C.sensor_id#reverse_timestamp
D.random_UUID
AnswerC

Groups all readings for a sensor together in reverse chronological order.

Why this answer

Option C is correct because Bigtable stores rows sorted lexicographically by row key. By placing the sensor_id first, all readings for a given sensor are co-located in contiguous rows. Using reverse_timestamp (e.g., 9999-12-31 minus actual timestamp) ensures that the most recent readings appear first within that sensor's row range, which optimizes scans for the latest data and allows efficient range queries over a time window.

Exam trap

Google Cloud often tests the misconception that putting the timestamp first is always best for time-range queries, but in Bigtable, the row key's prefix determines data locality, so the sensor_id must come first to avoid scattering reads across the entire table.

How to eliminate wrong answers

Option A is wrong because timestamp first scatters readings for the same sensor across the entire table, making queries for a specific sensor's time range require a full table scan or multiple lookups. Option B is wrong because hashing the sensor_id destroys the natural sort order, so even though the sensor_id is first, the hash distributes rows randomly, preventing efficient range scans over time. Option D is wrong because a random UUID provides no ordering or grouping, forcing full table scans for any sensor-specific time-range query.

23
MCQeasy

You are designing a Firestore database for a chat application. Documents will store messages with fields: senderId, messageText, timestamp, conversationId. To efficiently retrieve the most recent 50 messages in a conversation, which index should you create?

A.A composite index on (conversationId, timestamp, __name__) descending
B.A single-field index on timestamp
C.An index on conversationId only
D.A composite index on (senderId, timestamp)
AnswerA

This index covers the query with filtering and ordering, enabling efficient retrieval.

Why this answer

Option A creates a composite index on (conversationId, timestamp, __name__) with descending order on timestamp, which efficiently supports queries that filter by conversationId and order by timestamp descending, limiting to 50 results. Option B only indexes timestamp, not filtering by conversation. Option D indexes senderId, which is not used in the query.

Option C indexes conversationId only, so it cannot serve the ordering by timestamp.

24
MCQeasy

A retail company is designing a Cloud Spanner schema for an order management system. Orders are identified by a UUID and contain multiple line items. Each line item references a product. Which schema design best supports high read throughput for queries that retrieve all line items for a given order?

A.Store orders and line items in a single table with repeated fields for line items.
B.Create an Orders table and a LineItems table interleaved in Orders with ORDER_ID as the parent key.
C.Create separate Orders and LineItems tables with a foreign key relationship and index on ORDER_ID.
D.Denormalize product information into the LineItems table and store orders separately.
AnswerB

Interleaving colocates line items with their order for fast retrieval.

Why this answer

Option B is correct because Cloud Spanner interleaved tables store child rows (LineItems) physically adjacent to their parent row (Orders) on the same split, enabling a single key lookup to retrieve all line items for a given order without cross-table joins or distributed queries. This colocation maximizes read throughput by minimizing latency and avoiding scatter-gather operations across nodes.

Exam trap

Google Cloud often tests the misconception that a foreign key with an index is equivalent to interleaving for performance, but in Cloud Spanner, only interleaved tables guarantee physical colocation and single-split access for parent-child queries, whereas indexed foreign keys still require distributed lookups.

How to eliminate wrong answers

Option A is wrong because storing line items as repeated fields (e.g., ARRAY columns) within a single row can run into Cloud Spanner's 10 MiB per-column value limit and prevents efficient indexing or atomic updates of individual line items, degrading throughput for large orders. Option C is wrong because separate tables with a foreign key and index on ORDER_ID require a two-step lookup (index scan then table access) and may involve distributed reads if the index and data are on different splits, increasing latency compared to interleaving. Option D is wrong because denormalizing product information into LineItems does not address the core read pattern (retrieving all line items for an order) and introduces data redundancy and update anomalies without improving colocation; it still requires a separate table or repeated fields, neither of which matches the interleaved design's performance benefit.
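As a rough mental model of why interleaving helps, the following Python sketch treats a table as rows sorted by primary key, with child keys extending the parent key. The row contents and the `read_order` helper are hypothetical and only illustrate key-ordered colocation; this is not how Spanner is implemented.

```python
import bisect

# Hypothetical rows: a parent key is (order_id,), a child key is (order_id, line_no).
# Sorting by key places each order directly before its own line items.
table = sorted([
    (("order-2",), "Orders row"),
    (("order-1", 2), "LineItems row"),
    (("order-1",), "Orders row"),
    (("order-2", 1), "LineItems row"),
    (("order-1", 1), "LineItems row"),
])
keys = [key for key, _ in table]


def read_order(order_id):
    """Return the parent row and its children as one contiguous key range."""
    start = bisect.bisect_left(keys, (order_id,))
    end = start
    # Walk forward while rows still share the parent's key prefix.
    while end < len(keys) and keys[end][0] == order_id:
        end += 1
    return table[start:end]


order_1_rows = read_order("order-1")
```

Because the parent and its children sit next to each other in key order, one seek plus a short sequential read returns the whole order.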

25
MCQmedium

A retail company uses Cloud Spanner to store product inventory data. The table structure is: CREATE TABLE Inventory ( ProductId INT64 NOT NULL, WarehouseId INT64 NOT NULL, StockLevel INT64 NOT NULL, LastUpdated TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true) ) PRIMARY KEY (ProductId, WarehouseId); The application frequently runs the query: SELECT ProductId, SUM(StockLevel) AS TotalStock FROM Inventory WHERE WarehouseId = 123 GROUP BY ProductId. The query is slow and scans many rows. The index used is: CREATE INDEX InventoryByWarehouse ON Inventory (WarehouseId); What is the most effective schema change to improve query performance?

A.Change the primary key to (WarehouseId, ProductId) so rows are interleaved by warehouse.
B.Create a materialized view that pre-aggregates stock by warehouse.
C.Modify the index to INCLUDE StockLevel: CREATE INDEX InventoryByWarehouse ON Inventory (WarehouseId) STORING (StockLevel).
D.Add a STORED GENERATED column for total stock per warehouse.
AnswerC

The STORING clause adds StockLevel to the index, making it a covering index for the query, so Cloud Spanner can return results from the index alone without scanning the base table.

Why this answer

Option C is correct because the query needs to read StockLevel for every row matching WarehouseId, but the existing index only covers WarehouseId, forcing a back-join to the base table. By using STORING (StockLevel), the index becomes a covering index that includes the StockLevel column, eliminating the need for the back-join and reducing the number of rows scanned to only those matching the warehouse filter.

Exam trap

The trap here is that candidates often think changing the primary key order (Option A) will physically colocate data and speed up the query, but in Cloud Spanner, primary key order does not eliminate the need to scan all rows for a given WarehouseId, and the query still requires aggregation across ProductId groups, so a covering index is the correct optimization.

How to eliminate wrong answers

Option A is wrong because reordering the primary key columns to (WarehouseId, ProductId) is not interleaving, which in Cloud Spanner is a parent-child relationship between tables; a primary key also cannot be altered in place, so the table would have to be recreated and reloaded, and the query would still read every row for the given WarehouseId. Option B is wrong because a materialized view that pre-aggregates stock by warehouse would not help this query, which groups by ProductId, not by warehouse; the view would need to be grouped by (WarehouseId, ProductId) to be useful, and even then, maintaining it adds write overhead and complexity. Option D is wrong because a STORED GENERATED column for total stock per warehouse is not possible in Cloud Spanner: generated columns are computed per row from that row's own columns and cannot reference other rows or perform aggregation.

26
MCQmedium

A healthcare analytics company uses Cloud Bigtable to store time-series data from medical devices. The table has a row key of 'device_id#timestamp' where timestamp is stored in reverse order (max - timestamp) so that recent data is at the top. Queries that fetch data for a specific device over a date range are very fast. However, analysts also need to run queries that aggregate data across all devices for a specific hour (e.g., count of readings between 2023-01-01 10:00 and 11:00). These queries are extremely slow because they require scanning all rows. The team must redesign the schema to support both access patterns without duplicating data unnecessarily. What is the best approach?

A.Use BigQuery to query Bigtable via an external table and run the aggregation there.
B.Increase the number of Bigtable nodes to improve scan throughput.
C.Add a secondary index on the timestamp column.
D.Create a second table with row key 'timestamp#device_id' (with timestamp in natural order) to support time-range queries.
AnswerD

This provides efficient access for the aggregation query by allowing a range scan over the timestamp.

Why this answer

Option D is correct. Creating a separate table with a row key of 'timestamp#device_id' allows efficient range scans for a given time period across all devices. This is a common pattern in Bigtable to support multiple access patterns.

Option C is not possible (Bigtable has no secondary indexes). Option A is external and not a schema change. Option B (adding nodes) helps throughput but not query efficiency.
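The two row key layouts can be compared with a small Python sketch. The helper names, the `MAX_TS_MS` ceiling, the zero-padding width, and the sample readings are all assumptions made for illustration; a sorted list stands in for Bigtable's lexicographic row ordering.

```python
import bisect

MAX_TS_MS = 10**13  # assumed ceiling for millisecond timestamps


def device_first_key(device_id, ts_ms):
    # Reversed timestamp: newer readings sort first within a device.
    return f"{device_id}#{MAX_TS_MS - ts_ms:013d}"


def time_first_key(device_id, ts_ms):
    # Natural-order timestamp first: rows for one time window are contiguous.
    return f"{ts_ms:013d}#{device_id}"


# Hypothetical readings as (device_id, epoch milliseconds).
readings = [
    ("dev-b", 1672567200000),  # 2023-01-01 10:00:00 UTC
    ("dev-a", 1672570000000),  # 2023-01-01 10:46:40 UTC
    ("dev-a", 1672567300000),  # 2023-01-01 10:01:40 UTC
    ("dev-b", 1672575000000),  # 2023-01-01 12:10:00 UTC
]

by_device = sorted(device_first_key(d, t) for d, t in readings)
by_time = sorted(time_first_key(d, t) for d, t in readings)


def scan_time_range(start_ms, end_ms):
    """Range scan over the time-first table: start inclusive, end exclusive."""
    lo = bisect.bisect_left(by_time, f"{start_ms:013d}")
    hi = bisect.bisect_left(by_time, f"{end_ms:013d}")
    return by_time[lo:hi]


# All readings between 10:00 and 11:00, across every device.
hour = scan_time_range(1672567200000, 1672570800000)
```

With the device-first layout the same hour is scattered across every device's key range, which is why the original table needs a full scan for this query.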

27
MCQhard

You have a Cloud Spanner table 'Orders' with columns: OrderId, CustomerId, OrderDate, Status. You need to support a query that finds all orders for a customer in the last 30 days, sorted by OrderDate descending, with strong consistency. Using only indexes, what is the best approach?

A.Create a secondary index on (OrderDate) only
B.Create a secondary index on (CustomerId, OrderDate)
C.Use a manual table scan with filter
D.Create a secondary index on (CustomerId, OrderDate DESC) with INCLUDE (OrderId, Status)
AnswerD

Index covers the query completely, providing efficient ordered retrieval.

Why this answer

Option D is correct: a secondary index on (CustomerId, OrderDate DESC) with INCLUDE (OrderId, Status) lets the query be served entirely from the index, already sorted, without accessing the base table, which minimizes latency. (In GoogleSQL-dialect databases the clause is spelled STORING.) Option B is close, but without the INCLUDE clause the query must go back to the base table for Status. Option A doesn't filter by customer.

Option C is not an index-based solution.

28
MCQmedium

Refer to the exhibit. A developer creates these tables and notices that queries joining Users and Orders on UserId are slow. What is the most likely cause?

A.The primary key of Orders should include UserId as a prefix for co-location.
B.The foreign key constraint is missing, causing full table scans.
C.Tables are not interleaved, so parent and child rows may be in different splits.
D.The foreign key reference should be on the parent table.
AnswerC

Interleaving is required to guarantee co-location. Without it, joins may be distributed.

Why this answer

Option C is correct because without interleaving, parent and child rows may be stored on different splits, causing distributed joins. Option B is wrong because there is a foreign key. Option A is wrong because the primary key of Orders is OrderId alone, and adding UserId as a key prefix does not by itself guarantee co-location with the Users table.

Option D is wrong because the foreign key is defined correctly.

29
Multi-Selecteasy

Which TWO data types are supported in Cloud Spanner schemas?

Select 2 answers
A.ARRAY
B.GEOMETRY
C.TIMESTAMP
D.TEXT
E.TINYINT
AnswersA, C

ARRAY is supported for storing repeated values of a specific type.

Why this answer

Options A and C are correct. Cloud Spanner supports ARRAY and TIMESTAMP. Option E is wrong because TINYINT is not a Spanner type (use INT64).

Option B is wrong because GEOMETRY is not supported. Option D is wrong because TEXT is not a supported type (use STRING).

30
MCQhard

Refer to the exhibit. You receive the following query output showing bytes processed for a BigQuery query. The table is partitioned by date and clustered on country. What is the most likely reason for the high bytes processed?

A.The GROUP BY country requires sorting all rows
B.The table is not partitioned correctly
C.The date range is too wide
D.The query does not filter on the clustering column, causing full scan of selected partitions
AnswerD

Clustering on country helps only if the WHERE clause filters on country; otherwise, all rows in partitions are scanned.

Why this answer

Option D is correct: the query does not filter on the clustering column (country), so BigQuery must scan all rows in the selected partitions. Clustering reduces the data scanned only when the query filters on the clustering columns. Option B is incorrect because partitioning is working.

Option C is incorrect because 31 days is a small range. Option A is incorrect because the GROUP BY does not cause a full scan; the issue is the missing filter on the clustering column.

31
MCQhard

A team is migrating an on-premises PostgreSQL database to Cloud SQL. The current schema uses a composite primary key on columns (customer_id, order_date) in the orders table. The migration team wants to reduce the cost of secondary indexes. Which schema design change should they consider?

A.Partition the table by customer_id to reduce the number of secondary indexes needed.
B.Create a secondary index on the composite key to keep the same query performance.
C.Replace the composite primary key with a surrogate UUID primary key and add unique constraints on the original columns.
D.Use the CLUSTER command to physically reorder the table based on the composite key.
AnswerC

A single surrogate key column replaces the two-column key wherever the row must be referenced, and a unique constraint on the original columns preserves data integrity.

Why this answer

Option C is the intended answer: a surrogate primary key gives the table a single key column, so anything that must carry the row's key stores one value instead of the (customer_id, order_date) pair. The saving is largest in engines whose secondary indexes embed the primary key, such as MySQL InnoDB.

PostgreSQL secondary indexes point to heap tuples rather than storing primary key values, so on Cloud SQL for PostgreSQL the benefit comes mainly from narrower foreign key columns in referencing tables and the indexes built on them.

Exam trap

Google Cloud often tests the misconception that partitioning or clustering reduces index storage costs. Neither does: each partition carries its own indexes, and CLUSTER only reorders rows.

How to eliminate wrong answers

Option A is wrong because partitioning by customer_id does not reduce the number or size of secondary indexes; it only splits the table into smaller physical segments, and each partition still needs its own indexes. Option B is wrong because creating a secondary index on the composite key duplicates the primary key index, increasing storage and write overhead without reducing cost. Option D is wrong because the CLUSTER command physically reorders rows based on an index, which can improve locality but does not reduce secondary index size or cost; it is a one-time maintenance operation, not a schema design change.

32
Multi-Selecthard

A company uses Firestore to power a live sports score app. Scores are updated frequently, and many clients listen to real-time updates on specific games. Which two design decisions will minimize the number of reads and reduce costs? (Choose two.)

Select 2 answers
A.Use a collection group query to listen to all games at once
B.Store an aggregate score summary document per game and listen to it
C.Use a separate document per game and listeners filter by game ID
D.Use a single document for all games with nested fields
E.Use a subcollection of periods (quarters) to spread writes
AnswersB, C

Reduces write operations and read frequency; clients get updates from a single summary document.

Why this answer

Options B and C are correct. C: using a separate document per game and having clients listen only to the game they're interested in minimizes reads because each client only reads one document. B: storing an aggregate score summary document per game reduces the number of document updates and reads because changes are batched into a single document write, and listeners read that one document.

Option A (collection group query) would listen to many documents, increasing reads. Option E (subcollection of periods) increases read complexity. Option D (single document for all games) would cause document contention and force all clients to read the same large document.

33
MCQmedium

A Cloud Spanner application experiences high write latency on a table with a monotonically increasing primary key. Which schema change will most effectively reduce latency?

A.Convert the table to an interleaved table
B.Add a secondary index on the existing key
C.Modify the primary key to include a hash of the original key as a leading column
D.Increase the number of nodes in the instance
AnswerC

Hash prefix distributes writes uniformly across splits.

Why this answer

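Option C is correct: adding a hash prefix to the primary key spreads writes across splits, eliminating the hotspot at the end of the key range. Option A (interleaving) and Option B (a secondary index on the same key) leave the write distribution unchanged, and an index on a monotonically increasing key hotspots as well. Option D adds capacity, but the newest rows still land on a single split.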
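A minimal Python sketch of the hash-prefix idea follows. The shard count, the choice of SHA-256, and the function names are assumptions for illustration; the point is only that a deterministic hash of the original key, used as the leading key column, scatters consecutive IDs.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical shard count; tune to the workload


def shard_for(original_key):
    """Deterministically map a key to a shard number in [0, NUM_SHARDS)."""
    digest = hashlib.sha256(str(original_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def sharded_key(original_key):
    # The shard becomes the leading key column, so consecutive IDs
    # no longer sort next to each other.
    return (shard_for(original_key), original_key)


# 1,000 consecutive (monotonically increasing) IDs.
keys = [sharded_key(i) for i in range(1000, 2000)]
per_shard = Counter(shard for shard, _ in keys)
```

Because the shard is derived from the key, a point read can recompute it; a range read over the original key has to fan out across all shards, which is the trade-off of this design.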

34
MCQhard

A global gaming company uses Cloud Spanner for player profiles and game state. The schema includes a table 'PlayerStats' with a primary key (PlayerId, GameId, Timestamp). The table stores millions of rows per player. The application frequently runs a query to fetch the most recent stats for a given player across all games, using ORDER BY Timestamp DESC LIMIT 10. This query is slow, taking several seconds. The team adds a secondary index on (PlayerId, Timestamp) but still sees high CPU usage and latency. They need to redesign the schema to optimize this query without changing the application logic significantly. What should they do?

A.Migrate the PlayerStats table to Cloud Bigtable for better time-series performance.
B.Change the primary key to (PlayerId, Timestamp, GameId) and drop the secondary index.
C.Create a stored procedure that aggregates data per player and caches results.
D.Add a materialized view that pre-computes the latest stats per player.
AnswerB

This allows efficient range scans for a player’s stats ordered by time.

Why this answer

Option B is correct. Reordering the primary key to (PlayerId, Timestamp, GameId) allows Spanner to efficiently perform a range scan for a given PlayerId, sorted by Timestamp, without needing a secondary index. This eliminates the need for the index and reduces CPU.

Option C is not a schema change. Option A is a different database, not a schema redesign. Option D is not supported in Spanner natively.

35
MCQeasy

A data warehouse in BigQuery stores event logs with nested and repeated fields (e.g., page views within a session). Which schema type is optimal for storing this data?

A.Use RECORD type columns for each nested level
B.Normalize into separate tables and join
C.Use ARRAY<STRUCT<...>> columns for nested repeated data
D.Store as JSON strings and parse at query time
AnswerC

Arrays of structs are the native way to represent nested repeated data in BigQuery.

Why this answer

Option C is correct: ARRAY<STRUCT<...>> allows storing nested repeated data natively in BigQuery, enabling efficient querying without joins. Option B (separate tables) requires costly joins. Option D (JSON strings) loses schema enforcement and performance.

Option A (RECORD columns) names the same type as STRUCT; a RECORD that is not in REPEATED mode holds a single nested value, so it cannot represent multiple page views per session.

36
Drag & Dropmedium

Arrange the steps to import data from Cloud Storage into Cloud Firestore using a managed import.

Why this order

Import needs properly formatted files; use gcloud command, then monitor and verify.

37
MCQeasy

Your company runs a global e-commerce platform on Google Cloud Spanner. The database schema includes an 'Orders' table with primary key (OrderId, CustomerId) and an 'OrderItems' table with primary key (OrderId, CustomerId, ItemId), interleaved in parent Orders on delete cascade. During peak shopping hours, you notice that queries retrieving all items for a specific order are performing full table scans on the OrderItems table, leading to increased latency and higher CPU utilization. The queries use the OrderId as the filter condition. The database administrators have already checked that the query plans show table scans instead of using the interleaved index. You are tasked with resolving this performance issue. Which of the following actions should you take?

A.Remove CustomerId from the Orders primary key (making it just OrderId) and update OrderItems to have primary key (OrderId, ItemId), maintaining interleaving.
B.Change the primary key of Orders to (OrderId, CustomerId) and update OrderItems accordingly.
C.Create a secondary index on OrderItems(OrderId).
D.Increase the number of Spanner nodes to improve throughput.
AnswerA

This allows efficient lookup using only OrderId and leverages interleaving.

Why this answer

Option A is correct because an interleaved child's primary key must begin with the parent's full primary key, so the key design of Orders determines how OrderItems rows are located. With CustomerId in the parent key, a filter on OrderId alone specifies only part of that key. Making OrderId the entire parent key, and (OrderId, ItemId) the child key, lets a filter on OrderId identify the parent row exactly, so all of its items are read as one contiguous, co-located key range.

Exam trap

Google Cloud often tests the misconception that secondary indexes are the default fix for query performance issues, when in fact the schema design—specifically the primary key structure for interleaved tables—is the root cause and must be corrected first.

How to eliminate wrong answers

Option B is wrong because it keeps CustomerId in the primary key, which does not fix the issue—queries filtering only on OrderId still cannot use the interleaved index. Option C is wrong because creating a secondary index on OrderItems(OrderId) would add storage and write overhead, and while it could help, it is not the optimal solution; the correct fix is to adjust the primary key to leverage the interleaved index directly. Option D is wrong because increasing Spanner nodes improves throughput but does not address the root cause of full table scans caused by an inefficient schema design.

38
Multi-Selecthard

A company is migrating a large Oracle database to Cloud Spanner. They need to define the schema for relational tables with foreign keys. Which THREE considerations are important when designing the Spanner schema? (Choose three.)

Select 3 answers
A.Use NULL values in primary key columns to allow optional fields.
B.Use INTERLEAVE tables to model parent-child relationships.
C.Avoid using composite primary keys; use single-column keys instead.
D.Define secondary indexes for querying on non-key columns.
E.Foreign keys are automatically enforced in Cloud Spanner.
AnswersB, D, E

Interleaving allows co-locating parent and child rows, reducing read latency.

Why this answer

Options B, D, and E are correct. Spanner supports interleaved tables (B) for parent-child relationships, secondary indexes (D) for query performance on non-key columns, and foreign key constraints (E) that it enforces for referential integrity. Option C is false because composite primary keys are common.

Option A is false because primary key columns identify rows and are best declared NOT NULL; optional fields belong in non-key columns.

39
MCQmedium

A team is designing a relational schema for a new application on Cloud SQL. The schema includes a table 'Orders' and a table 'Customers'. Each order belongs to one customer. The team anticipates high write throughput and needs to enforce referential integrity. Which schema design is most appropriate?

A.Use Cloud Spanner interleaved tables with Orders as a child of Customers
B.Implement referential integrity checks in the application code and omit database constraints
C.Store order data as a JSON array in a column of the Customers table
D.Use a foreign key constraint from Orders.customer_id to Customers.customer_id
AnswerD

Enforces integrity efficiently within the database.

Why this answer

Option D is correct because using a foreign key constraint from Orders.customer_id to Customers.customer_id enforces referential integrity at the database level, which is essential for maintaining data consistency in a relational schema. Cloud SQL (e.g., MySQL or PostgreSQL) natively supports foreign key constraints, ensuring that every order references an existing customer without relying on application logic. This approach is efficient for high write throughput as the database handles the check atomically, avoiding race conditions.

Exam trap

Google Cloud often tests the misconception that application-level checks are sufficient for high-throughput systems, but the trap here is that database-level foreign keys are the only way to guarantee referential integrity under concurrent writes, as application code cannot prevent race conditions or orphaned records.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner interleaved tables are designed for hierarchical data and strong consistency in a globally distributed environment, not for standard relational schemas on Cloud SQL; they also introduce complexity and cost that are unnecessary for a simple parent-child relationship. Option B is wrong because implementing referential integrity checks in application code is error-prone and cannot guarantee consistency under high write throughput, as concurrent writes can bypass application logic, leading to orphaned records. Option C is wrong because storing order data as a JSON array in a column of the Customers table violates normalization principles, making it difficult to query individual orders, enforce constraints, and scale write throughput efficiently.

40
MCQmedium

A Cloud Firestore database stores documents for a mobile app. The app frequently queries for documents where a specific Boolean field is true. The field is not part of the collection group index. What should the developer do to improve query performance?

A.Add a synthetic field that combines the Boolean with a timestamp for range queries.
B.Create a composite index that includes the Boolean field and the query ordering field.
C.Denormalize the Boolean field into separate subcollections.
D.Rely on the automatic single-field index already created.
AnswerB

A composite index tailored to the query pattern improves performance and avoids full collection scans.

Why this answer

Option B is correct because creating a composite index on the Boolean field and the field used for ordering allows the query to be served efficiently. Option D is wrong because, although a single-field index on the Boolean field is created automatically, it is not sufficient for a query that also orders by another field. Option C is wrong because denormalization adds complexity without addressing the index.

Option A is wrong because adding a new field doesn't help if it's not indexed.

41
Multi-Selecteasy

A data engineer is designing a BigQuery schema for a dataset that will be used for both ad-hoc analysis and scheduled dashboards. They want to optimize costs and performance. Which three strategies should they consider? (Choose three.)

Select 3 answers
A.Use wildcard tables with a suffix filter.
B.Store data in multiple tables per day.
C.Use partitioning on a date column for time-based queries.
D.Use materialized views for pre-aggregated results.
E.Cluster on columns frequently used in filters.
AnswersC, D, E

Partitioning prunes partitions not needed by the query, reducing cost.

Why this answer

Partitioning (C) reduces scanned data by date. Clustering (E) improves filter queries. Materialized views (D) precompute aggregates.

Option B (multiple tables per day) increases management and query complexity. Option A (wildcard tables) can be costly if not used carefully, because the query scans every table that matches the pattern.

42
MCQmedium

A Cloud SQL for PostgreSQL database experiences lock contention during heavy concurrent writes on a single table. Which schema design change can most effectively reduce contention?

A.Deploy read replicas to offload reads
B.Use a connection pooler like PgBouncer
C.Create materialized views for read queries
D.Partition the table by a key that spreads write load
AnswerD

Partitioning reduces lock contention by distributing writes.

Why this answer

Option D is correct: table partitioning splits data into smaller physical pieces, reducing lock conflicts because writes target different partitions. Option A (read replicas) does not reduce write contention. Option B (connection pooling) improves connection management but not locking.

Option C (materialized views) does not affect write locking.

43
MCQhard

You have a BigQuery table with billions of rows partitioned by date and clustered on country. Users frequently query the table to compute total sales by product for a specific month. The product field has high cardinality (millions of distinct values). Which optimization would improve query performance the most?

A.Use a wildcard table pattern to query across date partitions
B.Re-cluster the table with product as the first clustering column
C.Partition by product
D.Keep the current clustering on country
AnswerB

Clustering on product improves aggregation performance by grouping data physically.

Why this answer

Option B is correct: re-clustering the table with product as the first clustering column ensures that the aggregation benefits from clustering, as the query groups by product and filters on date. The current clustering on country is not used in the query, so it provides no benefit. Option D keeps the current clustering, which is ineffective.

Option C is not possible because BigQuery only supports time-unit, ingestion-time, or integer-range partitioning. Option A uses wildcard tables, which doesn't help with performance.

44
Multi-Selecthard

Which TWO techniques can help avoid hot spotting in a Cloud Spanner table?

Select 2 answers
A.Add a hash of the primary key as the first part of the key
B.Use a monotonically increasing integer as the key
C.Use interleaved tables to distribute writes
D.Create a secondary index on a high-cardinality column
E.Use a random prefix or UUID as the first key column
AnswersA, E

Hash prefix evenly distributes writes.

Why this answer

Option A is correct: a hash prefix distributes writes across splits. Option E is correct: a random prefix or UUID also spreads writes, though a hash prefix is more common. Option B is incorrect: monotonically increasing keys cause hotspotting.

Option C is incorrect: interleaving does not prevent hotspotting on the parent key. Option D is incorrect: a secondary index does not change how the base table's keys are distributed, and indexes can cause their own hotspotting.

45
MCQhard

A company is using Cloud Spanner to manage financial transactions. The current schema has a single table 'Transactions' with a composite primary key (account_id, transaction_timestamp). The company frequently queries the latest transaction for each account. This query pattern is causing full table scans. Which schema design change would most improve query performance?

A.Add a secondary index on (account_id, transaction_timestamp DESC)
B.Change the primary key to (transaction_timestamp, account_id) and use interleaving
C.Create a separate 'LatestTransaction' table keyed by account_id, and update it whenever a new transaction occurs
D.Add a 'is_latest' boolean column to the Transactions table and index it
AnswerC

Enables direct point reads for the latest transaction.

Why this answer

Option C is correct because it eliminates the need to scan the entire Transactions table to find the latest transaction per account. By maintaining a separate LatestTransaction table keyed by account_id, each account's latest transaction can be retrieved with a single point read. This is a classic denormalization pattern in Cloud Spanner that avoids the overhead of scanning or sorting large datasets for 'latest per group' queries.

Exam trap

Google Cloud often tests the misconception that a secondary index with DESC ordering can efficiently retrieve the latest row per group, but in Cloud Spanner, secondary indexes do not support 'top-N per group' without scanning all index entries for each group.

How to eliminate wrong answers

Option A is wrong because a secondary index on (account_id, transaction_timestamp DESC) would still require a full index scan to find the latest transaction per account, as Cloud Spanner secondary indexes do not support 'latest per group' without scanning all rows for each account. Option B is wrong because changing the primary key to (transaction_timestamp, account_id) would scatter rows for the same account across splits, making per-account queries inefficient and requiring a full table scan to gather all rows for a single account. Option D is wrong because adding an 'is_latest' boolean column and indexing it would require updating all previous rows for an account on every insert to set is_latest=false, which is both expensive and prone to race conditions in a distributed database like Cloud Spanner.
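The 'latest per account' pattern can be sketched in Python with an in-memory list and dictionary standing in for the two tables. The `Txn` type, the `record` function, and the sample data are hypothetical; in a real database both writes would presumably go in the same transaction so the two tables cannot drift apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    account_id: str
    timestamp: int
    amount: int


transactions = []          # stands in for the Transactions table
latest_transaction = {}    # stands in for LatestTransaction, keyed by account_id


def record(txn):
    """Append to the history and keep the per-account latest row current."""
    transactions.append(txn)
    current = latest_transaction.get(txn.account_id)
    # Only overwrite when the new transaction is actually newer,
    # so late-arriving rows do not replace a more recent one.
    if current is None or txn.timestamp > current.timestamp:
        latest_transaction[txn.account_id] = txn


record(Txn("acct-1", 100, 50))
record(Txn("acct-1", 300, -20))
record(Txn("acct-1", 200, 75))   # arrives late; must not become "latest"
record(Txn("acct-2", 150, 10))

latest_for_acct_1 = latest_transaction["acct-1"]  # a single keyed lookup
```

The cost of the design is one extra write per transaction in exchange for a point read on the hot query path.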

46
MCQmedium

A company is designing a Cloud Spanner database for a global user base. They need to support strong consistency and low-latency reads across multiple regions. Which schema design practice is most important?

A.Denormalize data into wide tables to reduce the number of joins.
B.Use interleaved tables to co-locate related rows that are queried together.
C.Use a single table with composite primary key to avoid joins.
D.Create secondary indexes on every column to optimize read queries.
AnswerB

Interleaving ensures parent and child rows are stored on the same split, reducing latency for joins.

Why this answer

Option B is correct because interleaving tables that are frequently joined together into a parent-child hierarchy allows Spanner to co-locate the data, reducing cross-node communication and latency. Option C is wrong because using a single monolithic table would not scale. Option A is wrong because denormalization can increase write latency and complexity.

Option D is wrong because indexing all columns leads to unnecessary overhead.

47
MCQmedium

A company is setting up access control for a BigQuery dataset using the above IAM policy. An analyst who is a member of the group 'analysts@example.com' also has the user account 'analyst@example.com'. They need to create new tables in the dataset. What will be the outcome?

A.The analyst will get an error because of conflicting roles.
B.The analyst cannot create tables because the group only has dataViewer.
C.The analyst can create tables because they have dataOwner role on their user account.
D.The analyst can create tables if they also have jobUser role.
AnswerC

The dataOwner role includes all dataset permissions, including table creation.

Why this answer

IAM grants are additive. The user has the dataOwner role directly (C), which includes create table permissions. The group membership with dataViewer does not override it, so the analyst can create tables.

Option B is wrong because the user has an explicit dataOwner grant. Option D (jobUser) is not needed. Option A (conflict) does not apply because IAM roles never conflict; permissions are combined.
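The additive rule can be illustrated with a short Python sketch. The bindings are assumed from the explanation above (the exhibit itself is not reproduced here), and the permission sets are abbreviated, illustrative subsets of each role rather than the full lists.

```python
# Abbreviated, illustrative subsets of each role's permissions (not complete).
ROLE_PERMISSIONS = {
    "roles/bigquery.dataViewer": {
        "bigquery.tables.get",
        "bigquery.tables.getData",
        "bigquery.tables.list",
    },
    "roles/bigquery.dataOwner": {
        "bigquery.tables.get",
        "bigquery.tables.getData",
        "bigquery.tables.list",
        "bigquery.tables.create",
        "bigquery.tables.delete",
    },
}

# Assumed bindings on the dataset, reconstructed from the explanation.
BINDINGS = [
    ("group:analysts@example.com", "roles/bigquery.dataViewer"),
    ("user:analyst@example.com", "roles/bigquery.dataOwner"),
]


def effective_permissions(identities):
    """Union the permissions of every role bound to any of the identities."""
    granted = set()
    for member, role in BINDINGS:
        if member in identities:
            granted |= ROLE_PERMISSIONS[role]
    return granted


# The analyst is both the user and a member of the group.
analyst = {"user:analyst@example.com", "group:analysts@example.com"}
can_create = "bigquery.tables.create" in effective_permissions(analyst)
```

A member who holds only the group binding would get the viewer subset, which is why the direct dataOwner grant is what decides the outcome.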

48
MCQeasy

Your team is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The current schema uses table inheritance, which is not fully supported in Cloud SQL. What should you do to minimize application changes?

A.Continue using inheritance as Cloud SQL supports it fully
B.Use PostgreSQL foreign data wrappers to emulate inheritance
C.Use materialized views to combine data
D.Redesign the schema using separate tables with joins
AnswerD

Standard approach; can use views to simulate inheritance for read operations.

Why this answer

Option D is correct: redesign the schema using separate tables with joins, as this is the standard approach to replace inheritance and can be done with minimal application changes by creating views that emulate the inheritance hierarchy. Option B uses foreign data wrappers, which are not a direct replacement and add complexity. Option C uses materialized views, which don't support write operations.

Option A is incorrect because Cloud SQL does not fully support table inheritance.

49
MCQmedium

A company is migrating an on-premises PostgreSQL database to Cloud SQL for PostgreSQL. The database uses several custom PL/pgSQL functions that perform complex calculations. The migration must minimize application changes and support high availability. Which strategy should the database engineer use for the schema migration?

A.Convert the functions to stored procedures in Cloud Spanner and migrate data separately.
B.Export the functions as SQL scripts and convert them to pgSQL syntax for Cloud SQL.
C.Export the functions as SQL scripts and rewrite them in JavaScript using Cloud Functions.
D.Use pg_dump to export the schema including functions and restore directly to Cloud SQL.
AnswerD

pg_dump preserves PL/pgSQL functions; restore works in Cloud SQL.

Why this answer

Option D is correct because pg_dump can export the entire PostgreSQL schema, including custom PL/pgSQL functions, in a format that Cloud SQL for PostgreSQL natively understands. Restoring directly with pg_restore or psql preserves the functions without requiring syntax conversion, minimizing application changes. Cloud SQL for PostgreSQL supports high availability through a regional configuration with a standby instance and automatic failover, meeting the HA requirement without altering the schema.

Exam trap

Google Cloud often tests the misconception that PL/pgSQL functions need to be converted or rewritten for Cloud SQL, when in fact Cloud SQL for PostgreSQL is a fully managed PostgreSQL service that supports the same procedural language natively.

How to eliminate wrong answers

Option A is wrong because Cloud Spanner does not support PL/pgSQL functions or stored procedures with the same syntax; migrating to Spanner would require rewriting all functions and changing application queries, violating the 'minimize application changes' requirement. Option B is wrong because PL/pgSQL is already the native procedural language for PostgreSQL; exporting as SQL scripts and 'converting to pgSQL syntax' is unnecessary and implies a false need for syntax conversion, as Cloud SQL for PostgreSQL uses the same PostgreSQL engine. Option C is wrong because rewriting PL/pgSQL functions in JavaScript using Cloud Functions would require significant application refactoring to call external HTTP-triggered functions instead of inline database functions, breaking the 'minimize application changes' constraint.

50
MCQeasy

A mobile app backend uses Firestore for user profiles. The schema has a single collection 'users' where each document contains: user_id (used as document ID), name, email, and friends (an array of user IDs). The friends array can grow large (thousands of IDs). When a user adds a friend, the application updates the array, causing the document to grow and leading to write contention and size limit warnings. The team needs to redesign the schema to scale better. What is the best approach?

A.Move the friends list to a subcollection under each user document.
B.Migrate user profiles and friendships to Cloud SQL for relational capabilities.
C.Limit the maximum size of the friends array to 1000 at the application level.
D.Create a new 'friendships' collection with documents containing user_id_1 and user_id_2 fields.
AnswerD

A separate collection for relationships scales well and avoids large documents.

Why this answer

Option D is correct. Using a separate top-level collection 'friendships' with documents representing pairs of users (or edges) scales well and avoids large documents. Option A (subcollection) is possible but is more suited to one-to-many relationships; for many-to-many, a separate collection is standard.

Option C (array with limit) is not a solution for growth. Option B is a different database, not a schema redesign.
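One way to shape a 'friendships' edge document is sketched below. The field names user_id_1 and user_id_2 come from the question; the document ID scheme (sorted IDs joined with an underscore) is a hypothetical convention, and it assumes user IDs do not themselves contain underscores.

```python
def friendship_doc(user_a, user_b):
    """Build a (document_id, document) pair for one friendship edge.

    Sorting the two IDs gives the same document regardless of which user
    initiated the friendship, so the edge is stored exactly once.
    """
    first, second = sorted((user_a, user_b))
    doc_id = f"{first}_{second}"  # hypothetical ID convention
    return doc_id, {"user_id_1": first, "user_id_2": second}


doc_id, doc = friendship_doc("bob", "alice")
```

Each friendship is then a small, fixed-size document, so adding a friend no longer rewrites a growing array on the user document.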

51
MCQhard

A company is migrating a legacy on-premises MySQL database to Cloud SQL for PostgreSQL. The database uses composite primary keys on multiple tables and heavily relies on cross-table joins with foreign keys. The team wants to minimize application code changes during migration. Which schema design strategy should the Cloud Database Engineer recommend to ensure compatibility and performance?

A.Maintain the same schema and rewrite joins as materialized views in PostgreSQL to optimize queries.
B.Use the same composite primary keys and foreign key constraints in Cloud SQL for PostgreSQL, leveraging its full support for these features.
C.Migrate to Cloud Spanner instead, using interleaved tables to replace join-heavy operations.
D.Remove composite primary keys and replace them with surrogate keys; use look-up tables for foreign key relationships.
AnswerB

Cloud SQL for PostgreSQL fully supports composite primary keys and foreign keys, minimizing application changes.

Why this answer

Option B is correct because Cloud SQL for PostgreSQL fully supports composite primary keys and foreign key constraints, which are standard SQL features. By maintaining the same schema, the team minimizes application code changes while preserving referential integrity and join performance, as PostgreSQL's query planner handles these constructs efficiently.

Exam trap

The trap here is that candidates assume cloud-native databases require schema redesign (e.g., denormalization or surrogate keys) for performance, but PostgreSQL's full SQL compliance often allows a direct lift-and-shift of composite keys and foreign keys without changes.

How to eliminate wrong answers

Option A is wrong because materialized views are not a direct replacement for joins; they store precomputed results and require manual refresh, which adds complexity and does not eliminate the need for application code changes to query the views instead of the original tables. Option C is wrong because migrating to Cloud Spanner would require significant schema redesign (e.g., denormalization into interleaved tables) and application code changes, contradicting the goal of minimizing changes. Option D is wrong because removing composite primary keys and replacing them with surrogate keys would break existing application logic that relies on composite keys for joins and lookups, requiring extensive code modifications.

52
Multi-Selecteasy

Which two of the following are best practices when designing BigQuery schemas? (Choose two.)

Select 2 answers
A.Use column-level security to restrict access
B.Use denormalization to reduce the number of joins
C.Use the type RECORD for structured data
D.Use repeated fields to avoid joins when querying parent-child data
E.Use a single table for all data to simplify queries
AnswersB, D

Denormalization improves query performance by reducing joins.

Why this answer

Options B and D are correct best practices: using denormalization to reduce joins, and using repeated fields to avoid joins (common for nested parent-child data). Option A (column-level security) is a security feature, not a schema design best practice. Option E is incorrect because BigQuery encourages logical data models, not a single table for all data.

Option C (RECORD) is a way to implement nested structures but is not a general best practice by itself; repeated fields are more specific.

53
MCQmedium

You are designing a BigQuery schema for IoT sensor data. The sensor readings have varying fields depending on the sensor type. You want to minimize storage costs and avoid schema maintenance when new sensor types are added. What is the best schema design?

A.Use a separate table per sensor type
B.Store the sensor data in a JSON column
C.Use a schema with a STRUCT containing all possible fields as optional
D.Use a wide table with many nullable columns
AnswerB

JSON provides schema flexibility and cost-effective storage for varying fields.

Why this answer

Option B is correct: using a JSON column allows a flexible schema without requiring ALTER TABLE when new fields appear; BigQuery efficiently stores JSON and can query it with standard SQL. Option A requires creating a new table for each sensor type, increasing maintenance. Options C and D are wasteful because many fields will be NULL for most rows, and both need schema changes when a new sensor type introduces new fields.

54
Multi-Selectmedium

A Cloud Database Engineer is designing a schema for an e-commerce application on Cloud Spanner. The application requires high read throughput for product queries by category and price range, and must support global scale with strong consistency. The team is considering primary key design and interleaved tables. Which TWO design considerations should the engineer apply? (Choose TWO.)

Select 2 answers
A.Define secondary indexes on price and category columns to support range queries without considering the primary key design.
B.Use a timestamp as the first part of the primary key to enable time-based partitioning and efficient range scans.
C.Define interleaved tables for all related entities, even if they are not always accessed together, to reduce joins.
D.Use a primary key that starts with the category column to colocate product data for efficient queries by category.
E.Create an interleaved table for product variants under the product table, since variants are always queried with the parent product.
AnswersD, E

Leading with category allows Spanner to distribute rows by category, improving locality for queries filtering by category.

Why this answer

Option D is correct because colocating product data by category in the primary key enables efficient range scans on category and price, as Cloud Spanner stores rows in sorted order by primary key. This design minimizes cross-node fan-out for queries filtering by category, directly supporting high read throughput at global scale with strong consistency.

Exam trap

Google Cloud often tests the misconception that secondary indexes are a universal solution for query performance, ignoring that primary key design and interleaved tables are critical for colocation and avoiding cross-node fan-out in globally distributed databases like Cloud Spanner.

55
MCQeasy

A developer is designing a schema for Firestore to store user profiles. Each user has a unique ID and multiple addresses. Which data modeling approach is recommended for Firestore?

A.Store addresses as a string array in the user document.
B.Use a relational join between users and addresses collection.
C.Create a separate collection for addresses with a reference to user ID.
D.Store addresses as a nested map within the user document.
AnswerD

Nested maps are ideal for one-to-few relationships and minimize reads.

Why this answer

Firestore encourages denormalization for one-to-few relationships. Storing addresses as a nested map within the user document (Option D) is efficient for small, bounded sets of addresses. Option C (separate collection) is for large or dynamic lists.

Option B is not possible because Firestore does not support relational joins. Option A (string array) loses structure.

56
MCQeasy

A team is migrating an on-premises MySQL database to Cloud SQL. The current schema uses MyISAM tables. What is the recommended approach?

A.Keep the schema as is; Cloud SQL supports MyISAM.
B.Convert MyISAM tables to InnoDB before migration.
C.Replicate the on-premises MySQL to Cloud SQL using Database Migration Service.
D.Export the database using mysqldump and import directly into Cloud SQL.
AnswerB

InnoDB is the default and recommended engine; conversion ensures compatibility and transactional support.

Why this answer

Option B is correct because Cloud SQL supports InnoDB, which is the recommended engine for transactional workloads, while MyISAM is not supported on Cloud SQL. Option A is wrong for the same reason: the schema cannot be kept as is.

Option D is wrong because importing the dump directly would carry over the unsupported MyISAM table definitions. Option C is wrong because replicating the schema unchanged does not resolve the engine incompatibility.

57
MCQeasy

A Cloud SQL for PostgreSQL instance is used for an OLTP application. The database schema has many foreign key constraints. Which action improves write performance?

A.Create indexes on foreign key columns.
B.Drop all foreign key constraints.
C.Add more triggers to enforce integrity.
D.Increase the instance storage size.
AnswerA

Indexes on foreign key columns speed up lookups during INSERT/UPDATE/DELETE operations.

Why this answer

Option A is correct because creating indexes on foreign key columns avoids full table scans during referential integrity checks. Option C is wrong because more triggers slow writes. Option B is wrong because dropping constraints risks data integrity.

Option D is wrong because increasing disk size does not directly address the performance bottleneck.

58
MCQhard

Refer to the exhibit. What is the most likely performance issue with this schema?

A.No performance issue; the schema is optimal
B.Hotspotting on UserId due to frequent queries
C.Hotspotting on TransactionId due to monotonically increasing values
D.Too many secondary indexes causing write amplification
AnswerC

Monotonically increasing keys cause all writes to target a single split.

Why this answer

Option C is correct: TransactionId is monotonically increasing (likely auto-generated), causing write hotspotting on the last split. Option B is incorrect because UserId is not the primary key. Option D is incorrect because no secondary indexes are shown.

Option A is incorrect because hotspotting is a likely issue.

59
MCQeasy

A company is migrating an on-premises MySQL database to Cloud SQL for MySQL. The current schema uses InnoDB with foreign keys. What is a key consideration for maintaining referential integrity in Cloud SQL?

A.Enable the foreign_key_checks flag during migration.
B.Convert foreign keys to application-level checks.
C.Use Cloud SQL's built-in foreign key enforcement which is identical to on-premises.
D.Foreign keys are not supported in Cloud SQL MySQL.
AnswerC

Cloud SQL for MySQL behaves exactly like standard MySQL for foreign keys.

Why this answer

Cloud SQL for MySQL fully supports InnoDB and foreign keys, identical to on-premises. Option C is correct because the same foreign key enforcement applies. Option D is false.

Option A is about a migration flag, not ongoing integrity. Option B is unnecessary.

60
MCQmedium

A Cloud Bigtable instance stores time-series data with a row key format: [metric_id]#[timestamp]. The team notices read throughput is low when scanning a metric over a time range. What is the likely cause?

A.All rows for a given metric are stored in a single tablet causing a hotspot.
B.Too many column families in the schema.
C.The number of nodes is insufficient.
D.Replication factor is set too low.
AnswerA

With metric_id prefix, all rows for that metric are on one tablet, limiting read throughput.

Why this answer

Option A is correct because Bigtable rows are sorted lexicographically; with metric_id first, all rows for a metric are colocated in one tablet, causing hotspotting on reads. Option B is wrong because column families don't affect read distribution. Option D is wrong because replication factor doesn't affect per-tablet throughput.

Option C is wrong because node count might be sufficient; the issue is row key design.

61
MCQhard

A financial services company uses Cloud Spanner for transaction processing. They need to run analytical queries that scan large portions of the database without impacting OLTP performance. What schema design technique should they use?

A.Export data periodically to BigQuery and run queries there.
B.Create multiple secondary indexes on frequently scanned columns.
C.Design the primary key so that analytical queries scan a small number of tablets by using interleaved tables.
D.Use a read replica instance to offload analytical queries.
AnswerC

Interleaving related rows keeps them co-located, allowing efficient scans on parent-child relationships without distributed reads.

Why this answer

Option C is correct because Cloud Spanner supports interleaved tables, which allow efficient scans on parent rows without cross-node joins. Option B is wrong because secondary indexes may not be suitable for large scans. Option D is wrong because read replicas are not available in Spanner for analytical workloads.

Option A is wrong because exporting to BigQuery adds latency and complexity.

62
MCQeasy

A startup uses Cloud SQL (MySQL) for a blogging platform. The schema has a table 'posts' with columns: post_id (auto-increment PK), title, content, author_id, created_at. The application frequently runs a query to display the latest 10 posts from a specific author: SELECT * FROM posts WHERE author_id = ? ORDER BY created_at DESC LIMIT 10. This query is slow when an author has thousands of posts. The team wants to optimize this query without changing the application code. What schema change will be most effective?

A.Add a composite index on (author_id, created_at DESC).
B.Partition the table by author_id using range partitioning.
C.Increase the query cache size in Cloud SQL.
D.Migrate the posts table to Cloud Spanner and use interleaved indexes.
AnswerA

This index directly supports the query, allowing an index range scan and limit.

Why this answer

Option A is correct. A composite index on (author_id, created_at) allows the database to efficiently find the posts for a given author ordered by created_at without scanning all rows. Option C (query cache) is not a schema change.

Option D (Spanner) is a different database. Option B (partitioning) could help, but ordering across partitions is complex and not as effective.
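The effect of the composite index can be checked with a query plan. The sketch below uses SQLite from the Python standard library as a stand-in for Cloud SQL for MySQL, so the plan output format is SQLite's, not MySQL's; the index name idx_posts_author_created is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "post_id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT, content TEXT, "
    "author_id INTEGER, created_at TEXT)"
)
# Equality column first, then the ORDER BY column in the order the query needs.
conn.execute(
    "CREATE INDEX idx_posts_author_created ON posts (author_id, created_at DESC)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM posts WHERE author_id = ? ORDER BY created_at DESC LIMIT 10",
    (42,),
).fetchall()

# The last column of each plan row is the human-readable step description.
plan_details = [row[-1] for row in plan]
```

Because the index already stores each author's posts in created_at order, the engine can read the first ten index entries for that author and stop, with no separate sort step. In MySQL the equivalent check would be an EXPLAIN on the same query.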

63
MCQhard

A financial services company uses Cloud Spanner with a database that has multiple tables with interleaved relationships. They need to enforce a strict consistency requirement across two related tables that are not interleaved. Which method ensures global strong consistency?

A.Use Spanner's built-in atomicity by executing the updates in a single read-write transaction.
B.Use Cloud Pub/Sub to eventually synchronize the tables.
C.Use a commit timestamp-based approach to synchronize writes.
D.Use a client-side distributed transaction across the two tables.
AnswerA

Spanner supports multi-table transactions with global strong consistency.

Why this answer

Spanner provides global ACID transactions across all tables within a database. Option A (single read-write transaction) is the correct approach. Option D (client-side distributed transaction) is unnecessary and less reliable.

Option C (commit timestamps) does not provide atomicity. Option B (Pub/Sub) is eventual.

64
MCQeasy

A startup is using Cloud Spanner for a global user base. They need to design a schema that minimizes interleaved table joins for common access patterns. Which schema design principle should they prioritize?

A.Normalize all tables to reduce data redundancy.
B.Store data in separate databases per region.
C.Use secondary indexes on all foreign key columns.
D.Use composite primary keys to colocate related data.
AnswerD

Correct. Composite primary keys enable interleaving, colocating rows and minimizing joins.

Why this answer

Interleaved tables in Spanner allow colocation of parent-child rows, reducing cross-node joins. Option D uses composite primary keys to colocate related data, which is the core principle of interleaving. Option A (normalization) increases joins.

Option C (secondary indexes) helps but is not as fundamental. Option B (separate databases) increases complexity.

65
Multi-Selectmedium

Which TWO schema design practices help reduce write contention in Cloud Spanner?

Select 2 answers
A.Use a hash prefix in the primary key to distribute writes across splits.
B.Use a timestamp prefix in the primary key to sort by time.
C.Use interleaved tables to keep related rows together.
D.Design the schema so that hot rows are split into multiple rows with different keys.
E.Decrease the number of splits by using a less granular primary key.
AnswersA, D

Hashing prevents sequential writes from hitting the same split.

Why this answer

Options A and D are correct. Option A: Using a monotonically increasing primary key causes hotspotting; instead, use a hash prefix. Option D: Splitting hot rows into multiple rows with different keys spreads writes.

Option C is wrong because interleaving can increase contention if the parent is hot. Option E is wrong because decreasing splits reduces parallelism. Option B is wrong because a timestamp prefix causes hotspotting.
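The hash-prefix technique can be sketched as follows. This is a minimal illustration computed client-side; NUM_SHARDS, shard_for, and sharded_key are hypothetical names, and the shard count of 16 is an arbitrary assumption rather than a recommended value.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(record_id):
    """Map an ID to a stable shard number using a hash of its value."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def sharded_key(record_id):
    """Composite key (shard, id): the shard leads, so sequential IDs scatter."""
    return (shard_for(record_id), record_id)


# Sequential IDs, which would otherwise all land at the end of the key space.
keys = [sharded_key(i) for i in range(1000, 2000)]
shards_used = {shard for shard, _ in keys}
```

The shard is derived from the ID itself, so a reader who knows the ID can recompute the full key. The trade-off is that a range scan over IDs now has to fan out across every shard value.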

66
MCQeasy

A team executed the above DDL to create interleaved tables in Cloud Spanner. They need to query all orders for a specific customer. Which query will be most efficient?

A.SELECT * FROM Orders WHERE CustomerId = 1234 AND OrderDate = '2023-01-01';
B.SELECT * FROM Customers JOIN Orders ON Customers.CustomerId = Orders.CustomerId WHERE Customers.CustomerId = 1234;
C.SELECT * FROM Orders WHERE CustomerId = 1234;
D.SELECT * FROM Orders WHERE OrderId = 5678;
AnswerC

Interleaving colocates all orders for a customer, making this query very efficient.

Why this answer

Since Orders are interleaved under Customers on CustomerId, filtering by CustomerId (Option C) allows Spanner to directly access the colocated rows. Option D filters only by OrderId, which may require a full scan. Option A adds an extra condition, so it still benefits from CustomerId but returns only that date's orders rather than all of them.

Option B uses a join, which is unnecessary because interleaving already provides the relationship.

67
MCQeasy

Refer to the exhibit. You are reviewing a Firestore security rules file. What is the main security flaw in the database schema design that these rules expose?

A.The rules do not protect against brute force attacks
B.The senderId field is not indexed
C.The delete rule allows admin to delete any message
D.Users can set the visibility field, allowing them to make messages public
AnswerD

The create rule does not restrict the visibility value, so users can bypass intended privacy.

Why this answer

Option D is correct: the schema includes a 'visibility' field that users can set when creating documents. Since the create rule only checks that the senderId matches the authenticated user, users can set visibility to 'public' for any message they create, potentially exposing private messages. Option B is not a security flaw.

Option C is not a flaw. Option A is vague and not specifically about the schema.

68
Multi-Selectmedium

A team is designing a schema for a user activity logging system using Bigtable. Each log entry includes a user ID, activity type, timestamp, and details. The access pattern requires retrieving all activities for a specific user within a time range. Which TWO row key designs are suitable? (Choose TWO.)

Select 2 answers
A.timestamp#user_id
B.random_uuid
C.reverse_timestamp
D.user_id#activity_type#timestamp
E.user_id#timestamp
AnswersD, E

Allows filtering by activity type within a user.

Why this answer

Option D (user_id#activity_type#timestamp) is correct because it groups all activities for a user under a single row key prefix, enabling efficient row range scans for a specific user. The activity_type suffix allows filtering by activity type if needed, while the timestamp ensures uniqueness and ordered storage. Option E (user_id#timestamp) is correct because it directly supports the access pattern of retrieving all activities for a user within a time range by scanning rows with the user_id prefix and filtering on the timestamp component.

Exam trap

Google Cloud often tests the misconception that a timestamp-first key is optimal for time-range queries, but the actual requirement is user-specific retrieval, which demands a user-first key design to avoid full-table scans.
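A user-first row key turns the access pattern into one contiguous range scan. The sketch below simulates Bigtable's lexicographically sorted rows with a sorted Python list; it is not the Bigtable client API, and the zero-padded timestamp width is an assumption made so that string order matches numeric order.

```python
import bisect


def row_key(user_id, timestamp):
    """Build 'user_id#timestamp', zero-padded so string order equals time order."""
    return f"{user_id}#{timestamp:010d}"


# Simulated table: row keys kept in sorted order, as Bigtable stores them.
rows = sorted(
    row_key(user, ts)
    for user, ts in [
        ("alice", 100), ("alice", 200), ("alice", 300),
        ("bob", 150), ("carol", 250),
    ]
)


def scan(user_id, start_ts, end_ts):
    """Return keys for one user with start_ts <= timestamp < end_ts."""
    lo = bisect.bisect_left(rows, row_key(user_id, start_ts))
    hi = bisect.bisect_left(rows, row_key(user_id, end_ts))
    return rows[lo:hi]


result = scan("alice", 150, 301)
```

With a timestamp-first key, the same request would interleave every user's rows inside the time window, so the scan would read and discard other users' data.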

69
MCQmedium

An e-commerce platform uses Cloud Bigtable for real-time analytics on customer behavior. The table uses a row key of 'customer_id#timestamp' (customer ID followed by reverse timestamp). Queries for a specific customer's recent events are fast, but queries that filter by event type (e.g., 'purchase') across many customers are slow. What schema change can improve query performance for event-type filtering?

A.Create a separate column family for each event type.
B.Add a secondary index on the event_type column.
C.Use a separate Bigtable instance for each event type.
D.Change the row key to 'event_type#customer_id#timestamp'.
AnswerD

This allows efficient range scans for a specific event type across all customers.

Why this answer

Option D is correct because by making the row key start with the event type, scans can efficiently filter by event type. Option B is incorrect because Bigtable does not support secondary indexes natively (you can use row key design or column families, but not indexes like relational databases). Option A (a column family per event type) does not help with filtering.

Option C (a separate Bigtable instance per event type) is an architecture change, not a schema change.

70
Matchingmedium

Match each Google Cloud tool to its purpose in database management.

Concepts
Matches

Web-based UI for managing resources

Command-line tool for managing Google Cloud services

Browser-based terminal with pre-installed tools

Infrastructure as code for provisioning databases

Observability and alerting for database performance

Why these pairings

These tools are essential for database administration in Google Cloud.

71
MCQmedium

Refer to the exhibit. Which of the following statements is true regarding this schema design?

A.Deleting a Singer row will automatically delete all associated Album rows.
B.The Albums table cannot have any secondary indexes because of the INTERLEAVE clause.
C.The Albums table's rows are physically stored independent of the Singer table.
D.The Albums table's primary key must include the SingerId column only.
E.The ON DELETE CASCADE clause ensures that deleting an Album row will delete the corresponding Singer row.
AnswerA

The ON DELETE CASCADE clause enforces this behavior.

Why this answer

Option A is correct because the `ON DELETE CASCADE` clause on the foreign key from `Albums` to `Singer` ensures that when a row in the `Singer` table is deleted, all rows in the `Albums` table that reference that singer are automatically deleted. This is a standard referential integrity behavior in relational databases, and in Cloud Spanner (the technology context for PCDE), it is enforced at the database level to maintain consistency.

Exam trap

Google Cloud often tests the direction of `ON DELETE CASCADE` — candidates mistakenly think it deletes the parent when a child is deleted, but it only propagates from parent to child.

How to eliminate wrong answers

Option B is wrong because the `INTERLEAVE` clause does not prevent secondary indexes on the `Albums` table; Cloud Spanner allows secondary indexes on interleaved tables, and such an index can optionally be interleaved in the parent with `INTERLEAVE IN` to keep its entries colocated. Option C is wrong because the `INTERLEAVE` clause physically stores child rows (Albums) adjacent to their parent row (Singer) in the same split, not independently. Option D is wrong because the `Albums` table's primary key must include `SingerId` as the first column (due to interleaving), but it can and typically does include additional columns (e.g., `AlbumId`) to uniquely identify rows.

Option E is wrong because `ON DELETE CASCADE` propagates deletion from the parent (Singer) to the child (Albums), not the reverse; deleting an `Album` row does not delete the corresponding `Singer` row.

72
MCQmedium

Refer to the exhibit. Which BigQuery SQL query correctly flattens the items into rows?

A.SELECT * FROM orders WHERE items IS NOT NULL
B.SELECT * FROM orders, UNNEST(items) AS items
C.SELECT * FROM orders INNER JOIN items ON true
D.SELECT * FROM orders CROSS JOIN UNNEST(items) AS items
AnswerD

UNNEST with CROSS JOIN correctly flattens the nested field.

Why this answer

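Option D is correct: using CROSS JOIN UNNEST(items) expands the repeated record into separate rows. Option A only filters out orders with null items and does not flatten anything. Option C is invalid because items is a column of orders, not a table that can be joined.

Option B omits the explicit CROSS JOIN keyword. Note that BigQuery also treats the comma as an implicit cross join, so D is the intended answer because it states the join explicitly.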
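What the flattening does to the data can be sketched in plain Python. The exhibit is not shown here, so the orders rows and the sku/qty item fields below are hypothetical; the function also omits the original repeated column from its output, which a SELECT * in BigQuery would keep.

```python
# Hypothetical nested data: each order carries a repeated 'items' record.
orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "C", "qty": 5}]},
    {"order_id": 3, "items": []},
]


def cross_join_unnest(rows, repeated_field):
    """Emit one output row per element of the repeated field.

    Like CROSS JOIN UNNEST, a parent row with an empty array yields no rows.
    """
    flattened = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != repeated_field}
        for element in row[repeated_field]:
            flattened.append({**parent, **element})
    return flattened


flat = cross_join_unnest(orders, "items")
```

Order 3 disappears from the result because it has no items; keeping such rows would require a LEFT JOIN UNNEST instead.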

73
MCQeasy

A company uses Cloud SQL for SQL Server. They want to store JSON data in a column and query it efficiently. What should they do?

A.Store each JSON field as a separate column.
B.Store JSON in an nvarchar(max) column and use JSON_VALUE in queries.
C.Use a TEXT column with no indexing.
D.Store JSON as a binary column and parse in application.
AnswerB

SQL Server's JSON support allows querying inside nvarchar(max) columns.

Why this answer

Option B is correct because SQL Server supports JSON functions like JSON_VALUE. Using nvarchar(max) with JSON functions allows querying inside the stored documents. Option A is wrong because storing each value in a separate column is not flexible.

Option C is wrong because a single TEXT column cannot be efficiently queried. Option D is wrong because storing JSON as binary adds complexity.

74
Multi-Selecthard

A company uses Cloud Spanner with a schema that includes a table 'Events' with primary key (EventId, Timestamp). They need to run range queries on Timestamp across all events. They notice slow queries. Which two actions can improve query performance? (Choose two.)

Select 2 answers
A.Create a secondary index on Timestamp.
B.Create a covering index that includes all queried columns.
C.Add a hash prefix to EventId to distribute writes.
D.Use interleaving with a parent table on EventId.
E.Change the primary key to (Timestamp, EventId).
AnswersA, B

A secondary index on Timestamp allows efficient range scans.

Why this answer

Creating a secondary index on Timestamp (A) enables efficient range scans. Creating a covering index (B) that includes all queried columns avoids table lookups. Option C helps writes but not reads.

Option D is about interleaving, which is not relevant here. Option E changes the primary key, which could help but may cause hot spots; it is not the best immediate action.

75
MCQhard

A company is designing a Firestore schema for a chat application with millions of messages. They need to support real-time updates and efficient querying of recent messages per conversation. Which schema and indexing strategy is optimal?

A.Store all messages in a single top-level collection. Create an index on (conversationId, timestamp desc).
B.Store messages in a subcollection with a single-field index on timestamp.
C.Store messages as a subcollection under each conversation document. Create a composite index on (conversationId, timestamp desc).
D.Use a parent document with a nested array of recent messages, and a separate collection for older messages.
AnswerC

Subcollections scale well and composite index enables efficient per-conversation queries.

Why this answer

Storing messages as a subcollection under each conversation document (Option C) is scalable and allows efficient queries with a composite index on (conversationId, timestamp desc). Option A (single collection) lacks natural grouping and may hit limits. Option B (subcollection with only a single-field timestamp index) cannot filter by conversation efficiently.

Option D (nested array) is limited to 1 MiB per document.
